• rs186 7 hours ago

    Many of the examples seem very easy -- I suspect that without LLMs, a simple Google search would lead you to a Stack Overflow question asking the same thing. I wonder how this performs in bigger, more complex codebases.

    Also, my personal experience with LLMs fixing compilation errors is: when it works, it works great. But when it doesn't, it's so clueless and lost that it's a complete waste of time to employ an LLM in the first place -- you are much better off debugging the code yourself using old-fashioned methods.

    • jumploops 10 hours ago

      I’m curious how this performs against Claude Code/Codex.

      The “RustAssistant Algorithm” looks to be a simple LLM workflow[0], and their testing was limited to GPT-4 and GPT-3.5.

      In my experience (building a simple Rust service using OpenAI’s o1), the LLM will happily fix compilation issues but will also inadvertently change some out-of-context functionality to make everything “just work.”

      The most common issues I experienced were subtle changes to ownership, especially when using non-standard or frequently updated crates, which caused performance degradations in the test cases.
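
      A minimal sketch of the kind of ownership "fix" I mean (a hypothetical before/after of my own, not from the paper): the model silences the borrow checker with a clone instead of restructuring the borrow, so the code compiles but does needless work.

          // Hypothetical illustration: an LLM "fix" that clones to satisfy the
          // borrow checker, next to the borrow-based version a human would write.

          #[derive(Clone)]
          struct Record {
              payload: Vec<u8>,
          }

          // What the model tends to produce: compiles, but deep-copies every record.
          fn total_len_cloned(records: &[Record]) -> usize {
              records
                  .iter()
                  .map(|r| r.clone().payload.len()) // needless clone on every iteration
                  .sum()
          }

          // The borrow-based version with no extra allocations.
          fn total_len_borrowed(records: &[Record]) -> usize {
              records.iter().map(|r| r.payload.len()).sum()
          }

          fn main() {
              let records = vec![Record { payload: vec![0u8; 1024] }; 4];
              assert_eq!(total_len_cloned(&records), total_len_borrowed(&records));
              println!("total payload bytes: {}", total_len_borrowed(&records));
          }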

      Therefore I wouldn’t really trust GPT-4 (and certainly not 3.5) to modify my code, even if just to fix compilation errors, without some additional reasoning steps or oversight.

      [0] https://www.anthropic.com/engineering/building-effective-age...

      • pveierland 9 hours ago

        Anecdotally, Gemini 2.5 Pro has been yielding good results lately for Rust. It's been able to one-shot pretty intricate proc macros that required multiple supporting functions (~200 LoC).

        Strong typing is super helpful when using AI: if you're properly grounded, understand the interface well, and are specifying against that interface, then the mental burden of understanding the output and integrating it with the rest of the system is much lower than when large amounts of new structure are created without well-defined and understood bounds.
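
        A small illustration of what I mean by specifying against an interface (my own toy example, nothing from the article): once the trait is pinned down, whatever gets generated has a narrow, type-checked shape to fill in.

            // Toy example: the interface is specified up front, so generated code
            // only has to fill in a body with a known, type-checked signature.

            use std::collections::HashMap;

            /// The interface I specify and understand before asking for any code.
            trait RateLimiter {
                /// Returns true if the call identified by `key` is allowed right now.
                fn allow(&mut self, key: &str) -> bool;
            }

            struct CountingLimiter {
                max_calls: u32,
                seen: HashMap<String, u32>,
            }

            // Whatever gets generated must satisfy exactly this signature, so review
            // is mostly about the body, not about newly invented structure.
            impl RateLimiter for CountingLimiter {
                fn allow(&mut self, key: &str) -> bool {
                    let count = self.seen.entry(key.to_string()).or_insert(0);
                    *count += 1;
                    *count <= self.max_calls
                }
            }

            fn main() {
                let mut limiter = CountingLimiter { max_calls: 2, seen: HashMap::new() };
                assert!(limiter.allow("user-1"));
                assert!(limiter.allow("user-1"));
                assert!(!limiter.allow("user-1"));
            }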

        • croemer 3 hours ago

          Was the paper really written 2 years ago?

          The paper states "We exclude error codes that are no longer relevant in the latest version of the Rust compiler (1.67.1)".

          A quick search shows that Rust 1.68.0 was released in March 2023: https://releases.rs/docs/1.68.0/

          Update: looks like it really is 2 years old. "We evaluate both GPT-3.5-turbo (which we call as GPT-3.5) and GPT-4"

          • chaosprint 3 hours ago

            I am the creator and maintainer of several Rust projects:

            https://github.com/chaosprint/glicol

            https://github.com/chaosprint/asak

            Even the latest LLMs, Gemini 2.5 Pro and Claude 3.7 Thinking, find it difficult to produce code that compiles on the first try.

            I think the main challenges are:

            1. Their training data lags behind. Most Rust crates are not yet 1.0 and their APIs change constantly, which is the source of most compilation errors (see the sketch just after this list).

            2. Trying to do too much at one time increases the probability of errors.

            3. Agents don't follow human work habits very well: go to docs.rs to read the latest documentation and look at the examples, and after making a mistake, search resources such as GitHub.
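
            As an illustration of point 1 (my own example, not from the paper): rand changed gen_range between 0.7 and 0.8, and models trained mostly on pre-0.8 code keep emitting the old two-argument call.

                // Illustration of API churn: rand 0.7 accepted gen_range(low, high),
                // rand 0.8 takes a single range. Assumes rand = "0.8" in Cargo.toml.

                use rand::Rng;

                fn main() {
                    let mut rng = rand::thread_rng();

                    // What older training data suggests -- fails to compile on rand 0.8:
                    // let n = rng.gen_range(0, 10);

                    // What the current API expects:
                    let n: u32 = rng.gen_range(0..10);
                    println!("{n}");
                }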

            Maybe this is where Cursor rules and MCP can help. But at present, they are far behind.

            • rgoulter 10 hours ago

              At a glance, this seems really neat. -- I reckon one thing LLMs have been useful to help with is "the things I'd copy-paste from stack overflow". A loop of "let's fix each error" reminds me of that.

              I'd also give +1 to "LLMs as force multiplier". -- If you know what you're doing & understand what's going on, it seems very useful to have an LLM-supported tool able to help automatically resolve compilation errors. -- But if you don't know what you're doing, I'd worry perhaps the LLM will help you implement code that's written with poor taste.

              I can imagine LLMs could also help explain errors on demand. -- "You're trying to do this, you can't do that because..., instead, what you should do is...".

              • MathiasPius 9 hours ago

                I suspect this might be helpful for minor integration challenges or library upgrades like others have mentioned, but in my experience, the vast majority of Rust compilation issues fall into one of two buckets:

                1. Typos, oversights (like forgetting a match arm when adding a new enum variant), etc. All things which in most cases are solved with a single keystroke using non-LLM LSP tooling (see the sketch just after this list).

                2. Wrong assumptions (on my part) about lifetimes, ownership, or overall architecture. All problems which I very much doubt an LLM will be able to reason about, because the problems usually lie in my understanding or modelling of the problem domain, not anything to do with the code itself.
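
                A minimal sketch of bucket 1 (my own example): add a variant, and the compiler plus rust-analyzer hand you the fix directly.

                    // Bucket 1 in practice: if the Triangle arm below were missing,
                    // rustc would report a non-exhaustive-patterns error pointing at the
                    // uncovered variant, and rust-analyzer offers to fill the match arms
                    // as a one-keystroke quick fix.

                    enum Shape {
                        Circle(f64),
                        Square(f64),
                        Triangle(f64, f64), // newly added variant
                    }

                    fn area(s: &Shape) -> f64 {
                        match s {
                            Shape::Circle(r) => std::f64::consts::PI * r * r,
                            Shape::Square(side) => side * side,
                            Shape::Triangle(base, height) => 0.5 * base * height, // the added arm
                        }
                    }

                    fn main() {
                        println!("{}", area(&Shape::Triangle(3.0, 4.0)));
                    }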

                • croemer 3 hours ago

                  Paper is a bit pointless if one can't use the tool.

                  The paper links to a GitHub repo with nothing but a 3-sentence README and no activity for 9 months, which reads:

                  > We are in the process of open-sourcing the implementation of RustAssistant. Watch this space for updates.

                  • NoboruWataya 10 hours ago

                    Anecdotally, ChatGPT (I use the free tier) does not seem to be very good at Rust. For any problem with any complexity it will very often suggest solutions which violate the borrowing rules. When the error is pointed out to it, it will acknowledge the error and suggest a revised solution with either the same or a different borrowing issue. And repeat.

                    A 74% success rate may be an impressive improvement over the SOTA for LLMs, but frankly a tool designed to fix your errors being wrong, at best, 1 in 4 times seems like it would be rather frustrating.

                    • noodletheworld 10 hours ago

                      Hot take: this is the future.

                      Strongly typed languages have a fundamentally superior iteration strategy for coding agents.

                      The rust compiler, particularly, will often give extremely specific “how to fix” advice… but in general I see this as a future trend with rust and, increasingly, other languages.

                      Fundamentally, being able to assert “this code compiles” (and iterate until it does) before returning “completed task” makes strongly typed languages better suited to agents than dynamic languages, where the only possible verification is at runtime.

                      (And at best the agent can assert “i guess it looks ok”)
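
                      A rough sketch of that iterate-until-it-compiles loop (my own stub with a placeholder where the model call would go, not RustAssistant's actual code):

                          // Sketch of a compile-iterate loop. ask_model_for_patch is a stub;
                          // a real agent would send the errors (plus source) to a model and
                          // apply the suggested edit before retrying.

                          use std::process::Command;

                          fn cargo_check() -> (bool, String) {
                              // `cargo check` exits non-zero when compilation fails.
                              let out = Command::new("cargo")
                                  .arg("check")
                                  .output()
                                  .expect("failed to run cargo");
                              (out.status.success(), String::from_utf8_lossy(&out.stderr).into_owned())
                          }

                          fn ask_model_for_patch(errors: &str) {
                              // Placeholder only: no model is wired up in this sketch.
                              println!("(stub) would ask the model about:\n{errors}");
                          }

                          fn main() {
                              for attempt in 1..=5 {
                                  let (ok, errors) = cargo_check();
                                  if ok {
                                      println!("compiles after {attempt} attempt(s)");
                                      return;
                                  }
                                  ask_model_for_patch(&errors);
                              }
                              eprintln!("giving up: still not compiling after 5 attempts");
                          }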

                      • infogulch 7 hours ago

                        I wonder if the reason LLMs are not very good at debugging is that there's not much published code sitting in this intermediate state with obvious compilation errors.

                        • k_bx 10 hours ago

                          So far the best way for me to fix Rust has been OpenAI's Codex tool. Rust libraries change their APIs often and evolve quickly, but luckily all the code is available under ~/.cargo/registry, so it can go and read the actual library source. Very useful!

                          • manmal 9 hours ago

                            Maybe this is the right thread to ask: I’ve read that Elixir is a bit under supported by many LLMs. Whereas Ruby/Rails and Python work very well. Are there any recommendations for models that seem particularly useful for Elixir?

                            • pjmlp 7 hours ago

                              I have limited bandwidth, so I'll check it later, but it would be great if it could suggest fixes for errors related to affine types, or explain what is wrong; that would help a lot with Rust's adoption.

                              • flohofwoe 10 hours ago

                                So Microsoft programmers will become code monkeys that stumble from one compiler error to the next without any idea what they are actually doing, got it ;)

                                (it's also a poor look for Rust's ergonomics tbh, but that's not a new issue)

                                • CryZe 9 hours ago

                                  I'd love to see VSCode integrate all the LSP information into Copilot. That seems to be the natural evolution of this idea.

                                  • delduca 8 hours ago

                                    > unlike unsafe languages like C/C++

                                    The world is unsafe!

                                    • petesergeant 9 hours ago

                                      Every coding assistant or LLM I've used generally makes a real hash of TypeScript's types, so I'm a little skeptical, but also:

                                      > RustAssistant is able to achieve an impressive peak accuracy of roughly 74% on real-world compilation errors in popular open-source Rust repositories.

                                      74% feels like it would be just the right amount that people would keep hitting "retry" without thinking about the error at all. I've found LLMs great for throwing together simple scripts in languages I just don't know or can't look up the syntax for, but I'm still struggling to get serious work out of them in languages I know well where I'm trying to do anything vaguely complicated.

                                      Worse, they often produce plausible code that does something in a weird or suboptimal way: tests that don't actually test anything, or subtle but real bugs in logic that you wouldn't write yourself and need to be very on the ball to catch in code you're reviewing.

                                      • vaylian 9 hours ago

                                        > These unique Rust features also pose a steep learning curve for programmers.

                                        This is a common misunderstanding of what a learning curve is:

                                        https://en.wikipedia.org/wiki/Learning_curve#%22Steep_learni...