• narush a day ago

    Hey HN -- study author here! (See previous thread on the paper here [1].)

    I think this blog post is an interesting take on one specific factor that is likely contributing to slowdown. We discuss this in the paper [2] in the section "Implicit repository context (C.1.5)" -- check it out if you want to see some developer quotes about this factor.

    > This is why AI coding tools, as they exist today, will generally slow someone down if they know what they are doing, and are working on a project that they understand.

    I made this point in the other thread discussing the study, but in general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the full factors table on page 11).

    > If there are no takers then I might try experimenting on myself.

    This sounds super cool! I'd be very excited to see how you set this up + how it turns out... please do shoot me an email (in the paper) if you do this!

    > AI slows down open source developers. Peter Naur can teach us why

    Nit: I appreciate how hard it is to write short titles summarizing the paper (the graph title is the best I was able to do after a lot of trying) -- but I might have written this as "Early-2025 AI slows down experienced open-source developers. Peter Naur can give us more context about one specific factor." It's admittedly less of a catchy title, but I think getting the qualifications right is really important!

    Thanks again for the sweet write-up! I'll hang around in the comments today as well.

    [1] https://news.ycombinator.com/item?id=44522772

    [2] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

    • seanwilson a day ago

      If this makes sense, how is the study able to give a reasonable measure of how long an issue/task should have taken without AI, vs how long it actually took with AI, in order to determine that using AI was slower?

      Or it's comparing how long the dev thought it should take with AI vs how long it actually took, which now includes the dev's guess of how AI impacts their productivity?

      When it's hard to estimate how difficult an issue should be to complete, how does the study account for this? What percentage of speed-up or slowdown would just be noise due to estimates being difficult?

      I do appreciate that this stuff is very hard to measure.

      • krona a day ago

        An easier way to think about it might be if you timed how long it took each ticket in your backlog. You also recorded whether you were drunk or not when you worked on it, and the ticket was selected at random from your backlog. The assumption (null-hypothesis) is that being drunk has no effect on ticket completion time.

        Using the magic of statistics, if you have completed enough tickets, we can determine whether the null hypothesis holds (for a given level of statistical certainty), and if it doesn't, how large the difference is (with a margin of error).

        That's not to say there couldn't be other causes for the difference (if there is one), but that's how science proceeds, generally.
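
        To make the "magic of statistics" concrete, here's a minimal sketch of one such test (a permutation test) in Rust -- the ticket times are invented and the `rand` crate is assumed, and the actual study used a randomized controlled design with its own analysis, so this only illustrates the general idea:

            use rand::seq::SliceRandom;
            use rand::thread_rng;

            fn mean(xs: &[f64]) -> f64 {
                xs.iter().sum::<f64>() / xs.len() as f64
            }

            fn main() {
                // Toy data: ticket completion times in hours.
                let sober = [3.1, 2.4, 5.0, 4.2, 3.8, 2.9, 4.7, 3.3];
                let drunk = [4.9, 6.1, 5.5, 4.4, 7.0, 5.8, 6.3, 5.1];

                let observed = mean(&drunk) - mean(&sober);

                // Null hypothesis: drunkenness has no effect, so the labels are
                // exchangeable. Shuffle the pooled times repeatedly and count how
                // often a difference at least as large arises by chance alone.
                let mut pooled: Vec<f64> =
                    sober.iter().chain(drunk.iter()).copied().collect();
                let mut rng = thread_rng();
                let trials = 10_000;
                let mut at_least_as_extreme = 0;

                for _ in 0..trials {
                    pooled.shuffle(&mut rng);
                    let (a, b) = pooled.split_at(sober.len());
                    if (mean(b) - mean(a)).abs() >= observed.abs() {
                        at_least_as_extreme += 1;
                    }
                }

                println!(
                    "observed difference: {:.2}h, p ~= {:.4}",
                    observed,
                    at_least_as_extreme as f64 / trials as f64
                );
            }

        If being drunk truly had no effect, shuffled differences at least as large as the observed one would show up often and the p-value would be large; a tiny p-value is what licenses rejecting the null.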

        • jiggawatts 19 hours ago

          The challenge with “controlled experiments” is that telling developers to “use AI for all of your tickets for a month” forces a specific tool onto problems that may not benefit from that tool.

          • msgodel 19 hours ago

            Most corporate software problems don't need AI at all. They're really coordination/communication/administration problems hiding as technical problems.

      • jwhiles a day ago

        Thanks for the response, and apologies for misrepresenting your results somewhat! I'm probably not going to change the title, since I am at heart a polemicist and a sloppy thinker, but I'll update the article to call out this misrepresentation.

        That said, I think that what I wrote more or less encompasses three of the factors you call out as being likely to contribute: "High developer familiarity with repositories", "Large and complex repositories", and "Implicit repository context".

        I thought more about experimenting on myself, and while I hope to do it - I think it will be very hard to create a controlled environment whilst also responding to the demands the job puts on me. I also don't have the luxury of a list of well-scoped tasks that could feasibly be completed in a few hours.

        • karmakaze 18 hours ago

          I would expect any change to an optimized workflow (developing one's own well-understood project) to initially be slower. What I'd like to see is how these same developers do 6 months or a year from now, after using AI has become the natural workflow on these same projects. The article mentions that these results don't extrapolate to other devs, but it's important to note that they may not extrapolate over time to these same devs either.

          I myself am just getting started, and I can see how so many things can be scripted with AI that would be very difficult to (semi-)automate without it. You gotta ask yourself "Is it worth the time?"[0]

          [0] https://xkcd.com/1205/

          • antonvs a day ago

            > Early-2025 AI slows down experienced open-source developers.

            Even that's too general, because it'll depend on what the task is. It's not as if open source developers in general never work on tasks where AI could save time.

            • narush a day ago

              We call this over-generalization out specifically in the "We do not provide evidence that:" table in the blog post and paper - I agree there are tasks these developers are likely sped up on with early-2025 tools.

              • 2muchcoffeeman 21 hours ago

                I think this will be the key. Finding appropriate tasks. Even on code bases I know, I can find tedious things for the AI to do. Sometimes I can find tedious things for it to do that I would never have dreamt of doing in the past. Now, I think “will it do it?”.

                Once I got the hang of identifying problems, or being more targeted, I was spending less time messing about and got things done quicker.

            • calf 19 hours ago

              Slowing down isn't necessarily bad; maybe slow programming (literate programming, per Knuth, comes to mind as another early argument) encourages better theory formation. Maybe programming today is like fast food, and proper theory and abstraction (and language design) require a good measure of slow and deliberate work that has not been the norm in industry.

              • munificent a day ago

                > The inability of developers to tell if a tool sped them up or slowed them down is fascinating in itself, probably applies to many other forms of human endeavour, and explains things as varied as why so many people think that AI has made them 10 times more productive, why I continue to use Vim, why people drive in London etc.

                In boating, there's a notion of "set and drift", which describes how wind and current push a boat off course. If a mariner isn't careful, they'll end up far from their destination because of it.

                This is because when you're sitting in a boat, your perception of motion is relative and local. You feel the breeze on your face, and you see how the boat cuts through the surrounding water. You interpret that as motion towards your destination, but it can equally be wind and current moving the medium itself.

                I think a similar effect explains all of these. Our perception of "making progress" is mostly a sense of motion and "stuff happening" in our immediate vicinity. It's not based on a perception of the goal getting closer, which is much harder to measure and develop an intuition for.

                So people tend to choose strategies that make them feel like they're making progress even if it's not the most effective strategy. I think this is why people often take "shortcuts" when driving that are actually longer. All of the twists and turns keep them busy and make them feel like they're making more progress than zoning out on a boring interstate does.

                • wrsh07 a day ago

                  Something I noticed early on when using AI tools was that it was great because I didn't get blocked. Somehow, I always wanted to keep going and always felt like I could keep going.

                  The problem, of course, is that one might thoughtlessly invoke the AI tool when it would be faster to make the one-line change directly.

                  Edit

                  This could make sense with the driving analogy. If the road I was planning to take is closed, GPS will happily tell me to try something else. But if that fails too, it might go back to the original suggestion.

                  • thinkingemote a day ago

                    Exactly! Waze, the navigation app, tends to route users along routes that are longer but feel faster. When driving, we perceive our journey as fast or slow not by its actual length but by our memories of what happened. Waze knows human drivers are happier driving a route that may be longer in time and distance if they feel like they are making progress with the twists and turns.

                    AI tools make programming feel easier. That it might actually be less productive is interesting, but we humans prefer the easier shortcuts. Our memories of coding with AI tell us that we didn't struggle, and therefore we made progress.

                    • tjr a day ago

                      That sounds like a navigation tool that I absolutely do not want! Occasionally I do enjoy meandering around, but usually fastest / shortest path would be preferred.

                      And I'm not sure about the other either. In my 20+ year career in aerospace software, the most memorable times were solving interesting problems, not days with no struggle just churning out code.

                      • thinkingemote a day ago

                        Indeed it is removing the memorable events of achievement!

                        Generally, memorable things are different from unmemorable things. Work is unmemorable. Driving is unmemorable, except when something negative happens. Waze tries to give some positive feelings to the driving route. Waze knows that people sometimes want positive experiences more than efficiency.

                        Being stuck in a traffic jam is more memorable than not being so. Or we remember the negative feeling more than the fact that our drive actually wasn't inefficient.

                        AI tools make for a less negative day of work, so we feel like we hit no traffic jams. "I got so much done" really means "I didn't get stuck". But they're also removing the positive feelings too!

                        It's an illusion of progress through our feelings and memories.

                        Or: programming with AI brings different feedback mechanisms and systems, different emotional engagement, and different memory behaviour. It's very interesting!

                    • PicassoCTs a day ago

                      I also think that AI-written code is just not read. People hate code reviews, and actively refuse to read code, because it is hard work reading into other people's thoughts and ideas.

                      This is why pushing for new code, rewrites, new frameworks is so popular. https://www.joelonsoftware.com/2000/04/06/things-you-should-...

                      So a ton of AI-generated code is just that: never read. It's generated, tested against test functions - and that's it. I wouldn't wonder if some of these devs themselves have only a marginal idea of what's in their codebases and why.

                      • tjr a day ago

                        I have mostly worked in aerospace software, and find this rather horrifying. I suppose, if your tests are in fact good and comprehensive enough, there could be a logical argument for not needing to understand the code, but if we're talking people's safety in the hands of your software, I don't know if there is any number of tests I would accept in exchange for willingly giving up understanding of the code.

                        • asadotzler 21 hours ago

                          You're transferring the need to be really good at coding and understanding code to the need to be really good at testing and understanding tests, which 9/10 times requires being good at coding and understanding code. There are no free lunches.

                      • Alex_L_Wood a day ago

                        We all as humans are hardwired to prefer greedy algorithms, basically.

                        • jiggawatts 19 hours ago

                          > The inability of developers to tell if a tool sped them up or slowed them down is fascinating in itself

                          Linux/UNIX users are convinced of the superiority of keyboard control and CLI tools, but studies have shown that the mouse is faster for almost all common tasks.

                          Keyboard input feels faster because there are more actions per second.

                          • mhuffman 19 hours ago

                            >but studies have shown that the mouse is faster for almost all common tasks.

                            Do you think that daily CLI Linux/UNIX users might have a different list of what they consider "common tasks"?

                            • mrheosuper 15 hours ago

                              I don't think they have a different list.

                              Replying to email, copying/moving files, writing documents, etc.

                              • shakna 12 hours ago

                                I don't really use CLI for any of that. Despite using it almost exclusively. (Email on my phone - rarely need to move files.)

                                I use it for:

                                Arbitrary deep searches in specific root trees. (Where did I make my definitions of foo last time?)

                                Generating 20 different formats of documents. (Yay for supporting a bunch of platforms.)

                                Rewriting codebases - this might be the closest to "writing documents". (Mostly git things. And most git GUIs are... Trash.)

                        • blake1 a day ago

                          I think a reasonable summary of the study referenced is that: "AI creates the perception of productivity enhancements far beyond the reality."

                          Even within the study, there were some participants who saw mild improvements to productivity, but most had a significant drop in productivity. This thread is now full of people telling their story about huge productivity gains they made with AI, but none of the comments contend with the central insight of this study: that these productivity gains are illusions. AI is a product designed to make you value the product.

                          In matters of personal value, perception is reality, no question. Anyone relying heavily on AI should really be worried that it is mostly a tool for warping their self-perception, one that creates dependency and a false sense of accomplishment. After all, it speaks a highly optimized stream of tokens at you, and you really have to wonder what the optimization goal was.

                          • BriggyDwiggs42 a day ago

                            I’ve noticed that you can definitely use them to help you learn something, but that your understanding tends to be more abstract and LLM-like that way. You definitely want to mix it up when learning too.

                            • daxfohl a day ago

                              I've also had bad results with hallucinations there. I was trying to learn more about multi-dimensional qubit algorithms, and spent a whole day learning a bunch of stuff that was fascinating but plain wrong. I only figured out it was wrong at the end of the day when I tried to do a simulation and the results weren't consistent.

                              Early in the chat it substituted a `-1` for an `i`, and everything that followed was garbage. There were also some errors that I spotted real-time and got it to correct itself.

                              But yeah, IDK, it presents itself so confidently and "knows" so much and is so easy to use that it's hard not to use it as a reference / teacher. But it's also quite dangerous if you're not confirming things; it can send you down incorrect paths and waste a ton of time. I haven't decided whether the cost is worth the benefit or not.

                              Presumably they'll get better at this over time, so in the long run (probably no more than a year) it'll likely easily exceed the ROI breakeven point, but for now, you do have to remain vigilant.

                              • tonyedgecombe a day ago

                                I keep wondering whether the best way to use these tools is to do the work yourself then ask the AI to critique it, to find the bugs, optimisations or missing features.

                              • thinkingemote 21 hours ago

                                It's like the difference between being fast and quick. AI tools make the developer feel quick but they may not be fast. It's less cognitive effort in some ways. It's an interesting illusion, one that is based on changing emotions from different feedback loops and the effects of how memory forms.

                                • asadotzler 21 hours ago

                                  Quickness is a burst; speed is a flow.

                                  Or, "slow is smooth, and smooth is fast"

                                  • aspenmayer 17 hours ago

                                    I’ve heard it that way, which is more memorable. I’ve also heard it this way:

                                    “Slow is smooth, smooth is efficient, and speed is the efficiency of motion.”

                              • nico a day ago

                                > They are experienced open source developers, working on their own projects

                                I just started working on a 3-month old codebase written by someone else, in a framework and architecture I had never used before

                                Within a couple hours, with the help of Claude Code, I had already created a really nice system to replicate data from staging to local development. Something I had built before in other projects, and I knew that manually it would take me a full day or two, especially without experience in the architecture

                                That immediately sped up my development even more, as now I had better data to test things locally

                                Then a couple hours later, I had already pushed my first PR. All code following the proper coding style and practices of the existing project and the framework. That PR would have taken me at least a couple of days and up to 2 weeks to fully manually write out and test

                                So sure, AI won’t speed everyone or everything up. But at least in this one case, it gave me a huge boost

                                As I keep going, I expect things to slow down a bit, as the complexity of the project grows. However, it’s also given me the chance to get an amazing jumpstart

                                • Vegenoid a day ago

                                  I have had similar experiences as you, but this is not the kind of work that the study is talking about:

                                  “When open source developers working in codebases that they are deeply familiar with use AI tools to complete a task, they take longer to complete that task”

                                  I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

                                  • Navarr a day ago

                                    > I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

                                    If you are unfamiliar with the project, how do you determine that it wasn't leading you astray in the first place? Do you ever revisit what you had done with AI previously to make sure that, once you know your way around, it was doing it the right way?

                                    • Vegenoid a day ago

                                      In some cases, I have not revisited, as I was happy to simply make a small modification for my use only. In others, I have taken the time to ensure the changes are suitable for upstreaming. In my experience, which I have not methodically recorded in any way, the LLM’s changes at this early stage have been pretty good. This is also partly because the changes I am making at the early stage are generally small, usually not requiring adding new functionality but simply hooking up existing functionality to a new input or output.

                                      What’s most useful about the LLM in the early stages is not the actual code it writes, but its reasoning that helps me learn about the structure of the project. I don’t take the code blind, I am more interested in the reasoning than the code itself. I have found this to be reliably useful.

                                      • quantumHazer a day ago

                                        no, they just claim that AI coding tools are magic and drink their kool-aid

                                      • Gormo a day ago

                                        > I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

                                        How does using AI impact the amount of time it takes you to become sufficiently familiar with the project to recognize when you are being led astray?

                                        One of the worries I have with the fast ramp-up is that a lot of that ramp-up time isn't just grunt work to be optimized away, it's active learning, and bypassing too much of it can leave you with an incomplete understanding of the problem domain that slows you down perpetually.

                                        Sometimes, there are real efficiencies to be gained; other times those perceived efficiencies are actually incurring heavy technical debt, and I suspect that overuse of AI is usually the latter.

                                        • pragma_x a day ago

                                          Not just new code-bases. I recently used an LLM to accelerate my learning of Rust.

                                          Coming from other programming languages, I had a lot of questions that would be tough to nail down in a Google search, or combing through docs and/or tutorials. In retrospect, it's super fast at finding answers to things that _don't exist_ explicitly, or are implied through the lack of documentation, or exist at the intersection of wildly different resources:

                                          - Can I get compile-time type information of Enum values?

                                          - Can I specialize a generic function/type based on Enum values?

                                          - How can I use macros to reflect on struct fields?

                                          - Can I use an enum without its enclosing namespace, as I can in C++?

                                          - Does rust have a 'with' clause?

                                          - How do I avoid declaring lifetimes on my types?

                                          - What is an idiomatic way to implement the Strategy pattern?

                                          - What is an idiomatic way to return a closure from a function?

                                          ...and so on. This "conversation" happened here and there over the period of two weeks. Not only was ChatGPT up to the task, but it was able to suggest what technologies would get me close to the mark if Rust wasn't built to do what I had in mind. I'm now much more comfortable and competent in the language, and miles ahead of where I would have been without it.
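
                                          To give a flavor of the kind of answer involved -- for the "return a closure" question, for example, an idiomatic sketch (hypothetical, not lifted from the actual conversation) looks something like this:

                                              // `impl Fn` is the zero-cost choice when one concrete closure
                                              // type is returned; box a `dyn Fn` when different branches
                                              // return different closures.
                                              fn multiplier(factor: i64) -> impl Fn(i64) -> i64 {
                                                  move |x| x * factor
                                              }

                                              fn pick(op: &str) -> Box<dyn Fn(i64) -> i64> {
                                                  match op {
                                                      "double" => Box::new(|x| x * 2),
                                                      _ => Box::new(|x| x + 1),
                                                  }
                                              }

                                              fn main() {
                                                  let triple = multiplier(3);
                                                  assert_eq!(triple(7), 21);
                                                  assert_eq!(pick("double")(7), 14);
                                              }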

                                          • VonTum 14 hours ago

                                            For really basic syntax stuff it works, but the moment you ask its advice on anything more involved, ChatGPT has confidently led me down incredibly wrong but right-sounding trails.

                                            To their credit, the people on the Rust forum have been really responsive at answering my questions and poking holes in incorrect unsafe implementations, and it is from speaking to them that I truly feel I have learned the language well.

                                        • davidclark a day ago

                                           > That PR would have taken me at least a couple of days and up to 2 weeks to fully manually write out and test

                                           What is your accuracy on software development estimates? I always see these productivity claims matched against "it would've taken me" timelines.

                                          But, it’s never examined if we’re good at estimating. I know I am not good at estimates.

                                          It’s also never examined if the quality of the PR is the same as it would’ve been. Are you skipping steps and system understanding which let you go faster, but with a higher % chance of bugs? You can do that without AI and get the same speed up.

                                          • OptionOfT a day ago

                                            Now the question is: did you gain the same knowledge and proficiency in the codebase that you would've gained organically?

                                             I find that when working with an LLM the difference in knowledge is the same as learning a new language. Learning to understand another language is easier than learning to speak it.

                                            It's like my knowledge of C++. I can read it, and I can make modifications of existing files. But writing something from scratch without a template? That's a lot harder.

                                            • nico a day ago

                                              Some additional notes given the comments in the thread

                                              * I wasn’t trying to be dismissive of the article or the study, just wanted to present a different context in which AI tools do help a lot

                                              * It’s not just code. It also helps with a lot of tasks. For example, Claude Code figured out how to “manually” connect to the AWS cluster that hosted the source db, tested different commands via docker inside the project containers and overall helped immensely with discovery of the overall structure and infrastructure of the project

                                              * My professional experience as a developer, has been that 80-90% of the time, results trump code quality. That’s just the projects and companies I’ve been personally involved with. Mostly saas products in which business goals are usually considered more important than the specifics of the tech stack used. This doesn’t mean that 80-90% of code is garbage, it just means that most of the time readability, maintainability and shipping are more important than DRY, clever solutions or optimizations

                                              * I don’t know how helpful AI is or could be for things that require super clever algorithms or special data structures, or where code quality is incredibly important

                                              * Having said that, the AI tools I’ve used can write pretty good quality code, as long as they are provided with good examples and references, and the developer is on top of properly managing the context

                                              * Additionally, these tools are improving almost on a weekly or monthly basis. My experience with them has drastically changed even in the last 3 months

                                              At the end of the day, AI is not magic, it’s a tool, and I as the developer, am still accountable for the code and results I’m expected to deliver

                                              • PaulDavisThe1st a day ago

                                                TFA was specifically about people very familiar with the project and codebase that they are working on. Your anecdote is precisely the opposite of the situation it was about, and the article acknowledged the sort of process you describe.

                                                • kevmo314 a day ago

                                                  You've missed the point of the article, which in fact agrees with your anecdote.

                                                  > It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work.

                                                  • moogleii a day ago

                                                    That would be an aside, or a comment, not the point of the article.

                                                    • antonvs a day ago

                                                      > You've missed the point of the article

                                                      Sadly clickbait headlines like the OP, "AI slows down open source developers," spread this misinformation, ensuring that a majority of people will have the same misapprehension.

                                                      • raincole a day ago

                                                        Which is a good thing for people who are currently benefiting from AI, though. The slower other programmers adopt AI, the more edge those who are proficient with it have.

                                                        It took me an embarrassingly long time to realize a simple fact: using AI well is a shallow skill that everyone can learn in days or even hours if they want. And then my small advantage of knowing AI tools will disappear. Since that realization I've always been upvoting articles that claim AI makes you less productive (like the OP).

                                                        • antonvs 4 hours ago

                                                          I'd recommend spending more effort on improving your actual skills. The advantage of being good at what you do tends to be quite stable and more beneficial.

                                                          • rightbyte a day ago

                                                            So you bother to push some sort of self-admittedly false narrative with upvotes, but then you counteract it by spelling it out?

                                                      • samtp a day ago

                                                         Well, that's exactly what it does well at the moment: boilerplate starter templates, landing pages, throwaway apps, etc. But for projects that need precision - like data pipelines and security - the code it generates has many subtle flaws that can/will cause giant headaches in your project unless you dig through every line produced

                                                        • quantumHazer a day ago

                                                           You clearly have not read the study. The problem is developers thought they were 20% faster, but they were actually slower. Anyway, from a quick review of your profile, you have a conflict of interest regarding vibe coding, so I will definitely take your opinion with a grain of salt.

                                                          • floren a day ago

                                                            > Anyway from a fast review about your profile you're in conflict of interest about vibe coding

                                                            Seems to happen every time, doesn't it?

                                                          • xoralkindi a day ago

                                                             How are you confident in the code, coding style, and practices simply because the LLM says so? How do you know it is not hallucinating, since you don't understand the codebase?

                                                            • bko a day ago

                                                              When anecdote and data don't align, it's usually the data that's wrong.

                                                              Not always the case, but whenever I read about these strained studies or arguments about how AI is actually making people less productive, I can't help but wonder why nearly every programmer I know, myself included, finds value in these tools. I wonder if the same thing happened with higher-level programming languages, where people argued: you may THINK not managing your own memory will lead to more productivity, but actually...

                                                              Even if we weren't more "productive", millions prefer to use these tools, so it has to count for something. And I don't need a "study" to tell me that

                                                              • adrian_b a day ago

                                                                 TFA says clearly that AI is likely to make anyone working on an unfamiliar code base more productive, but to make those working on a project they understand well less productive, and it gives reasonable arguments for why this is likely to happen.

                                                                Moreover, it acknowledges that for programmers working in most companies the first case is much more frequent.

                                                                • bko a day ago

                                                                  I have written every line of code in the code base I mostly work in and I still find it incredibly valuable. Millions use these tools and a large percentage of them find them useful in their familiar code base.

                                                                  Again, overwhelming anecdote and millions of users > "study"

                                                                  • almatabata a day ago

                                                                    > Interestingly the developers predict that AI will make them faster, and continue to believe that it did make them faster, even after completing the task slower than they otherwise would!

                                                                     In this case clearly anecdotes are not enough. If that quote from the article is accurate, it shows that you cannot trust the developers' time perception.

                                                                     I agree, it's only one study and we should not take it as the final answer. It definitely justifies doing a few follow-up evaluations to see if this holds up.

                                                                    • overfeed 20 hours ago

                                                                      > If that quote from the article is accurate, it shows that you cannot trust the developers time perception.

                                                                      The scientific method goes right out the window when it comes to true believers. It reminds me of weed-smokers who insist getting high makes them deep-thinkers: it feels that way in the moment, but if you've ever been a sober person caught up in a "deep" discussion among people high on THC, oh boy...

                                                                      • bko a day ago

                                                                         Or I cannot trust a contrived laboratory setting with its garden of forking paths.

                                                                        https://mleverything.substack.com/p/garden-of-forking-paths-...

                                                                        • almatabata a day ago

                                                                          I did not say to trust it. I do not need to trust it.

                                                                          If I run my own tests on my own codebase I will definitely use some objective time measurement method and a subjective one. I really want to know if there is a big difference.

                                                                           I really wonder if it's just the individual's bias showing. If you are pro-AI you might overestimate the speedup, and if you are against it you might underestimate it.

                                                                          • bko a day ago

                                                                            That's fair, I agree.

                                                                  • rsynnott a day ago

                                                                    > I can't help but wonder why nearly every programmer I know, myself included, finds value in these tools.

                                                                    One of the more interesting findings of the study mentioned was that the LLM users, even where use of an LLM had apparently degraded their performance, tended to believe it had enhanced it. Anecdote is a _really_ bad argument against data that shows a _perception_ problem.

                                                                    > Even if we weren't more "productive", millions prefer to use these tools, so it has to count for something.

                                                                    I mean, on that basis, so does homeopathy.

                                                                    Like, it's just one study. It's not the last word. But "my anecdotes disprove it" probably isn't a _terribly_ helpful approach.

                                                                    • ted_bunny a day ago

                                                                      Also, "anecdotes > data" as a general heuristic is a red flag. But like if clowns had a country and their flag were red. That kind.

                                                                  • markstos a day ago

                                                                    I had a similar experience with AI and open source. AI allowed me to implement features in a language and stack I didn't know well. I had wanted these features for months and no one else was volunteering to implement them. I had tried to study the stack directly myself, but found the total picture to be complex and under-documented for people getting started.

                                                                     Using Warp terminal (which used Claude), I was able to get past those barriers and achieve results that weren't happening at all before.

                                                                  • tomasz_fm a day ago

                                                                    Only one developer in this study had more than 50h of Cursor experience, including time spent using Cursor during the study. That one developer saw a 25% speed improvement.

                                                                    Everyone else was an absolute Cursor beginner with barely any Cursor experience. I don't find it surprising that using tools they're unfamiliar with slows software engineers down.

                                                                    I don't think this study can be used to reach any sort of conclusion on use of AI and development speed.

                                                                    • narush a day ago

                                                                       Hey, thanks for digging into the details here! Copying a relevant comment (https://news.ycombinator.com/item?id=44523638) from the other thread on the paper, in case it's helpful on this point.

                                                                      1. Some prior studies that find speedup do so with developers that have similar (or less!) experience with the tools they use. In other words, the "steep learning curve" theory doesn't differentially explain our results vs. other results.

                                                                       2. Prior to the study, 90+% of developers had reasonable experience prompting LLMs. Before we found slowdown, prompting was the only experience-related concern that most external reviewers had -- as prompting was considered the primary skill. In general, the standard wisdom was/is that Cursor is very easy to pick up if you're used to VSCode, which most developers used prior to the study.

                                                                       3. Imagine all these developers had a TON of AI experience. One thing this might do is make them worse programmers when not using AI (relatable, at least for me), which in turn would raise the speedup we find (not because AI got better, but just because the without-AI baseline got worse). In other words, we're sorta between a rock and a hard place here -- it's just plain hard to figure out what the right baseline should be!

                                                                      4. We shared information on developer prior experience with expert forecasters. Even with this information, forecasters were still dramatically over-optimistic about speedup.

                                                                      5. As you say, it's totally possible that there is a long-tail of skills to using these tools -- things you only pick up and realize after hundreds of hours of usage. Our study doesn't really speak to this. I'd be excited for future literature to explore this more.

                                                                      In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the factors table on page 11).

                                                                      I'll also note that one really important takeaway -- that developer self-reports after using AI are overoptimistic to the point of being on the wrong side of speedup/slowdown -- isn't a function of which tool they use. The need for robust, on-the-ground measurements to accurately judge productivity gains is a key takeaway here for me!

                                                                      (You can see a lot more detail in section C.2.7 of the paper ("Below-average use of AI tools") -- where we explore the points here in more detail.)

                                                                      • brulard 19 hours ago

                                                                         1. That does not support these results in any way. 2. Having experience with prompting is quite a small part of being able to use agentic IDE tools. It's like relating cutting an onion to being a good cook.

                                                                         I think we should all focus on how the effectiveness is going to change in the long term. We all know AI tooling is not going to disappear, but to become better and better. I wouldn't be afraid to lose some productivity for months if it meant acquiring new skills for the future.

                                                                      • WhyNotHugo a day ago

                                                                        An interesting little detail. Any seasoned developer is likely going to take substantially longer if they have to use any IDE except their everyday one.

                                                                         I've been using Vim/Neovim for over a decade. I'm sure if I wanted to use something like Cursor, it would take me at least a month before I could be productive at even a fraction of my usual level.

                                                                        • bagacrap 13 hours ago

                                                                          I recently switched from vim (16 years) to vscode and perceived my productivity to be about the same after one week.

                                                                          No objective measurements here; it might have even increased. But either way, "a month to regain a fraction of productivity" is extreme hyperbole, for me at least.

                                                                        • Art9681 a day ago

                                                                          This is exactly my same take. Any tool an engineer is inexperienced with will slow them down. AI is no different.

                                                                          • bluefirebrand a day ago

                                                                            This runs counter to the starry eyed promises of AI letting people with no experience accomplish things

                                                                            • TeMPOraL a day ago

                                                                              That promise is true, though, and the two claims are not opposite. The devil is in details, specifically in what you mean by "people" and "accomplish things".

                                                                              If by "people" you mean "general public", and by "accomplish things" you mean solving some immediate problems, that may or may not involve authoring a script or even a small app - then yes, this is already happening, and is a big reason behind the AI hype as it is.

                                                                              If by "people" you mean "experienced software engineers", and by "accomplish things" you mean meaningful contributions to a large software product, measured by high internal code and process quality standards, then no - AI tools may not help with that directly, though chances are greater when you have enough experience with those tools to reliably give them right context and steer away from failure modes.

                                                                              Still, solving one-off problems != incremental improvements to a large system.

                                                                              • bluefirebrand a day ago

                                                                                > If by "people" you mean "experienced software engineers",

                                                                                My post is a single sentence and I literally wrote "people with no experience"

                                                                                • helloplanets a day ago

                                                                                  He addressed your point in the paragraph before that. The paragraph from which you quoted was meant to show the difference between your point and the fact that the original research was indeed measuring software engineers.

                                                                                  • bluefirebrand a day ago

                                                                                    My point is that I was very clear about what people I was referring to.

                                                                                    No need for all the "if by people you mean" rigamarole

                                                                                    • ben_w a day ago

                                                                                      Then your previous point is false, because "X helps Y" doesn't run counter to any promise that "X helps Z".

                                                                                      You said the second. You responded to the first.

                                                                                      Y = [experts]

                                                                                      Z = [noobs]

                                                                                      {Y, Z} ⊆ [all humans]

                                                                              • jonfw a day ago

                                                                                  AI lets people with no experience accomplish things. People who have experience can create those things without AI. Those experienced folks will likely outperform novices, even when novices leverage AI.

                                                                                  None of these statements are controversial. What we have to establish is: does the experienced AI builder outperform the experienced manual coder?

                                                                            • yomismoaqui a day ago

                                                                               Someone on X said that these agentic AI tools (Claude Code, Amp, Gemini CLI) are to programming what the table saw was to hand-made woodworking.

                                                                               It can make some things faster and better than a human with a saw, but you have to learn how to use them right (or you will lose some fingers).

                                                                               I personally find that agentic AI tools make me more ambitious in my projects; I can tackle things I wouldn't have thought about doing before. And I also delegate work that I don't like to them, because they are going to do it better and quicker than me. So my mind is free to think about the real problems, like architecture and the technical-debt balance of my code...

                                                                               The problem is that there is the temptation to let the AI agent do everything and just commit the result without understanding YOUR code (yes, it was generated by an AI, but if you sign the commit, YOU are responsible for that code).

                                                                              So as with any tool try to take the time to understand how to better use it and see if it works for you.

                                                                              • candiddevmike a day ago

                                                                                > to programming like the table saw was to hand-made woodworking

                                                                                This is a ridiculous comparison because the table saw is a precision tool (compared to manual woodworking) when agentic AI is anything but IMO.

                                                                                • marcellus23 a day ago

                                                                                  The nature of the comparison is in the second paragraph. It's nothing to do with how precise it is.

                                                                                • bgwalter a day ago

                                                                                  "You are using it wrong!"

                                                                                  This is insulting to all pre-2023 open source developers, who produced the entire stack that the "AI" robber barons use in their companies.

                                                                                  It is even more insulting because no actual software of value has been demonstrably produced using "AI".

                                                                                  • yomismoaqui a day ago

                                                                                    > It is even more insulting because no actual software of value has been demonstrably produced using "AI".

                                                                                    Claude Code and Amp (equivalent from Sourcegraph) are created by humans using these same tools to add new features and fix bugs.

                                                                                     Having used both tools for some weeks, I can tell you that they provide great value to me - enough that I see paying $100 monthly as a bargain relative to that value.

                                                                                    Edit: typo

                                                                                    • jdiff a day ago

                                                                                      GP is pointing out the distinct lack of AI driven development in the wild. At this point, agents should be visibly maintaining at least a few popular codebases across this world wide web. The fact that there aren't raises some eyebrows for the claims that are regularly made by proponents. Not just the breathless proponents, either. Even taking claims very conservatively, FOSS maintainer burnout should be a thing of the past, but the only noted interaction with AI seems to be amplifying it.

                                                                                      • yomismoaqui a day ago

                                                                                         It's disingenuous to expect tools that have been publicly available for less than a year to have massive adoption in the wild.

                                                                                        Think that these were internal tools that provided value to engineers on Anthropic, OpenAI, Google & others and now are starting to be adopted by the general public.

                                                                                         Some people are overhyped, and some seem hurt because - I don't know - maybe they define themselves by their ability to write code by hand.

                                                                                        I have no horse in this race and I can only tell you about my experience and I can tell you that the change is coming.

                                                                                        Also if you don't trust a random HN nickname go read about the experiences of people like Armin Ronacher (Flask creator), Steve Yegge or Thomas H. Ptacek.

                                                                                         - https://lucumr.pocoo.org/2025/6/4/changes/
                                                                                         - https://sourcegraph.com/blog/the-brute-squad
                                                                                         - https://fly.io/blog/youre-all-nuts/

                                                                                        • jdiff 18 hours ago

                                                                                          I'm not asking for massive adoption. I'm asking for public-facing evidence of what many claim privately: that they have evolved their job into managing agents and reviewing, rather than writing anything themselves.

                                                                                          Again, not massive adoption, just one production codebase with this property. If it's such a productivity boost, there has to be at least one public-facing project that has done what the random HN nicknames and non-random named individuals say they've done.

                                                                                          • asadotzler 21 hours ago

                                                                                            >It's disingenuous to expect that tools that are publicly available for less than a year have a massive adoption in the wild.

                                                                                            GitHub got massive adoption in a year: probably 100K developers and tens of thousands of projects, including big names like Ruby on Rails.

                                                                                            I'm sure if I spent more than 2 minutes on this I'd have even more examples, but this one is enough to neuter your claims.

                                                                                  • antimora a day ago

                                                                                    I'm one of the regular code reviewers for Burn (a deep learning framework in Rust). I recently had to close a PR because the submitter's bug fix was clearly written entirely by an AI agent. The "fix" simply muted an error instead of addressing the root cause. This is exactly what AI tends to do when it can't identify the actual problem. The code was unnecessarily verbose and even included tests for muting the error. Based on the person's profile, I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.
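
                                                                                    To make the pattern concrete for readers who haven't run into it: a minimal sketch of what such a "fix" tends to look like (hypothetical Python for brevity, not the actual Rust from the Burn PR; all names here are invented).

                                                                                    ```python
                                                                                    class FormatError(Exception):
                                                                                        """Raised when an input buffer fails validation."""

                                                                                    def parse(data: bytes) -> int:
                                                                                        if not data.startswith(b"MAGIC"):
                                                                                            # The real bug is whatever produced the bad data upstream.
                                                                                            raise FormatError("bad header")
                                                                                        return len(data)

                                                                                    # The kind of "fix" described above: mute the error instead of
                                                                                    # chasing down why bad data reaches this point...
                                                                                    def parse_muted(data: bytes) -> int:
                                                                                        try:
                                                                                            return parse(data)
                                                                                        except FormatError:
                                                                                            return 0  # error silenced; root cause untouched

                                                                                    # ...complete with a test that locks the silencing in place.
                                                                                    def test_parse_muted_does_not_raise():
                                                                                        assert parse_muted(b"garbage") == 0
                                                                                    ```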

                                                                                    • dawnerd a day ago

                                                                                      That's what I love about LLMs. You can spot that it doesn't know the answer, tell it that it's wrong, and it'll go, "You're absolutely right. Let me actually fix it"

                                                                                      It scares me how much code is being produced by people without enough experience to spot issues, or by people who just gave up caring. We're going to be in for a wild ride when all the exploits start flowing.

                                                                                      • cogman10 a day ago

                                                                                        My favorite LLM moment. I wrote some code, asked the LLM "Find any bugs or problems with this code", and of course what it did was hyperfocus on an out-of-date comment (that I didn't write). Since the problem identified in the comment no longer existed, the LLM just spat out like 100 lines of garbage to refactor the code.

                                                                                        • rectang a day ago

                                                                                          > "You're absolutely right."

                                                                                          I admit a tendency to anthropomorphize the LLM and get irritated by this quirk of language, although it's not bad enough to prevent me from leveraging the LLM to its fullest.

                                                                                          The key when acknowledging fault is to show your sincerity through actual effort. For technical problems, that means demonstrating that you have worked to analyze the issue, take corrective action, and verify the solution.

                                                                                          But of course current LLMs are weak at understanding, so they can't pull that off. I wish that the LLM could say "I don't know", but apparently the current tech can't know that it doesn't know.

                                                                                          And so, as the LLM flails over and over, it shamelessly kisses ass and bullshits you about the work it's doing.

                                                                                          I figure that this quirk of LLMs will be minimized in the near future by tweaking the language to be slightly less obsequious. Improved modeling and acknowledging uncertainty will be a heavier lift.

                                                                                          • daxfohl a day ago

                                                                                            It'd be nice if GitHub had a feature that updated the issue with this context automatically too, so that if this agent gives up and closes the PR, the next agent doesn't go and do the exact same thing.

                                                                                            • candiddevmike a day ago

                                                                                              > tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

                                                                                              ...and then it still doesn't actually fix it

                                                                                              • mlyle a day ago

                                                                                                So, I've recently done my first couple of heavily AI-augmented tasks for hobby projects.

                                                                                                I wrote a TON of LVGL code. The result wasn’t perfect for placement, but when I iterated a couple of times, it fixed almost all of the issues. The result is a little hacked together but a bit better than my typical first pass writing UI code. I think this saved me a factor of 10 in time. Next I am going to see how much of the cleanup and factoring of the pile of code it can do.

                                                                                                Next I had it write a bunch of low level code to init hardware. It saved me a little time compared to reading the reference manual, and was more pleasant, but it wasn’t perfectly correct. If I did not have domain expertise I would not have been able to complete the task with the LLM.

                                                                                                • la_fayette a day ago

                                                                                                  When you say it saved you time by a factor of 10, did you actually measure that properly? I also initially had the feeling that LLMs saved me time, but in the end they didn't. I roughly compared my performance to past performance by the number of stories completed, and LLMs made me slower even though I thought I was saving time...

                                                                                                  From several months of deep work with LLMs, I think they are amazing pattern matchers, but not problem solvers. They suggest a solution pattern based on their trained weights. This can even yield real solutions, e.g. when programming Tetris or so, but not when working on somewhat unique problems...

                                                                                                  • mlyle a day ago

                                                                                                    I am pretty confident. The last similar LVGL thing I did took me 10-12 hours, and I had a quicker iteration loop (running locally instead of on the test hardware). Here I spent a little more than an hour, testing on real hardware, and the last 20 minutes was nitpicking.

                                                                                                    Writing front-end display code and instantiating components to look right is very much playing to the model’s strength, though. A carefully written sentence plus context would become 40 lines of detail-dense but formulaic code.

                                                                                                    (I have also had a lot of luck asking it to make a first pass at typesetting things in TeX, too, for similar reasons.)

                                                                                                    • delusional a day ago

                                                                                                      There was a recent study that found that LLM users in general tend to feel like they were more productive with AI while actually being less productive.

                                                                                                      • asadotzler 21 hours ago

                                                                                                        presumably the study this very HN discussion responds to.

                                                                                                        • delusional 19 hours ago

                                                                                                          Heh, yep. Guess I sometimes forget to read the content before commenting too.

                                                                                                    • stavros a day ago

                                                                                                      > If I did not have domain expertise I would not have been able to complete the task with the LLM.

                                                                                                      This kind of sums up my experience with LLMs too. They save me a lot of time reading documentation, but I need to review a lot of what they write, or it will just become too brittle and verbose.

                                                                                                    • Retr0id a day ago

                                                                                                      I was trying out Copilot recently for something trivial. It made the change as requested, but also added a comment that stated something obvious.

                                                                                                      I asked it to remove the comment, which it enthusiastically agreed to, and then... didn't. I couldn't tell if it was the LLM being dense or just a bug in Copilot's implementation.

                                                                                                      • seunosewa a day ago

                                                                                                        Some prompts can help:

                                                                                                        "Find the root cause of this problem and explain it"

                                                                                                        "Explain why the previous fix didn't work."

                                                                                                        Often, it's best to undo the action and provide more context/tips.

                                                                                                        Often, switching to Gemini 2.5 Pro when Claude is stumped helps a lot.

                                                                                                        • brazzy a day ago

                                                                                                          My favourite recent experience was watching it switch multiple times between using a library function and rolling its own implementation, each time claiming that it was "simplifying" the code and making it "more reliable".

                                                                                                          • colechristensen a day ago

                                                                                                            Sometimes it does... sometimes.

                                                                                                            I recently had a nice conversation looking for some reading suggestions from an LLM. The first round of suggestions were superb, some of them I'd already read, some were entirely new and turned out great. Maybe a dozen or so great suggestions. Then it was like squeezing blood from a stone but I did get a few more. After that it was like talking to a babbling idiot. Repeating the same suggestions over and over, failing to listen to instructions, and generally just being useless.

                                                                                                            LLMs are great on the first pass, but the further you get from it, the more they degrade into uselessness.

                                                                                                            • aquariusDue a day ago

                                                                                                              Yeah, when I first heard about "one-shotting" it felt more like a trick than a useful heuristic, but with time my experience has come to mimic yours; nowadays I try to one-shot small-ish changes instead of going back and forth.

                                                                                                              • daxfohl a day ago

                                                                                                                I've had some luck in these cases prompting "your context seems to be getting too bloated. summarize this conversation into a prompt that I can feed into a new chat with a fresh context. make sure to include <...>".

                                                                                                                Sometimes it works well the first time, and sometimes it spits out a summary where you can see what it is confused about, and you can guide it to create a better summary. Sometimes just having that summary in its context gets it over the hump and you can just say "actually I'm going to continue with you; please reference this summary going forward", and sometimes you actually do have to restart the LLM with the new context. And of course sometimes there's nothing that works at all.

                                                                                                                • dawnerd a day ago

                                                                                                                  I've had really good luck with having GPT generate a todo list that's very, very detailed, then having Claude use it to check items off. Still far from perfect, but since doing that I haven't run into context issues, since I can just start a new chat and feed it the todo (the todo also contains project info).

                                                                                                          • colechristensen a day ago

                                                                                                            I also get things like this from very experienced engineers working outside their area of expertise. The suggestions are obviously less completely boneheaded, but it's still a person doing exactly the wrong thing an AI suggested, and it required someone to step in and correct it.

                                                                                                          • Macha a day ago

                                                                                                            I recently reviewed an MR from a coworker. There was a test that was clearly written by AI, except, however he prompted it, it gave some rather poor variable names like "thing1", "thing2", etc. in the test cases. Basically, these were multiple permutations of data that all needed to be represented in the result set. So I asked for them to be named distinctively, maybe by what makes them special.

                                                                                                            It's clear he just took that feedback and asked the AI to make the change, and it came up with a change that gave them all very long, very unique names that just listed all the unique properties of each test case, to the point that they became noise.

                                                                                                            It's clear writing the PR was very fast for that developer; I'm sure they felt they were X times faster than writing it themselves. But this isn't a good outcome for the tool either. And I'm sure that if they'd reviewed it to the extent I did, a lot of that gained time would have dissipated.

                                                                                                            • meindnoch a day ago

                                                                                                              >a deep learning framework in Rust [...] This is becoming a troubling trend with AI tools.

                                                                                                              The serpent is devouring its own tail.

                                                                                                              • TeMPOraL a day ago

                                                                                                                OTOH, when they start getting good AI contributions... it'll be too late for us all.

                                                                                                                • LoganDark a day ago

                                                                                                                  Deep learning can be incredibly cool and not just used for AI slop.

                                                                                                                • jampa a day ago

                                                                                                                  > I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

                                                                                                                  It has been for a while; AI just makes the spam more effective:

                                                                                                                  https://news.ycombinator.com/item?id=24643894

                                                                                                                  • pennomi a day ago

                                                                                                                    This is the most frustrating thing LLMs do. They put broad try/catch blocks around the code, making it impossible to actually track down the source of a problem. I want my code to fail fast and HARD during development so I can solve every problem immediately.
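
                                                                                                                    A minimal sketch of the contrast (hypothetical Python, all names invented): the broad handler reduces any bug inside the loop to a one-line message with no traceback, while the fail-fast version crashes at the offending line.

                                                                                                                    ```python
                                                                                                                    def transform(record: dict) -> int:
                                                                                                                        return record["value"] * 2  # a KeyError here is the "real" bug

                                                                                                                    def process_swallowed(records: list[dict]) -> list[int]:
                                                                                                                        # LLM-style defensive wrapping: any bug in the loop
                                                                                                                        # is reduced to a print with no traceback.
                                                                                                                        try:
                                                                                                                            return [transform(r) for r in records]
                                                                                                                        except Exception as exc:
                                                                                                                            print(f"something went wrong: {exc}")
                                                                                                                            return []

                                                                                                                    def process_fail_fast(records: list[dict]) -> list[int]:
                                                                                                                        # No handler: a bad record raises immediately with a
                                                                                                                        # full traceback, which is what you want in development.
                                                                                                                        return [transform(r) for r in records]
                                                                                                                    ```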

                                                                                                                    • daxfohl a day ago

                                                                                                                      Seems like there's a need for GitHub to create a separate flow for AI-created PRs. Project maintainers should be able to stipulate rules like this in English, and an AI "pre-reviewer" would check that the AI has followed all these rules before the PR is created, and chat with the AI submitter to resolve any violations. For exceptional cases, a human submitter is required.

                                                                                                                      Granted, the compute required is probably more expensive than GitHub would offer for free, and IDK whether it'd be within budget for many open-source projects.

                                                                                                                      Also granted, something like this may be useful for human-sourced PRs as well, though perhaps post-submission, so that maintainers can see them and provide some manual assistance if desired. (And also granted, in some cases maintainers might want to provide manual assistance to AI submissions too, but I expect initial triaging based on whether the submitter is a human or an AI is what makes sense in most cases.)
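
                                                                                                                      A rough sketch of what such a pre-review gate might look like (entirely hypothetical Python; `ask_llm` is a placeholder for whatever model backend would do the checking, not a real API):

                                                                                                                      ```python
                                                                                                                      MAINTAINER_RULES = [
                                                                                                                          "Fixes must address the root cause, not suppress the error.",
                                                                                                                          "No new broad try/catch blocks around existing logic.",
                                                                                                                          "Tests must assert desired behavior, not merely the absence of errors.",
                                                                                                                      ]

                                                                                                                      def ask_llm(question: str) -> bool:
                                                                                                                          # Placeholder for some LLM backend -- not a real API.
                                                                                                                          raise NotImplementedError

                                                                                                                      def pre_review(diff: str) -> list[str]:
                                                                                                                          """Return the maintainer rules this diff appears to violate."""
                                                                                                                          return [
                                                                                                                              rule
                                                                                                                              for rule in MAINTAINER_RULES
                                                                                                                              if not ask_llm(f"Does this diff comply with the rule {rule!r}?\n\n{diff}")
                                                                                                                          ]

                                                                                                                      # The PR would only be opened once pre_review(diff) comes back empty;
                                                                                                                      # otherwise the violations go back to the submitting agent to resolve.
                                                                                                                      ```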

                                                                                                                      • kfajdsl 21 hours ago

                                                                                                                        This is my number one complaint with LLM-produced code too. The worst thing is when it swallows an error in order to print its own error message with far less info and no traceback.

                                                                                                                        In my rules I tell it that try/catch blocks are completely banned unless I explicitly ask for one (an okay tradeoff, since usually my error boundaries are pretty wide and I know where I want them). I know the context length is getting too long when it starts ignoring that.

                                                                                                                      • 0xbadcafebee a day ago

                                                                                                                        > The "fix" simply muted an error instead of addressing the root cause.

                                                                                                                        FWIW, I have seen human developers do this countless times. In fact there are many people in engineering that will argue for these kinds of "fixes" by default. Usually it's in closed-source projects where the shittiness is hidden from the world, but trust me, it's common.

                                                                                                                        > I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

                                                                                                                        There was already a problem (pre-AI) with shitty PRs on GitHub made to try to game a system. Regardless of how they made the change, the underlying problem is a policy one: how to deal with people making shitty changes for ulterior motives. I expect the solution is actually more AI to detect shitty changes from suspicious submitters.

                                                                                                                        Another solution (that I know nobody's going to go for): stop using GitHub. Back in the "olden times", we just had CVS, mailing lists and patches. You had to perform some effort in order to get to the point of getting the change done and merged, and it was not necessarily obvious afterward that you had contributed. This would probably stop 99% of people who are hoping for a quick change to boost their profile.

                                                                                                                        • nerdjon a day ago

                                                                                                                          I will never forget being in a code review for an upcoming release. There was a method that was... different. Like massively different, with no good reason why it had changed as much as it did for such a small addition.

                                                                                                                          We asked the person why they made the change, and "silence". They had no reason. It became painfully clear that all they did was copy and paste the method into an LLM and say "add this thing" and it spit out a completely redone method.

                                                                                                                          So now we had a change that no one in the company actually understood, just because the developer took a shortcut. (This change was rejected and reverted.)

                                                                                                                          The scariest thing to me is no one actually knowing what code is running anymore, with these models having a tendency to make change for the sake of making change (and likely not actually addressing the root issue but taking a shortcut, like you mentioned).

                                                                                                                          • tomrod a day ago

                                                                                                                            As a side question: I work in AI, but mostly on Python and theory work. How can I best jump into Burn? Rust has been intriguing to me for a long time.

                                                                                                                            • lvl155 a day ago

                                                                                                                              This is a real problem that’s only going to get worse. With the major model providers basically keeping all the data themselves, I frankly don’t like this trend long term.

                                                                                                                              • doug_durham a day ago

                                                                                                                                You should be rejecting the PR because the fix was insufficient, not because it was written by an AI agent. Bad code is bad code regardless of the source. I think the fixation on how the code was generated is not productive.

                                                                                                                                • glitchc a day ago

                                                                                                                                  No, that's not how code review works. Getting inside the mind of the developer, understanding how they thought about the fix, is critical to the review process.

                                                                                                                                  If an actual developer wrote this code and submitted it willingly, it would either constitute malice, an attempt to sabotage the codebase or inject a trojan, or stupidity, for failing to understand the purpose of the error message. With an LLM we mostly have stupidity. Flagging it as such reveals the source of the stupidity, as LLMs do not actually understand anything.

                                                                                                                                  • RobinL a day ago

                                                                                                                                    The problem is that code often takes as long to review as to write, and AI potentially lowers the quality bar for pull requests. So maintainers face a flood of low-quality PRs that take time to reject.

                                                                                                                                    • rustyminnow a day ago

                                                                                                                                      > You should be rejecting the PR because the fix was insufficient

                                                                                                                                      I mean, they probably could've articulated it your way, but I think that's basically what they did... they point out the insufficient "fix" later, but the root cause of the "fix" was blind trust in AI output, so that's the part of the story they lead with.

                                                                                                                                  • andix a day ago

                                                                                                                                    What I noticed: AI development constantly breaks my flow. It makes me more tired, and I work for shorter time periods on coding.

                                                                                                                                    It's a myth that you can code a whole day long. I usually do intervals of 1-3 hours of coding, with some breaks in between. Procrastination can even happen on work-related things, like reading other project members' code/changes for an hour. It has a benefit to some extent, but during this time I don't get my work done.

                                                                                                                                    Agentic AI works the best for me. Small refactoring tasks on a selected code snippet can be helpful, but aren't a huge time saver. The worst are AI code completions (original Copilot-style); they are much more noise than help.

                                                                                                                                    • rightbyte 21 hours ago

                                                                                                                                      It would be interesting to record what one does in a day at the desk. Probably quite depressing to watch.

                                                                                                                                      Like, I think 1h would be stretching it for mature codebases.

                                                                                                                                      • andix 21 hours ago

                                                                                                                                        The 1h I'm talking about is not all the time I might spend reading code. It's the time I might procrastinate on my tasks by reading unrelated code.

                                                                                                                                        Like doom scrolling on social media: Let's see what the fancy new guy got done this week. I need to feel better, I'm just going to look at the commits of the guy in the other team that always breaks production. Let's see how close he got to that recently, ...

                                                                                                                                    • lsy a day ago

                                                                                                                                      Typically, debugging e.g. a tricky race condition in an unfamiliar codebase would require adding logging, refactoring library calls, inspecting existing logs, and even rewriting parts of your program to be more modular or understandable. This is part of the theory-building.

                                                                                                                                      When you have an AI that says "here is the race condition and here is the code change to make to fix it", that might be "faster" in the immediate sense, but it means you aren't understanding the program better or making it easier for anyone else to understand. There is also the question of whether this process is sustainable: does an AI-edited program eventually fall so far outside what is "normal" for a program that the AI becomes unable to model correct responses?

                                                                                                                                      • sodapopcan a day ago

                                                                                                                                        This is always my thought whenever I hear "AI let me build a feature in a codebase I didn't know, in a language I didn't know" (which is often; there's at least one in these comments). Great, but what have you learned? This is fine for small contributions, I guess, but I don't hear a lot of stories of long-term maintenance. Unpopular opinion, though, I know.

                                                                                                                                        • threetonesun a day ago

                                                                                                                                          I guess it's a question of how anyone learns. There's some value in typing code, I suppose, but with tab complete that's been gone for a long time. Letting AI write something and then reading it seems as good as copying and pasting from some other source.

                                                                                                                                          • sodapopcan a day ago

                                                                                                                                            I'm not super qualified to answer as I haven't gone deep into AI at all. But from my limited observations I'd say yes and no. You generally aren't copy/pasting entire features, just snippets that you yourself have to string together in a sensible way. Of course there are lots of people who still do this, and that's why I find most people in this industry infuriating to work with. It's all good when it's boilerplate, and that's actually my primary use of "AI": it's essentially been a snippets replacement (and is quite good at that).

                                                                                                                                      • doc_manhat a day ago

                                                                                                                                        I directionally disagree with this:

                                                                                                                                        > It's common for engineers to end up working on projects which they don't have an accurate mental model of. Projects built by people who have long since left the company for pastures new. It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work.

                                                                                                                                        Reason: you cannot evaluate the work accurately if you have no mental model. If there's a bug given the system's unwritten assumptions, you may not catch it.

                                                                                                                                        Having said that, it also depends on how important writing bug-free code is in the given domain, I guess.

                                                                                                                                        I like AI particularly for greenfield stuff and one-off scripts, as it lets you go faster there. Basically you build up the mental model as you're coding with the AI.

                                                                                                                                        Not sure whether this breaks down at a certain codebase size, though.

                                                                                                                                        • horsawlarway a day ago

                                                                                                                                          Just anecdotally - I think your reason for disagreeing is a valid statement, but not a valid counterpoint to the argument being made.

                                                                                                                                          So

                                                                                                                                          > Reason: you cannot evaluate the work accurately if you have no mental model. If there's a bug given the systems unwritten assumptions you may not catch it.

                                                                                                                                          This is completely correct. It's a very fair statement. The problem is that a developer coming into a large legacy project is in this spot regardless of the existence of AI.

                                                                                                                                          I've found that asking AI tools to generate a changeset in this case is actually a pretty solid way of starting to learn the mental model.

                                                                                                                                          I want to see where it tries to make changes, what files it wants to touch, what libraries and patterns it uses, etc.

                                                                                                                                          It's a poor man's proxy for having a subject-matter expert in the code give you pointers. But it doesn't take anyone else's time, and as long as you're not just trying to dump the output into a PR, it can actually be a pretty good resource.

                                                                                                                                          The key is not letting it dump out a lot of code; you want directional signaling instead.

                                                                                                                                          E.g. prompts like "Which files should I edit to implement a feature which does [detailed description of feature]?" or "Where is [specific functionality] implemented in this codebase?" have been real timesavers for me.

                                                                                                                                          The actual code generation has probably been a net time loss.

                                                                                                                                          • Roscius a day ago

                                                                                                                                            > I've found that asking AI tools to generate a changeset in this case is actually a pretty solid way of starting to learn the mental model.

                                                                                                                                            This. Leveraging the AI to start to develop the mental model is an advantage. But, using the AI is a non-trivial skill set that needs to be learned. Skepticism of what it's saying is important. AI can be really useful just like a 747 can be useful, but you don't want someone picked off the street at random flying it.

                                                                                                                                            • bluefirebrand a day ago

                                                                                                                                              > This. Leveraging the AI to start to develop the mental model is an advantage

                                                                                                                                              Is there any evidence that AI helps you build the mental model of an unfamiliar codebase more quickly?

                                                                                                                                              In my experience trying to use AI for this it often leads me into the weeds

                                                                                                                                            • doc_manhat a day ago

                                                                                                                                              Yeah, fair points. Particularly for larger codebases I could see this being a huge time saver.

                                                                                                                                          • piker a day ago

                                                                                                                                            My two main attempts at using an "agentic" coding workflow were trying to incorporate an Outlook COM interface into my Rust codebase, and trying to streamline an existing abstract Windows API interaction to avoid copying memory a couple of times. Both wasted tremendous amounts of time and were ultimately abandoned, leaving me only slightly more educated about Windows development. They make great autocompletion engines, but I just cannot see them being useful in my project otherwise.

                                                                                                                                            • jdiff a day ago

                                                                                                                                              They make great autocompletion engines, most of the time. It's nice when it can recognize that I'm replicating a specific math formula and expands out the next dozen lines for me. It's less nice when it predicts code that's not even syntactically valid for the language or the correct API for the library I'm using. Those times, for whatever reason, seem to be popping up a lot in the last few weeks so I find myself disabling those suggestions more often than not.

                                                                                                                                              • crinkly a day ago

                                                                                                                                                This is typically what I see when it's applied. And, as always, it's trying to hammer nails in with a banana.

                                                                                                                                                Rather than fit two generally disparate things together it’s probably better to just use VSTO and C# (hammer and nails) rather than some unholy combination no one else has tried or suffered through. When it goes wrong there’s more info to get you unstuck.

                                                                                                                                                • piker a day ago

                                                                                                                                                  To be fair though, unsafe Rust (where the COM lives) is basically just C, so I totally expected it to be tractable in the same way it has been tractable for the last 20-ish years. But it isn't.

                                                                                                                                                  Why is interacting with the OS's API from a compiled language the wrong approach in 2025? Why must I use this managed Frankenstein's monster of dotnet? I didn't want to ship, or expect users to have, a whole runtime for what should be a tiny convenience DLL. Insane.

                                                                                                                                                • charcircuit a day ago

                                                                                                                                                  I had the opposite experience. Gemini was able to work with COM and accomplish what I needed despite me never using COM before.

                                                                                                                                                  • tonyedgecombe a day ago

                                                                                                                                                    I've done a lot of work with COM over the years and that is the last technology I would trust to an AI. It's very easy to write COM code that appears to work but contains subtle bugs.

                                                                                                                                                    • piker a day ago

                                                                                                                                                      That was my issue. The integration works; Outlook itself, not so much, afterwards (i.e. a memory error).

                                                                                                                                                    • piker a day ago

                                                                                                                                                      Actually hadn’t tried Gemini with it yet. Perhaps worth taking a look.

                                                                                                                                                  • trey-jones a day ago

                                                                                                                                                    Doing my own post-mortem of a recent project (the first where I've leaned on "AI" tools to any extent), my feelings were the following:

                                                                                                                                                    1. It did not make me faster. I don't know that I expected it to.

                                                                                                                                                    2. It's very possible that it made me slower.

                                                                                                                                                    3. The quality of my work was better.

                                                                                                                                                    Slower and better are related here, because I used these tools more to either check ideas that I had for soundness, or to get some fresh ideas if I didn't have a good one. In many cases the workflow would be: "I don't like that idea, what else do you have for me?"

                                                                                                                                                    There were also instances of being led by my tools into a rabbit hole that I eventually just abandoned, so that also contributes to the slowness. This might happen in instances where I'm using "AI" to help cover areas that I'm less of an expert in (and these were great learning experiences). In my areas of expertise, it was much more likely that I would refine my ideas, or the "AI" tool's ideas into something that I was ultimately very pleased with, hence the improved quality.

                                                                                                                                                    Now, some people might think that speed is the only metric that matters, and certainly it's harder to quantify quality - but it definitely felt worth it to me.

                                                                                                                                                    • jpc0 a day ago

                                                                                                                                                      I do this a lot and absolutely think it might even improve quality, and this is why I like the current crop of AIs that are more likely to be argumentative and not just capitulate.

                                                                                                                                                      I will ask the AI for an idea and then start blowing holes in its idea, or will ask it to do the same for my idea.

                                                                                                                                                      And I might end up not going with its idea regardless, but it got me thinking about things I wouldn't have thought about.

                                                                                                                                                      Effectively it's like chatting with a coworker who has a reasonable idea about the domain and can bounce ideas around.

                                                                                                                                                      • trey-jones a day ago

                                                                                                                                                        I'm on record saying it's "like the smartest coworker I've ever had" (no offense).

                                                                                                                                                    • uludag a day ago

                                                                                                                                                      Great article, and I was having very similar thoughts with regard to this productivity study and the "Programming as Theory Building" paper. I'm starting to be convinced that if you are the original author of a program and still have the program's context in your head, you are the asymptote that any and all AI systems will approach but never surpass: maybe not in terms of raw coding speed, but in terms of understanding the program, its vision of development, its deficiencies and hacks, its context, its users and what they want, the broader culture the program exists in, etc.

                                                                                                                                                      I really like how the author then brought up the point that for most daily work we don't have the theory built, even a small fraction of it, and that this may or may not change the equation.

                                                                                                                                                      • conartist6 a day ago

                                                                                                                                                        Thanks, <3

                                                                                                                                                      • sltr 3 hours ago

                                                                                                                                                        A couple of months ago I put forth Naur's program theory as an argument for why LLMs can't replace human developers:

                                                                                                                                                        > LLMs as they currently exist cannot master a theory, design, or mental construct because they don't remember beyond their context window. Only humans can gain and retain program theory.

                                                                                                                                                        https://news.ycombinator.com/item?id=44114631

                                                                                                                                                        • neuroelectron a day ago

                                                                                                                                                          Good article, and it makes sense. I wish that at some point in my career I had worked on a codebase that could be understood without 10 years of experience. Instead, most of my development time was spent tracing execution paths through tangles of abstractions in nested objects in 10M-LOC legacy codebases. My buddy who introduced me to the job is still doing it today and now uses AI, and this has given him the free time to start working on his own side projects. So there are certain types of jobs where AI will certainly speed up your development.

                                                                                                                                                          • hakfoo 13 hours ago

                                                                                                                                                            My two cents:

                                                                                                                                                            My experience with AI is that it's workable for "approximate" things, but it's frustratingly difficult to use as a precision tool.

                                                                                                                                                            It works great for the trivial demos, where you say "here's an API, build a client" without significant constraints, because that use case is a pretty wide, vague goal. I wasn't going to hold it accountable for matching existing corporate branding, code style, or how to use storage efficiently, so it can work fine.

                                                                                                                                                            But most of the real work is in the "precision tool" space. You aren't building that many blank-slate API clients, many of the actual tickets are "flip bit 29 of data structure XQ33 when it's a married taxpayer filing singly and huckleberries are in season". The actual change is 3 lines of code, and the effort is in thinking and understanding the problem (and the hundreds of lines of misdocumented code surrounding the problem).

                                                                                                                                            I've had Claude decide it wanted to refactor a bunch of unrelated code after I asked for a minor, specific change. Or the classic "here's 2000 lines of code that solve the problem in a highly Enterprise way, when the real developer would look at the problem and spit out 150 lines of actual functionality". You can either spend 30 minutes writing the prompt to do the specific precision thing you want and only that, or you can just write the fix directly.
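
                                                                                                                                            For concreteness, a toy version of that kind of "precision" ticket (every name is invented, echoing the made-up example above). The eventual change is three lines; the work is understanding everything around them.

                                                                                                                                            ```python
                                                                                                                                            # Bit 29 of the hypothetical XQ33 flag word.
                                                                                                                                            MARRIED_FILING_SINGLY = 1 << 29

                                                                                                                                            def update_xq33_flags(flags: int, filing_status: str,
                                                                                                                                                                  huckleberries_in_season: bool) -> int:
                                                                                                                                                # The actual three-line fix, once you understand the surrounding code:
                                                                                                                                                if filing_status == "married_filing_singly" and huckleberries_in_season:
                                                                                                                                                    flags |= MARRIED_FILING_SINGLY
                                                                                                                                                return flags
                                                                                                                                            ```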

                                                                                                                                                            • kazinator 5 hours ago

                                                                                                                                                              > Interestingly the developers predict that AI will make them faster, and continue to believe that it did make them faster, even after completing the task slower than they otherwise would!

                                                                                                                                                              It's like a form of gambling in which you don't have a simple indicator that you're going broke. The addiction is the same though.

                                                                                                                                                              • omnicognate a day ago

                                                                                                                                                                All these studies that show "AI makes developers x% more/less productive" are predicated on the idea that developer "productivity" can be usefully captured in a single objectively measurable number.

                                                                                                                                                                Just one problem with that...

                                                                                                                                                                • narush a day ago

                                                                                                                                                                  Thanks for the feedback! I strongly agree this is not the only measure of developer productivity -- but it's certainly one of them. I think this measure speaks very directly to how _many_ developers (myself included) currently understand the impact of AI tools on their own work (e.g. just speeding up implementation speed).

                                                                                                                                                                  (The SPACE [1] framework is a pretty good overview of considerations here; I agree with a lot of it, although I'll note that METR [2] has different motivations for studying developer productivity than Microsoft does.)

                                                                                                                                                                  [1] https://dl.acm.org/doi/10.1145/3454122.3454124

                                                                                                                                                                  [2] https://metr.org/about

                                                                                                                                                                  • charcircuit a day ago

                                                                                                                                                                    As long as the true productivity is correlated with that number it should be fine.

                                                                                                                                                                  • stevekrouse 21 hours ago

                                                                                                                                                                    Such a great essay! Peter Naur's thesis is also the central point in my talk about vibe coding from last month: https://www.youtube.com/watch?v=1WC8dxMC4Xw

                                                                                                                                                                    I'm spending an inordinate amount of time turning that video into an essay, but I feel like I'm being scooped already, so here's my current draft in case anyone wants to get a sneak preview: https://valdottown--89ed76076a6544019f981f7d4397d736.web.val...

                                                                                                                                                                    Feedback appreciated :)

                                                                                                                                                                      • joshmarlow a day ago

                                                                                                                                                                        I've gotten some pretty cool things working with LLMs doing most of the heavy lifting using the following approaches:

                                                                                                                                                                        * Spec out project goals and relevant context in a README, and spec out all components; have the AI build out each component and compose them. I understand the high level but don't necessarily know all of the low-level details. This is particularly helpful when I'm not deeply familiar with some of the underlying technologies/libraries.

                                                                                                                                                                        * Have the AI write tests for code that I've verified is working. As we all know, testing is tedious -- so of course I want to automate it. And well-written tests (for well-written code) can be pretty easy to review; see the sketch below.
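
                                                                                                                                                                        As a concrete example of that second point (hypothetical, but the kind of AI-written tests that are quick to eyeball -- here against the standard library's os.path.splitext):

                                                                                                                                                                            # test_splitext.py -- AI-drafted tests for behavior already verified by hand
                                                                                                                                                                            import os.path

                                                                                                                                                                            def test_simple_extension():
                                                                                                                                                                                assert os.path.splitext("report.pdf") == ("report", ".pdf")

                                                                                                                                                                            def test_no_extension():
                                                                                                                                                                                assert os.path.splitext("README") == ("README", "")

                                                                                                                                                                            def test_dotfile_is_not_an_extension():
                                                                                                                                                                                assert os.path.splitext(".bashrc") == (".bashrc", "")

                                                                                                                                                                        Each test asserts one already-verified behavior, so reviewing it is just checking the expectations against what you know the code does.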

                                                                                                                                                                        • Kim_Bruning a day ago

                                                                                                                                                                          I think different people use these tools differently. I've got mine set up to start in "rubber duck" mode, where I do rubber duck programming, before asking the AI to help me with certain tasks (if at all). Low impact utility scripts? The AI gets let off the leash. Critical core logic? I might do most of the work myself (though having a rubber duck can still be good!)

                                                                                                                                                                          • i_love_retros a day ago

                                                                                                                                                                            The AI hype will die off just like blockchain and web3. LLMs are a solution in search of a problem.

                                                                                                                                                                            All the VCs are gonna lose a ton of money! OpenAI will be NopenAI, relegated to the dustbin of history.

                                                                                                                                                                            We never asked for this, nobody wants it.

                                                                                                                                                                            Companies using AI and promoting it in their products will be seen as tacky and cheap. Just like developers and artists that use it.

                                                                                                                                                                            • remorses 21 hours ago

                                                                                                                                                                              Using AI agents productively requires setting up a repository for collaboration: it means writing docs and making the build process easy and fast.

                                                                                                                                                                              Like any other tool, AI is slow to adopt but has huge gains later on.

                                                                                                                                                                              • ringeryless a day ago

                                                                                                                                                                                Not to mention the annoyance of AI-assisted issues being opened, many of them incorrect due to hallucinations. These tickets hammer human teams with nonsense and suck resources away from real issues.

                                                                                                                                                                                • wellpast a day ago

                                                                                                                                                                                  The fact that the devs thought the AI saved them time is no surprise to me… at least at this point in my career.

                                                                                                                                                                                  Developers (people?) in general for some reason just simply cannot see time. It’s why so many people don’t believe in estimation.

                                                                                                                                                                                  What I don’t understand is why. Is this like a general human brain limitation (like not being able to visualize four dimensions, or how some folks don’t have an internal monologue)?

                                                                                                                                                                                  Or is this more psychodynamic or emotional?

                                                                                                                                                                                  It’s been super clear and interesting to me how developers I work with want to believe AI (code generation) is saving them time when it clearly is not.

                                                                                                                                                                                  Is it just the hope that one day it will? Is it fetishization of AI?

                                                                                                                                                                                  Why in an industry that so requires clarity of thinking and expression (computer processors don’t like ambiguity), can we be so bad at talking about, thinking about… time?

                                                                                                                                                                                  Don’t get me started on the static type enthusiasts who think their strong type system (another seeming fetish) is saving them time.

                                                                                                                                                                                  • hartator a day ago

                                                                                                                                                                                    I am not super sure how quickly writing one-shot benchmark scripts slows anyone down, but okay.

                                                                                                                                                                                    • diamond559 a day ago

                                                                                                                                                                                      Measure twice cut once. Not cut 100 times and hope it does it right once.

                                                                                                                                                                                      • xyst a day ago

                                                                                                                                                                                        Not surprising. For me, use of LLMs has only been helpful in the initial exploration of unknown code bases or languages.

                                                                                                                                                                                        Using them beyond that is just more work: first parse the broken response, remove any useless junk, then have the model reprocess with an updated query.

                                                                                                                                                                                        It’s a nice tool to have (just as search engines gave us easy access to multiple sources/forums), but its limitations are well known. Trying to use it 100% as intended is a massive waste of time and resources (energy use…)

                                                                                                                                                                                        • afro88 a day ago

                                                                                                                                                                                          I said this when the linked paper was shared and got downvotes: it's based on early-2025 data. My point isn't that it should be completely up to date, but that we need to consider it in that context. This is pre Claude 4 and pre Claude Code. Pre Gemini 2.5, even. These models are such a big step up from what came previously.

                                                                                                                                                                                          Just like we put a (2023) on articles here so they are considered in the right context, so should this paper be. Blanket "AI tools slow down development" statements backed by "look, this rigorous paper says so!" ignore a key variable: the rate of effectiveness improvement. If said paper were run with the current models, the picture would be different. Also in 3 months' time. AI tools aren't a static thing that either works or doesn't work indefinitely.

                                                                                                                                                                                          • tonyedgecombe a day ago

                                                                                                                                                                                            >This is pre Claude 4, Claude Code. Pre Gemini 2.5 even.

                                                                                                                                                                                            The most interesting point from the article wasn't about how well the AIs worked; rather, it was the gap between people's perception and their actual results.

                                                                                                                                                                                          • gjsman-1000 a day ago

                                                                                                                                                                                            What I thought was fascinating, and should be a warning sign to everyone here:

                                                                                                                                                                                            Before beginning the study, the average developer expected about a 20% productivity boost.

                                                                                                                                                                                            After ending the study, the average developer (potentially: you) believed they actually were 20% more productive.

                                                                                                                                                                                            In reality, they were 0% more productive at best, and 40% less productive at worst.

                                                                                                                                                                                            Think about what it would be like to be that developer; off by 60% about your own output.

                                                                                                                                                                                            If you can't even gauge your own output without being 40% off on average, 60% off at worst, be cautious about strong opinions on anything in life. Especially politically.
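
                                                                                                                                                                                            To make that gap concrete, a back-of-the-envelope sketch using the figures above plus the paper's headline 19% slowdown as the point estimate:

                                                                                                                                                                                                # A task that would take 10 hours without AI, using the study's point estimate.
                                                                                                                                                                                                baseline_hours = 10.0
                                                                                                                                                                                                actual_with_ai = baseline_hours * 1.19           # measured: ~11.9 hours (19% slower)
                                                                                                                                                                                                perceived_with_ai = baseline_hours * (1 - 0.20)  # believed: ~8.0 hours (20% faster)
                                                                                                                                                                                                gap = (actual_with_ai - perceived_with_ai) / baseline_hours
                                                                                                                                                                                                print(f"Perception gap: {gap:.0%} of baseline")  # ~39 percentage points

                                                                                                                                                                                            That is roughly a 40-point gap between belief and measurement on the very same tasks.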

                                                                                                                                                                                            Edit 1: Also consider, quite terrifyingly, if said developers were in an online group, together, like... here. The one developer who said she thought it made everyone slower (the truth in this particular case), would be unanimously considered an idiot, downvoted to the full -4, even with the benefit of hindsight.

                                                                                                                                                                                            Edit 2: I suppose this goes to show, that even on Hacker News, where there are relatively high-IQ and self-aware individuals present... 95% of the crowd can still possibly be wildly delusional. Stick to your gut, regardless of the crowd, and regardless of who is in it.

                                                                                                                                                                                            • bluefirebrand a day ago

                                                                                                                                                                                              > Also consider, quite terrifyingly, if said developers were in an online group, together, like... here. The one developer who said she thought it made everyone slower (the truth in this particular case), would be unanimously considered an idiot, downvoted to the full -4, even with the benefit of hindsight

                                                                                                                                                                                              Yeah, this is me at my job right now. Every time I express even the mildest skepticism about the value of our Cursor subscription, I'm getting follow up conversations basically telling me to shut up about it

                                                                                                                                                                                              It's been very demoralizing. You're not allowed to question the Emperor's new clothes

                                                                                                                                                                                              • quantumHazer a day ago

                                                                                                                                                                                                This should really be the top comment. The problem is that these tools can really give us some value in certain kinds of areas, but they are not what they are marketed as.

                                                                                                                                                                                                  • pphysch a day ago

                                                                                                                                                                                                    Given how deadlines/timelines tend to (not) work in SWE, this is not surprising.

                                                                                                                                                                                                    • gjsman-1000 a day ago

                                                                                                                                                                                                      Perhaps; but this is a developer's own output with an AI tool, compared against their own historical output when they didn't use it. Apparently, the average developer (read: quite possibly most people here) can't even hit the broadside of a barn in estimating their own productivity.

                                                                                                                                                                                                      • dragonwriter a day ago

                                                                                                                                                                                                        This is generally a problem, and it was established as such before software development existed (the big thing people usually point to is a RAND Corporation study from the 1940s). It is the whole motivation for the Wideband Delphi estimation methods invented shortly afterwards (of which agile "planning poker" is simply a particular, more recent realization) for forward estimation, and it is why lean methods center on using a plan-do-check-act cycle for process improvements rather than seat-of-the-pants subjective feel.

                                                                                                                                                                                                        But despite the popularity of some of this (planning poker, particularly; PDCA for process improvements is sadly less popular) as ritual, those elements have become part of a cargo cult where almost no one remembers why we do it.

                                                                                                                                                                                                        • freedomben a day ago

                                                                                                                                                                                                          But this is still regarding forward estimating of future work, whereas GP is talking about gauging actual, past work done. The problems with forward estimation are indeed widely known, but I doubt most people realize that they are so bad at even knowing how productive they were.

                                                                                                                                                                                                        • sureglymop a day ago

                                                                                                                                                                                                          That doesn't surprise me at all. Isn't software engineering in essence about being constantly confronted with new problems and having to come up with a sufficient solution on the fly? That seems very hard to estimate, even if you know yourself well.

                                                                                                                                                                                                          • lupire a day ago

                                                                                                                                                                                                            They underestimated by 20% how long it took them to do a 1-8 hour task that they had just completed.

                                                                                                                                                                                                            It's like Tog's study showing that people think the keyboard is faster than the mouse even when they are measurably faster with the mouse. Because they are reporting how they feel, not what is actually happening.

                                                                                                                                                                                                            https://www.asktog.com/TOI/toi06KeyboardVMouse1.html

                                                                                                                                                                                                            • marcosdumay a day ago

                                                                                                                                                                                                              That is a very weird set of findings.

                                                                                                                                                                                                              This one in particular:

                                                                                                                                                                                                              > It takes two seconds to decide upon which special-function key to press.

                                                                                                                                                                                                              seems to indicate the study was done on people with no familiarity at all with the software they were testing.

                                                                                                                                                                                                              Either way, I don't think there is any evidence out there supporting that either keyboard-only or mouse-only is faster than, or equivalent to, keyboard+mouse for well-known GUIs.

                                                                                                                                                                                                              • Jensson 9 hours ago

                                                                                                                                                                                                                I code with the mouse to move the cursor, and I coded fast enough to compete with the fastest in the world in competitive programming. So I don't think the keyboard is significantly faster; if it were, I wouldn't have been able to write solutions as fast as the fastest in the world did.

                                                                                                                                                                                                                But I think it's fair to say that it's much easier to learn to use a mouse effectively than keyboard navigation, so most people are probably faster with a mouse.

                                                                                                                                                                                                                I did try keyboard-only editors, and spent a lot of time on that, but I was always much faster using the mouse cursor to navigate and to rearrange or copy parts of the code.

                                                                                                                                                                                                    • methuselah_in a day ago

                                                                                                                                                                                                      Current-generation students who have access to AI might become slower over time. When things are not readily available, you have to struggle and work harder, and in that process humans learn a lot of secondary things. Now everything is easily available, especially knowledge, without the struggle with the basics, and that will eventually make kids dumber. But the opposite could also happen. Even I may become slower if I keep relying on ChatGPT or Gemini.

                                                                                                                                                                                                      • cratermoon a day ago
                                                                                                                                                                                                        • alganet a day ago

                                                                                                                                                                                                          This idea that some developers have some "mental model" and others not is an extraordinary claim, and I don't see extraordinary evidence.

                                                                                                                                                                                                          It sounds like a good thing, right? "Wow, mental model. I want that, I want to be good and have big brain", which encourages you to believe the bullshit.

                                                                                                                                                                                                          The truth is, this paper is irrelevant and a waste of time. It only serves the purpose of creating discussion around the subject. It's not science, it's a cupholder for marketing.

                                                                                                                                                                                                          • imiric a day ago

                                                                                                                                                                                                            You couldn't be more wrong. If you've ever programmed, or worked with programmers, that is not an extraordinary claim at all, but a widely accepted fact.

                                                                                                                                                                                                            A mental model of the software is what allows a programmer to intuitively know why the software is behaving a certain way, or what the most optimal design for a feature would be. In the vast majority of cases these intuitions are correct, and other programmers should pay attention to them. This ability is what separates those with a mental model from those without.

                                                                                                                                                                                                            On the other hand, LLMs are unable to do this, and are usually not used in ways that help build a mental model. At best, they can summarize the design of a system or answer questions about its behavior, which can be helpful, but a mental model is an abstract model of the software, not a textual summary of its design or behavior. Those neural pathways can only be activated by natural learning and manual programming.

                                                                                                                                                                                                            • alganet a day ago

                                                                                                                                                                                                              > You couldn't be more wrong.

                                                                                                                                                                                                              Explanation missing.

                                                                                                                                                                                                              > If you've ever programmed, or worked with programmers, that is not an extraordinary claim at all.

                                                                                                                                                                                                              One step ahead of you. I already said this is engineered to encourage the belief: "I want to be good, big brain, and open source is good, I want to be good big brain."

                                                                                                                                                                                                              It's marketing.

                                                                                                                                                                                                              > A mental model of the software is what allows a programmer [yadda yadda]

                                                                                                                                                                                                              I'm not saying it doesn't exist, I'm saying the paper doesn't provide any relevant information regarding the phenomenon.

                                                                                                                                                                                                              > Those neural pathways can only be activated by natural learning and manual programming.

                                                                                                                                                                                                              Again, probably true. But the paper doesn't provide any relevant information regarding this phenomenon.

                                                                                                                                                                                                              ---

                                                                                                                                                                                                              Your answer seems to disagree with me, but displays a disjointed understanding of what I'm really addressing.

                                                                                                                                                                                                              ---

                                                                                                                                                                                                              As a lighthearted fun analogy, I present:

                                                                                                                                                                                                              https://isotropic.org/papers/chicken.pdf

                                                                                                                                                                                                              The paper does not prove the existence of chickens. It says "chicken" a lot, but never addresses the phenomenon of chickens existing.

                                                                                                                                                                                                              • imiric a day ago

                                                                                                                                                                                                                I'm confused by what your point is, then. You want evidence of an abstraction that exists in the minds of experienced developers? That's like asking for evidence of humor or love. We accept these things as real because of shared experiences, not because of concrete evidence.

                                                                                                                                                                                                                • alganet a day ago

                                                                                                                                                                                                                  My point is that the paper has no point, the article on the paper is a stretch, and none of this is relevant in any way except creating chatter.

                                                                                                                                                                                                                  It's useless from the research perspective. But it is a cup-holder for marketing something.

                                                                                                                                                                                                                  I already laid this out very clearly in my first comment.

                                                                                                                                                                                                          • bunderbunder a day ago

                                                                                                                                                                                                            > It's a really fabulous study...

                                                                                                                                                                                                            Ehhhh... not so much. It had serious design flaws in both the protocol and the analysis. This blog post is a fairly approachable explanation of what's wrong with it: https://www.argmin.net/p/are-developers-finally-out-of-a-job

                                                                                                                                                                                                            • narush a day ago

                                                                                                                                                                                                              Hey, thanks for linking this! I'm a study author, and I greatly appreciate that this author dug into the appendix and provided feedback so that other folks can read it as well.

                                                                                                                                                                                                              A few notes if it's helpful:

                                                                                                                                                                                                              1. This post is primarily worried about ordering considerations -- I think this is a valid concern. We explicitly call this out in the paper [1] as a factor we can't rule out -- see "Bias from issue completion order (C.2.4)". We have no evidence this occurred, but we also don't have evidence it didn't.

                                                                                                                                                                                                              2. "I mean, rather than boring us with these robustness checks, METR could just release a CSV with three columns (developer ID, task condition, time)." Seconded :) We're planning on open-sourcing pretty much this data (and some core analysis code) later this week here: https://github.com/METR/Measuring-Early-2025-AI-on-Exp-OSS-D... - star if you want to dig in when it comes out.

                                                                                                                                                                                                              3. As I said in my comment on the post, the takeaway at the end of the post is that "What we can glean from this study is that even expert developers aren’t great at predicting how long tasks will take. And despite the new coding tools being incredibly useful, people are certainly far too optimistic about the dramatic gains in productivity they will bring." I think this is a reasonable takeaway from the study overall. As we say in the "We do not provide evidence that:" section of the paper (Page 17), we don't provide evidence across all developers (or even most developers) -- and ofc, this is just a point-in-time measurement that could totally be different by now (from tooling and model improvements in the past month alone).

                                                                                                                                                                                                              Thanks again for linking, and to the original author for their detailed review. It's greatly appreciated!

                                                                                                                                                                                                              [1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
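
                                                                                                                                                                                                              Purely as illustration (not the actual release format -- the file name, column names, and condition labels below are assumptions), the core comparison over such a three-column CSV might look like:

                                                                                                                                                                                                                  import numpy as np
                                                                                                                                                                                                                  import pandas as pd

                                                                                                                                                                                                                  # Hypothetical schema: one row per task, with columns dev_id, condition, hours.
                                                                                                                                                                                                                  df = pd.read_csv("metr_tasks.csv")

                                                                                                                                                                                                                  # Completion times are skewed, so compare mean log-time between conditions
                                                                                                                                                                                                                  # and report the effect multiplicatively.
                                                                                                                                                                                                                  log_means = df.groupby("condition")["hours"].apply(lambda h: np.log(h).mean())
                                                                                                                                                                                                                  effect = np.exp(log_means["ai_allowed"] - log_means["ai_disallowed"]) - 1
                                                                                                                                                                                                                  print(f"Estimated change in completion time with AI: {effect:+.0%}")

                                                                                                                                                                                                              A real analysis would also account for per-developer variation (e.g. a model with developer effects keyed on dev_id), which is part of what the robustness checks in the paper address.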

                                                                                                                                                                                                              • bunderbunder a day ago

                                                                                                                                                                                                                Thanks for the response, you make some very good points. Sorry, I had missed your response on the original post. I don't know if it wasn't there yet, or if it's because for some reason their blog is configured to only show the first two comments by default. :/ Either way, my bad.

                                                                                                                                                                                                                I think my bias as someone who spends too much time looking at social science papers is that the protocol allows for spillover effects that, to me, imply that the results must be interpreted much more cautiously than a lot of people are doing. (And then on top of that I'm trying to be hyper-cautious and skeptical when I see a paper whose conclusions align with my biases on this topic.)

                                                                                                                                                                                                                Granted, that sort of thing is my complaint about basically every study on developer productivity when using LLMs that I've seen so far. So I appreciate how difficult this is to study in practice.

                                                                                                                                                                                                            • d00mB0t a day ago

                                                                                                                                                                                                              Blasphemy! How dare you say our Emperor has no clothes! AI is becoming a cult and I'm not here for it.

                                                                                                                                                                                                              • gr8beehive a day ago

                                                                                                                                                                                                                Mirror neurons got people drinking the same stupid kool aid without realizing it.

                                                                                                                                                                                                                • whatever1 a day ago

                                                                                                                                                                                                                  They didn’t use the latest model that was released yesterday night. Follow my paid course to learn how to vibe code/s

                                                                                                                                                                                                                  • rosspackard a day ago

                                                                                                                                                                                                                    One mediocre paper/study (it should not even be called that with all the bias and sample size issues) and now we have to put up with stories re-hashing and dissecting it. I really hope these don't get upvoted more in the future.

                                                                                                                                                                                                                    16 devs. And they weren't allowed to pick which tasks they used the AI on. Ridiculous. Also using it on "old and >1 million line" codebases and then extrapolating that to software engineering in general.

                                                                                                                                                                                                                    Writers like this then theorize about why AI isn't helpful, and those "theories" get repeated until they feel less like theories and more like facts, and it all proliferates into an echo chamber of "AI isn't a useful tool". There are too many anecdotes, and too much of my own personal experience, for me to accept that it isn't useful.

                                                                                                                                                                                                                    It is a tool and you have to learn it to be successful with it.

                                                                                                                                                                                                                    • davidcbc a day ago

                                                                                                                                                                                                                      > And they weren't allowed to pick which tasks they used the AI on.

                                                                                                                                                                                                                      They were allowed to pick whether or not to use AI on a subset of tasks. They weren't forced to use AI on tasks that don't make sense for AI

                                                                                                                                                                                                                      • throwaway284927 a day ago

                                                                                                                                                                                                                        That is not true, usage of AI was decided randomly. From the paper:

                                                                                                                                                                                                                        "To directly measure the impact of AI tools on developer productivity, we conduct a randomized controlled trial by having 16 developers complete 246 tasks (2.0 hours on average) on well-known open-source repositories (23,000 stars on average) they regularly contribute to. Each task is randomly assigned to allow or disallow AI usage, and we measure how long it takes developers to complete tasks in each condition."

                                                                                                                                                                                                                        • davidcbc a day ago

                                                                                                                                                                                                                          Directly from the paper:

                                                                                                                                                                                                                          > If AI is allowed, developers can use any AI tools or models they choose, including no AI tooling if they expect it to not be helpful. If AI is not allowed, no generative AI tooling can be used.

                                                                                                                                                                                                                          AI is allowed not required

                                                                                                                                                                                                                          • throwaway284927 a day ago

                                                                                                                                                                                                                            True, my bad, I didn't read you correctly; what you said is right.

                                                                                                                                                                                                                            I do believe, however, that it's important to emphasize that they didn't get to choose in general, which I think your wording (even though it is correct) does not make evident.

                                                                                                                                                                                                                        • rosspackard a day ago

                                                                                                                                                                                                                          On half the tasks they were not allowed to use AI.

                                                                                                                                                                                                                          • davidcbc a day ago

                                                                                                                                                                                                                            Yes, and the other half they had the option to use AI. That's why I said they were allowed to pick whether or not to use AI on a subset of tasks. On the other subset they were not allowed to use AI.

                                                                                                                                                                                                                        • RamblingCTO a day ago

                                                                                                                                                                                                                          It's just the same with all the anecdotal evidence of some hype guys on twitter claiming 10x performance on coding ... Same same but different

                                                                                                                                                                                                                          • steveklabnik a day ago

                                                                                                                                                                                                                            > and then extrapolating that to software engineering in general.

                                                                                                                                                                                                                            To the credit of the paper authors, they were very clear that they were not making a claim against software engineering in general. But everyone wants to reinforce their biases, so...

                                                                                                                                                                                                                            • rosspackard a day ago

                                                                                                                                                                                                                              Great for the authors, but everyone else seems to be extrapolating. Authors have a responsibility to recognize how their work will be used.

                                                                                                                                                                                                                              METR may have an OK mission overall, but their motivation is questionable. They published something like this to get attention. Mission accomplished on that front, but they had to have known how it would be twisted.

                                                                                                                                                                                                                            • jplusequalt a day ago

                                                                                                                                                                                                                              > One mediocre paper/study (it should not even be called that with all the bias and sample size issues)

                                                                                                                                                                                                                              Can you bring up any specific issues with the METR study? Alternatively, can you cite a journal that critiques it?

                                                                                                                                                                                                                              • rosspackard a day ago

                                                                                                                                                                                                                                It was just published. It's too new for anyone to have conducted a direct study critiquing it, and journals don't publish standalone critiques anyway; it would have to be a study that disputes the results.

                                                                                                                                                                                                                                They used 16 developers. The confidence intervals are wide, and a few atypical issues per dev could swing the headline figure (see the sketch at the end of this comment).

                                                                                                                                                                                                                                Veteran maintainers on projects they know inside out. This is a selection bias.

                                                                                                                                                                                                                                Devs supplied the issue list (which was then randomized), which still leads to subtle self-selection bias. Maintainers may pick tasks they enjoy or that showcase deep repo knowledge, exactly where AI probably has the least marginal value.

                                                                                                                                                                                                                                Time was self-reported, not independently logged.

                                                                                                                                                                                                                                No direct quality metric is possible. Could the AI-assisted code have been better?

                                                                                                                                                                                                                                The Hawthorne effect: knowing they are being observed and paid may make devs over-document, over-prompt, or simply take their time.

                                                                                                                                                                                                                                Many of the devs were new to Cursor.

                                                                                                                                                                                                                                Bias in the up-front forecasting.
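
                                                                                                                                                                                                                                For intuition on the first point, here's a minimal bootstrap sketch in Python (all numbers are hypothetical: it assumes ~15 issues per dev per condition and log-normal completion times, and it bakes in the study's 19% headline slowdown; it is not the study's data or method):

                                                                                                                                                                                                                                    # Illustrative only: hypothetical numbers, not data from the METR study.
                                                                                                                                                                                                                                    import numpy as np

                                                                                                                                                                                                                                    rng = np.random.default_rng(0)
                                                                                                                                                                                                                                    n_devs, n_issues = 16, 15  # assumed issue counts per dev per condition

                                                                                                                                                                                                                                    # Assume log-normal completion times with a true 19% slowdown when AI is allowed.
                                                                                                                                                                                                                                    ai = rng.lognormal(mean=np.log(1.19), sigma=0.6, size=(n_devs, n_issues))
                                                                                                                                                                                                                                    no_ai = rng.lognormal(mean=0.0, sigma=0.6, size=(n_devs, n_issues))

                                                                                                                                                                                                                                    # Per-dev slowdown ratio: mean AI time over mean non-AI time.
                                                                                                                                                                                                                                    ratios = ai.mean(axis=1) / no_ai.mean(axis=1)

                                                                                                                                                                                                                                    # Bootstrap over the 16 devs to get a 95% CI on the average ratio.
                                                                                                                                                                                                                                    boot = [rng.choice(ratios, size=n_devs, replace=True).mean() for _ in range(10_000)]
                                                                                                                                                                                                                                    lo, hi = np.percentile(boot, [2.5, 97.5])
                                                                                                                                                                                                                                    print(f"estimate: {ratios.mean():.2f}x, 95% CI: [{lo:.2f}x, {hi:.2f}x]")

                                                                                                                                                                                                                                Even with a true effect baked in, the interval that comes out is wide relative to the point estimate, and noisier assumptions widen it further.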

                                                                                                                                                                                                                                • jplusequalt 4 hours ago

                                                                                                                                                                                                                                  TBH, most of your points are a bit of a reach.

                                                                                                                                                                                                                                  > They used 16 developers. The confidence intervals are wide, and a few atypical issues per dev could swing the headline figure

                                                                                                                                                                                                                                  This is reasonable, but there has been enough anecdotal evidence from developers over the last 3 years for me to believe the data is measuring something real.

                                                                                                                                                                                                                                  > Veteran maintainers on projects they know inside out. This is a selection bias

                                                                                                                                                                                                                                  I think this is complete BS. The study was trying to measure the real-world impact of these tools with experienced developers. Having them try the tools out on greenfield work, or a codebase they are not familiar with, would make that harder to measure.

                                                                                                                                                                                                                                  Also, let's be honest: if the study showed that LLMs DID increase productivity on greenfield work, does that even matter? How many developers out there are starting greenfield projects on a weekly basis? I'd argue very few. So if the study is suggesting that experienced developers working on code they're already familiar with are better off without the assistance of an LLM, then the vast majority of software development work could be better off without LLMs.

                                                                                                                                                                                                                                  > Devs supplied the issue list (which was then randomized), which still leads to subtle self-selection bias. Maintainers may pick tasks they enjoy or that showcase deep repo knowledge, exactly where AI probably has the least marginal value

                                                                                                                                                                                                                                  Again, MANY developers are going to have deep repo knowledge. If they're not faster with LLMs despite that knowledge, why use them? You're trying to hold this up as bias against the study, but IMO you're missing the point.

                                                                                                                                                                                                                            • mkagenius a day ago

                                                                                                                                                                                                                              AI tends to slow us down because we don't really know what it's good at. Can it write a proper Nginx config? I don't know—let's try. And then we end up wasting 30 minutes on it.

                                                                                                                                                                                                                              Fully autonomous coding tools like v0, a0, or Aider work well as long as the context is small. But once the context grows, usually due to mistakes made in earlier steps, they just can't keep up. There's no real benefit to a "try again" loop yet.

                                                                                                                                                                                                                              For now, I think simple VSCode extensions are the most useful. You get focused assistance on small files or snippets you’re working on, and that’s usually all you need.

                                                                                                                                                                                                                              • ethan_smith a day ago

                                                                                                                                                                                                                                The context switching cost between coding and AI interaction is substantial and rarely measured in these studies. Each prompt/review cycle breaks flow state, which is particularly damaging for complex programming tasks where deep concentration yields the greatest productivity.

                                                                                                                                                                                                                                • bluefirebrand a day ago

                                                                                                                                                                                                                                  This has been my experience too

                                                                                                                                                                                                                                  Ever since my company made switching to Cursor mandatory, I have not been able to hit any kind of flow. I know my own productivity has plummeted, and I suspect the same is true for many others, but no one is saying anything.

                                                                                                                                                                                                                                  I have spoken up once or twice and only been smacked down for my troubles, so I am not surprised everyone else has clammed up.