Comments Page - Researchers seeing little evidence of benefit from co pilots

« Back Researchers seeing little evidence of benefit from co pilotscio.comSubmitted by chairmansteve a year ago

jchw a year ago
This is my personal experience, despite everyone swearing that it's a game changer. I've tried a fair few times now because people swear everything is revolutionary but I find them almost as annoying as helpful, and as many others have noticed you really have to be careful before accepting the code it outputs as correct, the subtly incorrect bits are extremely insidious. An example: in one case I was trying an AI tool to write some reasonably tricky logic for validation and at some point it got something very close to right but flipped part of a conditional. It took me probably 30 minutes to notice even though it should have been pretty obvious.
The best I can say is that the implementation in Jetbrains IntelliJ IDEA is pretty good. It's basically only useful for some repetitive Java boilerplate, but actually that's perfect, it's mindless enough yet easy to validate. It makes me dislike programming in Java a little bit less.
- neocron a year ago
  I somewhat agree, but I don't use tools like copilot integrated into my IDE at all (tried it in beta) and I just hate that the suggestions it made were even worse than what the IDE suggested before. It was kinda what I wanted but like you said with small errors that were hard to findy because mostly it was working.
  But what I reall like is using stuff like edge copilot and other chat-like ai tools to just ask questions and get some, mosty really well, summarized answers. Usually about tech I never used before, haven't used in ages or need to be migrated to much newee versions with breaking changes.
  It saves me a lot of reading the docs, fiddling with it, reading the docs again, reading quite a few posts about it e.g. on stack overflow and than finally getting to work on the actual project.
  I just ask my questions, try the suggested approach, fact check the docs and get to work
- nicholasjarnold a year ago
  While I agree with your sentiment regarding implementing complex bespoke logic, I think there are likely measurable gains when an experienced engineer is operating outside of their current comfort zone.
  Allow me an anecdote - I was recently doing some HTML/CSS to update the public website for a client. This is not where I typically operate, though I obviously have some exposure over the years. Using Supermaven I was able to very quickly perform website surgery that would have ultimately taken me much more time reading Tailwind documentation and loading my RAM with details that would be mostly superfluous to 99% of my professional workload. It's in contexts like this where I think these tools shine.
  'Game changer' is clickbait phrasing for these tools in their current incarnations, but they can certainly save time while producing acceptable results when used appropriately.
- sanitycheck a year ago
  I've done the same, and my experiences also match those in the article.
  Using a LLM as an autocomplete (typescript, kotlin) I found it'd be right about 10% of the time, very obviously wrong ~30% of the time, but mostly (~60%) *plausible*. Plausible is the worst, because if I don't spot the error immediately it'll come back to bite me in the coming hours/weeks and it takes me much longer to fully comprehend existing code than it does to write new code myself.
  Asking questions fares better, but I never have easy questions and for difficult ones it'll usually be wrong. Like, if I'm asking an LLM a question I'm usually at a point where I've read the docs and there's something I don't understand - so it's not that surprising that the LLM won't understand it either. Occasionally pointing out the flaw will get a better answer, but more often it just leads to a circle of flawed answers. I feel like people who say this is some sort of revolution are perhaps people who weren't very good at finding information themselves.
  The only places where this stuff seems to be an unqualified success so far is for jobs where the quality of the output is not very important.
- com2kid a year ago
  Cursor.ai is worlds better than copilot.
  Open up a new code base, ask it to analyze the code and summarize the different main parts of it.
  I was making a json schema for a typescript type and as soon as I typed the name of the json schmea, cursor offered to auto complete the schema.
  "I think there is a concurrency bug in this file, can you find it for me?" that alone saved me potentially hours of debugging.
  When working with APIs I don't often touch, such as Canvas, AI saves me hours to days of effort.
nbbnbb a year ago
I don't have any formal data to prove this available without losing anonymity and probably getting sued by my employer but the introduction of them at my organisation correlates directly to a measurable rise in bugs and incidents. From causal analysis, the tools themselves are not directly responsible as such despite having limited veracity, but people trust them and do not do their jobs properly. There is also a mystique around them being the solution for all validation processes which leads to suboptimal attention at the validation stage on the hope that some vendor we already have is going to magically make a problem go away like they said they would at the last conference. I figure at this point the gain might be a negative on a social and human perspective the moment the idea was commercialised.
Urgh. I can't wait to retire.
wanderingbit a year ago
This finding bewilders me, because my copilot (I use Sourcegraph’s Cody) has become an essential part of my dev productivity toolset. Being able to get answers to questions that would normally break me out of flow mode by simply Option + C’ing to open up a New Chat has been a productivity boost for me. Getting it to give me little snippets of code that I can use helps keep me in flow mode. Getting it to do a first pass on function comments, which I then edit, has made it much easier to get over the activation energy barrier that usually holds me back from doing full commenting.
I can’t say if the bug count is higher or not. Maybe it is higher in terms of total number of bugs I write throughout my coding session. But if bug count goes up 10% then the speed with which I fix those bugs and get to a final edit of my code is 30% or 40% faster, so the bug count is not the right metric.
Maybe the differentiator is that I am a solo-dev for all this work, and so the negative effects of the copilot are only experienced by me. If I were in a 10 person team, the bugs and the weird out of context code snippets would be magnified by the 9 other people, and the negative effects would be strong. But I don’t know.
- aithrowawaycomm a year ago
  This might be the most relevant difference:
  > Like the Uplevel study, Gekht also sees AI assistants introducing errors in code. Each new iteration of the AI-generated code ends up being less consistent when different parts of the code are developed using different prompts.
  > “It becomes increasingly more challenging to understand and debug the AI-generated code, and troubleshooting becomes so resource-intensive that it is easier to rewrite the code from scratch than fix it,” he says.
  In particular if there's little standardization in prompting styles across the team, I could see things getting confusing.
  But there are also bad incentives on teams that don't exist for solo devs: e.g. presumably you aren't shipping code solely because your manager is getting on to you about missed deadlines, without any business justifications for the hurry. AI codegen that effectively optimizes to the manager / user story seems bad for business.
pfisherman a year ago
Had the chance to watch some non programmers use copilot for data science (using pandas) and it was an eye opening experience. I came away with the feeling that the tool landed in a sort of “uncanny valley” of productivity. If you can’t write the code without copilot then you won’t be able to debug the errors it makes. And if you know enough to spot and debug the errors, then copilot just gets in the way.
thepuppet33r a year ago
Genuinely thought this was an article about copilots in planes and was terrified that airlines were going to cut back to one pilot in the cockpit to save a little more money.
- robinsonrc a year ago
  Yeah, I assumed it couldn’t be that, but it did read like that. “Copilots” is not the right term. It doesn’t match the title of the article I’m seeing.
stumblehump a year ago
It sadly goes not the way of 'done better' but 'done at all' in terms of the helpful AI nonsense. Let me illustrate: I work in a mostly tech-illiterate context, big corpo, all proprietary and 5 departments away, so nobody would move a finger for your customising wishes without a month of foreplay. Me, a teenage script kiddie (css hacks for Myspace, minor AS2 template diddling) started with getting more complicated macros for Sheets. Then, some semi-interactive Apps Script, to collate and structure data. A bit of FTP automation here, JSON parsing there, Geodata alignment, context aware scrapes, PowerShell, minor Python one shots... and now, thanks to our new shitcode overlords, I find myself in the cockpit of a McGuyver supermachine, being able to run dozens of no longer manual, no more weeks in processing and subcontractor-sloppy tasks. Funnily enough I also started seeing patterns, typing out changes in-code, refactoring and integrating old chunks. Why not, then?
taftster a year ago
I think an interesting use for copilot would be to ask it to find a bug given the description of an observed behavior. Let's say you're not super familiar with a code base, but yet you have found a bug (or "feature") that should be addressed. Having copilot narrow in on the logical code points to potentially address the issue would be invaluable.
Additionally, I find the copilot code suggestions during code reviews / pull requests sometimes useful. At times, it can offer some insightful bits about a code segment, such as potential exception handling fixes, etc.
I'd like to explore having copilot write unit tests, including representative test data, that can execute edge code paths. I haven't done this yet, but this seems exactly the type of thing that a "copilot" would do for me (not too unlike paired-programming, maybe).
Having a copilot completely write my code base, that's another thing entirely. There would be too much going back and verifying that it got it right. And additionally, I've seen it completely conjure up bogus solutions as well. For example, I've had copilot offer a configuration change that was completely fabricated; it looked legitimate enough that a senior systems engineer attempted to install/deliver the "fix" it offered when the suggestion was completely made up.
Overall, I guess my experience with copilot is not much different than working with any human. Trust but verify.
gtvwill a year ago
Eh common theme amongst coders but I feel like it's less the LLM and more pebkac. You have a new tool, it can be hugely productive. You just need to use it right. Stop expecting it to write your whole app or create new formulas for hyper complex problems that haven't yet been solved or aren't common. It's a reference tool that's better than reading the docs or browsing stack overflow. Ask it for snippets of code, break up your tasks, use it to compare a number of methods that achieve the same result, discuss with it different approaches to the same problem.
```
  Much like how a nailgun won't just magically build you a house, it'll just let you build one quicker. 
```
I get great benefit out of llms for coding. I'm not a good coder. But I am decent at planning and understanding what I want. Llms get me there 100x quicker than not using them. I don't need 4 years of cs to learn all the tedious algos for sorting or searching. I just ask an ai for a bunch of examples, assess them for what they are and get on with it. It can tell me the common pros and cons of it all and much like any other decision in business I make my best judgement and go with it.
Need to sort a heap of data into x y z or convert it from x to y? Llm will show me the way, now I don't need to hire someone to do it for me.
But alas, so many seem to think a language interpretation tool is actually a do it all one stop shop of production. Pebkac, your using the tool wrong.
- boredtofears a year ago
  Yeah, this tracks — the only ones I see blindly trusting its outputs are the ones that can’t properly identify errors in the code. Pebkac indeed. If the output works with a simple happy path test it’s deemed as working with blissful ignorance to all invariant states.
mxxx a year ago
The thing that I’ve seen my team use it most for is explaining blocks of code. We maintain a bunch of legacy systems that don’t get touched often and are written in stacks that our engineers aren’t completely fluent with, and it can be helpful when they’ve traced an issue to a particular function but the original intent or purpose of the code is obtuse.
Eisenstein a year ago
> “Using LLMs to improve your productivity requires both the LLM to be competitive with an actual human in its abilities
No it does not. Does an assistant have to be as qualified as their boss?
> “The LLM does not possess critical thinking, self-awareness, or the ability to think.”
This is completely irrelevant. The LLM can understand your instructions and it can type 30,000 times faster than you.
- surgical_fire a year ago
  The speed with which the assistant types doesn't help if the content is bogus.
  Doubly so when the content is at the same time plausible, seemingly valid, but horribly wrong. It is the worst possible combination, one that normally takes more time to find out what is wrong and fix it then it would have been to not use the assistant in the first place.
  Eisenstein a year ago
  > The speed with which the assistant types doesn't help if the content is bogus.
  But I can read faster than I can type.
- undefined a year ago
  [deleted]
- infamouscow a year ago
  > No it does not. Does an assistant have to be as qualified as their boss?
  The delusion that LLMs or so-called "AI" must be equals to their human overseers is laughable. If these tools are to transcend their current role as mere comic relief, it's crucial to recognize that in any functional hierarchy, the "supervisors" are the true senior figures. This nonsense about management being a separate entity might apply in the tech bubble, but in the real economy, it's a fantasy—one that any serious adult left the cradle believing.
  > This is completely irrelevant. The LLM can understand your instructions and it can type 30,000 times faster than you.
  I suppose this sounds like some sort of achievement—though why anyone would be impressed is beyond me. It reveals an ignorance so profound that it fails even to reach the bar of a lackluster undergraduate's grasp of what non-deterministic polynomial time entails.
  Eisenstein a year ago
  > It reveals an ignorance so profound that it fails even to reach the bar of a lackluster undergraduate's grasp of what non-deterministic polynomial time entails.
  I say this with all honesty and with good faith: you should reflect on what motivated you to write that sentence and reassess why you thought it would be appropriate or clever. By writing such a thing you are embarrassing yourself and revealing a deep insecurity. It is also poorly phrased and sounds contrived. Do better.
  infamouscow a year ago
  My derision is not only warranted, it's practically charitable given the absurdity on display. The fundamental misunderstanding paraded here isn't just embarrassing; it's symptomatic of the broader intellectual decay that now passes for discourse in tech circles.
  Let's cut through the nonsense: the speed at which an LLM processes text is neither remarkable nor revolutionary—it's a brute-force algorithm churning through data with all the grace of a sledgehammer. By current standards, it's laughably inefficient, and your fixation on it reveals a stunning lack of depth. If you attempted to peddle these claims in a serious computer science department, you’d be met with the kind of laughter reserved for first-year undergraduates who haven’t yet grasped the basics.
  And as for your response—how utterly pathetic. If my comparison of LLM efficiency to non-deterministic polynomial time were truly as off-base as you'd like to pretend, you'd have taken the time to refute it properly, to demonstrate where I’ve gone wrong. But instead, you've chosen the coward's path: declaring your supposed superiority while sidestepping any actual argument. You didn't refute my point because you couldn't, and now you resort to some vague, self-righteous posturing, suggesting that I should feel embarrassed. The only embarrassment here is yours—for thinking such a shallow tactic could pass for a real rebuttal.
  It's also come to my attention that all this ridiculous AI hype marketing is having a far more sinister effect—it's causing some of the brightest young minds to give up before they even start. This is something I refuse to accept. If crushing the ignorant morons who perpetuate nonsense means I come across as mean, then so be it. I'll gladly take that role if it means clearing the path for those who actually have the potential to contribute something real to tech, rather than getting lost in the hype-fueled fog of delusion.
  And if you think this is some grand breakthrough, let me assure you, those who actually know what they're talking about predicted ChatGPT's emergence just over a year before it became a buzzword to the public. These same experts, the ones who haven't overdosed on AI hype, would be the first to tell you that you’ve been duped—hook, line, and sinker.
  Eisenstein a year ago
  It is obvious you are playing a character and get a kick out of trolling people you think are beneath you. It won't fill that need for validation you have, though. I suggest you pivot and volunteer teaching code to seniors or something wholesome so you can fulfill your emotional void with good deeds.
marcinzm a year ago
Cursor IDE with Claude 3.5 has been very beneficial for me in terms of productivity. Others a lot less so.
itsdrewmiller a year ago
o1 is the first time I've really trusted the output of coding prompts beyond glorified autocomplete - it's a cut above what's currently out there.
- mnk47 a year ago
  Any tips on prompts for o1? I'm struggling to figure out how much scope/detail/context I should include in my prompts.
  itsdrewmiller a year ago
  It's a little spooky about being able to infer requirements without writing them out, and taking a large prompt and building a large codebase from it. I would be as clear as possible about the desired output - if you want to run it in heroku, let it deal with the procfile, etc. - but try to treat it as 100% declarative beyond that.
undefined a year ago
[deleted]
kuldeepbhai a year ago
jggdlhldvkgdkgdphkdgohelhhoddkgwkhhe lhpd