One interesting bit of context is that the author of this post is a legit world-class software engineer already (though probably too modest to admit it). Former staff engineer at Google and co-founder / CTO of Tailscale. He doesn't need LLMs. That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me personally.
His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them. The language was designed around filling in the implementations for you. 20 years ago that would have been from a live online database, with implementations vying for popularity on the basis of speed or correctness. Nowadays LLMs would generate most of it on the fly, presumably.
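A rough Python sketch of what that could look like - purely hypothetical, the registry and tests below are only illustrative: you write the conformance tests, and candidate implementations (from a shared registry back then, an LLM today) are filtered by the tests and then race on speed:

import time

# Hypothetical sketch: the user writes only conformance tests against a
# signature; candidate implementations come from elsewhere and compete.

def conformance_tests(impl):
    assert impl([3, 1, 2]) == [1, 2, 3]
    assert impl([]) == []
    assert impl([2, 2, 1]) == [1, 2, 2]

def insertion_sort(xs):
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

candidates = {"builtin": sorted, "insertion": insertion_sort}

def pick_implementation(candidates):
    # keep only candidates that pass the conformance tests, then take the fastest
    timings = {}
    sample = list(range(2000, 0, -1))
    for name, impl in candidates.items():
        try:
            conformance_tests(impl)
        except AssertionError:
            continue
        start = time.perf_counter()
        impl(sample)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get)

print(pick_implementation(candidates))  # most likely "builtin"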
Most ideas are unoriginal, so I wouldn't be surprised if this has been tried already.
> That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me personally.
There is likely to be a great rift in how very talented people look at sharper tools.
I've seen the same division pop up with CNC machines, 3d printers, IDEs and now LLMs.
If you are good at doing something, you might find the new tool's output to be sub-par compared to what you can achieve yourself, but often that lower-quality output comes much faster than you could produce it.
That causes the people who are deliberate & precise about their process to hate the new tool completely - expressing the idea directly in the actual code (or paint, or marks on wood) is much better than trying to explain it in a less precise language along the way. The only exception I've seen is that engineering folks often use a blueprint & refine it on paper.
There's a double translation overhead which is wasteful if you don't need it.
If you have dealt with a new hire while being the senior of the pair, there's that familiar feeling of wanting to grab their keyboard instead of explaining how to build that regex - being able to do more things than you can explain or just having a higher bandwidth pipe into the actual task is a common sign of mastery.
The incrementalists on the other hand, tend to love the new tool as they tend to build 6 different things before picking what works the best, slowly iterating towards what they had in mind in the first place.
I got into this profession simply because I could Ctrl-Z to the previous step much more easily than in my then-favourite field, chemical engineering. In chemistry, if you get a step wrong, you go back to the start & start over. Plus even when things work, yield is just a pain there (prove it first, then scale up ingredients etc).
Just from the name of sketch.dev, it appears that this author is of the 'sketch first & refine' model where the new tool just speeds up that loop of infinite refinement.
I also have many years of programming experience and find myself strongly "accelerated" by LLMs when writing code. But, if you think about it, it makes sense that many seasoned programmers are using LLMs better. LLMs are a helpful tool, but also a hard-to-use tool, and in general it's fair to think that better programmers can make better use of an assistant (human or otherwise): better understanding its strengths, identifying good and bad output faster, providing better guidance to correct the approach...
Other than that, what correlates more strongly with the ability to use LLMs effectively is, I believe, language skills: the ability to describe problems very clearly. The quality of an LLM's replies changes very significantly with the quality of the prompt. Experienced programmers who can also communicate effectively provide the model with many design hints, details on where to focus, ..., basically escaping many local minima immediately.
> [David, Former staff engineer at Google ... CTO of Tailscale,] doesn't need LLMs. That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me...
Don't doubt for a second the pedigree of founding engs at Tailscale, but David is careful to point out exactly why LLMs work for them (but might not for others):
I am doing a particular kind of programming, product development, which could be roughly described as trying to bring programs to a user through a robust interface. That means I am building a lot, throwing away a lot, and bouncing around between environments. Some days I mostly write typescript, some days mostly Go. I spent a week in a C++ codebase last month exploring an idea, and just had an opportunity to learn the HTTP server-side events format. I am all over the place, constantly forgetting and relearning.
If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don't think any of my observations here are going to be useful to you.
That approach sounds similar to the Idris programming language with Type-Driven Development. It starts by planning out the program structure with types and function signatures. Then the function implementations (aka holes) can be filled in after the function signatures and types are set.
I feel like this is a great approach for LLM assisted programming because things like types, function signatures, pre/post conditions, etc. give more clarity and guidance to the LLM. The more constraints that the LLM has to operate under, the less likely it is to get off track and be inconsistent.
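As a minimal sketch (hypothetical names, not tied to any real project), the types and contract can be pinned down first while the body stays a typed "hole" for a later pass, by hand or by an LLM working under those constraints:

from dataclasses import dataclass

# Minimal sketch: types and the contract come first, the body stays a "hole".

@dataclass
class Account:
    id: str
    balance_cents: int

def transfer(src: Account, dst: Account, amount_cents: int) -> None:
    """Move amount_cents from src to dst.

    Pre:  amount_cents > 0 and src.balance_cents >= amount_cents
    Post: the combined balance of src and dst is unchanged
    """
    raise NotImplementedError  # the hole: implementation comes last

def test_transfer_conserves_money():
    # conformance check that any filled-in body must satisfy
    a, b = Account("a", 500), Account("b", 100)
    transfer(a, b, 200)
    assert a.balance_cents == 300 and b.balance_cents == 300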
I've taken a shot at doing some little projects for fun with this style of programming in TypeScript and it works pretty well. The programs are written in layers with the domain design, types, schema, and function contracts being figured out first (optionally with some LLM help). Then the function implementations can be figured out towards the end.
It might be fun to try Effect-TS for ADTs + contracts + compile time type validation. It seems like that locks down a lot of the details so it might be good for LLMs. It's fun to play around with different techniques and see what works!
I am not a genius but have a couple of decades experience and finally started using LLMs in anger in the last few weeks. I have to admit that when my free quota from GitHub Copilot ran out (I had already run out of Jetbrains AI as well!! Our company will start paying for some service as the trials have been very successful), I had a slight bad feeling as my experience was very similar to OP: it's really useful to get me started, and I can finish it much more easily from what the AI gives me than if I started from scratch. Sometimes it just fills in boilerplate, other times it actually tells me which functions to call on an unfamiliar API. And it turns out it's really good at generating tests, so it makes my testing more comprehensive as it's so much faster to just write them out (and refine a bit usually by hand). The chat almost completely replaced my StackOverflow queries, which saves me much time and anxiety (God forbid I have to ask something on SO as that's a time sink: if I just quickly type out something I am just asking to be obliterated by the "helpful" SO moderators... with the AI, I just barely type anything at all, leave it with typos and all, the AI still gets me!).
I have been using LLMs to generate functional code from *pseudo-code* with excellent results. I am starting to experiment with UML diagrams, both with LLMs and computer vision, to actually generate code from UML diagrams; for example, a simple activity diagram could be the prompt to an LLM, and might look like:
Start -> Enter Credentials -> Validate -> [Valid] -> Welcome Message -> [Invalid] -> Error Message
Corresponding Code (Python Example):
class LoginSystem:
    def validate_credentials(self, username, password):
        if username == "admin" and password == "password":
            return True
        return False

    def login(self, username, password):
        if self.validate_credentials(username, password):
            return "Welcome!"
        else:
            return "Invalid credentials, please try again."
*Edited for clarity*

I think what you're describing is basically "interface driven development" and "test driven development" taken to the extreme: where the formal specification of an implementation is defined by the test suite. I suppose a cynic would say that's what you get if you left an AI alone in a room with Hyrum's Law.
> His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow
Regardless of language, that's basically how you approach the design of a new large project - top-down architecture first, then split the implementation into modules, design the major data types, write function signatures. By the time you are done, what is left is basically the grunt work of implementing it all, which is the part that LLMs should be decent at, especially if the functions/methods are documented to a level (input/output assertions as well as functionality) where it can also write good unit tests for them.
>> where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them
AIUI that’s where Idris is headed.
> designed around filling in the implementations for you. 20 years ago that would have been from a live online database
This reminds me a bit of PowerBuilder (or was it PowerDesigner?) from the early 1990s. It was sold to SAP later; I was told it's still being used today.
Isn't that the idea behind UML? It didn't work out so well, but with the advent of LLMs today, I think that premise could work.
I knew he was a world-class engineer the moment I saw that his site didn't bother with CSS stylesheets, ads, pictures, or anything beyond a rudimentary layout.
The whole article page reads like a site from the '90s, written from scratch in HTML.
That's when I knew the article would go hard.
Substantive pieces don't need fluffy UIs - the idea takes the stage, not the window dressing.
He is using LLMs for coding. You don't become a staff engineer by being a badass coder. Not sure how they are related.
Being a dev at a large company is usually the sign that you're not very good though. And anyone can start a company with the right connections.
> A lot of the value I personally get out of chat-driven programming is I reach a point in the day when I know what needs to be written, I can describe it, but I don’t have the energy to create a new file, start typing, then start looking up the libraries I need... LLMs perform that service for me in programming. They give me a first draft, with some good ideas, with several of the dependencies I need, and often some mistakes. Often, I find fixing those mistakes is a lot easier than starting from scratch.
This to me is the biggest advantage of LLMs. They dramatically reduce the activation energy of doing something you are unfamiliar with. Much in the way that you're a lot more likely to try kitesurfing if you are at the beach standing next to a kitesurfing instructor.
While LLMs may not yet have human-level depth, it's clear that they already have vastly superhuman breadth. You can argue about the current level of expertise (does it have undergrad knowledge in every field? PhD level knowledge in every field?) but you can't argue about the breadth of fields, nor that the level of expertise improves every year.
My guess is that the programmers who find LLMs useful are people who do a lot of different kinds of programming every week (and thus are constantly going from incompetent to competent in things that other people already know), rather than domain experts who do the same kind of narrow and specialized work every day.
I think your biggest takeaway should be that the person writing the blog post is extremely well versed in programming and has labored over code for hours, along with writing tests, debugging, etc. He knows what he would like because it's second nature. He was able to get the best from the LLM because his vision of what the code should look like helped craft a solid prompt.
Newer people in programming might not have as good of a time because they may skip actually learning the fundamentals and rely on LLMs as a crutch. Nothing wrong with that, I suppose, but there might be a point when everything goes up in smoke and the LLM is out of answers.
No amount of italic font is going to change that.
The first rule of programming with LLMs is: don't use them for anything you don't know how to do. If you can look at the solution and immediately know what's wrong with it, they are a time saver; otherwise...
I find chat for search is really helpful (as the article states)
That seems like a wild restriction.
You can give them more latitude for things you know how to check.
I didn't know how to setup the right gnarly typescript generic type to solve my problem but I could easily verify it's correct.
That's the wrong approach.
I use chat for things I don't know how to do all the time. I might not know how to do it, but I sure know how to test that what I'm being told is correct. And as long as it's not, I iterate with the chat bot.
"Trust but verify" is still useful especially when you ask LLMs to do stuff you don't know. I've used LLMs to help me get started on tasks where I wasn't even sure of what a solution was. I would then inspect the code and review any relevant documentation to see if the proposed solution would work. This has been time consuming but I've learned a lot regardless.
I'd like to rephrase it as: "don't deploy LLM-generated code if you don't know how it works (or what it does)."
This means it's okay to use an LLM to try something new that you're on the fence about. Learn it, and then once you've learned that concept or idea, you can go ahead and use the same code if it's good enough.
IMO this is a bad take. I use LLMs for things I don’t know how to do myself all the time. Now, I wouldn’t use one to write some new crypto functions because the risk associated with getting it wrong is huge, but if I need to write something like a wrapper around some cloud provider SDK that I’m unfamiliar with, it gets me 90% of the way there. It also is way more likely to know at least _some_ of the best practices where I’ll likely know none. Even for more complex things getting some working hello world examples from an LLM gives me way more threads to pull on and research than web searching ever has.
You can ask the LLM to teach it to you step by step, and then you can validate it by doing it as well as you go, still quicker than learning it and not knowing how to debug it.
Learning how something works is critical or it's far worse than technical debt.
How you use the LLM matters.
Having an LLM do something for you that you don't know how to do is asking for trouble. An expert can likely offload a few things that aren't all that important, but any junior is going to dig themselves into a significant hole with this technique.
But asking an LLM to help you learn how to do something is often an option. Can't one just learn it using other resources? Of course. LLMs shouldn't be a must have. If at any point you have to depend upon the LLM, that is a red flag. It should be a possible tool, used when it saves time, but swapped for other options when they make sense.
As an example, I had a library I was new to and asked Copilot how to do some specific task. It gave me the options. I used this output to go to Google and find the matching documentation and gave it a read. I then went back to Copilot, wrote up my understanding of what the documentation said, and checked to see if Copilot had anything to add.
Could I have just read the entire documentation? That is an option, but one that costs more time to give deeper expertise. Sometimes that is the option to go with, but in this case having a more shallow knowledge to get a proof of concept thrown together fit my situation better.
Anyone just copying an AI's output and putting it in a PR without understanding what it does? That's asking for trouble and it will come back to bite them.
I completely agree. In graphics programming, I love having it do things that are annoying but easy to verify (like setting up frame buffers in WebGL). I also ask it do more ambitious things like implementing an algorithm in shader code, and it will sometimes give a result that is mostly correct but subtly wrong. I only have been able to catch those subtle errors because I know what to look for.
>> If you can look at the solution and immediately know what's wrong with it, they are a time saver otherwise...
Indeed getting good at writing code using LLMs demands being very good at reading code.
To that extent it's more like blitz chess than autocomplete. You need to think and verify in trees as it goes.
Exactly, you have to (vaguely) know what you’re looking for and have some basic ideas of what algorithms would work. AI is good at helping with syntax stuff but not really good at thinking.
> ... don't use them for anything you don't know how to do ... I find chat for search is really helpful (as the article states)
Not really. I often use chat to understand codebases. Instead of trying to navigate mature, large-ish FOSS projects (like, say, the Android Run Time) by looking at them file by file, method by method, field by field (all too laborious), I just ask ... Copilot. It is way, way faster than I am, and mostly directionally correct with its answers.
Don't use them for anything you don't know how to test. If you can write unit tests you understand and it passes them all (or visually inspect/test a GUI it generated), you know it's doing well.
My experience is the opposite. I find them most valuable for helping me do things that would be extremely hard or impossible for me to figure out. To wit, I just used one to decode a pagination cursor format and write a function that takes a datetime and generates a valid cursor. Ain’t nobody got time for that.
I no longer work in tech, but I still write simple applications to make my work life easier.
I frequently use what OP refers to as chat-driven programming, and I find it incredibly useful. My process starts by explaining a minimum viable product to the chat, which then generates the code for me. Sometimes, the code requires a bit of manual tweaking, but it’s usually a solid starting point. From there, I describe each new feature I want to add—often pasting in specific functions for the chat to modify or expand.
This approach significantly boosts what I can get done in one coding session. I can take an idea and turn it into something functional on the same day. It allows me to quickly test all my ideas, and if one doesn’t help as expected, I haven’t wasted much time or effort.
The biggest downside, however, is the rapid accumulation of technical debt. The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.
I think "chat-driven programming" is the most hyped type of LLM-based programming I see on Twitter, and the one I just can't relate to. I've incorporated LLMs mainly as autocomplete and search: asking ChatGPT to write a quick script or to scaffold some code for which the documentation is too esoteric to parse.
But when having the LLM do things for me, I frequently run into issues where it feels like I'm wasting my time with an intern. "Chat-based LLMs do best with exam-style questions" really speaks to me; however, I find that constructing my prompts in such a way that the LLM does what I want uses just as much brainpower as just programming the thing myself.
I do find ChatGPT (o1 especially) really good at optimizing existing code.
Our company has a no-AI-use policy. The assumption is zero trust. We simply can’t know whether a model or its framework could or would send proprietary code outside the network. So it’s best to assume all LLMs/AI do or will send code or fragments of code. While I applaud the incredible work by their creators, I’m not sure how a responsible enterprise-class company could rely on “trust us bro” EULAs or repo readmes.
One mode I felt was missed was "thought partner", especially while debugging (aka rubber ducking).
We had an issue recently with a task queue seemingly randomly stalling. We were able to arrive at the root cause much more quickly than we would have because of a back-and-forth brainstorming session with Claude, which involved describing the issue we were seeing, pasting in code from the library to ask questions, asking it to write some code to add some missing telemetry, and then probing it for ideas on what might be going wrong. An issue that may have taken days to debug took about an hour to identify.
Think of it as rubber ducking with a very strong generalist engineer who knows about basically any technical concepts.
I definitely respect David's opinion given his caliber, but pieces like this make me feel strange that I just don't have a burning desire to use them.
Like, yesterday I made some light changes to a containerized VPN proxy that I maintain. My first thought wasn't "how would Claude do this?" Same thing with an API I made a few weeks ago that scrapes a flight data website to summarize flights in JSON form.
I knew I would need to write some boilerplate and that I'd have to visit SO for some stuff, but asking Claude or o1 to write the tests or boilerplate for me wasn't something I wanted or needed to do. I guess it makes me slower, sure, but I actually enjoy the process of making the software end to end.
Then again, I do all of my programming on Vim and, technically, writing software isn't my day job (I'm in pre-sales, so, best case, I'm writing POC stuff). Perhaps I'd feel differently if I were doing this day in, day out. (Interestingly, I feel the same way about AI in this sense that I do about VSCode. I've used it; I know what's it capable of; I have no interest in it at all.)
The closest I got to "I'll use LLMs for something real" was using it in my backend app that tracks all of my expenses to parse pictures of receipts. Theoretically, this will save me 30 seconds per scan, as I won't need to add all of the transaction metadata myself. Realistically, this would (a) make my review process slower, as LLMs are not yet capable of saying "I'm not sure" and I'd have to manually check each transaction at review time, (b) make my submit API endpoint slower since it takes relatively-forever for it to analyze images (or at least it did when I experimented with this on GPT4-turbo last year), and (c) drive my costs way up (this service costs almost nothing to run, as I run it within Lambda's free tier limit).
The killer feature of LLMs for programming, in my opinion, is autocomplete (the simple copilot feature). I can probably be 2-3x more productive, as I'm not typing (or thinking much). It does a fairly good job pulling in nearby context to help it. And that's even without a language server.
Using it to generate blocks of code in a chat-like manner, in my opinion, just never works well enough in the domains I use it on. I'll try to get it to generate something and then, by the time I get some functional result, realize I could've done it faster and more effectively myself.
Funny enough, other commenters here hate autocomplete but love chat.
The use of LLMs reminds me a bit of how people use search engines.
Some years ago I gave a task to some of my younger (but intelligent) coworkers.
They spent about 50 minutes searching in google and came back to me saying they couldn't find what they were looking for.
I then typed in a query, clicked one of the first search results and BAM! - there was the information they were unable to find.
What was the difference? It was the keywords / phrases we were using.
I'm not a 'programmer'. At best, I'm a hacker. I don't work in a team. All my code is mostly one-time usage to just get some little thing done, sometimes a bit of personal stuff too. I mostly use Excel anyways, and then python, and even then, I hate python because half the time I'm just dealing with library issues (not a joke, I measured it (and, no, I'm not learning another language, but thank you)). I'm in biotech, a very non code-y section of it too.
LLMs are just a life saver. Literally.
They take my code time down from weeks to an afternoon, sometimes less. And they're kind.
I'm trying to write a baseball simulator on my own, as a stretch goal. I'm writing my own functions now, a step up for me. The code is to take in real stats, do Monte Carlo, get results. Basic stuff. Such a task was impossible for me before LLMs. I've tried it a few times. No go. Now with LLMs, I've got the skeleton working and should be good to go before opening day. I'm hoping that I can use it for some novels that I am writing to get more realistic stats (don't ask).
I know a lot of HN is very dismissive of LLMs as code help. But to me, a non programmer, they've opened it up. I can do things I never imagined that I could. Is it prod ready? Hell no, please God no. But is it good enough for me to putz with and get just working? Absolutely.
I've downloaded a bunch of free ones from huggingface and Meta just to be sure they can't take them away from me. I'm never going back to that frustration, that 'Why can't I just be not so stupid?', that self-hating, that darkness. They have liberated me.
What the author is asking about, a quick sketchpad where you can try out code quickly and chat with the AI, already exists in the JetBrains IDEs. It's called a scratch file[1].
As far as I know, the idea of a scratch "buffer" comes from emacs. But in Jetbrains IDEs, you have the full IDE support even with context from your current project (you can pick the "modules" you want to have in context). Given the good integration with LLMs, that's basically what the author seems to want. Perhaps give GoLand[2] a try.
Disclosure: no, I don't work for Jetbrains :D just a very happy customer.
I disagree about search. While an LLM can give you an answer faster, good docs (e.g. the MDN article in the CSS example) will:
- be way more reliable
- probably be up to date on how you should solve it in latest/recommend approach
- put you in a place where you can search for adjacent tech
LLM with search has potential, but I'd like it if current tools were more oriented towards source material rather than AI paraphrasing.
I’m a hobby programmer who never worked a programming job. Last week I was bored, I asked o1 to help me to write a Solitaire card game using React because I’m very rusty with web development.
The first few steps were great. It guided me to install things and set up a project structure. The model even generated code for a few files.
Then something went wrong: the model kept telling me what to do in vague terms, but didn't output code anymore. So I asked for further help, and now it started contradicting itself, rewriting business logic that was implemented in the first response, giving 3-4 incompatible snippets of the same file, etc., and it all fell apart.
lots of colleagues using copilot or whatever for autocomplete - I just find that annoying.
or writing tests - that's ... not so helpful. The worst is when a lazy dev takes the generated tests and leaves it at that: usually just a few placeholders that test the happy path but ignore obvious corner cases. (I suppose for API tests that comes down to adding test case parameters.)
but chatting about a large codebase, I've been amazed at how helpful it can be.
what software patterns can you see in this repo? how does the implementation compare to others in the organisation? what common features of the pattern are missing?
also, like a linter on steroids, chat can help explore how my project might be refactored to better match the organisation's coding style.
That quartile reservoir sampler example is ... intriguing?
My experience with LLM code is that it can't come up with anything even remotely novel. If I say "make it run in amortized O(1)" then 99 times out of 100 I'll get a solution so wildly incorrect (but confidently asserting its own correctness) that it can't possibly be reshaped into something reasonable without a re-write. The remaining 1/100 times aren't usually "good" either.
For the reservoir sampler -- here, it did do the job. David almost certainly knows enough to know the limits of that code and is happy with its limitations. I've solved that particular problem at $WORK though (reservoir sampling for percentile estimates), and for the life of me I can't find a single LLM prompt or sequence of prompts that comes anywhere close to optimality unless that prompt also includes the sorts of insights which lead to an amortized O(1) algorithm being possible (and, even then, you still have to re-run the query many times to get a useful response).
Picking on the article's solution a bit, why on earth is `sorted` appearing in the quantile estimation phase? That's fine if you're only using the data structure once (init -> finalize), but it's uselessly slow otherwise, even ignoring splay trees or anything else you could use to speed up the final inference further.
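For illustration only - this is not the article's code, just a minimal sketch of the point - keeping the reservoir sorted as items arrive means a quantile query is an index lookup rather than a per-query sort:

import bisect
import random

# Keeps a uniform sample (classic reservoir sampling) in sorted order,
# so quantile queries read an index instead of re-sorting every time.
class ReservoirQuantile:
    def __init__(self, k=1024, rng=None):
        self.k = k            # reservoir size
        self.n = 0            # items seen so far
        self.sample = []      # kept sorted at all times
        self.rng = rng or random.Random()

    def add(self, x):
        self.n += 1
        if len(self.sample) < self.k:
            bisect.insort(self.sample, x)
            return
        j = self.rng.randrange(self.n)   # keep x with probability k/n
        if j < self.k:
            self.sample.pop(j)           # drop one element, chosen uniformly
            bisect.insort(self.sample, x)

    def quantile(self, q):
        if not self.sample:
            raise ValueError("no data yet")
        idx = int(q * (len(self.sample) - 1))
        return self.sample[idx]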
I personally find LLMs helpful for development when either (1) you can tolerate those sorts of mishaps (e.g., I just want to run a certain algorithm through Scala and don't really care how slow it is if I can run it once and hexedit the output), or (2) you can supply all the auxiliary information so that the LLM has a decent chance of doing it right -- once you've solved the hard problems, the LLM can often get the boilerplate correct when framing and encapsulating your ideas.
I’ve been working with Cursor’s agent mode a lot this week and am seeing where we need a new kind of tool. Because it sees the whole codebase, the agent will quickly get into a state where it’s changed several files to implement some layering or refactor something. This requires a response from the developer that’s sort of like a code review, in that you need to see changes and make comments across multiple files, but unlike a code review, it’s not finished code. It probably doesn’t compile, big chunks of it are not quite what you want, it’s not structured into coherent changesets…it’s kind of like you gave the intern the problem and they submitted a bit of a mess. It would be a terrible PR, but it’s a useful intermediate state to take another step from.
It feels like the IDE needs a new mode to deal with this state, and that SCM needs to be involved somehow too. Somehow help the developer guide this somewhat flaky stream of edits and sculpt it into a good changeset.
Essentially, an LLM is a compressed database with a universal translator.
So what we can get out of it is everything that has been written (and publicly released) before translated to any language it knows about.
This has some consequences.
1. Programmers still need to know what algorithms or interfaces or models they want.
2. Programmers do not have to know a language very well anymore to write code, but they have to for bug fixing. Consequently the rift between garbage software and quality software will grow.
3. New programming languages will face a big economic hurdle to take off.
But the question must be asked: At what cost?
Are the results a paradigm shift so much better that it's worth the hundreds of billions sunk into the hardware and data centers? Is spicy autocomplete worth the equivalent of flying from New York to London while guzzling thousands of liters of water?
It might work, for some definition of useful, but what happens when the AI companies try to claw back some of that half a trillion dollars they burnt?
I thought his project, sketch.dev, is of very poor quality. I wouldn't ship something like this - the auth process is awful and broken; I still can't log in. If, 14 hours after the post, the service is still hugged to death, it also means the scalability of the app is bad. If we are going to use LLMs to replace hours of programming, we should aim for quality too.
It seems like everything I see about success using LLMs for this kind of work is for greenfield. What about three weeks later when the job changes to maintenance and iteration on something that's already working? Are people applying LLMs to that space?
I’ll say that the payoff for investing the time to learn how to do this right is huge - especially with Cursor, which allows me to easily pull context (docs, library files, etc.) into the chat.
I've recently started using Cursor because it means I can now write python where two weeks ago I couldn't write python. It wrote the first pass of an API implementation by feeding it the PDF documentation. I've spent a few days testing and massaging it into a well formed, well structured library, pair-programming style.
Then I needed to write a simple command line utility, so I wrote it in Go, even though I've never written Go before. Being able to make tiny standalone executables which do real work is incredible.
Now if I ever need to write something, I can choose the language most suited to the task, not the one I happen to have the most experience with.
That's a superpower.
I have written a small fullstack app over the holidays, mostly with LLMs, to see how far they would get me. Turns out, they can easily write 90% of the code, but you still need to review everything, make the main architectural decisions and debug stuff when the AI can't solve the bug after 2-3 iterations. I get a huge productivity boost and at the same time am not afraid that they will replace me. At least not yet.
Can't recommend aider enough. I've tried many different coding tools, but they all seem like a leaky abstraction over the LLM's medium of sequential text generation. Aider, on the other hand, leans into it in the best possible way.
Currently a lot of my work consists of looking at large, (to me) unknown code bases and figuring out how certain things work. I think LLMs are currently very bad at this and it is my understanding that there are problems in increasing context window sizes to multiple millions of tokens, so I wonder if LLMs will ever get good at this.
I've been doing that for a while as well and mostly agree. Although one thing that I find useful is building the local infrastructure to collect useful prompts and to work with files and URLs. The web interface alone is limiting.
I like gptresearcher and all of the glue put in place to be able to extend prompts and agents etc. Not to mention the ability to fetch resources from the web and do research type summaries on it.
All in all it reminds me the work of security researchers, pentesters and analysts. Throughout the career they would build a set of tools and scripts to solve various problems. LLMs kind of force the devs to create/select tools for themselves to ease the burden of their specific line of work as well. You could work without LLMs but maybe it will be a bit more difficult to stand out in the future.
I've been coding professionally for 30 years.
I'm probably in the same place as the author, using Chat-GPT to create functions etc, then cut and pasting that into VSCode.
I've started using cline which allows me to code using prompts inside VSCode.
i.e. Create a new page so that users can add tasks to a tasks table.
I'm getting mixed results, but it is very promising. I create a clinerules file which gets added to the system prompt so the AI is more aware of my architecture. I'm also looking at overriding the cline system prompt to both make it fit my architecture better and also to remove stuff I don't need.
I jokingly imagine in the future we won't get asked how long a new feature will take, rather, how many tokens will it take.
The search part really resonates with me. I do a lot of odd/unusual/one-off things for my side projects, and I use LLMs extensively in helping me find a path forward. It's like an infinitely patient, all-knowing expert that pulls together info from any and all domain. Sometimes it will have answers that I am unable to find another way (eg, what's the difference between "busy s..." and "busy p..." AT command response on the esp8285?). It saves me hours of struggle, and I would not want to go back to the old ways.
My main usage is in helping me approach domains and tools I don't know enough to confidently know how best to get started.
So one thing that doesn't get a mention in the article but is quite significant I think is the long lag of knowledge cutoff dates: looking at even the latest and greatest, there is one year or more of missing information.
I would love for someone more versed than me to tell us how best to use RAG or LoRA to get the model to answer with fully up to date knowledge on libraries, frameworks, ...
I have been getting more value out of LLMs recently, and the great irony is it is because of a few different packages in emacs and the wonderful CLI LLM chat programming tool 'aider'.
My workflow puts LLM chat at my fingertips, and I can control the context. Pretty much any text in emacs can be sent to a LLM of your choice via API.
Aider is even better, it does a bunch of tricks to improve performance, and is rapidly becoming a 'must have' benchmark for LLM coding. It integrates with git so each chat modification becomes a new git commit. Easy to undo changes, redo changes, etc. It also has a bunch of hacks because while o1 is good as reasoning, it (apparently) doesn't do code modification well. Aider will send different types of requests to different 'strengths' of LLMs etc. Although if you can use sonnet, you can just use that and be done with it.
It's pretty good, but ultimately it's still just a tool for transforming words into code. It won't help you think or understand.
I feel bad for new kids who won't develop the muscle and eye for reading/writing code. Because you still need to read/write code, and can't rely on the chat interface for everything.
> Search. If I have a question about a complex environment, say “how do I make a button transparent in CSS” I will get a far better answer asking any consumer-based LLM, than I do using an old fashioned web search engine.
I don't think this is about LLMs getting better, but about search becoming worse - in no small part thanks to LLMs polluting the results. Do an image search for a few terms and count how many results are AI-generated.
I can say I got better result from Google X years ago vs Google of today.
> There are three ways I use LLMs in my day-to-day programming: 1/ Autocomplete 2/ Search 3/ Chat-driven programming
I do mostly 2/ Search, which is like a personalized Stack Overflow and sometimes feels incredible. You can ask a general question about a specific problem and then dive into some specific point to make sure you understand every part clearly. This works best for things one doesn't know enough about, but has a general idea of how the solution should sound or what it should do. Or, copy-pasting error messages from tools like Docker and have the LLM debug it for you really feels like magic.
For some reason I have always disliked autocomplete anywhere, so I don't do that.
The third way, chat-driven programming, is more difficult, because the code generated by LLMs can be large, and can also be wrong. LLMs are too eager to help, and they will try to find a solution even if there isn't one, and will invent it if necessary. Telling them in the prompt to say "I don't know" or "it's impossible" if need be, can help.
But, like the author says, it's very helpful to get started on something.
> That is why I still use an LLM via a web browser, because I want a blank slate on which to craft a well-contained request
That's also what I do. I wouldn't like having something in the IDE trying to second guess what I write or suddenly absorbing everything into context and coming up with answers that it thinks make a lot of sense but actually don't.
But the main benefit is, like the author says, that it lets one start afresh with every new question or problem, and save focused threads on specific topics.
I think the author is really on the right path with his vision for LLMs as tool for software development. Last week I tried probably all of them with something like a code challenge.
I have to say that I am impressed with sketch.dev; it got me a working example on the first try and it looked cleaner than all the others - similar, but somehow cleaner in terms of styling.
The whole time I was using those tools I was thinking that I want exactly this: an LLM trained specifically on the official Go documentation, or whatever your favourite language is, ideally fine-tuned by the maintainers of the language.
I want the LLM to show me an idiomatic way to write an API using the standard library. I don't necessarily want it to do it instead of me, or to be trained on all the data they could scrape. Show me a couple of examples, maybe explain a concept, give me step-by-step guidance.
I also share his frustrations with the chat-based approach; what annoys me personally the most is the anthropomorphization of the LLMs - yesterday Gemini was even patronizing me...
It seems nice for small projects but I wouldn’t use it for anything serious that I want to maintain long term.
I would write the tests first and foremost: they are the specification. They’re for future me and other maintainers to understand and I wouldn’t want them to be generated: write them with the intention of explaining the module or system to another person. If the code isn’t that important I’ll write unit tests. If I need better assurances I’ll write property tests at a minimum.
If I’m working on concurrent or parallel code or I’m working on designing a distributed system, it’s gotta be a model checker. I’ve verified enough code to know that even a brilliant human cannot find 1-in-a-million programming errors that surface in systems processing millions of transactions a minute. We’re not wired that way. Fortunately we have formal methods. Maths is an excellent language for specifying problems and managing complexity. Induction, category theory, all awesome stuff.
Most importantly though… you have to write the stuff and read it and interact with it to be able to keep it in your head. Programming is theory-building as Naur said.
Personally I just don’t care to read a bunch of code and play “spot the error,” a game that’s rigged for me to be bad at. It’s much more my speed to write code that obviously has no errors in it because I’ve thought the problem through. Although I struggle with this at times. The struggle is an important part of the process for acquiring new knowledge.
Though I do look forward to algorithms that can find proofs of trivial theorems for me. That would be nice to hand off… although simp does a lot of work like that already. ;)
They’re pretty great for printf debugging. Yesterday I was confounded by a bug so I rapidly added a ton of logging that the LLM wrote instantly, then I had the LLM analyze the state difference between the repro and non repro logs. It found something instantly that it would have taken me a few hours to find, which led me to a fix.
> you’re going to have days of tense back-and-forth about whether the cost of the work is worth the benefit. An LLM will do it in 60 seconds and not make you fight to get it done. Take advantage of the fact that redoing work is extremely cheap.
The fast iteration cycle of getting a baseline (but less than ideal or even completely wrong) is a great point here. Redoing the work is fast and easy but still requires review and validation to know how to request the rework to obtain the optimal result.
His experience mirrors mine. I'm happy he explicitly mentions search, when people have been shouting "this is not meant for search" for a couple years now. Of course it helps with search. I also love the tech for producing first drafts, and it greatly lowers the energy and cognitive load when attacking new tasks, like others are repeating on this thread.
I think at the same time, while the author says this is the second most impressive technology he's seen in his lifetime, it's still a far cry from the bombastic claims being made by the titans of industry regarding its potential. Not uncommon to see claims here on HN of 10x improvements in productivity, or teams of dozens of people being axed, but nothing in the article or in my experience lines up with that.
> I could not go a week without getting frustrated by how much mundane typing I had to do before having a FIM model
For those not in the know, I just learned today that code autocomplete is actually a "Fill-in-the-Middle" (FIM) task.
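Roughly, a FIM request sends the text before and after the cursor and asks the model to produce what goes in between; the sentinel tokens in this sketch are made-up placeholders, since each model defines its own:

# The sentinel tokens here are placeholders; each model defines its own.
prefix = "def middle_element(xs):\n    s = sorted(xs)\n    "
suffix = "\n    return mid\n"

fim_prompt = f"<FIM_PREFIX>{prefix}<FIM_SUFFIX>{suffix}<FIM_MIDDLE>"
# a plausible completion the model would return for the gap:
#   mid = s[len(s) // 2]
print(fim_prompt)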
LLM auto-complete is good — it suggests more of what I was going to type, and correctly (or close enough) often enough that it’s useful. Especially in the boilerplate-y languages/code I have to use for $dayjob.
Search has been neutral. For finding little facts it’s been about the same as regular search. When digging in, I want comprehensive, dense, reasonably well-written reference documentation. That’s not exactly wide-spread, but LLMs don’t provide this either.
Chat-driven generates too much buggy/incomplete code to be useful, and the chat interface is seriously clunky.
Interesting. I wonder what the equivalent of sketch.dev would look like if it targeted Smalltalk and was embedded in a Smalltalk image (preferably with a local LLM running in smalltalk)?
I'd love to be able to tell my (hypothetical smalltalk) tablet to create an app for me, and work interactively, interacting with the app as it gets built...
Ed: I suppose I should just try and see where cloud ai can take smalltalk today:
This is a great article with lots of useful insights.
But I'm completely unconvinced by the final claim that LLM interfaces should be separate from IDE's, and should be their own websites. No thanks.
I still find most LLMs to be extremely poor programmers.
Claude will often generate tons and tons of useless code, quickly using up its limit. I often find myself yelling at it to stop.
I was just working with it last night.
"Hi Claude, can you add tabs here.": <div>
<MainContent/>
</div>
Claude will then start generating MainContent.
DeepSeek, despite being free does a much better job than Claude. I don't know if it's smarter, but whatever internal logic it has is much more to the point.
Claude also has a very weird bias towards a handful of UI libraries it assumes you have installed, even if those wouldn't be good for your project. I wasted hours on shadcn UI, which requires a very particular setup to work.
LLMs are generally great at common tasks in a top-5 (by popularity) language.
Ask it to do something in a Haxe UI library and it'll make up functions that *look* correct.
Overall I like them; they definitely speed things up. I don't think most experienced software engineers have much to worry about for now. But I am really worried about juniors. Why hire a junior engineer when you can just tell your seniors they need to use Copilot to crank out more code?
This is almost exactly how I've been using LLMs. I don't like the code completion in the IDE, personally, and prefer all LLM usage to be narrow, specific blocks of code. It helps as I bounce between a lot of side projects, projects at work, and freelance projects. Not to mention with context switching it really helps keep things moving, imo.
I've maintained several SDKs, and the 'cover everything' approach leads to nightmare dependency trees and documentation bloat. imo, the LLM paradigm shifts this even further - why maintain a massive SDK when users can generate precisely what they need? This could fundamentally change how we think about API distribution.
Does anyone have a good recommendation for a local LLM for autocompletion?
Most editors I use support online LLMs, but they're sometimes too slow for me.
Interesting that he had the same thought initially as I did (after running a model myself on my own hardware) : this is like the first time I ran a traceroute across the planet.
Funny, he starts off dismissing an AI IDE only to end up building an AI IDE :D (Smells a little bit like not-invented-here syndrome.) Otherwise a fascinating article!
This lines up well with my experience. I’ve tried coming at things from the IDE and chat side, and I think we need to merge tooling more to find the sweet spot. Claude is amazing at building small SPAs, and then you hit the context window cutoff and can’t do anything except copy your file out. I suspect IDEs will figure this out before Claude/ChatGPT learn to be good enough at the things folks need from IDEs. But long-term, i suppose you don’t want to have to drop down to code at all and so the constraints of chat might force the exploration of the new paradigm more aggressively.
Hot take of the day, I think making tests and refactors easier is going to be revolutionary for code quality.
The more experienced the engineer, the less CSS is on the page. This seems to be a universal truth. I want to learn from these people - but my goodness, could we at least use margins to center the content?
Since all these AI products just put together things they pull from elsewhere, I'm wondering if, eventually, there could be legal issues involving software products put together using such things.
Can’t we just use test-driven development with AI Agents?
1) Idea
2) Tests
3) Code until all tests pass
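A rough sketch of that loop, with the tests (step 2) assumed to already exist on disk and generate_code standing in for whatever model or agent API you use (not a real library call):

import subprocess

def generate_code(prompt: str) -> str:
    raise NotImplementedError  # placeholder: call your model/agent of choice here

def tdd_loop(spec: str, max_rounds: int = 5) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        code = generate_code(f"Spec:\n{spec}\n\nTest failures so far:\n{feedback}")
        with open("impl.py", "w") as f:
            f.write(code)
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True              # step 3: stop once all tests pass
        feedback = result.stdout     # feed failures back into the next prompt
    return False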
Does anyone know of any good chat-based UI builders? No, not for building a chat app.
Does webflow have something?
My problem is being able to describe what I want in the style I want.
LLMs are, at their core, search tools. Training is indexing and prompting is querying that index. The granularity being at the n-gram rather than the document level is a huge deal though.
Properly using them requires understanding that. And just like we understand that not every query will find what we want, neither will every prompt. Iterative refinement is virtually required for nontrivial cases. Automating that process, as e.g. Cursor's agent does, is very promising.