I have an alternate theory: about 10% of developers can actually start something from scratch because they truly understand how things work (not that they always do it, but they could if needed). Another 40% can get the daily job done by copying and pasting code from local sources, Stack Overflow, GitHub, or an LLM—while kinda knowing what’s going on. That leaves 50% who don’t really know much beyond a few LeetCode puzzles and have no real grasp of what they’re copying and pasting.
Given that distribution, I’d guess that well over 50% of Makefiles are just random chunks of copied and pasted code that kinda work. If they’re lifted from something that already works, job done—next ticket.
I’m not blaming the tools themselves. Makefiles are well-known and not too verbose for smaller projects. They can be a bad choice for a 10,000-file monster—though I’ve seen some cleanly written Makefiles even for huge projects. Personally, it wouldn’t be my first choice. That said, I like Makefiles and have been using them on and off for at least 30 years.
> That leaves 50% who don’t really know much beyond a few LeetCode puzzles and have no real grasp of what they’re copying and pasting.
Small nuance: I think people often don’t know because they don’t have the time to figure it out. There are only so many battles you can fight during a day. For example if I’m a C++ programmer working on a ticket, how many layers of the stack should I know? For example, should I know how the CPU registers are called? And what should an AI researcher working always in Jupyter know? I completely encourage anyone to learn as much about the tools and stack as possible, but there is only so much time.
I like Makefiles, but just for me. Each time I create a new personal project, I add a Makefile at the root, even if the only target is the most basic of the corresponding language. This is because I can't remember all the variations of all the languages and frameworks build "sequences". But "$ make" is easy.
I would just change the percentages, but is about as true as it gets.
> That leaves 50% who don’t really know much beyond a few LeetCode puzzles and have no real grasp of what they’re copying and pasting.
Who likely wouldn't have a job if it weren't for LLMs.
pretty sure we've made this complaint about a subset of developers since way before chatgpt and the like.
And that happens not only with developers, but in any profession, which gives me shivers when I go to the doctor!
I think Makefile is maybe the wrong analogy - the problem with most people and makefiles is they write so few of them, the general idea of what make does is at hand, but the muscle memory of how to do it from scratch is not.
But, point taken - I've seen so much code copy-pasta'd from the web, there will be like a bunch of dead stuff in it that's actually not used. A good practice here is to keep deleting stuff until you break it, then put whatever that was back... And delete as much as possible - certainly everything you're not using at the moment.
This is exactly the problem I face with many tools, Makefiles, KVM setups, docker configurations, CI/CD pipelines. My solution so far has been to create a separate repository with all my notes, shell script example programs etc, for these tool, libraries or frameworks. Every time I have to use these tools, I refer to my notes to refresh my memory, and if I learn something new in the process, I update the notes. I can even point an LLM at it now and ask it questions.
The repository is personal, and contains info on tools that are publicly available.
I keep organisation specific knowledge in a similar but separate repo, which I discard when my tenure with a client or employer ends.
What if your client comes back?
On a more practical note, what structure, formats and tools do you use that enable you to feed it to an LLM?
I'm usually contractually obligated to destroy all client IP that I may posses at the end of an engagement. My contracts usually specify that I will retain engagement specific information for a period of six months beyond the end of the contract. If they come back within that time, then I'll have prior context. Otherwise it's gone. Occasionally, a client does come back after a year or two, but most of the knowledge would have been obsolete and outdated anyway.
As for LLMs. I have a couple of python scripts that concatenate files in the repo into a context that I pass to Google's Gemini API or Google AI studio, mostly the latter. It can get expensive in some situations. I don't usually load the whole repository. And I keep the chat context around so I can keep asking question around the same topic.
At my work I've noticed another contributing factor: tools/systems that devs need to interact with at some point, but otherwise provide little perceived value to learn day-to-day.
Example is build system and CI configuration. We absolutely need these but devs don't think they should be expected to deal with them day to day. CI is perceived as a system that should be "set and forget", like yeah we need it but really I have to learn all this just to build the app? Devs expect it to "just work" and if there are complexities then another team (AKA my role) deals with that. As a result, any time devs interact with the system, there's a high motivation to copy from the last working setup and move on with their day to the "real" work.
The best solution I see is meet the devs halfway. Provide them with tooling that is appropriate simple/complex for the task, provide documentation, minimise belief in "magic". Tools like Make kinda fail here because they are too complex and black-box-like.
For me the big problems with CI setups tend to be:
- They're often slow
- They're often proprietary
- They're often dealing with secrets which limits who can work on them
- You generally can't run them locally
So the feedback cycle for working on them is incredibly long. And working on them is therefore a massive pain.
> You generally can't run them locally
I recognize that this is such a disincentive for me taking the initiative to fiddle with and learn about anything like this
The office coffee machine is not „set and forget”, but you wouldn’t expect the entire responsibility for it’s maintenance to be evenly distributed between all people that use it. Similarly, CI needs ownership and having it fall on the last developer that attempted to use it is not an efficient way of working.
Yeah, I think this is the real issue. Too many different tool types that need to interact, so you don't get a chance to get deep knowledge in any of them. If only every piece of software/CI/build/webapp/phone-app/OS was fully implemented in GNU make ;-) There's a tension between using the best tool for the job vs adding yet another tool/dependency.
I have made conscious effort in the past to never copy/paste the initial fleshing-out of a Makefile or a PHP class, or HTML boilerplate, or whatever. Like, for years I stuck to that. Then I stopped making that effort because there is no upside. Or rather, there is no downside to copy+paste+modify. It's faster and you save your brain power for things that actually matter.
If I had a nickel for every time I have seen a Makefile straight up copied from other projects and modified to "work" while leaving completely unrelated unnecessary build steps and targets in place.
It's a major pet peeve of mine.
How do you know what is and isn't related if nothing is documented?
Trial and error?
Well have fun with that :p
Exactly. Bonus points if the person who started the project moved on and you have to be the one to build and maintain it.
Okey but to me, copying - pasting working code (even with sone extra unused bits) really looks no more different than inheriting a library - provided base class, and then extending it to one's needs.
That's literally the basis of all software. There is no need to invent "a Makefile effect/syndrome"
Yes that's an indication that a code sharing mechanism is needed but not implemented. Copying pasting solves that. You don't expect people to rewrite http client for every project which interacts with APIs, so you?
I think this is a good point. As somewhat of a tangent I have vaguely been thinking of the difference between copy pasting and explicitly extending for a bit.
It seems that in many cases, adapting copy pasted code has some benefits over importing and adjusting some library code. https://ui.shadcn.com/ is an example of going the copy paste direction. It seems to me this is preferable when tweaking the exact behaviour is more important than keeping up to date with upstream or adhering to an exact standard. If you customize the behaviour a lot the extra abstraction layer only gets in the way.
This insight might be a bit mundane. But I remember myself bending over backwards a bit too much trying to reuse when copy pasting is fine.
Is not this a very generic phenomenon? I would argue it applies broadly. For example budgeting, you usually start from last year's budget and tweak that, rather than start from scratch. Or when you write an application letter, or a ServiceNow ticket, or whatever. Now I regret that I have brought in ServiceNow in the discussion, it kills the good mood....
There's also "zero based budgeting" (ZBB) that starts from zero and says "justify everything".
Which in my experience sometimes involves copying last years justifications
It might!
But as I understand it and I am not an accountant (IANAA?), for non-ZBB budgets last years budget is usually used as a starting point and increases are justified.
"Here's why I need more money to do the same things as last year, plus more money if you want me to do anything extra".
I'd be curious what our man Le Cost Cutter Elon Musk does for budgeting?
I think LaTeX is the poster child of this. Nobody writes a LaTeX preamble from scratch, you always copy your previous document and tweak it.
Don't do that! If you're always using the same preamble, you should turn it into a .sty file. Then the preamble of new documents is just
\usepackage{myessay}
I use Typst now instead and wrote the "preamble" from scratch. (Because Typst is that much less annoying than LaTeX)
This is “Copy-Pasta Driven Development” [0] and it’s not even related to makefiles. It’s related to the entire industry copying code from here to there without even knowing what they are copying.
TBH I think copilot has made this even worse, as we are blindly accepting chucks of code into our code bases.
[0] https://andrew.grahamyooll.com/blog/copy-pasta-driven-develo...
I have observed the Makefile effect many times for LaTeX documents. Most researchers I worked with had a LaTeX file full of macros that they have been carrying from project to project for years. These were often inherited from more senior researchers, and were hammered into heavily-modified forks of article templates used in their field or thesis templates used at their institution.
This is a great example of an instance of this "Makefile effect" with a possible solution: use Markdown and Pandoc where possible. This won't work in every situation, but sometimes one can compose a basic Beamer presentation or LaTeX paper quickly using largely simple TeX and the same Markdown syntax you already know from GitHub and Reddit.
> use Markdown and Pandoc where possible.
That won’t solve any problem that LaTeX macros solve. Boilerplate in LaTeX has 2 purposes.
The first is to factor frequently-used complex notations. To do this in Markdown you’d need to bolt on a macro preprocessor on top of Markdown.
The second one is to fine-tune typography and layout details (tables are a big offender). This is something that simply cannot be done in Markdown. A table is a table and if you don’t like the style (which is most of the time inadequate) then there is no solution.
A much better solution would be to use Typst, but that still might not work in all situations.
> the tool (or system) is too complicated (or annoying) to use from scratch.
Or boring: some systems require boilerplate with no added value. It's normal to copy & paste from previous works.
Makefiles are a good example. Every makefile author must write their own functionally identical "clean" target. Shouldn't there be an implicit default?
C is not immune, either. How many bits of interesting information do you spot in the following excerpt?
#include <stdio.h>
int main(int argc, char **argv)
{
printf("Hello\n");
return 0;
}
The printf alone is the real payload, the rest conveys no information. (Suggestion for compiler authors: since the programs that include stdio.h outnumber those that don't, wouldn't it be saner for a compiler to automatically do it for us, and accept a flag to not do it in those rare cases where we want to deviate?)> Makefiles are a good example. Every makefile author must write their own functionally identical "clean" target. Shouldn't there be an implicit default?
At some point you have to give the system something to go on, and the part where it starts deleting files seems like a good one where not to guess.
It's plenty implicit in other places. You can for example, without a Makefile even, just do `make foo` and it will do its best to figure out how to do that. If there's a foo.c you'll get a `foo` executable from that with the default settings.
> since the programs that include stdio.h outnumber those that don't
I don't think that is true. There is a lot of embedded systems C out there, plus there are a lot of files in most projects, and include is per file not per project. The project might use stdio in a few files, and not use it in many others.
more implicit behaviors more surprises, like security bugs because default functionality or conversions happen
> The printf alone is the real payload, the rest conveys no information.
What are you talking about? Every line is important.
#include <stdio.h>
This means you need IO in your program. C is a general purpose language , it shouldn't include that unless asked for. You could claim it should include stuff by default, but that would go completely against what C stands for. Code shouldn't have to depend on knowing which flags you need to use to compile successfully (at least not in general like this). int main(int argc, char** argv)
Every program requires a main function. Scripting languages pretend they don't, but they just wrap all top-level code in one. Having that be explicit, again, is important for a low level language like C. By the way, the C standard lets you declare it in a simplified manner: int main(void)
Let's ignore the braces as you could just place them on the same line. printf("Hello\n");
You could just use `puts` here, but apart from that, yeah that's the main payload, cool. return 0;
The C standard actually makes this line optional. Funny but I guess it addresses your complaint that "common stuff" perhaps should not be spelled out all the time?So, here is the actual minimalist Hello world:
#include <stdio.h>
int main(void) {
puts("Hello world\n");
}
> wouldn't it be saner for a compiler to automatically do it for us
no
Copy+tweak happens IRL all the time. There's no reason everyone who bakes should have to reinvent biscuits from scratch. There's no reason chip manufacturers should have to reinvent N-type P-type sandwiches from scratch. The existence of adaptations of previous success does not suggest that baking, or physics, or Make, is overly complicated.
Same with programming: You just copy some old code and modify it, if you have something lying around.
Same with frameworks (Angular, Spring Boot, ...). The tools even come with templates to generate new boilerplate for people who don't have existing ones somewhere.
“Just modify someone else’s Borgmon code and you’re good to go.”
— Broccoli Man <https://www.youtube.com/watch?v=3t6L-FlfeaI>
I see this effect in Java Maven pom.xml files. It's hard to get a straightforward answer on why each build step is needed, what each attribute means, what parts are optional or mandatory, etc. There seems to be a culture of copying these XML files and tweaking a few things without truly understanding what the whole file means. I briefly looked at Ant and Gradle, and their ecosystems don't look any better. The build configuration files seem to have too much unexplainable magic in them.
Java would really benefit from a fresh take on the build story. Maven is definitely a tool that suffers from this phenomenon.
Imo, the only solution is to avoid boilerplate generators and the parent poms projects like spring boot use for things like pom files: you can look at the boilerplate to get ideas for what might be necessary, but, if you’re starting a project, write the pom yourself. It’s a pain the first couple times, but it gets easier to know what you need.
> I briefly looked at …Gradle… The build configuration files seem to have too much unexplainable magic in them.
This is largely due to the use of groovy. When the Kotlin DSL is used instead, it can usually be introspected by (eg) IntelliJ. Otherwise, it’s pretty opaque.
Bullshit. Groovy can be introspected just as well as Kotlin. And the magic in kts files is still there:
configure<SourceSetContainer> {
named("main") {
java.srcDir("src/core/java")
}
}
Unless you know this, there's zero way you will come up with this by typing `configure` and using just auto-completion. Might as well use Groovy and a String for the name of the thing you're configuring. Good tooling would be able to auto-complete from there whether it's Groovy or Kotlin (or Java etc).Honestly for Java I really like Bazel. You should give it a shot. I have a project with a self contained jvm and jars from maven central. Its more explicit than the other options but way less magical IMO.
Most of my simple C projects have make.sh instead that has something like:
clear
gcc some options -o foo && ./foo
You might benefit from make, as you wouldn't need a full rebuild every time, or have to spell out every step.
This only happens because people treat build code at a lower standard than app code. IMO you should treat all code with the same rigour. From build scripts to app code to test code.
Why write hacks in build tools when you wouldn’t do in your app code.
We build tool code with the same quality as the app code. That’s why most tooling we use are written in typescript: type safety, code reuse…
I would argue the main reason is that Make is just bad. There are easier to use alternatives such as scons or rake that don't have this effect applied to them.
I think this is completely normal for tools that you program seldomly. I write makefiles a couple of times a year, I've been using make for more than 40 years now, I use it every day, but I seldomly program it, and when I want something more than simple dependancies I often clone something that already works.
What is a counterexample?
I cannot think of a single tool which is complex enough but does not show the makefile effect
We use Cargo Cult to refer to this phenomenon: https://en.wikipedia.org/wiki/Cargo_cult_programming
At first I couldn't understand what this article is saying. Then, .SECONDEXPANSION: kicked in!
Old IBM mainframe scripting in JCL https://en.wikipedia.org/wiki/Job_Control_Language (so "OS JCL" now, I suppose) used to have a terrible reputation for this, but I've never actually touched the stuff myself.
Good points in general.
On the other hand, there are cases where (beneficial/desired) verbosity prompts copy-paste and tweaking - not due to complexity but from some form of scale or size of the input.
In many cases this is a sign of something that should be dynamic data (put it in a db instead of conf) but that's not always the case and worth the tradeoff in the moment.
I feel this way every time with webpack, npm and babel.
amazon's internal build tool experiences this same phenomena. engineers are hired based on their leetcode ability; which means the average engineer has gaps in their infrastructure and config tool knowledge/skillset. until the industrys hiring practices shift, this trend will continue.
As an undergrad, I did group projects with people who quite literally could not compile and run any actual project on their system outside of a pre-packaged classwork assignment, who essentially could not code at all outside of data structure and algorithm problem sets, who got Google internships the next semester.
But they were definitely brighter than I when it came to such problem sets. I suppose we need both sorts of engineer to make great things
I call it the yoghurt effect.
Alias, cmake effect.
Let's keep "cmake effect" for "trying your best to not have to touch the language and repeatedly looking for something else whenever you do"
It's a very microsoft feeling pile of crap
A clever point, worth discussing
A better name for this might be the JCL effect, as even experienced mainframe sysprogs copypasta the JCL it takes to build their COBOL programs from a known-good example and then mutatis the mutandis, rather than attempt to build a mental model of how JCL works from the impenetrable documentation and write JCL de novo.
It's no big deal to me to write a small Makefile from scratch. My editor (Emacs) even knows to always use tabs when I hit TAB in a Makefile, removing the confusion of whether I inserted tabs (correct) or spaces (horribly incorrect) on the lines with the commands to build a particular target.
It’s like a variant of Pournelles Law.
or ahem... CMake
What is the counter-example?
We recycle known good stuff to avoid getting bogged down and introducing fresh flaws.
The admonition to know what we're doing and act deliberately applies to so much in life, but flies in the face of Milton Friedman's point in "I, Pencil" => https://youtu.be/67tHtpac5ws?si=yhheE1Y5ELfjWXs-
Setup.py, cron, makefile, bash scripts, GitHub actions, and devcontainers all had this effect... Until AI came around.
Now AI does a great job of getting you 90-100% of the way there.