It's problems like these (mistaking a sentinel value for a valid member of a type) that make me think sentinel values are just a mistake. In-band signalling must have been responsible for so many bugs at this point.
At the risk of being that guy who always brings up rust:
enum Process {
Invalid,
Terminated,
Pid(libc::pid_t),
}
There would be no way you could pass instances of the first two enum variants to kill(). Rust is of course not the only language where you can express something like this; e.g. in Scala2: sealed trait Process
object Process (
case object Invalid extends Process
case object Terminated extends Process
case class Pid(pid: Int) extends Process
}
Does D have algebraic data types? Only allowing enums to be represented by integers is a language weakness.It's hard though. I mean, it's easy to say you shouldn't do this. But that becomes isomorphic to "You shouldn't have NULL". And... that's hard, because you really need NULLs for a lot of problems. Or "You shouldn't have error code returns from syscalls", when you really need those.
It's possible to split it out, but then everyone complains about the result. We thought exceptions were going to fix the error code problem, but then we collectively decided we hate them. OSes and static analyzers and instrumentation tools like valgrind/asan/msan have done a really good job of turning NULL dereference into a built-in assertion everywhere, but people won't use them.
Even Rust, which has probably come closest to a genuine solution, has done so only at the cost of immense complexity and somewhat fugly syntax. No free lunch.
I think what's notable here isn't the bug itself, which is pretty routine. It's the fact that simple bugs can be buried so deeply that they get exposed only in a six-hour-long test rig on one particular Linux distribution's backend.
The answer is to make enums explicit e.g. malloc should return either a valid allocation or an error, but you can't easily do that in C. It takes a richer/better type system to make this convenient to use and safe from accidents. Depending on your language and compiler the in memory representation for a nullable and a non-nullable pointer can be the same, but a well designed type system can prevent fallible human programmers from confusing one for the other. Add some destructuring pattern matching to the language the usability improves instead of suffers. Rust added a whole lot more which made the syntax very noisy.
> Even Rust, which has probably come closest to a genuine solution
Haskell Maybe
Scala Option
Java Optional
These things have existed for over a decade (multiple, in the case of Haskell).
All of these didn't just give you something equivalent to C malloc()/free() with a parameter and result type. Maybe monads are useful in a lot of cases, but require a safe type system that prevents accidentally/negligently ignoring errors and language features to make it easy to unwrap the enum variants e.g. destructuring pattern matches and a collections of functions to work with them. I know both Rust and Haskell have those and suspect the other two examples as well. Without them they would be a pain in the ass to use.
I think that’s the point though. This is an area where C just isn’t expressive enough and other languages have come in with a solution to fix that.
The question isn't the ability to handle errors, though. You can do it trivially in C too. It's whether or not the design chosen to permit that has benefits that outweigh its costs.
And as I said I think the jury is very much still out on that one. Again, we tried to do this with exceptions in the early 90's and whiffed badly. Doing it at the validation/assertion/fuzzing/coverage layer with instrumentation tools turns out to be really effective in practice but the uptake remains really low.
Today the mindscape points to Rust's Option as the saviour, which is why I listed it. But as you point out, all these other environments had it much earlier. And... kinda failed. So I'm not hopeful.
In-band sentinels like NULLs and error codes are (1) clear and obvious, (2) near-zero-overhead and (3) extremely useful. Everything else falls down on at least one of those requirements.
> And... kinda failed.
Huh?
I suppose maybe the Java one is dubious(?), but the others were very happily adopted by the ecosystem over sentinel values.
Rust's answer is its type system, but Rust is hardly the only language that can do this.
If a syscalls returns an fd, and on error it returns a negative value, why not define it as a bitfield with one bit a flag? Then in C there'd at least be a field name you could grep for, or grep for it not being there.
Let's steelman this and ignore the portability issues C has with bitfields on ABIs. Many processors don't have a BFI instruction, so working with a bitfield will generate multiple instructions. At the scale of a large program, this can easily result in measurable code bloat and significant (single digit percent) performance overheads on the happy path.
This is actually a discussion that's been going on in the systems programming community for decades. There's a cultural aversion to the runtime costs that's always outweighed any hygiene advantages.
What's the difference between checking the highest bit is set in a struct's bitfield, and checking the highest bit is set when that bit is part of an integer?
Back in the day, there also would have been one's complement systems where there was no sign bit and checking a negative value makes much more sense than "whatever the highest bit is" in that. Thankfully those are mostly dead and gone.
The sign bit has much more defined semantics regardless. You can sign extend it. It's guaranteed to be the MSB, rather than dependent on how your compiler does struct layout and every ABI will agree on where it lives. You don't have to worry about a mismatch between the language semantics of the payload and how any particular ISA treats the "integer". Etc.
> The sign bit has much more defined semantics regardless.
But which of those semantics do we need for conveying which of two states something is in?
If the Linux kernel recommended a more rich type for expressing the result of a syscall, compilers would find a way to express that portably.
Or they wouldn't recommend it. Or compilers wouldn't make it portable. Or libc implementors won't use it. Or lots of other possibilities. And we keep living in the world where we kill a process (often init) when we forget to check a syscall result, and pass fd-or-error somewhere that takes an fd, because those things are both integers.
The standard Linux process management API is a total mess IMO. It doesn't even offer a way to avoid TOCTOU. They've made some improvements recently (https://lwn.net/Articles/794707/) but as far as I know you still can't implement something like pstree without just reading files from `/proc` which is shit.
The problem here wasn't a case of TOCTOU; rather, the program was using a magic constant value that was outside the range of getpid() but which wasn't outside of the domain of kill(2)'s first parameter.
Proper enum types would have solved this problem; e.g. in Rust
enum ProcessHandle {
Pid(pid_t),
Invalid,
Terminated
}
There is no way to pass a Terminated or Invalid process handle to kill, since there is no numeric identifier for it anymore. And, any code that wants to get a PID from a ProcessHandle has to explicitly handle non-PIDs - making it impossible to forget to deal with these special cases.While correct, everything is a number at the ABI boundary, so it’s still possible - if not harder.
Depending on the layout and arch, -2 could well represent Terminated
I think this misunderstands how Rust enums work. The discriminator is separate from the payload, unless niches are involved (they probably wouldn’t be here - note that -2 is a valid pid_t). You cannot go and pass a Rust enum to any FFI function directly (without extra annotations, at least); you’d have to specifically extract the pid_t from the enum before passing it to an FFI function like kill(). And you can only obtain a pid_t from the Pid variant; if the enum holds a different variant, the compiler will stop you from trying to pull out a pid_t.
I understand how Rust enums work, but you’re misunderstanding how ABIs work (or my original point, or both).
Rust enums are a Rust concept. Two identical enums will have different memory layouts on different machines, Rust versions, etc. Even if this was solved, you’re still passing the enum across a boundary. Both sides need to agree on the layout of the enum in memory.
To illustrate the problem, imagine we had two variants: `Pid(NonZeroUsize), LaunchTheNukes`
This can be laid out in memory with a single `usize`: the 0 value becomes the discriminate for `LaunchTheNukes`.
Thus, if you pass `0` (I.e null) across the boundary the other side will interpret it as `LaunchTheNukes`.
In short, if you naively change the layout of an enum and don’t recompile everything that uses the enum (i.e the kernel) then your -2 value could change from being interpreted as `Pid(-2)` in the application to `LaunchTheNukes` in the kernel.
It gets more complex with larger enums and more discriminators, but the general point is that the concepts of enums, type safety and such doesn’t really exist across boundaries like syscalls, because it doesn’t exist in hardware / at the machine level.
Yeah I know, that was just another example of how bad the API is.
UNIX/POSIX process management.
Heh, "even". BSD process groups were around for two decades at least before anyone even thought of the idea of a TOCTOU vulnerability, much less coined an acronym for it. You're just asking for too much here. Yes, it's an API you probably shouldn't use. No, it's not going away.
The solution for this particular problem is cgroups. The build system just wasn't using it.
What inherent TOCTOU vulnerability exists with process groups? Process groups are the way to shoot down a collection of processes atomically in Unix. getpgrp returns the current processes' process group, and killpg(pgrp) sends a signal atomically; so long as the process doesn't switch process groups between those two calls (and using normal APIs only the process can do this itself), there's no race. (IOW, there's no race in `killpg(getpgrp(), signo)`) There are other ways, too, like using SIGHUP delivered when a controlling TTY is closed (not strictly POSIX, but reliable semantics across all the major extant Unix-like systems).
cgroups also allows killing a group of processes race-free, but Linux only got this feature (cgroup.kill) in 2021.[1] Before then even systemd would just walk the cgroup PID list and send a signal to each individual process, which was inherently racy. If this was done from PID 1--init--this could maybe be done correctly by ensuring zombie's weren't reaped--preventing PID recycling--while walking the PID list. OTOH, Linux supports a prctl, PR_SET_CHILD_SUBREAPER, to change a reaper, so perhaps even systemd could never guaranteed to not accidentally shoot down an innocent process.
[1] Merged https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux... I'm not sure what the subsequent release history looks like, but I wouldn't be surprised if some popular LTS distro versions lack this capability.
I suspect the upstream concern was dealing with services that themselves like to spawn process groups. There's no hierarchy of process groups, once you call setpgid() you've escaped the sandbox and the API can't help your parents manage you anymore. The use case in question is a build system that by its nature needs to deal with foreign software.
Also FWIW, you could freeze a cgroup atomically in the very earliest versions and deconstruct it at leisure. You just didn't do it by sending signals.
With TOCTOU, the solution is often just to try to use the resource and handle errors, not to assume that since you checked it, it's available.
PID reuse is the classic example here. There is no neat solution other than process handles, which Unix/Linux does not historically have.
Aren't process handles still an integer under the hood in the end? The solution is presumably the same - use enough bits.
What makes process handles different is that they have a lifetime of their own that is distinct from the lifetime of the process - even if the process exits, the handle still refers to that now-dead process, not to some random new process, until you release the handle.
But if it's still an integer (that you can't see) then it's still subject to reuse to the exact same extent that an integer is or isn't. It's entirely possible to wrap your internal integer at 16 bits and experience handle collisions. Alternatively you are free to use full 64 bit integers on Linux (at least IIUC).
If you can architect a system that never experiences handle collision then quite trivially you can architect an equivalent system that never experiences PID reuse.
You're missing the point. No, handles are not subject to such a problem because their lifetimes are explicit and distinct from the underlying resource. E.g. in Win32, if you have a process handle, it will not ever refer to a process different from the one that it was originally created for until you specifically call CloseHandle on it. The OS will retain that handle in its table and will not reuse it for any other process.
And yes, of course it would be trivially easy to do the same in Linux. Except for that whole part where it would break all the existing code relying on POSIX semantics.
[dead]
That only works if you know the resource in advance. Doesn't work if for example you're trying to delete files matching a glob and you have to enumerate them by name and then delete them by name.
The filesystem API offers a solution for that. The process API doesn't (maybe with procfd; I haven't tried that API yet since it's so new).
I'm not sure that counts as TOCTOU?
What I'm saying is that instead of writing code like:
for fname in someglob:
if os.path.exists(fname):
os.unlink(fname)
You should instead write for fname in someglob:
os.unlink(fname)
and handle the errors.This isn't a TOCTOU problem. This is a problem of using sentinel values to mean other things, when those sentinel values won't ever get returned by getpid(), but kill() will accept those values and do something you didn't intend.
The fix was to check if the PID variable was set to one of the sentinel values, and not call kill() at all if that's what they are.
> It doesn't even offer a way to avoid TOCTOU.
What do you mean? I admit there are cases where it's hard to avoid TOCTOU, but I can't think of any where it's impossible.
For example to kill all of the children of a process you first have to read all of the files in `/proc` to build a process tree, and then send your signals, by which time the process tree may have changed.
Yeah, as far as I know the only way to be sure is by putting the parent process into a cgroup. Then to kill all of the child processes you have to freeze the cgroup, enumerate the pids, send sigterm/sigkill to all of them, before unfreezing again.
Doesn't the new pidfd API you linked give you a race-free way to do that?
Maybe. Tbh I was trying to do something like that on RHEL8 and I think it's too old for procfd.
As long as you aren't trying to reimplement `systemd` badly, you really shouldn't have to deal with this.
"As long as you aren't trying to use the flawed API you won't run into any of its flaws."
Ok then.
More like "if you empty a safe-deposit box that doesn't belong to you, don't be surprised if random people start complaining about their goods being stolen."
Ownership matters for more than just memory. And the "just use GC" answer means deadlocks and leaks galore. If you limit yourself to killing processes you own, you will never have a problem.
So "you're holding it wrong" then? I don't get why you wouldn't want to have a reliable well-designed API for process management?
We already have one. What we don't have is an API for process pillaging.
Think about it - if you aren't init, what exactly is too limiting about "kill the processes or process groups of the children you created yourself"? As a rule, most grandchildren won't bother to enter a new process group, and if they do their immediate parent should be advanced enough to take responsibility for forwarding fatal signals.
Who needs handles for processes when you can have reusable IDs instead? /s
For a while the limit has been around 4 million or something like that, instead of the traditional 32k. Which of course broke a bunch of stuff that assumed "char pid[5];" is good enough. Fun stuff.
The kernel offers up this (here abbreviated) example which you could use for all accesses to /proc/<pid> when you have an open pidfd to ensure that the two refer to the same process:
static int pidfd_metadata_fd(pid_t pid, int pidfd) {
char path[100];
snprintf(path, sizeof(path), "/proc/%d", pid);
int procfd = open(path, O_DIRECTORY | O_RDONLY | O_CLOEXEC);
if (sys_pidfd_send_signal(pidfd, 0, NULL, 0) < 0 && errno != EPERM) {
close(procfd);
procfd = -1;
}
return procfd;
}
And these days you even can poll for pidfds (readable = exited) and you can even use P_PIDFD with waitid. So the very basic process management cycle can be done through pidfds nowadays.> as far as I know you still can't implement something like pstree without just reading files from `/proc` which is shit.
It's also slow! Tools like top/htop/ps -aux etc. use a surprising amount of CPU because they have to do a trillion trips to the kernel and back. Compare this to the win32 equivalent where, iirc, there is just one or two trips to the kernel to acquire a snapshot and you walk that in user-space instead.
char path[100];
snprintf(path, sizeof(path), "/proc/%d", pid);
You're assuming the pid will fit in 93 bytes! The exact same mistake as `char pid[5];`Jokes aside, sometimes userspace deserves to be broken, and `char pid[5]` is IMO one of those cases. The fact that we treat integers as a scarce reusable resource is bad, and the fact that we now have a second-order process identifier to cover that up is worse.
Fun fact that broken code was in the runtime used by a major Fortran compiler
Excellent writeup!
Some observations unrelated to the bug.
1. Nice to see some details about the Debian build server implementation. I have always found it rather obscure. Compared to that OpenSUSE's OBS is refreshingly open. Not only can you just use their cloud service for free, but you can also rather easily set up your own instance.
2. Nice to see that the Debian build infra gets updated. I am a bit suprised that Perl code gets introduced in 2024. Isn't that a bit out of fashion for a reason: It's typically very hard to read code written by others. (As said unrelated. It did not cause the bug in TFA.)
I believe installing https://packages.debian.org/sid/sbuild-debian-developer-setu... will get you most of the way (but I've only ever done manual builds of packages, so I may be wrong).
Are you surprised that programs can live for 10 or 20 years?
Most people who want to rewrite things in a language that's more fashionable don't have the experience maintaining working systems over 20+ years, like Debian does
I'll just pitch in as someone who's worked with several 40+ year old codebases and 3+ year old perl, the perl code is significantly harder to maintain. The language lends itself to unreadable levels of terseness and abuse of regex. Unless you use perl every day, the amount of time trying to understand the syntax vs the code logic is skewed too heavily towards obscure syntax. Even the oldest fortran/c is much easier to work with.
Except maybe arithmetic gotos designed to minimize the number of functions. Those are straight evil and I'm glad no modern language supports them.
Sure, Perl can be hard to maintain. (I haven't used it in over 25 years, and don't expect to ever use it again)
From your experience maintaining 40+ year old codebases, is it possible that improving the Perl codebase is the best choice in a certain situation?
Or should we always be "surprised" that Perl exists? (honest question)
In other words, I don't see the connection between "this code is hard to maintain" and "I am surprised that it exists" / "it should be rewritten".
In general, I tend to observe languages that are minimally challenging for amateurs tend to build less sustainable mutagenic ecosystems.
Example:
* Prolog generated the worlds most functional 1 liner code, as it is bewildering for hope-and-poke programmers
* Pythons ease of use combined with out-of-band library packages became a minefield of compatibility, security, and structural problems (i.e. became the modern BASIC)
* NodeJS was based on a poorly designed clown JavaScript toy language, so naturally became a circus
* Modern C++ template libraries became unusable as complexity and feature creep spiraled out of control to meet everyone's pet use-case
* Rust had a massive inrush of users that don't understand why llvm compilers are dangerous in some use-cases. But considering 99% of devs are in application space, they will unlikely ever need to understand why C has a different use-case.
While "Goto" could just be considered a euphemism for tail-recursion in some languages. We have to remember the feature is there for a reason, but most amateurs are not careful enough to use it in the proper edge-case.
I really hope Julia becomes more popular, as it is the first fun language I've seen in years that kind of balances ease of use with efficient parallelism.
wrote a lot of RISC Assembly at one time... where stacks were finite, and thus one knew evil well... lol =3
Why are LLVM compilers "dangerous in some use-cases", and what are those use-cases?
Any Real-Time or mcu system that relies on repeatable code-motion behavior, and clock domain crossing mitigation (a named problem with finite solutions.) Many seem to wrongly conflate this with guaranteed-latency schedulers... However, the danger occurs when a compiler fails to account for concurrent event order dependent machine states that do not explicitly lock (i.e. less efficient un-optimized gcc will reproduce consistent register op ordered in time behavior for the same source-code.)
The llvm abstraction behavior is usually fine in multitasking application spaces, but can cause obscure intermittent failure modes if concurrent register states are externally dependent on the architecture explicitly avoiding contention. I would recommend having a look at zynq DMA kernel module memory mapped to hardware io examples.
It seems rather trivial, Best of luck =3
But if you depend on specific ordering of register operations, isn't this kind of code best hand-written in assembly anyway?
In general, assembly code escape blocks are often used with macros for ease of understanding architecture specific builds.
There are binary optimizers and linkers that can still thrash Assembly objects in unpredictable ways.
Best of luck =3
I also maintain older codebases and this is how I feel about Clojure. It’s far too terse and the syntax is nigh unreadable unless you are using it very often - add to that there seems to be a culture that discourages comments. OTOH so-called “C” languages seem much more readable even if I have haven’t touched them in years or have never written a line of code in them.
On another note, I have done a lot of things with Perl and know it well, and I agree that it can be written in a very unreadable way, which is of course enabled by its many esoteric features and uncommon flexibility in both syntax and methodology. It is simultaneously a high-level scripting language and a low-level system tool. Indeed, I have caught myself writing “clever” code only to come back years later to regret it. Instead of “TMTOWTDI”it is has turned into “TTMWTDI” (there’s too many ways to do it).
> Even the oldest fortran/c is much easier to work with.
Could that be survivor bias? If that old Fortran was hard to work with, maybe it would have been rewritten, and left the set of “oldest code” in the process.
I think it's just procedural languages with a lack of magic.
sbuild is written in Perl, so it’s not surprising that new code for it is written in Perl.
sbuild is at least 25 years old, it might be nearer to 30 at this point. I did a lot of cleanup of it during the mid-2000s, including adding the schroot support to it which is being removed here, and that included abstracting virtualisation backends which allowed newer solutions to replace it. The original had to run as root to do a chroot(2) call, and it reused the same environment between builds. schroot removed the need to run as root, and it could provide a clean snapshot for each build. But it too is now rather dated--though the replacement tools often lack some of the features it has, like being able to build from ZFS cloned snapshots or Btrfs snapshots, or even LVM snapshots. We wanted it to be fast as well as safe, and the time-to-build and time-to-clean-up using these was and is excellent.
Replacing key infrastructure is hard, particularly when it's working well and the cost of replacement is high, especially so when it's volunteer effort. It was cutting edge back when Perl was all the rage, but while Perl might no longer be the fashionable choice, tools written in Perl continue to work just as they always have. I found it dense and impenetrable even 20 years back, and I used to have the whole thing printed out on fan-fold paper from a dot matrix; it was about 3/4" thick and covered in annotations! Understanding several thousand lines of obscure regexes is a cautionary tale in how to write unmaintainable Perl, and was one of the motivations to clean it up, refactor it into meaningful functions, and document what it was doing so that it could be maintained with a little less effort.
I had a similar murder mystery investigation the last months, and to rewrite the whole murder concept. But in the end, getting rid of signals (I hate signals ) and replacing it with messages didn't work, so I embraced arranging a proper group and killing them at once. With some exceptions.
>From the kill(2) man page
>If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.
This is just awful IMO. Any time you build an API that has significant behavior boundary on input value (rather than input type) you have probably made a mistake.
I assume this is from the 60s though and therefore impossible to change.
This is still very common, unfortunately. Look at all the languages adopting special semantics for negative indices, for example - Python's xs[-1] etc. This can be a lot of fun to debug when you're passing some computed N, and the bug is that it happens to be negative in some cases.
Good to see schroot is being replaced in sbuild, given that it's nearly 20 years old and no longer a cutting-edge solution to the problem of creating build environments. Can't believe time flies that fast since I wrote it in 2005!
Has an outright replacement for sbuild itself been considered? Given the ubiquity of modern CI systems, it seems quite possible to replace it entirely. I would have thought GitLab runners or Jenkins or any other standard CI system would work well. While I once enjoyed writing Perl, sbuild is not a great piece of code, and the problems it was trying to solve now have better and more widely-used solutions which don't require ongoing maintenance of an even older piece of Perl code.