While reading this, I realized that I have a copy of the elusive /usr/group Standard that the author mentions. I just pulled it off an image of my early DOS hard drive (from before I migrated to Linux in 1991). I should probably post it somewhere.
# ls -altr
total 540
-rw-r--r-- 1 root root 22606 Apr 12 1990 NOTES
-rw-r--r-- 1 root root 172645 Apr 12 1990 LIB
-rw-r--r-- 1 root root 102349 Apr 12 1990 APP
-rw-r--r-- 1 root root 224037 Apr 12 1990 C
> I should probably post it somewhere.
Upload it to the Internet Archive! :D
> before I migrated to Linux in 1991
What a mic drop. That must have been a fun ride from then until now! Would love to hear some of your battle stories.
My first Linux machine was in 1993. What would you like to know? Pre-1.0 kernels were an adventure, that's for sure.
My first Linux box ran 0.99.10. I was running SLS, which installed off of a dozen or so floppies. I eventually moved to Slackware a year or so later.
I remember downloading and installing one of the MCC Interim releases in 1993? 1994? before switching to Slackware. Early *BSD and Linux were certainly an adventure back then. I don't miss it.
In the stdio implementations that don't support free intermixing of reads and writes, the issue typically is that there is only one buffer for both reading and writing. You have to reset the buffer in order to switch from reading to writing or vice-versa, else you will have a dirty, non-empty buffer that does not match the new direction. The functions `fflush()`, `fseek()`, `rewind()`, and `fsetpos()` happen to clear the buffer, which is why you have to call one of them before switching from reading to writing or vice-versa!
Without an indicator in `struct FILE` of whether the last operation was a read or a write, the stdio implementation has no way to detect the problem and correct the situation by automatically flushing and resetting the buffer, say. An alternative would be to have two buffers, naturally. But you can see how a pre-update version could be trivially made to support update modes without adding a second buffer or automatic buffer flushing. And that's almost certainly what happened when update mode was added. My guess is someone got bitten by that and then the maintainer decided to just document the problem rather than fix it, probably because by then fixing the problem was hard.
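To make the rule concrete, here is a minimal sketch in C of a read followed by a write on a stream opened in update mode. The C standard's wording is that input must not be directly followed by output without an intervening file positioning call (unless the read hit end-of-file), and output must not be directly followed by input without `fflush()` or a positioning call. The helper name `overwrite_after_read` and its parameters are made up for illustration; they do not come from the article.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read `skip` bytes from an existing file opened in
 * update mode, then overwrite the next byte with `ch`.
 * Returns 0 on success, -1 on any error. */
int overwrite_after_read(const char *path, size_t skip, char ch)
{
    char scratch[64];
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;

    if (skip > sizeof scratch || fread(scratch, 1, skip, fp) != skip) {
        fclose(fp);
        return -1;
    }

    /* Switching from input to output: a seek that does not move the
     * position is enough to let the library reset its buffer. */
    if (fseek(fp, 0L, SEEK_CUR) != 0) {
        fclose(fp);
        return -1;
    }

    if (fputc(ch, fp) == EOF) {
        fclose(fp);
        return -1;
    }

    return fclose(fp) == 0 ? 0 : -1;
}
```

Dropping the `fseek()` call is the mistake being discussed: it may appear to work on a libc that tracks the stream direction, and silently misbehave on one that does not.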
Historically, before mandatory locking, getc and putc were implemented as macros, and an extra check for stream state likely mattered from a performance perspective.
To avoid the extra check, you don't actually need two buffers, just separate buffer pointers for reading and writing. (This is probably how most libcs implement this today.) I suppose memory was really scarce back then.
Separate non-overlapping pointers into one buffer is not that different from two buffers, notionally, but yeah.
The idea is that for the non-active mode, the current/end pointers are equal, signifying that the buffer is exhausted. This forces entering the slow path, where the mode can be switched.
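A rough sketch of that idea in C, using a toy stream backed by an in-memory array instead of a real file. Everything here (`struct toy_stream`, `toy_getc`, `toy_putc`, the buffer sizes) is invented for illustration and is not taken from any actual libc. The point is only that the fast paths compare one pointer pair each, and the inactive direction always has `pos == end`, so a mode switch falls into the slow path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_BUFSZ = 4, TOY_FILESZ = 32 };

struct toy_stream {
    unsigned char file[TOY_FILESZ]; /* stand-in for the file on disk */
    size_t file_len;                /* valid bytes in file[] */
    size_t base;                    /* file offset of buf[0] */
    unsigned char buf[TOY_BUFSZ];   /* the single shared buffer */
    unsigned char *rpos, *rend;     /* read window; equal when inactive */
    unsigned char *wpos, *wend;     /* write window; equal when inactive */
    int slow_calls;                 /* how often a slow path ran */
};

void toy_open(struct toy_stream *s, const char *contents)
{
    size_t n = strlen(contents);
    assert(n <= TOY_FILESZ);
    memcpy(s->file, contents, n);
    s->file_len = n;
    s->base = 0;
    s->slow_calls = 0;
    s->rpos = s->rend = s->wpos = s->wend = s->buf;
}

/* Commit pending output or account for consumed input, then close both
 * windows so that the next getc or putc takes its slow path. */
void toy_flush(struct toy_stream *s)
{
    if (s->wend != s->buf) {                 /* write mode was active */
        size_t n = (size_t)(s->wpos - s->buf);
        memcpy(s->file + s->base, s->buf, n);
        s->base += n;
        if (s->base > s->file_len)
            s->file_len = s->base;
    } else {                                 /* read mode or idle */
        s->base += (size_t)(s->rpos - s->buf);
    }
    s->rpos = s->rend = s->wpos = s->wend = s->buf;
}

static int toy_fill(struct toy_stream *s)    /* slow path for reading */
{
    size_t n;
    s->slow_calls++;
    toy_flush(s);
    n = s->file_len - s->base;
    if (n > TOY_BUFSZ)
        n = TOY_BUFSZ;
    if (n == 0)
        return -1;                           /* end of file */
    memcpy(s->buf, s->file + s->base, n);
    s->rend = s->buf + n;
    return 0;
}

static int toy_make_room(struct toy_stream *s) /* slow path for writing */
{
    size_t room;
    s->slow_calls++;
    toy_flush(s);
    if (s->base >= TOY_FILESZ)
        return -1;                           /* toy file is full */
    room = TOY_FILESZ - s->base;
    if (room > TOY_BUFSZ)
        room = TOY_BUFSZ;
    s->wend = s->buf + room;
    return 0;
}

int toy_getc(struct toy_stream *s)
{
    if (s->rpos == s->rend && toy_fill(s) != 0)
        return -1;
    return *s->rpos++;
}

int toy_putc(struct toy_stream *s, int c)
{
    if (s->wpos == s->wend && toy_make_room(s) != 0)
        return -1;
    *s->wpos++ = (unsigned char)c;
    return c;
}
```

With this layout, `toy_getc` directly followed by `toy_putc` needs no explicit seek from the caller, at the cost of discarding any unread bytes left in the buffer when the direction changes.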
I don't think an implementation with two active, non-empty buffers is all that useful because you can't tell which buffer's progress should be used for the file pointer adjustment in ftell.
I get that. One buffer that can be maximized by the path that most needs it (read or write). I'm just saying that notionally it's two independent buffers, which solves the problem of not having to force a buffer flush between mode change.
> I don't think an implementation with two active, non-empty buffers is all that useful because you can't tell which buffer's progress should be used for the file pointer adjustment in ftell.
Oh interesting. The other problem is that two buffers reduce memory utilization.
I'm having trouble following whether the problem occurs with any append or only when it's two consecutive commands like this.
I was a bit disappointed that the article didn't go into the system calls themselves, since AFAIK those have always supported interleaved reads and writes with no problems even on early Unices. E.g. POSIX has this:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/w...
After a write() to a regular file has successfully returned:
Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
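For contrast with stdio, here is a small sketch of the same interleaving done with the system calls directly, assuming a POSIX system and a regular file. The function name `interleave_syscalls` and the scratch path are placeholders of mine. There is no user-space buffer involved, so `write()` can directly follow `read()` on the same descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: write "AB", read it back, then write "CD" straight
 * after the read with no flush or seek in between.
 * On success returns 4 and leaves the file contents in out[0..3];
 * returns -1 on any failure. The scratch file is removed either way. */
int interleave_syscalls(const char *path, char out[4])
{
    int fd, ok;

    assert(path != NULL && out != NULL);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    ok = write(fd, "AB", 2) == 2
      && lseek(fd, 0, SEEK_SET) == 0
      && read(fd, out, 2) == 2      /* the written bytes are visible at once */
      && write(fd, "CD", 2) == 2    /* output directly after input */
      && lseek(fd, 0, SEEK_SET) == 0
      && read(fd, out, 4) == 4;     /* whole file */

    close(fd);
    unlink(path);
    return ok ? 4 : -1;
}
```

The two `lseek()` calls are only there to reposition for reading back; they are not needed to make the read-then-write sequence valid.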
Perhaps because the article is specifically about the buffered f*() calls in stdio, and not the system calls?
Though, as I offer that thought, the divergence between C and the system calls is definitely curious.
I get a real kick out of the different ways people pluralize Unixen. Unices is a good one.
I must be missing something.
The article lists three libcs (Open Watcom, Microsoft Visual C++ 6.0, IBM C/C++ 3.6 for Windows) from the good old times. Does the emulator link to Open Watcom, i.e., does it emulate DOS on machines about as old as DOS itself? What's the point here?
I believe it is a bug in the emulator's implementation of COMMAND.COM. Often, these DOS "emulators" re-implement the standard commands of DOS, including the shell[1]. This is in addition to emulating weird 16-bit environment stuff and the BIOS.
The bug can pop up in any C program using stdio that assumes it's fine to do `fread` followed immediately by `fwrite`. The spec forbids this. To make matters more confusing, this behavior does _not_ seem to be in modern libc implementations. Or at least, it works on my machine. I bet modern implementations are able to be more sane about managing different buffers for reading and writing.
The original COMMAND.COM from MS-DOS probably did not have this problem, since at least in some versions it was written in assembly[2]. Even for a shell written in C, the fix is pretty easy: seek the file before switching between reading/writing.
The title of this post is confusing, since it clearly _is_ a bug somewhere. But I think the author was excited about possibly finding a bug in libc:
> Sitting down with a debugger, I could just see how the C run-time library (Open Watcom) could be fixed to avoid this problem.
[1] Here's DOSBox, for example: https://github.com/dosbox-staging/dosbox-staging/blob/main/s...
[2] MS-DOS 4.0: https://github.com/microsoft/MS-DOS/tree/main/v4.0/src/CMD/C...
The article is very vague about which emulator and COMMAND.COM it is about, and if they're integrated with each other. Can't be DOSBox, since it handles it correctly:
C:\> echo AB> foo.txt
C:\> echo CD>> foo.txt
C:\> type foo.txt
AB
CD
(Note that echo adds a newline, same as on real DOS, or even UNIX without "-n". This other shell doesn't for some reason.)
The "real" COMMAND.COM, and all other essential parts of MS-/PC-/DR-DOS, have always been written in asm, where none of this libc nonsense matters.
Also it annoys me greatly when people talk about "the C Library" as if it exists in some Platonic realm, and is essential to all software ever written.
The article is about compiling and running a program inside the emulator. When the unexpected behavior occurred, the author assumed it was a bug in the emulator.
So if it's not a bug in the emulator, then it's a bug in COMMAND.COM? I don't think that's the case; surely it couldn't have been missed by Microsoft at the time. The article goes on to talk about fread/fwrite calls, but COMMAND.COM was written in assembly. I'm pretty sure it didn't link to any libc, and certainly not to Open Watcom. Why would MS use it instead of their own library?
It is not a bug. The article explains that this is the expected behaviour.
What is expected behavior? Surely `echo AB> foo.txt; echo CD>> foo.txt` producing `ABBC` is either a bug in COMMAND.COM, the emulator, or something else? That can't be correct.
There's a lot of weird missing details.