First place I read about this idea (specifically newlines, not in general trusting trust) was day 42 in https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compile...
"For example, my compiler interprets "\n" (a sequence of backslash and character "n") in a string literal as "\n" (a newline character in this case). If you think about this, you would find this a little bit weird, because it does not have information as to the actual ASCII character code for "\n". The information about the character code is not present in the source code but passed on from a compiler compiling the compiler. Newline characters of my compiler can be traced back to GCC which compiled mine."
> This post was inspired by another post about exactly the same thing. I couldn't find it when I looked for it, so I wrote this. All credit to the original author for noticing how interesting this rabbit hole is.
I think the author may be thinking of Ken Thompson's Turing Award lecture "Reflections on Trusting Trust".
Although that presentation does point out that the technique is more generally used in quines. Given that there is a fair amount of research, papers and commentary on quines, it's possible that the author may have read something along those lines.
This is over my head. Why did we need to take a trip to discover why \n is encoded as a byte with the value 10? Isn't that expected? The author and HN comments don't say, so I feel stupid.
The point is to ask "who" encoded that byte as the value of 10. If you're writing a parser and you parse a newline as the escape sequence `\n`, then where did the value 10 come from? If you instead parse a newline as the integer literal `10`, then where does the actual binary value 1010 come from?
The ultimate point of this exercise is to alter your perception of what a compiler is (in the same way as the famous Reflections On Trusting Trust presentation).
Which is to say: your compiler is not something that outputs your program; your compiler is also input to your program. And as a program itself, your compiler's compiler was an input to your compiler, which makes it transitively an input to your program, and the same is true of your compiler's compiler's compiler, and your compiler's compiler's compiler's compiler, and your compiler's compiler's compiler's compiler's compiler, and...
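To make the "who encoded the 10" question concrete, here is a minimal sketch of an escape decoder for a toy parser. The names `ESCAPES` and `unescape` are made up for illustration and are not taken from any compiler discussed here. Note that the table maps the letter n to "\n", which is itself an escape sequence, so the number 10 never appears in this source.

```python
# Hypothetical escape table for a toy string-literal parser.
# The values on the right are themselves escape sequences, so the numeric
# byte values are supplied by the Python interpreter running this code,
# not by anything written in this file.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def unescape(literal: str) -> str:
    """Decode backslash escapes in the body of a string literal."""
    out = []
    chars = iter(literal)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        code = next(chars, None)  # character after the backslash, if any
        if code not in ESCAPES:
            raise ValueError(f"unknown escape: \\{code}")
        out.append(ESCAPES[code])
    return "".join(out)


# The decoded character has code 10 even though "10" is nowhere in the source.
print(ord(unescape(r"\n")))
```

The value comes from whatever interpreter or compiler runs the decoder, which is the transitive-input point made above.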
I'm guessing the “other post” that inspired this might be: https://research.swtch.com/nih
Discussed here:
Running the "Reflections on Trusting Trust" Compiler - https://news.ycombinator.com/item?id=38020792 - Oct 2023 (67 comments)
I always thought, maybe because of C, that \0??? is an octal escape; so in my mind \012 is \x0a or 0x0a, and \010 is 0x08.
So I find this quite confusing; maybe OCaml has decimal escapes rather than octal ones, in which case \009 would be the Tab character. I haven't checked.
It is indeed a decimal escape: https://ocaml.org/manual/5.2/lex.html#char-literal
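A quick way to see the difference, as a sketch in Python rather than OCaml or C: the same three digits give different bytes depending on whether they are read in base 10 (the OCaml reading, per the manual linked above) or base 8 (the C reading).

```python
digits = "010"

as_decimal = int(digits, 10)  # OCaml-style reading of '\010'
as_octal = int(digits, 8)     # C-style reading of '\010'

print(as_decimal, repr(chr(as_decimal)))  # 10 '\n'
print(as_octal, repr(chr(as_octal)))      # 8 '\x08'

# In C, the newline would be written with the octal digits 012 instead.
print(int("012", 8))  # 10
```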
One rule of programming I figured out pretty quick is: if there are two ways of doing it and there is a 50/50 chance of one being correct and the other one isn't, chances are you will get it wrong the first time.
The USB rule.
First time is the wrong way up
Second time is also the wrong way up
Third time works
It's like the Two Generals' Problem embedded in a single connector.
You never really know it's right until you take it out and test the friction against the other orientation.
It's because of the quantum properties of USB connectors. They have spin 1/2.
I thought it was because USB connectors occupy 4 spatial dimensions.
That's good, because otherwise we'd never be able to find them when we need them.
It's actually super easy and, at least for me, was always intuitive. Most USB cables have their logo or something else engraved on the "top" with the air gap. And since the ports are mostly arranged the same way, there is rarely any problem. Maybe I am just too dumb to understand jokes, but it always confused me :(
I boosted my USB plugged-in-successfully-on-first-try rate when I imagined the offset block in the cable's male USB connector as being heavy, so it should be below the centerline when plugged into a laptop's female USB connector. (Only works when the connector is horizontal, but better than nothing.)
I remember a similar article about some C compiler, where it turned out the only place the value 0x0A (decimal 10) appeared was in the compiler binary, because the source code had something like "\\n" -> "\n".
The incorrect capitalization made me think that, perhaps, there's a scarcely known escape sequence \N that is different from \n. Maybe it matches any character that isn't a newline? Nope, just small caps in the original article.
If you do view source, it’s actually \n, but it’s not displayed as such because of this CSS rule:
    .title {
      font-variant: small-caps;
    }
So, the HN title is wrong.
The original title is.
In addition to what others have said about smallcaps being a stylistic rendering, if you copy & paste the original title, you'll get
Whence '\n'?
No, the original title is correct, small caps are just an alternate way of setting lowercase letters.
Python has a \N escape sequence. It inserts a Unicode character by name. For example,
'\N{PILE OF POO}'
is the Unicode string containing a single USV (Unicode scalar value), the pile of poop emoji. Much more self-documenting than doing it with a hex sequence with \u or \U.
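A short check of the claim, using only the standard library `unicodedata` module:

```python
import unicodedata

by_name = "\N{PILE OF POO}"  # named escape
by_hex = "\U0001F4A9"        # same code point written as a hex escape

print(by_name == by_hex)          # True
print(len(by_name))               # 1
print(unicodedata.name(by_name))  # PILE OF POO
```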
There is actually.
Many systems use \N in CSVs or similar as NULL, to distinguish from an empty string.
I figured this is what the article was about?
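For anyone who hasn't run into it, a rough sketch of that convention, assuming a tab-separated dump where the two characters backslash and N stand for NULL (PostgreSQL's COPY text format does this by default, for instance). The helper name `parse_row` is made up for illustration.

```python
NULL_MARKER = r"\N"  # the two characters backslash and N, not an escape


def parse_row(line: str) -> list:
    """Split one tab-separated line, mapping the NULL marker to None."""
    fields = line.rstrip("\n").split("\t")
    return [None if field == NULL_MARKER else field for field in fields]


row = parse_row("42\t\t\\N\n")
print(row)  # ['42', '', None]
```

The empty string and NULL stay distinguishable, which is the whole point of the marker.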
if only this went into where the ocaml escape came from :)
It does, it links to this: https://github.com/ocaml/ocaml/blob/4d6ecfb5cf4a5da814784dee...
But this doesn’t really explain anything: ‘\010’ isn’t really any more primitive than ‘\x0a’: they’re just different representations of the same bit sequence
But it is more primitive than '\n', and can be rendered into binary without any further arbitrary conversion steps (arbitrary in that there's nothing in '\n' that says it should mean 10). It's just "transform the number after the backslash into the byte with that value".
this is a nothingburger of an article
I thought this was going to be about '\N' but there's only '\n' here.
It's in the html doc title but the article doesn't deliver.
Previous discussion: https://news.ycombinator.com/item?id=41564527