Comments Page - Whence '\n'?

Whence '\n'? (rodarmor.com) | Submitted by lukastyrychtr a day ago

ynfnehf 34 minutes ago
The first place I read about this idea (specifically newlines, not trusting trust in general) was day 42 in https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compile...
"For example, my compiler interprets "\n" (a sequence of backslash and character "n") in a string literal as "\n" (a newline character in this case). If you think about this, you would find this a little bit weird, because it does not have information as to the actual ASCII character code for "\n". The information about the character code is not present in the source code but passed on from a compiler compiling the compiler. Newline characters of my compiler can be traced back to GCC which compiled mine."
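A minimal Python sketch of the situation the quote describes. The `unescape` helper is hypothetical, not the quoted compiler's code. It decodes backslash-n by writing "\n" itself, so the number 10 never appears in its source and has to come from whatever runs it.

```python
def unescape(text: str) -> str:
    """Decode backslash-n and backslash-backslash escapes (hypothetical helper)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "n":
                # The newline's numeric value is supplied by the Python
                # implementation running this code, not by this source file.
                out.append("\n")
            elif nxt == "\\":
                out.append("\\")
            else:
                # Unknown escape: keep both characters unchanged.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


decoded = unescape(r"a\nb")
print(ord(decoded[1]))  # prints 10
```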
nasso_dev 2 hours ago
> This post was inspired by another post about exactly the same thing. I couldn't find it when I looked for it, so I wrote this. All credit to the original author for noticing how interesting this rabbit hole is.
I think the author may be thinking of Ken Thompson's Turing Award lecture "Reflections on Trusting Trust".
- Karellen 2 hours ago
  Although that presentation does point out that the technique is more generally used in quines. Given that there is a fair amount of research, papers and commentary on quines, it's possible that the author may have read something along those lines.
  https://en.wikipedia.org/wiki/Quine_(computing)
happytoexplain 18 minutes ago
This is over my head. Why did we need to take a trip to discover why \n is encoded as a byte with the value 10? Isn't that expected? The author and HN comments don't say, so I feel stupid.
- kibwen 13 minutes ago
  The point is to ask "who" encoded that byte as the value of 10. If you're writing a parser and you parse a newline as the escape sequence `\n`, then where did the value 10 come from? If you instead parse a newline as the integer literal `10`, then where does the actual binary value 1010 come from?
  The ultimate point of this exercise is to alter your perception of what a compiler is (in the same way as the famous Reflections On Trusting Trust presentation).
  Which is to say: your compiler is not something that outputs your program; your compiler is also input to your program. And as a program itself, your compiler's compiler was an input to your compiler, which makes it transitively an input to your program, and the same is true of your compiler's compiler's compiler, and your compiler's compiler's compiler's compiler, and your compiler's compiler's compiler's compiler's compiler, and...
ncruces an hour ago
I'm guessing the “other post” that inspired this might be: https://research.swtch.com/nih
- dang an hour ago
  Discussed here:
  Running the "Reflections on Trusting Trust" Compiler - https://news.ycombinator.com/item?id=38020792 - Oct 2023 (67 comments)
tzot an hour ago
I always thought, maybe because of C, that \0??? is an octal escape; so in my mind \012 is \x0a or 0x0a, and \010 is 0x08.
So I find this quite confusing; maybe OCaml does not have octal escapes but decimal ones, and \09 is the Tab character. I haven't checked.
- dpassens an hour ago
  It is indeed a decimal escape: https://ocaml.org/manual/5.2/lex.html#char-literal
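To make the difference concrete, here is a small Python sketch that reads the same three digits both ways. The variable names are illustrative only. It also shows that Python's own string literals follow the C (octal) convention.

```python
digits = "010"

c_style = chr(int(digits, 8))       # C reads \010 as octal: value 8 (backspace)
ocaml_style = chr(int(digits, 10))  # OCaml reads \010 as decimal: value 10 (newline)

print(ord(c_style), ord(ocaml_style))  # prints 8 10

# Python string literals use octal escapes, like C.
assert "\012" == "\n"
assert "\010" == "\b"
```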
atoav an hour ago
One rule of programming I figured out pretty quickly: if there are two ways of doing something, only one of them is correct, and you have a 50/50 chance of picking it, chances are you will get it wrong the first time.
- chgs an hour ago
  The USB rule.
  First time is the wrong way up
  Second time is also the wrong way up
  Third time works
  jancsika 42 minutes ago
  It's like the Two Generals' Problem embedded in a single connector.
  You never really know it's right until you take it out and test the friction against the other orientation.
  fader an hour ago
  It's because of the quantum properties of USB connectors. They have spin 1/2.
  SAI_Peregrinus an hour ago
  I thought it was because USB connectors occupy 4 spatial dimensions.
  PaulDavisThe1st 39 minutes ago
  That's good, because otherwise we'd never be able to find them when we need them.
  dailykoder 20 minutes ago
  It's actually super easy and, at least for me, was always intuitive. Most USB cables have their logo or something else engraved on the "top" with the air gap. And since the ports are mostly arranged the same way, there is rarely any problem. Maybe I am just too dumb to understand jokes, but it always confused me :(
  dtgriscom 39 minutes ago
  I boosted my USB plugged-in-successfully-on-first-try rate when I imagined the offset block in the cable's male USB connector as being heavy, so it should be below the centerline when plugged into a laptop's female USB connector. (Only works when the connector is horizontal, but better than nothing.)
dist-epoch 2 hours ago
I remember a similar article for some C compiler, and it turned out the only place the value 10 (0x0a) appeared was in the compiler binary, because in the source code it had something like "\\n" -> "\n"
kijin 2 hours ago
The incorrect capitalization made me think that, perhaps, there's a scarcely known escape sequence \N that is different from \n. Maybe it matches any character that isn't a newline? Nope, just small caps in the original article.
- cpach an hour ago
  If you do view source, it’s actually \n, but it’s not displayed as such because of this CSS rule:
  .title { font-variant: small-caps; }
  sedatk an hour ago
  So, the HN title is wrong.
  isatty an hour ago
  The original title is.
  deathanatos 20 minutes ago
  In addition to what others have said about smallcaps being a stylistic rendering, if you copy & paste the original title, you'll get
  Whence '\n'?
  niederman an hour ago
  No, the original title is correct, small caps are just an alternate way of setting lowercase letters.
- deathanatos 18 minutes ago
  Python has a \N escape sequence. It inserts a Unicode character by name. For example,
  '\N{PILE OF POO}'
  is the Unicode string containing a single USV, the pile of poop emoji.
  Much more self-documenting than doing it with a hex sequence with \u or \U.
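A short Python check of the claim above: the named escape and the hex escape should produce the same one-character string.

```python
import unicodedata

by_name = "\N{PILE OF POO}"
by_code_point = "\U0001F4A9"

print(by_name == by_code_point)         # prints True
print(len(by_name), hex(ord(by_name)))  # prints 1 0x1f4a9
print(unicodedata.name(by_code_point))  # prints PILE OF POO
```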
- paulddraper 2 hours ago
  There is actually.
  Many systems use \N in CSVs or similar as NULL, to distinguish from an empty string.
  I figured this is what the article was about?
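A sketch of that convention in Python, assuming a tab-separated export in which the two characters \N mark NULL (PostgreSQL's COPY text format uses this marker by default). The `read_rows` helper and the sample data are made up for illustration.

```python
import csv
import io
from typing import List, Optional

NULL_MARKER = r"\N"  # a literal backslash followed by a capital N


def read_rows(text: str) -> List[List[Optional[str]]]:
    """Parse tab-separated text, mapping the NULL marker to None (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [[None if field == NULL_MARKER else field for field in row] for row in reader]


# Row 1 ends with an empty string; row 2 has a NULL in the middle column.
sample = "1\talice\t\n2\t\\N\tbob\n"
rows = read_rows(sample)
print(rows)
```

The empty field stays an empty string while the \N field becomes None, which is the distinction the marker exists to preserve.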
archmaster 2 hours ago
if only this went into where the ocaml escape came from :)
- diath an hour ago
  It does, it links to this: https://github.com/ocaml/ocaml/blob/4d6ecfb5cf4a5da814784dee...
  fiddlerwoaroof 43 minutes ago
  But this doesn’t really explain anything: ‘\010’ isn’t really any more primitive than ‘\x0a’: they’re just different representations of the same bit sequence
  fluoridation 29 minutes ago
  But it is more primitive than '\n', and can be rendered into binary without any further arbitrary conversion steps (arbitrary in that there's nothing in '\n' that says it should mean 10). It's just "transform the number after the backslash into the byte with that value".
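The "transform the number after the backslash into the byte with that value" step can be sketched in Python as below. This is an illustrative decoder for three-digit decimal escapes only, with a hypothetical name, and is not OCaml's actual lexer.

```python
def decode_decimal_escapes(text: str) -> bytes:
    """Turn each backslash + three decimal digits into the byte with that value."""
    out = bytearray()
    i = 0
    while i < len(text):
        digits = text[i + 1:i + 4]
        is_escape = (
            text[i] == "\\"
            and len(digits) == 3
            and all(d in "0123456789" for d in digits)
        )
        if is_escape:
            value = int(digits)  # plain base-10 conversion, no lookup table
            if value > 255:
                raise ValueError(f"escape \\{digits} does not fit in one byte")
            out.append(value)
            i += 4
        else:
            out.extend(text[i].encode("utf-8"))
            i += 1
    return bytes(out)


print(list(decode_decimal_escapes(r"a\010b")))  # prints [97, 10, 98]
```

Nothing in this decoder mentions a newline: the 10 comes straight from the digits in the input.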
gjvc an hour ago
this is a nothingburger of an article
coolio1232 an hour ago
I thought this was going to be about '\N' but there's only '\n' here.
- dang an hour ago
  It's in the html doc title but the article doesn't deliver.
cpach 2 hours ago
Previous discussion: https://news.ycombinator.com/item?id=41564527