    > various protocols (HTTP, SMTP, CSV) still "require" CRLF at the end of each line

    What would be the benefit to updating legacy protocols to just use NL? You save a handful of bits at the expense of a lot of potential bugs. HTTP/1(.1) is mostly replaced by HTTP/2 and later by now anyway.

    Sure, it makes sense not to require CRLF with any new protocols, but it doesn't seem worth updating legacy things.

    > Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply.

    I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

      Exactly. Please DO NOT mess with protocols, especially legacy critical protocols based on in-band signaling.

      HTTP/1.1 was regrettably but irreversibly designed with security-critical parser alignment requirements. If two implementations disagree on whether `A:B\nC:D` contains a value for C, you can build a request smuggling gadget, leading to significant attacks. We live in a post-Postel world: only ever generate and accept CRLF in protocols that specify it, however legacy and nonsensical it might be.
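
      To make the `A:B\nC:D` example concrete, here is a minimal Python sketch of two hypothetical header parsers (not any real server's code): one splits only on CRLF, the other accepts a bare LF. Real parsers also handle whitespace, limits and more; this only shows the disagreement.

```python
# Two toy header parsers that disagree on the same bytes.
# The "lines" A:B and C:D are separated by a bare LF, not CRLF.
RAW = b"A:B\nC:D\r\n"


def parse_crlf_only(block: bytes) -> dict[str, str]:
    """Split only on CRLF, so a bare LF stays inside the field value."""
    headers = {}
    for line in block.split(b"\r\n"):
        if line:
            name, _, value = line.partition(b":")
            headers[name.decode()] = value.decode()
    return headers


def parse_lf_tolerant(block: bytes) -> dict[str, str]:
    """Treat LF as the line terminator and drop a CR right before it."""
    headers = {}
    for line in block.split(b"\n"):
        line = line.removesuffix(b"\r")
        if line:
            name, _, value = line.partition(b":")
            headers[name.decode()] = value.decode()
    return headers


print(parse_crlf_only(RAW))    # {'A': 'B\nC:D'}
print(parse_lf_tolerant(RAW))  # {'A': 'B', 'C': 'D'}
```

      One parser sees a single header A whose value happens to contain `C:D`; the other sees a header C. That gap is what a smuggling gadget exploits.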

      (I am a massive, massive SQLite fan, but this is giving me pause about using other software by the same author, at least when networks are involved.)

        HTTP is saved here because headers aren't allowed to contain control characters. A server that is strict enough to only recognize CRLF will hopefully also be strict enough to reject requests that contain invalid characters.
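
        A sketch of that strict behaviour in Python, assuming a parser that splits on CRLF and then validates every field value. The allowed bytes roughly follow the field-value grammar in RFC 9110 (HTAB, SP, visible ASCII, obs-text); the function names are hypothetical.

```python
def is_valid_field_value(value: bytes) -> bool:
    """Allow HTAB, SP, visible ASCII and obs-text (0x80-0xFF); reject any other
    control byte, which includes CR, LF and NUL."""
    return all(b == 0x09 or 0x20 <= b <= 0x7E or b >= 0x80 for b in value)


def parse_strict(block: bytes) -> dict[str, str]:
    """Split on CRLF only, then reject the whole block if any value is invalid."""
    headers = {}
    for line in block.split(b"\r\n"):
        if not line:
            continue
        name, sep, value = line.partition(b":")
        if not sep or not is_valid_field_value(value):
            raise ValueError(f"invalid header line: {line!r}")
        headers[name.decode("ascii")] = value.strip(b" \t").decode("latin-1")
    return headers


print(parse_strict(b"A: B\r\nC: D\r\n"))  # {'A': 'B', 'C': 'D'}
```

        Fed `b"A:B\nC:D\r\n"`, this parser raises instead of quietly accepting a value of `B\nC:D`, so it cannot end up disagreeing with an LF-tolerant peer about header C.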

        The situation is different with SMTP, see https://www.postfix.org/smtp-smuggling.html

          Took me a second to get what was going on here, but basically the idea is that your middleware might not see `C:D`, but then your application _does_ see `C:D`.

          And given your application might assume your middleware does some form of access control (for example, `X-ActualUserForReal` being treated as an internal-only header), you could get around some access control stuff.
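
          A toy Python model of that scenario, assuming a hypothetical CRLF-only proxy that strips the internal header and a hypothetical LF-tolerant backend. Neither function comes from a real product; the header name is the one from the comment above.

```python
INTERNAL = "x-actualuserforreal"  # internal-only header from the comment, lowercased


def split_crlf(block: bytes) -> list[bytes]:
    return [line for line in block.split(b"\r\n") if line]


def split_lf(block: bytes) -> list[bytes]:
    lines = (line.removesuffix(b"\r") for line in block.split(b"\n"))
    return [line for line in lines if line]


def header_name(line: bytes) -> str:
    return line.partition(b":")[0].decode().strip().lower()


def proxy(block: bytes) -> bytes:
    """Hypothetical CRLF-only middleware: drop the internal header, forward the rest."""
    kept = [line for line in split_crlf(block) if header_name(line) != INTERNAL]
    return b"".join(line + b"\r\n" for line in kept)


def backend(block: bytes) -> dict[str, str]:
    """Hypothetical LF-tolerant application server."""
    return {header_name(line): line.partition(b":")[2].decode().strip()
            for line in split_lf(block)}


honest = b"X-ActualUserForReal: admin\r\nHost: example.com\r\n"
sneaky = b"Accept: */*\nX-ActualUserForReal: admin\r\nHost: example.com\r\n"
print(backend(proxy(honest)).get(INTERNAL))  # None: the proxy stripped it
print(backend(proxy(sneaky)).get(INTERNAL))  # admin: it hid inside "Accept"
```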

          Not a bytes-alignment thing but a "header values disagreement" thing.

          This is an issue if one part of your stack parses headers differently than another in general though, not limited to newlines.

            What a weird reaction. Microsoft’s use of CRLF is an archaic pain in the ass. Taking a position that it should be deprecated isn’t radical or irresponsible — Microsoft makes gratuitous changes to things all of the time, why not this one?

            Hipp is probably one of the better engineering leaders out there. His point of view carries weight because of who he is, but should be evaluated on its merits. If Microsoft got rid of this crap 30 years ago, when it was equally obsolete, we wouldn’t be having this conversation; if nobody does, our grandchildren will.

              No one is talking about Microsoft and whatever it does on its platform, the parent comment is about network protocols (HTTP, SMTP and so on..).

              I understand that it is tempting to blame Microsoft for \r\n proliferation, but that does not seem to be the case - the \r\n comes from the era of teletypes and physical VT terminals. You can still see the original "NL" in action (move down only, do not go back to start of line) on any Unix system by typing "(stty raw; ls)" in a throw-away terminal.
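
              A tiny Python model of a printing terminal shows where the "staircase" after `(stty raw; ls)` comes from: on typical Unix systems `stty raw` includes `-opost`, so the tty stops turning each LF into CRLF on output. The model is a simplification (later characters overwrite rather than overstrike).

```python
def render_teletype(data: str, width: int = 40) -> list[str]:
    """Toy printing terminal: LF feeds the paper one line without moving the
    print head; CR moves the head back to column 0. Characters past `width`
    are dropped, and a later character simply overwrites an earlier one."""
    page = [[" "] * width]
    row = col = 0
    for ch in data:
        if ch == "\n":  # line feed: next row, same column
            row += 1
            if row == len(page):
                page.append([" "] * width)
        elif ch == "\r":  # carriage return: back to the left margin
            col = 0
        else:
            if col < width:
                page[row][col] = ch
            col += 1
    return ["".join(line).rstrip() for line in page]


print(render_teletype("foo\nbar\nbaz\n"))        # ['foo', '   bar', '      baz', '']
print(render_teletype("foo\r\nbar\r\nbaz\r\n"))  # ['foo', 'bar', 'baz', '']
```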

              This would be more persuasive if HTTP servers didn't already widely accept bare 0ah line termination. What's the first major public web site you can find that doesn't?

                Going down a list of top websites, these URLs respond with HTTP 200 (possibly after redirections) when sent an ordinary HTTP/1.1 GET request with 0D0A line endings, but respond with HTTP 400 when sent the exact same request with 0A line endings:

                  https://br.pinterest.com/ https://www.pinterest.co.uk/
                  https://apps.apple.com/ https://support.apple.com/ https://podcasts.apple.com/ https://music.apple.com/ https://geo.itunes.apple.com/
                  https://ncbi.nlm.nih.gov/ https://www.salesforce.com/ https://www.purdue.edu/ https://www.playstation.com/
                  https://llvm.org/ https://www.iana.org/ https://www.gnu.org/ https://epa.gov/ https://justice.gov/
                  https://www.brendangregg.com/ http://heise.de/ https://www.post.ch/ http://hhs.gov/ https://oreilly.com/
                  https://www.thinkgeek.com/ https://www.constantcontact.com/ https://sciencemag.org/ https://nps.gov/
                  https://www.cs.mun.ca/ https://www.wipo.int/ https://www.unicode.org/ https://economictimes.indiatimes.com/
                  https://science.org/ https://icann.org/ https://caniuse.com/ https://w3techs.com/ https://chrisharrison.net/
                  https://www.universal-music.co.jp/ https://digiland.libero.it/ https://webaim.org/ https://webmd.com/

                This URL responds with HTTP 505 on an 0A request:

                These URLs don't respond on an 0A request:

                Most of these seem pretty major to me. There are other sites that are public but responded with an HTTP 403, probably because they didn't like the VPN or HTTP client I used for this test. (Also, www.apple.com is tolerant of 0A line endings, even though its other subdomains aren't, which is weird.)

                  You sure about this? www.pinterest.com, for instance, does not appear to care whether I 0d0a or just 0a.

                    My apologies, I was using a client which kept the connection alive between the 0D0A and 0A requests, which has an effect on www.pinterest.com. Rerunning the test with separate connections for 0D0A and 0A requests, www.pinterest.com and phys.org are no longer affected (I've removed the two from the list), but all other URLs are still affected.

                      I picked one at random --- hhs.gov --- and it too appears to work?

                      For what it's worth: I'm testing by piping the bytes for a bare-newline HTTP request directly into netcat.
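
                      For anyone who wants to repeat the test without netcat, a rough Python equivalent. It opens a fresh TLS connection per request (so keep-alive reuse, which skewed the earlier run, can't interfere) and sends the same GET with either CRLF or bare LF. It reads only the first 4 KiB of the response, assuming the status line arrives there.

```python
import socket
import ssl


def build_request(host: str, path: str = "/", eol: bytes = b"\r\n") -> bytes:
    """A minimal HTTP/1.1 GET request using `eol` as the line terminator."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", ""]
    return eol.join(line.encode("ascii") for line in lines) + eol


def status_code(response: bytes) -> int:
    """Extract the code from a status line such as b'HTTP/1.1 400 Bad Request'."""
    return int(response.split(b"\n", 1)[0].split()[1])


def probe(host: str, eol: bytes, port: int = 443, timeout: float = 10.0) -> int:
    """Send one request over a fresh TLS connection and return the status code."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_request(host, eol=eol))
            # Assumes the status line arrives in the first read.
            return status_code(tls.recv(4096))


# Example (needs network access):
# print(probe("example.com", b"\r\n"), probe("example.com", b"\n"))
```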

                  As the parent mentioned, it's security critical that every HTTP parser in the world - including every middleware, proxy, firewall, WAF - parses the headers in the same way. If you write an HTTP parser for a server application, it's imperative you don't introduce random inconsistencies with the standard (I can't believe I have to write this).

                  On the other hand, as a client, it's OK to send malformed requests, as long as you're prepared for them to fail. But it's a weird flex; legacy protocols have many warts, so why die on this particular hill?

                    That appears to be an argument in favor of accepting bare-0ah, since as a positive statement that is the situation on the Internet today.

                      Wouldn't the safest thing, security-wise, be to fail fast on bare 0ah?

                      As a web server, you may not know which intermediate proxies the request traversed before arriving at your port. Given that request smuggling is a thing, failing fast with no further parsing on any protocol deviation seems to be the most secure thing.
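
                      A fail-fast line reader along those lines might look like this Python sketch: it accepts only CRLF and raises on a bare CR, a bare LF, or an over-long line instead of trying to recover. The 8190-byte limit is an arbitrary assumption, and reading one byte at a time is for clarity, not speed.

```python
import io


class BadLineEnding(ValueError):
    """Raised as soon as a line is not terminated by exactly CRLF."""


def read_crlf_line(stream: io.BufferedIOBase, limit: int = 8190) -> bytes:
    """Read one CRLF-terminated line; fail fast on bare CR, bare LF or overflow."""
    buf = bytearray()
    while len(buf) <= limit:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended mid-line")
        if byte == b"\n":
            raise BadLineEnding("bare LF")
        if byte == b"\r":
            if stream.read(1) != b"\n":
                raise BadLineEnding("bare CR")
            return bytes(buf)
        buf += byte
    raise BadLineEnding("line too long")
```

                      Calling it on `io.BytesIO(b"A:B\nC:D\r\n")` raises `BadLineEnding` at the first LF rather than guessing what the sender meant.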

                        I mean the safest thing would be to send an RST as soon as you see a SYN for 80/tcp.

                          Wouldn't not replying at all be the safest?

                    Hrm, this is what I get for logging in to HN from my phone. It’s possible I am confusing this with one of the other exploitable HTTP/1.1 header parser alignment issues.

                    Maybe this was so widespread that ~everything already handles it because non-malicious stuff breaks if you don’t. In that case, my bad, but I still would like to make a general plea as an implementer for sticking strictly to specified behavior in this sort of protocols.

                      Gunicorn expects `\r\n` for lines (see gunicorn/http/message.py:read_line), though it's possible that every middleware that is in front of gunicorn in practice normalizes lines to avoid this issue.

                        Yep, tested it locally, you're right; gotta CRLF to gunicorn.

                        We're talking about servers and clients here. The best way to ensure things work is to adhere to an established protocol. Aside from saving a few bytes, there doesn't seem to be any good reason to deviate.

                          I'm saying the consistency that Filippo says our security depends on doesn't really seem to exist in the world, which hurts the persuasiveness of that particular argument in favor of consistency.

                            But no one expects 0ah to be sufficient. Change that expectation, and now you have to wonder if your middleware and your backend agree on whether the middleware filtered out internal-only headers.

                              Yeah, I'm not certain that this is a real issue. It might be? Certainly, I'm read in on things like TE.CL desync. I get the concern that any disagreement in parsing policies is problematic for HTTP because of middleboxes. But I think the ship may have sailed on 0ah, and that it may be the case that you simply have to build HTTP systems to be bare-0ah-tolerant if you want your system to be resilient.

                                But what's bare-0ah-tolerant? Accepting _or_ ignoring bare 0ah's means you need to ensure all your moving parts agree, or you end up in the "one bit thinks this is two headers, others think it's one header".

                                The only situation where you don't need to know two policies match is when one of the policies rejects one of the combinations outright. Probably. Maybe.

                                EDIT: maybe it's better phrased as "all parts need to be bare-0ah-strict". But then it's fine if it's bare-0ah-reject; they just need to all be strict, one way or the other.
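
                                A small Python sketch of that rule of thumb, with three hypothetical parsing policies. A front end/back end pair counts as safe on an input if either side rejects it outright or both sides see the same headers; the policy names are illustrative only.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    LF_OK = "bare LF ends a line; a CR right before it is dropped"
    CRLF_ONLY = "only CRLF ends a line; a bare LF stays in the value"
    REJECT = "any bare LF rejects the whole message"


Headers = list[tuple[bytes, ...]]


def parse(block: bytes, policy: Policy) -> Optional[Headers]:
    """Return the (name, value) pairs a parser with `policy` sees, or None if it rejects."""
    has_bare_lf = b"\n" in block.replace(b"\r\n", b"")
    if policy is Policy.REJECT and has_bare_lf:
        return None
    if policy is Policy.LF_OK:
        lines = [line.removesuffix(b"\r") for line in block.split(b"\n")]
    else:
        lines = block.split(b"\r\n")
    return [tuple(line.split(b":", 1)) for line in lines if line]


def safe_pair(front: Policy, back: Policy, block: bytes) -> bool:
    """Safe if either side rejects the block outright or both see the same headers."""
    seen_front, seen_back = parse(block, front), parse(block, back)
    return seen_front is None or seen_back is None or seen_front == seen_back


RAW = b"A:B\nC:D\r\n"
for front in Policy:
    for back in Policy:
        if not safe_pair(front, back, RAW):
            print(f"{front.name} -> {back.name}")
# LF_OK -> CRLF_ONLY
# CRLF_ONLY -> LF_OK
```

                                Only the mixed tolerant/CRLF-only pairs disagree; matching policies, or a rejecting policy on either side, are consistent on this input.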

                            Well, you can achieve the desired behavior in all situations by ignoring CR and treating any seen LF as NL.

                            I just don’t see why you’d not want to do that as the implementer. If there’s some way to exploit that behavior I can’t see it.
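
                            The rule proposed above is easy to write down. This Python sketch (hypothetical helper names) sets it next to a peer that treats a lone CR as a line break. That is the kind of mismatch the reply below worries about: the same bytes yield one line on one side and two on the other.

```python
def split_ignoring_cr(data: bytes) -> list[bytes]:
    """The rule proposed above: delete every CR, then end a line at each LF."""
    return [line for line in data.replace(b"\r", b"").split(b"\n") if line]


def split_any_newline(data: bytes) -> list[bytes]:
    """A hypothetical peer that treats CRLF, a lone CR, or a lone LF as a line end."""
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return [line for line in normalized.split(b"\n") if line]


sample = b"A:B\rC:D\r\n"
print(split_ignoring_cr(sample))  # [b'A:BC:D']
print(split_any_newline(sample))  # [b'A:B', b'C:D']
```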

                              There are very good reasons not to deviate: a mismatch with other components that may or may not be on the path, like reverse proxies and load balancers, can affect things.

                            > massive SQLite fan, but this is giving me pause about using other software by the same author

                            Even if I wanted to contribute code to SQLite, I can't. I acknowledge the fact God doesn't exist, so he doesn't want my contributions :P

                              He does not want your code anyway; SQLite is public domain. This has several implications, one of which is that the author wants nothing from you. Note that public domain is fundamentally different from the usual method of releasing code, which is to issue a license to distribute a copyright-protected work. Putting a thing into the public domain is to renounce any ownership over it.

                              I think the proper spirit of the thing is that if you have patches to SQLite, you just maintain them yourself. If you are especially benevolent, you will put the patches in the public domain as well, and if they are any good, perhaps the original author will want them.

                              In fact, the public domain is so weird that some countries have no legal understanding of it. Originally the concept was just the stance of the US federal government that, because the works of the government were for the people, these works were not protected by copyright and could be thought of as collectively owned by the people, or in the public domain. Some countries don't recognize this; everything has to be owned by someone. SQLite was legally unable to be distributed in these countries, because it would default to copyright with no license.

                              I wouldn't be too worried or make personal judgements; he says the same thing you are (though I assume you disagree)

                              > I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

                              It’s worse than satire. Postel’s Law is definitively wrong, at least in the context of network protocols, and delimiters, especially, MUST be precise. See, for example:


                              Send exactly what the spec requires, and parse exactly as the spec requires. Do not accept garbage. And LF, where CRLF is specified, is garbage.

                                If two systems agree, independent of any specification someone somewhere else wrote, to accept a bare NL where a CRLF is specified, that is not "garbage". Standards documents are not laws; the horse drags the cart.

                                  That's just two systems that happen to agree on garbage.

                                    > Standards documents are not laws; the horse drags the cart.

                                    They can be: c.f. legally-enforced safety-regulations.

                                      These aren't.

                                      Laws are also just some ink on paper (and are routinely overruled, circumvented or unenforced in certain jurisdictions), so using this kind of logic in order to encourage standard violations is unsound.

                                      There is a method to this madness, and that's revising the standards.

                                        What's a "standard violation"? The original history of the IETF is a rejection of exactly this mode of thinking about the inviolability of standards, which was the ethos of the OSI.

                                          Elephant in the room is the trillions of actual servers and user agents that would need to be tested and patched if you retroactively change a standard. Luckily there are some digits after HTTP that allow the concept of new versions of the standard.

                                            When an implementation is nonconformant to the standard in question.

                                              IETF standards are tools to help developers get stuff done on the Internet. They are not the only tool, and they don't carry any moral force.

                                                Apart from the fact that, in my opinion, colloquially treating standards as not necessarily normative is nonsensical (see below), to the best of my knowledge at least the STD subseries of IETF standards documents is normative in nature: https://datatracker.ietf.org/doc/std

                                                > They are not the only tool, and they don't carry any moral force.

                                                To be even more exact, I do not know of any standards body that publishes documents which it and the world consider standards but which are entirely, or at least primarily, informational rather than normative in nature. Do I know the word "standard" incorrectly? What even is the point of a standard, if it doesn't aim to control?

                                                  Ok, but just to be clear: the standards-track HTTP RFC says you can use a single LF. I don't think this issue is as clear as people seem to want it to be.

                                                    Can you provide a citation for this? I’ve read older RFCs that "recommend" recipients allow single LFs to terminate headers for robustness. I’ve also read newer RFCs that weaken that recommendation and merely say the recipient "MAY" allow single LFs. I’ve never noticed an HTTP RFC say you can send headers without the full CRLF sequence, but maybe I missed something.

                                                    https://datatracker.ietf.org/doc/html/rfc2616#section-19.3 https://datatracker.ietf.org/doc/html/rfc9112#section-2.2
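
                                                    For reference, RFC 9112 section 2.2 lets a recipient recognize a single LF as a line terminator and ignore any preceding CR, and says a recipient of a bare CR must either treat the element as invalid or replace each bare CR with SP. A rough Python sketch of that recipient behaviour (the function name and flag are hypothetical; it covers line splitting only, not what senders must emit):

```python
def split_rfc9112(block: bytes, replace_bare_cr: bool = True) -> list[bytes]:
    """Return the non-empty lines of `block`, accepting LF as a terminator and
    ignoring a CR immediately before it. Any other (bare) CR is replaced with
    SP, or rejected when `replace_bare_cr` is False."""
    segments = block.split(b"\n")
    lines = []
    for index, line in enumerate(segments):
        if index < len(segments) - 1:  # this segment was terminated by an LF
            line = line.removesuffix(b"\r")
        if b"\r" in line:
            if not replace_bare_cr:
                raise ValueError(f"bare CR in {line!r}")
            line = line.replace(b"\r", b" ")
        if line:
            lines.append(line)
    return lines


print(split_rfc9112(b"A:B\nC:D\r\n"))  # [b'A:B', b'C:D']
print(split_rfc9112(b"A:B\rC:D\r\n"))  # [b'A:B C:D']
```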

                                                      Sure. HTTP/1.1 isn't the only network protocol, though, IETF standardization or otherwise.

                                                      For SMTP (which this subthread started with):

                                                         In addition, the appearance of "bare" "CR" or "LF" characters in text
                                                         (i.e., either without the other) has a long history of causing
                                                         problems in mail implementations and applications that use the mail
                                                         system as a tool.  SMTP client implementations MUST NOT transmit
                                                         these characters except when they are intended as line terminators
                                                         and then MUST, as indicated above, transmit them only as a <CRLF>

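
                                                      On the sending side, that requirement plus the dot-stuffing (transparency) rule from RFC 5321 could be sketched in Python as below. It normalises CRLF, bare CR, and bare LF to CRLF, so a line holding a lone "." can only appear as the real end-of-data marker. The function name is hypothetical, and encoding to UTF-8 assumes the server accepts 8-bit data or the text is plain ASCII.

```python
import re


def smtp_data_payload(text: str) -> bytes:
    """Body for the SMTP DATA phase: every line ending becomes CRLF, lines that
    start with "." are dot-stuffed, and the result ends with the "." marker."""
    lines = re.split(r"\r\n|\r|\n", text)
    if lines[-1] == "":
        lines.pop()  # the text already ended with a line break
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return ("".join(line + "\r\n" for line in stuffed) + ".\r\n").encode("utf-8")


print(smtp_data_payload("Hi\n.\nBye"))  # b'Hi\r\n..\r\nBye\r\n.\r\n'
```
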
                                                        Ah, this is a subthread about HTTP specifically - didn't notice. Explains why you focused on the IETF too. Nevertheless, my points I believe still all stand.

                                                        As for HTTP or any other protocols' definitions go, I'd rather not join in on that back and forth. I'd imagine it's well defined what's expected. Skim reading RFC-2616 now certainly suggests so.

                                                          None of this is as clear as anyone wants it to be. If standards _could_ be completely formally described, it would be an entirely different world. I did quite a bit of work implementing draft standards in the IETF, and at the end of the day the standard is the best we can make it, but for non-trivial things, good luck actually implementing it without something to test against or a reference implementation.

                                                          That's the context in which Postel's law makes absolute sense. Not that you should forgo any sanity checking, or attempt to interpret garbage or make up frame boundaries; but when there is a potential ambiguity, and you can safely tolerate it, then it's really helpful for you to do so.

                                                      I've implemented a lot of protocols. Most implementations I've come across, for most protocols, are not strictly standards-conformant, for many reasons.

                                                      Big ones being:

                                                      * The standards are often not detailed enough, or contain enough loose verbiage that there are many ways to understand how to implement some part, yet those ways are not interoperable.

                                                      * Many protocols allow vendor specifications in such a way that 2 implementations that are 100% compliant won't interoperate.

                                                      * Many protocol implementations interoperate quite well by converging on behavior that isn't specified in any standard (often to the surprise of people who haven't read the relevant standards).

                                                      At least this is my experience for IETF RFC standards.

                                                        I'm aware of these factors, wasn't trying to suggest that the practice doesn't differ from the theory. What I was more going for was to highlight that the goal should be to primarily try and have these eventually converge, preferably sooner than later, not trying to strongarm the practice side and wait for the standards body in question to wake up one day and decide to amend the standard. That might give the impression of suddenness, but the core issue remains unsolved that way.

                                                        Usually when there's a high disparity between the "de jure" and the "de facto", it's due to a discrepancy in the interests and the leverage, resulting in a breakdown in communication and cooperation. Laying into either then is a bandaid attempt, not a solution. It's how either standard sprawl starts, or how standards bodies lose relevance.

                                              > I'm hoping this is satire.

                                              Me too. It's one thing to accept single LFs in protocols that expect CRLF, but sending single LFs is a bridge too far in my opinion. I'm really surprised most of the other replies to your comment currently seem to unironically support not complying with well-established protocol specifications under the misguided notion that it will somehow make things "simpler" or "easier" for developers.

                                              I work on Kestrel which is an HTTP server for ASP.NET Core. Kestrel didn't support LF without a CR in HTTP/1.1 request headers until .NET 7 [1]. Thankfully, I'm unaware of any widely used HTTP client that even supports sending HTTP/1.1 requests without CRLF header endings, but we did eventually get reports of custom clients that used only LFs to terminate headers.

                                              I admit that we should have recognized a single LF as a line terminator instead of just CRLF from the beginning like the spec suggests, but people using just LF instead of CRLF in their custom clients certainly did not make things any simpler or easier for me as an HTTP server developer. Initially, we wanted to be as strict as possible when parsing request headers to avoid possible HTTP request smuggling attacks. I don't think allowing LF termination really allows for smuggling, but it is something we had to consider.

                                              I do not support even adding the option to terminate HTTP/1.1 request/response headers with single LFs in HttpClient/Kestrel. That's just asking for problems because it's so uncommon. There are clients and servers out there that will reject headers with single LFs while they all support CRLF. And if HTTP/1.1 is still being used in 2050 (which seems like a safe bet), I guarantee most clients and servers will still use CRLF header endings. Having multiple ways to represent the exact same thing does not make a protocol simpler or easier.

                                              [1]: https://github.com/dotnet/aspnetcore/pull/43202

                                                LF only? Huh.

                                                In its original terms for printing terminals, carriage return might be ambiguous. It could mean either "just send the print head to column zero" or "print head to 0 and advance the line by one". The latter is what typewriters do for the Return key.

                                                But LF always meant Line Feed, moving the paper but not the print head.

                                                These are of course wildly out of date concepts. But it still strikes me as odd to see a Line Feed as a context reset.

                                                  >The latter is what typewriters do for the Return key.

                                                  Minor correction: mechanical typewriters do not have a Return key, but they have both operations (line feed, as well as carriage return).

                                                  The carriage return lever is typically rigged to also do line feed at the same time, by a preset amount of lines (which can be set to 0), or you can push the carriage without engaging line feed.

                                                  Technically, the lever would do LF, and pushing on it further would do CR (tensioning the carriage spring).

                                                  It is, however, true that most of the time, the users would simply push the lever until it stops without thinking about it, producing CRLF operation —

                                                  — and that CR without LF was comparatively rare.

                                                  From a pure protocol UX perspective, it would make sense IMO to have a single command for (CR + LF) too, just like the typewriter effectively does it (push the lever here to do both at once).

                                                  It seems weird that the protocol is more limited than the mechanical device that it drives, but then again, designers probably weren't involved in deciding on terminal protocol specs.

                                                >> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

                                                It's not satire and it's not just trying to make a point. It's trying to make things simpler. As he says, a lot of software will accept input without the CR already, even if it's supposed to be there. But we should change the standard over time so people in 2050 can stop writing code that's more complicated (by needing to eat CR) or inserts extra characters. And never mind the 2050 part, just do it today.

                                                  Ignoring established protocols doesn't make things simpler. It makes things vastly more complicated.

                                                  Let's absolutely fix new protocols (or new versions of existing protocols). But intentionally breaking existing protocols doesn't simplify anything.

                                                    Yes. We all know how to do this. You know that API version thingy. I agree to drop the carriage return when not needed but do it in future protocols.

                                                    Obviously IPv6 shows you need to be patient. Your great grandkids may see a useless carriage return!

                                                    Windows doesn't help here.

                                                      Versioning provides people with capability for change management, but won't perform it on their behalf. Who knew.

                                                    It seems to me the author is not suggesting to update the protocols themselves but rather to stop sending them CR even if the spec requires it, and to patch the corresponding software so it accepts simple newlines.

                                                      > Why intentionally introduce potential bugs for the sake of making a point?

                                                      It seems spiteful, but it strikes me as an interesting illustration of how the robustness principle could be hacked to force change. It’s a descriptivist versus prescriptivist view of standards, which is not how we typically view standards.

                                                        FYI, Sendmail accepts LF without CR, but Exchange doesn't.

                                                          …how very in character for each of them!

                                                          At least for CSV, there's a divergence between usage in practice and the spec. The spec requires CRLF, but all of the commonly used tools I've encountered for reading and writing CSVs can read files with CR, LF, or CRLF line endings, and when writing CSVs they'll default to either LF or platform-specific line endings. (Even Excel for Mac doesn't default to CRLF!) I think this divergence is bad and should be fixed.

                                                          But IMO the right resolution is to update the spec so that (1) readers MUST accept any of (CR, LF, CRLF), (2) writers MUST use one of (CR, LF, CRLF), and (3) writers SHOULD use LF. Removing compatibility from existing applications to break legacy code would be asinine.
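
                                                          As an illustration of that reader/writer split, Python's standard `csv` module already behaves roughly this way: its reader recognises CR or LF as end-of-line when the input is opened with `newline=""`, while its writer defaults to CRLF unless you pass `lineterminator`. A minimal sketch (the helper names are hypothetical):

```python
import csv
import io


def write_csv_lf(rows: list[list[str]]) -> str:
    """Write rows with LF record endings; csv.writer's default is CRLF."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def read_csv_any(text: str) -> list[list[str]]:
    """Read records ending in CR, LF or CRLF (even mixed in one input)."""
    # newline="" passes line endings through untranslated so the csv module
    # can handle them itself, including newlines inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


print(read_csv_any("a,b\r\nc,d\ne,f\rg,h"))  # [['a', 'b'], ['c', 'd'], ['e', 'f'], ['g', 'h']]
print(repr(write_csv_lf([["x", "y\nz"]])))  # 'x,"y\nz"\n'
```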

                                                            > What would be the benefit

                                                            Easy - being able to use a plain text protocol as a human being without having to worry if my terminal sends the right end of line terminator. Using netcat to debug SMTP issues is actually something I do often enough.

                                                              >What would be the benefit...

                                                              It is interesting that you ignore the benefits the OP describes and instead present a vague and fearful characterization of the costs. Your reaction lies at the heart of cargo-culting, the maintenance of previous decisions out of sheer dread. One can do a cost-benefit analysis and decide what to do, or you can let your emotions decide. I suggest that the world is better off with the former approach. To wit, the OP notes for benefits " The extra CR serves no useful purpose. It is just a needless complication, a vexation to programmers, and a waste of bandwidth." and a mitigation of the costs "You need to search really, really hard to find a device or application that actually interprets U+000a as a true linefeed." You ignore both the benefits assertion and cost mitigating assertion entirely, which is strong evidence for your emotionality.

                                                                What's your estimate for the cost of changing legacy protocols that use CRLF vs. the work that will be done to support those?

                                                                My intuition (not emotion) agrees with the parent that investing in changing legacy code that works, and doesn't see a lot of churn, is likely a lot more expensive than leaving it be and focusing on new protocols that over time end up replacing the old protocols anyways.

                                                                OP does not really talk about the benefit, he just opines. How many programmers are vexed when implementing "HTTP, SMTP, CSV, FTP"? I'd argue not many programmers work on implementations of these protocols today. How much traffic is wasted by a few extra characters in these protocols? I'd argue almost nothing. Most of the bits are (binary, compressed) payload anyways. There is no analysis by OP of the cost of not complying with the standard which potentially results in breakage and the difficulty of being able to accurately estimate the breakage/blast radius of that lack of compliance. That just makes software less reliable and less predictable.
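
                                                                  A back-of-the-envelope version of the bandwidth question, using an invented four-header request (not a measurement): switching from CRLF to LF saves one byte per line, including the blank line that ends the header block.

```python
def cr_savings(header_block: bytes) -> tuple[int, float]:
    """Bytes saved by sending LF instead of CRLF, and the share of the block."""
    saved = header_block.count(b"\r\n")
    return saved, saved / len(header_block)


# An invented request, not a measurement.
request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: demo/1.0\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)
saved, share = cr_savings(request)
print(saved, f"{share:.1%}")  # 5 6.1%
```

                                                                  Any message body would shrink that share further, since body bytes are unaffected by the header line terminator.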

                                                                  The cost is, if people start transitioning to a world where senders only transmit LF in opposition to current standards for protocols like HTTP/1.1 or SMTP (especially aggressively, e.g., by creating popular HTTP libraries without a CRLF option), then it will create the mental and procedural overhead of tracking which receivers accept LF alone vs. which still require CRLF. Switching established protocols is never free, even when there are definite benefits: see the Python 2-to-3 fiasco, caused by newer programs being incompatible with most older libraries.

                                                                    > you ignore the benefits the OP describes

                                                                    Funnily enough, the author doesn't actually describe any tangible benefits. It's all just (in my reading, semi-sarcastic) platonics:

                                                                    - peace

                                                                    - simplicity

                                                                    - the flourishing of humanity

                                                                    ... so instead of "vague and fearful", the author comes on with a "vague and cheerful". Yay? The whole shtick about saving bandwidth, lessening complications, and reducing programmer vexations are only ever implied by the author, and were explicitly considered by the person you were replying to:

                                                                    > You save a handful of bits at the expense of a lot of potential bugs.

                                                                    ... they just happened to be not super convinced.

                                                                    Is this the kind of HackerNews comment I'm supposed to feel impressed by? That demonstrates this forum being so much better than others?

                                                                      You're right that I didn't mention the supposed benefits in my response. But let's incorporate those benefits into new protocols rather than break existing protocols. I just don't see the benefit in intentionally breaking existing protocols.

                                                                      Thinking about it, using CR alone in protocols would actually make infinitely more sense, as that would allow LF to be used inside records. That would make many use cases much simpler.

                                                                      Just think about text protocols like HTTP: how much easier something like cookies would be to parse if CR were the terminating character, with each record separated by LF.

                                                                        ASCII already has designated bytes for unit, group, and record separators. That aside, a big drawback of using unprintable bytes like these is that they're more difficult for humans to read in dumps or type on a keyboard than a newline (provided newline has a strict definition: CRLF, LF, etc.).
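For comparison, a minimal Python sketch of those ASCII separators in use: US (0x1F) between fields and RS (0x1E) between records, so CR and LF can appear inside values untouched. The encode/decode helpers and the sample rows are made up for illustration. The printed repr also shows the readability drawback mentioned above: the separators only appear as \x1f and \x1e escapes.

```python
# ASCII control characters set aside for structuring data.
US = "\x1f"  # unit separator: between fields
RS = "\x1e"  # record separator: between records


def encode(records: list[list[str]]) -> str:
    """Join records (lists of field strings) with US between fields and RS between records."""
    for record in records:
        for field in record:
            if US in field or RS in field:
                raise ValueError("field contains a separator byte")
    return RS.join(US.join(record) for record in records)


def decode(data: str) -> list[list[str]]:
    """Split data produced by encode() back into records and fields."""
    if not data:
        return []
    return [record.split(US) for record in data.split(RS)]


rows = [["name", "note"], ["alice", "line one\nline two"], ["bob", "has\r\nCRLF"]]
wire = encode(rows)
print(repr(wire))  # separators are invisible unless escaped
```

The trade-off is the one the comment names: no escaping is needed for embedded line breaks, but the wire format is no longer comfortable to read or type by hand.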

                                                                          That is so backwards incompatible that it is never, ever going to fly.

                                                                        Ha, ha, ha! I love it. I believe the author is serious, and I think he's on to something.

                                                                        OP clearly says that most things in fact don't break if you just don't comply with the CRLF requirement in the standard and send only LF. (He calls LF "newline". OK, fine, his reasoning seems legit.) He is not advocating changing the language of the standard.

                                                                        To all those people complaining that this is a minor matter and the wrong hill to die on, I say this: most programmers today are blindly depending on third-party libraries that are full of these kinds of workarounds for ancient, weird vestigial crud, so they might think this is an inconsequential thing. But if you're from the school of pure, simple code like the SQLite/Fossil/TCL developers, then you're writing the whole stack from scratch, and these things become very, very important.

                                                                        Let me ask you instead: why do you care if somebody doesn't comply with the standard? The author's suggestion doesn't affect you in any way, since you'll just be using some third-party library and won't even know that anything is different.

                                                                        Oh bUT thE sTandArDs.

                                                                          They acted on these words, updating their HTTP server to serve just \n.

                                                                          => https://sqlite.org/althttpd/info/8d917cb10df3ad28 Send bare \n instead of \r\n for all HTTP reply headers.

                                                                          While browsers aren't affected, this broke compatibility with at least Zig's HTTP client.

                                                                          => https://github.com/ziglang/zig/issues/21674 zig fetch does not work with sqlite.org
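A hypothetical Python sketch (not Zig's actual code) of how a bare-\n reply can break a client: a parser that looks only for CRLF CRLF never finds the end of the header block, while one that follows RFC 9112's allowance that "a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR" handles both forms. The function names and sample responses are made up.

```python
def parse_head_strict(raw: bytes) -> tuple[bytes, list[bytes]]:
    """Find the header block by looking only for CRLF CRLF, and split lines on CRLF."""
    head, sep, _body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no CRLF CRLF found: header block never ends")
    status_line, *header_lines = head.split(b"\r\n")
    return status_line, header_lines


def parse_head_lenient(raw: bytes) -> tuple[bytes, list[bytes]]:
    """Treat LF as the line terminator and ignore a CR immediately before it."""
    lines = []
    for line in raw.split(b"\n"):
        line = line.removesuffix(b"\r")
        if line == b"":
            break  # empty line marks the end of the header block
        lines.append(line)
    if not lines:
        raise ValueError("empty response head")
    return lines[0], lines[1:]


bare_lf = b"HTTP/1.1 200 OK\nContent-Length: 2\n\nhi"
crlf = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"
print(parse_head_lenient(bare_lf))
print(parse_head_lenient(crlf) == parse_head_strict(crlf))  # True
try:
    parse_head_strict(bare_lf)
except ValueError as err:
    print("strict parser:", err)
```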

                                                                            > Let's make CRLF one less thing that your grandchildren need to know about or worry about.

                                                                            The struggle is real, the problem is real. Parents, teach your kids to use .gitattributes files[1]. While you're at it, teach them to hate byte order marks[2].

                                                                            1: https://stackoverflow.com/questions/73086622/is-a-gitattribu...

                                                                            2: https://blog.djhaskin.com/blog/byte-order-marks-must-diemd/
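On the BOM point, a small Python sketch of where a UTF-8 BOM bites: the plain "utf-8" codec keeps it as an invisible U+FEFF character at the start of the string, while Python's "utf-8-sig" codec drops one leading BOM if present. The sample bytes are made up.

```python
UTF8_BOM = b"\xef\xbb\xbf"

with_bom = UTF8_BOM + b"key=value\n"

# Plain UTF-8 decoding keeps the BOM as an invisible U+FEFF character...
plain = with_bom.decode("utf-8")
print(repr(plain))

# ...so a naive check on the first field quietly fails.
print(plain.startswith("key"))  # False

# "utf-8-sig" strips a single leading BOM and otherwise decodes like "utf-8".
cleaned = with_bom.decode("utf-8-sig")
print(cleaned.startswith("key"))  # True
```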

                                                                              Nope. Git should not mess with line endings: the remote repository not matching the code in your local clone can bite you when you least expect it. On Windows, one should disable the autocrlf misfeature (git config --global core.autocrlf false) and configure their text editor to default to LF.

                                                                              • layer8 19 minutes ago

                                                                                This is impractical in many situations, because tools that process build-source files (for example XML files that control the build, or generated source files) inherently generate CRLF on Windows. These are many, many, many tools, not just one’s text editor.

                                                                                The correct solution is to use .gitattributes.

                                                                                  3000% agree. I have been bitten endlessly by autocrlf. It is absolutely insane to me that anyone ever considered having your SOURCE CONTROL tool get/set different content than what's in the repo.

                                                                                • nsnshsuejeb 4 hours ago

                                                                                  The letters after the dot in my filename don't map 1 to 1 with the file format.

                                                                                  Counterpoint: Unix deciding on a non-standard line ending was always a mistake. It has produced decades of random incompatibility for no particular benefit. CRLF isn’t a convention: it’s two different pieces of the base terminal API. You have no idea how many programs rely on CR and LF working correctly.

                                                                                    It is a standard line ending. ANSI X3.4-1968 says:

                                                                                    10 LF (Line Feed). A format effector that advances the active position to the same character position on the next line. (Also applicable to display devices.) Where appropriate, this character may have the meaning “New Line” (NL), a format effector that advances the active position to the first character position on the next line. Use of the NL convention requires agreement between sender and recipient of data.

                                                                                    ASCII 1968 - https://www.rfc-editor.org/info/rfc20

                                                                                    ASCII 1977 - https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub1-2-197...

                                                                                      The first sentence is exactly what LF is in CRLF, and implies the necessity of CR. CR returns the cursor to the first character of the active line, LF moves it one line down without changing the horizontal position.

                                                                                      The second sentence is the UNIX interpretation of LF doing the equivalent of CRLF. But calling it a standard line ending when it's an alternative meaning defined in the standard as "requires agreement between sender and recipient of data" is a bit of a stretch. It's permissible by the standard, but it's not the default as per the standard.
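To make the two motions concrete, here is a toy Python model of a printing terminal (a simplification for illustration, not a model of any particular Teletype): CR only resets the column, LF only advances the line. Fed bare LFs, it produces the familiar "staircase".

```python
def render(stream: str, width: int = 20) -> list[str]:
    """Simulate a simple printing terminal where CR and LF are separate motions.

    CR returns the print head to column 0 on the same line;
    LF advances the paper one line without changing the column.
    """
    lines = [[" "] * width]
    row, col = 0, 0
    for ch in stream:
        if ch == "\r":
            col = 0
        elif ch == "\n":
            row += 1
            if row == len(lines):
                lines.append([" "] * width)
        else:
            if col < width:
                lines[row][col] = ch  # overstrike whatever was already there
            col += 1
    return ["".join(line).rstrip() for line in lines]


# Bare LF gives the staircase effect; CRLF gives ordinary lines.
for line in render("one\ntwo\nthree"):
    print(line)
for line in render("one\r\ntwo\r\nthree"):
    print(line)
```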

                                                                                      Yeah. It's weird how Unix picked LF given its love of terminals. CRLF is the semantically correct line ending considering terminal semantics. It's present in the terminal subsystem to this day, people just don't notice because they have OPOST output post processing enabled which automatically converts LF into CRLF.
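On Linux and macOS these flags can be inspected through Python's termios module. A POSIX-only sketch: ONLCR is the output flag that maps NL to CR-NL, and it only applies while OPOST (output processing) is set. Raw mode, for example via tty.setraw, clears OPOST, which is when a bare LF starts producing a staircase. The helper name is made up.

```python
import sys
import termios


def onlcr_active(oflag: int) -> bool:
    """True if output post-processing would turn LF into CRLF.

    ONLCR only takes effect when OPOST (output processing) is also set.
    """
    return bool(oflag & termios.OPOST) and bool(oflag & termios.ONLCR)


if __name__ == "__main__" and sys.stdout.isatty():
    # tcgetattr returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc].
    oflag = termios.tcgetattr(sys.stdout.fileno())[1]
    print("LF -> CRLF translation on this tty:", onlcr_active(oflag))
```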

                                                                                        I have always felt that Linux and its proponents assume by default that every decision it made was right and everything else, namely Windows, was wrong. I honestly feel Linux is orders of magnitude more complex. It is much easier, in my experience, to make software just work on Windows. (This is not to say Windows doesn't have bad decisions. It has many. All the major OSs are terrible.)

                                                                                      • perching_aix 5 hours ago

                                                                                        Well, at least the title is honest. Straight up asking people to break standards out of sheer conviction is a new one for me personally, but it's definitely one of the attitudes of all time, so maybe it's just me being green.

                                                                                        Can we ask for the typical *nix text editors to disobey the POSIX standard of a text file next, so that I don't need to use hex editing to get trailing newlines off the end of files?

                                                                                          People don't seem to mind when Chrome does it [0]. The response "standards aren't a death pact" stands out in particular.

                                                                                          [0] https://news.ycombinator.com/item?id=13860682

                                                                                            Death pact? Jeez. Standards simply prevent people from having to waste time debugging dumb issues that rightfully could have been avoided.

                                                                                              Might be just my personal impression, but I'm pretty sure Chrome is extremely notorious for abusing its market-leader position, including in this way. So I'm going to have to disagree there: from my view, people do mind Chrome and its implementation particularities quite a lot.

                                                                                                I think the parent is equally denigrating the situation.

                                                                                                Leaders choose the standards, especially as they approach monopoly.

                                                                                                Worse still: people will come out of the woodwork to actively defend the monopolist de facto standard producer.

                                                                                                  Not defending the producer, just making pragmatic choices!

                                                                                              Why would you want that?

                                                                                              All Unix text processing tools assume that every line in a text file ends in a newline. Otherwise, it's not a text file.

                                                                                              There's no such thing as a "trailing newline," there is only a line-terminating newline.

                                                                                              I've yet to hear a convincing argument why the last line should be an exception to that extremely long-standing and well understood convention.
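For illustration, POSIX defines a line as zero or more non-newline characters plus a terminating newline, and `wc -l` is specified to count newline characters, so an unterminated final line simply isn't counted. A minimal Python sketch of both rules (the function names are made up):

```python
def wc_l(data: bytes) -> int:
    """Count lines the way POSIX `wc -l` does: by counting newline bytes."""
    return data.count(b"\n")


def all_lines_terminated(data: bytes) -> bool:
    """True if the data is empty or its last line ends with a newline."""
    return data == b"" or data.endswith(b"\n")


print(wc_l(b"first\nsecond\n"), all_lines_terminated(b"first\nsecond\n"))  # 2 True
print(wc_l(b"first\nsecond"), all_lines_terminated(b"first\nsecond"))      # 1 False
```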

                                                                                                Yeah, I have no idea what the author is smoking. Deliberately breaking standards is simply not an acceptable solution to the problem, even if it were a serious problem (it's not).

                                                                                                  If there truly is a problem with existing protocols, propose and properly design a new one that can replace it. Then, if it is a technically superior solution, it should win in the long run.

                                                                                                  • nsnshsuejeb 3 hours ago

                                                                                                    No need. Just convince the king (e.g. Google for HTTP) to make a tweak in the next standard version.

                                                                                                  What's wrong with trailing newlines?

                                                                                                    Other than select software being pissy about it, not much. Just like how there's nothing wrong with CRLF, except for select software being pissy about that too.

                                                                                                      I do like concatenating files with cat, and if a file's final line doesn't end in a newline, the result is ugly.

                                                                                                      I know it's just me, but my worldview is that the world would be better if all editors had "insert final newline" behavior.
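A tiny Python sketch of the concatenation problem: `cat` copies bytes verbatim, so a missing final newline glues the last line of one file to the first line of the next. cat_terminated is a hypothetical workaround for illustration, not an option of any standard tool.

```python
def cat(*contents: bytes) -> bytes:
    """Concatenate file contents byte for byte, as `cat a b` does."""
    return b"".join(contents)


def cat_terminated(*contents: bytes) -> bytes:
    """Hypothetical helper: add a newline to any non-empty part that lacks one first."""
    return b"".join(c if not c or c.endswith(b"\n") else c + b"\n" for c in contents)


a = b"alpha\nbeta"  # last line has no newline
b = b"gamma\n"
print(cat(a, b))             # b'alpha\nbetagamma\n'
print(cat_terminated(a, b))  # b'alpha\nbeta\ngamma\n'
```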

                                                                                                        My problem is that what I input (and observe!) doesn't match what's persisted. Worse still, editors lie about it to me until I close the file and reopen it. And just to really turn the knife, various programs will then throw a fit that a character that I did not input and my editor lies about not being present, is present. I hope it's appreciable why I find this frustrating.

                                                                                                        I expect my editor to do what I say, not secretly(!) guess what I might have wanted, or will potentially want sometime in the future. Having to insert a newline while concatenating files is a chore, but a predictable annoyance. Having to hunt for mystery bytes, maybe less so.

                                                                                                          I have been using and programming Unix systems for almost 30 years and have not run into anything like what you are describing.

                                                                                                          What Unix program "throws a fit" when encountering a perfectly normal newline in the last line in a file?

                                                                                                            I haven't run into "Unix programs" throwing a fit per se. That's why I didn't write that.

                                                                                                            What I ran into issues with was contemporary software that's shipped to Linux, such as Neo4j, which expects its license files to have no newline at the end of the file, and will actively refuse to start otherwise.

                                                                                                            I have a feeling I'll now experience the "well that's that software's problem then" part of this debate.

                                                                                                        Yep. Select software being Unix command line tools.

                                                                                                        It makes writing parsers more complicated in certain cases because you can't tell without lookahead if a newline character should be treated as a newline or an eof.
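The separator-versus-terminator distinction is easy to see in Python: str.split("\n") treats newline as a separator, while str.splitlines() treats line breaks as terminators. Note that splitlines() also breaks on CR, CRLF, and several other Unicode line boundaries.

```python
terminated = "alpha\nbeta\n"  # every line ends with a newline (POSIX style)
unterminated = "alpha\nbeta"  # last line runs straight into EOF

# Treating "\n" as a separator: the final newline yields an extra empty field.
print(terminated.split("\n"))    # ['alpha', 'beta', '']
print(unterminated.split("\n"))  # ['alpha', 'beta']

# Treating newlines as terminators: both inputs produce the same lines.
print(terminated.splitlines())    # ['alpha', 'beta']
print(unterminated.splitlines())  # ['alpha', 'beta']
```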

                                                                                                          What? Which crazy non-binary format makes a distinction between CRLF(EOF) and just (EOF)? Apart from a plain text file, that is.

                                                                                                            I won't mention telnet because you don't use it, but in CSV and similar data it is quite a hassle to normalize the data. So instead of 2 possibilities, we now have 3 to detect.

                                                                                                              I don't get the CSV part. You can emit a new row after a line ending and on EOF with non-empty buffer. What's the tricky part or third option here? The crlf is never a part of the data.
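The approach described above ("emit a new row after a line ending and on EOF with non-empty buffer") can be sketched in Python and checked against the standard library's csv module, which accepts CRLF, bare LF, and a missing final terminator alike. rows_simple is a made-up name, and it deliberately skips quoting; RFC 4180 does allow line breaks inside double-quoted fields, which is exactly the case this sketch would get wrong.

```python
import csv
import io


def rows_simple(text: str) -> list[list[str]]:
    """Emit a row at each line ending, plus one at EOF if the buffer is non-empty.

    Accepts CRLF or bare LF. Deliberately ignores quoting: a quoted field that
    contains a comma or a line break would be split incorrectly.
    """
    rows, buf = [], ""
    for ch in text:
        if ch == "\n":
            rows.append(buf.removesuffix("\r").split(","))
            buf = ""
        else:
            buf += ch
    if buf:  # unterminated last line
        rows.append(buf.split(","))
    return rows


# Compare with the standard library's csv module on the three shapes discussed.
for sample in ("a,b\r\nc,d\r\n", "a,b\r\nc,d", "a,b\nc,d\n"):
    expected = list(csv.reader(io.StringIO(sample, newline="")))
    print(repr(sample), rows_simple(sample) == expected, expected)
```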

                                                                                                                I have never had any issues with this using a standards compliant CSV parser.

                                                                                                        Of all the stupid and obsolete things in standards we use to interoperate, CRLF is one of the least consequential.

                                                                                                          SMTP <https://datatracker.ietf.org/doc/html/rfc2821#section-4.1.1....> is pretty clear that the message termination sequence is CR LF . CR LF, not LF . LF, and disagreements in this spot are known to cause problems (including undesirable message injection). But then enough alternative implementations that recognize LF . LF as well are out there, so maybe the original SMTP rules do not matter anymore.
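A deliberately simplified Python sketch of that disagreement. It ignores dot-stuffing and the SMTP command exchange, and the lenient receiver is hypothetical: a server honoring only CR LF . CR LF (the end-of-data sequence in RFC 5321, as in RFC 2821 before it) and one that also accepts LF . LF see different message boundaries in the same byte stream, which is the core of SMTP smuggling.

```python
STRICT_END = b"\r\n.\r\n"  # end of mail data per RFC 5321: CRLF . CRLF


def split_strict(stream: bytes) -> list[bytes]:
    """Split on CRLF . CRLF only, as the SMTP specification requires."""
    return [part for part in stream.split(STRICT_END) if part]


def split_lenient(stream: bytes) -> list[bytes]:
    """Hypothetical lax receiver that also accepts LF . LF (bare LF) as end of data."""
    normalized = stream.replace(b"\r\n", b"\n")
    return [part for part in normalized.split(b"\n.\n") if part]


# The sender embeds "\n.\n" (bare LFs) inside what it claims is one message.
stream = b"first message\n.\nsecond, smuggled\r\n.\r\n"
print(split_strict(stream))   # [b'first message\n.\nsecond, smuggled']
print(split_lenient(stream))  # [b'first message', b'second, smuggled']
```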

                                                                                                          No mention of what happened the last time we mixed and matched line endings? https://smtpsmuggling.com/

                                                                                                          • deltaknight 5 hours ago

                                                                                                            Doesn’t this show that ignoring CR and only processing LFs is a good idea? If I’m understanding right (probably wrong), this vuln relied on some servers using CRLF only as endings, and others supporting both CRLF and LF.

                                                                                                            If every server updated to treat LF as the line ending, thereby supporting both types, this vuln wouldn’t happen?

                                                                                                            Of course, if there’s a mixed bag, then I guess this is still possible if your server only supports CRLF. At least in that scenario you have some control over the issue, though.

                                                                                                              Yes, if every server/middleware implemented parsing in the same way this kind of vulnerability wouldn't happen. Same goes for HTTP smuggling and other smuggling attacks.

                                                                                                              Unfortunately, asking more people to ignore the currently established standards makes the problem worse, not better.

                                                                                                            This article seems like it was written to troll people into a flame war. There is no such character as NL, and the article does not at all address that fact that the "ENTER" key on every keyboard sends a CR and not a LF. Things work fine the way they are.

                                                                                                              U+0085 is sometimes called NL (it is the standard in EBCDIC), but more often NEL in the ASCII world.

                                                                                                                > There is no such character as NL ...

                                                                                                                More specifically the Unicode control character U+000a is, in the Unicode standard, named both LF and NL (and that comes from ASCII but in ASCII I think 0x0a was only called LF).

                                                                                                                It literally has both names in Unicode: but LINEFEED is written in uppercase while newline is written in lowercase (not kidding you). You can all see for yourself that U+000a has both names (and eol too):
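For anyone who wants to check locally: Python's unicodedata.lookup() accepts the formal name aliases from the Unicode Character Database, and U+000A has no formal character name of its own, only aliases. A small sketch:

```python
import unicodedata

# U+000A has no formal character name, so name() raises ValueError...
try:
    unicodedata.name("\n")
except ValueError:
    print("U+000A: no formal name")

# ...but lookup() resolves its aliases (supported since Python 3.3).
for alias in ("LINE FEED", "NEW LINE", "END OF LINE"):
    print(f"{alias} -> U+{ord(unicodedata.lookup(alias)):04X}")
```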


                                                                                                                > and the article does not at all address that fact that the "ENTER" key on every keyboard sends a CR and not a LF.

                                                                                                                What a key on a keyboard sends doesn't matter, though. What matters is what gets written to files / what is sent over the wire.

                                                                                                                    ... $  cat > /tmp/anonymousiam<ENTER>
                                                                                                                    ... $  hexdump /tmp/anonymousiam
                                                                                                                    00000000  000a
                                                                                                                When I hit ENTER at my Linux terminal above, it's LINEFEED that gets written to the file. Under Windows, I take it CRLF still gets written to the file, as in the Microsoft OSes of yore (?).

                                                                                                                > Things work fine the way they are.

                                                                                                                I agree

                                                                                                                As an implementation detail, I assume many programs simply ignore the CR character already? Whilst of course many Windows programs (and protocols, as mentioned) still require CRLF, surely the most efficient way to make something cross-platform is to simply act on the LF part of CRLF; that way it works for both CRLF and LF line endings.

                                                                                                                The fact that CRLF and LF share the same control character is, in my eyes, a huge bonus for this approach to actually work. Simply make everything cross-platform and start ignoring CR completely. I’m surprised this isn’t mentioned explicitly as a course of action in the article; instead it focuses on making people change their understanding of LF into NL, which is an unnecessary complication that will cause inevitable bikeshedding around this idea.
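A minimal Python sketch of that "act on the LF, ignore the CR before it" approach (read_lines is a made-up name). RFC 9112 explicitly permits HTTP/1.1 recipients to do this for header lines, but as the smuggling discussions show, it is only safe when every parser in the chain agrees.

```python
def read_lines(data: bytes) -> list[bytes]:
    """Split on LF only, dropping one CR immediately before each LF.

    Accepts both CRLF and bare-LF input. A lone CR elsewhere is kept as data,
    and an unterminated final line is returned as-is.
    """
    parts = data.split(b"\n")
    tail = parts.pop()  # text after the last LF (b"" if data ends with LF)
    lines = [part.removesuffix(b"\r") for part in parts]
    if tail:
        lines.append(tail)
    return lines


print(read_lines(b"one\r\ntwo\nthree"))  # [b'one', b'two', b'three']
print(read_lines(b"a\rb\n"))             # [b'a\rb']: a lone CR is data, not a line break
```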

                                                                                                                  >> instead it focuses on making people change their understanding of LF in to NL which is as unnecessary complication that will cause inevitable bikeshedding around this idea.

                                                                                                                  Not really. In order to ignore CR you need to treat LF as NL.

                                                                                                                  • deltaknight 5 hours ago

                                                                                                                    Fair point, although I’d suggest that many programs already treat LF as NL (e.g. unix text files), so this understanding of the meaning of LF already exists in the world. If you’re writing anything generic/cross-platform, you have to be able to treat LF as NL. So there isn’t really a change to be made here.

                                                                                                                  > Stop using "linefeed" as the name for the U+000a code point.

                                                                                                                  Stop reinventing terms. It's literally standardized with the name "LF" / "line feed" in Unicode.

                                                                                                                  For extra fun, the original Mac OS used CR by itself to mean newline.

                                                                                                                    Of all the hills to die on. What an unbelievably silly one. CRLF sucks, suck it up. As many others have noted, there are millions of devices this idea puts in jeopardy for absolutely no reason. We should be reducing the exceptions, not creating them.

                                                                                                                      > Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply. Send only NL.

                                                                                                                      Insane. At first I thought it was an April 1st joke, but it's not.

                                                                                                                      Let's break everything because YES.

                                                                                                                        Indeed. Very strange to hear a break-the-world suggestion from a person leading a company famous for never breaking the world.

                                                                                                                        I'm kind of confused by this whole post.

                                                                                                                        I do understand the desire for simplification (let's ignore the argument of whether this is one), but...

                                                                                                                        > Nobody ever wants to be in the middle of a line, then move down to the next line and continue writing in the next column from where you left off. No real-world program ever wants to do that.

                                                                                                                        Is this true?

                                                                                                                          No, it's not true.

                                                                                                                          It was used for "graphics" on character-only terminals.

                                                                                                                          • numpad0 2 hours ago

                                                                                                                            Isn't CR without LF how CLI progress bars work?

                                                                                                                              He says there are good usages of CR, he only argues for getting rid of LF.
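The progress-bar use of CR can be sketched in Python (show_progress is a made-up helper): each update starts with CR so it overwrites the same line, and only the final write emits a newline.

```python
import io
import sys
import time


def show_progress(total: int, out=None, delay: float = 0.0) -> None:
    """Redraw a one-line progress bar in place, using CR instead of LF between updates."""
    if total <= 0:
        raise ValueError("total must be positive")
    out = sys.stdout if out is None else out
    width = 20
    for done in range(total + 1):
        filled = width * done // total
        # "\r" returns to column 0, so this write overwrites the previous bar.
        out.write("\r[" + "#" * filled + "." * (width - filled) + f"] {done}/{total}")
        out.flush()
        time.sleep(delay)
    out.write("\n")  # one final newline so the next output starts on a fresh line


# Capture into a buffer to see exactly what a terminal would receive.
buffer = io.StringIO()
show_progress(4, out=buffer)
print(repr(buffer.getvalue()))
```

Called as show_progress(50, delay=0.05) in a real terminal, the bar appears to fill in place on a single line.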

                                                                                                                            I'm not trying to be obtuse but I am actually confused how a modern machine correctly interprets CRLF based on the description in this post.

                                                                                                                            If a modern machine interprets LF as a newline, and the cursor is moved to the left of the current row before the newline is issued, wouldn't that add a newline _before_ the current line, i.e. a newline before the left most character of the current line? Obviously this isn't how it works but I don't understand why not.

                                                                                                                              Line feed is "move the cursor down one line". It's irrelevant what is currently on the line. These are printer/terminal control instructions, not text editing instructions.

                                                                                                                                If you are thinking of it being more like pressing "Home" then "Enter", it would seem that "Enter" actually works more like LFCR?

                                                                                                                                  Unless your editor is in auto-indent mode. ;)

                                                                                                                                FWIW, I actually find CRLF handy in a database export I work in --- it exports cells with multiple lines by using LF for the linebreaks --- I open it in a text editor, replace all LFs w/ \\ (so as to get a single line for each data record and to cause the linebreaks to happen in LaTeX), and it's ready for further processing.
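The same find-and-replace can be done with a regular expression. A Python sketch, assuming (as described) that records end in CRLF and in-cell line breaks are bare LF; join_cell_breaks and the sample export are made up for illustration.

```python
import re

# An LF that is NOT preceded by CR: assumed here to be an in-cell line break,
# while CRLF marks the end of a record.
BARE_LF = re.compile(r"(?<!\r)\n")
LATEX_BREAK = "\\\\"  # two backslash characters: LaTeX's line break


def join_cell_breaks(export: str) -> str:
    """Replace in-cell LFs with a LaTeX line break while leaving CRLF record endings intact."""
    # A function replacement avoids re's backslash processing in replacement strings.
    return BARE_LF.sub(lambda _match: LATEX_BREAK, export)


sample = "id,note\r\n1,first line\nsecond line\r\n"
print(join_cell_breaks(sample))
```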

                                                                                                                                  Nice. I think that's the most energized I've seen Richard Hipp on a topic.

                                                                                                                                    I wrote a command-line program to determine/detect the end-of-line format, tabs, BOM, and NUL characters


                                                                                                                                    Stand-alone binaries are provided for all major platforms.
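As a rough illustration of that kind of check (a hypothetical Python sketch, not that program's code or interface), counting CRLF, bare LF, bare CR, tabs, and NUL bytes and detecting a leading UTF-8 BOM:

```python
import sys

UTF8_BOM = b"\xef\xbb\xbf"


def eol_report(data: bytes) -> dict[str, object]:
    """Count line-ending styles, tabs and NUL bytes, and detect a UTF-8 BOM."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "bare_lf": data.count(b"\n") - crlf,  # LFs not part of a CRLF pair
        "bare_cr": data.count(b"\r") - crlf,  # CRs not part of a CRLF pair (classic Mac OS)
        "tabs": data.count(b"\t"),
        "nul": data.count(b"\x00"),
        "utf8_bom": data.startswith(UTF8_BOM),
    }


if __name__ == "__main__":
    # Usage: python eol_report.py FILE...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, eol_report(f.read()))
```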

                                                                                                                                      SQLite is a work of absolute genius. But every once in a while something comes along to remind us how weird its software background is. Fossil. The build system. The TCL test harness. And now this, a quixotic attempt to break 50+ years of text formatting and network protocols.

                                                                                                                                      Yes CRLF is dumb. No, replacing it is not realistic.

                                                                                                                                        Define "abolish."

                                                                                                                                        We could certainly try to write no new software that uses them.

                                                                                                                                        But last I checked, there are terabytes and terabytes of stored data in various formats (to say nothing of living protocols already deployed) and they aren't gonna stop using CRLF any time soon.

                                                                                                                                          Ridiculous! We need to develop 1 universal standard that covers everyone's use cases. Yeah!

                                                                                                                                            The article had some major gaffes. Teletypes never had a ball. The stationary platen models had type boxes and cylinders, but never balls.

                                                                                                                                              Not sure whether this changes anything about your critique, but note that the IBM 2741 terminal embedded a Selectric typewriter:

                                                                                                                                              > Selectric-based mechanisms were also widely used as terminals for computers, replacing both Teletypes and older typebar-based output devices. One popular example was the IBM 2741 terminal


                                                                                                                                                Well, it says right there, the 2741 replaced Teletypes. It wasn't a Teletype. (Not sure I'd call this a "major gaffe", though!)

                                                                                                                                              XKCD has graphically replied to this topic: https://xkcd.com/927/

                                                                                                                                                Can OP please tell me how to abolish CR while in raw mode? Did he forget about it, or am I just unimaginative?

                                                                                                                                                  Right, you don't need to search that hard for a device which interprets 0xA as a line feed, just set your terminal to raw mode, done.

                                                                                                                                                  But given the very first sentence:

                                                                                                                                                  > CR and NL are both useful control characters.

                                                                                                                                                  I'm willing to conclude that he doesn't intend A Blaste Against The Useless Appendage of Carriage Return Upon a New Line, or Line Feed As Some Style It, to apply to emulators of the old devices which make actual use of the distinction.

                                                                                                                                                  we should leave it for backwards compatibility and adopt U+0085 as the standard next line codepoint. and utf8 libraries could unofficially support every combination of 0A 0D as escape sequences.
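For what it's worth, some libraries already treat NEL as a line break. A quick Python check (Python 3.9+ for the type hint; `count_lines` is just an illustrative helper) shows `str.splitlines()` breaking on U+0085 and U+2028, while a naive split on LF and `bytes.splitlines()` (which only knows LF, CR, and CRLF) do not:

```python
def count_lines(text: str) -> tuple[int, int, int]:
    """Count lines in `text` under three different splitting rules."""
    unicode_aware = len(text.splitlines())               # LF, CR, CRLF, NEL, U+2028, ...
    lf_only = len(text.split("\n"))                      # naive split on LF
    byte_level = len(text.encode("utf-8").splitlines())  # bytes: LF, CR, CRLF only
    return unicode_aware, lf_only, byte_level

if __name__ == "__main__":
    samples = {
        "CRLF": "one\r\ntwo",
        "LF": "one\ntwo",
        "NEL": "one\u0085two",
        "LS": "one\u2028two",
    }
    for name, text in samples.items():
        print(name, count_lines(text))
```

That mismatch is roughly what adoption would run into: any code path that handles raw bytes or splits on LF would see a NEL-terminated file as one long line.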

                                                                                                                                                    If you'd like to break every system, and nearly every protocol, start abolishing arbitrary line endings that have been used for decades.

                                                                                                                                                    That will make things better.

                                                                                                                                                      Does anyone besides poorly designed Unix tools and Git actually get confused by any of this? I configure my editor to just use LF on whatever OS to appease Linux and configure Git to never mess with them. And in dealing with serial protocols, it's never an issue.

                                                                                                                                                          > Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply. Send only NL.

                                                                                                                                                          Now just go pound sand. Seriously. And you owe me 5 minutes of my life wasted on reading the whole thing.

                                                                                                                                                          My god, I would have thought all those “simplification” ideas would die off once you have 3 years of experience or more. Some people won’t learn.

                                                                                                                                                          P.S. Guess even the most brilliant people tend to have dumb ideas sometimes.

                                                                                                                                                            Conversely, I'd argue most brilliant people tend to have more dumb ideas than others, usually on oddly specific topics which most people would find inconsequential.

                                                                                                                                                              It's true. Smart people tend to have a lot of novel ideas, most of which are going to be retarded. Most people just have no ideas.

                                                                                                                                                            "Call to action"? My god, guy, get a grip. You're upset about some Unicode.

                                                                                                                                                              I could not possibly disagree with this more strongly or violently.

                                                                                                                                                              In short - shutup and deal with it. Is it an extremely mild and barely inconvenient nuisance to deal with different or mixed line endings? Yes. Is this actually a hard or difficult problem? No.

                                                                                                                                                              Stop trying to force everyone to break their backs so your life is inconsequentially easier. Deal with it and move on.
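For stored files, the cleanup this comment calls trivial really is small. A minimal Python sketch (the name `normalize_eol` is hypothetical) that converts mixed CRLF, bare CR, and bare LF to a single style:

```python
import re

# Alternation order matters: try CRLF first so it is replaced as one unit.
_EOL = re.compile(rb"\r\n|\r|\n")

def normalize_eol(data: bytes, eol: bytes = b"\n") -> bytes:
    """Rewrite every CRLF, bare CR, and bare LF in `data` as `eol`."""
    # A function replacement avoids re's backslash handling in the template.
    return _EOL.sub(lambda _m: eol, data)

if __name__ == "__main__":
    print(normalize_eol(b"a\r\nb\rc\nd"))
```

This only tidies data at rest; it says nothing about what a peer speaking a CRLF-delimited protocol will accept.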

                                                                                                                                                                Why do we _have to_ keep bringing this legacy baggage with us for the next decades though?

                                                                                                                                                                Allowing CRFL-less operation intentionally, especially in new implementations. Abusing protocol tolerance is (just a bit) to switch current ones. Should allow relatively gradual progress towards Less Legacy:tm: with basically no cost.

                                                                                                                                                                Not every change is "breaking your back" especially if you should be updating your systems anyways to implement other, larger and more important changes.

                                                                                                                                                                  Because it’s literally fine and a non-issue. Only whiny Linux babies cry about it. It’s trivial for tools to support. Trivial. Like this is easiest, least harmful baggage in the history tech debt baggage.

                                                                                                                                                                  There will always be tech debt. Always and forever. Burn cycles on one that matters.

                                                                                                                                                                    So what's the issue with getting rid of this debt slowly? It costs basically nothing, yet makes it cleaner for those in the future. Debts matter at a larger scale and the long run.

                                                                                                                                                                      They carried the debt. Why shouldn’t everyone else?

                                                                                                                                                                      Regarding this issue…I don’t think the author is advocating for patching standards. Just consider CR as deprecated and use it only for backward compatibility.

                                                                                                                                                                      I do it similarly. I don’t convert line endings but any new project uses LF irrespective of the OS and configured as such in the editor.

                                                                                                                                                                I feel it necessary to have an obligatory 'Would someone think of banking?' before we 'abolish' (however we eventually arrive at defining it) anything.

                                                                                                                                                                I mean, it is all cool to have this idea, but the real-world implications, where half the stuff hangs on a text file, don't appear to have been considered here.

                                                                                                                                                                For clarity's sake, I am not saying don't do it. I am saying: how will that work?

                                                                                                                                                                edit: spaces, tabs and one crlf

                                                                                                                                                                  Now convince Microsoft. It's really the legacy of DOS that keeps this alive.

                                                                                                                                                                    Even Notepad.exe supports LF only text files now.

                                                                                                                                                                    I think I can offer most reasonable compromise here. Decide upon on new UTF-8 code point. Have the use mandated and ignore and ban all end-points that do not use this code-point instead of CRLF or just LF alone.

                                                                                                                                                                      > Decide upon on new UTF-8 code point.

                                                                                                                                                                      Unicode has already done so (NEL): https://www.compart.com/en/unicode/U+0085

                                                                                                                                                                        So break everything.

                                                                                                                                                                          Which would need to be encoded in at least two bytes, at which point, why not just use CRLF?
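The byte counts are easy to check. In UTF-8, NEL (U+0085) encodes as C2 85, the same two bytes as CRLF, and U+2028 LINE SEPARATOR takes three (E2 80 A8). A small Python sketch (`utf8_sizes` is an illustrative name; `bytes.hex` with a separator needs Python 3.8+):

```python
# Encoded UTF-8 size of each line-break candidate discussed in this subthread.
CANDIDATES = {
    "LF": "\n",
    "CRLF": "\r\n",
    "NEL (U+0085)": "\u0085",
    "LINE SEPARATOR (U+2028)": "\u2028",
}

def utf8_sizes() -> dict:
    """Map each candidate name to its length in bytes when encoded as UTF-8."""
    return {name: len(s.encode("utf-8")) for name, s in CANDIDATES.items()}

if __name__ == "__main__":
    for name, s in CANDIDATES.items():
        encoded = s.encode("utf-8")
        print(f"{name:25} {len(encoded)} byte(s): {encoded.hex(' ')}")
```

So on size alone, NEL saves nothing over CRLF, and U+2028 costs one byte more.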

                                                                                                                                                                            You mean U+2028 LINE SEPARATOR?

                                                                                                                                                                              Perfect. So now we just need to start filing bug reports to any tool that does not support it instead of CRLF or LF alone.

                                                                                                                                                                                Oh, yet another option; my first thought was U+0085 NEXT LINE, as above.
