Comments Page - Literate programming: Knuth is doing it wrong (2014)

akkartik.name | Submitted by surprisetalk a year ago

fanf2 a year ago
There are at least two aspects to literate programming.
Knuth wrote TeX in Pascal, which has a couple of limitations that annoyed him:
- Pascal has no module system: you can’t break a program up into multiple files
- The order of presentation of a program is dictated by the needs of a single-pass compiler
The tooling that supports literate programming is mostly about overcoming these problems with Pascal. (Knuth also uses it to split up larger procedures in a manner that makes me think of very un-structured C macros.) Most other languages are more flexible so they have much less need for a technology like tangle/weave.
The opposite extreme is Literate Haskell, which doesn’t need any preprocessor. You just feed your TeX document to ghc and it picks the code out of the \begin{code} blocks. Or instead of Knuth style you can write in Bird style, where the file is a plain text document with code marked by > quote blocks.
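To make the two Literate Haskell conventions concrete, here is a minimal Python sketch of the extraction step. It is an illustration only, not how ghc actually does it; the function name `extract_code` and the sample documents are made up for this example.

```python
def extract_code(literate_source: str) -> str:
    # Keep only the code lines of a literate document.
    # LaTeX style: lines between \begin{code} and \end{code}.
    # Bird style: lines that start with "> ".
    code_lines = []
    in_code_block = False
    for line in literate_source.splitlines():
        stripped = line.strip()
        if stripped == r"\begin{code}":
            in_code_block = True
        elif stripped == r"\end{code}":
            in_code_block = False
        elif in_code_block:
            code_lines.append(line)
        elif line.startswith("> "):
            code_lines.append(line[2:])  # drop the "> " marker
    return "\n".join(code_lines)


# Hypothetical sample documents, one per convention.
latex_style = "\n".join([
    r"\section{Doubling}",
    "We double a number by adding it to itself.",
    r"\begin{code}",
    "double :: Int -> Int",
    "double x = x + x",
    r"\end{code}",
    "That is all.",
])

bird_style = "\n".join([
    "We double a number by adding it to itself.",
    "",
    "> double :: Int -> Int",
    "> double x = x + x",
    "",
    "That is all.",
])

latex_code = extract_code(latex_style)
bird_code = extract_code(bird_style)
```

Both documents yield the same two lines of Haskell; everything else is treated as prose and discarded.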
The other aspect of literate programming is organizing the code like a book. This is very difficult to do well. When I think of how I approach a codebase, it reminds me of textbooks that have a complicated diagram describing multiple suggested orders of reading (a chapter per module?), or the informal introduction to Algol 68 with its orthogonal table of contents. Ideally a program’s commentary should span both the explanatory and reference documentation quadrants.
What strikes me about Knuth’s literate programming is how hypertextual it is, albeit using printed cross-references and indexes. It seems to be in desperate need of interactive pixels. And an underlying programming language that has better support for Knuth’s preferred size of code fragment.
On balance I think modern tools achieve Knuth’s goals better than tangle/weave. Modern languages let you organize code much closer to its narrative structure, without WEB’s unhygienic scoping. Editors and IDEs and documentation browsers give you hyperlinks everywhere.
It’s then “just” a matter of writing the exposition. Less literate programming, more literate programmer.
- wiz21c a year ago
  I like your comment.
  But let's take Jupyter Notebook, for example. It's nice, but not as good as Knuth's stuff. Why? Because the notebook forces you to follow the evaluation order, whereas Knuth allows you to follow the order of your ideas. For everything else, Jupyter is simply much better. But fundamentally, Jupyter Notebook can't match the flexibility of Knuth's LP.
  So I'd say that JupyterNotebooks are really nice to explain a recipe but LP is better at explaining ideas (more like algorithms).
  However, presentations of algorithms usually have more to do with ideas and concepts. And thus one may wonder whether writing an LP document is even necessary, since the core ideas don't actually need code to be explained.
  So-so... :-)
  leephillips a year ago
  This is solved by the Pluto notebook, which makes presentation order independent of execution order. Only works for Julia, but shows that such a notebook is possible.
  https://lwn.net/Articles/835930/
  snops a year ago
  Marimo does this for python, by tracking dependencies.
  https://marimo.io/
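One way dependency tracking can decouple presentation order from execution order is sketched below. This is a rough illustration, not how Pluto or Marimo are actually implemented; the cell names and helper functions are hypothetical, and it assumes each variable is assigned in exactly one cell.

```python
import ast
from graphlib import TopologicalSorter


def names_defined_and_used(cell_source: str) -> tuple[set[str], set[str]]:
    """Return (names assigned, names read) anywhere in a cell."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(cell_source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used


def execution_order(cells: dict[str, str]) -> list[str]:
    """Order cell ids so each cell runs after the cells defining what it reads."""
    info = {cid: names_defined_and_used(src) for cid, src in cells.items()}
    definer = {}
    for cid, (defined, _) in info.items():
        for name in defined:
            definer[name] = cid  # assumption: one defining cell per name
    graph = {
        cid: {definer[n] for n in used if n in definer and definer[n] != cid}
        for cid, (_, used) in info.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Cells listed in presentation order: the conclusion comes first.
cells = {
    "conclusion": 'report = f"total is {total}"',
    "compute": "total = sum(data)",
    "setup": "data = [1, 2, 3]",
}

order = execution_order(cells)

# Run the cells in dependency order, sharing one namespace.
namespace = {}
for cell_id in order:
    exec(cells[cell_id], namespace)
```

The author can put the conclusion at the top of the page, and the cells still run as setup, compute, conclusion.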
  wiz21c a year ago
  Thanks, didn't know about that tool. Will try !
  zelphirkalt a year ago
  Do you happen to have any example that shows LP in Pluto? The site you linked does not show it, at least looking at the screenshots, I don't see anything like I would have in org-mode using org-babel.
  leephillips a year ago
  No, not specifically. But I think it’s pretty obvious how to put it to that purpose.
  wiz21c a year ago
  Thx! didn't notice that pluto had this feature (I used it like a Jupyter notebook, shame on me :-) )
  fanf2 a year ago
  WRT modern tooling, I was thinking more in terms of languages like Haskell that don’t constrain the order of presentation so much, or tools like rustdoc that embrace hypertext and allow the reader to choose their own adventure.
- PittleyDunkin a year ago
  It's one of two literate programming books I've read—the other one is PBRT, which provides about the same level of quality as the TeXBook IMO (I haven't read metafont).
  > On balance I think modern tools achieve Knuth’s goals better than tangle/weave. Modern languages let you organize code much closer to its narrative structure, without WEB’s unhygienic scoping. Editors and IDEs and documentation browsers give you hyperlinks everywhere.
  I just don't think this is a technical concern to begin with. It's true that you can jump around code a lot easier, but this doesn't make the issue of laying out the code as a linear narrative with interleaved code and text any easier of a task. I do think it's an excellent, excellent way to present code, though.
  WorldMaker a year ago
  I think "linear narrative" is exactly the hard part. Most programs don't have a linear narrative; they have a "hub-and-spoke" narrative of some sort: a central core, then a bunch of branches of "features" that may or may not interact. Part of that is because "do one thing" works well in the Unix philosophy but doesn't really describe things like business-oriented development. More interestingly, as Diataxis [0] perhaps suggests, learning about any project may necessarily be "two-dimensional" at the very least. Someone entirely new to a codebase is probably going to need a different "narrative" than someone familiar with it. Maintenance and new work are different narratives from the original design.
  Perhaps literate programming is so "hard" simply because we aren't able to nail down a single linear narrative.
  I don't know what the Diataxis of Literate Programming looks like, but it does wriggle with some ideas.
  [0] https://diataxis.fr/
  WillAdams a year ago
  In my own current project I've found that it's important to remember that there are at least two different audiences for the documentation:
  - the user who wants the bare minimum, perhaps a template and a brief discussion of each command, ideally wrapped up to be as friendly as possible
  - myself and other programmers who want the innermost workings explained in detail
  The typical user of TeX doesn't want _The TeXbook_, nor even a Literate version of Plain TeX (and why that doesn't exist is a different discussion), but rather lshort.pdf (and they _don't_ want the typeset .dtx source of latex2e --- I actually printed that out once, and have yet to read it, since I still need to compile a list of texts I need to read/concepts I need to understand in order to do so profitably).
- creer a year ago
  Yes the problem is all these different literate needs.
  There is a whole range of things we should want to document and encompass. From an outline idea or plan of how the software is planned to work and is currently implemented, all the way to more like programming journal or engineering notebook on the development, showing stuff that was tried and failed, record of performance experiments, all the way to day to day commits. Day to day commits is probably enough quantity that it will do fine with a separate system - but then should still have pointers or references in the engineering notebook aspect of the whole thing. And then of course multi-user by now. And for many software bases, this cannot be linked to "in order" execution like a python notebook. Execution is too variable, long, on-going and environment dependent. It's possible that what this ends up looking like is pairing "extensive in-code documentation" with a separate "overview narrative" and a separate "engineering journal" (with thoughts and rationales and test results pasted in or git-ted.)
  But I don't throw stones at "typesetting". Nowadays "mind-map" is probably more appropriate or free-form layout, and there is a lot to be said for throwing low cost graphical representations including napkin diagrams here and there in the documentation. If we are trying to make it easy on the programmers to provide all this input, then let's make it easy.
  (But then I object to the lack of "linearity, diff-ability, text-ability" of mind-map formats and my reaction to spreadsheets like one of the comments requests: "oh god no, let's not HIDE all this in countless tiny little boxes that must be opened one by one!" - but I would love a linear, text-based computable spreadsheet format.)
- Joker_vD a year ago
  > The order of presentation of a program is dictated by the needs of a single-pass compiler
  That never really made sense to me. All those "if"s, and "while"s, and "break"s, and "return"s jump to the not-yet-generated places in code just fine; a similar technique could be used for delaying the resolution of function/procedure calls as well.
  Now, generating initialized data/rodata sections is something that single-pass compilers do struggle with (that's why Pascal didn't have array literals), and it's understandable: the Modula compiler, which got them (or was it Modula-2?), had to hold all of that initialized data section in memory and then dump it to the disk only after the code-generation (and patching the offsets) was done. But dealing with not-yet-defined code labels? That's something you have to do anyhow.
  Someone a year ago
  > All those "if"s, and "while"s, and "break"s, and "return"s jump to the not-yet-generated places in code just fine; a similar technique could be used for delaying the resolution of function/procedure calls as well.
  Not quite. When compiling "if"s, and "while"s, and "break"s, a compiler will make use of the fact that code is structured. Because of that, a stack of addresses is sufficient to track them (similar to how Forth uses the return stack to compile such constructs).
  For returns, the compiler doesn’t resolve the target address; that happens at runtime.
  For function calls, a compiler would need a map mapping function names to addresses.
  Also, generating good error messages is harder if you allow forward references. You cannot generate an error before seeing the definition, even if, say you encounter two calls to foo before seeing its definition.
  Joker_vD a year ago
  Of course you can do all of this! After all, the sources of e.g. Wirth's Modula-2 single-pass compiler for the Macintosh (that generates native MC68000 code) have survived, and it doesn't use "stack of addresses": it uses "fixup()" calls when compiling conditions, loops, references to (imported) global variables etc. and, surely enough, calls to functions. It even has "CheckUDProc()" call at the end of the module-translation procedure to check whether there are any undefined procedures left in the symbol table.
  It would be entirely possible to treat calls to undefined functions as implicit forward declarations; he simply never chose to do so.
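The fixup approach described above can be sketched in a few lines of Python. This is a toy model, not Wirth's code: the class, its methods, and the instruction tuples are invented for illustration, and it ignores argument types entirely.

```python
class SinglePassEmitter:
    """Toy single-pass code emitter that backpatches forward calls."""

    def __init__(self):
        self.code = []     # emitted instructions, e.g. ("CALL", address)
        self.defined = {}  # procedure name -> entry address
        self.pending = {}  # procedure name -> code indexes awaiting an address

    def define_procedure(self, name):
        address = len(self.code)
        self.defined[name] = address
        # Patch every call that was emitted before this definition was seen.
        for index in self.pending.pop(name, []):
            self.code[index] = ("CALL", address)
        self.code.append(("ENTER", name))

    def emit_call(self, name):
        if name in self.defined:
            self.code.append(("CALL", self.defined[name]))
        else:
            # Forward reference: emit a placeholder and remember where it is.
            self.pending.setdefault(name, []).append(len(self.code))
            self.code.append(("CALL", None))

    def check_undefined(self):
        """Names called but never defined, checked once at end of module."""
        return sorted(self.pending)


emitter = SinglePassEmitter()
emitter.define_procedure("G")
emitter.emit_call("F")         # F is not defined yet: placeholder emitted
emitter.define_procedure("F")  # the placeholder is patched here
emitter.emit_call("G")         # backward call: address already known
emitter.emit_call("H")         # never defined

undefined = emitter.check_undefined()
```

After the run, the call to F holds F's real entry address and only H is reported as undefined. Since the sketch patches a single address per call, it says nothing about calls whose argument handling depends on the callee's declaration.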
  int_19h a year ago
  The problem with functions is that when you see something like this:
  F(X);
  there may be implicit conversions happening to X here depending on its type, and, crucially, the declared type of the argument - e.g. Integer to Real. This would be a separate function call in many cases, so it's not just a single address to patch - indeed, you don't even know the size of the code that will need to be inserted in advance.
  Joker_vD a year ago
  Yes, and the way that e.g. C famously handles it is that it treats it as X being implicitly int, if it's an integral expression that produces something shorter than long, and a double, if X is a floating-point-valued expression, and whatever happens to structs when invoking implicitly-declared functions.
  This, again, can be checked for the correctness at the end of the translation of a particular compilation unit, C just decided to not bother with this at all and leave it to the linker to figure out (even though it started its life as a language with 2-and-a-half passes compiler), and Pascal and its descendants decided to require either forward declarations, or using function pointers. I personally think that e.g.
  var F: proc(whatever): whatever;
  proc G() begin ... F(whatever); ... end G;
  proc F0(whatever): whatever begin ... G(); ... end F0;
  begin F := F0; end module.
  is a needless pessimization but e.g. Oberon does it this way.
- pragma_x a year ago
  > You just feed your TeX document to ghc and it picks the code out of the \begin{code} blocks.
  This is starting to remind me of Jupyter notebooks, as you are expected to have documentation (well-rendered markdown) blocks and dedicated code blocks. Now I wonder if notebooks were conceived with Knuth's vision in mind.
  kccqzy a year ago
  Yes Jupyter notebooks are inspired by Mathematica notebooks, and the documentation for the latter explicitly mentions literate programming.
PaulHoule a year ago
I spent a lot of time thinking about no/low code and one conclusion I came to was that there was "the program" and then there was the way it displayed. Consider
https://www.knime.com/why-visual-workflows
the topology of the connection between the blocks is essential to execute the program; that doesn't require coordinates for the blocks, but the visual editor does. The ideal tool has a clean separation between these. People are dimly aware of the "hairball" graph problem and looking for a visualization algorithm that banishes them:
https://blog.tomsawyer.com/untangle-the-hairball-with-bundle...
yet the real problem is that if your goal is to communicate you want to tell stories with visualizations and you need a visualization organized around a story -- and for each story you tell you need a different visualization.
I see the same thing with literate software. I can think of a number of interesting stories to tell about last month's chess program (e.g. "move generation", "alpha-beta search", for instance.) For a particular story I want to focus on certain things and completely eliminate other things. The same code might appear in more than one story. To make all this work there has to be a clean separation.
- WillAdams a year ago
  The fundamental question here is one which I don't think there is an agreed-upon answer for:
  >What does an algorithm look like?
  I am working on a rather visually-oriented tool, and while I did a fair bit of early development using BlockSCAD, and also have access to OpenSCAD Graph Editor, it hasn't made sense to show the algorithms visually because it's a lot of work making them suitably expressive.
  One almost wishes that the specialized library used for:
  https://www.youtube.com/watch?v=aVwxzDHniEw
  was published and widely used.
  Similarly, why can't we have more things such as:
  https://mathcs.clarku.edu/~djoyce/java/elements/elements.htm...
  mjochim a year ago
  The library developed and used by 3blue1brown [1] is open-source [2] and seems to fit the same use case. I don’t know about widely-used, though.
  [1] https://3blue1brown.com [2] https://github.com/ManimCommunity/manim
  PaulHoule a year ago
  An example I'll call out is an art show I saw by
  https://en.wikipedia.org/wiki/Mark_Lombardi
  a 'conspiracy theorist' who died mysteriously in 2000 at the age of 48. He would make large numbers of pencil sketches (50+) that started out as hairballs and gradually he'd try different layouts until they told a clear story. (At least some of the conspiracies, such as BCCI, were real, thus the quotes)
  The same kind of work is necessary if you want to reveal some network of relationships that is essential to understanding some technological system, rule base, etc.
  kayvulpe a year ago
  I cannot find any of his work in high-resolution but those diagrams are exhilarating. Thank you.
  PaulHoule a year ago
  As a kid I felt algebra >> geometry (like I want to divide an angle by three and why waste my time with a system that can't! sure you can learn what a system can and can't do but that can be taught more directly with examples from computing) so as much as I read about Elements in math books by the likes of Martin Gardner it struck me as serious malpractice that "Great Books" advocates wanted kids to read it. (It's better than reading Newton's Principia if you want to learn physics or calculus though...)
  I like what that site is trying to do but the upper levels don't communicate the affordances you would find if you drilled in. Also there is graph structure in Elements that I don't see visualized; also Elements uses a lot of weird vocabulary that would be a lot easier to deal with if it were hyperlinked to a glossary.
  I've been interested in old Asian texts like the https://en.wikipedia.org/wiki/Kojiki and https://en.wikipedia.org/wiki/Romance_of_the_Three_Kingdoms where I have, charitably, 5% reading comprehension of the language but could get a lot with the graph structure materialized (like the chain of succession from Amaterasu to the Emperor) and also would like to see the original text, plus human-generated English translation if available, LLM-based translations, links to properly resolved characters and words in the dictionary, etc. (Right now I am digging into about 800,000 images with Chinese language metadata with some crude tools, really just getting out named entities makes me tickled pink.)
  WillAdams a year ago
  I dunno, I've been including links to Euclid's Elements, Joyce's Java version at:
  https://willadams.gitbook.io/design-into-3d/2d-drawing
  and I think that the rigor which it imparts is a good thing (but ask me that again after I've finished reading Hilbert/Cohn-Vossen's _Geometry and the Imagination_ which I just ordered from the AMS) --- after that I need to read _Projective Geometric Algebra Illuminated_ by Eric Lengyel and hopefully out of all this I'll arrive at the understanding I need to finish up the next aspect of my current project (though I would accept recommendations on books on conic sections).
  tldr; I think Euclid should be included in math studies, but not as a sole text, more as an "ultimate authority" so that there is some commonality to logical processes and proofs.
- nathancahill a year ago
  Along the same lines, I quite like the regex visualizer (Railroad diagram): https://regexper.com/#%2F%5E%28%28%5Ba-f0-9%5D%7B32%7D%29%2B...
jasonpeacock a year ago
The problem with literate programming is that most people aren’t good writers, and they aren’t interested in developing their writing skills - they only want to code.
Already it’s pulling teeth just to get literate commit messages…
- lupire a year ago
  The other problem is that literate programming works well for code that has complicated ideas behind a small amount of code, but most business code is simple ideas behind a lot of boilerplatey code. The hard part is in making it all fit together, not explaining what each part means.
  exe34 a year ago
  it might be worth separating out the clever business-specific algorithms, etc. these can be done with literate programming and then the rest of the plumbing can be done as usual.
- mrweasel a year ago
  Side note, regarding commit messages: I have a few colleagues who will go through the commit message just as thoroughly as they do the code, and match up the code with the commit. Over the past few years that has taught me to write pretty decent commit messages, to the point where I go "That is nicely done" when encountering my own commits.
  Still can't write outside commit messages, but I guess that can be learned as well.
- kubb a year ago
  You know who is a good writer? LLMs. Imagine a model interrogating you about a piece of code and writing perfect documentation.
  JK, the LLM will get bad input and it will spit out bad output.
  flir a year ago
  LLMs tend to write classic "what, not why" comments/commit messages. The idea of the LLM interrogating the programmer for the "why" is interesting, though.
  Or maybe reading the ticket? (That might just be moving the problem somewhere else though).
  toxik a year ago
  I wonder if an LLM could make ”atomic commits” out of my N pending changes.
  nzach a year ago
  It almost can... You can use something like gptcommit to automatically create a commit message. But the results are pretty bad. It can't produce anything beyond placeholder/filler text.
  I don't need a message explaining that we introduced an if to return when i is greater than len(items). I want a commit explaining why it blew up in production after being in production for over a year. What changed? Did it have any other implications? Is there a ticket for this bug, or maybe a thread in Slack?
- hitchstory a year ago
  >Already it’s pulling teeth just to get literate commit messages…
  I usually push back on this because those commit messages almost never actually get read. It's an investment whose dividends are nebulous and hard to pin down.
  It's rare that I look at a commit, and it's even rarer that I read it and wish that it had a better message.
  There are all sorts of other documentation I routinely wish people put more effort into writing (comments, a "why" attached to every test, how to guides, tutorials), but rarely ever a commit message.
  fsmv a year ago
  If you wrote better commit messages you might look at them more because they're a lot more useful.
  At work I see it as explaining why the code is there so when people check the blame layer they can find out and not delete my code if my reason is still relevant.
  At home it helps a lot to write changelogs later when I do a release and it helps so much to see the last few commits when I pick a project up again after a month or two.
  hitchstory a year ago
  I mostly just message the person who wrote it if I have questions when I look at a commit.
  I suppose if I painstakingly write 1000 beautiful commit messages I could save myself from having that one conversation when somebody else has a question about one of those commits.
  moe_sc a year ago
  That doesn't sound future proof.
  People leave projects/companies, people forget.
  Code comments are also far from good. They have the same issue as duplicated code: comment and code age individually. Now you have to maintain both, and it's easy for them to diverge.
  Git commits are a snapshot of the codebase. Commit messages in them are pinned to a code version. Comments in commit messages are therefore always tied to the right version of code.
  hitchstory a year ago
  People leaving is a great reason to write literate tests and other kinds of docs - the kind people actually look for and want to read.
  If somebody asks a question that can only be answered by looking in a commit message, that usually represents a failure in one of those docs.
  kccqzy a year ago
  I'm an introvert. I don't want anyone to message me (or worse, set up a meeting with me) just to understand what my commit does.
  Furthermore writing is itself a way to enhance clarity of thinking. Very often during the process of writing out a commit message I realize something else in the commit is missing.
  norir a year ago
  Writing good commit messages can be good self-promotion. You may not realize who is watching the repo, and this is an easy way to differentiate yourself from your colleagues and make your work appear interesting and important.
  Also, I have found that writing commit messages often forced me to rework a poor solution whose weakness became apparent only when I tried to explain it. In other words, often the value of the commit message is not the message itself but rather the process that produced it. It's a bit like musical scales. Almost no one is performing scales at a concert but also almost no one is performing without scales as a part of their regular practice.
  hitchstory a year ago
  >can be good self-promotion. You may not realize
  When I said nebulous and hard to pin down this is kinda what I meant.
  Do you routinely monitor commit messages on adjacent teams' repos? I don't. I don't know anybody who does. When I want to know something about their code I ping them a Slack message, and vice versa.
  I don't think the CTO is reading commit messages either; they're too busy.
  I think it's good to ask when and why people actually do read commit messages to make sure what you write aligns with what they want to see.
  paulddraper a year ago
  Surely you do it often enough that an extra 90s would not be an undue cost?
  hitchstory a year ago
  I'd estimate I probably commit ~30-50 times a day. I dig into and read a commit message once every 3 months.
  At 90 seconds a commit, that's about an hour a day spent writing beautiful commits.
  That one hour a day could be substituted with one conversation "hey, why did you do xyz in [ linktocommit ]?" every 3 months.
  If you dont do these back of the envelope calculations in your head when trying to figure out if something is worth doing I highly encourage it.
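The back-of-the-envelope estimate above can be written out as follows, taking the figures from the comments at face value. The 65 working days per quarter is an assumption added here, not a number from the thread.

```python
SECONDS_PER_MESSAGE = 90   # extra time per commit message, as suggested above
WORKDAYS_PER_QUARTER = 65  # assumption: roughly 13 weeks of 5 working days


def minutes_per_day(commits_per_day: int) -> float:
    """Daily time spent on commit messages, in minutes."""
    return commits_per_day * SECONDS_PER_MESSAGE / 60


def hours_per_quarter(commits_per_day: int) -> float:
    """Time spent between two 'once every 3 months' commit lookups, in hours."""
    return minutes_per_day(commits_per_day) * WORKDAYS_PER_QUARTER / 60


low_minutes = minutes_per_day(30)   # 45.0 minutes a day
high_minutes = minutes_per_day(50)  # 75.0 minutes a day
low_hours = hours_per_quarter(30)
high_hours = hours_per_quarter(50)
```

So "an hour a day" is the midpoint of a 45 to 75 minute range. Under these assumptions that adds up to roughly 49 to 81 hours of writing per quarter, weighed against one conversation.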
  kragen a year ago
  Maybe you should be using `git stash` and `git stash pop` rather than commits. Or go back and squash your 30–50 commits into one or two, which might take ten minutes. What you're doing is better than not using version control at all, but only barely.
  hitchstory a year ago
  I do squash them where appropriate.
  I do use git stash where appropriate.
  These aren't nonobvious novelties, save perhaps to junior engineers.
  kragen a year ago
  Well, that is kind of what you sound like.
  Supermancho a year ago
  > Well, that is kind of what you sound like.
  This is what senior engineers do. Interrupted to switch tasks every 10 min. Make your good changes as a commit and move to the next task. Maybe you get back to it today, maybe not.
  kragen a year ago
  That's definitely not how Jeff Dean and Sanjay Ghemawat wrote MapReduce and Bigtable. I'm sure what you're saying is correct at many companies (I've seen a few), but it's a stupid policy, depriving them of the benefit of having senior engineers.
  paulddraper a year ago
  You make a new commit every 12 minutes for eight hours?
  Supermancho a year ago
  > You make a new commit every 12 minutes for eight hours?
  You only work 8 hours? Your changes are that big? These are bad faith questions.
  paulddraper a year ago
  These questions are not made in bad faith.
  hitchstory a year ago
  That sounds like a reasonable average, yeah. Some tiny changes take 45 seconds, some harder changes take 40 minutes.
  I find working in increments of working code that are as small as possible to be ideal.
  lanstin a year ago
  Do you not have code reviews?
- sparker72678 a year ago
  Seems like many devs don't even like to code. They just want to get paid.
  (Not suggesting there's something wrong with that, per se. But good luck getting someone who's just in it for the money to go above and beyond.)
dhosek a year ago
I used to do everything in WEB/CWEB back in the 80s/90s. My biggest difference from Knuth was that I always started with an outline of the program as my first section (or perhaps second after an introductory section talking about what the program was meant to do). This made sense both from an expository standpoint and from a development standpoint as I could then assemble the program piecewise by filling in the sections that I had outlined at the beginning and generally each block of code fit on a single page with its documentation.
Problems I ran into were (a) for Pascal WEB, it was hard to get away from Knuth’s string pool–based handling of strings, which I didn’t especially love, and (b) for CWEB, it made sense to have both the .c and .h files specified in the .cweb file, but this meant that a lot of the efficiencies of make, such as they are, would be broken, since the .h file would get updated every time I updated the .c file, forcing recompilation of other chunks of the code even though there were no actual changes. Perhaps a more intelligent version of ctangle which could screw with modification dates of files based on whether the code was actually changed would have helped, but…
That said, the weave output of TeX and Metafont does provide amazing documentation of the internals of how those programs work and they make for excellent reading. Alas, that sort of thing is hard to justify in contemporary business contexts.
- froh a year ago
  > Perhaps a more intelligent version of ctangle which could screw with modification dates of files based on whether the code was actually changed would have helped, but…
  yes, it would --- noweb did two things right: they created a helper utility, `cpif`, which checks if a tangled file has changed at all and leaves the existing file alone if there was no change.
  and they radically simplified the markup, which made it most accessible.
  and they created (third good thing) a modular architecture with a well-defined, easily parsable intermediate textual representation to pipe text and code chunks through whichever utilities you want, for syntax highlighting, indexing, whatever.
  and then, of all options, they chose the Icon language for a reimplementation. from awk and c, available anywhere, to iconoclastic Icon. bummer...
- taeric a year ago
  Your starting with a general outline of it feels exactly right, to me. I thought Knuth actually did that quite often, as well?
  In a sibling post, I describe it as how you would narrate the code to someone. You wouldn't go line-by-line in that. Instead, you would likely start by describing the general structure. Then, you'd start talking about bursts of changes.
  It can be difficult to explain, as often times this can be done by talking about the functions that you are adding. In which case, it doesn't look much different from many other environments. Once you get to the non-linear additions of code where you are scattering variables and such through multiple sections, though, it starts to really shine.
- metaed a year ago
  Regarding make, Noweb gets this right. When you edit the web source, the tangle step only writes new modification times to output files that actually changed.
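The idea behind leaving unchanged output files alone can be sketched in Python. This is not noweb's `cpif` itself, only an illustration of the behavior described; the function name `write_if_changed` and the file name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_if_changed(path: Path, new_text: str) -> bool:
    """Write new_text only if it differs from the file's current contents.

    Returns True when the file was (re)written. An unchanged file keeps its
    old modification time, so make does not rebuild its dependents.
    """
    if path.exists() and path.read_text() == new_text:
        return False
    path.write_text(new_text)
    return True


with tempfile.TemporaryDirectory() as tmp:
    header = Path(tmp) / "example.h"

    first = write_if_changed(header, "int answer(void);\n")
    os.utime(header, (1_000_000_000, 1_000_000_000))  # pin a known old mtime
    mtime_before = header.stat().st_mtime

    # Re-tangling with identical content must not touch the file.
    second = write_if_changed(header, "int answer(void);\n")
    mtime_preserved = header.stat().st_mtime == mtime_before

    # A real change is written as usual.
    third = write_if_changed(header, "int answer(int);\n")
```

Comparing whole-file contents is the simplest possible test. It would still rewrite a file whose only difference is in generated line-number directives.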
- kragen a year ago
  I feel like you could run ctangle in one directory and use a ten-line awk script to copy the files with different checksums into the directory where you run the main Make?
  dhosek a year ago
  One of the challenges is that cweb wrote #line directives to its output so a file might change, but not in a meaningful way.
- coliveira a year ago
  You could write to a separate header file and use a script to copy to the right place only when the header was modified.
wduquette a year ago
In my experience, code bases are best structured for navigability, i.e., so that you can find what you’re looking for. Literate programming is about narrative, telling the story of the code in a clear way so that you can build your mental model. These two needs are frequently orthogonal.
- lou1306 a year ago
  But that is the point of the article. Literate programming was built on the premise "what if you could have _one_ document that you could either typeset as an article/report _or_ compile as a program?", but then most implementations only cater to the second part of the equation and essentially default to "comments on steroids". Being able to expose pieces of code in a non-linear way would be a basic necessity, so that you can put the "interesting stuff" front and center and only focus on the minutiae (e.g., the #includes) later in the exposition.
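Exposing pieces of code in a non-linear order comes down to naming chunks and expanding references to them at tangle time. Below is a minimal Python sketch using noweb-style `<<name>>` references. The chunk names and the `tangle` function are illustrative, and it does no cycle detection.

```python
import re

# A line consisting only of optional indentation and <<chunk name>>.
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def tangle(chunks: dict[str, str], root: str) -> str:
    """Expand chunk `root`, replacing each <<name>> line with that chunk's text."""
    output = []
    for line in chunks[root].splitlines():
        match = CHUNK_REF.match(line)
        if match:
            indent, name = match.groups()
            # Recursively expand the referenced chunk, keeping the indentation.
            for expanded in tangle(chunks, name).splitlines():
                output.append(indent + expanded)
        else:
            output.append(line)
    return "\n".join(output)


# Chunks in exposition order: the interesting part first, the setup last.
chunks = {
    "main loop": "for item in items:\n    total += item",
    "program": "<<setup>>\n<<main loop>>\nresult = total",
    "setup": "items = [1, 2, 3]\ntotal = 0",
}

tangled = tangle(chunks, "program")

# The tangled text is an ordinary program in compiler order.
namespace = {}
exec(tangled, namespace)
```

The document can present "main loop" before "setup", while the tangled program still puts the setup first.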
- josh-sematic a year ago
  Firm agree. To me the best way to build a mental model of some code is to be able to quickly answer the questions that arise in my mind as I read it. The order in which these questions arise differ for each reader, and indeed for each intention that you approach the code with. There is therefore no one “perfect linear order” the code could be presented in and the best you can do is make it easy for the reader to construct their own reading order by being able to easily navigate around.
  WillAdams a year ago
  I find that Literate Programming and being able to read through a nicely typeset, hyperlinked document with an index and ToC and marginal callouts of variable and routine names helps quite a bit in this.
d--b a year ago
What's difficult is the tension between compact code and verbose code.
Compact code makes the higher levels of abstraction easier to read, while more verbose code makes lower levels of abstraction easier to read.
In large codebases, if you spend 10 lines describing an optimization for sorting stuff faster, you may lose the overall idea of what the function is doing. But if you don't, no one is going to understand that particular optimization. People will say that these should go in subfunctions, but having too many subfunctions is yet another problem that breaks code's linearity, making it also harder to read.
- WillAdams a year ago
  This gets discussed in great detail in Ousterhout's _A Philosophy of Software Design_:
  https://www.goodreads.com/book/show/39996759-a-philosophy-of...
  Like most things in life, it's a series of tradeoffs and a balancing act.
- BlueTemplar a year ago
  Isn't this mostly solved by a "collapse comment(s) / function(s)" text browsing feature ?
alganet a year ago
For today, I would focus on literate testing. The linear unit test is a better bottle for narrative content than the non-linear code.
In a similar line, I believe automatic documentation should be generated from tests, not the implementation. Then it's always up to date.
Text has many forms. Poems with many kinds of metrics, prose with all kinds of tones. Some forms are atemporal, some get old. None of it is "wrong", just different bottles to convey different ideas to different audiences.
- bluGill a year ago
  I disagree. The most important documentation is for people who don't want to read your code, just use it. Thus API documentation is important. Tests sort of do that, but they get into the wrong details - I don't want to read through a dozen different edge conditions for the first argument before you get to what the second argument does...
  alganet a year ago
  I said documentation generated from literate tests (this does not exist yet), not tests as the final documentation artifact. The result would be similar to API docs, but derived from a more authoritative resource (tests, not docblocks).
  > I don't want to read through a dozen different edge conditions
  Neither do I. You mean you want the happy path in a distinct scenario. That is a very common perception amongst both testers and documentation writers. Good test suites have the happy path distinct from the edge conditions as well as good docs.
  bluGill a year ago
  I don't think it is possible to usefully deliver what you want, but if you can do that I'm all for it. Even just having the code in my documentation checked to confirm that it compiles would be useful (which often means making assumptions, because I shouldn't have to put in boilerplate).
  alganet a year ago
  Python doctests[1] touch on that idea, but from another angle. They are useful and popular.
  clitest[2] is more closely related to what I'm aiming for, and it's very useful, but hard to translate to non-shell paradigms.
  I am sure it can be done.
  [1]: https://docs.python.org/3/library/doctest.html
  [2]: https://github.com/aureliojargas/clitest
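For readers who haven't used doctests, here is a small sketch of the idea using Python's standard `doctest` module. The `clamp` function is a made-up example, not something from the thread; the point is only that the examples in the docstring double as documentation and as tests, with the happy path kept apart from the edge conditions.

```python
import doctest


def clamp(value, low, high):
    """Limit value to the closed range [low, high].

    The happy path comes first, as API documentation would show it:

    >>> clamp(5, 0, 10)
    5

    Edge conditions follow, kept apart from the main example:

    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


if __name__ == "__main__":
    # Runs every example in this module's docstrings and reports failures.
    doctest.testmod()
```

If an example's output ever stops matching the code, `doctest.testmod()` reports the mismatch, which is what keeps this kind of documentation from going stale.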
Sylvain78 a year ago
Oscar-winning book about accurately rendering 3D scenes, written in literate programming style: https://www.pbrt.org/
- WillAdams a year ago
  I've tried to collect a list of books written thus at:
  https://www.goodreads.com/review/list/21394355-william-adams...
  (please excuse _A Philosophy of Software Design_ EDIT and the two _Structure..._ books --- they were just too influential on me to not list)
  I would be glad of any I have missed, or other such lists.
- magicalhippo a year ago
  By far the best programming book I've ever read, by orders of magnitude.
  I think the literate programming style fit this book well, but the main reason is simply that it tackles not just the theoretical side but also the practical side in great detail.
  In my experience a lot of programming books focus a lot on either and ignore the other, and that leads to frustration when trying to implement stuff in practice.
- lupire a year ago
  Fascinating.
  I think it would be much easier to read if it were formatted in 2-pane format, with the English and math on one pane, and the code on the side, aligned to match.
  Working from the source code of the program/book and the Literate Code renderer program, it wouldn't be so hard to write a variant transformation to lay out in side-by-side fashion.
  Possibly could need some extra markup code, though much could be inferred simply from "comment-followed-by-code implies comment-aside-code" rule.
  As it is, the inline code is a distraction from the text and math, and vice versa.
  WillAdams a year ago
  There shouldn't be any reason one couldn't use LaTeX to create such a typeset representation.
  Integrating the inline code is the responsibility of the writer.
  BlueTemplar a year ago
  The problem with (La)TeX is that it's designed for fixed layout documents.
  Great for print and slides, bad for most digital documents.
  WillAdams a year ago
  One can always just measure the current \textwidth (or other dimension) and make adjustments to typesetting based on that.
  BlueTemplar a year ago
  This sounds hacky, so probably will be slow and/or buggy.
  WillAdams a year ago
  It's no slower than any other calculation, and works reliably in my experience.
fjfaase a year ago
For 'Advent of Code' I have been using a kind of literate programming method based on Markdown files. I have written a parser that can read Markdown files with segments of C code and put all the segments in the correct order to be compiled. For an example, see: https://github.com/FransFaase/AdventOfCode2023/blob/main/Day...
I also used it as documentation for a parser I have been developing, which shows the literate programming style in a better way. The program can combine several Markdown files into a single C program. See: https://github.com/FransFaase/RawParser?tab=readme-ov-file#d...
- drivers99 a year ago
  That looks great. Found it in your IParse repo. I'm starting to work through "Write a C Compiler" by Nora Sandler and I think this could be very useful for me. Besides, I'm curious how it puts things in order and lets you mention functions multiple times, using "..." to skip what's already there, etc.
  fjfaase a year ago
  Look for the file MarkDownC.cpp and search for 'elipses', which stands for '...'. The code makes use of AbstractParseTreeCursor, which is a kind of smart pointer. (There is quite a bit of code that is commented out.) You can contact me through the email mentioned at the bottom of my website www.iwriteiam.nl
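As a rough illustration of the simplest part of this idea, the sketch below pulls fenced C blocks out of a Markdown document and concatenates them in document order. It is not how MarkDownC.cpp works: that program also reorders segments and handles '...' elisions, which this sketch does not attempt. The name `extract_code` is hypothetical.

```python
FENCE = "`" * 3  # a Markdown code fence, spelled indirectly to avoid nesting fences here


def extract_code(markdown: str, lang: str = "c") -> str:
    """Concatenate, in document order, every fenced block tagged `lang`."""
    chunks, current = [], None
    for line in markdown.splitlines():
        stripped = line.strip()
        if current is None:
            if stripped == FENCE + lang:
                current = []  # opening fence: start collecting
        elif stripped == FENCE:
            chunks.append("\n".join(current))  # closing fence: finish block
            current = None
        else:
            current.append(line)
    return "\n\n".join(chunks) + "\n"
```

Blocks tagged with any other language are skipped, so prose and unrelated snippets never reach the compiler.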
camel-cdr a year ago
> There's a fundamental problem with generating a beautifully typeset document for a codebase: it's dead. [...]
> You can't work with it, you can't try to make changes to it to see what happens, and you certainly can't run it interactively.
I think an aspect where "classical" literate programming excels is in the form of printed educational books.
A great example of this is LCC "A Retargetable Compiler for ANSI C" by David R. Hanson.
Reading about how to write compilers in theory is fine, but it's very nice to learn from a full working example implementation.
- bluGill a year ago
  True, and Knuth is an author of books, so it makes sense for him to write code for books when writing code for books. However, code that isn't bound into a book has different needs and should be different. I've printed out code before (the floor gives a lot more space to spread out text than even the largest monitor, and taking a pen to circle useful parts is helpful) - but I recycled that paper as soon as the bug was found/fixed. In that case I'm never interested in how the code is documented/supposed to work in the author's imagination - I care about what it actually does.
- zelphirkalt a year ago
  > > There's a fundamental problem with generating a beautifully typeset document for a codebase: it's dead. [...]
  I think this is false. The typeset document should only be the output of a rendering/compilation step, of a document that exists in plain text. Usually that is the case. Why would one throw away the source once the typeset document is created? One would then simply edit the source, just like we do with other code that runs.
  > > You can't work with it, you can't try to make changes to it to see what happens, and you certainly can't run it interactively.
  Yes you can. Just normal compile cycle. What am I missing here?
- WillAdams a year ago
  Note that it is possible to apply a [draft] option to some LaTeX documentclasses so as to avoid that appearance and encourage folks to treat as not-yet final/finished.
taeric a year ago
I used to agree with this post's general premise, but I have come to think it is wrong.
The idea seems to be that a literate program would be ordered such that a reader would not need any familiarity with general boilerplate of the language that is being presented. But there is no real benefit for that. Indeed, it makes everything so much harder if you are trying to have meaningfully original narrative for every single line.
Instead, people should know roughly what the outline of a C program is if they are going to try and read a C program. Regardless of if it is done literately or not. To that end, having some generic imports at the top of a file that has a scattering of globals the author typically uses makes a ton of sense.
The idea, restated, is to allow you to narrate code. So, think of how you would approach reading anyone's source code that wasn't reformatted in this way. You'd take a look at the general outline of the file. Probably take a quick peek at the basic imports. If there are a common set of top level variables you expect to see, check for those real quick. Then, start trying to find the narrative of the code.
flerchin a year ago
It's funny: the very things the author decries are what IDEs do for us automatically. Hide the imports, expose the abstractions when needed.
jillesvangurp a year ago
I wrote a small kotlin framework to help me write documentation that has lots of code samples (https://github.com/jillesvangurp/kotlin4example) that might be of interest to people maintaining kotlin libraries.
My library tries to enable literate programming in Kotlin via a Kotlin DSL that makes it easy to use markdown in multi-line string literals. You use the DSL to write your documentation as a Kotlin file that generates markdown that you can save to a file when you run it (from a test typically).
The key feature that enables literate programming is an example lambda function. This makes it easy to embed example kotlin code in the documentation. The library figures out how to extract the code block you pass to this function from the source code and includes it as a markdown source code block in the markdown output. Example blocks are also runnable (optional) and of course have to compile. If you run them, you can optionally capture their output as well and render that in the documentation. Additionally it captures the block return value and allows you to do things with that.
Of course you can also include existing markdown files, create links to files in a (public) github repository or pull in source code examples from existing source files.
Most of this isn't really novel. But I haven't really seen anything like the example lambda function in other tools. And this is something that might also work with other languages (Ruby maybe?). Although it does rely a bit on reflection and classloader magic to figure out the source code that corresponds with the .kt file in which the example blocks are located. At runtime it tries to figure out the beginning and end of those blocks and transforms those into markdown source blocks.
As far as I know, I'm the only user of this library so far. But since I think it is kind of nice, I thought I'd mention it here.
It's not perfect but I've documented a few of my open source kotlin libraries with this. The most significant one is jillesvangurp/kt-search, which has a lot of documentation at this point.
begueradj a year ago
A couple of months ago, I read a comment here on HN where the OP said that wherever he worked, his colleagues thanked him because he always used the literate programming approach, hence his code is easier to understand and follow. I wish I could find that comment.
- lupire a year ago
  Have you ever seen a comment of someone saying they have a coworker who writes in the Literate style, and they appreciate it?
  Bias may be a factor.
quantadev a year ago
It seems to me like in 20 to 50 years humans will still be writing computer code, but hopefully some IDE-like 'block-based' editor approach similar to Jupyter Notebooks will have taken over. Block-based editors let you mix in different "types" of artifacts like documentation, code, images, examples, scripts. It will need to be some kind of 'tree-like' structure with expandable nodes like a file system I think, so that for example underneath every method implementation you can have its test cases right there with it, sort of 'inline'.
I've also thought that eventually even web-browsers will be replaced with some system like this too, where everything is "typed" (like Semantic Web), so that both machine parsers, AI systems, and humans can equally well consume and understand the content.
The problem is 'momentum'. People tend to build new capabilities on top of the old capabilities like layers of an onion rather than rebuilding something new from scratch like a Jupyter Notebook-like "do everything" browser/IDE/editor.
mtrovo a year ago
> You can't work with it, you can't try to make changes to it to see what happens, and you certainly can't run it interactively. All you can do, literally, is curl up with it in bed. And promptly fall asleep.
I totally agree with this. Generating static typeset docs essentially leaves them "dead" because they can't be poked at, run, or updated in real time. That's why I'm a big believer in the value of tests and instrumentation, which stay alive alongside the code. In practice, we don't need every aspect of a project (e.g., code, tests, docs, and instrumentation) to be simultaneously visible at all times, and I think part of the complexity of onboarding into a new project is the overwhelming feeling you get the first time you see a piece of code with all its tests, all comments, and all its instrumentation thrown at you like you should pay attention to everything. When we're refactoring, we focus on a particular chunk of code and its tests; when we're debugging, we need instrumentation. For everything else, we can loop back later.
We already have powerful ways to merge issues, commits, and discussions, but we still rely on massive documentation dumps that go stale fast. A more adaptive approach (perhaps using an LLM to generate contextual help from version control) would allow us to focus on the core process of writing and verifying code. I especially like the idea of code that can "talk back" and give just the right amount of detail based on what we're trying to accomplish at the moment. Sometimes it suffices to have novice-friendly, high-level explanations; other times, we want to jump in and change the behaviour of a very specific feature across some specific files. Maintaining documentation for every possible scenario is a huge burden, which is why I think flexible, on-demand help could be a game changer.
cbrozefsky a year ago
Emacs org-mode literate programming allows for re-ordering of code, extracting to multiple files, and if using a suitable language, evaluation of fragments and interactive exploration and rendering of examples.
I really enjoy it with a lisp, like clojure.
bluGill a year ago
Literate programming fails because I'm a programmer and so it is telling me things I already know. I understand how to read code.
Literate programming makes sense if you are writing raw machine code, or assembly where you cannot always name something important. DoSomething() better do what the name says it does so you don't need a comment to say anything more about it. Look at all the examples - why are they saying anything about "#include <stdio.h>" - if you don't know what that line does then you have no business reading C code - go learn C and then come back. Sure you could read the literate programming comments without knowing C - but I cannot think of any reason anyone would ever do that.
Comments that tell me what the code cannot are very helpful. Telling me that this complex sequence of weird code is just the standard AES algorithm is helpful (I haven't seen AES code, but I've seen other encryption algorithms that are weird bit manipulations and I assume AES is the same), but even then if I need to touch that code I better refer to the AES documentation so don't write how AES works. (If it is a custom algorithm for your company maybe document how it works, but the code is not the place for that). The important part is the code can tell me what is happening but not why it is that way so comments should tell me why you are doing something.
- liontwist a year ago
  > Literate programming fails because I'm a programmer and so it is telling me things I already know.
  Code is just a medium. You’re supposed to know it. What you don’t know is the domain knowledge the code is solving, and that’s what’s written in the literate program, along with diagrams, math, etc to aid your understanding.
  bluGill a year ago
  Look again at the examples - very often it isn't that at all. My example - where Knuth himself wrote about "#include <stdio.h>" proves my point. If you don't know what stdio.h is about you are not competent to be reading this code in the first place.
  > What you don’t know is the domain knowledge the code is solving
  THIS IS UTTERLY FALSE! The company hired me to work on this code and trained me. Or I got interested in the problem and learned the domain knowledge already. If you don't have this basic knowledge you either won't read the code in the first place, or you can only make the most minimal basic contributions for the next few months while learning it. Unless this problem is very trivial there are lots of other source files and lots of other knowledge needed that is outside the scope of the current file.
  Write comments to remind the experts (which is you in 6 months!) of the tricky details that they dare not forget. Write comments to explain to people who are experts in a different area of your program how this works - but you can assume a lot of shared domain knowledge because they work on the same project.
  liontwist a year ago
  > THIS IS UTTERLY FALSE!
  So you can read a section of code and immediately know what it’s doing and why it’s written that way? And the decisions and context that led to it? And you have no further questions?
  Furthermore, you are familiar with all techniques your coworkers might employ?
  Wow!
  > Write comments to remind the experts
  What if I want images in comments or a math formula? What if I want to refer to another section of code?
  bluGill a year ago
  > So you can read a section of code and immediately know what it’s doing and why it’s written that way?
  No, but the documentation you have been advocating (at least so far here) wouldn't help either.
  > What if I want images in comments or a math formula?
  I don't have a good answer. I've seen some interesting ascii art. It looked cool, but it was out of date from what the code did and so needed to be deleted as I'm not enough of an artist to fix it.
  > What if I want to refer to another section of code?
  Use your IDE to go to the definition of the code in question (hopefully you can do this).
  I understand (or think I understand) your problem. I often have problems with code that I don't understand. I don't think literate programming helps though as nobody actually is writing documentation of the type that would answer the questions I really have.
  liontwist a year ago
  Sorry. I could have toned down my reply.
  > nobody actually is writing documentation of the type that would answer the questions I really have.
  That’s a choice though, right? The tools enable you to document in ways you haven’t before.
  The most valuable use case I have seen is code that’s not too long and is read a lot of times. That’s when getting every decision, diagram, etc in one place helps a lot of people.
  I have one particular case at work where every line of code has had several meetings about it to ensure the parameters are correct, etc.
- akkartik a year ago
  > I understand how to read code.
  Funny, I still have no idea how to do this after decades of trying. I constantly run into functions called doSomething that also do something else.
  LP is absolutely not about teaching the language while teaching a codebase, and this is something LP does get right independent of my criticisms here.
  bluGill a year ago
  > I constantly run into functions called doSomething that also do something else.
  You have to have some trust in your fellow coders that doSomething doesn't do something else and this is often false. Still in most cases it doesn't matter on a first read as doSomething is close enough to only doing that something. LP won't help here anyway. Documentation can be wrong just as much as the function name.
  You say LP is not about teaching the language, but the examples from Knuth himself are often documenting the language and not the codebase. This is a common problem with documentation.
  akkartik a year ago
  As people repeatedly say in trying to rebut my OP, it's important not to cherry-pick examples. I try to be careful to find the best examples when criticizing. In that spirit, here's a couple of examples from https://cs.stanford.edu/~knuth/programs.html
  https://cs.stanford.edu/~knuth/programs/hwtime.w is the very first, and most easy to justify introducing a language, but it doesn't do much of that. It does your thing of explaining stdio.
  A slightly more advanced example https://cs.stanford.edu/~knuth/programs/hull.w doesn't explain #includes, and doesn't describe the language as far as I can see/recall.
  So feel free to share examples where people document the language. But that doesn't feel like a big anti-pattern to me.
- globular-toast a year ago
  So you reckon you could learn how compilers work by reading nothing but the GCC source code? While I'm sure it's possible, I think it would take you 10x as long as it would if you had read a few books about compilers first.
  bluGill a year ago
  I'm saying LP is the wrong way to write a book about how compilers work. LP might be useful for a toy example compiler as part of the how compilers work book, but that is it. Reading an LP version of GCC is the wrong way to learn how compilers work - once you see one (or maybe 10) optimization you know how they work and can just read source code for the rest - I don't know how many different optimizations gcc has, but I'm sure it is at least thousands. Similar for the parser, there are a lot of edge cases in gcc (C is a terrible language to parse - your compiler example should use something with a simple grammar so LR or other standard parser is used, not the mess that gcc must have because of C).
  akkartik a year ago
  You're describing staged learning -- and the example at the bottom of my OP is trying to demonstrate exactly that. My https://akkartik.name/post/wart-layers describes the mechanism in more detail for starting from a simple example and gradually adding concerns.
  It's true that the very first example a student sees shouldn't be some eldritch horror in all its complexity.
  It's also true that in the real world today, people study eldritch horrors on their own after learning the basics in kiddie pools.
  But we can do better than the way we've always done it. For key pieces of software that have eaten the world, it seems worthwhile to gradually chip away steps on the cliff side to help future learners more easily understand the real-world complexity.
WillAdams a year ago
Interesting.
Surprised I never heard of "Wikilon" (apparently it was an early precursor to Jupyter Notebooks?)
I agree with the criticism at the bottom:
>There's good and bad parts of LP, but it's not fair to pick examples of bad LP and use them to criticize all of LP, even if they were written by the guy who invented it. Some of the Wright planes crashed, but that doesn't mean airplanes are bad.
The mention of Axiom is interesting, since I find that going a step further and integrating Tangle/Weave so that both "just happen" is a big reduction in friction, hence my stumping for assistance:
https://tex.stackexchange.com/questions/722886/how-to-write-...
and working up a package:
https://github.com/WillAdams/gcodepreview/blob/main/literati...
(which unfortunately has to be edited to match the files which it outputs, as must the master .tex file)
which allows me to typeset a .tex file (which has ``normal'' syntax colouring in an editor, no "sea of grey" as .dtx inflicts) and get both the typeset .pdf and the .py and .scad files which are my project:
https://github.com/WillAdams/gcodepreview
(currently deeply into a re-write and hope to have a fully working version up by the end of the week --- see the archived v0.6 .tex/.pdf pair: https://github.com/WillAdams/gcodepreview/blob/main/gcodepre... to see a working state/example)
- akkartik a year ago
  I think it's not fair :) that you quoted from that comment here without pointing out my sentence in the immediate response:
  > I absolutely think literate programs are a strict improvement on non-literate ones.
  The whole article is about how good the airplanes are, and how much better they could be!
hugetim a year ago
Curious how folks think this measures up: https://www.answer.ai/posts/2024-06-23-claudette-src.html
- alexisread a year ago
  I built this sort of notebook before, the issue I have with notebooks though is version control, debugging, testing, and general integration eg. with deployment processes.
  In a large org, if you're missing any of them it's a showstopper, so lowest common denominator applies ie. basic comments.
  Ideally, the language would allow for scoped markdown+ comments, and be able to build a document from linking params and functions, into a narrative (Obv. this requires you do your project overviews in the correct location, but that's a small issue).
  I suspect LLMs will be able to help document code shortly.
  hugetim a year ago
  The package Jeremy Howard uses for Claudette, nbdev, actually does address the notebook pain points of version control, testing, integration, and deployment: https://nbdev.fast.ai/ (Debugging is already a strong point of developing within a notebook.) But it's admittedly used mostly for solo projects or small teams and may not be ready for large orgs.
evanmoran a year ago
The best approach I’ve seen in literate programming is when the comments become a webpage of documentation automatically from the source. This can be “// comments” or more like markdown indentation, but I think the magic comes from the automatic documentation more than the particular literate programming syntax/approach. I think it encourages people to write just a bit more clearly, more like how people take more care with a blog post than an email, and that makes a significant difference.
norir a year ago
I take the author's point, but I still prefer languages that can be parsed in a single pass. Most commonly, this means it is written in a top down style where every reference is defined somewhere above the usage. It then naturally falls out that programs can be easily read bottom to top and the presentation will make sense.
You can put one intro comment at the top that tells the reader to go to the bottom and read from there. The actual main method code is almost always the most important and reliable information anyway.
In vim, I can open the file 'G' to the bottom, scroll up to the main function and start reading. Any def I want to look up I can select and find with '?'. The main catch is you need qualified imports rather than c style to make this work the best.
Writing in this style, I find that I don't need files. Everything I need for a module is already in a single file. I don't even need lsp since I personally don't ever use autocomplete in any context, programming or otherwise, and search is trivial when all defs are in a predictable order in a single file.
Now if I were writing a book, rather than a program, I would probably use a different format. Generally though, I think we should let programs be programs and books be books.
- packetlost a year ago
  I get that this is a matter of personal style, but when scanning a program that means you start from the most specific (ie. lowest in the call stack) layer of abstraction, which is IMO not the most intuitive way to reason about an application. You can always read in reverse order, but again, that's the opposite of what I would intuitively consider.
  norir a year ago
  It doesn't matter if it is intuitive or not if it is consistent. It will become intuitive once you have worked in this style for at most a few days.
- cxr a year ago
  > this means it is written in a top down style where every reference is defined somewhere above the usage. It then naturally falls out that programs can be easily read bottom to top
  So not top-down[1] then, but the opposite.
  1. <https://www.teamten.com/lawrence/programming/write-code-top-...>
  norir a year ago
  Regardless of semantics, I mean that a compiler will read the code from the top to the bottom and when it encounters a reference, it either must have already been defined or it doesn't exist and the compiler should error out. This is orthogonal to top down in the way that your link suggests.
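The define-before-use rule described here can be sketched as a tiny checker. This is only an illustration of the rule, not any real compiler: `first_undefined_reference` is a hypothetical name, and the sketch assumes each definition is given as a `(name, references)` pair. It also treats self-reference and mutual recursion as errors, which real single-pass languages usually relax with forward declarations.

```python
def first_undefined_reference(definitions):
    """Scan (name, references) pairs top to bottom, as a one-pass compiler would.

    Returns (name, missing_reference) for the first definition that uses a
    name not yet defined above it, or None if the order is valid.
    """
    seen = set()
    for name, references in definitions:
        for ref in references:
            if ref not in seen:
                return (name, ref)
        seen.add(name)
    return None
```

With this ordering, a file whose last definition is `main` passes the check, while the same definitions listed with `main` first are rejected at the first forward reference.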
liontwist a year ago
Knuth’s is the only literate programming system that emphasizes typesetting. That’s because he designed TeX and wrote thousands of pages with it. It’s his local optimum.
All the others use markdown, html, Emacs org, etc.
- akkartik a year ago
  When I wrote OP I was thinking of html rendering as just another form of typesetting. Basically any method of reading something at a different place than your text editor. So markdown by itself is fine, but if you generate html from it and read that, that's "typesetting" that is taking you away from the tactile experience of working with code.
regnull a year ago
I'm a little confused about the point the author repeatedly makes in this article, complaining about the #includes on top of the file. Genuinely curious, where does he propose to put the includes?
- WillAdams a year ago
  With Literate Programming it would be possible to defer the mention of the includes and a discussion of why they are necessary/how they were selected to a point in the document where that makes sense and was interesting.
  The problem is, that sort of mechanistic thing is difficult to make interesting, and is easier to just do an exposition dump at the beginning of the document and be done with it.
  akkartik a year ago
  Part of my point was even if there was not an obvious place to put it, the mechanical place to put it is at the end! That's what I consistently did in the project I cited.
  So I wasn't arguing for people to agonize endlessly about where to put it.
  WillAdams a year ago
  Sounds like a plan --- I'd be curious to see an example LP which does this for which it works well.
kazinator a year ago
Knuth's literate programming is completely bonkers from the software engineering perspective. Knuth never worked as a rank-and-file programmer in a corporation.
However, Knuth is definitely not doing it wrong from the perspective of Knuth. His approach is workable by someone else who is another Knuth. Such another Knuth would never work in collaboration with Knuth; he or she would have an office at a different institution, working on different research with different programs. Knuth's approach does not have to scale to 2 or beyond, or to megaprojects.
Knuth's approach lets an author write a book or paper, and its accompanying code, as a single document. Knuth tested and refined his approach in the context of this use case, and of course it works.
Knuth's approach to literate programming chops up the program into arbitrary pieces that don't necessarily even follow functional boundaries. The program unit he's interested in is any group of lines that deserve commentary. His system is like a macro preprocessor that stitches the program together via text inclusion of arbitrary lines, which are given a name. For instance, a function body might be pieced together from three separately defined texts.
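To make the "text inclusion of named pieces" description concrete, here is a minimal sketch of the stitching step. It is not WEB or CWEB: the `<<name>>` reference syntax, the chunk names, and the `tangle` function are all made up for illustration, and the sketch assumes each reference sits alone on its line.

```python
import re

# Hypothetical chunk table: each name maps to lines of program text.
# A line consisting only of <<name>> is a reference to another chunk.
CHUNKS = {
    "main function": [
        "def mean(values):",
        "    <<reject empty input>>",
        "    <<sum the values>>",
        "    <<divide by the count>>",
    ],
    "reject empty input": [
        "if not values:",
        "    raise ValueError('no values')",
    ],
    "sum the values": ["total = sum(values)"],
    "divide by the count": ["return total / len(values)"],
}

# Captures the leading whitespace and the referenced chunk name.
REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def tangle(name, chunks, indent=""):
    """Expand chunk `name` by plain textual inclusion, carrying indentation."""
    out = []
    for line in chunks[name]:
        match = REF.match(line)
        if match:
            # Recurse into the referenced chunk, indented like the reference.
            out.extend(tangle(match.group(2), chunks, indent + match.group(1)))
        else:
            out.append(indent + line)
    return out


source = "\n".join(tangle("main function", CHUNKS))
namespace = {}
exec(source, namespace)  # the stitched text is an ordinary program
```

Here the body of `mean` is assembled from three separately defined texts, none of which is a function on its own. That is the property being described: the unit of commentary is a named run of lines, not a function or module.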
It's obvious that we simply cannot write the Linux kernel, MongoDB, or Google Chat this way, or basically anything that is not a solo effort. Even some solo efforts couldn't be done that way. The approach will not only fail to scale to multiple developers, but also to certain project sizes and complexities. Solo projects can go into hundreds of thousands of lines; that's far beyond what we would want to develop using chopped up pieces of program text plugged into a book-like document, extracted to make a buildable program.
TeX is Knuth's most famous program; it has been maintained for many decades and used by others, and it is developed with a literate programming system. However, core programs like tex and metafont are only a small part of an entire TeX distribution. The entire TeX ecosystem contained in a TeX distro is not banged up in that literate programming system! Supposedly, TeX generates out to around 50KLOC of C, and Metafont to around 20KLOC. I would guess that is just about where it starts straining at the limitations. Knuth's intuition for a documentation system to handle about the size of the program he set out to create may have been spot on.
Also, what fanf2 said: https://news.ycombinator.com/item?id=42683602
"The tooling that supports literate programming is mostly about overcoming these problems with Pascal."
Excerpt from a great comment!
ctrlp a year ago
Are we not at the point where an LLM could write the literate view of the codebase (updated on changes) putting the human interest stuff forward and providing that view of the code Knuth was always envisioning?
- int_19h a year ago
  They still hallucinate way too much, so you'd need to manually review it on every update.
nzach a year ago
I think the author missed the real point of literate programming - or I may be interpreting literate programming totally wrong.
In my opinion literate programming is really about optimizing for the 'read/understand workflow' and not the 'execute workflow'. But I don't blame him, programming has changed quite a lot in the last 10 years.
Especially in the past a lot of people thought things like fast inverse square root[0] were the pinnacle of programming, because they made something possible that was previously impossible. And culturally being 'clever' was generally used as a compliment. I think a good example would be ESR[1], a really smart individual who always enjoyed pushing things to the limit, and that would frequently involve quite a lot of hacking (in the original sense).
But as complexity grew we started to understand we should not rely too much on clever solutions, because it's hard and expensive to find someone that is smart. And even when you do, it's nice to allow people to take some vacations every so often without creating disruptions in your company.
In this context I think literate programming is more about making software anyone can understand and change than it is about having a nicely typeset manual. When literate programming was proposed, reading source code created by someone else wasn't as easy as it is today. If you wanted to understand how something worked your best bet would be the manual provided by the author; today with modern LSPs you are generally one shortcut away from the actual source code.
With that said, my personal interpretation is that literate programming is about writing code that doesn't need documentation (I think 'self-documenting code' has a bad reputation, but that is another problem). For an extremely contrived example you could look at the difference between 'x * x', 'pow(x, 2)' and 'square(x)'. All three options should give you the same result. And even knowing that the first would probably always be faster I think 'square(x)' is generally the best option. Sure, I'm leaving performance on the table, but if my instrumentation is reasonably good it shouldn't be too hard to find this line when we really do need this extra performance, and then I can change it back and leave a comment explaining why we had to switch to this less clear code.
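A minimal sketch of that contrived example, with plain multiplication as the terse spelling. The helper `square` and the variable names are hypothetical, there only to show the three forms side by side.

```python
def square(x):
    """Say what is meant; how it is computed can change later."""
    return x * x


x = 12
terse = x * x        # shortest to write, the reader infers the intent
library = pow(x, 2)  # general-purpose call, intent hidden in the second argument
named = square(x)    # the call site reads as the intent
```

All three agree on the value. Only the last one leaves a single obvious place to put a faster body, plus the comment explaining it, if profiling ever points at this line.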
In the end code should mostly be about intent. Just by reading the code I should be able to understand the business need that required this code to be created. DDD has this idea of a ubiquitous language, which I think is pretty relevant here. In the DDD book there is quite a nice example about modelling a system that handles shipment contracts and needs to work with the concept of overbooking. The naive approach would be having something like "if (alreadyBookedCargo + cargoSize) > maxAllowedSize { return 'cannot book new cargo' }", but if you change to something like "if !policyAllowed('overbooking', alreadyBookedCargo, cargoSize) { return 'cannot book new cargo' }" the code becomes easier to reason about.
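The two versions can be sketched side by side. This is a loose Python rendering of the snippets quoted above, not the book's code: the policy table, the 10% overbooking allowance, and the constant values are assumptions made up for illustration.

```python
MAX_ALLOWED_SIZE = 100     # assumed capacity, arbitrary units
OVERBOOKING_FACTOR = 1.10  # assumed business rule: allow 10% overbooking


def book_cargo_naive(already_booked, cargo_size):
    # The rule is buried in arithmetic; a reader cannot tell whether
    # overbooking was considered at all.
    if already_booked + cargo_size > MAX_ALLOWED_SIZE:
        return "cannot book new cargo"
    return "booked"


def _overbooking(already_booked, cargo_size):
    """The overbooking policy, stated in one named place."""
    return already_booked + cargo_size <= MAX_ALLOWED_SIZE * OVERBOOKING_FACTOR


# Hypothetical registry mapping policy names to their checks.
POLICIES = {"overbooking": _overbooking}


def policy_allowed(name, *args):
    """Look up a business policy by name and evaluate it."""
    return POLICIES[name](*args)


def book_cargo(already_booked, cargo_size):
    # The call site now names the business concept it depends on.
    if not policy_allowed("overbooking", already_booked, cargo_size):
        return "cannot book new cargo"
    return "booked"
```

With 90 units booked, a request for 15 more is rejected by `book_cargo_naive` and accepted by `book_cargo`. The difference is a business decision, and in the second version it has a name a domain expert would recognize.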
Most people don't do this because this process is essentially 'just' naming things, and we all know how hard it is.
[0] - https://en.wikipedia.org/wiki/Fast_inverse_square_root [1] - https://en.wikipedia.org/wiki/Eric_S._Raymond
- akkartik a year ago
  As the author of OP, I just want to say thanks for the comment. I don't disagree with any of it. I was experimenting with incendiary headlines 10 years ago when I wrote it, but perhaps a less incendiary headline is that others misinterpret Knuth by emphasizing typesetting over sequencing and structuring an exposition.
lincpa a year ago
[dead]
MACHINEWAVE a year ago
[dead]