It would be interesting to see a comparison with vhash [0], which uses automatically sized hashtables under the hood while exposing an alist-like interface.
[0] https://www.gnu.org/software/guile/manual/html_node/VHashes....
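For reference, a minimal sketch of the vhash interface from Guile's (ice-9 vlist) module; the procedures are documented there, and the result comments are my reading of them:

    (use-modules (ice-9 vlist))

    ;; Build a vhash by consing entries onto vlist-null, alist-style.
    (define table
      (vhash-consq 'b 2
                   (vhash-consq 'a 1 vlist-null)))

    ;; Lookup returns the (key . value) pair, or #f if absent.
    (vhash-assq 'a table)   ; => (a . 1)
    (vhash-assq 'z table)   ; => #f

    ;; Like an alist, consing a key again shadows the older entry.
    (vhash-assq 'a (vhash-consq 'a 10 table))  ; => (a . 10)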
I miss when posts like this mattered.
That's not to say performance doesn't matter anymore or that blog posts on niche topics don't matter anymore.
It's more that there are 30 opponents on all sides fighting to minimize the impact of this kind of post. CPUs are still getting faster even now despite Moore's law being dead. The business or career impact of choosing between an associative list vs hashmap in a garbage-collected language like Guile Scheme is so minimal that it's hard to quantify.
If it's in a hot enough path that it matters, it's likely that there are at least 3 things you can do within 20 minutes of work (or 5 minutes of GPU time) that will solve the problem as effectively or better.
I remember the very specific period of time when blog posts about functional programming for React developers were in vogue. You can speed up your Scheme app by 15%, or you can build and deploy a completely new service from scratch in Node.js in the same amount of time.
It used to feel like code had some kind of meaning or value. Now, it's just an artifact produced as a side effect of work. But that's been a trend for a decade or so now, AI is just the latest (and most significant) iteration of it.
Sounds like the difference between code as craft versus code as artifact and product. Actually not even product: it's the inner guts of a product that most people don't need to care about, and increasingly it's just machine generated, so it's not even meant to be read by humans. Write-only code of the post-programming era.
Professional software has always aspired to be an industrial process, like OOP and Agile, as a collective endeavor to produce code of decent quality that works reliably, to achieve business goals. Any aesthetic satisfaction or philosophical insights are a byproduct, nice to have, but not the main point.
Code as a craft is a niche for experts and researchers, for hobbyists and amateurs. The minuscule performance improvement gained from choosing an array or a hashmap is insignificant in most situations, other than maybe resource-constrained contexts like embedded programming, retro computers, games, and competitions.
But, thinking over it, code as a craft still has a healthy subculture of people across older and younger generations. Perhaps it's up to the older ones who remember the good ol' days of respectable craftsmanship ("craftspersonship") to promote and encourage others to carry on the tradition.
Or is that not even worth doing anymore with language models pumping out vibe-coded slop? Will programmers be relegated to reviewing and fixing such mass-produced code? Yes, probably. But no, of course it's worth preserving and evolving the culture of computer science and software development, maybe it's more important than ever to keep the flame of human spirit and imagination alive, supported by machine intelligence rather than replaced.
Oh my. The most important difference is that a-lists are ordered by insertion, while hashtables are not. That's why a-lists are used for symtabs: newer, temporary dynamic symbols can shadow older bindings.
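To make that concrete (a toy sketch, not from the article): consing a new pair onto the front of an a-list shadows older bindings, because assq returns the first match while the older entries remain intact underneath.

    ;; acons prepends, so newer bindings shadow older ones.
    (define env  (acons 'x 1 '()))
    (define env* (acons 'x 2 env))  ; enter an inner scope
    (assq 'x env*)  ; => (x . 2)    ; the newer binding wins
    (assq 'x env)   ; => (x . 1)    ; the outer env is untouched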
Insertion-ordered hash tables are a thing, though they usually don't allow duplicates any more than other hash table designs.
Ocaml hash tables have the interesting property where each entry is treated like a stack; inserting an existing key pushes the new value and deleting pops. I've always assumed this is for use with lexically scoped symbol tables in the compiler.
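I haven't checked the OCaml implementation, but the discipline itself is easy to sketch in Guile with a per-key stack inside an ordinary hash table (push-binding!, pop-binding! and top-binding are made-up names, not any library's API):

    ;; Each key maps to a stack of values: adding pushes, deleting pops.
    (define (push-binding! ht key val)
      (hash-set! ht key (cons val (hash-ref ht key '()))))

    (define (pop-binding! ht key)
      (let ((stack (hash-ref ht key '())))
        (unless (null? stack)
          (hash-set! ht key (cdr stack)))))

    (define (top-binding ht key)
      (let ((stack (hash-ref ht key '())))
        (and (pair? stack) (car stack))))

    (define symtab (make-hash-table))
    (push-binding! symtab 'x 'outer)
    (push-binding! symtab 'x 'inner)  ; enter a nested scope
    (top-binding symtab 'x)           ; => inner
    (pop-binding! symtab 'x)          ; leave the scope
    (top-binding symtab 'x)           ; => outer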
The diagrams could use a somewhat bigger font; they are slightly hard to read. Or make the pictures larger on the website, though they are already big.
As far as I can see, there is no mention of how many elements are inserted in each run. The x/horizontal axis is not the number of items, as one would expect, but "run #", which I read as the run number; that says nothing about the number of elements added to the data structures. So from what is visible in the post itself, one can't actually draw the stated conclusion. I guess the idea is that you must go and look at the code itself. This post could do a better job of explaining what is measured, especially since it is about a benchmark.
But aside from that: I think the answer here is not merely about performance. If there is a chance that your program might grow to need to put more stuff in that a-list, I think you can just go for a hashtable right away and save yourself the effort of changing the code later. Unless you are hyper-optimizing, which most of the time we are not. I also want to note that there is SRFI-69 [1], which has `alist->hash-table`, in case you want to opt for an a-list because of how easy they are to write in code.
Choosing an a-list in a scenario where it might still be faster is not only a performance trade-off but also a maintainability one. Are you ready to build an abstraction layer around the a-list, so that you can easily swap in a hash-table later? If not, then the small performance cost of using a hash-table too early might be the better trade-off, compared to having to replace a-list usages throughout the code later.
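For example, a minimal sketch of such a layer (the store-* names are illustrative): hide the representation behind a few procedures so call sites never see whether an a-list or a hash-table is underneath.

    ;; A-list representation today...
    (define (make-store)       '())
    (define (store-set st k v) (acons k v st))
    (define (store-ref st k)   (assoc-ref st k))

    ;; ...and later, swap in SRFI-69 hash tables without touching callers:
    ;; (use-modules (srfi srfi-69))
    ;; (define (make-store)       (make-hash-table))
    ;; (define (store-set st k v) (hash-table-set! st k v) st)
    ;; (define (store-ref st k)   (hash-table-ref/default st k #f))

    (define st (store-set (make-store) 'lang "Guile"))
    (store-ref st 'lang)  ; => "Guile"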
I think an a-list is fine for cases where you:
(1) convert to a hash-table anyway, with `alist->hash-table` (see the sketch after this list)
(2) construct a struct/record from it, because those usually don't have that many fields
(3) never add to the a-list, and only have it contain stuff you manually type in the code, because those usually aren't that many either
(4) know that the a-list will not grow much or is even static in size, and that size is small. Function (keyword) arguments are one common example.
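A small sketch of case (1), assuming Guile's SRFI-69 module (the config a-list is just an example):

    (use-modules (srfi srfi-69))

    ;; Write the literal data as an a-list, then convert once.
    (define config
      (alist->hash-table
       '((host . "localhost")
         (port . 8080))))

    (hash-table-ref/default config 'port #f)  ; => 8080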
Otherwise just use a hash-table and maybe, maaaaybe check again whether there is really a performance bottleneck that can be traced back to using a hash-table. Don't obsess over it.

[1] https://srfi.schemers.org/srfi-69/srfi-69.html

> As far as I can see, there is no mention of how many elements are inserted in each run.
FYI it's in the title of the plots.
Ah, I did not see that. Yeah, that definitely needs bigger fonts. Or better: average the runs, perform them with different numbers of items, and put the item count on one axis instead of wasting an axis on the run number. Then we would get a much better idea of how performance changes with respect to the number of items.
Would the Guile maintainers be open to putting that final summation into "6.6.22 Hash Tables", in the Guile reference manual?