• dmagyari 12 hours ago

    "Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array." The Zig compiler pipeline (AST, Zir, Air, Sema) does exactly this at every layer. Not only is it contiguous, but it uses struct-of-arrays instead of array-of-structs, so walking the tree is even more cache-friendly. For the AST see: https://github.com/ziglang/zig/blob/master/lib/std/zig/Ast.z...
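
    A minimal Rust sketch of the struct-of-arrays idea (hypothetical names and shape, not Zig's actual layout): each node field lives in its own parallel array, so a pass that only reads tags streams through dense tag bytes without pulling operand data into cache.

```rust
// Hypothetical struct-of-arrays node pool (illustration only, not Zig's
// real field names): one parallel array per field instead of one array
// of node structs.
#[derive(Clone, Copy, PartialEq)]
enum Tag { Num, Add, Mul }

#[derive(Default)]
struct Ast {
    tags: Vec<Tag>, // one small tag per node
    lhs: Vec<u32>,  // left child index, or the literal value for Num
    rhs: Vec<u32>,  // right child index, unused for Num
}

impl Ast {
    fn push(&mut self, tag: Tag, lhs: u32, rhs: u32) -> u32 {
        let id = self.tags.len() as u32;
        self.tags.push(tag);
        self.lhs.push(lhs);
        self.rhs.push(rhs);
        id
    }

    // A pass that only inspects tags touches just the `tags` array and
    // never loads operand memory -- the struct-of-arrays win.
    fn count(&self, tag: Tag) -> usize {
        self.tags.iter().filter(|&&t| t == tag).count()
    }
}
```

    Building `1 + 2` is three `push` calls; `count(Tag::Num)` then scans one tag per node with no pointer chasing.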

    • gritzko 2 hours ago

      I work on a C dialect where everything is flattened, JSON and other trees in particular. Binary heaps are flat, merge sort and iterator heaps are absolutely great, and you can build LSM databases with that. Stacks, circular buffers, hash maps, etc. are all flat. Templated output (PHP-like) is done with a flat data structure.

      https://github.com/gritzko/librdx/blob/master/abc/B.md

      Apart from locality and lifetimes, these flat data structures improve composability. When every data structure is a flat buffer, you can mmap it, zip it, or send it over the network, all with the same routine. They are uniform like bricks, in a sense.

      • agumonkey 9 hours ago

        Makes me wonder whether people in the APL/J/K community have influenced, or been influenced by, this kind of technique. IIRC Aaron Hsu does tree processing through arrays (but I'm not skilled enough to analyze his code).

        • gsf_emergency 3 hours ago

          Do you have a link to such an example of Aaron's code? Thank you in advance!

      • jgrowl 4 hours ago

        I thought a reddit comment on this article had an interesting point:

        https://www.reddit.com/r/rust/comments/1d3b356/my_new_favori...

        [–]Timzhy0 3 points 7 months ago

        Btw I think one can go a step further than the author: there is no need to keep two explicit ExprRefs baked into a binary node (lhs, rhs). You can exploit locality. The AST, seen the LISP way, is just an arbitrarily nestable list whose elements are atoms or other lists. Hence all you need to know is where each list ends (an atom can be assumed to span one node), and one bit marking the last entry in a list is quite ergonomic as well (it tells you whether moving to the next slot in the AST means there is a sibling). Basically it's easier to keep in sync while constructing, and it takes less memory per node. I pay 40 bits per node, stored interleaved for best cache locality (some unaligned accesses, but I think it's still worthwhile): 8 bits for the tag, 32 for the data; if the data is bigger, the 32 bits are an index into some auxiliary segment (basically a ptr).
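
        A hedged sketch of that encoding. The exact bit packing is my guess: here each node is an 8-bit tag plus a 32-bit payload (40 bits) appended to one flat byte buffer, and the tag's high bit doubles as the last-sibling flag, which may differ from the commenter's actual layout.

```rust
use std::convert::TryInto; // needed for slice -> [u8; 4] on older editions

// Guessed 40-bit node packing: 1 tag byte (high bit = "last entry in
// this list") followed by 4 little-endian payload bytes.
const LAST: u8 = 0x80;

struct FlatTree { bytes: Vec<u8> }

impl FlatTree {
    fn push(&mut self, tag: u8, last: bool, data: u32) {
        debug_assert!(tag < LAST); // tags must fit in 7 bits in this sketch
        self.bytes.push(if last { tag | LAST } else { tag });
        self.bytes.extend_from_slice(&data.to_le_bytes());
    }

    // Decode the node at byte offset `at`, returning
    // (tag, is_last_sibling, payload, offset of the next node).
    fn get(&self, at: usize) -> (u8, bool, u32, usize) {
        let raw = self.bytes[at];
        let data = u32::from_le_bytes(self.bytes[at + 1..at + 5].try_into().unwrap());
        (raw & !LAST, raw & LAST != 0, data, at + 5)
    }
}
```

        Walking the tree is then just advancing the offset by 5 bytes per node and watching the last-sibling bit to know when the enclosing list ends.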

        • kazinator 11 hours ago

          > Instead of allocating Expr objects willy-nilly on the heap, we’ll pack them into a single, contiguous array.

          This happens naturally if you bump-allocate them in a garbage-collected run-time, particularly under a copying collector. Free lists also tend to co-locate because they are produced during the sweep phase of GC, which runs through the heap in address order.

          Don't make me bring out the L word for the billionth time.

          > A flat array of Exprs can make it fun and easy to implement hash consing

          OK, it's not a case of L-ignorance, just willful neglect.

          • samps 11 hours ago

            FWIW I did acknowledge this in the article:

            > A sufficiently smart memory allocator might achieve the same thing, especially if you allocate the whole AST up front and never add to it

            > Again, a really fast malloc might be hard to compete with—but you basically can’t beat bump allocation on sheer simplicity.

          • userbinator an hour ago

            Rediscovering techniques that were somewhat well-known in the 70s and 80s.

            See also: https://en.wikipedia.org/wiki/Binary_heap

            • Taniwha an hour ago

              heh - I built compilers this way back in the 70s because the machine I was working on (the B6700, in Algol) didn't really do pointers as a first-class data type. It's not really surprising to find someone doing something similar in another language that makes pointers difficult to deal with.

            • emptysea 12 hours ago

              Rust-analyzer uses a similar technique for parsing https://github.com/rust-lang/rust-analyzer/blob/master/crate... which then gets fed into https://github.com/rust-analyzer/rowan (lossless syntax tree)

            • cardanome 12 hours ago

              Amazing article, very good advice to keep your data structures flat.

              Adding to that, it also makes editing the AST vastly more efficient.

              I discovered that principle on my own when I worked on an editor that operated directly on the AST instead of text. I found manipulating the tree-style AST painful, constantly traversing the tree and all. Once I made it flat, my life was a whole lot easier: you can directly index any part of the AST in constant time.

              • dang 4 hours ago

                Related:

                Flattening ASTs and other compiler data structures (2023) - https://news.ycombinator.com/item?id=42181603 - Nov 2024 (2 comments)

                Flattening ASTs and other compiler data structures - https://news.ycombinator.com/item?id=36559346 - July 2023 (81 comments)

                • ww520 12 hours ago

                  This is a fantastic idea. An AST works well in an array-based allocation block since there is no need to free individual nodes. It's add-only allocation.
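
                  The add-only pattern is essentially an index-based arena; a minimal sketch (the two-variant `Expr` is hypothetical, for illustration):

```rust
// Add-only expression pool: references are indices into one Vec, nodes
// are never freed individually, and dropping the Vec frees the whole
// tree at once.
#[derive(Clone, Copy)]
enum Expr {
    Num(i64),
    Add(u32, u32), // the u32s are indices into `nodes`
}

struct Arena { nodes: Vec<Expr> }

impl Arena {
    fn add(&mut self, e: Expr) -> u32 {
        self.nodes.push(e); // append-only: no free list, no holes
        (self.nodes.len() - 1) as u32
    }

    fn eval(&self, id: u32) -> i64 {
        match self.nodes[id as usize] {
            Expr::Num(n) => n,
            Expr::Add(a, b) => self.eval(a) + self.eval(b),
        }
    }
}
```

                  Because children are always added before their parents, the root ends up last and the whole buffer is trivially serializable.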

                  • JonChesterfield 31 minutes ago

                    What about transforming the AST after it is built, or deriving a new tree which partly aliases the original in persistent structure fashion?

                  • hencq 10 hours ago

                    As the article mentions, this makes it quite similar to a bytecode vm. I think the traditional wisdom is that an AST walker is easy to write, but for speed you'd want a bytecode interpreter. It'd be interesting to see how close the performance gets with this flattened AST.

                    In practice I think there are more differences. E.g. AST interpreters tend to pass environments around while bytecode interpreters often store these on a stack (though I guess there's nothing stopping you from doing this with an AST either). I wonder if there's some goldilocks zone for ease of implementation with decent performance.

                    • kazinator 9 hours ago

                      If you instead flatten the expression tree into RPN, then you can execute it like that, with a stack machine.

                      I seem to recall that the Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman [1988]) describes a technique whereby intermediate code is represented in RPN, and transformations are performed by pattern matches on it.
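
                      The first point can be sketched in Rust: flatten an expression into postorder (RPN) opcodes and execute them with a stack machine (`Op` is a made-up opcode set for illustration):

```rust
// Postorder (RPN) flattening of an expression tree, executed by a
// stack machine instead of walked via pointers.
#[derive(Clone, Copy)]
enum Op { Push(i64), Add, Mul }

fn eval_rpn(code: &[Op]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    for op in code {
        match *op {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
            }
        }
    }
    stack.pop().unwrap() // a well-formed program leaves exactly one value
}
```

                      For example, `(1 + 2) * 3` flattens to `[Push(1), Push(2), Add, Push(3), Mul]` and evaluates to 9.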

                      • finnh 8 hours ago

                        The sample flat program in the post is exactly RPN, no?

                        • samps 7 hours ago

                          I think it would be more like RPN if it used a stack, and operands were specified as relative offsets (i.e., stack offsets). In the version I wrote, operands are still represented as absolute offsets in the expression table.

                    • ndesaulniers 12 hours ago

                      Cool! Carbon is doing exactly this. I had asked leads if there was a paper on this approach, but they didn't have anything for me. I'll send them this post!