• xvilka 6 hours ago

    We use tree-sitter[1] to parse C declarations in Rizin[2] (see the "td" command, for example). See our custom grammar[3] (a modified version of the mainstream tree-sitter-c). The custom grammar was sadly necessary because tree-sitter does not support alternate roots[4].

    P.S. Please add a license for your code.

    [1] https://tree-sitter.github.io/

    [2] https://github.com/rizinorg/rizin/tree/dev/librz/type/parser

    [3] https://github.com/rizinorg/rizin-grammar-c/

    [4] https://github.com/tree-sitter/tree-sitter/issues/711

    • coherentpony 18 hours ago

      I don’t understand what the visualisation screenshot in the README is trying to communicate to me.

      • bluetomcat 18 hours ago

        It starts from the identifier. At every stage, it outputs a sub-expression which is the “mirrored use” and corresponds to the boxed representation below it. When it reaches the top of the expression, it prints the final type of the expression which is the lone specifier-qualifier list.

        As per the screenshot, “arr” is an array of 4 elements. Consequently, “arr[0]” is an array of 8 elements. Then, “arr[0][0]” is a pointer. And so on, until we arrive at the specifier-qualifier list.
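To make the "mirrored use" reading concrete, here is a minimal sketch. The declaration is hypothetical and is not the one from the README screenshot; only its first three steps (array of 4, array of 8, pointer) match the description above. The helper names and the COUNT macro are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical declaration, NOT the one in the README screenshot. */
static const char *arr[4][8];

/* Number of elements in an array expression. */
#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* "arr" is an array of 4 elements. */
size_t outer_count(void) { return COUNT(arr); }

/* "arr[0]" is an array of 8 elements. */
size_t inner_count(void) { return COUNT(arr[0]); }

/* "arr[0][0]" is a pointer (here: pointer to const char). */
int elem_is_pointer(void)
{
    return _Generic(arr[0][0], const char *: 1, default: 0);
}

/* "*arr[0][0]" reaches the specifier-qualifier list: const char.
   sizeof does not evaluate its operand, so nothing is dereferenced. */
int final_is_char(void) { return sizeof(*arr[0][0]) == sizeof(char); }
```

Each function applies one more operator to the identifier, in the same order in which the declarator would be read outward.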

      • pcfwik 18 hours ago

        Since this is about C declarations: for anyone who (like me) had the misfortune of learning the so-called "spiral rule" in college rather than being taught how declarations in C work, below are some links that explain the "declaration follows use" idea that (AFAIK) is the true philosophy behind C declaration syntax (and significantly easier to remember/read/write).

        TL;DR: you declare a variable in C _in exactly the same way you would use it:_ if you know how to use a variable, then you know how to read and write a declaration for it.

        https://eigenstate.org/notes/c-decl

        https://news.ycombinator.com/item?id=12775966

        • userbinator 13 hours ago

          > if you know how to use a variable, then you know how to read and write a declaration for it.

          In other words, the operators in a declaration have exactly the same precedence as in its use.

          • nitrix 16 hours ago

            That is correct.

              int x, *p, arr[5], fn(), (*pfn)();
            
            Using x, or dereferencing p, or subscripting the array arr, or calling the function fn, or dereferencing the function pointer pfn and then calling it: all of these produce an int.

            It's the intended way to read/write declarations/expressions. As a consequence, asterisks end up placed near the identifiers. The confused ones will think it's a stylistic choice and won't understand any of this.
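A small sketch of that declaration in use, to show each expression yielding an int. The definition of fn, its return value of 42, and the function name sum_of_uses are assumptions added for illustration.

```c
#include <assert.h>

/* One specifier (int), five declarators, as in the comment above. */
int x, *p, arr[5], fn(), (*pfn)();

/* Definition for fn; the value 42 is arbitrary. */
int fn() { return 42; }

/* Every operand below mirrors its declarator and has type int. */
int sum_of_uses(void)
{
    x = 1;
    p = &x;        /* *p now reads x */
    arr[0] = 2;
    pfn = fn;      /* fn decays to a pointer to function */

    return x + *p + arr[0] + fn() + (*pfn)();
}
```

With these values the sum is 1 + 1 + 2 + 42 + 42.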

            • pwdisswordfishy 13 hours ago
              • saagarjha 16 hours ago

                Of course, the correct way to use a function pointer is to call it.

                • nitrix 16 hours ago

                  Yes, the () operator dereferences function pointers automatically for you, for convenience. There's also the surprise that you can dereference a function pointer any number of times: each dereference yields a function designator, which immediately converts back to a function pointer.
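A short sketch of the call forms being discussed. The function twice and the name call_variants are hypothetical, chosen only for this example.

```c
#include <assert.h>

/* Hypothetical function used only for illustration. */
static int twice(int n) { return 2 * n; }

int call_variants(void)
{
    int (*fp)(int) = twice;   /* designator converts to a pointer */

    int a = fp(1);            /* call through the pointer directly */
    int b = (*fp)(2);         /* explicit dereference, then call */
    int c = (****fp)(3);      /* each * converts straight back to a pointer */
    int d = (&twice)(4);      /* explicit address-of on the function */
    int e = (*twice)(5);      /* dereferencing the function name also works */

    return a + b + c + d + e; /* 2 + 4 + 6 + 8 + 10 */
}
```

All five forms call the same function; the extra asterisks change nothing about what is called.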

                  • korianders 10 hours ago

                    One baffling thing I see people do when typedefing function pointers is insisting on putting the pointer part in the typedef, which just complicates and hides things.

                    If you want to typedef a function pointer, make a completely ordinary function declaration, then slap 'typedef' at the beginning, done. This does require you to do "foo_func *f" instead of "foo_func f" when declaring variables, but that is just clearer imo.

                        typedef int foo_func(int); // nice
                    
                        typedef int (*foo_func)(int); // why?
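A sketch of how the two styles look at the point of use. The names foo_func_ptr, increment, apply and demo are hypothetical; the pointer typedef is renamed here only so both can coexist in one file.

```c
#include <assert.h>

typedef int foo_func(int);         /* function type */
typedef int (*foo_func_ptr)(int);  /* pointer-to-function type */

/* A function typedef can declare (but not define) a function. */
foo_func increment;                /* same as: int increment(int); */
int increment(int n) { return n + 1; }

/* With the function typedef, the pointer stays visible at the use site. */
int apply(foo_func *f, int n) { return f(n); }

int demo(void)
{
    foo_func *f = increment;       /* the * is written out */
    foo_func_ptr g = increment;    /* the * is hidden in the typedef */

    return apply(f, 1) + g(10);    /* 2 + 11 */
}
```

Both variables hold the same kind of pointer; the difference is only whether the declaration shows it.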
                    • cryptonector 2 hours ago

                      Why do you need the `*` to be part of every variable/member declaration?

                • any1 14 hours ago

                  > It's the intended way to read/write declarations/expressions. As a consequence, asterisks end up placed near the identifiers.

                  You know you don't always have to use things as they were intended?

                  > The confused ones will think it's a stylistic choice and won't understand any of this.

                  Well, I've written it both ways, and the compiler never seems to mind. :)

                  Maybe I should start putting space on both sides of the asterisk; seems like it would be a good way to annoy even more people.
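For reference, a sketch of what the compiler makes of the different spacings. The variable names, the IS_INT_PTR macro and count_pointers are assumptions for illustration; _Generic needs C11 or later.

```c
#include <assert.h>

int* a, b;   /* a is int *, b is plain int: the * belongs to the declarator */
int *c;      /* asterisk next to the identifier */
int * d;     /* space on both sides */
int*e;       /* no spaces at all */

/* Evaluates to 1 if v has type int *, otherwise 0. */
#define IS_INT_PTR(v) _Generic((v), int *: 1, default: 0)

int count_pointers(void)
{
    return IS_INT_PTR(a) + IS_INT_PTR(b) + IS_INT_PTR(c)
         + IS_INT_PTR(d) + IS_INT_PTR(e);
}
```

The whitespace is indeed ignored; only b fails the check, because the asterisk applies per declarator rather than to the whole line.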