• fuhsnn 2 hours ago

    My recent favorite is glibc's hack to implement _Static_assert under C99: https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#56...

    It uses the constant expression to create a bit-field of width -1 when the assertion fails, leaving it to the compiler to reject that as the intended assertion failure. The actual statement is an extern declaration of a function returning a pointer to an array whose size comes from the sizeof of the aforementioned bit-field struct.
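    A minimal sketch of the same technique, modeled on the glibc macro linked above. The names STATIC_ASSERT_C99 and static_assert_check_ are hypothetical, not glibc's. A false condition makes the bit-field width -1 and the build fails; a true one leaves only a harmless extern function declaration that emits no code or data.

    ```c
    #include <stdio.h>

    /* Hypothetical C99-compatible static assertion (modeled on glibc's trick):
       a false `expr` gives the bit-field a width of -1, which is a compile error.
       Otherwise this only declares a function returning a pointer to int[1]. */
    #define STATIC_ASSERT_C99(expr) \
        extern int (*static_assert_check_(void)) \
            [!!sizeof(struct { int error_if_negative : (expr) ? 2 : -1; })]

    STATIC_ASSERT_C99(sizeof(int) >= 2);              /* holds: compiles */
    STATIC_ASSERT_C99(sizeof(long) >= sizeof(int));   /* holds: compiles */
    /* STATIC_ASSERT_C99(sizeof(char) == 2);             would not compile */

    int main(void)
    {
        puts("all static assertions passed");
        return 0;
    }
    ```

    Because every expansion declares the same function with the same type, the macro can be repeated freely wherever a declaration is allowed.
    
    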

    Another one, encountered in Toybox, is that (0 || "foo") is a constant expression that evaluates to 1. Apparently, because a string literal has static storage duration, its address is safely assumed to be non-null.

    • wolfspaw 2 hours ago

      Really liked the trick of defining the struct in the return part of the function.

      Array pointers: array-to-pointer decay is extremely annoying; if it were implemented as array-to-"slice" decay, it would be great.
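      The closest thing C offers today is a pointer to an array, which keeps the length in the type instead of decaying. A minimal sketch (sum4 is a hypothetical name):

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* `a` points to a whole array of exactly 4 ints, so the length is part of
         its type: sizeof *a is the size of the array, not of a pointer. */
      static int sum4(int (*a)[4])
      {
          int total = 0;
          for (size_t i = 0; i < sizeof *a / sizeof (*a)[0]; i++)
              total += (*a)[i];
          return total;
      }

      int main(void)
      {
          int xs[4] = {1, 2, 3, 4};
          printf("%d\n", sum4(&xs));   /* pass &xs, not xs: no decay */
          assert(sum4(&xs) == 10);
          return 0;
      }
      ```

      Passing the address of an array with a different length, such as an int[5], is a type mismatch that compilers diagnose, which is the point.
      
      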

      Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

      Flexible array member: extremely useful, and now there are good compiler flags (e.g. -fstrict-flex-arrays in recent GCC and Clang) for ensuring correct flexible array member usage.
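      A minimal sketch of the usual allocation pattern (struct buffer and buffer_new are hypothetical names): the header and the variable-length payload come from a single malloc call.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical length-prefixed string: `data` is a flexible array member,
         so the header and the characters share one allocation. */
      struct buffer {
          size_t len;
          char data[];            /* must be the last member */
      };

      static struct buffer *buffer_new(const char *s)
      {
          size_t len = strlen(s);
          /* header + characters + terminating '\0' */
          struct buffer *b = malloc(sizeof *b + len + 1);
          if (b == NULL)
              return NULL;
          b->len = len;
          memcpy(b->data, s, len + 1);
          return b;
      }

      int main(void)
      {
          struct buffer *b = buffer_new("hello");
          if (b == NULL)
              return 1;
          printf("%zu %s\n", b->len, b->data);
          assert(b->len == 5);
          free(b);                /* one free releases header and payload */
          return 0;
      }
      ```
      
      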

      X-Macro: nice, zero-overhead enum-to-string mapping. I didn't know the trick.
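      A minimal sketch of the X-macro pattern (COLOR_LIST and the color names are hypothetical): the list is written once and expanded twice, so the enum and its string table cannot drift apart.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* The list is written once; each expansion redefines X to generate
         something different from the same entries. */
      #define COLOR_LIST \
          X(RED)         \
          X(GREEN)       \
          X(BLUE)

      enum color {
      #define X(name) COLOR_##name,
          COLOR_LIST
      #undef X
          COLOR_COUNT             /* number of entries, handy for bounds checks */
      };

      static const char *const color_names[] = {
      #define X(name) #name,
          COLOR_LIST
      #undef X
      };

      int main(void)
      {
          assert(sizeof color_names / sizeof color_names[0] == COLOR_COUNT);
          for (int c = 0; c < COLOR_COUNT; c++)
              printf("%d = %s\n", c, color_names[c]);
          return 0;
      }
      ```
      
      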

      Combining default, named and positional arguments: named/default arguments, C version xD. It would be cool if they were added to C as a native feature, instead of having to use the struct-wrapping macro.
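      A minimal sketch of the struct-wrapping trick for named arguments with defaults (window_area and struct window_opts are hypothetical). It relies on C's rule that a later designated initializer for the same member overrides an earlier one; this sketch covers named and default arguments only, not positional ones.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Hypothetical options struct; every field gets a default in the macro below. */
      struct window_opts {
          int width;
          int height;
          const char *title;
      };

      static int window_area(struct window_opts o)
      {
          printf("%s: %dx%d\n", o.title, o.width, o.height);
          return o.width * o.height;
      }

      /* Defaults come first; caller-supplied designators override them because a
         later initializer for the same member wins (compilers may warn about this). */
      #define window_area(...) \
          window_area((struct window_opts){ .width = 640, .height = 480, \
                                            .title = "untitled", __VA_ARGS__ })

      int main(void)
      {
          assert(window_area() == 640 * 480);
          assert(window_area(.height = 100) == 640 * 100);
          assert(window_area(.title = "demo", .width = 10, .height = 10) == 100);
          return 0;
      }
      ```

      A function-like macro is only expanded when its name is followed by `(`, and never recursively, so the macro and the function can share a name.
      
      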

      Comma operator: really useful, especially in macros.
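      Two small illustrations (reverse and TRACE are hypothetical names): the comma operator in a for-loop update clause, and a macro that performs a side effect and still yields a value as a single expression. Note that the comma in `size_t i = 0, j = n - 1` only separates declarators; it is not the comma operator.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Reverses an array in place. The update clause `i++, j--` uses the comma
         operator; the comma in the declaration just separates two declarators. */
      static void reverse(int *a, size_t n)
      {
          if (n < 2)
              return;
          for (size_t i = 0, j = n - 1; i < j; i++, j--) {
              int tmp = a[i];
              a[i] = a[j];
              a[j] = tmp;
          }
      }

      /* Hypothetical tracing macro: bumps a counter, then yields x, all in one
         expression, so it can be used anywhere an expression is allowed. */
      static int trace_count;
      #define TRACE(x) (trace_count++, (x))

      int main(void)
      {
          int xs[] = {1, 2, 3, 4, 5};
          reverse(xs, 5);
          assert(xs[0] == 5 && xs[2] == 3 && xs[4] == 1);

          int y = TRACE(2) * 10;   /* the comma expression's value is 2 */
          assert(y == 20 && trace_count == 1);
          return 0;
      }
      ```
      
      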

      Digraphs, trigraphs and alternative tokens: digraphs and trigraphs are rarely useful, but the alternative spellings from iso646.h are awesome; I love using and/or instead of &&/||.
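      A quick sketch of both: the iso646.h macros (and, or, not) and the digraphs <: :> <% %>, which stand for [ ] { }. in_range and table are hypothetical names.

      ```c
      #include <assert.h>
      #include <iso646.h>   /* and, or, not, ... as macros (they are keywords in C++) */

      /* Digraphs: <% %> mean { }, and <: :> mean [ ]. */
      static int in_range(int x, int lo, int hi)
      <%
          return x >= lo and x <= hi;
      %>

      static int table<:3:> = <% 10, 20, 30 %>;

      int main(void)
      {
          assert(in_range(5, 1, 10) and not in_range(0, 1, 10));
          assert(table<:1:> == 20);
          return 0;
      }
      ```
      
      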

      Designated initializers: super awesome, but you couldn't use them if you wanted C++ portability. Now C++ (since C++20) supports part of them.
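      A short sketch of both forms of designator (struct point, the ERR_ constants and err_text are hypothetical):

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct point { int x, y, z; };

      enum { ERR_OK, ERR_NOMEM, ERR_IO, ERR_COUNT };

      /* Array designators: entries may appear in any order; omitted ones are null. */
      static const char *const err_text[ERR_COUNT] = {
          [ERR_IO] = "I/O error",
          [ERR_OK] = "ok",
      };

      int main(void)
      {
          struct point p = { .z = 3, .x = 1 };   /* member designators; p.y becomes 0 */
          assert(p.x == 1 && p.y == 0 && p.z == 3);
          assert(err_text[ERR_NOMEM] == NULL);
          return 0;
      }
      ```

      C++20 only accepts the struct-member form, and the designators must follow declaration order, so `{ .z = 3, .x = 1 }` would be rejected there.
      
      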

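      Compound literals: fantastic, but in C++ (where they exist only as a compiler extension) the unnamed object is a temporary destroyed at the end of the full expression, so a pointer to it dangles on the very next line. C++ should fix this and allow the C idiom >/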
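      A minimal sketch (sum is a hypothetical helper). In C, a compound literal inside a function has automatic storage associated with the enclosing block, which is what makes keeping the pointer p safe here:

      ```c
      #include <assert.h>
      #include <stddef.h>

      static int sum(const int *xs, size_t n)
      {
          int total = 0;
          for (size_t i = 0; i < n; i++)
              total += xs[i];
          return total;
      }

      int main(void)
      {
          /* An unnamed array created in place and passed directly. */
          int s = sum((int[]){1, 2, 3, 4}, 4);

          /* Still valid after this statement in C: it lives until the end of main's body. */
          const int *p = (const int[]){5, 6, 7};

          assert(s == 10 && p[2] == 7);
          return 0;
      }
      ```
      
      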

      Bit fields: nice for more control of structs layout
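      A minimal sketch (struct flags is hypothetical). Bit-field allocation order and padding are implementation-defined, so this controls field widths rather than the exact bit layout:

      ```c
      #include <assert.h>

      /* Hypothetical flags word packed into narrow fields. */
      struct flags {
          unsigned int ready   : 1;
          unsigned int mode    : 2;   /* holds 0..3 */
          unsigned int counter : 5;   /* holds 0..31 */
      };

      int main(void)
      {
          struct flags f = { .ready = 1, .mode = 2, .counter = 31 };
          f.counter++;                 /* unsigned bit-field wraps modulo 2^5 */
          assert(f.ready == 1 && f.mode == 2 && f.counter == 0);
          return 0;
      }
      ```
      
      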

      String literal concatenation: "multi-line" strings, C version xD
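      A minimal sketch (usage, banner and VERSION are hypothetical): adjacent string literals, including ones produced by macros, are concatenated at compile time.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      #define VERSION "1.2"

      /* Adjacent literals are joined into one array at compile time. */
      static const char usage[] =
          "usage: tool [options]\n"
          "  -h  show help\n"
          "  -v  verbose\n";

      static const char banner[] = "tool version " VERSION;   /* macros expand first */

      int main(void)
      {
          assert(strcmp(banner, "tool version 1.2") == 0);
          assert(strstr(usage, "-v  verbose") != NULL);
          fputs(usage, stdout);
          return 0;
      }
      ```
      
      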

      Ad hoc struct declaration in the return type of a function: didn't know this trick, "multi value" return, C version xD
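      A minimal sketch of the trick (divmod and struct divmod_result are hypothetical). Giving the struct a tag keeps the type nameable by callers, since a tag declared in the return type of a file-scope function has file scope; C++ rejects type definitions in return types.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* The struct type is declared right in the return type. */
      struct divmod_result { int quot, rem; } divmod(int a, int b)
      {
          return (struct divmod_result){ a / b, a % b };
      }

      int main(void)
      {
          struct divmod_result r = divmod(17, 5);
          printf("%d %d\n", r.quot, r.rem);   /* prints "3 2" */
          assert(r.quot == 3 && r.rem == 2);
          return 0;
      }
      ```
      
      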

      Cosmopolitan-libc: incredible project. I already knew of it; it's awesome to offer a single binary that runs on many OSes at once.

      Evaluate sizeof at compile time by causing duplicate case error: ha, nice trick for debugging the size of anything.
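      A sketch of the trick, guarded so the file still builds normally (struct example and REVEAL_SIZEOF are hypothetical). Compiling with -DREVEAL_SIZEOF triggers the duplicate case error; whether the diagnostic includes the value depends on the compiler (Clang, for one, prints it).

      ```c
      #include <assert.h>
      #include <stdio.h>

      struct example { char c; double d; };   /* hypothetical type to measure */

      /* Normally a no-op. With -DREVEAL_SIZEOF the two identical case labels are a
         compile-time error, which is the whole point: the build stops at the size. */
      static void reveal_sizeof(void)
      {
      #ifdef REVEAL_SIZEOF
          switch (0) {
          case sizeof(struct example):
          case sizeof(struct example):
              break;
          }
      #endif
      }

      int main(void)
      {
          reveal_sizeof();
          printf("sizeof(struct example) = %zu\n", sizeof(struct example));
          assert(sizeof(struct example) > sizeof(double));
          return 0;
      }
      ```
      
      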

      • fuhsnn 2 hours ago

        >Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

        The outermost array dimension is always adjusted to a pointer anyway, so supporting it in a compiler without analysis passes, like TCC, is just a matter of skipping the "static" token and the size.
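        A small illustration of that point (first_four_sum is a hypothetical name): `static 4` lets compilers warn about callers that pass too short an array, but inside the function xs is just a pointer, so a compiler that ignores the annotation still generates the same code. Compilers may also warn about sizeof on an array parameter, which is exactly the pitfall being shown.

        ```c
        #include <assert.h>

        /* `static 4`: callers promise at least 4 elements; compilers may warn when a
           call visibly passes fewer. The parameter type is still `const int *`. */
        static int first_four_sum(const int xs[static 4])
        {
            /* Adjusted to a pointer, so this is the size of a pointer, not 4 ints. */
            assert(sizeof xs == sizeof(const int *));
            return xs[0] + xs[1] + xs[2] + xs[3];
        }

        int main(void)
        {
            int a[5] = {1, 2, 3, 4, 5};
            assert(first_four_sum(a) == 10);
            return 0;
        }
        ```
        
        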

      • saagarjha 4 hours ago

        Mentioning %n without explaining that it is overwhelmingly used for exploits is a little reckless IMO.

        • _kst_ 2 hours ago

          Background: A %n format specifier in a printf call stores the number of characters written so far into a specified variable. For example:

              #include <stdio.h>
              int main(void) {
                  int count;
                  /* %n stores the number of characters printed so far into count */
                  printf("%s%n\n", "hello, world", &count);
                  printf("count = %d\n", count);
                  return 0;
              }
          
          The output is:

              hello, world
              count = 12
          
          %n can be exploited to write data to an arbitrary memory location, but only if an attacker can influence the format string, i.e. when it is something other than a string literal.

          %n can be exploited, but it's entirely possible to use it safely.

          • greiskul 3 hours ago

            I'm curious about this, didn't know about %n before. What are the common pitfalls and exploits using this enables?

            • mananaysiempre 3 hours ago

              You would expect a printf call with a user-controlled format string to be, at worst, an arbitrary read. Thanks to %n, it can be a write as well.

              • lights0123 3 hours ago

                If the user can control the formatting string, they can write to pointers stored on the stack. It's important to use printf("%s", str) instead of printf(str).
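                A minimal sketch of the safe pattern (format_untrusted is a hypothetical helper): with a fixed "%s" format, directives such as %n inside the untrusted text are copied as plain characters instead of being interpreted.

                ```c
                #include <assert.h>
                #include <stdio.h>
                #include <string.h>

                /* Hypothetical helper: copies untrusted text into buf. The text is an
                   argument to a fixed "%s" format, so any %n or %x inside it is just data. */
                static void format_untrusted(char *buf, size_t size, const char *untrusted)
                {
                    snprintf(buf, size, "%s", untrusted);   /* safe */
                    /* snprintf(buf, size, untrusted);         unsafe: text parsed as a format */
                }

                int main(void)
                {
                    char buf[64];
                    format_untrusted(buf, sizeof buf, "100% sure %n%x");
                    assert(strcmp(buf, "100% sure %n%x") == 0);
                    puts(buf);
                    return 0;
                }
                ```
                
                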

                • rep_lodsb 2 hours ago

                  Useless use of printf; what's wrong with "puts(str)"?

                  • shawn_w 2 hours ago

                    puts() adds a newline at the end. gcc will happily turn printf("%s\n", str) into puts(str), though.

                    I've never tested to see if printf("%s", str) becomes the equivalent fputs(str, stdout)

            • coreyp_1 7 hours ago

              That's a nice list!

              I've been digging into cross-platform (Windows and Linux) C for a while, and it has been fascinating. On top of that, I've been writing a JIT-ted scripting (templating) language, and the ABI differences (not just fastcall vs stdcall vs cdecl) are often not easy to find documentation about.

              If I ever get to teach a university class on C again, I want to cover some of these things that I feel are often left out, and this list is a helpful reference! Thanks!

              • jonathrg 3 hours ago

                Multi-character constants are one of the many things in C that would be nice to use if the language would just choose some well-defined behaviour for them. It doesn't really matter which.

                • mananaysiempre 3 hours ago

                  Mainstream compilers agree on multicharacter literals being big endian; that is, 'AB' is usually 'A' << CHAR_BIT | 'B'. The exception is MSVC, which also works like that as long as you don't use character escapes, but if you do it emits some sort of illogical, undocumented mess that looks like an ancient implementation bug fossilized into a compatibility constraint.
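                  A quick check of that claim. The value is implementation-defined, so the assertion reflects GCC and Clang on ASCII systems rather than anything the standard guarantees; expect a -Wmultichar warning.

                  ```c
                  #include <assert.h>
                  #include <limits.h>
                  #include <stdio.h>

                  int main(void)
                  {
                      /* Implementation-defined value; GCC and Clang put the first
                         character in the higher-order byte. */
                      int ab = 'AB';
                      printf("%#x\n", (unsigned)ab);   /* 0x4142 with GCC/Clang on ASCII systems */
                      assert(ab == ('A' << CHAR_BIT | 'B'));
                      return 0;
                  }
                  ```
                  
                  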

                • o11c 2 hours ago

                  Bah, those are all well-known.

                  What value does the following program return?

                      int main()
                      {
                          int *p = 0;
                  
                      loop:
                          if (p)
                              return *p;
                  
                          int v = 1;
                          p = &v;
                          v = 2;
                          goto loop;
                          return 3;
                      }
                  
                  Also, rather than doing `sizeof` via one error at a time, it's better to just emit them to a char array {'0' + sz/10, '0' + sz%10, '\0'}. Generalizing this to signed numbers of arbitrary size is left as an exercise for the reader.
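                  A sketch of that char-array approach for sizes under 100 (struct example and example_size are hypothetical names): the digits are computed in a static initializer at compile time, so they sit in the compiled object and can also simply be printed.

                  ```c
                  #include <assert.h>
                  #include <stdio.h>

                  struct example { int a; double b; char c; };   /* hypothetical type to measure */

                  /* Two decimal digits of sizeof(struct example), computed at compile time. */
                  static const char example_size[] = {
                      '0' + sizeof(struct example) / 10 % 10,
                      '0' + sizeof(struct example) % 10,
                      '\0'
                  };

                  int main(void)
                  {
                      assert(sizeof(struct example) < 100);   /* only two digits are emitted */
                      printf("sizeof(struct example) = %s\n", example_size);
                      return 0;
                  }
                  ```
                  
                  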
                  • _kst_ an hour ago

                    It returns 2.

                    The only reason that might be surprising is that the "return *p;" statement refers to the value of an object at a point (textually) before its definition. But the lifetime of the object named "v" begins on entry to the innermost compound statement enclosing its definition -- in this case the body of "main".

                    Space for "v" is allocated on entry to "main". It's initialized to 1 when its definition is reached. The "return *p;" statement appears before the definition of "v" in the program source, but is executed after its definition was reached at run time.

                    It's important to remember that scope and lifetime are two different things. The scope of an identifier is the region of program text in which the identifier is visible; for "v" it extends from the definition to the closing "}". The lifetime of an object is the time span during execution in which it exists; for "v" it extends from the time when execution reaches the opening "{" to the time when execution reaches the closing "}". Formally, storage for "v" is allocated at the beginning of its lifetime and deallocated at the end of its lifetime. Compilers can and do optimize allocation and deallocation, as long as the visible behavior is consistent.

                    Aside: If "v" were a VLA (variable length array, introduced in C99, made optional in C11) its lifetime would begin when execution reaches its definition.

                    • sweeter 2 hours ago

                      Is it 2? I'm not exactly sure though. I'm interested in hearing the logic

                  • golergka 3 hours ago

                        switch (n % 2) {
                            case 0:
                                do {
                                    ++i;
                            case 1:
                                    ++i;
                                } while (--n > 0);
                    
                        }
                    
                    Someone really ought to record a "WAT" video about C.
                    • mananaysiempre 3 hours ago

                      The switch statement in C is not a very limited pattern match. The switch statement in C is a very ergonomic jump table. Do not think ML’s case-of with only integer literals for patterns; think FORTRAN’s computed GO TO with better syntax. And it will cease to be a WAT. (For a glimpse of the culture before pattern matching was in programmers’ collective consciousness, try the series on designing a CASE statement for Forth that ran for several issues of Forth Dimensions.)

                      • russellbeattie 3 hours ago

                        I don't think there's any confusion of how it works, it's the deep horror in discovering that it's possible in the first place, and a morbid curiosity of the chaos it could cause if abused.

                        • mananaysiempre 2 hours ago

                          At least for me, the feelings you describe are characteristic of a footgun, not a WAT. A WAT is rather a desperate bewilderment as to who could ever design the thing that way and why, and for switch statements computed gotos are the answer to that question.

                          As for the footgun issue, I mean, it could be one in theory, sure. But I don’t think I’ve ever seen it actually fired. And I can’t really appreciate the Javaesque “abuse” thinking—it is to some extent the job of the language designer to prevent the programmer from accidentally doing something bad, but I don’t see how it is their job to prevent a programmer from deliberately doing strange things, as long as the result looks appropriately strange as well.

                          (There are reasons to dislike C’s switch statement, I just don’t think the potential for “abuse” is one.)

                      • tom_ 2 hours ago

                        This sort of thing is pretty handy sometimes. Don't forget you can have code (e.g., start of the loop) before any of the cases too!

                        • PhilipRoman 3 hours ago

                          Just think of the "case" statements like any other label, despite the misleading indentation. Then it becomes perfectly natural to jump in the middle of a loop.
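                          Read that way, the classic example, Duff's device, is unremarkable. A minimal sketch of a copying variant (duff_copy is a hypothetical name; Tom Duff's original wrote every byte to a single memory-mapped output register rather than to a destination buffer):

                          ```c
                          #include <assert.h>
                          #include <stddef.h>
                          #include <string.h>

                          /* The switch jumps into the middle of the unrolled do-while body,
                             so the count % 8 leftover bytes are copied on the first pass. */
                          static void duff_copy(char *to, const char *from, size_t count)
                          {
                              if (count == 0)
                                  return;                   /* the do-while always runs once */
                              size_t n = (count + 7) / 8;   /* passes through the loop */
                              switch (count % 8) {
                              case 0: do { *to++ = *from++;
                              case 7:      *to++ = *from++;
                              case 6:      *to++ = *from++;
                              case 5:      *to++ = *from++;
                              case 4:      *to++ = *from++;
                              case 3:      *to++ = *from++;
                              case 2:      *to++ = *from++;
                              case 1:      *to++ = *from++;
                                      } while (--n > 0);
                              }
                          }

                          int main(void)
                          {
                              const char src[] = "jump into the middle of a loop";
                              char dst[sizeof src];
                              duff_copy(dst, src, sizeof src);
                              assert(memcmp(dst, src, sizeof src) == 0);
                              return 0;
                          }
                          ```
                          
                          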

                          • agumonkey 2 hours ago

                            I wonder if there's any other instance (in programming or elsewhere) of intersecting grammar constructs being accepted.

                          • ranger_danger 4 hours ago

                            > quirks and features

                            Someone is a fan of Doug DeMuro.

                            • randomdata 4 hours ago

                              This... is the 1972 Ritchie C.