• jake_morrison 6 hours ago

    Embedded systems often have crappy compilers. And you sometimes have to pay crazy money to be abused, as well.

    Years ago, we were building an embedded vehicle tracker for commercial vehicles. The hardware used an ARM7 CPU, GPS, and GPRS modem, running uClinux.

    We ran into a tricky bug in the initial application startup process. The program that read from the GPS and sent location updates to the network was failing. When it did, the console stopped working, so we could not see what was happening. Writing to a log file gave the same results.

    For regular programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a typical Tuesday, and your only debugging option may be staring at the code and thinking hard.

    This board had no Ethernet and only two serial ports, one for the console and one hard-wired for the GPS. The ROM was almost full (it had a whopping 2 MB of flash, 1 MB for the Linux kernel, 750 KB for apps, and 250 KB for storage). The lack of MMU meant no shared libraries, so every binary was statically linked and huge. We couldn't install much else to help us.

    A colleague came up with the idea of running gdb (the text mode debugger) over the cellular network. It took multiple tries due to packet loss and high latency, but suddenly, we got a stack backtrace. It turned out `printf()` was failing when it tried to print the latitude and longitude from the GPS, a floating point number.

    A few hours of debugging and scouring five-year-old mailing list posts turned up a patch to GCC (never applied), which fixed a bug on the ARM7 that affected uClibc.
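
    A common way to keep coordinates printable when floating-point `printf()` can't be trusted is to carry them as scaled integers and format them with integer conversions only. A minimal sketch, assuming the position is available as signed microdegrees; `format_microdegrees` is a hypothetical helper, not something from the tracker's code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a coordinate held as signed microdegrees
 * (degrees * 1,000,000) using integer conversions only, so the
 * floating-point path of printf is never exercised. */
int format_microdegrees(char *buf, size_t len, int32_t microdeg)
{
    assert(buf != NULL && len > 0);

    /* Take the magnitude in unsigned arithmetic so INT32_MIN is safe. */
    uint32_t mag = (microdeg < 0) ? 0u - (uint32_t)microdeg
                                  : (uint32_t)microdeg;
    const char *sign = (microdeg < 0) ? "-" : "";

    unsigned long whole = mag / 1000000u;
    unsigned long frac  = mag % 1000000u;

    /* %06lu zero-pads the fraction, so 5 prints as "000005". */
    return snprintf(buf, len, "%s%lu.%06lu", sign, whole, frac);
}
```

    For example, `format_microdegrees(buf, sizeof buf, -122419416)` writes `-122.419416` into `buf`.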

    This made me think of how the folks who make the space probes debug their problems. If you can't be an astronaut, at least you can be a programmer, right? :-)

    • toast0 3 hours ago

      At least the debugger worked. The processor I used in embedded systems in college, the 68HC11, would stop doing conditional branches when the supply voltage was too low.

      We had a battery-powered board, with no brownout detection, and I was using rechargeable NiMH batteries to save money/waste. When the students with alkaline batteries had low batteries, the motor load would bring vcc down far enough that the CPU would reset by itself. With NiMH, the batteries could still drive the motors and keep the CPU alive...

      You could single step in the debugger, and see the flag register was set as expected, but the branch didn't happen. Just ran straight through. I can't remember if unconditional jump or call worked. After about the third time this happened, I got good at figuring it out.

      • anitil 4 hours ago

        > For embedded developers, that's just a typical Tuesday

        I was trying to explain to my colleague the other day that I've spent an unhealthy amount of time rebooting devices while staring at an LED wondering why it won't turn on.

        • sitkack 4 hours ago

          It is nuts to have a dev board that is as constrained as the final device. You should have had an additional serial port and 8x as much flash; it would have solved your problem immediately.

          It is even better to do the bulk of the dev inside of an emulator if you can swing it. The GPS and GPRS could be tethered into the emulator instead of trying to get a debug link into the system board.

          • motorest 3 hours ago

            > For regular programmers, if your machine won't boot up, you are having a bad day. For embedded developers, that's just a typical Tuesday, and your only debugging option may be staring at the code and thinking hard.

            It seems to me that if you can still update and reboot said machine, you can do a bisect on your commits to pinpoint the regression. Once you spot the regression commit you can split it to check what introduced the regression.
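
            The search a bisect performs is a binary search over an ordered history, assuming every build before the regression passes and every build from it onward fails. A rough sketch with hypothetical names (`first_bad`, `is_bad_fn`, `fails_at_or_after`); on real hardware the callback would be a flash-and-boot cycle, which is where the cost sits:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if the build at `index` shows the bug. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/* Given a known-good index and a known-bad index (good < bad), return the
 * first bad index. Assumes the history is monotonic: once bad, always bad. */
size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;    /* regression is at mid or earlier */
        else
            good = mid;   /* regression is after mid */
    }
    return bad;
}

/* Stand-in predicate for illustration only: pretends the regression
 * landed at the index stored in *ctx. */
int fails_at_or_after(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

            With 100 candidate builds between the known-good and known-bad points, this needs at most 7 test cycles rather than up to 100.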

            • smcl 2 hours ago

              It took them multiple tries just to use gdb, I don’t think this is a scenario where you can easily reflash the image on the board

            • ShroudedNight 5 hours ago

              Were these commodity boards? Having to resort to using the cellular connection, instead of attaching a hardware debugging probe (J-Link?) seems like a recipe for a painful squandering of intellect.

              • exmadscientist 3 hours ago

                One of the lovely "features" of embedded work is that after a while of doing this sort of thing, sometimes you get good enough at the crazy hacks that it becomes faster and easier to do something like this than to track down who has the J-Link (okay, they've usually got more than one) and can they spare it/where did they put it/why does that person have a J-Link at all/is the J-Link still alive....

              • stuaxo an hour ago

                Did the GCC patch get applied after that?

              • procaryote 22 minutes ago

                Learning to code (C) I thought I found a compiler bug lots of times and was almost always wrong. It gave me the heuristic that if I thought I found a compiler bug, it was time to take a break, have a snack and go for a walk or something before looking again. It usually helped me find my mistake much faster.

                The thing I disliked most about later learning PHP or JavaScript was that my previously usually wrong reaction of "the compiler is insane" suddenly turned out to be commonly true. Even when it wasn't an actual bug, PHP and JavaScript were often so poorly designed that intended behaviour wasn't much better than one.

                • subharmonicon 7 hours ago

                  I’ve spent 30 years working on compilers.

                  They have bugs. Lots of them.

                  With that in mind, the article is correct that the vast majority of issues people think might be a compiler bug are in fact user errors and misunderstanding.

                  My experience actually working with users has been somewhat humorous in the past, including multiple instances of people completely freaking out when something they reported turned out to be a miscompile, to the point that they no longer felt that any code could be trusted since it could have been miscompiled in some way.

                  • jcranmer 7 hours ago

                    Compilers are multimillion line programs, and they have an error rate which is commensurate with multimillion line programs.

                    That said, I think like half the bugs I see get filed against the compiler aren't actually compiler bugs but errors in user code--and this is already using the filter of "took the trouble to file a compiler bug." So it's a pretty good rule of thumb that it's not a compiler bug, unless you understand the compiler rules well enough to articulate why it can't be user error.

                    • LiamPowell 6 hours ago

                      It's not quite half the bugs on GCC's bug tracker, but it's very high: https://gcc.gnu.org/bugzilla/report.cgi?x_axis_field=&y_axis...

                      It's around 10% invalid bugs and another 10% duplicates. A lot of them that I've seen, including one of mine, are a result of misinterpreting details of language standards.

                    • IgorPartola 4 hours ago

                      I am very curious, if these bugs are that common then why don’t we see more programs with weird bugs when they are running and especially having them be documented? Is it because when an unknown bug turns out to be a compiler bug and not a code error it gets fixed right away and with little fanfare? Or that there is some sort of resiliency built into the compiled code that can mask compiler bugs? Or is there some other factor?

                      Also, how easy is it to discover a compiler bug, and how easy is it to identify that a bug in your executable is due to a compiler bug?

                      • starspangled an hour ago

                        Compilers run enormous regression suites, and the CI/git/bisect/etc. style of development has made bugs harder to check in and quicker to squash in a lot of cases, I would say.

                        I have found a number of compiler bugs in GCC and LLVM (and GAS and LLVM AS). Almost without fail they have been in the use of new features (certain new instructions, new ABI / addressing model) or esoteric things (linker script trickery, unusual use of extended inline asm), etc., where the compilers had probably no or very little "real" code to test against other than presumably some simple things and basic unit tests when they check in said features.

                        Unless you're doing _really_ unusual things, or exercising new paths that don't just get picked up when compiling existing code (e.g., like many/most optimizations would), it's just not that likely you'll write code that triggers some unique path / state that has a noticeable bug.

                        To identify that a bug is a compiler bug involving silent bad code generation, you basically assume the compiler is correct until you start to narrow the problem down to a state which should be impossible. After you put in enough assertions and breakpoints and logging (some of which might make the problem mysteriously go away) and reach the point of banging your head on the table, you start side-eyeing the compiler. If you know assembly you might start looking at some assembly output. Or you would start trying to make a reduced reproducer case. E.g., take the suspect function out on its own and make some unit tests for it. A tool like C-Reduce can sometimes help if it's not a relatively simple small function.

                        How quickly you reach that point where you can actually start to narrow down on a possible compiler bug entirely depends on the problem. If it's causing some memory ordering or race condition or silent memory corruption that is only detected later or can only be reproduced at a customer sporadically, then who knows? Could be months, if ever. Others could be an almost immediate assert or error log or obvious bad result that you could debug and file a bug report in a day.
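
                        The "take the suspect function out on its own" step can look roughly like the sketch below: the function under suspicion is checked against a slow, obviously correct reference over a set of inputs, and the same file is then built at different optimization levels or with different compiler versions. Everything here is hypothetical; `popcount_suspect` just stands in for whatever function is being isolated:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the function suspected of being miscompiled:
 * a branch-free bit count. */
unsigned popcount_suspect(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (unsigned)((x * 0x01010101u) >> 24);
}

/* Slow reference that is easy to verify by reading it. */
unsigned popcount_ref(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Count inputs where the two disagree; remember the first one so it
 * can seed an even smaller reproducer. */
size_t count_mismatches(const uint32_t *inputs, size_t n, uint32_t *first_bad)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        if (popcount_suspect(inputs[i]) != popcount_ref(inputs[i])) {
            if (mismatches == 0 && first_bad != NULL)
                *first_bad = inputs[i];
            mismatches++;
        }
    }
    return mismatches;
}
```

                        If the mismatch count is zero at one optimization level and nonzero at another, with no undefined behaviour in the source, that is the kind of evidence that makes a bug report credible.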

                        • alexey-salmin 3 hours ago

                          > I am very curious, if these bugs are that common then why don’t we see more programs with weird bugs when they are running and especially having them be documented?

                          Any given program has N "native" bugs and M bugs introduced by the compiler. I think as long as N >> M you won't really notice. Even if you stumble across a compiler bug by chance, proving it is a nightmare: there's so much UB everywhere that any possible output is technically correct. Exceptions are compiler crashes but those are rare.

                          In my experience most compiler bugs were found by well-tested and proven software during an update of the compiler version or a switch between compilers. That kind of corresponds to the prerequisite of "N is small".

                        • AlotOfReading 7 hours ago

                          It's amazing how many compiler issues never translate into meaningful deviations at the level of application behaviour. Code tends to be highly resilient to small execution errors, seemingly by accident. I wonder what a language/runtime would look like if it were optimized to maximize that resilience, i.e. every line could miscompile in arbitrary ways. Is there a smarter solution than computational redundancy without an isolated verifier system?

                        • CrossVR 5 hours ago

                          Back when I worked on the MPC-HC project we found a bug in the Visual Studio MSVC compiler. When we upgraded from VS2010 to VS2012 subtitles would fail to render.

                          We eventually traced it down to a small for loop that added 0.5 to double members in an anonymous struct. For some reason these three factors: an anonymous struct, double datatypes and a for loop caused those member variables to become uninitialized.

                          We extracted this code into a small code sample to make it easily reproducible and reported it to Microsoft. Their compiler team called it one of the most helpful reports they'd gotten and confirmed it was a bug in their for-loop vectorization code. The compiler appeared to have messed up the SIMD instructions to write the results of the addition back to memory.

                          • LiamPowell 8 hours ago

                            There are 830 open and confirmed wrong-code bugs in GCC at the time of writing. Compiler bugs aren't as rare as people think: https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=NEW&bug_...

                            I think it's just common for people to assume they're wrong and change things blindly rather than carefully checking the standard for their language (assuming their language even has a standard to check). It doesn't help that before AddressSanitizer and co. existed compilers would just do all sorts of nonsense when they detected possibly undefined code in C and C++.

                            • vessenes 6 hours ago

                              Oh man. I uncovered a hash implementation bug in go, ca 2014 or so and I spent like two days prepping my bug report, tests, I was so certain it was me. The team of course was super nice and like ‘good catch’. Victory lap day for any nerd.

                              • KolmogorovComp 15 minutes ago

                                As the article shows, it’s highly dependent on which compiler you’re relying on. Always good to keep this in mind when assessing the likelihood of an error.

                                • nayuki an hour ago

                                  I crashed the Oracle HotSpot Java virtual machine back in 2017 with a totally innocuous program involving nested arrays. After reproducing and minimizing it, I filed a bug report. It got fixed quickly.

                                  I'm not sure why the page is no longer publicly available: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-818... (JDK-8181921)

                                  • dataflow 4 hours ago

                                    Note... it's not really that "it's never a compiler bug," but more like "it's never a backend/codegen bug."

                                    It's not particularly hard (for someone who knows the language rules, which are difficult for a language like C++) to make a widely-used compiler be erroneous in its acceptance or rejection of code.

                                    What's much more difficult ("never" happens) is to make the compiler accept valid code and then generate an incorrect executable. It's possible (and I run into this maybe once a year doing unusual things) but it's really rare. If you think that's what's going on, it's very unlikely to be the case.

                                    • UncleEntity 2 hours ago

                                      Round 'bout 10 years ago I was working on this Python C extension and, after a distro upgrade, it started segfaulting. Dropping down into gdb, python was fairly obviously calling the wrong C function. I didn't know if the linker, compiler or python was at fault and "it is never a compiler error" was at the forefront of my mind so I never even tried to report the incorrect behavior out of fear that maybe I was doing something stupid that caused gcc to compile an incorrect shared library without complaining.

                                      IIRC after the next fedora release everything started working again so maybe not me? Still don't know.

                                    • tomcam 8 hours ago

                                      My worst slowdown ever was when a compiler failed because a bit had flipped somehow. After a month and a half I finally reinstalled it and everything worked perfectly.

                                      • sitkack 8 hours ago

                                        That sucks, but feels good to have solved it.

                                        This is why it is important to have portable build environments. And ECC and checksumming file systems.

                                        • 0x1ceb00da 4 hours ago

                                          Do macbooks have that? What is the best checksumming file system I could use on mainstream linux?

                                      • o11c 4 hours ago

                                        In the early days of C++11, I used to get unique ICEs in both GCC and Clang weekly. One particular annoyance was when a stable release of Debian decided to ship a point release with a regression (not looking it up, but it was something like: 4.6.1 or 4.6.3 worked, but 4.6.2 had completely broken UDLs for constant expressions or something). I had just converted the whole codebase to use UDLs aggressively since they worked everywhere in my tests, not thinking I had to test every point release in between ...

                                        Thankfully I don't think I ever had any miscompilations - that would require the code actually compile across several compiler versions in the first place.

                                        • pjmlp 3 hours ago

                                          I still have a recognition letter from Borland regarding a bug I have found in Turbo Pascal 6.0.

                                              function BrokenResult: Integer;
                                              var
                                                BrokenResult: Integer; (* This should not happen *)
                                              begin
                                                BrokenResult := 42  (* Local variable will be assigned, function result is whatever the compiler comes up with *)
                                              end;
                                          • nsoonhui 4 hours ago

                                            >> It is not a compiler error. It is never a compiler error (2017)

                                            No, not always true. Even in modern compilers, as mature and as modern as VS 2022, you can still hit bugs.

                                            I found one[0]. In my case it's easy to tell it's a compiler bug because the program just can't compile properly. But it's also not easy to reproduce, which just proves how well tested compilers usually are.

                                            0: https://github.com/dotnet/roslyn/issues/74872

                                            • jcelerier 6 hours ago

                                              Idk, I report bugs on GCC / clang something like every few months. I used to do it for msvc too but there were honestly too many

                                              • ShroudedNight 6 hours ago

                                                This brings back memories of XL calculating an address wrong as a result of it lying on a boundary ≡ 0 (mod 2^32). Fortunately, the TOBEY (XL back-end) guys were in the same area in the building so reestablishing our sanity was faster than it otherwise could have been...

                                                • carterschonwald 3 hours ago

                                                  I’ve hit so many fun compiler bugs. Usually easy to work around though (yay modern / fp flavored languages). It certainly helps when it also crashes the compiler ;).

                                                  Miscompilation bugs are definitely nasty though. Especially if it’s a self-bootstrapping compiler. Save your old build artifacts! :)

                                                  • hasley 3 hours ago

                                                    When I started learning Turbo Pascal I came across a problem where an if-statement was obviously decided wrong. I saw the values in the debugger.

                                                    My rescue was that I had a more experienced friend who knew that IIRC the compiler would choose the data type of the left operand of a comparison also for the right operand leading to potential sign switches.
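
                                                    C has a cousin of this pitfall, offered here only as an analogy and not as Turbo Pascal's actual rule: when an `int` is compared with an `unsigned int`, the signed operand is converted to unsigned first, so a negative value compares as a huge positive one. A small sketch with hypothetical function names:

```c
/* Naive comparison: the usual arithmetic conversions turn `a` into an
 * unsigned int before comparing, so -1 becomes UINT_MAX. */
int less_than_naive(int a, unsigned int b)
{
    return a < b;
}

/* Sign-aware comparison: any negative value is less than any unsigned
 * value; otherwise the conversion is harmless. */
int less_than_safe(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```

                                                    Here `less_than_naive(-1, 1u)` is 0 while `less_than_safe(-1, 1u)` is 1, even though the values look unambiguous in a debugger.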

                                                    • dang 3 hours ago

                                                      Discussed at the time:

                                                      “It is never a compiler error” - https://news.ycombinator.com/item?id=15699675 - Nov 2017 (272 comments)

                                                      • okaleniuk 3 hours ago

                                                        Our infrastructural team keeps about 2 MSLOC building on several compilers and running on several architectures. They report a new compiler bug every 2-3 years.

                                                        • tgma an hour ago

                                                          ...unless it is. Compiler crashes are easy to see, but it can actually be nontrivial to identify miscompilations, as they may trigger only in certain code paths, and only with careful observation can you notice the second-order effects...

                                                          If you specifically look for them you might find quite a bit: https://web.cs.ucdavis.edu/~su/publications/emi.pdf [disclosure: an author]

                                                          • timpark 7 hours ago

                                                            Back when I was using CodeWarrior to make a game for PlayStation 2, I found a compiler bug, but fortunately, it was one where it gave an error on valid code, rather than generating bad output. I can't remember the details, but I had some sort of equation that my co-workers agreed should have compiled with no problems. I was able to rewrite it a little to get the result I wanted without triggering any compiler errors.

                                                            • hasley 2 hours ago

                                                              Woa, CodeWarrior was one of the worst compilers (and IDEs) I had to use so far.

                                                            • dehrmann 6 hours ago

                                                              At my first job, it actually was a compiler error, and I'm not sure if my manager ever believed me. We were using an internal gcc fork and cross-compiling, so who knows where the bug was, but the compiler team got back to me. Jump tables were sometimes broken, and we had to add a switch to disable them.

                                                              Not the right lesson to learn for a first job.

                                                              • amluto 6 hours ago

                                                                For anyone who worked in embedded programming in the bad old days of proprietary compilers, it sometimes felt like the compiler working correctly was the common case. One of my first jobs involved programming a smallish, embeddedish, ruggedized computer in C. IIRC I wasted several hours on a bug once before realizing that it was a compiler issue and I needed to try arbitrarily rearranging the buggy function until it generated code that at least appeared to work.

                                                              • est31 7 hours ago

                                                                It really depends on which compiler you are testing and whether the version you are testing has just been released or has been around for some time. If the compiler is for a niche language, then it's possible to find bugs. If the compiler has just been released, it's even possible to be the first person to note the bug. But the bigger the language and the more time has passed, the less likely this is.

                                                                • tbrownaw 7 hours ago

                                                                  Just last week I tripped over a couple compilation bugs in (an old version of) bpftrace.

                                                                  One was caught by internal checks somewhere, something about struct member offsets that I think was an alignment / padding issue and didn't seem to actually break anything. The other made it segfault during compilation, and I had to just tweak my code blindly until it decided to go away.

                                                                  • don_neufeld 4 hours ago

                                                                    I found multiple compiler bugs at my first real programming job in 1997.

                                                                    MSVC did not do a good job of maintaining the FPU stack in those days…

                                                                    • colonial 6 hours ago

                                                                      I wonder what % of compiler bugs go unidentified due to the user code-massaging them away in some fashion.

                                                                      • pfdietz 8 hours ago

                                                                        On the other hand, if you really focus on testing a compiler, particularly an immature one, it's remarkable how many bugs you can find.

                                                                        • jfim 6 hours ago

                                                                          Or if one is using newly introduced language features or accelerated instruction sets.

                                                                        • bmenrigh 5 hours ago

                                                                          I’ve thought I’d found a compiler bug maybe 5 times in my life and it has never actually been a compiler bug.

                                                                          When I reflect on the ~25 years I’ve been programming C, all of the times I thought I’d found a compiler bug were in the first ~8 years. Dunning-Kruger hard at work :-/

                                                                          • cwalv 3 hours ago

                                                                            I've encountered 1 in ~20 years. I don't even remember what it was, but I remember being shocked when I tracked it down and it actually was a compiler bug

                                                                          • kevingadd 8 hours ago

                                                                            Similarly, I once ran into a broken implementation of a Dictionary type (in Mono, I think.) It was only comparing the keys' hash codes, not the keys themselves. In most scenarios this turned out to be more than good enough - for int32 keys obviously it will work, and for most strings it works too if the hash function is good - but I had a great many keys without an amazing hash function for them.

                                                                            It's funny how sometimes a really glaring bug can hide in a stdlib for months or years just because by luck the stars never align to trigger it where somebody can notice it. In my case, the dictionary bug was causing recoverable errors, and I only noticed because I dug in instead of going "Mono's just broken".
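
                                                                            The failure mode described above is easy to show in miniature. This is only a sketch, not Mono's code: a linear scan stands in for a real bucketed table, and `weak_hash` is deliberately poor so that collisions are easy to produce. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct entry {
    unsigned long hash;   /* cached hash of key */
    const char *key;
    int value;
};

/* Deliberately weak hash (sum of bytes): "ab" and "ba" collide. */
unsigned long weak_hash(const char *s)
{
    unsigned long h = 0;
    while (*s)
        h += (unsigned char)*s++;
    return h;
}

/* Buggy lookup: treats equal hashes as equal keys. */
const struct entry *find_hash_only(const struct entry *tab, size_t n,
                                   const char *key)
{
    unsigned long h = weak_hash(key);
    for (size_t i = 0; i < n; i++)
        if (tab[i].hash == h)
            return &tab[i];
    return NULL;
}

/* Correct lookup: the hash is only a cheap filter; the key itself
 * must still match. */
const struct entry *find_checked(const struct entry *tab, size_t n,
                                 const char *key)
{
    unsigned long h = weak_hash(key);
    for (size_t i = 0; i < n; i++)
        if (tab[i].hash == h && strcmp(tab[i].key, key) == 0)
            return &tab[i];
    return NULL;
}
```

                                                                            With a table holding only `"ab"`, looking up `"ba"` returns the `"ab"` entry from `find_hash_only` and nothing from `find_checked`. With a strong hash the buggy version can go a long time without ever being caught.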

                                                                            • AnimalMuppet 7 hours ago

                                                                              In my case, it wasn't a compiler bug - it was a bug in the STL, before the STL was part of the compiler. It was a separate thing you downloaded. I found a bug, and emailed Stepanov (or Lee - I forget). Me, just some random nobody on the internet. I got a fix, and then an improved fix, and then a final fix, all within two hours. I was floored.

                                                                              • adzm 7 hours ago

                                                                                Thankfully though we can still look at the STL source easily and presumably be able to determine the source of the bug or trace behavior or design test cases easier etc.

                                                                              • bitwize 7 hours ago

                                                                                I was playing with Java 1.0.1 trying to make an app screen with a GridBagLayout. It made utter hash of my layout, drawing things on top of each other, etc. Applying the First Rule of Compiler/Runtime Bugs I double-checked and triple-checked and quadruple-checked my work, making sure I used the GridBagLayout API exactly according to spec. Eventually I posted to USENET comp.lang.java asking, "Is there a bug in GridBagLayout?"

                                                                                The problem disappeared in Java 1.0.3.