• loeg 9 months ago

    But what was the shutdown bug you were trying to identify? Was this destructor logging actually useful? The article teases the problem and provides detailed instructions for reproducing the logging, but doesn't actually describe solving the problem.

    • jprete 9 months ago

      Address/MemorySanitizer are also meant for this kind of problem. https://github.com/google/sanitizers/wiki/AddressSanitizer https://github.com/google/sanitizers/wiki/MemorySanitizer

      Also valgrind, but I'm more familiar with the first two.
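
      For the kind of shutdown bug in the article, a tiny use-after-free repro (hypothetical code, not the article's) is enough for ASan to report the bad access along with the allocation and free stacks:

          // build: g++ -fsanitize=address -g repro.cpp && ./a.out
          #include <cstdio>

          struct Logger {
              int lines = 0;
              void log(const char* msg) { ++lines; std::puts(msg); }  // touches the freed object
          };

          int main() {
              Logger* logger = new Logger;
              delete logger;       // torn down "too early" during shutdown
              logger->log("bye");  // ASan: heap-use-after-free, with use/free/alloc stacks
          }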

      • richardwhiuk 9 months ago

        The author explicitly name-checks valgrind.

        I think plain gdb would have been sufficient if it's exiting with a segfault or terminating...

      • kccqzy 9 months ago

        One of my favorite hacky workarounds is to simply call _exit(0) for an immediate exit without running destructors. Most of the time, the destructors are just freeing memory that will be reclaimed by the OS anyways so they are not worth running if you know the program is exiting. And even if the destructor does more than just freeing memory, maybe the work it's doing isn't needed if you know the process is ending soon: maybe it's joining threads or releasing mutexes or deleting timers.

        You will find that in a typical C++ codebase, the destructors that do genuinely useful things (say, flushing buffers and closing files) are far fewer.
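
        Roughly what that hack looks like (a sketch; std::_Exit from <cstdlib> is the portable spelling of the same idea):

            #include <cstdio>
            #include <cstdlib>

            struct Noisy {
                ~Noisy() { std::puts("~Noisy ran"); }
            };

            Noisy global;  // its destructor would normally run at exit

            int main() {
                std::puts("done with the real work");
                std::_Exit(0);  // immediate exit: no static destructors, no atexit handlers
            }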

        • ignoramous 9 months ago

          Sub-optimal if the exiting process must relinquish resources external to the current process or the host.

          • gpderetta 9 months ago

            Optimal if you are writing crash-only, i.e. reliable, software.

            • TeMPOraL 9 months ago

              Suboptimal if you're relying on RAII not for "regular" resources, but more abstract ones like "correctly bookended piece of persisted data".

              Then again, I guess it's optimal if you're writing fault-resistant software.

            • akira2501 9 months ago

              And you guarantee you can release those resources without error or exception all the time? If this is not the case then a destructor is the wrong place to be doing this work anyways. If an external service requires a client experiencing an error to manually release resources, then that external service has significant design or protocol issues.

              • ignoramous 9 months ago

                > guarantee you can release those resources without error or exception all the time

                releasing resources (optimally) and resource exhaustion / ageing (crash recovery) are orthogonal.

                > a destructor is the wrong place

                I commented on GP's use of exit() not on finalizers / destructors.

                > If an external service requires a client experiencing an error to manually release resources...

                You perhaps missed what I wrote: It isn't optimal if a failing client never signals release (think: distributed lock).

                • akira2501 9 months ago

                  > releasing resources (optimally) and resource exhaustion / ageing (crash recovery) are orthogonal.

                  In concept. In the reality of implementation they are not.

                  > I commented on GP's use of exit() not on finalizers / destructors.

                  You said they were "suboptimal." Implying there is an available optimal solution. I'm challenging that exact notion.

                  > It isn't optimal,

                  You perhaps missed the point. It _can't_ possibly be optimal given the nature of the problem itself. These are the wrong terms to understand the problem in.

                  • ignoramous 9 months ago

                    > In concept. In the reality of implementation they are not.

                    Recovery flows are very different in both concept and implementation.

                    > It can't possibly be optimal given the nature of the problem itself.

                    What's in the "nature" of this problem that one should never bother releasing resources once done? I mean, if any program creates an external resource (say, unix domain socket or shared mem), it isn't okay if it never releases it.

          • rqtwteye 9 months ago

            I did this a long time ago with macros. It helped me to find a ton of leaks in a huge video codec codebase.

            I still don't understand the hate for the C preprocessor. It enables doing things like this without any overhead. Set a flag and you get constructor/destructor logging and whatever else you want. Don't set it and you get the regular behavior. Zero overhead.
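
            Not their macros, but the general shape is something like this; the logging compiles away entirely unless the flag is set:

                // Build with -DTRACE_LIFETIMES to turn the logging on.
                #ifdef TRACE_LIFETIMES
                #include <cstdio>
                #include <typeinfo>
                #define LOG_LIFETIME(event) \
                    std::fprintf(stderr, "%s %s at %p\n", event, typeid(*this).name(), (void*)this)
                #else
                #define LOG_LIFETIME(event) ((void)0)
                #endif

                struct Widget {
                    Widget()  { LOG_LIFETIME("ctor"); }
                    ~Widget() { LOG_LIFETIME("dtor"); }
                };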

            • jonathrg 9 months ago

              The hate might have to do with it being such a primitive and blunt tool; doing anything moderately complex becomes extremely complicated and fragile.

              • tialaramex 9 months ago

                Yeah, this very primitive tool easily creates the programming equivalent of the "iwizard problem".

                [You straightforwardly replace "mage" with "wizard" and oops, now your images are "iwizards" and your "magenta" is "wizardnta"]

                • immibis 9 months ago

                  By contrast something like AspectJ for C++ (if such a thing existed) could express this requirement cleanly.

                  • akira2501 9 months ago

                    Complex arrangements _are_ fragile.

                    There is no magic environment that can fix this for you. If you feel you've seen one, then you've focused on the parts of it that were important to you, while ignoring the parts of it everyone else actually needs.

                    • undefined 9 months ago
                      [deleted]
                    • synergy20 9 months ago

                      do you have a write-up of how you did it? I'm interested, thanks.

                      • naruhodo 9 months ago

                        I did a similar thing, in C++, 3 decades ago. I used a macro, FUNC(), that I would put at the start of functions. It took no arguments and declared a local instance, using the __FUNC__ preprocessor builtin to pass the function name to the Trace constructor:

                            Trace trace##__LINE__(__FUNC__);
                        
                        The Trace instance would generate one log on construction and another on destruction. It also kept track of function call nesting (a counter) in a static member that would increment in the constructor and decrement in the destructor. It was inherently single-threaded, because I used a static member, but it could be adapted to multiple threads using thread local storage. I paired it with a LINE("Var x is " << x); macro for arbitrary ostreams-style logging. And building on that, EXPR(x) would do LINE(#x " = " << (x)). The output was along the lines of:

                            ,- A::f()
                            | ,- A::g()
                            | | ,- B::B()
                            | | `- B::~B()
                            | | x = 12
                            | | About to do a thing...
                            | | ,- A::doAThing()
                            | | `- A::doAThing()
                            | `- A::g()
                            `- A::f()
                        
                        The macros could be disabled (defined to do nothing) by a preprocessor symbol.
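
                        A rough modern reconstruction of the same idea (not the original code: it uses __func__, which gives the bare function name rather than A::f(), plus the usual two-step paste so __LINE__ actually expands):

                            #include <cstdio>

                            struct Trace {
                                static int depth;  // single-threaded, as in the original
                                const char* name;
                                explicit Trace(const char* n) : name(n) {
                                    indent(); std::printf(",- %s\n", name); ++depth;
                                }
                                ~Trace() {
                                    --depth; indent(); std::printf("`- %s\n", name);
                                }
                                static void indent() { for (int i = 0; i < depth; ++i) std::printf("| "); }
                            };
                            int Trace::depth = 0;

                            #define TRACE_CAT2(a, b) a##b
                            #define TRACE_CAT(a, b) TRACE_CAT2(a, b)
                            #define FUNC() Trace TRACE_CAT(trace_, __LINE__)(__func__)

                            void g() { FUNC(); }
                            void f() { FUNC(); g(); }

                            int main() { f(); }
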
                        • synergy20 9 months ago

                          gcc's -finstrument-functions can also add function call traces without changing the code, I think.

                    • MontagFTB 9 months ago

                      I consider Tracy the state of the art for profiling C++ applications. It’s straightforward to integrate, toggle, gather data, analyze, and respond. It’s also open source, but rivals any product you’d have to pay for:

                      https://github.com/wolfpld/tracy
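
                      Integration is mostly sprinkling zone macros; roughly (assuming the Tracy client is linked in and TRACY_ENABLE is defined, otherwise the macros compile to nothing):

                          #include <tracy/Tracy.hpp>

                          void update_physics() {
                              ZoneScoped;            // zone named after the enclosing function
                              // ... work ...
                          }

                          void frame() {
                              ZoneScopedN("frame");  // explicitly named zone
                              update_physics();
                              FrameMark;             // tells the profiler a frame ended
                          }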

                      • Veserv 9 months ago

                        Looks fine, but it does not look like there is an automatic full function entry/exit trace, just sampling. The real benefit is when you do not even need to insert manual instrumentation points; you just hit run and you get a full system trace.

                        How well does the visualizer handle multi-TB traces? Usually pretty uncommon, but 10-100 GB is not that hard to produce when doing full tracing.

                        • jms55 9 months ago

                          Of note is that tracy is aimed at games, where sampling is often too expensive and not fine-grained enough. Hence the manual instrumenting.

                          For the Bevy game engine, we automatically insert tracy spans for each ECS system. In practice, users can just compile with the tracy feature enabled, and get a rough but very usable overview of which part of their game is taking a long time on the CPU.

                          • Veserv 9 months ago

                            I was talking about automatic instrumentation of every single function call by default. No manual instrumentation needed because everything is already instrumented.

                            To be fair, you do still want some manual instrumentation to correlate higher level things, but full trace everywhere answers most questions. You also want to be able to manually suppress calls for small functions since that can be performance relevant or distorting, but the point is “default on, manual off” over “default off, manual on”.

                            • jpc0 9 months ago

                              How would you implement this, may I ask? C++ does not have reflection in the language, so at best you can do that by hooking into the running application, but C++ also aggressively inlines functions at anything except -O0, which means your function call might never be an actual function call. Running at -O0 is also just generally a bad idea, since many instances of UB will never get caught.

                              The only way I can see doing this at compile time is with a compiler extension but then you are entirely locked in to 1 compiler.

                              Maybe if you compile with debug symbols but then well, you are shipping debug symbols...

                              • TickleSteve 9 months ago

                                gcc has "-finstrument-functions". This calls your code on every function entry and exit. I've used this previously for tracing as described here and to move memory-protection windows around based on the running code.
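
                                The shape of it, roughly; the hooks themselves have to be excluded from instrumentation or they recurse:

                                    // build: g++ -finstrument-functions app.cpp hooks.cpp
                                    #include <cstdio>

                                    extern "C" {
                                    __attribute__((no_instrument_function))
                                    void __cyg_profile_func_enter(void* fn, void* call_site) {
                                        std::fprintf(stderr, "enter %p (from %p)\n", fn, call_site);
                                    }

                                    __attribute__((no_instrument_function))
                                    void __cyg_profile_func_exit(void* fn, void* call_site) {
                                        std::fprintf(stderr, "exit  %p\n", fn);
                                    }
                                    }

                                The raw addresses can be mapped back to symbol names afterwards with addr2line against the unstripped binary.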

                                • Veserv 9 months ago

                                  If you want something passable, most compilers have function prologue/epilogue hooks that you can write in plain code. Realistically, development and test systems are only going to target like 1 or 2 compilers, so it is not very much work. Unless you are distributing a source library and you want to get full-trace telemetry from customer systems in-development, that is probably all you really need to do.

                                  If you want to really zoom, you need to get the hooks inlined and probably just written in straight assembly. You then need to optimize your binary format and recording system. You then need to start optimizing your memory bandwidth usage when that becomes the bottleneck. Your overhead in the end is basically limited by memory bandwidth; you can only shovel so many tens of GB/s of logging into memory. Note that persistence has likely been infeasible for the last 2 or so orders of magnitude; RAM is likely the only storage consistently fast enough for the data rates you want to generate when doing this.

                                  • jpc0 9 months ago

                                    Makes sense, and thanks for the detailed answer. I may put that one on the list of things to play with in the future.

                                • andersa 9 months ago

                                  This would be unbelievably inefficient; game engines run hundreds of millions of function calls per second. And if the code runs 10x slower with the trace active, then it's no longer sensible.

                                  We use sampling for the cases where this level of detail is needed as it has lower overhead.

                                  What use case did you find this useful for?

                                  • Veserv 9 months ago

                                    No, it is quite reasonable with an efficient implementation. 10-50% overhead or so depending on function size distribution (since it is a small fixed overhead per call, smaller functions result in a greater fraction of overhead). 10x for just function entry/exit recording would be grotesquely inefficient. You can do inefficient time travel debugging recording for less than that.

                                    You do need to allocate a ton of memory for the recording buffer to record sizable amounts of trace data. GB per core-second of trace or so (it's a ring buffer, so you get to see the last N seconds, not that you need to run for less than N seconds), but that is fine during development on normal dev machines.

                                    It is useful for everything. Why would you not want full traces for everything? It is amazing. We use it for everything internally where I work. Or rather, it is part of it. We actually prefer full time travel debugging during development and automated testing (again, overhead is low enough) but it is not available for everything. So sometimes we are stuck with just traces.

                                    • donadigo 9 months ago

                                      Instrumenting every function would be pretty overkill, but you can automatically install instrumentation on functions on demand or with a pre-determined user selection. You can even instrument on a line basis if your instrumentation is cheap enough. I've experimented with this in a VS extension I'm developing, and I could easily browse through a non-trivial game codebase without causing noticeable performance overhead [1]. In the demo, the instrumentation is auto-installed on all functions within the file you opened. Obviously, this is just one project I was testing on, but it shows that this type of tracing is feasible.

                                      [1] https://www.youtube.com/watch?v=3PnVG49SFmU

                                    • undefined 9 months ago
                                      [deleted]
                                • rerdavies 9 months ago

                                  Alas, not for Linux. I've been using the unloved and mostly abandoned (and mostly awful) google perf tools on Linux. :-(

                                  • jchw 9 months ago

                                    Hmm? I haven't used Tracy yet but the demo trace they show at the URL linked on GitHub[1] sure looks like a trace from an application running on Linux. The documentation[2] also seems to reference what you need to run it on Linux, and the NixOS derivation[3] also suggests it runs on at least Linux and macOS, and I was able to run several of the binaries including the UI and capture binary. I still hesitate to doubt you on this because I haven't figured out how one is supposed to actually use it but it surely seems to support Linux. (I will definitely find a use for this, it looks amazing.)

                                    [1]: https://tracy.nereid.pl/

                                    [2]: https://github.com/wolfpld/tracy/releases/latest/download/tr...

                                    [3]: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/devel...

                                    • dkersten 9 months ago

                                      I’ve used Tracy on Linux about two months ago. Works fine.

                                      • HellsMaddy 9 months ago

                                        Tracy works fine on Linux! Just used it today.

                                        • rerdavies 9 months ago

                                          OK. Thanks for straightening me out. I saw the .msi, but no Linux packages. I will definitely check it out!

                                          • las_balas_tres 9 months ago

                                            That GitHub repo contains the sources; you can also compile Tracy yourself by cloning it.

                                      • undefined 9 months ago
                                        [deleted]
                                      • neverartful 9 months ago

                                        I did something similar once, but my implementation didn't rely on any compiler features. I made tracing macros for constructors, destructors, and regular C++ methods. If the tracing was turned on in the macros, the information given to the macro (class name, method name, etc.) would be passed to the tracing manager. The tracing manager would serialize it to a string and send it through a TCP socket. I also wrote a GUI tracing monitor that would listen on a socket for tracing messages and then display the trace messages received (including counts by class and method). The tracing monitor had filters to tweak. It was a nice tool to have and was very instrumental in finding memory leaks and obscure crashes. This was back in the late 1990s or early 2000s.

                                        • rerdavies 9 months ago

                                          Just spent three days of debugging hell getting my app to shut down gracefully, so that it gracefully turns off all the things that it asynchronously turned on, without performing any use-after-deletes. I can sympathise with that.

                                          • omnicognate 9 months ago

                                            Don't know why the link to the GCC runtime instrumentation docs points to the internet archive. The direct link is https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.h...

                                            • meindnoch 9 months ago

                                              And what was the bug in the end?

                                              • meisel 9 months ago

                                                I’d say address sanitizer is a better starting point, and likely to show memory issues faster than this

                                                • tempodox 9 months ago

                                                  Yep, ASAN to find use-after-free, and valgrind memcheck to find forgotten-to-free.

                                                  • manwe150 9 months ago

                                                    ASAN also checks for memory leaks like valgrind; the main difference between the tools is whether you can recompile all of your libraries to get the compiler support for detection, or whether binary instrumentation is a better fit (https://github.com/google/sanitizers/wiki/AddressSanitizerLe...)
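
                                                    A tiny example of the leak side (a sketch; on Linux the leak check is on by default with -fsanitize=address, and the linked page describes ASAN_OPTIONS=detect_leaks=1 for platforms where it is off):

                                                        // build: g++ -fsanitize=address -g leak.cpp && ./a.out
                                                        int main() {
                                                            new int[64];  // never freed; LeakSanitizer reports it at exit
                                                        }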

                                                    • tempodox 9 months ago

                                                      Thanks, I didn't know the `ASAN_OPTIONS` part for macOS.

                                                • tempodox 9 months ago

                                                  Too bad, it doesn't work like this on macOS.

                                                  • grahamj 9 months ago

                                                    [flagged]