Playing with BOLT and Postgres (vondra.me)
Submitted by aquastorm a day ago
  • albntomat0 33 minutes ago

    I posted this in a comment already, but the results here line up with the original BOLT paper.

    “For the GCC and Clang compilers, our evaluation shows that BOLT speeds up their binaries by up to 20.4% on top of FDO and LTO, and up to 52.1% if the binaries are built without FDO and LTO.”

    “Up to” though is always hard to evaluate.

    • miohtama 3 hours ago

      10% - 20% performance improvement for PostgreSQL "for free" is amazing. It almost sounds too good to be true.

      • albntomat0 35 minutes ago

        There’s a section of the article at the end about how Postgres doesn’t have LTO enabled by default. I’m assuming they’re not doing PGO/FDO either?

        From the BOLT paper: “For the GCC and Clang compilers, our evaluation shows that BOLT speeds up their binaries by up to 20.4% on top of FDO and LTO, and up to 52.1% if the binaries are built without FDO and LTO.”

        • pgaddict 5 minutes ago

          With LTO, I think it's more complicated. It depends on the packagers / distributions, and e.g. on Ubuntu we've apparently been getting -flto for years.

          • touisteur 24 minutes ago

            I've always wondered how people actually get the profiles for profile-guided optimization. Unit tests probably won't exercise the high-performance paths. You'd need a set of performance stress tests. Is there a write-up on how everyone does it in the wild?

            • mhh__ 11 minutes ago

              You might be surprised how much speedup you can get from (say) just running a test suite as PGO samples. If I had to guess, this is probably because compilers spend a lot of time optimising cold paths which they otherwise would have no information about.

              • pgaddict 6 minutes ago

                Yeah, getting the profile is obviously a very important step. Because if it wasn't, why collect the profile at all? We could just do "regular" LTO.

                I'm not sure there's one correct way to collect the profile, though. ISTM we could either (a) collect one very "general" profile, to optimize for arbitrary workload, or (b) profile a single isolated workload, and optimize for it. In the blog I tried to do (b) first, and then merged the various profiles to do (a). But it's far from perfect, I think.

                But even with the very "rough" profile from "make installcheck" (which is the basic set of regression tests), it still helps a lot. Which is nice. I agree it's probably because even that basic profile is sufficient for identifying the hot/cold paths.
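To make options (a) and (b) above concrete, here is a minimal Python sketch of merging per-workload profiles into one "general" profile and then picking out the hot functions. Everything in it is an assumption for illustration: the function names and sample counts are made up, a profile is modelled as a plain dict of function name to sample count, and the merge rule (optionally normalizing each workload so a long-running one does not drown out a short one) is one possible choice. It is not BOLT's profile format, and it does not reproduce what BOLT's own profile-merging tooling or the blog post actually does.

```python
from collections import Counter


def merge_profiles(profiles, normalize=True):
    """Merge per-workload {function: samples} dicts into one profile.

    With normalize=True each workload contributes a total weight of 1.0,
    so a workload with many samples does not dominate the merged result.
    """
    merged = Counter()
    for profile in profiles:
        total = sum(profile.values())
        if total == 0:
            continue  # an empty profile carries no information
        for func, count in profile.items():
            merged[func] += count / total if normalize else count
    return dict(merged)


def hot_functions(profile, coverage=0.8):
    """Return the hottest functions, in order, until `coverage` of all
    samples is accounted for. Everything else is treated as cold."""
    threshold = coverage * sum(profile.values())
    hot, seen = [], 0.0
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    for func, weight in ranked:
        if seen >= threshold:
            break
        hot.append(func)
        seen += weight
    return hot


# Hypothetical sample counts for two isolated workloads (option b).
oltp = {"parse_query": 700, "seq_scan": 200, "hash_lookup": 100}
tpch = {"hash_join": 5000, "seq_scan": 4000, "hash_lookup": 1000}

# Option (a): one merged profile covering both workloads.
general = merge_profiles([oltp, tpch])
print(hot_functions(general))
```

Without normalization the larger workload decides the ranking almost by itself, which is one reason a merged profile can still end up "far from perfect".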

        • fabian2k 2 hours ago

          My first instinct is that the effect is too large to be real. But that should be something other people could reproduce and verify. The second thought is that it might overfit the benchmark code here, but they address it in the post. But any kind of double-digit improvement to Postgres performance would be very interesting.

          • pgaddict 2 hours ago

            (author here)

            I agree the +40% effect feels a bit too good, but it only applies to the simple OLTP queries on in-memory data, so the inefficiencies may have an unexpectedly large impact. I agree 30-40% would be a massive speedup, and I expected it to disappear with a more diverse profile, but it did not ...

            The TPC-H speedups (~5-10%) seem much more plausible, considering the binary layout effects we sometimes observe during benchmarking.

            Anyway, I'd welcome other people trying to reproduce these tests.

            • fabian2k an hour ago

              I looked, and there is no mention of BOLT yet on the pgsql-hackers mailing list. That might be the more appropriate place to get more attention on this, though there are certainly a few PostgreSQL developers reading here as well.

              • pgaddict 27 minutes ago

                True. At the moment I don't have anything very "actionable" beyond "it's magically faster", so I wanted to investigate this a bit more before posting to -hackers. For example, after reading the paper I realized BOLT has a "-report-bad-layout" option to report cases of bad layout, so I wonder if we could use it to identify places where the code should be reorganized.

                OTOH my blog is syndicated to https://planet.postgresql.org, so it's not particularly hidden from the other devs.

          • Avamander an hour ago

            How easy would it be to have an entire distro (re)built with BOLT? Say for example Gentoo?

            • fishgoesblub an hour ago

              It would be difficult, as every package/program would need a step to generate the profile data by running the program the way a user would.

              • metadat 5 minutes ago

                Is it theoretically possible to perform the profile generation+apply steps dynamically at runtime?