• pella 2 hours ago

    Note: PostgreSQL 18 includes many partition-related optimisations, so the improvement from this trick may be smaller there. ( https://www.postgresql.org/docs/18/release-18.html )

      -- "Improve the efficiency of planning queries accessing many partitions (Ashutosh Bapat, Yuya Watari, David Rowley)"
    
      ..."The actual performance increases here are highly dependent on the number
      of partitions and the query being planned.  Performance increases can be
      visible with as few as 8 partitions, but the speedup is marginal for
      such low numbers of partitions.  The speedups become much more visible
      with a few dozen to hundreds of partitions.  With some tested queries
      using 56 partitions, the planner was around 3x faster than before.  For
      use cases with thousands of partitions, these are likely to become
      significantly faster.  Some testing has shown planner speedups of 60x or
      more with 8192 partitions."
      https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d69d45a5a
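
    To see the effect on your own version, a minimal sketch (table name t and the partition count are made up): create a hash-partitioned table in psql and look at the "Planning Time" that EXPLAIN's SUMMARY option reports.

      CREATE TABLE t (id bigint) PARTITION BY HASH (id);

      -- generate 64 partitions; psql's \gexec runs each generated statement
      SELECT format('CREATE TABLE t_%s PARTITION OF t
                       FOR VALUES WITH (MODULUS 64, REMAINDER %s)', i, i)
      FROM generate_series(0, 63) AS i \gexec

      -- the reported "Planning Time" is what the commit above speeds up
      EXPLAIN (SUMMARY ON) SELECT * FROM t WHERE id = 42;
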
    • regecks 5 hours ago

      I’d be interested to see a benchmark of actually selecting rows by manually specifying the partition vs not.

      This benchmark seems to be pure computation of the hash value, which I don't think is a useful test of the hypothesis. A lot can happen at actual query time that this benchmark does not account for.
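
      Roughly what I have in mind, assuming a hash-partitioned table t where the key 42 happens to land in partition t_11 (both names hypothetical):

        -- let the planner prune down to the right partition:
        EXPLAIN ANALYZE SELECT * FROM t WHERE id = 42;

        -- vs. naming the partition yourself, as the article's trick does:
        EXPLAIN ANALYZE SELECT * FROM t_11 WHERE id = 42;

      Comparing "Planning Time" plus "Execution Time" between the two would show the real-world saving, not just the cost of the hash computation.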

      • nopurpose an hour ago

        This sounds really interesting. The query planner acquires a relation lock on every partition of the table and on its indexes, but only 16 locks per backend can use the fast path; beyond that, locking falls back to the shared lock table and becomes significantly more expensive, slowing queries down. The workaround I have used so far is plan_cache_mode=force_generic_plan, which prevents re-planning queries when parameters change.

        This approach might be a better option, but sadly the app needs to be modified to make use of it.
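
        If anyone wants to see the spill-over for themselves, a rough sketch (table t is hypothetical): locks taken during planning are held until the transaction ends, so pg_locks can show which ones missed the fast path.

          BEGIN;
          EXPLAIN SELECT * FROM t WHERE id = 42;
          -- fastpath = false rows fell back to the shared lock table
          SELECT fastpath, count(*)
          FROM pg_locks
          WHERE pid = pg_backend_pid()
          GROUP BY fastpath;
          COMMIT;

          -- the workaround mentioned above:
          SET plan_cache_mode = force_generic_plan;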

        • ysleepy 4 hours ago

          Without a comparison against letting Postgres calculate the partition, this is just useless.

          And who in their right mind would compute the hash with a static SQL query that doesn't even use the catalog hashing routine, but a reimplementation of it?

          I'm baffled.
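
          For the record, two ways to ask Postgres itself instead of reimplementing the hash (table t and the modulus/remainder values are hypothetical):

            -- which partition a stored row actually lives in:
            SELECT tableoid::regclass FROM t WHERE id = 42;

            -- the built-in routine that hash partition constraints use
            -- (args: parent, modulus, remainder, partition key value):
            SELECT satisfies_hash_partition('t'::regclass, 64, 11, 42::bigint);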

          • Elucalidavah 5 hours ago

            Tangential: is "without requiring knowledge of data patterns" a frequently useful requirement? I.e. isn't knowledge of data patterns basically required for any performance optimization?

            • wengo314 5 hours ago

              This looks like a maintenance nightmare going forward, but I could be wrong.

              If you are stuck on a specific PG version for a while, maybe it's worth it.

              • nopurpose an hour ago

                The hash won't change on a PG version upgrade, because that would force pg_upgrade to massively reshuffle rows between partitions: something PG has managed to avoid so far.

              • itsthecourier 4 hours ago

                nice to see ruby getting ahead