• johnthuss 3 hours ago

    This is a super useful thing to know and I'm thankful for this article highlighting this aspect of Postgres.

    But I would disagree with the takeaway to focus on optimizing your indexes and not your tables. The reason is that the order of columns in a multi-column index is highly meaningful and intentional, in order to support matching on a range of values in the last column of the index. The way databases work, you can only utilize a multi-column index on (customer_id int4, date timestamp) if you have an equality match on customer_id, like "WHERE customer_id = 1 AND date BETWEEN '2024-01-01' AND '2025-01-01'". If you reorder these columns in the index to put the larger date column first, then, sure, you save space in the index, but you also make it worthless – it will never be used by the query above. As such, optimizing a multi-column index is only useful when all the columns are queried for equality rather than a range.
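
    For example, a minimal sketch with a hypothetical orders table:

        CREATE INDEX orders_customer_date_idx
            ON orders (customer_id, date);

        -- served by the index: equality on the leading column,
        -- range on the trailing one
        SELECT *
        FROM orders
        WHERE customer_id = 1
          AND date BETWEEN '2024-01-01' AND '2025-01-01';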

    In contrast, when you are creating a NEW table you might not think hard about the order of the columns, and especially not about the data sizes of each column and their alignment. But doing so at the time you create the table can be tremendously beneficial if it is going to be very large. It is important to note that you not only save space on disk, but also in precious RAM when the tuples are loaded.
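
    To sketch the padding effect (hypothetical table; exact savings depend on the full row):

        -- "id" (int4) is followed by an 8-byte-aligned timestamptz,
        -- so every row carries 4 padding bytes after "id"
        CREATE TABLE events_padded (
            id int4,
            created_at timestamptz,
            flag boolean
        );

        -- same columns, strictest alignment first: no padding gap
        CREATE TABLE events_packed (
            created_at timestamptz,
            id int4,
            flag boolean
        );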

    • jeltz 2 hours ago

      This is a limitation which is currently being worked on. The order will still matter, of course, but it will allow PostgreSQL to make some use of indexes even when the order of columns does not match the query.

      https://www.postgresql.org/message-id/CAH2-Wzmn1YsLzOGgjAQZd...

      • johnthuss an hour ago

        >>> This is feasible in cases where the total number of distinct values in the column 'a' is reasonably small (think tens or hundreds, perhaps even thousands for very large composite indexes).

        It's great this is improving, but it is a fairly narrow improvement. Personally, the multi-column indexes I use would not be improved by this change, since column 'a' does not have a "reasonably small" number of distinct values.

      • sgarland 2 hours ago

        This (index ordering resulting in useless indices) is not true, at least not in newer versions of Postgres (I just tried with 15).

        While the query will take much longer (for me, it was about 47 msec vs 0.6 msec for 2,000,000 rows), it still uses the index.

        Similarly, while normally you wouldn’t expect a query using predicate Y to use an index defined on columns (X, Y, Z) if predicate X is not also referenced, Postgres may choose to do so, depending on table stats and the relative difference between random_page_cost and seq_page_cost. I’ve seen it happen before.
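
        A sketch of what I mean (hypothetical table; planner behavior depends on stats and cost settings):

            CREATE INDEX t_xyz_idx ON t (x, y, z);

            -- no predicate on the leading column x, yet the planner may
            -- still walk the whole index and filter on y when that beats
            -- a sequential scan of the heap
            EXPLAIN SELECT * FROM t WHERE y = 42;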

      • wccrawford 3 hours ago

        Interesting. But if this is important, why doesn't Postgres do it invisibly, automatically? Surely there's a most-correct sequence, with the longest values first, and the shortest values last, and then (probably) the variable-length values?

        It could then happily report back the data in the order that the DB Admin/developer expects.

        • jeltz 2 hours ago

          Because nobody has implemented it yet. A patch would be very welcome by many but nobody has written one.

          • j45 3 hours ago

            Reordering columns in MySQL is pretty trivial, and I've wondered why it's not the same in Postgres, even at the GUI level.

            Ordering columns for performance might be a different order of recommendation.

          • rtuin 2 hours ago

            It’s really something you don’t realize applies to your Postgres DB until you run into a situation where it does.

            The author explains this very well, it’s a good read! I learned about this padding a little over a year ago, while I was designing a data-intensive application with a colleague. I was skeptical about the advantage at first, but for our specific design, where we have 100 to 480+ columns in one table, it makes a huge difference in table storage size. Not so much on the indexes, though.

            • sgarland 2 hours ago

              > where we have 100 to 480+ columns in one table

              I’m sorry, what? Why?

              • rtuin 2 hours ago

                Glad you asked!

                This system contains measurements and state of physical devices (time series). It’s designed for both heavy write and read, with a slight emphasis on write. Each table is one type of device and contains 1 to 5 different measurements/states. But here’s the trick: because data is queried with a minimum bucket size of 15 minutes, I figured we could just create a column for each measurement + quarter of the day (i.e. measure0000, measure0015), so that’s 100 columns for each measurement (96 quarters + 4 for DST), include the date in the key, et voilà: excellent write performance (because it’s mainly UPDATE queries) and good read performance.
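
                A rough sketch of that layout (names illustrative, only two of the bucket columns shown):

                    CREATE TABLE device_power (
                        device_id bigint,
                        day date,
                        measure0000 numeric,  -- 00:00-00:15 bucket
                        measure0015 numeric,  -- 00:15-00:30 bucket
                        -- ...one column per quarter of the day...
                        PRIMARY KEY (device_id, day)
                    );

                    -- each write touches a single bucket column in one row
                    UPDATE device_power
                    SET measure0015 = 42.0
                    WHERE device_id = 1 AND day = '2024-01-01';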

                Okay, the queries to make sense of the data aren’t pretty, but can be generated.

                I find it really cool how effective this is for time-series data without Postgres extensions (we’re on RDS).

                • aidos an hour ago

                  The updates wouldn’t be in place anyway because of the Postgres mvcc approach, right?

                  I can see the advantage in terms of just needing a single tuple for reads. So a timestamp + value model would likely take twice as much heap space as your approach?

                  Given that you’re probably always just inserting new data you could use a BRIN index to get fast reads on the date ranges. Would be interesting to see it in action and play around to see the tradeoffs. The model you’ve settled on sounds like it would be a pain to query.
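
                  Something like this, reusing the illustrative names from upthread (untested sketch):

                      -- BRIN indexes are tiny and work well when rows are
                      -- physically ordered by date
                      CREATE INDEX device_power_day_brin
                          ON device_power USING brin (day);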

                  • napsterbr an hour ago

                    One interesting thing your team may want to look into (if you haven't already) is compression.

                    Of course there are a multitude of variables we don't have access to from the outside, but Postgres only compresses data that is TOASTed, and based on your description of the table, the data is not being TOASTed (and therefore not being compressed).

                    Instead, if you could somehow pack your timeseries entries into an array, you would get the benefits of compression automatically.

                    Given your write performance requirements, using an array may be out of the question (and you may get too much overhead from dead tuples) -- but who knows? Always a good idea to benchmark.
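
                    To sketch the array variant with the illustrative names from upthread (compression only kicks in once a value crosses the TOAST threshold, ~2 kB by default):

                        CREATE TABLE device_power_packed (
                            device_id bigint,
                            day date,
                            measures numeric[],  -- all buckets in one value
                            PRIMARY KEY (device_id, day)
                        );

                        -- updating one bucket rewrites the whole array value
                        -- (and the tuple), hence the dead-tuple caveat above
                        UPDATE device_power_packed
                        SET measures[2] = 42.0
                        WHERE device_id = 1 AND day = '2024-01-01';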

                    I actually considered mentioning this in the post but figured it was too long already and could be material for a future one :)

              • remus 3 hours ago

                Very interesting, but I think the author overstates the importance of alignment a little. Unless your data/indexes are already of a challenging size for your hardware (or you expect them to be imminently), fiddling with byte alignment details feels like a fairly premature optimisation.

                Disk is cheap, memory is plentiful, your time is expensive etc.

                • egnehots 3 hours ago

                  As mentioned in the article, it's a good idea to consider this when creating a new table, since it's essentially a free optimization. However, it's probably not worth the hassle of reordering a production table for that.

                  > Disk is cheap, memory is plentiful, your time is expensive etc.

                  One thing to keep in mind, though, is that while you often have plenty of disk space, RAM is still relatively expensive. It's also divided into many smaller buffers, such as work_mem and shared_buffers, which are not that large. These optimizations help fit more data into cache.

                  However, what the article said about alignment being important for indexes is somewhat misleading. Reordering an index's fields is not the same as reordering columns in a table. Besides having to rewrite queries, it also changes the access pattern and the time required to access the data, which is often much more significant than the space saved. Indexes are, by nature, a tradeoff where you give up space to gain time, so this mindset doesn't really apply there.

                  • napsterbr 3 hours ago

                    Hey, author here.

                    > Indexes are, by nature, a tradeoff where you give up space to gain time, so this mindset doesn't really apply there.

                    I agree that (re)aligning indexes is a different beast entirely, but (as mentioned in my recommendation) ideally the developer should keep this in mind when creating the index initially.

                    Factors like cardinality and even readability should take precedence over perfect alignment, but all else being equal, aligning your indexes from the very moment they are introduced in the codebase is the ideal scenario IMO.

                  • sgarland 2 hours ago

                    > Disk is cheap, memory is plentiful, your time is expensive etc.

                    Spend 30 minutes one day playing around with Postgres, trying different column combinations out. Boom, you now know how best to order columns. This doesn’t seem like a big ask.
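
                    The catalogs will even show you each column's size and alignment directly ('my_table' is a placeholder):

                        -- per-column size in bytes (-1 = variable length) and
                        -- alignment requirement (c = 1, s = 2, i = 4, d = 8)
                        SELECT a.attname, t.typname, t.typlen, t.typalign
                        FROM pg_attribute a
                        JOIN pg_type t ON t.oid = a.atttypid
                        WHERE a.attrelid = 'my_table'::regclass
                          AND a.attnum > 0
                          AND NOT a.attisdropped
                        ORDER BY a.attnum;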

                    The flip side is that changing data at scale is HARD, so if you put things like this off, when you do finally need to squeeze bytes, it’s painful.

                    Also, memory is absolutely not plentiful. That’s generally the biggest bottleneck (or rather, the lack of it then makes IO the bottleneck) for an RDBMS, assuming you have connection pooling and aren’t saturating the CPU with overhead.

                    • koolba 3 hours ago

                      > Disk is cheap, memory is plentiful, your time is expensive etc.

                      Taking the time to know the in-memory sizing of your data types is well worth it. The time it takes to think about which types to use and to sort them by size is also minimal and well worth it.

                      It may make sense for the system to do it automatically for newly created tables. But maybe not, as it's possible you'd want the data layout to match some existing structure.

                    • gnfargbl an hour ago

                      The ever-useful postgres_dba tool (https://github.com/NikolayS/postgres_dba) can help with this. Install it, start psql, run :dba and choose option p1 to see how much table space you would save by reordering your columns.

                      • didgetmaster an hour ago

                        It's been a while since I dug into the bowels of PG, but the overall row size with respect to page size used to be important. If you had a table where every row took up 4097 bytes and the page size was 8K, then your disk footprint for that table was doubled: only one row fit in a page, and the other (nearly) half of each page was wasted.

                        If that is still true, then alignment issues could push you over the edge if you are close to it.

                        • OliverJones 2 hours ago

                          Good material!

                          It has to be said, the order of columns in correctly designed multicolumn BTREE indexes is governed by the shape of the queries the indexes support.

                          So don't arbitrarily reorder columns in your indexes to handle alignment.

                          • SoftTalker 2 hours ago

                            This is the sort of thing that good DBAs used to know about and manage, but nowadays that isn't a fashionable job.

                            • delduca 2 hours ago

                              Could the Django ORM deal with this?

                              • sgarland 2 hours ago

                                Yes, but it doesn’t, as far as I’m aware. You’re certainly able to override the classes yourself, though. Make a lookup table of column type/length and then reorder based on that.

                                • jbkkd 2 hours ago

                                  It can't right now, but this is a doable extension

                                • koolba 3 hours ago

                                  The content itself is fine but the unnecessary image above the label “You have the right to remain aligned”, with the usual AI-generated garbage text, cheapens the entire article.

                                  • loloquwowndueo 3 hours ago

                                    Are you kidding, that image is hilarious.

                                    (Articles that intersperse annoying memes every 3 paragraphs with obnoxious gif animations are much much worse).