when row size matters
Most of the time, you don’t give a lot of thought to how big the data in a database row is. And most of the time, it doesn’t matter. But sometimes it does.
For something like 18 months, I would periodically get frustrated by the performance of an indexed primary key lookup. Our APM system routinely reported p50 query times in the 2-3ms range on a query like:
SELECT * FROM entities WHERE id = 2
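If you want to look at a query like this outside of the APM, Postgres can report its own timings. A quick sketch, with the caveat that EXPLAIN ANALYZE discards the result rows, so time spent detoasting large values and shipping them to the client may not show up in it:

-- ask Postgres where the time goes for this lookup
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM entities WHERE id = 2;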
The table in question had fewer than 100 rows, so the timing didn’t make sense, especially because a comparable query on another table was consistently faster even though that table had tens of thousands of records. However, the slow query wasn’t actively harmful enough to justify focused time.
One day, I happened to look at the APM for this query again. The distribution of timings showed a cluster of sub-millisecond queries and another cluster around the p50. When I checked the table definition, I noticed a couple of JSONB columns. Then it clicked: for some rows, those JSONB columns hold a lot of data. Gathering that data (probably from a TOAST table) and sending it over the wire was making these queries more expensive.
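One way to confirm a hunch like this is to ask Postgres how big the stored values actually are. A sketch, with made-up column names (payload, settings) standing in for the real JSONB columns:

-- how much data is stored per row in the JSONB columns?
SELECT id,
       pg_column_size(payload)  AS payload_bytes,
       pg_column_size(settings) AS settings_bytes
FROM entities
ORDER BY pg_column_size(payload) DESC;

Rows much wider than about 2 kB are where Postgres starts compressing values and moving them out to the TOAST table.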
Adding insult to injury, the querying code had no use for this data blob the vast majority of the time, so the effort spent gathering and shipping it was wasted.
Unfortunately, I made this discovery on my last day working on this codebase, so I don’t have the after timings. The fix is to change the data structure and/or the queries so that the large JSONB columns are only fetched when they’re actually needed. I wrote this up and left it for my colleagues to address at an appropriate time.
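As a sketch of the query-side half of that fix, again with hypothetical column names: select only what the caller needs, and leave the blob for the few code paths that actually want it.

-- column names are made up; the point is to stop SELECTing the JSONB blobs
SELECT id, name, created_at FROM entities WHERE id = 2;

-- only the code paths that need the blob pay for it
SELECT payload FROM entities WHERE id = 2;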