The bitmap heap scan that ate our p99
One of our ingestion queries went from 8ms to 400ms at p99 overnight. No schema change. No code change. No obvious traffic spike. The usual-suspects file was empty and I was annoyed.
The query was a plain SELECT ... WHERE tenant_id = $1 AND created_at >= $2, the kind of thing we have maybe thirty of. There’s a composite index on (tenant_id, created_at DESC) and it had been serving a tidy index scan forever.
Until it wasn’t.
What the planner did instead
Here’s the EXPLAIN, trimmed:
Bitmap Heap Scan on events  (cost=312.10..14820.44 rows=4208 width=214)
  Recheck Cond: ((tenant_id = '...') AND (created_at >= '...'))
  Heap Blocks: exact=3812 lossy=9110
  ->  Bitmap Index Scan on events_tenant_created_idx  (cost=0.00..311.05 rows=4208)
lossy=9110 was the smoking gun. When work_mem isn’t big enough to hold the bitmap, Postgres degrades individual TIDs to whole-page markers. The Recheck Cond then has to read every row on those pages and re-evaluate the filter. For our query, that meant reading ~60% of the table on every execution. No wonder p99 was cooked.
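You can feel the cliff on purpose, without waiting for a backfill: shrink work_mem to its 64kB floor and watch the Heap Blocks line flip. A sketch against our table (SELECT * stands in for the real projection; predicate values elided, as above):

SET work_mem = '64kB';  -- the documented minimum; small enough to spill almost any bitmap
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE tenant_id = '...' AND created_at >= '...';
-- a lossy bitmap shows up as "Heap Blocks: exact=N lossy=M" with M > 0
RESET work_mem;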
Why did work_mem suddenly matter?
It hadn’t changed. What had changed was the data. A customer had backfilled ~18M rows over the weekend. The matching row count for a typical query jumped from ~200 to ~200k, which pushed the bitmap past work_mem and into lossy mode. The planner’s own row estimate (4208) was also badly off — stats hadn’t caught up to the backfill.
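For next time, that gap is visible before it hurts; these are stock columns on pg_stat_user_tables:

SELECT relname,
       n_mod_since_analyze,  -- estimated rows changed since the last analyze
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'events';

An n_mod_since_analyze in the millions next to a weekend of silence from autoanalyze would have told the whole story here.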
Two things happened simultaneously:
- The cost of an index scan went up (more matching rows).
- The cost estimate for the bitmap heap scan stayed optimistic: built on the stale 4208-row estimate, it never priced in the lossy fallback.
The planner picked the bitmap plan. It was wrong.
The fix (three layers of it)
First, force stats to catch up:
ANALYZE events;
That bought us back ~30ms. Still not good enough.
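Worth pairing with a per-table autovacuum tweak so the next backfill triggers an analyze on its own. The 0.01 below is an illustrative guess for a hot table, not something we benchmarked:

ALTER TABLE events SET (autovacuum_analyze_scale_factor = 0.01);
-- autoanalyze now fires after ~1% of rows change instead of the 10% default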
Second, bump work_mem for this workload. Not globally (that way lies OOM), but per-transaction from the worker that runs this query:
SET LOCAL work_mem = '64MB';
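One gotcha: SET LOCAL is scoped to a transaction, and outside a transaction block it just emits a warning and does nothing. So the shape the worker needs is roughly this (projection and parameters as above, values elided):

BEGIN;
SET LOCAL work_mem = '64MB';  -- applies to this transaction only
SELECT ... FROM events
WHERE tenant_id = $1 AND created_at >= $2;
COMMIT;  -- work_mem snaps back to the server default here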
Third, I rebuilt the composite index as a covering index, adding the commonly-projected columns so the scan could go index-only:
CREATE INDEX CONCURRENTLY events_tenant_created_covering_idx
ON events (tenant_id, created_at DESC)
INCLUDE (event_type, payload_hash);
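Two footnotes on that index, both plain Postgres facts rather than anything specific to our setup: INCLUDE needs PostgreSQL 11+, and an index-only scan only skips heap pages the visibility map marks all-visible, which a fresh 18M-row backfill most definitely does not. So:

VACUUM (ANALYZE) events;  -- sets all-visible bits so index-only scans can actually skip the heap
-- once the covering index is confirmed in use (assuming nothing else needs the old one):
DROP INDEX CONCURRENTLY events_tenant_created_idx;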
After all three, we were back to 6-9ms p99 and the bitmap path stopped showing up.
What I actually learned
I’ve read about lossy bitmap scans many times. I’ve never felt them in production before. The thing the textbook doesn’t emphasize enough is that the lossy transition is effectively invisible at the plan level unless you run EXPLAIN ANALYZE. A regular EXPLAIN tells you it’s a bitmap scan. It won’t tell you that the bitmap is about to fall off a cliff.
So now I have a checklist item whenever a formerly-fast query gets slow:
- Run EXPLAIN (ANALYZE, BUFFERS) on a prod-sized dataset.
- Look for lossy= anything larger than zero.
- Compare estimated vs actual rows. If they disagree by more than 5x, stats are the problem or the predicate is too weird for the histogram.
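When it’s the second failure mode (histogram, not staleness), widening the per-column sample is the cheap fix to try first. The 1000 below is an illustrative bump over the default target of 100, not a tuned number:

ALTER TABLE events ALTER COLUMN tenant_id SET STATISTICS 1000;
ANALYZE events;  -- the new target only takes effect at the next analyze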
See also the write-up on CTE materialization changes — that one bit us for similar reasons.
The broader lesson: planner decisions are not stable across data distributions, and “works fine today” is a one-day guarantee. Index design should assume the adversarial case, not the average one.