Partition attach locks and the bloat we didn't see coming
We finally partitioned our events table. It was 1.2TB, growing by about 30GB a week, and queries over time ranges had gotten slow enough that the junior devs started referring to it as “the dragon.”
The physical migration was straightforward. What wasn’t straightforward was everything that happened after.
The plan
Classic range partitioning by month. Native Postgres partitioning (not pg_partman; I’ll explain in a minute). Backfill by inserting into the new partitioned table in chunks, then swap.
CREATE TABLE events_new (
    id bigserial,
    tenant_id uuid,
    created_at timestamptz NOT NULL,
    payload jsonb
) PARTITION BY RANGE (created_at);
-- partition per month
CREATE TABLE events_2023_01 PARTITION OF events_new
FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
-- ...
Backfill took three days because we rate-limited ourselves to avoid overwhelming the primary. The cutover used ALTER TABLE ... RENAME inside a transaction. Zero downtime. Celebratory cake.
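For the curious, the mechanics were roughly this (a sketch: the real chunk bounds came from a driver script that slept between chunks, and the swap ran inside a retry loop):
-- backfill, one time-slice at a time:
INSERT INTO events_new
SELECT * FROM events
WHERE created_at >= '2023-01-01' AND created_at < '2023-01-02';
-- the swap; lock_timeout keeps us from queueing behind long transactions:
BEGIN;
SET LOCAL lock_timeout = '2s';
ALTER TABLE events RENAME TO events_old;
ALTER TABLE events_new RENAME TO events;
COMMIT;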
The first surprise: ATTACH takes a strong lock
Adding new monthly partitions going forward means running ALTER TABLE events ATTACH PARTITION. On Postgres 12 and later, ATTACH PARTITION takes SHARE UPDATE EXCLUSIVE on the parent, which conflicts with basically every other piece of DDL you might want to run concurrently; on 11 and earlier it took a full ACCESS EXCLUSIVE.
Worse, ATTACH scans the entire table being attached to verify that every row satisfies the partition bound. That’s true on every version, unless you can prove the bound up front. We worked around the scan with:
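-- add the bound as a CHECK first; NOT VALID makes this a brief lock with no scan: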
ALTER TABLE events_2026_02 ADD CONSTRAINT events_2026_02_range_check
CHECK (created_at >= '2026-02-01' AND created_at < '2026-03-01') NOT VALID;
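-- validate separately: this scans the table, but without blocking reads or writes: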
ALTER TABLE events_2026_02 VALIDATE CONSTRAINT events_2026_02_range_check;
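-- with a validated matching constraint in place, ATTACH can skip its own scan: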
ALTER TABLE events ATTACH PARTITION events_2026_02
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
The matching CHECK constraint lets Postgres skip the scan. We’re now on 15, where the weaker parent lock during ATTACH makes all of this less painful, but the lock shape is still annoying. We schedule partition creation during the quietest window we have.
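If you do this on a live system, it’s worth watching for the ATTACH getting stuck behind (or holding up) other sessions. A minimal version of the check we keep handy (pg_blocking_pids does the heavy lifting):
-- sessions waiting on someone else's lock, and on whom:
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;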
The second surprise: TOAST bloat didn’t go away
The old table had a ton of TOAST bloat from historical JSONB payloads. I assumed that by moving the data row-by-row into new partitions, we’d shed that bloat naturally.
We did, but not as much as I expected. Many of our payloads are identical (or very similar), and TOAST compresses each value independently on insert; it never deduplicates across rows. The total size went from 1.2TB to 980GB: noticeably smaller, but nowhere near what our back-of-the-envelope estimate promised.
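If you want to see where the TOAST weight actually sits, a query along these lines breaks it down per partition (a sketch; it assumes the partitions hang directly off events):
SELECT c.relname AS partition,
       pg_size_pretty(pg_relation_size(c.oid)) AS heap,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast
FROM pg_class c
JOIN pg_inherits i ON i.inhrelid = c.oid
WHERE i.inhparent = 'events'::regclass
  AND c.reltoastrelid <> 0
ORDER BY pg_relation_size(c.reltoastrelid) DESC;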
We ended up writing a payload canonicalizer that hashes each payload and stores it once in a shared table, with events referencing the hash. That took another few weeks. Worth it: new partitions are about a third the size of old ones.
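The real canonicalizer is more involved, but the shape was roughly this (table and column names illustrative). jsonb’s text output is already partly normalized for you (duplicate keys dropped, key order made deterministic), which does a lot of the canonicalization work:
-- shared payload store:
CREATE TABLE payloads (
    payload_hash bytea PRIMARY KEY,  -- sha256 of the payload's canonical text form
    payload      jsonb NOT NULL
);
-- on ingest: upsert the payload, keep only the hash on the event row
INSERT INTO payloads (payload_hash, payload)
VALUES (sha256(convert_to('{"kind": "click"}'::jsonb::text, 'UTF8')),
        '{"kind": "click"}'::jsonb)
ON CONFLICT (payload_hash) DO NOTHING;
Event rows then carry payload_hash instead of the jsonb itself, which is where the size win comes from.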
The third surprise: indexes
Partitioned tables in Postgres 11+ propagate index definitions, but the index itself is built per partition. Creating indexes on a new partition is fast because new partitions are small. Creating them on existing partitions is slow and takes a blocking lock unless you use CONCURRENTLY, and CONCURRENTLY has to be done per partition, because CREATE INDEX CONCURRENTLY doesn’t work on partitioned tables.
The workaround: create an invalid index shell on just the parent, build the index on each partition concurrently, then attach them:
-- once, on the parent only; it stays INVALID until every partition is attached:
CREATE INDEX events_tenant_idx ON ONLY events (tenant_id, created_at);
-- per partition, without blocking writes:
CREATE INDEX CONCURRENTLY events_2026_02_tenant_idx
ON events_2026_02 (tenant_id, created_at);
-- then attach to the parent index:
ALTER INDEX events_tenant_idx
ATTACH PARTITION events_2026_02_tenant_idx;
We automated this. It’s not hard, just annoying.
Why not pg_partman
pg_partman is great software. We use it on another project. For this table, the issue was the cardinality of our tenant_id dimension: we wanted subpartitioning by hash(tenant_id) inside each month, and the layout we ended up with didn’t map cleanly onto partman’s conventions. Writing our own migration and maintenance scripts took ~200 lines of SQL and gave us more control over attach timing.
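For concreteness, each month is itself hash-partitioned, roughly like this (the modulus here is illustrative, not our production value):
CREATE TABLE events_2026_03 PARTITION OF events
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01')
PARTITION BY HASH (tenant_id);
CREATE TABLE events_2026_03_h0 PARTITION OF events_2026_03
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_2026_03_h1 PARTITION OF events_2026_03
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ...remainders 2 and 3 likewise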
Reflection
Partitioning is well-documented. The operational details aren’t. If I were doing this again I’d:
- Do the tenant hash subpartitioning from day one, not as a follow-up.
- Build the partition-rotation cron before the first partition (a sketch follows this list). Never trust a “we’ll do it next week” here.
- Set up monitoring for partition count, per-partition size, and locks on partition DDL before cutover.
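The rotation job itself is small. A stripped-down sketch (ours adds the CHECK-constraint dance from above, a lock_timeout, and logging):
DO $$
DECLARE
    s    date := date_trunc('month', now() + interval '1 month');
    e    date := s + interval '1 month';
    part text := format('events_%s', to_char(s, 'YYYY_MM'));
BEGIN
    -- create the table standalone, then attach, to keep the parent lock window short
    EXECUTE format('CREATE TABLE IF NOT EXISTS %I (LIKE events INCLUDING ALL)', part);
    EXECUTE format(
        'ALTER TABLE events ATTACH PARTITION %I FOR VALUES FROM (%L) TO (%L)',
        part, s, e);
END
$$;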
None of that is in the Postgres docs. Most of it is in the “lessons we paid for” column.
See also: VACUUM vs autovacuum tuning.