Autovacuum tuning, one table at a time
Autovacuum is one of those features where the defaults work fine for 95% of your tables. The other 5% will keep you up at night.
For years I tuned it globally. Bumped autovacuum_vacuum_scale_factor down a bit, bumped autovacuum_vacuum_cost_limit up a bit, called it a day. This works in the sense that no one table blows up, but it’s a mediocre compromise — small tables get vacuumed more than they need, big tables get vacuumed less than they need.
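For reference, the global version of that compromise looked roughly like this (the exact values are illustrative, not a recommendation — pick your own compromise):

```sql
-- Global tuning: applies to every table, so it is always a compromise.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- down from the 0.2 default
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;    -- up from the effective default of 200
SELECT pg_reload_conf();  -- both settings take effect on reload, no restart needed
```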
The fix is boring and obvious in retrospect: set per-table autovacuum parameters. It’s a one-line ALTER TABLE and it’s the right answer.
How bad can it get
Our worst offender was a 1.2TB events table: 30GB/week of inserts, plus a smallish but steady churn of updates (maybe 5% of rows updated within a month). At default settings:
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 50
…autovacuum wouldn’t kick in until 20% of the table (240GB) was dead. When it finally did run, it took hours and saturated IO. Dead tuple ratio would walk up to 18%, 19%, sit there, and then we’d see random query latency spikes.
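The trigger condition behind those numbers is `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` dead tuples. A back-of-envelope version, assuming ~6 billion rows for that 1.2TB table (my estimate, not a measured count):

```sql
-- Dead tuples needed before autovacuum fires, default vs. tuned scale factor.
SELECT 50 + 0.2  * 6000000000 AS default_trigger,  -- ~1.2 billion dead rows
       50 + 0.01 * 6000000000 AS tuned_trigger;    -- ~60 million dead rows
```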
Per-table knobs
These are the ones I actually use:
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.005,
autovacuum_vacuum_cost_limit = 2000,
autovacuum_vacuum_cost_delay = 10  -- milliseconds
);
Translation:
- Vacuum when dead tuples exceed 1% of the table's live rows (not 20%). For a 1.2TB table that's still roughly 12GB of dead rows, but vacuums are more frequent and each one is more manageable.
- Analyze when 0.5% of rows have changed. We want fresh stats because the planner is sensitive on this table.
- Higher cost limit + moderate cost delay means each vacuum cycle moves faster without saturating IO. Tune for your IO budget.
This is not magic. It is “what the global settings should probably be if you were writing them today.” Nothing prevents Postgres from being smarter by default; a lot of the default values date back to when “large” meant “a few GB.”
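You can confirm the overrides took effect with a quick catalog query — per-table storage parameters live in pg_class.reloptions:

```sql
-- NULL reloptions means the table uses the global defaults.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'events';
```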
How to decide which tables need tuning
SELECT
relname,
pg_size_pretty(pg_total_relation_size(relid)) AS size,
n_live_tup,
n_dead_tup,
round(n_dead_tup::numeric / nullif(n_live_tup, 0), 4) AS dead_ratio,
last_autovacuum,
autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
Dead ratios that sit above 0.1 for days are candidates. Tables where last_autovacuum is more than a day old despite high write rate are candidates. Tables where autovacuum_count barely grows are candidates.
Don’t tune the tables that are already fine.
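One way to make that judgment concrete is to compute each table's trigger point from the threshold formula and compare it to its current dead tuple count. A sketch using the global settings only (it deliberately ignores per-table reloptions overrides, so treat it as a first pass):

```sql
-- How far is each table from its (global) autovacuum trigger?
SELECT s.relname,
       s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::numeric
             + current_setting('autovacuum_vacuum_scale_factor')::numeric
               * c.reltuples) AS trigger_at
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 20;
```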
The “table is too hot for autovacuum” case
There are tables — usually queue tables or hot key-value stores — that get enough writes that autovacuum can’t keep up at any reasonable cost_limit without causing user-visible IO pain. For these, a few options:
- Partition it. Hot inserts on the new partition, older partitions are effectively static and vacuum there is cheap.
- Use pg_repack instead of vacuum for periodic cleanup. It runs concurrently and rewrites the table without taking a heavy lock.
- If it really is a queue, consider moving to a purpose-built queue (Redis streams, NATS, Kafka). Postgres can be a queue, but you'll know when it shouldn't be.
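The partitioning option in sketch form — the column names and monthly ranges here are my assumptions about the schema, not the real DDL:

```sql
-- Range-partition by time: inserts land in the current partition,
-- older partitions go cold and are cheap to vacuum.
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_06 PARTITION OF events
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');
```

Per-table autovacuum settings can then be applied to the hot partition only, leaving the cold ones at defaults.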
VACUUM FULL is almost never the answer
I want to say this plainly because I see it every few months on some Stack Overflow post: VACUUM FULL takes an ACCESS EXCLUSIVE lock, rewrites the whole table, and blocks everything. It reclaims disk more thoroughly than regular VACUUM, but at the cost of total unavailability for the duration.
If you need to reclaim disk without downtime, use pg_repack. If you need to reclaim disk urgently and you can take downtime, VACUUM FULL is OK, but plan it like any other outage.
Reflection
The pattern that finally clicked for me: autovacuum is not “garbage collection that runs in the background,” it’s “a scheduled maintenance process with a cost budget and a trigger condition.” Both the trigger and the budget need to match the table’s profile. The per-table override is there because one global setting can’t fit every table. Use it.
Related: Partition attach locks and the bloat we didn’t see coming.