Tuning pgvector HNSW without giving up
We shipped a semantic-search feature backed by pgvector. The first version used IVFFlat indexes and I was mostly fine with its latency. When pgvector 0.5 brought HNSW to Postgres, I migrated, and the result surprised me — both the wins and the gotchas.
The dataset
About 18 million 768-dimensional embeddings. Insert rate roughly 40k per hour. Query load: bursty, about 30-120 QPS. Recall target: 95% for the top 10.
Default HNSW: comfortable but not great
Creating the index with defaults:
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
This uses m = 16 and ef_construction = 64. Build took about 6 hours on our hardware, which was OK. Query latency with default hnsw.ef_search = 40 was around p50 28ms, p95 120ms. Recall was about 91%. Acceptable but not great — the product wanted sub-50ms p95 and 95% recall.
The three knobs that matter
m: number of connections per layer in the graph. Higher = better recall, bigger index, slower build. Typical range 8-64.
ef_construction: how hard the builder tries to find good neighbors. Higher = better quality index, much slower build, no effect on query time.
ef_search: how hard the query tries to find nearest neighbors at query time. Higher = better recall, slower queries. This is a query-time parameter and can be set per-session.
SET hnsw.ef_search = 100;
What I tried
Increased ef_construction to 128. Build time went from 6h to ~14h. Index size grew from 22GB to 24GB. Query recall at default ef_search=40 went to 94%. Progress.
Increased m to 24. Build time another 6h on top. Index size to 33GB. Recall at default ef_search=40 went to 95.5%. Getting there.
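Combined, those two build-time settings go in the index's WITH clause (same table and column as the earlier example):

```sql
-- Build-time parameters are fixed at creation; changing them means a rebuild.
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
```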
At query time, bumped ef_search to 100. p95 latency went up to 55ms. Recall to 97%.
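Because hnsw.ef_search is session-scoped, a SET LOCAL inside a transaction keeps the higher setting from leaking to other queries on a pooled connection. A sketch, with the query shape assumed:

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- reverts automatically at COMMIT/ROLLBACK
SELECT id
FROM docs
ORDER BY embedding <=> $1  -- <=> is cosine distance, matching vector_cosine_ops
LIMIT 10;
COMMIT;
```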
That put us at: 95%+ recall, p95 55ms, p50 32ms. Acceptable for launch.
The index build is hard to do online
A plain CREATE INDEX takes a SHARE lock and blocks writes for the duration of the build. CREATE INDEX CONCURRENTLY holds only a ShareUpdateExclusiveLock, which conflicts with other DDL but not DML: you can insert during the build, but you can’t run VACUUM FULL or schema changes on the table.
On an 18M-row table, the build took the better part of a day. We did it during a maintenance window. If I were doing it again I’d:
- Build in parallel. pgvector 0.6+ supports parallel builds via max_parallel_maintenance_workers. Set it to 4 and the build is closer to 4 hours.
- Build on a replica, then promote. Not easy if your replica is a hot standby, but doable with an async replica and a careful switchover.
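A parallel build might look like the following. The worker count and memory figure are assumptions to adapt to your hardware; HNSW builds are dramatically faster when the graph fits in maintenance_work_mem:

```sql
SET max_parallel_maintenance_workers = 4;  -- workers in addition to the leader
SET maintenance_work_mem = '8GB';          -- try to keep the graph in memory during the build

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
```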
Inserts during and after
Insert rate while the index exists is ~3x slower than without the index. For our workload that’s tolerable. For a system with sustained 10k inserts/sec I’d need to think harder — maybe partition by time and only keep the HNSW index on recent partitions.
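The partitioning idea, sketched with hypothetical table and column names: only the partition taking new writes carries the HNSW index, so bulk inserts stay cheaper and older partitions can be queried exactly or dropped from search entirely.

```sql
CREATE TABLE docs_parted (
    id         bigserial,
    created_at timestamptz NOT NULL,
    embedding  vector(768)
) PARTITION BY RANGE (created_at);

CREATE TABLE docs_2024q3 PARTITION OF docs_parted
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');

-- HNSW index only on the hot partition
CREATE INDEX ON docs_2024q3 USING hnsw (embedding vector_cosine_ops);
```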
We also noticed that deletes don’t fully remove entries from the graph; they’re marked as deleted but stay in the structure, and vacuuming an HNSW index is slow. Over time this bloats the index, and the practical fix is to rebuild it. We scheduled quarterly rebuilds.
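On Postgres 12+, the rebuild can happen without taking the index offline via REINDEX CONCURRENTLY; the index name here is hypothetical:

```sql
REINDEX INDEX CONCURRENTLY docs_embedding_idx;
```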
The bizarre recall cliff
One thing that wasn’t well-documented: recall on HNSW is non-monotonic in ef_search for very small values. With ef_search = 10 we had weirdly good recall (because the few points it did find were the easy nearest neighbors). With ef_search = 20 it dropped (the search expanded but got confused by early layer choices). With ef_search = 40+ it stabilized and grew monotonically.
I spent an afternoon convinced the index was broken. It wasn’t — I was benchmarking in the weird zone. Set ef_search to at least 40 before you measure.
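Measuring recall at all requires exact ground truth. One way, suggested in the pgvector docs, is to force a sequential scan for the same query and compare the two result sets (query shape assumed):

```sql
-- Exact brute-force top 10, for comparison against the HNSW results
BEGIN;
SET LOCAL enable_indexscan = off;  -- forces a full, exact scan
SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;
```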
Cost/benefit vs IVFFlat
For our workload, HNSW won on every axis except index build time and index size:
- Query latency: ~2x better than IVFFlat (IVFFlat was p95 ~100ms at our settings).
- Recall: better, and more stable; IVFFlat’s recall depends heavily on the ivfflat.probes setting.
- Inserts: similar to IVFFlat.
- Build time: HNSW took 3-4x longer than IVFFlat.
- Index size: HNSW about 2x the size of IVFFlat.
For our size of dataset (~18M vectors) the tradeoff clearly favored HNSW. For 10x more vectors I’d reconsider: the build-time and size costs scale with the data, while the latency benefit plateaus.
Reflection
pgvector’s HNSW is genuinely production-ready, but “production-ready” doesn’t mean “no tuning required.” The defaults are conservative on build time, at the cost of index quality. If you have the time to build a good index once, you’ll save on query-time ef_search forever. Spend on the build.
Related: The bitmap heap scan that ate our p99.