Tuning pgvector HNSW without giving up
We shipped a semantic-search feature backed by pgvector. The first version used IVFFlat indexes and I was mostly fine with its latency. When pgvector 0.5 brought HNSW to Postgres, I migrated, and the result surprised me — both the wins and the gotchas.
The dataset
About 18 million 768-dimensional embeddings. Insert rate roughly 40k per hour. Query load: bursty, about 30-120 QPS. Recall target: 95% for the top 10.
Default HNSW: comfortable but not great
Creating the index with defaults:
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
This uses m = 16 and ef_construction = 64. Build took about 6 hours on our hardware, which was OK. Query latency with default hnsw.ef_search = 40 was around p50 28ms, p95 120ms. Recall was about 91%. Acceptable but not great — the product wanted sub-50ms p95 and 95% recall.
The three knobs that matter
m: number of connections per layer in the graph. Higher = better recall, bigger index, slower build. Typical range 8-64.
ef_construction: how hard the builder tries to find good neighbors. Higher = better quality index, much slower build, no effect on query time.
ef_search: how hard the query tries to find nearest neighbors at query time. Higher = better recall, slower queries. This is a query-time parameter and can be set per-session.
SET hnsw.ef_search = 100;
What I tried
Increased ef_construction to 128. Build time went from 6h to ~14h. Index size grew from 22GB to 24GB. Query recall at default ef_search=40 went to 94%. Progress.
Increased m to 24. Build time another 6h on top. Index size to 33GB. Recall at default ef_search=40 went to 95.5%. Getting there.
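Combined, those two build-time settings go in the index's WITH clause (same table and column as the earlier example):

```sql
-- Build-time parameters are fixed at creation; changing them means a rebuild.
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
```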
At query time, bumped ef_search to 100. p95 latency went up to 55ms. Recall to 97%.
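Because hnsw.ef_search is session-scoped, a SET LOCAL inside a transaction keeps the higher setting from leaking to other queries on a pooled connection. A sketch, with the query shape assumed:

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- reverts automatically at COMMIT/ROLLBACK
SELECT id
FROM docs
ORDER BY embedding <=> $1  -- <=> is cosine distance, matching vector_cosine_ops
LIMIT 10;
COMMIT;
```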
That put us at: 95%+ recall, p95 55ms, p50 32ms. Acceptable for launch.
The index build is hard to do online
A plain CREATE INDEX takes a SHARE lock and blocks writes for the duration of the build. CREATE INDEX CONCURRENTLY holds only a ShareUpdateExclusiveLock, which conflicts with other DDL but not DML: you can insert during the build, but you can’t run VACUUM FULL or schema changes on the table.
On an 18M-row table, the build took the better part of a day. We did it during a maintenance window. If I were doing it again I’d:
- Build in parallel. pgvector 0.6+ supports parallel builds via max_parallel_maintenance_workers. Set it to 4 and the build is closer to 4 hours.
- Build on a replica, then promote. Not easy if your replica is a hot standby, but doable with an async replica and a careful switchover.
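A parallel build might look like the following. The worker count and memory figure are assumptions to adapt to your hardware; HNSW builds are dramatically faster when the graph fits in maintenance_work_mem:

```sql
SET max_parallel_maintenance_workers = 4;  -- workers in addition to the leader
SET maintenance_work_mem = '8GB';          -- try to keep the graph in memory during the build

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
```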
Inserts during and after
Insert rate while the index exists is ~3x slower than without the index. For our workload that’s tolerable. For a system with sustained 10k inserts/sec I’d need to think harder — maybe partition by time and only keep the HNSW index on recent partitions.
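The partitioning idea, sketched with hypothetical table and column names: only the partition taking new writes carries the HNSW index, so bulk inserts stay cheaper and older partitions can be queried exactly or dropped from search entirely.

```sql
CREATE TABLE docs_parted (
    id         bigserial,
    created_at timestamptz NOT NULL,
    embedding  vector(768)
) PARTITION BY RANGE (created_at);

CREATE TABLE docs_2024q3 PARTITION OF docs_parted
    FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');

-- HNSW index only on the hot partition
CREATE INDEX ON docs_2024q3 USING hnsw (embedding vector_cosine_ops);
```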
We also noticed that deletes don’t fully remove entries from the graph; they’re marked as deleted but stay in the structure, and vacuuming an HNSW index is slow. Over time this bloats the index, and the practical fix is to rebuild it. We scheduled quarterly rebuilds.
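On Postgres 12+, the rebuild can happen without taking the index offline via REINDEX CONCURRENTLY; the index name here is hypothetical:

```sql
REINDEX INDEX CONCURRENTLY docs_embedding_idx;
```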
The bizarre recall cliff
One thing that wasn’t well-documented: recall on HNSW is non-monotonic in ef_search for very small values. With ef_search = 10 we had weirdly good recall (because the few points it did find were the easy nearest neighbors). With ef_search = 20 it dropped (the search expanded but got confused by early layer choices). With ef_search = 40+ it stabilized and grew monotonically.
I spent an afternoon convinced the index was broken. It wasn’t — I was benchmarking in the weird zone. Set ef_search to at least 40 before you measure.
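Measuring recall at all requires exact ground truth. One way, suggested in the pgvector docs, is to force a sequential scan for the same query and compare the two result sets (query shape assumed):

```sql
-- Exact brute-force top 10, for comparison against the HNSW results
BEGIN;
SET LOCAL enable_indexscan = off;  -- forces a full, exact scan
SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;
```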
Cost/benefit vs IVFFlat
For our workload, HNSW won on every axis except index build time and index size:
- Query latency: ~2x better than IVFFlat (IVFFlat was p95 ~100ms at our settings).
- Recall: better, and more stable; IVFFlat’s recall depends heavily on the ivfflat.probes setting.
- Inserts: similar to IVFFlat.
- Build time: HNSW took 3-4x longer than IVFFlat.
- Index size: HNSW about 2x the size of IVFFlat.
For our size of dataset (~18M vectors) the tradeoff clearly favored HNSW. For 10x more vectors I’d reconsider: the build-time and size costs scale with the data, while the latency benefit plateaus.
Reflection
pgvector’s HNSW is genuinely production-ready, but “production-ready” doesn’t mean “no tuning required.” The defaults are conservative on build time, at the cost of index quality. If you have the time to build a good index once, you’ll save on query-time ef_search forever. Spend on the build.
Related: The bitmap heap scan that ate our p99.