We switched our API tier from pgbouncer session pooling to transaction pooling to get connection utilization up. Connection counts dropped nicely. Then at 4am somebody’s script started failing with prepared statement "S_1" does not exist, and I got paged.

What transaction pooling actually does

Session pooling is 1-to-1: one client connection maps to one backend server connection for the duration of the client session. Transaction pooling shares backend connections among many clients, switching between clients at transaction boundaries.

This is great for connection counts, terrible for anything with backend session state. And prepared statements are session state.

If your client issues PREPARE foo AS SELECT ... on backend #42, then in the next transaction pgbouncer might route the client to backend #7, where foo doesn’t exist.
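The failure mode can be sketched without a database. Below is a toy model (the Backend, TransactionPooler, and PreparedStatementMissing names are illustrative, not pgbouncer internals): each backend keeps its own dictionary of prepared statements, and the pooler hands out a backend per transaction rather than per session:

```python
# Toy model: prepared statements are per-backend session state, but a
# transaction pooler may route successive transactions from the same
# client to different backends.

class PreparedStatementMissing(Exception):
    pass

class Backend:
    def __init__(self, pid):
        self.pid = pid
        self.prepared = {}          # session state lives on the backend

    def prepare(self, name, sql):
        self.prepared[name] = sql

    def execute_prepared(self, name):
        if name not in self.prepared:
            raise PreparedStatementMissing(
                f'prepared statement "{name}" does not exist (backend {self.pid})')
        return self.prepared[name]

class TransactionPooler:
    """Hands out a backend per *transaction*, not per client session."""
    def __init__(self, backends):
        self.backends = backends
        self.turn = 0

    def checkout(self):
        b = self.backends[self.turn % len(self.backends)]
        self.turn += 1
        return b

pool = TransactionPooler([Backend(42), Backend(7)])

# Transaction 1: the client prepares S_1 on whichever backend it gets.
b1 = pool.checkout()
b1.prepare("S_1", "SELECT 1")

# Transaction 2: the pooler routes the same client to a different
# backend, where S_1 was never prepared.
b2 = pool.checkout()
try:
    b2.execute_prepared("S_1")
except PreparedStatementMissing as e:
    print(e)    # prepared statement "S_1" does not exist (backend 7)
```

The client did nothing wrong in either transaction; the state it relied on simply lives on a process it is no longer talking to.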

What made this hard to spot

Three things.

First, many clients “prepare” invisibly. Many language drivers turn parameterized queries into server-side prepared statements behind the scenes — psycopg3, for example, does it automatically once a query has been executed prepare_threshold times (default 5). You didn’t write PREPARE, but psycopg3 did for you.

Second, some drivers cache prepared statement handles across queries without coordinating with the server. The client thinks “I already prepared this, I’ll reuse handle S_1” — but server-side, the backend connection behind the handle is now a different process.

Third, this only fails under pool pressure. When connections are abundant, pgbouncer often sticks you on the same backend for successive transactions and the bug hides. Under load, you hop, and you break.
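Why it hides when idle and bites under load can also be sketched. Here is a toy pool (illustrative, not pgbouncer's actual scheduling) that hands out the most recently released backend: with one quiet client you get the same backend every time, but as soon as another client checks out a backend between your transactions, you start hopping:

```python
# Toy pool that prefers the most recently released backend. With no
# contention the same client keeps landing on the same backend, so
# prepared statements appear to "work". Under contention it hops.
import collections

class Pool:
    def __init__(self, n):
        self.idle = collections.deque(range(n))   # backend ids

    def checkout(self):
        return self.idle.pop()        # take the most recently released

    def release(self, b):
        self.idle.append(b)

pool = Pool(3)

# Quiet system: one client, back-to-back transactions.
seen_idle = []
for _ in range(4):
    b = pool.checkout()
    seen_idle.append(b)
    pool.release(b)                   # released before anyone else asks
print(seen_idle)                      # [2, 2, 2, 2] -- sticky, bug hidden

# Busy system: another client checks out between our transactions.
seen_busy = []
for _ in range(4):
    b = pool.checkout()
    other = pool.checkout()           # a second client grabs a backend
    seen_busy.append(b)
    pool.release(b)
    pool.release(other)
print(seen_busy)                      # backends now alternate -- we hop
```

This is also why a load test that doesn't reuse statements the way real traffic does will sail straight past the bug.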

The fix

We considered three paths:

  1. Go back to session pooling. Works but wastes connections.
  2. Disable prepared-statement caching client-side. Works but the perf hit is real — we measured ~8% on a couple of hot endpoints.
  3. Use pgbouncer 1.21+, which added prepared statement support in transaction mode. This is the actual answer.

We went with option 3. Config looked like:

[pgbouncer]
pool_mode = transaction
max_prepared_statements = 200

Then on the client side (psycopg3 in our case), we made sure the driver actually uses server-side prepared statements by lowering the prepare threshold:

conn = psycopg.connect(conninfo, prepare_threshold=1)

The gotcha inside the gotcha

Even with pgbouncer 1.21’s prepared statement support, the first time a statement is prepared on a new backend it’s a real round trip — pgbouncer intercepts the prepare and replays it on each backend as needed. In pathological cases (very high statement churn, small pool) you can see elevated latency as the fleet of backends catches up. Size max_prepared_statements for what you actually use.
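The sizing intuition can be made concrete with a tiny LRU cache — a rough stand-in (my model, not pgbouncer's implementation) for a per-backend statement cache capped at max_prepared_statements. If the cap is smaller than your working set of distinct statements, a cyclic workload thrashes and nearly every execution pays the re-prepare round trip:

```python
# Rough model of a per-backend prepared-statement cache with LRU
# eviction, capped at max_prepared_statements. Each cache miss stands
# in for a PREPARE round trip replayed on the backend.
from collections import OrderedDict

class StatementCache:
    def __init__(self, max_prepared_statements):
        self.cap = max_prepared_statements
        self.cache = OrderedDict()
        self.prepares = 0             # round trips to the backend

    def execute(self, sql):
        if sql in self.cache:
            self.cache.move_to_end(sql)     # hit: no prepare needed
            return
        self.prepares += 1                  # miss: replay PREPARE
        self.cache[sql] = True
        if len(self.cache) > self.cap:
            self.cache.popitem(last=False)  # evict least recently used

# 300 distinct statements, cycled 10 times.
workload = [f"SELECT * FROM t WHERE id = $1 /* q{i % 300} */"
            for i in range(3000)]

small = StatementCache(max_prepared_statements=200)
big = StatementCache(max_prepared_statements=400)
for sql in workload:
    small.execute(sql)
    big.execute(sql)

print(small.prepares)   # 3000 -- cap below working set: every call misses
print(big.prepares)     # 300  -- one prepare per distinct statement
```

The cliff is sharp: a cap just below the working set is nearly as bad as no cache at all, which is why measuring your actual distinct-statement count matters.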

Also: DEALLOCATE ALL from the client wipes every prepared statement on whichever backend happens to receive it, desynchronizing pgbouncer’s tracking for that backend. If you have any code doing that — disconnect-on-error handlers, paranoid middleware — you’ll see mysterious “prepared statement does not exist” errors long after the switch.

Observability we added

Counters in pgbouncer are your friend:

psql -h pgbouncer -p 6432 -U stats pgbouncer -c "SHOW STATS;"
psql -h pgbouncer -p 6432 -U stats pgbouncer -c "SHOW POOLS;"

We pipe these into Prometheus via pgbouncer_exporter. The metrics we actually watch:

  • pgbouncer_servers_busy — are backends saturated?
  • pgbouncer_queries_per_second — sanity check
  • Prepared-statement miss rate (parsed from logs, annoyingly)
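For the miss rate, the parse really is just a pattern count over the logs. A sketch with fabricated sample lines (real formats depend on your log_line_prefix and on whether you’re reading postgres or pgbouncer logs):

```python
# Count "prepared statement ... does not exist" errors in a log
# excerpt. The sample lines below are made up for illustration.
import re

sample_log = """\
2024-05-01 04:02:11 UTC [8721] ERROR:  prepared statement "_pg3_1" does not exist
2024-05-01 04:02:12 UTC [8721] LOG:  statement: SELECT 1
2024-05-01 04:02:13 UTC [8730] ERROR:  prepared statement "_pg3_4" does not exist
"""

MISS = re.compile(r'prepared statement "[^"]*" does not exist')
misses = sum(1 for line in sample_log.splitlines() if MISS.search(line))
print(misses)   # 2
```

In production we run the same pattern through our log pipeline and export the count as a counter next to the pgbouncer_exporter metrics.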

What I’d do differently

The switch was planned over a week. I read the docs, I did a dev cutover, I ran load tests. None of this exposed the prepared-statement issue because the load generator wasn’t doing the same kind of reuse that psycopg3 does under natural traffic.

If I were doing this again I’d:

  • Run a shadow connection pool for 24h and diff error rates before cutting traffic.
  • Set log_min_duration_statement = 0 on a canary backend and grep for “prepared statement does not exist” rather than hoping the APM catches it.
  • Read the release notes on pgbouncer 1.21 before deciding which mode to use.

Transaction pooling is genuinely a big win on connection counts. But it is not a free switch.