Posts
Things I’ve written that are long enough to deserve an introduction. Mostly backend engineering, with the occasional detour into photography and espresso machines.
-
Homelab rack fans at 90% load (a short clip)
I finally measured what my rack sounds like when the GPU node is actually working. Here's a five-second clip and the numbers.
-
GitHub Actions cache lied to us
A stale cache partition kept producing 'passing' builds for code that was 100% broken at runtime.
-
Migrating to SQLite STRICT tables: mostly boring, sometimes not
We moved a decade-old SQLite schema to STRICT tables. Here is what broke.
-
Redis maxmemory, eviction, and the day we served stale for 20 minutes
noeviction is the default, and the default is dangerous when you think you're running a cache.
-
We culled 40% of our alerts and nothing bad happened
A retrospective on how our team finally beat alert fatigue.
-
Autovacuum tuning, one table at a time
Global autovacuum settings are a lie. I tune per-table now.
-
Logical replication slot lag ate our WAL
A forgotten logical replication slot held on to 380GB of WAL before we caught it. Here's what we changed.
-
When GIN on JSONB is wrong
The go-to index for JSONB in Postgres is GIN, and there's a very common case where that's the wrong choice.
-
Postgres 12 changed my CTEs and nobody told me
In Postgres 12 the optimization fence around CTEs came down. For us, that was mostly bad.
-
Flamegraphs in production without the fear
I used to be scared of perf record in prod. Then I wasn't.
-
The label that killed Prometheus
One innocuous request_id label, 18M active series, and a very bad Friday.
-
Tail sampling that actually saved money
Head sampling is simple. Tail sampling works. Here is a config we run in production without sadness.
-
pgbouncer transaction pooling broke our prepared statements
A multi-day, outage-adjacent incident caused by prepared statements created on one pooled server connection not existing on the next.
-
Partition attach locks and the bloat we didn't see coming
We moved a 1.2TB table to native range partitioning. The migration was the easy part.
-
What they don't tell you about SQLite WAL in production
WAL mode is a huge usability win for SQLite, but it has teeth, and they bite during backups, around fsync, and on NFS.