SRE

Mar 20, 2026 CI dependency graph for a monorepo that doesn't run everything
The core trick that cut our PR CI time from 22 minutes to 4 on average.
Feb 2, 2026 We tried buildpacks. I don't recommend them for most teams.
A six-month experience report. The value proposition is real; the downsides are too.
Jan 4, 2026 SLO math for tired engineers
Enough formulas to write real alerts without spending a weekend in a textbook.
Dec 20, 2025 Structured logging lessons from four years of zerolog
Structured logging is not the hard part. The hard part is everything around it.
Dec 7, 2025 Head vs tail sampling: the mental model I wish I'd had
I conflated these for years. Here is a cleaner way to think about them.
Oct 26, 2025 strace revealed our libc mismatch
A service worked on one image and not another. The difference was invisible until we traced syscalls.
Oct 12, 2025 Debugging a remote core dump without losing your mind
A core dump from production is a gift. Here is how I unwrap it.
Sep 15, 2025 Dev containers at 30 engineers: the unglamorous middle
Dev containers solve real problems but have their own operational tail.
Aug 2, 2025 Redis maxmemory, eviction, and the day we served stale for 20 minutes
noeviction is the default, and the default is dangerous when you thought you were running a cache.
Jul 18, 2025 We culled 40% of our alerts and nothing bad happened
A retrospective on how our team finally beat alert fatigue.
Jul 5, 2025 Autovacuum tuning, one table at a time
Global autovacuum settings are a lie. I tune per-table now.
Jun 22, 2025 Logical replication slot lag ate our WAL
A forgotten logical replication slot accumulated 380GB of WAL before we caught it. Here's what we changed.
May 1, 2025 The label that killed Prometheus
One innocuous request_id label, 18M active series, and a very bad Friday.
Apr 18, 2025 Tail sampling that actually saved money
Head sampling is simple. Tail sampling works. Here is a config we run in production without sadness.
Apr 5, 2025 pgbouncer transaction pooling broke our prepared statements
A multi-day outage-adjacent incident caused by prepared statements not making it across pool boundaries.