Debugging
-
httptap
A terminal UI that lets you inspect live HTTP traffic on a process without reaching for Wireshark.
-
nix-shell for reproducing bugs from five years ago
A small, practical use of Nix that doesn't require buying into the ecosystem.
-
TIL: Go race detector can cause tests to pass that would fail in production
The race detector slows things down enough to hide some timing-dependent failures.
-
A Heisenbug in a Go channel close that took me two weeks
The bug went away when I added logging. I know. Here is what was going on.
-
strace revealed our libc mismatch
A service worked on one image and not another. The difference was invisible until we traced syscalls.
-
Debugging a remote core dump without losing your mind
A core dump from production is a gift. Here is how I unwrap it.
-
A workflow for flaky tests that doesn't involve retry-until-green
We stopped rerunning flakes and started investigating them. Our suite is healthier and faster for it.
-
GitHub Actions cache lied to us
A stale cache partition was producing 'passing' builds for code that was 100% broken at runtime.
-
Redis maxmemory, eviction, and the day we served stale for 20 minutes
noeviction is the default, and the default is dangerous when you thought you were running a cache.
-
Flamegraphs in production without the fear
I used to be scared of perf record in prod. Then I wasn't.
-
The label that killed Prometheus
One innocuous request_id label, 18M active series, and a very bad Friday.
-
pgbouncer transaction pooling broke our prepared statements
A multi-day outage-adjacent incident caused by prepared statements not making it across pool boundaries.
-
The bitmap heap scan that ate our p99
A query that should have been a boring index scan turned into a full-table shuffle because the planner guessed wrong.
-
Capabilities bounded the wrong way
A service that could no longer bind to low ports after an innocent systemd change, and what I learned about capability sets.
-
httptap: attach to a running service
Pointing httptap at a flaky payments-api and watching a backoff loop reveal itself in real time.