TIL: Go race detector can cause tests to pass that would fail in production

Had a test that was flaky under go test but solid under go test -race. My naive read: “race detector fixed it.” Wrong.

The race detector adds memory access tracking, which slows goroutines down. That changed the scheduling of a bunch of concurrent operations in the test just enough to hide the race (while still detecting it — the tool was doing its job). When we ran without -race in production, the race conditions would reappear intermittently.

Lesson: go test -race detects races, but it also changes timing. A test passing under -race doesn’t mean the race is gone. It means the race is currently detected-or-benign. Running without -race can expose different scheduling-dependent bugs.

What I do now in CI:

# run both. catches different classes of bugs.
go test ./...            # non-race, production-like timing
go test -race ./...      # race detector, different timing

Budget time for both. The non-race run is faster but both are worth running.

Related gotcha: the race detector only fires when it observes a race during the run. It can miss races that only happen under specific loads. Getting “race detector is clean” on your test suite is necessary but not sufficient for “the code is race-free.”

Two-line learning. Big implications for the trust I place in -race.