Reading the Go scheduler traces for the first time
I’d known about GODEBUG=schedtrace for years and never actually used it, because every time I’d seen it come up in a talk or blog post, the output had looked like cryptic line noise and I figured I could infer what the scheduler was doing from metrics. That turns out to be very wrong. An evening of staring at schedtrace output taught me more about my service than a month of reading dashboards had.
Here’s the incantation:
GODEBUG=schedtrace=1000,scheddetail=1 ./myservice
This dumps a status line every second. scheddetail=1 gives you per-P (processor) and per-M (OS thread) information. The output looks like this:
SCHED 1000ms: gomaxprocs=8 idleprocs=0 threads=24 spinningthreads=2
idlethreads=6 runqueue=0 gcwaiting=0 nmidlelocked=0 stopwait=0 sysmonwait=0
P0: status=1 schedtick=1247 syscalltick=33 m=4 runqsize=3 gfreecnt=0
P1: status=1 schedtick=1198 syscalltick=28 m=2 runqsize=4 gfreecnt=0
...
Reading this took me a while. Here’s what each field means, condensed:
- gomaxprocs: how many Ps (logical processors) the runtime has
- idleprocs: how many of those are idle right now
- threads: total OS threads the runtime has spun up
- spinningthreads: threads actively trying to find work to steal
- runqueue: the size of the global run queue
- Per-P: runqsize is the size of that P's local run queue, schedtick is how many goroutines the P has scheduled since boot
The thing that jumped out at me in my own service’s output was the thread count: 24. I have GOMAXPROCS=8. Why are there 24 threads?
This is the first hard lesson: when a goroutine blocks in a syscall, the OS thread underneath it stays blocked too, and the runtime hands its P to another thread, spinning up a new one if it has to. If you have 16 goroutines doing file I/O or sitting in a cgo call, those 16 goroutines hold 16 threads hostage. GOMAXPROCS only caps how many threads can be executing Go code at once; it puts no limit on how many can be hanging out blocked in syscalls (the only ceiling there is the runtime's overall thread limit, 10,000 by default).
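If you want to watch this happen, here's a minimal Linux-only sketch of my own (a toy, nothing to do with the service): sixteen goroutines parked in a raw read(2) push the thread count well past GOMAXPROCS. Run it with GODEBUG=schedtrace=1000 as well and the threads= figure tells the same story.
package main

import (
    "fmt"
    "runtime"
    "runtime/pprof"
    "syscall"
    "time"
)

func main() {
    runtime.GOMAXPROCS(8)

    // Park 16 goroutines in a blocking read(2) on pipes that never get data.
    // Going through the syscall package bypasses the netpoller, so each read
    // keeps an OS thread blocked underneath it.
    for i := 0; i < 16; i++ {
        fds := make([]int, 2)
        if err := syscall.Pipe(fds); err != nil {
            panic(err)
        }
        go func(fd int) {
            buf := make([]byte, 1)
            syscall.Read(fd, buf) // blocks forever
        }(fds[0])
    }

    time.Sleep(time.Second)
    // threadcreate counts threads the runtime has created so far.
    fmt.Printf("GOMAXPROCS=%d, threads created: %d\n",
        runtime.GOMAXPROCS(0), pprof.Lookup("threadcreate").Count())
}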
For my service, the 24 came from a surprising place: the DNS resolver. We were doing roughly 200 DNS lookups per second against our internal service mesh, and Go's resolver on Linux was falling back to the cgo resolver (it does this when resolv.conf or nsswitch.conf asks for things the pure-Go resolver doesn't handle), which means every resolution sits on a thread until it returns. Swapping to the pure-Go resolver dropped our thread count to 11:
// Force the pure-Go resolver
net.DefaultResolver.PreferGo = true
Or set GODEBUG=netdns=go at process start.
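If mutating the process-wide default feels too blunt, the same knob exists on an individual net.Resolver. A small sketch (the package name and function are just illustrative):
package mesh

import (
    "context"
    "net"
)

// meshResolver forces the pure-Go resolver for our own lookups without
// touching net.DefaultResolver.
var meshResolver = &net.Resolver{PreferGo: true}

func lookupHost(ctx context.Context, host string) ([]string, error) {
    return meshResolver.LookupHost(ctx, host)
}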
The second thing I learned was about runqueue and runqsize. If runqsize on a P is consistently high while idleprocs > 0, you have a work-stealing problem: the idle Ps aren't stealing the work that's sitting there. This is rare, but it can happen if your goroutines churn very rapidly. If runqueue (the global queue) is growing, you're producing goroutines faster than the Ps can execute them. That's the signature of a burst of spawned work, and it usually correlates with a GC pause or a synchronous fan-out.
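The global-queue symptom is easy to reproduce with a toy fan-out (again, a sketch of mine, not the real service). Run it under GODEBUG=schedtrace=100 and the runqueue= figure spikes right after the burst, then drains:
package main

import (
    "sync"
    "time"
)

// busy burns CPU so the spawned goroutines stay runnable instead of parking.
func busy(d time.Duration) {
    deadline := time.Now().Add(d)
    for time.Now().Before(deadline) {
    }
}

func main() {
    time.Sleep(2 * time.Second) // a few quiet trace lines first

    // Synchronous fan-out: thousands of runnable goroutines arrive at once,
    // overflowing the per-P local queues into the global run queue.
    var wg sync.WaitGroup
    for i := 0; i < 10000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            busy(time.Millisecond)
        }()
    }
    wg.Wait()
}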
The third thing, and the one that actually helped me fix a bug, was syscalltick: a per-P counter that ticks up every time a goroutine on that P enters a syscall. If syscalltick grows much faster than schedtick, that P is spending a lot of its time in syscalls rather than scheduling Go code. In my case, I had one P with syscalltick dramatically higher than the others. That turned out to be pinned traffic from a single client hitting a code path that did a file open per request. Never a good look.
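The usual cure for that kind of path is to pay the open once and reuse the result. Something in the shape of this sketch, where the file name and identifiers are made up for illustration:
package handler

import (
    "os"
    "sync"
)

var (
    cfgOnce sync.Once
    cfgData []byte
    cfgErr  error
)

// configBytes reads the file once per process instead of doing an
// open/read/close on every request.
func configBytes() ([]byte, error) {
    cfgOnce.Do(func() {
        cfgData, cfgErr = os.ReadFile("client-config.json")
    })
    return cfgData, cfgErr
}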
For fancier analysis, there’s also the “execution tracer”: runtime/trace. You call trace.Start(w) with a writer, run for a while, call trace.Stop(), and then view the output with go tool trace. This is a GUI, in a browser, that shows goroutine activity over time — you can see individual goroutines being scheduled, yielding, blocking on channels, running GC. It’s heavier than schedtrace but it’s spectacular when you actually need it.
package main

import (
    "log"
    "os"
    "runtime/trace"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    if err := trace.Start(f); err != nil {
        log.Fatal(err)
    }
    defer trace.Stop()

    // ... your program ...
}
Then go tool trace trace.out. A browser opens. Poke around.
The only warning I'll give: don't leave schedtrace on in production. It's cheap, but it's not free, and parsing the output into metrics is a pain. Since schedtrace is read at process start, I tend to roll a canary with it enabled for a couple of minutes when I want to know what's actually going on under the hood, then roll it back. It's become my go-to for "this service feels weird but I can't tell why", and most of the time it's more useful than a top-line CPU graph.