Profiling a Go service with many concurrent workers. The flamegraph had all the worker time merged together. I could see “worker.go is hot” but not “worker processing tenant X is hot” vs “tenant Y.”

Enter pprof.Labels:

import (
    "context"
    "runtime/pprof"
)

// Labels apply to every sample taken while the function runs; goroutines spawned inside it inherit them.
pprof.Do(ctx, pprof.Labels("tenant", tenant.ID, "kind", job.Kind), func(ctx context.Context) {
    doTheWork(ctx)
})

Every sample captured while that goroutine runs inside the pprof.Do block gets tagged with those labels. Then in go tool pprof:

(pprof) tags
(pprof) tagfocus=tenant:t123
(pprof) list someFunc

Suddenly I can see “this function takes 300ms for tenant t123 but 12ms for everyone else.” Which was exactly the information I needed, and nothing else surfaces it.

Small caveat: label support depends on the profile type. CPU profiles respect labels, and recent Go releases include them in goroutine profiles too; heap, block, and mutex profiles ignore them. Labels also cost a little: pprof.Labels allocates a label set on each call, so build it once per request rather than in a hot inner loop, and the per-sample overhead is negligible.

I now routinely add labels for tenant/customer/request-kind at the top of handler goroutines. Cost is tiny, payoff is huge when debugging.