The GC knob that actually helped
For a long time, my advice on Go GC tuning was “don’t bother.” The default GC is good. GOGC=100 (the default) is fine for almost everyone. If you’re spending a lot of time in GC, your real problem is probably that you’re allocating too much, not that the GC is misbehaving.
I still think that’s mostly true. But in the last year, I’ve tuned one specific knob — GOMEMLIMIT — enough times that I’m revising my advice.
The problem GOMEMLIMIT solves is something like: you run a Go service in a container with a 2GB memory limit. Your service’s live heap is around 800MB under normal load. The Go GC, by default, lets the heap grow to roughly 2x the live set before running — so it wants to grow to ~1.6GB. Fine. But under a burst of traffic, your live set might briefly spike to 1.3GB. The GC still applies GOGC=100, meaning it wants to let the heap grow to 2.6GB. The kernel OOMs you.
Before GOMEMLIMIT, your options were:
- Set GOGC really low, which made GC fire all the time and ate CPU
- Increase the container memory, which was expensive and wasteful
- Set GOGC=off and run a periodic runtime.GC() call manually, which was gross
With GOMEMLIMIT (added in Go 1.19), you can set a soft memory target:
GOMEMLIMIT=1800MiB
The GC uses this as a target and will run more aggressively as you approach the limit, regardless of GOGC. You get the best of both worlds: when memory is plentiful, GC runs at the relaxed default rate; when you’re pushing the limit, GC fires more often to keep you under.
In practice, I set it to ~90% of the container memory limit, leaving headroom for stack allocations, cgo, and other non-Go-heap usage:
// in your main package, for containers that inject MEMORY_LIMIT_BYTES
package main

import (
	"os"
	"runtime/debug"
	"strconv"
)

func init() {
	if s, ok := os.LookupEnv("MEMORY_LIMIT_BYTES"); ok {
		if n, err := strconv.ParseInt(s, 10, 64); err == nil {
			debug.SetMemoryLimit(int64(float64(n) * 0.9)) // ~90% of the container limit
		}
	}
}
Or just set it as an environment variable on the container:
# kubernetes pod spec
env:
  - name: GOMEMLIMIT
    value: "1800MiB"
resources:
  limits:
    memory: "2Gi"
A service at a previous job went from OOMing roughly once a week under traffic spikes to effectively never, just from this one change. The GC spent slightly more CPU on average (we went from ~4% of CPU in GC to ~6%), but the tail latency actually improved because we stopped having cold restarts.
Some other GC knobs that occasionally matter:
GOGC: controls the target growth ratio. Default is 100 (meaning: grow to 2x live set). Lower values mean more frequent GC, less memory, more CPU. Higher values mean the opposite. If GOMEMLIMIT is set and GOGC is set, the runtime uses whichever causes GC to fire sooner. You can also set GOGC=off to disable automatic GC, but don’t, because that’s asking for trouble.
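Both knobs also have programmatic equivalents in runtime/debug, if you'd rather set them from code than from the environment. A minimal sketch, with the 1800MiB figure purely illustrative:

package main

import "runtime/debug"

func main() {
	debug.SetGCPercent(100)          // same as GOGC=100, the default growth ratio
	debug.SetMemoryLimit(1800 << 20) // ~1800 MiB, same as GOMEMLIMIT=1800MiB
	// The runtime uses whichever of the two targets triggers GC sooner.
	// ... rest of your service ...
}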
GODEBUG=gctrace=1: prints a line on each GC. Format is dense but useful:
gc 42 @5.123s 2%: 0.015+3.2+0.010 ms clock, 0.12+1.1/2.9/0.87+0.08 ms cpu, 102->104->52 MB, 110 MB goal, 8 P
The 102->104->52 MB tells you: heap size when the GC started, heap size when it finished, and the live heap. The 110 MB goal is what the runtime was targeting. In a healthy service, these numbers should be stable and the goal should stay well under GOMEMLIMIT.
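If you'd rather watch the heap goal from inside the process instead of squinting at gctrace output, runtime/metrics exposes the same number. A minimal sketch:

package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	samples := []metrics.Sample{
		{Name: "/gc/heap/goal:bytes"},                // what the GC is currently targeting
		{Name: "/memory/classes/heap/objects:bytes"}, // memory occupied by heap objects
	}
	metrics.Read(samples)
	fmt.Printf("heap goal: %d bytes, heap objects: %d bytes\n",
		samples[0].Value.Uint64(), samples[1].Value.Uint64())
}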
runtime.SetFinalizer: basically never useful. If you’re using finalizers, you almost certainly have a design problem.
runtime/debug.FreeOSMemory(): forces a GC and returns as much free memory to the OS as possible. Mostly useful after a large one-time allocation you're done with. I call it in a service that spikes memory during ETL and then idles.
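For that ETL-style pattern, the call is about as simple as it gets. A sketch, where runBatch stands in for whatever does the big one-time allocations:

package main

import "runtime/debug"

// runBatch is a stand-in for the memory-hungry ETL step.
func runBatch() {
	// ... large one-time allocations ...
}

func main() {
	runBatch()
	// Force a GC and return as much freed memory to the OS as possible
	// before the service goes back to idling.
	debug.FreeOSMemory()
}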
A pitfall I’ve seen: GOMEMLIMIT is a SOFT target. The runtime will breach it if it has to — e.g., if you’re allocating faster than GC can keep up. It’s not going to panic or block allocation. So if you’re allocating unboundedly, GOMEMLIMIT won’t save you. But for the common case of “my service is healthy but occasionally bursts,” it’s exactly the right tool.
Another thing to be aware of: GOMEMLIMIT was not designed to be a hard limit. If you really need to bound memory usage, combine it with a cgroup/container limit and expect the occasional OOM under extreme load. But in the normal case, the GC will keep you well under.
I’d have paid real money to have known about this knob two years ago. Container-deployed Go services with memory limits are a common deployment pattern, and the pre-1.19 GC defaults were actively bad for that case. If you haven’t revisited your GC settings since 1.19, it’s worth an hour.
Semi-related: my post on sync.Pool covers the other side of this — reducing allocation pressure at the source.