I spent longer than I’d like to admit being fuzzy on head vs tail sampling in distributed tracing. Every time I read an explanation I’d nod along and then a week later I’d have to re-derive it. This post is the mental model I finally built that stuck.

The core question

Every distributed tracing system eventually has to make a keep/drop decision for each trace. You collect less data than you produce, and you want the sample you keep to be useful.

Head sampling makes the decision at the start, before the trace completes. Tail sampling makes the decision at the end, after seeing the whole trace.

That’s it. Everything else is consequence.

Why head is cheap

Head sampling is cheap because the decision is local. At the service where the trace starts, you flip a coin (or hash the trace_id deterministically). If kept, you propagate a “keep me” flag through all downstream calls. Downstream services honor the flag. One decision, no coordination.

This is how probabilistic sampling works in almost every tracing library by default. 10% head sampling means: at trace start, keep 10% of traces at random, and every span in those traces flows through.
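
Here's a minimal sketch of that decision in Python (the function name and rate are made up, not any particular SDK's API): hash the trace_id, compare against the rate, and any service that runs the same function on the same trace_id lands on the same answer.

import hashlib

SAMPLE_RATE = 0.10  # keep 10% of traces

def head_sample(trace_id: str) -> bool:
    # Deterministic: the decision is a pure function of the trace_id,
    # so no coordination is needed to reproduce it.
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < SAMPLE_RATE

# The root service decides once, then propagates the result as a
# "keep me" flag; downstream services just honor the flag.
print(head_sample("4bf92f3577b34da6a3ce929d0e0e4736"))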

The property this gives you: all or nothing per trace. Either you have the whole trace or you have none of it. This is really valuable — incomplete traces are mostly useless.

Why head is limited

The decision is made before anything interesting happens. You don’t know if the trace will error. You don’t know if it’ll be slow. You can’t say “keep all error traces” with head sampling, because at the start of the trace, you don’t know yet.

Why tail is expensive

Tail sampling needs to see the whole trace, which means buffering all the spans somewhere, waiting for the trace to complete, and then making a decision. Buffering has a memory cost. Waiting has a latency cost (for your pipeline, not for users). And determining “is this trace complete” is actually hard — traces can fragment, services can die mid-trace, etc.
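
A toy version of that buffering loop, with made-up names (this is the shape of the problem, not how any collector is implemented internally): spans pile up per trace_id, and after a fixed window the sampler decides with whatever arrived, because it can never be certain the trace is complete.

import time
from collections import defaultdict

DECISION_WAIT = 10.0                  # seconds to hold spans before deciding

buffers = defaultdict(list)           # trace_id -> spans seen so far (the memory cost)
first_seen = {}                       # trace_id -> when the first span arrived

def ingest(span):
    tid = span["trace_id"]
    first_seen.setdefault(tid, time.monotonic())
    buffers[tid].append(span)

def flush_ready(now=None):
    # After DECISION_WAIT we decide with whatever arrived; spans that show
    # up later belong to a trace we've already judged (or already dropped).
    now = time.monotonic() if now is None else now
    for tid in [t for t, t0 in first_seen.items() if now - t0 >= DECISION_WAIT]:
        spans = buffers.pop(tid)
        first_seen.pop(tid)
        keep = any(s.get("error") for s in spans)   # placeholder policy
        print(f"trace {tid}: {'keep' if keep else 'drop'} ({len(spans)} spans)")

ingest({"trace_id": "t1", "error": True})
flush_ready(now=time.monotonic() + DECISION_WAIT)   # pretend the window elapsed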

The buffering also has to be centralized per-trace. Span A from service X and span B from service Y both belong to trace T — they need to land on the same sampler to be considered together. This usually means load-balancing on trace_id.
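
Why the routing has to key on trace_id, as a short sketch (the sampler pool is hypothetical): hash the trace_id, mod the pool size, and spans from different services that share a trace land on the same instance.

import hashlib

SAMPLERS = ["sampler-0", "sampler-1", "sampler-2"]   # hypothetical sampling tier

def sampler_for(trace_id: str) -> str:
    # Route on the trace_id, never on the emitting service.
    h = int.from_bytes(hashlib.sha256(trace_id.encode()).digest()[:8], "big")
    return SAMPLERS[h % len(SAMPLERS)]

trace_t = "4bf92f3577b34da6a3ce929d0e0e4736"
spans = [
    {"service": "X", "name": "A", "trace_id": trace_t},
    {"service": "Y", "name": "B", "trace_id": trace_t},
]
print({s["name"]: sampler_for(s["trace_id"]) for s in spans})  # both map to the same sampler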

Why tail is worth it

Because you can sample based on trace properties. “Keep all errors.” “Keep all traces slower than 500ms.” “Keep 50% of traces for customer X, 1% of everyone else.” These are impossible with head sampling, trivial with tail.
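
Spelled out as a decision function over the buffered spans (field names and thresholds are invented for the example):

import random

def tail_decision(spans: list[dict]) -> bool:
    if any(s.get("status") == "ERROR" for s in spans):
        return True                                    # keep all errors
    duration_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    if duration_ms > 500:
        return True                                    # keep all slow traces
    if spans[0].get("customer") == "customer-x":
        return random.random() < 0.50                  # 50% for customer X
    return random.random() < 0.01                      # 1% baseline for everyone else

trace = [
    {"status": "OK", "start_ms": 0,  "end_ms": 120, "customer": "acme"},
    {"status": "OK", "start_ms": 10, "end_ms": 640, "customer": "acme"},
]
print(tail_decision(trace))   # 640ms end-to-end -> slow -> keep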

And the quality of your trace dataset jumps. Head-sampled dataset: random selection, probably mostly boring. Tail-sampled dataset: errors, slow traces, edge cases, plus a probabilistic baseline. The tail-sampled dataset is what you actually want.

The hybrid approach

Most real setups use both.

Head sampling happens at the service level. The decision usually travels with the request as the sampled flag in the W3C traceparent header, and every downstream service respects it (parent-based sampling). This prevents a poorly configured service from spraying 100% of its traces into the pipeline and overwhelming the collector.
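
Concretely, the "hint" is the low bit of the trace-flags field at the end of the traceparent value. A small sketch of reading and setting it with plain string handling (not any SDK's API):

def is_sampled(traceparent: str) -> bool:
    # traceparent = version-traceid-parentid-traceflags; bit 0x01 is "sampled"
    flags = int(traceparent.split("-")[3], 16)
    return bool(flags & 0x01)

def with_sampled_flag(traceparent: str, sampled: bool) -> str:
    version, trace_id, parent_id, flags = traceparent.split("-")
    value = (int(flags, 16) | 0x01) if sampled else (int(flags, 16) & ~0x01)
    return f"{version}-{trace_id}-{parent_id}-{value:02x}"

hdr = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(is_sampled(hdr))                 # True: the root decided to keep this trace
print(with_sampled_flag(hdr, False))   # same header with the sampled bit cleared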

Tail sampling happens at a centralized collector, with the output from head sampling as its input. If head says “already decided to drop this,” tail doesn’t see it. If head says “keep for tail decision,” tail decides.

Head sampling protects the pipeline from overload. Tail sampling makes the final dataset useful. They’re complementary, not competing.

The configuration that confused me

The OpenTelemetry Collector has a probabilistic_sampler processor and a tail_sampling processor. Both can sit in the same pipeline. The order matters.

processors:
  probabilistic_sampler:
    sampling_percentage: 50  # drop half as a first filter
  tail_sampling:
    decision_wait: 10s
    policies:
      - { name: errors, type: status_code, status_code: { status_codes: [ERROR] } }
      - { name: baseline, type: probabilistic, probabilistic: { sampling_percentage: 10 } }

If probabilistic_sampler comes first in the pipeline, it pre-filters everything tail_sampling ever sees, which also means half of your error traces are gone before the error policy can keep them. If tail_sampling comes first, probabilistic_sampler runs on tail's output and randomly drops half of the traces tail decided to keep, errors included. Decide which behavior you want. For us, probabilistic-first was right: it protected the tail_sampling processor's memory from getting slammed during traffic spikes.
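
For reference, the ordering is whatever you put in the service pipeline. A sketch of the probabilistic-first layout (receiver and exporter names are placeholders for your own setup):

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, tail_sampling]   # probabilistic pre-filters
      exporters: [otlp]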

Gotchas I’ve hit

  • Hash-based head sampling has to agree across services. If service A and service B both “keep 10%” but use different hash functions, you’ll get incoherent traces (some spans from a trace kept, others dropped). Use trace_id as the input to the sampling decision, and make sure every service uses the same algorithm.

  • Tail sampling doesn’t work if spans are fragmented across multiple collectors. You need load-balancing on trace_id in front of the collector tier.

  • Buffer size matters. num_traces in the OTEL tail sampler is a per-collector limit. Too small, and traces get evicted before decision. Too big, and you run out of memory.
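
One stock way to get that trace_id affinity is a routing tier running the collector's loadbalancing exporter in front of the sampling tier. Roughly like this (hostnames are placeholders; check the exporter docs for the version you run):

exporters:
  loadbalancing:
    routing_key: traceID              # all spans of a trace go to the same backend
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - sampling-collector-1:4317
          - sampling-collector-2:4317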

Reflection

The simplest way I can say it: head sampling gives you reliable baselines, tail sampling gives you useful exceptions. Use both. Understand that the head decision cascades through the call graph; the tail decision is per-trace but centralized.

Related: Tail sampling that actually saved money has our actual production config.