Was debugging a graph that showed our p99 latency as 2ms, which was suspicious because users were complaining about slow requests.

The query I had was:

histogram_quantile(0.99, http_request_duration_seconds_bucket)

The right query is:

histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

The difference: without rate(), you’re computing the quantile over the raw bucket counters, which are cumulative since process start. That’s dominated by historical traffic, so a recent slowdown barely moves it. With rate(), you compute it over the per-second increase of each bucket across the last 5 minutes, which is what you actually want.
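
A rough way to see this with numbers, as a Python sketch. The histogram_quantile function below is my own simplified stand-in for the linear interpolation Prometheus does, not the real implementation (it ignores counter resets, extrapolation and native histograms), and the bucket counts are made up: a process that served a million fast requests, then five minutes of slow ones.

```python
import math

def histogram_quantile(q, buckets):
    """Simplified stand-in for PromQL's histogram_quantile (an assumption,
    not the real implementation).

    buckets: list of (le, cumulative_count), sorted by le, ending with +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Quantile lands in the +Inf bucket: report the highest finite bound.
                return prev_le
            # Linear interpolation inside the bucket that contains the rank.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return math.nan

# Hypothetical bucket bounds (seconds) and raw counter values at two scrapes.
LES = [0.005, 0.05, 0.5, 5.0, math.inf]
counts_5m_ago = [990_000, 999_000, 999_900, 1_000_000, 1_000_000]
counts_now = [990_100, 999_200, 1_000_400, 1_001_000, 1_001_000]

# What the query without rate() sees: counters since process start.
raw_p99 = histogram_quantile(0.99, list(zip(LES, counts_now)))

# Roughly what rate(...[5m]) sees: per-second increase over the window.
window_seconds = 300
rates = [(now - old) / window_seconds for now, old in zip(counts_now, counts_5m_ago)]
recent_p99 = histogram_quantile(0.99, list(zip(LES, rates)))

print(f"p99 over raw counters: {raw_p99:.4f}s")
print(f"p99 over 5m rate:      {recent_p99:.2f}s")
```

With these made-up numbers the raw counters give a p99 of roughly 9ms, while the last five minutes are actually close to 5s.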

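And the sum(...) by (le) is important: if you have multiple instances, you need to sum the bucket rates across instances first, keeping le, then compute the quantile. Otherwise histogram_quantile returns a separate quantile for each instance, and there is no correct way to combine those afterwards: averaging per-instance p99s does not give you the fleet-wide p99.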
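
To see why the order of aggregation matters, here is a small Python sketch. Again, histogram_quantile is my simplified stand-in for the Prometheus function, and the two instances and their bucket rates are invented: instance "a" takes most of the traffic and is fast, instance "b" takes little traffic and is slow.

```python
import math

def histogram_quantile(q, buckets):
    """Simplified stand-in for PromQL's histogram_quantile (an assumption,
    not the real implementation).

    buckets: list of (le, cumulative_count), sorted by le, ending with +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Quantile lands in the +Inf bucket: report the highest finite bound.
                return prev_le
            # Linear interpolation inside the bucket that contains the rank.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return math.nan

# Hypothetical bucket bounds (seconds) and per-instance cumulative bucket rates.
LES = [0.1, 1.0, 10.0, math.inf]
bucket_rates = {
    "a": [900.0, 900.0, 900.0, 900.0],  # busy and fast
    "b": [0.0, 50.0, 100.0, 100.0],     # quiet and slow
}

# Without sum by (le): one quantile per instance.
per_instance_p99 = {
    inst: histogram_quantile(0.99, list(zip(LES, rates)))
    for inst, rates in bucket_rates.items()
}
# Averaging those afterwards is the tempting but wrong way to combine them.
averaged_p99 = sum(per_instance_p99.values()) / len(per_instance_p99)

# With sum by (le): add up each bucket across instances, then take the quantile.
summed = [sum(column) for column in zip(*bucket_rates.values())]
fleet_p99 = histogram_quantile(0.99, list(zip(LES, summed)))

print(per_instance_p99)
print(f"average of per-instance p99s: {averaged_p99:.2f}s")
print(f"p99 over summed buckets:      {fleet_p99:.2f}s")
```

With these numbers the average of the two per-instance p99s comes out near 5s, while the p99 over the summed buckets is about 8.2s. The average ignores how much traffic each instance actually served.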

Canonical form I now paste into every new dashboard:

histogram_quantile(0.99,
  sum by (le, route) (rate(http_request_duration_seconds_bucket[5m]))
)

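le is always in the by, because histogram_quantile needs the bucket boundaries to work with. Other labels depend on what you’re grouping by (route, status, etc.); each extra label gives you one quantile series per value of that label.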

This is in the Prometheus docs. I had read it. I had apparently not internalized it. Future me: if the histogram-based quantile graph looks too smooth and too low, you forgot the rate.