One of the reasons I wrote ripgrab was that I got tired of piping tail -F through grep and awk to answer the same three questions during every incident: which host, which status, and how often.
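
For context, the pipeline I kept retyping looked roughly like this. It covers the filtering half of those questions and leans on the combined log format; the status regex and the $7/$9 field positions are assumptions about a typical line, so adjust them to your log_format:

    # grep the 5xx lines, then pull out the request path ($7) and the status ($9)
    tail -F /var/log/nginx/access.log \
      | grep --line-buffered -E '" 5[0-9][0-9] ' \
      | awk '{ print $7, $9 }'

Answering “how often” usually meant a second terminal running cut | sort | uniq -c over a recent slice of the same file.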

Here I’m tailing /var/log/nginx/access.log with --filter 'status>=500' and --group-by host. The filter is a real expression, not a regex over the raw line: ripgrab parses the combined nginx format under the hood, so I can refer to fields by name. Each host also gets a fixed color in the output (magenta for api.internal, a different one for assets.internal), which makes it obvious at a glance when one service is responsible for a spike.
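
Spelled out, the invocation boils down to the three pieces below. I’m glossing over argument order and any extra flags here; the file, the filter expression, and the grouping are what matter:

    ripgrab /var/log/nginx/access.log \
      --filter 'status>=500' \
      --group-by host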

Notice that after a few seconds it prints a rolling 30-second summary. That’s the feature I use most. During a bad deploy it turns a wall of noise into a single line that says “api.internal is on fire, assets is mostly fine.”

Then I hit / to open the interactive filter prompt and type charge to narrow further. From there it’s obvious that the /v1/charge endpoint on api.internal is where the 503s live. That’s what I hand to the on-call engineer: one endpoint, one host, and a timestamp range.

A few things that make this feel different from tail | grep:

  • Filters are parsed and compiled once up front, so adding more of them doesn’t slow the stream down.
  • The summary window is wall-clock based, not line-count based, so a quiet period doesn’t keep stale errors in the tally; old events age out on schedule. The sketch after this list shows the difference.
  • q quits cleanly and tells me what I looked at. Useful for pasting into the incident doc.
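
To see why the wall-clock point matters, here’s the closest I could get with stock tooling before ripgrab. It’s a rough gawk sketch, not how ripgrab does it: the counts only flush when a new line happens to arrive, so a service that goes completely quiet never reports, which is exactly the gap a timer-driven summary closes.

    tail -F /var/log/nginx/access.log \
      | gawk '{
          now = systime()
          # flush and reset the counters every ~30 seconds of wall-clock time
          if (now - start >= 30) {
            for (s in counts) print s, counts[s]
            delete counts
            start = now
          }
          # $9 is the status field in the combined format
          if ($9 >= 500) counts[$9]++
        }'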