Every time a pod is in CrashLoopBackOff I run the same three commands in the same order. I may as well record it, so here it is.

The sequence is get pods to see the state, describe pod ... | tail to see the events (which is where the kubelet tells you whether it’s a scheduling, image, or runtime problem), and then logs --previous to see what the container said before it died. --previous is the flag I wish I’d learned years earlier; without it you only get the logs of the current attempt, which has usually been alive for milliseconds.
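
Spelled out, with a made-up pod name (api-worker-7d9f8-abcde) and the api namespace from the example below, that looks like:

    kubectl get pods -n api                                         # which pods are crashlooping, and how many restarts
    kubectl describe pod api-worker-7d9f8-abcde -n api | tail -18   # just the events; the top of describe is rarely useful here
    kubectl logs api-worker-7d9f8-abcde -n api --previous           # output of the attempt that just died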

Notice that in this case the events don’t tell me anything interesting: it’s just “Back-off restarting.” The useful signal is in logs --previous: the worker dies immediately because DNS can’t resolve redis.api.svc.cluster.local. That’s a missing Service object, not a missing deployment.
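
If you want to double-check the DNS failure without trusting the app’s own error message, a throwaway pod works. This isn’t part of the three-command sequence, just an extra verification step, and the image tag is arbitrary:

    kubectl run dns-test -n api --rm -it --image=busybox:1.36 --restart=Never -- \
      nslookup redis.api.svc.cluster.local   # fails to resolve if the Service really is gone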

kubectl get svc -n api | grep redis confirms there’s nothing there. I reapply the svc yaml, delete the crashing pod so I don’t have to wait out the backoff timer, then kubectl get pods -w to watch the old pod go Terminating and its replacement go through ContainerCreating -> Running.
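
End to end, that is roughly the following; the manifest path and pod name are placeholders:

    kubectl get svc -n api | grep redis               # empty output: the Service is missing
    kubectl apply -f redis-svc.yaml -n api            # put it back
    kubectl delete pod api-worker-7d9f8-abcde -n api  # optional: skip the remaining backoff wait
    kubectl get pods -n api -w                        # watch the replacement go ContainerCreating -> Running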

Callouts:

  • The --previous (or -p) flag on kubectl logs is the single most important crashloop debugging tool. Always use it for crashing pods.
  • I used tail -18 on the describe output because the events are always at the bottom and the top is noise I don’t need.
  • A CrashLoopBackOff with a “cannot reach dependency” error is almost always a Service, ConfigMap, or Secret that got deleted by someone’s cleanup script. Check those first; a one-liner for all three is below.
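
A minimal way to eyeball all three at once, assuming the dependencies live in the same namespace as the app (api in this example):

    kubectl get svc,configmap,secret -n api   # anything the app expects that isn't listed here is your suspect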

This is a made-up scenario but the shape of the investigation is exactly the one I run in production.