A monitoring alert went off at 4 AM: the root filesystem on one of our app hosts was 97% full. I rolled out of bed, VPN’d in, and found /var/log/journal holding 93 GB. Someone’s debug logs had been running hot for days, and journald had been storing every line.

Why did it let this happen?

You would think journald would have a default limit. It does: the default SystemMaxUse is 10% of the filesystem, capped at 4 GB. But there is a subtlety: SystemMaxUse is a limit enforced by vacuuming, and vacuuming only ever deletes archived journal files. Active files (the ones still being written) are not eligible until they rotate, so a host can sit over the limit while the active files keep growing. On this host the journal files were rotating rarely, because the process was filling them at thousands of lines per second and the rotation logic was apparently conservative.
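
A quick way to see what limit journald actually computed on a given host is to check what it logged when it started. The exact wording varies by systemd version, but it looks roughly like this (the numbers here are illustrative, what I would have expected on this box):

journalctl -b -u systemd-journald --no-pager | grep -i 'system journal'
# System Journal (/var/log/journal/<machine-id>) is 93.1G, max 100.0G, 6.9G free.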

More importantly, the effective limit on this host was not the default. I checked:

journalctl --disk-usage
# Archived and active journals take up 93.1G in the file system.

grep -E '^(Storage|SystemMaxUse|RuntimeMaxUse)=' /etc/systemd/journald.conf
# Storage=persistent

No explicit max. So it should be the default. But:

grep -E 'SystemMaxUse' /etc/systemd/journald.conf.d/*.conf
# /etc/systemd/journald.conf.d/99-our-override.conf:SystemMaxUse=100G

Someone had dropped a 100 GB limit in a conf.d override long ago. I had not known this was there.
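
On newer systemd there is a less error-prone way to find this kind of thing than grepping by hand: dump the merged configuration, drop-ins included, in one shot.

systemd-analyze cat-config systemd/journald.conf
# prints /etc/systemd/journald.conf followed by every conf.d drop-in, in the order they are applied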

The immediate fix

Step one, stop the bleeding:

journalctl --rotate
journalctl --vacuum-size=1G

The --vacuum-size flag deletes archived journal files until their total disk usage is below the target. On a host that was actively writing, I had to rotate first so that the current file became archived and therefore eligible for deletion. --vacuum-size=1G freed 90 GB in a few seconds.
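
Worth a quick sanity check afterwards that the space actually came back:

journalctl --disk-usage
df -h /var/log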

Step two, stop the process:

journalctl --no-pager -n 200 | awk '{print $5}' | sort | uniq -c | sort -rn | head
# 184  app[1234]:
#  12  systemd[1]:
#   4  kernel:

The culprit was obvious. app[1234] was logging at fire-hose rate. kill -USR1 1234 to get it to dump stats, then I reached for my runbook to find the graceful restart for this service. Someone had set its log level to DEBUG in an env var that was not supposed to be on this host.
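
If you need to confirm where a bad setting like this is coming from before restarting anything, something along these lines works; the unit name here is a stand-in for whatever your service is actually called:

systemctl show app.service --property=Environment,EnvironmentFiles
systemctl cat app.service | grep -in 'log_level'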

The real fixes

  1. Fix the config precedence. Our puppet-style config management had two overrides fighting each other: one intended for a staging host (100 GB) and one intended for prod (2 GB). The selector was wrong. I fixed the selector and pushed.

  2. Add rate limiting in journald itself. I added:

    # /etc/systemd/journald.conf.d/99-ratelimit.conf
    [Journal]
    RateLimitIntervalSec=10s
    RateLimitBurst=2000
    

    This caps any one service at 2000 messages per 10 seconds; journald’s rate limiting is applied per service. A flood of genuine errors still gets into the journal, but a runaway process logging at DEBUG gets throttled. Suppressed messages leave a marker line in the journal along the lines of Suppressed XX messages from foo.service. The drop-in takes effect after restarting systemd-journald.

  3. Add a canary alert. A tiny Prometheus exporter reads the journal’s disk usage every minute and exposes a metric. Alert at 50% of the configured SystemMaxUse. This is the kind of thing you do after you get bitten; a minimal sketch follows this list.
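
Here is a minimal sketch of that exporter, written as a node_exporter textfile-collector script. The paths and metric name are my own choices, and it measures the journal directory with du rather than parsing journalctl --disk-usage, which is meant for humans:

#!/bin/sh
# Hypothetical script, run from cron or a systemd timer.
# Assumes node_exporter runs with --collector.textfile.directory=/var/lib/node_exporter/textfile
OUT=/var/lib/node_exporter/textfile/journald_disk.prom
bytes=$(du -sb /var/log/journal | awk '{print $1}')
cat > "${OUT}.tmp" <<EOF
# HELP journald_disk_usage_bytes Disk space used by /var/log/journal
# TYPE journald_disk_usage_bytes gauge
journald_disk_usage_bytes $bytes
EOF
mv "${OUT}.tmp" "$OUT"   # atomic rename so node_exporter never reads a half-written file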

Vacuum gotchas

Some things I learned the annoying way:

  • --vacuum-size does not rotate anything for you and never touches the active file; if you vacuum to 1G and your current file is 2G, you end up with 2G because the current file cannot be deleted.
  • --vacuum-time=7d deletes records older than 7 days, which sounds like what you want until you realize the unit of deletion is whole journal files. If a single file spans six weeks, --vacuum-time=7d will not touch it. You need to rotate first.
  • On a box where /var/log/journal does not exist, Storage=persistent creates it the next time journald starts, not retroactively. I forgot this once and wondered why my journal was still in tmpfs after setting persistent storage; the commands after this list are what I use now.
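
For the record, this is roughly how I flip a host to persistent storage now, without waiting for a reboot. The --flush step asks journald to move whatever is sitting in the runtime journal under /run/log/journal onto disk:

mkdir -p /var/log/journal
systemd-tmpfiles --create --prefix /var/log/journal   # fix up ownership and ACLs on the new directory
systemctl restart systemd-journald
journalctl --flush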

Log-level hygiene

After this I made a small pass across our roles to reject DEBUG-level logging in prod. Our shipping services honored a LOG_LEVEL env var; our Ansible role now refuses to deploy to prod if it is set to anything other than INFO, WARN, or ERROR:

- name: enforce log level in prod
  fail:
    msg: "LOG_LEVEL={{ log_level }} not allowed in prod"
  when:
    - inventory_hostname in groups['prod']
    - log_level not in ['INFO', 'WARN', 'ERROR']

Not perfect, but it catches the obvious mistake.

Reflection

Journald is great. It does exactly what you ask. The gotcha is that its limits are about total size while the failure mode is about rate: a poorly-configured journald will happily store 93 GB of one service’s DEBUG logs before anyone notices. Rate-limiting the journal and alerting on its disk usage are cheap insurance.

Related: see my post on systemd timers and the clock drift that ate our backups for another story of default-ish systemd behavior being not-what-you-thought.