stacks/monitoring/README.md

# monitoring

Prometheus + Alertmanager + Loki + Grafana, plus a couple of exporter
helper scripts. This stack is the eyes on the rest of the setup: it
collects metrics and logs from the other stacks and alerts on them.

See also: mercemay.top/src/homelab-compose/

## Layout

```
monitoring/
  alertmanager/
    alertmanager.yml
    templates/
      email.tmpl
      slack.tmpl
  blackbox/
    blackbox.yml
  exporters/
    node-exporter-textfile.sh
    smart-exporter.sh
  grafana/
    dashboards/
    datasources/
    provisioning/
    grafana.ini
  loki/
    loki.yaml
    promtail.yaml
  prometheus/
    prometheus.yml
    rules/
      alerts.yml
      recording.yml
      blackbox.yml
```

## Bringing it up

```
docker compose -f stacks/monitoring/docker-compose.yml up -d
```

All containers join the internal `net-monitor` network. The edge
services also join `caddy-edge` so Caddy can reverse proxy to them.
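Here is a minimal sketch of how that network layout could look in `docker-compose.yml`. It assumes `caddy-edge` is created by the Caddy stack, which is why it is marked `external`. Grafana stands in as a hypothetical edge service. The real service list and image tags may differ.

```yaml
services:
  prometheus:
    image: prom/prometheus
    networks:
      - net-monitor            # internal only
  grafana:
    image: grafana/grafana
    networks:
      - net-monitor            # reaches Prometheus and Loki
      - caddy-edge             # reachable by the reverse proxy (assumed edge service)

networks:
  net-monitor:
    name: net-monitor          # fixed name so other compose projects can join it
    internal: true             # no outbound connectivity through this network
  caddy-edge:
    external: true             # assumption: owned by the Caddy stack
```

Without `name:`, Compose prefixes the network with the project name (for example `monitoring_net-monitor`), which makes it awkward for other stacks to reference. With the fixed name, a stack that wants to be scraped can join the network by declaring `net-monitor` with `external: true` in its own top-level `networks:` section.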

## Retention

- Prometheus: 30d, 50 GiB on the SSD pool
- Loki: 30d, compactor enabled, filesystem store
- Grafana: SQLite with WAL, backed up nightly by `backup/stages/sqlite-backup.sh`
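Prometheus retention is set with command-line flags, not in `prometheus.yml`. The sketch below shows the `prometheus` service's `command:` in the compose file. It assumes both the 30-day and the 50 GiB limits are enforced as flags and that the data directory is mounted at `/prometheus`. The volume name is hypothetical.

```yaml
services:
  prometheus:
    image: prom/prometheus
    command:
      # Overriding `command` replaces the image defaults, so the config
      # and storage paths have to be passed explicitly again.
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      # Old blocks are dropped once either limit is exceeded.
      - --storage.tsdb.retention.time=30d
      # Size units are powers of two, so 50GB here means 50 GiB.
      - --storage.tsdb.retention.size=50GB
    volumes:
      - prometheus-data:/prometheus   # hypothetical volume; place it on the SSD pool

volumes:
  prometheus-data: {}
```

Loki's equivalent lives in `loki/loki.yaml`. There, `limits_config.retention_period: 30d` only takes effect when `compactor.retention_enabled: true` is also set, because the compactor is what deletes expired chunks. Recent Loki releases additionally expect `compactor.delete_request_store` (e.g. `filesystem`) when retention is enabled.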

## Adding a new scrape target

1. Put the service on `net-monitor` in its own compose file.
2. Expose a `/metrics` endpoint. For a plain HTTP service with no metrics, probe it through blackbox instead.
3. Add a `job_name` to `prometheus/prometheus.yml`.
4. Add or adjust alerts in `prometheus/rules/alerts.yml`.
5. Reload the config and rules with `docker compose -f stacks/monitoring/docker-compose.yml kill -s HUP prometheus`.
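Step 3 might add either of the two kinds of entry sketched below to `prometheus/prometheus.yml`. The following are all hypothetical placeholders:

- the service name `myapp`, its port, and the probed URL
- the `blackbox` service name and the exporter's default port 9115
- the `http_2xx` module, which would need to exist in `blackbox/blackbox.yml`

```yaml
scrape_configs:
  # A service exposing /metrics directly. The hostname is the compose
  # service name, resolved by Docker DNS on net-monitor.
  - job_name: myapp
    static_configs:
      - targets: ["myapp:8080"]

  # A plain HTTP service probed through the blackbox exporter.
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: ["https://myapp.example.org/"]
    relabel_configs:
      # Pass the listed URL to blackbox as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the blackbox exporter itself rather than the URL.
      - target_label: __address__
        replacement: blackbox:9115
```

Run `promtool check config` on the file before sending HUP; it ships in the `prom/prometheus` image. If a reload is rejected, Prometheus logs the error and keeps running with the previous configuration.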

## Alertmanager routing

- `severity=critical` -> Slack `#alerts` + pager email
- `severity=warning` -> email to the mailbox only
- blackbox-specific alerts -> Slack `#http-health`
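The routing above could be expressed in `alertmanager/alertmanager.yml` as sketched below. The following are assumptions:

- receiver names, addresses, and SMTP settings
- the secret file names under `/run/secrets/`
- the `job=~"blackbox.*"` matcher used to recognise blackbox alerts

The real config may identify blackbox alerts differently, for example by a label set in `rules/blackbox.yml`.

```yaml
global:
  smtp_smarthost: smtp.example.org:587        # placeholder
  smtp_from: alertmanager@example.org         # placeholder
  smtp_auth_username: alertmanager@example.org

route:
  receiver: mailbox              # fallback when no child route matches
  group_by: [alertname, instance]
  routes:
    # Child routes are tried in order and the first match wins (unless
    # `continue: true` is set), so the blackbox route goes first.
    - matchers: ['job=~"blackbox.*"']
      receiver: http-health
    - matchers: ['severity="critical"']
      receiver: critical
    - matchers: ['severity="warning"']
      receiver: mailbox

receivers:
  - name: http-health
    slack_configs:
      - api_url_file: /run/secrets/slack_webhook
        channel: '#http-health'
  - name: critical
    slack_configs:
      - api_url_file: /run/secrets/slack_webhook
        channel: '#alerts'
    email_configs:
      - to: pager@example.org
        auth_password_file: /run/secrets/smtp_password
  - name: mailbox
    email_configs:
      - to: alerts@example.org
        auth_password_file: /run/secrets/smtp_password
```

`amtool config routes test --config.file=alertmanager.yml severity=critical` prints the receiver a given label set would reach, which is a quick way to confirm the ordering. `auth_password_file` on email receivers needs a fairly recent Alertmanager; older versions only accept an inline `auth_password`.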

Secret values (Slack webhook, SMTP password) live in
`/run/secrets/*` and are mounted by the compose file.
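The compose side of this could look like the sketch below, assuming file-backed secrets. The secret names and source paths are hypothetical. Compose mounts each secret listed under a service at `/run/secrets/<name>` inside that container, which is where Alertmanager's `*_file` settings (such as `api_url_file`) can point.

```yaml
services:
  alertmanager:
    image: prom/alertmanager
    secrets:
      - slack_webhook              # appears as /run/secrets/slack_webhook
      - smtp_password              # appears as /run/secrets/smtp_password

secrets:
  slack_webhook:
    file: ./secrets/slack_webhook  # hypothetical path; keep it out of version control
  smtp_password:
    file: ./secrets/smtp_password
```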