docs/architecture.md

# Architecture

How the NUC under my desk runs Jellyfin, Immich, Paperless, Gitea,
Syncthing, Pi-hole, Prometheus, and Grafana, in one compose stack,
behind a single Caddy reverse proxy.

If you want "what to do when X breaks," read
[docs/runbook.md](/src/homelab-compose/docs-runbook-md/). If you
want the backup story specifically, see
[docs/backup-strategy.md](/src/homelab-compose/docs-backup-strategy-md/).

## Hardware

- Intel NUC 12 Pro (i5, 64 GB RAM, 2 TB NVMe for system + hot data)
- External 4-bay DAS with 4 x 12 TB CMR drives in a `btrfs raid1`
  for media/photos/documents
- UPS wired via USB, monitored by NUT

Runs Debian stable with unattended-upgrades. Everything below is
Docker Compose on top of that; no Kubernetes, no Portainer, no
agents.

## Network topology

    +-----------------------------+
    |      Internet (IPv4)        |
    +--------------+--------------+
                   |
             (home router)
                   |
          +--------v--------+
          |   Caddy (443)   |  TLS, reverse proxy
          +--------+--------+
                   |
    +--------------+--------------+
    |     Docker bridge network   |
    | (name: homelab)             |
    |                             |
    |  jellyfin  immich  paperless|
    |  gitea     syncthing pi-hole|
    |  grafana   prometheus       |
    +-----------------------------+

- Caddy is the only container with ports on the host.
- All service-to-service traffic stays on the `homelab` bridge
  network.
- Pi-hole uses `network_mode: host` so it can bind UDP 53 without
  fighting Docker's userland proxy. As a consequence it is not on
  the `homelab` bridge; the other containers reach it at the host's
  LAN IP instead.
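
Stripped down to the networking bits, the compose shape is roughly
the following sketch. Tags and most options are elided here; the
real file is `docker-compose.yml`.

```yaml
networks:
  homelab:

services:
  caddy:
    ports:
      - "443:443"            # the only host port binding in the stack
    networks: [homelab]

  jellyfin:                  # same shape for immich, paperless, gitea, ...
    networks: [homelab]      # no ports: reachable only through Caddy

  pihole:
    network_mode: host       # binds UDP 53 directly; no userland proxy
    # note: network_mode: host and networks: are mutually exclusive
```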

Split-horizon DNS: my router forwards every internal DNS query to
Pi-hole. Pi-hole resolves `*.home.example.net` to the NUC's LAN IP,
which hits Caddy on port 443. Public DNS has no entries for these
names; they are resolvable only inside the LAN.
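
On the Pi-hole side, a wildcard like that can be a one-line dnsmasq
drop-in. The file name and the LAN IP below are placeholders, not
copied from this setup.

```
# /etc/dnsmasq.d/99-homelab.conf -- answer every *.home.example.net
# query with the NUC's LAN IP, where Caddy listens on 443.
address=/home.example.net/192.168.1.10
```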

## Services and what they do

| Service    | Public path                  | Port | State bind mount              | Backup priority |
|------------|------------------------------|------|-------------------------------|-----------------|
| Caddy      | n/a (is the proxy)           | 443  | `caddy_data`, `caddy_config`  | low (regenerable) |
| Jellyfin   | `jellyfin.home.example.net`  | int  | `/srv/media`, `jellyfin_cfg`  | medium          |
| Immich     | `photos.home.example.net`    | int  | `/srv/photos`, `immich_db`    | high            |
| Paperless  | `docs.home.example.net`      | int  | `/srv/docs`, `paperless_db`   | high            |
| Gitea      | `git.home.example.net`       | int  | `gitea_data`, `gitea_db`      | high            |
| Syncthing  | `sync.home.example.net`      | int  | `/srv/sync`                   | high            |
| Pi-hole    | `pihole.home.example.net`    | 53   | `pihole_etc`                  | low             |
| Prometheus | `metrics.home.example.net`   | int  | `prometheus_tsdb`             | low             |
| Grafana    | `grafana.home.example.net`   | int  | `grafana_data`                | low             |

See [`docker-compose.yml`](/src/homelab-compose/docker-compose-yml/)
for exact image tags and volume bindings.

"int" means "only reachable through Caddy." The services have no
host-level port bindings.

## Trust boundaries

There are three concentric circles of trust:

    +------------------------------------+
    |  Internet                          |
    |   Blocked at router. Only Tailscale|
    |   tunnels for remote access.       |
    +------------------------------------+
           |
    +------v-----------------------------+
    |  LAN                               |
    |   Caddy answers 443, terminates TLS|
    |   with certs from internal CA.     |
    |   Every app is accessible here.    |
    +------------------------------------+
           |
    +------v-----------------------------+
    |  Docker bridge 'homelab'           |
    |   App-to-app traffic on this net.  |
    |   Prometheus scrapes from here,    |
    |   nothing leaves except through    |
    |   Caddy.                           |
    +------------------------------------+

No container runs with `--privileged`. Jellyfin has `/dev/dri` for
QuickSync; Immich has the ML model volume mounted read-only; Pi-hole
is the only one with `network_mode: host` and that comes with
`cap_add: [NET_ADMIN]`, scoped as narrowly as possible.
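
In compose terms those grants look roughly like this fragment.
Everything else is elided, and the `immich-machine-learning` service
and `immich_ml_models` volume names are assumptions, not copied from
the real file.

```yaml
services:
  jellyfin:
    devices:
      - /dev/dri:/dev/dri            # Intel QuickSync, no --privileged

  immich-machine-learning:           # assumed service name
    volumes:
      - immich_ml_models:/cache:ro   # model volume mounted read-only

  pihole:
    network_mode: host
    cap_add:
      - NET_ADMIN                    # the narrow grant host networking needs
```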

Secrets (database passwords, Grafana admin password, Immich DB
password) live in `.env` which is not committed. See
[`.env.example`](/src/homelab-compose/env-example/) for the full list
of variables.

## Reverse proxy: Caddy

Caddy config is in [`Caddyfile`](/src/homelab-compose/caddyfile/). For
each service, a block like:

    jellyfin.home.example.net {
      tls /data/certs/home.example.net.crt /data/certs/home.example.net.key
      reverse_proxy jellyfin:8096
    }

Certificates come from my internal step-ca. Caddy does not use ACME
because the services are not publicly reachable. The cert volume
(`/data/certs`) is synced from step-ca via a small `renew-cert.sh`
script run weekly by cron; Caddy reloads its config on file change
automatically.
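
A sketch of the decision logic such a script could use. The cert
paths, the 14-day window, and the `step ca renew` invocation are
assumptions about `renew-cert.sh`, not a copy of it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of renew-cert.sh's renewal decision: renew the
# internal cert once it is close to expiry, then let Caddy pick up the
# changed files on its own.
set -euo pipefail

RENEW_BEFORE_DAYS=14   # assumed renewal window

# Whole days between an expiry epoch and a "now" epoch.
days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# Exit 0 when the cert is close enough to expiry to renew.
needs_renewal() {
  [ "$(days_left "$1" "$2")" -lt "$RENEW_BEFORE_DAYS" ]
}

# In the weekly cron job this would be fed real values, roughly:
#   expiry=$(date -d "$(openssl x509 -enddate -noout -in "$CERT" | cut -d= -f2)" +%s)
#   needs_renewal "$expiry" "$(date +%s)" && step ca renew --force "$CERT" "$KEY"
```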

Grafana gets an extra `basic_auth` block in front as a second layer
of auth, added in `9a0bdf4`. This is belt-and-suspenders; Grafana's
own auth is enabled, but this catches anything misconfigured.
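
The Grafana site block then looks roughly like this; the username
and hash are placeholders (`caddy hash-password` generates the
bcrypt value Caddy expects).

```
grafana.home.example.net {
  tls /data/certs/home.example.net.crt /data/certs/home.example.net.key
  basic_auth {
    # placeholder user; generate the hash with `caddy hash-password`
    admin <bcrypt-hash>
  }
  reverse_proxy grafana:3000
}
```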

## Storage layout

    /srv/
      data/        bind-mount root for service state
        jellyfin/
        immich/
        paperless/
        gitea/
      media/       Jellyfin library, read-mostly
      photos/      Immich originals
      docs/        Paperless consume + archive
      sync/        Syncthing shared folder
      backup/      rsync --link-dest snapshots (see backup-strategy.md)

`/srv/data` is on NVMe. `/srv/media`, `/srv/photos`, `/srv/docs`,
`/srv/sync` are on the RAID1. `/srv/backup` is on NVMe for speed,
rotated to B2 nightly.

## Update policy

Pinned image tags. I don't use `:latest` for anything except
Caddy, and I might change my mind about Caddy too. Sunday morning,
I run:

    docker compose pull
    docker compose up -d

...and read the changelogs. `watchtower` was removed in `5512de8`
because I'd rather eat the manual work than explain a silent
regression to myself.
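
A quick way to check that the no-`:latest` rule still holds before
pulling. The grep/awk pipeline over `docker-compose.yml` assumes a
plain `image:` line per service; it is a sketch, not part of the
repo.

```shell
#!/usr/bin/env bash
# Pre-update check: list image references that are not pinned to a
# specific tag.
set -euo pipefail

# Read image references on stdin, print the unpinned ones
# (":latest" or no tag at all; registries with ports would need more care).
unpinned() {
  grep -E '(:latest$|^[^:]+$)' || true
}

# Usage:
#   grep -E '^[[:space:]]*image:' docker-compose.yml | awk '{print $2}' | unpinned
```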

## Backup flow

Summarised here; details in
[docs/backup-strategy.md](/src/homelab-compose/docs-backup-strategy-md/).

    nightly 03:15 -> scripts/backup.sh
        |
        +-- rsync --link-dest /srv/data -> /srv/backup/YYYY-MM-DD/
        +-- database dumps (immich, paperless, gitea)
        +-- rclone copy newest snapshot -> B2
        +-- prune old snapshots locally (keep 30d + 12m)
        +-- prune old snapshots on B2 (same policy)
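
One way to implement the "keep 30d + 12m" rule, treating
first-of-month snapshots as the monthlies. This is a sketch of the
policy, not the actual logic in `scripts/backup.sh`.

```shell
#!/usr/bin/env bash
# Pruning sketch: keep the newest 30 daily snapshots plus the newest
# 12 first-of-month snapshots; everything else is fair game.
set -euo pipefail

# Read snapshot names (YYYY-MM-DD, one per line) on stdin and print
# the ones that can be deleted.
prunable() {
  sort -r | awk '
    {
      keep = 0
      if (NR <= 30) keep = 1                         # newest 30 dailies
      split($0, d, "-"); month = d[1] "-" d[2]
      if (d[3] == "01" && !(month in seen) && months < 12) {
        seen[month] = 1; months++; keep = 1          # newest 12 monthlies
      }
      if (!keep) print
    }'
}
```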

Health check runs hourly:

    scripts/health-check.sh
        curl each service's healthz, mail on non-200
        (exit 2 on warn so cron's MAILTO fires; see eab2c71)
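
The exit-code convention can be sketched like this; the `/healthz`
path and hostnames in the comment are assumptions, and the real
probe is `scripts/health-check.sh`.

```shell
#!/usr/bin/env bash
# Sketch of the health check's contract: exit 0 when everything is
# healthy, exit 2 on any non-200 so cron's MAILTO fires (see eab2c71).
set -euo pipefail

# Map a batch of HTTP status codes to an overall exit code.
overall_status() {  # args: one or more HTTP status codes
  local rc=0
  for code in "$@"; do
    [ "$code" = "200" ] || rc=2   # warn, don't hard-fail
  done
  return "$rc"
}

# In the cron job each code would come from something like:
#   curl -s -o /dev/null -w '%{http_code}' "https://$svc.home.example.net/healthz"
```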

## Prometheus and Grafana

Prometheus scrapes:

- cadvisor for container metrics
- node_exporter for host metrics
- Pi-hole's internal stats endpoint
- Jellyfin's Prometheus export
- the Caddy admin endpoint (on the Docker bridge, not exposed)
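
As a `prometheus.yml` fragment, that scrape set would look something
like the following. Job names and ports are illustrative (2019 is
Caddy's default admin port), not copied from the repo.

```yaml
scrape_configs:
  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: node
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: caddy
    metrics_path: /metrics
    static_configs:
      - targets: ["caddy:2019"]   # admin API, reachable only on the bridge
```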

Grafana dashboards are provisioned from
`grafana/provisioning/dashboards/`. Dashboards are versioned;
edits in the UI get committed back manually when they stabilise.

## How things talk

- Jellyfin, Immich, Paperless, Gitea each have their own Postgres
  (or SQLite for Gitea). No shared DB.
- All Postgres containers use `pg_isready` as a healthcheck.
- Immich's ML container is a separate service pinned to a specific
  model version (`12fce10`); it shares a volume with the Immich
  server for model cache.
- Syncthing discovers peers via the global discovery servers on the
  public internet; its listening port is bound only on the LAN.
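
A healthcheck of that shape, sketched for one of the databases. The
service name and image tag are illustrative; the doubled `$$` keeps
compose from interpolating the variables itself.

```yaml
services:
  immich-db:
    image: postgres:16     # illustrative tag
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
      interval: 30s
      timeout: 5s
      retries: 3
```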

## What I don't run

By choice:

- **Nextcloud**. I tried it twice; too many moving parts for what I
  need. Syncthing + Paperless + Immich covers the actual use cases.
- **Kubernetes**. One machine, one user, one compose file. k3s would
  be an entertainment project, not a homelab improvement.
- **A dedicated Vaultwarden**. I use Bitwarden as a paid user; no
  reason to self-host.

## If I were starting over

- Same Docker Compose shape. One machine, one file.
- Same single-Caddy pattern. Certs from step-ca.
- Probably on ZFS instead of btrfs; native encryption and scrub
  reporting are nicer. btrfs has been fine though.
- I'd put the database dumps on a separate volume so their rsync
  throughput doesn't fight the media library during backups.

## Files in this repo

| Path                                                                                 | Purpose                                |
|-------------------------------------------------------------------------------------|----------------------------------------|
| [`docker-compose.yml`](/src/homelab-compose/docker-compose-yml/)                    | all services                           |
| [`Caddyfile`](/src/homelab-compose/caddyfile/)                                      | reverse proxy config                   |
| [`.env.example`](/src/homelab-compose/env-example/)                                 | required env vars                      |
| [`scripts/backup.sh`](/src/homelab-compose/scripts-backup-sh/)                      | nightly backup                         |
| [`scripts/restore.sh`](/src/homelab-compose/scripts-restore-sh/)                    | restore from snapshot                  |
| [`scripts/health-check.sh`](/src/homelab-compose/scripts-health-check-sh/)          | hourly probe of each service           |