docs/bpf-notes.md

# BPF Notes

Informal notes I keep for myself while working on the kernel side of
httptap. If you have run into verifier strangeness, a kernel version
that refuses to load the object, or want to know why I didn't use maps
for X, skim this first.

## Supported kernels

I test on:

- 5.15 (Ubuntu 22.04)
- 6.1 (Debian 12)
- 6.6 (Arch rolling at the time of writing)
- 6.11 (my laptop)

The floor is 5.8 because that's when BPF ring buffers (`BPF_MAP_TYPE_RINGBUF`)
landed. Before that, the only option was perf event arrays, and I decided the
per-CPU accounting and lost-sample tracking weren't worth carrying around two
code paths. Anyone stuck on an older kernel (e.g. 5.4) can use an older commit
(`2e5b8d0` was the last release on perf arrays).
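A userspace pre-check for the 5.8 floor could look like the sketch below. It is an illustration, not what httptap ships. It only parses the major/minor part of a `uname -r` style release string (on Linux the same string is in `/proc/sys/kernel/osrelease`), and `hasRingbuf` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// 5.8 is the first kernel with BPF_MAP_TYPE_RINGBUF.
const minMajor, minMinor = 5, 8

// parseRelease extracts major and minor numbers from a `uname -r` string
// such as "5.15.0-91-generic" or "6.11.2-arch1-1".
func parseRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("major version in %q: %w", release, err)
	}
	// Without a patch level the minor part can carry a suffix, e.g. "6.1-rc3".
	minorStr := parts[1]
	if i := strings.IndexFunc(minorStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minorStr = minorStr[:i]
	}
	if minor, err = strconv.Atoi(minorStr); err != nil {
		return 0, 0, fmt.Errorf("minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// hasRingbuf reports whether the release is at or above the 5.8 floor.
func hasRingbuf(release string) (bool, error) {
	major, minor, err := parseRelease(release)
	if err != nil {
		return false, err
	}
	return major > minMajor || (major == minMajor && minor >= minMinor), nil
}

func main() {
	for _, r := range []string{"5.4.0-150-generic", "5.15.0-91-generic", "6.11.2-arch1-1"} {
		ok, err := hasRingbuf(r)
		fmt.Println(r, ok, err)
	}
}
```

Distribution kernels sometimes backport BPF features, so a version string is only a heuristic. A probe that actually tries to create a `RINGBUF` map is the more reliable test.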

Uretprobes themselves go back much further, so the TLS hooking side of
things is fine on any modern kernel; the constraint is purely the
transport.

## Ring buffers vs. perf arrays

I picked ring buffers because:

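- one ring buffer shared across all CPUs, so no per-CPU fan-out in userspace
- producers reserve space and write each event in place (reserve/commit),
  so there is no extra copy; only the reservation itself takes a brief
  internal spinlock, and the commit is lock-free
- drops show up at the producer: a failed reservation returns NULL, so
  losses can be counted in BPF instead of surfacing at poll time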

The downside is that one slow consumer affects every CPU. Once the shared
buffer fills, reservations fail everywhere and events are dropped (BPF
programs can't block waiting for space). In practice the cilium/ebpf reader
hits ~3 Mevt/s on my laptop, which is more than enough for a debugging
tool. If I ever need to trace 10 Gbit/s traffic I'll revisit, but that's
not what httptap is for.
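The shared-buffer trade-off is easier to see in a toy model. This is not how the kernel implements `BPF_MAP_TYPE_RINGBUF`, which works in bytes with variable-size records. Here a buffered channel stands in for the single buffer that all CPUs share. Producers never block, and each producer counts its own drops at reservation time. `toyRing` and its methods are hypothetical names.

```go
package main

import "fmt"

// toyRing is a toy model of a BPF ring buffer: one bounded buffer shared by
// all CPUs. It counts records rather than bytes, unlike the real thing.
type toyRing struct {
	buf     chan string
	dropped []int // per-CPU drops, counted by the producer at reserve time
}

func newToyRing(capacity, ncpu int) *toyRing {
	return &toyRing{buf: make(chan string, capacity), dropped: make([]int, ncpu)}
}

// produce never blocks (BPF programs cannot wait for space): if there is
// no room, the reservation fails and the producer records the loss.
func (r *toyRing) produce(cpu int, ev string) bool {
	select {
	case r.buf <- ev:
		return true
	default:
		r.dropped[cpu]++
		return false
	}
}

// drain consumes up to n records, standing in for the userspace reader.
func (r *toyRing) drain(n int) []string {
	var out []string
	for i := 0; i < n; i++ {
		select {
		case ev := <-r.buf:
			out = append(out, ev)
		default:
			return out
		}
	}
	return out
}

func main() {
	r := newToyRing(4, 2)
	// The consumer is stalled while both CPUs produce.
	for i := 0; i < 4; i++ {
		r.produce(i%2, fmt.Sprintf("cpu%d-ev%d", i%2, i))
	}
	// The shared buffer is now full, so both CPUs start dropping.
	r.produce(0, "late0")
	r.produce(1, "late1")
	fmt.Println(r.dropped)        // [1 1]
	fmt.Println(len(r.drain(10))) // 4
}
```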

## Verifier quirks

Things the verifier makes you do that feel odd the first time:

1. **Bounded loops only.** I initially wrote a tiny loop to memset the
   event struct, and the verifier got angry. I replaced it with
   `__builtin_memset`, which LLVM's BPF backend expands inline into plain
   stores when the size is a compile-time constant, so the verifier never
   sees a loop.
2. **`bpf_probe_read_user` return checks.** Every copy needs the
   return value checked or the verifier will reject downstream reads
   of the buffer. I gate everything with `if (ret < 0) return 0;`.
3. **Map lookups return pointers that must be null-checked.** This is
   obvious but easy to forget when you're writing what feels like C.
4. **Stack is small.** 512 bytes. My flow struct is 112 bytes; anything
   bigger goes in a percpu array map as scratch.

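One thing that bit me early: the verifier tracks, branch by branch, whether
a pointer may be NULL. If you write
`ptr = map_lookup(); if (ptr) {...} *ptr = x;`, it rightly rejects the
dereference outside the `if`, because on the fall-through path `ptr` may
still be NULL. I now use an early-return pattern everywhere
(`if (!ptr) return 0;` straight after the lookup).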

## Uretprobe specifics

Uretprobes fire when the target function returns. That means you get
the return value for free. The arguments are gone by then, though, because
the registers that carried them may have been overwritten, so anything you
need from them has to be cached at entry. For `SSL_read`, the buffer
pointer is the second argument and the return value is the number of bytes
read. I cache `(ssl, buf)` in a per-task map on entry via a `uprobe`, and
retrieve it in the uretprobe.
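The entry/return pairing can be modelled in userspace as below. This is a hypothetical sketch, not the BPF code. A Go map stands in for the per-task BPF map, and the key is assumed to be the value `bpf_get_current_pid_tgid()` returns: tgid in the upper 32 bits, thread id in the lower 32. The point is the lifecycle. Stash on entry, then look up and delete on return, and emit only on success.

```go
package main

import "fmt"

// pidTgid builds the key the way bpf_get_current_pid_tgid() reports it:
// tgid (process id) in the upper 32 bits, thread id in the lower 32.
func pidTgid(tgid, tid uint32) uint64 { return uint64(tgid)<<32 | uint64(tid) }

// sslReadArgs is what the entry probe has to save: by the time the
// uretprobe fires, the argument registers may have been overwritten.
type sslReadArgs struct {
	ssl, buf uint64 // SSL* and destination buffer, as raw addresses
}

type readEvent struct {
	tgid, tid uint32
	buf       uint64 // where the plaintext landed in the traced process
	n         int64  // SSL_read return value: bytes read
}

// inflight stands in for the per-task BPF hash map.
type inflight map[uint64]sslReadArgs

// onEntry models the uprobe: stash the arguments under the task's key.
func (m inflight) onEntry(key uint64, args sslReadArgs) { m[key] = args }

// onReturn models the uretprobe: fetch and delete the stashed arguments,
// and only produce an event for a successful read (ret > 0).
func (m inflight) onReturn(key uint64, ret int64) (readEvent, bool) {
	args, ok := m[key]
	if !ok {
		return readEvent{}, false // entry never seen (e.g. attached mid-call)
	}
	delete(m, key)
	if ret <= 0 {
		return readEvent{}, false
	}
	return readEvent{tgid: uint32(key >> 32), tid: uint32(key), buf: args.buf, n: ret}, true
}

func main() {
	m := inflight{}
	key := pidTgid(1234, 1240)
	m.onEntry(key, sslReadArgs{ssl: 0x7f0000001000, buf: 0x7f0000002000})
	ev, ok := m.onReturn(key, 512)
	fmt.Printf("%+v %v\n", ev, ok)
}
```

Keying per thread rather than per process matters: two threads of the same process can be inside `SSL_read` at the same time.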

There's a race here that is unfixable without kernel changes: if the
target process forks between entry and return, the child inherits
nothing. In practice this never matters for TLS I/O but I mention it
for completeness.

## Symbol resolution for Go binaries

Go's crypto/tls doesn't go through libssl; it is compiled into the binary
itself. The tracer reads the ELF symbol table and (if present) DWARF debug
info to find `crypto/tls.(*Conn).Write` and `crypto/tls.(*Conn).Read`.
Stripped binaries have neither; for those, `--go-symbols` accepts a path
to an un-stripped copy.

Where the arguments live depends on the Go version. On amd64 they are on
the stack (ABI0) before Go 1.17 and in registers (ABIInternal) from 1.17
onward. The BPF code supports both via `BPF_CORE_READ_STR_INTO`, dispatched
on a version-detected flag that userspace sets in a `.rodata` variable at
load time.
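The version detection can be done from the target binary itself, because Go records its version in the build info. Below is a sketch using the standard `debug/buildinfo` (Go 1.18+) and `debug/elf` packages; the function names are hypothetical. The arm64 threshold goes beyond the amd64 case described above: Go 1.18 is when the register ABI reached arm64. The boolean it produces is the kind of value that would go into the `.rodata` flag.

```go
package main

import (
	"debug/buildinfo"
	"debug/elf"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// goMinor turns "go1.21.5", "go1.17" or "go1.22rc1" into the minor number.
func goMinor(v string) (int, error) {
	if !strings.HasPrefix(v, "go1.") {
		return 0, fmt.Errorf("unrecognised Go version %q", v)
	}
	rest := strings.TrimPrefix(v, "go1.")
	end := 0
	for end < len(rest) && rest[end] >= '0' && rest[end] <= '9' {
		end++
	}
	return strconv.Atoi(rest[:end])
}

// usesRegABI reports whether functions take their arguments in registers
// (ABIInternal) rather than on the stack (ABI0).
// amd64 switched in Go 1.17; arm64 followed in Go 1.18.
func usesRegABI(goVersion, goarch string) (bool, error) {
	minor, err := goMinor(goVersion)
	if err != nil {
		return false, err
	}
	switch goarch {
	case "amd64":
		return minor >= 17, nil
	case "arm64":
		return minor >= 18, nil
	default:
		return false, fmt.Errorf("unsupported architecture %q", goarch)
	}
}

// detect reads the Go version from the binary's build info and the
// architecture from its ELF header.
func detect(path string) (bool, error) {
	info, err := buildinfo.ReadFile(path)
	if err != nil {
		return false, err
	}
	f, err := elf.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()
	arch := map[elf.Machine]string{elf.EM_X86_64: "amd64", elf.EM_AARCH64: "arm64"}[f.Machine]
	return usesRegABI(info.GoVersion, arch)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: goabi <go-binary>")
		return
	}
	regABI, err := detect(os.Args[1])
	fmt.Println("register ABI:", regABI, "err:", err)
}
```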

## Maps in use

| Map                    | Type                    | Entries  | Purpose                                     |
|------------------------|-------------------------|----------|---------------------------------------------|
| `events`               | RINGBUF                 | 256KB    | outbound events to userspace                |
| `pid_ssl_to_flow`      | HASH                    | 65536    | `(pid, ssl*)` -> flow_id                    |
| `entry_scratch`        | PERCPU_ARRAY            | 1        | per-CPU scratch for entry probe             |
| `go_tls_args`          | HASH                    | 4096     | pid+tid -> entry args for Go probes         |

None of them are large; the ringbuf dominates at 256KB (for `RINGBUF`
maps, `max_entries` is the buffer size in bytes rather than an entry
count). `pid_ssl_to_flow` can fill up if you attach to a noisy proxy; at
that point we start returning a fresh flow id and stamping events
"reassembly lost".
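The overflow policy for `pid_ssl_to_flow` comes down to the logic below. It is shown as a Go model with a bounded map rather than as BPF code, and `flowTable` and `flowFor` are hypothetical names. In the BPF program the full-table case corresponds to the map update failing, since a full `HASH` map rejects new keys with `-E2BIG`.

```go
package main

import "fmt"

// flowKey mirrors the (pid, ssl*) key of pid_ssl_to_flow.
type flowKey struct {
	pid uint32
	ssl uint64
}

// flowTable models a fixed-capacity hash map plus a flow id counter.
type flowTable struct {
	max  int
	ids  map[flowKey]uint64
	next uint64
}

// flowFor returns the flow id for k and whether reassembly is lost. If k
// is new and the table is full, it hands out a one-off id that is not
// remembered, so later events for k will not share it.
func (t *flowTable) flowFor(k flowKey) (uint64, bool) {
	if id, ok := t.ids[k]; ok {
		return id, false
	}
	t.next++
	if len(t.ids) >= t.max {
		return t.next, true // stamp the event "reassembly lost"
	}
	t.ids[k] = t.next
	return t.next, false
}

func main() {
	t := &flowTable{max: 2, ids: map[flowKey]uint64{}}
	for _, k := range []flowKey{{1, 0x10}, {1, 0x10}, {2, 0x20}, {3, 0x30}} {
		id, lost := t.flowFor(k)
		fmt.Println(k.pid, id, lost)
	}
}
```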

## CO-RE and kernel variation

I use `libbpf` CO-RE (`BPF_CORE_READ`) throughout. No BCC, no
per-kernel rebuilds. The BPF object is compiled once in CI and the
field offsets for a given kernel are resolved at load time. This is
the single biggest quality-of-life improvement over the early
versions of this repo.

Older kernels (5.8, 5.10) have slightly different `struct task_struct`
field names; CO-RE handles it as long as the BTF for that kernel is
present. Distributions ship BTF by default these days; on homegrown
kernels you may need `CONFIG_DEBUG_INFO_BTF=y`.

## Things I tried and reverted

- **Using `bpf_tail_call` to chain a parser in-kernel.** Saved one
  ring buffer copy, added a giant pile of complexity. The parser was
  also very hard to reason about with the verifier. It went away in
  `f8e0a12` ("tracer: switch to cilium/ebpf for arm64 support"), which
  swept it out along with the rest of that change.
- **Uprobe instead of uretprobe on `SSL_write`.** Gave me the buffer at
  entry, which is what I wanted, but the event was emitted before the
  write's outcome was known, so failed and partial writes were
  ambiguous. A uretprobe plus the return value makes life easier.
- **Embedding Go symbols in the BPF object.** Too fragile across Go
  versions. Resolving at runtime wins.

## Debugging tips

- `bpftool prog list` to see what attached.
- `bpftool map dump id <n>` to inspect `pid_ssl_to_flow`.
- `cat /sys/kernel/debug/tracing/trace_pipe` if you sprinkle
  `bpf_printk` while developing. Strip them before merging.
- `llvm-objdump -d tracer.bpf.o` to see what the compiler produced
  when the verifier yells at a specific instruction offset.

## When it won't load

If `tracer_bpf.o` fails to load, in order of likelihood:

1. You're not root, and the process lacks `CAP_BPF` plus `CAP_PERFMON`
   (both needed to load tracing programs without root)
2. Kernel predates 5.8 (check `uname -r`)
3. `/sys/kernel/btf/vmlinux` missing (install `linux-image-*-dbg` or
   equivalent, or rebuild kernel with `CONFIG_DEBUG_INFO_BTF=y`)
4. `libssl.so` is BoringSSL and the symbol names differ; pass
   `--probe SSL_read:path/to/lib` to disable auto-detection

The userspace side logs the exact verifier message on load failure.
That log is almost always enough to diagnose the problem without
needing `veristat`.