Why we picked NATS over Kafka (and when we would have picked Kafka)
Abstract
“Why we picked X over Y” talks are usually post-hoc rationalizations for a decision someone made in a meeting you were not in. I tried to make this one different by actually presenting the matrix we used, the things we got wrong about that matrix, and the workloads where I would pick Kafka today if I were doing it again. No winners, no losers, just a team of four engineers trying to move messages between services without pretending to be LinkedIn.
Outline
- The workload: ~40k msg/sec, at-least-once, low latency, multi-region
- The matrix we started with: ops overhead, client maturity, storage cost, replay
- Where NATS JetStream surprised us (the good): ops simplicity, subject wildcards
- Where NATS JetStream surprised us (the bad): consumer semantics, unclear reliability boundaries
- The Kafka features we ended up re-implementing badly in NATS
- The Kafka features we genuinely did not need
- If I were starting over: the one question that would flip the decision
- Q&A
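For a sense of why storage cost and replay sit next to each other in that matrix, here is a rough sizing sketch for the workload above. Only the 40k msg/sec figure comes from the talk. The average message size, retention window, and replica count are placeholder assumptions, not our real numbers, so treat the result as an order-of-magnitude estimate.

```python
# Back-of-envelope retained-storage estimate for a replayable stream.
# Only MSGS_PER_SEC comes from the workload described in the outline;
# every other constant is a hypothetical placeholder.

MSGS_PER_SEC = 40_000      # from the talk: ~40k msg/sec
AVG_MSG_BYTES = 1024       # assumption: 1 KiB average message
RETENTION_DAYS = 7         # assumption: one week of replay
REPLICAS = 3               # assumption: three copies of each message

SECONDS_PER_DAY = 86_400
BYTES_PER_TIB = 2**40


def retained_bytes(msgs_per_sec: int, avg_msg_bytes: int,
                   retention_days: int, replicas: int) -> int:
    """Total bytes on disk once the retention window is full."""
    bytes_per_day = msgs_per_sec * avg_msg_bytes * SECONDS_PER_DAY
    return bytes_per_day * retention_days * replicas


def to_tib(n_bytes: int) -> float:
    """Convert a byte count to tebibytes."""
    return n_bytes / BYTES_PER_TIB


if __name__ == "__main__":
    total = retained_bytes(MSGS_PER_SEC, AVG_MSG_BYTES,
                           RETENTION_DAYS, REPLICAS)
    print(f"retained: {to_tib(total):.1f} TiB")
```

Under these assumptions, doubling the replay window doubles the storage bill, which is why the two criteria are hard to score independently.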
What I learned giving it
The Kafka users in the room were much friendlier than I expected. I think the honest “here is what we got wrong, here is when we would pick yours” framing helped. Nobody wants a vendor-pitch talk at a meetup.
The question I answered badly: “what does your disaster recovery story look like if an entire NATS cluster dies?” We do have a DR story, but my explanation of it was less rehearsed than I thought, and I stumbled. That is going in the next version of the deck.
What I’d change
- Put the decision matrix on a slide as early as possible. I buried it on slide 12 and it is the thing most people wanted to see.
- Cut the history section. I spent four minutes on the evolution from NATS Streaming to JetStream and nobody cared.
- Add a slide titled “when we would pick Kafka” right before Q&A. Setting up the question lets you answer it on your own terms instead of defensively.
Related posts: /posts/otel-tail-sampling-that-works/, /posts/tracing-head-vs-tail-sampling/, /posts/k8s-operator-reconcile-loop/.
The meetup organizers recorded it, but the audio clips badly enough that I am not linking to it publicly. Happy to send it if you email me.