Moving a service from Go to Rust, an honest report
For four months last year I worked on rewriting an internal service from Go to Rust. I want to write down what that was actually like, because the “migration from X to Y” posts I read before starting this project were mostly of the “we did it, it was great, here are some perfect numbers” variety, and I think the reality is more complicated.
The service. An internal data-joining service. Consumes events from Kafka, looks up related records from a Postgres database, emits enriched events back to Kafka. Stateless per-replica, scaled horizontally. About 8,000 lines of Go.
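To make the shape concrete, here's a minimal sketch of that pipeline in Rust. This is not our production code: the topic names, event schema, SQL query, and broker/database addresses are invented for illustration, and it assumes tokio, rdkafka (recent versions with the StreamConsumer::recv API), sqlx, serde/serde_json, and anyhow as dependencies.

```rust
use rdkafka::config::ClientConfig;
use rdkafka::consumer::{Consumer, StreamConsumer};
use rdkafka::producer::{FutureProducer, FutureRecord};
use rdkafka::Message;
use sqlx::PgPool;
use std::time::Duration;

#[derive(serde::Serialize, serde::Deserialize)]
struct Event {
    user_id: i64,
    payload: String,
    #[serde(default)]
    account_name: Option<String>, // filled in by the enrichment step
}

async fn run(consumer: StreamConsumer, producer: FutureProducer, pool: PgPool) -> anyhow::Result<()> {
    consumer.subscribe(&["events.raw"])?;
    loop {
        let msg = consumer.recv().await?;
        let Some(bytes) = msg.payload() else { continue };
        // A malformed message aborts the loop here; real code would skip it.
        let mut event: Event = serde_json::from_slice(bytes)?;

        // Enrichment: one lookup against Postgres (table and query invented).
        event.account_name =
            sqlx::query_scalar("SELECT name FROM accounts WHERE user_id = $1")
                .bind(event.user_id)
                .fetch_optional(&pool)
                .await?;

        // Emit the enriched event back to Kafka.
        let key = event.user_id.to_string();
        let out = serde_json::to_vec(&event)?;
        producer
            .send(
                FutureRecord::to("events.enriched").key(&key).payload(&out),
                Duration::from_secs(5),
            )
            .await
            .map_err(|(e, _msg)| e)?;
    }
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let consumer: StreamConsumer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .set("group.id", "enricher")
        .create()?;
    let producer: FutureProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        .create()?;
    let pool = PgPool::connect("postgres://localhost/enrich").await?;
    run(consumer, producer, pool).await
}
```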
Why move? The honest answer is that memory cost was growing, and our budget wasn’t. The Go service was sitting at about 1.8GB per replica and we had 40 replicas. Under burst traffic, the heap would spike and we’d get OOM-kills occasionally. We could throw hardware at it, but the architecture team was asking for efficiency wins.
There was also a non-honest reason, which is that a few of us just wanted to write Rust for this service. Management was happy to let us, because we promised wins. This mixture of motivations complicates the retrospective.
The rewrite.
We spent about 4 calendar months on this, with two engineers part-time. Here’s where the time went:
- ~2 weeks: setting up the project, choosing crates (tokio, rdkafka, sqlx, serde), getting the cargo workspace structured
- ~4 weeks: porting the event pipeline logic, including all the enrichment rules
- ~3 weeks: porting the Postgres access layer and matching our connection pooling behavior (a pool-config sketch follows this list)
- ~3 weeks: porting metrics and tracing to match our observability stack
- ~3 weeks: staging rollout, catching edge cases, handling backward compatibility with event schemas
- ~1 week: production cutover, debugging, monitoring
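Matching the pooling behavior mostly meant mapping Go's database/sql knobs onto sqlx's pool options. A minimal sketch, assuming the sqlx 0.7 API; the numbers are illustrative, not our production values:

```rust
use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

async fn make_pool(url: &str) -> Result<sqlx::PgPool, sqlx::Error> {
    PgPoolOptions::new()
        .max_connections(25)                         // ~ SetMaxOpenConns
        .min_connections(5)                          // keep some warm connections
        .acquire_timeout(Duration::from_secs(3))     // fail fast if the pool is exhausted
        .idle_timeout(Duration::from_secs(10 * 60))  // ~ SetConnMaxIdleTime
        .max_lifetime(Duration::from_secs(30 * 60))  // ~ SetConnMaxLifetime
        .connect(url)
        .await
}
```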
What I expected to gain.
- Lower memory usage. Rust’s ownership model means no GC, no background heap growth, just the memory you need.
- Better CPU efficiency. The benchmarks I'd read suggested Rust is often 2-3x faster than Go for comparable workloads.
- Fewer runtime bugs. The borrow checker catches some classes of concurrency errors.
What I actually gained.
- Memory dropped from 1.8GB to about 380MB per replica. That’s a 4-5x reduction, bigger than expected.
- CPU dropped modestly, maybe 30%. The service was more memory-bound than CPU-bound, so CPU gains were less significant.
- No deadlocks, no data races in production over 6 months. I’m pretty sure we had some in the Go version that we never noticed because they weren’t severe.
- Bonus: binary size dropped from 40MB (Go) to 11MB (Rust release). Startup time improved.
What I didn’t expect to lose.
- Iteration speed. Go compiles in seconds; Rust compiles in tens of seconds for a clean build, and our full test suite went from 12 seconds in Go to 48 seconds in Rust. Doesn’t sound like much, but it’s felt.
- Debuggability. Go has pprof, which works out of the box with no code changes. Getting the equivalent in Rust took real effort: we settled on tokio-console for tasks, pprof-rs for CPU, and jemalloc profiling for memory (a profiling sketch follows this list). Each worked, but setup took days.
- Team velocity, temporarily. It took about 3 months before I was as productive in Rust as I was in Go for similar tasks. The other engineer on the project was faster in Go throughout.
- Library quality. rdkafka is fine but has rough edges. sqlx is great for simple queries but has quirks around transactions and connection pools. We ran into two library bugs that required upstream patches.
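For the CPU side, here's roughly what the pprof-rs path looks like. A sketch assuming the pprof crate with its flamegraph feature enabled; the sampling rate, duration, and output path are arbitrary:

```rust
// Sample the process for a while, then write an SVG flamegraph.
fn dump_cpu_flamegraph(seconds: u64) -> Result<(), Box<dyn std::error::Error>> {
    let guard = pprof::ProfilerGuard::new(100)?; // sample ~100 times/sec
    std::thread::sleep(std::time::Duration::from_secs(seconds));
    let report = guard.report().build()?;
    report.flamegraph(std::fs::File::create("cpu-flamegraph.svg")?)?;
    Ok(())
}
```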
Operational surprises.
Panics are different. Go’s panic/recover model gave us a clear story about how to isolate failures per request. Rust’s panic system is similar in effect but the ecosystem conventions are different — some libraries panic on input errors, others return Result. We had a panic take down a worker once because we hadn’t wrapped a specific code path.
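One way to get per-request isolation similar to Go's recover is to run each handler in its own task and inspect the JoinError. A minimal sketch; process_event is a placeholder for the real handler, and this assumes the default panic = unwind setting (with panic = abort, the whole process still dies):

```rust
use tokio::task::JoinHandle;

// Placeholder for the real enrichment handler; may panic on bad input.
async fn process_event(raw: Vec<u8>) {
    let _ = raw; // real work here
}

// Run one message in its own task so a panic is contained there.
// Tokio turns an unwinding panic in a spawned task into a JoinError.
async fn handle_isolated(raw: Vec<u8>) {
    let handle: JoinHandle<()> = tokio::spawn(process_event(raw));
    if let Err(join_err) = handle.await {
        if join_err.is_panic() {
            // Log and keep the worker alive instead of crashing it.
            eprintln!("event handler panicked: {join_err}");
        }
    }
}
```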
Async debugging is harder. In Go, a goroutine deadlock usually shows up as a stuck stack in /debug/pprof/goroutine. Tokio’s async tasks are harder to introspect. tokio-console helps enormously but it’s not as universal.
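Wiring up tokio-console is small in code terms; the real cost is the build configuration around it. A sketch assuming the console-subscriber crate:

```rust
// Requires building with RUSTFLAGS="--cfg tokio_unstable" and tokio's
// tracing support; then point the `tokio-console` CLI at the process.
#[tokio::main]
async fn main() {
    console_subscriber::init(); // registers a tracing subscriber and
                                // serves task data on the default port
    // ... start the service as usual ...
}
```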
Deploys are slower. Our Go Docker image built in ~90 seconds; the Rust one takes ~4 minutes. Incremental build caching with cargo-chef helps, but a clean build is still painful. For a fast-iteration team, this eats into velocity.
What I’d tell someone considering this.
- Don’t do it for hype. Do it if you have a specific problem Rust solves better (low memory, high CPU, native libraries, specific correctness guarantees).
- Budget for 1.5-2x the time estimate. The 50-100% overhead is real, and it’s mostly in “I know exactly how to do X in Go, I have to figure out how to do X in Rust.”
- Invest in the observability toolchain early. If you ship Rust to production with worse observability than your Go counterpart, you’ve made your life harder.
- Don’t rewrite the whole thing at once. We did a clean rewrite because the service was small. If it had been >20k lines, I’d have done it incrementally via a gRPC split or similar.
- Expect your team to be split on the outcome. Some engineers will love the rewrite; some will feel productivity loss more sharply. Both are valid.
Would I do it again?
For this specific service, yes. The memory win was real, the operational story after cutover was stable, and the team has become comfortable in Rust. We’ve since written two new services from scratch in Rust rather than Go.
For our larger services, no. A 50k-line Go service with active feature work would be a nightmare to migrate, and the memory/CPU wins wouldn’t justify it. Rewrites are rarely a good business decision unless the thing being rewritten has a clear deficiency.
I’ll also say: Go is a great language. The dismissive “just use Rust lol” takes on the internet are silly. Go’s ecosystem, tooling, and productivity are excellent, and for most services, it’s the right choice. We picked Rust for this one because of specific constraints. Those constraints weren’t “Go is bad.”
Related: my post on Rust trait objects vs generics is where I wrote about one of the more noticeable learning-curve items.