tokio::sync vs std::sync in a real service
Here’s a question I’ve seen trip up every Rust engineer I’ve worked with who’s new to async: when should you use std::sync::Mutex and when should you use tokio::sync::Mutex? The internet is full of “always use tokio::sync::Mutex” and “never use tokio::sync::Mutex” takes, both of which are wrong. The actual answer is nuanced, and the wrong choice can cause either performance problems or deadlocks, depending on which direction you get it wrong.
The core distinction:
- std::sync::Mutex blocks the current OS thread while waiting for the lock. In an async context, that means blocking a runtime worker thread: no other futures can progress on that worker.
- tokio::sync::Mutex yields back to the runtime while waiting, so other tasks on the same worker can progress. It's slower per-lock than std::sync::Mutex because it goes through the runtime.
The simple rule is: if your critical section is short and doesn’t contain any .await, use std::sync::Mutex. If your critical section might .await, use tokio::sync::Mutex.
Here’s why both parts of that rule matter.
Why “short critical section” matters. If you hold std::sync::Mutex for a long time (say, doing a slow computation), you're blocking a runtime worker. Other tasks can't progress. In the worst case, you can stall the whole runtime if every worker thread ends up blocked waiting on the lock. But “long” here is microseconds or more. If your critical section is a few dozen nanoseconds (update a counter, read a cached value), std::sync::Mutex is fine and MUCH faster.
use std::sync::Mutex;

struct Stats {
    counter: Mutex<u64>,
}

impl Stats {
    fn inc(&self) {
        let mut c = self.counter.lock().unwrap();
        *c += 1;
    } // lock released here, nanoseconds
}
This is fine. std::sync::Mutex here is probably 10x faster than tokio::sync::Mutex and the critical section is so short you’ll never block a worker meaningfully.
Why “no .await” matters. If you hold a lock across an .await, you're saying “this task holds the lock AND is willing to yield to the runtime.” For std::sync::Mutex this is a problem on two levels. First, std::sync::MutexGuard is !Send (OS mutexes generally must be unlocked on the thread that locked them), so a future that holds the guard across an .await isn't Send, and tokio::spawn will reject it at compile time on the multi-threaded runtime. Second, even in contexts where it does compile, every other task that calls lock() blocks its worker thread until you release the guard. Compare:
// BAD - holds std::sync::Mutex across .await
async fn bad(m: &std::sync::Mutex<Vec<u8>>) {
    let mut guard = m.lock().unwrap();
    let data = fetch_something().await; // suspended, still holding the lock
    guard.extend(data); // any other task calling lock() blocks its worker thread
}
If 8 tasks all do this and fetch_something().await takes 500ms, one task sits suspended holding the lock while the other seven block their worker threads inside lock(). On an 8-worker runtime, almost nothing else can progress for those 500ms.
Vs:
// GOOD - uses tokio::sync::Mutex
async fn good(m: &tokio::sync::Mutex<Vec<u8>>) {
    let mut guard = m.lock().await;
    let data = fetch_something().await; // runtime can do other things
    guard.extend(data);
}
Here m.lock().await yields instead of blocking, and while this task is suspended in fetch_something().await with the guard held, other tasks waiting for the lock are just parked futures, not blocked threads. Everything else keeps running.
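To make the Send point from earlier concrete, here's a minimal sketch (same hypothetical fetch_something as above) that the compiler rejects outright:

use std::sync::{Arc, Mutex};

// Fails to compile: the future holds a std::sync::MutexGuard (!Send) across
// an .await, so the future itself isn't Send, and tokio::spawn requires
// Send futures. The error reads "future cannot be sent between threads safely".
fn spawn_bad(m: Arc<Mutex<Vec<u8>>>) {
    tokio::spawn(async move {
        let mut guard = m.lock().unwrap();
        let data = fetch_something().await;
        guard.extend(data);
    });
}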
The scope-drop pattern. Often you don't need to hold the lock across the await at all: take the lock, pull out the data you need, drop the lock, then do your async work:
async fn better(m: &std::sync::Mutex<Config>, target: Url) -> Result<(), Error> {
    let timeout = {
        let cfg = m.lock().unwrap();
        cfg.fetch_timeout
    }; // lock dropped here
    fetch_with_timeout(target, timeout).await?;
    Ok(())
}
This is often the best option — you get std::sync::Mutex’s speed AND you don’t hold anything across .await. When I can restructure code to look like this, I do.
Other tokio::sync primitives:
- tokio::sync::RwLock: same story. Prefer std::sync::RwLock for short critical sections.
- tokio::sync::Semaphore: useful for rate limiting and bounded concurrency (see the sketch after this list). There's no stdlib equivalent.
- tokio::sync::oneshot: a single-shot channel for a one-time signal. Very lightweight. Use it.
- tokio::sync::mpsc: multi-producer single-consumer, for task-to-task communication. std::sync::mpsc exists too, but tokio's is async-aware and integrates with the runtime.
- tokio::sync::watch: a latest-value broadcast channel. Good for configuration updates and broadcasting state changes.
- tokio::sync::Notify: a minimal signal primitive. Good for “wake up one waiter” patterns.
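Of those, Semaphore is the one I see under-used. Here's a minimal sketch of bounded concurrency with acquire_owned (the download function and URL list are hypothetical stand-ins):

use std::sync::Arc;
use tokio::sync::Semaphore;

async fn crawl(urls: Vec<String>) {
    let permits = Arc::new(Semaphore::new(10)); // at most 10 downloads in flight
    let mut handles = Vec::new();
    for url in urls {
        let permits = Arc::clone(&permits);
        handles.push(tokio::spawn(async move {
            // acquire_owned ties the permit to the task rather than a borrow
            let _permit = permits.acquire_owned().await.unwrap();
            download(&url).await; // hypothetical
        })); // _permit dropped when the task body ends, freeing a slot
    }
    for h in handles {
        let _ = h.await;
    }
}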
A thing that bit me: tokio::sync::Mutex::try_lock() and the async .lock() are different. try_lock is synchronous and never yields; it returns Err immediately if the lock is held. .lock().await yields. If you're in async code and want non-blocking behavior, use try_lock. Don't use .lock().await if you're not willing to wait — use a semaphore with try_acquire or structure the code differently.
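For example, a minimal non-blocking fast path looks like this (a sketch; the metrics buffer is a hypothetical stand-in):

use tokio::sync::Mutex;

// Record a sample only if the lock is free right now.
// try_lock is synchronous: it returns Err immediately under contention.
fn record_if_free(samples: &Mutex<Vec<u64>>, value: u64) {
    if let Ok(mut guard) = samples.try_lock() {
        guard.push(value);
    } // contended: drop the sample rather than wait
}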
A performance note from my own benchmarks: a single uncontended std::sync::Mutex lock+unlock is roughly 15ns on my machine. tokio::sync::Mutex is roughly 150ns. Under contention, the gap narrows: std::sync::Mutex's futex-based implementation (on Linux, since Rust 1.62) handles contention well, but tokio::sync::Mutex doesn't need to park OS threads for its waiters. For highly contended locks in async code, they're closer to parity.
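If you want to sanity-check the uncontended numbers on your own machine, the shape of the measurement is roughly this (a crude sketch, not a rigorous benchmark; expect noise and compiler effects):

use std::time::Instant;

#[tokio::main]
async fn main() {
    const N: u32 = 1_000_000;

    let std_mutex = std::sync::Mutex::new(0u64);
    let start = Instant::now();
    for _ in 0..N {
        *std_mutex.lock().unwrap() += 1;
    }
    println!("std::sync::Mutex:   {:?}/op", start.elapsed() / N);

    let tokio_mutex = tokio::sync::Mutex::new(0u64);
    let start = Instant::now();
    for _ in 0..N {
        *tokio_mutex.lock().await += 1;
    }
    println!("tokio::sync::Mutex: {:?}/op", start.elapsed() / N);
}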
The rule of thumb I give new teammates:
- Default to std::sync::Mutex/RwLock.
- If you're about to .await while holding the lock, stop. Either restructure to release the lock before the await, or switch to tokio::sync::Mutex.
- For non-mutex primitives (mpsc, semaphore, notify), just use tokio's versions. They're designed for async code.
- Never use std::sync::mpsc::channel() in async code. It's blocking. Use tokio::sync::mpsc, or crossbeam::channel with spawn_blocking if you really need to bridge (a sketch follows below).
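For that last bullet, the bridge looks roughly like this (a minimal sketch; process is a hypothetical stand-in for whatever you do with each message):

use tokio::task;

async fn drain(rx: std::sync::mpsc::Receiver<String>) {
    // The blocking recv loop runs on tokio's blocking thread pool,
    // so it never parks a runtime worker.
    task::spawn_blocking(move || {
        while let Ok(msg) = rx.recv() {
            process(msg); // hypothetical
        }
    })
    .await
    .expect("blocking task panicked");
}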
One day I’ll write a follow-up about how to debug “my tokio runtime is frozen” issues, which are almost always the result of someone holding a std::sync::Mutex in an async function and doing something slow. Until then, I refer people to the Tokio docs’ section on “shared state” — they’re good.
For more on async gotchas, see my post on block_in_place, which is another “works locally, scales badly” async trap.