Unsafe code boundaries I drew and regretted
I’ve written a modest amount of unsafe Rust — a few hundred lines, across a couple of library crates. Mostly it’s been fine. But last year I shipped a subtly unsound abstraction that worked for about six months before it caused random memory corruption in production, and it’s made me much more cautious.
The crate in question was a small wrapper around an mmap’d file. We had a binary format where we wanted to access records by offset into a mapped file, and the natural way to do this in Rust was something like:
pub struct Mapping {
    ptr: *const u8,
    len: usize,
    _mmap: memmap2::Mmap, // keep the mapping alive
}

impl Mapping {
    pub fn open(path: &Path) -> io::Result<Self> {
        let f = File::open(path)?;
        let mmap = unsafe { memmap2::Mmap::map(&f)? };
        Ok(Mapping {
            ptr: mmap.as_ptr(),
            len: mmap.len(),
            _mmap: mmap,
        })
    }

    pub fn slice(&self, offset: usize, len: usize) -> Option<&[u8]> {
        if offset + len > self.len {
            return None;
        }
        unsafe {
            Some(std::slice::from_raw_parts(self.ptr.add(offset), len))
        }
    }
}
Reviewers looked at this, saw the bounds check, saw the lifetime on the returned &[u8] was tied to &self, and said “looks sound.” I thought so too. It worked for months.
The bug was in the bounds check. offset + len > self.len can overflow. If an attacker (or a corrupted input, more realistically) gave us offset = usize::MAX and len = 1, the addition wraps to 0, which is <= self.len, and we merrily pass a wildly out-of-bounds pointer to from_raw_parts. Instant memory corruption.
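The wraparound is easy to reproduce in isolation. A standalone sketch — `wrapping_add` here stands in for what plain `+` compiles to in release builds; in debug builds the `+` would have panicked on overflow, which is part of why tests never tripped it:

```rust
fn main() {
    let mapped_len: usize = 4096; // stand-in for self.len
    let offset = usize::MAX;
    let len = 1usize;

    // Release-mode `+` wraps silently; wrapping_add makes that explicit.
    let end = offset.wrapping_add(len);

    assert_eq!(end, 0);          // usize::MAX + 1 wraps around to 0
    assert!(!(end > mapped_len)); // so the buggy bounds check passes
}
```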
The fix is to use checked arithmetic:
pub fn slice(&self, offset: usize, len: usize) -> Option<&[u8]> {
    let end = offset.checked_add(len)?;
    if end > self.len {
        return None;
    }
    unsafe {
        Some(std::slice::from_raw_parts(self.ptr.add(offset), len))
    }
}
It’s a tiny fix, and completely obvious once you see it. But I didn’t catch it in review, neither did two colleagues, and it took real effort to realize that an “unsafe block protected by a bounds check” was still wrong because the bounds check itself had a bug.
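In fact, this particular shape of API can drop the unsafe entirely once the backing storage is viewable as a slice (memmap2’s Mmap derefs to [u8]), because slice::get with a range does the bounds check for you. A standalone sketch over a plain &[u8]:

```rust
// Bounds-checked sub-slice with no unsafe: `get` with a Range returns
// None when the range is out of bounds, and checked_add handles the
// overflow case that bit us.
fn slice_at(data: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    let end = offset.checked_add(len)?;
    data.get(offset..end)
}

fn main() {
    let data = [10u8, 20, 30, 40];
    assert_eq!(slice_at(&data, 1, 2), Some(&[20u8, 30][..]));
    assert_eq!(slice_at(&data, 3, 5), None);          // past the end
    assert_eq!(slice_at(&data, usize::MAX, 1), None); // overflow caught
}
```

The same pattern should work directly on the mapping via its Deref to [u8], collapsing the unsafe block — at the cost of not being applicable when you genuinely only have a raw pointer.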
Things I’ve internalized since:
The soundness of an unsafe block depends on EVERYTHING visible to its safety argument. It’s not enough for the unsafe block to look right. The entire public API of the type must uphold the invariants. If any caller can construct an input that breaks the invariant, the type is unsound.
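A contrived sketch of that failure mode (a hypothetical Cursor type, not from the original crate): the unsafe block is “protected” by an invariant that a perfectly safe method lets any caller break.

```rust
// Invariant the unsafe block relies on: cursor < data.len().
struct Cursor {
    data: Vec<u8>,
    cursor: usize,
}

impl Cursor {
    fn new(data: Vec<u8>) -> Self {
        Cursor { data, cursor: 0 }
    }

    // UNSOUND: this safe method lets any caller break the invariant.
    // Soundness would require asserting `pos < self.data.len()` here.
    fn seek(&mut self, pos: usize) {
        self.cursor = pos;
    }

    fn current(&self) -> u8 {
        // SAFETY(broken): requires cursor < data.len(), but seek()
        // doesn't enforce it — so this block is unsound even though
        // it looks locally correct.
        unsafe { *self.data.as_ptr().add(self.cursor) }
    }
}

fn main() {
    let mut c = Cursor::new(vec![7, 8, 9]);
    c.seek(2);
    assert_eq!(c.current(), 9);
    // c.seek(100); c.current(); // would be UB, triggered by safe code
}
```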
Write out the safety argument as a comment. Every unsafe block should have a comment starting with // SAFETY: explaining why it’s sound. This forces you to articulate the invariant:
// SAFETY: we checked offset + len <= self.len (without overflow, via checked_add),
// so offset..offset+len is within the mapped region, and the mapping outlives &self.
unsafe {
    Some(std::slice::from_raw_parts(self.ptr.add(offset), len))
}
Writing this comment is where I catch my own bugs half the time. “We checked offset + len <= self.len” — wait, but addition can overflow. Aha.
Test with adversarial inputs. cargo fuzz is straightforward to set up. For unsafe code that takes untrusted inputs (file contents, byte buffers, anything from outside your process), fuzzing is basically mandatory. For our mmap wrapper:
fuzz_target!(|data: &[u8]| {
    if data.len() < 16 { return; }
    let offset = u64::from_le_bytes(data[0..8].try_into().unwrap()) as usize;
    let len = u64::from_le_bytes(data[8..16].try_into().unwrap()) as usize;
    let mapping = ...; // some fixed mapping
    let _ = mapping.slice(offset, len);
});
This would have caught the overflow in seconds.
Use miri whenever it’s reasonable. Miri is an interpreter for Rust’s MIR that detects undefined behavior. It’s slow but thorough. It catches use-after-free and out-of-bounds pointer arithmetic — things AddressSanitizer can also catch — plus aliasing violations of the stacked-borrows rules, which no sanitizer sees at all.
cargo +nightly miri test
For crates that do interesting unsafe, I run miri in CI. It’s slow enough that I run it on a separate slow lane, but it’s caught two bugs for me this year.
Prefer &[u8] APIs over *const u8. The fewer raw pointers in your public API, the fewer places you need unsafe reasoning. The bytes crate’s Bytes type is a lovely example — it encapsulates the unsafe refcounting-and-slicing dance internally and gives you a safe API.
Don’t wrap Sync-unsafe things in Sync wrappers. If your inner data isn’t safe to share across threads, don’t promise Send + Sync on the wrapper. I’ve seen people slap unsafe impl Send for Foo {} on things to silence the compiler and immediately get data races in production. The compiler was right; the implementer was lazy.
unsafe impl is the biggest lever. The actual use of unsafe inside a function body is local. The biggest source of unsoundness is unsafe impl Send for X {} or unsafe impl Sync for X {} on types where the invariant doesn’t actually hold. Be extra careful with these.
Don’t publish pub unsafe functions without careful thought. If you have an unsafe function in your public API, every caller has to uphold your invariants. That responsibility is heavy. If you can wrap the unsafety inside a safe public API, do it. If you can’t — if there’s a real reason users need unsafe control — document the invariants extensively in a # Safety section in the function’s doc comment.
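For reference, the conventional shape — Clippy’s missing_safety_doc lint expects a # Safety heading in the doc comment of any pub unsafe fn. A hypothetical unchecked accessor:

```rust
pub struct Records {
    bytes: Vec<u8>,
}

impl Records {
    pub fn new(bytes: Vec<u8>) -> Self {
        Records { bytes }
    }

    /// Returns the byte at `idx` without a bounds check.
    ///
    /// # Safety
    ///
    /// `idx` must be less than the length of the underlying buffer.
    /// Passing a larger index reads out of bounds, which is undefined
    /// behavior.
    pub unsafe fn byte_unchecked(&self, idx: usize) -> u8 {
        // Explicit inner unsafe block, so this compiles cleanly even
        // with unsafe_op_in_unsafe_fn enforced.
        unsafe { *self.bytes.get_unchecked(idx) }
    }
}

fn main() {
    let r = Records::new(vec![1, 2, 3]);
    // The caller, not the callee, now owns the bounds invariant:
    let b = unsafe { r.byte_unchecked(1) };
    assert_eq!(b, 2);
}
```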
I still write unsafe when I need to. FFI, memory mapping, custom allocators, a handful of performance-critical data structures. But I try to minimize the surface. One carefully-reviewed module with a small amount of unsafe is easier to audit than ten modules each with a single “harmless” unsafe block.
And I always run cargo fuzz on anything that touches untrusted input. Always. Twelve hours of fuzzing is cheaper than one out-of-bounds read in production.