We have about 40 services and 80 packages in our monorepo. When every PR ran every test in the entire repo, CI took 22 minutes on a good day and 45 minutes on a bad one. Engineer satisfaction was… low.

The fix is the same fix everyone writes about: only run what’s affected by the change. The implementation is annoyingly hard to get right. Here’s what worked for us.

The mental model

You have a graph. Nodes are packages/services. Edges are dependencies — “package A imports package B” means A depends on B. If B changes, A might break.

A PR modifies some nodes. The “affected set” is those nodes plus every node transitively downstream of them (every node that depends on them, directly or indirectly). You only need to build and test the affected set.

This is the same thing Bazel does with its build graph. But we wanted a lighter touch than Bazel: something that layered on top of our existing builds rather than replacing them.

Our implementation

We build a dependency graph from package.json, pyproject.toml, and go.mod files, plus a manual YAML file for cross-language dependencies (“the Python service imports the protobuf package generated by the Go service”).

# build-graph.py
import json
from pathlib import Path

def build_graph(root):
    graph = {}
    for pkg_json in Path(root).rglob("package.json"):
        # skip installed dependencies; we only want workspace packages
        if "node_modules" in pkg_json.parts:
            continue
        data = json.loads(pkg_json.read_text())
        name = data.get("name")
        if not name:
            continue
        # external deps land in here too; later lookups only resolve
        # names that actually exist as nodes in the graph
        deps = set(data.get("dependencies", {})) | set(data.get("devDependencies", {}))
        graph[name] = {
            "path": str(pkg_json.parent.relative_to(root)),
            "deps": sorted(deps),
        }
    # ... similar for pyproject.toml, go.mod, etc.
    return graph
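
The manual YAML overlay is merged in afterwards. Here’s a minimal sketch, assuming a hypothetical cross-language-deps.yaml read with PyYAML (the file name and keys are illustrative, not our exact format):

import yaml  # PyYAML
from pathlib import Path

def merge_manual_edges(graph, overlay_path):
    # overlay format (illustrative):
    #   extra-deps:
    #     payments-service: [proto-gen-go]
    overlay = yaml.safe_load(Path(overlay_path).read_text())
    for name, extra in overlay.get("extra-deps", {}).items():
        graph[name]["deps"] = sorted(set(graph[name]["deps"]) | set(extra))
    return graph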

The graph is regenerated on every CI run — it’s fast enough that caching doesn’t pay off.

Mapping changed files to nodes

Given the files changed in a PR, which nodes are affected?
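
First, the changed-file list itself. One way to get it, as a minimal sketch assuming PRs target main:

import subprocess

def changed_files(base="origin/main"):
    # files this branch touched relative to its merge base with main
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

Then the mapping itself, plus the transitive expansion: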

def affected_nodes(graph, changed_files):
    # map each changed file to the package whose directory contains it
    direct = set()
    for f in changed_files:
        for name, node in graph.items():
            if f.startswith(node["path"] + "/"):
                direct.add(name)
                break
    # expand transitively downstream: keep adding nodes that depend on
    # an affected node until we reach a fixed point
    all_affected = set(direct)
    changed = True
    while changed:
        changed = False
        for name, node in graph.items():
            if name in all_affected:
                continue
            if any(dep in all_affected for dep in node["deps"]):
                all_affected.add(name)
                changed = True
    return all_affected

A change in packages/shared-utils/ with one downstream consumer (services/api) results in an affected set of {shared-utils, api}. Everything else gets skipped.
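
The same example as a runnable toy (names hypothetical):

graph = {
    "shared-utils": {"path": "packages/shared-utils", "deps": []},
    "api": {"path": "services/api", "deps": ["shared-utils"]},
    "worker": {"path": "services/worker", "deps": []},
}

print(affected_nodes(graph, ["packages/shared-utils/src/format.ts"]))
# {'shared-utils', 'api'}; worker is skipped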

The gotchas

Tooling/config changes affect everything. A change to .github/workflows/, the root justfile, or our CI orchestrator itself can’t be treated as local. We keep an “always-full-run” pattern list that triggers a full build:

from fnmatch import fnmatch

ALWAYS_FULL = [
    ".github/workflows/*",
    "*.justfile",
    "scripts/ci/*",
    "tools/ci/*",
]

def needs_full_run(changed_files):
    return any(any(fnmatch(f, p) for p in ALWAYS_FULL) for f in changed_files)

Non-code changes can affect tests. Docker images, protobuf schemas, GraphQL schemas — changes here ripple. We encode these as explicit nodes in the dependency graph with their own downstream edges.
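
For example, a protobuf schema directory can be an explicit node like any other (hypothetical entry; the edge itself comes from the manual YAML overlay):

graph["protobuf-schemas"] = {"path": "proto", "deps": []}
graph["api"]["deps"].append("protobuf-schemas")

A change under proto/ now maps to protobuf-schemas via the same path match, and affected_nodes pulls in every consumer.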

Lock file changes. A package-lock.json change means dependencies changed, and potentially any package is affected. We treat lock file changes as a full rebuild trigger for the corresponding language.
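
A sketch of that trigger; the Python and Go lock-file names here are illustrative:

import os

LOCKFILE_LANG = {
    "package-lock.json": "js",
    "poetry.lock": "python",  # illustrative
    "go.sum": "go",           # illustrative
}

def lockfile_langs(changed_files):
    # languages whose entire package set needs a rebuild
    return {
        LOCKFILE_LANG[os.path.basename(f)]
        for f in changed_files
        if os.path.basename(f) in LOCKFILE_LANG
    }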

Flaky “I saw nothing change but my test broke” cases. Sometimes a test depends on something that the graph doesn’t know about — a shared config file, an environment variable, a snapshot file. When someone reports this, we update the graph. Over time the graph gets more accurate.

The results

PR median CI time went from ~22 minutes to ~4 minutes. Main-branch builds still do a full run, so we catch anything the graph missed (rare, but it happens).

Beyond time savings, the graph itself became a useful artifact. We can answer questions like these (sketched in code after the list):

  • “What uses this library?” → the node’s direct downstream dependents.
  • “What’s the blast radius of changing this config?” → everything transitively downstream of the node.
  • “What’s the most-depended-on package?” → the node with the most transitive dependents.
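
A sketch of those queries (the function names are mine), built from reversed edges plus a graph walk:

from collections import defaultdict

def reverse_edges(graph):
    # invert deps: who depends on each node
    rev = defaultdict(set)
    for name, node in graph.items():
        for dep in node["deps"]:
            if dep in graph:  # ignore external packages
                rev[dep].add(name)
    return rev

def blast_radius(graph, start):
    # everything transitively downstream of `start`
    rev = reverse_edges(graph)
    seen, stack = set(), [start]
    while stack:
        for dependent in rev[stack.pop()]:
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return seen

def most_depended_on(graph):
    return max(graph, key=lambda n: len(blast_radius(graph, n)))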

What I’d do differently

I’d start with the graph earlier. We spent a year with “run everything” CI before we did this. The value of the graph compounds — every time we add a new service, the savings increase. If I were starting the monorepo today I’d build the dependency graph before the second service was added.

I’d also not try to auto-detect every dependency. We eventually gave up on inferring some cross-cutting relationships and just let teams declare them manually. Explicit declarations are easier to reason about and don’t silently drift.

Reflection

The “only run what’s affected” idea is straightforward in theory, and every monorepo hits the same obstacles on implementation: dependency tracking correctness, cross-language integration, and the “always full run” escape hatch. If you’re setting this up, plan for all three, don’t chase elegant minimal implementations, and accept that the graph will get manually corrected forever. It’s worth it.

Related: GitHub Actions cache lied to us.