FOD drv/output bind-mount patching leaks /tmp/<hash>.patched.<rand> mounts on build interruption → GC aborts with EBUSY #178

@schickling-assistant

Summary

Determinate Nix 3.17.3 (Nix 2.33.3) leaks bind mounts from its fixed-output derivation (FOD) drv/output patching mechanism when builds are interrupted (SIGTERM / cancellation). The leaked mounts persist in the host mount namespace across daemon restarts and eventually break nix-collect-garbage entirely.

On one CI-runner host I observed 793 leaked mount points, all about 3 days old. They silently broke every subsequent GC: nix-collect-garbage aborts on the first unlink() that returns EBUSY and reports "0 store paths deleted, 0.0 KiB freed". As a result, the root filesystem filled to 100% (1.7T).

Observed pattern

For every affected store path, /proc/1/mountinfo shows:

<mountID> <parentID> 254:0 /tmp/<storeHash>-<name>.patched.<rand> /nix/store/<storeHash>-<name> rw,relatime shared:1 - ext4 /dev/mapper/vg0-root0 rw,stripe=32

Example:

/tmp/m7i135y30si5vafq4n3q5489xmcslcfm-pnpm-install.drv.patched.liPdju
  → /nix/store/m7i135y30si5vafq4n3q5489xmcslcfm-pnpm-install.drv
/tmp/94m2xm7qgp1mkmv04rwdrk7qh46rhqnr-pnpm-install.patched.M4wFNa
  → /nix/store/94m2xm7qgp1mkmv04rwdrk7qh46rhqnr-pnpm-install

Both the .drv file itself and the FOD output directory get the bind-mount treatment.
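
To check whether a host is affected, the pattern above can be matched directly against mountinfo. The following is a minimal read-only sketch in POSIX sh, not part of Nix: the function name list_patched_mounts is hypothetical, and it assumes the random suffix is alphanumeric, as in the two examples above.

```shell
# list_patched_mounts (hypothetical helper): read mountinfo-format lines on
# stdin and print "<mount point> <source root>" for every entry whose root
# (field 4) ends in .patched.<rand> and whose mount point (field 5) is
# under /nix/store/.
# Assumption: <rand> is alphanumeric, as in the examples in this report.
list_patched_mounts() {
  awk '$4 ~ /\.patched\.[A-Za-z0-9]+$/ && index($5, "/nix/store/") == 1 { print $5, $4 }'
}

# Usage on a live host (read-only):
#   list_patched_mounts < /proc/1/mountinfo | wc -l
```

Counting the output lines gives the number of leaked mounts without unmounting anything.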

GC failure

$ nix-collect-garbage -d
finding garbage collector roots...
deleting garbage...
deleting '/nix/store/m7i135y30si5vafq4n3q5489xmcslcfm-pnpm-install.drv'
error: cannot unlink "/nix/store/m7i135y30si5vafq4n3q5489xmcslcfm-pnpm-install.drv": Device or resource busy
0 store paths deleted, 0.0 KiB freed

GC aborts on the first EBUSY instead of skipping the path and continuing, so a single stale mount blocks all GC progress indefinitely. The inline GC driven by nix.settings.min-free / max-free silently no-ops for the same reason, so disk usage grows without bound until builds start failing with ENOSPC.
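
To make the difference between abort-on-first-error and skip-and-continue concrete, here is a small POSIX sh sketch of the latter. It is only an illustration of the control flow, not Nix's GC code; delete_one and delete_all_log_and_skip are hypothetical names, and rm stands in for whatever the real collector does per path.

```shell
# delete_one (hypothetical stand-in): delete a single path.
delete_one() {
  rm -rf -- "$1"
}

# delete_all_log_and_skip (hypothetical): read one path per line on stdin,
# try to delete each with the deleter named in $1 (default: delete_one),
# and keep going when a deletion fails, e.g. with EBUSY on a mount point.
# Prints a summary line on stdout; skipped paths are logged on stderr.
delete_all_log_and_skip() {
  deleter=${1:-delete_one}
  deleted=0
  skipped=0
  while IFS= read -r path; do
    if "$deleter" "$path" </dev/null 2>/dev/null; then
      deleted=$((deleted + 1))
    else
      echo "skipping '$path': deletion failed" >&2
      skipped=$((skipped + 1))
    fi
  done
  echo "$deleted deleted, $skipped skipped"
}
```

With this shape, one busy path costs one skipped entry instead of the whole run.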

Suspected trigger

The CI runner uses GitHub Actions concurrency.cancel-in-progress: true. Cancellation sends SIGTERM to nix builds mid-FOD-patch, and the cleanup path for the .patched bind mount does not appear to run.

Workaround

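# Lazily unmount every leaked .patched.* bind mount
# (field 5 of a mountinfo line is the mount point).
grep -E 'patched\.' /proc/1/mountinfo | awk '{print $5}' | sort -u \
  | xargs -n1 -P4 sudo umount -l
# Re-run GC now that unlink() no longer hits EBUSY.
nix-collect-garbage -d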

After unmounting all 793 stale mounts, GC proceeded normally and freed 715 GiB (227,226 paths). No active builds were disrupted (the two "live"-looking mounts also turned out to be 3 days old and equally stale).

Environment

  • Determinate Nix 3.17.3 (Nix 2.33.3)
  • NixOS x86_64, kernel 6.18.13
  • Workload: GitHub Actions self-hosted runner building pnpm-install-style FODs under heavy concurrency with frequent cancellation

Suggested fixes

  1. Make GC resilient to EBUSY on unlink: log and skip instead of aborting the run. A single leaked mount should not block GC entirely.
  2. Reap stale .patched.* mounts on daemon startup (clear ones whose source /tmp/...patched.<rand> is older than some threshold and whose target store path isn't owned by a live build).
  3. Install SIGTERM/cleanup handlers around the FOD patching bind-mount so it unmounts on abnormal build termination.
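
As a rough illustration of the age check in suggestion 2, the sketch below selects .patched.* mounts whose source path is older than a threshold. It is a POSIX sh approximation under stated assumptions, not a proposed implementation: stale_patched_mounts is a hypothetical name, the source path under /tmp is assumed to still exist, find is assumed to support -maxdepth and -mmin (GNU findutils does), and the "not owned by a live build" check is deliberately left out.

```shell
# stale_patched_mounts (hypothetical helper): read mountinfo-format lines on
# stdin and print the mount point of each .patched.<rand> bind mount whose
# source path still exists and was last modified more than $1 minutes ago.
# Not covered here: checking that no live build owns the target store path.
stale_patched_mounts() {
  min_age_minutes=$1
  awk '$4 ~ /\.patched\.[A-Za-z0-9]+$/ { print $4, $5 }' |
  while read -r src target; do
    # find prints the path only if its mtime is older than the threshold.
    if [ -n "$(find "$src" -maxdepth 0 -mmin "+$min_age_minutes" 2>/dev/null)" ]; then
      printf '%s\n' "$target"
    fi
  done
}

# Usage (dry run, prints candidates only; 1440 minutes = 1 day):
#   stale_patched_mounts 1440 < /proc/1/mountinfo
```

The output is a list of mount points only; nothing is unmounted, so it can be reviewed before being passed to umount.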

Happy to provide more data if useful.

Posted on behalf of @schickling
field               value
agent_name          👁️ cl1-iris
agent_session_id    420ca8a2-8003-42d7-a440-a7cd4d317076
agent_tool          Claude Code
agent_tool_version  2.1.118 (Claude Code)
agent_runtime       Claude Code 2.1.118 (Claude Code)
agent_model         claude-opus-4-7
worktree            dotfiles/main
machine             dev3
tooling_profile     dotfiles@f937ca8-dirty
