Summary
Determinate Nix 3.17.3 (Nix 2.33.3) leaks bind mounts from its FOD drv/output patching mechanism when builds are interrupted (SIGTERM / cancellation). The leaked mounts persist in the host mount namespace across daemon restarts and eventually break nix-collect-garbage entirely.
On one CI-runner host I observed 793 leaked mount points, all about 3 days old, which silently broke every subsequent GC: nix-collect-garbage aborts on the first EBUSY from unlink() and reports 0 store paths deleted, 0.0 KiB freed. The root filesystem (1.7T) filled to 100% as a result.
Observed pattern
For every affected store path, /proc/1/mountinfo shows:
<mid> <pid> 254:0 /tmp/<storeHash>-<name>.patched.<rand> /nix/store/<storeHash>-<name> rw,relatime shared:1 - ext4 /dev/mapper/vg0-root0 rw,stripe=32
Example:
/tmp/m7i135y30si5vafq4n3q5489xmcslcfm-pnpm-install.drv.patched.liPdju
→ /nix/store/m7i135y30si5vafq4n3q5489xmcslcfm-pnpm-install.drv
/tmp/94m2xm7qgp1mkmv04rwdrk7qh46rhqnr-pnpm-install.patched.M4wFNa
→ /nix/store/94m2xm7qgp1mkmv04rwdrk7qh46rhqnr-pnpm-install
Both the .drv file itself and the FOD output directory get the bind-mount treatment.
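To check whether another host is affected, the pattern above can be turned into a small listing helper. This is a minimal sketch, not part of Nix: the function name list_patched_mounts is made up for illustration, and it assumes the random suffix after ".patched." is alphanumeric, as in the two examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nix): list bind mounts that look like
# leaked FOD patch mounts. Prints "<mount point><TAB><source root>" per match.
# Argument: mountinfo file to read (defaults to /proc/1/mountinfo).
list_patched_mounts() {
  local mountinfo="${1:-/proc/1/mountinfo}"
  # In mountinfo, field 4 is the root of the mount inside its filesystem
  # and field 5 is the mount point.
  # Assumption: the suffix after ".patched." is alphanumeric.
  awk '$4 ~ /\.patched\.[A-Za-z0-9]+/ && index($5, "/nix/store/") == 1 {
         print $5 "\t" $4
       }' "$mountinfo" | sort -u
}
```

Piping the output through `wc -l` gives the number of affected store paths on the host.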
GC failure
$ nix-collect-garbage -d
finding garbage collector roots...
deleting garbage...
deleting '/nix/store/m7i135y30si5vafq4n3q5489xmcslcfm-pnpm-install.drv'
error: cannot unlink "/nix/store/m7i135y30si5vafq4n3q5489xmcslcfm-pnpm-install.drv": Device or resource busy
0 store paths deleted, 0.0 KiB freed
GC aborts on the first EBUSY rather than skipping-and-continuing, so a single stale mount zeroes out GC progress indefinitely. Combined with nix.settings.min-free / max-free inline GC (which silently no-ops for the same reason), disk usage grows unbounded until builds start failing on ENOSPC.
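The difference between abort-on-first-error and skip-and-continue can be shown with a small shell analogy. This is only an illustration of the control flow, not how the Nix GC is implemented; delete_skipping_failures is a hypothetical name, and plain rm stands in for the store deletion.

```shell
#!/usr/bin/env bash
# Hypothetical illustration (not Nix code): try to delete every given path,
# logging and skipping failures instead of stopping at the first one.
delete_skipping_failures() {
  local deleted=0 skipped=0 path err
  for path in "$@"; do
    # Capture rm's error message so it can be logged with the skipped path.
    if err=$(rm -r -- "$path" 2>&1); then
      deleted=$((deleted + 1))
    else
      echo "skipping '$path': $err" >&2
      skipped=$((skipped + 1))
    fi
  done
  # Summary goes to stdout, per-path errors go to stderr.
  echo "$deleted deleted, $skipped skipped"
}
```

With this shape, one undeletable path costs one log line rather than the whole run.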
Suspected trigger
CI runner with GitHub Actions concurrency.cancel-in-progress: true — cancellations SIGTERM nix builds mid-FOD-patch, and the cleanup path for the .patched bind mount doesn't run.
Workaround
grep -E 'patched\.' /proc/1/mountinfo | awk '{print $5}' | sort -u \
| xargs -n1 -P4 sudo umount -l
nix-collect-garbage -d
After unmounting all 793 stale mounts, GC proceeded normally and freed 715 GiB (227,226 paths). No active builds were disrupted (the two "live"-looking mounts also turned out to be 3 days old and equally stale).
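On a host where some of the mounts might belong to live builds, it may be worth reviewing the list before unmounting anything. The sketch below wraps the same pipeline as the workaround in a dry-run mode; the function name unmount_patched_mounts and the APPLY variable are illustrative assumptions, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper around the workaround (not part of Nix).
# By default it only prints what it would unmount; set APPLY=1 to really do it.
# Argument: mountinfo file to read (defaults to /proc/1/mountinfo).
unmount_patched_mounts() {
  local mountinfo="${1:-/proc/1/mountinfo}" target
  # Same selection as the workaround: field 5 of matching lines is the mount point.
  grep -E 'patched\.' "$mountinfo" | awk '{print $5}' | sort -u |
  while IFS= read -r target; do
    if [ "${APPLY:-0}" = "1" ]; then
      # Lazy unmount, as in the workaround; stdin is detached from the pipe.
      sudo umount -l "$target" </dev/null
    else
      printf 'would run: umount -l %s\n' "$target"
    fi
  done
}
```

Run it once without APPLY to inspect the list, then again as `APPLY=1 unmount_patched_mounts` followed by nix-collect-garbage -d.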
Environment
- Determinate Nix 3.17.3 (Nix 2.33.3)
- NixOS x86_64, kernel 6.18.13
- Workload: GitHub Actions self-hosted runner building pnpm-install-style FODs under heavy concurrency with frequent cancellation
Suggested fixes
- Make GC resilient to EBUSY on unlink — log-and-skip instead of abort-the-run. A single leaked mount should not zero GC.
- Reap stale .patched.* mounts on daemon startup (clear ones whose source /tmp/...patched.<rand> is older than some threshold and whose target store path isn't owned by a live build).
- Install SIGTERM/cleanup handlers around the FOD patching bind-mount so it unmounts on abnormal build termination.
Happy to provide more data if useful.
Posted on behalf of @schickling
| field | value |
| --- | --- |
| agent_name | 👁️ cl1-iris |
| agent_session_id | 420ca8a2-8003-42d7-a440-a7cd4d317076 |
| agent_tool | Claude Code |
| agent_tool_version | 2.1.118 (Claude Code) |
| agent_runtime | Claude Code 2.1.118 (Claude Code) |
| agent_model | claude-opus-4-7 |
| worktree | dotfiles/main |
| machine | dev3 |
| tooling_profile | dotfiles@f937ca8-dirty |