Closed
Labels
state:triage-needed (Opened without agent diagnostics and needs triage)
Description
Agent Diagnostic
- Investigated using the `create-spike` skill backed by `principal-engineer-reviewer`. Skills loaded: `debug-openshell-cluster`, `openshell-cli`, `create-spike`.
- Traced `sandbox create --from` through `run.rs → build.rs → push.rs`
- Confirmed three discrete full-image heap allocations in `push_local_images()` at `push.rs:54–55`
- Verified `bollard::upload_to_container` accepts `body_try_stream()`; no API constraint blocks a streaming fix
- Confirmed the `tar` crate size-in-header constraint requires a seekable (disk) intermediate; fully in-memory zero-copy is not feasible
- Found `tempfile` is already a dev-dep; needs promotion to prod dep
- No test coverage exists for `push.rs`
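To illustrate the size-in-header constraint noted in the diagnostic: a tar entry's 512-byte header comes before the payload and stores the payload length as octal text, so the length must be known before the first payload byte is written. The sketch below builds a minimal ustar header by hand using only the standard library. It is not the `tar` crate's code, and the function name `ustar_header` and the entry name `image.tar` are hypothetical.

```rust
/// Build a minimal 512-byte ustar header for a regular file.
/// Hand-rolled for illustration only; `name` and `size` are hypothetical inputs.
fn ustar_header(name: &str, size: u64) -> Option<[u8; 512]> {
    let name = name.as_bytes();
    // The name field is 100 bytes; the size field holds 11 octal digits (< 8 GiB).
    if name.is_empty() || name.len() > 100 || size >= (1u64 << 33) {
        return None;
    }
    let mut h = [0u8; 512];
    h[..name.len()].copy_from_slice(name);
    h[100..108].copy_from_slice(b"0000644\0"); // mode
    h[108..116].copy_from_slice(b"0000000\0"); // uid
    h[116..124].copy_from_slice(b"0000000\0"); // gid
    // The payload size is written here, ahead of the payload itself.
    h[124..136].copy_from_slice(format!("{:011o}\0", size).as_bytes());
    h[136..148].copy_from_slice(format!("{:011o}\0", 0).as_bytes()); // mtime
    h[148..156].fill(b' '); // checksum field counts as spaces while summing
    h[156] = b'0'; // typeflag: regular file
    h[257..263].copy_from_slice(b"ustar\0"); // magic
    h[263..265].copy_from_slice(b"00"); // version
    let sum: u32 = h.iter().map(|&b| u32::from(b)).sum();
    h[148..156].copy_from_slice(format!("{:06o}\0 ", sum).as_bytes());
    Some(h)
}
```

Because the export stream's total length is only known once it has been read to the end, the header cannot be emitted first unless the data is held somewhere, either in memory (the current behavior) or in a seekable file.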
Description
Running openshell sandbox create --from <dockerfile-dir> on a large sandbox image (~3.7 GB) causes the openshell process to be killed by the Linux OOM killer. The image export pipeline in crates/openshell-bootstrap/src/push.rs buffers the entire Docker image tar three times in memory before importing it into the gateway, producing ~11 GB peak allocation.
The three allocations in push_local_images():
1. `collect_export` (`push.rs:97–107`): streams `docker.export_images()` into a `Vec<u8>` (~3.7 GB)
2. `wrap_in_tar` (`push.rs:114–131`): copies that `Vec<u8>` into a second tar-wrapped `Vec<u8>` (~3.7 GB); both are live simultaneously, peak ~7.4 GB
3. `upload_archive` (`push.rs:135–151`): calls `Bytes::copy_from_slice(archive)`, creating a third copy (~3.7 GB); all three overlap in scope, peak ~11 GB
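The peak figures above are simple accumulation. As a rough check, the sketch below recomputes them from the 3745 MiB reported in the progress output, assuming each stage holds one full-size copy and nothing is dropped before the next allocation. It ignores tar framing overhead and any spare `Vec` capacity, and the function name is hypothetical.

```rust
const MIB: u64 = 1024 * 1024;

/// Live heap bytes after each stage, assuming every stage allocates one
/// full-size copy while all earlier copies are still alive.
fn cumulative_live_bytes(image_bytes: u64, stages: u64) -> Vec<u64> {
    (1..=stages).map(|n| n * image_bytes).collect()
}

/// Convenience wrapper for the three stages described in this issue.
fn three_stage_peak_mib(image_mib: u64) -> u64 {
    cumulative_live_bytes(image_mib * MIB, 3)
        .last()
        .copied()
        .unwrap_or(0)
        / MIB
}
```

For a 3745 MiB image this gives 3745, 7490, and 11235 MiB after stages one, two, and three, which matches the ~3.7 GB, ~7.4 GB, and ~11 GB estimates.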
Expected: the export → tar wrap → upload pipeline streams data in O(chunk) memory.
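One possible shape for an O(chunk) pipeline is sketched below: spool the export to a file in fixed-size chunks, learn its length, then replay the file to the uploader through the same kind of buffer. This is a synchronous, std-only stand-in, not a patch for `push.rs`. The real code is async and would likely use `tempfile` and a stream body; `copy_chunked`, `spool_then_stream`, the `CHUNK` size, and the caller-supplied spool path are all hypothetical.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Buffer size for each copy step (an arbitrary choice for this sketch).
const CHUNK: usize = 64 * 1024;

/// Copy `src` to `dst` through one fixed-size buffer; returns bytes copied.
fn copy_chunked<R: Read, W: Write>(src: &mut R, dst: &mut W) -> io::Result<u64> {
    let mut buf = vec![0u8; CHUNK];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}

/// Spool an export stream of unknown length to `spool_path`, then replay it
/// into `upload`. Heap use stays at one `CHUNK` buffer per copy step.
fn spool_then_stream<R: Read, W: Write>(
    export: &mut R,
    spool_path: &Path,
    upload: &mut W,
) -> io::Result<u64> {
    let mut spool = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(spool_path)?;
    let size = copy_chunked(export, &mut spool)?;
    // `size` is known at this point, so a tar header could be emitted to
    // `upload` here before the payload is replayed.
    spool.seek(SeekFrom::Start(0))?;
    let sent = copy_chunked(&mut spool, upload)?;
    drop(spool);
    fs::remove_file(spool_path)?;
    debug_assert_eq!(size, sent);
    Ok(sent)
}
```

The trade-off is temporary disk usage equal to the image size in exchange for bounded heap usage. This sketch does not clean up the spool file if an error occurs partway through; a `tempfile`-managed file would handle that on drop.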
Reproduction Steps
- Build a large sandbox image (≥ 3 GB uncompressed): `openshell sandbox create --from sandboxes/gemini`
- Observe output: `[progress] Exported 3745 MiB Killed`
- Exit code is `137` (SIGKILL from OOM killer).
Environment
- Image size: ~3.7 GB (base sandbox + `@google/gemini-cli@0.34.0`)
- OS: Linux
- Docker: Docker Engine 28.2.2
- OpenShell: 0.0.0 (output of `openshell --version`)
- Host RAM: 23 GB total, ~13 GB available at time of failure
- Swap: 8 GB total, ~12 MB free at time of failure
Logs
Out of memory: Killed process (openshell) total-vm:19913320kB, anon-rss:13626992kB
Agent-First Checklist
- I pointed my agent at the repo and had it investigate this issue
- I loaded relevant skills (e.g., `debug-openshell-cluster`, `debug-inference`, `openshell-cli`)
- My agent could not resolve this; the diagnostic above explains why