feat: container image-pull progress SSE + UI pull bar#673
Conversation
Backend:
- ContainerProvider.image_present(): inspect syscall for present/missing check
- ContainerProvider.pull_image_stream(): async generator yielding layer progress
from docker/podman pull stdout (Pulling fs layer → Pull complete heuristic)
- _container_state_enrichment: image_status (present|pulling|missing) per slot,
reads active slot_pull_jobs registry first, falls back to image_present()
- POST /api/slots/{name}/pull (202): start background image pull, idempotent
- GET /api/slots/{name}/pull/stream (SSE): 0.5s poll loop; terminal frame
(present|missing) when no pull active; state|layer|total_layers frames in-flight
- app.state.slot_pull_jobs registry initialised in lifespan
- endpoints.ts: slotPull + slotPullStream added
Frontend:
- useSlotImagePull() hook (useSlots.ts): mirrors usePullJob pattern, owns one
EventSource; start(name) POSTs then opens SSE stream; invalidates ['slots']
on terminal state
- SlotImagePullBar (slots.jsx): shows when image_status==="pulling"; aria-live
polite; label ends "…"; indeterminate bar animation
- ImagePullBar (slot-modals.jsx): layer N/M progress + pct bar for active pull
- ErrorSlotCardBanner Re-pull button: wired to pull.start(slot.name), disabled
while in-flight, shows inline ImagePullBar during/after pull
- @Keyframes hal0-indeterminate in dashboard.css
Tests (tests/api/test_slots_image_pull.py): 12 tests covering image_status
fields, POST pull idempotency, SSE stream terminal frames, image_present unit
tests with fake runtime scripts, pull_image_stream completed/failed frames.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thinmintdev
left a comment
There was a problem hiding this comment.
VERDICT: APPROVE-ON-GREEN — merge-ready once the python Test substep + γ-suite pass
ui green; python Lint + Format check already GREEN on both 3.11/3.12 (confirmed — the #664 lint-failure class does NOT recur; note the docstring claims "marked ARG001 to suppress the linter" on _run_image_pull without a visible # noqa, but Lint passed so ruffs ARG ruleset isnt enabled here — moot). Only the Test substep remains pending → APPROVE-ON-GREEN. No overlap with #672 (capacity.py vs slots.py — they merge in parallel cleanly).
SPEC (vs #659): ACs met.
- image_status (present|pulling|missing) added in _container_state_enrichment, which already skips lemond slots — so image_status is CONTAINER-ONLY; lemond slots never get it (verified; tests cover present/missing/pulling + lemond absence). Pulling is read from slot_pull_jobs without an extra inspect syscall; otherwise inspect via image_present (executor-dispatched), error→"missing".
- POST /{name}/pull: 202, idempotent (existing pulling job → resumed:true, no second pull), 404 for unknown slot (await sm.status(name) first), BadRequest when no profile/image. BackgroundTasks runs _run_image_pull.
- GET /{name}/pull/stream SSE: immediate snapshot, per-layer frames, terminal frame; graceful present/missing terminal when no job active. app.state.slot_pull_jobs registered in lifespan.
- UI: SlotImagePullBar (card, image_status==="pulling", aria-live polite, indeterminate hal0-indeterminate), ImagePullBar (banner, N/M % + completed/failed), useSlotImagePull hook (POST→SSE, terminal closes stream + invalidates slots), Re-pull wired to pull.start (was dead __hal0Toast).
STANDARDS — clean:
- Lemond unaffected: image_status only on container slots; no new lemond fields. No fabricated data on lemond cards.
- No SSE leak: on client disconnect, CancelledError propagates through the await asyncio.sleep(0.5) and the generator stops; the background task OWNS the subprocess and reaps it via
finally: proc.kill(), so the pull continues to completion (intended — UI disconnect shouldnt abort a pull) with no subprocess leak. - Idempotency + 404 verified by tests; image_present/pull_image_stream unit-tested with fake-runtime scripts (completed + failed frames).
NON-BLOCKING NOTES:
- Layer N/M counting is a logic bug on BOTH runtimes (re: "no fabricated data") — not just unverified. The heuristic counts lifecycle lines as layers: one downloaded layer emits "Pulling fs layer" + "Waiting" + "Verifying Checksum" (each +1 total_layers) AND "Download complete" + "Pull complete" (each +1 done_layers), so N and M inflate by different factors → the ImagePullBar percentage is wrong on docker. On podman (the PREFERRED runtime) those keywords mostly dont match → total_layers≈0 → pct=null → indeterminate. "Downloading" progress lines are ignored, so the count doesnt advance during the actual ~6 GB transfer. Recommend: drop or caveat the N/M fraction in ImagePullBar and keep it indeterminate like the cards SlotImagePullBar (which is already honest). NON-BLOCKING because the AC-relevant card surface is already indeterminate and the banner self-heals to "Image ready".
- _ImagePullJob class docstring describes "an asyncio.Event used to wake SSE subscribers" but the implementation POLLS (asyncio.sleep(0.5) + layer-delta compare); no Event in slots. Stale doc — cosmetic.
- Minor UI/API (single-user LAN-acceptable): EventSource onerror immediately sets state="failed"+closes, so a transient network blip shows "failed" though the background pull continues (self-heals on next /api/slots poll); completed _ImagePullJob entries linger in slot_pull_jobs (bounded — one per slot, overwritten on re-pull); the stream endpoint emits "missing" for an unknown slot while POST returns 404 (minor inconsistency).
MERGE-READY: yes, on green python Test + γ. The notes are P2 polish (#1 worth a quick follow-up to stop showing a wrong layer fraction). Closes the #659 slice and the #652 container-runtime epic UI chain.
Summary
image_status(present|pulling|missing) per container slot in/api/slots;POST /api/slots/{name}/pullstarts a background docker/podman pull;GET /api/slots/{name}/pull/streamSSE streams layer progress (layer N/M) until terminalSlotImagePullBarin SlotCard (shows whenimage_status==="pulling");ImagePullBarwith layer N/M label inErrorSlotCardBanner;Re-pullbutton in error banner wired to real pull endpoint (was dead__hal0Toast);useSlotImagePull()hook mirrorsusePullJobpattern;@keyframes hal0-indeterminateCSS animationimage_present()unit tests with fake runtime scripts,pull_image_stream()completed/failed framesAcceptance criteria
state=presentimmediately)Test plan
PYTHONPATH=src pytest tests/api/test_slots_image_pull.py -q— 12 passruff check src/hal0/api/routes/slots.py src/hal0/providers/container.py tests/api/test_slots_image_pull.py— cleancd ui && npm run build— cleanPOST /api/slots/{name}/pull+GET /api/slots/{name}/pull/streamshould stream docker layer lines and completeCloses #659
Parent: #652
🤖 Generated with Claude Code