feat(memory-map): container slot attribution via podman cgroup#672
Conversation
) Populate per-slot mem_mb for container slots from the podman/docker cgroup (memory.current) so the memory map shows real resident bytes instead of a file-size estimate. Backend (capacity.py): - Add `_container_cgroup_mem_bytes(slot_name)`: resolves podman/docker runtime via shutil.which, runs `<runtime> inspect -f {{.State.Pid}} hal0-slot-<name>`, walks to cgroupv2 memory.current. Returns 0 on any error (container absent = lemond slot, fallback is automatic). - `build_per_slot` path 2: try cgroup probe for every non-NPU slot; use result if >0 (no KV estimate — cgroup already includes all resident bytes). Path 3 (lemond file-size + KV estimate) unchanged. UI (memory-map.jsx): - Add `slotLegendSub(slot)`: container slots show image tag (truncated to last 32 chars) or profile slug in legend; lemond slots keep device token as before. - Expand slot model with `isContainer`, `image`, `profile` fields. - Both sidebar and expanded legend variants use slotLegendSub. Tests: - 12 unit tests for _container_cgroup_mem_bytes (happy path, no runtime, container absent, timeout, cgroupv1, unreadable, docker fallback) and build_per_slot container vs lemond vs NPU paths. - 2 Playwright specs: container legend shows image tag (HAL0_DATA injection via addInitScript); pool bar has ≥2 segments. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thinmintdev
left a comment
There was a problem hiding this comment.
VERDICT: APPROVE-ON-GREEN — merge-ready once python matrix + γ-suite pass
ui already green; python (3.11/3.12) + γ-suite pending. Value-parity for lemond/NPU is clean and the structure is sound. One correctness caveat (lead, non-blocking) requires CT105 validation; the cost note is secondary.
SPEC (vs #660): ACs met structurally.
- _container_cgroup_mem_bytes: podman→docker runtime detect → inspect PID → /proc//cgroup (v2 unified) → /sys/fs/cgroup//memory.current. Error/timeout(1.5s)/no-runtime/PID≤0/cgroupv1(no "::")/unreadable all → 0. 12 unit tests cover every branch incl docker fallback.
- build_per_slot: probe runs only for RESIDENT, non-NPU slots — NPU/FLM branch sets its row and
continues BEFORE the probe (test_npu_slot_skips_cgroup_probe asserts the probe is never called). Lemond slots: probe returns 0 → falls through to the ORIGINAL file-size + KV registry estimate. Lemond + NPU mem_mb values are byte-unchanged — verified. - UI: attributeFallback (GTT-split / NPU-divide) is UNTOUCHED and still gated on hasMemMetrics (used only when no slot reports mem_mb). slotLegendSub change is legend SUB-LABEL TEXT only (image tag/profile for container, device token for lemond) — does not touch attribution math. GTT pool ceiling logic unchanged.
CAVEATS:
-
(LEAD — correctness, non-blocking, needs CT105 validation) The "memory.current includes weights + KV" assumption is the riskiest claim and is unverified on this hardware. These are Strix Halo iGPU slots — model weights live in GTT (system RAM mapped via amdgpu/TTM). Whether GTT/DRM buffers are charged to the memory cgroup is kernel/driver-dependent and historically often NOT (the controller charges anon + page-cache, frequently not DRM buffers). --no-mmap (in every profile) makes the initial read anonymous+charged, but once llama.cpp copies tensors to HIP/GTT and frees the host copy, the resident ~22 GB may sit in GTT that memory.current does not count.
Why this is sharper than a cost note: theif cgroup_bytes > 0:branch has NO sanity floor andcontinues — ANY positive value wins, even 2 GB. So a partial-accounting cgroup (e.g. ~2 GB CPU-side runtime while the model is ~22 GB) REPLACES the correct ~22 GB registry estimate with a wrong ~2 GB, i.e. under-reports container slots below what the old path showed — only EXACTLY 0 falls back. That directly threatens AC1 ("show resident+KV memory"). All 12 tests mock the cgroup value, so neither they nor green CI can catch this.
VALIDATION (maintainer, post-merge on CT105): load the ace-saber container slot, compare /api/slots mem_mb against the ~22 GB GGUF size. If it reads single-digit GB, the probe is missing GTT and the >0 branch is degrading attribution. Cheap hardening: floor the cgroup value against the file-size estimate —mem = max(cgroup_mb, file_est_mb)— or treat an implausibly-small cgroup (well below model size) like 0 so the fallback fires. Fits the epics deferred-live-validation pattern (#666 tok/s, #661/#669 e2e), so APPROVE-ON-GREEN, but please run the CT105 check before relying on the container numbers. -
(P2 — cost) Unconditional
inspectsubprocess per resident non-NPU slot per memory-map poll. The probe is not gated on any container signal, so an all-lemond install forks a podman/dockerinspectfor every resident GPU slot on every dashboard poll (returns fast when the container is absent; the 1.5s timeout only bites if the runtime hangs). I confirmed theres no cheap container signal at this layer — the Slot object exposes only state/model_id/metadata/backend, not runtime/profile — so the authors "probe everything" is a defensible tradeoff, not an oversight. Ifmetadatacarriesprofile, gating onmeta.get("profile")would skip the fork for lemond slots; otherwise plumb a runtime flag into the Slot. P2.
TRIVIA / OUT OF SCOPE:
- Expanded-legend variant now shows
· ≈for approx lemond slots (old code was bare s.device) — trivially additive, cosmetic. - memory-map.jsx imports isSlotLive from slot-status.js (#668-inherited parity item) — NOT touched by this PR; out of scope, remains a #668 follow-up.
MERGE-READY: yes, on green python + γ. Caveat #1 is a post-merge CT105 validation + likely a small follow-up floor, not a code blocker; #2 is P2. Closes the #652 container-runtime epic chain.
#672 review: on Strix Halo (UMA) model weights live in GTT (system RAM via amdgpu/TTM) and are often NOT charged to the process memory cgroup. A live container can report a cgroup of only ~2 GB (runtime/buffers) while holding a ~22 GB model. The previous `if cgroup_bytes > 0` used the cgroup unconditionally, REPLACING the correct registry estimate with a wrong-LOW value — under-reporting container memory. Fix: compute BOTH the cgroup value and the registry file-size + KV estimate for container slots, and use `max(cgroup_mb, estimate_mb)`. When the cgroup accounts for weights it wins (>= estimate); when GTT isn't charged the estimate wins → never under-reports. lemond/NPU paths unchanged. Tests: add under-report case (2 GB cgroup, 22 GB model -> estimate wins) and cgroup-wins case (24 GB cgroup > 22 GB model -> cgroup wins). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
_container_cgroup_mem_bytes(slot_name)incapacity.pyprobes the live podman/docker cgroup (memory.current) forhal0-slot-<name>.build_per_slottries this first for every non-NPU resident slot; returns 0 for lemond slots (no container exists) so the lemond file-size + KV estimate path is unchanged.slotLegendSub()inmemory-map.jsxshows image tag (truncated to last 32 chars) or profile slug for container slots instead of the uninformativedevice=rocmtoken. Both sidebar and expanded legend variants updated.memory-map-v3.spec.ts.Closes #660
Parent: #652
Test plan
pytest tests/slots/test_container_cgroup_mem.py -q— 12 passedruff check + format --check— cleanvite build) — no errorsplaywright test memory-map-v3.spec.ts— 9 passed (includes 2 new container specs)🤖 Generated with Claude Code