Skip to content

feat(memory-map): container slot attribution via podman cgroup#672

Merged
thinmintdev merged 2 commits into
mainfrom
issue-660-memmap
Jun 8, 2026
Merged

feat(memory-map): container slot attribution via podman cgroup#672
thinmintdev merged 2 commits into
mainfrom
issue-660-memmap

Conversation

@thinmintdev

Copy link
Copy Markdown
Contributor

Summary

  • Backend: _container_cgroup_mem_bytes(slot_name) in capacity.py probes the live podman/docker cgroup (memory.current) for hal0-slot-<name>. build_per_slot tries this first for every non-NPU resident slot; returns 0 for lemond slots (no container exists) so the lemond file-size + KV estimate path is unchanged.
  • UI: slotLegendSub() in memory-map.jsx shows image tag (truncated to last 32 chars) or profile slug for container slots instead of the uninformative device=rocm token. Both sidebar and expanded legend variants updated.
  • Tests: 12 unit tests for the cgroup probe (happy path, no runtime, container absent, timeout, cgroupv1, docker fallback) + 2 new Playwright specs in memory-map-v3.spec.ts.

Closes #660
Parent: #652

Test plan

  • pytest tests/slots/test_container_cgroup_mem.py -q — 12 passed
  • ruff check + format --check — clean
  • UI build (vite build) — no errors
  • playwright test memory-map-v3.spec.ts — 9 passed (includes 2 new container specs)

🤖 Generated with Claude Code

)

Populate per-slot mem_mb for container slots from the podman/docker
cgroup (memory.current) so the memory map shows real resident bytes
instead of a file-size estimate.

Backend (capacity.py):
- Add `_container_cgroup_mem_bytes(slot_name)`: resolves podman/docker
  runtime via shutil.which, runs `<runtime> inspect -f {{.State.Pid}}
  hal0-slot-<name>`, walks to cgroupv2 memory.current. Returns 0 on
  any error (container absent = lemond slot, fallback is automatic).
- `build_per_slot` path 2: try cgroup probe for every non-NPU slot;
  use result if >0 (no KV estimate — cgroup already includes all
  resident bytes). Path 3 (lemond file-size + KV estimate) unchanged.

UI (memory-map.jsx):
- Add `slotLegendSub(slot)`: container slots show image tag (truncated
  to last 32 chars) or profile slug in legend; lemond slots keep
  device token as before.
- Expand slot model with `isContainer`, `image`, `profile` fields.
- Both sidebar and expanded legend variants use slotLegendSub.

Tests:
- 12 unit tests for _container_cgroup_mem_bytes (happy path, no
  runtime, container absent, timeout, cgroupv1, unreadable, docker
  fallback) and build_per_slot container vs lemond vs NPU paths.
- 2 Playwright specs: container legend shows image tag (HAL0_DATA
  injection via addInitScript); pool bar has ≥2 segments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@thinmintdev thinmintdev left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VERDICT: APPROVE-ON-GREEN — merge-ready once python matrix + γ-suite pass

ui already green; python (3.11/3.12) + γ-suite pending. Value-parity for lemond/NPU is clean and the structure is sound. One correctness caveat (lead, non-blocking) requires CT105 validation; the cost note is secondary.

SPEC (vs #660): ACs met structurally.

  • _container_cgroup_mem_bytes: podman→docker runtime detect → inspect PID → /proc//cgroup (v2 unified) → /sys/fs/cgroup//memory.current. Error/timeout(1.5s)/no-runtime/PID≤0/cgroupv1(no "::")/unreadable all → 0. 12 unit tests cover every branch incl docker fallback.
  • build_per_slot: probe runs only for RESIDENT, non-NPU slots — NPU/FLM branch sets its row and continues BEFORE the probe (test_npu_slot_skips_cgroup_probe asserts the probe is never called). Lemond slots: probe returns 0 → falls through to the ORIGINAL file-size + KV registry estimate. Lemond + NPU mem_mb values are byte-unchanged — verified.
  • UI: attributeFallback (GTT-split / NPU-divide) is UNTOUCHED and still gated on hasMemMetrics (used only when no slot reports mem_mb). slotLegendSub change is legend SUB-LABEL TEXT only (image tag/profile for container, device token for lemond) — does not touch attribution math. GTT pool ceiling logic unchanged.

CAVEATS:

  1. (LEAD — correctness, non-blocking, needs CT105 validation) The "memory.current includes weights + KV" assumption is the riskiest claim and is unverified on this hardware. These are Strix Halo iGPU slots — model weights live in GTT (system RAM mapped via amdgpu/TTM). Whether GTT/DRM buffers are charged to the memory cgroup is kernel/driver-dependent and historically often NOT (the controller charges anon + page-cache, frequently not DRM buffers). --no-mmap (in every profile) makes the initial read anonymous+charged, but once llama.cpp copies tensors to HIP/GTT and frees the host copy, the resident ~22 GB may sit in GTT that memory.current does not count.
    Why this is sharper than a cost note: the if cgroup_bytes > 0: branch has NO sanity floor and continues — ANY positive value wins, even 2 GB. So a partial-accounting cgroup (e.g. ~2 GB CPU-side runtime while the model is ~22 GB) REPLACES the correct ~22 GB registry estimate with a wrong ~2 GB, i.e. under-reports container slots below what the old path showed — only EXACTLY 0 falls back. That directly threatens AC1 ("show resident+KV memory"). All 12 tests mock the cgroup value, so neither they nor green CI can catch this.
    VALIDATION (maintainer, post-merge on CT105): load the ace-saber container slot, compare /api/slots mem_mb against the ~22 GB GGUF size. If it reads single-digit GB, the probe is missing GTT and the >0 branch is degrading attribution. Cheap hardening: floor the cgroup value against the file-size estimate — mem = max(cgroup_mb, file_est_mb) — or treat an implausibly-small cgroup (well below model size) like 0 so the fallback fires. Fits the epics deferred-live-validation pattern (#666 tok/s, #661/#669 e2e), so APPROVE-ON-GREEN, but please run the CT105 check before relying on the container numbers.

  2. (P2 — cost) Unconditional inspect subprocess per resident non-NPU slot per memory-map poll. The probe is not gated on any container signal, so an all-lemond install forks a podman/docker inspect for every resident GPU slot on every dashboard poll (returns fast when the container is absent; the 1.5s timeout only bites if the runtime hangs). I confirmed theres no cheap container signal at this layer — the Slot object exposes only state/model_id/metadata/backend, not runtime/profile — so the authors "probe everything" is a defensible tradeoff, not an oversight. If metadata carries profile, gating on meta.get("profile") would skip the fork for lemond slots; otherwise plumb a runtime flag into the Slot. P2.

TRIVIA / OUT OF SCOPE:

  • Expanded-legend variant now shows · ≈ for approx lemond slots (old code was bare s.device) — trivially additive, cosmetic.
  • memory-map.jsx imports isSlotLive from slot-status.js (#668-inherited parity item) — NOT touched by this PR; out of scope, remains a #668 follow-up.

MERGE-READY: yes, on green python + γ. Caveat #1 is a post-merge CT105 validation + likely a small follow-up floor, not a code blocker; #2 is P2. Closes the #652 container-runtime epic chain.

#672 review: on Strix Halo (UMA) model weights live in GTT (system RAM
via amdgpu/TTM) and are often NOT charged to the process memory cgroup.
A live container can report a cgroup of only ~2 GB (runtime/buffers)
while holding a ~22 GB model. The previous `if cgroup_bytes > 0` used
the cgroup unconditionally, REPLACING the correct registry estimate
with a wrong-LOW value — under-reporting container memory.

Fix: compute BOTH the cgroup value and the registry file-size + KV
estimate for container slots, and use `max(cgroup_mb, estimate_mb)`.
When the cgroup accounts for weights it wins (>= estimate); when GTT
isn't charged the estimate wins → never under-reports. lemond/NPU
paths unchanged.

Tests: add under-report case (2 GB cgroup, 22 GB model -> estimate
wins) and cgroup-wins case (24 GB cgroup > 22 GB model -> cgroup wins).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thinmintdev thinmintdev merged commit 0533394 into main Jun 8, 2026
4 checks passed
@thinmintdev thinmintdev deleted the issue-660-memmap branch June 8, 2026 13:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory-map: container slot attribution via podman cgroup

1 participant