EPIC: containerized GPU backend (podman + profiles) + dashboard rewire #652

## What to build
Move GPU LLM slots from Lemonade-forked baremetal binaries to **podman containers** built by the nightly toolbox fork, selected via **profiles** (image + bench-tuned flags), dispatched through hal0's existing remote-upstream proxy. Then rewire the dashboard to a hybrid (container + lemond) model.
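A minimal sketch of how a profile (image + flag bundle) could be turned into a `podman run` invocation for one slot. Everything named here is an assumption for illustration: the `Profile` fields, the `hal0-slot-<slot>` naming scheme, the port numbers, and the ROCm device nodes (`/dev/kfd`, `/dev/dri`). The real shape is whatever the design doc specifies. The server flags are treated as an opaque list, since the bench-tuned values live in the bench doc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: a shared image plus a bundle of server flags."""
    name: str
    image: str
    server_flags: tuple[str, ...]


def podman_run_argv(slot: str, profile: Profile, host_port: int,
                    container_port: int = 8080) -> list[str]:
    """Build the argv for starting the one container owned by `slot`."""
    return [
        "podman", "run", "--detach",
        "--replace",                        # swap out a stale container with the same name
        "--name", f"hal0-slot-{slot}",      # assumed naming scheme, one container per slot
        "--device", "/dev/kfd",             # assumed ROCm device nodes
        "--device", "/dev/dri",
        "--publish", f"127.0.0.1:{host_port}:{container_port}",
        profile.image,
        *profile.server_flags,              # passed through to the image's entrypoint
    ]
```

Publishing on `127.0.0.1` keeps the container reachable only from the host, which fits the idea of hal0's remote-upstream proxy being the single entry point; that binding is also an assumption.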

Design: `hal0-container-runtime-design-2026-06-08.md`. Bench basis: `hal0-container-bench-2026-06-08.md`.

Decisions locked:
- Container runtime (bench parity: 52.8 tok/s in a container vs 53.6 tok/s baremetal).
- Profiles are flag bundles on shared images.
- podman is the container engine.
- Each slot owns its container (1:1).
- Phase 1 covers GPU LLM slots only: lemond keeps embed/rerank/stt/tts, and NPU/FLM is untouched.
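The phase-1 scope can be read as a small routing rule. The sketch below is one way to express it. The `kind` and `device` strings and the return labels are hypothetical, not names taken from hal0.

```python
# Slot kinds that stay on lemond in phase 1, per the locked decisions.
LEMOND_KINDS = frozenset({"embed", "rerank", "stt", "tts"})


def backend_for(kind: str, device: str) -> str:
    """Return which backend serves a slot under the phase-1 scope."""
    if device == "gpu" and kind == "llm":
        return "container"   # GPU LLM slots move to podman containers
    if kind in LEMOND_KINDS:
        return "lemond"      # lemond keeps embed/rerank/stt/tts
    return "unchanged"       # e.g. NPU/FLM slots are not touched
```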

Per-slot optimal profiles (from the bench):
- agent (35B MoE, ace-saber): `moe-rocmfp4`, MTP off, ~52.8 tok/s
- chat (27B dense, qwopus): `dense-mtp-rocmfp4`, MTP on, ~24.4 tok/s
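For reference, the bench picks above could be captured as a slot-to-profile table along these lines. The dictionary layout and the `profile_for` helper are illustrative assumptions. Only the slot names, model names, profile names, MTP settings, and throughput figures come from the bench summary.

```python
# Bench-selected profile per phase-1 slot (values from the bench summary).
SLOT_PROFILES = {
    "agent": {"model": "ace-saber", "profile": "moe-rocmfp4",
              "mtp": False, "bench_tok_s": 52.8},
    "chat": {"model": "qwopus", "profile": "dense-mtp-rocmfp4",
             "mtp": True, "bench_tok_s": 24.4},
}


def profile_for(slot: str) -> dict:
    """Look up the bench-selected profile; unknown slots are out of phase-1 scope."""
    try:
        return SLOT_PROFILES[slot]
    except KeyError:
        raise KeyError(f"slot {slot!r} has no container profile in phase 1") from None
```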

## Child slices
Tracked as separate issues (see Blocked-by chains). This epic is the umbrella; do not close it until all children are merged and the live cutover is done.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
