Surfaced during the #662 container cutover.
slots/manager.py has NPU_SEEDED_SLOTS = ("agent", "stt-npu", "embed-npu"), classifying agent as the NPU FLM chat anchor. But on the live deployment (and per the cutover) agent is a GPU container slot (ace-saber MoE, device=gpu-rocm); the actual NPU FLM anchor is the separate npu slot (npu.toml, device=npu, gemma3-4b-FLM).
This mis-classification had real fallout: the chat normalizer's DEFAULT_CHAINS never mapped hal0/agent (fixed in #677), and the resolver/loaded-model assumptions treated agent as NPU.
Proposed
- Drop agent from NPU_SEEDED_SLOTS → ("stt-npu", "embed-npu").
- Audit the few SEEDED_SLOTS + NPU_SEEDED_SLOTS call sites (expected-slots, reserved-names) for any reliance on agent being NPU.
- Consider whether agent belongs in SEEDED_SLOTS (GPU chat-role) instead.
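
A minimal sketch of the first two items, assuming the constant lives in slots/manager.py as quoted above. The `SLOT_DEVICES` map and the `misclassified_npu_slots` helper are hypothetical and exist only for illustration. The device values come from the description in this issue, not from the real slot configs.

```python
# Sketch only: the tuples mirror the constant quoted from slots/manager.py.
NPU_SEEDED_SLOTS_BEFORE = ("agent", "stt-npu", "embed-npu")
NPU_SEEDED_SLOTS = ("stt-npu", "embed-npu")  # proposed: agent dropped

# Hypothetical device map, filled in from this issue's description only.
SLOT_DEVICES = {
    "agent": "gpu-rocm",  # GPU container slot (ace-saber MoE)
    "npu": "npu",         # the actual NPU FLM anchor (npu.toml)
}


def misclassified_npu_slots(npu_seeded, slot_devices):
    """Return names listed as NPU-seeded whose known device is not 'npu'.

    Slots missing from slot_devices are not flagged, since their
    device is unknown in this sketch.
    """
    return [
        name
        for name in npu_seeded
        if slot_devices.get(name, "npu") != "npu"
    ]


if __name__ == "__main__":
    print(misclassified_npu_slots(NPU_SEEDED_SLOTS_BEFORE, SLOT_DEVICES))
    print(misclassified_npu_slots(NPU_SEEDED_SLOTS, SLOT_DEVICES))
```

A check along these lines could run as a unit test during the call-site audit, so any future slot whose device disagrees with its seeded group gets flagged early.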
Low risk; mostly a correctness/labeling cleanup now that #677/#678 made hal0/agent route to the GPU slot regardless.
Relates to #652, #662.