Drop agent from NPU_SEEDED_SLOTS — it's a GPU slot, not the NPU anchor #679

@thinmintdev

Description

Surfaced during the #662 container cutover.

slots/manager.py has NPU_SEEDED_SLOTS = ("agent", "stt-npu", "embed-npu"), classifying agent as the NPU FLM chat anchor. But on the live deployment (and per the cutover) agent is a GPU container slot (ace-saber MoE, device=gpu-rocm); the actual NPU FLM anchor is the separate npu slot (npu.toml, device=npu, gemma3-4b-FLM).

This misclassification had real fallout: the chat normalizer's DEFAULT_CHAINS never mapped hal0/agent (fixed in #677), and the resolver and loaded-model assumptions treated agent as an NPU slot.

Proposed

  • Drop agent from NPU_SEEDED_SLOTS, leaving ("stt-npu", "embed-npu").
  • Audit the few SEEDED_SLOTS + NPU_SEEDED_SLOTS call sites (expected-slots, reserved-names) for any reliance on agent being NPU.
  • Consider whether agent belongs in SEEDED_SLOTS (GPU chat-role) instead.
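
A rough sketch of the first two bullets, for illustration only. NPU_SEEDED_SLOTS and its current value come from this issue; NPU_SEEDED_SLOTS_BEFORE, SLOT_DEVICES, and misclassified_npu_slots are hypothetical names that do not exist in slots/manager.py, and the device labels are copied from the description above rather than read from the slot configs.

```python
# Current value, as quoted in this issue.
NPU_SEEDED_SLOTS_BEFORE = ("agent", "stt-npu", "embed-npu")

# Proposed value: the same tuple with "agent" dropped.
NPU_SEEDED_SLOTS = tuple(s for s in NPU_SEEDED_SLOTS_BEFORE if s != "agent")

# Assumption: device labels as stated in the description, hard-coded here
# for the sketch. A real audit would read them from the slot configs.
SLOT_DEVICES = {"agent": "gpu-rocm", "npu": "npu"}


def misclassified_npu_slots(npu_seeded, devices):
    """Return slots listed as NPU-seeded whose known device is not 'npu'.

    Slots with no entry in `devices` are skipped, since nothing is known
    about them in this sketch.
    """
    return [s for s in npu_seeded if s in devices and devices[s] != "npu"]


if __name__ == "__main__":
    print(misclassified_npu_slots(NPU_SEEDED_SLOTS_BEFORE, SLOT_DEVICES))
    print(misclassified_npu_slots(NPU_SEEDED_SLOTS, SLOT_DEVICES))
```

A check along these lines could run as a unit test during the call-site audit, so a GPU slot slipping back into the NPU tuple fails fast.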

Low risk; mostly a correctness/labeling cleanup now that #677/#678 made hal0/agent route to the GPU slot regardless.

Relates to #652, #662.
