What to build
Point the renamed slots at the new ROCmFP4+MTP models and tune for performance.
chat ← qwopus3.6-27b-v2 (27B dense, ROCmFP4+MTP, vision via /mnt/ai-models/qwopus3.6-27b-v2/mmproj-F32.mmproj):
- wire the mmproj for vision; keep --flash-attn on --no-mmap + the MTP draft args; enable_thinking=true (reasoning available for Hermes).
- bump ctx from 4096 to a proposed 32768; the dense model keeps a full KV cache, so size this against available memory and record GTT usage.
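To make the memory impact of the ctx bump concrete, here is a rough KV-cache estimate. It is a sketch only: the layer count, KV head count, and head dimension below are hypothetical placeholders, not the real qwopus3.6-27b-v2 architecture, and it assumes standard attention in every layer with an f16 KV cache. Read the actual values from the gguf metadata before trusting the numbers.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size for a dense transformer.

    The leading factor of 2 covers keys and values; bytes_per_elem=2
    assumes an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical architecture values, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128

for n_ctx in (4096, 32768):
    size = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_ctx)
    print(f"ctx {n_ctx}: {size / 2**30:.1f} GiB")
```

With these placeholder values the cache grows from 1.0 GiB at ctx 4096 to 8.0 GiB at ctx 32768. The growth is linear in ctx, so the 8x bump costs 8x the KV memory whatever the real architecture turns out to be.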
agent ← chadrock3.6-35b-uncensored (35B-A3B MoE, MTP):
- enable_thinking=false by default.
- tune -b/-ub, ngl, ctx (MoE/hybrid KV can go larger).
- add MTP draft args if the MoE gguf supports --spec-type draft-mtp.
Retire/fold the now-redundant gpu-rocmfp4 / gpu-rocmfp4-moe slots (their tuned args move onto chat/agent).
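The resulting two-slot layout could be sketched as below. This is illustrative only: `SlotConfig` and `build_args` are hypothetical helpers, not the project's actual slot format; the `.gguf` file names and the agent ctx value are placeholders; and `--model`, `--mmproj`, and `--ctx-size` are assumed to be spelled as in llama.cpp's server. The `--flash-attn on --no-mmap` and `--spec-type draft-mtp` arguments are taken from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SlotConfig:
    """Hypothetical description of one serving slot."""
    name: str
    model_path: str
    ctx: int
    enable_thinking: bool
    mmproj_path: Optional[str] = None
    extra_args: List[str] = field(default_factory=list)


def build_args(slot: SlotConfig) -> List[str]:
    """Assemble a server argument list for a slot (flag names assumed)."""
    args = ["--model", slot.model_path, "--ctx-size", str(slot.ctx)]
    if slot.mmproj_path is not None:
        args += ["--mmproj", slot.mmproj_path]
    return args + slot.extra_args


chat = SlotConfig(
    name="chat",
    model_path="/mnt/ai-models/qwopus3.6-27b-v2/model.gguf",  # placeholder file name
    ctx=32768,  # proposed value from the ticket
    enable_thinking=True,
    mmproj_path="/mnt/ai-models/qwopus3.6-27b-v2/mmproj-F32.mmproj",
    extra_args=["--flash-attn", "on", "--no-mmap", "--spec-type", "draft-mtp"],
)

agent = SlotConfig(
    name="agent",
    model_path="/mnt/ai-models/chadrock3.6-35b-uncensored/model.gguf",  # placeholder path
    ctx=65536,  # placeholder; to be tuned
    enable_thinking=False,
    # MTP draft args are added only once the MoE gguf is confirmed to support them.
)

for slot in (chat, agent):
    print(slot.name, " ".join(build_args(slot)))
```

Keeping `enable_thinking` as a field rather than a flag leaves open how it is passed to the server, since the ticket does not say.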
Acceptance criteria
Blocked by
- slots: rename roles primary→chat/agent-hermes→agent