slots: assign+perf-tune chat=qwopus3.6-27b-v2 (vision) & agent=chadrock3.6-35b MoE (thinking off); retire gpu-rocmfp4* slots

## What to build
Point the renamed slots at the new ROCmFP4+MTP models and tune for performance.

**chat** ← `qwopus3.6-27b-v2` (27B dense, ROCmFP4+MTP, **vision** via `/mnt/ai-models/qwopus3.6-27b-v2/mmproj-F32.mmproj`):
- wire the mmproj for vision; keep `--flash-attn on --no-mmap` + the MTP draft args; `enable_thinking=true` (reasoning available for Hermes).
- bump ctx 4096 → propose **32768** (full dense KV — memory-aware; record GTT).

**agent** ← `chadrock3.6-35b-uncensored` (35B-A3B MoE, MTP):
- `enable_thinking=false` by default; tune `-b/-ub`, ngl, ctx (MoE/hybrid KV can go larger); add MTP draft args if the MoE gguf supports `--spec-type draft-mtp`.

Retire/fold the now-redundant `gpu-rocmfp4` / `gpu-rocmfp4-moe` slots (their tuned args move onto chat/agent).

## Acceptance criteria
- [ ] chat serves qwopus-27b with working **vision** (image smoke)
- [ ] agent serves the MoE with thinking **off** by default
- [ ] perf recorded (tok/s, TTFT, GTT) and acceptable
- [ ] redundant gpu-rocmfp4* slots removed; CT105 verified

## Blocked by
- slots: rename roles primary→chat/agent-hermes→agent


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

slots: assign+perf-tune chat=qwopus3.6-27b-v2 (vision) & agent=chadrock3.6-35b MoE (thinking off); retire gpu-rocmfp4* slots #634

What to build

Acceptance criteria

Blocked by

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

slots: assign+perf-tune chat=qwopus3.6-27b-v2 (vision) & agent=chadrock3.6-35b MoE (thinking off); retire gpu-rocmfp4* slots #634

Description

What to build

Acceptance criteria

Blocked by

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions