slots: assign+perf-tune chat=qwopus3.6-27b-v2 (vision) & agent=chadrock3.6-35b MoE (thinking off); retire gpu-rocmfp4* slots #634

@thinmintdev

What to build

Point the renamed slots at the new ROCmFP4+MTP models and tune for performance.

chat = qwopus3.6-27b-v2 (27B dense, ROCmFP4+MTP, vision via /mnt/ai-models/qwopus3.6-27b-v2/mmproj-F32.mmproj):

  • wire the mmproj for vision; keep --flash-attn on, --no-mmap, and the MTP draft args; set enable_thinking=true (reasoning available for Hermes).
  • bump ctx from 4096; proposed value 32768 (dense model, so the full KV cache applies; check memory headroom and record GTT).
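
A rough KV-cache estimate can help sanity-check the ctx bump before trying it. The sketch below uses the usual per-token formula for a dense transformer with a 2-byte (f16) cache. The layer and head dimensions are placeholders, not the real qwopus3.6-27b-v2 values (those would come from the gguf metadata), and it ignores the MTP draft head, the mmproj, and compute buffers.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx, bytes_per_elem=2):
    """K and V caches for one sequence: 2 tensors per layer, ctx tokens each."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

# Placeholder dimensions (assumption): replace with the model's gguf metadata.
DIMS = dict(n_layers=64, n_kv_heads=8, head_dim=128)

for ctx in (4096, 32768):
    gib = kv_cache_bytes(ctx=ctx, **DIMS) / 2**30
    print(f"ctx={ctx:>6}: ~{gib:.1f} GiB KV cache")
```

The cache scales linearly with ctx, so 4096 to 32768 is an 8x increase whatever the real dimensions turn out to be; with the placeholder values that is roughly 1 GiB to 8 GiB.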

agentchadrock3.6-35b-uncensored (35B-A3B MoE, MTP):

  • enable_thinking=false by default; tune -b/-ub, -ngl, and ctx (the MoE/hybrid KV cache leaves room for a larger ctx); add the MTP draft args if the MoE gguf supports --spec-type draft-mtp.
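
One way to approach the -b/-ub/ctx tuning is a small sweep grid. The sketch below only generates candidate argument lists; it does not launch anything. The grid values are hypothetical starting points, and the `-c` and `-ngl` spellings assume a llama.cpp-style server CLI, which this issue does not name.

```python
from itertools import product

def sweep_args(batches, ubatches, ctxs, ngl):
    """Yield argv fragments for every valid (-b, -ub, -c) combination."""
    for b, ub, ctx in product(batches, ubatches, ctxs):
        if ub > b:  # skip micro-batches larger than the logical batch
            continue
        yield ["-b", str(b), "-ub", str(ub), "-c", str(ctx), "-ngl", str(ngl)]

# Hypothetical starting grid (assumption); trim to what fits in memory.
grid = list(sweep_args(
    batches=[512, 1024, 2048],
    ubatches=[256, 512, 1024],
    ctxs=[32768, 65536],
    ngl=99,
))
print(len(grid), "candidate configs")
for args in grid[:3]:
    print(" ".join(args))
```

Each candidate would then be benchmarked with the same prompt set so that tok/s, TTFT, and GTT are comparable across runs.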

Retire/fold the now-redundant gpu-rocmfp4 / gpu-rocmfp4-moe slots (their tuned args move onto chat/agent).

Acceptance criteria

  • chat serves qwopus3.6-27b-v2 with working vision (image smoke test)
  • agent serves the MoE with thinking off by default
  • perf recorded (tok/s, TTFT, GTT) and acceptable
  • redundant gpu-rocmfp4* slots removed; CT105 verified
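
For the perf criterion, the three numbers can be captured along these lines. This is a minimal sketch, not the project's benchmark harness: it assumes the client records a monotonic timestamp per streamed token, and that GTT usage is read from the amdgpu sysfs counter on `card0` (the card index and path are assumptions about the host).

```python
def ttft_and_tps(request_start, token_times):
    """Return (TTFT seconds, decode tok/s) from monotonic timestamps.

    token_times[i] is the arrival time of the i-th streamed token.
    The decode rate excludes the first token so prompt processing
    does not skew it.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    decode_span = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / decode_span if decode_span > 0 else float("nan")
    return ttft, tps

def gtt_used_gib(path="/sys/class/drm/card0/device/mem_info_gtt_used"):
    """Read GTT usage in GiB; the sysfs path is an assumption (amdgpu, card0)."""
    with open(path) as f:
        return int(f.read().strip()) / 2**30

# Synthetic example: first token after 0.5 s, then one token every 0.05 s.
times = [10.5 + 0.05 * i for i in range(41)]
ttft, tps = ttft_and_tps(10.0, times)
print(f"TTFT={ttft:.2f}s  decode={tps:.1f} tok/s")
```

Sampling `gtt_used_gib()` before model load and again at peak context gives the GTT delta to record alongside tok/s and TTFT.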

Blocked by

  • slots: rename roles primary→chat, agent-hermes→agent

Metadata

Assignees

No one assigned

Labels

  • ready-for-human: Needs human implementation
  • slots: Slot roles / model assignment / perf tuning
  • v0.5: v0.5 scope, MCP admin + memory wiring across UI and agents
