Fix ROCm build/runtime naming and MTP model mapping #156

Open
adv0r wants to merge 2 commits into antirez:rocm from adv0r:codex/rocm-backend-name

adv0r commented May 15, 2026

Summary

  • add --rocm and --backend rocm as the canonical user-facing names for ROCm builds
  • keep --cuda and --backend cuda as compatibility aliases because the shared GPU backend still uses the CUDA enum internally
  • pass DS4_ROCM_BUILD to both the C and the HIP/CUDA compilation steps during make rocm
  • make the low-level GPU runtime logs emitted by the shared CUDA/HIP source file print ROCm/hipBLAS instead of CUDA/cuBLAS in ROCm builds
  • fix two host-side rsqrtf() build failures under ROCm by computing the hipBLAS alpha constants as 1.0f / sqrtf(...)
  • fix the CUDA/HIP MTP support-model path so a secondary MTP GGUF does not replace the primary model mapping state

Implementation notes

This intentionally does not rename the internal DS4_BACKEND_CUDA enum or the shared CUDA/HIP implementation symbols. The ROCm branch still maps HIP onto the existing CUDA graph backend internally, so this patch keeps the internal backend identity stable and adds a separate compile-time DS4_GPU_BACKEND_CLI_NAME for the user-facing CLI spelling.

The MTP fix addresses a separate runtime issue found while testing the optional MTP GGUF on ROCm:

  • the CUDA/HIP weight resolver has process-global state for the current full model map;
  • with --mtp, startup registered the primary model map and then registered the MTP model map;
  • the second registration made g_model_host_base point at the MTP file;
  • the full-map shortcut then treated unrelated model maps as directly device-readable whenever g_model_registered was true;
  • the primary model preload was also skipped when MTP was present.

The fix keeps the full-map shortcut scoped to the matching model_map, keeps caching checks scoped the same way, avoids promoting the MTP file to the CUDA/HIP global model map, and still prepares the primary model tensor cache when MTP is enabled. MTP weights can then be resolved through the existing per-range path instead of replacing the primary model state.

Validation

Built on AMD ROCm 7.2 / gfx1151 with:

make rocm ROCM_PATH=/opt/rocm-7.2.0 ROCM_ARCH=gfx1151 -j$(nproc)

Checked that ./ds4 --help, ./ds4-server --help, and ./ds4-bench --help list --rocm, --backend rocm, and the compatibility cuda backend value in ROCm builds.

Runtime smoke on a Radeon 8060S / gfx1151:

  • ./ds4 --inspect --backend rocm --ctx 32768 loads the 80.76 GiB GGUF and logs ROCm backend initialized on Radeon 8060S Graphics (gfx1151)
  • ./ds4 --backend rocm --ctx 32768 --nothink -n 32 --temp 0 -p ... returns DS4_OK
  • ./ds4-server --backend rocm --host 127.0.0.1 --port 18188 --ctx 32768 --kv-disk-dir /scratch/ds4-kv ... serves both /v1/chat/completions and /v1/messages
  • ./ds4-bench --backend rocm --ctx-start 2048 --ctx-max 2048 --ctx-alloc 4096 --gen-tokens 32 reports 39.33 prefill tok/s and 8.99 generation tok/s

MTP regression check on the same machine:

Before the last commit, a server started with this command booted but crashed on the first prompt with a ROCm GPU page fault:

./ds4-server \
  --model DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix.gguf \
  --backend rocm \
  --ctx 65536 \
  --mtp DeepSeek-V4-Flash-MTP-Q4K-Q8_0-F32.gguf \
  --mtp-draft 2

Crash signature:

Memory access fault by GPU node-1 ... Reason: Page not present or supervisor privilege.

After the fix:

  • ctx=16384 --mtp-draft 2 returned DS4 MTP FIX OK
  • ctx=65536 --mtp-draft 2 returned DS4 MTP 65K FIX OK
  • startup logs show the primary model tensor cache is prepared again with MTP enabled: ROCm startup model cache prepared 80.76 GiB of tensor spans

This only fixes the crash. On this APU, the MTP path was not a throughput win in my short benchmark: non-MTP 256-token decode was about 10.18 tok/s, while patched MTP was about 8.44 tok/s.
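For scale, the two decode rates quoted above put the patched MTP path roughly 17% below the non-MTP run (one short benchmark on one APU):

```latex
\frac{8.44}{10.18} \approx 0.829,
\qquad
\frac{10.18 - 8.44}{10.18} = \frac{1.74}{10.18} \approx 0.171
```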

adv0r force-pushed the codex/rocm-backend-name branch from 3116fc8 to 1d2f608 on May 15, 2026 07:10
adv0r marked this pull request as ready for review on May 15, 2026 07:11
adv0r force-pushed the codex/rocm-backend-name branch from 1d2f608 to 9960bd2 on May 15, 2026 07:51
adv0r force-pushed the codex/rocm-backend-name branch from 9960bd2 to 31b7425 on May 15, 2026 07:53
adv0r changed the title from "Add ROCm backend CLI alias" to "Fix ROCm build/runtime naming and MTP model mapping" on May 15, 2026