Fix ROCm build/runtime naming and MTP model mapping#156
Open
adv0r wants to merge 2 commits into
Open
Conversation
3116fc8 to
1d2f608
Compare
1d2f608 to
9960bd2
Compare
9960bd2 to
31b7425
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
--rocmand--backend rocmas the canonical user-facing names for ROCm builds--cudaand--backend cudaas compatibility aliases because the shared GPU backend still uses the CUDA enum internallyDS4_ROCM_BUILDto both C and HIP/CUDA compilation duringmake rocmrsqrtf()build failures by using1.0f / sqrtf(...)for hipBLAS alpha constantsImplementation notes
This intentionally does not rename the internal
DS4_BACKEND_CUDAenum or the shared CUDA/HIP implementation symbols. The ROCm branch still maps HIP onto the existing CUDA graph backend internally, so this patch keeps the internal backend identity stable and adds a separate compile-timeDS4_GPU_BACKEND_CLI_NAMEfor the user-facing CLI spelling.The MTP fix addresses a separate runtime issue found while testing the optional MTP GGUF on ROCm:
--mtp, startup registered the primary model map and then registered the MTP model map;g_model_host_basepoint at the MTP file;g_model_registeredwas true;The fix keeps the full-map shortcut scoped to the matching
model_map, keeps caching checks scoped the same way, avoids promoting the MTP file to the CUDA/HIP global model map, and still prepares the primary model tensor cache when MTP is enabled. MTP weights can then be resolved through the existing per-range path instead of replacing the primary model state.Validation
Built on AMD ROCm 7.2 /
gfx1151with:make rocm ROCM_PATH=/opt/rocm-7.2.0 ROCM_ARCH=gfx1151 -j$(nproc)Checked
./ds4 --help,./ds4-server --help, and./ds4-bench --helpexpose--rocm,--backend rocm, and the compatibilitycudabackend value in ROCm builds.Runtime smoke on a Radeon 8060S /
gfx1151:./ds4 --inspect --backend rocm --ctx 32768loads the 80.76 GiB GGUF and logsROCm backend initialized on Radeon 8060S Graphics (gfx1151)./ds4 --backend rocm --ctx 32768 --nothink -n 32 --temp 0 -p ...returnsDS4_OK./ds4-server --backend rocm --host 127.0.0.1 --port 18188 --ctx 32768 --kv-disk-dir /scratch/ds4-kv ...serves both/v1/chat/completionsand/v1/messages./ds4-bench --backend rocm --ctx-start 2048 --ctx-max 2048 --ctx-alloc 4096 --gen-tokens 32reports 39.33 prefill tok/s and 8.99 generation tok/sMTP regression check on the same machine:
Before the last commit, this command shape booted but crashed on first prompt with a ROCm GPU page fault:
Crash signature:
After the fix:
ctx=16384 --mtp-draft 2returnedDS4 MTP FIX OKctx=65536 --mtp-draft 2returnedDS4 MTP 65K FIX OKROCm startup model cache prepared 80.76 GiB of tensor spansThis only fixes the crash. On this APU, the MTP path was not a throughput win in my short benchmark: non-MTP 256-token decode was about 10.18 tok/s, while patched MTP was about 8.44 tok/s.