bfhp: pre-zero quantized padding in loader; add --gpu-host-import flag by doctorjei · Pull Request #9 · domvox/llama.cpp-turboquant-hip

doctorjei · 2026-04-26T04:40:25Z

Overview

Fix-forward for 4b95731 (merged as a9f3521). The earlier review-feedback commit moved padding-zeroing for external (bfhp) buffers from a GPU-side cudaMemset to a host-side memset through ctx->host_ptr, but that assumes the host mapping is writable, which doesn't always hold. It's fine for --hugepages (PROT_RW during load, PROT_R after) but not for file-mmap (PROT_R from the start), resulting in a SIGSEGV on any non-hugepages model load with bfhp active (yikes).

This Fix

Moves the zero-pass into the loader (where mapping writability and lifetime are known), removes the unsafe host memset from ggml-cuda.cu, and adds a --gpu-host-import flag so users can opt out (or explicitly opt in) for non-hugepages mmap loads.

The fix is broken into 3 commits for readability:

adds GPU host import switch (bfhp): Pure plumbing; --gpu-host-import / --no-gpu-host-import CLI flags (env LLAMA_ARG_GPU_HOST_IMPORT), and common→model wiring. No behavioral change on its own.
adds zero-fill protection for buffers from host pages in the model loader: walks the tensors backed by mapping idx and zeros out each one's padding slice in the host mapping. Early-returns for hugetlb mappings (the kernel already zero-fills the anonymous allocation). For file-mmap, does a scoped mprotect(PROT_RW) → memset → restore.
streamlines zero-fill to avoid regression: init_tensor no longer host-memsets external buffers (that now happens in the loader), so the external branch collapses to a no-op and the surviving cudaMemset only runs for owned device buffers. Replaces the four NULL vtable slots in the imported buffer interface with GGML_ABORT wrappers.
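
For reviewers who want to see the shape of the loader-side zero-pass, here is a minimal sketch of the scoped mprotect → memset → restore step. It is not the code from this PR: the helper name `zero_padding_in_mapping` and the `pages_known_zero` parameter are hypothetical, and it assumes a POSIX system and a mapping whose protection can legally be raised to writable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper (illustration only, not the PR's implementation):
// zero `len` bytes at `addr` inside a host mapping that is currently read-only.
//   pages_known_zero: the caller knows the range is already clean
//                     (e.g. a hugetlb mapping), so the pass is skipped.
// Returns false if either mprotect call fails.
static bool zero_padding_in_mapping(void * addr, size_t len, bool pages_known_zero) {
    assert(addr != nullptr);
    if (len == 0 || pages_known_zero) {
        return true;  // nothing to do
    }

    // mprotect works on whole pages, so widen the range to page boundaries.
    const uintptr_t page = (uintptr_t) sysconf(_SC_PAGESIZE);
    const uintptr_t beg  = (uintptr_t) addr & ~(page - 1);
    const uintptr_t end  = ((uintptr_t) addr + len + page - 1) & ~(page - 1);

    // Temporarily allow writes, zero only the requested bytes, then restore.
    if (mprotect((void *) beg, end - beg, PROT_READ | PROT_WRITE) != 0) {
        return false;
    }
    std::memset(addr, 0, len);
    return mprotect((void *) beg, end - beg, PROT_READ) == 0;
}
```

One caveat for anyone adapting this: on a file-backed mapping, raising the protection to writable only succeeds if the mapping is private or the file descriptor was opened for writing, so the failure return needs to be handled rather than assumed away.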

Results (Benchmarks)

| build | path | PP t/s | TG t/s | notes |
|---|---|---|---|---|
| pre-bfhp `73a6481` | mmap | (baseline) | (baseline) | |
| this PR | mmap, `--no-gpu-host-import` | ≈ baseline | ≈ baseline | bfhp disabled, loader zero-pass skipped |
| this PR | mmap, `--gpu-host-import` | ≈ baseline | −5.2% TG | bfhp on hipMalloc-cheaper path |
| this PR | `--hugepages` (AUTO ⇒ on) | ≈ baseline | +4.0% TG | one physical copy of weights, +17.5 GiB free VRAM |

Considerations

The --gpu-host-import and --no-gpu-host-import flags were added because, at the moment, host import is only performant with the hugepages implementation. However, it is likely that this solution could be extended to solve problems with other architectures, and there may be cases where a user or developer needs or wants to use imported host pages without hugepages.

Due to the performance issue of bfhp without hugepages, the AUTO setting for bfhp defaults to match hugepages: the two are on or off together.
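
The AUTO behavior described above can be sketched as a small tri-state resolution. The enum and function names here are hypothetical and only illustrate the rule; the actual wiring in the PR may look different.

```cpp
// Hypothetical tri-state for the bfhp switch; names are illustrative only.
enum class gpu_host_import { AUTO, ON, OFF };

// Resolve the effective bfhp setting:
//   ON / OFF : explicit --gpu-host-import / --no-gpu-host-import wins
//   AUTO     : follow the hugepages setting
constexpr bool bfhp_enabled(gpu_host_import mode, bool hugepages) {
    return mode == gpu_host_import::ON  ? true
         : mode == gpu_host_import::OFF ? false
         : hugepages;
}
```

With this rule, a plain mmap load leaves bfhp off unless the user asks for it, which matches the benchmark rows above where the explicit flag is needed to exercise bfhp on the mmap path.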

Jeremiah Blanchard added 3 commits April 25, 2026 23:14

Adds GPU host import switch (bfhp)

df6660a

bfhp = buffer from host pointer. First commit to add support for the flag; implementation to follow.

Add zero-fill protection for buffers from host pages

f4c0208

When bfhp is active, add zero-fill protection; collapses to no-op when it is known that pages are already "clean".

Streamline zero-fill to avoid regression

92a824a

Activated more nuanced zero-fill to avoid penalty when zero-fill is already known (e.g., hugepages)

doctorjei wants to merge 3 commits into domvox:main from doctorjei:tq-hip-bfhp-pr
