feat(qwen35): derive scalars from weights, assert vs GGUF metadata #359
Merged
Load-time guard: after loading wq/wk, derive head_dim/n_head/n_head_kv from tensor shapes and assert against GGUF-declared values; set_last_error + return false on mismatch. Makes the stale-scalar-at-graph-build bug class impossible. DRY: extracted verify_derived_scalars() pure helper into server/src/common/derived_scalars.h (no IO, header-only); wired at both new sites (draft loader layer 0, qwen35 target first full-attn layer). gemma4 inline block is a silent override, not an assert; left as-is with a comment. Unit test: server/test/test_derived_scalars.cpp, 13 assertions, 0 failures.
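A minimal sketch of what a pure derive-and-compare helper of this kind could look like, plus the `set_last_error` + `return false` call-site pattern. Every name below (`DeclaredScalars`, `ObservedShapes`, `verify_scalars_sketch`, `check_layer_scalars`, `set_last_error_stub`) is hypothetical; this is not the PR's actual `verify_derived_scalars()` signature. It assumes ggml's convention that a projection weight's `ne[1]` is its output width, and that `head_dim` can be read off some per-head tensor; the PR does not say where the real helper gets it. `q_mult` lets the same check cover a packed Q‖gate projection.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical types; the real derived_scalars.h API may look different.
struct DeclaredScalars {   // values read from GGUF metadata
    int64_t head_dim;
    int64_t n_head;
    int64_t n_head_kv;
};

struct ObservedShapes {    // values read from loaded tensors
    int64_t head_dim;  // assumption: taken from a per-head tensor's ne[0]
    int64_t wq_rows;   // wq->ne[1], output width of the Q projection
    int64_t wk_rows;   // wk->ne[1], output width of the K projection
    int64_t q_mult;    // 1 = plain Q, 2 = Q and gate packed in one tensor
};

inline std::string mismatch_msg(const char* what, int64_t from_weights, int64_t from_gguf) {
    return std::string(what) + " mismatch: weights=" + std::to_string(from_weights) +
           " gguf=" + std::to_string(from_gguf);
}

// Pure check: no IO, no globals. Returns std::nullopt when shapes and
// metadata agree, otherwise a message describing the first mismatch found.
inline std::optional<std::string> verify_scalars_sketch(const DeclaredScalars& d,
                                                        const ObservedShapes& s) {
    if (s.head_dim <= 0 || s.q_mult <= 0) {
        return std::string("invalid head_dim or q_mult");
    }
    if (s.head_dim != d.head_dim) {
        return mismatch_msg("head_dim", s.head_dim, d.head_dim);
    }
    const int64_t q_width_per_head = s.head_dim * s.q_mult;
    if (s.wq_rows % q_width_per_head != 0 || s.wk_rows % s.head_dim != 0) {
        return std::string("projection width is not a multiple of head_dim");
    }
    const int64_t n_head = s.wq_rows / q_width_per_head;  // derived from wq
    const int64_t n_head_kv = s.wk_rows / s.head_dim;     // derived from wk
    if (n_head != d.n_head) {
        return mismatch_msg("n_head", n_head, d.n_head);
    }
    if (n_head_kv != d.n_head_kv) {
        return mismatch_msg("n_head_kv", n_head_kv, d.n_head_kv);
    }
    return std::nullopt;
}

// Hypothetical stand-in for the server's set_last_error().
inline std::string g_last_error;
inline void set_last_error_stub(const std::string& msg) { g_last_error = msg; }

// Call-site pattern: record the error and fail the load, so a graph is
// never built with scalars that disagree with the weights.
inline bool check_layer_scalars(const DeclaredScalars& d, const ObservedShapes& s) {
    if (auto err = verify_scalars_sketch(d, s)) {
        set_last_error_stub(*err);
        return false;
    }
    return true;
}
```

Keeping the comparison itself free of IO and globals is what makes it unit-testable in isolation; only the thin wrapper touches the error sink.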
Re-carved from #274 (commit 5819648), DRY'd into a shared helper plus the unit test the original lacked.

After loading weights, the qwen35 target loader and the dflash draft loader derive `head_dim`/`n_head`/`n_head_kv` from the actual weight-tensor shapes and assert against the GGUF-declared hparams; on mismatch → `set_last_error` + `return false` at load time, making the "stale scalar at graph-build time" bug class structurally impossible. Load-time only, no runtime cost; well-formed GGUFs pass through unchanged.

DRY: pure `verify_derived_scalars()` in `server/src/common/derived_scalars.h`, unit-tested (13 cases). The qwen35 target Q-projection packs Q‖gate (`ne[1] = n_head·n_embd_head_k·2`, per the loader's own contract); the draft loader uses the standard `n_head·head_dim`. gemma4 has an equivalent inline check on a different (unpark) path with different semantics; left as-is, noted in the header.

Validation: helper unit-tested (13 cases); both modified loader TUs compile clean against the new call sites. Real-GGUF end-to-end load not yet exercised; this is a defensive load-time check that only fires on a genuine weight-shape↔metadata mismatch.
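To make the Q‖gate distinction concrete, here is a small sketch of the two Q-projection layouts described for the target and draft loaders, with the factor of 2 as an explicit parameter. `QLayout`, `q_multiplier`, `expected_wq_rows`, and `derive_n_head` are hypothetical names, and the dimensions in the `static_assert`s are made up for illustration, not taken from any real checkpoint.

```cpp
#include <cstdint>

// Hypothetical enum: which Q-projection layout a given loader expects.
enum class QLayout {
    Plain,        // wq->ne[1] == n_head * head_dim (standard, as in the draft loader)
    PackedQGate,  // wq->ne[1] == n_head * head_dim * 2 (Q and gate packed, qwen35 target)
};

constexpr int64_t q_multiplier(QLayout layout) {
    return layout == QLayout::PackedQGate ? 2 : 1;
}

// Width the Q projection should have under a given layout.
constexpr int64_t expected_wq_rows(QLayout layout, int64_t n_head, int64_t head_dim) {
    return n_head * head_dim * q_multiplier(layout);
}

// Invert the formula; -1 signals that the width does not divide evenly.
constexpr int64_t derive_n_head(QLayout layout, int64_t wq_rows, int64_t head_dim) {
    const int64_t per_head = head_dim * q_multiplier(layout);
    if (per_head <= 0 || wq_rows % per_head != 0) {
        return -1;
    }
    return wq_rows / per_head;
}

// Illustrative numbers only.
static_assert(expected_wq_rows(QLayout::PackedQGate, 16, 256) == 8192, "packed width");
static_assert(derive_n_head(QLayout::PackedQGate, 8192, 256) == 16, "correct layout");
static_assert(derive_n_head(QLayout::Plain, 8192, 256) == 32, "wrong layout doubles n_head");
```

Read with the plain formula, the same 8192-wide packed tensor yields 32 heads instead of 16, a plausible-looking but wrong value. That is why, in this sketch, each call site states its layout explicitly instead of inferring it from the shape alone.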
5 files, +266.