gguf: parser for percentage-mixed GGUF filenames #2170
Draft
mishig25 wants to merge 2 commits into
Adds parseGGUFQuantMix for GGUF filenames that encode a per-tensor-class
byte-share recipe rather than a single quant label, e.g.
DeepSeek-V4-Flash-55IQ2_XXS-34Q2_K-07Q8_0-03F16-imatrix.gguf
Returns { components, dominant } or undefined for plain single-quant
filenames (so it composes cleanly with parseGGUFQuantLabel).
Also extends parseGGUFQuantLabel: when a mix is detected, return the
dominant (largest-pct) component rather than the file-order last match
(which would surface the smallest tail — F16 in the example above).
Originating use case: huggingface.co/antirez/deepseek-v4-gguf, where
DeepSeek V4 Flash is shipped with an asymmetric MoE recipe (routed
experts at IQ2_XXS / Q2_K, shared experts and attention projections at
Q8_0, embed / router at F16). No single LLAMA_FTYPE_MOSTLY_* label
captures the file's behavior, hence the per-quant breakdown.
Implementation notes:
- delimited lookbehind / lookahead so size labels like "7B" / "8B"
aren't misread as components;
- quant alternation length-sorted so "IQ2_XXS" wins over the shorter
"IQ2_XS";
- path prefix stripped before parsing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
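The implementation notes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the quant list is a hypothetical subset, the delimiter set `[-_.]`, the 1-3 digit percentage, and the "at least two components" rule are assumptions, and `parseQuantMixSketch` is an illustrative name rather than the exported `parseGGUFQuantMix`.

```typescript
// Hypothetical subset of ggml quant names, for illustration only.
const QUANT_NAMES: string[] = ["F16", "F32", "BF16", "Q2_K", "Q4_K", "Q8_0", "IQ2_XS", "IQ2_XXS"];

interface MixComponent {
	pct: number;
	quant: string;
}

interface QuantMix {
	components: MixComponent[];
	dominant: MixComponent;
}

// Longest names first, so a longer name is always tried before a shorter one.
const ALTERNATION = [...QUANT_NAMES].sort((a, b) => b.length - a.length).join("|");

// A component is <pct><quant> and must sit between delimiters (or string edges).
// The delimiter set [-_.] is an assumption of this sketch.
const COMPONENT_SOURCE = `(?<=^|[-_.])(\\d{1,3})(${ALTERNATION})(?=$|[-_.])`;

function parseQuantMixSketch(filename: string): QuantMix | undefined {
	// Strip any "/"-separated path prefix so directory names cannot contribute components.
	const base = filename.slice(filename.lastIndexOf("/") + 1);
	// Fresh RegExp per call: a global regex carries lastIndex state between exec() calls.
	const re = new RegExp(COMPONENT_SOURCE, "g");
	const components: MixComponent[] = [];
	let m: RegExpExecArray | null;
	while ((m = re.exec(base)) !== null) {
		components.push({ pct: Number(m[1]), quant: m[2] });
	}
	// Assumption: a mix needs at least two components; otherwise it is a plain name.
	if (components.length < 2) {
		return undefined;
	}
	// Dominant = largest percentage; on a tie the earliest component is kept.
	const dominant = components.reduce((best, c) => (c.pct > best.pct ? c : best));
	return { components, dominant };
}
```

With these assumptions, the example filename from the commit message yields four components with `IQ2_XXS` (55) as the dominant one, while a plain name such as `Llama-3-8B-Q8_0.gguf` yields `undefined` because no token has a percentage prefix.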
8bf5a36 to
a6e16c8
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit a6e16c8.
pcuenca
approved these changes
May 13, 2026
Member
Looks good based on the tests
GGMLQuantizationType includes I8/I16/I32/I64/F64 alongside the real
quantization types. Those are integer/float storage types for metadata
tensors, not quant methods. Including them in the mix component
alternation meant a filename containing two dash-delimited tokens like
"-32I32-" and "-16I16-" would be wrongly recognized as a mix recipe.
Filter them out before building GGUF_QUANT_MIX_COMPONENT_RE and add a
regression test (incl. a real-world-shaped check that F64 is dropped
while F32 + Q8_0 in the same filename still parse as a mix).
Reported by Cursor Bugbot on PR huggingface#2170.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
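The fix described in this commit can be illustrated with a small sketch. It is not the PR's code: `ALL_TYPE_NAMES` is an abbreviated, hypothetical stand-in for the names in GGMLQuantizationType, the regex shape and delimiter set are assumptions, and the helper names are invented for the example. Only the excluded set (I8/I16/I32/I64/F64) comes from the commit message.

```typescript
// Abbreviated, hypothetical stand-in for the GGMLQuantizationType names.
const ALL_TYPE_NAMES: readonly string[] = ["F32", "F16", "Q8_0", "Q2_K", "IQ2_XXS", "I8", "I16", "I32", "I64", "F64"];

// Storage types named in the commit message: not quant methods.
const NON_QUANT_STORAGE: ReadonlySet<string> = new Set(["I8", "I16", "I32", "I64", "F64"]);

// Build the component pattern from every name that is not in `exclude`.
function buildComponentSource(names: readonly string[], exclude: ReadonlySet<string>): string {
	const alternation = names
		.filter((n) => !exclude.has(n)) // filter() copies, so the sort below does not mutate `names`
		.sort((a, b) => b.length - a.length)
		.join("|");
	// Delimiter set [-_.] and 1-3 digit percentages are assumptions of this sketch.
	return `(?<=^|[-_.])(\\d{1,3})(${alternation})(?=$|[-_.])`;
}

// Collect every <pct><quant> token the pattern finds in `name`.
function matchComponents(source: string, name: string): string[] {
	const re = new RegExp(source, "g");
	const out: string[] = [];
	let m: RegExpExecArray | null;
	while ((m = re.exec(name)) !== null) {
		out.push(m[0]);
	}
	return out;
}

const unfiltered = buildComponentSource(ALL_TYPE_NAMES, new Set<string>());
const filtered = buildComponentSource(ALL_TYPE_NAMES, NON_QUANT_STORAGE);
```

Under these assumptions, `matchComponents(unfiltered, "model-32I32-16I16.gguf")` finds two tokens and would look like a mix, while the filtered pattern finds none. For `model-50F32-40Q8_0-10F64.gguf` the filtered pattern keeps `50F32` and `40Q8_0` and drops the F64 token.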

Adds a parser for percentage-mixed GGUF filenames: files that ship a single artifact with tensors quantized to multiple ggml types and encode the per-type byte share in the name as <pct><quant> tokens.

Example: DeepSeek-V4-Flash-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf (antirez/deepseek-v4-gguf).

API
parseGGUFQuantMix returns { components, dominant }. Also exports GGUF_QUANT_MIX_COMPONENT_RE and the GGUFQuantMix / GGUFQuantMixComponent types.

parseGGUFQuantLabel delegates to it
When the filename looks like a mix, parseGGUFQuantLabel now returns the dominant component instead of the file-order last match (which today surfaces the smallest tail, F16 in the example above). Plain single-quant filenames are unchanged.

Notes
- Returns undefined for plain single-quant filenames so it composes cleanly with the existing fallback.
- Delimited lookbehind / lookahead so size labels like 7B aren't misread as components.
- Quant alternation is length-sorted so IQ2_XXS wins over the shorter IQ2_XS.

Note
Low Risk
Low risk: adds a new filename parser and adjusts parseGGUFQuantLabel output only for percentage-mixed names, with extensive unit test coverage; no GGUF binary parsing or data handling logic is changed.

Overview
Adds support for percentage-mixed GGUF filenames (e.g. Model-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf) by introducing parseGGUFQuantMix, GGUF_QUANT_MIX_COMPONENT_RE, and the GGUFQuantMix* types in @huggingface/tasks, and re-exporting them from packages/gguf.
Updates parseGGUFQuantLabel to detect these mixed names and return the dominant (largest %) quant instead of the last match, and adds comprehensive tests covering suffixes like -imatrix / -MTP, path prefixes, size-label false positives, and non-quant storage types.

Reviewed by Cursor Bugbot for commit f4a6f75. Bugbot is set up for automated code reviews on this repo.
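
The label behavior described above (dominant component for mixed names, last match otherwise) can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the quant list is a hypothetical subset, `lastMatchLabel` is a simplified stand-in for the existing fallback, and `quantLabelSketch` is an invented name rather than the real `parseGGUFQuantLabel`.

```typescript
// Hypothetical subset of quant names, for illustration only.
const QUANTS: string[] = ["IQ2_XXS", "Q2_K", "Q8_0", "F16"];
const ALT = [...QUANTS].sort((a, b) => b.length - a.length).join("|");

// Collect all matches of a pattern, using a fresh global RegExp each time.
function allMatches(source: string, text: string): RegExpExecArray[] {
	const re = new RegExp(source, "g");
	const out: RegExpExecArray[] = [];
	let m: RegExpExecArray | null;
	while ((m = re.exec(text)) !== null) {
		out.push(m);
	}
	return out;
}

// Simplified stand-in for the pre-existing behavior: last quant name in file order.
function lastMatchLabel(base: string): string | undefined {
	const hits = allMatches(`(${ALT})`, base);
	return hits.length > 0 ? hits[hits.length - 1][1] : undefined;
}

function quantLabelSketch(filename: string): string | undefined {
	const base = filename.slice(filename.lastIndexOf("/") + 1);
	// Assumed component shape: delimiter, 1-3 digit percentage, quant name, delimiter.
	const parts = allMatches(`(?<=^|[-_.])(\\d{1,3})(${ALT})(?=$|[-_.])`, base);
	if (parts.length >= 2) {
		// Mixed name: return the quant with the largest percentage.
		const dominant = parts.reduce((best, m) => (Number(m[1]) > Number(best[1]) ? m : best));
		return dominant[2];
	}
	// Plain name: unchanged fallback.
	return lastMatchLabel(base);
}
```

For the example filename, `lastMatchLabel` alone returns `F16` (the smallest tail), whereas `quantLabelSketch` returns `IQ2_XXS`; a plain name such as `Llama-3-8B-Q8_0.gguf` still resolves to `Q8_0`.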