gguf: parser for percentage-mixed GGUF filenames #2170

Draft
mishig25 wants to merge 2 commits into
huggingface:mainfrom
mishig25:mishig/gguf-quant-mix-parser

Collaborator

@mishig25 mishig25 commented May 13, 2026

Adds a parser for percentage-mixed GGUF filenames — files that ship a single artifact with tensors quantized to multiple ggml types and encode the per-type byte share in the name as <pct><quant> tokens.

Example: DeepSeek-V4-Flash-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf (antirez/deepseek-v4-gguf).

API

parseGGUFQuantMix("DeepSeek-V4-Flash-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf")
// {
//   components: [{pct:55,quant:'IQ2_XXS'}, {pct:34,quant:'Q2_K'}, {pct:7,quant:'Q8_0'}, {pct:3,quant:'F16'}],
//   dominant:   {pct:55, quant:'IQ2_XXS'},
// }

Also exports GGUF_QUANT_MIX_COMPONENT_RE and GGUFQuantMix / GGUFQuantMixComponent types.

parseGGUFQuantLabel delegates to it

When the filename looks like a mix, parseGGUFQuantLabel now returns the dominant component instead of the file-order last match (which today surfaces the smallest tail — F16 in the example above). Plain single-quant filenames are unchanged.

parseGGUFQuantLabel("DeepSeek-V4-Flash-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf") // → "IQ2_XXS"
parseGGUFQuantLabel("Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf")               // → "Q4_K_M"  (unchanged)
parseGGUFQuantLabel("Qwen3-4B-UD-Q2_K_XL.gguf")                             // → "UD-Q2_K_XL"  (unchanged)
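
The delegation described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the interfaces are inferred from the example output, and `pickDominant` and `labelWithMix` are hypothetical helper names. The tie-breaking rule (earliest component wins) is also an assumption.

```typescript
// Hypothetical shapes, inferred from the example output in the PR description.
interface QuantMixComponent {
  pct: number;
  quant: string;
}

interface QuantMix {
  components: QuantMixComponent[];
  dominant: QuantMixComponent;
}

// Pick the component with the largest byte share.
// Assumption: on a tie, the earliest component in file order wins.
function pickDominant(components: QuantMixComponent[]): QuantMixComponent | undefined {
  let best: QuantMixComponent | undefined;
  for (const c of components) {
    if (best === undefined || c.pct > best.pct) {
      best = c;
    }
  }
  return best;
}

// Prefer the dominant quant of a detected mix; otherwise keep the
// label produced by the existing single-quant logic.
function labelWithMix(mix: QuantMix | undefined, fallbackLabel: string | undefined): string | undefined {
  return mix ? mix.dominant.quant : fallbackLabel;
}
```

Because the mix parser yields `undefined` for plain filenames, the single-quant path is untouched in this sketch: the fallback label passes straight through.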

Notes

  • parseGGUFQuantMix returns undefined for plain single-quant filenames, so it composes cleanly with the existing fallback in parseGGUFQuantLabel.
  • Delimited lookbehind / lookahead avoids false positives on size labels like 7B.
  • Quant alternation is length-sorted, so longer names such as IQ2_XXS are tried before shorter, similar-looking names such as IQ2_XS.
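
As a rough sketch of the three notes above, a component regex could be built like this. The quant list is an illustrative subset, the delimiter set (`-`, `.`, or string start/end) is an assumption, and `SORTED_QUANTS`, `COMPONENT_SOURCE` and `parseMixSketch` are hypothetical names rather than the exports of this PR.

```typescript
// Illustrative subset of quant names (assumption; the real list is larger).
const QUANTS = ["F16", "F32", "Q2_K", "Q8_0", "IQ2_XS", "IQ2_XXS"];

// Longest names first, so the alternation prefers the longest candidate.
const SORTED_QUANTS = QUANTS.slice().sort((a, b) => b.length - a.length);

// One component = 1-3 digits + quant name, delimited on both sides by
// "-", "." or the start/end of the string. The lookarounds do not consume
// the delimiter, so adjacent components can share a single "-".
const COMPONENT_SOURCE =
  "(?<=^|[-.])(\\d{1,3})(" + SORTED_QUANTS.join("|") + ")(?=$|[-.])";

function parseMixSketch(path: string): { pct: number; quant: string }[] | undefined {
  // Strip any path prefix so directory names cannot contribute matches.
  const name = path.split("/").pop() ?? path;

  const re = new RegExp(COMPONENT_SOURCE, "g");
  const components: { pct: number; quant: string }[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(name)) !== null) {
    components.push({ pct: Number(m[1]), quant: m[2] });
  }

  // Assumption: fewer than two components is not a mix, so return undefined
  // and let the single-quant logic handle the filename.
  return components.length >= 2 ? components : undefined;
}
```

In this sketch a size label like `8B` never matches, because the digits must be followed immediately by a known quant name and then by a delimiter.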

Note

Low Risk
Low risk: adds a new filename parser and adjusts parseGGUFQuantLabel output only for percentage-mixed names, with extensive unit test coverage; no GGUF binary parsing or data handling logic is changed.

Overview
Adds support for percentage-mixed GGUF filenames (e.g. Model-55IQ2_XXS-34Q2_K-07Q8_0-03F16.gguf) by introducing parseGGUFQuantMix, GGUF_QUANT_MIX_COMPONENT_RE, and the GGUFQuantMix* types in @huggingface/tasks, and re-exporting them from packages/gguf.

Updates parseGGUFQuantLabel to detect these mixed names and return the dominant (largest %) quant instead of the last match, and adds comprehensive tests covering suffixes like -imatrix/-MTP, path prefixes, size-label false positives, and non-quant storage types.

Reviewed by Cursor Bugbot for commit f4a6f75.

Adds parseGGUFQuantMix for GGUF filenames that encode a per-tensor-class
byte-share recipe rather than a single quant label, e.g.

  DeepSeek-V4-Flash-55IQ2_XXS-34Q2_K-07Q8_0-03F16-imatrix.gguf

Returns { components, dominant } or undefined for plain single-quant
filenames (so it composes cleanly with parseGGUFQuantLabel).

Also extends parseGGUFQuantLabel: when a mix is detected, return the
dominant (largest-pct) component rather than the file-order last match
(which would surface the smallest tail — F16 in the example above).

Originating use case: huggingface.co/antirez/deepseek-v4-gguf, where
DeepSeek V4 Flash is shipped with an asymmetric MoE recipe (routed
experts at IQ2_XXS / Q2_K, shared experts and attention projections at
Q8_0, embed / router at F16). No single LLAMA_FTYPE_MOSTLY_* label
captures the file's behavior, hence the per-quant breakdown.

Implementation notes:
- delimited lookbehind / lookahead so size labels like "7B" / "8B"
  aren't misread as components;
- quant alternation length-sorted so longer names such as "IQ2_XXS"
  are tried before shorter, similar-looking names such as "IQ2_XS";
- path prefix stripped before parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mishig25 mishig25 force-pushed the mishig/gguf-quant-mix-parser branch from 8bf5a36 to a6e16c8 Compare May 13, 2026 08:52
@mishig25 mishig25 marked this pull request as ready for review May 13, 2026 09:03

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit a6e16c8.

Comment thread packages/tasks/src/gguf.ts
Member

@pcuenca pcuenca left a comment

Looks good based on the tests

GGMLQuantizationType includes I8/I16/I32/I64/F64 alongside the real
quantization types. Those are integer/float storage types for metadata
tensors, not quant methods. Including them in the mix component
alternation meant a filename containing two dash-delimited tokens like
"-32I32-" and "-16I16-" would be wrongly recognized as a mix recipe.

Filter them out before building GGUF_QUANT_MIX_COMPONENT_RE and add a
regression test (incl. a real-world-shaped check that F64 is dropped
while F32 + Q8_0 in the same filename still parse as a mix).

Reported by Cursor Bugbot on PR huggingface#2170.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
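
The filtering described in this commit message might look roughly like the sketch below. The type list is an illustrative subset of names, the delimiter set is an assumption, and `quantOnly`, `buildComponentRe` and `countComponents` are hypothetical helpers, not the functions in packages/tasks/src/gguf.ts.

```typescript
// Illustrative subset of GGMLQuantizationType names (assumption).
const TYPE_NAMES = ["F32", "F16", "Q8_0", "Q2_K", "IQ2_XXS", "I8", "I16", "I32", "I64", "F64"];

// Storage types named in the commit message; they are not quant methods.
const STORAGE_ONLY = ["I8", "I16", "I32", "I64", "F64"];

// Drop the storage-only names before they reach the alternation.
function quantOnly(names: string[]): string[] {
  return names.filter((n) => STORAGE_ONLY.indexOf(n) === -1);
}

// Build a component regex from whatever names are passed in
// (longest first, "-" / "." / string boundaries as delimiters).
function buildComponentRe(names: string[]): RegExp {
  const sorted = names.slice().sort((a, b) => b.length - a.length);
  return new RegExp("(?<=^|[-.])(\\d{1,3})(" + sorted.join("|") + ")(?=$|[-.])", "g");
}

// Count how many <pct><quant> tokens the regex finds in a filename.
function countComponents(name: string, re: RegExp): number {
  return (name.match(re) ?? []).length;
}

const unfiltered = buildComponentRe(TYPE_NAMES);
const filtered = buildComponentRe(quantOnly(TYPE_NAMES));

// Without the filter, "-32I32-16I16" looks like a two-component mix.
const before = countComponents("model-32I32-16I16.gguf", unfiltered);
// With the filter, the same filename yields no components.
const after = countComponents("model-32I32-16I16.gguf", filtered);
```

Under the same assumptions, a name such as `model-60Q8_0-30F32-10F64.gguf` still yields two components with the filtered regex, since only the `F64` token is ignored.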