ComfyUI-GGUF (Randy420Marsh fork)

GGUF Quantization support for native ComfyUI models

This is a fork of city96/ComfyUI-GGUF with additional architecture support and conversion tooling that is not present upstream. Highlights specific to this fork:

- Text-encoder GGUF support for mistral3 (Ministral-3-3B, used by ERNIE-Image), plus Mistral-family tekken tokenizer reconstruction from GGUF metadata.
- Gemma-4 text-encoder loading via tokenizer.json sidecar (fixes the 'str' object has no attribute 'decode' crash you get with the upstream loader).
- Optional mmproj_name picker on CLIPLoader (GGUF) for explicit multimodal-projector selection when filename auto-discovery cannot match it.
- tools/convert.py extended to support ERNIE-Image and ComfyUI scaled-fp8 dequantization.
- tools/gguf_gui.py: a Qt GUI front-end for convert.py with bf16 auto-detection, a full llama-quantize output-type selector, and an Analyze button that picks a quant target from the model's metadata.
- tools/inspect_gguf.py --metadata for inspecting per-architecture GGUF metadata (replaces a safetensors-only script that does not work on GGUF).
- docs/CONVERSION_GUIDE.md: long-form per-model walkthrough.

Use this repo's URL when installing, cloning the tools, or filing bug reports about the fork-specific features. Issues that exist in the upstream loader (everything outside the bullets above) should be reported to city96/ComfyUI-GGUF where the original author can triage them.

Documentation

The repo carries three layers of documentation, plus a wiki mirror, depending on what you're trying to do:

| Where | Audience | Content |
| --- | --- | --- |
| This file (`README.md`) | First-time visitors | What the fork is, what's new vs. upstream, how to install the custom node, list of pre-quantized model links. |
| `tools/README.md` | Anyone converting their own models | Setup reference doc for `convert.py` + `gguf_gui.py` + the patched `llama-quantize` build. Covers venv, dependencies, `LD_LIBRARY_PATH`, GUI controls, Analyze button, CLI invocations, troubleshooting, platform notes (VS2022, CUDA, macOS). |
| `docs/CONVERSION_GUIDE.md` | Anyone converting a specific model | Long-form per-model walkthrough: Flux, SD3 / SD3.5, ERNIE-Image (Ministral-3-3B text encoder), Z-Image / Lumina2 / RedCraft ZiB, Hunyuan Video, Wan 2.1, plus the math behind the Analyze recommendation and a quant-types reference. |
| Wiki (landing page) | Anyone who prefers a browseable nav | Mirror of the above plus the shortcut build recipe. |

Wiki pages

- Home: index + fork-features → PR table mapping each addition in this fork to the PR that introduced it.
- Build the patched llama-quantize: build recipe using the pre-patched city96 branch of Randy420Marsh/llama.cpp so you don't have to git clone llama.cpp + git checkout tags/b3962 + git apply lcpp.patch by hand. Covers CPU / CUDA / Windows builds, LD_LIBRARY_PATH setup, and a smoke test. The manual lcpp.patch route is still documented inside tools/README.md as a fallback.
- Conversion-Guide: wiki mirror of docs/CONVERSION_GUIDE.md with the relative ../tools/README.md links rewritten to absolute URLs so they resolve from the wiki.

The in-repo tools/README.md and docs/CONVERSION_GUIDE.md are the source of truth; the wiki pages mirror them for browseability. If you're filing a bug or PR against the documentation, target the in-repo files.

These custom nodes provide support for model files stored in the GGUF format popularized by llama.cpp.
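As a rough illustration of what a GGUF file looks like on disk, the sketch below reads only the fixed-size header using the Python standard library. It assumes a little-endian file with GGUF format version 2 or later (where both counts are 64-bit). The names `GGUFHeader` and `read_gguf_header` are made up for this example and are not part of this repository or of the `gguf` package.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the fixed-size GGUF header (assumes little-endian, version >= 2)."""
    magic = stream.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count.
    fixed = stream.read(20)
    if len(fixed) != 20:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", fixed)
    return GGUFHeader(version, tensor_count, kv_count)
```

For a real model, pass an open binary file, for example `read_gguf_header(open(path, "rb"))`. The metadata key/value pairs and tensor descriptors follow the header and are not parsed here.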

While quantization wasn't feasible for regular UNET models (conv2d), transformer/DiT models such as Flux seem less affected by it. This allows running them at much lower bits per weight, using variable-bitrate quants, on low-end GPUs. For further VRAM savings, a node to load a quantized version of the T5 text encoder is also included.
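To make "bits per weight" concrete, here is a small back-of-the-envelope sketch. It uses the block layouts of two common ggml quant types, Q8_0 and Q4_0, which both group 32 weights per block with one fp16 scale. The function names are illustrative, and the size estimate assumes every tensor uses the same type, which real files usually do not (some tensors are typically kept at higher precision).

```python
# (weights per block, bytes per block) for a few storage types.
# Q8_0: one fp16 scale (2 bytes) + 32 int8 values.
# Q4_0: one fp16 scale (2 bytes) + 32 four-bit values packed into 16 bytes.
BLOCK_LAYOUTS = {
    "F16": (1, 2),
    "Q8_0": (32, 2 + 32),
    "Q4_0": (32, 2 + 16),
}


def bits_per_weight(quant: str) -> float:
    """Average storage cost of one weight, in bits."""
    weights, nbytes = BLOCK_LAYOUTS[quant]
    return nbytes * 8 / weights


def estimated_size_gib(param_count: int, quant: str) -> float:
    """Approximate tensor data size if every weight used the same type."""
    return param_count * bits_per_weight(quant) / 8 / 2**30


if __name__ == "__main__":
    params = 12_000_000_000  # hypothetical parameter count, for illustration only
    for name in BLOCK_LAYOUTS:
        print(f"{name}: {bits_per_weight(name)} bpw, "
              f"~{estimated_size_gib(params, name):.1f} GiB")
```

Under these layouts Q8_0 costs 8.5 bits per weight and Q4_0 costs 4.5, compared with 16 for F16, which is where the VRAM savings come from.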

Note: The "Force/Set CLIP Device" node is NOT part of this node pack. Do not install it if you only have one GPU. Do not set it to cuda:0 and then complain about OOM errors if you do not understand what it is for. There is no need to copy an example workflow; just use your own workflow and replace the stock "Load Diffusion Model" node with the "Unet Loader (GGUF)" node.

Installation

Important

Make sure your ComfyUI is on a recent enough version to support custom ops when loading UNET-only models.

To install the custom node normally, git clone this repository into your custom nodes folder (ComfyUI/custom_nodes) and install the only dependency for inference (pip install --upgrade gguf):

git clone https://github.com/Randy420Marsh/ComfyUI-GGUF
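If the nodes fail to load after cloning, one quick thing to check is whether the `gguf` package is visible to the Python interpreter that runs ComfyUI. The helper below is only a sketch using the standard library; `installed_version` is a name invented for this example.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it cannot be imported."""
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found under this name.
        return "unknown"


if __name__ == "__main__":
    version = installed_version("gguf")
    if version is None:
        print("gguf is not installed; run: pip install --upgrade gguf")
    else:
        print(f"gguf is installed (version {version})")
```

Run it with the same interpreter ComfyUI uses; on the portable Windows build that is the embedded python.exe rather than a system-wide Python.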

To install the custom node on a standalone ComfyUI release, open a CMD inside the "ComfyUI_windows_portable" folder (where your run_nvidia_gpu.bat file is) and use the following commands:

git clone https://github.com/Randy420Marsh/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt

On macOS Sequoia, torch 2.4.1 seems to be required, as 2.6.X nightly versions cause an "M1 buffer is not large enough" error. See this upstream issue for more information/workarounds.

Usage

Simply use the GGUF Unet loader found under the bootleg category. Place the .gguf model files in your ComfyUI/models/unet folder.
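If a model does not show up in the loader's dropdown, it can help to confirm which .gguf files are actually in the folder. The sketch below is a standalone sanity check under the assumption that the default ComfyUI/models/unet path is in use; it is not how the node itself discovers files, and `list_gguf_models` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def list_gguf_models(models_dir: Path) -> List[str]:
    """Return the sorted names of .gguf files directly inside models_dir."""
    if not models_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in models_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".gguf"
    )


if __name__ == "__main__":
    # Path relative to the directory that contains the ComfyUI folder.
    for name in list_gguf_models(Path("ComfyUI/models/unet")):
        print(name)
```

The check is not recursive, so files placed in subfolders are not listed by this sketch.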

LoRA loading is experimental but it should work with just the built-in LoRA loader node(s).

Pre-quantized models:

Initial support for quantizing T5 has also been added recently. These files can be loaded with the various *CLIPLoader (gguf) nodes, which can be used in place of the regular ones. For the CLIP model, use whatever model you were using before. The loaders can handle both types of files: GGUF and regular safetensors/bin.

t5_v1.1-xxl GGUF

See the instructions in the tools folder for how to create your own quants. Long-form per-model walkthroughs (Flux, SD3.5, ERNIE-Image, Lumina/Gemma, Wan/Hunyuan-Video, etc.) live in docs/CONVERSION_GUIDE.md, or on the wiki which also has a dedicated Build the patched llama-quantize page that uses the pre-patched city96 branch of Randy420Marsh/llama.cpp — no manual git apply step required.
