feat: comfykit awq w4a16 modulation (CORE-31)#13580

Open
HK416-TYPED wants to merge 6 commits into Comfy-Org:master from HK416-TYPED:feat/comfykit-awq-w4a16-modulation

Conversation

@HK416-TYPED

Wires comfy-kitchen's int4 quantization layouts through ComfyUI's MixedPrecisionOps so the existing UNETLoader path can dispatch optimized int4 kernels for two checkpoint families used by
Qwen-Image-Edit-2511 (and similar SVD-LoRA + AWQ-modulation models).

353978a9 SVDQuant W4A4 integration

  • quant_ops.py: register TensorCoreSVDQuantW4A4Layout when comfy-kitchen exposes it; gate the kitchen CUDA backend on CUDA >= 13 (the optimized kitchen CUDA ops are validated against cu13+
    runtimes; older CUDA versions fall back to eager).
  • ops.py: handle svdquant_w4a4 quant_format by loading weight_scale / proj_down / proj_up / smooth_factor into TensorCoreSVDQuantW4A4Layout.Params, with the img_mlp.net.2 /
    txt_mlp.net.2 fallback for act_unsigned (post-GELU u4.s4 MMA path).
  • Pairs with comfy-kitchen #36 (feat/svdquant-w4a4-kitchen-native).

3ddcc095 AWQ W4A16 modulation integration

  • quant_ops.py: detect TensorCoreAWQW4A16Layout, stub for the no-kitchen fallback (mirrors W4A4 pattern), register the layout class, add awq_w4a16 to QUANT_ALGOS (storage int8 packed
    uint4, params {weight_scale, weight_zero}, default group_size=64).
  • ops.py: add the awq_w4a16 branch in MixedPrecisionOps.Linear._load_from_state_dict that constructs Params(scale, zeros, group_size, …) and wraps qweight into a QuantizedTensor;
    F.linear then dispatches to ck.gemv_awq_w4a16 via the layout's aten handlers.
  • Targets the ~10 GB inflation in Qwen-Image-Edit kitchen-native checkpoints, where the modulation linears (img_mod.1 / txt_mod.1) currently dominate disk + VRAM because they're materialized
    as plain bf16 Linear during conversion.
  • Pairs with comfy-kitchen feat/awq-w4a16-modulation (companion PR coming after comfy-kitchen #36 lands).
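
For intuition, the awq_w4a16 storage described above (uint4 nibbles packed into int8 bytes, per-group weight_scale / weight_zero, default group_size=64) dequantizes roughly like this reference sketch. The nibble order and exact formula here are assumptions for illustration, not the kitchen kernel:

```python
def unpack_uint4(packed):
    """Unpack int8 storage bytes into uint4 nibbles (low nibble first; order is an assumption)."""
    out = []
    for b in packed:
        b &= 0xFF              # reinterpret signed int8 storage as raw bits
        out.append(b & 0xF)    # low nibble
        out.append((b >> 4) & 0xF)  # high nibble
    return out

def dequant_awq_w4a16(packed, scales, zeros, group_size=64):
    """Dequantize one row: w = (nibble - zero) * scale, per group of group_size values."""
    nibbles = unpack_uint4(packed)
    return [
        (q - zeros[i // group_size]) * scales[i // group_size]
        for i, q in enumerate(nibbles)
    ]
```

With group_size=4 and one group of scale 0.5 / zero-point 8, a byte packing the nibbles 8 and 9 dequantizes to 0.0 and 0.5, which is the shape of the "(nibble - 8) * scale + zero chains" mentioned under Verification.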

Verification

  • Qwen-Image-Edit r96 ComfyUI E2E sampling: the kitchen-native AWQ checkpoint runs end-to-end, and the output is visually equivalent to the bf16-dequantized baseline (PSNR ~33 dB; differences
    sit at the bf16 ULP precision floor of the (nibble - 8) * scale + zero dequantization chain).
  • Stratified sample across the 6 variant families (balanced/fast/mid/quality × ranks 32/64/96/128 × base/lightning4/lightning8): all sample successfully.
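
For reference, the PSNR figure above compares the quantized output against the bf16-dequantized baseline; a minimal definition over flat pixel sequences (with pixel values in [0, peak]) looks like:

```python
import math

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences, in dB."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Around 30-35 dB on normalized images typically corresponds to differences at or below casual visual perceptibility, consistent with the "visually equivalent" claim.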

Commit messages

quant_ops.py: register TensorCoreSVDQuantW4A4Layout when comfy-kitchen exposes
it; gate the kitchen CUDA backend on cuda >= 13 (the optimized kitchen CUDA
ops are validated against cu13+ runtimes; on older cu the backend falls back
to eager).

ops.py: handle svdquant_w4a4 quant_format by loading weight_scale / proj_down /
proj_up / smooth_factor into TensorCoreSVDQuantW4A4Layout.Params, with the
img_mlp.net.2 / txt_mlp.net.2 fallback for act_unsigned. Targets the row-major
kitchen-native kernels on feat/svdquant-w4a4-kitchen-native; the verbatim
zgemm path is a sibling branch.

Wires comfy-kitchen's TensorCoreAWQW4A16Layout (introduced on
feat/awq-w4a16-modulation) into ComfyUI's MixedPrecisionOps so checkpoints
that tag modulation linears with comfy_quant.format = "awq_w4a16" get
their (qweight, weight_scale, weight_zero) loaded into the kitchen layout
class instead of being dequantized to bf16 plain Linear at conversion time.

quant_ops.py:
- detect TensorCoreAWQW4A16Layout availability and stub it out for the
  no-kitchen fallback (mirrors the SVDQuant W4A4 pattern)
- register the layout class + add "awq_w4a16" to QUANT_ALGOS
  (storage_t = int8 packed uint4, parameters = {weight_scale, weight_zero},
   default group_size = 64)

ops.py: add the awq_w4a16 branch in MixedPrecisionOps.Linear._load_from_state_dict
that constructs Params(scale, zeros, group_size, ...) and wraps qweight
into a QuantizedTensor — F.linear then dispatches to ck.gemv_awq_w4a16
via the layout's aten handlers.

Pairs with comfy-kitchen feat/awq-w4a16-modulation. Targets the ~10 GB
inflation in Qwen-Image-Edit kitchen-native checkpoints, where the
modulation linears (img_mod.1 / txt_mod.1) currently dominate disk + VRAM
because they're materialized as plain bf16 Linear during conversion.
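
The per-layout op dispatch the commit message describes ("F.linear then dispatches to ck.gemv_awq_w4a16 via the layout's aten handlers") can be sketched torch-free as a handler registry. Every name below is illustrative, not the actual comfy_kitchen API:

```python
class AWQW4A16Layout:
    """Toy layout class: ops it supports are registered into HANDLERS."""
    HANDLERS = {}

    @classmethod
    def register(cls, op_name):
        def deco(fn):
            cls.HANDLERS[op_name] = fn
            return fn
        return deco

@AWQW4A16Layout.register("linear")
def _linear_awq(x, qtensor):
    # A real handler would invoke the fused kernel (e.g. ck.gemv_awq_w4a16);
    # here we just tag the call so the routing is observable.
    return ("gemv_awq_w4a16", x, qtensor)

def dispatch(op_name, layout_cls, *args):
    """Route an op to the layout's registered handler, as the aten hooks would."""
    handler = layout_cls.HANDLERS.get(op_name)
    if handler is None:
        raise NotImplementedError(f"{layout_cls.__name__} has no handler for {op_name}")
    return handler(*args)
```

In the real path, PyTorch's tensor-subclass override machinery plays the role of `dispatch`, intercepting `F.linear` on the wrapped QuantizedTensor and forwarding to the layout's kernel.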

coderabbitai Bot commented Apr 27, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between b6f438d and 2322ff5.

📒 Files selected for processing (2)
  • comfy/ops.py
  • comfy/quant_ops.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • comfy/ops.py
  • comfy/quant_ops.py

📝 Walkthrough

Walkthrough

This PR adds two quantization formats (svdquant_w4a4, awq_w4a16) with conditional kitchen-backed layout classes and registry entries, updates comfy/ops.py to validate quant_format, deserialize format-specific weight params (including act_unsigned), and serialize act_unsigned in state_dict, and changes inference to only pre-wrap inputs into QuantizedTensor when the resolved layout declares it quantizes inputs.
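
The input pre-wrapping change mentioned in the walkthrough amounts to a class-attribute gate. QUANTIZES_INPUT is the attribute this PR's review refers to; the rest of the names in this sketch are hypothetical:

```python
class BaseLayout:
    # Default: layouts quantize activations too (e.g. W4A4).
    QUANTIZES_INPUT = True

class W4A16Layout(BaseLayout):
    # Weight-only quantization: activations stay in bf16.
    QUANTIZES_INPUT = False

def prepare_input(x, layout_cls):
    """Pre-wrap the activation only when the resolved layout quantizes inputs."""
    if getattr(layout_cls, "QUANTIZES_INPUT", True):
        return ("QuantizedTensor", x)  # stand-in for the real wrapping
    return x
```

The `getattr(..., True)` default preserves the old always-wrap behavior for layouts that predate the attribute.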

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: comfykit awq w4a16 modulation (CORE-31)' clearly identifies the main feature addition—AWQ W4A16 quantization modulation support—and references the ticket number.
Description check ✅ Passed The description thoroughly explains the changes, implementation details, and verification results, directly corresponding to the code modifications in ops.py and quant_ops.py.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
comfy/ops.py (2)

951-956: Friendlier error when a checkpoint declares a format the kitchen build doesn't support.

With this PR, svdquant_w4a4 and awq_w4a16 are only registered into QUANT_ALGOS when the corresponding kitchen layout import succeeds. A user on an older comfy_kitchen that loads a Qwen-Image-Edit AWQ checkpoint will hit KeyError: 'awq_w4a16' at line 954 rather than a message pointing them at the actual problem. Consider raising a clear ValueError listing the supported formats.

♻️ Suggested message
-                    qconfig = QUANT_ALGOS[self.quant_format]
+                    if self.quant_format not in QUANT_ALGOS:
+                        raise ValueError(
+                            f"Quantization format '{self.quant_format}' for layer {layer_name} "
+                            f"is not available in this build (supported: {sorted(QUANT_ALGOS.keys())}). "
+                            f"Update comfy_kitchen to enable it."
+                        )
+                    qconfig = QUANT_ALGOS[self.quant_format]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 951 - 956, The code assumes self.quant_format
exists in QUANT_ALGOS and will raise a KeyError; change the block to explicitly
check whether self.quant_format is in QUANT_ALGOS and if not raise a ValueError
that names the offending format (self.quant_format) and the layer (layer_name)
and lists supported formats (sorted QUANT_ALGOS.keys()), then continue to look
up qconfig = QUANT_ALGOS[self.quant_format], set self.layout_type and call
get_layout_class(self.layout_type) as before.

1156-1157: Optional micro-optim: cache layout_cls instead of resolving it on every forward.

get_layout_class(self.layout_type) runs on every forward() invocation per linear. It's cheap (dict lookup) but redundant — self.layout_type is set once at load time. Caching once during _load_from_state_dict would also let you compute layout_quantizes_input ahead of time, removing two attribute lookups from the inference hot path.

♻️ Sketch
-                    qconfig = QUANT_ALGOS[self.quant_format]
-                    self.layout_type = qconfig["comfy_tensor_layout"]
-                    layout_cls = get_layout_class(self.layout_type)
+                    qconfig = QUANT_ALGOS[self.quant_format]
+                    self.layout_type = qconfig["comfy_tensor_layout"]
+                    layout_cls = get_layout_class(self.layout_type)
+                    self._layout_quantizes_input = getattr(layout_cls, "QUANTIZES_INPUT", True)
-                    layout_cls = get_layout_class(self.layout_type)
-                    layout_quantizes_input = getattr(layout_cls, "QUANTIZES_INPUT", True)
-
-                    if layout_quantizes_input:
+                    if getattr(self, "_layout_quantizes_input", True):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 1156 - 1157, Cache the resolved layout class and
the boolean QUANTIZES_INPUT during model load instead of resolving them in
forward: in _load_from_state_dict (or wherever self.layout_type is set) call
get_layout_class(self.layout_type) once, store it on the instance (e.g.,
self._layout_cls) and compute self._layout_quantizes_input =
getattr(self._layout_cls, "QUANTIZES_INPUT", True); then update forward to use
self._layout_cls and self._layout_quantizes_input instead of calling
get_layout_class(self.layout_type) and getattr each invocation.
comfy/quant_ops.py (1)

60-83: Optional: align stub-fallback placement with MXFP8 for consistency.

_CKMxfp8Layout defines its stub via if not _CK_MXFP8_AVAILABLE: at module scope (lines 60-62), whereas the new layouts define the stub inside the inner except ImportError branch. Both are functionally equivalent (since lines 40-44 already cover the outer-_CK_AVAILABLE-False case), but mirroring the MXFP8 pattern would make the three blocks read identically.

♻️ Sketch of the consistency-aligned shape
 _CK_SVDQUANT_W4A4_AVAILABLE = False
 if _CK_AVAILABLE:
     try:
         from comfy_kitchen.tensor import TensorCoreSVDQuantW4A4Layout as _CKSVDQuantW4A4Layout
         _CK_SVDQUANT_W4A4_AVAILABLE = True
     except ImportError:
         logging.info("comfy_kitchen does not expose SVDQuant W4A4 layout; int4 SVDQuant checkpoints will not be supported.")
-        class _CKSVDQuantW4A4Layout:
-            pass
+
+if not _CK_SVDQUANT_W4A4_AVAILABLE and _CK_AVAILABLE:
+    class _CKSVDQuantW4A4Layout:
+        pass

(Same shape for AWQ.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/quant_ops.py` around lines 60 - 83, The three layout stubs are
inconsistent: _CKMxfp8Layout is defined at module scope under "if not
_CK_MXFP8_AVAILABLE" while _CKSVDQuantW4A4Layout and _CKAWQW4A16Layout are
defined inside the except ImportError branches; make them consistent by
moving/adding their stub definitions to the same outer-pattern (i.e., define a
stub class under "if not _CK_SVDQUANT_W4A4_AVAILABLE:" and "if not
_CK_AWQ_W4A16_AVAILABLE:" at module scope like _CKMxfp8Layout) and keep the
try/except only responsible for setting the real import and toggling the
_CK_*_AVAILABLE flags, referencing the symbols _CKSVDQuantW4A4Layout and
_CKAWQW4A16Layout to locate the code to change.

📥 Commits

Reviewing files that changed from the base of the PR and between 115f418 and 3ddcc09.

📒 Files selected for processing (2)
  • comfy/ops.py
  • comfy/quant_ops.py

Comment thread comfy/ops.py
Comment on lines +1000 to +1028
elif self.quant_format == "svdquant_w4a4":
    # SVDQuant W4A4: per-group weight scales + low-rank correction
    # (proj_down, proj_up) + activation smoothing (smooth_factor)
    wscales = self._load_scale_param(state_dict, prefix, "weight_scale", device, manually_loaded_keys)
    proj_down = self._load_scale_param(state_dict, prefix, "proj_down", device, manually_loaded_keys)
    proj_up = self._load_scale_param(state_dict, prefix, "proj_up", device, manually_loaded_keys)
    smooth_factor = self._load_scale_param(state_dict, prefix, "smooth_factor", device, manually_loaded_keys)
    act_unsigned = bool(layer_conf.get("act_unsigned", False))

    # Early Qwen-Image conversion artifacts did not persist the
    # fused GELU -> fc2 unsigned-activation flag. Those layers
    # are the second linear in the feed-forward block.
    if not act_unsigned and (
        layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
    ):
        act_unsigned = True

    if any(t is None for t in (wscales, proj_down, proj_up, smooth_factor)):
        raise ValueError(f"Missing SVDQuant W4A4 parameters for layer {layer_name}")

    params = layout_cls.Params(
        scale=wscales,
        orig_dtype=MixedPrecisionOps._compute_dtype,
        orig_shape=(self.out_features, self.in_features),
        proj_down=proj_down,
        proj_up=proj_up,
        smooth_factor=smooth_factor,
        act_unsigned=act_unsigned,
    )

⚠️ Potential issue | 🟡 Minor

act_unsigned layer-name heuristic is a load-bearing workaround — please add a TODO and narrow the trigger.

The endswith(".img_mlp.net.2") or endswith(".txt_mlp.net.2") rule unconditionally flips act_unsigned to True for any model whose state_dict happens to use those submodule names — not just early Qwen-Image-Edit kitchen-native checkpoints. If a different topology (or future SVDQuant export) reuses the same names with signed activations, those layers will dispatch with the wrong activation domain and produce silently corrupted outputs (no exception, just bad samples). Two suggestions:

  1. Gate the override so it only triggers when the format truly requires a fused-GELU upstream signal (e.g., also check that act_unsigned is absent from layer_conf rather than merely False — a future exporter explicitly writing act_unsigned: false would currently still get overridden).
  2. Add a TODO referencing the Qwen-Image-Edit conversion path so this can be deleted once re-exported checkpoints carry the flag.
🛡️ Suggested narrowing
-                        act_unsigned = bool(layer_conf.get("act_unsigned", False))
-
-                        # Early Qwen-Image conversion artifacts did not persist the
-                        # fused GELU -> fc2 unsigned-activation flag. Those layers
-                        # are the second linear in the feed-forward block.
-                        if not act_unsigned and (
-                            layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
-                        ):
-                            act_unsigned = True
+                        # TODO(comfykit-awq): drop the layer-name heuristic once all SVDQuant
+                        # exporters persist `act_unsigned`. Only override when the flag is
+                        # absent from layer_conf, not when it's explicitly false.
+                        if "act_unsigned" in layer_conf:
+                            act_unsigned = bool(layer_conf["act_unsigned"])
+                        else:
+                            act_unsigned = layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 1000 - 1028, The heuristic that forces
act_unsigned True for layers matching ".img_mlp.net.2" or ".txt_mlp.net.2" is
too broad; update the condition in the svdquant_w4a4 branch so the override only
runs when the layer_conf does not explicitly contain the act_unsigned key (e.g.,
check presence/absence in layer_conf) and preserve the original act_unsigned
when it is explicitly set to False, and add a one-line TODO comment referencing
the Qwen-Image-Edit conversion path so future re-exports can remove this
workaround; locate and modify the code around the symbols act_unsigned,
layer_conf, layer_name inside the svdquant_w4a4 handling block.

@alexisrolland alexisrolland changed the title Feat/comfykit awq w4a16 modulation(CORE-31) feat: comfykit awq w4a16 modulation (CORE-31) Apr 27, 2026
