feat: comfykit awq w4a16 modulation (CORE-31)#13580

Open
HK416-TYPED wants to merge 6 commits into Comfy-Org:master from HK416-TYPED:feat/comfykit-awq-w4a16-modulation

Conversation

@HK416-TYPED

Wires comfy-kitchen's int4 quantization layouts through ComfyUI's MixedPrecisionOps so the existing UNETLoader path can dispatch optimized int4 kernels for two checkpoint families used by
Qwen-Image-Edit-2511 (and similar SVD-LoRA + AWQ-modulation models).

353978a9 SVDQuant W4A4 integration

  • quant_ops.py: register TensorCoreSVDQuantW4A4Layout when comfy-kitchen exposes it; gate the kitchen CUDA backend on CUDA >= 13 (the optimized kitchen CUDA ops are validated against cu13+
    runtimes; older CUDA versions fall back to eager).
  • ops.py: handle svdquant_w4a4 quant_format by loading weight_scale / proj_down / proj_up / smooth_factor into TensorCoreSVDQuantW4A4Layout.Params, with the img_mlp.net.2 /
    txt_mlp.net.2 fallback for act_unsigned (post-GELU u4.s4 MMA path).
  • Pairs with comfy-kitchen #36 (feat/svdquant-w4a4-kitchen-native).

3ddcc095 AWQ W4A16 modulation integration

  • quant_ops.py: detect TensorCoreAWQW4A16Layout, stub for the no-kitchen fallback (mirrors W4A4 pattern), register the layout class, add awq_w4a16 to QUANT_ALGOS (storage int8 packed
    uint4, params {weight_scale, weight_zero}, default group_size=64).
  • ops.py: add the awq_w4a16 branch in MixedPrecisionOps.Linear._load_from_state_dict that constructs Params(scale, zeros, group_size, …) and wraps qweight into a QuantizedTensor;
    F.linear then dispatches to ck.gemv_awq_w4a16 via the layout's aten handlers.
  • Targets the ~10 GB inflation in Qwen-Image-Edit kitchen-native checkpoints, where the modulation linears (img_mod.1 / txt_mod.1) currently dominate disk + VRAM because they're materialized
    as plain bf16 Linear during conversion.
  • Pairs with comfy-kitchen feat/awq-w4a16-modulation (companion PR coming after comfy-kitchen #36 lands).
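
For intuition, the awq_w4a16 storage described above (uint4 nibbles packed into int8 bytes, per-group weight_scale / weight_zero, default group_size=64) dequantizes roughly like this reference sketch. The nibble order and exact formula here are assumptions for illustration, not the kitchen kernel:

```python
def unpack_uint4(packed):
    """Unpack int8 storage bytes into uint4 nibbles (low nibble first; order is an assumption)."""
    out = []
    for b in packed:
        b &= 0xFF              # reinterpret signed int8 storage as raw bits
        out.append(b & 0xF)    # low nibble
        out.append((b >> 4) & 0xF)  # high nibble
    return out

def dequant_awq_w4a16(packed, scales, zeros, group_size=64):
    """Dequantize one row: w = (nibble - zero) * scale, per group of group_size values."""
    nibbles = unpack_uint4(packed)
    return [
        (q - zeros[i // group_size]) * scales[i // group_size]
        for i, q in enumerate(nibbles)
    ]
```

With group_size=4 and one group of scale 0.5 / zero-point 8, a byte packing the nibbles 8 and 9 dequantizes to 0.0 and 0.5, which is the shape of the "(nibble - 8) * scale + zero chains" mentioned under Verification.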

Verification

  • Qwen-Image-Edit r96 ComfyUI E2E sampling: the kitchen-native AWQ checkpoint runs end-to-end, and the output is visually equivalent to the bf16-dequantized baseline (PSNR ~33 dB; differences
    sit at the bf16 ULP precision floor of the (nibble - 8) * scale + zero dequantization chain).
  • Stratified sample across the 6 variant families (balanced/fast/mid/quality × ranks 32/64/96/128 × base/lightning4/lightning8): all sample successfully.
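
For reference, the PSNR figure above compares the quantized output against the bf16-dequantized baseline; a minimal definition over flat pixel sequences (with pixel values in [0, peak]) looks like:

```python
import math

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio between two equal-length pixel sequences, in dB."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Around 30-35 dB on normalized images typically corresponds to differences at or below casual visual perceptibility, consistent with the "visually equivalent" claim.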

Commit messages

quant_ops.py: register TensorCoreSVDQuantW4A4Layout when comfy-kitchen exposes
it; gate the kitchen CUDA backend on cuda >= 13 (the optimized kitchen CUDA
ops are validated against cu13+ runtimes; on older cu the backend falls back
to eager).

ops.py: handle svdquant_w4a4 quant_format by loading weight_scale / proj_down /
proj_up / smooth_factor into TensorCoreSVDQuantW4A4Layout.Params, with the
img_mlp.net.2 / txt_mlp.net.2 fallback for act_unsigned. Targets the row-major
kitchen-native kernels on feat/svdquant-w4a4-kitchen-native; the verbatim
zgemm path is a sibling branch.

Wires comfy-kitchen's TensorCoreAWQW4A16Layout (introduced on
feat/awq-w4a16-modulation) into ComfyUI's MixedPrecisionOps so checkpoints
that tag modulation linears with comfy_quant.format = "awq_w4a16" get
their (qweight, weight_scale, weight_zero) loaded into the kitchen layout
class instead of being dequantized to bf16 plain Linear at conversion time.

quant_ops.py:
- detect TensorCoreAWQW4A16Layout availability and stub it out for the
  no-kitchen fallback (mirrors the SVDQuant W4A4 pattern)
- register the layout class + add "awq_w4a16" to QUANT_ALGOS
  (storage_t = int8 packed uint4, parameters = {weight_scale, weight_zero},
   default group_size = 64)

ops.py: add the awq_w4a16 branch in MixedPrecisionOps.Linear._load_from_state_dict
that constructs Params(scale, zeros, group_size, ...) and wraps qweight
into a QuantizedTensor — F.linear then dispatches to ck.gemv_awq_w4a16
via the layout's aten handlers.

Pairs with comfy-kitchen feat/awq-w4a16-modulation. Targets the ~10 GB
inflation in Qwen-Image-Edit kitchen-native checkpoints, where the
modulation linears (img_mod.1 / txt_mod.1) currently dominate disk + VRAM
because they're materialized as plain bf16 Linear during conversion.
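
The per-layout op dispatch the commit message describes ("F.linear then dispatches to ck.gemv_awq_w4a16 via the layout's aten handlers") can be sketched torch-free as a handler registry. Every name below is illustrative, not the actual comfy_kitchen API:

```python
class AWQW4A16Layout:
    """Toy layout class: ops it supports are registered into HANDLERS."""
    HANDLERS = {}

    @classmethod
    def register(cls, op_name):
        def deco(fn):
            cls.HANDLERS[op_name] = fn
            return fn
        return deco

@AWQW4A16Layout.register("linear")
def _linear_awq(x, qtensor):
    # A real handler would invoke the fused kernel (e.g. ck.gemv_awq_w4a16);
    # here we just tag the call so the routing is observable.
    return ("gemv_awq_w4a16", x, qtensor)

def dispatch(op_name, layout_cls, *args):
    """Route an op to the layout's registered handler, as the aten hooks would."""
    handler = layout_cls.HANDLERS.get(op_name)
    if handler is None:
        raise NotImplementedError(f"{layout_cls.__name__} has no handler for {op_name}")
    return handler(*args)
```

In the real path, PyTorch's tensor-subclass override machinery plays the role of `dispatch`, intercepting `F.linear` on the wrapped QuantizedTensor and forwarding to the layout's kernel.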

coderabbitai Bot commented Apr 27, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between b6f438d and 2322ff5.

📒 Files selected for processing (2)
  • comfy/ops.py
  • comfy/quant_ops.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • comfy/ops.py
  • comfy/quant_ops.py

📝 Walkthrough

Walkthrough

This PR adds two quantization formats (svdquant_w4a4, awq_w4a16) with conditional kitchen-backed layout classes and registry entries, updates comfy/ops.py to validate quant_format, deserialize format-specific weight params (including act_unsigned), and serialize act_unsigned in state_dict, and changes inference to only pre-wrap inputs into QuantizedTensor when the resolved layout declares it quantizes inputs.
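
The input pre-wrapping change mentioned in the walkthrough amounts to a class-attribute gate. QUANTIZES_INPUT is the attribute this PR's review refers to; the rest of the names in this sketch are hypothetical:

```python
class BaseLayout:
    # Default: layouts quantize activations too (e.g. W4A4).
    QUANTIZES_INPUT = True

class W4A16Layout(BaseLayout):
    # Weight-only quantization: activations stay in bf16.
    QUANTIZES_INPUT = False

def prepare_input(x, layout_cls):
    """Pre-wrap the activation only when the resolved layout quantizes inputs."""
    if getattr(layout_cls, "QUANTIZES_INPUT", True):
        return ("QuantizedTensor", x)  # stand-in for the real wrapping
    return x
```

The `getattr(..., True)` default preserves the old always-wrap behavior for layouts that predate the attribute.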

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: comfykit awq w4a16 modulation (CORE-31)' clearly identifies the main feature addition—AWQ W4A16 quantization modulation support—and references the ticket number.
Description check ✅ Passed The description thoroughly explains the changes, implementation details, and verification results, directly corresponding to the code modifications in ops.py and quant_ops.py.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
comfy/ops.py (2)

951-956: Friendlier error when a checkpoint declares a format the kitchen build doesn't support.

With this PR, svdquant_w4a4 and awq_w4a16 are only registered into QUANT_ALGOS when the corresponding kitchen layout import succeeds. A user on an older comfy_kitchen that loads a Qwen-Image-Edit AWQ checkpoint will hit KeyError: 'awq_w4a16' at line 954 rather than a message pointing them at the actual problem. Consider raising a clear ValueError listing the supported formats.

♻️ Suggested message
-                    qconfig = QUANT_ALGOS[self.quant_format]
+                    if self.quant_format not in QUANT_ALGOS:
+                        raise ValueError(
+                            f"Quantization format '{self.quant_format}' for layer {layer_name} "
+                            f"is not available in this build (supported: {sorted(QUANT_ALGOS.keys())}). "
+                            f"Update comfy_kitchen to enable it."
+                        )
+                    qconfig = QUANT_ALGOS[self.quant_format]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 951 - 956, The code assumes self.quant_format
exists in QUANT_ALGOS and will raise a KeyError; change the block to explicitly
check whether self.quant_format is in QUANT_ALGOS and if not raise a ValueError
that names the offending format (self.quant_format) and the layer (layer_name)
and lists supported formats (sorted QUANT_ALGOS.keys()), then continue to look
up qconfig = QUANT_ALGOS[self.quant_format], set self.layout_type and call
get_layout_class(self.layout_type) as before.

1156-1157: Optional micro-optim: cache layout_cls instead of resolving it on every forward.

get_layout_class(self.layout_type) runs on every forward() invocation per linear. It's cheap (dict lookup) but redundant — self.layout_type is set once at load time. Caching once during _load_from_state_dict would also let you compute layout_quantizes_input ahead of time, removing two attribute lookups from the inference hot path.

♻️ Sketch
-                    qconfig = QUANT_ALGOS[self.quant_format]
-                    self.layout_type = qconfig["comfy_tensor_layout"]
-                    layout_cls = get_layout_class(self.layout_type)
+                    qconfig = QUANT_ALGOS[self.quant_format]
+                    self.layout_type = qconfig["comfy_tensor_layout"]
+                    layout_cls = get_layout_class(self.layout_type)
+                    self._layout_quantizes_input = getattr(layout_cls, "QUANTIZES_INPUT", True)
-                    layout_cls = get_layout_class(self.layout_type)
-                    layout_quantizes_input = getattr(layout_cls, "QUANTIZES_INPUT", True)
-
-                    if layout_quantizes_input:
+                    if getattr(self, "_layout_quantizes_input", True):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 1156 - 1157, Cache the resolved layout class and
the boolean QUANTIZES_INPUT during model load instead of resolving them in
forward: in _load_from_state_dict (or wherever self.layout_type is set) call
get_layout_class(self.layout_type) once, store it on the instance (e.g.,
self._layout_cls) and compute self._layout_quantizes_input =
getattr(self._layout_cls, "QUANTIZES_INPUT", True); then update forward to use
self._layout_cls and self._layout_quantizes_input instead of calling
get_layout_class(self.layout_type) and getattr each invocation.
comfy/quant_ops.py (1)

60-83: Optional: align stub-fallback placement with MXFP8 for consistency.

_CKMxfp8Layout defines its stub via if not _CK_MXFP8_AVAILABLE: at module scope (lines 60-62), whereas the new layouts define the stub inside the inner except ImportError branch. Both are functionally equivalent (since lines 40-44 already cover the outer-_CK_AVAILABLE-False case), but mirroring the MXFP8 pattern would make the three blocks read identically.

♻️ Sketch of the consistency-aligned shape
 _CK_SVDQUANT_W4A4_AVAILABLE = False
 if _CK_AVAILABLE:
     try:
         from comfy_kitchen.tensor import TensorCoreSVDQuantW4A4Layout as _CKSVDQuantW4A4Layout
         _CK_SVDQUANT_W4A4_AVAILABLE = True
     except ImportError:
         logging.info("comfy_kitchen does not expose SVDQuant W4A4 layout; int4 SVDQuant checkpoints will not be supported.")
-        class _CKSVDQuantW4A4Layout:
-            pass
+
+if not _CK_SVDQUANT_W4A4_AVAILABLE and _CK_AVAILABLE:
+    class _CKSVDQuantW4A4Layout:
+        pass

(Same shape for AWQ.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/quant_ops.py` around lines 60 - 83, The three layout stubs are
inconsistent: _CKMxfp8Layout is defined at module scope under "if not
_CK_MXFP8_AVAILABLE" while _CKSVDQuantW4A4Layout and _CKAWQW4A16Layout are
defined inside the except ImportError branches; make them consistent by
moving/adding their stub definitions to the same outer-pattern (i.e., define a
stub class under "if not _CK_SVDQUANT_W4A4_AVAILABLE:" and "if not
_CK_AWQ_W4A16_AVAILABLE:" at module scope like _CKMxfp8Layout) and keep the
try/except only responsible for setting the real import and toggling the
_CK_*_AVAILABLE flags, referencing the symbols _CKSVDQuantW4A4Layout and
_CKAWQW4A16Layout to locate the code to change.

📥 Commits

Reviewing files that changed from the base of the PR and between 115f418 and 3ddcc09.

📒 Files selected for processing (2)
  • comfy/ops.py
  • comfy/quant_ops.py

Comment thread comfy/ops.py
Comment on lines +1000 to +1028
elif self.quant_format == "svdquant_w4a4":
    # SVDQuant W4A4: per-group weight scales + low-rank correction
    # (proj_down, proj_up) + activation smoothing (smooth_factor)
    wscales = self._load_scale_param(state_dict, prefix, "weight_scale", device, manually_loaded_keys)
    proj_down = self._load_scale_param(state_dict, prefix, "proj_down", device, manually_loaded_keys)
    proj_up = self._load_scale_param(state_dict, prefix, "proj_up", device, manually_loaded_keys)
    smooth_factor = self._load_scale_param(state_dict, prefix, "smooth_factor", device, manually_loaded_keys)
    act_unsigned = bool(layer_conf.get("act_unsigned", False))

    # Early Qwen-Image conversion artifacts did not persist the
    # fused GELU -> fc2 unsigned-activation flag. Those layers
    # are the second linear in the feed-forward block.
    if not act_unsigned and (
        layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
    ):
        act_unsigned = True

    if any(t is None for t in (wscales, proj_down, proj_up, smooth_factor)):
        raise ValueError(f"Missing SVDQuant W4A4 parameters for layer {layer_name}")

    params = layout_cls.Params(
        scale=wscales,
        orig_dtype=MixedPrecisionOps._compute_dtype,
        orig_shape=(self.out_features, self.in_features),
        proj_down=proj_down,
        proj_up=proj_up,
        smooth_factor=smooth_factor,
        act_unsigned=act_unsigned,
    )

⚠️ Potential issue | 🟡 Minor

act_unsigned layer-name heuristic is a load-bearing workaround — please add a TODO and narrow the trigger.

The endswith(".img_mlp.net.2") or endswith(".txt_mlp.net.2") rule unconditionally flips act_unsigned to True for any model whose state_dict happens to use those submodule names — not just early Qwen-Image-Edit kitchen-native checkpoints. If a different topology (or future SVDQuant export) reuses the same names with signed activations, those layers will dispatch with the wrong activation domain and produce silently corrupted outputs (no exception, just bad samples). Two suggestions:

  1. Gate the override so it only triggers when the format truly requires a fused-GELU upstream signal (e.g., also check that act_unsigned is absent from layer_conf rather than merely False — a future exporter explicitly writing act_unsigned: false would currently still get overridden).
  2. Add a TODO referencing the Qwen-Image-Edit conversion path so this can be deleted once re-exported checkpoints carry the flag.
🛡️ Suggested narrowing
-                        act_unsigned = bool(layer_conf.get("act_unsigned", False))
-
-                        # Early Qwen-Image conversion artifacts did not persist the
-                        # fused GELU -> fc2 unsigned-activation flag. Those layers
-                        # are the second linear in the feed-forward block.
-                        if not act_unsigned and (
-                            layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
-                        ):
-                            act_unsigned = True
+                        # TODO(comfykit-awq): drop the layer-name heuristic once all SVDQuant
+                        # exporters persist `act_unsigned`. Only override when the flag is
+                        # absent from layer_conf, not when it's explicitly false.
+                        if "act_unsigned" in layer_conf:
+                            act_unsigned = bool(layer_conf["act_unsigned"])
+                        else:
+                            act_unsigned = layer_name.endswith(".img_mlp.net.2") or layer_name.endswith(".txt_mlp.net.2")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@comfy/ops.py` around lines 1000 - 1028, The heuristic that forces
act_unsigned True for layers matching ".img_mlp.net.2" or ".txt_mlp.net.2" is
too broad; update the condition in the svdquant_w4a4 branch so the override only
runs when the layer_conf does not explicitly contain the act_unsigned key (e.g.,
check presence/absence in layer_conf) and preserve the original act_unsigned
when it is explicitly set to False, and add a one-line TODO comment referencing
the Qwen-Image-Edit conversion path so future re-exports can remove this
workaround; locate and modify the code around the symbols act_unsigned,
layer_conf, layer_name inside the svdquant_w4a4 handling block.

@alexisrolland alexisrolland changed the title Feat/comfykit awq w4a16 modulation(CORE-31) feat: comfykit awq w4a16 modulation (CORE-31) Apr 27, 2026
