Add Z-Image (Lumina2/NextDiT) support #64
|
For anyone who wants to use this now, use my fork: https://github.com/scottmudge/ComfyUI-NAG/tree/main It's been merged into main, along with some other fixes to support recent ComfyUI versions. |
|
Causes me to OOM |
|
Yes, it uses more VRAM because it's basically doubling the context processing and requires additional buffers for computing the negative guidance. No easy way around that. The turbo model on CFG 1.0 normally just completely discards the negative conditioning, which saves memory. You'll probably have to free up VRAM at some other step/location. Reduce resolution, etc. I do notice ComfyUI is using 90% of my VRAM on first run, and then drops back down to 70% for all subsequent runs, at least with Z-Image. Not sure if they have a memory management bug or something. You can try disabling NAG for the first run and then re-enabling it. |
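For anyone wondering why the extra buffers are unavoidable, here is a minimal sketch of the per-layer combine step as NAG is generally described (extrapolate, clip the norm growth, blend). It is not the code from this PR: `nag_combine` and `l1_norm` are hypothetical names, and it works on plain Python lists for a single token rather than tensors. The point is that both the positive and the negative attention outputs have to exist at the same time.

```python
# Illustrative sketch only; function names are hypothetical, not from ComfyUI-NAG.

def l1_norm(v):
    """Sum of absolute values of a vector given as a list of floats."""
    return sum(abs(x) for x in v)


def nag_combine(z_pos, z_neg, nag_scale, nag_tau, nag_alpha):
    """Combine positive and negative attention outputs for one token."""
    # 1. Extrapolate away from the negative branch.
    z_ext = [p * nag_scale - n * (nag_scale - 1.0) for p, n in zip(z_pos, z_neg)]

    # 2. Limit how much the L1 norm may grow relative to the positive branch.
    norm_pos = l1_norm(z_pos)
    norm_ext = l1_norm(z_ext)
    # Guard against division by zero; in that case no clipping is applied.
    ratio = norm_ext / norm_pos if norm_pos > 0 else 1.0
    if ratio > nag_tau:
        z_ext = [x * nag_tau / ratio for x in z_ext]

    # 3. Blend the guided result back toward the positive branch.
    return [nag_alpha * g + (1.0 - nag_alpha) * p for g, p in zip(z_ext, z_pos)]
```

With `nag_scale = 1.0` the function returns the positive branch unchanged, which matches the intuition that no guidance is applied in that case.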
|
I have to even restart comfyui for it to stop throwing oom after using it. (This makes me think something is definitely off) I can use this nag node fine with things that are way heavier like chroma at resolutions beyond 720x1280. |
|
Hmm can you post your workflow? |
|
It's just the default workflow for z-image-turbo but with the nag swapped in. Nothing special. I tried it with the current official release comfy, as well as the one you can pull that is more up to date and it's the same thing. |
|
I mean are you using the NAGGuider/NAGCFGGuider or "KSampler with NAG" node? |
i tried both, here is the error |
|
Thanks I'll take a look when I get home later today |
|
Yeah I've re-reviewed the memory allocations, and even explicitly deleting all buffers after they're used (even though torch gc should be cleaning these up automatically), I only see ~3% more VRAM (~700 MB at 1440p generation) used compared to the stock K Sampler. So if you're running on thin margins where you don't have 700 MB free, then you might run into OOM. But there's no way around that, as the context needs to be doubled to handle negative attention. One thing I did notice is more of a bug with ComfyUI itself. The default workflow (same one you mentioned), without the NAG nodes, does not offload the Qwen3 4B CLIP model to CPU before inference with K Sampler when the text prompt is changed. This causes 5GB of VRAM to be consumed unnecessarily. It only seems to offload it to CPU after the entire prompt has finished. If the text prompt doesn't change, then subsequent runs will use far less VRAM, since it doesn't need to load Qwen3 back into VRAM. I did fix the issue with NAG not being properly reverted/unpatched on the model when it's turned off. I'll be pushing those changes soon. But the OOM'ing/high VRAM usage seems to happen with the normal/non-NAG sampler for me. The only thing that seems to avoid high VRAM usage is using the CPU for Qwen3 4B, effectively bypassing the issue where it's only offloaded to CPU after the entire prompt finishes. |
…anagement in NAGJointAttention.
|
I will test it now, however this high vram usage thing you're mentioning I did not notice. Even with my limited vram without nag I can generate at even 4k. |
|
Tested, and it's still happening. |
…n, since transformer_options seems to be getting reset by ComfyUI base.
|
Pushed some minor fixes. Though @NulliferBones your VRAM issues are likely still due to the way ComfyUI handles offloading the Qwen model. They might have improved it in recent updates to their …
After additional testing, it looks like some good parameter values are: …
Notably the … |
Whatever you've changed has allowed me to generate at 768x1024 now instead of 512x768. So odd, the model is small and runs quick. The new ComfyUI VRAM management almost nullifies OOM issues in general, so it's weird NAG would cause OOM issues, especially at a low resolution, when without it I can generate at nearly 4k (with Z-Image). |
|
does this also fix it working with chroma? the latest comfy update discards all layers, or something like that so it has zero effect at all |
if these help solve the memory concern, it's all i could see in a short review. |
|
Thanks, I'll work on addressing those |
|
Nice work. Qwen Image / Edit version? |
Adds support for NextDiT (Lumina2/Z-Image), following existing design patterns.
Seems to work as expected. Increasing nag_scale above 3 or so seems to cause distortions, but it has the desired effect with:
- nag_scale = ~1.5-2.3
- nag_tau = ~2.5
- nag_alpha = ~0.25
- nag_sigma_end = ~0.7-0.9
Notably the nag_sigma_end should be increased from the default 0.0, as applying NAG to the final few steps of inference seems to reduce quality, and probably isn't needed.
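As a rough illustration of the values above, the sketch below collects them in a dict and shows which sampling steps would still receive NAG for a given nag_sigma_end. The gating rule (NAG is applied only while the step's sigma is above nag_sigma_end) is my assumption about how the cutoff behaves, the sigma schedule is made up for the example, and `NAG_SETTINGS` and `nag_steps` are hypothetical names, not part of the node's API.

```python
# Illustrative sketch only; names and the sigma schedule are hypothetical.

# Starting values picked from within the ranges quoted in this PR description.
NAG_SETTINGS = {
    "nag_scale": 2.0,      # suggested ~1.5-2.3; above ~3 seemed to cause distortions
    "nag_tau": 2.5,
    "nag_alpha": 0.25,
    "nag_sigma_end": 0.8,  # suggested ~0.7-0.9; the default is 0.0
}


def nag_steps(sigmas, nag_sigma_end):
    """Return indices of steps where NAG applies.

    Assumed rule: NAG is active only while sigma > nag_sigma_end.
    """
    return [i for i, s in enumerate(sigmas) if s > nag_sigma_end]


# Made-up descending sigma schedule, just to show the effect of the cutoff.
example_sigmas = [1.0, 0.9, 0.7, 0.4, 0.0]
active = nag_steps(example_sigmas, NAG_SETTINGS["nag_sigma_end"])
```

Under that assumption, raising nag_sigma_end simply shortens the list of guided steps, so the final low-sigma steps run without NAG.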
Example:
Note: I mis-captioned the text in the image. Alpha should read 0.25, not 2.5.
NOTE: if you are encountering issues with nodes not loading, you need to also apply this PR (#59) to your local repo clone, OR use my fork (https://github.com/scottmudge/ComfyUI-NAG/tree/main, which includes this Z-Image support PR and that compatibility PR).
This repo has not been updated in a while, and PR 59 fixes some compatibility issues with more recent versions of ComfyUI.