[MAX] Add Wan VAE and refactor autoencoder module #15
Draft
jglee-sqbits wants to merge 1 commit into
## Summary

Add a Wan VAE (3D causal video autoencoder) and restructure the autoencoder module to separate Module V2 and V3 implementations.

## Description

### Wan VAE

- Implements the Wan 3D causal VAE with temporal caching for chunked encode/decode
- Encoder: processes video in temporal chunks (first frame + subsequent chunks) with cached convolution state to maintain temporal consistency
- Decoder: same chunked approach with 3 specialized graphs (post-quant conv, first frame, subsequent frames)
- Uses symbolic spatial dims for resolution flexibility
- Adds 3D convolution support via cuDNN (`conv.mojo`) with depth-tiled execution for large volumes

### Autoencoder restructuring

- Moves existing Module V3 (Flux) autoencoder files to `autoencoders_modulev3/`
- The `autoencoders/` directory now contains Module V2 graph-based implementations (Wan VAE, Qwen Image VAE)
- Updates `flux1_modulev3` and `flux2_modulev3` import paths accordingly

This follows the same pattern as modular#6278, which established the V2/V3 split.

## Dependencies

Should be merged **before** modular#6301 (transformer) and modular#6302 (pipeline-t2v), which import from `autoencoders`.

## Checklist

- [x] PR is small and focused
- [x] I ran `./bazelw run format` to format my changes

Assisted-by: Claude Code

stack-info: PR: #15, branch: jglee-sqbits/stack/3
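The chunked encode/decode with cached convolution state described under "Wan VAE" can be illustrated with a small sketch. This is not the code from this PR: it reduces each frame to a single float and uses a 1D temporal kernel, and the names `causal_conv1d` and `chunked_causal_conv1d` are hypothetical. It only aims to show why carrying the last `k - 1` inputs between chunks should reproduce the result of processing the whole clip at once.

```python
from typing import List, Optional, Tuple


def causal_conv1d(
    frames: List[float],
    kernel: List[float],
    cache: Optional[List[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Causal temporal convolution over one chunk of frames.

    `cache` holds the last len(kernel) - 1 inputs of the previous chunk.
    For the first chunk it is None and the past is zero-padded.
    Returns (outputs, new_cache).
    """
    k = len(kernel)
    history = cache if cache is not None else [0.0] * (k - 1)
    padded = history + frames
    # kernel[-1] multiplies the current frame, so no future frame is read.
    outputs = [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]
    # Keep the most recent k - 1 inputs for the next chunk.
    new_cache = padded[len(padded) - (k - 1):]
    return outputs, new_cache


def chunked_causal_conv1d(
    frames: List[float], kernel: List[float], chunk_size: int
) -> List[float]:
    """First frame alone, then fixed-size chunks, threading the cache through."""
    outputs, cache = causal_conv1d(frames[:1], kernel)
    for start in range(1, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        out, cache = causal_conv1d(chunk, kernel, cache)
        outputs.extend(out)
    return outputs


video = [float(t) for t in range(9)]  # 9 scalar stand-ins for frames
kernel = [0.25, 0.5, 1.0]
full, _ = causal_conv1d(video, kernel)
chunked = chunked_causal_conv1d(video, kernel, chunk_size=4)
assert chunked == full  # chunking with a cache matches the one-shot result
```

In the real model each causal convolution layer would presumably keep its own cache entry, which is why the number of cache slots matters for the decoder graphs.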
This was referenced Apr 1, 2026
Code Review
This pull request introduces support for 3D convolutions using cuDNN and adds a new Wan VAE autoencoder implementation. The changes include new cuDNN descriptor APIs, depth-tiled convolution for large tensors, and a new architecture module for Wan. My feedback highlights that the hardcoded `total_cache_slots` calculation in `vae.py` is fragile and should be dynamic, and that the `comptime if` in `conv.mojo` is redundant and can be simplified to a standard `if` statement.
```python
# diff context: """Apply Decoder forward pass.

@property
def total_cache_slots(self) -> int:
    return 1 + sum(self._block_cache_slots) + 4 + 1
```
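One way to make the count dynamic, as the review suggests, is to let each stage register its own cache requirement and derive the total from that. The sketch below is only an assumption about how this could look: `CacheLayout`, `register`, and the stage names are hypothetical, and the PR does not say what the constants `1`, `4`, and `1` stand for, so they appear here as placeholder stages in the same order as the original expression.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CacheLayout:
    """Hypothetical registry: each stage declares its own cache-slot count."""

    stages: Dict[str, int] = field(default_factory=dict)

    def register(self, name: str, slots: int) -> int:
        """Record a stage and return the index of its first cache slot."""
        offset = self.total_cache_slots
        self.stages[name] = slots
        return offset

    @property
    def total_cache_slots(self) -> int:
        # Derived from what was registered, so adding a layer cannot desync it.
        return sum(self.stages.values())


block_cache_slots = [6, 6, 4, 4]  # made-up per-block counts for illustration
layout = CacheLayout()
layout.register("leading_stage", 1)       # placeholder for the first `1`
for i, n in enumerate(block_cache_slots):
    layout.register(f"block_{i}", n)
tail_offset = layout.register("tail_group", 4)  # placeholder for the `4`
layout.register("final_stage", 1)         # placeholder for the last `1`

# Matches the hardcoded expression without repeating its constants.
assert layout.total_cache_slots == 1 + sum(block_cache_slots) + 4 + 1
```

Returning the offset from `register` also gives each stage its slot index, so indices and total come from a single source.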
Comment on lines +4651 to +4661
```mojo
comptime if filter_is_fcrs:
    conv3d_cudnn[input_type, filter_type, output_type](
        input_lt,
        filter_lt,
        output_lt,
        rebind[IndexList[3]](stride),
        rebind[IndexList[3]](dilation),
        rebind[IndexList[3]](symmetric_padding),
        num_groups,
        ctx,
    )
```
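The depth-tiled execution mentioned in the description is not shown in this excerpt. As a rough illustration of the idea only, the Python sketch below tiles a convolution along the depth axis alone and assumes the input is already padded. `conv_depth` and `conv_depth_tiled` are hypothetical names, and the real `conv.mojo` path may split the work differently. Each tile reads an input slab that includes a halo of `span - 1` slices, so concatenating the tile outputs should match the untiled result.

```python
from typing import List


def conv_depth(
    x: List[float], w: List[float], stride: int = 1, dilation: int = 1
) -> List[float]:
    """Valid (unpadded) 1D convolution along the depth axis."""
    span = (len(w) - 1) * dilation + 1  # input extent covered by one output
    n_out = (len(x) - span) // stride + 1 if len(x) >= span else 0
    return [
        sum(w[j] * x[o * stride + j * dilation] for j in range(len(w)))
        for o in range(n_out)
    ]


def conv_depth_tiled(
    x: List[float],
    w: List[float],
    stride: int = 1,
    dilation: int = 1,
    tile_out: int = 4,
) -> List[float]:
    """Same result, computed `tile_out` output slices at a time."""
    span = (len(w) - 1) * dilation + 1
    n_out = (len(x) - span) // stride + 1 if len(x) >= span else 0
    out: List[float] = []
    for o0 in range(0, n_out, tile_out):
        o1 = min(o0 + tile_out, n_out)
        # Input slab for outputs [o0, o1), including the halo.
        lo = o0 * stride
        hi = (o1 - 1) * stride + span
        out.extend(conv_depth(x[lo:hi], w, stride, dilation))
    return out


depth_signal = [float(i * i) for i in range(20)]
weights = [1.0, -2.0, 1.0]
untiled = conv_depth(depth_signal, weights, stride=2, dilation=2)
tiled = conv_depth_tiled(depth_signal, weights, stride=2, dilation=2, tile_out=3)
assert tiled == untiled
```

Tiling this way trades some redundant reads of the halo slices for a bounded working set per call.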