I am getting the following warning repeatedly (it is logged as a WARNING, not an error). It seems that TileLang compiles mamba_mimo_fwd_kernel correctly via the JIT compiler, but mamba_ssm then makes repeated calls that hit the lower-level kernel cache?
Setup
    import os
    import sys

    import torch
    import mamba_ssm
    import tilelang
    # Import path as I understand it from the mamba_ssm README; adjust if your install differs.
    from mamba_ssm.modules.mamba3 import Mamba3

    print(f'python version: {sys.version}')
    print(f'torch version: {torch.__version__}')
    print(f'cuda version: {torch.version.cuda}')
    print(f'mamba_ssm version: {mamba_ssm.__version__}')
    print(f'os version: {os.uname()}')
    print(f'tilelang version: {tilelang.__version__}')

    batch, length, dim = 2, 2048, 768
    x = torch.randn(batch, length, dim).to(torch.bfloat16).to("cuda")
    model = Mamba3(
        # This module uses roughly 6 * d_model^2 parameters
        d_model=dim,            # Model dimension d_model
        d_state=128,            # SSM state size
        headdim=64,             # SSM headdim
        is_mimo=True,           # Use MIMO mode
        mimo_rank=4,            # MIMO rank when is_mimo=True
        chunk_size=16,          # 64/mimo_rank if x is in bf16, else 32/mimo_rank
        is_outproj_norm=False,  # Additional post SSM norm
        dtype=torch.bfloat16,
    ).to("cuda")

    # Ten forward passes: the first triggers compilation, the rest reuse the kernel.
    for i in range(10):
        y = model(x)
        assert y.shape == x.shape
Output
python version: 3.12.13 | packaged by Anaconda, Inc. | (main, Mar 19 2026, 20:20:58) [GCC 14.3.0]
torch version: 2.12.0+cu126
cuda version: 12.6
mamba_ssm version: 2.3.2.post1
os version: posix.uname_result(sysname='Linux', nodename='ABC', release='6.2.0-35-generic', version='#35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 6 10:23:26 UTC 2', machine='x86_64')
tilelang version: 0.1.9
2026-05-24 10:28:13 [TileLang:tilelang.jit.kernel:INFO] (kernel.py:133): TileLang begins to compile kernel mamba_mimo_fwd_kernel with out_idx=[]
2026-05-24 10:28:27 [TileLang:tilelang.jit.kernel:INFO] (kernel.py:141): TileLang completes to compile kernel mamba_mimo_fwd_kernel
2026-05-24 10:28:28 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
2026-05-24 10:28:28 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
2026-05-24 10:28:29 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
2026-05-24 10:28:29 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
2026-05-24 10:28:29 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
2026-05-24 10:28:29 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
2026-05-24 10:28:29 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
2026-05-24 10:28:29 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
2026-05-24 10:28:29 [TileLang:tilelang.cache.kernel_cache:WARNING] (kernel_cache.py:322): Found kernel 'mamba_mimo_fwd_kernel' in memory cache. For better performance, consider using @tilelang.jit instead of direct kernel caching.
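To check whether these are only repeated log lines rather than repeated compilation, one option might be a small filter built on Python's standard logging module that lets each distinct message through once. This is only a sketch: the logger name is copied from the log prefix above, and I am assuming TileLang emits the warning through a logger registered under exactly that name. DropRepeats is a hypothetical helper, not part of TileLang or mamba_ssm.

```python
import logging


class DropRepeats(logging.Filter):
    """Let each distinct message through once and drop identical repeats."""

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        # Key on logger name, level, and the fully formatted message text.
        key = (record.name, record.levelno, record.getMessage())
        if key in self._seen:
            return False  # Already logged once: suppress.
        self._seen.add(key)
        return True


# Logger name taken from the "[TileLang:tilelang.cache.kernel_cache:WARNING]"
# prefix in the output above. That TileLang logs under this exact name is an
# assumption; a filter on a logger only sees records logged directly to it.
cache_logger = logging.getLogger("tilelang.cache.kernel_cache")
cache_logger.addFilter(DropRepeats())
```

If the assumption about the logger name holds, running the filter before the loop should leave a single "Found kernel ... in memory cache" line. It only hides the repeats and does not change how mamba_ssm looks the kernel up.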