feat: add speed benchmark, interactive example, and RTX 4080 test by MasahiroOgawa · Pull Request #873 · state-spaces/mamba

MasahiroOgawa · 2026-03-23T07:23:21Z

Objective

Add RTX 4080 GPU support, along with visual benchmarks and an interactive example.

Test result

- benchmarks/benchmark_text_generation_latency_visual.py successfully outputs a chart image.
- benchmarks/benchmark_speed_mamba123.py successfully outputs a chart image.
- examples/predict_next_token.py successfully outputs a response.

Summary

New files only — no modifications to existing code.

benchmarks/benchmark_text_generation_latency_visual.py — Compares text generation latency across multiple Mamba model sizes, outputs visual chart
benchmarks/benchmark_speed_mamba123.py — Compares Mamba1 vs Mamba2 vs Mamba3 forward/backward speed on identical workload, outputs visual chart
examples/predict_next_token.py — Interactive next-token prediction with example outputs on launch showing what base LM prediction looks like (supports Mamba1 and Mamba2 pretrained models)
tests/test_rtx4080.py — RTX 4080 (sm_89) test for pretrained inference and Mamba3 SISO module
configs/rtx4080.json — Model presets sized for 12GB VRAM
benchmarks/benchmark_README.md — Explains the purpose of each benchmark
README.md — New sections for interactive example and speed benchmark
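For readers who want to reproduce a forward/backward speed comparison of this kind, here is a minimal timing sketch. It is not the code in benchmark_speed_mamba123.py; `time_step` and its parameters are hypothetical names chosen for illustration. The one point it tries to show is that GPU kernels launch asynchronously, so the clock should only be read after a synchronization call.

```python
import statistics
import time
from typing import Callable, Optional


def time_step(
    step: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 3,
    repeats: int = 10,
) -> float:
    """Return the median wall-clock time of step() in milliseconds.

    `sync` is an optional barrier (for example torch.cuda.synchronize)
    called before and after each timed run.
    """
    # Warm-up runs absorb one-off costs such as kernel compilation or autotuning.
    for _ in range(warmup):
        step()

    samples = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain previously queued work before starting the clock
        start = time.perf_counter()
        step()
        if sync is not None:
            sync()  # wait for the work launched by step() to finish
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)
```

With PyTorch this could be called as `time_step(lambda: model(x).sum().backward(), sync=torch.cuda.synchronize)`, where `model` and `x` stand in for whichever Mamba variant and input batch are being compared.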

RTX 4080 findings

Mamba3 SISO mode works (Triton kernels)
Mamba3 MIMO backward requires >100KB shared memory per SM (H100/sm_90+ only)
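One way to act on these findings in a test suite is to gate MIMO backward tests on the device's compute capability. The sketch below is an assumption about how such a gate might look, not code from tests/test_rtx4080.py; the threshold simply encodes the sm_90+ requirement stated above, and `supports_mimo_backward` is a hypothetical helper.

```python
from typing import Tuple

# Assumption taken from the finding above: MIMO backward needs sm_90 or newer.
MIMO_BWD_MIN_CAPABILITY: Tuple[int, int] = (9, 0)


def supports_mimo_backward(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the threshold."""
    # Tuples compare lexicographically, so (8, 9) < (9, 0) < (10, 0).
    return tuple(capability) >= MIMO_BWD_MIN_CAPABILITY


# RTX 4080 reports sm_89; H100 reports sm_90.
print(supports_mimo_backward((8, 9)))  # False
print(supports_mimo_backward((9, 0)))  # True
```

In a real test the tuple would come from `torch.cuda.get_device_capability()`, and a `False` result would skip the MIMO backward case while still running the SISO forward/backward checks.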

Test plan

Ran benchmark_text_generation_latency_visual.py on RTX 4080
Ran benchmark_speed_mamba123.py on RTX 4080
Ran predict_next_token.py with mamba-130m and mamba2-130m
Ran test_rtx4080.py for pretrained inference and Mamba3 SISO fwd/bwd
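To make "base LM next-token prediction" concrete, the sketch below turns a vector of logits into ranked next-token candidates. It is a toy illustration in pure Python, not the logic of predict_next_token.py: the vocabulary and logits are made up, and `top_k_next_tokens` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def top_k_next_tokens(
    logits: Sequence[float], vocab: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most likely tokens with their softmax probabilities."""
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens from most to least probable.
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits over a made-up four-token vocabulary.
for token, prob in top_k_next_tokens([2.0, 0.5, 1.0, -1.0], ["cat", "dog", "sat", "xyz"], k=2):
    print(f"{token!r}: {prob:.3f}")
```

A pretrained model would supply the logits for the last position of the prompt, and a tokenizer would map indices back to token strings.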

🤖 Generated with Claude Code

- Add benchmarks/benchmark_speed_mamba123.py: compares Mamba1 vs Mamba2 vs Mamba3 forward/backward speed on identical workload with visual chart
- Add examples/predict_next_token.py: interactive next-token prediction with example outputs on launch showing what base LM prediction looks like
- Add tests/test_rtx4080.py: RTX 4080 specific test for pretrained inference and Mamba3 SISO module
- Add configs/rtx4080.json: model presets sized for 12GB VRAM
- Update benchmarks/benchmark_README.md with both benchmarks
- Update README.md with new sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- benchmark_speed_mamba123.py: read VRAM from GPU instead of hardcoded 11.6GB
- tests/test_rtx4080.py: use generic python invocation in docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add benchmark_text_generation_latency_visual.py: compares generation latency across multiple Mamba model sizes with chart output
- Uses dynamic VRAM limit detection
- Preserves original benchmark_generation_mamba_simple.py untouched
- Update benchmark_README.md to document all three benchmarks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
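The move from a hardcoded 11.6GB to a detected VRAM limit could look roughly like the sketch below. This is an assumption about the approach, not the code in the benchmarks: the preset names and memory estimates are hypothetical, and the schema of configs/rtx4080.json is not reproduced here. The only PyTorch call used is `torch.cuda.get_device_properties(...).total_memory`.

```python
from typing import Dict, List, Optional


def detect_total_vram_bytes(device_index: int = 0) -> Optional[int]:
    """Total memory of a CUDA device in bytes, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device_index).total_memory


def presets_that_fit(
    presets: Dict[str, float], total_bytes: int, headroom: float = 0.9
) -> List[str]:
    """Keep presets whose estimated need (GiB) fits within a share of VRAM."""
    # Reserve some memory for the CUDA context and allocator fragmentation.
    budget_gib = total_bytes / 1024**3 * headroom
    return [name for name, need_gib in presets.items() if need_gib <= budget_gib]


# Hypothetical presets: name -> estimated peak memory in GiB.
example_presets = {"preset_small": 2.0, "preset_medium": 8.0, "preset_large": 20.0}
print(presets_that_fit(example_presets, total_bytes=12 * 1024**3))
# ['preset_small', 'preset_medium']
```

On a machine with a GPU, `detect_total_vram_bytes()` would replace the literal `12 * 1024**3`; the `headroom` factor is an illustrative choice rather than a measured value.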

MasahiroOgawa and others added 3 commits March 23, 2026 16:22

fix: use dynamic VRAM limit and remove uv-specific usage from docstring

c6e611a


apstenku123 mentioned this pull request Apr 10, 2026

Mamba3 backward pass (mamba3_siso_bwd_kernel_dqkv) 38.7x slower on GB200 (SM100) due to ptxas C7907 eliminating autotuner configs #904

Open

MasahiroOgawa wants to merge 3 commits into state-spaces:main from MasahiroOgawa:feat/benchmarks-and-examples
