feat: add speed benchmark, interactive example, and RTX 4080 test #873

Open
MasahiroOgawa wants to merge 3 commits into state-spaces:main from MasahiroOgawa:feat/benchmarks-and-examples
@MasahiroOgawa MasahiroOgawa commented Mar 23, 2026

Objective

Add support for the RTX 4080 GPU, and add visual benchmarks and an interactive example.

Test result

  • benchmark_text_generation_latency_visual.py successfully outputs this image: [image: text_generation_latency]
  • benchmark_speed_mamba123.py successfully outputs this image: [image: test_results_rtx4080]
  • examples/predict_next_token.py successfully outputs this response: [image]

Summary

New files only — no modifications to existing code.

  • benchmarks/benchmark_text_generation_latency_visual.py — Compares text generation latency across multiple Mamba model sizes, outputs visual chart
  • benchmarks/benchmark_speed_mamba123.py — Compares Mamba1 vs Mamba2 vs Mamba3 forward/backward speed on identical workload, outputs visual chart
  • examples/predict_next_token.py — Interactive next-token prediction with example outputs on launch showing what base LM prediction looks like (supports Mamba1 and Mamba2 pretrained models)
  • tests/test_rtx4080.py — RTX 4080 (sm_89) test for pretrained inference and Mamba3 SISO module
  • configs/rtx4080.json — Model presets sized for 12GB VRAM
  • benchmarks/benchmark_README.md — Explains the purpose of each benchmark
  • README.md — New sections for interactive example and speed benchmark
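The benchmark scripts themselves are not reproduced in this description. As a rough illustration of how a forward/backward speed comparison on an identical workload can be timed, here is a minimal sketch. The helper name `time_callable` and its parameters are hypothetical and not taken from the PR.

```python
import statistics
import time
from typing import Callable, Optional


def time_callable(
    fn: Callable[[], object],
    warmup: int = 3,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    Hypothetical helper: `sync` is an optional callable that blocks until
    asynchronous work has finished (for example a GPU synchronize call).
    """

    def run_once() -> float:
        if sync is not None:
            sync()  # drain pending async work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn() to finish
        return (time.perf_counter() - start) * 1e3

    for _ in range(warmup):
        run_once()  # warmup runs absorb one-time compilation and caching costs
    return statistics.median(run_once() for _ in range(iters))
```

On a GPU one would typically pass `sync=torch.cuda.synchronize` and give every model the same input tensor, so that the measured differences come from the models rather than from the workload.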

RTX 4080 findings

  • Mamba3 SISO mode works (Triton kernels)
  • Mamba3 MIMO backward does not run on the RTX 4080 (sm_89): it requires >100KB shared memory per SM (H100/sm_90+ only)
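A test suite can encode these findings as a capability gate. The sketch below assumes the sm_90 threshold reported above is the right cut-off. The names `MIMO_BACKWARD_MIN_CAPABILITY`, `supports_mimo_backward`, and `current_capability` are hypothetical and do not come from the PR or from mamba_ssm.

```python
from typing import Tuple

# Assumption taken from the findings above: the MIMO backward kernel needs
# sm_90 or newer, while SISO mode also runs on sm_89 (RTX 4080).
MIMO_BACKWARD_MIN_CAPABILITY = (9, 0)


def supports_mimo_backward(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability meets the threshold."""
    return tuple(capability) >= MIMO_BACKWARD_MIN_CAPABILITY


def current_capability(device: int = 0) -> Tuple[int, int]:
    """Read the compute capability of a CUDA device through PyTorch."""
    import torch  # imported lazily so the pure check above works without PyTorch

    return torch.cuda.get_device_capability(device)
```

A test for the MIMO backward pass could then be skipped on unsupported GPUs with `pytest.mark.skipif(not supports_mimo_backward(current_capability()), reason="needs sm_90+")`.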

Test plan

  • Ran benchmark_text_generation_latency_visual.py on RTX 4080
  • Ran benchmark_speed_mamba123.py on RTX 4080
  • Ran predict_next_token.py with mamba-130m and mamba2-130m
  • Ran test_rtx4080.py for pretrained inference and Mamba3 SISO fwd/bwd

🤖 Generated with Claude Code

MasahiroOgawa and others added 3 commits March 23, 2026 16:22
- Add benchmarks/benchmark_speed_mamba123.py: compares Mamba1 vs Mamba2 vs
  Mamba3 forward/backward speed on identical workload with visual chart
- Add examples/predict_next_token.py: interactive next-token prediction
  with example outputs on launch showing what base LM prediction looks like
- Add tests/test_rtx4080.py: RTX 4080 specific test for pretrained
  inference and Mamba3 SISO module
- Add configs/rtx4080.json: model presets sized for 12GB VRAM
- Update benchmarks/benchmark_README.md with both benchmarks
- Update README.md with new sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- benchmark_speed_mamba123.py: read VRAM from GPU instead of hardcoded 11.6GB
- tests/test_rtx4080.py: use generic python invocation in docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add benchmark_text_generation_latency_visual.py: compares generation
  latency across multiple Mamba model sizes with chart output
- Uses dynamic VRAM limit detection
- Preserves original benchmark_generation_mamba_simple.py untouched
- Update benchmark_README.md to document all three benchmarks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
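The commits above replace a hardcoded 11.6GB limit with a value read from the GPU. The PR's own detection code is not shown here. The following is only a sketch of one way to do it with PyTorch's `torch.cuda.get_device_properties`, combined with a hypothetical filter that keeps models whose weights fit a VRAM budget. The `headroom` fraction and the function names are assumptions.

```python
from typing import Dict, List

BYTES_PER_GIB = 1024 ** 3


def detect_vram_gib(device: int = 0) -> float:
    """Total memory of a CUDA device in GiB, as reported by PyTorch."""
    import torch  # imported lazily; only needed when a GPU is queried

    return torch.cuda.get_device_properties(device).total_memory / BYTES_PER_GIB


def models_that_fit(
    param_counts: Dict[str, float],
    vram_gib: float,
    bytes_per_param: int = 2,  # 2 bytes per parameter for fp16/bf16 weights
    headroom: float = 0.5,     # assumed fraction of VRAM allowed for weights
) -> List[str]:
    """Return the model names whose weights fit within the VRAM budget."""
    budget = vram_gib * BYTES_PER_GIB * headroom
    return [
        name
        for name, n_params in param_counts.items()
        if n_params * bytes_per_param <= budget
    ]
```

For example, `models_that_fit(sizes, detect_vram_gib())` would select the models to benchmark on whatever GPU is present. Activations and inference state need memory beyond the weights, which is why the sketch reserves headroom instead of using the full capacity.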