feat: add speed benchmark, interactive example, and RTX 4080 test #873
Open
MasahiroOgawa wants to merge 3 commits into
- Add benchmarks/benchmark_speed_mamba123.py: compares Mamba1 vs Mamba2 vs Mamba3 forward/backward speed on identical workload with visual chart
- Add examples/predict_next_token.py: interactive next-token prediction with example outputs on launch showing what base LM prediction looks like
- Add tests/test_rtx4080.py: RTX 4080 specific test for pretrained inference and Mamba3 SISO module
- Add configs/rtx4080.json: model presets sized for 12GB VRAM
- Update benchmarks/benchmark_README.md with both benchmarks
- Update README.md with new sections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
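For readers unfamiliar with how a forward/backward speed comparison is usually measured, here is a minimal timing sketch. It is not the code in benchmarks/benchmark_speed_mamba123.py; the helper name `time_fn` and its parameters are hypothetical, and the workload is a stand-in callable rather than a Mamba module.

```python
import time
from typing import Callable, Optional


def time_fn(
    fn: Callable[[], object],
    warmup: int = 3,
    iters: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock time per call of `fn`, in milliseconds.

    `sync` is an optional barrier called before and after the timed loop.
    On a GPU this would typically be torch.cuda.synchronize, so that
    asynchronous kernel launches are included in the measurement.
    """
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start

    return elapsed * 1000.0 / iters


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would run a forward/backward step.
    ms = time_fn(lambda: sum(range(10_000)))
    print(f"mean time per call: {ms:.3f} ms")
```

To compare modules fairly, the same `warmup`, `iters`, and input shapes would be used for each one, with `sync=torch.cuda.synchronize` when running on CUDA.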
- benchmark_speed_mamba123.py: read VRAM from GPU instead of hardcoded 11.6GB
- tests/test_rtx4080.py: use generic python invocation in docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
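Reading VRAM from the device rather than hardcoding it could look roughly like the sketch below. This is an illustration under assumptions, not the code in this PR: the function names and the 0.9 headroom factor are hypothetical, and the only PyTorch calls used are `torch.cuda.is_available` and `torch.cuda.get_device_properties`.

```python
from typing import Optional


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / (1024 ** 3)


def detect_vram_gib(device_index: int = 0) -> Optional[float]:
    """Return total VRAM of a CUDA device in GiB, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    return bytes_to_gib(props.total_memory)


def fits_in_vram(required_gib: float, available_gib: float, headroom: float = 0.9) -> bool:
    """Check a memory estimate against a fraction of the available VRAM.

    The headroom factor (an assumption here) leaves room for the CUDA
    context and allocator fragmentation.
    """
    return required_gib <= available_gib * headroom


if __name__ == "__main__":
    vram = detect_vram_gib()
    if vram is None:
        print("No CUDA device detected")
    else:
        print(f"Detected {vram:.1f} GiB of VRAM")
```

Because the limit is queried at run time, the same script can skip configurations that are too large on a smaller card instead of assuming one fixed memory size.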
- Add benchmark_text_generation_latency_visual.py: compares generation latency across multiple Mamba model sizes with chart output
- Uses dynamic VRAM limit detection
- Preserves original benchmark_generation_mamba_simple.py untouched
- Update benchmark_README.md to document all three benchmarks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Objective
Add support for the RTX 4080 GPU, plus a visual benchmark and an interactive example.
Test result
Summary
New files only — no modifications to existing code.
- benchmarks/benchmark_text_generation_latency_visual.py: Compares text generation latency across multiple Mamba model sizes, outputs visual chart
- benchmarks/benchmark_speed_mamba123.py: Compares Mamba1 vs Mamba2 vs Mamba3 forward/backward speed on identical workload, outputs visual chart
- examples/predict_next_token.py: Interactive next-token prediction with example outputs on launch showing what base LM prediction looks like (supports Mamba1 and Mamba2 pretrained models)
- tests/test_rtx4080.py: RTX 4080 (sm_89) test for pretrained inference and Mamba3 SISO module
- configs/rtx4080.json: Model presets sized for 12GB VRAM
- benchmarks/benchmark_README.md: Explains the purpose of each benchmark
- README.md: New sections for interactive example and speed benchmark

RTX 4080 findings
Test plan
🤖 Generated with Claude Code