Summary
Add dim=-1 regression tests and benchmark shape grids for softmax_online, including a long benchmark suite.
Motivation / Use Case
Softmax needs explicit stability and gradient coverage on realistic sequence lengths, plus long-N benchmarks to assess scaling.
Proposed Solution
- Enforce dim=-1 only for softmax_online.
- Correctness tests (dtype: f16/bf16/f32):
  - M: 128, 512, 2048
  - N: 128, 256, 1024, 2048, 4096, 8192
  - Edge N: 255, 257, 1023, 1025
  - Stability tests for N=1024 and N=8192
  - Backward tests for N=1024 and N=4096
- Benchmarks:
  - Short suite: M in 128, 512, 2048, 8192 and N in 1024, 2048, 4096, 8192
  - Long suite: M in 64, 128, 256 and N in 16384, 32768, 65536, 131072
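
The grids above could be written down as data so that tests and benchmarks share one source of truth. This is a minimal sketch, not the project's actual layout: the constant and function names are hypothetical, and it assumes every dtype is crossed with every (M, N) pair, which the issue does not state explicitly.

```python
from itertools import product

# Values copied from the proposal; all names here are hypothetical.
DTYPES = ("f16", "bf16", "f32")
M_VALUES = (128, 512, 2048)
N_VALUES = (128, 256, 1024, 2048, 4096, 8192)
EDGE_N_VALUES = (255, 257, 1023, 1025)
STABILITY_N = (1024, 8192)
BACKWARD_N = (1024, 4096)

SHORT_BENCH = {"M": (128, 512, 2048, 8192), "N": (1024, 2048, 4096, 8192)}
LONG_BENCH = {"M": (64, 128, 256), "N": (16384, 32768, 65536, 131072)}


def correctness_cases():
    """Return (dtype, M, N) triples, assuming a full cross product."""
    return list(product(DTYPES, M_VALUES, N_VALUES + EDGE_N_VALUES))


def bench_cases(suite):
    """Return (M, N) pairs for one benchmark suite."""
    return list(product(suite["M"], suite["N"]))
```

Under the cross-product assumption this yields 90 correctness cases, 16 short benchmark shapes, and 12 long benchmark shapes. A test runner could feed `correctness_cases()` into its parametrization mechanism. If the full product is too slow, the edge-N values could be paired with a single M instead.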
Scope Alignment
v0.1 scope (Weeks 0-2)
Alternatives Considered
Keep dim=0/1 support in tests, or omit large-N benchmarks.
Additional Context
Long-N benchmarks use small M to avoid OOM and isolate N scaling.
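
As a rough illustration of the OOM concern, the size of a single M x N tensor can be estimated from the element width. This is a back-of-the-envelope sketch with a hypothetical helper name. It counts one tensor only, so real peak memory (output, gradients, any workspace) would be higher.

```python
# Bytes per element for the dtypes listed in the proposal.
BYTES_PER_ELEMENT = {"f16": 2, "bf16": 2, "f32": 4}


def tensor_mib(m, n, dtype):
    """Size of one dense (m, n) tensor in MiB (hypothetical helper)."""
    return m * n * BYTES_PER_ELEMENT[dtype] / 2**20


# Largest long-suite shape in f32: 256 * 131072 * 4 bytes = 128 MiB.
largest_long = tensor_mib(256, 131072, "f32")

# Same N with the short suite's largest M: 8192 * 131072 * 4 bytes = 4096 MiB.
hypothetical_large = tensor_mib(8192, 131072, "f32")
```

Keeping M at 256 or below bounds the largest long-suite input at 128 MiB in f32. Pairing N=131072 with M=8192 would need 4 GiB for the input alone.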