Merged
55 commits
dab9c66
Bump version to 0.4.0
farhan-syah Feb 10, 2026
4bd0f3c
chore: remove test migration markers
farhan-syah Feb 10, 2026
144250a
docs: add comprehensive usage examples
farhan-syah Feb 10, 2026
e77e378
ci: add backend compile gates, parity tests, and example verification
farhan-syah Feb 10, 2026
0d6c831
ci: streamline release workflow by reusing CI pipeline
farhan-syah Feb 10, 2026
1f22599
ci: remove CUDA compilation checks from hosted runners
farhan-syah Feb 10, 2026
eae2eaf
perf: add benchmark suite and small matrix SIMD kernels
farhan-syah Feb 11, 2026
47fee0a
perf: optimize matmul microkernels with beta parameter and double-wid…
farhan-syah Feb 11, 2026
5498a48
perf: improve matmul cache blocking and add thread-local buffers
farhan-syah Feb 11, 2026
a5f258d
perf: optimize concatenation with fast-path bulk copy
farhan-syah Feb 11, 2026
5bd8a5e
perf: eliminate type dispatch overhead in CPU concatenation
farhan-syah Feb 11, 2026
4403b10
perf: remove unnecessary memory zeroing in CPU allocator
farhan-syah Feb 11, 2026
ada4c84
bench: relax flux verification thresholds for concatenation
farhan-syah Feb 11, 2026
45549c8
docs: add architecture guide for contributors
farhan-syah Feb 11, 2026
c63999f
chore: add flux benchmark configuration
farhan-syah Feb 11, 2026
5c04398
bench: add CUDA benchmarks and expand backend comparisons
farhan-syah Feb 11, 2026
3055ed9
docs: add comprehensive benchmark documentation
farhan-syah Feb 11, 2026
0ba45da
chore: migrate from fluxbench-cli to unified fluxbench API
farhan-syah Feb 11, 2026
19a0c79
feat: improve dtype feature gate error messages
farhan-syah Feb 11, 2026
e0434b3
feat: extend F16/BF16 support in CUDA operations
farhan-syah Feb 11, 2026
afad765
test: refactor backend parity tests for dtype coverage
farhan-syah Feb 11, 2026
056ddfe
bench: add comprehensive parallelism control benchmarks
farhan-syah Feb 11, 2026
8710df6
test: refactor dtype comparison to use native types
farhan-syah Feb 11, 2026
ab61aea
feat: extend F16/BF16/FP8 support in polynomial and special functions
farhan-syah Feb 11, 2026
86d1ff2
feat: implement comprehensive CUDA kernels for extended dtype coverage
farhan-syah Feb 11, 2026
8588980
feat: improve dtype handling in CPU special functions and WebGPU casts
farhan-syah Feb 11, 2026
0db009f
test: add utilities for dtype-agnostic comparison and boolean mask ha…
farhan-syah Feb 11, 2026
4ab5838
test: add comprehensive backend parity tests for cast operations
farhan-syah Feb 11, 2026
ac598d1
test: refactor backend parity tests to use dtype parameterization
farhan-syah Feb 11, 2026
e36a3ed
feat: add dtype promotion infrastructure for linear algebra operations
farhan-syah Feb 11, 2026
66b3b03
feat: enable reduced-precision dtype support in CPU linalg operations
farhan-syah Feb 11, 2026
478b540
fix: improve FP8 support and random uniform generation for reduced-pr…
farhan-syah Feb 11, 2026
d876ffb
fix: improve CUDA memory allocation robustness and error recovery
farhan-syah Feb 12, 2026
9122bd2
feat: extend CUDA sort kernels with FP8 support and alignment fixes
farhan-syah Feb 12, 2026
1791923
fix: add feature gates for F16/BF16 dtype conversions in tests
farhan-syah Feb 12, 2026
312d9e6
feat: improve error function accuracy to full f64 precision
farhan-syah Feb 12, 2026
a2bf502
fix: add missing feature gates for FP8 tests
farhan-syah Feb 12, 2026
d63d3e3
feat: add FP8 support for CUDA convolution operations
farhan-syah Feb 12, 2026
c1b022b
feat: extend CUDA indexing kernels with FP8 support
farhan-syah Feb 12, 2026
bc151ac
feat: apply dtype promotion to CUDA operations requiring higher preci…
farhan-syah Feb 12, 2026
f29a270
fix: improve CUDA random number generation for reduced-precision types
farhan-syah Feb 12, 2026
57ee272
fix: add fallback for unsupported dtypes in CUDA matmul_bias
farhan-syah Feb 12, 2026
5f9ffe3
test: adjust FP8 tolerances for accumulation and rounding errors
farhan-syah Feb 12, 2026
5b140b6
docs: improve markdown table formatting in benchmark README
farhan-syah Feb 12, 2026
3db79ac
feat: add boundary type conversion for WebGPU non-native dtypes
farhan-syah Feb 12, 2026
6058590
fix: correct WGSL uniform buffer alignment in sorting operations
farhan-syah Feb 12, 2026
43da15c
feat: add broadcast support for WebGPU masking operations
farhan-syah Feb 12, 2026
95ab771
refactor: consolidate benchmarks with parameterized test cases
farhan-syah Feb 12, 2026
4bc86f7
perf: optimize single-batch FFT by avoiding Rayon overhead
farhan-syah Feb 12, 2026
fd64854
fix: adjust concatenation benchmark verification thresholds
farhan-syah Feb 12, 2026
bc389d7
chore: remove minimal benchmark
farhan-syah Feb 12, 2026
87b7e05
refactor: restructure CI workflows with reusable test suite
farhan-syah Feb 13, 2026
53d0349
feat: add CI regression benchmark suite
farhan-syah Feb 13, 2026
950df1e
chore: replace hardcoded constants with standard library equivalents
farhan-syah Feb 13, 2026
fb73e58
test: increase sample size in randn invariant tests
farhan-syah Feb 13, 2026
55 changes: 55 additions & 0 deletions .github/workflows/baseline.yml
@@ -0,0 +1,55 @@
+# Save benchmark baseline.
+#
+# This workflow runs the CI regression benchmarks in "save" mode:
+# it writes a baseline JSON to the GitHub Actions cache, keyed by commit SHA.
+#
+# benchmark.yml (on PRs) restores this cache to compare against, enabling
+# regression detection. Cache keys use prefix matching so the latest baseline
+# from main is always picked up, even across many merges.
+#
+# Triggered manually via workflow_dispatch (should be run from the main branch).
+
+name: Baseline
+
+on:
+  workflow_dispatch:
+
+concurrency:
+  group: baseline-${{ github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  test:
+    name: Test Suite
+    uses: ./.github/workflows/test.yml
+
+  baseline:
+    needs: test
+    name: Save Benchmark Baseline
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+        with:
+          prefix-key: bench
+
+      - name: Run benchmarks and save baseline
+        run: cargo bench --bench ci_regression -- --save-baseline
+
+      # Cache keyed by SHA so each merge gets its own entry.
+      # benchmark.yml uses restore-keys prefix matching to find the latest one.
+      - name: Cache baseline
+        uses: actions/cache/save@v4
+        with:
+          path: target/fluxbench/baseline.json
+          key: numr-bench-baseline-${{ github.sha }}
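
The comments above lean on prefix matching: each baseline is saved under `numr-bench-baseline-<sha>`, and the restore side asks for any key starting with `numr-bench-baseline-`, which resolves to the most recently created match. A minimal shell sketch of that selection rule follows. The `pick_latest` helper and the cache listing are made up for illustration; this simulates the rule and is not how `actions/cache` is implemented.

```shell
# pick_latest PREFIX: reads "<created-epoch> <key>" lines on stdin and prints
# the most recently created key that starts with PREFIX (nothing on a miss).
# Hypothetical helper for illustration only.
pick_latest() {
  prefix="$1"
  sort -n | awk -v p="$prefix" '
    index($2, p) == 1 { latest = $2 }
    END { if (latest != "") print latest }
  '
}

# Made-up cache listing: three baselines and one unrelated entry.
printf '%s\n' \
  '100 numr-bench-baseline-aaa111' \
  '300 numr-bench-baseline-ccc333' \
  '200 numr-bench-baseline-bbb222' \
  '250 Linux-bench-cargo-xyz' |
  pick_latest numr-bench-baseline-
# prints: numr-bench-baseline-ccc333
```

Since the workflow is triggered only by `workflow_dispatch`, a new entry appears only when someone dispatches it. GitHub lets pull-request runs restore caches created on the default branch, which fits the advice to run this workflow from main.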
77 changes: 77 additions & 0 deletions .github/workflows/benchmark.yml
@@ -0,0 +1,77 @@
+# Benchmark regression check.
+#
+# Runs on PRs (non-draft) and can be called by other workflows (e.g. release.yml).
+#
+# How regression detection works:
+# 1. baseline.yml saves a baseline JSON after each merge to main (cached by commit SHA).
+# 2. This workflow restores that baseline and passes it via --baseline to fluxbench.
+# 3. Each benchmark has a per-bench threshold — regressions beyond this are flagged.
+# 4. Exit codes are controlled by #[verify] expressions with severity levels:
+# - critical: exits non-zero -> job fails -> PR blocked
+# - warning: exits zero -> shows warnings in summary
+# - info: logged in the summary only
+# 5. If no baseline exists yet (first run), benchmarks run without comparison.
+
+name: Benchmark
+
+on:
+  pull_request:
+    branches: [main]
+    types: [opened, synchronize, reopened, ready_for_review]
+  workflow_call:
+  workflow_dispatch:
+
+concurrency:
+  group: benchmark-${{ github.ref }}
+  cancel-in-progress: true
+
+permissions:
+  contents: read
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  test:
+    name: Test Suite
+    if: github.event.pull_request.draft == false
+    uses: ./.github/workflows/test.yml
+
+  benchmark:
+    needs: test
+    name: Regression Check
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+        with:
+          prefix-key: bench
+
+      - name: Build benchmarks
+        run: cargo build --bench ci_regression --release
+
+      # Restore the most recent baseline saved by baseline.yml on main.
+      # Uses prefix matching — the exact key won't match, but restore-keys
+      # picks the latest cache entry starting with "numr-bench-baseline-".
+      # On cache miss (no baseline yet), this is a silent no-op.
+      - name: Restore baseline from main
+        uses: actions/cache/restore@v4
+        with:
+          path: target/fluxbench/baseline.json
+          key: numr-bench-baseline-dummy
+          restore-keys: numr-bench-baseline-
+
+      # --format github-summary: renders a markdown table for the step summary.
+      # --baseline (if file exists): enables regression comparison against main.
+      # Exit code reflects critical verification failures (see flux.toml: fail_on_critical).
+      - name: Run benchmarks
+        run: |
+          ARGS="--format github-summary"
+          if [ -f target/fluxbench/baseline.json ]; then
+            ARGS="$ARGS --baseline target/fluxbench/baseline.json"
+          fi
+          cargo bench --bench ci_regression -- $ARGS >> $GITHUB_STEP_SUMMARY
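
The "Run benchmarks" step does two things worth seeing in isolation: it adds `--baseline` only when the restored file exists, and it relies on the benchmark's exit status surviving the `>>` redirect. The sketch below illustrates both. `bench_args` and `fails_with_3` are hypothetical helpers, not part of fluxbench or the workflow. The positional-parameter list stands in for the step's `$ARGS` string, which depends on word splitting and works there because the path contains no spaces.

```shell
# bench_args [BASELINE_PATH]: prints the benchmark arguments, one per line.
# --baseline is appended only if the baseline file exists.
# Hypothetical helper mirroring the workflow step.
bench_args() {
  baseline="${1:-target/fluxbench/baseline.json}"
  set -- --format github-summary
  if [ -f "$baseline" ]; then
    set -- "$@" --baseline "$baseline"
  fi
  printf '%s\n' "$@"
}

# Appending stdout to a file does not alter the command's exit status,
# so a non-zero exit from the benchmark binary still fails the step.
fails_with_3() { echo "summary row"; return 3; }  # stand-in for a failing run

summary_file="$(mktemp)"
status=0
fails_with_3 >> "$summary_file" || status=$?
echo "exit status: $status"
# prints: exit status: 3
```

On a cache miss `bench_args` yields only `--format` and `github-summary`, which matches the "first run" case described in the header comment.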
62 changes: 9 additions & 53 deletions .github/workflows/ci.yml
@@ -1,10 +1,17 @@
+# CI — thin wrapper that calls the reusable test workflow.
+#
+# All test jobs (lint, cross-platform tests, backend compile gates, parity,
+# examples) live in test.yml to avoid duplication across ci.yml, benchmark.yml,
+# baseline.yml, and release.yml.
+
 name: CI

 on:
   pull_request:
     branches: [main]
     types: [opened, synchronize, reopened, ready_for_review]
   workflow_dispatch:
+  workflow_call:

 concurrency:
   group: ci-${{ github.ref }}
@@ -13,59 +20,8 @@ concurrency:
 permissions:
   contents: read

-env:
-  CARGO_TERM_COLOR: always
-
 jobs:
-  lint:
-    if: github.event.pull_request.draft == false
-    name: Lint, Format & Docs
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Install Rust
-        uses: dtolnay/rust-toolchain@stable
-        with:
-          components: rustfmt, clippy
-
-      - uses: Swatinem/rust-cache@v2
-        with:
-          prefix-key: lint
-
-      - name: Check formatting
-        run: cargo fmt --all --check
-
-      - name: Run clippy (all CI-safe features)
-        run: cargo clippy --all-targets --features f16,sparse -- -D warnings
-
-      - name: Build docs
-        run: cargo doc --no-deps --features f16,sparse
-
-      - name: Run doctests
-        run: cargo test --doc --features f16,sparse
-
   test:
     if: github.event.pull_request.draft == false
-    name: Test (${{ matrix.os }})
-    runs-on: ${{ matrix.os }}
-    strategy:
-      fail-fast: false
-      matrix:
-        os: [ubuntu-latest, macos-latest, windows-latest]
-
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Install Rust
-        uses: dtolnay/rust-toolchain@stable
-
-      - uses: Swatinem/rust-cache@v2
-        with:
-          prefix-key: test
-
-      - name: Run tests (default)
-        run: cargo test
-
-      - name: Run tests (f16 + sparse)
-        run: cargo test --features f16,sparse
+    name: Test Suite
+    uses: ./.github/workflows/test.yml
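
With the jobs moved into `test.yml`, it can be handy to check which workflows reference a reusable workflow directly. A minimal sketch, assuming a hypothetical `callers_of` helper run from the repository root:

```shell
# callers_of WORKFLOW [DIR]: lists workflow files in DIR (default
# .github/workflows) that contain a direct `uses:` reference to WORKFLOW.
# Hypothetical helper; it matches text and does not parse YAML.
callers_of() {
  target="$1"
  dir="${2:-.github/workflows}"
  { grep -l -F "uses: ./.github/workflows/$target" "$dir"/*.yml 2>/dev/null || true; } | sort
}
```

Going by the diffs on this page, `callers_of test.yml` would be expected to list `baseline.yml`, `benchmark.yml`, and `ci.yml`. `release.yml` reaches the test suite only indirectly through `benchmark.yml`, so it would show up under `callers_of benchmark.yml` instead.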
56 changes: 5 additions & 51 deletions .github/workflows/release.yml
@@ -59,61 +59,15 @@ jobs:

           echo "version=$TAG_VERSION" >> $GITHUB_OUTPUT

-  lint:
-    name: Lint, Format & Docs
+  # Reuse benchmark workflow which includes the full test suite + regression check
+  ci:
+    name: CI + Benchmark
     needs: validate-version
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Install Rust
-        uses: dtolnay/rust-toolchain@stable
-        with:
-          components: rustfmt, clippy
-
-      - uses: Swatinem/rust-cache@v2
-        with:
-          prefix-key: lint
-
-      - name: Check formatting
-        run: cargo fmt --all --check
-
-      - name: Run clippy (all CI-safe features)
-        run: cargo clippy --all-targets --features f16,sparse -- -D warnings
-
-      - name: Build docs
-        run: cargo doc --no-deps --features f16,sparse
-
-      - name: Run doctests
-        run: cargo test --doc --features f16,sparse
-
-  test:
-    name: Test (${{ matrix.os }})
-    needs: validate-version
-    runs-on: ${{ matrix.os }}
-    strategy:
-      fail-fast: false
-      matrix:
-        os: [ubuntu-latest, macos-latest, windows-latest]
-    steps:
-      - uses: actions/checkout@v4
-
-      - name: Install Rust
-        uses: dtolnay/rust-toolchain@stable
-
-      - uses: Swatinem/rust-cache@v2
-        with:
-          prefix-key: test
-
-      - name: Run tests (default)
-        run: cargo test
-
-      - name: Run tests (f16 + sparse)
-        run: cargo test --features f16,sparse
+    uses: ./.github/workflows/benchmark.yml

   publish:
     name: Publish to crates.io
-    needs: [validate-version, lint, test]
+    needs: [validate-version, ci]
     runs-on: ubuntu-latest
     environment: crates-io
     steps:
118 changes: 118 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,118 @@
+# Reusable test workflow: lint, format, docs, cross-platform tests, backend checks.
+#
+# Called by:
+# - ci.yml (PR checks)
+# - benchmark.yml (PR regression checks)
+# - baseline.yml (post-merge baseline saves)
+# - release.yml (via benchmark.yml)
+#
+# Not triggered directly — use workflow_call only.
+
+name: Test
+
+on:
+  workflow_call:
+
+permissions:
+  contents: read
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  lint:
+    name: Lint, Format & Docs
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+        with:
+          components: rustfmt, clippy
+
+      - uses: Swatinem/rust-cache@v2
+        with:
+          prefix-key: lint
+
+      - name: Check formatting
+        run: cargo fmt --all --check
+
+      - name: Run clippy (all CI-safe features)
+        run: cargo clippy --all-targets --features f16,sparse -- -D warnings
+
+      - name: Build docs
+        run: cargo doc --no-deps --features f16,sparse
+
+      - name: Run doctests
+        run: cargo test --doc --features f16,sparse
+
+  test:
+    name: Test (${{ matrix.os }})
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-latest]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+        with:
+          prefix-key: test
+
+      - name: Run tests (default)
+        run: cargo test
+
+      - name: Run tests (f16 + sparse)
+        run: cargo test --features f16,sparse
+
+  backend-and-parity:
+    name: Backend Compile, Parity & Examples
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust
+        uses: dtolnay/rust-toolchain@stable
+
+      - uses: Swatinem/rust-cache@v2
+        with:
+          prefix-key: backend-parity
+
+      # Backend compile gates
+      - name: "Compile: cpu-only (no default features)"
+        run: cargo check --no-default-features --features cpu
+
+      - name: "Compile: cpu + f16 + sparse"
+        run: cargo check --features f16,sparse
+
+      - name: "Compile: wgpu"
+        run: cargo check --features wgpu,f16,sparse
+
+      - name: "Compile tests: cpu-only"
+        run: cargo test --no-run --no-default-features --features cpu
+
+      - name: "Compile tests: wgpu"
+        run: cargo test --no-run --features wgpu,f16,sparse
+
+      # Backend parity
+      - name: Run backend parity tests
+        run: cargo test backend_parity --features f16,sparse
+
+      # Examples
+      - name: Build all examples
+        run: cargo build --examples --features sparse
+
+      - name: Run examples
+        run: |
+          cargo run --example basic_tensor_ops
+          cargo run --example autograd_linear_regression
+          cargo run --example conv_unfold_im2col
+          cargo run --example fft_roundtrip
+          cargo run --example sparse_coo_csr_workflow --features sparse
+          cargo run --example backend_switch_cpu_wgpu
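
The backend compile gates and the parity run in the `backend-and-parity` job are plain cargo invocations, so they can be replayed locally before pushing. A minimal sketch, assuming a hypothetical `run_backend_gates` helper run from the repository root. The six commands are copied from the job; the `DRY_RUN` switch is an addition for illustration.

```shell
# run_backend_gates: runs the compile-gate and parity commands from the
# backend-and-parity job in order and stops at the first failure.
# With DRY_RUN=1 it only prints the commands. Hypothetical helper.
run_backend_gates() {
  while IFS= read -r cmd; do
    echo "+ $cmd"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      continue
    fi
    # stdin is redirected so the command cannot consume the command list
    sh -c "$cmd" </dev/null || return 1
  done <<'EOF'
cargo check --no-default-features --features cpu
cargo check --features f16,sparse
cargo check --features wgpu,f16,sparse
cargo test --no-run --no-default-features --features cpu
cargo test --no-run --features wgpu,f16,sparse
cargo test backend_parity --features f16,sparse
EOF
}
```

`DRY_RUN=1 run_backend_gates` lists the six commands without building anything; calling it without `DRY_RUN` executes them and returns non-zero as soon as one fails.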