EvanChisholm1/STEEL

STEEL

  • Streamlined
  • Tensor
  • Execution
  • Engine
  • Library

STEEL is a fun experiment in writing essentially pure C++, aiming for the most elegant tensor/ML library I can build while learning how these libraries are implemented: from the math, to cache-aware efficient algorithms, to good ML practices.
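To give a flavor of what "cache-aware" can mean for matrix multiplication, here is a minimal sketch of loop tiling (blocking). It is not taken from STEEL's source: the function name `matmul_tiled`, the row-major layout, and the default tile size of 64 are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not STEEL's API.
// Computes C += A * B for row-major float matrices:
// A is m x k, B is k x n, C is m x n.
// `tile` is an illustrative block size; a good value depends on the cache.
void matmul_tiled(const std::vector<float>& A,
                  const std::vector<float>& B,
                  std::vector<float>& C,
                  std::size_t m, std::size_t k, std::size_t n,
                  std::size_t tile = 64) {
    assert(tile > 0);
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);

    // Walk the matrices one block at a time so the working set of the
    // three inner loops stays small enough to be reused from cache.
    for (std::size_t i0 = 0; i0 < m; i0 += tile) {
        for (std::size_t p0 = 0; p0 < k; p0 += tile) {
            for (std::size_t j0 = 0; j0 < n; j0 += tile) {
                const std::size_t i_end = std::min(i0 + tile, m);
                const std::size_t p_end = std::min(p0 + tile, k);
                const std::size_t j_end = std::min(j0 + tile, n);

                for (std::size_t i = i0; i < i_end; ++i) {
                    for (std::size_t p = p0; p < p_end; ++p) {
                        const float a = A[i * k + p];
                        // Innermost loop reads B and writes C contiguously.
                        for (std::size_t j = j0; j < j_end; ++j) {
                            C[i * n + j] += a * B[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The i-p-j ordering of the inner loops keeps the innermost accesses to `B` and `C` contiguous in memory, which also tends to help compilers auto-vectorize. The result accumulates into `C`, so zero it first if a plain product is wanted.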

Build

cmake -S . -B build
cmake --build build -j$(nproc)

Binaries

  • build/matrix_bench — low-level matrix multiplication benchmark
  • build/qwen_infer — interactive Qwen2 inference
  • build/steel_bench — inference benchmark (prefill / decode / end-to-end)
  • build/steel_tests — unit tests

Benchmark

./build/steel_bench --model qwen2.5-0.5b-instruct-fp16.gguf --threads 8

Options:

| Flag | Default | Description |
| --- | --- | --- |
| --model | qwen2.5-0.5b-instruct-fp16.gguf | Path to GGUF model |
| --threads | auto | CPU threads |
| --decode-tokens | 64 | Tokens to generate per decode test |
| --warmup | 1 | Warmup iterations |
| --iters | 3 | Benchmark iterations (use 5 to match llama-bench) |

Reports mean ± stddev and best tok/s for prefill, decode, and end-to-end generation.
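As a rough illustration of how such a summary could be derived from per-iteration timings, here is a minimal sketch. It is not steel_bench's actual code: the `ThroughputStats` struct, the `summarize` function, and the choice of the sample (n - 1) standard deviation are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary type; not taken from STEEL's source.
struct ThroughputStats {
    double mean;    // mean tok/s over the timed iterations
    double stddev;  // sample standard deviation of tok/s
    double best;    // highest tok/s seen in any iteration
};

// tokens:  number of tokens processed in each iteration
// seconds: wall-clock duration of each timed iteration (warmup excluded)
ThroughputStats summarize(int tokens, const std::vector<double>& seconds) {
    assert(!seconds.empty());

    // Convert each timing into a throughput in tokens per second.
    std::vector<double> rates;
    rates.reserve(seconds.size());
    for (double s : seconds) {
        rates.push_back(static_cast<double>(tokens) / s);
    }

    double sum = 0.0;
    for (double r : rates) sum += r;
    const double mean = sum / static_cast<double>(rates.size());

    double sq_diff = 0.0;
    for (double r : rates) sq_diff += (r - mean) * (r - mean);

    // Assumption: sample standard deviation (divide by n - 1).
    const double stddev =
        rates.size() > 1
            ? std::sqrt(sq_diff / static_cast<double>(rates.size() - 1))
            : 0.0;

    const double best = *std::max_element(rates.begin(), rates.end());
    return {mean, stddev, best};
}
```

For example, `summarize(64, {2.0, 1.6, 1.28})` would summarize three made-up decode runs of 64 tokens each. With only a few iterations the standard deviation is a coarse estimate, which is one reason to raise --iters when comparing against other tools.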
