Skip to content

Releases: Quitetall/blut

blut v0.10.0 — never-OOM-the-box orchestrator

12 Jun 10:58

Choose a tag to compare

First tagged release of the blut engine. The domain-agnostic orchestrator has graduated from the 0.1 prototype to a never-OOM-the-box, parallel DAG orchestrator with a distributed launch seam, run lineage, and declarative scheduling.

blut is the engine crate (Stage → Plan → Recipe → Registry + CLI/TUI + resource broker). The user-facing binaries live in the cookbook crates — blut-lamquant (binary blut, neural EEG codec) and blut-lamu (binary blut-lamu, generic LLM) — which depend on this crate.

Highlights

  • Never-OOM-the-box resource broker (ADR 0046). Every memory-spending stage runs inside a cgroup-capped systemd unit, sized from a scaling footprint model and gated by a live admission probe. A too-small estimate is caught by the cgroup as a clean unit kill — never the OS OOM killer taking the host.
  • Parallel DAG execution. The ParallelExecutor runs independent stages concurrently (the long-missing piece), with capacity-aware memory admission so concurrency never overcommits the box.

Added

  • Resource broker — system probe, conservative-high scaling RAM estimate (workers × prefetch + tier + latent + batch + in_ch), fail-closed admission gate that refuses or right-sizes against live free RAM.
  • Calibration store + self-healing OOM retry — measured cgroup peaks are MAX-merged per config key so the broker stops over-refusing; an OOM records an OomCorrected lower bound that the next admission escalates (box-fit clamped so an escalation can never push the cgroup past the host).
  • Capacity-aware parallel admission — each stage reserves its footprint before being scheduled concurrently.
  • in_ch footprint term — a 168-channel fullband run is no longer billed like a 21-channel L3 run (the under-bill behind admit-then-cgroup-kill on every fullband launch).
  • Distributed launch seamblut … --launcher local|slurm|ray; a backend-agnostic WrappedCommand + LaunchTarget route a stage to a cluster, while Local keeps the cgroup-contained this-box path byte-identical.
  • LineageDB — an embedded, rebuildable SQLite index of every run with git + hardware provenance; blut lineage trace <hash> / reindex.
  • Declarative recipe schedules via blut-managed systemd timers.
  • register_recipe! macro + schema helpers for typed cookbook recipes.
  • Runtime DAG mutationKILL-on-NaN branch pruning (a no-op policy path is byte-identical to static execution).
  • Watchdog + heartbeat, per-stage retry policy and soft/hard timeouts.
  • CLI--json flags, blut runs diff, cache-stats + artifact inspection, and a build-commit stamp (blut --version0.10.0+<commit>[-dirty]) with a stale-binary warning at startup (the "git pull, forgot cargo install --force" trap).

Fixed

  • Single cross-crate source of truth for footprint drivers (fixed a calibration key-parity bug where RESOLVE and RECORD keyed differently).
  • The warm fullband-cache precompute stage (cookbook) was the last uncontained stage and could OOM the box; it now runs contained, fork-pool-sized, box-fit clamped, with a non-fatal OOM path.
  • Footprint right-sized to the measured envelope (tier-3 ~35G → ~23G); many review-driven hardening fixes across broker / executor / launcher / lineage.

Known limitations

  • No mid-epoch durable resume yet (BLUT-API Phase D) — stages retry on OOM, but a killed run restarts from scratch. Does not block normal runs.
  • Warm fork-pool sizing is clamped against total box RAM, not live-free RAM — on a heavily loaded box the warm default may not complete (the run falls back to a slower decode-bound path and warns); tune the worker count down per run where needed.
  • The LamQuant python payload still lives under this crate's python/ transitionally; it moves to blut-lamquant in a later migration.

Full changelog: CHANGELOG.md