modular gradient SDK for building general intelligence
composable rust crates for building brains. you pick the architecture — we provide the primitives: continuous thought machines with full BPTT, graph composition, multimodal codecs, bio-inspired learning, GPU dispatch, and a live 3D debugger.
# your Cargo.toml
[dependencies]
modgrad-ctm = { git = "https://github.com/rotkonetworks/modgrad" }
modgrad-compute = { git = "https://github.com/rotkonetworks/modgrad" }
modgrad-training = { git = "https://github.com/rotkonetworks/modgrad" }

use modgrad_ctm::graph::{RegionalConfig, RegionalWeights, RegionalAdamW,
    RegionalGradients, regional_train_token, NeuralComputer};
// pick a preset: four_region, eight_region_small (187k params),
// eight_region (~81M), eight_region_medium, eight_region_large,
// eight_region_billion — all in modgrad_ctm::graph::RegionalConfig.
let cfg = RegionalConfig::eight_region_small(
    /* obs_dim */ 128,
    /* out_dims */ 256,
    /* ticks */ 16,
);
let mut w = RegionalWeights::new(cfg);
let mut opt = RegionalAdamW::new(&w).with_lr(3e-4);
// train on your data
let mut grads = RegionalGradients::zeros(&w);
for (token, target) in your_data {
    let (loss, _pred) = regional_train_token(&w, &mut grads, token, target);
    opt.step(&mut w, &grads);
    grads.zero();
}
// run as a neural computer
let mut nc = NeuralComputer::new(w);
let response = nc.chat("hello", 100, 0.8);

no frameworks. no config files. just rust functions you compose however you want.
maze route prediction — 21×21 mazes, 5000 training steps, 3 seeds, held-out 200-maze eval. modgrad-generated mazes (random DFS, no pre-drawn solution, new maze every batch).
| config | params | first-step acc | per-step acc | correct prefix (of 20) |
|---|---|---|---|---|
| single CTM | 450k | 51.3 ± 2.0 % | 27.4 ± 0.1 % | 1.2 ± 0.1 |
| brain (8 regions) | 187k | 79.2 ± 13.4 % | 38.1 ± 3.0 % | 2.1 ± 0.3 |
| delta | −59 % | +54 % | +39 % | +75 % |
the 8-region brain wins every metric using 2.4× fewer parameters. brain's lowest per-step across three seeds (35.5 %) beats single-CTM's best (27.5 %). non-overlapping ranges.
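
how the delta row is derived: a minimal sketch that recomputes the relative changes from the rounded means in the table. The function names are hypothetical helpers, not part of modgrad. Because the inputs are the rounded table values, the parameter delta comes out near -58 % rather than the -59 % shown, which presumably reflects the exact parameter counts.

```rust
/// Relative change of `new` against `base`, in percent.
fn relative_change_pct(base: f64, new: f64) -> f64 {
    (new - base) / base * 100.0
}

/// How many times smaller `small` is than `large`.
fn times_fewer(large: f64, small: f64) -> f64 {
    large / small
}

/// True when the lowest value of one group still beats the highest of the other,
/// i.e. the two per-seed ranges do not overlap.
fn ranges_do_not_overlap(winner_min: f64, loser_max: f64) -> bool {
    winner_min > loser_max
}
```

With the table values, `relative_change_pct(51.3, 79.2)` rounds to +54, `relative_change_pct(27.4, 38.1)` to +39, `relative_change_pct(1.2, 2.1)` to +75, and `times_fewer(450.0, 187.0)` is about 2.4.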
reproduce:
cargo run -p mazes --release -- --size 21 --steps 5000 --seed 42
cargo run -p mazes --release -- --brain --size 21 --steps 5000 --seed 42

GPU path: resident dispatch over rocBLAS / HIP on AMD (gfx1102, RX 7600M XT) and cudarc on NVIDIA. Resident matvec/AdamW/RoPE keep weights on-device across training steps; measured 55× / 5.6× speedup at 1024×512 vs the host-bounce path.
Qwen2.5-0.5B inference on the resident runtime — loads safetensors, decodes coherent text:
cargo run -p qwen_chat --release --features rocm -- --model models/qwen2.5-0.5b

End-to-end training on the foundation-model stack: lm_validate proves the
loop trains, going from 5.72 to 0.74 cross-entropy in 10 steps on real data.
cargo run -p lm_validate --release --features rocm

BLT (byte-latent transformer, Pagnoni et al. 2024)
scaffolding for byte-ifying Qwen2.5 — local encoder + cross-attention + latent +
local decoder pipeline. Forward path lands in modgrad-blt; resident backward
through cross-attention is the next slice.
The architectural direction — making the cerebellum the LLM (Qwen2.5 → BLT,
~82% of the brain's parameters) — is laid out in
docs/BRAIN_ARCHITECTURE.md. The default 8-region
preset today is the legacy small one used in the maze result above.
the building blocks — use any of them independently:
| crate | what it gives you |
|---|---|
| modgrad-ctm | single CTM (NLM traces, sync, MHA, U-Net synapse, full BPTT) + graph composition (N CTMs in a directed graph, embedding table, AdamW, NeuralComputer) + plural-alter system + Organism orchestrator |
| modgrad-compute | Linear, ops, tensor, GPU batched dispatch, GpuVec resident buffers |
| modgrad-codec | VisualRetina (V1 fixed Gabors → V2/V4 Hebbian-learned cortex), VQ-VAE, AudioCodec, FSQ, byte n-gram hash |
| modgrad-ffn | SwiGLU MLP language prior + FrozenCerebellum trait for learned weighted blending across transformer layers |
| modgrad-data | type-safe multimodal tokenization, mixed-modality streaming, lazy data loading |
| modgrad-device | CPU / CUDA (cudarc) / AMD ROCm (rocBLAS + HIP) backend abstraction; resident kernels (matvec, AdamW, RoPE, RMSNorm) |
| modgrad-transformer | transformer blocks, MHA, RoPE, KV cache, GptModelResident (full residency), Qwen-class loader pipeline |
| modgrad-blt | byte-latent transformer — entropy patcher, local encoder/decoder, patch-aware cross-attention, byteify recipe (Path B: Qwen2.5 → byte-level) |
| modgrad-substrate | foundation-model substrate — Q4_K residency, streaming weight loaders, 7B-class targeting on 8 GB VRAM |
| modgrad-io | telemetry streaming, wincode serialization, safetensors + ONNX + GGUF backends |
| modgrad-training | AdamW, Adam, SGD optimizers + warmup/cosine schedulers + dream replay |
| modgrad-memory | episodic memory with valence, content-addressable retrieval, retrieval priming |
| modgrad-persist | wincode/JSON save/load, quantization (f32/f16/i8) |
| modgrad-traits | core traits (Brain, TokenInput, Encoder, LossFn) |
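
the warmup/cosine schedule listed for modgrad-training can be sketched generically as follows. This is a standalone illustration of the technique, not the crate's API: the function name, the parameters, and the choice to start warmup at `peak_lr / warmup` rather than zero are all assumptions.

```rust
use std::f32::consts::PI;

/// Learning rate at `step`: linear warmup to `peak_lr` over `warmup` steps,
/// then cosine decay to `min_lr` by step `total`.
/// Hypothetical helper; modgrad-training's real scheduler may differ.
fn warmup_cosine_lr(step: usize, warmup: usize, total: usize, peak_lr: f32, min_lr: f32) -> f32 {
    // Linear warmup: step 0 already gets a small non-zero rate.
    if warmup > 0 && step < warmup {
        return peak_lr * (step + 1) as f32 / warmup as f32;
    }
    // Past the end of the schedule (or no decay phase at all): hold the floor.
    if step >= total || total <= warmup {
        return min_lr;
    }
    // Cosine decay: progress runs from 0.0 at the end of warmup to 1.0 at `total`.
    let progress = (step - warmup) as f32 / (total - warmup) as f32;
    min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + (PI * progress).cos())
}
```

A training loop would call this once per step and hand the result to whatever optimizer it uses; with `peak_lr = 3e-4` the rate peaks at the end of warmup and reaches half of that midway through the decay phase.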
optional, toggleable — use them as auxiliary signals or ignore them:
| module | what |
|---|---|
| bio::cerebellar | delta rule forward model + dopamine dynamics |
| bio::pain | relative-loss valence, adaptive learning-rate focus, emotional baseline |
| bio::dream | offline dream replay with retrieval priming and pain-weighted episode selection |
| bio::three_factor | REINFORCE with Titans-style eligibility traces |
| bio::neuromod | dopamine / serotonin / norepinephrine state machine |
| bio::salience | RPE × motor conflict → learning rate gate |
| bio::homeostasis | self-monitoring: sleep pressure, zone detection |
| bio::consolidation | SPSA spindle-ripple offline weight optimization |
| plural | multiple alters sharing one brain: independent episodic memory per alter, neuromod baselines, pain-triggered switching |
| organism | integrated training orchestrator: composes pain + memory + homeostasis + neuromod + plural |
| memory::hippocampus | content-addressable episodic memory (cosine retrieval) |
| memory::replay | prioritized experience buffer (surprise-gated) |
| memory::sleep | offline least-squares weight consolidation |
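
content-addressable retrieval by cosine similarity, as named for memory::hippocampus, can be illustrated with a small standalone sketch. `EpisodicStore` and its methods are hypothetical names for illustration only; the real module's types and signatures are not shown in this README.

```rust
/// Cosine similarity; returns 0.0 when lengths differ or either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical episodic store: keys are embedding vectors, values are payloads.
struct EpisodicStore {
    keys: Vec<Vec<f32>>,
    values: Vec<String>,
}

impl EpisodicStore {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    /// Append one episode.
    fn store(&mut self, key: Vec<f32>, value: &str) {
        self.keys.push(key);
        self.values.push(value.to_string());
    }

    /// Return the stored value whose key is most similar to `query`,
    /// together with its similarity score. `None` if the store is empty.
    fn retrieve(&self, query: &[f32]) -> Option<(&str, f32)> {
        self.keys
            .iter()
            .zip(&self.values)
            .map(|(k, v)| (v.as_str(), cosine(k, query)))
            .max_by(|a, b| a.1.total_cmp(&b.1))
    }
}
```

This is a linear scan, which is fine for a sketch; whether the actual module indexes, caps, or weights episodes (for example by valence) is not specified here.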
unified vocabulary — one model, all modalities:
0..255 bytes (text)
256..263 delimiters (<img> </img> <aud> </aud> <vid> </vid>)
264..4359 image VQ codes (4096)
4360..8455 audio VQ codes (4096)
8456..8855 timestamps (0.5s resolution)
8856..9133 action tokens (mouse, keyboard, coordinates)
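
the layout above maps directly onto a range match. A minimal sketch, assuming the ranges are inclusive on both ends (which matches the stated 4096-code counts) and that timestamps are quantized by flooring to 0.5 s slots; the enum and function names are hypothetical, not modgrad-data's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Modality {
    Text,
    Delimiter,
    Image,
    Audio,
    Timestamp,
    Action,
}

/// Classify a token ID by the unified vocabulary ranges (inclusive bounds assumed).
fn modality_of(token: u32) -> Option<Modality> {
    match token {
        0..=255 => Some(Modality::Text),
        256..=263 => Some(Modality::Delimiter),
        264..=4359 => Some(Modality::Image),
        4360..=8455 => Some(Modality::Audio),
        8456..=8855 => Some(Modality::Timestamp),
        8856..=9133 => Some(Modality::Action),
        _ => None,
    }
}

/// Token ID for an image VQ code in 0..4096.
fn image_token(code: u32) -> Option<u32> {
    (code < 4096).then(|| 264 + code)
}

/// Token ID for a timestamp in seconds, at 0.5 s resolution.
/// The 400 slots cover 0.0 s up to (not including) 200.0 s; flooring is an assumption.
fn timestamp_token(seconds: f32) -> Option<u32> {
    if !seconds.is_finite() || seconds < 0.0 {
        return None;
    }
    let slot = (seconds / 0.5).floor() as u32;
    (slot < 400).then(|| 8456 + slot)
}
```

Under these assumptions the vocabulary has 9134 entries in total, and a text byte is its own token ID.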
isis is our runtime built on the SDK: 8 brain regions, multimodal, neural computer mode. you don't need isis to use modgrad; it's just one composition.
# train
isis train model.bin
isis train model.bin --multimodal --images cifar.bin --audio clips/
# interactive neural computer
isis nc model.bin
isis nc model.bin --audio mic.wav --camera frames/ --debug-port 4747
# generate
isis generate model.bin --prompt "the cat "
# run as a service
isis daemon model.bin --port 4747
isis send "hello world" --addr 127.0.0.1:4747
# show devices
isis devices

The legacy small preset (used by the maze benchmark above):
| region | neurons | memory | role |
|---|---|---|---|
| input | 64 | 4 | perception + motor feedback |
| attention | 64 | 8 | gating, routing |
| output | 64 | 16 | evidence accumulation |
| motor | 64 | 4 | action selection |
| cerebellum | 8 | 4 | forward model |
| basal ganglia | 8 | 8 | value estimation |
| insula | 8 | 4 | interoception |
| hippocampus | 8 | 16 | episodic binding |
The target preset mounts Qwen2.5-0.5B (and later BLT-byte-ified) as a frozen
cerebellum, taking ~82% of the parameter budget — see
docs/BRAIN_ARCHITECTURE.md.
minictm is nanoGPT but for CTMs. a minimal example that uses the SDK directly, no isis:
cargo run -p minictm --release -- --data train.txt --steps 5000
cargo run -p minictm --release -- --data train.txt --steps 5000 --chat

live 3D brain visualizer. connects to any running modgrad model via TCP:
# connect to isis or any NC with --debug-port
modgrad-debugger 127.0.0.1:4747

- 3D neuron particles colored by region, sized by activation
- token stream color-coded by modality (text/image/audio/action)
- NLM trace heatmaps per region
- global sync visualization
- command center: pause/resume/step, inject tokens, inspect state
cargo build --release # CPU only (default)
cargo build --release --features cuda # NVIDIA GPU (via cudarc, fallback dynamic loading)
cargo build --release --features rocm # AMD GPU (rocBLAS + HIP; requires libamdhip64 + libhipblas)
cargo test --release

ROCm is opt-in because modgrad-device/rocm.rs uses hardcoded #[link]
attributes that hard-require the system libraries at link time. CUDA stays in
the default set because cudarc dynamic-loads at runtime.
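
compile-time feature gating of this kind generally looks like the sketch below. It illustrates the Cargo feature mechanism only and is not the layout of modgrad-device: the `backend` module and `compiled_backend` function are hypothetical. The point is that code behind `#[cfg(feature = "rocm")]`, including any `#[link]` attribute inside it, is compiled and linked only when that feature is enabled.

```rust
// Exactly one of these three modules is compiled, depending on enabled features.
// A `#[link(name = "...")]` extern block placed inside the rocm variant would
// only demand the system library when `--features rocm` is passed.
#[cfg(feature = "rocm")]
mod backend {
    pub const NAME: &str = "rocm";
}

#[cfg(all(feature = "cuda", not(feature = "rocm")))]
mod backend {
    pub const NAME: &str = "cuda";
}

#[cfg(not(any(feature = "rocm", feature = "cuda")))]
mod backend {
    pub const NAME: &str = "cpu";
}

/// Name of the backend selected at compile time.
fn compiled_backend() -> &'static str {
    backend::NAME
}
```

Built with no features, `compiled_backend()` returns "cpu"; a runtime-loaded backend does not need this kind of link-time gate because a missing library only surfaces when it is actually opened.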
Requires rust 2024 edition.
- sakana AI CTM (arxiv 2505.05522) — continuous thought machine
- pagnoni et al. (arxiv 2412.09871) — byte-latent transformer (modgrad-blt)
- qwen3-VL (2025) — text timestamps for video
- meta neural computers (2026) — the model as the running computer
- chameleon (meta) — unified discrete token space for multimodal generation
MIT