modgrad

modular gradient SDK for building general intelligence

composable rust crates for building brains. you pick the architecture — we provide the primitives: continuous thought machines with full BPTT, graph composition, multimodal codecs, bio-inspired learning, GPU dispatch, and a live 3D debugger.

build your own brain

# your Cargo.toml
[dependencies]
modgrad-ctm = { git = "https://github.com/rotkonetworks/modgrad" }
modgrad-compute = { git = "https://github.com/rotkonetworks/modgrad" }
modgrad-training = { git = "https://github.com/rotkonetworks/modgrad" }

use modgrad_ctm::graph::{RegionalConfig, RegionalWeights, RegionalAdamW,
    RegionalGradients, regional_train_token, NeuralComputer};

// pick a preset: four_region, eight_region_small (187k params),
// eight_region (~81M), eight_region_medium, eight_region_large,
// eight_region_billion — all in modgrad_ctm::graph::RegionalConfig.
let cfg = RegionalConfig::eight_region_small(
    /* obs_dim  */ 128,
    /* out_dims */ 256,
    /* ticks    */ 16,
);

let mut w = RegionalWeights::new(cfg);
let mut opt = RegionalAdamW::new(&w).with_lr(3e-4);

// train on your data
let mut grads = RegionalGradients::zeros(&w);
for (token, target) in your_data {
    let (loss, _pred) = regional_train_token(&w, &mut grads, token, target);
    opt.step(&mut w, &grads);
    grads.zero();
}

// run as a neural computer
let mut nc = NeuralComputer::new(w);
let response = nc.chat("hello", 100, 0.8);

no frameworks. no config files. just rust functions you compose however you want.

measured results

maze route prediction — 21×21 mazes, 5000 training steps, 3 seeds, held-out 200-maze eval. modgrad-generated mazes (random DFS, no pre-drawn solution, new maze every batch).

config	params	first-step acc	per-step acc	correct prefix (of 20)
single CTM	450k	51.3 ± 2.0 %	27.4 ± 0.1 %	1.2 ± 0.1
brain (8 regions)	187k	79.2 ± 13.4 %	38.1 ± 3.0 %	2.1 ± 0.3
delta	−59 %	+54 %	+39 %	+75 %

the 8-region brain wins every metric with 2.4× fewer parameters. the brain's worst per-step accuracy across the three seeds (35.5 %) still beats the single CTM's best (27.5 %), so the ranges don't overlap.
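Each delta in the table is the relative change of the brain over the single CTM, (brain − single) / single, and the parameter ratio is 450k / 187k ≈ 2.4. A quick check from the rounded means (the function name is just for illustration):

```rust
/// Relative change of `new` versus `baseline`, in percent.
fn rel_change_pct(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}

fn main() {
    // (metric, single-CTM mean, 8-region brain mean), copied from the table above
    let rows = [
        ("first-step acc", 51.3, 79.2),
        ("per-step acc", 27.4, 38.1),
        ("correct prefix", 1.2, 2.1),
    ];
    for (name, single, brain) in rows {
        println!("{name}: {:+.0} %", rel_change_pct(single, brain));
    }
    // parameter ratio, single CTM / brain
    println!("params: {:.1}x fewer", 450.0 / 187.0);
}
```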

reproduce:

cargo run -p mazes --release -- --size 21 --steps 5000 --seed 42
cargo run -p mazes --release -- --brain --size 21 --steps 5000 --seed 42

GPU path — resident dispatch over rocBLAS / HIP on AMD (gfx1102, RX 7600M XT) and cudarc on NVIDIA. Resident matvec/AdamW/RoPE keep weights on-device across training steps; measured 55× / 5.6× speedup at 1024×512 vs the host-bounce path.
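To see why keeping weights on-device matters, count the bytes that cross the host/device bus for one f32 matvec y = W·x at 1024×512. This sketch assumes the host-bounce path re-uploads W on every call and that W is 1024 rows × 512 columns; it counts bytes moved, not time, so it does not predict the measured 55× / 5.6× figures.

```rust
const F32_BYTES: usize = std::mem::size_of::<f32>();

/// Host-bounce (assumed): upload W and x, download y on every call.
fn host_bounce_bytes(rows: usize, cols: usize) -> usize {
    (rows * cols + cols + rows) * F32_BYTES
}

/// Resident: W already lives in device memory; only x goes up and y comes down.
fn resident_bytes(rows: usize, cols: usize) -> usize {
    (cols + rows) * F32_BYTES
}

fn main() {
    let (rows, cols) = (1024, 512);
    let bounce = host_bounce_bytes(rows, cols);
    let resident = resident_bytes(rows, cols);
    println!(
        "host-bounce: {bounce} B, resident: {resident} B, ratio {:.0}x",
        bounce as f64 / resident as f64
    );
}
```

The weight upload (2 MiB here) dominates the host-bounce traffic, and it is exactly the term residency removes.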

running today

Qwen2.5-0.5B inference on the resident runtime — loads safetensors, decodes coherent text:

cargo run -p qwen_chat --release --features rocm -- --model models/qwen2.5-0.5b

End-to-end training on the foundation-model stack: lm_validate checks that the training loop actually learns, taking cross-entropy from 5.72 to 0.74 in 10 steps on real data.

cargo run -p lm_validate --release --features rocm

BLT (byte-latent transformer, Pagnoni et al. 2024) scaffolding for byte-ifying Qwen2.5 — local encoder + cross-attention + latent + local decoder pipeline. Forward path lands in modgrad-blt; resident backward through cross-attention is the next slice.
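BLT groups bytes into variable-length patches, opening a new patch where a small byte-level LM is uncertain about the next byte. A minimal sketch of the global-threshold rule from Pagnoni et al., assuming per-byte entropies H(x_t) are already computed; the function names are hypothetical, and modgrad-blt's entropy patcher may use a different boundary rule (the paper also describes a variant that triggers on entropy increases between consecutive bytes).

```rust
/// Shannon entropy (in nats) of a next-byte probability distribution.
fn entropy(probs: &[f32]) -> f32 {
    probs
        .iter()
        .filter(|&&p| p > 0.0)
        .map(|&p| -p * p.ln())
        .sum()
}

/// Global-threshold entropy patching: byte t opens a new patch when the
/// byte LM's next-byte entropy H(x_t) exceeds `theta`.
/// `entropies[t]` must already hold H(x_t) for each byte position.
/// Returns the start index of every patch (byte 0 always starts one).
fn patch_starts(entropies: &[f32], theta: f32) -> Vec<usize> {
    entropies
        .iter()
        .enumerate()
        .filter(|&(t, &h)| t == 0 || h > theta)
        .map(|(t, _)| t)
        .collect()
}

/// Convert patch start indices into patch lengths for a sequence of `n` bytes.
fn patch_lengths(starts: &[usize], n: usize) -> Vec<usize> {
    starts
        .iter()
        .enumerate()
        .map(|(i, &s)| starts.get(i + 1).copied().unwrap_or(n) - s)
        .collect()
}

fn main() {
    // toy per-byte entropies: spikes at positions 3 and 5
    let h = [2.0, 0.5, 0.3, 3.1, 0.2, 2.9, 0.1];
    let starts = patch_starts(&h, 2.5);
    println!("starts {:?}, lengths {:?}", starts, patch_lengths(&starts, h.len()));
}
```

Predictable stretches collapse into long patches, so the latent transformer spends fewer steps on them.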

The architectural direction — making the cerebellum the LLM (Qwen2.5 → BLT, ~82% of the brain's parameters) — is laid out in docs/BRAIN_ARCHITECTURE.md. The default 8-region preset today is the legacy small one used in the maze result above.

SDK crates

the building blocks — use any of them independently:

crate	what it gives you
modgrad-ctm	single CTM (NLM traces, sync, MHA, U-Net synapse, full BPTT) + graph composition (N CTMs in a directed graph, embedding table, AdamW, NeuralComputer) + plural-alter system + Organism orchestrator
modgrad-compute	`Linear`, ops, tensor, GPU batched dispatch, `GpuVec` resident buffers
modgrad-codec	`VisualRetina` (V1 fixed Gabors → V2/V4 Hebbian-learned cortex), VQ-VAE, AudioCodec, FSQ, byte n-gram hash
modgrad-ffn	SwiGLU MLP language prior + `FrozenCerebellum` trait for learned weighted blending across transformer layers
modgrad-data	type-safe multimodal tokenization, mixed-modality streaming, lazy data loading
modgrad-device	CPU / CUDA (cudarc) / AMD ROCm (rocBLAS + HIP) backend abstraction; resident kernels (matvec, AdamW, RoPE, RMSNorm)
modgrad-transformer	transformer blocks, MHA, RoPE, KV cache, `GptModelResident` (full residency), Qwen-class loader pipeline
modgrad-blt	byte-latent transformer — entropy patcher, local encoder/decoder, patch-aware cross-attention, byteify recipe (Path B: Qwen2.5 → byte-level)
modgrad-substrate	foundation-model substrate: Q4_K residency, streaming weight loaders, targeting 7B-class models on 8 GB of VRAM
modgrad-io	telemetry streaming, wincode serialization, safetensors + ONNX + GGUF backends
modgrad-training	AdamW, Adam, SGD optimizers + warmup/cosine schedulers + dream replay
modgrad-memory	episodic memory with valence, content-addressable retrieval, retrieval priming
modgrad-persist	wincode/JSON save/load, quantization (f32/f16/i8)
modgrad-traits	core traits (`Brain`, `TokenInput`, `Encoder`, `LossFn`)
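modgrad-training lists warmup/cosine schedulers. A standalone sketch of the common linear-warmup then cosine-decay rule; the function name and parameters are hypothetical and this is not modgrad-training's API:

```rust
use std::f64::consts::PI;

/// Learning rate at `step`: linear warmup to `peak` over `warmup` steps,
/// then cosine decay from `peak` down to `min_lr` at step `total`.
fn warmup_cosine_lr(step: usize, warmup: usize, total: usize, peak: f64, min_lr: f64) -> f64 {
    if step < warmup {
        // linear ramp; reaches `peak` on the last warmup step
        return peak * (step + 1) as f64 / warmup as f64;
    }
    if step >= total {
        return min_lr;
    }
    // progress through the decay phase, from 0.0 to just under 1.0
    let progress = (step - warmup) as f64 / (total - warmup) as f64;
    min_lr + 0.5 * (peak - min_lr) * (1.0 + (PI * progress).cos())
}

fn main() {
    for step in [0, 50, 99, 100, 600, 1099, 1100] {
        println!("step {step:>4}: lr = {:.2e}", warmup_cosine_lr(step, 100, 1100, 3e-4, 0.0));
    }
}
```

With `peak = 3e-4` this matches the `with_lr(3e-4)` used in the quick-start example; halfway through the decay the rate sits at the midpoint between `peak` and `min_lr`.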

bio-inspired modules (in modgrad-ctm)

optional, toggleable — use them as auxiliary signals or ignore them:

module	what
`bio::cerebellar`	delta rule forward model + dopamine dynamics
`bio::pain`	relative-loss valence, adaptive learning-rate focus, emotional baseline
`bio::dream`	offline dream replay with retrieval priming and pain-weighted episode selection
`bio::three_factor`	REINFORCE with Titans-style eligibility traces
`bio::neuromod`	dopamine / serotonin / norepinephrine state machine
`bio::salience`	RPE × motor conflict → learning rate gate
`bio::homeostasis`	self-monitoring: sleep pressure, zone detection
`bio::consolidation`	SPSA spindle-ripple offline weight optimization
`plural`	multiple alters sharing one brain: independent episodic memory per alter, neuromod baselines, pain-triggered switching
`organism`	integrated training orchestrator: composes pain + memory + homeostasis + neuromod + plural
`memory::hippocampus`	content-addressable episodic memory (cosine retrieval)
`memory::replay`	prioritized experience buffer (surprise-gated)
`memory::sleep`	offline least-squares weight consolidation
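`memory::hippocampus` is described as content-addressable episodic memory with cosine retrieval. A minimal sketch of that idea, using a hypothetical `EpisodicMemory` type (not the crate's API) that stores key vectors with payloads and returns the top-k payloads by cosine similarity to a query:

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical episodic store: each episode is a key vector plus a payload.
struct EpisodicMemory<T> {
    keys: Vec<Vec<f32>>,
    values: Vec<T>,
}

impl<T> EpisodicMemory<T> {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    fn store(&mut self, key: Vec<f32>, value: T) {
        self.keys.push(key);
        self.values.push(value);
    }

    /// The `k` payloads whose keys are most cosine-similar to `query`, best first.
    fn retrieve(&self, query: &[f32], k: usize) -> Vec<(&T, f32)> {
        let mut scored: Vec<(usize, f32)> = self
            .keys
            .iter()
            .enumerate()
            .map(|(i, key)| (i, cosine(query, key)))
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending similarity
        scored.into_iter().take(k).map(|(i, s)| (&self.values[i], s)).collect()
    }
}

fn main() {
    let mut mem = EpisodicMemory::new();
    mem.store(vec![1.0, 0.0], "episode a");
    mem.store(vec![0.0, 1.0], "episode b");
    mem.store(vec![0.7, 0.7], "episode c");
    for (value, score) in mem.retrieve(&[1.0, 0.1], 2) {
        println!("{value}: {score:.3}");
    }
}
```

Cosine ignores vector magnitude, so a partial or weaker cue still retrieves the episode it points toward.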

multimodal token space

unified vocabulary, one model for all modalities (ranges are inclusive, 9134 token ids in total):

  0..255        bytes (text)
  256..263      delimiters (<img> </img> <aud> </aud> <vid> </vid>)
  264..4359     image VQ codes (4096)
  4360..8455    audio VQ codes (4096)
  8456..8855    timestamps (0.5s resolution)
  8856..9133    action tokens (mouse, keyboard, coordinates)
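The layout is a set of fixed offsets, so encoding is an addition and decoding is a range match. A sketch assuming the inclusive ranges above, with timestamps mapped as slot = floor(seconds / 0.5); the `Token` enum and helper names are hypothetical, and delimiter and action ids are kept as raw indices because their internal order isn't specified here.

```rust
const DELIM_BASE: u32 = 256; // 256..=263
const IMAGE_BASE: u32 = 264; // 264..=4359, 4096 codes
const AUDIO_BASE: u32 = 4360; // 4360..=8455, 4096 codes
const TIME_BASE: u32 = 8456; // 8456..=8855, 400 slots of 0.5 s
const ACTION_BASE: u32 = 8856; // 8856..=9133
const VOCAB_SIZE: u32 = 9134;

#[derive(Debug, PartialEq)]
enum Token {
    Byte(u8),
    Delim(u32),
    Image(u32),
    Audio(u32),
    Time(u32), // slot index (assumed: seconds = slot * 0.5)
    Action(u32),
}

/// Map a unified token id back to its modality and local index.
fn decode(id: u32) -> Option<Token> {
    match id {
        0..=255 => Some(Token::Byte(id as u8)),
        256..=263 => Some(Token::Delim(id - DELIM_BASE)),
        264..=4359 => Some(Token::Image(id - IMAGE_BASE)),
        4360..=8455 => Some(Token::Audio(id - AUDIO_BASE)),
        8456..=8855 => Some(Token::Time(id - TIME_BASE)),
        8856..=9133 => Some(Token::Action(id - ACTION_BASE)),
        _ => None, // id >= VOCAB_SIZE
    }
}

fn image_token(code: u32) -> u32 {
    assert!(code < 4096, "image VQ code out of range");
    IMAGE_BASE + code
}

fn audio_token(code: u32) -> u32 {
    assert!(code < 4096, "audio VQ code out of range");
    AUDIO_BASE + code
}

fn timestamp_token(seconds: f32) -> u32 {
    let slot = (seconds / 0.5).floor() as u32;
    assert!(slot < 400, "timestamp out of range");
    TIME_BASE + slot
}

fn main() {
    let ids = [b'h' as u32, image_token(42), audio_token(7), timestamp_token(1.0), VOCAB_SIZE];
    for id in ids {
        println!("{id:>5} -> {:?}", decode(id));
    }
}
```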

isis

our runtime built on the SDK. 8 brain regions, multimodal, neural computer mode. you don't need isis to use modgrad — it's just one composition.

# train
isis train model.bin
isis train model.bin --multimodal --images cifar.bin --audio clips/

# interactive neural computer
isis nc model.bin
isis nc model.bin --audio mic.wav --camera frames/ --debug-port 4747

# generate
isis generate model.bin --prompt "the cat "

# run as a service
isis daemon model.bin --port 4747
isis send "hello world" --addr 127.0.0.1:4747

# show devices
isis devices

isis brain regions

The legacy small preset (used by the maze benchmark above):

region	neurons	memory	role
input	64	4	perception + motor feedback
attention	64	8	gating, routing
output	64	16	evidence accumulation
motor	64	4	action selection
cerebellum	8	4	forward model
basal ganglia	8	8	value estimation
insula	8	4	interoception
hippocampus	8	16	episodic binding
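The preset table is plain data. This hypothetical `Region` struct (not modgrad's `RegionalConfig`) just transcribes it and sums the neuron counts; the memory column is copied as listed, not interpreted.

```rust
/// One region of the legacy small preset, transcribed from the table above.
struct Region {
    name: &'static str,
    neurons: usize,
    memory: usize,
}

const LEGACY_SMALL: [Region; 8] = [
    Region { name: "input", neurons: 64, memory: 4 },
    Region { name: "attention", neurons: 64, memory: 8 },
    Region { name: "output", neurons: 64, memory: 16 },
    Region { name: "motor", neurons: 64, memory: 4 },
    Region { name: "cerebellum", neurons: 8, memory: 4 },
    Region { name: "basal ganglia", neurons: 8, memory: 8 },
    Region { name: "insula", neurons: 8, memory: 4 },
    Region { name: "hippocampus", neurons: 8, memory: 16 },
];

fn total_neurons(regions: &[Region]) -> usize {
    regions.iter().map(|r| r.neurons).sum()
}

/// Names of regions whose memory column equals `m`.
fn regions_with_memory(regions: &[Region], m: usize) -> Vec<&'static str> {
    regions.iter().filter(|r| r.memory == m).map(|r| r.name).collect()
}

fn main() {
    // four 64-neuron regions + four 8-neuron regions
    println!("total neurons: {}", total_neurons(&LEGACY_SMALL));
    println!("memory 16: {:?}", regions_with_memory(&LEGACY_SMALL, 16));
}
```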

The target preset mounts Qwen2.5-0.5B (and later BLT-byte-ified) as a frozen cerebellum, taking ~82% of the parameter budget — see docs/BRAIN_ARCHITECTURE.md.

minictm

nanoGPT but for CTMs. minimal example — uses the SDK directly, no isis:

cargo run -p minictm --release -- --data train.txt --steps 5000
cargo run -p minictm --release -- --data train.txt --steps 5000 --chat

debugger

live 3D brain visualizer. connects to any running modgrad model via TCP:

# connect to isis or any NC with --debug-port
modgrad-debugger 127.0.0.1:4747

3D neuron particles colored by region, sized by activation
token stream color-coded by modality (text/image/audio/action)
NLM trace heatmaps per region
global sync visualization
command center: pause/resume/step, inject tokens, inspect state

building

cargo build --release                      # CPU only (default)
cargo build --release --features cuda      # NVIDIA GPU (via cudarc, fallback dynamic loading)
cargo build --release --features rocm      # AMD GPU (rocBLAS + HIP; requires libamdhip64 + libhipblas)
cargo test  --release

ROCm is opt-in because modgrad-device/rocm.rs uses hardcoded #[link] attributes, so the system libraries must be present at link time. CUDA is safe to keep in the default feature set because cudarc loads its libraries dynamically at runtime.
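A minimal sketch of why a `#[link]`-based backend has to be feature-gated: the FFI block below only exists when a `rocm` feature is enabled, so a default build never asks the linker for libamdhip64. This is an illustration, not the code in modgrad-device/rocm.rs; `hipGetDeviceCount` is a real HIP runtime function, while `device_count` and `backend_name` are hypothetical helpers.

```rust
// Compiled only with the `rocm` feature. The #[link] attribute makes the
// linker require libamdhip64 at build time, which is why the feature is opt-in.
#[cfg(feature = "rocm")]
#[link(name = "amdhip64")]
unsafe extern "C" {
    // HIP runtime: hipError_t hipGetDeviceCount(int* count); 0 == hipSuccess
    fn hipGetDeviceCount(count: *mut i32) -> i32;
}

#[cfg(feature = "rocm")]
fn device_count() -> i32 {
    let mut n = 0;
    // SAFETY: hipGetDeviceCount writes one i32 through the pointer.
    let err = unsafe { hipGetDeviceCount(&mut n) };
    if err == 0 { n } else { 0 }
}

#[cfg(not(feature = "rocm"))]
fn device_count() -> i32 {
    0 // no ROCm support compiled in
}

fn backend_name() -> &'static str {
    if cfg!(feature = "rocm") { "rocm" } else { "cpu" }
}

fn main() {
    println!("backend: {}, ROCm devices: {}", backend_name(), device_count());
}
```

A dynamically loaded backend, as cudarc provides for CUDA, names no library at link time and instead fails at runtime if the driver is missing, which is what lets it stay enabled without breaking builds.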

Requires the Rust 2024 edition (rustc 1.85 or newer).

references

sakana AI CTM (arxiv 2505.05522) — continuous thought machine
pagnoni et al. (arxiv 2412.09871) — byte-latent transformer (modgrad-blt)
qwen3-VL (2025) — text timestamps for video
meta neural computers (2026) — the model as the running computer
chameleon (meta) — unified discrete token space for multimodal generation

license

MIT
