modgrad

modular gradient SDK for building general intelligence

composable rust crates for building brains. you pick the architecture — we provide the primitives: continuous thought machines with full BPTT, graph composition, multimodal codecs, bio-inspired learning, GPU dispatch, and a live 3D debugger.

build your own brain

# your Cargo.toml
[dependencies]
modgrad-ctm = { git = "https://github.com/rotkonetworks/modgrad" }
modgrad-compute = { git = "https://github.com/rotkonetworks/modgrad" }
modgrad-training = { git = "https://github.com/rotkonetworks/modgrad" }

use modgrad_ctm::graph::{RegionalConfig, RegionalWeights, RegionalAdamW,
    RegionalGradients, regional_train_token, NeuralComputer};

// pick a preset: four_region, eight_region_small (187k params),
// eight_region (~81M), eight_region_medium, eight_region_large,
// eight_region_billion — all in modgrad_ctm::graph::RegionalConfig.
let cfg = RegionalConfig::eight_region_small(
    /* obs_dim  */ 128,
    /* out_dims */ 256,
    /* ticks    */ 16,
);

let mut w = RegionalWeights::new(cfg);
let mut opt = RegionalAdamW::new(&w).with_lr(3e-4);

// train on your data
let mut grads = RegionalGradients::zeros(&w);
for (token, target) in your_data {
    let (loss, _pred) = regional_train_token(&w, &mut grads, token, target);
    opt.step(&mut w, &grads);
    grads.zero();
}

// run as a neural computer
let mut nc = NeuralComputer::new(w);
let response = nc.chat("hello", 100, 0.8);

no frameworks. no config files. just rust functions you compose however you want.

measured results

maze route prediction — 21×21 mazes, 5000 training steps, 3 seeds, held-out 200-maze eval. modgrad-generated mazes (random DFS, no pre-drawn solution, new maze every batch).

| config | params | first-step acc | per-step acc | correct prefix (of 20) |
|---|---|---|---|---|
| single CTM | 450k | 51.3 ± 2.0 % | 27.4 ± 0.1 % | 1.2 ± 0.1 |
| brain (8 regions) | 187k | 79.2 ± 13.4 % | 38.1 ± 3.0 % | 2.1 ± 0.3 |
| delta | −59 % | +54 % | +39 % | +75 % |

the 8-region brain wins on every metric with 2.4× fewer parameters. the brain's lowest per-step accuracy across three seeds (35.5 %) beats single-CTM's best (27.5 %), so the per-seed ranges do not overlap.
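the delta row is plain relative change against the single-CTM baseline. a minimal sketch that recomputes it from the rounded means printed in the table; `rel_delta_pct` and `delta_row` are hypothetical helper names, not part of modgrad. with the rounded 450k / 187k figures the parameter delta comes out near −58.4 %, so the table's −59 % presumably reflects the unrounded parameter counts.

```rust
/// Relative change of `new` against `base`, in percent.
fn rel_delta_pct(base: f64, new: f64) -> f64 {
    (new - base) / base * 100.0
}

/// Recompute the delta row from the rounded means in the table.
/// Each input entry is (metric, single CTM, 8-region brain).
fn delta_row() -> Vec<(&'static str, f64)> {
    let rows = [
        ("params (k)", 450.0, 187.0),
        ("first-step acc (%)", 51.3, 79.2),
        ("per-step acc (%)", 27.4, 38.1),
        ("correct prefix", 1.2, 2.1),
    ];
    rows.iter()
        .map(|&(name, single, brain)| (name, rel_delta_pct(single, brain)))
        .collect()
}
```

calling `delta_row()` and formatting each value with `{:+.0}` gives the signed percentages; `450.0 / 187.0` is the 2.4× parameter ratio quoted above.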

reproduce:

cargo run -p mazes --release -- --size 21 --steps 5000 --seed 42
cargo run -p mazes --release -- --brain --size 21 --steps 5000 --seed 42

GPU path — resident dispatch over rocBLAS / HIP on AMD (gfx1102, RX 7600M XT) and cudarc on NVIDIA. Resident matvec/AdamW/RoPE keep weights on-device across training steps; measured 55× / 5.6× speedup at 1024×512 vs the host-bounce path.

running today

Qwen2.5-0.5B inference on the resident runtime — loads safetensors, decodes coherent text:

cargo run -p qwen_chat --release --features rocm -- --model models/qwen2.5-0.5b

End-to-end training on the foundation-model stack — lm_validate proves the loop trains: 5.72 → 0.74 cross-entropy in 10 steps on real data.

cargo run -p lm_validate --release --features rocm

BLT (byte-latent transformer, Pagnoni et al. 2024) scaffolding for byte-ifying Qwen2.5 — local encoder + cross-attention + latent + local decoder pipeline. Forward path lands in modgrad-blt; resident backward through cross-attention is the next slice.

The architectural direction — making the cerebellum the LLM (Qwen2.5 → BLT, ~82% of the brain's parameters) — is laid out in docs/BRAIN_ARCHITECTURE.md. The default 8-region preset today is the legacy small one used in the maze result above.

SDK crates

the building blocks — use any of them independently:

| crate | what it gives you |
|---|---|
| modgrad-ctm | single CTM (NLM traces, sync, MHA, U-Net synapse, full BPTT) + graph composition (N CTMs in a directed graph, embedding table, AdamW, NeuralComputer) + plural-alter system + Organism orchestrator |
| modgrad-compute | Linear, ops, tensor, GPU batched dispatch, GpuVec resident buffers |
| modgrad-codec | VisualRetina (V1 fixed Gabors → V2/V4 Hebbian-learned cortex), VQ-VAE, AudioCodec, FSQ, byte n-gram hash |
| modgrad-ffn | SwiGLU MLP language prior + FrozenCerebellum trait for learned weighted blending across transformer layers |
| modgrad-data | type-safe multimodal tokenization, mixed-modality streaming, lazy data loading |
| modgrad-device | CPU / CUDA (cudarc) / AMD ROCm (rocBLAS + HIP) backend abstraction; resident kernels (matvec, AdamW, RoPE, RMSNorm) |
| modgrad-transformer | transformer blocks, MHA, RoPE, KV cache, GptModelResident (full residency), Qwen-class loader pipeline |
| modgrad-blt | byte-latent transformer: entropy patcher, local encoder/decoder, patch-aware cross-attention, byteify recipe (Path B: Qwen2.5 → byte-level) |
| modgrad-substrate | foundation-model substrate: Q4_K residency, streaming weight loaders, 7B-class targeting on 8 GB VRAM |
| modgrad-io | telemetry streaming, wincode serialization, safetensors + ONNX + GGUF backends |
| modgrad-training | AdamW, Adam, SGD optimizers + warmup/cosine schedulers + dream replay |
| modgrad-memory | episodic memory with valence, content-addressable retrieval, retrieval priming |
| modgrad-persist | wincode/JSON save/load, quantization (f32/f16/i8) |
| modgrad-traits | core traits (Brain, TokenInput, Encoder, LossFn) |
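for readers who have not met warmup/cosine schedules, here is a generic sketch of the usual formula: linear warmup to a peak learning rate, then cosine decay to a floor. this is the textbook schedule, not modgrad-training's actual API; the function name `warmup_cosine_lr` and its parameters are assumptions made for illustration.

```rust
use std::f32::consts::PI;

/// Linear warmup to `peak_lr` over `warmup` steps, then cosine decay
/// to `min_lr` at step `total`. Hypothetical helper, not a modgrad function.
fn warmup_cosine_lr(step: usize, warmup: usize, total: usize, peak_lr: f32, min_lr: f32) -> f32 {
    if warmup > 0 && step < warmup {
        // ramp from peak_lr / warmup up to peak_lr
        return peak_lr * (step + 1) as f32 / warmup as f32;
    }
    if step >= total {
        return min_lr;
    }
    // fraction of the decay phase completed, in [0, 1)
    let progress = (step - warmup) as f32 / (total - warmup).max(1) as f32;
    min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + (PI * progress).cos())
}
```

with `peak_lr = 3e-4` (the value used in the quick-start above), 100 warmup steps and 1000 total steps, the rate climbs linearly for the first 100 steps, sits at the peak at step 100, and reaches `min_lr` at step 1000.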

bio-inspired modules (in modgrad-ctm)

optional, toggleable — use them as auxiliary signals or ignore them:

| module | what |
|---|---|
| bio::cerebellar | delta rule forward model + dopamine dynamics |
| bio::pain | relative-loss valence, adaptive learning-rate focus, emotional baseline |
| bio::dream | offline dream replay with retrieval priming and pain-weighted episode selection |
| bio::three_factor | REINFORCE with Titans-style eligibility traces |
| bio::neuromod | dopamine / serotonin / norepinephrine state machine |
| bio::salience | RPE × motor conflict → learning rate gate |
| bio::homeostasis | self-monitoring: sleep pressure, zone detection |
| bio::consolidation | SPSA spindle-ripple offline weight optimization |
| plural | multiple alters sharing one brain: independent episodic memory per alter, neuromod baselines, pain-triggered switching |
| organism | integrated training orchestrator: composes pain + memory + homeostasis + neuromod + plural |
| memory::hippocampus | content-addressable episodic memory (cosine retrieval) |
| memory::replay | prioritized experience buffer (surprise-gated) |
| memory::sleep | offline least-squares weight consolidation |
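to make "content-addressable episodic memory (cosine retrieval)" concrete, here is a toy sketch of the general technique: store key vectors with payloads, and retrieve the payload whose key has the highest cosine similarity to a query. `EpisodicStore`, `store` and `retrieve` are hypothetical names; the real memory::hippocampus types and signatures may look quite different.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all-zero).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Toy episodic store (illustrative only): keys are state vectors,
/// values are arbitrary payloads.
struct EpisodicStore<V> {
    episodes: Vec<(Vec<f32>, V)>,
}

impl<V> EpisodicStore<V> {
    fn new() -> Self {
        Self { episodes: Vec::new() }
    }

    fn store(&mut self, key: Vec<f32>, value: V) {
        self.episodes.push((key, value));
    }

    /// Return the payload whose key is most similar to `query`, with its score.
    /// Linear scan over all episodes; returns None when the store is empty.
    fn retrieve(&self, query: &[f32]) -> Option<(&V, f32)> {
        self.episodes
            .iter()
            .map(|(k, v)| (v, cosine(k, query)))
            .max_by(|a, b| a.1.total_cmp(&b.1))
    }
}
```

because lookup is by similarity rather than by exact key, a query such as `[0.9, 0.1]` retrieves the episode stored under `[1.0, 0.0]`.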

multimodal token space

unified vocabulary — one model, all modalities:

  0..255        bytes (text)
  256..263      delimiters (<img> </img> <aud> </aud> <vid> </vid>)
  264..4359     image VQ codes (4096)
  4360..8455    audio VQ codes (4096)
  8456..8855    timestamps (0.5s resolution)
  8856..9133    action tokens (mouse, keyboard, coordinates)
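a minimal sketch of how a token id could be mapped back to its modality band. it assumes the ranges above are inclusive on both ends (consistent with 264..4359 holding exactly 4096 image codes), which makes the total vocabulary 9134 ids. `Modality`, `classify` and `VOCAB_SIZE` are hypothetical names, not modgrad-data's API. the delimiter band spans eight ids while six delimiters are listed; the sketch does not guess what the remaining two are.

```rust
/// Which band of the unified vocabulary a token id falls in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Modality {
    Byte,
    Delimiter,
    ImageCode,
    AudioCode,
    Timestamp,
    Action,
}

/// Assumed total size: the last listed id is 9133, ranges taken as inclusive.
const VOCAB_SIZE: u32 = 9134;

/// Map a token id to (modality, offset within that modality's band).
/// Returns None for ids outside the listed ranges.
fn classify(id: u32) -> Option<(Modality, u32)> {
    match id {
        0..=255 => Some((Modality::Byte, id)),
        256..=263 => Some((Modality::Delimiter, id - 256)),
        264..=4359 => Some((Modality::ImageCode, id - 264)),
        4360..=8455 => Some((Modality::AudioCode, id - 4360)),
        8456..=8855 => Some((Modality::Timestamp, id - 8456)),
        8856..=9133 => Some((Modality::Action, id - 8856)),
        _ => None,
    }
}
```

for example, `classify(264)` is the first image code (offset 0) and `classify(9133)` is the last action token (offset 277).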

isis

our runtime built on the SDK. 8 brain regions, multimodal, neural computer mode. you don't need isis to use modgrad — it's just one composition.

# train
isis train model.bin
isis train model.bin --multimodal --images cifar.bin --audio clips/

# interactive neural computer
isis nc model.bin
isis nc model.bin --audio mic.wav --camera frames/ --debug-port 4747

# generate
isis generate model.bin --prompt "the cat "

# run as a service
isis daemon model.bin --port 4747
isis send "hello world" --addr 127.0.0.1:4747

# show devices
isis devices

isis brain regions

The legacy small preset (used by the maze benchmark above):

| region | neurons | memory | role |
|---|---|---|---|
| input | 64 | 4 | perception + motor feedback |
| attention | 64 | 8 | gating, routing |
| output | 64 | 16 | evidence accumulation |
| motor | 64 | 4 | action selection |
| cerebellum | 8 | 4 | forward model |
| basal ganglia | 8 | 8 | value estimation |
| insula | 8 | 4 | interoception |
| hippocampus | 8 | 16 | episodic binding |
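The table can be read as plain data. The sketch below copies it into a hypothetical `Region` struct (not a modgrad type) and sums the neuron column; the `memory` column is carried over as an opaque number, since its unit is not stated here.

```rust
/// One row of the legacy small preset table (illustrative struct, not modgrad's).
struct Region {
    name: &'static str,
    neurons: usize,
    memory: usize,
}

/// Values copied verbatim from the table above.
const LEGACY_SMALL: [Region; 8] = [
    Region { name: "input", neurons: 64, memory: 4 },
    Region { name: "attention", neurons: 64, memory: 8 },
    Region { name: "output", neurons: 64, memory: 16 },
    Region { name: "motor", neurons: 64, memory: 4 },
    Region { name: "cerebellum", neurons: 8, memory: 4 },
    Region { name: "basal ganglia", neurons: 8, memory: 8 },
    Region { name: "insula", neurons: 8, memory: 4 },
    Region { name: "hippocampus", neurons: 8, memory: 16 },
];

/// Total neuron count across all regions.
fn total_neurons(regions: &[Region]) -> usize {
    regions.iter().map(|r| r.neurons).sum()
}
```

`total_neurons(&LEGACY_SMALL)` gives 288: four 64-neuron regions plus four 8-neuron regions.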

The target preset mounts Qwen2.5-0.5B (and later BLT-byte-ified) as a frozen cerebellum, taking ~82% of the parameter budget — see docs/BRAIN_ARCHITECTURE.md.

minictm

nanoGPT but for CTMs. minimal example — uses the SDK directly, no isis:

cargo run -p minictm --release -- --data train.txt --steps 5000
cargo run -p minictm --release -- --data train.txt --steps 5000 --chat

debugger

live 3D brain visualizer. connects to any running modgrad model via TCP:

# connect to isis or any NC with --debug-port
modgrad-debugger 127.0.0.1:4747

  • 3D neuron particles colored by region, sized by activation
  • token stream color-coded by modality (text/image/audio/action)
  • NLM trace heatmaps per region
  • global sync visualization
  • command center: pause/resume/step, inject tokens, inspect state

building

cargo build --release                      # CPU only (default)
cargo build --release --features cuda      # NVIDIA GPU (via cudarc, fallback dynamic loading)
cargo build --release --features rocm      # AMD GPU (rocBLAS + HIP; requires libamdhip64 + libhipblas)
cargo test  --release

ROCm is opt-in because modgrad-device/rocm.rs uses hardcoded #[link] attributes that hard-require the system libraries at link time. CUDA stays in the default set because cudarc dynamic-loads at runtime.

Requires the Rust 2024 edition (rustc 1.85 or newer).

references

  • sakana AI CTM (arxiv 2505.05522) — continuous thought machine
  • pagnoni et al. (arxiv 2412.09871) — byte-latent transformer (modgrad-blt)
  • qwen3-VL (2025) — text timestamps for video
  • meta neural computers (2026) — the model as the running computer
  • chameleon (meta) — unified discrete token space for multimodal generation

license

MIT
