gantasmo/theDAW

theDAW

by GANTASMO

Python 3.10 · PyTorch CUDA 12.8 · React 19, Vite 7, Tailwind 4 · FastAPI backend
Stable Audio 3 plus Magenta RealTime 2 · Windows / Linux · Companion: theDAW-XR · Status: active development

Listen on Spotify · Watch on YouTube · Follow @gantasmo on Instagram · Follow @gantasmo on X · Learn more at gantasmo.com

GANTASMO is an amorphous entity by Daniel Joaquin Trujillo and Josh Valenzuela that defies conventional classification. We make thought-provoking, highly technical, yet listenable music inspired by the underappreciated pioneers of modern music. Beyond musical composition and performance, GANTASMO is a powerhouse of research and development in the fields of Artificial Intelligence, Augmented Reality, Virtual Reality, the democratization of musical tools and education, and the preservation and evolution of musical history and traditions predating modern recording infrastructure.


theDAW is an all-in-one application for music creation. The generative engine renders audio from several inputs: supplied init audio, a text prompt, a painted inpaint region, and the Chimera engine that analyzes, blends, and beat-aligns several source clips into one generation. The workspace opens into a full studio for composition, arrangement, editing, and mixing, and into a live rig for DJing and VJing with deep MIDI mapping for any controller. theDAW covers the full path from an initial idea through a finished render to a live performance, and pairs with theDAW-XR on Meta Quest 3 for hands-only spatial control.

theDAW also ships the first non-Mac port of Google's Magenta RealTime 2, vendored as the magenta-rt2-nvidia sidecar, which runs on Windows with WSL2 and NVIDIA, on native Linux, and on cloud GPUs. Models stay under the user's control: nothing downloads at startup, local-only mode is on by default, and a model loads at the first CREATE that needs it.

Watch the theDAW feature-tour video
Click to watch the full feature tour, also available in a vertical (9:16) cut.

theDAW MAKE workspace with prompt-driven generation, the Chimera fusion stack, and the dual live visualizers


Quickstart

Double-click theDAW.bat. That is the entire setup. It checks the machine, installs anything missing after one quick confirmation, and opens theDAW in the browser. The Stable Audio model downloads on its own the first time a track is generated.

.\theDAW.bat

The launcher checks prerequisites, bootstraps dependencies when the tree is fresh (uv sync --group dev, npm install), clears stale processes on ports 5173/8600/5187, then runs the backend, Vite, and an optional tunnel together in one console and opens http://localhost:5173. Manual launch:

uv run uvicorn backend.server:app --host 0.0.0.0 --port 8600 --reload   # backend
cd frontend && npm run dev                                              # frontend
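
Before a manual launch, it can help to confirm that the three ports the launcher clears are actually free. The following is a minimal sketch using only the standard library; `port_in_use` and `THEDAW_PORTS` are illustrative names and are not part of theDAW.

```python
import socket

# Ports named in the launcher description above.
THEDAW_PORTS = (5173, 8600, 5187)


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in THEDAW_PORTS:
        state = "in use" if port_in_use(port) else "free"
        print(f"port {port}: {state}")
```

A port reported as "in use" before anything has been started usually points at a stale process from an earlier session.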

The full User Guide is a deep power-user reference. It runs long and parts can lag the current app, so it works best as a reference rather than a first stop. Quick links: Windows Setup, Prompting, §3 Installation.

Prerequisites

theDAW.bat installs these automatically the first time a tool is missing. The list is here for reference and for manual or non-Windows setups.

| Tool | Role |
| --- | --- |
| uv | Python environment and package manager. Creates the venv and installs torch and CUDA. |
| Node.js 20.19+ or 22.12+ | Frontend dev server and the VJ sidecar. Vite 7 sets the floor. |
| FFmpeg on PATH | Every audio path: effects, exports, library ingest, MIDI conversion, import. |
| Git | Clones the repo. --recurse-submodules brings in the Magenta sidecar source. |
| NVIDIA driver 550+ | Runs the Medium model and Magenta. The Small model runs on CPU. |

Architecture

theDAW is a React frontend over a FastAPI backend that wraps the Stable Audio 3 pipeline, a plugin module system, and spawned sidecars. The frontend proxies /api/* to the backend on port 8600. The wiki Dataflow page maps every input and output in one chart.

System.

flowchart TD
  UI["theDAW UI<br/>MAKE EDIT MIX DJ VJ TRAIN LEARN"]:::in
  API["FastAPI backend :8600<br/>job queue, FFmpeg, introspection"]:::proc
  SA3["Stable Audio 3<br/>DiT + SAME AE"]:::eng
  MODS["Plugin modules"]:::proc
  MRT2["magenta-rt2-nvidia<br/>WSL2 + JAX"]:::side
  VJ["VJ-9000<br/>WebGL engine"]:::side
  XR["theDAW-XR<br/>Quest 3"]:::side
  UI -->|/api/*| API
  API --> SA3
  API --> MODS
  MODS -. spawn .-> MRT2
  MODS -. iframe .-> VJ
  XR <-->|ADB, MIDI, video| MODS
  classDef in fill:#0f3d57,stroke:#3aa0db,color:#eaf6ff;
  classDef eng fill:#3a2356,stroke:#a877e0,color:#f3ecff;
  classDef proc fill:#0e3b3b,stroke:#2bb3a3,color:#e6fffb;
  classDef side fill:#4a3115,stroke:#e09a3a,color:#fff4e3;

Generation. Several inputs condition one generation; the DiT renders latents, the autoencoder decodes them, every render saves to the library, and LEARN draws the lineage.

flowchart TD
  P["Text prompt"]:::in
  INIT["Init audio<br/>voice, file, library, pattern"]:::in
  MASK["Inpaint region"]:::in
  CHI["Chimera fusion"]:::in
  P --> GEN
  INIT --> GEN
  MASK --> GEN
  CHI --> GEN
  GEN["DiT transformer"]:::eng --> LAT["SAME latents"]:::eng
  LAT --> DEC["SAME decode"]:::eng
  DEC --> WAV["44.1 kHz stereo"]:::out
  WAV --> LIB["Library"]:::out
  LIB --> LRN["LEARN lineage"]:::out
  classDef in fill:#0f3d57,stroke:#3aa0db,color:#eaf6ff;
  classDef eng fill:#3a2356,stroke:#a877e0,color:#f3ecff;
  classDef out fill:#13402a,stroke:#46c47a,color:#e7ffee;

Routing. Player audio, a microphone, MIDI, and SLIDE drive the VJ engine and the DJ console, and theDAW-XR feeds hand-tracked MIDI and passthrough video into the same buses.

flowchart TD
  DJ["DJ console<br/>2 decks, FX, stems"]:::live
  MIC["Microphone"]:::in
  MIDI["MIDI<br/>~110 profiles, learn"]:::in
  SLIDE["SLIDE surface"]:::in
  XR["theDAW-XR<br/>hand MIDI, passthrough"]:::side
  DJ --> AUD["Player audio ~30 fps"]:::proc
  AUD --> VJ
  MIC --> VJ
  MIDI --> VJ
  MIDI --> DJ
  SLIDE <-->|sync| VJ
  XR --> MIDI
  XR -->|video| VJ
  VJ["VJ-9000<br/>sources, FX, shaders"]:::live --> OUT["Live output"]:::out
  VJ -->|watch-link| WEB["Remote viewers"]:::out
  classDef in fill:#0f3d57,stroke:#3aa0db,color:#eaf6ff;
  classDef proc fill:#0e3b3b,stroke:#2bb3a3,color:#e6fffb;
  classDef live fill:#4a1530,stroke:#e85a8a,color:#ffe9f1;
  classDef out fill:#13402a,stroke:#46c47a,color:#e7ffee;
  classDef side fill:#4a3115,stroke:#e09a3a,color:#fff4e3;

Features

Every feature has a full reference in the User Guide. Names link to the section below or the relevant guide.

Studio

Live rig

Library, notation, and tools


Workspaces

MAKE

Generation controls for model, duration, sampler steps, CFG, seed, batch, and the sampler sigma fader

One form drives text-to-audio, audio-to-audio, inpainting, and continuation. Supplied init audio, a text prompt, a painted inpaint region, and a Chimera stack all condition the same generation, and the init noise level sets how far the result departs from the source. Chimera blends several clips into one generation and beat-aligns them under Start, Downbeat, or Phrase Weave alignment. Templates store full parameter sets, Saved Prompts keep a history, and the async job queue saves every render to the library. Full reference: User Guide §6.

Generate

Magenta RealTime 2 text-to-music panel, the first non-Mac MRT2 port

Suno cloud generation runs in the Aurora Cloud Console across simple, custom, cover, and mashup modes, and cover and mashup results write lineage edges. Magenta RealTime 2 provides text-to-music whenever its sidecar is running, through the first non-Mac MRT2 port vendored at magenta-rt2-nvidia. The extended sidecar also accepts MIDI-note and audio-style conditioning. Full reference: User Guide §26 and §27.

EDIT

Multi-track timeline with per-clip waveforms, trim and fade handles, and the cut tool

The timeline holds many tracks, each clip caches its own waveform peaks, Move drags clips along and between tracks, and Cut splits a clip while preserving source alignment. Each track carries name, mute, solo, volume, and pan, and the live mixer applies them during playback. Commit Edit renders the audible tracks into one 44.1 kHz stereo WAV through OfflineAudioContext. Full reference: User Guide §7.
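
To illustrate what "preserving source alignment" means for Cut, here is a small sketch. It assumes a clip is described by its timeline position, an offset into the source file, and a duration; the `Clip` type and `cut` function are hypothetical and do not reflect theDAW's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    timeline_start: float  # where the clip begins on the track, in seconds
    source_offset: float   # where playback begins inside the source file
    duration: float        # audible length, in seconds


def cut(clip: Clip, at: float) -> tuple[Clip, Clip]:
    """Split `clip` at timeline position `at` (seconds)."""
    local = at - clip.timeline_start
    if not 0.0 < local < clip.duration:
        raise ValueError("cut point must fall strictly inside the clip")
    # The left half keeps the original offset; the right half starts
    # `local` seconds further into the same source, so both halves
    # still play exactly the audio they played before the cut.
    left = Clip(clip.timeline_start, clip.source_offset, local)
    right = Clip(at, clip.source_offset + local, clip.duration - local)
    return left, right
```

Because neither half changes which source samples land at which timeline position, playing the two halves back to back is indistinguishable from playing the uncut clip.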

MIX

MIX effects browser, the flowing chain, and the Quick Master macro knobs

A chain of 24 FFmpeg effects covers mastering, compression, filters, vocal processing, lo-fi, stereo widening, reverb, delay, LUFS normalization, pitch shift, and export to FLAC, MP3, AAC, and Opus. Four macro sliders map onto the active effect, and process history keeps the last eight runs. The Edit Tool Stack adds six module families under /api/edit/*, whose GUIs iframe into the effect stage. Full reference: User Guide §8 and §28.
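
As an illustration of one link in such a chain, the sketch below builds an FFmpeg command line for LUFS normalization with FFmpeg's `loudnorm` filter. The target values and the `loudnorm_command` helper are assumptions chosen for the example; the README does not state which parameters theDAW's own effect uses.

```python
def loudnorm_command(src: str, dst: str, target_lufs: float = -14.0,
                     true_peak: float = -1.0, lra: float = 11.0) -> list[str]:
    """Build an ffmpeg argv that loudness-normalizes `src` into `dst`.

    I   = integrated loudness target in LUFS
    TP  = maximum true peak in dBTP
    LRA = loudness range target in LU
    """
    audio_filter = f"loudnorm=I={target_lufs}:TP={true_peak}:LRA={lra}"
    # -y overwrites the output file; the container follows dst's extension.
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    print(" ".join(loudnorm_command("in.wav", "out.flac")))
```

Passing the resulting list to `subprocess.run(..., check=True)` would execute it, provided FFmpeg is on PATH as listed under Prerequisites.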

DJ

Two-deck DJ console with jog wheels, the central mixer, and the FX rack

Two decks run from a pro layout with jog wheels, a central mixer, and a track browser. The engine handles octave-aware beatmatch sync, key-lock, a 3-band EQ, a single-knob filter, four hotcues, beat loops and rolls, slip mode, and quantize. The FX rack adds a flanger, an impulse-response reverb, and a resonant wah per deck, with a master limiter on the bus. Live stems ride on per-stem faders, cue output pre-listens through a headphone device chosen with setSinkId, and Automix sequences and crossfades a set on its own. Full reference: User Guide §9.
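
One plausible reading of "octave-aware" sync is that tempos a factor of two apart are treated as the same grid, so a 70 BPM track against a 140 BPM deck needs no pitch change at all. The sketch below works under that assumption; `sync_ratio` is an illustrative name, not theDAW's engine code.

```python
import math


def sync_ratio(source_bpm: float, target_bpm: float) -> float:
    """Playback-rate ratio that puts the source on the target's beat grid.

    Half-time and double-time are treated as equivalent, so the result is
    the ratio closest to 1.0 among target * 2**k / source.
    """
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("tempos must be positive")
    raw = target_bpm / source_bpm
    # Fold the raw ratio by whole octaves (powers of two) toward 1.0.
    octaves = round(math.log2(raw))
    return raw / (2 ** octaves)
```

Folding in log space keeps the pitch shift as small as possible, which matters when key-lock is off.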

VJ

3D reactive spectrogram terrain with bloom, particles, and camera input

The VJ tab embeds the VJ-9000 engine, which renders a glowing reactive terrain plus a unified set of live sources: cameras (webcam, phone, tablet, or Quest over the LAN); a GLSL shader source with fractals, eight materials, and audio-mapped params; an ASCII effect; cymatics; depth-cloud and spectra sources; and source banks for snapshot and recall. A composable GPU effect chain, Autopilot, BPM sync, and full MIDI mapping sit on top, and the take records to WebM and transcodes through the backend. Full reference: User Guide §10.

TRAIN

TRAIN workshop for LoRA adapters with layer filtering and interval gating

Eight adapter types are available (lora, dora-rows, dora-cols, bora, and their -xs variants). Layer filtering runs through --include and --exclude with bracket-range expansion. Inference exposes runtime strength, per-LoRA interval gating within a sigma range, and a per-LoRA layer filter, and adapters stack additively. Full reference: User Guide §11 and §22.
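
Bracket-range expansion can be pictured as turning one compact pattern into a list of concrete layer names. The sketch below assumes a `[lo-hi]` syntax with inclusive integer bounds; the exact syntax accepted by --include and --exclude is defined in the User Guide, and `expand_ranges` is a hypothetical helper written only for this illustration.

```python
import re

# Assumed syntax: "[0-3]" expands to 0, 1, 2, 3 (inclusive).
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_ranges(pattern: str) -> list[str]:
    """Expand every [lo-hi] range in `pattern` into concrete names."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    lo, hi = int(match.group(1)), int(match.group(2))
    if lo > hi:
        raise ValueError(f"empty range in {pattern!r}")
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded: list[str] = []
    for index in range(lo, hi + 1):
        # Recurse so later ranges in the same pattern expand too.
        expanded.extend(expand_ranges(f"{head}{index}{tail}"))
    return expanded
```

Under this assumed syntax, a pattern with two ranges expands to the Cartesian product of both.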

LEARN

3D force-directed genealogy galaxy
2D lineage family tree

Every track and the relationships between them render as an interactive force-directed graph in 3D and 2D through react-force-graph and three.js, alongside a layered SVG DAG. Edges trace how a piece descended from its sources, so a remix, an inpaint, a stem split, a Chimera blend, and a Suno cover each show their parentage. Full reference: User Guide §12.
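
Since lineage edges form a directed acyclic graph, "how a piece descended from its sources" reduces to a graph walk. A minimal sketch follows, assuming the edges are available as a mapping from each track to its direct parents; the mapping shape and the `ancestors` function are illustrative, not the library's real schema.

```python
from collections import deque


def ancestors(track: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every track that `track` descends from, directly or not."""
    seen: set[str] = set()
    queue = deque(parents.get(track, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # shared ancestors are visited once
        seen.add(node)
        queue.extend(parents.get(node, ()))
    return seen


if __name__ == "__main__":
    edges = {"blend": ["remix", "other"], "remix": ["original"]}
    print(sorted(ancestors("blend", edges)))
```

A Chimera blend would simply have several entries in its parent list, one per source clip.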

Controllers and XR

Controller recognition identifies hardware across three tiers: a library of roughly 110 device profiles, a scored auto-detect, and a learn-by-capture mode that binds a control the moment it moves. Controller Vision identifies a controller from a photo through OpenCV and a vision model. The theDAW-XR companion turns a Meta Quest 3 into a hands-only surface: hand-tracked MIDI from floating faders and knobs, passthrough video into VJ, co-located multiplayer, and a head-mounted MIDI Reactor, all over ADB. Full reference: User Guide §31 and §34.

Library and Catalogue

Disk-backed library browser with search, favorites, and inline playback
Cross-provider Catalogue gallery with provider badges and inspector

The library lives on the backend, with audio on disk, metadata in data/library.db, and access over /api/library/*. Every render saves automatically with its prompt, model, duration, steps, CFG, seed, and timestamp. SUGGEST builds a continuous playlist ordered by Camelot-wheel harmony and a chosen BPM flow, then plays it through the footer queue or sends it to the DJ tab. The Catalogue view adds a cross-provider gallery with provider badges, an inspector with on-demand spectrograms, and a lineage panel. Full reference: User Guide §13 and §29.
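
For readers unfamiliar with the Camelot wheel, the usual harmonic-mixing rule is that two keys are compatible when they are identical, one step apart on the same ring, or share a number across the A and B rings. The sketch below encodes that common rule; how SUGGEST actually scores or orders tracks is not described here, so treat `compatible` as an illustration only.

```python
def parse_camelot(key: str) -> tuple[int, str]:
    """Split a Camelot code such as '8A' into (8, 'A')."""
    number, letter = int(key[:-1]), key[-1].upper()
    if not 1 <= number <= 12 or letter not in ("A", "B"):
        raise ValueError(f"not a Camelot key: {key!r}")
    return number, letter


def compatible(a: str, b: str) -> bool:
    """Apply the common Camelot mixing rule to two key codes."""
    num_a, ring_a = parse_camelot(a)
    num_b, ring_b = parse_camelot(b)
    if ring_a == ring_b:
        # Same ring: identical or adjacent, wrapping between 12 and 1.
        return (num_a - num_b) % 12 in (0, 1, 11)
    # Different rings: only the relative major/minor pair matches.
    return num_a == num_b
```

A playlist builder could use such a predicate as a filter and then sort the surviving candidates by BPM distance.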

Notation and Score

Score panel rendering guitar tablature from a track's MIDI

The Score tab turns a track's MIDI into symbolic music. MAKE SHEET converts the first MIDI to MusicXML with music21 and renders it through OpenSheetMusicDisplay. The Tabs section arranges guitar or bass tablature for a chosen tuning, capo, and difficulty through a dynamic-programming pass and renders with alphaTab. Arrange builds lead-sheet, piano-reduction, simplified, or band-score MusicXML, scores export to ABC, PDF, and SVG, and PROMPT INFERENCE derives a Stable Audio prompt from a track's analysis. Full reference: User Guide §33.
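
The dynamic-programming idea behind tablature arrangement can be shown in a few lines: each note has several playable (string, fret) positions, and the pass picks the sequence with the least total hand movement. This sketch uses fret distance as its only cost and ignores difficulty levels and chords; the cost model and the names `positions` and `arrange` are assumptions, not theDAW's implementation.

```python
STANDARD_TUNING = (40, 45, 50, 55, 59, 64)  # E2 A2 D3 G3 B3 E4 as MIDI notes


def positions(note, tuning=STANDARD_TUNING, capo=0, max_fret=20):
    """All (string, fret) pairs that sound `note`; frets count from the capo."""
    found = []
    for string, open_note in enumerate(tuning):
        fret = note - (open_note + capo)
        if 0 <= fret <= max_fret - capo:
            found.append((string, fret))
    return found


def arrange(notes, tuning=STANDARD_TUNING, capo=0, max_fret=20):
    """Choose one position per note, minimizing summed fret movement."""
    if not notes:
        return []
    layers = [positions(n, tuning, capo, max_fret) for n in notes]
    if any(not layer for layer in layers):
        raise ValueError("a note is out of range for this tuning")
    # Starting cost favors low frets; later costs add fret distance.
    cost = [float(fret) for _, fret in layers[0]]
    back = []
    for prev, cur in zip(layers, layers[1:]):
        new_cost, pointers = [], []
        for _, fret in cur:
            best = min(range(len(prev)),
                       key=lambda j: cost[j] + abs(fret - prev[j][1]))
            new_cost.append(cost[best] + abs(fret - prev[best][1]))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Walk the back-pointers from the cheapest final position.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [layers[-1][idx]]
    for layer, pointers in zip(reversed(layers[:-1]), reversed(back)):
        idx = pointers[idx]
        path.append(layer[idx])
    return path[::-1]
```

Strings are indexed from the lowest-pitched (0) to the highest (5). A production arranger would add terms for string changes, stretches, and the chosen difficulty.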

Bottom panel

16-step sequencer with five voices
Piano roll with MIDI import and export
Real-time spectral analyzer
SLIDE glass control surface

The spectral analyzer shows oscilloscope, spectrum, and radial modes with RMS and peak meters. The piano roll edits MIDI-style notes, imports and exports MIDI, and renders to the editor. The step sequencer runs a 16-step drum machine with five synthesized voices. The media bucket holds session audio, SLIDE presents a glass surface of faders and knobs synced with the VJ engine, and Details and Score show the selected entry. Full reference: User Guide §14 through §16.
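
For reference, RMS and peak meters differ only in how they summarize a block of samples. The sketch below computes both in dBFS for floating-point samples in the range -1.0 to 1.0; the block-based approach and the `meter` function are illustrative assumptions rather than the analyzer's actual code.

```python
import math


def _to_dbfs(value: float) -> float:
    """Convert a linear amplitude to dBFS; silence maps to -inf."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")


def meter(samples: list[float]) -> tuple[float, float]:
    """Return (rms_dbfs, peak_dbfs) for one block of samples."""
    if not samples:
        return float("-inf"), float("-inf")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return _to_dbfs(rms), _to_dbfs(peak)
```

RMS tracks perceived loudness more closely, while peak shows how close the signal is to clipping, which is why meters typically display both.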

Footer, log, and assistant

The footer stays across every tab with the current title, a status chip, transport, a seek bar, a volume slider, and a download button. The processing log is a 500-entry ring buffer with leveled, color-coded lines. The assistant orb streams chat from any configured provider, including Claude Code over the CLI, Gemini, Anthropic, OpenAI, Grok, Groq, OpenRouter, Ollama, LM Studio, llama.cpp, and vLLM, with a hashed multi-key pool, attachments, and RAG over the docs through ChromaDB. Full reference: User Guide §17, §18, and §32.
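
A fixed-size ring buffer like the processing log is straightforward to model with `collections.deque`. This is a minimal sketch of the idea with a 500-entry capacity; the `ProcessingLog` class and its method names are hypothetical.

```python
from collections import deque


class ProcessingLog:
    """Keeps only the most recent `capacity` log entries."""

    def __init__(self, capacity: int = 500) -> None:
        # A deque with maxlen drops the oldest item on overflow.
        self._entries: deque[tuple[str, str]] = deque(maxlen=capacity)

    def add(self, level: str, message: str) -> None:
        self._entries.append((level, message))

    def entries(self) -> list[tuple[str, str]]:
        """Return entries oldest-first."""
        return list(self._entries)
```

Appending is constant-time and memory stays bounded no matter how long a session runs.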


Ecosystem

theDAW is the hub of a small constellation of repositories, each with its own README and badges.

| Project | Repo | Role |
| --- | --- | --- |
| VJ-9000 | VJ-9000 | The WebGL audio-reactive visual engine embedded in the VJ tab and runnable standalone. |
| magenta-rt2-nvidia | magenta-rt2-nvidia | The first non-Mac port of Magenta RealTime 2, vendored at sidecars/magenta-rt2-nvidia. |
| theDAW-XR | theDAW-XR | The Meta Quest 3 spatial companion: hand-tracked MIDI, passthrough streaming, and colocation. |

In-tree sidecars under sidecars/ (questcast, queststitch, magenta) and the backend modules under backend/modules/ bridge these into theDAW over /api/*.


Structure

| Component | Location | Description |
| --- | --- | --- |
| Upstream ML pipeline | stable_audio_3/ | DiT diffusion transformer, SAME autoencoder, all samplers, LoRA training and inference, distribution-shift schedules. |
| FastAPI backend | backend/server.py | Async HTTP wrapper running a generation job queue, FFmpeg audio processing, and model introspection on port 8600. |
| Backend modules | backend/modules/ | Plugin system. Each subdirectory provides module.json and router.py, and the loader mounts every enabled module and isolates failures. The repo ships analysis, analyzer, chimera, controllervision, convert, effects, library, midi, notation, settings, stems, storage, vj, and ytimport, the cloud and real-time engines (suno, magenta), the XR bridges (questmidi, questcast, queststitch, xrcontrol), the akvj depth pipeline, broadcast for watch-link, modeldl, and the Edit Tool Stack under /api/edit/*. |
| theDAW interface | frontend/ | React 19, Vite 7, Tailwind 4, Zustand 5. Seven workspaces (MAKE, EDIT, MIX, DJ, VJ, TRAIN, LEARN) plus the library, the Catalogue, and the live tools. The dev server on port 5173 proxies /api/* to the backend. |
| Sidecars | sidecars/ | The vendored magenta-rt2-nvidia port, the questcast and queststitch Quest bridges, and the magenta studio sidecar. |
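
The plugin loader described for backend/modules/ (find each module.json, import its router.py, and keep one broken module from taking down the rest) can be sketched with the standard library alone. The `enabled` manifest key, the `load_modules` function, and its return shape are assumptions for illustration; the real loader also mounts routers on the FastAPI app, which is omitted here.

```python
import importlib.util
import json
from pathlib import Path


def load_modules(root: Path):
    """Import router.py from every enabled module directory under `root`.

    Returns (loaded, failed): loaded maps module name to the imported
    router module, failed maps module name to an error description.
    """
    loaded, failed = {}, {}
    for manifest_path in sorted(root.glob("*/module.json")):
        name = manifest_path.parent.name
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
            # Assumed manifest field; modules default to enabled.
            if not manifest.get("enabled", True):
                continue
            router_path = manifest_path.parent / "router.py"
            spec = importlib.util.spec_from_file_location(
                f"modules.{name}.router", router_path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            loaded[name] = module
        except Exception as exc:
            # Isolate the failure: record it and keep loading the rest.
            failed[name] = f"{type(exc).__name__}: {exc}"
    return loaded, failed
```

Catching the exception per module is what lets the remaining modules load when one manifest is malformed or one router raises at import time.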

Models

| Key | Flavor | Params | Autoencoder | Hardware | Max Duration |
| --- | --- | --- | --- | --- | --- |
| small | ARC | 433 M | SAME-S | CPU | 120 s |
| medium | ARC | 1.4 B | SAME-L | GPU (CUDA) | 380 s |
| small-rf / medium-rf | RF | 433 M / 1.4 B | SAME-S / SAME-L | CPU / GPU | 120 / 380 s |
| same-s / same-l | Autoencoder | 266 M / 1.7 B | n/a | CPU / GPU | n/a |

ARC checkpoints are post-trained for 8-step inference at cfg_scale=1. RF checkpoints are rectified-flow bases for LoRA training, sampled at cfg_scale=7 and roughly 50 steps. This table lists the primary keys; the specialized release checkpoints (small-music, small-sfx, and the medium-base / music-base / sfx-base variants) and their exact folders are catalogued in User Guide §21.2. Nothing downloads at startup; a model loads on the first generation that needs it, and the in-app Settings > Models panel can register checkpoints already on disk.


Python API

import torchaudio
from stable_audio_3 import StableAudioModel

pipe = StableAudioModel.from_pretrained("medium")

# Text-to-audio
audio = pipe.generate(prompt="Lo-fi boom bap meets orchestral strings, 84 BPM", duration=180)

# Audio-to-audio. init_noise_level sets how far the result departs from the source.
# torchaudio.load returns a (waveform, sample_rate) tuple, passed here unchanged.
audio = pipe.generate(init_audio=torchaudio.load("in.wav"), init_noise_level=0.9,
                      prompt="bossa nova bassline", duration=30)

# LoRA stacks additively; runtime strength is adjustable.
pipe.load_lora("style.safetensors", weight=0.8)
audio = pipe.generate(
    prompt="...", duration=30,
    sampler_type="dpmpp_2m_sde",   # euler | rk4 | dpmpp_2m_sde | ping_pong
    apg_scale=1.0,                 # Adaptive Projected Guidance
    cfg_interval=(0.0, 1.0),       # apply CFG only within this sigma range
)

docs/workflows/lora.md covers adapter types and layer filters, and docs/workflows/autoencoder.md covers the standalone autoencoder.


Documentation

| Document | Contents |
| --- | --- |
| docs/USER_GUIDE.md | The complete manual covering every feature, control, and endpoint, rendered in-app by the Docs button. |
| docs/guides/prompting.md | Prompt structure, conditioning signals, and style reference. |
| docs/guides/SUNO_EXTERNAL_API.md | Suno cloud-generation API reference covering modes, polling, and usage. |
| docs/guides/model-overview.md | Architecture design and model comparison. |
| docs/guides/notation-and-score.md | Audio to MIDI, sheet music, tabs, arrangements, and prompt inference. |
| docs/guides/dj-and-genealogy.md | DJ console, the genealogy graph, and the watch-link broadcast. |
| docs/workflows/inference.md, lora.md, autoencoder.md | Inference modes, LoRA adapters and training, and the standalone autoencoder. |
| docs/windows/setup-guide.md, troubleshooting.md | Windows installation (CUDA, Flash Attention, soundfile) and fixes. |

The GitHub Wiki mirrors this index in a browsable form across theDAW and its sidecars.


Automation

theDAW generates its own documentation and promo material from the live app. scripts/screenshots/ drives a real session to capture feature screenshots and a coverage report, and frontend/_capture_clips.mjs is a Playwright harness that records the running app into the feature-tour video. The in-app assistant answers from these same documents through a ChromaDB RAG index, so the docs, the video, and the assistant stay sourced from one place.


Troubleshooting

Static glitch output on the Medium model. Flash Attention is not installed correctly. Verify it with uv run python -c "from flash_attn import flash_attn_func; import flash_attn; print(flash_attn.__version__)" and reinstall a wheel matching the Python, torch, and CUDA combination from kingbri1/flash-attention.

"API UNREACHABLE" banner. The backend is not listening on port 8600. Test it with curl http://localhost:8600/api/health. On Windows, .\theDAW.bat clears stale processes automatically.

Out-of-memory on the Medium model. Switching to the Small model, shortening the duration, or freeing competing CUDA processes usually resolves it.

User Guide §23 has the full matrix.
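
The "API UNREACHABLE" check above can also be scripted, which is handy on machines without curl. This is a minimal sketch using only the standard library; it assumes the /api/health endpoint answers with a 2xx status when the backend is up, and `backend_reachable` is an illustrative name.

```python
import urllib.error
import urllib.request


def backend_reachable(base_url: str = "http://localhost:8600",
                      timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/api/health answers with a 2xx status."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health",
                                    timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("backend up" if backend_reachable() else "backend unreachable")
```

A False result with the backend console still open usually means the server bound to a different port or crashed during startup.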


Credits

theDAW was built by GANTASMO as part of the Music Hackspace Music Technology Hackathon at Berklee College of Music.

Built With

Corrections and additions to this list are welcome through a GitHub issue.

Special Thanks

To Music Hackspace and Berklee College of Music for hosting the hackathon, and to Zack, CJ, Jordi, Zach, and Matt from Stability AI for their continued help and support.


Made by Daniel Joaquin Trujillo and Josh Valenzuela as GANTASMO.

About

All-in-one AI music studio on Stable Audio 3 and a CUDA port of Magenta RealTime. It generates audio from text, separates stems with Demucs, transcribes to MIDI and notation, edits a multitrack timeline with a real-time Web Audio FX rack and automation, masters, DJs with stem decks, runs a live VJ engine, and plays from a Quest by hand over ADB.
