
wavey-ai/mossnano-rs


mossnano-rs

Rust-first tooling for MOSS-Audio-Tokenizer-Nano RVQ artifacts, with a small WASM layer for browser playback through ONNX Runtime Web.

The crate owns the deterministic codec-adjacent work:

  • .mossnano container parsing and writing
  • 10-bit RVQ token packing and unpacking
  • metadata, duration, and bitrate accounting
  • streaming decode chunk scheduling
  • ONNX token layout conversion from [quantizer, frame] to [time, quantizer]
  • decoded PCM accumulation
  • PCM16 WAV writing
  • wasm-bindgen exports for browser and Node.js use
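
The duration and bitrate accounting in the list above can be sketched from the container header fields alone. This is a minimal illustration, not the crate's API: the `StreamInfo` struct and function names are hypothetical, it assumes `original_samples` counts samples per channel, and it counts only the packed token payload (not the 32-byte header).

```rust
/// Hypothetical subset of the header fields needed for accounting.
struct StreamInfo {
    sample_rate: u32,
    original_samples: u32, // assumed to be per-channel sample count
    quantizers: u32,
    frames: u32,
    codebook_size: u32,
}

/// Bits needed to address one codebook entry: ceil(log2(codebook_size)).
fn bits_per_token(codebook_size: u32) -> u32 {
    32 - (codebook_size.max(2) - 1).leading_zeros()
}

/// Playback duration in seconds, derived from the original sample count.
fn duration_seconds(info: &StreamInfo) -> f64 {
    f64::from(info.original_samples) / f64::from(info.sample_rate)
}

/// Total packed payload size in bits (quantizers x frames x bits per token).
fn payload_bits(info: &StreamInfo) -> u64 {
    u64::from(info.quantizers) * u64::from(info.frames) * u64::from(bits_per_token(info.codebook_size))
}

/// Average token bitrate in bits per second.
fn bitrate_bps(info: &StreamInfo) -> f64 {
    payload_bits(info) as f64 / duration_seconds(info)
}
```

Under these assumptions, 10 seconds of 48 kHz audio with 16 quantizers and a 1024-entry codebook is 125 token frames, or 20,000 payload bits, which works out to 2,000 bits per second.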

The neural model still runs through the official MOSS Nano ONNX graphs using onnxruntime-web. This keeps the Rust/WASM surface small and predictable while preserving the path to browser playback.

Status

This is an early experimental repo. Decode playback uses the official moss_audio_tokenizer_decode_step.onnx graph with transformer offsets and attention cache tensors carried across chunks. The cache position tensors must start at -1, matching the native model reset path.

moss_audio_tokenizer_decode_full.onnx is still useful for whole-file reference decodes. Do not reset that full graph independently for each playback chunk: that creates audible boundary artifacts and does not match native output.

Tested locally with MOSS-Audio-Tokenizer-Nano RVQ16 stereo artifacts at 48 kHz.

Streaming Decode

MOSS Nano emits one RVQ token frame per 3,840 decoded samples. At 48 kHz this is an 80 ms quantum, so second-based chunk targets must snap to whole token frames.

| Target | Token frames | Actual duration |
|---|---|---|
| 1.333 s | 17 | 1.36 s |
| 1.8 s | 23 | 1.84 s |
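
The snapping arithmetic behind these numbers can be sketched as follows. The function names are hypothetical, and rounding up is an assumption that happens to reproduce both rows; the crate may round differently.

```rust
/// Decoded samples produced per RVQ token frame (from the text above).
const SAMPLES_PER_TOKEN_FRAME: u32 = 3_840;

/// Snap a chunk target in seconds to a whole number of token frames.
/// Rounds up (an assumption) and never returns fewer than one frame.
fn snap_to_token_frames(target_seconds: f64, sample_rate: u32) -> u32 {
    let samples = target_seconds * f64::from(sample_rate);
    let frames = (samples / f64::from(SAMPLES_PER_TOKEN_FRAME)).ceil();
    (frames as u32).max(1)
}

/// Actual duration in seconds of a chunk of `token_frames` frames.
fn chunk_duration_seconds(token_frames: u32, sample_rate: u32) -> f64 {
    f64::from(token_frames) * f64::from(SAMPLES_PER_TOKEN_FRAME) / f64::from(sample_rate)
}
```

At 48 kHz, a 1.333 s target is 16.6625 frames and a 1.8 s target is 22.5 frames, so both snap upward to the frame counts shown.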

The WASM API exposes MossNanoDecodeStream:

const stream = new MossNanoDecodeStream(artifactBytes, 17);

while (stream.hasNext()) {
  const start = stream.nextStartFrame();
  const tokenFrames = stream.nextTokenFrames();
  const codes = stream.nextCodesTqI32();

  // Run decode_step.onnx with:
  // audio_codes: [1, tokenFrames, quantizers], int32
  // audio_code_lengths: [1], int32
  // plus the carried transformer/attention state tensors

  stream.pushDecodedPlanar(decodedPlanarF32, channels, decodedFrames);
}

const wavBytes = stream.finishPcm16Wav();

Rust handles chunk scheduling, token slicing, token transposition, decoded audio assembly, and WAV writing. JavaScript loads ONNX Runtime Web, invokes the stateful decoder graph for each chunk, and feeds every state output back into the next chunk.
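
The scheduling, slicing, and transposition steps on the Rust side can be sketched roughly as below. The function names and signatures are hypothetical and are not the crate's actual internals; the sketch only assumes the stored order is [quantizer, frame] and the ONNX input order is [time, quantizer], as described in this README.

```rust
/// Split `frames` token frames into (start, length) chunks of at most
/// `chunk_frames` frames each; the last chunk may be shorter.
fn schedule_chunks(frames: usize, chunk_frames: usize) -> Vec<(usize, usize)> {
    assert!(chunk_frames > 0, "chunk size must be at least one frame");
    (0..frames)
        .step_by(chunk_frames)
        .map(|start| (start, chunk_frames.min(frames - start)))
        .collect()
}

/// Slice frames `start..start + len` out of a [quantizer, frame] code array
/// and return them as a flat [time, quantizer] array of i32 values.
fn chunk_codes_tq_i32(
    codes_qf: &[u16],
    quantizers: usize,
    frames: usize,
    start: usize,
    len: usize,
) -> Vec<i32> {
    assert_eq!(codes_qf.len(), quantizers * frames);
    let end = (start + len).min(frames);
    let mut out = Vec::with_capacity(end.saturating_sub(start) * quantizers);
    for t in start..end {
        for q in 0..quantizers {
            // Source index is row-major [q][t]; destination is row-major [t][q].
            out.push(i32::from(codes_qf[q * frames + t]));
        }
    }
    out
}
```

For a 50-frame artifact with 17-frame chunks, this schedule yields chunks of 17, 17, and 16 frames.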

Container Format

The current .mossnano container is intentionally tiny:

| Bytes | Field |
|---|---|
| 0..8 | ASCII magic MOSSNANO |
| 8..12 | sample_rate, little-endian u32 |
| 12..16 | channels, little-endian u32 |
| 16..20 | original_samples, little-endian u32 |
| 20..24 | quantizers, little-endian u32 |
| 24..28 | frames, little-endian u32 |
| 28..32 | codebook_size, little-endian u32 |
| 32.. | LSB-first packed RVQ codes |
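
A minimal header reader for this layout might look like the following. The `Header` struct, `parse_header` function, and error strings are hypothetical names for illustration and are not taken from the crate.

```rust
/// Fixed-size header fields, in the order they appear in the container.
#[derive(Debug, PartialEq)]
struct Header {
    sample_rate: u32,
    channels: u32,
    original_samples: u32,
    quantizers: u32,
    frames: u32,
    codebook_size: u32,
}

/// Size of the fixed header; packed codes start at this offset.
const HEADER_LEN: usize = 32;

/// Parse the 32-byte header from the start of a .mossnano byte buffer.
fn parse_header(bytes: &[u8]) -> Result<Header, String> {
    if bytes.len() < HEADER_LEN {
        return Err(format!("expected at least {HEADER_LEN} bytes, got {}", bytes.len()));
    }
    if &bytes[0..8] != b"MOSSNANO" {
        return Err("bad magic".to_string());
    }
    // Read one little-endian u32 at a byte offset inside the header.
    let u32_at = |o: usize| u32::from_le_bytes([bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]]);
    Ok(Header {
        sample_rate: u32_at(8),
        channels: u32_at(12),
        original_samples: u32_at(16),
        quantizers: u32_at(20),
        frames: u32_at(24),
        codebook_size: u32_at(28),
    })
}
```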

For MOSS Nano RVQ16, codebook_size = 1024, so each token is packed into 10 bits. The packed code order is [quantizer, frame].
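
To illustrate the 10-bit packing, here is a sketch of an LSB-first packer and unpacker. It assumes "LSB-first" means each token's lowest bit goes into the lowest free bit of the current byte, with tokens spilling into the next byte as needed; the exact bit order used by the crate should be checked against its source. Function names are hypothetical.

```rust
/// Pack `codes` into bytes using `bits` bits per code, LSB-first.
fn pack_lsb_first(codes: &[u16], bits: u32) -> Vec<u8> {
    assert!(bits >= 1 && bits <= 16);
    let mask = (1u32 << bits) - 1;
    let mut out = Vec::with_capacity((codes.len() * bits as usize + 7) / 8);
    let (mut acc, mut acc_bits) = (0u32, 0u32);
    for &code in codes {
        acc |= (u32::from(code) & mask) << acc_bits;
        acc_bits += bits;
        while acc_bits >= 8 {
            out.push((acc & 0xFF) as u8);
            acc >>= 8;
            acc_bits -= 8;
        }
    }
    if acc_bits > 0 {
        out.push(acc as u8); // final partial byte, zero-padded in the high bits
    }
    out
}

/// Unpack `count` codes of `bits` bits each; returns None if the input is too short.
fn unpack_lsb_first(packed: &[u8], count: usize, bits: u32) -> Option<Vec<u16>> {
    if bits == 0 || bits > 16 {
        return None;
    }
    let mask = (1u32 << bits) - 1;
    let mut out = Vec::with_capacity(count);
    let (mut acc, mut acc_bits) = (0u32, 0u32);
    let mut bytes = packed.iter();
    for _ in 0..count {
        while acc_bits < bits {
            acc |= u32::from(*bytes.next()?) << acc_bits;
            acc_bits += 8;
        }
        out.push((acc & mask) as u16);
        acc >>= bits;
        acc_bits -= bits;
    }
    Some(out)
}
```

With 10 bits per token, four tokens fit exactly into five bytes, so an RVQ16 artifact with F frames carries ceil(16 × F × 10 / 8) = 20 × F payload bytes after the header.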

Native CLI

Inspect a .mossnano artifact:

cargo run -- info path/to/file.mossnano

Unpack codes to little-endian u16 values:

cargo run -- unpack-u16le path/to/file.mossnano target/codes.u16le

Browser And Node Setup

Install Rust and Node dependencies:

rustup target add wasm32-unknown-unknown
cargo install wasm-bindgen-cli --version 0.2.121
cd web
npm install
cd ..

Build the WASM package:

scripts/build-wasm.sh

Run the Rust/WASM smoke test:

node scripts/wasm-smoke.mjs

ONNX Weights

Weights are intentionally not committed. Download the official browser-oriented ONNX bundle into weights/:

scripts/download-onnx.sh

Expected files:

  • moss_audio_tokenizer_decode_full.onnx
  • moss_audio_tokenizer_decode_step.onnx
  • moss_audio_tokenizer_decode_shared.data
  • moss_audio_tokenizer_encode.onnx
  • moss_audio_tokenizer_encode.data
  • codec_browser_onnx_meta.json

Decode-only playback needs the decoder graph and shared decoder data, about 45 MB total. Encode plus decode needs about 90 MB.

Decode From Node

Decode a .mossnano artifact with the 1.333-second target. This uses the stateful decode_step graph by default:

node scripts/decode-node.mjs \
  --input path/to/file.mossnano \
  --output target/decoded.wav \
  --chunk-seconds 1.333

Decode with the 1.8-second target:

node scripts/decode-node.mjs \
  --input path/to/file.mossnano \
  --output target/decoded-1p8.wav \
  --chunk-seconds 1.8

You can also pass exact token-frame chunks:

node scripts/decode-node.mjs --input path/to/file.mossnano --chunk-frames 23

For a whole-file reference pass through decode_full.onnx, set --chunk-frames to the artifact's full token frame count (50 in this example) and opt into the full decoder:

node scripts/decode-node.mjs \
  --input path/to/file.mossnano \
  --output target/decoded-full.wav \
  --chunk-frames 50 \
  --decoder full

Compare a chunked output against a reference and inspect chunk joins:

node scripts/compare-wav-boundaries.mjs \
  --reference target/decoded-full.wav \
  --candidate target/decoded.wav \
  --chunk-frames 17

Browser Prototype

Start a local server:

cd web
npm run serve

Open http://localhost:8765/web/, choose a .mossnano file, and leave the model root as:

../weights/MOSS-Audio-Tokenizer-Nano-ONNX/

The page loads the Rust WASM package, fetches the ONNX decoder graph and shared external data, decodes chunk by chunk, and creates a playable WAV blob in the browser.
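
The final step, turning accumulated decoder output into a PCM16 WAV, can be sketched as below. This is an illustration rather than the crate's implementation: the function name is hypothetical, the input is assumed to be planar f32 laid out as [channel][frame] in the range -1.0 to 1.0, and the scaling by 32767 with clamping is one common convention among several.

```rust
/// Convert planar f32 audio into a canonical 44-byte-header PCM16 WAV file.
fn planar_f32_to_pcm16_wav(planar: &[f32], channels: u16, frames: usize, sample_rate: u32) -> Vec<u8> {
    assert_eq!(planar.len(), usize::from(channels) * frames);
    let block_align = channels * 2; // bytes per interleaved sample frame
    let data_len = (frames * usize::from(block_align)) as u32;

    let mut wav = Vec::with_capacity(44 + data_len as usize);
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = integer PCM
    wav.extend_from_slice(&channels.to_le_bytes());
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&(sample_rate * u32::from(block_align)).to_le_bytes()); // byte rate
    wav.extend_from_slice(&block_align.to_le_bytes());
    wav.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());

    // Interleave: for each frame, emit one sample per channel.
    for frame in 0..frames {
        for ch in 0..usize::from(channels) {
            let sample = planar[ch * frames + frame].clamp(-1.0, 1.0);
            let pcm = (sample * 32767.0).round() as i16;
            wav.extend_from_slice(&pcm.to_le_bytes());
        }
    }
    wav
}
```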

Development

Run the native tests:

cargo test

Run formatting and JS syntax checks:

cargo fmt --check
node --check scripts/decode-node.mjs
node --check web/mossnano-player.js

Generated files and downloaded weights are ignored by git:

  • target/
  • weights/
  • web/node_modules/
  • web/pkg/

License

MIT
