timyjsong/throughline

🎧 Throughline — music discovery by sound, not by tags

A from-scratch music discovery engine that finds songs by how they actually sound — using self-supervised audio embeddings and nearest-neighbor search — with an interactive, steerable personalization loop. Built to scratch a real itch: the kind of discovery streaming apps stopped giving me.


What it does

  • Sonic similarity, not metadata. Every track is embedded into a vector by a music foundation model, so "similar" means it actually sounds alike — independent of genre tags, popularity, or who-else-listened-to-it. It surfaces the obscure deep cut that fits, not just the popular thing everyone already clicks.
  • Multi-seed "throughline" search. Give it several songs you love and it finds the shared vibe: the centroid of their embeddings, the un-nameable thread between them. (Spotify once offered this feature and then removed it, and a lot of people missed it.)
  • Steerable, transparent personalization. Thumb tracks up/down and the list re-ranks live against a model of your taste, with explicit knobs for pocket breadth and explore vs. exploit, plus a "why this track" signal for every result. Discovery you drive, not a black-box feed.

Demo

[Screenshots: discovery results · live personalization app]

How it works

iTunes 30s previews ──► decode (PyAV) ──► embed: MusicFM ──► L2-normalized vectors
                                                                      │
                          results ◄── MMR diversity + dedup ◄── cosine NN / multi-seed centroid

The discovery engine builds a dense corpus from free iTunes previews, decodes each to a waveform, and embeds it with MusicFM (a self-supervised music foundation model) into a 1024-d vector. Search is cosine nearest-neighbor — single-seed, or the centroid of several seeds for the multi-seed throughline — with MMR for intra-list diversity and embedding-space de-duplication of re-releases/remasters.
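A minimal sketch of the search step described above, assuming the embeddings sit in an in-memory NumPy matrix of unit-length rows. The function names and the `dup_threshold` cutoff are illustrative assumptions, not the repository's actual code; the real de-duplication rule may differ.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def throughline_search(emb: np.ndarray, seed_ids: list[int], k: int = 10,
                       dup_threshold: float = 0.98) -> list[int]:
    """Return up to k track indices nearest to the centroid of the seeds.

    emb: (n_tracks, dim) matrix of L2-normalized embeddings.
    dup_threshold: hypothetical cutoff. A candidate whose cosine similarity
    to a seed or to an already-kept result reaches it is treated as a
    re-release/remaster and skipped.
    """
    # With a single seed the centroid is just that seed's vector.
    query = l2_normalize(emb[seed_ids].mean(axis=0))
    sims = emb @ query  # dot product equals cosine because rows are unit-norm
    seeds = set(seed_ids)
    kept: list[int] = []
    for idx in np.argsort(-sims):  # most similar first
        idx = int(idx)
        if idx in seeds:
            continue
        reference = list(seeds) + kept
        if np.max(emb[reference] @ emb[idx]) >= dup_threshold:
            continue  # near-duplicate in embedding space
        kept.append(idx)
        if len(kept) == k:
            break
    return kept
```

Calling `throughline_search(emb, [12, 340, 977], k=20)` would return the 20 nearest non-duplicate tracks to the centroid of those three (hypothetical) seed indices. A brute-force matrix product like this is usually fast enough at a corpus of roughly 16k tracks.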

The personalization layer is a stateless FastAPI service over the cached embeddings (CPU-only — no model at serve time): an anchor-set model of the user's taste (seeds + thumbed tracks, recency-decayed), soft-top-k relevance scoring, and MMR-driven exploration so the loop discovers instead of collapsing into an echo chamber.
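As a rough illustration of the MMR step, here is the standard greedy Maximal Marginal Relevance loop over unit-length embeddings. The name `mmr_rerank` and the `lam` parameter are assumptions for this sketch; how the project maps its explore/exploit knob onto MMR is not specified here.

```python
import numpy as np

def mmr_rerank(emb: np.ndarray, relevance: np.ndarray, candidates: list[int],
               k: int = 10, lam: float = 0.7) -> list[int]:
    """Greedy MMR: pick k items, trading relevance against redundancy.

    emb: (n_tracks, dim) L2-normalized embeddings.
    relevance: (n_tracks,) relevance score per track.
    lam: 1.0 means pure relevance; lower values penalize similarity to
         items already selected, which makes the list more diverse.
    """
    selected: list[int] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        if selected:
            # Highest cosine similarity of each pooled item to anything chosen so far.
            redundancy = (emb[pool] @ emb[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(pool))
        scores = lam * relevance[pool] - (1.0 - lam) * redundancy
        selected.append(pool.pop(int(np.argmax(scores))))
    return selected
```

With `lam=1.0` the output is simply the candidates sorted by relevance; lowering `lam` lets a less relevant but sonically different track displace a near-copy of something already in the list.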

The engineering, not just the code

This began as a spike to test one question — does audio-embedding similarity actually feel right on real taste? — and the most valuable work turned out to be the decisions, documented as I went:

  • Model selection — A/B-tested four embedding models (LAION-CLAP, MuQ, MERT, MusicFM) by ear on a real corpus. Music-specialist models clearly beat the generic audio-text model; I landed on MusicFM — self-hostable, 1024-d, and a match for the best by ear. → docs/model-selection.md
  • License & data-provenance due diligence — most strong music models ship non-commercial weights, or train on non-commercial data, even when their code is permissive. I traced the weights and training-data licenses across the whole landscape so the constraints were explicit rather than assumed. → docs/licensing.md
  • Product validation — before over-building, I ran market-demand and competitive analysis. The honest finding: demand is real but niche, the space is a graveyard for indie consumer apps, and the incumbent is moving into the exact wedge — so I deliberately scoped this as a research / portfolio project instead of chasing a commercial build. Knowing when not to build is part of the engineering. → docs/market-validation.md

Personalization: design and honest limits

The taste model is an anchor set (seeds + thumbed-up tracks, with recency decay), not a single drifting centroid; candidates are scored by soft-top-k similarity to that set, then MMR keeps the list diverse. Two knobs separate the axes that a single "more like this" slider conflates: pocket breadth (one tight pocket ↔ all your liked pockets) and explore/exploit (familiar ↔ novel). Thumbs-down hard-excludes the exact track.
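One way to read the anchor-set scoring, sketched under stated assumptions: "soft-top-k" is not defined in this README, so the code below interprets it as a softmax-weighted average of a candidate's similarities to the anchors, with recency decay acting as a prior. The names `recency_weights`, `soft_topk_scores`, `tau`, and `decay` are all hypothetical, and `tau` only stands in for a breadth-like knob.

```python
import numpy as np

def recency_weights(n_anchors: int, decay: float = 0.9) -> np.ndarray:
    """Weights for anchors ordered oldest to newest; the newest gets 1.0."""
    ages = np.arange(n_anchors - 1, -1, -1)
    return decay ** ages

def soft_topk_scores(emb: np.ndarray, anchor_ids: list[int], tau: float = 0.1,
                     decay: float = 0.9, excluded: tuple = ()) -> np.ndarray:
    """Score every track against the anchor set.

    emb: (n_tracks, dim) L2-normalized embeddings.
    tau: small values approach "similarity to the best-matching anchor";
         large values approach the recency-weighted mean over all anchors.
    excluded: thumbed-down track indices, hard-excluded from results.
    """
    sims = emb @ emb[anchor_ids].T  # (n_tracks, n_anchors) cosine similarities
    logits = sims / tau + np.log(recency_weights(len(anchor_ids), decay))
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    scores = (attn * sims).sum(axis=1)
    # Anchors themselves and thumbed-down tracks never come back as results.
    scores[np.asarray(list(anchor_ids), dtype=int)] = -np.inf
    scores[np.asarray(list(excluded), dtype=int)] = -np.inf
    return scores
```

The resulting scores could then be passed to a diversity re-ranker as the relevance term. Unlike a single centroid, this keeps a track that matches one liked pocket closely from being outscored by a track that sits blandly between several pockets, at least when `tau` is small.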

The honest ceiling: feedback can only personalize within what the audio embedding expresses. If part of why you love a song isn't sonic, no amount of content-based feedback reaches it — that needs lyrics/metadata or collaborative filtering (which needs a crowd a solo project doesn't have). The spike made that limit visible, which is the point of a spike.

Tech

Python · NumPy · PyTorch (MusicFM inference) · FastAPI · vanilla JS · PyAV · iTunes Search API · per-model embedding cache + cosine NN over a ~16k-track corpus.

Running it

# setup (GPU box for embedding; the web app itself is CPU-only)
./setup.ps1
# build the corpus + embeddings
./run.ps1
# launch the interactive personalization app -> http://localhost:8000
./run_app.ps1

See docs/running.md for details (Windows/CUDA notes, model swap, knobs).

What I'd do next

  • A quick embedding-layer sweep (I used MusicFM layer 7 by default) and multi-window pooling per track.
  • A larger, denser catalog and an ANN index (FAISS / pgvector) to scale past the in-memory matrix.
  • A learned global re-ranking head on top of the frozen backbone — trainable once there's aggregate feedback — as the bridge between the content-only ceiling and real personalization.
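For the multi-window pooling idea in the list above, a minimal sketch might look like the following. It assumes mean pooling over fixed-size windows followed by re-normalization; `embed_fn` is a hypothetical stand-in for whatever callable wraps the embedding model, and `window` and `hop` are in samples.

```python
import numpy as np
from typing import Callable

def pooled_track_embedding(waveform: np.ndarray,
                           embed_fn: Callable[[np.ndarray], np.ndarray],
                           window: int, hop: int) -> np.ndarray:
    """Embed overlapping windows of a track and mean-pool them.

    embed_fn: hypothetical callable mapping a 1-D waveform slice to a 1-D
              embedding vector (e.g. a wrapper around the model).
    Returns a single L2-normalized vector for the whole track.
    """
    last_start = max(len(waveform) - window, 0)
    starts = range(0, last_start + 1, hop)
    embs = np.stack([embed_fn(waveform[s:s + window]) for s in starts])
    pooled = embs.mean(axis=0)
    return pooled / max(float(np.linalg.norm(pooled)), 1e-12)
```

Mean pooling is only one option; max pooling or keeping several vectors per track are alternatives worth comparing in the same sweep.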

A research / portfolio project. Uses third-party models under their respective licenses; not a commercial product. Built by Tim Song — my first end-to-end ML system, as part of moving from backend into AI engineering.
