🎧 Throughline — music discovery by sound, not by tags

A from-scratch music discovery engine that finds songs by how they actually sound — using self-supervised audio embeddings and nearest-neighbor search — with an interactive, steerable personalization loop. Built to scratch a real itch: the kind of discovery streaming apps stopped giving me.

What it does

Sonic similarity, not metadata. Every track is embedded into a vector by a music foundation model, so "similar" means it actually sounds alike — independent of genre tags, popularity, or who-else-listened-to-it. It surfaces the obscure deep cut that fits, not just the popular thing everyone already clicks.
Multi-seed "throughline" search. Give it several songs you love and it finds the shared vibe, the un-nameable thread between them, by searching from the centroid of their embeddings. (Spotify once offered a feature like this, then removed it, and a lot of people missed it.)
Steerable, transparent personalization. Thumb tracks up/down and the list re-ranks live against a model of your taste, with explicit knobs for pocket breadth and explore vs. exploit, plus a "why this track" signal for every result. Discovery you drive, not a black-box feed.

Demo

[Screenshots: discovery results · live personalization]

How it works

iTunes 30s previews ──► decode (PyAV) ──► embed: MusicFM ──► L2-normalized vectors
                                                                      │
                          results ◄── MMR diversity + dedup ◄── cosine NN / multi-seed centroid

The discovery engine builds a dense corpus from free iTunes previews, decodes each to a waveform, and embeds it with MusicFM (a self-supervised music foundation model) into a 1024-d vector. Search is cosine nearest-neighbor — single-seed, or the centroid of several seeds for the multi-seed throughline — with MMR for intra-list diversity and embedding-space de-duplication of re-releases/remasters.
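
The search step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the function names (`l2_normalize`, `throughline_search`, `dedup`) and the `0.98` de-duplication threshold are assumptions made for the example, and the corpus is assumed to be an in-memory matrix of L2-normalized embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def throughline_search(corpus, seed_ids, k=5):
    """Rank tracks by cosine similarity to the centroid of the seed embeddings.

    corpus:   (n_tracks, dim) array of L2-normalized embeddings.
    seed_ids: list of row indices; a single seed gives plain single-seed search.
    """
    query = l2_normalize(corpus[seed_ids].mean(axis=0))
    scores = corpus @ query
    scores[seed_ids] = -np.inf            # never return the seeds themselves
    k = min(k, len(scores) - len(set(seed_ids)))
    top = np.argsort(-scores)[:k]
    return top.tolist()

def dedup(corpus, ranked_ids, threshold=0.98):
    """Drop any result that is near-identical to a higher-ranked one.

    The threshold is a hypothetical value, not one taken from the project.
    """
    kept = []
    for i in ranked_ids:
        if all(corpus[i] @ corpus[j] < threshold for j in kept):
            kept.append(i)
    return kept

# Toy 2-d corpus; rows 2 and 3 stand in for a track and its remaster.
corpus = l2_normalize([[1, 0], [0, 1], [1, 1], [1, 1.01], [-1, 0]])
ranked = throughline_search(corpus, seed_ids=[0, 1], k=3)
unique = dedup(corpus, ranked)
```

A full `argsort` is fine at the ~16k-track scale mentioned under Tech; a much larger corpus would likely want a partial sort or an ANN index instead.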

The personalization layer is a stateless FastAPI service over the cached embeddings (CPU-only — no model at serve time): an anchor-set model of the user's taste (seeds + thumbed tracks, recency-decayed), soft-top-k relevance scoring, and MMR-driven exploration so the loop discovers instead of collapsing into an echo chamber.
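
For readers unfamiliar with MMR (Maximal Marginal Relevance), here is a minimal greedy sketch over unit-length vectors. It shows the general technique only; the function name `mmr_rerank` and the `lam` trade-off parameter are illustrative assumptions and may not match how the service parameterizes its explore/exploit knob.

```python
import numpy as np

def mmr_rerank(query, candidates, k=10, lam=0.7):
    """Greedy Maximal Marginal Relevance.

    query:      (dim,) unit-length vector.
    candidates: (n, dim) array of unit-length vectors.
    lam:        1.0 is pure relevance; lower values penalize items that
                resemble something already selected.
    Returns the selected candidate indices in pick order.
    """
    relevance = candidates @ query
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        if selected:
            # Highest similarity of each remaining item to anything already picked.
            sims = candidates[remaining] @ candidates[selected].T
            redundancy = sims.max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        mmr = lam * relevance[remaining] - (1.0 - lam) * redundancy
        best = remaining[int(np.argmax(mmr))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the result is an ordinary relevance ranking; lowering `lam` pushes near-duplicates of already-picked tracks down the list, which is what keeps a feedback loop from collapsing onto one sound.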

The engineering, not just the code

This began as a spike to test one question — does audio-embedding similarity actually feel right on real taste? — and the most valuable work turned out to be the decisions, documented as I went:

Model selection — A/B-tested four embedding models (LAION-CLAP, MuQ, MERT, MusicFM) by ear on a real corpus. Music-specialist models clearly beat the generic audio-text model; I landed on MusicFM — self-hostable, 1024-d, and a match for the best by ear. → docs/model-selection.md
License & data-provenance due diligence — most strong music models ship non-commercial weights, or train on non-commercial data, even when their code is permissive. I traced the weights and training-data licenses across the whole landscape so the constraints were explicit rather than assumed. → docs/licensing.md
Product validation — before over-building, I ran market-demand and competitive analysis. The honest finding: demand is real but niche, the space is a graveyard for indie consumer apps, and the incumbent is moving into the exact wedge — so I deliberately scoped this as a research / portfolio project instead of chasing a commercial build. Knowing when not to build is part of the engineering. → docs/market-validation.md

Personalization: design and honest limits

The taste model is an anchor set (seeds + thumbed-up tracks, with recency decay), not a single drifting centroid; candidates are scored by soft-top-k similarity to that set, then MMR keeps the list diverse. Two knobs separate the axes that a single "more like this" slider conflates: pocket breadth (one tight pocket ↔ all your liked pockets) and explore/exploit (familiar ↔ novel). Thumbs-down hard-excludes the exact track.
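
The README does not spell out the exact soft-top-k formula, so the sketch below is one possible reading, clearly an assumption: each candidate's similarities to the anchors are pooled with softmax weights, and recency decay enters as a log-prior on each anchor. The names `anchor_set_scores`, `temperature`, and `half_life` are hypothetical. `temperature` plays a role similar to a breadth control (low values score a candidate by its single best-matching anchor, high values average over all anchors), but whether it corresponds to the project's pocket-breadth knob is not something the source confirms.

```python
import numpy as np

def anchor_set_scores(candidates, anchors, ages, temperature=0.1,
                      half_life=10.0, excluded=()):
    """Score candidates against an anchor set with softmax-weighted pooling.

    candidates: (n_cand, dim) unit-length embeddings.
    anchors:    (n_anchor, dim) unit-length embeddings (seeds + thumbs-up).
    ages:       per-anchor age, e.g. interactions since it was added.
    excluded:   candidate indices to hard-exclude (e.g. thumbs-down).
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    anchors = np.asarray(anchors, dtype=np.float64)
    ages = np.asarray(ages, dtype=np.float64)

    sims = candidates @ anchors.T                      # (n_cand, n_anchor)
    # Exponential recency decay, kept in log space to avoid underflow:
    # an anchor that is half_life old carries half the prior weight.
    log_recency = -np.log(2.0) * ages / half_life
    logits = sims / temperature + log_recency
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    scores = (weights * sims).sum(axis=1)
    if len(excluded) > 0:
        scores[list(excluded)] = -np.inf               # hard exclusion
    return scores
```

The scores from a function like this would then be fed to an MMR pass, so relevance comes from the anchor set and diversity from the re-ranker.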

The honest ceiling: feedback can only personalize within what the audio embedding expresses. If part of why you love a song isn't sonic, no amount of content-based feedback reaches it — that needs lyrics/metadata or collaborative filtering (which needs a crowd a solo project doesn't have). The spike made that limit visible, which is the point of a spike.

Tech

Python · NumPy · PyTorch (MusicFM inference) · FastAPI · vanilla JS · PyAV · iTunes Search API · per-model embedding cache + cosine NN over a ~16k-track corpus.

Running it

# setup (GPU box for embedding; the web app itself is CPU-only)
./setup.ps1
# build the corpus + embeddings
./run.ps1
# launch the interactive personalization app -> http://localhost:8000
./run_app.ps1

See docs/running.md for details (Windows/CUDA notes, model swap, knobs).

What I'd do next

A quick embedding-layer sweep (the current default is MusicFM layer 7) and multi-window pooling per track.
A larger, denser catalog and an ANN index (FAISS / pgvector) to scale past the in-memory matrix.
A learned global re-ranking head on top of the frozen backbone — trainable once there's aggregate feedback — as the bridge between the content-only ceiling and real personalization.
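
As a rough idea of what the multi-window pooling in the first item could look like, here is a minimal sketch that mean-pools several per-window embeddings into one track vector. It assumes the windows have already been embedded; `pool_windows` is a hypothetical helper, and mean pooling is only one of several reasonable choices.

```python
import numpy as np

def pool_windows(window_embeddings, eps=1e-12):
    """Mean-pool per-window embeddings into one unit-length track vector.

    window_embeddings: (n_windows, dim) array, one row per audio window.
    Each window is normalized first so that no single window dominates
    the average because of its magnitude.
    """
    w = np.asarray(window_embeddings, dtype=np.float64)
    w = w / np.maximum(np.linalg.norm(w, axis=1, keepdims=True), eps)
    track = w.mean(axis=0)
    return track / max(np.linalg.norm(track), eps)

# Two toy windows pointing in different directions.
track_vec = pool_windows([[2.0, 0.0], [0.0, 2.0]])
```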

A research / portfolio project. Uses third-party models under their respective licenses; not a commercial product. Built by Tim Song — my first end-to-end ML system, as part of moving from backend into AI engineering.
