A production-grade music recommendation engine built on the Spotify Tracks Dataset from Kaggle, featuring multiple recommendation strategies and a full evaluation framework.
spotify-tracks-dataset/
├── configs/
│   └── config.yaml              # All settings in one place
├── data/
│   ├── raw/
│   │   └── spotify_tracks.csv   # Kaggle dataset (place here)
│   └── processed/               # Auto-generated after training
├── models/                      # Saved recommender model
├── notebooks/                   # Jupyter exploration notebooks
├── outputs/                     # EDA plots, evaluation charts, CSV exports
├── src/
│   ├── data_info.py             # EDA & visualizations
│   ├── load_data.py             # Preprocessing pipeline
│   ├── recommender.py           # Core recommendation engine
│   └── evaluate.py              # Evaluation metrics & reports
└── main.py                      # CLI entry point

uv sync
uv run python main.py --help

pip install -r requirement.txt
python main.py --help

- Download from Kaggle: https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset
- Place the CSV at: data/raw/spotify_tracks.csv

python main.py --mode eda

Generates plots in outputs/:
- Genre distribution
- Audio feature histograms
- Correlation heatmap
- Popularity analysis
- Top artists

python main.py --mode train

- Cleans and preprocesses data
- Engineers new features (vibe_index, mood_index, etc.)
- Fits KNN + KMeans models
- Saves model to models/recommender.pkl
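
To make the preprocessing step more concrete, here is a minimal pure-Python sketch of min-max scaling, the `minmax` scaler option named in the config. The function name `minmax_scale_columns` and the sample rows are hypothetical; the actual pipeline in src/load_data.py is not shown here and may well rely on a library scaler instead.

```python
from typing import Dict, List


def minmax_scale_columns(
    rows: List[Dict[str, float]], columns: List[str]
) -> List[Dict[str, float]]:
    """Rescale each listed column to [0, 1]; constant columns map to 0.0."""
    scaled = [dict(row) for row in rows]  # copy so the input is left untouched
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in scaled:
            row[col] = (row[col] - lo) / span if span else 0.0
    return scaled


# Hypothetical rows: tempo is in BPM and loudness in dB, so the raw
# columns live on very different scales before rescaling.
tracks = [
    {"tempo": 90.0, "loudness": -12.0},
    {"tempo": 120.0, "loudness": -6.0},
    {"tempo": 150.0, "loudness": -3.0},
]
scaled_tracks = minmax_scale_columns(tracks, ["tempo", "loudness"])
```

Scaling matters here because cosine similarity would otherwise be dominated by whichever feature has the largest raw magnitude.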

Content-based (default, cosine similarity):
python main.py --mode recommend --track "Blinding Lights" --n 10

KNN-based (faster for large datasets):
python main.py --mode recommend --track "Shape of You" --n 10 --method knn

Cluster-based (same musical neighborhood):
python main.py --mode recommend --track "Levitating" --n 10 --method cluster

Same genre only:
python main.py --mode recommend --track "Blinding Lights" --n 10 --same_genre

Exclude same artist (more diverse):
python main.py --mode recommend --track "Blinding Lights" --n 10 --exclude_artist

Save results to CSV:
python main.py --mode recommend --track "Blinding Lights" --n 10 --save_output

python main.py --mode mood --mood happy --n 10
python main.py --mode mood --mood energetic --n 15
python main.py --mode mood --mood calm --genre "acoustic" --n 10

Available moods: happy, sad, energetic, calm, party, focus, romantic, aggressive
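
As a rough illustration of how a mood query could work (feature range filters followed by a popularity sort), consider the sketch below. The ranges in `MOOD_RANGES`, the `genre` key, and the function name `recommend_by_mood` are all assumptions made for the example; the project's real thresholds and column names are not documented here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical feature ranges per mood (inclusive bounds on 0-1 features).
MOOD_RANGES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "happy": {"valence": (0.6, 1.0), "energy": (0.5, 1.0)},
    "calm": {"energy": (0.0, 0.4), "acousticness": (0.5, 1.0)},
}


def recommend_by_mood(
    tracks: List[dict], mood: str, n: int = 10, genre: Optional[str] = None
) -> List[dict]:
    """Keep tracks inside every range for the mood, then sort by popularity."""
    ranges = MOOD_RANGES[mood]
    matches = [
        t
        for t in tracks
        if all(lo <= t[feat] <= hi for feat, (lo, hi) in ranges.items())
        and (genre is None or t["genre"] == genre)
    ]
    matches.sort(key=lambda t: t["popularity"], reverse=True)
    return matches[:n]


# Made-up catalog rows, for illustration only.
catalog = [
    {"name": "A", "valence": 0.9, "energy": 0.8, "acousticness": 0.1,
     "popularity": 70, "genre": "pop"},
    {"name": "B", "valence": 0.7, "energy": 0.6, "acousticness": 0.2,
     "popularity": 85, "genre": "pop"},
    {"name": "C", "valence": 0.3, "energy": 0.2, "acousticness": 0.9,
     "popularity": 40, "genre": "acoustic"},
    {"name": "D", "valence": 0.5, "energy": 0.3, "acousticness": 0.8,
     "popularity": 55, "genre": "ambient"},
]
happy = [t["name"] for t in recommend_by_mood(catalog, "happy")]
calm_acoustic = [t["name"] for t in recommend_by_mood(catalog, "calm", genre="acoustic")]
```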
python main.py --mode playlist --seeds "Blinding Lights,Shape of You,Levitating" --n_per_seed 5
python main.py --mode playlist --seeds "Bohemian Rhapsody,Hotel California" --n_per_seed 8 --save_output

# Single track evaluation
python main.py --mode evaluate --track "Blinding Lights"
# Batch evaluation across multiple tracks
python main.py --mode evaluate --batch "Blinding Lights,Shape of You,Levitating,Stay"

Metrics reported:
- Intra-list similarity: diversity of recommendations
- Genre coverage: genre entropy
- Popularity stats: mainstream vs niche balance
- Serendipity score: unexpectedness
- Feature drift: how far recs stray from seed
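
Two of these metrics have common textbook definitions, sketched below: intra-list similarity as the mean pairwise cosine similarity of the recommended tracks, and genre entropy as Shannon entropy in bits. Whether src/evaluate.py uses exactly these formulas (including the base-2 logarithm) is an assumption, and the function names are hypothetical. Serendipity and feature drift are left out because their definitions are not given.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def intra_list_similarity(vectors: List[Sequence[float]]) -> float:
    """Mean cosine similarity over all unordered pairs (lower = more diverse)."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def genre_entropy(genres: List[str]) -> float:
    """Shannon entropy (bits) of the genre distribution in a recommendation list."""
    total = len(genres)
    counts = Counter(genres)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up inputs: two identical vectors plus one orthogonal vector,
# and a list split evenly between two genres.
ils = intra_list_similarity([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
entropy = genre_entropy(["pop", "pop", "rock", "rock"])
```

Under these definitions, a list spread evenly over k genres has an entropy of log2(k) bits, and a single-genre list has an entropy of zero.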

python main.py --mode search --query "blinding"

python main.py --mode info --track "Blinding Lights"

Beyond raw Spotify audio features, we compute:

| Feature | Formula | Meaning |
|---|---|---|
| vibe_index | (energy + danceability) / 2 | Overall vibe |
| mood_index | valence × energy | Emotional energy |
| acoustic_electric | acousticness - energy | Acoustic spectrum |
| tempo_bucket | bucketed tempo | Tempo category |
| popularity_tier | bucketed popularity | Mainstream level |
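
The formulas in the table can be sketched as a small function. The three arithmetic features follow the table directly; the bucket edges and labels (`TEMPO_EDGES`, `POPULARITY_EDGES`, and their label lists) and the function name `engineer_features` are hypothetical, since the table only says that tempo and popularity are bucketed, not where the cut points lie.

```python
from bisect import bisect_right
from typing import Dict, List, Union

# Hypothetical cut points; the project's real bucket edges are not documented.
TEMPO_EDGES: List[float] = [90.0, 120.0, 150.0]
TEMPO_LABELS: List[str] = ["slow", "medium", "fast", "very_fast"]
POPULARITY_EDGES: List[float] = [30.0, 60.0]
POPULARITY_LABELS: List[str] = ["niche", "moderate", "mainstream"]


def bucket(value: float, edges: List[float], labels: List[str]) -> str:
    """Return the label of the bin that `value` falls into (left-closed bins)."""
    return labels[bisect_right(edges, value)]


def engineer_features(track: Dict[str, float]) -> Dict[str, Union[float, str]]:
    """Add the derived features from the table to a copy of the track."""
    out: Dict[str, Union[float, str]] = dict(track)
    out["vibe_index"] = (track["energy"] + track["danceability"]) / 2
    out["mood_index"] = track["valence"] * track["energy"]
    out["acoustic_electric"] = track["acousticness"] - track["energy"]
    out["tempo_bucket"] = bucket(track["tempo"], TEMPO_EDGES, TEMPO_LABELS)
    out["popularity_tier"] = bucket(
        track["popularity"], POPULARITY_EDGES, POPULARITY_LABELS
    )
    return out


# Made-up feature values for illustration.
example = engineer_features(
    {"energy": 0.8, "danceability": 0.6, "valence": 0.5,
     "acousticness": 0.1, "tempo": 171.0, "popularity": 90.0}
)
```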
| Method | Description | Best For |
|---|---|---|
| Content-Based | Cosine similarity on scaled audio features | Default |
| KNN | sklearn NearestNeighbors (brute, cosine) | Speed on large data |
| Cluster | Same KMeans cluster + cosine ranking | Musical neighborhood |
| Mood-Based | Feature range filters + popularity sort | Discovery |
| Playlist | Multi-seed aggregation + deduplication | Session planning |
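
To illustrate the content-based row, the sketch below ranks every other track by cosine similarity to a seed track's feature vector. It is a pure-Python stand-in with a hypothetical `recommend_similar` function and made-up vectors; the engine in src/recommender.py presumably works on the full scaled feature matrix rather than plain lists.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_similar(
    seed: str, catalog: Dict[str, Sequence[float]], n: int = 10
) -> List[Tuple[str, float]]:
    """Return the n tracks most similar to `seed`, excluding the seed itself."""
    seed_vec = catalog[seed]
    scored = [
        (name, cosine_similarity(seed_vec, vec))
        for name, vec in catalog.items()
        if name != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical scaled feature vectors (e.g. energy, danceability, valence).
catalog = {
    "seed_track": [0.8, 0.7, 0.6],
    "close_track": [0.8, 0.7, 0.5],
    "far_track": [0.1, 0.9, 0.1],
}
top = recommend_similar("seed_track", catalog, n=2)
```

The KNN and cluster variants in the table narrow the candidate set before or instead of this full scan, which is where their speed advantage on large catalogs would come from.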
All methods support an optional popularity boost: a weighted blend of the similarity score and track popularity that surfaces well-known similar tracks.
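
One plausible reading of this blend is a linear interpolation controlled by `popularity_weight`, where 0 gives pure similarity and 1 gives pure popularity. The sketch below assumes that form and rescales Spotify's 0-100 popularity to 0-1; the function name `blended_score` is hypothetical, and the engine's exact formula may differ.

```python
def blended_score(
    similarity: float, popularity: float, popularity_weight: float = 0.15
) -> float:
    """Linear blend of similarity (0-1) and popularity (0-100, rescaled to 0-1)."""
    if not 0.0 <= popularity_weight <= 1.0:
        raise ValueError("popularity_weight must be between 0 and 1")
    return (1.0 - popularity_weight) * similarity + popularity_weight * (
        popularity / 100.0
    )


# Made-up candidates as (name, similarity, popularity) tuples.
candidates = [("niche_track", 0.95, 10), ("popular_track", 0.92, 90)]

# With the boost, the slightly less similar but far more popular track wins.
ranked = sorted(
    candidates, key=lambda c: blended_score(c[1], c[2]), reverse=True
)

# With popularity_weight=0 the ranking falls back to pure similarity.
ranked_no_boost = sorted(
    candidates, key=lambda c: blended_score(c[1], c[2], 0.0), reverse=True
)
```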
Key settings you can tune:
recommendation:
  default_n_recommendations: 10
  popularity_boost: true
  popularity_weight: 0.15   # 0 = pure similarity, 1 = pure popularity
model:
  knn:
    n_neighbors: 20
  clustering:
    n_clusters: 20
preprocessing:
  scaler: "minmax"          # or "standard"

============================================================
EVALUATION REPORT: 'Blinding Lights' (content)
============================================================
Recommendations : 10
Intra-list Similarity: 0.9241 (lower = more diverse)
Serendipity Score : 0.3120
Genre Coverage:
Unique Genres : 3
Genre Entropy : 1.5849
Popularity Stats:
Mean : 71.4
Mainstream (≥60): 80.0%
Feature Drift from Seed:
Mean Similarity: 0.9241
Mean Distance : 0.0759
============================================================
- Collaborative filtering (user-track matrix)
- Transformer-based track embeddings
- FastAPI / Streamlit web interface
- Spotify API integration (live track lookup)
- User session personalization