🎵 Spotify Music Intelligence Dashboard

An end-to-end music analytics project that analyzes 15,000 tracks across 12 genres and 79 artists, combining exploratory data analysis (EDA), K-Means clustering into music archetypes, and Gradient Boosting popularity prediction. The dataset is generated by src/generate_data.py.

🎯 Project Highlights

Metric	Value
🎵 Total Tracks	15,000
🎤 Artists	79
🎶 Genres	12
📡 Total Streams	457 Billion
🤖 Music Archetypes (K-Means clusters)	6
📈 Popularity Prediction R²	0.597
🎛️ Audio Features Analyzed	8

🗂️ Project Structure

spotify-music-intelligence/
├── 📁 data/
│   └── spotify_tracks.csv          # 15K tracks with 22 features
├── 📁 src/
│   ├── generate_data.py            # Realistic music dataset generator
│   ├── eda_viz.py                  # 4 professional EDA dashboards
│   └── ml_pipeline.py              # K-Means + GBM + PCA pipeline
├── 📁 outputs/
│   ├── 📁 figures/
│   │   ├── 01_music_intelligence_dashboard.png
│   │   ├── 02_audio_features_deepdive.png
│   │   ├── 03_popularity_intelligence.png
│   │   ├── 04_genre_evolution.png
│   │   └── 05_ml_clustering_dashboard.png
│   └── 📁 models/
│       ├── popularity_gbm.pkl
│       ├── kmeans_archetypes.pkl
│       └── scaler.pkl
├── requirements.txt
└── README.md

🚀 Quick Start

git clone https://github.com/Munishx01/spotify-music-intelligence.git
cd spotify-music-intelligence
pip install -r requirements.txt

python src/generate_data.py   # Generate dataset
python src/eda_viz.py         # Run EDA visualizations
python src/ml_pipeline.py     # Train ML models

🧠 ML Pipeline

1. K-Means Music Archetype Clustering

Clusters the 15,000 tracks into 6 music archetypes based on 8 audio features. The number of clusters (k = 6) was chosen with the Elbow Method.

Archetype	Description	Key Features
🔥 Club Bangers	High energy, danceable	Energy>0.75, Dance>0.72
🎸 Dark Intensity	Aggressive, intense	Energy>0.80, Valence<0.45
🎻 Acoustic Soul	Organic, unplugged	Acousticness>0.60
☀️ Feel Good Vibes	Positive, relaxed	Valence>0.65, Energy<0.55
🎤 Rhythm & Flow	Groove-focused	Dance>0.70, Tempo>120
🌙 Mellow Groove	Mid-tempo, calm	Balanced features
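
The Elbow Method mentioned above can be sketched as follows. This is a minimal illustration, not the code in src/ml_pipeline.py: the data is a synthetic stand-in for the 8 audio features, and the scaling step, the k range, and the random seeds are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 8 audio features (NOT the project's dataset):
# 6 hypothetical archetype centers with 100 noisy tracks around each.
rng = np.random.default_rng(42)
centers = rng.uniform(0.0, 1.0, size=(6, 8))
X = np.vstack([c + rng.normal(0.0, 0.05, size=(100, 8)) for c in centers])

# Standardize so that no single feature dominates the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: fit K-Means for several k and record the inertia
# (within-cluster sum of squared distances) of each fit.
inertias = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k:2d}  inertia={inertia:10.1f}")

# Refit with the chosen k (6 in this README) and read the cluster labels.
final_model = KMeans(n_clusters=6, n_init=10, random_state=42).fit(X_scaled)
labels = final_model.labels_
```

Plotting inertia against k and picking the point where the curve flattens gives the "elbow". Archetype names would then be assigned by inspecting each cluster's feature averages.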

2. Popularity Prediction Models

Model	R² Score	MAE
Linear Regression	0.439	10.99
Random Forest	0.583	9.34
Gradient Boosting	0.597	9.18
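
A comparison like the one in this table can be produced with a loop over scikit-learn regressors. The sketch below is illustrative only: the features and the popularity target are synthetic, the hyperparameters are library defaults, and the 80/20 split is an assumption, so the scores it prints will not match the table.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 8 features in [0, 1] and a hypothetical
# popularity score on a 0-100 scale with a nonlinear interaction term.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(800, 8))
y = 40 + 30 * X[:, 0] + 20 * X[:, 1] * X[:, 2] + rng.normal(0.0, 5.0, size=800)
y = np.clip(y, 0.0, 100.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }
    print(f"{name:18s} R2={results[name]['r2']:.3f}  MAE={results[name]['mae']:.2f}")
```

Scoring every model on the same held-out split keeps the R² and MAE values directly comparable.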

3. Dimensionality Reduction (PCA)

PCA projects the 8 audio features onto 2 principal components for cluster visualization. Together, the two components explain about 68% of the variance.
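
As a rough sketch of this step, the projection and the explained-variance figure can be computed with scikit-learn's PCA. The input here is random synthetic data with one injected correlation, so the printed percentage is not the project's ~68%; standardizing before PCA is also an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 8 audio features (NOT the project's dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
X[:, 1] += 0.8 * X[:, 0]  # inject correlation so PCA has structure to find

# Standardize first: PCA is sensitive to feature scale
# (tempo in BPM would otherwise swamp the 0-1 features).
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components.
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)

# Fraction of total variance retained by the 2D projection.
explained = pca.explained_variance_ratio_.sum()
print(f"2 components explain {explained:.1%} of the variance")
```

The two columns of `coords` are the x and y positions for a scatter plot, typically colored by cluster label.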

📊 Dashboard Previews

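🎵 Music Intelligence Dashboard: outputs/figures/01_music_intelligence_dashboard.png

🎛️ Audio Features Deep Dive: outputs/figures/02_audio_features_deepdive.png

🔥 Popularity Intelligence: outputs/figures/03_popularity_intelligence.png

📈 Genre Evolution (2015–2024): outputs/figures/04_genre_evolution.png

🤖 ML Clustering Dashboard: outputs/figures/05_ml_clustering_dashboard.png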

💡 Key Insights

Finding	Impact
Danceability is the #1 driver of popularity	High
EDM has highest energy (0.88 avg) of all genres	Medium
K-Pop leads in danceability + valence combo	Medium
Classical streams spike in Oct–Dec (holiday season)	Medium
Hip-Hop dominates streams despite 15% genre share	High
Music is trending louder & more danceable year-over-year	High
Club Bangers archetype has 2.3× more streams than Acoustic Soul	High

🎛️ Audio Features Explained

Feature	Range	Description
Danceability	0.0–1.0	How suitable for dancing
Energy	0.0–1.0	Intensity and activity level
Valence	0.0–1.0	Musical positivity
Acousticness	0.0–1.0	Acoustic vs electronic
Speechiness	0.0–1.0	Presence of spoken words
Liveness	0.0–1.0	Live audience detection
Tempo	50–210 BPM	Track speed
Loudness	-20 to -3 dB	Overall loudness
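
The ranges in this table lend themselves to a quick sanity check on a track record. The helper below is a hypothetical sketch: the lowercase key names and the `out_of_range` function are assumptions for illustration and may not match the columns in data/spotify_tracks.csv.

```python
# Documented ranges from the table above; key names are assumed.
FEATURE_RANGES = {
    "danceability": (0.0, 1.0),
    "energy": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "acousticness": (0.0, 1.0),
    "speechiness": (0.0, 1.0),
    "liveness": (0.0, 1.0),
    "tempo": (50.0, 210.0),      # BPM
    "loudness": (-20.0, -3.0),   # dB
}


def out_of_range(track: dict) -> list:
    """Return the names of features that are missing or outside their range."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = track.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


# Hypothetical track record used only to exercise the check.
example_track = {
    "danceability": 0.81, "energy": 0.77, "valence": 0.60,
    "acousticness": 0.05, "speechiness": 0.08, "liveness": 0.12,
    "tempo": 124.0, "loudness": -5.5,
}
print(out_of_range(example_track))
```

Running a check like this before clustering or model training helps catch rows with missing or implausible feature values early.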

🛠️ Tech Stack

Python 3.10, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
K-Means Clustering, PCA, Gradient Boosting, Random Forest, EDA

👤 Author

Munish Kumar — Data Analyst | Python | SQL | Machine Learning
📧 mk611453@gmail.com | 📍 Palampur, Himachal Pradesh

"Music is data. Data tells stories. Let the music speak." 🎵
