🎵 Spotify Music Intelligence Dashboard


An end-to-end Music Analytics Platform analyzing 15,000 tracks across 12 genres and 79 artists — combining advanced EDA, K-Means music archetype clustering, and Gradient Boosting popularity prediction.


🎯 Project Highlights

| Metric | Value |
| --- | --- |
| 🎵 Total Tracks | 15,000 |
| 🎤 Artists | 79 |
| 🎶 Genres | 12 |
| 📡 Total Streams | 457 Billion |
| 🤖 Music Archetypes (K-Means clusters) | 6 |
| 📈 Popularity Prediction R² (Gradient Boosting) | 0.597 |
| 🎛️ Audio Features Analyzed | 8 |

🗂️ Project Structure

spotify-music-intelligence/
├── 📁 data/
│   └── spotify_tracks.csv          # 15K tracks with 22 features
├── 📁 src/
│   ├── generate_data.py            # Realistic music dataset generator
│   ├── eda_viz.py                  # 4 professional EDA dashboards
│   └── ml_pipeline.py             # K-Means + GBM + PCA pipeline
├── 📁 outputs/
│   ├── 📁 figures/
│   │   ├── 01_music_intelligence_dashboard.png
│   │   ├── 02_audio_features_deepdive.png
│   │   ├── 03_popularity_intelligence.png
│   │   ├── 04_genre_evolution.png
│   │   └── 05_ml_clustering_dashboard.png
│   └── 📁 models/
│       ├── popularity_gbm.pkl
│       ├── kmeans_archetypes.pkl
│       └── scaler.pkl
├── requirements.txt
└── README.md

🚀 Quick Start

git clone https://github.com/Munishx01/spotify-music-intelligence.git
cd spotify-music-intelligence
pip install -r requirements.txt

python src/generate_data.py   # Generate dataset
python src/eda_viz.py         # Run EDA visualizations
python src/ml_pipeline.py     # Train ML models

🧠 ML Pipeline

1. K-Means Music Archetype Clustering

K-Means groups the 15,000 tracks into 6 distinct music archetypes based on 8 audio features. The number of clusters (k = 6) was selected with the Elbow Method.
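
The elbow selection can be sketched as follows. This is a minimal illustration, not the code in `src/ml_pipeline.py`: the feature names, the use of `StandardScaler`, the k range, and the random stand-in data are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; the real CSV headers may differ.
FEATURES = ["danceability", "energy", "valence", "acousticness",
            "speechiness", "liveness", "tempo", "loudness"]


def elbow_inertias(X, k_values, seed=42):
    """Fit K-Means for each k and return the within-cluster sum of squares."""
    # Scale first so tempo (BPM) and loudness (dB) do not dominate the distances.
    X_scaled = StandardScaler().fit_transform(X)
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        model.fit(X_scaled)
        inertias[k] = float(model.inertia_)
    return inertias


# Random stand-in data (NOT the project's dataset): 600 tracks, 8 features.
rng = np.random.default_rng(0)
X_demo = rng.random((600, len(FEATURES)))

inertias = elbow_inertias(X_demo, range(2, 9))
for k, wcss in inertias.items():
    print(f"k={k}: inertia={wcss:.1f}")
```

Plotting inertia against k and looking for the point where the curve flattens gives the elbow. On real audio features the bend is usually clearer than on this uniform random data.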

| Archetype | Description | Key Features |
| --- | --- | --- |
| 🔥 Club Bangers | High energy, danceable | Energy > 0.75, Danceability > 0.72 |
| 🎸 Dark Intensity | Aggressive, intense | Energy > 0.80, Valence < 0.45 |
| 🎻 Acoustic Soul | Organic, unplugged | Acousticness > 0.60 |
| ☀️ Feel Good Vibes | Positive, relaxed | Valence > 0.65, Energy < 0.55 |
| 🎤 Rhythm & Flow | Groove-focused | Danceability > 0.70, Tempo > 120 BPM |
| 🌙 Mellow Groove | Mid-tempo, calm | Balanced features |
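
One way to attach these names to K-Means clusters is to test each cluster centroid against the thresholds in the table. The sketch below is an assumption about how that could work, not the project's actual labeling code. The thresholds overlap, so the first-match-wins order (table order) and the dictionary keys are hypothetical choices. Centroids must be in original units (for example, tempo in BPM), so a scaled centroid would need to be inverse-transformed first.

```python
# Thresholds copied from the archetype table; the rule order is an assumption.
ARCHETYPE_RULES = [
    ("Club Bangers", lambda c: c["energy"] > 0.75 and c["danceability"] > 0.72),
    ("Dark Intensity", lambda c: c["energy"] > 0.80 and c["valence"] < 0.45),
    ("Acoustic Soul", lambda c: c["acousticness"] > 0.60),
    ("Feel Good Vibes", lambda c: c["valence"] > 0.65 and c["energy"] < 0.55),
    ("Rhythm & Flow", lambda c: c["danceability"] > 0.70 and c["tempo"] > 120),
]


def label_centroid(centroid):
    """Return the first archetype whose thresholds the centroid satisfies."""
    for name, rule in ARCHETYPE_RULES:
        if rule(centroid):
            return name
    # "Balanced features" in the table: used here as the fallback label.
    return "Mellow Groove"


# Illustrative centroid, not taken from the trained model.
example = {"energy": 0.85, "danceability": 0.80, "valence": 0.60,
           "acousticness": 0.10, "tempo": 125}
print(label_centroid(example))
```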

2. Popularity Prediction Models

| Model | R² Score | MAE |
| --- | --- | --- |
| Linear Regression | 0.439 | 10.99 |
| Random Forest | 0.583 | 9.34 |
| Gradient Boosting | 0.597 | 9.18 |
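
A comparison like the one in the table can be sketched as below. The data is synthetic stand-in data and the hyperparameters are scikit-learn defaults, so the printed scores will not match the table; the train/test split and the target formula are assumptions made only for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the project's dataset): 8 features, noisy target.
rng = np.random.default_rng(0)
X = rng.random((800, 8))
y = 40 + 30 * X[:, 0] + 20 * X[:, 1] * X[:, 2] + rng.normal(0, 5, size=800)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Both metrics are computed on the held-out test split only.
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }
    print(f"{name}: R2={results[name]['r2']:.3f}  MAE={results[name]['mae']:.2f}")
```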

3. Dimensionality Reduction (PCA)

PCA projects the 8 audio features onto 2 principal components for cluster visualization. Together, the two components explain about 68% of the variance.
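
A minimal sketch of this projection, assuming the features are standardized before PCA (the README does not say whether they are). The data is random stand-in data, so its explained variance will be much lower than the figure reported for the real tracks.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in data (NOT the project's dataset): 500 tracks, 8 features.
rng = np.random.default_rng(0)
X = rng.random((500, 8))

# Standardize so each feature contributes on the same scale (an assumption).
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)  # shape (n_tracks, 2), ready for a scatter plot

# Fraction of total variance captured by the two components combined.
explained = float(pca.explained_variance_ratio_.sum())
print(coords.shape, f"explained variance: {explained:.1%}")
```

Coloring the scatter plot of `coords` by cluster label gives the kind of 2D archetype view described above.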


📊 Dashboard Previews

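🎵 Music Intelligence Dashboard

Dashboard 1: outputs/figures/01_music_intelligence_dashboard.png

🎛️ Audio Features Deep Dive

Dashboard 2: outputs/figures/02_audio_features_deepdive.png

🔥 Popularity Intelligence

Dashboard 3: outputs/figures/03_popularity_intelligence.png

📈 Genre Evolution (2015–2024)

Dashboard 4: outputs/figures/04_genre_evolution.png

🤖 ML Clustering Dashboard

Dashboard 5: outputs/figures/05_ml_clustering_dashboard.png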


💡 Key Insights

| Finding | Impact |
| --- | --- |
| Danceability is the #1 driver of popularity | High |
| EDM has highest energy (0.88 avg) of all genres | Medium |
| K-Pop leads in danceability + valence combo | Medium |
| Classical streams spike in Oct–Dec (holiday season) | Medium |
| Hip-Hop dominates streams despite 15% genre share | High |
| Music is trending louder & more danceable year-over-year | High |
| Club Bangers archetype has 2.3× more streams than Acoustic Soul | High |
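
Comparisons such as the archetype stream ratio can be checked with a simple group-by. The sketch below uses toy rows and hypothetical column names (`archetype`, `streams`), and it compares total streams; whether the README's ratio is based on totals or per-track averages is not stated.

```python
import pandas as pd

# Toy rows with hypothetical column names; not taken from the project's CSV.
df = pd.DataFrame({
    "archetype": ["Club Bangers", "Club Bangers", "Acoustic Soul", "Acoustic Soul"],
    "streams": [300, 500, 100, 300],
})

# Total streams per archetype, then the ratio between two of them.
totals = df.groupby("archetype")["streams"].sum()
ratio = totals["Club Bangers"] / totals["Acoustic Soul"]
print(f"Club Bangers vs Acoustic Soul: {ratio:.1f}x")
```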

🎛️ Audio Features Explained

| Feature | Range | Description |
| --- | --- | --- |
| Danceability | 0.0–1.0 | How suitable for dancing |
| Energy | 0.0–1.0 | Intensity and activity level |
| Valence | 0.0–1.0 | Musical positivity |
| Acousticness | 0.0–1.0 | Acoustic vs electronic |
| Speechiness | 0.0–1.0 | Presence of spoken words |
| Liveness | 0.0–1.0 | Live audience detection |
| Tempo | 50–210 BPM | Track speed |
| Loudness | -20 to -3 dB | Overall loudness |
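
The documented ranges can double as a sanity check on each row before modeling. This is a small sketch under the assumption that tracks are available as dictionaries keyed by lowercase feature names; those key names are hypothetical.

```python
# Ranges copied from the table above; the key names are hypothetical.
FEATURE_RANGES = {
    "danceability": (0.0, 1.0),
    "energy": (0.0, 1.0),
    "valence": (0.0, 1.0),
    "acousticness": (0.0, 1.0),
    "speechiness": (0.0, 1.0),
    "liveness": (0.0, 1.0),
    "tempo": (50.0, 210.0),      # BPM
    "loudness": (-20.0, -3.0),   # dB
}


def out_of_range(track):
    """Return the names of features that are missing or outside their range."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = track.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems


# Illustrative track, not a row from the dataset; tempo is deliberately too high.
track = {"danceability": 0.7, "energy": 0.8, "valence": 0.5, "acousticness": 0.2,
         "speechiness": 0.05, "liveness": 0.1, "tempo": 230.0, "loudness": -6.0}
print(out_of_range(track))
```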

🛠️ Tech Stack

Python 3.10 Pandas NumPy Scikit-learn Matplotlib Seaborn
K-Means Clustering PCA Gradient Boosting Random Forest EDA


👤 Author

Munish Kumar — Data Analyst | Python | SQL | Machine Learning
📧 mk611453@gmail.com | 📍 Palampur, Himachal Pradesh
LinkedIn GitHub


"Music is data. Data tells stories. Let the music speak." 🎵
