Audify — ML Music Recommender

Audify is a machine learning-powered music recommendation platform built to discover Hindi songs that match the vibe, theme, and style of a user's search. It combines a content-based filtering model with a TF-IDF search engine, served through a Flask REST API and a Next.js frontend.

Live Demo: audify-music-oh6t.onrender.com

Preview

Landing Page — Search Interface

Clean search interface with animated beams background and Space Grotesk typography.

Home Page — Song Discovery

The home page displays the 50 most recent songs with album artwork, ready to explore and recommend.

Features

Smart Search — Search by song name, artist, genre, or mood using TF-IDF similarity
Content-Based Recommendations — Click any song to get 50 similar songs powered by cosine similarity on metadata feature vectors
Animated UI — Beams background animation with Space Grotesk typography
REST API — Lightweight Flask backend serving recommendations in milliseconds
Static Export — Next.js frontend compiled to static files and served directly by Flask, eliminating the need for a separate frontend server

Tech Stack

Layer	Technology
Frontend	Next.js 14, React 18, Tailwind CSS, Framer Motion
Backend	Python 3.11, Flask 3.0, Gunicorn
ML Model	Scikit-learn, CountVectorizer, TF-IDF Vectorizer, Cosine Similarity
Data Processing	Pandas, NumPy
Deployment	Render

ML Pipeline

The recommendation engine is built entirely on content-based filtering using song metadata. There is no collaborative filtering or user history involved — recommendations are derived purely from the intrinsic properties of each song.

1. Data Collection

Song data was scraped from public music databases and stored as a CSV file. Each record contains the following fields:

Field	Description
`title`	Song title
`singers`	List of vocalists
`directors`	List of music directors and composers
`lyricist`	List of lyric writers
`genre`	List of genre tags (e.g. Filmi, Pop, Hip-hop)
`album`	Album or single name
`download_link`	YouTube URL
`poster`	Album art URL
`year`	Release year

2. Data Cleaning

The raw dataset of 5610 songs is cleaned through a series of filters:

Songs with variants in their title (reprise, remix, unplugged, instrumental, cover, etc.) are removed to avoid duplicate entries
Rows with missing values in critical fields (title, singers, download link, year, poster) are dropped
Duplicate entries based on YouTube URL are removed
After cleaning, the dataset contains 3883 unique songs spanning 2014 to 2023
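
The three filters above can be sketched in pandas as follows. This is an illustrative outline, not the notebook's actual code: the function name `clean_songs`, the regular expression, and the exact keyword list are assumptions (the keyword list above ends in "etc.", so the real one may be longer).

```python
import pandas as pd

# Assumed keyword list; the project's real list may contain more variants.
VARIANT_PATTERN = r"\b(?:reprise|remix|unplugged|instrumental|cover)\b"
CRITICAL_FIELDS = ["title", "singers", "download_link", "year", "poster"]


def clean_songs(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning filters in the order described above."""
    # 1. Remove title variants (case-insensitive, whole-word match).
    #    na=False treats a missing title as "not a variant"; step 2 drops it.
    is_variant = raw["title"].str.contains(
        VARIANT_PATTERN, case=False, regex=True, na=False
    )
    df = raw[~is_variant]
    # 2. Drop rows that are missing any critical field.
    df = df.dropna(subset=CRITICAL_FIELDS)
    # 3. Keep only the first row for each YouTube URL.
    df = df.drop_duplicates(subset="download_link", keep="first")
    return df.reset_index(drop=True)
```

The whole-word match is a design choice in this sketch: it keeps a title such as "Discover" while still removing "Song (Cover)".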

3. Feature Engineering — Recommendation Model

Each song is converted into a single text string, called a tag, by concatenating its metadata fields. Multi-word names of singers, directors, and lyricists are collapsed into single lowercase tokens (e.g. "Arijit Singh" becomes "arijitsingh") so that two people who share only a first name or a surname are not counted as a match, which would otherwise inflate similarity scores.

Example tag for a song:

Tum Hi Ho arijitsingh mohitchauhan pritamchakraborty sandeepnath filmi romantic Aashiqui 2

These tags are vectorized using CountVectorizer with a vocabulary of up to 10,000 features and English stop words removed. The resulting sparse matrix has shape (3883, 8120).
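
A minimal sketch of the tag construction and vectorization described above. The helper names (`join_name`, `build_tag`, `vectorize_tags`) and the field order are assumptions inferred from the example tag; only the `CountVectorizer` settings (10,000 features, English stop words) come from the description.

```python
from sklearn.feature_extraction.text import CountVectorizer


def join_name(name: str) -> str:
    """Collapse a multi-word name into one lowercase token."""
    return "".join(name.lower().split())


def build_tag(song: dict) -> str:
    """Concatenate metadata into one tag string (field order is assumed)."""
    people = song["singers"] + song["directors"] + song["lyricist"]
    parts = [song["title"]]
    parts += [join_name(person) for person in people]
    parts += [genre.lower() for genre in song["genre"]]
    parts.append(song["album"])
    return " ".join(parts)


def vectorize_tags(tags: list[str]):
    """Turn tags into a sparse count matrix of shape (n_songs, n_terms)."""
    vectorizer = CountVectorizer(max_features=10000, stop_words="english")
    matrix = vectorizer.fit_transform(tags)
    return vectorizer, matrix


# Field values reconstructed from the example tag shown above.
example_song = {
    "title": "Tum Hi Ho",
    "singers": ["Arijit Singh", "Mohit Chauhan"],
    "directors": ["Pritam Chakraborty"],
    "lyricist": ["Sandeep Nath"],
    "genre": ["Filmi", "Romantic"],
    "album": "Aashiqui 2",
}
```

Because names are joined before vectorization, the vocabulary contains `arijitsingh` but no standalone `singh` token.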

4. Cosine Similarity Matrix

Cosine similarity is computed across all song vectors, producing a (3883, 3883) similarity matrix. For each song, the top 50 most similar songs (excluding itself) are precomputed and saved as sorted index arrays in similarity.npy. Because of this precomputation, serving a recommendation is a single array lookup: its cost is O(1) with respect to the size of the catalog, and no similarity is computed on demand.
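
The precomputation can be illustrated with plain NumPy. This is a sketch under assumptions: the function names are hypothetical, and it uses a dense array for readability, whereas the project's count matrix is sparse.

```python
import numpy as np


def top_k_similar(features: np.ndarray, k: int = 50) -> np.ndarray:
    """Return an (n, k) array; row i lists the k songs most similar to song i."""
    # Normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    # Exclude each song from its own recommendation list.
    np.fill_diagonal(sim, -np.inf)
    k = min(k, len(features) - 1)
    # Sort each row by descending similarity and keep the first k indices.
    order = np.argsort(-sim, axis=1, kind="stable")
    return order[:, :k]


def recommend(top_indices: np.ndarray, song_id: int) -> list[int]:
    """Serve a request with a single row lookup, no similarity computation."""
    return top_indices[song_id].tolist()
```

The resulting array could be persisted with `np.save("similarity.npy", top_indices)` and restored with `np.load` at startup, which matches the file name used by the project.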

5. Search Engine — TF-IDF

The search feature uses a separate pipeline from the recommendation model. Each song has a search tag built from the title, singers, directors, lyricists, genre, album, and year, with names kept in their original form rather than joined into single tokens. This keeps the tag human-readable and lets it match what a user would actually type.

These tags are saved in search_similarity.npy and loaded at startup. When a user submits a query, it is transformed using the same TfidfVectorizer that was fit on all song tags. Cosine similarity is then computed between the query vector and all song vectors, and the top 50 results are returned ranked by relevance.

This means searches such as "Arijit Singh romantic 2020" or "Filmi sad" return contextually relevant results rather than requiring exact title matches.
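
The query flow described above might look like the following sketch. The class name `SongSearch` and the sample tags are hypothetical, and dropping results with a zero score is an assumption of this sketch, not documented behaviour.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class SongSearch:
    """Fit TF-IDF once on all search tags, then rank songs for each query."""

    def __init__(self, search_tags: list[str]):
        self.vectorizer = TfidfVectorizer()
        self.tag_matrix = self.vectorizer.fit_transform(search_tags)

    def search(self, query: str, k: int = 50) -> list[int]:
        # Reuse the fitted vocabulary and IDF weights for the query.
        query_vec = self.vectorizer.transform([query])
        scores = cosine_similarity(query_vec, self.tag_matrix).ravel()
        ranked = np.argsort(-scores, kind="stable")[:k]
        # Assumption: songs that share no term with the query are omitted.
        return [int(i) for i in ranked if scores[i] > 0]


# Placeholder tags for illustration only; not rows from the real dataset.
sample_tags = [
    "Song Alpha Arijit Singh Filmi Romantic Album Alpha 2020",
    "Song Beta Singer Beta Pop Dance Album Beta 2019",
    "Song Gamma Arijit Singh Filmi Sad Album Gamma 2018",
]
```

With these placeholder tags, `SongSearch(sample_tags).search("Filmi sad")` ranks the third tag first because it matches both query terms, and the first tag second because it matches only "Filmi".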

6. Homepage Songs

The homepage displays the 50 most recent songs in the dataset, sorted by year in descending order, giving users a starting point to explore the recommendation engine.
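
As a small illustration, the homepage selection amounts to a sort and a slice. The function name `recent_songs` is an assumption, as is the use of a pandas DataFrame with a `year` column at this stage.

```python
import pandas as pd


def recent_songs(songs: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    """Return the n most recent songs, newest first."""
    return songs.sort_values("year", ascending=False, kind="stable").head(n)
```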

System Architecture

User Browser
     |
     | HTTP Request
     v
Flask Server (Gunicorn)
     |
     |-- /api/search/           --> Returns top 50 recent songs
     |-- /api/search/<query>    --> TF-IDF search over song tags
     |-- /api/recommend/<id>    --> Cosine similarity index lookup
     |-- /                      --> Serves Next.js static build (./out/)
     |
     |-- songs.pkl              --> Song metadata (title, link, year)
     |-- similarity.npy         --> Precomputed top-50 similarity indices
     |-- search_similarity.npy  --> TF-IDF search tags array

The entire application runs as a single Python process. Flask serves both the API and the static Next.js frontend from the ./out/ directory, which is generated during the build step. With this single-server architecture there is only one service to deploy: no separate Node.js server or CDN is required.

Project Structure

Audify/
├── app/                         # Next.js app directory
│   ├── layout.js                # Root layout with fonts and metadata
│   ├── page.js                  # Entry page
│   └── globals.css              # Global styles
├── components/                  # React components
│   ├── BeamsBackground.jsx      # Canvas-based animated beams background
│   ├── Home.jsx                 # Main page component
│   ├── MusicSection.jsx         # Song cards grid
│   ├── Headers/
│   │   └── SearchBar.jsx        # Search input with loading bar
│   └── Footers/
│       └── Label.jsx            # Footer with author credit
├── learning/                    # ML pipeline
│   ├── music_recommender.ipynb  # Training and preprocessing notebook
│   ├── songs.pkl                # Cleaned song dataset (3883 records)
│   ├── similarity.npy           # Precomputed top-50 similarity indices
│   └── search_similarity.npy   # TF-IDF search tags array
├── scraping/                    # Data scraping scripts and raw CSV
├── helper/                      # Utility functions
├── app.py                       # Flask API server
├── wsgi.py                      # WSGI entry point
├── requirements.txt             # Python dependencies
├── package.json                 # Node dependencies
├── next.config.mjs              # Next.js configuration (static export)
├── render.yaml                  # Render deployment configuration
└── Procfile                     # Process definition for deployment

Running Locally

Prerequisites

Python 3.11 or higher
Node.js 18 or higher

Steps

# Clone the repository
git clone https://github.com/parinith-web/Audify.git
cd Audify

# Install Python dependencies
pip install -r requirements.txt

# Install Node dependencies and build the frontend
npm install
npm run build

# Start the server
python app.py

Visit http://localhost:8080 in your browser. Flask serves both the API and the compiled Next.js frontend from the same process.

API Endpoints

Method	Endpoint	Description
GET	`/api/search/`	Returns the 50 most recent songs for the homepage
GET	`/api/search/<query>`	Searches songs by query using TF-IDF cosine similarity
GET	`/api/recommend/<song_id>`	Returns the top 50 most similar songs for a given song index

All endpoints return a JSON array of song objects with the following structure:

[
  {
    "index": 142,
    "title": "Tum Hi Ho",
    "download_link": "https://www.youtube.com/watch?v=...",
    "year": 2013
  }
]
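
A client can call these endpoints with the Python standard library alone. The sketch below assumes the local server from the Running Locally section at http://localhost:8080; the helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # assumed local address


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the search URL, percent-encoding spaces and special characters."""
    return f"{base_url}/api/search/{quote(query, safe='')}"


def parse_songs(body) -> list[tuple]:
    """Extract (index, title, year) from the JSON array the API returns."""
    return [(s["index"], s["title"], s["year"]) for s in json.loads(body)]


def search(query: str) -> list[tuple]:
    """Call the running server; an empty query hits /api/search/ (recent songs)."""
    with urlopen(search_url(query), timeout=10) as response:
        return parse_songs(response.read())
```

The `index` value of any returned song can then be passed to `/api/recommend/<song_id>` to fetch its similar songs.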

Dataset

3883 Hindi songs spanning 2014 to 2023
Original dataset: 5610 songs before cleaning
Fields: title, singers, music directors, lyricists, genre, album, YouTube link, album art, year
Collected via web scraping from public music databases
Variants such as remixes, reprises, and instrumentals are excluded to maintain data quality

Author

Built by Parinith Reddy

License

This project is licensed under the MIT License. See the LICENSE file for details.
