Audify is a machine learning-powered music recommendation platform built to discover Hindi songs that match the vibe, theme, and style of a user's search. It combines a content-based filtering model with a TF-IDF search engine, served through a Flask REST API and a Next.js frontend.
Live Demo: audify-music-oh6t.onrender.com
- Preview
- Features
- Tech Stack
- ML Pipeline
- System Architecture
- Project Structure
- Running Locally
- API Endpoints
- Dataset
- Author
- License
Clean search interface with animated beams background and Space Grotesk typography.
The home page displays the 50 most recent songs with album artwork, ready to explore and recommend.
- Smart Search — Search by song name, artist, genre, or mood using TF-IDF similarity
- Content-Based Recommendations — Click any song to get 50 similar songs powered by cosine similarity on metadata feature vectors
- Animated UI — Beams background animation with Space Grotesk typography
- REST API — Lightweight Flask backend serving recommendations in milliseconds
- Static Export — Next.js frontend compiled to static files and served directly by Flask, eliminating the need for a separate frontend server
| Layer | Technology |
|---|---|
| Frontend | Next.js 14, React 18, Tailwind CSS, Framer Motion |
| Backend | Python 3.11, Flask 3.0, Gunicorn |
| ML Model | Scikit-learn, CountVectorizer, TF-IDF Vectorizer, Cosine Similarity |
| Data Processing | Pandas, NumPy |
| Deployment | Render |
The recommendation engine is built entirely on content-based filtering using song metadata. There is no collaborative filtering or user history involved — recommendations are derived purely from the intrinsic properties of each song.
Song data was scraped from public music databases and stored as a CSV file. Each record contains the following fields:
| Field | Description |
|---|---|
| title | Song title |
| singers | List of vocalists |
| directors | List of music directors and composers |
| lyricist | List of lyric writers |
| genre | List of genre tags (e.g. Filmi, Pop, Hip-hop) |
| album | Album or single name |
| download_link | YouTube URL |
| poster | Album art URL |
| year | Release year |
The raw dataset of 5610 songs is cleaned through a series of filters:
- Songs with variants in their title (reprise, remix, unplugged, instrumental, cover, etc.) are removed to avoid duplicate entries
- Rows with missing values in critical fields (title, singers, download link, year, poster) are dropped
- Duplicate entries based on YouTube URL are removed
- After cleaning, the dataset contains 3883 unique songs spanning 2014 to 2023
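The filters above could be sketched with pandas roughly as follows. The column names come from the field table; the keyword list (the source ends its examples with "etc."), the function name `clean_songs`, and the toy rows are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical keyword list: only the variants named in the text are included.
VARIANT_PATTERN = r"\b(?:reprise|remix|unplugged|instrumental|cover)\b"
CRITICAL_FIELDS = ["title", "singers", "download_link", "year", "poster"]


def clean_songs(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning filters to a raw song table."""
    # Drop rows missing any critical field (done first so string matching is safe).
    df = raw.dropna(subset=CRITICAL_FIELDS)
    # Remove title variants with a case-insensitive whole-word match.
    is_variant = df["title"].str.contains(VARIANT_PATTERN, case=False, regex=True)
    df = df[~is_variant]
    # Keep only the first row seen for each YouTube URL.
    df = df.drop_duplicates(subset="download_link")
    return df.reset_index(drop=True)


# Toy rows, invented purely to exercise each filter.
raw = pd.DataFrame({
    "title": ["Song A", "Song A (Remix)", "Song B", "Song C", "Song A"],
    "singers": ["x", "x", None, "z", "x"],
    "download_link": ["u1", "u2", "u3", "u4", "u1"],
    "year": [2015, 2016, 2017, 2018, 2015],
    "poster": ["p1", "p2", "p3", "p4", "p1"],
})
cleaned = clean_songs(raw)
print(cleaned["title"].tolist())
```

In this toy run the remix, the row with a missing singer, and the repeated URL are each removed by a different filter.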
Each song is converted into a single text string called a tag by concatenating its metadata fields. Artist and director names are joined into single tokens (e.g. "Arijit Singh" becomes "arijitsingh") to prevent partial word matches from inflating similarity scores.
Example tag for a song:
Tum Hi Ho arijitsingh mohitchauhan pritamchakraborty sandeepnath filmi romantic Aashiqui 2
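A minimal sketch of how such a tag might be assembled is shown below. The field order and the lowercasing of genre tags are inferred from the example tag above, and the helper names `join_name` and `build_tag` are hypothetical.

```python
def join_name(name: str) -> str:
    """Collapse a multi-word name into one lowercase token."""
    return "".join(name.split()).lower()


def build_tag(song: dict) -> str:
    """Concatenate a song's metadata into a single space-separated tag."""
    parts = [song["title"]]
    # People fields become single tokens so "Arijit Singh" cannot
    # partially match another "Singh".
    for field in ("singers", "directors", "lyricist"):
        parts.extend(join_name(name) for name in song[field])
    parts.extend(genre.lower() for genre in song["genre"])
    parts.append(song["album"])
    return " ".join(parts)


# Record reconstructed from the example tag above.
song = {
    "title": "Tum Hi Ho",
    "singers": ["Arijit Singh", "Mohit Chauhan"],
    "directors": ["Pritam Chakraborty"],
    "lyricist": ["Sandeep Nath"],
    "genre": ["Filmi", "Romantic"],
    "album": "Aashiqui 2",
}
tag = build_tag(song)
print(tag)
```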
These tags are vectorized using CountVectorizer with a vocabulary of up to 10,000 features and English stop words removed. The resulting sparse matrix has shape (3883, 8120).
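The vectorization step might look like the sketch below. `max_features=10000` and `stop_words="english"` mirror the description above; every other setting is left at scikit-learn's defaults, which is an assumption. The second tag is a made-up placeholder.

```python
from sklearn.feature_extraction.text import CountVectorizer

tags = [
    "Tum Hi Ho arijitsingh mohitchauhan pritamchakraborty sandeepnath filmi romantic Aashiqui 2",
    # Placeholder tag, not a real song.
    "The Placeholder Track arijitsingh placeholderdirector filmi sad Placeholder Album",
]

vectorizer = CountVectorizer(max_features=10000, stop_words="english")
matrix = vectorizer.fit_transform(tags)  # sparse matrix: one row per song

print(matrix.shape)
print(sorted(vectorizer.vocabulary_))
```

With the default token pattern, tokens are lowercased and single-character tokens such as the "2" in "Aashiqui 2" are dropped, so they do not contribute to similarity.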
Cosine similarity is computed across all song vectors, producing a (3883, 3883) similarity matrix. For each song, the top 50 most similar songs (excluding itself) are precomputed and saved as sorted index arrays in similarity.npy. This precomputation ensures that at inference time, recommendations are served in O(1) time — a single array lookup with no on-demand computation.
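The precomputation can be illustrated on a tiny dense matrix with NumPy alone. This is a sketch under assumptions: the function name `top_k_similar` and the toy counts are invented, and the real pipeline would more likely run scikit-learn's `cosine_similarity` on the sparse matrix.

```python
import numpy as np


def top_k_similar(counts: np.ndarray, k: int) -> np.ndarray:
    """Return, for each row, the indices of its k most cosine-similar other rows."""
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    unit = counts / np.where(norms == 0, 1.0, norms)  # guard against all-zero rows
    sim = unit @ unit.T                               # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # a song never recommends itself
    # Sorting the negated scores ascending yields descending similarity;
    # a stable sort breaks ties by lower index.
    return np.argsort(-sim, axis=1, kind="stable")[:, :k]


# Toy count matrix: 4 songs, 4 vocabulary terms.
counts = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

top = top_k_similar(counts, k=2)

# At request time a recommendation is just a row lookup.
print(top[0])
```

Storing only the top-k index array instead of the full square matrix keeps the saved file small: 3883 × 50 integers rather than 3883 × 3883 floats.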
The search feature uses a separate pipeline from the recommendation model. Each song has a search tag that includes the original, non-tokenized names of the title, singers, directors, lyricists, genre, album, and year, making it human-readable and query-friendly.
These tags are saved in search_similarity.npy and loaded at startup. When a user submits a query, it is transformed using the same TfidfVectorizer that was fit on all song tags. Cosine similarity is then computed between the query vector and all song vectors, and the top 50 results are returned ranked by relevance.
This means searches such as "Arijit Singh romantic 2020" or "Filmi sad" return contextually relevant results rather than requiring exact title matches.
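A rough sketch of this query path follows. The search tags are invented placeholders, the `search` function is hypothetical, and dropping zero-score results is an assumption; the source only says the top 50 are returned.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder search tags with human-readable, non-joined names.
search_tags = [
    "Demo Track One Arijit Singh Filmi Romantic Demo Album 2020",
    "Demo Track Two Example Singer Filmi Sad Demo Album 2018",
    "Demo Track Three Another Vocalist Pop Party Other Album 2021",
]

# Fit once at startup on all song tags.
vectorizer = TfidfVectorizer()
tag_matrix = vectorizer.fit_transform(search_tags)


def search(query: str, k: int = 50) -> list[int]:
    """Return up to k song indices ranked by cosine similarity to the query."""
    query_vec = vectorizer.transform([query])  # same vocabulary as the tags
    scores = cosine_similarity(query_vec, tag_matrix).ravel()
    ranked = np.argsort(-scores, kind="stable")[:k]
    # Assumption: songs sharing no term with the query are omitted.
    return [int(i) for i in ranked if scores[i] > 0]


print(search("Arijit Singh romantic 2020"))
print(search("Filmi sad"))
```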
The homepage displays the 50 most recent songs in the dataset, sorted by year in descending order, giving users a starting point to explore the recommendation engine.
User Browser
|
| HTTP Request
v
Flask Server (Gunicorn)
|
|-- /api/search/ --> Returns top 50 recent songs
|-- /api/search/<query> --> TF-IDF search over song tags
|-- /api/recommend/<id> --> Cosine similarity index lookup
|-- / --> Serves Next.js static build (./out/)
|
|-- songs.pkl --> Song metadata (title, link, year)
|-- similarity.npy --> Precomputed top-50 similarity indices
|-- search_similarity.npy --> TF-IDF search tags array
The entire application runs as a single Python process. Flask serves both the API and the static Next.js frontend from the ./out/ directory, which is generated during the build step. This single-server architecture simplifies deployment significantly — no separate Node.js server or CDN is required.
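The single-process layout could be sketched as below. This is not the project's `app.py`: the in-memory `SONGS` and `SIMILAR` data stand in for `songs.pkl` and `similarity.npy`, the `<int:song_id>` converter and the 404 handling are assumptions, and the search-by-query route is omitted for brevity.

```python
from flask import Flask, abort, jsonify, send_from_directory

STATIC_DIR = "out"  # Next.js static export directory named in the text

# Placeholder data standing in for songs.pkl and similarity.npy.
SONGS = [
    {"index": 0, "title": "Demo A", "download_link": "https://example.com/a", "year": 2021},
    {"index": 1, "title": "Demo B", "download_link": "https://example.com/b", "year": 2019},
    {"index": 2, "title": "Demo C", "download_link": "https://example.com/c", "year": 2023},
]
SIMILAR = [[1, 2], [0, 2], [1, 0]]

# Disable Flask's default /static route; the catch-all below serves files.
app = Flask(__name__, static_folder=None)


@app.route("/api/search/")
def recent():
    """Homepage listing: most recent songs first."""
    return jsonify(sorted(SONGS, key=lambda s: s["year"], reverse=True)[:50])


@app.route("/api/recommend/<int:song_id>")
def recommend(song_id):
    """Look up the precomputed neighbours of one song."""
    if not 0 <= song_id < len(SIMILAR):
        abort(404)
    return jsonify([SONGS[i] for i in SIMILAR[song_id]])


@app.route("/", defaults={"path": "index.html"})
@app.route("/<path:path>")
def frontend(path):
    """Serve the compiled frontend for every non-API path."""
    return send_from_directory(STATIC_DIR, path)
```

Because the API routes are more specific than the catch-all, Flask's router matches them first, so the frontend handler only sees paths that are not API calls.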
Audify/
├── app/                            # Next.js app directory
│   ├── layout.js                   # Root layout with fonts and metadata
│   ├── page.js                     # Entry page
│   └── globals.css                 # Global styles
├── components/                     # React components
│   ├── BeamsBackground.jsx         # Canvas-based animated beams background
│   ├── Home.jsx                    # Main page component
│   ├── MusicSection.jsx            # Song cards grid
│   ├── Headers/
│   │   └── SearchBar.jsx           # Search input with loading bar
│   └── Footers/
│       └── Label.jsx               # Footer with author credit
├── learning/                       # ML pipeline
│   ├── music_recommender.ipynb     # Training and preprocessing notebook
│   ├── songs.pkl                   # Cleaned song dataset (3883 records)
│   ├── similarity.npy              # Precomputed top-50 similarity indices
│   └── search_similarity.npy       # TF-IDF search tags array
├── scraping/                       # Data scraping scripts and raw CSV
├── helper/                         # Utility functions
├── app.py                          # Flask API server
├── wsgi.py                         # WSGI entry point
├── requirements.txt                # Python dependencies
├── package.json                    # Node dependencies
├── next.config.mjs                 # Next.js configuration (static export)
├── render.yaml                     # Render deployment configuration
└── Procfile                        # Process definition for deployment
- Python 3.11 or higher
- Node.js 18 or higher
# Clone the repository
git clone https://github.com/parinith-web/Audify.git
cd Audify
# Install Python dependencies
pip install -r requirements.txt
# Install Node dependencies and build the frontend
npm install
npm run build
# Start the server
python app.py
Visit http://localhost:8080 in your browser. Flask serves both the API and the compiled Next.js frontend from the same process.
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/search/ | Returns the 50 most recent songs for the homepage |
| GET | /api/search/<query> | Searches songs by query using TF-IDF cosine similarity |
| GET | /api/recommend/<song_id> | Returns the top 50 most similar songs for a given song index |
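Since the search query travels in the URL path, a client would typically percent-encode it. The helpers below are a hypothetical client-side sketch, assuming the local address from the setup steps; whether the bundled frontend encodes queries this way is not stated in the source.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8080"  # local address used when running the server


def search_url(query: str = "") -> str:
    """Build the search URL; an empty query gives the homepage listing endpoint."""
    # safe="" also encodes "/" so a query cannot change the route.
    return f"{BASE_URL}/api/search/{quote(query, safe='')}"


def recommend_url(song_id: int) -> str:
    """Build the recommendation URL for a song index."""
    return f"{BASE_URL}/api/recommend/{song_id}"


print(search_url("Arijit Singh romantic 2020"))
print(recommend_url(142))
```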
All endpoints return a JSON array of song objects with the following structure:
[
{
"index": 142,
"title": "Tum Hi Ho",
"download_link": "https://www.youtube.com/watch?v=...",
"year": 2013
}
]
- 3883 Hindi songs spanning 2014 to 2023
- Original dataset: 5610 songs before cleaning
- Fields: title, singers, music directors, lyricists, genre, album, YouTube link, album art, year
- Collected via web scraping from public music databases
- Variants such as remixes, reprises, and instrumentals are excluded to maintain data quality
Built by Parinith Reddy
This project is licensed under the MIT License. See the LICENSE file for details.

