🎬 Hybrid Movie Recommender System

A hybrid recommendation engine combining Collaborative Filtering (NMF) and Content-Based Filtering (TF-IDF + Cosine Similarity) — built with Python, SQLite, and Streamlit.

📌 Table of Contents

- Features
- How It Works
- Project Structure
- Tech Stack
- Installation & Setup
- Usage
- Output
- Screenshots
- Future Improvements
- Author
- Contributing
- License

✨ Features

| Feature | Description |
| --- | --- |
| 🔀 Hybrid Recommendations | Combines CF and content-based scores with a tunable weight (alpha) |
| 🎚️ Adjustable Alpha | Slide between collaborative and content-based filtering |
| 👤 Existing Users | Personalized recommendations from historical ratings |
| 🆕 Cold-Start Handling | Popularity-based or liked-movie-based suggestions for new users |
| 🎯 Content-Based Mode | Recommendations from user-selected liked movies |
| 🌐 Interactive UI | Clean, browser-based interface powered by Streamlit |

🧠 How It Works

1. Data Pipeline (`create_db.py`)

- Reads `movies.csv` and `ratings.csv` using pandas
- Inserts them into a SQLite database (`recsys.db`) with two tables:
  - `movies(movieId, title, genres, ...)`
  - `ratings(userId, movieId, rating, timestamp)`
- Adds indexes on key columns for fast query performance
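The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the contents of `create_db.py`: it uses a few hard-coded rows and the standard-library `sqlite3` module instead of reading the CSVs with pandas, keeps only three `movies` columns, and the index names are made up for the example.

```python
import sqlite3

# Stand-in rows; the real pipeline reads data/movies.csv and data/ratings.csv.
movies = [
    (1, "Toy Story (1995)", "Animation|Children|Comedy"),
    (2, "Heat (1995)", "Action|Crime|Thriller"),
]
ratings = [
    (1, 1, 4.0, 964982703),
    (1, 2, 3.5, 964982931),
    (2, 1, 5.0, 835355493),
]


def build_db(path=":memory:"):
    """Create the two tables, load the rows, and index the lookup columns."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE movies (movieId INTEGER PRIMARY KEY, title TEXT, genres TEXT)"
    )
    conn.execute(
        "CREATE TABLE ratings (userId INTEGER, movieId INTEGER, "
        "rating REAL, timestamp INTEGER)"
    )
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", movies)
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", ratings)
    # Hypothetical index names; the point is to index the columns used in lookups.
    conn.execute("CREATE INDEX idx_ratings_user ON ratings(userId)")
    conn.execute("CREATE INDEX idx_ratings_movie ON ratings(movieId)")
    conn.commit()
    return conn


conn = build_db()
count = conn.execute("SELECT COUNT(*) FROM ratings WHERE userId = 1").fetchone()[0]
print(count)
```

Passing a file path such as `recsys.db` instead of `:memory:` would persist the database to disk.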

2. Model Training (`train_models.py`)

Collaborative Filtering (CF) — NMF

- Filters to the top 500 users and top 1000 movies to keep the matrix manageable
- Builds a dense user × item rating matrix `R`
- Trains `sklearn.decomposition.NMF` to decompose `R` into:
  - `user_factors`: shape `(num_users, k)`
  - `item_factors`: shape `(k, num_movies)`
- Saves `nmf_user_factors.pkl`, `nmf_item_factors.pkl`, and ID ↔ index mapping dicts
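As a rough sketch of the factorization step, the snippet below fits `sklearn.decomposition.NMF` on a small random matrix standing in for the already-filtered rating matrix. The matrix size, `k = 4`, the `nndsvda` initialisation, and `max_iter` are illustrative assumptions, not the settings used in `train_models.py`.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Toy dense rating matrix R (users x movies); 0 stands for "not rated".
R = rng.integers(0, 6, size=(20, 12)).astype(float)

k = 4  # number of latent factors (illustrative value)
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)

user_factors = model.fit_transform(R)  # shape (num_users, k)
item_factors = model.components_       # shape (k, num_movies)

# Reconstructed ratings: one predicted score per (user, movie) pair.
R_hat = user_factors @ item_factors
print(user_factors.shape, item_factors.shape, R_hat.shape)
```

Because the matrix is dense, unrated cells are treated as zeros during fitting, which is one reason the Future Improvements list mentions sparse handling.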

Content-Based Filtering (CB) — TF-IDF

- Combines each movie's title + genres into a single text field
- Applies TF-IDF vectorization across all movies
- Computes a full pairwise cosine similarity matrix
- Saves `tfidf_vectorizer.pkl`, `content_cosine_sim.npz`, and mapping dicts
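The content-based steps can be illustrated with three toy movies. This is a sketch under assumptions: the way title and genres are joined (replacing `|` with spaces) and the default `TfidfVectorizer` settings are guesses, not necessarily what `train_models.py` does.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

movies = [
    ("Toy Story (1995)", "Animation|Children|Comedy"),
    ("Antz (1998)", "Animation|Children|Comedy"),
    ("Heat (1995)", "Action|Crime|Thriller"),
]

# One text field per movie: title followed by its genres as separate words.
texts = [f"{title} {genres.replace('|', ' ')}" for title, genres in movies]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(texts)  # sparse matrix, one row per movie

# Full pairwise similarity: sim[i, j] compares movie i with movie j.
sim = cosine_similarity(tfidf)
print(sim.round(2))
```

In this toy data the two animated films share all three genre tokens, so they score as more similar to each other than either does to the thriller.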

Artifacts also saved: `movies_df.pkl`, `ratings_df.pkl`
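For illustration, the ID ↔ index mappings and a joblib round trip might look like the sketch below. The variable names and the temporary directory are assumptions made for the example; the project writes its artifacts under `models/`.

```python
import os
import tempfile

import joblib
import numpy as np

# MovieLens IDs are not contiguous, so matrix columns need an explicit mapping.
movie_ids = [1, 50, 318]
movie_to_idx = {movie_id: i for i, movie_id in enumerate(movie_ids)}
idx_to_movie = {i: movie_id for movie_id, i in movie_to_idx.items()}

item_factors = np.ones((2, len(movie_ids)))  # placeholder factor matrix

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "nmf_item_factors.pkl")
    joblib.dump(item_factors, path)  # serialize to disk
    restored = joblib.load(path)     # read back, as the recommender would

print(movie_to_idx[318], idx_to_movie[2], restored.shape)
```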

3. Recommendation Logic (`recommender.py`)

Existing User → Hybrid Recommendations

CF Score — predicted rating via NMF dot product:

CF Score = user_factors[u] · item_factors[:, m]

CB Score — for each candidate movie, averages cosine similarity against all movies the user has rated ≥ 4 stars, then normalizes to [0, 1].
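A minimal sketch of the CB score described above, using a hand-written similarity matrix. The README does not say which normalization is used, so min-max scaling is an assumption here, as is the function name `cb_scores`.

```python
import numpy as np


def cb_scores(sim, liked_idx):
    """Mean similarity of every movie to the liked set, min-max scaled to [0, 1]."""
    if len(liked_idx) == 0:
        return np.zeros(sim.shape[0])
    raw = sim[:, liked_idx].mean(axis=1)
    span = raw.max() - raw.min()
    if span == 0:
        return np.zeros_like(raw)
    return (raw - raw.min()) / span


sim = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.6],
    [0.0, 0.1, 0.6, 1.0],
])
liked = [0, 1]  # matrix indices of movies the user rated >= 4 stars
scores = cb_scores(sim, liked)
print(scores.round(3))
```

In practice the movies the user has already rated would likely be filtered out before ranking; that step is omitted here.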

Both scores are normalized and blended:

Hybrid Score = α × CF Score + (1 - α) × CB Score

α = 1.0 → Pure Collaborative Filtering
α = 0.0 → Pure Content-Based Filtering
α = 0.5 → Equal blend (recommended default)
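The blend can be sketched in a few lines of NumPy. The factor matrices and CB scores below are toy values, and min-max scaling is again an assumed normalization; `hybrid_scores` is a hypothetical helper, not a function from `recommender.py`.

```python
import numpy as np


def minmax(x):
    """Scale a score vector to [0, 1]; a constant vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def hybrid_scores(user_factors, item_factors, u, cb, alpha=0.5):
    """alpha * normalized CF score + (1 - alpha) * normalized CB score."""
    cf = user_factors[u] @ item_factors  # predicted rating for every movie
    return alpha * minmax(cf) + (1 - alpha) * minmax(cb)


user_factors = np.array([[1.0, 0.0], [0.0, 1.0]])            # (num_users, k)
item_factors = np.array([[5.0, 3.0, 1.0], [1.0, 2.0, 4.0]])  # (k, num_movies)
cb = np.array([0.0, 0.5, 1.0])                               # CB score per movie

pure_cf = hybrid_scores(user_factors, item_factors, 0, cb, alpha=1.0)
pure_cb = hybrid_scores(user_factors, item_factors, 0, cb, alpha=0.0)
blend = hybrid_scores(user_factors, item_factors, 0, cb, alpha=0.5)
print(pure_cf, pure_cb, blend)
```

For user 0 the CF and CB rankings are exact opposites in this toy data, so the equal blend gives every movie the same score; this shows how alpha shifts the ranking between the two signals.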

New User → Cold-Start Recommendations

| Method | How it works |
| --- | --- |
| Popularity-based | Score = `mean_rating × log(rating_count)`, which surfaces well-rated and widely seen films |
| Liked movies | User picks titles they enjoyed → CB cosine similarity finds the closest matches |
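The popularity score is simple enough to work through by hand. The sketch below assumes the natural logarithm (the README does not state the base) and uses made-up ratings; `popularity_scores` is a hypothetical helper name.

```python
import math
from collections import defaultdict

# (userId, movieId, rating) triples; values are illustrative.
ratings = [
    (1, 10, 5.0), (2, 10, 4.0), (3, 10, 4.5),
    (1, 20, 5.0),
    (1, 30, 3.0), (2, 30, 3.5),
]


def popularity_scores(ratings):
    """Return {movieId: mean_rating * ln(rating_count)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _, movie_id, rating in ratings:
        sums[movie_id] += rating
        counts[movie_id] += 1
    return {m: (sums[m] / counts[m]) * math.log(counts[m]) for m in counts}


scores = popularity_scores(ratings)
top = sorted(scores, key=scores.get, reverse=True)
print(top)
```

Movie 20 has a perfect 5.0 from a single user, but `log(1) = 0` pushes it to the bottom, which is the intended effect of weighting by rating count.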

🏗️ Project Structure

mini-recommender/
│
├── data/
│   ├── movies.csv
│   └── ratings.csv
│
├── models/
│   ├── nmf_user_factors.pkl
│   ├── nmf_item_factors.pkl
│   ├── tfidf_vectorizer.pkl
│   └── content_cosine_sim.npz
│
├── create_db.py        # Sets up SQLite database
├── train_models.py     # Trains NMF and TF-IDF models
├── recommender.py      # Core recommendation logic
├── app.py              # Streamlit frontend
├── recsys.db           # SQLite database (auto-generated)
├── requirements.txt
└── README.md

⚙️ Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Streamlit |
| Backend | Python 3.8+ |
| Database | SQLite |
| ML / Math | scikit-learn, scipy, numpy |
| Data | pandas |
| Serialization | joblib |

🛠️ Installation & Setup

Prerequisites

Python 3.8+
MovieLens dataset (movies.csv and ratings.csv)

1. Clone the Repository

git clone https://github.com/yourusername/mini-recommender.git
cd mini-recommender

2. Install Dependencies

pip install -r requirements.txt

3. Add Dataset

Place the MovieLens CSV files in the data/ directory:

data/
├── movies.csv
└── ratings.csv

4. Create the Database

python create_db.py

5. Train the Models

python train_models.py

6. Launch the App

streamlit run app.py

The app will open at http://localhost:8501 in your browser.

🖥️ Usage

Tab 1 — Existing User

Select a User ID from the dropdown
Choose N (number of recommendations) and adjust the alpha slider
Get hybrid recommendations with CF + CB scores
See the user's own top-rated movies alongside results

Tab 2 — New User (Cold Start)

Option A — Popularity-based:
Recommends widely seen, well-rated movies using `mean_rating × log(rating_count)` scoring.

Option B — Liked movies:
Pick titles you've enjoyed from a dropdown → content-based similarity returns the closest matches.
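Option B might be implemented along these lines. This is a sketch with a hand-written similarity matrix and a hypothetical `similar_to_liked` helper; the real app looks titles up through its saved mapping dicts rather than with `list.index`.

```python
import numpy as np

titles = ["Toy Story (1995)", "Antz (1998)", "Heat (1995)", "Ronin (1998)"]
sim = np.array([
    [1.00, 0.80, 0.10, 0.05],
    [0.80, 1.00, 0.20, 0.10],
    [0.10, 0.20, 1.00, 0.70],
    [0.05, 0.10, 0.70, 1.00],
])


def similar_to_liked(liked_titles, titles, sim, n=2):
    """Rank movies by mean cosine similarity to the user's picks."""
    idx = [titles.index(t) for t in liked_titles]
    scores = sim[idx].mean(axis=0)  # fancy indexing copies, so sim is untouched
    scores[idx] = -np.inf           # never recommend the picks themselves
    order = np.argsort(scores)[::-1][:n]
    return [titles[i] for i in order]


recs = similar_to_liked(["Heat (1995)"], titles, sim, n=2)
print(recs)
```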

📊 Output

Each recommendation includes:

| Field | Description |
| --- | --- |
| 🎬 Movie Title | Recommended movie name |
| ⭐ CF Score | Predicted rating from collaborative filtering |
| 🔀 Hybrid Score | Weighted combination of CF + CB scores |
| ❤️ Liked Movies | The user's own top-rated movies (existing users only) |

📸 Screenshots

💡 Future Improvements

Real-time user feedback loop for online learning
Neural Collaborative Filtering (NCF / deep learning)
Sparse matrix handling + ALS for larger datasets (e.g. implicit library)
Replace the truncated NMF setup (top 500 users, top 1000 movies) with the Surprise library for better CF
REST API with FastAPI or Flask
Explicit model retrain path within the Streamlit app
Error handling for missing or corrupt model files
Docker containerization & cloud deployment (AWS / GCP / Heroku)

👨‍💻 Author

Reth Rebello
Engineering Student · ML Enthusiast · Data Analytics · Software Developer

🤝 Contributing

Contributions are welcome! Here's how:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/your-feature`
3. Commit your changes: `git commit -m 'Add your feature'`
4. Push to the branch: `git push origin feature/your-feature`
5. Open a Pull Request

📄 License

This project is licensed under the MIT License.

If you found this useful, give it a ⭐ on GitHub!
