🎬 CineMatch — Movie Recommendation System

A content-based movie recommendation engine built in Python and served as an interactive Streamlit web app. Enter any movie and instantly get 5 similar recommendations with posters, powered by NLP and cosine similarity on the TMDB 5000 dataset.

📸 Preview

🧠 How It Works

1. Each movie's overview, genres, keywords, cast and director are combined into a single tags string.
2. Tags are stemmed using NLTK's PorterStemmer to normalize word forms.
3. A CountVectorizer converts all tags into a 4806 × 5000 word-count matrix (one row per movie, one column per vocabulary word).
4. Cosine similarity is computed between every pair of movies, giving a 4806 × 4806 similarity matrix.
5. Given a movie, the app returns the top 5 most similar movies by cosine score.
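
The steps above can be illustrated with a small, dependency-free sketch. It is not the project's code: the project uses scikit-learn's CountVectorizer and cosine similarity on the real dataset, while this version uses only the standard library, skips stemming, and runs on a made-up four-movie corpus. The titles, tags, and the function names `count_vector`, `cosine`, and `recommend` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy "tags" strings; the real ones are built from the TMDB 5000 CSVs.
TAGS = {
    "Space Saga": "space alien war pilot rebel",
    "Star Patrol": "space alien pilot rescue mission",
    "Love in Paris": "romance paris love letter",
    "Paris Nights": "romance paris love artist",
}


def count_vector(tags: str) -> Counter:
    """Bag-of-words counts for one tags string (one row of a word-count matrix)."""
    return Counter(tags.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is empty."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title: str, top_n: int = 5) -> list:
    """Return up to top_n other titles, ranked by cosine score against `title`."""
    vectors = {name: count_vector(tags) for name, tags in TAGS.items()}
    target = vectors[title]
    scores = [(cosine(target, vec), name) for name, vec in vectors.items() if name != title]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scores[:top_n]]


if __name__ == "__main__":
    print(recommend("Space Saga"))
```

Two movies that share three of their five tag words score 3 / (√5 · √5) = 0.6, while movies with no words in common score 0, which is why "Star Patrol" ranks first for "Space Saga" here.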

🗂️ Project Structure

cinematch/
├── app.py                          # Streamlit web app
├── netflix_recommendation.py       # Full notebook logic (all steps)
├── movies.pkl                      # Preprocessed movie dataframe
├── similarity.pkl                  # Cosine similarity matrix
├── requirements.txt                # Python dependencies
└── README.md                       # You are here

⬇️ Downloads

You need two dataset files (the TMDB 5000 dataset, originally published on Kaggle) and two model files to run this project.

1. Dataset Files — from Google Drive

Download both CSV files and place them in the project root:

| File | Description | Link |
| --- | --- | --- |
| `tmdb_5000_movies.csv` | Movie metadata (budget, genres, keywords, overview) | ⬇️ Download |
| `tmdb_5000_credits.csv` | Cast and crew information | ⬇️ Download |

2. Model Files — pkl files

If you don't want to run the notebook yourself, download the pre-built model files:

| File | Description | Link |
| --- | --- | --- |
| `movies.pkl` | Preprocessed movie titles + tags | ⬇️ Download |
| `similarity.pkl` | 4806×4806 cosine similarity matrix | ⬇️ Download |

Place all downloaded files in the project root folder alongside app.py.
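
Once the files are in place, reading them back and looking up neighbours in the precomputed matrix might look roughly like the sketch below. This is an illustration under assumptions, not the code in app.py: `load_pickle` and `top_similar` are hypothetical helper names, and the sketch assumes the similarity object can be indexed like a square matrix (a list of lists works, as does a NumPy array).

```python
import pickle


def load_pickle(path):
    """Load one pickled object from disk.

    Unpickling movies.pkl needs pandas installed, since it stores a dataframe.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


def top_similar(similarity, index, top_n=5):
    """Return the row indices of the top_n most similar items to `index`.

    `similarity` is assumed to be a square matrix where similarity[i][j]
    is the cosine score between items i and j.
    """
    row = similarity[index]
    ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    # Drop the item itself (it always has the maximum score of 1.0 with itself).
    return [j for j in ranked if j != index][:top_n]


if __name__ == "__main__":
    # Tiny hand-written matrix standing in for similarity.pkl.
    toy = [
        [1.0, 0.2, 0.9],
        [0.2, 1.0, 0.4],
        [0.9, 0.4, 1.0],
    ]
    print(top_similar(toy, 0, top_n=2))
```

The returned indices would then be mapped back to titles through the rows of the movies dataframe.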

🚀 Run Locally

Step 1 — Clone the repository

git clone https://github.com/YOUR_USERNAME/cinematch.git
cd cinematch

Step 2 — Create a virtual environment

python3 -m venv venv
source venv/bin/activate        # Mac / Linux
venv\Scripts\activate           # Windows

Step 3 — Install dependencies

pip install -r requirements.txt

Step 4 — Add your TMDB API key

Open app.py and replace the API key on this line:

API_KEY = "your_api_key_here"

Get a free key at → themoviedb.org (Sign up → Settings → API → Create → Developer → Copy API Key v3)
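
For context on what the key is used for, a poster lookup against the TMDB v3 API could be sketched as follows. This is a minimal illustration, not the code in app.py: the helper names are hypothetical, the `w500` image size is an arbitrary choice, and the sketch uses the standard library's `urllib` to stay self-contained, whereas the project's requirements list `requests`.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"  # w500 = poster width variant


def build_movie_url(movie_id: int, api_key: str) -> str:
    """Build the movie-details URL, passing the v3 key as a query parameter."""
    query = urllib.parse.urlencode({"api_key": api_key})
    return TMDB_MOVIE_URL.format(movie_id=movie_id) + "?" + query


def poster_url_from_payload(payload: dict) -> Optional[str]:
    """Turn the `poster_path` field of a movie-details response into an image URL."""
    poster_path = payload.get("poster_path")  # e.g. "/abc123.jpg", or None
    if not poster_path:
        return None
    return TMDB_IMAGE_BASE + poster_path


def fetch_poster(movie_id: int, api_key: str, timeout: float = 10.0) -> Optional[str]:
    """Call TMDB and return the poster URL, or None if the movie has no poster."""
    url = build_movie_url(movie_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return poster_url_from_payload(payload)
```

If the key is invalid or not yet active, the request fails with an HTTP error instead of returning a poster, which matches the missing-posters symptom described in the FAQ.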

Step 5 — Download the pkl files

Download movies.pkl and similarity.pkl from the links above and place them in the project root.

Or generate them yourself by running all cells in netflix_recommendation.py inside Jupyter:

pip install jupyter
jupyter notebook
# Open netflix_recommendation.py and run all cells
# This will generate movies.pkl and similarity.pkl automatically

Step 6 — Launch the app

streamlit run app.py

The app will open at http://localhost:8501 in your browser.

🛠️ Tech Stack

| Tool | Purpose |
| --- | --- |
| Python 3.10+ | Core language |
| Pandas & NumPy | Data manipulation |
| NLTK | Text stemming |
| Scikit-learn | CountVectorizer + Cosine Similarity |
| Streamlit | Web app framework |
| TMDB API | Fetching movie posters |

📦 Requirements

streamlit
pandas
numpy
scikit-learn
nltk
requests

Install with:

pip install -r requirements.txt

Note: similarity.pkl is ~90MB. If GitHub rejects it, use Git LFS or host it on Google Drive and load it in the app via gdown.
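
If you go the Google Drive route, a download-if-missing helper could look like the sketch below. It is an assumption-laden example rather than part of the project: `ensure_file` is a hypothetical name, the Drive file ID is a placeholder you would replace with your own, and `gdown` is not in the requirements list above, so it would need to be installed separately (`pip install gdown`).

```python
from pathlib import Path


def ensure_file(path: str, drive_file_id: str) -> Path:
    """Return `path`, downloading it from Google Drive first if it is missing."""
    target = Path(path)
    if target.exists():
        return target  # already present, nothing to download

    # Imported lazily so the app still starts when the file is already on disk
    # and gdown is not installed.
    import gdown

    url = f"https://drive.google.com/uc?id={drive_file_id}"
    gdown.download(url, str(target), quiet=False)
    return target


if __name__ == "__main__":
    # "YOUR_DRIVE_FILE_ID" is a placeholder, not a real ID.
    ensure_file("similarity.pkl", "YOUR_DRIVE_FILE_ID")
```

Calling the helper once at startup, before loading the pickle, keeps the large file out of the Git repository entirely.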

🙋 FAQ

Q: The app loads but posters are missing? Your TMDB API key may not be activated yet. It can take up to 30 minutes after signup. Replace API_KEY in app.py with your key.

Q: similarity.pkl is too large to push to GitHub? Use Git LFS:

brew install git-lfs       # Mac
git lfs install
git lfs track "*.pkl"
git add .gitattributes similarity.pkl
git commit -m "Add model via LFS"
git push

Q: How do I regenerate the model files? Run all cells in netflix_recommendation.py in Jupyter Notebook. It will create fresh movies.pkl and similarity.pkl files.

👩‍💻 Author

Gargi Joshi

GitHub: @gargijoshi9
LinkedIn: Gargi Joshi

📄 License

This project is open source under the MIT License.

Built with ♥ using Python · Scikit-learn · NLTK · Streamlit · TMDB API
