Movie Recommendation System

The movie recommendation system provides users with personalized movie suggestions. It combines movie metadata (genres, keywords, cast, and crew) with text processing to build a profile for each movie. Textual descriptions and attributes are transformed into numerical vectors, and cosine similarity between those vectors measures how alike two movies are. The system can then recommend the movies most similar to one a user liked or watched before. Tailored suggestions help users discover new movies that match their tastes, which simplifies movie selection and can increase engagement and satisfaction.

Importing Libraries and Datasets

Import Libraries: The notebook imports numpy for numerical operations, pandas for data manipulation, and ast for parsing strings into lists and dictionaries. These libraries handle and transform the data efficiently.
Load Datasets: The movies and credits datasets are loaded. These datasets contain information about movies and their respective cast/crew, which are crucial for building a recommendation system.
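
As a rough illustration of the loading step, the sketch below reads two CSV sources with pandas. The tiny in-memory CSVs and their column layout are hypothetical stand-ins so the snippet runs on its own; in the project you would pass the paths of the movies and credits CSV files to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical miniature CSVs standing in for the real dataset files.
# The column layout here is an assumption made for illustration only.
movies_csv = io.StringIO(
    "title,overview,genres,keywords\n"
    "Alpha,A pilot explores a distant moon,[],[]\n"
    "Beta,A heist crew meets a detective,[],[]\n"
)
credits_csv = io.StringIO(
    "movie_id,title,cast,crew\n"
    "1,Alpha,[],[]\n"
    "2,Beta,[],[]\n"
)

# pd.read_csv accepts a file path or any file-like object.
movies = pd.read_csv(movies_csv)
credits = pd.read_csv(credits_csv)

print(movies.shape, credits.shape)
```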

Data Merging

Merge Datasets: The movies and credits datasets are merged based on the title column to combine relevant information from both datasets into a single dataframe. This simplifies the data handling process and ensures all necessary information is in one place.
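
A minimal sketch of the merge, assuming both dataframes carry a title column as described above. The toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy frames; only the shared "title" column matters here.
movies = pd.DataFrame({
    "title": ["Alpha", "Beta"],
    "overview": ["A pilot explores a distant moon", "A heist goes wrong"],
})
credits = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Alpha", "Beta"],
    "cast": ["[]", "[]"],
    "crew": ["[]", "[]"],
})

# Inner join (the pandas default) on the shared "title" column.
merged = movies.merge(credits, on="title")

print(merged.columns.tolist())
```

Merging on title assumes titles are unique. If two different movies share a title, the join produces one row per matching pair, so it can be worth checking the row count after merging.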

Data Selection

Select Relevant Columns: Only the essential columns (movie_id, title, overview, genres, keywords, cast, crew) are kept. This reduces the complexity of the dataset and focuses on the attributes necessary for creating recommendations.
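
Column selection might look like the sketch below. The budget column is a hypothetical example of an attribute that gets dropped, and the toy row is made up.

```python
import pandas as pd

# Hypothetical merged frame with one extra column ("budget") to discard.
merged = pd.DataFrame({
    "movie_id": [1],
    "title": ["Alpha"],
    "overview": ["A pilot explores a distant moon"],
    "genres": ["[]"],
    "keywords": ["[]"],
    "cast": ["[]"],
    "crew": ["[]"],
    "budget": [1000000],
})

# The seven columns named in this section.
columns = ["movie_id", "title", "overview", "genres", "keywords", "cast", "crew"]
movies = merged[columns]

print(movies.columns.tolist())
```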

Data Cleaning

Handle Missing Values: Rows with missing values are dropped to ensure data completeness and avoid errors during further processing. Missing data can lead to inaccurate recommendations.
Check for Duplicates: Duplicate rows are identified and counted to ensure data uniqueness. Duplicate data can skew the results and lead to repetitive recommendations.
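
The cleaning steps above can be sketched as follows, using hypothetical toy rows. Dropping the duplicates at the end is an optional extra step added here for illustration; the description above only counts them.

```python
import pandas as pd

# Hypothetical rows: one missing overview and one exact duplicate.
movies = pd.DataFrame({
    "movie_id": [1, 2, 2, 3],
    "title": ["Alpha", "Beta", "Beta", "Gamma"],
    "overview": ["first plot", "second plot", "second plot", None],
})

# Count missing values per column, then drop incomplete rows.
missing_per_column = movies.isnull().sum()
movies = movies.dropna()

# Count rows that exactly repeat an earlier row.
duplicate_count = movies.duplicated().sum()

# Optional (not stated in the text above): remove the repeats.
movies = movies.drop_duplicates()

print(missing_per_column["overview"], duplicate_count, len(movies))
```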

Data Preprocessing

Convert JSON Columns: The JSON-like columns (genres, keywords, cast, crew) are converted into lists of strings to make them more manageable and usable for text processing and vectorization.
Extract Top 5 Cast Members: Only the top 5 cast members are kept to reduce the dimensionality of the data and focus on the most prominent actors, which are likely more relevant for recommendations.
Extract Director: The director is extracted from the crew column to add a significant attribute to the recommendation system. Directors often have a unique style that can influence recommendations.
Convert Overview to List: The overview text is split into a list of words to facilitate text processing and feature extraction. This conversion helps in creating tags for vectorization.
Remove Spaces: Spaces in the genres, keywords, cast, and crew columns are removed to prevent issues when creating tags. This ensures that multi-word attributes are treated as single entities during vectorization.
Create Tags: A new Tags column is created by concatenating the overview, genres, keywords, cast, and crew columns. This column combines all relevant textual information into a single attribute, which is used for creating a text-based similarity measure.
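
The preprocessing steps above can be sketched as one small pipeline. The helper names (names, director, squash), the "name" and "job" keys, and the sample row are assumptions made for illustration, not the notebook's actual code.

```python
import ast

import pandas as pd


def names(text, limit=None):
    """Parse a JSON-like string and return the 'name' of each entry."""
    picked = [item["name"] for item in ast.literal_eval(text)]
    return picked if limit is None else picked[:limit]


def director(text):
    """Return the names of crew entries whose job is 'Director'."""
    return [m["name"] for m in ast.literal_eval(text) if m.get("job") == "Director"]


def squash(values):
    """Remove spaces so multi-word names become single tokens."""
    return [value.replace(" ", "") for value in values]


# Hypothetical single-row dataset with JSON-like string columns.
movies = pd.DataFrame({
    "title": ["Alpha"],
    "overview": ["A pilot explores a distant moon"],
    "genres": ['[{"id": 1, "name": "Science Fiction"}]'],
    "keywords": ['[{"id": 2, "name": "space war"}]'],
    "cast": ['[{"name": "Actor One"}, {"name": "Actor Two"}, {"name": "Actor Three"},'
             ' {"name": "Actor Four"}, {"name": "Actor Five"}, {"name": "Actor Six"}]'],
    "crew": ['[{"job": "Editor", "name": "John Roe"}, {"job": "Director", "name": "Jane Doe"}]'],
})

movies["genres"] = movies["genres"].apply(names).apply(squash)
movies["keywords"] = movies["keywords"].apply(names).apply(squash)
movies["cast"] = movies["cast"].apply(lambda text: names(text, limit=5)).apply(squash)
movies["crew"] = movies["crew"].apply(director).apply(squash)
movies["overview"] = movies["overview"].apply(str.split)

# Adding Series of lists concatenates the lists row by row.
tag_lists = (movies["overview"] + movies["genres"] + movies["keywords"]
             + movies["cast"] + movies["crew"])
movies["Tags"] = tag_lists.apply(" ".join)

print(movies.loc[0, "Tags"])
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for turning these strings into lists of dictionaries.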

Text Vectorization and Similarity Calculation

Text Vectorization: The Tags text is converted into vectors using CountVectorizer, which transforms text into numerical representations. Stemming is applied to reduce words to their root form, ensuring that similar words are treated the same.
Calculate Cosine Similarity: Cosine similarity between the vectors is computed to measure the similarity between movies based on their tags. This similarity measure is the core of the recommendation algorithm, determining how closely related different movies are.
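
A minimal sketch of vectorization and similarity with scikit-learn. The max_features and stop_words settings and the three tag strings are assumptions for illustration. Stemming is left out of this sketch; it would be applied to the tag text before vectorizing.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tag strings for three movies.
tags = [
    "space pilot alien moon sciencefiction",
    "space pilot alien planet sciencefiction",
    "romantic comedy wedding paris",
]

# Bag-of-words counts; the vocabulary cap and stop-word list are assumed settings.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(tags).toarray()

# similarity[i, j] is the cosine of the angle between movie i and movie j.
similarity = cosine_similarity(vectors)

print(similarity.round(2))
```

Cosine similarity depends only on the direction of the count vectors, so a movie with a long tag string is not favored over one with a short tag string just because it has more words.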

Recommendation Function

Recommendation Function: A function is defined to recommend movies based on cosine similarity. This function retrieves the most similar movies to a given movie, providing the basis for the recommendation system.
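
One way such a function could look is sketched below. The signature, the top_n parameter, and the toy similarity matrix are assumptions, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical titles and a hand-written symmetric similarity matrix.
movies = pd.DataFrame({"title": ["Alpha", "Beta", "Gamma", "Delta"]})
similarity = np.array([
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])


def recommend(title, movies, similarity, top_n=5):
    """Return up to top_n titles most similar to the given title."""
    # Row positions (not index labels) of movies with this title.
    positions = np.flatnonzero((movies["title"] == title).to_numpy())
    if len(positions) == 0:
        return []  # unknown title
    position = positions[0]

    # Sort all movies by similarity, highest first, skipping the movie itself.
    ranked = np.argsort(similarity[position])[::-1]
    ranked = [i for i in ranked if i != position][:top_n]
    return movies["title"].iloc[ranked].tolist()


print(recommend("Alpha", movies, similarity, top_n=2))
```

Looking rows up by position rather than by index label keeps the function correct even if earlier cleaning steps left gaps in the dataframe index.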

Save Models and Data

Save Data and Models: The processed data and similarity matrix are saved using pickle. This ensures that the data and model can be easily loaded and used in the web application without the need for reprocessing.
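
Saving and reloading with pickle might look like this sketch. The file names and the temporary output directory are hypothetical choices for illustration.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical processed data and similarity matrix.
movies = pd.DataFrame({"movie_id": [1, 2], "title": ["Alpha", "Beta"]})
similarity = np.array([[1.0, 0.8], [0.8, 1.0]])

# Hypothetical output location; the project would use its own folder.
output_dir = Path(tempfile.mkdtemp())

with open(output_dir / "movies.pkl", "wb") as f:
    pickle.dump(movies, f)
with open(output_dir / "similarity.pkl", "wb") as f:
    pickle.dump(similarity, f)

# Later, for example in the web application, load both back.
with open(output_dir / "movies.pkl", "rb") as f:
    loaded_movies = pickle.load(f)
with open(output_dir / "similarity.pkl", "rb") as f:
    loaded_similarity = pickle.load(f)

print(loaded_movies.shape, loaded_similarity.shape)
```

Pickle files can execute arbitrary code when loaded, so only load files that you created or otherwise trust.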

Web Application using Streamlit

Streamlit App: An interactive web application is created using Streamlit to allow users to interact with the recommendation system. Users can select a movie and get recommendations, making the system accessible and user-friendly. Run it with the command: streamlit run website.py

