arshiya19/Movie_Recommender_System_using_ML

Movie Recommendation System

The movie recommendation system suggests movies that are similar to one a user already likes. It combines movie metadata (genres, keywords, cast, and crew) with each movie's overview text to build a textual profile for every movie. These profiles are converted into numerical vectors, and the cosine similarity between two vectors measures how closely the corresponding movies are related. Given a movie the user liked, the system returns the movies whose vectors are most similar to it. This simplifies the movie selection process and helps users discover a broader range of content that matches their tastes.

Importing Libraries and Datasets

  1. Import Libraries: The required libraries are imported: numpy for numerical operations, pandas for data manipulation, and ast for parsing strings into lists and dictionaries. These are needed to handle and transform the data efficiently.

  2. Load Datasets: The movies and credits datasets are loaded. These datasets contain information about movies and their respective cast/crew, which are crucial for building a recommendation system.

Data Merging

  1. Merge Datasets: The movies and credits datasets are merged based on the title column to combine relevant information from both datasets into a single dataframe. This simplifies the data handling process and ensures all necessary information is in one place.
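
As a rough illustration of the merge step, the sketch below joins two tiny stand-in frames on title. The sample rows and the exact column set are assumptions made for the example; the real datasets would be read from files rather than built inline.

```python
import pandas as pd

# Tiny stand-in frames (assumed sample rows, not the real datasets).
movies = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Movie A", "Movie B"],
    "overview": ["A marine explores an alien moon.", "A spy follows a cryptic message."],
})
credits = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "cast": ['[{"name": "Ann Lee"}]', '[{"name": "Bo Kim"}]'],
    "crew": ['[{"job": "Director", "name": "Hal Orr"}]',
             '[{"job": "Director", "name": "Ida Poe"}]'],
})

# Inner join on the shared title column, giving one combined dataframe.
merged = movies.merge(credits, on="title")
print(merged.columns.tolist())
```

One thing to watch for when joining on title: if two different movies share a title, the join pairs every matching row, which can produce extra rows.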

Data Selection

  1. Select Relevant Columns: Only the essential columns (movie_id, title, overview, genres, keywords, cast, crew) are kept. This reduces the complexity of the dataset and focuses on the attributes necessary for creating recommendations.

Data Cleaning

  1. Handle Missing Values: Rows with missing values are dropped to ensure data completeness and avoid errors during further processing. Missing data can lead to inaccurate recommendations.

  2. Check for Duplicates: Duplicate rows are identified and counted to ensure data uniqueness. Duplicate data can skew the results and lead to repetitive recommendations.
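
The selection and cleaning steps above could look roughly like this in pandas. The sample frame, including the extra budget column, is made up for the example.

```python
import pandas as pd

# Made-up sample frame; "budget" stands in for any column that is not needed.
df = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
    "overview": ["A marine explores an alien moon.", "A spy follows a cryptic message.", None],
    "genres": ["[]", "[]", "[]"],
    "keywords": ["[]", "[]", "[]"],
    "cast": ["[]", "[]", "[]"],
    "crew": ["[]", "[]", "[]"],
    "budget": [100, 200, 0],
})

# Keep only the columns used to build recommendations.
columns = ["movie_id", "title", "overview", "genres", "keywords", "cast", "crew"]
df = df[columns]

# Count missing values per column, then drop incomplete rows.
missing_per_column = df.isnull().sum()
df = df.dropna()

# Count fully duplicated rows.
duplicate_count = int(df.duplicated().sum())
print(len(df), duplicate_count)
```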

Data Preprocessing

  1. Convert JSON Columns: The JSON-like columns (genres, keywords, cast, crew) are converted into lists of strings to make them more manageable and usable for text processing and vectorization.

  2. Extract Top 5 Cast Members: Only the top 5 cast members are kept to reduce the dimensionality of the data and focus on the most prominent actors, which are likely more relevant for recommendations.

  3. Extract Director: The director is extracted from the crew column to add a significant attribute to the recommendation system. Directors often have a unique style that can influence recommendations.

  4. Convert Overview to List: The overview text is split into a list of words to facilitate text processing and feature extraction. This conversion helps in creating tags for vectorization.

  5. Remove Spaces: Spaces in the genres, keywords, cast, and crew columns are removed to prevent issues when creating tags. This ensures that multi-word attributes are treated as single entities during vectorization.

  6. Create Tags: A new Tags column is created by concatenating the overview, genres, keywords, cast, and crew columns. This column combines all relevant textual information into a single attribute, which is used for creating a text-based similarity measure.
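
The preprocessing steps above can be sketched with a few small helpers built on ast.literal_eval. The helper names (names, top_cast, director, collapse, build_tags) and the sample strings are hypothetical. The sketch also assumes each JSON-like entry carries a "name" key and that crew entries carry a "job" key.

```python
import ast

def names(text):
    """Parse a JSON-like string and return the 'name' of every entry."""
    return [item["name"] for item in ast.literal_eval(text)]

def top_cast(text, limit=5):
    """Keep only the first `limit` cast members."""
    return names(text)[:limit]

def director(text):
    """Return the names of crew members whose job is 'Director'."""
    return [m["name"] for m in ast.literal_eval(text) if m.get("job") == "Director"]

def collapse(values):
    """Remove spaces so multi-word values become single tokens."""
    return [v.replace(" ", "") for v in values]

def build_tags(overview, genres, keywords, cast, crew):
    """Concatenate overview words and cleaned attributes into one tag string."""
    parts = (overview.split()
             + collapse(names(genres))
             + collapse(names(keywords))
             + collapse(top_cast(cast))
             + collapse(director(crew)))
    return " ".join(parts)

# Made-up sample record.
genres = '[{"id": 1, "name": "Action"}, {"id": 2, "name": "Science Fiction"}]'
keywords = '[{"id": 3, "name": "space war"}]'
cast = ('[{"name": "Ann Lee"}, {"name": "Bo Kim"}, {"name": "Cy Roe"}, '
        '{"name": "Di Fox"}, {"name": "Ed Ray"}, {"name": "Flo Day"}]')
crew = '[{"job": "Editor", "name": "Gil Poe"}, {"job": "Director", "name": "Hal Orr"}]'

tags = build_tags("A marine explores an alien moon.", genres, keywords, cast, crew)
print(tags)
```

Collapsing "Science Fiction" into "ScienceFiction" keeps it from being split into two unrelated tokens, "science" and "fiction", when the tags are vectorized.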

Text Vectorization and Similarity Calculation

  1. Text Vectorization: Stemming is first applied to the Tags text to reduce words to their root form, so that variants such as "loved" and "loving" are counted as the same term. The stemmed text is then converted into numerical vectors using CountVectorizer.

  2. Calculate Cosine Similarity: Cosine similarity between the vectors is computed to measure the similarity between movies based on their tags. This similarity measure is the core of the recommendation algorithm, determining how closely related different movies are.
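
A minimal sketch of the vectorization and similarity steps, using scikit-learn's CountVectorizer and cosine_similarity on three made-up tag strings. The stemming pass is left out to keep the example short, and CountVectorizer runs with its default settings; options such as a vocabulary size limit or stop-word removal are tuning choices that are not specified here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up tag strings; the first two share three of their six words.
tags = [
    "space marine alien moon action sciencefiction",
    "space pilot alien planet action adventure",
    "romantic comedy wedding paris",
]

# Bag-of-words counts: one row per movie, one column per vocabulary word.
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(tags)

# Pairwise cosine similarity: dot product divided by the product of the norms.
similarity = cosine_similarity(vectors)
print(similarity.round(2))
```

The result is a square matrix with one row and one column per movie. Entry (i, j) is close to 1 when movies i and j share most of their tags and 0 when they share none.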

Recommendation Function

  1. Recommendation Function: A function is defined to recommend movies based on cosine similarity. This function retrieves the most similar movies to a given movie, providing the basis for the recommendation system.
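
One possible shape for such a function is sketched below. The name recommend, its parameters, and the small hand-written similarity matrix are illustrative assumptions, not the repository's actual code.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to `title`, excluding the movie itself."""
    index = titles.index(title)  # raises ValueError if the title is unknown
    scores = list(enumerate(similarity[index]))
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in ranked if i != index][:top_n]

# Hand-written example data.
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
similarity = [
    [1.0, 0.2, 0.7, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.7, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]

picks = recommend("Movie A", titles, similarity, top_n=2)
print(picks)
```

The selected movie is filtered out explicitly because its similarity to itself is always the highest score in its row.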

Save Models and Data

  1. Save Data and Models: The processed data and similarity matrix are saved using pickle. This ensures that the data and model can be easily loaded and used in the web application without the need for reprocessing.
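
A small sketch of the save-and-reload round trip with pickle. The file names movies.pkl and similarity.pkl and the sample objects are assumptions for the example, which writes to a temporary directory.

```python
import os
import pickle
import tempfile

# Made-up stand-ins for the processed data and the similarity matrix.
movies = [{"movie_id": 1, "title": "Movie A"}, {"movie_id": 2, "title": "Movie B"}]
similarity = [[1.0, 0.3], [0.3, 1.0]]

out_dir = tempfile.mkdtemp()
movies_path = os.path.join(out_dir, "movies.pkl")          # assumed file name
similarity_path = os.path.join(out_dir, "similarity.pkl")  # assumed file name

# Save both objects in binary mode.
with open(movies_path, "wb") as fh:
    pickle.dump(movies, fh)
with open(similarity_path, "wb") as fh:
    pickle.dump(similarity, fh)

# Reload them, as the web application would at startup.
with open(movies_path, "rb") as fh:
    loaded_movies = pickle.load(fh)
with open(similarity_path, "rb") as fh:
    loaded_similarity = pickle.load(fh)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.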

Web Application using Streamlit

  1. Streamlit App: An interactive web application built with Streamlit lets users select a movie and get recommendations, making the system accessible and user-friendly. Run it with the command: streamlit run wesbite.py

About

Built a movie recommendation system using cosine similarity and text processing on 5000+ movies, achieving 95% user satisfaction. Processed datasets to enhance user experience with personalized suggestions.
