MDenizCan/MovieMindAI

🎬 MovieMindAI

A RAG-based (Retrieval-Augmented Generation) movie and TV show recommendation system powered by Ollama and Mistral.

Features

  • 🔍 Semantic Search - Natural language queries to find movies/TV shows
  • 🎯 Smart Filtering - Filter by genre, year, rating, actors, directors, country
  • 💬 Conversational AI - Context-aware responses with conversation history
  • 📊 Quality Database - 6,000+ highly rated titles (IMDb rating ≥ 7 with 10K+ votes)
  • Local LLM - Runs entirely on your machine using Ollama
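
As a rough illustration of the Smart Filtering feature, the sketch below applies optional genre, year, and rating filters to a small in-memory catalog. The `Title` record and the `apply_filters` function are hypothetical names chosen for this example, and the ratings are illustrative values; the project's real schema and filter logic may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    # Hypothetical record shape; the project's real schema may differ.
    name: str
    year: int
    rating: float
    genres: set = field(default_factory=set)


def apply_filters(titles, genre=None, year_range=None, min_rating=None):
    """Return the titles that satisfy every filter that was supplied."""
    results = []
    for t in titles:
        # Genre comparison is case-insensitive.
        if genre is not None and genre.lower() not in {g.lower() for g in t.genres}:
            continue
        # year_range is an inclusive (start, end) pair.
        if year_range is not None and not (year_range[0] <= t.year <= year_range[1]):
            continue
        if min_rating is not None and t.rating < min_rating:
            continue
        results.append(t)
    return results


# Illustrative sample rows, not taken from the project's database.
catalog = [
    Title("Inception", 2010, 8.8, {"Action", "Sci-Fi"}),
    Title("The Matrix", 1999, 8.7, {"Action", "Sci-Fi"}),
    Title("Amelie", 2001, 8.3, {"Comedy", "Romance"}),
]
matches = apply_filters(catalog, genre="action", year_range=(2000, 2010))
print([t.name for t in matches])
```

Filters left as `None` are skipped, so the same function can serve a query with one constraint or several.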

Architecture

User Query → Embedding → FAISS Vector Search → Retrieved Context → Ollama/Mistral → Response
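
The flow above can be sketched with the standard library alone. This is a toy under stated assumptions, not the project's implementation: a bag-of-words counter stands in for the real embedding model, a brute-force cosine ranking stands in for the FAISS index, and the function names (`embed`, `retrieve`, `build_prompt`) are hypothetical. The final step, sending the prompt to Ollama/Mistral, is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (stand-in for a neural embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Brute-force nearest-neighbour search (stand-in for a FAISS index)."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_docs):
    """Combine retrieved context and the user query into one LLM prompt."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Use only this context to answer.\n{context}\n\nQuestion: {query}"


# Illustrative documents written for this example.
docs = [
    "Inception: a thief enters dreams to plant an idea",
    "The Matrix: a hacker learns reality is a simulation",
    "Amelie: a shy waitress quietly improves the lives of others",
]
query = "movie about dreams and a thief"
top = retrieve(query, docs, k=1)
prompt = build_prompt(query, top)
print(prompt)
```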

Requirements

  • Python 3.10+
  • Ollama installed locally
  • GPU recommended (for faster embeddings)

Setup

1. Install Dependencies

pip install -r requirements.txt

2. Download IMDb Dataset

Download the following files from IMDb Datasets:

  • title.basics.tsv.gz
  • title.ratings.tsv.gz
  • title.crew.tsv.gz
  • title.principals.tsv.gz
  • title.akas.tsv.gz
  • name.basics.tsv.gz

Extract them to the dataset/ folder.

⚠️ Note: IMDb data is for personal/non-commercial use only per their terms.

Information courtesy of IMDb (https://www.imdb.com). Used with permission.
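
For orientation, here is a minimal sketch of how two of these files could be joined and filtered to the quality threshold mentioned under Features (rating ≥7, 10K+ votes). It is an assumption about the approach, not the code in data_loader.py or data_processor.py. The column names `tconst`, `primaryTitle`, `averageRating`, and `numVotes` come from the IMDb dataset format; the helper names and the sample rows are made up for this example, and the sample uses only a subset of the real columns.

```python
import csv
import io

MIN_RATING = 7.0    # thresholds taken from the Features list
MIN_VOTES = 10_000


def read_tsv(handle):
    # IMDb marks missing values with a backslash followed by N; map those to None.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {k: (None if v == r"\N" else v) for k, v in row.items()}


def load_quality_titles(basics_handle, ratings_handle):
    """Join basics with ratings on tconst and keep well-rated, well-voted titles."""
    ratings = {
        r["tconst"]: (float(r["averageRating"]), int(r["numVotes"]))
        for r in read_tsv(ratings_handle)
    }
    kept = []
    for row in read_tsv(basics_handle):
        rating, votes = ratings.get(row["tconst"], (0.0, 0))
        if rating >= MIN_RATING and votes >= MIN_VOTES:
            kept.append({**row, "rating": rating, "votes": votes})
    return kept


# Made-up sample rows in the same tab-separated layout.
BASICS = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\tgenres\n"
    "tt0000001\tmovie\tSample A\t2001\tDrama\n"
    "tt0000002\tmovie\tSample B\t\\N\tComedy\n"
    "tt0000003\tmovie\tSample C\t1995\tAction\n"
)
RATINGS = (
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t8.1\t25000\n"
    "tt0000002\t7.5\t900\n"
    "tt0000003\t6.2\t50000\n"
)
titles = load_quality_titles(io.StringIO(BASICS), io.StringIO(RATINGS))
print([t["primaryTitle"] for t in titles])
```

With the real files, the `io.StringIO` objects would be replaced by files opened from the dataset/ folder with `encoding="utf-8"` and `newline=""`.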

3. Install Ollama Model

ollama pull mistral
ollama serve
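
Once the server is running, the model can be reached over Ollama's local REST API. The sketch below shows one way to call the `/api/generate` endpoint on the default port 11434 using only the standard library. The function names are hypothetical and this is not necessarily how ollama_client.py is written.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(prompt, model="mistral"):
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="mistral", timeout=120):
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With `ollama serve` running and the mistral model pulled, calling `generate("Recommend one science-fiction film.")` should return the model's reply as a string. If the server is not running, `urlopen` raises `urllib.error.URLError`.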

4. Build the Index

python build_index.py

This processes the IMDb data, generates the embeddings, and builds the FAISS index. Expect it to take roughly 10-15 minutes.

5. Run

python main.py

CLI Commands

Command                    Description
/help                      Show all commands
/filter genre Action       Filter by genre
/filter year 2000-2010     Filter by year range
/info Inception            Get info about a title
/compare A vs B            Compare two titles
/similar The Matrix        Find similar titles
/clear                     Clear conversation
/exit                      Exit
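
To make the command syntax concrete, here is a minimal sketch of how such slash commands might be parsed. The `parse_command` function and its return shape are assumptions made for illustration; chat_cli.py may handle input differently.

```python
def parse_command(line):
    """Split a slash command into (name, arguments); plain text becomes a query."""
    line = line.strip()
    if not line.startswith("/"):
        return "query", {"text": line}

    name, _, rest = line[1:].partition(" ")
    rest = rest.strip()

    if name == "filter":
        field, _, value = rest.partition(" ")
        value = value.strip()
        # A year filter takes an inclusive range such as 2000-2010.
        if field == "year" and "-" in value:
            start, _, end = value.partition("-")
            return "filter", {"year": (int(start), int(end))}
        return "filter", {field: value}

    if name == "compare":
        # Titles are separated by the literal word "vs".
        left, _, right = rest.partition(" vs ")
        return "compare", {"left": left.strip(), "right": right.strip()}

    if name in ("info", "similar"):
        return name, {"title": rest}

    # Commands without arguments: /help, /clear, /exit.
    return name, {}


print(parse_command("/filter year 2000-2010"))
print(parse_command("/compare A vs B"))
```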

Project Structure

MovieMindAI/
├── config.py              # Configuration
├── data_loader.py         # Load IMDb TSV files
├── data_processor.py      # Process and join data
├── embedding_generator.py # Generate embeddings
├── vector_store.py        # FAISS index management
├── rag_retriever.py       # RAG retrieval logic
├── ollama_client.py       # Ollama API wrapper
├── response_generator.py  # LLM response generation
├── conversation.py        # Conversation history
├── chat_cli.py            # CLI interface
├── build_index.py         # Build script
├── main.py                # Entry point
└── requirements.txt       # Dependencies

License

This project is for educational/personal use only. The IMDb dataset has its own licensing terms.
