A RAG-based (Retrieval-Augmented Generation) movie and TV show recommendation system powered by Ollama and Mistral.
- 🔍 Semantic Search - Natural language queries to find movies/TV shows
- 🎯 Smart Filtering - Filter by genre, year, rating, actors, directors, country
- 💬 Conversational AI - Context-aware responses with conversation history
- 📊 Quality Database - 6,000+ highly rated titles (rating ≥7, 10K+ votes)
- ⚡ Local LLM - Runs entirely on your machine using Ollama
User Query → Embedding → FAISS Vector Search → Retrieved Context → Ollama/Mistral → Response
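The flow above can be sketched end to end with the standard library alone. This is only an illustration: the bag-of-words `embed` function stands in for a real embedding model, the brute-force cosine ranking stands in for the FAISS search, and the three-title catalogue, function names, and prompt wording are all hypothetical rather than taken from this project's modules.

```python
import math
import re
from collections import Counter

# Hypothetical mini-catalogue; the real system indexes IMDb titles.
TITLES = {
    "The Matrix": "A hacker discovers reality is a simulation and joins a rebellion. Sci-fi action.",
    "Inception": "A thief enters dreams to plant an idea in a target's mind. Sci-fi heist thriller.",
    "Amelie": "A shy waitress in Paris quietly improves the lives of those around her. Romantic comedy.",
}


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 2) -> list[str]:
    """Brute-force nearest-neighbour search (stand-in for the FAISS lookup)."""
    q = embed(query)
    ranked = sorted(TITLES, key=lambda t: cosine(q, embed(TITLES[t])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, hits: list[str]) -> str:
    """Assemble the retrieved context and the user question into one LLM prompt."""
    context = "\n".join(f"- {t}: {TITLES[t]}" for t in hits)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."


query = "sci-fi movie about dreams"
hits = retrieve(query)          # the two closest titles
prompt = build_prompt(query, hits)  # this string would be sent to Ollama/Mistral
```

Only the retrieved titles reach the prompt, which is what keeps the model's answer grounded in the database rather than in its own recall.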
- Python 3.10+
- Ollama installed locally
- GPU recommended (for faster embeddings)
`pip install -r requirements.txt`

Download the following files from IMDb Datasets:

- title.basics.tsv.gz
- title.ratings.tsv.gz
- title.crew.tsv.gz
- title.principals.tsv.gz
- title.akas.tsv.gz
- name.basics.tsv.gz

Extract them to the `dataset/` folder.
⚠️ Note: IMDb data is for personal/non-commercial use only per their terms.
Information courtesy of IMDb (https://www.imdb.com). Used with permission.
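As a rough illustration of the quality threshold mentioned in the feature list (rating ≥7, 10K+ votes), the ratings file could be filtered along these lines. The column names `tconst`, `averageRating`, and `numVotes` are those of IMDb's `title.ratings` dataset; the function name, constants, and sample rows are made up for this sketch and may not match what `data_processor.py` does.

```python
import csv
import io

MIN_RATING = 7.0    # "rating ≥7" from the feature list
MIN_VOTES = 10_000  # "10K+ votes" from the feature list


def high_quality_ids(tsv_stream) -> set[str]:
    """Return tconst IDs whose rating and vote count meet both thresholds."""
    reader = csv.DictReader(tsv_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["tconst"]
        for row in reader
        if float(row["averageRating"]) >= MIN_RATING
        and int(row["numVotes"]) >= MIN_VOTES
    }


# Tiny made-up sample in the same layout as title.ratings.tsv.
sample = io.StringIO(
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t8.7\t2100000\n"
    "tt0000002\t7.9\t950\n"
    "tt0000003\t6.4\t480000\n"
)
keep = high_quality_ids(sample)
```

For the real data, the same function should accept an open text handle on the extracted file, or `gzip.open(path, "rt", encoding="utf-8")` if the file is left compressed.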
`ollama pull mistral`

`ollama serve`

`python build_index.py`

This processes the data and creates embeddings (~10-15 minutes).
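With `ollama serve` running, one way to confirm that Mistral responds is a direct request to Ollama's local HTTP API. The sketch below assumes the default address `http://localhost:11434` and the `/api/generate` endpoint with streaming disabled; the helper names are hypothetical and this is not necessarily how `ollama_client.py` is written.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_payload(prompt: str, model: str = "mistral") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def ask_ollama(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example (needs a running server, so it is left commented out):
# print(ask_ollama("Reply with the single word: ready"))
```

If the call raises a connection error, the server is most likely not running or is listening on a different port.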
`python main.py`

| Command | Description |
|---|---|
| `/help` | Show all commands |
| `/filter genre Action` | Filter by genre |
| `/filter year 2000-2010` | Filter by year range |
| `/info Inception` | Get info about a title |
| `/compare A vs B` | Compare two titles |
| `/similar The Matrix` | Find similar titles |
| `/clear` | Clear conversation |
| `/exit` | Exit |
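To make the command syntax concrete, here is a minimal sketch of how slash commands such as `/filter year 2000-2010` might be split into a name, a field, and a year range. The `Command` class and both helper functions are hypothetical; the actual parsing in `chat_cli.py` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    name: str  # e.g. "filter"
    args: str  # everything after the command name


def parse_command(line: str) -> Optional[Command]:
    """Return a Command for slash-prefixed input, or None for a normal chat message."""
    line = line.strip()
    if not line.startswith("/"):
        return None
    name, _, args = line[1:].partition(" ")
    return Command(name.lower(), args.strip())


def parse_year_range(text: str) -> tuple[int, int]:
    """Turn '2000-2010' into (2000, 2010); a single year gives (y, y)."""
    start, _, end = text.partition("-")
    first = int(start)
    last = int(end) if end else first
    if first > last:
        raise ValueError(f"start year {first} is after end year {last}")
    return first, last


cmd = parse_command("/filter year 2000-2010")
field, _, value = cmd.args.partition(" ")  # "year", "2000-2010"
year_range = parse_year_range(value)
```

Input that does not start with `/` returns `None`, so it can be passed on to the conversational path unchanged.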
MovieMindAI/
├── config.py # Configuration
├── data_loader.py # Load IMDb TSV files
├── data_processor.py # Process and join data
├── embedding_generator.py # Generate embeddings
├── vector_store.py # FAISS index management
├── rag_retriever.py # RAG retrieval logic
├── ollama_client.py # Ollama API wrapper
├── response_generator.py # LLM response generation
├── conversation.py # Conversation history
├── chat_cli.py # CLI interface
├── build_index.py # Build script
├── main.py # Entry point
└── requirements.txt # Dependencies
This project is for educational/personal use only. The IMDb dataset has its own licensing terms.