A sequence-aware collaborative filtering system that uses a Transformer architecture to predict the next movie a user is likely to watch.
Instead of predicting ratings, as traditional recommenders do, this system treats user behavior as a sequence prediction problem, similar to how GPT models generate text.
Traditional recommender systems (KNN, SVD):
- Ignore sequence order
- Treat interactions as static
This system:
- Learns watch patterns over time
- Captures contextual relationships between movies
- Applies LLM-style next-token training to recommendation
Reframing recommendation as next-token prediction:
- Movies → Tokens
- User history → Sequence
- Next movie → Prediction
Example:
[Movie₁, Movie₂, Movie₃] → Movie₄
This is the same next-token prediction objective used to train GPT models.
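As a rough illustration of this reframing, the sketch below turns one ordered watch history into (context, next movie) training pairs. The function name `next_item_pairs` and the `max_context` window size are assumptions made for this example, not names taken from the project.

```python
from typing import List, Tuple


def next_item_pairs(history: List[str], max_context: int = 3) -> List[Tuple[List[str], str]]:
    """Turn one user's ordered history into (context, next item) pairs."""
    pairs = []
    for i in range(1, len(history)):
        # Keep at most `max_context` of the most recent items before position i.
        context = history[max(0, i - max_context):i]
        pairs.append((context, history[i]))
    return pairs


# Placeholder titles; a real history would hold movie IDs or tokens.
history = ["Movie1", "Movie2", "Movie3", "Movie4"]
pairs = next_item_pairs(history)
for context, target in pairs:
    print(context, "->", target)
```

Each prefix of the history yields one training sample, so a history of n movies produces n - 1 samples.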
- Data Ingestion
  - Netflix Prize Dataset (~100M ratings, sampled)
- Preprocessing
  - Clean and parse raw data
  - Merge ratings with movie metadata
- Filtering
  - Keep only positive interactions (rating ≥ 4)
- Sequence Construction
  - Convert user histories into sequential training samples
- Tokenization
  - Map movie IDs → integer tokens
- Dataset Preparation
  - Padding and batching for fixed-length input
- Model
  - Transformer Encoder with multi-head self-attention
- Prediction
  - Outputs probability distribution over all movies
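The filtering, tokenization, and padding steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the row layout `(user_id, movie_id, rating, timestamp)`, ordering by timestamp, reserving token 0 for padding, and left-padding are all choices made for the example, and the helper names are hypothetical rather than taken from the project.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PAD = 0  # assumption: token 0 is reserved for padding

# Hypothetical rows: (user_id, movie_id, rating, timestamp).
ratings: List[Tuple[str, int, int, int]] = [
    ("u1", 101, 5, 3), ("u1", 205, 2, 1), ("u1", 330, 4, 2),
    ("u2", 330, 4, 1), ("u2", 101, 3, 2), ("u2", 412, 5, 3),
]


def build_histories(rows, min_rating: int = 4) -> Dict[str, List[int]]:
    """Keep ratings >= min_rating and order each user's movies by timestamp."""
    by_user = defaultdict(list)
    for user, movie, rating, ts in rows:
        if rating >= min_rating:
            by_user[user].append((ts, movie))
    return {user: [movie for _, movie in sorted(items)] for user, items in by_user.items()}


def build_vocab(histories: Dict[str, List[int]]) -> Dict[int, int]:
    """Map each movie ID to an integer token, starting at 1 (0 is PAD)."""
    vocab: Dict[int, int] = {}
    for user in sorted(histories):
        for movie in histories[user]:
            vocab.setdefault(movie, len(vocab) + 1)
    return vocab


def encode_and_pad(seq: List[int], vocab: Dict[int, int], max_len: int) -> List[int]:
    """Tokenize, keep the most recent max_len items, and left-pad with PAD."""
    tokens = [vocab[movie] for movie in seq][-max_len:]
    return [PAD] * (max_len - len(tokens)) + tokens


histories = build_histories(ratings)
vocab = build_vocab(histories)
batch = [encode_and_pad(histories[user], vocab, max_len=4) for user in sorted(histories)]
print(batch)
```

Left-padding keeps the most recent movie in the last position of every row, which is convenient when the prediction is read from the final position; right-padding with a mask would work as well.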
- Embedding Layer: Converts movie IDs into dense vectors
- Transformer Encoder:
  - Multi-head self-attention
  - Captures relationships across watched movies
- Output Layer:
  - Fully connected layer for next-movie prediction
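To make the attention step more concrete, here is a simplified single-head scaled dot-product self-attention over a short sequence of movie embeddings. It is a sketch only: it assumes queries, keys, and values are the raw embeddings, and it omits the learned projections, multiple heads, positional information, and feed-forward layers that a full Transformer encoder has. The embedding values are made up for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(embeddings: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Single-head self-attention with Q = K = V = embeddings (no learned weights)."""
    d = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        # Similarity of this movie to every movie in the sequence, scaled by sqrt(d).
        scores = [dot(query, key) / math.sqrt(d) for key in embeddings]
        weights = softmax(scores)
        # Each output is a weighted average of all embeddings in the sequence.
        mixed = [sum(w * value[j] for w, value in zip(weights, embeddings)) for j in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional embeddings for three watched movies.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(sequence)
```

Each row of `weights` sums to 1 and indicates how strongly one watched movie attends to the others.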
- Loss Function: Cross-Entropy
- Optimizer: Adam
- Objective: Maximize the likelihood of the correct next movie
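Minimizing cross-entropy is equivalent to maximizing the log-likelihood of the correct next movie. The sketch below computes the loss for a single prediction from raw scores (logits) in plain Python; the function names and the four-movie vocabulary are assumptions for illustration, and a deep learning framework would normally supply this loss.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Log-probabilities from raw scores, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-probability assigned to the correct next-movie token."""
    return -log_softmax(logits)[target]


# With uniform scores over 4 movies, the loss equals log(4).
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target=2)
# Raising the score of the correct movie lowers the loss.
confident_loss = cross_entropy([0.0, 0.0, 5.0, 0.0], target=2)
print(uniform_loss, confident_loss)
```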
Input sequence:
- Reservoir Dogs
- Dogma
- Lilo & Stitch
Predicted next movies:
- North by Northwest
- The Deer Hunter
- Chasing Amy
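A list of predicted movies like the one above can be read off the model's output distribution by taking the highest-probability titles. The sketch below assumes the distribution is available as a title-to-probability dictionary; the probabilities and placeholder titles are invented for illustration and are not model outputs. Excluding already watched titles is an assumption of this example, not something the project is known to do.

```python
import heapq
from typing import Dict, List, Optional, Set


def top_k(probs: Dict[str, float], k: int = 3, exclude: Optional[Set[str]] = None) -> List[str]:
    """Return the k most probable titles, skipping any in `exclude`."""
    exclude = exclude or set()
    candidates = [title for title in probs if title not in exclude]
    return heapq.nlargest(k, candidates, key=probs.get)


# Hypothetical distribution over a five-movie vocabulary.
probs = {"Movie A": 0.05, "Movie B": 0.40, "Movie C": 0.10, "Movie D": 0.30, "Movie E": 0.15}

recommended = top_k(probs, k=3)
unseen_only = top_k(probs, k=3, exclude={"Movie B"})
print(recommended)
print(unseen_only)
```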
| Method | Sequence Awareness | Context Understanding |
|---|---|---|
| KNN | ❌ No | ❌ Limited |
| SVD | ❌ No | ❌ Limited |
| Transformer (This Work) | ✅ Yes | ✅ Strong |
- How Transformers generalize beyond NLP
- Importance of sequence modeling in recommendations
- Role of attention in capturing user behavior
- Data pipeline design for large-scale sequential systems
- Trained on a sampled dataset (not the full ~100M ratings)
- No hyperparameter optimization
- Cold-start problem not addressed
- Incorporate temporal embeddings
- Hybrid model (content + collaborative)
- Fine-tune on a larger dataset
- Deploy as real-time recommendation API
https://drive.google.com/file/d/1zblcSgEyVbHYxe5F_LK7qHxMqzkp1MwW/view?usp=sharing