Project Documentation: News Recommendation System (TDT4215)

Overview

This project implements a news recommendation engine using the MIND_small dataset (Microsoft News Dataset). The system predicts which news articles a user is likely to click based on their historical behavior and article content.

Implemented methods:

Baseline: Most popular based on recent Click-Through Rate
Content-Based Filtering: Uses article features (titles/categories) and vector embeddings
Collaborative Filtering: Item-item similarity based on user interaction matrices
Hybrid (Score Fusion): Weighted combination of scores from all models
Hybrid (Rank Fusion): Reciprocal Rank Fusion (RRF) with adaptive weights based on user history length
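
As a rough illustration of the rank-fusion idea, the sketch below applies weighted Reciprocal Rank Fusion to the ranked lists of two models. The function names, the constant k=60, the history-length threshold, and the weight values are all assumptions made for this example; they are not taken from hybrid.py.

```python
from collections import defaultdict


def adaptive_weights(history_length, threshold=5):
    """Hypothetical weighting rule: lean on collaborative filtering more
    once the user has enough click history (threshold is an assumption)."""
    if history_length < threshold:
        return {"content": 0.7, "collaborative": 0.3}
    return {"content": 0.4, "collaborative": 0.6}


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists: score(item) = sum over models of weight / (k + rank)."""
    scores = defaultdict(float)
    for model, ranked_items in rankings.items():
        for rank, item in enumerate(ranked_items, start=1):
            scores[item] += weights.get(model, 0.0) / (k + rank)
    # Highest fused score first; ties are broken by item ID for determinism
    return sorted(scores, key=lambda item: (-scores[item], item))


# Toy ranked lists (article IDs are made up)
rankings = {
    "content": ["N1", "N2", "N3"],
    "collaborative": ["N3", "N1", "N4"],
}
fused = weighted_rrf(rankings, adaptive_weights(history_length=12))
print(fused)
```

Because RRF only looks at rank positions, the models' raw scores do not need to be on the same scale, which is the main practical difference from score fusion.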

Technical Setup

Prerequisites:

Python 3.8+
MINDsmall_train dataset files placed in ./data/MINDsmall_train (move them here after downloading from the MIND website)

Installation

You can install the required packages directly, but using a virtual environment is highly recommended to avoid version conflicts with other projects.

Option 1: Using a Virtual Environment (Recommended)

This keeps the project dependencies isolated from your system-wide Python installation.

# 1. Create the environment
python -m venv .venv

# 2. Activate it
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

# 3. Install packages
pip install -r requirements.txt

Option 2: Quick Start (Global Install)

If you prefer not to use a virtual environment, simply run:

pip install -r requirements.txt

How to run

The main entry point is main.py. To load data, initialize models, generate sample recommendations for a test user, and run the evaluation suite:

python src/main.py

Script Workflow

load_mind: Loads news, behaviors and interaction data
setup_models:
- Filters interactions by time (48-hour window) for popularity calculation
- Generates TF-IDF/embeddings for content filtering
- Computes the sparse similarity matrix for collaborative filtering
run_recommenders: Performs a "Live Demo" for a specific user ID, showing their history and what each model suggests
run_evaluation: Samples 5000 impressions to calculate performance metrics
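
The time-window step used for the popularity baseline could look roughly like the sketch below. The tuple layout of `interactions`, the function name `recent_ctr`, and the choice to measure the window back from the latest timestamp in the data are assumptions for illustration, not the behavior of popular.py.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recent_ctr(interactions, window_hours=48):
    """Rank articles by click-through rate inside a trailing time window.

    `interactions` is a list of (news_id, clicked, timestamp) tuples; this
    layout is an assumption made for the sketch.
    """
    latest = max(ts for _, _, ts in interactions)
    cutoff = latest - timedelta(hours=window_hours)

    shown = defaultdict(int)
    clicks = defaultdict(int)
    for news_id, clicked, ts in interactions:
        if ts >= cutoff:  # keep only impressions inside the window
            shown[news_id] += 1
            clicks[news_id] += clicked

    ctr = {news_id: clicks[news_id] / shown[news_id] for news_id in shown}
    # Highest CTR first; ties are broken by article ID
    return sorted(ctr.items(), key=lambda pair: (-pair[1], pair[0]))


now = datetime(2019, 11, 14, 12, 0)
interactions = [
    ("N1", 1, now - timedelta(hours=1)),
    ("N1", 0, now - timedelta(hours=2)),
    ("N2", 1, now - timedelta(hours=3)),
    ("N3", 1, now - timedelta(hours=72)),  # outside the 48-hour window
]
ranking = recent_ctr(interactions)
print(ranking)
```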

Evaluation Strategy

Accuracy Metrics

We use nDCG@5 (Normalized Discounted Cumulative Gain at rank 5) to evaluate how well each model ranks the clicked articles within the top 5 candidates of each impression in the logs.
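
For reference, nDCG@5 for a single impression with binary click labels can be computed as in the sketch below. The function names and the toy labels and scores are illustrative assumptions; accuracy.py may organize the computation differently.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(labels, scores, k=5):
    """nDCG@k for one impression.

    labels: 1 if the candidate article was clicked, else 0
    scores: model scores for the same candidates, in the same order
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    ideal = sorted(labels, reverse=True)  # best possible ordering
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


labels = [0, 1, 0, 0, 1, 0]
scores = [0.2, 0.9, 0.1, 0.4, 0.3, 0.05]
result = ndcg_at_k(labels, scores)
print(round(result, 4))
```

Averaging this value over the sampled impressions gives one summary number per model.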

Beyond-Accuracy Metrics

To ensure the system isn't just a filter bubble, we calculate Diversity.

Metric: Intra-list diversity based on article categories
Goal: Ensure the recommended articles cover a variety of topics
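
One simple way to compute category-based intra-list diversity is the share of article pairs in a recommendation list whose categories differ. The sketch below assumes this 0/1 pairwise distance; the exact definition in beyondAccuracy.py may differ.

```python
from itertools import combinations


def intra_list_diversity(categories):
    """Share of article pairs in a list whose categories differ."""
    pairs = list(combinations(categories, 2))
    if not pairs:  # lists with fewer than two items have no pairs
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


# Categories of five recommended articles (toy example)
recommended = ["sports", "sports", "news", "finance", "news"]
ild = intra_list_diversity(recommended)
print(ild)
```

A value of 0 means every recommended article shares one category, while 1 means no two articles share a category.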

Project structure

├── data/
│   ├── processed/
│   │   ├── preprocessing_behaviors.py
│   │   └── preprocessing_news.py
│   └── MINDsmall_train/
│       ├── behaviors.tsv
│       ├── entity_embeddings.vec
│       ├── news.tsv
│       └── relation_embedding.vec
├── src/
│   ├── data/
│   │   └── load_mind.py
│   ├── evaluation/
│   │   ├── accuracy.py
│   │   └── beyondAccuracy.py
│   ├── models/
│   │   ├── popular.py
│   │   ├── collaborative.py
│   │   ├── content_based.py
│   │   └── hybrid.py
│   └── main.py
└── requirements.txt
