NTO AI 2025-2026 Finals Solution: "Lost Events" RecSys 🏆

Public LB Score: ~0.144289 NDCG@20

This repository contains our solution for the "Lost Items" recommender system case. The premise: a logging failure during a database migration caused a chunk of positive user-book interactions to simply vanish. Our objective was to build an ML pipeline to identify and recover these "lost" events.

Data

Download here: https://www.kaggle.com/datasets/andrewsokolovsky/final-nto-ai-2026

Core Idea

The solution is built on a classic Two-Stage Recommender System architecture: Candidate Generation (Retrieval) -> Ranking.

At first, we made a classic mistake: we tried to solve this as a standard RecSys problem (predicting what the user will like in the future). After consulting with experts, we had a paradigm shift: we are not predicting the future; we are reconstructing a user session within a very specific time window. This completely changed our approach, putting massive emphasis on temporal feature engineering and window-specific candidate generators.


Under the Hood

Stage 1: Candidate Generation

For each user, we retrieve hundreds of potentially relevant books from multiple sources. We built an ensemble of generators:

  • ALS (Alternating Least Squares): The classic matrix factorization baseline for collaborative filtering.
  • BM25: Adapted from text search for RecSys. Penalizes over-popular items better than standard cosine similarity.
  • POP: Top trending books. A solid fallback.
  • I2I (Item-to-Item): Based on ALS vectors, finding books similar to what the user recently read (with time-decay weights).
  • Metadata Heuristics: Generating candidates based on the user's favorite authors, genres, and related book editions.
  • 🔥 Window-Specific Generators (The Killer Feature): These algorithms only look at logs from the exact timeframe of the incident.
    • Window I2I / Covisitation: Finds items frequently viewed together with items the user interacted with during the crash.
    • Exposure Generator: Estimates the "probability of exposure". It mixes the popularity of an item during the incident with the user's affinity for specific authors/genres in that exact same period.
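As a minimal sketch of the window covisitation idea (function names and the event format are our assumptions, not the repository's actual code): count item pairs that co-occur in the same user's log inside the incident window, then recommend the items most often co-viewed with the user's own items.

```python
from collections import Counter, defaultdict

def build_covisitation(events, window_start, window_end):
    """Count how often two items appear together in one user's log
    during the incident window (a simplified covisitation matrix).
    events: iterable of (user_id, item_id, timestamp)."""
    by_user = defaultdict(set)
    for user, item, ts in events:
        if window_start <= ts <= window_end:
            by_user[user].add(item)
    covis = defaultdict(Counter)
    for items in by_user.values():
        for a in items:
            for b in items:
                if a != b:
                    covis[a][b] += 1
    return covis

def covisitation_candidates(covis, user_items, top_n=5):
    """Aggregate covisitation counts over the user's items
    and return the top-N items the user has not interacted with."""
    scores = Counter()
    for item in user_items:
        for other, cnt in covis[item].items():
            if other not in user_items:
                scores[other] += cnt
    return [item for item, _ in scores.most_common(top_n)]
```

In the real pipeline these counts would typically be normalized and time-decayed; the sketch keeps raw counts for clarity.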

Stage 2: Feature Engineering

To help CatBoost pick the top 20 from hundreds of candidates, we described each (user, candidate) pair with a set of features:

  • Generator Scores: Weights returned by ALS, BM25, Exposure, etc.
  • Heuristics (Covisitation & Pop): Pure math based on recent co-views (this alone scored 0.09 on the LB without any ML!).
  • Historical Activity: Global and 30-day activity (to check if the user is "alive").
  • Metadata Similarity: Genre overlap, author familiarity, age restriction delta.
  • Temporal Features: Days since the last event.
  • 🔥 Incident Window Signals: The strongest features. Exact interaction counts with the author/book/genre during the crash window.
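To illustrate the incident-window signals, here is a minimal sketch (the function name, event format, and `item_author` mapping are our assumptions) that counts a user's in-window interactions with the candidate book and with its author:

```python
def window_signal_features(events, window_start, window_end, item_author, user, candidate):
    """Per-(user, candidate) counts inside the incident window:
    interactions with the candidate itself and with its author.
    events: iterable of (user_id, item_id, timestamp)."""
    author = item_author.get(candidate)
    item_count = 0
    author_count = 0
    for u, item, ts in events:
        if u != user or not (window_start <= ts <= window_end):
            continue
        if item == candidate:
            item_count += 1
        if author is not None and item_author.get(item) == author:
            author_count += 1
    return {"win_item_cnt": item_count, "win_author_cnt": author_count}
```

The same pattern extends to genres or publishers; in practice these counts would be precomputed per window with a groupby rather than a per-pair scan.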

Stage 3: Ranking

We used CatBoostRanker — it's fast, handles tabular data perfectly, and supports GPU.

  • Loss Function: YetiRankPairwise (gave a +0.003 boost over standard YetiRank). It directly optimizes the NDCG metric by learning to order items correctly within a single user's list.
  • PU-Weighting (Propensity Score) — Massive Boost: Since a missing log entry could mean either "the user didn't like it" or "the log was lost in the bug", this is a classic Positive-Unlabeled (PU) learning problem. We implemented custom sample_weight logic:
    • We calculate a propensity_score (how likely it is that the user was exposed to this item).
    • If an item is labeled as a "lost event" -> we boost its weight.
    • If an item is unlabeled but looks exactly like a lost event (high propensity) -> we lower its weight so the model isn't heavily penalized for ranking it high.
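The weighting scheme above can be sketched as a single function (the function name and the specific constants `pos_boost`/`neg_floor` are illustrative assumptions, not the tuned values from the pipeline):

```python
def pu_sample_weight(label, propensity, pos_boost=2.0, neg_floor=0.2):
    """PU-learning sample weight for a (user, candidate) pair.
    - label == 1 (known lost event): boost the weight.
    - label == 0 (unlabeled): the higher the propensity (likelihood the
      user was exposed), the more it resembles a lost event, so we shrink
      its weight as a negative to avoid penalizing the model for it."""
    if label == 1:
        return pos_boost * max(propensity, 0.5)
    return max(neg_floor, 1.0 - propensity)
```

The resulting weights are passed to CatBoostRanker via its per-sample weight argument during training.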

Validation Strategy (How we got labels)

  • Positives (Label = 1): We created a "Pseudo-incident". We took a historical period before the real crash and artificially dropped 25% of the interactions. These hidden pairs became our ground truth.
  • Negatives (Label = 0): All other candidates generated by our retrieval models that aren't in the hidden list. We rely on the Propensity Weights to tell the model not to trust these negatives 100%.
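A minimal sketch of the pseudo-incident construction (function name, event format, and seed are our assumptions): hide a fraction of in-window interactions and return them as ground-truth positives alongside the remaining visible log.

```python
import random

def make_pseudo_incident(events, window_start, window_end, drop_frac=0.25, seed=42):
    """Artificially 'lose' drop_frac of the interactions inside a
    historical window. The hidden pairs become label=1 ground truth;
    the rest of the log stays visible to the pipeline.
    events: list of (user_id, item_id, timestamp)."""
    rng = random.Random(seed)
    in_window = [i for i, e in enumerate(events) if window_start <= e[2] <= window_end]
    n_drop = int(len(in_window) * drop_frac)
    dropped = set(rng.sample(in_window, n_drop))
    visible = [e for i, e in enumerate(events) if i not in dropped]
    positives = [events[i] for i in sorted(dropped)]
    return visible, positives
```

Fixing the seed makes the local validation split reproducible, which is what lets local CV be compared run-to-run against the Public LB.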

Insights & What went wrong

What worked:

  1. Pseudo-incident validation. Local CV tracked the Public LB perfectly.
  2. Understanding log nature. Reconstructing a specific session requires features that look only at that time window.
  3. Propensity Weights solved the PU-learning trap (preventing the model from learning false negatives).
  4. Maximizing Recall at the retrieval stage (the ranker can't rank what isn't there).

What didn't work:

  1. Reranker (Level 3): Too complex to tune, takes too long to run, and the metric gain was tiny.
  2. Using only the incident month for positives: We tried this early on, treating everything else as negative. The model absolutely hated it.

How to Run

The code is optimized for Kaggle Notebooks or Cloud.ru (GPU recommended).

  1. Clone the repo.
  2. Change the DATA_DIR variable to point to the competition datasets.
  3. Install dependencies: pip install implicit catboost scikit-learn
  4. Run the code. It will build features, train the models, and generate the submission.csv automatically.

Contacts

Telegram: @main4562 and @FeelAiChallenge. If you like this solution, please give the repository a ⭐.

About

National Technological Olympiad AI 25/26 Finals solution. Recovering lost implicit feedback in a Recommender System using a Two-Stage pipeline and PU-learning. Score: 0.1442
