Public LB Score: ~0.144289 NDCG@20
This repository contains our solution for the "Lost Items" recommender system case. The premise: a logging failure during a database migration caused a chunk of positive user-book interactions to simply vanish. Our objective was to build an ML pipeline to identify and recover these "lost" events.
Download here: https://www.kaggle.com/datasets/andrewsokolovsky/final-nto-ai-2026
The solution is built on a classic Two-Stage Recommender System architecture: Candidate Generation (Retrieval) -> Ranking.
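The retrieve-then-rank flow can be sketched in a few lines; the function and parameter names here are illustrative assumptions, not the actual pipeline code:

```python
def two_stage_recommend(user_id, generators, ranker, top_n=20):
    """Minimal sketch of a two-stage recommender (names are assumptions).

    generators: callables returning candidate item ids for a user.
    ranker:     callable scoring a (user, item) pair; higher = better.
    """
    # Stage 1 (Retrieval): union candidates from every generator.
    candidates = set()
    for gen in generators:
        candidates.update(gen(user_id))
    # Stage 2 (Ranking): score each pair and keep the top N.
    ranked = sorted(candidates, key=lambda item: ranker(user_id, item), reverse=True)
    return ranked[:top_n]
```

In the real pipeline the second stage is a learned model (CatBoostRanker) rather than a hand-written scoring function, but the data flow is the same.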
At first, we made a classic mistake: we tried to solve this as a standard RecSys problem (predicting what the user will like in the future). After consulting with experts, we had a paradigm shift: we are not predicting the future; we are reconstructing a user session within a very specific time window. This completely changed our approach, putting massive emphasis on temporal feature engineering and window-specific candidate generators.
For each user, we retrieve hundreds of potentially relevant books from multiple sources. We built an ensemble of generators:
- ALS (Alternating Least Squares): The classic matrix factorization baseline for collaborative filtering.
- BM25: Adapted from text search for RecSys. Penalizes over-popular items better than standard cosine similarity.
- POP: Top trending books. A solid fallback.
- I2I (Item-to-Item): Based on ALS vectors, finding books similar to what the user recently read (with time-decay weights).
- Metadata Heuristics: Generating candidates based on the user's favorite authors, genres, and related book editions.
- 🔥 Window-Specific Generators (The Killer Feature): These algorithms only look at logs from the exact timeframe of the incident.
  - Window I2I / Covisitation: Finds items frequently viewed together with items the user interacted with during the crash.
  - Exposure Generator: Estimates the "probability of exposure". It mixes the popularity of an item during the incident with the user's affinity for specific authors/genres in that exact same period.
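As a flavor of the window-specific idea, here is a toy covisitation generator restricted to incident-window events. Everything here (function name, data layout) is an illustrative assumption, not the repository's implementation:

```python
from collections import defaultdict
import itertools

def covisitation_candidates(window_events, user_items, top_k=50):
    """Toy window-covisitation generator (names and layout are assumptions).

    window_events: list of (user_id, item_id) interactions from the incident window.
    user_items:    dict user_id -> set of items that user touched in the window.
    """
    # Count how often two items appear together in the same user's window history.
    sessions = defaultdict(set)
    for user, item in window_events:
        sessions[user].add(item)
    co_counts = defaultdict(lambda: defaultdict(int))
    for items in sessions.values():
        for a, b in itertools.permutations(items, 2):
            co_counts[a][b] += 1

    # For each user, score unseen items by summed co-view counts with their items.
    candidates = {}
    for user, items in user_items.items():
        scores = defaultdict(int)
        for seed in items:
            for other, cnt in co_counts[seed].items():
                if other not in items:
                    scores[other] += cnt
        candidates[user] = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return candidates
```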
To help CatBoost pick the top 20 from hundreds of candidates, we described each (user, candidate) pair with a targeted feature set:
- Generator Scores: Weights returned by ALS, BM25, Exposure, etc.
- Heuristics (Covisitation & Pop): Pure math based on recent co-views (this alone scored 0.09 on the LB without any ML!).
- Historical Activity: Global and 30-day activity (to check if the user is "alive").
- Metadata Similarity: Genre overlap, author familiarity, age restriction delta.
- Temporal Features: Days since the last event.
- 🔥 Incident Window Signals: The strongest features. Exact interaction counts with the author/book/genre during the crash window.
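Assembling these features boils down to left-joining per-source score tables onto the (user, candidate) pairs. A minimal pandas sketch, with made-up column names and values:

```python
import pandas as pd

# Hypothetical (user, candidate) pairs produced by the retrieval stage.
pairs = pd.DataFrame({"user_id": [1, 1, 2], "book_id": [10, 11, 10]})

# Example feature tables: a generator score and an incident-window count.
als_scores = pd.DataFrame({"user_id": [1, 1], "book_id": [10, 11], "als_score": [0.9, 0.4]})
window_counts = pd.DataFrame({"user_id": [2], "book_id": [10], "window_author_cnt": [3]})

# Left-join every feature source onto the pairs; missing values become zeros,
# since "no score from this generator" is itself a signal.
features = (
    pairs
    .merge(als_scores, on=["user_id", "book_id"], how="left")
    .merge(window_counts, on=["user_id", "book_id"], how="left")
    .fillna({"als_score": 0.0, "window_author_cnt": 0})
)
```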
We used CatBoostRanker — it's fast, handles tabular data perfectly, and supports GPU.
- Loss Function: YetiRankPairwise (gave a +0.003 boost over the standard YetiRank). It directly optimizes the NDCG metric by learning to order items correctly within a single user's list.
- PU-Weighting via Propensity Scores (massive boost): a missing log could mean either "the user didn't like it" or "the log was lost due to the bug", so this is a classic Positive-Unlabeled (PU) learning problem. We implemented sample_weight logic:
  - We calculate a propensity_score (how likely it is that the user was exposed to this item).
  - If an item is labeled as a "lost event" -> we boost its weight.
  - If an item is unlabeled but looks exactly like a lost event (high propensity) -> we lower its weight so the model isn't heavily penalized for ranking it high.
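The weighting logic can be sketched as a small function; the boost factor, the floor, and the function name are illustrative assumptions, not the exact values used:

```python
import numpy as np

def pu_sample_weights(labels, propensity, pos_boost=2.0, neg_floor=0.2):
    """Sketch of PU-style sample weights (parameter values are assumptions).

    labels:     1 for known "lost event" pairs, 0 for unlabeled candidates.
    propensity: estimated probability the user was actually exposed to the item.
    """
    labels = np.asarray(labels, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    # Positives get boosted; high-propensity unlabeled pairs get down-weighted,
    # so the ranker is not punished hard for scoring plausible lost events highly.
    return np.where(
        labels == 1,
        pos_boost,
        np.clip(1.0 - propensity, neg_floor, 1.0),
    )
```

The resulting array would be passed to the ranker as its per-sample weight.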
- Positives (Label = 1): We created a "Pseudo-incident". We took a historical period before the real crash and artificially dropped 25% of the interactions. These hidden pairs became our ground truth.
- Negatives (Label = 0): All other candidates generated by our retrieval models that aren't in the hidden list. We rely on the Propensity Weights to tell the model not to trust these negatives 100%.
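Constructing the pseudo-incident amounts to hiding a random fraction of events inside a chosen historical window. A minimal sketch; the column name `ts` and the helper are assumptions, while the 25% drop rate follows the description above:

```python
import numpy as np
import pandas as pd

def make_pseudo_incident(events, window_start, window_end, drop_frac=0.25, seed=42):
    """Hide drop_frac of the interactions inside a window to create labels.

    Returns (visible, hidden): the model trains on `visible` and must
    recover the `hidden` rows, which serve as ground-truth positives.
    """
    rng = np.random.default_rng(seed)
    in_window = events["ts"].between(window_start, window_end)
    window_idx = events.index[in_window]
    hidden_idx = rng.choice(window_idx, size=int(len(window_idx) * drop_frac), replace=False)
    hidden = events.loc[hidden_idx]          # artificially "lost" positives
    visible = events.drop(index=hidden_idx)  # what the pipeline is allowed to see
    return visible, hidden
```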
✅ What worked:
- Pseudo-incident validation. Local CV tracked the Public LB perfectly.
- Understanding log nature. Reconstructing a specific session requires features that look only at that time window.
- Propensity Weights solved the PU-learning trap (preventing the model from learning false negatives).
- Maximizing Recall at the retrieval stage (the ranker can't rank what isn't there).
❌ What didn't work:
- Reranker (Level 3): Too complex to tune, takes too long to run, and the metric gain was tiny.
- Using only the incident month for positives: We tried this early on, treating everything else as negative. The model absolutely hated it.
The code is optimized for Kaggle Notebooks or Cloude.ru (GPU recommended).
- Clone the repo.
- Change the `DATA_DIR` variable to point to the competition datasets.
- Install dependencies: `pip install implicit catboost scikit-learn`
- Run the code. It will build features, train the models, and generate `submission.csv` automatically.
Telegram: @main4562 and @FeelAiChallenge. If you like this solution, please give the repository a ⭐.