This repository now contains a minimal, runnable PyTorch scaffold for reproducing OneRec (arXiv:2502.18965) in a disciplined, iteration-friendly way.
- Structured paper-to-code plan in `docs/spec_2502.md`
- Minimal sequential recommendation training loop (BPR objective)
- Config-driven entrypoint
- Core ranking metrics: Recall@K and NDCG@K
- Synthetic data path so the pipeline can run end-to-end immediately
Note: this is a starting point focused on engineering reproducibility. Exact paper-level alignment still requires dataset/protocol implementation from the paper.
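The BPR objective behind the minimal training loop can be illustrated with a small dependency-free sketch. This is pure Python for clarity; the repo's trainer presumably uses a batched PyTorch equivalent, and the function name here is illustrative, not the repo's API:

```python
import math

def bpr_loss(pos_scores, neg_scores):
    """Bayesian Personalized Ranking loss (illustrative sketch).

    BPR maximizes the probability that each observed (positive) item
    scores higher than a sampled negative item:
        loss = -log sigmoid(s_pos - s_neg),
    averaged over the batch.
    """
    total = 0.0
    for sp, sn in zip(pos_scores, neg_scores):
        # Numerically this is -log(sigmoid(sp - sn)).
        total += -math.log(1.0 / (1.0 + math.exp(-(sp - sn))))
    return total / len(pos_scores)
```

A larger positive-vs-negative score margin yields a smaller loss, which is what drives the pairwise ranking signal.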
Run training with:

```shell
python train.py --config configs/onerec_2502_minimal.yaml
```

Repository layout:

- `configs/` - experiment configs
- `docs/` - structured reproduction specification
- `docs/results/` - experiment result reports
- `onerec/` - package modules
  - `data/` - dataset and dataloader helpers
  - `models/` - model definitions
  - `training/` - trainer logic
  - `metrics/` - ranking metrics
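The ranking metrics in `metrics/` cover Recall@K and NDCG@K. For the common single-held-out-target evaluation protocol, they reduce to the following minimal sketch (an illustration, not the repo's exact implementation):

```python
import math

def recall_at_k(ranked_items, target, k):
    # With a single held-out target per user, Recall@K is 0 or 1
    # (i.e. the hit rate at K), averaged over users downstream.
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    # Binary relevance with one target: DCG contributes 1/log2(rank + 2)
    # at the target's 0-based rank, and the ideal DCG is 1 (target at
    # rank 1), so no extra normalization is needed.
    for rank, item in enumerate(ranked_items[:k]):
        if item == target:
            return 1.0 / math.log2(rank + 2)
    return 0.0
```

Full all-item evaluation (as in the ML-1M run below) scores every candidate item and ranks them before applying these cutoffs.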
- Implement exact dataset split protocol from 2502.18965.
- Replace synthetic dataset path with real benchmark pipeline.
- Add paper-specific model blocks and losses as ablations.
- Run multi-seed experiments and compare against reported metrics.
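For the multi-seed step, per-seed metrics are typically summarized as mean ± sample standard deviation before comparison against the paper's reported numbers. A hypothetical helper (not part of the repo) could look like:

```python
import statistics

def aggregate_runs(metric_values):
    """Summarize one metric across seeds as (mean, sample std)."""
    mean = statistics.mean(metric_values)
    std = statistics.stdev(metric_values) if len(metric_values) > 1 else 0.0
    return mean, std

# Hypothetical Recall@10 values from three seeds.
recalls = [0.12, 0.11, 0.13]
mean, std = aggregate_runs(recalls)
print(f"Recall@10: {mean:.3f} \u00b1 {std:.3f}")
```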
- ML-1M (full, all-item eval, MPS): `2026-02-18-ml1m`
- Start the dashboard server:

  ```shell
  . .venv/bin/activate
  python dashboard.py
  ```
- Open http://localhost:8765/ to view loss/time/Recall@K/NDCG@K
If you want to download the paper-related recommendation datasets manually first, see `docs/dataset_links.md`.