Hello! This is where I store the Jupyter Notebooks from all the Kaggle competitions I've done. If you would like to replicate the results, please download the data from Kaggle.
Rank: top 1%
June 30, 2020
M5 Competition Notebook
- Using hierarchical sales data provided by Walmart, the task was to predict sales for the next 28 days for each product.
- The data covers stores in three US states (California, Texas, and Wisconsin) and includes item, department, product category, and store details. It also includes explanatory variables such as price, promotions, day of the week, and special events.
- I trained on the last 2 years of data, so the full dataset before feature engineering contains 21,797,534 rows and 17 columns.
- After performing EDA and adding lagged features and features from external data sources, I modeled this as a supervised regression problem using LightGBM.
- Since this is a time series, I split the train, validation, and test datasets by date. The validation and test sets each contain 28 days of sales for each item.
- The evaluation metric was the Weighted Root Mean Squared Scaled Error (WRMSSE), a variant of the Mean Absolute Scaled Error, computed with the formula provided by the competition host.
- My final submission achieved a WRMSSE of 0.56580.
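
The date-based split and the scaled error metric described above can be sketched roughly as follows. This is an illustrative sketch in plain Python, not the code from the notebook: the function names (`split_by_horizon`, `rmsse`, `wrmsse`) are hypothetical, the scale is computed over the whole training series, and the weights are taken as given inputs. The official metric differs in its details (for example, in how the scaling period and the weights are derived), so treat the host's formula as the reference.

```python
import math


def split_by_horizon(series, horizon=28):
    """Split a chronologically ordered series into train/validation/test.

    The last `horizon` points become the test set and the `horizon`
    points before them become the validation set.
    """
    if len(series) <= 2 * horizon:
        raise ValueError("series is too short for two holdout windows")
    train = series[:-2 * horizon]
    valid = series[-2 * horizon:-horizon]
    test = series[-horizon:]
    return train, valid, test


def rmsse(train, actual, forecast):
    """Root Mean Squared Scaled Error for a single series.

    The forecast MSE is scaled by the MSE of a one-step naive forecast
    on the training data (simplification: the whole training series is used).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    # Mean squared one-step difference of the training series.
    scale = sum(
        (train[t] - train[t - 1]) ** 2 for t in range(1, len(train))
    ) / (len(train) - 1)
    if scale == 0:
        raise ValueError("training series is constant; scale is zero")
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)


def wrmsse(scores, weights):
    """Weighted average of per-series RMSSE scores.

    Assumption: weights are normalized here so they sum to 1.
    """
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total
```

For example, `split_by_horizon(daily_sales, horizon=28)` on one item's daily sales would hold out the final 56 days, with the first 28 of those used for validation and the last 28 for testing.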
For more details on the competition, please follow these links:
Rank: top 29%
November 30, 2020
MoA Notebook