Phase 3: Process

Tools Used

Cleaning was done in R. The full, reproducible code lives in notebooks/01_process.Rmd; the headline steps were:

- Loaded 4 core CSV files: daily activity, sleep, hourly steps, hourly calories.
- Standardized column names to snake_case using janitor::clean_names().
- Removed 3 duplicate rows from the sleep dataset.
- Confirmed no missing values across the four files.
- Converted date columns from character strings to proper Date and POSIXct types.
- Flagged zero-step days as wear/engagement signals rather than deleting them.
- Engineered new features:
  - day_of_week: for weekly pattern analysis
  - hour: for time-of-day analysis
  - user_type: Sedentary / Lightly / Fairly / Very Active classification
  - sleep_efficiency: minutes asleep ÷ minutes in bed
  - usage_category: High / Moderate / Low engagement
- Joined daily activity with sleep data on user ID and date.
- Exported 7 clean datasets to data/clean/ for reuse in the analysis phase.
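
The cleaning steps above (de-duplication, the missing-value check, date conversion, and zero-step flagging) can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the column names (id, sleep_day, activity_date, total_steps), the date formats, and the zero_step_day flag name are all assumptions. It uses base R only so it runs without extra packages; the notebook itself may well use other tools.

```r
# Toy stand-in for the sleep file. Column names and the timestamp
# format are assumptions, not taken from the notebook.
sleep <- data.frame(
  id = c(1, 1, 1, 2),
  sleep_day = c("4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM",
                "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),
  total_minutes_asleep = c(327, 327, 384, 412),
  total_time_in_bed = c(346, 346, 407, 442),
  stringsAsFactors = FALSE
)

# Drop exact duplicate rows and record how many were removed.
n_before <- nrow(sleep)
sleep <- sleep[!duplicated(sleep), ]
n_removed <- n_before - nrow(sleep)

# Count missing values across every column.
n_missing <- sum(is.na(sleep))

# Parse the character timestamp into POSIXct, then derive a Date.
# Note: %p (AM/PM) parsing depends on the session locale.
sleep$sleep_day <- as.POSIXct(sleep$sleep_day,
                              format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
sleep$date <- as.Date(sleep$sleep_day)

# Toy stand-in for the daily activity file (assumed columns).
daily <- data.frame(
  id = c(1, 1, 2),
  activity_date = c("4/12/2016", "4/13/2016", "4/12/2016"),
  total_steps = c(13162, 0, 4414),
  stringsAsFactors = FALSE
)
daily$activity_date <- as.Date(daily$activity_date, format = "%m/%d/%Y")

# Flag zero-step days instead of dropping them (hypothetical column name).
daily$zero_step_day <- daily$total_steps == 0
```

Keeping the flag as a logical column means the zero-step rows remain available for engagement analysis and can still be filtered out with a one-line subset when computing activity averages.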

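As a rough illustration of the engineered features and the join listed above, the sketch below derives day_of_week, hour, sleep_efficiency, and user_type on toy data and then merges activity with sleep. Everything beyond the feature names is an assumption: the input column names, the step thresholds used for user_type (the source does not state them), and the choice of an inner join. usage_category is omitted because its rule is not given; it could follow the same cut() pattern once thresholds are chosen.

```r
# Toy inputs; column names are assumptions, not taken from the notebook.
daily <- data.frame(
  id = c(1, 1, 2, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-13")),
  total_steps = c(13162, 10735, 4414, 3200)
)
sleep <- data.frame(
  id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  total_minutes_asleep = c(327, 384, 412),
  total_time_in_bed = c(346, 407, 442)
)
hourly <- data.frame(
  id = c(1, 1),
  activity_hour = as.POSIXct(c("2016-04-12 07:00:00", "2016-04-12 18:00:00"),
                             tz = "UTC"),
  step_total = c(373, 1200)
)

# day_of_week: weekday name (the label language depends on the locale).
daily$day_of_week <- weekdays(daily$date)

# hour: integer hour of day (0-23) extracted from the timestamp.
hourly$hour <- as.integer(format(hourly$activity_hour, "%H"))

# sleep_efficiency: minutes asleep divided by minutes in bed.
sleep$sleep_efficiency <- sleep$total_minutes_asleep / sleep$total_time_in_bed

# user_type: classify each user by mean daily steps.
# The break points below are illustrative assumptions only.
avg_steps <- aggregate(total_steps ~ id, data = daily, FUN = mean)
avg_steps$user_type <- cut(
  avg_steps$total_steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"),
  right = FALSE
)

# Join daily activity with sleep on user ID and date.
# merge() defaults to an inner join; all.x = TRUE would keep days without sleep logs.
daily_sleep <- merge(daily, sleep, by = c("id", "date"))
```

An inner join keeps only the days on which a user logged both activity and sleep, which suits sleep-versus-activity comparisons but shrinks the sample; a left join is the safer choice if activity-only days still matter downstream.
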
Cleaned datasets: