An exploratory and statistical analysis of 756,031 traffic collision observations in the Greater Toronto Area (GTA) to inform urban policy and infrastructure decisions.
This project investigates how spatial and temporal variables affect the frequency and severity of traffic collisions across Toronto. By analyzing over a decade of collision data, this repository aims to uncover patterns relevant to road safety and urban planning.

The analysis was conducted entirely in R, using R Markdown for reproducible literate programming.
- Data Cleaning: Processed UNIX timestamps into standard dates and filtered geographic data to isolate correct WGS84 encoding for spatial mapping. Recoded binary categorical variables for accurate statistical testing.
- Statistical Testing: Utilized one-way Analysis of Variance (ANOVA) to test for mean differences across multiple time-based group levels simultaneously. Applied Chi-Square goodness-of-fit tests to evaluate if collision counts deviated significantly from uniform distributions across specific intervals.
- Regression & Forecasting: Fitted a linear regression model to monthly aggregated data to evaluate long-term trends and generated a 24-month predictive forecast.
- Spatial Analysis: Conducted polygon-based choropleth mapping using the `sf` package to visualize collision density across administrative boundaries.
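
The cleaning and testing steps listed above can be sketched in base R roughly as follows. This is a minimal illustration on simulated data, not the repository's actual code: the column names (`epoch_ms`, `injury`), the assumption that timestamps are stored in milliseconds, and the choice of daily counts as the ANOVA response are all hypothetical.

```r
# Simulated stand-in for the collision table; all names and values are
# assumptions for illustration, not the project's real schema or data.
set.seed(42)
n <- 5000
start_s <- 1546300800  # 2019-01-01 00:00:00 UTC, in seconds
collisions <- data.frame(
  epoch_ms = (start_s + runif(n, 0, 365 * 86400)) * 1000,  # assumed ms epoch
  injury   = sample(c("YES", "NO"), n, replace = TRUE, prob = c(0.2, 0.8)),
  stringsAsFactors = FALSE
)

# Data cleaning: UNIX timestamp -> local date-time, binary recode to 0/1.
collisions$when <- as.POSIXct(collisions$epoch_ms / 1000,
                              origin = "1970-01-01", tz = "America/Toronto")
collisions$injury_flag <- as.integer(collisions$injury == "YES")

# Day of week from POSIXlt (0 = Sunday), which avoids locale-dependent names.
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
collisions$wday <- factor(as.POSIXlt(collisions$when)$wday,
                          levels = 0:6, labels = day_labels)

# Chi-square goodness-of-fit: are collisions spread uniformly over weekdays?
# chisq.test() on a one-way table tests against equal probabilities by default.
dow_counts <- table(collisions$wday)
gof <- chisq.test(dow_counts)
print(gof)

# One-way ANOVA: do mean daily collision counts differ by day of week?
collisions$day <- format(collisions$when, "%Y-%m-%d")
daily <- as.data.frame(table(day = collisions$day),
                       responseName = "n", stringsAsFactors = FALSE)
daily$wday <- factor(as.POSIXlt(as.Date(daily$day))$wday,
                     levels = 0:6, labels = day_labels)
fit <- aov(n ~ wday, data = daily)
print(summary(fit))
```

Because the simulated timestamps are uniform, neither test is expected to show a strong effect here; the point is only the shape of the workflow. Note that days with no collisions would be missing from `daily` and would need to be added as zeros in a real analysis.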
- Temporal Impacts: Both ANOVA and Chi-Square testing provided consistent evidence that the time of day and day of the week are major contributing factors in collision frequency. Collisions are heavily concentrated between 3:00 PM and 5:00 PM, and Fridays exhibit the highest number of injury-related collisions.
- Spatial Impacts: Collisions cluster heavily in high-traffic corridors. The Wexford/Maryvale neighbourhood experiences significantly higher total collisions compared to other areas.
- Long-Term Trends: Regression results indicate a steady upward trend in overall traffic collisions in Toronto from 2014 through 2025.
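
The trend and forecast step, a linear regression on monthly aggregated counts followed by a 24-month projection, might look something like the sketch below. The monthly totals are simulated and the column names (`month`, `t`, `n`) are hypothetical, so the numbers it produces say nothing about the project's actual results.

```r
# Simulated monthly collision totals; values and column names are
# assumptions for illustration only.
set.seed(1)
months <- seq(as.Date("2014-01-01"), by = "month", length.out = 120)
monthly <- data.frame(
  month = months,
  t     = seq_along(months),  # month index used as the predictor
  n     = round(5000 + 8 * seq_along(months) + rnorm(120, sd = 300))
)

# Linear trend: collisions per month as a function of elapsed months.
trend <- lm(n ~ t, data = monthly)
print(summary(trend)$coefficients)

# Build the next 24 months and predict with 95% prediction intervals.
future <- data.frame(
  month = seq(max(monthly$month), by = "month", length.out = 25)[-1],
  t     = max(monthly$t) + 1:24
)
forecast <- cbind(future,
                  predict(trend, newdata = future, interval = "prediction"))
head(forecast)
```

A straight-line model ignores seasonality, so the prediction interval (`lwr`, `upr`) is arguably more informative than the point forecast (`fit`) when reading the output.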