Wildfire Ignition Risk Heatmap for California

An end-to-end geospatial machine learning pipeline that predicts wildfire ignition risk across California for each 4 km grid cell at a weekly time step. The system fuses seven heterogeneous datasets (gridded weather data, fuel rasters, census demographics, fire history, and national park boundaries and visitation) into a unified modeling dataset of ~2.4 million rows, trains classification models, and renders predictions as interactive, zoomable risk heatmaps.

Raw Data (7 sources)          Grid (4 km cells)         ML Pipeline           Risk Heatmap
┌─────────────────┐          ┌───────────────┐        ┌──────────────┐      ┌──────────────┐
│ FPA-FOD fires   │──┐       │               │        │ 2.4M rows    │      │  Interactive  │
│ gridMET weather │──┤       │  ┌──┬──┬──┐   │        │ 6 features   │      │  folium map   │
│ LANDFIRE fuels  │──┤──────▶│  ├──┼──┼──┤   │──────▶ │ LogReg + RF  │─────▶│  + fire pts   │
│ Census/ACS pop  │──┤       │  ├──┼──┼──┤   │        │ ROC-AUC eval │      │  + colorbar   │
│ CalFire perims  │──┤       │  └──┴──┴──┘   │        │ Feature imp. │      │  per week     │
│ NPS parks+visits│──┘       │               │        └──────────────┘      └──────────────┘
└─────────────────┘          └───────────────┘

Features

Multi-source geospatial data fusion — integrates vector (shapefiles, GeoPackage), raster (GeoTIFF, NetCDF), and tabular (API, CSV) datasets into a single analysis grid
4 km spatial resolution — analysis grid derived from gridMET cell geometry, covering all of California (~30,000+ cells)
Weekly temporal resolution — predictions at the (cell_id, week) level for fine-grained temporal risk tracking
6 engineered features — 7-day rolling mean temperature, population density, fuel type, years since last fire, distance to nearest park, log-transformed park visitor counts
Class-imbalance-aware ML — logistic regression and random forest classifiers with balanced class weights to handle the ~99.5% / 0.5% imbalance
Time-based train/test split — trains on 2019, tests on 2020 to prevent data leakage
Comprehensive evaluation — ROC-AUC, PR-AUC, Brier score, confusion matrices, and feature importance analysis
Interactive risk heatmaps — zoomable folium maps with risk intensity layers, actual fire point overlays, and color legends for any selected week
Static EDA visualizations — distribution plots, feature vs. target comparisons, and geographic maps via matplotlib and seaborn
Reproducible notebook — single Jupyter notebook walks from raw data download through final heatmap in 9 clearly labeled sections

Data Pipeline

Section 1: Data Acquisition
    │   Load 7 datasets: FPA-FOD, gridMET, LANDFIRE, Census, CalFire, NPS parks, visitor stats
    ▼
Section 2: Grid Definition
    │   Build ~30K+ polygons (4 km cells) from gridMET geometry → ca_grid_cells.gpkg
    ▼
Section 3: Fire Labels
    │   Spatial join fires → cells, aggregate to (cell_id, week) → labels_cell_week_full.parquet
    ▼
Section 4: Weather Features
    │   7-day rolling mean tmmx, weekly resample → weather_features_cell_week.parquet
    ▼
Section 5: Human & Geographic Features
    │   Population density, fuel mode, burn history, park distance, visitor counts
    │   → ca_grid_data_done.gpkg
    ▼
Section 6: Modeling Dataset Assembly
    │   Merge labels + weather + static features → modeling_dataset_2019_2020.parquet (2.4M rows)
    ▼
Section 7: Exploratory Data Analysis
    │   Feature distributions, class balance, geographic visualizations
    ▼
Section 8: Model Training & Evaluation
    │   Logistic Regression + Random Forest → metrics + feature importance
    ▼
Section 9: Risk Heatmap
        Score all cells → risk_scores_cell_week.parquet → interactive folium maps
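
The assembly step in Section 6 can be pictured as two joins: time-varying features attach on `(cell_id, week)`, static features on `cell_id` alone. The sketch below uses made-up miniature tables. The temperature values and the choice of `validate` checks are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical miniature inputs; the values are invented for illustration.
weeks = pd.to_datetime(["2019-07-01", "2019-07-08", "2019-07-01", "2019-07-08"])
labels = pd.DataFrame({
    "cell_id": [1, 1, 2, 2],
    "week": weeks,
    "fire_occurred": [0, 1, 0, 0],
})
weather = pd.DataFrame({
    "cell_id": [1, 1, 2, 2],
    "week": weeks,
    "tmmx_7day_mean": [305.2, 307.9, 301.4, 303.0],
})
static = pd.DataFrame({
    "cell_id": [1, 2],
    "pop_density": [12.5, 840.0],
    "fuel_mode": [2, 1],
})

# Time-varying features join on (cell_id, week); static ones on cell_id only.
# validate= raises if a join would silently duplicate rows.
modeling = (
    labels
    .merge(weather, on=["cell_id", "week"], how="left", validate="one_to_one")
    .merge(static, on="cell_id", how="left", validate="many_to_one")
)
print(modeling.shape)
```

Using left joins from the label table keeps one row per cell-week, so the row count of the result matches the label table.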

Data Sources

Note: Raw data is not checked into this repo due to size and licensing. The notebook documents where to download each dataset and how to store it under data/raw/.

| Dataset | Source | Format | Purpose |
| --- | --- | --- | --- |
| Wildfire Ignitions | FPA-FOD (U.S. Forest Service) | GeoPackage | Historical fire discovery locations and dates (CA, 2000–2020, ≥100 acres) |
| Fire Perimeters | CAL FIRE / FRAP | GeoPackage | Burn history polygons for years-since-last-fire feature |
| Weather (Temperature) | gridMET | NetCDF | Daily max temperature (tmmx) at 4 km resolution |
| Population Density | Census TIGER/Line + ACS API | Shapefile + API | Census tract geometries + population estimates |
| Land Cover / Fuels | LANDFIRE | GeoTIFF | Fire behavior fuel model raster (FBFM13) |
| Park Boundaries | NPS | GeoPackage | National Park unit boundaries for California |
| Park Visitor Counts | Melanie Walsh Dataset | CSV | Annual recreation visits per park (1979–2024) |

Tech Stack

| Category | Libraries |
| --- | --- |
| Geospatial | `geopandas`, `shapely`, `rasterio`, `rioxarray`, `pyproj`, `fiona` |
| Raster & Gridded Data | `xarray`, `rioxarray`, `rasterio`, `dask` |
| Data Processing | `pandas`, `numpy`, `pyarrow` |
| Machine Learning | `scikit-learn` (LogisticRegression, RandomForestClassifier, StandardScaler) |
| Interactive Mapping | `folium`, `branca` (HeatMap plugin, colormaps) |
| Visualization | `matplotlib`, `seaborn` |
| Environment | Conda (`geo` environment), Python 3.11 |

Repository Structure

wildfire-proj/
├── wildfire_proj.ipynb           # Main notebook — full pipeline from raw data to heatmap
├── README.md                     # This file
├── data/
│   ├── raw/                      # Downloaded datasets (not in version control)
│   │   ├── fpa_fod/              #   FPA_FOD_20221014.gpkg
│   │   ├── perim/                #   CalFire perimeter polygons
│   │   ├── ca_boundry/           #   California state boundary shapefile
│   │   ├── tract_info/           #   Census TIGER/Line tracts
│   │   ├── gridmet/              #   gridMET NetCDF files
│   │   ├── parks/                #   NPS boundary GeoPackage
│   │   └── landfire/             #   LANDFIRE fuel model GeoTIFF
│   └── processed/                # Notebook outputs (not in version control)
│       ├── ca_grid_cells.gpkg
│       ├── labels_cell_week_full.parquet
│       ├── weather_features_cell_week.parquet
│       ├── ca_grid_data_done.gpkg
│       ├── modeling_dataset_2019_2020.parquet
│       └── risk_scores_cell_week.parquet
└── environment.yml               # Conda environment export (optional)

Getting Started

Prerequisites

Conda (Miniconda or Anaconda)
~10 GB disk space for raw datasets
Jupyter Notebook or JupyterLab

Installation

Clone the repository

```
git clone https://github.com/sohan-shingade/wildfire-proj.git
cd wildfire-proj
```

Create the conda environment

```
conda create -n geo python=3.11
conda activate geo

conda install -c conda-forge \
  geopandas rasterio rioxarray xarray netcdf4 \
  shapely fiona pyproj scikit-learn \
  folium branca matplotlib dask pyarrow

pip install seaborn
```

Download the datasets

Follow the links in the Data Sources table and place each dataset under data/raw/ in the folder structure shown above.

Run the notebook

```
jupyter notebook wildfire_proj.ipynb
```

Execute cells sequentially, since each section builds on the outputs of previous sections.

Pipeline Walkthrough

1. Grid Definition

Builds a uniform 4 km analysis grid from gridMET's native cell geometry. Each cell gets a unique cell_id used as the join key throughout the pipeline. Saved as ca_grid_cells.gpkg.
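
As a rough illustration of the grid step, the sketch below turns cell-center coordinates into square cell bounds with sequential IDs. It assumes regular spacing of 1/24 degree (gridMET's nominal ~4 km cell size). The `GridCell` type, the `build_grid_cells` helper, the row-major `cell_id` scheme, and the sample coordinates are all hypothetical; the notebook derives its cells from the actual gridMET coordinates.

```python
from dataclasses import dataclass

# gridMET's nominal cell size: 1/24 of a degree (roughly 4 km).
CELL_DEG = 1.0 / 24.0


@dataclass(frozen=True)
class GridCell:
    cell_id: int
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def build_grid_cells(lon_centers, lat_centers, cell_deg=CELL_DEG):
    """Return one square cell per (lat, lon) center pair, in row-major order."""
    half = cell_deg / 2.0
    cells = []
    cell_id = 0
    for lat in lat_centers:
        for lon in lon_centers:
            cells.append(
                GridCell(cell_id, lon - half, lat - half, lon + half, lat + half)
            )
            cell_id += 1
    return cells


# Hypothetical 3 x 2 patch of cell centers (not real gridMET coordinates).
lons = [-120.0 + i * CELL_DEG for i in range(3)]
lats = [37.0 + j * CELL_DEG for j in range(2)]
cells = build_grid_cells(lons, lats)
print(len(cells), cells[0])
```

In a geopandas workflow, each set of bounds would typically become a polygon via `shapely.geometry.box(min_lon, min_lat, max_lon, max_lat)` before being written to the GeoPackage.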

2. Fire Labels

Fire ignition points from FPA-FOD are spatially joined to grid cells (`gpd.sjoin` with the `within` predicate), then aggregated to `(cell_id, week)` pairs. The binary target `fire_occurred` is set to 1 if any fire ignited in that cell-week and 0 otherwise. The resulting dataset has ~2.4M rows with extreme class imbalance (~99.5% negative).
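
The aggregation half of this step can be sketched in plain Python. The example assumes the spatial join has already attached a `cell_id` to each ignition, and it assumes weeks are keyed by their Monday; the notebook's week convention may differ. The records and helper names are invented for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d):
    """Return the Monday of the week containing date d."""
    return d - timedelta(days=d.weekday())


# Hypothetical ignition records after the spatial join: (cell_id, discovery_date).
ignitions = [
    (101, date(2020, 8, 17)),
    (101, date(2020, 8, 19)),  # same cell, same week as the record above
    (205, date(2020, 8, 23)),  # Sunday, still in the week starting 2020-08-17
    (101, date(2020, 8, 24)),  # the following week
]

# Count ignitions per (cell_id, week) pair.
fire_counts = Counter((cell_id, week_start(d)) for cell_id, d in ignitions)


def fire_occurred(cell_id, week):
    """Binary target: 1 if any ignition fell in this cell-week, else 0."""
    return 1 if fire_counts.get((cell_id, week), 0) > 0 else 0


print(fire_occurred(101, date(2020, 8, 17)))
print(fire_occurred(205, date(2020, 8, 24)))
```

Cell-weeks that never appear in the ignition records default to 0, which is where the large negative class comes from once every cell is crossed with every week.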

3. Weather Features

Daily gridMET maximum temperature is processed through a 7-day rolling mean, then resampled to weekly aggregates per grid cell. This captures antecedent heat conditions that drive fire risk.
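
For a single grid cell, the rolling-then-resample pattern might look like the pandas sketch below. The temperature values are made up, and two details are assumptions rather than facts from the notebook: the weekly aggregate is taken as the mean of the rolling values, and weeks are labeled by their starting Monday.

```python
import pandas as pd

# Hypothetical daily tmmx series for one grid cell (values are made up).
days = pd.date_range("2020-01-06", periods=14, freq="D")  # starts on a Monday
tmmx = pd.Series(range(1, 15), index=days, dtype="float64")

# 7-day trailing mean; the first six days lack a full window and stay NaN.
tmmx_7day_mean = tmmx.rolling(window=7, min_periods=7).mean()

# Collapse to one value per week, labeled by the Monday that starts the week.
# NaN days are skipped by mean(), so the first week uses its one valid value.
weekly = tmmx_7day_mean.resample("W-MON", label="left", closed="left").mean()
print(weekly)
```

Because the window is trailing, each weekly value reflects temperatures from up to six days before the week begins, which matches the idea of antecedent heat.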

4. Human & Geographic Features

Five static features are computed per grid cell:

| Feature | Method |
| --- | --- |
| `pop_density` | Census tract population / area, joined by cell centroid |
| `fuel_mode` | Dominant LANDFIRE fuel class via raster zonal statistics |
| `years_since_last_fire` | Most recent CalFire perimeter year, subtracted from reference |
| `dist_to_park_m` | Distance from cell centroid to nearest NPS park boundary |
| `log_visits` | Log-transformed annual visitor count of the nearest park |
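
Three of these features reduce to simple arithmetic once the spatial lookups are done. The sketch below shows one plausible form of each. The reference year, the use of `log1p` for the log transform, the square-kilometer unit, and the handling of cells with no burn history are assumptions, since the table does not specify them.

```python
import math

REFERENCE_YEAR = 2020  # assumption: the notebook's reference year may differ


def years_since_last_fire(perimeter_years, reference_year=REFERENCE_YEAR,
                          no_fire_value=None):
    """Years between the reference year and the most recent burn, if any."""
    if not perimeter_years:
        # How never-burned cells are filled is not stated in the source.
        return no_fire_value
    return reference_year - max(perimeter_years)


def pop_density(population, area_km2):
    """People per square kilometer for the tract containing the cell centroid."""
    return population / area_km2


def log_visits(annual_visits):
    """log(1 + visits), so a park with zero recorded visits maps to 0."""
    return math.log1p(annual_visits)


print(years_since_last_fire([1998, 2015, 2007]))
print(pop_density(4200, 12.0))
print(log_visits(0))
```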

5. Modeling

All features are merged into a flat modeling table. The pipeline uses a time-based split (train on 2019, test on 2020) to prevent leakage, applies StandardScaler, and fits two models:

Logistic Regression — class_weight='balanced', max 1000 iterations
Random Forest — 200 trees, class_weight='balanced_subsample', no max depth cap
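
The split-scale-fit sequence described above can be sketched with scikit-learn on synthetic data. Only the hyperparameters named in this section come from the README. The synthetic table, the two-feature subset, the random seeds, and fitting the scaler on training rows only are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the modeling table (two features, two years).
n = 4000
df = pd.DataFrame({
    "year": np.repeat([2019, 2020], n // 2),
    "tmmx_7day_mean": rng.normal(300.0, 8.0, n),
    "pop_density": rng.lognormal(3.0, 1.0, n),
})
# Rare positives, made more likely by heat, to mimic the class imbalance.
p_fire = 1.0 / (1.0 + np.exp(-(df["tmmx_7day_mean"] - 318.0) / 3.0))
df["fire_occurred"] = (rng.random(n) < p_fire).astype(int)

features = ["tmmx_7day_mean", "pop_density"]
train, test = df[df["year"] == 2019], df[df["year"] == 2020]

# Fit the scaler on training rows only so test statistics do not leak in.
scaler = StandardScaler().fit(train[features])
X_train = scaler.transform(train[features])
X_test = scaler.transform(test[features])
y_train = train["fire_occurred"]

logreg = LogisticRegression(class_weight="balanced", max_iter=1000)
forest = RandomForestClassifier(
    n_estimators=200,
    class_weight="balanced_subsample",
    max_depth=None,
    random_state=0,
)
logreg.fit(X_train, y_train)
forest.fit(X_train, y_train)

# Column 1 of predict_proba is the probability of the positive class.
p_logreg = logreg.predict_proba(X_test)[:, 1]
p_forest = forest.predict_proba(X_test)[:, 1]
print(p_logreg[:3], p_forest[:3])
```

Scaling matters for the logistic regression; the random forest is insensitive to it, so sharing the scaled matrix is a convenience rather than a requirement.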

6. Risk Scoring & Visualization

The chosen model assigns an ignition probability to every (cell_id, week) pair. These risk scores drive both the static matplotlib maps (quantile-normalized) and the interactive folium heatmaps with fire point overlays.
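
Quantile normalization is useful here because raw probabilities cluster near zero, so a linear color scale would paint almost every cell the same shade. The sketch below shows one plausible reading of that step, a rank-based rescaling to [0, 1] with ties sharing their average rank. The `quantile_normalize` helper and the sample scores are hypothetical and not taken from the notebook.

```python
def quantile_normalize(scores):
    """Map each score to its rank-based position in [0, 1].

    Tied scores share the average of their ranks.
    """
    n = len(scores)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the run of scores tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n - 1) for r in ranks]


# Made-up risk scores: mostly tiny, with one large outlier.
risk = [0.0004, 0.0009, 0.0009, 0.0200, 0.3500]
print(quantile_normalize(risk))
```

After this transform the outlier no longer compresses the rest of the scale; each cell's color reflects its rank among all cells for that week.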

Model Performance

| Metric | Logistic Regression | Random Forest |
| --- | --- | --- |
| ROC-AUC | Reasonable | Higher |
| PR-AUC | Low (expected) | Low (expected) |
| Brier Score | Computed | Computed |

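PR-AUC is inherently low under extreme class imbalance (~0.5% positive rate): a no-skill classifier scores a PR-AUC close to the positive rate (about 0.005), so values should be read against that baseline rather than against 1.0. The probability threshold is set to 0.002, far below the default of 0.5, to favor recall on fire events at the cost of more false alarms.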
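
The effect of the threshold choice can be seen on a toy example. The labels and scores below are invented and only illustrate the mechanics; they say nothing about the notebook's actual results. The helper names are hypothetical.

```python
def confusion_at_threshold(y_true, y_prob, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are flagged positive."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up cell-weeks: seven without a fire, one with.
y_true = [0, 0, 0, 0, 0, 0, 0, 1]
y_prob = [0.0005, 0.0010, 0.0015, 0.0030, 0.0010, 0.0008, 0.0025, 0.0040]

for threshold in (0.5, 0.002):
    tp, fp, fn, tn = confusion_at_threshold(y_true, y_prob, threshold)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: tp={tp} fp={fp} fn={fn} tn={tn} recall={recall:.2f}")

print(f"Brier score: {brier_score(y_true, y_prob):.4f}")
```

With these toy scores the default threshold misses the only fire, while the lower threshold catches it and accepts two false alarms.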

Top features by Random Forest importance:

tmmx_7day_mean (temperature — dominant predictor)
pop_density
fuel_mode
years_since_last_fire
dist_to_park_m
log_visits

Interactive Heatmap

The final output is an interactive folium map for any selected week:

Risk intensity layer — cell centroids colored by predicted ignition probability
Fire point overlay — actual fires that ignited that week shown as markers
Colorbar legend — low-to-high risk scale using the OrRd colormap
Full interactivity — pan, zoom, hover, and export to HTML

```
# Generate a risk map for a specific week
make_risk_map_folium("2020-08-17")
```

Future Work

Add more weather variables (VPD, wind speed, precipitation deficit)
Incorporate elevation and slope from DEM data
Test gradient boosting models (XGBoost, LightGBM)
Extend temporal range beyond 2019–2020
Build a Streamlit or Dash dashboard for real-time risk exploration
Add NDVI/EVI vegetation indices from MODIS or Sentinel-2

License

This project is available under the MIT License.

Built with geospatial Python for wildfire risk research
