SAR change detection, supervised generalisation testing, and explainable GP depth regression for the November 2019 South Yorkshire floods.
GEOL0069 (AI4EO) - Final Project | UCL Earth Sciences
Explore the project description »
Watch the video walkthrough · View the notebooks · Environmental cost
In November 2019, an exceptional meteorological event dropped a month's worth of rainfall over South Yorkshire in 24 hours. The River Don breached its banks, severely flooding the village of Fishlake and damaging approximately 1,600 properties in the region.
This repository presents a machine learning pipeline addressing two central challenges in satellite-based disaster response:
- Flood Extent Detection: Mapping flooded area from the microwave backscatter response of Sentinel-1 Synthetic Aperture Radar (SAR).
- Flood Depth Estimation: Developing topographically driven regressions via data fusion of SAR, Sentinel-2 multispectral optical indices, and Digital Elevation Models (DEM).
Rather than validating models on a single scene, we evaluate them on three scenes chosen to separate genuine spatial and temporal generalisation from simple pixel memorisation:
- Training Scene: Fishlake, November 2019 (Peak flood event).
- Spatial Test Scene: Bentley / Toll Bar, November 2019 (Same storm event, distinct floodplain characteristics).
- Temporal Test Scene: Fishlake, January 2021 (Storm Christoph; same location, different antecedent conditions and flood boundaries).
Research questions
- Do machine learning classifiers (Random Forest, SVM, CNN) trained on labels from a SAR change-detection threshold baseline learn physical flood indicators that generalise across space and time, or do they simply reproduce the empirical rule they were given?
- Do post-flood optical water-colour indices add complementary information for flood depth estimation over a flat floodplain, compared with standard terrain-derived proxies alone? Which spectral features are most informative?
Built with
- Core Platforms: Google Earth Engine & geemap
- Sensors: Sentinel-1 (SAR) & Sentinel-2 (MSI) via the Copernicus programme
- Data & Terrain: Copernicus 30 m Global DEM
- Machine Learning: scikit-learn (Random Forest, SVM, Gaussian Process Regression with ARD)
- Deep Learning: TensorFlow/Keras (patch-based 2D Convolutional Neural Network)
- Explainable AI (XAI) & Tracking: SHAP & CodeCarbon
```
GEOL0069_Project_FloodDetection/
├── README.md                  <- Primary landing page & project overview
├── PROJECT_DESCRIPTION.md     <- Introduction to the problem
├── ENVIRONMENTAL_COST.md      <- Emissions tracking & analysis
├── Sentinel1_INFOGRAPHIC.png
├── LICENSE
├── Project_Notebooks/
│   ├── Flood_Notebook1_DataAcquisition.ipynb   <- SAR/DEM/S2 acquisition + threshold baseline
│   ├── Flood_Notebook2_Classification.ipynb    <- RF / SVM / CNN + generalisation tests
│   └── Flood_Notebook3_Regression_XAI.ipynb    <- depth proxy, GP regression, ARD, SHAP
└── images/
    ├── LOGO_README.png
    ├── Notebook1_3Scenes_ThresholdBaseline.png
    ├── Notebook1_FishlakeSentinel1.png
    ├── Notebook1_FishlakeThresholdBaseline.png
    ├── Notebook2_BentleyTollBar_3Models.png
    ├── Notebook2_ModelComparisonIoU.png
    ├── Notebook3_FloodUncertainty+Depth.png
    ├── Notebook3_GP_ARD.png
    ├── Notebook3_Kmeans.png
    ├── Notebook3_NDWI+MNDWI.png
    └── Notebook3_SHAPbeeswarm.png
```
| Notebook | Content |
|---|---|
| 1. Data Acquisition | Fetches Sentinel-1 SAR (pre/mid-flood), Copernicus DEM, and Sentinel-2 optical imagery via Earth Engine for all three scenes. Computes an independent SAR change-detection Threshold Baseline flood map for each scene, used throughout the project as a reference rather than ground truth. |
| 2. Classification | Splits the training scene into a confident core and an ambiguous margin to avoid label circularity, then trains Random Forest, SVM, and a CNN on the confident core. Evaluates all three against the Threshold Baseline on the ambiguous margin (in-scene), the spatial test scene, and the temporal test scene. |
| 3. Regression & XAI | Defines a DEM-based relative depth proxy, computes Sentinel-2 water-colour indices (NDWI, MNDWI, Stumpf ratio), and compares three Gaussian Process regression approaches (SAR+terrain, optical, combined) using ARD lengthscales for interpretation. Cross-checks with K-means clustering and SHAP, and discusses the project's environmental footprint. |
Every comparison in this project is made against a single SAR change-detection rule: a pixel is flagged as flooded if its backscatter drops by more than 3 dB between the pre-flood and mid-flood Sentinel-1 acquisitions and falls below -17 dB in absolute terms. This Threshold Baseline is the fixed reference point against which every downstream model in Notebooks 2 and 3 is tested.
Figure 1: Pre-flood vs. mid-flood Sentinel-1 backscatter intensity over the study area, the difference between the two, and the Copernicus DEM for reference.
Figure 2: The resulting Threshold Baseline flood extent, used as the reference point throughout the project.
Figure 3: The Threshold Baseline flood extent for all three scenes: Fishlake 2019, Bentley/Toll Bar 2019, and Fishlake 2021.
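On arrays of per-pixel backscatter (for example, pre- and mid-flood images exported from Earth Engine), the rule reduces to two comparisons. The sketch below is a NumPy illustration rather than the notebook's Earth Engine code; it assumes VV polarisation, inputs already in dB, and that the -17 dB floor applies to the mid-flood image.

```python
import numpy as np


def threshold_baseline(vv_pre_db, vv_mid_db, drop_db=3.0, floor_db=-17.0):
    """Two-part SAR change-detection rule (NumPy sketch).

    A pixel is flagged as flooded when its backscatter drops by more than
    `drop_db` between the pre-flood and mid-flood images AND the mid-flood
    value is below `floor_db`. Inputs are assumed to be VV backscatter in dB.
    """
    vv_pre_db = np.asarray(vv_pre_db, dtype=float)
    vv_mid_db = np.asarray(vv_mid_db, dtype=float)
    delta_vv = vv_mid_db - vv_pre_db  # negative values mean backscatter decreased
    return (delta_vv < -drop_db) & (vv_mid_db < floor_db)


# Four example pixels: large drop to a dark value, small drop,
# drop to a dark value, and a drop that stays too bright.
flooded = threshold_baseline([-10, -10, -18, -12], [-20, -12, -22, -16])
print(flooded)  # [ True False  True False]
```

Because both conditions must hold, a pixel that darkens sharply but stays brighter than -17 dB (the fourth example) is not flagged.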
Training a classifier directly on the Threshold Baseline's own output would guarantee near-perfect agreement by construction, since the labels and the inputs come from the same rule. To avoid this, the training scene is split into a Confident Core (ΔVV < -4.5 dB, confidently flooded, or ΔVV > -1.5 dB, confidently dry) used for training, and an Ambiguous Margin (everything in between) held out for evaluation. Random Forest, SVM (RBF kernel), and a patch-based 2D CNN are trained only on the Confident Core, then tested on the Ambiguous Margin and on two fully independent scenes: a different location hit by the same storm (Bentley/Toll Bar), and the same location during a different storm fourteen months later (Fishlake, January 2021).
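The Confident Core / Ambiguous Margin split can be sketched with NumPy and scikit-learn as below. The per-pixel features (pre-flood VV, mid-flood VV, ΔVV), the synthetic data and the Random Forest settings are assumptions for illustration; the notebook's actual feature stack and hyperparameters may differ. The margin predictions would then be scored against the Threshold Baseline on the same pixels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def split_core_margin(delta_vv, wet_db=-4.5, dry_db=-1.5):
    """Split pixels by ΔVV (dB) into a Confident Core and an Ambiguous Margin.

    Returns (core_mask, margin_mask, core_labels), where core_labels holds
    1 (confidently flooded, ΔVV < wet_db) or 0 (confidently dry, ΔVV > dry_db)
    for the core pixels only, in array order.
    """
    delta_vv = np.asarray(delta_vv, dtype=float)
    wet = delta_vv < wet_db
    dry = delta_vv > dry_db
    core = wet | dry
    return core, ~core, wet[core].astype(int)


# Synthetic stand-in for a flattened per-pixel feature stack (hypothetical).
rng = np.random.default_rng(0)
n_pixels = 2000
vv_pre = rng.normal(-10.0, 2.0, n_pixels)
delta_vv = rng.normal(-3.0, 3.0, n_pixels)
vv_mid = vv_pre + delta_vv
X = np.column_stack([vv_pre, vv_mid, delta_vv])

core, margin, y_core = split_core_margin(delta_vv)

# Train only on the Confident Core; margin pixels are never seen in training.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[core], y_core)
margin_pred = clf.predict(X[margin])
```

With these strict inequalities, pixels with ΔVV exactly at -4.5 dB or -1.5 dB fall into the margin.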
Figure 7: Predicted flood extent from Random Forest, SVM, and the CNN for Bentley/Toll Bar, Nov 2019.
Figure 9: Overall model comparison of Intersection over Union (IoU), the agreement metric used throughout, for the Ambiguous Margin pixels, spatial testing (Bentley/Toll Bar), and temporal testing (Fishlake, January 2021).
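IoU is the number of pixels flagged as flooded in both masks divided by the number flagged in either. A minimal NumPy version, assuming the flooded class is the one being scored and that both masks are restricted to the same evaluation pixels (for example, only the Ambiguous Margin):

```python
import numpy as np


def iou(pred, reference):
    """Intersection over Union of two boolean flood masks.

    Returns NaN when neither mask contains any flooded pixel, since IoU is
    undefined in that case.
    """
    pred = np.asarray(pred, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    union = np.logical_or(pred, reference).sum()
    if union == 0:
        return float("nan")
    return float(np.logical_and(pred, reference).sum() / union)


# One pixel is flooded in both masks, two are flooded in only one.
print(iou([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.3333333333333333
```

For the in-scene score, the call would look like `iou(pred[margin], baseline[margin])`, where `pred`, `baseline` and `margin` are hypothetical per-pixel arrays.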
SAR backscatter shows where floodwater is, but not how deep it is. With no LiDAR or gauge depth data readily available for this floodplain, depth is approximated with a DEM-derived proxy in the spirit of Height Above Nearest Drainage (Rennó et al., 2008): the maximum elevation along the flood's edge minus each pixel's own elevation. This proxy is then related to the post-flood Sentinel-2 scene's water-colour indices: NDWI and MNDWI (computed from the green, NIR and SWIR bands) and the Stumpf ratio. Three Gaussian Process regression models, each using a kernel with Automatic Relevance Determination (ARD), are fitted on SAR+terrain features alone, optical features alone, and the two combined. The resulting feature lengthscales are cross-checked against SHAP values from an independent Random Forest regressor.
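A minimal sketch of the relative depth proxy, assuming a boolean flood mask (for example, the Threshold Baseline) on the same grid as the DEM. Edge pixels are taken as flooded pixels with at least one dry 4-connected neighbour (or lying on the array border), a single maximum edge elevation is used for the whole extent, and negative values are clipped to zero; all three are simplifying assumptions rather than the notebook's exact method.

```python
import numpy as np
from scipy.ndimage import binary_erosion


def relative_depth_proxy(dem, flood):
    """Relative depth = (max elevation on the flood edge) - (pixel elevation).

    dem: 2-D elevation array (m). flood: boolean mask of the same shape.
    Returns an array of depths (m) with NaN outside the flood extent.
    """
    dem = np.asarray(dem, dtype=float)
    flood = np.asarray(flood, dtype=bool)
    # Edge = flooded pixels removed by one step of erosion (4-connectivity).
    edge = flood & ~binary_erosion(flood)
    if not edge.any():
        raise ValueError("flood mask contains no flooded pixels")
    edge_max = dem[edge].max()
    depth = np.full(dem.shape, np.nan)
    depth[flood] = np.clip(edge_max - dem[flood], 0.0, None)
    return depth


# A 5 x 5 toy floodplain: a 3 x 3 flooded hollow with one raised edge pixel.
dem = np.full((5, 5), 10.0)
dem[1:4, 1:4] = 9.0
dem[2, 2] = 8.0
dem[1, 1] = 9.5
flood = np.zeros((5, 5), dtype=bool)
flood[1:4, 1:4] = True
depth = relative_depth_proxy(dem, flood)
print(depth[2, 2], depth[1, 2], depth[1, 1])  # 1.5 0.5 0.0
```

A per-component variant (one edge maximum per connected flood patch) would be a natural refinement for scenes with several separate flooded areas.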
(Part of) Figure 10: NDWI and MNDWI derived from the post-flood Sentinel-2 scene, used as regression features.
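The three indices can be computed from surface reflectance as below, following McFeeters (1996) for NDWI, Xu (2006) for MNDWI and Stumpf et al. (2003) for the log-ratio. Assumptions: reflectances scaled to 0-1, Sentinel-2 bands B2 (blue), B3 (green), B8 (NIR) and B11 (SWIR), and the conventional blue/green pairing with n = 1000 for the Stumpf ratio; the notebook's exact band choices may differ.

```python
import numpy as np


def water_indices(blue, green, nir, swir, n=1000.0):
    """Return (NDWI, MNDWI, Stumpf ratio) from reflectance arrays (0-1 scale).

    NDWI   = (green - nir)  / (green + nir)     (McFeeters, 1996)
    MNDWI  = (green - swir) / (green + swir)    (Xu, 2006)
    Stumpf = ln(n * blue) / ln(n * green)       (Stumpf et al., 2003)
    Division by zero yields inf/NaN rather than raising.
    """
    blue, green, nir, swir = (np.asarray(b, dtype=float) for b in (blue, green, nir, swir))
    with np.errstate(divide="ignore", invalid="ignore"):
        ndwi = (green - nir) / (green + nir)
        mndwi = (green - swir) / (green + swir)
        stumpf = np.log(n * blue) / np.log(n * green)
    return ndwi, mndwi, stumpf


# A single water-like pixel: green brighter than NIR and SWIR.
ndwi, mndwi, stumpf = water_indices(blue=0.12, green=0.10, nir=0.02, swir=0.01)
print(round(float(ndwi), 3), round(float(mndwi), 3), round(float(stumpf), 3))  # 0.667 0.818 1.04
```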
Figure 16: Global SHAP feature attribution for the Random Forest depth regressor, cross-checked against the GP/ARD lengthscales.
Classification generalisation
| Model | Margin IoU (in-scene) | Spatial IoU (Bentley) | Temporal IoU (Jan 2021) |
|---|---|---|---|
| Random Forest | 0.980 | 0.994 | 0.974 |
| Support Vector Machine | 0.963 | 0.981 | 0.952 |
| Patch-based 2D CNN | 0.547 | 0.679 | 0.421 |
Random Forest generalised most consistently across all three evaluation axes, consistent with learning something close to the underlying SAR threshold rule rather than scene-specific texture. The CNN generalised worst and notably dropped further on the temporal test than the spatial one, suggesting some reliance on acquisition-specific SAR texture rather than a fully transferable flood signal.
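Unlike the pixel-wise Random Forest and SVM, a patch-based CNN sees each pixel together with its neighbourhood, which is one route by which scene-specific SAR texture can enter the model. A NumPy sketch of building those patches from a (height, width, bands) feature stack; the patch size and reflect padding at the borders are assumptions, as the notebook's settings are not stated here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(stack, rows, cols, size=5):
    """Return one (size, size, bands) patch centred on each requested pixel.

    stack: (H, W, bands) array. rows, cols: integer pixel indices.
    Borders are reflect-padded so edge pixels also get full-size patches.
    """
    if size % 2 == 0:
        raise ValueError("patch size must be odd so it has a centre pixel")
    half = size // 2
    padded = np.pad(stack, ((half, half), (half, half), (0, 0)), mode="reflect")
    # windows[i, j] is the block centred on original pixel (i, j);
    # shape (H, W, bands, size, size). This is a view, not a copy.
    windows = sliding_window_view(padded, (size, size), axis=(0, 1))
    patches = windows[np.asarray(rows), np.asarray(cols)]  # (N, bands, size, size)
    return np.moveaxis(patches, 1, -1)  # (N, size, size, bands), channels-last for Keras


# Toy 4 x 4 stack with two bands (e.g. pre- and mid-flood VV).
stack = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
patches = extract_patches(stack, rows=[1, 2], cols=[1, 2], size=3)
print(patches.shape)  # (2, 3, 3, 2)
```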
Depth regression
| Feature configuration | RMSE (m) | R² |
|---|---|---|
| SAR + terrain | 0.141 | 0.173 |
| Optical only | 0.165 | 0.112 |
| Combined (SAR + terrain + optical) | 0.136 | 0.186 |
Combining SAR/terrain and optical features gave the best held-out performance (RMSE 0.136 m, R² 0.186), a slight improvement over either feature set alone. The gain is small and all three R² values are low, which is attributed to how flat the floodplain is relative to the resolution of the depth proxy.
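A scikit-learn sketch of the three-way comparison, using a constant × anisotropic RBF kernel (one lengthscale per feature, which is what gives ARD) plus a white-noise term. The synthetic data, feature names, standardisation, 70/30 split and optimiser restarts are all assumptions for illustration; the notebook's kernel and data may differ. Because the toy target depends mainly on `elevation`, the combined model should recover a much shorter lengthscale for it than for the pure-noise column.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def evaluate_ard_gp(X, y, seed=0):
    """Fit an ARD Gaussian Process on a 70/30 split; return (RMSE, R², lengthscales)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    scaler = StandardScaler().fit(X_tr)  # put features on comparable scales
    n_features = X.shape[1]
    # One RBF lengthscale per feature = Automatic Relevance Determination.
    kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(n_features)) + WhiteKernel(0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                                  n_restarts_optimizer=2, random_state=seed)
    gp.fit(scaler.transform(X_tr), y_tr)
    pred = gp.predict(scaler.transform(X_te))
    rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
    # kernel_ is Sum(Product(Constant, RBF), White); k1.k2 is the fitted RBF.
    lengthscales = np.atleast_1d(gp.kernel_.k1.k2.length_scale)
    return rmse, float(r2_score(y_te, pred)), lengthscales


# Synthetic stand-ins (hypothetical): a terrain-like driver, an optical-like
# driver and a column of pure noise.
rng = np.random.default_rng(0)
n = 300
elevation = rng.normal(size=n)
ndwi = rng.normal(size=n)
noise_feature = rng.normal(size=n)
depth = 0.5 * np.tanh(-elevation) + 0.1 * ndwi + rng.normal(0.0, 0.02, n)
X = np.column_stack([elevation, ndwi, noise_feature])

configs = {"terrain_only": [0], "optical_only": [1], "combined": [0, 1, 2]}
results = {name: evaluate_ard_gp(X[:, cols], depth) for name, cols in configs.items()}
for name, (rmse, r2, ls) in results.items():
    print(f"{name:>12}: RMSE={rmse:.3f} R²={r2:.3f} lengthscales={np.round(ls, 2)}")
```

Lengthscales are only comparable across features because the inputs were standardised before fitting.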
Explainability
ARD lengthscales and SHAP partially agree on which features matter (elevation and water-colour indices both feature prominently) but disagree on the specific ranking, illustrating that two reasonable XAI methods applied to related models don't always produce the same pattern of results.
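One way to make this partial agreement concrete is to rank features by ARD relevance (taken here as 1 / lengthscale, since a shorter lengthscale means the GP output varies faster along that feature) and by mean absolute SHAP value, then compare the two rankings. The function and all values below are hypothetical, for illustration only; mean |SHAP| per feature would typically come from something like `np.abs(shap_values).mean(axis=0)` on the Random Forest's SHAP output.

```python
import numpy as np
from scipy.stats import spearmanr


def compare_importances(feature_names, ard_lengthscales, mean_abs_shap, top_k=3):
    """Compare ARD- and SHAP-based feature rankings.

    Returns (ard_ranking, shap_ranking, spearman_rho, shared_top_k), with
    rankings ordered from most to least important.
    """
    names = np.asarray(feature_names)
    relevance = 1.0 / np.asarray(ard_lengthscales, dtype=float)  # short lengthscale = relevant
    shap_importance = np.asarray(mean_abs_shap, dtype=float)
    ard_ranking = names[np.argsort(-relevance)].tolist()
    shap_ranking = names[np.argsort(-shap_importance)].tolist()
    rho, _ = spearmanr(relevance, shap_importance)
    shared_top_k = set(ard_ranking[:top_k]) & set(shap_ranking[:top_k])
    return ard_ranking, shap_ranking, float(rho), shared_top_k


# Hypothetical values, not results from this project.
features = ["elevation", "ndwi", "mndwi", "stumpf", "vv_mid"]
ard, shap_rank, rho, shared = compare_importances(
    features,
    ard_lengthscales=[0.5, 1.0, 2.0, 10.0, 5.0],
    mean_abs_shap=[0.05, 0.02, 0.03, 0.001, 0.004],
)
print(ard)            # ['elevation', 'ndwi', 'mndwi', 'vv_mid', 'stumpf']
print(shap_rank)      # ['elevation', 'mndwi', 'ndwi', 'vv_mid', 'stumpf']
print(round(rho, 2))  # 0.9
```

In this toy case both methods pick the same top three features but order the second and third differently, which is the kind of pattern described above.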
Prerequisites
- A Google account with Earth Engine access enabled (required for Notebook 1's data acquisition).
- Python 3.10+ if running locally, or just a Google account if running in Google Colab (recommended; this is how the notebooks were developed and tested).
If running in Google Colab, only the packages not already in the Colab runtime need installing; this is handled by the `!pip install` cells at the top of each notebook:
```
pip install geemap shap codecarbon -q
```
If running locally, install everything from requirements.txt:
```
git clone https://github.com/eemeleems/GEOL0069_Project_FloodDetection.git
cd GEOL0069_Project_FloodDetection
pip install -r requirements.txt
```
Each notebook handles its own setup, but they must be run in order (1 → 2 → 3), since Notebooks 2 and 3 load the feature stack saved by the previous notebook.
Every model trained in Notebooks 2 and 3 is tracked with CodeCarbon, and the resulting energy and CO2 figures are discussed in context - alongside the UK grid carbon intensity and the broader footprint of AI/data-centre electricity demand - in ENVIRONMENTAL_COST.md.
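CodeCarbon reports both the energy used (kWh) and an emissions estimate based on its own grid carbon-intensity lookup. Re-expressing a run against a specific grid factor, such as a UK conversion factor published by DESNZ, is a single multiplication. The sketch assumes energy in kWh and intensity in kg CO2e per kWh; the example inputs are placeholders, not this project's measurements or an official factor.

```python
def emissions_g(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    """Convert energy use (kWh) to emissions in grams CO2e for a given grid intensity."""
    if energy_kwh < 0 or intensity_kg_per_kwh < 0:
        raise ValueError("energy and intensity must be non-negative")
    return energy_kwh * intensity_kg_per_kwh * 1000.0  # kg -> g


# Placeholder inputs: 0.05 kWh of training on a grid at 0.2 kg CO2e/kWh.
print(round(emissions_g(0.05, 0.2), 3))  # 10.0
```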
References
- Doncaster Council (2020), Section 19 Flood Investigation Report: November 2019 Flood Event. Flood Risk Management Team. doncaster.gov.uk/services/emergencies/flood-recovery-report
- GEOL0069: AI for Earth Observations, Module Content, University College London. AI4EO Github - CPOM.
- International Energy Agency (IEA) (2024), Energy demand from AI: Tracking global data centre electricity trends. iea.org/reports/energy-and-ai/energy-demand-from-ai
- Lundberg, S.M. and Lee, S.-I. (2017), A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 4765-4774. proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
- McFeeters, S.K. (1996), The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. International Journal of Remote Sensing, 17(7), 1425-1432. doi.org/10.1080/01431169608948714
- Rasmussen, C.E. and Williams, C.K.I. (2006), Gaussian Processes for Machine Learning. MIT Press. gaussianprocess.org/gpml
- Rennó, C.D., Nobre, A.D., Cuartas, L.A., Soares, J.V., Hodnett, M.G., Tomasella, J. and Waterloo, M.W. (2008), HAND, a new terrain descriptor using SRTM-DEM: Mapping terra-firme rainforest environments in Amazonia. Remote Sensing of Environment, 112(9), 3469-3481. doi.org/10.1016/j.rse.2008.03.018
- Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J.J., Schröder, B., Thuiller, W., Warton, D.I., Wintle, B.A., Hartig, F. and Dormann, C.F. (2017), Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913-929. doi.org/10.1111/ecog.02881
- Sefton, C., Muchan, K., Parry, S., Matthews, B., Barker, L., Turner, S. and Hannaford, J. (2021), The 2019/2020 floods in the UK: a hydrological appraisal. Weather, 76, 378-384. doi.org/10.1002/wea.3993
- Stumpf, R.P., Holderied, K. and Sinclair, M. (2003), Determination of water depth with high-resolution satellite imagery over variable bottom types. Limnology and Oceanography, 48(1), 547-556. doi.org/10.4319/lo.2003.48.1_part_2.0547
- Tupas, M.E., Roth, F., Bauer-Marschallinger, B. and Wagner, W. (2023), An intercomparison of Sentinel-1 based change detection algorithms for flood mapping. Remote Sensing, 15(5), 1200. doi.org/10.3390/rs15051200
- UK Department for Energy Security and Net Zero (DESNZ) (2024), Greenhouse gas reporting: conversion factors 2024. gov.uk/government/publications/greenhouse-gas-reporting-conversion-factors-2024
- UN-SPIDER Knowledge Portal, Recommended Practice: Flood Mapping and Damage Assessment Using Sentinel-1 SAR Data in Google Earth Engine. un-spider.org/.../recommended-practice-google-earth-engine-flood-mapping/step-by-step
- Xu, H. (2006), Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. International Journal of Remote Sensing, 27(14), 3025-3033. doi.org/10.1080/01431160600589179
Emily Grace Adams - LinkedIn - emily.adams.25@ucl.ac.uk
Project Link: https://github.com/eemeleems/GEOL0069_Project_FloodDetection
- This project is the final assignment for GEOL0069 Artificial Intelligence for Earth Observation (25/26) at University College London.
- Thank you to Prof. Michel Tsamados, Weibin Chen and Shambu Bhandari Sharma for the GEOL0069 module content and guidance this project builds on.
- Thank you to ESA/Copernicus for the availability of Sentinel-1 and Sentinel-2 data, and to Google Earth Engine for the processing platform.
- Best-README-Template, on which this README's structure is based.
Distributed under the MIT License. See LICENSE for more information.
