Summer Training Programme on Remote Sensing and GIS, 2025
India Space Academy | Department of Space Education
Prepared by: Satwik Shreshth | MCA 2nd Year, Sikkim University (Central University)
This project generates a 10-metre resolution Land Use Land Cover (LULC) map of East Sikkim district, Sikkim, India using supervised ensemble machine learning classifiers trained on multi-temporal Sentinel-2 Surface Reflectance imagery. All satellite processing was performed on the Google Earth Engine (GEE) cloud computing platform; accuracy evaluation and visualisation were done in Python (scikit-learn) and QGIS.
Two classifiers, Random Forest (RF) and Gradient Tree Boosting (GTB), are compared across five thematic land cover classes. RF was selected for the final map export.
10-m resolution LULC map of East Sikkim (Random Forest classifier), visualised in QGIS.
| Symbol | Class |
|---|---|
| 🔵 Blue | Water |
| 🟢 Dark Green | Forest |
| ⬜ White | Highland / Snow Cover |
| 🔴 Red | Built-up Area |
| Greyish Pink | Barren Land |
LULC_Classification/
│
├── EastSikkimSHP/                     # District boundary shapefile (FAO GAUL 2015)
│   ├── AOI_EastSikkim.shp
│   ├── AOI_EastSikkim.shx
│   ├── AOI_EastSikkim.dbf
│   └── AOI_EastSikkim.prj
│
├── TrainingPoint Gangtok/             # Manual ground-truth training points (GEE asset)
│
├── eval_plots_LULC_EastSikkim/        # All evaluation plots
│   ├── confusion_matrix_RF.png
│   ├── confusion_matrix_GTB.png
│   ├── roc_curves_RF.png
│   ├── roc_curves_GTB.png
│   ├── f1_comparison.png
│   ├── omission_commission.png
│   └── overall_accuracy_kappa.png
│
├── LULC.ipynb                         # Jupyter Notebook: preprocessing, feature engineering, training, accuracy assessment & plots
├── LULC_East_Sikkim.png               # Final LULC map (PNG, QGIS export)
├── LULC_East_Sikkim_Clipped.tif       # Full-resolution clipped GeoTIFF
├── LULC_EastSikkim_RF.tif             # Random Forest classified raster (GeoTIFF)
├── eval_plots_LULC_EastSikkim.zip     # Compressed evaluation plots archive
├── report_data.json                   # Accuracy metrics and evaluation data
├── Report.pdf                         # Full project report
└── README.md                          # This file
| Parameter | Value |
|---|---|
| District | East Sikkim, Sikkim, India |
| Area | ~1,521.8 km² |
| Latitude | 27.14°N to 27.42°N |
| Longitude | 88.44°E to 88.92°E |
| Elevation Range | ~300 m to >4,000 m |
| Coordinate System | WGS84 / EPSG:4326 |
- Source: `COPERNICUS/S2_SR_HARMONIZED` on Google Earth Engine
- Period: January 2020 to January 2024 | Cloud filter: <80%
- Cloud Masking: QA60 bitmask (bits 10 & 11); snow pixels preserved where NDSI > 0.40
- Composite: Pixel-wise median, clipped to AOI
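
The masking and compositing rules above can be illustrated per pixel in plain Python. This is only a sketch of the logic, not the project's GEE script: the function names are hypothetical, and it assumes that a cloud-flagged observation is kept whenever its NDSI (computed from B3 and B11) exceeds 0.40, which is one plausible reading of "snow pixels preserved".

```python
from statistics import median

CLOUD_BIT = 1 << 10    # QA60 bit 10: opaque cloud
CIRRUS_BIT = 1 << 11   # QA60 bit 11: cirrus
NDSI_SNOW_THRESHOLD = 0.40


def ndsi(green, swir1):
    """Normalised Difference Snow Index from B3 (green) and B11 (SWIR1)."""
    total = green + swir1
    return (green - swir1) / total if total else 0.0


def keep_pixel(qa60, green, swir1):
    """Return True if this observation should enter the composite."""
    cloud_free = (qa60 & (CLOUD_BIT | CIRRUS_BIT)) == 0
    # Assumption: flagged observations are retained when they look like snow.
    looks_like_snow = ndsi(green, swir1) > NDSI_SNOW_THRESHOLD
    return cloud_free or looks_like_snow


def median_composite(observations):
    """Median of one band over the observations that survive masking.

    `observations` holds (qa60, green, swir1, band_value) tuples for a
    single pixel across acquisition dates. Returns None if nothing is left.
    """
    kept = [value for qa60, green, swir1, value in observations
            if keep_pixel(qa60, green, swir1)]
    return median(kept) if kept else None
```

In Earth Engine the same idea is applied image-wide with a mask and a median reducer rather than a Python loop.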
| Category | Features |
|---|---|
| Spectral Bands | B2, B3, B4, B8, B11, B12 |
| Spectral Indices | NDVI, NDWI, NDSI |
| Phenological | NDVIwin (Dec–Feb), NDVImon (Jun–Sep), NDVIdiff |
| Terrain (SRTM) | Elevation, Slope, Aspect |
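
As a rough illustration of the index features in the table, the sketch below computes them from Sentinel-2 band values. Several points are assumptions rather than facts from the project: NDWI is taken in the McFeeters form (green vs. NIR), NDVIdiff is taken as monsoon NDVI minus winter NDVI, and the function names are hypothetical.

```python
def normalised_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def spectral_indices(b3, b4, b8, b11):
    """Index features from green (B3), red (B4), NIR (B8) and SWIR1 (B11)."""
    return {
        "NDVI": normalised_difference(b8, b4),   # vegetation
        "NDWI": normalised_difference(b3, b8),   # water (McFeeters form, assumed)
        "NDSI": normalised_difference(b3, b11),  # snow
    }


def ndvi_diff(ndvi_monsoon, ndvi_winter):
    """Seasonal NDVI amplitude; the sign convention here is an assumption."""
    return ndvi_monsoon - ndvi_winter


# Example with made-up surface reflectance values (scaled integers).
features = spectral_indices(b3=1000, b4=500, b8=4500, b11=2000)
```

A large positive NDVIdiff would suggest strongly seasonal vegetation, while values near zero would be expected for evergreen forest, water, or bare surfaces.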
| ID | Class | Description |
|---|---|---|
| 0 | Water | Rivers, lakes, reservoirs |
| 1 | Forest | Dense / closed-canopy forest |
| 2 | Highland / Snow Cover | Alpine shrubland, seasonal snow, glaciers |
| 3 | Built-up Area | Urban areas, impervious surfaces |
| 4 | Barren Land | Bare rock, soil, scree |
| Source | Count |
|---|---|
| Manual ground-truth points (GEE digitisation) | 346 |
| ESA WorldCover 2021 stratified samples | 1,000 |
| Total (after null removal) | 1,337 |
| Training split (70%, seed=42) | 883 |
| Validation split (30%) | 425 |
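
A seeded random split of this kind can be sketched as follows. It assumes the common Earth Engine pattern of attaching a uniform random number to each sample and thresholding at 0.7; whether the project does exactly this is not stated here, and `split_samples` is a hypothetical helper. Note that this approach only approximates the nominal 70/30 ratio, because each sample is assigned independently.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=42):
    """Assign each sample to train or validation using a seeded random draw."""
    rng = random.Random(seed)
    train, validation = [], []
    for sample in samples:
        # Each sample gets its own uniform draw, so the split sizes vary
        # slightly around the requested fraction.
        if rng.random() < train_fraction:
            train.append(sample)
        else:
            validation.append(sample)
    return train, validation


# Example: split 1,337 placeholder sample IDs.
train_ids, validation_ids = split_samples(list(range(1337)))
```

Re-running with the same seed reproduces the same partition, which keeps accuracy figures comparable between classifiers.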
| Classifier | Key Hyperparameters |
|---|---|
| Random Forest (RF) | 200 trees, 3 vars/split, min leaf = 2, seed = 42 |
| Gradient Tree Boosting (GTB) | 200 trees, shrinkage = 0.05, seed = 42 |
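
For readers using the Earth Engine Python API instead of the Code Editor, the hyperparameters in the table map onto the SMILE classifier constructors roughly as below. This is a sketch, not the project's script: it assumes the `earthengine-api` package is installed and `ee.Initialize()` has already been called, and `build_classifiers` is a hypothetical helper.

```python
# Hyperparameters from the table, keyed by the Earth Engine argument names.
RF_PARAMS = {
    "numberOfTrees": 200,
    "variablesPerSplit": 3,
    "minLeafPopulation": 2,
    "seed": 42,
}
GTB_PARAMS = {
    "numberOfTrees": 200,
    "shrinkage": 0.05,
    "seed": 42,
}


def build_classifiers():
    """Create untrained RF and GTB classifiers (needs an initialised ee session)."""
    import ee  # imported lazily so the parameter dicts are usable without GEE

    rf = ee.Classifier.smileRandomForest(**RF_PARAMS)
    gtb = ee.Classifier.smileGradientTreeBoost(**GTB_PARAMS)
    return rf, gtb
```

Each classifier would then be trained with `.train(features, classProperty, inputProperties)` on the sampled feature collection; the property names depend on how the training asset was built.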
| Classifier | Overall Accuracy | Cohen's Kappa | Macro AUC-ROC |
|---|---|---|---|
| Random Forest (RF) | 75.76% | 0.6956 | 0.9414 |
| Gradient Tree Boosting (GTB) | 76.00% | 0.6990 | 0.9481 |
Both Kappa values exceed 0.60, indicating substantial agreement beyond chance.
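
For reference, overall accuracy and Cohen's Kappa can both be derived from a confusion matrix. The sketch below uses the standard definitions with a small made-up matrix; the values are illustrative only and are not the project's results.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohens_kappa(matrix):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class matrix (rows = reference, columns = predicted).
example = [[50, 10],
           [5, 35]]
print(overall_accuracy(example))        # 0.85
print(round(cohens_kappa(example), 4))  # 0.6939
```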
| Class | RF F1 (%) | GTB F1 (%) | RF AUC | GTB AUC |
|---|---|---|---|---|
| Water | 81.77 | 80.45 | 0.9526 | 0.9556 |
| Forest | 73.68 | 73.47 | 0.9370 | 0.9498 |
| Highland / Snow Cover | 66.18 | 66.19 | 0.9207 | 0.9194 |
| Built-up Area | 86.41 | 87.56 | 0.9756 | 0.9767 |
| Barren Land | 62.71 | 66.67 | 0.9209 | 0.9391 |
Built-up Area achieved the highest F1 (>86%) for both classifiers.
GTB outperforms RF by +3.96 F1 points for Barren Land.
RF achieves higher recall for Forest.
All five classes have an AUC above 0.91 for both classifiers; the lowest is 0.9194 (GTB, Highland / Snow Cover).
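
The per-class figures above follow from standard definitions, sketched here in plain Python. The confusion-matrix orientation (rows = reference, columns = predicted) and the helper names are assumptions for illustration; the project itself computes these metrics with scikit-learn.

```python
def per_class_metrics(matrix, class_names):
    """Precision, recall, F1, omission and commission error for each class."""
    n = len(matrix)
    results = {}
    for k, name in enumerate(class_names):
        true_positive = matrix[k][k]
        reference_total = sum(matrix[k])                       # row total
        predicted_total = sum(matrix[i][k] for i in range(n))  # column total
        recall = true_positive / reference_total if reference_total else 0.0
        precision = true_positive / predicted_total if predicted_total else 0.0
        denom = precision + recall
        results[name] = {
            "precision": precision,
            "recall": recall,
            "f1": 2 * precision * recall / denom if denom else 0.0,
            "omission_error": 1 - recall,        # reference samples missed
            "commission_error": 1 - precision,   # predictions that were wrong
        }
    return results


def one_vs_rest_auc(scores, labels, positive):
    """AUC for one class: chance that a positive outranks a negative sample."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    # Ties count as half a win, matching the usual rank-based definition.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Macro AUC-ROC is then the unweighted mean of the one-vs-rest AUC over the five classes.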
- A Google Earth Engine account
- Python 3.8+ with `scikit-learn`, `matplotlib`, `numpy`, `pandas`
- QGIS 3.x (for map visualisation)
- Open the GEE Code Editor
- Copy the contents of `Script.js` and paste them into the editor
- Update the training asset path if needed:
var trainingData = ee.FeatureCollection('projects/YOUR_PROJECT/assets/training_gangtok');
- Click Run; the LULC map will appear in the map panel
- Go to the Tasks tab and click Run to export the GeoTIFF to Drive
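
The export step can also be started from the Earth Engine Python API instead of the Tasks tab. The sketch below is an assumed equivalent, not a translation of `Script.js`: the task description, the `maxPixels` value, and both helper names are hypothetical, and it requires `earthengine-api` with an initialised session.

```python
def export_parameters(region, description="LULC_EastSikkim_RF"):
    """Keyword arguments for a 10 m GeoTIFF export in EPSG:4326."""
    return {
        "description": description,  # assumed task name
        "region": region,            # AOI geometry, e.g. the district boundary
        "scale": 10,                 # metres per pixel
        "crs": "EPSG:4326",
        "maxPixels": 1e13,           # assumed upper limit for a large export
        "fileFormat": "GeoTIFF",
    }


def start_export(classified_image, region):
    """Queue a Drive export for the classified image and return the task."""
    import ee  # imported lazily; needs ee.Initialize() beforehand

    task = ee.batch.Export.image.toDrive(
        image=classified_image, **export_parameters(region)
    )
    task.start()
    return task
```

Calling `task.status()` on the returned object reports progress, which replaces watching the Tasks tab in the browser.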
git clone https://github.com/satwik-shreshth/LULC_Classification.git
cd LULC_Classification
pip install scikit-learn matplotlib numpy pandas
jupyter notebook LULC.ipynb

| Tool | Purpose |
|---|---|
| Google Earth Engine | Cloud-based satellite data processing & classification |
| Sentinel-2 SR (ESA Copernicus) | Primary satellite imagery |
| USGS SRTM 30m DEM | Terrain features |
| ESA WorldCover 2021 | Stratified training sample source |
| FAO GAUL 2015 | District boundary |
| GEE SMILE Library | RF & GTB classifier training |
| scikit-learn | AUC-ROC, confusion matrices, evaluation plots |
| QGIS | Map cartography & export |
| Python / Jupyter | Analysis and visualisation |
- Overall accuracy of ~76% reflects inherent spectral mixing in complex Himalayan terrain at 10-m resolution
- WorldCover-derived samples may propagate label noise in transition zones
- No post-classification spatial smoothing was applied, so salt-and-pepper noise is visible in heterogeneous zones
- Persistent monsoon cloud cover may introduce temporal bias at high-altitude pixels
- Area estimates from pixel counts have not been bias-corrected for map accuracy
- SAR-optical fusion: Sentinel-1 SAR for improved snow/bare rock discrimination
- Object-Based Image Analysis (OBIA): reduce noise at class boundaries
- Deep learning: CNN/U-Net approaches exploiting spatial context
- Bias-corrected area estimation: following Olofsson et al. (2014)
- Temporal change detection: multi-year LULC change mapping
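
To give a sense of what bias-corrected area estimation involves, here is a minimal sketch of the stratified estimator described by Olofsson et al. (2014). It is not part of this project. Note the orientation: rows are map classes and columns are reference classes. The function name and the example numbers are hypothetical.

```python
def bias_corrected_areas(matrix, mapped_areas):
    """Estimate the true area of each class from a sample-count error matrix.

    matrix[i][j]  : validation samples mapped as class i whose reference is j
    mapped_areas  : area of each map class from pixel counts (any unit)
    """
    total_area = sum(mapped_areas)
    n_classes = len(matrix)
    estimates = []
    for j in range(n_classes):
        proportion = 0.0
        for i in range(n_classes):
            row_total = sum(matrix[i])
            if row_total:
                weight = mapped_areas[i] / total_area  # W_i, map class share
                proportion += weight * matrix[i][j] / row_total
        estimates.append(proportion * total_area)
    return estimates


# Hypothetical example: two classes mapped as 600 and 400 km².
print(bias_corrected_areas([[90, 10], [20, 80]], [600, 400]))
```

In this made-up case the first class grows from 600 to about 620 km² because some of the area mapped as the second class actually belongs to the first.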
- India Space Academy (ISA): Summer Training Programme 2025 opportunity and mentorship
- ESA: Sentinel-2 data (Copernicus Programme) & WorldCover 2021
- USGS: SRTM Digital Elevation Model
- Google LLC: Earth Engine cloud platform
- FAO: GAUL boundary dataset
Report submitted: 12 August 2025 | India Space Academy, Department of Space Education
⭐ If you found this useful, please star the repository!
