
anant96/Village_Development_Model


Village Development Model

The complete dataset can be found here

Directories and their functions:

  • GEE_DataDownload\ - These scripts download satellite images from Google Earth Engine (GEE). They download the RGB bands for the respective states of India.
  • PreProcessing_Images\ - These scripts cut the state TIFF images into per-village images in PNG format. Each crop has a fixed size of 150x150 pixels.
  • PreProcessing_Data\ - These scripts classify villages into rudimentary, intermediate & advanced levels. The directory also contains scripts to generate population and nightlight features.
  • Arch1, Arch2 & Arch3_Scripts - These scripts generate the features for each model, train the models and evaluate them.
  • Hypothesis_Testing - Performs the hypothesis validation based on the predicted outputs.
  • Visualisations & Error_Analysis - These scripts generate development- or nightlight-encoded GeoJSONs to visualise the development of villages. The directory also contains the scripts for occlusion studies, error analysis & statistics.
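The cropping step in PreProcessing_Images\ can be pictured with a small sketch. This is not the repository's code: it assumes each village is cropped as a 150x150 window centred on its centroid, that the state TIFF is north-up with a GDAL-style geotransform and no rotation, and the function names are hypothetical.

```python
import math
from typing import Tuple

CROP_SIZE = 150  # fixed village crop size used by PreProcessing_Images

def lonlat_to_pixel(lon: float, lat: float, gt: Tuple[float, ...]) -> Tuple[int, int]:
    """Convert (lon, lat) to (row, col) in a north-up raster.

    `gt` uses the GDAL ordering (x_origin, pixel_width, 0, y_origin, 0, pixel_height);
    pixel_height is negative for north-up images. Rotation terms are assumed to be 0.
    """
    col = math.floor((lon - gt[0]) / gt[1])
    row = math.floor((lat - gt[3]) / gt[5])
    return row, col

def crop_window(center_row: int, center_col: int, height: int, width: int,
                size: int = CROP_SIZE) -> Tuple[int, int, int, int]:
    """Return (row_off, col_off, size, size), shifted so the window stays inside the raster."""
    if height < size or width < size:
        raise ValueError("raster is smaller than the crop size")
    half = size // 2
    row_off = min(max(center_row - half, 0), height - size)
    col_off = min(max(center_col - half, 0), width - size)
    return row_off, col_off, size, size

# Toy example: 0.25-degree pixels with the top-left corner at (70 E, 30 N).
gt = (70.0, 0.25, 0.0, 30.0, 0.0, -0.25)
row, col = lonlat_to_pixel(71.0, 29.0, gt)
print(row, col)                           # 4 4
print(crop_window(row, col, 1000, 1000))  # (0, 0, 150, 150)
```

The offsets could then be passed to a windowed raster read (for example rasterio's `Window(col_off, row_off, width, height)`) and the result saved as a PNG. Clamping the window at the raster edge keeps every crop at exactly 150x150 pixels, at the cost of the village no longer being centred near the border.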

Generating Nightlight Data

  • Download the year-wise TIFF file from the site.
  • To get an idea of what the blobs contain and how they look, check the map image.
  • Upload LongNTL_{year}.tif as an asset in GEE.
  • Update the path and run the script to obtain the blob collection.
  • Run the VIIRS extended section of the notebook to compute the nightlight features using the selected feature computation scheme. Ensure that you select the correct year.
  • Use the script to score the features of the nightlight data.
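The actual nightlight feature scheme is defined in the repository's notebook. As a hypothetical illustration only, per-village features could be aggregates of the radiance values of the pixels that fall inside the village; the feature names, the choice of aggregates and the lit-pixel threshold below are assumptions.

```python
from statistics import mean
from typing import Dict, Sequence

def nightlight_features(radiance: Sequence[float], lit_threshold: float = 0.0) -> Dict[str, float]:
    """Aggregate the nightlight radiance values of one village's pixels (hypothetical scheme)."""
    if not radiance:
        # villages with no covering pixels get all-zero features
        return {"ntl_sum": 0.0, "ntl_mean": 0.0, "ntl_max": 0.0, "ntl_lit_frac": 0.0}
    lit = sum(1 for v in radiance if v > lit_threshold)  # pixels brighter than the threshold
    return {
        "ntl_sum": float(sum(radiance)),
        "ntl_mean": float(mean(radiance)),
        "ntl_max": float(max(radiance)),
        "ntl_lit_frac": lit / len(radiance),
    }

print(nightlight_features([0.0, 2.0, 4.0]))
```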

Generating Population Data

Use the script to generate features for the population and the number of households in each village. Wrapper functions such as the logarithm and square root were also applied to them. Note that this script takes the original census data as input to generate these population features.
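A minimal sketch of the wrapper transforms, assuming one population count and one household count per village. It uses `math.log1p` (log(1 + x)) rather than a plain logarithm so that villages with zero households stay finite; whether the script does the same is an assumption, as are the feature names.

```python
import math
from typing import Dict

def population_features(population: int, households: int) -> Dict[str, float]:
    """Raw counts plus log and square-root wrappers (hypothetical feature names)."""
    if population < 0 or households < 0:
        raise ValueError("census counts must be non-negative")
    feats: Dict[str, float] = {}
    for name, value in (("population", population), ("households", households)):
        feats[name] = float(value)
        feats[f"log_{name}"] = math.log1p(value)  # log(1 + x) keeps zero counts finite
        feats[f"sqrt_{name}"] = math.sqrt(value)
    return feats

print(population_features(99, 16)["sqrt_households"])  # 4.0
```

Compressing heavy-tailed counts this way keeps a handful of very large villages from dominating a regression fit.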

Steps to Reproduce the Pipeline

  • Download shapefiles of states using the script, and make sure to select the correct year and states list.
  • The next step is to cut the shapes of villages from the shapefile of the state using the script. You need to set the correct paths to (i) where the state fill files are contained, (ii) where the state GeoJSONs are present and (iii) where you want the split files to be placed.
  • Next, run the Jupyter notebook, which takes the shapefiles of all villages and calculates their centroids.
  • Using the village centroid data obtained in the previous step, compute the nearest neighbouring villages for each village by following the script.
  • Generate the nightlight and population features as explained above.
    After this, follow the pipeline of the individual model that you want to train and test.
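The centroid and nearest-neighbour steps can be sketched as below. This is an illustrative stand-in for the notebook and script, assuming each village is a single polygon ring of (lon, lat) pairs. The shoelace centroid treats lon/lat as planar (a reasonable approximation at village scale), distances use the haversine formula, and k = 5 is an arbitrary choice.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat) in degrees
EARTH_RADIUS_KM = 6371.0

def polygon_centroid(ring: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple polygon ring via the shoelace formula.

    lon/lat are treated as planar coordinates. A closing vertex equal to the
    first one contributes nothing, so open and closed rings both work.
    """
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # twice the signed area of triangle (origin, p_i, p_i+1)
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)

def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lon, lat) points in kilometres."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def nearest_neighbours(centroids: Dict[str, Point], k: int = 5) -> Dict[str, List[str]]:
    """k nearest villages (by centroid distance) for every village; brute force, O(n^2)."""
    result: Dict[str, List[str]] = {}
    for vid, c in centroids.items():
        dists = [(haversine_km(c, oc), oid) for oid, oc in centroids.items() if oid != vid]
        result[vid] = [oid for _, oid in heapq.nsmallest(k, dists)]
    return result

print(polygon_centroid([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]))  # (1.0, 1.0)
```

For tens of thousands of villages per state, a spatial index such as a k-d tree would be much faster than the brute-force loop, which is kept here only for clarity.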

Arch-1

  1. Generating Features
  • We first run clustering on the raw census data to generate levels for each census indicator, such as BF, FC and ASSET, using the script.
  • Use the scripts to create an 80:20 split of the data into training and testing sets. This creates a train and test set for each cluster (level) of each indicator. The directory structure for each indicator should look like this
  • The generated dataset had class imbalance, with cluster-1 having fewer villages, so data augmentation was performed using the script (only for BF and Literacy).
  2. Training the CNN Model
  • We also use penalty functions to handle class imbalance for the Literacy and BF indicators; the two use different loss functions (check the script). The other three indicators, MSW, FC & Assets, do not have class imbalance, so we use neither augmentation nor penalty functions for them.
  • The trained models obtained can be found here
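The Arch-1 data preparation can be sketched in plain Python. This is a hypothetical illustration, not the repository's scripts, and it rests on three assumptions: levels come from a 1-D k-means with k = 3 on each indicator, the 80:20 split is done separately within each level so that every level appears in both sets, and the "penalty" is taken to be inverse-frequency class weights, one common way to weight a loss for imbalanced classes. The function names are made up, and which end of the scale counts as rudimentary depends on the indicator.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

def kmeans_1d(values: Sequence[float], k: int = 3, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on one indicator; returns ascending centres and a level per value."""
    if len(values) < k:
        raise ValueError("need at least k values")
    xs = sorted(values)
    # deterministic initialisation at evenly spaced quantiles
    centres = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]

    def assign(cs: List[float]) -> List[int]:
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(iters):
        labels = assign(centres)
        new = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new.append(sum(members) / len(members) if members else centres[j])
        if new == centres:  # converged
            break
        centres = new
    # relabel so that level 0 is the lowest centre
    order = sorted(range(k), key=lambda j: centres[j])
    rank = {old: pos for pos, old in enumerate(order)}
    return [centres[j] for j in order], [rank[lab] for lab in assign(centres)]

def split_per_level(ids: Sequence[str], levels: Sequence[int],
                    test_frac: float = 0.2, seed: int = 0):
    """80:20 train/test split performed inside each level (stratified)."""
    rng = random.Random(seed)
    by_level: Dict[int, List[str]] = defaultdict(list)
    for vid, lvl in zip(ids, levels):
        by_level[lvl].append(vid)
    train: Dict[int, List[str]] = {}
    test: Dict[int, List[str]] = {}
    for lvl, members in by_level.items():
        members = members[:]
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test[lvl], train[lvl] = members[:n_test], members[n_test:]
    return train, test

def inverse_frequency_weights(levels: Sequence[int]) -> Dict[int, float]:
    """Per-class loss weights: rarer levels get proportionally larger weights."""
    counts = Counter(levels)
    return {lvl: len(levels) / (len(counts) * c) for lvl, c in counts.items()}

centres, levels = kmeans_1d([1.0, 1.2, 0.9, 5.0, 5.1, 10.0, 10.2, 9.8])
print(levels)  # [0, 0, 0, 1, 1, 2, 2, 2]
```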

Arch-2

Note that the CNN models are the same for Arch-1 & Arch-2.

  • Use the script to combine all the features, including nightlight, population and nearest-neighbour features, to generate the final input for training the Arch-2 regression models. The combined training data can be found here
  • To assess the temporal transferability of the trained models, we use the script to identify the features that provide maximum temporal transferability.
  • Use the script to train regression models on the selected features for each indicator at each level. The trained regression models can be found here
  • After training the regression models, we run the script to obtain their outputs on the test set.
  • Finally, to check the performance of the models, we use the clusters trained on the ground truth to discretize the outputs of Arch-2 and calculate the RMSE using this
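The discretize-then-score step can be sketched as follows, assuming the ground-truth cluster centres for an indicator are available as an ascending list (so a centre's index is its level) and that the RMSE is computed between predicted and true level indices. Both assumptions, and the function names, are illustrative.

```python
import math
from typing import List, Sequence

def discretize(preds: Sequence[float], centres: Sequence[float]) -> List[int]:
    """Map each continuous prediction to the level of the nearest ground-truth centre.

    `centres` must be sorted in ascending order so that index == level.
    """
    return [min(range(len(centres)), key=lambda j: abs(p - centres[j])) for p in preds]

def rmse(pred: Sequence[float], truth: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if not pred or len(pred) != len(truth):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

levels = discretize([0.5, 6.0, 9.0, 3.2], centres=[1.0, 5.0, 10.0])
print(levels)                      # [0, 1, 2, 1]
print(rmse(levels, [0, 1, 2, 0]))  # 0.5
```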

Arch-3

We retrain the CNN regression models for each level of each indicator.

  • Use the training scripts on the same train-val image split as done in Arch-1.
  • Use the script to combine the outputs of all the models with the nearest-neighbours data.
  • Combine the obtained data with the nightlight and population features generated earlier.
  • Use scripts similar to those of Arch-2 to find the necessary indicators that ensure temporal transferability.
  • Train the regression model on the combined dataset using the script.
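One way to picture the Arch-3 combination step is a join on village ID. In this hypothetical sketch, each village keeps its own CNN regression outputs, gains the mean of its neighbours' outputs for each indicator, and is then merged with its nightlight and population features; villages without those features are dropped. The dictionary layout, the `nbr_mean_` prefix and the fallback to the village's own value when it has no neighbours are all assumptions.

```python
from statistics import mean
from typing import Dict, List

Features = Dict[str, float]

def build_feature_rows(
    cnn_outputs: Dict[str, Features],  # village_id -> {indicator: CNN regression output}
    neighbours: Dict[str, List[str]],  # village_id -> ids of its nearest villages
    extra: Dict[str, Features],        # village_id -> nightlight/population features
) -> Dict[str, Features]:
    """Join per-village CNN outputs, neighbour means and auxiliary features (hypothetical layout)."""
    rows: Dict[str, Features] = {}
    for vid, own in cnn_outputs.items():
        if vid not in extra:
            continue  # drop villages that lack nightlight/population features
        row = dict(own)
        nbrs = [n for n in neighbours.get(vid, []) if n in cnn_outputs]
        for ind, value in own.items():
            # mean of the neighbours' outputs; fall back to the village's own value
            row[f"nbr_mean_{ind}"] = mean([cnn_outputs[n][ind] for n in nbrs]) if nbrs else value
        row.update(extra[vid])
        rows[vid] = row
    return rows
```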

Hypothesis Testing

Use the scripts in the Hypothesis_Testing\ directory to run and validate the hypothesis mentioned in the paper.

Visualisation and Error Analysis

The following are the files in the Visualisations & Error_Analysis directory and their purposes:

  • plot.ipynb & plot_ntl.ipynb - These create development- or nightlight-encoded GeoJSONs of the villages within a district. These GeoJSONs can be imported into a GeoJSON plotter to plot and analyse development. The plotted images look like this
  • Excluded_Villages.ipynb - This file collates the list of villages that were excluded from the analysis, either due to cloud cover or while recovering villages from the TIFF tiles of states.
  • Error_analysis.ipynb - This file analyses the trends in the villages whose ADI values are miscalculated by the model.
  • stats.ipynb - This file calculates the state-wise mean and standard deviation of the error between predicted and calculated ADIs. It also creates CDF plots comparing ADI_2001, ADI_2011 & ADI_2019.
  • corelation.ipynb - This file analyses whether the decrease in ADI values is correlated with nightlights.
  • occlusion.ipynb - This file helps validate whether the CNN models learn important features on the map of villages. Overlaying the heatmaps on the actual village images leads to visualisations like these
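The GeoJSON export done by plot.ipynb can be approximated as below. This hypothetical sketch writes each village as a Polygon feature whose properties carry its development level and a fill colour; the `fill` key follows the simplestyle convention that some GeoJSON viewers honour, and the colour scheme, level names and input layout are assumptions.

```python
import json
from typing import List, Sequence, Tuple

Ring = List[Tuple[float, float]]  # closed ring of (lon, lat) pairs

LEVEL_NAMES = {0: "rudimentary", 1: "intermediate", 2: "advanced"}
LEVEL_COLOURS = {0: "#d73027", 1: "#fee08b", 2: "#1a9850"}  # hypothetical red/yellow/green scheme

def development_geojson(villages: Sequence[Tuple[str, List[Ring], int]]) -> str:
    """Serialise (name, polygon rings, level) triples as a GeoJSON FeatureCollection."""
    features = []
    for name, rings, level in villages:
        for ring in rings:
            if ring[0] != ring[-1]:  # GeoJSON rings must end where they start
                raise ValueError(f"ring of {name} is not closed")
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon",
                         "coordinates": [[list(p) for p in ring] for ring in rings]},
            "properties": {"name": name, "level": LEVEL_NAMES[level],
                           "fill": LEVEL_COLOURS[level]},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})
```

For a nightlight-encoded map, the same structure would apply with a nightlight value in place of the development level.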
