Village Development Model

The complete dataset can be found here

Directories and their functions:

GEE_DataDownload\ - These scripts download the satellite imagery from Google Earth Engine. They download the RGB bands of the respective states of India.
PreProcessing_Images\ - These scripts cut the state tiff images into village images in png format. The crops are restricted to a fixed size of 150x150 pixels.
PreProcessing_Data\ - These scripts classify villages into the levels rudimentary, intermediate & advanced. This directory also contains scripts to generate population and nightlight features.
Arch1, Arch2 & Arch3_Scripts - These directories contain the scripts to generate the features for each model, train the models and evaluate them.
Hypothesis_Testing - Performs the hypothesis validation based on the predicted outputs.
Visualisations & Error_Analysis - These scripts generate development/nightlight encoded geojsons to visualise the development of villages. This directory also contains the scripts for occlusion studies, error analysis & statistics.
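As a rough illustration of the fixed-size cropping mentioned for PreProcessing_Images\, the sketch below computes a 150x150 pixel window around a village's centre pixel. The function name `fixed_crop_box` and the choice to shift the window inward at image edges (rather than pad) are assumptions made for this example, not something taken from the repository's scripts.

```python
def fixed_crop_box(center_x, center_y, image_width, image_height, size=150):
    """Return (left, top, right, bottom) for a size x size crop near a pixel.

    The box is shifted, not shrunk, when the centre lies close to an edge,
    so the crop always has the requested size.
    """
    if image_width < size or image_height < size:
        raise ValueError("image is smaller than the requested crop size")
    half = size // 2
    # Clamp the top-left corner so the whole box stays inside the image.
    left = min(max(center_x - half, 0), image_width - size)
    top = min(max(center_y - half, 0), image_height - size)
    return left, top, left + size, top + size
```

The returned tuple follows the (left, top, right, bottom) convention that many image libraries accept for cropping, so it can be passed on to whichever library reads the state tiff.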

Generating Nightlight Data

Download the year-wise tif file from the site.
To get an idea of what the blobs contain and how they look, check the map image.
Upload LongNTL_{year}.tif as an asset in GEE.
Update the path and run the script to obtain the blob collection.
Run the VIIRS extended section of the notebook to generate the selected nightlight feature computation scheme. Ensure that you select the correct year.
Use the script to score the nightlight features.

Generating Population Data

Use the script to generate the population and number of households for each village. Transformations such as logarithm and square root are also applied to these values. Note that this script takes the original census data as input to generate the population features.
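The transformations described above might look roughly like the following. This is a minimal sketch: the function name `population_features`, the feature names, and the use of `log1p` (so that a zero count stays finite) are all assumptions, and the repository's script may differ.

```python
import math


def population_features(population, households):
    """Build raw, log and square-root features for one village."""
    features = {}
    for name, value in (("population", population), ("households", households)):
        if value < 0:
            raise ValueError(f"{name} must be non-negative")
        features[name] = value
        # log1p(x) = log(1 + x); chosen here (an assumption) to handle zeros.
        features[f"log_{name}"] = math.log1p(value)
        features[f"sqrt_{name}"] = math.sqrt(value)
    return features
```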

Steps to Reproduce the Pipeline

Download the shapefiles of the states using the script, and make sure to select the correct year and list of states.
Next, cut the shapes of the villages from the shapefile of each state. Use the script [you need to set the correct paths to i) where the state fill files are contained, ii) where the state geojsons are present, iii) where you want the split files to be placed].
Now run the Jupyter Notebook, which takes the shapefiles of all villages and calculates the centroid of each.
Using the village centroid data obtained in the previous step, compute the nearest neighbouring villages for each village by following the script.
Generate the nightlight and population features as explained above.
After this, follow the pipeline of the individual model that you want to train and test.
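The centroid and nearest-neighbour steps above can be sketched as follows. This is an illustrative sketch only: it assumes each village outline is a simple polygon given as a list of (lon, lat) pairs, uses a brute-force search, and the names `polygon_centroid`, `haversine_km` and `nearest_neighbours` are hypothetical rather than taken from the notebook.

```python
import math


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(lon, lat), ...]."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_neighbours(centroids, k=3):
    """Map each village id to the ids of its k closest villages (brute force)."""
    result = {}
    for vid, point in centroids.items():
        others = [(haversine_km(point, other), oid)
                  for oid, other in centroids.items() if oid != vid]
        result[vid] = [oid for _, oid in sorted(others)[:k]]
    return result
```

The brute-force search compares every pair of villages, which is fine for a small example; for a full state a spatial index would likely be preferable.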

Arch-1

Generating Features

We first run clustering on the raw census data to generate the levels of each census indicator, such as BF, FC, ASSET etc., using the script.
Use the scripts to create an 80:20 split of the dataset into training and testing sets. This creates a train and a test set for each cluster (level) of each indicator. The directory structure for each indicator should look like this
The generated dataset showed class imbalance, with cluster-1 having fewer villages, so data augmentation was performed using the script (done only for BF and Literacy).
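The per-level 80:20 split described above could be sketched like this. It is a minimal example under stated assumptions: villages are identified by ids mapped to a level, and the function name `split_by_level`, the seed and the rounding rule are illustrative choices, not the repository's implementation.

```python
import random


def split_by_level(village_levels, train_fraction=0.8, seed=0):
    """Split village ids into train/test lists separately for each level."""
    by_level = {}
    for village_id, level in village_levels.items():
        by_level.setdefault(level, []).append(village_id)
    rng = random.Random(seed)
    splits = {}
    for level, ids in by_level.items():
        ids = sorted(ids)  # fixed order so the seed gives a reproducible shuffle
        rng.shuffle(ids)
        cut = round(train_fraction * len(ids))
        splits[level] = {"train": ids[:cut], "test": ids[cut:]}
    return splits
```

Splitting inside each level keeps the 80:20 ratio for every cluster, including the smaller ones.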

Training the CNN Model

We also use penalty functions to handle class imbalance for the indicators Literacy and BF (each uses a different loss function). Check the script. Note: for the other three indicators, i.e. MSW, FC & Assets, which don't have class imbalance, we use neither augmentation nor penalty functions.
The trained models obtained can be found here
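The text does not say which penalty functions are used for Literacy and BF. As one common way of penalising errors on a rare class more heavily, the sketch below derives inverse-frequency class weights that could be passed to a weighted loss. The function name `inverse_frequency_weights` and the weighting scheme itself are assumptions for illustration and may not match the training script.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes get weights above 1, frequent classes below 1.
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}
```

With this scheme a perfectly balanced dataset gives every class a weight of 1, so the weighted loss reduces to the unweighted one.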

Arch-2

Note that the CNN models are the same for Arch-1 & Arch-2

Use the script to combine all the features, including nightlight, population and nearest neighbours, to generate the final input for training the regression models for Arch-2. The combined data for training can be found here
To assess the temporal transferability of the trained model, we use the script to identify the features that provide maximum temporal transferability.
Use the script to train regression models on the selected features for each indicator at each level. The trained regression models can be found here
After training the regression models, run the script to get the regression output on the test set.
Finally, to check the performance of the models, we use clusters trained on the ground truth to discretize the output of Arch-2 and calculate the RMSE using this
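The final evaluation step (discretizing the regression output with ground-truth clusters, then computing the RMSE) can be illustrated as below. The sketch assumes one-dimensional cluster centres keyed by a numeric level; the names `nearest_level` and `level_rmse` and the example centres are hypothetical.

```python
import math


def nearest_level(value, level_centres):
    """Return the level whose cluster centre is closest to value."""
    return min(level_centres, key=lambda level: abs(level_centres[level] - value))


def level_rmse(predicted_values, true_levels, level_centres):
    """RMSE between discretized predictions and ground-truth levels."""
    predicted_levels = [nearest_level(v, level_centres) for v in predicted_values]
    squared = [(p - t) ** 2 for p, t in zip(predicted_levels, true_levels)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, with centres `{1: 0.2, 2: 0.5, 3: 0.9}` a regression output of 0.45 is assigned to level 2 before the error is computed.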

Arch-3

We retrain the CNN regression models for each level of each indicator.

Use the training scripts on the same train-val image split as in Arch-1.
Use the script to combine the outputs of all the models with the nearest neighbours data.
Combine the obtained data with the nightlight and population features generated earlier.
Use scripts similar to those of Arch-2 to find the necessary indicators that ensure temporal transferability.
Train the regression model on the combined dataset using the script.

Hypothesis Testing

Use the scripts in the Hypothesis_Testing\ directory to run and validate the hypothesis mentioned in the paper.

Visualisation and Error Analysis

The following are the files in the directory Visualisations & Error_Analysis and their purposes:

plot.ipynb & plot_ntl.ipynb - these create a development/nightlight encoded list of villages within a district as geojsons. These geojsons can be imported into a geojson plotter to plot and analyse development. The plotted images look like this
Excluded_Villages.ipynb - this file collates the list of villages that were excluded from the analysis, either due to cloud cover or while recovering villages from the tiff tiles of states.
Error_analysis.ipynb - this file analyses the trends in the villages whose ADI values are miscalculated by the model.
stats.ipynb - this file calculates the statewise mean and standard deviation of the error in predicted and calculated ADIs. It also creates CDF plots between ADI_2001, ADI_2011 & ADI_2019.
corelation.ipynb - analyses whether the decrease in ADI values is correlated with nightlights.
occlusion.ipynb - helps validate whether the CNN models learn important features on the map of villages. Overlaying the heatmaps on the actual village images leads to visualisations like these
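To give a sense of what an occlusion study involves, the sketch below masks one patch of an image at a time and records how much a model's score drops. It is a toy version under stated assumptions: the image is a 2-D list of numbers, `score_fn` stands in for the CNN's prediction, and the name `occlusion_heatmap`, the patch size and the fill value are illustrative, not taken from occlusion.ipynb.

```python
def occlusion_heatmap(image, score_fn, patch=2, fill=0):
    """Score drop caused by masking each patch x patch block of a 2-D image."""
    height, width = len(image), len(image[0])
    baseline = score_fn(image)
    heatmap = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for y in rows:
                for x in cols:
                    occluded[y][x] = fill
            drop = baseline - score_fn(occluded)
            # A large drop means the masked region mattered to the score.
            for y in rows:
                for x in cols:
                    heatmap[y][x] = drop
    return heatmap
```

Regions with a large drop are the ones the model relies on, which is what the overlaid heatmaps are meant to show.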
