Ticket Contents
Scrape data from the Agri Census website, publish it as a vector asset, and use that asset in the GEE pipeline to produce a tehsil-level vectorized map. This is especially relevant because crop classification is a hard problem.
Goals
Expected Outcome
Extent of different crop types at the tehsil level
Acceptance Criteria
Data Scraping
- Given the Agri Census website is accessible, when the scraper runs, then all relevant tehsil-level boundary and attribute data should be extracted without missing fields.
- Given the website has paginated or tabular content, when the scraper is executed, then it must handle pagination and table parsing correctly.
- Scraped data should be stored in a structured format (GeoJSON/CSV with geometry).
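The pagination and table-parsing criteria above could be approached roughly as in the sketch below. It uses only the Python standard library and makes several assumptions that are not confirmed by the ticket: `fetch_page` is a hypothetical callable that returns the HTML of a given page number, every page repeats the header row, a page without data rows marks the end, and the column names are invented for illustration. How the real Agri Census site is actually requested (URLs, form posts, sessions) is not shown.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the cell's text fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def scrape_all_pages(fetch_page, max_pages=1000):
    """Call fetch_page(1), fetch_page(2), ... until a page has no data rows."""
    header, records = None, []
    for page in range(1, max_pages + 1):
        rows = parse_table(fetch_page(page))
        if len(rows) < 2:  # header only or empty: assumed to mean "no more pages"
            break
        if header is None:
            header = rows[0]
        for row in rows[1:]:
            if len(row) != len(header):  # fail loudly instead of dropping fields
                raise ValueError(f"page {page}: expected {len(header)} fields, got {row}")
            records.append(dict(zip(header, row)))
    return header or [], records


def to_csv(header, records):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=header)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Offline stand-in for the website: two pages of made-up rows.
head = "<tr><th>Tehsil</th><th>Crop</th><th>Area (ha)</th></tr>"
pages = {
    1: f"<table>{head}<tr><td>A</td><td>Wheat</td><td>120.5</td></tr></table>",
    2: f"<table>{head}<tr><td>B</td><td>Rice</td><td>80</td></tr></table>",
}
header, records = scrape_all_pages(lambda page: pages.get(page, "<table></table>"))
csv_text = to_csv(header, records)
```

Passing the fetcher in as a function keeps the parsing and pagination logic testable without network access. The row-length check is one way to surface missing fields early.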
Vector Asset Publishing
- Given the scraped tehsil-level data, when the transformation script runs, then the data must be cleaned (valid geometries, proper CRS, consistent attributes).
- The processed data must be published as a vector asset (GeoJSON or shapefile uploaded to Earth Engine).
- Metadata (source, date, schema) must be attached to the asset.
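As a rough illustration of the cleaning and metadata criteria above, the following dependency-free sketch checks GeoJSON features before upload. The attribute names in `REQUIRED_FIELDS`, the choice of EPSG:4326 as the target CRS, and the `metadata` member are assumptions made for the example, not part of the ticket. The checks are deliberately shallow (ring closure, lon/lat range, non-empty attributes). A geometry library would be needed for full validity checks such as self-intersection.

```python
import json

REQUIRED_FIELDS = ("tehsil", "district", "state")  # hypothetical schema


def ring_is_closed(ring):
    # A linear ring needs at least 4 positions and identical first/last points.
    return len(ring) >= 4 and ring[0] == ring[-1]


def in_lonlat_bounds(ring):
    # Coordinates outside these ranges suggest the data is not in EPSG:4326.
    return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y, *_ in ring)


def polygons_of(geometry):
    """Yield each polygon (a list of rings) of a Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        yield geometry["coordinates"]
    else:
        yield from geometry["coordinates"]


def check_feature(feature):
    """Return a list of problems found in one feature (empty if it looks clean)."""
    geometry = feature.get("geometry") or {}
    if geometry.get("type") not in ("Polygon", "MultiPolygon"):
        return ["geometry is not a Polygon or MultiPolygon"]
    problems = []
    for polygon in polygons_of(geometry):
        for ring in polygon:
            if not ring_is_closed(ring):
                problems.append("unclosed or too-short ring")
            if not in_lonlat_bounds(ring):
                problems.append("coordinates outside lon/lat range")
    props = feature.get("properties") or {}
    for field in REQUIRED_FIELDS:
        value = props.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing attribute: {field}")
    return problems


def clean_collection(features, source, scraped_on):
    """Split features into a clean FeatureCollection and a list of rejects."""
    kept, rejected = [], []
    for feature in features:
        problems = check_feature(feature)
        if problems:
            rejected.append((feature, problems))
        else:
            kept.append(feature)
    collection = {
        "type": "FeatureCollection",
        # Assumed metadata layout: source, date, and schema as named in the ticket.
        "metadata": {"source": source, "date": scraped_on,
                     "schema": list(REQUIRED_FIELDS), "crs": "EPSG:4326"},
        "features": kept,
    }
    return collection, rejected


# Made-up features: one clean square, one with an open ring and a missing field.
ring = [[77.0, 28.0], [77.1, 28.0], [77.1, 28.1], [77.0, 28.1], [77.0, 28.0]]
good = {"type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"tehsil": "A", "district": "D", "state": "S"}}
bad = {"type": "Feature",
       "geometry": {"type": "Polygon", "coordinates": [ring[:-1]]},
       "properties": {"tehsil": "B", "state": "S"}}
collection, rejected = clean_collection([good, bad], "Agri Census website", "2024-01-01")
geojson_text = json.dumps(collection)
```

Rejected features are returned together with their problems rather than silently dropped, so they can be reviewed or repaired. The `metadata` dictionary would still need to be attached to the Earth Engine asset at upload time, which this sketch does not cover.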
GEE Pipeline Integration
- Given the vector asset is available in Earth Engine, when the GEE pipeline runs, then it must use the asset to generate a tehsil-level vectorized map.
- The pipeline should confirm that tehsil boundaries align with existing project geometries.
- The output should be validated by visual overlay on GEE to ensure no missing or misaligned tehsils.
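Before the visual overlay, a coarse automated check for missing or misaligned tehsils might look like the sketch below. It runs locally on GeoJSON-style features rather than inside Earth Engine, compares bounding boxes only, and assumes both datasets share a `tehsil` name attribute. That attribute name, the `min_iou` threshold, and the sample coordinates are all hypothetical. It is meant as a first filter under those assumptions, not a substitute for the overlay.

```python
def bbox(geometry):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of a Polygon/MultiPolygon."""
    if geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        polygons = [geometry["coordinates"]]
    points = [pt for polygon in polygons for ring in polygon for pt in ring]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_iou(a, b):
    """Intersection-over-union of two bounding boxes, between 0 and 1."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def alignment_report(census_features, project_features, key="tehsil", min_iou=0.9):
    """List project tehsils that are absent from, or poorly aligned with, the census data."""
    census = {f["properties"][key]: f for f in census_features}
    project = {f["properties"][key]: f for f in project_features}
    missing = sorted(set(project) - set(census))
    misaligned = sorted(
        name for name in set(project) & set(census)
        if bbox_iou(bbox(census[name]["geometry"]),
                    bbox(project[name]["geometry"])) < min_iou
    )
    return {"missing": missing, "misaligned": misaligned}


def square(name, x, y, size=1.0):
    """Build a made-up square feature for the demonstration."""
    ring = [[x, y], [x + size, y], [x + size, y + size], [x, y + size], [x, y]]
    return {"type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"tehsil": name}}


project_features = [square("A", 0, 0), square("B", 2, 2), square("C", 4, 4)]
census_features = [square("A", 0, 0), square("B", 2.5, 2.5)]  # B shifted, C absent
report = alignment_report(census_features, project_features)
```

Bounding-box overlap is cheap and catches gross shifts or CRS mix-ups, but two quite different shapes can share a similar box, so tehsils that pass this check still need the overlay review.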
Implementation Details
- Implement as a Python notebook, or integrate directly into the computing module.
Mockups/Wireframes
No response
Product Name
Agriculture census (tehsil level)
Organisation Name
C4GT
Domain
No response
Tech Skills Needed
Python
Organizational Mentor
@amanodt @kapildadheech @ankit-work7
Angel Mentor
No response
Complexity
Medium
Category
Backend