Skip to content

Add data.gov catalog harvester #5

Merged: jguo144 merged 2 commits into main from jguo144/2026-04-22/add-data-gov-harvester on Apr 27, 2026

jguo144 (Collaborator) commented Apr 24, 2026

Description

Adds a new harvester plugin that ingests datasets from Data.gov through its catalog API, which is documented at https://resources.data.gov/catalog-api/.

Features

  • DCAT-US metadata mapping: Maps Data.gov's DCAT-US metadata onto the CKAN dataset structure (core fields, spatial and temporal coverage, contact points, resources)
  • Configurable query parameters: Supports all Data.gov API query parameters (q, keyword, org_slug, spatial_geometry, spatial_within, etc.)
  • Cursor-based pagination: Pages through large result sets using Data.gov's after cursor
  • Theme → Group mapping: Maps Data.gov themes to CKAN groups, slugifying theme names into valid group names
  • Organization filtering: Post-harvest filtering of datasets by organization via organizations_filter_include / organizations_filter_exclude
  • Resource ID preservation: Keeps resource IDs stable across harvests so that existing DataStore data is preserved
  • Smart date trimming: Trims midnight timestamps to date-only values for cleaner display
  • Composite field support: Allows converted extras to be mapped to composite fields through configuration
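Cursor-based pagination like the one described above can be sketched as a generator that keeps requesting pages until no cursor comes back. This is a minimal sketch under stated assumptions, not the harvester's code: the `fetch_page` callable is hypothetical, and the page keys `results` and `after` are illustrative rather than confirmed against the Data.gov API response.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical fetch signature: (query params, cursor or None) -> decoded JSON page.
FetchPage = Callable[[Dict[str, Any], Optional[str]], Dict[str, Any]]


def iter_datasets(fetch_page: FetchPage, params: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield datasets from every page, following the `after` cursor until it runs out.

    Assumes each page looks like {"results": [...], "after": "<cursor>" or None};
    both key names are illustrative, not confirmed against the Data.gov API.
    """
    cursor: Optional[str] = None
    seen_cursors = set()  # stop if the server ever hands back a cursor we already used
    while True:
        page = fetch_page(params, cursor)
        yield from page.get("results", [])
        cursor = page.get("after")
        if not cursor or cursor in seen_cursors:
            return
        seen_cursors.add(cursor)


# Usage with two canned pages standing in for HTTP responses.
fake_pages = {
    None: {"results": [{"identifier": "a"}, {"identifier": "b"}], "after": "c1"},
    "c1": {"results": [{"identifier": "c"}], "after": None},
}
ids = [d["identifier"] for d in iter_datasets(lambda p, c: fake_pages[c], {"q": "water"})]
print(ids)  # ['a', 'b', 'c']
```

Using a generator means only the current page and cursor are held in memory, which is what makes a cursor a good fit for very large result sets.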
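The Theme → Group mapping can be illustrated with a small slugify helper. This sketch assumes CKAN's usual name rules (lowercase ASCII letters, digits, `-` and `_`, at most 100 characters); the function names `slugify` and `themes_to_groups` are hypothetical and not taken from the PR.

```python
import re
import unicodedata
from typing import Dict, List


def slugify(text: str, max_length: int = 100) -> str:
    """Turn a theme label into a CKAN-style name (lowercase letters, digits, '-', '_')."""
    # Decompose accented characters and drop whatever is not plain ASCII.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse each run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9_-]+", "-", ascii_text.lower()).strip("-")
    # Truncate to the 100-character name limit, then tidy any trailing hyphen.
    return slug[:max_length].strip("-")


def themes_to_groups(themes: List[str]) -> List[Dict[str, str]]:
    """Build a CKAN `groups` list for a dataset, skipping empty and duplicate slugs."""
    groups: List[Dict[str, str]] = []
    seen = set()
    for theme in themes:
        name = slugify(theme)
        if name and name not in seen:
            seen.add(name)
            groups.append({"name": name})
    return groups


print(themes_to_groups(["Climate & Weather", "climate-weather", "Public Safety"]))
```

Deduplicating on the slug rather than the raw label matters because differently written themes ("Climate & Weather" vs. "climate-weather") collapse to the same group name.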
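Post-harvest organization filtering can be sketched as a plain list filter over fetched records. Two assumptions here are not confirmed by the PR: the organization slug is read from a hypothetical `org_slug` key on each record, and a non-empty include list takes precedence over the exclude list.

```python
from typing import Any, Dict, Iterable, List, Optional


def filter_by_organization(
    datasets: Iterable[Dict[str, Any]],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Keep datasets whose organization passes the include/exclude lists.

    Assumptions: the organization slug sits under a hypothetical "org_slug"
    key, and a non-empty include list takes precedence over exclude.
    """
    include_set = set(include or [])
    exclude_set = set(exclude or [])
    kept = []
    for dataset in datasets:
        org = dataset.get("org_slug")
        if include_set:
            # Include mode: only explicitly listed organizations survive.
            if org in include_set:
                kept.append(dataset)
        elif org not in exclude_set:
            # Exclude mode: everything except the listed organizations survives.
            kept.append(dataset)
    return kept


harvested = [
    {"id": 1, "org_slug": "noaa"},
    {"id": 2, "org_slug": "usgs"},
    {"id": 3, "org_slug": "epa"},
]
only_noaa = filter_by_organization(harvested, include=["noaa"])
without_usgs = filter_by_organization(harvested, exclude=["usgs"])
```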
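Midnight trimming can be sketched with the standard library alone. This assumes timestamps arrive as ISO 8601 strings, which is a guess about the input format; `trim_midnight` is a hypothetical name.

```python
from datetime import datetime, time


def trim_midnight(value: str) -> str:
    """Return only the date for ISO 8601 timestamps that fall exactly at midnight.

    Non-midnight timestamps and strings that don't parse are returned unchanged.
    """
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    text = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return value
    if parsed.time() == time(0, 0):
        return parsed.date().isoformat()
    return value


for raw in ("2024-01-15T00:00:00", "2024-01-15T00:00:00Z", "2024-01-15T13:45:00", "unknown"):
    print(raw, "->", trim_midnight(raw))
```

Returning the input untouched on parse failure keeps free-text dates intact instead of dropping them.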
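One way to keep resource IDs stable is to copy each existing resource's ID onto the freshly converted resource with the same URL before the dataset is updated. This is a sketch under an assumption the PR does not state: that resources are matched by URL. Reusing the ID lets CKAN update the resource in place rather than replace it, so data loaded into the DataStore under that ID stays attached.

```python
from typing import Any, Dict, List


def preserve_resource_ids(
    new_resources: List[Dict[str, Any]],
    existing_resources: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Copy IDs from already-harvested resources onto freshly converted ones.

    Assumption: two resources are "the same" when their URLs match.
    """
    ids_by_url = {r["url"]: r["id"] for r in existing_resources if r.get("url") and r.get("id")}
    merged = []
    for resource in new_resources:
        resource = dict(resource)  # don't mutate the caller's dict
        # pop() so one existing ID is never reused for two new resources.
        existing_id = ids_by_url.pop(resource.get("url"), None)
        if existing_id and not resource.get("id"):
            resource["id"] = existing_id
        merged.append(resource)
    return merged


existing = [
    {"id": "r1", "url": "https://example.gov/a.csv"},
    {"id": "r2", "url": "https://example.gov/b.csv"},
]
incoming = [
    {"url": "https://example.gov/b.csv", "name": "B"},
    {"url": "https://example.gov/c.csv", "name": "C"},
]
merged = preserve_resource_ids(incoming, existing)
```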

Files Changed

  • New: ckanext/custom_harvest/harvesters/datagov.py - Main harvester class
  • New: ckanext/custom_harvest/tests/harvesters/mock_datagov.py - Mock server for testing
  • New: ckanext/custom_harvest/tests/harvesters/test_datagov_harvester.py - Test suite
  • New: DATA_GOV_HARVESTER.md - Usage documentation
  • Modified: ckanext/custom_harvest/converter.py - Added datagov_to_ckan() converter and helpers
  • Modified: setup.py - Registered harvester entry point
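To illustrate the kind of mapping datagov_to_ckan() performs, here is a small sketch over a few DCAT-US 1.1 fields (title, description, keyword, contactPoint, distribution, landingPage, identifier, modified, spatial, temporal, accessLevel). It is not the PR's converter: `dcat_us_to_ckan_sketch` is a hypothetical name, and dataset name generation, tag sanitisation and validation are left out.

```python
import json
from typing import Any, Dict, List


def dcat_us_to_ckan_sketch(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a few DCAT-US 1.1 fields onto a CKAN dataset dict (illustrative only)."""
    contact = record.get("contactPoint") or {}
    email = contact.get("hasEmail", "")
    # DCAT-US stores the email as a "mailto:" URI; CKAN wants a bare address.
    if email.startswith("mailto:"):
        email = email[len("mailto:"):]

    resources: List[Dict[str, Any]] = []
    for dist in record.get("distribution") or []:
        url = dist.get("downloadURL") or dist.get("accessURL")
        if not url:
            continue  # skip distributions with nothing to link to
        resources.append({
            "url": url,
            "name": dist.get("title") or url,
            "description": dist.get("description", ""),
            "format": dist.get("format", ""),
            "mimetype": dist.get("mediaType", ""),
        })

    # Keep identifier, dates and coverage as CKAN extras; non-strings are stored as JSON.
    extras = []
    for key in ("identifier", "modified", "spatial", "temporal", "accessLevel"):
        value = record.get(key)
        if value:
            extras.append({"key": key, "value": value if isinstance(value, str) else json.dumps(value)})

    return {
        "title": record.get("title", ""),
        "notes": record.get("description", ""),
        "url": record.get("landingPage", ""),
        "tags": [{"name": kw} for kw in record.get("keyword") or []],
        "maintainer": contact.get("fn", ""),
        "maintainer_email": email,
        "extras": extras,
        "resources": resources,
    }


sample = {
    "title": "Water Quality",
    "description": "Sample record",
    "keyword": ["water", "epa"],
    "identifier": "abc-123",
    "modified": "2024-01-15",
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane@example.gov"},
    "distribution": [
        {"downloadURL": "https://example.gov/a.csv", "mediaType": "text/csv", "title": "CSV"},
        {"title": "no url"},
    ],
}
pkg = dcat_us_to_ckan_sketch(sample)
```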

jguo144 force-pushed the jguo144/2026-04-22/add-data-gov-harvester branch from d26a3f5 to f9bf799 on April 24, 2026 at 17:38

nolanpruett left a comment:


LGTM! 🚀

jguo144 merged commit 845d602 into main on Apr 27, 2026
8 checks passed


3 participants