Overview
In order to shift to assembling via Spark, we also need to tackle how the old_entity table in Postgres is updated. Currently this happens by extracting from the SQLite files, but the data can actually be updated directly from the config files themselves, so it does not need to happen at the same time as the SQLite extraction.
Pull Request (PR):
Tech Approach
- Create a new DAG for configuration.
- Focus on just the old_entity table for now; additional config files can easily be extracted in the future.
- Apply cleaning and review expectations on the old_entity config files, so that the cleaning is covered by tests.
- Load into Parquet datasets.
- Load into Postgres.
- Remove the old_entity extraction from digital-land-postgres.
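The cleaning and expectations steps above could look roughly like the sketch below. It is a minimal, stdlib-only illustration, not the planned implementation. It assumes an old-entity config CSV with `old-entity`, `status` and `entity` columns and that `status` is either `301` (redirect) or `410` (gone); the function names `clean_rows` and `check_expectations` are hypothetical. Writing to Parquet and loading into Postgres are left out.

```python
import csv
import io

# Assumption: column layout of an old-entity config file (check against the real spec).
FIELDS = ["old-entity", "status", "entity"]
# Assumption: 301 = redirected to another entity, 410 = gone.
ALLOWED_STATUSES = {"301", "410"}


def clean_rows(text):
    """Hypothetical cleaning step: strip whitespace, drop blank rows and exact duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        values = tuple((row.get(field) or "").strip() for field in FIELDS)
        if not any(values) or values in seen:
            continue  # skip empty rows and rows already kept
        seen.add(values)
        cleaned.append(dict(zip(FIELDS, values)))
    return cleaned


def check_expectations(rows):
    """Hypothetical expectations: return a list of failures (an empty list means pass)."""
    failures = []
    seen_old = set()
    for i, row in enumerate(rows):
        old, status, entity = row["old-entity"], row["status"], row["entity"]
        if not old.isdigit():
            failures.append(f"row {i}: old-entity {old!r} is not an integer")
        if status not in ALLOWED_STATUSES:
            failures.append(f"row {i}: status {status!r} not in {sorted(ALLOWED_STATUSES)}")
        if status == "301" and not entity.isdigit():
            failures.append(f"row {i}: redirect has no integer target entity")
        if old in seen_old:
            failures.append(f"row {i}: old-entity {old} appears more than once")
        seen_old.add(old)
    return failures


sample = "old-entity,status,entity\n 100 ,301,200\n\n100,301,200\n101,410,\n102,999,\n"
rows = clean_rows(sample)
failures = check_expectations(rows)
```

Keeping the expectations as a function that returns failures (rather than raising on the first one) makes it easy to assert on them in unit tests and to report every problem in a single DAG task run.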
Acceptance Criteria/Tests
- old_entity should be updated on the platform for all datasets/collections during a single DAG run.
- The DAG run should take place after the config update action but before midnight, when the pipelines kick off.
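One way to check the timing criterion is sketched below. It assumes the config update action finishes at a known time of day (the 20:00 used here is purely illustrative) and that the pipelines start at midnight. `in_run_window` and `daily_cron` are hypothetical helpers; the cron string uses the standard `minute hour day-of-month month day-of-week` layout.

```python
from datetime import time


def in_run_window(run_at, config_update_done, pipelines_start=time(0, 0)):
    """True if run_at is after the config update finishes and before the pipelines start."""
    if pipelines_start <= config_update_done:
        # Window wraps past midnight, e.g. 20:00 -> 00:00 (next day).
        return run_at > config_update_done or run_at < pipelines_start
    return config_update_done < run_at < pipelines_start


def daily_cron(run_at):
    """Standard 5-field cron expression for a once-a-day run at run_at."""
    return f"{run_at.minute} {run_at.hour} * * *"


# Illustrative values only: config update assumed to finish by 20:00.
proposed = time(22, 0)
ok = in_run_window(proposed, config_update_done=time(20, 0))
schedule = daily_cron(proposed)
```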