Build performance dataset #30

Open
kena-SL wants to merge 4 commits into main from build-performance-dataset

kena-SL commented May 15, 2025

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

Description

This PR creates a new Airflow DAG, build-performance-dataset, which generates a provision-quality output file, converts it to Parquet, and uploads it to S3 using an ECS task.

Added DAG build-performance-dataset with:

  • PythonOperator (configure-dag) to configure the required parameters.
  • EcsRunTaskOperator (build-performance-task) to run the performance dataset build script (build-performance.sh).

build-performance.sh is defined in the collection-task repository.
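For reviewers unfamiliar with how an ECS task is told which script to run, here is a minimal sketch of the `overrides` payload that EcsRunTaskOperator forwards to ECS RunTask. The helper `build_ecs_overrides`, the container name, the script path, and the environment variable are all hypothetical illustrations, not the values used in this PR.

```python
from typing import Dict, List


def build_ecs_overrides(container_name: str, script: str, env: Dict[str, str]) -> dict:
    """Build an `overrides` payload in the shape ECS RunTask expects."""
    # ECS takes environment variables as a list of name/value pairs.
    environment: List[dict] = [
        {"name": key, "value": value} for key, value in sorted(env.items())
    ]
    return {
        "containerOverrides": [
            {
                "name": container_name,
                "command": [script],
                "environment": environment,
            }
        ]
    }


# Hypothetical values: the real container name, script path, and variables
# are defined by the DAG and the collection-task repository.
overrides = build_ecs_overrides(
    container_name="collection-task",
    script="./bin/build-performance.sh",
    env={"ENVIRONMENT": "development"},
)
```

A dictionary like this would typically be passed as the `overrides` argument of EcsRunTaskOperator, with the values coming from whatever the configure-dag step resolves.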

Once the DAG has been triggered, the Parquet file can be accessed at
https://files.development.planning.data.gov.uk/performance/provision-quality/entry-date=2025-05-14/provision-quality.parquet (update the date in the URL)
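Since the date is embedded in the path, a small helper can build the URL for a given entry date. This is only a sketch: the function name `provision_quality_url` is hypothetical, and it assumes the path layout matches the example URL above.

```python
from datetime import date

# Base path taken from the example URL in this PR (development environment).
BASE_URL = "https://files.development.planning.data.gov.uk/performance/provision-quality"


def provision_quality_url(entry_date: date) -> str:
    """Return the Parquet URL for the given entry date (YYYY-MM-DD in the path)."""
    return f"{BASE_URL}/entry-date={entry_date.isoformat()}/provision-quality.parquet"


print(provision_quality_url(date(2025, 5, 14)))
```

For 14 May 2025 this reproduces the URL quoted above.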

DAG on development Airflow: https://pipelines.development.planning.data.gov.uk/dags/build-performance-dataset/grid

Related Tickets & Documents

Added/updated tests?

We encourage you to keep the code coverage percentage at 80% and above. Please refer to the Digital Land Testing Guidance for more information.

  • Yes
  • No, and this is why: Creates a new DAG
  • I need help with writing tests

[optional] Are there any post deployment tasks we need to perform?

[optional] Are there any dependencies on other PRs or Work?

Yes
