Langeraert, Ward


keywords: structured data; data quality; unstructured data; data cubes; biodiversity informatics
Scripts to explore the conditions that determine the reliability of models, trends and status by comparing aggregated cubes with structured monitoring schemes.
This code was developed in the context of Task 4.5 (T4.5) of the B-Cubed project.
This repository follows a reproducible, pipeline-based workflow built around {targets}. The analysis proceeds in four clearly separated stages: data acquisition, preparation, pipeline execution, and reporting.
Raw biodiversity data are downloaded and pre-processed using dedicated R Markdown reports.
What to run
- `prepare_abv_data.Rmd`
- `prepare_data_10km.Rmd`
Where
source/reports/prepare_data/
What happens
- Downloads the latest available versions of the required datasets (mainly via GBIF).
- Alternatively, the exact same data versions used in the analyses can be retrieved by following the GBIF download links embedded in the Rmd files.
- Performs initial cleaning and standardisation.
- Adds spatial (geometric) information.
Outputs
- Raw data are stored in `data/raw/`.
- Cleaned and enriched datasets are written to `data/processed/` in both `.csv` and `.gpkg` formats.
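After the preparation reports have run, the processed outputs can be loaded back into R. A minimal sketch, assuming the {readr} and {sf} packages; the file name is a placeholder, so check `data/processed/` for the actual outputs:

```r
library(readr)  # fast CSV reading
library(sf)     # reads the spatial .gpkg layers

# File names below are illustrative; list data/processed/ to find the real ones.
obs_tabular <- read_csv("data/processed/abv_data.csv")
obs_spatial <- st_read("data/processed/abv_data.gpkg")
```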
Both analysis pipelines rely on a consistent list of ABV bird species.
What to run
get_abv_species.R
Where
source/R/
What happens
- Extracts and prepares the list of ABV species.
- This list is used to filter observations consistently across all pipelines, ensuring comparability between structured and unstructured data sources.
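As a hedged illustration of that filtering step (the object and column names are hypothetical, not taken from `get_abv_species.R`):

```r
library(dplyr)

# Keep only observations of species that occur in the ABV list.
# `observations`, `abv_species_list` and the `species` column are
# illustrative names for this sketch.
filtered_obs <- observations %>%
  semi_join(abv_species_list, by = "species")
```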
All core analyses are implemented as {targets} pipelines, allowing reproducible, incremental, and efficient execution.
What to run
run_pipeline.R
Where
- Inside the folder of the pipeline you want to execute, e.g.:
source/pipelines/<pipeline_name>/
What happens
- Builds and runs the complete dependency graph defined by {targets}.
- Aggregates data into cubes, fits models, and computes indicators as defined in the pipeline.
- Intermediate and final results are cached automatically by {targets}.

See https://books.ropensci.org/targets/ for details on how {targets} works and how to inspect or debug pipelines.
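Besides `run_pipeline.R`, a pipeline can also be driven interactively with standard {targets} functions; a sketch (the target name passed to `tar_read()` is a placeholder):

```r
library(targets)

# Run from inside source/pipelines/<pipeline_name>/
tar_visnetwork()        # inspect the dependency graph and outdated targets
tar_make()              # (re)build everything that is out of date
tar_read(my_indicator)  # load one cached result; target name is illustrative
```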
Once a pipeline has been successfully run, results can be summarised and visualised using dedicated reports.
What to run
- The relevant R Markdown (`.Rmd`) files
Where
source/reports/<analysis_name>/
What happens
- Reads outputs generated by the corresponding {targets} pipeline.
- Produces figures, tables, and narrative summaries.
- Creates output directories automatically if they do not yet exist.
- A logical order in which to run the reports is:
  1. `explorative_analysis`
  2. `comparing_biodiv_indicators`
  3. `standardisation`
  4. `dataset_cv`
Outputs
- Stored under:
output/<analysis_name>/
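Rendering a report programmatically might look like this (both path components are placeholders for the actual report files):

```r
# Renders the report; output directories are created by the report itself.
rmarkdown::render("source/reports/<analysis_name>/<report_name>.Rmd")
```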
The repository is organised to clearly separate data, analysis pipelines, and reporting. All necessary directories are created automatically during execution.
```
├── source
│   ├── pipelines                     <- {targets} pipelines (one folder per analysis)
│   │   └── ...
│   ├── R                             <- shared R helper scripts
│   └── reports                       <- Rmd reports based on pipeline outputs
│       └── ...
│
├── data
│   ├── raw                           <- manually created; stores raw downloaded data
│   ├── interim                       <- automatically created; stores R Markdown cache data
│   └── processed                     <- automatically created; cleaned & spatialised data
│
├── output                            <- automatically created; analysis outputs (figures, results)
│   └── ...
│
├── README.md                         <- project description
├── LICENSE.md                        <- license
├── CITATION.cff                      <- citation metadata
├── comp-unstructured-data.Rproj      <- RStudio project file
│
├── checklist.yml                     <- checklist package configuration
├── organisation.yml                  <- organisation metadata
│
├── inst
│   └── en_gb.dic                     <- custom dictionary for checklist
├── .github
│   ├── workflows
│   │   └── checklist_project.yml     <- GitHub Actions workflow
│   ├── CODE_OF_CONDUCT.md
│   └── CONTRIBUTING.md
└── .gitignore
```