
b-cubed-eu/comp-unstructured-data


Compare unstructured data (Flanders case study)

Langeraert, Ward [1,2,3]; Cartuyvels, Emma [1,3]; Van Daele, Toon [1,3]; Research Institute for Nature and Forest (INBO) [4]; European Union's Horizon Europe Research and Innovation Programme (ID No 101059592) [5]

keywords: structured data; data quality; unstructured data; data cubes; biodiversity informatics

Scripts to explore the conditions that determine the reliability of models, trends and status by comparing aggregated cubes with structured monitoring schemes.

This code was developed in the context of Task 4.5 (T4.5) of the B-Cubed project.

Analysis workflow

This repository follows a reproducible, pipeline-based workflow built around {targets}. The analysis proceeds in four clearly separated stages: data acquisition, preparation, pipeline execution, and reporting.

1. Data acquisition (raw → processed)

Raw biodiversity data are downloaded and pre-processed using dedicated R Markdown reports.

What to run

  • prepare_abv_data.Rmd
  • prepare_data_10km.Rmd

Where

  • source/reports/prepare_data/

What happens

  • Downloads the latest available versions of the required datasets (mainly via GBIF).
  • Alternatively, the exact same data versions used in the analyses can be retrieved by following the GBIF download links embedded in the Rmd files.
  • Performs initial cleaning and standardisation.
  • Adds spatial (geometric) information.

Outputs

  • Raw data are stored in: data/raw/
  • Cleaned and enriched datasets are written to: data/processed/ in both .csv and .gpkg formats.

2. Species list preparation (shared input)

Both analysis pipelines rely on a consistent list of ABV bird species.

What to run

  • get_abv_species.R

Where

  • source/R/

What happens

  • Extracts and prepares the list of ABV species.
  • This list is used to filter observations consistently across all pipelines, ensuring comparability between structured and unstructured data sources.
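A minimal sketch of how such a shared species list filters an occurrence table (the species names and counts are invented; the real list is produced by get_abv_species.R):

```r
# Hypothetical ABV species list, standing in for the output of get_abv_species.R
abv_species <- c("Parus major", "Turdus merula", "Fringilla coelebs")

# Hypothetical occurrence records from a structured or unstructured source
occurrences <- data.frame(
  species = c("Parus major", "Pica pica", "Turdus merula", "Columba livia"),
  n_obs   = c(12, 5, 8, 20)
)

# Apply the same filter in every pipeline so the data sources stay comparable
abv_occurrences <- occurrences[occurrences$species %in% abv_species, ]
```

Because every pipeline applies exactly the same filter, differences between the structured and unstructured results cannot be caused by differing species selections.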

3. Analysis pipelines (targets)

All core analyses are implemented as {targets} pipelines, allowing reproducible, incremental, and efficient execution.

What to run

  • run_pipeline.R

Where

  • Inside the folder of the pipeline you want to execute, e.g.: source/pipelines/<pipeline_name>/

What happens

  • Builds and runs the complete dependency graph defined by {targets}.
  • Aggregates data into cubes, fits models, and computes indicators as defined in the pipeline.
  • Intermediate and final results are cached automatically by {targets}.

See https://books.ropensci.org/targets/ for details on how {targets} works and how to inspect or debug pipelines.
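The {targets} mechanics can be illustrated with a self-contained toy pipeline, run entirely in a temporary directory (the two targets below are invented; the real pipelines live in source/pipelines/):

```r
library(targets)

# Run a throwaway two-target pipeline in a temporary directory.
# tar_script() writes a _targets.R file, tar_make() builds the
# dependency graph, and tar_read() retrieves a cached result.
y_value <- tar_dir({
  tar_script({
    list(
      tar_target(x, 1:10),   # toy "data" target
      tar_target(y, sum(x))  # toy "indicator" target depending on x
    )
  })
  tar_make()
  tar_read(y)
})
```

On a repeated tar_make(), up-to-date targets are skipped, which is what makes the incremental execution described above cheap.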

4. Reporting and visualisation

Once a pipeline has been successfully run, results can be summarised and visualised using dedicated reports.

What to run

  • The relevant R Markdown (.Rmd) files

Where

  • source/reports/<analysis_name>/

What happens

  • Reads outputs generated by the corresponding {targets} pipeline.
  • Produces figures, tables, and narrative summaries.
  • Creates output directories automatically if they do not yet exist.
  • A logical order in which to run the reports is:
    1. explorative_analysis
    2. comparing_biodiv_indicators
    3. standardisation
    4. dataset_cv
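The directory handling the reports perform amounts to a call like the following (the output path is illustrative, and tempdir() is used here so the sketch leaves no trace in the project):

```r
# Create the report's output directory if it does not yet exist
out_dir <- file.path(tempdir(), "output", "explorative_analysis")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
```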

Outputs

  • Stored under: output/<analysis_name>/

Repository structure

The repository is organised to clearly separate data, analysis pipelines, and reporting. All necessary directories are created automatically during execution.

├── source
│   ├── pipelines                  ├ {targets} pipelines (one folder per analysis)
│   │     └── ...
│   ├── R                          ├ shared R helper scripts
│   └── reports                    ├ Rmd reports based on pipeline outputs
│         └── ...
│
├── data
│   ├── raw                        ├ manually created; stores raw downloaded data
│   ├── interim                    ├ automatically created; stores R Markdown cache data
│   └── processed                  ├ automatically created; cleaned & spatialised data
│
├── output                         ├ automatically created; analysis outputs (figures, results)
│     └── ...
│
├── README.md                      ├ project description
├── LICENSE.md                     ├ license
├── CITATION.cff                   ├ citation metadata
├── comp-unstructured-data.Rproj   ├ RStudio project file
│
├── checklist.yml                  ├ checklist package configuration
├── organisation.yml               ├ organisation metadata
│
├── inst
│   └── en_gb.dic                  ├ custom dictionary for checklist
├── .github
│   ├── workflows
│   │   └── checklist_project.yml  ├ GitHub Actions workflow
│   ├── CODE_OF_CONDUCT.md
│   └── CONTRIBUTING.md
└── .gitignore

Footnotes

  1. author

  2. contact person

  3. Research Institute for Nature and Forest (INBO), Herman Teirlinckgebouw, Havenlaan 88, PO Box 73, B-1000 Brussels, Belgium

  4. copyright holder

  5. funder
