Data Standardization and Curation using Darwin Core (DwC)

Standardization and curation of biodiversity records using the Darwin Core (DwC) data standard

This repository documents a data management and curation workflow developed to improve the quality, structure, and interoperability of biodiversity datasets. The workflow focuses on transforming heterogeneous biodiversity records into a standardized structure based on the Darwin Core biodiversity data standard.

The repository provides scripts, documentation, and examples that demonstrate how biodiversity data can be curated, normalized, and prepared for integration into global biodiversity information infrastructures.

Objectives

Consolidate a centralized database containing information on species diversity, distribution, and abundance.
Standardize and format biodiversity datasets according to the core structure and terms of the Darwin Core (DwC) standard.
Improve data quality, traceability, and reproducibility during the curation process.
Facilitate data interoperability, reuse, and publication across biodiversity information platforms.

Target audience

This repository may be useful for:

Researchers working with biodiversity and ecological datasets
Environmental professionals and biodiversity monitoring practitioners
Undergraduate and graduate students in ecology, biology, and environmental sciences
Biodiversity data managers and curators
Citizen science initiatives and biodiversity data contributors

Use of the Darwin Core Standard

This repository applies the Darwin Core biodiversity data standard as the primary framework for structuring biodiversity records.

Darwin Core is designed to document:

Biological occurrences
Sampling events
Taxonomic information
Spatial and temporal context of observations

Because of this design, some types of ecological data can be directly represented using DwC terms, while others must be documented as metadata or extensions.

The table below summarizes the compatibility of different ecological data types with the Darwin Core standard.

Data Type Compatibility with Darwin Core

Data type	DwC compatibility	Description/considerations
Occurrence and distribution data	✅ High	Core of DwC, using terms such as `occurrenceID`, `scientificName`, `eventDate`, and `locality`.
Presence/absence data	⚠️ Medium	Documented using `occurrenceStatus`; reliable absence data require a well-defined sampling design.
Abundance and individual counts	✅ High	Represented using `individualCount`, `organismQuantity` and `organismQuantityType`.
Biomass, size, and life stages	⚠️ Medium	Life stages (`lifeStage`) can be recorded; detailed morphometric data may require extensions.
Abiotic measurements	⚠️ Medium	Documented through the `MeasurementOrFact` extension; DwC is not primarily an environmental data standard.
Biotic measurements	⚠️ Medium	Documented using `MeasurementOrFact` or descriptive fields, with limited support for complex interactions.
Sampling methods and study areas	✅ High (metadata)	Documented using `samplingProtocol`, `eventRemarks`, `locationID`, and dataset-level metadata.
Sample processing protocols	❌ Low	Outside DwC's main scope; best documented in EML metadata or external documentation.
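
As a rough illustration of the "Medium" rows above, a measurement with no dedicated DwC term can be reshaped into long-format rows that use the `measurementType`, `measurementValue`, and `measurementUnit` terms. The sketch below is not taken from this repository: the source column names (`temperature_c`, `dissolved_oxygen_mg_l`), the mapping, and the `to_measurements` helper are hypothetical.

```python
# Hypothetical mapping from source column names to
# (measurementType, measurementUnit); adjust to the actual dataset.
MEASUREMENT_COLUMNS = {
    "temperature_c": ("water temperature", "degrees Celsius"),
    "dissolved_oxygen_mg_l": ("dissolved oxygen", "mg/L"),
}


def to_measurements(record, id_field="eventID"):
    """Reshape one wide record into long-format measurement rows."""
    rows = []
    for column, (mtype, unit) in MEASUREMENT_COLUMNS.items():
        value = record.get(column)
        if value in (None, ""):
            continue  # skip measurements that were not taken
        rows.append({
            id_field: record[id_field],
            "measurementType": mtype,
            "measurementValue": str(value),
            "measurementUnit": unit,
        })
    return rows


event = {"eventID": "EV-001", "temperature_c": 27.4, "dissolved_oxygen_mg_l": ""}
measurements = to_measurements(event)
```

Each resulting row keeps the `eventID` of the record it came from, so the measurement stays linked to its sampling event.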

Data Processing Workflow

The workflow implemented in this repository includes several steps aimed at ensuring data consistency, traceability, and interoperability.

1. Data consolidation

Multiple source datasets were integrated into a unified structure while preserving the original information.
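
One possible way to consolidate sources without losing information is to keep the union of all columns and tag each row with its origin. This is a minimal sketch under assumptions: the `consolidate` helper, the `sourceFile` provenance column, and the sample file names are illustrative and are not part of this repository or of Darwin Core.

```python
import csv
import io


def consolidate(sources):
    """Merge rows from several CSV sources, keeping every original column.

    `sources` maps a source label to CSV text. The label is stored in a
    hypothetical `sourceFile` column so each row can be traced back.
    """
    rows, columns = [], ["sourceFile"]
    for label, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            for name in row:
                if name not in columns:
                    columns.append(name)
            rows.append({"sourceFile": label, **row})
    # Fill columns that a given source lacks with empty strings.
    return columns, [{c: r.get(c, "") for c in columns} for r in rows]


sources = {
    "survey_a.csv": "species,site\nMugil curema,Bay 1\n",
    "survey_b.csv": "species,date\nLutjanus griseus,2021-03-04\n",
}
columns, unified = consolidate(sources)
```

Keeping the union of columns, rather than the intersection, means no source field is dropped before the mapping to DwC terms is decided.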

2. Cleaning and normalization

Key fields were standardized, including:

scientific names
locality names
taxonomic authorities
date formats
sampling metadata
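
Two of the fields listed above, dates and scientific names, could be normalized along these lines. The sketch makes several assumptions that should be checked against the real data: the list of accepted input date formats, the day/month/year reading of ambiguous dates, and names supplied without authorship (the name cleaner would wrongly lowercase an author string). The function names are hypothetical.

```python
from datetime import datetime

# Assumed input formats, tried in order. Ambiguous dates such as
# 03/04/2021 are read as day/month/year, which may not suit every source.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def to_iso_date(raw):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_scientific_name(raw):
    """Collapse whitespace and apply genus/epithet capitalization.

    Assumes the name carries no authorship or year.
    """
    parts = raw.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])
```

Returning `None` for unparseable dates, instead of guessing, leaves those records visible for manual review.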

3. Taxonomic reconciliation

Scientific names were reviewed and standardized using external taxonomic authorities such as the World Register of Marine Species to ensure taxonomic consistency.
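
The matching step can be approximated offline with a local reference list, as sketched below. This is a stand-in, not the repository's method and not a call to any taxonomic authority's service: the `REFERENCE_NAMES` list, the `reconcile` helper, and the 0.85 similarity cutoff are assumptions chosen for illustration.

```python
import difflib

# Stand-in for an authority list; in practice this would come from an
# export of an external taxonomic authority rather than being hard-coded.
REFERENCE_NAMES = ["Mugil curema", "Lutjanus griseus", "Caranx hippos"]


def reconcile(name, reference=REFERENCE_NAMES, cutoff=0.85):
    """Return (matched_name, status) for a submitted scientific name."""
    if name in reference:
        return name, "exact"
    close = difflib.get_close_matches(name, reference, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"  # candidate only; a curator should confirm
    return None, "unmatched"
```

Fuzzy matches are flagged rather than applied silently, since a close spelling can also belong to a different valid taxon.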

4. Event and occurrence structuring

Records were reorganized following the Event Core and Occurrence Core structure, allowing hierarchical relationships between sampling events and species occurrences.
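
A minimal sketch of this restructuring follows, assuming flat input rows that already carry `eventDate`, `locality`, and `scientificName`. Treating rows with the same date and locality as one sampling event, as well as the `EV-`/`OCC-` identifier pattern and the `structure_records` helper, are illustrative choices rather than the repository's actual rules.

```python
def structure_records(flat_records):
    """Split flat rows into an event table and an occurrence table.

    Rows sharing the same date and locality are treated as one sampling
    event (an assumption; real datasets may need more fields in the key).
    """
    events, occurrences = {}, []
    for row in flat_records:
        key = (row["eventDate"], row["locality"])
        if key not in events:
            events[key] = {
                "eventID": f"EV-{len(events) + 1:03d}",
                "eventDate": row["eventDate"],
                "locality": row["locality"],
            }
        event_id = events[key]["eventID"]
        # Number occurrences within their parent event.
        count = sum(1 for o in occurrences if o["eventID"] == event_id) + 1
        occurrences.append({
            "occurrenceID": f"{event_id}:OCC-{count:03d}",
            "eventID": event_id,
            "scientificName": row["scientificName"],
        })
    return list(events.values()), occurrences


flat = [
    {"eventDate": "2021-03-04", "locality": "Bay 1", "scientificName": "Mugil curema"},
    {"eventDate": "2021-03-04", "locality": "Bay 1", "scientificName": "Caranx hippos"},
    {"eventDate": "2021-03-05", "locality": "Bay 2", "scientificName": "Mugil curema"},
]
events, occurrences = structure_records(flat)
```

Every occurrence row carries the `eventID` of its parent event, which is what expresses the hierarchy between the two tables.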

5. Record expansion and relational indexing

Occurrence records were expanded when necessary to represent species-event relationships, allowing a normalized structure compatible with biodiversity databases.
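
The expansion can be pictured as turning one wide row (one column per species) into one occurrence row per species-event pair. The layout assumed here, where every column other than `eventID` is a species name holding a count, and the `expand_counts` helper are hypothetical. Recording zero counts as `absent` is only meaningful when the sampling design supports absence data.

```python
def expand_counts(event_row, id_field="eventID"):
    """Turn one event row with per-species count columns into occurrences.

    Every key other than `id_field` is assumed to be a species name whose
    value is the number of individuals counted (an illustrative layout).
    """
    occurrences = []
    for name, count in event_row.items():
        if name == id_field:
            continue
        n = int(count or 0)  # empty cells are treated as zero
        occurrences.append({
            "occurrenceID": f"{event_row[id_field]}:{len(occurrences) + 1}",
            id_field: event_row[id_field],
            "scientificName": name,
            "individualCount": n,
            "occurrenceStatus": "present" if n > 0 else "absent",
        })
    return occurrences


wide_row = {"eventID": "EV-001", "Mugil curema": "3", "Caranx hippos": "0"}
expanded = expand_counts(wide_row)
```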

6. Reproducible scripting

Data processing steps were implemented using reproducible scripts to ensure transparency and facilitate future updates of the dataset.

Data Curation and Standardization Services

This repository demonstrates the scope and capabilities of biodiversity data curation and standardization under the Darwin Core framework, including:

Curation of occurrence records with or without spatial information (geographic coordinates)
Standardization and validation of scientific names, including correction of typographical errors and taxonomic normalization
Conversion of heterogeneous date formats to ISO 8601
Normalization of key fields such as locality identifiers, taxonomic authorities, and individual counts
Implementation of hierarchical identifiers linking documents, events, and species occurrences
Documentation of curation decisions to ensure traceability of original records
Preparation of datasets compatible with publication in biodiversity infrastructures such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biodiversity Information System (OBIS)
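
Traceability of curation decisions, as listed above, could be supported by a simple change log that stores the original value next to each correction. The sketch below is an assumption about how such a log might look: the `apply_correction` helper and the log field names (`recordID`, `original`, `corrected`, `reason`) are hypothetical and are not DwC terms.

```python
def apply_correction(record, field, new_value, reason, log):
    """Update `field` in place and append the change to `log`."""
    old_value = record.get(field)
    if old_value == new_value:
        return record  # nothing changed, nothing to log
    log.append({
        "recordID": record.get("occurrenceID"),
        "field": field,
        "original": old_value,
        "corrected": new_value,
        "reason": reason,
    })
    record[field] = new_value
    return record


curation_log = []
occurrence = {"occurrenceID": "EV-001:OCC-001", "scientificName": "Mugil curma"}
apply_correction(occurrence, "scientificName", "Mugil curema",
                 "typographical error", curation_log)
```

Because the log keeps the original value, any correction can be reviewed or reverted later without going back to the source files.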

Final considerations

Darwin Core is not a comprehensive ecological data standard, but rather a schema designed to facilitate interoperability of biodiversity occurrence data.

Environmental variables, detailed methodologies, and laboratory protocols should be documented primarily as metadata, often using complementary standards such as:

EML (Ecological Metadata Language)
OBIS-ENV extensions
domain-specific environmental data schemas

Combining these standards allows biodiversity datasets to remain both interoperable and scientifically reproducible.
