DiegoFlores00/Darwin_Core_DwC

Data Standardization and Curation using Darwin Core (DwC)

Standardization and curation of biodiversity records using the Darwin Core (DwC) data standard

This repository documents a data management and curation workflow developed to improve the quality, structure, and interoperability of biodiversity datasets. The workflow focuses on transforming heterogeneous biodiversity records into a standardized structure based on the Darwin Core biodiversity data standard.

The repository provides scripts, documentation, and examples that demonstrate how biodiversity data can be curated, normalized, and prepared for integration into global biodiversity information infrastructures.

Objectives

  • Consolidate a centralized database containing information on species diversity, distribution, and abundance.
  • Standardize and format biodiversity datasets according to the core structure and terms of the Darwin Core (DwC) standard.
  • Improve data quality, traceability, and reproducibility during the curation process.
  • Facilitate data interoperability, reuse, and publication across biodiversity information platforms.

Target audience

This repository may be useful for:

  • Researchers working with biodiversity and ecological datasets
  • Environmental professionals and biodiversity monitoring practitioners
  • Undergraduate and graduate students in ecology, biology, and environmental sciences
  • Biodiversity data managers and curators
  • Citizen science initiatives and biodiversity data contributors

Use of the Darwin Core Standard

This repository applies the Darwin Core biodiversity data standard as the primary framework for structuring biodiversity records.

Darwin Core is designed to document:

  • Biological occurrences
  • Sampling events
  • Taxonomic information
  • Spatial and temporal context of observations

Because of this design, some types of ecological data can be directly represented using DwC terms, while others must be documented as metadata or extensions.

The table below summarizes the compatibility of different ecological data types with the Darwin Core standard.

Data Type Compatibility with Darwin Core

Data type | DwC compatibility | Description/considerations
Occurrence and distribution data | ✅ High | Core of DwC, using terms such as occurrenceID, scientificName, eventDate, and locality.
Presence/absence data | ⚠️ Medium | Documented using occurrenceStatus; reliable absence data require a well-defined sampling design.
Abundance and individual counts | ✅ High | Represented using individualCount, organismQuantity, and organismQuantityType.
Biomass, size, and life stages | ⚠️ Medium | Life stages (lifeStage) can be recorded; detailed morphometric data may require extensions.
Abiotic measurements | ⚠️ Medium | Documented through the MeasurementOrFact extension; DwC is not primarily an environmental data standard.
Biotic measurements | ⚠️ Medium | Documented using MeasurementOrFact or descriptive fields, with limited support for complex interactions.
Sampling methods and study areas | ✅ High (metadata) | Documented using samplingProtocol, eventRemarks, locationID, and dataset-level metadata.
Sample processing protocols | ❌ Low | Outside DwC's main scope; best documented in EML metadata or external documentation.
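
To illustrate one of the Medium rows, an abiotic measurement such as water temperature is typically kept out of the occurrence table and attached to a sampling event as a MeasurementOrFact row. The following is a minimal sketch, not the repository's own code; the identifiers, values, and the helper name make_measurement are hypothetical.

```python
# Hypothetical sampling event; identifiers and values are illustrative only.
event = {"eventID": "EV-001", "eventDate": "2021-03-15", "samplingProtocol": "quadrat"}

def make_measurement(event_id, mtype, value, unit):
    """Build one MeasurementOrFact row linked to an event through its eventID."""
    return {
        "eventID": event_id,
        "measurementType": mtype,
        "measurementValue": str(value),  # values are stored as text
        "measurementUnit": unit,
    }

emof_rows = [
    make_measurement(event["eventID"], "water temperature", 18.4, "degrees Celsius"),
    make_measurement(event["eventID"], "salinity", 34.1, "PSU"),
]
```

Each measurement becomes its own row, so new variables can be added without changing the columns of the event or occurrence tables.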

Data Processing Workflow

The workflow implemented in this repository includes several steps aimed at ensuring data consistency, traceability, and interoperability.

1. Data consolidation

Multiple source datasets were integrated into a unified structure while preserving the original information.
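
A minimal sketch of this step, assuming two hypothetical source files whose headers differ. The file names, the column mappings, and the sourceFile provenance field are assumptions made for illustration; sourceFile is not a Darwin Core term.

```python
import csv
import io

# Two hypothetical sources with different headers (illustrative data).
SOURCE_A = "species,site,date\nMytilus chilensis,Site 1,2021-03-15\n"
SOURCE_B = "taxon,place,fecha\nPerumytilus purpuratus,Site 2,15/03/2021\n"

# Assumed mapping from each source's headers to Darwin Core terms.
COLUMN_MAPS = {
    "source_a.csv": {"species": "scientificName", "site": "locality", "date": "verbatimEventDate"},
    "source_b.csv": {"taxon": "scientificName", "place": "locality", "fecha": "verbatimEventDate"},
}

def consolidate(sources):
    """Merge rows from several sources into one list, keeping the source file name."""
    unified = []
    for name, text in sources.items():
        mapping = COLUMN_MAPS[name]
        for row in csv.DictReader(io.StringIO(text)):
            record = {mapping[key]: value for key, value in row.items()}
            record["sourceFile"] = name  # provenance only, not a DwC term
            unified.append(record)
    return unified

records = consolidate({"source_a.csv": SOURCE_A, "source_b.csv": SOURCE_B})
```

Original date strings are stored untouched in verbatimEventDate, so later normalization never overwrites what the source said.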

2. Cleaning and normalization

Key fields were standardized, including:

  • scientific names
  • locality names
  • taxonomic authorities
  • date formats
  • sampling metadata
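
Two of these normalizations can be sketched with the standard library alone. The list of accepted input formats is an assumption (day-first dates in particular), and clean_name assumes names are given without authorship; neither function is taken from the repository.

```python
from datetime import datetime

# Assumed input formats; day-first is assumed for slash and dash dates.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")

def to_iso_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_name(raw):
    """Collapse repeated whitespace and capitalize only the first word (genus)."""
    parts = raw.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])
```

Returning None for unparsed dates, rather than guessing, leaves those records flagged for manual review.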

3. Taxonomic reconciliation

Scientific names were reviewed and standardized against external taxonomic authorities such as the World Register of Marine Species (WoRMS) to ensure taxonomic consistency.
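
The matching logic can be sketched offline. This example does not query any taxonomic service: it swaps the authority lookup for a small hypothetical reference list and uses Python's difflib for approximate matching, which is a stand-in technique and not necessarily what the repository uses. The cutoff value is also an assumption.

```python
import difflib

# Hypothetical local reference list standing in for a taxonomic authority.
ACCEPTED_NAMES = ["Mytilus chilensis", "Perumytilus purpuratus", "Aulacomya atra"]

def reconcile(name, cutoff=0.85):
    """Return (matched_name, status) for a verbatim scientific name."""
    if name in ACCEPTED_NAMES:
        return name, "exact"
    close = difflib.get_close_matches(name, ACCEPTED_NAMES, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"  # candidate only; a curator should confirm it
    return None, "unmatched"
```

Keeping the match status next to the proposed name makes it easy to route fuzzy and unmatched names to manual review instead of accepting them silently.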

4. Event and occurrence structuring

Records were reorganized following the Event Core and Occurrence Core structure, allowing hierarchical relationships between sampling events and species occurrences.
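
A minimal sketch of this restructuring, assuming flat field records where site and date together define a sampling event. The identifier pattern, the DS1 prefix, and the basisOfRecord value are illustrative assumptions.

```python
# Hypothetical flat records: one sampling event with two observed species.
raw = [
    {"site": "S1", "date": "2021-03-15", "scientificName": "Mytilus chilensis", "count": 12},
    {"site": "S1", "date": "2021-03-15", "scientificName": "Aulacomya atra", "count": 3},
]

def build_cores(rows, dataset_prefix="DS1"):
    """Split flat rows into event rows and occurrence rows linked by eventID."""
    events = {}
    occurrences = []
    per_event = {}  # running occurrence counter for each event
    for row in rows:
        event_id = f"{dataset_prefix}:{row['site']}:{row['date']}"
        events.setdefault(event_id, {
            "eventID": event_id,
            "eventDate": row["date"],
            "locationID": row["site"],
        })
        per_event[event_id] = per_event.get(event_id, 0) + 1
        occurrences.append({
            "occurrenceID": f"{event_id}:occ{per_event[event_id]}",
            "eventID": event_id,
            "scientificName": row["scientificName"],
            "individualCount": row["count"],
            "basisOfRecord": "HumanObservation",  # assumed for field observations
        })
    return list(events.values()), occurrences

events, occurrences = build_cores(raw)
```

Event-level fields such as date and location are stored once per event, and each occurrence points back to its event through eventID.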

5. Record expansion and relational indexing

Occurrence records were expanded when necessary to represent species-event relationships, allowing a normalized structure compatible with biodiversity databases.
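
One common case of expansion is a wide table with one count column per species. The sketch below assumes that layout and assumes a zero count can be read as an absence, which only holds when the sampling design supports absence records. The row contents and the function name expand are hypothetical.

```python
# Hypothetical wide-format row: one event, one count column per species.
wide_row = {
    "eventID": "DS1:S1:2021-03-15",
    "Mytilus chilensis": 12,
    "Aulacomya atra": 0,
    "Perumytilus purpuratus": 5,
}

def expand(row, id_field="eventID"):
    """Turn one wide row into one occurrence record per species column."""
    out = []
    for name, count in row.items():
        if name == id_field:
            continue
        out.append({
            "eventID": row[id_field],
            "scientificName": name,
            "individualCount": count,
            # Assumption: a zero count means the species was looked for and not found.
            "occurrenceStatus": "present" if count > 0 else "absent",
        })
    return out

long_rows = expand(wide_row)
```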

6. Reproducible scripting

Data processing steps were implemented using reproducible scripts to ensure transparency and facilitate future updates of the dataset.

Data Curation and Standardization Services

This repository demonstrates the scope and capabilities of biodiversity data curation and standardization under the Darwin Core framework, including:

  • Curation of occurrence records with or without spatial information (geographic coordinates)
  • Standardization and validation of scientific names, including correction of typographical errors and taxonomic normalization
  • Conversion of heterogeneous date formats to ISO 8601
  • Normalization of key fields such as locality identifiers, taxonomic authorities, and individual counts
  • Implementation of hierarchical identifiers linking documents, events, and species occurrences
  • Documentation of curation decisions to ensure traceability of original records
  • Preparation of datasets compatible with publication in biodiversity infrastructures such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biodiversity Information System (OBIS)
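
Before publication, a simple pre-flight check can catch empty fields and duplicated identifiers. The sketch below is illustrative only: the list of terms checked is an assumed minimal set, and each platform's own documentation defines what it actually requires.

```python
# Assumed minimal set of terms to check; not an official requirement list.
REQUIRED_TERMS = ("occurrenceID", "scientificName", "eventDate", "basisOfRecord")

def find_problems(records):
    """Return (row_index, message) pairs for empty terms and duplicate occurrenceIDs."""
    problems = []
    seen = set()
    for i, rec in enumerate(records):
        for term in REQUIRED_TERMS:
            value = rec.get(term)
            if value is None or not str(value).strip():
                problems.append((i, f"missing {term}"))
        occ_id = rec.get("occurrenceID")
        if occ_id:
            if occ_id in seen:
                problems.append((i, f"duplicate occurrenceID {occ_id}"))
            seen.add(occ_id)
    return problems
```

Reporting problems by row index, instead of dropping bad rows, keeps each curation decision visible and traceable.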

Final considerations

Darwin Core is not a comprehensive ecological data standard, but rather a schema designed to facilitate interoperability of biodiversity occurrence data.

Environmental variables, detailed methodologies, and laboratory protocols should be documented primarily as metadata, often using complementary standards such as:

  • EML (Ecological Metadata Language)
  • OBIS-ENV extensions
  • domain-specific environmental data schemas

Combining these standards allows biodiversity datasets to remain both interoperable and scientifically reproducible.

About

Standardization of biodiversity records using Darwin Core. This work is applicable to data derived from field monitoring programs, biological collections, citizen science initiatives, and research projects.
