
Incorrect flagging of geographical outliers in multi-species datasets #104

@ChrKoenig


The outlier test returns inconsistent results depending on whether the dataset contains a single species or multiple species: in multi-species datasets, flags appear to be assigned to the wrong records. I didn't have time to dig into the root cause, but it looks as if the row indices of the flags get mixed up somewhere along the way.
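To illustrate the kind of mix-up suspected here, a minimal, hypothetical sketch in base R follows. It is not CoordinateCleaner's actual code; the toy test `toy_test()` and the data are made up. If a per-species test runs on data split by species and the results are concatenated in species order, then written back by position, the flags land on the wrong rows. `unsplit()` restores the original row order.

```r
# Hypothetical illustration only, NOT CoordinateCleaner's implementation.
# Toy data: species interleaved; row 5 is the obvious outlier for species A.
dat <- data.frame(
  species = c("A", "B", "A", "B", "A", "B"),
  lon     = c(-58, 10, -59, 11, -26, 12)
)

# Made-up per-species test: TRUE = passes, FALSE = flagged as outlier
toy_test <- function(d) abs(d$lon - median(d$lon)) <= 20

# One logical vector per species, in species order (A first, then B)
per_species <- lapply(split(dat, dat$species), toy_test)

# Buggy: results come back grouped by species (A, A, A, B, B, B)
# but are written back by position onto the interleaved rows
dat$flag_buggy <- unname(unlist(per_species))

# Correct: unsplit() puts each result back on its original row
dat$flag_ok <- unsplit(per_species, dat$species)

dat
```

In this toy case, `flag_buggy` marks row 3 (a core record) and misses row 5 (the real outlier), while `flag_ok` marks row 5. That matches the symptom described here, but whether CoordinateCleaner does anything like this is unconfirmed.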

Here's a reproducible example for the genus Alouatta:

```r
library(CoordinateCleaner)
library(rgbif)
library(dplyr)

# Download Alouatta records from GBIF (species name and coordinates only)
alouatta_data <- occ_search(
  scientificName = "Alouatta",
  hasCoordinate = TRUE,
  fields = c("species", "decimalLongitude", "decimalLatitude"),
  limit = 10000
)$data

# Remove records with missing coordinates and exact duplicates
alouatta_clean <- alouatta_data %>%
  filter(!is.na(decimalLongitude) & !is.na(decimalLatitude)) %>%
  distinct()

# Flag outliers: once for a single species, once for the whole genus
flags_single <- alouatta_clean %>%
  filter(species == "Alouatta caraya") %>%
  clean_coordinates(tests = "outliers")

flags_mult <- alouatta_clean %>%
  clean_coordinates(tests = "outliers")
```


Result for the single-species dataset (Alouatta caraya only):

```r
dplyr::filter(flags_single, .summary == FALSE)

#         species        latitude        longitude .val  .otl .summary
# Alouatta caraya       -58.03303        -26.18056 TRUE FALSE    FALSE
# Alouatta caraya       -58.04886        -26.18373 TRUE FALSE    FALSE
# Alouatta caraya        35.07838       -106.66348 TRUE FALSE    FALSE
# Alouatta caraya        10.74601        -84.17884 TRUE FALSE    FALSE
```

--> four records flagged

Result for the multi-species dataset, restricted to Alouatta caraya:

```r
dplyr::filter(flags_mult, .summary == FALSE & species == "Alouatta caraya")

#         species        latitude        longitude .val  .otl .summary
# Alouatta caraya       -27.01190        -59.44896 TRUE FALSE    FALSE
# Alouatta caraya       -28.53233        -57.13952 TRUE FALSE    FALSE
# Alouatta caraya       -28.56564        -59.25179 TRUE FALSE    FALSE
# Alouatta caraya       -19.44745        -57.07579 TRUE FALSE    FALSE
# Alouatta caraya       -28.56640        -59.26846 TRUE FALSE    FALSE
# Alouatta caraya       -28.57930        -59.24632 TRUE FALSE    FALSE
# Alouatta caraya       -24.97473        -60.88110 TRUE FALSE    FALSE
# Alouatta caraya       -28.40994        -57.18283 TRUE FALSE    FALSE
```

--> eight records flagged, all well within the core range of the species, and none overlapping with the single-species flags
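To check the overlap between the two runs more systematically, a small base-R helper could compare the flagged coordinates per species. `flagged_coords()` and `overlap_flags()` are hypothetical helpers, not part of CoordinateCleaner. The sketch assumes the output of `clean_coordinates()` keeps the GBIF column names `decimalLongitude`/`decimalLatitude` and the `.summary` column shown above; the toy data frames only stand in for `flags_single` and `flags_mult`.

```r
# Hypothetical helpers, not part of CoordinateCleaner.
# Assumes columns: species, decimalLongitude, decimalLatitude, .summary

# Unique coordinates flagged (.summary == FALSE) for the given species
flagged_coords <- function(flags, sp,
                           lon = "decimalLongitude", lat = "decimalLatitude") {
  hit <- !flags$.summary & flags$species %in% sp
  unique(flags[hit, c(lon, lat)])
}

# Coordinates flagged in both runs (inner join on both coordinate columns)
overlap_flags <- function(flags_a, flags_b, sp, ...) {
  merge(flagged_coords(flags_a, sp, ...), flagged_coords(flags_b, sp, ...))
}

# Toy data standing in for flags_single / flags_mult
single <- data.frame(species = "X",
                     decimalLongitude = c(1, 2, 3),
                     decimalLatitude = c(10, 20, 30),
                     .summary = c(FALSE, TRUE, TRUE))
mult <- data.frame(species = c("X", "X", "X", "Y"),
                   decimalLongitude = c(1, 2, 3, 4),
                   decimalLatitude = c(10, 20, 30, 40),
                   .summary = c(TRUE, FALSE, TRUE, FALSE))

overlap_flags(single, mult, "X")  # 0 rows: the two runs flag different records
```

With the real objects the call would be `overlap_flags(flags_single, flags_mult, "Alouatta caraya")`; given the results above, it should return zero rows.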
