dlinero-KBAs.github.io/Notebook.Rmd at master · dLinero-KBAs/dlinero-KBAs.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Locating KBAs: An automated workflow for identifying potential Key Biodiversity Areas"
output: html_document
---


The following workflow is designed to facilitate and streamline the scoping analysis for KBA identification by leveraging the free and open access data of GBIF.

The workflow is divided into three main steps:

1. [Select the species in the study area that could potentially trigger KBA criteria.](#first)

2. [Identify the sites where the potential trigger species occur in significant numbers.](#second)

3. [Cite the sources of information used.](#third)

You may want to add/remove steps or modify the code based on your interests.

Before running the code, I recommend visiting this [link](https://portals.iucn.org/library/sites/library/files/documents/2016-048.pdf) and familiarizing yourself with the different KBA criteria and their associated thresholds.

# Setup

```{r setup, echo=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
```


Load the libraries

```{r load libraries, message=FALSE, warning=FALSE}
library(tidyverse)      # To do data wrangling
library(Hmisc)          # To use the %nin% binary operator
library(rgbif)          # To access records through the GBIF API
library(wellknown)      # To handle Well Known Text Objects
library(rredlist)       # To access IUCN information
library(ggplot2)        # To create graphics
library(ggspatial)      # To add north arrows and annotation scales to ggplot maps
library(RCurl)          # To compose HTTP requests
library(xml2)           # To work with HTML and XML
library(rvest)          # To download and manipulate HTML and XML
library(sf)             # To work with vector data
library(sp)             # To work with vector data
library(DT)             # To display tables in the notebook
library(leaflet)        # To make interactive maps
library(knitr)          # To knit the document
library(foreach)        # For parallelization of loops
library(doSNOW)         # For parallelization of loops
library(parallel)       # For parallelization of loops
library(stringi)        # To remove special characters of species common names
library(CoordinateCleaner) # To flag  potentially problematic coordinate records
library(countrycode)    # For turning country codes into  ISO-3 country codes
library(taxadb)         # To find species accepted names
```


Define initial configuration parameters

```{r initial parameters}
# Increase the time allowed to access URLs
curlSetOpt(timeout = 100000)
# Disable scientific notation
options(scipen=999)
```


To access GBIF data, you must have an account on the platform. If you do not have one, create it by entering the following [link](https://www.gbif.org/user/profile). Additionally, to access IUCN information, you must request an API key [here](https://apiv3.iucnredlist.org/).

Once you have both the account and the API key, you can define new variables to store them. The following are just examples that you can replace with your information.

```{r user info example, eval=FALSE}
GBIF_USER <- "pp"
GBIF_PWD <- "pphh"
GBIF_EMAIL <- "pp@gmail.com"
IUCN_REDLIST_KEY = "12345"
```


# 1. Select potential trigger species {#first}

**1.1.** The first thing we will do is download the occurrence records present in our study area. Save the shapefile of your study area in the data folder. For this example, we will focus the scope analysis on the department of Quindío in Colombia.


```{r load study area}
# Load shapefile of the study area
st_area <- st_read("data/Quindio.shp", quiet = TRUE)

# Convert shapefile into a Well Known Text object (WKT)
wkt <- wellknown::sf_convert(st_area)
```

You can also set additional conditions for the data you want to download. Since we are going to explore the spatial location of potential trigger species, I will only download records that have coordinates. Also, I will only use records taken recently.

```{r request GBIF records}
# Download GBIF records
download <- occ_download(

  # Records with coordinates
  pred("hasCoordinate", TRUE),

  # Records taken between 2017 and 2021
  pred_gte("year", 2017),
  pred_lte("year", 2022),

  # Records that fall within the study area
  pred_within(wkt),

  # Records that fall within Colombia
  pred("country", "CO"),

  # Format of the resulting file
  format = "SIMPLE_CSV",

  # GBIF account information
  user = GBIF_USER,
  pwd = GBIF_PWD,
  email = GBIF_EMAIL)
```

After submitting the download request to GBIF, we must wait a few minutes for it to be accepted.

```{r download status}
# Check download status
occ_download_meta(download)$status
```

Once the request status is successful, we can download the data

```{r download data, message=FALSE}
# download the data
z <- occ_download_get(download, path = "data/", overwrite = TRUE)
```

```{r import GBIF data}
# Import data to an R dataset
data <- occ_download_import(z)
```

Now, we will plot the study area and the records downloaded from GBIF, to verify that all the points fall within our polygon of interest.

```{r map all points}
# Turn dataset into an sf object
data_sf <- st_as_sf(data, coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)

# Plot points and study area
ggplot(data = st_area) +
  geom_sf() +
  xlab("Longitude") + ylab("Latitude") + # Change x and y labels
  geom_sf(color = "black", fill = "grey") + # Change polygon and line color
  geom_sf(data=data_sf, color = 'green') + # Add occurrence records
  annotation_scale() + # Add an annotation scale
  annotation_north_arrow(location='br') + # Add the north arrow to the bottom right
  ggtitle("Occurrence records that fall within the study area") # Add title to map

# Remove variable
rm(data_sf)
```


**1.2**. We will now clean the occurrence dataset because this reduces the probability of including erroneous records and ensures that the data are suitable for the analysis (Gueta & Carmel, 2016).


```{r initial records, echo=FALSE}
print(paste("Number of initial records:", nrow(data)))
```


```{r remove duplicates}
# Remove duplicates
data_wDuplicates <- data %>%

  # Remove records that have the same combination of the following variables
  distinct(decimalLongitude,decimalLatitude,speciesKey,datasetKey,eventDate,individualCount,basisOfRecord, .keep_all = TRUE)

# Remove previous dataset
rm(data)
```

```{r records without duplicates, echo=FALSE}
print(paste("Number of records without duplicates:", nrow(data_wDuplicates)))
```


Records obtained from GBIF may have issues and flags that indicate common quality problems. Therefore, it is important to explore the issues present in our dataset and remove the records that you consider inappropriate for the scoping analysis


```{r GBIF issues}
# Examine the GBIF issues present in our dataset
data_issues <- str_split(unique(data_wDuplicates$issue), ";") %>%

  # Get a list of the issues present in the dataset
unlist() %>% unique()
```


```{r GBIF issues definitions}
# Obtain the definition of the GBIF issues present in our dataset
GBIF_issues <- gbif_issues() %>%

  filter(issue %in% data_issues)

print(paste(GBIF_issues$issue, GBIF_issues$description))
```


If you want to delete records with specific problems, you can use the following example code. However, for this example, I will not delete any of the records with issues.

```{r GBIF issues exclutions, eval=FALSE}

data_wIssues <- data_wDuplicates[which(grepl(paste(c("TAXON_MATCH_HIGHERRANK", "INSTITUTION_MATCH_NONE"), collapse = "|"), data_wDuplicates$issue) == FALSE), ]
```


```{r records without issues, echo=FALSE}
print(paste("Number of presence records after removing issues:", nrow(data_wDuplicates)))
```

The [coordinateCleaner package](https://cran.r-project.org/web/packages/CoordinateCleaner/index.html) also provides some tools to flag records with common spatial errors, such as records assigned to the centroid of a country or province, the headquarters of GBIF or other biodiversity institutions. We will use this package to flag suspicious records, but we will not delete any in this example. It is recommended that you further explore flagged records before removing them from the analysis.

```{r using coordinateCleaner}

# Turn into ISO-3 country codes
data_wDuplicates$countryCode <- countrycode(data_wDuplicates$countryCode, origin = 'iso2c', destination = 'iso3c')

# Identify records with potential spatial errors
data_wDuplicates <- clean_coordinates(x = data_wDuplicates,
                           lon = "decimalLongitude",
                           lat = "decimalLatitude",
                           countries = "countryCode",
                           species = "species",
                           tests = c("capitals", "centroids",
                                     "equal","institutions",
                                     "zeros", "countries"))

```

You can use the following dataset called "flagged_records" to further explore the records with potential issues

```{r flagged records}

# Filter flagged records
flagged_records <- data_wDuplicates %>% filter(.summary == FALSE) %>% droplevels()

# Plot flagged and non-flagged records
ggplot(data = st_area) +
  geom_sf() +
  xlab("Longitude") + ylab("Latitude") + # Change x and y labels
  geom_sf(color = "black", fill = "grey") + # Change polygon and line color
  geom_sf(data=(st_as_sf((data_wDuplicates %>% filter(.summary == TRUE) %>% droplevels()), coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)),color = 'green') + # Add occurrence records that were not flagged
  geom_sf(data=(st_as_sf(flagged_records, coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)),color = 'red') + #Add flagged records
  annotation_scale() + # Add an annotation scale
  annotation_north_arrow(location='br') + # Add the north arrow to the bottom right
  ggtitle("Flagged occurrence records in red") # Add title to map
```


Next, we will remove the records that are not identified at least at the species level, because the thresholds associated with the species-based criteria are designed to be applied at this taxonomic level (KBA Standards and Appeals Committee, 2020).

We will also delete the absence records, as they do not provide information on the species present in the study area.

```{r species level}
data_spLevel <- data_wDuplicates %>%

  # Select the records where the taxon rank is at least species
  filter(taxonRank == "SPECIES" | taxonRank == "SUBSPECIES" | taxonRank == "VARIETY") %>%

  # Select presence records
  filter(occurrenceStatus == "PRESENT") %>%

  # drop levels
  droplevels()

# Remove previous dataset
rm(data_wDuplicates)
```

```{r records species level, echo=FALSE}
print(paste("Number of records identified at least at the species level:", nrow(data_spLevel)))
```


The taxonomy used for KBA identification should be consistent with the taxonomic backbone of the IUCN Red List, i.e., the Species Information Service, SIS (KBA Committee on Standards and Appeals, 2020). Thus, we will exclude records with unrecognized species names based on the IUCN Red List assessments.

In addition, we will eliminate the records of species introduced (non-native) in our study area since these cannot trigger KBAs (KBA Committee on Standards and Appeals, 2020).

```{r species table}
# Create a table with the species names present in the occurrence dataset
list_species <- data_spLevel %>%

  # Get unique species names
  distinct(species, .keep_all = TRUE) %>%

  # Keep species taxonomy
  select(kingdom, class, order, family, species)

```


```{r total species, echo=FALSE}
print(paste("Number of species:", length(unique(data_spLevel$species))))
```

The following chunk is a parallelized loop that uses several of your computer's cores to speed up the process of finding the species names under the IUCN taxonomic backbone. By default, it uses the total number of cores on your computer minus 2. If you want to change this, modify the variable called "cl".


```{r retrieve species IUCN names, warning=FALSE, message=FALSE}

# Create a cluster
cl <- makeCluster(detectCores()-2)
# Register the cluster
registerDoSNOW(cl)
# Define the number of iterations
iterations <- length((1:nrow(list_species)))
# Create a progress bar based on the number of iterations
pb <- txtProgressBar(max = iterations, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)


join1 <- foreach(i = (1:nrow(list_species)),
                .combine = rbind,
                .options.snow = opts,
                .packages = "rredlist") %dopar%
  {

    species <- list_species$species[i]

    # Verify if the species name is a synonym of the IUCN accepted name
    x <- rl_synonyms(name =  species, key = IUCN_REDLIST_KEY, parse = TRUE)

    # If there are no synonyms
    if (length(x$result) == 0) {
      # then leave the original name
      IUCN_name <- x$name[1]
    } else {
      # otherwise extract the IUCN accepted name
    IUCN_name <-  x$result$accepted_name[1]
    }

    # Slow down the loop for 3 seconds as too many frequent calls can temporarily block access to the IUCN API
    Sys.sleep(3)

# Use another rredlist function to find out if the name has any information associated with it or if it can't be found

  x2 <- rl_occ_country(IUCN_name, key = IUCN_REDLIST_KEY, parse = TRUE)

  #If there is no info, it could be because the species name does not have a red list assessment. This could be checked manually.

  if ((x2$count) == 0) {
    IUCN_name <- "CHECK"
    }

  # Slow down the loop for 3 seconds as too many frequent calls can temporarily block access to the IUCN API
  Sys.sleep(3)

  # Retrieve results
results <- data.frame(cbind(species, IUCN_name))
return(results)
  }

```

For the  names that could not be recognized using the IUCN API, we will search for the accepted name under the Integrated Taxonomic Information System.


```{r itis accepted name, warning=FALSE, message=FALSE}

# Create a local copy of the Integrated Taxonomic Information System database
taxadb::td_create("itis")

# Create new field to store ITIS accepted name
join1 <- join1 %>% mutate(ITIS_name = NA)

# Retrieve the ITIS accepted name for the species that were not recognized by the IUCN API
for (i in (which(join1$IUCN_name == "CHECK"))){
  # Find the accepted species name under the Integrated Taxonomic Information System
  join1$ITIS_name[i] <- taxadb::filter_name(join1$species[i])  %>%
    mutate(acceptedNameUsage = get_names(acceptedNameUsageID)) %>%
    select(acceptedNameUsage) %>% pull() %>% as.character()

  # If the name is not found, leave "CHECK"
  if (is.na(join1$ITIS_name[i])){
    join1$ITIS_name[i] <- "CHECK"
  }
}

# Apply loop to the species that were not recognized by the IUCN API
# and where the ITIS accepted name is different from name recorded in the GBIF dataset
# Define the number of iterations
iterations <- length(which(!is.na(join1$ITIS_name) & join1$ITIS_name != "CHECK" & join1$ITIS_name != join1$species))
# Create a progress bar based on the number of iterations
pb <- txtProgressBar(max = iterations, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

# Search for IUCN accepted names for those spcies where the ITIS accepted name
# is different from the name recorded in the GBIF dataset
join1.1 <- foreach(i = (which(!is.na(join1$ITIS_name) & join1$ITIS_name != "CHECK" & join1$ITIS_name != join1$species)),
                .combine = rbind,
                .options.snow = opts,
                .packages = "rredlist") %dopar%
  {

    species <- join1$species[i]
    ITIS_name <- join1$ITIS_name[i]

    # Verify if the species name is a synonym of the IUCN accepted name
    x <- rl_synonyms(name =  ITIS_name, key = IUCN_REDLIST_KEY, parse = TRUE)

    # If there are no synonyms
    if (length(x$result) == 0) {
      # then leave the original name
      new_IUCN_name <- x$name[1]
    } else {
      # otherwise extract the IUCN accepted name
    new_IUCN_name <-  x$result$accepted_name[1]
    }

    # Slow down the loop for 3 seconds as too many frequent calls can temporarily block access to the IUCN API
    Sys.sleep(3)

#Use another rredlist function to find out if the name has any information associated with it or if it can't be found

  x2 <- rl_occ_country(new_IUCN_name, key = IUCN_REDLIST_KEY, parse = TRUE)

  #If there is no info, it could be because the species name does not have a red list assessment. This could be checked manually.

  if ((x2$count) == 0) {
    new_IUCN_name <- "CHECK"
    }

  # Slow down the loop for 3 seconds as too many frequent calls can temporarily block access to the IUCN API
  Sys.sleep(3)

  # Retrieve results
results <- data.frame(cbind(species, new_IUCN_name))
return(results)
  }

# Join results
join1 <- full_join(join1, join1.1, by = "species") %>%
  mutate(IUCN_name = case_when(IUCN_name != new_IUCN_name & !is.na(new_IUCN_name) ~ new_IUCN_name,
                               TRUE ~ IUCN_name)) %>%
  select(species, IUCN_name)

# Remove temporary dataset
rm(join1.1)

# Stop cluster
stopCluster(cl)
```

```{r number of species without IUCN name, echo=FALSE}
print(paste("Number of species without an IUCN name: ", length(which(join1$IUCN_name == "CHECK")), sep = " "))

print(paste("Number of species with an IUCN name: ", length(which(join1$IUCN_name != "CHECK")), sep = " "))

```

```{r join results first phase}

# Include the IUCN name to the GBIF dataset
data_spLevel <- full_join(data_spLevel, join1, by = "species") %>%
  filter(IUCN_name != "CHECK") %>%
  droplevels()

```

To identify whether occurrences are native or exotic/introduced to our study area, we will use spatial data associated with global Red List assessments which describe native and alien ranges of species.If you are only considering one species in this scoping exercise, you can download its distribution polygon from its assessment website. For example if you want to download the distribution of the species *Zimmerius chrysops*, you can do it from this [website](https://www.iucnredlist.org/species/22735791/95118164)

```{r IUCN distribution, fig.cap="How to download species distributions from their assessment webiste", echo=FALSE, out.width = '75%', fig.align="center"}
knitr::include_graphics("methods/dist.png")
```


If you are analyzing several species you can download the distribution polygons for large taxonomic groups from this [website](https://www.iucnredlist.org/resources/spatial-data-download). Please note that for bird species you must request a copy of the shapefiles from Birdlife International and this may take several days.

In this example we will focus only on birds and terrestrial mammals.

We will also use the distributions of the global Red List assessments to extract the seasonality of occurrence records. For example, to identify whether the occurrences of migratory species are within the breeding or non-breeding ranges.

*Note: Since files containing distributions for large taxonomic groups are very heavy, I recommend clipping them to a smaller extent before uploading them to R to increase performance.*


```{r native species, warning=FALSE, message=FALSE}

# Load polygons for the different groups
terrestrial_mammals <- st_read("./data/distributions/MAMMALS_TERRESTRIAL_CLIP.shp", quiet = TRUE)
birds <- st_read("./data/distributions/Birdlife_distributions_clip.shp", quiet = TRUE)

# Select origin and seasonal fields
birds <- birds %>% rename(binomial = sci_name) %>% select(binomial, origin, seasonal, citation)

terrestrial_mammals <- terrestrial_mammals %>%
  select(colnames(birds))

# Merge everything into the same dataset
distributions <- rbind(terrestrial_mammals, birds)

# Remove non-merged polygons
rm(terrestrial_mammals, birds)

# Remove polygons of species that are not recorded in the study area
distributions <- distributions[which(distributions$binomial %in% data_spLevel$IUCN_name), ]

# Remove occurrences of species that do not have a distribution
data_spLevel <- data_spLevel %>% filter(IUCN_name %in% unique(distributions$binomial)) %>% droplevels()

# Create a table with the resulting species names
list_species <- data_spLevel %>%

  # Get unique species names
  distinct(IUCN_name, .keep_all = TRUE) %>%

  # Keep species taxonomy
  select(kingdom, class, order, family, IUCN_name)

# Create a cluster
cl <- makeCluster((detectCores() - 2))
# Register the cluster
registerDoSNOW(cl)
# Define the number of iterations
iterations <- nrow(list_species)
# Create a progress bar based on the number of iterations
pb <- txtProgressBar(max = iterations, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)


data_native <- foreach(i = 1:iterations,
                .combine = rbind,
                .options.snow = opts,
                .packages = c("dplyr", "sf")) %dopar%
    {

      # Turn off the s2 processing
      sf::sf_use_s2(FALSE)

      # Create a spatial dataframe of the species
      occurrences <- data_spLevel %>%
      dplyr::filter(IUCN_name == list_species$IUCN_name[i]) %>%
      droplevels() %>%
      sf::st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)

      # Extract the distribution of that species
      ranges <- distributions[which(distributions$binomial == list_species$IUCN_name[i]), ]

      # Intersect points(records) and distribution
      result <- sf::st_intersection(occurrences, ranges) %>%

      # Define origin of records
      dplyr::mutate(origin = case_when(origin == 1 ~ "Native",
                                origin == 2 ~ "Reintroduced",
                                origin == 3 ~ "Introduced",
                                origin == 4 ~ "Vagrant",
                                origin == 5 ~ "Origin Uncertain",
                                origin == 6 ~ "Assisted Colonisation"),

                    # Define seasonality of records
             seasonal = case_when(seasonal == 1 ~ "Resident",
                                  seasonal == 2 ~ "Breeding Season",
                                  seasonal == 3 ~ "Non-breeding Season",
                                  seasonal == 4 ~ "Passage",
                                  seasonal == 5 ~ "Seasonal Occurrence Uncertain")) %>%

        # Create coordinates fields
        dplyr::mutate(decimalLongitude = sf::st_coordinates(.)[,1],
           decimalLatitude = sf::st_coordinates(.)[,2]) %>%

        # Turn to non-spatial dataframe
      as.data.frame()

      # Retrieve result
      return(result)

    }


# Stop cluster
stopCluster(cl)

# Remove distributions
rm(distributions)

# Remove non-native species
data_native <- data_native %>%
  filter(origin == "Native" | origin == "Reintroduced") %>%
  droplevels() %>%
  select(-c(binomial, geometry))

```


```{r free memory, message=FALSE}
# Free unused R memory
gc()
```


```{r number native species, echo=FALSE}

print(paste("Number of native species: ", length(unique(data_native$IUCN_name)), sep = " "))

```

```{r location native species}

# Plot native records and removed records (Species records that fall outside their respective distributions are removed in the previous step)

ggplot(data = st_area) +
  geom_sf() +
  xlab("Longitude") + ylab("Latitude") + # Change x and y labels
  geom_sf(color = "black", fill = "grey") + # Change polygon and line color
  geom_sf(data=(st_as_sf((data_spLevel %>% filter(occurrenceID %nin% data_native$occurrenceID) %>% droplevels()), coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)),color = 'red') + #Add removed records
  geom_sf(data=(st_as_sf((data_native), coords = c("decimalLongitude", "decimalLatitude"), crs = 4326)),color = 'green') + # Add occurrence records of native species
  annotation_scale() + # Add an annotation scale
  annotation_north_arrow(location='br') + # Add the north arrow to the bottom right
  ggtitle("Occurrences of native species (removed records in red)") # Add title to map

```


With the resulting table, we can easily identify the species that have a red list assessment and are considered native to our study area.

```{r native species preview, echo = FALSE}
# Show preview of the resulting table
datatable((data_native %>% select(kingdom, phylum, class, order, family, genus, species, IUCN_name, origin, seasonal, decimalLongitude, decimalLatitude) %>% arrange(IUCN_name)), caption = "Table 1: preview of the table used to compare GBIF and IUCN taxonomy and to identify the origin and seasonality of each species record", rownames = FALSE, filter="top", options = list(pageLength = 5, scrollX=TRUE))

```


**1.3.** We will now identify the species that could potentially trigger KBA criteria, specifically criteria A1 (for threatened species), B2 (for co-occurring geographically restricted species), and D1 (for demographic aggregations).

For criterion A1 we will identify threatened species (Vulnerable, Endangered and Critically Endangered) based on the IUCN Red List assessments.

```{r IUCN category, warning=FALSE, message=FALSE}

# Remove previous dataset
rm(data_spLevel)

# Create a table with the resulting species names
list_species <- data_native %>%

  # Get unique species names
  distinct(IUCN_name, .keep_all = TRUE) %>%

  # Keep species taxonomy
  select(kingdom, order, family, IUCN_name)

# Create a cluster
cl <- makeCluster((detectCores() - 2))
# Register the cluster
registerDoSNOW(cl)
# Define number of iterations
iterations <- nrow(list_species)
# Create progress bar
pb <- txtProgressBar(max = iterations, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

join2 <- foreach(i = 1:iterations,
               .combine = rbind,
               .options.snow = opts,
               .packages = "rredlist") %dopar%
  {
    # Search for IUCN assessment
    x <- rl_search(name =  list_species$IUCN_name[i], key = IUCN_REDLIST_KEY, parse = TRUE)

   # Fill in a table with the information retrieved
  IUCN_name <- x$result$scientific_name
  commonName  <- x$result$main_common_name
  global_RedListCategory <- x$result$category
  redList_Criteria <- x$result$criteria

  # Slow down the loop for 3 seconds as too many frequent calls can temporarily block access to the IUCN API
  Sys.sleep(3)

  result <- data.frame(cbind(IUCN_name, commonName, global_RedListCategory, redList_Criteria))
  return(result)
}

# Stop cluster
stopCluster(cl)

```

```{r join results second phase}

# Add info to GBIF dataset
data_native <- full_join(data_native, join2, by = "IUCN_name")
```


```{r second graph, fig.align = 'center'}

# Create a table with the number of species per IUCN category
Second_graph <- (data.frame(table(join2$global_RedListCategory)) %>%
  rename(Category = Var1))

  # Add a row that sums the count for categories LC and LR/lc
Second_graph <- Second_graph %>%

  # Add a row that sums the count for categories LC and LR/lc
  add_row(Category = "LC or LR/lc", Freq = sum(Second_graph$Freq[which(Second_graph$Category == "LC")], Second_graph$Freq[which(Second_graph$Category == "LR/lc")])) %>%

  # Remove the rows of the categories of LC and LR/lc
  filter(Category %nin% c("LC", "LR/lc")) %>% droplevels()

# Plot
ggplot(Second_graph, aes(x=Category, y=Freq, fill=Category)) +
  geom_bar(stat="identity")+theme_minimal() +

  # Change order of levels
  scale_x_discrete(limits=c("LC or LR/lc", "NT", "VU", "EN", "CR", "DD")) +

  # Change x label
  xlab("Red list category") +

  # Add label with the number of species
  geom_text(aes(label=Freq), position=position_dodge(width=0.9), vjust=-0.25) +

  # Change y label
  ylab("Number of species")


```

For criterion B2 we will identify range-restricted species using the list provided by the KBA [website](http://www.keybiodiversityareas.org/working-with-kbas/proposing-updating/criteria-tools). Use that link to download the file, remove the merged rows that provide contextualization information, and save to your data folder as a csv file.

```{r KBA list}
# Load list of geographically restricted species
Range_restricted_species <- read.csv("./data/Restricted_range_species_B2_June2021_web.csv", stringsAsFactors = FALSE, sep = ";") %>%

  # Select the fields of interest
  select(binomial, Range.type) %>%

  # Rename them, so they are the same as the ones of the GBIF dataset
  rename(IUCN_name = binomial, seasonal = Range.type) %>%

  # Create a field holding the combination of the species and the seasonal range type
  mutate(seasonal.combination = paste(IUCN_name, seasonal, sep = "|"))
```


```{r range-restricted}

# Identify the species that appear in the geographically restricted species list with the proper seasonal range
data_native <- data_native %>%

  mutate(seasonal.combination = paste(IUCN_name, seasonal, sep = "|")) %>%

  mutate(geographicallyRestricted = case_when(seasonal.combination %in% Range_restricted_species$seasonal.combination ~ seasonal,
                                              TRUE ~ NA_character_)) %>%

  select(-seasonal.combination)

```


```{r third graph, fig.align = 'center'}
# Create table with the number of range-restricted species
Third_graph <- data_native %>% distinct(IUCN_name, seasonal, .keep_all = TRUE)
Third_graph <- data.frame(table(Third_graph$geographicallyRestricted, useNA = "ifany")) %>%
  rename(Type = Var1)

# Plot number of species that are range restricted
ggplot(Third_graph, aes(x=Type, y=Freq, fill=Type)) +
  geom_bar(stat="identity")+theme_minimal() +

  # Add label with the number of species
  geom_text(aes(label=Freq), position=position_dodge(width=0.9), vjust=-0.25) +

  # Change y label
  ylab("Number of species")


```


Criterion D1 is suitable for application to migratory species that form seasonal aggregations (KBA Standards and Appeals Committee, 2020). Therefore, we will identify migratory birds, and then we will have to verify if they congregate in our study area, using, for example, expert knowledge. We will use the [Birdlife Data Zone](http://datazone.birdlife.org/home) to identify the migratory birds.

```{r migratory birds, warning=FALSE, message=FALSE}

# Remove accents from species common names
data_native <- data_native %>%
  mutate(commonName = stri_trans_general(commonName, "Latin-ASCII"))

# Create a table with the resulting species names
list_species <- data_native %>%

  # Get unique species names
  distinct(IUCN_name, .keep_all = TRUE) %>%

  # Keep species taxonomy
  select(kingdom, class, order, family, IUCN_name, commonName)

# Create a cluster
cl <- makeCluster((detectCores() - 2))
# Register the cluster
registerDoSNOW(cl)
# Create a vector with the index of species that are birds
Aves <- which(list_species$class == "Aves")

# Define number of iterations
iterations <- length(Aves)

# Create progress bar
pb <- txtProgressBar(max = iterations, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)

join3 <- foreach(i = Aves,
               .combine = rbind,
               .options.snow = opts,
               .packages = c("RCurl", "xml2", "rvest")) %dopar%
  {
    # Specify the birdlife base url
  base_url <- paste("http://datazone.birdlife.org/species/factsheet/")

  # Create a character vector with the name of the species that will be included in the url (everything must be in lower case and without special characters)
  species_url <- paste(tolower(stringr::str_replace_all((stringr::str_replace_all(stringr::str_trim(list_species$commonName[i], side = "both"), "[^[:alnum:][:blank:]+-]", "")), " ", "-")),
                       tolower(stringr::str_replace_all(list_species$IUCN_name[i], " ", "-")), sep = "-")

  # Paste base url with species name
  complete_url <- paste0(base_url, species_url)

  # Add a /details at the end, to access to the "Data table and detailed info" tab of the birdlife webpage
  complete_url <- paste0(complete_url, "/details")

  # Turn HTML document into XML
  html  <- read_html(complete_url)

  migratory <- html %>% html_nodes("table") %>%

    # Extract table that has information on migratory status
    .[3] %>% html_table(fill = TRUE) %>%

    # Extract migratory status
    as.data.frame() %>% .[1,2]

  IUCN_name <- list_species$IUCN_name[i]

  result <- data.frame(cbind(IUCN_name, migratory))

  return(result)

  }

# close connection
stopCluster(cl)

```

```{r join migratory results}

# Add migratory status to GBIF dataset
data_native <- full_join(data_native, join3, by = "IUCN_name")

```


```{r fourth graph, fig.align = 'center'}

# Create a table with the number of migratory bird species
Fourth_graph <- data.frame(table(join3$migratory)) %>%
  rename(Type = Var1)

# Plot number of migratory bird species
ggplot(Fourth_graph, aes(x= reorder(Type, -Freq) , y=Freq, fill=Type)) +
  geom_bar(stat="identity")+ theme_minimal() +

  # Add label with the number of species
  geom_text(aes(label=Freq), position=position_dodge(width=0.9), vjust=-0.25) +

  # Change y label
  ylab("Number of species") +

  # Change x label
  xlab("Type of migrant")

```

```{r list of potential trigger species}

# Select the potential trigger species
ptrigger_species <- data_native %>%

  # Filter threatened, range-restricted and / or migratory species
  filter(global_RedListCategory %in% c("VU", "EN", "CR") | !is.na(geographicallyRestricted)  | migratory %nin% c(NA, "not a migrant")) %>%

  droplevels() %>%

  distinct(IUCN_name, seasonal, .keep_all = TRUE) %>%

  select(kingdom, phylum, class, order, family, genus, IUCN_name, commonName, origin, global_RedListCategory, redList_Criteria, geographicallyRestricted, migratory) %>%

  # Arrange by red list category
  arrange(global_RedListCategory)


```

<br><br><br><br>

<style>
div.blue { background-color:#e6f0ff; border-radius: 5px; padding: 20px;}
</style>
<div class = "blue">

**The following table shows the species that could potentially trigger KBA criteria in our study area.**

</div>


```{r list of potential trigger species number, echo=FALSE}
print(paste("Number of potential trigger species:", nrow(ptrigger_species)))
```


```{r list of potential trigger species table , echo=FALSE}
# Show preview of the resulting table
datatable(ptrigger_species, caption = "Table 2: Species that could potentially trigger KBA criteria in the study area", rownames = FALSE, filter="top", options = list(pageLength = 5, scrollX=TRUE))
```


# 2. Identify the sites where the potential trigger species occur in significant numbers {#second}


**2.1** We will use a function that will help us to visually identify the sites where the potential trigger species have large counts or where there are multiple records of potential trigger species.

The function creates an interactive map showing the location of records that meet a user-defined threshold of individuals. It also allows to insert a grid and count the total number of records that belong to potential trigger species per cell, or the number of records that reach a count threshold per cell. The function also allows you to explore the spatial relationship between the points and a polygon different than the study area.


The arguments of the function are the following:

* *GBIFtable:* dataset. The clean GBIF occurrence dataset that includes the fields of IUCN_name, red list category, geographically restricted, and migratory status.

* *potential_triggerSpecies:* dataset. Table containing only the possible trigger species for criteria A1, B2 and D1.

* *study_area:* sf object. Polygon of the study area.

* *criteria:* character. It can be "A1", "B2", "D1" or empty if you want to include all three criteria.

* *selected_species:* character vector. Names of the potential trigger species you want to evaluate. It can be left empty if you want to evaluate all potential trigger species.

* *individualCount_filter:* number. Filter to include only records that reach a specific number of individuals. It can be left empty.

* *gridCell_Size:* number. Cell size for the grid in meters.

* *extra_polygon:* sf object. Another polygon that you want to add to the map.

For this example, I will evaluate the species that could trigger criterion A1 (threatened species), as they do not require any additional review of information.

```{r select species to plot}

species_map <- ptrigger_species %>%

  # Select endangered species
  filter(global_RedListCategory %in% c("VU", "EN", "CR")) %>%

  # Export species names to a character vector
  droplevels() %>% select(IUCN_name) %>% pull() %>% as.character()

```

```{r first function}

# Load a different polygon
polygon <- st_read("./data/IBAs_Quindio.shp", quiet = TRUE)

# Define function
map_of_large_counts <- function(GBIFtable, study_area, criteria, selected_species, individualCount_filter, gridCell_Size, extra_polygon){


  # Select only the occurrence records of the potential trigger species
  dataset <- GBIFtable %>%
    # Filter threatened, range-restricted and / or migratory species
  filter(global_RedListCategory %in% c("VU", "EN", "CR") | !is.na(geographicallyRestricted)  | migratory %nin% c(NA, "not a migrant")) %>%

  droplevels()

  # If there is no argument for the selected species, leave the table as is
  if (missing(selected_species)){
    dataset <- dataset
  } else {
    # Otherwise filter the selected species
    dataset <- subset(dataset, species %in% selected_species) %>%
      droplevels()
  }

  # If there is no extra polygon, then assign the extra polygon as the study area
  if(missing(extra_polygon)){
    extra_polygon <- study_area
  }


  # If the count filter is missing, then leave the table as is