PFTC_Peru3Tutorial/2_RtutorialPeru.Rmd at master · Plant-Functional-Trait-Course/PFTC_Peru3Tutorial · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
---
title: 'R tutorial Peru: trait data exploration'
author: "Julia Chacón"
date: "22/3/2020"
output: html_document
---

<style>
p.caption {
  font-size: 0.8em;
}
</style>

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

```{r load-packages, include=FALSE}
library("tidyverse")
library("dataDownloader")
library("vegan")
library("ggvegan")
library("patchwork")
library("glue")
```

```{r read trait data, include=FALSE}
traits <- read_csv(file = "traits/data/PFTC3.7_Traits_2018_Peru_cleaned.csv")
```

# PFTC3 Wayqecha (PERÚ)

## Exploring how the data are organized

### Trait data

We suggest starting exploring the data sets of traits, asking for the names of the columns, the levels of each factor, and dimensions.

```{r Trait data, eval=FALSE}
colnames(traits)
```

Explore a bit the levels of each factor, and the variable names.

```{r Trait data2, eval=TRUE}
unique(traits$Elevation)
```

```{r Trait data3, eval=TRUE}
unique(traits$Treatment)
unique(traits$Site)## just change here the name of the column you want to explore
```

## Multivariate exploratory
### PCA

This document explains how to perform a PCA using `vegan::rda()` and `ggvegan::autoplot.rda()`

First, subset the big "traits" tibble into a smaller one. Log-transform some of the traits, select a few traits that are quite complete and the columns Taxon and Site.

```{r transform-data, eval=TRUE}
traits.sel <- traits %>% # trait selection
    mutate(Leaf_Thickness_Ave_mm.log = log(Leaf_Thickness_Ave_mm),
           Leaf_Area_cm2.log = log(Leaf_Area_cm2)) %>%
    select(Taxon, Site, Plant_Height_cm,
           Leaf_Thickness_Ave_mm.log, Leaf_Area_cm2.log)
```

Select only those data from the site Wayquecha. For doing the PCA we only need the traits values. Eliminate the NAs too.

```{r filter-data, eval=TRUE}
traits.WAY <- traits.sel %>%
  filter(Site == "WAY") %>%
  drop_na()
```

Now, store the result of the PCA into an object pca.WAY. Eliminate the columns Site and Taxon, we don't need them for the PCA.
Then, do the PCA using `vegan::rda()`.
We use the argument `scale = TRUE` so that each variable is transformed to have a mean of zero and unit variance.
This is necessary because the variables have different scales.

```{r PCA, eval=TRUE}
pca.WAY <- traits.WAY %>% # store result as "pca.WAY"
  select(-Site, -Taxon) %>%   # remove Site & Taxon columns
  rda(scale = TRUE) #do the PCA
```

And plot it using the `autoplot` function (this is coming from the `ggvegan` package, and is using `autoplot.rda`).

```{r plot-PCA, eval=TRUE}
autoplot(pca.WAY) +
  theme(legend.position = "none")
```

Every leaf appears as a red dot.
The traits are shown by arrows. Unfortunately the defaults have made the arrows too long, but we can control that with the `const` argument to `autoplot.rda` (You will need the latest version of `ggvegan` for this to work - install with `remotes::install_github("richardjtelford/ggvegan")`).

```{r better-plot-PCA, eval=TRUE}
autoplot(pca.WAY, const = c(0.2, 1)) +
  xlim(-.22, NA) + # make more space for the labels
  theme(legend.position = "none")
```

It would be nice to show PCA with different colours for the different species.
We cannot do that directly with `autoplot.rda()`, but we can do it manually, using `fortify()` to extract the scores from the PCA.
This gives us lots of control over the plot.

```{r better-plot-PCA2, eval=TRUE}
#extract scores from ordination
fpca.WAY <- fortify(pca.WAY, const = c(0.2, 1), axes = 1:2)

head(fpca.WAY)#inspect the object to see the differences between "sites" and "species" in vegan terminology

#get the "site" scores (i.e. each leaf)
fpca.WAY_leaves <- fpca.WAY %>%
  filter(Score == "sites") %>%
  bind_cols(traits.WAY)

# get the trait/variable scores ("species" in vegan terminology)
fpca.WAY_traits <- fpca.WAY %>%
  filter(Score == "species")

#plot the leaf scores
g <- ggplot(fpca.WAY_leaves, aes(x = PC1, y = PC2)) +
  geom_point(aes(colour = Taxon), show.legend = FALSE) +
  coord_equal() # force scale to be correct

g
```

Add trait arrows to plot:

```{r better-plot-PCA3, eval=TRUE}
g <- g +
  geom_segment(data = fpca.WAY_traits,
               aes(x = 0, y = 0, xend = PC1, yend = PC2),
               arrow = arrow(length = unit(0.2, "cm")),
               colour = "navy") +
  geom_text(data = fpca.WAY_traits, # add labels
            aes(x = PC1, y = PC2, label = Label), nudge_x = -0.005, size = 3, hjust = 1)

g <- g +
  xlim(-.20, NA) + #space for labels
  theme(legend.position = "none")# no legend as too large
g
```

We can finish off the plot by adding the proportion of variance explained by each axis to the plot.
We can calculate this with the function `eigenvals`.
We can make the labels with `glue::glue()` or with `paste()`.

```{r better-plot-PCA4}
explained <- eigenvals(pca.WAY)[1:2] / sum(eigenvals(pca.WAY)) * 100
explained <- round(explained, 1)

g +
  labs(x = glue("PC1 [{explained[1]}%]"), y = glue("PC1 [{explained[2]}%]"))
```