PFTC_ChinaTutorial/3_RtutorialChina.Rmd at master · Plant-Functional-Trait-Course/PFTC_ChinaTutorial · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
---
title: "R tutorial China: Traits"
author: "Julia Chacón"
date: "21/2/2020"
output: html_document
---

<style>
p.caption {
  font-size: 0.8em;
}
</style>

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

```{r, include=FALSE}
source("hidden/starter.R")
```

### Trait Distributions

####  Data wrangling

Here is the code for plotting the traits distributions.
Before plotting the trait distributions, we need to make some changes to the datasets and log transform the variables, to finally merge the two trait data sets into a single one.

First, you can see that Site is stored as "character".
```{r Data wrangling, eval=TRUE}
mode(traitsLeaf$Site)
```

Step 1:  We will log transform the traits, this will help to reduce the skewness of the values and to make the relationships between traits clearer.

```{r Data wrangling2, eval=TRUE}
traitsWideL <- traitsLeaf %>%
  mutate(Wet_Mass_g.log = log(Wet_Mass_g),
         Dry_Mass_g.log = log(Dry_Mass_g),
         Leaf_Thickness_Ave_mm.log = log(Leaf_Thickness_Ave_mm),
         Leaf_Area_cm2.log = log(Leaf_Area_cm2))
```

Step 2: New format to the leaf traits data, with traits and values in two columns:

```{r Data wrangling3, eval=TRUE}
traitsLongL <- traitsWideL %>%
  select(Date, Elevation, Site, Treatment, Taxon, Individual_number, Leaf_number, Wet_Mass_g.log, Dry_Mass_g.log, Leaf_Thickness_Ave_mm.log, Leaf_Area_cm2.log, SLA_cm2_g, LDMC) %>%
  pivot_longer(cols = c("Wet_Mass_g.log", "Dry_Mass_g.log", "Leaf_Thickness_Ave_mm.log", "Leaf_Area_cm2.log", "SLA_cm2_g", "LDMC"),
               names_to = "Traits", values_to = "Value")
```

Step 3: New format to the leaf Chemical traits (same manner). These values are percentage, so we don´t need to log transform these traits:

```{r Data wrangling4, eval=TRUE}
traitsLongC <- traitsChem %>%
  select(Date, Elevation, Site, Treatment, Taxon, P_percent, C_percent, N_percent, CN_ratio, dN15_percent, dC13_percent) %>%
  pivot_longer(cols = c("P_percent", "C_percent","N_percent", "CN_ratio", "dN15_percent", "dC13_percent"),
               names_to = "Traits", values_to = "Value")
```

Step 4: Bind the two data sets (Leaf and Chemical traits), using the two "long" tables, and transform the Site variable variable to a factor, with four levels ("Lowland", "Middle", "Alpine", "High alpine").

```{r Data wrangling5, eval=TRUE}
traitsLong <- traitsLongL %>%
  bind_rows(traitsLongC) %>%
  mutate(Site = factor(Site, levels = c("L", "M", "A", "H"), labels = c("Lowland", "Middle", "Alpine", "High alpine")))
```

A short note:

Reasons for mutating character names into a factor (or why sometimes is better to use factor instead of character in R): This is not 100% necessary and you can load the upcoming codes without mutating the Traits names into a factor, so no worries too much about this lines. But here is the explanation. In case you are working with categorical data (predefined, finite number of values) is generally better to use factors instead of character. Although factors look like as character vectors they are integers in their "heart". They are stored in alphabetical order or we can define a meaningful order (as we did with Site or will do with the Traits names). When defined in a particular order you are forcing the order of facets, or the order of labels, etc... in plots. Compared to simple integers (1,2,3,4), factors are "self-describing".

#### Density plot

Here is the code to plot the traits distributions, using density plots. A density plot is similar to an histogram, and allows visualizing the distribution of a numerical variable (it shows the interval and the peaks of the distribution). The peaks of a density plot show where the values are concentrated over the interval.

We want to plot the distribution of each individual trait, at each site, to visualize their variation. So, first we `filter` the values containing NA´s, since we are not interested in that and transform the Traits names into a factor (they are stored as character). Then, we plot it using `ggplot`. The values of each trait are going to appear in the x coordinate (Value) and their frequency in coordinate y. Density curves appear on different colours based on the Site (`fill = Site`).

`geom_density(alpha = 0.5)´ argument simply fills the density curves with a transparent colour.

`facet_wrap´ argument allows splitting the plot into the different Traits

```{r Density plot, eval=TRUE}
traitsLong <- traitsLong %>%
  filter(!is.na(Value)) %>%
  mutate(Traits = factor(Traits,
                         levels = c("Wet_Mass_g.log", "Dry_Mass_g.log", "Leaf_Thickness_Ave_mm.log", "Leaf_Area_cm2.log", "SLA_cm2_g", "LDMC", "P_percent", "C_percent", "N_percent", "CN_ratio", "dN15_percent", "dC13_percent")))

TraitDist.plot <- ggplot(traitsLong, aes(x = Value, fill = Site)) +
  geom_density(alpha = 0.5) +
  scale_fill_brewer(palette = "RdBu") +
  labs(x = "Mean trait value", y = "Density") +
  facet_wrap( ~ Traits, scales = "free") +
  theme(legend.position = "top")

TraitDist.plot
```

Exercise:

Try to plot the same density plot, but using only the values of one of the treatments. For example, only the "LOCAL" values, or the "OTC" values.

#### Boxplots

A different way of displaying the distribution of the data is using boxplots. Boxplots are a graph showing how the values are spread out, similarly to the density plots. Sometimes boxplots are more informative that other measures of central tendencies like the mean. We are using here the object `TraitDist`, defined before.

```{r Boxplots, eval=TRUE}
TraitBoxp.plot <- ggplot(traitsLong, aes(x = Site, y = Value, fill = Site)) +
 geom_boxplot() +
 scale_fill_brewer(palette = "RdBu") +
  facet_wrap( ~ Traits, scales = "free") +
  theme(legend.position = "top")

TraitBoxp.plot
```

For saving the plots:
```{r, echo=FALSE, eval=FALSE}
#ggsave(TraitDist.plot, filename = "TraitDist.jpg", height = 13, width = 13, dpi = 300)
#ggsave(TraitBox.plot, filename = "TraitDist.jpg", height = 13, width = 13, dpi = 300)
```

### Traits scatterplots

####  Data wrangling

This are just some scatterplots relating 2 traits values. Each dot is an observation, and each coordinate represent the value of each of the traits. Fitting traits again each other is very useful to detect outliers and weird relationships that could indicate an error or outliers in the dataset.

Some examples of scatterplots:
```{r Traits scatterplots, eval=TRUE}
DryWet <- traitsWideL %>%
  ggplot(aes(x = Dry_Mass_g.log, y = Wet_Mass_g.log, colour = Site)) +
  geom_point(alpha = 0.4) +
  scale_color_brewer(palette = "RdBu", direction = -1, labels = c("High alpine", "Alpine", "Middle", "Lowland")) +
  theme(legend.position = "right")
```

If we want to plot different variables, we can just change the aesthetics rather that rewriting the entire plotting code.
```{r Traits scatterplots2, eval=TRUE}
DryArea <- DryWet + aes(x = Dry_Mass_g.log, y = Leaf_Area_cm2.log)
AreaSLA <- DryWet + aes(x = Leaf_Area_cm2, y = SLA_cm2_g)
LDMCThick <- DryWet + aes(x = LDMC, y = Leaf_Thickness_Ave_mm)
```

We are going to plot all the scatterplots together using the `patchwork` package.
```{r Traits scatterplots3, eval=TRUE}
DryWet + DryArea + AreaSLA + LDMCThick + plot_layout(guides = "collect")
```

### Traits means

We want to plot the mean values and standard error of each individual trait, at each site, and for each treatment separately. In the plots that we did before we did not differentiate between treatments, but now we do. We are going to use again `traitsLong` object.

#### Plot of the means per trait

Plotting the mean values and standard errors for each trait, per site and treatment.

Step 1: Summarize the data from `traitsLong`. The next code, will make a summary table with the mean values, standard deviation and standard errors, grouping the column Value, by Trait, Treatment and Site.

I have excluded the values of "outexp" (data from outside the experimental turfs)

```{r Plot of the means per trait2, eval=TRUE}
TraitsSE <- traitsLong %>%
  filter(!is.na(Value)) %>%
  filter(Treatment != "outexp") %>%
  group_by(Site, Treatment, Traits) %>%
  summarize(N = length(Value), mean = mean(Value), sd = sd(Value), se = sd / sqrt(N))
```

Step 2: Make the plot. Define first the `position_dodge`, this argument allow you to move horizontally the errors bars, so they don´t overlap. Exercise: see what happens if you don´t define the `position_dodge`, or try to change the value.

```{r Plot of the means per trait3, eval=TRUE}
pd <- position_dodge(0.4)

TraitsSE.p <- ggplot(TraitsSE, aes(x = Site, y = mean,  colour = Treatment)) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.1, position = pd) +
  geom_point(position = pd) +
  labs(x = "Site", y = "Mean/se trait value") +
  facet_wrap( ~ Traits, scales = "free") +
  theme(legend.position = "top")

TraitsSE.p
```

For saving the plot:
```{r Plot of the means per trait4, echo=TRUE, eval=FALSE}
ggsave(TraitsSE.p, filename = "TraitSE.jpg", height = 13, width = 13, dpi = 300)
```

### Community weighted means

Calculation of community weighted means. An 'easy' (and not very meaningful) example using only data from 2012 and the Control plots.

#### Data wrangling

For the traits calculations we are going to use `traitsLong`, you can check how we built this data frame in the previous codes. Now, just select the variables Site, Treatment, Taxon, Traits and Value to manage a smaller set of data.

```{r}
traitsLong <- traitsLong %>%
  select(Site, Treatment, Taxon, Traits, Value)
```

##### Calculate mean trait values

And calculate the mean trait values for each taxon per site. Trait values can be assign in different ways. But now, we are going to use mean trait values per sites for simplicity.

```{r}
traits.meanvalues <- traitsLong %>%
  filter(!is.na(Value)) %>%
  group_by(Site, Traits, Taxon) %>%
  summarise(mean = mean(Value))
```

##### Calculate the weights

Subset the cover data, by year and treatment, to include only 2012 and Control plots. We are going to use the cover data of each species in each site as a "weight" for calculating the community weighted means.

```{r}
cover_2012C <- cover_thin %>%
  filter(year == "2012") %>% # subset the year
  filter(TTtreat == "control") %>% # subset the control
  select(originSiteID, originPlotID, turfID, TTtreat, year, cover, speciesName) %>%
  mutate(originSiteID = factor(originSiteID, levels = c("L", "M", "A", "H"),
                               labels = c("Lowland", "Middle", "Alpine", "High alpine"))) # select only a few variables (we dont need everything)
```

Now, calculate the species weights based on mean cover of each species at each site in 2012 in the Control plots. Just a mean of the cover per site for each species!

```{r}
spp.wts2012C <- cover_2012C %>%
  group_by(originSiteID, speciesName) %>%
  summarise(wts = mean(cover))
```

Now, we need to merge the species weights with their mean trait values in a single data frame, and we are going to use the column "Taxon" to connect both data frames. But, the two data frames have different column names for the species names as you will see here:

```{r}
names(spp.wts2012C) # names of the columns or variables
names(traits.meanvalues) # names of the columns or variables
```

So, first we are going to change the second column name in `ssp.wts2012C` and call it "Taxon", and after that we can join the two tables.

```{r}
spp.wts2012C <- spp.wts2012C %>%
  rename(Taxon = speciesName, Site = originSiteID)

# and then join:
traits.wts <- inner_join(x = spp.wts2012C, y = traits.meanvalues,
                         by = c("Taxon", "Site")) %>%
    arrange(Site)
```


There we go:

Step 1: Summarize the CWM for each trait (but now we also group by year for calculating)

```{r}
cwm <- traits.wts %>%
  group_by(Site, Traits) %>%
  summarise(wm = weighted.mean(mean, wts, na.rm = TRUE))
```

Step 2: Plot

```{r}
cwm.p <- ggplot(cwm, aes(x = Site, y = wm)) +
 geom_point(shape = 17) +
 labs(x = "Site", y = "CWM", title = "Control plots 2012") +
 facet_wrap( ~ Traits, scales = "free") +
 theme(legend.position = "top", plot.title = element_text(size = (15)))

cwm.p
```