---
title: "Tutorial"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
---

## Basic functions of R

Loading data, basic data manipulation and analysis, plotting, and saving.
First we want to load some packages
```{r}
library(tidyverse) # core package for data manipulation
```
Then we want to load our .csv or .txt datasets
```{r}
# Example: read.csv("Path\\to\\your\\File Name.csv") - on Windows use double backslashes in the path
data <- read.csv("some_data.csv")
```
Now we might want to view the data
```{r, eval=FALSE}
View(data) # opens the data in a spreadsheet-style viewer (interactive sessions only - it won't render when knitting)
```
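For a quick look that also shows up in the knitted document, base R's `head()` and `str()` work well (a sketch assuming `data` has been loaded as above):

```{r}
head(data, n = 3) # first 3 rows - appears in the knitted output, unlike View()
str(data)         # column names, types, and a preview of the values
```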
Now we can ask some questions of the data, e.g. how many cells per animal? Basic mean, sd, and sem
```{r}
average_data <- data %>%
  select(Mouse, number_of_cells) %>%
  group_by(Mouse) %>%
  summarise(mean = mean(number_of_cells),
            sd = sd(number_of_cells),
            sem = sd(number_of_cells) / sqrt(n()), # standard error of the mean
            count = sum(number_of_cells))
```
Plot the data as a bar graph
```{r}
ggplot(data = average_data, aes(x = Mouse, y = count)) +
  geom_bar(stat = "identity", width = 0.9, alpha = .4) +
  labs(y = "Count", x = "\nMouse id") +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16),
        text = element_text(size = 16),
        legend.text = element_text(size = 16),
        axis.title.y = element_text(margin = margin(t = 0, r = 20, b = 0, l = 0)))
```
Save the plot to .png
```{r}
ggsave(file = "cell_count.png", width = 3, height = 4)
```
Does cell number depend on brain region?
Let's run the analysis again but include brain region as a factor
```{r}
average_data <- data %>%
  select(Mouse, number_of_cells, estimated_location) %>%
  group_by(estimated_location, Mouse) %>%
  summarise(mean = mean(number_of_cells),
            sd = sd(number_of_cells),
            count = sum(number_of_cells))
```
Plot the data again but grouped by brain region
```{r}
ggplot(data = average_data, aes(x = estimated_location, y = count)) +
  geom_bar(stat = "identity", width = 0.9, alpha = .4) + # alpha defines the opacity
  labs(y = "Count", x = "\nBrain region") +
  theme_classic() +
  theme(axis.text.x = element_text(size = 16),
        axis.text.y = element_text(size = 16),
        text = element_text(size = 16),
        legend.text = element_text(size = 16),
        axis.title.y = element_text(margin = margin(t = 0, r = 20, b = 0, l = 0)))
```
Install package for color palettes
```{r}
# Install (run once, then keep commented out so knitting doesn't try to reinstall)
# install.packages("wesanderson")
# Load
library(wesanderson)
```
Plot the data again, this time with bars coloured by brain region
```{r}
ggplot(data = average_data, aes(x = estimated_location, y = count, fill = estimated_location)) +
  geom_bar(stat = "identity", width = 0.9, alpha = .7) +
  scale_fill_manual(values = wes_palette(n = 5, name = "Cavalcanti1")) +
  labs(y = "Count", x = "\nBrain region") +
  theme_classic() +
  theme(axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        legend.title = element_blank(),
        legend.position = "bottom",
        text = element_text(size = 14),
        legend.text = element_text(size = 14),
        axis.title.y = element_text(margin = margin(t = 0, r = 20, b = 0, l = 0)))
```
We can even see if this is statistically significant!
```{r}
one.way <- aov(number_of_cells ~ estimated_location, data = data)
summary(one.way)
```
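If the ANOVA comes back significant, a post-hoc test tells you which pairs of regions actually differ. A minimal sketch, reusing the `one.way` model fitted above:

```{r}
# pairwise comparisons between brain regions, with p-values adjusted for multiple testing
TukeyHSD(one.way)
```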
Save results as a .txt or .csv
```{r}
write_csv(average_data, "average_data.csv") # write_csv2() would use semicolons instead of commas (common in European locales)
```
---

## More complicated applications of R

Loading data with Python, building functions, efficient plotting.
## Load some packages
```{r}
library(reticulate) # package that allows R to call python code
library(pheatmap) # this package lets you make nice heatmaps
```
## Load pickled dataframes, i.e. Python output
First we need to set up the Python environment, so that we can call a Python script from R that loads the pickled dataframes and sends them back to the R workspace.
The Python environment needs to be version 3 or higher: 2.7 (the system Python) doesn't have the pandas package, which is needed to open the dataframes.
```{r}
require(reticulate) # if a particular package is needed, you can use require() to check it's loaded
Sys.setenv(RETICULATE_PYTHON = "/usr/local/bin/python/bin/python") # path to the python interpreter reticulate should use - adjust for your machine
```
Load the Python code that reads the pickled dataframes
```{r}
source_python("pickle_reader.py") # runs the python script which defines the loading function - should be in the working directory
```
Specify the name of the dataframe to load, and load it
```{r}
dataframe_to_load <- "spatial_firing_test.pkl" # name of the pickled dataframe we want to load
dataset <- read_pickle_file(file.path(dataframe_to_load)) # calls the function defined in the python code
```
Alternatively, you can load a .Rda dataframe that has been previously saved in R
```{r}
spatial_firing <- readRDS(file="dataset.Rda")
```
View the data: because it's large, we will just view a few rows & check that it's all good
```{r}
View(head(dataset, n = 3)) # shows the first 3 rows of the dataframe
```
Introducing functions...
# Plot heat map of firing rate across location for all neurons
First, scale the firing rate for all neurons:
1. Make a function to load the rates and normalise them
2. Run it on the dataframe
```{r}
normalise_rates <- function(m){
  m <- as.vector(unlist(m))
  m <- (m - min(m)) / (max(m) - min(m)) # min-max scaling to the range 0-1
  return(m)
}
```
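A quick sanity check on a toy vector (min-max scaling should map the smallest value to 0 and the largest to 1); this assumes the `normalise_rates` chunk above has been run:

```{r}
normalise_rates(c(2, 4, 6)) # 0.0 0.5 1.0
```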
```{r}
dataset <- dataset %>%
  mutate(normalised_rates = map(Rates_averaged_rewarded_b, normalise_rates))
```
Add position for plotting
```{r}
add_position <- function(df) {
  tibble(Rates = df, Position = 1:200) # pair each rate with its position bin (1-200)
}
```
```{r}
dataset <- dataset %>%
  mutate(nested_normalised_rates = map(normalised_rates, add_position))
```
Extract the columns (normalised rates) for plotting into a tibble
```{r}
concat_firing <- dataset %>%
  select(cluster_id, nested_normalised_rates) %>%
  unnest(cols = nested_normalised_rates) # naming cols explicitly avoids a warning in newer tidyr versions
```
First convert it to wide format
```{r}
wide_DF <- concat_firing %>% spread(Position, Rates) # spread() is superseded by pivot_wider() in newer tidyr, but still works
```
Get rid of unused columns, then change the column/row names of the dataframe
```{r}
# remove the cluster_id column so only the 200 position columns remain
wide_DF <- wide_DF %>% select(-one_of("cluster_id"))
# convert to a plain data.frame - tibbles don't support row names
wide_DF <- as.data.frame(wide_DF)
colnames(wide_DF) <- 1:200
rownames(wide_DF) <- paste("neuron", 1:nrow(wide_DF), sep = "_")
```
Now we can plot the heatmap using pheatmap
```{r}
myheatmap <- pheatmap(wide_DF, cluster_cols = FALSE, cluster_rows = FALSE, show_rownames = TRUE, show_colnames = FALSE)
```
Save the heatmap (bit of a nightmare here...)
```{r}
save_pheatmap_png <- function(x, filename, width = 1300, height = 2500, res = 250) {
  png(filename, width = width, height = height, res = res)
  grid::grid.newpage()
  grid::grid.draw(x$gtable)
  dev.off()
}
save_pheatmap_png(myheatmap, "my_heatmap_all.png")
```
Save the analysed dataframe with the new results
```{r}
saveRDS(dataset, file = "dataset_mod.Rda") # .rds is the conventional extension for saveRDS() output, but readRDS() doesn't mind
```