-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy pathtutorial.Rmd
More file actions
140 lines (95 loc) · 5.27 KB
/
tutorial.Rmd
File metadata and controls
140 lines (95 loc) · 5.27 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
title: "ERC Analysis Walkthrough"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
pdf_document:
toc: yes
html_document:
css: custom.css
toc: yes
authors: Jordan Little, Guillermo Hoffmann Meyer
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
\newpage
## Installing and loading ERC
Make sure to visit the [**installation page**](https://github.com/nclark-lab/erc/blob/main/install.md) to get the code and prerequisite packages before following this tutorial.
# ERC Pipeline Walkthrough
You can follow along in R or RStudio, or you can read along in **runERC.R**. First, if you are using RStudio, you need to set your directory. Change the string in `setwd` to the directory your runERC.R file is in, generally the repository you cloned/downloaded.
```{r}
setwd("~/Documents/GitHub/erc")
```
Once you set your directory, you can source the relevant packages and ERC files.
```{r results='hide', message = FALSE, warning = FALSE, cache = FALSE}
require(devtools)
remotes::install_github("ms609/TreeTools")
source("ERC_functions.R")
source("ERC.R")
Rcpp::sourceCpp("cppFuncs.cpp")
```
## File setup
To run the workflow, you need to delineate two things: the tree file to read in, and your output file. Find the path for the tree file, and choose a name (and optionally a path) for your output file. Here we set `treefile` and `outputfile` accordingly.
```{r, cache = TRUE}
treefile = "physical_interaction_paper/domains_trees.tre"
outputfile = "out.RDS"
```
## Workflow
Now that you've selected your file names, you can begin running the main functions. For a detailed description of what they do and their parameters, visit the [functions page](https://github.com/nclark-lab/erc/blob/main/functions.md). First, your tree file is read in using `readTrees`. The trees are then transformed (via a square root transform) with `transformPaths`.
```{r message = FALSE, warning = FALSE, cache = TRUE}
trees=readTrees(treefile)
compTrees = transformPaths(trees, transform = "sqrt",impute = F)
```
Above is a plot of the tree paths before and after a square root transform.
Next, Relative Evolutionary Rates (RERs) are calculated by `coreGetResiduals`, and finally those rates are formatted into a matrix with `getRMat`. `getAllResiduals` (commented out below) has the same output as running these three together, but here we do not use it because we will later need the compTrees object.
```{r message=FALSE, warning=FALSE, cache=TRUE}
compResid = coreGetResiduals(compTrees, n.pcs=0)
residuals = getRMat(compResid, all = T, rmatweights = compTrees$weights)
# Wrapper function: (you still need to separately get compTrees
# for the later clusterList function)
# residuals = getAllResiduals(trees, impute=F, n.pcs=0, all=T)
```
Finally, we get a list of gene clusters for the ERC function.
```{r message = FALSE, warning = FALSE, cache = TRUE}
clusterList = getClusterList(compTrees)
```
## ERC function
Finally, you can compute the ERC values for your trees. You can tune many parameters (also visible on the [**functions page**](https://github.com/nclark-lab/erc/blob/main/functions.md)), but the main ones you want to worry about are below:
- Here is where you will edit the threshold you want, `minSp` is the number of species two genes have to share
- If you only want to run a few genes you can set the parameter `doOnly = c("genea","geneb")`
- If you want the plot of the RERs set `plot = T` (I would only recommend doing this for a few genes because it uses up a lot of space)
```{r}
corres=computeERC(residuals, compTrees, clusterListOutput = clusterList,
minSp = 15, saveFile = outputfile)
```
## Fisher transformation
After you create your ERC matrices, we recommend Fisher transforming them. This creates a single matrix taking into account the two matrices of the `corres` object: the correlation matrix and the matrix of observation/branch counts for each correlation. We also make it symmetrical here, but if your matrix is too large you may just want to make subsets symmetrical as you need them.
```{r}
ft_data = fisherTransform(corres)
#makes the matrix symmetrical
sym_ft = make_symmetric(ft_data)
```
Congratulations for making it to the end! Now with `ft_data` you have the data we usually operate on. Below we have some sample analysis you can do with it.
# Next Steps:
## Example: examine 10 genes' relations to each other
In this example, we show how to visualize ERC data. We take a sample ten genes, and create a symmetrical ERC matrix of their values (we round the values at the end to make display clearer).
```{r}
genes = c("NSE5_1", "NSE6_3", "CSE1_3", "CSE1_1", "EXO70_1",
"MCM2_4", "MDY2_1", "ATP1_2", "MCM5_1", "SEC8_2")
# You could also generate a random sample:
# genes = colnames(ft_data)[sample(1:length(ft_data), 10, replace=FALSE)]
# makes a matrix of the 10 genes against themselves
# (it can be against different genes too)
ft_filtered = betweencomplex(genes,genes,sym_ft)
ft_filtered = round(ft_filtered,3)
#output
ft_filtered
```
## Another example: non-Fisher transformed data
We can also use our raw ERC correlation data from before the Fisher transformation. Again, we round to simplify the display.
```{r}
filtered = betweencomplex(genes,genes,corres[["cor"]])
sym = round(make_symmetric(filtered),3)
#output
sym
```