This pipeline performs all steps of RNA sequencing (RNA-seq) analysis. It was designed specifically for use at the Bryan Sun Lab at the University of California, San Diego's School of Medicine. It is hosted on the SDSC TSCC supercomputer. A graphical user interface (GUI) on the user's personal computer sends tasks to the supercomputer, and those tasks execute the scripts in this repository. Code for the GUI is not included in this repository. See https://github.com/jpbucsd/GUI-RNAseq-pipeline for the GUI.
Usage of the commands in this repository that are automated by the GUI is as follows:
RNAseq.sh is the main script in this pipeline that executes all of the subscripts.
-s (path to the slr file, a custom file format that contains the information needed to determine how FASTQ files are grouped and which combinations of groups are compared)
This R script performs differential expression analysis with DESeq2. It produces a .csv file containing the log2 fold change for each gene between the sample sets, and a .csv file containing rlog-normalized counts for each gene in each sample.
Rscript Rtest.R -1 (name of first sample set) set1file1.genes.results set1file2.genes.results ... set1fileN.genes.results
-2 (name of second sample set) set2file1.genes.results set2file2.genes.results ... set2fileN.genes.results
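To make the log2 fold change column easier to read, here is a minimal Python sketch of what the value means for a single gene. It is not what DESeq2 computes: DESeq2 fits a negative binomial model and normalizes for library size, while this sketch only compares mean counts. The function name, the pseudocount, and the choice of set 2 as the numerator are assumptions made for illustration.

```python
import math


def log2_fold_change(set1_counts, set2_counts, pseudocount=1.0):
    """Naive log2 fold change of set 2 relative to set 1 for one gene.

    Illustration only: the pseudocount avoids log(0), and the direction
    (set 2 over set 1) is an assumption, not taken from Rtest.R.
    """
    mean1 = sum(set1_counts) / len(set1_counts)
    mean2 = sum(set2_counts) / len(set2_counts)
    return math.log2((mean2 + pseudocount) / (mean1 + pseudocount))


# Hypothetical counts for one gene across three replicates per set.
set1 = [7.0, 7.0, 7.0]
set2 = [31.0, 31.0, 31.0]
print(log2_fold_change(set1, set2))  # (31 + 1) / (7 + 1) = 4, so this prints 2.0
```

A value of 2.0 means roughly a fourfold increase in set 2; a value of -2.0 would mean a fourfold decrease.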
This Python script performs principal component analysis (PCA) across the samples, resulting in a chart where each point represents one sample. This can be used as a quality check on the experiment by confirming that experimental samples cluster together and control samples cluster together.
--numComps <number of comparisons for PCA, default is sample number. There must be at least 2 files!>
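The clustering idea behind the PCA chart can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the script's implementation: it computes only the first principal component by power iteration, the function name is hypothetical, and the expression values are made up. The fixed starting vector is also an assumption; in rare cases it can be orthogonal to the leading direction, in which case the iteration would not find it.

```python
import math


def first_pc_scores(samples, iterations=200):
    """Project each sample onto the first principal component.

    samples: list of equal-length lists, one row per sample and one
    column per gene. Returns one score per sample.
    """
    n, d = len(samples), len(samples[0])
    # Center each gene (column) on its mean across samples.
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in samples]

    # Power iteration on X^T X without forming the matrix: v <- X^T (X v).
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-uniform start
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        new_v = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(w * w for w in new_v))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [w / norm for w in new_v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Hypothetical expression values: two control and two experimental samples.
control = [[10.0, 1.0, 5.0], [11.0, 1.0, 5.0]]
experimental = [[1.0, 10.0, 5.0], [1.0, 11.0, 5.0]]
for score in first_pc_scores(control + experimental):
    print(round(score, 2))
```

In this toy data the two control samples receive scores of one sign and the two experimental samples receive scores of the opposite sign, which is the separation one hopes to see in the chart.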
Deprecated. Its functionality was incorporated directly into RNAseq.sh.
Deprecated. Its functionality was incorporated directly into RNAseq.sh.
Deprecated. Its functionality was incorporated directly into RNAseq.sh.
This script produces a heatmap using the rlog values produced by differential expression analysis. This script has not yet been integrated into the graphical user interface.
This script produces lists of up- and down-regulated genes, a background list, and volcano plots based on the log2 fold change calculated by differential expression analysis.
--padjusted or --padj - adjusted p-value threshold. For example, a padj of 0.5 implies that about 50% of the results called significant are expected to be false positives; with that setting, results with a padj above 0.5 are filtered out
--Llog10 - a log10 value used as the threshold that determines which results are worth naming
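The filtering described above can be sketched as follows. This is a minimal illustration, not the script's code: the function name, the tuple layout of the input, the treatment of missing padj values, and the definition of the background list as every tested gene are all assumptions, and the gene names and values are invented.

```python
def split_regulated(results, padj_cutoff=0.05):
    """Split genes into up- and down-regulated lists.

    results: list of (gene, log2_fold_change, padj) tuples, where padj
    may be None when no adjusted p-value is available (an assumption).
    Genes with padj above the cutoff are filtered out.
    """
    up, down = [], []
    for gene, lfc, padj in results:
        if padj is None or padj > padj_cutoff:
            continue  # not significant at this cutoff
        if lfc > 0:
            up.append(gene)
        elif lfc < 0:
            down.append(gene)
    # Assumption: the background list is every gene that was tested.
    background = [gene for gene, _, _ in results]
    return up, down, background


# Hypothetical differential expression results.
results = [
    ("GENE_A", 2.5, 0.001),
    ("GENE_B", -1.8, 0.01),
    ("GENE_C", 3.0, 0.6),
    ("GENE_D", 0.4, None),
]
up, down, background = split_regulated(results, padj_cutoff=0.05)
print(up)          # ['GENE_A']
print(down)        # ['GENE_B']
print(background)  # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']
```

GENE_C has a large fold change but is dropped because its padj exceeds the cutoff, which is the case the padj option exists to handle.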
This script produces a heatmap and .csv files comparing the control sample to all other samples. This script has not yet been implemented into the complete pipeline but will be in the future.
-Z zeroSet zerofile1.genes.results zerofile2.genes.results ... zerofileN.genes.results (indicates the control sample name and RSEM files of biological replicates)
-S setN setNfile1.genes.results setNfile2.genes.results ... setNfileM.genes.results (indicates the name and RSEM files of biological replicates of a test sample)
-f filtering by z-score (the threshold that a gene's highest z-score among samples must reach for the gene to appear in the heatmap)
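One plausible reading of the -f option is sketched below: compute a z-score for each gene across samples and keep the gene only if its largest z-score reaches the threshold. How the script actually computes z-scores is not documented here, so the per-gene scaling, the use of the population standard deviation, the use of absolute values, and all names and numbers are assumptions.

```python
import statistics


def row_zscores(values):
    """Z-scores of one gene's values across samples (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # constant gene: no variation to score
    return [(v - mean) / sd for v in values]


def filter_by_max_zscore(expression, threshold):
    """Keep genes whose largest absolute z-score reaches the threshold.

    expression: dict mapping gene name to a list of per-sample values.
    Returns a dict mapping each kept gene to its z-scores.
    """
    kept = {}
    for gene, values in expression.items():
        z = row_zscores(values)
        if max(abs(s) for s in z) >= threshold:
            kept[gene] = z
    return kept


# Hypothetical per-sample values for two genes.
expression = {
    "GENE_A": [1.0, 1.0, 1.0, 9.0],  # one sample stands out
    "GENE_B": [4.0, 6.0, 4.0, 6.0],  # evenly spread
}
print(list(filter_by_max_zscore(expression, threshold=1.5)))  # ['GENE_A']
```

A higher threshold leaves fewer genes in the heatmap, which keeps the plot readable when many genes are measured.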
This script is not part of the pipeline; it is an additional tool. It has not been configured for reuse. It currently takes a log2 fold change .csv and a .csv containing alternative splicing counts and produces a chart comparing the two.