About this markdown: This guide provides step-by-step instructions for setting up the bioinformatics environment for the WCS NGS course, covering both the lab and informatics components.
Pre-requisites: A virtual machine with miniforge installed (refer to the internal SOP on how to create, develop and deploy virtual machines).
The NGS software table and the NGS environment summary are at the end of this guide.
Run the following command to refresh the package lists and install git and wget:
sudo apt-get update && sudo apt-get install -y git wget && sudo apt-get clean
Create a directory and clone the necessary GitHub repositories:
cd $HOME
mkdir -p course_data
cd course_data
git clone http://www.github.com/WTAC-NGS/unix
git clone http://www.github.com/WTAC-NGS/data_formats
git clone http://www.github.com/WTAC-NGS/read_alignment
git clone http://www.github.com/WTAC-NGS/variant_calling
git clone http://www.github.com/WTAC-NGS/structural_variation
git clone http://www.github.com/WTAC-NGS/rna_seq
git clone http://www.github.com/WTAC-NGS/chip_seq
git clone http://www.github.com/WTAC-NGS/assembly
git clone http://www.github.com/WTAC-NGS/igv
To exit the course_data folder, enter the following command:
cd
Activate Conda and create a new environment for the bioinformatics tools:
source $HOME/miniconda/etc/profile.d/conda.sh
conda create -n ngsbio breakdancer=1.4.5 -y
conda activate ngsbio
conda install -y samtools=1.15.1
conda install -y bcftools=1.15.1
conda install -y bedtools=2.30.0
conda install -y openmpi=4.1.4
conda install -y r-base=4.0.5
conda install -y bowtie2=2.4.5
conda install -y macs2=2.2.7.1
conda install -y meme=5.4.1
conda install -y ucsc-bedgraphtobigwig=377
conda install -y ucsc-fetchchromsizes=377
conda install -y r-sleuth=0.30.0
conda install -y bioconductor-rhdf5=2.34.0
conda install -y bioconductor-rhdf5filters=1.2.0
conda install -y bioconductor-rhdf5lib=1.12.0
conda install -y hdf5=1.10.5
conda install -y hisat2=2.2.1
conda install -y kallisto=0.46.2
conda activate ngsbio
samtools --version
bcftools --version
bedtools --version
mpirun --version
R --version
bowtie2 --version
macs2 --version
meme --version
bedGraphToBigWig
fetchChromSizes
R -e 'library(sleuth)'
R -e 'library(rhdf5)'
R -e 'library(rhdf5filters)'
R -e 'library(Rhdf5lib)'
h5cc -showconfig
hisat2 --version
kallisto version
conda deactivate
conda install -n ngsbio -y bwa=0.7.17
conda activate ngsbio
bwa
conda deactivate
conda install -n ngsbio -y assembly-stats=1.0.1
conda install -n ngsbio -y canu=2.2
conda install -n ngsbio -y kmer-jellyfish=2.3.0
conda install -n ngsbio -y seqtk=1.3
conda install -n ngsbio -y velvet=1.2.10
conda install -n ngsbio -y wtdbg=2.5
conda install -n ngsbio -y genomescope2=2.0
conda activate ngsbio
assembly-stats --version
canu --version
jellyfish --version
seqtk
velvetg
wtdbg2
conda deactivate
conda install -n ngsbio -y freebayes=0.9.21.7
conda activate ngsbio
freebayes --version
conda deactivate
conda install -n ngsbio -y gatk4=4.2.6.1
conda install -n ngsbio -y picard-slim=2.27.4
conda activate ngsbio
gatk --version
picard -h
conda deactivate
conda install -n ngsbio -y minimap2=2.24
conda install -n ngsbio -y sniffles=2.0.7
conda activate ngsbio
minimap2 --version
sniffles --version
conda deactivate
To get out of the conda environment, enter the following command:
conda deactivate
conda create -n chipseq-project r-ngsplot
conda activate chipseq-project
conda install python=2.7
conda deactivate
conda activate chipseq-project
R --version # Check if R is installed
python --version # Check Python version
ngsplot --help # Check if NGSplot is installed correctly
If all commands return a valid response, the ChIP-seq environment is set up correctly.
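
The same checks can also be run without activating the environment, using `conda run`. The sketch below is only an illustration: the helper name `check_in_env` and the `CONDA_BIN` override are hypothetical and not part of the course setup.

```shell
# Sketch: run a command inside a named Conda environment without activating it.
# CONDA_BIN and check_in_env are illustrative names (assumptions).
CONDA_BIN="${CONDA_BIN:-conda}"

check_in_env() {
    # Usage: check_in_env ENV_NAME COMMAND [ARGS...]
    cie_env="$1"
    shift
    # `conda run -n ENV CMD...` executes CMD inside ENV and returns its exit status.
    "$CONDA_BIN" run -n "$cie_env" "$@"
}

# Example (requires the chipseq-project environment to exist):
#   check_in_env chipseq-project R --version
#   check_in_env chipseq-project python --version
```

Because nothing is activated, the current shell session is left unchanged and no `conda deactivate` is needed afterwards.
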
Deactivate the environment:
conda deactivate
conda activate ngsbio
conda install -y pytz edlib threadpoolctl six scipy networkx joblib cython click scikit-learn python-dateutil pandas lightgbm sortedcontainers
pip install dysgu
For system-wide installed tools, check version using:
samtools --version
bcftools --version
bedtools --version
bwa
minimap2 --version
cd $HOME
wget https://data.broadinstitute.org/igv/projects/downloads/2.14/IGV_Linux_2.14.1_WithJava.zip
unzip IGV_Linux_2.14.1_WithJava.zip
rm IGV_Linux_2.14.1_WithJava.zip
cd $HOME/IGV_Linux_2.14.1/
./igv.sh
conda create -n jupyter jupyter=1.0.0 pandoc=2.12 -y
conda activate jupyter
pip install bash_kernel
python -m bash_kernel.install
conda deactivate
conda activate jupyter
jupyter notebook --version # Check if Jupyter is installed
jupyter notebook # Start Jupyter Notebook server
This will open Jupyter Notebook in your web browser. If you see the Jupyter dashboard, the installation is successful.
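
Seeing the dashboard does not by itself confirm that the Bash kernel installed by `python -m bash_kernel.install` was registered. A minimal sketch of such a check follows, assuming the kernel is listed under the name `bash` in the output of `jupyter kernelspec list`; the helper name `has_kernel` is hypothetical.

```shell
# Sketch: check whether a kernel name appears in `jupyter kernelspec list` output.
# has_kernel is an illustrative helper name (assumption).
has_kernel() {
    # Reads the kernel listing on stdin and succeeds if a line starts with
    # optional whitespace, the kernel name, then whitespace.
    # The name is used as a regular expression, so keep it to plain characters.
    grep -Eq "^[[:space:]]*$1[[:space:]]"
}

# Example (run with the jupyter environment active):
#   jupyter kernelspec list | has_kernel bash && echo "bash kernel registered"
```
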
Deactivate the environment:
conda deactivate
sudo apt-get install -y texlive-base texlive-xetex texlive-formats-extra texlive-fonts-extra texlive-luatex
echo "source $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.bashrc
echo "conda activate ngsbio" >> ~/.bashrc
echo 'alias igv="igv.sh"' >> ~/.bashrc
echo 'export PATH="$HOME/IGV_Linux_2.14.1:$PATH"' >> ~/.bashrc
sudo apt-get install -y libreoffice
libreoffice --version
In the Firefox browser, bookmark the following:
- Course Github Repository
- Learning and Management System (LMS) Login Page
- Participant Access Google Drive
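
A note on the `~/.bashrc` additions made earlier in this guide: appending with `echo ... >>` adds the same line again each time the commands are re-run. The sketch below shows one way to make the append repeatable; the helper name `append_once` is hypothetical and not part of the course setup.

```shell
# Sketch: append a line to a file only if an identical line is not already there.
# append_once is an illustrative helper name (assumption).
append_once() {
    # Usage: append_once FILE LINE
    ao_file="$1"
    ao_line="$2"
    touch "$ao_file"
    # -x matches whole lines, -F treats the line as a fixed string, -q is quiet.
    grep -qxF -- "$ao_line" "$ao_file" || printf '%s\n' "$ao_line" >> "$ao_file"
}

# Example:
#   append_once "$HOME/.bashrc" 'conda activate ngsbio'
#   append_once "$HOME/.bashrc" 'alias igv="igv.sh"'
```
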
This table summarizes all installed software, their versions, official download links, and commands to verify installation.
| Software | Version | Download Link | Test Command | Dependencies |
|---|---|---|---|---|
| Samtools | 1.15.1 | Link | samtools --version | None |
| Bcftools | 1.15.1 | Link | bcftools --version | None |
| Bedtools | 2.30.0 | Link | bedtools --version | None |
| OpenMPI | 4.1.4 | Link | mpirun --version | None |
| R-base | 4.0.5 | Link | R --version | None |
| Bowtie2 | 2.4.5 | Link | bowtie2 --version | None |
| MACS2 | 2.2.7.1 | Link | macs2 --version | Python, NumPy |
| MEME Suite | 5.4.1 | Link | meme --version | Perl |
| UCSC BedGraphToBigWig | 377 | Link | bedGraphToBigWig | None |
| UCSC FetchChromSizes | 377 | Link | fetchChromSizes | None |
| R-sleuth | 0.30.0 | Link | R -e 'library(sleuth)' | R, Bioconductor |
| Bioconductor-rhdf5 | 2.34.0 | Link | R -e 'library(rhdf5)' | R, HDF5 |
| Bioconductor-rhdf5filters | 1.2.0 | Link | R -e 'library(rhdf5filters)' | R, HDF5 |
| Bioconductor-rhdf5lib | 1.12.0 | Link | R -e 'library(Rhdf5lib)' | R, HDF5 |
| HDF5 | 1.10.5 | Link | h5cc -showconfig | None |
| Hisat2 | 2.2.1 | Link | hisat2 --version | None |
| Kallisto | 0.46.2 | Link | kallisto version | None |
| BWA | 0.7.17 | Link | bwa | None |
| Assembly-Stats | 1.0.1 | Link | assembly-stats | None |
| Canu | 2.2 | Link | canu --version | Java, gnuplot |
| Kmer-Jellyfish | 2.3.0 | Link | jellyfish --version | None |
| Seqtk | 1.3 | Link | seqtk | None |
| Velvet | 1.2.10 | Link | velveth --help | None |
| Wtdbg | 2.5 | Link | wtdbg2 | None |
| GenomeScope2 | 2.0 | Link | N/A | R, Bioconductor |
| FreeBayes | 0.9.21.7 | Link | freebayes --version | None |
| GATK4 | 4.2.6.1 | Link | gatk --version | Java |
| Picard | 2.27.4 | Link | picard -h | Java |
| Minimap2 | 2.24 | Link | minimap2 --version | None |
| Sniffles | 2.0.7 | Link | sniffles -h | None |
| Dysgu | latest | Link | dysgu --version | Python, scikit-learn |
| Breakdancer | 1.4.5 | Link | breakdancer-max -h | Perl, Samtools |
| IGV | 2.14.1 | Link | igv | Java |
| Jupyter Notebook | 1.0.0 | Link | jupyter --version | Python, IPython |
| LaTeX (for Jupyter) | latest | Link | pdflatex --version | None |
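
The test commands in the table can be preceded by a quick check that each executable is on the `PATH` at all. The following is a minimal sketch of that idea; the helper name `check_tools` is hypothetical, and the check only confirms that a command exists, not that its version matches the table.

```shell
# Sketch: report which of the named commands can be found on the PATH.
# check_tools is an illustrative helper name (assumption).
check_tools() {
    # Prints one status line per command name; fails if any command is missing.
    ct_missing=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            printf 'OK       %s\n' "$ct_tool"
        else
            printf 'MISSING  %s\n' "$ct_tool"
            ct_missing=$((ct_missing + 1))
        fi
    done
    [ "$ct_missing" -eq 0 ]
}

# Example (run with the ngsbio environment active):
#   check_tools samtools bcftools bedtools bowtie2 macs2 hisat2 kallisto bwa minimap2
```
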
Bioinformatics Environments and Installed Software
| Environment Name | Software | Version |
|---|---|---|
| ngsbio | Samtools | 1.15.1 |
| ngsbio | Bcftools | 1.15.1 |
| ngsbio | Bedtools | 2.30.0 |
| ngsbio | OpenMPI | 4.1.4 |
| ngsbio | R-base | 4.0.5 |
| ngsbio | Bowtie2 | 2.4.5 |
| ngsbio | MACS2 | 2.2.7.1 |
| ngsbio | MEME Suite | 5.4.1 |
| ngsbio | UCSC BedGraphToBigWig | 377 |
| ngsbio | UCSC FetchChromSizes | 377 |
| ngsbio | R-sleuth | 0.30.0 |
| ngsbio | Bioconductor-rhdf5 | 2.34.0 |
| ngsbio | Bioconductor-rhdf5filters | 1.2.0 |
| ngsbio | Bioconductor-rhdf5lib | 1.12.0 |
| ngsbio | HDF5 | 1.10.5 |
| ngsbio | Hisat2 | 2.2.1 |
| ngsbio | Kallisto | 0.46.2 |
| ngsbio | BWA | 0.7.17 |
| ngsbio | Assembly-Stats | 1.0.1 |
| ngsbio | Canu | 2.2 |
| ngsbio | Kmer-Jellyfish | 2.3.0 |
| ngsbio | Seqtk | 1.3 |
| ngsbio | Velvet | 1.2.10 |
| ngsbio | Wtdbg | 2.5 |
| ngsbio | GenomeScope2 | 2.0 |
| ngsbio | FreeBayes | 0.9.21.7 |
| ngsbio | GATK4 | 4.2.6.1 |
| ngsbio | Picard | 2.27.4 |
| ngsbio | Minimap2 | 2.24 |
| ngsbio | Sniffles | 2.0.7 |
| ngsbio | Dysgu | latest |
| ngsbio | Breakdancer | 1.4.5 |
| chipseq-project | R-ngsplot | latest |
| chipseq-project | Python | 2.7 |
| jupyter | Jupyter Notebook | 1.0.0 |
| jupyter | Pandoc | 2.12 |
| system-wide | IGV | 2.14.1 |
| system-wide | LaTeX | latest |
- ngsbio: The primary bioinformatics environment containing sequencing analysis tools.
- chipseq-project: Contains tools specific to ChIP-seq analysis.
- jupyter: Environment for running Jupyter Notebooks.
- system-wide: Software installed outside Conda (e.g., IGV, LaTeX).
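
To confirm that the three Conda environments listed above exist on a machine, the output of `conda env list` can be scanned for their names. This is a minimal sketch, assuming the usual listing format in which the environment name is the first column and header lines start with `#`; the helper name `env_listed` is hypothetical.

```shell
# Sketch: check whether an environment name appears in `conda env list` output.
# env_listed is an illustrative helper name (assumption).
env_listed() {
    # Reads the listing on stdin, skips comment lines, and succeeds if the
    # first column of any remaining line equals the requested name.
    awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Example:
#   for env in ngsbio chipseq-project jupyter; do
#       conda env list | env_listed "$env" || echo "environment missing: $env"
#   done
```
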