Ana González Toro (anagtoro7@gmail.com), Joaquín Tamargo Azpilicueta (joatamazp@alum.us.es), José Vázquez Pacheco (josvazpac@gmail.com)
ChIPnCHOP is a Chromatin Immuno-Precipitation analysis pipeline for histones, transcription factors and other proteins that interact with DNA in Arabidopsis thaliana Col-0, developed by Biochemistry Degree students at the University of Seville.
See how it works at:
In short, this pipeline receives a list of the FASTQ samples you want to analyze, together with the control samples, and processes them to produce a list of the genes these proteins are hypothetically interacting with. Apart from the cistrome, we are also keen on protein-nucleic acid interactomics, so the pipeline also outputs the motifs enriched in the ChIP sequences. It surely does sound fancy, but once you have downloaded the repository, using it is easy as pie 🍰:
- Download your A. thaliana Col-0 genome (FASTA) and genome annotation (GFF3). If you are not sure which to use, Ensembl Plants hosts both the genome assembly and the genome annotation. They can be downloaded with the wget command.
- If they are not yet stored on your computer, you can download the ChIP-seq datasets you are interested in from GEO. Just look up the SRA accession that corresponds to the samples you want to work on, and download it via fastq-dump (see SRA Toolkit). ChIP-seq datasets must be compressed (you can do so by typing gzip <sequence.fq>).
- Once you have downloaded it all, you must specify the paths and the other required parameters in the parameters_file.txt file.
- Then, in the chipnchop folder, call chipnchop by typing: bash chipnchop <FULL/PATH/to/params/parameter_file.txt> [options]. Remember, in <FULL/PATH/to/params/parameter_file.txt> you must GIVE THE COMPLETE PATH TO THE PARAMETERS FILE.
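The compression step from the list above can be sketched as a small shell helper. This is only an illustration: the function name compress_fastq and the example directory are hypothetical and not part of chipnchop, and the sketch assumes your reads use the .fq or .fastq extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of chipnchop): gzip every uncompressed
# FASTQ file in a directory. Files that already end in .gz are not touched.
compress_fastq() {
    local dir="$1"
    local f
    for f in "$dir"/*.fq "$dir"/*.fastq; do
        # An unmatched glob stays a literal string, so skip what does not exist.
        [ -e "$f" ] || continue
        gzip "$f"    # replaces sample.fq with sample.fq.gz
    done
}

# Example call (the directory is a placeholder):
# compress_fastq "$HOME/chipseq/samples"
```

Note that gzip replaces each file with its .gz version, so the sample paths in the parameter file should point to the compressed names.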
As for the other [options], write -TF if the analyzed proteins are transcription factors or interact with chromatin in a similar way, or -HI if they are histones or other epigenetic marks that, unlike transcription factors, affect large portions of DNA. By default, chipnchop assumes you are working with a transcription factor (-TF) unless you state otherwise with -HI. Changing this option slightly changes some of the functions used, so that they are optimized for the case study.
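Since the parameter file must be given as a complete path, it may help to build the command line with a small wrapper. The sketch below is an assumption-laden illustration: abs_path and chipnchop_cmd are hypothetical names, and the wrapper only prints the command described above instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a possibly relative file path into an absolute one.
abs_path() {
    local dir base
    dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
    base=$(basename "$1")
    printf '%s/%s\n' "$dir" "$base"
}

# Hypothetical wrapper: print the chipnchop command line for a parameter file.
# The second argument is the mode, TF (default) or HI.
chipnchop_cmd() {
    local params mode
    params=$(abs_path "$1") || return 1
    mode="${2:-TF}"
    case "$mode" in
        TF|HI) ;;
        *) echo "mode must be TF or HI" >&2; return 1 ;;
    esac
    printf 'bash chipnchop %s -%s\n' "$params" "$mode"
}

# Example call (the file name is a placeholder):
# chipnchop_cmd params/parameter_file.txt HI
```

Printing the command first lets you confirm that the path is complete before launching the analysis from the chipnchop folder.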
Two examples are available that you can check before performing your own analysis. They can be found in "test", within this repository. Depending on your interests, you can try the analysis on either transcription factors or epigenetic marks. Check out how to start your analysis with this video:
To run this pipeline, check that the following software is installed on your system:
In R, the following packages must already be installed:
- BiocManager
- ChIPseeker
- clusterProfiler
- TxDb.Athaliana.BioMart.plantsmart28
- org.At.tair.db
- DOSE
- enrichplot
First of all, open the terminal and move into the folder where you want to download this repository. To get it, type "git clone https://github.com/jvazpa/chipnchop.git" in the terminal. You can find this link under the green CODE button above. If prompted, enter your GitHub account and password, and the download will begin. That's all!
An error like this may appear if your genome is compressed. Uncompress your "genome.fq.gz" files using gunzip, modify the parameter file accordingly and retry.
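One way to catch this before launching the pipeline is to test whether the genome file is gzip-compressed. The following is a minimal sketch, assuming a compressed genome carries the .gz suffix; is_gzipped and ensure_uncompressed are hypothetical helper names, not chipnchop functions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
is_gzipped() {
    [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "1f8b" ]
}

# Hypothetical helper: uncompress the file if needed and print the path
# that should be written in the parameter file.
ensure_uncompressed() {
    local f="$1"
    if is_gzipped "$f"; then
        case "$f" in
            *.gz) gunzip "$f" && printf '%s\n' "${f%.gz}" ;;
            *) echo "compressed file without .gz suffix: $f" >&2; return 1 ;;
        esac
    else
        printf '%s\n' "$f"
    fi
}

# Example call (the path is a placeholder):
# ensure_uncompressed "$HOME/genome/genome.fa.gz"
```

The printed path is the one to use in the parameter file, since gunzip removes the .gz file once it has been uncompressed.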
This error message comes up when the parameter file is not correctly specified or, most likely, when the path is not FULLY specified. In other words, you must type the whole path, without going back and forth with double dots (..).
Once you have checked that the path to the parameter file is correct, make sure that there is exactly one space between the colon and the parameter. For instance, "working_directory:/home/mickeymouse/tmp/" won't be read properly, whereas "working_directory: /home/anagontor1/tmp/" will.
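Both checks above can be automated before calling the pipeline. The sketch below is a hypothetical pre-flight check, not part of chipnchop: check_params is an invented name, and it assumes every line containing a colon is a "key: value" entry, which may not hold for every parameter file.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a chipnchop parameter file.
# Fails if the path is not complete, contains "..", or if any line has a
# first colon that is not followed by exactly one space.
check_params() {
    local file="$1"
    case "$file" in
        */../*|*/..) echo "path contains '..': $file" >&2; return 1 ;;
        /*) ;;
        *) echo "not a complete path: $file" >&2; return 1 ;;
    esac
    [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
    # Offending lines: the first colon is followed by end of line,
    # by a non-space character, or by two spaces.
    if grep -n -E '^[^:]*:($|[^ ]| {2})' "$file" >&2; then
        echo "fix the spacing after the colon in the lines listed above" >&2
        return 1
    fi
}

# Example call (the path is a placeholder):
# check_params /home/user/params/parameter_file.txt
```

The function prints the offending line numbers to standard error and returns a non-zero status, so it can be chained in front of the chipnchop call with &&.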
ChIPnCHOP has been tested on MacOS Catalina, MacOS Big Sur and Ubuntu 20.04.
- Enable parallel processing of multiple samples with Sun Grid Engine (SGE), Simple Linux Utility for Resource Management (Slurm) or similar. A first version did include SGE parallelization, but it was only tested with obsolete versions of the software, so it has to be re-tested before it can be used that way.
- Make the pipeline able to work with paired-end samples.
- Make it possible to work with ecotypes or species other than Arabidopsis thaliana Col-0.
- Network analysis might be carried out through a pipeline that makes use of the main chipnchop pipeline. Our drafts for this objective are located in the network_building folder in the scripts directory. We are trying to figure out how to easily integrate RNA-seq datasets obtained under the same conditions as the ChIP-seq datasets.
This software is licensed under GNU General Public License v3.0. More info: https://choosealicense.com/licenses/gpl-3.0/


