This repository contains the Snakemake workflow used for the bioinformatic analyses in the paper Hososhima et al. (2022) "Proton-transporting heliorhodopsins from marine giant viruses", eLife.
All dependencies are managed with conda, so it is recommended to run Snakemake with --use-conda.
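For example, the workflow could be launched from Python with a small wrapper like the one below. This is a minimal sketch, assuming it is run from the repository root and that snakemake is on PATH. The helper names build_snakemake_command and run_workflow, and the default core count, are hypothetical; --use-conda, --cores, --snakefile and --dry-run are standard Snakemake command-line options.

```python
import subprocess
from typing import List


def build_snakemake_command(cores: int = 1,
                            snakefile: str = "workflow/Snakefile",
                            dry_run: bool = False) -> List[str]:
    """Assemble a Snakemake command line that lets conda manage dependencies.

    Hypothetical helper: the Snakefile path follows the repository layout
    described in this README (workflow/Snakefile).
    """
    cmd = ["snakemake", "--use-conda", "--cores", str(cores),
           "--snakefile", snakefile]
    if dry_run:
        # --dry-run only lists the jobs that would be executed
        cmd.append("--dry-run")
    return cmd


def run_workflow(cores: int = 1, dry_run: bool = False) -> None:
    """Run the workflow; raises subprocess.CalledProcessError on failure."""
    subprocess.run(build_snakemake_command(cores=cores, dry_run=dry_run),
                   check=True)
```

Calling run_workflow(cores=4, dry_run=True) first previews the jobs before anything is executed; dropping dry_run then runs the analyses.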
This repository is organized as follows:
- analysis -- contains the intermediate files
- annotations -- manually curated data
- databases -- includes Pfam databases and algal protein sequences. To run the workflow from scratch, the soft link databases/Pfam should point to the Pfam database folder, and the soft links in databases/algae should point to the corresponding FASTA files.
- output -- final output files
- proteins -- curated sequence data for algal and viral heliorhodopsins
- viruses -- GenBank files with the viral genomes
- workflow -- workflow files, including:
  - envs -- conda environment files
  - Snakefile -- the Snakemake file
  - scripts -- folder with scripts
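Re-pointing the database soft links before a from-scratch run could be scripted roughly as follows. This is a minimal sketch, not part of the workflow itself: link_databases and _replace_symlink are hypothetical helpers, and the keys of algae_fastas are assumed to be the link names the workflow already expects inside databases/algae (they are not listed in this README).

```python
from pathlib import Path
from typing import Dict


def _replace_symlink(link: Path, target: Path) -> None:
    """Create or re-point a soft link, refusing to overwrite real files."""
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # drop the old (possibly dangling) link
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a soft link")
    link.symlink_to(target, target_is_directory=target.is_dir())


def link_databases(repo_root: str, pfam_dir: str,
                   algae_fastas: Dict[str, str]) -> None:
    """Point databases/Pfam and databases/algae/* at local copies.

    algae_fastas maps each (assumed) link name in databases/algae to the
    path of the corresponding FASTA file on this machine.
    """
    databases = Path(repo_root) / "databases"
    _replace_symlink(databases / "Pfam", Path(pfam_dir).resolve())
    for link_name, fasta in algae_fastas.items():
        _replace_symlink(databases / "algae" / link_name,
                         Path(fasta).resolve())
```

Resolving targets to absolute paths keeps the links valid regardless of the directory Snakemake is started from.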
The output files are as follows:
- cat_phylogeny.pdf -- concatenation phylogeny of the viruses
- EhVHeRs.tsv -- distribution of heliorhodopsin genes among EhVs
- HeR_tree.jtree -- phylogenetic tree of viral and algal heliorhodopsins in .jtree format
- HeR_tree.pdf -- image version of the same tree
- miniset_chronos.pdf -- small HeR tree with alignment of critical positions
- orthogroups.tsv -- EhV orthogroups