The pipeline takes a list of fosmids from fosmids.txt, which has three columns: the fosmid name, the left reads, and the right reads.
The pipeline will run:
- quality trimming with trim_galore
- assembly with spades
- vector trimming with cutadapt (the vector flanks are specified in workflow/metadata/vector_{fw,rv}.fna)
- annotation with pgap (using default metadata for Escherichia coli)
- conversion to .tbl with gbf2tbl.pl (from NCBI)
The main output files are:
- analysis/fosmids/{id}.fna -- trimmed fosmid sequence
- analysis/fosmids/{id}.info -- vector trimming report
- analysis/pgap/{id}/annot.tbl -- annotations in .tbl format
- analysis/pgap/{id}/annot.gbf -- annotations in GenBank flat file (.gbf) format
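
To confirm that a run produced all of the main output files for a given fosmid, something like the following sketch may help. It is not part of the pipeline: the function names `expected_outputs` and `missing_outputs` are hypothetical, and the `root` argument assumes the paths above are relative to the repository root.

```python
from pathlib import Path


def expected_outputs(fosmid_id, root="."):
    """Return the four main output paths for one fosmid, as listed above."""
    root = Path(root)
    return [
        root / "analysis" / "fosmids" / f"{fosmid_id}.fna",   # trimmed sequence
        root / "analysis" / "fosmids" / f"{fosmid_id}.info",  # vector trimming report
        root / "analysis" / "pgap" / fosmid_id / "annot.tbl",
        root / "analysis" / "pgap" / fosmid_id / "annot.gbf",
    ]


def missing_outputs(fosmid_id, root="."):
    """Return the expected output paths that are not present on disk."""
    return [p for p in expected_outputs(fosmid_id, root) if not p.is_file()]
```

An empty list from `missing_outputs("some_id")` means all four files exist; it says nothing about whether their contents are valid.
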
How to run:
- add fosmids to fosmids.txt
- check that the raw data are there
- run:
snakemake -c{threads} --use-conda
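
The "check that the raw data are there" step can be scripted before launching snakemake. The sketch below is an illustration only and rests on several assumptions not stated in this README: that fosmids.txt is whitespace-separated with no header row, that the second and third columns are file paths resolvable from the current directory, and that blank lines and lines starting with `#` can be skipped. The helper names `read_fosmids` and `missing_reads` are hypothetical.

```python
from pathlib import Path


def read_fosmids(table="fosmids.txt"):
    """Yield (name, left, right) tuples from a three-column table."""
    lines = Path(table).read_text().splitlines()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        # Assumption: blank lines and '#' comments carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                f"{table}:{line_no}: expected 3 columns, got {len(fields)}"
            )
        yield tuple(fields)


def missing_reads(table="fosmids.txt"):
    """Return (fosmid name, path) pairs for read files that do not exist."""
    missing = []
    for name, left, right in read_fosmids(table):
        for path in (left, right):
            if not Path(path).is_file():
                missing.append((name, path))
    return missing
```

If `missing_reads()` returns an empty list, every read file named in the table was found. If the columns hold sample identifiers rather than paths, the check would need to be adapted to however the workflow resolves them.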