mice computes synteny blocks from genomes expressed as sequences of genomic elements.
These elements can come from a genome graph (e.g., unitigs of a compacted de Bruijn graph), or from any other segmentation such as k-mers, genes, or MUMs/MEMs.
The input of mice is a GFF file in which each feature has an ID attribute (1-based index) specifying the element used in the path spelling the genome or chromosome.
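To make the input format concrete, here is a minimal sketch of how a GFF file of this kind could be read back into one sequence of element IDs per genome or chromosome. It is an illustration, not mice's own parser. It assumes standard 9-column tab-separated GFF, that column 1 names the genome or chromosome, and that features appear in path order. The function name `read_paths` and the example records are made up for this sketch, and strand information is ignored.

```python
from collections import defaultdict


def read_paths(gff_lines):
    """Group the integer ID attributes by sequence name, keeping file order.

    Assumption: each feature line has 9 tab-separated columns and an
    'ID=<1-based index>' entry in the attributes column.
    """
    paths = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        seqid, attributes = cols[0], cols[8]
        attrs = dict(
            field.split("=", 1) for field in attributes.split(";") if "=" in field
        )
        if "ID" in attrs:
            paths[seqid].append(int(attrs["ID"]))
    return dict(paths)


# Made-up records for illustration only (not taken from the example dataset).
example = [
    "##gff-version 3",
    "genomeA\t.\tregion\t1\t40\t.\t+\t.\tID=1",
    "genomeA\t.\tregion\t41\t90\t.\t+\t.\tID=2",
    "genomeA\t.\tregion\t91\t130\t.\t-\t.\tID=3",
    "genomeB\t.\tregion\t1\t40\t.\t+\t.\tID=1",
    "genomeB\t.\tregion\t41\t80\t.\t+\t.\tID=3",
]

paths = read_paths(example)
```

With these records, `genomeA` is spelled by elements 1, 2, 3 and `genomeB` by elements 1, 3.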
mice is written in Rust, so you only need cargo to install it:
cargo install --path .
Alternatively, mice is available on bioconda (use conda or mamba):
mamba install -c bioconda mice
We provide five E. coli genomes as an example dataset.
-
Use the provided graph
A precomputed example/graph.gff.gz is included.
Uncompress it (for example: gunzip -c example/graph.gff.gz > graph.gff) and go directly to running mice.
-
(Optional) Build the pangenome graph yourself
Install ggcat:
conda install -c conda-forge -c bioconda ggcat
Build a compacted de Bruijn graph:
ggcat build -k 31 -s 1 -l example/list.txt -o graph.gfa --gfa-v1
Convert the graph to GFF:
git clone https://github.com/lucaparmigiani/gfa2gff.git
cd gfa2gff
make
cd ..
./gfa2gff/gfa2gff 31 graph.gfa $(ls -1 example/*.fna.gz) > graph.gff
-
Run mice
mice graph.gff
mice [OPTIONS] <GRAPH_INPUT>
<GRAPH_INPUT>: input graph file (GFF, or GFA with paths representing genomes)
-
-o, --out-dir <DIR>: Output directory (default: mice_output)
-
-r, --remove-dup <X>: Remove an element if it occurs more than X times in any genome (0 = disable, default: 0)
-
-m, --min-size <bp>: After first compression, drop unmerged elements shorter than <bp> base pairs, then recompress (default: 0)
-
-s, --no-group-by: Treat every path as its own genome
-
-h, --help, -V, --version
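As an illustration of what the --remove-dup option describes, the following sketch filters elements by copy number. It is not mice's implementation. It assumes that genomes are given as lists of element IDs and that an element exceeding the threshold in any one genome is dropped from every genome. The function name `remove_duplicated` and the example paths are hypothetical.

```python
from collections import Counter


def remove_duplicated(paths, max_copies):
    """Drop elements that occur more than max_copies times in any genome.

    Assumption: a banned element is removed from all genomes, not only from
    the genome where it is over-represented. max_copies == 0 disables the
    filter, mirroring the documented default.
    """
    if max_copies == 0:
        return {genome: list(path) for genome, path in paths.items()}
    banned = set()
    for path in paths.values():
        counts = Counter(path)
        banned.update(elem for elem, n in counts.items() if n > max_copies)
    return {
        genome: [elem for elem in path if elem not in banned]
        for genome, path in paths.items()
    }


# Made-up example: element 2 occurs twice in genome A.
example_paths = {"A": [1, 2, 2, 3], "B": [1, 2, 3]}
filtered = remove_duplicated(example_paths, 1)
```

Here element 2 exceeds the threshold of 1 in genome A, so under the stated assumption it is removed from both A and B.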
In <OUT_DIR>, mice writes:
output.gff: block annotations (GFF)
paths.txt: genomes rewritten as synteny blocks
partitions.txt: for each synteny block, the elements it contains