A Nextstrain workflow for phylogenetic analysis of canine distemper virus (CDV) sequences. This repository contains customized Snakemake workflows for constructing, annotating, and visualizing molecular phylogenetic trees of the CDV H gene.
Interactive phylogenetic views are accessible through the Nextstrain community page.
A manuscript describing this work is in preparation as of January 2026. Please contact Kota Nakasato for details.
This workflow performs phylogenetic analyses of CDV sequences, focusing on the H gene. It defines four targets:
- H gene (Hemagglutinin): Complete H gene analysis
- H gene recombination-free: H gene analysis with recombinant sequences excluded
- pH gene (partial): Partial H gene analysis
- pH gene recombination-free: Partial H gene analysis with recombinant sequences excluded
Each target produces a separate Nextstrain dataset that can be visualized in Auspice.
nextstrain-cdv/
├── phylogenetic/ # Main phylogenetic workflow
│ ├── config/ # Configuration files and reference sequences
│ ├── rules/ # Snakemake rules for each workflow step
│ ├── data/ # Input metadata and sequences
│ ├── results/ # Workflow outputs
│ ├── auspice/ # Final Auspice JSON outputs
│ └── Snakefile # Main workflow orchestration
├── docs/ # Documentation
├── CHANGELOG.md # Version history
└── nextstrain-pathogen.yaml # Pathogen configuration
# From the repository root
nextstrain build phylogenetic
# Or from within the phylogenetic directory
cd phylogenetic
nextstrain build .
This will:
- Prepare and filter sequences for each gene target
- Perform multiple sequence alignment (MAFFT)
- Construct phylogenetic trees (IQ-TREE)
- Refine branch lengths and dates (TreeTime)
- Infer ancestral traits
- Generate interactive Auspice visualizations
Generated files are located in:
- Phylogenetic trees: phylogenetic/results/[gene]/tree.nwk
- Alignments: phylogenetic/results/[gene]/aligned_filtered.fasta
- Auspice JSON: phylogenetic/auspice/cdv_*.json
- Detailed results: phylogenetic/results/[gene]/ (mutations, traits, etc.)
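To check which outputs a build actually produced, the documented locations can be globbed from the repository root. The following is a minimal sketch, not part of the workflow: the function name `list_outputs` is hypothetical, and it assumes each gene target has its own subdirectory under `phylogenetic/results/`.

```python
from pathlib import Path


def list_outputs(repo_root="."):
    """Collect workflow outputs found under the documented locations."""
    phylo = Path(repo_root) / "phylogenetic"
    return {
        # One tree and one filtered alignment per gene directory (assumed layout).
        "trees": sorted(phylo.glob("results/*/tree.nwk")),
        "alignments": sorted(phylo.glob("results/*/aligned_filtered.fasta")),
        # Final datasets for Auspice.
        "auspice": sorted(phylo.glob("auspice/cdv_*.json")),
    }
```

Calling `list_outputs()` from the repository root returns a dictionary of sorted `Path` lists; an empty list means the corresponding step has not produced output yet.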
Input data files are specified in phylogenetic/config/config.yaml:
- Metadata: Tab-separated file with strain information (name, date, region, country, host, etc.)
- Sequences: FASTA file with CDV sequences aligned to reference genomes
- Reference genomes: GenBank format reference sequences for each gene
- Exclusion lists: Strain IDs to exclude from analysis (e.g., low-quality sequences)
- Recombination lists: Identified recombinant strains for alternative builds
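As an illustration of how an exclusion or recombination list relates to the metadata, the sketch below drops listed strains from a tab-separated metadata file. It is not the workflow's own code. It assumes the list holds one strain ID per line with optional `#` comments, and that the metadata ID column is named `strain`; both are assumptions, so check `config.yaml` and your files for the actual names.

```python
import csv


def read_exclusions(path):
    """Return strain IDs from a text file, skipping blank lines and '#' comments."""
    with open(path, encoding="utf-8") as handle:
        stripped = (line.split("#", 1)[0].strip() for line in handle)
        return {line for line in stripped if line}


def filter_metadata(metadata_path, exclude_ids, id_column="strain"):
    """Return metadata rows whose ID is not in exclude_ids.

    The default id_column ("strain") is an assumed column name.
    """
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row for row in reader if row[id_column] not in exclude_ids]
```

The same two functions can serve a recombination-free build by passing the recombinant strain list as `exclude_ids`.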
Four separate rules prepare sequences for each gene target:
- prepare_sequences_cdv_h.smk: H gene with all sequences
- prepare_sequences_cdv_h_rec_free.smk: H gene excluding recombinants
- prepare_sequences_cdv_pH.smk: Partial H gene with all sequences
- prepare_sequences_cdv_pH_rec_free.smk: Partial H gene excluding recombinants
Each preparation step includes:
- Sequence filtering by length
- Alignment to reference genome
- Extraction of region of interest
- Quality-based filtering
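The length-filtering step can be pictured with a small standalone sketch. This is not the rule's actual implementation: the helper names are hypothetical, the threshold is a placeholder you would set per gene target, and counting only unambiguous bases (A, C, G, T) is one possible definition of length.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length):
    """Keep records with at least min_length unambiguous bases (A, C, G, T)."""
    kept = []
    for header, seq in records:
        n_valid = sum(seq.upper().count(base) for base in "ACGT")
        if n_valid >= min_length:
            kept.append((header, seq))
    return kept
```

For a file on disk, `filter_by_length(read_fasta(open(path)), min_length)` returns the surviving records; gaps and ambiguity codes such as `N` do not count toward the length.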
Phylogenetic trees are reconstructed with IQ-TREE using automatic model selection. Branch lengths and divergence times are then refined with TreeTime under a coalescent model.
Ancestral character state inference for:
- Geographic location (region, country, division)
- Host species
- Recombination status
Final datasets are exported in Auspice JSON format for interactive visualization in web browsers.
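An exported dataset can be sanity-checked without opening a browser. The sketch below assumes the Auspice v2 JSON layout, with top-level `meta` and `tree` keys and a single root object under `tree`; the function names are hypothetical.

```python
import json


def count_tips(root):
    """Count leaf nodes in an Auspice tree, iteratively to avoid deep recursion."""
    stack, tips = [root], 0
    while stack:
        node = stack.pop()
        children = node.get("children") or []
        if children:
            stack.extend(children)
        else:
            tips += 1
    return tips


def summarize_dataset(path):
    """Return the dataset title (None if absent) and the number of tips."""
    with open(path, encoding="utf-8") as handle:
        dataset = json.load(handle)
    return {
        "title": dataset.get("meta", {}).get("title"),
        "tips": count_tips(dataset["tree"]),
    }
```

Comparing the tip count of a recombination-free dataset with its full counterpart is a quick way to confirm that the recombinant strains were actually removed.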
The main workflow is defined in phylogenetic/Snakefile. Individual workflow steps are implemented as modular rules in phylogenetic/rules/.
To modify specific workflow steps, edit the corresponding .smk file in the rules/ directory.
To add analysis of additional CDV genomic regions:
- Create a new preparation rule file: rules/prepare_sequences_cdv_[gene].smk
- Define appropriate filtering parameters
- Add the gene to the genes list in the Snakefile
- Add new auspice output to the rule all target
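The relationship between the genes list, the rule files, and the final targets can be sketched in plain Python. The gene identifiers below are inferred from the rule file names listed earlier, and the output pattern `auspice/cdv_{gene}.json` is an assumption based on the documented `cdv_*.json` glob; the actual Snakefile may name things differently (a Snakefile would typically build such a list with Snakemake's `expand`).

```python
# Gene identifiers inferred from the existing rule file names (assumption).
GENES = ["h", "h_rec_free", "pH", "pH_rec_free"]


def rule_file(gene):
    """Path of the preparation rule file for a gene target."""
    return f"rules/prepare_sequences_cdv_{gene}.smk"


def auspice_target(gene):
    """Assumed Auspice JSON output path for a gene target."""
    return f"auspice/cdv_{gene}.json"


def all_targets(genes):
    """Final outputs that the top-level target would request."""
    return [auspice_target(gene) for gene in genes]
```

Adding a hypothetical new region, for example `all_targets(GENES + ["f"])`, yields one extra target, and `rule_file("f")` gives the name of the rule file that would need to exist for it.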
For questions or issues, please open a GitHub issue or contact the repository maintainer.