Canine Distemper Virus (CDV) Nextstrain Phylogenetic Analysis

A Nextstrain workflow for phylogenetic analysis of canine distemper virus (CDV) sequences. This repository contains customized Snakemake workflows for constructing, annotating, and visualizing molecular phylogenetic trees of the CDV H (hemagglutinin) gene.

Interactive phylogenetic views are available through the Nextstrain community page.

A manuscript describing this work is in preparation as of January 2026. Please contact Kota Nakasato for details.

Overview

This workflow performs phylogenetic analyses of CDV sequences, focusing on the H gene. It defines four targets:

H gene (Hemagglutinin): Complete H gene analysis
H gene recombination-free: H gene analysis with recombinant sequences excluded
pH gene (partial): Partial H gene analysis
pH gene recombination-free: Partial H gene analysis with recombinant sequences excluded

Each target produces a separate Nextstrain dataset that can be visualized in Auspice.

Repository Structure

nextstrain-cdv/
├── phylogenetic/           # Main phylogenetic workflow
│   ├── config/             # Configuration files and reference sequences
│   ├── rules/              # Snakemake rules for each workflow step
│   ├── data/               # Input metadata and sequences
│   ├── results/            # Workflow outputs
│   ├── auspice/            # Final Auspice JSON outputs
│   └── Snakefile           # Main workflow orchestration
├── docs/                   # Documentation
├── CHANGELOG.md            # Version history
└── nextstrain-pathogen.yaml # Pathogen configuration

Quick Start

Running the Complete Workflow

# From the repository root
nextstrain build phylogenetic

# Or from within the phylogenetic directory
cd phylogenetic
nextstrain build .

This will:

Prepare and filter sequences for each gene target
Perform multiple sequence alignment (MAFFT)
Construct phylogenetic trees (IQ-TREE)
Refine branch lengths and dates (TreeTime)
Infer ancestral traits
Generate interactive Auspice visualizations

Outputs

Generated files are located in:

Phylogenetic trees: phylogenetic/results/[gene]/tree.nwk
Alignments: phylogenetic/results/[gene]/aligned_filtered.fasta
Auspice JSON: phylogenetic/auspice/cdv_*.json
Detailed results: phylogenetic/results/[gene]/ (mutations, traits, etc.)
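
As a small convenience, the sketch below lists the Auspice JSON files a finished build has produced. It relies only on the phylogenetic/auspice/cdv_*.json pattern given above; the function name find_auspice_datasets is hypothetical and not part of this repository.

```python
from pathlib import Path


def find_auspice_datasets(repo_root="."):
    """Return the sorted names of Auspice JSON files under phylogenetic/auspice/."""
    auspice_dir = Path(repo_root) / "phylogenetic" / "auspice"
    # Match only files that follow the cdv_*.json naming pattern.
    return sorted(path.name for path in auspice_dir.glob("cdv_*.json"))


if __name__ == "__main__":
    # Run from the repository root after a build has completed.
    for name in find_auspice_datasets():
        print(name)
```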

Configuration

Data Files

Input data files are specified in phylogenetic/config/config.yaml:

Metadata: Tab-separated file with strain information (name, date, region, country, host, etc.)
Sequences: FASTA file with CDV sequences aligned to reference genomes
Reference genomes: GenBank format reference sequences for each gene
Exclusion lists: Strain IDs to exclude from analysis (e.g., low-quality sequences)
Recombination lists: Identified recombinant strains for alternative builds
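
To illustrate how an exclusion list relates to the metadata table, the sketch below drops excluded strains from tab-separated metadata. It is not the workflow's own code. The strain column name, the one-ID-per-line list format with # comments, and both function names are assumptions made for illustration, and the sample rows are made up.

```python
import csv
import io


def read_exclusion_list(text):
    """Return the set of strain IDs in an exclusion list (one ID per line).

    Blank lines and anything after '#' are ignored (assumed format).
    """
    excluded = set()
    for line in text.splitlines():
        strain = line.split("#", 1)[0].strip()
        if strain:
            excluded.add(strain)
    return excluded


def filter_metadata(tsv_text, excluded, id_column="strain"):
    """Return metadata rows (as dicts) whose ID is not in `excluded`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row[id_column] not in excluded]


if __name__ == "__main__":
    # Made-up example records, not real CDV data.
    metadata = (
        "strain\tdate\tcountry\thost\n"
        "A1\t2019-05-01\tJapan\tdog\n"
        "B2\t2020-01-15\tUSA\traccoon\n"
    )
    exclusions = "# low-quality sequences\nB2  # too many ambiguous bases\n"
    kept = filter_metadata(metadata, read_exclusion_list(exclusions))
    print([row["strain"] for row in kept])
```

The same pattern would apply to a recombination list: reading it into a set and filtering against it yields the input for a recombination-free build.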

Workflow Details

Sequence Preparation

Four separate rule files prepare the sequences, one for each target:

prepare_sequences_cdv_h.smk: H gene with all sequences
prepare_sequences_cdv_h_rec_free.smk: H gene excluding recombinants
prepare_sequences_cdv_pH.smk: Partial H gene with all sequences
prepare_sequences_cdv_pH_rec_free.smk: Partial H gene excluding recombinants

Each preparation step includes:

Sequence filtering by length
Alignment to reference genome
Extraction of region of interest
Quality-based filtering
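
The filtering and extraction steps above can be sketched in plain Python as follows. This is a minimal illustration and not the rules' actual implementation. It assumes the input sequences are already aligned to the reference (the alignment itself is left to MAFFT), and the thresholds, coordinates, and function names are all hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts).upper() for n, parts in records.items()}


def extract_region(aligned_seq, start, end):
    """Slice a reference-aligned sequence using 1-based inclusive coordinates."""
    return aligned_seq[start - 1:end]


def ambiguous_fraction(seq):
    """Fraction of positions that are not A, C, G, or T (gaps count too)."""
    if not seq:
        return 1.0
    return sum(base not in "ACGT" for base in seq) / len(seq)


def prepare(records, min_length, start, end, max_ambiguous):
    """Apply the length filter, region extraction, and quality filter in order."""
    kept = {}
    for name, seq in records.items():
        if len(seq.replace("-", "")) < min_length:
            continue  # too few non-gap bases
        region = extract_region(seq, start, end)
        if ambiguous_fraction(region) > max_ambiguous:
            continue  # too many gaps or ambiguous bases in the region
        kept[name] = region
    return kept
```

For example, calling prepare(records, min_length=1500, start=1, end=1824, max_ambiguous=0.05) would keep only near-complete, clean sequences; these numbers are placeholders rather than the parameters used in this repository.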

Phylogenetic Inference

Trees are inferred with IQ-TREE using automatic model selection. Branch lengths and divergence times are then refined using the TreeTime coalescent model.

Trait Inference

Ancestral character states are inferred for:

Geographic location (region, country, division)
Host species
Recombination status

Data Export

Final datasets are exported in Auspice JSON format for interactive visualization in web browsers.

Customization and Development

Modifying the Workflow

The main workflow is defined in phylogenetic/Snakefile. Individual workflow steps are implemented as modular rules in phylogenetic/rules/.

To modify specific workflow steps, edit the corresponding .smk file in the rules/ directory.

Adding New Gene Targets

To add analysis of additional CDV genomic regions:

Create a new preparation rule file: rules/prepare_sequences_cdv_[gene].smk
Define appropriate filtering parameters
Add the gene to the genes list in the Snakefile
Add the new Auspice JSON output to the targets of rule all
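
The last two steps amount to extending a list and regenerating the output paths from it. The sketch below shows that idea in plain Python. The gene identifiers, the auspice/cdv_{gene}.json pattern, and the function name are assumptions based on the file names mentioned in this README, so check the Snakefile for the names actually used.

```python
# Hypothetical target names; the repository's actual genes list may differ.
genes = ["h", "h_rec_free", "pH", "pH_rec_free"]


def auspice_targets(genes, pattern="auspice/cdv_{gene}.json"):
    """Build one Auspice JSON output path per gene target."""
    return [pattern.format(gene=gene) for gene in genes]


if __name__ == "__main__":
    # "newgene" is a placeholder for the target being added.
    for path in auspice_targets(genes + ["newgene"]):
        print(path)
```

In a Snakefile, the same expansion is usually written with Snakemake's expand() helper inside rule all.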

Contact

For questions or issues, please open a GitHub issue or contact the repository maintainer.
