nakakotanaka/nextstrain-cdv

Canine Distemper Virus (CDV) Nextstrain Phylogenetic Analysis

A Nextstrain workflow for phylogenetic analysis of canine distemper virus (CDV) sequences. This repository contains customized Snakemake workflows for constructing, annotating, and visualizing molecular phylogenetic trees of the CDV hemagglutinin (H) gene.

Interactive phylogenetic views are accessible through the Nextstrain community page.

A manuscript describing this work is in preparation as of January 2026. Please contact Kota Nakasato for details.

Overview

This workflow runs phylogenetic analyses of CDV sequences, focusing on the H gene. It defines four build targets:

  • H gene (Hemagglutinin): Complete H gene analysis
  • H gene recombination-free: H gene analysis with recombinant sequences excluded
  • pH gene (partial): Partial H gene analysis
  • pH gene recombination-free: Partial H gene analysis with recombinant sequences excluded

Each target produces a separate Nextstrain dataset that can be visualized in Auspice.

Repository Structure

nextstrain-cdv/
├── phylogenetic/           # Main phylogenetic workflow
│   ├── config/             # Configuration files and reference sequences
│   ├── rules/              # Snakemake rules for each workflow step
│   ├── data/               # Input metadata and sequences
│   ├── results/            # Workflow outputs
│   ├── auspice/            # Final Auspice JSON outputs
│   └── Snakefile           # Main workflow orchestration
├── docs/                   # Documentation
├── CHANGELOG.md            # Version history
└── nextstrain-pathogen.yaml # Pathogen configuration

Quick Start

Running the Complete Workflow

# From the repository root
nextstrain build phylogenetic

# Or from within the phylogenetic directory
cd phylogenetic
nextstrain build .

This will:

  1. Prepare and filter sequences for each gene target
  2. Perform multiple sequence alignment (MAFFT)
  3. Construct phylogenetic trees (IQ-TREE)
  4. Refine branch lengths and dates (TreeTime)
  5. Infer ancestral traits
  6. Generate interactive Auspice visualizations

Outputs

Generated files are located in:

  • Phylogenetic trees: phylogenetic/results/[gene]/tree.nwk
  • Alignments: phylogenetic/results/[gene]/aligned_filtered.fasta
  • Auspice JSON: phylogenetic/auspice/cdv_*.json
  • Detailed results: phylogenetic/results/[gene]/ (mutations, traits, etc.)
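
After a build, it can be useful to confirm that the per-gene files listed above were actually written. The following is a minimal sketch, not part of the repository: the helper names and the gene label `h` are hypothetical, since the source only gives the `[gene]` placeholder.

```python
from pathlib import Path


def expected_outputs(gene: str, root: Path = Path("phylogenetic")) -> dict[str, Path]:
    """Build the per-gene output paths named in the Outputs list."""
    results = root / "results" / gene
    return {
        "tree": results / "tree.nwk",
        "alignment": results / "aligned_filtered.fasta",
    }


def missing_outputs(gene: str, root: Path = Path("phylogenetic")) -> list[str]:
    """Return the names of expected outputs that do not exist as files."""
    return [
        name
        for name, path in expected_outputs(gene, root).items()
        if not path.is_file()
    ]


if __name__ == "__main__":
    # "h" is an assumed directory name; use whatever appears under results/.
    print(missing_outputs("h"))
```

An empty list means both the tree and the filtered alignment are present for that gene.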

Configuration

Data Files

Input data files are specified in phylogenetic/config/config.yaml:

  • Metadata: Tab-separated file with strain information (name, date, region, country, host, etc.)
  • Sequences: FASTA file with CDV sequences aligned to reference genomes
  • Reference genomes: GenBank format reference sequences for each gene
  • Exclusion lists: Strain IDs to exclude from analysis (e.g., low-quality sequences)
  • Recombination lists: Identified recombinant strains for alternative builds
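
To illustrate how the exclusion and recombination lists relate to the standard and recombination-free builds, here is a small sketch. It assumes each list is a plain text file with one strain ID per line and `#` comments, and that strain IDs themselves contain no `#`; the function names are hypothetical and the repository's actual rules may handle these lists differently.

```python
from typing import Iterable


def load_strain_list(lines: Iterable[str]) -> set[str]:
    """Parse strain IDs, ignoring blank lines and '#' comments."""
    strains = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # drop trailing comments
        if entry:
            strains.add(entry)
    return strains


def select_strains(strains: Iterable[str], dropped: set[str]) -> list[str]:
    """Keep strains not in the dropped set, preserving input order."""
    return [s for s in strains if s not in dropped]


if __name__ == "__main__":
    # Illustrative IDs only; these are not strains from the dataset.
    excluded = load_strain_list(["# low quality", "strainA"])
    recombinants = load_strain_list(["strainC"])
    all_strains = ["strainA", "strainC", "strainD"]
    print(select_strains(all_strains, excluded))                 # standard build
    print(select_strains(all_strains, excluded | recombinants))  # recombination-free build
```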

Workflow Details

Sequence Preparation

Four separate rules prepare sequences for each gene target:

  • prepare_sequences_cdv_h.smk: H gene with all sequences
  • prepare_sequences_cdv_h_rec_free.smk: H gene excluding recombinants
  • prepare_sequences_cdv_pH.smk: Partial H gene with all sequences
  • prepare_sequences_cdv_pH_rec_free.smk: Partial H gene excluding recombinants

Each preparation step includes:

  • Sequence filtering by length
  • Alignment to reference genome
  • Extraction of region of interest
  • Quality-based filtering

Phylogenetic Inference

Trees are built with IQ-TREE using automatic model selection. Branch lengths and divergence times are then refined with TreeTime under a coalescent model.

Trait Inference

Ancestral character states are inferred for:

  • Geographic location (region, country, division)
  • Host species
  • Recombination status

Data Export

Final datasets are exported in Auspice JSON format for interactive visualization in web browsers.

Customization and Development

Modifying the Workflow

The main workflow is defined in phylogenetic/Snakefile. Individual workflow steps are implemented as modular rules in phylogenetic/rules/.

To modify specific workflow steps, edit the corresponding .smk file in the rules/ directory.

Adding New Gene Targets

To add analysis of additional CDV genomic regions:

  1. Create a new preparation rule file: rules/prepare_sequences_cdv_[gene].smk
  2. Define appropriate filtering parameters
  3. Add the gene to the genes list in the Snakefile
  4. Add the new Auspice JSON output to the targets of rule all
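
Steps 3 and 4 amount to extending a list of gene labels and deriving one Auspice target per label. The sketch below shows that mapping in plain Python, similar in spirit to what Snakemake's `expand()` does inside a Snakefile. The gene labels are guessed from the rule file names, the `cdv_{gene}.json` pattern is inferred from the `cdv_*.json` outputs, and `new_gene` is a placeholder; check the Snakefile for the real values.

```python
from typing import Iterable, List

# Assumed labels, inferred from the prepare_sequences_cdv_*.smk file names.
GENES = ["h", "h_rec_free", "pH", "pH_rec_free"]


def auspice_targets(genes: Iterable[str], auspice_dir: str = "auspice") -> List[str]:
    """Map each gene label to its expected Auspice JSON path."""
    return [f"{auspice_dir}/cdv_{gene}.json" for gene in genes]


if __name__ == "__main__":
    # Adding a hypothetical new target yields one extra output file.
    for target in auspice_targets(GENES + ["new_gene"]):
        print(target)
```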

Contact

For questions or issues, please open a GitHub issue or contact the repository maintainer.
