wisp

WISP ctDNA tumor fraction estimation workflow. The pipeline proceeds through seven major steps: (1) Primary Tumor Characterization - AMBER calculates B-allele frequencies at known heterozygous germline SNP positions, COBALT measures read depth ratios across the genome for copy number analysis, while a pre-computed SAGE VCF provides somatic variants. (2) Variant Annotation - PAVE annotates the SAGE VCF with gene and transcript coding effects, protein impacts, and other functional annotations. (3) Purity and Ploidy Estimation - PURPLE integrates AMBER, COBALT, and PAVE outputs to estimate tumor purity, overall ploidy, and genome-wide copy number segments, establishing the reference somatic profile. (4) Plasma Sample Processing - AMBER and COBALT run on the plasma sample against the matched normal, generating allele frequency and read depth data from cfDNA. (5) Data Integration - AMBER and COBALT outputs from both primary and plasma analyses are merged into unified directories as required by WISP. (6) Variant Force-Calling - SAGE runs in append mode on the plasma sample, force-calling read evidence at variant positions from the primary tumor. (7) Tumor Fraction Estimation - WISP integrates all outputs to estimate ctDNA fraction by examining variant allele frequencies in plasma at known somatic sites. Requirements: whole-genome sequencing data with sufficient coverage; chromosome-subset or targeted panel data will not produce valid estimates.

Overview

Dependencies

Usage

Cromwell

java -jar cromwell.jar run wisp.wdl --inputs inputs.json

Inputs

Required workflow parameters:

Parameter	Value	Description
`tumour_bam`	File	Input tumor file (bam) of primary sample
`tumour_bai`	File	Input tumor file index (bai) of primary sample
`normal_bam`	File	Input normal file (bam) of primary sample
`normal_bai`	File	Input normal file index (bai) of primary sample
`plasma_bam`	File	Input of bam file of plasma from same donor as primary sample
`plasma_bai`	File	Input of bam index file of plasma from same donor
`donor`	String	Patient identifier
`sage_primary_vcf`	File	Pre-computed SAGE VCF from primary tumor/normal run
`sage_primary_vcf_index`	File	Index for pre-computed SAGE VCF

Optional workflow parameters:

Parameter	Value	Default	Description
`genomeVersion`	String	"hg38"	Genome Version, only 38 supported
`chromosomes`	Array[String]	["chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22,chrX,chrY"]	List of chromosomes to process in parallel

Optional task parameters:

Parameter	Value	Default	Description
`extractTumorName.memory`	Int	4	Memory allocated for this job (GB)
`extractTumorName.heapRam`	Int	1	Heap RAM allocation for GATK (GB)
`extractTumorName.timeout`	Int	4	Hours before task timeout
`extractNormalName.memory`	Int	4	Memory allocated for this job (GB)
`extractNormalName.heapRam`	Int	1	Heap RAM allocation for GATK (GB)
`extractNormalName.timeout`	Int	4	Hours before task timeout
`splitPonByChromosome.memory`	Int	4	Memory allocated for this job (GB)
`splitPonByChromosome.timeout`	Int	10	Hours before task timeout
`amberPrimaryChr.min_mapping_quality`	Int	30	Minimum mapping quality for an alignment to be used
`amberPrimaryChr.min_base_quality`	Int	25	Minimum quality for a base to be considered
`amberPrimaryChr.heapRam`	Int	32	Heap RAM allocation for AMBER (GB)
`amberPrimaryChr.additionalParameters`	String	""	Additional parameters to pass to AMBER
`amberPrimary.memory`	Int	8	Memory allocated for this job (GB)
`amberPrimary.timeout`	Int	10	Hours before task timeout
`cobaltPrimary.gamma`	String	"300"	gamma (penalty) value for segmenting
`cobaltPrimary.min_mapping_quality`	Int	30	Minimum mapping quality for an alignment to be used
`cobaltPrimary.threads`	Int	8	Requested CPU threads
`cobaltPrimary.memory`	Int	32	Memory allocated for this job (GB)
`cobaltPrimary.heapRam`	Int	8	Heap RAM allocation for COBALT (GB)
`cobaltPrimary.timeout`	Int	100	Hours before task timeout
`cobaltPrimary.additionalParameters`	String	""	Additional parameters to pass to COBALT
`pave.driverGenePanel`	String?	None	Driver gene panel file (optional)
`pave.threads`	Int	8	Number of threads
`pave.memory`	Int	32	Memory allocated for this job (GB)
`pave.heapRam`	Int	16	Heap RAM allocation for Pave (GB)
`pave.timeout`	Int	24	Hours before task timeout
`pave.additionalParameters`	String	""	Additional parameters to pass to Pave
`purple.solution_name`	String	"Primary"	Name of solution
`purple.outfilePrefix`	String	tumour_name + ".sol" + solution_name	Prefix of output file
`purple.min_diploid_tumor_ratio_count`	Int	60	smooth over contiguous segments
`purple.min_ploidy`	String?	None	minimum ploidy to consider
`purple.max_ploidy`	String?	None	maximum ploidy to consider
`purple.min_purity`	String?	None	minimum purity to consider
`purple.max_purity`	String?	None	maximum purity to consider
`purple.ploidy_penalty_factor`	String?	None	ploidy penalty factor
`purple.ploidy_penalty_standard_deviation`	String?	None	ploidy penalty standard deviation
`purple.threads`	Int	8	Requested CPU threads
`purple.memory`	Int	32	Memory allocated for this job (GB)
`purple.heapRam`	Int	8	Heap RAM allocation for PURPLE (GB)
`purple.timeout`	Int	100	Hours before task timeout
`purple.additionalParameters`	String	""	Additional parameters to pass to PURPLE
`amberPlasmaChr.min_mapping_quality`	Int	30	Minimum mapping quality for an alignment to be used
`amberPlasmaChr.min_base_quality`	Int	25	Minimum quality for a base to be considered
`amberPlasmaChr.heapRam`	Int	32	Heap RAM allocation for AMBER (GB)
`amberPlasmaChr.additionalParameters`	String	""	Additional parameters to pass to AMBER
`amberPlasma.memory`	Int	8	Memory allocated for this job (GB)
`amberPlasma.timeout`	Int	10	Hours before task timeout
`cobaltPlasma.gamma`	String	"300"	gamma (penalty) value for segmenting
`cobaltPlasma.min_mapping_quality`	Int	30	Minimum mapping quality for an alignment to be used
`cobaltPlasma.threads`	Int	8	Requested CPU threads
`cobaltPlasma.memory`	Int	32	Memory allocated for this job (GB)
`cobaltPlasma.heapRam`	Int	8	Heap RAM allocation for COBALT (GB)
`cobaltPlasma.timeout`	Int	100	Hours before task timeout
`cobaltPlasma.additionalParameters`	String	""	Additional parameters to pass to COBALT
`sage_append.driverGenePanel`	String?	None	Driver gene panel TSV (optional)
`sage_append.min_map_quality`	Int	10	Minimum mapping quality
`sage_append.hard_min_tumor_qual`	Int	50	Hard minimum tumor quality
`sage_append.hard_min_tumor_raw_alt_support`	Int	2	Minimum raw alt support
`sage_append.hard_min_tumor_vaf`	Float	0.002	Minimum tumor VAF
`sage_append.threads`	Int	8	Requested CPU threads
`sage_append.memory`	Int	40	Memory allocated for this job (GB)
`sage_append.heapRam`	Int	32	Heap RAM allocation for SAGE (GB)
`sage_append.timeout`	Int	24	Hours before task timeout
`sage_append.additionalParameters`	String	""	Additional parameters to pass to SAGE
`annotatePrimaryVcfWithPurple.modules`	String	"bcftools/1.9"	Required environment modules
`generateProbeVariants.outputFileName`	String	"probe_variants.csv"	Output CSV filename for probe variants
`generateProbeVariants.jobMemory`	Int	4	Memory allocated for this job (GB)
`generateProbeVariants.timeout`	Int	1	Hours before task timeout
`runWisp.additionalParameters`	String	""	Additional parameters to pass to WISP
`runWisp.threads`	Int	4	Requested CPU threads
`runWisp.heapRam`	Int	16	Heap RAM allocation for WISP (GB)
`runWisp.memory`	Int	16	Memory allocated for this job (GB)
`runWisp.timeout`	Int	50	Hours before task timeout

Outputs

Output	Type	Description
`wisp_directory`	File	Zipped WISP output directory
`wisp_summary`	File	WISP tumor fraction summary
`wisp_somatic_variants`	File	Per-variant SNV results
`sage_plasma_vcf`	File	SAGE VCF with plasma sample appended
`sage_plasma_vcf_index`	File	Index for SAGE plasma VCF

Commands

This section lists command(s) run by WORKFLOW workflow

Running WORKFLOW

    set -euo pipefail

    if [ -f "~{inputBam}" ]; then
      gatk --java-options "-Xmx~{heapRam}g" GetSampleName -R ~{refFasta} -I ~{inputBam} -O input_name.txt -encode
    fi

    cat input_name.txt

    set -euo pipefail
    
    # Extract just this chromosome from PON
    bcftools view -r ~{chromosome} ~{PON} -O z -o pon.~{chromosome}.vcf.gz
    tabix -p vcf pon.~{chromosome}.vcf.gz

    set -euo pipefail

    mkdir ~{file_prefix}.amber
    genome_version=$(if [ "~{genomeVersion}" == "hg38" ]; then echo "38"; else echo "19"; fi)

    java -Xmx~{heapRam}G -cp $HMFTOOLS_ROOT/amber.jar com.hartwig.hmftools.amber.AmberApplication \
      -reference ~{normal_name} -reference_bam ~{normal_bam} \
      -tumor ~{tumour_name} -tumor_bam ~{tumour_bam} \
      -output_dir ~{file_prefix}.amber/ \
      -loci ~{PON} \
      -ref_genome_version ${genome_version} \
      -min_mapping_quality ~{min_mapping_quality} \
      -min_base_quality ~{min_base_quality} \
      ~{additionalParameters}

    zip -r ~{file_prefix}.amber.zip ~{file_prefix}.amber/

    set -euo pipefail

    mkdir -p ~{tumour_name}.amber
    mkdir -p temp

    # Unzip all chromosome results
    for zip_file in ~{sep=' ' chr_zips}; do
      unzip -o ${zip_file} -d temp/
    done

    first_baf=$(ls temp/*/*.amber.baf.tsv.gz | sort -V | sed -n '1p')
    zcat "$first_baf" | sed -n '1p' > temp_header.txt
    zcat temp/*/*.amber.baf.tsv.gz | grep -v "^chromosome" | sort -k1,1V -k2,2n > temp_data.txt
    cat temp_header.txt temp_data.txt | gzip > ~{tumour_name}.amber/~{tumour_name}.amber.baf.tsv.gz
    rm temp_header.txt temp_data.txt

    first_pcf=$(ls temp/*/*.amber.baf.pcf | sort -V | sed -n '1p')
    sed -n '1p' "$first_pcf" > temp_header_pcf.txt
    tail -q -n +2 temp/*/*.amber.baf.pcf | sort -k2,2V -k4,4n > temp_data_pcf.txt
    cat temp_header_pcf.txt temp_data_pcf.txt > ~{tumour_name}.amber/~{tumour_name}.amber.baf.pcf
    rm temp_header_pcf.txt temp_data_pcf.txt

    cp $(find temp -name "*.amber.qc" -type f | sed -n '1p') ~{tumour_name}.amber/~{tumour_name}.amber.qc

    if ls temp/*/*.amber.contamination.tsv 1> /dev/null 2>&1; then
      {
        head -n1 $(ls temp/*/*.amber.contamination.tsv | sort -V | head -n1)
        tail -q -n +2 temp/*/*.amber.contamination.tsv | sort -k1,1V -k2,2n
      } > ~{tumour_name}.amber/~{tumour_name}.amber.contamination.tsv
    fi

    if ls temp/*/*.amber.contamination.vcf.gz 1> /dev/null 2>&1; then
      bcftools concat $(ls temp/*/*.amber.contamination.vcf.gz | sort -V) | \
        bcftools sort -O z -o ~{tumour_name}.amber/~{tumour_name}.amber.contamination.vcf.gz
      tabix -p vcf ~{tumour_name}.amber/~{tumour_name}.amber.contamination.vcf.gz
    fi

    if ls temp/*/*.amber.homozygousregion.tsv 1> /dev/null 2>&1; then
      {
        head -n1 $(ls temp/*/*.amber.homozygousregion.tsv | sort -V | head -n1)
        tail -q -n +2 temp/*/*.amber.homozygousregion.tsv | sort -k1,1V -k2,2n
      } > ~{tumour_name}.amber/~{tumour_name}.amber.homozygousregion.tsv
    fi

    if find temp -name "amber.version" -type f | grep -q .; then
      cp $(find temp -name "amber.version" -type f | head -n1) ~{tumour_name}.amber/amber.version
    fi

    zip -r ~{tumour_name}.amber.zip ~{tumour_name}.amber/

    set -euo pipefail

    mkdir ~{tumour_name}.cobalt

    java -Xmx~{heapRam}G -cp $HMFTOOLS_ROOT/cobalt.jar com.hartwig.hmftools.cobalt.CobaltApplication \
      -reference ~{normal_name} -reference_bam ~{normal_bam} \
      -tumor ~{tumour_name} -tumor_bam ~{tumour_bam} \
      -output_dir ~{tumour_name}.cobalt/ \
      -gc_profile ~{gcProfile} \
      -pcf_gamma ~{gamma} \
      -min_quality ~{min_mapping_quality} \
      ~{additionalParameters}

    zip -r ~{tumour_name}.cobalt.zip ~{tumour_name}.cobalt/

    mkdir -p ~{output_name}
    unzip -o ~{primary_zip} -d ~{output_name}/
    unzip -o ~{plasma_zip} -d ~{output_name}/
    # Flatten if nested
    find ~{output_name} -mindepth 2 -type f -exec mv {} ~{output_name}/ \;
    find ~{output_name} -mindepth 1 -type d -empty -delete
    zip -r ~{output_name}.zip ~{output_name}/

    set -euo pipefail

    echo $SAGE_ROOT
    ls -la ${SAGE_ROOT}/

    java -Xmx~{heapRam}G -cp $SAGE_ROOT/sage.jar com.hartwig.hmftools.sage.append.SageAppendApplication \
      -reference ~{reference_name} \
      -reference_bam ~{reference_bam} \
      -ref_genome_version 38 \
      -ref_genome ~{refFasta} \
      ~{"-driver_gene_panel " + driverGenePanel} \
      -input_vcf ~{input_vcf} \
      -output_vcf ~{reference_name}.sage.vcf.gz \
      -threads ~{threads} \
      -min_map_quality ~{min_map_quality} \
      -hard_min_tumor_qual ~{hard_min_tumor_qual} \
      -hard_min_tumor_raw_alt_support ~{hard_min_tumor_raw_alt_support} \
      -hard_min_tumor_vaf ~{hard_min_tumor_vaf} \
      -skip_msi_jitter \
      ~{additionalParameters}

    mkdir -p ~{reference_name}.sage.bqr/
    mv *.sage.bqr.tsv  ~{reference_name}.sage.bqr/
    zip -r ~{reference_name}.sage.bqr.zip ~{reference_name}.sage.bqr/

    set -euo pipefail

    # Unzip input directories
    unzip ~{purple_dir}
    unzip ~{amber_dir}
    unzip ~{cobalt_dir}
    unzip ~{bqr_dir}

    # Create output directory
    mkdir -p ~{plasma_name}.wisp

    # Run WISP with skip_bqr
    java -Xmx~{heapRam}G -jar $WISP_ROOT/wisp.jar \
      -patient_id ~{donor} \
      -tumor_id ~{tumour_name} \
      -samples ~{plasma_name} \
      -purple_dir ~{tumour_name}.solPrimary.purple/ \
      -amber_dir ~{tumour_name}.amber/ \
      -cobalt_dir ~{tumour_name}.cobalt/ \
      -bqr_dir ~{plasma_name}.sage.bqr \
      -somatic_vcf ~{somatic_vcf} \
      -ref_genome ~{refFasta} \
      -output_dir ~{plasma_name}.wisp/ \
      -probe_variants_file ~{probe_variants_file} \
      -threads ~{threads} \
      -write_types ALL \
      ~{additionalParameters}

    # Zip output
    zip -r ~{plasma_name}.wisp.zip ~{plasma_name}.wisp/
    cp ~{plasma_name}.wisp/~{donor}_~{plasma_name}.wisp.summary.tsv ~{plasma_name}.wisp.summary.tsv 
    cp ~{plasma_name}.wisp/~{donor}_~{plasma_name}.wisp.somatic_variants.tsv ~{plasma_name}.wisp.somatic_variants.tsv

    set -euo pipefail

    unzip ~{amber_directory}
    unzip ~{cobalt_directory}
    mkdir ~{outfilePrefix}.purple
    genome_version=$(if [ "~{genomeVersion}" == "hg38" ]; then echo "38"; else echo "19"; fi)

    java -Xmx~{heapRam}G -jar $HMFTOOLS_ROOT/purple.jar \
      -ref_genome_version ${genome_version} \
      -ref_genome ~{refFasta}  \
      -gc_profile ~{gcProfile} \
      -ensembl_data_dir ~{ensemblDir}  \
      -reference ~{normal_name} -tumor ~{tumour_name}  \
      -amber ~{tumour_name}.amber -cobalt ~{tumour_name}.cobalt \
      ~{"-ploidy_penalty_factor " + ploidy_penalty_factor} \
      ~{"-ploidy_penalty_standard_deviation " + ploidy_penalty_standard_deviation} \
      ~{"-somatic_vcf " + somatic_vcf} \
      ~{"-min_ploidy " + min_ploidy} \
      ~{"-max_ploidy " + max_ploidy} \
      ~{"-min_purity " + min_purity} \
      ~{"-max_purity " + max_purity} \
      -no_charts \
      -min_diploid_tumor_ratio_count ~{min_diploid_tumor_ratio_count} \
      -output_dir ~{outfilePrefix}.purple \
      ~{additionalParameters}

    zip -r ~{outfilePrefix}.purple.zip ~{outfilePrefix}.purple/

    # Extract Purple INFO annotations and merge into plasma VCF
    bcftools annotate \
      -a ~{purple_vcf} \
      -c INFO/SUBCL,INFO/PURPLE_VCN,INFO/PURPLE_AF,INFO/PURPLE_CN \
      ~{sage_vcf} \
      -Oz -o annotated.vcf.gz
    
    tabix -p vcf annotated.vcf.gz

    set -euo pipefail

    # Generate probe variants CSV from PASS + HIGH_CONFIDENCE variants
    zcat ~{annotated_vcf} | \
      grep -v "^#" | \
      awk 'BEGIN{print "TumorId,Category,Variant"}
           $7=="PASS" && $8~"TIER=HIGH_CONFIDENCE" {
             printf "~{tumor_id},REPORTABLE_MUTATION,%s:%s %s>%s\n", $1, $2, $4, $5
           }' > ~{outputFileName}

    # Verify file has content
    line_count=$(wc -l < ~{outputFileName})
    if [ "$line_count" -le 1 ]; then
      echo "WARNING: No PASS + HIGH_CONFIDENCE variants found"
      echo "This may be expected for downsampled/test data"
    fi

    echo "Generated probe variants file with $((line_count - 1)) variants"

    set -euo pipefail

    genome_version=$(if [ "~{genomeVersion}" == "hg38" ]; then echo "38"; else echo "37"; fi)

    java -Xmx~{heapRam}G -jar $PAVE_ROOT/pave.jar \
      -sample ~{sample} \
      -input_vcf ~{vcf_file} \
      -ref_genome ~{refFasta} \
      -ref_genome_version ${genome_version} \
      -ensembl_data_dir ~{ensemblDir} \
      ~{"-driver_gene_panel " + driverGenePanel} \
      -output_dir ./ \
      -threads ~{threads} \
      ~{additionalParameters}

Support

For support, please file an issue on the Github project or send an email to gsi@oicr.on.ca .

Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
tests		tests
CHANGELOG.md		CHANGELOG.md
Jenkinsfile		Jenkinsfile
README.md		README.md
commands.txt		commands.txt
vidarrbuild.json		vidarrbuild.json
vidarrtest-regression.json.in		vidarrtest-regression.json.in
wisp.wdl		wisp.wdl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

wisp

Overview

Dependencies

Usage

Cromwell

Inputs

Required workflow parameters:

Optional workflow parameters:

Optional task parameters:

Outputs

Commands

Support

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

wisp

Overview

Dependencies

Usage

Cromwell

Inputs

Required workflow parameters:

Optional workflow parameters:

Optional task parameters:

Outputs

Commands

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages