WISP ctDNA tumor fraction estimation workflow. The pipeline proceeds through seven major steps: (1) Primary Tumor Characterization - AMBER calculates B-allele frequencies at known heterozygous germline SNP positions, COBALT measures read depth ratios across the genome for copy number analysis, while a pre-computed SAGE VCF provides somatic variants. (2) Variant Annotation - PAVE annotates the SAGE VCF with gene and transcript coding effects, protein impacts, and other functional annotations. (3) Purity and Ploidy Estimation - PURPLE integrates AMBER, COBALT, and PAVE outputs to estimate tumor purity, overall ploidy, and genome-wide copy number segments, establishing the reference somatic profile. (4) Plasma Sample Processing - AMBER and COBALT run on the plasma sample against the matched normal, generating allele frequency and read depth data from cfDNA. (5) Data Integration - AMBER and COBALT outputs from both primary and plasma analyses are merged into unified directories as required by WISP. (6) Variant Force-Calling - SAGE runs in append mode on the plasma sample, force-calling read evidence at variant positions from the primary tumor. (7) Tumor Fraction Estimation - WISP integrates all outputs to estimate ctDNA fraction by examining variant allele frequencies in plasma at known somatic sites. Requirements: whole-genome sequencing data with sufficient coverage; chromosome-subset or targeted panel data will not produce valid estimates.
java -jar cromwell.jar run wisp.wdl --inputs inputs.json
| Parameter | Value | Description |
|---|---|---|
tumour_bam |
File | Input tumor file (bam) of primary sample |
tumour_bai |
File | Input tumor file index (bai) of primary sample |
normal_bam |
File | Input normal file (bam) of primary sample |
normal_bai |
File | Input normal file index (bai) of primary sample |
plasma_bam |
File | Input of bam file of plasma from same donor as primary sample |
plasma_bai |
File | Input of bam index file of plasma from same donor |
donor |
String | Patient identifier |
sage_primary_vcf |
File | Pre-computed SAGE VCF from primary tumor/normal run |
sage_primary_vcf_index |
File | Index for pre-computed SAGE VCF |
| Parameter | Value | Default | Description |
|---|---|---|---|
genomeVersion |
String | "hg38" | Genome Version, only 38 supported |
chromosomes |
Array[String] | ["chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr20", "chr21", "chr22,chrX,chrY"] | List of chromosomes to process in parallel |
| Parameter | Value | Default | Description |
|---|---|---|---|
extractTumorName.memory |
Int | 4 | Memory allocated for this job (GB) |
extractTumorName.heapRam |
Int | 1 | Heap RAM allocation for GATK (GB) |
extractTumorName.timeout |
Int | 4 | Hours before task timeout |
extractNormalName.memory |
Int | 4 | Memory allocated for this job (GB) |
extractNormalName.heapRam |
Int | 1 | Heap RAM allocation for GATK (GB) |
extractNormalName.timeout |
Int | 4 | Hours before task timeout |
splitPonByChromosome.memory |
Int | 4 | Memory allocated for this job (GB) |
splitPonByChromosome.timeout |
Int | 10 | Hours before task timeout |
amberPrimaryChr.min_mapping_quality |
Int | 30 | Minimum mapping quality for an alignment to be used |
amberPrimaryChr.min_base_quality |
Int | 25 | Minimum quality for a base to be considered |
amberPrimaryChr.heapRam |
Int | 32 | Heap RAM allocation for AMBER (GB) |
amberPrimaryChr.additionalParameters |
String | "" | Additional parameters to pass to AMBER |
amberPrimary.memory |
Int | 8 | Memory allocated for this job (GB) |
amberPrimary.timeout |
Int | 10 | Hours before task timeout |
cobaltPrimary.gamma |
String | "300" | gamma (penalty) value for segmenting |
cobaltPrimary.min_mapping_quality |
Int | 30 | Minimum mapping quality for an alignment to be used |
cobaltPrimary.threads |
Int | 8 | Requested CPU threads |
cobaltPrimary.memory |
Int | 32 | Memory allocated for this job (GB) |
cobaltPrimary.heapRam |
Int | 8 | Heap RAM allocation for COBALT (GB) |
cobaltPrimary.timeout |
Int | 100 | Hours before task timeout |
cobaltPrimary.additionalParameters |
String | "" | Additional parameters to pass to COBALT |
pave.driverGenePanel |
String? | None | Driver gene panel file (optional) |
pave.threads |
Int | 8 | Number of threads |
pave.memory |
Int | 32 | Memory allocated for this job (GB) |
pave.heapRam |
Int | 16 | Heap RAM allocation for Pave (GB) |
pave.timeout |
Int | 24 | Hours before task timeout |
pave.additionalParameters |
String | "" | Additional parameters to pass to Pave |
purple.solution_name |
String | "Primary" | Name of solution |
purple.outfilePrefix |
String | tumour_name + ".sol" + solution_name | Prefix of output file |
purple.min_diploid_tumor_ratio_count |
Int | 60 | smooth over contiguous segments |
purple.min_ploidy |
String? | None | minimum ploidy to consider |
purple.max_ploidy |
String? | None | maximum ploidy to consider |
purple.min_purity |
String? | None | minimum purity to consider |
purple.max_purity |
String? | None | maximum purity to consider |
purple.ploidy_penalty_factor |
String? | None | ploidy penalty factor |
purple.ploidy_penalty_standard_deviation |
String? | None | ploidy penalty standard deviation |
purple.threads |
Int | 8 | Requested CPU threads |
purple.memory |
Int | 32 | Memory allocated for this job (GB) |
purple.heapRam |
Int | 8 | Heap RAM allocation for PURPLE (GB) |
purple.timeout |
Int | 100 | Hours before task timeout |
purple.additionalParameters |
String | "" | Additional parameters to pass to PURPLE |
amberPlasmaChr.min_mapping_quality |
Int | 30 | Minimum mapping quality for an alignment to be used |
amberPlasmaChr.min_base_quality |
Int | 25 | Minimum quality for a base to be considered |
amberPlasmaChr.heapRam |
Int | 32 | Heap RAM allocation for AMBER (GB) |
amberPlasmaChr.additionalParameters |
String | "" | Additional parameters to pass to AMBER |
amberPlasma.memory |
Int | 8 | Memory allocated for this job (GB) |
amberPlasma.timeout |
Int | 10 | Hours before task timeout |
cobaltPlasma.gamma |
String | "300" | gamma (penalty) value for segmenting |
cobaltPlasma.min_mapping_quality |
Int | 30 | Minimum mapping quality for an alignment to be used |
cobaltPlasma.threads |
Int | 8 | Requested CPU threads |
cobaltPlasma.memory |
Int | 32 | Memory allocated for this job (GB) |
cobaltPlasma.heapRam |
Int | 8 | Heap RAM allocation for COBALT (GB) |
cobaltPlasma.timeout |
Int | 100 | Hours before task timeout |
cobaltPlasma.additionalParameters |
String | "" | Additional parameters to pass to COBALT |
sage_append.driverGenePanel |
String? | None | Driver gene panel TSV (optional) |
sage_append.min_map_quality |
Int | 10 | Minimum mapping quality |
sage_append.hard_min_tumor_qual |
Int | 50 | Hard minimum tumor quality |
sage_append.hard_min_tumor_raw_alt_support |
Int | 2 | Minimum raw alt support |
sage_append.hard_min_tumor_vaf |
Float | 0.002 | Minimum tumor VAF |
sage_append.threads |
Int | 8 | Requested CPU threads |
sage_append.memory |
Int | 40 | Memory allocated for this job (GB) |
sage_append.heapRam |
Int | 32 | Heap RAM allocation for SAGE (GB) |
sage_append.timeout |
Int | 24 | Hours before task timeout |
sage_append.additionalParameters |
String | "" | Additional parameters to pass to SAGE |
annotatePrimaryVcfWithPurple.modules |
String | "bcftools/1.9" | Required environment modules |
generateProbeVariants.outputFileName |
String | "probe_variants.csv" | Output CSV filename for probe variants |
generateProbeVariants.jobMemory |
Int | 4 | Memory allocated for this job (GB) |
generateProbeVariants.timeout |
Int | 1 | Hours before task timeout |
runWisp.additionalParameters |
String | "" | Additional parameters to pass to WISP |
runWisp.threads |
Int | 4 | Requested CPU threads |
runWisp.heapRam |
Int | 16 | Heap RAM allocation for WISP (GB) |
runWisp.memory |
Int | 16 | Memory allocated for this job (GB) |
runWisp.timeout |
Int | 50 | Hours before task timeout |
| Output | Type | Description | Labels |
|---|---|---|---|
wisp_directory |
File | Zipped WISP output directory | |
wisp_summary |
File | WISP tumor fraction summary | |
wisp_somatic_variants |
File | Per-variant SNV results | |
sage_plasma_vcf |
File | SAGE VCF with plasma sample appended | |
sage_plasma_vcf_index |
File | Index for SAGE plasma VCF |
This section lists command(s) run by WORKFLOW workflow
- Running WORKFLOW
set -euo pipefail
if [ -f "~{inputBam}" ]; then
gatk --java-options "-Xmx~{heapRam}g" GetSampleName -R ~{refFasta} -I ~{inputBam} -O input_name.txt -encode
fi
cat input_name.txt
set -euo pipefail
# Extract just this chromosome from PON
bcftools view -r ~{chromosome} ~{PON} -O z -o pon.~{chromosome}.vcf.gz
tabix -p vcf pon.~{chromosome}.vcf.gz
set -euo pipefail
mkdir ~{file_prefix}.amber
genome_version=$(if [ "~{genomeVersion}" == "hg38" ]; then echo "38"; else echo "19"; fi)
java -Xmx~{heapRam}G -cp $HMFTOOLS_ROOT/amber.jar com.hartwig.hmftools.amber.AmberApplication \
-reference ~{normal_name} -reference_bam ~{normal_bam} \
-tumor ~{tumour_name} -tumor_bam ~{tumour_bam} \
-output_dir ~{file_prefix}.amber/ \
-loci ~{PON} \
-ref_genome_version ${genome_version} \
-min_mapping_quality ~{min_mapping_quality} \
-min_base_quality ~{min_base_quality} \
~{additionalParameters}
zip -r ~{file_prefix}.amber.zip ~{file_prefix}.amber/
set -euo pipefail
mkdir -p ~{tumour_name}.amber
mkdir -p temp
# Unzip all chromosome results
for zip_file in ~{sep=' ' chr_zips}; do
unzip -o ${zip_file} -d temp/
done
first_baf=$(ls temp/*/*.amber.baf.tsv.gz | sort -V | sed -n '1p')
zcat "$first_baf" | sed -n '1p' > temp_header.txt
zcat temp/*/*.amber.baf.tsv.gz | grep -v "^chromosome" | sort -k1,1V -k2,2n > temp_data.txt
cat temp_header.txt temp_data.txt | gzip > ~{tumour_name}.amber/~{tumour_name}.amber.baf.tsv.gz
rm temp_header.txt temp_data.txt
first_pcf=$(ls temp/*/*.amber.baf.pcf | sort -V | sed -n '1p')
sed -n '1p' "$first_pcf" > temp_header_pcf.txt
tail -q -n +2 temp/*/*.amber.baf.pcf | sort -k2,2V -k4,4n > temp_data_pcf.txt
cat temp_header_pcf.txt temp_data_pcf.txt > ~{tumour_name}.amber/~{tumour_name}.amber.baf.pcf
rm temp_header_pcf.txt temp_data_pcf.txt
cp $(find temp -name "*.amber.qc" -type f | sed -n '1p') ~{tumour_name}.amber/~{tumour_name}.amber.qc
if ls temp/*/*.amber.contamination.tsv 1> /dev/null 2>&1; then
{
head -n1 $(ls temp/*/*.amber.contamination.tsv | sort -V | head -n1)
tail -q -n +2 temp/*/*.amber.contamination.tsv | sort -k1,1V -k2,2n
} > ~{tumour_name}.amber/~{tumour_name}.amber.contamination.tsv
fi
if ls temp/*/*.amber.contamination.vcf.gz 1> /dev/null 2>&1; then
bcftools concat $(ls temp/*/*.amber.contamination.vcf.gz | sort -V) | \
bcftools sort -O z -o ~{tumour_name}.amber/~{tumour_name}.amber.contamination.vcf.gz
tabix -p vcf ~{tumour_name}.amber/~{tumour_name}.amber.contamination.vcf.gz
fi
if ls temp/*/*.amber.homozygousregion.tsv 1> /dev/null 2>&1; then
{
head -n1 $(ls temp/*/*.amber.homozygousregion.tsv | sort -V | head -n1)
tail -q -n +2 temp/*/*.amber.homozygousregion.tsv | sort -k1,1V -k2,2n
} > ~{tumour_name}.amber/~{tumour_name}.amber.homozygousregion.tsv
fi
if find temp -name "amber.version" -type f | grep -q .; then
cp $(find temp -name "amber.version" -type f | head -n1) ~{tumour_name}.amber/amber.version
fi
zip -r ~{tumour_name}.amber.zip ~{tumour_name}.amber/
set -euo pipefail
mkdir ~{tumour_name}.cobalt
java -Xmx~{heapRam}G -cp $HMFTOOLS_ROOT/cobalt.jar com.hartwig.hmftools.cobalt.CobaltApplication \
-reference ~{normal_name} -reference_bam ~{normal_bam} \
-tumor ~{tumour_name} -tumor_bam ~{tumour_bam} \
-output_dir ~{tumour_name}.cobalt/ \
-gc_profile ~{gcProfile} \
-pcf_gamma ~{gamma} \
-min_quality ~{min_mapping_quality} \
~{additionalParameters}
zip -r ~{tumour_name}.cobalt.zip ~{tumour_name}.cobalt/
mkdir -p ~{output_name}
unzip -o ~{primary_zip} -d ~{output_name}/
unzip -o ~{plasma_zip} -d ~{output_name}/
# Flatten if nested
find ~{output_name} -mindepth 2 -type f -exec mv {} ~{output_name}/ \;
find ~{output_name} -mindepth 1 -type d -empty -delete
zip -r ~{output_name}.zip ~{output_name}/
set -euo pipefail
echo $SAGE_ROOT
ls -la ${SAGE_ROOT}/
java -Xmx~{heapRam}G -cp $SAGE_ROOT/sage.jar com.hartwig.hmftools.sage.append.SageAppendApplication \
-reference ~{reference_name} \
-reference_bam ~{reference_bam} \
-ref_genome_version 38 \
-ref_genome ~{refFasta} \
~{"-driver_gene_panel " + driverGenePanel} \
-input_vcf ~{input_vcf} \
-output_vcf ~{reference_name}.sage.vcf.gz \
-threads ~{threads} \
-min_map_quality ~{min_map_quality} \
-hard_min_tumor_qual ~{hard_min_tumor_qual} \
-hard_min_tumor_raw_alt_support ~{hard_min_tumor_raw_alt_support} \
-hard_min_tumor_vaf ~{hard_min_tumor_vaf} \
-skip_msi_jitter \
~{additionalParameters}
mkdir -p ~{reference_name}.sage.bqr/
mv *.sage.bqr.tsv ~{reference_name}.sage.bqr/
zip -r ~{reference_name}.sage.bqr.zip ~{reference_name}.sage.bqr/
set -euo pipefail
# Unzip input directories
unzip ~{purple_dir}
unzip ~{amber_dir}
unzip ~{cobalt_dir}
unzip ~{bqr_dir}
# Create output directory
mkdir -p ~{plasma_name}.wisp
# Run WISP with skip_bqr
java -Xmx~{heapRam}G -jar $WISP_ROOT/wisp.jar \
-patient_id ~{donor} \
-tumor_id ~{tumour_name} \
-samples ~{plasma_name} \
-purple_dir ~{tumour_name}.solPrimary.purple/ \
-amber_dir ~{tumour_name}.amber/ \
-cobalt_dir ~{tumour_name}.cobalt/ \
-bqr_dir ~{plasma_name}.sage.bqr \
-somatic_vcf ~{somatic_vcf} \
-ref_genome ~{refFasta} \
-output_dir ~{plasma_name}.wisp/ \
-probe_variants_file ~{probe_variants_file} \
-threads ~{threads} \
-write_types ALL \
~{additionalParameters}
# Zip output
zip -r ~{plasma_name}.wisp.zip ~{plasma_name}.wisp/
cp ~{plasma_name}.wisp/~{donor}_~{plasma_name}.wisp.summary.tsv ~{plasma_name}.wisp.summary.tsv
cp ~{plasma_name}.wisp/~{donor}_~{plasma_name}.wisp.somatic_variants.tsv ~{plasma_name}.wisp.somatic_variants.tsv
set -euo pipefail
unzip ~{amber_directory}
unzip ~{cobalt_directory}
mkdir ~{outfilePrefix}.purple
genome_version=$(if [ "~{genomeVersion}" == "hg38" ]; then echo "38"; else echo "19"; fi)
java -Xmx~{heapRam}G -jar $HMFTOOLS_ROOT/purple.jar \
-ref_genome_version ${genome_version} \
-ref_genome ~{refFasta} \
-gc_profile ~{gcProfile} \
-ensembl_data_dir ~{ensemblDir} \
-reference ~{normal_name} -tumor ~{tumour_name} \
-amber ~{tumour_name}.amber -cobalt ~{tumour_name}.cobalt \
~{"-ploidy_penalty_factor " + ploidy_penalty_factor} \
~{"-ploidy_penalty_standard_deviation " + ploidy_penalty_standard_deviation} \
~{"-somatic_vcf " + somatic_vcf} \
~{"-min_ploidy " + min_ploidy} \
~{"-max_ploidy " + max_ploidy} \
~{"-min_purity " + min_purity} \
~{"-max_purity " + max_purity} \
-no_charts \
-min_diploid_tumor_ratio_count ~{min_diploid_tumor_ratio_count} \
-output_dir ~{outfilePrefix}.purple \
~{additionalParameters}
zip -r ~{outfilePrefix}.purple.zip ~{outfilePrefix}.purple/
# Extract Purple INFO annotations and merge into plasma VCF
bcftools annotate \
-a ~{purple_vcf} \
-c INFO/SUBCL,INFO/PURPLE_VCN,INFO/PURPLE_AF,INFO/PURPLE_CN \
~{sage_vcf} \
-Oz -o annotated.vcf.gz
tabix -p vcf annotated.vcf.gz
set -euo pipefail
# Generate probe variants CSV from PASS + HIGH_CONFIDENCE variants
zcat ~{annotated_vcf} | \
grep -v "^#" | \
awk 'BEGIN{print "TumorId,Category,Variant"}
$7=="PASS" && $8~"TIER=HIGH_CONFIDENCE" {
printf "~{tumor_id},REPORTABLE_MUTATION,%s:%s %s>%s\n", $1, $2, $4, $5
}' > ~{outputFileName}
# Verify file has content
line_count=$(wc -l < ~{outputFileName})
if [ "$line_count" -le 1 ]; then
echo "WARNING: No PASS + HIGH_CONFIDENCE variants found"
echo "This may be expected for downsampled/test data"
fi
echo "Generated probe variants file with $((line_count - 1)) variants"
set -euo pipefail
genome_version=$(if [ "~{genomeVersion}" == "hg38" ]; then echo "38"; else echo "37"; fi)
java -Xmx~{heapRam}G -jar $PAVE_ROOT/pave.jar \
-sample ~{sample} \
-input_vcf ~{vcf_file} \
-ref_genome ~{refFasta} \
-ref_genome_version ${genome_version} \
-ensembl_data_dir ~{ensemblDir} \
~{"-driver_gene_panel " + driverGenePanel} \
-output_dir ./ \
-threads ~{threads} \
~{additionalParameters}
For support, please file an issue on the Github project or send an email to gsi@oicr.on.ca .
Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)