Context
While fine-tuning a PRRSV Nextclade dataset, I noticed that the QC summary fields qc.overallScore (numeric) and qc.overallStatus (good / mediocre / bad) are not included by default in the outputs of the ingest pipelines.
As far as I can tell, these fields provide a concise, high-level summary of confidence in a Nextclade assignment, and are generally useful for downstream filtering and interpretation.
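As an illustration of the downstream filtering these fields enable, here is a minimal sketch that keeps only sequences whose qc.overallStatus is in an allowed set. It assumes a tab-separated Nextclade output with qc.overallStatus as a column header. The sample rows, clade labels and the helper name passing_sequences are made up for illustration and are not taken from any real dataset.

```python
import csv
import io

# Made-up excerpt of a Nextclade TSV; real output has many more columns.
NEXTCLADE_TSV = (
    "seqName\tclade\tqc.overallScore\tqc.overallStatus\n"
    "seq1\tA\t5.0\tgood\n"
    "seq2\tB\t45.2\tmediocre\n"
    "seq3\t\t210.7\tbad\n"
)


def passing_sequences(tsv_text, allowed=("good",)):
    """Return seqName values whose qc.overallStatus is in `allowed` (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["seqName"] for row in reader if row["qc.overallStatus"] in allowed]


print(passing_sequences(NEXTCLADE_TSV))
print(passing_sequences(NEXTCLADE_TSV, allowed=("good", "mediocre")))
```

Filtering on the categorical status keeps the threshold logic inside Nextclade. Filtering on qc.overallScore instead would let a user pick their own cutoff.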
Description
QC summary fields should be available by default in ingested Nextclade outputs, alongside other QC metrics.
Possible solution
The main change is to add the following mappings to the default Nextclade ingest config:
qc.overallScore: "QC_overall_score"
qc.overallStatus: "QC_overall_status"
Specifically, in pathogen-repo-guide/ingest/defaults/nextclade_config.yaml (lines 11 to 23 in e42caed):
field_map:
  seqName: "seqName"
  clade: "clade"
  coverage: "coverage"
  totalMissing: "missing_data"
  totalSubstitutions: "divergence"
  totalNonACGTNs: "nonACGTN"
  qc.missingData.status: "QC_missing_data"
  qc.mixedSites.status: "QC_mixed_sites"
  qc.privateMutations.status: "QC_rare_mutations"
  qc.frameShifts.status: "QC_frame_shifts"
  qc.stopCodons.status: "QC_stop_codons"
  frameShifts: "frame_shifts"
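To show what the two extra entries would produce, here is a minimal sketch of a field_map being applied to a Nextclade TSV. Mapped columns are kept and renamed, and all other columns are dropped. This is an assumption about the general behaviour, not the ingest workflow's actual implementation. The helper apply_field_map, the trimmed FIELD_MAP and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical, trimmed field_map including the two proposed entries.
FIELD_MAP = {
    "seqName": "seqName",
    "clade": "clade",
    "qc.overallScore": "QC_overall_score",
    "qc.overallStatus": "QC_overall_status",
}

# Made-up Nextclade TSV excerpt; "coverage" is not in FIELD_MAP, so it is dropped.
SAMPLE_TSV = (
    "seqName\tclade\tcoverage\tqc.overallScore\tqc.overallStatus\n"
    "seq1\tA\t0.99\t5.0\tgood\n"
    "seq2\tB\t0.91\t45.2\tmediocre\n"
)


def apply_field_map(tsv_text, field_map):
    """Keep only columns named in field_map, renamed and ordered as in field_map."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(field_map.values()), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row[old] for old, new in field_map.items()})
    return out.getvalue()


print(apply_field_map(SAMPLE_TSV, FIELD_MAP), end="")
```

Because the output columns follow the field_map keys, adding the two entries is enough for QC_overall_score and QC_overall_status to appear next to the existing per-metric QC columns.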