Remove trimreaddescription from reformat.sh command in shortReadsqc.wdl #79

Merged
samobermiller merged 5 commits into master from new-tests on Jun 26, 2026

samobermiller commented Jun 26, 2026

Action:
Remove trimreaddescription=t from the reformat.sh command in the stage_interleave task of shortReadsqc.wdl. The parameter was introduced in SRA PR #55 (line, ReadsQC v1.0.20, 11/3/25).

Issue:
The parameter strips the pair information from read headers, which causes parsing errors. For example, a header that should read @NB551228:6:H7FFKBGX5:4:23602:17332:3792 1:N:0:CTGAAGCT+AGGCTATA is instead written as @NB551228:6:H7FFKBGX5:4:23602:17332:3792.
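To make the failure mode concrete, here is a minimal Python sketch of a header parser that expects the Illumina-style comment field (read:filtered:control:index) after the first space. The PR does not say which parser actually fails, so parse_header and PairInfo are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple

# Illumina (Casava 1.8+) comment field: <read>:<is filtered>:<control number>:<index>
_COMMENT_RE = re.compile(r"^([12]):([YN]):(\d+):(\S*)$")


class PairInfo(NamedTuple):
    read_id: str
    read_number: int
    filtered: bool
    control: int
    index: str


def parse_header(header: str) -> PairInfo:
    """Split a FASTQ header into read ID and pair information.

    Hypothetical parser for illustration; raises ValueError when the
    comment field (the part trimreaddescription=t removes) is absent.
    """
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    read_id, _, comment = header[1:].partition(" ")
    match = _COMMENT_RE.match(comment)
    if match is None:
        raise ValueError(f"no pair information in header: {header!r}")
    read, filtered, control, index = match.groups()
    return PairInfo(read_id, int(read), filtered == "Y", int(control), index)


good = "@NB551228:6:H7FFKBGX5:4:23602:17332:3792 1:N:0:CTGAAGCT+AGGCTATA"
trimmed = "@NB551228:6:H7FFKBGX5:4:23602:17332:3792"

print(parse_header(good).read_number)  # 1
try:
    parse_header(trimmed)
except ValueError as err:
    print("parse error:", err)
```

With the full header the parser recovers the read number and index; with the trimmed header it has nothing to match against and raises.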

Direct test passed:
Ran locally on Cromwell with /global/cfs/cdirs/m3408/www/test_data/smalltest.R1.fastq.gz and /global/cfs/cdirs/m3408/www/test_data/smalltest.R2.fastq.gz as input. Headers in the output filtered.fastq.gz contain pair information (e.g. @NS500756:11:H7FHCBGXH:1:11101:5478:9880 1:N:0:AGCGAGAT+GATACTGG).
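A check like this could be scripted rather than done by eye. The sketch below is one possible approach, not the test that was actually run: it assumes a gzipped FASTQ with standard four-line records, and headers_missing_pair_info is a hypothetical helper name.

```python
import gzip
from itertools import islice


def headers_missing_pair_info(path, max_records=1000):
    """Return headers among the first max_records reads that lack a
    '1:...' or '2:...' comment field after the read ID.

    Assumes four-line FASTQ records, so every 4th line is a header.
    """
    missing = []
    with gzip.open(path, "rt") as handle:
        for line in islice(handle, 0, max_records * 4, 4):
            header = line.rstrip("\n")
            _, _, comment = header.partition(" ")
            if not comment.startswith(("1:", "2:")):
                missing.append(header)
    return missing
```

An empty list for filtered.fastq.gz would indicate that the sampled headers still carry pair information.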

Impact:
Only non-interleaved (paired) data goes through the affected stage_interleave task. Until PR microbiomedata/nmdc_automation#688, the Reads QC Interleave workflow in workflows.yaml still used interleave_rqcfilter.wdl rather than rqcfilter.wdl (and therefore shortReadsqc.wdl); that PR also updated the ReadsQC version to v1.0.22. As a result, only paired ReadsQC records run under v1.0.22 or later (from ~4/2026 on) are affected.

Assessment of existing records:
The MongoDB query below counts the paired ReadsQC records run under each version on or after 10/1/2025. The results, pasted below, show only 13 records in an affected version.

Resolution of problematic records in mongo:

  • Apply changesheet for problematic records indicating failure reason
  • Delete problematic records
  • Reapply problematic records to allow list
{
  "_id": "v1.0.20",
  "count": 486
}
{
  "_id": "v1.0.18",
  "count": 19
}
{
  "_id": "v1.0.24",
  "count": 13
}
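As a rough illustration of how these counts map onto the affected range, the Python sketch below filters the grouped results by version. It assumes the affected range starts at v1.0.22, as described under Impact; version_key and affected_counts are hypothetical helpers, not part of the pipeline.

```python
def version_key(version):
    """Turn 'v1.0.22' into (1, 0, 22) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def affected_counts(groups, first_affected="v1.0.22"):
    """Keep only groups whose version is at or above first_affected.

    first_affected is an assumption taken from the Impact section.
    """
    threshold = version_key(first_affected)
    return {
        group["_id"]: group["count"]
        for group in groups
        if version_key(group["_id"]) >= threshold
    }


# Counts copied from the results above.
results = [
    {"_id": "v1.0.20", "count": 486},
    {"_id": "v1.0.18", "count": 19},
    {"_id": "v1.0.24", "count": 13},
]
print(affected_counts(results))  # {'v1.0.24': 13}
```

Comparing parsed tuples rather than raw strings avoids misordering versions such as v1.0.9 and v1.0.10.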

recent_noninterleaved_RQC.json

MongoDB workflows query

[
  {
    '$match': {
      'type': 'nmdc:ReadQcAnalysis',
      'ended_at_time': {
        '$gte': '2025-10-01T00:00:00.000000+00:00'
      },
      'processing_institution': 'NMDC'
    }
  }, {
    '$lookup': {
      'from': 'data_object_set',
      'localField': 'has_input',
      'foreignField': 'id',
      'as': 'input_do_set',
      'pipeline': [
        {
          '$match': {
            'data_object_type': {
              '$in': [
                'Metagenome Raw Read 2', 'Metagenome Raw Read 1'
              ]
            }
          }
        }
      ]
    }
  }, {
    '$match': {
      '$expr': {
        '$gte': [
          {
            '$size': '$input_do_set'
          }, 2
        ]
      }
    }
  }, {
    '$group': {
      '_id': '$version',
      'count': {
        '$sum': 1
      },
      'details': {
        '$push': '$$ROOT'
      }
    }
  }
]

samobermiller self-assigned this Jun 26, 2026
Co-authored-by: aclum <aclum@users.noreply.github.com>
samobermiller marked this pull request as ready for review June 26, 2026 01:20
samobermiller merged commit 68ff749 into master Jun 26, 2026
1 check passed
samobermiller deleted the new-tests branch June 26, 2026 01:35