Skip to content

Conversation

@adamrtalbot
Copy link
Contributor

@adamrtalbot adamrtalbot commented Nov 2, 2024

Validation of FASTQS early prevents running the pipeline on invalid FASTQ files which will make the pipeline more efficient at achieving it's ultimate objective of checking FASTQ validity.

It adds 3 more parameters:

  • --skip_linting which enables the linting of FASTQs [update March 25] Replaced with --skip_tools 'fq'
  • --fq_lint_args which is a string of arguments to pass to the linting tool
  • --continue_with_lint_fail which is a boolean to determine whether to continue if the linting fails

Between these three options the user has a high degree of control over how the pipeline lints which should handle most use cases.

Implements tests for all cases using the rnaseq minimal test dataset which has invalid sequencing names 🙄 .

Closes #31

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/seqinspector branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

mahesh-panchal and others added 30 commits March 19, 2024 10:09
Co-authored-by: Adrien Coulier <adrien.coulier@medsci.uu.se>
Co-authored-by: Karthik Nair <35717861+KarNair@users.noreply.github.com>
Issue with the previous implementation was that sometimes
MULTIQC_PER_LANE would execute before the extra files were collected
into `ch_multiqc_extra_files`, causing `null` to be added to the list of
files passed to multiqc.
Important! Template update for nf-core/tools v2.14.1
Aratz and others added 21 commits January 17, 2025 14:16
….2.dev0

Important! Template update for nf-core/tools v3.1.2.dev0
Add skip tools parameter for tool selection
Important! Template update for nf-core/tools v3.2.0
This reverts commit 0ba1652.
Replace hard-coded path to fastqscreen example csv with parameter-supplied one
Added missing citations to citation tool
@FranBonath
Copy link
Member

Hej, @adamrtalbot, thanks for your PR :). Just to let you know that we decided in the last seqinspector meeting on a defined list of modules to add to version 1. So while this is great, we will only implement it in a version after the first release. It's basically just to keep the first release simple.

Copy link
Contributor

@pontushojer pontushojer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, just had a minor comment on this PR.

"type": "string",
"description": "Comma-separated string of tools to skip",
"pattern": "^((fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
"pattern": "^((fq|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"pattern": "^((fq|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
"pattern": "^((fq_lint|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"

Since we have used the naming convention <tool>_<subcommand> for the other tools, it seems prudent to keep this going.

//
// MODULE: Run FQ_LINT to catch early errors
//
if ( !("fq" in skip_tools) ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ( !("fq" in skip_tools) ) {
if ( !("fq_lint" in skip_tools) ) {

@pontushojer pontushojer mentioned this pull request Jul 16, 2025
@pontushojer
Copy link
Contributor

Just had an idea regarding the --continue_with_lint_fail option. With this enabled, it would be great to have a section in the MultiQC report(s) listing which samples that failed linting. This could be a separate PR though.

Thinking further on this option, have you considered reversing the logic here so that the pipeline would continue by default even if some samples fail linting? For me, it would seem that the main purpose of this pipeline is to identify which samples are bad (failed lint, contamination, low quality, etc.) and good for continued analysis. Stopping everything early due to one failed samples would go against this.

@adamrtalbot
Copy link
Contributor Author

Just had an idea regarding the --continue_with_lint_fail option. With this enabled, it would be great to have a section in the MultiQC report(s) listing which samples that failed linting. This could be a separate PR though.

Thinking further on this option, have you considered reversing the logic here so that the pipeline would continue by default even if some samples fail linting? For me, it would seem that the main purpose of this pipeline is to identify which samples are bad (failed lint, contamination, low quality, etc.) and good for continued analysis. Stopping everything early due to one failed samples would go against this.

Based on @FranBonath's comment here I've stopped any further development on this feature, but yes, I think "keep going and report on all samples" is a good strategy for handling FQ linting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.