Adds fq/lint for early validation of FASTQs #67
base: dev
Co-authored-by: Adrien Coulier <adrien.coulier@medsci.uu.se>
Co-authored-by: Karthik Nair <35717861+KarNair@users.noreply.github.com>
Input workflow
The issue with the previous implementation was that MULTIQC_PER_LANE would sometimes execute before the extra files were collected into `ch_multiqc_extra_files`, causing `null` to be added to the list of files passed to MultiQC.
Important! Template update for nf-core/tools v2.14.1
….2.dev0 Important! Template update for nf-core/tools v3.1.2.dev0
Add skip tools parameter for tool selection
Important! Template update for nf-core/tools v3.2.0
Set up nft-utils in tests
This reverts commit 0ba1652.
Replace hard-coded path to fastqscreen example csv with parameter-supplied one
Added missing citations to citation tool
Hi @adamrtalbot, thanks for your PR :). Just to let you know, in the last seqinspector meeting we decided on a fixed list of modules to include in version 1. So while this is great, we will only implement it in a release after the first one, to keep the first release simple.
pontushojer left a comment
Hi, just had a minor comment on this PR.
```diff
 "type": "string",
 "description": "Comma-separated string of tools to skip",
-"pattern": "^((fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
+"pattern": "^((fq|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
```
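To see which `--skip_tools` values the updated `pattern` accepts, here is a small check in Python. It is only an approximation: JSON Schema validators use ECMA-262 regular expressions, while this sketch uses Python's `re`, which happens to support the same `(?<!,)` negative lookbehind.

```python
import re

# The updated skip_tools pattern from nextflow_schema.json (with "fq" added).
SKIP_TOOLS_PATTERN = re.compile(
    r"^((fq|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
)


def is_valid_skip_tools(value: str) -> bool:
    """Return True if `value` matches the schema's skip_tools pattern."""
    return SKIP_TOOLS_PATTERN.match(value) is not None


# Try a few candidate --skip_tools values against the pattern.
for candidate in ["fq", "fq,fastqc", "fq,", "fastqc,foo", "fastqcfastqscreen"]:
    print(repr(candidate), is_valid_skip_tools(candidate))
```

The first two values pass, while a trailing comma or an unknown tool name is rejected. The last case also passes: because the comma is optional (`,?`), tool names run together without a separator still match, so how `skip_tools` is split downstream determines whether that matters.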
Suggested change:
```diff
-"pattern": "^((fq|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
+"pattern": "^((fq_lint|fastqc|fastqscreen|seqfu_stats|seqtk_sample)?,?)*(?<!,)$"
```
Since we have used the naming convention `<tool>_<subcommand>` for the other tools (e.g. `seqfu_stats`, `seqtk_sample`), it seems prudent to follow it here too.
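Renaming the token also changes which values pass validation. A quick comparison in Python (again using `re` as a stand-in for the schema validator) shows that after the rename a bare `fq` is rejected, so users would write `--skip_tools fq_lint` instead:

```python
import re

# Tool alternatives before and after the suggested rename (fq -> fq_lint).
TOOLS_AS_PROPOSED = "fq|fastqc|fastqscreen|seqfu_stats|seqtk_sample"
TOOLS_AS_SUGGESTED = "fq_lint|fastqc|fastqscreen|seqfu_stats|seqtk_sample"


def skip_tools_pattern(tools: str) -> re.Pattern:
    """Build the schema-style skip_tools pattern for a '|'-separated list of tools."""
    return re.compile(rf"^(({tools})?,?)*(?<!,)$")


proposed = skip_tools_pattern(TOOLS_AS_PROPOSED)
suggested = skip_tools_pattern(TOOLS_AS_SUGGESTED)

# Columns: value, accepted by proposed pattern, accepted by suggested pattern.
for value in ["fq", "fq_lint", "fq_lint,fastqc"]:
    print(value, bool(proposed.match(value)), bool(suggested.match(value)))
```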
```groovy
//
// MODULE: Run FQ_LINT to catch early errors
//
if ( !("fq" in skip_tools) ) {
```
Suggested change:
```diff
-if ( !("fq" in skip_tools) ) {
+if ( !("fq_lint" in skip_tools) ) {
```
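The guard only works reliably if `skip_tools` holds individual tool names. The PR excerpt does not show how `skip_tools` is built from `params.skip_tools`, so `parse_skip_tools` below is a hypothetical Python sketch of that parsing step. It also shows why exact names matter: in Python, `in` on a raw string is a substring test, so `"fq"` would appear to be present whenever `fq_lint` is.

```python
from typing import Optional


def parse_skip_tools(value: Optional[str]) -> list[str]:
    """Hypothetical helper: split a comma-separated --skip_tools value into tool names."""
    if not value:
        return []
    return [name.strip() for name in value.split(",") if name.strip()]


def should_run_fq_lint(skip_tools: list[str]) -> bool:
    """Python mirror of the workflow guard `!("fq_lint" in skip_tools)`."""
    return "fq_lint" not in skip_tools


raw = "fq_lint,fastqc"
tools = parse_skip_tools(raw)
print(tools)                       # ['fq_lint', 'fastqc']
print("fq" in raw, "fq" in tools)  # True False  (substring test vs exact-name match)
print(should_run_fq_lint(tools))   # False
```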
Thinking further on the `--continue_with_lint_fail` option: have you considered reversing the logic so that the pipeline continues by default even if some samples fail linting? To me, the main purpose of this pipeline seems to be identifying which samples are bad (failed lint, contamination, low quality, etc.) and which are good enough for further analysis. Stopping everything early because of one failed sample would go against this.
Based on @FranBonath's comment above, I've stopped any further development on this feature, but yes, I think "keep going and report on all samples" is a good strategy for handling FQ linting.
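A minimal sketch of the "keep going and report on all samples" strategy, in Python for illustration only. It assumes the per-sample lint outcomes are already available as a mapping from sample name to pass/fail. `summarise_lint` and the opt-in `stop_on_lint_fail` switch (the reversed default proposed above, in place of the PR's `--continue_with_lint_fail`) are hypothetical names, not part of this PR.

```python
from dataclasses import dataclass, field


@dataclass
class LintSummary:
    """Per-sample FASTQ lint outcome, split into passing and failing samples."""

    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)


def summarise_lint(results: dict[str, bool], stop_on_lint_fail: bool = False) -> LintSummary:
    """Report on every sample; raise only if stopping was explicitly requested.

    `results` maps sample name -> True if that sample's FASTQs passed linting.
    `stop_on_lint_fail` is a hypothetical opt-in switch (default: keep going).
    """
    summary = LintSummary()
    for sample, ok in sorted(results.items()):
        (summary.passed if ok else summary.failed).append(sample)
    if summary.failed and stop_on_lint_fail:
        raise RuntimeError("FASTQ linting failed for: " + ", ".join(summary.failed))
    return summary


summary = summarise_lint({"sampleB": False, "sampleA": True})
print(summary.passed, summary.failed)  # ['sampleA'] ['sampleB']
```

With the default, a failing sample is recorded in the report rather than halting the run, which matches the goal of identifying bad samples alongside good ones.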
Validating FASTQs early prevents the pipeline from running on invalid FASTQ files, which makes it more efficient at its ultimate objective of checking FASTQ validity.
It adds three new parameters:

- `--skip_linting`, which toggles linting of FASTQs. [update March 25] Replaced with `--skip_tools 'fq'`.
- `--fq_lint_args`, a string of arguments to pass to the linting tool.
- `--continue_with_lint_fail`, a boolean that determines whether to continue if linting fails.

Together, these three options give the user a high degree of control over how the pipeline lints FASTQs, which should cover most use cases.
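`--fq_lint_args` is a free-form string. Assuming the module places it between the subcommand and the read files, as in `fq lint <args> <R1> [R2]`, the command line could be assembled as below. This is a Python sketch, not the module's actual script: `build_fq_lint_command` is a hypothetical helper, `shlex.split` keeps quoted values together, and `--placeholder-option` is a stand-in rather than a real `fq` flag.

```python
import shlex


def build_fq_lint_command(fq_lint_args: str, reads: list[str]) -> list[str]:
    """Assemble an `fq lint` argument vector (hypothetical helper).

    Assumes the layout `fq lint <args> <R1> [R2]`; `fq_lint_args` is split
    shell-style so quoted values stay together.
    """
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one FASTQ (single-end) or two (paired-end)")
    return ["fq", "lint", *shlex.split(fq_lint_args), *reads]


cmd = build_fq_lint_command("--placeholder-option 'a value'", ["s1_R1.fastq.gz", "s1_R2.fastq.gz"])
print(cmd)
# ['fq', 'lint', '--placeholder-option', 'a value', 's1_R1.fastq.gz', 's1_R2.fastq.gz']
```

Building an argument list rather than a single shell string avoids quoting surprises when the arguments contain spaces.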
Implements tests for all cases using the rnaseq minimal test dataset, which has invalid sequence names 🙄.
Closes #31
PR checklist
- `nf-core lint`
- `nf-test test main.nf.test -profile test,docker`
- `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).