
count_control error: estimated false positive rate is 0.385 (FPR too high, bailing out!!!) #385

@moldach

Description


After successfully running the example dataset, I ran kevlar on my own data, but I am getting an error at the count_control step saying that the FPR is too high.

I'm trying to figure out:

  • why?
  • what does this mean?
  • how can I solve this issue?

Error log

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	16	assemble
	16	call
	1	calls
	1	count_case
	2	count_control
	1	count_reference
	1	create_mask
	1	filter_novel
	1	like_scores
	1	link_input_seqs
	1	link_mask
	1	link_reference
	1	localize
	1	novel
	1	partition
	1	split
	47
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")

[Thu Oct 22 10:46:50 2020]
Job 24: Create internal links for sample sequence data.

[Thu Oct 22 10:46:50 2020]
Job 42: Create internal links for mask sequence data.

[Thu Oct 22 10:46:50 2020]
Job 22: Create internal links for reference genome, and index if needed.

Job counts:
	count	jobs
	1	link_input_seqs
	1
Job counts:
	count	jobs
	1	link_mask
	1
Job counts:
	count	jobs
	1	link_reference
	1
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
[Thu Oct 22 10:46:51 2020]
Finished job 42.
1 of 47 steps (2%) done
[Thu Oct 22 10:46:51 2020]
Finished job 24.
2 of 47 steps (4%) done
[Thu Oct 22 10:46:51 2020]
Finished job 22.
3 of 47 steps (6%) done

[Thu Oct 22 10:46:51 2020]
Job 2: Count k-mers in the reference genome.

kevlar --tee --logfile Logs/refrcount.log count --ksize 31 --counter-size 4 --memory 12G --max-fpr 0.025 --threads 8 Reference/refr-counts.smallcounttable Reference/Homo_sapiens.GRCh37.dna.toplevel.fa
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a small count table, a CountMin sketch with a counter size of 4 bits, for k-mer abundance queries (max abundance 15)
[kevlar::count] - processing "Reference/Homo_sapiens.GRCh37.dna.toplevel.fa"
[kevlar::count] Done loading k-mers;
    297 reads processed, 2486108683 distinct k-mers stored;
    estimated false positive rate is 0.013;
    saved to "Reference/refr-counts.smallcounttable"
[kevlar::count] Total time: 17203.38 seconds
[Thu Oct 22 15:33:50 2020]
Finished job 2.
4 of 47 steps (9%) done

[Thu Oct 22 15:33:50 2020]
Job 23: Generate a mask of sequences to ignore while k-mer counting.

kevlar --tee --logfile Logs/mask.log count --ksize 31 --counter-size 1 --memory 6G --max-fpr 0.005 --threads 8 Mask/mask.nodetable Mask/Homo_sapiens.GRCh37.dna.toplevel.fa
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a node table (Bloom filter) for k-mer presence/absence queries
[kevlar::count] - processing "Mask/Homo_sapiens.GRCh37.dna.toplevel.fa"
[kevlar::count] Done loading k-mers;
    297 reads processed, 2493095857 distinct k-mers stored;
    estimated false positive rate is 0.001;
    saved to "Mask/mask.nodetable"
[kevlar::count] Total time: 5911.40 seconds
[Thu Oct 22 17:12:24 2020]
Finished job 23.
5 of 47 steps (11%) done

[Thu Oct 22 17:12:24 2020]
Job 5: Count k-mers in a control sample

Job counts:
	count	jobs
	1	count_control
	1
/home/moldach/miniconda3/lib/python3.7/site-packages/pulp/pulp.py:1195: UserWarning: Spaces are not permitted in the name. Converted to '_'
  warnings.warn("Spaces are not permitted in the name. Converted to '_'")
kevlar --tee --logfile Logs/ctrl1count.log count --ksize 31 --memory 16G --max-fpr 0.05 --mask Mask/mask.nodetable --threads 8 Sketches/ctrl1-counts.counttable Reads/ctrl1.inseq.0.fastq.gz Reads/ctrl1.inseq.1.fastq.gz
[kevlar] running version 0.7+15.gebabd62
[kevlar::count] Storing k-mers in a count table, a CountMin sketch with a counter size of 8 bits, for k-mer abundance queries (max abundance 255)
[kevlar::count] - processing "Reads/ctrl1.inseq.0.fastq.gz"
[kevlar::count] - processing "Reads/ctrl1.inseq.1.fastq.gz"
[kevlar::count] Done loading k-mers;
    851849754 reads processed, 5429363124 distinct k-mers stored;
    estimated false positive rate is 0.385 (FPR too high, bailing out!!!)
[Thu Oct 22 19:34:15 2020]
Error in rule count_control:
    jobid: 0
    output: Sketches/ctrl1-counts.counttable, Logs/ctrl1count.log

RuleException:
CalledProcessError in line 243 of /gpfs/home/moldach/projects/CG00018/Snakefile:
Command 'set -euo pipefail;  kevlar --tee --logfile Logs/ctrl1count.log count --ksize 31 --memory 16G --max-fpr 0.05 --mask Mask/mask.nodetable --threads 8 Sketches/ctrl1-counts.counttable Reads/ctrl1.inseq.0.fastq.gz Reads/ctrl1.inseq.1.fastq.gz' returned non-zero exit status 1.
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
  File "/gpfs/home/moldach/projects/CG00018/Snakefile", line 243, in __rule_count_control
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/home/moldach/miniconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/home/moldach/miniconda3/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/home/moldach/projects/CG00018/.snakemake/log/2020-10-22T104648.700597.snakemake.log
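What the message means: kevlar stores k-mers in a probabilistic sketch (for the control, a CountMin sketch with 8-bit counters, per the log), and once the number of distinct k-mers is large relative to the memory budget, most slots are occupied and unrelated k-mers start colliding. Below is a minimal sketch of the textbook uniform-hashing estimate. It assumes the `--memory` budget is split evenly across 4 hash tables, that `G` means 10^9 bytes, and that the counter size fixes how many slots fit per byte; these are assumptions, not taken from kevlar's source.

```python
import math


def estimate_fpr(distinct_kmers: int, memory_bytes: float,
                 counter_bits: int, num_tables: int = 4) -> float:
    """Approximate false positive rate of a Bloom filter / CountMin sketch.

    Assumptions (not taken from kevlar's source): the memory budget is
    split evenly across `num_tables` tables, each counter uses
    `counter_bits` bits, and k-mers hash uniformly, so a table holding n
    distinct k-mers in m slots is about 1 - exp(-n / m) occupied.
    A false positive needs an occupied slot in every table.
    """
    slots_per_table = memory_bytes * 8 / counter_bits / num_tables
    occupancy = 1.0 - math.exp(-distinct_kmers / slots_per_table)
    return occupancy ** num_tables


G = 1e9  # assumption: "G" in --memory means 10**9 bytes

# (distinct k-mers from the log, --memory, counter size in bits)
runs = {
    "reference (12G, 4-bit smallcounttable)": (2486108683, 12 * G, 4),
    "mask (6G, 1-bit nodetable)": (2493095857, 6 * G, 1),
    "control (16G, 8-bit counttable)": (5429363124, 16 * G, 8),
}

for name, (n, mem, bits) in runs.items():
    print(f"{name}: estimated FPR ~ {estimate_fpr(n, mem, bits):.3f}")
```

With the numbers from the log, this gives about 0.013 for the reference and 0.001 for the mask, in line with kevlar's logged estimates, and about 0.304 for the control. That is below the logged 0.385, so kevlar's own calculation evidently differs in detail, but the picture is the same: under this model ~5.4 billion distinct k-mers in 16G leave roughly three quarters of every table occupied, far above the 0.05 limit. The control holds more than twice as many distinct k-mers as the whole reference, which is typical for raw reads, since each sequencing error can create up to k novel k-mers.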

config.json

{
    "ksize": 31,
    "recountmem": "1G",
    "numsplit": 16,
    "samples": {
        "casemin": 6,
        "ctrlmax": 1,
        "case": {
            "fastx": [
                "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018P_R1.fastq.gz",
                "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018P_R2.fastq.gz"
            ],
            "memory": "16G",
            "label": "Proband",
            "max_fpr": 0.3
        },
        "controls": [
            {
                "fastx": [
                    "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018M_R1.fastq.gz",
                    "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018M_R2.fastq.gz"
                ],
                "memory": "16G",
                "label": "Mother",
                "max_fpr": 0.05
            },
            {
                "fastx": [
                    "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018F_R1.fastq.gz",
                    "/tiered/mtgraovac/CombGene/CG00018/FASTQ/CG00018F_R2.fastq.gz"
                ],
                "memory": "16G",
                "label": "Father",
                "max_fpr": 0.05
            }
        ],
        "coverage": {
            "mean": 30.0,
            "stdev": 10.0
        }
    },
    "mask": {
        "fastx": [
            "/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa"
        ],
        "memory": "6G",
        "max_fpr": 0.005
    },
    "reference": {
        "fasta": "/tiered/mtgraovac/indexes/GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa",
        "memory": "12G",
        "max_fpr": 0.025
    },
    "localize": {
        "seedsize": 51,
        "delta": 50,
        "seqpattern": ".",
        "maxdiff": 10000
    },
    "varfilter": null
}
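The obvious lever in this config is the per-control `"memory"` value (currently 16G). Inverting the uniform-hashing approximation gives a rough target size. This is a sketch under stated assumptions (4 tables, 8-bit counters, G = 10^9 bytes), not kevlar's documented sizing rule:

```python
import math


def memory_for_fpr(distinct_kmers: int, target_fpr: float,
                   counter_bits: int = 8, num_tables: int = 4) -> float:
    """Rough bytes of --memory needed to bring the estimated FPR down to target_fpr.

    Inverts fpr = (1 - exp(-n / m)) ** num_tables for the per-table slot
    count m. Same uniform-hashing assumptions as a textbook Bloom filter /
    CountMin model; kevlar's own estimate may come out higher.
    """
    occupancy = target_fpr ** (1.0 / num_tables)  # allowed fraction of occupied slots
    slots_per_table = distinct_kmers / -math.log(1.0 - occupancy)
    return slots_per_table * num_tables * counter_bits / 8


n_ctrl = 5429363124  # distinct k-mers reported for ctrl1 in the log above

for fpr in (0.05, 0.1, 0.3):
    gigabytes = memory_for_fpr(n_ctrl, fpr) / 1e9  # assumes G = 10**9 bytes
    print(f"max_fpr={fpr}: memory ~{gigabytes:.0f}G")
```

This prints roughly 34G for `max_fpr` 0.05, 26G for 0.1 and 16G for 0.3. At 16G the same model predicts about 0.30, below the 0.385 kevlar actually logged, so treat these figures as lower bounds. Raising the controls' `max_fpr` instead avoids the memory cost but trades sensitivity: a noisier control sketch makes more k-mers look present in the control, so some genuinely novel k-mers in the case may be discarded.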

Submission

#!/bin/bash
#BSUB -q normal
#BSUB -J kevlar
#BSUB -R "rusage[mem=16G]"
#BSUB -n 8
#BSUB -M 16000
#BSUB -W 600:00
#BSUB -u moldach@ucalgary.ca
#BSUB -R "select[hname!=node013]"
#BSUB -B
#BSUB -N
#BSUB -o kevlar_CG00018.out
#BSUB -e kevlar_CG00018.err

source ~/kavlar-test/kevlar-env/bin/activate
snakemake --snakefile Snakefile --configfile config.json --cores 8 --directory ./ -p calls
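Raising `"memory"` in config.json also means raising the LSF request. Right now `#BSUB -R "rusage[mem=16G]"` equals the 16G sketch size, leaving no headroom beyond the sketch itself (the unit of `-M` depends on the cluster's LSF configuration). The helpers below (`parse_mem`, `largest_sketch`, `load_config`) are hypothetical and not part of kevlar or Snakemake. They read the per-section `memory` strings and report the largest one, so the BSUB values can be set above it:

```python
import json
import re

# Assumption: decimal units, so "16G" means 16 * 10**9 bytes.
UNITS = {"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def parse_mem(text: str) -> float:
    """Convert a size string such as "16G" into bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGT])?B?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised memory size: {text!r}")
    value, unit = match.groups()
    return float(value) * (UNITS[unit.upper()] if unit else 1.0)


def largest_sketch(config: dict) -> float:
    """Largest single "memory" setting among case, controls, mask and reference."""
    samples = config["samples"]
    entries = [samples["case"], *samples["controls"], config["mask"], config["reference"]]
    return max(parse_mem(entry["memory"]) for entry in entries)


def load_config(path: str = "config.json") -> dict:
    """Read the Snakemake config file used by the submission script."""
    with open(path) as handle:
        return json.load(handle)


# Hypothetical example: the config above with both controls raised to 34G.
example = {
    "samples": {
        "case": {"memory": "16G"},
        "controls": [{"memory": "34G"}, {"memory": "34G"}],
    },
    "mask": {"memory": "6G"},
    "reference": {"memory": "12G"},
}
print(f"largest sketch: {largest_sketch(example) / 1e9:.0f}G")
```

The example prints `largest sketch: 34G`; against the real file, call `largest_sketch(load_config())` from the working directory and size `rusage[mem=...]` and `-M` comfortably above the result.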
