This repository was archived by the owner on Jan 31, 2020. It is now read-only.

Duplicate entries in Pindel output #109

@stevekm

I was having problems annotating the .vcf output from Pindel because the file contains duplicate entries. For example:


| #CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NORMAL | TUMOR |
|---|---|---|---|---|---|---|---|---|---|---|
| chr2 | 113983582 | . | T | TGGGAGTCCGGGGCCAGGAGGGACAGAGGAGTCAGTATTCTGTATTTTCAACGCCCCCCACCCGGACGGGTGGGAGGGT | . | PASS | END=113983582;HOMLEN=0;SVLEN=78;SVTYPE=INS | GT:AD | 0/0:1083,1 | 0/0:1115,0 |
| chr2 | 113983582 | . | T | TGGGAGTCCGGGGCCAGGAGGGACAGAGGAGTCAGTATTCTGTATTTTCAACGCCCCCCACCCGGACGGGTGGGAGGGT | . | PASS | END=113983582;HOMLEN=0;SVLEN=78;SVTYPE=INS | GT:AD | 0/0:1083,1 | 0/0:1115,0 |

The resulting .vcf file contains many such duplicated entries.

I thought this might be an issue with the conversion to .vcf from the original data format, but the duplicates also appear in the raw Pindel output:

```
$ grep 113983582 pindel_output/*
pindel_output/_SI:530	I 78	NT 78 "GGGAGTCCGGGGCCAGGAGGGACAGAGGAGTCAGTATTCTGTATTTTCAACGCCCCCCACCCGGACGGGTGGGAGGGT"	ChrID chr2	BP 113983582	113983583	BP_range 113983581	113983583	Supports 1	1	+ 1	1	- 0	0	S1 2	SUM_MS 60	2	NumSupSamples 1	1	NORMAL 1083 1071 1 1 0 0	TUMOR 1115 1105 0 0 0 0
pindel_output/_SI:552	I 78	NT 78 "GGGAGTCCGGGGCCAGGAGGGACAGAGGAGTCAGTATTCTGTATTTTCAACGCCCCCCACCCGGACGGGTGGGAGGGT"	ChrID chr2	BP 113983582	113983583	BP_range 113983581	113983583	Supports 1	1	+ 1	1	- 0	0	S1 2	SUM_MS 60	2	NumSupSamples 1	1	NORMAL 1083 1071 1 1 0 0	TUMOR 1115 1105 0 0 0 0
```
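The two raw records differ only in their leading number (530 vs 552). To see how widespread this is, the duplicates can be grouped by everything after that number. The sketch below is a minimal Python illustration, assuming (not confirmed against the Pindel documentation) that each summary line is tab-separated and starts with a numeric record index; `find_duplicate_pindel_records` is a hypothetical helper name.

```python
from collections import defaultdict


def find_duplicate_pindel_records(lines):
    """Group Pindel summary lines that are identical apart from their leading index.

    Assumption: each summary line is tab-separated and its first field is a
    numeric record index (e.g. 530, 552). Lines that do not start with digits
    followed by a tab (separator lines, read alignments) are skipped.

    Returns a dict mapping the index-free record text to the list of indices
    at which it occurs, keeping only records seen more than once.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Split off the first tab-separated field, assumed to be the record index.
        index, sep, rest = line.partition("\t")
        if not sep or not index.isdigit():
            continue
        groups[rest].append(int(index))
    # Report only the records that appear under more than one index.
    return {rest: indices for rest, indices in groups.items() if len(indices) > 1}
```

Passing it an open file handle on `pindel_output/_SI` should report the 530/552 pair above, together with any other records that are repeated under a different index.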

Why are duplicate entries being reported? And is it safe to remove them? What is the recommended removal method?
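Whether removal is safe is the open question here, so the following is only a conservative workaround, not a method recommended by the Pindel developers. This minimal Python sketch keeps every header line and the first copy of each data line, and drops later data lines that are byte-for-byte identical; records that share a position but differ in any column are kept. It assumes an uncompressed VCF, and `dedupe_vcf_lines` / `dedupe_vcf_file` are hypothetical helper names.

```python
def dedupe_vcf_lines(lines):
    """Yield VCF lines, dropping data lines that exactly repeat an earlier data line.

    Header lines (starting with '#') are always kept. Only identical records
    are removed, so variants at the same position with different INFO or
    sample columns are left untouched.
    """
    seen = set()
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        # Ignore the trailing newline so a final line without one still matches.
        key = line.rstrip("\n")
        if key in seen:
            continue
        seen.add(key)
        yield line


def dedupe_vcf_file(in_path, out_path):
    """Write a copy of an uncompressed VCF with exact duplicate records removed."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(dedupe_vcf_lines(src))
```

For example, `dedupe_vcf_file("pindel.vcf", "pindel.dedup.vcf")` (placeholder file names) writes a copy in which each of the duplicated chr2:113983582 insertions appears once. Matching on the whole line rather than on CHROM/POS/REF/ALT alone was chosen so that no record carrying different evidence is silently discarded.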

I am using Pindel version 0.2.5b9
