Tag handling with --keep-tags creates invalid SAM output #53

@adamjorr

Hi,

I was attempting to re-align reads that were aligned with another aligner.
I was using the --keep-tags option primarily because my reads carry RG and OQ tags that I care about. However, --keep-tags also copies the other tags, including MD, NM, MC, and AS. Since NGM sets these tags as well, its own values are appended to the end of the record, so these tags all appear twice in the read. This violates the SAM specification and consequently causes SAMtools to crash when it tries to parse the read.

As an example, the malformed read looks like this: 2 151M = 129799794 343 CCCTTGCTGCATGAGCCAGTAGCTGGGTGGGCATGGTAGCCTCTTGTCTTCCTAGCTTGCCCCTCCAGACATGGAACCTCCACACTGTGAGCGACTTGGTGTGGGGCAATCCAGGCAGATGTGCTCAGTCTGCCACACCTAGGATGGGGCT :862939:9=:=<<=9===<>4=>==<,;054=6;':=>8;/1/5;==?-<>??;<>>>9<<9?=&><7;;>28=.<<0:9-7>>@97<+<'+;3?>3)<:>==????@.8=@2:1-)>><?4?A).=??<)3=.;@>?A,*4@A5;#### MD:Z:10C48A91 PG:Z:MarkDuplicates.1E.5J RG:Z:HK2WY.5 NM:C:2 OQ:Z:####A7AA7,,FFFAA,A7,7FFA,,AF7AA<<,,7A7KF<,FFKKKFFAA<,7FAA<7,F,F7AKKFA,AA,FF,FA7FAF7FF(FKAFFAKKFFFKKKF,KKFF7,7,FAKF<,F7F<<<F,FKKKF<KAKKKAKKFKAFA<A<<,<<< UQ:C:22 AS:C:141 MQ:i:60 MC:Z:151M AS:i:1460 NM:i:2 NH:i:0 XI:f:0.9868 X0:i:0 XE:i:39 XR:i:151 MD:Z:91T48G10

For now, I'll work around this by extracting the reads and aligning them as FASTQ. If NGM is still being developed, though, I think a good option would be to let the user specify which tags to keep when using --keep-tags, to have NGM overwrite the tags it outputs, or to give the user more control over which tags NGM outputs.
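
In case it helps anyone else hitting this, here is a rough sketch of a post-processing filter that could repair the output in the meantime. It is not part of NGM. The function name `dedupe_tags` is my own, and it assumes uncompressed, tab-separated SAM text in which NGM's tags come after the copied ones (as in the record above), so the last occurrence of each tag is the one to keep.

```python
MANDATORY_FIELDS = 11  # QNAME through QUAL; optional tags follow these


def dedupe_tags(sam_line):
    """Return the SAM line with each optional tag kept once (last occurrence wins).

    Assumption: the aligner's freshly computed tags are appended after the
    copied ones, so the last occurrence is the up-to-date value.
    """
    if sam_line.startswith("@"):
        return sam_line  # header lines are passed through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    core, tags = fields[:MANDATORY_FIELDS], fields[MANDATORY_FIELDS:]
    # Map each two-character tag name to the index of its last occurrence.
    last_index = {tag[:2]: i for i, tag in enumerate(tags)}
    kept = [tag for i, tag in enumerate(tags) if last_index[tag[:2]] == i]
    return "\t".join(core + kept) + "\n"
```

Applying `dedupe_tags` to every line of the SAM text before it reaches SAMtools would drop the copied MD, NM, and AS values and keep the ones NGM wrote, while tags that appear only once, such as RG and OQ, are left alone.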
