Regtools junction extract: strandness detection malfunction #197

@santataRU

Dear all,

I am working with paired-end RNA-seq data and trying to use Regtools (version 1.0.0) to extract splice junctions. I aligned my reads with HISAT2 using the --rna-strandness RF option, so that the output BAM files include strand information in the "XS" attribute. My HISAT2 command is as follows:

hisat2 -p 10 --rna-strandness RF -x /Index/hg38_index -1 ./Input_R1_001.fastq.gz -2 ./Input_R2_001.fastq.gz \
| samtools sort -o output.sorted.bam && samtools index output.sorted.bam

To extract junctions, I used this command:

regtools junctions extract -s RF output.sorted.bam

In the 6th column of the output .bed file, I noticed that some junctions are labeled with "+", some with "-", and some with "?". Specifically, the output .bed file contains 606,976 junctions, with 223,254 marked as "-", 231,447 as "+", and 152,275 marked as "?".
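A minimal sketch of how a per-strand tally like the one above can be reproduced, assuming the Regtools output was saved as a tab-separated BED file. The function name `count_strands` and the file name `junctions.bed` are hypothetical and only used for illustration.

```shell
# Tally the values in the strand column (column 6) of a tab-separated BED file.
# Prints one "strand<TAB>count" line per distinct value, in byte order.
count_strands() {
  awk -F'\t' '{ n[$6]++ } END { for (s in n) print s "\t" n[s] }' "$1" | LC_ALL=C sort
}

# Hypothetical usage:
#   count_strands junctions.bed
```

The three counts should add up to the total number of lines in the file, which is a quick sanity check on the figures reported above.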

This seems unusual since every aligned read includes strand information as either "+" or "-", so I would not expect any "?" marks in the Regtools output.
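One way to check the expectation above is to count how many spliced alignments actually carry an XS tag. The sketch below is only a diagnostic under stated assumptions: it reads SAM text on standard input, treats any alignment with an `N` operation in its CIGAR string as spliced, and looks for an `XS:A:+` or `XS:A:-` field among the optional tags. The function name `count_xs_on_spliced` is hypothetical, and the result does not by itself explain the "?" labels.

```shell
# Count spliced alignments (CIGAR contains N) and how many of them have an XS tag.
# Reads SAM records from stdin; header lines starting with "@" are skipped.
count_xs_on_spliced() {
  awk -F'\t' '
    /^@/ { next }
    $6 ~ /N/ {
      spliced++
      xs = 0
      # Optional SAM tags start at field 12.
      for (i = 12; i <= NF; i++) if ($i ~ /^XS:A:[+-]$/) xs = 1
      if (xs) tagged++
    }
    END { printf "spliced=%d with_XS=%d\n", spliced, tagged }'
}

# Hypothetical usage with the BAM file from this report:
#   samtools view output.sorted.bam | count_xs_on_spliced
```

If the two numbers differ, some spliced alignments lack an XS tag, which would be worth including in the report.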

I would appreciate any insights or suggestions on this issue.

To reproduce the issue, I have shared a small sample (the first 1000 reads of a FASTQ file) of the FASTQ and BAM files at the following link: https://drive.google.com/drive/folders/1vNdGXk74L8E9SHMaobbnFq2nwF0xe7vq?usp=drive_link

Best regards,
Xiao
