Add NMD prediction support #1370

@mhoang22

Add nonsense-mediated decay (NMD) prediction support to pVACsplice (this might also work for pVACseq, for indels).

Contenders (TL;DR: NMDj is the top candidate tool, based on benchmark results and accessibility):

  • aenmd:

    • R tool. Paper published in 2023. Predicts NMD escape using 5 well-known rules.
  • NMDClassifier (2nd best candidate?):

    • Perl tool. Paper published in 2017.
    • Explores only 1 rule: the 50-NT rule. (Reason: their benchmark shows that 3/5 other rules aren't equally reliable; the 55-NT rule is equally good, yet the largest MCC value occurred at 51 NT, very close to 50 NT, so they prioritize the 50-NT rule. See the section 'Detection of NMDTs' in the paper.)
    • Installation: download the .tar from bio.tools: https://bio.tools/nmd_classifier . There is no GitHub repo.
    • Inputs: transcripts.gtf, annotation GTF file (Ensembl/NCBI reference GTF), genomic sequence (fa?), annotation mode (ensembl or ncbi), optional filtering parameter.
  • NMDj (top candidate?):

    • Python tool. Paper published in 2025.
    • Principle: also uses the 50-NT rule and compares the (novel?/input) transcript to a reference transcript, just as NMDClassifier does. However, NMDClassifier uses the best partner transcript as the reference, whereas NMDj uses (1) the MANE Select transcript by default or (2) the transcript with the highest expression level, in user-input mode. The transcript with the highest expression level is determined by its PSI value, calculated from the split-read counts reported by pyIPSA.
    • Rationale behind option number 2:

the probability of NMDT being derived from a protein-coding transcript via AS depends not only on the similarity in their exon-intron architectures but also on their expression levels. The coding transcript with the highest expression level is more likely to be the source of NMDT [14]. Furthermore, NMDT may be derived from different transcripts with comparable expression levels, which calls into question the validity of the approach based on the selection of only one matching transcript partner.

  • repo: https://github.com/zavilev/NMDj/

  • installation: git clone https://github.com/zavilev/NMDj.git

  • inputs: input transcripts.gtf. OPTIONAL inputs: annotation.gtf (reference GTF?), genome.fa (reference FASTA), transcripts.txt (user-supplied reference transcript IDs), file.txt (file with paths to ipsa files containing counts of RNA-Seq split reads aligned to junctions), ...

  • benchmark: benchmarked against NMDClassifier. Also benchmarked MANE Select vs. the highest-expression transcript as the reference; the highest-expression transcript wins.

  • NMDEP: an AI/ML model. Manuscript posted on arXiv in 2025.

    • repo/installation note: none, as of Mar 2026.
    • No benchmark against previous tools, but the model also has PCT as its top predictor, so it largely agrees with the literature.
  • factR/predictNMD: R function. https://rdrr.io/github/fursham-h/factR/man/predictNMD.html
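
For reference, here is a minimal sketch of the 50-NT rule that NMDClassifier and NMDj are both described as using above, in its commonly stated form: a stop codon located more than 50 nt upstream of the last exon-exon junction marks the transcript as a likely NMD target. The function names and the coordinate convention (1-based transcript position of the last base of the stop codon) are assumptions made for illustration; this is not code from either tool, and each tool may measure the distance or apply the cutoff slightly differently.

```python
from typing import List


def distance_stop_to_last_junction(exon_lengths: List[int], stop_end: int) -> int:
    """Nucleotides between the end of the stop codon and the last exon-exon junction.

    exon_lengths: exon lengths in transcript (5'->3') order.
    stop_end: 1-based transcript position of the last base of the stop codon
              (assumed convention for this sketch).
    A positive result means the stop codon lies upstream of the last junction.
    """
    if len(exon_lengths) < 2:
        raise ValueError("a single-exon transcript has no exon-exon junction")
    # Transcript position of the last base before the final exon.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_end


def is_nmd_candidate(exon_lengths: List[int], stop_end: int, threshold: int = 50) -> bool:
    """Apply the 50-NT rule: True if the stop codon ends more than
    `threshold` nt upstream of the last exon-exon junction."""
    if len(exon_lengths) < 2:
        # No junction downstream of the stop codon, so the rule cannot fire.
        return False
    return distance_stop_to_last_junction(exon_lengths, stop_end) > threshold


if __name__ == "__main__":
    exons = [100, 200, 150]  # last junction sits after transcript position 300
    print(is_nmd_candidate(exons, 240))  # 60 nt upstream  -> True
    print(is_nmd_candidate(exons, 260))  # 40 nt upstream  -> False
    print(is_nmd_candidate(exons, 400))  # stop in last exon -> False
```

The `threshold` parameter is left adjustable because the NMDClassifier benchmark mentioned above reports similar performance for cutoffs between 50 and 55 NT.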
