Add NMD prediction support

Add Nonsense-Mediated Decay (NMD) prediction support for pVACsplice (this might also work for pVACseq, for indels).


Contenders (TL;DR: NMDj is the top candidate based on benchmark results and accessibility):

* **aenmd**: 
  * R tool. [Paper in 2023](https://academic.oup.com/bioinformatics/article/39/9/btad556/7265396). Predicts NMD escape using 5 well-known rules.

* **NMDClassifier** (2nd best candidate?): 
  * Perl tool. [Paper in 2017](https://pmc.ncbi.nlm.nih.gov/articles/PMC5378362/).
  * Explores only 1 rule: the 50-NT rule. (Reason: the benchmark shows that 3 of the 5 rules aren't equally reliable; the 55-NT rule is equally good, yet the largest MCC value occurred at 51 NT, very close to 50 NT, so the 50-NT rule is prioritized. See section 'Detection of NMDTs'.)
  * How to install: download the .tar from bio.tools: https://bio.tools/nmd_classifier. There is no GitHub repo.
  * Inputs: transcripts.gtf, annotation GTF file (Ensembl/NCBI reference GTF), genomic sequence (FASTA?), annotation mode (ensembl or ncbi), optional filtering parameter.

* **NMDj** (top candidate?):
  * Python tool. [Paper in 2025](https://pmc.ncbi.nlm.nih.gov/articles/PMC12322889/).
  * Principle: also uses the 50-NT rule and compares the (novel?/input) transcript to a reference transcript, just like NMDClassifier. However, NMDClassifier uses the best partner transcript as the reference, whereas NMDj uses (1) the MANE Select transcript by default, or (2) the transcript with the highest expression level, in user-input mode. The most highly expressed transcript is determined by the psi value, calculated from the split-read counts reported by pyIPSA.
  * Rationale behind option 2:

    > the probability of NMDT being derived from a protein-coding transcript via AS depends not only on the similarity in their exon-intron architectures but also on their expression levels. The coding transcript with the highest expression level is more likely to be the source of NMDT [[14](https://pmc.ncbi.nlm.nih.gov/articles/PMC12322889/#R14)]. Furthermore, NMDT may be derived from different transcripts with comparable expression levels, which calls into question the validity of the approach based on the selection of only one matching transcript partner.

  * repo: https://github.com/zavilev/NMDj/
  * Installation: `git clone https://github.com/zavilev/NMDj.git`
  * Inputs: input transcripts.gtf. OPTIONAL inputs: annotation.gtf (reference GTF?), genome.fa (reference FASTA), transcripts.txt (user-supplied reference transcript IDs), file.txt (file with paths to IPSA files containing counts of RNA-Seq split reads aligned to junctions), ...
  * Benchmarked against NMDClassifier. Also benchmarked MANE Select vs. the most highly expressed transcript as the reference; the most highly expressed transcript wins.

* **NMDEP**: an AI/ML model. [Manuscript on arXiv in 2025](https://arxiv.org/html/2502.14547v1).
  * Repo/installation: none available as of Mar 2026.
  * No benchmark against previous tools, but the model also identifies PCT as the top predictor, which agrees with the literature.

* **factR/predictNMD**: R function. https://rdrr.io/github/fursham-h/factR/man/predictNMD.html 
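
The 50-NT rule shared by NMDClassifier and NMDj can be sketched roughly as below. This is a minimal illustration, not either tool's implementation, and it assumes: exon lengths are already in 5'-to-3' transcript order (strand handled upstream), the stop codon position is the 1-based transcript coordinate of its last nucleotide, and single-exon transcripts escape NMD under this rule. The function names are hypothetical.

```python
from typing import Optional, Sequence


def last_junction_position(exon_lengths: Sequence[int]) -> Optional[int]:
    """Transcript coordinate (1-based) of the last nucleotide before the last
    exon-exon junction, i.e. the summed length of all exons except the last.
    Returns None for single-exon transcripts (no junction at all)."""
    if len(exon_lengths) < 2:
        return None
    return sum(exon_lengths[:-1])


def is_nmd_target(exon_lengths: Sequence[int], stop_end: int, threshold: int = 50) -> bool:
    """Hypothetical helper applying the 50-NT rule.

    exon_lengths: exon lengths in 5'->3' transcript order.
    stop_end: 1-based transcript coordinate of the stop codon's last nucleotide.
    Returns True (predicted NMD-sensitive) when the stop codon lies more than
    `threshold` nt upstream of the last exon-exon junction.
    """
    last_ej = last_junction_position(exon_lengths)
    if last_ej is None:
        return False  # no downstream junction, so the rule predicts escape
    return last_ej - stop_end > threshold
```

For example, with exons of 100, 200 and 150 nt the last junction sits at position 300, so a stop codon ending at 120 is flagged, while one ending at 260 (40 nt upstream) or inside the last exon is not. The `threshold` parameter makes it easy to compare 50 vs. 55 NT, as in the NMDClassifier benchmark.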

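NMDj's two reference-selection modes (MANE Select by default, or the most highly expressed coding transcript) could be mirrored with something like the sketch below. It is an assumption-laden illustration, not NMDj's code: the expression values are taken as already computed (e.g. psi values derived from pyIPSA split-read counts), the fallback to MANE Select when no expression data is given is our own choice, and `choose_reference` is a hypothetical name.

```python
from typing import Dict, Optional


def choose_reference(
    expression: Dict[str, float],
    mane_select: Optional[str] = None,
) -> Optional[str]:
    """Pick a reference (partner) transcript for a novel transcript.

    expression: hypothetical mapping of protein-coding transcript ID ->
        expression estimate (e.g. a precomputed psi value).
    mane_select: MANE Select transcript ID, used when no expression data exist.
    """
    if not expression:
        # Default mode: no expression data, fall back to MANE Select
        return mane_select
    # Expression mode: most highly expressed transcript. Iterating over the
    # sorted IDs makes ties resolve to the lexicographically smallest ID,
    # because max() returns the first maximal element it encounters.
    return max(sorted(expression), key=lambda tid: expression[tid])
```

Given the quoted caveat that an NMDT may derive from several transcripts with comparable expression, a variant returning every transcript within some tolerance of the maximum might be worth considering over a single pick.
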
Add NMD prediction support #1370

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add NMD prediction support #1370

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions