Skip to content

geokousis/TNT-seq

Repository files navigation

TNT-seq peak-calling and anchor-distance pipeline

Pipeline to (i) build strand-specific TNT-seq peak-summits from replicate bigWigs, (ii) compute summit to anchor distances for 5′SS / 3′SS / start / stop codon, and (iii) optionally compare to steady-state m6A peaks (Schwartz et al., 2014). This workflow is designed to reproduce the distance to anchor style plots used in Louloupi & Ntini et al. (Cell Reports, 2018; 10.1016/j.celrep.2018.05.077).


TNT-seq vs m6A normalized densities

Overview

Inputs

  • TNT-seq strand-specific bigWigs (IP and Input; 3 replicates; plus and minus)
  • hg19 chromosome sizes (hg19.chrom.sizes)
  • GENCODE v19 GTF (gencode.v19.annotation.gtf.gz) for anchors
  • Optional: mmc3.xlsx (Schwartz 2014) for steady-state m6A comparison

Outputs

  • TNT-seq peak summits: results/peaks/summits.all.bed6
  • Anchor distance tables: results/dist/*.tsv
  • Optional m6A distance tables: results/dist_m6aseq/*.tsv
  • Figures: results/figs/*.png

Requirements

Software

  • bedtools
  • bigtools (for bigWigAverageOverBed)
  • GNU parallel
  • Python 3 with: pandas numpy scipy matplotlib (optional: seaborn)

Suggested conda environment

mamba create -n tntseq -c conda-forge -c bioconda \
  python=3.11 pandas numpy scipy matplotlib seaborn \
  bedtools parallel bigtools
mamba activate tntseq

Repository layout

Recommended folder structure:

TNT_Assignment/
├── raw_data/                 # bigWig inputs
├── annotation/               # chrom.sizes, windows, GTF, split beds
├── anchors/                  # anchor BEDs (generated)
├── scripts/                  # helper scripts + pipelines
├── results/                  # outputs
├── scripts/                  # helper scripts + pipelines
├── 00_make_windows.sh        # Preprocessing helper to create windows for the genome
├── 01_prep_anchors.sh        # Preprocessing helper to create anchors based on annotation file
└── 02_run_tntseq_analysis.sh # main analysis runner

Inputs

TNT-seq bigWigs

Place strand-specific bigWigs under raw_data/ and set paths in 02_run_tntseq_analysis.sh. Example (from GSE83561):

  • Inputs: GSM3143797-799
  • IP: GSM3143800-802 Each replicate has:
  • *.plus.bw
  • *.minus.bw

So the header will be

IP1_PLUS="raw_data/GSM3143800_m6AIP1.plus.bw"                                                                                                                                                                                                           
IP1_MINUS="raw_data/GSM3143800_m6AIP1.minus.bw"                                                                                                                                                                                                         
IN1_PLUS="raw_data/GSM3143797_m6AInput1.plus.bw"                                                                                                                                                                                                        
IN1_MINUS="raw_data/GSM3143797_m6AInput1.minus.bw"                                                                                                                                                                                                      
IP2_PLUS="raw_data/GSM3143801_m6AIP2.plus.bw"                                                                                                                                                                                                           
IP2_MINUS="raw_data/GSM3143801_m6AIP2.minus.bw"                                                                                                                                                                                                         
IN2_PLUS="raw_data/GSM3143798_m6AInput2.plus.bw"                                                                                                                                                                                                        
IN2_MINUS="raw_data/GSM3143798_m6AInput2.minus.bw"                                                                                                                                                                                                     
IP3_PLUS="raw_data/GSM3143802_m6aIP3.plus.bw"                                                                                                                                                                                                           
IP3_MINUS="raw_data/GSM3143802_m6aIP3.minus.bw"                                                                                                                                                                                                         
IN3_PLUS="raw_data/GSM3143799_m6aInput3.plus.bw"                                                                                                                                                                                                        
IN3_MINUS="raw_data/GSM3143799_m6aInput3.minus.bw"

hg19 chromosome sizes

The paper used hg19; this pipeline assumes hg19 to avoid liftover.

mkdir -p annotation
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes \
  -O annotation/hg19.chrom.sizes

GENCODE v19 annotation (hg19)

wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz \
  -O annotation/gencode.v19.annotation.gtf.gz

Optional: steady-state m6A peaks (Schwartz 2014)

If you want TNT-seq vs m6A-seq comparison, provide the Excel file and set M6A=/path/to/xlsx in 02_run_tntseq_analysis.sh.


How to run

Step 0 - Build 20 bp genome windows (hg19)

Creates:

  • annotation/genome_20bp.bed
  • annotation/genome_20bp_named.bed
  • annotation/split_beds/chr*.bed (canonical chroms only random, Un etc.. will be ignoredx)
./00_make_windows.sh annotation/hg19.chrom.sizes annotation 20
# args: chrom.sizes  outdir  window_size

Step 1 - Build anchors from GTF (hg19)

Creates (BED6, 1bp anchors, strand-aware):

  • anchors/anchors_5SS.bed
  • anchors/anchors_3SS.bed
  • anchors/anchors_startcodon.bed
  • anchors/anchors_stopcodon.bed
./01_prep_anchors.sh annotation/gencode.v19.annotation.gtf.gz anchors scripts
# args: gtf.gz  out_anchor_dir  scripts_dir

Step 2 - Run the full TNT-seq analysis

This produces TNT-seq summits, distance tables, and plots; optionally includes m6A-seq comparison if M6A is set and the file exists.

./02_run_tntseq_analysis.sh
# configure paths at the top of the script (bigWigs, OUTDIR, ANCHORS, M6A)

What Step 2 does

  1. Coverage matrices (20 bp bins) For each replicate and strand:
  • computes IP and Input coverage per 20 bp bin using bigWigAverageOverBed
  • writes results/matrx/repX_{plus,minus}_matrix.txt (chr, start, end, IP_sum, Input_sum)
  1. Bin scoring (per replicate) Runs score_bins.py to compute per-bin enrichment/statistics and writes:
  • results/scored/repX.{plus,minus}.bed
  1. Consensus bins across replicates This step enforces two filters:
    1. Statistical significance in all replicates (FDR / q-value)
      • Each replicate is scored score_bins.py and only bins with q < 0.05 are kept.
      • An all-3 intersection is then taken so the bin must be significant (q < 0.05) in rep1 AND rep2 AND rep3 (coordinate match, strand-specific)
    2. Enrichment support across replicates (Fold enrichment)
      • For bins passing the all-3 q-filter, it's required FE ≥ 4 in at least 2 out of 3 replicates (2/3 rule).
        Outputs:
  • results/final_bins/sigbins.plus.bed6
  • results/final_bins/sigbins.minus.bed6
  1. Peak merging and summits
  • merges adjacent significant bins into peaks (strand preserved as BED6)

  • maps bins → peak IDs

  • calls per-peak summit (via make_summits.py)

  • writes:

    • results/peaks/summits.plus.bed6
    • results/peaks/summits.minus.bed6
    • results/peaks/summits.all.bed6
  1. Distances to anchors
  • sorts summits + anchors

  • computes nearest anchor distances using bedtools closest -sorted -s -t first -d

  • writes:

    • results/dist/summit_to_5SS.tsv
    • results/dist/summit_to_3SS.tsv
    • results/dist/summit_to_start.tsv
    • results/dist/summit_to_stop.tsv
  1. Optional: steady-state m6A-seq comparison If M6A exists:
  • converts Excel peaks to BED6 summits (xlsx_to_summits_bed.py, sheet "Human Peaks")

  • computes m6A summit distances to the same anchors

  • writes:

    • results/m6aseq/peaks/summits.all.bed6
    • results/dist_m6aseq/*.tsv
  1. Plotting Generates two-line plots (TNT-seq vs m6A-seq):
  • raw counts: results/figs/fig_TNT_vs_m6A.png
  • normalized density: results/figs/fig_TNT_vs_m6A_norm.png (value/MAX per window, so max value is 1)

Outputs

Core

  • results/peaks/summits.all.bed6 Strand-aware TNT-seq summits (BED6)

  • results/dist/*.tsv Summit-to-anchor closest results (bedtools closest output)

Optional (m6A-seq)

  • results/m6aseq/peaks/summits.all.bed6
  • results/dist_m6aseq/*.tsv

Figures

  • results/figs/fig_TNT_vs_m6A.png TNT-seq vs m6A raw counts
  • results/figs/fig_TNT_vs_m6A_norm.png TNT-seq vs m6A normalized densities

Notes & Results

My plots are close to Fig. 1C–F but not identical. This is probably for two reasons:

  1. The paper computed coverage from BAM files with bedtools coverageBed (bam files), while we used bigWig signal. Small differences in how coverage is summarized can change IP/Input values and therefore the bins that pass the statistical filters.
  2. The authors likely used a slightly different set of anchors (especially for start/stop codons and transcript isoforms). If the anchor definitions differ, the “distance-to-nearest-anchor” frequencies will also shift. This is also suggested by the fact that the public steady-state m6A dataset shows frequency differences too.

Even so, the main pattern matches the paper: TNT-seq shows strong enrichment near the 5′ and 3′ splice junctions, and it also shows enrichment near the start and stop codons. However, the start/stop profiles are generally broader, weaker, and noisier than the splice-junction signal. The normalized plot makes this easier to see: the overall trends remain the same even if the exact peak heights differ, confirming that TNT-seq and m6A-seq follow similar overall profiles.


Citations

  • Louloupi, A. & Ntini, E. et al. Cell Reports (2018). 10.1016/j.celrep.2018.05.077
  • Schwartz, S. et al. Cell Reports (2014). 10.1016/j.celrep.2014.05.048

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published