TNT-seq peak-calling and anchor-distance pipeline

Pipeline to (i) build strand-specific TNT-seq peak summits from replicate bigWigs, (ii) compute summit-to-anchor distances for the 5′SS, 3′SS, start codon, and stop codon, and (iii) optionally compare them to steady-state m6A peaks (Schwartz et al., 2014). The workflow is designed to reproduce the distance-to-anchor plots in Louloupi & Ntini et al. (Cell Reports, 2018; 10.1016/j.celrep.2018.05.077).

Overview

Inputs

TNT-seq strand-specific bigWigs (IP and Input; 3 replicates; plus and minus)
hg19 chromosome sizes (hg19.chrom.sizes)
GENCODE v19 GTF (gencode.v19.annotation.gtf.gz) for anchors
Optional: mmc3.xlsx (Schwartz 2014) for steady-state m6A comparison

Outputs

TNT-seq peak summits: results/peaks/summits.all.bed6
Anchor distance tables: results/dist/*.tsv
Optional m6A distance tables: results/dist_m6aseq/*.tsv
Figures: results/figs/*.png

Requirements

Software

bedtools
bigtools (for bigWigAverageOverBed)
GNU parallel
Python 3 with: pandas numpy scipy matplotlib (optional: seaborn)

Suggested conda environment

mamba create -n tntseq -c conda-forge -c bioconda \
  python=3.11 pandas numpy scipy matplotlib seaborn \
  bedtools parallel bigtools
mamba activate tntseq

Repository layout

Recommended folder structure:

TNT_Assignment/
├── raw_data/                 # bigWig inputs
├── annotation/               # chrom.sizes, windows, GTF, split beds
├── anchors/                  # anchor BEDs (generated)
├── scripts/                  # helper scripts + pipelines
├── results/                  # outputs
├── 00_make_windows.sh        # Preprocessing helper to create windows for the genome
├── 01_prep_anchors.sh        # Preprocessing helper to create anchors based on annotation file
└── 02_run_tntseq_analysis.sh # main analysis runner

Inputs

TNT-seq bigWigs

Place strand-specific bigWigs under raw_data/ and set paths in 02_run_tntseq_analysis.sh. Example (from GSE83561):

Input: GSM3143797-799
IP: GSM3143800-802

Each replicate has:
*.plus.bw
*.minus.bw

The path variables at the top of 02_run_tntseq_analysis.sh then look like this:

IP1_PLUS="raw_data/GSM3143800_m6AIP1.plus.bw"
IP1_MINUS="raw_data/GSM3143800_m6AIP1.minus.bw"
IN1_PLUS="raw_data/GSM3143797_m6AInput1.plus.bw"
IN1_MINUS="raw_data/GSM3143797_m6AInput1.minus.bw"
IP2_PLUS="raw_data/GSM3143801_m6AIP2.plus.bw"
IP2_MINUS="raw_data/GSM3143801_m6AIP2.minus.bw"
IN2_PLUS="raw_data/GSM3143798_m6AInput2.plus.bw"
IN2_MINUS="raw_data/GSM3143798_m6AInput2.minus.bw"
IP3_PLUS="raw_data/GSM3143802_m6aIP3.plus.bw"
IP3_MINUS="raw_data/GSM3143802_m6aIP3.minus.bw"
IN3_PLUS="raw_data/GSM3143799_m6aInput3.plus.bw"
IN3_MINUS="raw_data/GSM3143799_m6aInput3.minus.bw"

hg19 chromosome sizes

The paper used hg19; this pipeline assumes hg19 to avoid liftover.

mkdir -p annotation
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes \
  -O annotation/hg19.chrom.sizes

GENCODE v19 annotation (hg19)

wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz \
  -O annotation/gencode.v19.annotation.gtf.gz

Optional: steady-state m6A peaks (Schwartz 2014)

If you want TNT-seq vs m6A-seq comparison, provide the Excel file and set M6A=/path/to/xlsx in 02_run_tntseq_analysis.sh.

How to run

Step 0 - Build 20 bp genome windows (hg19)

Creates:

annotation/genome_20bp.bed
annotation/genome_20bp_named.bed
annotation/split_beds/chr*.bed (canonical chromosomes only; random, Un, etc. are ignored)

./00_make_windows.sh annotation/hg19.chrom.sizes annotation 20
# args: chrom.sizes  outdir  window_size
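
As a rough illustration of what Step 0 produces, the sketch below tiles each chromosome into fixed-size BED windows. It is not the code in 00_make_windows.sh: the function name make_windows, the "chrom:start-end" naming scheme, and the rule of skipping any contig whose name contains an underscore are assumptions made for this example.

```python
from typing import Dict, Iterator, Tuple

Window = Tuple[str, int, int, str]


def make_windows(chrom_sizes: Dict[str, int], size: int = 20) -> Iterator[Window]:
    """Yield (chrom, start, end, name) windows in BED coordinates (0-based, half-open)."""
    for chrom, length in chrom_sizes.items():
        # Assumption: names with "_" (chrUn_*, *_random, haplotypes) are non-canonical.
        if "_" in chrom:
            continue
        for start in range(0, length, size):
            end = min(start + size, length)  # the last window may be shorter
            # Hypothetical naming scheme giving each window a unique name.
            yield chrom, start, end, f"{chrom}:{start}-{end}"


# Toy chromosome sizes, not real hg19 values.
windows = list(make_windows({"chr1": 50, "chrUn_gl000220": 100}, size=20))
for w in windows:
    print(*w, sep="\t")
```

The named variant of the window file matters because bigWigAverageOverBed expects a unique name in the fourth BED column.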

Step 1 - Build anchors from GTF (hg19)

Creates (BED6, 1bp anchors, strand-aware):

anchors/anchors_5SS.bed
anchors/anchors_3SS.bed
anchors/anchors_startcodon.bed
anchors/anchors_stopcodon.bed

./01_prep_anchors.sh annotation/gencode.v19.annotation.gtf.gz anchors scripts
# args: gtf.gz  out_anchor_dir  scripts_dir
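
To make "1bp anchors, strand-aware" concrete, here is a minimal sketch of how splice-site anchors could be derived from the exons of one transcript. The function splice_site_anchors is hypothetical, and placing the anchor on the last or first exonic base (rather than the first or last intronic base) is an assumption; the helper scripts called by 01_prep_anchors.sh may define the positions differently.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def splice_site_anchors(exons: List[Interval], strand: str) -> Tuple[List[Interval], List[Interval]]:
    """Return (five_ss, three_ss) as 1-bp BED intervals for one transcript.

    exons are (start, end) in BED coordinates (0-based, half-open).
    """
    exons = sorted(exons)
    five_ss, three_ss = [], []
    for i, (start, end) in enumerate(exons):
        intron_left = i > 0                 # an intron lies at lower coordinates
        intron_right = i < len(exons) - 1   # an intron lies at higher coordinates
        if strand == "+":
            if intron_right:
                five_ss.append((end - 1, end))        # donor: last base of the exon
            if intron_left:
                three_ss.append((start, start + 1))   # acceptor: first base of the exon
        else:
            # On the minus strand the transcript runs from high to low coordinates,
            # so the roles of the two exon ends are swapped.
            if intron_left:
                five_ss.append((start, start + 1))
            if intron_right:
                three_ss.append((end - 1, end))
    return five_ss, three_ss


toy_exons = [(100, 200), (300, 400), (500, 600)]
plus_five, plus_three = splice_site_anchors(toy_exons, "+")
minus_five, minus_three = splice_site_anchors(toy_exons, "-")
```

Writing each interval out with the transcript's strand in column 6 gives the BED6 shape listed above.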

Step 2 - Run the full TNT-seq analysis

This produces TNT-seq summits, distance tables, and plots; optionally includes m6A-seq comparison if M6A is set and the file exists.

./02_run_tntseq_analysis.sh
# configure paths at the top of the script (bigWigs, OUTDIR, ANCHORS, M6A)

What Step 2 does

Coverage matrices (20 bp bins)

For each replicate and strand:

computes IP and Input coverage per 20 bp bin using bigWigAverageOverBed
writes results/matrx/repX_{plus,minus}_matrix.txt (chr, start, end, IP_sum, Input_sum)

Bin scoring (per replicate)

Runs score_bins.py to compute per-bin enrichment/statistics and writes:

results/scored/repX.{plus,minus}.bed

Consensus bins across replicates

This step enforces two filters:
1. Statistical significance in all replicates (FDR / q-value)
  - Each replicate is scored with score_bins.py, and only bins with q < 0.05 are kept.
  - The intersection across all three replicates is then taken, so a bin must be significant (q < 0.05) in rep1 AND rep2 AND rep3 (coordinate match, strand-specific).
2. Enrichment support across replicates (fold enrichment, FE)
  - Bins passing the all-3 q-filter must also have FE ≥ 4 in at least 2 of the 3 replicates (2/3 rule).

Outputs:

results/final_bins/sigbins.plus.bed6
results/final_bins/sigbins.minus.bed6

Peak merging and summits

merges adjacent significant bins into peaks (strand preserved as BED6)
maps bins → peak IDs
calls per-peak summit (via make_summits.py)
writes:
- results/peaks/summits.plus.bed6
- results/peaks/summits.minus.bed6
- results/peaks/summits.all.bed6
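
A minimal sketch of the merge-then-summit idea follows. How make_summits.py actually picks the summit is not documented here, so the choice below (midpoint of the highest-scoring bin in each peak, first bin winning ties) is an assumption, as are the function name and the tuple layout.

```python
from typing import List, Tuple

Bin = Tuple[str, int, int, str, float]   # (chrom, start, end, strand, score)
Peak = Tuple[str, int, int, str, int]    # (chrom, start, end, strand, summit)


def merge_and_call_summits(bins: List[Bin]) -> List[Peak]:
    """Merge touching or overlapping same-strand bins and report one summit per peak."""
    peaks = []
    # Sort by chromosome, strand, then start so adjacent bins are consecutive.
    for chrom, start, end, strand, score in sorted(bins, key=lambda b: (b[0], b[3], b[1])):
        last = peaks[-1] if peaks else None
        if last and last["chrom"] == chrom and last["strand"] == strand and start <= last["end"]:
            last["end"] = max(last["end"], end)
            if score > last["best"][0]:
                last["best"] = (score, start, end)
        else:
            peaks.append({"chrom": chrom, "strand": strand, "start": start,
                          "end": end, "best": (score, start, end)})
    out = []
    for p in peaks:
        _, s, e = p["best"]
        # Assumption: the summit is the midpoint of the best-scoring bin.
        out.append((p["chrom"], p["start"], p["end"], p["strand"], (s + e) // 2))
    return out


toy_bins = [("chr1", 0, 20, "+", 4.0), ("chr1", 20, 40, "+", 9.0),
            ("chr1", 40, 60, "+", 5.0), ("chr1", 100, 120, "+", 6.0)]
toy_peaks = merge_and_call_summits(toy_bins)
```

Each summit can then be written as a 1-bp BED6 record (summit, summit + 1) carrying the peak's strand.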

Distances to anchors

sorts summits + anchors
computes nearest anchor distances using bedtools closest -sorted -s -t first -d
writes:
- results/dist/summit_to_5SS.tsv
- results/dist/summit_to_3SS.tsv
- results/dist/summit_to_start.tsv
- results/dist/summit_to_stop.tsv
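
For intuition, the same-strand nearest-anchor lookup can be approximated in pure Python with a sorted index and binary search. This is a sketch for 1-bp features only, not a replacement for bedtools closest: the function name and the (chrom, pos, strand) layout are assumptions, and summits with no same-strand anchor on their chromosome are returned as None here.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Site = Tuple[str, int, str]  # (chrom, 0-based position of a 1-bp feature, strand)


def nearest_anchor_distances(summits: Iterable[Site], anchors: Iterable[Site]) -> List[Optional[int]]:
    """Return the unsigned distance from each summit to the nearest same-strand anchor."""
    index = defaultdict(list)
    for chrom, pos, strand in anchors:
        index[(chrom, strand)].append(pos)
    for positions in index.values():
        positions.sort()

    distances = []
    for chrom, pos, strand in summits:
        positions = index.get((chrom, strand))
        if not positions:
            distances.append(None)  # no anchor on this chromosome and strand
            continue
        i = bisect_left(positions, pos)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = positions[max(i - 1, 0): i + 1]
        distances.append(min(abs(pos - a) for a in candidates))
    return distances


toy_anchors = [("chr1", 100, "+"), ("chr1", 200, "+"), ("chr1", 150, "-")]
toy_summits = [("chr1", 120, "+"), ("chr1", 190, "+"), ("chr1", 100, "+"),
               ("chr1", 160, "-"), ("chr2", 5, "+")]
toy_distances = nearest_anchor_distances(toy_summits, toy_anchors)
```

Note that these distances are unsigned, like the -d column; a signed upstream/downstream axis would need the relative order of summit and anchor plus the strand.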

Optional: steady-state m6A-seq comparison

If M6A exists:

converts Excel peaks to BED6 summits (xlsx_to_summits_bed.py, sheet "Human Peaks")
computes m6A summit distances to the same anchors
writes:
- results/m6aseq/peaks/summits.all.bed6
- results/dist_m6aseq/*.tsv

Plotting

Generates two-line plots (TNT-seq vs m6A-seq):

raw counts: results/figs/fig_TNT_vs_m6A.png
normalized density: results/figs/fig_TNT_vs_m6A_norm.png (value/MAX per window, so max value is 1)
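
The value/MAX normalization can be illustrated as below. The window half-width (500), the bin size (20), and the function name normalized_profile are hypothetical choices for this sketch and are not taken from the plotting script; the input is assumed to be a list of integer distances relative to the anchor.

```python
from typing import List, Tuple


def normalized_profile(distances: List[int], window: int = 500,
                       bin_size: int = 20) -> Tuple[List[int], List[float]]:
    """Histogram distances within [-window, window) and scale so the tallest bin is 1."""
    n_bins = (2 * window) // bin_size
    counts = [0] * n_bins
    for d in distances:
        if -window <= d < window:
            counts[(d + window) // bin_size] += 1
    peak = max(counts)
    # value / MAX; an empty profile stays at zero instead of dividing by zero.
    norm = [c / peak for c in counts] if peak else [0.0] * n_bins
    return counts, norm


# Toy distances, not real data; 1000 falls outside the window and is dropped.
raw_counts, norm_counts = normalized_profile([0, 5, 10, -30, 400, 1000])
```

Scaling each curve by its own maximum puts TNT-seq and m6A-seq on the same 0 to 1 axis, so the shapes can be compared even when the number of summits differs.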

Outputs

Core

results/peaks/summits.all.bed6: strand-aware TNT-seq summits (BED6)
results/dist/*.tsv: summit-to-anchor results (bedtools closest output)

Optional (m6A-seq)

results/m6aseq/peaks/summits.all.bed6
results/dist_m6aseq/*.tsv

Figures

results/figs/fig_TNT_vs_m6A.png
results/figs/fig_TNT_vs_m6A_norm.png

Notes & Results

My plots are close to Fig. 1C–F of the paper but not identical, probably for two reasons:

The paper computed coverage from BAM files with bedtools coverageBed, while this pipeline uses bigWig signal. Small differences in how coverage is summarized can change the IP/Input values and therefore which bins pass the statistical filters.
The authors likely used a slightly different set of anchors (especially for start/stop codons and transcript isoforms). If the anchor definitions differ, the distance-to-nearest-anchor frequencies shift as well. The public steady-state m6A dataset shows frequency differences too, which points the same way.

Even so, the main pattern matches the paper: TNT-seq shows strong enrichment near the 5′ and 3′ splice junctions, and it also shows enrichment near the start and stop codons, although the start/stop profiles are generally broader, weaker, and noisier than the splice-junction signal. The normalized plot makes this easier to see: the overall trends remain the same even if the exact peak heights differ, which is consistent with TNT-seq and m6A-seq following similar overall profiles.

Citations

Louloupi, A. & Ntini, E. et al. Cell Reports (2018). 10.1016/j.celrep.2018.05.077
Schwartz, S. et al. Cell Reports (2014). 10.1016/j.celrep.2014.05.048

geokousis/TNT-seq

Folders and files

Latest commit

History

Repository files navigation

TNT-seq peak-calling and anchor-distance pipeline

Overview

Requirements

Software

Suggested conda environment

Repository layout

Inputs

TNT-seq bigWigs

hg19 chromosome sizes

GENCODE v19 annotation (hg19)

Optional: steady-state m6A peaks (Schwartz 2014)

How to run

Step 0 - Build 20 bp genome windows (hg19)

Step 1 - Build anchors from GTF (hg19)

Step 2 - Run the full TNT-seq analysis

What Step 2 does

Outputs

Core

Optional (m6A-seq)

Figures

Notes & Results

Citations

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages