PoreMeth2

PoreMeth2 is an R package for identifying Differentially Methylated Regions (DMRs) between paired samples from Nanopore methylation data (called with methylation callers such as Dorado or Guppy), and for their functional interpretation.

Installation

Install devtools from CRAN (Required)

install.packages("devtools")

Install PoreMeth2

# raise the download timeout (in seconds) to avoid failures on slow connections
options(timeout=9999999)

devtools::install_github("Lab-CoMBINE/PoreMeth2")

To use PoreMeth2 for identifying DMRs, the methylation files of both samples must first be processed with the shell script provided with this package. Follow these steps:

Download or copy the additional scripts from the "AdditionalScripts" folder.
Make sure that ModkitResorter.sh and ParseModkit.pl are executable:

	chmod 755 ModkitResorter.sh
	chmod 755 ParseModkit.pl

The script that generates the entropy file requires an input file containing read-level methylation calls. These calls should be obtained by running Modkit on Guppy or Dorado output data (see 1. Data Preparation, Modkit).

Features

The functions provided by PoreMeth2 allow you to:

Detect shifts in CpGs' methylation levels and segment such shifts to obtain precise DMRs.
Predict DMRs' mechanism of origin (random or selection) by simultaneously segmenting entropy levels.
Functionally interpret DMRs by annotating them on genic and regulatory elements such as CpG Islands, Transcription Factor Binding Sites and Enhancers.

Usage

1. Data Preparation

Modkit

If modkit is not already installed, install it from https://github.com/nanoporetech/modkit (it is not included in the additional scripts). If you already have modkit output files, you can skip this step.

To run modkit on a Dorado BAM file (or any other modBAM), follow the modkit documentation or run it as follows:

REF=hg38.fasta # your FASTA reference
THN=10 # number of threads
modkit extract full --mapped-only --threads ${THN} --cpg --reference ${REF} dorado.output.bam modkit.output.tsv

Calculating Entropy from Methylation Calls

Then run ModkitResorter.sh on the modkit output to process methylation, hydroxymethylation, or combined methylation/hydroxymethylation profiles:

sh ModkitResorter.sh modkit.output.tsv

The script ModkitResorter.sh takes read-level methylation calls as input and calculates, for each CpG site:

β (the methylation level across reads)
Entropy, S (a measure of the relative proportion of possible epialleles)

The entropy output files will be generated in the same directory as modkit.output.tsv.
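To show what the beta and beta_cov columns represent, the sketch below aggregates per-read calls into a per-CpG methylation level (fraction of methylated reads) and read count. It assumes a simplified, hypothetical input with columns chrom, position and methylated (one TRUE/FALSE call per read); the real modkit extract output has a different layout, and the entropy calculation performed by ModkitResorter.sh is not reproduced here. The function name compute_beta is hypothetical.

```r
# Sketch only: per-CpG beta and coverage from simplified read-level calls.
# The input columns (chrom, position, methylated) are hypothetical and do
# not match the actual modkit extract output.
compute_beta <- function(calls) {
  # number of methylated reads per CpG
  meth <- aggregate(methylated ~ chrom + position, data = calls, FUN = sum)
  # number of reads covering each CpG
  cov <- aggregate(methylated ~ chrom + position, data = calls, FUN = length)
  out <- merge(meth, cov, by = c("chrom", "position"),
               suffixes = c("_n", "_cov"))
  data.frame(
    chrom = out$chrom,
    position = out$position,
    beta = out$methylated_n / out$methylated_cov,  # fraction methylated
    beta_cov = out$methylated_cov                  # reads used for beta
  )
}
```

For example, a CpG covered by three reads, two of them methylated, gets beta = 2/3 and beta_cov = 3.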

The output from the parser is a tab-separated file with the following structure:

column	description
chrom	chromosome
position	genomic position of the CpG
entropy	entropy ($S$) value
entropy_cov	number of reads used to calculate $S$
beta	methylation level ($\beta$) of the CpG
beta_cov	number of reads used to calculate $\beta$
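A minimal sketch of loading one of these files with base R and checking that the expected columns are present. It assumes the file starts with a header row carrying exactly the column names listed above; if it does not, the helper assigns those names itself when called with header = FALSE. The helper name read_poremeth_table is hypothetical; data.table::fread is a faster choice for genome-wide files.

```r
# Expected columns of a ModkitResorter.sh output file (see the table above)
poremeth_cols <- c("chrom", "position", "entropy", "entropy_cov",
                   "beta", "beta_cov")

# Hypothetical helper: read a tab-separated output file and validate it
read_poremeth_table <- function(path, header = TRUE) {
  tab <- read.delim(path, header = header, stringsAsFactors = FALSE)
  if (!header) names(tab) <- poremeth_cols  # assumed column order
  # fail early if any expected column is missing
  missing <- setdiff(poremeth_cols, names(tab))
  if (length(missing) > 0) {
    stop("missing columns: ", paste(missing, collapse = ", "))
  }
  tab
}
```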

2. DMR identification

PoreMeth2DMR

The R function PoreMeth2DMR performs the double segmentation of $\beta$ and $S$ to identify DMRs and calculates $\Delta\beta$ and $\Delta S$ between the Test and Control samples. We suggest importing TableTest and TableControl with the fread function (from the data.table package) or vroom (from the vroom package).

   TableDMR  <-  PoreMeth2DMR(TableTest, TableControl, omega = 0.1, eta = 1e-5, FW = 3)

Where:

TableTest and TableControl are the tables output by ModkitResorter.sh on the Test and Control samples respectively.

We strongly suggest filtering the input data on beta_cov: a minimum of 10 is appropriate for 25-30x sequencing data and should be lowered for experiments with lower coverage.
omega is an optional parameter that modulates the relative weight of the experimental and the biological variance. When omega is close to 1, the biological variance is much larger than the experimental one, while for values of omega close to 0 the experimental noise gives the leading contribution to the total variance. We suggest using omega in the range 0.1-0.5.
eta is an optional parameter representing the baseline probability that the mean process ($m_i$) of the HSLM algorithm changes its value. Suggested values lie in the $10^{-7}$-$10^{-3}$ range.
FW is the minimum number of data points (CpGs) for a DMR to be called: DMRs made of fewer than FW CpGs are discarded.
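The beta_cov filter recommended above can be sketched in base R as follows. The helper name filter_by_coverage is hypothetical, and the default threshold of 10 mirrors the suggestion for 25-30x data; CpGs with a missing coverage value are dropped as well.

```r
# Hypothetical helper: keep only CpGs whose beta estimate is supported by
# at least min_cov reads. min_cov = 10 suits ~25-30x data; lower it for
# shallower runs. Rows with NA coverage are discarded.
filter_by_coverage <- function(tab, min_cov = 10) {
  tab[!is.na(tab$beta_cov) & tab$beta_cov >= min_cov, , drop = FALSE]
}
```

The filtered tables can then be passed to the documented call, e.g. PoreMeth2DMR(filter_by_coverage(TableTest), filter_by_coverage(TableControl), omega = 0.1, eta = 1e-5, FW = 3).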

The output of PoreMeth2DMR is a data table containing the following fields:

column	description
chr	chromosome
start	DMR's start position
end	DMR's end position
DeltaBeta	$\Delta\beta$ value between Test and Control samples
DeltaEntropy	$\Delta S$ value between Test and Control samples
BetaTest	mean $\beta$ value inside the DMR's coordinates for the Test sample
BetaControl	mean $\beta$ value inside the DMR's coordinates for the Control sample
EntropyTest	mean $S$ value inside the DMR's coordinates for the Test sample
EntropyControl	mean $S$ value inside the DMR's coordinates for the Control sample
NumCpG	number of CpG dinucleotides included in the region
p	p-value associated with the DMR

Note that the function reports DMRs regardless of the magnitude of $\Delta\beta$; we recommend discarding DMRs with $|\Delta\beta| < 0.2$ to exclude unreliable results.
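A minimal sketch of this post-filtering on a PoreMeth2DMR output table, using its DeltaBeta and p columns. The helper name filter_dmrs is hypothetical, and the optional p-value cutoff is an extra, user-chosen filter rather than part of the recommendation above (its default of 1 keeps every DMR).

```r
# Hypothetical helper: keep DMRs with |DeltaBeta| >= min_abs_delta_beta
# (0.2 as recommended) and, optionally, p <= max_p.
filter_dmrs <- function(dmr, min_abs_delta_beta = 0.2, max_p = 1) {
  keep <- abs(dmr$DeltaBeta) >= min_abs_delta_beta & dmr$p <= max_p
  dmr[!is.na(keep) & keep, , drop = FALSE]
}
```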

PoreMeth1DMR

In case read-level methylation calls are not available and entropy cannot be calculated, it is possible to identify DMRs based strictly on $\beta$ shifts between samples by using the function PoreMeth1DMR from the previous version of this tool.

The output of the function is the same as that described for PoreMeth2DMR, minus the entropy-related fields.

3. DMR interpretation

PoreMethAnnotate

DMRs obtained with PoreMeth2DMR can be annotated to genic and regulatory elements with the following command:

	AnnotatedTableDMR <- PoreMethAnnotate(TableDMR, NumProc = 5, AnnotationType = "Genes", Assembly = "hg19")

Where:

TableDMR is the output table of PoreMeth2DMR
NumProc is an optional argument for the number of cores to use in parallel
AnnotationType is an optional argument that specifies whether to annotate DMRs on genic elements only ("Genes") or genic elements and regulatory features ("GenesReg").
Assembly is an optional parameter that specifies the reference version to use for annotation ("hg19" or "hg38").

The output file for "Genes" mode will contain the following additional fields:

column	description
chr.GenCode	chromosome
start.GenCode	start position of the annotated genic feature
end.GenCode	end position of the annotated genic feature
feature.GenCode	genic feature (either `Promoter`, `FirstExon`, `LastExon` or numbered `Exon`/`Intron`)
strand.GenCode	either `+` or `-`
symbol.GenCode	gene name
type.GenCode	feature type (e.g. `protein_coding`, `processed_transcript`, `lincRNA`...)
chr.GenCode.overlap	chromosome
start.GenCode.overlap	start position of the overlap with the annotated genic feature
end.GenCode.overlap	end position of the overlap with the annotated genic feature
ratio1.GenCode.overlap	ratio of the overlap's length to the genic feature's length
ratio2.GenCode.overlap	ratio of the overlap's length to the DMR's length

In "GenesReg" mode, the output additionally includes the following fields for CpG Islands (CGIs), enhancers, DNase sites and TFBSs:

name
chr
start
end
chr overlap
start overlap
end overlap
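To make the two overlap ratios of the "Genes" output concrete, the sketch below computes them for a single DMR/feature pair. It assumes closed, 1-based intervals (length = end - start + 1); how PoreMethAnnotate counts lengths is not stated here, so treat it as an approximation. The function name overlap_ratios is hypothetical.

```r
# Hypothetical helper: overlap of a DMR with an annotated feature and the
# two ratios described for the "Genes" output:
#   ratio1 = overlap length / feature length
#   ratio2 = overlap length / DMR length
# Assumes closed, 1-based intervals; non-overlapping pairs give ratios of 0.
overlap_ratios <- function(dmr_start, dmr_end, feat_start, feat_end) {
  ov_start <- max(dmr_start, feat_start)
  ov_end <- min(dmr_end, feat_end)
  ov_len <- max(0, ov_end - ov_start + 1)
  c(
    start = ov_start,
    end = ov_end,
    ratio1 = ov_len / (feat_end - feat_start + 1),
    ratio2 = ov_len / (dmr_end - dmr_start + 1)
  )
}
```

For a DMR spanning 101-200 and a feature spanning 151-350, the overlap is 151-200, so ratio1 = 50/200 = 0.25 and ratio2 = 50/100 = 0.5.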

4. Plots and statistics

PoreMeth2DMRStatistics

Given a DMR table (annotated or not), PoreMeth2DMRStatistics returns a summary of the number of hyper- and hypo-methylated regions, split by $\Delta S$ level, across genic features, CpG Islands and enhancers. Run it with:

	PoreMeth2DMRStatistics(TableDMR, Assembly = "hg19", BetaThr = 0.2, EntropyThr = 0.1, PValueThr = 0.05)

Where:

TableDMR is the output table of PoreMeth2DMR
Assembly is an optional parameter that specifies the reference version to use for statistics ("hg19" or "hg38").
BetaThr is the $\Delta \beta$ threshold applied for DMRs' classification
EntropyThr is the $\Delta S$ threshold applied for DMRs' classification
PValueThr is the p-value threshold for a DMR to be considered

The output table reports the following rows for each category:

row	description
HyperHigh	hyper-methylated ($\Delta \beta$ > |BetaThr|) and with high entropy ($\Delta S$ > |EntropyThr|)
HyperMid	hyper-methylated ($\Delta \beta$ > |BetaThr|) and with mid entropy (-|EntropyThr| < $\Delta S$ < |EntropyThr|)
HyperLow	hyper-methylated ($\Delta \beta$ > |BetaThr|) and with low entropy ($\Delta S$ < -|EntropyThr|)
HypoHigh	hypo-methylated ($\Delta \beta$ < -|BetaThr|) and with high entropy ($\Delta S$ > |EntropyThr|)
HypoMid	hypo-methylated ($\Delta \beta$ < -|BetaThr|) and with mid entropy (-|EntropyThr| < $\Delta S$ < |EntropyThr|)
HypoLow	hypo-methylated ($\Delta \beta$ < -|BetaThr|) and with low entropy ($\Delta S$ < -|EntropyThr|)
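The category definitions above can be reproduced on a plain DMR table with the following sketch. It is not PoreMeth2DMRStatistics itself: it only counts DMRs per category, without the split across genic features, CpG Islands and enhancers, and it assumes a DMR passes the p-value filter when p < PValueThr. The function names classify_dmrs and summarise_dmrs are hypothetical.

```r
# Hypothetical helper: assign each DMR to one of the six categories using
# the strict inequalities of the table above. DMRs below the DeltaBeta
# threshold, exactly on an entropy boundary, or failing the p-value
# cutoff (assumed to be p < PValueThr) get NA.
classify_dmrs <- function(dmr, BetaThr = 0.2, EntropyThr = 0.1,
                          PValueThr = 0.05) {
  bt <- abs(BetaThr)
  et <- abs(EntropyThr)
  meth <- ifelse(dmr$DeltaBeta > bt, "Hyper",
                 ifelse(dmr$DeltaBeta < -bt, "Hypo", NA))
  ent <- ifelse(dmr$DeltaEntropy > et, "High",
                ifelse(dmr$DeltaEntropy < -et, "Low",
                       ifelse(abs(dmr$DeltaEntropy) < et, "Mid", NA)))
  cls <- ifelse(is.na(meth) | is.na(ent), NA, paste0(meth, ent))
  cls[is.na(dmr$p) | dmr$p >= PValueThr] <- NA
  cls
}

# Hypothetical helper: number of DMRs per category (NA rows are not counted)
summarise_dmrs <- function(dmr, ...) {
  cats <- c("HyperHigh", "HyperMid", "HyperLow",
            "HypoHigh", "HypoMid", "HypoLow")
  table(factor(classify_dmrs(dmr, ...), levels = cats))
}
```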

PoreMeth2SingleExpQualityPlot

This function automatically produces plots for assessing input data quality, showing statistics on beta_cov and entropy_cov (see Data Preparation) and the density functions of $\beta$ and $S$. Run it with:

	PoreMeth2SingleExpQualityPlot(TableIn)

Where:

TableIn is the output from ModkitResorter.sh.
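If a quick numeric check is preferred to the plots (for instance on a headless server), the same quantities can be summarised in base R. This is a hypothetical helper, not part of PoreMeth2, and it assumes the column names listed in the Data Preparation table; the beta density is evaluated on [0, 1] because $\beta$ is a fraction.

```r
# Hypothetical helper: numeric counterpart of the quality plots, i.e.
# coverage distributions plus the density of beta and entropy values.
quality_summary <- function(tab) {
  list(
    beta_cov = summary(tab$beta_cov),
    entropy_cov = summary(tab$entropy_cov),
    beta_density = density(tab$beta, na.rm = TRUE, from = 0, to = 1),
    entropy_density = density(tab$entropy, na.rm = TRUE)
  )
}
```

The density objects can be drawn with plot(), e.g. plot(quality_summary(TableIn)$beta_density).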

PoreMeth2PairedExpQualityPlot

This function displays statistics on beta_cov and entropy_cov for the CpG dinucleotides shared by the Test and Control samples. Run it with:

	PoreMeth2PairedExpQualityPlot(TableTest,TableControl)

Where:

TableTest is the output of ModkitResorter.sh for the Test sample.
TableControl is the output of ModkitResorter.sh for the Control sample.
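The notion of "common CpGs" can be sketched as an inner join on chromosome and position. The helper name common_cpg_coverage is hypothetical and assumes the column names from the Data Preparation table; it only places the coverages of the shared CpGs side by side, it does not reproduce the plots.

```r
# Hypothetical helper: restrict a Test/Control pair to CpGs present in both
# tables (inner join on chrom and position) with their coverages side by side.
common_cpg_coverage <- function(test, control) {
  cols <- c("chrom", "position", "beta_cov", "entropy_cov")
  merge(test[, cols], control[, cols],
        by = c("chrom", "position"),
        suffixes = c("_test", "_control"))
}
```

For instance, cor(m$beta_cov_test, m$beta_cov_control) on the result m gives a quick idea of how similar the coverage profiles of the two samples are.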

PoreMeth2Plot

This function plots the $\Delta \beta$ and $\Delta S$ levels of DMRs together with genomic and regulatory annotations. Run it with:

	PoreMeth2Plot(Input, AnnotatedRes, PoreMeth2DMRResults, Meth1, Meth2)

Where:

Input is the region to plot, given as coordinates (chr:start-end) or as a gene symbol.
AnnotatedRes is the output of PoreMethAnnotate.
PoreMeth2DMRResults is the output of PoreMeth2DMR.
Meth1 is the output of ModkitResorter.sh for the first sample.
Meth2 is the output of ModkitResorter.sh for the second sample.

By default, the resulting panel shows three plots: a) the shifting levels of the DMRs, with the $\Delta \beta$ of the CpGs on which the DMRs are calculated; b) the shifting levels of the entropy, with the $\Delta S$ of the same CpGs; c) the genomic regions, namely gene features (promoters, exons, introns and strand), CGIs, DNase sites, enhancers and TFBSs. Further aesthetic parameters are available to improve the readability of the plots.
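Since Input accepts either coordinates or a gene symbol, a caller may want to check which form a string has before plotting. The sketch below is a hypothetical helper, not part of PoreMeth2; PoreMeth2Plot's own parsing may accept other formats.

```r
# Hypothetical helper: classify an Input string as "chr:start-end"
# coordinates or, failing that, as a gene symbol.
parse_region <- function(input) {
  m <- regmatches(input, regexec("^([^:]+):([0-9]+)-([0-9]+)$", input))[[1]]
  if (length(m) == 0) {
    # no coordinate match: treat the string as a gene symbol
    return(list(type = "gene", symbol = input))
  }
  list(type = "region", chr = m[2],
       start = as.numeric(m[3]), end = as.numeric(m[4]))
}
```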
