RepliCNN is a tool for predicting replication timing from GLOE-Seq, TrAEL-Seq, or OK-Seq data using convolutional neural networks.
We recommend installing RepliCNN using pip:
pip install 'replicnn @ git+https://github.com/zarnackgroup/replicnn.git@main'
or
pip install 'replicnn @ git+ssh://git@github.com/zarnackgroup/replicnn.git@main'
You can also use RepliCNN as a Docker/Singularity/Apptainer container. We provide pre-built containers as well as Dockerfiles and Singularity/Apptainer definition files. Ensure that Docker, Singularity, or Apptainer is available in your PATH.
# Using Docker
user@dev:/tmp$ docker run ghcr.io/zarnackgroup/replicnn:0.1.0 --version
0.1.0
# Using Singularity
user@dev:/tmp$ singularity run docker://ghcr.io/zarnackgroup/replicnn:0.1.0 --version
0.1.0
# Using Apptainer
user@dev:/tmp$ apptainer run docker://ghcr.io/zarnackgroup/replicnn:0.1.0 --version
0.1.0
The main way to use RepliCNN is through its command-line interface.
user@dev:/tmp$ replicnn --help
usage: replicnn [-h] [-v] {prepare,train,predict,rfd_oem,ori_ter} ...
RepliCNN - Replication timing prediction and analyses
positional arguments:
{prepare,train,predict,rfd_oem,ori_ter}
Commands
prepare Prepare data format for this tool.
train Train a model.
predict Predict timing for file.
rfd_oem Compute RFD or OEM tracks from Watson/Crick BigWig files.
ori_ter Detect replication origins (ORIs) and termination zones (TERMs) from RFD/OEM tracks.
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
For additional help and documentation, please check out replicnn --help, replicnn {prepare,train,predict,rfd_oem,ori_ter} --help, or the corresponding publication.
Below you will find a more detailed explanation of the subcommands, their arguments, how they work, and what they do.
replicnn prepare
replicnn prepare is used when you want to predict replication timing from 3' end sequencing data.
Prepare takes the data in bigWig format, split by forward and reverse strand. The forward/reverse bigWig files can be created from the BAM files using deepTools bamCoverage or a similar tool (we do not recommend binning here).
The binsize argument corresponds to the prediction resolution of RepliCNN. We recommend adjusting it to the genome size of the organism so that the genome is divided into 10,000 to 300,000 bins. This corresponds to a binsize of 500 bp for yeast and 10 kb for human and mouse.
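To check whether a candidate binsize lands in the recommended range, the number of bins can be estimated from the chromosome sizes. The following is a minimal sketch, assuming that each chromosome is tiled separately and that a partial last bin counts as a bin; the function names and the toy genome are hypothetical and not part of RepliCNN.

```python
import math

def count_bins(chrom_sizes, binsize):
    """Number of bins when every chromosome is tiled separately (assumption)."""
    return sum(math.ceil(length / binsize) for length in chrom_sizes.values())

def in_recommended_range(n_bins, low=10_000, high=300_000):
    """True if the bin count lies in the range recommended above."""
    return low <= n_bins <= high

# Hypothetical toy genome with 12 Mb in total; use your own chrom.sizes values.
toy_genome = {"chrA": 6_000_000, "chrB": 4_000_000, "chrC": 2_000_000}

for binsize in (500, 10_000):
    n_bins = count_bins(toy_genome, binsize)
    print(binsize, n_bins, in_recommended_range(n_bins))
```

For this toy genome, a binsize of 500 bp falls inside the recommended range, whereas 10 kb yields too few bins.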
The chromsizes argument expects a chromosome sizes file as it can be found at "https://hgdownload.cse.ucsc.edu/goldenpath/XXX/bigZips/XXX.chrom.sizes". The file should only include the chromosomes that the tool should use. This is the place to select which chromosomes are used for training and prediction.
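A downloaded chrom.sizes file often also lists the mitochondrial genome and unplaced contigs. The sketch below shows one way to reduce such a file to the main chromosomes before passing it to RepliCNN. The regular expression and the example entries are assumptions for illustration; which chromosomes to keep is your own decision.

```python
import re

def filter_chrom_sizes(lines, keep_pattern=r"^chr(\d+|[IVX]+|Y)$"):
    """Keep only 'name<TAB>size' lines whose chromosome name matches keep_pattern."""
    keep = re.compile(keep_pattern)
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        name, size = line.split("\t")[:2]
        if keep.match(name):
            kept.append(f"{name}\t{int(size)}")
    return kept

# Hypothetical entries; real files contain the actual chromosome lengths.
example = ["chr1\t1000", "chr2\t800", "chrM\t16", "chr1_random\t50"]
print("\n".join(filter_chrom_sizes(example)))
```

The default pattern keeps numbered chromosomes, Roman-numeral chromosomes (as used for yeast), and chrX/chrY, and drops everything else.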
The outpath should be a path to a file where the results should be written to.
The phasing parameter invert needs to be adjusted depending on the type of experiment. The data needs to be oriented such that, in RFD tracks, a sign switch from negative to positive corresponds to an ORI/IZ.
The timing file is in bedGraph format and provides the gold-standard timing that RepliCNN uses during training. Its bin size does not need to match the prepare binsize; differences are interpolated. This parameter is optional.
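To illustrate what interpolating a timing track onto a different bin size can look like, here is a minimal sketch that maps bedGraph intervals onto bin centres by linear interpolation. The choice of linear interpolation, the use of interval midpoints, and the function name are assumptions made for this example; the method RepliCNN uses internally is not described here.

```python
from bisect import bisect_left

def interpolate_to_bins(intervals, chrom_length, binsize):
    """Linearly interpolate sorted (start, end, value) intervals onto bin centres."""
    xs = [(start + end) / 2 for start, end, _ in intervals]
    ys = [value for _, _, value in intervals]
    result = []
    for bin_start in range(0, chrom_length, binsize):
        centre = bin_start + binsize / 2
        if centre <= xs[0]:
            result.append(ys[0])       # before the first midpoint: hold the first value
        elif centre >= xs[-1]:
            result.append(ys[-1])      # after the last midpoint: hold the last value
        else:
            j = bisect_left(xs, centre)
            x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
            result.append(y0 + (y1 - y0) * (centre - x0) / (x1 - x0))
    return result

# Hypothetical timing track at 1 kb resolution, mapped onto 500 bp bins.
timing = [(0, 1000, 0.0), (1000, 2000, 1.0)]
print(interpolate_to_bins(timing, chrom_length=2000, binsize=500))
```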
user@dev:/tmp$ replicnn prepare --help
usage: replicnn prepare [-h] -fwd FORWARD -rev REVERSE -bs BINSIZE -cs CHROMSIZES -o OUTPATH [-t TIMING] [-i] [-nl]
RepliCNN prepare - Prepare a file in the SDF format for usage in the tool and user specific analyses.
options:
-h, --help show this help message and exit
-fwd, --forward FORWARD
Path to the forward bigWig file.
-rev, --reverse REVERSE
Path to the reverse bigWig file.
-bs, --binsize BINSIZE
Binsize to use.
-cs, --chromsizes CHROMSIZES
Path to a chromsizes file.
-o, --outpath OUTPATH
File where the output should be written to.
-t, --timing TIMING Path to a timing file.
-i, --invert Invert phasing of the track.
-nl, --nolog Disable logging.
replicnn train
replicnn train is used to train a model for predicting replication timing.
The input is one or multiple files from the prepare step.
The outpath gives a folder where the Keras model is saved.
The GPU parameter enables model training on the GPU, if it is available. Availability is logged. GPU training greatly increases training speed and is highly recommended.
The windowsize parameter defines how many adjacent windows around the to-be-predicted bin are used as the context window. It needs to be the same for training and prediction.
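To get a feeling for how much genomic context a given windowsize covers, the window can be converted into base pairs. This sketch assumes that the windowsize is counted in bins of the prepare binsize and that the window is centred on the predicted bin (which an odd default such as 201 suggests); both points are assumptions, and the helper names are hypothetical.

```python
def flank_bins(windowsize):
    """Bins on each side of the centre bin, assuming an odd, centred window."""
    if windowsize % 2 == 0:
        raise ValueError("this sketch assumes an odd window size")
    return (windowsize - 1) // 2

def context_span_bp(windowsize, binsize):
    """Total genomic span of the context window in base pairs."""
    return windowsize * binsize

for binsize in (500, 10_000):
    print(binsize, flank_bins(201), context_span_bp(201, binsize))
```

Under these assumptions, the default of 201 spans about 100 kb at a binsize of 500 bp and about 2 Mb at a binsize of 10 kb.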
The epochs parameter sets how many training rounds are run over the data. It is advisable to keep this parameter at its default of 300.
The batchsize parameter sets how many records are processed at once. The larger the GPU's memory, the larger this parameter can be.
The NoEarlyStopping parameter disables early stopping during model training. Early stopping tries to prevent overtraining/overfitting of the model. It is highly advisable to keep early stopping enabled.
The validation split gives the share of the data that is held out during training to estimate model performance (default 0.1).
The learning rate is passed to the neural network's optimiser (Adam).
The Crossvalidation parameter implements the Leave-One-Chromosome-Out Cross validation (LOCO-CV) as described in the publication.
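The idea behind leave-one-chromosome-out cross-validation can be sketched in a few lines: each chromosome is held out once for testing while the model is trained on all others. This is a generic illustration of the scheme, not RepliCNN's internal code; the function name is hypothetical.

```python
def loco_folds(chromosomes):
    """Yield (held_out, training) pairs, holding out each chromosome exactly once."""
    for held_out in chromosomes:
        training = [c for c in chromosomes if c != held_out]
        yield held_out, training

for held_out, training in loco_folds(["chrI", "chrII", "chrIII"]):
    print(f"test on {held_out}, train on {', '.join(training)}")
```

Because every fold requires a separate training run, the run time grows with the number of chromosomes in the chromsizes file.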
user@dev:/tmp$ replicnn train --help
usage: replicnn train [-h] -i INPUT [INPUT ...] -o OUTPATH [-g] [-ws WINDOWSIZE] [-e EPOCHS] [-bs BATCHSIZE] [-nes] [-v VALIDATIONSPLIT] [-lr LEARNINGRATE] [-cv] [-nl]
RepliCNN train - Train a model using SDF-file(s). Model quality can be assessed using the -cv option performing a Leave-One-Chromosome-Out Cross-Validation.
options:
-h, --help show this help message and exit
-i, --input INPUT [INPUT ...]
Path(-s) to one/multiple sdf file(-s).
-o, --outpath OUTPATH
Folder where the model should be written to.
-g, --gpu Enables training on gpu. Defaults to False
-ws, --windowsize WINDOWSIZE
Window size for chunks. Defaults to 201.
-e, --epochs EPOCHS Number of epochs to train for. Defaults to 300.
-bs, --batchsize BATCHSIZE
Batch size. Defaults to 32.
-nes, --noearlystopping
Whether to inactivate early stopping during training. Defaults to False.
-v, --validationsplit VALIDATIONSPLIT
Percent of data used as validation. Defaults to 0.1.
-lr, --learningrate LEARNINGRATE
Learning rate for Adam optimizer. Defaults to 0.001.
-cv, --crossvalidate Leave-One-Chromosome-Out Cross-Validation on the given dataset. Only compatible with one SDF-file.
-nl, --nolog Disable logging.
replicnn predict
replicnn predict is used after replicnn train has created a model. It predicts replication timing.
Modelpath gives the path of the saved model.
Outpath specifies where to save the output.
GPU enables prediction on the GPU. This is highly recommended as it strongly speeds up inference.
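The prepare, train, and predict steps can be chained from a Python script. The sketch below only assembles and prints the commands using the documented flags; all file names are hypothetical, and MODEL_FILE is a placeholder because the name of the model file inside the output folder is not specified here. Set dry_run=False to actually execute the commands.

```python
import shlex
import subprocess

def run(cmd, dry_run=True):
    """Print a command and, unless dry_run is set, execute it."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)

def prepare_cmd(fwd, rev, out, timing=None):
    """Build a 'replicnn prepare' call; binsize and chromsizes are example values."""
    cmd = ["replicnn", "prepare", "-fwd", fwd, "-rev", rev,
           "-bs", "500", "-cs", "genome.chrom.sizes", "-o", out]
    if timing is not None:
        cmd += ["-t", timing]  # gold-standard timing, only needed for training data
    return cmd

steps = [
    prepare_cmd("train.fwd.bw", "train.rev.bw", "train.sdf", timing="timing.bedgraph"),
    ["replicnn", "train", "-i", "train.sdf", "-o", "model_dir", "-g"],
    prepare_cmd("query.fwd.bw", "query.rev.bw", "query.sdf"),
    # MODEL_FILE is a placeholder: point -m at the model written by the train step.
    ["replicnn", "predict", "-i", "query.sdf", "-m", "model_dir/MODEL_FILE",
     "-o", "query_prediction", "-g"],
]

for step in steps:
    run(step)  # dry run: only prints the commands
```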
user@dev:/tmp$ replicnn predict --help
usage: replicnn predict [-h] -i INPUT -m MODELPATH [-o OUTPATH] [-g] [-nl]
RepliCNN predict - Predict timing for a SDF-file using a previously trained model.
options:
-h, --help show this help message and exit
-i, --input INPUT Path to one sdf-file.
-m, --modelpath MODELPATH
Path to a model file.
-o, --outpath OUTPATH
File where the output should be written to.
-g, --gpu Enables prediction on gpu. Defaults to False
-nl, --nolog Disable logging.
replicnn rfd_oem
rfd_oem is the utility to create replication fork directionality (RFD) and origin efficiency metric (OEM) tracks.
rfd_oem takes the data in bigWig format, split by forward and reverse strand. The forward/reverse bigWig files can be created from the BAM files using deepTools bamCoverage or a similar tool (we do not recommend binning here).
The chromsizes argument expects a chromosome sizes file as it can be found at "https://hgdownload.cse.ucsc.edu/goldenpath/XXX/bigZips/XXX.chrom.sizes".
Output prefix gives the prefix that should be used for the output files.
Resolution gives the size of the window around each position that is factored into the calculation of the respective track. For details, please check the formulas in the publication. We generally recommend resolutions in the order of 50000, 75000, 100000, and/or 150000 for human and mouse and 2500, 5000, 10000, and/or 15000 for yeast. Smaller resolutions provide a finer, more detailed view of the replication landscape but are more prone to noise. Larger resolutions capture more general trends at the cost of detail.
Stride defines the step size of the output track. A stride of 1 means that the tracks are calculated for every nucleotide. Larger strides take longer steps. This is a tradeoff between resolution and file size. We recommend strides of 1-100 for yeast and 10-1000 for human.
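How resolution and stride interact can be illustrated with a toy RFD calculation. The sketch below sums per-base forward and reverse signal in a window of `resolution` bp around every `stride`-th position and computes (fwd - rev) / (fwd + rev). This sign convention and the windowing details are assumptions for illustration only; the exact formulas used by RepliCNN are given in the publication, and the function name is hypothetical.

```python
def windowed_rfd(fwd, rev, resolution, stride):
    """Return (position, RFD) pairs computed from per-base strand coverage."""
    half = max(1, resolution // 2)
    n = len(fwd)
    track = []
    for pos in range(0, n, stride):
        lo, hi = max(0, pos - half), min(n, pos + half)  # clip window at the ends
        f, r = sum(fwd[lo:hi]), sum(rev[lo:hi])
        total = f + r
        track.append((pos, (f - r) / total if total else 0.0))
    return track

# Toy coverage: reverse signal on the left, forward signal on the right.
fwd = [0, 0, 0, 0, 4, 4, 4, 4]
rev = [4, 4, 4, 4, 0, 0, 0, 0]
print(windowed_rfd(fwd, rev, resolution=4, stride=2))
```

In this toy example the track switches from negative to positive in the middle, which is the orientation that marks an ORI/IZ. A larger stride yields fewer output points and thus smaller files.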
Track defines which track type should be generated.
Bedgraph defines that the output should be written into bedgraph format instead of bigwig.
NoNormDepth is a parameter to disable depth normalisation. Generally it is expected that the fwd and rev bigwigs have the same signal strength. If this is not the case, RepliCNN adjusts this. This behavior can be disabled.
The phasing parameter invert needs to be adjusted depending on the type of experiment. The data needs to be oriented such that, in RFD tracks, a sign switch from negative to positive corresponds to an ORI/IZ.
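Since tracks are typically needed at several resolutions, it can be convenient to generate the rfd_oem calls in a loop. The following sketch builds one command per track type and resolution using the documented flags. The input file names and the output prefix scheme are hypothetical; the resolutions and stride are taken from the yeast recommendations above.

```python
import shlex

def rfd_oem_commands(watson, crick, chromsizes, prefix, resolutions, stride, invert=False):
    """Build one 'replicnn rfd_oem' command per track type and resolution."""
    commands = []
    for track in ("rfd", "oem"):
        for res in resolutions:
            cmd = ["replicnn", "rfd_oem", "-w", watson, "-c", crick,
                   "-cs", chromsizes, "-o", f"{prefix}_{track}_{res}",
                   "-res", str(res), "-st", str(stride), "-t", track]
            if invert:
                cmd.append("-inv")  # swap Watson/Crick if the assay requires it
            commands.append(cmd)
    return commands

cmds = rfd_oem_commands("sample.fwd.bw", "sample.rev.bw", "genome.chrom.sizes",
                        "sample", resolutions=[2500, 5000, 10000, 15000], stride=10)
for cmd in cmds:
    print(shlex.join(cmd))
```

The printed commands can be pasted into a shell or executed with subprocess.run(cmd, check=True).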
user@dev:/tmp$ replicnn rfd_oem --help
usage: replicnn rfd_oem [-h] -w WATSON -c CRICK -cs CHROMSIZES -o OUTPUT_PREFIX -res RESOLUTION -st STRIDE -t {rfd,oem} [-bg] [-nd] [-inv]
RepliCNN analyse - Compute replication fork directionality (RFD) or origin efficiency metric (OEM) from strand-specific BigWig files and write the results as BigWig or bedGraph.
options:
-h, --help show this help message and exit
-w, --watson WATSON Path to Watson strand BigWig file.
-c, --crick CRICK Path to Crick strand BigWig file.
-cs, --chromsizes CHROMSIZES
Path to chromosome sizes file.
-o, --output_prefix OUTPUT_PREFIX
Prefix for output file(s).
-res, --resolution RESOLUTION
Window size in bp.
-st, --stride STRIDE Stride (step size in bp).
-t, --track {rfd,oem}
Track to compute: 'rfd' or 'oem'.
-bg, --bedgraph Write output as bedGraph instead of BigWig.
-nd, --no_norm_depth Do not normalize depth balance.
-inv, --invert Swap Watson/Crick signals.
replicnn ori_ter
ori_ter is used to detect origins of replication (ORIs), initiation zones (IZs), and termination zones (TERMs) from OEM and RFD tracks.
The input usually consists of multiple RFD and OEM tracks at different resolutions. The advantage of multiple resolutions is that both fine-grained and coarser signatures can be found. This balance can be adjusted by supplying more higher-resolution or more lower-resolution tracks.
Chromsizes expects a chromsizes file.
Output prefix gives the prefix of all output files.
Save intermediates saves all intermediate files from this stepwise process.
ORI and TER threshold give a percentage of signal that is used for recentering the ORI/TER. E.g. 5% recenters the ORI to the 5% decrease of maximal peak signal. We recommend 0.05 for ORI and 0.15 for TERMs.
Window radius is the radius around the center of the called ORI/TERM candidate to look for a local extremum.
Max merge size gives the maximum number of base pairs between candidate ORIs/TERMs for them to be merged.
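Merging nearby candidates can be pictured as a standard interval merge with a gap tolerance. This sketch assumes that the distance is measured from the end of one candidate to the start of the next; it illustrates the idea and is not RepliCNN's actual implementation. The function name and coordinates are hypothetical.

```python
def merge_candidates(intervals, max_merge_size):
    """Merge (start, end) intervals whose gap is at most max_merge_size bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] <= max_merge_size:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous region
        else:
            merged.append([start, end])              # start a new region
    return [tuple(region) for region in merged]

candidates = [(100, 200), (250, 300), (1000, 1100)]
print(merge_candidates(candidates, max_merge_size=100))
```

Here the first two candidates are only 50 bp apart and are merged, while the third stays separate.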
N evidence gives the number of tracks the ORI/TERM needs to be identified in to be considered. This corresponds to the number of resolutions that it has to be found in.
Eval resolution is the OEM track resolution that is used to assign each ORI/TERM a score. The score is written as a value up to 999, with higher values indicating a better OEM score.
Cutoff filters ORI/TERM candidates by a fixed threshold to exclude low-quality candidates.
Smooth factor base gives a smoothing parameter that is used during the spline approximation when finding ORIs/TERMs to smooth out very small signal variations.
user@dev:/tmp$ replicnn ori_ter --help
usage: replicnn ori_ter [-h] -i INPUT [INPUT ...] -cs CHROMSIZES -o OUTPUT_PREFIX [-si] [-nl] [--ori-threshold ORI_THRESHOLD] [--ter-threshold TER_THRESHOLD] [--window-radius WINDOW_RADIUS] [--max-merge-size MAX_MERGE_SIZE] [--n-evidence N_EVIDENCE] [--smooth-factor-base SMOOTH_FACTOR_BASE] [--cutoff CUTOFF] -er EVAL_RESOLUTION
RepliCNN ori_ter - Detect ORI and TER zones, timing transition regions, and constant timing regions based on RFD/OEM tracks.
options:
-h, --help show this help message and exit
-i, --input INPUT [INPUT ...]
Path(s) to RFD/OEM BigWig files.
-cs, --chromsizes CHROMSIZES
Path to chromosome sizes file.
-o, --output_prefix OUTPUT_PREFIX
Prefix for output file(s).
-si, --save_intermediates
Save intermediate candidate and filtering files.
-nl, --nolog Disable debug logging.
--ori-threshold ORI_THRESHOLD
Threshold for ORI recentering.
--ter-threshold TER_THRESHOLD
Threshold for TER recentering.
--window-radius WINDOW_RADIUS
Window radius (bp) for recentering around OEM extrema.
--max-merge-size MAX_MERGE_SIZE
Maximum size (bp) for merging candidate regions.
--n-evidence N_EVIDENCE
Minimum number of supporting evidences for a candidate.
--smooth-factor-base SMOOTH_FACTOR_BASE
Smoothing factor for raw candidate generation.
--cutoff CUTOFF Cutoff for filtering efficiency scores.
-er, --eval_resolution EVAL_RESOLUTION
OEM resolution used for recentering and scoring.
Besides its use as a command-line tool, RepliCNN can also be imported into a Python script or Jupyter notebook. The results of the command-line tool and the imported version are equivalent.
user@dev:/tmp$ python -c "import replicnn; print(replicnn.__version__)"
0.1.0
If you have found a bug, would like to suggest a new feature, or have any issues regarding RepliCNN installation, walkthrough, or output interpretation, please open a new issue.
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) via project ID 393547839 – SFB 1361, to K.Z., H.D.U., V.R. and M.C.C., via project ID 533767322 – EXC 3113/1, Cluster for Nucleic Acid Sciences and Technologies – NUCLEATE, to K.Z., and via project ID 529989072 – CA 198/20-1, to M.C.C. We gratefully acknowledge the IMB Genomics Core Facility and its NextSeq 2000 sequencer (funded by the DFG – INST 247/870-1 FUGG).
We would like to express our gratitude to the Genomics and Bioinformatics Core Facilities of the IMB gGmbH (Mainz, Germany) for their assistance in sequencing and data processing. We thank Nicolas Delhomme, Maximilian Reuter, Mario Keller and all members of the Zarnack group for helpful discussions.
If you use RepliCNN in your research, please cite this project like this:
RepliCNN: High-resolution inference of the DNA replication program from strand-specific 3′ DNA end sequencing. Dominik Stroh, Nicola Zilio, Maruthi K. Pabba, Vassilis Roukos, M. Cristina Cardoso, Helle D. Ulrich, Kathi Zarnack. bioRxiv 2026.03.12.710907; doi: https://doi.org/10.64898/2026.03.12.710907
BibTeX:
@article {Stroh2026.03.12.710907,
author = {Stroh, Dominik and Zilio, Nicola and Pabba, Maruthi K. and Roukos, Vassilis and Cardoso, M. Cristina and Ulrich, Helle D. and Zarnack, Kathi},
title = {RepliCNN: High-resolution inference of the DNA replication program from strand-specific 3' DNA end sequencing},
elocation-id = {2026.03.12.710907},
year = {2026},
doi = {10.64898/2026.03.12.710907},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2026/03/14/2026.03.12.710907},
journal = {bioRxiv}
}