This repository contains the code and processed result tables used to generate the figures for the manuscript: “Disentangling RNA evolution and thermodynamics in genomic language models.”
It is intended to (1) reproduce the figure panels from precomputed result tables, and (2) provide single-sequence scripts that compute model-derived pairwise signals (CJ/REDIAL) and thermodynamic base-pairing probabilities (BPP) for custom inputs.
About: reproducibility code and processed tables for CJ-based pairwise dependency analyses and thermodynamic baselines in genomic language models.
- Codes_for_figures/: Jupyter notebooks used to generate Figures 1–5.
- Processed_results/: processed CSV tables used by the figure notebooks (subfolders: 6dataset/, RFAM/).
- Candidate_csv/: small metadata and example input tables used by the notebooks.
- Single_sequence_python_script/: command-line scripts for single-sequence runs (CJ/BPP/REDIAL) and helper analyses.
- Create an environment with standard scientific Python packages (see requirements_figures.txt).
- Start Jupyter in this repo and run the notebooks in Codes_for_figures/:
  - Codes_for_figures/Figure1.ipynb
  - Codes_for_figures/Figure2.ipynb
  - Codes_for_figures/Figure3.ipynb
  - Codes_for_figures/Figure4.ipynb
  - Codes_for_figures/Figure5.ipynb
By default, the notebooks write figure files under Codes_for_figures/_outputs/ (created at runtime).
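Before running the notebooks, it can help to confirm that the processed tables are present and to see which columns each one carries. The following is a minimal sketch using only the Python standard library; the helper name `list_tables` is hypothetical and not part of this repository, and it assumes the tables are plain CSV files with a header row.

```python
import csv
from pathlib import Path


def list_tables(root):
    """Return {relative CSV path: header row} for every CSV file under root.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    root = Path(root)
    tables = {}
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as handle:
            # Read only the first row; an empty file yields an empty header.
            header = next(csv.reader(handle), [])
        tables[path.relative_to(root).as_posix()] = header
    return tables


if __name__ == "__main__":
    # Run from the repository root; prints nothing if the folder is missing.
    for name, columns in list_tables("Processed_results").items():
        print(f"{name}: {len(columns)} columns")
```

Reading only the header keeps the check fast even for large tables; the notebooks themselves remain the reference for how each table is loaded and used.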
In addition, Single_sequence_python_script/generate_synthetic_rnas_for_fig5.ipynb generates structure-matched synthetic sequences used in Figure 5 analyses.
The scripts in Single_sequence_python_script/ take a single sequence (--seq) and dot-bracket structure (--dbn) and write matrices/plots/metrics into an output folder:
- RNAfm_single_sequence.py: RNA-FM CJ + EternaFold BPP (requires RNA-FM checkpoint + Arnie/EternaFold).
- Evo2_single_sequence.py: Evo2 CJ + EternaFold BPP.
- gLM2_single_sequence.py: gLM2 CJ + EternaFold BPP.
- Vienna_single_sequence.py: ViennaRNA BPP.
- REDIAL_single_sequence.py: REDIAL contact maps.
These scripts are designed to be run either in your local environment (if dependencies are installed) or inside a container. See the docstrings at the top of each script for example commands.
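Because the scripts take a sequence and a dot-bracket structure together, a quick consistency check on the pair can catch length mismatches or unbalanced brackets before a long model run. The sketch below is an illustration only: `dbn_to_pairs` is a hypothetical helper, not a function from these scripts, and it assumes a structure written with plain `(`, `)` and `.` symbols (no pseudoknot brackets).

```python
def dbn_to_pairs(seq, dbn):
    """Return sorted 0-based (i, j) base pairs encoded by a dot-bracket string.

    Hypothetical helper; assumes only '(', ')' and '.' appear in dbn.
    """
    if len(seq) != len(dbn):
        raise ValueError("sequence and structure must have the same length")
    stack, pairs = [], []
    for j, symbol in enumerate(dbn):
        if symbol == "(":
            stack.append(j)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # pair with the most recent '('
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    seq = "GGGAAAUCCC"
    dbn = "(((....)))"
    print(dbn_to_pairs(seq, dbn))  # [(0, 9), (1, 8), (2, 7)]
```

The resulting pair list can also serve as a reference when inspecting the matrices a script writes, since each (i, j) marks a position where the input structure has a base pair.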
If you have a sequence you are interested in, you can also try the RNA-FM-based CJ and mirror-test demo in Colab: CJ + Mirror Test (Colab)