RichardOnalbi/mvef-framework

Mosquito Virome Emergence Framework (MVEF)

Version: 0.2-alpha (April 2026)
Author: Richard Hoyos-López
Affiliation: Faculty of Basic Sciences, Universidad de Córdoba, Montería, Colombia
ORCID: 0000-0003-1195-681X
Contact: richardhoyosl@correo.unicordoba.edu.co

Companion code for the manuscript:

Foundation model-based embeddings for host range prediction and zoonotic risk assessment of novel viruses in mosquito metagenomes: a dual-route conceptual framework.
Briefings in Bioinformatics (submitted, 2026).


What this repository contains (and what it does not)

✅ Fully implemented and executable

| Module | Manuscript section | Description |
|---|---|---|
| mvef/feature_extraction.py | §2.3 | RSCU, ENC, GC3, GC%, CpG O/E, AT/GC skew, k-mer vectors (k=3,4,6) |
| mvef/embeddings.py | §2.4 | DNABERT-2 and Nucleotide Transformer v2 embeddings in inference mode |
| mvef/vves.py | §2.7 | Vector Viral Emergence Score: min-max normalisation, equal-weight and optimised configurations |
| mvef/utils.py | | FASTA I/O, logging, checkpoint system |
| test_data/test_contigs.fa | | 60 synthetic contigs with biologically grounded nucleotide composition (20 per host class) |
| demo_notebook.ipynb | | End-to-end walkthrough executable on Google Colab (free tier) |
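
Several of the §2.3 features are simple composition statistics. The sketch below shows one common way to compute GC%, GC3, CpG O/E, the two skews and a k-mer frequency vector in plain Python. It is an illustration only, not the code in mvef/feature_extraction.py: the function names `composition_features` and `kmer_vector` are hypothetical, GC3 assumes the contig is an in-frame coding sequence starting at position 0, and RSCU and ENC are omitted.

```python
from collections import Counter
from itertools import product


def composition_features(seq: str) -> dict:
    """Basic nucleotide composition statistics for one contig (hypothetical helper)."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    a, c, g, t = (counts[base] for base in "ACGT")

    # GC3: GC fraction at third codon positions. Assumption: the sequence
    # is an in-frame coding region whose first codon starts at index 0.
    third = seq[2::3]
    gc3 = sum(base in "GC" for base in third) / len(third) if third else 0.0

    # CpG observed/expected = count(CG) * N / (count(C) * count(G)).
    cpg_oe = seq.count("CG") * n / (c * g) if c and g else 0.0

    return {
        "gc": (g + c) / n if n else 0.0,
        "gc3": gc3,
        "cpg_oe": cpg_oe,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
    }


def kmer_vector(seq: str, k: int = 3) -> list:
    """Relative frequencies of all 4**k k-mers in lexicographic order.

    Windows containing non-ACGT characters (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]
```

Calling `kmer_vector(seq, k)` for k = 3, 4 and 6 gives vectors of length 64, 256 and 4096, which can be concatenated with the scalar features into a single row per contig.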

⚠️ Declared placeholders (require external tools or training data)

| Component | Reason | What is needed |
|---|---|---|
| Viral detection (§2.2) | Requires VirSorter2, DeepVirFinder and VIBRANT installed via bioconda | Linux HPC environment |
| Route A classifier (§2.5) | Requires VirusHostDB 2023-09 training set | ~2,284 labelled sequences |
| Route B k-NN transfer (§2.6) | Requires trained Route A embedding space | Depends on Route A |
| SRA download + assembly (§2.1) | Requires fasterq-dump, Bowtie2, MEGAHIT | 50–100 GB per accession |
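
To make the Route B idea concrete, here is a generic k-nearest-neighbour label transfer over embedding vectors. It is a sketch under stated assumptions, not the manuscript's method: the cosine similarity metric, the majority vote, the default `k`, the function names and the example host labels are all illustrative choices that this README does not specify.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_transfer(query, ref_embeddings, ref_labels, k=3):
    """Return (label, vote fraction) from the k most similar references.

    Assumes ref_embeddings is non-empty and aligned with ref_labels.
    """
    sims = [cosine(query, emb) for emb in ref_embeddings]
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter(ref_labels[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


# Toy 2-D "embeddings" with hypothetical host labels.
refs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]]
labels = ["vertebrate", "vertebrate", "insect", "insect", "vertebrate"]
print(knn_transfer([1.0, 0.05], refs, labels, k=3))
```

In the real pipeline the reference vectors would come from the trained Route A embedding space, which is why this component stays a placeholder until Route A is available.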

Each placeholder is marked with a # PLACEHOLDER comment and raises NotImplementedError if called directly, so an unimplemented step fails loudly instead of being run silently.
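
The placeholder convention described above can look like the following sketch. The function name `detect_viral_contigs` and the error message are hypothetical and only illustrate the pattern; the actual stubs in this repository may be named and worded differently.

```python
def detect_viral_contigs(fasta_path: str):
    """Stub for the viral detection step (§2.2); hypothetical name."""
    # PLACEHOLDER: requires VirSorter2, DeepVirFinder and VIBRANT.
    raise NotImplementedError(
        "Viral detection (§2.2) is a placeholder: install the external "
        "tools and implement this step before calling it."
    )


# A caller sees an explicit error rather than an empty or fake result.
try:
    detect_viral_contigs("your_contigs.fasta")
except NotImplementedError as exc:
    print(f"Step not available: {exc}")
```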


Installation

git clone https://github.com/RichardOnalbi/mvef-framework.git
cd mvef-framework
conda create -n mvef python=3.11
conda activate mvef
pip install -r requirements.txt

Quick start — test mode (no GPU required, ~5 min)

python mvef_pipeline.py --input test_data/test_contigs.fa --mode test

Quick start — full embeddings mode (GPU recommended)

python mvef_pipeline.py --input your_contigs.fasta --mode embeddings --model dnabert2

Google Colab

Open demo_notebook.ipynb directly in Google Colab.


Repository structure

mvef-framework/
├── README.md
├── requirements.txt
├── config.py
├── mvef_pipeline.py          # Main orchestrator
├── mvef/
│   ├── __init__.py
│   ├── feature_extraction.py # §2.3 — genomic features
│   ├── embeddings.py         # §2.4 — foundation model embeddings
│   ├── vves.py               # §2.7 — VVES scoring
│   └── utils.py              # I/O, logging, checkpoints
├── test_data/
│   └── test_contigs.fa       # 60 synthetic contigs (20 per class)
├── results/                  # Output directory (created at runtime)
└── demo_notebook.ipynb       # Colab-ready demonstration
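
The VVES module (mvef/vves.py, §2.7) is described in this README as min-max normalisation with equal-weight and optimised configurations. A minimal sketch of that kind of score follows. The component names, the helper names `min_max` and `vves`, and the choice to map a constant component to 0 are assumptions made for illustration; the actual components and the optimised weights are defined in the manuscript, not here.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def vves(components, weights=None):
    """Weighted sum of min-max normalised components, one score per contig.

    components: dict mapping a component name to a list of raw values.
    weights:    dict mapping the same names to weights; equal weights
                are used when omitted.
    """
    names = list(components)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}
    scaled = {name: min_max(components[name]) for name in names}
    n_contigs = len(scaled[names[0]])
    return [
        sum(weights[name] * scaled[name][i] for name in names)
        for i in range(n_contigs)
    ]


# Hypothetical component names and values for three contigs.
raw = {"component_a": [0, 5, 10], "component_b": [1, 1, 3]}
print(vves(raw))                                            # equal weights
print(vves(raw, {"component_a": 0.8, "component_b": 0.2}))  # custom weights
```

Because every component is rescaled to [0, 1] first, the final score also lies in [0, 1] whenever the weights are non-negative and sum to 1.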

Citation

If you use this code, please cite:

Hoyos-López R. Foundation model-based embeddings for host range prediction
and zoonotic risk assessment of novel viruses in mosquito metagenomes:
a dual-route conceptual framework.
Briefings in Bioinformatics, 2026 (submitted).

License

MIT License. See LICENSE file.
