Version: 0.2-alpha (April 2026)
Author: Richard Hoyos-López
Affiliation: Faculty of Basic Sciences, Universidad de Córdoba, Montería, Colombia
ORCID: 0000-0003-1195-681X
Contact: richardhoyosl@correo.unicordoba.edu.co
Companion code for the manuscript:
Foundation model-based embeddings for host range prediction and zoonotic risk assessment of novel viruses in mosquito metagenomes: a dual-route conceptual framework.
Briefings in Bioinformatics (submitted, 2026).
| Module | Manuscript section | Description |
|---|---|---|
| `mvef/feature_extraction.py` | §2.3 | RSCU, ENC, GC3, GC%, CpG O/E, AT/GC skew, k-mer vectors (k=3,4,6) |
| `mvef/embeddings.py` | §2.4 | DNABERT-2 and Nucleotide Transformer v2 embeddings in inference mode |
| `mvef/vves.py` | §2.7 | Vector Viral Emergence Score (VVES): min-max normalisation, equal-weight and optimised configurations |
| `mvef/utils.py` | — | FASTA I/O, logging, checkpoint system |
| `test_data/test_contigs.fa` | — | 60 synthetic contigs with biologically grounded nucleotide composition (20 per host class) |
| `demo_notebook.ipynb` | — | End-to-end walkthrough executable on Google Colab (free tier) |
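
As a rough illustration of the simpler composition features named for `mvef/feature_extraction.py`, the sketch below computes GC%, GC3, CpG O/E, AT/GC skew and a k-mer frequency vector in plain Python. The function names are hypothetical and are not taken from the repository; the formulas are the standard textbook definitions, and GC3 assumes the input is an in-frame coding sequence. RSCU and ENC are left out because they need a codon table and more bookkeeping.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc3(seq: str) -> float:
    """GC fraction at third codon positions (assumes an in-frame CDS)."""
    return gc_content(seq.upper()[2::3])


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def skew(seq: str, first: str, second: str) -> float:
    """Generic skew (first - second) / (first + second), e.g. G/C or A/T."""
    seq = seq.upper()
    x, y = seq.count(first), seq.count(second)
    return (x - y) / (x + y) if (x + y) else 0.0


def kmer_vector(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k k-mers, in lexicographic order.

    Windows containing ambiguous bases (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def composition_features(seq: str) -> dict[str, float]:
    """Collect the scalar features for one contig into a dictionary."""
    return {
        "gc": gc_content(seq),
        "gc3": gc3(seq),
        "cpg_oe": cpg_obs_exp(seq),
        "gc_skew": skew(seq, "G", "C"),
        "at_skew": skew(seq, "A", "T"),
    }
```

Calling `composition_features(contig)` together with `kmer_vector(contig, k)` for k = 3, 4 and 6 would give one feature row per contig; the actual module may normalise or order these differently.
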
| Component | Reason | What is needed |
|---|---|---|
| Viral detection (§2.2) | Requires VirSorter2, DeepVirFinder, VIBRANT installed via bioconda | Linux HPC environment |
| Route A classifier (§2.5) | Requires VirusHostDB 2023-09 training set | ~2,284 labelled sequences |
| Route B k-NN transfer (§2.6) | Requires trained Route A embedding space | Depends on Route A |
| SRA download + assembly (§2.1) | Requires fasterq-dump, Bowtie2, MEGAHIT | 50–100 GB per accession |
The components in the table above are included only as placeholders. Each one is marked with a `# PLACEHOLDER` comment and raises `NotImplementedError` if called directly, so a placeholder is never executed silently.
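
The placeholder convention can be pictured with a minimal sketch like the one below. The function name `run_viral_detection` and the message text are assumptions made for illustration; only the `# PLACEHOLDER` marker and the use of `NotImplementedError` come from the description above.

```python
def run_viral_detection(contigs_path: str) -> None:
    """Hypothetical stand-in for the viral detection step (§2.2)."""
    # PLACEHOLDER: needs VirSorter2, DeepVirFinder and VIBRANT,
    # which are not bundled with this repository.
    raise NotImplementedError(
        "Viral detection (§2.2) is a placeholder; install VirSorter2, "
        "DeepVirFinder and VIBRANT via bioconda and supply an implementation."
    )


def call_or_report(step, *args) -> str:
    """Run a pipeline step and report clearly when it is a placeholder."""
    try:
        step(*args)
    except NotImplementedError as exc:
        return f"skipped: {exc}"
    return "ok"
```

A caller that wraps each step this way fails loudly (or logs a skip) rather than continuing with missing results.
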
git clone https://github.com/RichardOnalbi/mvef-framework.git
cd mvef-framework
conda create -n mvef python=3.11
conda activate mvef
pip install -r requirements.txt
python mvef_pipeline.py --input test_data/test_contigs.fa --mode test
python mvef_pipeline.py --input your_contigs.fasta --mode embeddings --model dnabert2

Open demo_notebook.ipynb directly in Colab.
mvef-framework/
├── README.md
├── requirements.txt
├── config.py
├── mvef_pipeline.py # Main orchestrator
├── mvef/
│ ├── __init__.py
│ ├── feature_extraction.py # §2.3 — genomic features
│ ├── embeddings.py # §2.4 — foundation model embeddings
│ ├── vves.py # §2.7 — VVES scoring
│ └── utils.py # I/O, logging, checkpoints
├── test_data/
│ └── test_contigs.fa # 60 synthetic contigs (20 per class)
├── results/ # Output directory (created at runtime)
└── demo_notebook.ipynb # Colab-ready demonstration
If you use this code, please cite:
Hoyos-López R. Foundation model-based embeddings for host range prediction
and zoonotic risk assessment of novel viruses in mosquito metagenomes:
a dual-route conceptual framework.
Briefings in Bioinformatics, 2026 (submitted).
MIT License. See LICENSE file.