MetaSage

A machine learning–based framework for systematically inferring the regulatory mechanisms that underlie metabolic dysregulation across different conditions.

Usage

Three Python scripts are provided, corresponding to the three major steps of MetaSage:

Feature_generation.py

This script generates per-metabolite input files for downstream model training. For each target metabolite, the output file contains:

The abundance of the target metabolite
Multi-omics–derived features associated with that metabolite

The script requires the following input files:

gene_expression_file: Gene expression matrix derived from omics data (e.g., RNA-seq or proteomics).
- Rows: genes (gene symbols)
- Columns: samples (sample IDs in the first row)
metabolite_expression_file: Metabolite abundance matrix from metabolomics data.
- Rows: metabolites (unified metabolite names)
- Columns: samples (sample IDs in the first row)
meta_gene_relation_file: A curated mapping file describing, for each target metabolite:
- Associated genes
- Upstream reactants
Both relationship types are derived from known genome-scale metabolic models (GEMs) and filtered based on the study-specific multi-omics datasets.
ESTIMATE_score_results: A matrix containing 4 inferred tumor microenvironment scores generated by the ESTIMATE algorithm:
- Stromal score
- Immune score
- ESTIMATE score
- Tumor purity
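To make the per-metabolite file concrete, here is a minimal pandas sketch of how the inputs above could be combined into one samples x features table for a single target metabolite. It is not MetaSage's code. It assumes tab-separated matrices with row names in the first column and sample IDs in the first row, and that the ESTIMATE results use the same orientation (scores in rows, samples in columns). It also assumes the relation file has already been parsed into a dictionary of the form `{metabolite: {"genes": [...], "reactants": [...]}}`. The function names `load_omics_matrix` and `build_feature_table`, the dictionary layout, and the `gene:`/`reactant:`/`estimate:` column prefixes are all hypothetical.

```python
import io
from typing import Dict, List

import pandas as pd


def load_omics_matrix(path_or_buffer) -> pd.DataFrame:
    """Read a rows x samples matrix (assumed tab-separated).

    First column: row names (genes, metabolites or ESTIMATE scores);
    first row: sample IDs.
    """
    df = pd.read_csv(path_or_buffer, sep="\t", index_col=0)
    df.index = df.index.astype(str).str.strip()
    df.columns = df.columns.astype(str).str.strip()
    return df


def build_feature_table(
    target: str,
    gene_expr: pd.DataFrame,    # genes x samples
    metab_expr: pd.DataFrame,   # metabolites x samples
    relations: Dict[str, Dict[str, List[str]]],  # hypothetical parsed relation file
    estimate: pd.DataFrame,     # ESTIMATE scores x samples (assumed orientation)
) -> pd.DataFrame:
    """Return a samples x columns table: target abundance first, then features."""
    rel = relations.get(target, {})
    # Keep only related genes/reactants that were actually measured in this study.
    genes = [g for g in rel.get("genes", []) if g in gene_expr.index]
    reactants = [r for r in rel.get("reactants", [])
                 if r in metab_expr.index and r != target]
    # Restrict to samples present in all three matrices.
    samples = [s for s in metab_expr.columns
               if s in gene_expr.columns and s in estimate.columns]

    table = pd.DataFrame(index=samples)
    table[target] = metab_expr.loc[target, samples].astype(float)
    for g in genes:
        table[f"gene:{g}"] = gene_expr.loc[g, samples].astype(float)
    for r in reactants:
        table[f"reactant:{r}"] = metab_expr.loc[r, samples].astype(float)
    for score in estimate.index:
        table[f"estimate:{score}"] = estimate.loc[score, samples].astype(float)
    return table


if __name__ == "__main__":
    # Toy inputs; real files would be passed as paths instead of StringIO buffers.
    genes = load_omics_matrix(io.StringIO("gene\tS1\tS2\tS3\nHK2\t5.1\t4.8\t6.0\n"))
    metabs = load_omics_matrix(io.StringIO(
        "metabolite\tS2\tS3\tS4\nlactate\t1.2\t1.5\t0.9\npyruvate\t0.4\t0.3\t0.5\n"))
    estimate = load_omics_matrix(io.StringIO("score\tS2\tS3\nTumorPurity\t0.8\t0.6\n"))
    relations = {"lactate": {"genes": ["HK2", "LDHA"], "reactants": ["pyruvate"]}}
    print(build_feature_table("lactate", genes, metabs, relations, estimate))
```

In this toy example, only samples S2 and S3 appear in all three inputs, so only they are kept. The unmeasured gene LDHA is dropped. The first column holds the target abundance (the model's response), and the remaining columns are candidate regulators and ESTIMATE covariates.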

Predictability_assessment.py

This script implements an XGBoost-based regression model to assess the predictability of each target metabolite.

Input: feature files generated in the Feature Generation step
Output: a .tsv file summarizing model performance, including the Pearson correlation coefficients and p-values between the observed and predicted metabolite abundances from 5-fold cross-validation.
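The following minimal sketch shows the assessment described above, not MetaSage's own implementation. It shuffles samples into 5 folds, fits a fresh regressor on each training split, collects the out-of-fold predictions, and computes one Pearson correlation between observed and predicted abundance over all samples. Pooling the out-of-fold predictions, the random fold assignment, and the `XGBRegressor` hyperparameters in `default_model` are assumptions. `assess_predictability` is a hypothetical name. The model factory is injectable, so the evaluation logic can run without xgboost installed.

```python
from typing import Callable, Optional, Tuple

import numpy as np
from scipy.stats import pearsonr


def default_model():
    # Lazy import; hyperparameters here are illustrative, not MetaSage's settings.
    from xgboost import XGBRegressor
    return XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)


def assess_predictability(
    X: np.ndarray,
    y: np.ndarray,
    make_model: Optional[Callable[[], object]] = None,
    n_splits: int = 5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Pearson r and p-value between observed y and out-of-fold predictions."""
    make_model = make_model or default_model
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)

    y_pred = np.empty_like(y)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        model = make_model()  # fresh, untrained model for every fold
        model.fit(X[train_mask], y[train_mask])
        y_pred[test_idx] = model.predict(X[test_idx])

    r, p = pearsonr(y, y_pred)
    return float(r), float(p)
```

For a feature table built as in the Feature Generation step, `X` would be the feature columns and `y` the target-abundance column. The cutoff on `r` and `p` that marks a metabolite as "predictable" is not specified here and would need to be chosen.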

Regulator_prioritization.py

For each metabolite identified as predictable in the previous step, this script retrains the model on the complete dataset. Feature importance is evaluated using SHapley Additive exPlanations (SHAP) values, and features are ranked by their mean absolute SHAP value across samples. The top-ranking features are considered the most influential regulators of the corresponding metabolite.

Input: feature files generated in the Feature Generation step
Output: a .tsv file summarizing the average absolute SHAP values of all features, and a visualization illustrating the SHAP values at the individual-sample level.
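The ranking step reduces a samples x features SHAP matrix to one score per feature: the mean of the absolute SHAP values over samples. Below is a minimal sketch of that step, together with a hypothetical end-to-end helper that refits an XGBoost regressor on all samples, computes SHAP values with `shap.TreeExplainer`, writes the ranking as a .tsv file, and saves a per-sample `shap.summary_plot`. The function names, hyperparameters, and output file names are assumptions, not MetaSage's actual interface.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def rank_features_by_shap(shap_values: np.ndarray,
                          feature_names: Sequence[str]) -> pd.DataFrame:
    """Rank features by mean |SHAP| across samples (largest first)."""
    shap_values = np.asarray(shap_values, dtype=float)  # samples x features
    mean_abs = np.abs(shap_values).mean(axis=0)
    ranking = pd.DataFrame({"feature": list(feature_names),
                            "mean_abs_shap": mean_abs})
    return (ranking.sort_values("mean_abs_shap", ascending=False, kind="stable")
                   .reset_index(drop=True))


def retrain_and_explain(features: pd.DataFrame, target: pd.Series,
                        out_tsv: str = "shap_ranking.tsv") -> pd.DataFrame:
    """Refit on all samples, compute SHAP values, save ranking and a summary plot."""
    # Lazy imports: xgboost, shap and matplotlib are only needed here.
    import matplotlib.pyplot as plt
    import shap
    from xgboost import XGBRegressor

    # Illustrative hyperparameters, not MetaSage's settings.
    model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
    model.fit(features, target)
    shap_values = shap.TreeExplainer(model).shap_values(features)

    ranking = rank_features_by_shap(shap_values, features.columns)
    ranking.to_csv(out_tsv, sep="\t", index=False)

    # Individual-sample view: one point per sample for each feature.
    shap.summary_plot(shap_values, features, show=False)
    plt.savefig(out_tsv.replace(".tsv", "_summary.png"), bbox_inches="tight")
    plt.close()
    return ranking
```

Taking the absolute value before averaging keeps a feature with strong but sign-alternating effects across samples from appearing unimportant. The signed per-sample values remain visible in the summary plot.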

Citation

MetaSage: Machine Learning-Based Prioritization of Metabolic Regulators from Multi-Omics Data
Chenwei Wang, John M. Elizarraras, Bing Zhang
