A state-of-the-art, fully integrated, hierarchical multi-omics biological knowledge graph spanning 54 human organs.
Built with Python, Neo4j, and Streamlit, this architecture serves as a Clinical-Grade In-Silico Human Engine designed to map complex, multi-organ biological cascades, enabling precision medicine, drug discovery, and zero-shot systemic analysis.
The graph integrates large, averaged reference datasets from multiple global consortia to map not just Organs, but the Systemic Communication flowing between them.
The integrated dimensions are:
- Transcriptomics & Secretomics (GTEx & HPA): Baseline tissue expression and protein secretion profiles.
- The Microbiome Axis (GMrepo): Gut bacteria mapped to the specialized metabolites they produce, which enter the bloodstream.
- Phenomics (DisGeNET): Diseases explicitly mapped to the genetic and metabolic pathways whose failures trigger them.
- Genomics (GWAS): 1,000+ SNPs and mutational variants mapped directly to their downstream expression failures.
- Single-Cell Resolution (HCA): High-definition cell types (e.g., Pancreatic Beta Cells) mapped within bulk tissues.
- Intracellular PPIs (STRING-DB): Receptor activation cascades triggering deep intracellular protein-protein interactions down to the nucleus.
- Pharmacology (DrugBank): Explicit FDA-approved drug mappings designed to disrupt or enhance multi-organ topological connections.
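To make the idea of cross-organ cascades concrete, here is a minimal in-memory sketch of how typed triples can be chained from one organ to another. It does not reflect the repository's actual schema: the relation names (`CONTAINS_CELL`, `SECRETES`, `BINDS`, `EXPRESSED_IN`), the helper functions, and the four toy triples are assumptions chosen for illustration, and the real graph lives in Neo4j rather than in Python dictionaries.

```python
from collections import defaultdict, deque

# Toy (subject, relation, object) triples. Names are illustrative only
# and are NOT taken from the repository's schema.
TRIPLES = [
    ("Pancreas", "CONTAINS_CELL", "Pancreatic Beta Cell"),
    ("Pancreatic Beta Cell", "SECRETES", "INS"),
    ("INS", "BINDS", "INSR"),
    ("INSR", "EXPRESSED_IN", "Liver"),
]


def build_adjacency(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    adjacency = defaultdict(list)
    for subject, relation, obj in triples:
        adjacency[subject].append((relation, obj))
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search: shortest chain of triples from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


if __name__ == "__main__":
    adjacency = build_adjacency(TRIPLES)
    for subject, relation, obj in find_path(adjacency, "Pancreas", "Liver"):
        print(f"{subject} -[{relation}]-> {obj}")
```

In Neo4j the same question would typically be asked with a variable-length path query; the sketch only shows why storing each dimension as typed edges makes such multi-hop questions possible.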
This repository automates the extraction, cleaning, mapping, and native visualization of the data through an 11-Phase pipeline structure housed in /scripts:
- 1_acquire/ - Asynchronous fetching of synthetic multi-gigabyte flat files.
- 2_clean/ - Data harmonization and triplet generation.
- 3_graph/ - Bulk optimized ingest scripts utilizing Cypher UNWIND batches capable of mapping >1M topological edges.
- 4_validate/ - Graph traversal and literature validation.
- 5_analysis/ - Simulated Graph Neural Network (GNN) embeddings and Network Centrality calculations.
- 6_pharmacology/ - Drug repurposing target ingestion.
- 7_dashboard/ - Dynamic Web Framework built natively on Streamlit and streamlit-agraph.
- 8_ultimate_integration/ - The capstone multi-dimensional integration.
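The bulk-ingest phase relies on Cypher UNWIND batches. The following is a minimal sketch of that general technique, not the code in scripts/3_graph: the `Entity` label, the `RELATES_TO` relationship type, the row keys (`source`, `relation`, `target`), and the function names are all hypothetical. It assumes a session object exposing `run(query, **parameters)`, as the official Neo4j Python driver's session does.

```python
from itertools import islice

# Hypothetical schema: one generic node label and one relationship type,
# with the semantic relation stored as a property on the edge.
MERGE_EDGES = """
UNWIND $rows AS row
MERGE (s:Entity {id: row.source})
MERGE (t:Entity {id: row.target})
MERGE (s)-[:RELATES_TO {relation: row.relation}]->(t)
"""


def batched(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def ingest_edges(session, rows, batch_size=10_000):
    """Send rows to the database one UNWIND batch at a time; return the row count."""
    total = 0
    for batch in batched(rows, batch_size):
        # One round trip per batch instead of one per edge.
        session.run(MERGE_EDGES, rows=batch)
        total += len(batch)
    return total
```

Sending one parameterized query per batch keeps the number of network round trips proportional to the number of batches rather than the number of edges. The batch size of 10,000 is an arbitrary starting point and would need tuning against available memory.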
- Neo4j Desktop / Server (running locally on port 7687)
- Python 3.12+
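Before running the pipeline, it can help to confirm that something is actually listening on the Neo4j port. The sketch below uses only the Python standard library; the function name `port_is_open` and the two-second timeout are assumptions, and a successful TCP connection only shows that the port is reachable, not that the credentials or database are valid.

```python
import socket


def port_is_open(host="localhost", port=7687, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    print("Neo4j port 7687 reachable:", port_is_open())
```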
git clone https://github.com/sandeepmanimala/knowledge-graph-organs.git
cd knowledge-graph-organs
All dependencies are strictly locked to prevent versioning conflicts.
python3 -m venv knowledge-graph-organs
source knowledge-graph-organs/bin/activate
pip install -r requirements.txt
With a blank Neo4j database running (password: admin123), execute the pipeline to construct the human topology.
# Instantiate the baseline multi-omics connections
python3 scripts/3_graph/live_instantiate_bulk.py
# Inject the Ultimate 5 Dimensions (Mutations, Microbiome, Phenomics, Single-Cell, PPIs)
python3 scripts/8_ultimate_integration/ingest_missing_five.py
# Inject Pharmacology Mappings
python3 scripts/6_pharmacology/ingest_drugs.py
Spin up the interactive Streamlit and physics-simulated Agraph web application.
streamlit run scripts/7_dashboard/app.py
Open your browser to http://localhost:8501.
The graph features an embedded agentic workflow (.agent/workflows/data_refresh.md) designed to autonomously scrape external APIs, process new biological knowledge, and safely deploy Cypher patches continuously without human intervention.
This open-source project is distributed under the MIT License.