Draft
27 commits
51e25e1
Variable renaming and new CLI cleanup
nicoloesch May 8, 2026
4a0c685
Initial support for omop-emb 0.5.0
nicoloesch May 11, 2026
43be1d3
Update omop-emb version, move .csv to config, update docker-compose
nicoloesch May 13, 2026
ea39029
Updated Docs
nicoloesch May 13, 2026
e038889
Fix positional column dependency for EdgeView
nicoloesch May 13, 2026
5a58aab
Clearer description of synonym and regular table, added attribute
nicoloesch May 13, 2026
a478e63
Fix LabelMatchGroupView sorting in from_matches
nicoloesch May 13, 2026
3dc8c54
Fix traverse to include only visited nodes
nicoloesch May 13, 2026
9abbf27
Fix deduplication issue in standard_paths
nicoloesch May 13, 2026
b303a72
Adapt docstring and signature of find_standard_paths to accurately re…
nicoloesch May 13, 2026
ced97eb
Fix for empty path steps to not crash
nicoloesch May 13, 2026
ba5e952
Adapt docstring of previous commit
nicoloesch May 13, 2026
0598f5a
Fix scoring and grounding description and variable names
nicoloesch May 13, 2026
7952358
Do not rely on standard=False for path reconstruction
nicoloesch May 13, 2026
47bd7aa
Rename the row elements of edges query to fall in line with EdgeView.…
nicoloesch May 13, 2026
2d09ca5
Include immediate parent/child in ancestor/descendant
nicoloesch May 13, 2026
ec20e88
Dont include self in count of ancestors and descendants
nicoloesch May 13, 2026
de8f8a4
Include class_id and sublcass_id in query
nicoloesch May 13, 2026
0778344
Correctly reference column name that was renamed for synonym
nicoloesch May 13, 2026
1509e7f
Correctly clear all LRU caches
nicoloesch May 13, 2026
399d595
Correctly have invert of relationships
nicoloesch May 13, 2026
6b6c6cf
Rectify the relationship cache warning in __init__
nicoloesch May 13, 2026
6ac3930
Correct CLI log message for relationship-classification
nicoloesch May 13, 2026
24ea78e
Yield from within session scope of entailed incoming relationships
nicoloesch May 13, 2026
4e753b8
Include oaklib as dependency
nicoloesch May 13, 2026
b5550ab
Correct CLI docs for env variables
nicoloesch May 13, 2026
3f507d2
Remove logging from tracked files
nicoloesch May 13, 2026
5 changes: 4 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -13,4 +13,7 @@ wheels/
docs/backup/
docs/omop_relationships.csv
.vscode/
.env
.env
resources/
*.DS_Store
logging/
115 changes: 48 additions & 67 deletions README.md
@@ -1,21 +1,12 @@
# Architecture

This library provides a lightweight, query-time knowledge-graph layer over an OMOP vocabulary database, with explicit separation between:

* graph access (nodes, edges, predicates),
* graph algorithms (traversal, pathfinding),
* path scoring and explanation, and
* presentation / inspection utilities.

# omop-graph

**omop-graph** is a lightweight, opinionated knowledge-graph traversal and path-analysis library built on top of the OMOP vocabulary model.

It provides:
- a stable **KnowledgeGraph façade** over OMOP concepts and relationships
- flexible **graph traversal** (forward, backward, bidirectional)
- **path discovery and ranking** with transparent scoring
- **traceable explanations** of why one path is preferred over another
- **path discovery** with transparent scoring
- **traceable explanations** of traversal decisions
- multiple **rendering backends** (text, HTML, Mermaid)

The library is designed for:
@@ -31,105 +22,95 @@ The library is designed for:
pip install omop-graph
```

With embedding support (sqlite-vec backend, zero config):

```bash
pip install "omop-graph[emb]"
```

For larger deployments use `[pgvector]` or `[faiss-cpu]` instead (or in addition).
Full setup is covered in the [omop-emb documentation](https://australiancancerdatanetwork.github.io/omop-emb/).

---

## Core Concepts

### KnowledgeGraph

KnowledgeGraph is the main entry point. It wraps an existing SQLAlchemy session connected to an OMOP vocabulary schema. kg-core assumes OMOP semantics and tables.
`KnowledgeGraph` is the main entry point. It wraps a SQLAlchemy `Engine` connected to an OMOP vocabulary schema and provides a high-level Pythonic API over the relational tables.

```python
from sqlalchemy import create_engine
from omop_graph.graph.kg import KnowledgeGraph
```

### Nodes and Edges
engine = create_engine("postgresql://user:pass@localhost/omop")
kg = KnowledgeGraph(engine)

Nodes are OMOP Concepts; Edges are OMOP Concept_Relationships
# Lookup a concept by label
match_group = kg.label_lookup("Atrial Fibrillation", fuzzy=False)
concept = match_group.best_match
print(f"ID: {concept.concept_id}, Name: {concept.matched_label}")

Relationships are classified into semantic kinds:
# Traverse the hierarchy
parents = kg.parents(concept.concept_id)
```

### Nodes and Edges

* ONTOLOGICAL
* MAPPING
* ATTRIBUTE
* VERSIONING
* METADATA
Nodes are OMOP Concepts; Edges are OMOP Concept_Relationships.

This classification drives traversal and scoring.
Relationships are pre-classified into semantic kinds (`ClassIDEnum`):

### Traversal, Paths and Scoring
- `HIERARCHY` — parent/child ontological relationships
- `IDENTITY` — mapping to standard concepts
- `COMPOSITION` — part-of relationships
- `ASSOCIATION` — lateral clinical associations
- `ATTRIBUTE` — concept attribute relationships

You can:
This classification drives traversal filtering and scoring.
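
As a rough illustration of how such a classification can restrict traversal, the sketch below filters a handful of edges by kind. The `Kind` enum, the `Edge` tuple, the `filter_edges` helper, and the sample edges are hypothetical stand-ins, not the library's `ClassIDEnum` or edge types; only the five kind names are taken from the list above.

```python
from enum import Enum
from typing import NamedTuple


class Kind(Enum):
    # Hypothetical stand-in mirroring the five kinds listed above.
    HIERARCHY = "hierarchy"
    IDENTITY = "identity"
    COMPOSITION = "composition"
    ASSOCIATION = "association"
    ATTRIBUTE = "attribute"


class Edge(NamedTuple):
    # Hypothetical edge record: subject concept, relationship, object concept, kind.
    subject_id: int
    relationship_id: str
    object_id: int
    kind: Kind


def filter_edges(edges, allowed):
    """Keep only the edges whose kind is in the allowed set."""
    return [e for e in edges if e.kind in allowed]


# Made-up concept IDs and kind assignments, for illustration only.
edges = [
    Edge(1, "Is a", 2, Kind.HIERARCHY),
    Edge(1, "Maps to", 3, Kind.IDENTITY),
    Edge(1, "Has finding site", 4, Kind.ATTRIBUTE),
]

followed = filter_edges(edges, frozenset({Kind.HIERARCHY, Kind.IDENTITY}))
print([e.relationship_id for e in followed])
```

Restricting the allowed set up front keeps the number of edges considered at each step small, which is the motivation for classifying relationships ahead of time.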

* expand neighbourhoods
* extract subgraphs
* trace traversal decisions
* control which relationship kinds are followed
* discover multiple candidate paths between concepts and rank them
* render simple HTML cards for easy interactive exploration
### Traversal and Paths

```python
from omop_graph.graph.paths import find_shortest_paths
from omop_graph.extensions.omop_alchemy import ClassIDEnum

ingredient = kg.concept_id_by_code("RxNorm", "6809") # Metformin
drug = kg.concept_id_by_code("RxNorm", "860975") # Metformin 500 MG Oral Tablet

kg.concept_view(drug) # ConceptView(id=40163924, RxNorm:860975, name='24 HR metformin hydrochloride 500 MG Extended Release Oral Tablet')
kg.concept_view(ingredient) # ConceptView(id=1503297, RxNorm:6809, name='metformin')
ingredient = kg.concept_id_by_code("RxNorm", "6809") # Metformin
drug = kg.concept_id_by_code("RxNorm", "860975") # Metformin 500 MG Oral Tablet

paths, trace = find_shortest_paths(
kg,
source=drug,
target=ingredient,
predicate_kinds={
ClassIDEnum.HIERARCHICAL,
ClassIDEnum.IDENTITY,
},
predicate_kinds=frozenset({ClassIDEnum.HIERARCHY, ClassIDEnum.IDENTITY}),
max_depth=6,
traced=True,
)

ranked = rank_paths(kg, paths)

```

###

```python
paths = kg.find_shortest_paths(
source=a,
target=b,
max_depth=6,
)
ranked = kg.rank_paths(paths)
```

### Rendering

Outputs can be rendered as:
Outputs can be rendered as plain text, HTML (Jupyter), or Mermaid diagrams. Rendering auto-detects the environment.

* plain text (CLI / logs)
* HTML (Jupyter)
* Mermaid diagrams

Rendering auto-detects the environment.

```python
```python
from IPython.display import HTML, display
from omop_graph.render import render_trace

display(HTML(render_trace(kg, trace)))
```
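
Environment auto-detection of the kind mentioned above could be sketched as follows. This is an assumption about one common approach (inspecting the active IPython shell), not the library's actual logic; `in_notebook`, `pick_renderer`, and the renderer names are hypothetical.

```python
def in_notebook() -> bool:
    """Best-effort check for a Jupyter kernel (an assumption, not the library's logic)."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this cannot be a notebook.
        return False
    shell = get_ipython()
    # ZMQInteractiveShell is the shell class used by Jupyter kernels.
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


def pick_renderer() -> str:
    # Hypothetical renderer names, for illustration only.
    return "html" if in_notebook() else "text"


print(pick_renderer())
```

A check like this falls back to plain text whenever IPython is missing or the code runs in a terminal, so CLI and log output stay readable.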

---

## Project Structure
```graphql

```
omop_graph/
├── graph/ # graph logic, traversal, paths, scoring
├── render/ # HTML / text / Mermaid renderers
├── reasoning/ # Ontology traversal methods for specific reasoner tasks
├────── resolvers/ # Resolve labels for exact / fuzzy / synonym matches - TODO: embedding matches
├────── phenotypes/ # Set operations to build efficient hierarchical groupings for reasoning
├── reasoning/ # ontology traversal methods for specific reasoner tasks
│ ├── resolvers/ # resolve labels via exact / fuzzy / full-text / synonym search
│ └── phenotypes/ # set operations for hierarchical groupings
├── oaklib_interface/ # OAK-compliant adapter
├── api.py # stable public API surface
└── db/ # session helpers

```
```
File renamed without changes.
File renamed without changes.
28 changes: 28 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,28 @@
services:
omop-cdm-db:
image: postgres:16-alpine
restart: always
env_file: .env
environment:
- POSTGRES_USER=${OMOP_CDM_DB_USER:-omop}
- POSTGRES_PASSWORD=${OMOP_CDM_DB_PASSWORD:-omop}
- POSTGRES_DB=${OMOP_CDM_DB_NAME:-omop}
- PGDATA=/var/lib/postgresql/data/pgdata
volumes:
- db_data:/var/lib/postgresql/data
networks:
- omop-net
healthcheck:
test: ["CMD-SHELL", "pg_isready -U ${OMOP_CDM_DB_USER:-omop} -d ${OMOP_CDM_DB_NAME:-omop}"]
interval: 5s
timeout: 5s
retries: 5
ports:
- "5432:5432"

networks:
omop-net:
name: omop-net

volumes:
db_data:
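
To connect to the database started by this compose file, a connection URL can be assembled from the same environment variables and defaults. The sketch below is one way to do that; the `omop_db_url` helper is hypothetical, and the host and port assume the `5432:5432` port mapping above is reached from the local machine.

```python
import os
from urllib.parse import quote_plus


def omop_db_url(env=os.environ, host="localhost", port=5432):
    """Build a PostgreSQL URL using the compose file's variables and defaults."""
    # `or` mirrors the ${VAR:-default} form: unset or empty falls back to "omop".
    user = env.get("OMOP_CDM_DB_USER") or "omop"
    password = env.get("OMOP_CDM_DB_PASSWORD") or "omop"
    name = env.get("OMOP_CDM_DB_NAME") or "omop"
    # quote_plus guards against special characters in the credentials.
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{name}"


print(omop_db_url(env={}))
```

The resulting string can be passed to SQLAlchemy's `create_engine`, as in the README example.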
6 changes: 3 additions & 3 deletions docs/graph/edges.md
@@ -16,16 +16,16 @@ To allow reproduction and evaluation of this approach, we provide clear guidelin

??? "Expand to see the grouping classification of predicates"

{{ to_grouped_table('docs/predicate_classification.csv', [0, 1], [0, 1, 2, 3, 4], [0, 1],) }}
{{ to_grouped_table('config/predicate_classification.csv', [0, 1], [0, 1, 2, 3, 4], [0, 1],) }}

## Predicate Mappings
Following the predicate classification guidelines of the previous seciton, we calssified the following predicates into their respective classification groups.
Following the predicate classification guidelines of the previous section, we classified the following predicates into their respective classification groups.

!!! warning

This classification is still under development and will likely change as clinicians provide feedback. The interface for storing these classifications in the OMOP CDM has been prepared, and we are in talks about eventually including the classification in the official OMOP CDM.

??? "Expand to see the classification of all edge connections"

{{ to_grouped_table('docs/predicate_mapping.csv', [0, 1], [0, 1, 2, 3], [0, 1], {"r_id": "relationship_id", "r_name": "relationship_name"}) }}
{{ to_grouped_table('config/predicate_mapping.csv', [0, 1], [0, 1, 2, 3], [0, 1], {"r_id": "relationship_id", "r_name": "relationship_name"}) }}

53 changes: 30 additions & 23 deletions docs/graph/kg.md
@@ -27,19 +27,14 @@ While the OMOP CDM is stored in a Relational Database Management System (RDBMS),

### Basic Usage

The `KnowledgeGraph` can be used standalone after connecting to the OMOP CDM database on disk.
The `KnowledgeGraph` can be used standalone after connecting to the OMOP CDM database.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from omop_graph.graph.kg import KnowledgeGraph

# Setup your SQLAlchemy session
engine = create_engine("postgresql://user:pass@localhost/omop")
SessionLocal = sessionmaker(bind=engine)

# Initialize the Virtual Knowledge Graph
kg = KnowledgeGraph(SessionLocal)
kg = KnowledgeGraph(engine)

# Lookup a concept by its label
match_group = kg.label_lookup("Atrial Fibrillation", fuzzy=False)
@@ -59,41 +54,53 @@ print(f"Parent IDs: {parents}")
To enable semantic similarity and RAG-based retrieval, pass a `KnowledgeGraphEmbeddingConfiguration` when initialising the graph.
This requires the optional `omop-emb` package — see the [installation guide](../usage/installation.md#embedding-rag).

!!! info "omop-emb documentation"
`omop-emb` manages all embedding storage, backends, and retrieval. Full documentation — including backend setup, CLI reference, FAISS sidecar, and configuration — is available at [australiancancerdatanetwork.github.io/omop-emb](https://australiancancerdatanetwork.github.io/omop-emb/).

#### Read-only (pre-computed embeddings already in the DB)

Use this when embeddings have already been indexed and you only need retrieval:

```python
from sqlalchemy import create_engine
from omop_graph.graph.kg import KnowledgeGraph, KnowledgeGraphEmbeddingConfiguration
from omop_emb import BackendType, ProviderType
from omop_emb.config import BackendType, MetricType, ProviderType

engine = create_engine("postgresql://user:pass@localhost/omop")

emb_config = KnowledgeGraphEmbeddingConfiguration(
backend_type=BackendType.FAISS,
backend_type=BackendType.PGVECTOR, # or BackendType.SQLITEVEC
provider_type=ProviderType.OLLAMA,
canonical_model_name="text-embedding-3-small:0.6b",
base_storage_dir="/data/embeddings",
model_name="nomic-embed-text:v1.5", # must match the name used at ingestion time
metric_type=MetricType.COSINE,
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
kg = KnowledgeGraph(engine, emb_config=emb_config)
```

The backend is resolved from `backend_type` or, as a fallback, from the `OMOP_EMB_BACKEND` environment variable.
See the [omop-emb configuration reference](https://australiancancerdatanetwork.github.io/omop-emb/usage/configuration/) for all connection variables.
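
The precedence described above (explicit `backend_type` first, then `OMOP_EMB_BACKEND`) can be pictured with a small sketch. The `resolve_backend` helper, the plain-string backend names, and the error raised when neither source is set are assumptions made for illustration; they are not taken from `omop-emb`.

```python
import os
from typing import Optional


def resolve_backend(backend_type: Optional[str], env=os.environ) -> str:
    """Hypothetical helper: prefer the explicit value, else OMOP_EMB_BACKEND."""
    if backend_type:
        return backend_type
    from_env = env.get("OMOP_EMB_BACKEND")
    if from_env:
        return from_env
    # Assumed behaviour: fail loudly rather than guess a backend.
    raise ValueError("no backend_type given and OMOP_EMB_BACKEND is not set")


print(resolve_backend(None, env={"OMOP_EMB_BACKEND": "pgvector"}))
```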

#### Write-capable (generate and store embeddings at runtime)

Provide an `EmbeddingClient` to enable both reading and writing embeddings:
Provide an `EmbeddingClient` to enable both reading and writing embeddings. The `provider_type` and `model_name`
are derived automatically from the client:

```python
from omop_emb import EmbeddingClient
from omop_emb import BackendType, ProviderType
from omop_emb.config import BackendType, MetricType

client = EmbeddingClient(...) # configured for your provider
client = EmbeddingClient(
model="nomic-embed-text:v1.5",
api_base="http://ollama:11434/v1",
)

emb_config = KnowledgeGraphEmbeddingConfiguration(
backend_type=BackendType.FAISS,
base_storage_dir="/data/embeddings",
backend_type=BackendType.PGVECTOR,
metric_type=MetricType.COSINE,
client=client,
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
kg = KnowledgeGraph(engine, emb_config=emb_config)
```
The `provider_type` will be automatically determined from the `client`.

#### Fallback embedding calculation

@@ -107,12 +114,12 @@ for any missing concepts on-the-fly during a similarity call.

```python
emb_config = KnowledgeGraphEmbeddingConfiguration(
backend_type="faiss",
base_storage_dir="/data/embeddings",
backend_type=BackendType.PGVECTOR,
metric_type=MetricType.COSINE,
client=client,
compute_missing_embeddings=True, # compute embeddings for concepts not yet in the store
compute_missing_embeddings=True,
)
kg = KnowledgeGraph(SessionLocal, emb_config=emb_config)
kg = KnowledgeGraph(engine, emb_config=emb_config)
```

| `compute_missing_embeddings` | `client` present | Behaviour when concepts are missing |
32 changes: 20 additions & 12 deletions docs/index.md
@@ -1,32 +1,40 @@
# omop-graph

**omop-graph** is a lightweight virtual knowledge Graph (VKG) built on-top of the OMOP CDM.
It transforms the static OMOP vocabulary tables into a dynamic graph environment suitable for NLP grounding, clinical reasoning and other tasks that benefit from a knowledge graph.
**omop-graph** is a lightweight Virtual Knowledge Graph (VKG) built on top of the OMOP CDM.
It transforms the static OMOP vocabulary tables into a dynamic graph environment suitable for NLP grounding, clinical reasoning, and other tasks that benefit from a knowledge graph.

## Why omop-graph?

Unlike generic graph libraries, `omop-graph` is built specifically for clinical data:

- **Semantic Awareness**: Understands the difference between relationships.
- **Efficient Grounding**: Instead of traversing every possible path, the library uses a **Standard Anchor** approach: translating non-standard terms to standard concepts and leveraging the OMOP `concept_ancestor` table for high-speed hierarchy validation.
- **Transparent Scoring**: Decisions aren't black boxes. Every path is scored based on textual similarity, graph distance (parsimony), and clinical generality (broadness).
- **Pre-classification**: Relationships are already pre-classified into overarching groups, allowing quicker restrictions of connections and more efficient graph traversal.
- **Semantic Awareness**: Understands the difference between relationship kinds (hierarchy, identity, composition, association, attribute).
- **Efficient Grounding**: Instead of traversing every possible path, the library uses a **Standard Anchor** approach — translating non-standard terms to standard concepts and leveraging the OMOP `concept_ancestor` table for high-speed hierarchy validation.
- **Transparent Scoring**: Decisions aren't black boxes. Every candidate concept is scored based on textual similarity, graph distance (parsimony), and clinical generality (broadness).
- **Pre-classification**: Relationships are pre-classified into semantic groups, enabling quicker traversal restrictions and more targeted reasoning.
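
The Standard Anchor idea from the list above can be sketched with toy data: map a non-standard concept to its standard counterpart, then answer a hierarchy question with a single lookup in a precomputed ancestor table rather than by walking edges. The dictionaries, concept IDs, and helper names below are made up for illustration and do not reflect the library's API or real OMOP content.

```python
# Toy data with made-up concept IDs; not real OMOP content.
MAPS_TO = {101: 1}  # non-standard concept -> standard concept
ANCESTORS = {(10, 1), (20, 10), (20, 1)}  # (ancestor_id, descendant_id) pairs


def to_standard(concept_id):
    """Translate a non-standard concept to its standard anchor, if one is known."""
    return MAPS_TO.get(concept_id, concept_id)


def is_descendant_of(concept_id, ancestor_id):
    """Validate hierarchy with one set lookup instead of walking edges."""
    anchor = to_standard(concept_id)
    return anchor == ancestor_id or (ancestor_id, anchor) in ANCESTORS


print(is_descendant_of(101, 20))
print(is_descendant_of(101, 30))
```

Because the ancestor pairs already contain the transitive closure, the check costs the same regardless of how deep the hierarchy is.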

---

## Documentation Overview

### Core Components
- [KnowledgeGraph](graph/kg.md): The VKG interface and what it attempts to solve.
- [Relationships](graph/edges.md): Pre-classification of edges/relationships of the OMOP CDM.
- [Oaklib Interface](oaklib/interface.md): `oaklib`-compliant interface
- [KnowledgeGraph](graph/kg.md): The VKG interface — connecting to OMOP and traversing the graph.
- [Relationships](graph/edges.md): Pre-classification of OMOP edges into semantic kinds.
- [Oaklib Interface](oaklib/interface.md): OAK-compliant adapter for cross-ontology tooling.

### Reasoning
Explore the grounding pipeline used by clinical NLP tools.

- [Semantic grounding](reasoning/grounding.md): How regular search terms can be traced to a standard Ontology
- [Semantic Grounding](reasoning/grounding.md): Mapping free-text terms to standard OMOP concepts.
- [Resolver Pipelines](reasoning/resolvers.md): How candidate concepts are retrieved from the database.

### Embedding Support

!!! info "Powered by omop-emb"
Embedding-based similarity (vector search, RAG retrieval, on-the-fly embedding computation) is provided by the companion [`omop-emb`](https://australiancancerdatanetwork.github.io/omop-emb/) package.
Install it with `pip install "omop-graph[emb]"` and see [Knowledge Graph — Embedding Configuration](graph/kg.md#embedding-configuration) for integration details.

### Interactive Exploration
`omop-graph` includes built-in HTML renderers for Jupyter Notebooks, allowing you to visualize concepts and relationship summaries instantly.
`omop-graph` includes built-in HTML and Mermaid renderers for Jupyter Notebooks, allowing you to visualise concepts, traversal traces, and relationship summaries directly in a notebook.

### Testing
- [Testing](usage/testing.md): How test configuration works, what is covered, and how to set up environment variables for local test runs.
- [Testing](usage/testing.md): Test configuration, coverage, and how to set up environment variables for local runs.