Skip to content

feat: support omop emb 0.5.0#9

Draft
nicoloesch wants to merge 27 commits into
mainfrom
8-support-omop-emb-0.5.0
Draft

feat: support omop emb 0.5.0#9
nicoloesch wants to merge 27 commits into
mainfrom
8-support-omop-emb-0.5.0

Conversation

@nicoloesch
Copy link
Copy Markdown
Collaborator

Motivation

omop-emb introduces a new storage and DB concept with significant and breaking changes to support a local-first and backend-agnostic storage solution for the embeddings. To include these changes and fixes that come with a successful PR, we need to prepare omop-graph for the new incoming interfaces etc.

Closes #8

@gkennos gkennos self-requested a review May 12, 2026 23:53
@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

in README it mentions ClassIDEnum.HIERARCHICAL, but it should be ClassIDEnum.HIERARCHY

also stale refs: rank_paths, kg.find_shortest_paths, and kg.rank_paths

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

in paths.py:

reconstruct_paths makes all nodes standard=False --> path metadata is wrong

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

EdgeView.from_query() depends on positional column order rather than row names

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

if you add synonym (bool) field to LabelMatch and populate from KnowledgeGraph.concept_lookup(), then LabelMatchGroupView can faithfully return direct/synonym --> at the moment it's only returning exact/partial/fulltext/embedding

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

LabelMatchGroupView.from_matches assumes pre-sorting rather than explicit rule (or - stop using ms[0] as 'best' by definition)

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

in paths.py

should we rename 'num_hops' in find_standard_paths to max_standard_hops or something? this is very specifically only traversing standard-standard relationships

alternatively: could add in param allow_non_standard_intermediates: bool = False

reason is that for MCP exploration/view use case, users will likely expect broader graph navigation

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

find_standard_paths drops edges when multiple outgoing edges point to the same ID

this is an issue for concepts joined by >1 valid relationship

select * from (select concept_id_1, concept_id_2, count(distinct(relationship_id)) as c fr
om concept_relationship where concept_id_1 != concept_id_2 group by concept_id_1, concept_id_2) as d w
here c > 1 limit 10;
concept_id_1 | concept_id_2 | c
--------------+--------------+---
327 | 732315 | 2
8481 | 8600 | 2
8482 | 8569 | 2
8486 | 9222 | 2
8487 | 8545 | 2
8495 | 8548 | 2
8500 | 8609 | 2
8501 | 8611 | 2
8502 | 8610 | 2
8519 | 9540 | 2

  concept_views() returns one row per concept, while edges can contain multiple rows for the same object_id
  
  Build a lookup map by concept_id, then iterate all edges independently?

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

PathProfile.from_path() cannot handle valid empty-path case returned when source == target

Profiling or explaining a zero-hop path will fail at runtime instead of returning a sensible self-match profile

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

ranked_concepts = [
    _score_standard_concept(
        text=text,
        kg=kg,
        standard_concept=sc,
        num_ancestors=num_ancestors.get(sc.concept_id, 0),
        similarity_score=nearest_concept_matches_dict_for_single_query.get(sc.concept_id, None),
    )
    for sc in standard_concepts
]

_score_standard_concept just scores and does not rank --> confusing to downstream consumers

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

traverse()

  • During expansion, every discovered outgoing edge is appended to edges_out immediately.
  • Neighbor nodes are only added to visited later when popped from the queue.
  • If traversal terminates early because max_nodes is reached, some queued neighbor nodes may never become visited.
  • This breaks the usual expectation that a subgraph’s edge set is closed over its node set.

@gkennos
Copy link
Copy Markdown
Member

gkennos commented May 13, 2026

Scoring docs say relevance is composite, but _score_standard_concept() uses embedding similarity alone whenever it is available

I think this is intentional but perhaps having some kind of parameter to override this is worthwhile?

Comment thread src/omop_graph/graph/kg.py
Comment thread pyproject.toml
Comment thread src/omop_graph/cli.py
import pandas as pd
import os
from pathlib import Path

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requires OMOP_VOCABULARY_DIR to be set even if you have existing DB and don't want to import new vocab

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requires OMOP_CDM_DB_DRIVER etc... even if you have the full engine string

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the first comment: Yes as we store the new stuff there to have it all in one. But maybe there is a better way whether we don't need to store it on file or store it in a TempFile. Open for discussion.

Regarding 2: Has this been tested? Cause I have not experienced this but probably need to properly investigate this.

Comment thread src/omop_graph/cli.py
populate_test_data(session)

@app.command()
def relationship_classification(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if using postgres but not installing pgvector (valid, as "psycopg[binary]>=3.1.0" in core deps), the psycopg binary does not support current load functionality - needs psycopg2

uv run omop-graph relationship-classification --pred-class-dir ./docs
2026-05-13 10:23:04 | orm_loader.loaders.loading_helpers | ERROR | Error during bulk load via COPY: 'Cursor' object has no attribute 'copy_expert'
2026-05-13 10:23:04 | orm_loader.tables.loadable_table | WARNING | COPY failed for _staging_relationship_class: 'Cursor' object has no attribute 'copy_expert'
2026-05-13 10:23:04 | orm_loader.loaders.loading_helpers | ERROR | Error during bulk load via COPY: 'Cursor' object has no attribute 'copy_expert'
2026-05-13 10:23:04 | orm_loader.tables.loadable_table | WARNING | COPY failed for _staging_relationship_mapping: 'Cursor' object has no attribute 'copy_expert'

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires further investigation on my side as I did not yet encounter the error but maybe because I accidentally had the old psycopg installed.

Comment thread src/omop_graph/oaklib_interface/omop_implementation.py Outdated
Comment thread src/omop_graph/graph/queries.py Outdated
Comment thread src/omop_graph/graph/queries.py
Comment thread src/omop_graph/graph/queries.py
Comment thread src/omop_graph/graph/kg.py Outdated
Comment thread src/omop_graph/graph/kg.py Outdated
self.session_factory = sessionmaker(bind=self.cdm_engine, future=True)

# Populate the relationshipcache
with self.session_factory() as session:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either have a fallback on this compulsory import of relationship tables, or document clearly that the graph doesn't support concept-only usage (I think the latter but either way?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't really think why you'd want to use concept_view(), concept_lookup(), or concept_id_by_code() without the relationships functionality - this is too heavy for that I suspect?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to give a better error message.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support omop-emb v0.5.0

2 participants