
Unchain meta - External triplestore and related work #361

Draft: ggVGc wants to merge 3 commits into master from valter/meta-unchained


ggVGc commented May 4, 2026

This is the collection branch for moving `meta` away from its custom indexing and towards an external triplestore, initially Virtuoso.
All specific work will happen in separate PRs that are merged into this branch, until we have a service that replicates the current behaviour.

ggVGc added 2 commits May 5, 2026 14:16
This simply removes the magic index and all related code. It is the
starting point from which we can rework `meta` to work with an external
triplestore.
We could do it the other way around, but I believe this way is the
simplest.
Some cleanup to clarify the state of the repository at this time:
* Rename the `magic` package to `enrichment`, since it now only
relates to enriching SPARQL queries rather than to any custom indexing
* Rename `CpNotifyingSail` to `EnrichingSail` since that describes the
purpose better
* Some more associated renaming and moving of things
ggVGc force-pushed the valter/meta-unchained branch from cfef09b to 396e0be May 5, 2026 12:17
ggVGc changed the title from "Unchain meta" to "Unchain meta (external triplestore and related work)" May 5, 2026
ggVGc changed the title from "Unchain meta (external triplestore and related work)" to "Unchain meta - External triplestore and related work" May 5, 2026
Mostly works, but there are still improvements to be made.
TODO: Summarize changes and motivation.

**Contains two scripts in `tools/` for populating Virtuoso from
RDFLog:**
- One which exports `.ttl` files that can be bulk loaded directly by
Virtuoso.
- Another which uses the [Graph Store
protocol](https://www.w3.org/2009/sparql/docs/http-rdf-update/) to send
triples over HTTP.
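Both output paths boil down to serializing the resolved triples as text and either writing them to a file or sending them as the body of a Graph Store protocol `PUT`. The sketch below is illustrative only and is not taken from the `tools/` scripts: the `IRI`/`Literal` types, the endpoint URL, and the use of the spec's `graph` query parameter are assumptions (the parameter name is configurable in case the server expects another one), IRIs are assumed to be valid already, and authentication is omitted.

```python
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str  # assumed to already be a valid, unescaped-safe IRI


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None  # datatype IRI, or None for a plain string


Term = Union[IRI, Literal]
Triple = Tuple[Term, Term, Term]


def nt_term(term: Term) -> str:
    """Serialize one term in N-Triples syntax (N-Triples is a subset of Turtle)."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    escaped = (term.value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    if term.datatype is not None:
        return f'"{escaped}"^^<{term.datatype}>'
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Render triples one per line, usable as a .ttl file or an HTTP request body."""
    return "".join(f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples)


def gsp_put_request(endpoint: str, graph_uri: str, body: str,
                    graph_param: str = "graph") -> urllib.request.Request:
    """Build (but do not send) a Graph Store protocol PUT replacing one graph's content."""
    url = endpoint + "?" + urllib.parse.urlencode({graph_param: graph_uri})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )
```

For example, `gsp_put_request("http://localhost:8890/sparql-graph-crud-auth", graph_uri, to_ntriples(triples))` builds the request for one graph, and `urllib.request.urlopen(req)` would send it; the endpoint path here is a hypothetical Virtuoso setup and should be checked against the actual server configuration. Writing the same `to_ntriples` output to disk gives files that a bulk loader can ingest.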

Both of these require loading all triples from each graph in RDFLog into
memory first, in order to resolve all assertions and retractions before
outputting the final set of triples. This requires a lot of RAM (around
10 GB in my case), but is still within reason. Using the normal RDF Update
protocol with Virtuoso works, but only for batches smaller than a few
hundred triples at a time, which is very slow for a full re-ingestion.
It is also wasteful, since many of the assertions are later retracted.
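The in-memory resolution step can be sketched as replaying the log in order and keeping a set of live triples per graph. This is a minimal sketch under assumptions: `LogEntry` is a hypothetical row shape (graph, triple, assertion flag) delivered in log order, triples are simplified to string tuples, and the real RDFLog schema and `tools/` scripts may differ. The `batches` helper is likewise hypothetical and only illustrates the small-batch update path.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), simplified to strings


class LogEntry(NamedTuple):
    """Hypothetical RDFLog row: one assertion or retraction of a triple in a graph."""
    graph: str
    triple: Triple
    is_assertion: bool


def resolve_log(entries: Iterable[LogEntry]) -> Dict[str, Set[Triple]]:
    """Replay entries in log order and return the surviving triples per graph.

    Every live triple is held in memory until the whole log has been read,
    which is why a full export needs a lot of RAM.
    """
    graphs: Dict[str, Set[Triple]] = defaultdict(set)
    for entry in entries:
        current = graphs[entry.graph]
        if entry.is_assertion:
            current.add(entry.triple)
        else:
            current.discard(entry.triple)  # retracting an absent triple is a no-op
    # Drop graphs whose triples were all retracted.
    return {graph: triples for graph, triples in graphs.items() if triples}


def batches(triples: Set[Triple], size: int) -> Iterator[List[Triple]]:
    """Split resolved triples into fixed-size chunks, e.g. for small update requests."""
    ordered = sorted(triples)  # deterministic order makes batches reproducible
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]
```

Because later retractions cancel earlier assertions, only the resolved sets ever need to be sent. The batching cost is simple arithmetic: with batches of 300 triples, a hypothetical graph of one million triples would need 3,334 separate requests.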

**Issues:**
- [x] ~~Currently, the initial fresh ingestion from RDFLog into Virtuoso
fails with permission errors like: `Virtuoso 42000 Error SR186:SECURITY:
No permission to execute procedure DB.DBA.SPARQL_INSERT_DICT_CONTENT`,
even when using the `dba` user.~~
  - This was solved by https://github.com/ICOS-Carbon-Portal/meta/pull/364/changes#diff-e1f2b5f8f0029f60bb270bc106c7191c75eee54f8680f68d7c8324985d89e9cdR28-R31

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
ggVGc requested a review from jonathanthiry May 13, 2026 13:59

ggVGc commented May 13, 2026

@jonathanthiry This is still under active development, but at this stage it contains the main changes (removing the magic index and transitioning to a new triplestore), and I think it would be good for you to take at least a quick look at the overall approach to see if I have made any major mistakes.
The proper review should happen once everything is in place, but it is still good to get other eyes on it a few times before then. Either way, the final testing against current production should surface any major inconsistencies.
