Begin implementation of RDF ConfigDB#609

Merged
amrc-benmorrow merged 39 commits into main from bmz/metadb
Mar 23, 2026


@amrc-benmorrow
Contributor

Begin implementation of an RDF-backed replacement for the ConfigDB. This is called the Metadata Database, or MetaDB.

Currently this implements most of (the useful parts of) the ConfigDB HTTP API. It also implements v2/sparql and v2/rdf endpoints for SPARQL queries and Graph Store operations respectively. The v1 and other compat endpoints are not implemented, including in particular v1/search which has no functional replacement.

Not implemented yet:

  • Auth of any kind. All operations are permitted.
  • Dump loading.
  • Notify.
  • Useful mappings between RDF and JSON.

I am bypassing Fuseki and submitting queries directly to Jena. This will
be easier than unpicking all the extra stuff Jena installs, and means I
can use Jersey and handle SPARQL queries within the F+ service
framework. It may not be worth trying to map all possible SPARQL
interfaces; the specs allow rather a lot of options, most of which are
useless.

The service framework doesn't exist yet; I don't have code for auth or
anything. But as long as Jersey is viable for a Jena-backed ConfigDB I
can go from here.
This doesn't support Graph Store yet, but I doubt that would be hard. We
also don't have any auth; that will need integrating later.
This is now a proper usable triplestore. We don't support named graphs
yet, as I can't see how to get RDFS across all graphs in a dataset.
Supply the API with a Dataset holding both direct and derived graphs,
under F+ IRIs. Also install the derived graph as the default graph. The
ConfigDB /direct/ endpoints will need access to the direct graph.
This just exposes GET /v2/object, for now.
For now we only support configs with JSON literal entries.
Ideally I don't want to support v1 on the RDF CDB.

Don't move v1/search over; I'm not sure it's even implemented in v2. In
any case an RDF CDB will need a different/better search interface.
This implements enough of the service spec to keep a ServiceClient happy.
Auth is still not implemented, so although tokens are generated they are
not stored anywhere or verified.
This constructs the Dataset and knows about F+ conventions such as the
UUID properties.
In Java the class size is getting out of hand. Implement PUT for
configs; this doesn't support schema validation yet.
This still does not include search.
There seem to be a lot of poor options, and not many good options… this
at least looks OK. Long-term it may be better to define some simpler
JSON→RDF mapping language and use that for validation as well. JSON
schema is a lot more flexible than we need or want.
This lets us set environment variables.
We need additional metadata about a config entry, most importantly an
etag. This means an Application can't be a simple relation from object to
value. We also haven't brought relations into the well-founded structure
yet, so it is simpler to make an Application a class of config entries.
Each entry then has a relation to the object it applies to, and is
classified within its Application.

Instead of including etags individually, start in a 4D direction by
defining Instants. In general each transaction which changes data will
define a single Instant and any new config entries created by the txn
will refer to that Instant (and share an etag). In the future this could
be straightforwardly extended to support states of config entries
recording the history over time.
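
As a rough illustration of the Instant idea above, the following plain-Java sketch has every entry written in one transaction refer to the same Instant and therefore share its etag. It does not reflect the RDF model or the classes in this PR: `TxnInstant`, `ConfigEntry`, `Txn`, the use of UUIDs for etags and identifiers, and the field names are all assumptions made for the example.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

final class InstantSketch {
    private InstantSketch() {}

    /** Hypothetical: one point in time, created once per data-changing transaction. */
    record TxnInstant(UUID etag, Instant timestamp) {}

    /**
     * Hypothetical: a config entry is classified within an Application,
     * applies to one object, and starts at an Instant.
     */
    record ConfigEntry(UUID application, UUID object, String json, TxnInstant start) {}

    /** Hypothetical transaction: every entry it creates refers to the same Instant. */
    static final class Txn {
        private final TxnInstant instant =
            new TxnInstant(UUID.randomUUID(), Instant.now());
        private final List<ConfigEntry> entries = new ArrayList<>();

        /** Record a new entry; the etag comes from the shared Instant. */
        ConfigEntry put(UUID application, UUID object, String json) {
            ConfigEntry entry = new ConfigEntry(application, object, json, instant);
            entries.add(entry);
            return entry;
        }

        /** Immutable snapshot of the entries created so far. */
        List<ConfigEntry> entries() {
            return List.copyOf(entries);
        }
    }
}
```

Because the etag lives on the Instant rather than on each entry, a later extension to per-entry history would only need to add further states pointing at further Instants.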

Where config entries are derived from other information they will still
need to be treated as separate entities; a config entry is a document
describing the world at a certain time. Whether entries are created
eagerly whenever data changes or lazily as they are read I'm not sure.
Trying to look up entries in the new structure using Model methods is
going to be very difficult, and probably won't use Jena indexes to the
best advantage. Use local SPARQL queries instead.

Move config entry handling into a new model class.
They are stateless objects not really different from strings.
The handling isn't perfect; sometimes we get unnecessary warnings in the
logs. But it produces the correct HTTP responses.
Without this the inference does not see the updated statements. Updates
via the Dataset seem to be immune from this requirement.
* Define :pc to represent the F+ primary class. For now this infers
  rdf:type but that may need to change, especially if we want to
  preserve the existing behaviour.

* Create a method to create a new F+ object. This handles setting up the
  rank subclass relation where appropriate.

* Create a method to define an Instant dated now. This can be used to
  assign starting points to ConfigEntries.
We were accepting the default from the Fetch object, which was always
present, before the default from the ServiceClient. Check these in the
other order and deprecate the field on the Fetch object.
It's difficult to avoid duplication here; possibly (as in the JS) it
would be better to programmatically define these endpoints.

Remove :primary as a subPropertyOf rdf:type; this will not be helpful as
we need to have visibility of any direct membership in the direct graph.
The enforcement will have to happen elsewhere.
The JS ServiceClient code assumes all responses will be JSON.
- Remove use of rdfs:{domain,range}. We don't want inference of rdf:type;
  it's important we can track changes to the class structure from the
  direct graph.

- Include the Special UUIDs. These are needed for now, but it's not really
  clear how they fit into the well-founded structure.

- Update some ConfigEntries to start with ACS v4.0.0.
Create Util.single for when we expect a single result from an Iterator.
Perhaps I should just use vavr Iterators instead?
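
A helper of this kind might look like the sketch below. This is not the code from the PR: the class name `SingleSketch` and the choice to throw (rather than return an `Optional`, for example) when the iterator yields zero or several results are assumptions.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class SingleSketch {
    private SingleSketch() {}

    /**
     * Return the only element produced by the iterator.
     * Assumed behaviour: throw if there are no elements or more than one.
     */
    static <T> T single(Iterator<T> it) {
        if (!it.hasNext())
            throw new NoSuchElementException("expected one result, found none");
        T value = it.next();
        if (it.hasNext())
            throw new IllegalStateException("expected one result, found several");
        return value;
    }
}
```

For example, `SingleSketch.single(List.of("a").iterator())` returns `"a"`, while an empty iterator raises `NoSuchElementException`.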

Remove use of FPObject within ConfigEntry. I'm less sure this is a useful
class; in practice I mostly just want the Resource.
This should now be approximately usable as a ConfigDB substitute,
albeit one which doesn't implement notify or any of the compat
endpoints.
@amrc-benmorrow amrc-benmorrow merged commit ccd211b into main Mar 23, 2026
1 check passed
@amrc-benmorrow amrc-benmorrow deleted the bmz/metadb branch March 23, 2026 16:20
2 participants