fix(maintenance): write knowledge_type as NamedNode URI (not Literal) #84
Merged
The 2026-05-25 maintenance run on prod 0.1.112 normalized 62,176
``ks:knowledgeType`` annotations BUT wrote them as
``Literal("http://knowledge.local/schema/fact")`` rather than the
canonical ``NamedNode("<...schema/fact>")`` shape produced by the
ingestion path (``stores/triples.py:149``).
Effect: ``/api/admin/stats/types`` started bucketing the same logical
type into two groups (URI form: ~80k rows, Literal form: ~62k rows),
total still 141,946 — no data lost, but the bifurcation that
maintenance was meant to fix moved sideways from CASING to TERM-TYPE.
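A minimal sketch of why the stats split appears, assuming the endpoint groups on the RDF term itself rather than on its string value. The `NamedNode` and `Literal` classes below are hypothetical stand-ins for the store's term types, not the project's actual classes:

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical stand-ins for the store's RDF term types.
@dataclass(frozen=True)
class NamedNode:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


FACT = "http://knowledge.local/schema/fact"

# Two rows written by ingestion (URI form), one by the bad maintenance run.
objects = [NamedNode(FACT), NamedNode(FACT), Literal(FACT)]

# Grouping on the term keeps the term type in the key, so one logical
# type lands in two buckets.
by_term = Counter(objects)

# Grouping on the string value alone would hide the difference.
by_value = Counter(o.value for o in objects)

print(len(by_term), len(by_value))
```

The row total is unchanged in both groupings; only the number of buckets differs.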
Fix: ``_canonical_knowledge_type_uri`` strips the ``KS`` prefix if
present, lowercases the suffix, applies the ``Relation→relationship``
alias, and rebuilds as a ``NamedNode(KS + suffix)``. Handles three
input shapes:
- NamedNode mixed-case (``ks:Fact``) — the original bug
- Literal with full URI (the 62k bad rows we just created)
- Literal with bare name (``"Fact"``) — defensive against future drift
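The steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: `NamedNode` and `Literal` are hypothetical stand-ins for the store's term types, `KS` is assumed to be the namespace seen in the quoted URI, and the function name drops the leading underscore to mark it as a stand-in.

```python
from dataclasses import dataclass
from typing import Union

# Assumed namespace, taken from the URI quoted in the description.
KS = "http://knowledge.local/schema/"

# Alias applied after lowercasing (Relation -> relationship).
ALIASES = {"relation": "relationship"}


# Hypothetical stand-ins for the store's RDF term types.
@dataclass(frozen=True)
class NamedNode:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def canonical_knowledge_type_uri(term: Union[NamedNode, Literal]) -> NamedNode:
    """Return the canonical NamedNode for any of the three input shapes."""
    raw = term.value
    # Strip the KS prefix if present; a bare name passes through unchanged.
    suffix = raw[len(KS):] if raw.startswith(KS) else raw
    # Lowercase the suffix, then collapse known aliases.
    suffix = suffix.lower()
    suffix = ALIASES.get(suffix, suffix)
    # Always rebuild as a NamedNode, whatever the input term type was.
    return NamedNode(KS + suffix)


canonical = NamedNode(KS + "fact")
shapes = [NamedNode(KS + "Fact"), Literal(KS + "fact"), Literal("Fact")]
results = [canonical_knowledge_type_uri(t) for t in shapes]
```

Because the output is always a lowercase `NamedNode`, running the function on its own output returns the same term, which is the convergence property a repeated maintenance sweep relies on.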
Next maintenance sweep on prod will repair the bad rows. Test
coverage extended for all three shapes (`tests/test_maintenance_normalizer.py`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7d2e11e to 770f678
3 tasks
arshadansari27 added a commit that referenced this pull request May 25, 2026
…ing, maintenance endpoint (#85)

PRs #83 (data-quality fixes + periodic janitor) and #84 (NamedNode-shape hotfix) changed three things the docs hadn't caught up to:

1. **`KS_GRAPH_FEDERATED` is gone.** Removed from README's named-graph list, the architecture doc's graph table, and the CLAUDE.md trust_tier description. The graph was empty in production for as long as the service has been deployed; the producer column was dropped in migration 017 (PR #82). What remains is 4 graphs: ontology / asserted / extracted / inferred.
2. **`knowledge_type` is no longer free-form.** `_normalise_knowledge_type` in `models.py` now lowercases at validation and collapses `Relation→relationship`. Updated the Status table in README, the Knowledge Types Reference in API.md, and every code example in both docs to show the lowercase canonical form (`claim`, `fact`, `event`, `entity`, `relationship`, `temporalfact`). Capitalised input is still accepted on the wire; the prose in API.md says so explicitly.
3. **`POST /api/admin/maintenance/run` exists.** Added the endpoint to API.md (full section with request/response/curl), to the admin row in the endpoints table, and a Maintenance Service section to CLAUDE.md describing `normalize_knowledge_types` / `normalize_spacy_rdf_types`, the lifespan wiring, and failure policy. Added the `MAINTENANCE_INTERVAL_SECONDS` and `MAINTENANCE_INITIAL_DELAY_SECONDS` env vars to API.md's Configuration table and to `docs/deployment.md`.

Also documented:
- The contradictions endpoint's new same-chunk + identical-object filters (README + CLAUDE.md).
- The control-char sanitisation on subject/predicate/object that prevents `Invalid IRI code point '\n'` job failures (CLAUDE.md Models section).
- The NER fallback's URL-skip + numeric-label drop + schema.org canonical remap (CLAUDE.md NLP Phase section).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hotfix on top of #83
The 2026-05-25 maintenance run on prod 0.1.112 normalized 62,176 `ks:knowledgeType` annotations but wrote them as `Literal("http://knowledge.local/schema/fact")` rather than the canonical `NamedNode(<.../schema/fact>)` shape that the ingestion path produces (`stores/triples.py:149`).

Effect:
`/api/admin/stats/types` now buckets the same logical type into two groups (URI form: ~80k rows, Literal form: ~62k rows). The total is still 141,946, so no data was lost, but the bifurcation maintenance was meant to fix moved sideways from CASING to TERM-TYPE.

Fix
`_canonical_knowledge_type_uri` strips the `KS` prefix if present, lowercases the suffix, applies the `Relation→relationship` alias, and rebuilds as `NamedNode(KS + suffix)`. Handles three input shapes:
- NamedNode mixed-case (`ks:Fact`): the original bifurcation
- Literal with full URI (the 62k bad rows)
- Literal with bare name (`"Fact"`): defensive

After deploy, the next maintenance sweep on prod will repair the bad rows.
Tests
- `test_uppercase_uri_lowered`: original casing path
- `test_already_lowercase_uri_is_no_op`: idempotency
- `test_relation_alias_to_relationship`: alias collapse
- `test_literal_with_full_uri_is_repaired`: new, corrective pass for the 62k bad rows
- `test_literal_bare_name_normalised_to_uri`: defensive
- `test_idempotent`: convergence

701 tests pass, ruff clean.
🤖 Generated with Claude Code