
fix(maintenance): write knowledge_type as NamedNode URI (not Literal)#84

Merged: arshadansari27 merged 1 commit into main from hotfix-maintenance-shape on May 25, 2026

@arshadansari27
Owner

Hotfix on top of #83

The 2026-05-25 maintenance run on prod 0.1.112 normalized 62,176 ks:knowledgeType annotations BUT wrote them as Literal("http://knowledge.local/schema/fact") rather than the canonical NamedNode(<.../schema/fact>) shape that the ingestion path produces (stores/triples.py:149).

Effect: /api/admin/stats/types now buckets the same logical type into two groups (URI form: ~80k rows, Literal form: ~62k rows). The total is still 141,946, so no data was lost, but the bifurcation that maintenance was meant to fix moved sideways from CASING to TERM-TYPE.

# Verify on prod:
SELECT (COUNT(*) AS ?c) WHERE { ?b <ks:knowledgeType> ?v . FILTER(isLiteral(?v)) }
→ 62176   (the bad rows)
SELECT (COUNT(*) AS ?c) WHERE { ?b <ks:knowledgeType> ?v . FILTER(isIRI(?v)) }
→ 79770   (good URIs, original + already-lowercase)
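
To illustrate why the stats endpoint ends up with two buckets, the sketch below groups terms by kind and value together, which is how RDF term equality works. `NamedNode` and `Literal` here are stand-in dataclasses (an assumption, not the store's real classes), and the three sample rows are illustrative rather than production data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedNode:
    """Stand-in for an RDF IRI term (assumption: not the store's real class)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Stand-in for an RDF literal term (assumption: not the store's real class)."""
    value: str


FACT = "http://knowledge.local/schema/fact"

# Illustrative rows only: two shaped like ingestion output, one like the broken sweep.
rows = [NamedNode(FACT), NamedNode(FACT), Literal(FACT)]

# Terms of different kinds never compare equal, even when their text is identical,
# so grouping on the object yields two buckets for one logical type.
buckets = Counter(rows)

for term, count in buckets.items():
    print(type(term).__name__, term.value, count)
```

The same string therefore counts twice, once per term kind, which matches the URI/Literal split reported above.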

Fix

_canonical_knowledge_type_uri strips the KS prefix if present, lowercases the suffix, applies the Relation→relationship alias, and rebuilds as NamedNode(KS + suffix). Handles three input shapes:

  • NamedNode mixed-case (ks:Fact) — the original bifurcation
  • Literal with full URI (the 62k bad rows from the broken run)
  • Literal with bare name ("Fact") — defensive

After deploy, the next maintenance sweep on prod will repair the bad rows.
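
The steps described for `_canonical_knowledge_type_uri` can be sketched roughly as follows. This is not the merged implementation: `NamedNode` and `Literal` are stand-in dataclasses, the function name drops the leading underscore to mark it as hypothetical, the `KS` value is inferred from the URI quoted in the description, and `ALIASES` contains only the one alias the description names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NamedNode:
    """Stand-in for an RDF IRI term (assumption: not the store's real class)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Stand-in for an RDF literal term (assumption: not the store's real class)."""
    value: str


# Assumed namespace, inferred from "http://knowledge.local/schema/fact".
KS = "http://knowledge.local/schema/"

# Keyed in lowercase because the lookup runs after lowercasing.
ALIASES = {"relation": "relationship"}


def canonical_knowledge_type_uri(term: Union[NamedNode, Literal]) -> NamedNode:
    """Return the canonical NamedNode for any of the three input shapes."""
    raw = term.value
    # Strip the KS prefix if present; a bare name passes through unchanged.
    suffix = raw[len(KS):] if raw.startswith(KS) else raw
    # Lowercase the suffix, then collapse known aliases.
    suffix = suffix.lower()
    suffix = ALIASES.get(suffix, suffix)
    # Always rebuild as a NamedNode, whatever term kind came in.
    return NamedNode(KS + suffix)


fixed = canonical_knowledge_type_uri(Literal(KS + "fact"))
print(fixed)
```

Because the output is always a lowercase `NamedNode`, feeding a result back in returns it unchanged, which is the convergence property the sweep relies on.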

Tests

  • test_uppercase_uri_lowered — original casing path
  • test_already_lowercase_uri_is_no_op — idempotency
  • test_relation_alias_to_relationship — alias collapse
  • test_literal_with_full_uri_is_repaired — new: corrective pass for the 62k bad rows
  • test_literal_bare_name_normalised_to_uri — defensive
  • test_idempotent — convergence

701 tests pass, ruff clean.

🤖 Generated with Claude Code

The 2026-05-25 maintenance run on prod 0.1.112 normalized 62,176
``ks:knowledgeType`` annotations BUT wrote them as
``Literal("http://knowledge.local/schema/fact")`` rather than the
canonical ``NamedNode("<...schema/fact>")`` shape produced by the
ingestion path (``stores/triples.py:149``).

Effect: ``/api/admin/stats/types`` started bucketing the same logical
type into two groups (URI form: ~80k rows, Literal form: ~62k rows),
total still 141,946 — no data lost, but the bifurcation that
maintenance was meant to fix moved sideways from CASING to TERM-TYPE.

Fix: ``_canonical_knowledge_type_uri`` strips the ``KS`` prefix if
present, lowercases the suffix, applies the ``Relation→relationship``
alias, and rebuilds as a ``NamedNode(KS + suffix)``. Handles three
input shapes:
- NamedNode mixed-case (``ks:Fact``) — the original bug
- Literal with full URI (the 62k bad rows we just created)
- Literal with bare name (``"Fact"``) — defensive against future drift

Next maintenance sweep on prod will repair the bad rows. Test
coverage extended for all three shapes (`tests/test_maintenance_normalizer.py`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
arshadansari27 force-pushed the hotfix-maintenance-shape branch from 7d2e11e to 770f678 on May 25, 2026 22:01
arshadansari27 merged commit ed3859f into main on May 25, 2026
5 checks passed
arshadansari27 added a commit that referenced this pull request May 25, 2026
…ing, maintenance endpoint (#85)

PRs #83 (data-quality fixes + periodic janitor) and #84 (NamedNode-shape
hotfix) changed three things the docs hadn't caught up to:

1. **`KS_GRAPH_FEDERATED` is gone.** Removed from README's named-graph
   list, the architecture doc's graph table, and the CLAUDE.md trust_tier
   description. The graph was empty in production for as long as the
   service has been deployed; the producer column was dropped in
   migration 017 (PR #82). What remains is 4 graphs: ontology / asserted
   / extracted / inferred.

2. **`knowledge_type` is no longer free-form.** `_normalise_knowledge_type`
   in `models.py` now lowercases at validation and collapses
   `Relation→relationship`. Updated the Status table in README, the
   Knowledge Types Reference in API.md, and every code example in both
   docs to show the lowercase canonical form (`claim`, `fact`, `event`,
   `entity`, `relationship`, `temporalfact`). Capitalised input is still
   accepted on the wire; the prose in API.md says so explicitly.

3. **`POST /api/admin/maintenance/run` exists.** Added the endpoint to
   API.md (full section with request/response/curl), to the admin row in
   the endpoints table, and a Maintenance Service section to CLAUDE.md
   describing `normalize_knowledge_types` / `normalize_spacy_rdf_types`,
   the lifespan wiring, and failure policy. Added the
   `MAINTENANCE_INTERVAL_SECONDS` and `MAINTENANCE_INITIAL_DELAY_SECONDS`
   env vars to API.md's Configuration table and to `docs/deployment.md`.

Also documented:
- The contradictions endpoint's new same-chunk + identical-object filters
  (README + CLAUDE.md).
- The control-char sanitisation on subject/predicate/object that
  prevents `Invalid IRI code point '\n'` job failures (CLAUDE.md Models
  section).
- The NER fallback's URL-skip + numeric-label drop + schema.org canonical
  remap (CLAUDE.md NLP Phase section).
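
As a rough illustration of the control-char sanitisation mentioned above, the sketch below removes control characters from each part of a triple before it reaches the store. The function names are hypothetical, and removing the characters (rather than replacing them with a space) is an assumption; the commit message does not say which the real code does.

```python
import re

# C0 controls (including \n, \r, \t), DEL, and C1 controls.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def strip_control_chars(value: str) -> str:
    """Remove control characters, then trim surrounding whitespace."""
    return _CONTROL_CHARS.sub("", value).strip()


def sanitise_triple(subject: str, predicate: str, obj: str) -> tuple[str, str, str]:
    """Apply the same cleaning to subject, predicate, and object."""
    return (
        strip_control_chars(subject),
        strip_control_chars(predicate),
        strip_control_chars(obj),
    )


# A trailing newline in the subject is the kind of input that would
# otherwise produce an invalid IRI code point.
cleaned = sanitise_triple("http://example.org/a\n", "ks:has\tName", " Alice\r\n")
print(cleaned)
```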

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>