Feature request: Knowledge base health checks #159

@Sainin701

Problem

After a few hundred atoms, any knowledge base accumulates hygiene issues: near-duplicate atoms, isolated orphans, and silent contradictions. Currently, finding these requires either manual browsing (impractical at that scale) or ad-hoc LLM prompts via chat (non-deterministic, and hard to track what has been addressed).

Example use cases

Every quarter, I want to answer four questions:

  1. Duplicates: which atoms cover substantially the same ground and
    should be merged?
  2. Orphans: which atoms have low semantic similarity to everything
    else in the base (i.e. likely misfiled or obsolete)?
  3. Concept gaps: which concepts are referenced in many atoms but
    lack a dedicated atom of their own?
  4. Contradictions: which atom pairs make opposing claims on the
    same topic?

Today all four require ad-hoc LLM prompts with no persistence and no way to track resolution.

Proposal

Add a "Health" category to the REST API with deterministic endpoints:

  • GET /api/health/duplicates?threshold=0.85 returns pairs of atoms whose similarity is above threshold
  • GET /api/health/orphans?max_similarity=0.3 returns atoms whose highest similarity to any other atom is below max_similarity
  • GET /api/health/concept-gaps is LLM-backed and returns concepts referenced in N or more atoms that lack a dedicated atom
  • GET /api/health/contradictions is LLM-backed and returns atom pairs making opposing claims
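
To make the two similarity-based checks concrete, here is a minimal sketch of what the duplicates and orphans endpoints could compute. It assumes each atom already has an embedding vector and that "similarity" means cosine similarity; neither is specified above. The function names (`find_duplicates`, `find_orphans`) and the in-memory dict of embeddings are hypothetical, for illustration only.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicates(embeddings, threshold=0.85):
    """Return (id_a, id_b, similarity) for every pair above threshold.

    Atoms are visited in sorted id order and results are sorted, so the
    same input always yields the same output.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        sim = cosine(vec_a, vec_b)
        if sim > threshold:
            pairs.append((id_a, id_b, sim))
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))


def find_orphans(embeddings, max_similarity=0.3):
    """Return (id, best_similarity) for atoms whose closest neighbour is below max_similarity."""
    orphans = []
    for id_a, vec_a in sorted(embeddings.items()):
        # Highest similarity to any *other* atom; 0.0 if the base has a single atom.
        best = max(
            (cosine(vec_a, vec_b) for id_b, vec_b in embeddings.items() if id_b != id_a),
            default=0.0,
        )
        if best < max_similarity:
            orphans.append((id_a, best))
    return orphans


# Toy embeddings (hypothetical data): "a" and "b" are near-duplicates, "c" is isolated.
atoms = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 0.0, 1.0],
}
duplicates = find_duplicates(atoms)
orphans = find_orphans(atoms)
```

The pairwise loop is quadratic in the number of atoms, which is likely acceptable for a few hundred to a few thousand atoms; a larger base would probably want an approximate nearest-neighbour index instead.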

Plus a "Health" view in the UI that runs these and lets the user act on results (merge, archive, create-atom-for-gap, tag-to-resolve).
