Problem
After a few hundred atoms, any knowledge base accumulates hygiene issues: near-duplicate atoms, isolated orphans, and silent contradictions. Currently, finding these requires either manual browsing (impractical past a few hundred atoms) or ad-hoc LLM prompts via chat (non-deterministic, hard to track what's been addressed).
Example use cases
Every quarter, I want to answer four questions:
- Duplicates : which atoms cover substantially the same ground and
should be merged?
- Orphans : which atoms have low semantic similarity to everything
else in the base (i.e. likely misfiled or obsolete)?
- Concept gaps : which concepts are referenced in many atoms but
lack a dedicated atom of their own?
- Contradictions : which atom pairs make opposing claims on the
same topic?
Today all four require ad-hoc LLM prompts with no persistence and no way to track resolution.
Proposal
Add a "Health" category to the REST API with deterministic endpoints:
GET /api/health/duplicates?threshold=0.85 — pairs of atoms with similarity above threshold
GET /api/health/orphans?max_similarity=0.3 — atoms whose highest similarity to any other atom is below threshold
GET /api/health/concept-gaps — LLM-backed, returns concepts referenced across N+ atoms without dedicated atoms
GET /api/health/contradictions — LLM-backed, returns atom pairs making opposing claims
Plus a "Health" view in the UI that runs these and lets the user act on results (merge, archive, create-atom-for-gap, tag-to-resolve).
Problem
After a few hundred atoms, any knowledge base accumulates hygiene issues: near-duplicate atoms, isolated orphans, and silent contradictions. Currently, finding these requires either manual browsing (impractical past a few hundred atoms) or ad-hoc LLM prompts via chat (non-deterministic, hard to track what's been addressed).
Example use cases
Every quarter, I want to answer four questions:
should be merged?
else in the base (i.e. likely misfiled or obsolete)?
lack a dedicated atom of their own?
same topic?
Today all four require ad-hoc LLM prompts with no persistence and no way to track resolution.
Proposal
Add a "Health" category to the REST API with deterministic endpoints:
GET /api/health/duplicates?threshold=0.85— pairs of atoms with similarity above thresholdGET /api/health/orphans?max_similarity=0.3— atoms whose highest similarity to any other atom is below thresholdGET /api/health/concept-gaps— LLM-backed, returns concepts referenced across N+ atoms without dedicated atomsGET /api/health/contradictions— LLM-backed, returns atom pairs making opposing claimsPlus a "Health" view in the UI that runs these and lets the user act on results (merge, archive, create-atom-for-gap, tag-to-resolve).