Feature request: Knowledge base health checks #159

Problem

Once a knowledge base grows past a few hundred atoms, it accumulates hygiene issues: near-duplicate atoms, isolated orphans, and silent contradictions. Currently, finding these requires either manual browsing (impractical at that size) or ad-hoc LLM prompts via chat (non-deterministic, and hard to track what has been addressed).

Example use cases

Every quarter, I want to answer four questions:

1. **Duplicates**: which atoms cover substantially the same ground and should be merged?
2. **Orphans**: which atoms have low semantic similarity to everything else in the base (i.e. likely misfiled or obsolete)?
3. **Concept gaps**: which concepts are referenced in many atoms but lack a dedicated atom of their own?
4. **Contradictions**: which atom pairs make opposing claims on the same topic?

Today, all four require ad-hoc LLM prompts, with no persistence and no way to track resolution.

Proposal

Add a "Health" category to the REST API with one endpoint per check. The two similarity-based checks are deterministic; the other two are LLM-backed:

- `GET /api/health/duplicates?threshold=0.85`: pairs of atoms whose similarity exceeds `threshold`
- `GET /api/health/orphans?max_similarity=0.3`: atoms whose highest similarity to any other atom is below `max_similarity`
- `GET /api/health/concept-gaps`: LLM-backed; returns concepts referenced in N or more atoms that have no dedicated atom
- `GET /api/health/contradictions`: LLM-backed; returns atom pairs making opposing claims on the same topic
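
To make the two similarity-based checks concrete, here is a minimal sketch of what they could compute. It assumes each atom already has an embedding vector and that "similarity" means cosine similarity; neither is confirmed by the current codebase, and the function names (`find_duplicates`, `find_orphans`) are hypothetical. The all-pairs loop is quadratic in the number of atoms, so a real implementation would probably want a vector index instead.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_similarities(embeddings):
    """Map each unordered atom-id pair to its similarity.

    `embeddings` is assumed to be a dict of atom id -> vector (hypothetical shape).
    """
    return {
        (i, j): cosine(embeddings[i], embeddings[j])
        for i, j in combinations(sorted(embeddings), 2)
    }


def find_duplicates(embeddings, threshold=0.85):
    """Pairs whose similarity exceeds `threshold`, most similar first."""
    sims = pairwise_similarities(embeddings)
    pairs = [(i, j, s) for (i, j), s in sims.items() if s > threshold]
    # Sorting by (-similarity, ids) keeps the output stable between runs.
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))


def find_orphans(embeddings, max_similarity=0.3):
    """Atoms whose best similarity to any other atom is below `max_similarity`."""
    best = {atom_id: -1.0 for atom_id in embeddings}
    for (i, j), s in pairwise_similarities(embeddings).items():
        best[i] = max(best[i], s)
        best[j] = max(best[j], s)
    return sorted(atom_id for atom_id, s in best.items() if s < max_similarity)
```

Because both functions sort their output and depend only on the stored vectors and the thresholds, repeated calls over an unchanged base return identical results, which is the property the chat-based approach lacks.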

Plus a "Health" view in the UI that runs these and lets the user act on results (merge, archive, create-atom-for-gap, tag-to-resolve).
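
One way the "tag-to-resolve" action could persist across runs is to give every finding a stable key derived from the check name and the atom ids involved, then filter out keys that have already been handled. The sketch below is only an illustration of that idea: `finding_key` and `ResolutionLog` are hypothetical names, and the in-memory dict stands in for whatever storage the app would actually use.

```python
import hashlib
import json
from dataclasses import dataclass, field


def finding_key(check, atom_ids):
    """Stable key for a finding, independent of the order of the atom ids."""
    canonical = json.dumps([check, sorted(atom_ids)])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


@dataclass
class ResolutionLog:
    """Remembers which findings the user has already acted on."""

    resolved: dict = field(default_factory=dict)  # finding key -> action taken

    def resolve(self, check, atom_ids, action):
        """Record that a finding was handled (e.g. "merge", "archive")."""
        self.resolved[finding_key(check, atom_ids)] = action

    def open_findings(self, check, findings):
        """Return only the findings that have not been resolved yet."""
        return [f for f in findings if finding_key(check, f) not in self.resolved]
```

With this shape, a quarterly re-run would surface only new or still-open findings rather than repeating pairs that were already merged or dismissed.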
