[Priority 1] Implement Context Evaluator with validation pipeline #59

@marknutter

Description

Problem

We have no systematic quality control for recall results. Hallucinations, incomplete answers, and low-confidence recalls go undetected. Human knowledge (corrections, domain expertise) can only be stored via manual rlm remember calls.

Proposal (from AIGNE paper analysis)

Add a Context Evaluator that runs after every recall session and:

  1. Validates output against source context
  2. Computes confidence score based on:
    • Coverage: did we find entries for all query terms?
    • Coherence: do subagent findings agree or contradict?
    • Completeness: any obvious gaps?
  3. Triggers human review when confidence < 0.7
  4. Stores human corrections as new memory entries tagged human-verified
  5. Logs validation outcomes to performance.jsonl
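The scoring in step 2 could be sketched as follows. The dimension names (coverage, coherence, completeness) and the 0.7 review threshold come from this proposal; the equal weighting and the dataclass shape are assumptions, not a settled design:

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.7  # from the proposal: trigger human review below this


@dataclass
class Confidence:
    coverage: float      # did we find entries for all query terms? (0-1)
    coherence: float     # do subagent findings agree? (0-1)
    completeness: float  # any obvious gaps? (0-1, 1.0 = no gaps)

    def score(self) -> float:
        # Equal weighting is an assumption; weights could later be
        # tuned from validation outcomes logged to performance.jsonl.
        return (self.coverage + self.coherence + self.completeness) / 3

    def needs_review(self) -> bool:
        return self.score() < REVIEW_THRESHOLD


c = Confidence(coverage=0.9, coherence=0.8, completeness=0.4)
print(round(c.score(), 2), c.needs_review())  # → 0.7 False
```

A learned weighting (e.g. regressing weights against human-verified outcomes) would slot in naturally once enough validation data accumulates.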

Implementation

  1. Create rlm/evaluator.py with validate_recall_results()
  2. Integrate into recall pipeline (after synthesis, before return)
  3. Add user prompt for verification when confidence low
  4. Store human annotations with provenance
  5. Log all validation metrics

Impact

  • Quality control — catch/correct errors before they propagate
  • Trust — users see confidence scores
  • Self-improvement — validation failures inform learned patterns
  • Human knowledge capture — corrections become first-class memories

Effort

2-3 days

Related

  • Context Evaluator from 'Everything is Context' paper (arXiv 2512.05470)
  • Self-improving strategies (learned_patterns.md)

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests