Labels
P3 (Research — medium-high complexity) · research (Research-driven improvement)
Description
Source
Factory.ai (2025): Evaluating Context Compression for AI Agents
https://factory.ai/news/evaluating-compression
Summary
Replace opaque compression quality metrics (ROUGE/embedding similarity) with functional probes run after each compaction:
- Recall probes: did specific facts survive?
- Artifact probes: does the agent know which files/tools it used?
- Continuation probes: can it pick up mid-task?
- Decision probes: are past reasoning traces intact?
The agent's ability to answer these probes correctly is the quality signal.
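The four probe categories above can be sketched as structured data plus a per-category scorer. This is a minimal illustration, not Zeph's actual `CompactionProbe` API: the template strings, `Probe` fields, and the substring-match scoring are all hypothetical placeholders (a real implementation would likely use an LLM judge).

```python
from dataclasses import dataclass
from enum import Enum

class ProbeCategory(Enum):
    RECALL = "recall"              # did specific facts survive compaction?
    ARTIFACT = "artifact"          # does the agent know which files/tools it used?
    CONTINUATION = "continuation"  # can it pick up mid-task?
    DECISION = "decision"          # are past reasoning traces intact?

@dataclass
class Probe:
    category: ProbeCategory
    question: str
    expected: str  # ground-truth answer extracted *before* compaction

# Illustrative per-category prompt templates (assumed, not from Zeph).
TEMPLATES = {
    ProbeCategory.RECALL: "From the conversation so far: {question}",
    ProbeCategory.ARTIFACT: "Which files or tools were involved in: {question}",
    ProbeCategory.CONTINUATION: "You were mid-task. What is the next step for: {question}",
    ProbeCategory.DECISION: "Why was the following decision made: {question}",
}

def score_probes(probes, answer_fn):
    """Run each probe against the post-compaction agent; return per-category scores.

    answer_fn takes a prompt string and returns the agent's answer. Scoring is a
    naive case-insensitive substring match, purely for illustration.
    """
    by_category = {c: [] for c in ProbeCategory}
    for p in probes:
        prompt = TEMPLATES[p.category].format(question=p.question)
        answer = answer_fn(prompt)
        by_category[p.category].append(p.expected.lower() in answer.lower())
    return {
        c.value: (sum(hits) / len(hits) if hits else None)
        for c, hits in by_category.items()
    }
```

Categories with no probes score `None` rather than 0, so an empty category is distinguishable from a failed one in the breakdown.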
Applicability to Zeph
Relevance: HIGH. Zeph's summarization quality is currently opaque. Zeph already has a compaction probe ([memory.compression.probe]), but the current probe uses generic LLM-generated questions. Structured probe categories (recall/artifact/continuation/decision) would surface silent information loss more reliably.
Implementation sketch
- Extend `CompactionProbe` to generate probes per category (it currently generates generic questions)
- After compaction, run each category with a different prompt template
- Score by category; log per-category breakdown in debug dump
- Expose per-category scores in TUI metrics panel (issue #448, "feat: display filter metrics in TUI dashboard")
Complexity: LOW-MEDIUM
The probe prompts themselves are simple; the main work is categorizing probe generation and updating the scoring logic.
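For the "log per-category breakdown in debug dump" step, one plausible shape is an append-only JSONL record per compaction. This is a sketch under assumed names; the file layout and field names are illustrative, not Zeph's actual debug-dump format.

```python
import json
import time

def dump_probe_scores(scores, path):
    """Append one per-category probe-score record to a JSONL debug dump.

    `scores` is a mapping like {"recall": 0.9, "artifact": 0.5, ...},
    e.g. the output of a categorized probe run after a compaction.
    """
    record = {"ts": time.time(), "probe_scores": scores}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

One record per compaction keeps the dump grep-able and lets a TUI panel tail the file to show the latest per-category breakdown.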