Distiller
Distiller is one of Alfred's four AI-powered tools. It extracts latent knowledge from operational records and transforms it into a structured evidence graph, reading conversations, session logs, project notes, and other vault content to identify and capture insights that would otherwise stay buried in narrative text.
Distiller operates on the principle that valuable knowledge often exists implicitly in operational records. Team assumptions, critical decisions, resource constraints, and contradictory information are frequently mentioned in passing but never formalized. Distiller surfaces these insights and creates dedicated learning records that form a queryable knowledge graph.
- Scans vault records for signals of latent knowledge
- Extracts assumptions, decisions, constraints, contradictions, and syntheses
- Creates structured learning records with evidence links back to sources
- Performs cross-learning meta-analysis to detect patterns and conflicts
- Builds an evidence graph connecting learnings to their source records
Pass A: Per-Source Extraction
Processes individual source records to extract the learnings embedded in their content.
Pass B: Cross-Learning Meta-Analysis
Analyzes the complete learning graph to identify higher-order patterns, contradictions between decisions, shared assumptions across projects, and opportunities for synthesis.
Distiller identifies five types of latent knowledge:
| Type | What it Captures | Example |
|---|---|---|
| assumption | Beliefs the team operates on without explicit validation | "Timber prices will stay stable through Q2" |
| decision | Choices made with rationale and context | "Use REST over GraphQL for Acme API due to team familiarity" |
| constraint | Hard limits or boundaries identified during work | "Budget capped at $50k for Phase 1" |
| contradiction | Conflicting information across different sources | "Decision A recommends microservices but Decision B advocates for monolith" |
| synthesis | Patterns and connections across multiple observations | "Three separate projects converge on event-driven architecture" |
Each learning record includes:
- Confidence level (high/medium/low based on signal explicitness)
- Status (active, superseded, invalidated)
- Claim statement
- Evidence excerpt from source
- Links to source records
- Links to related entities (projects, people, organizations)
Pass A consists of four stages that transform raw vault content into structured learning records.
Candidate Scanning
Scans vault records for keyword signals that indicate latent knowledge, using pattern matching to detect:
- Decision signals: "decided", "chose", "selected", "going with"
- Assumption signals: "assuming", "expect", "believe", "probably"
- Constraint signals: "limited to", "must", "cannot", "blocked by"
- Contradiction signals: "but", "however", "although", "conflict"
- Synthesis signals: "pattern", "trend", "consistently", "across"
Scores each candidate by signal density and recency. Only candidates exceeding min_signal_score are processed.
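The scan stage can be pictured with the following minimal sketch. The signal phrases come from the lists above; the scoring formula (signal hits per 100 words plus a recency bonus that halves every 30 days) and the names `score_candidate`, `select_candidates`, and `RECENCY_HALF_LIFE_DAYS` are hypothetical, not Distiller's actual implementation.

```python
import re
from datetime import datetime

# Signal phrases from the lists above, grouped by learning type.
SIGNALS = {
    "decision": ["decided", "chose", "selected", "going with"],
    "assumption": ["assuming", "expect", "believe", "probably"],
    "constraint": ["limited to", "must", "cannot", "blocked by"],
    "contradiction": ["but", "however", "although", "conflict"],
    "synthesis": ["pattern", "trend", "consistently", "across"],
}

# Hypothetical tuning constant: the recency bonus halves every 30 days.
RECENCY_HALF_LIFE_DAYS = 30.0

# One word-boundary regex per phrase, so "must" does not match "mustard".
PATTERNS = [
    re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    for phrases in SIGNALS.values()
    for phrase in phrases
]


def score_candidate(text: str, modified: datetime, now: datetime) -> float:
    """Hypothetical score: signal density (hits per 100 words) plus a recency bonus."""
    words = max(len(text.split()), 1)
    hits = sum(len(p.findall(text)) for p in PATTERNS)
    density = hits * 100 / words
    age_days = max((now - modified).total_seconds() / 86400, 0.0)
    recency = 0.5 ** (age_days / RECENCY_HALF_LIFE_DAYS)  # 1.0 for a brand-new record
    return density + recency


def select_candidates(records, min_signal_score: float, now: datetime):
    """Keep (path, text, modified) records scoring above min_signal_score, highest first."""
    scored = [(score_candidate(text, mod, now), path) for path, text, mod in records]
    return sorted([(s, p) for s, p in scored if s > min_signal_score], reverse=True)
```

Density rewards short records dense with signals, while the recency term only nudges ties toward fresher material; a real implementation might weight signal types differently.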
Stage 1: Extraction
For each candidate source record:
- The LLM analyzes the full content, given context about the learning types
- A JSON manifest of discovered learnings is written to a temp file
- Each learning includes `type`, `title`, `confidence`, `status`, `claim`, `evidence_excerpt`, `source_links`, and `entity_links`
- Retry logic (up to 3 attempts) handles manifest parsing failures
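The retry behavior can be sketched as follows, assuming the agent call is wrapped in a function that returns the manifest text directly (rather than via a temp file). `run_agent`, `parse_manifest`, `extract_with_retry`, and `ManifestError` are hypothetical names; only the field list and the three-attempt limit come from the description above.

```python
import json
from typing import Callable

# Fields every learning in the manifest is expected to carry (from the list above).
REQUIRED_FIELDS = {
    "type", "title", "confidence", "status",
    "claim", "evidence_excerpt", "source_links", "entity_links",
}
MAX_ATTEMPTS = 3


class ManifestError(Exception):
    """Raised when no attempt produced a usable manifest."""


def parse_manifest(raw: str) -> list[dict]:
    """Parse the agent's JSON manifest and check that each learning is complete."""
    learnings = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(learnings, list):
        raise ValueError("manifest must be a JSON list of learnings")
    for item in learnings:
        if not isinstance(item, dict):
            raise ValueError("each learning must be a JSON object")
        missing = REQUIRED_FIELDS - item.keys()
        if missing:
            raise ValueError(f"learning missing fields: {sorted(missing)}")
    return learnings


def extract_with_retry(run_agent: Callable[[], str]) -> list[dict]:
    """Call the (hypothetical) agent up to MAX_ATTEMPTS times until its manifest parses."""
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return parse_manifest(run_agent())
        except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
            last_error = exc
    raise ManifestError(f"manifest unusable after {MAX_ATTEMPTS} attempts: {last_error}")
```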
Confidence and status are calibrated by signal type:
- Explicit statements ("We decided to...") → high confidence, active status
- Implied or inferred learnings → low confidence, tentative status
Stage 2: Deduplication
After extraction across all candidates:
- Fuzzy title matching identifies duplicate learnings
- Merges duplicates, preserving all source links
- Tracks which sources contributed to each learning
- Reports candidate count, merged count, and final deduplicated count
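A minimal sketch of the merge step, using Python's standard `difflib.SequenceMatcher` as the fuzzy matcher. Distiller's actual matching algorithm and threshold are not documented on this page; the 0.85 cutoff and the `dedupe` function are assumptions for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff: titles at least this similar are treated as duplicates.
MERGE_THRESHOLD = 0.85


def similar(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe(learnings: list[dict]) -> tuple[list[dict], int]:
    """Merge learnings with near-identical titles, keeping every source link.

    Returns the deduplicated list and the number of merges performed.
    """
    merged: list[dict] = []
    merges = 0
    for item in learnings:
        match = next(
            (m for m in merged if similar(m["title"], item["title"]) >= MERGE_THRESHOLD),
            None,
        )
        if match is None:
            # First occurrence: copy the link list so the input is not mutated.
            merged.append({**item, "source_links": list(item["source_links"])})
        else:
            # Duplicate: fold its sources into the existing learning, without repeats.
            for link in item["source_links"]:
                if link not in match["source_links"]:
                    match["source_links"].append(link)
            merges += 1
    return merged, merges
```

The candidate, merged, and final counts reported by Stage 2 correspond to `len(learnings)`, `merges`, and `len(merged)` in this sketch.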
Stage 3: Record Creation
For each deduplicated learning:
- Generates well-formed Markdown with YAML frontmatter
- Creates the record via the `alfred vault create` command
- Includes proper source links, entity links, and evidence sections
- Follows vault schema conventions for learning types
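The Markdown body generated for each learning might resemble the following sketch. The frontmatter keys mirror the learning fields listed earlier, but the exact layout and the `render_learning` helper are assumptions; the Vault Schema page defines the real conventions. The sketch only builds the text and does not call `alfred vault create`.

```python
import json


def yaml_scalar(value: str) -> str:
    """Quote a string for YAML; JSON string syntax is valid YAML double-quoted style."""
    return json.dumps(value)


def render_learning(learning: dict) -> str:
    """Build a Markdown record with YAML frontmatter for one learning (hypothetical layout)."""
    lines = ["---"]
    for key in ("type", "title", "confidence", "status"):
        lines.append(f"{key}: {yaml_scalar(learning[key])}")
    for key in ("source_links", "entity_links"):
        links = learning[key]
        if not links:
            lines.append(f"{key}: []")  # keep an empty list a list, not YAML null
            continue
        lines.append(f"{key}:")
        lines.extend(f"  - {yaml_scalar(link)}" for link in links)
    lines.append("---")
    # Quote every line of the excerpt so multi-line evidence stays in one blockquote.
    quoted = [f"> {line}" for line in learning["evidence_excerpt"].splitlines()]
    lines += ["", f"# {learning['title']}", "", learning["claim"], "", "## Evidence", "", *quoted, ""]
    return "\n".join(lines)
```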
Pass B analyzes the complete learning graph to discover higher-order insights.
Contradiction Detection
Scans decisions and assumptions for conflicting claims. Creates contradiction records that link the conflicting learnings, with an analysis of the tension.
Shared Assumption Analysis
Identifies assumptions referenced across multiple projects or teams. Surfaces implicit dependencies and coordination risks.
Pattern Synthesis
Uses semantic clustering to group related learnings. Creates synthesis records that articulate patterns emerging across the evidence graph.
Temporal Analysis
Tracks how decisions evolve over time. Identifies superseded decisions and validates whether assumptions held true.
Pass B uses semantic embeddings to cluster learnings by conceptual similarity rather than keyword matching. This reveals non-obvious connections between learnings from different domains.
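To illustrate similarity-based grouping, here is a greedy single-pass clustering over embedding vectors using cosine similarity. The embedding model, the clustering method Distiller actually uses, and the 0.8 threshold are not specified on this page; `cluster` is a hypothetical helper and the toy vectors stand in for real embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cluster(embeddings: dict[str, list[float]], threshold: float = 0.8) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed is similar enough, else start one."""
    clusters: list[tuple[list[float], list[str]]] = []  # (seed vector, member titles)
    for title, vec in embeddings.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((vec, [title]))
    return [members for _, members in clusters]
```

Two learnings with no shared keywords can still land in one cluster if their embeddings point in similar directions, which is the point of clustering semantically rather than by keyword.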
Distiller is configured in the `distiller` section of `config.yaml`:
```yaml
distiller:
  enabled: true
  interval: 300              # Light scan interval (seconds)
  deep_interval_hours: 24    # Deep extraction interval (hours)
  min_signal_score: 3        # Minimum score for candidate processing
  batch_size: 10             # Max candidates per extraction run
  pass_b_enabled: true       # Enable meta-analysis
```
Distiller uses the same agent backend configuration as other Alfred tools (`agent.backend` in `config.yaml`). It supports the Claude Code, Zo Computer (HTTP), and OpenClaw backends.
`alfred distiller scan`
Performs keyword-based scanning to identify records containing extraction signals. Reports the candidate count and score distribution without performing extraction.
`alfred distiller run`
Executes the full extraction pipeline:
- Scans for candidates
- Extracts learnings from candidates
- Deduplicates and merges
- Creates vault records
- Optionally runs Pass B meta-analysis
`alfred distiller watch`
Runs periodic extraction in the foreground:
- Light scans every `interval` seconds
- Deep extraction every `deep_interval_hours` hours
- Continues until interrupted
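The scheduling behind `watch` can be approximated by a function that decides, on each tick, whether a light scan or a deep extraction is due. This sketch assumes the `interval` and `deep_interval_hours` semantics described above and the `last_scan`/`last_deep_run` timestamps from the state file; `parse_ts` and `due_jobs` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse state-file timestamps like '2024-01-20T10:30:00Z' (None stays None)."""
    if value is None:
        return None
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def due_jobs(
    now: datetime,
    last_scan: Optional[datetime],
    last_deep_run: Optional[datetime],
    interval_s: int,
    deep_interval_hours: int,
) -> list[str]:
    """List the jobs due at `now`: 'deep' and/or 'light'. A job never run is always due."""
    jobs = []
    if last_deep_run is None or now - last_deep_run >= timedelta(hours=deep_interval_hours):
        jobs.append("deep")
    if last_scan is None or now - last_scan >= timedelta(seconds=interval_s):
        jobs.append("light")
    return jobs
```

A foreground loop would call `due_jobs` periodically, run whatever is due, and record new timestamps; whether a deep run also counts as a light scan is not specified here.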
`alfred up --only distiller`
Starts Distiller as a background daemon with auto-restart. Use `alfred down` to stop it.
`alfred status`
Shows Distiller daemon status, last extraction time, and learning record counts.
Distiller maintains state in `data/distiller_state.json`:
```json
{
  "processed_sources": {
    "conversation/weekly-sync-2024-01-15": "abc123hash",
    "session/project-kickoff": "def456hash"
  },
  "last_scan": "2024-01-20T10:30:00Z",
  "last_deep_run": "2024-01-20T08:00:00Z",
  "extraction_history": [...]
}
```
Source records are tracked by content hash. When a source is modified, it becomes eligible for re-extraction.
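Change detection by content hash could look like this minimal sketch. It assumes SHA-256 over the record's UTF-8 text and the `processed_sources` layout shown above; the hash function Distiller actually uses and exactly what it hashes are not documented here, and `needs_extraction` and `mark_processed` are hypothetical helpers.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hex SHA-256 digest of the record's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_extraction(path: str, text: str, state: dict) -> bool:
    """True if the source is new or its content changed since the last extraction."""
    return state.get("processed_sources", {}).get(path) != content_hash(text)


def mark_processed(path: str, text: str, state: dict) -> None:
    """Record the current hash so an unchanged source is skipped next time."""
    state.setdefault("processed_sources", {})[path] = content_hash(text)
```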
Distiller operates under the distiller scope defined in vault/scope.py:
Allowed Operations:
- Create learning records (assumption, decision, constraint, contradiction, synthesis)
- Read any vault record for context
- Edit existing learning records to add sources or update status
Prohibited Operations:
- Create non-learning records
- Delete any records
- Move or rename records
This scope ensures Distiller can build the learning graph without affecting operational records.
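A permission check along these lines illustrates the rules above. It is a hypothetical sketch, not the contents of `vault/scope.py`; the operation names, `ScopeViolation`, and `check_distiller_op` are invented for illustration.

```python
LEARNING_TYPES = {"assumption", "decision", "constraint", "contradiction", "synthesis"}


class ScopeViolation(Exception):
    """Raised when Distiller attempts an operation outside its scope."""


def check_distiller_op(op: str, record_type: str) -> None:
    """Allow reads of any record and creates/edits of learning records; reject everything else."""
    if op == "read":
        return
    if op in {"create", "edit"} and record_type in LEARNING_TYPES:
        return
    # Covers non-learning creates plus every delete, move, or rename.
    raise ScopeViolation(f"distiller may not {op} a {record_type} record")
```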
```bash
# Scan vault for extraction candidates
alfred distiller scan
# Output:
# Found 42 candidates across 120 vault records
# Top candidates:
# - conversation/architecture-debate (score: 8.5)
# - session/budget-planning (score: 7.2)
# - project/acme-api-design (score: 6.8)

# Run extraction
alfred distiller run
# Output:
# Stage 1: Extracted 23 learnings from 15 sources
# Stage 2: Merged 5 duplicates → 18 unique learnings
# Stage 3: Created 18 learning records
# Pass B: Identified 2 contradictions, created 1 synthesis
```
```bash
# Start as background daemon
alfred up --only distiller

# Check status
alfred status
# Output:
# Distiller: running (PID 12345)
# Last scan: 2 minutes ago
# Last deep extraction: 6 hours ago
# Learning records: 127 total (45 decisions, 38 assumptions, ...)
```
Curator creates operational records that become extraction sources for Distiller. As new conversations, sessions, and observations flow into the vault, Distiller automatically processes them for latent knowledge.
Janitor ensures learning records maintain proper links and frontmatter. If source records are moved or renamed, Janitor updates the references in learning records.
Surveyor's semantic clustering complements Distiller's Pass B meta-analysis. Surveyor can identify conceptually similar learnings across the vault and suggest relationship links that Distiller can analyze for contradictions or synthesis opportunities.
Configure min_signal_score based on vault size and signal quality:
- Small vaults (< 500 records): score 2-3 catches most candidates
- Large vaults (> 1000 records): score 4-5 focuses on high-confidence signals
- Noisy vaults: score 6+ for precision over recall
Balance extraction frequency against vault activity:
- High-activity vaults: `interval: 300` (5 minutes), `deep_interval_hours: 12`
- Low-activity vaults: `interval: 1800` (30 minutes), `deep_interval_hours: 48`
- Ad-hoc extraction: disable the daemon and run `alfred distiller run` manually
Distiller works best on narrative content with explicit reasoning:
- Meeting notes with decision rationale
- Project retrospectives
- Architecture discussions
- Planning documents with constraints
Short, factual records (contacts, tasks) typically yield few learnings.
Review and refine extracted learnings periodically:
- Update status field when assumptions are validated or invalidated
- Link related learnings to build evidence chains
- Add entity links to connect learnings to relevant projects/people
- Mark superseded decisions to maintain decision history
Symptom: `alfred distiller scan` reports 0 candidates
Solutions:
- Lower the `min_signal_score` threshold
- Check that the vault contains narrative content (not just structured entities)
- Review `data/distiller_state.json`: already-processed sources won't reappear
- Manually trigger re-extraction by removing entries from `processed_sources`
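If you prefer not to edit the state file by hand, a small script like the following removes selected entries from `processed_sources`. It is a sketch that assumes the JSON layout shown in the state section; `forget_sources` is a hypothetical helper. Stopping the daemon first (`alfred down`) is probably safest, since a running process may rewrite the file.

```python
import json
from pathlib import Path


def forget_sources(state_path: Path, sources: list[str]) -> list[str]:
    """Drop the given sources from processed_sources so the next run re-extracts them.

    Returns the sources that were actually present and removed.
    """
    state = json.loads(state_path.read_text(encoding="utf-8"))
    processed = state.get("processed_sources", {})
    removed = [s for s in sources if processed.pop(s, None) is not None]
    state_path.write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
    return removed
```

For example, `forget_sources(Path("data/distiller_state.json"), ["session/project-kickoff"])` makes that session eligible for extraction on the next run.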
Symptom: Stage 1 or Stage 3 consistently fails
Solutions:
- Check `data/distiller.log` for LLM errors
- Verify the agent backend is configured correctly
- Reduce `batch_size` to avoid rate limits
- Check that the vault's `CLAUDE.md` is in the agent workspace (OpenClaw backend)
Symptom: Similar learnings created with slightly different titles
Solutions:
- Stage 2 dedup uses fuzzy matching, so very similar titles should merge
- Review merge threshold in code if needed
- Manually merge duplicates in vault and link to all sources
Symptom: Extraction takes too long or times out
Solutions:
- Reduce `batch_size` to process fewer candidates per run
- Increase `interval` to run less frequently
- Use a faster backend (OpenClaw is typically faster than the HTTP-based Zo Computer backend for serial processing)
- Consider extracting from specific sources manually rather than running full scans
Distiller uses Alfred's agent-writes-directly pattern: the LLM agent receives vault context and creates learning records via alfred vault create commands. Changes are tracked through the mutation log (vault/mutation_log.py).
Distiller works with all three agent backends (Claude Code, Zo Computer, OpenClaw). The prompt builder (backends/__init__.py) handles backend-specific formatting, but the extraction pipeline is backend-agnostic.
State files are bookkeeping only; the vault is the source of truth. You can safely delete `data/distiller_state.json` to force re-processing of all sources.
See Also:
- Curator — Processes inbox inputs into vault records
- Janitor — Maintains vault structural integrity
- Surveyor — Semantic clustering and relationship discovery
- Vault Schema — Complete record type reference