Distiller

Distiller is one of four AI-powered tools in Alfred. It extracts latent knowledge from operational records and transforms it into a structured evidence graph. It reads conversations, session logs, project notes, and other vault content to identify and capture insights that would otherwise remain buried in narrative text.

Overview

Distiller operates on the principle that valuable knowledge often exists implicitly in operational records. Team assumptions, critical decisions, resource constraints, and contradictory information are frequently mentioned in passing but never formalized. Distiller surfaces these insights and creates dedicated learning records that form a queryable knowledge graph.

What Distiller Does

Scans vault records for signals of latent knowledge
Extracts assumptions, decisions, constraints, contradictions, and syntheses
Creates structured learning records with evidence links back to sources
Performs cross-learning meta-analysis to detect patterns and conflicts
Builds an evidence graph connecting learnings to their source records

Two-Pass Pipeline

Pass A: Per-Source Extraction. Processes individual source records to extract learnings embedded in their content.

Pass B: Cross-Learning Meta-Analysis. Analyzes the complete learning graph to identify higher-order patterns, contradictions between decisions, shared assumptions across projects, and opportunities for synthesis.

Learning Types

Distiller identifies five types of latent knowledge:

Type	What it Captures	Example
assumption	Beliefs the team operates on without explicit validation	"Timber prices will stay stable through Q2"
decision	Choices made with rationale and context	"Use REST over GraphQL for Acme API due to team familiarity"
constraint	Hard limits or boundaries identified during work	"Budget capped at $50k for Phase 1"
contradiction	Conflicting information across different sources	"Decision A recommends microservices but Decision B advocates for monolith"
synthesis	Patterns and connections across multiple observations	"Three separate projects converge on event-driven architecture"

Each learning record includes:

Confidence level (high/medium/low based on signal explicitness)
Status (active, superseded, invalidated; Stage 1 may also mark inferred learnings as tentative)
Claim statement
Evidence excerpt from source
Links to source records
Links to related entities (projects, people, organizations)
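
As a rough illustration, the fields above could be modeled as follows. The class name `LearningRecord` and the validation logic are assumptions made for this sketch; the field names mirror the manifest keys listed under Stage 1, but this is not Alfred's actual schema.

```python
from dataclasses import dataclass, field

# Vocabularies taken from this page; the class itself is hypothetical.
LEARNING_TYPES = {"assumption", "decision", "constraint", "contradiction", "synthesis"}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass
class LearningRecord:
    type: str
    title: str
    confidence: str
    status: str
    claim: str
    evidence_excerpt: str
    source_links: list[str] = field(default_factory=list)
    entity_links: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies.
        if self.type not in LEARNING_TYPES:
            raise ValueError(f"unknown learning type: {self.type!r}")
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")


record = LearningRecord(
    type="constraint",
    title="Phase 1 budget cap",
    confidence="high",
    status="active",
    claim="Budget capped at $50k for Phase 1",
    evidence_excerpt="Budget capped at $50k for Phase 1",
    source_links=["session/budget-planning"],
)
```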

Pass A: Per-Source Extraction

Pass A consists of four stages that transform raw vault content into structured learning records.

Stage 0: Candidate Scanning (Pure Python)

Scans vault records for keyword signals indicating latent knowledge. Uses pattern matching to detect:

Decision signals: "decided", "chose", "selected", "going with"
Assumption signals: "assuming", "expect", "believe", "probably"
Constraint signals: "limited to", "must", "cannot", "blocked by"
Contradiction signals: "but", "however", "although", "conflict"
Synthesis signals: "pattern", "trend", "consistently", "across"

Scores each candidate by signal density and recency. Only candidates exceeding min_signal_score are processed.
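
The scan can be pictured with a small sketch. The signal phrases are copied from the list above, but the scoring is an assumption: it simply counts whole-word hits and ignores recency, which the real scorer also takes into account. The function names `count_signals` and `is_candidate` are hypothetical.

```python
import re

# Signal phrases from this page; the hit-count scoring is illustrative only.
SIGNALS = {
    "decision": ["decided", "chose", "selected", "going with"],
    "assumption": ["assuming", "expect", "believe", "probably"],
    "constraint": ["limited to", "must", "cannot", "blocked by"],
    "contradiction": ["but", "however", "although", "conflict"],
    "synthesis": ["pattern", "trend", "consistently", "across"],
}


def count_signals(text: str) -> dict[str, int]:
    """Count whole-word, case-insensitive hits per learning type."""
    counts = {}
    for learning_type, phrases in SIGNALS.items():
        pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
        counts[learning_type] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def is_candidate(text: str, min_signal_score: int = 3) -> bool:
    """Hypothetical gate: total hits must exceed min_signal_score."""
    return sum(count_signals(text).values()) > min_signal_score


note = "We decided on REST, assuming the team stays small. However, we cannot exceed budget."
print(count_signals(note))
print(is_candidate(note))
```

Whole-word matching matters here: without the `\b` boundaries, "budget" would count as a hit for "but".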

Stage 1: Extract (LLM, per-source)

For each candidate source record:

LLM analyzes full content with context about learning types
Writes JSON manifest of discovered learnings to temp file
Each learning includes type, title, confidence, status, claim, evidence_excerpt, source_links, entity_links
3-attempt retry logic handles manifest parsing failures

Confidence and status are calibrated by signal type:

Explicit statements ("We decided to...") → high confidence, active status
Implied or inferred learnings → low confidence, tentative status
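
The retry behaviour can be sketched as below. This assumes the manifest is a JSON list of objects carrying the keys listed above; `run_agent` is a hypothetical stand-in for the LLM call that produces the manifest text, and the real pipeline reads the manifest from a temp file rather than from a return value.

```python
import json
from typing import Callable

# Keys every learning is documented to include.
REQUIRED_KEYS = {"type", "title", "confidence", "status", "claim",
                 "evidence_excerpt", "source_links", "entity_links"}


def parse_manifest(raw: str) -> list[dict]:
    """Parse a manifest and check that every learning has the documented keys."""
    learnings = json.loads(raw)
    if not isinstance(learnings, list):
        raise ValueError("manifest must be a JSON list (assumed layout)")
    for item in learnings:
        if not isinstance(item, dict):
            raise ValueError("each learning must be a JSON object")
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"learning missing keys: {sorted(missing)}")
    return learnings


def extract_with_retry(run_agent: Callable[[], str], attempts: int = 3) -> list[dict]:
    """Call run_agent up to `attempts` times until its output parses."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_manifest(run_agent())
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
    raise RuntimeError(f"manifest unparseable after {attempts} attempts") from last_error
```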

Stage 2: Dedup + Merge (Pure Python)

After extraction across all candidates:

Fuzzy title matching identifies duplicate learnings
Merges duplicates, preserving all source links
Tracks which sources contributed to each learning
Reports candidate count, merged count, and final deduplicated count
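
A minimal sketch of the merge step, using the standard library's `difflib.SequenceMatcher` as the fuzzy matcher. The matcher choice, the 0.85 threshold, and the rule that only learnings of the same type are merged are all assumptions for illustration; the actual threshold lives in Distiller's code.

```python
from difflib import SequenceMatcher


def titles_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Assumed similarity test on lowercased titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def dedup_learnings(learnings: list[dict]) -> list[dict]:
    """Merge learnings with near-identical titles, keeping every source link."""
    merged: list[dict] = []
    for learning in learnings:
        for kept in merged:
            if kept["type"] == learning["type"] and titles_match(kept["title"], learning["title"]):
                # Duplicate: fold its sources into the record we already kept.
                for link in learning["source_links"]:
                    if link not in kept["source_links"]:
                        kept["source_links"].append(link)
                break
        else:
            # No match found: keep a copy so the input list is not mutated.
            merged.append({**learning, "source_links": list(learning["source_links"])})
    return merged


extracted = [
    {"type": "decision", "title": "Use REST over GraphQL for Acme API",
     "source_links": ["conversation/architecture-debate"]},
    {"type": "decision", "title": "Use REST over GraphQL for the Acme API",
     "source_links": ["project/acme-api-design"]},
    {"type": "constraint", "title": "Budget capped at $50k for Phase 1",
     "source_links": ["session/budget-planning"]},
]
unique = dedup_learnings(extracted)
print(f"{len(extracted)} extracted, {len(extracted) - len(unique)} merged, {len(unique)} unique")
```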

Stage 3: Create Records (LLM, per-learning)

For each deduplicated learning:

Generates well-formed Markdown with YAML frontmatter
Creates record via alfred vault create command
Includes proper source links, entity links, and evidence sections
Follows vault schema conventions for learning types
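
To make the output shape concrete, here is a sketch that renders a learning as Markdown with YAML frontmatter. The frontmatter keys and the "Claim"/"Evidence" section layout are assumptions, not the vault schema, and the sketch stops before the `alfred vault create` call because its arguments are not documented on this page.

```python
import json


def render_learning(learning: dict) -> str:
    """Render a learning as Markdown with YAML frontmatter (layout is assumed)."""
    lines = ["---"]
    for key in ("type", "title", "confidence", "status"):
        # JSON strings are valid YAML double-quoted scalars, so json.dumps
        # gives safe quoting without needing a YAML library.
        lines.append(f"{key}: {json.dumps(learning[key])}")
    for key in ("source_links", "entity_links"):
        if learning[key]:
            lines.append(f"{key}:")
            lines.extend(f"  - {json.dumps(link)}" for link in learning[key])
        else:
            lines.append(f"{key}: []")
    lines += ["---", "", "## Claim", "", learning["claim"], "",
              "## Evidence", "", "> " + learning["evidence_excerpt"], ""]
    return "\n".join(lines)


doc = render_learning({
    "type": "decision",
    "title": "Use REST over GraphQL",
    "confidence": "high",
    "status": "active",
    "claim": "Use REST over GraphQL for Acme API due to team familiarity",
    "evidence_excerpt": "We decided to go with REST since the team knows it.",
    "source_links": ["conversation/architecture-debate"],
    "entity_links": [],
})
print(doc)
```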

Pass B: Cross-Learning Meta-Analysis

Pass B analyzes the complete learning graph to discover higher-order insights.

Meta-Analysis Capabilities

Contradiction Detection: Scans decisions and assumptions for conflicting claims. Creates contradiction records linking the conflicting learnings with analysis of the tension.

Shared Assumption Analysis: Identifies assumptions referenced across multiple projects or teams. Surfaces implicit dependencies and coordination risks.

Pattern Synthesis: Uses semantic clustering to group related learnings. Creates synthesis records that articulate patterns emerging across the evidence graph.

Temporal Analysis: Tracks how decisions evolve over time. Identifies superseded decisions and validates whether assumptions held true.

Clustering Method

Pass B uses semantic embeddings to cluster learnings by conceptual similarity rather than keyword matching. This reveals non-obvious connections between learnings from different domains.
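
The idea of grouping by similarity rather than by shared keywords can be shown with a toy example. This sketch uses greedy threshold clustering over cosine similarity, which is a deliberately simple stand-in and not necessarily the algorithm Pass B uses; the three-dimensional vectors and record names are made up in place of real embedding output.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(embeddings: dict[str, list[float]], threshold: float = 0.8) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters: list[list[str]] = []
    for name, vector in embeddings.items():
        for members in clusters:
            if cosine(embeddings[members[0]], vector) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Toy vectors standing in for real embeddings (hypothetical record names).
toy = {
    "decision/event-bus-for-orders": [0.9, 0.1, 0.0],
    "decision/queue-for-billing": [0.8, 0.2, 0.1],
    "constraint/phase-1-budget": [0.0, 0.1, 0.9],
}
groups = cluster(toy)
print(groups)
```

The two decisions share no title keywords, yet they land in the same group because their vectors point in nearly the same direction.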

Configuration

Distiller is configured in the distiller section of config.yaml:

distiller:
  enabled: true
  interval: 300                    # Light scan interval (seconds)
  deep_interval_hours: 24          # Deep extraction interval (hours)
  min_signal_score: 3              # Minimum score for candidate processing
  batch_size: 10                   # Max candidates per extraction run
  pass_b_enabled: true             # Enable meta-analysis
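
For readers wiring this section into their own tooling, the keys could be validated along these lines. This is a sketch only: the `DistillerConfig` class is hypothetical, its defaults simply copy the example values above (Alfred's real defaults may differ), and it takes an already-parsed dict rather than reading config.yaml itself.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DistillerConfig:
    enabled: bool = True
    interval: int = 300             # light scan interval, seconds
    deep_interval_hours: int = 24   # deep extraction interval, hours
    min_signal_score: int = 3
    batch_size: int = 10
    pass_b_enabled: bool = True

    @classmethod
    def from_dict(cls, section: dict) -> "DistillerConfig":
        """Build a config from the parsed `distiller` section, rejecting typos."""
        known = {f.name for f in fields(cls)}
        unknown = set(section) - known
        if unknown:
            raise ValueError(f"unknown distiller keys: {sorted(unknown)}")
        config = cls(**section)
        if config.interval <= 0 or config.deep_interval_hours <= 0 or config.batch_size <= 0:
            raise ValueError("intervals and batch_size must be positive")
        return config


low_activity = DistillerConfig.from_dict({"interval": 1800, "deep_interval_hours": 48})
```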

Agent Backend

Distiller uses the same agent backend configuration as other Alfred tools (agent.backend in config.yaml). It supports the Claude Code, Zo Computer (HTTP), and OpenClaw backends.

CLI Commands

Scan for Candidates

alfred distiller scan

Performs keyword-based scanning to identify records containing extraction signals. Reports candidate count and score distribution without performing extraction.

Run Extraction

alfred distiller run

Executes full extraction pipeline:

Scans for candidates
Extracts learnings from candidates
Deduplicates and merges
Creates vault records
Optionally runs Pass B meta-analysis

Watch Mode (Daemon)

alfred distiller watch

Runs periodic extraction in foreground:

Light scans every interval seconds
Deep extraction every deep_interval_hours hours
Continues until interrupted
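
The two cadences can be thought of as a pair of independent timers. The helper below is a hypothetical sketch of that decision, assuming timestamps are in seconds; it is not taken from the watch loop's source.

```python
def due_tasks(now: float, last_scan: float, last_deep_run: float,
              interval: int = 300, deep_interval_hours: int = 24) -> list[str]:
    """Return which jobs are due, given timestamps in seconds."""
    tasks = []
    if now - last_scan >= interval:
        tasks.append("light_scan")
    if now - last_deep_run >= deep_interval_hours * 3600:
        tasks.append("deep_extraction")
    return tasks


# 400 s since the last scan, under 24 h since the last deep run.
print(due_tasks(now=1000, last_scan=600, last_deep_run=0))
```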

Background Daemon

alfred up --only distiller

Starts Distiller as a background daemon with auto-restart. Use alfred down to stop.

Check Status

alfred status

Shows Distiller daemon status, last extraction time, and learning record counts.

State Tracking

Distiller maintains state in data/distiller_state.json:

{
  "processed_sources": {
    "conversation/weekly-sync-2024-01-15": "abc123hash",
    "session/project-kickoff": "def456hash"
  },
  "last_scan": "2024-01-20T10:30:00Z",
  "last_deep_run": "2024-01-20T08:00:00Z",
  "extraction_history": [...]
}

Source records are tracked by content hash. When a source is modified, it becomes eligible for re-extraction.
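
The re-extraction check amounts to comparing a stored hash with a fresh one. The sketch below assumes SHA-256 over the record text; the hash function Distiller actually uses is not stated here, and the function names are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash record content (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_extraction(record_id: str, text: str, processed_sources: dict[str, str]) -> bool:
    """A record is eligible if it is new or its content hash has changed."""
    return processed_sources.get(record_id) != content_hash(text)


state = {"session/project-kickoff": content_hash("Kickoff notes v1")}
print(needs_extraction("session/project-kickoff", "Kickoff notes v1", state))  # unchanged
print(needs_extraction("session/project-kickoff", "Kickoff notes v2", state))  # modified
```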

Vault Scope

Distiller operates under the distiller scope defined in vault/scope.py:

Allowed Operations:

Create learning records (assumption, decision, constraint, contradiction, synthesis)
Read any vault record for context
Edit existing learning records to add sources or update status

Prohibited Operations:

Create non-learning records
Delete any records
Move or rename records

This scope ensures Distiller can build the learning graph without affecting operational records.
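
The rules above reduce to a small permission check. The function below is an illustrative sketch, not the contents of vault/scope.py; the operation names are assumptions, and it treats edits to non-learning records as prohibited, which the lists above imply but do not state outright.

```python
LEARNING_TYPES = frozenset(
    {"assumption", "decision", "constraint", "contradiction", "synthesis"}
)


def is_allowed(operation: str, record_type: str) -> bool:
    """Hypothetical scope check mirroring the allowed/prohibited lists."""
    if operation == "read":
        return True  # any vault record may be read for context
    if operation in {"create", "edit"}:
        return record_type in LEARNING_TYPES
    return False  # delete, move, rename, and anything unrecognized


print(is_allowed("create", "decision"))
print(is_allowed("delete", "decision"))
```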

Workflow Example

Initial Extraction

# Scan vault for extraction candidates
alfred distiller scan

# Output:
# Found 42 candidates across 120 vault records
# Top candidates:
#   - conversation/architecture-debate (score: 8.5)
#   - session/budget-planning (score: 7.2)
#   - project/acme-api-design (score: 6.8)

# Run extraction
alfred distiller run

# Output:
# Stage 1: Extracted 23 learnings from 15 sources
# Stage 2: Merged 5 duplicates → 18 unique learnings
# Stage 3: Created 18 learning records
# Pass B: Identified 2 contradictions, created 1 synthesis

Continuous Operation

# Start as background daemon
alfred up --only distiller

# Check status
alfred status

# Output:
# Distiller: running (PID 12345)
#   Last scan: 2 minutes ago
#   Last deep extraction: 6 hours ago
#   Learning records: 127 total (45 decisions, 38 assumptions, ...)

Integration with Other Tools

With Curator

Curator creates operational records that become extraction sources for Distiller. As new conversations, sessions, and observations flow into the vault, Distiller automatically processes them for latent knowledge.

With Janitor

Janitor ensures learning records maintain proper links and frontmatter. If source records are moved or renamed, Janitor updates the references in learning records.

With Surveyor

Surveyor's semantic clustering complements Distiller's Pass B meta-analysis. Surveyor can identify conceptually similar learnings across the vault and suggest relationship links that Distiller can analyze for contradictions or synthesis opportunities.

Best Practices

Signal Quality

Configure min_signal_score based on vault size and signal quality:

Small vaults (< 500 records): score 2-3 catches most candidates
Large vaults (> 1000 records): score 4-5 focuses on high-confidence signals
Noisy vaults: score 6+ for precision over recall

Extraction Frequency

Balance extraction frequency against vault activity:

High-activity vaults: interval: 300 (5 minutes), deep_interval_hours: 12
Low-activity vaults: interval: 1800 (30 minutes), deep_interval_hours: 48
Ad-hoc extraction: Disable daemon, run alfred distiller run manually

Source Record Quality

Distiller works best on narrative content with explicit reasoning:

Meeting notes with decision rationale
Project retrospectives
Architecture discussions
Planning documents with constraints

Short, factual records (contacts, tasks) typically yield few learnings.

Learning Record Maintenance

Review and refine extracted learnings periodically:

Update status field when assumptions are validated or invalidated
Link related learnings to build evidence chains
Add entity links to connect learnings to relevant projects/people
Mark superseded decisions to maintain decision history

Troubleshooting

No Candidates Found

Symptom: alfred distiller scan reports 0 candidates

Solutions:

Lower min_signal_score threshold
Check that vault contains narrative content (not just structured entities)
Review data/distiller_state.json — already-processed sources won't re-appear
Manually trigger re-extraction by removing entries from processed_sources

Extraction Failures

Symptom: Stage 1 or Stage 3 consistently fails

Solutions:

Check data/distiller.log for LLM errors
Verify agent backend is configured correctly
Reduce batch_size to avoid rate limits
Check that vault CLAUDE.md is in agent workspace (OpenClaw backend)

Duplicate Learnings

Symptom: Similar learnings created with slightly different titles

Solutions:

Stage 2 dedup uses fuzzy matching — very similar titles should merge
Review merge threshold in code if needed
Manually merge duplicates in vault and link to all sources

Performance Issues

Symptom: Extraction takes too long or times out

Solutions:

Reduce batch_size to process fewer candidates per run
Increase interval to run less frequently
Use faster backend (OpenClaw is typically faster than HTTP for serial processing)
Consider extracting from specific sources manually rather than full scans

Architecture Notes

Agent-Writes-Directly Pattern

Distiller uses Alfred's agent-writes-directly pattern: the LLM agent receives vault context and creates learning records via alfred vault create commands. Changes are tracked through the mutation log (vault/mutation_log.py).

Backend Independence

Distiller works with all three agent backends (Claude Code, Zo Computer, OpenClaw). The prompt builder (backends/__init__.py) handles backend-specific formatting, but the extraction pipeline is backend-agnostic.

State Management

State files are bookkeeping only — the vault is the source of truth. You can safely delete data/distiller_state.json to force re-processing of all sources.

See Also:

Curator — Processes inbox inputs into vault records
Janitor — Maintains vault structural integrity
Surveyor — Semantic clustering and relationship discovery
Vault Schema — Complete record type reference
