
Distiller

ssd edited this page Feb 24, 2026 · 1 revision


Distiller is one of four AI-powered tools in Alfred. It extracts latent knowledge from operational records and transforms it into a structured evidence graph. It reads conversations, session logs, project notes, and other vault content to identify and capture insights that would otherwise remain buried in narrative text.

Overview

Distiller operates on the principle that valuable knowledge often exists implicitly in operational records. Team assumptions, critical decisions, resource constraints, and contradictory information are frequently mentioned in passing but never formalized. Distiller surfaces these insights and creates dedicated learning records that form a queryable knowledge graph.

What Distiller Does

  • Scans vault records for signals of latent knowledge
  • Extracts assumptions, decisions, constraints, contradictions, and syntheses
  • Creates structured learning records with evidence links back to sources
  • Performs cross-learning meta-analysis to detect patterns and conflicts
  • Builds an evidence graph connecting learnings to their source records

Two-Pass Pipeline

Pass A: Per-Source Extraction. Processes individual source records to extract learnings embedded in their content.

Pass B: Cross-Learning Meta-Analysis. Analyzes the complete learning graph to identify higher-order patterns, contradictions between decisions, shared assumptions across projects, and opportunities for synthesis.

Learning Types

Distiller identifies five types of latent knowledge:

| Type | What it Captures | Example |
|---|---|---|
| assumption | Beliefs the team operates on without explicit validation | "Timber prices will stay stable through Q2" |
| decision | Choices made with rationale and context | "Use REST over GraphQL for Acme API due to team familiarity" |
| constraint | Hard limits or boundaries identified during work | "Budget capped at $50k for Phase 1" |
| contradiction | Conflicting information across different sources | "Decision A recommends microservices but Decision B advocates for monolith" |
| synthesis | Patterns and connections across multiple observations | "Three separate projects converge on event-driven architecture" |

Each learning record includes:

  • Confidence level (high/medium/low based on signal explicitness)
  • Status (active, superseded, invalidated)
  • Claim statement
  • Evidence excerpt from source
  • Links to source records
  • Links to related entities (projects, people, organizations)

Pass A: Per-Source Extraction

Pass A consists of four stages that transform raw vault content into structured learning records.

Stage 0: Candidate Scanning (Pure Python)

Scans vault records for keyword signals indicating latent knowledge. Uses pattern matching to detect:

  • Decision signals: "decided", "chose", "selected", "going with"
  • Assumption signals: "assuming", "expect", "believe", "probably"
  • Constraint signals: "limited to", "must", "cannot", "blocked by"
  • Contradiction signals: "but", "however", "although", "conflict"
  • Synthesis signals: "pattern", "trend", "consistently", "across"

Scores each candidate by signal density and recency. Only candidates exceeding min_signal_score are processed.
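As a rough illustration of Stage 0, the sketch below counts the signal phrases listed above and combines them with a recency bonus. The page does not give the actual scoring formula, so `count_signals`, `score_candidate`, the one-point bonus, and the seven-day window are all assumptions made for this example.

```python
import re
from datetime import datetime, timezone

# Signal phrases copied from the lists above, grouped by learning type.
SIGNALS = {
    "decision": ["decided", "chose", "selected", "going with"],
    "assumption": ["assuming", "expect", "believe", "probably"],
    "constraint": ["limited to", "must", "cannot", "blocked by"],
    "contradiction": ["but", "however", "although", "conflict"],
    "synthesis": ["pattern", "trend", "consistently", "across"],
}


def count_signals(text: str) -> dict[str, int]:
    """Count whole-word, case-insensitive signal hits per learning type."""
    counts = {}
    for kind, phrases in SIGNALS.items():
        pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
        counts[kind] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def score_candidate(text: str, modified: datetime, now: datetime) -> float:
    """Hypothetical score: total hits, plus 1 if the record is under 7 days old."""
    hits = sum(count_signals(text).values())
    recency_bonus = 1.0 if (now - modified).days < 7 else 0.0
    # Records with no signals never become candidates, however recent.
    return hits + recency_bonus if hits else 0.0


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    note = "We decided on REST, but assuming the budget is limited to $50k."
    print(count_signals(note), score_candidate(note, now, now))
```

A caller would then compare the returned score with `min_signal_score` to decide whether the record moves on to Stage 1.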

Stage 1: Extract (LLM, per-source)

For each candidate source record:

  1. LLM analyzes full content with context about learning types
  2. Writes JSON manifest of discovered learnings to temp file
  3. Each learning includes type, title, confidence, status, claim, evidence_excerpt, source_links, entity_links
  4. 3-attempt retry logic handles manifest parsing failures

Confidence and status are calibrated by signal type:

  • Explicit statements ("We decided to...") → high confidence, active status
  • Implied or inferred learnings → low confidence, tentative status
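The manifest handling in Stage 1 could look roughly like the sketch below. It assumes the manifest is a JSON array of objects carrying the fields listed in step 3; that shape, and the names `parse_manifest`, `extract_with_retry`, and `run_agent`, are hypothetical and not taken from Alfred's code.

```python
import json
from typing import Callable

# Field names taken from step 3 above.
REQUIRED_FIELDS = {
    "type", "title", "confidence", "status",
    "claim", "evidence_excerpt", "source_links", "entity_links",
}


def parse_manifest(raw: str) -> list[dict]:
    """Parse a manifest and check that every learning carries the expected fields."""
    learnings = json.loads(raw)
    if not isinstance(learnings, list):
        raise ValueError("manifest must be a JSON array of learnings")
    for item in learnings:
        if not isinstance(item, dict):
            raise ValueError("each learning must be a JSON object")
        missing = REQUIRED_FIELDS - set(item)
        if missing:
            raise ValueError(f"learning is missing fields: {sorted(missing)}")
    return learnings


def extract_with_retry(run_agent: Callable[[], str], attempts: int = 3) -> list[dict]:
    """Call the (hypothetical) agent until its manifest parses, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return parse_manifest(run_agent())
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
    raise RuntimeError(f"manifest unparseable after {attempts} attempts") from last_error
```

Here `run_agent` stands in for whatever invokes the configured backend and returns the manifest text; validating fields inside the retry loop means a structurally incomplete manifest is retried just like malformed JSON.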

Stage 2: Dedup + Merge (Pure Python)

After extraction across all candidates:

  1. Fuzzy title matching identifies duplicate learnings
  2. Merges duplicates, preserving all source links
  3. Tracks which sources contributed to each learning
  4. Reports candidate count, merged count, and final deduplicated count
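One way to picture the dedup step is with the standard library's `difflib.SequenceMatcher`. This is a minimal sketch, not Distiller's implementation: the 0.85 threshold, the rule that only learnings of the same type are merged, and the function names are assumptions.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy title comparison on lowercased, stripped titles."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def dedup_learnings(learnings: list[dict], threshold: float = 0.85) -> list[dict]:
    """Merge learnings with near-identical titles, keeping every source link."""
    merged: list[dict] = []
    for learning in learnings:
        for kept in merged:
            same_type = kept["type"] == learning["type"]
            if same_type and similar(kept["title"], learning["title"], threshold):
                # Duplicate: fold its sources into the record we already keep.
                for link in learning["source_links"]:
                    if link not in kept["source_links"]:
                        kept["source_links"].append(link)
                break
        else:
            # No match found: keep a copy so the caller's input is not mutated.
            merged.append({**learning, "source_links": list(learning["source_links"])})
    return merged
```

The difference between `len(learnings)` and `len(merged)` gives the merged count that step 4 reports.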

Stage 3: Create Records (LLM, per-learning)

For each deduplicated learning:

  1. Generates well-formed Markdown with YAML frontmatter
  2. Creates record via alfred vault create command
  3. Includes proper source links, entity links, and evidence sections
  4. Follows vault schema conventions for learning types
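To make step 1 concrete, the sketch below renders one learning as Markdown with YAML frontmatter. The frontmatter keys mirror the manifest fields from Stage 1, but the section headings and overall layout are illustrative guesses; the real layout is whatever the vault schema defines, and in Distiller the record is written through the alfred vault create command rather than by a helper like `render_learning`.

```python
import json


def render_learning(learning: dict) -> str:
    """Render one learning as Markdown with YAML frontmatter (illustrative layout)."""
    front = ["---"]
    for key in ("type", "title", "confidence", "status", "source_links", "entity_links"):
        # JSON strings and arrays are also valid YAML flow syntax,
        # so json.dumps gives safely quoted values without a YAML library.
        front.append(f"{key}: {json.dumps(learning[key])}")
    front.append("---")
    body = [
        "",
        "## Claim",
        "",
        learning["claim"],
        "",
        "## Evidence",
        "",
        "> " + learning["evidence_excerpt"],
        "",
    ]
    return "\n".join(front + body)


if __name__ == "__main__":
    print(render_learning({
        "type": "decision",
        "title": "Use REST over GraphQL for Acme API",
        "confidence": "high",
        "status": "active",
        "claim": "The Acme API will use REST because the team knows it well.",
        "evidence_excerpt": "We decided to use REST.",
        "source_links": ["conversation/architecture-debate"],
        "entity_links": ["project/acme-api-design"],
    }))
```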

Pass B: Cross-Learning Meta-Analysis

Pass B analyzes the complete learning graph to discover higher-order insights.

Meta-Analysis Capabilities

Contradiction Detection: Scans decisions and assumptions for conflicting claims. Creates contradiction records linking the conflicting learnings with analysis of the tension.

Shared Assumption Analysis: Identifies assumptions referenced across multiple projects or teams. Surfaces implicit dependencies and coordination risks.

Pattern Synthesis: Uses semantic clustering to group related learnings. Creates synthesis records that articulate patterns emerging across the evidence graph.

Temporal Analysis: Tracks how decisions evolve over time. Identifies superseded decisions and validates whether assumptions held true.

Clustering Method

Pass B uses semantic embeddings to cluster learnings by conceptual similarity rather than keyword matching. This reveals non-obvious connections between learnings from different domains.
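The page does not say which embedding model or clustering algorithm Pass B uses. Purely to illustrate the idea of grouping by vector similarity instead of shared keywords, here is a small greedy clustering sketch over precomputed embeddings; the toy three-dimensional vectors, the 0.8 threshold, and the function names are all assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(embeddings: dict[str, list[float]], threshold: float = 0.8) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters: list[list[str]] = []
    for title, vec in embeddings.items():
        for group in clusters:
            seed = embeddings[group[0]]  # first member represents the cluster
            if cosine(vec, seed) >= threshold:
                group.append(title)
                break
        else:
            clusters.append([title])
    return clusters


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings of learning claims.
    toy = {
        "Adopt event-driven messaging": [1.0, 0.1, 0.0],
        "Move billing to an event bus": [0.9, 0.2, 0.0],
        "Budget capped for Phase 1": [0.0, 0.1, 1.0],
    }
    print(cluster(toy))
```

The first two titles share no keywords yet land in the same cluster because their vectors point in nearly the same direction, which is the kind of non-obvious connection described above.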

Configuration

Distiller is configured in the distiller section of config.yaml:

distiller:
  enabled: true
  interval: 300                    # Light scan interval (seconds)
  deep_interval_hours: 24          # Deep extraction interval (hours)
  min_signal_score: 3              # Minimum score for candidate processing
  batch_size: 10                   # Max candidates per extraction run
  pass_b_enabled: true             # Enable meta-analysis

Agent Backend

Distiller uses the same agent backend configuration as other Alfred tools (agent.backend in config.yaml). It supports the Claude Code, Zo Computer (HTTP), and OpenClaw backends.

CLI Commands

Scan for Candidates

alfred distiller scan

Performs keyword-based scanning to identify records containing extraction signals. Reports candidate count and score distribution without performing extraction.

Run Extraction

alfred distiller run

Executes full extraction pipeline:

  1. Scans for candidates
  2. Extracts learnings from candidates
  3. Deduplicates and merges
  4. Creates vault records
  5. Optionally runs Pass B meta-analysis

Watch Mode (Daemon)

alfred distiller watch

Runs periodic extraction in the foreground:

  • Light scans every interval seconds
  • Deep extraction every deep_interval_hours hours
  • Continues until interrupted
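The interplay between the two intervals can be sketched as a small decision function. This is a hypothetical model of the scheduling, not the daemon's actual loop: `next_action` and the choice to let a due deep extraction take priority over a light scan are assumptions.

```python
def next_action(
    now: float,
    last_scan: float,
    last_deep: float,
    interval: int = 300,
    deep_interval_hours: int = 24,
) -> str:
    """Decide what a watch loop should do next; all times are in seconds."""
    if now - last_deep >= deep_interval_hours * 3600:
        return "deep"   # full extraction is due
    if now - last_scan >= interval:
        return "scan"   # light keyword scan is due
    return "sleep"      # nothing due yet


if __name__ == "__main__":
    import time

    started = time.time()
    print(next_action(now=started + 400, last_scan=started, last_deep=started))
```

The defaults match the `interval` and `deep_interval_hours` values shown in the Configuration section.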

Background Daemon

alfred up --only distiller

Starts Distiller as a background daemon with auto-restart. Use alfred down to stop.

Check Status

alfred status

Shows Distiller daemon status, last extraction time, and learning record counts.

State Tracking

Distiller maintains state in data/distiller_state.json:

{
  "processed_sources": {
    "conversation/weekly-sync-2024-01-15": "abc123hash",
    "session/project-kickoff": "def456hash"
  },
  "last_scan": "2024-01-20T10:30:00Z",
  "last_deep_run": "2024-01-20T08:00:00Z",
  "extraction_history": [...]
}

Source records are tracked by content hash. When a source is modified, it becomes eligible for re-extraction.
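A minimal sketch of this hash check follows, assuming SHA-256 over the record text. The page does not name the hash algorithm or say exactly what is hashed, and the helper names are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash a record's content (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_extraction(record_id: str, text: str, state: dict) -> bool:
    """True if the record is new or its content changed since it was processed."""
    previous = state.get("processed_sources", {}).get(record_id)
    return previous != content_hash(text)


def mark_processed(record_id: str, text: str, state: dict) -> None:
    """Record the current content hash so an unchanged source is skipped next time."""
    state.setdefault("processed_sources", {})[record_id] = content_hash(text)


if __name__ == "__main__":
    state = {"processed_sources": {}}
    note = "We decided to use REST."
    print(needs_extraction("session/project-kickoff", note, state))
    mark_processed("session/project-kickoff", note, state)
    print(needs_extraction("session/project-kickoff", note, state))
    print(needs_extraction("session/project-kickoff", note + " Revisit in Q2.", state))
```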

Vault Scope

Distiller operates under the distiller scope defined in vault/scope.py:

Allowed Operations:

  • Create learning records (assumption, decision, constraint, contradiction, synthesis)
  • Read any vault record for context
  • Edit existing learning records to add sources or update status

Prohibited Operations:

  • Create non-learning records
  • Delete any records
  • Move or rename records

This scope ensures Distiller can build the learning graph without affecting operational records.
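The rules above amount to a small permission check. The sketch below restates them in code for clarity; it is not the contents of vault/scope.py, and `is_allowed` and the operation names are hypothetical.

```python
# The five learning record types Distiller may create or edit.
LEARNING_TYPES = frozenset(
    {"assumption", "decision", "constraint", "contradiction", "synthesis"}
)


def is_allowed(operation: str, record_type: str) -> bool:
    """Return True if the distiller scope, as described above, permits the operation."""
    if operation == "read":
        return True  # any vault record may be read for context
    if operation in ("create", "edit"):
        return record_type in LEARNING_TYPES  # learning records only
    return False  # delete, move, rename, and anything unrecognized


if __name__ == "__main__":
    print(is_allowed("create", "decision"))      # a learning record
    print(is_allowed("create", "conversation"))  # an operational record
    print(is_allowed("delete", "decision"))
```

Denying unknown operations by default keeps the check safe if new operation names are introduced later.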

Workflow Example

Initial Extraction

# Scan vault for extraction candidates
alfred distiller scan

# Output:
# Found 42 candidates across 120 vault records
# Top candidates:
#   - conversation/architecture-debate (score: 8.5)
#   - session/budget-planning (score: 7.2)
#   - project/acme-api-design (score: 6.8)

# Run extraction
alfred distiller run

# Output:
# Stage 1: Extracted 23 learnings from 15 sources
# Stage 2: Merged 5 duplicates → 18 unique learnings
# Stage 3: Created 18 learning records
# Pass B: Identified 2 contradictions, created 1 synthesis

Continuous Operation

# Start as background daemon
alfred up --only distiller

# Check status
alfred status

# Output:
# Distiller: running (PID 12345)
#   Last scan: 2 minutes ago
#   Last deep extraction: 6 hours ago
#   Learning records: 127 total (45 decisions, 38 assumptions, ...)

Integration with Other Tools

With Curator

Curator creates operational records that become extraction sources for Distiller. As new conversations, sessions, and observations flow into the vault, Distiller automatically processes them for latent knowledge.

With Janitor

Janitor ensures learning records maintain proper links and frontmatter. If source records are moved or renamed, Janitor updates the references in learning records.

With Surveyor

Surveyor's semantic clustering complements Distiller's Pass B meta-analysis. Surveyor can identify conceptually similar learnings across the vault and suggest relationship links that Distiller can analyze for contradictions or synthesis opportunities.

Best Practices

Signal Quality

Configure min_signal_score based on vault size and signal quality:

  • Small vaults (< 500 records): score 2-3 catches most candidates
  • Large vaults (> 1000 records): score 4-5 focuses on high-confidence signals
  • Noisy vaults: score 6+ for precision over recall

Extraction Frequency

Balance extraction frequency against vault activity:

  • High-activity vaults: interval: 300 (5 minutes), deep_interval_hours: 12
  • Low-activity vaults: interval: 1800 (30 minutes), deep_interval_hours: 48
  • Ad-hoc extraction: Disable daemon, run alfred distiller run manually

Source Record Quality

Distiller works best on narrative content with explicit reasoning:

  • Meeting notes with decision rationale
  • Project retrospectives
  • Architecture discussions
  • Planning documents with constraints

Short, factual records (contacts, tasks) typically yield few learnings.

Learning Record Maintenance

Review and refine extracted learnings periodically:

  • Update status field when assumptions are validated or invalidated
  • Link related learnings to build evidence chains
  • Add entity links to connect learnings to relevant projects/people
  • Mark superseded decisions to maintain decision history

Troubleshooting

No Candidates Found

Symptom: alfred distiller scan reports 0 candidates

Solutions:

  • Lower min_signal_score threshold
  • Check that vault contains narrative content (not just structured entities)
  • Review data/distiller_state.json — already-processed sources won't re-appear
  • Manually trigger re-extraction by removing entries from processed_sources
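Removing entries by hand works, but a short script is less error-prone for more than a few records. The helper below is a hypothetical convenience, not part of the Alfred CLI; it assumes the state file has the shape shown under State Tracking.

```python
import json
from pathlib import Path


def forget_sources(state_path: Path, record_ids: list[str]) -> list[str]:
    """Drop record IDs from processed_sources; return the IDs actually removed."""
    state = json.loads(state_path.read_text(encoding="utf-8"))
    processed = state.get("processed_sources", {})
    removed = [rid for rid in record_ids if processed.pop(rid, None) is not None]
    # Write the file back with the remaining entries and all other keys intact.
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return removed


if __name__ == "__main__":
    path = Path("data/distiller_state.json")
    if path.exists():
        print(forget_sources(path, ["session/project-kickoff"]))
```

After the entries are removed, the next scan should treat those sources as unprocessed.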

Extraction Failures

Symptom: Stage 1 or Stage 3 consistently fails

Solutions:

  • Check data/distiller.log for LLM errors
  • Verify agent backend is configured correctly
  • Reduce batch_size to avoid rate limits
  • Check that vault CLAUDE.md is in agent workspace (OpenClaw backend)

Duplicate Learnings

Symptom: Similar learnings created with slightly different titles

Solutions:

  • Stage 2 dedup uses fuzzy matching — very similar titles should merge
  • Review merge threshold in code if needed
  • Manually merge duplicates in vault and link to all sources

Performance Issues

Symptom: Extraction takes too long or times out

Solutions:

  • Reduce batch_size to process fewer candidates per run
  • Increase interval to run less frequently
  • Use a faster backend (OpenClaw is typically faster than HTTP for serial processing)
  • Consider extracting from specific sources manually rather than full scans

Architecture Notes

Agent-Writes-Directly Pattern

Distiller uses Alfred's agent-writes-directly pattern: the LLM agent receives vault context and creates learning records via alfred vault create commands. Changes are tracked through the mutation log (vault/mutation_log.py).

Backend Independence

Distiller works with all three agent backends (Claude Code, Zo Computer, OpenClaw). The prompt builder (backends/__init__.py) handles backend-specific formatting, but the extraction pipeline is backend-agnostic.

State Management

State files are bookkeeping only — the vault is the source of truth. You can safely delete data/distiller_state.json to force re-processing of all sources.


See Also:

  • Curator — Processes inbox inputs into vault records
  • Janitor — Maintains vault structural integrity
  • Surveyor — Semantic clustering and relationship discovery
  • Vault Schema — Complete record type reference
