feat: two-pass hierarchical search with entity-guided retrieval

## Summary

Split search into two stages: first find the most relevant entities (people, projects, teams), then search within documents linked to those entities — blending entity relevance into document scores for dramatically better precision on structured queries.

```bash
# Automatic two-pass (when entity matches are confident)
kbx search "what did the platform team decide about migration?"

# Force flat search (bypass hierarchy)
kbx search "what did the platform team decide about migration?" --no-hierarchy
```

## Motivation

kbx currently does flat search across all 3000+ documents. Every document is treated equally regardless of its relationship to entities. This works well for keyword-specific queries but poorly for entity-centric queries like:

- "What has Person A been working on?" — should prioritise documents where Person A is an attendee or is mentioned heavily, not documents that happen to contain similar keywords
- "Latest updates on Project X" — should prioritise documents linked to that project entity, not unrelated documents that mention similar terms
- "What did the infrastructure team discuss last week?" — should scope to meetings with infra team members

**The problem**: Flat vector search treats "Person A discussed the migration" and "Person B discussed a different migration" as equally relevant to a query about Person A's work. Two-pass search uses the entity graph to filter first, then score within that filtered set.

**Inspiration**: OpenViking's `HierarchicalRetriever` uses a priority-queue BFS over its directory tree with score propagation (`final = 0.5 * child + 0.5 * parent`). Their convergence detection stops early when top-K stabilises. We adapt this concept to kbx's entity→document graph rather than a filesystem hierarchy.

## Design

### 1. Pass 1 — Entity Search

Use the existing entity index to find the most relevant entities for the query.

```python
def pass1_entity_search(query: str, limit: int = 5) -> list[EntityScore]:
    """
    Search entities (people, projects, teams) by name, alias, role, and facts.
    Returns top-K entities with relevance scores.
    """
    # FTS search across entity names, aliases, roles, facts
    fts_matches = entity_fts_search(query, limit=limit * 2)

    # Vector search across entity descriptions/facts
    vector_matches = entity_vector_search(query, limit=limit * 2)

    # Merge and deduplicate
    return merge_entity_scores(fts_matches, vector_matches, limit=limit)
```

**Entity scoring sources**:
- Name/alias exact or fuzzy match (highest signal)
- Role/team match
- Fact text match (e.g. entity fact "leads the migration project" matches query "migration")
- Vector similarity of entity description to query

**Output**: List of `(entity_id, entity_name, entity_type, score)` tuples, sorted by score descending.

**Confidence threshold**: Pass 1 results are only used if the top entity scores above a configurable threshold (e.g. 0.5). Below that, fall back to flat search — the query is probably not entity-centric.

### 2. Pass 2 — Entity-Constrained Document Search

Search documents linked to the top entities from Pass 1, blending entity relevance into document scores.

```python
def pass2_document_search(
    query: str,
    entity_scores: list[EntityScore],
    alpha: float = 0.5,
    limit: int = 10,
) -> list[SearchResult]:
    """
    Search within documents associated with top entities.
    Blend entity score into document score.
    """
    # Collect document IDs linked to top entities (via entity mention index)
    candidate_doc_ids = set()
    doc_entity_scores = {}  # doc_id -> best parent entity score

    for entity in entity_scores:
        linked_docs = get_documents_mentioning_entity(entity.id)
        for doc_id in linked_docs:
            candidate_doc_ids.add(doc_id)
            # Keep the best entity score per document
            doc_entity_scores[doc_id] = max(
                doc_entity_scores.get(doc_id, 0.0),
                entity.score,
            )

    # Run standard FTS+vector search, but constrained to candidate docs
    doc_results = search_within_documents(query, doc_ids=candidate_doc_ids)

    # Blend scores
    for result in doc_results:
        parent_score = doc_entity_scores.get(result.doc_id, 0.0)
        result.score = alpha * result.score + (1 - alpha) * parent_score

    return sorted(doc_results, key=lambda r: r.score, reverse=True)[:limit]
```

**Document→entity linkage** uses the existing entity mention index (`find_entity_mentions()`). A document is linked to an entity if:
- The entity is mentioned in the document text (existing linking)
- The entity is listed as an attendee (for meetings)
- The entity is referenced in frontmatter metadata

### 3. Score Propagation Formula

```python
final_score = alpha * doc_relevance_score + (1 - alpha) * parent_entity_score
```

| Alpha | Behaviour |
|-------|-----------|
| 1.0 | Pure document relevance (flat search) |
| 0.5 | Equal weight to document relevance and entity match (default) |
| 0.0 | Pure entity proximity (rank by how strongly linked to matched entity) |

**Default**: `alpha = 0.5` — a document that's moderately relevant but strongly linked to a matched entity ranks similarly to a highly relevant document with weak entity linkage.

**Edge case**: Documents linked to multiple matched entities get the **best** (max) entity score, not the sum. This prevents documents mentioning many entities from dominating purely through quantity.

Configurable in `kbx.toml`:
```toml
[search]
hierarchy_alpha = 0.5              # 0.0 to 1.0
hierarchy_entity_threshold = 0.5   # minimum entity score to trigger two-pass
hierarchy_max_entities = 5         # max entities from Pass 1
```

### 4. Fallback to Flat Search

Two-pass search is not always appropriate. Fall back to standard flat search when:

1. **No confident entities**: Pass 1 top score < threshold (default 0.5). The query is probably not entity-centric (e.g. "how do we handle deployment rollbacks?" — general topic, no specific entity).

2. **Too many entities**: If Pass 1 returns many entities with similar scores, the query is too broad for entity-guided filtering. Fall back if `top_score - 5th_score < 0.1`.

3. **Explicit opt-out**: `--no-hierarchy` flag.

4. **FTS-only mode**: `--fast` (FTS-only search) skips two-pass by default since entity vector search isn't available. Could still do entity FTS matching in a future iteration.

**Fallback is transparent**: Results include a `search_mode` field in `--explain` output indicating whether two-pass or flat search was used, and why.

### 5. Convergence Detection

For large entity-linked document sets, implement early stopping:

```python
MAX_CONVERGENCE_ROUNDS = 3

def search_with_convergence(query, candidate_docs, limit):
    """Process candidate docs in batches, stop when top-K stabilises."""
    top_k = []
    stable_rounds = 0

    for batch in chunk(candidate_docs, batch_size=50):
        new_results = score_batch(query, batch)
        prev_top_k = [r.doc_id for r in top_k[:limit]]

        top_k = merge_and_sort(top_k + new_results)[:limit * 2]
        curr_top_k = [r.doc_id for r in top_k[:limit]]

        if prev_top_k == curr_top_k:
            stable_rounds += 1
            if stable_rounds >= MAX_CONVERGENCE_ROUNDS:
                break  # Top-K is stable, stop early
        else:
            stable_rounds = 0

    return top_k[:limit]
```

This prevents scanning hundreds of entity-linked documents when the top results are already clear after the first few batches. Particularly valuable for entities that appear in 100+ documents.

### 6. Query Type Suitability

| Query type | Two-pass benefit | Example |
|------------|-----------------|---------|
| **Person queries** | **High** | "What has Person A been focused on?" |
| **Project queries** | **High** | "Latest updates on Project X" |
| **Team queries** | **High** | "What did the platform team discuss?" |
| **Topic + person** | **High** | "What does Person B think about migration?" |
| **Pure topic** | **Low** | "How do deployment rollbacks work?" |
| **Keyword lookup** | **None** | "error code 5032" |
| **Date-scoped** | **Low** | "What happened last Tuesday?" |

The confidence threshold in Pass 1 handles this automatically — pure topic queries won't match entities confidently, so they fall back to flat search.

### 7. CLI Flags

```bash
# Default: automatic (two-pass when entities match, flat otherwise)
kbx search "Person A migration status" --json

# Force flat search (bypass entity pass)
kbx search "Person A migration status" --no-hierarchy --json

# Tune alpha (more weight to entity match)
kbx search "Person A migration status" --hierarchy-alpha 0.7 --json

# Explain mode shows which pass was used
kbx search "Person A migration status" --explain --json
# → meta.search_mode: "two_pass"
# → meta.pass1_entities: [{name: "Person A", score: 0.92}]
# → per-result: explain.parent_entity_score: 0.92
```

### 8. Performance Implications

**Added latency from Pass 1**:
- Entity FTS search: ~5ms (fast, small index)
- Entity vector search: ~20ms (if embeddings exist for entities)
- Total Pass 1: ~25ms overhead

**Pass 2 can be faster than flat search**:
- Flat search scans all ~3000+ documents
- Two-pass scans only documents linked to top entities (typically 50-200)
- Net effect: often faster despite the extra pass

**Caching opportunities**:
- Entity search results are stable across queries in the same session. Cache Pass 1 results for queries with overlapping entity matches.
- Entity→document linkage is stable between index runs. Cache the `entity_id → [doc_ids]` mapping in memory.
- LRU cache on entity scores with ~60s TTL would eliminate redundant Pass 1 calls in interactive sessions.

**Worst case**: Query matches 5 entities, each linked to 200 documents → 1000 candidate docs. With convergence detection (batch size 50, 3 stable rounds), typically evaluates 150-200 before stopping.

## Integration with Other Features

- **L0 abstracts (#66)**: Pass 2 could score entity-linked documents using L0 abstracts first (cheap triage), then load full chunks only for top candidates. Two-pass + L0 is a powerful combination.
- **Hotness scoring (#67)**: Hotness blending applies after the two-pass score propagation: `final = (1 - hotness_weight) * two_pass_score + hotness_weight * hotness`
- **Search explain (#68)**: Explain mode should show Pass 1 entity matches, Pass 2 document constraint set, and the score propagation formula per result.

## Implementation Phases

1. **Phase 1 — Pass 1 entity search**: Build entity scoring (FTS + vector on entity names/facts/roles). Confidence threshold for triggering two-pass. `~2 days`
2. **Phase 2 — Pass 2 constrained search**: Filter document search to entity-linked docs. Score propagation with configurable alpha. Fallback logic. `~2 days`
3. **Phase 3 — Convergence detection**: Batch processing with early stopping when top-K stabilises. `~1 day`
4. **Phase 4 — CLI surface + caching**: `--no-hierarchy`, `--hierarchy-alpha` flags. Entity score caching. Integration with `--explain`. `~1 day`

## Open Questions

- [ ] Should Pass 1 use the existing `find_entity_mentions()` matching or a dedicated entity search index? The existing matching is string-based; a vector search over entity descriptions would catch semantic matches (e.g. "infrastructure issues" matching an entity whose role is "SRE Lead").
- [ ] Should two-pass be the default for all searches, or opt-in via `--hierarchy`? Default-on with automatic fallback seems right, but adds complexity.
- [ ] How should team entities work? A team query should expand to all team members' documents. Is this a recursive entity expansion (team → members → docs) or a single-level lookup?
- [ ] Should the entity→document linkage include a "strength" score (e.g. number of mentions) that feeds into the propagation formula? A document mentioning an entity once vs ten times likely has different relevance.

Query type	Two-pass benefit	Example
Person queries	High	"What has Person A been focused on?"
Project queries	High	"Latest updates on Project X"
Team queries	High	"What did the platform team discuss?"
Topic + person	High	"What does Person B think about migration?"
Pure topic	Low	"How do deployment rollbacks work?"
Keyword lookup	None	"error code 5032"
Date-scoped	Low	"What happened last Tuesday?"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: two-pass hierarchical search with entity-guided retrieval #69

Summary

Motivation

Design

1. Pass 1 — Entity Search

2. Pass 2 — Entity-Constrained Document Search

3. Score Propagation Formula

4. Fallback to Flat Search

5. Convergence Detection

6. Query Type Suitability

7. CLI Flags

8. Performance Implications

Integration with Other Features

Implementation Phases

Open Questions

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Alpha	Behaviour
1.0	Pure document relevance (flat search)
0.5	Equal weight to document relevance and entity match (default)
0.0	Pure entity proximity (rank by how strongly linked to matched entity)

feat: two-pass hierarchical search with entity-guided retrieval #69

Description

Summary

Motivation

Design

1. Pass 1 — Entity Search

2. Pass 2 — Entity-Constrained Document Search

3. Score Propagation Formula

4. Fallback to Flat Search

5. Convergence Detection

6. Query Type Suitability

7. CLI Flags

8. Performance Implications

Integration with Other Features

Implementation Phases

Open Questions

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions