Problem
The frecency activation formula (frequency × recency × spread × effectiveness) is dominated by the recency term. After just 24 hours without surfacing, recency = 1/(1+24) = 0.04, which crushes the entire product regardless of the other factors.
Examples:
- Memory surfaced 500 times, last surfaced 24h ago: ln(501) × 0.04 × spread × eff ≈ 0.25 × spread × eff
- Memory surfaced 500 times, last surfaced 1 week ago: ln(501) × 0.006 × spread × eff ≈ 0.037 × spread × eff
- Memory surfaced 2 times, last surfaced 1 hour ago: ln(3) × 0.5 × spread × eff ≈ 0.55 × spread × eff
A nearly new memory surfaced twice, most recently an hour ago, outscores a proven memory with 500 surfacings from yesterday. The frequency, spread, and effectiveness signals are effectively ignored.
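The numbers above can be reproduced with a small sketch of the current formula. The function name and signature are illustrative, not the actual code in internal/frecency/frecency.go:

```go
package main

import (
	"fmt"
	"math"
)

// activation mirrors the product form described above: log-scaled frequency
// × hyperbolic hourly recency decay × spread × effectiveness.
func activation(surfacings int, hoursSince, spread, eff float64) float64 {
	frequency := math.Log(float64(surfacings) + 1)
	recency := 1 / (1 + hoursSince)
	return frequency * recency * spread * eff
}

func main() {
	// spread and effectiveness held at 1 so only frequency/recency differ
	fmt.Printf("proven, 24h ago:  %.3f\n", activation(500, 24, 1, 1))
	fmt.Printf("proven, 1wk ago:  %.3f\n", activation(500, 168, 1, 1))
	fmt.Printf("fresh, 1h ago:    %.3f\n", activation(2, 1, 1, 1))
}
```

Running this shows the fresh memory beating the proven one by more than 2×, purely because of the recency term.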
Impact
In prompt/tool mode, CombinedScore = BM25 × (1 + activation). Since activation ≈ 0 for anything not surfaced in the last few hours, this is basically just BM25. Frecency re-ranking adds no value.
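To make the collapse concrete, here is the combined-score formula with made-up illustrative values; the function name is hypothetical:

```go
package main

import "fmt"

// combinedScore follows the prompt/tool-mode formula above:
// CombinedScore = BM25 × (1 + activation).
func combinedScore(bm25, activation float64) float64 {
	return bm25 * (1 + activation)
}

func main() {
	// With activation near zero, the multiplier is ~1 and the
	// combined score is indistinguishable from raw BM25.
	fmt.Printf("raw BM25: %.3f, combined: %.3f\n", 12.0, combinedScore(12.0, 0.002))
}
```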
Desired behavior
All four factors (frequency, recency, spread, effectiveness) should contribute meaningfully to the activation score. A memory with a long track record of being useful shouldn't be wiped out by a day of inactivity.
Possible approaches
- Use a slower decay function for recency (e.g., `1/(1 + daysSince/halfLife)` instead of `1/(1 + hoursSince)`)
- Use log-scaled recency to match the log-scaled frequency
- Weight the factors differently (e.g., geometric mean instead of product)
- Cap the recency penalty floor so proven memories never drop below some minimum activation
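Two of these approaches can be sketched together. Function names, the half-life parameter, and the equal weighting in the geometric mean are all assumptions for illustration, not a proposed final design:

```go
package main

import (
	"fmt"
	"math"
)

// slowRecency is the proposed day-based decay: 1/(1 + daysSince/halfLife).
// halfLifeDays is a hypothetical tuning parameter.
func slowRecency(hoursSince, halfLifeDays float64) float64 {
	return 1 / (1 + (hoursSince/24)/halfLifeDays)
}

// geoMeanActivation combines the four factors with an (equally weighted)
// geometric mean, so one near-zero factor dampens rather than erases the score.
func geoMeanActivation(freq, rec, spread, eff float64) float64 {
	return math.Pow(freq*rec*spread*eff, 0.25)
}

func main() {
	// With a 7-day half-life, a 24h gap keeps recency near 0.88 and a
	// full week only halves it, instead of crushing it to 0.006.
	fmt.Printf("recency 24h: %.3f, 1 week: %.3f\n", slowRecency(24, 7), slowRecency(168, 7))
	// Geometric mean of the proven-but-stale example (ln(501) × 0.04):
	fmt.Printf("geo-mean activation: %.2f\n", geoMeanActivation(math.Log(501), 0.04, 1, 1))
}
```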
Additionally: rethink the insufficient-data threshold
The maintenance system requires ≥5 surfacing events before classifying a memory into quadrants. Most memories never reach this threshold, so they escape cleanup entirely.
The threshold should measure confidence in the data, not just surfacing count. Multiple signals contribute to knowing enough about a memory to act on it:
- Surfacing count — how often the system tried to use it
- Feedback events — relevant/irrelevant/used/notused marks
- Age — a memory with 0 surfacings after 5 weeks is telling you something
A composite confidence metric (e.g., weighted sum of these) would let the maintenance system classify memories sooner, especially ones that are simply never matching any queries.
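A minimal sketch of such a composite metric, with entirely hypothetical weights and threshold (the real values would need tuning against actual memory data):

```go
package main

import "fmt"

// confidence is a hypothetical weighted sum of the three signals above.
// The weights (0.2 per surfacing, 0.4 per feedback event, 0.1 per week
// of age) are placeholders, not proposed values.
func confidence(surfacings, feedbackEvents int, ageWeeks float64) float64 {
	return 0.2*float64(surfacings) + 0.4*float64(feedbackEvents) + 0.1*ageWeeks
}

// classifiable replaces the flat "≥5 surfacings" gate with a confidence
// threshold; 0.5 is likewise a placeholder.
func classifiable(surfacings, feedbackEvents int, ageWeeks float64) bool {
	return confidence(surfacings, feedbackEvents, ageWeeks) >= 0.5
}

func main() {
	// A never-surfaced 5-week-old memory becomes classifiable from age
	// alone, so it no longer escapes cleanup forever.
	fmt.Println("0 surfacings, 5 weeks old:", classifiable(0, 0, 5))
	fmt.Println("1 surfacing, 1 week old:  ", classifiable(1, 0, 1))
}
```

The key property is that age and feedback can substitute for surfacing count, so memories that never match any query still accumulate enough confidence to be acted on.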
Files
- internal/frecency/frecency.go
- internal/maintain/maintain.go (insufficient data threshold)
- internal/review/review.go (quadrant classification gate)
🤖 Generated with Claude Code