Problem
The frecency activation formula (frequency × recency × spread × effectiveness) is dominated by the recency term. After just 24 hours without surfacing, recency = 1/(1+24) = 0.04, which crushes the entire product regardless of the other factors.
Examples:
- Memory surfaced 500 times, last surfaced 24h ago: ln(501) × 0.04 × spread × eff ≈ 0.25 × spread × eff
- Memory surfaced 500 times, last surfaced 1 week ago: ln(501) × 0.006 × spread × eff ≈ 0.037 × spread × eff
- Memory surfaced 2 times, last surfaced 1 hour ago: ln(3) × 0.5 × spread × eff ≈ 0.55 × spread × eff
A nearly new memory surfaced twice, most recently an hour ago, outscores a proven memory with 500 surfacings from yesterday. The frequency, spread, and effectiveness signals are effectively ignored.
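The numbers above can be reproduced with a small sketch of the current formula. The function name and signature are illustrative, not the actual code in internal/frecency/frecency.go:

```go
package main

import (
	"fmt"
	"math"
)

// activation mirrors the product form described above: log-scaled frequency
// × hyperbolic hourly recency decay × spread × effectiveness.
func activation(surfacings int, hoursSince, spread, eff float64) float64 {
	frequency := math.Log(float64(surfacings) + 1)
	recency := 1 / (1 + hoursSince)
	return frequency * recency * spread * eff
}

func main() {
	// spread and effectiveness held at 1 so only frequency/recency differ
	fmt.Printf("proven, 24h ago:  %.3f\n", activation(500, 24, 1, 1))
	fmt.Printf("proven, 1wk ago:  %.3f\n", activation(500, 168, 1, 1))
	fmt.Printf("fresh, 1h ago:    %.3f\n", activation(2, 1, 1, 1))
}
```

Running this shows the fresh memory beating the proven one by more than 2×, purely because of the recency term.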
Impact
In prompt/tool mode, CombinedScore = BM25 × (1 + activation). Since activation ≈ 0 for anything not surfaced in the last few hours, this is basically just BM25. Frecency re-ranking adds no value.
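To make the collapse concrete, here is the combined-score formula with made-up illustrative values; the function name is hypothetical:

```go
package main

import "fmt"

// combinedScore follows the prompt/tool-mode formula above:
// CombinedScore = BM25 × (1 + activation).
func combinedScore(bm25, activation float64) float64 {
	return bm25 * (1 + activation)
}

func main() {
	// With activation near zero, the multiplier is ~1 and the
	// combined score is indistinguishable from raw BM25.
	fmt.Printf("raw BM25: %.3f, combined: %.3f\n", 12.0, combinedScore(12.0, 0.002))
}
```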
Desired behavior
All four factors (frequency, recency, spread, effectiveness) should contribute meaningfully to the activation score. A memory with a long track record of being useful shouldn't be wiped out by a day of inactivity.
Possible approaches
- Use a slower decay function for recency (e.g., `1/(1 + daysSince/halfLife)` instead of `1/(1 + hoursSince)`)
- Use log-scaled recency to match the log-scaled frequency
- Weight the factors differently (e.g., geometric mean instead of product)
- Cap the recency penalty floor so proven memories never drop below some minimum activation
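Two of these approaches can be sketched together. Function names, the half-life parameter, and the equal weighting in the geometric mean are all assumptions for illustration, not a proposed final design:

```go
package main

import (
	"fmt"
	"math"
)

// slowRecency is the proposed day-based decay: 1/(1 + daysSince/halfLife).
// halfLifeDays is a hypothetical tuning parameter.
func slowRecency(hoursSince, halfLifeDays float64) float64 {
	return 1 / (1 + (hoursSince/24)/halfLifeDays)
}

// geoMeanActivation combines the four factors with an (equally weighted)
// geometric mean, so one near-zero factor dampens rather than erases the score.
func geoMeanActivation(freq, rec, spread, eff float64) float64 {
	return math.Pow(freq*rec*spread*eff, 0.25)
}

func main() {
	// With a 7-day half-life, a 24h gap keeps recency near 0.88 and a
	// full week only halves it, instead of crushing it to 0.006.
	fmt.Printf("recency 24h: %.3f, 1 week: %.3f\n", slowRecency(24, 7), slowRecency(168, 7))
	// Geometric mean of the proven-but-stale example (ln(501) × 0.04):
	fmt.Printf("geo-mean activation: %.2f\n", geoMeanActivation(math.Log(501), 0.04, 1, 1))
}
```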
Additionally: rethink the insufficient-data threshold
The maintenance system requires ≥5 surfacing events before classifying a memory into quadrants. Most memories never reach this threshold, so they escape cleanup entirely.
The threshold should measure confidence in the data, not just surfacing count. Multiple signals contribute to knowing enough about a memory to act on it:
- Surfacing count — how often the system tried to use it
- Feedback events — relevant/irrelevant/used/notused marks
- Age — a memory with 0 surfacings after 5 weeks is telling you something
A composite confidence metric (e.g., weighted sum of these) would let the maintenance system classify memories sooner, especially ones that are simply never matching any queries.
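A minimal sketch of such a composite metric, with entirely hypothetical weights and threshold (the real values would need tuning against actual memory data):

```go
package main

import "fmt"

// confidence is a hypothetical weighted sum of the three signals above.
// The weights (0.2 per surfacing, 0.4 per feedback event, 0.1 per week
// of age) are placeholders, not proposed values.
func confidence(surfacings, feedbackEvents int, ageWeeks float64) float64 {
	return 0.2*float64(surfacings) + 0.4*float64(feedbackEvents) + 0.1*ageWeeks
}

// classifiable replaces the flat "≥5 surfacings" gate with a confidence
// threshold; 0.5 is likewise a placeholder.
func classifiable(surfacings, feedbackEvents int, ageWeeks float64) bool {
	return confidence(surfacings, feedbackEvents, ageWeeks) >= 0.5
}

func main() {
	// A never-surfaced 5-week-old memory becomes classifiable from age
	// alone, so it no longer escapes cleanup forever.
	fmt.Println("0 surfacings, 5 weeks old:", classifiable(0, 0, 5))
	fmt.Println("1 surfacing, 1 week old:  ", classifiable(1, 0, 1))
}
```

The key property is that age and feedback can substitute for surfacing count, so memories that never match any query still accumulate enough confidence to be acted on.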
Files
- internal/frecency/frecency.go
- internal/maintain/maintain.go (insufficient data threshold)
- internal/review/review.go (quadrant classification gate)
🤖 Generated with Claude Code