
Rebalance frecency formula: recency factor dominates all other signals #374

@toejough

Description


Problem

The frecency activation formula (frequency × recency × spread × effectiveness) is dominated by the recency term. After just 24 hours without surfacing, recency = 1/(1+24) = 0.04, which crushes the entire product regardless of the other three factors.

Examples:

  • Memory surfaced 500 times, last surfaced 24h ago: ln(501) × 0.04 × spread × eff ≈ 0.25 × spread × eff
  • Memory surfaced 500 times, last surfaced 1 week ago: ln(501) × 0.006 × spread × eff ≈ 0.037 × spread × eff
  • Memory surfaced 2 times, last surfaced 1 hour ago: ln(3) × 0.5 × spread × eff ≈ 0.55 × spread × eff

A brand new memory surfaced once an hour ago outscores a proven memory with 500 surfacings from yesterday. The frequency, spread, and effectiveness signals are effectively ignored.

Impact

In prompt/tool mode, CombinedScore = BM25 × (1 + activation). Since activation ≈ 0 for anything not surfaced in the last few hours, this is basically just BM25. Frecency re-ranking adds no value.

Desired behavior

All four factors (frequency, recency, spread, effectiveness) should contribute meaningfully to the activation score. A memory with a long track record of being useful shouldn't be wiped out by a day of inactivity.

Possible approaches

  • Use a slower decay function for recency (e.g., 1/(1 + daysSince/halfLife) instead of 1/(1 + hoursSince))
  • Use log-scaled recency to match the log-scaled frequency
  • Weight the factors differently (e.g., geometric mean instead of product)
  • Cap the recency penalty floor so proven memories never drop below some minimum activation

Additionally: rethink the insufficient-data threshold

The maintenance system requires ≥5 surfacing events before classifying a memory into quadrants. Most memories never reach this threshold, so they escape cleanup entirely.

The threshold should measure confidence in the data, not just surfacing count. Multiple signals contribute to knowing enough about a memory to act on it:

  • Surfacing count — how often the system tried to use it
  • Feedback events — relevant/irrelevant/used/notused marks
  • Age — a memory with 0 surfacings after 5 weeks is telling you something

A composite confidence metric (e.g., weighted sum of these) would let the maintenance system classify memories sooner, especially ones that are simply never matching any queries.

Files

  • internal/frecency/frecency.go
  • internal/maintain/maintain.go (insufficient data threshold)
  • internal/review/review.go (quadrant classification gate)

🤖 Generated with Claude Code
