
fix(retrieval): plumb documented [retrieval] tier weights into the additive path (#202)#210

Merged
mbachaud merged 1 commit into master from fix/202-additive-weights
Jun 10, 2026

@mbachaud

Owner

Closes #202. All 9 tier weights now bind in BOTH fusion modes, including the new sema_boost_weight (default 2.0; that tier previously had no knob). No config defaults change, because the defaults already equaled the inline literals. Caps now scale proportionally with their tier's weight (2x for fts5, 2x for lex_anchor, 3x for harmonic, 4x for entity_graph; documented at each site). Byte-identity is proven: a golden test captured on the pre-fix tree asserts == on scores and per-tier contributions across a 9-tier corpus, and the existing 50-query additive snapshot passes unchanged. 23 new tests; ~2,255 passed in the full sandbox sweep (the 4 failures are pre-existing on master). Unblocks per-tier weight tuning ahead of the RRF gate (roadmap section 5).

…e additive path (#202)

The eight documented [retrieval] tier weights (fts5_weight,
splade_weight, tag_exact_weight, tag_prefix_weight, sema_cold_weight,
lex_anchor_weight, harmonic_weight, entity_graph_weight) were consumed
only via fuser.add_tier(), which the default fusion_mode="additive"
never consults -- operators tuning them per the docs saw zero effect.
The additive accumulations used inline literals instead.

This binds the existing self._*_weight attrs into the additive tier
formulas with defaults byte-identical to the old literals. Every config
default already equals its tier's leading coefficient, so each
substitution swaps in the SAME float value (no scale-factor
multiplication, no round-off drift):

  tier          old literal                 new formula
  tag_exact     match_count * 3.0           match_count * tag_exact_weight
  tag_prefix    match_count * 1.5           match_count * tag_prefix_weight
  fts5          min(-rank, 6.0)             min(-rank, 2.0 * fts5_weight)
                  [no leading coeff in additive -- cap-only knob;
                   cap = 2.0 x weight, default 2.0 x 3.0 == legacy 6.0]
  splade        min(s, 20) * 3.5 / 20       min(s, 20) * splade_weight / 20
  sema_boost    sim * 2.0 * scale           sim * sema_boost_weight * scale
                  [NEW knob, default 2.0 -- the warm Tier-4A boost had
                   no weight knob at all (post-fusion additive in RRF)]
  sema_cold     sim * 3.0                   sim * sema_cold_weight
  lex_anchor    min(idf * 1.5, 3.0)         min(idf * w, 2.0 * w)
                  [cap = 2.0 x weight, default == legacy 3.0]
  harmonic      +1.0/link, cap 3.0          +w per link, cap 3.0 * w
                  [cap = 3.0 x weight, default == legacy 3.0]
  entity_graph  min(1.0 * 0.5, 2.0)         min(1.0 * w, 4.0 * w)
                  [cap = 4.0 x weight, default == legacy 2.0]
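
The capped rows of the table above can be sketched in Python to show why the defaults reproduce the legacy literals. This is a minimal illustration, not the project's code: the `TierWeights` container, the function names, and the `signal` argument (standing in for the `1.0` in the entity_graph row) are assumptions, and `harmonic` uses a product where the table describes a per-link accumulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierWeights:
    # Hypothetical container. Defaults are the legacy leading coefficients
    # listed in the table.
    tag_exact_weight: float = 3.0
    tag_prefix_weight: float = 1.5
    fts5_weight: float = 3.0
    splade_weight: float = 3.5
    sema_boost_weight: float = 2.0
    sema_cold_weight: float = 3.0
    lex_anchor_weight: float = 1.5
    harmonic_weight: float = 1.0
    entity_graph_weight: float = 0.5


def fts5(rank: float, w: TierWeights) -> float:
    # Cap-only knob: the weight moves the cap (2.0 x weight), not a coefficient.
    return min(-rank, 2.0 * w.fts5_weight)


def splade(s: float, w: TierWeights) -> float:
    # Same left-to-right evaluation order as the legacy expression.
    return min(s, 20) * w.splade_weight / 20


def lex_anchor(idf: float, w: TierWeights) -> float:
    return min(idf * w.lex_anchor_weight, 2.0 * w.lex_anchor_weight)


def harmonic(link_count: int, w: TierWeights) -> float:
    # Simplification (assumption): product instead of "+w per link".
    return min(link_count * w.harmonic_weight, 3.0 * w.harmonic_weight)


def entity_graph(signal: float, w: TierWeights) -> float:
    return min(signal * w.entity_graph_weight, 4.0 * w.entity_graph_weight)


defaults = TierWeights()

# With default weights, each formula equals the legacy literal expression.
assert fts5(-9.0, defaults) == min(9.0, 6.0)
assert splade(25.0, defaults) == min(25.0, 20) * 3.5 / 20
assert lex_anchor(4.0, defaults) == min(4.0 * 1.5, 3.0)
assert harmonic(5, defaults) == 3.0
assert entity_graph(1.0, defaults) == min(1.0 * 0.5, 2.0)
```

Because each default is the same float as the literal it replaces, the substituted expressions evaluate the same operations on the same operands, which is the basis for the byte-identity claim.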

No existing config default changed. The new sema_boost_weight (2.0) is
plumbed through RetrievalConfig, the TOML loader, context_manager's
open_read_source kwargs (fans to solo Genome and per-shard Genomes),
and the KnowledgeStore ctor. Caps that were independent literals now
scale proportionally with their tier's weight (documented inline, in
helix.toml and docs/config-reference.md): zeroing a weight kills its
tier; scaling it scales the tier's contribution including the capped
region.
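
As a rough illustration of the proportional-cap behavior described above, the sketch below takes the lex_anchor and harmonic formulas from the table and checks both claims: a zero weight yields a zero contribution, and doubling the weight doubles the contribution in the linear and the capped region. The function signatures are hypothetical, and non-negative inputs are assumed.

```python
def lex_anchor(idf: float, weight: float = 1.5) -> float:
    # Cap is 2.0 x weight, so it moves together with the coefficient.
    return min(idf * weight, 2.0 * weight)


def harmonic(link_count: int, weight: float = 1.0) -> float:
    # Cap is 3.0 x weight. A product stands in for per-link accumulation.
    return min(link_count * weight, 3.0 * weight)


# Zeroing a weight kills the tier, below and above the cap.
assert lex_anchor(0.5, weight=0.0) == 0.0
assert lex_anchor(10.0, weight=0.0) == 0.0
assert harmonic(7, weight=0.0) == 0.0

# Doubling a weight doubles the contribution, including the capped region.
for idf in (0.5, 10.0):
    assert lex_anchor(idf, weight=3.0) == 2 * lex_anchor(idf)
for links in (2, 7):
    assert harmonic(links, weight=2.0) == 2 * harmonic(links)
```

Had the caps stayed independent literals, raising a weight would only steepen the ramp up to a fixed ceiling, and a zero weight would be the only setting with a predictable effect on saturated documents.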

Tests: tests/test_additive_weight_plumbing.py
  - golden byte-identity: 10-doc corpus firing all 9 tiers; final
    scores AND per-tier contributions captured on the pre-fix tree
    (266e9aa) and asserted bit-identical (==) post-fix, plus an
    explicit-defaults == implicit-defaults run
  - per-knob "knob moves exactly its tier" and "zero weight kills the
    tier" coverage for all 9 knobs
  - RetrievalConfig defaults == legacy additive literals; TOML loader
    plumbs sema_boost_weight
  - the existing 50-query additive back-compat snapshot
    (test_fusion_rrf.py::test_fusion_mode_additive_unchanged) passes
    unchanged
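
The golden byte-identity check can be sketched as follows. This is not the code in tests/test_additive_weight_plumbing.py: the `bits` and `mismatches` helpers, the dictionary layout, and all values are made up for illustration. It also compares IEEE-754 bit patterns, which is slightly stricter than the `==` the commit describes (it distinguishes 0.0 from -0.0).

```python
import struct


def bits(x: float) -> bytes:
    # Raw 8-byte IEEE-754 encoding of a Python float.
    return struct.pack("<d", x)


def mismatches(golden: dict, actual: dict) -> list:
    """Return (doc_id, field) pairs whose floats differ in any bit."""
    diffs = []
    for doc_id, expected in golden.items():
        got = actual[doc_id]
        if bits(expected["score"]) != bits(got["score"]):
            diffs.append((doc_id, "score"))
        for tier, value in expected["tiers"].items():
            if bits(value) != bits(got["tiers"][tier]):
                diffs.append((doc_id, tier))
    return diffs


# Illustrative values only; a real golden file would be captured by
# running the pre-fix tree.
golden = {"doc-a": {"score": 6.3, "tiers": {"tag_exact": 6.0, "sema_cold": 0.3}}}
same = {"doc-a": {"score": 6.3, "tiers": {"tag_exact": 6.0, "sema_cold": 0.3}}}
drifted = {"doc-a": {"score": 6.3, "tiers": {"tag_exact": 6.0, "sema_cold": 0.1 + 0.2}}}

assert mismatches(golden, same) == []
assert mismatches(golden, drifted) == [("doc-a", "sema_cold")]
```

Comparing per-tier contributions as well as final scores catches the case where two tiers drift in opposite directions and the sum happens to stay the same.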

Development

Successfully merging this pull request may close these issues.

Additive mode ignores the documented [retrieval] tier weights (9 dead knobs)
