fix(public): de-hardcode wave 1 + retrieval-profiles spec + evidence roadmap by mbachaud · Pull Request #201 · mbachaud/helix-context

mbachaud · 2026-06-10T00:48:32Z

Output of the 2026-06-09 research → examine → document loop (3 parallel research agents: issues/PRs evidence audit, public-product hardcoding audit, retrieval-tuning evidence study).

De-hardcode wave 1 (runtime retrieval code)

Tagger: owner-project vocabulary (BigEd/BookKeeper/CosmicTasha/ModuleHub/Dr. Ders/FleetDB/SwiftWing21 + matching tech terms) removed from _PROJECT_ENTITIES/_TECH_TERMS — these drive the tag_exact tier, so every public user's corpus was being tagged against the owner's project names. Operators extend via HELIX_TAGGER_EXTRA_ENTITIES ("Label:Text" pairs) / HELIX_TAGGER_EXTRA_TERMS.
lexical_rescue: removed the product's self-boost (helix query + helix-context path +2.0, /helix.toml +2.0) and the owner's /_worktrees/ −1.0 layout penalty; replaced with a generic +2.0 when a query term (len≥4) names a path segment/filename stem. Structural bias toward the product's own repo is gone (helix/acme now score identically).
helix.toml template: owner-project synonym rows and the personal mem_sync.watch_dirs path scrubbed.
Dashboard placeholders: F:\path\to\new.genome.db → path/to/new.genome.db.
bridge.py How-to-Use renders the instance helix_base_url (was hardcoded :11437); installer service message threads the real --port.

Docs (same loop)

docs/specs/2026-06-09-retrieval-profiles.md — the profiles/modes answer: one pipeline can't serve all workloads, but only ~6 knobs are corpus-sensitive (dense_additive_weight ±19pp on prose, SPLADE 21.1% disk for 0pp at 850K, non-transferable absolute thresholds, filename anchor +24pp code-only, synonyms, abstain floors). Design = Layer 1 auto-calibration (genome_calibration machinery exists) + Layer 2 classifier-owned semantic arm (kills the HELIX_SEMANTIC_ARM env gate) + Layer 3 three small corpus profiles (code/prose/small-mixed). Blocking measurements ranked.
docs/audits/2026-06-09-next-steps-evidence.md — ranked roadmap with evidence (500K rebuild+bench vs published 68.4/68.8/72.4 baselines, Audit fingerprint routing index (path_key_index) storage: 34.1% of v2 corpus #165 verification + residual ~2–3 GB schema wins, the two deferred default-flips, Wall-2 orphaned PRs perf(retrieval): parallel cross-shard fan-out + lifted SPLADE query encoding #158/feat(retrieval): SPLADE pre-filter for dense matmul (Issue #159 Wall-2) #160, hardcoding wave 2 list).

Test plan

16 new tests (test_public_dehardcode.py): owner-vocab absence, env extension, generic-vs-helix score equality, template neutrality
Updated fixtures in density-gate / shard-router / bench-needles / installer tests
157 passed across touched suites on Windows; 114 passed + compileall clean in sandbox

…ing/config, generic path-affinity rescue Public-product readiness pass. The shipped defaults baked the original author's private project vocabulary and machine paths into runtime retrieval surfaces. That matters beyond cosmetics: CpuTagger's EntityRuler patterns and _TECH_TERMS become promoter tags at ingest, and promoter tags drive the Tier-1 tag_exact retrieval tier (tag_exact_weight = 3.0) — so any public deployment was spending its strongest lexical tier matching another person's project names. Likewise lexical_rescue's path bonus hardwired +2.0 boosts for "helix-context" paths and "/helix.toml", biasing source rescue toward the product's own repository on every corpus. Changes: * tagger.py — removed owner-project entries from _PROJECT_ENTITIES (BigEd, BigEd CC, BookKeeper, CosmicTasha, two-brain, ModuleHub, Dr. Ders, FleetDB, fleet.toml as PRODUCT; SwiftWing21 as ORG) and from _TECH_TERMS (biged, bookkeeper, cosmictasha, modulehub, fleetdb, "dr ders"). The product's own stack (helix, agentome, scorerift, splade, ...) and generic IR/NLP vocabulary stay. New module-level _extend_vocabulary_from_env() reads HELIX_TAGGER_EXTRA_ENTITIES (comma-separated "Label:Text" pairs) and HELIX_TAGGER_EXTRA_TERMS (comma-separated lowercased terms) once at import so operators re-add deployment vocabulary without forking; malformed pairs are skipped with a warning. Comments note the previously owner-specific defaults removed for public release. * retrieval/lexical_rescue.py — dropped the hardwired helix-in-query + "helix-context"-in-path +2.0, the "/helix.toml" +2.0, and the owner-specific "/_worktrees/" -1.0 penalty. Replaced with one corpus-neutral heuristic: +2.0 (once) when any query term of len >= 4 names a whole path segment or filename stem of the candidate — "acme" boosts /acme/ and acme.toml exactly the way "helix" boosts helix.toml. Existing generic scores (config-ext +1.5/+1.0, substring agreement +0.4, tests-path -0.75) unchanged. Module docstring now states the full scoring contract. * helix.toml — [synonyms]: removed owner rows biged / bookkeeper / cosmictasha / fleet, dropped "scorerift" from the scoring row and "biged" from the agents row; generic software vocabulary kept. [mem_sync]: personal watch_dirs path replaced with an empty list plus an example comment. * launcher templates (dashboard.html, components/database_panel.html) — placeholder F:\path\to\new.genome.db -> path/to/new.genome.db. * bridge.py — generated SHARED_CONTEXT.md "How to Use" line now renders self.helix_base_url instead of hardcoding http://127.0.0.1:11437. * launcher/installer.py + launcher/app.py — install_service() grew a defaulted port param (11438) used in the printed dashboard URL, threaded from the CLI's existing --port flag through _handle_service_command. Tests: new tests/test_public_dehardcode.py (16 tests) pins (a) no owner vocabulary in tagger patterns/terms, (b) env-var extension via the importlib.reload pattern, (c) the generic path-segment boost, structural score equality between helix-context and acme-context paths, worktrees penalty removed, tests penalty kept, and end-to-end rescue preference for a query-term path match, (d) helix.toml parses with no owner synonym keys and empty watch_dirs. Owner-named fixtures replaced with generic equivalents in test_density_gate.py, test_shard_router.py, test_bench_needles.py; test_launcher_installer.py call-through assertion updated to expect port=11438. Out of scope, noted for wave 2: com.swiftwing21.helix-launcher plist id (installer + deploy templates + tests pin it), "swiftwing" org fixtures in test_server/test_mcp_server, biged_skills empirical-curve comments in test_abstain_tier.py / pipeline/tier_logic.py, and docstring examples in knowledge_store.py / storage/ddl.py.

…-steps roadmap From the 2026-06-09 research/examine/document loop (3 parallel research agents over issues+PRs, hardcoding audit, and tuning-evidence study). Answers the profiles question: one pipeline cannot serve all workloads, but only ~6 knobs are corpus-sensitive — design = auto-calibration + classifier-owned query-time adaptation + 3 small corpus profiles.

mbachaud and others added 2 commits June 9, 2026 17:46

This was referenced Jun 10, 2026

Retrieval profiles/modes: implement the 3-layer design #205

Open

De-hardcoding wave 2: model-ID knobs, citation root-stripping, exposed truncation caps, budget/abstain knobs, deny-list extensibility #207

Open

mbachaud merged commit a1e9879 into master Jun 10, 2026
3 checks passed

mbachaud deleted the fix/public-dehardcode-wave1 branch June 10, 2026 00:52

This was referenced Jun 12, 2026

Bench: evaluate EnterpriseRAG-Bench as a blob-vs-sharded test corpus #93

Open

Re-baseline SIKE curated needles as a scale sweep: XL + ERB 10K/50K/850K distractor beds #221

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(public): de-hardcode wave 1 + retrieval-profiles spec + evidence roadmap#201

fix(public): de-hardcode wave 1 + retrieval-profiles spec + evidence roadmap#201
mbachaud merged 2 commits into
masterfrom
fix/public-dehardcode-wave1

mbachaud commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mbachaud commented Jun 10, 2026

De-hardcode wave 1 (runtime retrieval code)

Docs (same loop)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant