fix(config): default honesty pass - 20 code/toml drifts reconciled, real KnowConfig, auto shard fan-out (council Option B slice 1) by mbachaud · Pull Request #218 · mbachaud/helix-context

mbachaud · 2026-06-12T21:19:52Z

Council Option B, slice 1 (consensus report: docs/audits/2026-06-12-pipeline-value-consensus.md). The council estimated 5 code-vs-toml default drifts; the new comparator ratchet found 20. Resolution policy: the shipped helix.toml is the operationally-tested product (every bench this month ran it), so toml won unless a feature was measured at zero effect.

Every default changed (code → new value) — review and veto individually if needed

[budget]: expression_tokens 6000→7000 · max_genes_per_turn 8→12 · splice_aggressiveness 0.5→0.3 · decoder_mode full→condensed · session_delivery_enabled false→true
[ribosome]: backend ollama→none (core pip install now serves without an LLM dependency — matches the pyproject "core gives a working server" contract) · model auto→gemma4:e2b · timeout 10→120 · warmup true→false · query_expansion_enabled true→false
[genome]: path genome.db→genomes/main/genome.db
[ingestion]: backend ollama→cpu · splade_enabled false→true (scale-conditional cost; the #164/#189 auto-disable knob covers the >200K cliff) · rerank_model ""→ms-marco-MiniLM (inert while rerank_enabled=false) · entity_graph false→true
[retrieval]: filename_anchor false→true (+24pp on code corpora) · bm25_shortlist false→true (8/8 on config-value queries) · plr.enabled false→true (soft-no-op without artifact) · headroom.enabled false→true (launcher-only)
Inverse (measured-zero, code wins): sr_enabled — toml true→false (roadmap: sr measured zero everywhere), evidence noted in the toml comment.
Allowlist (intentional divergence): synonym_map starter vocabulary only.

Also in this slice

Real KnowConfig: the [know] section was parsed by a shadow loader in know_calibration.py, outside the config system, while CLAUDE.md advertised keys that don't exist. KnowConfig now lives in config.py (real keys: emit_floor, s_ref, g_ref, betas[6], calibrated_at, calibrated_on_n, stale_after_days), the shadow loader is now a back-compat shim, /context consumes the live config, and the CLAUDE.md [know] and [classifier] rows are corrected.
Auto shard fan-out: when HELIX_SHARD_WORKERS is unset and more than 4 shards are routed, the worker count now comes from auto_shard_workers() (previously unwired); an explicit env value always wins, and =1 forces the serial reference path. Evidence: the serial default measured 5 min/query at 829K genes / 100 shards vs ~55s at 8 workers (issue #206, "Wall-2: decide fate of unmerged dense-latency PRs #158/#160"). Determinism tests now pin serial explicitly.
Ratchet test: test_shipped_toml_matches_code_defaults — any future drift between config.py and helix.toml fails CI with a named field list.
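The ratchet idea can be sketched as a recursive comparison of dataclass defaults against the parsed toml. This is a minimal illustration, not the code in tests/test_config_default_honesty.py: the dataclass layout, the placement of synonym_map at the top level, and the find_drifts helper are all assumptions, and the shipped values are passed as a plain dict where the real test would presumably parse helix.toml.

```python
from dataclasses import dataclass, field, fields, is_dataclass


# Hypothetical, heavily trimmed stand-ins for the real config dataclasses.
@dataclass
class BudgetConfig:
    expression_tokens: int = 7000
    max_genes_per_turn: int = 12


@dataclass
class RetrievalConfig:
    sr_enabled: bool = False


@dataclass
class HelixConfig:
    budget: BudgetConfig = field(default_factory=BudgetConfig)
    retrieval: RetrievalConfig = field(default_factory=RetrievalConfig)
    # Placement at the top level is illustrative only.
    synonym_map: dict = field(default_factory=dict)


# Fields that are allowed to differ on purpose.
ALLOWLIST = frozenset({"synonym_map"})


def find_drifts(config, shipped, allowlist=ALLOWLIST, prefix=""):
    """Return one message per field whose code default differs from the toml."""
    drifts = []
    for f in fields(config):
        name = f"{prefix}{f.name}"
        value = getattr(config, f.name)
        if is_dataclass(value):
            # Descend into a nested section such as [budget].
            drifts += find_drifts(value, shipped.get(f.name, {}), allowlist, name + ".")
        elif name in allowlist or f.name not in shipped:
            continue  # intentional divergence, or key not shipped at all
        elif shipped[f.name] != value:
            drifts.append(f"{name}: code={value!r} toml={shipped[f.name]!r}")
    return drifts


# Example: a shipped file that still has the old sr_enabled = true.
shipped = {
    "budget": {"expression_tokens": 7000, "max_genes_per_turn": 12},
    "retrieval": {"sr_enabled": True},
    "synonym_map": {"db": ["database"]},
}
drift_report = find_drifts(HelixConfig(), shipped)
```

A test built on this would assert that the returned list is empty and print it on failure, which gives the named field list the ratchet promises.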

Test plan

73 passed locally (honesty ratchet + config + shard_router); ~2,210 full-suite in sandbox (5 pre-existing container failures verified on base)
Know back-compat shim covered; worker-selection matrix covered (7 tests)
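The worker-selection rules described in this PR can be sketched as follows. This is an assumption-laden illustration: select_shard_workers is a hypothetical name, the body of auto_shard_workers here is a placeholder (the real sizing rule in parallel.auto_shard_workers() is not shown in this PR), and treating an empty env value as unset is a guess.

```python
import os

# Auto fan-out applies only when more than this many shards are routed.
SHARD_THRESHOLD = 4


def auto_shard_workers(n_shards):
    """Placeholder sizing rule; the cap of 8 is an assumption, not the real one."""
    return max(1, min(n_shards, os.cpu_count() or 1, 8))


def select_shard_workers(n_shards, env=None):
    """Pick a worker count for fanning a query out over n_shards shards."""
    env = os.environ if env is None else env
    raw = env.get("HELIX_SHARD_WORKERS")
    if raw is not None and raw.strip():
        # An explicit value always wins; "1" forces the serial reference path.
        return max(1, int(raw))
    if n_shards > SHARD_THRESHOLD:
        # Unset and a large routed set: size the pool automatically.
        return auto_shard_workers(n_shards)
    # Small or monolithic stores stay serial.
    return 1
```

Passing the environment as a parameter keeps the selection matrix testable without mutating os.environ, which is also how a determinism test could pin the serial path.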

…nowConfig, auto shard fan-out (council Option B slice 1)

The 2026-06-12 council audit found HelixConfig() dataclass defaults and the shipped helix.toml disagreeing on the product's actual behavior. A field-by-field comparator (now a permanent regression test) found 20 drifted fields, not just the audit's headline five.

Resolution policy: the shipped toml is the operationally-tested product (every bench this month ran those values), so CODE defaults were aligned to the TOML, except measured-zero features, which align the other way.

Code defaults changed to the shipped toml values:
- budget.expression_tokens 6000 -> 7000
- budget.max_genes_per_turn 8 -> 12
- budget.splice_aggressiveness 0.5 -> 0.3
- budget.decoder_mode "full" -> "condensed"
- budget.session_delivery_enabled false -> true (~40% token savings, multi-turn)
- ribosome.model "auto" -> "gemma4:e2b" (inert while disabled)
- ribosome.timeout 10.0 -> 120.0
- ribosome.warmup true -> false
- ribosome.backend "ollama" -> "none" (both dispatch DisabledBackend)
- ribosome.query_expansion_enabled true -> false (LLM-free /context pillar)
- genome.path "genome.db" -> "genomes/main/genome.db" (CLAUDE.md already documented this as THE default)
- ingestion.backend "ollama" -> "cpu" (the load_config coherence guard auto-flipped it anyway)
- ingestion.splade_enabled false -> true (#164's 0pp@850K is scale-conditional; auto-disable knob covers the cliff; soft-no-op without torch)
- ingestion.rerank_model "" -> "cross-encoder/ms-marco-MiniLM-L-6-v2" (inert while rerank_enabled=false)
- ingestion.entity_graph false -> true
- retrieval.filename_anchor_enabled false -> true (+12pp Dewey axis-2, 2026-04-22)
- retrieval.bm25_shortlist_enabled false -> true (+1/8 ans_full, 2026-04-22)
- plr.enabled false -> true (bench-gated #74; soft-no-op without artifact)
- headroom.enabled false -> true (launcher-only; route_upstream stays false)

Inverse case (measured-zero feature, toml aligned to code):
- retrieval.sr_enabled: helix.toml true -> false. The evidence roadmap measured SR at zero retrieval effect, so the 2026-04-22 toml flip was reverted instead of propagated into code. Both sides now false.

Allowlisted divergence (documented, deliberate): synonym_map. The toml ships starter vocabulary data; the dataclass default stays {}.

[know] folded into the config system: scoring/know_calibration.py was shadow-parsing helix.toml outside config.py, and CLAUDE.md advertised keys that did not exist (confidence_floor / margin_threshold). New KnowConfig dataclass carries the REAL keys (emit_floor, s_ref, g_ref, betas, calibrated_at, calibrated_on_n, stale_after_days) with the shadow loader's per-field soft-fail semantics; load_calibration_from_toml is now a thin back-compat shim over load_config + calibration_from_config, and the /context know-block path consumes the manager's live config. CLAUDE.md's [know] row lists the real keys; [classifier] row corrected to "enabled toggle only; class caps are code constants pending #205".

Auto shard fan-out (issue #206): the serial fan-out default measured 5 min/query at 829K genes / 100 shards vs ~55s at 8 workers, a 5x+ latency tax for an env knob nobody knew existed. HELIX_SHARD_WORKERS unset now auto-sizes via parallel.auto_shard_workers() when >4 shards are routed; explicit env always wins; =1 forces the serial reference path; small/monolithic stores stay serial. Determinism tests pin the serial oracle explicitly via the env var instead of relying on the old unset-means-serial default; parallel fan-out remains byte-identical by construction (order-preserving map + deterministic merge).

Tests: tests/test_config_default_honesty.py (drift ratchet with explicit allowlist + KnowConfig load/back-compat); 7 new shard_fanout_workers selection tests. 114 targeted + ~2200 full-suite pass; the only failures are pre-existing container issues (sqlite3 getlimit on py3.10, missing transformers, one wall-clock perf gate), verified present on the base commit.
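The per-field soft-fail loading mentioned for KnowConfig might look roughly like the sketch below. Only the seven key names come from this PR; the field types, the default values, and the load_know and _coerce helpers are hypothetical and stand in for whatever config.py actually does.

```python
from dataclasses import dataclass, fields


@dataclass
class KnowConfig:
    # Key names are from the PR; types and defaults are placeholder assumptions.
    emit_floor: float = 0.5
    s_ref: float = 1.0
    g_ref: float = 1.0
    betas: tuple = (0.0,) * 6
    calibrated_at: str = ""
    calibrated_on_n: int = 0
    stale_after_days: int = 30


def _coerce(name, raw):
    """Convert one raw toml value to the assumed field type, or raise."""
    if name == "betas":
        values = tuple(float(x) for x in raw)
        if len(values) != 6:
            raise ValueError("betas must have exactly 6 entries")
        return values
    if name == "calibrated_at":
        return str(raw)
    if name in ("calibrated_on_n", "stale_after_days"):
        return int(raw)
    return float(raw)


def load_know(section):
    """Build a KnowConfig from a parsed [know] table, field by field."""
    cfg = KnowConfig()
    for f in fields(cfg):
        if f.name not in section:
            continue
        try:
            setattr(cfg, f.name, _coerce(f.name, section[f.name]))
        except (TypeError, ValueError):
            # Soft-fail: one bad value keeps its default and does not
            # invalidate the rest of the section.
            pass
    return cfg


# A malformed betas list and an unknown legacy key are both tolerated.
know = load_know({
    "emit_floor": "0.25",
    "betas": [1, 2, 3],
    "stale_after_days": 14,
    "confidence_floor": 0.9,
})
```

Iterating over the dataclass fields rather than the toml keys means advertised-but-nonexistent keys such as confidence_floor are simply ignored instead of becoming attributes.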


mbachaud and others added 2 commits June 12, 2026 14:18

docs: pipeline-value fable council (Option B verdict) - system tax, d…

a615f3a

…efault honesty, static-vs-dynamic, chunk-value evidence

mbachaud mentioned this pull request Jun 12, 2026

Epic: config unification + default honesty (council Option B) #219

Open

5 tasks

mbachaud merged commit fd87b40 into master Jun 12, 2026
3 checks passed

mbachaud deleted the fix/config-default-honesty branch June 12, 2026 21:41

mbachaud mentioned this pull request Jun 12, 2026

Re-baseline SIKE curated needles as a scale sweep: XL + ERB 10K/50K/850K distractor beds #221

Open
