fix(config): default honesty pass - 20 code/toml drifts reconciled, real KnowConfig, auto shard fan-out (council Option B slice 1) #218
Merged
…nowConfig, auto shard fan-out (council Option B slice 1)

The 2026-06-12 council audit found that the HelixConfig() dataclass defaults and the shipped helix.toml disagreed about the product's actual behavior. A field-by-field comparator (now a permanent regression test) found 20 drifted fields, not just the audit's headline five.

Resolution policy: the shipped toml is the operationally-tested product (every bench this month ran those values), so CODE defaults were aligned to the TOML, except for measured-zero features, which align the other way.

Code defaults changed to the shipped toml values:
- budget.expression_tokens 6000 -> 7000
- budget.max_genes_per_turn 8 -> 12
- budget.splice_aggressiveness 0.5 -> 0.3
- budget.decoder_mode "full" -> "condensed"
- budget.session_delivery_enabled false -> true (~40% token savings, multi-turn)
- ribosome.model "auto" -> "gemma4:e2b" (inert while disabled)
- ribosome.timeout 10.0 -> 120.0
- ribosome.warmup true -> false
- ribosome.backend "ollama" -> "none" (both dispatch DisabledBackend)
- ribosome.query_expansion_enabled true -> false (LLM-free /context pillar)
- genome.path "genome.db" -> "genomes/main/genome.db" (CLAUDE.md already documented this as THE default)
- ingestion.backend "ollama" -> "cpu" (the load_config coherence guard auto-flipped it anyway)
- ingestion.splade_enabled false -> true (#164's 0pp@850K is scale-conditional; the auto-disable knob covers the cliff; soft-no-op without torch)
- ingestion.rerank_model "" -> "cross-encoder/ms-marco-MiniLM-L-6-v2" (inert while rerank_enabled=false)
- ingestion.entity_graph false -> true
- retrieval.filename_anchor_enabled false -> true (+12pp Dewey axis-2, 2026-04-22)
- retrieval.bm25_shortlist_enabled false -> true (+1/8 ans_full, 2026-04-22)
- plr.enabled false -> true (bench-gated #74; soft-no-op without artifact)
- headroom.enabled false -> true (launcher-only; route_upstream stays false)

Inverse case (measured-zero feature, toml aligned to code):
- retrieval.sr_enabled: helix.toml true -> false. The evidence roadmap measured SR at zero retrieval effect, so the 2026-04-22 toml flip was reverted instead of propagated into code. Both sides are now false.

Allowlisted divergence (documented, deliberate): synonym_map. The toml ships starter vocabulary data; the dataclass default stays {}.

[know] folded into the config system: scoring/know_calibration.py was shadow-parsing helix.toml outside config.py, and CLAUDE.md advertised keys that did not exist (confidence_floor / margin_threshold). The new KnowConfig dataclass carries the REAL keys (emit_floor, s_ref, g_ref, betas, calibrated_at, calibrated_on_n, stale_after_days) with the shadow loader's per-field soft-fail semantics. load_calibration_from_toml is now a thin back-compat shim over load_config + calibration_from_config, and the /context know-block path consumes the manager's live config. CLAUDE.md's [know] row lists the real keys; the [classifier] row is corrected to "enabled toggle only; class caps are code constants pending #205".

Auto shard fan-out (issue #206): the serial fan-out default measured 5 min/query at 829K genes / 100 shards vs ~55s at 8 workers, a 5x+ latency tax hidden behind an env knob nobody knew existed. With HELIX_SHARD_WORKERS unset, fan-out now auto-sizes via parallel.auto_shard_workers() when more than 4 shards are routed; an explicit env value always wins; =1 forces the serial reference path; small/monolithic stores stay serial. Determinism tests pin the serial oracle explicitly via the env var instead of relying on the old unset-means-serial default; parallel fan-out remains byte-identical by construction (order-preserving map + deterministic merge).

Tests: tests/test_config_default_honesty.py (drift ratchet with explicit allowlist + KnowConfig load/back-compat); 7 new shard_fanout_workers selection tests. 114 targeted + ~2200 full-suite tests pass; the only failures are pre-existing container issues (sqlite3 getlimit on py3.10, missing transformers, one wall-clock perf gate), verified present on the base commit.
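A minimal sketch of the worker-selection precedence described above, plus the order-preserving fan-out that keeps parallel results identical to the serial path. Only HELIX_SHARD_WORKERS, the more-than-4-shards threshold, the `=1` serial override, and the 8-worker benchmark figure come from this PR. The names `pick_shard_workers`, `auto_workers`, and `fan_out`, the CPU-count/8-worker cap, the thread pool, and treating an unparseable value as unset are all hypothetical; the real helper exercised by the shard_fanout_workers tests may differ.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Mapping, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")

AUTO_SHARD_THRESHOLD = 4  # from the PR: auto-size only when MORE than 4 shards are routed
AUTO_MAX_WORKERS = 8      # hypothetical cap; 8 is the worker count the PR benchmarked


def auto_workers(n_shards: int, cpu_count: Optional[int] = None) -> int:
    """Hypothetical stand-in for parallel.auto_shard_workers(); the real sizing rule may differ."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(n_shards, cpus, AUTO_MAX_WORKERS))


def pick_shard_workers(
    n_shards: int,
    env: Optional[Mapping[str, str]] = None,
    cpu_count: Optional[int] = None,
) -> int:
    """Choose the fan-out width: explicit env first, then serial for small stores, else auto."""
    env = os.environ if env is None else env
    raw = env.get("HELIX_SHARD_WORKERS", "").strip()
    if raw:
        try:
            explicit = int(raw)
        except ValueError:
            explicit = 0  # assumption: an unparseable value is treated as unset
        if explicit >= 1:
            return explicit  # explicit env always wins; "1" is the serial reference path
    if n_shards <= AUTO_SHARD_THRESHOLD:
        return 1  # small or monolithic stores stay serial
    return auto_workers(n_shards, cpu_count)


def fan_out(shards: Iterable[T], query: Callable[[T], R], workers: int) -> List[R]:
    """Query every shard; results come back in shard order on both paths."""
    if workers <= 1:
        return [query(s) for s in shards]  # serial reference path
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so a downstream merge
        # sees exactly the list the serial path would have built.
        return list(pool.map(query, shards))
```

Because `Executor.map` returns results in submission order, a deterministic merge over `fan_out(...)` sees the same sequence whether `workers` is 1 or 8, which is the property the determinism tests rely on when they pin `HELIX_SHARD_WORKERS=1` as the oracle.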
…efault honesty, static-vs-dynamic, chunk-value evidence
Council Option B, slice 1 (consensus report: docs/audits/2026-06-12-pipeline-value-consensus.md). The council estimated 5 code-vs-toml default drifts; the new comparator ratchet found 20. Resolution policy: the shipped helix.toml is the operationally-tested product (every bench this month ran it), so the toml won unless a feature was measured at zero effect.

Every default changed (code → new value); review and veto individually if needed:
[budget]: expression_tokens 6000→7000 · max_genes_per_turn 8→12 · splice_aggressiveness 0.5→0.3 · decoder_mode full→condensed · session_delivery_enabled false→true
[ribosome]: backend ollama→none (the core pip install now serves without an LLM dependency, matching the pyproject "core gives a working server" contract) · model auto→gemma4:e2b · timeout 10→120 · warmup true→false · query_expansion_enabled true→false
[genome]: path genome.db→genomes/main/genome.db
[ingestion]: backend ollama→cpu · splade_enabled false→true (scale-conditional cost; the #164/#189 auto-disable knob covers the >200K cliff) · rerank_model ""→ms-marco-MiniLM (inert while rerank_enabled=false) · entity_graph false→true
[retrieval]: filename_anchor_enabled false→true (+24pp on code corpora) · bm25_shortlist_enabled false→true (8/8 on config-value queries)
[plr]: enabled false→true (soft-no-op without artifact)
[headroom]: enabled false→true (launcher-only)
Inverse (measured-zero, code wins): retrieval.sr_enabled, toml true→false (roadmap: SR measured zero everywhere); the evidence is noted in the toml comment.
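A minimal sketch of the kind of field-by-field comparator ratchet that produced this list, assuming the shipped toml has already been parsed into a dict (for example with `tomllib.load` on Python 3.11+). `HelixConfig` and the field names and values mirror this PR; the section classes `BudgetConfig` and `RetrievalConfig`, the helpers `default_of` and `find_drift`, the placement of `synonym_map` under `[retrieval]`, and the sample vocabulary are hypothetical, and the real `test_shipped_toml_matches_code_defaults` may compare differently.

```python
from dataclasses import MISSING, Field, dataclass, field, fields
from typing import Any, Dict, Iterable, List, Mapping


@dataclass
class BudgetConfig:
    expression_tokens: int = 7000  # post-PR default (was 6000)
    max_genes_per_turn: int = 12   # post-PR default (was 8)


@dataclass
class RetrievalConfig:
    sr_enabled: bool = False  # measured-zero: the toml was aligned to the code
    # Hypothetical placement: which section holds synonym_map is assumed here.
    synonym_map: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class HelixConfig:
    budget: BudgetConfig = field(default_factory=BudgetConfig)
    retrieval: RetrievalConfig = field(default_factory=RetrievalConfig)


def default_of(f: Field) -> Any:
    """Return a dataclass field's default, building it when it uses a factory."""
    if f.default is not MISSING:
        return f.default
    if f.default_factory is not MISSING:
        return f.default_factory()
    raise TypeError(f"field {f.name!r} has no default")


def find_drift(config_cls: type, toml_doc: Mapping[str, Any], allowlist: Iterable[str] = ()) -> List[str]:
    """List 'section.key' names whose code default differs from the shipped toml value."""
    allowed = set(allowlist)
    drifted = []
    for section in fields(config_cls):
        defaults = default_of(section)  # e.g. a fresh BudgetConfig()
        table = toml_doc.get(section.name, {})
        for f in fields(defaults):
            name = f"{section.name}.{f.name}"
            # Only keys the toml actually ships are compared; allowlisted keys are skipped.
            if name in allowed or f.name not in table:
                continue
            if getattr(defaults, f.name) != table[f.name]:
                drifted.append(name)
    return sorted(drifted)


# Stand-in for tomllib.load() on the shipped helix.toml (illustrative values only).
SHIPPED_TOML = {
    "budget": {"expression_tokens": 7000, "max_genes_per_turn": 12},
    "retrieval": {"sr_enabled": False, "synonym_map": {"db": ["database"]}},
}
ALLOWLIST = {"retrieval.synonym_map"}
```

A CI test would then assert `find_drift(HelixConfig, SHIPPED_TOML, ALLOWLIST) == []`, so any new drift fails with the offending `section.key` names. Comparing only the keys the toml ships is a design choice in this sketch; a stricter ratchet could also flag code fields that the toml omits.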
Allowlist (intentional divergence):
synonym_map: the toml ships starter vocabulary only; the dataclass default stays {}.

Also in this slice
KnowConfig: the [know] section was parsed by a shadow loader in know_calibration.py outside the config system, while CLAUDE.md advertised keys that don't exist. KnowConfig now lives in config.py (real keys: emit_floor, s_ref, g_ref, betas[6], calibrated_at, calibrated_on_n, stale_after_days), the shadow loader is a back-compat shim, /context consumes the live config, and the CLAUDE.md [know]/[classifier] rows are corrected.

Auto shard fan-out: with HELIX_SHARD_WORKERS unset, fan-out now auto-sizes via parallel.auto_shard_workers() (previously unwired) when more than 4 shards are routed; explicit env always wins; =1 forces the serial reference path. Evidence: the serial default measured 5 min/query at 829K genes / 100 shards vs ~55s at 8 workers (Wall-2: decide fate of unmerged dense-latency PRs #158/#160 #206). Determinism tests now pin serial explicitly.

Drift ratchet: test_shipped_toml_matches_code_defaults. Any future drift between config.py and helix.toml fails CI with a named field list.

Test plan