
fix(config): default honesty pass - 20 code/toml drifts reconciled, real KnowConfig, auto shard fan-out (council Option B slice 1)#218

Merged
mbachaud merged 2 commits into master from fix/config-default-honesty on Jun 12, 2026


@mbachaud

Owner

Council Option B, slice 1 (consensus report: docs/audits/2026-06-12-pipeline-value-consensus.md). The council estimated 5 code-vs-toml default drifts; the new comparator ratchet found 20. Resolution policy: the shipped helix.toml is the operationally-tested product (every bench this month ran it), so the toml value won unless the feature had been measured at zero effect, in which case the code default won.

Every default changed (code → new value); review and veto individually if needed:

[budget]: expression_tokens 6000→7000 · max_genes_per_turn 8→12 · splice_aggressiveness 0.5→0.3 · decoder_mode full→condensed · session_delivery_enabled false→true
[ribosome]: backend ollama→none (a core pip install now serves without an LLM dependency, matching the pyproject "core gives a working server" contract) · model auto→gemma4:e2b · timeout 10→120 · warmup true→false · query_expansion_enabled true→false
[genome]: path genome.db→genomes/main/genome.db
[ingestion]: backend ollama→cpu · splade_enabled false→true (scale-conditional cost; the #164/#189 auto-disable knob covers the >200K cliff) · rerank_model ""→cross-encoder/ms-marco-MiniLM-L-6-v2 (inert while rerank_enabled=false) · entity_graph false→true
[retrieval]: filename_anchor_enabled false→true (+24pp on code corpora) · bm25_shortlist_enabled false→true (8/8 on config-value queries)
[plr]: enabled false→true (soft-no-op without artifact)
[headroom]: enabled false→true (launcher-only)
Inverse (measured-zero, code wins): retrieval.sr_enabled, toml true→false (roadmap: SR measured at zero effect everywhere); the evidence is noted in the toml comment.
Allowlist (intentional divergence): synonym_map starter vocabulary only.
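The resolution policy above can be sketched as a small function over flattened `section.key` maps. This is an illustrative sketch, not the project's code: the name `reconcile`, the flat-dict representation, and the `synonym_map` example values are assumptions; only the field names and the "toml wins, measured-zero keeps code" rule come from this PR.

```python
from typing import Any


def reconcile(code_defaults: dict[str, Any],
              toml_values: dict[str, Any],
              measured_zero: set[str],
              allowlist: set[str]) -> dict[str, Any]:
    """Pick the agreed default for every drifted field (flat dotted keys)."""
    resolved: dict[str, Any] = {}
    for key, toml_value in toml_values.items():
        if key in allowlist or key not in code_defaults:
            continue  # intentional divergence, or a toml-only key
        code_value = code_defaults[key]
        if code_value == toml_value:
            continue  # already in agreement, nothing to resolve
        # Shipped toml wins by default; measured-zero features keep the code value.
        resolved[key] = code_value if key in measured_zero else toml_value
    return resolved


code = {"budget.expression_tokens": 6000, "retrieval.sr_enabled": False,
        "synonym_map": {}}
toml = {"budget.expression_tokens": 7000, "retrieval.sr_enabled": True,
        "synonym_map": {"example": ["vocabulary"]}}  # placeholder vocabulary
print(reconcile(code, toml,
                measured_zero={"retrieval.sr_enabled"},
                allowlist={"synonym_map"}))
# {'budget.expression_tokens': 7000, 'retrieval.sr_enabled': False}
```

The allowlist is checked before any comparison, so deliberate divergences never reach the policy at all.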

Also in this slice

  • Real KnowConfig: the [know] section was parsed by a shadow loader in know_calibration.py outside the config system, while CLAUDE.md advertised keys that don't exist. KnowConfig now lives in config.py (real keys: emit_floor, s_ref, g_ref, betas[6], calibrated_at, calibrated_on_n, stale_after_days), the shadow loader is a back-compat shim, /context consumes live config, CLAUDE.md [know]/[classifier] rows corrected.
  • Auto shard fan-out: with HELIX_SHARD_WORKERS unset and more than 4 routed shards, fan-out now sizes itself via auto_shard_workers() (previously unwired); an explicit env value always wins, and =1 forces the serial reference path. Evidence: the serial default measured 5 min/query at 829K genes / 100 shards vs ~55s at 8 workers (#206, "Wall-2: decide fate of unmerged dense-latency PRs #158/#160"). Determinism tests now pin serial explicitly.
  • Ratchet test: test_shipped_toml_matches_code_defaults — any future drift between config.py and helix.toml fails CI with a named field list.
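A minimal sketch of the worker-selection rule from the auto shard fan-out bullet, assuming the environment is passed in as a mapping so tests can pin serial without touching `os.environ`. `select_shard_workers` and `fallback_auto_workers` are hypothetical names, and the cap of 8 in the fallback is a placeholder, since the real `auto_shard_workers()` formula is not shown in this PR.

```python
import os
from typing import Callable, Mapping

# The PR text says auto fan-out applies when more than 4 shards are routed.
AUTO_FANOUT_MIN_SHARDS = 4


def fallback_auto_workers() -> int:
    """Hypothetical stand-in for parallel.auto_shard_workers(); placeholder cap of 8."""
    return min(8, os.cpu_count() or 1)


def select_shard_workers(n_routed_shards: int,
                         env: Mapping[str, str],
                         auto_workers: Callable[[], int] = fallback_auto_workers) -> int:
    raw = env.get("HELIX_SHARD_WORKERS", "").strip()
    if raw:
        # Explicit env always wins; "1" forces the serial reference path.
        # Clamping 0 or negative values to serial is an assumption of this sketch.
        return max(1, int(raw))
    if n_routed_shards > AUTO_FANOUT_MIN_SHARDS:
        return auto_workers()
    return 1  # small or monolithic stores stay serial
```

In production the call would look like `select_shard_workers(n_routed, os.environ)`; injecting `auto_workers` keeps a worker-selection matrix testable without depending on the host's CPU count.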

Test plan

  • 73 passed locally (honesty ratchet + config + shard_router); ~2,210 passed in the full suite in sandbox (5 pre-existing container failures, verified present on base)
  • Know back-compat shim covered; worker-selection matrix covered (7 tests)

mbachaud and others added 2 commits June 12, 2026 14:18
…nowConfig, auto shard fan-out (council Option B slice 1)

The 2026-06-12 council audit found HelixConfig() dataclass defaults and
the shipped helix.toml disagreeing on the product's actual behavior. A
field-by-field comparator (now a permanent regression test) found 20
drifted fields, not just the audit's headline five. Resolution policy:
the shipped toml is the operationally-tested product — every bench this
month ran those values — so CODE defaults were aligned to the TOML,
except measured-zero features, which align the other way.

Code defaults changed to the shipped toml values:
- budget.expression_tokens        6000   -> 7000
- budget.max_genes_per_turn       8      -> 12
- budget.splice_aggressiveness    0.5    -> 0.3
- budget.decoder_mode             "full" -> "condensed"
- budget.session_delivery_enabled false  -> true  (~40% token savings, multi-turn)
- ribosome.model                  "auto" -> "gemma4:e2b" (inert while disabled)
- ribosome.timeout                10.0   -> 120.0
- ribosome.warmup                 true   -> false
- ribosome.backend                "ollama" -> "none" (both dispatch DisabledBackend)
- ribosome.query_expansion_enabled true  -> false (LLM-free /context pillar)
- genome.path                     "genome.db" -> "genomes/main/genome.db" (CLAUDE.md already documented this as THE default)
- ingestion.backend               "ollama" -> "cpu" (the load_config coherence guard auto-flipped it anyway)
- ingestion.splade_enabled        false  -> true  (#164's 0pp result at 850K is scale-conditional; auto-disable knob covers the cliff; soft-no-op without torch)
- ingestion.rerank_model          ""     -> "cross-encoder/ms-marco-MiniLM-L-6-v2" (inert while rerank_enabled=false)
- ingestion.entity_graph          false  -> true
- retrieval.filename_anchor_enabled false -> true (+12pp Dewey axis-2, 2026-04-22)
- retrieval.bm25_shortlist_enabled  false -> true (+1/8 ans_full, 2026-04-22)
- plr.enabled                     false  -> true  (bench-gated #74; soft-no-op without artifact)
- headroom.enabled                false  -> true  (launcher-only; route_upstream stays false)
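The `ingestion.splade_enabled` flip leans on two guards: a scale-conditional auto-disable and a soft no-op when torch is absent. A sketch of how such a gate might compose, where `splade_active`, `SPLADE_AUTO_DISABLE_ABOVE`, and the hard 200K threshold are assumptions standing in for the #164/#189 knob, whose real name and semantics are not given here:

```python
import importlib.util
from typing import Callable, Optional

# Hypothetical threshold name; the PR cites a cost cliff above ~200K genes.
SPLADE_AUTO_DISABLE_ABOVE: Optional[int] = 200_000


def torch_installed() -> bool:
    # find_spec locates the package without importing it.
    return importlib.util.find_spec("torch") is not None


def splade_active(splade_enabled: bool,
                  n_genes: int,
                  auto_disable_above: Optional[int] = SPLADE_AUTO_DISABLE_ABOVE,
                  has_torch: Callable[[], bool] = torch_installed) -> bool:
    """Decide whether SPLADE actually runs for a genome of n_genes entries."""
    if not splade_enabled:
        return False
    if not has_torch():
        return False  # soft no-op: enabled in config, but the dependency is missing
    if auto_disable_above is not None and n_genes > auto_disable_above:
        return False  # scale-conditional auto-disable past the cost cliff
    return True
```

Keeping the default `true` while the gate handles scale and dependencies is what lets the toml and code agree without forcing SPLADE's cost onto very large stores.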

Inverse case (measured-zero feature, toml aligned to code):
- retrieval.sr_enabled: helix.toml true -> false. The evidence roadmap
  measured SR at zero retrieval effect, so the 2026-04-22 toml flip was
  reverted instead of propagated into code. Both sides now false.

Allowlisted divergence (documented, deliberate): synonym_map — the toml
ships starter vocabulary data; the dataclass default stays {}.

[know] folded into the config system: scoring/know_calibration.py was
shadow-parsing helix.toml outside config.py, and CLAUDE.md advertised
keys that did not exist (confidence_floor / margin_threshold). New
KnowConfig dataclass carries the REAL keys (emit_floor, s_ref, g_ref,
betas, calibrated_at, calibrated_on_n, stale_after_days) with the shadow
loader's per-field soft-fail semantics; load_calibration_from_toml is
now a thin back-compat shim over load_config + calibration_from_config,
and the /context know-block path consumes the manager's live config.
CLAUDE.md's [know] row lists the real keys; [classifier] row corrected
to "enabled toggle only; class caps are code constants pending #205".
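One way to get per-field soft-fail semantics on a dataclass, sketched under assumptions: the field names are the real keys listed above, but every default value, the coercion rules, and the `know_config_from_section` helper are hypothetical placeholders, not the project's calibrated values or loader.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Callable, Dict, List, Mapping, Optional


@dataclass
class KnowConfig:
    # Real key names from this PR; every default value below is a placeholder.
    emit_floor: float = 0.0
    s_ref: float = 1.0
    g_ref: float = 1.0
    betas: List[float] = field(default_factory=lambda: [0.0] * 6)
    calibrated_at: Optional[str] = None
    calibrated_on_n: int = 0
    stale_after_days: int = 30


def _six_floats(value: Any) -> List[float]:
    out = [float(x) for x in value]
    if len(out) != 6:
        raise ValueError("betas must have exactly 6 entries")
    return out


# Hypothetical coercion rule per field; a failure resets only that field.
_COERCE: Dict[str, Callable[[Any], Any]] = {
    "emit_floor": float,
    "s_ref": float,
    "g_ref": float,
    "betas": _six_floats,
    "calibrated_at": str,
    "calibrated_on_n": int,
    "stale_after_days": int,
}


def know_config_from_section(section: Mapping[str, Any]) -> KnowConfig:
    """Build KnowConfig from a parsed [know] table, soft-failing per field."""
    cfg = KnowConfig()
    for f in fields(KnowConfig):
        if f.name not in section:
            continue  # missing key: keep the default
        try:
            setattr(cfg, f.name, _COERCE[f.name](section[f.name]))
        except (TypeError, ValueError):
            pass  # malformed value: keep the default for this field only
    return cfg
```

Because a bad value only resets its own field, one malformed key in `[know]` cannot silently discard an otherwise valid calibration.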

Auto shard fan-out (issue #206): the serial fan-out default measured
5 min/query at 829K genes / 100 shards vs ~55s at 8 workers — a 5x+
latency tax for an env knob nobody knew existed. HELIX_SHARD_WORKERS
unset now auto-sizes via parallel.auto_shard_workers() when >4 shards
are routed; explicit env always wins; =1 forces the serial reference
path; small/monolithic stores stay serial. Determinism tests pin the
serial oracle explicitly via the env var instead of relying on the old
unset-means-serial default; parallel fan-out remains byte-identical by
construction (order-preserving map + deterministic merge).
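The byte-identical guarantee rests on two properties: an order-preserving map and a merge with a total order. A minimal standard-library sketch, assuming hits are `(gene_id, score)` tuples and ties break on `gene_id`; `fan_out`, `merge`, and the hit shape are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[str, float]  # hypothetical (gene_id, score) result shape


def fan_out(shards: Sequence[str],
            query_shard: Callable[[str], List[Hit]],
            workers: int) -> List[List[Hit]]:
    """Query every shard; per-shard results come back in input order."""
    if workers <= 1:
        return [query_shard(s) for s in shards]  # serial reference path
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, not completion order.
        return list(pool.map(query_shard, shards))


def merge(per_shard: List[List[Hit]], k: int) -> List[Hit]:
    """Deterministic top-k: score descending, gene_id ascending on ties."""
    hits = [hit for shard_hits in per_shard for hit in shard_hits]
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits[:k]
```

Even when shards finish out of order, the map preserves input order, and the `(-score, gene_id)` key leaves no ties for the merge to resolve arbitrarily, so the serial path works as an exact oracle for the parallel one.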

Tests: tests/test_config_default_honesty.py (drift ratchet with
explicit allowlist + KnowConfig load/back-compat); 7 new
shard_fanout_workers selection tests. 114 targeted + ~2200 full-suite
pass; the only failures are pre-existing container issues (sqlite3
getlimit on py3.10, missing transformers, one wall-clock perf gate),
verified present on the base commit.
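The drift ratchet can be sketched as a walk over nested dataclass defaults compared against the parsed toml, skipping an explicit allowlist and reporting the drifted field names. Everything here is a hypothetical miniature (`MiniConfig`, `drifted_fields`, and a `SHIPPED` dict standing in for the parsed helix.toml); a real test would parse the file, for example with `tomllib.load` on a binary file handle (Python 3.11+).

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, Dict, List, Mapping, Set

_MISSING = object()


# Hypothetical miniature of the config tree; the real HelixConfig is much larger.
@dataclass
class BudgetConfig:
    expression_tokens: int = 7000
    max_genes_per_turn: int = 12


@dataclass
class RetrievalConfig:
    sr_enabled: bool = False


@dataclass
class MiniConfig:
    budget: BudgetConfig = field(default_factory=BudgetConfig)
    retrieval: RetrievalConfig = field(default_factory=RetrievalConfig)
    synonym_map: Dict[str, List[str]] = field(default_factory=dict)


def code_defaults(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dataclass defaults into {"section.key": value}."""
    out: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        path = prefix + f.name
        if is_dataclass(value):
            out.update(code_defaults(value, path + "."))
        else:
            out[path] = value
    return out


def lookup(tree: Mapping[str, Any], dotted: str) -> Any:
    node: Any = tree
    for part in dotted.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return _MISSING
        node = node[part]
    return node


def drifted_fields(config: Any, shipped: Mapping[str, Any], allowlist: Set[str]) -> List[str]:
    drift = []
    for path, code_value in code_defaults(config).items():
        if path in allowlist:
            continue
        toml_value = lookup(shipped, path)
        # Keys the toml omits simply inherit the code default (assumption).
        if toml_value is not _MISSING and toml_value != code_value:
            drift.append(path)
    return sorted(drift)


ALLOWLIST = {"synonym_map"}  # starter vocabulary ships only in the toml
SHIPPED = {  # stand-in for the parsed helix.toml
    "budget": {"expression_tokens": 7000, "max_genes_per_turn": 12},
    "retrieval": {"sr_enabled": False},
    "synonym_map": {"example": ["vocabulary"]},
}


def test_shipped_toml_matches_code_defaults_sketch() -> None:
    drift = drifted_fields(MiniConfig(), SHIPPED, ALLOWLIST)
    assert not drift, f"code/toml default drift in: {drift}"
```

Reporting a sorted list of dotted names, rather than a bare boolean, is what lets a CI failure point straight at the fields that drifted.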
…efault honesty, static-vs-dynamic, chunk-value evidence
mbachaud merged commit fd87b40 into master on Jun 12, 2026
3 checks passed
mbachaud deleted the fix/config-default-honesty branch on June 12, 2026 21:41