Background
upstream reference Optimize Anything (upstream reference) supports multi-component candidates: each Candidate is a dict[str, str] keyed by named predictors (e.g. {"summarizer": "<prompt>", "scorer": "<prompt>"}). Each predictor is independently mutable, and the evaluator can emit per-predictor objective scores under reserved side_info keys, which the adapter then namespaces into the multi-axis frontier.
Concretely, in upstream reference-ai/upstream reference:
- src/upstream reference/adapters/optimize_anything_adapter/optimize_anything_adapter.py:266-270 namespaces the per-predictor scores. This produces objective keys like "summarizer::accuracy" and "scorer::latency", which are distinct axes on the Pareto frontier.
- src/upstream reference/core/state.py:539-543 propagates per-predictor identity through named_predictor_id_to_update_next_for_program_candidate, which the proposer uses to round-robin which predictor to mutate next.
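To make the round-robin idea concrete, here is a minimal sketch. It is not the upstream code: `next_predictor_idx` and `pick_predictor` are hypothetical names standing in for the per-candidate counter described above, and the candidate is the example dict from the text.

```python
from typing import Dict, List

# Hypothetical stand-in for the per-candidate counter that the text calls
# named_predictor_id_to_update_next_for_program_candidate.
next_predictor_idx: Dict[int, int] = {}


def pick_predictor(candidate_id: int, predictor_names: List[str]) -> str:
    """Return the predictor to mutate now and advance this candidate's counter."""
    idx = next_predictor_idx.get(candidate_id, 0)
    name = predictor_names[idx % len(predictor_names)]
    next_predictor_idx[candidate_id] = (idx + 1) % len(predictor_names)
    return name


# A multi-component candidate: one entry per named predictor.
candidate = {"summarizer": "<prompt>", "scorer": "<prompt>"}
names = list(candidate)  # dicts preserve insertion order

# Three successive mutations of candidate 0 cycle through the predictors.
picks = [pick_predictor(0, names) for _ in range(3)]
```

Each call touches exactly one predictor, so successive generations alternate between the summarizer and the scorer instead of rewriting both at once.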
HELIX gap
HELIX's data model is whole git worktrees: each Candidate carries worktree_path / branch_name, and mutations are LLM-driven file edits across the entire tree. There is no built-in concept of "predictor X lives at file Y".
Verification (today, on fix/upstream-frontier-acceptance @ d796f59):
$ grep -rn 'specific_info\|named_predictor\|predictor_id' src/helix/
(no matches)
The helix_result parser (src/helix/parsers/helix_result.py::_harvest_objective_scores) reads only top-level side_info["scores"]. Per-predictor namespacing is intentionally not replicated; it would be incoherent without an underlying multi-component candidate model.
Why it matters
Without multi-predictor support, HELIX cannot:
- Optimise multi-stage pipelines independently. A summarise→score pipeline where each stage has its own objective (faithfulness for the summariser, calibration for the scorer) collapses into a single global objective. The evaluator can still emit both numbers, but they share an axis on the frontier and dominate or are dominated together.
- Round-robin predictor mutation. upstream reference's reflective mutation rewrites only one named predictor per generation. HELIX's mutator targets the whole worktree each generation, so large multi-file candidates have no built-in notion of "this generation, only touch the scorer's prompt".
- Reach upstream reference parity end-to-end. Today HELIX has parity on the frontier algorithms (instance / objective / hybrid / cartesian), the validation gating, and the parser shape, but its programming model is still single-component: it lacks upstream reference's named predictors, per-predictor objectives, and per-predictor mutation.
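The first point can be illustrated with a small dominance check. This is a sketch under stated assumptions: the scores are invented, higher is taken to be better, and collapsing by mean is only one hypothetical way the two numbers could end up on a shared axis.

```python
from typing import Dict

Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def collapse(scores: Scores) -> Scores:
    """Hypothetical flattening: average every objective onto one shared axis."""
    return {"score": sum(scores.values()) / len(scores)}


# Invented scores for two candidates, using the prefixed key shape from the text.
cand_a = {"summarizer::faithfulness": 0.9, "scorer::calibration": 0.4}
cand_b = {"summarizer::faithfulness": 0.6, "scorer::calibration": 0.8}

# With separate axes neither candidate dominates, so both stay on the frontier.
either_dominates = dominates(cand_a, cand_b) or dominates(cand_b, cand_a)

# On a single shared axis, cand_b (about 0.7) beats cand_a (about 0.65).
b_wins_when_collapsed = dominates(collapse(cand_b), collapse(cand_a))
```

With per-predictor axes the frontier keeps the candidate with the stronger summariser alongside the one with the stronger scorer; on one axis the first is discarded.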
Feasibility
This is feasible, but it is a major architectural feature, not a quick fix. Sketch:
1. Config: add an [evolution.predictors] section listing named editable regions (file paths, region markers, or symbol names).
2. Candidate: extend Candidate with a predictors: dict[str, str] snapshot (or pointers into the worktree) so mutation knows which predictor is "active".
3. Mutator: upstream reference-style round-robin. Track named_predictor_id_to_update_next_for_program_candidate[cid] and target a single predictor's region per mutation.
4. Parser: harvest side_info[param_name + "_specific_info"]["scores"] and prefix as {param_name}::{obj_name}. This piece is the cheapest: roughly 10 lines added to _harvest_objective_scores.
5. State: persist the per-candidate next-predictor index alongside frontier, active_frontier, num_metric_calls_by_discovery.
6. Backward compat: when [evolution.predictors] is unset, behaviour is exactly today's whole-worktree model.
Suggested first step
Land the parser-side change (item 4) behind an off-by-default evolution.predictors config key. This stops evaluators that already produce per-predictor scores from having them silently flattened, even before the rest of the multi-component machinery exists. The downstream consumer (the multi-axis frontier) already supports prefixed objective names.
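A minimal sketch of that parser-side change, assuming the side_info shape described in this issue. `harvest_scores` is a hypothetical stand-in, not the real _harvest_objective_scores, and the `predictors` argument plays the role of the off-by-default config key.

```python
from typing import Any, Dict, Iterable

SUFFIX = "_specific_info"  # reserved side_info key suffix named in this issue


def harvest_scores(
    side_info: Dict[str, Any], predictors: Iterable[str] = ()
) -> Dict[str, float]:
    """Collect top-level scores, plus namespaced per-predictor scores if enabled."""
    # Today's behaviour: only the top-level "scores" mapping is read.
    scores = {k: float(v) for k, v in side_info.get("scores", {}).items()}
    # Opt-in behaviour: prefix each per-predictor objective as "<name>::<objective>".
    for name in predictors:
        specific = side_info.get(name + SUFFIX) or {}
        for obj, value in specific.get("scores", {}).items():
            scores[f"{name}::{obj}"] = float(value)
    return scores


# Invented evaluator output for illustration.
side_info = {
    "scores": {"overall": 0.7},
    "summarizer_specific_info": {"scores": {"accuracy": 0.9}},
    "scorer_specific_info": {"scores": {"latency": 0.2}},
}

flat = harvest_scores(side_info)  # no predictors configured
namespaced = harvest_scores(side_info, ["summarizer", "scorer"])
```

With no predictors configured the result matches the current single-axis harvest; with them, each per-predictor objective gets its own prefixed key.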
References
- upstream reference-ai/upstream reference: src/upstream reference/optimize_anything.py:476 (default frontier_type="hybrid")
- upstream reference-ai/upstream reference: src/upstream reference/core/state.py:539-562 (per-predictor mutation tracking + objective aggregation)
- upstream reference-ai/upstream reference: src/upstream reference/adapters/optimize_anything_adapter/optimize_anything_adapter.py:260-272 (per-predictor _specific_info["scores"] harvest)
- src/helix/parsers/helix_result.py::_harvest_objective_scores (single-axis harvest, callout in docstring)
- (fix/upstream-frontier-acceptance) review thread surfaced this as the only remaining upstream reference parity gap