upstream reference parity gap: multi-predictor (named-component) candidates #24

@KE7

Background

upstream reference Optimize Anything (upstream reference) supports multi-component candidates: each Candidate is a dict[str, str] keyed by named predictors (e.g. {"summarizer": "<prompt>", "scorer": "<prompt>"}). Each predictor is independently mutable, and the evaluator can emit per-predictor objective scores under reserved side_info keys, which the adapter then namespaces into the multi-axis frontier.

Concretely, in upstream reference-ai/upstream reference:

  • src/upstream reference/adapters/optimize_anything_adapter/optimize_anything_adapter.py:266-270:

    for param_name in candidate.keys():
        if param_name + "_specific_info" in side_info and "scores" in side_info[param_name + "_specific_info"]:
            objective_score.update(
                {f"{param_name}::{k}": v for k, v in side_info[param_name + "_specific_info"]["scores"].items()}
            )

    This produces objective keys like "summarizer::accuracy" and "scorer::latency", which become distinct axes on the Pareto frontier.

  • src/upstream reference/core/state.py:539-543 propagates per-predictor identity through named_predictor_id_to_update_next_for_program_candidate, which the proposer uses to round-robin which predictor to mutate next.
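
The round-robin selection described in the second bullet can be pictured with a small sketch. This is not the upstream code: the class name RoundRobinPredictorSelector and its select method are hypothetical, and the only behaviour assumed from upstream is "one named predictor per mutation, cycling in order, tracked per candidate".

```python
from __future__ import annotations

from collections import defaultdict


class RoundRobinPredictorSelector:
    """Hypothetical helper: cycles through a candidate's predictor names."""

    def __init__(self) -> None:
        # candidate id -> index of the predictor to mutate next
        self._next_index: dict[int, int] = defaultdict(int)

    def select(self, candidate_id: int, candidate: dict[str, str]) -> str:
        names = list(candidate.keys())  # dicts preserve insertion order
        if not names:
            raise ValueError("candidate has no predictors")
        idx = self._next_index[candidate_id] % len(names)
        self._next_index[candidate_id] = idx + 1
        return names[idx]


selector = RoundRobinPredictorSelector()
candidate = {"summarizer": "<prompt>", "scorer": "<prompt>"}
picks = [selector.select(0, candidate) for _ in range(3)]
print(picks)
```

Each candidate id keeps its own counter, so a freshly discovered candidate starts again from its first predictor.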

HELIX gap

HELIX's data model is whole git worktrees: each Candidate carries worktree_path / branch_name, and mutations are LLM-driven file edits across the entire tree. There is no built-in concept of "predictor X lives at file Y".

Verification (today, on fix/upstream-frontier-acceptance @ d796f59):

$ grep -rn 'specific_info\|named_predictor\|predictor_id' src/helix/
(no matches)

The helix_result parser (src/helix/parsers/helix_result.py::_harvest_objective_scores) reads only top-level side_info["scores"]. Per-predictor namespacing is intentionally not replicated; it would be incoherent without an underlying multi-component candidate model.

Why it matters

Without multi-predictor support, HELIX cannot:

  1. Optimise multi-stage pipelines independently. A summarise→score pipeline where each stage has its own objective (faithfulness for the summariser, calibration for the scorer) collapses into a single global objective. The evaluator can still emit both numbers, but they share an axis on the frontier and dominate / are dominated together.
  2. Round-robin predictor mutation. upstream reference's reflective mutation rewrites only one named predictor per generation. HELIX's mutator targets the whole worktree per generation; large multi-file candidates have no built-in notion of "this generation, only touch the scorer's prompt".
  3. Match upstream reference parity end-to-end. Today HELIX has parity on the frontier algorithms (instance / objective / hybrid / cartesian), the validation gating, and the parser shape, but its programming model is still single-component: it lacks the upstream reference model of named predictors, per-predictor objectives, and per-predictor mutation.
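
To make point 1 concrete, here is a minimal sketch of Pareto dominance over objective dicts. The dominates helper and the sample numbers are illustrative only, and the "collapsed" case assumes, purely for the example, that the shared axis is the mean of the two stage scores.

```python
from __future__ import annotations


def dominates(a: dict[str, float], b: dict[str, float]) -> bool:
    """True if a is >= b on every objective and > b on at least one.

    Assumes both dicts have the same keys and that higher is better.
    """
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


# Per-predictor axes: A has the better summariser, B the better scorer.
cand_a = {"summarizer::faithfulness": 0.9, "scorer::calibration": 0.6}
cand_b = {"summarizer::faithfulness": 0.7, "scorer::calibration": 0.7}

# Neither dominates, so both would stay on the frontier.
print(dominates(cand_a, cand_b), dominates(cand_b, cand_a))


def collapse(scores: dict[str, float]) -> dict[str, float]:
    """Illustrative assumption: fold all objectives into one mean score."""
    return {"score": sum(scores.values()) / len(scores)}


# Collapsed onto one axis, A dominates B and B's better scorer is lost.
print(dominates(collapse(cand_a), collapse(cand_b)))
```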

Feasibility

This is feasible, but it is a major architectural feature, not a quick fix. Sketch:

  1. Config: add [evolution.predictors] section listing named editable regions (file paths, region markers, or symbol names).
  2. Candidate: extend Candidate with a predictors: dict[str, str] snapshot (or pointers into the worktree) so mutation knows which predictor is "active".
  3. Mutator: upstream reference-style round-robin — track named_predictor_id_to_update_next_for_program_candidate[cid] and target a single predictor's region per mutation.
  4. Parser: harvest side_info[param_name + "_specific_info"]["scores"] and prefix as {param_name}::{obj_name}. This piece is the cheapest: it adds roughly 10 lines to _harvest_objective_scores.
  5. State: persist the per-candidate next-predictor index alongside frontier, active_frontier, num_metric_calls_by_discovery.
  6. Backward compat: when [evolution.predictors] is unset, behaviour is exactly today's whole-worktree model.
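
Items 2 and 6 could be sketched roughly as follows. The Candidate class here is a simplified stand-in that models only worktree_path and branch_name from the real one; the predictors field, the snapshot_predictors helper, and the file names are hypothetical. An empty predictors dict stands for today's whole-worktree behaviour.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Candidate:
    """Simplified stand-in for HELIX's Candidate (not the real class)."""

    worktree_path: Path
    branch_name: str
    # Hypothetical: predictor name -> text snapshot of its editable region.
    predictors: dict[str, str] = field(default_factory=dict)


def snapshot_predictors(worktree: Path, predictor_files: dict[str, str]) -> dict[str, str]:
    """Read each named predictor's file (path relative to the worktree)."""
    return {
        name: (worktree / rel).read_text(encoding="utf-8")
        for name, rel in predictor_files.items()
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "summarizer.txt").write_text("Summarise the input.", encoding="utf-8")
    (root / "scorer.txt").write_text("Score the summary.", encoding="utf-8")

    files = {"summarizer": "summarizer.txt", "scorer": "scorer.txt"}
    multi = Candidate(root, "helix/cand-0", snapshot_predictors(root, files))
    # No predictors configured: behaves like today's whole-worktree candidate.
    legacy = Candidate(root, "helix/cand-1")

print(sorted(multi.predictors), legacy.predictors)
```

Storing text snapshots is only one option; pointers into the worktree, as item 2 also suggests, would avoid duplicating file contents.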

Suggested first step

Land the parser-side change (item 4) behind an off-by-default evolution.predictors config key. Evaluators that already produce per-predictor scores would then have those scores reach the frontier as distinct axes instead of being silently flattened, even before the rest of the multi-component machinery exists. The downstream consumer (the multi-axis frontier) already supports prefixed objective names.
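
A minimal sketch of what that parser-side change might look like. The function below is a stand-in, not the real _harvest_objective_scores (whose signature is not shown here); the predictor_names parameter is an assumption about how the evolution.predictors config would be passed in.

```python
from __future__ import annotations

from typing import Any, Iterable


def harvest_objective_scores(
    side_info: dict[str, Any],
    predictor_names: Iterable[str] = (),
) -> dict[str, float]:
    """Collect top-level scores, plus namespaced per-predictor scores if enabled."""
    scores: dict[str, float] = dict(side_info.get("scores", {}))
    # predictor_names is empty by default, which keeps today's behaviour.
    for name in predictor_names:
        specific = side_info.get(f"{name}_specific_info")
        if isinstance(specific, dict) and isinstance(specific.get("scores"), dict):
            scores.update({f"{name}::{k}": v for k, v in specific["scores"].items()})
    return scores


side_info = {
    "scores": {"overall": 0.8},
    "summarizer_specific_info": {"scores": {"accuracy": 0.9}},
    "scorer_specific_info": {"scores": {"latency": 0.2}},
}

off = harvest_objective_scores(side_info)
on = harvest_objective_scores(side_info, ["summarizer", "scorer"])
print(off)
print(on)
```

With the key unset, only the top-level scores are returned, so existing evaluators see no change.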

References

  • upstream reference-ai/upstream reference: src/upstream reference/optimize_anything.py:476 (default frontier_type="hybrid")
  • upstream reference-ai/upstream reference: src/upstream reference/core/state.py:539-562 (per-predictor mutation tracking + objective aggregation)
  • upstream reference-ai/upstream reference: src/upstream reference/adapters/optimize_anything_adapter/optimize_anything_adapter.py:260-272 (per-predictor _specific_info["scores"] harvest)
  • This repo: src/helix/parsers/helix_result.py::_harvest_objective_scores (single-axis harvest, callout in docstring)
  • This repo: PR #20, "Enforce upstream reference frontier and acceptance semantics" (fix/upstream-frontier-acceptance): the review thread surfaced this as the only remaining upstream reference parity gap
