upstream reference parity gap: multi-predictor (named-component) candidates

## Background

upstream reference Optimize Anything (upstream reference) supports **multi-component candidates**: each `Candidate` is a `dict[str, str]` keyed by **named predictors** (e.g. `{"summarizer": "<prompt>", "scorer": "<prompt>"}`).  Each predictor is independently mutable, and the evaluator can emit per-predictor objective scores under reserved `side_info` keys, which the adapter then namespaces into the multi-axis frontier.

Concretely, in `upstream reference-ai/upstream reference`:

- `src/upstream reference/adapters/optimize_anything_adapter/optimize_anything_adapter.py:266-270`:

    ```python
    for param_name in candidate.keys():
        if param_name + "_specific_info" in side_info and "scores" in side_info[param_name + "_specific_info"]:
            objective_score.update(
                {f"{param_name}::{k}": v for k, v in side_info[param_name + "_specific_info"]["scores"].items()}
            )
    ```

  This produces objective keys like `"summarizer::accuracy"` and `"scorer::latency"` — **distinct** axes on the Pareto frontier.

- `src/upstream reference/core/state.py:539-543` propagates per-predictor identity through `named_predictor_id_to_update_next_for_program_candidate`, which the proposer uses to round-robin which predictor to mutate next.

## HELIX gap

HELIX's data model is **whole git worktrees**: each `Candidate` carries `worktree_path` / `branch_name`, and mutations are LLM-driven file edits across the entire tree.  There is no built-in concept of "predictor X lives at file Y".

Verification (today, on `fix/upstream-frontier-acceptance` @ `d796f59`):

```
$ grep -rn 'specific_info\|named_predictor\|predictor_id' src/helix/
(no matches)
```

The `helix_result` parser (`src/helix/parsers/helix_result.py::_harvest_objective_scores`) reads only top-level `side_info["scores"]`.  Per-predictor namespacing is intentionally not replicated; it would be incoherent without an underlying multi-component candidate model.

## Why it matters

Without multi-predictor support, HELIX cannot:

1. **Optimise multi-stage pipelines independently.**  A summarise→score pipeline where each stage has its own objective (faithfulness for the summariser, calibration for the scorer) collapses into a single global objective.  The evaluator can still emit both numbers, but they share an axis on the frontier and dominate / are dominated together.
2. **Round-robin predictor mutation.**  upstream reference reflective mutation rewrites only one named predictor per generation.  HELIX's mutator targets the whole worktree per generation; large multi-file candidates have no built-in notion of "this generation, only touch the scorer's prompt".
3. **Match upstream reference parity end-to-end.**  Today HELIX has parity on the frontier algorithms (instance / objective / hybrid / cartesian), the validation gating, and the parser shape, but its *programming model* is still single-component: the upstream reference concepts of named predictors, per-predictor objectives, and per-predictor mutation have no HELIX counterpart.
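
To make point 1 concrete, here is a minimal sketch of how two candidates compare under Pareto dominance when the per-predictor objectives stay on separate axes versus when they are collapsed into one. The `dominates` and `collapse` helpers, the score values, and the choice of an unweighted mean as the collapse rule are all assumptions made for illustration; they are not taken from HELIX or from upstream reference.

```python
def dominates(a: dict[str, float], b: dict[str, float]) -> bool:
    """True if `a` is >= `b` on every axis and > on at least one (higher is better)."""
    missing = float("-inf")
    axes = a.keys() | b.keys()
    at_least_as_good = all(a.get(k, missing) >= b.get(k, missing) for k in axes)
    strictly_better = any(a.get(k, missing) > b.get(k, missing) for k in axes)
    return at_least_as_good and strictly_better


def collapse(scores: dict[str, float]) -> dict[str, float]:
    """Hypothetical single-axis collapse: unweighted mean of all objectives."""
    return {"score": sum(scores.values()) / len(scores)}


# Illustrative numbers only.
cand_a = {"summarizer::faithfulness": 0.9, "scorer::calibration": 0.4}
cand_b = {"summarizer::faithfulness": 0.6, "scorer::calibration": 0.8}

# Separate axes: each candidate wins on one objective, so both stay on the frontier.
separate = (dominates(cand_a, cand_b), dominates(cand_b, cand_a))

# Single axis: one candidate dominates and the other is pruned.
collapsed = (
    dominates(collapse(cand_a), collapse(cand_b)),
    dominates(collapse(cand_b), collapse(cand_a)),
)
```

With separate axes the strong summariser in `cand_a` survives on the frontier; once collapsed, it is dominated and lost even though no other candidate matches it on faithfulness.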

## Feasibility

This is feasible, but it is a **major architectural feature**, not a quick fix.  Sketch:

1. **Config**: add `[evolution.predictors]` section listing named editable regions (file paths, region markers, or symbol names).
2. **Candidate**: extend `Candidate` with a `predictors: dict[str, str]` snapshot (or pointers into the worktree) so mutation knows which predictor is "active".
3. **Mutator**: upstream reference-style round-robin — track `named_predictor_id_to_update_next_for_program_candidate[cid]` and target a single predictor's region per mutation.
4. **Parser**: harvest `side_info[param_name + "_specific_info"]["scores"]` and prefix as `{param_name}::{obj_name}`.  This is the cheapest piece: roughly 10 added lines in `_harvest_objective_scores`.
5. **State**: persist the per-candidate next-predictor index alongside `frontier`, `active_frontier`, `num_metric_calls_by_discovery`.
6. **Backward compat**: when `[evolution.predictors]` is unset, behaviour is exactly today's whole-worktree model.
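
Items 3 and 5 above could be sketched roughly as follows: a per-candidate cursor that cycles through the configured predictor names and exposes a plain dict for persistence. `PredictorRotation`, its methods, and the candidate ids are hypothetical names for illustration; the exact update semantics of upstream reference's `named_predictor_id_to_update_next_for_program_candidate` are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class PredictorRotation:
    """Hypothetical per-candidate round-robin cursor over named predictors."""

    predictor_names: list[str]
    next_index_by_candidate: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.predictor_names:
            raise ValueError("at least one predictor name is required")

    def pick(self, candidate_id: str) -> str:
        """Return the predictor to mutate for this candidate, then advance its cursor."""
        idx = self.next_index_by_candidate.get(candidate_id, 0)
        self.next_index_by_candidate[candidate_id] = (idx + 1) % len(self.predictor_names)
        return self.predictor_names[idx]

    def to_state(self) -> dict[str, int]:
        """Plain-dict snapshot suitable for persisting next to the frontier state."""
        return dict(self.next_index_by_candidate)


rotation = PredictorRotation(["summarizer", "scorer"])
picks = [rotation.pick("cand-1") for _ in range(3)]
# picks == ["summarizer", "scorer", "summarizer"]
```

Keeping the cursor per candidate (rather than global) means a freshly accepted candidate starts from the first predictor regardless of where its siblings are in the rotation; whether a child should instead inherit its parent's cursor is a design choice this sketch leaves open.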

## Suggested first step

Land the parser-side change (item 4) behind an off-by-default `evolution.predictors` config key.  Evaluators that already produce per-predictor scores would then stop having them silently flattened, even before the rest of the multi-component machinery exists.  The downstream consumer (the multi-axis frontier) already supports prefixed objective names.
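
A minimal sketch of what the gated harvest might look like, assuming the configured predictor names are passed in as an optional argument. The function name `harvest_objective_scores_sketch` and its signature are hypothetical and do not reflect the current `_harvest_objective_scores` in `src/helix/parsers/helix_result.py`; only the `_specific_info` / `scores` keys and the `::` prefix come from the upstream reference snippet quoted in Background.

```python
from collections.abc import Iterable, Mapping
from typing import Any, Optional


def harvest_objective_scores_sketch(
    side_info: Mapping[str, Any],
    predictor_names: Optional[Iterable[str]] = None,
) -> dict[str, float]:
    """Collect top-level scores, plus per-predictor scores when predictors are configured."""
    # Today's behaviour: only the top-level "scores" mapping is read.
    scores: dict[str, float] = dict(side_info.get("scores", {}))

    # Opt-in behaviour: namespace each predictor's scores as "<predictor>::<objective>".
    for name in predictor_names or ():
        specific = side_info.get(f"{name}_specific_info")
        if isinstance(specific, Mapping) and "scores" in specific:
            scores.update({f"{name}::{k}": v for k, v in specific["scores"].items()})
    return scores


example_side_info = {
    "scores": {"overall": 0.7},
    "summarizer_specific_info": {"scores": {"accuracy": 0.9}},
    "scorer_specific_info": {"scores": {"latency": 0.2}},
}

legacy = harvest_objective_scores_sketch(example_side_info)
namespaced = harvest_objective_scores_sketch(example_side_info, ["summarizer", "scorer"])
```

With no predictors configured the result is identical to the top-level harvest, which is what keeps the change backward compatible.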

## References

- `upstream reference-ai/upstream reference`: `src/upstream reference/optimize_anything.py:476` (default `frontier_type="hybrid"`)
- `upstream reference-ai/upstream reference`: `src/upstream reference/core/state.py:539-562` (per-predictor mutation tracking + objective aggregation)
- `upstream reference-ai/upstream reference`: `src/upstream reference/adapters/optimize_anything_adapter/optimize_anything_adapter.py:260-272` (per-predictor `_specific_info["scores"]` harvest)
- This repo: `src/helix/parsers/helix_result.py::_harvest_objective_scores` (single-axis harvest, callout in docstring)
- This repo: PR #20 (`fix/upstream-frontier-acceptance`) review thread surfaced this as the only remaining upstream reference parity gap


