Releases: KE7/helix
v0.2.1
Note: 0.2.0 was published from a broken build and is superseded by this
release. Do not install 0.2.0; use 0.2.1 or later.
Removed
- Orphan helix.toml template at the repository root. The runtime
template that helix init writes lives inline in
src/helix/cli.py::_HELIX_TOML_TEMPLATE; the repo-root file was a
stale stub that no docs or code referenced.
Packaging
- Tightened the sdist via [tool.hatch.build.targets.sdist]. The
source distribution now ships only src/helix, pyproject.toml,
uv.lock, README.md, and LICENSE (mirrors GEPA's lean PyPI
layout, minus tests/). Wheel contents are unchanged.
Added
- Multi-axis Pareto frontier (GEPA FrontierType parity,
src/gepa/core/state.py:22-23). New
evolution.frontier_type: Literal["instance", "objective", "hybrid", "cartesian"]
with default "hybrid", matching GEPA's own optimize_anything default
(src/gepa/optimize_anything.py:476). ParetoFrontier now tracks
per-objective-name and per-(val_id, objective_name) best sets alongside
the existing per-example-id tracking, and get_non_dominated() /
select_parent() dispatch on frontier_type. The acceptance gate is
unchanged and stays positional on scores_list.
- EvalResult.per_example_side_info (list[dict[str, Any]] | None):
per-example diagnostic dicts from the new helix_result contract,
positional to instance_scores by helix_batch.json id order.
GEPA analogue: EvaluationBatch.trajectories
(src/gepa/core/adapter.py:25).
- EvalResult.objective_scores (list[dict[str, float]] | None):
per-example objective-axis harvest from side_info["scores"]. GEPA
analogue: EvaluationBatch.objective_scores
(src/gepa/core/adapter.py:26). Feeds
frontier_type ∈ {"objective", "hybrid", "cartesian"}; harmless on the
"instance" path.
- DatasetConfig.train_size / val_size: cardinality-only fields that drive
the minibatch sampler when the evaluator owns the dataset (Architecture A
example-id handoff). HELIX writes sampled example ids to
{worktree}/helix_batch.json; the evaluator filters its own dataset.
- SeedlessConfig: new section carrying enabled plus the optional
prompt-grounding train_path / val_path used during seed generation.
- Evaluator-owned dataset integration wired into Architecture A with
cardinality-only train_size / val_size.
- load_config now emits a dedicated hint for pydantic extra_forbidden
validation errors, pointing users at likely typos or misplaced keys.
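
As a rough illustration of how the four frontier_type values differ, the sketch below computes per-key "best sets" for two toy candidates. It is not HELIX's ParetoFrontier. The function names (frontier_values, best_sets), the id-keyed dict inputs, the reading of "hybrid" as the union of instance and objective keys, and the use of a per-objective mean as the aggregate are all assumptions made for this example.

```python
from collections import defaultdict

FRONTIER_TYPES = ("instance", "objective", "hybrid", "cartesian")


def frontier_values(frontier_type, instance_scores, objective_scores):
    """Map frontier keys to one candidate's value on each key (hypothetical helper).

    instance_scores:  {val_id: score}
    objective_scores: {val_id: {objective_name: score}}
    """
    if frontier_type not in FRONTIER_TYPES:
        raise ValueError(f"unknown frontier_type: {frontier_type!r}")
    values = {}
    if frontier_type in ("instance", "hybrid"):
        for val_id, score in instance_scores.items():
            values[("instance", val_id)] = score
    if frontier_type in ("objective", "hybrid"):
        per_objective = defaultdict(list)
        for scores in objective_scores.values():
            for name, value in scores.items():
                per_objective[name].append(value)
        for name, vals in per_objective.items():
            # Assumption: each objective is aggregated by its mean over examples.
            values[("objective", name)] = sum(vals) / len(vals)
    if frontier_type == "cartesian":
        for val_id, scores in objective_scores.items():
            for name, value in scores.items():
                values[("cartesian", val_id, name)] = value
    return values


def best_sets(frontier_type, candidates):
    """Return {frontier key: set of candidate names tied for the best value}."""
    best = {}  # key -> (best value so far, names achieving it)
    for name, (instance_scores, objective_scores) in candidates.items():
        values = frontier_values(frontier_type, instance_scores, objective_scores)
        for key, value in values.items():
            if key not in best or value > best[key][0]:
                best[key] = (value, {name})
            elif value == best[key][0]:
                best[key][1].add(name)
    return {key: names for key, (_, names) in best.items()}


# Toy data: two candidates, two validation ids, two objectives.
CANDIDATES = {
    "a": ({"0": 1.0, "1": 0.0},
          {"0": {"acc": 1.0, "speed": 0.2}, "1": {"acc": 0.0, "speed": 0.4}}),
    "b": ({"0": 0.5, "1": 0.5},
          {"0": {"acc": 0.5, "speed": 0.9}, "1": {"acc": 0.5, "speed": 0.9}}),
}

for frontier_type in FRONTIER_TYPES:
    print(frontier_type, len(best_sets(frontier_type, CANDIDATES)), "frontier keys")
```

In this toy setup the "instance" frontier has one key per validation id, "objective" has one key per objective name, "hybrid" has both, and "cartesian" has one key per (val_id, objective_name) pair, which is why the objective-bearing modes need per-example objective_scores to be present.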
Changed
- BREAKING:
score_parser = "helix_result" now rejects malformed
side_info["scores"] payloads instead of silently dropping fields.
Non-dict "scores" values, non-string objective names, and non-finite
or non-numeric objective values now raise EvaluatorError at parse
time. Previously these were dropped (with a logged warning at most),
which let evaluators stuff arbitrary diagnostics under "scores",
including pairwise / Bradley-Terry payloads, and then misread the
result as scalar objective semantics. Pairwise / BT payloads are
not implemented in HELIX yet; emit them under a different
side_info key. Migration: ensure side_info["scores"] is a
dict[str, float|int|bool] with finite numeric values, or omit the
key entirely.
- BREAKING: Non-"instance" evolution.frontier_type values now
fail loudly when an EvalResult lacks per-example objective_scores,
has length-mismatched objective_scores, or has all-empty objective
slots. ParetoFrontier.add() raises the new
MissingObjectiveScoresError instead of silently degenerating to
scalar / instance-axis behaviour. select_parent() likewise raises
rather than falling back to all-candidate sampling on objective-bearing
modes. The "instance" path keeps its existing fallback semantics.
- BREAKING:
evolution.cache_evaluation now defaults to False
(previously True). Matches GEPA Optimize Anything's conservative
cache_evaluation default
(src/gepa/optimize_anything.py:476). When the cache is enabled,
entries are now keyed by candidate content (the worktree's
HEAD^{tree} SHA, with a clean-state guard) rather than HELIX's
lineage candidate.id, so equivalent candidates can reuse results
across re-derivations. Configs that previously relied on cache hits
(e.g. resume scenarios that re-evaluate the seed) should set
cache_evaluation = true explicitly.
- BREAKING:
score_parser = "helix_result" now takes a per-example
HELIX_RESULT=[[score_0, side_info_0], [score_1, side_info_1], ...]
payload: one [score, side_info] pair per id in helix_batch.json.
HELIX zips it into instance_scores and stores side_info_i on
EvalResult.per_example_side_info for the reflection prompt. GEPA
optimize_anything parity (src/gepa/optimize_anything.py:387-438).
The previous scalar-plus-id-keyed-dict contract is removed: it
silently failed the minibatch gate whenever the evaluator keyed its
dict by aggregate metric names (task__metric) instead of per-example
ids (task__trialN). Migration: replace
HELIX_RESULT=[mean, {"scores": {id_i: v_i, ...}, ...}] with
HELIX_RESULT=[[v_0, {"info": "..."}], [v_1, {...}], ...].
- helix.executor.run_evaluator now emits a WARNING log line when the
post-filter zero-fills any requested id (naming the count and a sample
of up to five). Non-breaking: behaviour is unchanged, only
observability improves. Catches parser / id-keying mismatches on
parsers other than helix_result (e.g. exitcode plus instance_ids).
- BREAKING:
seedless is now a section ([seedless] with enabled = …),
not a top-level boolean.
- BREAKING:
dataset.train_path / dataset.val_path have moved to
seedless.train_path / seedless.val_path. [dataset] is now
cardinality-only (train_size / val_size).
- BREAKING:
helix_batch.json payload shape is now list[str] instead
of list[int]. Example ids flow through the Architecture A evaluator
handoff as opaque strings: the default _RangeDataLoader emits "0",
"1", …, and StratifiedBatchSampler emits task-prefixed ids like
"group_alpha__case_3". Evaluators that previously read the handoff as
list[int] must cast on their side
([int(s) for s in json.loads(Path("helix_batch.json").read_text())])
or switch to string-keyed lookup. Unblocks the stratified sampler on
Architecture A, which previously died with
ValueError: invalid literal for int() at the serialization boundary.
- BREAKING: All pydantic sub-models in
src/helix/config.py
(EvaluatorConfig, DatasetConfig, SeedlessConfig, EvolutionConfig,
ClaudeConfig, WorktreeConfig, HelixConfig) now use
model_config = ConfigDict(extra="forbid"). Unknown / misplaced /
mistyped TOML keys raise a validation error at load time instead of
being silently dropped. Previously, placing batch_sampler under
[evaluator] (the key lives on [evolution]) silently left users on
the default sampler with no warning.
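
To make the evaluator-side migration concrete, here is a minimal sketch of an Architecture A evaluator under the new contracts: it reads helix_batch.json as a list of string ids, scores only those examples, and formats one [score, side_info] pair per id in the same order. Several details are assumptions rather than documented behaviour: that the HELIX_RESULT= line is written to stdout, that the payload is JSON-encoded, and every name defined here (DATASET, candidate_output, load_batch_ids, check_scores, build_result, format_result_line) is hypothetical. The local check_scores helper only mirrors the constraints described above; the actual EvaluatorError is raised on the HELIX side.

```python
import json
import math
from pathlib import Path

# Hypothetical evaluator-owned dataset: string id -> (input, expected output).
DATASET = {
    "0": ("abc", "ABC"),
    "1": ("Helix", "HELIX"),
    "2": ("ok", "OK!"),
}


def candidate_output(text: str) -> str:
    """Stand-in for the system being evaluated (hypothetical)."""
    return text.upper()


def load_batch_ids(worktree: Path) -> list[str]:
    """Read the sampled example ids; they are opaque strings, not ints."""
    ids = json.loads((worktree / "helix_batch.json").read_text())
    return [str(example_id) for example_id in ids]


def check_scores(scores: dict) -> dict:
    """Local pre-check: string names, finite numeric (or bool) values."""
    for name, value in scores.items():
        if not isinstance(name, str):
            raise ValueError(f"objective name must be a string: {name!r}")
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            raise ValueError(f"objective {name!r} must be finite and numeric")
    return scores


def build_result(ids: list[str]) -> list[list]:
    """Return one [score, side_info] pair per id, in helix_batch.json order."""
    pairs = []
    for example_id in ids:
        text, expected = DATASET[example_id]
        got = candidate_output(text)
        exact = float(got == expected)
        side_info = {
            "scores": check_scores({
                "exact_match": exact,
                "length_match": float(len(got) == len(expected)),
            }),
            "info": f"id={example_id} expected={expected!r} got={got!r}",
        }
        pairs.append([exact, side_info])
    return pairs


def format_result_line(pairs: list[list]) -> str:
    """Assumption: HELIX parses a JSON payload after the HELIX_RESULT= prefix."""
    return "HELIX_RESULT=" + json.dumps(pairs)
```

An evaluator script built this way would finish with something like print(format_result_line(build_result(load_batch_ids(Path("."))))). Keeping the pairs in batch order matters because the payload is positional; pairwise or other non-scalar diagnostics would go under a side_info key other than "scores".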