Releases · KE7/helix

Note: 0.2.0 was published from a broken build and is superseded by this
release. Do not install 0.2.0; use 0.2.1 or later.

Removed

Orphan helix.toml template at the repository root. The runtime
template that helix init writes lives inline in
src/helix/cli.py::_HELIX_TOML_TEMPLATE; the repo-root file was a
stale stub no docs or code referenced.

Packaging

Tightened the sdist via [tool.hatch.build.targets.sdist]. The
source distribution now ships only src/helix, pyproject.toml,
uv.lock, README.md, and LICENSE (mirrors GEPA's lean PyPI
layout, minus tests/). Wheel contents unchanged.

Added

Multi-axis Pareto frontier (GEPA FrontierType parity,
src/gepa/core/state.py:22-23). New
evolution.frontier_type: Literal["instance", "objective", "hybrid", "cartesian"]
with default "hybrid", matching GEPA's own optimize_anything default
(src/gepa/optimize_anything.py:476). ParetoFrontier now tracks
per-objective-name and per-(val_id, objective_name) best sets alongside
the existing per-example-id tracking, and get_non_dominated() /
select_parent() dispatch on frontier_type. The acceptance gate is
unchanged and stays positional on scores_list.

EvalResult.per_example_side_info: list[dict[str, Any]] | None.
Per-example diagnostic dicts from the new helix_result contract,
positional to instance_scores by helix_batch.json id order.
GEPA analogue: EvaluationBatch.trajectories
(src/gepa/core/adapter.py:25).

EvalResult.objective_scores: list[dict[str, float]] | None.
Per-example objective-axis harvest from side_info["scores"]. GEPA
analogue: EvaluationBatch.objective_scores
(src/gepa/core/adapter.py:26). Feeds
frontier_type ∈ {"objective", "hybrid", "cartesian"}; harmless on the
"instance" path.

DatasetConfig.train_size / val_size. Cardinality-only fields that drive
the minibatch sampler when the evaluator owns the dataset (Architecture A
example-id handoff). HELIX writes sampled example ids to
{worktree}/helix_batch.json; the evaluator filters its own dataset.

SeedlessConfig. New section carrying enabled plus the optional
prompt-grounding train_path / val_path used during seed generation.

Evaluator-owned dataset integration wired into Architecture A with
cardinality-only train_size / val_size.

load_config now emits a dedicated hint for pydantic extra_forbidden
validation errors, pointing users at likely typos or misplaced keys.
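
To make the four frontier_type values concrete, here is a minimal sketch of which keys a candidate competes on in each mode. It is not HELIX's ParetoFrontier implementation: frontier_entries, best_sets, and the sample data are hypothetical names, the per-objective aggregation (a mean over examples) is an assumption, and "hybrid" is assumed to combine the instance and objective axes. Real non-dominated filtering is more involved than the plain "best holders per key" shown here.

```python
from collections import defaultdict
from statistics import mean


def frontier_entries(frontier_type, example_ids, instance_scores, objective_scores):
    """Map one candidate's scores onto the frontier keys it competes on."""
    entries = {}
    if frontier_type in ("instance", "hybrid"):
        # One key per example id, scored by the per-example scalar.
        for ex_id, score in zip(example_ids, instance_scores):
            entries[("instance", ex_id)] = score
    if frontier_type in ("objective", "hybrid"):
        # Assumption: each objective is averaged over the evaluated examples.
        per_objective = defaultdict(list)
        for scores in objective_scores:
            for name, value in scores.items():
                per_objective[name].append(value)
        for name, values in per_objective.items():
            entries[("objective", name)] = mean(values)
    if frontier_type == "cartesian":
        # One key per (example id, objective name) pair.
        for ex_id, scores in zip(example_ids, objective_scores):
            for name, value in scores.items():
                entries[("cartesian", ex_id, name)] = value
    return entries


def best_sets(candidates, frontier_type, example_ids):
    """Return, for every frontier key, the set of candidates tied for best."""
    best = {}  # key -> (best value seen, names of candidates holding it)
    for cand, (instance_scores, objective_scores) in candidates.items():
        entries = frontier_entries(
            frontier_type, example_ids, instance_scores, objective_scores
        )
        for key, value in entries.items():
            top, holders = best.get(key, (float("-inf"), set()))
            if value > top:
                best[key] = (value, {cand})
            elif value == top:
                holders.add(cand)
    return {key: holders for key, (_, holders) in best.items()}


# Hypothetical data: two candidates, two examples, two objectives.
EXAMPLE_IDS = ["0", "1"]
CANDIDATES = {
    "a": ([1.0, 0.0], [{"acc": 1.0, "speed": 0.2}, {"acc": 0.0, "speed": 0.4}]),
    "b": ([0.0, 1.0], [{"acc": 0.0, "speed": 0.9}, {"acc": 1.0, "speed": 0.9}]),
}
```

Calling best_sets(CANDIDATES, "instance", EXAMPLE_IDS) yields one key per example id, while "cartesian" yields one key per (example id, objective name) pair, which is why the objective-bearing modes need objective_scores to be populated.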

Changed

BREAKING: score_parser = "helix_result" now rejects malformed
side_info["scores"] payloads instead of silently dropping fields.
Non-dict "scores" values, non-string objective names, and non-finite
/ non-numeric objective values now raise EvaluatorError at parse
time. Previously these were dropped (with a logged warning at most),
which let evaluators stuff arbitrary diagnostics under "scores"
(including pairwise / Bradley-Terry payloads) and then misread the
result as scalar objective semantics. Pairwise / BT payloads are
not implemented in HELIX yet; emit them under a different
side_info key. Migration: ensure side_info["scores"] is a
dict[str, float|int|bool] with finite numeric values, or omit the
key entirely.

BREAKING: Non-"instance" evolution.frontier_type values now
fail loudly when an EvalResult lacks per-example objective_scores,
has length-mismatched objective_scores, or has all-empty objective
slots. ParetoFrontier.add() raises the new
MissingObjectiveScoresError instead of silently degenerating to
scalar / instance-axis behaviour. select_parent() likewise raises
rather than falling back to all-candidate sampling on objective-bearing
modes. The "instance" path keeps its existing fallback semantics.

BREAKING: evolution.cache_evaluation now defaults to False
(previously True). Matches GEPA Optimize Anything's conservative
cache_evaluation default
(src/gepa/optimize_anything.py:476). When the cache is enabled,
entries are now keyed by candidate content (the worktree's
HEAD^{tree} SHA, with a clean-state guard) rather than HELIX's
lineage candidate.id, so equivalent candidates can reuse results
across re-derivations. Configs that previously relied on cache hits
(e.g. resume scenarios that re-evaluate the seed) should set
cache_evaluation = true explicitly.

BREAKING: score_parser = "helix_result" now takes a per-example
HELIX_RESULT=[[score_0, side_info_0], [score_1, side_info_1], ...]
payload: one [score, side_info] pair per id in helix_batch.json.
HELIX zips it into instance_scores and stores side_info_i on
EvalResult.per_example_side_info for the reflection prompt. GEPA
optimize_anything parity (src/gepa/optimize_anything.py:387-438).
The previous scalar-plus-id-keyed-dict contract is removed, because it
silently failed the minibatch gate whenever the evaluator keyed its
dict by aggregate metric names (task__metric) instead of per-example
ids (task__trialN). Migration: replace
HELIX_RESULT=[mean, {"scores": {id_i: v_i, ...}, ...}] with
HELIX_RESULT=[[v_0, {"info": "..."}], [v_1, {...}], ...].

helix.executor.run_evaluator now emits a WARNING log line when the
post-filter zero-fills any requested id (naming the count and a sample
of up to five). Non-breaking: behaviour is unchanged, only
observability improves. Catches parser / id-keying mismatches on
parsers other than helix_result (e.g. exitcode plus instance_ids).

BREAKING: seedless is now a section ([seedless] with enabled = …),
not a top-level boolean.

BREAKING: dataset.train_path / dataset.val_path have moved to
seedless.train_path / seedless.val_path. [dataset] is now
cardinality-only (train_size / val_size).

BREAKING: helix_batch.json payload shape is now list[str] instead
of list[int]. Example ids flow through the Architecture A evaluator
handoff as opaque strings: the default _RangeDataLoader emits "0",
"1", …, and StratifiedBatchSampler emits task-prefixed ids like
"group_alpha__case_3". Evaluators that previously read the handoff as
list[int] must cast on their side
([int(s) for s in json.loads(Path("helix_batch.json").read_text())])
or switch to string-keyed lookup. Unblocks the stratified sampler on
Architecture A, which previously died with
ValueError: invalid literal for int() at the serialization boundary.

BREAKING: All pydantic sub-models in src/helix/config.py
(EvaluatorConfig, DatasetConfig, SeedlessConfig, EvolutionConfig,
ClaudeConfig, WorktreeConfig, HelixConfig) now use
model_config = ConfigDict(extra="forbid"). Unknown / misplaced /
mistyped TOML keys raise a validation error at load time instead of
being silently dropped. Previously, placing batch_sampler under
[evaluator] (the key lives on [evolution]) silently left users on
the default sampler with no warning.
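
For evaluators migrating to the new contract, the following is a minimal sketch of an evaluator-owned-dataset script under the changes described above. It reads the string ids from helix_batch.json, scores only those examples, checks side_info["scores"] against the documented rules, and formats one [score, side_info] pair per id. DATASET, PREDICTIONS, check_scores, evaluate, and format_result are hypothetical names, and emitting the payload as a single HELIX_RESULT=<JSON> line on stdout is an assumption; check the HELIX docs for the exact transport.

```python
import json
import math
from pathlib import Path

# Hypothetical evaluator-owned dataset, keyed by opaque string example ids.
DATASET = {"0": ("2+2", "4"), "1": ("3*3", "9"), "2": ("10-7", "3")}
# Hypothetical candidate outputs; a real evaluator would run the candidate.
PREDICTIONS = {"0": "4", "1": "8", "2": "3"}


def check_scores(scores):
    """Apply the documented side_info["scores"] rules before emitting."""
    if not isinstance(scores, dict):
        raise ValueError("side_info['scores'] must be a dict")
    for name, value in scores.items():
        if not isinstance(name, str):
            raise ValueError(f"objective name {name!r} is not a string")
        # bool is a subclass of int, so True / False pass this check.
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            raise ValueError(f"objective {name!r} is not a finite number")
    return scores


def evaluate(batch_path):
    """Return one [score, side_info] pair per id, in helix_batch.json order."""
    ids = json.loads(Path(batch_path).read_text())  # list[str], not list[int]
    pairs = []
    for ex_id in ids:
        question, expected = DATASET[ex_id]
        predicted = PREDICTIONS[ex_id]
        correct = predicted == expected
        side_info = {
            "question": question,
            "expected": expected,
            "predicted": predicted,
            "scores": check_scores({"exact_match": correct}),
        }
        pairs.append([1.0 if correct else 0.0, side_info])
    return pairs


def format_result(pairs):
    """Assumed output shape: HELIX_RESULT= followed by the JSON payload."""
    return "HELIX_RESULT=" + json.dumps(pairs)
```

A script built on this sketch would end with print(format_result(evaluate("helix_batch.json"))). Keeping the pairs in the same order as the ids in the batch file matters, since the notes above describe the payload as positional.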

v0.2.1
