Implementation strategy: learning-style coordination methods

"Group-Evolving Agents" implies open-ended improvement across episodes. That conflicts with CI determinism unless behaviour is split into a deterministic track (CI-safe, frozen) and a study track (research mode, reproducible via seed_base and artifact logs). This document records the design decision and the concrete metadata convention.

Two-track coordination method mode

Deterministic track (CI-safe)

Uses fixed heuristics or fixed update rules (e.g. loaded policy, no training step).
No cross-run learning persistence unless it is fully seeded and controlled (e.g. same checkpoint path + seed yields identical behaviour).
Runs in pipeline_mode=deterministic.
Suitable for: CI, regression tests, contract tests, and any run that must be bit-reproducible from seed alone.
Learning-style methods (e.g. MARL PPO) in this track: inference-only from a fixed checkpoint; no train() or buffer updates during the run.
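
One way to enforce the inference-only rule is to wrap the loaded policy so that any attempt to learn fails loudly. The following is a minimal sketch under stated assumptions: the class name InferenceOnlyPolicy, the add_to_buffer method and the callable-based policy interface are hypothetical and not taken from the project's code.

```python
from typing import Any, Callable


class InferenceOnlyPolicy:
    """Hypothetical wrapper: the run may act with the policy but never update it."""

    def __init__(self, act_fn: Callable[[Any], Any], checkpoint_path: str) -> None:
        self._act_fn = act_fn                   # policy loaded from a fixed checkpoint
        self.checkpoint_path = checkpoint_path  # recorded for traceability only
        self.update_count = 0                   # stays 0 for the whole run

    def act(self, observation: Any) -> Any:
        # Pure inference: delegate to the frozen policy.
        return self._act_fn(observation)

    def train(self, *args: Any, **kwargs: Any) -> None:
        # Training is not allowed in the deterministic track.
        raise RuntimeError("train() is disabled in the deterministic track")

    def add_to_buffer(self, *args: Any, **kwargs: Any) -> None:
        # Buffer updates are not allowed either (assumed method name).
        raise RuntimeError("buffer updates are disabled in the deterministic track")
```

Raising instead of silently ignoring the call makes an accidental training step visible as a CI failure rather than as a subtle drift in results.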

Study track (research mode)

Can evolve across episodes (e.g. policy updates, experience replay, evolution).
Must save and reference:
- Experience buffer snapshots (or equivalent) when applicable.
- Policy / genome checkpoints (e.g. per N episodes or at end of run).
- Mutation / update logs (what changed, when).
Reproducibility is achieved via:
- seed_base (and episode seeds derived from it).
- Explicit algorithm version hash (e.g. code or config fingerprint).
- Checkpoint hashing in MANIFEST or run manifests (so a run can be reproduced from seed + checkpoint hash + algorithm version).
Bit-identical replay across runs is optional. The same seed_base, checkpoint, and algorithm version must nevertheless yield reproducible behaviour, i.e. the same metrics within expected variance when the method is stochastic.
Study-track runs belong in research workflows; CI gates should rely on deterministic-track or inference-only baselines.
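
The bookkeeping described above could look roughly like the sketch below. It assumes that episode seeds are derived by hashing seed_base together with the episode index and that checkpoints are single files hashed with SHA-256. The function names and manifest keys are illustrative assumptions, not the project's actual API.

```python
import hashlib
from pathlib import Path


def derive_episode_seed(seed_base: int, episode_index: int) -> int:
    """Derive a 32-bit episode seed from seed_base (assumed derivation scheme)."""
    digest = hashlib.sha256(f"{seed_base}:{episode_index}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a checkpoint file in chunks so large files are not read into memory at once."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest_entry(seed_base: int, algorithm_version: str, checkpoint_path: Path) -> dict:
    """Collect the three values a study-track run is reproduced from (hypothetical keys)."""
    return {
        "seed_base": seed_base,
        "algorithm_version": algorithm_version,
        "checkpoint_sha": sha256_of_file(checkpoint_path),
    }
```

Deriving episode seeds from a hash rather than from seed_base + episode_index avoids overlapping seed ranges between runs with adjacent seed_base values; any deterministic derivation would satisfy the convention.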

Optional learning metadata in results (v0.2 compatible)

Results schema v0.2 allows optional top-level metadata with additionalProperties: true. To support learning-style methods without breaking compatibility, the following optional nested structure is defined. When a coordination method uses learning (study track), the runner can populate:

Location: results.metadata.coordination.learning

| Field | Type | Description |
| --- | --- | --- |
| enabled | boolean | True when this run used a learning/evolving coordination method (study track). |
| checkpoint_sha | string (optional) | Hash (e.g. SHA-256) of the policy or genome checkpoint used at start of the run, or at end if reporting final checkpoint. Enables reproducibility (re-run with same checkpoint + seed_base). |
| update_count | integer (optional) | Number of policy/parameter updates (e.g. gradient steps, mutations) performed during this run. Zero for inference-only. |
| buffer_size | integer (optional) | Size of the experience buffer (or equivalent) at end of run, when applicable. |

If enabled is false or absent, the run is treated as deterministic-track (no learning during the run).
checkpoint_sha and buffer_size can be included in MANIFEST or run manifests for study-track reproducibility.
Consumers (summarize-results, risk register, CI) should treat coordination.learning.enabled being true as an indication that the run is study-track and may not be bit-reproducible from seed alone. Reproducibility then relies on seed_base + algorithm version + checkpoint_sha (and optionally a buffer snapshot).
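
A consumer-side check might be sketched as follows. The helper name is_study_track is hypothetical; only the location results.metadata.coordination.learning and the enabled field come from the convention above. The sketch treats any missing or malformed level as "not study-track", matching the rule that an absent enabled means deterministic-track.

```python
from collections.abc import Mapping
from typing import Any


def _child(node: Any, key: str) -> Any:
    """Return node[key] if node is a mapping, else None (tolerates missing levels)."""
    return node.get(key) if isinstance(node, Mapping) else None


def is_study_track(results: Mapping[str, Any]) -> bool:
    """True only when metadata.coordination.learning.enabled is exactly True."""
    learning = _child(_child(_child(results, "metadata"), "coordination"), "learning")
    return _child(learning, "enabled") is True


# Illustrative results fragments (values are made up for the example).
study_run = {
    "metadata": {
        "coordination": {
            "learning": {"enabled": True, "update_count": 12, "buffer_size": 256}
        }
    }
}
deterministic_run = {"metadata": {}}

print(is_study_track(study_run))          # True
print(is_study_track(deterministic_run))  # False
```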

How methods expose learning metadata

Coordination methods that support the study track may implement an optional get_learning_metadata() -> dict[str, Any] | None. When non-None, the runner merges it into results.metadata.coordination.learning after the run.
Deterministic and inference-only methods omit get_learning_metadata (or return None); the runner skips coordination.learning in that case.
The dict returned by get_learning_metadata() should normally contain only the keys above (enabled, checkpoint_sha, update_count, buffer_size). Additional keys are allowed for forward compatibility but may be ignored by downstream tools.
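
The hook and the runner-side merge could be sketched as below. Only get_learning_metadata() and the target location come from this document; the class names, record_update, and attach_learning_metadata are hypothetical, and the real runner may merge differently.

```python
from __future__ import annotations

from typing import Any


class EvolvingMethod:
    """Hypothetical study-track method that counts its own updates."""

    def __init__(self, checkpoint_sha: str) -> None:
        self._checkpoint_sha = checkpoint_sha
        self._update_count = 0

    def record_update(self) -> None:
        self._update_count += 1

    def get_learning_metadata(self) -> dict[str, Any] | None:
        return {
            "enabled": True,
            "checkpoint_sha": self._checkpoint_sha,
            "update_count": self._update_count,
        }


class FixedHeuristicMethod:
    """Hypothetical deterministic-track method: it simply omits the hook."""


def attach_learning_metadata(results: dict[str, Any], method: object) -> dict[str, Any]:
    """Merge the method's learning metadata into results, if it provides any."""
    getter = getattr(method, "get_learning_metadata", None)
    learning = getter() if callable(getter) else None
    if learning is None:
        return results  # deterministic / inference-only: leave results untouched
    coordination = results.setdefault("metadata", {}).setdefault("coordination", {})
    coordination.setdefault("learning", {}).update(learning)
    return results
```

Looking the hook up with getattr keeps it optional, so existing deterministic methods need no changes to remain compatible.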

Reproducibility checklist (study track)

When running or publishing a study-track coordination run:

- Record seed_base (and episode seeds if different).
- Record algorithm version (e.g. git SHA, or a hash of code/config).
- Save checkpoint(s) and record checkpoint_sha in metadata and/or MANIFEST.
- If experience buffers are part of the method, save buffer snapshots or document that reproducibility is conditional on same buffer state (e.g. same prior run).
- Optionally save mutation/update logs (e.g. which updates were applied, in what order).

CI and regression gates should use only deterministic-track runs (or inference-only learning methods with a fixed checkpoint and no updates during the run).
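
The checklist can also be checked mechanically before a run is published. The sketch below assumes a flat run-manifest dict; the key names (seed_base, algorithm_version, checkpoint_sha, buffer_snapshot, buffer_state_note) are illustrative assumptions rather than a defined MANIFEST schema.

```python
from collections.abc import Mapping
from typing import Any


def _is_missing(value: Any) -> bool:
    # A seed_base of 0 is valid, so test for None/empty string rather than falsiness.
    return value is None or value == ""


def missing_checklist_items(manifest: Mapping[str, Any], uses_buffer: bool = False) -> list[str]:
    """Return the checklist items that the manifest does not yet record."""
    missing = [
        key
        for key in ("seed_base", "algorithm_version", "checkpoint_sha")
        if _is_missing(manifest.get(key))
    ]
    # Buffer-based methods need either a snapshot or a note that reproducibility
    # is conditional on the same buffer state.
    if uses_buffer and all(
        _is_missing(manifest.get(key)) for key in ("buffer_snapshot", "buffer_state_note")
    ):
        missing.append("buffer_snapshot or buffer_state_note")
    return missing
```

Mutation/update logs are optional in the checklist, so the sketch does not report them as missing.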
