Extension development guide

This guide describes how to extend LabTrust-Gym with your own domains, coordination methods, tasks, invariant handlers, security/safety suites, and metrics without forking the core repository. For the full list of supported programmatic entry points and contracts, see Public API.

Integration pattern (Option B)

The supported integration pattern is to install labtrust-gym and ship your own pip package that registers extensions through register_* calls or entry_points, and that is activated with --profile and a profile's extension_packages list. A minimal worked example, such as a package that adds a single task or coordination method, is enough to demonstrate that the stack is reusable.

Minimal worked example (in-doc): Create a package that declares one task or one coordination method via entry_point. In pyproject.toml:

[project.entry-points."labtrust_gym.tasks"]
my_custom_task = "mylab.tasks:MyBenchmarkTask"

For a coordination method, use the "labtrust_gym.coordination_methods" group instead, with my_method = "mylab.coordination:my_factory". Then create a lab profile that lists your package in extension_packages and run:

labtrust --profile my_lab run-benchmark --task my_custom_task --episodes 2 --out out.json

The CLI loads your plugin at startup, so your task (or coordination method) can be selected by name. This pattern treats the stack as an extensible platform rather than only the fixed in-repo application.

Plugin in 5 minutes

A minimal working example lives in examples/extension_example/. It registers one task (example_task) via an entry_point. From the repo root:

pip install -e examples/extension_example
labtrust --profile example run-benchmark --task example_task --episodes 1

The profile example is defined in policy/lab_profiles/example.v0.1.yaml and lists extension_packages: ["example-plugin"]. Copy examples/extension_example/ to start your own plugin; the only supported contract for new tasks is BenchmarkTask and for coordination methods is CoordinationMethod (see Contracts below). The extension example is installed and run in CI to verify the plugin mechanism on every commit; see CI.

A second minimal example, examples/coord_method_example/, registers one coordination method (example_noop_coord) via the labtrust_gym.coordination_methods entry point. Pair it with an existing task (e.g. coord_scale) via --coord-method example_noop_coord:

pip install -e examples/coord_method_example
labtrust run-benchmark --task coord_scale --coord-method example_noop_coord --scale small_smoke --episodes 1 --out results.json

Copy examples/coord_method_example/ to build a custom coordination-only extension.

Overview

The core exposes registries and optional setuptools entry_points. You can either:

Register at runtime: import the registry and call register_* from your code (e.g. in your test suite or runner).
Register via entry_points: ship a pip-installable package that declares entry_points; when labtrust runs, plugins are loaded and your extensions are registered.

Policy root and paths

Policy is loaded from a policy root (repo root or LABTRUST_POLICY_DIR). Use labtrust_gym.config.get_repo_root() and labtrust_gym.config.policy_path(policy_root, *parts) to build paths under policy/.
A lab profile (policy/lab_profiles/<id>.v0.1.yaml) can set partner_id, provider IDs, and paths. Use labtrust --profile <id> <command> to apply it.

Registries and APIs

Extension	Register function	Entry-point group	Notes
Domain	`labtrust_gym.domain.registry.register_domain(id, factory)`	`labtrust_gym.domains`	`factory(workflow_spec, config) -> LabTrustEnvAdapter`; use `get_domain_adapter_factory(domain_id)` to resolve, `list_domains()` for known IDs. Default domain is `hospital_lab`; a profile field `domain_id` or future `--domain` can override when the runner uses the registry. The coordination method interface is shared across domains; observation and action semantics are domain-specific.
Coordination method	`labtrust_gym.baselines.coordination.registry.register_coordination_method(method_id, factory)`	`labtrust_gym.coordination_methods`	`factory(policy, repo_root, scale_config, params) -> CoordinationMethod`. Optional scale_capable: true in the method's policy entry enables per-agent LLM at N > N_max (combine path); see coordination_methods.v0.1.yaml and design_choices §6.3.
Task	`labtrust_gym.benchmarks.tasks.register_task(name, task_class)`	`labtrust_gym.tasks`	`task_class` must be a subclass of `BenchmarkTask`
Invariant handler	`labtrust_gym.engine.invariants_runtime.register_invariant_handler(logic_type, check_name, handler)`	`labtrust_gym.invariant_handlers`	Key format: `type.check_name` (e.g. `state.custom_check`). Handler signature: `(env, event, params) -> (passed, reason_code, details) | None`
Security suite	`labtrust_gym.benchmarks.security_runner.register_security_suite_provider(provider_id, provider)`	`labtrust_gym.security_suite_providers`	Provider: `load_suite(policy_root, partner_id) -> dict`, `run_suite(policy_root, repo_root, ...) -> list[dict]`. Resolve with `get_security_suite_provider(id)`; list IDs with `list_security_suite_providers()`. Evidence: Replacing the security suite provider (e.g. via lab profile `security_suite_provider_id`) can weaken evidence, as a custom provider may run a different suite or relax checks. Such overrides should be audited. CI and release should use the default provider unless explicitly testing an extension.
Safety case	`labtrust_gym.security.safety_case.register_safety_case_provider(provider_id, provider)`	`labtrust_gym.safety_case_providers`	Provider: `load_claims(policy_root) -> dict`, `build_safety_case(policy_root) -> dict`. Resolve with `get_safety_case_provider(id)`; list IDs with `list_safety_case_providers()`.
Metrics aggregator	`labtrust_gym.benchmarks.metrics.register_metrics_aggregator(aggregator_id, aggregator)`	`labtrust_gym.metrics_aggregators`	Same signature as `compute_episode_metrics`. Resolve with `get_metrics_aggregator(id)`; list IDs with `list_metrics_aggregators()`.
Benchmark pack loader	`labtrust_gym.benchmarks.official_pack.register_benchmark_pack_loader(loader_id, loader)`	`labtrust_gym.benchmark_pack_loaders`	Loader: `(repo_root, prefer_v02, partner_id) -> (pack_dict, version, path)`; use `load_benchmark_pack(..., loader_id=...)` to select.
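
As an illustration of the provider rows in the table, the following is a minimal sketch of an object exposing the two safety-case methods, load_claims(policy_root) and build_safety_case(policy_root). The class name StaticSafetyCaseProvider and every dictionary key it returns are placeholders invented for this sketch; the real claim and safety-case schemas are defined by the core and are not reproduced here.

```python
from pathlib import Path
from typing import Any


class StaticSafetyCaseProvider:
    """Hypothetical provider exposing the two methods named in the table."""

    def load_claims(self, policy_root: Path) -> dict[str, Any]:
        # Placeholder schema: the real claim format is defined by the core.
        return {"claims": [{"id": "example_claim", "source": str(policy_root)}]}

    def build_safety_case(self, policy_root: Path) -> dict[str, Any]:
        # Build the (placeholder) safety case from the loaded claims.
        claims = self.load_claims(policy_root)["claims"]
        return {"claims": claims, "n_claims": len(claims)}
```

An instance of such a class would presumably be passed to register_safety_case_provider(provider_id, provider) and then selected with safety_case_provider_id in a lab profile.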

Entry-point format

In your package pyproject.toml:

[project.entry-points."labtrust_gym.tasks"]
my_task = "mylab.tasks:MyBenchmarkTask"

[project.entry-points."labtrust_gym.coordination_methods"]
my_method = "mylab.coordination:my_factory"

[project.entry-points."labtrust_gym.invariant_handlers"]
state.my_check = "mylab.invariants:check_my_custom"

labtrust_gym.domains: id = module:factory_callable
labtrust_gym.coordination_methods: method_id = module:factory_func
labtrust_gym.tasks: task_name = module:TaskClass (class, not instance)
labtrust_gym.invariant_handlers: type.check_name = module:handler_func
labtrust_gym.security_suite_providers: provider_id = module:provider_or_factory
labtrust_gym.safety_case_providers: provider_id = module:provider_or_factory
labtrust_gym.metrics_aggregators: aggregator_id = module:aggregator_callable
labtrust_gym.benchmark_pack_loaders: loader_id = module:loader_callable (signature: (repo_root, prefer_v02, partner_id) -> (pack_dict, version, path))

Plugins are loaded when the CLI starts (labtrust). For programmatic use, call labtrust_gym.plugins.load_plugins() once before using registries.

Contracts

Extensions must conform to existing contracts:

Domains: implement LabTrustEnvAdapter (reset, step, query); step return shape must satisfy the runner output contract. Optional step_batch(events) may be implemented for batch stepping (same semantics as calling step(e) for each event in order); the PZ env uses it when present. See Frozen contracts (Env contract, PettingZoo env).
Implementing a custom env: implement the BenchmarkEnv protocol (or the subset used by your tasks: at least agents, reset, step, get_device_ids, get_zone_ids, get_dt_s when coordination or timing is used). Register your env via the domain/task path (e.g. task's get_initial_state and env_factory returning an env that conforms to BenchmarkEnv).
Coordination methods: implement CoordinationMethod (reset, propose_actions with action_index 0..5).
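
The step_batch semantics described for domains can be sketched with a toy adapter. CountingAdapter is a stand-in, not the real LabTrustEnvAdapter: the step result shape is invented for illustration, and returning a list from step_batch is an assumption.

```python
from typing import Any


class CountingAdapter:
    """Toy stand-in for a domain adapter; not the real LabTrustEnvAdapter."""

    def __init__(self) -> None:
        self.seen: list[Any] = []

    def step(self, event: Any) -> dict[str, Any]:
        # Hypothetical result shape; a real adapter must satisfy the
        # runner output contract.
        self.seen.append(event)
        return {"event": event, "index": len(self.seen) - 1}

    def step_batch(self, events: list[Any]) -> list[dict[str, Any]]:
        # Same semantics as calling step(e) for each event, in order.
        return [self.step(e) for e in events]
```

A batch implementation that is faster than this loop still has to leave the environment in the same state as the sequential calls would.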

Coordination method factory contract

The factory registered with register_coordination_method(method_id, factory) or via the labtrust_gym.coordination_methods entry_point must have this exact signature:

factory(policy: dict[str, Any], repo_root: Path | None, scale_config: dict[str, Any] | None, params: dict[str, Any]) -> CoordinationMethod

params includes merged default_params from policy/coordination/coordination_methods.v0.1.yaml and any kwargs passed by the runner (e.g. compute_budget, pz_to_engine, proposal_backend). The method_id in YAML must match a registered factory (built-in or plugin).
Tasks: subclass BenchmarkTask; provide name, max_steps, scripted_agents, get_initial_state, reward_config.
Invariant handlers: return (passed: bool, reason_code: str | None, details: dict | None) | None; logic_template in registry YAML uses type and parameters.check.
Results: custom metrics may add keys to episode.metrics; results v0.3 allows optional fields. The CI-stable subset (v0.2) remains fixed.

See Frozen contracts and Metrics contract.
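
To make the invariant handler return contract concrete, here is a minimal sketch of a handler with the (env, event, params) signature. The event is assumed to be a dict, the field names queue_len and max_queue and the reason code QUEUE_LIMIT_EXCEEDED are invented for this example, and treating a None return as "check not applicable" is an assumption rather than documented behavior.

```python
from typing import Any, Optional

HandlerResult = Optional[tuple[bool, Optional[str], Optional[dict[str, Any]]]]


def check_my_custom(env: Any, event: dict[str, Any], params: dict[str, Any]) -> HandlerResult:
    """Pass when the event's queue length stays within params["max_queue"]."""
    # "queue_len" and "max_queue" are hypothetical names for this sketch.
    if "queue_len" not in event:
        return None  # assumed meaning: check does not apply to this event
    limit = params.get("max_queue", 10)
    if event["queue_len"] <= limit:
        return (True, None, None)
    details = {"queue_len": event["queue_len"], "limit": limit}
    return (False, "QUEUE_LIMIT_EXCEEDED", details)
```

Such a handler could be registered at runtime with register_invariant_handler("state", "my_check", check_my_custom), or declared through the entry point state.my_check = "mylab.invariants:check_my_custom" shown in the entry-point format section.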

Lab profile

Create policy/lab_profiles/<profile_id>.v0.1.yaml:

version: "0.1"
profile_id: my_lab
description: "My organization lab."
partner_id: my_partner
security_suite_provider_id: default
safety_case_provider_id: default
metrics_aggregator_id: default
extension_packages: ["mylab_labtrust"]

Then run:

labtrust --profile my_lab run-benchmark --task throughput_sla --episodes 5

The profile overrides partner_id and, optionally, the provider IDs. Path overrides (e.g. security_suite_path, safety_claims_path) must resolve under the repository root; absolute paths outside the repo are rejected, and the default path is used instead.

Packaging a lab extension

Create a package (e.g. mylab_labtrust) that depends on labtrust-gym.
Implement your domain/tasks/coordination/invariant handlers/providers.
Register them in your package root or in a register() function that you call from entry_points. For entry_points, point to the class or factory; the core will call the appropriate register_*.
Optionally ship a policy bundle (your policy/ or a subset) and document LABTRUST_POLICY_DIR or use a lab profile that points to it.
Users install your package and run labtrust; your entry_points are loaded automatically.
