This guide describes how to extend LabTrust-Gym with your own domains, coordination methods, tasks, invariant handlers, security/safety suites, and metrics without forking the core repository. For the full list of supported programmatic entry points and contracts, see Public API.
Installing `labtrust-gym` and shipping your own pip package is the integration pattern for using the stack as an extensible platform. The package registers its extensions through `register_*` calls or entry points and is selected with `--profile` and `extension_packages`. A single minimal package, for example one that adds one task or one coordination method, is enough to show that the stack is reusable.
Minimal worked example (in-doc): create a package that declares one task or one coordination method via an entry point. In `pyproject.toml`:

```toml
[project.entry-points."labtrust_gym.tasks"]
my_custom_task = "mylab.tasks:MyBenchmarkTask"
```

For a coordination method, use the group `"labtrust_gym.coordination_methods"` with `my_method = "mylab.coordination:my_factory"` instead. Create a lab profile that lists your package in `extension_packages`, then run:

```bash
labtrust --profile my_lab run-benchmark --task my_custom_task --episodes 2 --out out.json
```

The CLI loads your plugin, and your task (or coordination method) becomes available. With this pattern the stack works as an extensible platform and not only as the fixed in-repo application.
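The entry point above resolves to a class `MyBenchmarkTask` in `mylab/tasks.py`. The following is a minimal sketch of that module. It assumes `BenchmarkTask` can be imported from `labtrust_gym.benchmarks.tasks`, the module that exposes `register_task`. If the import fails, the sketch falls back to a local stand-in so it runs without the package installed. The attribute values, the `reward_config` keys and the `get_initial_state(seed)` signature are hypothetical. Check them against the `BenchmarkTask` contract described under Contracts.

```python
from __future__ import annotations

from typing import Any

try:
    # Assumed location of the base class (same module as register_task).
    from labtrust_gym.benchmarks.tasks import BenchmarkTask
except ImportError:
    # Stand-in so this sketch runs without labtrust-gym; NOT the real base class.
    class BenchmarkTask:  # type: ignore[no-redef]
        pass


class MyBenchmarkTask(BenchmarkTask):
    """Hypothetical task providing the members the BenchmarkTask contract lists."""

    name = "my_custom_task"  # should match the entry-point name in pyproject.toml
    max_steps = 200  # hypothetical episode length
    scripted_agents: list[str] = []  # hypothetical: no scripted agents
    reward_config: dict[str, Any] = {"throughput_weight": 1.0}  # hypothetical keys

    def get_initial_state(self, seed: int) -> dict[str, Any]:
        # Hypothetical signature and state layout; the real contract may differ.
        return {"seed": seed, "specimens": []}
```

Remember that the entry point must reference the class itself, not an instance of it.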
A minimal working example lives in `examples/extension_example/`. It registers one task (`example_task`) via an entry point. From the repo root:

```bash
pip install -e examples/extension_example
labtrust --profile example run-benchmark --task example_task --episodes 1
```

The `example` profile is defined in `policy/lab_profiles/example.v0.1.yaml` and lists `extension_packages: ["example-plugin"]`. Copy `examples/extension_example/` to start your own plugin. The only supported contract for new tasks is `BenchmarkTask`, and the only one for coordination methods is `CoordinationMethod` (see Contracts below). CI installs and runs the extension example on every commit to verify the plugin mechanism; see CI.
A second minimal example, `examples/coord_method_example/`, registers one coordination method (`example_noop_coord`) via the `labtrust_gym.coordination_methods` entry point. Pair it with an existing task (e.g. `coord_scale`) via `--coord-method example_noop_coord`:

```bash
pip install -e examples/coord_method_example
labtrust run-benchmark --task coord_scale --coord-method example_noop_coord --scale small_smoke --episodes 1 --out results.json
```

Copy `examples/coord_method_example/` to build a custom coordination-only extension.
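The following is a sketch of a no-op coordination module in the spirit of `example_noop_coord`. Only the factory signature comes from this guide (see Contracts). The `reset` and `propose_actions` parameters, the per-agent `{"action_index": ...}` return shape and the assumption that index 0 is a no-op are all hypothetical. Compare them against `examples/coord_method_example/` before relying on them.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

# Assumption: action_index 0 is the no-op action; verify against the action space.
NOOP_ACTION_INDEX = 0


class NoopCoordination:
    """Duck-typed stand-in for a CoordinationMethod that never acts."""

    def __init__(self, params: dict[str, Any]) -> None:
        # Copy of the merged default_params plus runner kwargs.
        self.params = dict(params)

    def reset(self, *args: Any, **kwargs: Any) -> None:
        # Stateless, so there is nothing to reset (real parameters may differ).
        return None

    def propose_actions(
        self, obs: dict[str, Any], *args: Any, **kwargs: Any
    ) -> dict[str, dict[str, int]]:
        # Hypothetical shape: one no-op proposal per agent ID in the observations.
        return {agent_id: {"action_index": NOOP_ACTION_INDEX} for agent_id in obs}


def my_factory(
    policy: dict[str, Any],
    repo_root: Path | None,
    scale_config: dict[str, Any] | None,
    params: dict[str, Any],
) -> NoopCoordination:
    """Factory with the signature required by register_coordination_method."""
    return NoopCoordination(params)
```

The factory receives the policy and scale configuration but ignores them here. A real method would typically read its tuning values from `params`.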
The core exposes registries and optional setuptools entry points. You can either:

- Register at runtime: import the registry and call `register_*` from your code (e.g. in your test suite or runner).
- Register via entry points: ship a pip-installable package that declares entry points. When `labtrust` runs, plugins are loaded and your extensions are registered.
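The following sketches runtime registration. The `register_*` functions and their module paths are the ones listed in the extension table below. `mylab.tasks` and `mylab.coordination` are the hypothetical plugin modules from the `pyproject.toml` example above. The imports are deferred, so importing this module does not require labtrust-gym, but calling `register()` does.

```python
def register() -> None:
    """Register this package's task and coordination method with labtrust_gym.

    Imports are deferred so the module can be imported without labtrust-gym;
    calling register() requires it (and the hypothetical mylab modules).
    """
    # Register functions as listed in this guide's extension table.
    from labtrust_gym.baselines.coordination.registry import register_coordination_method
    from labtrust_gym.benchmarks.tasks import register_task

    # Hypothetical plugin modules (see the pyproject.toml entry points above).
    from mylab.coordination import my_factory
    from mylab.tasks import MyBenchmarkTask

    register_task("my_custom_task", MyBenchmarkTask)  # pass the class, not an instance
    register_coordination_method("my_method", my_factory)
```

Call `register()` once, for example from a pytest `conftest.py`, before any code that looks up those IDs.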
- Policy is loaded from a policy root (the repo root or `LABTRUST_POLICY_DIR`). Use `labtrust_gym.config.get_repo_root()` and `labtrust_gym.config.policy_path(policy_root, *parts)` to build paths under `policy/`.
- A lab profile (`policy/lab_profiles/<id>.v0.1.yaml`) can set `partner_id`, provider IDs, and paths. Use `labtrust --profile <id> <command>` to apply it.
| Extension | Register function | Entry-point group | Notes |
|---|---|---|---|
| Domain | `labtrust_gym.domain.registry.register_domain(id, factory)` | `labtrust_gym.domains` | `factory(workflow_spec, config) -> LabTrustEnvAdapter`. Use `get_domain_adapter_factory(domain_id)` to resolve a domain and `list_domains()` for known IDs. The default domain is `hospital_lab`; a profile field `domain_id` or a future `--domain` flag can override it when the runner uses the registry. The coordination method interface is shared across domains; observation and action semantics are domain-specific. |
| Coordination method | `labtrust_gym.baselines.coordination.registry.register_coordination_method(method_id, factory)` | `labtrust_gym.coordination_methods` | `factory(policy, repo_root, scale_config, params) -> CoordinationMethod`. An optional `scale_capable: true` in the method's policy entry enables per-agent LLM at N > N_max (combine path); see `coordination_methods.v0.1.yaml` and design_choices §6.3. |
| Task | `labtrust_gym.benchmarks.tasks.register_task(name, task_class)` | `labtrust_gym.tasks` | `task_class` must be a subclass of `BenchmarkTask`. |
| Invariant handler | `labtrust_gym.engine.invariants_runtime.register_invariant_handler(logic_type, check_name, handler)` | `labtrust_gym.invariant_handlers` | Key format: `type.check_name` (e.g. `state.custom_check`). Handler signature: `(env, event, params) -> (passed, reason_code, details) \| None`. |
| Security suite | `labtrust_gym.benchmarks.security_runner.register_security_suite_provider(provider_id, provider)` | `labtrust_gym.security_suite_providers` | Provider: `load_suite(policy_root, partner_id) -> dict`, `run_suite(policy_root, repo_root, ...) -> list[dict]`. Resolve with `get_security_suite_provider(id)`; list IDs with `list_security_suite_providers()`. Evidence: replacing the security suite provider (e.g. via the lab profile field `security_suite_provider_id`) can weaken evidence, because a custom provider may run a different suite or relax checks. Audit such overrides. CI and release should use the default provider unless explicitly testing an extension. |
| Safety case | `labtrust_gym.security.safety_case.register_safety_case_provider(provider_id, provider)` | `labtrust_gym.safety_case_providers` | Provider: `load_claims(policy_root) -> dict`, `build_safety_case(policy_root) -> dict`. Resolve with `get_safety_case_provider(id)`; list IDs with `list_safety_case_providers()`. |
| Metrics aggregator | `labtrust_gym.benchmarks.metrics.register_metrics_aggregator(aggregator_id, aggregator)` | `labtrust_gym.metrics_aggregators` | Same signature as `compute_episode_metrics`. Resolve with `get_metrics_aggregator(id)`; list IDs with `list_metrics_aggregators()`. |
| Benchmark pack loader | `labtrust_gym.benchmarks.official_pack.register_benchmark_pack_loader(loader_id, loader)` | `labtrust_gym.benchmark_pack_loaders` | Loader: `(repo_root, prefer_v02, partner_id) -> (pack_dict, version, path)`; use `load_benchmark_pack(..., loader_id=...)` to select one. |
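As an illustration of the provider shape in the Safety case row, here is a self-contained sketch. The two method names and their `policy_root` argument come from the table. The claim content, the output dict layout and the provider ID `my_safety_case` are hypothetical. The registration call is left commented out because it needs labtrust-gym. Passing an instance rather than a factory is an assumption; the entry-point group accepts either.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


class MySafetyCaseProvider:
    """Hypothetical provider exposing load_claims and build_safety_case."""

    def load_claims(self, policy_root: Path) -> dict[str, Any]:
        # A real provider would read a claims file under policy_root; this one
        # returns a fixed, hypothetical claim so the sketch stays self-contained.
        return {"claims": [{"id": "C-1", "statement": "Hypothetical example claim"}]}

    def build_safety_case(self, policy_root: Path) -> dict[str, Any]:
        claims = self.load_claims(policy_root)["claims"]
        # Hypothetical output layout: provider ID, source root, and claim list.
        return {
            "provider_id": "my_safety_case",
            "policy_root": str(policy_root),
            "claims": claims,
        }


# With labtrust-gym installed (provider ID is hypothetical):
# from labtrust_gym.security.safety_case import register_safety_case_provider
# register_safety_case_provider("my_safety_case", MySafetyCaseProvider())
```

Building the safety case on top of `load_claims` keeps a single source for the claims. A real provider would typically also link each claim to its evidence.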
In your package `pyproject.toml`:

```toml
[project.entry-points."labtrust_gym.tasks"]
my_task = "mylab.tasks:MyBenchmarkTask"

[project.entry-points."labtrust_gym.coordination_methods"]
my_method = "mylab.coordination:my_factory"

[project.entry-points."labtrust_gym.invariant_handlers"]
# Quote the key: an unquoted state.my_check is a dotted TOML key (a nested table).
"state.my_check" = "mylab.invariants:check_my_custom"
```

Entry-point groups and their value formats:

- `labtrust_gym.domains`: `id = module:factory_callable`
- `labtrust_gym.coordination_methods`: `method_id = module:factory_func`
- `labtrust_gym.tasks`: `task_name = module:TaskClass` (the class, not an instance)
- `labtrust_gym.invariant_handlers`: `type.check_name = module:handler_func`
- `labtrust_gym.security_suite_providers`: `provider_id = module:provider_or_factory`
- `labtrust_gym.safety_case_providers`: `provider_id = module:provider_or_factory`
- `labtrust_gym.metrics_aggregators`: `aggregator_id = module:aggregator_callable`
- `labtrust_gym.benchmark_pack_loaders`: `loader_id = module:loader_callable` (signature: `(repo_root, prefer_v02, partner_id) -> (pack_dict, version, path)`)
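The following sketches a loader with the signature above. The `mylab_packs/` directory, the JSON file names and the type hints for `partner_id` and the returned path are hypothetical. The sketch reads `prefer_v02` as "try the v0.2 pack first, fall back to v0.1". That interpretation is an assumption, so check it against the built-in loader.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def my_pack_loader(
    repo_root: Path | None,
    prefer_v02: bool,
    partner_id: str | None,
) -> tuple[dict[str, Any], str, Path]:
    """Load a JSON benchmark pack, trying v0.2 first when prefer_v02 is set.

    The mylab_packs/ directory and file naming scheme are hypothetical.
    """
    root = Path(repo_root) if repo_root is not None else Path.cwd()
    versions = ["0.2", "0.1"] if prefer_v02 else ["0.1"]
    for version in versions:
        path = root / "mylab_packs" / f"benchmark_pack.v{version}.json"
        if path.is_file():
            pack = json.loads(path.read_text(encoding="utf-8"))
            # partner_id is unused here; a real loader might apply a partner overlay.
            return pack, version, path
    raise FileNotFoundError(f"no benchmark pack found under {root / 'mylab_packs'}")
```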
Plugins are loaded when the CLI (`labtrust`) starts. For programmatic use, call `labtrust_gym.plugins.load_plugins()` once before using the registries.
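If a plugin does not show up, first check what the installed distributions actually declare. The following sketch uses only the standard library's `importlib.metadata`. It lists declared entry points without loading or registering anything, so it is a debugging aid and not a replacement for `load_plugins()`, whose internals are not described here.

```python
from __future__ import annotations

from importlib.metadata import entry_points

# Entry-point groups listed in this guide.
LABTRUST_GROUPS = (
    "labtrust_gym.domains",
    "labtrust_gym.coordination_methods",
    "labtrust_gym.tasks",
    "labtrust_gym.invariant_handlers",
    "labtrust_gym.security_suite_providers",
    "labtrust_gym.safety_case_providers",
    "labtrust_gym.metrics_aggregators",
    "labtrust_gym.benchmark_pack_loaders",
)


def declared_entry_points(group: str) -> dict[str, str]:
    """Map entry-point name -> "module:attr" across all installed distributions."""
    try:
        eps = entry_points(group=group)  # Python 3.10+
    except TypeError:
        eps = entry_points().get(group, [])  # Python 3.8/3.9 return a dict
    return {ep.name: ep.value for ep in eps}


if __name__ == "__main__":
    for group in LABTRUST_GROUPS:
        print(group, declared_entry_points(group))
```

If your name is missing here, the package is not installed in the active environment or its `pyproject.toml` group name is misspelled.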
Extensions must conform to existing contracts:

- Domains: implement `LabTrustEnvAdapter` (`reset`, `step`, `query`). The `step` return shape must satisfy the runner output contract. You may also implement an optional `step_batch(events)` for batch stepping. It has the same semantics as calling `step(e)` for each event in order, and the PZ env uses it when present. See Frozen contracts (Env contract, PettingZoo env).
- Implementing a custom env: implement the `BenchmarkEnv` protocol (or the subset used by your tasks: at least `agents`, `reset`, `step`, `get_device_ids`, `get_zone_ids`, `get_dt_s` when coordination or timing is used). Register your env via the domain/task path (e.g. the task's `get_initial_state` and an `env_factory` returning an env that conforms to `BenchmarkEnv`).
- Coordination methods: implement `CoordinationMethod` (`reset`, `propose_actions` with `action_index` 0..5). The factory registered with `register_coordination_method(method_id, factory)` or via the `labtrust_gym.coordination_methods` entry point must have this exact signature:

  ```python
  def factory(
      policy: dict[str, Any],
      repo_root: Path | None,
      scale_config: dict[str, Any] | None,
      params: dict[str, Any],
  ) -> CoordinationMethod: ...
  ```

  `params` includes the merged `default_params` from `policy/coordination/coordination_methods.v0.1.yaml` and any kwargs passed by the runner (e.g. `compute_budget`, `pz_to_engine`, `proposal_backend`). The `method_id` in YAML must match a registered factory (built-in or plugin).
- Tasks: subclass `BenchmarkTask` and provide `name`, `max_steps`, `scripted_agents`, `get_initial_state` and `reward_config`.
- Invariant handlers: return `(passed: bool, reason_code: str | None, details: dict | None) | None`. The `logic_template` in the registry YAML uses `type` and `parameters.check`.
- Results: custom metrics may add keys to `episode.metrics`, and results v0.3 allows optional fields. The CI-stable subset (v0.2) remains fixed.
See Frozen contracts and Metrics contract.
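The following sketches a handler for the `state.my_check` key used in the entry-point example. Only the `(env, event, params)` signature and the return shape come from this guide. The rest is assumed:

- `event` and `params` are treated as dicts.
- The `zone_id` and `forbidden_zones` fields are hypothetical.
- The reason code is hypothetical.
- Returning `None` is taken to mean "not applicable to this event".

```python
from __future__ import annotations

from typing import Any


def check_my_custom(
    env: Any, event: dict[str, Any], params: dict[str, Any]
) -> tuple[bool, str | None, dict[str, Any] | None] | None:
    """Hypothetical invariant: events must not target a forbidden zone."""
    zone_id = event.get("zone_id")
    if zone_id is None:
        # Assumption: None means "this check does not apply to this event".
        return None
    forbidden = set(params.get("forbidden_zones", []))
    if zone_id in forbidden:
        # Failed check: hypothetical reason code plus details for the audit log.
        return False, "MY_FORBIDDEN_ZONE", {"zone_id": zone_id}
    return True, None, None


# Registration for key state.my_check (requires labtrust-gym):
# from labtrust_gym.engine.invariants_runtime import register_invariant_handler
# register_invariant_handler("state", "my_check", check_my_custom)
```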
Create `policy/lab_profiles/<profile_id>.v0.1.yaml`:

```yaml
version: "0.1"
profile_id: my_lab
description: "My organization lab."
partner_id: my_partner
security_suite_provider_id: default
safety_case_provider_id: default
metrics_aggregator_id: default
extension_packages: ["mylab_labtrust"]
```

Then run `labtrust --profile my_lab run-benchmark --task throughput_sla --episodes 5`. The profile overrides `partner_id` and the optional provider IDs. Path overrides (e.g. `security_suite_path`, `safety_claims_path`) must resolve under the repository root. Absolute paths outside the repo are rejected, and the default path is used instead.
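Before running with a profile, you can check that every entry in `extension_packages` is installed. This sketch assumes the entries are distribution names, as `example-plugin` in the bundled example suggests. It uses only `importlib.metadata` and does not parse the YAML, so pass it the list from your profile.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def missing_extension_packages(names: list[str]) -> list[str]:
    """Return the extension_packages entries with no installed distribution."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if nothing matches
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_extension_packages(["mylab_labtrust"])` should return an empty list once your package has been pip-installed into the active environment.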
- Create a package (e.g. `mylab_labtrust`) that depends on `labtrust-gym`.
- Implement your domains, tasks, coordination methods, invariant handlers and providers.
- Register them in your package root or in a `register()` function. For entry points, point to the class or factory; the core then calls the appropriate `register_*`.
- Optionally ship a policy bundle (your `policy/` or a subset) and document `LABTRUST_POLICY_DIR`, or use a lab profile that points to it.
- Users install your package and run `labtrust`; your entry points are loaded automatically.
- Forker guide for policy-only customization (partner overlay, coordination methods YAML, risk register).
- Frozen contracts for the extensibility section and schema stability.