Get LabTrust-Gym installed, run your first benchmarks, and optionally fork and extend for your organization.
| I want to... | First step |
|---|---|
| Run benchmarks only | `pip install "labtrust-gym[env,plots]"`, then `labtrust quick-eval` |
| Add my coordination method (or task) | Extension development + `entry_points`; see `examples/extension_example` for a minimal plugin |
| Fork and customize policy | Forker guide and `labtrust forker-quickstart` |
| Use labtrust-gym as a library | Extension development, `--profile`, and `extension_packages` in a lab profile |
| Run the full security suite | `labtrust run-security-suite`; the full suite requires the `[env]` extra (`.[env]` in a source checkout); add `--skip-system-level` when that extra is not installed |
| Run the PCS QC-release demo (proof-carrying science) | PCS quickstart: run `scripts/setup_pcs_dev.ps1`, then `labtrust run-demo qc-release` |
| Connect the LabTrust Portal (Lovable) to live data | Set `VITE_DATA_BASE_URL` in the portal to this repo’s deployed viewer-data URL; see Portal context (Portal live data connection) |
| Export a UI bundle (tables + coordination charts) for the portal | `labtrust ui-export --run <dir> --out <zip>`; when the run includes coordination pack output, the zip also contains SOTA leaderboards and HTML charts under `coordination/graphs/`. See Frontend handoff. |

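
The table above says to add `--skip-system-level` when the `[env]` extra is missing. For scripted runs (for example in CI), a minimal Python sketch of that decision might look like the following. The helper names are hypothetical, and so is the probe module: LabTrust-Gym does not document here which module the `[env]` extra installs, so `gymnasium` is only a placeholder to replace with a package you know that extra provides.

```python
import importlib.util
import shlex


def env_extra_available(probe_module: str) -> bool:
    """Return True if the top-level module `probe_module` is importable.

    Hypothetical helper: which module the [env] extra installs is an
    assumption, so the caller chooses the probe.
    """
    return importlib.util.find_spec(probe_module) is not None


def security_suite_argv(env_installed: bool) -> list[str]:
    """Build the argv for `labtrust run-security-suite`.

    The full suite needs the [env] extra; without it, append
    --skip-system-level as the getting-started table recommends.
    """
    argv = ["labtrust", "run-security-suite"]
    if not env_installed:
        argv.append("--skip-system-level")
    return argv


if __name__ == "__main__":
    # "gymnasium" is a placeholder probe, not confirmed by LabTrust-Gym docs.
    argv = security_suite_argv(env_extra_available("gymnasium"))
    # Print the command rather than running it, so the sketch has no side effects.
    print(shlex.join(argv))
```

Printing the command keeps the sketch side-effect free; pass `argv` to `subprocess.run(argv, check=True)` to actually launch the suite.
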
| Document | Description |
|---|---|
| Installation | `pip install`, extras, environment variables, and development setup. |
| Build your own agent | Implement an agent and run it with `eval-agent` (about 5–10 minutes). |
| Example agents | Reference agents and run commands. |
| Example experiments | Reproducible experiments (trust vs. performance). |

| Document | Description |
|---|---|
| Forker guide | Fork, customize policy, run the full pipeline, add partner overlays and coordination methods. |
| Demo readiness | Prerequisites, Windows notes, and risk-register usage for the three presentation demos. |
| Recommended Windows setup | Path, shell, file-lock mitigation, and locale for Windows-only users. |
| Troubleshooting | Common issues and fixes. |

- Repository structure — where to put outputs and how the repo is laid out.
- Architecture — system design and threat model.
- Benchmarks — tasks, studies, and reproduction.
- PCS (proof-carrying science) — QC-release reference workflow and release gates.
- Frontend handoff (UI bundle) — for frontend engineers: zip layout, `coordination_artifacts`, and how to display SOTA tables and charts.
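
The Frontend handoff guide is the authority on the bundle layout. As a hedged sketch, assuming the `ui-export` zip stores coordination charts as `.html` files inside a `coordination/graphs/` folder (at the top level or below a single root folder), a frontend build script could list them with only the standard-library `zipfile` module. The function name `coordination_charts` is hypothetical and not part of labtrust-gym.

```python
import sys
import zipfile


def coordination_charts(bundle_path: str) -> list[str]:
    """Return sorted names of HTML charts under coordination/graphs/ in a zip.

    Assumption: charts live in a coordination/graphs/ folder, either at the
    zip root or nested below another folder; check Frontend handoff for the
    real layout.
    """
    with zipfile.ZipFile(bundle_path) as zf:
        return sorted(
            name
            for name in zf.namelist()
            # Prefix "/" so a top-level coordination/graphs/ also matches.
            if "/coordination/graphs/" in "/" + name and name.endswith(".html")
        )


if __name__ == "__main__":
    # Usage: python list_charts.py <bundle.zip>
    if len(sys.argv) > 1:
        for chart in coordination_charts(sys.argv[1]):
            print(chart)
```

Filtering on the `.html` suffix skips directory entries and any non-chart files (for example JSON or CSV tables) that share the folder.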