Home
Gareth Roberts edited this page Jan 19, 2026 · 4 revisions
insideLLMs is a Python library and CLI for comparing LLM behaviour across models using shared probes and datasets. It produces deterministic run artefacts for reporting and CI diffs.
If you are new, start with Getting Started; once you have a baseline run, move on to Harness and to Results and Reports.
```mermaid
flowchart LR
    Run[Run / Harness] --> Validate[Validate Outputs]
    Validate --> Report[Report]
    Report --> Diff[Diff]
    Diff --> Decide[Decide]
```
- Models: Providers and local models share a single interface.
- Probes: Small, focused tests for specific behaviours (logic, bias, safety).
- Harness: Run the same probe suite across models and datasets in one pass.
- Outputs: `records.jsonl`, `summary.json`, `report.html`, and `diff.json`.
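Because run artefacts are deterministic, two runs can be compared with plain dictionary diffing in CI. The sketch below is illustrative only: the field names (`probes_run`, `pass_rate`) and record shapes are assumptions, not the library's documented schema, and the helpers are hypothetical, not part of the insideLLMs API.

```python
import json


def load_records(path):
    """Load a JSON Lines artefact (e.g. records.jsonl) into a list of dicts.

    Assumes one JSON object per line, which is the standard JSONL layout.
    """
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def diff_summaries(baseline, candidate):
    """Return the keys whose values differ between two summary dicts.

    Maps each changed key to a (baseline, candidate) pair, so a CI job can
    fail or comment when metrics drift between runs.
    """
    keys = set(baseline) | set(candidate)
    return {
        k: (baseline.get(k), candidate.get(k))
        for k in sorted(keys)
        if baseline.get(k) != candidate.get(k)
    }


# Example: compare two hypothetical summary.json payloads.
baseline = {"probes_run": 12, "pass_rate": 0.92}
candidate = {"probes_run": 12, "pass_rate": 0.83}
print(diff_summaries(baseline, candidate))
```

A CI step might load the committed baseline `summary.json`, run the harness, and fail the build if `diff_summaries` returns a non-empty dict for the metrics it cares about.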