
insideLLMs Wiki

insideLLMs is a Python library and CLI for comparing LLM behaviour across models using shared probes and datasets. It produces deterministic run artefacts for reporting and CI diffs.
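To make the shape of a run concrete, here is a minimal sketch; the import path, the Harness class, and its parameters are illustrative assumptions, not the library's confirmed API.

```python
# Hypothetical sketch only: `insidellms`, `Harness`, and the keyword
# arguments below are illustrative assumptions, not the confirmed API.
from insidellms import Harness  # assumed import path

harness = Harness(
    models=["provider/model-a", "local/model-b"],  # placeholder model ids
    probes=["logic", "bias", "safety"],            # probe areas named on this page
)

# A run is expected to produce the deterministic artefacts listed under
# Core Concepts: records.jsonl, summary.json, report.html, diff.json.
harness.run(output_dir="runs/baseline")
```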

Docs Start Here

If you are new, start with Getting Started; once you have a baseline run, move on to Harness and then Results and Reports.

```mermaid
flowchart LR
  Run[Run / Harness] --> Validate[Validate Outputs]
  Validate --> Report[Report]
  Report --> Diff[Diff]
  Diff --> Decide[Decide]
```
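As a concrete illustration of the Validate Outputs stage, the sketch below is not insideLLMs code; it assumes only that records.jsonl follows the usual JSON Lines convention of one object per line.

```python
import json
from pathlib import Path

def validate_records(path: Path) -> int:
    """Fail fast if any line of records.jsonl is not valid JSON.

    Assumes only that the file is JSON Lines (one object per line),
    the usual convention for a .jsonl artefact.
    """
    count = 0
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path}:{lineno}: malformed record: {exc}") from exc
        count += 1
    return count

print(validate_records(Path("runs/baseline/records.jsonl")), "records OK")
```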

Quick Links

  • Getting Started
  • Harness
  • Results and Reports

Core Concepts

  • Models: Providers and local models share a single interface.
  • Probes: Small, focused tests for specific behaviours (logic, bias, safety).
  • Harness: Run the same probe suite across models and datasets in one pass.
  • Outputs: records.jsonl, summary.json, report.html, and diff.json (see the diff sketch after this list).
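Because the run artefacts are deterministic, a CI diff can be as simple as comparing two summary.json files. A minimal sketch follows; it assumes only that summary.json is a JSON object, since the actual schema insideLLMs writes is not documented on this page.

```python
import json
from pathlib import Path

def diff_summaries(baseline: Path, candidate: Path) -> dict:
    """Report keys whose values changed between two summary.json files.

    Assumes summary.json is a JSON object; the real schema may differ.
    """
    a = json.loads(baseline.read_text())
    b = json.loads(candidate.read_text())
    return {
        key: {"baseline": a.get(key), "candidate": b.get(key)}
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }

delta = diff_summaries(
    Path("runs/baseline/summary.json"),
    Path("runs/candidate/summary.json"),
)
print(json.dumps(delta, indent=2))  # an empty object means the runs match
```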
