
insideLLMs Wiki

insideLLMs is a Python library and CLI for comparing LLM behaviour across models using shared probes and datasets. It produces deterministic run artefacts for reporting and CI diffs.
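To make the shape of a run concrete, here is a minimal sketch; the import path, the Harness class, and its parameters are illustrative assumptions, not the library's confirmed API.

```python
# Hypothetical sketch only: `insidellms`, `Harness`, and the keyword
# arguments below are illustrative assumptions, not the confirmed API.
from insidellms import Harness  # assumed import path

harness = Harness(
    models=["provider/model-a", "local/model-b"],  # placeholder model ids
    probes=["logic", "bias", "safety"],            # probe areas named on this page
)

# A run is expected to produce the deterministic artefacts listed under
# Core Concepts: records.jsonl, summary.json, report.html, diff.json.
harness.run(output_dir="runs/baseline")
```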

Docs Start Here

If you are new, start with Getting Started; once you have a baseline run, move on to Harness and then Results and Reports.

```mermaid
flowchart LR
  Run[Run / Harness] --> Validate[Validate Outputs]
  Validate --> Report[Report]
  Report --> Diff[Diff]
  Diff --> Decide[Decide]
```
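As a concrete illustration of the Validate Outputs stage, the sketch below is not insideLLMs code; it assumes only that records.jsonl follows the usual JSON Lines convention of one object per line.

```python
import json
from pathlib import Path

def validate_records(path: Path) -> int:
    """Fail fast if any line of records.jsonl is not valid JSON.

    Assumes only that the file is JSON Lines (one object per line),
    the usual convention for a .jsonl artefact.
    """
    count = 0
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path}:{lineno}: malformed record: {exc}") from exc
        count += 1
    return count

print(validate_records(Path("runs/baseline/records.jsonl")), "records OK")
```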

Quick Links

  • Getting Started
  • Harness
  • Results and Reports

Core Concepts

  • Models: Providers and local models share a single interface.
  • Probes: Small, focused tests for specific behaviours (logic, bias, safety).
  • Harness: Run the same probe suite across models and datasets in one pass.
  • Outputs: records.jsonl, summary.json, report.html, and diff.json (see the diff sketch after this list).
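Because the run artefacts are deterministic, a CI diff can be as simple as comparing two summary.json files. A minimal sketch follows; it assumes only that summary.json is a JSON object, since the actual schema insideLLMs writes is not documented on this page.

```python
import json
from pathlib import Path

def diff_summaries(baseline: Path, candidate: Path) -> dict:
    """Report keys whose values changed between two summary.json files.

    Assumes summary.json is a JSON object; the real schema may differ.
    """
    a = json.loads(baseline.read_text())
    b = json.loads(candidate.read_text())
    return {
        key: {"baseline": a.get(key), "candidate": b.get(key)}
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }

delta = diff_summaries(
    Path("runs/baseline/summary.json"),
    Path("runs/candidate/summary.json"),
)
print(json.dumps(delta, indent=2))  # an empty object means the runs match
```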
