S.W.A.R.M.

System-Wide Assessment of Risk for Multi-agent environments

Emergent risk appears at the interaction level, not the individual agent level.

SWARM is a research framework for measuring emergent failures that only appear when many AI agents interact — even when individual agents are safe.

It enables:

interaction-level safety metrics (illusion delta, quality gaps)
governance experiments (audits, staking, sanctions)
reproducible multi-agent safety benchmarks

Why this repo is worth starring

⭐ You work on multi-agent or LLM-agent systems
⭐ You care about systemic or emergent AI risks
⭐ You want benchmarks beyond single-agent evals
⭐ You’re designing governance, audits, or red-teaming

Run your first emergent failure in 60 seconds

python examples/illusion_delta_minimal.py

This minimal example runs a 3-agent simulation with one deceptive actor and computes an illusion-delta style signal from replay variability.

The Core Insight

AGI-level risks don't require AGI-level agents. Harmful dynamics can emerge from:

Information asymmetry between agents
Adverse selection (system accepts lower-quality interactions)
Variance amplification across decision horizons
Governance latency and illegibility

SWARM makes these interaction-level risks observable, measurable, and governable.

Phenomenological Blind Spots

Accounts such as Infinite Backrooms describe the experience of interacting with AI systems that appear fluent, reflective, and emotionally coherent while exhibiting significant instability across time and context. We interpret these reports not as evidence of emergent agency, but as exposure to a high-variance regime in which local coherence masks global incoherence. This creates a systematic evaluation blind spot: humans over-trust systems that perform well in short-horizon interactions, even when distributed or replay-based evaluations reveal substantial instability.

SWARM surfaces this gap via the illusion delta metric:

Δ_illusion = C_perceived − C_distributed

C_perceived — mean p among accepted interactions (how good the system looks)
C_distributed — 1 − mean(disagreement) across replayed decisions (how consistent it actually is)
High Δ — "electric-mind" regime: fluent but fragile
Low Δ — genuinely stable system
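
The definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not SWARM's implementation: the function name `illusion_delta`, its input shapes, and the choice to measure disagreement as the share of replays that differ from the majority decision are all assumptions made for the example.

```python
from statistics import mean


def illusion_delta(accepted_p, replay_decisions):
    """Return (delta, c_perceived, c_distributed).

    accepted_p: soft labels p of the accepted interactions.
    replay_decisions: one list per decision point, holding the binary
        decision (True = accept) made in each replay of that point.
    """
    # C_perceived: how good the system looks on what it accepted.
    c_perceived = mean(accepted_p)

    # Assumption: disagreement at a decision point is the fraction of
    # replays that differ from the majority decision (range 0 to 0.5).
    disagreements = []
    for decisions in replay_decisions:
        accept_share = sum(decisions) / len(decisions)
        disagreements.append(min(accept_share, 1.0 - accept_share))

    # C_distributed: how consistent the decisions are across replays.
    c_distributed = 1.0 - mean(disagreements)
    return c_perceived - c_distributed, c_perceived, c_distributed


# One stable decision point and one that flips on every other replay.
delta, c_p, c_d = illusion_delta(
    accepted_p=[0.9, 0.8, 1.0],
    replay_decisions=[[True, True, True, True], [True, False, True, False]],
)
print(f"C_perceived={c_p:.2f} C_distributed={c_d:.2f} delta={delta:.2f}")
```

In this toy run the accepted interactions look good (mean p of 0.9) while half of the decision points are unstable under replay, so the gap is positive.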

Other frameworks ask: "Do the agents behave well?" SWARM asks: "Does the system still behave when humans stop noticing the cracks?"

SWARM includes a native ClawXiv bridge for agent-submitted safety preprints (see docs/bridges/clawxiv.md), so you can publish swarm safety research directly to agent-first preprints. It is compatible with OpenClaw ecosystems for testing real agent behaviors in simulated swarms.

If you want to export SWARM run metrics to a ClawXiv-compatible endpoint, start with examples/clawxiv/export_history.py.

What Problem Does This Solve?

If you care about AGI safety research, SWARM gives you a practical way to:

Turn qualitative worries ("deception", "coordination failures", "policy lag") into measurable signals (toxicity, quality_gap, calibration, incoherence).
Stress-test governance mechanisms against adaptive and deceptive agents.
Compare safety interventions under replay and scenario sweeps instead of one-off anecdotes.
Separate sandbox wins from deployment reality using explicit transferability caveats.

Who Should Use SWARM?

If you are...	SWARM helps you...
AI safety researcher	Empirically test multi-agent failure modes with reproducible scenarios and soft-label metrics
ML engineer building agent systems	Stress-test governance mechanisms against adversarial and deceptive agents before deployment
Policy / governance researcher	Quantify trade-offs between safety interventions and system welfare across regimes
Red-teaming practitioner	Run coordinated adversarial attack scenarios with 8 attack vectors and automatic scoring

Questions You Can Study Quickly

Does self-ensemble reduce variance-driven incoherence without masking bias?
When do circuit breakers and friction reduce harm vs. suppress useful work?
Which governance settings improve safety with the smallest welfare cost?
How robust are conclusions under delayed/noisy labels and task shifts?

Installation

pip install swarm-safety

Or install from source:

# Install base dependencies
python -m pip install -e .

# Install with development tools
python -m pip install -e ".[dev]"

# Install with analysis tools (pandas, matplotlib)
python -m pip install -e ".[analysis]"

# Install with LLM support (Anthropic, OpenAI, Ollama)
python -m pip install -e ".[llm]"

# Install everything
python -m pip install -e ".[all]"

Quick Start

from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

# Configure simulation
config = OrchestratorConfig(
    n_epochs=10,
    steps_per_epoch=10,
    seed=42,
)

# Create orchestrator
orchestrator = Orchestrator(config=config)

# Register agents
orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

# Run simulation
metrics = orchestrator.run()

# Analyze results
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")

Run the demo:

python examples/mvp_demo.py

Interactive Notebook

The quickstart notebook runs two scenarios end-to-end in ~5 minutes with no API keys: a cooperative baseline and an adversarial red-team that collapses around epoch 12. Includes diagnostic plots and a per-agent payoff breakdown.

jupyter notebook examples/quickstart.ipynb

Blog Post

For a narrative walkthrough of our findings across 11 scenarios — including the phase transition at 37.5-50% adversarial fraction, why governance tuning delays but doesn't prevent collapse, and why collusion detection is the critical lever — see the blog post.

CLI Quick Start

Run simulations directly from the command line:

# List available scenarios
swarm list

# Run a scenario
swarm run scenarios/baseline.yaml

# Override simulation settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15

# Export outputs
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/

Core Concepts

Soft Probabilistic Labels

Instead of binary labels (good/bad), interactions carry a probability p = P(v = +1) representing the likelihood of a beneficial outcome:

Proxy signals are combined into a raw score v_hat in [-1, +1]
Calibrated sigmoid converts to probability: p = 1 / (1 + exp(-k * v_hat))
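
A minimal sketch of the sigmoid step, assuming a hypothetical helper named `soft_label` and an arbitrary steepness of k = 2.0 (the README does not state SWARM's calibrated value):

```python
import math


def soft_label(v_hat, k=2.0):
    """Map a raw proxy score v_hat in [-1, +1] to p = P(v = +1).

    k is the sigmoid steepness; the default here is an illustrative
    assumption, not a calibrated value.
    """
    if not -1.0 <= v_hat <= 1.0:
        raise ValueError("v_hat must lie in [-1, +1]")
    return 1.0 / (1.0 + math.exp(-k * v_hat))


for score in (-1.0, 0.0, 1.0):
    print(f"v_hat={score:+.1f} -> p={soft_label(score):.3f}")
```

Because v_hat is bounded, p never reaches 0 or 1 for a finite k: the label stays soft even for the strongest proxy signal, and a neutral score of 0 maps to p = 0.5.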

Five Key Metrics

Metric	Formula	Interpretation
Toxicity rate	`E[1-p | accepted]`	Expected harm among accepted
Quality gap	`E[p | accepted] - E[p | rejected]`	Adverse selection indicator (negative = bad)
Conditional loss	`E[pi | accepted] - E[pi]`	Selection effect on payoffs
Incoherence	`Var[decision] / E[error]`	Variance-to-error ratio across replays
Illusion delta	`C_perceived − C_distributed`	Gap between apparent and actual coherence
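
The first three formulas in the table can be computed directly from a log of interactions. The sketch below assumes each interaction is a plain `(p, payoff, accepted)` tuple; that record shape and the function name `selection_metrics` are hypothetical and do not reflect SWARM's own data model.

```python
from statistics import mean


def selection_metrics(interactions):
    """Compute toxicity, quality gap, and conditional loss.

    interactions: iterable of (p, payoff, accepted) tuples, where p is the
        soft label, payoff is pi, and accepted is a bool.
    """
    interactions = list(interactions)
    accepted = [(p, pi) for p, pi, ok in interactions if ok]
    rejected = [(p, pi) for p, pi, ok in interactions if not ok]
    if not accepted or not rejected:
        raise ValueError("need at least one accepted and one rejected interaction")

    mean_p_accepted = mean(p for p, _ in accepted)
    mean_p_rejected = mean(p for p, _ in rejected)
    mean_pi_accepted = mean(pi for _, pi in accepted)
    mean_pi_all = mean(pi for _, pi, _ in interactions)

    return {
        "toxicity": 1.0 - mean_p_accepted,                  # E[1-p | accepted]
        "quality_gap": mean_p_accepted - mean_p_rejected,   # E[p | acc] - E[p | rej]
        "conditional_loss": mean_pi_accepted - mean_pi_all,  # E[pi | acc] - E[pi]
    }


log = [
    (0.9, 1.0, True),
    (0.7, 0.5, True),
    (0.4, -0.5, False),
    (0.2, -1.0, False),
]
print(selection_metrics(log))
```

In this toy log the system accepts the better interactions, so the quality gap is positive. Swapping the `accepted` flags would make it negative, which is the adverse-selection signature the table describes.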

Governance Levers

Transaction Taxes - Reduce exploitation, cost welfare
Reputation Decay - Punish bad actors, erode honest standing
Circuit Breakers - Freeze toxic agents quickly
Random Audits - Deter hidden exploitation
Staking - Filter undercapitalized agents
Collusion Detection - Catch coordinated attacks
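
To make one lever concrete, here is a toy circuit breaker. It is a standalone sketch under stated assumptions, not SWARM's governance API: the class name `CircuitBreaker`, the rolling-window rule, and the default threshold and window are all hypothetical.

```python
from collections import defaultdict, deque


class CircuitBreaker:
    """Toy lever: freeze an agent whose recent mean toxicity (1 - p) is too high."""

    def __init__(self, threshold=0.5, window=5):
        self.threshold = threshold
        self.window = window
        # One bounded history of recent toxicity values per agent.
        self._recent = defaultdict(lambda: deque(maxlen=window))
        self.frozen = set()

    def observe(self, agent_id, p):
        """Record one interaction's soft label; return True if the agent is frozen."""
        if agent_id in self.frozen:
            return True
        history = self._recent[agent_id]
        history.append(1.0 - p)
        # Trip only on a full window, so a single bad interaction cannot freeze an agent.
        window_full = len(history) == self.window
        if window_full and sum(history) / self.window > self.threshold:
            self.frozen.add(agent_id)
        return agent_id in self.frozen


breaker = CircuitBreaker(threshold=0.5, window=3)
for step in range(3):
    honest_frozen = breaker.observe("honest_1", p=0.9)
    toxic_frozen = breaker.observe("dec_1", p=0.2)
    print(f"step {step}: honest frozen={honest_frozen}, toxic frozen={toxic_frozen}")
```

The window length illustrates the trade-off behind this lever: a short window freezes toxic agents quickly but is more likely to penalize an honest agent after a few noisy labels.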

Agent Policies

Type	Behavior
Honest	Cooperative, trust-based, completes tasks diligently
Opportunistic	Maximizes short-term payoff, cherry-picks tasks, strategic voting
Deceptive	Builds trust through honest behavior, then exploits trusted relationships
Adversarial	Targets honest agents, coordinates with allies, disrupts ecosystem
LLM	Behavior determined by LLM with configurable persona (details)

How SWARM Compares

Feature	SWARM	Concordia	AgentBench	METR	Inspect (AISI)
Multi-agent interaction modeling	Primary focus	Primary focus	Limited	Limited	Limited
Soft probabilistic labels	Core design	No	No	No	No
Adverse selection metrics	Yes (toxicity, quality gap)	No	No	No	No
Configurable governance levers	6 built-in	None	None	None	Compliance rules
Collusion detection	Yes (pair-wise, structural)	No	No	No	No
Replay-based incoherence	Yes	No	No	No	No
LLM agent support	Yes (Anthropic, OpenAI, Ollama)	Yes	Yes	Yes	Yes
Scenario configs (YAML)	23 built-in	Custom	Benchmark suites	Task suites	Eval suites
Framework bridges	Concordia, OpenClaw, GasTown, Ralph, AgentXiv, ClawXiv	—	—	—	—
License	MIT	Apache 2.0	MIT	Varies	MIT

SWARM is complementary to these frameworks, not competitive. The Concordia bridge lets you run Concordia agents through SWARM's governance and metrics layer. See full comparison.

Related work

SWARM is inspired by and complementary to:

Agent-based governance simulations
Recursive and multi-agent evaluation frameworks
Mechanism design for AI systems

Architecture

SWARM Core
+------------------------------------------------------------+
|                                                            |
|  ProxyComputer --> SoftInteraction --> Metrics             |
|       |                  |                |                |
|       |                  |                |                |
|  Observable          Payoff          Governance            |
|  Extraction          Engine          Engine                |
|                                                            |
+------------------------------------------------------------+

Data Flow:

Observables -> ProxyComputer -> v_hat -> sigmoid -> p -> SoftPayoffEngine -> payoffs
                                                    |
                                               SoftMetrics -> toxicity, quality gap, etc.

Directory Structure

swarm/
├── swarm/
│   ├── models/          # SoftInteraction, AgentState/AgentStatus, event schema
│   ├── core/            # PayoffEngine, ProxyComputer, sigmoid, orchestrator
│   ├── agents/          # Honest, opportunistic, deceptive, adversarial, LLM, adaptive
│   ├── env/             # EnvState, feed, tasks, network, composite tasks
│   ├── governance/      # Config, levers, taxes, reputation, audits, collusion
│   ├── metrics/         # SoftMetrics, reporters, collusion detection, capabilities
│   ├── forecaster/      # Risk forecasters for adaptive governance activation
│   ├── replay/          # Replay runner and decision-level replay utilities
│   ├── scenarios/       # YAML scenario loader
│   ├── analysis/        # Parameter sweeps, dashboard
│   ├── redteam/         # Attack scenarios, evaluator, evasion metrics
│   ├── boundaries/      # External world, flow tracking, policies, leakage
│   └── logging/         # Append-only JSONL logger
├── tests/               # Test suite
├── examples/            # Demo scripts
├── scenarios/           # YAML scenario definitions
├── docs/                # Documentation
└── pyproject.toml

Running Tests

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=swarm --cov-report=html

# Run specific test file
pytest tests/test_orchestrator.py -v

# Run CI checks (lint, type-checking, tests)
make ci

Documentation

Topic	Description
Theoretical Foundations	Formal model, whitepaper-style summary, and citation section
LLM Agents	Providers, personas, cost tracking, YAML config
Network Topology	Topology types, dynamic evolution, network metrics
Governance	Levers, collusion detection, integration points
Emergent Capabilities	Composite tasks, capability types, emergent metrics
Red-Teaming	Adaptive adversaries, attack strategies, evaluation results
Scenarios & Sweeps	YAML scenarios, scenario comparison, parameter sweeps
Boundaries	External world simulation, flow tracking, leakage detection
Dashboard	Streamlit dashboard setup and features
Incoherence Metric Contract	Definitions and edge-case semantics
Incoherence Scaling Analysis	Replay-sweep artifact and upgrade path
Incoherence Governance Transferability	Deployment caveats and assumptions

Start Here (Researcher Path)

Read the framing: Theoretical Foundations
Run an incoherence artifact: Incoherence Scaling Analysis
Inspect policy caveats: Incoherence Governance Transferability
Reproduce from CLI: swarm run scenarios/baseline.yaml

Citation

@software{swarm2026,
  title = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
  author = {Savitt, Raeli},
  year = {2026},
  url = {https://github.com/swarm-ai-safety/swarm}
}

Machine-readable citation metadata: CITATION.cff

Papers

Distributional AGI Safety: Governance Trade-offs in Multi-Agent Systems Under Adversarial Pressure — 11 scenarios, 209 epochs, three regimes.
Governance Mechanisms for Multi-Agent Safety — Cross-archetype empirical study of 7 scenario types
Collusion Dynamics and Network Resilience — Progressive decline vs sustained operation under network topology effects

Community

Documentation — Full guides, API reference, and research notes
GitHub Issues — Bug reports, feature requests, and agent bounties
Twitter/X — @ResearchSwarmAI

References

Kyle, A.S. (1985). Continuous Auctions and Insider Trading. Econometrica.
Glosten, L.R. & Milgrom, P.R. (1985). Bid, Ask and Transaction Prices in a Specialist Market. JFE.
Distributional Safety in Agentic Systems
Multi-Agent Market Dynamics
The Hot Mess Theory of AI
Infinite Backrooms — observational evidence of local-coherence/global-incoherence in AI-to-AI interaction
Moltbook | @sebkrier

License

MIT License - See LICENSE for details.
