Security Model

Tolokaforge evaluates untrusted LLM agents. This document describes the security boundaries that exist today and the threat model they address.

Architecture Overview

Tolokaforge uses Docker-based isolation for all tool execution. The orchestrator runs on the host (or in its own container) and proxies tool execution to a containerised executor via gRPC. Environment services run on an internal Docker network.

┌─────────────────┐
│  Orchestrator   │ ← LLM API access (default network)
└────────┬────────┘
         │ gRPC (env-net)
┌────────▼────────┐
│    Executor     │ ← cap_drop: ALL, no-new-privileges
└────────┬────────┘
         │ HTTP (env-net, internal: true)
    ┌────▼─────┬──────────┬──────────┐
    │ JSON DB  │ Mock Web │   RAG    │
    └──────────┴──────────┴──────────┘

This architecture provides the following boundaries:

The executor runs with cap_drop: ALL, cap_add: NET_BIND_SERVICE, and security_opt: no-new-privileges. It can reach only services on env-net.
env-net is an internal bridge network (internal: true) with no external internet access.
The orchestrator is the only component with external network access (for LLM API calls).
Environment services (JSON DB, Mock Web, RAG) sit on env-net and are also exposed on localhost for development convenience.

See tolokaforge/docker/stacks/ for the predefined service stack definitions.
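
The isolation settings described above can be checked mechanically. The sketch below inspects a Compose definition that has already been parsed into a Python dict. The service name executor, the helper check_isolation, and the example dict are assumptions made for illustration; the real stack files may be laid out differently.

```python
def check_isolation(compose: dict, executor_service: str = "executor") -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # env-net must be declared internal so it has no external connectivity.
    network = compose.get("networks", {}).get("env-net") or {}
    if network.get("internal") is not True:
        problems.append("env-net is not marked internal: true")

    executor = compose.get("services", {}).get(executor_service) or {}

    # The executor must drop every capability and forbid privilege escalation.
    if "ALL" not in executor.get("cap_drop", []):
        problems.append("executor does not set cap_drop: ALL")
    if not any(str(opt).startswith("no-new-privileges")
               for opt in executor.get("security_opt", [])):
        problems.append("executor does not set no-new-privileges")

    # The executor should be attached to env-net and nothing else.
    # list() also handles the mapping form of the Compose "networks" key.
    if list(executor.get("networks", [])) != ["env-net"]:
        problems.append("executor is not attached to env-net only")

    return problems


# Hypothetical, already-parsed Compose content used only for this example.
EXAMPLE_COMPOSE = {
    "networks": {"env-net": {"driver": "bridge", "internal": True}},
    "services": {
        "executor": {
            "cap_drop": ["ALL"],
            "cap_add": ["NET_BIND_SERVICE"],
            "security_opt": ["no-new-privileges:true"],
            "networks": ["env-net"],
        }
    },
}

print(check_isolation(EXAMPLE_COMPOSE))  # prints []
```

A check of this kind only confirms what the file declares. It does not prove what a running container can actually reach, which is why the connectivity tests need Docker.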

Tool-Level Security

Tool Allowlisting

Tasks declare exactly which tools the agent and user can call:

tools:
  agent:
    enabled: ["db_query", "browser", "read_file"]
  user:
    enabled: ["user_check_device"]

The ToolExecutor enforces:

Only registered tools can be called
Arguments are validated against JSON schemas (additionalProperties: false)
Missing required parameters are rejected
Per-tool rate limits are enforced (configurable via ToolPolicy.rate_limit)
Per-tool timeouts are enforced (configurable via ToolPolicy.timeout_s, default 30s)
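
The first three rules above can be sketched in a few lines. This is not the ToolExecutor implementation: validate_call, ToolCallRejected, and the example schema are hypothetical names, and the sketch checks only key presence, not value types.

```python
class ToolCallRejected(Exception):
    """Raised when a tool call fails the allowlist or schema checks."""


def validate_call(name: str, args: dict, enabled: set, schemas: dict) -> None:
    # Rule 1: the tool must be allowlisted for this role and registered.
    if name not in enabled or name not in schemas:
        raise ToolCallRejected(f"tool not available: {name}")

    schema = schemas[name]

    # Rule 2: every required parameter must be present.
    missing = [p for p in schema.get("required", []) if p not in args]
    if missing:
        raise ToolCallRejected(f"missing required parameters: {missing}")

    # Rule 3: with additionalProperties: false, unknown keys are rejected.
    if schema.get("additionalProperties") is False:
        unknown = sorted(set(args) - set(schema.get("properties", {})))
        if unknown:
            raise ToolCallRejected(f"unexpected parameters: {unknown}")


# Hypothetical schema registry and allowlist for the example.
SCHEMAS = {
    "read_file": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
        "additionalProperties": False,
    }
}
AGENT_ENABLED = {"db_query", "browser", "read_file"}

validate_call("read_file", {"path": "notes.txt"}, AGENT_ENABLED, SCHEMAS)  # accepted
```

Rejecting unknown keys outright, rather than ignoring them, means an agent cannot pass extra arguments that the tool author never anticipated.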

Rate Limits and Timeouts

Multiple timeout layers prevent runaway execution:

Control	Default	Config path
Per-tool timeout	30s	`ToolPolicy.timeout_s`
Per-turn timeout	60s	`orchestrator.timeouts.turn_s`
Episode timeout	1200s	`orchestrator.timeouts.episode_s`
Max turns	50	`orchestrator.max_turns`
Request throttle	1.0/s	`orchestrator.max_requests_per_second`
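
One way to picture how the timeout layers interact is to compute how long a single tool call may run given the time already spent in the turn and the episode. The sketch below assumes the layers compose by taking the tightest remaining limit; that rule, the Timeouts dataclass, and tool_call_budget are assumptions for illustration, not the orchestrator's documented behaviour. Only the default values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    tool_s: float = 30.0       # per-tool timeout default
    turn_s: float = 60.0       # per-turn timeout default
    episode_s: float = 1200.0  # episode timeout default


def tool_call_budget(t: Timeouts, turn_elapsed_s: float,
                     episode_elapsed_s: float) -> float:
    """Seconds a tool call may run before some layer would expire."""
    remaining = min(
        t.tool_s,
        t.turn_s - turn_elapsed_s,
        t.episode_s - episode_elapsed_s,
    )
    # A layer that has already expired leaves no time at all.
    return max(0.0, remaining)


print(tool_call_budget(Timeouts(), 0.0, 0.0))      # 30.0 (tool limit is tightest)
print(tool_call_budget(Timeouts(), 45.0, 100.0))   # 15.0 (turn limit is tightest)
print(tool_call_budget(Timeouts(), 10.0, 1195.0))  # 5.0 (episode limit is tightest)
```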

Budget Cap

orchestrator.max_budget_usd sets a hard spend limit. The orchestrator tracks cumulative estimated cost and stops leasing new work when the cap is reached.
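
The behaviour described above amounts to a running total checked before each new lease. A minimal sketch follows, assuming a hypothetical BudgetTracker class; the real orchestrator's cost estimation and leasing logic are not shown in this document.

```python
from typing import Optional


class BudgetTracker:
    """Tracks cumulative estimated spend against an optional hard cap."""

    def __init__(self, max_budget_usd: Optional[float]) -> None:
        self.max_budget_usd = max_budget_usd  # None means no cap is set
        self.spent_usd = 0.0

    def record(self, estimated_cost_usd: float) -> None:
        # Costs are estimates, so the total can slightly overshoot the cap.
        self.spent_usd += estimated_cost_usd

    def can_lease(self) -> bool:
        # New work is refused as soon as the cap is reached.
        if self.max_budget_usd is None:
            return True
        return self.spent_usd < self.max_budget_usd


tracker = BudgetTracker(max_budget_usd=1.0)
tracker.record(0.5)
print(tracker.can_lease())  # True
tracker.record(0.5)
print(tracker.can_lease())  # False
```

Because the check happens when work is leased, a cap of this shape bounds the start of new work rather than the cost of work already in flight.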

Log Redaction

Tool call arguments are logged with automatic redaction of keys containing password, token, secret, or api_key. See ToolExecutor._redact_sensitive() in tolokaforge/tools/registry.py.
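
Key-based redaction of this kind can be sketched as follows. This is not the body of ToolExecutor._redact_sensitive(): the function name redact, the "[REDACTED]" placeholder, the case-insensitive match, and the recursion into nested values are all assumptions for illustration.

```python
SENSITIVE_MARKERS = ("password", "token", "secret", "api_key")


def _is_sensitive(key) -> bool:
    # Match when the key contains any marker, ignoring case.
    return isinstance(key, str) and any(m in key.lower() for m in SENSITIVE_MARKERS)


def redact(value):
    """Return a copy of value with sensitive dict entries masked."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if _is_sensitive(k) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


args = {"user": "alice", "db_password": "hunter2", "opts": {"API_KEY": "abc", "limit": 5}}
print(redact(args))
# {'user': 'alice', 'db_password': '[REDACTED]', 'opts': {'API_KEY': '[REDACTED]', 'limit': 5}}
```

Matching on key names catches only secrets stored under recognisable keys. A secret embedded in a free-text value passes through unchanged.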

Secret Management

API keys are host environment variables (loaded from .env). In Docker mode, only the orchestrator container receives them.
Grading assets (grading.yaml, expected states) are read by the orchestrator after trial completion. They are never passed to the agent or exposed through tool outputs.
.env is gitignored and never mounted into executor or environment containers.

Ground Truth Isolation

The agent never sees grading criteria or expected outputs:

grading.yaml and expected state files stay on the host
Environment services reset per trial from fixtures
Grading runs post-trial using host-side data only
JSON DB state is namespaced per {task_id}_{trial_idx} so trials cannot interfere with each other
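
The per-trial namespacing and fixture reset can be illustrated with a small in-memory store. Only the {task_id}_{trial_idx} key format comes from the text above; NamespacedStore, trial_namespace, and the fixture data are hypothetical stand-ins for the JSON DB service.

```python
import copy


def trial_namespace(task_id: str, trial_idx: int) -> str:
    """Build the namespace key in the {task_id}_{trial_idx} format."""
    return f"{task_id}_{trial_idx}"


class NamespacedStore:
    """In-memory stand-in for a state store keyed by trial namespace."""

    def __init__(self) -> None:
        self._data = {}

    def reset(self, namespace: str, fixtures: dict) -> None:
        # Deep copy so trials never share mutable fixture objects.
        self._data[namespace] = copy.deepcopy(fixtures)

    def get(self, namespace: str, key: str):
        return self._data[namespace][key]


fixtures = {"orders": [{"id": 1, "status": "open"}]}
store = NamespacedStore()
ns0 = trial_namespace("refund_task", 0)
ns1 = trial_namespace("refund_task", 1)
store.reset(ns0, fixtures)
store.reset(ns1, fixtures)

# A write made during trial 0 is invisible to trial 1.
store.get(ns0, "orders")[0]["status"] = "refunded"
print(store.get(ns1, "orders")[0]["status"])  # open
```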

Threat Model

Addressed

Threat	Mitigation
Agent calling unauthorized tools	Tool allowlisting + schema validation
Runaway execution (cost/time)	Budget cap, episode timeout, per-turn timeout, max turns, per-tool timeout
Agent accessing grading criteria	Grading data is host-side only, never in tool outputs
Environment state leaking between trials	Per-trial namespace isolation in JSON DB
Executor reaching external internet (Docker mode)	`env-net` is `internal: true`
Executor privilege escalation (Docker mode)	`cap_drop: ALL`, `no-new-privileges`
Sensitive data in logs	Automatic key redaction in tool call logging

Not Addressed

Threat	Notes
Host-level Docker escapes	Out of scope; assumes Docker daemon is trusted
Supply chain attacks in task code	Task authors are assumed trusted
Side-channel attacks	Not mitigated
Agent exfiltrating data via LLM output	The orchestrator relays model output; no content filtering

Testing

Security-related tests live in tests/integration/test_security.py:

TestToolAllowlisting — unregistered tool rejection, schema validation, rate limiting
TestDockerSecurity — verifies env-net is internal: true in docker-compose.yaml
TestNetworkIsolation — executor connectivity checks (requires Docker)

Run them:

# Tool-level tests (no Docker needed)
uv run pytest tests/integration/test_security.py -v -k "Allowlisting"

# Docker isolation tests (requires running containers)
docker compose up -d
uv run pytest tests/integration/test_security.py -v -m requires_docker

Security Checklist

Before running evaluations:

API keys are in .env and not committed to git
orchestrator.max_budget_usd is set for long runs
Task YAML uses a minimal tool allowlist (don't enable bash unless needed)
runtime: "docker" is set and Docker services are running (docker compose up -d)
env-net is internal: true in docker-compose.yaml
