Tolokaforge evaluates untrusted LLM agents. This document describes the security boundaries that exist today and the threat model they address.
Tolokaforge uses Docker-based isolation for all tool execution. The orchestrator runs on the host (or in its own container) and proxies tool execution to a containerised executor via gRPC. Environment services run on an internal Docker network.

    ┌─────────────────┐
    │  Orchestrator   │ ← LLM API access (default network)
    └────────┬────────┘
             │ gRPC (env-net)
    ┌────────▼────────┐
    │    Executor     │ ← cap_drop: ALL, no-new-privileges
    └────────┬────────┘
             │ HTTP (env-net, internal: true)
        ┌────▼─────┬──────────┬──────────┐
        │ JSON DB  │ Mock Web │   RAG    │
        └──────────┴──────────┴──────────┘

This architecture provides:
- Executor runs with cap_drop: ALL, cap_add: NET_BIND_SERVICE, security_opt: no-new-privileges. It can only reach services on env-net.
- env-net is an internal bridge network (internal: true) with no external internet access.
- Orchestrator is the only component with external network access (for LLM API calls).
- Environment services (JSON DB, Mock Web, RAG) are on env-net and also exposed to localhost for development convenience.
See tolokaforge/docker/stacks/ for the predefined service stack definitions.
Tasks declare exactly which tools the agent and user can call:

    tools:
      agent:
        enabled: ["db_query", "browser", "read_file"]
      user:
        enabled: ["user_check_device"]

The ToolExecutor enforces:
- Only registered tools can be called
- Arguments are validated against JSON schemas (additionalProperties: false)
- Missing required parameters are rejected
- Per-tool rate limits are enforced (configurable via ToolPolicy.rate_limit)
- Per-tool timeouts are enforced (configurable via ToolPolicy.timeout_s, default 30s)
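
The first three checks above can be sketched in a few lines of Python. This is a minimal illustration, not the actual ToolExecutor code: `TOOL_SCHEMAS`, `ToolCallRejected`, and `check_tool_call` are hypothetical names, and the schema is simplified to Python types rather than full JSON Schema.

```python
from typing import Any

# Hypothetical registry: tool name -> simplified schema (not real JSON Schema).
TOOL_SCHEMAS: dict[str, dict[str, Any]] = {
    "db_query": {
        "properties": {"sql": str, "limit": int},
        "required": ["sql"],
    },
}


class ToolCallRejected(Exception):
    """Raised when a call fails the allowlist or argument checks."""


def check_tool_call(name: str, args: dict[str, Any], enabled: set[str]) -> None:
    # 1. Only tools that are both enabled for the task and registered may run.
    if name not in enabled or name not in TOOL_SCHEMAS:
        raise ToolCallRejected(f"tool not allowed: {name}")
    schema = TOOL_SCHEMAS[name]

    # 2. Mirror additionalProperties: false by rejecting unknown keys.
    unknown = set(args) - set(schema["properties"])
    if unknown:
        raise ToolCallRejected(f"unexpected arguments: {sorted(unknown)}")

    # 3. Reject calls that omit a required parameter.
    missing = [key for key in schema["required"] if key not in args]
    if missing:
        raise ToolCallRejected(f"missing required arguments: {missing}")

    # 4. Basic type check for the arguments that were supplied.
    for key, value in args.items():
        if not isinstance(value, schema["properties"][key]):
            raise ToolCallRejected(f"wrong type for argument: {key}")
```

A call such as `check_tool_call("db_query", {"sql": "SELECT 1"}, {"db_query"})` passes silently, while a call to a tool outside the enabled set raises `ToolCallRejected` before any argument is inspected.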
Multiple timeout layers prevent runaway execution:
| Control | Default | Config path |
|---|---|---|
| Per-tool timeout | 30s | ToolPolicy.timeout_s |
| Per-turn timeout | 60s | orchestrator.timeouts.turn_s |
| Episode timeout | 1200s | orchestrator.timeouts.episode_s |
| Max turns | 50 | orchestrator.max_turns |
| Request throttle | 1.0/s | orchestrator.max_requests_per_second |
orchestrator.max_budget_usd sets a hard spend limit. The orchestrator tracks cumulative estimated cost and stops leasing new work when the cap is reached.
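
As a rough sketch of how such a cap can behave, the class below accumulates estimated cost and refuses new leases once the limit is reached. `BudgetTracker`, `record`, and `can_lease` are hypothetical names chosen for illustration; the orchestrator's real accounting may differ.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Hypothetical spend tracker; not the orchestrator's actual class."""

    max_budget_usd: float
    spent_usd: float = 0.0

    def record(self, estimated_cost_usd: float) -> None:
        # Accumulate the estimated cost of a completed LLM request.
        self.spent_usd += estimated_cost_usd

    def can_lease(self) -> bool:
        # New work is leased only while cumulative spend is below the cap.
        return self.spent_usd < self.max_budget_usd


tracker = BudgetTracker(max_budget_usd=1.0)
tracker.record(0.6)
still_open = tracker.can_lease()   # spend is below the cap
tracker.record(0.5)
now_closed = not tracker.can_lease()  # spend has passed the cap
```

In this sketch the check happens only when new work is leased, so a request already in flight can push the total slightly past the cap.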
Tool call arguments are logged with automatic redaction of keys containing password, token, secret, or api_key. See ToolExecutor._redact_sensitive() in tolokaforge/tools/registry.py.
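
The idea behind key-based redaction can be illustrated as follows. This is not the code in `ToolExecutor._redact_sensitive()`: the function names, the `[REDACTED]` placeholder, the case-insensitive match, and the recursion into nested values are all assumptions made for the sketch.

```python
from typing import Any

# Substrings named in the text above; matching is case-insensitive here (assumption).
SENSITIVE_MARKERS = ("password", "token", "secret", "api_key")


def _is_sensitive(key: Any) -> bool:
    return isinstance(key, str) and any(m in key.lower() for m in SENSITIVE_MARKERS)


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive dict entries masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if _is_sensitive(key) else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


args = {"user": "alice", "API_KEY": "sk-123", "nested": {"auth_token": "t", "n": 1}}
safe_args = redact(args)
```

The original `args` dict is left unchanged, so the tool still receives the real values and only the logged copy is masked.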
- API keys are host environment variables (loaded from .env). In Docker mode, only the orchestrator container receives them.
- Grading assets (grading.yaml, expected states) are read by the orchestrator after trial completion. They are never passed to the agent or exposed through tool outputs.
- .env is gitignored and never mounted into executor or environment containers.
The agent never sees grading criteria or expected outputs:
- grading.yaml and expected state files stay on the host
- Environment services reset per trial from fixtures
- Grading runs post-trial using host-side data only
- JSON DB state is namespaced per {task_id}_{trial_idx} so trials cannot interfere with each other
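
The per-trial namespacing and fixture reset can be pictured with a small in-memory stand-in. `NamespacedStore` and `trial_namespace` are hypothetical names, and the fixture contents are invented for the example; the real JSON DB service is not shown here.

```python
import copy
from typing import Any


def trial_namespace(task_id: str, trial_idx: int) -> str:
    """Build the {task_id}_{trial_idx} key described above."""
    return f"{task_id}_{trial_idx}"


class NamespacedStore:
    """Hypothetical in-memory stand-in for the JSON DB service."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, Any]] = {}

    def reset(self, namespace: str, fixture: dict[str, Any]) -> None:
        # Deep-copy so one trial's writes never reach the shared fixture.
        self._data[namespace] = copy.deepcopy(fixture)

    def get(self, namespace: str) -> dict[str, Any]:
        return self._data[namespace]


fixture = {"devices": [{"id": 1, "status": "offline"}]}
store = NamespacedStore()
first = trial_namespace("reset_router", 0)
second = trial_namespace("reset_router", 1)
store.reset(first, fixture)
store.reset(second, fixture)

# A write in the first trial is invisible to the second trial.
store.get(first)["devices"][0]["status"] = "online"
```

The deep copy matters in this sketch: without it, both namespaces would share the same nested objects and a write in one trial would show up in the other.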

The controls above address the following threats:

| Threat | Mitigation |
|---|---|
| Agent calling unauthorized tools | Tool allowlisting + schema validation |
| Runaway execution (cost/time) | Budget cap, episode timeout, max turns, per-tool timeout |
| Agent accessing grading criteria | Grading data is host-side only, never in tool outputs |
| Environment state leaking between trials | Per-trial namespace isolation in JSON DB |
| Executor reaching external internet (Docker mode) | env-net is internal: true |
| Executor privilege escalation (Docker mode) | cap_drop: ALL, no-new-privileges |
| Sensitive data in logs | Automatic key redaction in tool call logging |

The following threats are out of scope or not mitigated:

| Threat | Notes |
|---|---|
| Host-level Docker escapes | Out of scope; assumes Docker daemon is trusted |
| Supply chain attacks in task code | Task authors are assumed trusted |
| Side-channel attacks | Not mitigated |
| Agent exfiltrating data via LLM output | The orchestrator relays model output; no content filtering |
Security-related tests live in tests/integration/test_security.py:
- TestToolAllowlisting: unregistered tool rejection, schema validation, rate limiting
- TestDockerSecurity: verifies env-net is internal: true in docker-compose.yaml
- TestNetworkIsolation: executor connectivity checks (requires Docker)

Run them:

    # Tool-level tests (no Docker needed)
    uv run pytest tests/integration/test_security.py -v -k "Allowlisting"

    # Docker isolation tests (requires running containers)
    docker compose up -d
    uv run pytest tests/integration/test_security.py -v -m requires_docker

Before running evaluations:
- API keys in .env, not committed to git
- orchestrator.max_budget_usd set for long runs
- Task YAML uses minimal tool allowlist (don't enable bash unless needed)
- Ensure runtime: "docker" is set and Docker services are running (docker compose up -d)
- Verify env-net is internal: true in docker-compose.yaml
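
The last checklist item can also be checked in code. The helper below is a hypothetical sketch, not the TestDockerSecurity implementation: it takes a Compose file that has already been parsed into a dict and reports whether the named network is marked internal.

```python
from typing import Any


def network_is_internal(compose: dict[str, Any], network: str = "env-net") -> bool:
    """Return True only if the network exists and sets internal: true."""
    networks = compose.get("networks") or {}
    config = networks.get(network) or {}
    return config.get("internal") is True


# Example input shaped like a parsed docker-compose.yaml (contents are illustrative).
parsed = {
    "services": {"executor": {"networks": ["env-net"]}},
    "networks": {"env-net": {"driver": "bridge", "internal": True}},
}
is_isolated = network_is_internal(parsed)
```

One way to obtain the dict without a YAML dependency is to parse the output of `docker compose config --format json` with `json.loads`, assuming your Compose version supports that flag. A missing network or a missing `internal` key is treated as not isolated, which fails closed.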