The core execution loop lives in agent/core/loop.py. Guardrails logic is
isolated in agent/core/guardrails.py and configured via GuardrailsConfig
inside AgentConfig.
User message
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ LOOP ITERATION │
│ │
│ ┌─ Pre-flight checks ───────────────────────────────────────┐ │
│ │ • cancel_event set? → CANCELLED │ │
│ │ • elapsed > timeout? → TIMED_OUT (skipped if timeout=0) │ │
│ │ • step_count ≥ max_steps? → MAX_STEPS │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─ Context management ──────────────────────────────────────┐ │
│ │ context_window > 0 & strategy="sliding_window" │ │
│ │ → trim_to_token_budget(messages, context_window) │ │
│ │ context_window > 0 & strategy="compact" │ │
│ │ → compact_to_token_budget(messages, context_window) │ │
│ │ keeps first msg + last N msgs; inserts marker │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─ Provider request (with retries) ─────────────────────────┐ │
│ │ • streaming or complete() │ │
│ │ • exponential back-off with jitter on recoverable errors │ │
│ │ • max_retries attempts before ERROR state │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─ Budget accounting ───────────────────────────────────────┐ │
│ │ • accumulate input/output tokens → session.total_tokens │ │
│ │ • calculate cost → session.total_cost │ │
│ │ • token_budget exceeded? → BUDGET_EXCEEDED │ │
│ │ • cost_limit exceeded? → BUDGET_EXCEEDED │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────┴────────────────┐ │
│ │ tool_calls present? │ │
│ YES NO │
│ │ │ │
│ ▼ ▼ │
│ ┌─ GUARDRAIL ──────────┐ ┌─ stop_reason? ───────────────┐ │
│ │ Repetition detection │ │ END_TURN / MAX_TOKENS │ │
│ │ │ │ │ │
│ │ observe_tool_calls() │ │ MAX_TOKENS + recoveries left?│ │
│ │ is_stuck()? │ │ YES → inject continuation │ │
│ │ YES → ERROR state │ │ message & continue │ │
│ │ NO → execute tools│ │ NO → done = True │ │
│ └──────────────────────┘ └──────────────────────────────┘ │
│ │ │ │
│ ▼ │ │
│ loop continues ◄─────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
│
▼
COMPLETED / ERROR
Guardrails are mechanical safety nets — they catch runaway loops and truncated responses without needing the LLM to self-regulate.
All mutable counters live in session.metadata["guardrails"] so they are
automatically persisted and restored across session reloads.
| Config key | Default | Effect |
|---|---|---|
max_tokens_recoveries |
2 |
How many times a max_tokens truncation is auto-recovered |
reserve_tokens |
512 |
If remaining tokens ≤ this, near_budget() returns True |
Behaviour:
Model stops with stop_reason = "max_tokens"
└─ recovery_count < max_tokens_recoveries?
YES → inject continuation prompt + loop again
"Continue from exactly where you left off…"
NO → treat as normal END_TURN, exit loop
The injected prompt (guardrails.max_tokens_followup()) is added as an
internal user message (tagged data["reason"] = "max_tokens_recovery") and is
not shown in the UI.
| Config key | Default | Effect |
|---|---|---|
max_repeated_tool_steps |
3 |
Consecutive identical tool-call sets before the loop aborts |
Behaviour:
Each step with tool calls:
observe_tool_calls(session, tool_calls)
→ compute deterministic signature (tool names + argument key=value pairs)
→ same as last step? repeated_tool_steps += 1
→ different? repeated_tool_steps = 0, update signature
is_stuck()?
repeated_tool_steps ≥ max_repeated_tool_steps
YES → emit ErrorEvent, set state = ERROR, return session
The signature includes argument values (truncated at 200 chars) so calling the same tool on different files is not counted as repetition.
| Config key | Default | Effect |
|---|---|---|
reserve_tokens |
512 |
Token headroom before near_budget() fires |
reserve_cost_fraction |
0.1 |
Fraction of cost_limit treated as reserve |
near_budget() is available for callers that want to warn or slow down before
hitting hard limits. Hard budget enforcement (token_budget, cost_limit) is
handled directly in the loop — the agent exits with BUDGET_EXCEEDED state.
"guardrails": {
"max_tokens_recoveries": 2, // auto-retry truncated responses (0 = off)
"max_repeated_tool_steps": 3, // consecutive identical tool calls before abort
"reserve_tokens": 512, // near_budget() token headroom
"reserve_cost_fraction": 0.1 // near_budget() cost headroom (fraction of cost_limit)
}| Use case | Suggested adjustments |
|---|---|
| Cloud provider, tight budget | Lower reserve_cost_fraction to 0.15–0.2 to warn earlier |
| Long reasoning models (DeepSeek) | Raise max_tokens_recoveries to 3 — truncation is more likely |
| Autonomous/high-step runs | Keep max_repeated_tool_steps at 3; lower only if you trust the model |
| Local models, no cost limit | reserve_cost_fraction has no effect (cost_limit = 0) |
┌──────────────────────────────────────────────┐
│ AgentState │
│ │
start ──────► RUNNING ──────────────────────► COMPLETED │
│ │
├── cancel_event set ────────► CANCELLED │
├── elapsed > timeout > 0 ──► TIMED_OUT │
├── step_count ≥ max_steps ──► MAX_STEPS │
├── budget exceeded ─────────► BUDGET_EXCEEDED │
├── repetition guard ────────► ERROR │
├── provider error (fatal) ──► ERROR │
│ │
├─────► WAITING_FOR_TOOL ───────┐ │
│ (tool execution) │ │
└───────────────────────────────┘ │
│ │
├─────► WAITING_FOR_INPUT │
│ (interactive transports) │
│ │
└──────────────────────────────────────────────┘
_consume_stream() in agent/core/provider_runner.py wraps the async for
delta loop in try/finally so it emits exactly one
StreamChunk(finished=True) event for every stream, including:
- Streams that raise mid-iteration (caught by the outer retry loop; the
finished=Truestill fires before the exception propagates) - Streams that close without a terminal
done=Truedelta (logged at WARNING; the loop falls back toEND_TURNorTOOL_USEas appropriate)
Without this, SSE transports and TUI consumers that block on the end marker would hang after a misbehaving provider.
| File | Purpose |
|---|---|
agent/core/loop.py |
Main run_loop() coroutine — only control flow, nothing else |
agent/core/provider_runner.py |
Provider request + retry, streaming consumption, error translation |
agent/core/loop_helpers.py |
Event emission, usage/budget accounting, parse_stop, internal messages |
agent/core/guardrails.py |
LoopGuardrails, GuardrailsConfig |
agent/core/config.py |
AgentConfig — all config keys |
agent/core/state.py |
AgentState enum |
config/samples/ |
Ready-to-use config files per provider |