The lightweight core AI agent trait and CLI engine.
servling is a Rust crate providing a standardized, resilient interface for AI agents that run as CLI tools. It's the "little servant" (servling) that handles the messy work of interacting with LLM-powered command-line interfaces.
servling now has two explicit execution lanes, plus a backend registry:
- Batch / turn lane via `TurnRunner` / `Servling`
- Interactive session lane via `SessionBackend`
- Backend registry via `BackendDescriptor` helpers for clean provider discovery and construction
Built originally as the core engine for high-reliability agent tasks, it manages streaming, timeouts, token usage tracking, and automatic fallback logic. That batch lane remains intact. The session lane is additive and capability-based.
- Two-Lane Execution Model: Batch turns stay separate from interactive sessions so real session transports do not get forced into one-shot abstractions.
- Shared Backend Metadata Trait: `Backend` defines provider identity, transport truth, and capability truth once for both lanes.
- Standardized Batch Trait: `TurnRunner` / `Servling` keeps the existing one-shot path stable.
- Optional Interactive Backends: `SessionBackend` is only implemented where the provider actually supports interactive transport semantics.
- Resilient Fallbacks: Automatic "chain-of-command" logic. If `Claude` is rate-limited, `servling` can automatically fall back to `Copilot` or `Codex` without missing a beat.
- Provider-Pinned Sessions: Interactive sessions do not silently migrate across providers.
- Observability: Built-in `stderr` parsing for real-time token usage, cost estimation (USD), and efficiency ratings.
- Live Streaming: Full support for real-time output streaming from underlying CLI processes.
- Mission Control: Handles standard mission/task structures, timeouts, and outcome classifications (Ok, Failed, Timeout, RateLimited).
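
The fallback behaviour described above can be pictured with a small, self-contained sketch. Every name in it (`Outcome`, `Agent`, `Stub`, `run_with_fallback`, `demo_chain`) is hypothetical and is not taken from servling's API. It also assumes that only a rate-limited outcome advances the chain, which may differ from the crate's actual rules.

```rust
// Hypothetical types for illustration only; not servling's real API.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Ok(String),
    Failed(String),
    Timeout,
    RateLimited,
}

trait Agent {
    fn name(&self) -> &str;
    fn run(&self, prompt: &str) -> Outcome;
}

/// A stub agent that always returns the same outcome.
struct Stub {
    name: &'static str,
    outcome: Outcome,
}

impl Agent for Stub {
    fn name(&self) -> &str {
        self.name
    }
    fn run(&self, _prompt: &str) -> Outcome {
        self.outcome.clone()
    }
}

/// Try each agent in order. Assumption: only `RateLimited` moves on to the
/// next agent; any other outcome is returned immediately.
fn run_with_fallback(chain: &[Box<dyn Agent>], prompt: &str) -> (String, Outcome) {
    let mut last = (
        String::from("<empty chain>"),
        Outcome::Failed("no agents configured".to_string()),
    );
    for agent in chain {
        last = (agent.name().to_string(), agent.run(prompt));
        if last.1 != Outcome::RateLimited {
            break;
        }
    }
    last
}

/// A three-agent chain in which the first agent is rate-limited.
fn demo_chain() -> Vec<Box<dyn Agent>> {
    vec![
        Box::new(Stub { name: "claude", outcome: Outcome::RateLimited }),
        Box::new(Stub { name: "copilot", outcome: Outcome::Ok("done".to_string()) }),
        Box::new(Stub { name: "codex", outcome: Outcome::Ok("unused".to_string()) }),
    ]
}

fn main() {
    let (who, outcome) = run_with_fallback(&demo_chain(), "Refactor this function.");
    println!("{who} -> {outcome:?}");
}
```

Returning the name of the agent that answered alongside the outcome keeps the fallback visible to the caller instead of hiding it.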
| Backend | CLI Tool |
|---|---|
| Claude | Claude Code |
| Copilot | GitHub Copilot CLI |
| Codex | OpenAI Codex / Generic CLI Wrappers |
| Provider | Transport | Status |
|---|---|---|
| Copilot | `cli_jsonrpc` over ACP (`copilot --acp`) | Implemented |
| Claude | N/A | Batch only |
| Codex | `cli_resumable_turns` via `codex exec resume --json` | Implemented |
Add servling to your Cargo.toml:
```toml
[dependencies]
servling = { path = "../servling" }
anyhow = "1" # the example below returns anyhow::Result
```

```rust
use servling::{build_servling, LLMRequest};
use std::path::PathBuf;

fn main() -> anyhow::Result<()> {
    // 1. Build a backend (or a chain of backends!)
    let agent = build_servling("codex", None)?;

    // 2. Prepare a request
    let request = LLMRequest {
        prompt: "Refactor this function for better performance.".to_string(),
        working_dir: PathBuf::from("."),
        source_writable_roots: vec![PathBuf::from(".")],
        runtime_writable_roots: Vec::new(),
        runtime_env: Vec::new(),
        runtime_profile: None,
        model: None,
        max_runtime_seconds: 300,
        stream_output: true,
        input_file: None,
    };

    // 3. Execute!
    let response = agent.execute(&request)?;
    println!("Response: {}", response.text);

    // Token usage is optional: print it only when the backend reported it.
    if let Some(usage) = response.token_usage {
        println!("{}", usage.to_display_line());
    }

    Ok(())
}
```

servling doesn't just run agents; it watches them. It parses CLI output to provide detailed stats:
📊 Tokens: 1.2M in, 8.5k out (935.5k cached) | Model: sonnet | Premium: 3
It can even calculate the estimated cost of your session and rate your efficiency from Excellent ✨ to Critical ❌.
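
As a rough illustration of how a stats line like the one above could be assembled, here is a minimal sketch. `UsageSketch`, `human`, `display_line`, and `estimate_cost_usd` are hypothetical names, not servling's types. The per-million-token prices are placeholder inputs supplied by the caller, and the cost formula ignores cached tokens for simplicity.

```rust
/// Hypothetical usage record; field names are assumptions for illustration.
struct UsageSketch {
    input_tokens: u64,
    output_tokens: u64,
    cached_tokens: u64,
    model: String,
    premium_requests: u32,
}

/// Format a count with one decimal place and a k/M suffix.
fn human(n: u64) -> String {
    if n >= 1_000_000 {
        format!("{:.1}M", n as f64 / 1_000_000.0)
    } else if n >= 1_000 {
        format!("{:.1}k", n as f64 / 1_000.0)
    } else {
        n.to_string()
    }
}

/// Render a one-line summary in the style of the stats line shown above.
fn display_line(u: &UsageSketch) -> String {
    format!(
        "📊 Tokens: {} in, {} out ({} cached) | Model: {} | Premium: {}",
        human(u.input_tokens),
        human(u.output_tokens),
        human(u.cached_tokens),
        u.model,
        u.premium_requests
    )
}

/// Estimate cost in USD from caller-supplied prices per million tokens.
/// Simplification: cached tokens are not priced separately.
fn estimate_cost_usd(u: &UsageSketch, in_price_per_m: f64, out_price_per_m: f64) -> f64 {
    (u.input_tokens as f64 * in_price_per_m + u.output_tokens as f64 * out_price_per_m)
        / 1_000_000.0
}

fn main() {
    let usage = UsageSketch {
        input_tokens: 1_200_000,
        output_tokens: 8_500,
        cached_tokens: 935_500,
        model: "sonnet".to_string(),
        premium_requests: 3,
    };
    println!("{}", display_line(&usage));
    // The prices here are placeholders, not real provider pricing.
    println!("estimated cost: ${:.4}", estimate_cost_usd(&usage, 1.0, 2.0));
}
```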
servling doesn't just call a library; it orchestrates full-blown CLI processes. Here's what happens when you call agent.execute():
- Command Expansion: It takes a template (e.g., `claude --print`) and dynamically injects parameters like `{input_file}`, `{mission_dir}`, and the requested `--model`.
- Subprocess Management: It spawns the CLI tool using `std::process::Command`, carefully piping `stdin`, `stdout`, and `stderr`.
- IO Orchestration:
  - Stdin: If the agent needs a prompt, `servling` writes it directly to the process's standard input.
  - Stdout: Captures the agent's response. If `stream_output` is enabled, it renders a cleaned-up version of the response (e.g., stripping JSON noise from Claude's output) in real-time.
  - Stderr: This is the "metadata channel." `servling` scans this stream with optimized regex to extract token usage, premium request counts, and model names.
- Resilience: It monitors the process for timeouts and specific error patterns (like rate limits) to decide if it should trigger an automatic fallback to the next agent in your chain.
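
The steps above can be sketched with the standard library alone. This is not servling's implementation: `expand_command`, `run_with_stdin`, and `looks_rate_limited` are hypothetical helpers, appending `--model` at the end of the argument list is an assumption, and the rate-limit phrases are illustrative guesses rather than the patterns the crate matches. Timeout handling is omitted.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Expand `{key}` placeholders token by token, so a value containing spaces
/// stays a single argument. Appending `--model` is an assumption of this sketch.
fn expand_command(template: &str, vars: &[(&str, &str)], model: Option<&str>) -> Vec<String> {
    let mut argv: Vec<String> = template
        .split_whitespace()
        .map(|token| {
            let mut arg = token.to_string();
            for (key, value) in vars {
                arg = arg.replace(&format!("{{{key}}}"), value);
            }
            arg
        })
        .collect();
    if let Some(m) = model {
        argv.push("--model".to_string());
        argv.push(m.to_string());
    }
    argv
}

/// Spawn the command with all three streams piped, write the prompt to stdin,
/// and collect stdout and stderr once the process exits.
fn run_with_stdin(argv: &[String], prompt: &str) -> std::io::Result<(String, String)> {
    let (program, args) = argv.split_first().ok_or_else(|| {
        std::io::Error::new(std::io::ErrorKind::InvalidInput, "empty command")
    })?;
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    // Write the prompt, then drop the handle so the child sees end-of-input.
    if let Some(mut stdin) = child.stdin.take() {
        stdin.write_all(prompt.as_bytes())?;
    }
    let output = child.wait_with_output()?;
    Ok((
        String::from_utf8_lossy(&output.stdout).into_owned(),
        String::from_utf8_lossy(&output.stderr).into_owned(),
    ))
}

/// Illustrative check for rate-limit wording on the metadata channel.
fn looks_rate_limited(stderr: &str) -> bool {
    let lower = stderr.to_lowercase();
    ["rate limit", "too many requests"].iter().any(|p| lower.contains(*p))
}

fn main() {
    let argv = expand_command(
        "claude --print {input_file}",
        &[("input_file", "prompt.md")],
        Some("sonnet"),
    );
    println!("{argv:?}");

    if cfg!(unix) {
        // `cat` stands in for a real agent CLI: it echoes stdin back to stdout.
        match run_with_stdin(&["cat".to_string()], "hello agent") {
            Ok((stdout, stderr)) => {
                println!("stdout={stdout:?} rate_limited={}", looks_rate_limited(&stderr))
            }
            Err(e) => eprintln!("spawn failed: {e}"),
        }
    }
}
```

Writing the whole prompt before reading any output is fine for short prompts. A production runner would read `stdout` and `stderr` concurrently so that a chatty child process cannot block on a full pipe.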
- `core.rs`: Shared `Backend` metadata trait plus batch lane traits, provider-neutral transport, and capability truth.
- `backend_registry.rs`: Declarative provider registry for batch/session construction.
- `runner.rs`: Low-level CLI execution, streaming, and timeout logic.
- `coding_agent.rs`: High-level orchestration and fallback chains.
- `session.rs`: Interactive session traits, handles, and bounded event model.
- `copilot_acp.rs`: Copilot ACP session backend over stdio JSON-RPC.
- `codex_session.rs`: Codex resumable-turn session backend over `codex exec --json`.
- `token_usage.rs`: Regex-powered parsing for AI provider output formats.
- Batch fallback is still valid and automatic.
- Interactive sessions are provider-pinned after creation.
- Capability differences are explicit through structured `ProviderCapabilities { batch, session }`.
- Provider selection is centralized through `BackendDescriptor` registry helpers instead of scattered factory switches.
- `servling` owns provider/session transport behavior, not durable operator truth.
- `workroach` remains the intended future live process host for coding sessions.
- `orchlord` should only persist coarse provider/transport/session-ref truth, not ACP protocol details.
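
To make the capability and pinning rules concrete, here is a minimal sketch of a declarative registry. Only the name `ProviderCapabilities` and its `batch` / `session` fields come from this README. Modelling them as plain booleans is an assumption, and `DescriptorSketch`, `SessionSketch`, `REGISTRY`, `find`, and `open_session` are hypothetical stand-ins rather than the crate's `BackendDescriptor` helpers.

```rust
/// Assumption: capabilities are modelled as two booleans; the real type may be richer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ProviderCapabilities {
    batch: bool,
    session: bool,
}

/// Hypothetical stand-in for a registry entry.
#[derive(Debug)]
struct DescriptorSketch {
    name: &'static str,
    session_transport: Option<&'static str>,
    capabilities: ProviderCapabilities,
}

/// One declarative table instead of scattered factory switches.
/// Values mirror the provider tables earlier in this README.
static REGISTRY: [DescriptorSketch; 3] = [
    DescriptorSketch {
        name: "copilot",
        session_transport: Some("cli_jsonrpc"),
        capabilities: ProviderCapabilities { batch: true, session: true },
    },
    DescriptorSketch {
        name: "claude",
        session_transport: None,
        capabilities: ProviderCapabilities { batch: true, session: false },
    },
    DescriptorSketch {
        name: "codex",
        session_transport: Some("cli_resumable_turns"),
        capabilities: ProviderCapabilities { batch: true, session: true },
    },
];

/// A session handle that records its provider once and never changes it.
#[derive(Debug)]
struct SessionSketch {
    provider: &'static str,
    transport: &'static str,
}

fn find(name: &str) -> Option<&'static DescriptorSketch> {
    REGISTRY.iter().find(|d| d.name == name)
}

/// Open a session only if the provider declares session support.
/// There is deliberately no fallback here: sessions stay provider-pinned.
fn open_session(provider: &str) -> Result<SessionSketch, String> {
    let d = find(provider).ok_or_else(|| format!("unknown provider: {provider}"))?;
    match (d.capabilities.session, d.session_transport) {
        (true, Some(transport)) => Ok(SessionSketch { provider: d.name, transport }),
        _ => Err(format!("{} is batch only", d.name)),
    }
}

fn main() {
    for name in ["copilot", "claude", "codex"] {
        let batch = find(name).map(|d| d.capabilities.batch).unwrap_or(false);
        match open_session(name) {
            Ok(s) => println!("{} session over {} (batch: {batch})", s.provider, s.transport),
            Err(e) => println!("{e} (batch: {batch})"),
        }
    }
}
```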