Merged
31 commits
0a47711
feat(worker): centralize API routes and refactor WorkerRuntime constr…
leynos Mar 24, 2026
17f1e1f
refactor(worker): reduce api.rs size by extracting client methods
leynos Mar 26, 2026
1326382
refactor(worker): derive ROUTE constants from PATH to prevent drift
leynos Mar 26, 2026
0199548
refactor(worker): extract reporting logic to meet 400-line limit
leynos Mar 26, 2026
440fbb6
fix(worker): address code review findings
leynos Mar 27, 2026
3799776
fix(tests): update test files to use refactored WorkerRuntime::new API
leynos Mar 27, 2026
ae36562
fix(worker): use iteration 100 for pre-loop failure reporting
leynos Mar 27, 2026
b95241e
fix(tests): restore register_remote_tools call in test helper
leynos Mar 27, 2026
eff07b0
refactor(worker): address code review findings for error handling
leynos Mar 27, 2026
095ca66
fix(rebase): resolve merge conflicts and update test files for new API
leynos Mar 27, 2026
22ed684
style: apply cargo fmt fixes to worker_remote_tool_proxy tests
leynos Mar 28, 2026
372871b
test: add regression tests for WorkerRuntime and WorkerHttpClient err…
leynos Mar 29, 2026
a189552
test: fix doc comments and convert async tests to sync
leynos Mar 29, 2026
fa6b0a6
refactor: address code review findings for error handling consistency
leynos Mar 31, 2026
9977cf4
Fix worker API and container issues from code review
leynos Mar 31, 2026
024ae4d
refactor(worker): extract report_stopped_outcome helper to reduce dup…
leynos Apr 1, 2026
cff9d94
Tighten hosted worker runtime error handling
leynos Apr 1, 2026
fc82e09
Normalise worker API job paths and test helpers
leynos Apr 1, 2026
75f999f
Re-export job_scoped_path and fix remote_tools test context handling
leynos Apr 1, 2026
255c447
Use anyhow::Context for remote_tools test error wrapping
leynos Apr 2, 2026
81c8db9
Tighten worker runtime error handling and event flushing
leynos Apr 2, 2026
f0978a1
Apply cargo fmt to worker_job_url
leynos Apr 2, 2026
9da84fe
Extract post_and_require_success helper in worker client
leynos Apr 2, 2026
1f5a34b
Fix post_event docs and use full endpoint URLs in worker client errors
leynos Apr 2, 2026
b28633b
feat(worker-api): add detailed worker-orchestrator HTTP boundary cont…
leynos Apr 2, 2026
1e8a0b4
refactor(worker/runtime): refactor job loop to use timeout within run…
leynos Apr 3, 2026
9d84ce9
test(worker/api/tests): refactor client test setup and add generic ro…
leynos Apr 3, 2026
4eecb94
test(worker/api/tests): simplify terminal result round-trip test
leynos Apr 3, 2026
b45bdc6
refactor(worker/api): split API types into proxy and remote tool modules
leynos Apr 3, 2026
b94696d
docs(worker/api): add module docs for proxy_types and remote_tool_types
leynos Apr 3, 2026
61fe147
docs(api): correct doc comment for ProxyCompletionRequest struct
leynos Apr 3, 2026
21 changes: 21 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
```diff
@@ -176,6 +176,7 @@ hex = "0.4.3"
 rusqlite = { version = "0.32", optional = true }
 json5 = { version = "0.4", optional = true }
 tempfile = { version = "3", optional = true }
+const_format = { version = "0.2.35", default-features = false }
 
 # macOS keychain
 [target.'cfg(target_os = "macos")'.dependencies]
```
3 changes: 3 additions & 0 deletions docs/contents.md
```diff
@@ -43,6 +43,9 @@
 provider interfaces, adapters, and the places embeddings are used.
 - [Jobs and routines](jobs-and-routines.md) covers the scheduler, background
 jobs, routines engine, touchpoints, and extension seams.
+- [Worker-orchestrator contract](worker-orchestrator-contract.md) documents the
+sandbox worker HTTP boundary, the shared route constants, and the reporting
+split between authoritative status and best-effort events.
 - [Agent skills support](agent-skills-support.md) explains how skills are
 discovered, installed, selected, and injected into model context.
 - [Smart routing spec](smart-routing-spec.md) captures the current design for
```
156 changes: 156 additions & 0 deletions docs/worker-orchestrator-contract.md
@@ -0,0 +1,156 @@
# Worker-orchestrator contract

This document is for maintainers who need to change the hosted worker path
without breaking the orchestrator boundary. It explains the current transport
contract, the dependency-injection seams used for tests and production, and the
event-reporting split between authoritative state and best-effort visibility.

## 1. Scope and source of truth

The worker runtime runs inside a container and talks back to the orchestrator
over HTTP. The shared transport contract lives in
[the worker API module](../src/worker/api.rs) and
[its shared types](../src/worker/api/types.rs). The orchestrator side imports
the same route constants and payload types from that module instead of
re-declaring them.

This document describes the current implementation. Where it and the code
disagree, the code is the authoritative source of truth for the wire format.

## 2. Boundary model

The worker and orchestrator have distinct responsibilities:

- The orchestrator owns job lifecycle, credential issuance, event ingestion,
and proxied external access.
- The worker owns local reasoning, tool execution inside the sandbox, and
periodic reporting back to the orchestrator.
- The shared HTTP boundary exists so the worker can stay isolated from the
host process while still using the host's approved network, credential, and
observability surfaces.

Figure 1. Worker-orchestrator boundary and reporting channels.

```mermaid
flowchart LR
WorkerRuntime[WorkerRuntime in container]
Delegate[ContainerDelegate]
Client[WorkerHttpClient]
Orchestrator[Orchestrator worker API]
Timeline[Job event timeline]
Status[Authoritative status store]
Completion[Authoritative completion store]

WorkerRuntime --> Delegate
WorkerRuntime --> Client
Delegate --> Client
Client --> Orchestrator
Orchestrator --> Timeline
Orchestrator --> Status
Orchestrator --> Completion
```

## 3. Shared route constants

All worker endpoints are declared once in `src/worker/api/types.rs` as paired
`*_PATH` and `*_ROUTE` constants. The design intent is:

- `*_PATH` is the relative suffix used by `WorkerHttpClient`.
- `*_ROUTE` is the fully scoped Axum route used by the orchestrator router.
- Both sides derive their concrete URLs from the same source strings.
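A minimal, standard-library sketch of the paired-constant idea follows. It assumes each `*_PATH` is a suffix with a leading slash and that the scoped prefix is `/worker/{job_id}`; the `worker_endpoint!` macro and the `job_url` and `concrete_route` helpers are hypothetical. The PR itself adds the `const_format` crate (whose `concatcp!` macro concatenates constant string values), and its commit history mentions a `job_scoped_path` helper whose signature is not shown, so the real derivation may look different.

```rust
/// Hypothetical macro: declares a `*_PATH` suffix and derives the scoped
/// `*_ROUTE` from the same string literal, so the two cannot drift apart.
macro_rules! worker_endpoint {
    ($path:ident, $route:ident, $suffix:literal) => {
        pub const $path: &str = $suffix;
        pub const $route: &str = concat!("/worker/{job_id}", $suffix);
    };
}

worker_endpoint!(STATUS_PATH, STATUS_ROUTE, "/status");
worker_endpoint!(REMOTE_TOOL_EXECUTE_PATH, REMOTE_TOOL_EXECUTE_ROUTE, "/tools/execute");

/// Hypothetical client-side helper: builds the concrete URL for one job
/// from the orchestrator base URL and a relative `*_PATH` suffix.
pub fn job_url(base: &str, job_id: &str, path: &str) -> String {
    format!("{}/worker/{}{}", base.trim_end_matches('/'), job_id, path)
}

/// Fills the `{job_id}` placeholder in a route template, the same way the
/// proxy test in this PR does with `str::replace`.
pub fn concrete_route(route: &str, job_id: &str) -> String {
    route.replace("{job_id}", job_id)
}
```

Because both constants expand from one literal, renaming an endpoint means editing a single string, and the client URL and the server route still end in the same path.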

The current contract includes the following endpoints.

Table 1. Worker-orchestrator HTTP endpoints and their purposes.

| Endpoint | Purpose |
| --- | --- |
| `GET /worker/{job_id}/job` | Fetch the sandboxed job description |
| `GET /worker/{job_id}/credentials` | Deliver job-scoped credentials for child-process injection |
| `POST /worker/{job_id}/status` | Persist authoritative progress state |
| `POST /worker/{job_id}/complete` | Persist authoritative terminal outcome |
| `POST /worker/{job_id}/event` | Append user-visible timeline events |
| `GET /worker/{job_id}/prompt` | Poll orchestrator-injected follow-up prompts |
| `POST /worker/{job_id}/llm/complete` | Proxy a plain large language model (LLM) completion |
| `POST /worker/{job_id}/llm/complete_with_tools` | Proxy a tool-capable LLM completion |
| `GET /worker/{job_id}/tools/catalog` | Fetch hosted-visible remote tool definitions |
| `POST /worker/{job_id}/tools/execute` | Execute a hosted remote tool through the orchestrator |

Compile-time assertions in the worker API tests lock the canonical route values
so accidental path drift fails the build before runtime tests execute.
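The tests themselves are not part of this diff. One standard-library way to get that build-time failure is a `const fn` string comparison checked inside `const _: () = assert!(...)` items, because `==` on `&str` cannot be called in a const context. The constant values below are illustrative assumptions, not copied from the source.

```rust
/// Byte-wise string equality that can run during const evaluation.
const fn str_eq(a: &str, b: &str) -> bool {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

// Hypothetical canonical values, for illustration only.
pub const STATUS_ROUTE: &str = concat!("/worker/{job_id}", "/status");
pub const REMOTE_TOOL_EXECUTE_ROUTE: &str = "/worker/{job_id}/tools/execute";

// If either constant drifts, const evaluation panics and compilation stops.
const _: () = assert!(str_eq(STATUS_ROUTE, "/worker/{job_id}/status"));
const _: () = assert!(str_eq(
    REMOTE_TOOL_EXECUTE_ROUTE,
    "/worker/{job_id}/tools/execute"
));
```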

## 4. Dependency injection and construction

`WorkerRuntime` uses two constructors with distinct roles:

- `WorkerRuntime::new(config, client)` is the primary constructor. Tests use it,
as does any caller that already owns a prepared `WorkerHttpClient`.
- `WorkerRuntime::from_env(config)` is the production convenience wrapper. It
reads `IRONCLAW_WORKER_TOKEN`, builds the `WorkerHttpClient` with the shared
timeout and error mapping, and then delegates to `new`.

This split exists so tests can validate runtime behaviour without relying on
ambient environment state. It also gives construction-time validation one
obvious home: `new` checks that `WorkerConfig` and `WorkerHttpClient` agree on
job identity and orchestrator base URL before the runtime starts.

`WorkerHttpClient::new(...)` follows the same pattern for tests, while
`WorkerHttpClient::from_env(...)` is reserved for production bootstrap.
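The constructor split can be sketched with trimmed-down stand-in types. `ConfigMismatchField` and the mismatch message mirror the error types added in this PR; the `WorkerConfig`, `WorkerHttpClient`, and `WorkerRuntime` shapes (a `u64` job id, plain `String` URLs) are hypothetical simplifications, not the real structs.

```rust
use std::fmt;

/// Mirrors the `ConfigMismatchField` enum added in this PR.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConfigMismatchField {
    JobId,
    OrchestratorUrl,
}

impl fmt::Display for ConfigMismatchField {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::JobId => write!(f, "job_id"),
            Self::OrchestratorUrl => write!(f, "orchestrator_url"),
        }
    }
}

/// Simplified stand-in for `WorkerError` (only two variants shown).
#[derive(Debug, PartialEq)]
pub enum WorkerError {
    MissingToken,
    ConfigMismatch { field: ConfigMismatchField, reason: String },
}

impl fmt::Display for WorkerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingToken => write!(f, "Missing worker token (IRONCLAW_WORKER_TOKEN not set)"),
            Self::ConfigMismatch { field, reason } => {
                write!(f, "Worker configuration mismatch for {field}: {reason}")
            }
        }
    }
}

/// Hypothetical, trimmed-down configuration and client types.
pub struct WorkerConfig {
    pub job_id: u64,
    pub orchestrator_url: String,
}

pub struct WorkerHttpClient {
    pub job_id: u64,
    pub base_url: String,
    pub token: String,
}

pub struct WorkerRuntime {
    pub config: WorkerConfig,
    pub client: WorkerHttpClient,
}

impl WorkerRuntime {
    /// Primary constructor: the caller injects the client, and identity is
    /// checked before any runtime value exists.
    pub fn new(config: WorkerConfig, client: WorkerHttpClient) -> Result<Self, WorkerError> {
        if config.job_id != client.job_id {
            return Err(WorkerError::ConfigMismatch {
                field: ConfigMismatchField::JobId,
                reason: format!("config has {}, client has {}", config.job_id, client.job_id),
            });
        }
        if config.orchestrator_url != client.base_url {
            return Err(WorkerError::ConfigMismatch {
                field: ConfigMismatchField::OrchestratorUrl,
                reason: format!("config has {}, client has {}", config.orchestrator_url, client.base_url),
            });
        }
        Ok(Self { config, client })
    }

    /// Production wrapper: reads the token, builds the client, delegates to `new`.
    pub fn from_env(config: WorkerConfig) -> Result<Self, WorkerError> {
        let token = std::env::var("IRONCLAW_WORKER_TOKEN").map_err(|_| WorkerError::MissingToken)?;
        let client = WorkerHttpClient {
            job_id: config.job_id,
            base_url: config.orchestrator_url.clone(),
            token,
        };
        Self::new(config, client)
    }
}
```

Tests call `new` directly with a client they built themselves, so no test ever has to set `IRONCLAW_WORKER_TOKEN` in the process environment.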

## 5. Authoritative reports versus best-effort events

The worker emits two classes of outbound signal:

- Authoritative reports:
  - `report_status`
  - `report_complete`
- Best-effort timeline events:
  - `post_event`
  - `report_status_lossy`

The distinction matters:

- Status and completion calls define the durable job record. If they fail at a
point where correctness depends on them, the worker treats that as a real
error.
- Event posting exists for operator visibility. It enriches the browser and
audit timeline, but it must not be allowed to block or invalidate terminal
completion reporting.
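A minimal sketch of that error policy follows. It assumes a hypothetical `Reporter` trait in place of the real client and treats `report_status_lossy` as a wrapper that logs and discards failures; the source lists it as best-effort but does not show its body.

```rust
/// Hypothetical transport error standing in for the real client error type.
#[derive(Debug)]
pub struct TransportError(pub String);

/// Hypothetical reporting seam standing in for `WorkerHttpClient`.
pub trait Reporter {
    fn report_status(&mut self, message: &str) -> Result<(), TransportError>;
    fn report_complete(&mut self, success: bool) -> Result<(), TransportError>;
    fn post_event(&mut self, event: &str) -> Result<(), TransportError>;
}

/// Best-effort status: a failure is logged and swallowed.
pub fn report_status_lossy<R: Reporter>(reporter: &mut R, message: &str) {
    if let Err(e) = reporter.report_status(message) {
        eprintln!("status update dropped: {}", e.0);
    }
}

/// Terminal path: the timeline event is best-effort, but a completion
/// failure is returned to the caller as a real error.
pub fn finish<R: Reporter>(reporter: &mut R, success: bool) -> Result<(), TransportError> {
    if let Err(e) = reporter.post_event("job finished") {
        eprintln!("timeline event dropped: {}", e.0);
    }
    reporter.report_complete(success)
}
```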

`ContainerDelegate` therefore uses a background task and a bounded queue for
event posting. `shutdown()` closes the queue and waits for the event worker to
finish, so buffered events are flushed before the delegate is torn down.
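The delegate's event path can be approximated with standard-library threads. This is a sketch, not the delegate's code: the real implementation is presumably async, and `EventQueue`, `emit`, and the drop-when-full policy are assumptions (the source says only that the queue is bounded). Dropping the sender closes the channel, and the worker thread drains whatever is still buffered before `join` returns, which gives the flush-on-shutdown property.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical std-thread analogue of the delegate's event pipeline.
pub struct EventQueue {
    tx: Option<SyncSender<String>>,
    worker: Option<JoinHandle<()>>,
}

impl EventQueue {
    /// `post` stands in for the HTTP event call; `capacity` bounds the queue.
    pub fn new<F>(capacity: usize, mut post: F) -> Self
    where
        F: FnMut(String) + Send + 'static,
    {
        let (tx, rx) = sync_channel::<String>(capacity);
        let worker = thread::spawn(move || {
            // The loop ends once the sender is dropped and the buffer is empty.
            for event in rx {
                post(event);
            }
        });
        Self { tx: Some(tx), worker: Some(worker) }
    }

    /// Best-effort enqueue: returns `false` instead of blocking when the
    /// queue is full or already closed.
    pub fn emit(&self, event: impl Into<String>) -> bool {
        match &self.tx {
            Some(tx) => tx.try_send(event.into()).is_ok(),
            None => false,
        }
    }

    /// Closes the queue, then waits for the worker so buffered events flush.
    pub fn shutdown(&mut self) {
        drop(self.tx.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

impl Drop for EventQueue {
    fn drop(&mut self) {
        self.shutdown();
    }
}
```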

`WorkerRuntime::post_event(...)` also applies a bounded timeout to terminal
event publication, so a slow event post cannot hold up the final
`report_complete(...)` call, which remains the authoritative acknowledgement
path.
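A bounded wait can be sketched with a helper thread and `Receiver::recv_timeout`. In the real async runtime this is more likely something like `tokio::time::timeout`, so treat `post_with_timeout`, `finish_job`, and the 200 ms limit as illustrative assumptions. The property that matters is ordering: `report_complete` runs whether or not the event post finishes in time.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `post` on a helper thread and waits at most `limit` for its result.
/// Returns `None` on timeout; the helper thread is left to finish on its own.
pub fn post_with_timeout<T, F>(limit: Duration, post: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone, so the send error is ignored.
        let _ = tx.send(post());
    });
    rx.recv_timeout(limit).ok()
}

/// Hypothetical terminal sequence: a bounded event post, then the
/// authoritative completion call, which always runs.
pub fn finish_job<E, C>(event_post: E, report_complete: C) -> bool
where
    E: FnOnce() -> bool + Send + 'static,
    C: FnOnce() -> bool,
{
    if post_with_timeout(Duration::from_millis(200), event_post).is_none() {
        eprintln!("terminal event timed out; continuing to report_complete");
    }
    report_complete()
}
```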

## 6. Credential handling

Credentials are fetched through `GET /worker/{job_id}/credentials` and
deserialized into `CredentialResponse`. The worker runtime does not write them
into global process environment variables. Instead:

1. `WorkerRuntime::hydrate_credentials()` fetches the granted credentials.
2. The runtime stores them in `extra_env`.
3. Tool execution passes `extra_env` through `JobContext` into child processes.

This keeps credential scope limited to the worker execution path and avoids
cross-test or cross-job global environment mutation.
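The credential flow maps naturally onto `std::process::Command::envs`, which layers extra variables onto a child process without touching the parent's environment. The `Credential` shape, `to_extra_env`, and `run_tool` below are hypothetical (the real `CredentialResponse` fields are not shown here), and the example assumes a POSIX `sh` is available.

```rust
use std::collections::HashMap;
use std::process::Command;

/// Hypothetical shape of one granted credential.
pub struct Credential {
    pub env_var: String,
    pub value: String,
}

/// Collects credentials into a per-job map instead of calling `std::env::set_var`.
pub fn to_extra_env(creds: Vec<Credential>) -> HashMap<String, String> {
    creds.into_iter().map(|c| (c.env_var, c.value)).collect()
}

/// Spawns a tool process with the job-scoped variables layered on top of the
/// inherited environment; the parent process environment is left untouched.
pub fn run_tool(script: &str, extra_env: &HashMap<String, String>) -> std::io::Result<String> {
    let output = Command::new("sh")
        .arg("-c")
        .arg(script)
        .envs(extra_env)
        .output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```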

## 7. Prompt polling and hosted tool context

The worker loop polls `GET /worker/{job_id}/prompt` before LLM calls. The
orchestrator can use that channel to inject operator prompts or follow-up work
without restarting the worker process.
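A sketch of the poll-before-call ordering, using hypothetical `PromptSource` and `Llm` traits in place of the HTTP client. It assumes at most one queued prompt is consumed per step and that an injected prompt is simply appended to the message history; the real loop's draining policy and message roles may differ.

```rust
/// Hypothetical seam for `GET /worker/{job_id}/prompt`.
pub trait PromptSource {
    /// `None` when no follow-up prompt is queued.
    fn poll_prompt(&mut self) -> Option<String>;
}

/// Hypothetical seam for the proxied LLM completion endpoint.
pub trait Llm {
    fn complete(&mut self, messages: &[String]) -> String;
}

/// One reasoning step: pick up any injected prompt, then call the LLM.
pub fn step<P: PromptSource, L: Llm>(
    prompts: &mut P,
    llm: &mut L,
    messages: &mut Vec<String>,
) -> String {
    if let Some(prompt) = prompts.poll_prompt() {
        messages.push(prompt);
    }
    let reply = llm.complete(messages);
    messages.push(reply.clone());
    reply
}
```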

Hosted remote tools use a parallel mechanism:

1. The worker fetches the hosted tool catalogue from the orchestrator.
2. The worker registers local proxy wrappers using the orchestrator-provided
canonical `ToolDefinition` values.
3. The runtime merges those definitions into the reasoning context alongside
container-local tools.

The shared route constants and transport types are what keep that hosted tool
surface consistent across the sandbox boundary.
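A minimal sketch of the registration and merge steps, with `ToolDefinition` trimmed to two fields and `RemoteToolProxy` and `merge_tools` as hypothetical names. The proxy stores the orchestrator's definition unchanged, which is the property the definition-fidelity test in this PR checks. The source does not say how name clashes are resolved; this sketch assumes the container-local tool wins.

```rust
/// Trimmed-down stand-in for `ToolDefinition` (the real type has more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ToolDefinition {
    pub name: String,
    pub description: String,
}

/// Hypothetical local wrapper that keeps the orchestrator's definition as-is.
pub struct RemoteToolProxy {
    definition: ToolDefinition,
}

impl RemoteToolProxy {
    pub fn definition(&self) -> &ToolDefinition {
        &self.definition
    }
}

/// Wraps each catalogue entry, then merges with container-local tools.
/// Assumption: on a name clash the local tool is kept and the remote one skipped.
pub fn merge_tools(local: Vec<ToolDefinition>, catalog: Vec<ToolDefinition>) -> Vec<ToolDefinition> {
    let proxies: Vec<RemoteToolProxy> = catalog
        .into_iter()
        .map(|definition| RemoteToolProxy { definition })
        .collect();
    let mut merged = local;
    for proxy in &proxies {
        if !merged.iter().any(|t| t.name == proxy.definition().name) {
            merged.push(proxy.definition().clone());
        }
    }
    merged
}
```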
2 changes: 1 addition & 1 deletion src/error/mod.rs
```diff
@@ -25,7 +25,7 @@ pub use self::repair::RepairError;
 pub use self::routine::RoutineError;
 pub use self::safety::SafetyError;
 pub use self::tool::ToolError;
-pub use self::worker::WorkerError;
+pub use self::worker::{ConfigMismatchField, WorkerError};
 pub use self::workspace::WorkspaceError;
 pub use crate::llm::error::LlmError;
 
```
28 changes: 28 additions & 0 deletions src/error/worker.rs
```diff
@@ -4,6 +4,24 @@ use std::time::Duration;
 
 use uuid::Uuid;
 
+/// Configuration field that mismatched between worker config and HTTP client.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum ConfigMismatchField {
+    /// The job_id field mismatched.
+    JobId,
+    /// The orchestrator_url field mismatched.
+    OrchestratorUrl,
+}
+
+impl std::fmt::Display for ConfigMismatchField {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            Self::JobId => write!(f, "job_id"),
+            Self::OrchestratorUrl => write!(f, "orchestrator_url"),
+        }
+    }
+}
+
 /// Worker errors (container-side execution).
 #[derive(Debug, thiserror::Error)]
 pub enum WorkerError {
@@ -76,4 +94,14 @@ pub enum WorkerError {
     /// The worker token environment variable was not available at startup.
     #[error("Missing worker token (IRONCLAW_WORKER_TOKEN not set)")]
     MissingToken,
+
+    /// The worker configuration does not match the provided HTTP client.
+    ///
+    /// `field` identifies which configuration field mismatched, and `reason`
+    /// describes the mismatch.
+    #[error("Worker configuration mismatch for {field}: {reason}")]
+    ConfigMismatch {
+        field: ConfigMismatchField,
+        reason: String,
+    },
 }
```
27 changes: 14 additions & 13 deletions src/orchestrator/api.rs
```diff
@@ -24,7 +24,11 @@ mod handler_support;
 mod handlers;
 mod remote_tools;
 
-use crate::worker::api::{REMOTE_TOOL_CATALOG_ROUTE, REMOTE_TOOL_EXECUTE_ROUTE};
+use crate::worker::api::{
+    COMPLETE_ROUTE, CREDENTIALS_ROUTE, EVENT_ROUTE, JOB_ROUTE, LLM_COMPLETE_ROUTE,
+    LLM_COMPLETE_WITH_TOOLS_ROUTE, PROMPT_ROUTE, REMOTE_TOOL_CATALOG_ROUTE,
+    REMOTE_TOOL_EXECUTE_ROUTE, STATUS_ROUTE, WORKER_HEALTH_ROUTE,
+};
 use handler_support::{get_credentials_handler, get_prompt_handler};
 use handlers::{
     execute_remote_tool, get_job, get_remote_tool_catalog, health_check, job_event_handler,
@@ -65,25 +69,22 @@ impl OrchestratorApi {
     pub fn router(state: OrchestratorState) -> Router {
         Router::new()
             // Worker routes: authenticated via route_layer middleware.
-            .route("/worker/{job_id}/job", get(get_job))
-            .route("/worker/{job_id}/llm/complete", post(llm_complete))
-            .route(
-                "/worker/{job_id}/llm/complete_with_tools",
-                post(llm_complete_with_tools),
-            )
+            .route(JOB_ROUTE, get(get_job))
+            .route(LLM_COMPLETE_ROUTE, post(llm_complete))
+            .route(LLM_COMPLETE_WITH_TOOLS_ROUTE, post(llm_complete_with_tools))
             .route(REMOTE_TOOL_CATALOG_ROUTE, get(get_remote_tool_catalog))
             .route(REMOTE_TOOL_EXECUTE_ROUTE, post(execute_remote_tool))
-            .route("/worker/{job_id}/status", post(report_status))
-            .route("/worker/{job_id}/complete", post(report_complete))
-            .route("/worker/{job_id}/event", post(job_event_handler))
-            .route("/worker/{job_id}/prompt", get(get_prompt_handler))
-            .route("/worker/{job_id}/credentials", get(get_credentials_handler))
+            .route(STATUS_ROUTE, post(report_status))
+            .route(COMPLETE_ROUTE, post(report_complete))
+            .route(EVENT_ROUTE, post(job_event_handler))
+            .route(PROMPT_ROUTE, get(get_prompt_handler))
+            .route(CREDENTIALS_ROUTE, get(get_credentials_handler))
             .route_layer(axum::middleware::from_fn_with_state(
                 state.token_store.clone(),
                 worker_auth_middleware,
             ))
             // Unauthenticated routes (added after the layer).
-            .route("/health", get(health_check))
+            .route(WORKER_HEALTH_ROUTE, get(health_check))
             .with_state(state)
     }
 
```
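On the orchestrator side, Axum treats `{job_id}` in these route strings as a path parameter. The function below is a simplified, standard-library stand-in (not Axum's matcher) showing how a concrete request path lines up with a shared route template; `match_job_route` is hypothetical and the `STATUS_ROUTE` value is assumed.

```rust
/// Assumed value of the shared status route, for illustration only.
pub const STATUS_ROUTE: &str = "/worker/{job_id}/status";

/// Hypothetical matcher: returns the `{job_id}` segment when `path` has the
/// same shape as `template`, and `None` otherwise.
pub fn match_job_route<'a>(template: &str, path: &'a str) -> Option<&'a str> {
    let t: Vec<&str> = template.split('/').collect();
    let p: Vec<&'a str> = path.split('/').collect();
    if t.len() != p.len() {
        return None;
    }
    let mut job_id = None;
    for (ts, ps) in t.iter().zip(p.iter()) {
        if *ts == "{job_id}" {
            // The parameter segment must be present and non-empty.
            if ps.is_empty() {
                return None;
            }
            job_id = Some(*ps);
        } else if ts != ps {
            return None;
        }
    }
    job_id
}
```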
33 changes: 17 additions & 16 deletions src/tools/builtin/worker_remote_tool_proxy/tests.rs
```diff
@@ -65,11 +65,10 @@ async fn proxy_test_server() -> anyhow::Result<ProxyTestServer> {
         let _ = axum::serve(listener, router).await;
     });
     let job_id = Uuid::new_v4();
-    let client = Arc::new(WorkerHttpClient::new(
-        format!("http://{}", addr),
-        job_id,
-        "test-token".to_string(),
-    ));
+    let client = Arc::new(
+        WorkerHttpClient::new(format!("http://{}", addr), job_id, "test-token".to_string())
+            .context("test client should build")?,
+    );
     Ok(ProxyTestServer {
         client,
         job_id,
@@ -183,11 +182,14 @@ async fn worker_remote_tool_proxy_preserves_full_tool_definition_fields() {
         "Complex tool for proxy definition fidelity testing",
     );
 
-    let client = Arc::new(WorkerHttpClient::new(
-        "http://127.0.0.1:0".to_string(),
-        Uuid::new_v4(),
-        "test-token".to_string(),
-    ));
+    let client = Arc::new(
+        WorkerHttpClient::new(
+            "http://127.0.0.1:0".to_string(),
+            Uuid::new_v4(),
+            "test-token".to_string(),
+        )
+        .expect("test client should build"),
+    );
     let proxy = WorkerRemoteToolProxy::new(complex_definition.clone(), client);
 
     let reconstructed = ToolDefinition {
@@ -243,11 +245,10 @@ async fn worker_remote_tool_proxy_routes_execution_through_orchestrator_endpoint
     });
 
     let job_id = Uuid::new_v4();
-    let client = Arc::new(WorkerHttpClient::new(
-        format!("http://{}", addr),
-        job_id,
-        "test-token".to_string(),
-    ));
+    let client = Arc::new(
+        WorkerHttpClient::new(format!("http://{}", addr), job_id, "test-token".to_string())
+            .context("test client should build")?,
+    );
     let proxy = WorkerRemoteToolProxy::new(
         ToolDefinition {
             name: "route_test_tool".to_string(),
@@ -272,7 +273,7 @@ async fn worker_remote_tool_proxy_routes_execution_through_orchestrator_endpoint
     let (route_path, received_job_id, tool_name) = &requests[0];
     assert_eq!(
         route_path,
-        &format!("/worker/{}/tools/execute", job_id),
+        &REMOTE_TOOL_EXECUTE_ROUTE.replace("{job_id}", &job_id.to_string()),
         "proxy must route execution through the correct orchestrator endpoint"
     );
     assert_eq!(received_job_id, &job_id);
```