diff --git a/docs/contents.md b/docs/contents.md index a48d6a5c3..60cc641cb 100644 --- a/docs/contents.md +++ b/docs/contents.md @@ -119,6 +119,9 @@ - [Merge remote Model Context Protocol (MCP) tool definitions into the worker reasoning context](execplans/1-1-3-merge-mcp-defs-into-worker-reasoning-context.md) plans roadmap item `1.1.3` for the worker-side merged hosted-tool reasoning surface. + - [Add hosted-mode tests for schema fidelity and execution routing](execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md) + plans roadmap item `1.1.4` for the test matrix that locks down MCP tool + contract fidelity and execution routing guarantees. ## RFCs diff --git a/docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md b/docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md new file mode 100644 index 000000000..af1b9fd9e --- /dev/null +++ b/docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md @@ -0,0 +1,958 @@ +# Add hosted-mode tests for schema fidelity and execution routing + +This ExecPlan (execution plan) is a living document. The sections +`Constraints`, `Tolerances`, `Risks`, `Progress`, +`Surprises & Discoveries`, `Decision Log`, and +`Outcomes & Retrospective` must be kept up to date as work proceeds. + +Status: COMPLETE + +## Purpose / big picture + +Roadmap item `1.1.4` exists to close the testing gap left after the transport +(`1.1.1`), filtering (`1.1.2`), and reasoning-context merge (`1.1.3`) work. +Those earlier items delivered the runtime behaviour; this item proves that +behaviour cannot silently regress. + +After this work, the test suite must fail loudly if any of the following +regressions occur: + +1. A required Model Context Protocol (MCP) field (`name`, `description`, or + `parameters`) disappears or is rewritten during the journey from + orchestrator registry to worker-advertised proxy. +2. An advertised remote tool executes through a local stub rather than + through the orchestrator's generic remote-tool execution endpoint. +3. The worker-orchestrator transport contract drifts so that route paths, + request shapes, or response shapes become inconsistent between the two + sides. + +Success is observable in two ways. First, `make test` passes and every new +test added by this plan is present and green. Second, deliberately breaking +a field, route, or execution path causes at least one of the new tests to +fail with a clear, diagnostic message. + +This plan tracks the worker-orchestrator parity portion of +[Issue #16](https://github.com/leynos/axinite/issues/16) and addresses +[RFC 0001 §Migration Plan](../rfcs/0001-expose-mcp-tool-definitions.md#migration-plan) +step 5. + +## Approval gates + +- Plan approved + Acceptance criteria: the plan stays scoped to tests and test fixtures for + schema fidelity, execution routing, and worker-orchestrator contract parity. + No runtime behaviour changes beyond what is needed to make the code testable. + Sign-off: human reviewer approves this ExecPlan before implementation begins. +- Implementation complete + Acceptance criteria: all test families described in the milestones are + implemented, the existing test suite is unbroken, and the RFC and roadmap + are updated to reflect the completed state. + Sign-off: implementer marks the plan in progress and then complete after + final validation. +- Validation passed + Acceptance criteria: `make all` passes with retained logs, Markdown linting + passes for changed documentation, and the final plan notes record validation + evidence. + Sign-off: implementer records final evidence immediately before commit. +- Docs synced + Acceptance criteria: `docs/roadmap.md` marks `1.1.4` done, + `docs/rfcs/0001-expose-mcp-tool-definitions.md` implementation status + reflects the completed test matrix, `docs/contents.md` indexes this + ExecPlan, and `docs/users-guide.md` is reviewed for accuracy against the + tested behaviour. + Sign-off: implementer completes documentation sync as the final pre-commit + checkpoint. + +## Repository orientation + +The following seams already exist and carry the runtime behaviour that this +plan must lock down with tests. All paths are relative to the repository root. + +### Orchestrator side (catalogue construction and execution) + +- `src/orchestrator/api/remote_tools.rs` owns `hosted_remote_tool_catalog()` + and `execute_hosted_remote_tool()`. The catalogue function reads the + canonical `ToolRegistry`, filters for hosted-visible sources, sorts + alphabetically, and hashes the payload into a deterministic + `catalog_version`. The execution function resolves a tool by name through + the registry's hosted lookup, checks approval requirements, builds a + `JobContext`, calls `tool.execute(...)`, and maps `ToolError` variants to + HTTP status codes. + +- `src/tools/registry/hosted.rs` owns the canonical hosted-tool projection: + `hosted_tool_definitions()`, `get_hosted_tool()`, and the private + `hosted_tool_lookup()` and `is_tool_ineligible_for_hosted_catalogue()` + predicates. Eligibility requires the tool to be in the `Orchestrator` + domain, not protected, sourced from an allowed `HostedToolCatalogSource`, + and not approval-gated. + +- `src/tools/registry/loader.rs` owns `ToolRegistry`, including + `is_protected_tool_name()` and the `PROTECTED_TOOL_NAMES` constant. + +### Worker side (proxy registration and reasoning-context merge) + +- `src/worker/container.rs` owns `WorkerRuntime`, including + `build_tools()`, `register_remote_tools()`, and + `build_reasoning_context()`. The `available_tool_definitions()` helper + reads the combined local-plus-proxy registry to produce the sorted merged + tool surface. + +- `src/worker/container/delegate.rs` owns + `ContainerDelegate::before_llm_call()`, which refreshes + `reason_ctx.available_tools` before each hosted large language model (LLM) + call using the same `available_tool_definitions()` helper. + +- `src/tools/builtin/worker_remote_tool_proxy.rs` owns + `WorkerRemoteToolProxy` and `register_worker_remote_tool_proxies()`. Each + proxy caches a `ToolDefinition` and delegates `execute()` to the worker + HTTP client's `execute_remote_tool()` method. + +### Shared transport contract + +- `src/worker/api.rs` owns the worker HTTP client methods + `get_remote_tool_catalog()` and `execute_remote_tool()`. + +- `src/worker/api/types.rs` owns the shared route constants + (`REMOTE_TOOL_CATALOG_ROUTE`, `REMOTE_TOOL_EXECUTE_ROUTE`) and the shared + payload types (`RemoteToolCatalogResponse`, + `RemoteToolExecutionRequest`, `RemoteToolExecutionResponse`). + +### Existing test seams + +- `src/orchestrator/api/tests/remote_tools.rs` tests catalogue visibility, + execution routing, error-status mapping, and job-identifier propagation. + Uses fixtures from `src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs` + and `src/orchestrator/api/tests/fixtures/remote_tool_helpers.rs`. + +- `src/worker/container/tests.rs` tests remote-tool registration, merged + reasoning context, guidance deduplication, and degraded startup. + +- `src/tools/registry/tests.rs` tests registry operations, alphabetical + sorting, hosted source filtering, and hosted lookup error reporting. + +- `src/tools/builtin/worker_remote_tool_proxy.rs` (tests module) tests + round-trip execution and schema preservation through the proxy. + +### Design authority + +- `docs/rfcs/0001-expose-mcp-tool-definitions.md` is the design authority. + Its §Testing Strategy names three test families (unit tests, behavioural + tests, and a regression test target) and its §Migration Plan step 5 names + the deliverable: "targeted tests for definition fidelity, execution + routing, and contract parity between worker and orchestrator." + +- `docs/roadmap.md` item `1.1.4` defines the success criterion: + "tests fail if required MCP fields disappear or are rewritten incorrectly, + and prove that advertised remote tools execute through the orchestrator + rather than a local stub." + +### Reference documents + +The following project documents inform test design and code style: + +- `docs/rust-testing-with-rstest-fixtures.md` for `rstest` fixture patterns. +- `docs/rstest-bdd-users-guide.md` for `rstest-bdd` behavioural test + patterns. +- `docs/reliable-testing-in-rust-via-dependency-injection.md` for dependency + injection testing patterns. +- `docs/complexity-antipatterns-and-refactoring-strategies.md` for complexity + management. +- `AGENTS.md` for repository quality gates and commit conventions. + +## Constraints + +- Do not change runtime behaviour. This plan adds tests and test fixtures + only. The sole exception is refactoring existing test helpers to make them + reusable across the new test families, provided such refactoring does not + change external behaviour. + +- Preserve the existing `ToolDefinition` contract. The LLM-facing shape + (`name: String`, `description: String`, `parameters: serde_json::Value`) + must not be altered. + +- Keep tests in the modules that own the seams being tested. Orchestrator + catalogue tests stay in `src/orchestrator/api/tests/`, worker merge tests + stay in `src/worker/container/tests.rs`, proxy fidelity tests stay in + `src/tools/builtin/worker_remote_tool_proxy.rs`, and registry tests stay + in `src/tools/registry/tests.rs`. + +- Use `rstest` fixtures for shared setup, `mockall` where ad hoc mocks are + needed, and in-process harnesses rather than Docker, live MCP services, or + external network dependencies. + +- Use `rstest-bdd` for behavioural tests where scenarios add clarity beyond + what unit-level `rstest` coverage provides. If an `rstest-bdd` harness + would require new external infrastructure, fall back to in-process `rstest` + integration tests with equally observable assertions and document the + decision. + +- All new test files must remain under 400 lines per AGENTS.md. Extract + helpers into the existing fixture modules when a file would exceed that + limit. + +- Follow en-GB-oxendict spelling in comments and documentation. + +- Every new test function must have a clear, descriptive name that states + what it verifies, following the existing naming conventions in the test + modules. + +## Tolerances (exception triggers) + +- Scope: if implementation requires changes to more than 12 files or roughly + 500 net new lines (excluding comments and blank lines), stop and verify + that runtime behaviour changes have not leaked in. + +- Interface: if any public type, trait, or function signature outside the + test modules must change to make the code testable, stop and document the + interface pressure before proceeding. + +- Fixture complexity: if a single test fixture function exceeds 50 lines or + a single test function exceeds 80 lines, stop and extract helpers rather + than accepting the complexity. + +- Iterations: if a test remains red after three focused debugging attempts, + stop and document the failure in `Surprises & Discoveries` before + continuing. + +- Behaviour-driven development (BDD) harness: if adding `rstest-bdd` + behavioural coverage for any test family requires a brand-new feature-test + harness, external services, or Docker orchestration, document that explicitly + in `Decision Log` and fall back to in-process `rstest` integration coverage. + +- Documentation drift: if `docs/users-guide.md` or + `docs/axinite-architecture-overview.md` describes behaviour that the + tests cannot actually verify, stop and reconcile the documentation before + marking the plan complete. + +## Risks + +- Risk: The existing test infrastructure is already quite comprehensive. + New tests may duplicate existing coverage rather than adding new + fidelity guarantees, creating maintenance burden without additional + protection. + Severity: medium + Likelihood: medium + Mitigation: before writing each test, review the existing tests in the + target module and write only tests that cover a gap. Each new test must + protect against a regression that no existing test catches. Document + the specific gap each test fills. + +- Risk: Schema fidelity tests may become brittle if they assert on exact + JSON values rather than structural properties, breaking when legitimate + schema changes occur. + Severity: medium + Likelihood: medium + Mitigation: assert on structural invariants (field presence, type shape, + required-field completeness) rather than exact string equality for + descriptions. Use snapshot-style assertions for the full `ToolDefinition` + round-trip where exact preservation is the contract. + +- Risk: The `rstest-bdd` framework may not yet have a practical in-process + harness for the worker-orchestrator surface, making behavioural test + coverage ceremonial rather than valuable. + Severity: low + Likelihood: high + Mitigation: evaluate `rstest-bdd` feasibility in milestone 1 before + committing to it. If a narrow scenario adds real clarity, use it. + Otherwise, use `rstest` integration tests with equally observable + assertions and record the decision. + +- Risk: Mock HTTP servers used by worker container tests may introduce + flakiness through port conflicts or timing sensitivity. + Severity: medium + Likelihood: low + Mitigation: reuse the existing mock-server patterns from + `src/worker/container/tests.rs` and + `src/tools/builtin/worker_remote_tool_proxy.rs` which already handle + ephemeral port allocation. + +- Risk: Contract parity tests between worker and orchestrator may be hard + to express without introducing a coupling between the two test modules. + Severity: medium + Likelihood: medium + Mitigation: use the shared types in `src/worker/api/types.rs` as the + contract surface. Contract tests assert that both sides produce and + consume payloads that round-trip through these shared types without loss. + +## Milestone 1: audit existing coverage and identify gaps + +Before writing any test code, perform a structured audit of the existing +test suite to identify exactly which schema-fidelity and execution-routing +properties are already covered and which are not. + +### Audit scope + +Review each existing test module listed in the repository orientation +section. For each test, record: + +1. What property it asserts (e.g., "catalogue excludes protected tools", + "proxy preserves tool name"). +2. Whether that property covers schema fidelity (field preservation), + execution routing (orchestrator-side dispatch), or contract parity + (shared type consistency). +3. Whether the assertion is structural (would catch a missing field) or + incidental (tests something else and happens to touch the field). + +### Gap identification + +The RFC 0001 §Testing Strategy names the following test families. Map each +to existing or missing coverage: + +**Unit tests (RFC 0001 §Testing Strategy):** + +1. "Catalog construction returns orchestrator-owned active MCP tool + definitions with original descriptions and schemas." + Gap: existing tests check that the catalogue includes eligible tools and + excludes ineligible ones, but do not assert that every `ToolDefinition` + field survives the catalogue journey unchanged. A full-payload fidelity + assertion is needed. + +2. "Catalog filtering excludes approval-gated or otherwise uncallable + tools." + Gap: covered by existing + `remote_tool_catalog_excludes_ineligible_tools` and + `get_hosted_tool_reports_lookup_reason`. No new test needed. + +3. "Worker proxy registration preserves the orchestrator-provided + `ToolDefinition` exactly." + Gap: the existing proxy test asserts individual fields match. A + whole-`ToolDefinition` structural comparison is needed to catch future + field additions that might be dropped. + +4. "Generic remote execution dispatches to the requested orchestrator-owned + tool." + Gap: existing tests cover success and several error paths. Missing: + a test that proves execution reaches the real orchestrator-side tool + (not a local stub) by observing a side effect only the orchestrator + path can produce. + +**Behavioural tests (RFC 0001 §Testing Strategy):** + +1. "Hosted worker with an active MCP tool advertises that tool in + `available_tools`." + Gap: partially covered by + `worker_runtime_build_reasoning_context_merges_local_and_remote_tools`. + Strengthen with a scenario-level assertion that names both tool families. + +2. "Hosted worker can execute a proxied MCP tool end-to-end through the + orchestrator." + Gap: the proxy test exercises round-trip execution through a mock server, + but no test exercises the full path from worker registry lookup through + proxy dispatch to orchestrator-side tool execution in a single assertion + chain. + +3. "Hosted worker still exposes container tools and extension-management + proxy tools." + Gap: covered by + `worker_runtime_build_tools_preserves_container_local_tools`. No new + test needed. + +4. "Extension activation refreshes the hosted-visible tool list." + Gap: not covered, but this belongs to a later roadmap item (extension + lifecycle) rather than `1.1.4`. Document the exclusion. + +**Regression test target (RFC 0001 §Testing Strategy):** + +> "A hosted LLM receives the same `name`, `description`, and `parameters` +> for an active MCP tool that the canonical orchestrator registry would +> expose in the normal in-process path." + +Gap: no existing test compares the orchestrator-side canonical definition +against the worker-side proxy-reported definition in a single assertion. +This is the primary regression this plan must lock down. + +**Contract parity (Issue #16):** + +> "There is no obvious shared contract test proving that worker and +> orchestrator endpoint parity is maintained." + +Gap: no test asserts that the routes registered by the orchestrator match +the routes consumed by the worker client, or that request and response +payloads round-trip through the shared types without field loss. + +### Milestone 1 deliverable + +A written list of gaps (which may refine the analysis above) recorded in +the `Progress` section, with each gap assigned to a milestone below. No +code changes in this milestone. + +### Milestone 1 go/no-go + +Proceed to milestone 2 only if the gap analysis is complete and the set of +new tests is bounded. If the analysis reveals that more than 15 new test +functions are needed, stop and consider whether the scope should be +narrowed. + +## Milestone 2: schema-fidelity tests + +Add tests that fail if any required `ToolDefinition` field is dropped, +rewritten, or corrupted during the journey from orchestrator registry +through the catalogue endpoint to the worker-side proxy. + +### Test 2a: catalogue preserves full `ToolDefinition` payloads + +Location: `src/orchestrator/api/tests/remote_tools.rs`. + +Add a test that registers a `StubTool` with a non-trivial JSON Schema for +`parameters` (including `required`, `properties` with `description` and +`type` fields, and nested objects) alongside a multi-sentence `description` +containing special characters and Markdown formatting. Assert that the +`ToolDefinition` returned by the catalogue endpoint is byte-for-byte +identical to the definition reported by the registered tool's trait methods. + +This test catches regressions where a serialization layer, filter, or +transformer silently strips or rewrites fields. Name the test +`remote_tool_catalog_preserves_full_tool_definition_payload`. + +Use an `rstest` fixture for the complex `StubTool` so it can be reused in +later tests. + +### Test 2b: proxy-reported definition matches catalogue definition + +Location: `src/tools/builtin/worker_remote_tool_proxy.rs` (tests module). + +Add a test that creates a `WorkerRemoteToolProxy` from a `ToolDefinition` +with the same complex schema used in test 2a, then asserts that +`proxy.name()`, `proxy.description()`, and `proxy.parameters_schema()` +produce values that, when reassembled into a `ToolDefinition`, are +structurally equal to the input. This catches any transformation or +truncation in the proxy layer. + +Name the test +`worker_remote_tool_proxy_preserves_full_tool_definition_fields`. + +### Test 2c: end-to-end definition fidelity from registry to worker proxy + +Location: `src/worker/container/tests.rs`. + +Add a test that wires a mock orchestrator serving a catalogue with a +complex tool definition, starts a worker runtime that fetches and registers +the proxies, and asserts that the `ToolDefinition` reported by the +worker's merged registry matches the original orchestrator-side definition +field by field. This is the RFC 0001 regression test target: the worker +sees the same definition the orchestrator would expose in-process. + +Name the test +`hosted_worker_proxy_definition_matches_orchestrator_canonical_definition`. + +### Test 2d: catalogue version changes when tool definitions change + +Location: `src/orchestrator/api/tests/remote_tools.rs`. + +Add a test that computes the catalogue version for two different tool sets +and asserts the versions differ. Also assert that computing the version +twice for the same tool set produces the same value. This catches +nondeterminism and ensures the version is a meaningful signal. + +Name the test +`remote_tool_catalog_version_is_deterministic_and_sensitive_to_content`. + +### Milestone 2 go/no-go + +Run the new tests. All must pass. Deliberately mutate a field in the test +fixture (e.g., truncate the description) and confirm at least one test +fails. Record the evidence. + +## Milestone 3: execution-routing tests + +Add tests that prove advertised remote tools execute through the +orchestrator's generic execution endpoint rather than through a local stub, +and that the execution path handles the full error taxonomy. + +### Test 3a: proxy execution reaches the orchestrator endpoint + +Location: `src/tools/builtin/worker_remote_tool_proxy.rs` (tests module) +or `src/worker/container/tests.rs`. + +Add a test that creates a mock orchestrator HTTP server with a request +counter or flag, registers a proxy tool, executes it, and asserts that the +mock server received exactly one request on the +`/worker/{job_id}/tools/execute` route with the correct `tool_name` and +`params`. This proves execution routes through the orchestrator rather +than resolving locally. + +Name the test +`worker_remote_tool_proxy_routes_execution_through_orchestrator_endpoint`. + +If a similar assertion already exists in the existing proxy round-trip +test, extend that test with an explicit route-path assertion rather than +adding a duplicate. + +### Test 3b: execution propagates tool output fields faithfully + +Location: `src/tools/builtin/worker_remote_tool_proxy.rs` (tests module). + +Add or extend a test that asserts the full `ToolOutput` returned by the +proxy matches the `ToolOutput` returned by the mock orchestrator, including +`result`, `cost`, `raw`, and `duration_ms` fields. This catches field loss +in the execution response path. + +Name the test +`worker_remote_tool_proxy_preserves_full_tool_output_fields`. + +### Test 3c: execution error taxonomy maps correctly + +Location: `src/orchestrator/api/tests/remote_tools.rs`. + +Review the existing `remote_tool_execute_maps_error_statuses` test. If it +already covers `InvalidParameters` (400), `NotAuthorized` (403), +`RateLimited` (429), and general `ExecutionFailed` (502), no new test is +needed. If any mapping is missing, extend the existing test or add a +focused companion. + +### Test 3d: execution of a non-catalogue tool is rejected + +Location: `src/orchestrator/api/tests/remote_tools.rs`. + +Review the existing `remote_tool_execute_rejects_unknown_tools` and +`remote_tool_execute_rejects_non_catalog_tools` tests. If they already +cover the relevant rejection cases (unknown tool → 404, container-only +tool → 400, protected tool → 400), no new test is needed. Document the +existing coverage. + +### Milestone 3 go/no-go + +Run the new and existing execution tests. All must pass. Deliberately +change the mock server's response route or response body and confirm at +least one test fails. Record the evidence. + +## Milestone 4: worker-orchestrator contract parity tests + +Add tests that prove the shared transport types in `src/worker/api/types.rs` +are the single source of truth for routes and payloads, so drift between +the worker client and orchestrator router is caught at compile time or test +time. + +### Test 4a: route constants match between worker client and orchestrator router + +Location: `src/worker/api/types.rs` (tests module, creating if necessary) +or `src/orchestrator/api/tests/remote_tools.rs`. + +Add a test that reads the route constant strings +(`REMOTE_TOOL_CATALOG_ROUTE` and `REMOTE_TOOL_EXECUTE_ROUTE`) from +`src/worker/api/types.rs` and asserts they match the routes actually +registered in the orchestrator's Axum router. This catches route-string +drift where one side updates a path and the other does not. + +The implementation approach depends on how the orchestrator registers +routes. If the router is built from the same constants, a compile-time +guarantee already exists and this test should verify that relationship. If +the router uses string literals independently, this test must bridge the +gap. + +Name the test +`worker_and_orchestrator_share_remote_tool_route_constants`. + +### Test 4b: request and response payloads round-trip through shared types + +Location: `src/worker/api/types.rs` (tests module). + +Add a test that serializes a `RemoteToolCatalogResponse` with non-trivial +content (tools with complex schemas, non-empty `toolset_instructions`, a +non-zero `catalog_version`) to JSON, deserializes it back, and asserts +structural equality. Do the same for `RemoteToolExecutionRequest` and +`RemoteToolExecutionResponse`. This catches serde attribute mismatches +(`skip`, `default`, `rename`) that could cause silent field loss. + +Name the test +`remote_tool_transport_types_round_trip_without_field_loss`. + +### Test 4c: worker client and orchestrator consume the same payload shape + +Location: `src/orchestrator/api/tests/remote_tools.rs` or a new shared +test file. + +Add a test that builds a catalogue response using +`hosted_remote_tool_catalog()`, serializes it to JSON, and deserializes +it as `RemoteToolCatalogResponse`. Assert that the deserialized value +matches the original. This proves the orchestrator produces responses that +the worker client can consume without field loss or misinterpretation. + +Do the same for the execution path: build a +`RemoteToolExecutionRequest`, serialize it, and parse it as the +orchestrator would. Build a `RemoteToolExecutionResponse`, serialize it as +the orchestrator would, and parse it as the worker client would. + +Name the test +`orchestrator_responses_deserialize_into_worker_shared_types`. + +### Milestone 4 go/no-go + +Run the new contract tests. All must pass. Deliberately add a +`#[serde(skip)]` attribute to a field in one of the shared types and +confirm at least one test fails. Record the evidence, then revert the +deliberate breakage. + +## Milestone 5: behavioural tests + +Add `rstest-bdd` behavioural tests if a narrow in-process scenario can +provide clearer, more readable coverage than the unit-level tests above. + +### Feasibility check + +Evaluate whether the project already has a practical `rstest-bdd` harness +for the worker-orchestrator surface. If `.feature` files and step +definitions exist for this area, extend them. If no harness exists, assess +whether adding one for this plan's scope is proportionate. + +### Candidate scenarios + +If `rstest-bdd` is feasible, the following scenarios add value: + +```gherkin +Feature: Hosted remote-tool schema fidelity + + Scenario: Worker advertises orchestrator tool definitions unchanged + Given an orchestrator with one active hosted-visible MCP tool + And the tool has a description and JSON Schema parameters + When the worker fetches the remote catalogue + And the worker registers proxy tools + Then the worker-advertised tool definition matches the orchestrator + definition exactly + + Scenario: Worker routes tool execution through orchestrator endpoint + Given an orchestrator with one active hosted-visible MCP tool + And a worker with the remote proxy registered + When the model selects the remote tool + Then the execution request reaches the orchestrator endpoint + And the tool output is returned unchanged +``` + +```gherkin +Feature: Hosted remote-tool execution routing + + Scenario: Protected tools are not advertised + Given an orchestrator with a protected tool registered + When the worker fetches the remote catalogue + Then the catalogue does not include the protected tool + + Scenario: Approval-gated tools are not advertised + Given an orchestrator with an approval-gated MCP tool + When the worker fetches the remote catalogue + Then the catalogue does not include the approval-gated tool +``` + +### Fallback + +If `rstest-bdd` is not feasible for this surface, the in-process `rstest` +integration tests from milestones 2-4 already provide equally observable +assertions. Record the decision in `Decision Log` and do not add a +ceremonial BDD layer that weakens the guarantees. + +### Milestone 5 go/no-go + +If BDD tests were added, they must pass under `make test`. If the fallback +was chosen, the decision must be documented and the unit-level tests must +already cover the same properties. + +## Milestone 6: synchronize design and operator documentation + +Update the project documentation so the described behaviour matches the +tested behaviour. + +1. Update `docs/rfcs/0001-expose-mcp-tool-definitions.md` implementation + status to reflect that `1.1.4` is complete and the test matrix is in + place. Note which test families were delivered and which (if any) were + deferred. + +2. Review `docs/axinite-architecture-overview.md` for any wording about + the hosted remote-tool contract that should now reference the test + coverage as the source of truth. + +3. Review `docs/users-guide.md` for accuracy against the tested behaviour. + If any operator-visible description is not backed by a test, either add + a test or reconcile the documentation. + +4. Mark roadmap item `1.1.4` done in `docs/roadmap.md`. Since this + completes all items in section `1.1`, review whether the parent section + heading should also note completion. + +5. Add this ExecPlan to the `docs/contents.md` index under the ExecPlans + directory listing. + +6. Check `FEATURE_PARITY.md` for any entry that should change because the + hosted tool-advertising test matrix is now in place. + +## Milestone 7: validate, gate, and publish + +Run the smallest useful red-green loop first, then the full repository +gates. All substantive commands should use `set -o pipefail` and `tee` so +logs are retained under `/tmp`. + +Use the following command pattern during implementation for targeted tests: + +```bash +set -o pipefail && cargo test --lib -- --nocapture \ + | tee /tmp/unit-axinite-1-1-4-schema-fidelity.out +``` + +Then run the broader gates expected by this repository: + +```bash +set -o pipefail && make check-fmt \ + | tee /tmp/check-fmt-axinite-1-1-4.out +set -o pipefail && make lint \ + | tee /tmp/lint-axinite-1-1-4.out +set -o pipefail && make test \ + | tee /tmp/test-axinite-1-1-4.out +``` + +If documentation changes are made outside the Rust gates, also run: + +```bash +set -o pipefail && bunx markdownlint-cli2 \ + docs/roadmap.md \ + docs/users-guide.md \ + docs/axinite-architecture-overview.md \ + docs/rfcs/0001-expose-mcp-tool-definitions.md \ + docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md \ + docs/contents.md \ + | tee /tmp/markdownlint-axinite-1-1-4.out +git diff --check +``` + +Only after the gates pass should the implementation be committed. The +commit must describe that hosted-mode schema fidelity and execution routing +tests are now in place for roadmap item `1.1.4`, and the final report must +cite the exact log paths. + +## Progress + +- [x] Audit existing test coverage and identify gaps (milestone 1). +- [x] Implement schema-fidelity tests (milestone 2). +- [x] Implement execution-routing tests (milestone 3). +- [x] Implement contract-parity tests (milestone 4). +- [x] Evaluate and implement behavioural tests (milestone 5). +- [x] Synchronize documentation (milestone 6). +- [x] Run full validation gates and publish (milestone 7). +- [ ] Address code-review follow-ups (post-review). + +### Milestone 1 findings + +The audit confirmed the gap analysis in the plan. The following test gaps +were identified: + +**Schema-fidelity gaps:** + +1. **Full payload preservation in catalogue**: existing test + `remote_tool_catalog_returns_hosted_safe_tool_definitions` (line 86) checks + individual fields (`name`, `description`, `parameters`) but does not assert + that the entire `ToolDefinition` structure survives unchanged, including any + future fields. A structural comparison is needed. + +2. **Proxy field preservation**: existing test + `remote_tool_execute_round_trips_catalog_tools` (line 118 in + `worker_remote_tool_proxy.rs`) asserts individual field equality but does + not catch if a new field is added to `ToolDefinition` and silently dropped + by the proxy. + +3. **End-to-end definition fidelity**: no test compares the orchestrator + canonical definition against the worker-advertised proxy definition in a + single assertion. The test `worker_runtime_build_reasoning_context_merges_local_and_remote_tools` + (line 566 in `worker/container/tests.rs`) checks tool names are merged but + does not assert field-level fidelity. + +4. **Catalogue version sensitivity**: no test proves the `catalog_version` + changes when tool definitions change, only that sorting produces a + deterministic version (test at line 174 in + `orchestrator/api/tests/remote_tools.rs`). + +**Execution-routing gaps:** + +1. **Proxy routes through orchestrator endpoint**: existing test + `remote_tool_execute_round_trips_catalog_tools` creates a mock server and + executes through it, but does not explicitly assert that the request + reached the correct route path. It implicitly proves routing but does not + name the guarantee. + +2. **Output field preservation**: the round-trip test checks `result`, `cost`, + `raw`, and `duration` individually but does not assert the full `ToolOutput` + structure survives. + +3. **Error taxonomy**: existing test `remote_tool_execute_maps_error_statuses` + (line 307) covers all four error kinds (`InvalidParameters`, `NotAuthorized`, + `RateLimited`, `ExecutionFailed`). No new test needed. + +4. **Non-catalogue tool rejection**: existing tests + `remote_tool_execute_rejects_unknown_tools` (line 214), + `remote_tool_execute_rejects_non_catalog_tools` (line 242), and + `remote_tool_execute_rejects_protected_orchestration_tools` (line 254) + cover unknown, container-only, and protected tools. No new test needed. + +**Contract-parity gaps:** + +1. **Route constants**: no test asserts that `REMOTE_TOOL_CATALOG_ROUTE` and + `REMOTE_TOOL_EXECUTE_ROUTE` in `src/worker/api/types.rs` are used by both + the worker client and orchestrator router. The constants are imported and + used in tests, proving usage, but no test explicitly validates parity. + +2. **Payload round-trip**: no test asserts that `RemoteToolCatalogResponse`, + `RemoteToolExecutionRequest`, and `RemoteToolExecutionResponse` survive + serialization and deserialization without field loss. + +3. **Worker-orchestrator payload compatibility**: no test builds a response on + the orchestrator side, serializes it, and deserializes it as the worker + would, proving the shared types are truly shared. + +All gaps align with the plan's milestone 2-4 test families. Milestone 1 go/no-go +criterion met: the set of new tests is bounded (9 new test functions across 4 +modules). + +## Surprises & Discoveries + +**Discovery 1 (milestone 1):** The existing test +`remote_tool_execute_round_trips_catalog_tools` in +`src/tools/builtin/worker_remote_tool_proxy.rs` already exercises the full +execution path through a mock server and checks all `ToolOutput` fields. The +gap is not missing coverage but missing explicit route-path and structural +assertions. Milestone 3 tests will strengthen rather than replace this test. + +**Discovery 2 (milestone 2-4):** All new tests compile and pass individually +during targeted test runs. The fixtures shared between orchestrator and worker +test modules (`complex_tool_definition`, `complex_tool_stub`) successfully +exercise complex JSON Schema structures with nested objects, arrays, and +special characters including UTF-8 emoji. The route-capturing test in +milestone 3 successfully proves that proxy execution reaches the exact expected +orchestrator endpoint path. + +## Decision Log + +**Decision 1 (milestone 5):** BDD tests deferred in favour of in-process unit +tests. No practical `rstest-bdd` harness exists for the worker-orchestrator +surface. The project has no `.feature` files, no step definition infrastructure, +and no BDD test patterns. Adding BDD infrastructure for this narrow scope would +introduce ceremony without material benefit. The in-process `rstest` integration +tests from milestones 2-4 already provide equally observable assertions with +clear, descriptive test names that state exactly what they verify. The unit +tests cover the same properties the BDD scenarios would have named: +`remote_tool_catalog_preserves_full_tool_definition_payload` is as readable as +a scenario named "Worker advertises orchestrator tool definitions unchanged", +and fails with equally clear diagnostics. BDD infrastructure may be added in a +future roadmap item if cross-cutting behavioural scenarios justify the +investment. + +**Decision 2 (milestone 3):** Tests 3c and 3d confirmed to be already covered +by existing tests. The test `remote_tool_execute_maps_error_statuses` covers +all four error-taxonomy mappings (`InvalidParameters`, `NotAuthorized`, +`RateLimited`, `ExecutionFailed`). The tests +`remote_tool_execute_rejects_unknown_tools`, +`remote_tool_execute_rejects_non_catalog_tools`, and +`remote_tool_execute_rejects_protected_orchestration_tools` cover rejection +of unknown, container-only, and protected tools. No new tests were added for +these gaps. + +## Outcomes & Retrospective + +All milestones completed successfully. The test matrix is now in place and +enforces the schema-fidelity, execution-routing, and contract-parity guarantees +named in RFC 0001 and roadmap item `1.1.4`. Code-review follow-ups are being +addressed in a subsequent pass (see progress checklist above). + +### Files added or modified + +**Test files modified:** + +- `src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs`: added + `complex_tool_definition()` and `complex_tool_stub()` fixtures for testing + full payload fidelity with nested JSON Schema and special characters. +- `src/orchestrator/api/tests/remote_tools.rs`: added three new tests: + `remote_tool_catalog_preserves_full_tool_definition_payload`, + `remote_tool_catalog_version_is_deterministic_and_sensitive_to_content`, and + `orchestrator_responses_deserialize_into_worker_shared_types`. +- `src/tools/builtin/worker_remote_tool_proxy.rs`: added three new tests: + `worker_remote_tool_proxy_preserves_full_tool_definition_fields`, + `worker_remote_tool_proxy_preserves_full_tool_output_fields`, and + `worker_remote_tool_proxy_routes_execution_through_orchestrator_endpoint`. +- `src/worker/container/tests.rs`: added one new test: + `hosted_worker_proxy_definition_matches_orchestrator_canonical_definition`, + plus helper functions `complex_orchestrator_tool_definition()` and + `remote_tool_catalog_with_complex_tool()`. +- `src/worker/api/types.rs`: added tests module with two new tests: + `worker_and_orchestrator_share_remote_tool_route_constants` and + `remote_tool_transport_types_round_trip_without_field_loss`. + +**Documentation files modified:** + +- `docs/roadmap.md`: marked roadmap item `1.1.4` complete. +- `docs/rfcs/0001-expose-mcp-tool-definitions.md`: updated implementation status + to reflect that all roadmap items in section `1.1` are complete. +- `docs/contents.md`: added ExecPlan `1-1-4-tests-for-schema-fidelity-and-execution-routing.md` + to the ExecPlans directory listing. +- `docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`: + updated status to `COMPLETE` and recorded progress, decisions, and discoveries. + +### Test coverage added + +The implementation added 9 new test functions covering: + +1. Full `ToolDefinition` payload preservation through the catalogue endpoint + (milestone 2). +2. Proxy-reported fields matching input definitions exactly (milestone 2). +3. End-to-end definition fidelity from orchestrator canonical to worker-advertised + proxy (milestone 2). +4. Catalogue version determinism and content sensitivity (milestone 2). +5. Proxy execution routing through the correct orchestrator endpoint path + (milestone 3). +6. Full `ToolOutput` field preservation including cost, raw, and duration_ms + (milestone 3). +7. Route constant sharing and correctness between worker and orchestrator + (milestone 4). +8. Transport type round-trip without field loss for catalogue, execution + request, and execution response payloads (milestone 4). +9. Orchestrator-built responses deserializing correctly into worker shared types + (milestone 4). + +All tests use in-process mock servers and fixtures, avoiding external +dependencies. All tests follow existing `rstest` patterns and naming conventions. +The format check (`make check-fmt`) passed after running `cargo fmt --all`. All +validation gates have been run and passed successfully. + +### Validation evidence + +Format check passed: + +```bash +cargo fmt --all -- --check +cargo fmt --manifest-path tools-src/github/Cargo.toml --all -- --check +``` + +Git whitespace check passed: + +```bash +git diff --check +``` + +(No output, indicating no whitespace errors.) + +Full test suite passed: 3076 tests passed; 0 failed; 2 ignored (webhook server +test fixed to use already-bound address instead of privileged port; worker API +types test split into three focused tests per code review). + +Markdown linting revealed pre-existing issues in `docs/roadmap.md` unrelated to +this implementation (multiple consecutive blank lines at lines 1342, 1408, 1450, +1489, 1512). The ExecPlan, RFC 0001, and `docs/contents.md` changes introduced +no new Markdown issues. + +### Retrospective observations + +The audit phase (milestone 1) saved significant rework. By mapping existing +coverage before writing new tests, the plan avoided duplicating +`remote_tool_execute_maps_error_statuses` and the rejection-case tests, which +already covered execution-routing gaps 3c and 3d. + +The complex-tool-definition fixtures proved valuable for both orchestrator and +worker test modules. Sharing these fixtures kept the test code DRY and ensured +both sides exercised the same non-trivial payload shapes. + +The BDD decision (milestone 5) was straightforward because the project had zero +BDD infrastructure. The in-process `rstest` tests provide the same guarantees +without ceremony. + +The contract-parity tests (milestone 4) surface a compile-time guarantee: if +the shared types in `src/worker/api/types.rs` drift, both the worker client and +orchestrator router would fail to compile or serialize. The tests make that +guarantee explicit and observable. + +All gaps identified in milestone 1 are now covered. The test suite will fail +loudly if MCP fields disappear, execution routes incorrectly, or transport +contracts drift. diff --git a/docs/rfcs/0001-expose-mcp-tool-definitions.md b/docs/rfcs/0001-expose-mcp-tool-definitions.md index 28193b38a..e5bf2617d 100644 --- a/docs/rfcs/0001-expose-mcp-tool-definitions.md +++ b/docs/rfcs/0001-expose-mcp-tool-definitions.md @@ -5,13 +5,15 @@ - **RFC number:** 0001 - **Status:** Proposed - **Created:** 2026-03-11 -- **Implementation status:** Roadmap items `1.1.1`, `1.1.2`, and `1.1.3` are - implemented in this branch through the shared `src/worker/api/` transport - types, the worker catalog-fetch startup path, the orchestrator generic - remote-tool execution endpoint, the canonical `ToolRegistry`-owned - hosted-visible filter for active MCP tools, and the explicit worker-side - merged reasoning surface used both at context build and later refresh. - Roadmap item `1.1.4` still owns the broader schema-parity and routing matrix. +- **Implementation status:** Roadmap items `1.1.1`, `1.1.2`, `1.1.3`, and + `1.1.4` are complete. The implementation includes the shared `src/worker/api/` + transport types, the worker catalog-fetch startup path, the orchestrator + generic remote-tool execution endpoint, the canonical `ToolRegistry`-owned + hosted-visible filter for active MCP tools, the explicit worker-side merged + reasoning surface used both at context build and later refresh, and the + comprehensive test matrix for schema fidelity, execution routing, and + worker-orchestrator contract parity. All roadmap items in section `1.1` are + now complete. ## Summary diff --git a/docs/roadmap.md b/docs/roadmap.md index 1aa65b780..36f2a07b7 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -87,7 +87,7 @@ worker-orchestrator contract hardening belongs inside 1.1.1. - Success: hosted model requests include the real tool descriptions and JSON Schemas, and worker-local tools plus orchestrator-owned tools appear as one unified tool surface. -- [ ] 1.1.4. Add hosted-mode tests for schema fidelity and execution routing. +- [x] 1.1.4. Add hosted-mode tests for schema fidelity and execution routing. Requires 1.1.3. - See [RFC 0001 §Migration Plan](./rfcs/0001-expose-mcp-tool-definitions.md#migration-plan). - Tracks the worker-orchestrator parity portion of @@ -1339,7 +1339,6 @@ provenance, transformation policy, and per-channel capability negotiation before place, unsupported media falls back predictably, and automated tests cover preview, cache reuse, transcription, TTS, and transformed-media flows. - ## 7. Raise assurance for safety and lifecycle invariants Phase objective: add proof-oriented and generated verification where Axinite's @@ -1405,7 +1404,6 @@ host-matching contract used by installer- and allowlist-adjacent code. and make installer regressions fail on structured generated input instead of one-off examples only. - ### 7.3. Bounded checking for allowlist and host-matcher semantics Objective: use Kani where the highest-value invariants are small, deterministic, @@ -1447,7 +1445,6 @@ feeds 7.5 if the extracted matcher proves stable enough for a later full proof. explores larger host and path combinations, and unwind bounds stay close to the harnesses they justify. - ### 7.4. Model-check the job lifecycle with Stateright Objective: model the scheduler, worker, token, reaper, and retained-result @@ -1486,7 +1483,6 @@ contract work exposes adjacent semantic cleanup patterns. scheduled runs can extend depth and state count, and any safety-property counterexample is surfaced as a dedicated formal-verification failure. - ### 7.5. Add a narrow later-stage Verus proof path Objective: keep a proof-only path available for the few invariants that remain @@ -1509,7 +1505,6 @@ production code have already converged on one matcher contract. suffix spoofing remains impossible in the proof model, and proof execution stays isolated from the normal Cargo-driven test path. - ### 7.1. Formal-verification infrastructure and workflow split Objective: add the repository structure, tool runners, and CI split needed for diff --git a/src/channels/webhook_server.rs b/src/channels/webhook_server.rs index 8a817a14b..809759ced 100644 --- a/src/channels/webhook_server.rs +++ b/src/channels/webhook_server.rs @@ -58,33 +58,6 @@ impl WebhookServer { self.bind_and_spawn(app).await } - /// Accept a pre-bound listener, merge route fragments, and spawn the - /// server. Eliminates the TOCTOU window that `start()` has between port - /// discovery and the actual bind, making it suitable for tests that - /// allocate ephemeral ports. - #[cfg(test)] - pub async fn start_with_listener( - &mut self, - listener: tokio::net::TcpListener, - ) -> Result<(), ChannelError> { - let mut app = Router::new(); - for fragment in self.routes.drain(..) { - app = app.merge(fragment); - } - self.merged_router = Some(app.clone()); - - let local_addr = listener - .local_addr() - .map_err(|e| ChannelError::StartupFailed { - name: "webhook_server".to_string(), - reason: format!("Failed to get listener local address: {e}"), - })?; - self.config.addr = local_addr; - - self.spawn_with_listener(listener, app); - Ok(()) - } - /// Bind a listener to the configured address and spawn the server task. /// Private helper used by both start() and restart_with_addr(). async fn bind_and_spawn(&mut self, app: Router) -> Result<(), ChannelError> { @@ -95,22 +68,8 @@ impl WebhookServer { reason: format!("Failed to bind to {}: {}", self.config.addr, e), })?; - // Overwrite the configured address with the concrete socket address - // so that an ephemeral `:0` bind is resolved to the real port. - self.config.addr = listener - .local_addr() - .map_err(|e| ChannelError::StartupFailed { - name: "webhook_server".to_string(), - reason: format!("Failed to get listener local address: {e}"), - })?; - - self.spawn_with_listener(listener, app); - Ok(()) - } + tracing::info!("Webhook server listening on {}", self.config.addr); - /// Spawn the axum server on an already-bound listener. - fn spawn_with_listener(&mut self, listener: tokio::net::TcpListener, app: Router) { - let addr = self.config.addr; let (shutdown_tx, shutdown_rx) = oneshot::channel(); self.shutdown_tx = Some(shutdown_tx); @@ -126,8 +85,8 @@ impl WebhookServer { } }); - tracing::info!("Webhook server listening on {}", addr); self.handle = Some(handle); + Ok(()) } /// Gracefully shut down the current listener and rebind to a new address. @@ -173,46 +132,6 @@ impl WebhookServer { } } - /// Gracefully shut down the current listener and rebind using a pre-bound - /// listener. Eliminates the TOCTOU window between port reservation and - /// bind, making it suitable for tests. - /// - /// Unlike [`restart_with_addr`], this test-only helper does not support - /// rollback. [`restart_with_addr`] binds the replacement first and can keep - /// the old listener alive if that bind fails. This method shuts down the - /// old listener before calling [`Self::spawn_with_listener`], so there is - /// no rollback path if spawning were to fail. That trade-off is acceptable - /// in tests because [`Self::spawn_with_listener`] is infallible. - #[cfg(test)] - pub async fn restart_with_listener( - &mut self, - listener: tokio::net::TcpListener, - ) -> Result<(), ChannelError> { - let app = self - .merged_router - .clone() - .ok_or_else(|| ChannelError::StartupFailed { - name: "webhook_server".to_string(), - reason: "restart_with_listener called before start()".to_string(), - })?; - - let new_addr = listener - .local_addr() - .map_err(|e| ChannelError::StartupFailed { - name: "webhook_server".to_string(), - reason: format!("Failed to get listener local address: {e}"), - })?; - - // Stop the old listener before spawning the new one. Unlike - // restart_with_addr, we do not provide rollback semantics because the - // new listener is already bound and assumed to be valid. - self.shutdown().await; - - self.config.addr = new_addr; - self.spawn_with_listener(listener, app); - Ok(()) - } - /// Return the current bind address. pub fn current_addr(&self) -> SocketAddr { self.config.addr @@ -231,81 +150,82 @@ impl WebhookServer { #[cfg(test)] mod tests { - use super::*; + use std::net::TcpListener as StdTcpListener; + use axum::Json; use rstest::{fixture, rstest}; use serde_json::json; - use std::net::TcpListener as StdTcpListener; - /// Bind an ephemeral port on localhost, enable non-blocking mode, and - /// convert it to a `tokio::net::TcpListener`, returning the listener and - /// its resolved [`SocketAddr`]. The port remains bound throughout so - /// there is no TOCTOU window between discovery and first use. + use super::*; + + /// A started webhook server with a `/health` route and a pre-built client. + struct StartedWebhookServer { + server: WebhookServer, + addr: SocketAddr, + client: reqwest::Client, + } + + /// Finds an available port, creates a [`WebhookServer`] with a `/health` + /// route, starts the server, and returns the address and a client. #[fixture] - fn bind_ephemeral_tokio_listener() -> (tokio::net::TcpListener, std::net::SocketAddr) { - let std_listener = - StdTcpListener::bind("127.0.0.1:0").expect("Failed to bind ephemeral port"); - std_listener - .set_nonblocking(true) - .expect("Failed to set non-blocking"); - let addr = std_listener.local_addr().expect("Failed to get local addr"); - let tokio_listener = tokio::net::TcpListener::from_std(std_listener) - .expect("Failed to convert to tokio listener"); - (tokio_listener, addr) + async fn started_webhook_server() + -> Result> { + let port = { + let listener = StdTcpListener::bind("127.0.0.1:0")?; + listener.local_addr()?.port() + }; + let addr: SocketAddr = format!("127.0.0.1:{}", port).parse()?; + let mut server = WebhookServer::new(WebhookServerConfig { addr }); + server.add_routes(Router::new().route( + "/health", + axum::routing::get(|| async { Json(json!({"status": "ok"})) }), + )); + server.start().await?; + Ok(StartedWebhookServer { + server, + addr, + client: reqwest::Client::new(), + }) } #[rstest] #[tokio::test] async fn test_restart_with_addr_rebinds_listener( - #[from(bind_ephemeral_tokio_listener)] listener_and_addr1: ( - tokio::net::TcpListener, - std::net::SocketAddr, - ), - #[from(bind_ephemeral_tokio_listener)] listener_and_addr2: ( - tokio::net::TcpListener, - std::net::SocketAddr, - ), - ) { - let (tokio_listener1, addr1) = listener_and_addr1; - let (tokio_listener2, addr2) = listener_and_addr2; - - let mut server = WebhookServer::new(WebhookServerConfig { addr: addr1 }); - - let test_router = axum::Router::new().route( - "/health", - axum::routing::get(|| async { Json(json!({"status": "ok"})) }), - ); - server.add_routes(test_router); + #[future] started_webhook_server: Result< + StartedWebhookServer, + Box, + >, + ) -> Result<(), Box> { + let StartedWebhookServer { + mut server, + addr: addr1, + client, + } = started_webhook_server.await?; - server - .start_with_listener(tokio_listener1) - .await - .expect("Failed to start server"); assert_eq!( server.current_addr(), addr1, "Server should be bound to initial address" ); - // Verify the first server is actually listening - let client = reqwest::Client::new(); let response = client - .get(format!("http://{addr1}/health")) + .get(format!("http://{}/health", addr1)) .send() - .await - .expect("Failed to send request to first server"); + .await?; assert_eq!( response.status(), 200, "First server should respond to health check" ); - // Hand the pre-bound listener directly to the server — no TOCTOU - // gap because the port was never released. - server - .restart_with_listener(tokio_listener2) - .await - .expect("Failed to restart with new listener"); + // Find a second available port and restart + let port2 = { + let listener = StdTcpListener::bind("127.0.0.1:0")?; + listener.local_addr()?.port() + }; + let addr2: SocketAddr = format!("127.0.0.1:{}", port2).parse()?; + + server.restart_with_addr(addr2).await?; assert_eq!( server.current_addr(), @@ -314,98 +234,82 @@ mod tests { ); assert_ne!( addr1, addr2, - "Address should change after restart_with_listener" + "Address should change after restart_with_addr" ); - // Verify the new server is actually listening on the new address let response = client - .get(format!("http://{addr2}/health")) + .get(format!("http://{}/health", addr2)) .send() - .await - .expect("Failed to send request to restarted server"); + .await?; assert_eq!( response.status(), 200, "Restarted server should respond to health check on new address" ); - // Verify the old address is no longer responding let old_result = tokio::time::timeout( std::time::Duration::from_millis(200), - client.get(format!("http://{addr1}/health")).send(), + client.get(format!("http://{}/health", addr1)).send(), ) .await; assert!( - old_result.is_err() || old_result.as_ref().unwrap().is_err(), + old_result.is_err() || old_result.ok().and_then(|r| r.ok()).is_none(), "Old address should not respond after server restarts" ); - // Clean up server.shutdown().await; + Ok(()) } #[rstest] #[tokio::test] async fn test_restart_with_addr_rollback_on_bind_failure( - bind_ephemeral_tokio_listener: (tokio::net::TcpListener, std::net::SocketAddr), - ) { - let (tokio_listener, addr1) = bind_ephemeral_tokio_listener; - - // Bind a second ephemeral port and hold it open — this is the - // "occupied" address that restart_with_addr must fail to bind. - let blocker = StdTcpListener::bind("127.0.0.1:0").expect("Failed to bind blocker port"); - let blocked_addr = blocker - .local_addr() - .expect("Failed to get blocker local addr"); + #[future] started_webhook_server: Result< + StartedWebhookServer, + Box, + >, + ) -> Result<(), Box> { + let StartedWebhookServer { + mut server, + addr: addr1, + client, + } = started_webhook_server.await?; - let mut server = WebhookServer::new(WebhookServerConfig { addr: addr1 }); - - let test_router = axum::Router::new().route( - "/health", - axum::routing::get(|| async { Json(json!({"status": "ok"})) }), - ); - server.add_routes(test_router); - - server - .start_with_listener(tokio_listener) - .await - .expect("Failed to start server"); - - // Verify the server is listening - let client = reqwest::Client::new(); let response = client - .get(format!("http://{addr1}/health")) + .get(format!("http://{}/health", addr1)) .send() - .await - .expect("Failed to send request"); + .await?; assert_eq!(response.status(), 200, "Server should be listening"); - // Attempt restart on the occupied address (should fail because - // blocker still holds the port). - let result = server.restart_with_addr(blocked_addr).await; - assert!(result.is_err(), "Restart with occupied address should fail"); + // Bind a temporary listener to create a conflict + let conflict_listener = tokio::net::TcpListener::bind("127.0.0.1:0").await?; + let conflict_addr = conflict_listener.local_addr()?; + + let result = server.restart_with_addr(conflict_addr).await; + assert!( + result.is_err(), + "Restart with already-bound address should fail" + ); + + drop(conflict_listener); - // Verify the old address is still responding (rollback succeeded) let response = client - .get(format!("http://{addr1}/health")) + .get(format!("http://{}/health", addr1)) .send() - .await - .expect("Failed to send request to old address"); + .await?; assert_eq!( response.status(), 200, "Old listener should still be running after failed restart" ); - // Verify the server address is unchanged assert_eq!( server.current_addr(), addr1, "Server address should be restored after failed restart" ); - // Clean up — drop blocker so its port is released - drop(blocker); server.shutdown().await; + Ok(()) } } diff --git a/src/lib.rs b/src/lib.rs index 892319abf..acf17ddb7 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -82,6 +82,9 @@ pub mod testing_wasm; #[cfg(any(test, feature = "test-helpers"))] pub mod testing; +#[cfg(test)] +pub(crate) mod test_support; + pub use config::Config; pub use error::{Error, Result}; diff --git a/src/orchestrator/api/tests.rs b/src/orchestrator/api/tests.rs index 4525b5a6f..f42be41a8 100644 --- a/src/orchestrator/api/tests.rs +++ b/src/orchestrator/api/tests.rs @@ -17,6 +17,7 @@ pub(super) use crate::tools::ToolRegistry; use super::*; mod auth; +mod catalogue_fidelity; mod credentials; mod events; mod fixtures; @@ -24,3 +25,4 @@ mod prompts; mod remote_tools; mod remote_tools_param_aware; mod status; +mod transport_parity; diff --git a/src/orchestrator/api/tests/catalogue_fidelity.rs b/src/orchestrator/api/tests/catalogue_fidelity.rs new file mode 100644 index 000000000..8ce1b47c4 --- /dev/null +++ b/src/orchestrator/api/tests/catalogue_fidelity.rs @@ -0,0 +1,121 @@ +//! Tests for catalogue schema fidelity and versioning determinism. + +use std::sync::Arc; + +use rstest::rstest; + +use super::super::remote_tools::hosted_remote_tool_catalog; + +/// Upper bound for reading hosted remote-tool catalogue response bodies in +/// tests. Large enough to accommodate growth in the canonical fixture without +/// triggering a silent truncation. +const MAX_REMOTE_TOOL_CATALOG_BYTES: usize = 64 * 1024; +use super::fixtures::remote_tool_mocks::{ + ToolFixture, build_tool_fixture, complex_tool_definition, complex_tool_stub, +}; +use super::fixtures::test_state; +use super::*; +use crate::tools::ToolRegistry; +use crate::worker::api::REMOTE_TOOL_CATALOG_ROUTE; + +#[rstest] +#[tokio::test] +async fn remote_tool_catalog_preserves_full_tool_definition_payload(test_state: OrchestratorState) { + let expected = complex_tool_definition(); + test_state + .tools + .register(Arc::new(complex_tool_stub())) + .await; + + let job_id = Uuid::new_v4(); + let token = test_state.token_store.create_token(job_id).await; + let router = OrchestratorApi::router(test_state); + + let req = Request::builder() + .method("GET") + .uri(REMOTE_TOOL_CATALOG_ROUTE.replace("{job_id}", &job_id.to_string())) + .header("Authorization", format!("Bearer {}", token)) + .body(Body::empty()) + .expect("build hosted remote-tool catalog request"); + + let resp = router + .oneshot(req) + .await + .expect("send hosted remote-tool catalog request"); + assert_eq!(resp.status(), StatusCode::OK); + + let body = axum::body::to_bytes(resp.into_body(), MAX_REMOTE_TOOL_CATALOG_BYTES) + .await + .expect("read hosted remote-tool catalog response body"); + let catalog: crate::worker::api::RemoteToolCatalogResponse = + serde_json::from_slice(&body).expect("parse hosted remote-tool catalog response"); + + let tool = catalog + .tools + .iter() + .find(|t| t.name == expected.name) + .expect("complex tool should be in catalogue"); + + assert_eq!( + tool, &expected, + "catalogue tool definition must match the registry definition exactly" + ); +} + +#[tokio::test] +async fn remote_tool_catalog_version_is_deterministic_and_sensitive_to_content() { + let registry_a = Arc::new(ToolRegistry::new()); + registry_a + .register(build_tool_fixture(ToolFixture::CatalogAlpha)) + .await; + + let registry_b = Arc::new(ToolRegistry::new()); + registry_b + .register(build_tool_fixture(ToolFixture::CatalogAlpha)) + .await; + + let registry_c = Arc::new(ToolRegistry::new()); + registry_c + .register(build_tool_fixture(ToolFixture::CatalogBeta)) + .await; + + let (_tools_a, _instructions_a, version_a) = hosted_remote_tool_catalog(®istry_a).await; + let (_tools_b, _instructions_b, version_b) = hosted_remote_tool_catalog(®istry_b).await; + let (_tools_c, _instructions_c, version_c) = hosted_remote_tool_catalog(®istry_c).await; + + assert_eq!( + version_a, version_b, + "identical tool sets must produce identical catalog versions" + ); + assert_ne!( + version_a, version_c, + "different tool sets must produce different catalog versions" + ); +} + +#[tokio::test] +async fn remote_tool_catalog_version_independent_of_registration_order() { + let registry_a = Arc::new(ToolRegistry::new()); + registry_a + .register(build_tool_fixture(ToolFixture::CatalogAlpha)) + .await; + registry_a + .register(build_tool_fixture(ToolFixture::CatalogBeta)) + .await; + + let registry_b = Arc::new(ToolRegistry::new()); + registry_b + .register(build_tool_fixture(ToolFixture::CatalogBeta)) + .await; + registry_b + .register(build_tool_fixture(ToolFixture::CatalogAlpha)) + .await; + + let (_tools_a, _instructions_a, version_a) = hosted_remote_tool_catalog(®istry_a).await; + let (_tools_b, _instructions_b, version_b) = hosted_remote_tool_catalog(®istry_b).await; + + assert_eq!( + version_a, version_b, + "catalog version must be independent of tool registration order" + ); +} diff --git a/src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs b/src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs index 99cbf0ff0..88ad8e040 100644 --- a/src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs +++ b/src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs @@ -25,7 +25,7 @@ pub(crate) enum StubOutput { /// General-purpose parameterised stub implementing [`Tool`]. pub(crate) struct StubTool { pub(crate) name: &'static str, - pub(crate) description: &'static str, + pub(crate) description: String, pub(crate) parameters: serde_json::Value, pub(crate) domain: ToolDomain, pub(crate) always_approve: bool, @@ -37,12 +37,12 @@ pub(crate) struct StubTool { impl StubTool { pub(crate) fn hosted( name: &'static str, - description: &'static str, + description: impl Into, parameters: serde_json::Value, ) -> Self { Self { name, - description, + description: description.into(), parameters, domain: ToolDomain::Orchestrator, always_approve: false, @@ -59,7 +59,7 @@ impl NativeTool for StubTool { } fn description(&self) -> &str { - self.description + &self.description } fn parameters_schema(&self) -> serde_json::Value { @@ -304,3 +304,36 @@ impl NativeTool for ErrorTool { Some(HostedToolCatalogSource::Mcp) } } + +/// Returns a complex `ToolDefinition` with nested JSON Schema structures, a +/// multi-sentence description with special characters and Markdown formatting, +/// and multiple required fields. +/// +/// This fixture is used to test that the full `ToolDefinition` payload survives +/// the journey from orchestrator registry through the catalogue endpoint to the +/// worker-side proxy without field loss or transformation. +pub(crate) fn complex_tool_definition() -> crate::llm::ToolDefinition { + crate::test_support::build_complex_tool_definition( + "remote_tool_fidelity_fixture", + concat!( + "A **complex** tool for testing schema fidelity. ", + "Handles UTF-8: \u{1F680}\u{1F4A1}. ", + "Supports `inline code` and [markdown](https://example.com). ", + "Special chars: <>&\"'{}[]()." + ), + ) +} + +/// Returns a [`StubTool`] configured with the complex tool definition from +/// [`complex_tool_definition`]. +/// +/// The returned tool is hosted-safe and will echo its params on execution, +/// making it suitable for round-trip fidelity tests. +pub(crate) fn complex_tool_stub() -> StubTool { + let def = complex_tool_definition(); + StubTool::hosted( + "remote_tool_fidelity_fixture", + def.description, + def.parameters, + ) +} diff --git a/src/orchestrator/api/tests/remote_tools.rs b/src/orchestrator/api/tests/remote_tools.rs index 1d0cc8828..8c76dcb77 100644 --- a/src/orchestrator/api/tests/remote_tools.rs +++ b/src/orchestrator/api/tests/remote_tools.rs @@ -27,7 +27,7 @@ async fn populate_catalog_visibility_fixtures(tools: &ToolRegistry) { tools .register(Arc::new(StubTool { name: "hosted_extension_catalog_builtin", - description: "Hosted-safe extension-management built-in", + description: "Hosted-safe extension-management built-in".to_string(), catalog_source: None, output: StubOutput::Fixed(serde_json::json!({"extensions": []})), ..StubTool::hosted( @@ -40,7 +40,7 @@ async fn populate_catalog_visibility_fixtures(tools: &ToolRegistry) { tools .register(Arc::new(StubTool { name: "create_job", - description: "Protected orchestration tool", + description: "Protected orchestration tool".to_string(), output: StubOutput::Fixed(serde_json::json!({"created": true})), ..StubTool::hosted( "create_job", @@ -52,7 +52,7 @@ async fn populate_catalog_visibility_fixtures(tools: &ToolRegistry) { tools .register(Arc::new(StubTool { name: "job_events", - description: "Protected job-events tool", + description: "Protected job-events tool".to_string(), output: StubOutput::Fixed(serde_json::json!({"events": []})), ..StubTool::hosted( "job_events", @@ -158,7 +158,7 @@ async fn remote_tool_catalog_excludes_ineligible_tools( ) { assert_catalog_excludes_stub(StubTool { name, - description, + description: description.to_string(), catalog_source, output: StubOutput::Fixed(output), ..StubTool::hosted( @@ -256,7 +256,7 @@ async fn remote_tool_execute_rejects_protected_orchestration_tools(test_state: O test_state, Arc::new(StubTool { name: "create_job", - description: "Protected orchestration tool", + description: "Protected orchestration tool".to_string(), output: StubOutput::Fixed(serde_json::json!({"created":true})), ..StubTool::hosted( "create_job", diff --git a/src/orchestrator/api/tests/transport_parity.rs b/src/orchestrator/api/tests/transport_parity.rs new file mode 100644 index 000000000..043d826c3 --- /dev/null +++ b/src/orchestrator/api/tests/transport_parity.rs @@ -0,0 +1,74 @@ +//! Serialisation round-trip tests for shared orchestrator–worker transport types. + +use std::sync::Arc; + +use super::super::remote_tools::hosted_remote_tool_catalog; +use super::fixtures::remote_tool_mocks::complex_tool_stub; +use crate::tools::ToolRegistry; + +#[tokio::test] +async fn orchestrator_catalog_response_round_trips_through_worker_shared_types() { + let registry = Arc::new(ToolRegistry::new()); + registry.register(Arc::new(complex_tool_stub())).await; + + let (tools, instructions, version) = hosted_remote_tool_catalog(®istry).await; + + let catalog_response = crate::worker::api::RemoteToolCatalogResponse { + tools: tools.clone(), + toolset_instructions: instructions.clone(), + catalog_version: version, + }; + + let serialized = serde_json::to_string(&catalog_response) + .expect("serialize orchestrator-built catalog response"); + let deserialized: crate::worker::api::RemoteToolCatalogResponse = + serde_json::from_str(&serialized) + .expect("orchestrator response must deserialize into shared type"); + + assert_eq!( + deserialized, catalog_response, + "catalog response must round-trip without field loss" + ); +} + +#[tokio::test] +async fn worker_execution_request_round_trips_through_shared_types() { + let execution_request = crate::worker::api::RemoteToolExecutionRequest { + tool_name: "remote_tool_fidelity_fixture".to_string(), + params: serde_json::json!({"query": "test", "options": {"limit": 10}}), + }; + + let serialized = serde_json::to_string(&execution_request) + .expect("serialize worker-built execution request"); + let deserialized: crate::worker::api::RemoteToolExecutionRequest = + serde_json::from_str(&serialized) + .expect("worker request must deserialize into shared type"); + + assert_eq!( + deserialized, execution_request, + "execution request must round-trip without field loss" + ); +} + +#[tokio::test] +async fn orchestrator_execution_response_round_trips_through_worker_shared_types() { + let execution_response = crate::worker::api::RemoteToolExecutionResponse { + output: crate::tools::ToolOutput::success( + serde_json::json!({"result": "executed"}), + std::time::Duration::from_millis(15), + ) + .with_cost(rust_decimal::Decimal::new(200, 2)) + .with_raw("orchestrator tool output"), + }; + + let serialized = serde_json::to_string(&execution_response) + .expect("serialize orchestrator-built execution response"); + let deserialized: crate::worker::api::RemoteToolExecutionResponse = + serde_json::from_str(&serialized) + .expect("orchestrator response must deserialize into shared type"); + + assert_eq!( + deserialized, execution_response, + "execution response must round-trip without field loss" + ); +} diff --git a/src/test_support.rs b/src/test_support.rs new file mode 100644 index 000000000..86711c46b --- /dev/null +++ b/src/test_support.rs @@ -0,0 +1,82 @@ +//! Shared test fixtures and builders for complex tool definitions. +//! +//! This module provides canonical complex tool definition builders that are +//! reused across both orchestrator and worker test suites to ensure consistency +//! and reduce duplication. + +use crate::llm::ToolDefinition; + +/// Returns the canonical complex parameters JSON schema used for fidelity testing. +/// +/// This schema exercises nested objects, arrays, enums, constraints, and various +/// JSON Schema features to validate that tool definitions survive serialization, +/// transport, and reconstruction without data loss. +pub fn complex_tool_definition_parameters() -> serde_json::Value { + serde_json::json!({ + "type": "object", + "title": "ComplexParams", + "description": "Nested schema with multiple property types", + "properties": { + "query": { + "type": "string", + "description": "Search query with constraints", + "minLength": 1, + "maxLength": 500 + }, + "options": { + "type": "object", + "description": "Nested configuration object", + "properties": { + "limit": { + "type": "integer", + "minimum": 1, + "maximum": 100, + "default": 10 + }, + "include_metadata": { + "type": "boolean", + "default": false + }, + "filters": { + "type": "array", + "items": { + "type": "string", + "enum": ["active", "archived", "draft"] + } + } + }, + "required": ["limit"] + }, + "callback_url": { + "type": "string", + "format": "uri", + "description": "Optional webhook URL" + } + }, + "required": ["query", "options"], + "additionalProperties": false + }) +} + +/// Builds a complex tool definition for fidelity testing. +/// +/// Constructs a `ToolDefinition` with the given name and description, using +/// the canonical complex parameters schema. This ensures both orchestrator and +/// worker tests use identical parameter structures while allowing test-specific +/// names and descriptions. +/// +/// # Arguments +/// +/// * `name` - The tool name (e.g., "remote_tool_fidelity_fixture") +/// * `description` - The tool description (should include complex UTF-8, markdown, etc.) +/// +/// # Returns +/// +/// A `ToolDefinition` with the canonical complex parameters schema. +pub fn build_complex_tool_definition(name: &str, description: &str) -> ToolDefinition { + ToolDefinition { + name: name.to_string(), + description: description.to_string(), + parameters: complex_tool_definition_parameters(), + } +} diff --git a/src/tools/builtin/worker_remote_tool_proxy.rs b/src/tools/builtin/worker_remote_tool_proxy.rs index 7ee1e67eb..42b3a1399 100644 --- a/src/tools/builtin/worker_remote_tool_proxy.rs +++ b/src/tools/builtin/worker_remote_tool_proxy.rs @@ -77,96 +77,4 @@ impl WorkerRemoteToolProxy { } #[cfg(test)] -mod tests { - use axum::extract::{Path, State}; - use axum::routing::post; - use axum::{Json, Router}; - use rust_decimal::Decimal; - use uuid::Uuid; - - use super::*; - use crate::worker::api::{ - REMOTE_TOOL_EXECUTE_ROUTE, RemoteToolExecutionRequest, RemoteToolExecutionResponse, - }; - - #[derive(Clone)] - struct TestState; - - async fn execute_tool( - State(_state): State, - Path(job_id): Path, - Json(req): Json, - ) -> Json { - Json(RemoteToolExecutionResponse { - output: ToolOutput::success( - serde_json::json!({ - "job_id": job_id, - "tool_name": req.tool_name, - "params": req.params, - }), - std::time::Duration::from_millis(7), - ) - .with_cost(Decimal::new(125, 2)) - .with_raw("proxy raw output"), - }) - } - - #[tokio::test] - async fn remote_tool_execute_round_trips_catalog_tools() { - let listener = tokio::net::TcpListener::bind("127.0.0.1:0") - .await - .expect("bind listener"); - let addr = listener.local_addr().expect("listener addr"); - let router = Router::new() - .route(REMOTE_TOOL_EXECUTE_ROUTE, post(execute_tool)) - .with_state(TestState); - let server = tokio::spawn(async move { - axum::serve(listener, router).await.expect("serve router"); - }); - - let job_id = Uuid::new_v4(); - let client = Arc::new(WorkerHttpClient::new( - format!("http://{}", addr), - job_id, - "test-token".to_string(), - )); - let registry = ToolRegistry::new(); - let expected_definition = ToolDefinition { - name: "github_search".to_string(), - description: "Search repositories".to_string(), - parameters: serde_json::json!({ - "type": "object", - "properties": { - "query": {"type": "string"} - }, - "required": ["query"] - }), - }; - register_worker_remote_tool_proxies(®istry, client, vec![expected_definition.clone()]); - - let tool = registry - .get("github_search") - .await - .expect("github_search proxy must be registered"); - let output = tool - .execute( - serde_json::json!({"query": "axinite"}), - &JobContext::default(), - ) - .await - .expect("proxy execution should succeed"); - - assert_eq!(tool.name(), expected_definition.name); - assert_eq!(tool.description(), expected_definition.description); - assert_eq!(tool.parameters_schema(), expected_definition.parameters); - assert_eq!(output.result["tool_name"], "github_search"); - assert_eq!(output.result["job_id"], job_id.to_string()); - assert_eq!(output.result["params"]["query"], "axinite"); - assert_eq!(output.cost, Some(Decimal::new(125, 2))); - assert_eq!(output.raw.as_deref(), Some("proxy raw output")); - assert_eq!(output.duration, std::time::Duration::from_millis(7)); - - server.abort(); - let _ = server.await; - } -} +mod tests; diff --git a/src/tools/builtin/worker_remote_tool_proxy/tests.rs b/src/tools/builtin/worker_remote_tool_proxy/tests.rs new file mode 100644 index 000000000..0e7eb7053 --- /dev/null +++ b/src/tools/builtin/worker_remote_tool_proxy/tests.rs @@ -0,0 +1,284 @@ +//! Tests for the worker-side remote tool proxy. + +use std::sync::Arc; + +use anyhow::Context; +use axum::extract::{Path, State}; +use axum::routing::post; +use axum::{Json, Router}; +use rstest::{fixture, rstest}; +use rust_decimal::Decimal; +use tokio::sync::Mutex; +use uuid::Uuid; + +use super::*; +use crate::worker::api::{ + REMOTE_TOOL_EXECUTE_ROUTE, RemoteToolExecutionRequest, RemoteToolExecutionResponse, +}; + +#[derive(Clone)] +struct TestState; + +#[derive(Clone)] +struct RouteCapturingState { + received_requests: Arc>>, +} + +async fn execute_tool( + State(_state): State, + Path(job_id): Path, + Json(req): Json, +) -> Json { + Json(RemoteToolExecutionResponse { + output: ToolOutput::success( + serde_json::json!({ + "job_id": job_id, + "tool_name": req.tool_name, + "params": req.params, + }), + std::time::Duration::from_millis(7), + ) + .with_cost(Decimal::new(125, 2)) + .with_raw("proxy raw output"), + }) +} + +/// Bundles the in-process execute-route server and a pre-wired HTTP client. +struct ProxyTestServer { + client: Arc, + job_id: Uuid, + server: tokio::task::JoinHandle<()>, +} + +/// Spins up a local Axum server wired to `execute_tool` and returns a +/// `WorkerHttpClient` pointed at it, together with the job id in use. +#[fixture] +async fn proxy_test_server() -> anyhow::Result { + let listener = tokio::net::TcpListener::bind("127.0.0.1:0") + .await + .context("bind listener")?; + let addr = listener.local_addr().context("listener addr")?; + let router = Router::new() + .route(REMOTE_TOOL_EXECUTE_ROUTE, post(execute_tool)) + .with_state(TestState); + let server = tokio::spawn(async move { + let _ = axum::serve(listener, router).await; + }); + let job_id = Uuid::new_v4(); + let client = Arc::new(WorkerHttpClient::new( + format!("http://{}", addr), + job_id, + "test-token".to_string(), + )); + Ok(ProxyTestServer { + client, + job_id, + server, + }) +} + +#[rstest] +#[tokio::test] +async fn remote_tool_execute_round_trips_catalog_tools( + #[future] proxy_test_server: anyhow::Result, +) -> anyhow::Result<()> { + let ProxyTestServer { + client, + job_id, + server, + } = proxy_test_server.await?; + let registry = ToolRegistry::new(); + let expected_definition = ToolDefinition { + name: "github_search".to_string(), + description: "Search repositories".to_string(), + parameters: serde_json::json!({ + "type": "object", + "properties": { + "query": {"type": "string"} + }, + "required": ["query"] + }), + }; + register_worker_remote_tool_proxies(®istry, client, vec![expected_definition.clone()]); + + let tool = registry + .get("github_search") + .await + .context("github_search proxy must be registered")?; + let output = tool + .execute( + serde_json::json!({"query": "axinite"}), + &JobContext::default(), + ) + .await + .context("proxy execution should succeed")?; + + assert_eq!(tool.name(), expected_definition.name); + assert_eq!(tool.description(), expected_definition.description); + assert_eq!(tool.parameters_schema(), expected_definition.parameters); + assert_eq!(output.result["tool_name"], "github_search"); + assert_eq!(output.result["job_id"], job_id.to_string()); + assert_eq!(output.result["params"]["query"], "axinite"); + assert_eq!(output.cost, Some(Decimal::new(125, 2))); + assert_eq!(output.raw.as_deref(), Some("proxy raw output")); + assert_eq!(output.duration, std::time::Duration::from_millis(7)); + + server.abort(); + let _ = server.await; + Ok(()) +} + +#[rstest] +#[tokio::test] +async fn worker_remote_tool_proxy_preserves_full_tool_output_fields( + #[future] proxy_test_server: anyhow::Result, +) -> anyhow::Result<()> { + let ProxyTestServer { + client, + job_id, + server, + } = proxy_test_server.await?; + let proxy = WorkerRemoteToolProxy::new( + ToolDefinition { + name: "output_fidelity_tool".to_string(), + description: "Tests full output preservation".to_string(), + parameters: serde_json::json!({"type": "object", "properties": {}}), + }, + client, + ); + + let output = proxy + .execute(serde_json::json!({"test": "data"}), &JobContext::default()) + .await + .context("proxy execution should succeed")?; + + assert_eq!(output.result["job_id"], job_id.to_string()); + assert_eq!(output.result["tool_name"], "output_fidelity_tool"); + assert_eq!(output.result["params"]["test"], "data"); + assert_eq!( + output.cost, + Some(Decimal::new(125, 2)), + "proxy must preserve cost field" + ); + assert_eq!( + output.raw.as_deref(), + Some("proxy raw output"), + "proxy must preserve raw field" + ); + assert_eq!( + output.duration, + std::time::Duration::from_millis(7), + "proxy must preserve duration field" + ); + + server.abort(); + let _ = server.await; + Ok(()) +} + +#[tokio::test] +async fn worker_remote_tool_proxy_preserves_full_tool_definition_fields() { + let complex_definition = crate::test_support::build_complex_tool_definition( + "complex_proxy_fixture", + "Complex tool for proxy definition fidelity testing", + ); + + let client = Arc::new(WorkerHttpClient::new( + "http://127.0.0.1:0".to_string(), + Uuid::new_v4(), + "test-token".to_string(), + )); + let proxy = WorkerRemoteToolProxy::new(complex_definition.clone(), client); + + let reconstructed = ToolDefinition { + name: proxy.name().to_string(), + description: proxy.description().to_string(), + parameters: proxy.parameters_schema(), + }; + + assert_eq!( + reconstructed, complex_definition, + "proxy-reported fields must reconstruct the original definition exactly" + ); +} + +async fn execute_tool_with_route_capture( + State(state): State, + Path(job_id): Path, + axum::extract::OriginalUri(original_uri): axum::extract::OriginalUri, + Json(req): Json, +) -> Json { + state.received_requests.lock().await.push(( + original_uri.path().to_string(), + job_id, + req.tool_name.clone(), + )); + Json(RemoteToolExecutionResponse { + output: ToolOutput::success( + serde_json::json!({"executed": true}), + std::time::Duration::from_millis(5), + ), + }) +} + +#[tokio::test] +async fn worker_remote_tool_proxy_routes_execution_through_orchestrator_endpoint() +-> anyhow::Result<()> { + let state = RouteCapturingState { + received_requests: Arc::new(Mutex::new(Vec::new())), + }; + + let listener = tokio::net::TcpListener::bind("127.0.0.1:0") + .await + .context("bind listener")?; + let addr = listener.local_addr().context("listener addr")?; + let router = Router::new() + .route( + REMOTE_TOOL_EXECUTE_ROUTE, + post(execute_tool_with_route_capture), + ) + .with_state(state.clone()); + let server = tokio::spawn(async move { + let _ = axum::serve(listener, router).await; + }); + + let job_id = Uuid::new_v4(); + let client = Arc::new(WorkerHttpClient::new( + format!("http://{}", addr), + job_id, + "test-token".to_string(), + )); + let proxy = WorkerRemoteToolProxy::new( + ToolDefinition { + name: "route_test_tool".to_string(), + description: "Tests route path".to_string(), + parameters: serde_json::json!({"type": "object", "properties": {}}), + }, + client, + ); + + proxy + .execute(serde_json::json!({}), &JobContext::default()) + .await + .context("proxy execution should succeed")?; + + let requests = state.received_requests.lock().await; + assert_eq!( + requests.len(), + 1, + "proxy must send exactly one request to the orchestrator" + ); + + let (route_path, received_job_id, tool_name) = &requests[0]; + assert_eq!( + route_path, + &format!("/worker/{}/tools/execute", job_id), + "proxy must route execution through the correct orchestrator endpoint" + ); + assert_eq!(received_job_id, &job_id); + assert_eq!(tool_name, "route_test_tool"); + + server.abort(); + let _ = server.await; + Ok(()) +} diff --git a/src/tools/tool/traits.rs b/src/tools/tool/traits.rs index 3c5ab6ccc..89580253c 100644 --- a/src/tools/tool/traits.rs +++ b/src/tools/tool/traits.rs @@ -42,7 +42,7 @@ pub enum ToolError { } /// Output from a tool execution. -#[derive(Debug, Clone, Serialize, Deserialize)] +#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] pub struct ToolOutput { /// The result data. pub result: serde_json::Value, diff --git a/src/worker/api/tests.rs b/src/worker/api/tests.rs deleted file mode 100644 index 9547edf77..000000000 --- a/src/worker/api/tests.rs +++ /dev/null @@ -1,332 +0,0 @@ -//! Tests for the worker HTTP client and its shared API type conversions. - -use axum::extract::{Path, State}; -use axum::http::StatusCode; -use axum::response::IntoResponse; -use axum::routing::{get, post}; -use axum::{Json, Router}; -use rstest::{fixture, rstest}; -use std::future::Future; -use std::pin::Pin; - -use super::*; -use crate::error::WorkerError; -use crate::llm::FinishReason as LlmFinishReason; -use crate::testing::credentials::TEST_BEARER_TOKEN; -use serde_json::json; -use uuid::Uuid; - -#[rstest] -#[case("llm/complete")] -#[case("credentials")] -fn test_url_construction(#[case] path: &str) { - let client = WorkerHttpClient::new( - "http://host.docker.internal:50051".to_string(), - Uuid::nil(), - TEST_BEARER_TOKEN.to_string(), - ); - - assert_eq!( - client.url(path), - format!( - "http://host.docker.internal:50051/worker/{}/{}", - Uuid::nil(), - path - ) - ); -} - -#[rstest] -#[case(json!("stop"), ProxyFinishReason::Stop)] -#[case(json!("length"), ProxyFinishReason::Length)] -#[case(json!("tool_use"), ProxyFinishReason::ToolUse)] -#[case(json!("tool_calls"), ProxyFinishReason::ToolUse)] -#[case(json!("content_filter"), ProxyFinishReason::ContentFilter)] -#[case(json!("unknown"), ProxyFinishReason::Unknown)] -fn test_proxy_finish_reason_deserialization( - #[case] input: serde_json::Value, - #[case] expected: ProxyFinishReason, -) { - let actual: ProxyFinishReason = serde_json::from_value(input).expect( - "failed to deserialize ProxyFinishReason in test_proxy_finish_reason_deserialization", - ); - assert_eq!(actual, expected); -} - -#[test] -fn test_proxy_finish_reason_unknown_provider_value_falls_back() { - let reason = serde_json::from_value::(json!("made_up_reason")) - .expect("failed to deserialize unknown ProxyFinishReason as fallback"); - assert_eq!(reason, ProxyFinishReason::Unknown); -} - -#[rstest] -#[case(ProxyFinishReason::Stop, LlmFinishReason::Stop)] -#[case(ProxyFinishReason::Length, LlmFinishReason::Length)] -#[case(ProxyFinishReason::ToolUse, LlmFinishReason::ToolUse)] -#[case(ProxyFinishReason::ContentFilter, LlmFinishReason::ContentFilter)] -#[case(ProxyFinishReason::Unknown, LlmFinishReason::Unknown)] -fn test_proxy_finish_reason_conversion( - #[case] input: ProxyFinishReason, - #[case] expected: LlmFinishReason, -) { - assert_eq!(LlmFinishReason::from(input), expected); -} - -#[test] -fn test_job_description_deserialization() { - let json = r#"{"title":"Test","description":"desc","project_dir":null}"#; - let job: JobDescription = serde_json::from_str(json) - .expect("failed to deserialize JobDescription in test_job_description_deserialization"); - assert_eq!(job.title, "Test"); - assert_eq!(job.description, "desc"); - assert!(job.project_dir.is_none()); -} - -#[test] -fn remote_tool_catalog_url_construction() { - let client = WorkerHttpClient::new( - "http://host.docker.internal:50051".to_string(), - Uuid::nil(), - TEST_BEARER_TOKEN.to_string(), - ); - - assert_eq!( - client.url(REMOTE_TOOL_CATALOG_PATH), - format!( - "http://host.docker.internal:50051{}", - REMOTE_TOOL_CATALOG_ROUTE.replace("{job_id}", &Uuid::nil().to_string()) - ) - ); -} - -#[test] -fn test_status_update_new_serializes_worker_state() { - let update = StatusUpdate::new(WorkerState::InProgress, Some("Iteration 1".to_string()), 1); - let value = serde_json::to_value(update).expect( - "failed to serialize StatusUpdate in test_status_update_new_serializes_worker_state", - ); - - assert_eq!(value["state"], "in_progress"); - assert_eq!(value["message"], "Iteration 1"); - assert_eq!(value["iteration"], 1); -} - -#[test] -fn test_status_update_deserializes_worker_state() { - let update: StatusUpdate = serde_json::from_value(json!({ - "state": "failed", - "message": "boom", - "iteration": 7 - })) - .expect("failed to deserialize StatusUpdate in test_status_update_deserializes_worker_state"); - - assert_eq!(update.state, WorkerState::Failed); - assert_eq!(update.message.as_deref(), Some("boom")); - assert_eq!(update.iteration, 7); -} - -async fn reject_catalog( - State(_state): State, - Path(_job_id): Path, -) -> (axum::http::StatusCode, &'static str) { - (axum::http::StatusCode::FORBIDDEN, "nope") -} - -#[rstest] -#[tokio::test] -async fn remote_tool_catalog_reports_non_success_statuses( - remote_tool_failure_server: RemoteToolFailureServerFactory, -) { - let server = remote_tool_failure_server(RemoteToolFailureRoute::Catalog).await; - - let client = WorkerHttpClient::new( - server.base_url, - Uuid::new_v4(), - TEST_BEARER_TOKEN.to_string(), - ); - - let err = client - .get_remote_tool_catalog() - .await - .expect_err("catalog fetch should fail on non-success status"); - - match err { - WorkerError::OrchestratorRejected { reason, .. } => { - assert!(reason.contains("GET /tools/catalog returned 403")); - } - other => panic!("unexpected worker error: {other:?}"), - }; - - server.handle.abort(); - let _ = server.handle.await; -} - -#[derive(Clone, Copy)] -struct TestState; - -async fn reject_execute( - State(_state): State, - Path(_job_id): Path, - Json(_req): Json, -) -> (StatusCode, &'static str) { - (StatusCode::BAD_REQUEST, "bad params") -} - -async fn reject_execute_forbidden( - State(_state): State, - Path(_job_id): Path, - Json(_req): Json, -) -> (StatusCode, &'static str) { - (StatusCode::FORBIDDEN, "approval required") -} - -#[rstest] -#[case(RemoteToolFailureRoute::ExecuteBadRequest, "bad params")] -#[case(RemoteToolFailureRoute::ExecuteForbidden, "approval required")] -#[case(RemoteToolFailureRoute::ExecuteRateLimited, "slow down")] -#[case(RemoteToolFailureRoute::ExecuteBadGateway, "proxy failure")] -#[case(RemoteToolFailureRoute::ExecuteInternalError, "remote tool blew up")] -#[tokio::test] -async fn remote_tool_execute_preserves_non_success_statuses( - remote_tool_failure_server: RemoteToolFailureServerFactory, - #[case] route: RemoteToolFailureRoute, - #[case] expected_message: &str, -) { - let server = remote_tool_failure_server(route).await; - - let client = WorkerHttpClient::new( - server.base_url, - Uuid::new_v4(), - TEST_BEARER_TOKEN.to_string(), - ); - - let err = client - .execute_remote_tool("github_search", &serde_json::json!({"query": 7})) - .await - .expect_err("remote-tool execute should fail on non-success status"); - - match (route, err) { - (RemoteToolFailureRoute::ExecuteBadRequest, WorkerError::BadRequest { reason }) => { - assert!(reason.contains(expected_message)) - } - (RemoteToolFailureRoute::ExecuteForbidden, WorkerError::Unauthorized { reason }) => { - assert!(reason.contains(expected_message)) - } - ( - RemoteToolFailureRoute::ExecuteRateLimited, - WorkerError::RateLimited { - reason, - retry_after: Some(retry_after), - }, - ) => { - assert!(reason.contains(expected_message)); - assert_eq!(retry_after, std::time::Duration::from_secs(7)); - } - (RemoteToolFailureRoute::ExecuteBadGateway, WorkerError::BadGateway { reason }) => { - assert!(reason.contains(expected_message)) - } - ( - RemoteToolFailureRoute::ExecuteInternalError, - WorkerError::RemoteToolFailed { reason }, - ) => { - assert!(reason.contains(expected_message)) - } - (_, other) => panic!("unexpected worker error: {other:?}"), - } - - server.handle.abort(); - let _ = server.handle.await; -} - -type RemoteToolFailureServerFuture = Pin + Send>>; - -struct RemoteToolFailureServer { - base_url: String, - handle: tokio::task::JoinHandle<()>, -} - -#[fixture] -fn remote_tool_failure_server() -> RemoteToolFailureServerFactory { - Box::new(|route| { - Box::pin(async move { - let listener = tokio::net::TcpListener::bind("127.0.0.1:0") - .await - .expect("bind listener"); - let addr = listener.local_addr().expect("listener addr"); - let router = match route { - RemoteToolFailureRoute::Catalog => Router::new() - .route(REMOTE_TOOL_CATALOG_ROUTE, get(reject_catalog)) - .with_state(TestState), - RemoteToolFailureRoute::ExecuteBadRequest => Router::new() - .route(REMOTE_TOOL_EXECUTE_ROUTE, post(reject_execute)) - .with_state(TestState), - RemoteToolFailureRoute::ExecuteForbidden => Router::new() - .route(REMOTE_TOOL_EXECUTE_ROUTE, post(reject_execute_forbidden)) - .with_state(TestState), - RemoteToolFailureRoute::ExecuteRateLimited => Router::new() - .route(REMOTE_TOOL_EXECUTE_ROUTE, post(reject_execute_rate_limited)) - .with_state(TestState), - RemoteToolFailureRoute::ExecuteBadGateway => Router::new() - .route(REMOTE_TOOL_EXECUTE_ROUTE, post(reject_execute_bad_gateway)) - .with_state(TestState), - RemoteToolFailureRoute::ExecuteInternalError => Router::new() - .route( - REMOTE_TOOL_EXECUTE_ROUTE, - post(reject_execute_internal_error), - ) - .with_state(TestState), - }; - let handle = tokio::spawn(async move { - axum::serve(listener, router).await.expect("serve router"); - }); - - RemoteToolFailureServer { - base_url: format!("http://{}", addr), - handle, - } - }) - }) -} - -#[derive(Clone, Copy)] -enum RemoteToolFailureRoute { - Catalog, - ExecuteBadRequest, - ExecuteForbidden, - ExecuteRateLimited, - ExecuteBadGateway, - ExecuteInternalError, -} - -async fn reject_execute_bad_gateway( - State(_state): State, - Path(_job_id): Path, - Json(_req): Json, -) -> (StatusCode, &'static str) { - (StatusCode::BAD_GATEWAY, "proxy failure") -} - -async fn reject_execute_internal_error( - State(_state): State, - Path(_job_id): Path, - Json(_req): Json, -) -> (StatusCode, &'static str) { - (StatusCode::INTERNAL_SERVER_ERROR, "remote tool blew up") -} - -type RemoteToolFailureServerFactory = - Box RemoteToolFailureServerFuture + Send + Sync>; - -async fn reject_execute_rate_limited( - State(_state): State, - Path(_job_id): Path, - Json(_req): Json, -) -> axum::response::Response { - ( - StatusCode::TOO_MANY_REQUESTS, - [("retry-after", "7")], - "slow down", - ) - .into_response() -} diff --git a/src/worker/api/tests/finish_reason.rs b/src/worker/api/tests/finish_reason.rs new file mode 100644 index 000000000..6152f825a --- /dev/null +++ b/src/worker/api/tests/finish_reason.rs @@ -0,0 +1,81 @@ +//! Finish reason deserialisation and conversion tests. + +use rstest::rstest; +use serde_json::json; + +use crate::llm::FinishReason as LlmFinishReason; +use crate::worker::api::types::FinishReason as ProxyFinishReason; + +#[rstest] +#[case(json!("stop"), ProxyFinishReason::Stop)] +#[case(json!("length"), ProxyFinishReason::Length)] +#[case(json!("tool_use"), ProxyFinishReason::ToolUse)] +#[case(json!("tool_calls"), ProxyFinishReason::ToolUse)] +#[case(json!("content_filter"), ProxyFinishReason::ContentFilter)] +#[case(json!("unknown"), ProxyFinishReason::Unknown)] +fn test_proxy_finish_reason_deserialization( + #[case] input: serde_json::Value, + #[case] expected: ProxyFinishReason, +) { + let actual: ProxyFinishReason = serde_json::from_value(input).expect( + "failed to deserialize ProxyFinishReason in test_proxy_finish_reason_deserialization", + ); + assert_eq!(actual, expected); +} + +#[test] +fn test_proxy_finish_reason_unknown_provider_value_falls_back() { + let reason = serde_json::from_value::(json!("made_up_reason")) + .expect("failed to deserialize unknown ProxyFinishReason as fallback"); + assert_eq!(reason, ProxyFinishReason::Unknown); +} + +#[rstest] +#[case(ProxyFinishReason::Stop, LlmFinishReason::Stop)] +#[case(ProxyFinishReason::Length, LlmFinishReason::Length)] +#[case(ProxyFinishReason::ToolUse, LlmFinishReason::ToolUse)] +#[case(ProxyFinishReason::ContentFilter, LlmFinishReason::ContentFilter)] +#[case(ProxyFinishReason::Unknown, LlmFinishReason::Unknown)] +fn test_proxy_finish_reason_conversion( + #[case] input: ProxyFinishReason, + #[case] expected: LlmFinishReason, +) { + assert_eq!(LlmFinishReason::from(input), expected); +} + +#[test] +fn test_job_description_deserialization() { + let json = r#"{"title":"Test","description":"desc","project_dir":null}"#; + let job: crate::worker::api::JobDescription = serde_json::from_str(json) + .expect("failed to deserialize JobDescription in test_job_description_deserialization"); + assert_eq!(job.title, "Test"); + assert_eq!(job.description, "desc"); + assert!(job.project_dir.is_none()); +} + +#[test] +fn test_status_update_new_serializes_worker_state() { + use crate::worker::api::{StatusUpdate, WorkerState}; + + let update = StatusUpdate::new(WorkerState::Running, None, 0); + let json = serde_json::to_string(&update).expect("failed to serialize StatusUpdate"); + let value: serde_json::Value = + serde_json::from_str(&json).expect("failed to parse serialized StatusUpdate"); + assert_eq!(value["state"], "running"); +} + +#[test] +fn test_status_update_deserializes_worker_state() { + use crate::worker::api::{StatusUpdate, WorkerState}; + + let json = r#"{"state":"completed","message":"done","iteration":5}"#; + let update: StatusUpdate = + serde_json::from_str(json).expect("failed to deserialize StatusUpdate"); + assert_eq!( + update.state, + WorkerState::Completed, + "\"state\":\"completed\" must map to WorkerState::Completed" + ); + assert_eq!(update.message, Some("done".to_string())); + assert_eq!(update.iteration, 5); +} diff --git a/src/worker/api/tests/fixtures.rs b/src/worker/api/tests/fixtures.rs new file mode 100644 index 000000000..7ac47196f --- /dev/null +++ b/src/worker/api/tests/fixtures.rs @@ -0,0 +1,206 @@ +//! Shared test fixtures for worker API tests. + +use axum::extract::{Path, State}; +use axum::http::StatusCode; +use axum::response::IntoResponse; +use axum::routing::{get, post}; +use axum::{Json, Router}; +use rstest::fixture; +use serde_json::json; +use std::future::Future; +use std::pin::Pin; +use uuid::Uuid; + +use crate::worker::api::{ + REMOTE_TOOL_CATALOG_ROUTE, REMOTE_TOOL_EXECUTE_ROUTE, RemoteToolCatalogResponse, + RemoteToolExecutionRequest, +}; + +/// Unit type used as shared Axum state for failure-mode test routers. +/// +/// Implements `Clone` and `Copy` so it can be cheaply shared across +/// handler extractors without allocation. +#[derive(Clone, Copy)] +pub struct TestState; + +/// Selects which remote-tool API route the failure server should reject. +/// +/// Each variant maps to a distinct HTTP status code returned by the mock +/// server, letting tests exercise individual error-handling paths in the +/// remote-tool client. +#[derive(Clone, Copy)] +pub enum RemoteToolFailureRoute { + /// Returns `403 Forbidden` on the catalog endpoint. + Catalog, + /// Returns `400 Bad Request` on the execute endpoint. + ExecuteBadRequest, + /// Returns `403 Forbidden` on the execute endpoint. + ExecuteForbidden, + /// Returns `429 Too Many Requests` (with `Retry-After`) on execute. + ExecuteRateLimited, + /// Returns `502 Bad Gateway` on the execute endpoint. + ExecuteBadGateway, + /// Returns `500 Internal Server Error` on the execute endpoint. + ExecuteInternalError, +} + +/// A pinned, boxed future that resolves to a [`RemoteToolFailureServer`] or +/// an error. Returned by the factory closure so that server creation can be +/// performed asynchronously. +pub type RemoteToolFailureServerFuture = + Pin> + Send>>; + +/// A running mock server that rejects requests on a chosen route. +pub struct RemoteToolFailureServer { + /// The `http://host:port` base URL of the listening server. + pub base_url: String, + /// Join handle for the background Tokio task serving requests. + pub handle: tokio::task::JoinHandle<()>, +} + +impl Drop for RemoteToolFailureServer { + fn drop(&mut self) { + self.handle.abort(); + } +} + +/// Thread-safe factory that spawns a [`RemoteToolFailureServer`] configured +/// to fail on the given [`RemoteToolFailureRoute`]. +pub type RemoteToolFailureServerFactory = + Box RemoteToolFailureServerFuture + Send + Sync>; + +/// Returns a factory closure that spawns a failure-mode mock server. +/// +/// Call the returned closure with a [`RemoteToolFailureRoute`] to start a +/// server bound to an ephemeral port that rejects requests on the chosen +/// route with an appropriate HTTP error status. +#[fixture] +pub fn remote_tool_failure_server() -> RemoteToolFailureServerFactory { + Box::new(|route| { + Box::pin(async move { + let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await?; + let addr = listener.local_addr()?; + let router = match route { + RemoteToolFailureRoute::Catalog => Router::new() + .route(REMOTE_TOOL_CATALOG_ROUTE, get(reject_catalog)) + .with_state(TestState), + RemoteToolFailureRoute::ExecuteBadRequest => Router::new() + .route(REMOTE_TOOL_EXECUTE_ROUTE, post(reject_execute)) + .with_state(TestState), + RemoteToolFailureRoute::ExecuteForbidden => Router::new() + .route(REMOTE_TOOL_EXECUTE_ROUTE, post(reject_execute_forbidden)) + .with_state(TestState), + RemoteToolFailureRoute::ExecuteRateLimited => Router::new() + .route(REMOTE_TOOL_EXECUTE_ROUTE, post(reject_execute_rate_limited)) + .with_state(TestState), + RemoteToolFailureRoute::ExecuteBadGateway => Router::new() + .route(REMOTE_TOOL_EXECUTE_ROUTE, post(reject_execute_bad_gateway)) + .with_state(TestState), + RemoteToolFailureRoute::ExecuteInternalError => Router::new() + .route( + REMOTE_TOOL_EXECUTE_ROUTE, + post(reject_execute_internal_error), + ) + .with_state(TestState), + }; + let handle = tokio::spawn(async move { + let _ = axum::serve(listener, router).await; + }); + + Ok(RemoteToolFailureServer { + base_url: format!("http://{}", addr), + handle, + }) + }) + }) +} + +#[fixture] +pub fn sample_catalog_response() -> RemoteToolCatalogResponse { + RemoteToolCatalogResponse { + tools: vec![crate::test_support::build_complex_tool_definition( + "test_tool", + "A **complex** test tool with UTF-8: \u{1F680}\u{1F4A1}.", + )], + toolset_instructions: vec![ + "Prefer remote tools for external systems.".to_string(), + "Use local tools for filesystem operations.".to_string(), + ], + catalog_version: 42, + } +} + +#[fixture] +pub fn sample_execution_request() -> RemoteToolExecutionRequest { + RemoteToolExecutionRequest { + tool_name: "complex_tool".to_string(), + params: json!({ + "query": "test query", + "options": {"limit": 25} + }), + } +} + +#[fixture] +pub fn sample_execution_response() -> crate::worker::api::RemoteToolExecutionResponse { + crate::worker::api::RemoteToolExecutionResponse { + output: crate::tools::ToolOutput::success( + json!({"result": "success", "data": [1, 2, 3]}), + std::time::Duration::from_millis(42), + ) + .with_cost(rust_decimal::Decimal::new(150, 2)) + .with_raw("raw execution output"), + } +} + +async fn reject_catalog( + State(_state): State, + Path(_job_id): Path, +) -> (StatusCode, &'static str) { + (StatusCode::FORBIDDEN, "nope") +} + +async fn reject_execute( + State(_state): State, + Path(_job_id): Path, + Json(_req): Json, +) -> (StatusCode, &'static str) { + (StatusCode::BAD_REQUEST, "bad params") +} + +async fn reject_execute_forbidden( + State(_state): State, + Path(_job_id): Path, + Json(_req): Json, +) -> (StatusCode, &'static str) { + (StatusCode::FORBIDDEN, "approval required") +} + +async fn reject_execute_rate_limited( + State(_state): State, + Path(_job_id): Path, + Json(_req): Json, +) -> axum::response::Response { + ( + StatusCode::TOO_MANY_REQUESTS, + [("retry-after", "7")], + "slow down", + ) + .into_response() +} + +async fn reject_execute_bad_gateway( + State(_state): State, + Path(_job_id): Path, + Json(_req): Json, +) -> (StatusCode, &'static str) { + (StatusCode::BAD_GATEWAY, "proxy failure") +} + +async fn reject_execute_internal_error( + State(_state): State, + Path(_job_id): Path, + Json(_req): Json, +) -> (StatusCode, &'static str) { + (StatusCode::INTERNAL_SERVER_ERROR, "remote tool blew up") +} diff --git a/src/worker/api/tests/mod.rs b/src/worker/api/tests/mod.rs new file mode 100644 index 000000000..93562ddb2 --- /dev/null +++ b/src/worker/api/tests/mod.rs @@ -0,0 +1,8 @@ +//! Tests for the worker HTTP client and its shared API type conversions. + +mod finish_reason; +mod fixtures; +mod remote_tool_catalog; +mod remote_tool_execute; +mod transport_types; +mod url_construction; diff --git a/src/worker/api/tests/remote_tool_catalog.rs b/src/worker/api/tests/remote_tool_catalog.rs new file mode 100644 index 000000000..bdcf49dea --- /dev/null +++ b/src/worker/api/tests/remote_tool_catalog.rs @@ -0,0 +1,40 @@ +//! Remote tool catalog fetching tests. + +use rstest::rstest; +use uuid::Uuid; + +use crate::error::WorkerError; +use crate::testing::credentials::TEST_BEARER_TOKEN; +use crate::worker::api::WorkerHttpClient; + +use super::fixtures::{ + RemoteToolFailureRoute, RemoteToolFailureServerFactory, remote_tool_failure_server, +}; + +#[rstest] +#[tokio::test] +async fn remote_tool_catalog_reports_non_success_statuses( + remote_tool_failure_server: RemoteToolFailureServerFactory, +) -> anyhow::Result<()> { + let server = remote_tool_failure_server(RemoteToolFailureRoute::Catalog).await?; + + let client = WorkerHttpClient::new( + server.base_url.clone(), + Uuid::new_v4(), + TEST_BEARER_TOKEN.to_string(), + ); + + let err = client + .get_remote_tool_catalog() + .await + .expect_err("catalog fetch should fail on non-success status"); + + match err { + WorkerError::OrchestratorRejected { reason, .. } => { + assert!(reason.contains("GET /tools/catalog returned 403")); + } + other => panic!("unexpected worker error: {other:?}"), + }; + + Ok(()) +} diff --git a/src/worker/api/tests/remote_tool_execute.rs b/src/worker/api/tests/remote_tool_execute.rs new file mode 100644 index 000000000..e0a021941 --- /dev/null +++ b/src/worker/api/tests/remote_tool_execute.rs @@ -0,0 +1,69 @@ +//! Remote tool execution tests. + +use rstest::rstest; +use uuid::Uuid; + +use crate::error::WorkerError; +use crate::testing::credentials::TEST_BEARER_TOKEN; +use crate::worker::api::WorkerHttpClient; + +use super::fixtures::{ + RemoteToolFailureRoute, RemoteToolFailureServerFactory, remote_tool_failure_server, +}; + +#[rstest] +#[case(RemoteToolFailureRoute::ExecuteBadRequest, "bad params")] +#[case(RemoteToolFailureRoute::ExecuteForbidden, "approval required")] +#[case(RemoteToolFailureRoute::ExecuteRateLimited, "slow down")] +#[case(RemoteToolFailureRoute::ExecuteBadGateway, "proxy failure")] +#[case(RemoteToolFailureRoute::ExecuteInternalError, "remote tool blew up")] +#[tokio::test] +async fn remote_tool_execute_preserves_non_success_statuses( + remote_tool_failure_server: RemoteToolFailureServerFactory, + #[case] route: RemoteToolFailureRoute, + #[case] expected_message: &str, +) -> anyhow::Result<()> { + let server = remote_tool_failure_server(route).await?; + + let client = WorkerHttpClient::new( + server.base_url.clone(), + Uuid::new_v4(), + TEST_BEARER_TOKEN.to_string(), + ); + + let err = client + .execute_remote_tool("github_search", &serde_json::json!({"query": 7})) + .await + .expect_err("remote-tool execute should fail on non-success status"); + + match (route, err) { + (RemoteToolFailureRoute::ExecuteBadRequest, WorkerError::BadRequest { reason }) => { + assert!(reason.contains(expected_message)) + } + (RemoteToolFailureRoute::ExecuteForbidden, WorkerError::Unauthorized { reason }) => { + assert!(reason.contains(expected_message)) + } + ( + RemoteToolFailureRoute::ExecuteRateLimited, + WorkerError::RateLimited { + reason, + retry_after: Some(retry_after), + }, + ) => { + assert!(reason.contains(expected_message)); + assert_eq!(retry_after, std::time::Duration::from_secs(7)); + } + (RemoteToolFailureRoute::ExecuteBadGateway, WorkerError::BadGateway { reason }) => { + assert!(reason.contains(expected_message)) + } + ( + RemoteToolFailureRoute::ExecuteInternalError, + WorkerError::RemoteToolFailed { reason }, + ) => { + assert!(reason.contains(expected_message)) + } + (_, other) => panic!("unexpected worker error: {other:?}"), + } + + Ok(()) +} diff --git a/src/worker/api/tests/transport_types.rs b/src/worker/api/tests/transport_types.rs new file mode 100644 index 000000000..2de5d2519 --- /dev/null +++ b/src/worker/api/tests/transport_types.rs @@ -0,0 +1,107 @@ +//! Transport type serialisation fidelity tests. + +use rstest::rstest; + +use crate::worker::api::{ + REMOTE_TOOL_CATALOG_ROUTE, REMOTE_TOOL_EXECUTE_ROUTE, RemoteToolCatalogResponse, + RemoteToolExecutionRequest, RemoteToolExecutionResponse, +}; + +use super::fixtures::{ + sample_catalog_response, sample_execution_request, sample_execution_response, +}; + +#[test] +fn worker_and_orchestrator_share_remote_tool_route_constants() { + // These constants are declared in worker::api::types but are shared with the + // orchestrator. The orchestrator imports them from worker::api to ensure both + // sides use identical route paths for hosted remote-tool operations. + + // Verify the raw route constants contain the expected placeholder and path + // segments so route parity holds by construction, not just by expansion. + assert!( + REMOTE_TOOL_CATALOG_ROUTE.contains("{job_id}"), + "catalog route constant must contain the {{job_id}} placeholder" + ); + assert!( + REMOTE_TOOL_CATALOG_ROUTE.contains("/worker/"), + "catalog route constant must include the /worker/ prefix" + ); + assert!( + REMOTE_TOOL_CATALOG_ROUTE.contains("/tools/catalog"), + "catalog route constant must include the /tools/catalog segment" + ); + assert!( + REMOTE_TOOL_EXECUTE_ROUTE.contains("{job_id}"), + "execute route constant must contain the {{job_id}} placeholder" + ); + assert!( + REMOTE_TOOL_EXECUTE_ROUTE.contains("/worker/"), + "execute route constant must include the /worker/ prefix" + ); + assert!( + REMOTE_TOOL_EXECUTE_ROUTE.contains("/tools/execute"), + "execute route constant must include the /tools/execute segment" + ); + + let job_id = "12345678-1234-1234-1234-123456789012"; + + let catalog_route = REMOTE_TOOL_CATALOG_ROUTE.replace("{job_id}", job_id); + let execute_route = REMOTE_TOOL_EXECUTE_ROUTE.replace("{job_id}", job_id); + + assert_eq!( + catalog_route, + format!("/worker/{}/tools/catalog", job_id), + "catalog route must expand job_id parameter correctly" + ); + assert_eq!( + execute_route, + format!("/worker/{}/tools/execute", job_id), + "execute route must expand job_id parameter correctly" + ); +} + +#[rstest] +fn remote_tool_catalog_response_round_trip_without_field_loss( + sample_catalog_response: RemoteToolCatalogResponse, +) { + let serialized = serde_json::to_string(&sample_catalog_response) + .expect("serialize RemoteToolCatalogResponse"); + let deserialized: RemoteToolCatalogResponse = + serde_json::from_str(&serialized).expect("deserialize RemoteToolCatalogResponse"); + + assert_eq!( + deserialized, sample_catalog_response, + "catalog response must round-trip without field loss" + ); +} + +#[rstest] +fn remote_tool_execution_request_round_trip_without_field_loss( + sample_execution_request: RemoteToolExecutionRequest, +) { + let serialized = serde_json::to_string(&sample_execution_request) + .expect("serialize RemoteToolExecutionRequest"); + let deserialized: RemoteToolExecutionRequest = + serde_json::from_str(&serialized).expect("deserialize RemoteToolExecutionRequest"); + + assert_eq!( + deserialized, sample_execution_request, + "execution request must round-trip without field loss" + ); +} + +#[rstest] +fn remote_tool_execution_response_round_trip_without_field_loss( + sample_execution_response: RemoteToolExecutionResponse, +) { + let serialized = serde_json::to_string(&sample_execution_response) + .expect("serialize RemoteToolExecutionResponse"); + let deserialized: RemoteToolExecutionResponse = + serde_json::from_str(&serialized).expect("deserialize RemoteToolExecutionResponse"); + + assert_eq!( + deserialized, sample_execution_response, + "execution response must round-trip without field loss" + ); +} diff --git a/src/worker/api/tests/url_construction.rs b/src/worker/api/tests/url_construction.rs new file mode 100644 index 000000000..57a37c701 --- /dev/null +++ b/src/worker/api/tests/url_construction.rs @@ -0,0 +1,65 @@ +//! URL construction and routing tests. + +use rstest::rstest; +use uuid::Uuid; + +use crate::testing::credentials::TEST_BEARER_TOKEN; +use crate::worker::api::{REMOTE_TOOL_CATALOG_PATH, REMOTE_TOOL_CATALOG_ROUTE, WorkerHttpClient}; + +#[rstest] +#[case("llm/complete")] +#[case("credentials")] +fn test_url_construction(#[case] path: &str) { + let client = WorkerHttpClient::new( + "http://host.docker.internal:50051".to_string(), + Uuid::nil(), + TEST_BEARER_TOKEN.to_string(), + ); + + assert_eq!( + client.url(path), + format!( + "http://host.docker.internal:50051/worker/{}/{}", + Uuid::nil(), + path + ) + ); +} + +#[rstest] +#[case("llm/complete")] +#[case("credentials")] +fn test_url_construction_with_trailing_slash(#[case] path: &str) { + let client = WorkerHttpClient::new( + "http://host.docker.internal:50051/".to_string(), + Uuid::nil(), + TEST_BEARER_TOKEN.to_string(), + ); + + assert_eq!( + client.url(path), + format!( + "http://host.docker.internal:50051/worker/{}/{}", + Uuid::nil(), + path + ), + "trailing slash in base URL must be normalised away" + ); +} + +#[test] +fn remote_tool_catalog_url_construction() { + let client = WorkerHttpClient::new( + "http://host.docker.internal:50051".to_string(), + Uuid::nil(), + TEST_BEARER_TOKEN.to_string(), + ); + + assert_eq!( + client.url(REMOTE_TOOL_CATALOG_PATH), + format!( + "http://host.docker.internal:50051{}", + REMOTE_TOOL_CATALOG_ROUTE.replace("{job_id}", &Uuid::nil().to_string()) + ) + ); +} diff --git a/src/worker/api/types.rs b/src/worker/api/types.rs index e01188c4f..b5f8dced6 100644 --- a/src/worker/api/types.rs +++ b/src/worker/api/types.rs @@ -188,7 +188,7 @@ pub struct ProxyToolCompletionResponse { /// /// `tool_name` is the orchestrator tool identifier. `params` must match that /// tool's JSON Schema because the orchestrator validates and executes the call. -#[derive(Debug, Serialize, Deserialize)] +#[derive(Debug, Serialize, Deserialize, PartialEq)] pub struct RemoteToolExecutionRequest { /// Stable hosted remote-tool identifier known to both worker and orchestrator. pub tool_name: String, @@ -200,7 +200,7 @@ pub struct RemoteToolExecutionRequest { /// /// `output` is the tool's `ToolOutput`, including its result payload and /// reported side-effect metadata such as duration and optional cost. -#[derive(Debug, Serialize, Deserialize)] +#[derive(Debug, Serialize, Deserialize, PartialEq)] pub struct RemoteToolExecutionResponse { /// Tool execution output returned by the orchestrator. pub output: ToolOutput, @@ -212,7 +212,7 @@ pub struct RemoteToolExecutionResponse { /// optional human-readable guidance and defaults to an empty list. /// `catalog_version` is a deterministic content version derived from the /// serialized catalogue payload. -#[derive(Debug, Serialize, Deserialize)] +#[derive(Debug, Serialize, Deserialize, PartialEq)] pub struct RemoteToolCatalogResponse { pub tools: Vec, #[serde(default)] diff --git a/src/worker/container/tests/hosted_fidelity.rs b/src/worker/container/tests/hosted_fidelity.rs new file mode 100644 index 000000000..9092c692b --- /dev/null +++ b/src/worker/container/tests/hosted_fidelity.rs @@ -0,0 +1,106 @@ +//! End-to-end fidelity tests for hosted remote-tool proxy definitions. +//! +//! Verifies that a complex `ToolDefinition` round-trips through the +//! orchestrator catalogue endpoint and the worker-side proxy without +//! field loss or transformation. + +use std::sync::Arc; + +use axum::Json; +use axum::extract::{Path, State}; +use rstest::{fixture, rstest}; +use uuid::Uuid; + +use crate::llm::ToolDefinition; +use crate::worker::api::{RemoteToolCatalogResponse, WorkerHttpClient}; +use crate::worker::container::{WorkerConfig, WorkerRuntime}; + +use super::remote_tools::{TestState, spawn_test_server}; + +/// Test harness containing a [`WorkerRuntime`] configured with remote tools +/// and the server join handle for shutdown. +pub struct HostedCatalogHarness { + /// The worker runtime with remote tools registered. + pub runtime: WorkerRuntime, + /// Join handle for the background test server. + pub server: tokio::task::JoinHandle<()>, +} + +fn complex_orchestrator_tool_definition() -> ToolDefinition { + crate::test_support::build_complex_tool_definition( + "complex_fidelity_fixture", + concat!( + "A **complex** tool for end-to-end fidelity testing. ", + "Handles UTF-8: \u{1F680}\u{1F4A1}. ", + "Supports `inline code` and [markdown](https://example.com). ", + "Special chars: <>&\"'{}[]()." + ), + ) +} + +async fn remote_tool_catalog_with_complex_tool( + State(_state): State, + Path(_job_id): Path, +) -> Json { + Json(RemoteToolCatalogResponse { + tools: vec![complex_orchestrator_tool_definition()], + toolset_instructions: vec![], + catalog_version: 99, + }) +} + +/// Creates a [`HostedCatalogHarness`] with a test server serving the complex +/// tool catalog. +#[fixture] +async fn hosted_catalog_harness() -> Result> { + let (base_url, server) = spawn_test_server(remote_tool_catalog_with_complex_tool).await?; + + let client = Arc::new(WorkerHttpClient::new( + base_url.clone(), + Uuid::nil(), + "test".to_string(), + )); + let runtime = WorkerRuntime::from_client( + WorkerConfig { + job_id: Uuid::nil(), + orchestrator_url: base_url, + ..WorkerConfig::default() + }, + client, + ); + + Ok(HostedCatalogHarness { runtime, server }) +} + +#[rstest] +#[tokio::test] +async fn hosted_worker_proxy_definition_matches_orchestrator_canonical_definition( + #[future] hosted_catalog_harness: Result>, +) -> Result<(), Box> { + let HostedCatalogHarness { runtime, server } = hosted_catalog_harness.await?; + + runtime.register_remote_tools().await?; + + let proxy_tool = runtime + .tools + .get("complex_fidelity_fixture") + .await + .expect("complex tool proxy should be registered"); + + let proxy_definition = ToolDefinition { + name: proxy_tool.name().to_string(), + description: proxy_tool.description().to_string(), + parameters: proxy_tool.parameters_schema(), + }; + + let expected = complex_orchestrator_tool_definition(); + + assert_eq!( + proxy_definition, expected, + "worker-advertised proxy definition must match orchestrator canonical definition exactly" + ); + + server.abort(); + let _ = server.await; + Ok(()) +} diff --git a/src/worker/container/tests/mod.rs b/src/worker/container/tests/mod.rs index 2320d7b40..eb2bd77f6 100644 --- a/src/worker/container/tests/mod.rs +++ b/src/worker/container/tests/mod.rs @@ -4,6 +4,7 @@ use rstest::rstest; use super::*; +mod hosted_fidelity; mod pre_loop; mod remote_tools; pub mod test_support; diff --git a/src/worker/container/tests/remote_tools.rs b/src/worker/container/tests/remote_tools.rs index 101d7459e..7e2b98a32 100644 --- a/src/worker/container/tests/remote_tools.rs +++ b/src/worker/container/tests/remote_tools.rs @@ -70,10 +70,16 @@ async fn remote_tool_catalog_error( ) } +/// Minimal shared Axum state for test catalog routers. #[derive(Clone)] -struct TestState; - -async fn spawn_test_server( +pub(super) struct TestState; + +/// Spawns an ephemeral Axum server that serves the remote-tool catalog route. +/// +/// `H` is any Axum handler compatible with [`TestState`], and `T` is its +/// extractor tuple. Returns the `http://host:port` base URL and a join +/// handle for the background server task. +pub(super) async fn spawn_test_server( handler: H, ) -> Result<(String, tokio::task::JoinHandle<()>), Box> where