Skip to content

Complete 1.1.4 hosted-mode tests: fidelity, routing, parity (refactor)#57

Merged
leynos merged 20 commits into
mainfrom
add-hosted-mcp-schema-tests-ha7qtg
Mar 27, 2026
Merged

Complete 1.1.4 hosted-mode tests: fidelity, routing, parity (refactor)#57
leynos merged 20 commits into
mainfrom
add-hosted-mcp-schema-tests-ha7qtg

Conversation

@leynos
Copy link
Copy Markdown
Owner

@leynos leynos commented Mar 23, 2026

Summary

  • Extends the 1.1.4 hosted-mode test matrix with a comprehensive, modular suite covering schema fidelity, execution routing, and worker-orchestrator parity across orchestrator, worker, and proxy layers. ExecPlan 1.1.4 is now reflected as COMPLETE in the canonical planning artifact.
  • Introduces a canonical ExecPlan document for roadmap item 1.1.4 and updates related planning/artifacts to reflect completion.
  • Refactors test organization: replaces the monolithic worker API tests module with a structured set of new test modules under src/* to improve maintainability and reuse fixtures.
  • Adds repository lock for test scaffolding: .agents/mcp/context_pack/packs/.repo.lock.
  • Updates documentation indices and planning artifacts to reflect the completed ExecPlan and expanded test coverage.
  • Extends test coverage across orchestrator, worker, and proxy, including transport-type round-trips, route-path verification, and parity checks.
  • Adds fixtures and support code for complex ToolDefinitions (nested JSON Schemas and UTF-8 content).
  • Updates roadmap and RFC documentation to reflect progress and completion.
  • Updates and expands test modules to integrate new tests and fixtures:
    • src/orchestrator/api/tests/catalogue_fidelity.rs (new)
    • src/orchestrator/api/tests/transport_parity.rs (new)
    • src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs (updated)
    • src/orchestrator/api/tests/remote_tools.rs (updated/augmented via new tests)
    • src/worker/container/tests/hosted_fidelity.rs (new)
    • src/worker/api/tests/mod.rs (new)
    • src/worker/api/tests/remote_tool_catalog.rs (new)
    • src/worker/api/tests/remote_tool_execute.rs (new)
    • src/worker/api/tests/transport_types.rs (new)
    • src/worker/api/tests/url_construction.rs (new)
    • src/worker/api/tests/finish_reason.rs (new)
    • src/worker/api/tests/fixtures.rs (new)
    • src/worker/api/tests/fixtures.ts (note: fixtures centralized under new module)
    • src/test_support.rs (new)
    • src/tools/builtin/worker_remote_tool_proxy.rs (refactored tests module)
    • src/tools/builtin/worker_remote_tool_proxy/tests.rs (new)
    • src/worker/api/tests/remote_tool_catalog.rs (new)
    • src/worker/api/tests/remote_tool_execute.rs (new)
  • Removed monolithic runtime tests file: src/worker/api/tests.rs (refactored into modular structure).

Changes

  • New: docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md (ExecPlan status updated to COMPLETE).
  • New: .agents/mcp/context_pack/packs/.repo.lock (repository lock file for test scaffolding).
  • Updated: docs/contents.md
    • Added entry for the new ExecPlan at execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md.
  • Implemented: extensive test suites to validate schema fidelity, execution routing, and worker-orchestrator contract parity across modules:
    • Orchestrator tests validating full ToolDefinition payload fidelity and deterministic catalogue versions.
    • Worker proxy tests ensuring proxy-reported tool definitions and outputs preserve full definition fidelity.
    • End-to-end tests asserting fidelity from orchestrator registry through worker proxy and back.
    • Contract-parity tests validating shared transport types and route constants across worker and orchestrator.
    • Transport round-trip tests for catalogue responses, execution requests, and execution responses.
    • Route-path verification tests for execution routing through the orchestrator endpoint.
  • Added: fixtures and support code for complex ToolDefinitions to exercise nested JSON Schemas and UTF-8 content.
  • Updated: docs/roadmap.md to reflect completion status for 1.1.4 and related updates.
  • Updated: docs/rfcs/0001-expose-mcp-tool-definitions.md to reflect test-driven progression and status changes.
  • Updated: relevant test modules to integrate new tests and fixtures:
    • src/orchestrator/api/tests/catalogue_fidelity.rs (new)
    • src/orchestrator/api/tests/transport_parity.rs (new)
    • src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs (updated)
    • src/orchestrator/api/tests/remote_tools.rs (updated/augmented)
    • src/worker/container/tests/hosted_fidelity.rs (new)
    • src/worker/api/tests/mod.rs (new)
    • src/worker/api/tests/remote_tool_catalog.rs (new)
    • src/worker/api/tests/remote_tool_execute.rs (new)
    • src/worker/api/tests/transport_types.rs (new)
    • src/worker/api/tests/url_construction.rs (new)
    • src/worker/api/tests/finish_reason.rs (new)
    • src/worker/api/tests/fixtures.rs (new)
    • src/test_support.rs (new)
    • src/tools/builtin/worker_remote_tool_proxy.rs (refactor: tests moved into dedicated module)
    • src/tools/builtin/worker_remote_tool_proxy/tests.rs (new)
  • Updated: docs/rfcs/0001-expose-mcp-tool-definitions.md and docs/roadmap.md to reflect 1.1.4 completion.
  • Updated: src/worker/api/types.rs to align with new shared transport behavior and enable parity checks.
  • Updated: src/worker/container/tests/hosted_fidelity.rs (new end-to-end fidelity tests).

Rationale

  • This PR closes the 1.1.4 testing workstream by delivering a disciplined, test-matrix-driven approach that guards schema fidelity, execution routing, and worker-orchestrator contract parity. It aligns with RFC 0001 and the roadmap to ensure visibility of hosted MCP tool definitions and their faithful transport across the system.

Scope and approach

  • No runtime behaviour changes; this PR adds tests, fixtures, and planning artifacts to lock down hosted MCP schema fidelity and execution routing.
  • Approach: define and implement test families (schema fidelity, behavioural parity, regression parity, contract parity) with explicit milestones, gates, tolerances, and risk mitigations. Tests are implemented in-process with fixtures and mock servers to avoid external dependencies.
  • Introduces a structured test layout that avoids monolithic test files, improving maintainability and enabling focused test execution.

Milestones (high level)

  • Milestone 1: audit existing coverage and identify gaps – completed.
  • Milestone 2: schema-fidelity tests – implemented.
  • Milestone 3: execution-routing tests – implemented.
  • Milestone 4: worker-orchestrator contract parity tests – implemented.
  • Milestone 5: behavioural tests feasibility – evaluated with decisions recorded in ExecPlan.
  • Milestone 6: documentation synchronization – completed.
  • Milestone 7: validate, gate, and publish – completed.

Validation plan

  • Run targeted tests per milestone with logs and verify no unintended runtime changes.
  • Gate criteria: all new tests pass and ExecPlan remains the canonical reference for 1.1.4.

Documentation impact

  • ExecPlan document added and marked COMPLETE as the canonical planning reference for 1.1.4.
  • Roadmap and RFC references updated to reflect test-plan progress and completion.
  • Notes for reviewers: ExecPlan is included and updated to COMPLETE; please review alignment between test matrix and implementation scope.

Next steps

  • Sign-off on the ExecPlan enables finalization of 1.1.4 milestones, update of operator-facing docs as needed, and merge.
  • Documentation synchronization will continue as part of the release workflow.

◳ Generated by DevBoxer and task references preserved in the ExecPlan document.

📎 Task: https://www.devboxer.com/task/cdab742f-1a7e-43c9-9bff-b5ac8b4fe0bf

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9e8d2715-86de-4fb3-b071-e1cfbdcb30f6

📥 Commits

Reviewing files that changed from the base of the PR and between a675be7 and f2a3fe2.

📒 Files selected for processing (2)
  • docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md
  • docs/roadmap.md

Summary by CodeRabbit

Release Notes

  • Documentation

    • Completed roadmap item 1.1.4 with comprehensive test matrix for hosted-mode schema fidelity and execution routing validation.
  • Tests

    • Expanded test infrastructure for remote tool catalogue and execution contract validation across orchestrator and worker integration layers.
  • Refactor

    • Simplified webhook server startup by removing internal listener-based helper APIs.

Walkthrough

Summarise hosted-mode test artefacts, add shared test builders, reorganise and expand unit/integration tests for schema fidelity, execution routing and transport parity, mark roadmap/RFC item 1.1.4 complete, and refactor webhook server test helpers and related test fixtures.

Changes

Cohort / File(s) Summary
Documentation & Roadmap
docs/contents.md, docs/roadmap.md, docs/rfcs/0001-expose-mcp-tool-definitions.md, docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md
Add ExecPlan doc for roadmap item 1.1.4, mark 1.1.4 complete, and update RFC status to reflect hosted-mode tests for schema fidelity, execution routing and worker–orchestrator contract parity.
Test support & builders
src/test_support.rs, src/lib.rs
Add reusable complex-tool schema builder and expose test_support under #[cfg(test)] pub(crate).
Fixture changes
src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs, src/orchestrator/api/tests/remote_tools.rs
Change StubTool.description to owned String; add complex_tool_definition() and complex_tool_stub(); update call sites to supply String.
Type derives for testing
src/worker/api/types.rs, src/tools/tool/traits.rs
Add PartialEq derives to shared transport types (RemoteToolExecutionRequest/Response, RemoteToolCatalogResponse, ToolOutput) for equality assertions in tests.
Worker API tests — reorganisation
src/worker/api/tests.rs, src/worker/api/tests/mod.rs, src/worker/api/tests/*
Remove monolithic test file; add structured test modules (finish_reason, fixtures, remote_tool_catalog, remote_tool_execute, transport_types, url_construction) and shared Axum server fixtures.
Worker HTTP client tests
src/worker/api/tests/remote_tool_catalog.rs, src/worker/api/tests/remote_tool_execute.rs, src/worker/api/tests/transport_types.rs, src/worker/api/tests/url_construction.rs
Add tests asserting precise WorkerError mappings for non-success responses, rate-limit parsing, route constants, URL expansion, and JSON round-trip equality for transport types.
Worker remote-tool proxy tests (moved + expanded)
src/tools/builtin/worker_remote_tool_proxy.rs, src/tools/builtin/worker_remote_tool_proxy/tests.rs
Move inline tests to separate tests.rs; add in-process Axum fixtures and end-to-end tests asserting catalogue→proxy fidelity, full ToolOutput preservation, and orchestrator routing to /worker/{job_id}/tools/execute.
Orchestrator API tests
src/orchestrator/api/tests.rs, src/orchestrator/api/tests/catalogue_fidelity.rs, src/orchestrator/api/tests/transport_parity.rs
Add catalogue_fidelity tests for catalogue equality and deterministic versioning; add transport_parity JSON round-trip tests for orchestrator↔worker transport types.
Worker container integration tests
src/worker/container/tests/mod.rs, src/worker/container/tests/hosted_fidelity.rs, src/worker/container/tests/remote_tools.rs
Add hosted_fidelity integration test to register remote tools from a hosted catalogue and assert exact ToolDefinition round-trip; increase visibility of TestState and spawn_test_server to pub(super).
Worker API tests removed
src/worker/api/tests.rs (deleted)
Remove legacy monolithic worker API test module (replaced by structured modules).
Webhook server tests refactor
src/channels/webhook_server.rs
Remove test-only listener-entry helpers that accepted pre-bound TcpListener; centralise startup via bind_and_spawn() and update tests to use an async started_webhook_server fixture and real bind-conflict scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Client as WorkerRuntime
    participant Worker as WorkerHttpClient
    participant Catalogue as RemoteCatalogServer
    participant Proxy as WorkerRemoteToolProxy
    participant Orchestrator as OrchestratorExecuteEndpoint

    Client->>Catalogue: GET /worker/{job_id}/tools/catalog
    Catalogue-->>Client: 200 RemoteToolCatalogResponse (ToolDefinition)
    Client->>Worker: register_remote_tools()
    Worker-->>Proxy: instantiate proxy exposing ToolDefinition
    Client->>Proxy: execute(tool, params)
    Proxy->>Orchestrator: POST /worker/{job_id}/tools/execute
    Orchestrator-->>Proxy: 200 RemoteToolExecutionResponse (ToolOutput)
    Proxy-->>Client: return ToolOutput (preserve all fields)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🛠️ Schema dances through the wire,
Tools rise from catalogue fire,
Worker and orchestrator align,
Fidelity tests confirm the line—
Hosted routing hums in time.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: completing a comprehensive test suite (1.1.4) covering schema fidelity, execution routing, and worker-orchestrator parity with a refactoring component.
Description check ✅ Passed The description is comprehensive and well-structured, covering summary, change type, validation steps, and addressing all template sections except database/rollback (not applicable to test/docs-only changes).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch add-hosted-mcp-schema-tests-ha7qtg

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added scope: docs Documentation size: XS < 10 changed lines (excluding docs) risk: low Changes to docs, tests, or low-risk modules contributor: experienced 6-19 merged PRs labels Mar 23, 2026
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai Bot commented Mar 23, 2026

Reviewer's Guide

Adds the canonical ExecPlan for roadmap item 1.1.4 and a comprehensive hosted MCP test matrix that verifies schema fidelity, execution routing, and worker-orchestrator contract parity across orchestrator, worker, and proxy layers, plus minor webhook restart test hardening and documentation updates marking 1.1.4 as complete.

Sequence diagram for worker_remote_tool_proxy execution routing through orchestrator endpoint

sequenceDiagram
    participant Test as RoutePathTest
    participant Proxy as WorkerRemoteToolProxy
    participant Client as WorkerHttpClient
    participant Orchestrator as OrchestratorRouter
    participant Handler as execute_tool_with_route_capture
    participant State as RouteCapturingState

    Test->>Proxy: execute(params, JobContext)
    Proxy->>Client: send_execution_request(job_id, tool_name, params)
    Client->>Orchestrator: POST /worker/{job_id}/tools/execute
    Orchestrator->>Handler: route REMOTE_TOOL_EXECUTE_ROUTE
    Handler->>State: lock received_requests
    State-->>Handler: push(route_path, job_id, tool_name)
    Handler-->>Orchestrator: RemoteToolExecutionResponse
    Orchestrator-->>Client: HTTP 200 + JSON body
    Client-->>Proxy: ToolOutput
    Proxy-->>Test: ToolOutput
    Test-->>Test: assert route_path == "/worker/{job_id}/tools/execute"
    Test-->>Test: assert received_job_id == job_id
    Test-->>Test: assert tool_name == "route_test_tool"
Loading

Updated class diagram for remote tool transport and proxy types under test

classDiagram
    class ToolDefinition {
        +String name
        +String description
        +serde_json::Value parameters
    }

    class RemoteToolCatalogResponse {
        +Vec~ToolDefinition~ tools
        +Vec~String~ toolset_instructions
        +u64 catalog_version
    }

    class RemoteToolExecutionRequest {
        +String tool_name
        +serde_json::Value params
    }

    class ToolOutput {
        +serde_json::Value result
        +Option~rust_decimal::Decimal~ cost
        +Option~String~ raw
        +std::time::Duration duration
        +success(result, duration) ToolOutput
        +with_cost(cost) ToolOutput
        +with_raw(raw) ToolOutput
    }

    class RemoteToolExecutionResponse {
        +ToolOutput output
    }

    class WorkerRemoteToolProxy {
        -ToolDefinition definition
        -Arc~WorkerHttpClient~ client
        +new(definition, client) WorkerRemoteToolProxy
        +execute(params, job_context) ToolOutput
        +name() &str
        +description() &str
        +parameters_schema() serde_json::Value
    }

    class WorkerHttpClient {
        +String base_url
        +uuid::Uuid job_id
        +String token
        +new(base_url, job_id, token) WorkerHttpClient
    }

    class RouteCapturingState {
        +Arc~tokio::sync::Mutex~\<Vec~(String, uuid::Uuid, String)~\> received_requests
    }

    ToolDefinition <|-- RemoteToolCatalogResponse : aggregates
    ToolOutput <|-- RemoteToolExecutionResponse : contains
    WorkerRemoteToolProxy --> ToolDefinition : wraps
    WorkerRemoteToolProxy --> WorkerHttpClient : uses
    RemoteToolExecutionRequest --> ToolDefinition : references name
    RouteCapturingState --> RemoteToolExecutionRequest : records
    RemoteToolExecutionResponse --> ToolOutput : returns
Loading

File-Level Changes

Change Details Files
Strengthen worker remote-tool proxy tests to ensure full fidelity of tool definitions, outputs, and execution routing through the orchestrator endpoint.
  • Add a complex ToolDefinition fixture used to validate that proxy-reported name, description, and parameters can reconstruct the original definition exactly.
  • Add a test that executes a proxy against a mock HTTP server and asserts all ToolOutput fields (result, cost, raw, duration) are preserved.
  • Add a route-capturing test server that records incoming requests and asserts proxy executions hit the expected /worker/{job_id}/tools/execute path with the correct job id and tool name.
src/tools/builtin/worker_remote_tool_proxy.rs
Extend orchestrator remote-tools tests and fixtures to validate full ToolDefinition catalogue fidelity, deterministic catalog versions, and shared transport-type compatibility with worker types.
  • Introduce complex_tool_definition and complex_tool_stub fixtures that emit hosted-safe tools with nested JSON Schema, UTF-8, markdown, and special characters.
  • Add a catalogue test that registers the complex stub tool, calls the hosted remote-tool catalog endpoint, and asserts the returned ToolDefinition exactly matches the registry definition.
  • Add a test that verifies catalog_version is deterministic for identical tool sets and changes when tool contents differ.
  • Add a test that serializes orchestrator-built catalog and execution payloads and deserializes them into crate::worker::api shared types, asserting structural equality to prove contract parity.
src/orchestrator/api/tests/remote_tools.rs
src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs
Add worker-side tests to enforce route-constant parity and round-trip safety of shared remote-tool transport types.
  • Add tests that assert REMOTE_TOOL_CATALOG_ROUTE and REMOTE_TOOL_EXECUTE_ROUTE have the expected literal forms and expand {job_id} correctly, acting as a guardrail for route drift.
  • Add JSON round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse to ensure no field loss across serialization/deserialization, including complex ToolDefinition parameters and ToolOutput fields.
src/worker/api/types.rs
Add end-to-end worker container tests to ensure proxy definitions registered from the orchestrator match the orchestrator’s canonical ToolDefinitions exactly.
  • Add a complex_orchestrator_tool_definition fixture mirroring the orchestrator’s complex ToolDefinition structure, with nested JSON Schema and special characters.
  • Add a mock catalog handler that serves RemoteToolCatalogResponse containing the complex definition from a fake orchestrator.
  • Add a test that boots a WorkerRuntime against the mock orchestrator, triggers register_remote_tools, then reconstructs a ToolDefinition from the registered proxy and asserts equality with the canonical orchestrator definition.
src/worker/container/tests.rs
Harden webhook server restart behaviour test to validate rollback when restart is attempted on an already-bound address.
  • Replace the previous "invalid address" restart scenario with a realistic port-conflict scenario that binds a temporary listener and attempts to restart the webhook server to that in-use address, asserting the restart fails and the original address continues serving requests.
src/channels/webhook_server.rs
Add ExecPlan and documentation updates to mark roadmap item 1.1.4 complete and document the new test matrix.
  • Add a detailed ExecPlan document for 1.1.4 describing constraints, milestones, risks, the implemented schema-fidelity and execution-routing tests, and their validation gates.
  • Update roadmap to mark 1.1.4 as complete and note that all section 1.1 items are done.
  • Update RFC 0001 implementation status to include 1.1.4 and the comprehensive test matrix.
  • Update docs contents index to reference the new ExecPlan.
docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md
docs/roadmap.md
docs/rfcs/0001-expose-mcp-tool-definitions.md
docs/contents.md

Assessment against linked issues

Issue Objective Addressed Explanation
#16 Replace the placeholder SIGHUP scaffolding in tests/sighup_reload_integration.rs with real integration tests that exercise the hot-reload path end-to-end. The PR does not touch tests/sighup_reload_integration.rs or add any SIGHUP- or hot-reload-related tests. All added tests focus on MCP schema fidelity, execution routing, and worker–orchestrator remote-tool behaviour.
#16 Introduce worker–orchestrator contract-parity safeguards (e.g., a shared protocol module and accompanying tests) to ensure endpoint, route, and payload conventions stay in sync between worker and orchestrator.
#16 Add characterisation tests for terminal job-state persistence, covering persist_status, log_event, and terminal-transition paths in src/worker/job.rs. The PR does not modify src/worker/job.rs or add any tests around job lifecycle, status persistence, persist_status, log_event, or terminal state transitions. Its scope is limited to MCP-related transports, routing, and associated tests.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

codescene-delta-analysis[bot]

This comment was marked as outdated.

@github-actions github-actions Bot added scope: tool/builtin Built-in tools scope: orchestrator Container orchestrator scope: worker Container worker size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules and removed size: XS < 10 changed lines (excluding docs) risk: low Changes to docs, tests, or low-risk modules labels Mar 23, 2026
@leynos leynos changed the title Add ExecPlan for hosted MCP schema fidelity tests Add hosted MCP schema fidelity and execution-routing tests (1.1.4) Mar 23, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Mar 23, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response.

src/worker/api/types.rs

Comment on file

    fn remote_tool_transport_types_round_trip_without_field_loss() {
        let catalog_response = RemoteToolCatalogResponse {
            tools: vec![ToolDefinition {
                name: "test_tool".to_string(),
                description: "A **complex** test tool with UTF-8: \u{1F680}\u{1F4A1}.".to_string(),
                parameters: serde_json::json!({
                    "type": "object",
                    "title": "TestParams",
                    "properties": {
                        "query": {
                            "type": "string",
                            "minLength": 1,
                            "maxLength": 100
                        },
                        "options": {
                            "type": "object",
                            "properties": {
                                "limit": {"type": "integer", "minimum": 1, "maximum": 50}
                            },
                            "required": ["limit"]
                        }
                    },
                    "required": ["query", "options"]
                }),
            }],
            toolset_instructions: vec![
                "Prefer remote tools for external systems.".to_string(),
                "Use local tools for filesystem operations.".to_string(),
            ],
            catalog_version: 42,
        };

        let serialized =
            serde_json::to_string(&catalog_response).expect("serialize RemoteToolCatalogResponse");
        let deserialized: RemoteToolCatalogResponse =
            serde_json::from_str(&serialized).expect("deserialize RemoteToolCatalogResponse");

        assert_eq!(deserialized.tools.len(), catalog_response.tools.len());
        assert_eq!(deserialized.tools[0].name, catalog_response.tools[0].name);
        assert_eq!(
            deserialized.tools[0].description,
            catalog_response.tools[0].description
        );
        assert_eq!(
            deserialized.tools[0].parameters,
            catalog_response.tools[0].parameters
        );
        assert_eq!(
            deserialized.toolset_instructions,
            catalog_response.toolset_instructions
        );
        assert_eq!(
            deserialized.catalog_version,
            catalog_response.catalog_version
        );

        let execution_request = RemoteToolExecutionRequest {
            tool_name: "complex_tool".to_string(),
            params: serde_json::json!({
                "query": "test query",
                "options": {"limit": 25}
            }),
        };

        let serialized = serde_json::to_string(&execution_request)
            .expect("serialize RemoteToolExecutionRequest");
        let deserialized: RemoteToolExecutionRequest =
            serde_json::from_str(&serialized).expect("deserialize RemoteToolExecutionRequest");

        assert_eq!(deserialized.tool_name, execution_request.tool_name);
        assert_eq!(deserialized.params, execution_request.params);

        let execution_response = RemoteToolExecutionResponse {
            output: ToolOutput::success(
                serde_json::json!({"result": "success", "data": [1, 2, 3]}),
                std::time::Duration::from_millis(42),
            )
            .with_cost(rust_decimal::Decimal::new(150, 2))
            .with_raw("raw execution output"),
        };

        let serialized = serde_json::to_string(&execution_response)
            .expect("serialize RemoteToolExecutionResponse");
        let deserialized: RemoteToolExecutionResponse =
            serde_json::from_str(&serialized).expect("deserialize RemoteToolExecutionResponse");

        assert_eq!(deserialized.output.result, execution_response.output.result);
        assert_eq!(deserialized.output.cost, execution_response.output.cost);
        assert_eq!(deserialized.output.raw, execution_response.output.raw);
        assert_eq!(
            deserialized.output.duration,
            execution_response.output.duration
        );
    }

❌ New issue: Large Method
tests.remote_tool_transport_types_round_trip_without_field_loss has 86 lines, threshold = 70

@coderabbitai

This comment was marked as resolved.

@leynos leynos changed the title Add hosted MCP schema fidelity and execution-routing tests (1.1.4) Complete hosted MCP schema fidelity and execution-routing tests (1.1.4) Mar 23, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos leynos marked this pull request as ready for review March 23, 2026 17:40
sourcery-ai[bot]

This comment was marked as resolved.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7216ef007e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/orchestrator/api/tests/remote_tools.rs Outdated
Comment thread src/worker/api/types.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`:
- Line 8: The document currently states "Status: COMPLETE" while Milestone 7 and
CI gates haven't been validated; either update the plan to mark "milestone 7" as
complete and add records of passing lint and test gates (include the specific
gate artifacts) before keeping "Status: COMPLETE", or change "Status: COMPLETE"
to "IN-PROGRESS" (or similar) until the lint and test gates are confirmed;
search for the literal "Status: COMPLETE" and "milestone 7" in this doc (and the
other affected locations) and make the consistent change and add a brief note
recording the lint/test pass artifacts when marking complete.
- Line 648: Replace the British spelling "synchronise" in the heading
"synchronise design and operator documentation" with the Oxford -ize form
"synchronize" (and similarly change any "Synchronise" to "Synchronize") to
conform to en-GB-oxendict; also scan this document for the other occurrence
mentioned (the similar heading/text later in the file) and update it the same
way so all instances use "-ize".
- Around line 913-916: The two fenced code blocks that currently show cargo
commands (the blocks containing "cargo fmt --all -- --check" and "cargo fmt
--manifest-path tools-src/github/Cargo.toml --all -- --check") need language
identifiers; update both fenced code blocks to use "```bash" as the opening
fence (also apply the same change to the other occurrence noted around the
second block) so markdown lint and repo style checks pass.

In `@src/orchestrator/api/tests/remote_tools.rs`:
- Around line 369-503: The file is over the size limit; move the large
catalogue/execution fidelity tests into new submodules to keep this file under
400 lines: create e.g. modules catalogue_fidelity and transport_parity and
relocate the tests remote_tool_catalog_preserves_full_tool_definition_payload,
remote_tool_catalog_version_is_deterministic_and_sensitive_to_content, and
orchestrator_responses_deserialize_into_worker_shared_types into the appropriate
new files, exporting any helpers they need (e.g., hosted_remote_tool_catalog,
complex_tool_stub, build_tool_fixture, ToolFixture) or importing the parent
crate items; update this file to mod catalogue_fidelity; mod transport_parity;
(or use pub mod) and ensure test attributes (#[tokio::test], #[rstest]) remain
on the moved functions so cargo test picks them up.
- Around line 444-503: Split the single test
orchestrator_responses_deserialize_into_worker_shared_types into three focused
tokio tests: one that only round-trips RemoteToolCatalogResponse (use
hosted_remote_tool_catalog, ToolRegistry/complex_tool_stub, assert tools
length/equality, toolset_instructions and catalog_version), one that only
round-trips RemoteToolExecutionRequest (construct execution_request,
serialize/deserialize, assert tool_name and params), and one that only
round-trips RemoteToolExecutionResponse (build execution_output via
tools::ToolOutput::success(...).with_cost(...).with_raw(...), wrap in
RemoteToolExecutionResponse, serialize/deserialize, assert output.result,
output.cost, output.raw, output.duration). Move any shared setup (like
registry/catalog helper calls) into small helpers used by the relevant test to
keep each test single-responsibility.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 55d624db-88e1-4b30-91f7-8cf8132805a9

📥 Commits

Reviewing files that changed from the base of the PR and between f164194 and 7216ef0.

⛔ Files ignored due to path filters (1)
  • .agents/mcp/context_pack/packs/.repo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • docs/contents.md
  • docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md
  • docs/rfcs/0001-expose-mcp-tool-definitions.md
  • docs/roadmap.md
  • src/channels/webhook_server.rs
  • src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs
  • src/orchestrator/api/tests/remote_tools.rs
  • src/tools/builtin/worker_remote_tool_proxy.rs
  • src/worker/api/types.rs
  • src/worker/container/tests.rs

Comment thread docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md Outdated
Comment thread docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md Outdated
Comment thread src/orchestrator/api/tests/remote_tools.rs Outdated
Comment thread src/orchestrator/api/tests/remote_tools.rs Outdated
@github-actions github-actions Bot added contributor: core 20+ merged PRs and removed contributor: experienced 6-19 merged PRs labels Mar 24, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

Update the ExecPlan status from IN-PROGRESS to COMPLETE to match the
Outcomes & Retrospective section which states the status was updated to
COMPLETE.

Fix test_status_update_new_serializes_worker_state to parse JSON into
serde_json::Value and assert on the field value instead of using string
substring matching. This ensures the JSON field is present and correctly
typed.
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Mar 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Change the `started_webhook_server` rstest fixture to return
`Result<StartedWebhookServer, Box<dyn std::error::Error + Send + Sync>>`
instead of panicking on errors. Replace all `.expect()` calls with `?`.

Update both consuming tests to return `Result` and consume the fixture
with `.await?`. Replace `.expect()` calls on fallible operations with `?`,
and restructure the `old_result` assertion to avoid calling `.unwrap()`.
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Mar 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Add `HostedCatalogHarness` struct containing `WorkerRuntime` and server
join handle. Create async rstest fixture `hosted_catalog_harness` that
spawns a test server with the complex tool catalog and constructs the
runtime.

Update `hosted_worker_proxy_definition_matches_orchestrator_canonical_definition`
to consume the fixture instead of inlining client/runtime setup.
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Mar 27, 2026

@coderabbitai review

1 similar comment
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Mar 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/channels/webhook_server.rs (1)

64-71: ⚠️ Potential issue | 🟠 Major

Capture the actual bound address when binding to port 0.

When self.config.addr specifies port 0, the OS assigns an ephemeral port. The log now misleadingly prints port 0 instead of the real bound port. Worse, current_addr() will return the stale value, breaking the SIGHUP restart logic in src/main.rs (lines 792–812) which compares current_addr() against the new configured address.

Update self.config.addr from listener.local_addr() after binding succeeds.

Proposed fix
         let listener = tokio::net::TcpListener::bind(self.config.addr)
             .await
             .map_err(|e| ChannelError::StartupFailed {
                 name: "webhook_server".to_string(),
                 reason: format!("Failed to bind to {}: {}", self.config.addr, e),
             })?;

-        tracing::info!("Webhook server listening on {}", self.config.addr);
+        let actual_addr = listener.local_addr().map_err(|e| ChannelError::StartupFailed {
+            name: "webhook_server".to_string(),
+            reason: format!("Failed to get local address: {e}"),
+        })?;
+        self.config.addr = actual_addr;
+        tracing::info!("Webhook server listening on {}", actual_addr);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/webhook_server.rs` around lines 64 - 71, After successfully
binding the TcpListener (tokio::net::TcpListener::bind) capture the real bound
socket via listener.local_addr() and assign it back to self.config.addr before
emitting the tracing::info log and returning, so logs and downstream logic
(including current_addr() used by the SIGHUP restart path) reflect the actual
ephemeral port; update the tracing::info call to use the updated
self.config.addr and ensure this assignment happens immediately after the
bind/map_err block succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`:
- Around line 217-220: Replace the first standalone "BDD" in the phrase "BDD
harness: if adding `rstest-bdd` behavioural coverage..." with "behaviour-driven
development (BDD)" so the sentence reads "behaviour-driven development (BDD)
harness: if adding `rstest-bdd` behavioural coverage..." and leave subsequent
occurrences of "BDD" unchanged; update only that first instance in the sentence
that begins with "BDD harness:".

---

Outside diff comments:
In `@src/channels/webhook_server.rs`:
- Around line 64-71: After successfully binding the TcpListener
(tokio::net::TcpListener::bind) capture the real bound socket via
listener.local_addr() and assign it back to self.config.addr before emitting the
tracing::info log and returning, so logs and downstream logic (including
current_addr() used by the SIGHUP restart path) reflect the actual ephemeral
port; update the tracing::info call to use the updated self.config.addr and
ensure this assignment happens immediately after the bind/map_err block
succeeds.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24b39cde-4031-46db-b404-6454dee0593d

📥 Commits

Reviewing files that changed from the base of the PR and between f084073 and 75b2efd.

📒 Files selected for processing (7)
  • docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md
  • src/channels/webhook_server.rs
  • src/worker/api/tests/finish_reason.rs
  • src/worker/api/tests/fixtures.rs
  • src/worker/api/tests/remote_tool_catalog.rs
  • src/worker/api/tests/remote_tool_execute.rs
  • src/worker/container/tests/hosted_fidelity.rs

Comment thread docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md Outdated
Replace the first standalone "BDD" with "behaviour-driven development (BDD)"
in the harness constraint for clarity.
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Mar 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`:
- Around line 907-908: The document currently claims "All validation gates …
passed successfully" but `markdownlint` reported issues in docs/roadmap.md;
either remove/fix the lint failures in docs/roadmap.md (resolve markdownlint
issues and re-run checks so the pass claim is accurate) or change the
summary/status sentence to reflect a blocked or passing-with-known-failures
state and update the Status/milestone wording accordingly (e.g., note which
gates failed: markdownlint on docs/roadmap.md), ensuring the statement and the
lint evidence are consistent.
- Around line 896-897: Summary: The docs use the non-canonical ToolOutput field
name "duration" instead of the transport-contract-compliant "duration_ms";
update the wording to match the contract and tests. Fix: in the "Full
`ToolOutput` field preservation" bullet and any related mentions (search for the
symbol ToolOutput and the literal "duration"), replace "duration" with
"duration_ms" so the documentation aligns with the field name used in the
transport contract and tests and keeps the coverage statement consistent with
the earlier reference to the contract.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 78560c9c-684d-498a-b9d9-ef445fc064d4

📥 Commits

Reviewing files that changed from the base of the PR and between 75b2efd and a675be7.

📒 Files selected for processing (1)
  • docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md

Comment thread docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md Outdated
Fix double blank lines in docs/roadmap.md that caused markdownlint
MD012 errors.

Change `duration_ms` to `duration` in the execplan to match the actual
ToolOutput struct field name. Update validation gate statement to
acknowledge pre-existing markdownlint issues in roadmap.md.

Wrap long BDD harness line to comply with MD013 line length limit.
codescene-delta-analysis[bot]

This comment was marked as outdated.

…ion status

Change "duration" to "duration_ms" in ToolOutput field references to align
with the transport contract wire format used in the Claude bridge.

Update validation gate statement to reflect that all gates now pass,
including markdownlint.
@leynos
Copy link
Copy Markdown
Owner Author

leynos commented Mar 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@leynos leynos merged commit 66582fa into main Mar 27, 2026
25 checks passed
leynos added a commit that referenced this pull request Mar 29, 2026
#57)

* docs(execplans): add execplan for schema fidelity and execution routing tests

Add a detailed execution plan (ExecPlan) that outlines the comprehensive testing strategy
for roadmap item 1.1.4. The plan addresses schema fidelity, execution routing,
and contract parity between worker and orchestrator, defining milestones, constraints,
and test gaps. The document includes approval gates, repository orientation, risk
assessment, and progression criteria to guide the implementation of targeted tests
ensuring no silent regressions in hosted-mode functionality.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(hosted-remote-tools): add tests for schema fidelity and execution routing

Add a comprehensive test matrix validating hosted-mode tools for schema fidelity, execution routing, and contract parity between orchestrator and worker. Implement new test fixtures with complex nested JSON schemas and special characters to verify full payload preservation through the tool catalog and execution proxy.

New tests include:
- Full ToolDefinition payload fidelity through the catalog endpoint.
- Proxy preserves all ToolDefinition and ToolOutput fields exactly.
- Execution routes correctly through the orchestrator endpoint.
- Catalog version determinism and sensitivity to tool changes.
- Round-trip serialization of shared worker/orchestrator transport types.
- Validation of route constants shared between worker and orchestrator.
- End-to-end validation that proxy definitions match orchestrator canonicals.

Also mark roadmap item 1.1.4 complete, update RFC 0001 implementation status and documentation accordingly.

This strengthens the worker-orchestrator contract guarantees and prevents silent field loss or routing errors in hosted remote tools.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker_remote_tool_proxy): fix format string reference in execution routing test

Changed format string argument to a reference to match expected type and improve clarity in test assertion for routing execution through orchestrator endpoint.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(api): add round-trip serialization tests for remote tool types

Added serialization and deserialization round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse to ensure no field loss during JSON processing.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker/api): add serialization fidelity and round-trip tests for remote tool API types

- Added comprehensive serialization and deserialization round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse types.
- Added remote tool route constant tests ensuring route strings match expected orchestrator endpoints and correct parameter expansion.
- Moved and expanded existing tests from worker/api/types.rs into dedicated test module in worker/api/tests.rs.
- Added new orchestrator API tests to validate catalog version independence of tool registration order and execution request/response serialization fidelity.
- Derived PartialEq on RemoteToolCatalogResponse to support equality assertions in tests.
- Updated documentation to synchronize terminology and reflect test pass status.

These tests improve confidence in API data structure integrity and interoperability between orchestrator and worker components.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker/api): refactor and modularize worker API tests

- Removed monolithic src/worker/api/tests.rs test file and split its contents into multiple focused test files under src/worker/api/tests/.
- Added new test modules: finish_reason.rs, fixtures.rs, mod.rs, remote_tool_catalog.rs, remote_tool_execute.rs, transport_types.rs, url_construction.rs.
- Introduced shared test fixtures to support remote tool failure simulation and sample data.
- Improved test organization by grouping tests by feature and responsibilities.
- Preserved existing test coverage for HTTP client behavior, API type conversions, route constants, and error handling.

This refactor improves test maintainability and readability by decomposing a large test file into specialized modules.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(orchestrator/api): add remote tool fidelity tests for schema and serialization

- Added new test module remote_tool_fidelity.rs covering:
  - Schema fidelity of remote tool catalog
  - Catalog versioning determinism and sensitivity to content
  - Catalog version independence from registration order
  - Round-trip serialization and deserialization of shared orchestrator-worker types
- Removed same tests from remote_tools.rs to centralize and clarify test coverage
- Minor formatting and import cleanup in worker api tests

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker_remote_tool_proxy): add ProxyTestServer fixture to simplify tests

Introduce `ProxyTestServer` fixture bundling an in-process execute-route server and a pre-wired HTTP client. This consolidates repeated server setup code in tests, improves readability, and enables using rstest features for cleaner asynchronous test setups.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* refactor(tests): centralize complex tool definition in shared test_support module

Extract complex tool definition JSON schema and builders into new src/test_support.rs module to reduce duplication across orchestrator and worker test suites. Update tests and fixtures to use the shared builder, improving consistency and maintainability of test data for tool definition fidelity testing.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker_remote_tool_proxy): move tests to a dedicated tests module

Moved all tests from the main worker_remote_tool_proxy module into a new
separate tests.rs file. This cleans up the main module and improves
maintainability by isolating test code from production code.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(remote_tools): add transport parity and hosted fidelity tests

- Add new test module `transport_parity` to verify serialisation round-trips for shared orchestrator-worker transport types.
- Add end-to-end fidelity tests in `hosted_fidelity` to ensure ToolDefinition matches exactly between orchestrator catalog and worker proxy.
- Rename `remote_tool_fidelity.rs` to `catalogue_fidelity.rs` and remove redundant tests now covered by transport parity.
- Improve test fixtures and refactor test declarations for better async error handling.
- Add assertions verifying route constant string contents to ensure route parity by construction.

These changes improve the robustness of remote tool integration testing and ensure data fidelity across the orchestrator and worker boundaries.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(channels/webhook_server): enhance test fixtures and coverage for restart_with_addr

- Introduced StartedWebhookServer fixture for cleaner test setup
- Simplified obtaining available ports with TcpListener for tests
- Added rstest parameterization and improved async test structure
- Verified server restart behavior including rollback on bind failure
- Ensured health endpoint responds correctly after restarts
- Improved documentation and test reliability in webhook_server tests

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker): add failure-mode mock server for remote-tool API tests

- Introduce RemoteToolFailureServer and RemoteToolFailureRoute enums
- Provide a fixture to spawn mock servers rejecting specific routes with HTTP errors
- Enhance test infrastructure in remote_tools.rs and fixtures.rs
- Enable targeted testing of error-handling paths in remote-tool client code

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* Implement Drop for RemoteToolFailureServer to abort background task

Add `impl Drop for RemoteToolFailureServer` that calls `self.handle.abort()`
to ensure the spawned Tokio task is cancelled when the test fixture is dropped.

Update test files to clone `base_url` instead of moving fields out of the
struct, which is no longer permitted with the Drop implementation. Remove
manual `abort()` and `await` calls in tests since Drop now handles cleanup.

* Update execplan status and fix JSON parsing in test

Update the ExecPlan status from IN-PROGRESS to COMPLETE to match the
Outcomes & Retrospective section which states the status was updated to
COMPLETE.

Fix test_status_update_new_serializes_worker_state to parse JSON into
serde_json::Value and assert on the field value instead of using string
substring matching. This ensures the JSON field is present and correctly
typed.

* Return Result from started_webhook_server fixture and consuming tests

Change the `started_webhook_server` rstest fixture to return
`Result<StartedWebhookServer, Box<dyn std::error::Error + Send + Sync>>`
instead of panicking on errors. Replace all `.expect()` calls with `?`.

Update both consuming tests to return `Result` and consume the fixture
with `.await?`. Replace `.expect()` calls on fallible operations with `?`,
and restructure the `old_result` assertion to avoid calling `.unwrap()`.

* Extract hosted_catalog_harness fixture for worker container tests

Add `HostedCatalogHarness` struct containing `WorkerRuntime` and server
join handle. Create async rstest fixture `hosted_catalog_harness` that
spawns a test server with the complex tool catalog and constructs the
runtime.

Update `hosted_worker_proxy_definition_matches_orchestrator_canonical_definition`
to consume the fixture instead of inlining client/runtime setup.

* Expand BDD acronym in execplan constraints

Replace the first standalone "BDD" with "behaviour-driven development (BDD)"
in the harness constraint for clarity.

* Fix markdownlint issues and duration field naming in execplan

Fix double blank lines in docs/roadmap.md that caused markdownlint
MD012 errors.

Change `duration_ms` to `duration` in the execplan to match the actual
ToolOutput struct field name. Update validation gate statement to
acknowledge pre-existing markdownlint issues in roadmap.md.

Wrap long BDD harness line to comply with MD013 line length limit.

* Align execplan with transport contract field names and update validation status

Change "duration" to "duration_ms" in ToolOutput field references to align
with the transport contract wire format used in the Claude bridge.

Update validation gate statement to reflect that all gates now pass,
including markdownlint.

---------

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
leynos added a commit that referenced this pull request Mar 29, 2026
#57)

* docs(execplans): add execplan for schema fidelity and execution routing tests

Add a detailed execution plan (ExecPlan) that outlines the comprehensive testing strategy
for roadmap item 1.1.4. The plan addresses schema fidelity, execution routing,
and contract parity between worker and orchestrator, defining milestones, constraints,
and test gaps. The document includes approval gates, repository orientation, risk
assessment, and progression criteria to guide the implementation of targeted tests
ensuring no silent regressions in hosted-mode functionality.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(hosted-remote-tools): add tests for schema fidelity and execution routing

Add a comprehensive test matrix validating hosted-mode tools for schema fidelity, execution routing, and contract parity between orchestrator and worker. Implement new test fixtures with complex nested JSON schemas and special characters to verify full payload preservation through the tool catalog and execution proxy.

New tests include:
- Full ToolDefinition payload fidelity through the catalog endpoint.
- Proxy preserves all ToolDefinition and ToolOutput fields exactly.
- Execution routes correctly through the orchestrator endpoint.
- Catalog version determinism and sensitivity to tool changes.
- Round-trip serialization of shared worker/orchestrator transport types.
- Validation of route constants shared between worker and orchestrator.
- End-to-end validation that proxy definitions match orchestrator canonicals.

Also mark roadmap item 1.1.4 complete, update RFC 0001 implementation status and documentation accordingly.

This strengthens the worker-orchestrator contract guarantees and prevents silent field loss or routing errors in hosted remote tools.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker_remote_tool_proxy): fix format string reference in execution routing test

Changed format string argument to a reference to match expected type and improve clarity in test assertion for routing execution through orchestrator endpoint.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(api): add round-trip serialization tests for remote tool types

Added serialization and deserialization round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse to ensure no field loss during JSON processing.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker/api): add serialization fidelity and round-trip tests for remote tool API types

- Added comprehensive serialization and deserialization round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse types.
- Added remote tool route constant tests ensuring route strings match expected orchestrator endpoints and correct parameter expansion.
- Moved and expanded existing tests from worker/api/types.rs into dedicated test module in worker/api/tests.rs.
- Added new orchestrator API tests to validate catalog version independence of tool registration order and execution request/response serialization fidelity.
- Derived PartialEq on RemoteToolCatalogResponse to support equality assertions in tests.
- Updated documentation to synchronize terminology and reflect test pass status.

These tests improve confidence in API data structure integrity and interoperability between orchestrator and worker components.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker/api): refactor and modularize worker API tests

- Removed monolithic src/worker/api/tests.rs test file and split its contents into multiple focused test files under src/worker/api/tests/.
- Added new test modules: finish_reason.rs, fixtures.rs, mod.rs, remote_tool_catalog.rs, remote_tool_execute.rs, transport_types.rs, url_construction.rs.
- Introduced shared test fixtures to support remote tool failure simulation and sample data.
- Improved test organization by grouping tests by feature and responsibilities.
- Preserved existing test coverage for HTTP client behavior, API type conversions, route constants, and error handling.

This refactor improves test maintainability and readability by decomposing a large test file into specialized modules.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(orchestrator/api): add remote tool fidelity tests for schema and serialization

- Added new test module remote_tool_fidelity.rs covering:
  - Schema fidelity of remote tool catalog
  - Catalog versioning determinism and sensitivity to content
  - Catalog version independence from registration order
  - Round-trip serialization and deserialization of shared orchestrator-worker types
- Removed same tests from remote_tools.rs to centralize and clarify test coverage
- Minor formatting and import cleanup in worker api tests

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker_remote_tool_proxy): add ProxyTestServer fixture to simplify tests

Introduce `ProxyTestServer` fixture bundling an in-process execute-route server and a pre-wired HTTP client. This consolidates repeated server setup code in tests, improves readability, and enables using rstest features for cleaner asynchronous test setups.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* refactor(tests): centralize complex tool definition in shared test_support module

Extract complex tool definition JSON schema and builders into new src/test_support.rs module to reduce duplication across orchestrator and worker test suites. Update tests and fixtures to use the shared builder, improving consistency and maintainability of test data for tool definition fidelity testing.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker_remote_tool_proxy): move tests to a dedicated tests module

Moved all tests from the main worker_remote_tool_proxy module into a new
separate tests.rs file. This cleans up the main module and improves
maintainability by isolating test code from production code.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(remote_tools): add transport parity and hosted fidelity tests

- Add new test module `transport_parity` to verify serialisation round-trips for shared orchestrator-worker transport types.
- Add end-to-end fidelity tests in `hosted_fidelity` to ensure ToolDefinition matches exactly between orchestrator catalog and worker proxy.
- Rename `remote_tool_fidelity.rs` to `catalogue_fidelity.rs` and remove redundant tests now covered by transport parity.
- Improve test fixtures and refactor test declarations for better async error handling.
- Add assertions verifying route constant string contents to ensure route parity by construction.

These changes improve the robustness of remote tool integration testing and ensure data fidelity across the orchestrator and worker boundaries.

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(channels/webhook_server): enhance test fixtures and coverage for restart_with_addr

- Introduced StartedWebhookServer fixture for cleaner test setup
- Simplified obtaining available ports with TcpListener for tests
- Added rstest parameterization and improved async test structure
- Verified server restart behavior including rollback on bind failure
- Ensured health endpoint responds correctly after restarts
- Improved documentation and test reliability in webhook_server tests

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* test(worker): add failure-mode mock server for remote-tool API tests

- Introduce RemoteToolFailureServer and RemoteToolFailureRoute enums
- Provide a fixture to spawn mock servers rejecting specific routes with HTTP errors
- Enhance test infrastructure in remote_tools.rs and fixtures.rs
- Enable targeted testing of error-handling paths in remote-tool client code

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>

* Implement Drop for RemoteToolFailureServer to abort background task

Add `impl Drop for RemoteToolFailureServer` that calls `self.handle.abort()`
to ensure the spawned Tokio task is cancelled when the test fixture is dropped.

Update test files to clone `base_url` instead of moving fields out of the
struct, which is no longer permitted with the Drop implementation. Remove
manual `abort()` and `await` calls in tests since Drop now handles cleanup.

* Update execplan status and fix JSON parsing in test

Update the ExecPlan status from IN-PROGRESS to COMPLETE to match the
Outcomes & Retrospective section which states the status was updated to
COMPLETE.

Fix test_status_update_new_serializes_worker_state to parse JSON into
serde_json::Value and assert on the field value instead of using string
substring matching. This ensures the JSON field is present and correctly
typed.

* Return Result from started_webhook_server fixture and consuming tests

Change the `started_webhook_server` rstest fixture to return
`Result<StartedWebhookServer, Box<dyn std::error::Error + Send + Sync>>`
instead of panicking on errors. Replace all `.expect()` calls with `?`.

Update both consuming tests to return `Result` and consume the fixture
with `.await?`. Replace `.expect()` calls on fallible operations with `?`,
and restructure the `old_result` assertion to avoid calling `.unwrap()`.

* Extract hosted_catalog_harness fixture for worker container tests

Add `HostedCatalogHarness` struct containing `WorkerRuntime` and server
join handle. Create async rstest fixture `hosted_catalog_harness` that
spawns a test server with the complex tool catalog and constructs the
runtime.

Update `hosted_worker_proxy_definition_matches_orchestrator_canonical_definition`
to consume the fixture instead of inlining client/runtime setup.

* Expand BDD acronym in execplan constraints

Replace the first standalone "BDD" with "behaviour-driven development (BDD)"
in the harness constraint for clarity.

* Fix markdownlint issues and duration field naming in execplan

Fix double blank lines in docs/roadmap.md that caused markdownlint
MD012 errors.

Change `duration_ms` to `duration` in the execplan to match the actual
ToolOutput struct field name. Update validation gate statement to
acknowledge pre-existing markdownlint issues in roadmap.md.

Wrap long BDD harness line to comply with MD013 line length limit.

* Align execplan with transport contract field names and update validation status

Change "duration" to "duration_ms" in ToolOutput field references to align
with the transport contract wire format used in the Claude bridge.

Update validation gate statement to reflect that all gates now pass,
including markdownlint.

---------

Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: docs Documentation scope: llm LLM integration scope: orchestrator Container orchestrator scope: tool/builtin Built-in tools scope: tool/mcp MCP client scope: tool Tool infrastructure scope: worker Container worker size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant