Complete 1.1.4 hosted-mode tests: fidelity, routing, parity (refactor)#57
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
Summary by CodeRabbitRelease Notes
WalkthroughSummarise hosted-mode test artefacts, add shared test builders, reorganise and expand unit/integration tests for schema fidelity, execution routing and transport parity, mark roadmap/RFC item 1.1.4 complete, and refactor webhook server test helpers and related test fixtures. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as WorkerRuntime
participant Worker as WorkerHttpClient
participant Catalogue as RemoteCatalogServer
participant Proxy as WorkerRemoteToolProxy
participant Orchestrator as OrchestratorExecuteEndpoint
Client->>Catalogue: GET /worker/{job_id}/tools/catalog
Catalogue-->>Client: 200 RemoteToolCatalogResponse (ToolDefinition)
Client->>Worker: register_remote_tools()
Worker-->>Proxy: instantiate proxy exposing ToolDefinition
Client->>Proxy: execute(tool, params)
Proxy->>Orchestrator: POST /worker/{job_id}/tools/execute
Orchestrator-->>Proxy: 200 RemoteToolExecutionResponse (ToolOutput)
Proxy-->>Client: return ToolOutput (preserve all fields)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Reviewer's GuideAdds the canonical ExecPlan for roadmap item 1.1.4 and a comprehensive hosted MCP test matrix that verifies schema fidelity, execution routing, and worker-orchestrator contract parity across orchestrator, worker, and proxy layers, plus minor webhook restart test hardening and documentation updates marking 1.1.4 as complete. Sequence diagram for worker_remote_tool_proxy execution routing through orchestrator endpointsequenceDiagram
participant Test as RoutePathTest
participant Proxy as WorkerRemoteToolProxy
participant Client as WorkerHttpClient
participant Orchestrator as OrchestratorRouter
participant Handler as execute_tool_with_route_capture
participant State as RouteCapturingState
Test->>Proxy: execute(params, JobContext)
Proxy->>Client: send_execution_request(job_id, tool_name, params)
Client->>Orchestrator: POST /worker/{job_id}/tools/execute
Orchestrator->>Handler: route REMOTE_TOOL_EXECUTE_ROUTE
Handler->>State: lock received_requests
State-->>Handler: push(route_path, job_id, tool_name)
Handler-->>Orchestrator: RemoteToolExecutionResponse
Orchestrator-->>Client: HTTP 200 + JSON body
Client-->>Proxy: ToolOutput
Proxy-->>Test: ToolOutput
Test-->>Test: assert route_path == "/worker/{job_id}/tools/execute"
Test-->>Test: assert received_job_id == job_id
Test-->>Test: assert tool_name == "route_test_tool"
Updated class diagram for remote tool transport and proxy types under testclassDiagram
class ToolDefinition {
+String name
+String description
+serde_json::Value parameters
}
class RemoteToolCatalogResponse {
+Vec~ToolDefinition~ tools
+Vec~String~ toolset_instructions
+u64 catalog_version
}
class RemoteToolExecutionRequest {
+String tool_name
+serde_json::Value params
}
class ToolOutput {
+serde_json::Value result
+Option~rust_decimal::Decimal~ cost
+Option~String~ raw
+std::time::Duration duration
+success(result, duration) ToolOutput
+with_cost(cost) ToolOutput
+with_raw(raw) ToolOutput
}
class RemoteToolExecutionResponse {
+ToolOutput output
}
class WorkerRemoteToolProxy {
-ToolDefinition definition
-Arc~WorkerHttpClient~ client
+new(definition, client) WorkerRemoteToolProxy
+execute(params, job_context) ToolOutput
+name() &str
+description() &str
+parameters_schema() serde_json::Value
}
class WorkerHttpClient {
+String base_url
+uuid::Uuid job_id
+String token
+new(base_url, job_id, token) WorkerHttpClient
}
class RouteCapturingState {
+Arc~tokio::sync::Mutex~\<Vec~(String, uuid::Uuid, String)~\> received_requests
}
ToolDefinition <|-- RemoteToolCatalogResponse : aggregates
ToolOutput <|-- RemoteToolExecutionResponse : contains
WorkerRemoteToolProxy --> ToolDefinition : wraps
WorkerRemoteToolProxy --> WorkerHttpClient : uses
RemoteToolExecutionRequest --> ToolDefinition : references name
RouteCapturingState --> RemoteToolExecutionRequest : records
RemoteToolExecutionResponse --> ToolOutput : returns
File-Level Changes
Assessment against linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Comment on file fn remote_tool_transport_types_round_trip_without_field_loss() {
let catalog_response = RemoteToolCatalogResponse {
tools: vec![ToolDefinition {
name: "test_tool".to_string(),
description: "A **complex** test tool with UTF-8: \u{1F680}\u{1F4A1}.".to_string(),
parameters: serde_json::json!({
"type": "object",
"title": "TestParams",
"properties": {
"query": {
"type": "string",
"minLength": 1,
"maxLength": 100
},
"options": {
"type": "object",
"properties": {
"limit": {"type": "integer", "minimum": 1, "maximum": 50}
},
"required": ["limit"]
}
},
"required": ["query", "options"]
}),
}],
toolset_instructions: vec![
"Prefer remote tools for external systems.".to_string(),
"Use local tools for filesystem operations.".to_string(),
],
catalog_version: 42,
};
let serialized =
serde_json::to_string(&catalog_response).expect("serialize RemoteToolCatalogResponse");
let deserialized: RemoteToolCatalogResponse =
serde_json::from_str(&serialized).expect("deserialize RemoteToolCatalogResponse");
assert_eq!(deserialized.tools.len(), catalog_response.tools.len());
assert_eq!(deserialized.tools[0].name, catalog_response.tools[0].name);
assert_eq!(
deserialized.tools[0].description,
catalog_response.tools[0].description
);
assert_eq!(
deserialized.tools[0].parameters,
catalog_response.tools[0].parameters
);
assert_eq!(
deserialized.toolset_instructions,
catalog_response.toolset_instructions
);
assert_eq!(
deserialized.catalog_version,
catalog_response.catalog_version
);
let execution_request = RemoteToolExecutionRequest {
tool_name: "complex_tool".to_string(),
params: serde_json::json!({
"query": "test query",
"options": {"limit": 25}
}),
};
let serialized = serde_json::to_string(&execution_request)
.expect("serialize RemoteToolExecutionRequest");
let deserialized: RemoteToolExecutionRequest =
serde_json::from_str(&serialized).expect("deserialize RemoteToolExecutionRequest");
assert_eq!(deserialized.tool_name, execution_request.tool_name);
assert_eq!(deserialized.params, execution_request.params);
let execution_response = RemoteToolExecutionResponse {
output: ToolOutput::success(
serde_json::json!({"result": "success", "data": [1, 2, 3]}),
std::time::Duration::from_millis(42),
)
.with_cost(rust_decimal::Decimal::new(150, 2))
.with_raw("raw execution output"),
};
let serialized = serde_json::to_string(&execution_response)
.expect("serialize RemoteToolExecutionResponse");
let deserialized: RemoteToolExecutionResponse =
serde_json::from_str(&serialized).expect("deserialize RemoteToolExecutionResponse");
assert_eq!(deserialized.output.result, execution_response.output.result);
assert_eq!(deserialized.output.cost, execution_response.output.cost);
assert_eq!(deserialized.output.raw, execution_response.output.raw);
assert_eq!(
deserialized.output.duration,
execution_response.output.duration
);
}❌ New issue: Large Method |
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7216ef007e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`:
- Line 8: The document currently states "Status: COMPLETE" while Milestone 7 and
CI gates haven't been validated; either update the plan to mark "milestone 7" as
complete and add records of passing lint and test gates (include the specific
gate artifacts) before keeping "Status: COMPLETE", or change "Status: COMPLETE"
to "IN-PROGRESS" (or similar) until the lint and test gates are confirmed;
search for the literal "Status: COMPLETE" and "milestone 7" in this doc (and the
other affected locations) and make the consistent change and add a brief note
recording the lint/test pass artifacts when marking complete.
- Line 648: Replace the British spelling "synchronise" in the heading
"synchronise design and operator documentation" with the Oxford -ize form
"synchronize" (and similarly change any "Synchronise" to "Synchronize") to
conform to en-GB-oxendict; also scan this document for the other occurrence
mentioned (the similar heading/text later in the file) and update it the same
way so all instances use "-ize".
- Around line 913-916: The two fenced code blocks that currently show cargo
commands (the blocks containing "cargo fmt --all -- --check" and "cargo fmt
--manifest-path tools-src/github/Cargo.toml --all -- --check") need language
identifiers; update both fenced code blocks to use "```bash" as the opening
fence (also apply the same change to the other occurrence noted around the
second block) so markdown lint and repo style checks pass.
In `@src/orchestrator/api/tests/remote_tools.rs`:
- Around line 369-503: The file is over the size limit; move the large
catalogue/execution fidelity tests into new submodules to keep this file under
400 lines: create e.g. modules catalogue_fidelity and transport_parity and
relocate the tests remote_tool_catalog_preserves_full_tool_definition_payload,
remote_tool_catalog_version_is_deterministic_and_sensitive_to_content, and
orchestrator_responses_deserialize_into_worker_shared_types into the appropriate
new files, exporting any helpers they need (e.g., hosted_remote_tool_catalog,
complex_tool_stub, build_tool_fixture, ToolFixture) or importing the parent
crate items; update this file to mod catalogue_fidelity; mod transport_parity;
(or use pub mod) and ensure test attributes (#[tokio::test], #[rstest]) remain
on the moved functions so cargo test picks them up.
- Around line 444-503: Split the single test
orchestrator_responses_deserialize_into_worker_shared_types into three focused
tokio tests: one that only round-trips RemoteToolCatalogResponse (use
hosted_remote_tool_catalog, ToolRegistry/complex_tool_stub, assert tools
length/equality, toolset_instructions and catalog_version), one that only
round-trips RemoteToolExecutionRequest (construct execution_request,
serialize/deserialize, assert tool_name and params), and one that only
round-trips RemoteToolExecutionResponse (build execution_output via
tools::ToolOutput::success(...).with_cost(...).with_raw(...), wrap in
RemoteToolExecutionResponse, serialize/deserialize, assert output.result,
output.cost, output.raw, output.duration). Move any shared setup (like
registry/catalog helper calls) into small helpers used by the relevant test to
keep each test single-responsibility.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 55d624db-88e1-4b30-91f7-8cf8132805a9
⛔ Files ignored due to path filters (1)
.agents/mcp/context_pack/packs/.repo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
docs/contents.mddocs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.mddocs/rfcs/0001-expose-mcp-tool-definitions.mddocs/roadmap.mdsrc/channels/webhook_server.rssrc/orchestrator/api/tests/fixtures/remote_tool_mocks.rssrc/orchestrator/api/tests/remote_tools.rssrc/tools/builtin/worker_remote_tool_proxy.rssrc/worker/api/types.rssrc/worker/container/tests.rs
Update the ExecPlan status from IN-PROGRESS to COMPLETE to match the Outcomes & Retrospective section which states the status was updated to COMPLETE. Fix test_status_update_new_serializes_worker_state to parse JSON into serde_json::Value and assert on the field value instead of using string substring matching. This ensures the JSON field is present and correctly typed.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Change the `started_webhook_server` rstest fixture to return `Result<StartedWebhookServer, Box<dyn std::error::Error + Send + Sync>>` instead of panicking on errors. Replace all `.expect()` calls with `?`. Update both consuming tests to return `Result` and consume the fixture with `.await?`. Replace `.expect()` calls on fallible operations with `?`, and restructure the `old_result` assertion to avoid calling `.unwrap()`.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Add `HostedCatalogHarness` struct containing `WorkerRuntime` and server join handle. Create async rstest fixture `hosted_catalog_harness` that spawns a test server with the complex tool catalog and constructs the runtime. Update `hosted_worker_proxy_definition_matches_orchestrator_canonical_definition` to consume the fixture instead of inlining client/runtime setup.
|
@coderabbitai review |
1 similar comment
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/channels/webhook_server.rs (1)
64-71:⚠️ Potential issue | 🟠 MajorCapture the actual bound address when binding to port 0.
When
self.config.addrspecifies port 0, the OS assigns an ephemeral port. The log now misleadingly prints port 0 instead of the real bound port. Worse,current_addr()will return the stale value, breaking the SIGHUP restart logic insrc/main.rs(lines 792–812) which comparescurrent_addr()against the new configured address.Update
self.config.addrfromlistener.local_addr()after binding succeeds.Proposed fix
let listener = tokio::net::TcpListener::bind(self.config.addr) .await .map_err(|e| ChannelError::StartupFailed { name: "webhook_server".to_string(), reason: format!("Failed to bind to {}: {}", self.config.addr, e), })?; - tracing::info!("Webhook server listening on {}", self.config.addr); + let actual_addr = listener.local_addr().map_err(|e| ChannelError::StartupFailed { + name: "webhook_server".to_string(), + reason: format!("Failed to get local address: {e}"), + })?; + self.config.addr = actual_addr; + tracing::info!("Webhook server listening on {}", actual_addr);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/channels/webhook_server.rs` around lines 64 - 71, After successfully binding the TcpListener (tokio::net::TcpListener::bind) capture the real bound socket via listener.local_addr() and assign it back to self.config.addr before emitting the tracing::info log and returning, so logs and downstream logic (including current_addr() used by the SIGHUP restart path) reflect the actual ephemeral port; update the tracing::info call to use the updated self.config.addr and ensure this assignment happens immediately after the bind/map_err block succeeds.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`:
- Around line 217-220: Replace the first standalone "BDD" in the phrase "BDD
harness: if adding `rstest-bdd` behavioural coverage..." with "behaviour-driven
development (BDD)" so the sentence reads "behaviour-driven development (BDD)
harness: if adding `rstest-bdd` behavioural coverage..." and leave subsequent
occurrences of "BDD" unchanged; update only that first instance in the sentence
that begins with "BDD harness:".
---
Outside diff comments:
In `@src/channels/webhook_server.rs`:
- Around line 64-71: After successfully binding the TcpListener
(tokio::net::TcpListener::bind) capture the real bound socket via
listener.local_addr() and assign it back to self.config.addr before emitting the
tracing::info log and returning, so logs and downstream logic (including
current_addr() used by the SIGHUP restart path) reflect the actual ephemeral
port; update the tracing::info call to use the updated self.config.addr and
ensure this assignment happens immediately after the bind/map_err block
succeeds.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 24b39cde-4031-46db-b404-6454dee0593d
📒 Files selected for processing (7)
docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.mdsrc/channels/webhook_server.rssrc/worker/api/tests/finish_reason.rssrc/worker/api/tests/fixtures.rssrc/worker/api/tests/remote_tool_catalog.rssrc/worker/api/tests/remote_tool_execute.rssrc/worker/container/tests/hosted_fidelity.rs
Replace the first standalone "BDD" with "behaviour-driven development (BDD)" in the harness constraint for clarity.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`:
- Around line 907-908: The document currently claims "All validation gates …
passed successfully" but `markdownlint` reported issues in docs/roadmap.md;
either remove/fix the lint failures in docs/roadmap.md (resolve markdownlint
issues and re-run checks so the pass claim is accurate) or change the
summary/status sentence to reflect a blocked or passing-with-known-failures
state and update the Status/milestone wording accordingly (e.g., note which
gates failed: markdownlint on docs/roadmap.md), ensuring the statement and the
lint evidence are consistent.
- Around line 896-897: Summary: The docs use the non-canonical ToolOutput field
name "duration" instead of the transport-contract-compliant "duration_ms";
update the wording to match the contract and tests. Fix: in the "Full
`ToolOutput` field preservation" bullet and any related mentions (search for the
symbol ToolOutput and the literal "duration"), replace "duration" with
"duration_ms" so the documentation aligns with the field name used in the
transport contract and tests and keeps the coverage statement consistent with
the earlier reference to the contract.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 78560c9c-684d-498a-b9d9-ef445fc064d4
📒 Files selected for processing (1)
docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md
Fix double blank lines in docs/roadmap.md that caused markdownlint MD012 errors. Change `duration_ms` to `duration` in the execplan to match the actual ToolOutput struct field name. Update validation gate statement to acknowledge pre-existing markdownlint issues in roadmap.md. Wrap long BDD harness line to comply with MD013 line length limit.
…ion status Change "duration" to "duration_ms" in ToolOutput field references to align with the transport contract wire format used in the Claude bridge. Update validation gate statement to reflect that all gates now pass, including markdownlint.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
#57) * docs(execplans): add execplan for schema fidelity and execution routing tests Add a detailed execution plan (ExecPlan) that outlines the comprehensive testing strategy for roadmap item 1.1.4. The plan addresses schema fidelity, execution routing, and contract parity between worker and orchestrator, defining milestones, constraints, and test gaps. The document includes approval gates, repository orientation, risk assessment, and progression criteria to guide the implementation of targeted tests ensuring no silent regressions in hosted-mode functionality. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(hosted-remote-tools): add tests for schema fidelity and execution routing Add a comprehensive test matrix validating hosted-mode tools for schema fidelity, execution routing, and contract parity between orchestrator and worker. Implement new test fixtures with complex nested JSON schemas and special characters to verify full payload preservation through the tool catalog and execution proxy. New tests include: - Full ToolDefinition payload fidelity through the catalog endpoint. - Proxy preserves all ToolDefinition and ToolOutput fields exactly. - Execution routes correctly through the orchestrator endpoint. - Catalog version determinism and sensitivity to tool changes. - Round-trip serialization of shared worker/orchestrator transport types. - Validation of route constants shared between worker and orchestrator. - End-to-end validation that proxy definitions match orchestrator canonicals. Also mark roadmap item 1.1.4 complete, update RFC 0001 implementation status and documentation accordingly. This strengthens the worker-orchestrator contract guarantees and prevents silent field loss or routing errors in hosted remote tools. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker_remote_tool_proxy): fix format string reference in execution routing test Changed format string argument to a reference to match expected type and improve clarity in test assertion for routing execution through orchestrator endpoint. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(api): add round-trip serialization tests for remote tool types Added serialization and deserialization round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse to ensure no field loss during JSON processing. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker/api): add serialization fidelity and round-trip tests for remote tool API types - Added comprehensive serialization and deserialization round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse types. - Added remote tool route constant tests ensuring route strings match expected orchestrator endpoints and correct parameter expansion. - Moved and expanded existing tests from worker/api/types.rs into dedicated test module in worker/api/tests.rs. - Added new orchestrator API tests to validate catalog version independence of tool registration order and execution request/response serialization fidelity. - Derived PartialEq on RemoteToolCatalogResponse to support equality assertions in tests. - Updated documentation to synchronize terminology and reflect test pass status. These tests improve confidence in API data structure integrity and interoperability between orchestrator and worker components. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker/api): refactor and modularize worker API tests - Removed monolithic src/worker/api/tests.rs test file and split its contents into multiple focused test files under src/worker/api/tests/. - Added new test modules: finish_reason.rs, fixtures.rs, mod.rs, remote_tool_catalog.rs, remote_tool_execute.rs, transport_types.rs, url_construction.rs. - Introduced shared test fixtures to support remote tool failure simulation and sample data. - Improved test organization by grouping tests by feature and responsibilities. - Preserved existing test coverage for HTTP client behavior, API type conversions, route constants, and error handling. This refactor improves test maintainability and readability by decomposing a large test file into specialized modules. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(orchestrator/api): add remote tool fidelity tests for schema and serialization - Added new test module remote_tool_fidelity.rs covering: - Schema fidelity of remote tool catalog - Catalog versioning determinism and sensitivity to content - Catalog version independence from registration order - Round-trip serialization and deserialization of shared orchestrator-worker types - Removed same tests from remote_tools.rs to centralize and clarify test coverage - Minor formatting and import cleanup in worker api tests Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker_remote_tool_proxy): add ProxyTestServer fixture to simplify tests Introduce `ProxyTestServer` fixture bundling an in-process execute-route server and a pre-wired HTTP client. This consolidates repeated server setup code in tests, improves readability, and enables using rstest features for cleaner asynchronous test setups. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * refactor(tests): centralize complex tool definition in shared test_support module Extract complex tool definition JSON schema and builders into new src/test_support.rs module to reduce duplication across orchestrator and worker test suites. Update tests and fixtures to use the shared builder, improving consistency and maintainability of test data for tool definition fidelity testing. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker_remote_tool_proxy): move tests to a dedicated tests module Moved all tests from the main worker_remote_tool_proxy module into a new separate tests.rs file. This cleans up the main module and improves maintainability by isolating test code from production code. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(remote_tools): add transport parity and hosted fidelity tests - Add new test module `transport_parity` to verify serialisation round-trips for shared orchestrator-worker transport types. - Add end-to-end fidelity tests in `hosted_fidelity` to ensure ToolDefinition matches exactly between orchestrator catalog and worker proxy. - Rename `remote_tool_fidelity.rs` to `catalogue_fidelity.rs` and remove redundant tests now covered by transport parity. - Improve test fixtures and refactor test declarations for better async error handling. - Add assertions verifying route constant string contents to ensure route parity by construction. These changes improve the robustness of remote tool integration testing and ensure data fidelity across the orchestrator and worker boundaries. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(channels/webhook_server): enhance test fixtures and coverage for restart_with_addr - Introduced StartedWebhookServer fixture for cleaner test setup - Simplified obtaining available ports with TcpListener for tests - Added rstest parameterization and improved async test structure - Verified server restart behavior including rollback on bind failure - Ensured health endpoint responds correctly after restarts - Improved documentation and test reliability in webhook_server tests Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker): add failure-mode mock server for remote-tool API tests - Introduce RemoteToolFailureServer and RemoteToolFailureRoute enums - Provide a fixture to spawn mock servers rejecting specific routes with HTTP errors - Enhance test infrastructure in remote_tools.rs and fixtures.rs - Enable targeted testing of error-handling paths in remote-tool client code Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * Implement Drop for RemoteToolFailureServer to abort background task Add `impl Drop for RemoteToolFailureServer` that calls `self.handle.abort()` to ensure the spawned Tokio task is cancelled when the test fixture is dropped. Update test files to clone `base_url` instead of moving fields out of the struct, which is no longer permitted with the Drop implementation. Remove manual `abort()` and `await` calls in tests since Drop now handles cleanup. * Update execplan status and fix JSON parsing in test Update the ExecPlan status from IN-PROGRESS to COMPLETE to match the Outcomes & Retrospective section which states the status was updated to COMPLETE. Fix test_status_update_new_serializes_worker_state to parse JSON into serde_json::Value and assert on the field value instead of using string substring matching. This ensures the JSON field is present and correctly typed. * Return Result from started_webhook_server fixture and consuming tests Change the `started_webhook_server` rstest fixture to return `Result<StartedWebhookServer, Box<dyn std::error::Error + Send + Sync>>` instead of panicking on errors. Replace all `.expect()` calls with `?`. Update both consuming tests to return `Result` and consume the fixture with `.await?`. Replace `.expect()` calls on fallible operations with `?`, and restructure the `old_result` assertion to avoid calling `.unwrap()`. * Extract hosted_catalog_harness fixture for worker container tests Add `HostedCatalogHarness` struct containing `WorkerRuntime` and server join handle. Create async rstest fixture `hosted_catalog_harness` that spawns a test server with the complex tool catalog and constructs the runtime. Update `hosted_worker_proxy_definition_matches_orchestrator_canonical_definition` to consume the fixture instead of inlining client/runtime setup. * Expand BDD acronym in execplan constraints Replace the first standalone "BDD" with "behaviour-driven development (BDD)" in the harness constraint for clarity. * Fix markdownlint issues and duration field naming in execplan Fix double blank lines in docs/roadmap.md that caused markdownlint MD012 errors. Change `duration_ms` to `duration` in the execplan to match the actual ToolOutput struct field name. Update validation gate statement to acknowledge pre-existing markdownlint issues in roadmap.md. Wrap long BDD harness line to comply with MD013 line length limit. * Align execplan with transport contract field names and update validation status Change "duration" to "duration_ms" in ToolOutput field references to align with the transport contract wire format used in the Claude bridge. Update validation gate statement to reflect that all gates now pass, including markdownlint. --------- Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
#57) * docs(execplans): add execplan for schema fidelity and execution routing tests Add a detailed execution plan (ExecPlan) that outlines the comprehensive testing strategy for roadmap item 1.1.4. The plan addresses schema fidelity, execution routing, and contract parity between worker and orchestrator, defining milestones, constraints, and test gaps. The document includes approval gates, repository orientation, risk assessment, and progression criteria to guide the implementation of targeted tests ensuring no silent regressions in hosted-mode functionality. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(hosted-remote-tools): add tests for schema fidelity and execution routing Add a comprehensive test matrix validating hosted-mode tools for schema fidelity, execution routing, and contract parity between orchestrator and worker. Implement new test fixtures with complex nested JSON schemas and special characters to verify full payload preservation through the tool catalog and execution proxy. New tests include: - Full ToolDefinition payload fidelity through the catalog endpoint. - Proxy preserves all ToolDefinition and ToolOutput fields exactly. - Execution routes correctly through the orchestrator endpoint. - Catalog version determinism and sensitivity to tool changes. - Round-trip serialization of shared worker/orchestrator transport types. - Validation of route constants shared between worker and orchestrator. - End-to-end validation that proxy definitions match orchestrator canonicals. Also mark roadmap item 1.1.4 complete, update RFC 0001 implementation status and documentation accordingly. This strengthens the worker-orchestrator contract guarantees and prevents silent field loss or routing errors in hosted remote tools. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker_remote_tool_proxy): fix format string reference in execution routing test Changed format string argument to a reference to match expected type and improve clarity in test assertion for routing execution through orchestrator endpoint. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(api): add round-trip serialization tests for remote tool types Added serialization and deserialization round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse to ensure no field loss during JSON processing. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker/api): add serialization fidelity and round-trip tests for remote tool API types - Added comprehensive serialization and deserialization round-trip tests for RemoteToolCatalogResponse, RemoteToolExecutionRequest, and RemoteToolExecutionResponse types. - Added remote tool route constant tests ensuring route strings match expected orchestrator endpoints and correct parameter expansion. - Moved and expanded existing tests from worker/api/types.rs into dedicated test module in worker/api/tests.rs. - Added new orchestrator API tests to validate catalog version independence of tool registration order and execution request/response serialization fidelity. - Derived PartialEq on RemoteToolCatalogResponse to support equality assertions in tests. - Updated documentation to synchronize terminology and reflect test pass status. These tests improve confidence in API data structure integrity and interoperability between orchestrator and worker components. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker/api): refactor and modularize worker API tests - Removed monolithic src/worker/api/tests.rs test file and split its contents into multiple focused test files under src/worker/api/tests/. - Added new test modules: finish_reason.rs, fixtures.rs, mod.rs, remote_tool_catalog.rs, remote_tool_execute.rs, transport_types.rs, url_construction.rs. - Introduced shared test fixtures to support remote tool failure simulation and sample data. - Improved test organization by grouping tests by feature and responsibilities. - Preserved existing test coverage for HTTP client behavior, API type conversions, route constants, and error handling. This refactor improves test maintainability and readability by decomposing a large test file into specialized modules. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(orchestrator/api): add remote tool fidelity tests for schema and serialization - Added new test module remote_tool_fidelity.rs covering: - Schema fidelity of remote tool catalog - Catalog versioning determinism and sensitivity to content - Catalog version independence from registration order - Round-trip serialization and deserialization of shared orchestrator-worker types - Removed same tests from remote_tools.rs to centralize and clarify test coverage - Minor formatting and import cleanup in worker api tests Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker_remote_tool_proxy): add ProxyTestServer fixture to simplify tests Introduce `ProxyTestServer` fixture bundling an in-process execute-route server and a pre-wired HTTP client. This consolidates repeated server setup code in tests, improves readability, and enables using rstest features for cleaner asynchronous test setups. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * refactor(tests): centralize complex tool definition in shared test_support module Extract complex tool definition JSON schema and builders into new src/test_support.rs module to reduce duplication across orchestrator and worker test suites. Update tests and fixtures to use the shared builder, improving consistency and maintainability of test data for tool definition fidelity testing. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker_remote_tool_proxy): move tests to a dedicated tests module Moved all tests from the main worker_remote_tool_proxy module into a new separate tests.rs file. This cleans up the main module and improves maintainability by isolating test code from production code. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(remote_tools): add transport parity and hosted fidelity tests - Add new test module `transport_parity` to verify serialisation round-trips for shared orchestrator-worker transport types. - Add end-to-end fidelity tests in `hosted_fidelity` to ensure ToolDefinition matches exactly between orchestrator catalog and worker proxy. - Rename `remote_tool_fidelity.rs` to `catalogue_fidelity.rs` and remove redundant tests now covered by transport parity. - Improve test fixtures and refactor test declarations for better async error handling. - Add assertions verifying route constant string contents to ensure route parity by construction. These changes improve the robustness of remote tool integration testing and ensure data fidelity across the orchestrator and worker boundaries. Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(channels/webhook_server): enhance test fixtures and coverage for restart_with_addr - Introduced StartedWebhookServer fixture for cleaner test setup - Simplified obtaining available ports with TcpListener for tests - Added rstest parameterization and improved async test structure - Verified server restart behavior including rollback on bind failure - Ensured health endpoint responds correctly after restarts - Improved documentation and test reliability in webhook_server tests Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * test(worker): add failure-mode mock server for remote-tool API tests - Introduce RemoteToolFailureServer and RemoteToolFailureRoute enums - Provide a fixture to spawn mock servers rejecting specific routes with HTTP errors - Enhance test infrastructure in remote_tools.rs and fixtures.rs - Enable targeted testing of error-handling paths in remote-tool client code Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com> * Implement Drop for RemoteToolFailureServer to abort background task Add `impl Drop for RemoteToolFailureServer` that calls `self.handle.abort()` to ensure the spawned Tokio task is cancelled when the test fixture is dropped. Update test files to clone `base_url` instead of moving fields out of the struct, which is no longer permitted with the Drop implementation. Remove manual `abort()` and `await` calls in tests since Drop now handles cleanup. * Update execplan status and fix JSON parsing in test Update the ExecPlan status from IN-PROGRESS to COMPLETE to match the Outcomes & Retrospective section which states the status was updated to COMPLETE. Fix test_status_update_new_serializes_worker_state to parse JSON into serde_json::Value and assert on the field value instead of using string substring matching. This ensures the JSON field is present and correctly typed. * Return Result from started_webhook_server fixture and consuming tests Change the `started_webhook_server` rstest fixture to return `Result<StartedWebhookServer, Box<dyn std::error::Error + Send + Sync>>` instead of panicking on errors. Replace all `.expect()` calls with `?`. Update both consuming tests to return `Result` and consume the fixture with `.await?`. Replace `.expect()` calls on fallible operations with `?`, and restructure the `old_result` assertion to avoid calling `.unwrap()`. * Extract hosted_catalog_harness fixture for worker container tests Add `HostedCatalogHarness` struct containing `WorkerRuntime` and server join handle. Create async rstest fixture `hosted_catalog_harness` that spawns a test server with the complex tool catalog and constructs the runtime. Update `hosted_worker_proxy_definition_matches_orchestrator_canonical_definition` to consume the fixture instead of inlining client/runtime setup. * Expand BDD acronym in execplan constraints Replace the first standalone "BDD" with "behaviour-driven development (BDD)" in the harness constraint for clarity. * Fix markdownlint issues and duration field naming in execplan Fix double blank lines in docs/roadmap.md that caused markdownlint MD012 errors. Change `duration_ms` to `duration` in the execplan to match the actual ToolOutput struct field name. Update validation gate statement to acknowledge pre-existing markdownlint issues in roadmap.md. Wrap long BDD harness line to comply with MD013 line length limit. * Align execplan with transport contract field names and update validation status Change "duration" to "duration_ms" in ToolOutput field references to align with the transport contract wire format used in the Claude bridge. Update validation gate statement to reflect that all gates now pass, including markdownlint. --------- Co-authored-by: devboxerhub[bot] <devboxerhub[bot]@users.noreply.github.com>
Summary
src/*to improve maintainability and reuse fixtures..agents/mcp/context_pack/packs/.repo.lock.Changes
Rationale
Scope and approach
Milestones (high level)
Validation plan
Documentation impact
Next steps
◳ Generated by DevBoxer and task references preserved in the ExecPlan document.
📎 Task: https://www.devboxer.com/task/cdab742f-1a7e-43c9-9bff-b5ac8b4fe0bf