Add regression coverage for worker reload contracts#123
Conversation
|
Note Reviews pausedUse the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughSummarise worker terminal-state transitions into a single durable persistence call with in-memory rollback on failure; add test harnesses, null/capturing DB doubles and new tests; introduce listener-based WebhookServer lifecycle; extract delegate-based agent dispatcher with phased tool execution; decompose thread-ops into focused modules. Changes
Sequence Diagram(s)sequenceDiagram
participant Worker as Worker
participant Context as ContextManager
participant DB as Database
participant Broad as Broadcaster
Worker->>Context: transition_terminal_state(job_id) — capture prior state, set in-memory terminal
alt prior state non-terminal
Worker->>DB: persist_terminal_result_and_status(TerminalJobPersistence)
alt persistence success
DB-->>Worker: Ok
Worker->>Broad: broadcast SSE "result" event (after durable write)
Worker-->>Worker: return Ok
else persistence failure
DB-->>Worker: Err
Worker->>Context: set_state_rollback(previous)
Worker-->>Worker: return persistence error
end
else prior state already terminal
Worker-->>Worker: reject duplicate terminal transition (no DB call)
end
Possibly related issues
Suggested labels
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Reviewer's GuideAdds regression-style tests to lock in worker/orchestrator HTTP contracts and worker terminal state persistence, refactors worker test setup to share dependencies, and replaces placeholder SIGHUP reload tests with executable integration-style tests over WebhookServer/HttpChannel. Sequence diagram for worker terminal state persistence characterisationsequenceDiagram
actor Test
participant Worker
participant ContextManager
participant CapturingStore
Test->>Worker: make_worker_with_capturing_store()
Worker-->>Test: Worker, CapturingStore
Test->>Worker: transition_to_in_progress()
Worker->>ContextManager: update_context(job_id, set state InProgress)
ContextManager-->>Worker: JobContext(InProgress)
Worker-->>Test: ok
Test->>Worker: mark_failed("budget exceeded")
Worker->>ContextManager: update_context(job_id, transition_to Failed)
ContextManager-->>Worker: JobContext(Failed)
Worker->>CapturingStore: update_job_status(job_id, Failed, "budget exceeded")
CapturingStore-->>Worker: ok
Worker->>CapturingStore: save_job_event(job_id, result, {status: failed})
CapturingStore-->>Worker: ok
Worker-->>Test: ok
Test->>ContextManager: get_context(job_id)
ContextManager-->>Test: JobContext(Failed)
Test->>CapturingStore: captured_calls()
CapturingStore-->>Test: [UpdateJobStatus, SaveJobEvent]
Test->>Test: assert exactly one status update and one result event
Class diagram for worker terminal state persistence test harnessclassDiagram
class Worker {
+Uuid job_id
+WorkerDeps deps
+mark_completed() Result
+mark_failed(reason)
+mark_stuck(reason)
+context_manager() ContextManager
}
class WorkerDeps {
+ContextManager context_manager
+StubLlm llm
+SafetyLayer safety
+ToolRegistry tools
+Database store
+HookRegistry hooks
+Duration timeout
+bool use_planning
+Sender sse_tx
+ApprovalContext approval_context
+HttpInterceptor http_interceptor
}
class ContextManager {
+create_job(user, description) Result~Uuid~
+get_context(job_id) Result~JobContext~
+update_context(job_id, updater) Result~JobContext~
}
class JobContext {
+JobState state
+transition_to(state, reason) Result
}
class JobState {
<<enum>>
Pending
InProgress
Completed
Failed
Stuck
}
class CapturedCall {
<<enum>>
UpdateJobStatus
SaveJobEvent
}
class CapturingStore {
+Mutex~Vec~ calls
+new() CapturingStore
+captured_calls() Vec~CapturedCall~
+doc_not_found(doc_type) WorkspaceError
}
class Database {
<<interface>>
}
class NativeDatabase {
<<interface>>
+run_migrations() Result
}
class NativeJobStore {
<<interface>>
+save_job(ctx) Result
+get_job(id) Result~JobContext~
+update_job_status(id, status, failure_reason) Result
+mark_job_stuck(id) Result
+get_stuck_jobs() Result~Vec~
+save_action(job_id, action) Result
+get_job_actions(job_id) Result~Vec~
+record_llm_call(record) Result~Uuid~
+save_estimation_snapshot(params) Result~Uuid~
+update_estimation_actuals(params) Result
}
class NativeSandboxStore {
<<interface>>
+save_sandbox_job(job) Result
+get_sandbox_job(id) Result
+list_sandbox_jobs() Result
+update_sandbox_job_status(params) Result
+cleanup_stale_sandbox_jobs() Result
+sandbox_job_summary() Result
+list_sandbox_jobs_for_user(user_id) Result
+sandbox_job_summary_for_user(user_id) Result
+sandbox_job_belongs_to_user(job_id, user_id) Result
+update_sandbox_job_mode(id, mode) Result
+get_sandbox_job_mode(id) Result
+save_job_event(job_id, event_type, data) Result
+list_job_events(job_id, before_id, limit) Result
}
class NativeConversationStore {
<<interface>>
}
class NativeRoutineStore {
<<interface>>
}
class NativeToolFailureStore {
<<interface>>
}
class NativeSettingsStore {
<<interface>>
}
class NativeWorkspaceStore {
<<interface>>
}
class ToolRegistry {
+new() ToolRegistry
+register(tool)
}
class Tool {
<<interface>>
}
class StubLlm {
}
class SafetyLayer {
+new(config) SafetyLayer
}
class SafetyConfig {
+u64 max_output_length
+bool injection_check_enabled
}
class HookRegistry {
+new() HookRegistry
}
class ApprovalContext {
}
class HttpInterceptor {
}
class Uuid {
}
class Duration {
}
Worker --> WorkerDeps
WorkerDeps --> ContextManager
WorkerDeps --> StubLlm
WorkerDeps --> SafetyLayer
WorkerDeps --> ToolRegistry
WorkerDeps --> Database
WorkerDeps --> HookRegistry
WorkerDeps --> Duration
WorkerDeps --> ApprovalContext
WorkerDeps --> HttpInterceptor
ContextManager --> JobContext
JobContext --> JobState
CapturingStore ..|> Database
CapturingStore ..|> NativeDatabase
CapturingStore ..|> NativeJobStore
CapturingStore ..|> NativeSandboxStore
CapturingStore ..|> NativeConversationStore
CapturingStore ..|> NativeRoutineStore
CapturingStore ..|> NativeToolFailureStore
CapturingStore ..|> NativeSettingsStore
CapturingStore ..|> NativeWorkspaceStore
CapturingStore --> CapturedCall
CapturedCall --> JobState
ToolRegistry --> Tool
SafetyLayer --> SafetyConfig
HookRegistry --> WorkerDeps
class Mutex {
}
class Vec {
}
CapturingStore --> Mutex
Mutex --> Vec
class WorkspaceError {
}
CapturingStore --> WorkspaceError
NativeJobStore --> Uuid
NativeSandboxStore --> Uuid
NativeDatabase --> WorkspaceError
NativeWorkspaceStore --> WorkspaceError
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
@coderabbitai please create a new GitHub issue for the following: Comment on file };
use crate::safety::SafetyLayer;
use crate::tools::{NativeTool, Tool, ToolError as ToolExecError, ToolOutput};
use tokio::sync::Mutex;❌ Getting worse: Low Cohesion |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Comment on file };
use crate::safety::SafetyLayer;
use crate::tools::{NativeTool, Tool, ToolError as ToolExecError, ToolOutput};
use tokio::sync::Mutex;❌ Getting worse: Code Duplication |
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. tests/worker_orchestrator_contract.rs Comment on file async fn complete(
&self,
_req: CompletionRequest,
) -> Result<CompletionResponse, ironclaw::error::LlmError> {
Ok(CompletionResponse {
content: String::new(),
input_tokens: 0,
output_tokens: 0,
finish_reason: FinishReason::Stop,
cache_read_input_tokens: 0,
cache_creation_input_tokens: 0,
})
}❌ New issue: Code Duplication |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 396
It documents the CodeScene finding, explains the impact, and outlines suggested next steps for decomposing the module into more cohesive sub-modules. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph. If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced. Comment on file };
use crate::safety::SafetyLayer;
use crate::tools::{NativeTool, Tool, ToolError as ToolExecError, ToolOutput};
use tokio::sync::Mutex;❌ Getting worse: Code Duplication |
This comment was marked as resolved.
This comment was marked as resolved.
|
@coderabbitai please create a new GitHub issue for the following: Comment on file };
use crate::safety::SafetyLayer;
use crate::tools::{NativeTool, Tool, ToolError as ToolExecError, ToolOutput};
use tokio::sync::Mutex;❌ Getting worse: Low Cohesion |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 44
It documents the updated CodeScene finding (responsibilities now at 75, up from 74 in |
Split call_llm into dedicated guardrail, retry, and cost-recording helpers so the main delegate hook stays below the target size and each phase is easier to follow.\n\nThis preserves the existing call flow and retry semantics while narrowing call_llm to the orchestration steps.
Factor the shared terminal transition, atomic persistence, and rollback-on-error flow into a private Worker helper and delegate the three terminal state methods to it.\n\nThis preserves the existing payload strings and rollback behaviour while removing duplicated terminal update logic.
Replace the semicolon before “and” in the listener-based lifecycle bullet list with a comma, as requested.\n\nNo other documentation content changed.
|
@coderabbitai Have the following now been resolved? Please address the following concerns and ensure all commit gates succeed:
|
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 2708
Evidence from the current branch:
This collapses the shared snapshot → transition → atomic persist → rollback pattern into a single, well-named function and removes the drift risk across the three mark_* methods. No further changes needed for this item. |
Address the remaining verified review findings across dispatcher,\nsession, persistence, context, null-db, test support, and docs.\n\nThis batch adds compaction property coverage, restores safety and\nerror propagation in tool and session paths, consolidates duplicated\nrollback tests, documents atomic terminal persistence, and checks in\nthe new e2e/proptest artefacts needed to keep the revised tests\nstable.
Align the public ToolCallIndexError message with the requested\ncontract and add regression tests for out-of-bounds indexed tool\nresult and error writes.\n\nThe indexed Turn helpers and production call-site handling were\nalready in place, so this commit closes the remaining API and test\ngaps without changing runtime behaviour.
Update the module-level test documentation to use the requested\nJSON-aware wording while keeping the file layout unchanged.
Align the Postgres rollback regression with the requested two-case\nrstest shape and explicit invalid-job scenarios while preserving the\nexisting rollback assertions and cleanup flow.
Change the null database UUID/cache helpers to return DatabaseError\ninstead of silently recovering from poisoned locks, and thread the\nnew Result through the null-db conversation, job, and workspace\nhelpers.\n\nThe requested null-db library test remains blocked by unrelated\npre-existing compile failures in src/testing/mod.rs.
|
@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph. If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced. Code Duplicationsrc/agent/session.rs: What lead to degradation?The module contains 4 functions with similar structure: Thread.new,Thread.with_id,Turn.record_tool_error_at,Turn.record_tool_result_at Why does this problem occur?Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health. How to fix it?A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, ie. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity. Read More |
Remove stale test-only imports and gate harness tests behind the same\nlibsql plus test-helpers feature combination that provides the\nTestHarnessBuilder::build path.\n\nThis resolves the unused-import lints and the lib-test type errors in\nsrc/testing while keeping the existing test logic unchanged.
|
Tip For best results, initiate chat on the files or code changes.
File: src/agent/session.rs Proposed fix
Rust (shape-only) impl Thread {
// Private: single place for initialising a Thread
fn init(session_id: Uuid, thread_id: Uuid) -> Self {
Thread {
id: thread_id,
session_id,
// ... all the other default fields exactly as used today ...
// e.g., turns: Vec::new(), state: ThreadState::Idle, updated_at: Utc::now(), etc.
}
}
pub fn new(session_id: Uuid) -> Self {
let thread_id = Uuid::new_v4();
Self::init(session_id, thread_id)
}
pub fn with_id(thread_id: Uuid, session_id: Uuid) -> Self {
Self::init(session_id, thread_id)
}
}
Rust (shape-only) impl Turn {
// Private: updates a single tool_call outcome at `idx`
fn set_tool_outcome_at(
&mut self,
idx: usize,
result: Option<serde_json::Value>,
error: Option<String>,
) {
if let Some(tc) = self.tool_calls.get_mut(idx) {
tc.result = result;
tc.error = error;
} else {
// Preserve today’s behaviour: if you currently panic/expect on OOB,
// replicate it here instead of silently ignoring.
// e.g., panic!("tool call index {} out of bounds", idx);
}
}
pub fn record_tool_result_at(&mut self, idx: usize, result: serde_json::Value) {
self.set_tool_outcome_at(idx, Some(result), None);
}
pub fn record_tool_error_at(&mut self, idx: usize, error: impl Into<String>) {
self.set_tool_outcome_at(idx, None, Some(error.into()));
}
}Why this resolves the finding
AI coding‑agent prompt If further refinement is deleterious |
Remove the explicit lifetime from get_or_create_in_cache to satisfy\nclippy::needless_lifetimes without changing the helper's behaviour\nor call sites.
Apply rustfmt to the rollback test after the earlier rstest refactor so\nCI's formatting check matches the branch.
Remove the duplicated Thread construction and indexed tool-outcome\nmutation logic in src/agent/session.rs while preserving the current\nResult-based public API and behaviour.\n\nThe refactor is covered by the existing session test suite, which now\npasses unchanged.
There was a problem hiding this comment.
Code Health Improved
(3 files improve in Code Health)
Gates Failed
Enforce critical code health rules
(1 file with Low Cohesion)
Gates Passed
4 Quality Gates Passed
See analysis details in CodeScene
Reason for failure
| Enforce critical code health rules | Violations | Code Health Impact | |
|---|---|---|---|
| mod.rs | 1 critical rule | 5.06 → 6.37 | Suppress |
View Improvements
| File | Code Health Impact | Categories Improved |
|---|---|---|
| job.rs | 5.57 → 5.91 | Low Cohesion, Code Duplication |
| thread_ops.rs | 5.72 → 9.69 | Low Cohesion, Code Duplication, Complex Method, Overall Code Complexity, Excess Number of Function Arguments |
| mod.rs | 5.06 → 6.37 | Complex Method, Bumpy Road Ahead, Deep, Nested Complexity, Excess Number of Function Arguments |
Absence of Expected Change Pattern
- axinite/src/db/mod.rs is usually changed with: axinite/src/db/postgres.rs
Quality Gate Profile: Pay Down Tech Debt
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
Replace the SIGHUP scaffolding with executable integration tests and add worker-orchestrator contract checks plus terminal state persistence characterisation tests so these boundaries stay aligned.
Summary by Sourcery
Add regression tests to pin down worker–orchestrator HTTP contracts and terminal job-state persistence, and replace the previous SIGHUP scaffolding test with concrete webhook reload integration tests.
Enhancements:
Tests: