78 changes: 78 additions & 0 deletions docs/testing-strategy.md
@@ -82,6 +82,84 @@
context accumulation without external model variance. The browser E2E suite
uses the same principle at the HTTP and DOM layer: real runtime, fake upstream
provider.

### 2.6 Routine and heartbeat test helpers

The `tests/support/routines.rs` module provides shared helpers for E2E tests
that exercise the routine engine and heartbeat runner. These helpers are
gated behind the `libsql` feature and use `TraceLlm` for deterministic
execution without live LLM calls.

#### `create_test_db() -> Result<(Arc<dyn Database>, TempDir), Box<dyn std::error::Error>>`

Creates a temporary libSQL database with migrations applied. Returns the
database handle and the temporary directory; the caller must hold on to the
directory so the database file stays alive for the duration of the test. The
helper propagates errors with `?` instead of panicking internally, so each
test can call `.expect()` with its own descriptive message.
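
To illustrate why the directory guard is returned alongside the handle, here is a minimal, std-only sketch. `TempDirGuard` and `create_scratch_db` are hypothetical stand-ins introduced for this example; they are not the real `TempDir` type or `create_test_db`, and the "database" is just an empty file.

```rust
use std::error::Error;
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a temp-dir guard: removes the directory on drop.
pub struct TempDirGuard {
    path: PathBuf,
}

impl TempDirGuard {
    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempDirGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because Drop cannot fail.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Hypothetical analogue of `create_test_db`: creates a scratch directory
/// holding an empty "database" file and returns the file path with the guard.
pub fn create_scratch_db(name: &str) -> Result<(PathBuf, TempDirGuard), Box<dyn Error>> {
    let dir = std::env::temp_dir().join(format!("{}-{}", name, std::process::id()));
    fs::create_dir_all(&dir)?; // `?` hands I/O errors to the caller
    // Build the guard first so a later failure still cleans up the directory.
    let guard = TempDirGuard { path: dir };
    let db_file = guard.path().join("test.db");
    fs::write(&db_file, b"")?;
    Ok((db_file, guard))
}
```

If the guard were dropped inside the helper, the directory would be removed before the test ever touched the file. Binding it in the test (as `_tmp` in the tests in this change) keeps it alive until the test function returns.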

#### `create_workspace(db: &Arc<dyn Database>) -> Arc<Workspace>`

Creates a workspace backed by the test database. The workspace is used by
routine engine and heartbeat tests for persistence.

#### `make_routine(name: &str, trigger: Trigger, prompt: &str) -> Routine`

Factory function for creating a `Routine` with sensible defaults for tests.
Sets up a lightweight action with the given prompt and trigger configuration.
All guardrails use permissive defaults (no cooldown, max 5 concurrent).
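
The factory-with-defaults idea can be sketched as follows. The types below are hypothetical, simplified stand-ins (the real `Routine`, `Trigger`, and guardrail types carry more fields, for example `channel` on event triggers); only the defaults named above and the override style used by the cooldown test are taken from this change.

```rust
use std::time::Duration;

/// Hypothetical, simplified stand-ins for the real routine types.
#[derive(Debug, Clone, PartialEq)]
pub enum Trigger {
    Cron { schedule: String },
    Event { pattern: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Guardrails {
    pub cooldown: Duration,
    pub max_concurrent: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Routine {
    pub name: String,
    pub trigger: Trigger,
    pub prompt: String,
    pub guardrails: Guardrails,
}

/// Factory with permissive defaults: no cooldown, at most 5 concurrent runs.
pub fn make_routine(name: &str, trigger: Trigger, prompt: &str) -> Routine {
    Routine {
        name: name.to_string(),
        trigger,
        prompt: prompt.to_string(),
        guardrails: Guardrails {
            cooldown: Duration::ZERO,
            max_concurrent: 5,
        },
    }
}

/// A test that needs stricter guardrails overrides only the field it cares about.
pub fn routine_with_one_hour_cooldown() -> Routine {
    let mut routine = make_routine(
        "cooldown-test",
        Trigger::Event {
            pattern: "test-cooldown".to_string(),
        },
        "Check status.",
    );
    routine.guardrails.cooldown = Duration::from_secs(3600);
    routine
}
```

Keeping the defaults permissive means a test only spells out the one guardrail it is exercising, which keeps the intent of each test visible.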

#### `make_test_incoming_message(content: &str) -> IncomingMessage`

Builds a minimal `IncomingMessage` for event-trigger tests. The message has
a unique ID, default user/channel values, and the provided content.

#### `make_minimal_engine(trace: LlmTrace, db: Arc<dyn Database>, ws: Arc<Workspace>) -> (Arc<RoutineEngine>, Receiver<OutgoingResponse>)`

Builds a minimal `RoutineEngine` from an `LlmTrace` and returns both the
engine and the notification receiver. This lets tests receive routine
completion notifications without duplicating engine construction.

#### `SystemEventSpec<'a>`

Describes a system event to be emitted in tests. Used with
`assert_system_event_count` to verify that system event triggers fire
correctly.

#### `register_github_issue_routine(db: &Arc<dyn Database>, engine: &RoutineEngine) -> Routine`

Helper for system event tests that registers a GitHub issue-opened routine
with a filter for the `nearai/ironclaw` repository.

#### `assert_system_event_count(engine: &RoutineEngine, spec: SystemEventSpec<'_>, expected: usize, msg: &str)`

Asserts that emitting a system event fires the expected number of routines.
Used in table-driven tests for system event trigger matching and filtering.
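
A table-driven test in this style might look like the sketch below. Everything here is a hypothetical stand-in: the fields of the real `SystemEventSpec` are not shown in this document, so `EventSpec`, `IssueRoutine`, `count_fired`, and the `"github"` / `"issue.opened"` strings are assumptions made purely for illustration.

```rust
/// Hypothetical stand-in for a system event description.
#[derive(Debug, Clone, Copy)]
pub struct EventSpec<'a> {
    pub source: &'a str,
    pub event_type: &'a str,
    pub repository: &'a str,
}

/// Hypothetical routine that fires only for issue-opened events from one repository.
pub struct IssueRoutine {
    pub repository: String,
}

/// Counts how many registered routines would fire for `spec`.
pub fn count_fired(routines: &[IssueRoutine], spec: EventSpec<'_>) -> usize {
    routines
        .iter()
        .filter(|r| {
            spec.source == "github"
                && spec.event_type == "issue.opened"
                && spec.repository == r.repository
        })
        .count()
}

/// Mirrors the shape of an assertion helper: event, expected count, message.
pub fn assert_fire_count(routines: &[IssueRoutine], spec: EventSpec<'_>, expected: usize, msg: &str) {
    assert_eq!(count_fired(routines, spec), expected, "{}", msg);
}

/// Each row pairs an event with the number of routines expected to fire.
pub fn run_filter_table() {
    let routines = [IssueRoutine {
        repository: "nearai/ironclaw".to_string(),
    }];
    let cases = [
        (
            EventSpec { source: "github", event_type: "issue.opened", repository: "nearai/ironclaw" },
            1,
            "matching repository fires",
        ),
        (
            EventSpec { source: "github", event_type: "issue.opened", repository: "other/repo" },
            0,
            "other repository is filtered out",
        ),
        (
            EventSpec { source: "github", event_type: "issue.closed", repository: "nearai/ironclaw" },
            0,
            "other event type does not fire",
        ),
    ];
    for (spec, expected, msg) in cases {
        assert_fire_count(&routines, spec, expected, msg);
    }
}
```

Passing the message per row means a failing case names itself in the assertion output, which is the main benefit of routing every row through a single assertion helper.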

#### E2E test writing patterns

When writing E2E tests for routines:

1. Use `create_test_db().await.expect("...")` to set up the database.
2. Use `create_workspace(&db)` to get a workspace.
3. Use `make_minimal_engine(trace, db.clone(), ws)` to get an engine.
4. For event-trigger tests, use polling loops rather than fixed sleeps:

```rust
// Poll for routine completion with timeout
let mut attempts = 0;
let max_attempts = 50;
loop {
    let runs = db.list_routine_runs(routine.id, 10).await.expect("...");
    if !runs.is_empty() {
        break;
    }
    attempts += 1;
    assert!(attempts < max_attempts, "Routine did not complete within timeout");
    tokio::time::sleep(Duration::from_millis(10)).await;
}
```

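This pattern returns as soon as the run is recorded and fails with a clear
message once the attempt budget is exhausted (50 attempts at 10 ms each,
roughly 500 ms), instead of relying on a fixed sleep that is either too short
or needlessly long.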
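
If the same loop is needed in several tests, it could be factored into a small helper. The sketch below is a synchronous, std-only version so that it stands alone; `poll_until` is a hypothetical name, not an existing helper in `tests/support/routines.rs`, and an async test would swap `thread::sleep` for `tokio::time::sleep(...).await`.

```rust
use std::thread;
use std::time::Duration;

/// Polls `check` up to `max_attempts` times, sleeping `interval` between
/// attempts. Returns the attempt number that succeeded, or `Err` carrying the
/// attempt budget once it is exhausted.
pub fn poll_until<F>(mut check: F, max_attempts: u32, interval: Duration) -> Result<u32, u32>
where
    F: FnMut() -> bool,
{
    for attempt in 1..=max_attempts {
        if check() {
            return Ok(attempt);
        }
        // No point sleeping after the final failed attempt.
        if attempt < max_attempts {
            thread::sleep(interval);
        }
    }
    Err(max_attempts)
}
```

Returning a `Result` rather than asserting inside the helper leaves the failure message to the caller, in the same spirit as `create_test_db` leaving `.expect()` to the test.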

### 2.5 Manual and ignored tests

Some tests are intentionally excluded from the default path because they need
12 changes: 10 additions & 2 deletions tests/e2e_traces.rs
@@ -9,12 +9,20 @@ mod advanced_traces;
mod attachments;
#[path = "e2e_traces/builtin_tool_coverage.rs"]
mod builtin_tool_coverage;
#[path = "e2e_traces/heartbeat.rs"]
mod heartbeat;
#[path = "e2e_traces/metrics.rs"]
mod metrics;
#[path = "e2e_traces/recorded_trace.rs"]
mod recorded_trace;
#[path = "e2e_traces/routine_heartbeat.rs"]
mod routine_heartbeat;
#[path = "e2e_traces/routine_cooldown.rs"]
mod routine_cooldown;
#[path = "e2e_traces/routine_cron.rs"]
mod routine_cron;
#[path = "e2e_traces/routine_event.rs"]
mod routine_event;
#[path = "e2e_traces/routine_system_event.rs"]
mod routine_system_event;
#[path = "e2e_traces/safety_layer.rs"]
mod safety_layer;
#[path = "e2e_traces/spot_checks.rs"]
104 changes: 104 additions & 0 deletions tests/e2e_traces/heartbeat.rs
@@ -0,0 +1,104 @@
//! E2E tests: heartbeat runner.
//!
//! Tests that the HeartbeatRunner correctly processes heartbeat checklists
//! and handles findings or skips appropriately.

use std::sync::Arc;

use ironclaw::agent::{HeartbeatConfig, HeartbeatRunner};
use ironclaw::workspace::hygiene::HygieneConfig;

use crate::support::routines::{create_test_db, create_workspace};
use crate::support::trace_llm::{LlmTrace, TraceLlm, TraceResponse, TraceStep};

#[tokio::test]
async fn heartbeat_findings() {
    let (db, _tmp) = create_test_db().await.expect("create_test_db");
    let ws = create_workspace(&db);

    // Write a real heartbeat checklist.
    ws.write(
        "HEARTBEAT.md",
        "# Heartbeat Checklist\n\n- [ ] Check if the server is running\n- [ ] Review error logs",
    )
    .await
    .expect("write heartbeat");

    // LLM responds with findings (not HEARTBEAT_OK).
    let trace = LlmTrace::single_turn(
        "test-heartbeat-findings",
        "heartbeat",
        vec![TraceStep {
            request_hint: None,
            response: TraceResponse::Text {
                content: "The server has elevated error rates. Review the logs immediately."
                    .to_string(),
                input_tokens: 100,
                output_tokens: 20,
            },
            expected_tool_results: vec![],
        }],
    );
    let llm = Arc::new(TraceLlm::from_trace(trace));

    let (tx, mut rx) = tokio::sync::mpsc::channel(16);

    let hygiene_config = HygieneConfig {
        enabled: false,
        daily_retention_days: 30,
        conversation_retention_days: 7,
        cadence_hours: 24,
        state_dir: _tmp.path().to_path_buf(),
    };

    let runner = HeartbeatRunner::new(HeartbeatConfig::default(), hygiene_config, ws, llm)
        .with_response_channel(tx);

    let result = runner.check_heartbeat().await;
    match result {
        ironclaw::agent::HeartbeatResult::NeedsAttention(msg) => {
            assert!(
                msg.contains("error"),
                "Expected 'error' in attention message: {msg}"
            );
        }
        other => panic!("Expected NeedsAttention, got: {other:?}"),
    }

    // No notification since we called check_heartbeat directly (not run).
    let _ = rx.try_recv();
}

#[tokio::test]
async fn heartbeat_empty_skip() {
    let (db, _tmp) = create_test_db().await.expect("create_test_db");
    let ws = create_workspace(&db);

    // Write an effectively empty heartbeat (just headers and comments).
    ws.write(
        "HEARTBEAT.md",
        "# Heartbeat Checklist\n\n<!-- No tasks yet -->\n",
    )
    .await
    .expect("write heartbeat");

    // LLM should NOT be called, so provide a trace that would panic if called.
    let trace = LlmTrace::single_turn("test-heartbeat-skip", "skip", vec![]);
    let llm = Arc::new(TraceLlm::from_trace(trace));

    let hygiene_config = HygieneConfig {
        enabled: false,
        daily_retention_days: 30,
        conversation_retention_days: 7,
        cadence_hours: 24,
        state_dir: _tmp.path().to_path_buf(),
    };

    let runner = HeartbeatRunner::new(HeartbeatConfig::default(), hygiene_config, ws, llm);

    let result = runner.check_heartbeat().await;
    assert!(
        matches!(result, ironclaw::agent::HeartbeatResult::Skipped),
        "Expected Skipped for empty checklist, got: {result:?}"
    );
}
94 changes: 94 additions & 0 deletions tests/e2e_traces/routine_cooldown.rs
@@ -0,0 +1,94 @@
//! E2E tests: routine cooldown behaviour.
//!
//! Tests that routines respect their configured cooldown period and
//! prevent re-triggering within the cooldown window.

use std::time::Duration;

use chrono::Utc;

use ironclaw::agent::routine::Trigger;
use ironclaw::db::RoutineRuntimeUpdate;

use crate::support::routines::{
    create_test_db, create_workspace, make_minimal_engine, make_routine, make_test_incoming_message,
};
use crate::support::trace_llm::{LlmTrace, TraceResponse, TraceStep};

#[tokio::test]
async fn routine_cooldown() {
    let (db, _tmp) = create_test_db().await.expect("create_test_db");
    let ws = create_workspace(&db);

    // Only one LLM response is needed: the first fire consumes it, and the
    // second fire is blocked by the cooldown.
    let trace = LlmTrace::single_turn(
        "test-cooldown",
        "check",
        vec![TraceStep {
            request_hint: None,
            response: TraceResponse::Text {
                content: "ROUTINE_OK".to_string(),
                input_tokens: 50,
                output_tokens: 5,
            },
            expected_tool_results: vec![],
        }],
    );
    let (engine, _notify_rx) = make_minimal_engine(trace, db.clone(), ws);

    // Insert an event routine with 1-hour cooldown.
    let mut routine = make_routine(
        "cooldown-test",
        Trigger::Event {
            channel: None,
            pattern: "test-cooldown".to_string(),
        },
        "Check status.",
    );
    routine.guardrails.cooldown = Duration::from_secs(3600);
    db.create_routine(&routine).await.expect("create_routine");
    engine.refresh_event_cache().await;

    // First fire should work.
    let msg = make_test_incoming_message("test-cooldown trigger");
    let fired1 = engine.check_event_triggers(&msg).await;
    assert!(fired1 >= 1, "First fire should work");

    // Poll for routine completion with timeout before updating last_run_at.
    let mut attempts = 0;
    let max_attempts = 50;
    loop {
        let runs = db
            .list_routine_runs(routine.id, 10)
            .await
            .expect("list_routine_runs");
        if !runs.is_empty() {
            break;
        }
        attempts += 1;
        assert!(
            attempts < max_attempts,
            "Routine did not complete within timeout"
        );
        tokio::time::sleep(Duration::from_millis(10)).await;
    }

    // Update the routine's last_run_at to now (simulating it just ran).
    db.update_routine_runtime(RoutineRuntimeUpdate {
        id: routine.id,
        last_run_at: Utc::now(),
        next_fire_at: None,
        run_count: 1,
        consecutive_failures: 0,
        state: &serde_json::json!({}),
    })
    .await
    .expect("update_routine_runtime");

    // Refresh cache to pick up updated last_run_at.
    engine.refresh_event_cache().await;

    // Second fire should be blocked by cooldown.
    let fired2 = engine.check_event_triggers(&msg).await;
    assert_eq!(fired2, 0, "Second fire should be blocked by cooldown");
}
75 changes: 75 additions & 0 deletions tests/e2e_traces/routine_cron.rs
@@ -0,0 +1,75 @@
//! E2E tests: cron-triggered routines.
//!
//! Tests that routines with cron schedules fire correctly when their
//! next_fire_at time is in the past.

use std::time::Duration;

use chrono::Utc;

use ironclaw::agent::routine::Trigger;

use crate::support::routines::{
    create_test_db, create_workspace, make_minimal_engine, make_routine,
};
use crate::support::trace_llm::{LlmTrace, TraceResponse, TraceStep};

#[tokio::test]
async fn cron_routine_fires() {
    let (db, _tmp) = create_test_db().await.expect("create_test_db");
    let ws = create_workspace(&db);

    // Create a TraceLlm that responds with ROUTINE_OK.
    let trace = LlmTrace::single_turn(
        "test-cron-fire",
        "check",
        vec![TraceStep {
            request_hint: None,
            response: TraceResponse::Text {
                content: "ROUTINE_OK".to_string(),
                input_tokens: 50,
                output_tokens: 5,
            },
            expected_tool_results: vec![],
        }],
    );
    let (engine, mut notify_rx) = make_minimal_engine(trace, db.clone(), ws);

    // Insert a cron routine with next_fire_at in the past.
    let mut routine = make_routine(
        "cron-test",
        Trigger::Cron {
            schedule: "* * * * *".to_string(),
            timezone: None,
        },
        "Check system status.",
    );
    routine.next_fire_at = Some(Utc::now() - chrono::Duration::minutes(5));
    db.create_routine(&routine).await.expect("create_routine");

    // Fire cron triggers.
    engine.check_cron_triggers().await;

    // Poll for routine completion with timeout.
    let mut attempts = 0;
    let max_attempts = 50;
    loop {
        let runs = db
            .list_routine_runs(routine.id, 10)
            .await
            .expect("list_routine_runs");
        if !runs.is_empty() {
            break;
        }
        attempts += 1;
        assert!(
            attempts < max_attempts,
            "Routine did not complete within timeout"
        );
        tokio::time::sleep(Duration::from_millis(10)).await;
    }

    // Notification may or may not be sent depending on config;
    // just verify no panic occurred. Drain the channel.
    let _ = notify_rx.try_recv();
}