
feat: add local E2E sanity test harness#461

Merged
kulvirgit merged 1 commit into main from feat/sanity-test-harness
Mar 25, 2026

Conversation


@kulvirgit kulvirgit commented Mar 25, 2026

Summary

Docker-based "new user simulator" that tests the shipped npm artifact end-to-end — catches bugs that existing unit tests miss.

  • Phase 1: verify-install — 9 checks: binary links, 17 skills ship, napi binding loads, dbt/git available
  • Phase 2: smoke-tests — 10 E2E tests via altimate run --yolo --format json: commands resolve, sql_analyze works, DuckDB driver, agent routing, graceful error handling
  • Phase 3: resilience — 8 tests: SQLite DB at correct path with WAL, sessions persist, compaction circuit breaker, missing API key = clean error, old config backwards compat
  • PR-aware test generation — git diff analysis generates targeted tests for what changed
  • Local CI pipeline — bun run ci runs typecheck + tests + markers in one command
  • Machine-aware parallelism — detects cores/RAM, runs 2-6 concurrent smoke tests
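The machine-aware parallelism bullet can be sketched roughly as follows — an illustration, not the shipped lib/parallel.sh; the RAM heuristic and the halving rule are assumptions beyond the 2-6 range stated above:

```shell
#!/usr/bin/env bash
# Sketch: derive a smoke-test concurrency level from detected cores and RAM,
# clamped to the 2-6 range described in the PR summary.
detect_parallelism() {
  local cores mem_gb p
  cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 2)
  # Available memory in GB (Linux); fall back to a conservative 4 GB elsewhere.
  mem_gb=$(awk '/MemAvailable/ {printf "%d", $2/1024/1024}' /proc/meminfo 2>/dev/null || echo 4)
  [ -n "$mem_gb" ] || mem_gb=4
  p=$(( cores / 2 ))                 # leave headroom for the CLI under test
  (( mem_gb < 4 )) && p=2            # RAM-bound machines stay at the floor
  (( p < 2 )) && p=2
  (( p > 6 )) && p=6
  echo "$p"
}
detect_parallelism
```

On the 20-core machine mentioned below, this heuristic would land at the cap of 6.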

27 tests, all passing in ~2:48 on a 20-core machine.

How to use

bun run ci            # Fast local CI (~2 min)
bun run sanity        # Full Docker suite (~3 min)
bun run sanity --pr   # + PR-specific tests from diff

Test plan

  • Phase 1 verify-install: 9/9 passed in Docker
  • Phase 2 smoke-tests: 10/10 passed (2 skipped without Postgres/Snowflake creds)
  • Phase 3 resilience: 8/8 passed
  • Full suite with API key: ALL PASSED in 2:48
  • Full suite without API key: graceful skips, no crashes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Added a multi-phase sanity test suite (install, smoke, resilience, upgrade), PR-targeted test generation and execution, parallel runner and orchestration scripts, assertion/cleanup helpers, fixtures, baselines, and test-run wrappers for timeouts and output capture.
  • Chores
    • Added npm scripts to run local CI/sanity workflows: ci, ci:full, ci:pr, sanity, sanity:upgrade.
  • Documentation
    • Added an example env file documenting required test environment variables.


@claude claude bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review.

Tip: disable this comment in your organization's Code Review settings.

@coderabbitai

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a local sanity CI system: npm scripts to run sanity flows, Dockerfiles and a Docker Compose environment for multi‑DB testing, many Bash helpers and phased test scripts (verify-install, smoke, resilience, verify-upgrade), PR-specific test generation/runner, fixtures, and orchestrating entrypoints.

Changes

Cohort / File(s) Summary
Package Configuration
package.json
Added CI/sanity npm scripts (ci, ci:full, ci:pr, sanity, sanity:upgrade) and minor JSON punctuation tweak in test.
Env & Fixtures
test/sanity/.env.example, test/sanity/fixtures/*
New .env example and fixtures: broken-config.json, old-config.json, compaction-session.sql, test.sql for various sanity scenarios.
Docker Images & Compose
test/sanity/Dockerfile, test/sanity/Dockerfile.upgrade, test/sanity/docker-compose.yml, test/sanity/docker-compose.upgrade.yml
New Dockerfiles for fresh/upgrade images and compose/override files defining sanity service and backing DB services (Postgres, MySQL, MSSQL, Redshift) with healthchecks and env wiring.
Local CI Orchestration
test/sanity/ci-local.sh, test/sanity/run.sh
New local CI runner supporting fast/full/pr modes and top-level run script sequencing phases and aggregating exit codes (supports --upgrade).
Phase Scripts
test/sanity/phases/...
verify-install.sh, smoke-tests.sh, resilience.sh, verify-upgrade.sh
Added phased suites: installation verification, parallel smoke tests, resilience tests, and upgrade verification.
Helpers & Libs
test/sanity/lib/...
assert.sh, altimate-run.sh, cleanup.sh, parallel.sh
Assertion counters and helpers, altimate run wrappers with timeout/output capture, cleanup utilities, and CPU/RAM-aware parallel runner.
PR Test Tools
test/sanity/pr-tests/*
generate.sh, run-pr-tests.sh
Git-diff-driven PR manifest generator and manifest runner that execute PR-specific tests from a generated manifest.
Results Baseline
test/sanity/results/baseline.json
Added baseline JSON with initial phase counts and metadata.
Entrypoints & Permissions
test/sanity/*.sh, test/sanity/*
Many new executable shell scripts copied into container images and used as test entrypoints; Dockerfile marks scripts executable and sets container ENTRYPOINTs.

Sequence Diagram(s)

sequenceDiagram
    participant CI as CI Script (ci-local.sh)
    participant Compose as Docker Compose
    participant DB as DB Services (Pg/MySQL/MSSQL/Redshift)
    participant Sanity as Sanity Container
    participant CLI as Altimate CLI
    participant Assert as Assertion Libs

    CI->>CI: choose mode (fast / full / pr / upgrade)
    CI->>Compose: docker compose up --build (full / upgrade)
    Compose->>DB: start DB services & run healthchecks
    DB-->>Compose: healthy
    Compose->>Sanity: start `sanity` service (depends_on healthy)
    Sanity->>CLI: run phase scripts (verify-install, smoke, resilience, verify-upgrade)
    CLI->>DB: execute queries / integrations
    DB-->>CLI: return results
    CLI-->>Sanity: emit outputs / exit codes
    Sanity->>Assert: validate outputs, write per-test results
    Sanity-->>CI: exit code / container result
    CI->>Compose: docker compose down --volumes
    CI->>CI: aggregate + report final result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

contributor

Suggested reviewers

  • suryaiyer95

Poem

🐰 I hopped through Docker, scripts in tow,
I seeded DBs where test queries go,
Phases run in bash, parallel and spry,
A carrot-cheer for CI on the fly,
May logs be green and tests forever sprout.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check — ⚠️ Warning: The description includes a Summary section explaining what changed and why, but lacks a Test Plan section and Checklist as required by the template. Resolution: Add a 'Test Plan' section describing how the changes were tested and complete the 'Checklist' section with checkboxes for tests, documentation, and CHANGELOG updates.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 21.21%, below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
  • Title check — ✅ Passed: The title 'feat: add local E2E sanity test harness' accurately and concisely describes the main change—adding a new end-to-end sanity test harness—which aligns with the changeset's primary focus.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@kulvirgit kulvirgit force-pushed the feat/sanity-test-harness branch from 467b5ab to 09ee72c on March 25, 2026 at 01:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (11)
test/sanity/lib/assert.sh (2)

19-28: Same $? issue applies here, though less impactful.

For assert_exit_nonzero, the exit code isn't printed so it's less visible, but for consistency you may want to capture it if you ever want to log what the actual exit code was.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/assert.sh` around lines 19 - 28, In assert_exit_nonzero,
capture the command's exit code and include it in the FAIL message for
consistency: run the command (still redirecting output to /dev/null 2>&1), store
its exit status in a variable (e.g., rc=$?), then test rc to decide PASS/FAIL
and use rc in the failure echo; refer to the assert_exit_nonzero function and
the rc variable when making the change.

80-91: Integer comparison may fail silently on non-numeric input.

The 2>/dev/null suppresses errors when $actual or $expected are not valid integers, causing the assertion to fail silently with a misleading message. Consider validating inputs or using a more explicit error message.

Proposed improvement
 assert_ge() {
   local actual="$1"
   local expected="$2"
   local desc="$3"
-  if [ "$actual" -ge "$expected" ] 2>/dev/null; then
+  if ! [[ "$actual" =~ ^[0-9]+$ ]] || ! [[ "$expected" =~ ^[0-9]+$ ]]; then
+    echo "  FAIL: $desc (non-numeric: actual='$actual', expected='$expected')"
+    ((FAIL_COUNT++))
+  elif [ "$actual" -ge "$expected" ]; then
     echo "  PASS: $desc"
     ((PASS_COUNT++))
   else
     echo "  FAIL: $desc (got $actual, expected >= $expected)"
     ((FAIL_COUNT++))
   fi
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/assert.sh` around lines 80 - 91, The assert_ge function
currently silences numeric errors via "2>/dev/null", which hides non-numeric
input problems; update assert_ge to validate that both "$actual" and "$expected"
are valid integers (e.g., using a regex like '^[+-]?[0-9]+$' with [[ ... =~ ...
]]) before performing the numeric comparison, and if validation fails emit a
clear error/FAIL message that includes the offending values and increment
FAIL_COUNT instead of hiding the error; keep the numeric comparison logic (the
-ge test) but remove the error suppression so failures are explicit and the
function name assert_ge is still used.
test/sanity/lib/parallel.sh (2)

4-5: Declare and assign separately to avoid masking return values.

The local declaration masks the exit code of the command substitution. If nproc or sysctl fails silently, you won't detect it. While the fallback chain handles this functionally, separating declaration and assignment is cleaner.

Proposed fix
 detect_parallelism() {
-  local cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")
+  local cores
+  cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/parallel.sh` around lines 4 - 5, In detect_parallelism(),
don't declare and assign cores in one local statement because the command
substitution masks its exit status; instead, declare local cores on its own,
then perform the fallback command substitution into cores and check the exit
status (or use a conditional fallback chain) so failures of nproc/sysctl are
detectable and handled; update references to the variable name cores accordingly
and ensure the function returns a non-zero status when all detection attempts
fail.

46-65: Batch completion waits for all jobs before starting next batch.

The current implementation waits for the entire batch to complete before starting new jobs. This means if one job finishes quickly, its slot stays idle until the slowest job in the batch completes. A more efficient approach would use a job slot pool, but for a test harness with ~10 smoke tests, this is likely acceptable.

Alternative: slot-based parallelism (optional enhancement)

For future consideration if test times vary significantly:

run_parallel() {
  local max_parallel="$1"; shift
  local results=()
  
  for cmd in "$@"; do
    # Wait if at capacity
    while [ $(jobs -r | wc -l) -ge "$max_parallel" ]; do
      wait -n 2>/dev/null || sleep 0.1
    done
    eval "$cmd" &
    results+=($!)
  done
  
  # Wait for all remaining
  local failed=0
  for pid in "${results[@]}"; do
    wait "$pid" || ((failed++))
  done
  
  [ "$failed" -eq 0 ]
}

Note: wait -n requires Bash 4.3+, which may not be available in all environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/parallel.sh` around lines 46 - 65, The run_parallel function
currently batches commands and waits for all pids in pids[] to finish before
starting the next batch, which leaves slots idle; change run_parallel to use
slot-based concurrency: before launching each command check running count
against max_parallel and if at capacity wait for any single job to finish (use
wait -n to reap one child and capture its exit status into results[], removing
its pid from pids[]), falling back to a small sleep+jobs -r loop if wait -n
isn’t available; after dispatching all commands, wait for remaining pids in
pids[] and return success/failure based on collected results[]. Ensure you
update handling of results[] and remove the batch counter variable if unused.
test/sanity/phases/resilience.sh (2)

86-96: API key restoration may not happen on unexpected exit.

If the script exits between unset ANTHROPIC_API_KEY (line 89) and the restoration (line 95), the key remains unset for subsequent processes. Consider using a subshell or trap.

♻️ Proposed fix using subshell
 # 6. Graceful on missing provider key
 echo "  [6/7] Missing API key handling..."
-SAVED_KEY="${ANTHROPIC_API_KEY:-}"
-unset ANTHROPIC_API_KEY
-OUTPUT=$(timeout 10 altimate run --max-turns 1 --yolo "hello" 2>&1 || true)
+OUTPUT=$(
+  unset ANTHROPIC_API_KEY
+  timeout 10 altimate run --max-turns 1 --yolo "hello" 2>&1 || true
+)
 # Should get a clean error, not an unhandled exception / stack trace
 assert_not_contains "$OUTPUT" "TypeError" "no TypeError on missing key"
 assert_not_contains "$OUTPUT" "Cannot read properties" "no unhandled error on missing key"
-if [ -n "$SAVED_KEY" ]; then
-  export ANTHROPIC_API_KEY="$SAVED_KEY"
-fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/resilience.sh` around lines 86 - 96, The test unsets
ANTHROPIC_API_KEY and only restores it later, which can leave the environment
modified if the script exits early; update the block around
SAVED_KEY/ANTHROPIC_API_KEY so the original key is always restored on exit by
either executing the unset/restore inside a subshell or installing a trap that
captures the original value (SAVED_KEY) and resets ANTHROPIC_API_KEY on EXIT;
locate and modify the section that declares SAVED_KEY, unsets ANTHROPIC_API_KEY,
runs altimate, and restores the key to ensure restoration happens even on
unexpected exits.

18-18: Add error handling for cd command.

Per shellcheck SC2164, if the cd fails (e.g., mktemp returned a path that was immediately deleted), subsequent commands would run in an unexpected directory.

♻️ Proposed fix
 WORKDIR=$(mktemp -d /tmp/sanity-resilience-XXXXXX)
-cd "$WORKDIR"
+cd "$WORKDIR" || { echo "FAIL: could not cd to $WORKDIR"; exit 1; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/resilience.sh` at line 18, The lone cd "$WORKDIR" lacks
error handling and can silently fail; update the script to check the exit status
of cd (the cd "$WORKDIR" invocation) and abort the script on failure — for
example, immediately follow cd "$WORKDIR" with a failing-exit branch that prints
an explanatory error to stderr and exits non‑zero (or use a helper die/err
function if one exists) so subsequent commands never run in the wrong directory.
test/sanity/ci-local.sh (1)

52-72: Docker cleanup may not run on early failure.

If any step between lines 53-70 fails and causes an early exit (despite set -e being disabled by run_step), the cleanup on line 72 won't execute, leaving containers running. Consider using a trap for cleanup.

Also, the hardcoded sleep 10 on line 56 is fragile—services may need more or less time depending on the host.

♻️ Proposed fix with trap-based cleanup
 if [ "$MODE" = "--full" ] || [ "$MODE" = "full" ]; then
   echo ""
   echo "=== Full CI (Docker) ==="

+  cleanup_docker() {
+    docker compose -f "$REPO_ROOT/test/sanity/docker-compose.yml" down --volumes --remove-orphans 2>/dev/null || true
+  }
+  trap cleanup_docker EXIT
+
   # Driver E2E with Docker containers
   run_step "Docker Services Up" docker compose -f "$REPO_ROOT/test/sanity/docker-compose.yml" up -d postgres mysql mssql redshift

-  echo "  Waiting for services to be healthy..."
-  sleep 10
+  echo "  Waiting for services to be healthy (up to 30s)..."
+  for i in {1..30}; do
+    if docker compose -f "$REPO_ROOT/test/sanity/docker-compose.yml" ps --status running | grep -q postgres; then
+      break
+    fi
+    sleep 1
+  done

   # ... tests ...

-  docker compose -f "$REPO_ROOT/test/sanity/docker-compose.yml" down --volumes --remove-orphans 2>/dev/null
+  trap - EXIT
+  cleanup_docker
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/ci-local.sh` around lines 52 - 72, Wrap the docker bring-up and
test steps in a guarded section that installs a trap to always run the existing
cleanup command (docker compose ... down --volumes --remove-orphans) on EXIT, so
containers are removed even if any run_step or tests fail; specifically, add a
trap that calls the same cleanup command before running run_step "Docker
Services Up" and the subsequent run_step test blocks (referencing the run_step
invocation names and the docker compose command from the diff). Replace the
fixed sleep 10 with a retrying health-check loop that polls container health or
attempts TCP connects to TEST_*_PORTs with a timeout/attempt limit (used before
running the local and docker E2E run_step tests) so the script waits
deterministically for services instead of sleeping.
test/sanity/phases/smoke-tests.sh (2)

52-61: Separate declaration from assignment to avoid masking errors.

The pattern local output=$(get_output ...) masks the exit code of get_output. If the function fails, the error is silently ignored. This applies to multiple test functions (lines 55, 72, 83, 114, 125, 136, 147).

♻️ Example fix for test_discover_mcps
 test_discover_mcps() {
-  cd "$WORKDIR"
+  cd "$WORKDIR" || return 1
   altimate_run "discover-mcps" --command discover-and-add-mcps "list"
-  local output=$(get_output "discover-mcps")
+  local output
+  output=$(get_output "discover-mcps")
   if echo "$output" | grep -qi "command not found\|Unknown command"; then

Apply similar changes to other test functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` around lines 52 - 61, The test functions
(e.g., test_discover_mcps) currently use combined declaration+assignment like
`local output=$(get_output ...)` which masks get_output's exit status; change
each to declare the variable first (e.g., `local output`) then run the
assignment as a separate statement so you can capture and check the command's
exit code (inspect `$?` or use `if ! output=$(get_output ...) ; then ... fi`)
and fail the test early when get_output errors before grepping; apply the same
pattern to the other affected functions listed in the comment.

212-233: Remove unused variable and improve robustness.

Line 214 calculates num but it's never used in the loop. This appears to be dead code.

Additionally, the static analysis warnings about cd without error handling (SC2164) and combined declare+assign (SC2155) are valid. While the risk is low in this test context, separating declaration from assignment prevents masking failures from get_output.

♻️ Proposed fix for result collection
 # Collect results
 echo ""
 echo "  Results:"
 for name in "${TEST_NAMES[@]}"; do
   result=$(cat "$RESULTS_DIR/$name" 2>/dev/null || echo "MISSING")
-  num=$(( ${#TESTS[@]} - ${#TEST_NAMES[@]} + 1 ))
   case "$result" in
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` around lines 212 - 233, Remove the unused
dead variable `num` computed in the loop (derived from `TESTS`/`TEST_NAMES`) and
simplify the loop to only read results from `RESULTS_DIR` for each `name` in
`TEST_NAMES`; also fix shellcheck warnings by not mixing declaration+assignment
when capturing output from helper functions (separate declaration and assignment
where `get_output` is used) and add explicit error handling for any `cd` calls
used earlier (e.g., check `cd` return or use pushd/popd) so failures are not
masked; locate changes around the loop that references TEST_NAMES/RESULTS_DIR
and update related counters PASS_COUNT/FAIL_COUNT/SKIP_COUNT as-is.
test/sanity/Dockerfile (1)

4-6: Add --no-install-recommends to reduce image size.

The static analysis correctly identifies that --no-install-recommends is missing. This flag prevents installation of unnecessary recommended packages, reducing image size and attack surface.

♻️ Proposed fix
 # System deps (what a real user would have)
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     git python3 python3-pip python3-venv curl sqlite3 \
     && rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile` around lines 4 - 6, The apt-get install command in
the Dockerfile RUN line should include --no-install-recommends to avoid pulling
recommended packages; update the RUN instruction that executes "apt-get update
&& apt-get install -y git python3 python3-pip python3-venv curl sqlite3" to add
the --no-install-recommends flag (preserving the existing cleanup "&& rm -rf
/var/lib/apt/lists/*") so the image size and attack surface are reduced.
test/sanity/pr-tests/generate.sh (1)

9-9: Use explicit no-op for file truncation.

The bare redirection > "$MANIFEST" works but is flagged by shellcheck (SC2188). Using an explicit command improves clarity.

♻️ Proposed fix
-> "$MANIFEST"
+: > "$MANIFEST"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/pr-tests/generate.sh` at line 9, Replace the bare redirection
that truncates the file (" > \"$MANIFEST\"") with an explicit no-op redirection
to satisfy shellcheck SC2188; e.g., use the null command prefix (colon) to
truncate the file (: > "$MANIFEST") so the truncation is explicit—update the
occurrence that references the MANIFEST variable in generate.sh.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/sanity/Dockerfile`:
- Around line 32-35: The Dockerfile currently hardcodes the architecture and
binary name (copying from
dist/@altimateai/altimate-code-linux-arm64/bin/altimate), which breaks on x86_64
and doesn't match the postinstall symlink naming; update the RUN step to detect
platform/arch the same way as postinstall.mjs and copy the actual binary target
(follow the postinstall symlink name like bin/.altimate-code -> altimate-code)
into /home/testuser/.local/bin/altimate, ensuring you compute the correct dist
directory name (e.g., altimate-code-<platform>-<arch> or resolve the
.altimate-code symlink) instead of using the hardcoded altimate-code-linux-arm64
and adjust the source path to the real binary (altimate-code) before setting
executable permissions.
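A sketch of the suggested platform detection — the altimate-code-<platform>-<arch> naming comes from the comment above, but the helper itself is illustrative:

```shell
#!/usr/bin/env bash
# Sketch: compute the npm platform-package directory instead of hardcoding
# linux-arm64, following the naming pattern quoted in the review comment.
dist_dir() {
  local arch platform
  case "$(uname -m)" in
    x86_64)        arch=x64 ;;
    aarch64|arm64) arch=arm64 ;;
    *)             arch="$(uname -m)" ;;   # pass through; let the later copy fail loudly
  esac
  platform=$(uname -s | tr '[:upper:]' '[:lower:]')   # linux / darwin
  echo "dist/@altimateai/altimate-code-${platform}-${arch}"
}
dist_dir
```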

In `@test/sanity/Dockerfile.upgrade`:
- Around line 17-18: The Dockerfile uses ARG PRIOR_VERSION=latest which makes
the upgrade baseline implicit and fragile; change the Dockerfile.upgrade to
require an explicit build arg (use ARG PRIOR_VERSION without a default) and keep
the RUN npm install -g `@altimateai/altimate-code`@${PRIOR_VERSION} line as-is,
then update your CI/compose build invocation to pass a concrete PRIOR_VERSION
build-arg (e.g., via --build-arg PRIOR_VERSION=<version>) so tests always run
against a reproducible prior release.

In `@test/sanity/fixtures/compaction-session.sql`:
- Around line 6-7: The seed INSERT into the session table is missing the
required NOT NULL slug column so the row is ignored; update the INSERT statement
for session (the INSERT OR IGNORE INTO session (...) VALUES (...)) to include
the slug column name in the column list and add an appropriate slug string in
the corresponding VALUES tuple (e.g. a URL-safe or derived slug like
"compaction-test" or a deterministic slug from the title) so the row inserts
successfully.
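A sketch of the corrected seed row with the missing slug column added — only the session table and slug column come from the review; the other column names and values are placeholders:

```sql
-- Sketch: include the NOT NULL slug column so the INSERT is not silently ignored.
INSERT OR IGNORE INTO session (id, title, slug)
VALUES (1, 'Compaction test', 'compaction-test');
```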

In `@test/sanity/lib/cleanup.sh`:
- Around line 4-10: Source the cleanup helper and register its functions as a
trap in the runner/entrypoint so the compose stack and temp outputs are removed
on early interrupt: add a one-line "source" of the cleanup library at the top of
the entrypoint script (so cleanup_sanity_outputs and cleanup_docker are
available), then install a trap (e.g., trap 'cleanup_docker;
cleanup_sanity_outputs' EXIT INT TERM) before invoking docker compose up; ensure
the trap runs before starting the compose stack so interrupts always call
cleanup_docker and cleanup_sanity_outputs.
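One way to wire that up, sketched with stub functions standing in for the real cleanup library (cleanup_docker and cleanup_sanity_outputs are the names quoted above):

```shell
#!/usr/bin/env bash
# Sketch: register cleanup as a trap so it runs even on an early abort.
# Stub definitions stand in for sourcing the real lib/cleanup.sh here.
cleanup_docker()         { echo "cleanup: docker compose down"; }
cleanup_sanity_outputs() { echo "cleanup: temp outputs"; }

run_stack() (
  # Subshell body: the EXIT trap fires even when a step below fails.
  trap 'cleanup_docker; cleanup_sanity_outputs' EXIT INT TERM
  echo "starting compose stack..."
  false                              # simulate an early failure mid-run
)
run_stack || true
```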

In `@test/sanity/phases/resilience.sh`:
- Around line 76-78: The test uses a literal true in assert_exit_0 so it always
passes; instead capture the real exit status of the altimate_run_with_turns
invocation and pass that to assert_exit_0. Concretely, run
altimate_run_with_turns "compaction" 3 --continue "continue working", save its
exit code (e.g. into a variable like rc) and then call assert_exit_0 "compaction
did not crash" "$rc" (or otherwise assert that rc == 0) so the assertion
reflects the actual command result; update references to altimate_run_with_turns
and assert_exit_0 accordingly.
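Sketched with stubs, since altimate_run_with_turns and assert_exit_0 are project helpers whose exact signatures are not shown here:

```shell
#!/usr/bin/env bash
# Sketch: assert on the command's real exit status, not a literal `true`.
altimate_run_with_turns() { return 0; }   # stub: pretend the run succeeded
assert_exit_0() {                         # assumed signature: desc, rc
  if [ "$2" -eq 0 ]; then echo "  PASS: $1"; else echo "  FAIL: $1 (rc=$2)"; fi
}

altimate_run_with_turns "compaction" 3 --continue "continue working"
rc=$?                                     # the status the assertion should see
assert_exit_0 "compaction did not crash" "$rc"
```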

In `@test/sanity/phases/smoke-tests.sh`:
- Around line 91-109: The tests unconditionally write "PASS" after calling
altimate_run; change test_postgres and test_snowflake to check altimate_run's
result (capture its exit code or stdout) and only write "PASS" to
"$RESULTS_DIR/postgres" or "$RESULTS_DIR/snowflake" on success, otherwise write
"FAIL" (and optionally include error output) and return a non-zero code;
specifically, after calling altimate_run in test_postgres and test_snowflake,
inspect $? (or use if altimate_run ...; then ...) to decide between echo "PASS"
> "$RESULTS_DIR/..." and echo "FAIL" > "$RESULTS_DIR/..." so failures aren’t
masked.
- Around line 63-67: The test_configure_claude function currently writes "PASS"
unconditionally; change it to capture and check the result of the altimate_run
invocation (use its exit code or captured output from altimate_run) and only
write "PASS" to RESULTS_DIR/configure-claude when altimate_run succeeds,
otherwise write "FAIL" (and optionally include the captured stderr/stdout or
exit code for diagnostics); update references in test_configure_claude to use
the captured status from altimate_run to decide the file contents.

In `@test/sanity/phases/verify-install.sh`:
- Around line 18-19: The SKILL_COUNT assignment using ls | wc -l || echo "0" can
produce a multiline or duplicated result under pipefail; replace it with a
deterministic find-based count to reliably produce a single numeric value.
Update the SKILL_COUNT assignment (the variable set near the assert_ge call) to
use find on ~/.altimate/builtin to locate SKILL.md files (e.g. using
-mindepth/-maxdepth or appropriate path matching) and pipe that to wc -l
(redirecting stderr to /dev/null) so SKILL_COUNT is always a single integer
before calling assert_ge.
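A sketch of the deterministic count, run against a throwaway directory rather than the real ~/.altimate/builtin:

```shell
#!/usr/bin/env bash
# Sketch: find-based counting always yields a single integer, even under pipefail.
SKILLS_DIR=$(mktemp -d)
mkdir -p "$SKILLS_DIR/skill-a" "$SKILLS_DIR/skill-b"
touch "$SKILLS_DIR/skill-a/SKILL.md" "$SKILLS_DIR/skill-b/SKILL.md"

SKILL_COUNT=$(find "$SKILLS_DIR" -mindepth 2 -maxdepth 2 -name SKILL.md 2>/dev/null | wc -l)
echo "$SKILL_COUNT"
```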

In `@test/sanity/pr-tests/generate.sh`:
- Around line 67-70: The current upgrade test emits a no-op command via
emit_test "upgrade-needed" "echo TRIGGER_UPGRADE_TEST" which always exits 0 and
creates a false positive; replace that emitted command so it either (a) emits a
clear skip marker that exits non-zero with an explanatory message (so
run-pr-tests.sh will mark it skipped/failed) or (b) invokes the real upgrade
verification script (e.g., call the project's upgrade-check/run-upgrade-test
entrypoint) instead of echo; update the emit_test invocation for
"upgrade-needed" accordingly so run-pr-tests.sh's eval runs a meaningful check
(refer to the emit_test invocation, the "upgrade-needed" test name, and the echo
TRIGGER_UPGRADE_TEST string to locate the change).

In `@test/sanity/pr-tests/run-pr-tests.sh`:
- Around line 19-24: The post-increment arithmetic ((PASS_COUNT++)) and
((FAIL_COUNT++)) can return non-zero when the counter is 0 and, under set -e,
cause the script to exit early; replace those C-style increments with
assignment-based increments (use PASS_COUNT=$((PASS_COUNT+1)) and
FAIL_COUNT=$((FAIL_COUNT+1))) in the test loop where cmd and name are handled so
the increments always return 0 and allow report_results to run.

---

Nitpick comments:
In `@test/sanity/ci-local.sh`:
- Around line 52-72: Wrap the docker bring-up and test steps in a guarded
section that installs a trap to always run the existing cleanup command (docker
compose ... down --volumes --remove-orphans) on EXIT, so containers are removed
even if any run_step or tests fail; specifically, add a trap that calls the same
cleanup command before running run_step "Docker Services Up" and the subsequent
run_step test blocks (referencing the run_step invocation names and the docker
compose command from the diff). Replace the fixed sleep 10 with a retrying
health-check loop that polls container health or attempts TCP connects to
TEST_*_PORTs with a timeout/attempt limit (used before running the local and
docker E2E run_step tests) so the script waits deterministically for services
instead of sleeping.

In `@test/sanity/Dockerfile`:
- Around line 4-6: The apt-get install command in the Dockerfile RUN line should
include --no-install-recommends to avoid pulling recommended packages; update
the RUN instruction that executes "apt-get update && apt-get install -y git
python3 python3-pip python3-venv curl sqlite3" to add the
--no-install-recommends flag (preserving the existing cleanup "&& rm -rf
/var/lib/apt/lists/*") so the image size and attack surface are reduced.

In `@test/sanity/lib/assert.sh`:
- Around line 19-28: In assert_exit_nonzero, capture the command's exit code and
include it in the FAIL message for consistency: run the command (still
redirecting output to /dev/null 2>&1), store its exit status in a variable
(e.g., rc = $?), then test rc to decide PASS/FAIL and use rc in the failure
echo; refer to the assert_exit_nonzero function and the rc variable when making
the change.
- Around line 80-91: The assert_ge function currently silences numeric errors
via "2>/dev/null", which hides non-numeric input problems; update assert_ge to
validate that both "$actual" and "$expected" are valid integers (e.g., using a
regex like '^[+-]?[0-9]+$' with [[ ... =~ ... ]]) before performing the numeric
comparison, and if validation fails emit a clear error/FAIL message that
includes the offending values and increment FAIL_COUNT instead of hiding the
error; keep the numeric comparison logic (the -ge test) but remove the error
suppression so failures are explicit and the function name assert_ge is still
used.
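A sketch of `assert_ge` with the validation described above; the counters and message format follow the harness convention but the wording is illustrative:

```shell
PASS_COUNT=0
FAIL_COUNT=0

# Assert actual >= expected, rejecting non-integer input explicitly
# instead of hiding arithmetic errors behind 2>/dev/null.
assert_ge() {
  local actual="$1" expected="$2" desc="$3"
  if ! [[ "$actual" =~ ^[+-]?[0-9]+$ && "$expected" =~ ^[+-]?[0-9]+$ ]]; then
    echo "  FAIL: $desc (non-numeric input: actual='$actual' expected='$expected')"
    FAIL_COUNT=$((FAIL_COUNT + 1))
    return 1
  fi
  if [ "$actual" -ge "$expected" ]; then
    echo "  PASS: $desc"
    PASS_COUNT=$((PASS_COUNT + 1))
  else
    echo "  FAIL: $desc (expected >= $expected, got $actual)"
    FAIL_COUNT=$((FAIL_COUNT + 1))
  fi
}
```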

In `@test/sanity/lib/parallel.sh`:
- Around line 4-5: In detect_parallelism(), don't declare and assign cores in
one local statement because the command substitution masks its exit status;
instead, declare local cores on its own, then perform the fallback command
substitution into cores and check the exit status (or use a conditional fallback
chain) so failures of nproc/sysctl are detectable and handled; update references
to the variable name cores accordingly and ensure the function returns a
non-zero status when all detection attempts fail.
- Around line 46-65: The run_parallel function currently batches commands and
waits for all pids in pids[] to finish before starting the next batch, which
leaves slots idle; change run_parallel to use slot-based concurrency: before
launching each command check running count against max_parallel and if at
capacity wait for any single job to finish (use wait -n to reap one child and
capture its exit status into results[], removing its pid from pids[]), falling
back to a small sleep+jobs -r loop if wait -n isn’t available; after dispatching
all commands, wait for remaining pids in pids[] and return success/failure based
on collected results[]. Ensure you update handling of results[] and remove the
batch counter variable if unused.
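A sketch of the slot-based approach, assuming bash 4.3+ for `wait -n`; the pid-pruning details are one possible implementation, not code from the diff:

```shell
# Run commands with at most max_parallel concurrent jobs, reaping one
# finished job at a time instead of waiting for whole batches.
run_parallel() {
  local max_parallel="$1"; shift
  local pids=() failures=0 pid cmd

  for cmd in "$@"; do
    # At capacity: wait -n reaps any one child; prune reaped pids after.
    while [ "${#pids[@]}" -ge "$max_parallel" ]; do
      wait -n || failures=$((failures + 1))
      local alive=()
      for pid in "${pids[@]}"; do
        if kill -0 "$pid" 2>/dev/null; then alive+=("$pid"); fi
      done
      pids=("${alive[@]}")
    done
    bash -c "$cmd" &
    pids+=("$!")
  done

  # Drain the remaining jobs, collecting each exit status.
  for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
  done
  [ "$failures" -eq 0 ]
}
```

Note that `kill -0` still succeeds for unreaped zombies, so only the pid actually reaped by `wait -n` is pruned; the rest are collected by the drain loop.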

In `@test/sanity/phases/resilience.sh`:
- Around line 86-96: The test unsets ANTHROPIC_API_KEY and only restores it
later, which can leave the environment modified if the script exits early;
update the block around SAVED_KEY/ANTHROPIC_API_KEY so the original key is
always restored on exit by either executing the unset/restore inside a subshell
or installing a trap that captures the original value (SAVED_KEY) and resets
ANTHROPIC_API_KEY on EXIT; locate and modify the section that declares
SAVED_KEY, unsets ANTHROPIC_API_KEY, runs altimate, and restores the key to
ensure restoration happens even on unexpected exits.
- Line 18: The lone cd "$WORKDIR" lacks error handling and can silently fail;
update the script to check the exit status of cd (the cd "$WORKDIR" invocation)
and abort the script on failure — for example, immediately follow cd "$WORKDIR"
with a failing-exit branch that prints an explanatory error to stderr and exits
non‑zero (or use a helper die/err function if one exists) so subsequent commands
never run in the wrong directory.

In `@test/sanity/phases/smoke-tests.sh`:
- Around line 52-61: The test functions (e.g., test_discover_mcps) currently use
combined declaration+assignment like `local output=$(get_output ...)` which
masks get_output's exit status; change each to declare the variable first (e.g.,
`local output`) then run the assignment as a separate statement so you can
capture and check the command's exit code (inspect `$?` or use `if !
output=$(get_output ...) ; then ... fi`) and fail the test early when get_output
errors before grepping; apply the same pattern to the other affected functions
listed in the comment.
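A hypothetical sketch of the declare-then-assign pattern (`get_output` here is a stand-in that reads a file path; the real harness helper takes a test name):

```shell
# Stand-in for the harness helper; errors if the output file is missing.
get_output() { cat "$1"; }

test_discover_mcps() {
  local output
  # Assign separately so get_output's exit status is not masked by local.
  if ! output=$(get_output "$1" 2>/dev/null); then
    echo "  FAIL: get_output failed for discover-mcps"
    return 1
  fi
  if echo "$output" | grep -q "mcp"; then
    echo "  PASS: discover-mcps output mentions mcp"
  else
    echo "  FAIL: no mcp servers in output"
    return 1
  fi
}
```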
- Around line 212-233: Remove the unused dead variable `num` computed in the
loop (derived from `TESTS`/`TEST_NAMES`) and simplify the loop to only read
results from `RESULTS_DIR` for each `name` in `TEST_NAMES`; also fix shellcheck
warnings by not mixing declaration+assignment when capturing output from helper
functions (separate declaration and assignment where `get_output` is used) and
add explicit error handling for any `cd` calls used earlier (e.g., check `cd`
return or use pushd/popd) so failures are not masked; locate changes around the
loop that references TEST_NAMES/RESULTS_DIR and update related counters
PASS_COUNT/FAIL_COUNT/SKIP_COUNT as-is.

In `@test/sanity/pr-tests/generate.sh`:
- Line 9: Replace the bare redirection that truncates the file (" >
\"$MANIFEST\"") with an explicit no-op redirection to satisfy shellcheck SC2188;
e.g., use the null command prefix (colon) to truncate the file (: > "$MANIFEST")
so the truncation is explicit—update the occurrence that references the MANIFEST
variable in generate.sh.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 117018ed-b237-4d66-8866-5b2d4252d58c

📥 Commits

Reviewing files that changed from the base of the PR and between 16c38bb and 467b5ab.

📒 Files selected for processing (22)
  • package.json
  • test/sanity/.env.example
  • test/sanity/Dockerfile
  • test/sanity/Dockerfile.upgrade
  • test/sanity/ci-local.sh
  • test/sanity/docker-compose.yml
  • test/sanity/fixtures/broken-config.json
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/fixtures/old-config.json
  • test/sanity/fixtures/test.sql
  • test/sanity/lib/altimate-run.sh
  • test/sanity/lib/assert.sh
  • test/sanity/lib/cleanup.sh
  • test/sanity/lib/parallel.sh
  • test/sanity/phases/resilience.sh
  • test/sanity/phases/smoke-tests.sh
  • test/sanity/phases/verify-install.sh
  • test/sanity/phases/verify-upgrade.sh
  • test/sanity/pr-tests/generate.sh
  • test/sanity/pr-tests/run-pr-tests.sh
  • test/sanity/results/baseline.json
  • test/sanity/run.sh

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (11)
test/sanity/lib/parallel.sh (2)

36-38: Redundant cap check.

The tiered logic above already caps parallel at 6 (line 26), so this condition can never be true. It's harmless defensive code, but could be removed for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/parallel.sh` around lines 36 - 38, Remove the redundant
defensive cap that reassigns the variable "parallel" to 6 (the if block checking
[ "$parallel" -gt 6 ] and setting parallel=6); the tiered logic above already
enforces this cap, so delete that conditional block to simplify the script while
leaving all earlier logic that sets "parallel" unchanged.

5-5: Consider separating declaration and assignment to preserve exit codes.

The static analyzer flags that combining local declaration with command substitution masks the return value. While the fallback || echo "2" mitigates this for nproc/sysctl, separating them is safer practice.

♻️ Proposed fix
 detect_parallelism() {
-  local cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")
+  local cores
+  cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/parallel.sh` at line 5, Split the declaration and assignment
for the variable cores to avoid masking exit codes: declare local cores first
(use the existing variable name "cores"), then assign it with the command
substitution that tries nproc, falls back to sysctl -n hw.ncpu, and finally
echoes "2"; update the line containing the command substitution so it becomes an
assignment to the already-declared local variable, preserving the original
fallback behavior.
test/sanity/pr-tests/generate.sh (2)

9-9: Prefer explicit no-op for file truncation.

While > "$MANIFEST" works in Bash, it's flagged by shellcheck. Using : > "$MANIFEST" or true > "$MANIFEST" is more portable and explicit.

♻️ Proposed fix
-> "$MANIFEST"
+: > "$MANIFEST"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/pr-tests/generate.sh` at line 9, Replace the bare redirection
that truncates the manifest file by using an explicit no-op command before the
redirection: change the line that currently uses > "$MANIFEST" to use a no-op
redirection like : > "$MANIFEST" (or true > "$MANIFEST") so the truncation is
explicit and shellcheck-friendly; update the same occurrence that references the
MANIFEST variable in generate.sh.

57-65: Potential duplicate resilience test registration.

If a PR changes both session/ and config/, two identical resilience.sh entries are added to the manifest, causing redundant execution. Consider deduplicating or combining conditions.

♻️ Combined condition
-# session/compaction/storage changed → resilience
-if echo "$changed" | grep -qE "session/|compaction|storage/"; then
-  emit_test "resilience" "$SCRIPT_DIR/phases/resilience.sh"
-fi
-
-# config changed → backwards compat
-if echo "$changed" | grep -q "config/"; then
-  emit_test "config-compat" "$SCRIPT_DIR/phases/resilience.sh"
-fi
+# session/compaction/storage/config changed → resilience
+if echo "$changed" | grep -qE "session/|compaction|storage/|config/"; then
+  emit_test "resilience" "$SCRIPT_DIR/phases/resilience.sh"
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/pr-tests/generate.sh` around lines 57 - 65, The script can
register the same resilience test twice because both the first grep and the
config check call emit_test with "$SCRIPT_DIR/phases/resilience.sh"; to fix,
ensure emit_test for resilience.sh is only called once by adding a guard:
introduce a local flag (e.g., resilience_emitted) that's set when emit_test
"resilience" "$SCRIPT_DIR/phases/resilience.sh" is called in the
session/compaction/storage branch and check that flag before calling emit_test
again in the config/ branch (or alternatively merge the two grep conditions into
one grep -qE that includes "config/"); update any references to the changed
variable, emit_test invocation, and resilience.sh to implement the
single-emission guard.
test/sanity/phases/smoke-tests.sh (2)

214-214: Unused variable num.

This line calculates `num`, which is always 1 (since TESTS and TEST_NAMES have equal length), but it's never used in the subsequent case block.

🧹 Remove dead code
   result=$(cat "$RESULTS_DIR/$name" 2>/dev/null || echo "MISSING")
-  num=$((${#TESTS[@]} - ${#TEST_NAMES[@]} + 1))
   case "$result" in
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` at line 214, The variable num computed on
the line "num=$((${#TESTS[@]} - ${#TEST_NAMES[@]} + 1))" is unused and always 1;
remove that dead code by deleting the num assignment and any related unused
references, leaving the case block to operate without it (look for the TESTS and
TEST_NAMES arrays and the case block in smoke-tests.sh to verify nothing else
depends on num).

186-207: Consider using run_parallel from parallel.sh.

This manual batch loop duplicates logic from run_parallel. You could simplify by generating command strings for each test function and using run_parallel "$MAX_PARALLEL" "${cmds[@]}".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` around lines 186 - 207, The manual
batching loop using idx, PIDS and wait duplicates existing logic; replace it by
building an array of command strings from the TESTS array (e.g., for each
element of TESTS push "${TESTS[$i]}" into cmds) and call the existing helper
run_parallel with the concurrency limit (run_parallel "$MAX_PARALLEL"
"${cmds[@]}") instead of the idx/PIDS loop; ensure you remove the manual batch
loop and rely on run_parallel to manage background jobs and waiting.
test/sanity/Dockerfile (1)

4-6: Add --no-install-recommends to reduce image size and attack surface.

Same as Dockerfile.upgrade—installing recommended packages is unnecessary for test containers.

🐳 Proposed fix
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     git python3 python3-pip python3-venv curl sqlite3 \
     && rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile` around lines 4 - 6, Update the Dockerfile RUN line
that installs packages (the apt-get update && apt-get install -y \ git python3
python3-pip python3-venv curl sqlite3 \ && rm -rf /var/lib/apt/lists/* command)
to include --no-install-recommends in the apt-get install invocation so apt-get
install uses apt-get install -y --no-install-recommends ... thereby reducing
image size and surface area.
test/sanity/Dockerfile.upgrade (1)

3-5: Add --no-install-recommends to reduce image size and attack surface.

Installing recommended packages increases the image footprint unnecessarily for a test container.

🐳 Proposed fix
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     git python3 python3-pip python3-venv curl sqlite3 \
     && rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile.upgrade` around lines 3 - 5, Update the Dockerfile RUN
line to prevent installing recommended packages: modify the apt-get install
invocation in the RUN command (the line starting with "apt-get update && apt-get
install -y") to include the --no-install-recommends flag so only essential
packages (git, python3, python3-pip, python3-venv, curl, sqlite3) are installed
and the image size/attack surface is reduced.
test/sanity/phases/resilience.sh (2)

98-113: Config file cleanup may be skipped on early exit.

If an error occurs between copying old-config.json (line 103) and the cleanup (line 110), the config file remains. Consider using a trap for reliable cleanup.

♻️ Proposed fix using trap
 # 7. Config backwards compatibility
 echo "  [7/7] Config backwards compat..."
 CONFIG_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/altimate-code"
 mkdir -p "$CONFIG_DIR"
+CONFIG_CLEANUP=""
 if [ -f "$SCRIPT_DIR/fixtures/old-config.json" ]; then
   cp "$SCRIPT_DIR/fixtures/old-config.json" "$CONFIG_DIR/opencode.json"
+  CONFIG_CLEANUP="$CONFIG_DIR/opencode.json"
+  trap 'rm -f "$CONFIG_CLEANUP"' EXIT
   if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
     altimate_run "old-config" "say hello" || true
     assert_not_contains "$(get_output old-config)" "parse error" "old config loads without parse error"
   else
     skip_test "Config backwards compat" "no ANTHROPIC_API_KEY"
   fi
-  rm -f "$CONFIG_DIR/opencode.json"
+  rm -f "$CONFIG_CLEANUP"
+  trap - EXIT
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/resilience.sh` around lines 98 - 113, The test can leave
CONFIG_DIR/opencode.json behind if the script exits early; after copying
"$SCRIPT_DIR/fixtures/old-config.json" to "$CONFIG_DIR/opencode.json" (the cp
call that creates the artifact used by altimate_run and get_output), install a
trap that always removes "$CONFIG_DIR/opencode.json" (and/or unsets it) on EXIT
or ERR so cleanup runs even on failures, set the trap immediately after the cp
and before calling altimate_run/get_output, and optionally clear the trap after
the explicit rm -f to avoid double actions.

88-96: Re-export may differ from original state when ANTHROPIC_API_KEY was unset.

If ANTHROPIC_API_KEY was originally unset (not just empty), SAVED_KEY captures an empty string and line 95 exports an empty variable rather than leaving it unset. This is likely fine for this test context but worth noting.

♻️ More precise state restoration
 # 6. Graceful on missing provider key
 echo "  [6/7] Missing API key handling..."
-SAVED_KEY="${ANTHROPIC_API_KEY:-}"
+if [ -n "${ANTHROPIC_API_KEY+x}" ]; then
+  SAVED_KEY="$ANTHROPIC_API_KEY"
+  HAD_KEY=1
+else
+  HAD_KEY=0
+fi
 unset ANTHROPIC_API_KEY
 OUTPUT=$(timeout 10 altimate run --max-turns 1 --yolo "hello" 2>&1 || true)
 # Should get a clean error, not an unhandled exception / stack trace
 assert_not_contains "$OUTPUT" "TypeError" "no TypeError on missing key"
 assert_not_contains "$OUTPUT" "Cannot read properties" "no unhandled error on missing key"
-if [ -n "$SAVED_KEY" ]; then
+if [ "$HAD_KEY" = "1" ]; then
   export ANTHROPIC_API_KEY="$SAVED_KEY"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/resilience.sh` around lines 88 - 96, The test currently
uses SAVED_KEY="${ANTHROPIC_API_KEY:-}" which loses the distinction between an
originally unset and an empty ANTHROPIC_API_KEY; update the save/restore logic
to track whether ANTHROPIC_API_KEY was set (e.g. save a presence flag like
SAVED_KEY_SET or use parameter-expansion check ${ANTHROPIC_API_KEY+x}) before
unsetting, then only re-export ANTHROPIC_API_KEY if the flag indicates it was
originally set so you restore the previous state exactly; adjust references
around SAVED_KEY, ANTHROPIC_API_KEY and the final export in the resilience.sh
snippet accordingly.
test/sanity/run.sh (1)

41-43: Separate declaration and assignment to avoid masking cat failures.

If OLD_VERSION_FILE exists but cat fails (permissions, I/O error), the failure is masked and OLD_VERSION gets an empty string.

♻️ Proposed fix
   if [ -f "${OLD_VERSION_FILE:-}" ]; then
-    export OLD_VERSION=$(cat "$OLD_VERSION_FILE")
+    OLD_VERSION=$(cat "$OLD_VERSION_FILE")
+    export OLD_VERSION
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/run.sh` around lines 41 - 43, Separate declaration from the
assignment and explicitly check the read/cat exit status so a failing cat
doesn't get masked; instead of doing a blind export+command substitution, first
export or initialize OLD_VERSION, then attempt to read the file (using read -r
OLD_VERSION < "$OLD_VERSION_FILE" or run cat "$OLD_VERSION_FILE") and if that
command fails, log or exit with an error; reference the
OLD_VERSION_FILE/OLD_VERSION variables and ensure the script treats a failing
read/cat as an error rather than quietly leaving OLD_VERSION empty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/sanity/Dockerfile`:
- Around line 27-30: The Dockerfile currently overwrites the existing
package.json with only dependencies, which loses metadata used by
postinstall.mjs; change the RUN step that writes package.json to instead merge
the new dependency into the existing
/home/testuser/.altimate-install/package.json (e.g., use jq or a short node
script to read the existing package.json, set
dependencies["@altimateai/altimate-core"]="latest" while preserving other
fields, write back the merged JSON, then run bun install), ensuring
postinstall.mjs can still read the original metadata.

In `@test/sanity/phases/resilience.sh`:
- Line 18: The script currently runs cd "$WORKDIR" without checking success;
update the step that changes to WORKDIR so the failure is detected and handled
(e.g., test the exit status of the cd command and on failure print an error
mentioning WORKDIR and exit with non‑zero status). Locate the cd "$WORKDIR"
invocation in the resilience.sh flow and replace it with a guarded form that
aborts the script when cd fails to prevent running subsequent tests in the wrong
directory.

In `@test/sanity/phases/smoke-tests.sh`:
- Line 26: The cd "$WORKDIR" call (and every other cd in the test functions)
must check for failure and abort rather than continuing silently; change each cd
invocation to detect a non‑zero exit and immediately log an error and exit (for
example, test the return value and call exit 1 or return non‑zero from the
function) so the script stops if the directory change fails and does not
continue executing in the wrong location.

---

Nitpick comments:
In `@test/sanity/Dockerfile`:
- Around line 4-6: Update the Dockerfile RUN line that installs packages (the
apt-get update && apt-get install -y \ git python3 python3-pip python3-venv curl
sqlite3 \ && rm -rf /var/lib/apt/lists/* command) to include
--no-install-recommends in the apt-get install invocation so apt-get install
uses apt-get install -y --no-install-recommends ... thereby reducing image size
and surface area.

In `@test/sanity/Dockerfile.upgrade`:
- Around line 3-5: Update the Dockerfile RUN line to prevent installing
recommended packages: modify the apt-get install invocation in the RUN command
(the line starting with "apt-get update && apt-get install -y") to include the
--no-install-recommends flag so only essential packages (git, python3,
python3-pip, python3-venv, curl, sqlite3) are installed and the image
size/attack surface is reduced.

In `@test/sanity/lib/parallel.sh`:
- Around line 36-38: Remove the redundant defensive cap that reassigns the
variable "parallel" to 6 (the if block checking [ "$parallel" -gt 6 ] and
setting parallel=6); the tiered logic above already enforces this cap, so delete
that conditional block to simplify the script while leaving all earlier logic
that sets "parallel" unchanged.
- Line 5: Split the declaration and assignment for the variable cores to avoid
masking exit codes: declare local cores first (use the existing variable name
"cores"), then assign it with the command substitution that tries nproc, falls
back to sysctl -n hw.ncpu, and finally echoes "2"; update the line containing
the command substitution so it becomes an assignment to the already-declared
local variable, preserving the original fallback behavior.

In `@test/sanity/phases/resilience.sh`:
- Around line 98-113: The test can leave CONFIG_DIR/opencode.json behind if the
script exits early; after copying "$SCRIPT_DIR/fixtures/old-config.json" to
"$CONFIG_DIR/opencode.json" (the cp call that creates the artifact used by
altimate_run and get_output), install a trap that always removes
"$CONFIG_DIR/opencode.json" (and/or unsets it) on EXIT or ERR so cleanup runs
even on failures, set the trap immediately after the cp and before calling
altimate_run/get_output, and optionally clear the trap after the explicit rm -f
to avoid double actions.
- Around line 88-96: The test currently uses SAVED_KEY="${ANTHROPIC_API_KEY:-}"
which loses the distinction between an originally unset and an empty
ANTHROPIC_API_KEY; update the save/restore logic to track whether
ANTHROPIC_API_KEY was set (e.g. save a presence flag like SAVED_KEY_SET or use
parameter-expansion check ${ANTHROPIC_API_KEY+x}) before unsetting, then only
re-export ANTHROPIC_API_KEY if the flag indicates it was originally set so you
restore the previous state exactly; adjust references around SAVED_KEY,
ANTHROPIC_API_KEY and the final export in the resilience.sh snippet accordingly.

In `@test/sanity/phases/smoke-tests.sh`:
- Line 214: The variable num computed on the line "num=$((${#TESTS[@]} -
${#TEST_NAMES[@]} + 1))" is unused and always 1; remove that dead code by
deleting the num assignment and any related unused references, leaving the case
block to operate without it (look for the TESTS and TEST_NAMES arrays and the
case block in smoke-tests.sh to verify nothing else depends on num).
- Around line 186-207: The manual batching loop using idx, PIDS and wait
duplicates existing logic; replace it by building an array of command strings
from the TESTS array (e.g., for each element of TESTS push "${TESTS[$i]}" into
cmds) and call the existing helper run_parallel with the concurrency limit
(run_parallel "$MAX_PARALLEL" "${cmds[@]}") instead of the idx/PIDS loop; ensure
you remove the manual batch loop and rely on run_parallel to manage background
jobs and waiting.

In `@test/sanity/pr-tests/generate.sh`:
- Line 9: Replace the bare redirection that truncates the manifest file by using
an explicit no-op command before the redirection: change the line that currently
uses > "$MANIFEST" to use a no-op redirection like : > "$MANIFEST" (or true >
"$MANIFEST") so the truncation is explicit and shellcheck-friendly; update the
same occurrence that references the MANIFEST variable in generate.sh.
- Around line 57-65: The script can register the same resilience test twice
because both the first grep and the config check call emit_test with
"$SCRIPT_DIR/phases/resilience.sh"; to fix, ensure emit_test for resilience.sh
is only called once by adding a guard: introduce a local flag (e.g.,
resilience_emitted) that's set when emit_test "resilience"
"$SCRIPT_DIR/phases/resilience.sh" is called in the session/compaction/storage
branch and check that flag before calling emit_test again in the config/ branch
(or alternatively merge the two grep conditions into one grep -qE that includes
"config/"); update any references to the changed variable, emit_test invocation,
and resilience.sh to implement the single-emission guard.

In `@test/sanity/run.sh`:
- Around line 41-43: Separate declaration from the assignment and explicitly
check the read/cat exit status so a failing cat doesn't get masked; instead of
doing a blind export+command substitution, first export or initialize
OLD_VERSION, then attempt to read the file (using read -r OLD_VERSION <
"$OLD_VERSION_FILE" or run cat "$OLD_VERSION_FILE") and if that command fails,
log or exit with an error; reference the OLD_VERSION_FILE/OLD_VERSION variables
and ensure the script treats a failing read/cat as an error rather than quietly
leaving OLD_VERSION empty.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2f77a047-9947-47fc-a80a-082e0786e0d8

📥 Commits

Reviewing files that changed from the base of the PR and between 467b5ab and 09ee72c.

📒 Files selected for processing (22)
  • package.json
  • test/sanity/.env.example
  • test/sanity/Dockerfile
  • test/sanity/Dockerfile.upgrade
  • test/sanity/ci-local.sh
  • test/sanity/docker-compose.yml
  • test/sanity/fixtures/broken-config.json
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/fixtures/old-config.json
  • test/sanity/fixtures/test.sql
  • test/sanity/lib/altimate-run.sh
  • test/sanity/lib/assert.sh
  • test/sanity/lib/cleanup.sh
  • test/sanity/lib/parallel.sh
  • test/sanity/phases/resilience.sh
  • test/sanity/phases/smoke-tests.sh
  • test/sanity/phases/verify-install.sh
  • test/sanity/phases/verify-upgrade.sh
  • test/sanity/pr-tests/generate.sh
  • test/sanity/pr-tests/run-pr-tests.sh
  • test/sanity/results/baseline.json
  • test/sanity/run.sh
✅ Files skipped from review due to trivial changes (9)
  • test/sanity/fixtures/old-config.json
  • test/sanity/fixtures/test.sql
  • test/sanity/.env.example
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/results/baseline.json
  • test/sanity/lib/cleanup.sh
  • test/sanity/phases/verify-install.sh
  • test/sanity/lib/altimate-run.sh
  • test/sanity/lib/assert.sh
🚧 Files skipped from review as they are similar to previous changes (6)
  • test/sanity/fixtures/broken-config.json
  • test/sanity/docker-compose.yml
  • test/sanity/pr-tests/run-pr-tests.sh
  • package.json
  • test/sanity/phases/verify-upgrade.sh
  • test/sanity/ci-local.sh

@kulvirgit kulvirgit force-pushed the feat/sanity-test-harness branch from 09ee72c to a1bcfff Compare March 25, 2026 02:10
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (4)
test/sanity/Dockerfile (2)

32-35: ⚠️ Potential issue | 🔴 Critical

Hardcoded architecture will fail on x86_64 hosts.

Line 34 hardcodes altimate-code-linux-arm64, which will fail when building or running on x86_64 machines. Use dynamic architecture detection to support both arm64 and x86_64.

Proposed fix using Docker's TARGETARCH
+ARG TARGETARCH
 # Link binary to PATH and copy skills to ~/.altimate/builtin/
 RUN mkdir -p /home/testuser/.local/bin && \
-    cp /home/testuser/.altimate-install/dist/@altimateai/altimate-code-linux-arm64/bin/altimate /home/testuser/.local/bin/altimate && \
+    ARCH=$([ "$TARGETARCH" = "amd64" ] && echo "x64" || echo "arm64") && \
+    cp /home/testuser/.altimate-install/dist/@altimateai/altimate-code-linux-${ARCH}/bin/altimate /home/testuser/.local/bin/altimate && \
     chmod +x /home/testuser/.local/bin/altimate && \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile` around lines 32 - 35, The Dockerfile copies a
hardcoded arch-specific binary path
(/home/testuser/.altimate-install/dist/@altimateai/altimate-code-linux-arm64/bin/altimate),
which will fail on x86_64; change the RUN step that copies into
/home/testuser/.local/bin (and the referenced altimate binary name) to use
Docker build-time architecture detection (e.g., TARGETARCH or similar) to select
the correct dist folder dynamically so both arm64 and amd64 are supported;
update the COPY/CP source path construction to reference the chosen architecture
variable and keep the chmod +x and mkdir logic intact for
/home/testuser/.local/bin.

27-30: ⚠️ Potential issue | 🟡 Minor

Overwrites package.json, losing original metadata.

Line 29 replaces the package.json copied at line 24 with a minimal stub. If postinstall.mjs reads version info or other metadata from package.json, this could cause issues.

Alternative: merge dependencies with jq
 # Install altimate-core native binding (required at runtime)
 RUN cd /home/testuser/.altimate-install && \
-    echo '{"dependencies":{"@altimateai/altimate-core":"latest"}}' > package.json && \
+    jq '. + {"dependencies": (.dependencies // {} | . + {"@altimateai/altimate-core": "latest"})}' package.json > package.json.tmp && \
+    mv package.json.tmp package.json && \
     bun install 2>&1 | tail -5

This requires adding jq to the apt-get install list.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile` around lines 27 - 30, The Dockerfile RUN step
overwrites the existing package.json at /home/testuser/.altimate-install (the
echo '{"dependencies":...}' > package.json) which can strip metadata used by
postinstall.mjs; instead modify that RUN to merge or update only the
dependencies field—e.g., read the existing package.json and merge in
"@altimateai/altimate-core":"latest" (using jq, a short Node one-liner, or
npm/yarn package set) so you preserve version, name, and other metadata; target
the RUN that performs cd /home/testuser/.altimate-install and replace the
overwrite with a merge/update operation on package.json.
test/sanity/phases/smoke-tests.sh (1)

26-26: ⚠️ Potential issue | 🟡 Minor

Add error handling for cd command.

If mktemp succeeds but cd fails (e.g., permission issues), the script continues in an unexpected directory, causing git init and subsequent operations to run in the wrong location.

🔧 Proposed fix
 WORKDIR=$(mktemp -d /tmp/sanity-workdir-XXXXXX)
-cd "$WORKDIR"
+cd "$WORKDIR" || { echo "FAIL: cannot cd to $WORKDIR"; exit 1; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` at line 26, Add explicit error handling
after the cd "$WORKDIR" command: verify that the chdir succeeded and abort the
script (with an explanatory error message) if it failed, so that a successful
mktemp won't be followed by running git init and other operations in the wrong
directory; update the script where cd "$WORKDIR" appears to check its exit
status and exit non‑zero on failure (or use an equivalent guard like enabling
strict mode) so subsequent commands never run in the wrong location.
test/sanity/phases/resilience.sh (1)

18-18: ⚠️ Potential issue | 🟡 Minor

Add error handling for cd command.

If cd "$WORKDIR" fails, subsequent git commands and tests run in an unexpected directory, causing misleading failures.

🔧 Proposed fix
 WORKDIR=$(mktemp -d /tmp/sanity-resilience-XXXXXX)
-cd "$WORKDIR"
+cd "$WORKDIR" || { echo "FAIL: cannot cd to $WORKDIR"; exit 1; }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/resilience.sh` at line 18, Ensure the script checks and
handles failure of the directory change: after the cd "$WORKDIR" command
validate its exit status and abort with a clear error if it fails (e.g., replace
or augment cd "$WORKDIR" with a guarded form that prints an error and exits
non‑zero). Update the resilience.sh logic around the cd "$WORKDIR" call so
subsequent git commands and tests only run when the working directory was
successfully changed.
🧹 Nitpick comments (9)
test/sanity/lib/assert.sh (1)

8-17: The $? in the FAIL message may not reflect the command's exit code.

In assert_exit_0, the $? interpolated into the FAIL message is fragile: $? reflects the status of the most recently executed command, so it only equals the original command's exit code as long as nothing runs between the if test and the echo. Any future edit to the branch (or a shell that handles the condition differently) would silently change the reported code, so the status should be captured into a variable immediately.

Suggested fix: capture exit code explicitly
 assert_exit_0() {
   local desc="$1"; shift
-  if "$@" >/dev/null 2>&1; then
+  local rc=0
+  "$@" >/dev/null 2>&1 || rc=$?
+  if [ "$rc" -eq 0 ]; then
     echo "  PASS: $desc"
     PASS_COUNT=$((PASS_COUNT + 1))
   else
-    echo "  FAIL: $desc (exit code $?)"
+    echo "  FAIL: $desc (exit code $rc)"
     FAIL_COUNT=$((FAIL_COUNT + 1))
   fi
 }
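The capture pattern from the diff can be exercised standalone; a minimal sketch in which the counter variables mirror assert.sh and the demo invocations at the bottom are illustrative:

```shell
#!/usr/bin/env bash
# Minimal standalone sketch of the corrected assert_exit_0. PASS_COUNT and
# FAIL_COUNT mirror assert.sh; the demo commands below are illustrative.
PASS_COUNT=0
FAIL_COUNT=0

assert_exit_0() {
  local desc="$1"; shift
  local rc=0
  "$@" >/dev/null 2>&1 || rc=$?   # capture the status immediately
  if [ "$rc" -eq 0 ]; then
    echo "  PASS: $desc"
    PASS_COUNT=$((PASS_COUNT + 1))
  else
    echo "  FAIL: $desc (exit code $rc)"   # rc is stable here, unlike $?
    FAIL_COUNT=$((FAIL_COUNT + 1))
  fi
}

assert_exit_0 "true exits 0" true
assert_exit_0 "exit 3 is reported" bash -c 'exit 3'
```

Because rc is assigned on the same line the command runs, nothing inserted later into either branch can clobber the reported code.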
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/assert.sh` around lines 8 - 17, The FAIL message in
assert_exit_0 can show an incorrect exit code because `$?` is not captured
immediately; modify assert_exit_0 to capture the command's exit status into a
local variable (e.g., rc) right after running the command (run the command with
"$@" >/dev/null 2>&1; rc=$?) and then use that rc in the FAIL echo and any logic
that increments FAIL_COUNT, ensuring you reference the function name
assert_exit_0 and the captured variable (rc) when updating output.
test/sanity/Dockerfile (1)

4-6: Consider adding --no-install-recommends to reduce image size.

Suggested fix
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     git python3 python3-pip python3-venv curl sqlite3 \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile` around lines 4 - 6, Update the Dockerfile RUN
instruction that calls apt-get install so it uses apt-get install
--no-install-recommends -y (i.e., change the RUN line installing git python3
python3-pip python3-venv curl sqlite3 to include --no-install-recommends) while
keeping the apt-get update and the trailing rm -rf /var/lib/apt/lists/* intact;
locate the RUN command in the Dockerfile containing the apt-get update &&
apt-get install -y sequence and insert --no-install-recommends into the install
command.
test/sanity/lib/parallel.sh (1)

4-20: Consider separating declaration and assignment for SC2155 compliance.

While the current code works due to the || fallback pattern, separating declaration and assignment is a best practice to avoid masking return values in edge cases.

Suggested fix for lines 5 and 10
 detect_parallelism() {
-  local cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")
+  local cores
+  cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")

   # Try multiple methods for RAM detection
   local ram_gb=""
   # Linux: free
-  ram_gb=$(free -g 2>/dev/null | awk '/Mem:/{print $2}')
+  ram_gb=$(free -g 2>/dev/null | awk '/Mem:/{print $2}') || true
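Put together, the SC2155-compliant detection reads as below; a sketch that keeps the same fallback chain (the tiers that map cores/RAM to a parallelism level live elsewhere in the file and are omitted):

```shell
#!/usr/bin/env bash
# SC2155-compliant sketch: declare first, assign second, so a failing command
# substitution is not masked by the exit status of the `local` builtin.
detect_parallelism() {
  local cores
  cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")

  local ram_gb
  ram_gb=$(free -g 2>/dev/null | awk '/Mem:/{print $2}') || true
  ram_gb="${ram_gb:-8}"   # default when no detection method worked

  echo "cores=$cores ram_gb=$ram_gb"
}

detect_parallelism
```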
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/parallel.sh` around lines 4 - 20, The function
detect_parallelism currently declares and assigns variables inline (e.g., local
cores=$(nproc ... ) and local ram_gb="") which can trigger shellcheck SC2155;
change to separate declaration and assignment: declare local cores and later
assign cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2"),
and declare local ram_gb then assign to ram_gb in the subsequent detection steps
(free, /proc/meminfo awk, sysctl) while preserving the existing fallback logic
and final ram_gb="${ram_gb:-8}"; update references inside detect_parallelism to
use these separated declarations to avoid masking return values.
test/sanity/Dockerfile.upgrade (1)

3-5: Consider adding --no-install-recommends to reduce image size.

This is a minor optimization that avoids pulling in unnecessary recommended packages.

Suggested fix
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     git python3 python3-pip python3-venv curl sqlite3 \
     && rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile.upgrade` around lines 3 - 5, The apt-get install
invocation in the RUN command (the line starting with "RUN apt-get update &&
apt-get install -y \ git python3 python3-pip python3-venv curl sqlite3 \")
should include the --no-install-recommends flag to avoid installing unnecessary
recommended packages and reduce image size; update that RUN line to add
--no-install-recommends immediately after apt-get install -y while keeping the
existing package list and the cleanup of /var/lib/apt/lists/*.
test/sanity/run.sh (1)

44-48: Separate declaration and assignment to avoid masking errors (SC2155).

If cat "$OLD_VERSION_FILE" fails, the error is masked by the combined export. This could lead to silent failures.

Suggested fix
   # Read old version from file set by Dockerfile.upgrade
   if [ -f "${OLD_VERSION_FILE:-}" ]; then
-    export OLD_VERSION=$(cat "$OLD_VERSION_FILE")
+    OLD_VERSION=$(cat "$OLD_VERSION_FILE")
+    export OLD_VERSION
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/run.sh` around lines 44 - 48, Separate declaration and assignment
for OLD_VERSION to avoid masking errors: first assign OLD_VERSION with
OLD_VERSION=$(cat "$OLD_VERSION_FILE") (checking for failures) then export
OLD_VERSION in a separate statement; keep the conditional block that checks [ -f
"${OLD_VERSION_FILE:-}" ] and leave the subsequent run_phase call unchanged
(referencing OLD_VERSION_FILE, OLD_VERSION, and run_phase to locate the code).
test/sanity/pr-tests/generate.sh (1)

9-9: Add a no-op command for the redirection.

ShellCheck SC2188: A bare redirection > "$MANIFEST" without a command can be confusing. Use : > (colon is a no-op builtin) for clarity.

Suggested fix
-> "$MANIFEST"
+: > "$MANIFEST"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/pr-tests/generate.sh` at line 9, Replace the bare redirection
line containing "> \"$MANIFEST\"" with a no-op builtin redirection to make the
intent explicit; specifically, use the colon no-op (:) before the redirection so
the shell performs the truncate/create without relying on an empty command.
Update the line that references "$MANIFEST" in generate.sh accordingly.
test/sanity/phases/smoke-tests.sh (3)

53-53: Multiple cd commands lack error handling.

Several test functions use cd "$WORKDIR" without error handling. While tests run in subshells (limiting blast radius), a failing cd could still produce misleading results.

♻️ Proposed fix pattern
 test_discover_mcps() {
-  cd "$WORKDIR"
+  cd "$WORKDIR" || return 1
   altimate_run "discover-mcps" --command discover-and-add-mcps "list"

Apply the same || return 1 pattern to test_sql_analyze, test_duckdb, test_builder, test_analyst, test_bad_command, and test_discover.

Also applies to: 75-75, 86-86, 127-127, 138-138, 149-149, 160-160

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` at line 53, Several test functions call cd
"$WORKDIR" without checking for failure; update each such invocation in
test_sql_analyze, test_duckdb, test_builder, test_analyst, test_bad_command, and
test_discover to append a failure guard (e.g., "|| return 1") so the function
exits immediately if the cd fails, ensuring subsequent test steps don't run in
the wrong directory.

227-248: Unused variable num in results loop.

Line 229 computes num but it's never used. This appears to be leftover from an earlier iteration.

♻️ Remove unused variable
 for name in "${TEST_NAMES[@]}"; do
   result=$(cat "$RESULTS_DIR/$name" 2>/dev/null || echo "MISSING")
-  num=$((${#TESTS[@]} - ${#TEST_NAMES[@]} + 1))
   case "$result" in
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` around lines 227 - 248, Remove the unused
variable assignment "num=$((${#TESTS[@]} - ${#TEST_NAMES[@]} + 1))" from the
results loop in the smoke-tests.sh script; locate the for loop that iterates
over TEST_NAMES and the case on result (references: TEST_NAMES, RESULTS_DIR,
PASS_COUNT, FAIL_COUNT, SKIP_COUNT) and delete that num calculation line so
there are no unused variables left in the loop.

36-49: Dead code: run_smoke_test function is never called.

This wrapper function is defined but the test execution loop at lines 211-214 directly invokes the test functions (${TESTS[$i]} &) without using run_smoke_test.

Either remove the unused function or integrate it into the test execution:

♻️ Option 1: Remove unused function
-run_smoke_test() {
-  local num="$1"
-  local name="$2"
-  local result_file="$RESULTS_DIR/$name"
-  shift 2
-
-  echo "  [$num] $name..."
-
-  # Run the test
-  local exit_code=0
-  "$@" || exit_code=$?
-
-  echo "$exit_code" > "$result_file"
-}
-
 # Define all test functions
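For completeness, Option 2 can be sketched as below; the wrapper and array names follow the review, while the registered tests are illustrative stand-ins:

```shell
#!/usr/bin/env bash
# Option 2 sketch: background each test through run_smoke_test so exit codes
# land in RESULTS_DIR. The TESTS/TEST_NAMES contents here are stand-ins.
RESULTS_DIR=$(mktemp -d)

run_smoke_test() {
  local num="$1"
  local name="$2"
  local result_file="$RESULTS_DIR/$name"
  shift 2

  echo "  [$num] $name..."

  local exit_code=0
  "$@" || exit_code=$?

  echo "$exit_code" > "$result_file"
}

TEST_NAMES=(always_passes always_fails)
TESTS=(true false)

for i in "${!TESTS[@]}"; do
  run_smoke_test "$i" "${TEST_NAMES[$i]}" "${TESTS[$i]}" &
done
wait
```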
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` around lines 36 - 49, The run_smoke_test
wrapper is dead code; either delete the unused function or update the test-run
loop to call it. To integrate, change the loop that currently invokes tests via
"${TESTS[$i]} &" to call run_smoke_test with the index, test name, and command:
run_smoke_test "$i" "${TESTS[$i]}" "${TESTS_CMD[$i]}" & (or expand the test
command/args so run_smoke_test receives the command and args), ensuring
RESULTS_DIR is used for result_file and that run_smoke_test's shift behavior
consumes the first two params (num and name) before executing the command;
alternatively simply remove the run_smoke_test function if you prefer the direct
background invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@package.json`:
- Line 22: The package.json script "sanity:upgrade" references a non-existent
override file test/sanity/docker-compose.upgrade.yml which will cause the script
to fail; fix by either adding the missing override file (create
test/sanity/docker-compose.upgrade.yml with the expected service overrides for
the upgrade scenario) or update the "sanity:upgrade" script entry to remove or
correct the reference to docker-compose.upgrade.yml so it only uses existing
files (adjust the script name "sanity:upgrade" in package.json accordingly).

In `@test/sanity/phases/resilience.sh`:
- Line 78: The line uses the shell-only keyword `local` outside a function,
causing a syntax error; replace the top-level declaration `local
comp_output=$(get_output "compaction")` with a plain assignment
`comp_output=$(get_output "compaction")` (or move that assignment inside a
function if you intended function-local scope), referencing the get_output call
and the comp_output variable so the brittle `local` usage is removed.

---

Duplicate comments:
In `@test/sanity/Dockerfile`:
- Around line 32-35: The Dockerfile copies a hardcoded arch-specific binary path
(/home/testuser/.altimate-install/dist/@altimateai/altimate-code-linux-arm64/bin/altimate),
which will fail on x86_64; change the RUN step that copies into
/home/testuser/.local/bin (and the referenced altimate binary name) to use
Docker build-time architecture detection (e.g., TARGETARCH or similar) to select
the correct dist folder dynamically so both arm64 and amd64 are supported;
update the COPY/CP source path construction to reference the chosen architecture
variable and keep the chmod +x and mkdir logic intact for
/home/testuser/.local/bin.
- Around line 27-30: The Dockerfile RUN step overwrites the existing
package.json at /home/testuser/.altimate-install (the echo
'{"dependencies":...}' > package.json) which can strip metadata used by
postinstall.mjs; instead modify that RUN to merge or update only the
dependencies field—e.g., read the existing package.json and merge in
"@altimateai/altimate-core":"latest" (using jq, a short Node one-liner, or
npm/yarn package set) so you preserve version, name, and other metadata; target
the RUN that performs cd /home/testuser/.altimate-install and replace the
overwrite with a merge/update operation on package.json.

In `@test/sanity/phases/resilience.sh`:
- Line 18: Ensure the script checks and handles failure of the directory change:
after the cd "$WORKDIR" command validate its exit status and abort with a clear
error if it fails (e.g., replace or augment cd "$WORKDIR" with a guarded form
that prints an error and exits non‑zero). Update the resilience.sh logic around
the cd "$WORKDIR" call so subsequent git commands and tests only run when the
working directory was successfully changed.

In `@test/sanity/phases/smoke-tests.sh`:
- Line 26: Add explicit error handling after the cd "$WORKDIR" command: verify
that the chdir succeeded and abort the script (with an explanatory error
message) if it failed, so that a successful mktemp won't be followed by running
git init and other operations in the wrong directory; update the script where cd
"$WORKDIR" appears to check its exit status and exit non‑zero on failure (or use
an equivalent guard like enabling strict mode) so subsequent commands never run
in the wrong location.

---

Nitpick comments:
In `@test/sanity/Dockerfile`:
- Around line 4-6: Update the Dockerfile RUN instruction that calls apt-get
install so it uses apt-get install --no-install-recommends -y (i.e., change the
RUN line installing git python3 python3-pip python3-venv curl sqlite3 to include
--no-install-recommends) while keeping the apt-get update and the trailing rm
-rf /var/lib/apt/lists/* intact; locate the RUN command in the Dockerfile
containing the apt-get update && apt-get install -y sequence and insert
--no-install-recommends into the install command.

In `@test/sanity/Dockerfile.upgrade`:
- Around line 3-5: The apt-get install invocation in the RUN command (the line
starting with "RUN apt-get update && apt-get install -y \ git python3
python3-pip python3-venv curl sqlite3 \") should include the
--no-install-recommends flag to avoid installing unnecessary recommended
packages and reduce image size; update that RUN line to add
--no-install-recommends immediately after apt-get install -y while keeping the
existing package list and the cleanup of /var/lib/apt/lists/*.

In `@test/sanity/lib/assert.sh`:
- Around line 8-17: The FAIL message in assert_exit_0 can show an incorrect exit
code because `$?` is not captured immediately; modify assert_exit_0 to capture
the command's exit status into a local variable (e.g., rc) right after running
the command (run the command with "$@" >/dev/null 2>&1; rc=$?) and then use that
rc in the FAIL echo and any logic that increments FAIL_COUNT, ensuring you
reference the function name assert_exit_0 and the captured variable (rc) when
updating output.

In `@test/sanity/lib/parallel.sh`:
- Around line 4-20: The function detect_parallelism currently declares and
assigns variables inline (e.g., local cores=$(nproc ... ) and local ram_gb="")
which can trigger shellcheck SC2155; change to separate declaration and
assignment: declare local cores and later assign cores=$(nproc 2>/dev/null ||
sysctl -n hw.ncpu 2>/dev/null || echo "2"), and declare local ram_gb then assign
to ram_gb in the subsequent detection steps (free, /proc/meminfo awk, sysctl)
while preserving the existing fallback logic and final ram_gb="${ram_gb:-8}";
update references inside detect_parallelism to use these separated declarations
to avoid masking return values.

In `@test/sanity/phases/smoke-tests.sh`:
- Line 53: Several test functions call cd "$WORKDIR" without checking for
failure; update each such invocation in test_sql_analyze, test_duckdb,
test_builder, test_analyst, test_bad_command, and test_discover to append a
failure guard (e.g., "|| return 1") so the function exits immediately if the cd
fails, ensuring subsequent test steps don't run in the wrong directory.
- Around line 227-248: Remove the unused variable assignment
"num=$((${#TESTS[@]} - ${#TEST_NAMES[@]} + 1))" from the results loop in the
smoke-tests.sh script; locate the for loop that iterates over TEST_NAMES and the
case on result (references: TEST_NAMES, RESULTS_DIR, PASS_COUNT, FAIL_COUNT,
SKIP_COUNT) and delete that num calculation line so there are no unused
variables left in the loop.
- Around line 36-49: The run_smoke_test wrapper is dead code; either delete the
unused function or update the test-run loop to call it. To integrate, change the
loop that currently invokes tests via "${TESTS[$i]} &" to call run_smoke_test
with the index, test name, and command: run_smoke_test "$i" "${TESTS[$i]}"
"${TESTS_CMD[$i]}" & (or expand the test command/args so run_smoke_test receives
the command and args), ensuring RESULTS_DIR is used for result_file and that
run_smoke_test's shift behavior consumes the first two params (num and name)
before executing the command; alternatively simply remove the run_smoke_test
function if you prefer the direct background invocation.

In `@test/sanity/pr-tests/generate.sh`:
- Line 9: Replace the bare redirection line containing "> \"$MANIFEST\"" with a
no-op builtin redirection to make the intent explicit; specifically, use the
colon no-op (:) before the redirection so the shell performs the truncate/create
without relying on an empty command. Update the line that references "$MANIFEST"
in generate.sh accordingly.

In `@test/sanity/run.sh`:
- Around line 44-48: Separate declaration and assignment for OLD_VERSION to
avoid masking errors: first assign OLD_VERSION with OLD_VERSION=$(cat
"$OLD_VERSION_FILE") (checking for failures) then export OLD_VERSION in a
separate statement; keep the conditional block that checks [ -f
"${OLD_VERSION_FILE:-}" ] and leave the subsequent run_phase call unchanged
(referencing OLD_VERSION_FILE, OLD_VERSION, and run_phase to locate the code).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ea7f24e5-b1af-4282-ab4f-cf7311a2a766

📥 Commits

Reviewing files that changed from the base of the PR and between 09ee72c and 7700d48.

📒 Files selected for processing (22)
  • package.json
  • test/sanity/.env.example
  • test/sanity/Dockerfile
  • test/sanity/Dockerfile.upgrade
  • test/sanity/ci-local.sh
  • test/sanity/docker-compose.yml
  • test/sanity/fixtures/broken-config.json
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/fixtures/old-config.json
  • test/sanity/fixtures/test.sql
  • test/sanity/lib/altimate-run.sh
  • test/sanity/lib/assert.sh
  • test/sanity/lib/cleanup.sh
  • test/sanity/lib/parallel.sh
  • test/sanity/phases/resilience.sh
  • test/sanity/phases/smoke-tests.sh
  • test/sanity/phases/verify-install.sh
  • test/sanity/phases/verify-upgrade.sh
  • test/sanity/pr-tests/generate.sh
  • test/sanity/pr-tests/run-pr-tests.sh
  • test/sanity/results/baseline.json
  • test/sanity/run.sh
✅ Files skipped from review due to trivial changes (9)
  • test/sanity/fixtures/old-config.json
  • test/sanity/.env.example
  • test/sanity/results/baseline.json
  • test/sanity/fixtures/broken-config.json
  • test/sanity/lib/cleanup.sh
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/docker-compose.yml
  • test/sanity/phases/verify-install.sh
  • test/sanity/fixtures/test.sql
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/sanity/lib/altimate-run.sh
  • test/sanity/ci-local.sh
  • test/sanity/phases/verify-upgrade.sh

@kulvirgit kulvirgit force-pushed the feat/sanity-test-harness branch from 7700d48 to c977474 Compare March 25, 2026 02:18

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (2)
package.json (1)

22-22: ⚠️ Potential issue | 🟡 Minor

docker-compose.upgrade.yml still missing.

This script references test/sanity/docker-compose.upgrade.yml which does not exist. The script will fail at runtime. Previously acknowledged as "planned" — consider either adding the file or removing/commenting out this script until it's ready.

#!/bin/bash
# Verify if docker-compose.upgrade.yml exists
if [ -f "test/sanity/docker-compose.upgrade.yml" ]; then
  echo "✓ docker-compose.upgrade.yml exists"
else
  echo "✗ docker-compose.upgrade.yml NOT FOUND"
  ls -la test/sanity/docker-compose*.yml 2>/dev/null || echo "No docker-compose files found"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@package.json` at line 22, The package.json npm script "sanity:upgrade"
references a missing docker-compose file (docker-compose.upgrade.yml) which will
cause failures; fix by either adding the missing docker-compose.upgrade.yml
under the test/sanity test fixtures (ensure it contains the intended upgrade
orchestration) or remove/comment/disable the "sanity:upgrade" script from
package.json until that file and its contents are implemented; update or run the
provided verification shell snippet to confirm the file exists if you add it.
test/sanity/Dockerfile (1)

32-35: ⚠️ Potential issue | 🟠 Major

Hardcoded arm64 architecture will fail on x86_64 hosts.

Line 34 still hardcodes altimate-code-linux-arm64. This was flagged in a previous review and marked as "Fixed," but the code remains unchanged. The path will not exist on x86_64 machines (most CI runners and developer workstations).

🔧 Proposed fix using Docker's TARGETARCH
 # Link binary to PATH and copy skills to ~/.altimate/builtin/
+ARG TARGETARCH
 RUN mkdir -p /home/testuser/.local/bin && \
-    cp /home/testuser/.altimate-install/dist/@altimateai/altimate-code-linux-arm64/bin/altimate /home/testuser/.local/bin/altimate && \
+    ARCH=$([ "$TARGETARCH" = "amd64" ] && echo "x64" || echo "arm64") && \
+    cp /home/testuser/.altimate-install/dist/@altimateai/altimate-code-linux-${ARCH}/bin/altimate /home/testuser/.local/bin/altimate && \
     chmod +x /home/testuser/.local/bin/altimate && \
     mkdir -p /home/testuser/.altimate/builtin && \
     cp -r /home/testuser/.altimate-install/skills/* /home/testuser/.altimate/builtin/
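The TARGETARCH-to-dist mapping from the fix can be checked outside Docker; in this sketch the x64/arm64 folder naming is an assumption about the published package layout:

```shell
#!/usr/bin/env bash
# Sketch of the arch mapping in the proposed fix. BuildKit supplies
# TARGETARCH as "amd64" or "arm64"; the "x64"/"arm64" dist folder names are
# an assumption about how the npm packages are published.
map_dist_arch() {
  if [ "$1" = "amd64" ]; then
    echo "x64"
  else
    echo "arm64"
  fi
}

for target in amd64 arm64; do
  echo "altimate-code-linux-$(map_dist_arch "$target")"
done
```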
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile` around lines 32 - 35, The Dockerfile still hardcodes
the arm64 build directory (dist/@altimateai/altimate-code-linux-arm64) when
linking the altimate binary; update the RUN step to use Docker's build-time
TARGETARCH instead: add an ARG TARGETARCH (with optional default) at the top of
the Dockerfile and replace the hardcoded segment in the cp source path with the
variable (e.g., dist/@altimateai/altimate-code-linux-${TARGETARCH}) so the cp in
the RUN line copies the correct arch-specific binary into
/home/testuser/.local/bin and preserves chmod +x and mkdir -p steps.
🧹 Nitpick comments (8)
test/sanity/Dockerfile.upgrade (1)

3-5: Consider adding --no-install-recommends for consistency with best practices.

Same suggestion as the main Dockerfile — reduces image size by excluding recommended packages.

🔧 Proposed fix
 RUN apt-get update && apt-get install -y \
+    --no-install-recommends \
     git python3 python3-pip python3-venv curl sqlite3 \
     && rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile.upgrade` around lines 3 - 5, Update the Dockerfile RUN
installation command that currently uses "apt-get install -y git python3
python3-pip python3-venv curl sqlite3" to include the --no-install-recommends
flag so recommended packages are not pulled in; modify the RUN line (the one
starting with "RUN apt-get update && apt-get install -y") to add
--no-install-recommends immediately after "apt-get install" to keep image size
consistent with the main Dockerfile.
test/sanity/Dockerfile (1)

4-6: Consider adding --no-install-recommends to reduce image size.

The apt-get install command pulls in recommended packages that aren't strictly required. Adding --no-install-recommends can reduce the final image size.

🔧 Proposed fix
 # System deps (what a real user would have)
 RUN apt-get update && apt-get install -y \
+    --no-install-recommends \
     git python3 python3-pip python3-venv curl sqlite3 \
     && rm -rf /var/lib/apt/lists/*
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile` around lines 4 - 6, Update the Dockerfile RUN apt-get
install line to add the --no-install-recommends flag so apt doesn't pull
recommended packages (modify the existing RUN line that starts with "apt-get
update && apt-get install -y \ git python3 python3-pip python3-venv curl sqlite3
\ && rm -rf /var/lib/apt/lists/*"); keep the existing -y and the rm -rf cleanup
so the install behavior and cache cleanup remain unchanged.
test/sanity/phases/smoke-tests.sh (3)

53-53: Several cd commands lack error handling.

Some test functions have cd "$WORKDIR" || return 1 (lines 64, 101, 116) while others have bare cd "$WORKDIR" without error handling. For consistency and safety, all should handle cd failures.

🔧 Proposed fix for test_discover_mcps (apply similar pattern to others)
 test_discover_mcps() {
-  cd "$WORKDIR"
+  cd "$WORKDIR" || return 1
   altimate_run "discover-mcps" --command discover-and-add-mcps "list"

Also applies to: 75-75, 86-86, 127-127, 138-138, 149-149, 160-160

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` at line 53, Several places call cd
"$WORKDIR" without error handling; update every bare cd "$WORKDIR" (including
occurrences around the test functions like test_discover_mcps and the other test
functions noted) to bail on failure by appending || return 1 so the test
function exits if the chdir fails; search for all occurrences of cd "$WORKDIR"
in this script and change them to cd "$WORKDIR" || return 1 so the tests
consistently handle cd failures.

36-49: run_smoke_test function is defined but never used.

This helper is defined but tests invoke themselves directly (e.g., ${TESTS[$i]} &). Consider removing this dead code or refactoring the tests to use this wrapper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` around lines 36 - 49, The helper function
run_smoke_test is dead code; either remove it or refactor existing concurrent
test invocations to use it. Locate run_smoke_test and either delete its
definition, or replace places that spawn tests directly (e.g., usages like
"${TESTS[$i]} &") with calls to run_smoke_test "$i" "$name" "${TESTS[$i]}" (pass
an index, a name to create RESULT file, and the command plus args), ensuring you
preserve the previous backgrounding/exit-code capture semantics implemented by
run_smoke_test.

229-229: Unused variable num with incorrect calculation.

The variable num is calculated but never used, and the formula ${#TESTS[@]} - ${#TEST_NAMES[@]} + 1 always evaluates to 1 since both arrays have the same length.

🔧 Proposed fix — remove unused variable
 for name in "${TEST_NAMES[@]}"; do
   result=$(cat "$RESULTS_DIR/$name" 2>/dev/null || echo "MISSING")
-  num=$((${#TESTS[@]} - ${#TEST_NAMES[@]} + 1))
   case "$result" in
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` at line 229, Remove the unused variable
assignment "num=$((${#TESTS[@]} - ${#TEST_NAMES[@]} + 1))" since it is never
used and the calculation is incorrect (both arrays have same length); delete
that line and any related references to "num" in the script (leave TESTS and
TEST_NAMES untouched) so the script no longer defines an unused variable.
test/sanity/run.sh (1)

45-47: Separate declaration and assignment to catch file read errors.

If cat "$OLD_VERSION_FILE" fails, the error is masked and OLD_VERSION becomes empty silently. Separating allows proper error handling.

🔧 Proposed fix
   # Read old version from file set by Dockerfile.upgrade
   if [ -f "${OLD_VERSION_FILE:-}" ]; then
-    export OLD_VERSION=$(cat "$OLD_VERSION_FILE")
+    OLD_VERSION=$(cat "$OLD_VERSION_FILE") || OLD_VERSION="unknown"
+    export OLD_VERSION
   fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/run.sh` around lines 45 - 47, The current if-block sets
OLD_VERSION by directly exporting the command substitution which can mask read
errors; change it to first export/declare OLD_VERSION (or leave unset), then
read the file into a variable using cat or read (referencing OLD_VERSION_FILE
and OLD_VERSION) and check the command's exit status ($? or use set -e) so a
failing read is detected and handled (e.g., log and exit non-zero) instead of
silently leaving OLD_VERSION empty.
test/sanity/lib/parallel.sh (2)

4-6: Consider separating declaration and assignment to avoid masking return values.

While the fallback || echo "2" handles failures gracefully, separating declaration and assignment is a shell best practice.

🔧 Proposed fix
 detect_parallelism() {
-  local cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")
+  local cores
+  cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo "2")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/parallel.sh` around lines 4 - 6, In detect_parallelism(),
avoid combining declaration and assignment which can mask return values; instead
declare the variable first (local cores) and then assign cores via command
substitution using the same fallback chain (nproc || sysctl -n hw.ncpu || echo
"2") so failures from the commands aren’t masked and the function still returns
the correct status; update references in detect_parallelism to use the separated
variable.

35-38: Redundant cap — parallelism already maxes at 6.

The if [ "$parallel" -gt 6 ] check can never be true since the preceding conditionals only assign values 2, 3, 4, or 6. This block can be removed or kept as defensive code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/lib/parallel.sh` around lines 35 - 38, Remove the redundant
defensive cap in test/sanity/lib/parallel.sh: delete the if block that checks
`if [ "$parallel" -gt 6 ]; then parallel=6; fi` since `parallel` is already
constrained to 2,3,4, or 6 earlier; if you prefer to keep a guard, replace the
block with a brief comment explaining it's defensive and why it can never
trigger. Ensure references to the `parallel` variable remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/sanity/Dockerfile.upgrade`:
- Around line 31-33: The Docker upgrade test fails because Dockerfile.upgrade's
COPY --chown=testuser dist/package.tgz expects dist/package.tgz in the build
context but CI doesn't produce it; fix by either adding a
docker-compose.upgrade.yml that builds the monorepo and produces
dist/package.tgz into the build context before docker build, or modify the
sanity:upgrade pipeline to run a pack step (e.g., run bun pm pack or npm pack in
the packages/opencode dist directory) to create dist/package.tgz prior to
invoking docker build, or change Dockerfile.upgrade to mirror the standard
Dockerfile approach (copying packages/opencode/dist/@altimateai/... instead of a
tarball) so npm run sanity:upgrade completes the COPY step successfully.

In `@test/sanity/phases/smoke-tests.sh`:
- Around line 230-247: The counters PASS_COUNT, FAIL_COUNT, and SKIP_COUNT are
never initialized before being incremented in the result-processing case block
(the case handling PASS/FAIL/SKIP/*), so initialize them to 0 before the loop
that processes results; add simple assignments PASS_COUNT=0 FAIL_COUNT=0
SKIP_COUNT=0 placed prior to the code that iterates over results so the
arithmetic increments succeed.

In `@test/sanity/pr-tests/generate.sh`:
- Around line 31-40: The pipelines that feed the while-read loops in generate.sh
can cause the script to exit under set -euo pipefail because grep returns
non-zero when there are no matches; update the two grep pipelines that precede
the "while read -r f; do" blocks (the ones producing command-*.txt and
skills/.*/SKILL.md) so they never produce a non-zero exit (for example append
"|| true" to the grep command or use a grep -q check and guard the loop),
leaving the emit_test invocations (emit_test "command-${cmd}" ... and emit_test
"skill-${skill}" ...) unchanged.
- Around line 32-39: The emitted commands build shell strings from untrusted
filename-derived variables cmd and skill (used with emit_test) and can lead to
shell injection; fix by validating/sanitizing those values before use: when
parsing each changed file, constrain cmd and skill to a safe whitelist pattern
(e.g. reject unless they match /^[A-Za-z0-9._-]+$/) and skip/warn on mismatches,
or explicitly shell-escape them (e.g. using printf '%q' or equivalent) when
constructing the command string passed to emit_test; update the blocks that set
cmd and skill and all emit_test calls so they use the sanitized/escaped variable
names.
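
The `set -euo pipefail` hazard called out above can be sketched like this — a self-contained example (the file list is hypothetical) showing why `|| true` keeps a zero-match `grep` from aborting the script:

```shell
set -euo pipefail

# Stand-in for `git diff --name-only` output (hypothetical paths).
changed="command/template/deploy.txt
README.md"

# grep exits 1 when it matches nothing; under pipefail that non-zero
# status would kill the whole script, so `|| true` converts "no
# matches" into an empty result instead.
cmd_files=$(echo "$changed" | grep "command/template/.*\.txt" || true)
skill_files=$(echo "$changed" | grep "skills/.*/SKILL.md" || true)

count=0
while read -r f; do
  [ -n "$f" ] || continue
  count=$((count + 1))
done <<< "$cmd_files"

echo "command tests: $count, skill matches: '${skill_files}'"
```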

---

Duplicate comments:
In `@package.json`:
- Line 22: The package.json npm script "sanity:upgrade" references a missing
docker-compose file (docker-compose.upgrade.yml) which will cause failures; fix
by either adding the missing docker-compose.upgrade.yml under the test/sanity
test fixtures (ensure it contains the intended upgrade orchestration) or
remove/comment/disable the "sanity:upgrade" script from package.json until that
file and its contents are implemented; update or run the provided verification
shell snippet to confirm the file exists if you add it.

In `@test/sanity/Dockerfile`:
- Around line 32-35: The Dockerfile still hardcodes the arm64 build directory
(dist/@altimateai/altimate-code-linux-arm64) when linking the altimate binary;
update the RUN step to use Docker's build-time TARGETARCH instead: add an ARG
TARGETARCH (with optional default) at the top of the Dockerfile and replace the
hardcoded segment in the cp source path with the variable (e.g.,
dist/@altimateai/altimate-code-linux-${TARGETARCH}) so the cp in the RUN line
copies the correct arch-specific binary into /home/testuser/.local/bin and
preserves chmod +x and mkdir -p steps.

---

Nitpick comments:
In `@test/sanity/Dockerfile`:
- Around line 4-6: Update the Dockerfile RUN apt-get install line to add the
--no-install-recommends flag so apt doesn't pull recommended packages (modify
the existing RUN line that starts with "apt-get update && apt-get install -y \
git python3 python3-pip python3-venv curl sqlite3 \ && rm -rf
/var/lib/apt/lists/*"); keep the existing -y and the rm -rf cleanup so the
install behavior and cache cleanup remain unchanged.

In `@test/sanity/Dockerfile.upgrade`:
- Around line 3-5: Update the Dockerfile RUN installation command that currently
uses "apt-get install -y git python3 python3-pip python3-venv curl sqlite3" to
include the --no-install-recommends flag so recommended packages are not pulled
in; modify the RUN line (the one starting with "RUN apt-get update && apt-get
install -y") to add --no-install-recommends immediately after "apt-get install"
to keep image size consistent with the main Dockerfile.

In `@test/sanity/lib/parallel.sh`:
- Around line 4-6: In detect_parallelism(), avoid combining declaration and
assignment which can mask return values; instead declare the variable first
(local cores) and then assign cores via command substitution using the same
fallback chain (nproc || sysctl -n hw.ncpu || echo "2") so failures from the
commands aren’t masked and the function still returns the correct status; update
references in detect_parallelism to use the separated variable.
- Around line 35-38: Remove the redundant defensive cap in
test/sanity/lib/parallel.sh: delete the if block that checks `if [ "$parallel"
-gt 6 ]; then parallel=6; fi` since `parallel` is already constrained to 2,3,4,
or 6 earlier; if you prefer to keep a guard, replace the block with a brief
comment explaining it's defensive and why it can never trigger. Ensure
references to the `parallel` variable remain unchanged.

In `@test/sanity/phases/smoke-tests.sh`:
- Line 53: Several places call cd "$WORKDIR" without error handling; update
every bare cd "$WORKDIR" (including occurrences around the test functions like
test_discover_mcps and the other test functions noted) to bail on failure by
appending || return 1 so the test function exits if the chdir fails; search for
all occurrences of cd "$WORKDIR" in this script and change them to cd "$WORKDIR"
|| return 1 so the tests consistently handle cd failures.
- Around line 36-49: The helper function run_smoke_test is dead code; either
remove it or refactor existing concurrent test invocations to use it. Locate
run_smoke_test and either delete its definition, or replace places that spawn
tests directly (e.g., usages like "${TESTS[$i]} &") with calls to run_smoke_test
"$i" "$name" "${TESTS[$i]}" (pass an index, a name to create RESULT file, and
the command plus args), ensuring you preserve the previous
backgrounding/exit-code capture semantics implemented by run_smoke_test.
- Line 229: Remove the unused variable assignment `num=$((${#TESTS[@]} -
${#TEST_NAMES[@]} + 1))` since it is never used and the calculation is incorrect
(both arrays have the same length); delete that line and any related references
to "num" in the script (leave TESTS and TEST_NAMES untouched) so the script no
longer defines an unused variable.
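
The `cd "$WORKDIR" || return 1` pattern requested above can be sketched as a minimal example (the function and directory names are hypothetical, not the harness's real test functions):

```shell
# A test function bails out cleanly when the chdir fails, instead of
# silently running its checks from whatever the current directory is.
run_test_in() {
  cd "$1" || return 1
  # ...the real test body would run here, safely inside "$1"...
  pwd > /dev/null
}

WORKDIR=$(mktemp -d)
run_test_in "$WORKDIR" && echo "test ran"
run_test_in /no/such/dir 2>/dev/null || echo "cd failed, test skipped"
```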

In `@test/sanity/run.sh`:
- Around line 45-47: The current if-block sets OLD_VERSION by directly exporting
the command substitution which can mask read errors; change it to first
export/declare OLD_VERSION (or leave unset), then read the file into a variable
using cat or read (referencing OLD_VERSION_FILE and OLD_VERSION) and check the
command's exit status ($? or use set -e) so a failing read is detected and
handled (e.g., log and exit non-zero) instead of silently leaving OLD_VERSION
empty.
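
The safer read pattern described above can be sketched as follows — assign first, check the exit status, then export (the file contents here are hypothetical); this avoids `export OLD_VERSION=$(cat ...)`, where `export`'s own exit status masks a failed read:

```shell
OLD_VERSION_FILE=$(mktemp)
echo "0.9.4" > "$OLD_VERSION_FILE"

# The assignment's exit status is the command substitution's status,
# so a failing `cat` is detected instead of leaving the variable empty.
if ! OLD_VERSION=$(cat "$OLD_VERSION_FILE"); then
  echo "error: failed to read $OLD_VERSION_FILE" >&2
  exit 1
fi
export OLD_VERSION
echo "upgrading from $OLD_VERSION"
```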
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f84b0cb7-5843-4150-ae71-977a15dafd19

📥 Commits

Reviewing files that changed from the base of the PR and between 7700d48 and 299dbf8.

📒 Files selected for processing (22)
  • package.json
  • test/sanity/.env.example
  • test/sanity/Dockerfile
  • test/sanity/Dockerfile.upgrade
  • test/sanity/ci-local.sh
  • test/sanity/docker-compose.yml
  • test/sanity/fixtures/broken-config.json
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/fixtures/old-config.json
  • test/sanity/fixtures/test.sql
  • test/sanity/lib/altimate-run.sh
  • test/sanity/lib/assert.sh
  • test/sanity/lib/cleanup.sh
  • test/sanity/lib/parallel.sh
  • test/sanity/phases/resilience.sh
  • test/sanity/phases/smoke-tests.sh
  • test/sanity/phases/verify-install.sh
  • test/sanity/phases/verify-upgrade.sh
  • test/sanity/pr-tests/generate.sh
  • test/sanity/pr-tests/run-pr-tests.sh
  • test/sanity/results/baseline.json
  • test/sanity/run.sh
✅ Files skipped from review due to trivial changes (13)
  • test/sanity/fixtures/old-config.json
  • test/sanity/.env.example
  • test/sanity/lib/cleanup.sh
  • test/sanity/results/baseline.json
  • test/sanity/fixtures/broken-config.json
  • test/sanity/fixtures/test.sql
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/pr-tests/run-pr-tests.sh
  • test/sanity/phases/verify-install.sh
  • test/sanity/docker-compose.yml
  • test/sanity/lib/altimate-run.sh
  • test/sanity/lib/assert.sh
  • test/sanity/phases/resilience.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/sanity/phases/verify-upgrade.sh
  • test/sanity/ci-local.sh

@kulvirgit kulvirgit force-pushed the feat/sanity-test-harness branch from 299dbf8 to 1335dbd Compare March 25, 2026 17:06

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
test/sanity/Dockerfile (1)

34-37: ⚠️ Potential issue | 🔴 Critical

Map Docker's amd64 to the published x64 artifact name.

On BuildKit/Compose x86_64 builds, TARGETARCH is amd64, but the copied dist folder here is altimate-code-linux-x64. The current path resolves to ...-amd64/..., so the image build breaks on the most common host architecture.

🐛 Proposed fix
 ARG TARGETARCH
-RUN ARCH="${TARGETARCH:-$(uname -m | sed 's/x86_64/x64/' | sed 's/aarch64/arm64/')}" && \
+RUN case "${TARGETARCH:-$(uname -m)}" in \
+      amd64|x86_64) ARCH=x64 ;; \
+      arm64|aarch64) ARCH=arm64 ;; \
+      *) echo "Unsupported arch: ${TARGETARCH:-$(uname -m)}" >&2; exit 1 ;; \
+    esac && \
     mkdir -p /home/testuser/.local/bin && \
     cp /home/testuser/.altimate-install/dist/@altimateai/altimate-code-linux-${ARCH}/bin/altimate /home/testuser/.local/bin/altimate && \

Run this to confirm the mismatch between the Docker arg and the actual artifact names:

#!/bin/bash
set -euo pipefail

echo "Relevant Dockerfile lines:"
sed -n '34,37p' test/sanity/Dockerfile

echo
echo "Available Linux artifacts:"
find packages/opencode/dist/@altimateai -maxdepth 1 -type d -name 'altimate-code-linux-*' -exec basename {} \; | sort

Expected result: you should see altimate-code-linux-x64/altimate-code-linux-arm64, not altimate-code-linux-amd64.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/Dockerfile` around lines 34 - 37, The build fails because
TARGETARCH can be "amd64" but artifacts are named "x64"; update the ARCH
calculation to map amd64->x64 (and keep aarch64->arm64) so the cp path points to
the correct dist folder: change the ARCH assignment that currently uses sed
substitutions to also replace "amd64" with "x64" (e.g., add a sed 's/amd64/x64/'
step) so the cp in the RUN line copies from
`@altimateai/altimate-code-linux-`${ARCH}/bin/altimate to
/home/testuser/.local/bin/altimate successfully.
test/sanity/phases/smoke-tests.sh (1)

37-152: ⚠️ Potential issue | 🟠 Major

These helper verdicts can still record PASS for failed runs.

Most of these functions ignore the runner's exit status and only reject one or two substrings from the captured output. That means provider/auth/config failures — or even a missing capture file via local output=$(get_output ...) — can still fall through to PASS, so the suite does not reliably prove command resolution, DuckDB, agent routing, or graceful bad-command handling. Require the command to succeed (or require the expected handled error for test_bad_command) before writing PASS, and treat a missing output capture as FAIL.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/smoke-tests.sh` around lines 37 - 152, Many tests
(test_discover_mcps, test_configure_claude, test_sql_analyze, test_duckdb,
test_postgres, test_snowflake, test_builder, test_analyst, test_bad_command,
test_discover) currently ignore the runner exit status and treat missing/empty
captures as PASS; ensure each test first checks the exit code from altimate_run
or altimate_run_with_turns and that get_output returned non-empty content before
writing PASS, treating missing output as FAIL, and for test_bad_command require
the command to produce the expected handled error (not just absence of
"unhandled") before writing PASS; update each test to fail early if the runner
exit code is non-zero (except where a specific handled error is expected), and
only write PASS when both the exit code and output validation pass (use the
existing RESULTS_DIR paths and functions altimate_run, altimate_run_with_turns,
and get_output to implement the checks).
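
The verdict-tightening described above can be sketched as a small helper — a hypothetical stand-in for the harness's real plumbing (`altimate_run`/`get_output` are not used here) that records PASS only when the exit code is zero and the captured output exists and is non-empty:

```shell
RESULTS_DIR=$(mktemp -d)

check_result() {  # $1=test name  $2=runner exit code  $3=capture file
  local name=$1 rc=$2 capture=$3
  if [ "$rc" -ne 0 ]; then
    echo "FAIL" > "$RESULTS_DIR/$name"; return
  fi
  if [ ! -s "$capture" ]; then          # missing or empty capture => FAIL
    echo "FAIL" > "$RESULTS_DIR/$name"; return
  fi
  echo "PASS" > "$RESULTS_DIR/$name"
}

capture=$(mktemp); echo '{"ok":true}' > "$capture"
check_result good 0 "$capture"
check_result bad_exit 1 "$capture"
check_result no_capture 0 /nonexistent/capture
```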
test/sanity/phases/resilience.sh (1)

93-133: ⚠️ Potential issue | 🟠 Major

These resilience checks don't actually assert the advertised outcome.

The missing-key, old-config, and broken-config branches only verify that one crash string is absent. Empty output or an unrelated error will still count as success, so this does not really prove “clean missing-key error” or “backward-compatible config load.” Assert the expected user-facing message and/or exit path before marking these cases as passed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/phases/resilience.sh` around lines 93 - 133, The tests under
"Missing API key handling", "Config backwards compat..." and "Broken config
handling..." only assert absence of certain crash strings (using
assert_not_contains) which allows empty or wrong outputs to pass; update the
checks to assert the expected user-facing behavior instead: for the missing-key
branch capture OUTPUT from the altimate run and assert that it contains the
explicit help/error message (e.g., "API key not set" or the CLI's intended
missing-key text) and that the command exited with a non-zero status; for the
old-config branch after altimate_run use get_output and assert it contains the
successful load message or expected prompt/response indicating backward
compatibility instead of merely asserting no "parse error"; for the
broken-config branch similarly assert the CLI prints a clear config parse
failure message (or falls back to defaults) and check exit status; replace or
augment the existing assert_not_contains calls with positive assertions
(assert_contains and exit-code checks) referencing the test blocks by their echo
markers and the helper calls altimate_run, get_output, OUTPUT, and
assert_not_contains so the changes are easy to locate.
test/sanity/pr-tests/generate.sh (1)

31-33: ⚠️ Potential issue | 🔴 Critical

Sanitize path-derived values before emitting eval’d commands.

Line 32 and Line 38 derive cmd/skill from changed filenames and Line 33/Line 39 inject them into command strings. Since these manifest commands are executed downstream, this is a shell-injection vector.

🔒 Suggested hardening
-echo "$changed" | grep "command/template/.*\.txt" 2>/dev/null | while read -r f; do
+echo "$changed" | grep -E "command/template/.*\.txt" 2>/dev/null | while read -r f; do
   cmd=$(basename "$f" .txt)
-  emit_test "command-${cmd}" "altimate run --max-turns 1 --yolo --command ${cmd} test"
+  [[ "$cmd" =~ ^[A-Za-z0-9._-]+$ ]] || { echo "  Skipping unsafe command name: $cmd"; continue; }
+  escaped_cmd=$(printf '%q' "$cmd")
+  emit_test "command-${cmd}" "altimate run --max-turns 1 --yolo --command ${escaped_cmd} test"
 done

-echo "$changed" | grep "skills/.*/SKILL.md" 2>/dev/null | while read -r f; do
+echo "$changed" | grep -E "skills/.*/SKILL.md" 2>/dev/null | while read -r f; do
   skill=$(basename "$(dirname "$f")")
-  emit_test "skill-${skill}" "ls ~/.altimate/builtin/${skill}/SKILL.md"
+  [[ "$skill" =~ ^[A-Za-z0-9._-]+$ ]] || { echo "  Skipping unsafe skill name: $skill"; continue; }
+  escaped_skill=$(printf '%q' "$skill")
+  emit_test "skill-${skill}" "ls ~/.altimate/builtin/${escaped_skill}/SKILL.md"
 done

Also applies to: 37-39

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/pr-tests/generate.sh` around lines 31 - 33, The script constructs
command strings using untrusted basename-derived values (variables cmd and
skill) and then emits/evaluates them, creating a shell-injection risk; update
the emit path by validating/sanitizing those derived values before use: in the
block that sets cmd=$(basename "$f" .txt) and skill=$(basename "$f" .txt)
enforce a strict whitelist regex (e.g., only [a-z0-9_-]+) and skip or fail if
the value doesn't match, or alternatively escape them safely (e.g., use
shell-safe quoting like printf '%q' or pass them as separate arguments instead
of interpolating into a single eval string) so emit_test is never given
unsanitized input.
🧹 Nitpick comments (1)
test/sanity/pr-tests/generate.sh (1)

9-9: Use an explicit no-op for manifest truncation.

Line 9 works in Bash, but ShellCheck SC2188 flags it. Using : > "$MANIFEST" is clearer and avoids lint noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/sanity/pr-tests/generate.sh` at line 9, Replace the bare redirection
used to truncate the manifest file (currently using > "$MANIFEST") with an
explicit no-op redirection so ShellCheck SC2188 is satisfied; change the
truncation invocation to use a no-op command redirect (i.e., use ':' as the
command) targeting the MANIFEST variable to clearly indicate intended truncation
in the generate.sh script.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/sanity/ci-local.sh`:
- Around line 59-80: The health-check loop sets HEALTHY but doesn't gate
downstream steps; add an explicit failure check after the loop to exit (or call
run_step failure behavior) if HEALTHY is less than 4 or if the prior docker
compose up failed. Concretely, after the loop that computes HEALTHY, test the
variable (e.g., if [ "$HEALTHY" -lt 4 ]; then echo "Services not healthy"; exit
1; fi) so that the subsequent run_step "Driver E2E (local)" / "Driver E2E
(docker)" and "Sanity Suite (Docker)" are not executed when docker compose up or
the health check failed; ensure the check references the HEALTHY variable and
the same docker compose invocation used earlier so failures short-circuit
immediately.
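
The gating logic described above can be sketched like this — a hypothetical helper (the real script would `exit 1` rather than return) showing how downstream suites short-circuit when too few services are healthy:

```shell
gate_on_health() {  # $1 = healthy count, $2 = expected count
  if [ "$1" -lt "$2" ]; then
    echo "Services not healthy ($1/$2)" >&2
    return 1   # the real ci-local.sh would `exit 1` here
  fi
}

gate_on_health 4 4 && echo "running Driver E2E + Sanity Suite"
gate_on_health 3 4 2>/dev/null || echo "short-circuiting: suites skipped"
```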

In `@test/sanity/Dockerfile`:
- Around line 27-30: The Docker RUN currently pipes "bun install 2>&1 | tail -5"
which hides bun's exit status; change it so the install's exit code is checked
and you still print the last lines: run bun install with output redirected to a
logfile and use && to tail it (for example run bun install >
/tmp/bun-install.log 2>&1 && tail -5 /tmp/bun-install.log), keeping the working
directory /home/testuser/.altimate-install and the existing package.json
creation; update the RUN line that references
"/home/testuser/.altimate-install", "bun install" and "tail -5" accordingly.
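
The log-then-tail pattern suggested above can be sketched as follows — `true`/`false` stand in for `bun install`, since the point is only that `&&` chains on the install command's own exit status while still printing the last log lines:

```shell
LOG=$(mktemp)
run_install() {  # $1 = command standing in for `bun install`
  "$1" > "$LOG" 2>&1 && tail -5 "$LOG"
}

run_install true  && echo "install ok"
run_install false || echo "install failed (status not hidden by tail)"
```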

In `@test/sanity/pr-tests/generate.sh`:
- Around line 11-16: The current assignment to changed falls back to echo "" on
git errors, causing a silent skip of PR tests; instead detect git command
failures and fail fast. Replace the single-line fallback with explicit checks:
run the primary git diff command, if it fails run the secondary, and if that also fails
print a clear error (including git's stderr) and exit non-zero; ensure the
variable changed is set only on success and the script exits 1 when neither git
diff command succeeds so tests are not silently skipped (refer to the changed
variable and the two git diff invocations).
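
The explicit fallback chain described above can be sketched like this — `diff_main`/`diff_head` are hypothetical stand-ins for the two git diff invocations, so the sketch stays runnable outside a git checkout:

```shell
diff_main() { printf 'a.sh\nb.sh\n'; }   # pretend: git diff origin/main...HEAD
diff_head() { return 1; }                # pretend: git diff HEAD~1 --name-only

# `changed` is set only on success; if both commands fail, exit loudly
# instead of silently running zero PR tests.
if changed=$(diff_main); then
  :
elif changed=$(diff_head); then
  :
else
  echo "error: both git diff commands failed" >&2
  exit 1
fi
echo "changed files: $(echo "$changed" | wc -l | tr -d ' ')"
```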

---

Duplicate comments:
In `@test/sanity/Dockerfile`:
- Around line 34-37: The build fails because TARGETARCH can be "amd64" but
artifacts are named "x64"; update the ARCH calculation to map amd64->x64 (and
keep aarch64->arm64) so the cp path points to the correct dist folder: change
the ARCH assignment that currently uses sed substitutions to also replace
"amd64" with "x64" (e.g., add a sed 's/amd64/x64/' step) so the cp in the RUN
line copies from `@altimateai/altimate-code-linux-`${ARCH}/bin/altimate to
/home/testuser/.local/bin/altimate successfully.

In `@test/sanity/phases/resilience.sh`:
- Around line 93-133: The tests under "Missing API key handling", "Config
backwards compat..." and "Broken config handling..." only assert absence of
certain crash strings (using assert_not_contains) which allows empty or wrong
outputs to pass; update the checks to assert the expected user-facing behavior
instead: for the missing-key branch capture OUTPUT from the altimate run and
assert that it contains the explicit help/error message (e.g., "API key not set"
or the CLI's intended missing-key text) and that the command exited with a
non-zero status; for the old-config branch after altimate_run use get_output and
assert it contains the successful load message or expected prompt/response
indicating backward compatibility instead of merely asserting no "parse error";
for the broken-config branch similarly assert the CLI prints a clear config
parse failure message (or falls back to defaults) and check exit status; replace
or augment the existing assert_not_contains calls with positive assertions
(assert_contains and exit-code checks) referencing the test blocks by their echo
markers and the helper calls altimate_run, get_output, OUTPUT, and
assert_not_contains so the changes are easy to locate.

In `@test/sanity/phases/smoke-tests.sh`:
- Around line 37-152: Many tests (test_discover_mcps, test_configure_claude,
test_sql_analyze, test_duckdb, test_postgres, test_snowflake, test_builder,
test_analyst, test_bad_command, test_discover) currently ignore the runner exit
status and treat missing/empty captures as PASS; ensure each test first checks
the exit code from altimate_run or altimate_run_with_turns and that get_output
returned non-empty content before writing PASS, treating missing output as FAIL,
and for test_bad_command require the command to produce the expected handled
error (not just absence of "unhandled") before writing PASS; update each test to
fail early if the runner exit code is non-zero (except where a specific handled
error is expected), and only write PASS when both the exit code and output
validation pass (use the existing RESULTS_DIR paths and functions altimate_run,
altimate_run_with_turns, and get_output to implement the checks).

In `@test/sanity/pr-tests/generate.sh`:
- Around line 31-33: The script constructs command strings using untrusted
basename-derived values (variables cmd and skill) and then emits/evaluates them,
creating a shell-injection risk; update the emit path by validating/sanitizing
those derived values before use: in the block that sets cmd=$(basename "$f"
.txt) and skill=$(basename "$f" .txt) enforce a strict whitelist regex (e.g.,
only [a-z0-9_-]+) and skip or fail if the value doesn't match, or alternatively
escape them safely (e.g., use shell-safe quoting like printf '%q' or pass them
as separate arguments instead of interpolating into a single eval string) so
emit_test is never given unsanitized input.

---

Nitpick comments:
In `@test/sanity/pr-tests/generate.sh`:
- Line 9: Replace the bare redirection used to truncate the manifest file
(currently using > "$MANIFEST") with an explicit no-op redirection so ShellCheck
SC2188 is satisfied; change the truncation invocation to use a no-op command
redirect (i.e., use ':' as the command) targeting the MANIFEST variable to
clearly indicate intended truncation in the generate.sh script.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b587e827-6537-49cd-bfaf-82a71a842613

📥 Commits

Reviewing files that changed from the base of the PR and between 299dbf8 and 1335dbd.

📒 Files selected for processing (23)
  • package.json
  • test/sanity/.env.example
  • test/sanity/Dockerfile
  • test/sanity/Dockerfile.upgrade
  • test/sanity/ci-local.sh
  • test/sanity/docker-compose.upgrade.yml
  • test/sanity/docker-compose.yml
  • test/sanity/fixtures/broken-config.json
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/fixtures/old-config.json
  • test/sanity/fixtures/test.sql
  • test/sanity/lib/altimate-run.sh
  • test/sanity/lib/assert.sh
  • test/sanity/lib/cleanup.sh
  • test/sanity/lib/parallel.sh
  • test/sanity/phases/resilience.sh
  • test/sanity/phases/smoke-tests.sh
  • test/sanity/phases/verify-install.sh
  • test/sanity/phases/verify-upgrade.sh
  • test/sanity/pr-tests/generate.sh
  • test/sanity/pr-tests/run-pr-tests.sh
  • test/sanity/results/baseline.json
  • test/sanity/run.sh
✅ Files skipped from review due to trivial changes (10)
  • test/sanity/docker-compose.upgrade.yml
  • test/sanity/fixtures/compaction-session.sql
  • test/sanity/.env.example
  • test/sanity/fixtures/old-config.json
  • test/sanity/pr-tests/run-pr-tests.sh
  • test/sanity/lib/cleanup.sh
  • test/sanity/fixtures/broken-config.json
  • test/sanity/results/baseline.json
  • test/sanity/docker-compose.yml
  • test/sanity/lib/altimate-run.sh
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/sanity/fixtures/test.sql
  • test/sanity/phases/verify-upgrade.sh
  • test/sanity/phases/verify-install.sh
  • test/sanity/lib/assert.sh

@kulvirgit kulvirgit force-pushed the feat/sanity-test-harness branch from 1335dbd to ab5d0f9 Compare March 25, 2026 17:52
Docker-based "new user simulator" that tests the shipped npm artifact:
- Phase 1: verify-install (9 checks: binary, skills, napi, dbt, git)
- Phase 2: smoke-tests (10 E2E tests via altimate run, parallelized)
- Phase 3: resilience (8 tests: SQLite, WAL, sessions, compaction, config)
- PR-aware test generation (git diff → targeted tests)
- Local CI pipeline (bun run ci → typecheck + tests + markers)
- Machine-aware parallelism (2-6 concurrent based on cores/RAM)

27 tests, all passing in ~2:48 on 20-core machine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@kulvirgit kulvirgit force-pushed the feat/sanity-test-harness branch from ab5d0f9 to a0c4e72 Compare March 25, 2026 19:04
@kulvirgit kulvirgit merged commit 79c34c5 into main Mar 25, 2026
10 checks passed