[codex] Add Phase 2 validate_patch decision engine#2

Merged
qkal merged 17 commits into master from feat/phase-2-decision-engine-split on Jun 19, 2026

@qkal (Owner) commented Jun 19, 2026

Summary

  • Add a Phase 2 validate_patch response contract centered on deterministic decisions, blockers, required checks, evidence, next actions, and fix plans.
  • Wire the service and MCP tool output through agent_quality_mcp.response while preserving shadow-workspace validation and structured rejection for unsupported mutation modes.
  • Add regression coverage for decision precedence, optional quick-mode tooling, diagnostic truncation evidence, resource-limit classification, and stale response-contract exports.

Why

The previous response shape exposed raw status buckets and suggestions but did not give callers a clear, deterministic decision path. This adds a decision-engine-first contract so agents can tell whether to apply, revise, fix tooling, reject, or escalate.
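As a rough illustration of what a deterministic decision path could look like, the sketch below maps a small evidence record to one of the five outcomes named above. The `Decision` members, the `Evidence` fields, and the precedence order are all assumptions made for this example; the PR's actual rules live in `src/agent_quality_mcp/decision.py` and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(str, Enum):
    # Hypothetical names for the five outcomes listed in the PR description.
    APPLY = "apply"
    REVISE = "revise"
    FIX_TOOLING = "fix_tooling"
    REJECT = "reject"
    ESCALATE = "escalate"


@dataclass
class Evidence:
    # Hypothetical inputs; the real evidence model is not shown in this thread.
    unsupported_mode: bool = False
    missing_tools: list[str] = field(default_factory=list)
    blocker_count: int = 0
    truncated: bool = False


def decide(evidence: Evidence) -> Decision:
    """Return the first matching outcome in a fixed (assumed) precedence order."""
    if evidence.unsupported_mode:
        return Decision.REJECT
    if evidence.missing_tools:
        return Decision.FIX_TOOLING
    if evidence.blocker_count > 0:
        return Decision.REVISE
    if evidence.truncated:
        # Incomplete evidence: hand off rather than approve blindly.
        return Decision.ESCALATE
    return Decision.APPLY
```

Because the checks run in a fixed order and depend only on the evidence, the same input always yields the same decision, which is the property a caller needs in order to branch on the result.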

Validation

  • .venv/bin/python -m pytest -v -> 277 passed
  • .venv/bin/ruff check . -> passed
  • .venv/bin/pyright --pythonpath .venv/bin/python -> 0 errors
  • git diff --check -> passed

Notes

The branch was rebased onto master before publishing so the PR diff is scoped to the Phase 2 implementation.

Summary by CodeRabbit

Release Notes

  • New Features

    • Redesigned validation response with structured decision outcomes, confidence levels, actionable next steps, and targeted fix plans, replacing the previous status-based format.
    • Added comprehensive evidence tracking including diagnostic counts, truncation flags, tool availability, and workspace modification state.
  • Bug Fixes

    • Improved error handling for resource limit violations with dedicated error type.
  • Documentation

    • Updated documentation explaining new Phase 2 response structure and field mappings from previous format.

coderabbitai Bot commented Jun 19, 2026

Warning

Review limit reached

@qkal, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 26 minutes and 1 second. Learn how PR review limits work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9a044544-6760-4d6b-9296-250548aa5665

📥 Commits

Reviewing files that changed from the base of the PR and between fafca53 and cbaa070.

📒 Files selected for processing (2)
  • src/agent_quality_mcp/response.py
  • tests/unit/test_response_contract.py

@qkal force-pushed the feat/phase-2-decision-engine-split branch from 3bf8e04 to fafca53 on June 19, 2026 20:53
@qkal marked this pull request as ready for review on June 19, 2026 20:59

@devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

🟡 Incomplete transformation: _final_response computes risk_score from compressed while success path uses diagnostics

The PR intentionally changed the success path (lines 172-176) from compute_risk_score(compressed, ...) to compute_risk_score(diagnostics, ...) so that decision-making uses the full, uncompressed diagnostic set. The _final_response function was only partially updated: the _response_from_parts call now passes diagnostics instead of compressed (line 422), but the compute_risk_score and _missing_tools calls on lines 407-410 still use compressed. This creates a semantic inconsistency: the decision pipeline in build_validate_patch_response (response.py:157-169) operates on the full diagnostics, while the risk_score passed alongside it is computed from a potentially smaller compressed subset. This currently has no observable impact, because _final_response is always called with a single blocking diagnostic that compression never removes, but the inconsistency would surface if _final_response were ever called with multiple diagnostics and compression truncated some of them.

(Refers to lines 406-410)
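The concern can be sketched with stand-in functions. Everything below is hypothetical: `Diagnostic`, `compress`, and the scoring weights are invented for illustration, and the real `compute_risk_score` takes additional arguments that are elided in the comment above. The point is only that the score should be derived from the same full list that feeds the decision, with the compressed subset reserved for display.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    # Hypothetical minimal diagnostic; the real model has more fields.
    message: str
    blocking: bool


def compress(diagnostics: list[Diagnostic], limit: int) -> list[Diagnostic]:
    """Stand-in for compression: keep at most `limit` entries."""
    return diagnostics[:limit]


def compute_risk_score(diagnostics: list[Diagnostic]) -> int:
    """Stand-in scoring: blocking diagnostics weigh 10, others weigh 1."""
    return sum(10 if d.blocking else 1 for d in diagnostics)


def final_response(diagnostics: list[Diagnostic], limit: int = 1) -> dict:
    compressed = compress(diagnostics, limit)
    return {
        # Score and count are derived from the full list.
        "risk_score": compute_risk_score(diagnostics),
        "diagnostic_count": len(diagnostics),
        # Only the displayed payload uses the compressed subset.
        "shown": [d.message for d in compressed],
        "truncated": len(compressed) < len(diagnostics),
    }
```

With a single blocking diagnostic the two inputs give the same score, which matches the observation that the inconsistency is not currently observable; with several diagnostics and a small limit, scoring the compressed list would under-report risk.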

@coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/agent_quality_mcp/response.py`:
- Around line 60-66: The decision field in the ValidatePatchResponse class is
currently typed as a raw str, which weakens type safety and contract validation.
Change the type annotation of the decision field from str to PatchDecision enum
to ensure deterministic schema validation and stronger contract guarantees while
maintaining string serialization for API responses.
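
A minimal sketch of the suggested change, assuming a plain dataclass and a `str`-mixin enum. The member names of `PatchDecision` and the single-field `ValidatePatchResponse` are illustrative assumptions; the actual class in `response.py` has more fields and may be built on a different modelling library.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class PatchDecision(str, Enum):
    # Member names are assumptions; the real enum is defined in the PR.
    APPLY = "apply"
    REVISE = "revise"
    REJECT = "reject"


@dataclass
class ValidatePatchResponse:
    decision: PatchDecision

    def __post_init__(self) -> None:
        # Coerce raw strings and fail fast on values outside the enum.
        self.decision = PatchDecision(self.decision)


def to_json(response: ValidatePatchResponse) -> str:
    # A str-mixin enum member is encoded by json as its string value.
    return json.dumps(asdict(response))
```

Constructing the response with an unknown value such as `"maybe"` raises `ValueError`, while valid members still serialize as plain strings, so the wire format seen by API callers is unchanged.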

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c404e0e4-8928-4bb8-9321-0d1fe15dc866

📥 Commits

Reviewing files that changed from the base of the PR and between f4c7cd0 and fafca53.

📒 Files selected for processing (19)
  • README.md
  • src/agent_quality_mcp/actions.py
  • src/agent_quality_mcp/decision.py
  • src/agent_quality_mcp/exceptions.py
  • src/agent_quality_mcp/grouping.py
  • src/agent_quality_mcp/models.py
  • src/agent_quality_mcp/response.py
  • src/agent_quality_mcp/service.py
  • src/agent_quality_mcp/shadow.py
  • src/agent_quality_mcp/tools.py
  • tests/integration/test_validate_patch_demo.py
  • tests/unit/test_actions.py
  • tests/unit/test_decision.py
  • tests/unit/test_grouping.py
  • tests/unit/test_models.py
  • tests/unit/test_response_contract.py
  • tests/unit/test_service.py
  • tests/unit/test_tools_server.py
  • tests/unit/test_workspace_shadow.py
💤 Files with no reviewable changes (1)
  • src/agent_quality_mcp/models.py
📜 Review details
🧰 Additional context used
🪛 ast-grep (0.43.0)
tests/unit/test_response_contract.py

[info] 19-19: Do not hardcode temporary file or directory names
Context: "/tmp/demo"
Note: [CWE-377].

(hardcoded-tmp-file)

tests/unit/test_service.py

[info] 26-26: Do not hardcode temporary file or directory names
Context: "/tmp/shadow"
Note: [CWE-377].

(hardcoded-tmp-file)
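
If the flagged paths are ever used to create real files (the tests may only use them as placeholder strings, which this thread does not show), one way to address the note is to let the standard library pick a unique directory. The helper `write_demo_file` below is hypothetical and stands in for whatever code a test exercises.

```python
import tempfile
from pathlib import Path


def write_demo_file(root: Path) -> Path:
    """Hypothetical helper standing in for the code under test."""
    target = root / "demo.txt"
    target.write_text("ok", encoding="utf-8")
    return target


def run_in_unique_tmp() -> str:
    # TemporaryDirectory creates a uniquely named directory and removes it on exit.
    with tempfile.TemporaryDirectory(prefix="demo-") as tmp:
        return write_demo_file(Path(tmp)).read_text(encoding="utf-8")
```

In a pytest suite, the built-in `tmp_path` fixture provides the same guarantee without manual cleanup.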

🔇 Additional comments (24)
README.md (1)

10-11: LGTM!

Also applies to: 110-164

src/agent_quality_mcp/response.py (1)

76-335: LGTM!

tests/unit/test_response_contract.py (1)

24-209: LGTM!

tests/unit/test_models.py (1)

3-5: LGTM!

Also applies to: 103-105

tests/unit/test_service.py (1)

23-31: LGTM!

Also applies to: 37-38, 52-52, 65-65, 113-116, 138-173, 190-190, 202-202, 212-212, 268-270, 306-307, 336-337, 366-369, 386-389, 392-629, 676-677

src/agent_quality_mcp/tools.py (1)

10-11: LGTM!

Also applies to: 41-45, 74-89

tests/integration/test_validate_patch_demo.py (1)

14-14: LGTM!

Also applies to: 54-70, 95-106

tests/unit/test_tools_server.py (1)

5-5: LGTM!

Also applies to: 88-88, 144-148, 151-203

src/agent_quality_mcp/exceptions.py (1)

24-26: LGTM!

src/agent_quality_mcp/shadow.py (1)

12-12: LGTM!

Also applies to: 99-106

src/agent_quality_mcp/service.py (1)

25-25: LGTM!

Also applies to: 48-48, 173-183, 368-371, 384-384, 422-422, 453-478, 481-489, 530-531

tests/unit/test_workspace_shadow.py (1)

6-6: LGTM!

Also applies to: 75-78, 91-94

src/agent_quality_mcp/decision.py (4)

1-99: LGTM!


101-146: LGTM!


149-203: LGTM!


205-298: LGTM!

tests/unit/test_decision.py (1)

1-268: LGTM!

src/agent_quality_mcp/grouping.py (3)

1-98: LGTM!


101-169: LGTM!


172-238: LGTM!

tests/unit/test_grouping.py (1)

1-232: LGTM!

src/agent_quality_mcp/actions.py (2)

1-104: LGTM!


106-226: LGTM!

tests/unit/test_actions.py (1)

1-190: LGTM!

Comment thread src/agent_quality_mcp/response.py
@qkal merged commit 554b512 into master on Jun 19, 2026
3 checks passed
@qkal deleted the feat/phase-2-decision-engine-split branch on June 19, 2026 21:35