Skip to content

feat: add dry_run preview to apply_refactor_tool (#176)#226

Closed
azizur100389 wants to merge 1 commit intotirth8205:mainfrom
azizur100389:feat/refactor-dry-run
Closed

feat: add dry_run preview to apply_refactor_tool (#176)#226
azizur100389 wants to merge 1 commit intotirth8205:mainfrom
azizur100389:feat/refactor-dry-run

Conversation

@azizur100389
Copy link
Copy Markdown
Contributor

Summary

Adds a dry_run: bool = False parameter to apply_refactor_tool so callers can preview a unified diff of every edited file without writing to disk. When dry_run=True the pending refactor is kept in the cache so the caller can review the diffs and call the tool again with the same refactor_id to actually apply — the two-step commit pattern the maintainer explicitly validated in the issue thread.

Closes #176.

Root cause

apply_refactor_tool wrote changes to disk immediately with no confirmation step. In agentic workflows where multiple MCP tools are chained, this tool could be called as a side effect of a broader task the developer didn't intend to modify files for — and there was no undo path beyond git checkout.

Maintainer validation from the issue thread:

Valid security/UX concern. apply_refactor_tool currently writes changes to disk immediately with no confirmation step. The suggested two-step approach (dry_run first, then confirmed=true to apply) is the right pattern for agentic workflows...

Fix

  1. code_review_graph/refactor.pyapply_refactor() gains dry_run: bool = False. When True, compute a unified diff via stdlib difflib.unified_diff between the original file content and the post-edit content, collect per-file entries in a diffs list, and return without writing. The preview is not popped from _pending_refactors in dry-run mode so a follow-up real apply works.
  2. code_review_graph/tools/refactor_tools.pyapply_refactor_func() threads dry_run through.
  3. code_review_graph/main.pyapply_refactor_tool MCP decorator exposes dry_run to clients with an updated docstring.
  4. Path-traversal validation and expiry checks still run in dry-run mode — dry-run is not an escape hatch for security.

Return shape

The non-dry-run path is unchanged (no new keys) so no existing caller breaks. Dry-run adds two keys:

{
    "status": "ok",
    "applied": 0,
    "files_modified": [],
    "edits_applied": 0,
    "dry_run": True,
    "diffs": [{"file": "...", "diff": "--- a/...\n+++ b/...\n@@ ..."}]
}

Tests added (tests/test_refactor.py::TestApplyRefactor)

  • test_apply_refactor_dry_run_returns_diff — valid unified diff, file on disk untouched, preview still cached
  • test_apply_refactor_dry_run_then_apply — dry-run followed by real apply with the same refactor_id writes the file and consumes the preview
  • test_apply_refactor_dry_run_blocks_path_traversal — path-traversal check still fires
  • test_apply_refactor_dry_run_rejects_expired — expired previews still rejected

Test results

Stage Result
Stage 1 — new targeted tests 4/4 passed
Stage 2 — tests/test_refactor.py full 26 passed, 16 pre-existing Windows file-lock teardown errors (verified identical on unchanged main)
Stage 3 — adjacent tests/test_tools.py 62 passed, 15 pre-existing Windows teardown errors
Stage 4 — full suite 699 passed, 6 pre-existing Windows failures in test_incremental/test_notebook (verified identical on main)
Stage 5 — ruff check on all 4 touched files clean
Stage 6 — MCP tool-wrapper smoke test dry_run=True returns valid diff without writing; dry_run=False after dry-run writes and consumes

Zero regressions. All pre-existing test issues are Windows-specific file-lock teardown failures that reproduce on unchanged main.

Why this fix is safe

  • No new dependency — difflib is stdlib.
  • Non-dry-run return shape is unchanged; existing callers and tests work unmodified.
  • Dry-run preserves the refactor_id so a real apply can follow without re-running rename_preview.
  • Security invariants (path traversal, expiry) are enforced in both modes.

Add a dry_run: bool = False parameter to apply_refactor_tool so callers
can preview a unified diff of every edited file without touching disk.
When dry_run is True, the pending refactor is left in the cache so the
caller can review the diffs and invoke the tool again with the same
refactor_id to actually apply. This is the two-step commit pattern the
maintainer explicitly validated in the issue thread.

Root cause
----------
apply_refactor_tool wrote changes to disk immediately with no confirmation
step. Agentic workflows that chain MCP tool calls could trigger file
rewrites as a side effect of a broader task the developer did not intend
to modify files for. There was no undo path beyond `git checkout`, and
unstaged work could be lost.

Fix
---
1. refactor.py::apply_refactor gains `dry_run: bool = False`. When true,
   compute a unified diff via stdlib difflib.unified_diff between the
   original file content and the post-edit content, collect per-file
   entries in a `diffs` list, and return without writing. The preview
   is NOT popped from _pending_refactors in dry-run mode so the caller
   can apply it afterwards.
2. tools/refactor_tools.py::apply_refactor_func threads the flag through.
3. main.py::apply_refactor_tool exposes dry_run to MCP clients.
4. Path-traversal validation and expiry checks still run in dry-run mode
   so dry-run is not an escape hatch for security.

Return shape
------------
Non-dry-run path is unchanged (no new keys) so no existing caller breaks.
Dry-run adds `dry_run: True` and `diffs: [{file, diff}, ...]`.

Tests added (tests/test_refactor.py::TestApplyRefactor)
-------------------------------------------------------
- test_apply_refactor_dry_run_returns_diff: valid unified diff returned,
  file unchanged on disk, preview still cached.
- test_apply_refactor_dry_run_then_apply: dry-run followed by real apply
  on the same refactor_id writes the file and consumes the preview.
- test_apply_refactor_dry_run_blocks_path_traversal: path-traversal
  check still fires in dry-run mode.
- test_apply_refactor_dry_run_rejects_expired: expired previews still
  rejected in dry-run mode.

Test results
------------
Stage 1 (new targeted tests): 4/4 passed.
Stage 2 (tests/test_refactor.py): 26 passed, 16 pre-existing Windows
  file-lock teardown errors (verified identical on unchanged main).
Stage 3 (tests/test_tools.py adjacent): 62 passed, 15 pre-existing
  Windows teardown errors.
Stage 4 (full suite): 699 passed, 6 pre-existing Windows failures in
  test_incremental/test_notebook (verified identical on main).
Stage 5 (ruff lint on all 4 touched files): clean.
Stage 6 (MCP tool wrapper smoke test): dry_run=True returns valid diff
  without writing; dry_run=False after dry-run writes and consumes.

Zero regressions from this change.

Why this fix is safe
--------------------
- No new dependency (difflib is stdlib).
- Non-dry-run return shape is unchanged; all existing callers and tests
  work with no modification.
- Dry-run preserves the refactor_id so a real apply can follow without
  re-running rename_preview.
- Security invariants (path traversal, expiry) still enforced in both
  modes.
@azizur100389 azizur100389 force-pushed the feat/refactor-dry-run branch from 4aff6b2 to 69abec6 Compare April 11, 2026 20:49
azizur100389 added a commit to azizur100389/code-review-graph that referenced this pull request Apr 11, 2026
Register .sh / .bash / .zsh / .ksh with tree-sitter-bash. Extract File
and Function nodes for shell function definitions, CALLS edges for
command invocations (both to local functions and external binaries),
and IMPORTS_FROM edges for `source` / `.` dot-includes. Test functions
(test_* prefix) are classified as Test kind.

Root cause of tirth8205#197
------------------
Shell-script-heavy repos (installers, infra, CI tooling) previously
indexed to 0 nodes / 0 edges because .sh was not in EXTENSION_TO_LANGUAGE.
build_or_update_graph reported `parsed 0 files` for FragHub-style repos,
which blocked architecture mapping and flow detection for the category
of projects that most need them.

What's supported
----------------
* Function definitions in both tree-sitter-bash shapes:
    foo() { ... }
    function foo { ... }
    function foo() { ... }
* Call extraction for `command` nodes (tree-sitter-bash wraps every
  invocation in a `command` node with a `command_name` > `word` child).
  Local calls resolve to qualified names; external binaries (curl, grep,
  awk) are recorded with their raw name, consistent with how unresolved
  calls work in other languages.
* `source lib.sh` and `. lib.sh` emit IMPORTS_FROM edges (both bareword
  and quoted-string arguments handled).
* `test_*` prefix -> Test kind via the existing _TEST_PATTERNS list.

Implementation
--------------
1. EXTENSION_TO_LANGUAGE: +4 shell extensions -> "bash"
2. _CLASS_TYPES["bash"] = []  (no class concept in bash)
3. _FUNCTION_TYPES["bash"] = ["function_definition"]
4. _IMPORT_TYPES["bash"] = []  (source handled via constructs handler)
5. _CALL_TYPES["bash"] = ["command"]
6. New _extract_bash_constructs() + _bash_get_source_target() helpers
   following the _extract_lua_constructs pattern. Dispatched next to
   the existing Lua/Luau handler.
7. _get_name: new "bash" branch that reads the first `word` child of
   a function_definition (handles both paren and function-keyword forms).
8. _get_call_name: new "bash" branch that reads command_name > word.

Scope caveats documented in PR body
-----------------------------------
* No cross-file call resolution beyond direct source includes (bash has
  no formal module system).
* Calls whose name is a variable expansion ($foo, `$(cmd)`) are not
  statically resolvable and are skipped by _get_call_name.
* Pipelines, subshells, heredocs are parsed by tree-sitter but not
  semantically linked yet.

Tests added (tests/test_multilang.py::TestBashParsing — 10 tests)
-----------------------------------------------------------------
- test_detects_language (.sh, .bash, .zsh)
- test_finds_function_definitions_paren_form
- test_finds_function_definitions_function_keyword_form
- test_finds_source_imports (source + . + quoted string)
- test_finds_calls_between_local_functions
- test_finds_external_command_calls
- test_finds_contains_edges
- test_detects_test_functions
- test_nodes_have_bash_language
- test_calls_inside_functions

New fixture: tests/fixtures/sample.sh exercises both function-definition
forms, both import forms (bareword + quoted), local + external call
mix, and test_* naming.

Test results
------------
Stage 1 (new targeted tests): 10/10 passed.
Stage 2 (tests/test_multilang.py full): 140/140 passed — no regressions
  across any language.
Stage 3 (tests/test_parser.py adjacent): 67/67 passed.
Stage 4 (full suite): 705 passed (up from 699 baseline — +6 new), 6
  pre-existing Windows failures in test_incremental/test_notebook
  (verified identical on unchanged main in PR tirth8205#226 work).
Stage 5 (ruff check): clean.
Stage 6 (fixture smoke parse): 8 nodes, 26 edges. All expected function,
  call, import, and test nodes present.

Zero regressions. All new code lives behind the bash language check so
existing languages are untouched.
@azizur100389
Copy link
Copy Markdown
Contributor Author

Closing this — I just noticed v2.3.0 ships the dry-run preview as part of your super-PR #228. My implementation and the shipped one converge on the same two-step pattern (dry_run=True returns a unified diff via difflib, pending refactor is retained so a follow-up dry_run=False call applies it), so nothing to salvage here.

Thanks for landing it — glad the pattern matched what you had in mind. I've moved on to a smaller follow-up for the .ksh extension you flagged in #230 (tracked in #235, PR #236).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

apply_refactor_tool writes to files with no dry-run or confirmation step

1 participant