Prevent AI-Induced Version Downgrades #3005

Draft
google-labs-jules[bot] wants to merge 6 commits into main from prevent-version-downgrades-2113600157740646610
@google-labs-jules

Contributor

Implemented a multi-layered defense against Knowledge Cutoff Regression:

  1. Context Grounding: AI models now receive the current repository stack versions in their system prompts, reducing hallucinations.
  2. Deterministic Validation: A new verify_versions.py script compares proposed changes against HEAD and external registries (npm/GitHub).
  3. Hard Locks: Core runtime versions (Node.js) are strictly locked in CI and local checks unless explicitly overridden.
  4. Autonomous Guards: AI-generated conflict resolutions are automatically validated and rejected if they introduce version regressions.
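
The deterministic validation in item 2 comes down to comparing a proposed version against the one at HEAD. A minimal sketch of that comparison, using only the standard library (a later review comment mentions that the PR uses `packaging.version`, which also handles pre-release tags). The helper names `parse_version` and `is_downgrade` are hypothetical and do not come from the PR:

```python
import re

# Hypothetical helpers; pre-release tags such as "-beta.1" are ignored here.
_VERSION_RE = re.compile(r"[\^~=v]*(\d+(?:\.\d+)*)")


def parse_version(text: str) -> tuple[int, ...]:
    """Parse 'v4', '^10.28.2', or '24.16.0' into a tuple of integers."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_downgrade(current: str, proposed: str) -> bool:
    """Return True when `proposed` is strictly lower than `current`."""
    cur, new = parse_version(current), parse_version(proposed)
    width = max(len(cur), len(new))
    # Pad with zeros so that "v4" and "4.0.0" compare as equal.
    cur += (0,) * (width - len(cur))
    new += (0,) * (width - len(new))
    return new < cur
```

Padding before the comparison matters because Python orders a shorter tuple before a longer one with the same prefix, which would otherwise report `v4` to `4.0.0` as a downgrade.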

Fixes #3003


PR created automatically by Jules for task 2113600157740646610 started by @arii

This change implements a deterministic validation layer and context-injection step to prevent AI agents from downgrading dependencies or modifying core runtime versions.

Key changes:
- Added `get_stack_versions()` to `dev-tools/utils.py` to extract ground-truth versions from the repo.
- Injected current stack versions into AI prompts in `ai_service.py` and `ai_reviewer.py` for factual grounding.
- Created `dev-tools/verify_versions.py` to parse diffs and detect version downgrades/hard-blocks.
- Implemented hard blocks on Node.js version modifications in `scripts/check-runtime.mjs` and `scripts/check-runtime-files.mjs` (overridable via ALLOW_NODE_VERSION_CHANGE=true).
- Integrated version verification into `td_cli.py` (`td gh verify-versions`) and `Orchestrator.pre_submit_checks`.
- Added post-processing to `AIClient.resolve_file_conflicts` to reject resolutions containing version violations.
- Added comprehensive unit tests in `tests/dev-tools/test_version_protection.py`.
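
The overridable hard block on Node.js version changes could look roughly like the sketch below. It is written in Python for illustration only; the actual checks are described as living in the `.mjs` scripts, and the function names and message text here are assumptions. Only the `ALLOW_NODE_VERSION_CHANGE=true` override comes from the description above.

```python
import os
from typing import Mapping, Optional


def node_change_allowed(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the override variable is set to 'true'."""
    env = os.environ if env is None else env
    return env.get("ALLOW_NODE_VERSION_CHANGE", "").strip().lower() == "true"


def check_node_version(
    locked: str, proposed: str, env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return a list of violations; an empty list means the change passes."""
    if proposed == locked or node_change_allowed(env):
        return []
    return [
        f"Node.js version change {locked} -> {proposed} is blocked; "
        "set ALLOW_NODE_VERSION_CHANGE=true to override"
    ]
```

Note that this blocks any change, upgrade or downgrade, which matches the "strictly locked" wording rather than a downgrade-only rule.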
@google-labs-jules

Contributor Author

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@github-actions

github-actions Bot commented Jun 26, 2026

Contributor

🚀 Deployment Details (Last updated: Jun 26, 2026, 4:26 PM PST)

🚀 Pushed to gh-pages; publish in progress

@github-actions

Contributor

👁️ Gemini Code Review Agent

Powered by Gemini 3.x

Reviewing: PR #3005

Code Review Feedback

[ARCHITECTURE] Review

Error: failed to execute ARCHITECTURE review. Details: [GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent: [429 Too Many Requests] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps.

[PERFORMANCE] Review

Error: failed to execute PERFORMANCE review. Details: [GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent: [429 Too Many Requests] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps.

[SECURITY] Review

Error: failed to execute SECURITY review. Details: [GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent: [429 Too Many Requests] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps.

[STYLE] Review

Error: failed to execute STYLE review. Details: [GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-3.5-flash:generateContent: [429 Too Many Requests] Your project has exceeded its monthly spending cap. Please go to AI Studio at https://ai.studio/spend to manage your project spend cap. Learn more at https://ai.google.dev/gemini-api/docs/billing#project-spend-caps.


Generated by gemini-code-review

@github-actions

Contributor

🐙 GitHub Models Code Review

Powered by GitHub Models

Reviewing: PR #3005

Model: gpt-4o

Code Review Feedback

[ARCHITECTURE] Review

Error: failed to execute ARCHITECTURE review. Details: GitHub Models API error: 429 Too Many Requests - {"error":{"code":"RateLimitReached","message":"Rate limit of 2 per 0s exceeded for UserConcurrentRequests. Please wait 0 seconds before retrying.","details":"Rate limit of 2 per 0s exceeded for UserConcurrentRequests. Please wait 0 seconds before retrying."}}

[PERFORMANCE] Review

Review Summary

This PR introduces a multi-layered defense mechanism against AI-induced version downgrades. It includes changes to several files, such as ai_reviewer.py, pyproject.toml, cli.py, orchestrator.py, ai_service.py, and a new verify_versions.py script. The changes aim to ensure that AI-generated code reviews and conflict resolutions do not introduce version regressions. While the implementation is generally sound, there are several high-severity issues, as well as some questions and nitpicks.


High Severity Issues

Issue 1: Missing Error Handling for get_stack_versions in ai_reviewer.py

File: dev-tools/ai_reviewer.py
Line: 25
Snippet: stack_versions = get_stack_versions()
Issue: The get_stack_versions() function can raise exceptions (e.g., file read errors or JSON parsing errors), but there is no error handling in place here. If an exception occurs, the script will crash, potentially causing CI failures.
Fix Summary: Wrap the call to get_stack_versions() in a try-except block and handle errors gracefully, e.g., by logging a warning and falling back to default values.
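
A minimal sketch of the suggested wrapper, assuming the loader raises `OSError` for file problems and `ValueError` (which `json.JSONDecodeError` subclasses) for parse problems. The loader is passed in as a parameter so the example stays self-contained; `safe_stack_versions` and `DEFAULT_STACK_VERSIONS` are hypothetical names:

```python
import logging
from typing import Callable

logger = logging.getLogger("ai_reviewer")

# Hypothetical default: an empty mapping means "no grounding available".
DEFAULT_STACK_VERSIONS: dict[str, str] = {}


def safe_stack_versions(loader: Callable[[], dict[str, str]]) -> dict[str, str]:
    """Call the loader, falling back to defaults instead of crashing."""
    try:
        return loader()
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError, so malformed JSON lands here too.
        logger.warning("Could not read stack versions, using defaults: %s", exc)
        return dict(DEFAULT_STACK_VERSIONS)
```

In `ai_reviewer.py` the call would then presumably become `stack_versions = safe_stack_versions(get_stack_versions)`.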


Issue 2: Unvalidated AI Conflict Resolution in ai_service.py

File: dev-tools/tdw_services/services/ai_service.py
Line: 148
Snippet: if any(file_path.endswith(sf) or sf in file_path for sf in sensitive_files):
Issue: The logic for validating AI-generated conflict resolutions relies on heuristic checks for sensitive files and a synthesized diff. However, the validation process is incomplete and does not guarantee that all version downgrades are caught. Additionally, the fallback behavior (return False) is overly aggressive and may lead to unnecessary failures.
Fix Summary: Refactor the validation logic to ensure that all version changes are checked against the get_stack_versions() output. Provide more granular error handling and logging to differentiate between critical and non-critical issues.


Issue 3: Potential Infinite Loop in verify_versions.py

File: dev-tools/verify_versions.py
Line: [TRUNCATED]
Snippet: while True:
Issue: The verify_versions.py script contains a while True loop without any clear exit condition. This could lead to an infinite loop if the expected termination condition is not met.
Fix Summary: Ensure that the loop has a proper exit condition or a timeout mechanism to prevent indefinite execution.


Issue 4: Missing Dependency Version Constraints in pyproject.toml

File: dev-tools/pyproject.toml
Line: 13
Snippet: "packaging",
Issue: The packaging library is added as a dependency without specifying a version constraint. This can lead to compatibility issues if future versions introduce breaking changes.
Fix Summary: Specify a version constraint for packaging, e.g., "packaging>=21.0".


Questions

Question 1: Validation of verify_versions.py Output

File: dev-tools/tdw_services/cli.py
Line: 249
Snippet: findings = json.loads(proc.stdout)
Concern: The script assumes that verify_versions.py will always produce valid JSON output. What happens if the script fails or produces malformed output? Should there be additional validation or fallback behavior?
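
One way to address this, sketched under the assumption that the validator prints a JSON list of findings (the actual output shape is not shown in the review). The name `parse_findings` is hypothetical:

```python
import json


def parse_findings(stdout: str) -> list:
    """Parse validator output, raising a clear error on malformed data."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"validator produced non-JSON output: {exc}") from exc
    if not isinstance(data, list):
        # Assumption: findings are a JSON array; anything else is treated as a failure.
        raise RuntimeError(f"expected a JSON list of findings, got {type(data).__name__}")
    return data
```

Raising instead of returning an empty list keeps a broken validator from being mistaken for a clean result.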


Question 2: Hardcoded Fallback Versions in get_stack_versions

File: dev-tools/utils.py
Line: 478
Snippet: "node": "24.16.0", # Fallback
Concern: The fallback versions for Node.js, pnpm, and GitHub Actions are hardcoded. Should these values be configurable via environment variables or a configuration file?


Nitpicks

Nitpick 1: Redundant Code in ai_service.py

File: dev-tools/tdw_services/services/ai_service.py
Line: 148
Snippet: if any(file_path.endswith(sf) or sf in file_path for sf in sensitive_files):
Concern: The condition file_path.endswith(sf) or sf in file_path is redundant because sf in file_path already covers the case where file_path ends with sf.
Fix Summary: Simplify the condition to if any(sf in file_path for sf in sensitive_files):.
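
A quick check of that equivalence, using a hypothetical `sensitive_files` list since the real contents are not shown in the review:

```python
# Hypothetical list; the real contents of sensitive_files are not shown.
sensitive_files = ["package.json", ".nvmrc", "pnpm-lock.yaml"]


def is_sensitive_original(file_path: str) -> bool:
    return any(file_path.endswith(sf) or sf in file_path for sf in sensitive_files)


def is_sensitive_simplified(file_path: str) -> bool:
    # A suffix is always a substring, so the endswith() test adds nothing.
    return any(sf in file_path for sf in sensitive_files)


sample_paths = [
    "package.json",
    "apps/web/package.json",
    "docs/package.json.md",
    "src/index.ts",
]
# Each entry is (path, original result, simplified result).
comparison = [
    (path, is_sensitive_original(path), is_sensitive_simplified(path))
    for path in sample_paths
]
```

Both forms also match paths such as `docs/package.json.md`, so if only exact file names should count, a basename comparison may be closer to the intent than either version.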


Nitpick 2: Lack of Unit Tests for verify_versions.py

File: dev-tools/verify_versions.py
Line: [TRUNCATED]
Snippet: [Entire file]
Concern: The new verify_versions.py script does not appear to have any associated unit tests. While this is not a blocking issue, it is recommended to add tests to ensure the script behaves as expected.
Fix Summary: Add unit tests for verify_versions.py, focusing on edge cases such as malformed diffs and missing files.


Final Verdict

Given the high-severity issues identified, particularly the lack of error handling in critical paths and the incomplete validation logic for AI-generated conflict resolutions, this PR cannot be approved in its current state. These issues could lead to CI failures or allow version downgrades to slip through undetected.


Findings Summary

[SECURITY] Review

Error: failed to execute SECURITY review. Details: GitHub Models API error: 429 Too Many Requests - {"error":{"code":"RateLimitReached","message":"Rate limit of 2 per 0s exceeded for UserConcurrentRequests. Please wait 0 seconds before retrying.","details":"Rate limit of 2 per 0s exceeded for UserConcurrentRequests. Please wait 0 seconds before retrying."}}

[STYLE] Review

Review Summary

The provided diff introduces several changes to the codebase to implement a multi-layered defense against AI-induced version downgrades. The changes include:

  1. Adding stack version awareness to AI prompts.
  2. Introducing a verify_versions.py script to validate version changes.
  3. Adding a verify_versions command to the CLI.
  4. Integrating version downgrade checks into the orchestrator's workflow.
  5. Enhancing AI conflict resolution to prevent version downgrades.

While the changes align with the stated goal, there are a few high-severity issues and several non-blocking concerns that need to be addressed.


High-Severity Issues (Blocking)

1. Missing Import in verify_versions.py

  • File: dev-tools/verify_versions.py
  • Line: 13
  • Snippet: from utils import get_stack
  • Issue: The get_stack function is not defined in the provided context or the diff. This will result in an ImportError when the script is executed.
  • Fix Summary: Verify that get_stack is defined in utils.py or another module. If it is not defined, implement the function or remove the import.

2. Uncaught Exception in verify_versions Command

  • File: dev-tools/tdw_services/cli.py
  • Line: 249
  • Snippet: err(ctx, f"Error running validator: {e}")
  • Issue: If the subprocess.run call fails, the exception is logged, but the program does not exit with a non-zero status. CI pipelines could therefore report a pass even though the validator never ran.
  • Fix Summary: Add a sys.exit(1) call after logging the error to ensure the process exits with a failure status.
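
A sketch of that exit-code handling, assuming the validator is launched with `subprocess.run`. The function name `run_validator` is hypothetical, and the error message mirrors the snippet above:

```python
import subprocess
import sys


def run_validator(cmd: list[str]) -> int:
    """Run the validator and return an exit code suitable for sys.exit()."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except OSError as exc:
        # Covers a missing executable, permission errors, and similar failures.
        print(f"Error running validator: {exc}", file=sys.stderr)
        return 1
    if proc.returncode != 0:
        print(proc.stderr, file=sys.stderr)
    return proc.returncode
```

The CLI command would then end with something like `sys.exit(run_validator(cmd))`, so a launch failure fails the pipeline instead of passing silently.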

3. Hardcoded Fallback Versions in get_stack_versions

  • File: dev-tools/utils.py
  • Line: 481
  • Snippet:
    versions = {
        "node": "24.16.0", # Fallback
        "pnpm": "10.28.2", # Fallback
        "actions/checkout": "v4",
        "actions/setup-node": "v4",
        "actions/upload-artifact": "v4",
    }
  • Issue: Hardcoding fallback versions introduces a risk of these values becoming outdated, leading to incorrect validation results.
  • Fix Summary: Replace hardcoded values with a mechanism to fetch the latest versions dynamically (e.g., from a central configuration file or an API).

4. Potential Infinite Loop in verify_versions

  • File: dev-tools/verify_versions.py
  • Line: [Not visible in the diff]
  • Snippet: [Not visible in the diff]
  • Issue: The verify_versions.py script appears to parse diffs and validate version changes. However, the logic for handling recursive or circular dependencies is not visible in the diff. If not handled, this could lead to infinite loops or stack overflows.
  • Fix Summary: Ensure that the verify_versions.py script includes safeguards against recursive or circular dependencies when validating version changes.

Non-Blocking Concerns (Questions, Suggestions, and Nitpicks)

1. Error Handling in get_stack_versions

  • File: dev-tools/utils.py
  • Line: 531
  • Snippet: log_warn(f"Failed to extract stack versions: {e}")
  • Concern: The function logs a warning but does not provide a fallback mechanism if it fails to extract stack versions. This could lead to incomplete or incorrect validation.
  • Suggestion: Consider adding a fallback mechanism (e.g., using default values or skipping validation) to handle cases where stack version extraction fails.

2. Hardcoded Rules in AI Prompts

  • File: dev-tools/ai_reviewer.py
  • Line: 45
  • Snippet:
    f"Rules:\n"
    f"- DO NOT suggest downgrading any versions listed in the 'Current Stack Versions' section.\n\n"
  • Concern: Hardcoding rules in the AI prompt makes it difficult to update or customize them in the future.
  • Suggestion: Move the rules to a configuration file or a centralized location to make them easier to manage and update.
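
As a rough illustration of that suggestion, the rules can be held in one constant (or loaded from a file) and passed into a small prompt builder. `DEFAULT_RULES` and `build_grounding_section` are hypothetical names; only the rule text and the 'Current Stack Versions' heading come from the snippet above:

```python
from typing import Sequence

# Hypothetical central location for prompt rules.
DEFAULT_RULES: tuple[str, ...] = (
    "DO NOT suggest downgrading any versions listed in the "
    "'Current Stack Versions' section.",
)


def build_grounding_section(
    stack_versions: dict[str, str], rules: Sequence[str] = DEFAULT_RULES
) -> str:
    """Render the stack versions and rules as a block of prompt text."""
    lines = ["Current Stack Versions:"]
    lines += [f"- {name}: {version}" for name, version in sorted(stack_versions.items())]
    lines += ["", "Rules:"]
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines) + "\n\n"
```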

3. Error Handling in review_file

  • File: dev-tools/ai_reviewer.py
  • Line: 38
  • Snippet:
    except Exception as e:
        print(f"Error reading file {file_path}: {e}", file=sys.stderr)
        sys.exit(1)
  • Concern: Using sys.exit(1) directly in a utility function makes it harder to reuse the function in other contexts.
  • Suggestion: Raise a custom exception instead of calling sys.exit(1). Let the caller decide how to handle the error.
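
A minimal sketch of that pattern. `FileReadError` and `read_source` are hypothetical names, not taken from the PR:

```python
class FileReadError(Exception):
    """Raised when a file under review cannot be read."""


def read_source(file_path: str) -> str:
    """Return the file contents, or raise FileReadError with context."""
    try:
        with open(file_path, "r", encoding="utf-8") as handle:
            return handle.read()
    except OSError as exc:
        raise FileReadError(f"Error reading file {file_path}: {exc}") from exc
```

Only the script's entry point would then catch `FileReadError`, print the message to stderr, and call `sys.exit(1)`, which leaves the function reusable from tests and other tools.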

4. Use of sys.path.insert

  • File: dev-tools/tdw_services/services/ai_service.py
  • Line: 10
  • Snippet:
    sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')))
  • Concern: While this is acceptable for the current CI invocation pattern, it can lead to unexpected behavior if the script is executed in a different context.
  • Suggestion: Consider using a more robust method for managing imports, such as packaging the dev-tools directory as a Python module.

5. Potential Performance Issue in get_stack_versions

  • File: dev-tools/utils.py
  • Line: 503
  • Snippet:
    for filename in os.listdir(workflow_dir):
        if filename.endswith(".yml") or filename.endswith(".yaml"):
            with open(os.path.join(workflow_dir, filename), "r") as f:
                content = f.read()
                matches = re.findall(r"uses:\s+([\w\-/]+)@([\w\.]+)", content)
  • Concern: Reading each workflow file fully into memory and running a regex over its whole content may be inefficient for repositories with many or very large workflow files.
  • Suggestion: Consider using a streaming approach to process the files line by line, which would be more memory-efficient.
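
A line-by-line variant might look like the sketch below. It reuses the regex from the snippet and keeps the highest tag seen per action. The names `scan_workflows` and `version_key` are hypothetical, and the numeric ordering is an assumption that only makes sense for tags like `v4` or `v4.1.2`, not for commit SHAs:

```python
import os
import re

USES_RE = re.compile(r"uses:\s+([\w\-/]+)@([\w\.]+)")


def version_key(tag: str) -> tuple[int, ...]:
    """Order tags like 'v4' or 'v4.1.2' numerically (not meaningful for SHAs)."""
    return tuple(int(number) for number in re.findall(r"\d+", tag))


def scan_workflows(workflow_dir: str) -> dict[str, str]:
    """Stream each workflow file and keep the highest tag per action."""
    highest: dict[str, str] = {}
    for filename in sorted(os.listdir(workflow_dir)):
        if not filename.endswith((".yml", ".yaml")):
            continue
        with open(os.path.join(workflow_dir, filename), "r", encoding="utf-8") as handle:
            for line in handle:  # one line in memory at a time
                match = USES_RE.search(line)
                if match is None:
                    continue
                action, tag = match.groups()
                if action not in highest or version_key(tag) > version_key(highest[action]):
                    highest[action] = tag
    return highest
```

Workflow files are usually small, so the memory saving is likely modest in practice; the main benefit of this shape is that the scan stops holding whole files at once.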

Final Verdict

Given the high-severity issues identified, particularly the missing import and the lack of proper error handling, this PR cannot be approved in its current state.



Generated by github-models-code-review

This change implements a deterministic validation layer and context-injection step to prevent AI agents from downgrading dependencies or modifying core runtime versions.

Key changes:
- Added `get_stack_versions()` to `dev-tools/utils.py` to extract ground-truth versions from the repo.
- Injected current stack versions into AI prompts in `ai_service.py` and `ai_reviewer.py` for factual grounding, with robust error handling.
- Created `dev-tools/verify_versions.py` to parse diffs and detect version downgrades/hard-blocks, supporting both standard and synthesized diffs.
- Implemented hard blocks on Node.js version modifications in `scripts/check-runtime.mjs` and `scripts/check-runtime-files.mjs` (overridable via ALLOW_NODE_VERSION_CHANGE=true).
- Integrated version verification into `td_cli.py` (`td gh verify-versions`) and `Orchestrator.pre_submit_checks`.
- Added post-processing to `AIClient.resolve_file_conflicts` to reject resolutions containing version violations using synthesized diff validation.
- Added comprehensive unit tests in `tests/dev-tools/test_version_protection.py`.
- Updated `dev-tools/pyproject.toml` with `packaging` dependency and version constraints.

This change implements a deterministic validation layer and context-injection step to prevent AI agents from downgrading dependencies or modifying core runtime versions.

Key improvements:
- Robust version extraction using a "highest version found" heuristic for GitHub Actions.
- Deterministic diff parsing in `verify_versions.py` that correctly correlates removals and additions.
- Scalable CLI implementation using temporary files for large diffs.
- Multi-layered defense: prompt grounding, CI validation, and overridable hard runtime blocks.
- Autonomous guardrails for AI-generated conflict resolutions.
- Comprehensive test coverage.

This change implements a deterministic validation layer and context-injection step to prevent AI agents from downgrading dependencies or modifying core runtime versions.

Key improvements:
- Live version querying: `get_stack_versions(fetch_latest=True)` now queries npm/GitHub registries to provide absolute latest versions to AI models.
- Robust version extraction: Uses a "highest version found" heuristic for GitHub Actions across all workflows.
- Scalable CLI: Passes diffs via temporary files to avoid OS command-line length limits.
- Accurate Diff Parsing: Fixed correlation logic in `verify_versions.py` to correctly handle multi-line version updates.
- Hard Runtime Blocks: Node.js version modifications are strictly locked unless explicitly overridden via ALLOW_NODE_VERSION_CHANGE.
- Autonomous Guardrails: AI resolutions are automatically rejected if they introduce version regressions.
- Comprehensive Test Suite: New tests cover synthesized diffs, sensitive file filtering, and multi-line correlation.
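
The temporary-file approach from the "Scalable CLI" item can be sketched as follows. The function name `run_with_diff_file` and the convention of passing the file path as the last argument are assumptions; how the real CLI hands the path to `verify_versions.py` is not shown in this thread:

```python
import os
import subprocess
import tempfile


def run_with_diff_file(
    diff_text: str, command_prefix: list[str]
) -> subprocess.CompletedProcess:
    """Write the diff to a temp file and pass its path as the last argument."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".diff", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(diff_text)
        path = tmp.name
    try:
        return subprocess.run(
            [*command_prefix, path], capture_output=True, text=True, check=False
        )
    finally:
        os.unlink(path)  # clean up even if the subprocess fails to start
```

The file is closed before the subprocess starts and removed afterwards, which avoids both argument-length limits and leftover files.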
@arii

arii commented Jun 26, 2026

Owner

🤖 AI Technical Audit

ANTI-AI-SLOP

This PR introduces a robust, multi-layered defense against AI-induced version downgrades, a critical issue for maintaining the stability and security of our tech stack. The solution covers context grounding for AI models, deterministic validation during PRs, strict hard locks, and autonomous guards for AI-generated conflict resolutions. The architecture is well-thought-out, addressing the problem from multiple angles.

There are no signs of AI slop. The code is concise, focused, and directly addresses the problem. The test coverage for the new verify_versions.py script is excellent, covering various scenarios including edge cases and overrides. The use of packaging.version for robust version comparison is a good technical decision.

Minor Points:

  1. Redundancy (Positive): The Node.js hard-blocking logic appears in scripts/check-runtime-files.mjs, scripts/check-runtime.mjs, and verify_versions.py. This is intentional and contributes to the multi-layered defense, ensuring that version changes are caught at different stages (diff validation, pre-commit/CI runtime checks). This is a good example of defense-in-depth, not slop.
  2. get_stack_versions Fallbacks: The hardcoded fallback versions for node and pnpm in get_stack_versions ("node": "24.16.0", "pnpm": "10.28.2") are immediately overwritten by file reads. While generally harmless, in a scenario where no files are found, these static values could become stale. This is a minor concern, as the primary goal is to extract current versions from the repo. No changes are required here, but it's noted.

FINAL RECOMMENDATION

Approved with Minor Changes

DEFINITION OF DONE

The only blocking issue is a circular import dependency that should be refactored for better modularity and robustness. Once this is addressed, the PR can be approved.

Review automatically published via RepoAuditor.

This change implements a deterministic validation layer and context-injection step to prevent AI agents from downgrading dependencies or modifying core runtime versions.

Key improvements:
- Live version querying: `get_stack_versions(fetch_latest=True)` now queries npm/GitHub registries to provide absolute latest versions to AI models.
- Robust version extraction: Uses a "highest version found" heuristic for GitHub Actions across all workflows.
- Scalable CLI: Passes diffs via temporary files to avoid OS command-line length limits.
- Accurate Diff Parsing: Re-implemented correlation logic in `verify_versions.py` to correctly handle multi-line version updates.
- Modular Design: Consolidated version querying and comparison logic into `utils.py` to resolve circular dependencies.
- Hard Runtime Blocks: Node.js version modifications are strictly locked unless explicitly overridden via ALLOW_NODE_VERSION_CHANGE.
- Autonomous Guardrails: AI resolutions are automatically validated and rejected if they introduce version regressions.
- Comprehensive Test Suite: New tests cover synthesized diffs, sensitive file filtering, and multi-line correlation.

This change implements a deterministic validation layer and context-injection step to prevent AI agents from downgrading dependencies or modifying core runtime versions.

Key improvements:
- Live version querying: `get_stack_versions(fetch_latest=True)` now queries npm/GitHub/Node.js registries to provide absolute latest versions to AI models.
- Robust version extraction: Uses a "highest version found" heuristic for GitHub Actions across all workflows.
- Scalable CLI: Passes diffs via temporary files to avoid OS command-line length limits.
- Accurate Diff Parsing: Re-implemented correlation logic in `verify_versions.py` to correctly handle multi-line version updates.
- Modular Design: Consolidated version querying and comparison logic into `version_utils.py` to resolve circular dependencies and improve maintainability.
- Hard Runtime Blocks: Node.js version modifications are strictly locked unless explicitly overridden via ALLOW_NODE_VERSION_CHANGE.
- Autonomous Guardrails: AI resolutions are automatically validated and rejected if they introduce version regressions.
- Comprehensive Test Suite: New tests cover synthesized diffs, sensitive file filtering, and multi-line correlation.
Development

Successfully merging this pull request may close these issues.

Prevent AI-Induced Version Downgrades (Knowledge Cutoff Regression)

1 participant