[ML] Add ML_SKIP_MODEL_VALIDATION bypass for graph validation #3013
edsavage wants to merge 3 commits into elastic:main
Conversation
Provides an emergency escape hatch to bypass TorchScript model graph validation without requiring a code change or rebuild. When ML_SKIP_MODEL_VALIDATION is set to "true", the pytorch_inference process skips the graph validator and logs a warning. Elasticsearch can set this environment variable for the native process via its ML settings, allowing operators to unblock model deployments immediately if the validator incorrectly rejects a legitimate model. Made-with: Cursor
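The check itself is small. Below is a minimal sketch of the intended semantics in Python; the actual change lives in C++ in `bin/pytorch_inference/Main.cc`, and the function name here is illustrative only:

```python
import os

def skip_model_validation() -> bool:
    # Bypass only when the variable is set to the exact string "true";
    # unset, or any other value, means the graph validator runs as usual.
    return os.environ.get("ML_SKIP_MODEL_VALIDATION") == "true"
```

Note that this mirrors the final behavior described in the PR summary: only the exact value `"true"` activates the bypass.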
✅ Snyk checks have passed. No issues have been found so far.
Extends the evil model integration test to verify that:
- ML_SKIP_MODEL_VALIDATION=true bypasses graph validation (with warning logged)
- ML_SKIP_MODEL_VALIDATION=false still validates (only exact "true" activates the bypass)

Made-with: Cursor
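The behavior the extended test asserts reduces to a small decision table. This is a hedged sketch with a hypothetical helper name; the real script drives the pytorch_inference binary via subprocess rather than calling a pure function:

```python
def expected_outcome(env_value, model_is_malicious):
    """Expected pytorch_inference behavior per the PR's semantics.

    env_value is the raw value of ML_SKIP_MODEL_VALIDATION (None = unset).
    Only the exact string "true" disables the graph validator.
    """
    bypassed = env_value == "true"
    if model_is_malicious and not bypassed:
        return "rejected"   # validator runs and refuses the malicious graph
    return "loaded"         # benign model, or validation bypassed
```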
Bypass Deployment Guide (collapsed sections: "Who sets it and how", "Important notes")
Pull request overview
Adds an environment-variable “kill switch” to bypass TorchScript model graph validation in pytorch_inference, plus a Python integration script intended to exercise validator behavior (including the bypass).
Changes:
- Add an `ML_SKIP_MODEL_VALIDATION=true` env-var check to skip `verifySafeModel()` and emit a warning.
- Add a standalone Python script that generates known-malicious TorchScript models and runs `pytorch_inference` to confirm rejection/bypass behavior.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `bin/pytorch_inference/Main.cc` | Adds the ML_SKIP_MODEL_VALIDATION env-var bypass around `verifySafeModel()` with warning logging. |
| `test/test_pytorch_inference_evil_models.py` | Adds a standalone integration script to generate "evil" models and validate expected `pytorch_inference` behavior (including bypass). |
```python
        generate_model(spec["class"], model_path)
        print(f"  Model generated: {model_path.name} ({model_path.stat().st_size} bytes)")
    except Exception as e:
        print(f"  SKIP: could not generate model: {e}")
```
If TorchScript scripting fails for a model (e.g., due to Torch version differences), this test currently prints SKIP and continues, which can result in an overall PASS without having exercised the validator at all. For a security regression test, it would be safer to treat model-generation failures as a test failure (or at least fail when the expected-rejected models can’t be generated).
Suggested change:

```diff
-        print(f"  SKIP: could not generate model: {e}")
+        print(f"  FAIL: could not generate model: {e}")
+        all_passed = False
```
```python
        raise FileNotFoundError(
            "Could not find pytorch_inference binary. "
            "Build from the feature/harden_pytorch_inference branch, or pass --binary."
        )
```
This script’s requirements/error message still references building from the "feature/harden_pytorch_inference" branch. That’s likely to become stale/confusing once this change is on main; consider updating the wording to refer to a built pytorch_inference binary (or a minimum version) rather than a specific branch name.
```python
    Requires: torch, a built pytorch_inference binary with graph validation
    (feature/harden_pytorch_inference branch or later).
```
The docstring says this requires a binary built from the "feature/harden_pytorch_inference" branch. Since this file is being added to the mainline repo, consider updating this to a stable requirement (e.g., “a pytorch_inference binary built from this repo at/after ”) to avoid confusion for future readers.
Suggested change:

```diff
-    Requires: torch, a built pytorch_inference binary with graph validation
-    (feature/harden_pytorch_inference branch or later).
+    Requires: torch, and a built pytorch_inference binary from this repository
+    with graph validation enabled (i.e., including the
+    CModelGraphValidator checks).
```
- Update stale branch references to generic requirements
- Treat model generation failures as test failures, not skips; for security regression tests, silently skipping is unsafe

Made-with: Cursor
valeriy42 left a comment
I see the reason for wanting an escape hatch, but setting an environment variable is not a practical solution. You need a cluster setting and a --skipValidation flag on the pytorch_inference process.
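If the reviewer's suggested flag were adopted, the argument handling might look like the following sketch. This is purely illustrative: the real pytorch_inference parses its arguments in C++, and `--skipValidation` does not exist in the current CLI; `argparse` is used here only to show the shape of the interface:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI per the review suggestion; not part of pytorch_inference today.
    parser = argparse.ArgumentParser(prog="pytorch_inference")
    parser.add_argument("--skipValidation", action="store_true",
                        help="skip TorchScript model graph validation "
                             "(emergency use only; logs a warning)")
    return parser
```

A flag like this would be driven by a cluster setting in Elasticsearch rather than by an environment variable set on the native process.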
Summary
- When `ML_SKIP_MODEL_VALIDATION=true` is set in the process environment before `pytorch_inference` starts, the allowlist check is skipped and a warning is logged
- Only the exact value `"true"` activates the bypass; any other value, or the variable being unset, means validation runs normally

Test plan
- Ran the `CModelGraphValidatorTest` suite locally; all tests pass
- `ML_SKIP_MODEL_VALIDATION=true` bypasses validation for a malicious model (PASS)
- `ML_SKIP_MODEL_VALIDATION=false` still validates normally (PASS)