[ML] Add ML_SKIP_MODEL_VALIDATION bypass for graph validation#3013

Open
edsavage wants to merge 3 commits into elastic:main from edsavage:feature/model-validation-kill-switch

Conversation

@edsavage
Contributor

@edsavage edsavage commented Mar 26, 2026

Summary

  • Adds an environment variable escape hatch to bypass TorchScript model graph validation
  • When ML_SKIP_MODEL_VALIDATION=true is set in the process environment before pytorch_inference starts, the allowlist check is skipped and a warning is logged
  • Provides a zero-rebuild way to disable validation in an emergency — an operator can set the env var in the deployment configuration (systemd, Docker, Kubernetes pod spec) without needing a new ml-cpp build or Elasticsearch release
  • Default behaviour (validation enabled) is unchanged
  • Only the exact value "true" activates the bypass; any other value or unset means validation runs normally
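Since only the exact string "true" activates the bypass, the gate reduces to a strict equality check. A minimal Python sketch of that logic follows — the real check lives in C++ in bin/pytorch_inference/Main.cc, and this is only an illustration of the described behaviour:

```python
import os

def skip_model_validation() -> bool:
    """Return True only when ML_SKIP_MODEL_VALIDATION is exactly "true".

    Mirrors the behaviour described in the PR summary: unset, empty,
    "TRUE", "1", etc. all leave validation enabled.
    """
    return os.environ.get("ML_SKIP_MODEL_VALIDATION") == "true"
```

Anything other than a byte-for-byte match keeps the validator on, which makes accidental activation (e.g. via a truthy-looking value) unlikely.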

Test plan

  • Built and ran CModelGraphValidatorTest suite locally — all tests pass
  • Integration test: ML_SKIP_MODEL_VALIDATION=true bypasses validation for a malicious model (PASS)
  • Integration test: ML_SKIP_MODEL_VALIDATION=false still validates normally (PASS)
  • Integration test: benign model passes validation as before (PASS)
  • CI passes

Provides an emergency escape hatch to bypass TorchScript model graph
validation without requiring a code change or rebuild. When
ML_SKIP_MODEL_VALIDATION is set to the exact value "true", the
pytorch_inference process skips the graph validator and logs a warning.

Elasticsearch can set this environment variable for the native
process via its ML settings, allowing operators to unblock model
deployments immediately if the validator incorrectly rejects a
legitimate model.

Made-with: Cursor
@prodsecmachine

prodsecmachine commented Mar 26, 2026

Snyk checks have passed. No issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total |
|---|---|---|---|---|---|
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |


Extends the evil model integration test to verify that:
- ML_SKIP_MODEL_VALIDATION=true bypasses graph validation (with
  warning logged)
- ML_SKIP_MODEL_VALIDATION=false still validates (only exact "true"
  activates the bypass)

Made-with: Cursor
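A test harness along these lines can drive the binary with and without the bypass. The CLI flags and helper names below are illustrative, not the actual invocation used by test_pytorch_inference_evil_models.py:

```python
import os
import subprocess

def bypass_env(skip_validation: bool) -> dict:
    """Build the child-process environment for pytorch_inference."""
    env = os.environ.copy()
    if skip_validation:
        env["ML_SKIP_MODEL_VALIDATION"] = "true"
    else:
        # Remove the variable entirely; only the exact value "true"
        # would activate the bypass anyway.
        env.pop("ML_SKIP_MODEL_VALIDATION", None)
    return env

def run_pytorch_inference(binary: str, model_path: str,
                          skip_validation: bool = False):
    """Launch the binary against a model (flags here are hypothetical)."""
    return subprocess.run([binary, "--restore", model_path],
                          env=bypass_env(skip_validation),
                          capture_output=True, text=True)
```

The test then asserts that a malicious model is rejected with the default environment and accepted (with a warning in stderr) when the bypass is set.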
@edsavage
Contributor Author

edsavage commented Mar 26, 2026

Bypass Deployment Guide

The ML_SKIP_MODEL_VALIDATION=true environment variable is an operator-level emergency lever — it doesn't require a code change or release. It must be set in the process environment before pytorch_inference starts.

Who sets it and how

| Deployment | Operator | How to set |
|---|---|---|
| Self-managed (bare metal/VM) | Cluster admin | `export ML_SKIP_MODEL_VALIDATION=true` before starting ES, or add to `/etc/default/elasticsearch` / systemd unit override |
| Self-managed (Docker) | Cluster admin | `docker run -e ML_SKIP_MODEL_VALIDATION=true ...` |
| Self-managed (Kubernetes) | Cluster admin | Add to pod spec `env:` field or ConfigMap |
| Elastic Cloud managed | Elastic Cloud ops team | Deployment configuration |
| Serverless | Elastic platform/SRE team | Kubernetes pod spec on ML nodes |
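For the self-managed cases, setting and then verifying the variable can look like the following sketch (unit and image names are illustrative):

```shell
# systemd (bare metal/VM): add an override, then restart Elasticsearch
#   systemctl edit elasticsearch
#   [Service]
#   Environment="ML_SKIP_MODEL_VALIDATION=true"

# Docker: pass the variable at container start
#   docker run -e ML_SKIP_MODEL_VALIDATION=true <elasticsearch-image>

# Verify that a child process (as pytorch_inference would be) sees the value:
ML_SKIP_MODEL_VALIDATION=true sh -c 'echo "$ML_SKIP_MODEL_VALIDATION"'
```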

Important notes

  • This is not a user-facing setting — it requires infrastructure access
  • Only the exact value "true" activates the bypass; any other value or unset means validation runs normally
  • When active, a WARN log is emitted: "Model graph validation SKIPPED" — this is visible in ES node logs
  • The env var is inherited by all pytorch_inference child processes on the node, so it disables validation for all models on that node

@edsavage edsavage requested review from Copilot and valeriy42 and removed request for Copilot March 26, 2026 03:47
@edsavage edsavage changed the title [ML] Add ML_SKIP_MODEL_VALIDATION kill switch for graph validation [ML] Add ML_SKIP_MODEL_VALIDATION bypass for graph validation Mar 26, 2026
@edsavage edsavage requested a review from Copilot March 26, 2026 21:17

Copilot AI left a comment


Pull request overview

Adds an environment-variable “kill switch” to bypass TorchScript model graph validation in pytorch_inference, plus a Python integration script intended to exercise validator behavior (including the bypass).

Changes:

  • Add ML_SKIP_MODEL_VALIDATION=true env-var check to skip verifySafeModel() and emit a warning.
  • Add a standalone Python script that generates known-malicious TorchScript models and runs pytorch_inference to confirm rejection/bypass behavior.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 3 comments.

File Description
bin/pytorch_inference/Main.cc Adds the ML_SKIP_MODEL_VALIDATION env-var bypass around verifySafeModel() with warning logging.
test/test_pytorch_inference_evil_models.py Adds a standalone integration script to generate “evil” models and validate expected pytorch_inference behavior (including bypass).


```python
        generate_model(spec["class"], model_path)
        print(f"  Model generated: {model_path.name} ({model_path.stat().st_size} bytes)")
    except Exception as e:
        print(f"  SKIP: could not generate model: {e}")
```

Copilot AI Mar 26, 2026


If TorchScript scripting fails for a model (e.g., due to Torch version differences), this test currently prints SKIP and continues, which can result in an overall PASS without having exercised the validator at all. For a security regression test, it would be safer to treat model-generation failures as a test failure (or at least fail when the expected-rejected models can’t be generated).

Suggested change

```diff
-        print(f"  SKIP: could not generate model: {e}")
+        print(f"  FAIL: could not generate model: {e}")
+        all_passed = False
```
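Applied across the whole generation loop, the fail-fast approach could look like this sketch — `generate_model` and the spec shape are placeholders standing in for the script's actual helpers:

```python
from pathlib import Path

def generate_models_or_fail(specs, out_dir, generate_model):
    """Generate every expected-malicious model up front; any generation
    failure fails the run instead of being silently skipped."""
    failures = []
    for spec in specs:
        model_path = Path(out_dir) / f"{spec['name']}.pt"
        try:
            generate_model(spec["class"], model_path)
        except Exception as exc:
            failures.append(f"{spec['name']}: {exc}")
    if failures:
        raise RuntimeError("model generation failed: " + "; ".join(failures))
```

This way a Torch version mismatch surfaces as a hard failure instead of an overall PASS that never exercised the validator.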

Comment on lines +216 to +219
```python
    raise FileNotFoundError(
        "Could not find pytorch_inference binary. "
        "Build from the feature/harden_pytorch_inference branch, or pass --binary."
    )
```

Copilot AI Mar 26, 2026


This script’s requirements/error message still references building from the "feature/harden_pytorch_inference" branch. That’s likely to become stale/confusing once this change is on main; consider updating the wording to refer to a built pytorch_inference binary (or a minimum version) rather than a specific branch name.

Comment on lines +24 to +25
Requires: torch, a built pytorch_inference binary with graph validation
(feature/harden_pytorch_inference branch or later).

Copilot AI Mar 26, 2026


The docstring says this requires a binary built from the "feature/harden_pytorch_inference" branch. Since this file is being added to the mainline repo, consider updating this to a stable requirement (e.g., “a pytorch_inference binary built from this repo at/after ”) to avoid confusion for future readers.

Suggested change

```diff
-Requires: torch, a built pytorch_inference binary with graph validation
-(feature/harden_pytorch_inference branch or later).
+Requires: torch, and a built pytorch_inference binary from this repository
+with graph validation enabled (i.e., including the
+CModelGraphValidator checks).
```

- Update stale branch references to generic requirements
- Treat model generation failures as test failures, not skips —
  for security regression tests, silently skipping is unsafe

Made-with: Cursor
Contributor

@valeriy42 valeriy42 left a comment


I see the reason for wanting an escape hatch, but setting an environment variable is not a practical solution. You need a cluster setting and a --skipValidation flag on the pytorch_inference process.
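The reviewer's suggested CLI alternative could be sketched as follows. The flag name is taken from the comment; everything else is hypothetical, and the real binary parses its arguments in C++:

```python
import argparse

# Illustrative argument parser for a hypothetical --skipValidation flag.
parser = argparse.ArgumentParser(prog="pytorch_inference")
parser.add_argument("--skipValidation", action="store_true",
                    help="Skip TorchScript model graph validation (emergency use only)")

args = parser.parse_args(["--skipValidation"])
# With the flag present args.skipValidation is True; it defaults to False.
```

A flag passed by Elasticsearch when it spawns the native process would make the bypass auditable per deployment rather than node-wide via the environment.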



4 participants