docs: don't exec() LLM-generated workflow code in test snippet #794

sebastiondev wants to merge 1 commit
Conversation
Walkthrough

This pull request modifies documentation and code snippets related to agent workflows to enhance security guidance around LLM-generated code execution. A warning section is added to the primary documentation discouraging direct use of `exec()` on LLM-generated code.
LGTM
The fix is correct and complete — exec() on LLM-generated code is removed from both the rendered snippet and the tester file, the warning is placed at the right callsite, and the @sniptest range is updated to stay in sync. CI failure is not caused by this PR — it matches the known integration test flakiness issue (~85% failure rate on main due to staging API instability and xdist OOM crashes); no action needed from the author.
Tag @mendral-app with feedback or questions.
Summary
The documentation snippet that teaches users how to test agent-generated workflow code uses `exec()` on a Python string returned by `agent.workflow.code()`. Because that string is produced by an LLM whose inputs include attacker-controllable web content, copy-pasting and running the snippet can lead to arbitrary code execution on the user's machine.

This PR replaces the `exec()` call in the docs example with a write-to-file pattern and adds a `<Warning>` admonition reminding users to review LLM-generated code before running it.

Vulnerability details
Files touched:

- `docs/src/snippets/agents/workflows/test-generated-functions.mdx`
- `docs/src/testers/agents/workflows/test-generated-functions.py`
- `docs/src/features/agents/workflows.mdx` (added warning)

Data flow: `agent.workflow.code()` output → `exec(code.python_script)` in the user's local Python process.

The snippet previously called `exec()` directly on that output.
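To make the risk concrete, here is a minimal, self-contained sketch of the pattern; the `FakeWorkflowCode` class is a hypothetical stub standing in for the object returned by `agent.workflow.code()`, not the snippet's actual code:

```python
# Hypothetical stand-in for the object returned by agent.workflow.code().
# In the real snippet, python_script is an LLM-generated string whose
# contents are influenced by whatever pages the agent has visited.
class FakeWorkflowCode:
    python_script = 'print("this could be any attacker-steered Python")'

code = FakeWorkflowCode()

# The documented pattern: exec() runs the unreviewed string immediately,
# in the user's own interpreter, with the user's privileges.
exec(code.python_script)
```

Because `exec()` evaluates the string as arbitrary Python, anything the model emits runs with full user privileges the moment the snippet is pasted.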
Anyone following the documented quickstart for "test generated functions" would be running unreviewed LLM-generated code in their own interpreter.
Fix
Replace the `exec()` with a save-to-file step and prompt the user to review before running.
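The save-to-file step can be sketched as follows; the `FakeWorkflowCode` stub, the output file name, and the prompt wording are all illustrative assumptions, not the merged snippet's exact text:

```python
from pathlib import Path

# Hypothetical stand-in for the object returned by agent.workflow.code().
class FakeWorkflowCode:
    python_script = 'print("hello from the generated workflow")'

code = FakeWorkflowCode()

# Write the generated code to disk instead of exec()-ing it directly,
# so the user can read it before deciding to run it.
script_path = Path("generated_workflow.py")
script_path.write_text(code.python_script)

print(f"Saved generated code to {script_path}.")
print(f"Review it, then run it yourself: python {script_path}")
```

The generated code still runs eventually, but only after a deliberate, manual step where the user has had a chance to inspect it.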
A `<Warning>` block is also added to `workflows.mdx` next to the `<ExecuteGeneratedCode />` snippet so the guidance is visible inline with the example.

Why this is exploitable
The workflow code generator's prompts incorporate page content the agent has observed during recording. Indirect prompt injection from any visited site (a comment field, a search result, a fake error message in a screenshot, etc.) can steer the model into emitting Python that does whatever the attacker wants — file exfiltration, a reverse shell, credential theft from `~/.config`, etc. Users following the docs literally would execute that payload in their own shell with their own privileges.

Preconditions:
No additional auth or network position is required beyond the agent visiting attacker-controlled content, which is the normal operating mode.
What I tested
- `grep -R "exec(code" docs/ packages/` — confirmed no remaining instances of the pattern in the repo after the change.
- The `@sniptest show=line` range was updated (6-13 → 6-17) so the rendered snippet still matches the tester file.

Adversarial review
Before submitting, I tried to talk myself out of this. Possible counter-arguments: (a) "it's only docs, not shipped code" — true, but the docs are the canonical instruction set users follow, and the snippet is a complete copy-paste recipe; (b) "users should know not to `exec()` LLM output" — many won't, especially when the official example does exactly that; (c) "there's a sandboxing layer somewhere" — there isn't; the snippet runs in the user's local Python process. None of these neutralize the issue, and the fix has zero functional cost — it just defers execution by one manual step.

cc @lewiswigmore
Note

Removes `exec(code.python_script)` from the "test generated functions" documentation snippet and replaces it with a write-to-file pattern. Adds a `<Warning>` admonition in `workflows.mdx` advising users to review LLM-generated code before running it. Updates the `@sniptest show=line` range to match the new snippet length.

Written by Mendral for commit a330e5b.