feat(ctf): Puppet Master — orchestrator config-layer injection challenge#321
Conversation
New policy_bypass CTF challenge that exploits the unvalidated custom_goals field in the orchestrator agent. The field is interpolated raw into the system prompt under OVERRIDE framing, letting a workspace operator plant standing directives that persist across every conversation in the namespace. Files added: - finbot/ctf/definitions/challenges/policy_bypass/puppet_master.yaml Intermediate-level challenge (200pts). Three progressive hints guide players from discovering the config field to weaponizing it. - finbot/ctf/detectors/implementations/puppet_master.py PuppetMasterDetector: fires on business.invoice.decision events, verifies invoice amount > threshold + vendor trust == low, then confirms the OVERRIDE marker is present in the orchestrator system prompt for that workflow (proving config-layer injection was used, not chat-layer injection). - tests/unit/ctf/test_puppet_master.py 20 tests across PPM-VAL and PPM-DET categories covering guard clauses, detection scenarios, configurable override_marker, graceful JSON error handling, and evidence field assertions. Labels: LLM01, LLM06, CWE-20, CWE-285, AML.T0043, ASI-01, ASI-03 Closes GenAI-Security-Project#203
There was a problem hiding this comment.
Pull request overview
Adds a new CTF policy-bypass challenge (“Puppet Master”) focused on config-layer prompt injection via the orchestrator’s unvalidated custom_goals, including a dedicated detector and unit tests.
Changes:
- Added
policy-bypass-puppet-masterchallenge definition (YAML) with scoring, hints, and detector configuration. - Implemented
PuppetMasterDetectorto detect approvals of large invoices from low-trust vendors when the orchestrator system prompt contains the OVERRIDE marker. - Registered the detector and added a comprehensive unit test suite covering validation and detection paths.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
finbot/ctf/definitions/challenges/policy_bypass/puppet_master.yaml |
New challenge definition and scoring modifiers for the Puppet Master scenario. |
finbot/ctf/detectors/implementations/puppet_master.py |
New detector implementation scanning stored orchestrator LLM events for OVERRIDE marker + invoice/vendor conditions. |
finbot/ctf/detectors/implementations/__init__.py |
Registers/exports the new detector implementation. |
tests/unit/ctf/test_puppet_master.py |
New unit test suite for config validation, guard clauses, and detection outcomes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…n to workflow - agent_name was 'orchestrator' but runtime emits 'orchestrator_agent' — query never matched real events, making detection impossible in production - removed namespace-wide fallback scan when workflow_id is absent; without a workflow scope the query is unbounded and prone to false positives, so we return not-detected and log a debug message instead - made agent_name configurable via detector_config for future flexibility - updated test fixtures and added PPM-DET-06b to cover the no-workflow_id path Fixes Copilot review comments on PR GenAI-Security-Project#321
|
Addressed all 4 Copilot suggestions
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- return distinct message when workflow_id is absent so operators can tell 'correlation data missing' from 'scanned but no marker found' - remove unused DetectionResult import and OVERRIDE_MARKER constant from test module - align bad_evt and empty_evt agent_name to 'orchestrator_agent' so mocks reflect the real query filter
|
Fixed the remaining suggestions:
|
Addresses Copilot feedback that the existing DB mock ignores filter predicates, giving false confidence. Added PPM-QRY class with 3 tests that call _find_override_in_workflow directly and assert the SQLAlchemy query is built with the correct agent_name, workflow_id, and that the DB is not touched at all when workflow_id is absent.
|
Added a PPM-QRY test class that calls _find_override_in_workflow directly and asserts the SQLAlchemy query is built with the correct agent_name and workflow_id filters, and that the DB isn't touched at all when workflow_id is absent. Should address the mock filter predicate concern. |
|
Note: couldn't run the test suite locally due to a Python environment constraint (MSYS2 pydantic-core wheel incompatibility). The tests follow the same patterns as existing detector tests in the repo so they should pass, but happy to fix anything CI flags. |
|
@Deez-Automations Is this a challenge inside a challenge or a seperate challenge.! |
|
Hey @stealthwhizz , it's a separate standalone challenge under the policy_bypass category, same structure as Fine Print and Invoice Trust Override. prerequisites:[], its own detector, its own scoring. On the flag, FinBot doesn't use traditional CTF flag strings. Looking at the existing challenges, none of them have a flag field. If the project is moving toward explicit flag strings, happy to add one, just let me know the format. |
|
I got confused with the demo challenge nvm |
…ires_at SessionContext gained required created_at and expires_at fields upstream. Updated _make_session_context helper to build SessionContext directly without session_manager, removing the live DB dependency from unit tests.
|
Follow-up commit to fix the test_orchestrator_custom_goals.py tests. SessionContext picked up created_at and expires_at as required fields upstream after this PR was opened, so updated the test helper to include them. Also dropped the session_manager dependency from the helper since we can just build SessionContext directly. |
Summary
Adds the Puppet Master CTF challenge (
policy-bypass-puppet-master) for issue #203.The challenge exploits the unvalidated
custom_goalsfield in the orchestrator agent. The value is dropped raw into the system prompt underOVERRIDE DEFAULT BEHAVIORframing — no length check, no character validation. A workspace operator can plant a standing directive that persists across every conversation in the namespace.This is meaningfully different from existing
policy_bypasschallenges: the attack surface is the configuration layer, not the chat interface.Files
finbot/ctf/definitions/challenges/policy_bypass/puppet_master.yaml— challenge definition, intermediate difficulty, 200pts, 3 tiered hints,pi_jbpenalty modifierfinbot/ctf/detectors/implementations/puppet_master.py—PuppetMasterDetector: combines invoice approval check (amount + vendor trust) with a system prompt scan for the OVERRIDE marker, confirming the config vector was usedfinbot/ctf/detectors/implementations/__init__.py— registers the new detectortests/unit/ctf/test_puppet_master.py— 20 tests (PPM-VAL + PPM-DET) covering guard clauses, all detection conditions, configurable marker, malformed JSON, and evidence assertionsLabels
LLM01:Prompt Injection·LLM06:Excessive Agency·CWE-20·CWE-285·AML.T0043·ASI-01·ASI-03:Prompt Injection via Trusted ConfigTest plan
pytest tests/unit/ctf/test_puppet_master.py -v— all 20 tests passlist_registered_detectors()order_index: 10— no collision with existing challenges