Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
109 changes: 109 additions & 0 deletions docs/specs/SPEC.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
# PolicyStack Oracle Specification (`policyctl` + `policy_federation` surface)

Status: draft
Owner: spec/policystack-oracle

Scope: CLI command surface and package API under `cli/src/policy_federation/`.
Only source-backed behavior in `cli/src/policy_federation` and `cli/pyproject.toml` is covered.

## Functional Requirements

- **FR-1**: `policyctl resolve` SHALL output a resolved payload containing `policy_hash`, `scope_chain`, `policy`, `source_files`, and `resolved_at` for a given `--harness`/`--domain`.
- **Mapped to**: `cli.py:resolve_command` -> `resolver.py:resolve`; `resolver.py:_policy_layers`
- **TESTABLE**: Run `policyctl resolve` on a fixture repo and assert the JSON includes the listed keys and `policy_hash` is non-empty.

- **FR-2**: `policyctl manifest` SHALL list active policy layer scope names and source paths in merge order.
- **Mapped to**: `cli.py:manifest_command` and `resolver.py:_policy_layers`
- **TESTABLE**: Given layered policy files, run `policyctl manifest` and assert layers are present with deterministic order and expected source paths.

- **FR-3**: `policyctl check` SHALL validate one or all policy files using schema + authorization block validation.
- **Mapped to**: `cli.py:check_command`, `validate.py:validate_policy_file`
- **TESTABLE**: Run `policyctl check` against valid and invalid policy YAML files; valid set exits 0 and invalid set raises `SystemExit` with failure.

- **FR-4**: `policyctl evaluate` SHALL return the policy decision and a decision summary that includes `reason`, matched rules, and metadata for the request.
- **Mapped to**: `cli.py:evaluate_command`, `authorization.py:evaluate_authorization`
- **TESTABLE**: Given a rule that matches `action`/`command`, verify output decision equals the winning rule and includes `matched_rules`.

- **FR-5**: Authorization rule evaluation SHALL apply deterministic precedence: highest priority first, then stricter effect order (deny > ask > allow).
- **Mapped to**: `authorization.py:normalize_authorization_rules`, `authorization.py:evaluate_authorization`, `authorization.AuthorizationRule`
- **TESTABLE**: Provide two rules with same priority and different effects; assert effective decision follows deny over ask over allow.

- **FR-6**: `policyctl compile --target <supported>` SHALL produce target-specific `native_config`, canonical `defaults`, and `shim_rules` for non-native conditions/targets.
- **Mapped to**: `cli.py:compile_command`, `compiler.py:compile_target`, `compiler.SUPPORTED_TARGETS`
- **TESTABLE**: Compile a policy with conditional/runtime-only rules for each supported target and assert shim entries are generated as required.

- **FR-7**: Delegation shall attempt local-fast evaluation first, then cache lookup, then harness fallback chain, and return `allow|deny|ask` with confidence.
- **Mapped to**: `delegate.py:delegate_ask`, `delegate.py:_local_fast_evaluate`, `delegate.py:_get_cached_decision`, `delegate.py:_invoke_harness`, `delegate.HARNESS_FALLBACK`
- **TESTABLE**: Mock a local-fast allow path and a fallback path; assert returned `DelegateResult` contains expected `decision`, `source`, and `confidence`.

- **FR-8**: `policyctl intercept|write-check|network-check` and review modes SHALL map policy decisions to normalized exit behavior (`allow`/`deny`/`ask`) using `ALLOW_EXIT_CODE`, `DENY_EXIT_CODE`, `ASK_EXIT_CODE`.
- **Mapped to**: `interceptor.py:intercept_command`, `interceptor.py:ALLOW_EXIT_CODE`, `interceptor.py:DENY_EXIT_CODE`, `interceptor.py:ASK_EXIT_CODE`, `interceptor.py:_sources_hash`
- **TESTABLE**: Evaluate deny/allow/ask outcomes across modes and verify returned `exit_code` and `final_decision`.

- **FR-9**: `policyctl exec` SHALL verify policy integrity immediately before executing subprocess and deny execution on source hash change.
- **Mapped to**: `interceptor.py:run_guarded_subprocess`, `interceptor.py:intercept_command`, `resolver.py:hash_policy_sources`
- **TESTABLE**: Simulate `sources_hash` drift between pre-check and execution and verify subprocess is blocked with `DENY_EXIT_CODE`.

- **FR-10**: Audit workflows (`audit`, `verify`) shall support filtering by action/decision/time/actor and optional chain validation.
- **Mapped to**: `cli.py:audit_command`, `_parse_iso_datetime`, `_compute_audit_summary`, `runtime_artifacts.py:filter_audit_events`, `runtime_artifacts.py:verify_audit_chain`
- **TESTABLE**: Use a fixture audit log with mixed actions and decisions and assert filters and `verify_audit_chain` report expected counts/validity.

- **FR-11**: `policyctl add-rule` and `policyctl remove-rule` SHALL safely mutate policy YAML while enforcing duplicate/absence constraints.
- **Mapped to**: `cli.py:add_rule_command`, `cli.py:remove_rule_command`, `policy_editor.py:add_rule`, `policy_editor.py:remove_rule`
- **TESTABLE**: Add a unique rule then attempt duplicate add and non-existent remove; assert success and validation failures are represented in errors.

- **FR-12**: `policyctl diff` SHALL report added, removed, modified, and effect changes when comparing two policies.
- **Mapped to**: `cli.py:diff_command`, `policy_federation.policy_diff.diff_policies`
- **TESTABLE**: Compare two fixture policies with all four diff types and assert each output bucket is populated accordingly.

- **FR-13**: `policyctl verify` SHALL create `.policy-federation/verify` baseline on first run and fail with `tampered` if subsequent hash differs.
- **Mapped to**: `cli.py:verify_command`, `resolver.py:hash_policy_sources`
- **TESTABLE**: Run verify twice with unchanged files (status ok) and once after file touch (status tampered).

- **FR-14**: Runtime integration install/uninstall commands SHALL install/remove launcher wrappers and return structured before/after payloads.
- **Mapped to**: `cli.py:install_runtime_command`, `cli.py:uninstall_runtime_command`, `integrations.py:install_runtime_integrations`, `integrations.py:uninstall_runtime_integrations`
- **TESTABLE**: Invoke install then uninstall on temp HOME and assert payload includes wrapper and backup/restore metadata.

## Non-Functional Requirements

- **NFR-1**: JSON emission from CLI entrypoints SHALL be deterministic for the same inputs.
- **Mapped to**: `cli.py:_emit_json`, `resolve.py:resolve` ordering via `scope_chain`, `resolver.py:hash_policy_sources`
- **TESTABLE**: Run identical command twice against same fixtures and compare stable output JSON semantics including sorted keys.

- **NFR-2**: Delegation and API fallback paths SHALL bound runtime via explicit timeouts and fail-safe `ask` behavior on unresponsive or missing tools.
- **Mapped to**: `delegate.py:_invoke_harness`, `delegate.py:_invoke_api_harness`, `delegate.HARNESS_CONFIG`
- **TESTABLE**: Configure harness path/URL unavailable and assert returned result is `ask` and does not hang indefinitely.

- **NFR-3**: Audit event handling SHALL keep chain integrity checks non-crashing and report invalid entries with explicit invalid-event context.
- **Mapped to**: `runtime_artifacts.py:verify_audit_chain`, `runtime_artifacts.py:filter_audit_events`
- **TESTABLE**: Feed malformed events missing required fields and assert `valid=False` with details.

- **NFR-4**: Delegation cache shall support bounded TTL behavior and explicit clearability, and not crash policy enforcement when unavailable.
- **Mapped to**: `delegate.py:_get_cached_decision`, `delegate.py:_cache_decision`, `delegate.py:clear_cache`, `delegate.py:get_cache_stats`
- **TESTABLE**: Write and invalidate cache path entries and validate hit/miss behavior and clear_cache success boolean.

- **NFR-5**: Decision artifacts for execution failures (denial/tamper/path) SHALL include source action/rules context to support post-hoc investigation.
- **Mapped to**: `runtime_artifacts.py:build_permission_audit_event`, `runtime_artifacts.py:record_audit_event`, `interceptor.py:intercept_command`, `interceptor.py:run_guarded_subprocess`
- **TESTABLE**: Force a denied execution and assert audit payload contains `request.action`, `final_decision`, and `scope_chain`.

- **NFR-6**: Runtime context helpers SHALL avoid throwing on empty input and infer repository context from common worktree path patterns.
- **Mapped to**: `runtime_context.py:infer_repo_name_from_cwd`
- **TESTABLE**: Call with worktree and non-worktree paths and assert non-empty, deterministic repo inference.

## Acceptance Artifacts

- Pending Gherkin files: one per FR
- `docs/specs/acceptance/FR-1-resolve.command.feature`
- `docs/specs/acceptance/FR-2-manifest.command.feature`
- `docs/specs/acceptance/FR-3-check.command.feature`
- `docs/specs/acceptance/FR-4-evaluate.command.feature`
- `docs/specs/acceptance/FR-5-rule-precedence.command.feature`
- `docs/specs/acceptance/FR-6-compile.command.feature`
- `docs/specs/acceptance/FR-7-delegate.command.feature`
- `docs/specs/acceptance/FR-8-intercept-exit.command.feature`
- `docs/specs/acceptance/FR-9-exec-tamper.command.feature`
- `docs/specs/acceptance/FR-10-audit.command.feature`
- `docs/specs/acceptance/FR-11-policy-edit.command.feature`
- `docs/specs/acceptance/FR-12-diff.command.feature`
- `docs/specs/acceptance/FR-13-verify.command.feature`
- `docs/specs/acceptance/FR-14-runtime-install.command.feature`
7 changes: 7 additions & 0 deletions docs/specs/acceptance/FR-1-resolve.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
@pending
Feature: FR-1 Resolve policy contract
Scenario: Pending -- resolve command payload
Given a repository fixture with layered policy files
When I run `policyctl resolve --harness forge --domain devops`
Then I should receive JSON with keys policy_hash, scope_chain, policy, source_files, resolved_at
And final decision should be emitted via JSON mode

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The scenario asserts that policyctl resolve emits a final decision, but the resolve command only returns resolution metadata (policy hash, scope chain, merged policy, source files, timestamp, etc.) and does not compute authorization decisions. This creates an API contract mismatch and will make the acceptance test fail once implemented. Update this step to assert only resolve output fields, or switch to an evaluate/intercept command if decision output is required. [api mismatch]

Severity Level: Major ⚠️
- ❌ FR-1 resolve acceptance test fails against CLI behavior.
- ⚠️ Confuses resolve versus evaluate responsibilities in SPEC.
- ⚠️ Misleads consumers expecting decisions from resolve output.
Steps of Reproduction ✅
1. Inspect the acceptance spec at
`docs/specs/acceptance/FR-1-resolve.command.feature:3-7`, where the scenario for FR-1
includes the step on line 7: `And final decision should be emitted via JSON mode`,
implying that `policyctl resolve` returns a decision.

2. Open the oracle specification at `docs/specs/SPEC.md:11-25`, which defines FR-1 as
"`policyctl resolve` SHALL output a resolved payload containing `policy_hash`,
`scope_chain`, `policy`, `source_files`, and `resolved_at`" and separately defines FR-4 as
"`policyctl evaluate` SHALL return the policy decision and a decision summary", showing
that decisions belong to `evaluate`, not `resolve`.

3. Examine the CLI implementation at `cli/src/policy_federation/cli.py:10-20` where
`resolve_command(args: argparse.Namespace)` calls `resolver.resolve(...)` and then
`_emit_json(result)`; this command's JSON output is exactly the dictionary returned by
`resolver.resolve`.

4. Inspect `cli/src/policy_federation/resolver.py:4-80`, where `resolve(...)` returns a
payload containing `policy_hash`, `scope_chain`, `policy`, `authorization_summary`,
`policy_contract_versions`, `contract_count`, `conflicts`, `resolved_at`, `source_files`,
and `extensions`, but no `decision` or `final_decision` field; thus, when the FR-1
acceptance scenario is implemented and run against `policyctl resolve --harness forge
--domain devops`, the step asserting a final decision in JSON will fail because the
command never produces authorization decisions.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-1-resolve.command.feature
**Line:** 7:7
**Comment:**
	*Api Mismatch: The scenario asserts that `policyctl resolve` emits a final decision, but the resolve command only returns resolution metadata (policy hash, scope chain, merged policy, source files, timestamp, etc.) and does not compute authorization decisions. This creates an API contract mismatch and will make the acceptance test fail once implemented. Update this step to assert only resolve output fields, or switch to an evaluate/intercept command if decision output is required.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

7 changes: 7 additions & 0 deletions docs/specs/acceptance/FR-10-audit.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
@pending
Feature: FR-10 Audit filtering and verification
Scenario: Pending -- audit query and chain check
Given an audit log with mixed actions and decisions
When I run `policyctl audit --log-path <fixture> --summary --since 2026-01-01T00:00:00Z --verify-chain`
Then summary fields total/by_decision/by_action should be present
And verify-chain result should indicate chain validity
8 changes: 8 additions & 0 deletions docs/specs/acceptance/FR-11-policy-edit.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
@pending
Feature: FR-11 Policy edit commands

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Replace the shorthand requirement reference with a full FR-XXX-NNN trace (or add an explicit FR trace tag) so this acceptance test complies with the repository FR traceability format. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The repository rule requires FR-XXX-NNN references in test names, markers, tags, or docstrings. This feature file only uses the shorthand FR-11, which does not match the required FR trace format, so the suggestion identifies a real violation.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-11-policy-edit.command.feature
**Line:** 2:2
**Comment:**
	*Custom Rule: Replace the shorthand requirement reference with a full `FR-XXX-NNN` trace (or add an explicit FR trace tag) so this acceptance test complies with the repository FR traceability format.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Scenario: Pending -- add and remove policy rules
Given a writable policy fixture
When I run `policyctl add-rule --file policies/test.yaml --id test-allow --effect allow --priority 5 --actions exec`
Then the rule should be persisted to the file
When I run `policyctl remove-rule --file policies/test.yaml --id test-allow`
Then the rule should be removed from the file
6 changes: 6 additions & 0 deletions docs/specs/acceptance/FR-12-diff.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
@pending
Feature: FR-12 Policy diff command
Scenario: Pending -- compare pre/post policy snapshots
Given two policy files with add/remove/modify/effect changes
When I run `policyctl diff before.yaml after.yaml`
Then output should include added_rules, removed_rules, modified_rules, and effect_changes
8 changes: 8 additions & 0 deletions docs/specs/acceptance/FR-13-verify.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
@pending
Feature: FR-13 Verify baseline hash
Scenario: Pending -- baseline tamper detection
Given a policies directory with existing policy files
When I run `policyctl verify`
Then status should become baseline-recorded
And a second run without changes should return ok
And after touching a file should return tampered and non-zero
8 changes: 8 additions & 0 deletions docs/specs/acceptance/FR-14-runtime-install.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
@pending
Feature: FR-14 Runtime integration install/uninstall
Scenario: Pending -- install and uninstall runtime wrappers
Given a temporary HOME directory
When I run `policyctl install-runtime`
Then install output should report installed wrappers and patched files
When I run `policyctl uninstall-runtime`
Then uninstall output should report removed wrappers and launcher restore state
6 changes: 6 additions & 0 deletions docs/specs/acceptance/FR-2-manifest.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
@pending
Feature: FR-2 Manifest layer discovery

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Replace the non-canonical FR reference with at least one traceable FR-XXX-NNN ID (for example in the feature title or a trace tag) so this acceptance test satisfies repository FR traceability rules. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

This acceptance test file is intended to satisfy FR traceability rules, but the only requirement reference in the feature title is FR-2, which is not in the required FR-XXX-NNN style mentioned by the repository rule. That makes the suggestion a real rule violation.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-2-manifest.command.feature
**Line:** 2:2
**Comment:**
	*Custom Rule: Replace the non-canonical FR reference with at least one traceable `FR-XXX-NNN` ID (for example in the feature title or a trace tag) so this acceptance test satisfies repository FR traceability rules.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Scenario: Pending -- manifest command lists layers
Given a repository fixture with harness/domain policy layers
When I run `policyctl manifest --harness forge --domain devops`
Then the output should include ordered layer scope names and source file paths
6 changes: 6 additions & 0 deletions docs/specs/acceptance/FR-3-check.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
@pending
Feature: FR-3 Check policy files
Scenario: Pending -- policy check validates policy files
Given a valid policy fixture and an invalid policy fixture
When I run `policyctl check`
Then I should see successful result for the valid file and an error for invalid inputs
7 changes: 7 additions & 0 deletions docs/specs/acceptance/FR-4-evaluate.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
@pending
Feature: FR-4 Evaluate authorization
Scenario: Pending -- evaluate returns matched rule reasoning
Given a policy with a matching authorization rule for a command
When I run `policyctl evaluate --harness forge --domain devops --action exec --command "git status"`
Then JSON output should include decision and winning rule metadata
And reason should reference matched rule
7 changes: 7 additions & 0 deletions docs/specs/acceptance/FR-5-rule-precedence.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
@pending
Feature: FR-5 Rule precedence
Scenario: Pending -- deny precedence over lower effect
Given two authorization rules with the same priority but different effects
When I run `policyctl evaluate --harness forge --domain devops --action exec --command "rm -rf /tmp/test"`
Then deny should win when a tie for priority exists
And output should include ordered matching rules
7 changes: 7 additions & 0 deletions docs/specs/acceptance/FR-6-compile.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
@pending
Feature: FR-6 Compile policy to target
Scenario: Pending -- compile creates native config and shim rules
Given a policy with conditional and runtime-only rules
When I run `policyctl compile --target claude-code --harness forge --domain devops`
Then output should include target, defaults, native_config, and shim_rules fields
And shim_rules should include non-native-rule entries
6 changes: 6 additions & 0 deletions docs/specs/acceptance/FR-7-delegate.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
@pending
Feature: FR-7 Delegation pipeline

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Update the feature title to include a repository-compliant FR trace ID in the FR-XXX-NNN format so the acceptance test is traceable by the quality gate. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The repository rule requires FR references in the FR-XXX-NNN format for test traceability.
This feature title only contains FR-7, which does not match the required pattern, so the violation is real.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-7-delegate.command.feature
**Line:** 2:2
**Comment:**
	*Custom Rule: Update the feature title to include a repository-compliant FR trace ID in the `FR-XXX-NNN` format so the acceptance test is traceable by the quality gate.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Scenario: Pending -- delegate asks with local fast path and cache

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add an FR trace reference to the scenario name (or an equivalent scenario-level trace marker) using the FR-XXX-NNN format so the new test case has explicit requirement linkage. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The rule requires new test cases to include an FR trace marker, requirement annotation, docstring trace, or FR-based test name.
This scenario name has no FR-XXX-NNN reference or equivalent trace marker, so the suggestion identifies a real violation.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-7-delegate.command.feature
**Line:** 3:3
**Comment:**
	*Custom Rule: Add an FR trace reference to the scenario name (or an equivalent scenario-level trace marker) using the `FR-XXX-NNN` format so the new test case has explicit requirement linkage.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Given a command that can be decided by local-fast
When I run `policyctl intercept --harness forge --domain devops --action write --command "echo hi" --ask-mode delegate`

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The command uses an unsupported --ask-mode value (delegate), so argparse will reject the invocation before the scenario can test delegation behavior. Use a valid ask mode (from the CLI choices) and model delegation behavior via the scenario setup/assertions instead of an invalid flag value. [api mismatch]

Severity Level: Major ⚠️
- ⚠️ FR-7 delegation acceptance spec cannot run as written.
- ⚠️ Delegation behavior untested due to CLI parse failure.
Steps of Reproduction ✅
1. Inspect the FR-7 acceptance spec at
`docs/specs/acceptance/FR-7-delegate.command.feature:5`, which defines the step `When I
run \`policyctl intercept --harness forge --domain devops --action write --command "echo
hi" --ask-mode delegate\``.

2. Open the CLI definition at `cli/src/policy_federation/cli.py:651-668`, where
`intercept_parser = sub.add_parser("intercept")` and
`intercept_parser.add_argument("--ask-mode", choices=["fail", "allow", "prompt"],
default="fail")` define the allowed `--ask-mode` values.

3. Run the exact command from the FR-7 spec (either manually from a shell or via any
future Cucumber step harness that shells out to `policyctl`): `policyctl intercept
--harness forge --domain devops --action write --command "echo hi" --ask-mode delegate`.

4. Observe that `argparse` in `cli.py` rejects `delegate` as an invalid `--ask-mode`
choice (since only `fail`, `allow`, and `prompt` are permitted), raising `SystemExit`
before `intercept_command_cli` is invoked, so the FR-7 scenario never exercises delegation
behavior or asserts a deterministic delegate result.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-7-delegate.command.feature
**Line:** 5:5
**Comment:**
	*Api Mismatch: The command uses an unsupported `--ask-mode` value (`delegate`), so argparse will reject the invocation before the scenario can test delegation behavior. Use a valid ask mode (from the CLI choices) and model delegation behavior via the scenario setup/assertions instead of an invalid flag value.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Then delegate result should be deterministic and include decision/confidence
6 changes: 6 additions & 0 deletions docs/specs/acceptance/FR-8-intercept-exit.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
@pending
Feature: FR-8 Intercept exit mapping
Scenario: Pending -- allow/deny/ask mapping
Given policies producing allow and deny and ask outcomes
When I run `policyctl intercept` with each outcome scenario

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This invocation is missing required intercept arguments (--harness, --domain, --action, and --command), so it cannot run as written and will fail argument validation instead of exercising exit-code mapping. Provide a fully valid command per outcome case. [api mismatch]

Severity Level: Major ⚠️
- ⚠️ FR-8 intercept exit-code mapping spec fails early.
- ⚠️ Normalized exit-code behavior remains unvalidated.
Steps of Reproduction ✅
1. Inspect the FR-8 acceptance spec at
`docs/specs/acceptance/FR-8-intercept-exit.command.feature:5`, which defines the step
`When I run \`policyctl intercept\` with each outcome scenario` without any additional
flags.

2. Open the intercept CLI parser in `cli/src/policy_federation/cli.py:651-668`, where
`intercept_parser.add_argument("--harness", required=True)`, `--domain`, `--action`, and
`--command` are all marked `required=True`.

3. Execute `policyctl intercept` exactly as written in the FR-8 spec (either manually or
via a Cucumber step that shells out to `policyctl`), so the process runs with no
`--harness`, `--domain`, `--action`, or `--command` arguments.

4. Observe that `argparse` reports missing required arguments (`--harness`, `--domain`,
`--action`, `--command`) and exits with `SystemExit` before the intercept implementation
(`intercept_command` mapped from `docs/specs/SPEC.md` FR-8 to
`interceptor.py:intercept_command`) can run, so the exit-code mapping behavior under test
is never exercised.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-8-intercept-exit.command.feature
**Line:** 5:5
**Comment:**
	*Api Mismatch: This invocation is missing required `intercept` arguments (`--harness`, `--domain`, `--action`, and `--command`), so it cannot run as written and will fail argument validation instead of exercising exit-code mapping. Provide a fully valid command per outcome case.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Then exit_code should map to 0 for allow, 2 for deny, and 3 for ask
8 changes: 8 additions & 0 deletions docs/specs/acceptance/FR-9-exec-tamper.command.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
@pending
Feature: FR-9 Exec TOCTOU protection

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Update this test artifact to include a valid FR-XXX-NNN trace reference (for example in the feature title or an explicit trace tag) so it satisfies the repository's FR traceability requirement. [custom_rule]

Severity Level: Minor ⚠️

Why it matters? 🤔

The repository rule requires FR-XXX-NNN references in test names, markers, tags, or docstrings. This feature file only contains FR-9, which is not the required trace format, so the suggestion correctly identifies a real traceability violation.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-9-exec-tamper.command.feature
**Line:** 2:2
**Comment:**
	*Custom Rule: Update this test artifact to include a valid `FR-XXX-NNN` trace reference (for example in the feature title or an explicit trace tag) so it satisfies the repository's FR traceability requirement.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Scenario: Pending -- deny when policy hash changes before execution
Given a command that is initially allowed
And policy files change after pre-exec policy resolution
When I run `policyctl exec --harness forge --domain devops -- ask -- echo ok`

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The exec command syntax is malformed: -- ask -- echo ok makes ask the subprocess executable after the -- separator, so the scenario does not execute the intended command and can fail with command-not-found instead of validating TOCTOU denial logic. Use the proper exec ... -- <command> form (and pass ask mode via --ask-mode if needed). [api mismatch]

Severity Level: Major ⚠️
- ⚠️ FR-9 exec TOCTOU acceptance spec mis-specifies argv.
- ⚠️ TOCTOU denial behavior may not be exercised correctly.
Steps of Reproduction ✅
1. Inspect the FR-9 acceptance spec at
`docs/specs/acceptance/FR-9-exec-tamper.command.feature:6`, which defines the step `When I
run \`policyctl exec --harness forge --domain devops -- ask -- echo ok\`` to exercise
TOCTOU protection.

2. Open the exec subcommand parser in `cli/src/policy_federation/cli.py:670-688`, where
`exec_parser.add_argument("argv", nargs=argparse.REMAINDER)` captures all tokens after
`--` into `args.argv`, and open `exec_command` at
`cli/src/policy_federation/cli.py:160-173`, which passes `argv=list(args.argv)` directly
to `run_guarded_subprocess`.

3. Execute the FR-9 command literally (e.g., from a shell): `policyctl exec --harness
forge --domain devops -- ask -- echo ok`, so that after `argparse` sees the `--`
separator, `args.argv` becomes `["ask", "--", "echo", "ok"]`.

4. Observe that `exec_command` attempts to run a subprocess with executable `ask` rather
than `echo`, which will typically fail with an OS-level "No such file or directory: 'ask'"
or run an unintended binary, meaning the scenario either errors before TOCTOU denial logic
is evaluated or exercises the wrong command, and therefore does not faithfully test FR-9's
requirement in `docs/specs/SPEC.md:43-45` to deny execution when the policy hash changes.

Fix in Cursor Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docs/specs/acceptance/FR-9-exec-tamper.command.feature
**Line:** 6:6
**Comment:**
	*Api Mismatch: The exec command syntax is malformed: `-- ask -- echo ok` makes `ask` the subprocess executable after the `--` separator, so the scenario does not execute the intended command and can fail with command-not-found instead of validating TOCTOU denial logic. Use the proper `exec ... -- <command>` form (and pass ask mode via `--ask-mode` if needed).

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Then final_decision should be deny
And execution should not be attempted
Loading