-
Notifications
You must be signed in to change notification settings - Fork 0
spec: FR/NFR oracle + acceptance test skeletons #74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,109 @@ | ||
| # PolicyStack Oracle Specification (`policyctl` + `policy_federation` surface) | ||
|
|
||
| Status: draft | ||
| Owner: spec/policystack-oracle | ||
|
|
||
| Scope: CLI command surface and package API under `cli/src/policy_federation/`. | ||
| Only source-backed behavior in `cli/src/policy_federation` and `cli/pyproject.toml` is covered. | ||
|
|
||
| ## Functional Requirements | ||
|
|
||
| - **FR-1**: `policyctl resolve` SHALL output a resolved payload containing `policy_hash`, `scope_chain`, `policy`, `source_files`, and `resolved_at` for a given `--harness`/`--domain`. | ||
| - **Mapped to**: `cli.py:resolve_command` -> `resolver.py:resolve`; `resolver.py:_policy_layers` | ||
| - **TESTABLE**: Run `policyctl resolve` on a fixture repo and assert the JSON includes the listed keys and `policy_hash` is non-empty. | ||
|
|
||
| - **FR-2**: `policyctl manifest` SHALL list active policy layer scope names and source paths in merge order. | ||
| - **Mapped to**: `cli.py:manifest_command` and `resolver.py:_policy_layers` | ||
| - **TESTABLE**: Given layered policy files, run `policyctl manifest` and assert layers are present with deterministic order and expected source paths. | ||
|
|
||
| - **FR-3**: `policyctl check` SHALL validate one or all policy files using schema + authorization block validation. | ||
| - **Mapped to**: `cli.py:check_command`, `validate.py:validate_policy_file` | ||
| - **TESTABLE**: Run `policyctl check` against valid and invalid policy YAML files; valid set exits 0 and invalid set raises `SystemExit` with failure. | ||
|
|
||
| - **FR-4**: `policyctl evaluate` SHALL return the policy decision and a decision summary that includes `reason`, matched rules, and metadata for the request. | ||
| - **Mapped to**: `cli.py:evaluate_command`, `authorization.py:evaluate_authorization` | ||
| - **TESTABLE**: Given a rule that matches `action`/`command`, verify output decision equals the winning rule and includes `matched_rules`. | ||
|
|
||
| - **FR-5**: Authorization rule evaluation SHALL apply deterministic precedence: highest priority first, then stricter effect order (deny > ask > allow). | ||
| - **Mapped to**: `authorization.py:normalize_authorization_rules`, `authorization.py:evaluate_authorization`, `authorization.AuthorizationRule` | ||
| - **TESTABLE**: Provide two rules with same priority and different effects; assert effective decision follows deny over ask over allow. | ||
|
|
||
| - **FR-6**: `policyctl compile --target <supported>` SHALL produce target-specific `native_config`, canonical `defaults`, and `shim_rules` for non-native conditions/targets. | ||
| - **Mapped to**: `cli.py:compile_command`, `compiler.py:compile_target`, `compiler.SUPPORTED_TARGETS` | ||
| - **TESTABLE**: Compile a policy with conditional/runtime-only rules for each supported target and assert shim entries are generated as required. | ||
|
|
||
| - **FR-7**: Delegation shall attempt local-fast evaluation first, then cache lookup, then harness fallback chain, and return `allow|deny|ask` with confidence. | ||
| - **Mapped to**: `delegate.py:delegate_ask`, `delegate.py:_local_fast_evaluate`, `delegate.py:_get_cached_decision`, `delegate.py:_invoke_harness`, `delegate.HARNESS_FALLBACK` | ||
| - **TESTABLE**: Mock a local-fast allow path and a fallback path; assert returned `DelegateResult` contains expected `decision`, `source`, and `confidence`. | ||
|
|
||
| - **FR-8**: `policyctl intercept|write-check|network-check` and review modes SHALL map policy decisions to normalized exit behavior (`allow`/`deny`/`ask`) using `ALLOW_EXIT_CODE`, `DENY_EXIT_CODE`, `ASK_EXIT_CODE`. | ||
| - **Mapped to**: `interceptor.py:intercept_command`, `interceptor.py:ALLOW_EXIT_CODE`, `interceptor.py:DENY_EXIT_CODE`, `interceptor.py:ASK_EXIT_CODE`, `interceptor.py:_sources_hash` | ||
| - **TESTABLE**: Evaluate deny/allow/ask outcomes across modes and verify returned `exit_code` and `final_decision`. | ||
|
|
||
| - **FR-9**: `policyctl exec` SHALL verify policy integrity immediately before executing subprocess and deny execution on source hash change. | ||
| - **Mapped to**: `interceptor.py:run_guarded_subprocess`, `interceptor.py:intercept_command`, `resolver.py:hash_policy_sources` | ||
| - **TESTABLE**: Simulate `sources_hash` drift between pre-check and execution and verify subprocess is blocked with `DENY_EXIT_CODE`. | ||
|
|
||
| - **FR-10**: Audit workflows (`audit`, `verify`) shall support filtering by action/decision/time/actor and optional chain validation. | ||
| - **Mapped to**: `cli.py:audit_command`, `_parse_iso_datetime`, `_compute_audit_summary`, `runtime_artifacts.py:filter_audit_events`, `runtime_artifacts.py:verify_audit_chain` | ||
| - **TESTABLE**: Use a fixture audit log with mixed actions and decisions and assert filters and `verify_audit_chain` report expected counts/validity. | ||
|
|
||
| - **FR-11**: `policyctl add-rule` and `policyctl remove-rule` SHALL safely mutate policy YAML while enforcing duplicate/absence constraints. | ||
| - **Mapped to**: `cli.py:add_rule_command`, `cli.py:remove_rule_command`, `policy_editor.py:add_rule`, `policy_editor.py:remove_rule` | ||
| - **TESTABLE**: Add a unique rule then attempt duplicate add and non-existent remove; assert success and validation failures are represented in errors. | ||
|
|
||
| - **FR-12**: `policyctl diff` SHALL report added, removed, modified, and effect changes when comparing two policies. | ||
| - **Mapped to**: `cli.py:diff_command`, `policy_federation.policy_diff.diff_policies` | ||
| - **TESTABLE**: Compare two fixture policies with all four diff types and assert each output bucket is populated accordingly. | ||
|
|
||
| - **FR-13**: `policyctl verify` SHALL create `.policy-federation/verify` baseline on first run and fail with `tampered` if subsequent hash differs. | ||
| - **Mapped to**: `cli.py:verify_command`, `resolver.py:hash_policy_sources` | ||
| - **TESTABLE**: Run verify twice with unchanged files (status ok) and once after file touch (status tampered). | ||
|
|
||
| - **FR-14**: Runtime integration install/uninstall commands SHALL install/remove launcher wrappers and return structured before/after payloads. | ||
| - **Mapped to**: `cli.py:install_runtime_command`, `cli.py:uninstall_runtime_command`, `integrations.py:install_runtime_integrations`, `integrations.py:uninstall_runtime_integrations` | ||
| - **TESTABLE**: Invoke install then uninstall on temp HOME and assert payload includes wrapper and backup/restore metadata. | ||
|
|
||
| ## Non-Functional Requirements | ||
|
|
||
| - **NFR-1**: JSON emission from CLI entrypoints SHALL be deterministic for the same inputs. | ||
| - **Mapped to**: `cli.py:_emit_json`, `resolve.py:resolve` ordering via `scope_chain`, `resolver.py:hash_policy_sources` | ||
| - **TESTABLE**: Run identical command twice against same fixtures and compare stable output JSON semantics including sorted keys. | ||
|
|
||
| - **NFR-2**: Delegation and API fallback paths SHALL bound runtime via explicit timeouts and fail-safe `ask` behavior on unresponsive or missing tools. | ||
| - **Mapped to**: `delegate.py:_invoke_harness`, `delegate.py:_invoke_api_harness`, `delegate.HARNESS_CONFIG` | ||
| - **TESTABLE**: Configure harness path/URL unavailable and assert returned result is `ask` and does not hang indefinitely. | ||
|
|
||
| - **NFR-3**: Audit event handling SHALL keep chain integrity checks non-crashing and report invalid entries with explicit invalid-event context. | ||
| - **Mapped to**: `runtime_artifacts.py:verify_audit_chain`, `runtime_artifacts.py:filter_audit_events` | ||
| - **TESTABLE**: Feed malformed events missing required fields and assert `valid=False` with details. | ||
|
|
||
| - **NFR-4**: Delegation cache shall support bounded TTL behavior and explicit clearability, and not crash policy enforcement when unavailable. | ||
| - **Mapped to**: `delegate.py:_get_cached_decision`, `delegate.py:_cache_decision`, `delegate.py:clear_cache`, `delegate.py:get_cache_stats` | ||
| - **TESTABLE**: Write and invalidate cache path entries and validate hit/miss behavior and clear_cache success boolean. | ||
|
|
||
| - **NFR-5**: Decision artifacts for execution failures (denial/tamper/path) SHALL include source action/rules context to support post-hoc investigation. | ||
| - **Mapped to**: `runtime_artifacts.py:build_permission_audit_event`, `runtime_artifacts.py:record_audit_event`, `interceptor.py:intercept_command`, `interceptor.py:run_guarded_subprocess` | ||
| - **TESTABLE**: Force a denied execution and assert audit payload contains `request.action`, `final_decision`, and `scope_chain`. | ||
|
|
||
| - **NFR-6**: Runtime context helpers SHALL avoid throwing on empty input and infer repository context from common worktree path patterns. | ||
| - **Mapped to**: `runtime_context.py:infer_repo_name_from_cwd` | ||
| - **TESTABLE**: Call with worktree and non-worktree paths and assert non-empty, deterministic repo inference. | ||
|
|
||
| ## Acceptance Artifacts | ||
|
|
||
| - Pending Gherkin files: one per FR | ||
| - `docs/specs/acceptance/FR-1-resolve.command.feature` | ||
| - `docs/specs/acceptance/FR-2-manifest.command.feature` | ||
| - `docs/specs/acceptance/FR-3-check.command.feature` | ||
| - `docs/specs/acceptance/FR-4-evaluate.command.feature` | ||
| - `docs/specs/acceptance/FR-5-rule-precedence.command.feature` | ||
| - `docs/specs/acceptance/FR-6-compile.command.feature` | ||
| - `docs/specs/acceptance/FR-7-delegate.command.feature` | ||
| - `docs/specs/acceptance/FR-8-intercept-exit.command.feature` | ||
| - `docs/specs/acceptance/FR-9-exec-tamper.command.feature` | ||
| - `docs/specs/acceptance/FR-10-audit.command.feature` | ||
| - `docs/specs/acceptance/FR-11-policy-edit.command.feature` | ||
| - `docs/specs/acceptance/FR-12-diff.command.feature` | ||
| - `docs/specs/acceptance/FR-13-verify.command.feature` | ||
| - `docs/specs/acceptance/FR-14-runtime-install.command.feature` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| @pending | ||
| Feature: FR-1 Resolve policy contract | ||
| Scenario: Pending -- resolve command payload | ||
| Given a repository fixture with layered policy files | ||
| When I run `policyctl resolve --harness forge --domain devops` | ||
| Then I should receive JSON with keys policy_hash, scope_chain, policy, source_files, resolved_at | ||
| And final decision should be emitted via JSON mode | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| @pending | ||
| Feature: FR-10 Audit filtering and verification | ||
| Scenario: Pending -- audit query and chain check | ||
| Given an audit log with mixed actions and decisions | ||
| When I run `policyctl audit --log-path <fixture> --summary --since 2026-01-01T00:00:00Z --verify-chain` | ||
| Then summary fields total/by_decision/by_action should be present | ||
| And verify-chain result should indicate chain validity |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| @pending | ||
| Feature: FR-11 Policy edit commands | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Replace the shorthand requirement reference with a full Severity Level: Minor Why it matters? 🤔The repository rule requires FR-XXX-NNN references in test names, markers, tags, or docstrings. This feature file only uses the shorthand (Use Cmd/Ctrl + Click for best experience) Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docs/specs/acceptance/FR-11-policy-edit.command.feature
**Line:** 2:2
**Comment:**
*Custom Rule: Replace the shorthand requirement reference with a full `FR-XXX-NNN` trace (or add an explicit FR trace tag) so this acceptance test complies with the repository FR traceability format.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix |
||
| Scenario: Pending -- add and remove policy rules | ||
| Given a writable policy fixture | ||
| When I run `policyctl add-rule --file policies/test.yaml --id test-allow --effect allow --priority 5 --actions exec` | ||
| Then the rule should be persisted to the file | ||
| When I run `policyctl remove-rule --file policies/test.yaml --id test-allow` | ||
| Then the rule should be removed from the file | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| @pending | ||
| Feature: FR-12 Policy diff command | ||
| Scenario: Pending -- compare pre/post policy snapshots | ||
| Given two policy files with add/remove/modify/effect changes | ||
| When I run `policyctl diff before.yaml after.yaml` | ||
| Then output should include added_rules, removed_rules, modified_rules, and effect_changes |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| @pending | ||
| Feature: FR-13 Verify baseline hash | ||
| Scenario: Pending -- baseline tamper detection | ||
| Given a policies directory with existing policy files | ||
| When I run `policyctl verify` | ||
| Then status should become baseline-recorded | ||
| And a second run without changes should return ok | ||
| And after touching a file should return tampered and non-zero |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| @pending | ||
| Feature: FR-14 Runtime integration install/uninstall | ||
| Scenario: Pending -- install and uninstall runtime wrappers | ||
| Given a temporary HOME directory | ||
| When I run `policyctl install-runtime` | ||
| Then install output should report installed wrappers and patched files | ||
| When I run `policyctl uninstall-runtime` | ||
| Then uninstall output should report removed wrappers and launcher restore state |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| @pending | ||
| Feature: FR-2 Manifest layer discovery | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Replace the non-canonical FR reference with at least one traceable Severity Level: Minor Why it matters? 🤔This acceptance test file is intended to satisfy FR traceability rules, but the only requirement reference in the feature title is (Use Cmd/Ctrl + Click for best experience) Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docs/specs/acceptance/FR-2-manifest.command.feature
**Line:** 2:2
**Comment:**
*Custom Rule: Replace the non-canonical FR reference with at least one traceable `FR-XXX-NNN` ID (for example in the feature title or a trace tag) so this acceptance test satisfies repository FR traceability rules.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix |
||
| Scenario: Pending -- manifest command lists layers | ||
| Given a repository fixture with harness/domain policy layers | ||
| When I run `policyctl manifest --harness forge --domain devops` | ||
| Then the output should include ordered layer scope names and source file paths | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| @pending | ||
| Feature: FR-3 Check policy files | ||
| Scenario: Pending -- policy check validates policy files | ||
| Given a valid policy fixture and an invalid policy fixture | ||
| When I run `policyctl check` | ||
| Then I should see successful result for the valid file and an error for invalid inputs |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| @pending | ||
| Feature: FR-4 Evaluate authorization | ||
| Scenario: Pending -- evaluate returns matched rule reasoning | ||
| Given a policy with a matching authorization rule for a command | ||
| When I run `policyctl evaluate --harness forge --domain devops --action exec --command "git status"` | ||
| Then JSON output should include decision and winning rule metadata | ||
| And reason should reference matched rule |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| @pending | ||
| Feature: FR-5 Rule precedence | ||
| Scenario: Pending -- deny precedence over lower effect | ||
| Given two authorization rules with the same priority but different effects | ||
| When I run `policyctl evaluate --harness forge --domain devops --action exec --command "rm -rf /tmp/test"` | ||
| Then deny should win when a tie for priority exists | ||
| And output should include ordered matching rules |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| @pending | ||
| Feature: FR-6 Compile policy to target | ||
| Scenario: Pending -- compile creates native config and shim rules | ||
| Given a policy with conditional and runtime-only rules | ||
| When I run `policyctl compile --target claude-code --harness forge --domain devops` | ||
| Then output should include target, defaults, native_config, and shim_rules fields | ||
| And shim_rules should include non-native-rule entries |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| @pending | ||
| Feature: FR-7 Delegation pipeline | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Update the feature title to include a repository-compliant FR trace ID in the Severity Level: Minor Why it matters? 🤔The repository rule requires FR references in the FR-XXX-NNN format for test traceability. (Use Cmd/Ctrl + Click for best experience) Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docs/specs/acceptance/FR-7-delegate.command.feature
**Line:** 2:2
**Comment:**
*Custom Rule: Update the feature title to include a repository-compliant FR trace ID in the `FR-XXX-NNN` format so the acceptance test is traceable by the quality gate.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix |
||
| Scenario: Pending -- delegate asks with local fast path and cache | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Add an FR trace reference to the scenario name (or an equivalent scenario-level trace marker) using the Severity Level: Minor Why it matters? 🤔The rule requires new test cases to include an FR trace marker, requirement annotation, docstring trace, or FR-based test name. (Use Cmd/Ctrl + Click for best experience) Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docs/specs/acceptance/FR-7-delegate.command.feature
**Line:** 3:3
**Comment:**
*Custom Rule: Add an FR trace reference to the scenario name (or an equivalent scenario-level trace marker) using the `FR-XXX-NNN` format so the new test case has explicit requirement linkage.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix |
||
| Given a command that can be decided by local-fast | ||
| When I run `policyctl intercept --harness forge --domain devops --action write --command "echo hi" --ask-mode delegate` | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: The command uses an unsupported Severity Level: Major
|
||
| Then delegate result should be deterministic and include decision/confidence | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| @pending | ||
| Feature: FR-8 Intercept exit mapping | ||
| Scenario: Pending -- allow/deny/ask mapping | ||
| Given policies producing allow and deny and ask outcomes | ||
| When I run `policyctl intercept` with each outcome scenario | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: This invocation is missing required Severity Level: Major
|
||
| Then exit_code should map to 0 for allow, 2 for deny, and 3 for ask | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| @pending | ||
| Feature: FR-9 Exec TOCTOU protection | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Update this test artifact to include a valid Severity Level: Minor Why it matters? 🤔The repository rule requires FR-XXX-NNN references in test names, markers, tags, or docstrings. This feature file only contains (Use Cmd/Ctrl + Click for best experience) Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docs/specs/acceptance/FR-9-exec-tamper.command.feature
**Line:** 2:2
**Comment:**
*Custom Rule: Update this test artifact to include a valid `FR-XXX-NNN` trace reference (for example in the feature title or an explicit trace tag) so it satisfies the repository's FR traceability requirement.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix |
||
| Scenario: Pending -- deny when policy hash changes before execution | ||
| Given a command that is initially allowed | ||
| And policy files change after pre-exec policy resolution | ||
| When I run `policyctl exec --harness forge --domain devops -- ask -- echo ok` | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: The exec command syntax is malformed: Severity Level: Major
|
||
| Then final_decision should be deny | ||
| And execution should not be attempted | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: The scenario asserts that
policyctl resolveemits a final decision, but the resolve command only returns resolution metadata (policy hash, scope chain, merged policy, source files, timestamp, etc.) and does not compute authorization decisions. This creates an API contract mismatch and will make the acceptance test fail once implemented. Update this step to assert only resolve output fields, or switch to an evaluate/intercept command if decision output is required. [api mismatch]Severity Level: Major⚠️
Steps of Reproduction ✅
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖