Merged
2 changes: 1 addition & 1 deletion README.md
@@ -33,7 +33,7 @@ APTS is not a testing methodology. It complements PTES, OWASP WSTG, and OSSTMM b
- **Tier 2 (Verified)**: 85 additional (157 cumulative). Full transparency, tamper-proof audit trails, and independently verifiable findings.
- **Tier 3 (Comprehensive)**: 16 additional (173 cumulative). Highest assurance for critical infrastructure and L4 autonomous operations.

Fourteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS-<DOMAIN>-A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.
Fifteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS-<DOMAIN>-A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.

APTS has no certification body, no mandatory third-party audit, and no fee. Platforms are assessed against the requirements and conformance is documented. The standard does not prescribe who performs the assessment; internal self-assessment, independent internal review, and external third-party assessment are all valid approaches, and the choice is left to the reader.

2 changes: 1 addition & 1 deletion index.md
@@ -44,7 +44,7 @@ APTS is not a testing methodology. It complements PTES, OWASP WSTG, and OSSTMM b
- **Tier 2 (Verified)**: 85 additional (157 cumulative). Full transparency, tamper-proof audit trails, and independently verifiable findings.
- **Tier 3 (Comprehensive)**: 16 additional (173 cumulative). Highest assurance for critical infrastructure and L4 autonomous operations.

Fourteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS-<DOMAIN>-A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.
Fifteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS-<DOMAIN>-A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.

APTS has no certification body, no mandatory third-party audit, and no fee. Platforms are assessed against the requirements and conformance is documented. The standard does not prescribe who performs the assessment; internal self-assessment, independent internal review, and external third-party assessment are all valid approaches, and the choice is left to the reader.

4 changes: 3 additions & 1 deletion standard/3_Human_Oversight/README.md
@@ -51,7 +51,7 @@ The 19 requirements in this domain fall into six thematic groups:

A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 HO requirement plus every Tier 2 HO requirement, and a Tier 3 platform satisfies all three tiers. Human Oversight has no Tier 3 requirements in this release; a Tier 3 claim therefore requires all Tier 1 and Tier 2 HO requirements. SHOULD-level requirements are interpreted per RFC 2119.

One appendix-only advisory requirement for this domain (APTS-HO-A01 Out-of-Band Kill Switch via Independent Network) is documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). It is not required for conformance at any tier.
Two appendix-only advisory requirements for this domain (APTS-HO-A01 Out-of-Band Kill Switch via Independent Network and APTS-HO-A02 Disclosure and Mitigation of AI Influence on Operator Decisions) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.

Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation.

@@ -264,6 +264,8 @@ Organizations MUST generate periodic reports from decision logs at a cadence app
8. **Retention test**: Verify logs older than retention period are archived/secured appropriately
9. **Signature verification test**: Validate cryptographic signatures on sample log entries

> **See also:** [APTS-HO-A02: Disclosure and Mitigation of AI Influence on Operator Decisions](../appendix/Advisory_Requirements.md#apts-ho-a02-disclosure-and-mitigation-of-ai-influence-on-operator-decisions-advisory) — an advisory practice covering audit-trail provenance for AI-shaped operator affordances (option sets, defaults, wording, ordering) and bias mitigation at high-impact gates, so the chain-of-custody distinguishes a typed approval from a default click-through. Candidate for tier-gated inclusion in v0.2.0.

---

## APTS-HO-006: Graceful Pause Mechanism with State Preservation
2 changes: 1 addition & 1 deletion standard/Frontispiece.md
@@ -72,4 +72,4 @@ Licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).

| Version | Date | Notes |
|---------|------|-------|
| 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 14 advisory practices in the appendix. |
| 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 15 advisory practices in the appendix. |
2 changes: 1 addition & 1 deletion standard/Getting_Started.md
@@ -84,7 +84,7 @@ Depending on your role:
## Common Questions

**Q: Do I need to implement all 173 requirements?**
No. Start with Tier 1 (72 requirements). Tier 2 and Tier 3 add requirements progressively for cumulative totals of 157 and 173. An additional 14 advisory practices live in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md) under the `APTS-<DOMAIN>-A0x` identifier pattern; advisory practices are not required for conformance at any tier. See [Introduction: Compliance Tiers](Introduction.md#compliance-tiers) for details.
No. Start with Tier 1 (72 requirements). Tier 2 and Tier 3 add requirements progressively for cumulative totals of 157 and 173. An additional 15 advisory practices live in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md) under the `APTS-<DOMAIN>-A0x` identifier pattern; advisory practices are not required for conformance at any tier. See [Introduction: Compliance Tiers](Introduction.md#compliance-tiers) for details.

**Q: What if my platform meets most but not all Tier 1 requirements?**
APTS does not award partial credit. A platform must meet 100% of requirements for its claimed tier. Address gaps before claiming a tier.
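
The cumulative, no-partial-credit rule can be sketched as a small check. This is only an illustration, not tooling defined by APTS: the function name `highest_claimable_tier`, the requirement IDs, and their tier assignments below are all hypothetical placeholders.

```python
from typing import Dict, Set


def highest_claimable_tier(requirement_tiers: Dict[str, int], satisfied: Set[str]) -> int:
    """Return the highest tier whose requirements, and those of every lower
    tier, are all satisfied. Returns 0 when Tier 1 is not fully met."""
    claimable = 0
    for tier in (1, 2, 3):
        required = {rid for rid, t in requirement_tiers.items() if t == tier}
        # One unmet requirement blocks this tier and every tier above it.
        if not required <= satisfied:
            break
        claimable = tier
    return claimable


# Hypothetical requirement IDs and tier assignments, for illustration only.
example_tiers = {"REQ-A": 1, "REQ-B": 1, "REQ-C": 2, "REQ-D": 3}

# Meeting one of two Tier 1 requirements earns no tier at all.
partial = highest_claimable_tier(example_tiers, {"REQ-A", "REQ-C", "REQ-D"})

# Meeting every Tier 1 and Tier 2 requirement supports a Tier 2 claim.
tier_two = highest_claimable_tier(example_tiers, {"REQ-A", "REQ-B", "REQ-C"})
```

In this sketch, satisfying Tier 2 or Tier 3 requirements does not help while a Tier 1 gap remains, which mirrors the "address gaps before claiming a tier" guidance.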
2 changes: 1 addition & 1 deletion standard/Introduction.md
@@ -44,7 +44,7 @@ APTS does not prescribe who performs the assessment. The choice of internal self
| 7 | Third-Party & Supply Chain Trust | TP | 22 | AI providers, cloud dependencies, data handling, foundation model disclosure |
| 8 | Reporting | RP | 15 | Finding validation, confidence scoring, coverage disclosure |

**Total: 173 tier-required requirements** (Tier 1 + Tier 2 + Tier 3) across the eight domains. An additional **14 advisory practices** live exclusively in the [Advisory Requirements](appendix/Advisory_Requirements.md) appendix using the `APTS-<DOMAIN>-A0x` identifier pattern; advisory practices are not counted toward any tier and do not affect conformance.
**Total: 173 tier-required requirements** (Tier 1 + Tier 2 + Tier 3) across the eight domains. An additional **15 advisory practices** live exclusively in the [Advisory Requirements](appendix/Advisory_Requirements.md) appendix using the `APTS-<DOMAIN>-A0x` identifier pattern; advisory practices are not counted toward any tier and do not affect conformance.

---

2 changes: 1 addition & 1 deletion standard/README.md
@@ -1,6 +1,6 @@
# OWASP Autonomous Penetration Testing Standard

This is the full OWASP Autonomous Penetration Testing Standard. It defines 173 tier-required requirements across 8 domains (plus 14 advisory practices in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md)) that autonomous penetration testing platforms must meet to operate safely, transparently, and within defined boundaries, whether delivered by vendors, operated as a service, or built in-house by enterprise security teams.
This is the full OWASP Autonomous Penetration Testing Standard. It defines 173 tier-required requirements across 8 domains (plus 15 advisory practices in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md)) that autonomous penetration testing platforms must meet to operate safely, transparently, and within defined boundaries, whether delivered by vendors, operated as a service, or built in-house by enterprise security teams.

## Getting Started

21 changes: 21 additions & 0 deletions standard/appendix/Advisory_Requirements.md
@@ -131,6 +131,27 @@ When the platform summarizes, truncates, or otherwise compacts the agent's conte

---

### APTS-HO-A02: Disclosure and Mitigation of AI Influence on Operator Decisions (Advisory)

**Applicability:** This practice applies to platforms that use language models or other AI components to generate content shown to operators at decision points — narrative summaries, recommended approval responses, prefilled answer choices, preselected defaults, or any other UI affordance that shapes what the operator confirms. It applies uniformly across every channel that delivers an operator decision affordance (web dashboard, mobile, instant-messaging, voice, push notification).

**Rationale:** Autonomous pentest platforms increasingly use language models not only to act on the target but also to shape what the operator sees at decision time — the narrative that frames a finding, the option set in an approval prompt, the wording, the preselected default. This is a manipulation surface distinct from the prompt-injection threats covered by the Manipulation Resistance domain: there, an external entity manipulates the agent; here, the agent influences its own supervisor's choice through affordances the agent controls. Established findings from human-computer interaction (default bias, primacy bias, choice architecture effects) show these shaping decisions meaningfully influence which option the operator selects, even without adversarial intent. APTS-HO-001, HO-005, and HO-010 mandate approval gates and audit trails but do not address the form of the question put to the operator; APTS-AR-006 covers the agent's reasoning chain but not the model-shaped inputs handed to the human. The practical effect is that an audit trail can show "operator approved" while concealing that the operator was offered a single highlighted choice with the safer option visually de-emphasized. The normative requirement set for v0.1.0 is frozen; this practice is a candidate for tier-gated inclusion in v0.2.0.

**Value:** Platforms that implement this practice make operator approval a more deliberate act and the audit trail more honest. Reviewers can distinguish a typed approval from a default click-through, customers can audit whether the platform's UI has steered operators toward expedient choices, and operators retain meaningful agency at high-impact decision points.

**Practice Description:**

1. **Distinguish a default click-through from an actively-selected response.** Where the operator confirms a preselected default with a single action, the audit trail should identify the response as a default click-through rather than treating it as semantically equivalent to a typed answer or an actively-selected non-default option.
2. **Log the model and prompt that shaped the operator's view.** When a model produces summary text, recommended responses, option sets, or defaults shown to the operator, the audit record should identify the model version, the prompt template, and the relevant context window state at generation time.
3. **Record the full option set, including filtered alternatives.** When the operator is presented with a constrained set of choices generated or curated by a model, the audit trail should record the candidate options the model considered and any options that were dropped, reordered, or de-emphasized before presentation, so reviewers can detect cases where the model omitted a safer option (for example, "deny" or "abort") or buried it behind progressive disclosure.
4. **Reduce default and ordering bias for high-impact gates.** For approvals governing irreversible actions (see APTS-HO-010) and other high-impact decisions, the platform should avoid preselecting a default, present the abort or deny option with visual weight equal to other options, and consider randomizing option order to mitigate primacy bias. For the most severe action categories, consider requiring a typed confirmation phrase rather than a single click.
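
As a rough illustration of items 1 through 3, an audit record for an operator decision might look like the sketch below. APTS does not define a schema for this practice, so every name here (`OperatorDecisionRecord`, `AffordanceProvenance`, `ResponseKind`, the field names, and the sample values) is an assumption made for the example, and the classification rule is deliberately simplified.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class ResponseKind(str, Enum):
    DEFAULT_CLICK_THROUGH = "default_click_through"
    ACTIVE_SELECTION = "active_selection"
    TYPED_CONFIRMATION = "typed_confirmation"


@dataclass(frozen=True)
class AffordanceProvenance:
    """Item 2: which model and prompt shaped what the operator saw."""
    model_version: str
    prompt_template_id: str
    context_digest: str  # hash standing in for the context window state


@dataclass(frozen=True)
class OperatorDecisionRecord:
    gate_id: str
    candidate_options: Tuple[str, ...]   # item 3: everything the model considered
    presented_options: Tuple[str, ...]   # what the operator was actually shown
    preselected_default: Optional[str]
    chosen_option: str
    typed_confirmation: Optional[str]
    provenance: AffordanceProvenance

    @property
    def response_kind(self) -> ResponseKind:
        """Item 1: separate a default click-through from a deliberate answer.
        Simplification: accepting the preselected default always counts as a
        click-through, even if the operator deliberated first."""
        if self.typed_confirmation:
            return ResponseKind.TYPED_CONFIRMATION
        if self.preselected_default is not None and self.chosen_option == self.preselected_default:
            return ResponseKind.DEFAULT_CLICK_THROUGH
        return ResponseKind.ACTIVE_SELECTION

    def withheld_options(self) -> Tuple[str, ...]:
        """Item 3: candidate options that never reached the operator."""
        return tuple(o for o in self.candidate_options if o not in self.presented_options)


def digest_context(context: str) -> str:
    """Hash the context text so the record can reference it compactly."""
    return hashlib.sha256(context.encode("utf-8")).hexdigest()


# Hypothetical sample record: "abort" was considered but never shown.
record = OperatorDecisionRecord(
    gate_id="gate-001",
    candidate_options=("approve", "deny", "abort"),
    presented_options=("approve", "deny"),
    preselected_default="approve",
    chosen_option="approve",
    typed_confirmation=None,
    provenance=AffordanceProvenance(
        model_version="example-model-v1",
        prompt_template_id="approval-prompt-v3",
        context_digest=digest_context("example context window contents"),
    ),
)
```

A reviewer reading such a record could see that the approval was a default click-through and that a safer option was withheld, rather than seeing only "operator approved".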

**Recommendation:** Treat the operator-facing decision interface as a controlled surface and apply the same provenance discipline already required for agent action logging. Items 1 and 2 form the lowest-cost starting wedge — a response-classification audit field plus model and prompt-template provenance — both addable without touching the operator UI. The presentation rules in item 4 are most consequential for APTS-HO-010 gates and for actions classified as Critical or High under APTS-SC-001.

**Related normative requirements:** APTS-HO-001, APTS-HO-005, APTS-HO-010, APTS-AR-006, APTS-AR-019.

---

### APTS-AL-A01: Continuous Improvement and Maturity Roadmap (Advisory)

**Rationale:** Multi-year maturity roadmaps, formal improvement frameworks, and annual strategic assessments are organizational process practices rather than technical governance for autonomous pentest platforms. APTS defines platform requirements, not organizational management practices.
2 changes: 1 addition & 1 deletion standard/appendix/Glossary.md
@@ -79,7 +79,7 @@ Notation for specifying IP address ranges using a base address and prefix length
Alternative security measures that mitigate vulnerability when the primary control is missing. Example: Two-factor authentication compensates for weak passwords.

**Compliance Tier**
One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform must meet 100% of requirements assigned to its claimed tier (both MUST and SHOULD). An additional 14 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier.
One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform must meet 100% of requirements assigned to its claimed tier (both MUST and SHOULD). An additional 15 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier.

**Confidence Score**
A numeric value on a 0-100% scale indicating the platform's certainty in a scope boundary determination, target legitimacy assessment, asset classification, or finding validity. Scores below 75% for scope-related decisions trigger mandatory human escalation. See APTS-HO-013, APTS-RP-003.
2 changes: 1 addition & 1 deletion standard/appendix/Vendor_Evaluation_Guide.md
@@ -14,7 +14,7 @@ Decide your minimum compliance tier based on your risk tolerance:

- **Tier 2 (Verified):** 157 cumulative requirements (72 + 85). The platform is fully transparent about what it did and why, protects your data with tamper-proof audit trails, handles incidents with formal response procedures, and provides independently verifiable findings. **Choose Tier 2 when:** you are testing production environments, operating in regulated industries, or need full accountability for audit or compliance purposes. This is the recommended minimum for most production deployments.

- **Tier 3 (Comprehensive):** 173 cumulative requirements (157 + 16). The platform meets the highest assurance bar for critical infrastructure, fully autonomous (L4) operations, and the strictest regulatory requirements. **Choose Tier 3 when:** you are deploying fully autonomous testing against critical infrastructure, financial systems, or healthcare environments with minimal human oversight. An additional 14 advisory practices in the [Advisory Requirements appendix](Advisory_Requirements.md) are recommended for highest-assurance engagements but are not counted toward any tier.
- **Tier 3 (Comprehensive):** 173 cumulative requirements (157 + 16). The platform meets the highest assurance bar for critical infrastructure, fully autonomous (L4) operations, and the strictest regulatory requirements. **Choose Tier 3 when:** you are deploying fully autonomous testing against critical infrastructure, financial systems, or healthcare environments with minimal human oversight. An additional 15 advisory practices in the [Advisory Requirements appendix](Advisory_Requirements.md) are recommended for highest-assurance engagements but are not counted toward any tier.

> **Minimum tier guidance:** Tier 1 is appropriate for supervised testing of non-critical systems in non-regulated environments. Organizations in financial services, healthcare, critical infrastructure, or any regulated industry SHOULD require Tier 2 as a minimum. Tier 3 is recommended for critical infrastructure, fully autonomous (L4) operations, and environments with the strictest regulatory requirements.
