Visit the Awesome Agentic AI Security project site · site source
The security boundary has moved from the model to the agentic execution system.
A curated list of resources, standards, benchmarks, tools, threat models, architectures, and research for securing agentic, multi-agent, tool-using, memory-bearing, and cyber-capable AI systems.
- Landscape Map - System-level map of prompts, context, tools, credentials, memory, approvals, and downstream action.
- Threat Model - Failure modes, preconditions, impact paths, and control questions for agentic systems.
- Attack Surfaces - Where language, context, authority, state, tools, memory, and policies expose risk.
- Agentic Attack Chains - How local weaknesses compose into breach paths and where defenders can interrupt them.
- Defence Architecture - Runtime control model for observing, interpreting, constraining, auditing, discovering, protecting, and governing agentic systems.
- Resource Catalogue - Standards, frameworks, research, tools, benchmarks, cyber-capable AI agents, and evidence requirements.
- Patterns - Secure engineering patterns for runtime boundaries, tool calling, MCP, memory, credentials, and approval.
- Visuals - Mermaid diagrams for execution boundaries, action paths, control points, and reference architectures.
- Core Concepts
- Standards and Frameworks
- Threat Models and Attack Surfaces
- Prompt Injection and Instruction Attacks
- Tool Use, MCP, and Runtime Security
- Memory, State, and Context Security
- Credentials, Identity, and Delegated Authority
- Benchmarks and Evaluations
- Cyber-Capable AI Agents
- Observability, Audit, and Forensics
- Governance and Assurance
- Physical AI and Robotics Security
- Open-Weight and Frontier Capability Risks
- Engineering Patterns
- Docs and Maps
- Related Projects
- Licence
- Contributing
Agentic systems behave less like isolated chat applications and more like distributed execution environments. Instructions can shape tool calls, trigger workflows, update memory, write code, route data, and influence decisions across enterprise systems.
The central security question is:
What can this AI system do, under whose authority, with which tools, using which data, with what memory, and under what controls?
Securing these systems requires understanding the relationship between intent, authority, action, context, and outcome.
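As an illustration only, the central question can be treated as a record that must be filled in before an agent action is allowed to proceed. The sketch below is hypothetical: `ActionContext`, `unanswered`, and every field name are assumptions made for this example, not part of the repository or of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionContext:
    """One proposed agent action, described along the central question (hypothetical schema)."""
    action: str                 # what the system wants to do
    authority: str              # under whose authority it acts
    tool: str                   # which tool performs the action
    data_sources: tuple = ()    # which data feeds the decision
    memory_keys: tuple = ()     # which memory entries were read (may legitimately be empty)
    controls: tuple = ()        # which controls apply to the action


def unanswered(ctx):
    """Return the dimensions of the central question that are still blank."""
    required = ("action", "authority", "tool", "data_sources", "controls")
    return [name for name in required if not getattr(ctx, name)]


ctx = ActionContext(
    action="send_email",
    authority="user:alice",
    tool="mail_api",
    data_sources=("crm_export",),
)
print(unanswered(ctx))  # ['controls']
```

Memory is left optional here because an agent without memory is a valid answer to "with what memory"; a stricter design might require an explicit "none" value instead.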
flowchart TB
UP["User prompt"]
RD["Retrieved context"]
SR["System rules"]
AR["Agentic reasoning<br/>Goals emerge at runtime"]
IK["Internal knowledge"]
EA["External APIs"]
OT["Operational tools"]
Risk["Risk accumulation<br/>Composed outcomes may exceed approved scope"]
UP --> AR
RD --> AR
SR --> AR
AR -->|permitted step| IK
AR -->|permitted step| EA
AR -->|permitted step| OT
IK --> Risk
EA --> Risk
OT --> Risk
Text description of the Risk Accumulation flow
The diagram illustrates how a user prompt, retrieved context, and system rules are processed by agentic reasoning. This reasoning leads to several permitted actions: querying internal knowledge, calling external APIs, or using operational tools. These actions collectively lead to "Risk accumulation," where the final composed outcomes of the agent's work may exceed the originally approved security scope.
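The accumulation effect described above can be sketched as a check over a whole plan rather than over single steps. This is a minimal, hypothetical model: the step names mirror the diagram, but the effect labels, `UNAPPROVED_COMBINATIONS`, and `review_plan` are assumptions introduced for illustration.

```python
# Each step is individually permitted and maps to the effects it produces.
# Effect labels are hypothetical.
PERMITTED_STEPS = {
    "query_internal_knowledge": {"read:internal"},
    "call_external_api": {"send:external"},
    "use_operational_tool": {"write:ops"},
}

# Combinations of effects that were never approved, even though each
# effect on its own is allowed (assumed policy for this sketch).
UNAPPROVED_COMBINATIONS = [frozenset({"read:internal", "send:external"})]


def review_plan(steps):
    """Walk a plan in order and report where composed effects exceed scope."""
    accumulated = set()
    flagged = set()
    findings = []
    for step in steps:
        effects = PERMITTED_STEPS.get(step)
        if effects is None:
            findings.append((step, "not a permitted step"))
            continue
        accumulated |= effects
        for combo in UNAPPROVED_COMBINATIONS:
            if combo <= accumulated and combo not in flagged:
                flagged.add(combo)
                findings.append((step, "completes " + "+".join(sorted(combo))))
    return findings


print(review_plan(["query_internal_knowledge"]))
# []
print(review_plan(["query_internal_knowledge", "call_external_api"]))
# [('call_external_api', 'completes read:internal+send:external')]
```

Both steps in the second plan pass a per-step check; only the accumulated view shows that internal data can now reach an external destination.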
The repository organises controls around the AI Defense Plane: discover where agents, tools, prompts, data flows, credentials, memory, and autonomous workflows exist; protect tool use, memory writes, credentials, and actions; and govern evidence, audit trails, delegated authority, and risk acceptance. The fuller model is in Defence Architecture.
- OWASP Top 10 for LLM Applications - Security risks and mitigations for LLM applications, including prompt injection, sensitive information disclosure, unsafe output handling, excessive agency, and supply-chain concerns.
- OWASP Top 10 for Agentic Applications 2026 - Agentic security taxonomy for autonomous systems that plan, act, use tools, and make workflow decisions.
- OWASP Securing Agentic Applications Guide 1.0 - Practical guidance for designing, developing, and deploying secure LLM-powered agentic applications.
- MITRE ATLAS - Knowledge base for adversary tactics and techniques against AI-enabled systems.
- NIST AI Risk Management Framework - Governance and risk-management framework for trustworthy AI systems.
- NIST AI RMF Generative AI Profile - Generative-AI-specific risk profile that can support governance for agentic applications.
- Full standards and frameworks catalogue - Metadata-labelled entries with relevance, coverage, maturity, and limitations.
- Agentic AI Threat Model - Repository threat model for failure modes across prompts, tools, memory, credentials, approvals, and multi-agent workflows.
- Attack Surfaces: Agentic Execution Systems - Boundary map for language, context, authority, state, policies, tools, and downstream systems.
- Agentic Attack Chains - Defensive chain model for recognising and interrupting multi-step compromise paths.
- Agentic Attack Chain Library - Structured stubs for prompt injection, poisoned context, memory poisoning, unsafe MCP extensions, credential overreach, fake approvals, and related chain patterns.
- Lakera Progressive Breach Model - Vendor analysis of how agentic compromise can progress from manipulated intent to tool use, delegated authority, propagation, and containment failure.
- Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection - Foundational research on external content influencing LLM-integrated applications.
- AgentDojo - Benchmark and evaluation environment for indirect prompt injection and defences in tool-using agents.
- Lakera Agent Breaker - Public challenge environment for learning about agentic prompt-injection, tool, browsing, memory, and data-exfiltration scenarios.
- OWASP GenAI Red Teaming Guide - Methodology for planning and running GenAI red teaming across model, implementation, infrastructure, and runtime layers.
- Prompt Injection to Tool Misuse - Defensive attack-chain stub for modelling instruction compromise through tool execution.
- Secure Tool Calling - Pattern for tool brokers, schemas, scopes, allow-lists, side-effect controls, and approval gates.
- Secure MCP - Pattern for trust boundaries, transport hardening, capability scoping, and untrusted-context handling in Model Context Protocol integrations.
- Secure Agent Runtime - Pattern for sandboxing, isolation, policy enforcement, and observability inside the execution loop.
- OWASP Agentic Skills Top 10 - Emerging guidance for the security of reusable agent skills and extension ecosystems.
- NVIDIA NeMo Agent Toolkit Safety and Security Example - Practical example of agent workflow red teaming and risk scoring.
- Tools catalogue - Defensive tools for red teaming, evaluation, observability, inventory, and runtime control.
- Memory Security - Pattern for memory write paths, provenance, poisoning detection, retention controls, and audit evidence.
- AgentPoison - Research on poisoning agent memory or knowledge bases to influence future behaviour.
- A Practical Memory Injection Attack Against LLM Agents - Research framing long-term memory as persistent, untrusted input.
- Poisoned Retrieved Context - Defensive chain stub for modelling malicious or misleading retrieved content.
- Memory Poisoning - Defensive chain stub for persistent state manipulation and delayed effects.
- Credential and Token Boundaries - Pattern for delegated authority, scoped tokens, credential brokers, and least-privilege impersonation.
- Credential Overreach - Defensive attack-chain stub for excessive authority and weak token boundaries.
- OWASP Top 10 for Agentic Applications 2026 - Includes identity, privilege abuse, tool misuse, and excessive agency concerns for autonomous systems.
- Lakera: AI Gateways - Architecture discussion of identity, routing, policy enforcement, telemetry, and tool governance at AI gateway layers.
- Lakera: From Access Control to Outcome Control - Vendor analysis that separates valid access from acceptable outcomes in agentic systems.
- AgentDojo - Evaluation environment for indirect prompt injection and defences in tool-using agents.
- CyberSecEval - Cybersecurity benchmark suite for LLMs used in coding, analysis, and automation contexts.
- CyberGym - Benchmark environment for real-world AI-agent vulnerability analysis, reproduction, and verification tasks.
- ExploitGym - Capability benchmark for whether AI agents can turn known vulnerabilities into working exploits; use as a defensive risk signal, not operational guidance.
- Inspect AI - Evaluation framework from the UK AI Security Institute for structured tasks, solvers, scorers, and logs.
- Benchmark catalogue - Benchmarks, testbeds, and evaluation methods with proof limits and maturity notes.
This section tracks the defensive governance problem created by AI systems that can assist with vulnerability discovery, exploit-capability evaluation, patch verification, disclosure workflows, and forensic traceability. It does not provide exploitation instructions.
- Anthropic Mythos Preview - Vendor technical capability report on autonomous vulnerability discovery, exploit-capability evaluation, benchmark saturation, and coordinated disclosure constraints.
- Project Glasswing - Controlled defensive deployment programme for applying cyber-capable model access to critical software security work.
- Anthropic coordinated vulnerability disclosure - Operating principles for human-reviewed, AI-labelled, paced disclosure of AI-discovered vulnerabilities.
- Anthropic / Mozilla Firefox security collaboration - Case study on maintainer needs for minimal test cases, candidate patches, task verifiers, and reproducible evidence.
- CyberGym - Defensive evaluation environment for vulnerability reproduction, incomplete patch discovery, open-ended discovery, and sanitiser-backed validation.
- ExploitGym - High-risk capability benchmark for exploit generation, useful for governance and defensive preparedness.
- Anthropic Frontier Safety Roadmap - Public roadmap for safeguards, cyber misuse detection, red teaming, model-weight security, and AI-assisted defence.
- METR common elements of frontier AI safety policies - Cross-policy reference for dangerous capability thresholds, including offensive cybersecurity.
- UK NCSC frontier AI guidance - Public-sector guidance on defender readiness as frontier AI changes the cost, speed, and scale of cyber operations.
- CETaS / Alan Turing Institute Mythos analysis - Independent analysis of Mythos, Project Glasswing, restricted access, open-weight risk, and defensive capacity.
- UK AI Security Institute Frontier AI Trends Report - Public evidence on frontier model trends, including cyber tasks, autonomy, and capability evaluation.
- AddressSanitizer - Sanitizer-based verification layer for memory-safety findings and patch validation.
- Cyber-capable AI agents catalogue - Fuller defensive catalogue for Mythos, Glasswing, CyberGym, ExploitGym, disclosure, verification, frontier governance, and watch areas.
- Defence Architecture - Control model for capturing prompts, context, tool calls, memory reads and writes, approvals, outputs, and downstream actions.
- Observability and Audit Trail Visual - Diagram source for evidence capture across agentic execution paths.
- Resource Quality Rubric - Criteria for treating catalogue entries as evidence for judgement rather than endorsements.
- Agent Security Readiness Rubric - Scorecard for evaluating whether an agent system has credible controls and evidence before deployment.
- Anthropic coordinated vulnerability disclosure - Useful reference for evidence handling around AI-discovered vulnerabilities and maintainer workflows.
- NIST AI Risk Management Framework - Governance framework for mapping, measuring, managing, and governing AI risk.
- NIST Center for AI Standards and Innovation - U.S. public-sector work on AI evaluation, measurement science, standards, and frontier risk assessment.
- Anthropic Responsible Scaling Policy v3.0 - Vendor policy for frontier capability monitoring, deployment safeguards, security levels, and public roadmaps.
- METR common elements of frontier AI safety policies - Taxonomy reference for comparing frontier AI safety policies and dangerous capability thresholds.
- Open Research Questions - Repository map of unresolved questions around agentic execution security, evaluation, governance, and assurance.
- Awesome Physical AI - Companion field guide for robotics, embodied agents, and sensor-driven systems.
- UK NCSC frontier AI guidance - Strategic guidance relevant to software-controlled, networked, and autonomous systems.
- Cyber-capable AI agents catalogue - Includes a defensive note on robotics and physical AI implications where cyber capability affects software-controlled physical systems.
- CETaS / Alan Turing Institute Mythos analysis - Independent analysis of restricted access, open-weight proliferation risk, defensive capacity, and governance trade-offs.
- UK NCSC frontier AI guidance - Notes the defensive implications of frontier capability transfer, open-weight models, and removed safeguards.
- METR common elements of frontier AI safety policies - Reference for monitoring dangerous capabilities and thresholds across frontier AI policies.
- Open-weight cyber-capability watch areas - Evidence-led catalogue section for tracking open-weight and frontier cyber-capability risk without unsupported claims.
- Secure Agent Runtime - Runtime boundaries, sandboxing, policy enforcement, and audit evidence.
- Secure Tool Calling - Tool schemas, brokers, scopes, side-effect controls, and approval gates.
- Secure MCP - Model Context Protocol boundaries, trust assumptions, and capability scoping.
- Memory Security - Memory write controls, provenance, poisoning detection, and retention.
- Credential and Token Boundaries - Delegated authority, credential brokers, scoped tokens, and impersonation controls.
- Secure Engineering Patterns - How the threat model, attack surfaces, and chain interruptions map to reusable implementation controls.
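To make the tool-calling controls listed above more concrete, here is a minimal sketch of a tool broker that combines an allow-list, scope checks, an approval gate for side-effecting tools, and an audit log. It is not taken from the repository's pattern documents: `ToolSpec`, `ToolBroker`, the scope strings, and the decision labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Allow-list entry for one tool (hypothetical schema)."""
    name: str
    required_scope: str   # scope the caller's token must carry
    side_effect: bool     # True if the tool changes state outside the agent


class ToolBroker:
    """Mediates every tool call; the agent never invokes tools directly."""

    def __init__(self, allow_list, approver):
        self._tools = {spec.name: spec for spec in allow_list}
        self._approver = approver   # callable(tool_name) -> bool, e.g. a human gate
        self.audit_log = []         # (tool_name, decision) evidence trail

    def authorise(self, tool_name, token_scopes):
        spec = self._tools.get(tool_name)
        if spec is None:
            decision = "deny:not_allow_listed"
        elif spec.required_scope not in token_scopes:
            decision = "deny:missing_scope"
        elif spec.side_effect and not self._approver(tool_name):
            decision = "deny:approval_refused"
        else:
            decision = "allow"
        self.audit_log.append((tool_name, decision))
        return decision


broker = ToolBroker(
    allow_list=[
        ToolSpec("search_docs", "docs:read", side_effect=False),
        ToolSpec("delete_record", "records:write", side_effect=True),
    ],
    approver=lambda tool_name: False,  # stand-in approver that refuses everything
)

print(broker.authorise("search_docs", {"docs:read"}))        # allow
print(broker.authorise("delete_record", {"records:write"}))  # deny:approval_refused
print(broker.authorise("shell_exec", {"docs:read"}))         # deny:not_allow_listed
```

The checks run from cheapest to most expensive, so a human approver is only consulted for calls that are already allow-listed and correctly scoped. Every decision, including denials, lands in the audit log.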
| Section | Use it for |
|---|---|
| Docs | Conceptual maps, threat models, breach chains, defence architecture, evaluation, governance, case studies, and open questions. |
| Resources | Curated standards, frameworks, vendor research, papers, tools, benchmarks, cyber-capable AI agents, and evidence requirements. |
| Patterns | Secure engineering patterns for agent runtimes, tool calling, MCP, memory, credentials, approval, sandboxing, observability, and policy enforcement. |
| Visuals | Mermaid diagrams for execution boundaries, action paths, control points, and reference architectures. |
Companion field guides by the same maintainer covering adjacent areas of AI. Read alongside this repository for broader context on how agentic AI is being built and applied beyond the security boundary.
| Repository | Focus |
|---|---|
| Awesome Agentic Engineering | Engineering practices, patterns, and tooling for building agentic AI systems. |
| Awesome AI Scientists | AI for scientific research, discovery, and AI-as-scientist tooling. |
| Awesome Physical AI | Physical AI: robotics, embodied agents, and sensor-driven systems. |
This project is released under the MIT License.
Thrilled to have you here. Whether it is a quick typo fix, a fresh resource, a doc polish, or a sweeping overhaul - every contribution helps this list grow. Jump in and join the community - PRs of every size are welcome.
