Awesome Agentic AI Security

Visit the Awesome Agentic AI Security project site · site source

The security boundary has moved from the model to the agentic execution system.

A curated list of resources, standards, benchmarks, tools, threat models, architectures, and research for securing agentic, multi-agent, tool-using, memory-bearing, and cyber-capable AI systems.

Start Here

Landscape Map - System-level map of prompts, context, tools, credentials, memory, approvals, and downstream action.
Threat Model - Failure modes, preconditions, impact paths, and control questions for agentic systems.
Attack Surfaces - Where language, context, authority, state, tools, memory, and policies expose risk.
Agentic Attack Chains - How local weaknesses compose into breach paths and where defenders can interrupt them.
Defence Architecture - Runtime control model for observing, interpreting, constraining, auditing, discovering, protecting, and governing agentic systems.
Resource Catalogue - Standards, frameworks, research, tools, benchmarks, cyber-capable AI agents, and evidence requirements.
Patterns - Secure engineering patterns for runtime boundaries, tool calling, MCP, memory, credentials, and approval.
Visuals - Mermaid diagrams for execution boundaries, action paths, control points, and reference architectures.

Core Concepts
Standards and Frameworks
Threat Models and Attack Surfaces
Prompt Injection and Instruction Attacks
Tool Use, MCP, and Runtime Security
Memory, State, and Context Security
Credentials, Identity, and Delegated Authority
Benchmarks and Evaluations
Cyber-Capable AI Agents
Observability, Audit, and Forensics
Governance and Assurance
Physical AI and Robotics Security
Open-Weight and Frontier Capability Risks
Engineering Patterns
Docs and Maps
Related Projects
Licence
Contributing

Core Concepts

Agentic systems behave less like isolated chat applications and more like distributed execution environments. Instructions can shape tool calls, trigger workflows, update memory, write code, route data, and influence decisions across enterprise systems.

The central security question is:

What can this AI system do, under whose authority, with which tools, using which data, with what memory, and under what controls?

Useful security for these systems must understand the relationship between intent, authority, action, context, and outcome.

flowchart TB
    UP["User prompt"]
    RD["Retrieved context"]
    SR["System rules"]
    AR["Agentic reasoning<br/>Goals emerge at runtime"]
    IK["Internal knowledge"]
    EA["External APIs"]
    OT["Operational tools"]
    Risk["Risk accumulation<br/>Composed outcomes may exceed approved scope"]

    UP --> AR
    RD --> AR
    SR --> AR
    AR -->|permitted step| IK
    AR -->|permitted step| EA
    AR -->|permitted step| OT
    IK --> Risk
    EA --> Risk
    OT --> Risk

Text description of the Risk Accumulation flow

The diagram illustrates how a user prompt, retrieved context, and system rules are processed by agentic reasoning. This reasoning leads to several permitted actions: querying internal knowledge, calling external APIs, or using operational tools. These actions collectively lead to "Risk accumulation," where the final composed outcomes of the agent's work may exceed the originally approved security scope.

The repository organises controls around the AI Defense Plane: discover where agents, tools, prompts, data flows, credentials, memory, and autonomous workflows exist; protect tool use, memory writes, credentials, and actions; and govern evidence, audit trails, delegated authority, and risk acceptance. The fuller model is in Defence Architecture.

Standards and Frameworks

OWASP Top 10 for LLM Applications - Security risks and mitigations for LLM applications, including prompt injection, sensitive information disclosure, unsafe output handling, excessive agency, and supply-chain concerns.
OWASP Top 10 for Agentic Applications 2026
- Agentic security taxonomy for autonomous systems that plan, act, use tools, and make workflow decisions.
OWASP Securing Agentic Applications Guide 1.0
- Practical guidance for designing, developing, and deploying secure LLM-powered agentic applications.
MITRE ATLAS - Knowledge base for adversary tactics and techniques against AI-enabled systems.
NIST AI Risk Management Framework
- Governance and risk-management framework for trustworthy AI systems.
NIST AI RMF Generative AI Profile - Generative-AI-specific risk profile that can support governance for agentic applications.
Full standards and frameworks catalogue
- Metadata-labelled entries with relevance, coverage, maturity, and limitations.

Threat Models and Attack Surfaces

Agentic AI Threat Model - Repository threat model for failure modes across prompts, tools, memory, credentials, approvals, and multi-agent workflows.
Attack Surfaces: Agentic Execution Systems - Boundary map for language, context, authority, state, policies, tools, and downstream systems.
Agentic Attack Chains - Defensive chain model for recognising and interrupting multi-step compromise paths.
Agentic Attack Chain Library - Structured stubs for prompt injection, poisoned context, memory poisoning, unsafe MCP extensions, credential overreach, fake approvals, and related chain patterns.
Lakera Progressive Breach Model
- Vendor analysis of how agentic compromise can progress from manipulated intent to tool use, delegated authority, propagation, and containment failure.

Prompt Injection and Instruction Attacks

Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Foundational research on external content influencing LLM-integrated applications.
AgentDojo - Benchmark and evaluation environment for indirect prompt injection and defences in tool-using agents.
Lakera Agent Breaker - Public challenge environment for learning about agentic prompt-injection, tool, browsing, memory, and data-exfiltration scenarios.
OWASP GenAI Red Teaming Guide
- Methodology for planning and running GenAI red teaming across model, implementation, infrastructure, and runtime layers.
Prompt Injection to Tool Misuse
- Defensive attack-chain stub for modelling instruction compromise through tool execution.

Tool Use, MCP, and Runtime Security

Secure Tool Calling - Pattern for tool brokers, schemas, scopes, allow-lists, side-effect controls, and approval gates.
Secure MCP - Pattern for trust boundaries, transport hardening, capability scoping, and untrusted-context handling in Model Context Protocol integrations.
Secure Agent Runtime - Pattern for sandboxing, isolation, policy enforcement, and observability inside the execution loop.
OWASP Agentic Skills Top 10
- Emerging guidance for the security of reusable agent skills and extension ecosystems.
NVIDIA NeMo Agent Toolkit Safety and Security Example
- Practical example of agent workflow red teaming and risk scoring.
Tools catalogue - Defensive tools for red teaming, evaluation, observability, inventory, and runtime control.

Memory, State, and Context Security

Memory Security - Pattern for memory write paths, provenance, poisoning detection, retention controls, and audit evidence.
AgentPoison - Research on poisoning agent memory or knowledge bases to influence future behaviour.
A Practical Memory Injection Attack Against LLM Agents
- Research framing long-term memory as persistent, untrusted input.
Poisoned Retrieved Context
- Defensive chain stub for modelling malicious or misleading retrieved content.
Memory Poisoning - Defensive chain stub for persistent state manipulation and delayed effects.

Credentials, Identity, and Delegated Authority

Credential and Token Boundaries
- Pattern for delegated authority, scoped tokens, credential brokers, and least-privilege impersonation.
Credential Overreach - Defensive attack-chain stub for excessive authority and weak token boundaries.
OWASP Top 10 for Agentic Applications 2026
- Includes identity, privilege abuse, tool misuse, and excessive agency concerns for autonomous systems.
Lakera: AI Gateways
- Architecture discussion of identity, routing, policy enforcement, telemetry, and tool governance at AI gateway layers.
Lakera: From Access Control to Outcome Control
- Vendor analysis that separates valid access from acceptable outcomes in agentic systems.

Benchmarks and Evaluations

AgentDojo - Evaluation environment for indirect prompt injection and defences in tool-using agents.
CyberSecEval
- Cybersecurity benchmark suite for LLMs used in coding, analysis, and automation contexts.
CyberGym - Benchmark environment for real-world AI-agent vulnerability analysis, reproduction, and verification tasks.
ExploitGym - Capability benchmark for whether AI agents can turn known vulnerabilities into working exploits; use as a defensive risk signal, not operational guidance.
Inspect AI - Evaluation framework from the UK AI Security Institute for structured tasks, solvers, scorers, and logs.
Benchmark catalogue - Benchmarks, testbeds, and evaluation methods with proof limits and maturity notes.

Cyber-Capable AI Agents

This section tracks the defensive governance problem created by AI systems that can assist with vulnerability discovery, exploit-capability evaluation, patch verification, disclosure workflows, and forensic traceability. It does not provide exploitation instructions.

Anthropic Mythos Preview - Vendor technical capability report on autonomous vulnerability discovery, exploit-capability evaluation, benchmark saturation, and coordinated disclosure constraints.
Project Glasswing - Controlled defensive deployment programme for applying cyber-capable model access to critical software security work.
Anthropic coordinated vulnerability disclosure
- Operating principles for human-reviewed, AI-labelled, paced disclosure of AI-discovered vulnerabilities.
Anthropic / Mozilla Firefox security collaboration
- Case study on maintainer needs for minimal test cases, candidate patches, task verifiers, and reproducible evidence.
CyberGym - Defensive evaluation environment for vulnerability reproduction, incomplete patch discovery, open-ended discovery, and sanitiser-backed validation.
ExploitGym - High-risk capability benchmark for exploit generation, useful for governance and defensive preparedness.
Anthropic Frontier Safety Roadmap
- Public roadmap for safeguards, cyber misuse detection, red teaming, model-weight security, and AI-assisted defence.
METR common elements of frontier AI safety policies
- Cross-policy reference for dangerous capability thresholds, including offensive cybersecurity.
UK NCSC frontier AI guidance
- Public-sector guidance on defender readiness as frontier AI changes the cost, speed, and scale of cyber operations.
CETaS / Alan Turing Institute Mythos analysis
- Independent analysis of Mythos, Project Glasswing, restricted access, open-weight risk, and defensive capacity.
UK AI Security Institute Frontier AI Trends Report
- Public evidence on frontier model trends, including cyber tasks, autonomy, and capability evaluation.
AddressSanitizer - Sanitizer-based verification layer for memory-safety findings and patch validation.
Cyber-capable AI agents catalogue - Fuller defensive catalogue for Mythos, Glasswing, CyberGym, ExploitGym, disclosure, verification, frontier governance, and watch areas.

Observability, Audit, and Forensics

Defence Architecture - Control model for capturing prompts, context, tool calls, memory reads and writes, approvals, outputs, and downstream actions.
Observability and Audit Trail Visual
- Diagram source for evidence capture across agentic execution paths.
Resource Quality Rubric - Criteria for treating catalogue entries as evidence for judgement rather than endorsements.
Agent Security Readiness Rubric
- Scorecard for evaluating whether an agent system has credible controls and evidence before deployment.
Anthropic coordinated vulnerability disclosure
- Useful reference for evidence handling around AI-discovered vulnerabilities and maintainer workflows.

Governance and Assurance

NIST AI Risk Management Framework
- Governance framework for mapping, measuring, managing, and governing AI risk.
NIST Center for AI Standards and Innovation - U.S. public-sector work on AI evaluation, measurement science, standards, and frontier risk assessment.
Anthropic Responsible Scaling Policy v3.0
- Vendor policy for frontier capability monitoring, deployment safeguards, security levels, and public roadmaps.
METR common elements of frontier AI safety policies
- Taxonomy reference for comparing frontier AI safety policies and dangerous capability thresholds.
Open Research Questions - Repository map of unresolved questions around agentic execution security, evaluation, governance, and assurance.

Physical AI and Robotics Security

Awesome Physical AI - Companion field guide for robotics, embodied agents, and sensor-driven systems.
UK NCSC frontier AI guidance
- Strategic guidance relevant to software-controlled, networked, and autonomous systems.
Cyber-capable AI agents catalogue - Includes a defensive note on robotics and physical AI implications where cyber capability affects software-controlled physical systems.

Open-Weight and Frontier Capability Risks

CETaS / Alan Turing Institute Mythos analysis
- Independent analysis of restricted access, open-weight proliferation risk, defensive capacity, and governance trade-offs.
UK NCSC frontier AI guidance
- Notes the defensive implications of frontier capability transfer, open-weight models, and removed safeguards.
METR common elements of frontier AI safety policies
- Reference for monitoring dangerous capabilities and thresholds across frontier AI policies.
Open-weight cyber-capability watch areas
- Evidence-led catalogue section for tracking open-weight and frontier cyber-capability risk without unsupported claims.

Engineering Patterns

Secure Agent Runtime - Runtime boundaries, sandboxing, policy enforcement, and audit evidence.
Secure Tool Calling - Tool schemas, brokers, scopes, side-effect controls, and approval gates.
Secure MCP - Model Context Protocol boundaries, trust assumptions, and capability scoping.
Memory Security - Memory write controls, provenance, poisoning detection, and retention.
Credential and Token Boundaries
- Delegated authority, credential brokers, scoped tokens, and impersonation controls.
Secure Engineering Patterns - How the threat model, attack surfaces, and chain interruptions map to reusable implementation controls.

Docs and Maps

Section	Use it for
Docs	Conceptual maps, threat models, breach chains, defence architecture, evaluation, governance, case studies, and open questions.
Resources	Curated standards, frameworks, vendor research, papers, tools, benchmarks, cyber-capable AI agents, and evidence requirements.
Patterns	Secure engineering patterns for agent runtimes, tool calling, MCP, memory, credentials, approval, sandboxing, observability, and policy enforcement.
Visuals	Mermaid diagrams for execution boundaries, action paths, control points, and reference architectures.

Related Projects

Companion field guides by the same maintainer covering adjacent areas of AI. Read alongside this repository for broader context on how agentic AI is being built and applied beyond the security boundary.

Repository	Focus
Awesome Agentic Engineering	Engineering practices, patterns, and tooling for building agentic AI systems.
Awesome AI Scientists	AI for scientific research, discovery, and AI-as-scientist tooling.
Awesome Physical AI	Physical AI: robotics, embodied agents, and sensor-driven systems.

Licence

This project is released under the MIT License.

Contributing

Thrilled to have you here. Whether it is a quick typo fix, a fresh resource, a doc polish, or a sweeping overhaul - every contribution helps this list grow. Jump in and join the community - PRs of every size are welcome.

Read the contributing guide · good first issues

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.claude/worktrees		.claude/worktrees
.devcontainer		.devcontainer
.github		.github
assets		assets
docs		docs
patterns		patterns
resources		resources
rubrics		rubrics
site		site
visuals		visuals
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
mkdocs.yml		mkdocs.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Awesome Agentic AI Security

Start Here

Contents

Core Concepts

Standards and Frameworks

Threat Models and Attack Surfaces

Prompt Injection and Instruction Attacks

Tool Use, MCP, and Runtime Security

Memory, State, and Context Security

Credentials, Identity, and Delegated Authority

Benchmarks and Evaluations

Cyber-Capable AI Agents

Observability, Audit, and Forensics

Governance and Assurance

Physical AI and Robotics Security

Open-Weight and Frontier Capability Risks

Engineering Patterns

Docs and Maps

Related Projects

Licence

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Awesome Agentic AI Security

Start Here

Contents

Core Concepts

Standards and Frameworks

Threat Models and Attack Surfaces

Prompt Injection and Instruction Attacks

Tool Use, MCP, and Runtime Security

Memory, State, and Context Security

Credentials, Identity, and Delegated Authority

Benchmarks and Evaluations

Cyber-Capable AI Agents

Observability, Audit, and Forensics

Governance and Assurance

Physical AI and Robotics Security

Open-Weight and Frontier Capability Risks

Engineering Patterns

Docs and Maps

Related Projects

Licence

Contributing

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages