Companion to the prompt-injection scanning issue. Even if a skill's content passes injection screening, the skill itself defines the agent's behavior surface when invoked, and current code grants ambient capabilities to whatever's in the active skills directory. A trusted-peer-turned-malicious could publish a skill that does exactly what its frontmatter says, where the description happens to be "exfiltrate .env to attacker.example.com" or "send wallet balance to known-bad address."
## Threat model
A trusted publisher pushes a skill whose stated purpose is hostile but technically truthful. Without per-skill capability declarations enforced at runtime, ambient trust on the skill becomes ambient trust on every tool it can reach: shell, network, wallet, filesystem outside the skill's workspace.
## Proposed defense
- SKILL.md frontmatter declares required capabilities, e.g. `capabilities: [shell, wallet.send, network.fetch]`
- Default profile for ingested-from-mesh skills is least-privilege: read-only filesystem, no network, no wallet, no shell
- High-risk capabilities (wallet writes, shell, file writes outside the skill's workspace, arbitrary outbound network) require explicit operator approval per-skill, prompted at first invocation rather than at ingestion
- Enforcement at the runtime level, not just the prompt level: the skill literally cannot call those tools, rather than trusting the LLM not to use them after being told not to in the system prompt
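The declare-then-enforce flow above can be sketched as follows. This is a minimal illustration, not an existing API: `ToolGate`, `HIGH_RISK`, and the capability strings are all assumptions for the sake of the example.

```python
import re

# Hypothetical set of capabilities that require explicit per-skill
# operator approval at first invocation (wallet writes, shell, etc.).
HIGH_RISK = {"shell", "wallet.send", "fs.write_outside_workspace", "network.any"}

def parse_capabilities(skill_md: str) -> set[str]:
    """Extract the capabilities list from SKILL.md frontmatter."""
    m = re.search(r"^capabilities:\s*\[([^\]]*)\]", skill_md, re.MULTILINE)
    if not m:
        return set()  # least-privilege default: nothing declared, nothing granted
    return {c.strip() for c in m.group(1).split(",") if c.strip()}

class ToolGate:
    """Runtime-level gate: checked at tool dispatch, not in the prompt."""

    def __init__(self, declared: set[str], approved: set[str]):
        self.declared = declared
        self.approved = approved  # high-risk capabilities the operator OK'd

    def allow(self, capability: str) -> bool:
        if capability not in self.declared:
            return False  # a skill can never exceed its own declaration
        if capability in HIGH_RISK:
            return capability in self.approved  # needs operator approval
        return True

md = "---\ncapabilities: [network.fetch, wallet.send]\n---\n"
gate = ToolGate(parse_capabilities(md), approved=set())
print(gate.allow("network.fetch"))  # True: declared, low-risk
print(gate.allow("wallet.send"))    # False: declared but not approved
print(gate.allow("shell"))          # False: never declared
```

The key property is that the gate sits in the tool-dispatch path, so an injected or hostile skill body cannot talk its way past it.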
## Open questions
- How this interacts with the existing `allowed-tools` frontmatter convention. Probably consolidate into one capability schema rather than maintain both.
- Per-skill vs. per-publisher capability profiles. A skill from a verified publisher might inherit elevated defaults; an untrusted publisher's skill stays locked down regardless of frontmatter claims.
- Sandbox boundary: worker thread, separate process, or capability-token model. Worker thread is cheapest and good enough for "no shell, no wallet, restricted fs." Separate process is needed for hard isolation against memory-corruption skills (lower priority).
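One way to reconcile per-skill declarations with per-publisher profiles is to treat the publisher's trust tier as a ceiling that frontmatter can only narrow, never widen. A minimal sketch of that idea, with illustrative tier names and capability strings:

```python
# Hypothetical per-publisher ceilings: the effective capability set is the
# intersection of what the frontmatter claims and what the tier permits.
PUBLISHER_CEILING = {
    "verified": {"network.fetch", "fs.read", "fs.write_workspace", "shell"},
    "untrusted": {"fs.read"},  # locked down regardless of frontmatter claims
}

def effective_capabilities(declared: set[str], tier: str) -> set[str]:
    # Unknown tiers fall through to the empty set (deny by default).
    return declared & PUBLISHER_CEILING.get(tier, set())

print(effective_capabilities({"shell", "fs.read"}, "untrusted"))  # {'fs.read'}
print(effective_capabilities({"shell", "fs.read"}, "verified"))   # both allowed
```

This keeps the frontmatter as the skill's request and the publisher profile as the grant, so a hostile-but-truthful declaration from an untrusted source buys nothing.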
## Why now
Same trigger as the injection-scanning issue. Belt-and-braces: scan the content, then constrain what the content can do even if the scan misses something.