
Security: capability sandboxing for ingested P2P skills #21

@VGIL77

Description


Companion to the prompt-injection scanning issue. Even if a skill's content passes injection screening, the skill itself defines the agent's behavior surface when invoked, and the current code grants ambient capabilities to whatever is in the active skills directory. A trusted peer turned malicious could publish a skill that does exactly what its frontmatter says, where that description happens to be "exfiltrate .env to attacker.example.com" or "send wallet balance to known-bad address."

Threat model

A trusted publisher pushes a skill whose stated purpose is hostile but technically truthful. Without per-skill capability declarations enforced at runtime, ambient trust on the skill becomes ambient trust on every tool it can reach: shell, network, wallet, filesystem outside the skill's workspace.

Proposed defense

  • SKILL.md frontmatter declares required capabilities, e.g. capabilities: [shell, wallet.send, network.fetch]
  • Default profile for ingested-from-mesh skills is least-privilege: read-only filesystem, no network, no wallet, no shell
  • High-risk capabilities (wallet writes, shell, file writes outside the skill's workspace, arbitrary outbound network) require explicit operator approval per-skill, prompted at first invocation rather than at ingestion
  • Enforcement happens at the runtime level, not just the prompt level: the skill literally cannot call tools it was not granted, instead of relying on the LLM not to use them because the system prompt told it not to
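A minimal sketch of the proposed runtime gate, assuming a TypeScript agent runtime. Only shell, network.fetch and wallet.send come from the example frontmatter; the fs.* capability names, SkillGrant, Approver, parseCapabilities, authorize and scopeTools are hypothetical. The parser handles only the inline "capabilities: [a, b]" form and is not a YAML parser.

```typescript
// Hypothetical capability vocabulary. shell, network.fetch and wallet.send come
// from the example frontmatter; the fs.* names are illustrative assumptions.
const CAPABILITIES = [
  "fs.read", // read-only filesystem
  "fs.write.workspace", // writes inside the skill's own workspace
  "fs.write", // writes anywhere else
  "shell",
  "network.fetch", // arbitrary outbound network
  "wallet.send", // wallet writes
] as const;
type Capability = (typeof CAPABILITIES)[number];

const KNOWN = new Set<string>(CAPABILITIES);

// Least-privilege profile for mesh-ingested skills: read-only fs, nothing else.
const MESH_DEFAULT = new Set<Capability>(["fs.read"]);

// Capabilities that need operator approval before their first use.
const HIGH_RISK = new Set<Capability>(["fs.write", "shell", "network.fetch", "wallet.send"]);

interface SkillGrant {
  skill: string;
  declared: Set<Capability>; // from SKILL.md frontmatter
  approved: Set<Capability>; // operator approvals recorded so far
}

// Returns true if the operator approves `cap` for `skill`.
type Approver = (skill: string, cap: Capability) => boolean;
type Tool = (arg: string) => string;

// Reads the inline `capabilities: [a, b]` line from the frontmatter block.
// Unknown names are rejected rather than silently ignored.
function parseCapabilities(skillMd: string): Set<Capability> {
  const caps = new Set<Capability>();
  const fm = /^---\r?\n([\s\S]*?)\r?\n---/.exec(skillMd);
  if (!fm) return caps;
  const line = /^capabilities:\s*\[([^\]]*)\]\s*$/m.exec(fm[1]);
  if (!line) return caps;
  for (const raw of line[1].split(",")) {
    const name = raw.trim();
    if (name === "") continue;
    if (!KNOWN.has(name)) throw new Error(`unknown capability: ${name}`);
    caps.add(name as Capability);
  }
  return caps;
}

// Throws unless the skill may use `cap` right now.
function authorize(grant: SkillGrant, cap: Capability, ask: Approver): void {
  if (MESH_DEFAULT.has(cap)) return; // always allowed
  if (!grant.declared.has(cap)) {
    throw new Error(`${grant.skill}: capability "${cap}" not declared`);
  }
  if (!HIGH_RISK.has(cap) || grant.approved.has(cap)) return;
  // First invocation of a high-risk capability: ask now, not at ingestion.
  if (!ask(grant.skill, cap)) {
    throw new Error(`${grant.skill}: operator denied "${cap}"`);
  }
  grant.approved.add(cap);
}

// Wraps each tool so the check runs in the runtime, before the tool body,
// regardless of what the model was told in its prompt.
function scopeTools(
  grant: SkillGrant,
  tools: Partial<Record<Capability, Tool>>,
  ask: Approver,
): Partial<Record<Capability, Tool>> {
  const scoped: Partial<Record<Capability, Tool>> = {};
  for (const cap of Object.keys(tools) as Capability[]) {
    const tool = tools[cap];
    if (tool === undefined) continue;
    const run: Tool = tool;
    scoped[cap] = (arg: string) => {
      authorize(grant, cap, ask);
      return run(arg);
    };
  }
  return scoped;
}
```

Because the skill is only ever handed the wrappers from scopeTools, refusing shell or wallet.send does not depend on the model obeying its system prompt: the unwrapped tool functions are never reachable from the skill.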

Open questions

  • How this interacts with the existing allowed-tools frontmatter convention. Probably consolidate into one capability schema rather than maintain both.
  • Per-skill vs. per-publisher capability profile. A skill from a verified publisher might inherit elevated defaults; an untrusted publisher's skill stays locked down regardless of frontmatter claims.
  • Sandbox boundary: worker thread, separate process, or capability-token model. Worker thread is cheapest and good enough for "no shell, no wallet, restricted fs." Separate process is needed for hard isolation against memory-corruption skills (lower priority).
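One possible answer to the first two open questions, sketched in TypeScript: legacy allowed-tools entries are mapped onto the same capability names so the runtime checks a single schema, and the publisher's trust level decides whether frontmatter claims are honored at all. The capability names, the tool names in TOOL_TO_CAPABILITY, the two-level PublisherTrust type, the verified baseline and effectiveCapabilities are all hypothetical, and allowed-tools is assumed to be already parsed into a list.

```typescript
// Hypothetical capability names, mirroring the example frontmatter.
type Capability =
  | "fs.read"
  | "fs.write.workspace"
  | "fs.write"
  | "shell"
  | "network.fetch"
  | "wallet.send";

type PublisherTrust = "verified" | "untrusted";

// Both conventions as they might look once the frontmatter has been parsed.
interface Frontmatter {
  capabilities?: Capability[];
  "allowed-tools"?: string[];
}

// Hypothetical mapping from legacy allowed-tools entries to capabilities.
// A Map avoids accidental hits on Object.prototype keys like "toString".
const TOOL_TO_CAPABILITY = new Map<string, Capability>([
  ["read_file", "fs.read"],
  ["write_file", "fs.write.workspace"],
  ["bash", "shell"],
  ["fetch", "network.fetch"],
]);

// Per-publisher defaults: a verified publisher gets workspace writes without
// declaring them; an untrusted one gets the mesh least-privilege profile.
const BASELINE: Record<PublisherTrust, Capability[]> = {
  verified: ["fs.read", "fs.write.workspace"],
  untrusted: ["fs.read"],
};

function effectiveCapabilities(fm: Frontmatter, trust: PublisherTrust): Set<Capability> {
  const result = new Set<Capability>(BASELINE[trust]);
  // Untrusted publishers stay locked down whatever the frontmatter claims.
  if (trust === "untrusted") return result;
  for (const cap of fm.capabilities ?? []) result.add(cap);
  for (const tool of fm["allowed-tools"] ?? []) {
    const cap = TOOL_TO_CAPABILITY.get(tool);
    // Refuse unmapped tools instead of silently granting or dropping them.
    if (cap === undefined) {
      throw new Error(`allowed-tools entry "${tool}" has no capability mapping`);
    }
    result.add(cap);
  }
  return result;
}
```

The result is only the declared set. Under the proposal, high-risk entries in it would still need operator approval at first invocation, so a verified publisher's elevated defaults widen what a skill may request, not what runs unprompted.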

Why now

Same trigger as the injection-scanning issue. Belt-and-braces: scan the content, then constrain what the content can do even if the scan misses something.

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request), help wanted (Extra attention is needed), security (Security: threat models, hardening, defensive work)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
