
Security: prompt-injection scanning on inbound P2P skill content before LLM exposure #20

@VGIL77


The current P2P skill ingestion pipeline at src/agents/skills/ingest.ts defends against unauthorized injection (Ed25519 signature on the bytes, SHA-256 content hash, SKILL.md schema validation, quarantine-by-default, per-peer rate limiting). What it doesn't defend against: a signed-but-malicious skill from a compromised trusted peer, where the SKILL.md bytes themselves contain adversarial instructions targeting the LLM that later reads the skill.

Threat model

An attacker with control of a previously-trusted pubkey publishes a skill whose content embeds prompt-injection text: "ignore prior instructions," planted role markers, encoded exfil instructions, etc. Under auto policy with that pubkey on the trust list, current code accepts it directly into the skills directory (ingest.ts:139-176). Next time the agent loads the skills snapshot for inference, the payload reaches the LLM with no further gating.

Signing and hashing don't help here. The transport layer can't see what the content layer says.

Proposed defense

Pre-LLM injection scan, run before bumpSkillsSnapshotVersion makes the skill visible to the agent:

  • Rule-based heuristics for known patterns (jailbreak strings, unusual role markers, base64/hex-encoded payloads inside descriptions, suspicious YAML frontmatter)
  • Optional cheap classifier pass for borderline cases
  • Suspicious content gets force-quarantined regardless of pubkey trust level; operator must explicitly approve out of skills-incoming/
  • Existing validateSkillContent at ingest.ts:303 is the natural place to layer this in as a tiered validator
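
A minimal sketch of how the rule-based tier and the force-quarantine decision might fit together. Every name here (`scanSkillContent`, `decideDisposition`, `RULES`, the `Finding` shape) is hypothetical and not taken from ingest.ts, and the patterns are illustrative, not a vetted rule set. The frontmatter check from the list above is omitted for brevity.

```typescript
// Hypothetical sketch: none of these names or types exist in ingest.ts.
type Severity = "suspicious" | "borderline";

interface Finding {
  rule: string;
  severity: Severity;
  excerpt: string; // first characters of the matched text, for the operator
}

interface Rule {
  name: string;
  severity: Severity;
  pattern: RegExp; // no `g` flag, so exec() keeps no state between calls
}

// Illustrative patterns only; a real set would be larger and tuned for false positives.
const RULES: Rule[] = [
  {
    name: "override-instructions",
    severity: "suspicious",
    pattern:
      /\b(ignore|disregard|forget)\b[^.\n]{0,40}\b(prior|previous|above|earlier)\b[^.\n]{0,20}\b(instructions?|prompts?|rules?)\b/i,
  },
  {
    name: "planted-role-marker",
    severity: "suspicious",
    pattern: /^\s*(system|assistant|user)\s*:|<\|?(im_start|im_end|system)\|?>/im,
  },
  // Encoded blobs are only "borderline": they may be legitimate data.
  { name: "long-base64-run", severity: "borderline", pattern: /[A-Za-z0-9+/]{80,}={0,2}/ },
  // 40+ byte pairs, so a bare 64-character SHA-256 hex digest is not flagged.
  { name: "long-hex-run", severity: "borderline", pattern: /\b(?:[0-9a-f]{2}){40,}\b/i },
];

// Run every rule once over the raw SKILL.md text and collect the matches.
function scanSkillContent(content: string): Finding[] {
  const findings: Finding[] = [];
  for (const rule of RULES) {
    const match = rule.pattern.exec(content);
    if (match) {
      findings.push({
        rule: rule.name,
        severity: rule.severity,
        excerpt: match[0].slice(0, 60),
      });
    }
  }
  return findings;
}

type Disposition = "accept" | "needs-classifier" | "force-quarantine";

// Deliberately takes no trust-level argument: a suspicious finding quarantines
// the skill no matter which pubkey signed it.
function decideDisposition(findings: Finding[]): Disposition {
  if (findings.some((f) => f.severity === "suspicious")) return "force-quarantine";
  if (findings.length > 0) return "needs-classifier";
  return "accept";
}
```

Under these assumptions, a caller would run `decideDisposition(scanSkillContent(skillMd))` after the signature, hash, and schema checks pass and before the snapshot version bump. The "needs-classifier" result would hand off to the optional classifier pass, or fall back to quarantine when no classifier is configured.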

Out of scope here

Capability sandboxing on what skills can do once active is a separate problem, filed as its own issue. This one is purely about catching adversarial content before the LLM sees it.

Why now

Surfaced from public scrutiny of the architecture (an r/LocalLLaMA reader called out this exact gap). The transport-layer defenses are real, but the content-layer defense is the right next thing to ship.


Labels: enhancement, help wanted, security
