[Epic]: Support Agent Skills via On-Demand SKILL.md Discovery #91

@sami-marreed

Description

Feature Request

Add support for Agent Skills — a dynamic, on-demand skill discovery and injection system powered by SKILL.md files. Instead of stuffing every possible instruction into the agent's initial system prompt (which wastes tokens and can confuse the model), cuga would use a lightweight skill registry pattern where full instructions are loaded only when needed.

The first reference skill to validate the system against is the Anthropic PPTX skill, which exercises the full breadth of what skill execution requires: Python scripts, Node.js tooling (pptxgenjs), and shell commands (LibreOffice, pdftoppm) — making it an ideal integration test for the entire skill pipeline.


Motivation / Problem

As cuga handles increasingly complex and varied user tasks, the system prompt grows unwieldy. Including every possible workflow guide upfront:

  • Wastes context window tokens on irrelevant instructions
  • Increases latency and cost per request
  • Can confuse the agent with conflicting or noisy guidance

A skill system solves this by keeping the system prompt lean and loading task-specific instructions on-demand.


Use Case

As a developer using cuga for diverse tasks (git releases, deployments, presentations, code reviews, etc.), I want cuga to automatically discover relevant skill files in my project (.cuga/skills/) or globally (~/.config/cuga/skills/), surface them to the agent only when relevant, and inject full instructions only when the agent actually needs them — keeping every interaction fast, focused, and token-efficient.


Proposed Solution

A complete end-to-end skills flow:

Step 1: Skill Discovery

Before each session, cuga scans project-local (.cuga/skills/) and global (~/.config/cuga/skills/) directories for SKILL.md files. It reads the YAML frontmatter (name, description) of each file and injects a lightweight <available_skills> list into the agent's system prompt.
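A minimal Python sketch of Step 1. It assumes each skill lives in its own subdirectory (`<root>/<skill>/SKILL.md`, as in the `.cuga/skills/pptx/SKILL.md` example) and that project-local skills take precedence over global ones when two share a name. The hand-rolled frontmatter reader and the one-line-per-skill format inside `<available_skills>` are illustrative assumptions. A real implementation would more likely use a YAML library.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class SkillMeta:
    name: str
    description: str
    path: Path  # path to SKILL.md; its body is only read if the skill is loaded


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse top-level `key: value` pairs from a leading '---' block.

    Deliberately minimal: indented (nested) keys and section headers such as
    `permissions:` are skipped, and surrounding quotes are stripped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the Markdown body is not parsed here
        if not line or line[0].isspace() or ":" not in line:
            continue  # blank, nested, or malformed line
        key, _, value = line.partition(":")
        value = value.strip().strip('"').strip("'")
        if value:  # skip section headers that only introduce nested keys
            meta[key.strip()] = value
    return meta


def discover_skills(roots: List[Path]) -> Dict[str, SkillMeta]:
    """Scan each root for <skill>/SKILL.md; earlier roots win on name clashes."""
    registry: Dict[str, SkillMeta] = {}
    for root in roots:
        if not root.is_dir():
            continue
        for skill_md in sorted(root.glob("*/SKILL.md")):
            meta = read_frontmatter(skill_md.read_text(encoding="utf-8"))
            name, desc = meta.get("name"), meta.get("description")
            if name and desc and name not in registry:
                registry[name] = SkillMeta(name, desc, skill_md)
    return registry


def render_available_skills(registry: Dict[str, SkillMeta]) -> str:
    """Build the lightweight list injected into the system prompt."""
    lines = ["<available_skills>"]
    lines += [f"- {m.name}: {m.description}" for m in registry.values()]
    lines.append("</available_skills>")
    return "\n".join(lines)
```

Callers would pass roots in priority order, e.g. `[Path(".cuga/skills"), Path.home() / ".config/cuga/skills"]`, so only each skill's name and description reach the system prompt.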

Step 2: Agent Decision & Tool Call

The agent analyzes the user's prompt and, if it matches a skill description, emits a tool call to load the full skill:

skill({ name: "pptx" })
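For the agent to emit that call, cuga has to expose a `skill` tool. The sketch below shows one possible tool definition and argument check. It assumes a JSON-Schema `parameters` object, which is the general shape function-calling APIs use, though each provider wraps it differently. It also assumes the arguments arrive as a JSON string. Restricting `name` to an `enum` of discovered skills is a design suggestion, not something this issue specifies.

```python
import json
from typing import Any, Dict, Iterable


def skill_tool_definition(skill_names: Iterable[str]) -> Dict[str, Any]:
    """JSON-Schema-style definition for a hypothetical `skill` tool."""
    return {
        "name": "skill",
        "description": "Load the full instructions for one of the <available_skills>.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string", "enum": sorted(skill_names)}},
            "required": ["name"],
            "additionalProperties": False,
        },
    }


def parse_skill_call(raw_arguments: str, known: Iterable[str]) -> str:
    """Validate the model's tool-call arguments and return the skill name.

    Raises ValueError with a readable message that the agent loop can send
    back to the model as the tool result instead of crashing.
    """
    known_set = set(known)
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"skill: arguments are not valid JSON ({exc.msg})") from exc
    name = args.get("name") if isinstance(args, dict) else None
    if not isinstance(name, str) or name not in known_set:
        raise ValueError(f"skill: unknown skill {name!r}; choose one of {sorted(known_set)}")
    return name
```

Returning the error text to the model rather than aborting lets it correct a misspelled skill name on its next turn.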

Step 3: Permission Gate

Each skill can define a permission policy (e.g., ask, auto, deny). If set to ask, cuga pauses and requests human approval before loading or executing. This is the primary safety layer for sensitive skills.
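A sketch of the Step 3 gate. It assumes the three policy values named above and a hypothetical `ask_human` callback that cuga would wire to its CLI or UI. Unknown values fail closed and are treated as a denial, which seems the safer default for a safety layer.

```python
from enum import Enum
from typing import Callable


class Permission(Enum):
    ASK = "ask"
    AUTO = "auto"
    DENY = "deny"


class SkillDenied(Exception):
    """Raised when a skill may not be loaded."""


def check_skill_permission(
    skill_name: str,
    policy: str,
    ask_human: Callable[[str], bool],
) -> None:
    """Enforce a skill's policy before its body is loaded.

    `ask_human` is a hypothetical hook (CLI prompt, UI dialog, ...) that
    returns True if the user approves.
    """
    try:
        perm = Permission(policy)
    except ValueError:
        # Fail closed on typos or unsupported values.
        raise SkillDenied(f"{skill_name}: unknown permission policy {policy!r}") from None
    if perm is Permission.DENY:
        raise SkillDenied(f"{skill_name}: denied by policy")
    if perm is Permission.ASK and not ask_human(f"Load skill '{skill_name}'?"):
        raise SkillDenied(f"{skill_name}: rejected by user")
    # AUTO, or ASK with approval: the caller may proceed to Step 4.
```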

Step 4: Context Injection

If permitted, cuga reads the full Markdown body of the matched SKILL.md and injects it into the agent's context. This file contains the heavy-lifting instructions: specific rules, workflow constraints, architecture guidelines, or exact shell commands.
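A sketch of Step 4, assuming the agent loop accepts a tool-result message as a plain dict. The `<skill>` wrapper tag and the message fields are illustrative choices, not an existing cuga format. So is including the skill directory, which lets the agent resolve relative paths such as `scripts/thumbnail.py`.

```python
from pathlib import Path
from typing import Dict


def skill_body(text: str) -> str:
    """Return the Markdown body of a SKILL.md, without its YAML frontmatter."""
    lines = text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "".join(lines[i + 1:]).strip()
    return text.strip()  # no closed frontmatter: treat the whole file as body


def build_skill_message(name: str, skill_md: Path) -> Dict[str, str]:
    """Wrap the skill body as a tool result to append to the agent's context."""
    body = skill_body(skill_md.read_text(encoding="utf-8"))
    return {
        "role": "tool",
        "name": "skill",
        "content": (
            f'<skill name="{name}" dir="{skill_md.parent}">\n'
            f"{body}\n"
            "</skill>"
        ),
    }
```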

Step 5: Execution

The agent executes the skill using its standard tools (bash, write, edit). These tools remain governed by their own permission settings (e.g., bash with ask mode still requires approval per command).
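A sketch of the per-command rule in Step 5. Loading a skill does not pre-approve the commands it suggests, so every bash call goes back through the bash tool's own mode. `run_bash`, the mode strings, and the `approve` callback, which shows the exact command to the user, are hypothetical names for illustration.

```python
import subprocess
from typing import Callable


def run_bash(
    command: str,
    mode: str,
    approve: Callable[[str], bool],
    timeout: float = 120.0,
) -> subprocess.CompletedProcess:
    """Run one shell command under the bash tool's own permission mode."""
    if mode not in ("ask", "auto", "deny"):
        raise ValueError(f"unknown bash permission mode {mode!r}")
    # Approval is asked per command, independent of the skill's own policy.
    if mode == "deny" or (mode == "ask" and not approve(command)):
        raise PermissionError(f"bash command not approved: {command!r}")
    return subprocess.run(
        ["bash", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # the agent inspects returncode/stderr itself
    )
```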

The PPTX skill is a great first test because it requires all three execution modalities:

  • Python: python -m markitdown presentation.pptx, python scripts/thumbnail.py
  • Node.js / npm: npm install -g pptxgenjs for creating slides from scratch
  • Shell tools: soffice --headless --convert-to pdf, pdftoppm for rendering
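The PPTX skill spans three runtimes, so a preflight check before execution can report missing dependencies up front instead of failing partway through a multi-step workflow. A minimal sketch assuming the executable names below; some systems ship only `python3`. `PPTX_REQUIREMENTS` and `missing_tools` are hypothetical names.

```python
import shutil
from typing import Dict, List

# Executables invoked by the PPTX skill's commands, grouped by modality.
PPTX_REQUIREMENTS: Dict[str, List[str]] = {
    "python": ["python"],
    "node": ["node", "npm"],
    "shell": ["soffice", "pdftoppm"],
}


def missing_tools(requirements: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {modality: [missing executables]} using a PATH lookup only."""
    missing: Dict[str, List[str]] = {}
    for modality, exes in requirements.items():
        absent = [exe for exe in exes if shutil.which(exe) is None]
        if absent:
            missing[modality] = absent
    return missing
```

This only checks `PATH`. It does not confirm that Python packages such as `markitdown` or the global `pptxgenjs` npm package are installed.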

Step 6: Sandbox Integration

Skill execution should respect the configured sandbox environment:

  • Containerized sandboxes (e.g., rootless Podman/Docker via an entersh-style script): skill bash commands run inside the container, isolating them from the host OS
  • Plugin sandboxes (e.g., Daytona, DevContainers): skills execute within the managed remote environment
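For the containerized case, one way to honor the sandbox is to rewrite each command's argv before running it. The sketch assumes an already-running Podman or Docker container and a hypothetical `/workspace` mount point. Both `podman exec` and `docker exec` accept `-w` to set the working directory. Plugin sandboxes such as Daytona or DevContainers would need their own adapters, which are not sketched here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sandbox:
    kind: str                        # "host", "podman", or "docker" (assumed config values)
    container: Optional[str] = None  # name or ID of an already-running container
    workdir: str = "/workspace"      # hypothetical project mount point in the container


def wrap_command(command: str, sandbox: Sandbox) -> List[str]:
    """Build the argv that runs a skill's bash command in the configured sandbox."""
    if sandbox.kind == "host":
        return ["bash", "-c", command]
    if sandbox.kind in ("podman", "docker"):
        if not sandbox.container:
            raise ValueError(f"{sandbox.kind} sandbox needs a container name")
        # The command runs inside the container, isolated from the host OS.
        return [sandbox.kind, "exec", "-w", sandbox.workdir,
                sandbox.container, "bash", "-c", command]
    raise ValueError(f"unsupported sandbox kind {sandbox.kind!r}")
```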

Alternatives Considered

  • Static system prompt expansion: Add all skill instructions upfront. Works but doesn't scale — wastes tokens and degrades quality as the number of skills grows.
  • Manual user invocation: Require users to explicitly load a skill file. Removes the "automatic discovery" value and adds friction.
  • LLM tool use without SKILL.md: Use generic tool calls with inline descriptions. Loses the ability for users to customize and extend skills via plain Markdown files in their own repo.

Priority

High - Important for my workflow

Implementation Complexity (if known)

Complex - Significant development effort


Additional Context

This pattern is directly inspired by the Cursor Agent Skills system, which uses a similar SKILL.md + on-demand injection model. The key differentiator for cuga is the permission gate (Step 3) and first-class sandbox integration (Step 6), making it safer for agentic use in production and CI environments.

The Anthropic PPTX skill is the reference implementation to validate against. It is representative of real-world skill complexity: multi-step QA loops, subagent delegation, cross-tool dependencies (Python + Node + system binaries), and a rich set of design guidelines injected as context.

Skill file example (.cuga/skills/pptx/SKILL.md frontmatter):

---
name: pptx
description: "Use this skill any time a .pptx file is involved — creating, reading, editing, or converting presentations."
permissions:
  bash: ask
---
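Users author these files by hand, so cuga could lint the frontmatter at discovery time. A sketch assuming the YAML has already been parsed into a dict. The rules are suggestions drawn from this issue, not an existing schema: `name` and `description` are required, `name` matches its directory, and `permissions` values are limited to `ask`, `auto`, and `deny`.

```python
from typing import Any, Dict, List

VALID_MODES = {"ask", "auto", "deny"}


def validate_skill_frontmatter(meta: Dict[str, Any], dir_name: str) -> List[str]:
    """Return a list of problems with parsed SKILL.md frontmatter (empty = OK)."""
    problems: List[str] = []
    name = meta.get("name")
    if not isinstance(name, str) or not name:
        problems.append("`name` is required")
    elif name != dir_name:
        problems.append(f"`name` {name!r} does not match directory {dir_name!r}")
    desc = meta.get("description")
    if not isinstance(desc, str) or not desc.strip():
        # The description is all the agent sees before loading the skill.
        problems.append("`description` is required")
    perms = meta.get("permissions", {})
    if not isinstance(perms, dict):
        problems.append("`permissions` must map tool names to modes")
    else:
        for tool, mode in perms.items():
            if not isinstance(mode, str) or mode not in VALID_MODES:
                problems.append(f"permissions.{tool}: {mode!r} is not one of {sorted(VALID_MODES)}")
    return problems
```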

Checklist

  • I have searched existing issues and feature requests to ensure this is not a duplicate
  • I have provided a clear use case and motivation for this feature
  • I am willing to help test this feature once implemented
  • I am interested in contributing to the implementation of this feature

Metadata

Type: No type
Projects: Status: In progress
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
