Feature Request
Add support for Agent Skills — a dynamic, on-demand skill discovery and injection system powered by SKILL.md files. Instead of stuffing every possible instruction into the agent's initial system prompt (which wastes tokens and can confuse the model), cuga would use a lightweight skill registry pattern where full instructions are loaded only when needed.
The first reference skill to validate the system against is the Anthropic PPTX skill, which exercises the full breadth of what skill execution requires: Python scripts, Node.js tooling (pptxgenjs), and shell commands (LibreOffice, pdftoppm) — making it an ideal integration test for the entire skill pipeline.
Motivation / Problem
As cuga handles increasingly complex and varied user tasks, the system prompt grows unwieldy. Including every possible workflow guide upfront:
- Wastes context window tokens on irrelevant instructions
- Increases latency and cost per request
- Can confuse the agent with conflicting or noisy guidance
A skill system solves this by keeping the system prompt lean and loading task-specific instructions on-demand.
Use Case
As a developer using cuga for diverse tasks (git releases, deployments, presentations, code reviews, etc.), I want cuga to automatically discover relevant skill files in my project (.cuga/skills/) or globally (~/.config/cuga/skills/), surface them to the agent only when relevant, and inject full instructions only when the agent actually needs them — keeping every interaction fast, focused, and token-efficient.
Proposed Solution
A complete end-to-end skills flow:
Step 1: Skill Discovery
Before each session, cuga scans project-local (.cuga/skills/) and global (~/.config/cuga/skills/) directories for SKILL.md files. It reads the YAML frontmatter (name, description) of each file and injects a lightweight <available_skills> list into the agent's system prompt.
Step 2: Agent Decision & Tool Call
The agent analyzes the user's prompt and, if it matches a skill description, emits a tool call to load the full skill:
Step 3: Permission Gate
Each skill can define a permission policy (e.g., ask, auto, deny). If set to ask, cuga pauses and requests human approval before loading or executing. This is the primary safety layer for sensitive skills.
Step 4: Context Injection
If permitted, cuga reads the full Markdown body of the matched SKILL.md and injects it into the agent's context. This file contains the heavy-lifting instructions: specific rules, workflow constraints, architecture guidelines, or exact shell commands.
Step 5: Execution
The agent executes the skill using its standard tools (bash, write, edit). These tools remain governed by their own permission settings (e.g., bash with ask mode still requires approval per command).
The PPTX skill is a great first test because it requires all three execution modalities:
- Python:
python -m markitdown presentation.pptx, python scripts/thumbnail.py
- Node.js / npm:
npm install -g pptxgenjs for creating slides from scratch
- Shell tools:
soffice --headless --convert-to pdf, pdftoppm for rendering
Step 6: Sandbox Integration
Skill execution should respect the configured sandbox environment:
- Containerized sandboxes (e.g., rootless Podman/Docker via an
entersh-style script): skill bash commands run inside the container, isolating them from the host OS
- Plugin sandboxes (e.g., Daytona, DevContainers): skills execute within the managed remote environment
Alternatives Considered
- Static system prompt expansion: Add all skill instructions upfront. Works but doesn't scale — wastes tokens and degrades quality as the number of skills grows.
- Manual user invocation: Require users to explicitly load a skill file. Removes the "automatic discovery" value and adds friction.
- LLM tool use without SKILL.md: Use generic tool calls with inline descriptions. Loses the ability for users to customize and extend skills via plain Markdown files in their own repo.
Priority
High - Important for my workflow
Implementation Complexity (if known)
Complex - Significant development effort
Additional Context
This pattern is directly inspired by the Cursor Agent Skills system, which uses a similar SKILL.md + on-demand injection model. The key differentiator for cuga is the permission gate (Step 3) and first-class sandbox integration (Step 6), making it safer for agentic use in production and CI environments.
The Anthropic PPTX skill is the reference implementation to validate against. It is representative of real-world skill complexity: multi-step QA loops, subagent delegation, cross-tool dependencies (Python + Node + system binaries), and a rich set of design guidelines injected as context.
Skill file example (.cuga/skills/pptx/SKILL.md frontmatter):
---
name: pptx
description: "Use this skill any time a .pptx file is involved — creating, reading, editing, or converting presentations."
permissions:
bash: ask
---
Checklist
Feature Request
Add support for Agent Skills — a dynamic, on-demand skill discovery and injection system powered by
SKILL.mdfiles. Instead of stuffing every possible instruction into the agent's initial system prompt (which wastes tokens and can confuse the model), cuga would use a lightweight skill registry pattern where full instructions are loaded only when needed.The first reference skill to validate the system against is the Anthropic PPTX skill, which exercises the full breadth of what skill execution requires: Python scripts, Node.js tooling (
pptxgenjs), and shell commands (LibreOffice,pdftoppm) — making it an ideal integration test for the entire skill pipeline.Motivation / Problem
As cuga handles increasingly complex and varied user tasks, the system prompt grows unwieldy. Including every possible workflow guide upfront:
A skill system solves this by keeping the system prompt lean and loading task-specific instructions on-demand.
Use Case
As a developer using cuga for diverse tasks (git releases, deployments, presentations, code reviews, etc.), I want cuga to automatically discover relevant skill files in my project (
.cuga/skills/) or globally (~/.config/cuga/skills/), surface them to the agent only when relevant, and inject full instructions only when the agent actually needs them — keeping every interaction fast, focused, and token-efficient.Proposed Solution
A complete end-to-end skills flow:
Step 1: Skill Discovery
Before each session, cuga scans project-local (
.cuga/skills/) and global (~/.config/cuga/skills/) directories forSKILL.mdfiles. It reads the YAML frontmatter (name,description) of each file and injects a lightweight<available_skills>list into the agent's system prompt.Step 2: Agent Decision & Tool Call
The agent analyzes the user's prompt and, if it matches a skill description, emits a tool call to load the full skill:
Step 3: Permission Gate
Each skill can define a permission policy (e.g.,
ask,auto,deny). If set toask, cuga pauses and requests human approval before loading or executing. This is the primary safety layer for sensitive skills.Step 4: Context Injection
If permitted, cuga reads the full Markdown body of the matched
SKILL.mdand injects it into the agent's context. This file contains the heavy-lifting instructions: specific rules, workflow constraints, architecture guidelines, or exact shell commands.Step 5: Execution
The agent executes the skill using its standard tools (
bash,write,edit). These tools remain governed by their own permission settings (e.g.,bashwithaskmode still requires approval per command).The PPTX skill is a great first test because it requires all three execution modalities:
python -m markitdown presentation.pptx,python scripts/thumbnail.pynpm install -g pptxgenjsfor creating slides from scratchsoffice --headless --convert-to pdf,pdftoppmfor renderingStep 6: Sandbox Integration
Skill execution should respect the configured sandbox environment:
entersh-style script): skill bash commands run inside the container, isolating them from the host OSAlternatives Considered
Priority
High - Important for my workflow
Implementation Complexity (if known)
Complex - Significant development effort
Additional Context
This pattern is directly inspired by the Cursor Agent Skills system, which uses a similar
SKILL.md+ on-demand injection model. The key differentiator for cuga is the permission gate (Step 3) and first-class sandbox integration (Step 6), making it safer for agentic use in production and CI environments.The Anthropic PPTX skill is the reference implementation to validate against. It is representative of real-world skill complexity: multi-step QA loops, subagent delegation, cross-tool dependencies (Python + Node + system binaries), and a rich set of design guidelines injected as context.
Skill file example (
.cuga/skills/pptx/SKILL.mdfrontmatter):Checklist