[Feedback]: llmstxt-to-skill generates skeletal skills requiring hours of manual work #5

@pgesiak

Feedback Category

Developer Experience

Summary

The llmstxt-to-skill plugin generates skeletal skills that need 2-3 hours of manual work to become production-ready. It should generate real content, update scripts, and maintenance tooling automatically.

Current State

What's Generated:

skills/openclaw/
├── SKILL.md              # Lists 241 references, no content
└── references/
    ├── llms.txt          # Manifest
    └── *.md              # 241 downloaded docs

SKILL.md example:

---
name: openclaw
description: OpenClaw documentation and reference. Use when asking about this documentation.
---

# OpenClaw

This skill provides access to OpenClaw documentation with 241 reference documents.

## Reference Documents
- [auth-monitoring](references/auth-monitoring.md): Auth Monitoring
- [cron-jobs](references/cron-jobs.md): Cron Jobs
[... 239 more lines ...]

Problems

Can't answer questions - No actual content, just file lists
Generic triggers - "OpenClaw documentation" won't trigger on real queries
No updates - No mechanism to sync with upstream
Wrong naming - Should be "openclaw-docs"
No tooling - No scripts, validation, or maintenance commands

User Impact

After running /llmstxt-to-skill https://docs.openclaw.ai/llms.txt, I spent 2-3 hours on:

  1. Extracting key content from 241 docs → SKILL.md
  2. Writing update scripts from scratch (9.2KB)
  3. Building index generators (1.5KB)
  4. Adding frontmatter to track staleness
  5. Creating protection logic for SKILL-* files
  6. Documenting update process

All of this should be automatic.


8 Required Improvements

1. Generate Actual Content (Not Lists)

Current:

## Reference Documents
- [file1.md](references/file1.md): Title 1
- [file2.md](references/file2.md): Title 2

Should Generate:

## Installation (macOS)

Three methods:

1. **Curl (Recommended):**
   ```bash
   curl -fsSL https://openclaw.ai/install.sh | bash
   openclaw onboard --install-daemon
   ```
2. **npm:**
   ```bash
   npm install -g openclaw@latest
   ```
3. **From source:**
   ```bash
   git clone && pnpm install && pnpm build
   ```

For detailed troubleshooting: SKILL-INSTALLATION.md

## Core Concepts

  • Gateway: WebSocket server managing channels (WhatsApp, Telegram, Discord)
  • Agent: Isolated AI instance with workspace and sessions
  • Workspace: ~/.openclaw/workspace with AGENTS.md, SOUL.md, memory/
  • Skills: Extensible tools loaded from workspace/skills/

[... 800 more words of useful content ...]


Target: 1,200-2,000 words of actionable content extracted from key docs.

Implementation approach:
```typescript
async function generateSkillContent(llmstxt: ParsedLlmsTxt): Promise<string> {
  // 1. Identify key docs
  const keyDocs = {
    installation: findDocs(llmstxt, ['setup', 'install', 'getting-started']),
    architecture: findDocs(llmstxt, ['architecture', 'overview', 'concepts']),
    configuration: findDocs(llmstxt, ['config', 'configuration']),
  };

  // 2. Fetch and summarize
  const sections = await Promise.all([
    summarizeInstallation(keyDocs.installation),
    summarizeArchitecture(keyDocs.architecture),
    summarizeConfiguration(keyDocs.configuration),
  ]);

  // 3. Build SKILL.md with content
  return buildSkillMd({
    quickStart: sections[0],
    coreConcepts: sections[1],
    configuration: sections[2],
    references: organizeReferences(llmstxt),
  });
}
```
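
The `findDocs` helper used above is not defined in this report. A minimal sketch of what it could look like, assuming `ParsedLlmsTxt` is just a list of `{ title, url }` entries taken from Markdown link lines in llms.txt, and that a doc "matches" when its title or URL contains a keyword (the types, `parseLlmsTxt`, and the matching rule are all assumptions for illustration):

```typescript
// Hypothetical shapes: the report does not define ParsedLlmsTxt.
interface LlmsTxtEntry {
  title: string;
  url: string;
}

interface ParsedLlmsTxt {
  entries: LlmsTxtEntry[];
}

// Parse "- [Title](https://...)" lines; every other line is ignored.
function parseLlmsTxt(text: string): ParsedLlmsTxt {
  const linkLine = /^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)/;
  const entries: LlmsTxtEntry[] = [];
  for (const line of text.split("\n")) {
    const match = linkLine.exec(line);
    if (match) {
      entries.push({ title: match[1], url: match[2] });
    }
  }
  return { entries };
}

// Return entries whose title or URL contains any keyword (case-insensitive).
function findDocs(llmstxt: ParsedLlmsTxt, keywords: string[]): LlmsTxtEntry[] {
  const needles = keywords.map((keyword) => keyword.toLowerCase());
  return llmstxt.entries.filter((entry) => {
    const haystack = `${entry.title} ${entry.url}`.toLowerCase();
    return needles.some((needle) => haystack.includes(needle));
  });
}
```

Substring matching is crude (for example, "config" would also match "reconfigure"), so a real implementation might weight title matches or use the section headings in llms.txt instead.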

2. Concrete Trigger Terms (Not Meta-Descriptions)

Current:

description: OpenClaw documentation and reference. Use when asking about this documentation.

Should Be:

description: OpenClaw personal assistant gateway WebSocket architecture agent runtime workspace configuration memory system skills loading AGENTS.md SOUL.md USER.md IDENTITY.md BOOT.md session management channel routing macOS setup permissions TCC iMessage WhatsApp Telegram Discord Signal Slack CLI commands hooks system browser automation vector search pairing authentication

Rule: Extract concrete terms from doc titles, skip meta-words like "documentation", "reference", "guide".

function generateTriggerDescription(llmstxt: ParsedLlmsTxt): string {
  const terms = llmstxt.entries
    .flatMap(entry => extractKeywords(entry.title))
    .filter(term => !isMetaWord(term)); // Skip "documentation", "guide", etc.

  return terms.slice(0, 80).join(' ');
}

function isMetaWord(word: string): boolean {
  const meta = ['documentation', 'reference', 'guide', 'docs', 'manual'];
  return meta.includes(word.toLowerCase());
}
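
`extractKeywords` is left undefined above. A self-contained sketch of the same rule, assuming keywords are simply the word-like tokens of each title and that repeated terms should appear only once (the tokenizer and the de-duplication step are my assumptions, not part of the plugin):

```typescript
const META_WORDS = new Set(["documentation", "reference", "guide", "docs", "manual"]);

// Split a title into word-like tokens; dots are kept so "AGENTS.md" stays whole.
function extractKeywords(title: string): string[] {
  return title
    .split(/[^A-Za-z0-9.]+/)
    .map((token) => token.replace(/^\.+|\.+$/g, "")) // trim stray leading/trailing dots
    .filter((token) => token.length > 1);
}

// Collect unique, non-meta terms in first-seen order, capped at maxTerms.
function buildTriggerDescription(titles: string[], maxTerms = 80): string {
  const seen = new Set<string>();
  const terms: string[] = [];
  for (const title of titles) {
    for (const term of extractKeywords(title)) {
      const key = term.toLowerCase();
      if (META_WORDS.has(key) || seen.has(key)) continue;
      seen.add(key);
      terms.push(term);
    }
  }
  return terms.slice(0, maxTerms).join(" ");
}
```

Without de-duplication, a site with many "Gateway ..." pages would spend most of the 80-term budget on the same word.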

3. Auto-Generate Update Scripts

Should Create: scripts/update-docs.sh

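#!/bin/bash
# Auto-generated by llmstxt-to-skill
# Source: https://docs.openclaw.ai/llms.txt
set -euo pipefail

REFS_DIR="references"
LLMS_URL="https://docs.openclaw.ai/llms.txt"
NEW_MANIFEST="$(mktemp)"
trap 'rm -f "$NEW_MANIFEST"' EXIT

# Assumed helper (not defined in the original report): save the doc with
# frontmatter recording its source URL and fetch time, to track staleness.
download_with_metadata() {
    local url="$1" dest="$2"
    {
        printf -- '---\nsource: %s\nfetched: %s\n---\n\n' "$url" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        curl -fsSL "$url"
    } > "$dest"
}

# Download fresh manifest
curl -fsSL "$LLMS_URL" -o "$NEW_MANIFEST"

# Compare manifests
if diff -q "$REFS_DIR/llms.txt" "$NEW_MANIFEST" > /dev/null; then
    echo "✓ No changes"
    exit 0
fi

# Process added entries (lines present only in the new manifest)
link_re='\((https://[^)]+)\)'
comm -13 <(sort "$REFS_DIR/llms.txt") <(sort "$NEW_MANIFEST") | \
while IFS= read -r line; do
    # Skip lines that contain no link
    [[ "$line" =~ $link_re ]] || continue
    url="${BASH_REMATCH[1]}"
    filename=$(basename "$url")

    # Skip SKILL-* files (never overwrite)
    [[ "$filename" =~ ^SKILL- ]] && continue

    # Download with metadata
    download_with_metadata "$url" "$REFS_DIR/$filename"
    echo "  ✓ Downloaded: $filename"
done

# Update manifest
mv "$NEW_MANIFEST" "$REFS_DIR/llms.txt"
echo "✓ Updated manifest"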

Critical: Protect SKILL-* files from overwrites.
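
If the plugin computed the update plan itself rather than shelling out, the manifest comparison might look like the sketch below. It assumes manifest entries are Markdown links and that a file is protected when its basename starts with `SKILL-`. The `UpdatePlan` shape and function names are hypothetical, and the `removed` list goes beyond what the script above handles.

```typescript
interface UpdatePlan {
  toDownload: string[]; // new URLs that are safe to fetch
  skipped: string[];    // new URLs that would overwrite protected SKILL-* files
  removed: string[];    // URLs that disappeared from the manifest
}

// Collect every Markdown link target found in a manifest.
function extractUrls(manifest: string): Set<string> {
  const urls = new Set<string>();
  const linkTarget = /\]\((https?:\/\/[^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkTarget.exec(manifest)) !== null) {
    urls.add(match[1]);
  }
  return urls;
}

function isProtected(url: string): boolean {
  const fileName = url.split("/").pop() ?? "";
  return fileName.startsWith("SKILL-");
}

// Compare the old and new manifests and decide what to fetch, skip, or report.
function planUpdate(oldManifest: string, newManifest: string): UpdatePlan {
  const before = extractUrls(oldManifest);
  const after = extractUrls(newManifest);
  const added = Array.from(after).filter((url) => !before.has(url));
  return {
    toDownload: added.filter((url) => !isProtected(url)),
    skipped: added.filter((url) => isProtected(url)),
    removed: Array.from(before).filter((url) => !after.has(url)),
  };
}
```

Comparing URLs rather than whole lines means a retitled entry does not trigger a re-download; whether that is desirable depends on how titles are used in SKILL.md.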


4. Fix Naming (Add "-docs" Suffix)

Current: https://docs.openclaw.ai/llms.txt → skill name: openclaw

Should Be: https://docs.openclaw.ai/llms.txt → skill name: openclaw-docs

function generateSkillName(llmsTxtUrl: string): string {
  const url = new URL(llmsTxtUrl);
  const hostname = url.hostname;
  const pathname = url.pathname;

  const baseName = extractBaseName(hostname);

  // Detect documentation patterns
  const isDocsSite =
    hostname.startsWith('docs.') ||
    hostname.startsWith('developer.') ||
    hostname.includes('.readthedocs.io') ||
    hostname.includes('.github.io') ||
    pathname.includes('/docs/') ||
    pathname.includes('/documentation/');

  return isDocsSite ? `${baseName}-docs` : baseName;
}
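
`extractBaseName` is not shown above. One possible version, assuming the base name is the first hostname label after dropping a leading `docs`, `developer`, or `www` label (that rule is a guess on my part, and this sketch uses `endsWith` for the hosted-docs domains so the suffix has to be at the end of the hostname):

```typescript
const DOC_PREFIXES = ["docs", "developer", "www"];

// Hypothetical helper: drop a leading docs-style label, then keep the next label.
function extractBaseName(hostname: string): string {
  const labels = hostname.toLowerCase().split(".");
  if (labels.length > 2 && DOC_PREFIXES.includes(labels[0])) {
    labels.shift();
  }
  return labels[0];
}

function generateSkillName(llmsTxtUrl: string): string {
  const { hostname, pathname } = new URL(llmsTxtUrl);
  const isDocsSite =
    hostname.startsWith("docs.") ||
    hostname.startsWith("developer.") ||
    hostname.endsWith(".readthedocs.io") ||
    hostname.endsWith(".github.io") ||
    pathname.includes("/docs/") ||
    pathname.includes("/documentation/");
  const baseName = extractBaseName(hostname);
  return isDocsSite ? `${baseName}-docs` : baseName;
}
```

Under these assumptions `https://docs.openclaw.ai/llms.txt` maps to `openclaw-docs`, while `https://openclaw.ai/llms.txt` stays `openclaw`.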

5. Auto-Compact Large Skills

If SKILL.md >2,500 words, prompt:

⚠️  SKILL.md is 4,231 words (recommended: <2,500)
Auto-compact for better performance? (y/N)

If yes:

  • Extract sections >400 words into SKILL-*.md files
  • Replace with summaries + links
  • Result: SKILL.md = 1,200 words, extracted 4 files
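
A rough sketch of the extraction step, assuming sections are delimited by level-2 (`## `) headings and that extracted files are named `SKILL-<HEADING>.md`. Both are assumptions. Writing a real summary would need an LLM or a template, so this version only leaves a link behind.

```typescript
interface CompactionResult {
  skillMd: string;                   // compacted SKILL.md text
  extracted: Record<string, string>; // file name -> full section text
}

const countWords = (text: string): number =>
  text.split(/\s+/).filter((word) => word.length > 0).length;

function compactSkill(skillMd: string, maxSectionWords = 400): CompactionResult {
  // Split before each "## " heading; the first chunk is frontmatter and intro.
  // Note: a "## " line inside a code fence would also be treated as a heading.
  const chunks = skillMd.split(/^(?=## )/m);
  const extracted: Record<string, string> = {};

  const kept = chunks.map((chunk) => {
    const heading = /^## (.+)$/m.exec(chunk);
    if (!chunk.startsWith("## ") || !heading || countWords(chunk) <= maxSectionWords) {
      return chunk; // short sections and the intro stay inline
    }
    const slug = heading[1]
      .toUpperCase()
      .replace(/[^A-Z0-9]+/g, "-")
      .replace(/^-|-$/g, "");
    const fileName = `SKILL-${slug}.md`;
    extracted[fileName] = chunk;
    return `## ${heading[1]}\n\nSee [${fileName}](references/${fileName}).\n\n`;
  });

  return { skillMd: kept.join(""), extracted };
}
```

The caller would write each `extracted` entry under `references/` and only then overwrite SKILL.md, so an interrupted run never loses a section.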

6. Generate Usage Documentation

Create: references/README.md

# OpenClaw Docs Skill

Generated from: https://docs.openclaw.ai/llms.txt

## Directory Structure

- `SKILL.md` - Main file (auto-loads)
- `scripts/update-docs.sh` - Update from source
- `references/SKILL-*.md` - Extracted sections (manual edit)
- `references/*.md` - Downloaded docs (auto-update)

## Updating Documentation

```bash
./scripts/update-docs.sh
```

## File Types

1. `SKILL-*.md` - Never auto-updated (skill-generated)
2. `*.md` - Auto-updatable from source

## Maintenance

Check for updates: `./scripts/update-docs.sh --check`


7. Organize References by Topic (Not Alphabetically)

Current:

```markdown
## Reference Documents (Alphabetical)
- [agent.md](references/agent.md)
- [architecture.md](references/architecture.md)
- [auth.md](references/auth.md)
```

Should Be:

## Reference Documentation

### Getting Started
- [SKILL-INSTALLATION.md](references/SKILL-INSTALLATION.md) - Complete walkthrough
- [setup.md](references/setup.md) - Initial setup
- [onboarding.md](references/onboarding.md) - Onboarding wizard

### Architecture
- [SKILL-ARCHITECTURE.md](references/SKILL-ARCHITECTURE.md) - System overview
- [gateway.md](references/gateway.md) - Gateway details
- [agent.md](references/agent.md) - Agent system

### Channels
- [whatsapp.md](references/whatsapp.md) - WhatsApp integration
- [telegram.md](references/telegram.md) - Telegram setup
- [discord.md](references/discord.md) - Discord integration

function organizeReferences(llmstxt: ParsedLlmsTxt): ReferencesByCategory {
  const categories = {
    'Getting Started': ['setup', 'install', 'getting-started', 'onboard'],
    'Architecture': ['architecture', 'gateway', 'agent', 'session'],
    'Channels': ['whatsapp', 'telegram', 'discord', 'slack'],
    'Configuration': ['config', 'environment', 'settings'],
  };

  return categorizeEntries(llmstxt.entries, categories);
}
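
`categorizeEntries` is also undefined above. A minimal sketch, assuming each entry is matched by keyword against its file name, the first matching category wins, and anything unmatched lands in an "Other" bucket (the fallback bucket and the first-match rule are assumptions):

```typescript
interface LlmsTxtEntry {
  title: string;
  url: string;
}

type ReferencesByCategory = Record<string, LlmsTxtEntry[]>;

function categorizeEntries(
  entries: LlmsTxtEntry[],
  categories: Record<string, string[]>,
  fallback = "Other"
): ReferencesByCategory {
  const result: ReferencesByCategory = {};
  for (const entry of entries) {
    const fileName = (entry.url.split("/").pop() ?? "").toLowerCase();
    // Categories are checked in declaration order; the first keyword hit wins.
    const matched = Object.keys(categories).find((name) =>
      categories[name].some((keyword) => fileName.includes(keyword))
    );
    const bucket = matched ?? fallback;
    if (!result[bucket]) {
      result[bucket] = [];
    }
    result[bucket].push(entry);
  }
  return result;
}
```

With 241 docs the "Other" bucket will probably be large at first, so the keyword table likely needs to be derived from the section headings in llms.txt rather than hard-coded.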

8. Add Validation Tooling

Create: scripts/validate-skill.sh

#!/bin/bash

echo "→ Validating skill..."

# Check SKILL.md size
words=$(wc -w < SKILL.md)
if [ "$words" -gt 3500 ]; then
    echo "⚠️  SKILL.md is $words words (consider compaction)"
fi

# Check frontmatter
if ! grep -q "^name:" SKILL.md; then
    echo "❌ Missing frontmatter"
    exit 1
fi

# Check reference count
refs=$(find references -name "*.md" | wc -l)
manifest=$(grep -c "^- \[" references/llms.txt)
echo "✓ References: $refs files ($manifest in manifest)"

# Check stale docs (>30 days old)
stale=$(find references -name "*.md" ! -name "SKILL-*" -mtime +30 | wc -l)
[ "$stale" -gt 0 ] && echo "⚠️  $stale docs >30 days old"

echo "✓ Skill health: OK"
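
The same checks could also run inside the plugin right after generation. A sketch as a pure function, assuming the caller supplies the SKILL.md text and a fetch timestamp per reference file (the `ReferenceFile` and `ValidationReport` shapes are hypothetical; the thresholds mirror the script above):

```typescript
interface ReferenceFile {
  name: string;
  fetchedAt: Date;
}

interface ValidationReport {
  errors: string[];
  warnings: string[];
}

function validateSkill(
  skillMd: string,
  references: ReferenceFile[],
  now: Date,
  maxWords = 3500,
  maxAgeDays = 30
): ValidationReport {
  const errors: string[] = [];
  const warnings: string[] = [];

  // Frontmatter must open the file and contain a non-empty name field.
  const frontmatter = /^---\n([\s\S]*?)\n---/.exec(skillMd);
  if (!frontmatter || !/^name:\s*\S/m.test(frontmatter[1])) {
    errors.push("Missing frontmatter name");
  }

  const words = skillMd.split(/\s+/).filter((word) => word.length > 0).length;
  if (words > maxWords) {
    warnings.push(`SKILL.md is ${words} words (consider compaction)`);
  }

  // SKILL-* files are hand-maintained, so they are never counted as stale.
  const msPerDay = 24 * 60 * 60 * 1000;
  const stale = references.filter(
    (file) =>
      !file.name.startsWith("SKILL-") &&
      (now.getTime() - file.fetchedAt.getTime()) / msPerDay > maxAgeDays
  );
  if (stale.length > 0) {
    warnings.push(`${stale.length} docs >${maxAgeDays} days old`);
  }

  return { errors, warnings };
}
```

Returning errors and warnings separately lets the generator fail on a broken skill while only printing staleness notices.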

Implementation Phases

Phase 1: Critical (Do First)

  1. ✅ Content generation - Extract key docs into SKILL.md
  2. ✅ Concrete triggers - Build from topic keywords
  3. ✅ Fix naming - Add "-docs" suffix

Phase 2: Essential (Next)

  1. ✅ Update scripts - Generate automatically
  2. ✅ Usage docs - Create README
  3. ✅ Organize refs - Group by topic

Phase 3: Nice-to-Have

  1. ✅ Auto-compact - Prompt if >2500 words
  2. ✅ Validation - Generate health check script

Expected Impact

Before:

  • Skill generated: 30 seconds
  • Manual work: 2-3 hours
  • No update mechanism
  • Stale over time

After:

  • Skill generated: 45 seconds (complete)
  • Manual work: 10 minutes (over 90% reduction)
  • One-command updates
  • Stays fresh

Testing Checklist

Test with:

  • Small site (~20 pages)
  • Medium site (~50 pages)
  • Large site (>200 pages)

Verify:

  • SKILL.md has content (not lists)
  • Trigger uses concrete terms
  • Name has "-docs" suffix (if docs site)
  • Update scripts generated & executable
  • Large skills auto-compact or prompt
  • References organized by topic
  • README and validation present

Questions for Developers

  1. LLM summarization? Use LLM to extract content or parse/template?
  2. Compaction threshold? Auto at 2500 words or only prompt?
  3. Update frequency? Should scripts check periodically (cron)?
  4. Version tracking? Track llms.txt hash for smarter updates?
  5. Multi-source? Support combining multiple llms.txt files?

Reported: 2026-02-01
Plugin: llmstxt v1.0.0
Reference: OpenClaw docs skill (manual implementation)

Labels: feedback, plugin:llmstxt, priority:high, triage
