Skip to content

feat: add /ai-safety skill — AI safety auditor for LLM integrations#173

Open
HMAKT99 wants to merge 1 commit intogarrytan:mainfrom
HMAKT99:arun/ai-safety-skill
Open

feat: add /ai-safety skill — AI safety auditor for LLM integrations#173
HMAKT99 wants to merge 1 commit intogarrytan:mainfrom
HMAKT99:arun/ai-safety-skill

Conversation

@HMAKT99
Copy link

@HMAKT99 HMAKT99 commented Mar 18, 2026

/cso finds unlocked doors. /ai-safety finds the ones only AI can open.

Traditional security (/cso) catches SQL injection, missing auth, hardcoded secrets. But when your product uses LLMs, there are new attack surfaces: prompt injection that leaks your system prompt, user data sent to third-party APIs without stripping PII, model output plugged into SQL queries without validation, biased AI making hiring decisions.

What /ai-safety does

You:   /ai-safety

Claude: AI SAFETY SCORECARD
        ═════════════════════
        Category               Score   Grade   Finding
        Injection resistance   2/5     D       User input in system prompt via interpolation
        PII handling           2/5     D       user.email sent to Claude API unstripped
        Output validation      1/5     F ←     classify.py output used in SQL query raw
        Content safety         3/5     C+      Basic keyword filter, no adversarial testing
        Bias mitigation        2/5     D       Resume ranker — no fairness testing
        Compliance             1/5     F ←     No EU AI Act measures

        OVERALL: D (35%)

        CRITICAL:
        [1] classify.py:88 — Model output used in SQL query construction
            Attack: Model hallucinates SQL payload → SQL injection
            Fix: Validate output against enum allowlist

How it relates to /cso

/cso audits for OWASP Top 10 (traditional web security). /ai-safety audits for AI-specific attack surfaces that OWASP doesn't cover. They're complementary:

/cso         → traditional security (auth, injection, secrets)
/ai-safety   → AI security (prompt injection, PII to APIs, output validation)    ← NEW

Only .tmpl committed — bun run gen:skill-docs generates the rest.

Test plan

  • .tmpl follows template pipeline — uses {{PREAMBLE}}
  • Registered in gen-skill-docs.ts, skill-check.ts, both test files
  • bun run gen:skill-docs generates valid SKILL.md
  • All existing tests pass with skill added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant