PromptShields

⚠️ This project is evolving rapidly; occasional instability is expected.

Secure AI Applications in 3 Lines of Code


Stop prompt injection, jailbreaks, and data leaks in production LLM applications.


Installation

pip install promptshields

Quick Start

from promptshield import Shield

shield = Shield.balanced()
result = shield.protect_input(user_input, system_prompt)

if result['blocked']:
    print(f"Blocked: {result['reason']} (score: {result['threat_level']:.2f})")
    print(f"Breakdown: {result['threat_breakdown']}")

That's it. Production-ready security in 3 lines.


Why PromptShields?

| Feature    | PromptShields              | DIY Regex | Paid APIs |
|------------|----------------------------|-----------|-----------|
| Setup time | 3 minutes                  | Weeks     | Days      |
| Cost       | Free                       | Free      | $$$$      |
| Privacy    | 100% local                 | Local     | Cloud     |
| F1 score   | 0.97 (RF) / 0.96 (DeBERTa) | ~0.60     | ~0.95     |
| ML models  | 4 + DeBERTa                | None      | Black box |
| Async      | ✅ Native                   | DIY       | Varies    |

What We Block

  • 🛡️ Prompt injection attacks (direct + indirect)
  • 🎭 Jailbreak attempts (DAN, persona replacement)
  • 🔑 System prompt extraction
  • 🔒 PII leakage
  • 📊 Session anomalies
  • 🔀 Encoded/obfuscated attacks (Base64, URL, Unicode)
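To see why encoded attacks matter, here is a minimal sketch (illustrative only, not PromptShields' detector) of decoding Base64- and URL-encoded candidates before pattern matching; the regex and helper names are our own:

```python
import base64
import binascii
import re
import urllib.parse

# Toy injection patterns; the real pattern set is far larger.
INJECTION_PATTERNS = re.compile(
    r"ignore (all )?previous instructions|you are now|system prompt",
    re.IGNORECASE,
)

def decode_candidates(text: str) -> list[str]:
    """Return the text plus any plausible decodings of it."""
    candidates = [text, urllib.parse.unquote(text)]
    for token in text.split():
        # Try Base64-decoding each whitespace-separated token.
        try:
            candidates.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError, ValueError):
            pass
    return candidates

def looks_like_injection(text: str) -> bool:
    return any(INJECTION_PATTERNS.search(c) for c in decode_candidates(text))
```

A naive filter that only scans the raw string would pass a Base64-wrapped "ignore previous instructions" straight through; decoding candidates first closes that gap.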

Security Modes

Choose the right tier for your application:

Shield.fast()       # ~1ms  - High throughput (pattern matching only)
Shield.balanced()   # ~2ms  - Production default (patterns + session tracking)
Shield.strict()     # ~7ms  - Sensitive apps (+ 1 ML model + PII detection)
Shield.secure()     # ~12ms - Maximum security (4 ML models ensemble)
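As a mental model (purely illustrative; not the library's internals), each tier can be pictured as running additional layers and taking the worst score; the layer functions and thresholds below are our own inventions:

```python
import re

def pattern_score(text: str) -> float:
    """Cheap regex layer, roughly what the 'fast' tier relies on."""
    return 1.0 if re.search(r"ignore (all )?previous instructions", text, re.I) else 0.0

def session_score(text: str, history: list[str]) -> float:
    """Toy session layer: flag a prompt repeated three or more times."""
    return 0.8 if history.count(text) >= 3 else 0.0

TIERS = {"fast": {"pattern"}, "balanced": {"pattern", "session"}}

def protect_input(text: str, history: list[str], mode: str = "balanced") -> dict:
    breakdown = {}
    if "pattern" in TIERS[mode]:
        breakdown["pattern_score"] = pattern_score(text)
    if "session" in TIERS[mode]:
        breakdown["session_score"] = session_score(text, history)
    threat = max(breakdown.values())
    return {"blocked": threat >= 0.7, "threat_level": threat,
            "threat_breakdown": breakdown}
```

The latency figures above follow from this layering: each tier pays for the layers it runs, which is why `fast()` stays near 1ms while `secure()` costs ~12ms.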

New in v2.7.0 (Streaming & Output Protection)

Streaming LLM Output Scanning

Securely wrap real-time LLM token streams to detect leaked PII or injected prompts before the full response finishes generating.

# Auto-scans generator stream chunks in real-time
for safe_chunk in shield.protect_stream(llm.stream("Summarize...")):
    print(safe_chunk, end="")
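A toy illustration of the idea behind stream scanning (not the library's implementation): keep a rolling buffer across chunk boundaries so a pattern split between two tokens is still caught. The regex and window size are assumptions for the sketch:

```python
import re
from typing import Iterable, Iterator

# Toy PII pattern: US-SSN-shaped strings.
SECRET = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_stream(chunks: Iterable[str], window: int = 32) -> Iterator[str]:
    """Yield chunks until a match appears in the rolling buffer, then stop."""
    buffer = ""
    for chunk in chunks:
        buffer = (buffer + chunk)[-window:]
        if SECRET.search(buffer):
            yield "[stream blocked: possible PII]"
            return
        yield chunk
```

Note the limitation this sketch shares with any eager scanner: chunks emitted before the match completes have already reached the consumer; a production scanner would hold chunks back until they are confirmed safe.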

Upgraded Zero-Leakage Classical ML

The random_forest, logistic_regression, linear_svc, and gradient_boosting ensemble models were retrained from scratch on a curated 14K-example dataset to eliminate "data leakage" false positives (e.g., blocking benign math/conversion queries). Combine them with DeBERTa via Shield.balanced() for the best F1 performance.


New in v2.6.0 (Developer Experience)

YAML Configuration

Launch shields declaratively without changing application code.

shield = Shield.from_config("promptshield.yml")
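The sketch below shows what such a file might contain; the field names are hypothetical and mirror the constructor options shown elsewhere in this README, so consult the full documentation for the actual schema:

```yaml
# promptshield.yml — illustrative only; actual keys may differ
mode: balanced
models:
  - random_forest
allowlist:
  - "summarize this document"
custom_patterns:
  - "jailbreak|dan mode"
webhook_url: "https://hooks.slack.com/..."
```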

Slack / Teams Webhooks

Natively trigger webhooks whenever a high-severity threat is blocked.

shield = Shield.balanced(webhook_url="https://hooks.slack.com/...")

New in v2.5.0

Per-Layer Threat Breakdown

Every response now shows exactly which layer triggered:

result = shield.protect_input(user_text, system_prompt)
print(result["threat_breakdown"])
# {"pattern_score": 0.0, "ml_score": 0.994, "session_score": 0.0}

DeBERTa Support

shield = Shield(models=["deberta"])  # Auto-downloads from HuggingFace

Async Support

from promptshield import AsyncShield

shield = AsyncShield.balanced()
result = await shield.aprotect_input(user_text, system_prompt)

FastAPI Middleware

from promptshield import Shield
from promptshield.integrations.fastapi import PromptShieldMiddleware

app.add_middleware(PromptShieldMiddleware, shield=Shield.balanced())

Allowlist & Custom Rules

shield = Shield(
    patterns=True,
    models=["random_forest"],
    allowlist=["summarize this document", "translate to french"],
    custom_patterns=[r"jailbreak|dan mode|evil\s*bot"],
)
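A plausible evaluation order for these two options (a sketch under our own assumptions, not the library's code): allowlisted prompts bypass further checks, then custom patterns run alongside the built-ins:

```python
import re

def check(text: str, allowlist: list[str], custom_patterns: list[str]) -> dict:
    normalized = text.strip().lower()
    # Assumed semantics: an exact allowlist hit skips all further checks.
    if normalized in allowlist:
        return {"blocked": False, "reason": "allowlisted"}
    for pattern in custom_patterns:
        if re.search(pattern, normalized):
            return {"blocked": True, "reason": f"custom pattern: {pattern}"}
    return {"blocked": False, "reason": "clean"}
```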

Benchmark Results

Trained on neuralchemy/Prompt-injection-dataset:

| Model               | F1    | ROC-AUC | FPR   | Latency |
|---------------------|-------|---------|-------|---------|
| Random Forest       | 0.969 | 0.994   | 6.9%  | <1ms    |
| Logistic Regression | 0.964 | 0.995   | 6.4%  | <1ms    |
| Gradient Boosting   | 0.961 | 0.994   | 7.9%  | <1ms    |
| LinearSVC           | 0.959 | 0.995   | 10.3% | <1ms    |
| DeBERTa-v3-small    | 0.959 | 0.950   | 8.5%  | ~50ms   |
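For reference, F1 and FPR follow directly from a confusion matrix; the counts below are illustrative, not the benchmark's actual numbers:

```python
def f1_and_fpr(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Compute F1 and false-positive rate from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)  # share of benign prompts wrongly blocked
    return f1, fpr
```

A low FPR matters as much as a high F1 here: every false positive is a legitimate user request that gets blocked.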

Pre-trained models: neuralchemy/prompt-injection-detector · neuralchemy/prompt-injection-deberta


Documentation

📖 Full Documentation — Complete guide with framework integrations

🚀 Quickstart Guide — Get running in 5 minutes


License

MIT License — see LICENSE


Built by NeurAlchemy — AI Security & LLM Safety Research
