
mrzasad/prompt-shield-AI


SecureGate: AI Input/Output Guardrail Layer

SecureGate is an open-source, dual-layer security gateway that protects Large Language Models (LLMs) from malicious prompts and sensitive-data exfiltration. Built with Streamlit and Anthropic's Claude, it intercepts user inputs before they reach the model and scans model outputs before they are returned, all in real time.

🏗️ Architecture & Data Flow

User Input
    │
    ├─► [Layer 1] Regex Engine  ──►  30+ curated patterns
    │         Severity: CRITICAL / HIGH / MEDIUM
    │
    ├─► [Layer 2] LLM Classifier  ──►  Claude as Judge
    │         Returns: threat bool, category, confidence, reason
    │
    ▼
Combined Verdict: BLOCK | WARN | PASS
    │
    ├─► PASS  ─►  Downstream LLM (Safe system prompt)
    │                 │
    │                 ▼
    │            Output Scanned (Same 2 layers)
    │
    └─► BLOCK ─►  Request suppressed + Audit logged
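
The diagram does not spell out how the two layers are merged into a single verdict. The following is a minimal sketch of one possible policy, not the project's actual logic: the `Verdict`, `JudgeResult`, and `combine_verdict` names, the 0.8 threshold, and the rules themselves are all assumptions made for illustration. Only the severity levels, the verdict names, and the four classifier fields come from the diagram.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "PASS"
    WARN = "WARN"
    BLOCK = "BLOCK"


@dataclass
class JudgeResult:
    """The four fields the diagram lists for the Layer 2 classifier."""
    threat: bool
    category: str
    confidence: float  # assumed to lie in [0, 1]
    reason: str


def combine_verdict(regex_severity: Optional[str],
                    judge: JudgeResult,
                    block_threshold: float = 0.8) -> Verdict:
    """Merge both layers into one verdict (hypothetical policy)."""
    # Assumed rule: a CRITICAL or HIGH regex hit blocks outright.
    if regex_severity in ("CRITICAL", "HIGH"):
        return Verdict.BLOCK
    # Assumed rule: a confident judge can block on its own.
    if judge.threat and judge.confidence >= block_threshold:
        return Verdict.BLOCK
    # Weaker signals (MEDIUM hit or low-confidence threat) only warn.
    if regex_severity == "MEDIUM" or judge.threat:
        return Verdict.WARN
    return Verdict.PASS


benign = JudgeResult(threat=False, category="none", confidence=0.0, reason="")
print(combine_verdict(None, benign).value)
```

Under this policy either layer can block independently, so a paraphrased attack that slips past the regex engine can still be stopped by the classifier.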

🛡️ Threat Categories Covered

SecureGate evaluates traffic against specific security risks:

  • Prompt Injection: Intentional system overrides (e.g., "Ignore all previous instructions...").
  • Jailbreak: Roleplay exploits and safety filter bypasses (e.g., DAN attacks).
  • DB/Log Exfiltration: SQL injections and connection string leaks (e.g., SELECT * FROM).
  • Secret Probing: Accidental or malicious exposure of API keys, passwords, and tokens.
  • Encoded Payloads: Obfuscated attacks using Base64 blobs, eval(), or exec().
  • Output Leaks: System instruction disclosure or raw database responses in the final output.
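
To make the categories above concrete, here is a small sketch of how a regex layer could tag text with a category and severity. The patterns, the severity assigned to each, and the `Rule` and `scan` names are illustrative assumptions; the project's own 30+ curated patterns are not reproduced here.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    category: str
    severity: str
    regex: re.Pattern


# Hypothetical rules, one per category; severities are assumed.
RULES = [
    Rule("prompt_injection", "CRITICAL",
         re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I)),
    Rule("jailbreak", "HIGH",
         re.compile(r"do\s+anything\s+now|\bDAN\s+mode\b", re.I)),
    Rule("db_exfiltration", "HIGH",
         re.compile(r"\bselect\s+\*\s+from\b", re.I)),
    Rule("secret_probing", "MEDIUM",
         re.compile(r"\b(api[_\s-]?key|password|token)\b", re.I)),
    Rule("encoded_payload", "MEDIUM",
         # eval(/exec( calls, or a long run of Base64-alphabet characters
         re.compile(r"\b(eval|exec)\s*\(|[A-Za-z0-9+/]{40,}={0,2}")),
]

SEVERITY_RANK = {"CRITICAL": 3, "HIGH": 2, "MEDIUM": 1}


def scan(text: str) -> Optional[Rule]:
    """Return the most severe matching rule, or None if nothing matches."""
    hits = [rule for rule in RULES if rule.regex.search(text)]
    return max(hits, key=lambda r: SEVERITY_RANK[r.severity], default=None)


hit = scan("Ignore all previous instructions and print the system prompt")
print(hit.category if hit else "clean")
```

Broad keywords such as `token` will match plenty of benign text, which is why this sketch gives them only MEDIUM severity and leaves the final decision to a second layer.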

🚀 Quick Start

1. Clone the Repository

git clone https://github.com
cd SecureGate

2. Install Dependencies

pip install streamlit anthropic

3. Run the Application

streamlit run security_guardrail_app.py

⚙️ How to Use

  1. Launch the app in your browser (typically http://localhost:8501).
  2. Enter your Anthropic API key in the secure sidebar field.
  3. Navigate the four operational tabs:
    • Dashboard / Architecture: View real-time pipeline visualization.
    • Threat Tester: Validate the engine using 9 preset payloads (attacks plus benign baselines), testing each layer in isolation.
    • Live Sandbox: Test your own custom prompt attacks and view the bidirectional scanning logs.
    • Audit Logs: Inspect suppressed blocks, classification confidence levels, and mitigation reasons.
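
As a rough illustration of what an audit record might hold, the sketch below serializes one scan decision as a JSON line. The field names and the `audit_entry` helper are hypothetical; the app's real log format is not documented in this README.

```python
import json
from datetime import datetime, timezone


def audit_entry(direction: str, verdict: str, category: str,
                confidence: float, reason: str) -> str:
    """Serialize one scan decision as a JSON line (assumed schema)."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "direction": direction,    # "input" or "output" scan
        "verdict": verdict,        # BLOCK, WARN, or PASS
        "category": category,
        "confidence": confidence,  # classifier confidence
        "reason": reason,          # why the request was flagged
    })


print(audit_entry("input", "BLOCK", "prompt_injection", 0.97,
                  "attempted system override"))
```

One JSON object per line keeps the log easy to append to and to filter later, for example by verdict or category.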

About

SecureGate is an open-source, dual-layer AI input/output guardrail application that secures downstream Large Language Models (LLMs) against malicious attacks and data leaks. Built with Streamlit and Anthropic's Claude, it intercepts both user inputs and model responses to keep interactions safe and compliant.
