KanishkNavale/sqlhund

sqlhund

Rust-powered auditable SQL injection detection for Python 🐍. Built for AI agents ✨.


AI agents now write and execute SQL directly. Text-to-SQL pipelines, MCP database connectors, LangChain SQL toolkits, and autonomous data analysts all generate queries on the fly. That creates a new attack surface: an LLM that can be prompted, jailbroken, or simply confused into generating '; DROP TABLE users; -- and then running it.

sqlhund is a runtime guardrail for that. It detects SQL injection patterns and classifies them by CWE and CAPEC. The detection engine is written in Rust with Python bindings via PyO3, so it can sit in an agent's hot path with minimal added latency. Every match is classified across two axes: technique (HOW the injection works) and impact (WHAT it can do). You get structured, auditable threat intelligence instead of a bare boolean.

>>> import sqlhund
>>> sqlhund.is_query_malicious("SELECT * FROM users WHERE id = 1")
False
>>> sqlhund.is_query_malicious("' OR 1=1 --")
True

Note

The primary goal is to prevent AI agents from manipulating the data in your database or altering its structure.

Installation

pip install sqlhund  # pip
poetry add sqlhund   # poetry
uv add sqlhund       # uv

Requires Python 3.10+. No runtime dependencies.

Quick Start

sqlhund exposes two functions. That's the entire API.

import sqlhund

# Boolean check
sqlhund.is_query_malicious("SELECT * FROM users; DROP TABLE users")
# True

# Detailed analysis with CWE/CAPEC classification
sqlhund.analyze_query("SELECT * FROM users; DROP TABLE users")
# {
#     'is_malicious': True,
#     'matches': {
#         'general': [{
#             'technique': ['CWE-89'],           # HOW: SQL Injection
#             'impact': ['CWE-285', 'CWE-471'],  # WHAT: Auth bypass + data tampering
#             'capec': [66]                      # CAPEC-66: SQL Injection
#         }]
#     }
# }

# Safe queries pass through cleanly
sqlhund.analyze_query("SELECT id FROM users WHERE id = 1")
# {'is_malicious': False, 'matches': {}}

See Detailed Threat Analysis below for the file-operation and multi-database detection cases.

Usage Examples

Guarding an LLM Agent's SQL Tool Calls

from langchain.tools import tool
import sqlhund

@tool
def execute_sql(query: str) -> str:
    """Execute a SQL query against the database. Rejects malicious queries."""
    if sqlhund.is_query_malicious(query):
        return "Query rejected: potential SQL injection detected."
    # `database` stands in for your application's own connection object
    return database.execute(query)

sqlhund sits between the LLM's output and your database. A query that sqlhund flags as malicious never reaches the database, no matter how the prompt was crafted.

Validating AI-Generated SQL

import sqlhund

def execute_ai_query(query: str):
    """Execute AI-generated SQL with injection protection."""
    if sqlhund.is_query_malicious(query):
        raise ValueError("Potential SQL injection detected")

    # Not flagged by sqlhund; pass the query to the database
    return database.execute(query)
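
The snippets above leave `database` undefined. As a minimal end-to-end sketch, the same guard can be wired to an in-memory SQLite connection from the standard library. The names `guarded_execute`, `QueryRejected`, and `demo_detector` are hypothetical and not part of sqlhund; the detector is passed in as a callable so the example runs without sqlhund installed.

```python
import sqlite3
from typing import Callable


class QueryRejected(Exception):
    """Raised when the detector flags a query (hypothetical name, not part of sqlhund)."""


def guarded_execute(
    conn: sqlite3.Connection,
    query: str,
    is_malicious: Callable[[str], bool],
) -> list[tuple]:
    """Run `query` only if the detector does not flag it."""
    if is_malicious(query):
        # Reject before the query text ever reaches the database driver.
        raise QueryRejected("Potential SQL injection detected")
    return conn.execute(query).fetchall()


def demo_detector(query: str) -> bool:
    """Crude stand-in detector for this demo only; it is NOT how sqlhund works."""
    return ";" in query or "--" in query


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")

# A benign query passes through and returns rows.
rows = guarded_execute(conn, "SELECT id, name FROM users WHERE id = 1", demo_detector)

# A stacked query is rejected before execution.
try:
    guarded_execute(conn, "SELECT * FROM users; DROP TABLE users", demo_detector)
    rejected = False
except QueryRejected:
    rejected = True
```

In real use you would pass `sqlhund.is_query_malicious` as the `is_malicious` argument instead of the stand-in.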

Detailed Threat Analysis

import sqlhund

result = sqlhund.analyze_query("SELECT * FROM users WHERE id = 1 OR 1=1")

if result['is_malicious']:
    for db_name, patterns in result['matches'].items():
        print(f"Database: {db_name}")
        for pattern in patterns:
            print(f"  Technique: {pattern['technique']}")  # CWE-89
            print(f"  Impact: {pattern['impact']}")        # CWE-285
            print(f"  CAPEC: {pattern['capec']}")          # 66

File-operation attacks are detected too, scoped to the database they target:

sqlhund.analyze_query("SELECT load_extension('evil')")
# {
#     'is_malicious': True,
#     'matches': {
#         'sqlite': [{
#             'technique': ['CWE-89', 'CWE-610', 'CWE-114'],
#             'impact': ['CWE-200', 'CWE-285'],
#             'capec': [470]
#         }]
#     }
# }
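
Because the result is a plain dictionary, it is straightforward to turn into audit log entries. The sketch below flattens a result of the shape shown above into one JSON line per match. `to_audit_records` is a hypothetical helper, not part of sqlhund, and the sample result is copied from the `load_extension` example rather than produced by a live call.

```python
import json
from typing import Any


def to_audit_records(query: str, result: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten an analyze_query-style result into one record per match."""
    records = []
    for db_name, patterns in result.get("matches", {}).items():
        for pattern in patterns:
            records.append({
                "query": query,
                "database": db_name,
                "technique": pattern["technique"],
                "impact": pattern["impact"],
                "capec": pattern["capec"],
            })
    return records


# Sample result copied from the load_extension example in this README.
sample_query = "SELECT load_extension('evil')"
sample_result = {
    "is_malicious": True,
    "matches": {
        "sqlite": [{
            "technique": ["CWE-89", "CWE-610", "CWE-114"],
            "impact": ["CWE-200", "CWE-285"],
            "capec": [470],
        }]
    },
}

records = to_audit_records(sample_query, sample_result)
# One JSON document per line, ready to append to a log file.
audit_lines = [json.dumps(record, sort_keys=True) for record in records]
```

A safe query yields an empty `matches` dictionary, so the helper returns an empty list and nothing is logged.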

Pre-screening User Input

import sqlhund

def sanitize_search_query(user_input: str) -> str:
    """Validate search input before building SQL."""
    # Probe query used only for screening; it is never executed
    test_query = f"SELECT * FROM products WHERE name LIKE '%{user_input}%'"

    if sqlhund.is_query_malicious(test_query):
        raise ValueError("Invalid search term")

    return user_input
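
Pre-screening works best alongside parameter binding rather than instead of it. The sketch below screens the search term the same way, then executes a parameterized query so the value is never spliced into the SQL text. `search_products` and `stub_detector` are hypothetical names, and the detector is injected so the example runs with only the standard library.

```python
import sqlite3
from typing import Callable


def search_products(
    conn: sqlite3.Connection,
    user_input: str,
    is_malicious: Callable[[str], bool],
) -> list[tuple]:
    """Pre-screen the search term, then bind it as a parameter."""
    probe = f"SELECT * FROM products WHERE name LIKE '%{user_input}%'"
    if is_malicious(probe):
        raise ValueError("Invalid search term")
    # The probe string is only inspected; the executed query binds the value.
    return conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ORDER BY name",
        (f"%{user_input}%",),
    ).fetchall()


def stub_detector(query: str) -> bool:
    """Crude stand-in detector for this demo only; it is NOT how sqlhund works."""
    return "--" in query or " OR " in query.upper()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("red lamp",), ("blue lamp",), ("desk",)],
)

hits = search_products(conn, "lamp", stub_detector)

try:
    search_products(conn, "' OR 1=1 --", stub_detector)
    search_rejected = False
except ValueError:
    search_rejected = True
```

With sqlhund installed, `sqlhund.is_query_malicious` would take the place of `stub_detector`.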

Features

  • Fast: core detection engine written in Rust, compiled to a native Python extension
  • Accurate: 100% precision and recall on a 10M+ query benchmark (zero false positives, zero false negatives)
  • Multi-database: detects injection patterns targeting SQLite, PostgreSQL, and DuckDB
  • Zero dependencies: ships as a self-contained native wheel
  • AI-agent ready: built as a guardrail for LLM-generated SQL
  • Security classification: maps detected patterns to CWE and CAPEC taxonomies for threat intelligence

How sqlhund Compares

| Tool | What it does | Where it fits |
| --- | --- | --- |
| sqlhund | Pattern detection plus CWE/CAPEC classification, Python-native | Runtime guardrail for AI-generated or agent-relayed SQL |
| libinjection | C library, pattern-based SQLi/XSS detection | Closest classic analog. No native Python bindings, no CWE/CAPEC mapping |
| sqlmap | Active penetration-testing scanner | Offensive testing, not a runtime guard |
| Parameterized queries | Prevents injection at query-construction time | The right long-term fix, but it doesn't help when an LLM generates SQL whose structure isn't known ahead of time |

Note

sqlhund doesn't replace parameterized queries. It covers the case parameterization can't: SQL whose structure is generated dynamically by a model.

Security Classification

sqlhund classifies detected patterns using industry-standard security frameworks.

Dual-Axis CWE Analysis

Every detected pattern is analyzed across two independent axes:

  • Technique (HOW): CWE identifiers describing the injection mechanism

    • CWE-89: SQL Injection
    • CWE-610: External Resource Reference (file operations)
    • CWE-94/95: Code/Eval Injection
    • CWE-77/78: Command/OS Command Injection
    • CWE-114: Process Control (loading untrusted libraries)
    • CWE-116/184: Encoding evasion and filter bypass
  • Impact (WHAT): CWE identifiers describing the attack consequences

    • CWE-200: Information Disclosure
    • CWE-285: Authorization Bypass
    • CWE-269: Privilege Escalation
    • CWE-471: Data Tampering
    • CWE-400: Resource Exhaustion (DoS)
    • CWE-208: Timing Side-Channel (blind injection)
    • CWE-497: System Information Exposure

CAPEC Attack Patterns

Matches are also mapped to CAPEC attack pattern IDs:

  • CAPEC-66: SQL Injection
  • CAPEC-7: Blind SQL Injection
  • CAPEC-54: Query System for Information
  • CAPEC-470: Expanding Control over the OS from the Database
  • CAPEC-664: Server-Side Request Forgery
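
For human-readable reports, the identifiers in a match can be joined with the labels listed above. The sketch below is a hypothetical helper, not part of sqlhund. The names are this README's short labels rather than the official CWE and CAPEC titles, and splitting the paired entries (such as CWE-94/95) into one label per ID is an assumption.

```python
from typing import Any

# Short labels taken from the lists in this README (not official titles).
CWE_NAMES = {
    "CWE-89": "SQL Injection",
    "CWE-610": "External Resource Reference",
    "CWE-94": "Code Injection",
    "CWE-95": "Eval Injection",
    "CWE-77": "Command Injection",
    "CWE-78": "OS Command Injection",
    "CWE-114": "Process Control",
    "CWE-116": "Encoding evasion",
    "CWE-184": "Filter bypass",
    "CWE-200": "Information Disclosure",
    "CWE-285": "Authorization Bypass",
    "CWE-269": "Privilege Escalation",
    "CWE-471": "Data Tampering",
    "CWE-400": "Resource Exhaustion (DoS)",
    "CWE-208": "Timing Side-Channel",
    "CWE-497": "System Information Exposure",
}

CAPEC_NAMES = {
    66: "SQL Injection",
    7: "Blind SQL Injection",
    54: "Query System for Information",
    470: "Expanding Control over the OS from the Database",
    664: "Server-Side Request Forgery",
}


def describe_match(match: dict[str, Any]) -> dict[str, list[str]]:
    """Attach a readable label to every CWE and CAPEC ID in one match."""
    return {
        "technique": [f"{c}: {CWE_NAMES.get(c, 'unknown')}" for c in match["technique"]],
        "impact": [f"{c}: {CWE_NAMES.get(c, 'unknown')}" for c in match["impact"]],
        "capec": [f"CAPEC-{n}: {CAPEC_NAMES.get(n, 'unknown')}" for n in match["capec"]],
    }


# Match copied from the Quick Start example output.
described = describe_match(
    {"technique": ["CWE-89"], "impact": ["CWE-285", "CWE-471"], "capec": [66]}
)
```

IDs missing from the tables are labeled `unknown` rather than raising, so the helper keeps working if sqlhund reports an identifier not listed here.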

OWASP Alignment

sqlhund detects patterns from OWASP Top 10 A03:2021 - Injection, covering:

  • SQL Injection (CWE-89)
  • Command Injection (CWE-77, CWE-78)
  • Code Injection (CWE-94, CWE-95)
  • File/Resource Injection (CWE-610)

Supported Databases

sqlhund detects database-specific injection patterns for:

| Database | Detection Patterns |
| --- | --- |
| General | UNION, comments, tautologies, subqueries, time delays |
| SQLite | load_extension, ATTACH, PRAGMA, virtual tables |
| PostgreSQL | pg_read_file, COPY, DO blocks, dblink, extensions |
| DuckDB | read_csv, ATTACH, httpfs, CREATE SECRET, macros |

Benchmarks

Evaluated against the RbSQLi dataset: 10,304,026 labeled SQL queries (2,813,146 malicious, 7,490,880 benign).

| | Predicted Malicious | Predicted Benign |
| --- | --- | --- |
| Actual Malicious | 2,813,146 | 0 |
| Actual Benign | 0 | 7,490,880 |

Precision: 100% · Recall: 100% · Accuracy: 100%
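
The headline figures follow directly from the confusion matrix. As a quick check, the sketch below recomputes them from the four cell counts reported above. `classification_metrics` is a hypothetical helper written for this illustration and is not part of the sqlhund test suite.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Guard each ratio against an empty denominator.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Cell counts from the table above: malicious is the positive class.
metrics = classification_metrics(tp=2_813_146, fp=0, fn=0, tn=7_490_880)
total_queries = 2_813_146 + 7_490_880  # matches the stated dataset size
```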

Building from Source

Requires Rust, Maturin, and uv.

git clone https://github.com/KanishkNavale/sqlhund
cd sqlhund
make dev         # set up development environment
make build       # compile debug build
make release     # compile optimized release build

Testing

Run unit tests (Rust + Python):

make unittest

Run the evaluation against the full RbSQLi dataset (download the dataset and place it at tests/data/wild.csv):

make wildtest

Contributing

Contributions are welcome. See the open issues or submit a pull request.

License

This project is licensed under the MIT License.

About

A Rust library with Python bindings for detecting SQL injection patterns in input strings. Built for speed and designed especially for AI agents that process or generate SQL queries.
