37 changes: 18 additions & 19 deletions Cargo.lock

125 changes: 85 additions & 40 deletions README.md
@@ -15,7 +15,9 @@

---

AI agents now write and execute SQL directly. Text-to-SQL pipelines, MCP database connectors, LangChain SQL toolkits, and autonomous data analysts all generate queries on the fly. That creates a new attack surface: an LLM that can be prompted, jailbroken, or simply confused into generating `'; DROP TABLE users; --` and then *running* it.

**sqlhund** is a runtime guardrail for that. It detects SQL injection patterns and classifies them by CWE and CAPEC. The detection engine is written in Rust with Python bindings via [PyO3](https://pyo3.rs/), so it can sit in an agent's hot path without adding meaningful latency. Every match is classified along two axes: technique (HOW the injection works) and impact (WHAT it can do). You get structured, auditable threat intelligence instead of a bare boolean.
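Because each match carries both axes, a caller can act on *what* a query could do rather than on a single flag. The sketch below is a hypothetical policy and is not part of sqlhund. It assumes the result shape shown in Quick Start, hard-blocks data tampering (CWE-471), and routes every other detection to review. `decide` and `BLOCK_ON_IMPACT` are illustrative names.

```python
from typing import Any

# Hypothetical policy: impact CWEs that should be blocked outright.
# CWE-471 is labelled "data tampering" in this README's examples.
BLOCK_ON_IMPACT = {"CWE-471"}


def decide(result: dict[str, Any]) -> str:
    """Map an analyze_query()-style result to 'allow', 'block', or 'review'."""
    if not result["is_malicious"]:
        return "allow"
    # Collect impact CWEs across every database family that matched.
    impacts = {
        cwe
        for patterns in result["matches"].values()
        for pattern in patterns
        for cwe in pattern["impact"]
    }
    # Hard-block tampering; send any other detection to a human reviewer.
    return "block" if impacts & BLOCK_ON_IMPACT else "review"
```

With the outputs shown in this README, the `DROP TABLE` result maps to `'block'` and the `load_extension` result (impact CWE-200, CWE-285) maps to `'review'`.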

```python
>>> import sqlhund
@@ -29,6 +31,20 @@ True
>
> The primary goal is to prevent AI agents from manipulating the data or the structure of your database.

## Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [Usage Examples](#usage-examples)
- [Features](#features)
- [How sqlhund Compares](#how-sqlhund-compares)
- [Security Classification](#security-classification)
- [Supported Databases](#supported-databases)
- [Benchmarks](#benchmarks)
- [Building from Source](#building-from-source)
- [Testing](#testing)
- [Contributing](#contributing)

## Installation

```bash
@@ -41,42 +57,25 @@ Requires Python 3.10+. No runtime dependencies.

## Quick Start

sqlhund exposes two functions. That's the entire API.

```python
import sqlhund

# Boolean check
sqlhund.is_query_malicious("SELECT * FROM users; DROP TABLE users")
# True

# Detailed analysis with CWE/CAPEC classification
sqlhund.analyze_query("SELECT * FROM users; DROP TABLE users")
# {
# 'is_malicious': True,
# 'matches': {
# 'general': [{
# 'technique': ['CWE-89'], # HOW: SQL Injection
# 'impact': ['CWE-285', 'CWE-471'], # WHAT: Auth bypass + data tampering
# 'capec': [66] # CAPEC-66: SQL Injection
# }]
# }
# }

@@ -85,8 +84,26 @@ sqlhund.analyze_query("SELECT id FROM users WHERE id = 1")
# {'is_malicious': False, 'matches': {}}
```

See [Detailed Threat Analysis](#detailed-threat-analysis) below for the file-operation and multi-database detection cases.
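For type-checked callers, the return shape from the examples above can be written down as `TypedDict`s. This is a hypothetical sketch inferred from this README's sample outputs. sqlhund itself may not ship these types, and `capec_ids` is an illustrative helper.

```python
from typing import TypedDict


class Pattern(TypedDict):
    technique: list[str]  # HOW: CWE IDs for the injection mechanism
    impact: list[str]     # WHAT: CWE IDs for what the injection can do
    capec: list[int]      # CAPEC attack-pattern numbers


class Analysis(TypedDict):
    is_malicious: bool
    matches: dict[str, list[Pattern]]  # keyed by database family, e.g. 'general'


def capec_ids(result: Analysis) -> list[int]:
    """Sorted, de-duplicated CAPEC IDs across every matched pattern."""
    return sorted(
        {c for patterns in result["matches"].values() for p in patterns for c in p["capec"]}
    )
```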

## Usage Examples

### Guarding an LLM Agent's SQL Tool Calls

```python
from langchain.tools import tool
import sqlhund

@tool
def execute_sql(query: str) -> str:
    """Execute a SQL query against the database. Rejects malicious queries."""
    if sqlhund.is_query_malicious(query):
        return "Query rejected: potential SQL injection detected."
    # `database` is your application's own connection object (not shown here).
    return str(database.execute(query))
```

sqlhund sits between the LLM's output and your database. If it flags a query, that query never reaches the database, regardless of how the prompt behind it was crafted.
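The same guard can be written without LangChain. This sketch uses the standard-library `sqlite3` module and takes the screening function as a parameter, so it can be exercised without sqlhund installed. In a real deployment you would pass `sqlhund.is_query_malicious`. `guarded_execute` and `QueryRejected` are hypothetical names.

```python
import sqlite3
from typing import Callable


class QueryRejected(Exception):
    """Raised when the screening function flags a query."""


def guarded_execute(
    conn: sqlite3.Connection,
    query: str,
    is_malicious: Callable[[str], bool],
) -> list[tuple]:
    """Run `query` only if `is_malicious(query)` returns False."""
    # Screen first: a flagged query is never sent to the database.
    if is_malicious(query):
        raise QueryRejected(query)
    return conn.execute(query).fetchall()


# Production usage (assumed): guarded_execute(conn, sql, sqlhund.is_query_malicious)
```

Raising instead of returning a string lets the surrounding agent framework decide how to report the rejection.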

### Validating AI-Generated SQL

```python
@@ -115,7 +132,23 @@ if result['is_malicious']:
print(f" CAPEC: {pattern['capec']}") # 66
```

File-operation attacks are detected too, scoped to the database they target:

```python
sqlhund.analyze_query("SELECT load_extension('evil')")
# {
# 'is_malicious': True,
# 'matches': {
# 'sqlite': [{
# 'technique': ['CWE-89', 'CWE-610', 'CWE-114'],
# 'impact': ['CWE-200', 'CWE-285'],
# 'capec': [470]
# }]
# }
# }
```
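Each CWE and CAPEC identifier in a match corresponds to a public MITRE definition page, which helps when a rejection has to be explained to a human. This hypothetical helper is not part of sqlhund. It builds those links from one matched pattern using the `cwe.mitre.org` and `capec.mitre.org` definition URL formats.

```python
def reference_links(pattern: dict) -> list[str]:
    """MITRE definition URLs for one matched pattern, in order, without duplicates."""
    links = []
    for cwe in pattern["technique"] + pattern["impact"]:
        number = cwe.removeprefix("CWE-")  # 'CWE-114' -> '114'
        links.append(f"https://cwe.mitre.org/data/definitions/{number}.html")
    for capec in pattern["capec"]:
        links.append(f"https://capec.mitre.org/data/definitions/{capec}.html")
    # dict.fromkeys keeps first-seen order while dropping repeats
    return list(dict.fromkeys(links))
```

Applied to the `load_extension` match above, it yields five CWE links followed by the CAPEC-470 link.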

### Pre-screening User Input

```python
def sanitize_search_query(user_input: str) -> str:
@@ -130,20 +163,32 @@ def sanitize_search_query(user_input: str) -> str:

## Features

- **Fast**: core detection engine written in Rust, compiled to a native Python extension
- **Accurate**: 100% precision and recall on the 10.3M-query RbSQLi benchmark (zero false positives, zero false negatives)
- **Multi-database**: detects injection patterns targeting SQLite, PostgreSQL, and DuckDB
- **Zero dependencies**: ships as a self-contained native wheel
- **AI-agent ready**: built as a guardrail for LLM-generated SQL
- **Security classification**: maps detected patterns to CWE and CAPEC taxonomies for threat intelligence
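The accuracy figure reduces to simple arithmetic on the confusion matrix. This sketch plugs in the RbSQLi class totals from the Benchmarks section together with the zero-false-positive, zero-false-negative claim. `precision_recall` is an illustrative helper and is not part of sqlhund.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# RbSQLi class totals from the Benchmarks section.
malicious, benign = 2_813_146, 7_490_880
total = malicious + benign  # 10_304_026, the dataset size

# Zero false negatives means every malicious query was flagged (TP = malicious);
# zero false positives means no benign query was flagged.
print(precision_recall(tp=malicious, fp=0, fn=0))  # (1.0, 1.0)
```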

## How sqlhund Compares

| Tool | What it does | Where it fits |
|---|---|---|
| **sqlhund** | Pattern detection plus CWE/CAPEC classification, Python-native | Runtime guardrail for AI-generated or agent-relayed SQL |
| `libinjection` | C library, pattern-based SQLi/XSS detection | Closest classic analog. No native Python bindings, no CWE/CAPEC mapping |
| `sqlmap` | Active penetration-testing scanner | Offensive testing, not a runtime guard |
| Parameterized queries | Prevents injection at query-construction time | The right long-term fix, but it doesn't help when an LLM generates SQL whose structure isn't known ahead of time |

> [!NOTE]
> sqlhund doesn't replace parameterized queries. It covers the case parameterization can't: SQL whose structure is generated dynamically by a model.
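For contrast, this is what parameterization buys when the query's shape *is* known in advance: the payload is bound as a value and never parsed as SQL. The example uses Python's standard-library `sqlite3`. The table and payload are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

payload = "alice' OR '1'='1"
# The ? placeholder binds `payload` as a value, never as SQL structure.
rows = conn.execute("SELECT id FROM users WHERE name = ?", (payload,)).fetchall()
print(rows)  # [] (no user is literally named that)
```

There is no placeholder to bind into when a model writes the entire statement, which is the gap the note above describes.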

## Security Classification

sqlhund classifies detected patterns using industry-standard security frameworks.

### Dual-Axis CWE Analysis

Every detected pattern is analyzed across two independent axes:

- **Technique** (HOW): CWE identifiers describing the injection mechanism
- CWE-89: SQL Injection
@@ -195,14 +240,14 @@ sqlhund detects database-specific injection patterns for:

| Database | Detection Patterns |
|------------|--------------------|
| General | UNION, comments, tautologies, subqueries, time delays |
| SQLite | load_extension, ATTACH, PRAGMA, virtual tables |
| PostgreSQL | pg_read_file, COPY, DO blocks, dblink, extensions |
| DuckDB | read_csv, ATTACH, httpfs, CREATE SECRET, macros |
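The `matches` dictionary is keyed by database family, so an application can narrow findings to the engine it actually runs. This is a hypothetical helper. `general` and `sqlite` are the keys shown in this README's examples, while `postgresql` and `duckdb` are assumed key names for the other rows. Whether to set aside findings aimed at other engines is a policy choice, not something sqlhund decides.

```python
from typing import Any


def relevant_patterns(result: dict[str, Any], backend: str) -> list[dict[str, Any]]:
    """Return 'general' patterns plus those for `backend` (e.g. 'sqlite')."""
    matches = result["matches"]
    # .get() tolerates families that did not match at all.
    return matches.get("general", []) + matches.get(backend, [])
```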

## Benchmarks

Evaluated against the [RbSQLi dataset](https://data.mendeley.com/datasets/xz4d5zj5yw/3): 10,304,026 labeled SQL queries (2,813,146 malicious, 7,490,880 benign).

| | Predicted Malicious | Predicted Benign |
|----------------------|--------------------:|-----------------:|
23 changes: 22 additions & 1 deletion pyproject.toml
@@ -6,14 +6,35 @@ authors = [{ name = "Kanishk Navale", email = "navalekanishk@gmail.com" }]
license = "MIT"
readme = "README.md"
requires-python = ">=3.10,<4.0"
keywords = [
"python",
"sqlite",
"postgresql",
"owasp",
"cybersecurity",
"cwe",
"capec",
"devsecops",
"ai-agents",
"pyo3-rust-bindings",
"duckdb",
"sql-injection-prevention",
"no-llms",
"injection-detection",
"sql-guardrails",
"sql-guard",
]
classifiers = [
"Development Status :: 4 - Beta",
"Environment :: Console",
"Intended Audience :: Developers",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Operating System :: OS Independent",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Security",
]

dependencies = []