diff --git a/techniques/SAFE-T1803/README.md b/techniques/SAFE-T1803/README.md index 255d35b0..2e9fdf61 100644 --- a/techniques/SAFE-T1803/README.md +++ b/techniques/SAFE-T1803/README.md @@ -1,72 +1,73 @@ -# SAFE-T1803 — Database Dump +# SAFE-T1803: Database Dump -**Tactics:** Collection, Exfiltration -**Technique ID:** SAFE-T1803 -**Status:** Stable (v1.0) -**First Observed:** Early 2000s (bulk database breach and SQL-based data exfiltration campaigns) [1][2][7] -**Last Updated:** 2025-11-29 -**Author:** Pratikshya Regmi - ---- - -## Summary +## Overview +**Tactic**: Collection (ATK-TA0009), Exfiltration (ATK-TA0010) +**Technique ID**: SAFE-T1803 +**Severity**: High +**First Observed**: Early 2000s (bulk database breach and SQL-based data exfiltration campaigns) [1][2][7] +**Last Updated**: 2026-04-14 +**Author**: Pratikshya Regmi +## Description A **Database Dump** attack uses Model Context Protocol (MCP) tools as the execution plane for **bulk database exfiltration**: running wide-table `SELECT` queries, invoking logical backup commands, or exporting large structured datasets from production databases, data warehouses, or analytic stores [1][2][3][4]. Classic database-theft campaigns rely on SQL injection, compromised credentials, or insider abuse to dump entire tables (for example, users, credentials, finance, telemetry) and stage them for exfiltration [1][2][6][7]; in SAFE-T1803, the same outcome is achieved when LLM agents drive over-privileged MCP tools to perform bulk exports. Rather than deploying custom exfiltration implants, adversaries (or misaligned agents) abuse existing MCP tools such as `sql_query`, `db.execute`, HTTP/cloud SDK wrappers, or file export tools. With one or a small number of tool calls, they can pull millions of rows from production tables, export them as CSV/Parquet, and stage them in files or object storage that are easy to exfiltrate. 
Because these tools are often designed for analytics, reporting, and maintenance, **benign and malicious usage can look very similar**, making schema-aware detection, tight scoping, and strong guardrails critical [1][3][4][8][10]. -**Why “First Observed: Early 2000s (bulk database breach and SQL-based data exfiltration campaigns)”** — Large-scale, purpose-built database theft has been publicly documented for decades, including early 2000s incidents where attackers used SQL injection and compromised DB accounts to dump entire customer and payment databases [1][2][6][7]. SAFE-T1803 generalizes these database-dump patterns to the MCP ecosystem, where LLM agents orchestrate bulk data access via tools instead of custom exfiltration malware alone. - ---- - -## ATTACK / ATLAS Mapping - -- **MITRE ATTACK** - - **T1213 – Data from Information Repositories** — adversaries access and collect data from databases and other repositories; SAFE-T1803 maps MCP-driven database dumps directly to this technique [1][2][3]. - - **T1213.006 – Databases** — explicit sub-technique for accessing database content; SAFE-T1803 focuses on **bulk, table- or schema-level** export abuse through tools [2]. - - **T1074 – Data Staged** — SAFE-T1803 typically stages large query results as files, temporary tables, or objects for subsequent exfiltration [3][4]. - - **T1041 / T1567 – Exfiltration Over C2 / Web Services (Related)** — SAFE-T1803 often feeds staged dumps into HTTP, cloud, or SaaS channels used for exfiltration [3][4][9]. - -- **MITRE D3FEND / Enterprise Mitigations** - - **M1041 – Data Loss Prevention** — schema-aware monitoring and policy enforcement at DB, storage, and egress layers to detect and block bulk exfiltration [3][8][11]. - - **M1018 – User Account Management** — least privilege for DB roles and service accounts used by MCP tools, including strong governance for production vs analytics access [3][11][12]. 
- +**Why "First Observed: Early 2000s"** — Large-scale, purpose-built database theft has been publicly documented for decades, including early 2000s incidents where attackers used SQL injection and compromised DB accounts to dump entire customer and payment databases [1][2][6][7]. SAFE-T1803 generalizes these database-dump patterns to the MCP ecosystem, where LLM agents orchestrate bulk data access via tools instead of custom exfiltration malware alone. + +### ATT&CK / Mitigation Mapping +- **MITRE ATT&CK** + - **T1213 — Data from Information Repositories**: adversaries access and collect data from databases and other repositories; SAFE-T1803 maps MCP-driven database dumps directly to this technique [1][2][3]. + - **T1213.006 — Databases**: explicit sub-technique for accessing database content; SAFE-T1803 focuses on **bulk, table- or schema-level** export abuse through tools [2]. + - **T1074 — Data Staged**: SAFE-T1803 typically stages large query results as files, temporary tables, or objects for subsequent exfiltration [3]. + - **T1041 / T1567 — Exfiltration Over C2 / Web Services (Related)**: SAFE-T1803 often feeds staged dumps into HTTP, cloud, or SaaS channels used for exfiltration [4][5]. +- **MITRE Enterprise Mitigations** + - **M1041 — Data Loss Prevention**: schema-aware monitoring and policy enforcement at DB, storage, and egress layers to detect and block bulk exfiltration [8]. + - **M1018 — User Account Management**: least privilege for DB roles and service accounts used by MCP tools, including strong governance for production vs analytics access [12]. - **OWASP Top‑10 for LLM Applications (2025)** - - **LLM01/LLM03/LLM05** — Prompt injection, data supply-chain issues, and excessive agency can all lead to agents triggering overly broad queries or export/backup tools that dump production databases [4][9][13]. 
- ---- - -## Technical Description - -A typical MCP deployment connects an LLM host to one or more MCP servers that expose tools for interacting with real systems (databases, data warehouses, cloud storage, observability, etc.). SAFE-T1803 describes how those tools can be turned into **high-bandwidth data-theft capabilities**: - -1) **Wide-Table Data Extraction.** - An attacker-controlled prompt, compromised MCP server, or misconfigured policy causes the agent to call generic query tools (`sql_query`, `db.execute`, `run_query`) with broad statements such as `SELECT * FROM users` or time-unbounded queries across large tables. Without strict row limits, column masking, or schema constraints, a single query can return sensitive data on millions of customers, credentials, or transactions [1][2][3][6]. The tool may stream rows back to the LLM host or write them to files. - -2) **Abuse of Logical Backup and Dump Utilities.** - Some MCP tools wrap logical backup utilities or managed-DB admin APIs (for example, `pg_dump`, `mysqldump`, `BACKUP DATABASE`, export endpoints). If these are exposed as general-purpose tools (“backup database”, “export table”), an LLM agent can be convinced to run full-schema or full-database dumps, often compressed and ready for exfiltration [2][3][7][10]. - -3) **Cloud Analytics and Data-Lake Export.** - In cloud environments, MCP servers may have tools that read from managed warehouses or lakes (e.g., BigQuery, Snowflake, Redshift) and then export to object storage for downstream jobs. Attackers can redirect this machinery to bulk-export high-value datasets (user profiles, telemetry, logs) into attacker-chosen buckets or paths [3][4][11][14]. - -4) **Multi-Source Aggregation for Enrichment.** - Modern analytics often join across multiple sources. An attacker can chain multiple MCP queries and exports to build a **joined, enriched dataset** (for example, users + auth logs + payment history), then stage it as a single, exfil-ready artifact. 
- -5) **Staging and Exfiltration Chaining.** - Database dumps produced via MCP usually do not leave the environment immediately. Instead, they are staged as: - - Large tool results visible in agent logs or output panes. - - Files written to local disks, shared volumes, or object storage. - - Temporary tables or internal datasets in analytics platforms. - - Follow-on exfiltration techniques then move these staged artifacts out of the environment, often through HTTP, cloud APIs, or developer tooling [3][4][9][11]. - -**Stealth and Abuse of “Analytics” Semantics.** -Dangerous actions can be triggered by innocent-sounding prompts (“pull everything so we don’t miss edge cases”, “create a full backup for safety”, “export all historic data for churn analysis”), or by poisoned/compromised MCP servers that reinterpret benign arguments as bulk exports. Because these operations resemble legitimate analytics and reporting, robust **policy enforcement, role scoping, and volume-aware monitoring** are required to distinguish normal use from SAFE-T1803 [3][4][8][9][11]. - ---- - -## Architecture Diagram + - **LLM01 / LLM03 / LLM05** — prompt injection, data supply-chain issues, and excessive agency can all lead to agents triggering overly broad queries or export/backup tools that dump production databases [9][13]. + +## Attack Vectors +- **Primary Vector**: Wide-table or unbounded `SELECT` statements executed through generic MCP query tools (`sql_query`, `db.execute`, `run_query`) returning bulk rows to the LLM host or to disk. +- **Secondary Vectors**: + - Logical backup / dump utility wrappers exposed as MCP tools (e.g., `pg_dump`, `mysqldump`, `BACKUP DATABASE`, managed-DB export APIs). + - Cloud analytics and data-lake export tools (BigQuery, Snowflake, Redshift) redirected to bulk-export high-value datasets to attacker-chosen object storage. + - Multi-source aggregation chains that join production tables with logs/billing into a single enriched, exfil-ready dataset. 
+ - Staging via files, temporary tables, or object storage that is then moved by follow-on exfiltration techniques (HTTP, cloud SDK, SaaS). + +## Technical Details + +### Prerequisites +- MCP tools that can issue arbitrary SQL or invoke backup/export utilities against high-value datastores. +- DB or service-account credentials with broad read access (production OLTP, warehouses, lakes, log stores). +- Ability to influence agent behavior via prompts, tool metadata, configuration, or compromised MCP servers [4][9][13]. +- Optional: write access to file systems or object storage for staging dumps prior to exfiltration. + +### Attack Flow +1. **Recon**: Identify which MCP servers and tools are available to the LLM (generic SQL/DB tools, warehouse/lake query tools, export/backup utilities, HTTP/cloud SDK tools, file tools). Determine where high-value data lives and which roles/credentials the MCP tools use [1][2][3][11]. +2. **Gain Control of the Agent Path**: Use prompt injection, compromised MCP servers, stolen API keys, or misconfigured auto-approval policies to influence which tools the agent calls and with what arguments [4][9][13]. +3. **Discover Schema and Sensitivity**: Leverage database metadata queries (`information_schema`, sys catalogs, warehouse information functions) to enumerate tables, columns, and approximate sensitivity (`users`, `customers`, `payments`, `tokens`, `session_logs`). Use small sampled queries to validate data value. +4. **Weaponize "Analytics" and "Backup" Semantics**: Craft instructions like "for robust analysis, first export all historic data from these tables" or "take a full backup before making changes" that cause full-table SELECTs or dump utilities to run. +5. **Execute Dump and Stage Data**: Chain or loop MCP tool calls to run full-table or wide-filter queries, invoke dump/export utilities for entire databases or warehouses, and write results to files or object storage in convenient formats (CSV, Parquet, compressed archives). +6. 
**Exfiltrate and Cover Tracks**: Use HTTP/cloud tools, additional agents, or external automation to move staged dumps to attacker-controlled locations. Where possible, modify or delete logs, audit tables, and job history that would expose the excessive exports [3][4][8][10][11]. + +### Example Scenario +```json +{ + "tool": "sql_query", + "args": { + "sql": "SELECT * FROM users JOIN auth_logs USING (user_id) JOIN payments USING (user_id);", + "limit": null + }, + "context": "Agent prompt: 'For robust analysis, first export all historic customer activity so we don't miss edge cases.'", + "result": { + "row_count": 3142578, + "destination": "s3://analytics-temp/exports/full_users_join.parquet" + } +} +``` +### Architecture ```mermaid flowchart LR U[User / Attacker] -->|Prompt / Task| L[LLM Host] @@ -85,7 +86,7 @@ flowchart LR style S fill:#ffdddd,stroke:#ff0000,stroke-width:1px style X fill:#ffdddd,stroke:#ff0000,stroke-width:1px - subgraph Collection & Staging + subgraph Collection_and_Staging D W S @@ -96,235 +97,134 @@ flowchart LR end ``` ---- - -## Sub‑Techniques - -**SAFE‑T1803.001 — Full-Table SELECT Dump via MCP Query Tools.** -MCP query tools (`sql_query`, `db.execute`, etc.) execute broad SELECT statements (for example, `SELECT * FROM users`) or time-unbounded queries across high-value tables, returning large result sets to the LLM host or writing them to files. - -**SAFE‑T1803.002 — Logical Backup / Dump Utility Abuse.** -Wrapper tools around database backup utilities or managed-DB export APIs (for example, `pg_dump`, `mysqldump`, `BACKUP DATABASE`, “export table”) are invoked by agents to create full logical backups of schemas, databases, or warehouses, which are then staged for exfiltration. - -**SAFE‑T1803.003 — Analytics Export / Data-Lake Extraction.** -MCP analytics tools read from warehouses/lakes and export datasets (CSV, Parquet, Avro) to object storage. 
Attackers repurpose these capabilities to bulk-export PII, logs, or internal telemetry to attacker-controlled or weakly monitored buckets [3][4][11][14]. - -**SAFE‑T1803.004 — Multi-Source Enriched Dump.** -Agents orchestrate multiple tools and queries to join across sources (user profiles, auth logs, billing, support tickets), then write a **single enriched dump** containing a highly valuable composite dataset for downstream exfiltration. - ---- - -## Adversary Playbook (Procedures) - -**Recon.** -Identify which MCP servers and tools are available to the LLM: generic SQL/DB tools, warehouse/lake query tools, export/backup utilities, HTTP/cloud SDK tools, and file tools. Determine where high-value data lives (production OLTP DBs, analytics warehouses, data lakes, log storage, backups) and which roles/credentials the MCP tools use [1][2][3][11]. - -**Gain Control of the Agent Path.** -Use prompt injection, compromised MCP servers, stolen API keys, or misconfigured auto-approval policies to influence which tools the agent calls and with what arguments [4][9][13]. Compromise configuration or tool metadata so agents “helpfully” choose bulk export code paths. - -**Discover Schema and Sensitivity.** -Leverage database metadata queries (for example, `information_schema`, sys catalogs, warehouse information functions) to enumerate tables, columns, and approximate sensitivity (names like `users`, `customers`, `payments`, `tokens`, `session_logs`). Use small sampled queries to validate data value and distribution. - -**Weaponize “Analytics” and “Backup” Semantics.** -Craft instructions like “for robust analysis, first export all historic data from these tables”, or “take a full backup before making changes”, that cause full-table SELECTs or dump utilities to run. Poisoned templates, dashboards, or “playbooks” can embed these semantics for future sessions. 
- -**Execute Dump and Stage Data.** -Chain or loop MCP tool calls to: -- Run full-table or wide-filter queries against high-value schemas. -- Invoke dump/export utilities for entire databases or warehouses. -- Write results to files or object storage in convenient formats (CSV, Parquet, compressed archives). -- Place dumps in locations accessible to exfiltration tools (for example, specific buckets, shared volumes, or HTTP-reachable paths). - -**Exfiltrate and Cover Tracks.** -Use HTTP/cloud tools, additional agents, or external automation to move staged dumps to attacker-controlled locations (e.g., external buckets, SaaS endpoints, C2). Where possible, modify or delete logs, audit tables, and job history that would expose the excessive exports, sometimes overlapping with log deletion and configuration-tampering techniques [3][4][8][10][11]. - ---- +### Sub-Techniques +- **SAFE-T1803.001 — Full-Table SELECT Dump via MCP Query Tools**: MCP query tools (`sql_query`, `db.execute`) execute broad SELECT statements (e.g., `SELECT * FROM users`) or time-unbounded queries across high-value tables, returning large result sets to the LLM host or writing them to files. +- **SAFE-T1803.002 — Logical Backup / Dump Utility Abuse**: Wrapper tools around backup utilities or managed-DB export APIs (`pg_dump`, `mysqldump`, `BACKUP DATABASE`, "export table") are invoked by agents to create full logical backups of schemas, databases, or warehouses, then staged for exfiltration. +- **SAFE-T1803.003 — Analytics Export / Data-Lake Extraction**: MCP analytics tools read from warehouses/lakes and export datasets (CSV, Parquet, Avro) to object storage. Attackers repurpose these capabilities to bulk-export PII, logs, or internal telemetry to attacker-controlled or weakly monitored buckets [3][14]. 
+- **SAFE-T1803.004 — Multi-Source Enriched Dump**: Agents orchestrate multiple tools and queries to join across sources (user profiles, auth logs, billing, support tickets), then write a **single enriched dump** containing a highly valuable composite dataset for downstream exfiltration. -## Detection +### Advanced Attack Techniques +**Stealth and Abuse of "Analytics" Semantics.** Dangerous actions can be triggered by innocent-sounding prompts ("pull everything so we don't miss edge cases", "create a full backup for safety", "export all historic data for churn analysis"), or by poisoned/compromised MCP servers that reinterpret benign arguments as bulk exports. Because these operations resemble legitimate analytics and reporting, robust **policy enforcement, role scoping, and volume-aware monitoring** are required to distinguish normal use from SAFE-T1803 [3][4][8][9][11]. -### Signals & Heuristics +## Impact Assessment +- **Confidentiality**: Critical — bulk disclosure of PII, credentials, payment data, and proprietary records. +- **Integrity**: Low to Medium — primary impact is data theft, but log/audit tampering may follow to hide activity. +- **Availability**: Medium — large queries and exports can saturate DB/warehouse resources or trigger throttling. +- **Scope**: Network-wide — staged dumps frequently traverse storage, network, and SaaS boundaries en route to exfiltration. -**1. High-Risk Query and Export Patterns.** -Look for MCP-originated DB activity that includes: -- `SELECT * FROM` against large or sensitive tables (e.g., `users`, `customers`, `payments`, `sessions`, `auth_logs`). -- Full-schema or full-database export operations (e.g., `pg_dump`, `mysqldump`, `BACKUP DATABASE`, warehouse export jobs) triggered from service accounts associated with MCP or LLM agents [1][2][3][6]. -- Warehouse/lake exports of entire datasets to object storage in non-standard locations. 
+### Current Status (2025) +Database breach and bulk-export incidents continue to dominate breach reports [7], and analyst guidance increasingly highlights MCP tool abuse as a near-term escalation path for traditional data-theft TTPs. Cloud providers and security vendors recommend schema-aware DLP, query guardrails, and tight role separation as core defenses [3][8][11][14]. -**2. Unusual Volume and Cardinality.** -Sudden spikes in: -- Returned row counts for MCP query tools (for example, a jump from typical hundreds/thousands to millions). -- Result payload sizes (tens or hundreds of MB) for tool responses. -- Warehouse export jobs or object-storage writes originating from MCP-linked credentials [3][8][11]. +## Detection Methods -**3. Sensitive Schema Focus.** -Concentration of queries on tables with PII, auth, or payment data, especially when accessed: -- Outside normal analytics or maintenance windows. -- By service accounts or agents not normally associated with such workloads [1][2][11][12]. +### Indicators of Compromise (IoCs) +- MCP tool calls containing `SELECT * FROM` against large or sensitive tables (`users`, `customers`, `payments`, `sessions`, `auth_logs`). +- Tool invocations of `pg_dump`, `mysqldump`, `BACKUP DATABASE`, or warehouse export jobs from MCP-linked service accounts [1][2][3][6]. +- Sudden spikes in returned row counts (hundreds → millions) or response payload sizes (tens to hundreds of MB) for MCP query tools. +- Warehouse export jobs or object-storage writes originating from MCP-linked credentials to new or unusual destinations [3][8][11][14]. +- Concentration of queries on PII / auth / payment tables outside normal analytics windows or by accounts not normally associated with such workloads [12]. +- Creation of new buckets, paths, or folders that suddenly receive large dumps, especially with broader-than-usual ACLs. -**4. 
Staging in New or Unusual Destinations.** -Creation of new buckets, paths, or folders that suddenly receive large dumps, particularly where: -- ACLs are broader than usual (public or cross-account). -- The destination is rarely or never used in baseline workloads [3][8][11][14]. - -**5. Suspicious Timing and Context.** -Database dumps shortly after: -- New MCP tools, connectors, or roles are introduced. -- Credential or configuration changes involving DB or storage access. -- Alerts for compromise or abnormal behavior in the same environment. - -**6. Anomalous Agent Behavior.** -For LLM-based agents: -- Rapid escalation from narrow analytical questions to full-table reads and exports. -- “Backup everything” or “export all historic data” prompts appearing in proximity to unusual tool usage or policy violations. +### Detection Rules +**Important**: The following rule is provided in `detection-rule.yml` and contains example patterns only. Attackers continuously develop new exfiltration patterns. Organizations should: +- Use behavioral analytics to baseline normal MCP query and export volumes per service account. +- Correlate MCP tool logs with database audit logs, warehouse query history, and object-storage access logs. +- Update detection rules based on threat intelligence and red-team findings. ### Log Sources - -- **MCP Tool Invocation Logs.** Tool name, arguments, results, timestamps, calling user/agent, MCP server ID. -- **Database Audit Logs.** Query text, normalized SQL, execution context (user, app, IP), affected tables, row counts, and backup/export operations. -- **Warehouse / Data-Lake Logs.** Query history, export job logs, destination URIs, dataset sizes. -- **Cloud Provider Logs.** Storage API logs (object writes, exports, bucket and ACL changes), cross-account sharing events. -- **Network and Proxy Logs.** Egress patterns from MCP servers, especially large outbound transfers to previously unseen endpoints. 
-- **Security/Monitoring Logs.** DLP events, anomaly detection signals, SIEM alerts related to DB/warehouse/export behaviors. - -### Example Analytic - +- **MCP Tool Invocation Logs.** Tool name, arguments, results, timestamps, calling user/agent, MCP server ID. +- **Database Audit Logs.** Query text, normalized SQL, execution context (user, app, IP), affected tables, row counts, and backup/export operations. +- **Warehouse / Data-Lake Logs.** Query history, export job logs, destination URIs, dataset sizes. +- **Cloud Provider Logs.** Storage API logs (object writes, exports, bucket and ACL changes), cross-account sharing events. +- **Network and Proxy Logs.** Egress patterns from MCP servers, especially large outbound transfers to previously unseen endpoints. +- **Security / Monitoring Logs.** DLP events, anomaly detection signals, SIEM alerts related to DB / warehouse / export behaviors. + +### Worked Example Detect **MCP tool invocations** that: -1. Use DB or warehouse query/export tools **AND** -2. Contain bulk-access patterns in arguments (`SELECT * FROM`, export/backup job parameters referring to entire tables or datasets) **AND/OR** +1. Use DB or warehouse query/export tools **AND** +2. Contain bulk-access patterns in arguments (`SELECT * FROM`, export/backup job parameters referring to entire tables or datasets) **AND/OR** 3. Are followed within a short window by: - - Large result sizes or row counts in MCP tool logs and/or - - Warehouse export jobs writing large datasets to storage and/or - - Object-storage writes of large files to unusual paths or buckets + - Large result sizes or row counts in MCP tool logs, and/or + - Warehouse export jobs writing large datasets to storage, and/or + - Object-storage writes of large files to unusual paths or buckets. Combine this with filters for non-maintenance time windows, non-DBA service accounts, and new/unusual destinations to prioritize likely malicious or uncontrolled incidents [1][3][8][11][14]. 
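
The worked example above can be sketched as a small filter over MCP tool-invocation records. This is a minimal illustration only, not the contents of `detection-rule.yml`: the record fields (`tool`, `arguments`, `row_count`, `result_bytes`, `destination`), the thresholds, and the list of known destination prefixes are all hypothetical assumptions, and the "AND/OR" in the analytic is read here as "query tool AND (bulk pattern OR a volume/destination follow-up signal)".

```python
import re
from dataclasses import dataclass

# All names and thresholds below are illustrative assumptions; tune them
# against your own baselines rather than treating them as recommended values.
QUERY_TOOLS = {"sql_query", "db.execute", "run_query"}
BULK_PATTERNS = [
    re.compile(r"select\s+\*\s+from", re.IGNORECASE),
    re.compile(r"\b(pg_dump|mysqldump)\b", re.IGNORECASE),
    re.compile(r"backup\s+database", re.IGNORECASE),
]
ROW_THRESHOLD = 1_000_000           # assumed "millions of rows" cutoff
BYTE_THRESHOLD = 50 * 1024 * 1024   # assumed "tens of MB" cutoff


@dataclass
class ToolInvocation:
    """Hypothetical shape of one MCP tool-invocation log record."""
    tool: str
    arguments: str
    row_count: int = 0
    result_bytes: int = 0
    destination: str = ""


def is_suspected_dump(event, known_prefixes):
    """Flag query-tool calls with bulk patterns or dump-like follow-up signals."""
    # Condition 1: only DB / warehouse query and export tools are in scope.
    if event.tool not in QUERY_TOOLS:
        return False
    # Condition 2: bulk-access pattern in the tool arguments.
    bulk_pattern = any(p.search(event.arguments) for p in BULK_PATTERNS)
    # Condition 3: large result, or a write to a destination outside baseline.
    large_result = (
        event.row_count >= ROW_THRESHOLD or event.result_bytes >= BYTE_THRESHOLD
    )
    unusual_destination = bool(event.destination) and not event.destination.startswith(
        tuple(known_prefixes)
    )
    return bulk_pattern or large_result or unusual_destination
```

In practice the time-window, service-account, and maintenance-window filters mentioned above would be applied on top of this check, typically in the SIEM rather than in application code.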
---- - -## Mitigations - -Each block heading carries a single mitigation tag, following SAFE‑T100x style. - -### Data Access Governance — Mitigation: SAFE‑M‑10: Principle of Least Privilege - -- Implement strict role separation between: - - **Production OLTP vs analytics** use cases. - - **Human DBAs** vs **MCP/LLM agents**. -- Limit MCP tools to: - - Read-only access where possible. - - Narrow schemas, views, and stored procedures, rather than raw tables [1][2][3][11]. -- Enforce strong approval and review processes for any MCP tool that can access payment, auth, PII, or log/audit tables. - -### Query Guardrails & Result Limits — Mitigation: SAFE‑M‑21: Policy Enforcement & Output Isolation - -- Implement query guardrails: - - Reject or require approval for `SELECT *` on large tables. - - Enforce hard row/size limits; require explicit overrides for bulk exports. - - Prefer parameterized, pre-vetted stored procedures over arbitrary SQL [3][8][11]. -- On the LLM host, integrate policy engines to: - - Inspect proposed tool calls. - - Block or downgrade risky queries (for example, full-table scans, no `WHERE` clause). - -### Output Control & Data Loss Prevention — Mitigation: SAFE‑M‑3: Data Loss Prevention - -- Apply DLP controls that understand: - - Schemas, sensitivity labels, and column-level classifications. - - Data movement patterns out of databases, warehouses, and object storage [3][8][11]. -- Mask or tokenize sensitive columns in MCP query results by default; provide full fidelity only in strictly controlled workflows. -- Use sampling, aggregation, and summarization instead of raw row dumps where possible. - -### Logging, Telemetry & Alerting — Mitigation: SAFE‑M‑12: Centralized Logging - -- Centralize DB, warehouse, object storage, MCP tool, and network logs into SIEM or similar systems [3][8][11][12]. -- Use tamper-resistant logging (e.g., write-once or separate logging accounts) for high-value audit streams. 
-- Treat: - - New/unusual export destinations, - - Large full-table scans, and - - Unplanned backup/export operations - as high-severity alerts. - -### Human-in-the-Loop & Approvals — Mitigation: SAFE‑M‑20: Human Oversight - -- Require explicit human approvals for: - - Any query or export projected to exceed row/size thresholds. - - Full database or schema export jobs initiated from MCP tools. -- Present **clear, human-readable impact summaries** before approval (“This export will contain ~3.1M user records from 5 tables”). -- Log approval details (who, when, justification) for attestation and post-incident analysis. - -### Secure MCP Server & Connector Implementation — Mitigation: SAFE‑M‑16: Environment Hardening - -- Harden MCP servers and DB/warehouse connectors: - - Validate and sanitize all user-provided SQL parameters. - - Prevent prompt-driven arbitrary SQL where possible; prefer constrained query templates. - - Use secure network paths (TLS, private links) and avoid exposing DBs directly to the internet [2][3][11][14]. -- Align with OWASP LLM and GenAI security guidance, including secure design of plugins/tools and defense against prompt injection and excessive agency [4][9][13]. - ---- - -## Validation - -**Staging Only (Non-Production).** -Conduct validation in isolated environments with synthetic or non-critical data: - -- **Full-Table SELECT Simulation.** - Use MCP query tools to run wide-range queries against synthetic large tables and confirm that: - - DB and MCP logs capture row counts, query text, and tool metadata. - - Detection rules fire at expected thresholds, while normal small queries remain low-noise. - -- **Logical Dump / Backup Simulation.** - In test databases and warehouses, simulate backup/export jobs and ensure: - - DB/warehouse audit logs capture the operations. - - MCP tool logs link back to agent sessions. - - Detection rules and approvals intercept unauthorized “full backup” attempts. 
- -- **Analytics Export / Data-Lake Simulation.** - Use test datasets and buckets; simulate large-scale exports and confirm that cloud logs, SIEM alerts, and MCP tool logs all correlate. - -- **End-to-End Exfiltration Exercises.** - Integrate SAFE-T1803 scenarios into red team and purple team exercises, followed by exfiltration techniques, focusing on how quickly defenders can detect, stop, and investigate bulk database dumps [3][8][11][14]. - -- **Chaos and Recovery Drills.** - Combine SAFE-T1803 scenarios with resilience tests (e.g., “What happens if a full analytics dataset is dumped and exfiltrated?”). Validate that data-loss and breach-response playbooks are mature. - ---- +### Rule Validation +Validate the Sigma rule and surrounding detection pipeline in a staging environment with synthetic or non-critical data before production rollout: + +- **Full-Table SELECT Simulation.** Run wide-range queries against synthetic large tables through MCP query tools. Confirm that DB and MCP logs capture row counts, query text, and tool metadata, and that detection rules fire at expected thresholds while normal small queries remain low-noise. +- **Logical Dump / Backup Simulation.** Simulate `pg_dump` / `mysqldump` / `BACKUP DATABASE` / warehouse export jobs against test databases and ensure DB/warehouse audit logs capture the operations, MCP tool logs link back to agent sessions, and detection rules or approvals intercept unauthorized "full backup" attempts. +- **Analytics Export / Data-Lake Simulation.** Use test datasets and buckets; simulate large-scale exports and confirm cloud logs, SIEM alerts, and MCP tool logs all correlate to the same agent session. +- **End-to-End Exfiltration Exercises.** Integrate SAFE-T1803 scenarios into red-team / purple-team exercises to measure how quickly defenders can detect, stop, and investigate bulk database dumps [3][8][11][14]. 
+- **Chaos & Recovery Drills.** Combine SAFE-T1803 scenarios with resilience tests (e.g., "What happens if a full analytics dataset is dumped and exfiltrated?") to validate that data-loss and breach-response playbooks are mature.
+
+### Behavioral Indicators
+- Rapid escalation from narrow analytical questions to full-table reads and exports within a single agent session.
+- "Backup everything" or "export all historic data" prompts appearing in proximity to unusual tool usage or policy violations.
+- Database dumps shortly after new MCP tools, connectors, or roles are introduced, or after credential/configuration changes involving DB or storage access.
+
+## Mitigation Strategies
+
+### Preventive Controls
+1. **[SAFE-M-29: Explicit Privilege Boundaries](../../mitigations/SAFE-M-29/README.md)**: Enforce strict role separation between production OLTP vs analytics, and between human DBAs vs MCP/LLM agents. Limit MCP tools to read-only access on narrow schemas, views, and stored procedures rather than raw tables [1][2][3][11].
+2. **[SAFE-M-16: Token Scope Limiting](../../mitigations/SAFE-M-16/README.md)**: Issue MCP tool credentials with the minimum scope required (specific schemas, row-limit quotas on the DB side); avoid sharing high-privilege DBA roles with agent service accounts [12].
+3. **[SAFE-M-71: Query Guardrails & Result Limits](../../mitigations/SAFE-M-71/README.md)**: Reject or require approval for `SELECT *` on large tables; block full-table scans without `WHERE` clauses; prefer parameterized, pre-vetted stored procedures over arbitrary SQL; enforce hard row/byte caps. The policy engine inspects proposed tool calls before dispatch; the model cannot bypass the enforcement layer [3][8][11].
+4. **[SAFE-M-23: Tool Output Truncation](../../mitigations/SAFE-M-23/README.md)**: Enforce hard row/byte caps on MCP tool responses; require explicit, audited overrides for bulk exports.
+5. **[SAFE-M-72: Data Loss Prevention on Tool Outputs](../../mitigations/SAFE-M-72/README.md)**: Apply column-level classification, masking, tokenization, and sensitive-content scanning to MCP tool outputs before they reach the agent context. Protects against bulk disclosure of PII, payment data, secrets, and other classified content — including cases where query guardrails were satisfied but the result itself contains sensitive records [3][8].
+6. **[SAFE-M-14: Server Allowlisting](../../mitigations/SAFE-M-14/README.md)**: Restrict which MCP servers and connectors can attach to high-value databases and warehouses, reducing exposure to compromised or unvetted tools.
+7. **[SAFE-M-9: Sandboxed Testing](../../mitigations/SAFE-M-9/README.md)**: Validate new MCP tools and query templates in isolated environments with synthetic data before production rollout.
+
+### Detective Controls
+1. **[SAFE-M-12: Audit Logging](../../mitigations/SAFE-M-12/README.md)**: Centralize DB, warehouse, object-storage, MCP tool, and network logs into SIEM with tamper-resistant write-once or separate logging accounts for high-value audit streams [3][8][11][12].
+2. **[SAFE-M-11: Behavioral Monitoring](../../mitigations/SAFE-M-11/README.md)**: Alert on full-table scans, unplanned backup/export operations, and writes to new/unusual export destinations.
+3. **[SAFE-M-70: Tool-Invocation Anomaly Detection & Baselining](../../mitigations/SAFE-M-70/README.md)**: Profile typical row counts, payload sizes, and destinations per MCP service account and tool; alert on volume and cardinality spikes (for example, a jump from thousands to millions of rows, or writes to a bucket never used in baseline).
+4. **[SAFE-M-10: Automated Scanning](../../mitigations/SAFE-M-10/README.md)**: Continuously scan MCP tool invocation logs for dump-like SQL patterns and export commands.
+
+### Response Procedures
+1. **Immediate Actions**:
+   - Suspend the offending MCP session and revoke or rotate the DB/warehouse/storage credentials used by the agent.
+   - Quarantine staged dump files and freeze access to destination buckets/paths.
+2. **Investigation Steps**:
+   - Reconstruct the agent session: prompts, tool calls, arguments, returned row counts, and destination URIs.
+   - Correlate MCP tool logs with DB audit logs, warehouse query history, and object-storage events to confirm scope of data exposed.
+   - Identify whether the trigger was prompt injection, compromised tooling, credential abuse, or policy misconfiguration [4][9][13].
+3. **Remediation**:
+   - Tighten role scope, query guardrails, and approval flows on the affected tools.
+   - Notify data-protection and incident-response stakeholders; trigger breach-response playbooks if regulated data was exposed [10].
+   - Add detection signatures and red-team test cases for the observed pattern.
 ## Related Techniques
-
-**ATT&CK:**
-
-- **T1213 – Data from Information Repositories.**
-- **T1213.006 – Databases.**
-- **T1074 – Data Staged.**
-- **T1041 / T1567 – Exfiltration Over C2 / Web Services.**
-
-
-
----
+- [SAFE-T1102](../SAFE-T1102/README.md): Prompt Injection — common trigger for unintended bulk queries.
+- [SAFE-T1505](../SAFE-T1505/README.md): In-Memory Secret Extraction — complementary credential-focused exfiltration via MCP.
+- [SAFE-T1303](../SAFE-T1303/README.md): Container Sandbox Escape via Runtime Exec — adjacent privilege-escalation pattern that may stage follow-on dumps.
 ## References
-
-[1] MITRE ATT&CK, T1213 – Data from Information Repositories. https://attack.mitre.org/techniques/T1213/
-[2] MITRE ATT&CK, T1213.006 – Data from Information Repositories: Databases. https://attack.mitre.org/techniques/T1213/006/
-[3] MITRE ATT&CK, T1074 – Data Staged. https://attack.mitre.org/techniques/T1074/
-[4] MITRE ATT&CK, T1041 – Exfiltration Over C2 Channel. https://attack.mitre.org/techniques/T1041/
-[5] MITRE ATT&CK, T1567 – Exfiltration Over Web Service. https://attack.mitre.org/techniques/T1567/
-[6] MITRE ATT&CK, Database Exfiltration Case Studies (Enterprise ATT&CK examples). https://attack.mitre.org/
-[7] Verizon, Data Breach Investigations Reports (DBIR) – database and credential breaches. https://www.verizon.com/business/resources/reports/dbir/
-[8] MITRE D3FEND, Data Loss Prevention & Staging Defenses. https://d3fend.mitre.org/
-[9] OWASP, Top‑10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/
-[10] CISA, “Protecting Sensitive and Personal Information from Ransomware-Caused Data Breaches,” Fact Sheet. https://www.cisa.gov/sites/default/files/publications/CISA_Fact_Sheet-Protecting_Sensitive_and_Personal_Information_from_Ransomware-Caused_Data_Breaches-508C.pdf
-[11] NSA & CISA, “Secure Data in the Cloud,” Cybersecurity Information Sheet (CSI), March 2024.
-https://media.defense.gov/2024/Mar/07/2003407862/-1/-1/0/CSI-CLOUDTOP10-SECURE-DATA.PDF
-[12] MITRE ATT&CK, M1018 – User Account Management. https://attack.mitre.org/mitigations/M1018/
-[13] OWASP GenAI Security Project. https://owasp.org/www-project-top-10-for-large-language-model-applications/
-[14] Google Cloud, “4 steps to stop data exfiltration with Google Cloud.” https://cloud.google.com/blog/products/identity-security/4-steps-to-stop-data-exfiltration-with-google-cloud
-
-
----
+- [1] [MITRE ATT&CK T1213 — Data from Information Repositories](https://attack.mitre.org/techniques/T1213/)
+- [2] [MITRE ATT&CK T1213.006 — Databases](https://attack.mitre.org/techniques/T1213/006/)
+- [3] [MITRE ATT&CK T1074 — Data Staged](https://attack.mitre.org/techniques/T1074/)
+- [4] [MITRE ATT&CK T1041 — Exfiltration Over C2 Channel](https://attack.mitre.org/techniques/T1041/)
+- [5] [MITRE ATT&CK T1567 — Exfiltration Over Web Service](https://attack.mitre.org/techniques/T1567/)
+- [6] [MITRE ATT&CK Enterprise — Database Exfiltration Case Studies (index)](https://attack.mitre.org/)
+- [7] [Verizon Data Breach Investigations Report (DBIR)](https://www.verizon.com/business/resources/reports/dbir/)
+- [8] [MITRE D3FEND — Defensive Countermeasures Knowledge Graph](https://d3fend.mitre.org/)
+- [9] [OWASP Top 10 for Large Language Model Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
+- [10] [CISA Fact Sheet — Protecting Sensitive and Personal Information from Ransomware-Caused Data Breaches](https://www.cisa.gov/sites/default/files/publications/CISA_Fact_Sheet-Protecting_Sensitive_and_Personal_Information_from_Ransomware-Caused_Data_Breaches-508C.pdf)
+- [11] [NSA & CISA — Top 10 Cloud Security Mitigation Strategies: Secure Data in the Cloud (CSI, March 2024)](https://media.defense.gov/2024/Mar/07/2003407862/-1/-1/0/CSI-CLOUDTOP10-SECURE-DATA.PDF)
+- [12] [MITRE ATT&CK M1018 — User Account Management](https://attack.mitre.org/mitigations/M1018/)
+- [13] [OWASP GenAI Security Project (LLM Top 10 hub)](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
+- [14] [Google Cloud — 4 Steps to Stop Data Exfiltration with Google Cloud](https://cloud.google.com/blog/products/identity-security/4-steps-to-stop-data-exfiltration-with-google-cloud)
+
+## MITRE ATT&CK Mapping
+- [T1213 — Data from Information Repositories](https://attack.mitre.org/techniques/T1213/)
+- [T1213.006 — Databases](https://attack.mitre.org/techniques/T1213/006/)
+- [T1074 — Data Staged](https://attack.mitre.org/techniques/T1074/)
+- [T1041 — Exfiltration Over C2 Channel](https://attack.mitre.org/techniques/T1041/)
+- [T1567 — Exfiltration Over Web Service](https://attack.mitre.org/techniques/T1567/)
 ## Version History
-| Version | Date | Changes | Author |
-|---------|------|---------|--------|
-| 1.0 | 2025-11-29 | SAFE-T1803 database dump via MCP tools, sub-techniques, detections, mitigations, and references | Pratikshya Regmi |
-
----
+| Version | Date | Changes | Author |
+|---------|------------|--------------------------------------------------------------------------------------------------|--------------------|
+| 1.0 | 2025-11-29 | SAFE-T1803 database dump via MCP tools, sub-techniques, detections, mitigations, and references | Pratikshya Regmi |
+| 1.1 | 2026-04-14 | Restructured to standard technique template; corrected SAFE-M IDs; normalized references | bishnu bista |
diff --git a/techniques/SAFE-T1803/detection-rule.yml b/techniques/SAFE-T1803/detection-rule.yml
index 7e9d9070..c2af28b8 100644
--- a/techniques/SAFE-T1803/detection-rule.yml
+++ b/techniques/SAFE-T1803/detection-rule.yml
@@ -1,75 +1,96 @@
+# SAFE-T1803: Database Dump Detection Rule
+# This rule detects bulk database exfiltration patterns initiated through MCP-exposed
+# SQL or backup/export tools (full-table SELECTs, dump utilities, or unusually large
+# result sets against high-value tables).
+# Note: This is an example rule. Tune thresholds and tool/account names to your environment.
+
+title: MCP Database Dump via SQL or Backup Tool Detection
 id: 9d4e1e5b-8825-4cb0-9a7c-6d8322c5d21f
-name: SAFE-T1803 Database Dump via MCP SQL Tool
-description: >
-  Detects suspicious database dump activity initiated through MCP-exposed SQL tools,
-  including full-table SELECT queries, backup/dump commands, and unusually large result
-  sets against high-value tables.
 status: experimental
-
-log_sources:
-  - product: mcp
-    service: tool_invocation
-    definition: MCP host logs for tool calls and responses (including tool args and result metadata).
-  - product: database
-    service: query_audit
-    definition: Database audit logs capturing query text, execution context, and row counts.
-
+description: Detects suspicious database dump activity initiated through MCP-exposed SQL or backup/export tools, including full-table SELECT queries, logical backup/dump commands, and unusually large result sets against high-value tables.
+author: Pratikshya Regmi
+date: 2025-11-29
+modified: 2026-04-14
+references:
+  - https://github.com/safe-mcp/techniques/SAFE-T1803
+  - https://attack.mitre.org/techniques/T1213/006/
+  - https://attack.mitre.org/techniques/T1074/
+logsource:
+  product: mcp
+  service: tool_invocation
+  category: application
 detection:
-  # Events from MCP host indicating a database-capable tool is being used.
-  mcp_sql_tool:
+  # MCP tool calls invoking a database-capable tool.
+  selection_db_tool:
     tool.name:
-      - "sql.query"
-      - "sql.execute"
-      - "db.query"
-      - "db.execute"
-      - "cloud-sql.query"
-      - "cloud-sql.execute"
+      - 'sql.query'
+      - 'sql.execute'
+      - 'db.query'
+      - 'db.execute'
+      - 'cloud-sql.query'
+      - 'cloud-sql.execute'
+      - 'warehouse.query'
+      - 'warehouse.export'
-  # SQL patterns commonly associated with database dumps or bulk exports.
-  mcp_sql_suspicious_pattern:
+  # SQL or argument patterns commonly associated with bulk database dumps or exports.
+  selection_dump_pattern:
     tool.args.sql|contains:
-      - "SELECT * FROM"
-      - "pg_dump"
-      - "mysqldump"
-      - "BACKUP DATABASE"
-      - "EXPORT DATA"
-      - "COPY "
-      - " TO STDOUT"
-      - " INTO OUTFILE"
+      - 'SELECT * FROM'
+      - 'pg_dump'
+      - 'mysqldump'
+      - 'BACKUP DATABASE'
+      - 'EXPORT DATA'
+      - 'COPY '
+      - ' TO STDOUT'
+      - ' INTO OUTFILE'
   # Optional: high row-count or payload size returned to the LLM host.
-  mcp_large_result:
-    tool.result.row_count: ">=100000"  # tune threshold for your environment
-    # or use a size field if available:
-    # tool.result.size_bytes: ">=104857600"  # 100MB+
+  selection_large_result:
+    tool.result.row_count|gte: 100000
-  # Optional: database-side corroboration for the same service account.
-  db_full_table_scan:
-    db.log.user: "mcp_service_account"  # tune to your MCP DB account
+  selection_large_payload:
+    tool.result.size_bytes|gte: 104857600  # 100 MB
+
+  # Optional: database-side corroboration for an MCP-linked service account.
+  selection_db_corroboration:
+    db.log.user|contains: 'mcp'
     db.log.query|contains:
-      - "SELECT * FROM"
-    db.log.estimated_rows: ">=100000"
+      - 'SELECT * FROM'
+      - 'pg_dump'
+      - 'mysqldump'
+      - 'BACKUP DATABASE'
+    db.log.estimated_rows|gte: 100000
+
+  condition: selection_db_tool and (selection_dump_pattern or selection_large_result or selection_large_payload or selection_db_corroboration)
+
+fields:
+  - tool.name
+  - tool.args.sql
+  - tool.result.row_count
+  - tool.result.size_bytes
+  - tool.result.destination
+  - session.id
+  - user.id
+  - db.log.user
+  - db.log.query
+  - db.log.estimated_rows
-  timeframe: 5m
+falsepositives:
+  - Approved analytics or reporting jobs that legitimately scan large tables
+  - Scheduled DBA-initiated backups or warehouse exports
+  - Data engineering pipelines that intentionally use COPY / EXPORT for ETL
+  - Load-testing or performance benchmarking against synthetic datasets
-  # Core condition: MCP SQL tool + (dump-like SQL OR unusually large results),
-  # optionally correlated with DB audit logs if available.
-  condition: >
-    mcp_sql_tool and
-    (
-      mcp_sql_suspicious_pattern
-      or mcp_large_result
-      or db_full_table_scan
-    )
+level: high
 tags:
-  - attack.T1213
-  - attack.T1213.006
-  - attack.T1074
-  - attack.T1020
-  - safemcp.safe-t1803
-  - tactic.collection
-  - category.database
-  - category.data-dump
-  - product.mcp
+  - attack.collection
+  - attack.exfiltration
+  - attack.t1213
+  - attack.t1213.006
+  - attack.t1074
+  - attack.t1041
+  - attack.t1567
+  - safe.t1803
+  - mcp.security
   - mcp.sql
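
As a reading aid for the updated rule, the following is a minimal Python sketch of how its `condition` could be evaluated against a single event. It is not part of the PR and not a Sigma backend. It assumes, hypothetically, that each event is a flat dict keyed by the rule's dotted field names and that `contains` is matched case-insensitively (Sigma's default for strings). The function and constant names are illustrative only.

```python
# Sketch only: mirrors the rule's selections under the assumptions stated above.
DB_TOOLS = {
    "sql.query", "sql.execute", "db.query", "db.execute",
    "cloud-sql.query", "cloud-sql.execute", "warehouse.query", "warehouse.export",
}
DUMP_PATTERNS = [
    "SELECT * FROM", "pg_dump", "mysqldump", "BACKUP DATABASE",
    "EXPORT DATA", "COPY ", " TO STDOUT", " INTO OUTFILE",
]
DB_LOG_PATTERNS = ["SELECT * FROM", "pg_dump", "mysqldump", "BACKUP DATABASE"]
ROW_THRESHOLD = 100_000
BYTE_THRESHOLD = 104_857_600  # 100 MB


def _contains_any(value, needles):
    """Case-insensitive substring match against any needle."""
    text = str(value or "").lower()
    return any(needle.lower() in text for needle in needles)


def _gte(value, threshold):
    """True only for numeric values at or above the threshold."""
    return isinstance(value, (int, float)) and value >= threshold


def matches_rule(event):
    """selection_db_tool and (dump pattern or large result/payload or DB corroboration)."""
    if event.get("tool.name") not in DB_TOOLS:
        return False
    dump_pattern = _contains_any(event.get("tool.args.sql"), DUMP_PATTERNS)
    large_result = _gte(event.get("tool.result.row_count"), ROW_THRESHOLD)
    large_payload = _gte(event.get("tool.result.size_bytes"), BYTE_THRESHOLD)
    db_corroboration = (
        _contains_any(event.get("db.log.user"), ["mcp"])
        and _contains_any(event.get("db.log.query"), DB_LOG_PATTERNS)
        and _gte(event.get("db.log.estimated_rows"), ROW_THRESHOLD)
    )
    return dump_pattern or large_result or large_payload or db_corroboration
```

One design point the sketch makes visible: the `db.log.*` fields can only match if the pipeline enriches MCP tool-invocation events with database audit data, since `logsource` names only the MCP product. Without that enrichment (assumed here, not stated in the rule), `selection_db_corroboration` would never contribute to a match.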