robertsaghafi/DSPM-Discovery-Simulator

📡 DSPM-Discovery-Simulator

Problem Statement

Organizations cannot validate whether their Data Security Posture Management (DSPM) tools actually work without uploading real, sensitive production data, which creates a chicken-and-egg security risk: testing the tool means exposing the data it is supposed to protect. Security teams need a way to test classification accuracy, data discovery coverage, and alert tuning without exposing actual PII, PHI, or financial data.

Proposed Solution

A Python utility that generates "High-Fidelity Synthetic PII." It creates files that look like real customer data to a scanner but contain zero actual sensitive information. The tool generates realistic patterns for SSNs, credit cards, email addresses, phone numbers, and medical record numbers across multiple file formats.
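As a rough illustration of what "looks real to a scanner but is not real" can mean, the sketch below builds two of the listed patterns with the standard library only. It is not the project's implementation: the function names are hypothetical, and the Faker library named in the suggested stack would be a natural replacement. It assumes a scanner checks card numbers with the Luhn checksum, and it uses the SSN area number 666, which the SSA does not assign, so the value cannot belong to a real person. Some scanners may reject that area number, so treat the choice as a trade-off rather than a recommendation.

```python
import random


def luhn_check_digit(partial: str) -> int:
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the rightmost digit of the partial number is doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_valid(number: str) -> bool:
    """Check a complete digit string against the Luhn checksum."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


def synthetic_card(rng: random.Random, prefix: str = "4") -> str:
    """Build a random 16-digit number that passes the Luhn check."""
    body = prefix + "".join(str(rng.randrange(10)) for _ in range(15 - len(prefix)))
    return body + str(luhn_check_digit(body))


def synthetic_ssn(rng: random.Random) -> str:
    """Build an SSN-shaped string in the unassigned 666 area (assumption)."""
    return f"666-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    print(synthetic_card(rng), synthetic_ssn(rng))
```

Seeding the generator keeps every run reproducible, which matters later when scanner findings are compared against a recorded ground truth.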

Why It Matters

The simulator validates tool effectiveness and classification accuracy in a controlled environment. Security teams can benchmark DSPM solutions (e.g., BigID, Microsoft Purview, Wiz) before purchasing, or tune existing deployments, without compliance risk.

MVP Scope

  • Script to generate 1,000 rows of synthetic data in CSV, JSON, and Parquet formats.
  • Configurable "data profiles" (e.g., Healthcare, Financial Services, Retail).
  • A "Ground Truth" manifest file that documents what sensitive data patterns exist in each generated file.
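The three MVP items above could fit together roughly as follows. This is a minimal sketch under stated assumptions, not the repository's generate_test_data.py: the profile names, column names, value formats, and manifest layout are all hypothetical, and it uses only the standard library, so it writes CSV and JSON but not Parquet (which would need Pandas plus a Parquet engine). Emails use example.com and phone numbers use the 555-01XX range, both reserved for documentation and fiction.

```python
import csv
import json
import random
import tempfile
from pathlib import Path

# Hypothetical data profiles: profile name -> sensitive columns it contains.
PROFILES = {
    "healthcare": ["email", "phone", "mrn"],
    "financial": ["email", "phone", "account_id"],
}


def make_value(field: str, rng: random.Random, i: int) -> str:
    """Return one fake value for a column; every format here is illustrative."""
    if field == "email":
        return f"user{i}@example.com"
    if field == "phone":
        return f"202-555-01{rng.randint(0, 99):02d}"
    if field == "mrn":
        return f"MRN-{rng.randint(0, 9_999_999):07d}"
    if field == "account_id":
        return f"ACCT-{rng.randint(0, 99_999_999):08d}"
    raise ValueError(f"unknown field: {field}")


def build_rows(profile: str, n: int, seed: int = 0) -> list[dict]:
    """Generate n rows for a profile; the same seed gives the same rows."""
    rng = random.Random(seed)
    fields = PROFILES[profile]
    return [{"row_id": i, **{f: make_value(f, rng, i) for f in fields}} for i in range(n)]


def write_dataset(profile: str, n: int, out_dir: Path, seed: int = 0) -> dict:
    """Write CSV and JSON files plus a ground-truth manifest; return the manifest."""
    rows = build_rows(profile, n, seed)
    out_dir.mkdir(parents=True, exist_ok=True)
    fieldnames = ["row_id", *PROFILES[profile]]

    with (out_dir / f"{profile}.csv").open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    (out_dir / f"{profile}.json").write_text(json.dumps(rows, indent=2))

    # Every row carries one value per sensitive column, so each count equals n.
    counts = {f: n for f in PROFILES[profile]}
    manifest = {
        "profile": profile,
        "seed": seed,
        "row_count": n,
        "files": {
            f"{profile}.csv": {"format": "csv", "sensitive_columns": counts},
            f"{profile}.json": {"format": "json", "sensitive_columns": counts},
        },
    }
    (out_dir / "ground_truth_manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(json.dumps(write_dataset("healthcare", 1000, Path(tmp)), indent=2))
```

Recording the seed and per-column counts in the manifest gives a scanner's findings something concrete to be compared against: a column the tool misses, or a count that differs, shows up as a mismatch.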

Suggested Stack

  • Python
  • Faker Library
  • Pandas

Deliverables

  • generate_test_data.py
  • ground_truth_manifest.json
  • sample_datasets/ folder with pre-generated test files
  • README.md with usage examples

Expansion Path

  • Adding "obfuscated" data patterns to test advanced ML-based classification.
  • Support for unstructured data (PDFs, DOCX files with embedded PII).
  • API endpoint to generate data on-demand for CI/CD testing.
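The first expansion item does not say what "obfuscated" patterns would look like. One possible reading, shown here purely as an assumption, is to emit the same synthetic value in several surface forms so that a scanner's pattern matching can be tested beyond the canonical format. The function names and variant labels below are hypothetical.

```python
def ssn_variants(ssn: str) -> dict[str, str]:
    """Return alternative renderings of an SSN-shaped string like 666-12-3456."""
    area, group, serial = ssn.split("-")
    digits = area + group + serial
    return {
        "canonical": ssn,
        "no_separator": digits,
        "dots": ".".join([area, group, serial]),
        "spaces": " ".join([area, group, serial]),
        "spaced_digits": " ".join(digits),  # one space between every digit
    }


def email_variants(email: str) -> dict[str, str]:
    """Return alternative renderings of an email address."""
    user, domain = email.split("@", 1)
    return {
        "canonical": email,
        "bracketed": f"{user} [at] {domain.replace('.', ' [dot] ')}",
        "upper": email.upper(),
    }


if __name__ == "__main__":
    for label, value in ssn_variants("666-12-3456").items():
        print(f"{label:>14}: {value}")
    for label, value in email_variants("user1@example.com").items():
        print(f"{label:>14}: {value}")
```

Keeping the variant label next to each value would let a ground-truth manifest record which obfuscation style a scanner did or did not detect.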
