A lightweight, dependency-free audit trail system for Python data pipelines.
CleanCore automatically creates immutable, row-level audit logs for every data transformation. It helps with debugging, compliance, and understanding how your data changes across cleaning steps.
- Automatic row-level audit logs
- Zero external dependencies (pure Python)
- JSON-based audit output for compliance and record keeping
- Works with lists, dictionaries, and CSV-style data
- Simple decorator-based API
pip install cleancore-python
Quick Start
Audit a Single Function
python
Copy code
from cleancore import audit_trail, ProvenaLogger, generate_terminal_report
@audit_trail(rule_id="GDPR_EMAIL_MASKING")
def clean_emails(data):
result = []
for row in data:
new_row = row.copy()
if '@' in new_row.get('email', ''):
new_row['email'] = '***@***.***'
result.append(new_row)
return result
logger = ProvenaLogger("Single_Transformation")
data = [
{'id': 1, 'email': 'test@example.com'},
{'id': 2, 'email': 'user'}
]
cleaned = clean_emails(data, provena_logger=logger)
print(generate_terminal_report(logger))
Pipeline Usage
python
Copy code
from cleancore import audit_pipeline, audit_trail
import csv
def load_data(path):
with open(path) as f:
return list(csv.DictReader(f))
@audit_trail(rule_id="STANDARDIZE_NAMES")
def standardize_names(data):
return data
@audit_trail(rule_id="FILL_MISSING_VALUES")
def fill_missing(data):
return data
with audit_pipeline("Customer_Onboarding_Pipeline") as logger:
data = load_data("customers.csv")
data = standardize_names(data, provena_logger=logger)
data = fill_missing(data, provena_logger=logger)
logger.export_json("customer_pipeline_audit.json")
Output
CleanCore generates a human-readable terminal report and a machine-readable JSON audit log containing:
Transformation name
Rule ID
Rows before and after
Number of changed rows
Sample value changes
Execution timestamps
API Overview
audit_trail – Decorator for auditing functions
ProvenaLogger – Collects audit events
audit_pipeline – Context manager for multi-step pipelines
generate_terminal_report() – Prints terminal summary
export_json() – Saves audit log to file
Source Code
GitHub Repository
https://github.com/Sidra-009/cleancore-python-library
Issues, feature requests, and pull requests are welcome.
License
MIT License. See the LICENSE file for details.