Skip to content

A minimal Python library for tracking, auditing, and reporting data cleaning and transformation steps with full transparency

License

Notifications You must be signed in to change notification settings

Sidra-009/cleancore-python-library

Repository files navigation

CleanCore Python

A lightweight, dependency-free audit trail system for Python data pipelines.

CleanCore automatically creates immutable, row-level audit logs for every data transformation. It helps with debugging, compliance, and understanding how your data changes across cleaning steps.


Features

  • Automatic row-level audit logs
  • Zero external dependencies (pure Python)
  • JSON-based audit output for compliance and record keeping
  • Works with lists, dictionaries, and CSV-style data
  • Simple decorator-based API

Installation

pip install cleancore-python
Quick Start
Audit a Single Function
python
Copy code
from cleancore import audit_trail, ProvenaLogger, generate_terminal_report

@audit_trail(rule_id="GDPR_EMAIL_MASKING")
def clean_emails(data):
    result = []
    for row in data:
        new_row = row.copy()
        if '@' in new_row.get('email', ''):
            new_row['email'] = '***@***.***'
        result.append(new_row)
    return result

logger = ProvenaLogger("Single_Transformation")

data = [
    {'id': 1, 'email': 'test@example.com'},
    {'id': 2, 'email': 'user'}
]

cleaned = clean_emails(data, provena_logger=logger)

print(generate_terminal_report(logger))
Pipeline Usage
python
Copy code
from cleancore import audit_pipeline, audit_trail
import csv

def load_data(path):
    with open(path) as f:
        return list(csv.DictReader(f))

@audit_trail(rule_id="STANDARDIZE_NAMES")
def standardize_names(data):
    return data

@audit_trail(rule_id="FILL_MISSING_VALUES")
def fill_missing(data):
    return data

with audit_pipeline("Customer_Onboarding_Pipeline") as logger:
    data = load_data("customers.csv")
    data = standardize_names(data, provena_logger=logger)
    data = fill_missing(data, provena_logger=logger)

logger.export_json("customer_pipeline_audit.json")
Output
CleanCore generates a human-readable terminal report and a machine-readable JSON audit log containing:

Transformation name

Rule ID

Rows before and after

Number of changed rows

Sample value changes

Execution timestamps

API Overview
audit_trail – Decorator for auditing functions

ProvenaLogger – Collects audit events

audit_pipeline – Context manager for multi-step pipelines

generate_terminal_report() – Prints terminal summary

export_json() – Saves audit log to file

Source Code
GitHub Repository
https://github.com/Sidra-009/cleancore-python-library

Issues, feature requests, and pull requests are welcome.

License
MIT License. See the LICENSE file for details.

About

A minimal Python library for tracking, auditing, and reporting data cleaning and transformation steps with full transparency

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages