NoahCMD/customer-data-hub

Customer Data Hub

Real-time Customer Data Platform with Multi-Source Integration

A production-ready CDP that ingests data from multiple sources (CRMs, ERPs, APIs) and transforms it into actionable customer intelligence using the Medallion architecture (Bronze → Silver → Gold).

🎯 Key Features

  • Real-time Event Processing: Webhook receivers for Omie, HubSpot, and custom APIs
  • Medallion Architecture: Structured data pipeline with raw, cleansed, and analytics layers
  • Multi-source Integration: Omie ERP, HubSpot CRM, and generic API connectors
  • Data Quality: Validation scoring and error tracking
  • Customer Intelligence: Lifecycle stages, engagement scoring, and churn risk detection
  • Scalable Design: Async processing, batch pipelines, and optimized queries
  • REST API: Query customers, segments, and metrics in real-time

🏗️ Architecture

Data Sources (Webhooks/APIs)
    ↓
Bronze Layer (Raw ingestion)
    ↓
Silver Layer (Data cleansing)
    ↓
Gold Layer (Analytics & segments)
    ↓
REST API (Customer queries)

For detailed architecture documentation, see docs/architecture.md.

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • Git

Setup

  1. Clone and navigate:
git clone <repository-url>
cd customer-data-hub
  2. Configure environment:
cp .env.example .env
# Edit .env with your API credentials
  3. Start services:
docker-compose up -d
  4. Verify:
curl http://localhost:8000/health/

Services running:

📡 Webhook Integration

Omie ERP

curl -X POST http://localhost:8000/webhooks/omie \
  -H "Content-Type: application/json" \
  -H "X-Omie-Signature: your-signature" \
  -d @examples/omie_customer_webhook.json

HubSpot CRM

curl -X POST http://localhost:8000/webhooks/hubspot \
  -H "Content-Type: application/json" \
  -H "X-HubSpot-Request-Signature: your-signature" \
  -d @examples/hubspot_contact_webhook.json

Generic API

curl -X POST http://localhost:8000/webhooks/generic \
  -H "Content-Type: application/json" \
  -d @examples/generic_customer_webhook.json

🔍 API Endpoints

Health

  • GET /health/ - API health status
  • GET /health/stats - System statistics

Customers

  • GET /customers/ - List customers (paginated)
  • GET /customers/{id} - Customer details with segments
  • GET /customers/segment/high-value - High-value customers
  • GET /customers/segment/churn-risk - At-risk customers

For complete API documentation, see docs/api_endpoints.md.
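The endpoints above can be called from any HTTP client. The following is a minimal sketch using only the Python standard library, assuming the API is reachable at the Quick Start address. The helper names `build_url` and `get_json` are illustrative, and the `page` and `limit` query parameters are hypothetical, since the pagination parameter names are not listed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from the Quick Start


def build_url(path: str, **params) -> str:
    """Join the base URL, an endpoint path, and optional query parameters."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}{path}{query}"


def get_json(path: str, **params):
    """Issue a GET request and decode the JSON response body."""
    with urlopen(build_url(path, **params), timeout=10) as response:
        return json.load(response)


# `page` and `limit` are hypothetical names; check docs/api_endpoints.md.
print(build_url("/customers/", page=2, limit=50))
print(build_url("/customers/segment/churn-risk"))
# With the stack running, a request would look like: get_json("/health/")
```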

📝 Integration Guide

Complete setup instructions for each data source:

🗄️ Database Schema

  • Bronze Layer: bronze_raw_data - Raw event payloads
  • Silver Layer: silver_customers - Cleansed customer records
  • Gold Layer: gold_customer_segments - Aggregated metrics
  • Tracking: sync_logs, webhook_events - Operation logs

🧪 Testing

Run tests:

pip install -r requirements.txt
pytest tests/

Test webhook endpoints manually:

# List webhook examples
ls examples/

# Test with sample payloads
curl -X POST http://localhost:8000/webhooks/generic \
  -H "Content-Type: application/json" \
  -d @examples/generic_customer_webhook.json

🔒 Security

  • HMAC-SHA256 webhook signature validation
  • Environment variable-based credential management
  • Input validation on all endpoints
  • SQL injection prevention via SQLAlchemy ORM

See .env.example for configuration.
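The HMAC-SHA256 check listed above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the project's actual validator: the function names `sign_payload` and `verify_signature`, the hex encoding of the digest, and the sample secret are all hypothetical. Each provider (Omie, HubSpot) defines its own signing scheme, so their documentation takes precedence.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received: str) -> bool:
    """Compare the expected and received signatures in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


# Hypothetical secret and payload, for illustration only.
secret = "change-me"
body = b'{"event": "customer.created", "id": 42}'
signature = sign_payload(secret, body)

print(verify_signature(secret, body, signature))         # True
print(verify_signature(secret, body + b" ", signature))  # False
```

Signing the raw bytes rather than a re-serialized JSON object matters, because any change in whitespace or key order produces a different digest.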

📊 Data Quality

Each customer record receives a quality score (0-100) based on:

  • Completeness: start at 100 and subtract 10 points per missing field
  • Validation errors: subtract 5 points each
  • Data freshness (engagement scoring)
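The first two rules are plain arithmetic and can be sketched as follows. The `REQUIRED_FIELDS` tuple and the `quality_score` function are hypothetical names chosen for illustration, and the clamp to 0-100 is an assumption based on the stated score range. The freshness component is left out because its formula is not given here.

```python
# Hypothetical list of fields a complete record is expected to carry.
REQUIRED_FIELDS = ("name", "email", "phone", "company")


def quality_score(record: dict, validation_errors: int = 0) -> int:
    """Start at 100, subtract 10 per missing field and 5 per validation error."""
    missing = sum(1 for field in REQUIRED_FIELDS if not record.get(field))
    score = 100 - 10 * missing - 5 * validation_errors
    # Clamp so the result stays inside the documented 0-100 range.
    return max(0, min(100, score))


# Two fields are missing (empty phone, absent company) and one error was found.
record = {"name": "Ana Souza", "email": "ana@example.com", "phone": ""}
print(quality_score(record, validation_errors=1))  # 75
```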

🚀 Production Deployment

Docker Deployment

docker-compose -f docker-compose.yml up -d

Environment Variables

Copy .env.example to .env and configure:

  • Database credentials
  • API keys (Omie, HubSpot)
  • Webhook secrets
  • Environment (production/development)

Security Checklist

  • Enable webhook validation
  • Use HTTPS for all endpoints
  • Implement API authentication
  • Set up database backups
  • Configure rate limiting
  • Monitor logs and errors
  • Use strong secrets

🧠 Key Technologies

  • FastAPI: High-performance Python web framework
  • PostgreSQL: Relational data warehouse
  • Redis: Caching and session management
  • SQLAlchemy: ORM for database operations
  • Pydantic: Data validation
  • Docker: Containerization, with Docker Compose for running the services together

📚 Documentation

🔄 Data Flow Example

  1. Webhook received → POST /webhooks/omie
  2. Signature validated → HMAC-SHA256 verification
  3. Stored in bronze → Raw payload exact copy
  4. Transformed to silver → Data cleansing & validation
  5. Aggregated to gold → Metrics & segments calculated
  6. Available via API → GET /customers endpoints
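Steps 3 to 5 can be pictured with a small in-memory sketch. It stands in for the real tables with plain Python containers, and every name in it (`ingest`, `cleanse`, `aggregate`, the email-based key, the "customers per source" metric) is an assumption made for illustration rather than the project's actual implementation.

```python
import json
from collections import Counter

bronze: list[dict] = []        # raw payloads, stored exactly as received
silver: dict[str, dict] = {}   # cleansed records keyed by customer email


def ingest(source: str, raw_body: str) -> None:
    """Bronze: keep the untouched payload together with its source."""
    bronze.append({"source": source, "payload": raw_body})


def cleanse() -> None:
    """Silver: parse, normalize, and skip records that fail validation."""
    for event in bronze:
        data = json.loads(event["payload"])
        email = str(data.get("email", "")).strip().lower()
        if "@" not in email:
            continue  # a real pipeline would record a validation error here
        silver[email] = {
            "email": email,
            "name": str(data.get("name", "")).strip().title(),
            "source": event["source"],
        }


def aggregate() -> dict[str, int]:
    """Gold: compute a simple metric, here the customer count per source."""
    return dict(Counter(rec["source"] for rec in silver.values()))


ingest("omie", '{"name": " ana souza ", "email": "ANA@Example.com"}')
ingest("hubspot", '{"name": "Bob Lee", "email": "bob@example.com"}')
ingest("generic", '{"name": "No Email"}')
cleanse()
print(aggregate())  # {'omie': 1, 'hubspot': 1}
```

Keeping the bronze payload as an unparsed string means a cleansing bug can be fixed later and the silver layer rebuilt from the original data.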

🎓 Why This Project Matters

This project demonstrates:

  • ✅ Multi-source data integration - Combining data from CRMs and ERPs
  • ✅ Real-time event processing - Webhooks, validation, async handling
  • ✅ Medallion architecture - Industry-standard data pipeline pattern
  • ✅ Customer intelligence - Segmentation, scoring, and analytics
  • ✅ Production-ready code - Error handling, logging, testing
  • ✅ REST API design - Clean endpoints, pagination, filtering
  • ✅ Scalable design - Batch processing, indexing, async operations

Perfect for demonstrating backend engineering skills for RevOps, data engineering, and CDP roles.

🀝 Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE for details.

📧 Support

Built with ❤️ for RevOps and data engineering
