Real-time Customer Data Platform with Multi-Source Integration
A production-ready CDP that ingests data from multiple sources (CRMs, ERPs, APIs) and transforms it into actionable customer intelligence using the Medallion architecture (Bronze → Silver → Gold).
- Real-time Event Processing: Webhook receivers for Omie, HubSpot, and custom APIs
- Medallion Architecture: Structured data pipeline with raw, cleansed, and analytics layers
- Multi-source Integration: Omie ERP, HubSpot CRM, and generic API connectors
- Data Quality: Validation scoring and error tracking
- Customer Intelligence: Lifecycle stages, engagement scoring, and churn risk detection
- Scalable Design: Async processing, batch pipelines, and optimized queries
- REST API: Query customers, segments, and metrics in real time
Data Sources (Webhooks/APIs)
↓
Bronze Layer (Raw ingestion)
↓
Silver Layer (Data cleansing)
↓
Gold Layer (Analytics & segments)
↓
REST API (Customer queries)
For detailed architecture documentation, see docs/architecture.md.
- Docker & Docker Compose
- Git
- Clone and navigate:
cd customer-data-hub
- Configure environment:
cp .env.example .env
# Edit .env with your API credentials
- Start services:
docker-compose up -d
- Verify:
curl http://localhost:8000/health/
Services running:
- API: http://localhost:8000
- PostgreSQL: localhost:5432
- Redis: localhost:6379
curl -X POST http://localhost:8000/webhooks/omie \
-H "Content-Type: application/json" \
-H "X-Omie-Signature: your-signature" \
-d @examples/omie_customer_webhook.json
curl -X POST http://localhost:8000/webhooks/hubspot \
-H "Content-Type: application/json" \
-H "X-HubSpot-Request-Signature: your-signature" \
-d @examples/hubspot_contact_webhook.json
curl -X POST http://localhost:8000/webhooks/generic \
-H "Content-Type: application/json" \
-d @examples/generic_customer_webhook.json
GET /health/ - API health status
GET /health/stats - System statistics
GET /customers/ - List customers (paginated)
GET /customers/{id} - Customer details with segments
GET /customers/segment/high-value - High-value customers
GET /customers/segment/churn-risk - At-risk customers
For complete API documentation, see docs/api_endpoints.md.
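The README does not spell out how the high-value and churn-risk segments are defined. As a minimal sketch, assuming hypothetical thresholds (lifetime revenue of at least 10,000 for high-value, more than 90 days without activity for churn-risk) and a hypothetical `CustomerMetrics` record, gold-layer segment membership could be derived like this:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; the project's actual gold-layer rules may differ.
HIGH_VALUE_REVENUE = 10_000.0
CHURN_INACTIVITY_DAYS = 90


@dataclass
class CustomerMetrics:
    customer_id: str
    lifetime_revenue: float
    last_activity: datetime


def segments_for(metrics: CustomerMetrics, now: datetime) -> set[str]:
    """Return the names of the segments a customer belongs to."""
    segments: set[str] = set()
    if metrics.lifetime_revenue >= HIGH_VALUE_REVENUE:
        segments.add("high-value")
    # Churn risk here is purely inactivity-based; engagement scoring is not modeled.
    if now - metrics.last_activity > timedelta(days=CHURN_INACTIVITY_DAYS):
        segments.add("churn-risk")
    return segments


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
vip = CustomerMetrics("c-1", 25_000.0, now - timedelta(days=120))
print(sorted(segments_for(vip, now)))  # ['churn-risk', 'high-value']
```

A customer can sit in several segments at once, which is why the sketch returns a set rather than a single label.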
Complete setup instructions for each data source:
Bronze Layer: bronze_raw_data - Raw event payloads
Silver Layer: silver_customers - Cleansed customer records
Gold Layer: gold_customer_segments - Aggregated metrics
Tracking: sync_logs, webhook_events - Operation logs
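To make the bronze/silver split concrete, here is a sketch that uses Python's built-in `sqlite3` so it runs without the Docker stack. The column layout, the `ingest` helper, and the `(source, external_id)` key are assumptions, not the project's actual PostgreSQL schema. It illustrates one design choice: bronze appends every delivery verbatim, while silver upserts so that replayed webhooks converge on a single cleansed row.

```python
import json
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE bronze_raw_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source TEXT NOT NULL,
    payload TEXT NOT NULL,                 -- exact copy of the webhook body
    received_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE silver_customers (
    source TEXT NOT NULL,
    external_id TEXT NOT NULL,
    email TEXT,
    name TEXT,
    PRIMARY KEY (source, external_id)      -- one cleansed row per source record
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)


def ingest(source: str, body: str) -> None:
    # Bronze is append-only: every delivery is stored, duplicates included.
    conn.execute(
        "INSERT INTO bronze_raw_data (source, payload) VALUES (?, ?)", (source, body)
    )
    record = json.loads(body)
    # Cleansing step: trim and lowercase the email, store NULL if it is empty.
    email = (record.get("email") or "").strip().lower() or None
    # Silver is an upsert, so re-processing the same record updates it in place.
    conn.execute(
        """INSERT INTO silver_customers (source, external_id, email, name)
           VALUES (?, ?, ?, ?)
           ON CONFLICT (source, external_id)
           DO UPDATE SET email = excluded.email, name = excluded.name""",
        (source, str(record["id"]), email, record.get("name")),
    )


ingest("generic", '{"id": 42, "email": " Ana@Example.com ", "name": "Ana"}')
ingest("generic", '{"id": 42, "email": "ana@example.com", "name": "Ana Silva"}')
print(conn.execute("SELECT COUNT(*) FROM bronze_raw_data").fetchone()[0])  # 2
print(conn.execute("SELECT * FROM silver_customers").fetchall())
# [('generic', '42', 'ana@example.com', 'Ana Silva')]
```

PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` form, with `EXCLUDED` referring to the incoming row.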
Run tests:
pip install -r requirements.txt
pytest tests/
Test webhook endpoints manually:
# List webhook examples
ls examples/
# Test with sample payloads
curl -X POST http://localhost:8000/webhooks/generic \
-H "Content-Type: application/json" \
-d @examples/generic_customer_webhook.json
- HMAC-SHA256 webhook signature validation
- Environment variable-based credential management
- Input validation on all endpoints
- SQL injection prevention via SQLAlchemy ORM
See .env.example for configuration.
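A minimal sketch of HMAC-SHA256 webhook validation, assuming the signature header carries a hex-encoded HMAC-SHA256 of the raw request body under a shared secret. `sign` and `verify` are hypothetical helper names. Real providers differ: HubSpot's v3 scheme, for example, signs the method, URI, body and a timestamp and base64-encodes the result, so check each provider's documentation before reusing this.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (assumed signature format)."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, received_signature: str) -> bool:
    expected = sign(secret, body)
    # compare_digest takes time independent of where the strings differ,
    # which avoids leaking the expected signature through timing.
    return hmac.compare_digest(expected, received_signature)


body = b'{"event": "customer.created", "id": 42}'
sig = sign("dev-secret", body)
print(verify("dev-secret", body, sig))         # True
print(verify("dev-secret", body + b" ", sig))  # False
```

Verification must run over the raw bytes as received; re-serializing parsed JSON can change whitespace or key order and break the match. Under the same assumption, `sign()` can produce a value for the `your-signature` placeholder in the curl examples.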
Each customer record receives a quality score (0-100) based on:
- Completeness: starts at 100 and loses 10 points per missing field
- Validation errors: minus 5 points each
- Data freshness (engagement scoring)
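The completeness and validation-error rules can be sketched as follows. Only the penalty sizes (10 per missing field, 5 per validation error) come from the description above; the `REQUIRED_FIELDS` list and the clamping to 0-100 are assumptions, and data freshness is left out because its weighting is not specified.

```python
# Hypothetical list of fields that count toward completeness.
REQUIRED_FIELDS = ("name", "email", "phone", "document", "city")


def quality_score(record: dict, validation_errors: list[str]) -> int:
    # A field counts as missing if it is absent or empty.
    missing = sum(1 for field in REQUIRED_FIELDS if not record.get(field))
    score = 100 - 10 * missing - 5 * len(validation_errors)
    return max(0, min(100, score))  # keep the result inside 0-100


record = {"name": "Ana", "email": "ana@example", "phone": "", "city": "São Paulo"}
print(quality_score(record, ["email: invalid format"]))  # 75
```

Here `phone` is empty and `document` is absent (two missing fields, minus 20), and one validation error costs another 5.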
docker-compose -f docker-compose.yml up -d
Copy .env.example to .env and configure:
- Database credentials
- API keys (Omie, HubSpot)
- Webhook secrets
- Environment (production/development)
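A sketch of environment-based configuration that fails fast at startup when a credential is missing. The variable names (`DATABASE_URL`, `OMIE_APP_KEY`, `HUBSPOT_API_KEY`, `WEBHOOK_SECRET`, `ENVIRONMENT`) are hypothetical; match them to the keys actually defined in .env.example.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical variable names; adjust to .env.example.
REQUIRED = ("DATABASE_URL", "OMIE_APP_KEY", "HUBSPOT_API_KEY", "WEBHOOK_SECRET")


@dataclass(frozen=True)
class Settings:
    database_url: str
    omie_app_key: str
    hubspot_api_key: str
    webhook_secret: str
    environment: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, failing fast on missing values."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    environment = env.get("ENVIRONMENT", "development")
    if environment not in ("development", "production"):
        raise RuntimeError(f"ENVIRONMENT must be 'development' or 'production', got {environment!r}")
    return Settings(
        database_url=env["DATABASE_URL"],
        omie_app_key=env["OMIE_APP_KEY"],
        hubspot_api_key=env["HUBSPOT_API_KEY"],
        webhook_secret=env["WEBHOOK_SECRET"],
        environment=environment,
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to test. Since the stack already uses Pydantic, the `pydantic-settings` package's `BaseSettings` is a natural alternative to this hand-rolled version.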
- Enable webhook validation
- Use HTTPS for all endpoints
- Implement API authentication
- Set up database backups
- Configure rate limiting
- Monitor logs and errors
- Use strong secrets
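For the rate-limiting item, one common technique is a token bucket: each request spends a token, and tokens refill at a fixed rate up to a burst capacity. The sketch below is an in-process illustration with a hypothetical `TokenBucket` class, not the project's implementation; with several API workers, the bucket state would need to live in a shared store such as the Redis instance already in the stack.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A fake clock makes the behaviour deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
fake_now[0] += 0.5  # half a second at 2 tokens/s refills one token
print(bucket.allow())  # True
```

In practice one bucket would be kept per client (for example per API key), so a single noisy caller cannot exhaust the limit for everyone.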
- FastAPI: High-performance Python web framework
- PostgreSQL: Relational data warehouse
- Redis: Caching and session management
- SQLAlchemy: ORM for database operations
- Pydantic: Data validation
- Docker & Docker Compose: Containerization and multi-service orchestration
- Webhook received → POST /webhooks/omie
- Signature validated → HMAC-SHA256 verification
- Stored in bronze → exact copy of the raw payload
- Transformed to silver → data cleansing & validation
- Aggregated to gold → metrics & segments calculated
- Available via API → GET /customers endpoints
This project demonstrates:
- ✅ Multi-source data integration - Combining data from CRMs and ERPs
- ✅ Real-time event processing - Webhooks, validation, async handling
- ✅ Medallion architecture - Industry-standard data pipeline pattern
- ✅ Customer intelligence - Segmentation, scoring, and analytics
- ✅ Production-ready code - Error handling, logging, testing
- ✅ REST API design - Clean endpoints, pagination, filtering
- ✅ Scalable design - Batch processing, indexing, async operations
Perfect for demonstrating backend engineering skills for RevOps, data engineering, and CDP roles.
Contributions are welcome! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: your-email@example.com
Built with ❤️ for RevOps and data engineering