An end-to-end data pipeline designed for cleansing, modeling, and analyzing e-commerce sales data from Excel to interactive dashboards.
⏱ Time Spent: 2,462 minutes (~41 hours)
🧱 Commits: 75+ detailed commits tracking each development step
- 📦 Project Overview
- ⚙️ Stack & Technologies
- 🛠️ Pipeline Steps
- 🚀 Quick Start
- 📚 What I learned
- 🔗 Sources & References
- 🚧 Future Improvements
This project implements a data pipeline to extract data from an Excel file, transform and model it through structured DBT models, and visualize insights using Power BI.
Core Tools: Docker, DBT, PostgreSQL, Power BI
Programming Libraries: pandas, sqlalchemy, logging
Project Management: makefile
The primary focus was mastering DBT, resulting in:
- 7 structured data models
- 15 comprehensive tests
- Thorough metadata documentation
Detailed Steps:
- Extraction: Reading data from Excel using pandas.
- Loading: Storing raw data in PostgreSQL.
- Transformation & Modeling: Using DBT to create a structured data flow:
  - Raw → Staging → Intermediate → Dimensions → Facts
- Visualization: Connecting Power BI to PostgreSQL to create interactive dashboards and analyses.
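The extraction and loading steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names, the `raw_sales` table, and the Excel path in the usage comment are hypothetical, and it assumes connection settings come from `DB_*` environment variables and that the `psycopg2` driver (plus an Excel engine such as `openpyxl`) is installed.

```python
import os
from urllib.parse import quote


def build_db_url(env=None):
    """Build a SQLAlchemy-style PostgreSQL URL from DB_* variables (assumed names)."""
    env = os.environ if env is None else env
    user = quote(env["DB_USER"], safe="")
    # Percent-encode the password so characters like '@' or '/' do not break the URL
    password = quote(env["DB_PASSWORD"], safe="")
    host = env["DB_HOST"]
    port = env["DB_PORT"]
    database = env["DB_DATABASE"]
    # Assumption: psycopg2 is the installed PostgreSQL driver
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"


def load_excel_to_postgres(excel_path, table_name, db_url):
    """Read one Excel sheet with pandas and store it unchanged as a raw table."""
    # Third-party imports are local so build_db_url works without them installed
    import pandas as pd
    from sqlalchemy import create_engine

    df = pd.read_excel(excel_path)  # Extraction: first sheet by default
    engine = create_engine(db_url)
    # Loading: replace the raw table on every run; no cleaning happens here
    df.to_sql(table_name, engine, if_exists="replace", index=False)
    return len(df)


# Hypothetical usage (path and table name are placeholders):
# rows = load_excel_to_postgres("data/sales.xlsx", "raw_sales", build_db_url())
```

Keeping the raw table free of transformations leaves all cleaning to the DBT staging models, which matches the Raw → Staging flow described above.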
Step 1: Clone the Repository
git clone https://github.com/NotAbdelrahmanelsayed/retail-cleaning-modeling.git
cd retail-cleaning-modeling
Step 2: Set Up Environment Variables
Create a .env file in the project root:
touch .env
Paste this template inside the .env file; feel free to customize it.
DB_USER=dbtuser
DB_PASSWORD=dummy_password
DB_DATABASE=retail
DB_HOST=postgres
DB_PORT=5432
PGADMIN_EMAIL=bedo@email.com
PGADMIN_PASSWORD=123456
Step 3: Launch Services
Build and launch containers:
docker compose up -d --build
Tip: Use make up for convenience. See makefile for additional dev-friendly commands.
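Before launching the services, it can help to confirm that .env defines every variable from the template in Step 2. The sketch below is one possible check, not part of the repository: the helper names are hypothetical, and it assumes all seven template variables are required and that the file uses plain KEY=VALUE lines.

```python
from pathlib import Path

# Assumption: every variable from the Step 2 template is required
REQUIRED_KEYS = {
    "DB_USER", "DB_PASSWORD", "DB_DATABASE", "DB_HOST", "DB_PORT",
    "PGADMIN_EMAIL", "PGADMIN_PASSWORD",
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, sorted by name."""
    found = parse_env(text)
    return sorted(key for key in required if not found.get(key))


if __name__ == "__main__":
    env_path = Path(".env")
    # A missing file counts as every key being missing
    missing = missing_keys(env_path.read_text()) if env_path.exists() else sorted(REQUIRED_KEYS)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run from the project root, this prints the names of any variables still to be filled in.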
Step 4: Run DBT Commands
Access the DBT container:
make sh
# or directly via:
docker exec -it dbt_core bash
Inside the container, run:
dbt run # Run DBT models
dbt test # Test DBT models
dbt docs generate # Generate DBT documentation
dbt docs serve # Serve and view documentation
Step 5: Explore Data with pgAdmin
- Visit http://localhost:8888
- Use the credentials from .env to log in.
Step 6: Dashboard Visualization
- Open dashboard/retail_analysis.pbix in Power BI Desktop.
- Configure the Power BI connection using the PostgreSQL credentials from .env.
- Spent a lot of time in the dbt documentation, which deepened my knowledge of DBT's capabilities and limitations.
- Learned custom SQL tests.
- Improved my understanding of data modeling as both a creative and technical discipline.
- Improved my skills in Docker container management.
- Learned to integrate PostgreSQL with Power BI for analytics.
- Built practical skills for creating impactful dashboards with Power BI.
- Spent 10 hours solving challenging dbt and SQL assignments created by ChatGPT, using its iterative feedback to sharpen my skills.
- Implement Airflow for orchestrating the complete data pipeline (data ingestion, DBT modeling, and testing).
- Explore and utilize DBT Cloud for efficient deployment and monitoring.
- Deploy the PostgreSQL database to a cloud provider (AWS, Google Cloud).


