Skip to content

NotAbdelrahmanelsayed/retail-cleaning-modeling

Repository files navigation

📊 Retail Data Pipeline: From Excel to Dashboard via DBT, PostgreSQL, and Power BI.

An end-to-end data pipeline designed for cleansing, modeling, and analyzing e-commerce sales data from Excel to interactive dashboards.

Time Spent: 2,462 minutes (~41 hours)
🧱 Commits: +75 detailed commits tracking each development step


🗺️ Table of Contents


📦 Project Overview

This project implements a data pipeline to extract data from an Excel file, transform and model it through structured DBT models, and visualize insights using Power BI.

Dashboard

Data Model


⚙️ Stack & Technologies

Core Tools: Docker, DBT, PostgreSQL, Power BI
Programming Libraries: pandas, sqlalchemy, logging
Project Management: makefile

The primary focus was mastering DBT, resulting in:

  • 7 structured data models
  • 15 comprehensive tests
  • Thorough metadata documentation

🛠️ Pipeline Steps

Pipeline Diagram

Detailed Steps:

  1. Extraction: Reading data from Excel using pandas.

  2. Loading: Storing raw data in PostgreSQL.

  3. Transformation & Modeling: Using DBT to create a structured data flow:

    • Raw → Staging → Intermediate → Dimensions → Facts
  4. Visualization: Connecting Power BI to PostgreSQL to create interactive dashboard and analyses.


🚀 Quick Start

Step 1: Clone the Repository

git clone https://github.com/NotAbdelrahmanelsayed/retail-cleaning-modeling.git
cd retail-cleaning-modeling

Step 2: Set Up Environment Variables Create a .env file in the project root:

touch .env

paste this template inside .env file, feel free to customize it.

DB_USER=dbtuser
DB_PASSWORD=dummy_password
DB_DATABASE=retail
DB_HOST=postgres
DB_PORT=5432
PGADMIN_EMAIL=bedo@email.com
PGADMIN_PASSWORD=123456

Step 3: Launch Services Build and launch containers:

docker compose up -d --build

Tip: Use make up for convenience. See makefile for additional dev-friendly commands.

Step 4: Run DBT Commands Access DBT container:

make sh

# or directly via:
docker exec -it dbt_core bash

Inside the container run:

dbt run                # Run DBT models
dbt test               # Test DBT models
dbt docs generate      # Generate DBT documentation
dbt docs serve         # Serve and view documentation

Step 5: Explore Data with pgAdmin

Step 6: Dashboard Visualization

  • Open dashboard/retail_analysis.pbix in Power BI Desktop.
  • Configure Power BI connection using PostgreSQL credentials from .env.

📚 What I learned

  • Spent alot of time in dbt documentation which deepened my knowled of DBT's limitations and capabilities
  • Learned custom SQL tests.
  • Improved my understanding of data modeling as both a creative and technical discipline.
  • Improved my skills in Docker container managment.
  • Learned to integrate PostgreSQL with Power BI for analytics
  • Built practical skills for creating impactful dashboards with Power BI.

🔗 Sources & References

GitHub Repositories

Official Documentation

Courses

Articles

AI-Assisted Learning 🤖

  • I spent 10 hours solving challenging dbt and SQL assignments created by ChatGPT, receiving feedback from it iteratively to sharpen my skills.

🚧 Future Improvements

  • Implement Airflow for orchestrating the complete data pipeline (data ingestion, DBT modeling, and testing).
  • Explore and utilize DBT Cloud for efficient deployment and monitoring.
  • Deploy the PostgreSQL database to a cloud provider (AWS, Google Cloud).

About

end-to-end data pipeline project for cleansing, modeling, and analyzing e-commerce sales data.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors