This repository contains the design and implementation of a retail Data Warehouse project for DWBI coursework. The solution integrates multiple data sources, builds a star schema, and implements ETL workflows to populate and maintain dimensional and fact tables.
- Assignment 1: Data Warehouse design, sources, architecture, and schema
- Assignment 2: ETL implementation, dimension/fact loading, and accumulating fact updates
The project uses retail transactional data from two primary source types:
- CSV source: customer master data (DataSource/customers.csv)
- Database source: products and transactions from operational retail tables (exported CSVs under DataSource/db-to-csv/)
Main staging/profile artifacts are stored in:
The warehouse follows a star schema centered on Fact_Sales.
Fact_SalesCustomerKeyProductKeyDateKeyGeographyKeyQuantityTotalAmount
Dim_Customer(SCD Type 2)CustomerKey,CustomerID,CustomerName,Country,CustomerSegment,RegistrationDate,StartDate,EndDate,IsCurrent
Dim_ProductProductKey,StockCode,Description,SubCategory,Category
Dim_DateDateKey,FullDate,Day,Month,Year,Quarter,MonthName
Dim_GeographyGeographyKey,Country
ETL is implemented to:
- Extract from staging and source outputs
- Transform and standardize dimensional/fact attributes
- Load dimensions and fact tables in dependency order
Typical load order:
Dim_DateDim_ProductDim_Customer(SCD Type 2 handling)Dim_GeographyFact_Sales
Fact enrichment includes calculated amount:
TotalAmount = Quantity * UnitPriceFact_Sales is extended to support accumulating updates with:
accm_txn_create_timeaccm_txn_complete_timetxn_process_time_hours
Update logic example:
UPDATE Fact_Sales
SET
accm_txn_complete_time = ?,
txn_process_time_hours = DATEDIFF(HOUR, accm_txn_create_time, ?)
WHERE SalesKey = ?;- DataSource/ -> Source files (customer CSV and DB exports)
- data-profiling/ -> XML profiling and source snapshots
- db/ -> SQL scripts for staging and DW setup
- RetailCube/ -> SSAS cube project files
- Star schema DW design
- SCD Type 2 implementation for customer dimension
- Multi-source ETL pipeline
- Fact table loading with lookups and derived measures
- Accumulating fact table update process