IndusLoan Analytics Platform
Personal Loan Acquisition Analytics | End-to-End Data Engineering Portfolio Project
This project simulates a production-grade Personal Loan Acquisition Data Pipeline
modeled on real retail banking operations at IndusInd Bank (2016).
It demonstrates end-to-end data engineering competency covering:
- Medallion Architecture: Raw → Bronze → Silver → Gold → Reporting
- PII Masking and Data Privacy: RBI compliance
- Watermark-based Incremental Loading: Full and Incremental modes
- Slowly Changing Dimension Type 2: Agent history tracking
- Data Quality Framework: 11 rules across Bronze and Silver
- Star Schema Design: 2 Facts · 7 Dimensions · 5 Aggregates
- Data Governance: Full audit trail, lineage, reconciliation
- Power BI Reporting: Row Level Security, executive dashboards
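
The Full and Incremental load modes can be illustrated with a small sketch. This is written in Python for readability and is not the project's T-SQL `usp_load_bronze`; the `select_rows` helper, the `load_ts` field, and the in-memory watermark value are hypothetical stand-ins for whatever `audit.watermark_control` actually stores.

```python
from datetime import datetime

def select_rows(rows, watermark, mode="INCR"):
    """Return the rows to load and the new watermark.

    FULL ignores the stored watermark; INCR keeps only rows newer than it.
    """
    if mode not in ("FULL", "INCR"):
        raise ValueError(f"unknown load mode: {mode}")
    if mode == "FULL" or watermark is None:
        selected = list(rows)
    else:
        selected = [r for r in rows if r["load_ts"] > watermark]
    # Advance the watermark only when something was selected.
    new_watermark = max((r["load_ts"] for r in selected), default=watermark)
    return selected, new_watermark

# Illustrative rows; app_id values and timestamps are made up.
rows = [
    {"app_id": "A1", "load_ts": datetime(2016, 1, 5)},
    {"app_id": "A2", "load_ts": datetime(2016, 2, 1)},
    {"app_id": "A3", "load_ts": datetime(2016, 3, 9)},
]
incremental, wm = select_rows(rows, datetime(2016, 1, 31), mode="INCR")
full, _ = select_rows(rows, datetime(2016, 1, 31), mode="FULL")
```

An incremental run with no new rows leaves the watermark unchanged, so re-running it is harmless.
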

INDUS LOAN ANALYTICS PLATFORM
Medallion Architecture · 14 Phases

SOURCE
- banksalesdata.csv: 22,155 rows · 24 columns
- IndusInd Bank 2016: real personal loan applications

↓ BULK INSERT

RAW LAYER (raw.applications)
- Append-only: data is never modified here
- All 24 columns stored as VARCHAR: no casting
- Byte-perfect copy of source file
- NULL allowed everywhere: source data is messy

↓ usp_load_bronze (FULL / INCR)

BRONZE LAYER (bronze.applications)
- Watermark applied: incremental load tracking
- PII masked: PAN hashed · Name → initials · DOB → year
- Agent codes replaced with pseudonyms (AGT_00001)
- Record hash added: SHA2_256 for deduplication
- Audit columns added: batch_id · load_timestamp
- 5 DQ rules checked: CRITICAL stops pipeline

↓ usp_transform_silver

SILVER LAYER (silver.applications)
- 14 derived columns built using a CTE chain
- Data types cast: dates, amounts, integers
- final_status: Approved · Declined · In Process
- tat_hours · tat_bucket: turnaround time analysis
- loan_amount_band · city_tier: segmentation
- is_approved · is_declined · is_in_process: BIT flags
- Decline codes exploded → silver.decline_codes_parsed
- 6 DQ rules checked: row reconciliation logged

↓ usp_load_gold

GOLD LAYER: STAR SCHEMA
- Dimensions: dim_date · dim_customer · dim_agent (SCD Type 2) · dim_channel · dim_product · dim_branch · dim_decline_reason
- fact_application: 2 date keys · 7 dim keys · 5 measures
- fact_decline_bridge: many-to-many decline codes
- Aggregates: agg_approval_funnel · agg_agent_scorecard · agg_tat_analysis · agg_channel_performance · agg_city_performance

↓ rpt.* views

REPORTING LAYER (rpt.* views)
- Indexed views over gold aggregates
- Row Level Security: branch-level access control
- Power BI connects here only, never to gold directly

↓

POWER BI
- Approval Funnel Dashboard
- Agent Scorecard
- TAT Analysis
- Channel Performance
- City and Region Heatmap

AUDIT SCHEMA: DATA GOVERNANCE
Runs alongside every layer and tracks everything.
- pipeline_run_log · dq_results · watermark_control
- layer_reconciliation · column_lineage · agent_pseudonym

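
The Bronze-layer masking steps (PAN hashed, name reduced to initials, DOB reduced to year, plus a SHA-256 record hash) could look roughly like the sketch below. It is written in Python rather than the project's T-SQL, and every function name, the ISO date format, the pipe-joined hash input, and the sample values are assumptions made for illustration.

```python
import hashlib

def mask_pan(pan: str) -> str:
    """One-way SHA-256 hex digest of the normalised PAN."""
    return hashlib.sha256(pan.strip().upper().encode("utf-8")).hexdigest()

def name_to_initials(name: str) -> str:
    """Reduce a full name to dotted initials."""
    return "".join(part[0].upper() + "." for part in name.split())

def dob_to_year(dob: str) -> str:
    """Keep only the year; assumes an ISO date string (YYYY-MM-DD)."""
    return dob[:4]

def record_hash(values) -> str:
    """SHA-256 over pipe-joined column values, used to spot duplicate rows."""
    joined = "|".join("" if v is None else str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Made-up sample values, not taken from the dataset.
masked = {
    "pan_hash": mask_pan("abcde1234f"),
    "name_initials": name_to_initials("Ravi Kumar Sharma"),
    "birth_year": dob_to_year("1985-07-14"),
}
```

A plain unsalted hash of a short, structured identifier such as a PAN can be brute-forced, so a real deployment would normally add a secret salt or key; the sketch omits that for brevity.
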
Repository Structure
indus-loan-analytics/
│
├── sql/
│   ├── 00_setup/                             ← Phase 1 Foundation
│   │   ├── 01_create_schemas.sql             ← 6 schemas
│   │   ├── 02_create_audit_tables.sql        ← 5 audit tables
│   │   ├── 03_seed_watermark.sql             ← 2 seed rows
│   │   └── 04_create_agent_pseudonym.sql     ← PII lookup table
│   │
│   ├── 01_raw/                               ← Phase 2 Raw Layer
│   │   └── 01_create_raw_table.sql           ← 24 column raw table
│   │
│   ├── 02_bronze/                            ← Phase 3-4 coming soon
│   ├── 03_silver/                            ← Phase 5-7 coming soon
│   ├── 04_gold/                              ← Phase 8-11 coming soon
│   └── 05_reporting/                         ← Phase 12 coming soon
│
├── data/
│   └── sample_100_rows.csv
│
├── docs/
├── .gitignore
└── README.md

Audit Schema: Data Governance Layer

| Table | Purpose | Governance Pillar |
| --- | --- | --- |
| pipeline_run_log | Tracks every pipeline execution | Auditability |
| dq_results | Every DQ rule result: PASS or FAIL | Data Quality |
| watermark_control | Manages incremental load state | Completeness |
| layer_reconciliation | Proves zero row loss across layers | Reconciliation |
| column_lineage | Documents every column transformation | Traceability |
| agent_pseudonym | Real agent codes mapped to fake codes | PII Protection |

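
The zero-row-loss check behind `layer_reconciliation` can be sketched as a simple count comparison. This Python version is only an illustration of the idea: the `reconcile` helper, the field names, and the notion of a separate rejected-row count are assumptions, and the mismatching figures in the second call are invented to show a failure.

```python
def reconcile(source_layer, target_layer, source_count, target_count, rejected_count=0):
    """Build one reconciliation record; a healthy load has variance 0."""
    variance = source_count - (target_count + rejected_count)
    return {
        "source_layer": source_layer,
        "target_layer": target_layer,
        "source_count": source_count,
        "target_count": target_count,
        "rejected_count": rejected_count,
        "variance": variance,
        "status": "PASS" if variance == 0 else "FAIL",
    }

# 22,155 is the documented source row count; the other figures are illustrative.
ok = reconcile("raw", "bronze", 22155, 22155)
bad = reconcile("bronze", "silver", 22155, 22100, rejected_count=50)
```

Here `bad` leaves five rows unaccounted for, which is the kind of variance the audit table is meant to surface.
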

Gold Layer: Star Schema

| Table | Type | Key Columns |
| --- | --- | --- |
| fact_application | Fact | loan_amount · tat_hours · is_approved |
| fact_decline_bridge | Bridge Fact | app_id · decline_code · is_primary |
| dim_date | Dimension | date · month · quarter · year |
| dim_customer | Dimension | segment · city · city_tier |
| dim_agent | SCD Type 2 | agent_code · branch · effective_from · effective_to |
| dim_channel | Dimension | channel_code · channel_type |
| dim_product | Dimension | product_code · scheme · fee_code |
| dim_branch | Dimension | branch_code · city · region |
| dim_decline_reason | Dimension | decline_code · category |

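
The SCD Type 2 handling of `dim_agent` can be sketched as "close the current row, then append a new version" whenever a tracked attribute changes. The Python below is a minimal illustration, not the project's MERGE logic; the `apply_scd2` helper, the `9999-12-31` open-end sentinel, and the `BR_01`/`BR_02` branch codes are assumptions.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # sentinel marking the current row (an assumption)

def apply_scd2(dim_rows, agent_code, branch, as_of):
    """Expire the current row if the branch changed, then append a new version."""
    current = next(
        (r for r in dim_rows
         if r["agent_code"] == agent_code and r["effective_to"] == OPEN_END),
        None,
    )
    if current is not None:
        if current["branch"] == branch:
            return dim_rows  # nothing changed, keep the current version
        current["effective_to"] = as_of  # close the old version
    dim_rows.append({
        "agent_code": agent_code,
        "branch": branch,
        "effective_from": as_of,
        "effective_to": OPEN_END,
    })
    return dim_rows

dim_agent = []
apply_scd2(dim_agent, "AGT_00001", "BR_01", date(2016, 1, 1))
apply_scd2(dim_agent, "AGT_00001", "BR_01", date(2016, 3, 1))  # unchanged, no new row
apply_scd2(dim_agent, "AGT_00001", "BR_02", date(2016, 6, 1))  # branch move
```

After these calls the agent has two rows: the `BR_01` version closed on 2016-06-01 and an open `BR_02` version, so facts dated before the move still resolve to the old branch.
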

| Tool | Version | Purpose |
| --- | --- | --- |
| SQL Server | 2019 | Core database engine |
| T-SQL | - | DDL · Stored Procedures · DQ Rules |
| SSMS | 19+ | Database development |
| VS Code | Latest | Code editor · Git integration |
| Python | 3.10+ | Synthetic data generation |
| Power BI | Desktop | Reporting · Dashboards · RLS |
| Git | Latest | Version control |
| GitHub | - | Remote repository |

# Step 1: Clone the repository
git clone https://github.com/dirumisra/indus-loan-analytics.git
cd indus-loan-analytics
# Step 2: Run in SSMS in this exact order
1. sql/00_setup/01_create_schemas.sql
2. sql/00_setup/02_create_audit_tables.sql
3. sql/00_setup/03_seed_watermark.sql
4. sql/00_setup/04_create_agent_pseudonym.sql
5. sql/01_raw/01_create_raw_table.sql

| Attribute | Value |
| --- | --- |
| Source | IndusInd Bank Personal Loan Applications |
| Year | 2016 |
| Total Records | 22,155 rows |
| Total Columns | 24 columns |
| Decision Values | 13 unique: FINISH · DECLINED · CBLR · REJ... |
| Sourcing Channels | 16 unique: INH · DSA · BRANCH · PBA... |
| Products | 7 unique: LAA701 · LAA702 · PLCIBIL... |
| Campaign Types | 165 unique |
| Null Rate | Up to 64% in some columns |

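
A per-column null rate like the one quoted above can be measured with a short profiling pass over the CSV. The sketch below uses only the Python standard library; the `null_rates` helper, the choice to count the literal text `NULL` as missing, and the three sample column names are assumptions, not the real 24-column layout.

```python
import csv
import io

def null_rates(csv_text: str) -> dict:
    """Share of empty or 'NULL' cells per column, as a fraction from 0 to 1."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in missing:
            value = (row[name] or "").strip()
            if value == "" or value.upper() == "NULL":
                missing[name] += 1
    return {name: (count / row_count if row_count else 0.0)
            for name, count in missing.items()}

# Tiny made-up sample; column names are hypothetical.
sample = "app_id,decision,campaign\nA1,FINISH,\nA2,,NULL\nA3,DECLINED,\nA4,FINISH,C01\n"
rates = null_rates(sample)
```

Running the same function over `data/sample_100_rows.csv` (read as text) would give a quick view of which columns drive the high null rate.
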

| Phase | Description | Status |
| --- | --- | --- |
| 1 | Foundation: DB · schemas · audit tables · seeds | Complete |
| 2 | Raw Layer: 24 column table · BULK INSERT | Complete |
| 3 | Bronze: PII masking · watermark · TRY/CATCH | Pending |
| 4 | Bronze DQ: 5 rules · CRITICAL stop logic | Pending |
| 5 | Silver: 14 derived columns · CTE chain | Pending |
| 6 | Silver: decline code parser · STRING_SPLIT | Pending |
| 7 | Silver DQ: 6 rules · row reconciliation | Pending |
| 8 | Gold: 7 dimensions · MERGE statements | Pending |
| 9 | Gold: SCD Type 2 · dim_agent | Pending |
| 10 | Gold: fact_application · fact_decline_bridge | Pending |
| 11 | Gold: 5 aggregation tables · RANK · LAG · NTILE | Pending |
| 12 | Reporting: Power BI · RLS · rpt.* views | Pending |
| 13 | Master pipeline orchestrator | Pending |
| 14 | Testing · 5 test scripts · documentation | Pending |

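
The decline code parser planned for Phase 6 turns one delimited string into one row per code. The Python sketch below shows the shape of that transformation only; the comma delimiter, the `D01`-style codes, the `explode_decline_codes` helper, and the rule that the first code is the primary one are all assumptions.

```python
def explode_decline_codes(app_id, raw_codes, delimiter=","):
    """Split a delimited decline-code string into one row per code.

    The first code is flagged as primary (an assumption for this sketch).
    """
    if not raw_codes:
        return []
    codes = [c.strip() for c in raw_codes.split(delimiter) if c.strip()]
    return [
        {"app_id": app_id, "decline_code": code, "is_primary": position == 0}
        for position, code in enumerate(codes)
    ]

rows = explode_decline_codes("A1", "D01, D07 ,D12")
```

One point worth checking when porting this to T-SQL: on SQL Server 2019, `STRING_SPLIT` does not guarantee the order of its output rows, so a position-based `is_primary` flag would need a different way to recover the original position.
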

| Framework | How We Implement It |
| --- | --- |
| DAMA-DMBOK | Full audit trail · data quality · lineage |
| ISO 8000 | Layer reconciliation: variance must always be zero |
| RBI Guidelines | PII masking · audit retention · reproducible reports |

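
The data quality side of this, where each rule yields PASS or FAIL and a CRITICAL failure stops the pipeline, might be structured as below. This is a Python sketch of the control flow rather than the project's T-SQL; the two rules, the `WARNING` severity, and the `CriticalDQFailure` exception are hypothetical.

```python
class CriticalDQFailure(Exception):
    """Raised when a CRITICAL rule fails, halting the pipeline."""

def run_dq_rules(rows, rules):
    """Evaluate each rule in order and stop at the first CRITICAL failure."""
    results = []
    for rule in rules:
        failed = sum(1 for r in rows if not rule["check"](r))
        status = "PASS" if failed == 0 else "FAIL"
        results.append({"rule": rule["name"], "severity": rule["severity"],
                        "failed_rows": failed, "status": status})
        if status == "FAIL" and rule["severity"] == "CRITICAL":
            raise CriticalDQFailure(f"{rule['name']} failed for {failed} row(s)")
    return results

# Hypothetical rules; the project's actual 11 rules are not listed in this README.
rules = [
    {"name": "app_id_not_null", "severity": "CRITICAL",
     "check": lambda r: bool(r.get("app_id"))},
    {"name": "loan_amount_positive", "severity": "WARNING",
     "check": lambda r: r.get("loan_amount", 0) > 0},
]
good_rows = [{"app_id": "A1", "loan_amount": 50000},
             {"app_id": "A2", "loan_amount": 0}]
results = run_dq_rules(good_rows, rules)
```

A non-critical failure is recorded and the run continues, while a row with a missing `app_id` would raise `CriticalDQFailure`. A real implementation would also write each result to `dq_results` before stopping.
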
Dhiraj Kumar
Data Engineering Portfolio Project