Skip to content

dirumisra/indus-loan-analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🏦 IndusLoan Analytics Platform

Personal Loan Acquisition Analytics | End-to-End Data Engineering Portfolio Project

SQL Server Python Power BI GitHub Status


πŸ“Œ Project Overview

This project simulates a production-grade Personal Loan Acquisition Data Pipeline modeled on real retail banking operations at IndusInd Bank (2016).

It demonstrates end-to-end data engineering competency covering:

  • βœ… Medallion Architecture β€” Raw β†’ Bronze β†’ Silver β†’ Gold β†’ Reporting
  • βœ… PII Masking and Data Privacy β€” RBI Compliance
  • βœ… Watermark-based Incremental Loading β€” Full and Incremental modes
  • βœ… Slowly Changing Dimension Type 2 β€” Agent history tracking
  • βœ… Data Quality Framework β€” 11 rules across Bronze and Silver
  • βœ… Star Schema Design β€” 2 Facts Β· 7 Dimensions Β· 5 Aggregates
  • βœ… Data Governance β€” Full audit trail, lineage, reconciliation
  • βœ… Power BI Reporting β€” Row Level Security, executive dashboards

πŸ—οΈ Architecture

╔══════════════════════════════════════════════════════════════════════════════╗
β•‘                    INDUS LOAN ANALYTICS PLATFORM                            β•‘
β•‘                    Medallion Architecture β€” 14 Phases                       β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚      SOURCE          β”‚
  β”‚  banksalesdata.csv   β”‚  22,155 rows Β· 24 columns
  β”‚  IndusInd Bank 2016  β”‚  Real personal loan applications
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚  BULK INSERT
             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                        RAW LAYER                                β”‚
  β”‚                      raw.applications                           β”‚
  β”‚  β€’ Append-only β€” data is never modified here                    β”‚
  β”‚  β€’ All 24 columns stored as VARCHAR β€” no casting                β”‚
  β”‚  β€’ Byte-perfect copy of source file                             β”‚
  β”‚  β€’ NULL allowed everywhere β€” source data is messy               β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚  usp_load_bronze (FULL / INCR)
             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                      BRONZE LAYER                               β”‚
  β”‚                     bronze.applications                         β”‚
  β”‚  β€’ Watermark applied β€” incremental load tracking                β”‚
  β”‚  β€’ PII Masked β€” PAN hashed Β· Name β†’ initials Β· DOB β†’ year      β”‚
  β”‚  β€’ Agent codes replaced with pseudonyms (AGT_00001)             β”‚
  β”‚  β€’ Record hash added β€” SHA2_256 for deduplication               β”‚
  β”‚  β€’ Audit columns added β€” batch_id Β· load_timestamp              β”‚
  β”‚  β€’ 5 DQ rules checked β€” CRITICAL stops pipeline                 β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚  usp_transform_silver
             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                      SILVER LAYER                               β”‚
  β”‚                     silver.applications                         β”‚
  β”‚  β€’ 14 derived columns built using CTE chain                     β”‚
  β”‚  β€’ Data types cast β€” dates, amounts, integers                   β”‚
  β”‚  β€’ final_status β€” Approved Β· Declined Β· In Process              β”‚
  β”‚  β€’ tat_hours Β· tat_bucket β€” turnaround time analysis            β”‚
  β”‚  β€’ loan_amount_band Β· city_tier β€” segmentation                  β”‚
  β”‚  β€’ is_approved Β· is_declined Β· is_in_process β€” BIT flags        β”‚
  β”‚  β€’ Decline codes exploded β†’ silver.decline_codes_parsed         β”‚
  β”‚  β€’ 6 DQ rules checked β€” row reconciliation logged               β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚  usp_load_gold
             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                    GOLD LAYER β€” STAR SCHEMA                     β”‚
  β”‚                                                                 β”‚
  β”‚   DIMENSIONS                        FACTS                       β”‚
  β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”‚
  β”‚   β”‚  dim_date   │──────────────────│ fact_application β”‚        β”‚
  β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚                  β”‚        β”‚
  β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚  2 date keys     β”‚        β”‚
  β”‚   β”‚dim_customer │─────────────────│  7 dim keys      β”‚        β”‚
  β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚  5 measures      β”‚        β”‚
  β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚
  β”‚   β”‚  dim_agent  β”‚                           β”‚                   β”‚
  β”‚   β”‚  SCD Type 2 β”‚                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”‚
  β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚fact_decline_bridgeβ”‚        β”‚
  β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚  many-to-many    β”‚        β”‚
  β”‚   β”‚ dim_channel β”‚                  β”‚  decline codes   β”‚        β”‚
  β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β”‚
  β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                                               β”‚
  β”‚   β”‚ dim_product β”‚          AGGREGATES                           β”‚
  β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
  β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”‚  agg_approval_funnel     β”‚        β”‚
  β”‚   β”‚  dim_branch β”‚          β”‚  agg_agent_scorecard     β”‚        β”‚
  β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚  agg_tat_analysis        β”‚        β”‚
  β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚  agg_channel_performance β”‚        β”‚
  β”‚   β”‚dim_decline   β”‚         β”‚  agg_city_performance    β”‚        β”‚
  β”‚   β”‚   reason     β”‚         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
  β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                               β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚  rpt.* views
             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                    REPORTING LAYER                               β”‚
  β”‚                       rpt.* Views                               β”‚
  β”‚  β€’ Indexed views over gold aggregates                           β”‚
  β”‚  β€’ Row Level Security β€” branch level access control             β”‚
  β”‚  β€’ Power BI connects here only β€” never to gold directly         β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚                        POWER BI                                  β”‚
  β”‚  β€’ Approval Funnel Dashboard                                    β”‚
  β”‚  β€’ Agent Scorecard                                              β”‚
  β”‚  β€’ TAT Analysis                                                 β”‚
  β”‚  β€’ Channel Performance                                          β”‚
  β”‚  β€’ City and Region Heatmap                                      β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

  ╔══════════════════════════════════════════════════════════════════╗
  β•‘               AUDIT SCHEMA β€” DATA GOVERNANCE                    β•‘
  β•‘       Runs alongside every layer β€” tracks everything            β•‘
  β•‘                                                                 β•‘
  β•‘  pipeline_run_log β†’ dq_results β†’ watermark_control             β•‘
  β•‘  layer_reconciliation β†’ column_lineage β†’ agent_pseudonym        β•‘
  β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

πŸ“‚ Repository Structure

indus-loan-analytics/
β”‚
β”œβ”€β”€ sql/
β”‚   β”œβ”€β”€ 00_setup/                          ← Phase 1 Foundation
β”‚   β”‚   β”œβ”€β”€ 01_create_schemas.sql          ← 6 schemas
β”‚   β”‚   β”œβ”€β”€ 02_create_audit_tables.sql     ← 5 audit tables
β”‚   β”‚   β”œβ”€β”€ 03_seed_watermark.sql          ← 2 seed rows
β”‚   β”‚   └── 04_create_agent_pseudonym.sql  ← PII lookup table
β”‚   β”‚
β”‚   β”œβ”€β”€ 01_raw/                            ← Phase 2 Raw Layer
β”‚   β”‚   └── 01_create_raw_table.sql        ← 24 column raw table
β”‚   β”‚
β”‚   β”œβ”€β”€ 02_bronze/                         ← Phase 3-4 coming soon
β”‚   β”œβ”€β”€ 03_silver/                         ← Phase 5-7 coming soon
β”‚   β”œβ”€β”€ 04_gold/                           ← Phase 8-11 coming soon
β”‚   └── 05_reporting/                      ← Phase 12 coming soon
β”‚
β”œβ”€β”€ data/
β”‚   └── sample_100_rows.csv
β”‚
β”œβ”€β”€ docs/
β”œβ”€β”€ .gitignore
└── README.md

πŸ—„οΈ Database Design

Audit Schema β€” Data Governance Layer

Table Purpose Governance Pillar
pipeline_run_log Tracks every pipeline execution Auditability
dq_results Every DQ rule result β€” PASS or FAIL Data Quality
watermark_control Manages incremental load state Completeness
layer_reconciliation Proves zero row loss across layers Reconciliation
column_lineage Documents every column transformation Traceability
agent_pseudonym Real agent codes mapped to fake codes PII Protection

Gold Layer β€” Star Schema

Table Type Key Columns
fact_application Fact loan_amount Β· tat_hours Β· is_approved
fact_decline_bridge Bridge Fact app_id Β· decline_code Β· is_primary
dim_date Dimension date Β· month Β· quarter Β· year
dim_customer Dimension segment Β· city Β· city_tier
dim_agent SCD Type 2 agent_code Β· branch Β· effective_from Β· effective_to
dim_channel Dimension channel_code Β· channel_type
dim_product Dimension product_code Β· scheme Β· fee_code
dim_branch Dimension branch_code Β· city Β· region
dim_decline_reason Dimension decline_code Β· category

βš™οΈ Tech Stack

Tool Version Purpose
SQL Server 2019 Core database engine
T-SQL β€” DDL Β· Stored Procedures Β· DQ Rules
SSMS 19+ Database development
VS Code Latest Code editor Β· Git integration
Python 3.10+ Synthetic data generation
Power BI Desktop Reporting Β· Dashboards Β· RLS
Git Latest Version control
GitHub β€” Remote repository

πŸš€ Setup Guide

# Step 1 β€” Clone the repository
git clone https://github.com/dirumisra/indus-loan-analytics.git
cd indus-loan-analytics
# Step 2 β€” Run in SSMS in this exact order
1. sql/00_setup/01_create_schemas.sql
2. sql/00_setup/02_create_audit_tables.sql
3. sql/00_setup/03_seed_watermark.sql
4. sql/00_setup/04_create_agent_pseudonym.sql
5. sql/01_raw/01_create_raw_table.sql

πŸ“Š Source Data Profile

Attribute Value
Source IndusInd Bank Personal Loan Applications
Year 2016
Total Records 22,155 rows
Total Columns 24 columns
Decision Values 13 unique β€” FINISH Β· DECLINED Β· CBLR Β· REJ...
Sourcing Channels 16 unique β€” INH Β· DSA Β· BRANCH Β· PBA...
Products 7 unique β€” LAA701 Β· LAA702 Β· PLCIBIL...
Campaign Types 165 unique
Null Rate Up to 64% in some columns

πŸ“ˆ Phase Progress

Phase Description Status
1 Foundation β€” DB Β· schemas Β· audit tables Β· seeds βœ… Complete
2 Raw Layer β€” 24 column table Β· BULK INSERT βœ… Complete
3 Bronze β€” PII masking Β· watermark Β· TRY/CATCH ⏳ Pending
4 Bronze DQ β€” 5 rules Β· CRITICAL stop logic ⏳ Pending
5 Silver β€” 14 derived columns Β· CTE chain ⏳ Pending
6 Silver β€” decline code parser Β· STRING_SPLIT ⏳ Pending
7 Silver DQ β€” 6 rules Β· row reconciliation ⏳ Pending
8 Gold β€” 7 dimensions Β· MERGE statements ⏳ Pending
9 Gold β€” SCD Type 2 Β· dim_agent ⏳ Pending
10 Gold β€” fact_application Β· fact_decline_bridge ⏳ Pending
11 Gold β€” 5 aggregation tables Β· RANK Β· LAG Β· NTILE ⏳ Pending
12 Reporting β€” Power BI Β· RLS Β· rpt.* views ⏳ Pending
13 Master pipeline orchestrator ⏳ Pending
14 Testing · 5 test scripts · documentation ⏳ Pending

πŸ›οΈ Data Governance

Framework How We Implement It
DAMA-DMBOK Full audit trail Β· data quality Β· lineage
ISO 8000 Layer reconciliation β€” variance must always be zero
RBI Guidelines PII masking Β· audit retention Β· reproducible reports

πŸ‘€ Author

Dhiraj Kumar Data Engineering Portfolio Project

GitHub

About

Personal Loan Acquisition Analytics Platform

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages