Production-simulated credit decision-ops platform — champion/challenger governance, Optuna HPO, Platt calibration, PSI drift alerting, fairness audit, FastAPI serving.
Credit default models fail silently. PR AUC looks fine in training, the model is miscalibrated at the cutoff that matters, and nobody checks disparate impact until a regulator does.
A model with PR AUC 0.26 can produce systematically wrong approval rates if its probability outputs are not calibrated at the decision thresholds. Drift compounds this: a 14-day population shift of −0.18 in EXT_SOURCE_2 and a ×1.22 increase in credit_income_ratio pushes PSI to 0.2358, past the 0.20 alert threshold, while the model keeps scoring without warning. Standard notebook workflows catch none of this. RiskFrame is built to catch all of it.
RiskFrame runs XGBoost and LightGBM head-to-head with a 5-gate promotion framework. Neither model advances without clearing every gate on held-out test data.
```bash
python -m src.training.train \
  --data_dir data/home-credit-default-risk \
  --artifact_dir artifacts/xgb_v1 \
  --config configs/training_config.json
```

Pipeline: 7-table load → bureau/prev_app/installments aggregation to SK_ID_CURR grain → 183-feature ABTBuilder → stratified 60/20/20 split → ColumnTransformer fit on train only → RandomizedSearchCV on XGBClassifier (20 iterations, 3-fold CV) → CalibratedClassifierCV(method='sigmoid', cv='prefit') fit on the val set → sklearn.Pipeline serialized to model.joblib.
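For orientation, the skeleton of that flow as a minimal, hedged sketch: the synthetic frame, column lists, and parameter grid below are illustrative stand-ins for the 183-feature ABT and configs/training_config.json; only the structure (train-only preprocessing, prefit Platt calibration on val, one serialized Pipeline) mirrors the real trainer.

```python
# Minimal sketch of the training flow; data, columns, and grid are illustrative.
import joblib
import numpy as np
import pandas as pd
from sklearn.calibration import CalibratedClassifierCV
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBClassifier

# Stand-in for the 183-feature ABT at SK_ID_CURR grain
rng = np.random.default_rng(42)
n = 3000
df = pd.DataFrame({
    "EXT_SOURCE_2": rng.uniform(0, 1, n),
    "credit_income_ratio": rng.uniform(0.5, 5.0, n),
    "NAME_INCOME_TYPE": rng.choice(["Working", "Pensioner"], n),
    "TARGET": rng.binomial(1, 0.08, n),
})
X, y = df.drop(columns="TARGET"), df["TARGET"]
X_train, y_train = X.iloc[:1800], y.iloc[:1800]      # 60% train
X_val, y_val = X.iloc[1800:2400], y.iloc[1800:2400]  # 20% val (20% test held out)

pre = ColumnTransformer([
    ("num", StandardScaler(), ["EXT_SOURCE_2", "credit_income_ratio"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["NAME_INCOME_TYPE"]),
])

search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_distributions={
        "max_depth": [3, 4, 5, 6],
        "learning_rate": [0.01, 0.03, 0.1],
        "n_estimators": [200, 400, 800],
    },
    n_iter=20, cv=3, scoring="average_precision", random_state=42,
)

# Preprocessing and the hyperparameter search see train data only
search.fit(pre.fit_transform(X_train), y_train)

# Platt (sigmoid) calibration fit on the held-out val set, never on train
calibrated = CalibratedClassifierCV(search.best_estimator_,
                                    method="sigmoid", cv="prefit")
calibrated.fit(pre.transform(X_val), y_val)

# A single serialized Pipeline is what batch and API scoring both load
joblib.dump(Pipeline([("pre", pre), ("clf", calibrated)]), "model.joblib")
```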
| Metric | Value |
|---|---|
| PR AUC | 0.2611 |
| ROC AUC | 0.7663 |
| ECE | 0.0046 |
| Gate decision | DEPLOYED |
```bash
python -m src.training.challenger.train_challenger
```

Same split, separate ColumnTransformer, RandomizedSearchCV on LightGBM, Platt calibration. Evaluated head-to-head across 8 metrics.
| Metric | XGBoost Champion | LightGBM Challenger |
|---|---|---|
| PR AUC | 0.2611 | 0.2609 |
| ROC AUC | 0.7663 | 0.7649 |
| ECE | 0.0046 | higher |
| Gate 1 (PR AUC delta ≥ 0.001) | — | FAIL → HOLD |
DeLong test on the AUC difference: z ≈ 0.08, two-sided p ≈ 0.94, not statistically significant. Performance is equivalent; the champion is retained.
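For reference, a minimal sketch of the DeLong machinery behind that comparison, in the structural-components form (O(m·n) memory; the project's actual implementation may differ):

```python
import numpy as np
from scipy.stats import norm

def auc_and_components(y_true, scores):
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    # Heaviside kernel: 1 if the positive outscores the negative, 0.5 on ties
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for the difference of two correlated ROC AUCs."""
    auc_a, v10_a, v01_a = auc_and_components(y_true, scores_a)
    auc_b, v10_b, v01_b = auc_and_components(y_true, scores_b)
    s10, s01 = np.cov(v10_a, v10_b), np.cov(v01_a, v01_b)
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / len(v10_a)
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / len(v01_a))
    z = (auc_a - auc_b) / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

# Illustrative usage (array names assumed):
# z, p = delong_test(y_test.to_numpy(), proba_champion, proba_challenger)
```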
```bash
python -m src.training.optuna_hpo --n_trials 50 --seed 42
```

A 50-trial TPE Bayesian search over 9 XGBoost hyperparameters. xgb_v2 achieves PR AUC 0.2654, a better discriminator, but ECE regresses from 0.0046 to 0.0243: Platt sigmoid calibration becomes numerically unstable when the internal XGB probability distribution shifts, making xgb_v2 a worse policy instrument despite the stronger AUC. A sketch of the search loop follows the table below.
| Metric | xgb_v1 Champion | xgb_v2 Optuna |
|---|---|---|
| PR AUC | 0.2611 | 0.2654 |
| ROC AUC | 0.7663 | 0.7692 |
| ECE | 0.0046 | 0.0243 |
| Review rate collapse | — | −14.4pp |
| Gate decision | DEPLOYED | HOLD |
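A hedged sketch of that search loop with Optuna's TPE sampler, reusing the pre / X_train / X_val names from the training sketch above; the nine-dimensional space below is an illustrative guess, not the exact space recorded in optuna_hpo_results.json:

```python
import optuna
from sklearn.metrics import average_precision_score
from xgboost import XGBClassifier

Xtr, Xva = pre.fit_transform(X_train), pre.transform(X_val)

def objective(trial):
    params = {
        "max_depth":        trial.suggest_int("max_depth", 3, 10),
        "learning_rate":    trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators":     trial.suggest_int("n_estimators", 200, 1500),
        "min_child_weight": trial.suggest_int("min_child_weight", 1, 20),
        "subsample":        trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "gamma":            trial.suggest_float("gamma", 1e-8, 10.0, log=True),
        "reg_alpha":        trial.suggest_float("reg_alpha", 1e-8, 10.0, log=True),
        "reg_lambda":       trial.suggest_float("reg_lambda", 1e-8, 10.0, log=True),
    }
    model = XGBClassifier(**params, eval_metric="logloss")
    model.fit(Xtr, y_train)
    # Optimize PR AUC on the val split; ECE is checked separately at gate time
    return average_precision_score(y_val, model.predict_proba(Xva)[:, 1])

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=50)
```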
The policy engine converts probability scores to decisions at thresholds. An uncalibrated model with excellent AUC can still produce the wrong APPROVE/REVIEW/REJECT split if probability outputs are systematically off.
CalibratedClassifierCV(method='sigmoid', cv='prefit') is fit on the val set after RandomizedSearchCV completes — never on training data. ECE 0.0046 confirms the champion is well-calibrated at the decision cutoffs. This is why xgb_v2 (ECE 0.0243) is held: better discrimination is not worth calibration regression when the policy engine depends on the probability.
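For reference, a minimal ECE implementation with ten equal-width bins (the bin scheme is an assumption; calibration_report.json may bin differently):

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean |mean predicted probability - observed default rate| per bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            # bin weight * calibration gap within the bin
            ece += in_bin.mean() * abs(y_prob[in_bin].mean() - y_true[in_bin].mean())
    return ece
```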
The drift monitor computes Population Stability Index (PSI) and per-feature KS statistics across 183 features at each batch run.
| Day | PSI | Status |
|---|---|---|
| 1–3 | ~0.03 | Nominal |
| 7 | 0.158 | WARN |
| 14 | 0.2358 | ALERT |
Day 14 drift is synthetic: EXT_SOURCE_2 shifted −0.18, credit_income_ratio scaled ×1.22. The drift monitor fires correctly. drift_fire_test.py asserts PSI > 0.20 on this population — it is part of the 22/22 test suite.
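Per feature, PSI compares baseline and current bin fractions: PSI = Σ_b (a_b − e_b)·ln(a_b / e_b). A minimal sketch, assuming continuous features and decile bins taken from the baseline:

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a baseline (expected) sample and a current (actual) sample."""
    # Bin edges come from the baseline distribution; open-ended outer bins
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # guard ln(0)
    return float(np.sum((a - e) * np.log(a / e)))
```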
fairness_report.json applies two threshold-gated metrics by gender:
| Metric | Value | Threshold | Status |
|---|---|---|---|
| Disparate Impact (F/M approval rate ratio) | 1.059 | 0.80–1.25 | No violation |
| Equal Opportunity gap (TPR parity) | ~2.8pp | < 5pp | No violation |
CODE_GENDER is excluded from adverse action reason codes (ECOA-compliant). SHAP rank of CODE_GENDER_F is #10 (mean |SHAP| = 0.0848 vs EXT_SOURCE_2 = 0.3470).
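How the two gates reduce to code, as a hedged sketch: CODE_GENDER and TARGET follow the Home Credit schema (TARGET = 0 means repaid), while the decision column and the TPR convention (approval rate among actual non-defaulters) are assumptions.

```python
import pandas as pd

def fairness_gates(df: pd.DataFrame) -> dict:
    """df columns assumed: CODE_GENDER ('F'/'M'), decision, TARGET (0 = repaid)."""
    approved = df["decision"] == "APPROVE"

    # Disparate impact: F/M approval-rate ratio, gated to the 0.80-1.25 band
    rate = approved.groupby(df["CODE_GENDER"]).mean()
    di = rate["F"] / rate["M"]

    # Equal opportunity: approval-rate (TPR) gap among actual non-defaulters
    good = df[df["TARGET"] == 0]
    tpr = (good["decision"] == "APPROVE").groupby(good["CODE_GENDER"]).mean()
    eo_gap = abs(tpr["F"] - tpr["M"])

    return {"disparate_impact": di, "di_ok": 0.80 <= di <= 1.25,
            "eo_gap_pp": 100 * eo_gap, "eo_ok": eo_gap < 0.05}
```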
```text
score < 0.06          → APPROVE
0.06 ≤ score < 0.28   → REVIEW  (routed to human queue)
score ≥ 0.28          → REJECT
```
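Transcribed into a decision function (thresholds from the policy block above; the real engine also enforces the versioned REVIEW capacity):

```python
def decide(score: float, approve_lt: float = 0.06, reject_ge: float = 0.28) -> str:
    """Map a calibrated default probability to an APPROVE/REVIEW/REJECT decision."""
    if score < approve_lt:
        return "APPROVE"
    if score >= reject_ge:
        return "REJECT"
    return "REVIEW"  # routed to the human queue
```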
Policy is versioned separately from the model. v1.0 → v1.1 on Day 12 tightened capacity from 30% to 15%, recorded in policy_change_log.json with rationale and authorized_by.
```bash
# Score an applicant
curl -X POST http://localhost:8000/score \
  -H "Content-Type: application/json" \
  -d '{
    "SK_ID_CURR": 123456,
    "EXT_SOURCE_2": 0.6,
    "AMT_CREDIT": 200000,
    "AMT_INCOME_TOTAL": 100000,
    "DAYS_BIRTH": -12000,
    "DAYS_EMPLOYED": -2000,
    "CODE_GENDER": "M",
    "NAME_INCOME_TYPE": "Working"
  }'
```

Endpoints: /score, /explain (SHAP), /batch, /drift, /policy, /registry
Training-serving parity: the sklearn.Pipeline is the same object loaded by batch_scorer.py and serving/app.py. parity_check.py asserts batch scorer == API scorer within 1e-6.
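A hedged sketch in the spirit of parity_check.py; the import path, artifact location, and the response key are assumptions about the repo layout:

```python
import joblib
import numpy as np
import pandas as pd
from fastapi.testclient import TestClient

from serving.app import app  # assumed import path

# Same serialized Pipeline the batch scorer loads
pipeline = joblib.load("artifacts/xgb_v1/model.joblib")
applicant = {
    "SK_ID_CURR": 123456, "EXT_SOURCE_2": 0.6, "AMT_CREDIT": 200000,
    "AMT_INCOME_TOTAL": 100000, "DAYS_BIRTH": -12000, "DAYS_EMPLOYED": -2000,
    "CODE_GENDER": "M", "NAME_INCOME_TYPE": "Working",
}

batch_score = pipeline.predict_proba(pd.DataFrame([applicant]))[0, 1]
resp = TestClient(app).post("/score", json=applicant)
api_score = resp.json()["score"]  # "score" response key assumed
assert np.isclose(batch_score, api_score, atol=1e-6)
```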
```bash
docker-compose up -d
curl http://localhost:8000/health
```

```bash
python seed_demo.py          # Generate all artifacts
python show_demo_report.py   # Terminal evidence summary
```

| Day | Event |
|---|---|
| 1–3 | Clean batch runs, nominal PSI ~0.03 |
| 4 | Malformed batch: 47 null SK_ID_CURR + 12 DAYS_EMPLOYED=+5200 → rejected_rows.csv (see the sketch after this table) |
| 7 | Population shift, PSI 0.158 WARN |
| 10 | LightGBM challenger registered, shadow scoring begins |
| 12 | Policy v1.0→v1.1: capacity 30%→15%, thresholds tightened |
| 14 | Synthetic drift injection → PSI 0.2358 ALERT |
| 15 | Fairness report generated |
| 21 | 200 synthetic review outcomes logged with override reasons |
| 25 | Challenger comparison: 8-metric head-to-head + 5 promotion gates |
| 30 | Delayed label validation: bad-rate-by-bucket vs. predicted |
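The Day 4 quarantine rule from the timeline, sketched; the rule set and output schema are illustrative:

```python
import pandas as pd

def validate_batch(df: pd.DataFrame) -> pd.DataFrame:
    """Quarantine malformed rows before scoring; return only the clean rows."""
    # Null join keys and implausible positive tenure (e.g. +5200) are rejected
    bad = df["SK_ID_CURR"].isna() | (df["DAYS_EMPLOYED"] > 0)
    df[bad].to_csv("rejected_rows.csv", index=False)  # audit trail
    return df[~bad]
```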
```bash
python tests/pipeline_integrity_test.py   # 10 integrity tests
python tests/golden_scenario_suite.py     # 12 deterministic golden scenarios
python tests/drift_fire_test.py           # PSI > 0.20 on synthetic drift
python tests/parity_check.py              # Training-serving parity
```

All green: 22/22 tests passing.
```bash
pip install -r requirements.txt
python -m src.training.train
python seed_demo.py
uvicorn serving.app:app --host 0.0.0.0 --port 8000
open dashboard.html
```

| Artifact | Proves |
|---|---|
| calibration_report.json | Brier, ECE, MCE: calibration is measured |
| drift_report.json | PSI > 0.20 alert fires on drifted population |
| challenger_comparison_report.json | 8-metric head-to-head |
| challenger_promotion_decision.json | All 5 gates documented |
| fairness_report.json | Disparate impact + SHAP rank |
| policy_change_log.json | Day 12 threshold change with rationale |
| optuna_hpo_results.json | 50-trial search, ECE regression documented |
| batch_scoring_runs.csv | 30-run operational history |
What this is: Solo-built, non-production, production-simulated credit decisioning platform on public Home Credit data.
What is real: Feature pipeline, model training, calibration evaluation, policy engine, batch/online scoring, drift computation, policy logs, review logs, fairness computation, delayed label check, Docker serving.
What is simulated: Applicant records (public dataset), operational lifecycle events (scripted), human review decisions (synthetic, labeled as such).
What is not claimed: Production deployment, regulatory approval, MRM validation, real customer data.
| Document | Location |
|---|---|
| PRD v2.3 | docs/prd/RiskFrame_PRD_v2.3.pdf |
| Interview Defense | docs/defense/RiskFrame_Interview_Defense_v2.pdf |
| Model Card | MODEL_CARD.md |
| API curl proof | docs/api_curl_proof.md |
| Docker run proof | docs/docker_run_proof.md |
Full design rationale, architecture decisions, and expected interview questions with answers:
docs/defense/RiskFrame_Interview_Defense_v2.pdf
Covers: champion/challenger framework, Optuna HPO ECE regression, Platt calibration rationale, PSI drift alerting, fairness audit methodology, 5-gate promotion framework, and production failure modes.
This project is part of a portfolio targeting Applied LLM Systems Engineer roles.
- NexusSupply — Supplier Risk Intelligence Platform (LangGraph + FinBERT + XGBoost + Instructor + NetworkX)
- LendFlow — AI-powered loan underwriting pipeline (LangGraph + RAG + FOIR rules engine)
- AgentReliabilityLab — Cyber threat triage agent (LangGraph + hybrid RAG + HITL + RAGAS eval)
- RiskFrame Platform — ML model lifecycle (XGBoost + LightGBM champion/challenger, Optuna HPO, drift monitoring)
- DevPulse Platform — Version-safe RAG migration intelligence (LLM-Last principle, conflict detection)
- PulseRank Platform — Marketplace ranking with IPS debiasing (position bias correction, delayed attribution)
- MetaSignal Platform — Experimentation intelligence (CUPED + guardrail-first + A/A calibration)