Senior Data Scientist · Decision Systems · AI Evaluation · Experimentation
Most DS teams measure things. I build the systems that decide what to measure, whether to trust it, and what to do about it.
4+ years shipping ML and AI evaluation systems across credit risk, experimentation platforms, RAG pipelines, and KPI governance. 5 open-source PyPI libraries · 3 production-simulated ML platforms · 1,000+ member data community · Bengaluru, IN
| 🔵 Decision Systems | 🟣 AI Evaluation & Observability | 🟢 Experimentation & Metrics |
|---|---|---|
| ML systems for credit risk, feature governance, and business-critical decisions | RAG evaluation, golden-set testing, version-safe AI deployment, LLM guardrails | A/B testing infrastructure, CUPED, SRM detection, KPI governance, metric decomposition |
- ML decision systems — credit risk scoring with SHAP explainability, threshold optimization, calibration, drift monitoring, and FastAPI serving
- Experimentation infrastructure — SRM detection, CUPED variance reduction, mSPRT sequential testing, guardrail-first decisioning, and audit-ready readouts
- AI evaluation pipelines — RAG golden-set auditing, version-safe retrieval, wrong-version-answer-rate controls, and LLM quality gates
- Metric decomposition — splitting any metric movement into mix effects, rate effects, and cross terms — available on PyPI as metriclens
riskframe_platform — End-to-end credit risk decisioning platform. XGBoost/LightGBM challenger system, Optuna HPO, SHAP explainability, PSI drift detection, fairness checks, FastAPI serving, and model card documentation.
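The PSI drift check named above is the standard population stability index. Here is a minimal sketch, assuming baseline and production scores arrive as pandas Series; the function name and binning choices are illustrative, not riskframe_platform's API:

```python
import numpy as np
import pandas as pd

def population_stability_index(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    """Illustrative PSI: bin the baseline scores, then compare bin shares.

    PSI = sum((actual_share - expected_share) * ln(actual_share / expected_share)).
    Rule of thumb: < 0.10 stable, 0.10-0.25 moderate shift, > 0.25 major shift.
    """
    # Bin edges come from the baseline (expected) distribution.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # cover out-of-range production scores

    expected_share = pd.cut(expected, edges).value_counts(normalize=True, sort=False)
    actual_share = pd.cut(actual, edges).value_counts(normalize=True, sort=False)

    # Small epsilon avoids log(0) and division by zero for empty bins.
    eps = 1e-6
    e = expected_share.to_numpy() + eps
    a = actual_share.to_numpy() + eps
    return float(np.sum((a - e) * np.log(a / e)))

# Example: baseline scores vs. a drifted production batch.
rng = np.random.default_rng(0)
baseline = pd.Series(rng.beta(2, 5, size=10_000))
production = pd.Series(rng.beta(2.5, 4, size=10_000))
print(f"PSI = {population_stability_index(baseline, production):.3f}")
```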
metasignal_platform — Production-simulated experimentation intelligence platform. SRM detection, CUPED, mSPRT, A/A calibration, guardrail-first decisioning, KPI dictionary, SQL metric contracts, streaming observability.
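Two of the checks above are standard enough to sketch directly: SRM as a chi-square test against the planned split, and CUPED as variance reduction via a pre-period covariate. The function names and thresholds below are assumptions for illustration, not metasignal_platform's interface:

```python
import numpy as np
from scipy import stats

def srm_check(n_control: int, n_treatment: int, expected_ratio: float = 0.5, alpha: float = 0.001) -> dict:
    """Sample-ratio-mismatch check: chi-square test of the observed split vs. the planned split."""
    total = n_control + n_treatment
    expected = [total * (1 - expected_ratio), total * expected_ratio]
    chi2, p = stats.chisquare(f_obs=[n_control, n_treatment], f_exp=expected)
    return {"chi2": float(chi2), "p_value": float(p), "srm": bool(p < alpha)}

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """CUPED adjustment: subtract the part of y explained by a pre-experiment covariate."""
    theta = np.cov(y, x_pre)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())

rng = np.random.default_rng(1)
pre = rng.normal(100, 20, size=5_000)              # pre-experiment value of the metric
post = 0.8 * pre + rng.normal(0, 10, size=5_000)   # correlated in-experiment metric

print(srm_check(n_control=50_800, n_treatment=49_200))    # a split this lopsided gets flagged
print(np.var(post), np.var(cuped_adjust(post, pre)))      # CUPED-adjusted variance is much lower
```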
devpulse_platform — Version-safe RAG + agentic migration orchestration. Hybrid retrieval, deterministic conflict detection, SAFE/RISKY/BLOCKED verdicts, wrong-version-answer-rate controls, evidence-backed summaries.
metriclens — pip install metriclens · DataFrame-native metric decomposition. Splits any metric movement into mix shift, rate shift, and cross term. JSON/Markdown/HTML outputs, quality gates.
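The decomposition itself is a standard identity: write the blended metric as a mix-weighted sum of segment rates, and the movement splits into a mix term, a rate term, and a cross term. A minimal pandas sketch under that framing; the function and column names are illustrative, not metriclens's actual API:

```python
import pandas as pd

def decompose_metric(base: pd.DataFrame, current: pd.DataFrame,
                     segment: str, weight: str, rate: str) -> pd.DataFrame:
    """Split the movement of a mix-weighted metric into mix, rate, and cross contributions.

    Overall metric = sum(mix_share * rate) across segments, so
      delta = sum(d_mix * rate_base) + sum(mix_base * d_rate) + sum(d_mix * d_rate).
    """
    b = base.set_index(segment)
    c = current.set_index(segment)
    mix_b = b[weight] / b[weight].sum()
    mix_c = c[weight] / c[weight].sum()
    out = pd.DataFrame({
        "mix_effect":  (mix_c - mix_b) * b[rate],
        "rate_effect": mix_b * (c[rate] - b[rate]),
        "cross_term":  (mix_c - mix_b) * (c[rate] - b[rate]),
    })
    out.loc["TOTAL"] = out.sum()
    return out

base = pd.DataFrame({"channel": ["web", "app"], "sessions": [8_000, 2_000], "conversion": [0.020, 0.050]})
curr = pd.DataFrame({"channel": ["web", "app"], "sessions": [6_000, 4_000], "conversion": [0.021, 0.048]})
print(decompose_metric(base, curr, segment="channel", weight="sessions", rate="conversion"))
# The TOTAL row of the three columns sums to the full change in blended conversion rate.
```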
goldensetauditor — pip install goldensetauditor · LLM/RAG evaluation dataset auditor. Catches conflicting labels, duplicate prompts, weak reference answers, and ambiguous questions before they corrupt your evals.
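Two of those failure modes are easy to show with plain pandas. This is an illustrative sketch of duplicate-prompt and conflicting-label detection on a toy golden set, not goldensetauditor's interface:

```python
import pandas as pd

golden = pd.DataFrame({
    "prompt": [
        "How do I rotate an API key?",
        "How do I rotate an API key?",
        "What is the default timeout?",
    ],
    "reference_answer": ["Use the /keys endpoint.", "Open a support ticket.", "30 seconds."],
})

# Duplicate prompts: the same question appearing more than once in the golden set.
duplicates = golden[golden.duplicated("prompt", keep=False)]

# Conflicting labels: duplicated prompts whose reference answers disagree.
conflicts = (
    golden.groupby("prompt")["reference_answer"]
    .nunique()
    .loc[lambda n: n > 1]
)

print(duplicates)
print(conflicts)  # prompts with more than one distinct reference answer
```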
trialcheck — pip install trialcheck · A/B experiment readout auditor. Checks for SRM, peeking risk, practical significance, guardrail movement, and pre-period imbalance. Zero dependencies. PASS/WARN/FAIL reports.
💼 Currently: Open to Senior DS · Applied AI · Experimentation · Decision Science roles · Bengaluru