feat(models): DoublyRobustModel for sparse benchmark correction (#40)#48

Open
pranilraichura wants to merge 6 commits into aims-foundations:main from pranilraichura:feat/doubly-robust-model

pranilraichura commented Jun 3, 2026

Contributor

Summary

Implements DoublyRobustModel for issue #40. The model wraps a fitted base IRT model and learns an additive correction layer, trained with an inverse-propensity-weighted (IPW) loss, to correct for missing-not-at-random (MNAR) bias in sparse benchmark matrices.

Background: in the Fantastic Bugs setting, not every LLM is evaluated on every benchmark task. When missingness is informative (frontier models skip easy benchmarks, cheap models skip expensive ones), naive IRT fits are biased. The DR model corrects for this at training time.
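To make the bias concrete, here is a toy illustration (not code from this PR). It uses a plain average rather than an IRT fit, and the outcomes and observation probabilities are made-up values chosen so that successes are observed far more often than failures. It enumerates every possible observation pattern to compare the expected naive mean over observed cells with the expected inverse-propensity-weighted mean.

```python
from itertools import product

# Hypothetical toy data: four (model, task) cells with binary outcomes.
outcomes = [1, 1, 0, 0]
# Assumed observation probabilities: successes are observed more often than failures.
propensity = [0.9, 0.9, 0.3, 0.3]

true_mean = sum(outcomes) / len(outcomes)

expected_ipw = 0.0    # E[inverse-propensity-weighted mean]
naive_total = 0.0     # sum of P(pattern) * naive mean, over non-empty patterns
prob_nonempty = 0.0   # P(at least one cell is observed)

# Enumerate all 2^4 observation patterns and weight each by its probability.
for mask in product([0, 1], repeat=len(outcomes)):
    prob = 1.0
    for observed, p in zip(mask, propensity):
        prob *= p if observed else (1.0 - p)

    # Each observed outcome is up-weighted by 1 / propensity.
    ipw_mean = sum(o * y / p for o, y, p in zip(mask, outcomes, propensity)) / len(outcomes)
    expected_ipw += prob * ipw_mean

    if any(mask):
        # The naive estimator averages only the observed outcomes.
        naive_mean = sum(o * y for o, y in zip(mask, outcomes)) / sum(mask)
        naive_total += prob * naive_mean
        prob_nonempty += prob

expected_naive = naive_total / prob_nonempty

print(f"true mean:      {true_mean:.3f}")
print(f"expected naive: {expected_naive:.3f}")
print(f"expected IPW:   {expected_ipw:.3f}")
```

With these made-up numbers the naive mean is biased upward, while the weighted mean is unbiased in expectation. That holds only because the true propensities are used here; in practice they have to be estimated.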

Design

final_prediction(i, j) = clamp( base_model(i, j) + correction(i, j) )
correction(i, j)       = sigmoid(alpha_i - beta_j) - 0.5
  • DoublyRobustModel(base_model) freezes the base model's parameters
  • Adds correction_ability (alpha_i) and correction_difficulty (beta_j), a residual Rasch layer initialized to zero, so correction(i, j) = sigmoid(0) - 0.5 = 0 and predictions start identical to the base model
  • fit() estimates propensity scores via logistic regression on the observation pattern, then trains the correction via mle_fit with an IPW-weighted Bernoulli loss
  • predict() returns clamp(base_pred + sigmoid(alpha_i - beta_j) - 0.5)
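The prediction rule and the weighted loss described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in doubly_robust.py: the function names dr_predict and ipw_bernoulli_loss are hypothetical, the clamp range [0, 1] is assumed, and so are the eps clipping and the choice to normalize by the number of observed cells.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dr_predict(base_pred: float, alpha_i: float, beta_j: float,
               lo: float = 0.0, hi: float = 1.0) -> float:
    """Base prediction plus a residual Rasch correction, clamped to [lo, hi].

    The [0, 1] clamp range is an assumption; the PR only says "clamp".
    """
    correction = sigmoid(alpha_i - beta_j) - 0.5  # lies in (-0.5, 0.5)
    return min(hi, max(lo, base_pred + correction))


def ipw_bernoulli_loss(preds, targets, propensities, eps: float = 1e-6) -> float:
    """Bernoulli negative log-likelihood over observed cells, weighted by 1/propensity.

    Normalizing by the number of observed cells is an assumption of this sketch.
    """
    total = 0.0
    for p, y, pi in zip(preds, targets, propensities):
        p = min(1.0 - eps, max(eps, p))  # keep log() finite
        nll = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += nll / pi  # rarely observed cells count for more
    return total / len(preds)


# Zero-initialized correction parameters leave the base prediction unchanged.
print(dr_predict(0.7, alpha_i=0.0, beta_j=0.0))

# Halving a cell's propensity doubles its contribution to the loss.
print(ipw_bernoulli_loss([0.5], [1], [1.0]))
print(ipw_bernoulli_loss([0.5], [1], [0.5]))
```

Because the correction is bounded in (-0.5, 0.5), it can shift a base prediction by at most half the probability scale before clamping.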

Files

  • src/torch_measure/models/doubly_robust.py
  • src/torch_measure/models/__init__.py
  • tests/test_models/test_doubly_robust.py (11 tests)
  • tutorials/doubly_robust_sparse_benchmarks.ipynb

Questions for Sang

  • Currently only method='mle' is supported in fit() -- happy to wire up others
  • The IPW weighting is Horvitz-Thompson; full AIPW would need the outcome model term in the loss too -- worth discussing if that's in scope
  • Went with two-stage (fit base, freeze, fit correction) -- let me know if you had joint end-to-end training in mind instead
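For reference on the IPW vs. AIPW question, one common way to write the two objectives is shown below. The notation is illustrative and not taken from the code: $o_{ij}$ is the observation indicator, $\hat{\pi}_{ij}$ the estimated propensity, $\ell_{ij}(\theta)$ the per-cell Bernoulli loss, and $\hat{e}_{ij}$ an imputed loss from an outcome model, over an $N \times M$ matrix.

```latex
% IPW (Horvitz-Thompson) weighted loss: only observed cells contribute.
\hat{L}_{\mathrm{IPW}}(\theta)
  = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M}
    \frac{o_{ij}}{\hat{\pi}_{ij}} \, \ell_{ij}(\theta)

% AIPW (doubly robust) loss: imputed loss on every cell,
% plus an IPW-weighted correction on the observed cells.
\hat{L}_{\mathrm{AIPW}}(\theta)
  = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M}
    \left[ \hat{e}_{ij}
      + \frac{o_{ij}}{\hat{\pi}_{ij}} \bigl( \ell_{ij}(\theta) - \hat{e}_{ij} \bigr) \right]
```

In this form the AIPW objective is unbiased if either the propensities or the imputed losses are correct, which is the property the extra outcome-model term would buy.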

Test plan

  • pytest tests/test_models/test_doubly_robust.py -v
  • Verify DoublyRobustModel importable from torch_measure.models
  • Run tutorial notebook end-to-end
