Add student loan balance imputation #239
Open
MaxGhenis wants to merge 7 commits into main from add-student-loan-balance-imputation
Adds impute_student_loan_balance() function that:
- Estimates balance based on plan type and years since graduation
- Plan 1: £15k base with 3% annual decay
- Plan 2: £45k base with 2% annual decay
- Plan 5: £25k (new loans)
- Scales totals to match SLC admin statistics (£294bn)

Also adds load_was_student_loan_data() helper for extracting SLC debt from WAS Round 7 (Tot_LosR7_aggr - Tot_los_exc_SLCR7_aggr).

Closes #238

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

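To make the plan-based estimate concrete, here is a minimal sketch of how a base balance with annual decay could be computed. The base amounts and decay rates are taken from the commit message; the compound form of the decay, the plan keys, and the function name `estimate_balance` are assumptions for illustration and may not match the PR's code.

```python
# Sketch only. Parameters are from the commit message; the compound-decay
# form and every name below are assumptions, not the PR's implementation.
PLAN_PARAMS = {
    "PLAN_1": (15_000.0, 0.03),  # (base balance in GBP, annual decay rate)
    "PLAN_2": (45_000.0, 0.02),
    "PLAN_5": (25_000.0, 0.00),  # new loans: no decay applied
}


def estimate_balance(plan: str, years_since_graduation: int) -> float:
    """Return base * (1 - decay) ** years for the given plan."""
    base, decay = PLAN_PARAMS[plan]
    years = max(years_since_graduation, 0)  # treat negative inputs as zero
    return base * (1.0 - decay) ** years
```

Under this reading, a Plan 2 borrower one year after graduation would be assigned 45,000 × 0.98 = £44,100 before any scaling to the SLC total.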
Changes:
- Replace crude scaling approach with QRF model trained on WAS data
- Add generate_was_student_loan_table() to prepare training data
- Add save_student_loan_model() and create_student_loan_model() helpers
- Impute household-level SLC debt, then allocate to individuals with loans
- Calibration to admin totals will happen in main calibration step
- Update tests to reflect new allocation-based approach

The QRF approach is consistent with other imputations (wealth, consumption) and allows proper calibration rather than crude scaling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

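The household-to-individual allocation step can be sketched as follows. The commit does not state the allocation rule, so the equal split among loan holders, the input shapes, and the name `allocate_household_debt` are all assumptions made for illustration.

```python
from collections import Counter


def allocate_household_debt(household_debt, people):
    """Split each household's imputed debt equally among its loan holders.

    household_debt: dict mapping household_id -> imputed household debt
    people: list of (person_id, household_id, has_loan) tuples
    Returns a dict mapping person_id -> allocated balance.
    Assumption: equal split; the PR may use a different rule.
    """
    # Count loan holders per household so the debt can be divided among them.
    holders = Counter(hh for _, hh, has_loan in people if has_loan)
    balances = {}
    for person_id, hh, has_loan in people:
        if has_loan:
            balances[person_id] = household_debt.get(hh, 0.0) / holders[hh]
        else:
            balances[person_id] = 0.0
    return balances
```

Note that in this sketch, debt imputed to a household with no identified loan holder is not allocated to anyone, so the individual-level total can fall below the household-level total.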
Adds calibration targets for student loans to the loss function:
- Total outstanding balance (£294bn in 2025, from SLC)
- Total annual repayments (£5.6bn in 2025, from DfE/OBR)
- Number of borrowers with balance (~9.4m)
- Number of people making repayments (~3.5m)

These targets will be used during calibration to adjust weights to match admin statistics from SLC, DfE, and OBR.

Sources:
- SLC: gov.uk/government/statistics/student-loans-in-england-2024-to-2025
- DfE forecasts: gov.uk/government/statistics/student-loan-forecasts-for-england
- OBR: obr.uk/forecasts-in-depth/tax-by-tax-spend-by-spend/student-loans/

Closes #237

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

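As a rough illustration of how such targets might be compared against the weighted survey data, the sketch below computes the four aggregates and their relative errors. The target values are those listed in the commit message; the dictionary keys, function names, and the relative-error form are assumptions, and the actual loss function in the repository may be structured differently.

```python
# Target values from the commit message; key names are hypothetical.
TARGETS = {
    "total_balance": 294e9,     # GBP, SLC
    "total_repayments": 5.6e9,  # GBP per year, DfE/OBR
    "borrowers": 9.4e6,         # people with a positive balance
    "repayers": 3.5e6,          # people making repayments
}


def weighted_aggregates(weights, balances, repayments):
    """Compute the survey-side counterpart of each target."""
    return {
        "total_balance": sum(w * b for w, b in zip(weights, balances)),
        "total_repayments": sum(w * r for w, r in zip(weights, repayments)),
        "borrowers": sum(w for w, b in zip(weights, balances) if b > 0),
        "repayers": sum(w for w, r in zip(weights, repayments) if r > 0),
    }


def relative_errors(aggregates, targets=TARGETS):
    """Return aggregate / target - 1 for each target (0 means a perfect match)."""
    return {name: aggregates[name] / targets[name] - 1.0 for name in targets}
```

A calibration routine would then adjust `weights` to push these relative errors toward zero alongside the other targets in the loss.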
WAS severely undercounts student loan debt (£33bn weighted vs £294bn admin), making the QRF approach unreliable. Instead:
- Assign balances based on plan type using SLC admin averages
- Plan 1: £10k base with 2% annual decay
- Plan 2: £45k base with 1% annual decay
- Plan 4: £13k base with 2% annual decay
- Plan 5: £15k (new loans)

Note: FRS only captures ~3.75m repayers vs 9.4m borrowers in admin data. Calibration targets in loss.py will adjust weights to match admin totals.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Allows sampling from different parts of the conditional distribution. Useful when source data undercounts and you want to sample from upper tail.

Tested for student loan balance but WAS undercount is too severe (£15bn max vs £294bn target) - even q=0.99 can't compensate for missing observations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

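The limitation described here follows from how quantiles of an empirical distribution behave: no quantile can exceed the largest observed value. The sketch below is a plain linear-interpolation quantile over a list of observed values, not the QRF model itself; the function name and the toy data are hypothetical.

```python
def conditional_quantile(values, q):
    """Linear-interpolation quantile of a non-empty list of observed values."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)        # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


# Toy training values in which most households report no debt.
observed = [0.0, 0.0, 0.0, 10_000.0, 20_000.0]
upper_tail = conditional_quantile(observed, 0.99)
# upper_tail stays at or below max(observed), however high q is set.
```

If the training data are missing most of the high-balance observations, raising q moves predictions toward the observed maximum but cannot recover the missing mass.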
The QRF model was predicting very different student loan debt rates between WAS and FRS despite similar income/household composition. Adding HRP age band as a predictor dramatically improves results:

Before (without age): FRS at q=0.99 predicted £14.5bn
After (with age band): FRS at q=0.5 predicts £30.3bn (close to WAS £33.4bn)

Changes:
- Add hrp_age_band to STUDENT_LOAN_PREDICTORS
- Add age_to_band() function to convert ages to WAS-style bands (2-8)
- Add get_frs_predictors() to extract household-level predictors from FRS
- Include age band in WAS data extraction

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

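A conversion from age to a band code in the range 2-8 might look like the sketch below. The commit message gives only the range of band codes, so the band edges used here (16-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75+) and the name `age_to_band_sketch` are assumptions; the PR's age_to_band() may use different cut-offs.

```python
import bisect

# Assumed lower edges of bands 3..8; band 2 covers everything below 25.
_BAND_LOWER_EDGES = [25, 35, 45, 55, 65, 75]


def age_to_band_sketch(age: int) -> int:
    """Map an age in years to a band code between 2 and 8 (assumed edges)."""
    # bisect_right counts how many lower edges are <= age.
    return 2 + bisect.bisect_right(_BAND_LOWER_EDGES, age)
```

For example, under these assumed edges an HRP aged 24 falls in band 2, one aged 25 in band 3, and anyone aged 75 or over in band 8.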
Enhance student loan balance imputation with additional predictors:

1. Add tenure_type: Strong predictor - mortgaged owners (8.3%) vs outright owners (1.3%) have very different debt rates
2. Add hrp_employed: Employment status distinguishes employed (7.3%) from retired (0.5%) households
3. Use FRS reported repayments to identify loan holders: FRS captures ~4.35m repayers vs admin ~3.8m, providing good coverage

Results improved significantly:
- WAS at q=0.5: £29.2bn (actual: £33.4bn)
- FRS at q=0.5: £28.3bn (much closer alignment with WAS)

Also adds tests for age_to_band() and tenure mappings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Summary
- impute_student_loan_balance() function that estimates outstanding student loan balances based on plan type and years since graduation
- load_was_student_loan_data() helper for extracting SLC debt from WAS Round 7 data

Implementation Details
Balance estimates by plan type:
Totals are scaled to match SLC admin statistics (£294bn as of March 2025).
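The scaling step amounts to multiplying every balance by a single factor so that the weighted sum equals the admin total. A minimal sketch, assuming per-person balances and survey weights are available as plain lists (the function name and signature are hypothetical):

```python
SLC_TOTAL = 294e9  # GBP, SLC admin total cited in this PR


def scale_to_admin_total(balances, weights, target=SLC_TOTAL):
    """Rescale balances uniformly so the weighted total equals target."""
    weighted_total = sum(b * w for b, w in zip(balances, weights))
    if weighted_total <= 0:
        raise ValueError("weighted total must be positive to rescale")
    factor = target / weighted_total
    return [b * factor for b in balances]
```

A uniform factor preserves the relative differences between plan types while matching the aggregate; it does not correct the number of borrowers, which depends on the weights.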
Test plan
Closes #238
🤖 Generated with Claude Code