California county-level food insecurity analysis — OLS regression identifying poverty and physical food access as key predictors across 58 counties. CalFresh coverage gap. HC3 robust SEs. Python.
Which county-level factors predict food insecurity across California's 58 counties — and what survives when poverty is controlled?
This project uses cross-sectional county-level data from four public sources to identify the structural factors associated with food insecurity across California, with a focus on understanding what drives variation in CalFresh program need and uptake.
Key finding: Overall poverty rate is the dominant predictor of county food insecurity (r = 0.90, coefficient 1.45, p < 0.001). When poverty is explicitly controlled, most SNAP-related associations collapse — but physical food access barriers for SNAP recipients remain independently significant (p = 0.013), pointing to geographic distance to food retail as a driver beyond poverty itself.
📄 Read the full analysis: Medium article
| Finding | Result |
|---|---|
| Poverty vs food insecurity correlation | r = 0.90, p < 0.001 |
| Model fit (preferred M3P) | R² = 0.880, Adj-R² = 0.866 |
| Overall poverty rate (M3P) | +1.451 pp per SD (p < 0.001) |
| Low access, SNAP recipients (M3P) | +0.556 pp per SD (p = 0.013) |
| SNAP benefit per capita (M3P) | +0.611 pp per SD (p = 0.056) |
| SNAP eligibility rate without poverty (M3) | +1.117 pp per SD (p < 0.001) |
| SNAP eligibility rate with poverty (M3P) | +0.244 pp per SD (p = 0.490) |
| N counties | 58 |
```
ca-calfresh-coverage-gap/
│
├── README.md
├── requirements.txt
│
├── data/                                # data folder (see instructions below)
│   └── README_data.md                   # exact download instructions
│
├── figures/                             # generated by scripts (not committed)
│
├── calfresh_01_load_data.py             # Step 1: load all sources, build panel
├── calfresh_02_eda.py                   # Step 2: exploratory data analysis
├── calfresh_03_feature_engineering.py   # Step 3: transforms, VIF, standardize
├── calfresh_04_regression.py            # Step 4: OLS regression, M1-M5 incl. M3P
└── calfresh_poverty_scatter.py          # Supplemental: poverty vs FI scatter
```
Feeding America: Map the Meal Gap (MMG)
URL: https://www.feedingamerica.org/research/map-the-meal-gap/by-county
Annual county-level food insecurity estimates for all US counties. Download the multi-year Excel file covering 2019–2023 and save it to the data/ folder.
Key variables used:

- Overall Food Insecurity Rate: percentage of the county population that is food insecure
- Child Food Insecurity Rate
- % FI <= SNAP Threshold: share of food-insecure people who meet the SNAP income eligibility criterion
- Cost Per Meal
Note: MMG estimates are model-derived, not directly surveyed. Small county estimates carry more uncertainty.
USDA ERS: Food Environment Atlas
URL: https://www.ers.usda.gov/data-products/food-environment-atlas/
Download the full Excel workbook. Two sheets are used:
- ASSISTANCE: SNAP benefit per capita (2017 and 2022)
- ACCESS: food access variables by county
Important: Most SNAP participation columns in the Atlas are state-level constants — identical for all California counties. Only the four access variables and two benefit-per-capita variables have genuine county-level variation. The loading script checks this automatically.
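A minimal sketch of how such a constant-column check could work, assuming the Atlas sheet has already been filtered to California rows in a pandas DataFrame. The function name and all column names below (`FIPS`, `County`, `snap_participation_rate`, `snap_benefit_per_capita`) are hypothetical placeholders, and the values are made up; the project's loading script may implement the check differently.

```python
import pandas as pd


def find_constant_columns(df: pd.DataFrame, id_cols=("FIPS", "County")) -> list:
    """Return the columns whose value is identical for every county row.

    A column with a single distinct value (or only missing values) carries no
    between-county variation, so it cannot serve as a cross-county predictor.
    """
    candidates = [c for c in df.columns if c not in id_cols]
    return [c for c in candidates if df[c].nunique(dropna=True) <= 1]


# Hypothetical two-county extract: one state-level constant, one varying column
atlas_ca = pd.DataFrame(
    {
        "FIPS": [6001, 6003],
        "County": ["Alameda", "Alpine"],
        "snap_participation_rate": [70.0, 70.0],
        "snap_benefit_per_capita": [12.5, 30.1],
    }
)
print(find_constant_columns(atlas_ca))  # ['snap_participation_rate']
```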
U.S. Census Bureau: American Community Survey (ACS) table B17001
The overall poverty rate is pulled automatically via the Census Bureau API when calfresh_01_load_data.py is run. This requires a free Census API key; sign up at https://api.census.gov/data/key_signup.html
Add your key to the script:

```python
API_KEY = "your_key_here"
```

Covers all California counties, 2019–2022 (5-year estimates), averaged to approximate the 2019–2023 MMG panel period.
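For orientation, a request for B17001 can be sketched with the standard library as below. This is not the repository's code: the function names are hypothetical, the sketch uses `urllib` where the project lists `requests`, and equal-weight averaging of the 2019–2022 releases is an assumption about what "averaged" means. B17001_001E (population for whom poverty status is determined) and B17001_002E (income below the poverty level) are the table's published variables, and the API appends `state` and `county` columns to each row.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

ACS5_URL = "https://api.census.gov/data/{year}/acs/acs5"
# B17001_001E: population for whom poverty status is determined
# B17001_002E: of those, income in the past 12 months below the poverty level
B17001_VARS = "NAME,B17001_001E,B17001_002E"


def fetch_b17001(year: int, api_key: str) -> list:
    """Request B17001 counts for every California county (state FIPS 06)."""
    query = urllib.parse.urlencode(
        {"get": B17001_VARS, "for": "county:*", "in": "state:06", "key": api_key}
    )
    with urllib.request.urlopen(f"{ACS5_URL.format(year=year)}?{query}", timeout=30) as resp:
        return json.load(resp)  # list of rows; the first row is the header


def poverty_rate(rows: list, year: int) -> pd.DataFrame:
    """Turn raw API rows into one poverty rate (%) per county for one release."""
    df = pd.DataFrame(rows[1:], columns=rows[0])
    total = pd.to_numeric(df["B17001_001E"])
    below = pd.to_numeric(df["B17001_002E"])
    return pd.DataFrame(
        {"fips": df["state"] + df["county"], "year": year, "poverty_rate": 100 * below / total}
    )


def average_over_years(frames: list) -> pd.DataFrame:
    """Average each county's poverty rate across the per-year tables."""
    panel = pd.concat(frames, ignore_index=True)
    return panel.groupby("fips", as_index=False)["poverty_rate"].mean()


# Usage (needs network access and a real key):
# frames = [poverty_rate(fetch_b17001(y, API_KEY), y) for y in range(2019, 2023)]
# b17001_avg = average_over_years(frames)
```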
California CDSS: DFA256 CalFresh administrative data
URL: https://data.ca.gov (search "DFA256")
Monthly county-level CalFresh enrollment, 2004–2017. Loaded and inspected but not used in the final analysis due to a date gap with the MMG panel (which starts 2019). Included in the loading script for completeness.
```bash
pip install -r requirements.txt
```

requirements.txt:

```text
pandas>=2.0
numpy>=1.24
scipy>=1.10
matplotlib>=3.7
openpyxl>=3.1
requests>=2.28
```
No R. No statsmodels. No proprietary tools.
At the top of each script, update these two variables to match your local folder structure:
```python
DATA = "/your/path/to/data/"      # folder containing downloaded data files
FIG = "/your/path/to/figures/"    # folder for output figures
```

Scripts must be run in order; each depends on outputs from the previous one.
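One way to enforce that order is a small driver script. The sketch below is hypothetical (the repository does not include it); the `runner` parameter exists only so the loop can be exercised without launching the real scripts.

```python
import subprocess
import sys

# Execution order from the steps in this README
PIPELINE = [
    "calfresh_01_load_data.py",
    "calfresh_02_eda.py",
    "calfresh_03_feature_engineering.py",
    "calfresh_04_regression.py",
    "calfresh_poverty_scatter.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run) -> None:
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"Running {script}")
        # check=True raises CalledProcessError on a non-zero exit, so a later
        # step never runs against missing or stale outputs from an earlier one
        runner([sys.executable, script], check=True)
```

Saved as, say, run_all.py in the repository root, calling `run_pipeline()` would execute the whole analysis in sequence.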
Step 1 — calfresh_01_load_data.py
Reads all four data sources, pulls ACS B17001 overall poverty rate from the Census API, and saves clean CSVs.
Before running:

- Add MMG2025_2019-2023_Data_To_Share.xlsx (Feeding America) to the data/ folder
- Add 2025-food-environment-atlas-data.xlsx (USDA) to the data/ folder
- Set `API_KEY = "your_key_here"` in the script
Outputs to data/: county_avg_2019_2023.csv, mmg_ca_panel.csv, food_env_atlas_ca.csv, b17001_ca_avg.csv, b17001_ca_panel.csv, b22003_ca_panel.csv, dfa256_annual.csv
Step 2 — calfresh_02_eda.py
Exploratory data analysis — distributions, correlations, bivariate scatter plots.
Inputs: county_avg_2019_2023.csv, food_env_atlas_ca.csv
Outputs to figures/: 01_univariate.png, 02_bivariate.png, 03_correlation_matrix.png
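As a rough illustration of the correlation side of this step, a helper like the one below ranks candidate predictors by absolute Pearson correlation with the outcome and reports each variable's skewness. It assumes the merged county table is a pandas DataFrame whose outcome column is named food_insecurity_rate, as in this README; the function name and the toy numbers are made up.

```python
import numpy as np
import pandas as pd


def eda_summary(df: pd.DataFrame, outcome: str = "food_insecurity_rate") -> pd.DataFrame:
    """Skewness and Pearson r with the outcome for every numeric column."""
    numeric = df.select_dtypes(include=np.number)
    summary = pd.DataFrame(
        {
            "skew": numeric.skew(),  # pandas' bias-adjusted sample skewness
            "r_with_outcome": numeric.corr()[outcome],  # Pearson by default
        }
    )
    # Drop the outcome's own row and list the strongest correlates first
    return summary.drop(index=outcome).sort_values("r_with_outcome", key=abs, ascending=False)


toy = pd.DataFrame(
    {
        "food_insecurity_rate": [10.0, 12.0, 14.0, 16.0, 18.0],
        "poverty_rate": [5.0, 6.0, 7.0, 8.0, 9.0],
        "cost_per_meal": [3.5, 3.1, 3.6, 3.2, 3.4],
    }
)
print(eda_summary(toy))
```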
Step 3 — calfresh_03_feature_engineering.py
Transforms skewed variables, checks VIF and drops multicollinear predictors, standardizes all features to z-scores, merges overall poverty rate from ACS B17001.
Inputs: county_avg_2019_2023.csv, food_env_atlas_ca.csv, b17001_ca_avg.csv
Outputs to data/: county_features.csv
Outputs to figures/: 04_log_transforms.png
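The standardize-and-merge part of this step might look like the sketch below. The `fips` join key, the function names, and `ddof=1` are assumptions; the `_z` suffix mirrors variable names such as pct_fi_snap_eligible_z used in this README. Only predictors are z-scored here, which is what the "pp per SD" units in the results imply: the outcome stays in percentage points.

```python
import pandas as pd


def standardize(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Append z-scored copies of `cols`, suffixed with `_z`."""
    out = df.copy()
    for c in cols:
        # Series.std() uses ddof=1 (sample SD); ddof=0 would only rescale by a constant
        out[f"{c}_z"] = (out[c] - out[c].mean()) / out[c].std()
    return out


def merge_poverty(features: pd.DataFrame, poverty: pd.DataFrame) -> pd.DataFrame:
    """Attach the ACS poverty rate, failing loudly if any county has no match."""
    merged = features.merge(poverty, on="fips", how="left", validate="one_to_one")
    missing = int(merged["poverty_rate"].isna().sum())
    if missing:
        raise ValueError(f"{missing} counties have no ACS poverty rate")
    return merged
```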
Step 4 — calfresh_04_regression.py
Runs six OLS model specifications (M1, M2, M3, M3P, M4, and M5), prints coefficient tables, and generates coefficient stability and residual diagnostic figures.
Inputs: county_features.csv
Outputs to figures/: 05_coef_stability.png, 06_residuals_m3p.png
Supplemental — calfresh_poverty_scatter.py
Scatter plot of overall poverty rate vs food insecurity rate, labeled with notable counties.
Inputs: county_features.csv
Outputs to figures/: calfresh_poverty_vs_fi.png
Outcome: food_insecurity_rate (county average 2019–2023, %)
Unit: California county (N = 58)
Standard errors: HC3 heteroskedasticity-robust, implemented from scratch in NumPy. Classical OLS standard errors assume that the residual variance is the same for every observation (homoskedasticity). That is unlikely to hold here: MMG estimates for tiny rural counties carry far more uncertainty than estimates for large counties because they rest on much thinner underlying data. HC3 leaves the OLS coefficients unchanged but computes standard errors that remain valid under unequal variance, without assuming any particular form for the heteroskedasticity.
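For readers who want to see what "HC3 from scratch in NumPy" involves, here is a minimal sketch of the standard HC3 sandwich estimator, (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^-1, where h_ii is observation i's leverage. It is not the repository's code; the function name `ols_hc3` is hypothetical.

```python
import numpy as np


def ols_hc3(X: np.ndarray, y: np.ndarray, add_intercept: bool = True):
    """OLS coefficients with HC3 and classical standard errors.

    Returns (beta, se_hc3, se_classical).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if add_intercept:
        X = np.column_stack([np.ones(len(X)), X])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverage h_ii: diagonal of the hat matrix H = X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # HC3 scales each squared residual by 1 / (1 - h_ii)^2; high-leverage points
    # pull the fit toward themselves, so their raw residuals understate the noise
    omega = resid**2 / (1.0 - h) ** 2
    meat = X.T @ (X * omega[:, None])
    cov_hc3 = XtX_inv @ meat @ XtX_inv
    # Classical covariance, assuming constant residual variance, for comparison
    sigma2 = resid @ resid / (n - k)
    cov_classical = sigma2 * XtX_inv
    return beta, np.sqrt(np.diag(cov_hc3)), np.sqrt(np.diag(cov_classical))
```

Dividing each coefficient by its HC3 standard error gives a robust t statistic; one common choice is to read p-values from a t distribution with n - k degrees of freedom.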
Model specifications:
| Model | Specification | N | R² | Adj-R² |
|---|---|---|---|---|
| M1 | SNAP benefit per capita + cost per meal (baseline) | 58 | 0.705 | 0.695 |
| M2 | + Low access variables | 58 | 0.748 | 0.729 |
| M3 | + SNAP income eligibility rate | 58 | 0.817 | 0.800 |
| M3P | + Overall poverty rate (preferred) | 58 | 0.880 | 0.866 |
| M4 | + Benefit growth (full model) | 58 | 0.823 | 0.802 |
| M5 | Sensitivity: large counties only (N=39) | 39 | 0.835 | 0.810 |
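Adj-R² penalizes R² for the number of predictors k: Adj-R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The table does not list k per model; back-solving the reported figures gives k = 6 for M3P (n = 58) and k = 5 for M5 (n = 39). Those k values are inferences, and the check below only shows they are consistent with the reported numbers.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors (intercept not counted in k)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus intercept")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# k values are inferred from the table, not stated in the README
print(round(adjusted_r2(0.880, n=58, k=6), 3))  # 0.866 (M3P)
print(round(adjusted_r2(0.835, n=39, k=5), 3))  # 0.81 (M5)
```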
Three transformations were tested for each right-skewed variable (log, sqrt, Box-Cox). Square root was chosen for pct_low_access_pop because log overcorrected (skew flipped from +2.20 to −1.31). Log was used for the other two access variables. Box-Cox was used as a diagnostic benchmark but not as the primary transform — it requires storing and reapplying a specific lambda parameter, unlike sqrt which is parameter-free.
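The comparison can be reproduced along these lines with SciPy, assuming the variable is strictly positive (a county with a value of exactly zero would need an offset or log1p, which the README does not discuss). The function name is hypothetical. `scipy.stats.boxcox` returns the fitted lambda alongside the transformed data, which is the parameter that would have to be stored and reapplied.

```python
import numpy as np
from scipy import stats


def compare_transforms(x) -> dict:
    """Sample skewness of a strictly positive variable under candidate transforms."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log and Box-Cox require strictly positive values")
    boxcox_x, lam = stats.boxcox(x)  # lambda estimated by maximum likelihood
    # scipy.stats.skew defaults to the biased estimator; pandas' Series.skew
    # uses the adjusted one, so reported values can differ slightly
    return {
        "raw": stats.skew(x),
        "log": stats.skew(np.log(x)),
        "sqrt": stats.skew(np.sqrt(x)),
        "boxcox": stats.skew(boxcox_x),
        "boxcox_lambda": lam,
    }
```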
All predictors were checked for multicollinearity with the Variance Inflation Factor (VIF) before model building. Three variables were dropped: access_x_eligible_z (VIF = 70.51), log_pct_low_access_lowincome_z (VIF = 43.45), and pct_fi_snap_eligible_z (VIF = 12.34). All retained variables in M3P have VIF below 5.5.
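VIF for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing j on all other predictors. The sketch below computes it with NumPy least squares, in keeping with the no-statsmodels setup, and prunes iteratively: drop the worst offender, then recompute. The function names are hypothetical, and the cutoff of 10 is a common rule of thumb assumed here; the README reports the VIFs of the dropped variables but not the threshold or the order of removal.

```python
import numpy as np
import pandas as pd


def vif(df: pd.DataFrame) -> pd.Series:
    """VIF of each column: 1 / (1 - R_j^2) from regressing it on the others."""
    X = df.to_numpy(dtype=float)
    n = X.shape[0]
    out = {}
    for j, name in enumerate(df.columns):
        y = X[:, j]
        # Regress column j on an intercept plus every other column
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out[name] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return pd.Series(out).sort_values(ascending=False)


def prune_by_vif(df: pd.DataFrame, threshold: float = 10.0):
    """Drop the highest-VIF column one at a time until all are below `threshold`."""
    dropped = []
    while df.shape[1] > 1:
        scores = vif(df)
        if scores.iloc[0] < threshold:
            break
        dropped.append(scores.index[0])
        df = df.drop(columns=scores.index[0])
    return df, dropped
```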
M3 shows strong positive associations for both SNAP benefit per capita (1.177, p < 0.001) and SNAP income eligibility rate (1.117, p < 0.001). These look counterintuitive: higher SNAP activity is associated with higher food insecurity. The explanation is that both variables are driven by poverty. Counties with more poverty have more food insecurity, receive more SNAP benefits, and have more residents meeting the income eligibility criterion, all for the same reason.
M3P tests this directly by adding the overall poverty rate. Result: the SNAP benefit coefficient drops to 0.611 (p = 0.056, borderline) and the eligibility rate coefficient collapses to 0.244 (p = 0.490, not significant). Both were largely acting as poverty proxies. The one predictor that survives poverty control is low food access for SNAP recipients (0.556, p = 0.013), pointing to geographic distance to food retail as an independent driver beyond poverty.
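The proxy argument can be seen in a small simulation. The data below are synthetic, not the project's: "eligibility" is generated purely from "poverty" and has no direct effect on the outcome, and all coefficients and noise scales are arbitrary choices for illustration.

```python
import numpy as np


def ols_slopes(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS slope coefficients (intercept fitted but not returned)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0][1:]


rng = np.random.default_rng(42)
n = 58  # same number of units as California counties
poverty = rng.normal(size=n)
# Synthetic eligibility rate: driven by poverty, no direct effect on the outcome
eligibility = 0.9 * poverty + 0.3 * rng.normal(size=n)
# Synthetic outcome: depends on poverty only
food_insecurity = 1.5 * poverty + 0.3 * rng.normal(size=n)

without_control = ols_slopes(eligibility[:, None], food_insecurity)
with_control = ols_slopes(np.column_stack([eligibility, poverty]), food_insecurity)
print("eligibility only:", without_control)
print("eligibility + poverty:", with_control)
```

In population terms the eligibility-only slope is 1.35 / 0.9 = 1.5 here, entirely borrowed from poverty; once poverty enters the model, the eligibility slope should fall toward zero and poverty's slope should sit near 1.5.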
| Figure | File | What it shows |
|---|---|---|
| Univariate distributions | figures/01_univariate.png | All nine candidate variables, 58 counties |
| Bivariate scatter plots | figures/02_bivariate.png | Food insecurity vs key predictors |
| Correlation matrix | figures/03_correlation_matrix.png | Pairwise correlations, multicollinearity visible |
| Variable transforms | figures/04_log_transforms.png | Before/after skew for three access variables |
| Poverty vs FI | figures/calfresh_poverty_vs_fi.png | r = 0.90 scatter, 58 counties labeled |
| Coefficient stability | figures/05_coef_stability.png | M1 through M3P coefficient trajectories |
| Residual diagnostics | figures/06_residuals_m3p.png | Fitted vs residuals, Q-Q, actual vs fitted |
- Cross-sectional data cannot establish causation. This is a general limitation of observational data collected at a single point in time: without temporal variation or a natural experiment, an association alone cannot reveal the direction of causality.
- MMG food insecurity estimates are model-derived, not directly measured. Small county estimates carry more uncertainty.
- N = 58 is a small sample. Sequential model building and aggressive VIF pruning kept the model parsimonious.
- DFA256 administrative data ends in 2017, creating a gap with the 2019–2023 MMG panel.
- The 2022 SNAP benefit variable reflects COVID-19 Emergency Allotments still active in California — elevated above typical benefit levels.
- Spatial autocorrelation is not addressed. Neighboring Central Valley counties share unobserved traits. A spatial lag model or Moran's I test would be a natural next step.
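A global Moran's I on the M3P residuals is an inexpensive first check for the spatial autocorrelation noted above. The sketch below assumes a county adjacency matrix W (for example, built from county boundary files, which this repository does not include); the function names are hypothetical, and binary weights are used here although row-standardized weights are also common.

```python
import numpy as np


def morans_i(values, weights) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2.

    weights[i, j] > 0 when counties i and j are neighbours; the diagonal must be 0.
    """
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    n = len(z)
    s0 = w.sum()
    return (n / s0) * (z @ w @ z) / (z @ z)


def permutation_p_value(values, weights, n_perm: int = 999, seed: int = 0):
    """Observed I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = morans_i(values, weights)
    # Shuffling values across locations breaks any spatial pattern
    sims = np.array([morans_i(rng.permutation(values), weights) for _ in range(n_perm)])
    return observed, (1 + np.sum(sims >= observed)) / (n_perm + 1)
```

Values near +1 indicate that neighbouring counties have similar residuals; values near the null expectation of -1/(n - 1) indicate no spatial pattern.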
Kachwala, N. (2026). California CalFresh Coverage Gap Analysis.
GitHub: https://github.com/nishreenk/ca-calfresh-coverage-gap
Data citations:
Feeding America (2025). Map the Meal Gap 2025. feedingamerica.org
USDA Economic Research Service (2025). Food Environment Atlas. ers.usda.gov
U.S. Census Bureau. American Community Survey 5-Year Estimates B17001, 2019-2022.
California CDSS. DFA256 CalFresh Administrative Data, 2004-2017.
Nishrin Kachwala
GitHub: @nishreenk
Medium: @nishrin-kachwala