EN.685.640 Mathematical Reasoning and Structure for Data Science: Final Project
K'lila Nooning | Spring 2026
Release: v1.0.0 — ACS 2024 1-Year Estimates, baseline city typology (K=8 GMM)
Can U.S. cities be meaningfully grouped into governance-relevant typologies using ACS socioeconomic indicators, producing interpretable profiles for municipal decision-making?
The dataset is the ACS 1-Year Estimates (2024) for all U.S. census places with population ≥ 65,000: 546 cities across all 50 states, D.C., and Puerto Rico. It contains 12 features across four policy domains (Housing, Household Finance, Economic Health, and Education). Each feature is paired with its margin of error (MOE), which the stability analysis uses to simulate survey noise.
- Preprocessing — z-score standardize features, assess multicollinearity, apply PCA if warranted
- Clustering — K-means (baseline) and GMM via EM (primary)
- Model selection — silhouette score, BIC (GMM), elbow method
- Stability analysis — MOE-bootstrap (1,000 resamples per city)
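The preprocessing and model-selection steps above can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data, not the code in src/. The 90% explained-variance threshold for PCA, the range of K scanned, and the full covariance type are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

SEED = 42

# Synthetic stand-in for the 546 x 12 feature matrix (NOT real ACS data).
rng = np.random.default_rng(SEED)
centers = rng.normal(scale=4.0, size=(3, 12))
X = np.vstack([c + rng.normal(size=(182, 12)) for c in centers])

# Z-score standardize each feature.
X_std = StandardScaler().fit_transform(X)

# Multicollinearity check: largest absolute off-diagonal correlation.
corr = np.corrcoef(X_std, rowvar=False)
max_abs_corr = np.abs(corr - np.eye(corr.shape[0])).max()

# Keep enough components to explain 90% of variance (assumed threshold).
pca = PCA(n_components=0.90, svd_solver="full", random_state=SEED)
X_pca = pca.fit_transform(X_std)

# Scan K for the GMM and keep the K with the lowest BIC.
bic = {}
for k in range(2, 7):
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          n_init=3, random_state=SEED)
    bic[k] = gmm.fit(X_pca).bic(X_pca)
best_k = min(bic, key=bic.get)

# Refit at the selected K and report the silhouette score as a cross-check.
final = GaussianMixture(n_components=best_k, covariance_type="full",
                        n_init=3, random_state=SEED)
labels = final.fit_predict(X_pca)
sil = silhouette_score(X_pca, labels)

print(f"max |corr| = {max_abs_corr:.2f}, PCA dims = {X_pca.shape[1]}")
print(f"best K by BIC = {best_k}, silhouette = {sil:.3f}")
```

BIC penalizes the extra parameters that each added component brings, so it tends to be a stricter guide than silhouette alone when choosing K for a GMM.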
acs-city-clustering/
├── data/
│ ├── raw/ # ACS downloads (not committed — run fetch_acs.py first)
│ └── processed/ # cleaned, standardized feature matrix
├── src/
│ ├── data/
│ │ ├── fetch_acs.py # Census API pull + population filter
│ │ └── preprocess.py # cleaning, rate engineering, standardization, PCA
│ ├── models/
│ │ ├── kmeans.py
│ │ └── gmm.py
│ ├── evaluation/
│ │ ├── metrics.py # silhouette, BIC, elbow
│ │ └── stability.py # MOE bootstrap
│ └── utils/
│ └── plotting.py
├── results/
│ ├── figures/
│ └── tables/
├── reproduce_results.py # end-to-end entry point
├── requirements.txt
└── README.md
# 1. Install dependencies
pip install -r requirements.txt
# 2. Add your Census API key to .env (copy from .env.example)
# Get a free key at: https://api.census.gov/data/key_signup.html
cp .env.example .env
# then edit .env and set CENSUS_API_KEY=your_key_here
# 3. Fetch ACS data
python src/data/fetch_acs.py
# 4. Reproduce all results and figures
python reproduce_results.py
- Random seed: 42 (set globally in reproduce_results.py)
- Python version: 3.11+
- All figures/tables in the report are generated by reproduce_results.py
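For orientation, the fetch step (Census API pull plus population filter) might look something like the sketch below. It is not the contents of fetch_acs.py. The endpoint path, the function names build_query and filter_by_population, and the choice of B01003_001E (total population) with its MOE column B01003_001M are assumptions for illustration. The sample rows are invented and only mimic the API's list-of-lists JSON layout.

```python
import os
from urllib.parse import urlencode

# Assumed endpoint for the 2024 ACS 1-Year release; adjust if it differs.
BASE_URL = "https://api.census.gov/data/2024/acs/acs1"
MIN_POPULATION = 65_000


def build_query(api_key, variables=("NAME", "B01003_001E", "B01003_001M")):
    """Build a request URL for all census places in all states (hypothetical helper)."""
    params = {
        "get": ",".join(variables),
        "for": "place:*",
        "in": "state:*",
        "key": api_key,
    }
    # Keep ':', '*', and ',' unescaped so the URL stays readable.
    return f"{BASE_URL}?{urlencode(params, safe=':*,')}"


def filter_by_population(rows, min_population=MIN_POPULATION):
    """Keep records at or above the population threshold.

    `rows` is a list of lists whose first row is the header; values
    arrive as strings, so they are cast to int before comparing.
    """
    header, *records = rows
    pop_idx = header.index("B01003_001E")
    return [r for r in records
            if r[pop_idx] is not None and int(r[pop_idx]) >= min_population]


# Read the key from the environment, as set via .env in step 2.
api_key = os.environ.get("CENSUS_API_KEY", "your_key_here")
url = build_query(api_key)

# Invented rows for demonstration only (not real estimates).
sample = [
    ["NAME", "B01003_001E", "B01003_001M", "state", "place"],
    ["Example City A", "120000", "3500", "01", "00001"],
    ["Example Town B", "40000", "2100", "01", "00002"],
]
kept = filter_by_population(sample)
print(url)
print([r[0] for r in kept])
```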
| Metric | Purpose |
|---|---|
| Silhouette score | Cohesion vs. separation |
| BIC | GMM model selection over K |
| Elbow (WCSS) | K-means K selection |
| MOE-bootstrap stability rate | Assignment stability under survey noise |
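The MOE-bootstrap stability rate in the last row can be illustrated with a small sketch. It assumes each MOE is converted to a standard error by dividing by 1.645 (ACS margins of error are published at the 90% confidence level), that noise is drawn independently per feature from a normal distribution, and that cities are reassigned with a fixed assignment function. Here a nearest-centroid rule stands in for the fitted GMM's predict step; the function names and toy values are hypothetical.

```python
import numpy as np

Z90 = 1.645  # z-value for the 90% confidence level used by ACS MOEs


def moe_bootstrap_stability(estimates, moes, assign, n_resamples=1000, seed=42):
    """Fraction of resamples in which each city keeps its baseline cluster."""
    rng = np.random.default_rng(seed)
    se = moes / Z90                      # standard error implied by each MOE
    baseline = assign(estimates)         # cluster labels on the point estimates
    same = np.zeros(estimates.shape[0])
    for _ in range(n_resamples):
        # Perturb every feature of every city by its own survey noise.
        noisy = estimates + rng.normal(size=estimates.shape) * se
        same += assign(noisy) == baseline
    return same / n_resamples


# Toy setup: two fixed centroids stand in for a fitted clustering model.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])


def nearest_centroid(X):
    """Assign each row to the closest centroid (stand-in for GMM predict)."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


# Three made-up cities: two deep inside a cluster, one near the boundary.
estimates = np.array([[0.5, -0.2], [9.8, 10.1], [5.1, 5.0]])
moes = np.full_like(estimates, 1.645)    # implies SE = 1 for every feature

rates = moe_bootstrap_stability(estimates, moes, nearest_centroid)
print(rates)
```

Cities near a cluster boundary should show stability rates well below 1, which flags assignments that survey noise alone could flip.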