klilan/acs-city-clustering

ACS City Typology Clustering

EN.685.640 Mathematical Reasoning and Structure for Data Science

Final Project | K'lila Nooning | Spring 2026

Release: v1.0.0 — ACS 2024 1-Year Estimates, baseline city typology (K=8 GMM)

Problem Statement

Can U.S. cities be meaningfully grouped into governance-relevant typologies using ACS socioeconomic indicators, producing interpretable profiles for municipal decision-making?

Dataset

ACS 1-Year Estimates (2024) for all U.S. census places with population ≥ 65,000 — 546 cities across all 50 states, D.C., and Puerto Rico. 12 features across four policy domains: Housing, Household Finance, Economic Health, and Education. Each feature is accompanied by its Margin of Error (MOE) for stability analysis.
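ACS margins of error are published at the 90 percent confidence level, so an MOE can be turned into an approximate standard error by dividing by 1.645. The sketch below shows that conversion and one noisy draw of an estimate under a normal sampling-error assumption. The function names and the example values are illustrative and are not taken from this repository.

```python
import random

Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def moe_to_se(moe: float) -> float:
    """Convert a published ACS margin of error to an approximate standard error."""
    return abs(moe) / Z_90


def perturb(estimate: float, moe: float, rng: random.Random) -> float:
    """Draw one plausible value of an estimate, assuming normal sampling error."""
    return rng.gauss(estimate, moe_to_se(moe))


# Hypothetical feature value: an estimate of 58,000 with an MOE of +/- 3,290.
rng = random.Random(42)
se = moe_to_se(3290.0)  # roughly 2,000
draws = [perturb(58000.0, 3290.0, rng) for _ in range(5)]
```

Draws like these are the building block of an MOE bootstrap: each resample replaces every feature value with a perturbed one before the cluster assignment is recomputed.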

Approach (Track C: Clustering / Unsupervised Learning)

  1. Preprocessing — z-score standardize features, assess multicollinearity, apply PCA if warranted
  2. Clustering — K-means (baseline) and GMM via EM (primary)
  3. Model selection — silhouette score, BIC (GMM), elbow method
  4. Stability analysis — MOE-bootstrap (1,000 resamples per city)
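Steps 1 to 3 above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the code in `src/`: the 95% PCA variance threshold, the K range of 2 to 8, the full covariance type, and the `n_init` values are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

SEED = 42
rng = np.random.default_rng(SEED)

# Synthetic stand-in for the city-by-feature matrix (546 x 12); NOT real ACS data.
X_raw = rng.normal(size=(546, 12))
X_raw[:273] += 3.0  # shift half the rows to plant two loose groups

# 1. Preprocessing: z-score each feature, then inspect pairwise correlations.
X = StandardScaler().fit_transform(X_raw)
corr = np.corrcoef(X, rowvar=False)
max_abs_corr = np.abs(corr - np.eye(corr.shape[0])).max()  # large => collinear

# PCA "if warranted": keep components explaining 95% of variance (assumed cutoff).
X_pca = PCA(n_components=0.95, random_state=SEED).fit_transform(X)

# 2-3. Fit both models over a range of K and record the selection criteria.
results = {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=SEED).fit(X)
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          n_init=3, random_state=SEED).fit(X)
    results[k] = {
        "wcss": km.inertia_,                            # input to the elbow plot
        "silhouette": silhouette_score(X, km.labels_),  # in [-1, 1], higher is better
        "bic": gmm.bic(X),                              # lower is better
    }

best_k = min(results, key=lambda k: results[k]["bic"])
```

Choosing K by the lowest BIC is one common convention; in practice the silhouette and elbow curves are read alongside it rather than replaced by it.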

Repository Structure

acs-city-clustering/
├── data/
│   ├── raw/          # ACS downloads (not committed — run fetch_acs.py first)
│   └── processed/    # cleaned, standardized feature matrix
├── src/
│   ├── data/
│   │   ├── fetch_acs.py      # Census API pull + population filter
│   │   └── preprocess.py     # cleaning, rate engineering, standardization, PCA
│   ├── models/
│   │   ├── kmeans.py
│   │   └── gmm.py
│   ├── evaluation/
│   │   ├── metrics.py        # silhouette, BIC, elbow
│   │   └── stability.py      # MOE bootstrap
│   └── utils/
│       └── plotting.py
├── results/
│   ├── figures/
│   └── tables/
├── reproduce_results.py      # end-to-end entry point
├── requirements.txt
└── README.md
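As a rough picture of what a "Census API pull + population filter" step might involve, the sketch below builds a request URL for all census places and filters a response by total population. It makes no network call. The dataset path, the use of `B01003_001E` (total population) as the filter variable, and the helper names are assumptions for illustration, and the sample rows are made up. The actual logic lives in `src/data/fetch_acs.py`.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data/2024/acs/acs1"  # assumed dataset path
MIN_POPULATION = 65_000


def build_query(variables, api_key=None):
    """Build a request URL covering every census place in every state."""
    params = {"get": ",".join(["NAME", *variables]),
              "for": "place:*", "in": "state:*"}
    if api_key:
        params["key"] = api_key
    return f"{BASE_URL}?{urlencode(params, safe=':*,')}"


def filter_by_population(rows, pop_var="B01003_001E"):
    """Keep places at or above the population threshold.

    `rows` mirrors the API's JSON payload: a header row, then rows of strings.
    """
    header, *data = rows
    idx = header.index(pop_var)
    return [dict(zip(header, r)) for r in data
            if r[idx] is not None and int(r[idx]) >= MIN_POPULATION]


url = build_query(["B01003_001E"], os.environ.get("CENSUS_API_KEY"))

# Made-up payload in the API's list-of-lists shape (not real data).
sample = [
    ["NAME", "B01003_001E", "state", "place"],
    ["Large Town (hypothetical)", "120000", "01", "00001"],
    ["Small Town (hypothetical)", "12000", "01", "00002"],
]
kept = filter_by_population(sample)
```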

Quickstart

# 1. Install dependencies
pip install -r requirements.txt

# 2. Add your Census API key to .env (copy from .env.example)
#    Get a free key at: https://api.census.gov/data/key_signup.html
cp .env.example .env
# then edit .env and set CENSUS_API_KEY=your_key_here

# 3. Fetch ACS data
python src/data/fetch_acs.py

# 4. Reproduce all results and figures
python reproduce_results.py

Reproducibility

  • Random seed: 42 (set globally in reproduce_results.py)
  • Python version: 3.11+
  • All figures/tables in the report are generated by reproduce_results.py
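One way a global seed can be set is sketched below. The helper name is hypothetical and `reproduce_results.py` may do this differently. Seeding NumPy's global state also covers scikit-learn estimators left at `random_state=None`, although passing `random_state` explicitly is the more robust habit.

```python
import random

import numpy as np

SEED = 42


def set_global_seed(seed: int = SEED) -> np.random.Generator:
    """Seed Python's and NumPy's global RNGs and return a dedicated Generator."""
    random.seed(seed)
    np.random.seed(seed)
    return np.random.default_rng(seed)


# Calling the helper twice reproduces the same stream of draws.
first = set_global_seed().normal(size=3)
second = set_global_seed().normal(size=3)
```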

Evaluation Metrics

| Metric | Purpose |
| --- | --- |
| Silhouette score | Cohesion vs. separation |
| BIC | GMM model selection over K |
| Elbow (WCSS) | K-means K selection |
| MOE-bootstrap stability rate | Assignment stability under survey noise |
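The MOE-bootstrap stability rate can be read as the share of noisy resamples in which a city keeps its baseline cluster. The NumPy sketch below assumes that definition, which may differ from the one in `src/evaluation/stability.py`. It uses synthetic values, treats sampling error as normal with standard error MOE / 1.645, and substitutes a nearest-centroid rule for the fitted model's assignment.

```python
import numpy as np

Z_90 = 1.645   # ACS MOEs correspond to a 90% confidence level
N_BOOT = 1000  # resamples per city
rng = np.random.default_rng(42)

# Synthetic stand-ins (NOT real ACS data): 6 cities x 2 features, with MOEs.
estimates = np.array([[0.0, 0.0], [0.2, -0.1], [0.9, 1.1],
                      [5.0, 5.0], [5.2, 4.9], [2.6, 2.4]])
moes = np.full_like(estimates, 0.5)
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # hypothetical cluster centers


def assign(X, centers):
    """Nearest-centroid labels; stands in for a fitted model's predict()."""
    dist = np.linalg.norm(X[..., None, :] - centers, axis=-1)
    return dist.argmin(axis=-1)


baseline = assign(estimates, centroids)

# Perturb every feature by its sampling error: (N_BOOT, n_cities, n_features).
se = moes / Z_90
draws = rng.normal(loc=estimates, scale=se, size=(N_BOOT, *estimates.shape))

# Stability rate per city: share of resamples that reproduce the baseline label.
stability = (assign(draws, centroids) == baseline).mean(axis=0)
```

Cities near a centroid keep their label in essentially every resample, while the last city, which sits about midway between the two centroids, flips in roughly half of them. With a GMM, the same loop would call the fitted model's `predict` on each perturbed and re-standardized matrix instead of `assign`.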
