ACS City Typology Clustering

EN.685.640 Mathematical Reasoning and Structure for Data Science, Final Project | K'lila Nooning | Spring 2026

Release: v1.0.0 — ACS 2024 1-Year Estimates, baseline city typology (K=8 GMM)

Problem Statement

Can U.S. cities be meaningfully grouped into governance-relevant typologies using ACS socioeconomic indicators, producing interpretable profiles for municipal decision-making?

Dataset

ACS 1-Year Estimates (2024) for all U.S. census places with population ≥ 65,000: 546 cities across all 50 states, D.C., and Puerto Rico. Twelve features span four policy domains (Housing, Household Finance, Economic Health, and Education), and each feature is paired with its published Margin of Error (MOE), which drives the stability analysis.
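Before the MOEs can feed a stability analysis they need to be expressed as standard errors. ACS margins of error are published at the 90% confidence level, so SE = MOE / 1.645. For rates built from a numerator that is a subset of its denominator, the Census Bureau's approximation for the MOE of a derived proportion applies. The sketch below is illustrative only: the function names are hypothetical and are not taken from preprocess.py.

```python
import numpy as np

# ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645


def moe_to_se(moe):
    """Convert ACS 90% margins of error to standard errors."""
    return np.asarray(moe, dtype=float) / Z_90


def proportion_moe(num, den, moe_num, moe_den):
    """Approximate MOE of p = num / den, where num is a subset of den.

    Census approximation: (1 / den) * sqrt(MOE_num**2 - p**2 * MOE_den**2).
    Where the term under the root is negative, fall back to the ratio
    formula, which uses a plus sign instead.
    """
    num, den = np.asarray(num, dtype=float), np.asarray(den, dtype=float)
    moe_num = np.asarray(moe_num, dtype=float)
    moe_den = np.asarray(moe_den, dtype=float)
    p = num / den
    proportion_term = moe_num**2 - p**2 * moe_den**2
    ratio_term = moe_num**2 + p**2 * moe_den**2
    radicand = np.where(proportion_term < 0, ratio_term, proportion_term)
    return np.sqrt(radicand) / den


# Hypothetical example: 30,000 of 100,000 households, MOEs of 1,500 and 2,000.
rate_moe = proportion_moe(30_000, 100_000, 1_500, 2_000)
rate_se = moe_to_se(rate_moe)
```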

Approach (Track C: Clustering / Unsupervised Learning)

Preprocessing — z-score standardize features, assess multicollinearity, apply PCA if warranted
Clustering — K-means (baseline) and GMM via EM (primary)
Model selection — silhouette score, BIC (GMM), elbow method
Stability analysis — MOE-bootstrap (1,000 resamples per city)
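The preprocessing and clustering steps above can be sketched with scikit-learn, assuming a NumPy feature matrix X of shape (cities, 12). The VIF cutoff of 10, the 90% explained-variance target, the full covariance type, and the helper names are assumptions standing in for the project's "apply PCA if warranted" rule. They are not values taken from src/.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def variance_inflation_factors(Z):
    """VIF for each column: the diagonal of the inverse correlation matrix."""
    R = np.corrcoef(Z, rowvar=False)
    return np.diag(np.linalg.inv(R))


def preprocess(X, vif_threshold=10.0, variance_to_keep=0.90, seed=42):
    """Z-score X; project onto principal components only if collinearity is high."""
    Z = StandardScaler().fit_transform(X)
    vifs = variance_inflation_factors(Z)
    if np.any(vifs > vif_threshold):
        # A float n_components keeps the fewest components reaching that share of variance.
        Z = PCA(n_components=variance_to_keep, random_state=seed).fit_transform(Z)
    return Z, vifs


def fit_models(Z, k=8, seed=42):
    """Fit the K-means baseline and the EM-fitted Gaussian mixture at the same K."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Z)
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          n_init=5, random_state=seed).fit(Z)
    return kmeans, gmm


# Synthetic stand-in for the 546 x 12 matrix, with one nearly duplicated column.
rng = np.random.default_rng(0)
X = rng.normal(size=(546, 12))
X[:, 11] = X[:, 10] + 0.01 * rng.normal(size=546)
Z, vifs = preprocess(X)
kmeans, gmm = fit_models(Z)
labels = gmm.predict(Z)
```

Reading VIFs off the inverse correlation matrix is equivalent to regressing each feature on the other eleven (VIF = 1 / (1 - R²)), without fitting twelve separate regressions.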

Repository Structure

acs-city-clustering/
├── data/
│   ├── raw/          # ACS downloads (not committed — run fetch_acs.py first)
│   └── processed/    # cleaned, standardized feature matrix
├── src/
│   ├── data/
│   │   ├── fetch_acs.py      # Census API pull + population filter
│   │   └── preprocess.py     # cleaning, rate engineering, standardization, PCA
│   ├── models/
│   │   ├── kmeans.py
│   │   └── gmm.py
│   ├── evaluation/
│   │   ├── metrics.py        # silhouette, BIC, elbow
│   │   └── stability.py      # MOE bootstrap
│   └── utils/
│       └── plotting.py
├── results/
│   ├── figures/
│   └── tables/
├── reproduce_results.py      # end-to-end entry point
├── requirements.txt
└── README.md

Quickstart

# 1. Install dependencies
pip install -r requirements.txt

# 2. Add your Census API key to .env (copy from .env.example)
#    Get a free key at: https://api.census.gov/data/key_signup.html
cp .env.example .env
# then edit .env and set CENSUS_API_KEY=your_key_here

# 3. Fetch ACS data
python src/data/fetch_acs.py

# 4. Reproduce all results and figures
python reproduce_results.py
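The sketch below shows one way step 3 might work: it queries the Census Data API for place-level ACS 1-Year estimates and keeps places with at least 65,000 residents. The endpoint pattern and the B01003 (total population) variables follow standard Census API conventions. The geography clause, the variable list, and the helper names are assumptions and may differ from fetch_acs.py. It also assumes CENSUS_API_KEY has already been loaded from .env into the environment.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.census.gov/data/2024/acs/acs1"
POP_EST = "B01003_001E"  # total population, estimate
POP_MOE = "B01003_001M"  # total population, margin of error


def build_url(variables, api_key):
    """Request every place within every state-level geography."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "place:*",
        "in": "state:*",
        "key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params, safe=',:*')}"


def rows_to_records(payload):
    """The API returns a JSON array of arrays whose first row is the header."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def filter_by_population(records, min_pop=65_000):
    """Keep places whose population estimate is present and at least min_pop."""
    return [r for r in records if r.get(POP_EST) and int(r[POP_EST]) >= min_pop]


def fetch_places(variables=(POP_EST, POP_MOE)):
    # Assumption: CENSUS_API_KEY is already exported (e.g. loaded from .env).
    url = build_url(variables, os.environ["CENSUS_API_KEY"])
    with urlopen(url) as response:
        payload = json.load(response)
    return filter_by_population(rows_to_records(payload))
```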

Reproducibility

Random seed: 42 (set globally in reproduce_results.py)
Python version: 3.11+
All figures/tables in the report are generated by reproduce_results.py
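Here is one way a global seed could be set, assuming the pipeline uses Python's random module, NumPy, and scikit-learn. set_global_seed is a hypothetical helper. Passing random_state=42 explicitly to each estimator is still the more robust choice, because scikit-learn falls back to NumPy's global RandomState only when random_state is None.

```python
import random

import numpy as np

SEED = 42


def set_global_seed(seed=SEED):
    """Seed Python's and NumPy's global generators; also return a fresh Generator."""
    random.seed(seed)
    np.random.seed(seed)  # scikit-learn draws from this global state when random_state=None
    return np.random.default_rng(seed)
```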

Evaluation Metrics

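Metric	Purpose
Silhouette score	Within-cluster cohesion vs. between-cluster separation (-1 to 1; higher is better)
BIC	GMM model selection over K (lower is better)
Elbow (WCSS)	K-means K selection from the within-cluster sum of squares curve
MOE-bootstrap stability rate	Assignment stability under survey noise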
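These metrics can be computed with a sweep over K, and the stability rate with an MOE bootstrap. In this sketch the stability rate is assumed to be the fraction of perturbed draws in which a city keeps its baseline cluster. Each draw adds Gaussian noise with standard deviation SE = MOE / 1.645 to the raw features, and a frozen, already-fitted scaler + GMM pipeline re-predicts the labels. Because the model is not refit, labels stay comparable across draws and label switching cannot occur. The Gaussian noise model, the K range, and all function names are assumptions, not the project's confirmed method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def safe_silhouette(Z, labels):
    # Silhouette is undefined when only one cluster is occupied.
    return silhouette_score(Z, labels) if len(np.unique(labels)) > 1 else np.nan


def sweep_k(Z, k_values=range(2, 13), seed=42):
    """Elbow (WCSS), silhouette, and BIC for each candidate K."""
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Z)
        gmm = GaussianMixture(n_components=k, random_state=seed).fit(Z)
        rows.append({
            "k": k,
            "wcss": km.inertia_,                  # look for the elbow
            "silhouette_kmeans": safe_silhouette(Z, km.labels_),
            "bic": gmm.bic(Z),                    # lower is better
            "silhouette_gmm": safe_silhouette(Z, gmm.predict(Z)),
        })
    return rows


def fit_frozen_gmm(X_raw, k, seed=42):
    """Scaler + GMM fitted once; bootstrap draws reuse it without refitting."""
    return make_pipeline(StandardScaler(),
                         GaussianMixture(n_components=k, random_state=seed)).fit(X_raw)


def moe_bootstrap_stability(X_raw, se, predict, n_boot=1000, seed=42):
    """Share of noisy draws in which each city keeps its baseline label."""
    rng = np.random.default_rng(seed)
    baseline = predict(X_raw)
    agree = np.zeros(len(X_raw))
    for _ in range(n_boot):
        # Assumption: independent Gaussian noise per feature, sd = SE = MOE / 1.645.
        X_draw = X_raw + rng.normal(size=X_raw.shape) * se
        agree += predict(X_draw) == baseline
    return agree / n_boot
```

A typical call would be model = fit_frozen_gmm(X_raw, k=8) followed by moe_bootstrap_stability(X_raw, moe / 1.645, model.predict), where X_raw and moe are same-shaped arrays of estimates and margins of error.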
