
NYC 311 Data Engineering and Analytics Pipeline

Overview

This project implements an end-to-end ELT pipeline for NYC 311 service request data. The pipeline ingests raw data from the NYC Open Data API, transforms and enriches it using dbt, and produces BI-ready aggregates for analysis of complaint trends, agency performance, and geospatial distribution.

Architecture & Workflow

Pipeline

NYC 311 API → Polars → PostgreSQL (raw schema) → dbt staging → dbt intermediate → dbt marts → BI layer

Warehouse Architecture

  • Raw Layer (raw) - stores raw API data without modifications
  • Staging Layer (staging) - cleans and normalizes columns, enforces types, derives basic fields
  • Intermediate Layer (intermediate) - enriches data with derived metrics and flags (e.g., resolution_hours, is_instant_close)
  • Marts Layer (marts) - star-schema fact and dimension tables, plus pre-aggregated tables for performance

Key Features

Ingestion Layer

  • Incremental Data Ingestion - batched ingestion, query decomposition, and metadata watermarking to maintain state between batches.
  • Reliable Data Loading - bulk inserts via temporary tables + COPY, with ON CONFLICT DO NOTHING to prevent duplicates.
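As a sketch of the watermark-based batching described above (the function and variable names are illustrative, not the project's actual code), query decomposition can split the interval since the last stored watermark into fixed-size windows, each of which becomes one API batch:

```python
from datetime import datetime, timedelta

def batch_windows(watermark: datetime, now: datetime, batch: timedelta):
    """Decompose the interval (watermark, now] into half-open
    [start, end) windows of at most `batch` length."""
    start = watermark
    while start < now:
        end = min(start + batch, now)
        yield start, end
        start = end

# Resume from the last stored watermark and fetch in 6-hour batches
windows = list(batch_windows(
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 2, 0, 0),
    timedelta(hours=6),
))
```

Each window maps to a `created_date` range filter on the API query; the watermark in the metadata table is advanced to the window's end only after the batch commits, so a failed run restarts from the last successful batch rather than from scratch.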

Transformation Layer

  • Data Quality Checks - automated dbt tests to ensure data quality, such as not_null, unique, and accepted_values.
  • Derived Metrics & Flags - includes resolution_hours, is_instant_close (auto-closure indicator), and other enriched fields.
  • Performance Optimization - pre-aggregated tables for geospatial heatmaps, performance by agency, and daily trends.
  • Schema Organization - medallion architecture; layered schemas (staging, intermediate, marts) preserve the raw data as ground truth and make lineage clear.
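The derived fields above can be sketched in plain Python (the actual logic lives in the dbt intermediate models; the one-minute auto-closure threshold and column names here are assumptions for illustration):

```python
from datetime import datetime
from typing import Optional

def enrich(created: datetime, closed: Optional[datetime],
           instant_threshold_hours: float = 1 / 60):
    """Derive resolution_hours and is_instant_close for one request,
    mirroring the intermediate-layer enrichment (threshold assumed)."""
    if closed is None:
        # Still open: no resolution time, cannot be an instant close
        return {"resolution_hours": None, "is_instant_close": False}
    hours = (closed - created).total_seconds() / 3600.0
    return {
        "resolution_hours": round(hours, 2),
        "is_instant_close": hours <= instant_threshold_hours,
    }

# A request closed 30 seconds after creation is flagged as auto-closed
row = enrich(datetime(2024, 3, 1, 9, 0, 0), datetime(2024, 3, 1, 9, 0, 30))
```

Flags like `is_instant_close` help analysts separate genuine resolutions from automated closures before aggregating agency performance.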

Portability

  • Dockerized Environment - dbt and PostgreSQL fully containerized for reproducibility and easy deployment.

Technologies

  • Python, Polars (ingestion and data manipulation)
  • PostgreSQL (data warehouse)
  • dbt (transformations, modeling, testing)
  • Docker (containerized environment)
  • NYC Open Data API (data source)

Next Steps

  • Connect a BI tool for dashboards (geo heatmaps, trends, agency/borough comparisons)
  • Integrate additional data sources (weather, other NYC APIs) to further enrich the data and expand analytics capability

Usage

  1. Configure environment variables in .env for the Postgres connection.
  2. Build and run the containers via Docker Compose:
docker compose up -d
  3. Run dbt transformations and tests:
docker compose run --rm dbt dbt build
docker compose run --rm dbt dbt test
  4. Use psql to explore and analyze the data:
docker exec -it nyc311_postgres psql -U $POSTGRES_USER -d $POSTGRES_DB

Summary

  • Ingestion and raw layer fully implemented and verified
  • Staging, intermediate, and marts models implemented with dbt tests passing
  • Incremental ingestion, metadata watermarking, and batch performance optimized
  • BI layer (dashboards + visualizations) still to be built, which will involve tweaking the mart models to fit analysis needs
