
Local dev DB with realistic sample data #143

Merged: bpstewar merged 2 commits into main from local-dev-db on Oct 31, 2025

@Gabe-Levin (Collaborator) commented Oct 24, 2025

What changed

  • New Dockerized PostgreSQL database for local development, populated with realistic Space2Stats (S2S) sample data for Kenya and Uganda.
  • Three steps to get the local database running:
    1. Create the db.env files in the dev-db directory
    2. Download the Parquet sample files from S3
    3. Run 'make up' in the terminal from the dev-db directory
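The first two steps can be checked before running 'make up'. The sketch below is a hypothetical helper, not part of this PR. It assumes db.env uses simple KEY=VALUE lines with the variables the official postgres image reads (POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB), and it takes the expected Parquet file names as an argument; space2stats_sample_cs.parquet is the one mounted in docker-compose.yaml, and any others are up to you.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Assumption: these are the official postgres image variables; the PR's
# actual db.env contents may use different or additional keys.
REQUIRED_ENV_KEYS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def preflight(dev_db_dir: Path, parquet_files: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means ready for `make up`."""
    problems: List[str] = []
    env_path = dev_db_dir / "db.env"
    if not env_path.is_file():
        problems.append(f"missing {env_path}")
    else:
        env = parse_env_file(env_path)
        for key in REQUIRED_ENV_KEYS:
            if not env.get(key):
                problems.append(f"{key} not set in {env_path}")
    # The compose file mounts the Parquet files relative to dev-db.
    for name in parquet_files:
        if not (dev_db_dir / name).is_file():
            problems.append(f"missing {dev_db_dir / name} (download it from S3)")
    return problems


if __name__ == "__main__":
    issues = preflight(Path("."), ["space2stats_sample_cs.parquet"])
    print("\n".join(issues) or "ready: run `make up`")
```

Run it from the dev-db directory; it only reports problems and never modifies files.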

@Gabe-Levin temporarily deployed to Space2Stats API Dev on October 24, 2025 08:31 with GitHub Actions (Inactive)
github-actions (Bot) commented Oct 24, 2025

PR Deployment Details:
🚀 PR deployed to https://aaf7unsja2.execute-api.us-east-1.amazonaws.com/

@Gabe-Levin changed the title from "WIP: local dev DB with realistic sample data" to "Local dev DB with realistic sample data" on Oct 30, 2025
@Gabe-Levin temporarily deployed to Space2Stats API Dev on October 30, 2025 11:25 with GitHub Actions (Inactive)
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR sets up a local PostgreSQL development database environment for Space2Stats. It provides Docker-based infrastructure with sample data loading capabilities, enabling developers to work with representative datasets locally.

  • Adds Docker-based PostgreSQL setup with PostGIS and H3 extensions
  • Implements Python seeding script to load sample parquet data into the database
  • Includes SQL initialization scripts for schema creation
  • Provides Makefile commands for database management and a Jupyter notebook for generating sample data

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • dev-db/seed_sample_data.py: Python script to load Parquet sample data into PostgreSQL with chunked insertion and conflict handling
  • dev-db/init-scripts/01-init.sql: SQL schema initialization for the space2stats and climate tables, with indexes
  • dev-db/docker-compose.yaml: Docker Compose configuration for PostgreSQL with pgAdmin and optimized settings
  • dev-db/create_sample_parquets.ipynb: Jupyter notebook for fetching and preparing sample datasets for Kenya and Uganda
  • dev-db/README.md: Documentation for setup, usage, and troubleshooting of the development database
  • dev-db/Makefile: Build automation for database lifecycle management (up, down, reset, seed, clean)
  • dev-db/Dockerfile: Custom PostgreSQL image with PostGIS and H3 extensions

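The seeding script is described as inserting data in chunks. A minimal sketch of the chunking half, assuming rows arrive as an iterable of tuples (for example, from a Parquet reader); the helper name iter_chunks and the batch size in the comment are illustrative and not taken from seed_sample_data.py.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(rows: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most chunk_size rows without loading everything."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        yield batch


# Hypothetical usage with a psycopg connection/cursor:
# for batch in iter_chunks(rows, 10_000):
#     cur.executemany(insert_query, batch)
#     conn.commit()  # one transaction per batch bounds memory and lock time
```

Committing per batch means a failure midway leaves earlier batches loaded; with conflict handling on the insert, rerunning the seed is then safe.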
Comments suppressed due to low confidence (1)

dev-db/README.md:62

  • Extra closing code fence that doesn't match an opening fence. This should be removed as the file already has proper closing fences for the code block.
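A stray fence like this can be caught by tracking open/close state line by line. A simplified sketch: it only recognizes backtick fences indented at most three spaces, and it ignores CommonMark details such as ~~~ fences and the rule that a closing fence must be at least as long as the opening one.

```python
from typing import Optional


def find_unmatched_fence(markdown: str) -> Optional[int]:
    """Return the 1-based line of a code fence that is never closed, or None."""
    open_line: Optional[int] = None
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if len(line) - len(stripped) <= 3 and stripped.startswith("```"):
            # Each fence line either opens a block or closes the current one.
            open_line = lineno if open_line is None else None
    return open_line
```

An extra closing fence after a balanced block is reported as a new, never-closed opening fence, which is how a renderer treats it too.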


Comment on lines +67 to +69
def build_insert_query(
    table_name: str, columns: Sequence[str], conflict_columns: Sequence[str]
) -> str:
Copilot AI Oct 30, 2025

The table_name and column names are directly interpolated into the SQL query string without sanitization, creating a potential SQL injection vulnerability. Consider using psycopg's SQL composition tools (psycopg.sql.SQL, psycopg.sql.Identifier) to safely construct queries with dynamic identifiers.

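The review recommends psycopg.sql.SQL and psycopg.sql.Identifier, which remain the better fix for the real script. To show what identifier quoting actually does, here is a dependency-free sketch: PostgreSQL quotes an identifier by wrapping it in double quotes and doubling any embedded double quote, so a hostile name cannot break out of the identifier. The ON CONFLICT ... DO NOTHING behavior is an assumption about the script's conflict handling, and the column names in the test are hypothetical.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote one PostgreSQL identifier (not schema-qualified: quote each part)."""
    return '"' + name.replace('"', '""') + '"'


def build_insert_query(
    table_name: str, columns: Sequence[str], conflict_columns: Sequence[str]
) -> str:
    cols = ", ".join(quote_ident(c) for c in columns)
    # Values stay as %s placeholders and are passed separately to the driver.
    placeholders = ", ".join(["%s"] * len(columns))
    # Assumed conflict handling: skip rows that collide on conflict_columns.
    target = (
        " (" + ", ".join(quote_ident(c) for c in conflict_columns) + ")"
        if conflict_columns
        else ""
    )
    return (
        f"INSERT INTO {quote_ident(table_name)} ({cols}) "
        f"VALUES ({placeholders}) ON CONFLICT{target} DO NOTHING"
    )
```

A name such as public.space2stats would be quoted as a single identifier, so schema and table must be quoted separately; psycopg's sql.Identifier accepts multiple strings for exactly that case.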
- ./.pgdata:/var/lib/postgresql/data
- ./init-scripts:/docker-entrypoint-initdb.d
- ./space2stats_sample_cs.parquet:/docker-entrypoint-initdb.d/data/space2stats_sample_cs.parquet

Copilot AI Oct 30, 2025

There is trailing whitespace on this line that should be removed for cleaner code.

Suggested change

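Trailing whitespace is easy to find and fix mechanically; in practice an editor setting or a pre-commit hook usually handles it. A small sketch, independent of any particular linter, that strips trailing spaces and tabs and reports which lines changed:

```python
from typing import List, Tuple


def strip_trailing_whitespace(text: str) -> Tuple[str, List[int]]:
    """Return (cleaned_text, 1-based numbers of lines that had trailing whitespace)."""
    cleaned: List[str] = []
    flagged: List[int] = []
    # split("\n") keeps the final newline structure intact when re-joined.
    for lineno, line in enumerate(text.split("\n"), start=1):
        stripped = line.rstrip(" \t")
        if stripped != line:
            flagged.append(lineno)
        cleaned.append(stripped)
    return "\n".join(cleaned), flagged
```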
Comment on lines +110 to +112



Copilot AI Oct 30, 2025

Multiple consecutive blank lines at the end of the file should be reduced to a single blank line for consistency.

Suggested change

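Trailing blank lines can be collapsed the same way. This sketch reads "a single blank line" as the usual convention of ending the file with exactly one newline character; that interpretation is an assumption.

```python
def normalize_final_newline(text: str) -> str:
    """Drop whitespace-only lines at the end so the file ends with one newline."""
    lines = text.split("\n")
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""
```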
@bpstewar merged commit fdd174a into main on Oct 31, 2025 (12 checks passed)
@bpstewar had a problem deploying to Space2Stats API Dev on October 31, 2025 12:23 with GitHub Actions (Failure)
@bpstewar temporarily deployed to Space2Stats API Dev on October 31, 2025 12:27 with GitHub Actions (Inactive)
@Gabe-Levin deleted the local-dev-db branch on November 21, 2025 09:29

Labels: None yet
Projects: None yet
3 participants