Local dev DB with realistic sample data #143
Pull Request Overview
This PR sets up a local PostgreSQL development database environment for Space2Stats. It provides Docker-based infrastructure with sample data loading capabilities, enabling developers to work with representative datasets locally.
- Adds Docker-based PostgreSQL setup with PostGIS and H3 extensions
- Implements Python seeding script to load sample parquet data into the database
- Includes SQL initialization scripts for schema creation
- Provides Makefile commands for database management and a Jupyter notebook for generating sample data
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| dev-db/seed_sample_data.py | Python script to load parquet sample data into PostgreSQL with chunked insertion and conflict handling |
| dev-db/init-scripts/01-init.sql | SQL schema initialization for space2stats and climate tables with indexes |
| dev-db/docker-compose.yaml | Docker Compose configuration for PostgreSQL with pgAdmin and optimized settings |
| dev-db/create_sample_parquets.ipynb | Jupyter notebook for fetching and preparing sample datasets from Kenya and Uganda |
| dev-db/README.md | Documentation for setup, usage, and troubleshooting of the development database |
| dev-db/Makefile | Build automation for database lifecycle management (up, down, reset, seed, clean) |
| dev-db/Dockerfile | Custom PostgreSQL image with PostGIS and H3 extensions |
Comments suppressed due to low confidence (1)
dev-db/README.md:62
- Extra closing code fence that doesn't match an opening fence. This should be removed as the file already has proper closing fences for the code block.
def build_insert_query(
    table_name: str, columns: Sequence[str], conflict_columns: Sequence[str]
) -> str:
The table_name and column names are directly interpolated into the SQL query string without sanitization, creating a potential SQL injection vulnerability. Consider using psycopg's SQL composition tools (psycopg.sql.SQL, psycopg.sql.Identifier) to safely construct queries with dynamic identifiers.
- ./.pgdata:/var/lib/postgresql/data
- ./init-scripts:/docker-entrypoint-initdb.d
- ./space2stats_sample_cs.parquet:/docker-entrypoint-initdb.d/data/space2stats_sample_cs.parquet
There is trailing whitespace on this line that should be removed for cleaner code.
Multiple consecutive blank lines at the end of the file should be reduced to a single blank line for consistency.
What changed