Overhaul gaiaCore to Postgres-Centric#23

Merged
jshoughtaling merged 20 commits into OHDSI:main from TuftsCTSI:main
Apr 17, 2026
@jshoughtaling
Collaborator

@tibbben & @rtmill - this Pull Request does the following:

  • Removes the existing R-based codebase
  • Implements an end-to-end workflow for:
    • fetching GIS data source metadata in JSON-LD format
    • extracting metadata into data source and variable source tables
    • fetching and loading the underlying source data
    • loading OMOP-based LOCATION and LOCATION-HISTORY data
    • joining the associated data source(s) with the OMOP data to produce an external exposure table
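The metadata extraction step above could look roughly like the sketch below. It assumes the JSON-LD documents follow the schema.org `Dataset` vocabulary (`name`, `creator`, `provider`, `keywords`, `variableMeasured`); the output keys such as `variable_name` are hypothetical and are not taken from the actual gaiaCore schema.

```python
import json
from typing import Any, Dict, List, Tuple


def _as_list(value: Any) -> List[Any]:
    """Normalise a JSON-LD value that may be absent, a scalar, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _name_of(node: Any) -> Any:
    """Return the `name` of a nested node, or the value itself if it is a scalar."""
    if isinstance(node, dict):
        return node.get("name")
    return node


def extract_metadata(doc: Dict[str, Any]) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Split one JSON-LD Dataset into a data source row and variable source rows.

    The key names in the returned dicts are illustrative assumptions.
    """
    data_source = {
        "dataset_name": doc.get("name"),
        "creator": _name_of(doc.get("creator")),
        "provider": _name_of(doc.get("provider")),
        "keywords": _as_list(doc.get("keywords")),
    }
    variables = [
        {"variable_name": _name_of(v)} for v in _as_list(doc.get("variableMeasured"))
    ]
    return data_source, variables


if __name__ == "__main__":
    sample = json.loads(
        '{"@type": "Dataset", "name": "My Dataset", '
        '"variableMeasured": [{"@type": "PropertyValue", "name": "pm25"}]}'
    )
    print(extract_metadata(sample))
```

In the PR itself this logic lives in SQL functions inside the database; the Python version is only meant to make the shape of the transformation easier to see.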

Key next steps:

  • Test!
  • Write a set of connector tools to interface with gaiaCore via the PostgREST API in various languages, including:
    • Python
    • R
    • ...

@TheCedarPrince
Collaborator

Just some notes/observations from here!

First, I tried your branch to get things running, but initially got these errors:

thecedarprince@thecedarledge:~/FOSS/gaiaCore$ docker compose up -d gaiacore-db gaiacore-api 
[+] Running 3/3
 ! gaiacore-db Warning      pull access denied for gaiacore, repository does not exist or may require 'docker login': denied: req...                      0.5s 
 ✔ gaiacore-api Pulled                                                                                                                                    1.5s 
   ✔ 15e17bebd108 Pull complete                                                                                                                           0.7s 
[+] Building 0.3s (15/15) FINISHED                                                                                                                             
 => [internal] load local bake definitions                                                                                                                0.0s
 => => reading from stdin 538B                                                                                                                            0.0s
 => [internal] load build definition from Dockerfile                                                                                                      0.0s
 => => transferring dockerfile: 1.65kB                                                                                                                    0.0s
 => [internal] load metadata for docker.io/postgis/postgis:16-3.4-alpine                                                                                  0.2s
 => [internal] load .dockerignore                                                                                                                         0.0s
 => => transferring context: 2B                                                                                                                           0.0s
 => [1/8] FROM docker.io/postgis/postgis:16-3.4-alpine@sha256:681931a625df344215e9b8998bf34daf146b6a395ceacee4439eb9c85869239f                            0.0s
 => [internal] load build context                                                                                                                         0.0s
 => => transferring context: 935B                                                                                                                         0.0s
Untagged: gaiacore:latest
Deleted: sha256:c29b47868ab697a31fbdefce7318c6a52b3fa60e49c1f8d5b61df8177fd1afbd
Untagged: postgrest/postgrest:latest
Untagged: postgrest/postgrest@sha256:0a46780309a604cdc8b56c776c6e5e15788ce58174d709e40459ab5a2d44d228
Deleted: sha256:5c5eb0045fa5c2f83e977647f4aaf6222f75b8538642ed6ec682f56721b1708d
Deleted: sha256:96b98f958ab9446abbd3e836b148d6815dfbd2669ce741781a875881c143c31f
thecedarprince@thecedarledge:~/FOSS/gaiaCore$ git ^C
thecedarprince@thecedarledge:~/FOSS/gaiaCore$ docker compose up -d gaiacore-db gaiacore-api 
[+] Running 3/3
 ! gaiacore-db Warning      pull access denied for gaiacore, repository does not exist or may require 'docker login': denied: re
grest-v12.0.2-linux-stat  0.0s
 => CACHED [6/8] COPY sql/*.sql /docker-entrypoint-initdb.d/                                                                    
                          0.0s
 => CACHED [7/8] COPY postgrest.conf /etc/postgrest/postgrest.conf                                                              
                          0.0s
 => CACHED [8/8] COPY scripts/init_gaiacore.sh /docker-entrypoint-initdb.d/99_init_gaiacore.sh                                  
                          0.0s
 => exporting to image                                                                                                          
 => => exporting layers                                                                                                                                   0.0s
 => => writing image sha256:c29b47868ab697a31fbdefce7318c6a52b3fa60e49c1f8d5b61df8177fd1afbd                                                              0.0s
 => => naming to docker.io/library/gaiacore:latest                                                                                                        0.0s
 => resolving provenance for metadata file                                                                                                                0.0s
[+] Running 2/3
 ✔ gaiacore:latest               Built                                                                                                                    0.0s 
 ⠸ Container gaiacore-postgres   Starting                                                                                                                 0.3s 
 ✔ Container gaiacore-postgrest  Created                                                                                                                  0.0s 
Error response from daemon: failed to set up container networking: driver failed programming external connectivity on endpoint gaiacore-postgres (43eaa6953e6ff
9e543aabab8dc7d4ae900116c0036292bd110e5e39aa15a5425): failed to bind host port for 0.0.0.0:5432:172.X.X.X:5432/tcp: address already in use

I found you can fix this by stopping any Postgres instance running locally on your machine outside of Docker, since it already holds host port 5432.
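For anyone hitting the same "address already in use" error, a quick way to confirm the conflict before starting the stack is to probe the port. This is a minimal sketch using only the Python standard library; the helper name `port_in_use` is made up for illustration.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. a listener exists.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(5432):
        print("Port 5432 is taken: stop the local Postgres or remap the container port.")
    else:
        print("Port 5432 is free.")
```

If the port is taken and the local Postgres has to keep running, publishing the container on a different host port in the compose file is the other option.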

Second, quick_ingest_datasource is not doing anything on my computer at the moment. It just hangs indefinitely...

Third, I cannot get the Julia client to work. Whether I run the example or the pipeline, I get errors in the try-catch block.

@TheCedarPrince
Collaborator

Hey @jshoughtaling , some more tinkering and results from experimentation on my side:

  1. I tried my best to load a JSON-LD file into gaiaCore, but I realized I do not know exactly how to bring datasets into gaiaCore other than through the demo data directory provided within the Dockerfile. I poked around outside the repo, found gaiaCatalog, and attempted to load some of the JSON-LD files I saw there. Since Docker could not see the external directory on my localhost, I couldn't load the files. I stopped there, as that didn't seem like the right thing to do here.
  2. Some other notes on other files/documentation:
| File | Line Number | Comment |
| --- | --- | --- |
| README.md | 67 | Annotation: This should be docker compose, not docker-compose |
| README.md | 99 | Annotation: Where do I download this from? Is the URL wrong? |
| README.md | 121 | Annotation: Am I also supposed to download gaiaCatalog too to have this all working? |
| connectors/PIPELINE_GUIDE.md | 18 | Annotation: This should be docker compose, not docker-compose |

It seems like there is still a docker version mismatch with docker compose vs. docker-compose -- not sure how to ameliorate that.
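One way scripts can cope with both spellings is to detect which one is installed and fall back accordingly. The sketch below is an assumption about how a helper script might do this, not something in the PR; `docker compose version` is the Compose v2 plugin's version subcommand, and the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def has_compose_plugin() -> bool:
    """True if `docker compose version` exits successfully (Compose v2 plugin)."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "compose", "version"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def pick_compose_command(plugin_available: bool, legacy_available: bool) -> List[str]:
    """Prefer the v2 plugin, fall back to the legacy standalone binary."""
    if plugin_available:
        return ["docker", "compose"]
    if legacy_available:
        return ["docker-compose"]
    raise RuntimeError("Neither `docker compose` nor `docker-compose` was found")


if __name__ == "__main__":
    legacy = shutil.which("docker-compose") is not None
    try:
        print(" ".join(pick_compose_command(has_compose_plugin(), legacy)))
    except RuntimeError as exc:
        print(exc)
```

The documentation can then standardise on `docker compose` while scripts stay usable on machines that only have the older binary.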

  3. I am still unable to get the Julia connector working from the demo in connectors. Not sure what I am doing wrong there...

@jshoughtaling
Collaborator Author

Hey @TheCedarPrince,

First, thank you so much for the time you've spent going through and testing this out. I will try and push some updates as soon as I can to resolve the issues you're having.

Meanwhile, I've had a few nice conversations with @tibbben about where certain components of this structure should live, and it could be that I partition this pull request out into multiple pieces that affect gaiaDB and gaiaCore. We will discuss at the meeting tomorrow in detail.

Thanks again!

@tibbben
Collaborator

tibbben commented Nov 7, 2025

@jshoughtaling wow, great work! I owe you a review but will not have time to do anything soon (like this week), but may have a chance to look with a little more care sometime next week and then meet the week after (week of the 17th). If that sounds ok, I will dig in next week.

I am optimistic that we can bring together our approaches, modify the architectural approach a little to give things good homes, and then move forward ... I look forward to it.

@jshoughtaling
Collaborator Author

jshoughtaling commented Nov 14, 2025

Thanks @TheCedarPrince and @tibbben for the feedback!
I've split this PR into its component parts. Here's a summary of what I moved:

What Was Migrated

To gaiaDB Repository

SQL Files (sql/ directory)

All SQL initialization and function files have been migrated to gaiaDB/sql/:

  • 01_init_schema.sql → Integrated into gaiaDB/init.sql
  • 02_jsonld_ingestion_functions.sql → gaiaDB/sql/02_jsonld_ingestion_functions.sql
  • 03_location_ingestion_functions.sql → gaiaDB/sql/03_location_ingestion_functions.sql
  • 04_spatial_join_functions.sql → gaiaDB/sql/04_spatial_join_functions.sql
  • 05_data_source_retrieval_functions.sql → gaiaDB/sql/05_data_source_retrieval_functions.sql

Enhanced init.sql

The gaiaDB/init.sql has been completely rewritten to:

  • Include the enhanced gaiaCore backbone schema (UUID-based, JSON-LD support)
  • Maintain the original vocabulary schema
  • Automatically load all function files
  • Create the working schema for OMOP tables

Enhanced Dockerfile

The gaiaDB/Dockerfile now includes:

  • plsh (PostgreSQL shell language)
  • GDAL tools (gdal, gdal-tools, gdal-driver-pg)
  • Build dependencies for extensions
  • Automatic copying of SQL function files

To gaiaDocker Repository

PostgREST Service

Added a new gaia-postgrest service to docker-compose.yaml:

  • Provides RESTful API access to the gaiaDB database
  • Exposes backbone and working schemas
  • Runs on port 3000 (configurable)
  • Includes health checks and service dependencies

Environment Configuration

Updated .env file:

  • Added GAIA_POSTGREST_API_PORT=3000

Database Health Check

Enhanced gaia-db service with health check to ensure proper startup ordering

What Remained in gaiaCore

Connector Examples (connectors/ directory)

The language-specific connector implementations remain in gaiaCore:

  • connectors/python/ - Python connector and examples
  • connectors/r/ - R connector and examples
  • connectors/julia/ - Julia connector and examples
  • connectors/java/ - Java connector and examples
  • connectors/bash/ - Bash connector and examples

Rationale: These are demonstration/reference implementations showing how to interact with the PostgREST API. They are not core infrastructure and belong with the gaiaCore repository as examples.

Documentation

  • README.md - gaiaCore overview and workflow documentation
  • POSTGREST_API_GUIDE.md - API usage guide
  • connectors/README.md - Connector overview
  • connectors/PIPELINE_GUIDE.md - Pipeline workflow guide

Test Data

  • test/omop/ - Sample OMOP data
  • test/data_source/ - Sample data sources for testing

Updated Init Script

  • scripts/init_gaiacore.sh - now loads data from the client-side filesystem via \copy (client-side COPY) instead of relying on server-side file paths
    • Supports environment variables for database connection (DB_HOST, DB_PORT, etc.)
    • Loads JSON-LD metadata by reading file content and passing to database
    • Can be run from connector containers or local machine
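A connector that wants to honor the same environment variables could resolve them as sketched below. Only `DB_HOST` and `DB_PORT` are named in this PR; `DB_NAME`, `DB_USER`, and every default value here are assumptions for illustration.

```python
import os
from typing import Dict, Mapping, Union


def connection_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, int]]:
    """Resolve database connection settings from environment variables.

    DB_NAME, DB_USER, and all defaults are hypothetical, not taken from the script.
    """
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env.get("DB_NAME", "gaiacore"),
        "user": env.get("DB_USER", "postgres"),
    }


def to_dsn(settings: Mapping[str, Union[str, int]]) -> str:
    """Render settings as a libpq keyword/value connection string."""
    return " ".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(to_dsn(connection_settings()))
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.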

Key Schema Changes in gaiaDB

Breaking Changes

  1. data_source.data_source_uuid: Changed from int4 to UUID
  2. data_source table: Added JSON-LD metadata fields (creator, provider, keywords, etc.)
  3. variable_source table: Enhanced with additional metadata fields
  4. New working schema: Added for OMOP location and exposure tables
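Because `data_source_uuid` is now a UUID rather than an integer, client code that still passes integer IDs will break. A connector could validate identifiers early, as in this sketch (the helper name is hypothetical):

```python
import uuid


def parse_data_source_uuid(value: str) -> uuid.UUID:
    """Validate a data_source_uuid and fail with a clear message for old int IDs."""
    try:
        return uuid.UUID(value)
    except (ValueError, AttributeError, TypeError) as exc:
        raise ValueError(
            f"expected a UUID for data_source_uuid, got {value!r}"
        ) from exc


if __name__ == "__main__":
    print(parse_data_source_uuid(str(uuid.uuid4())))
```

Failing on the client side gives a more readable error than a type-mismatch response from the database or API.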

How to Use the New Code

1. Build and Run gaiaDB Standalone

cd gaiaDB
docker build -t gaia-db .
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=secret gaia-db

2. Run Full GAIA Stack with gaiaDocker

cd gaiaDocker
docker compose --profile gaia up -d

This will start:

  • gaia-db (PostgreSQL with PostGIS and gaiaCore functions)
  • gaia-postgrest (RESTful API)
  • gaia-catalog, gaia-solr, and utility services

3. Access PostgREST API

# View API documentation
curl http://localhost:3000/

# Query data sources
curl http://localhost:3000/data_source

# Call a function
curl -X POST http://localhost:3000/rpc/quick_ingest_datasource \
  -H "Content-Type: application/json" \
  -d '{"dataset_name": "My Dataset"}'
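The same RPC call can be made from Python with the standard library alone. This is a minimal sketch, not the connector shipped in `connectors/python`; the function names are hypothetical, and the explicit timeout is there so a call that never returns raises an error instead of blocking forever.

```python
import json
import urllib.request
from typing import Any, Dict


def build_rpc_request(base_url: str, function: str, params: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for a PostgREST /rpc/<function> endpoint."""
    url = f"{base_url.rstrip('/')}/rpc/{function}"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def call_rpc(base_url: str, function: str, params: Dict[str, Any], timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response (None for an empty body)."""
    request = build_rpc_request(base_url, function, params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
    return json.loads(raw.decode("utf-8")) if raw else None


if __name__ == "__main__":
    req = build_rpc_request(
        "http://localhost:3000", "quick_ingest_datasource", {"dataset_name": "My Dataset"}
    )
    print(req.get_method(), req.full_url)
```

Calling `call_rpc("http://localhost:3000", "quick_ingest_datasource", {"dataset_name": "My Dataset"})` would mirror the curl example, assuming the stack is running on port 3000.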

4. Load Test Data (Client-Side)

# From gaiaCore repository root
cd gaiaCore
./scripts/init_gaiacore.sh

# Or with custom database connection
DB_HOST=localhost DB_PORT=5432 ./scripts/init_gaiacore.sh

5. Use Connector Examples

cd gaiaCore/connectors/python
python pipeline_example.py http://localhost:3000

Documentation Updates Needed

gaiaDB Repository

  • Update main README.md to reflect new functionality
  • Add function reference documentation
  • Create migration guide for existing users
  • Add example Jupyter notebooks

gaiaDocker Repository

  • Update main README.md to include PostgREST service
  • Add PostgREST usage examples
  • Document API security setup
  • Add troubleshooting guide

gaiaCore Repository

  • Update docker-compose.yml with architecture notes
  • Update init_gaiacore.sh for client-side loading
  • Add scripts/README.md documentation
  • Update main README.md to reference gaiaDB and gaiaDocker
  • Clarify that gaiaCore focuses on connectors and examples
  • Add links to migrated code locations

@jshoughtaling jshoughtaling self-assigned this Nov 14, 2025
@jshoughtaling jshoughtaling merged commit 1581372 into OHDSI:main Apr 17, 2026