
Add Ingestion Module for GCS/S3 Resources to Snowflake #54

@cyclux

Description

Load the jaffle-shop Parquet files from cloud storage (GCS or S3) into Snowflake.

API:

from data import load_from_gcs, load_from_s3

# session: an open Snowflake session
results = load_from_gcs(session, schema_name="RAW")
results = load_from_s3(session, bucket="s3://bucket/path/", schema_name="RAW")

Files:

  • data/ingestion.py
  • data/sql/ingestion/*.sql
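The issue does not say how `data/ingestion.py` reads the files in `data/sql/ingestion/`. A minimal sketch, assuming each template is read as plain text and looked up by its file name without the `.sql` suffix; `load_sql_templates` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def load_sql_templates(directory: Path) -> Dict[str, str]:
    """Read every *.sql file in `directory`, keyed by file name without .sql.

    Hypothetical helper: the issue lists data/sql/ingestion/*.sql but does
    not say how data/ingestion.py reads them.
    """
    templates: Dict[str, str] = {}
    # sorted() gives a stable order regardless of filesystem listing order.
    for path in sorted(directory.glob("*.sql")):
        templates[path.stem] = path.read_text(encoding="utf-8")
    return templates
```

Inside the module, the directory could be resolved relative to the source file, e.g. `Path(__file__).parent / "sql" / "ingestion"`, so the loader does not depend on the working directory.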

Behavior:

  1. GCS: Download → internal stage → COPY INTO
  2. S3: External stage → COPY INTO
  3. Table schema inferred from the Parquet files (INFER_SCHEMA)
  4. Idempotent (safe to re-run)
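The behavior above can be sketched as two statement sequences, one per source. This is a sketch under stated assumptions, not the module's implementation: the function names, the `PARQUET_FORMAT`, `GCS_DOWNLOADS` and `S3_PUBLIC` object names, one table per Parquet file, the GCS download having already happened, and a publicly readable S3 bucket are all hypothetical. Idempotency is assumed to come from `IF NOT EXISTS` DDL plus Snowflake's COPY load metadata, which skips files already loaded into the table unless `FORCE = TRUE`.

```python
from pathlib import PurePosixPath
from typing import List

FILE_FORMAT = "PARQUET_FORMAT"  # hypothetical object name


def _setup(schema: str) -> List[str]:
    """Schema and Parquet file format, both created only if missing."""
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        f"USE SCHEMA {schema}",  # later unqualified names resolve here
        f"CREATE FILE FORMAT IF NOT EXISTS {FILE_FORMAT} TYPE = PARQUET",
    ]


def _create_and_copy(table: str, location: str) -> List[str]:
    """Behaviors 3 and 4: infer the table from Parquet, then load it."""
    return [
        # INFER_SCHEMA derives columns from the staged Parquet file(s);
        # IF NOT EXISTS leaves an existing table alone on re-runs.
        f"CREATE TABLE IF NOT EXISTS {table} USING TEMPLATE ("
        "SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) FROM TABLE(INFER_SCHEMA("
        f"LOCATION => '{location}', FILE_FORMAT => '{FILE_FORMAT}')))",
        # COPY INTO skips files recorded in the table's load metadata,
        # so re-running does not duplicate rows (unless FORCE = TRUE).
        f"COPY INTO {table} FROM {location} "
        f"FILE_FORMAT = (FORMAT_NAME = '{FILE_FORMAT}') "
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE",
    ]


def gcs_statements(schema: str, table: str, local_file: str,
                   stage: str = "GCS_DOWNLOADS") -> List[str]:
    """Behavior 1: file already downloaded from GCS -> internal stage -> COPY INTO."""
    location = f"@{stage}/{PurePosixPath(local_file).name}"
    return _setup(schema) + [
        f"CREATE STAGE IF NOT EXISTS {stage} "
        f"FILE_FORMAT = (FORMAT_NAME = '{FILE_FORMAT}')",
        # AUTO_COMPRESS = FALSE uploads the Parquet file without gzipping it.
        f"PUT file://{local_file} @{stage} AUTO_COMPRESS = FALSE",
    ] + _create_and_copy(f"{schema}.{table}", location)


def s3_statements(schema: str, table: str, bucket: str, file_name: str,
                  stage: str = "S3_PUBLIC") -> List[str]:
    """Behavior 2: external stage over the bucket -> COPY INTO (no download)."""
    return _setup(schema) + [
        # No credentials: assumes a publicly readable bucket.
        f"CREATE STAGE IF NOT EXISTS {stage} URL = '{bucket}' "
        f"FILE_FORMAT = (FORMAT_NAME = '{FILE_FORMAT}')",
    ] + _create_and_copy(f"{schema}.{table}", f"@{stage}/{file_name}")
```

With snowflake-connector-python, each returned statement could be run through a cursor's `execute` method; the connector supports `PUT` from the client side, which the GCS path needs and the S3 path does not.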

SQL Templates:

| File | Purpose |
|---|---|
| create_parquet_file_format.sql | Parquet file format definition |
| create_internal_stage.sql | Internal stage for GCS downloads |
| create_stage_s3_public.sql | External S3 stage |
| create_table_from_parquet.sql | Table creation with INFER_SCHEMA |
| copy_into_table.sql | COPY INTO with MATCH_BY_COLUMN_NAME |
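The issue does not show the templates' placeholder syntax. A sketch, assuming `$name`-style placeholders filled with Python's `string.Template`; the `render` helper, the identifier policy, and the template text below are hypothetical. Because the templates splice object names directly into SQL text, the sketch checks each value against a conservative unquoted-identifier pattern before substituting it.

```python
import re
from string import Template

# Conservative pattern for unquoted identifiers, optionally schema-qualified
# (e.g. RAW.CUSTOMERS). Hypothetical policy, not taken from the issue.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*)?")

# Hypothetical contents of copy_into_table.sql with $-placeholders.
COPY_INTO_TABLE = (
    "COPY INTO $table FROM @$stage\n"
    "FILE_FORMAT = (FORMAT_NAME = '$file_format')\n"
    "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
)


def render(template: str, **identifiers: str) -> str:
    """Fill $-placeholders, rejecting values that are not plain identifiers."""
    for key, value in identifiers.items():
        # fullmatch (not match with ^...$) so a trailing newline is rejected too.
        if not _IDENT.fullmatch(value):
            raise ValueError(f"{key}={value!r} is not a plain identifier")
    # substitute() raises KeyError if the template needs a value not passed in.
    return Template(template).substitute(identifiers)
```

Using `substitute` rather than `safe_substitute` makes a missing parameter fail loudly instead of sending a literal `$table` to Snowflake.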

Metadata

Assignees

Labels

No labels

Type

No fields configured for Task.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
