Priority: P1

The `batch-processing.env` recipe starts Spark, Iceberg, Trino, and Jupyter, but includes no actual Spark job.

What to do

- Create `stacks/processing/spark/jobs/etl_to_iceberg.py` — a PySpark job that reads from PostgreSQL and writes Iceberg tables
- Configure Spark to use the Iceberg REST catalog
- Create a Jupyter notebook, `04-spark-iceberg.ipynb`, demonstrating the workflow
Acceptance criteria

- `spark-submit` runs the ETL job without errors
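An invocation for the acceptance check might look like the following; the master URL and package versions are assumptions and must match the Spark/Scala versions the stack ships.

```shell
spark-submit \
  --master spark://spark-master:7077 \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2,org.postgresql:postgresql:42.7.3 \
  stacks/processing/spark/jobs/etl_to_iceberg.py
```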