Summary
Document the Prefect → Databricks SQL Warehouse integration and close the one real infra gap: Prefect's default container does not bundle prefect-databricks, so the integration is not plug-and-play the way Kestra's is.
Parallel to #478 (the same integration for Kestra). The Databricks-side story is identical (host, warehouse HTTP path, PAT from the existing nexus secret scope). The Nexus-side story differs in two points: how the Databricks collection is installed and how credentials are stored.
Differences from the Kestra issue
| Dimension | Kestra (#478) | Prefect (this issue) |
|---|---|---|
| Plugin install | Bundled in kestra/kestra:latest | Not bundled; prefect-databricks must be installed separately |
| Flow definition | YAML, declarative | Python, imperative |
| Secret management | Flow env vars (.env via deploy.sh) | Prefect Blocks (persisted in Prefect DB) |
| Retry/cold-start | Flow-level YAML | Task decorator @task(retries=..., retry_delay_seconds=...) |
What's missing
1. prefect-databricks in the Prefect worker environment
stacks/prefect/docker-compose.yml runs the official prefecthq/prefect:* image, which ships only Prefect core. The Databricks collection needs to land there for any Databricks flow to work.
Three realistic options:
- (a) Custom Dockerfile: FROM prefecthq/prefect:3-latest-python3.12 with RUN pip install prefect-databricks, the same pattern used for Soda Core today. Recommended: clean, versioned, and reviewable in the repo. The image is published under IMAGE_PREFECT or a new IMAGE_PREFECT_WORKER tag.
- (b) Runtime pip install in a worker-pool init hook: flexible, but slower on cold starts and brittle across restarts.
- (c) User installs it in their own deployment's work pool: works, but defeats the "batteries-included" Nexus-Stack promise for classroom use.
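Whichever option is chosen, a quick smoke test is to ask the worker's own interpreter whether it can see the collection. A minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the distribution is published on PyPI as prefect-databricks with the top-level module prefect_databricks:

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_worker_env() -> dict:
    """Report what the current interpreter can see of Prefect and prefect-databricks."""
    return {
        "prefect": installed_version("prefect"),
        "prefect-databricks": installed_version("prefect-databricks"),
        # find_spec locates the top-level module without importing it
        "prefect_databricks importable": importlib.util.find_spec("prefect_databricks") is not None,
    }
```

Run it with the Python inside the worker container (for example through docker compose exec against whatever the worker service is called in stacks/prefect/docker-compose.yml). With option (a) in place, both versions should be non-empty and the importable flag True.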
2. A DatabricksCredentials Block auto-provisioned
Prefect stores credentials in its internal "Blocks" registry, not as env vars. Ideally, scripts/deploy.sh calls Prefect's API after spin-up to seed a DatabricksCredentials block with host + PAT from the same KV/Infisical source that Kestra already uses. Students then call DatabricksCredentials.load("nexus-databricks") in their flows, with no token handling in flow code.
This step is gated on the Databricks Integrations config being saved; if it is not, skip it silently like other optional bootstrap steps.
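A minimal sketch of that bootstrap step, written as a Python helper that scripts/deploy.sh could invoke once Prefect is up. The environment variable names DATABRICKS_HOST / DATABRICKS_TOKEN, the helper names, and the assumption that databricks_instance wants a bare hostname are all hypothetical; DatabricksCredentials (fields databricks_instance and token) and Block.save(name, overwrite=True) are the prefect-databricks and Prefect pieces it relies on. The Prefect import is deferred so the skip-when-unconfigured path needs no extra packages.

```python
import os
from typing import Mapping, Optional

BLOCK_NAME = "nexus-databricks"  # the name the tutorial flow loads


def databricks_block_fields(env: Mapping[str, str]) -> Optional[dict]:
    """Map deploy-time settings to DatabricksCredentials fields.

    DATABRICKS_HOST / DATABRICKS_TOKEN are hypothetical names for the host and PAT
    that deploy.sh reads from the KV/Infisical source. Returns None when either is
    missing, i.e. when Databricks Integrations has not been configured.
    """
    host = env.get("DATABRICKS_HOST", "").strip()
    token = env.get("DATABRICKS_TOKEN", "").strip()
    if not host or not token:
        return None
    # Assumption: databricks_instance is the bare workspace hostname (no scheme, no trailing slash).
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    return {"databricks_instance": host.rstrip("/"), "token": token}


def seed_databricks_block(env: Mapping[str, str] = os.environ) -> bool:
    """Create or overwrite the block; return False without error when unconfigured."""
    fields = databricks_block_fields(env)
    if fields is None:
        return False  # optional bootstrap step: skip silently
    # Imported lazily so the gating logic runs even where prefect-databricks is absent.
    from prefect_databricks import DatabricksCredentials

    # save() talks to the server named by PREFECT_API_URL; overwrite=True keeps reruns idempotent.
    DatabricksCredentials(**fields).save(BLOCK_NAME, overwrite=True)
    return True
```

Because the block is overwritten on every deploy, rotating the PAT in the secret source and re-running deploy.sh would be enough to refresh what flows load.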
3. Tutorial under docs/tutorials/prefect/databricks-warehouse.md
Analogous to the Kestra one (see #478 for structure). Prefect-specific content:
```python
from prefect import flow, task
from prefect_databricks import DatabricksCredentials
from prefect_databricks.queries import DatabricksSqlQuery


@task(retries=5, retry_delay_seconds=30)  # retries absorb the Databricks Free Edition cold start
def query_warehouse(creds, warehouse_id):
    return DatabricksSqlQuery(
        databricks_credentials=creds,
        warehouse_id=warehouse_id,
        query="SELECT current_timestamp(), current_catalog()",
    )


@flow
def classroom_flow():
    # Credentials come from the auto-provisioned block; no token in flow code
    creds = DatabricksCredentials.load("nexus-databricks")
    result = query_warehouse(creds, warehouse_id="abc123def456")
    print(result)


if __name__ == "__main__":
    classroom_flow()
```
Databricks Free Edition notes are identical to the Kestra tutorial (cold-start, 25 GB quota, no Jobs-API deep work).
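The sample's @task(retries=5, retry_delay_seconds=30) bounds how long a flow keeps trying against a cold warehouse. A back-of-the-envelope helper (illustrative only, not part of Prefect) for a fixed delay: one initial run plus up to retries reruns, each preceded by the same wait, ignoring how long the attempts themselves take.

```python
def retry_budget(retries: int, retry_delay_seconds: float) -> tuple[int, float]:
    """Return (maximum attempts, total seconds spent waiting between attempts)."""
    if retries < 0 or retry_delay_seconds < 0:
        raise ValueError("retries and retry_delay_seconds must be non-negative")
    return retries + 1, retries * retry_delay_seconds


attempts, waiting = retry_budget(retries=5, retry_delay_seconds=30)
print(f"{attempts} attempts, {waiting} s of waiting")  # 6 attempts, 150 s of waiting
```

So the sample gives the warehouse roughly 150 s of waiting across six attempts; if a cold start can outlast that, raising retries or the delay is the knob to turn.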
4. Warehouse HTTP path handling
Prefect's DatabricksSqlQuery takes warehouse_id directly (not httpPath), and the collection constructs the HTTP path internally. This is easier than Kestra: warehouse_id is the last segment of the HTTP path, so the same HTTP_PATH secret landed by #478 is sufficient (extract the ID server-side or just store both).
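If only HTTP_PATH is stored, the ID can be derived wherever the block or flow parameters are prepared. A minimal sketch with a hypothetical helper name, assuming the /sql/1.0/warehouses/<warehouse-id> shape shown in a SQL warehouse's connection details; also accepting the older /sql/1.0/endpoints/<id> spelling is an assumption for legacy workspaces.

```python
def warehouse_id_from_http_path(http_path: str) -> str:
    """Extract the SQL warehouse ID, i.e. the last segment of its HTTP path."""
    # Drop empty segments so leading/trailing slashes do not matter
    segments = [segment for segment in http_path.strip().split("/") if segment]
    if len(segments) < 2 or segments[-2] not in ("warehouses", "endpoints"):
        raise ValueError(f"not a SQL warehouse HTTP path: {http_path!r}")
    return segments[-1]


print(warehouse_id_from_http_path("/sql/1.0/warehouses/abc123def456"))  # abc123def456
```

Failing loudly on an unexpected shape seems preferable to passing a wrong ID to the warehouse; storing both values, as suggested above, avoids parsing altogether.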
Relation to #478
Issues should be kept separate because:
- Different installation mechanics (bundled vs. pip-install).
- Different credential model (env vars vs. Block).
- Different tutorial target audience (YAML-first vs. Python-first learners).
But both deserve consistent wording on:
- The Databricks side (host, warehouse, PAT, where they come from).
- The Free Edition caveats (cold-start retry, quota, no Jobs-API deep-dive).
Once both land, the Kestra and Prefect tutorials should cross-link: "prefer Kestra for YAML-first declarative DE; prefer Prefect for Python-first imperative DE. Both point at the same warehouse with the same credentials."
Related
- prefect-databricks docs: https://prefecthq.github.io/prefect-databricks/
Acceptance criteria
- … prefect-databricks published under a new tag / env var.
- scripts/deploy.sh auto-provisions a DatabricksCredentials block when Databricks Integrations is configured.
- docs/tutorials/prefect/databricks-warehouse.md, linked from docs/tutorials/index.md and docs/tutorials/prefect/index.md (creating the Prefect category landing page if not yet present).
- … pip install step, no manual credential setup.
- … @task(retries=5, retry_delay_seconds=30) in the sample.