feat(stacks): Add Marquez (OpenLineage reference backend) for cross-pipeline lineage #601

@stefanko-ch

Summary

Add Marquez as a new stack — the open-source reference implementation backend for the OpenLineage standard. Marquez collects, stores, and visualizes metadata + data lineage events emitted by data-pipeline tools.

OpenLineage itself is a standard plus a suite of client libraries, not a deployable service. What goes into Nexus-Stack is Marquez, the reference OL backend (Java/Dropwizard API + React web UI, Postgres-backed).

Why it fits Nexus-Stack particularly well

Several stacks Nexus-Stack already ships speak OpenLineage natively, so adding Marquez immediately enables end-to-end cross-tool lineage without further per-tool integration work:

| Existing stack | OpenLineage integration |
|---|---|
| Spark | openlineage-spark listener: automatic table- and column-level lineage |
| dbt (in code-server's nexus-venv) | openlineage-dbt: emits when dbt is invoked through the dbt-ol wrapper (e.g. dbt-ol run) |
| Flink | openlineage-flink (table-level) |
| Trino | emerging support |
| Kestra | OpenLineage output task |
| Dagster | official dagster-openlineage integration |

Net effect: one new stack (three containers) gives us a unified lineage UI across most of the orchestration/processing stacks we already have.
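
For a smoke test that does not depend on any of the integrations above, an OpenLineage run event can also be sent by hand. The following is a minimal sketch using only the Python standard library. The base URL, the producer URI and the spec version in schemaURL are assumptions for illustration, not values taken from this repo. /api/v1/lineage is the Marquez endpoint that accepts OpenLineage events.

```python
import json
from datetime import datetime, timezone
from urllib import request

# Assumption: the API container is reachable under this name inside the stack network.
MARQUEZ_URL = "http://marquez:5000"


def build_run_event(event_type, namespace, job_name, run_id, inputs=(), outputs=()):
    """Build a minimal OpenLineage run event as a plain dict."""

    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},  # should be a UUID string
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        # Hypothetical producer URI; any URI identifying the emitter works here.
        "producer": "https://example.invalid/nexus-stack/smoke-test",
        # Assumed spec version; adjust to the version the deployed Marquez expects.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }


def post_event(event, base_url=MARQUEZ_URL):
    """POST one event to the Marquez lineage endpoint and return the HTTP status."""
    req = request.Request(
        f"{base_url}/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Sending a START event followed by a COMPLETE event with the same runId should be enough for the job and its datasets to show up in the web UI.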

Components

Multi-container stack, similar pattern to Gitea / Dify:

| Component | Image | Port | Purpose |
|---|---|---|---|
| marquez | marquezproject/marquez:0.50.0 (verify arch) | 5000 | API + lineage event sink |
| marquez-web | marquezproject/marquez-web:0.50.0 (verify arch) | 3000 | React-based UI |
| marquez-postgres | postgres:16-alpine | internal | Backend DB (per-stack, NOT shared) |

User: nexus-marquez (per CLAUDE.md naming convention), password generated by Terraform → Infisical.

Acceptance criteria

  • stacks/marquez/docker-compose.yml with all 3 containers + internal Postgres network
  • Multi-arch support verified for both Marquez images (docker manifest inspect). If either image turns out to be single-arch, swap it or write a custom Dockerfile
  • Image versions PINNED (Marquez is stateful — DB schema migrations on upgrade)
  • services.yaml entry: category data-quality (or a new lineage category), port 3000 (web UI), public: false
  • Terraform: random_password.marquez_db_password, output to Infisical
  • Auto-setup hook in services.py if any first-run admin is needed
  • docs/stacks/marquez.md reference doc with the "what does this give me" matrix above
  • README + docs/stacks/README.md + docs/stacks/overview.md updated (stack count, badge, version table)
  • Test plan: Spark stack → run any tutorial flow → Marquez UI shows the job + datasets
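
The last check in the list above could be automated instead of eyeballed in the UI. This is a rough sketch that lists the jobs Marquez knows for one namespace, together with their input and output datasets. The response shape (a top-level "jobs" array whose entries carry "name", "inputs" and "outputs") is an assumption about the Marquez REST API and should be confirmed against the deployed version. The namespace and base URL are whatever the Spark listener and the compose file end up using.

```python
import json
from urllib import parse, request


def jobs_url(base_url, namespace):
    """URL of the Marquez job listing for one namespace."""
    return f"{base_url}/api/v1/namespaces/{parse.quote(namespace, safe='')}/jobs"


def summarize_jobs(payload):
    """Map job name -> (sorted input dataset names, sorted output dataset names)."""
    summary = {}
    for job in payload.get("jobs", []):
        ins = sorted(d["name"] for d in job.get("inputs", []))
        outs = sorted(d["name"] for d in job.get("outputs", []))
        summary[job["name"]] = (ins, outs)
    return summary


def fetch_jobs(base_url, namespace):
    """Fetch and summarize the jobs of one namespace from a running Marquez."""
    with request.urlopen(jobs_url(base_url, namespace), timeout=10) as resp:
        return summarize_jobs(json.load(resp))
```

A test run would then assert that the tutorial job's name is a key in the result of fetch_jobs and that its dataset lists are non-empty.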

Open questions / risks

  • RAM footprint: the Java (Dropwizard) API has a baseline of ~500-700 MB. On the default cx43 server (16 GB RAM) that's fine, but on the smaller fallback variants (cpx32 / 8 GB) we're already running Kestra (1.8 GB), Infisical, the Grafana stack, and code-server. May need to flag Marquez as "enable selectively" rather than "always on".
  • Multi-arch: need to verify. Marquez Docker Hub page should confirm.
  • DB schema migrations: Marquez ships Flyway migrations. Stateful service — version bumps need test on a populated DB before merging.
  • Lineage event volume: Spark jobs emit per-task. For tutorial-scale stacks fine; need to confirm Postgres tuning for heavier classroom use.
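
The multi-arch question can be answered from the output of docker manifest inspect rather than from the Docker Hub page. The sketch below parses that JSON and reports which architectures are missing. The required set is an assumption and should match the server types Nexus-Stack actually targets.

```python
import json
import subprocess

# Assumption: the architectures the stack has to support.
REQUIRED = {"amd64", "arm64"}


def architectures(manifest):
    """Return the Linux architectures listed in a manifest list / OCI index.

    A single-arch image has no "manifests" array, so the result is empty.
    Attestation entries (os "unknown") are skipped by the os filter.
    """
    return {
        m["platform"]["architecture"]
        for m in manifest.get("manifests", [])
        if m.get("platform", {}).get("os") == "linux"
    }


def missing_architectures(image, required=REQUIRED):
    """Run `docker manifest inspect` and return the required archs not offered."""
    out = subprocess.run(
        ["docker", "manifest", "inspect", image],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return set(required) - architectures(json.loads(out))
```

Calling missing_architectures on each of the two Marquez image tags and failing the check when the result is non-empty would cover the corresponding acceptance criterion.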

Out of scope

  • Wiring up specific integrations (Spark listener, dbt config, etc.) — those go in follow-up PRs once Marquez is reachable. This issue is just the deployment.
  • Auth in front of Marquez — Cloudflare Access at the edge as with every other stack.

Labels: enhancement