From b35f4b63dde70d1074b62b77ae18fa801f3a895f Mon Sep 17 00:00:00 2001
From: Rob Newman
Date: Mon, 4 May 2026 12:23:37 -0700
Subject: [PATCH 01/24] feat: Add data-lineage overview

---
 platform-cloud/docs/data/data-lineage.md | 98 ++++++++++++++++++++++++
 1 file changed, 98 insertions(+)
 create mode 100644 platform-cloud/docs/data/data-lineage.md

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
new file mode 100644
index 000000000..2a9b3562b
--- /dev/null
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -0,0 +1,98 @@
+---
+title: "Data Lineage"
+description: "Using data lineage in Seqera Platform."
+date created: "2026-05-04"
+last updated: "2026-05-04"
+tags: [data lineage data-lineage provenance governance reproducibility lineage-id lid label]
+---
+
+:::info
+Data lineage in platform is currently in public preview.
+
+Data lineage requires Nextflow v25.04 or later with either `lineage.enabled = true` in your Nextflow pipeline configuration (for per pipeline configuration) or defined in the **Lineage** workspace setting. The feature is experimental and subject to change.
+
+Please consult this guide for the latest information on recommended configuration and limitations.
+:::
+
+Data lineage tracks the full provenance of every pipeline run — what executed, what data was consumed, and what outputs were produced (both task-level and workflow-level). This allows auditing of pipeline results, verification of reproducibility, and tracing file provenance.
+
+## Why data lineage matters
+
+Production pipelines generate results that teams need to trust, audit, and reproduce. Data lineage answers the question "how exactly was this result produced?" with a precise, immutable record.
+
+- **Reproducibility**: Every run, task, and output file receives a unique **Lineage ID (LID)** — a traversable URI pointing to a structured record of exactly what ran. You can verify that two runs produced identical results, or identify precisely where they diverged.
+- **Auditing and compliance**: For teams in regulated industries (pharma, clinical genomics, CROs), lineage provides the audit trail needed for regulatory compliance. Each record captures inputs, outputs, parameters, compute environment, and the user who launched the run.
+- **Debugging**: When a cached task unexpectedly re-executes or a pipeline produces an unexpected result, lineage lets you trace backward from any output to all contributing tasks and parameters. You can compare two task runs to isolate exactly what changed.
+- **Broader team access**: Previously, exploring Nextflow lineage required CLI access and comfort reading raw JSON. Lineage data is surfaced in both pipeline run details pages and Data Explorer, so users can inspect provenance directly.
+- **Cross-workflow discoverability**: [Workflow output labels][workflow-labels] make output files discoverable across runs. Rather than knowing which specific run produced a file, query lineage records by label to find all matching outputs workspace-wide.
+
+## How data lineage works
+
+Nextflow creates a structured JSON record for each entity in your pipeline when lineage is enabled:
+
+| Record type | Description |
+|---|---|
+| **WorkflowRun** | Full pipeline execution: repository, commit ID, parameters, compute environment, session ID, and Platform context (user, workspace, pipeline) |
+| **TaskRun** | Individual task execution: script, code checksum, inputs, outputs, container, and dependencies |
+| **FileOutput** | Output file: path, checksum, size, timestamp, and links back to the task and workflow that produced it |
+
+Each record gets a **Lineage ID (LID)** — a `lid://` URI that uniquely identifies the entity. LIDs are navigable: every LID and lineage label is a clickable link that queries all related entities across your organization.
+
+### Configure workspace settings
+
+Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Workspace Settings** and open the [**Lineage** settings][workspace-lineage] to set the storage bucket and path where the lineage data is stored and indexed. This applies to **all** pipeline runs in the workspace.
+
+:::danger
+Changing the lineage storage bucket path after runs have generated lineage data will result in historic data loss. The lineage index is tied to this location — changing it makes existing records inaccessible. If you need to move the lineage data storage location, first copy all existing lineage data to the new bucket and path (for example, `aws s3 cp --recursive s3://old-bucket/path s3://new-bucket/path`), then update the workspace setting.
+:::
+
+### Enable per pipeline lineage in Nextflow
+
+To test lineage within a single pipeline, add the following to your Nextflow configuration file before running your pipeline:
+
+```groovy
+lineage.enabled = true
+lineage.store.location = ''
+```
+
+Only runs executed with this setting generate lineage data. Runs without it display a note on the Run Info tab:
+
+> *Lineage tracking was not enabled for this run. Add `lineage.enabled = true` to your Nextflow config to capture lineage data.*
+
+## Data lineage displayed in Seqera Platform
+
+### Workflow run details
+
+When a run was executed with lineage enabled, the [run details page][run-details] displays lineage data across the following tabs:
+
+**Run Info** — shows the Lineage ID, lineage labels, and the full Platform context captured at execution time: user, workspace, compute environment, pipeline name, revision, and commit ID.
+
+**Tasks** — displays the Lineage ID and lineage labels for each `TaskRun` alongside existing task data, so you can trace any task back to its lineage record. All task file inputs and outputs, and upstream and downstream tasks linked by lineage records are displayed.
+
+**Inputs** — lists all input datasets and parameters with file paths, types, and Lineage IDs and lineage labels where available.
+
+**Outputs** — lists all `FileOutput` records linked to the workflow run: output name, file path, type, Lineage ID, and lineage labels. Files link directly to [Data Explorer][data-explorer].
+
+:::tip
+All LIDs and lineage labels are clickable links. Clicking any LID opens the organization-level lineage search pre-filled with that identifier.
+:::
+
+### Data Explorer
+
+Output objects generated by a lineage-enabled run display the **LID** and any **lineage labels** when you preview the object in Data Explorer. This lets you trace any file back to the pipeline run that produced it.
+
+## Lineage labels
+
+Assign lineage labels to output files using the `label` directive in your Nextflow process definitions. Labels appear in lineage records and are searchable across your workspace.
+
+Both Seqera Platform labels and Nextflow lineage labels propagate to lineage records. Seqera Platform excludes **resource labels** — they relate to underlying compute resources, not the data itself.
+
+:::info
+Nextflow lineage labels are **immutable** — they are set at execution time and cannot be changed. Seqera Platform labels are **mutable**. If you update Platform labels after a run completes, a mismatch between Platform run labels and lineage labels is possible. This is expected behavior.
+:::
+
+{/* links */}
+[workflow-labels]: https://docs.seqera.io/nextflow/workflow#labels
+[workspace-lineage]: ../orgs-and-teams/workspace-management#lineage
+[run-details]: ../monitoring/run-details
+[data-explorer]: data-explorer
\ No newline at end of file

From c4b4e6fd6f5ccea42630ccef56de4a7c1f4fd948 Mon Sep 17 00:00:00 2001
From: Rob Newman
Date: Mon, 4 May 2026 12:33:03 -0700
Subject: [PATCH 02/24] chore: Fix newline CI/CD error

---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index 2a9b3562b..e73998626 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -95,4 +95,4 @@ Nextflow lineage labels are **immutable** — they are set at execution time and
 [workflow-labels]: https://docs.seqera.io/nextflow/workflow#labels
 [workspace-lineage]: ../orgs-and-teams/workspace-management#lineage
 [run-details]: ../monitoring/run-details
-[data-explorer]: data-explorer
\ No newline at end of file
+[data-explorer]: data-explorer

From e5062b5a45a46ca984fddcf855fc9c23692221a9 Mon Sep 17 00:00:00 2001
From: Rob Newman
Date: Mon, 4 May 2026 13:47:05 -0700
Subject: [PATCH 03/24] chore: Fix lineage ID case

---
 platform-cloud/docs/data/data-lineage.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index e73998626..7e16b4086 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -20,7 +20,7 @@ Data lineage tracks the full provenance of every pipeline run — what executed,

 Production pipelines generate results that teams need to trust, audit, and reproduce. Data lineage answers the question "how exactly was this result produced?" with a precise, immutable record.

-- **Reproducibility**: Every run, task, and output file receives a unique **Lineage ID (LID)** — a traversable URI pointing to a structured record of exactly what ran. You can verify that two runs produced identical results, or identify precisely where they diverged.
+- **Reproducibility**: Every run, task, and output file receives a unique **lineage ID (LID)** — a traversable URI pointing to a structured record of exactly what ran. You can verify that two runs produced identical results, or identify precisely where they diverged.
 - **Auditing and compliance**: For teams in regulated industries (pharma, clinical genomics, CROs), lineage provides the audit trail needed for regulatory compliance. Each record captures inputs, outputs, parameters, compute environment, and the user who launched the run.
 - **Debugging**: When a cached task unexpectedly re-executes or a pipeline produces an unexpected result, lineage lets you trace backward from any output to all contributing tasks and parameters. You can compare two task runs to isolate exactly what changed.
 - **Broader team access**: Previously, exploring Nextflow lineage required CLI access and comfort reading raw JSON. Lineage data is surfaced in both pipeline run details pages and Data Explorer, so users can inspect provenance directly.
@@ -36,7 +36,7 @@ Nextflow creates a structured JSON record for each entity in your pipeline when
 | **TaskRun** | Individual task execution: script, code checksum, inputs, outputs, container, and dependencies |
 | **FileOutput** | Output file: path, checksum, size, timestamp, and links back to the task and workflow that produced it |

-Each record gets a **Lineage ID (LID)** — a `lid://` URI that uniquely identifies the entity. LIDs are navigable: every LID and lineage label is a clickable link that queries all related entities across your organization.
+Each record gets a **lineage ID (LID)** — a `lid://` URI that uniquely identifies the entity. LIDs are navigable: every LID and lineage label is a clickable link that queries all related entities across your organization.

 ### Configure workspace settings

@@ -65,13 +65,13 @@ Only runs executed with this setting generate lineage data. Runs without it disp

 When a run was executed with lineage enabled, the [run details page][run-details] displays lineage data across the following tabs:

-**Run Info** — shows the Lineage ID, lineage labels, and the full Platform context captured at execution time: user, workspace, compute environment, pipeline name, revision, and commit ID.
+**Run Info** — shows the lineage ID, lineage labels, and the full Platform context captured at execution time: user, workspace, compute environment, pipeline name, revision, and commit ID.

-**Tasks** — displays the Lineage ID and lineage labels for each `TaskRun` alongside existing task data, so you can trace any task back to its lineage record. All task file inputs and outputs, and upstream and downstream tasks linked by lineage records are displayed.
+**Tasks** — displays the lineage ID and lineage labels for each `TaskRun` alongside existing task data, so you can trace any task back to its lineage record. All task file inputs and outputs, and upstream and downstream tasks linked by lineage records are displayed.

-**Inputs** — lists all input datasets and parameters with file paths, types, and Lineage IDs and lineage labels where available.
+**Inputs** — lists all input datasets and parameters with file paths, types, and lineage IDs and lineage labels where available.

-**Outputs** — lists all `FileOutput` records linked to the workflow run: output name, file path, type, Lineage ID, and lineage labels. Files link directly to [Data Explorer][data-explorer].
+**Outputs** — lists all `FileOutput` records linked to the workflow run: output name, file path, type, lineage ID, and lineage labels. Files link directly to [Data Explorer][data-explorer].

 :::tip
 All LIDs and lineage labels are clickable links. Clicking any LID opens the organization-level lineage search pre-filled with that identifier.

From 1f857a8774aec553f3349a64ad102512c4ae4f7f Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:11:27 -0700
Subject: [PATCH 04/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index 7e16b4086..385680934 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -7,11 +7,9 @@ tags: [data lineage data-lineage provenance governance reproducibility lineage-i
 ---

 :::info
-Data lineage in platform is currently in public preview.
+Data lineage in Platform is in public preview. It requires Nextflow v25.04 or later. Enable it per-pipeline (`lineage.enabled = true` in your Nextflow config) or workspace-wide via the **Lineage** workspace setting.

-Data lineage requires Nextflow v25.04 or later with either `lineage.enabled = true` in your Nextflow pipeline configuration (for per pipeline configuration) or defined in the **Lineage** workspace setting. The feature is experimental and subject to change.
-
-Please consult this guide for the latest information on recommended configuration and limitations.
+The feature is experimental and subject to change. See this guide for the latest configuration recommendations and limitations.
 :::

 Data lineage tracks the full provenance of every pipeline run — what executed, what data was consumed, and what outputs were produced (both task-level and workflow-level). This allows auditing of pipeline results, verification of reproducibility, and tracing file provenance.

From 1e17cd50fce4a40cfa89e8ad740933e76b9fd246 Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:11:50 -0700
Subject: [PATCH 05/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index 385680934..c85365716 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -12,7 +12,7 @@ Data lineage in Platform is in public preview. It requires Nextflow v25.04 or la
 The feature is experimental and subject to change. See this guide for the latest configuration recommendations and limitations.
 :::

-Data lineage tracks the full provenance of every pipeline run — what executed, what data was consumed, and what outputs were produced (both task-level and workflow-level). This allows auditing of pipeline results, verification of reproducibility, and tracing file provenance.
+Data lineage tracks the full provenance of every pipeline run at both the task and workflow level, including what executed, what data it consumed, and what outputs it produced. Use it to audit results, verify reproducibility, and trace file provenance.

 ## Why data lineage matters

From d5b3dd8080f3b71b98bdc605b62fb0124c0cfb8f Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:12:04 -0700
Subject: [PATCH 06/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index c85365716..8106804e4 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -16,7 +16,7 @@ Data lineage tracks the full provenance of every pipeline run at both the task a

 ## Why data lineage matters

-Production pipelines generate results that teams need to trust, audit, and reproduce. Data lineage answers the question "how exactly was this result produced?" with a precise, immutable record.
+Production pipelines generate results that teams need to trust, audit, and reproduce. Data lineage provides a precise, immutable record of how each result was produced.

 - **Reproducibility**: Every run, task, and output file receives a unique **lineage ID (LID)** — a traversable URI pointing to a structured record of exactly what ran. You can verify that two runs produced identical results, or identify precisely where they diverged.
 - **Auditing and compliance**: For teams in regulated industries (pharma, clinical genomics, CROs), lineage provides the audit trail needed for regulatory compliance. Each record captures inputs, outputs, parameters, compute environment, and the user who launched the run.

From 2f0fe4d3c8e1d751381fcd621c50f5c90dab396c Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:13:37 -0700
Subject: [PATCH 07/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index 8106804e4..0e9cd799e 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -38,7 +38,7 @@ Each record gets a **lineage ID (LID)** — a `lid://` URI that uniquely identif

 ### Configure workspace settings

-Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Workspace Settings** and open the [**Lineage** settings][workspace-lineage] to set the storage bucket and path where the lineage data is stored and indexed. This applies to **all** pipeline runs in the workspace.
+Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Workspace Settings** and open the [**Lineage** settings][workspace-lineage] to set the storage bucket and path where lineage data is stored and indexed. This applies to all pipeline runs in the workspace.

 :::danger
 Changing the lineage storage bucket path after runs have generated lineage data will result in historic data loss. The lineage index is tied to this location — changing it makes existing records inaccessible. If you need to move the lineage data storage location, first copy all existing lineage data to the new bucket and path (for example, `aws s3 cp --recursive s3://old-bucket/path s3://new-bucket/path`), then update the workspace setting.

From 22be2caec8c2d23761221e68cf992701af3e5b36 Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:14:17 -0700
Subject: [PATCH 08/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index 0e9cd799e..fd9d1b551 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -41,7 +41,7 @@ Each record gets a **lineage ID (LID)** — a `lid://` URI that uniquely identif
 Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Workspace Settings** and open the [**Lineage** settings][workspace-lineage] to set the storage bucket and path where lineage data is stored and indexed. This applies to all pipeline runs in the workspace.

 :::danger
-Changing the lineage storage bucket path after runs have generated lineage data will result in historic data loss. The lineage index is tied to this location — changing it makes existing records inaccessible. If you need to move the lineage data storage location, first copy all existing lineage data to the new bucket and path (for example, `aws s3 cp --recursive s3://old-bucket/path s3://new-bucket/path`), then update the workspace setting.
+Changing the lineage storage bucket path after lineage data is generated results in historic data loss. The lineage index is tied to this location. Changing it makes existing records inaccessible. To move the storage location, first copy all existing lineage data to the new bucket and path (for example, `aws s3 cp --recursive s3://old-bucket/path s3://new-bucket/path`), then update the workspace setting.
 :::

 ### Enable per pipeline lineage in Nextflow

From aaeb9efc0484c7a9738ee086afab3a6c43f2ab07 Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:14:32 -0700
Subject: [PATCH 09/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index fd9d1b551..5ada0c935 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -53,7 +53,7 @@ lineage.enabled = true
 lineage.store.location = ''
 ```

-Only runs executed with this setting generate lineage data. Runs without it display a note on the Run Info tab:
+Only runs executed with this setting generate lineage data. Runs without it display a note on the **Run Info** tab:

 > *Lineage tracking was not enabled for this run. Add `lineage.enabled = true` to your Nextflow config to capture lineage data.*

From e66f2069cbe8999eb83b5858432a64731c18342b Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:14:48 -0700
Subject: [PATCH 10/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index 5ada0c935..bb158c418 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -46,7 +46,7 @@ Changing the lineage storage bucket path after lineage data is generated results

 ### Enable per pipeline lineage in Nextflow

-To test lineage within a single pipeline, add the following to your Nextflow configuration file before running your pipeline:
+To test lineage within a single pipeline, add the following to your Nextflow config file before running your pipeline:

 ```groovy
 lineage.enabled = true

From bde5a3804edc98a75ceac0dff7f0054ff0b659a2 Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:15:18 -0700
Subject: [PATCH 11/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index bb158c418..e7445ae00 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -72,7 +72,7 @@ When a run was executed with lineage enabled, the [run details page][run-details
 **Outputs** — lists all `FileOutput` records linked to the workflow run: output name, file path, type, lineage ID, and lineage labels. Files link directly to [Data Explorer][data-explorer].

 :::tip
-All LIDs and lineage labels are clickable links. Clicking any LID opens the organization-level lineage search pre-filled with that identifier.
+All LIDs and lineage labels are clickable links. Click any LID to open the organization-level lineage search pre-filled with that identifier.
 :::

 ### Data Explorer

From 009205792672c69d4fb9b5bc73275b60b206a434 Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:15:52 -0700
Subject: [PATCH 12/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index e7445ae00..f1c1095a8 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -77,7 +77,7 @@ All LIDs and lineage labels are clickable links. Click any LID to open the organ

 ### Data Explorer

-Output objects generated by a lineage-enabled run display the **LID** and any **lineage labels** when you preview the object in Data Explorer. This lets you trace any file back to the pipeline run that produced it.
+Output objects from a lineage-enabled run display their LID and any lineage labels when you preview the object in Data Explorer. You can trace any file back to the pipeline run that produced it.

 ## Lineage labels

From 2e7599b07f9d591a09584c3f8e8d943abe291d3a Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:16:23 -0700
Subject: [PATCH 13/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index f1c1095a8..8f20ce946 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -83,7 +83,7 @@ Output objects from a lineage-enabled run display their LID and any lineage labe

 Assign lineage labels to output files using the `label` directive in your Nextflow process definitions. Labels appear in lineage records and are searchable across your workspace.

-Both Seqera Platform labels and Nextflow lineage labels propagate to lineage records. Seqera Platform excludes **resource labels** — they relate to underlying compute resources, not the data itself.
+Both Seqera Platform labels and Nextflow lineage labels propagate to lineage records. Seqera Platform excludes resource labels as they relate to underlying compute resources, not the data itself.

 :::info
 Nextflow lineage labels are **immutable** — they are set at execution time and cannot be changed. Seqera Platform labels are **mutable**. If you update Platform labels after a run completes, a mismatch between Platform run labels and lineage labels is possible. This is expected behavior.

From 51c0afda22324abab2cc1c729eb0af9e483075d9 Mon Sep 17 00:00:00 2001
From: Rob Newman <61608+robnewman@users.noreply.github.com>
Date: Mon, 4 May 2026 15:16:56 -0700
Subject: [PATCH 14/24] Update platform-cloud/docs/data/data-lineage.md

Co-authored-by: Chris Hakkaart
Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
---
 platform-cloud/docs/data/data-lineage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index 8f20ce946..10bd61b8e 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -86,7 +86,7 @@ Assign lineage labels to output files using the `label` directive in your Nextfl
 Both Seqera Platform labels and Nextflow lineage labels propagate to lineage records. Seqera Platform excludes resource labels as they relate to underlying compute resources, not the data itself.

 :::info
-Nextflow lineage labels are **immutable** — they are set at execution time and cannot be changed. Seqera Platform labels are **mutable**. If you update Platform labels after a run completes, a mismatch between Platform run labels and lineage labels is possible. This is expected behavior.
+Nextflow lineage labels are immutable. They are set at execution time and cannot be changed. Seqera Platform labels are mutable. Updating Platform labels after a run completes can produce a mismatch between Platform run labels and lineage labels. This is expected behavior.
 :::

 {/* links */}

From fcf1d2d4c798caab68067c4f88666b48924a3385 Mon Sep 17 00:00:00 2001
From: Phil Ewels
Date: Tue, 5 May 2026 11:52:05 +0200
Subject: [PATCH 15/24] Add Data Lineage to the sidebar

---
 platform-cloud/cloud-sidebar.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/platform-cloud/cloud-sidebar.json b/platform-cloud/cloud-sidebar.json
index ebcb78c61..f607cd437 100644
--- a/platform-cloud/cloud-sidebar.json
+++ b/platform-cloud/cloud-sidebar.json
@@ -98,7 +98,8 @@
       "label": "Data",
       "items": [
         "data/data-explorer",
-        "data/datasets"
+        "data/datasets",
+        "data/data-lineage"
       ]
     },
     {

From 7ed97dcc6873becb6db014869c5ce199a03f415a Mon Sep 17 00:00:00 2001
From: Chris Hakkaart
Date: Tue, 5 May 2026 22:10:18 +1200
Subject: [PATCH 16/24] Apply suggestions from code review

Merge changes that had conflicts earlier.

Co-authored-by: Chris Hakkaart
Signed-off-by: Chris Hakkaart
---
 platform-cloud/docs/data/data-lineage.md | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md
index 10bd61b8e..a2599d0a0 100644
--- a/platform-cloud/docs/data/data-lineage.md
+++ b/platform-cloud/docs/data/data-lineage.md
@@ -18,11 +18,11 @@ Data lineage tracks the full provenance of every pipeline run at both the task a

 Production pipelines generate results that teams need to trust, audit, and reproduce. Data lineage provides a precise, immutable record of how each result was produced.

-- **Reproducibility**: Every run, task, and output file receives a unique **lineage ID (LID)** — a traversable URI pointing to a structured record of exactly what ran. You can verify that two runs produced identical results, or identify precisely where they diverged.
-- **Auditing and compliance**: For teams in regulated industries (pharma, clinical genomics, CROs), lineage provides the audit trail needed for regulatory compliance. Each record captures inputs, outputs, parameters, compute environment, and the user who launched the run.
-- **Debugging**: When a cached task unexpectedly re-executes or a pipeline produces an unexpected result, lineage lets you trace backward from any output to all contributing tasks and parameters. You can compare two task runs to isolate exactly what changed.
-- **Broader team access**: Previously, exploring Nextflow lineage required CLI access and comfort reading raw JSON. Lineage data is surfaced in both pipeline run details pages and Data Explorer, so users can inspect provenance directly.
-- **Cross-workflow discoverability**: [Workflow output labels][workflow-labels] make output files discoverable across runs. Rather than knowing which specific run produced a file, query lineage records by label to find all matching outputs workspace-wide.
+- **Reproducibility**: Every run, task, and output file receives a unique lineage ID (LID), a traversable URI that points to a structured record of what ran. Verify that two runs produced identical results, or identify where they diverged.
+- **Auditing and compliance**: For teams in regulated industries such as pharma, clinical genomics, and CROs, lineage provides the audit trail needed for regulatory compliance. Each record captures inputs, outputs, parameters, compute environment, and the user who launched the run.
+- **Debugging**: When a cached task unexpectedly re-executes, or a pipeline produces an unexpected result, lineage traces backward from any output to all contributing tasks and parameters. Compare two task runs to isolate what changed.
+- **Broader team access**: Exploring Nextflow lineage previously required CLI access and comfort reading raw JSON. Platform now surfaces lineage data in pipeline run detail pages and Data Explorer. Users can inspect provenance directly.
+- **Cross-workflow discoverability**: [Workflow output labels][workflow-labels] make output files discoverable across runs. Query lineage records by label to find all matching outputs workspace-wide, without knowing which specific run produced a file.

 ## How data lineage works

@@ -34,7 +34,7 @@ Nextflow creates a structured JSON record for each entity in your pipeline when
 | **TaskRun** | Individual task execution: script, code checksum, inputs, outputs, container, and dependencies |
 | **FileOutput** | Output file: path, checksum, size, timestamp, and links back to the task and workflow that produced it |

-Each record gets a **lineage ID (LID)** — a `lid://` URI that uniquely identifies the entity. LIDs are navigable: every LID and lineage label is a clickable link that queries all related entities across your organization.
+Each record gets a lineage ID (LID), a `lid://` URI that uniquely identifies the entity. Every LID and lineage label renders as a clickable link, letting you navigate to all related entities across your organization.

 ### Configure workspace settings

@@ -63,13 +63,10 @@ Only runs executed with this setting generate lineage data. Runs without it disp

 When a run was executed with lineage enabled, the [run details page][run-details] displays lineage data across the following tabs:

-**Run Info** — shows the lineage ID, lineage labels, and the full Platform context captured at execution time: user, workspace, compute environment, pipeline name, revision, and commit ID.
-
-**Tasks** — displays the lineage ID and lineage labels for each `TaskRun` alongside existing task data, so you can trace any task back to its lineage record. All task file inputs and outputs, and upstream and downstream tasks linked by lineage records are displayed.
-
-**Inputs** — lists all input datasets and parameters with file paths, types, and lineage IDs and lineage labels where available.
-
-**Outputs** — lists all `FileOutput` records linked to the workflow run: output name, file path, type, lineage ID, and lineage labels. Files link directly to [Data Explorer][data-explorer].
+- **Run Info**: Shows the lineage ID, lineage labels, and the full Platform context captured at execution time: user, workspace, compute environment, pipeline name, revision, and commit ID. +- **Tasks**: Displays the lineage ID and lineage labels for each `TaskRun` alongside existing task data, so you can trace any task back to its lineage record. All task file inputs and outputs, and upstream and downstream tasks linked by lineage records, are displayed. +- **Inputs**: Lists all input datasets and parameters with file paths, types, and lineage IDs and lineage labels where available. +- **Outputs**: Lists all `FileOutput` records linked to the workflow run: output name, file path, type, lineage ID, and lineage labels. Files link directly to [Data Explorer][data-explorer]. :::tip All LIDs and lineage labels are clickable links. Click any LID to open the organization-level lineage search pre-filled with that identifier. From 739f289cdcfd36ac3966733e359747f3bc5de7e3 Mon Sep 17 00:00:00 2001 From: Chris Hakkaart Date: Tue, 5 May 2026 22:11:33 +1200 Subject: [PATCH 17/24] Fix tag formatting in data-lineage.md Updated tags formatting for clarity and consistency. Signed-off-by: Chris Hakkaart --- platform-cloud/docs/data/data-lineage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md index a2599d0a0..ae4832275 100644 --- a/platform-cloud/docs/data/data-lineage.md +++ b/platform-cloud/docs/data/data-lineage.md @@ -3,7 +3,7 @@ title: "Data Lineage" description: "Using data lineage in Seqera Platform." 
date created: "2026-05-04" last updated: "2026-05-04" -tags: [data lineage data-lineage provenance governance reproducibility lineage-id lid label] +tags: [data lineage, provenance, governance, reproducibility, lineage id, lid, label] --- :::info From 10844ef3547fc7bd9e25bc7083a470b851da8df1 Mon Sep 17 00:00:00 2001 From: Chris Hakkaart Date: Tue, 5 May 2026 22:12:49 +1200 Subject: [PATCH 18/24] Update data lineage documentation with warnings Added warnings about the experimental nature of data lineage. Signed-off-by: Chris Hakkaart --- platform-cloud/docs/data/data-lineage.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md index ae4832275..021e94994 100644 --- a/platform-cloud/docs/data/data-lineage.md +++ b/platform-cloud/docs/data/data-lineage.md @@ -8,7 +8,9 @@ tags: [data lineage, provenance, governance, reproducibility, lineage id, lid, l :::info Data lineage in Platform is in public preview. It requires Nextflow v25.04 or later. Enable it per-pipeline (`lineage.enabled = true` in your Nextflow config) or workspace-wide via the **Lineage** workspace setting. +::: +:::warning The feature is experimental and subject to change. See this guide for the latest configuration recommendations and limitations. 
::: From ece8f91c80c201e560e4a009511dc1b201de0f48 Mon Sep 17 00:00:00 2001 From: Chris Hakkaart Date: Tue, 5 May 2026 22:15:20 +1200 Subject: [PATCH 19/24] Fix link formatting in data-lineage documentation Signed-off-by: Chris Hakkaart --- platform-cloud/docs/data/data-lineage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md index 021e94994..8ece3ebad 100644 --- a/platform-cloud/docs/data/data-lineage.md +++ b/platform-cloud/docs/data/data-lineage.md @@ -40,7 +40,7 @@ Each record gets a lineage ID (LID), a `lid://` URI that uniquely identifies the ### Configure workspace settings -Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Workspace Settings** and open the [**Lineage** settings][workspace-lineage] to set the storage bucket and path where lineage data is stored and indexed. This applies to all pipeline runs in the workspace. +Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Workspace Settings** and open the [**Lineage**][workspace-lineage] settings to set the storage bucket and path where lineage data is stored and indexed. This applies to all pipeline runs in the workspace. :::danger Changing the lineage storage bucket path after lineage data is generated results in historic data loss. The lineage index is tied to this location. Changing it makes existing records inaccessible. To move the storage location, first copy all existing lineage data to the new bucket and path (for example, `aws s3 cp --recursive s3://old-bucket/path s3://new-bucket/path`), then update the workspace setting. From cb5f557e6b8d85effff2bbd34f782b5195a355c3 Mon Sep 17 00:00:00 2001 From: Chris Hakkaart Date: Tue, 5 May 2026 22:26:31 +1200 Subject: [PATCH 20/24] Update data lineage documentation for clarity Clarified instructions for enabling data lineage in Nextflow. 
Signed-off-by: Chris Hakkaart --- platform-cloud/docs/data/data-lineage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md index 8ece3ebad..a98508d86 100644 --- a/platform-cloud/docs/data/data-lineage.md +++ b/platform-cloud/docs/data/data-lineage.md @@ -7,7 +7,7 @@ tags: [data lineage, provenance, governance, reproducibility, lineage id, lid, l --- :::info -Data lineage in Platform is in public preview. It requires Nextflow v25.04 or later. Enable it per-pipeline (`lineage.enabled = true` in your Nextflow config) or workspace-wide via the **Lineage** workspace setting. +Data lineage in Platform is in public preview. It requires Nextflow v25.04 or later. ::: :::warning From f32a6c5df3600295d8c3ee52a5487a4ad6294dac Mon Sep 17 00:00:00 2001 From: Justine Geffen Date: Tue, 5 May 2026 17:39:36 +0200 Subject: [PATCH 21/24] Apply suggestion from @justinegeffen Signed-off-by: Justine Geffen --- platform-cloud/docs/data/data-lineage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md index a98508d86..e9b63a6e2 100644 --- a/platform-cloud/docs/data/data-lineage.md +++ b/platform-cloud/docs/data/data-lineage.md @@ -16,7 +16,7 @@ The feature is experimental and subject to change. See this guide for the latest Data lineage tracks the full provenance of every pipeline run at both the task and workflow level, including what executed, what data it consumed, and what outputs it produced. Use it to audit results, verify reproducibility, and trace file provenance. -## Why data lineage matters +## Overview Production pipelines generate results that teams need to trust, audit, and reproduce. Data lineage provides a precise, immutable record of how each result was produced. 
From 0dc8853996d9689f481bd92f0f9f7c651e6e6759 Mon Sep 17 00:00:00 2001 From: Justine Geffen Date: Tue, 5 May 2026 17:44:31 +0200 Subject: [PATCH 22/24] Apply suggestion from @justinegeffen Signed-off-by: Justine Geffen --- platform-cloud/docs/data/data-lineage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md index e9b63a6e2..ffb71ee6d 100644 --- a/platform-cloud/docs/data/data-lineage.md +++ b/platform-cloud/docs/data/data-lineage.md @@ -40,7 +40,7 @@ Each record gets a lineage ID (LID), a `lid://` URI that uniquely identifies the ### Configure workspace settings -Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Workspace Settings** and open the [**Lineage**][workspace-lineage] settings to set the storage bucket and path where lineage data is stored and indexed. This applies to all pipeline runs in the workspace. +Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Settings > Workspace Settings**. Select **Lineage** to set the storage bucket and path where lineage data is stored and indexed. This applies to all pipeline runs in the workspace. See [Lineage][workspace-lineage] for more information about the settings. :::danger Changing the lineage storage bucket path after lineage data is generated results in historic data loss. The lineage index is tied to this location. Changing it makes existing records inaccessible. To move the storage location, first copy all existing lineage data to the new bucket and path (for example, `aws s3 cp --recursive s3://old-bucket/path s3://new-bucket/path`), then update the workspace setting. 
From 5b32908c63a3b035171a3fc3d4b27ea5628acd26 Mon Sep 17 00:00:00 2001 From: Justine Geffen Date: Tue, 5 May 2026 17:45:07 +0200 Subject: [PATCH 23/24] Apply suggestion from @justinegeffen Signed-off-by: Justine Geffen --- platform-cloud/docs/data/data-lineage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md index ffb71ee6d..63c25ecca 100644 --- a/platform-cloud/docs/data/data-lineage.md +++ b/platform-cloud/docs/data/data-lineage.md @@ -36,7 +36,7 @@ Nextflow creates a structured JSON record for each entity in your pipeline when | **TaskRun** | Individual task execution: script, code checksum, inputs, outputs, container, and dependencies | | **FileOutput** | Output file: path, checksum, size, timestamp, and links back to the task and workflow that produced it | -Each record gets a lineage ID (LID), a `lid://` URI that uniquely identifies the entity. Every LID and lineage label renders as a clickable link, letting you navigate to all related entities across your organization. +Each record gets a lineage ID (LID), a `lid://` URI that uniquely identifies the entity. Every LID and lineage label renders as a clickable link, and you can navigate to all related entities across your organization. 
### Configure workspace settings From 0fef4a1464586ac632ca1f2df63adb383fd5c829 Mon Sep 17 00:00:00 2001 From: Rob Newman Date: Tue, 5 May 2026 11:34:16 -0700 Subject: [PATCH 24/24] chore: Clean up language, add toggle, add required IAM permissions --- platform-cloud/docs/data/data-lineage.md | 97 +++++++++++++++++++++--- 1 file changed, 87 insertions(+), 10 deletions(-) diff --git a/platform-cloud/docs/data/data-lineage.md b/platform-cloud/docs/data/data-lineage.md index 63c25ecca..ef43cffbc 100644 --- a/platform-cloud/docs/data/data-lineage.md +++ b/platform-cloud/docs/data/data-lineage.md @@ -7,7 +7,7 @@ tags: [data lineage, provenance, governance, reproducibility, lineage id, lid, l --- :::info -Data lineage in Platform is in public preview. It requires Nextflow v25.04 or later. +Data lineage in Platform is in public preview. It requires Nextflow v25.04 or later, and AWS S3 object storage. ::: :::warning @@ -24,7 +24,7 @@ Production pipelines generate results that teams need to trust, audit, and repro - **Auditing and compliance**: For teams in regulated industries such as pharma, clinical genomics, and CROs, lineage provides the audit trail needed for regulatory compliance. Each record captures inputs, outputs, parameters, compute environment, and the user who launched the run. - **Debugging**: When a cached task unexpectedly re-executes, or a pipeline produces an unexpected result, lineage traces backward from any output to all contributing tasks and parameters. Compare two task runs to isolate what changed. - **Broader team access**: Exploring Nextflow lineage previously required CLI access and comfort reading raw JSON. Platform now surfaces lineage data in pipeline run detail pages and Data Explorer. Users can inspect provenance directly. -- **Cross-workflow discoverability**: [Workflow output labels][workflow-labels] make output files discoverable across runs. 
Query lineage records by label to find all matching outputs workspace-wide, without knowing which specific run produced a file. +- **Cross-workflow discoverability**: [Workflow output labels][workflow-labels] make output files discoverable across runs. Navigate lineage records by label to find all matching outputs workspace-wide, without knowing which specific run produced a file. ## How data lineage works @@ -38,28 +38,105 @@ Nextflow creates a structured JSON record for each entity in your pipeline when Each record gets a lineage ID (LID), a `lid://` URI that uniquely identifies the entity. Every LID and lineage label renders as a clickable link, and you can navigate to all related entities across your organization. -### Configure workspace settings +## Enable data lineage -Before collecting lineage data, configure the lineage storage location for your workspace. Go to **Settings > Workspace Settings**. Select **Lineage** to set the storage bucket and path where lineage data is stored and indexed. This applies to all pipeline runs in the workspace. See [Lineage][workspace-lineage] for more information about the settings. +To start collecting data lineage for all pipeline runs in your workspace, go to **Settings > Workspace Settings**. Select **Lineage** and define the credentials, region, and (optionally) storage bucket and path where lineage data is stored and indexed. Toggle **Enable lineage by default** on to collect data lineage for all pipeline runs in the workspace, or off to require per-pipeline launch configuration. + +:::tip +If the storage bucket field is empty, a default bucket is generated for storing lineage data. +::: + +Once set, all pipeline runs in the workspace generate data lineage. See [Lineage][workspace-lineage] for more information about the settings. :::danger -Changing the lineage storage bucket path after lineage data is generated results in historic data loss. The lineage index is tied to this location.
Changing it makes existing records inaccessible. To move the storage location, first copy all existing lineage data to the new bucket and path (for example, `aws s3 cp --recursive s3://old-bucket/path s3://new-bucket/path`), then update the workspace setting. +Changing the lineage storage bucket path after lineage data is generated will result in historic data loss. The lineage index is tied to the lineage storage bucket. Changing it makes existing records inaccessible. To move the storage location, first copy all existing lineage data to the new bucket and path (for example, `aws s3 cp --recursive s3://old-bucket/path s3://new-bucket/path`), then update the workspace setting. ::: -### Enable per pipeline lineage in Nextflow +When launching a pipeline in a lineage-enabled workspace, the **Enable lineage** toggle in the pipeline **Run setup** reflects the **Enable lineage by default** workspace setting. Turn the toggle off to _explicitly exclude_ data lineage creation for the pipeline run.
+ +### Additional IAM permissions required + +If using existing AWS Batch or AWS Cloud compute environments with custom IAM roles, the following service role policies are required: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "ListObjectsInBucket", + "Effect": "Allow", + "Action": [ + "s3:ListBucket" + ], + "Resource": "arn:aws:s3:::seqera-lineage-" + }, + { + "Sid": "AllObjectActions", + "Effect": "Allow", + "Action": "s3:*Object", + "Resource": "arn:aws:s3:::seqera-lineage-/*" + }, + { + "Sid": "AllowObjectTagging", + "Effect": "Allow", + "Action": [ + "s3:PutObjectTagging", + "s3:GetObjectTagging" + ], + "Resource": "arn:aws:s3:::seqera-lineage-/*" + } + ] +} +``` + +Platform integration credentials require the following additional permissions: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "sqs:CreateQueue", + "sqs:GetQueueAttributes", + "sqs:SetQueueAttributes", + "sqs:GetQueueUrl", + "sqs:ReceiveMessage", + "sqs:DeleteMessage" + ], + "Resource": "arn:aws:sqs:*:*:seqera-lineage-*" + }, + { + "Effect": "Allow", + "Action": [ + "s3:CreateBucket", + "s3:GetBucketNotificationConfiguration", + "s3:PutBucketNotificationConfiguration", + "s3:GetBucketLocation" + ], + "Resource": "arn:aws:s3:::seqera-lineage-*" + } + ] +} +``` + +### Advanced: Experimenting with data lineage -To test lineage within a single pipeline, add the following to your Nextflow config file before running your pipeline: +To test or troubleshoot data lineage for a _specific pipeline_, add the following to your **Nextflow config file** under **Advanced options** when _adding_ a pipeline to the launchpad. ```groovy lineage.enabled = true lineage.store.location = '' ``` -Only runs executed with this setting generate lineage data. 
Runs without it display a note on the **Run Info** tab: +To test for a _single pipeline run_, add the same code to your **Nextflow config file** under **Advanced options** when _launching_ the pipeline run. -> *Lineage tracking was not enabled for this run. Add `lineage.enabled = true` to your Nextflow config to capture lineage data.* +:::warning +If data lineage is defined for a workspace, only that data is displayed in Platform. Any unique _specific pipeline_ or _single pipeline run_ lineage data is only accessible via the AWS S3 console and other related services (such as Amazon Athena). +::: -## Data lineage displayed in Seqera Platform +## Data lineage displayed in Platform ### Workflow run details