Gwright99/1338 isolate fusion permissions for gcp batch #1341
- Fixes #1338: makes permission requirements more granular for GCSFuse and Fusion runs, and deprioritizes the suggestion to use the overly-powerful `roles/storage.admin` role.
@schaluva -- Can you please carve out some time to do a few test runs with the permissions I've laid out? Based on the sources I've pulled from, I'm fairly confident in the changes within this PR, but it would be good to have independent, up-to-date verification.
Reviewers (other than @schaluva), please hold off on review for now -- I may have mixed up some of the content between "Permissions needed by Platform" and "Permissions needed by Fusion when running in the GCP CE". We're sorting this out now.
The documentation only defines a single "custom service account" and tells users to assign it a set of permissions. However, it should be defined as two separate service account identities in GCP Batch:

- **Permissions needed by Platform** maps to the credentials SA (the JSON key in Platform Credentials). That's the orchestration layer: Batch API, Compute API, Logging read, and reading the pipeline log file back from GCS to display in the UI. It has no object-level storage permissions.
- **Permissions needed by Fusion/GCSFuse** maps to the head job SA (the "Head Job Service Account" field in the CE form). That's the runtime layer: the VM and Nextflow both run as this SA, and all GCS data permissions live here, not on the Platform credential. The one Fusion-specific addition over GCSFuse is `storage.buckets.get` -- Fusion probes bucket metadata at mount time on every bucket it touches (work dir, inputs, publishDir).

One gap worth flagging: `roles/storage.objectUser` is missing `storage.buckets.get`, so it's not sufficient for the Fusion work-dir case on its own. No predefined GCP role covers all five Fusion permissions cleanly -- a custom role might be the right call here.

Data Explorer is a third functional permission set and should be documented separately. For Data Explorer to work, the principal needs: `storage.buckets.list`, `storage.buckets.get`, `storage.objects.list`, `storage.objects.get`, and `storage.objects.create` if file upload is required.
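The custom-role idea could be sketched as a role definition file for `gcloud iam roles create`. The role title and the exact five-permission list are assumptions drawn from this discussion (object CRUD on the work dir plus the Fusion-specific `storage.buckets.get`), not verified guidance:

```yaml
# Hypothetical custom role for the Fusion head job SA -- verify the
# permission list before use.
title: Fusion Storage Access
description: Minimal GCS permissions for Fusion work-dir access
stage: GA
includedPermissions:
- storage.buckets.get      # Fusion probes bucket metadata at mount time
- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
```

It could then be created with `gcloud iam roles create fusionStorageAccess --project=<PROJECT_ID> --file=role.yaml` and granted to the head job SA on the relevant buckets.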
…ssions for publication bucket.
…ssions for publication bucket (GKE).
@schaluva -- Made a few updates, but some outstanding items remain.
Todo:
- (Semi-verified) Storage permissions for Seqera Platform SA:
  - CE Creation Screen (GCP Batch, GCP Cloud) -- Semi-Validated
  - Pipeline Run Screen -- Investigation Required
  - Data Explorer (Auto-discovery) -- Validated
  - Data Explorer (Manual) -- Validated
  - Data Explorer (Read / Navigate) -- Validated. Assumes the credential also supports a working Data Explorer Auto-discovery / Manual mechanism.
  - Data Explorer (Create / Download) -- Validated. Assumes the credential also supports a working Data Explorer Auto-discovery / Manual mechanism, and the Data Explorer Read / Navigate mechanism.
  - Studios -- Requires Validation. Piggybacks on Data-Link permissions.
  - Ensure Match to Pre-Existing Roles -- Requires Validation
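The "Ensure Match to Pre-Existing Roles" item can be sanity-checked as a simple set difference. The permission lists below are assumptions based on this PR's discussion, not authoritative GCP role contents -- verify them with `gcloud iam roles describe` before relying on this:

```python
# Sketch: does a predefined role cover Fusion's assumed storage needs?
# Both permission sets below are assumptions from this PR's discussion.

FUSION_REQUIRED = {
    "storage.buckets.get",      # Fusion probes bucket metadata at mount time
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
}

# roles/storage.objectUser grants object-level CRUD but (per the PR
# discussion) no storage.buckets.get, so it falls short for the work dir.
OBJECT_USER = {
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
}

missing = sorted(FUSION_REQUIRED - OBJECT_USER)
print(missing)  # -> ['storage.buckets.get']
```

The same diff could be repeated against any other candidate predefined role when validating the docs.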


This PR provides updated, more granular guidance regarding storage permissions in GCP.
The effort originally focused on fixing the GCP Batch documentation (in response to a customer bug report), but the scope grew to also encompass Google Cloud (single VM) and GKE once their content was reviewed.
NOTE:
The updated permissions were verified against minimal access configurations used by @ejseqera during rounds of testing in GCP.
The existing structure does not offer a mechanism to define common guidance once, so I sacrificed being DRY to stay in alignment with the existing overall structure of the docs site. This means the same language is repeated in multiple places, with minor differences where relevant.