feat: Azure get-well [COMP-806] #1315

justinegeffen wants to merge 17 commits into enterprise-26.1-documentation from justine-azure-get-well
Conversation
Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
Revert studios/add-studio.md to master version. The overview.md whitespace fixes are enforced by pre-commit hooks and remain as-is. Cloud changes ported to azure-getwell-cloud branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore custom-roles.md and roles.md to master versions. Cloud changes to be ported separately to azure-getwell-cloud branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s/docs into justine-azure-get-well
Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
@MichaelTansiniSeqera, this is good to review.
#### Accounts

- Azure uses accounts for each service. For example, an [Azure Storage account][az-learn-storage] will house a collection of blob containers, file shares, queues, and tables. An Azure subscription can have multiple Azure Storage and Azure Batch accounts - however, a Platform compute environment can only use one of each. Multiple Platform compute environments can be created to use separate credentials, Azure Storage accounts, and Azure Batch accounts.
+ Azure uses accounts for each service. For example, an [Azure Storage account][azure-storage-account] will house a collection of blob containers, file shares, queues, and tables. An Azure subscription can have multiple Azure Storage and Azure Batch accounts - however, a Platform compute environment can only use one of each. Multiple Platform compute environments can be created to use separate credentials, Azure Storage accounts, and Azure Batch accounts.
"At a minimum, you will require an Azure Batch account and an Azure Storage account to run pipelines on Azure Batch with Seqera. This is because Azure uses accounts for each service."
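The minimum prerequisite the reviewer suggests could be illustrated with an Azure CLI sketch. This is a hedged example, not part of the docs under review: the resource group, account names, region, and SKU below are all placeholder values.

```shell
# Resource group to hold both accounts (name and region are example values)
az group create --name seqera-rg --location westeurope

# Storage account: houses the blob containers used as the Nextflow work directory
az storage account create \
  --name seqerastorage123 \
  --resource-group seqera-rg \
  --location westeurope \
  --sku Standard_LRS

# Batch account: runs the head and worker pools
az batch account create \
  --name seqerabatch123 \
  --resource-group seqera-rg \
  --location westeurope
```

Storage account names must be globally unique and lowercase alphanumeric, so the names above would need adjusting in practice.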
10. Select **Review and Create**.
11. Select **Create**.
12. Go to your new Batch account, then select **Access Keys**.
13. Store the access keys for your Azure Batch account, to be used when you create a Seqera compute environment.
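For context on where these keys end up, the nf-azure plugin accepts them in the Nextflow config when running outside Platform. A sketch with placeholder account names and key values, not values from this PR:

```groovy
// nextflow.config — access-key authentication for Azure Batch (placeholder values)
azure {
    storage {
        accountName = 'seqerastorage123'     // example name
        accountKey  = '<storage-account-key>'
    }
    batch {
        location    = 'westeurope'
        accountName = 'seqerabatch123'
        accountKey  = '<batch-account-key>'
    }
}
```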
Only if they are using keys. Do we need a section on creating Entra, service principal, and managed identity credentials?
I think we should add here that Entra is now recommended as the credential mechanism
When you use separate head and worker pools, you can assign a different managed identity to head and worker pools. Each pool receives only the managed identity relevant to its role.
:::

4. When you set up the Seqera compute environment, select the Azure Batch pool by name and enter the managed identity **client ID** and (optionally) the **resource ID** in the specified fields. The resource ID is the full ARM path of the managed identity (e.g., `/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}`).
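On the Nextflow side, a user-assigned managed identity is referenced by client ID in the nf-azure config. A sketch with placeholder values, assuming the pool's VMs already have the identity assigned:

```groovy
// nextflow.config — user-assigned managed identity (client ID is a placeholder)
azure {
    managedIdentity {
        clientId = '<managed-identity-client-id>'
    }
    batch {
        location    = 'westeurope'
        accountName = 'seqerabatch123'   // example name; no account key needed
    }
}
```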
This should read "required if using Forge" rather than "optionally".
When you submit a pipeline to this compute environment, Nextflow will authenticate using the managed identity associated with the Azure Batch node it runs on, rather than relying on access keys.

:::caution
If a managed identity is misconfigured (e.g., invalid client ID or missing RBAC roles), the pipeline will fail with an explicit error. Seqera will not silently fall back to access key authentication.
@adamrtalbot or @jonmarti correct me if I'm wrong but we fall back to the Entra Service Principal if the MI is wrong, but we DON'T do that for access keys right? If so, we should make that clear here
The fallback model is a bit subtler than "MI wrong → fall back to SP". Two layers behave differently:
Control plane (Platform → Azure Batch / Storage): whatever credential is configured (access keys or Entra service principal) is what Platform uses. The two are alternatives, not a runtime chain. Bad keys, invalid/expired client secrets, or missing RBAC all fail loudly with no silent fallback. Some features (VNet/subnet, managed identity assignment) require Entra because the underlying operation only accepts AAD tokens.
Data plane on the VM (Nextflow head + tasks → Storage / ACR):
| Credential | Head MI configured? | How the head VM authenticates |
|---|---|---|
| Access keys | n/a | Storage account key provisioned to the VM |
| Entra SP | No | SP credentials passed into the Nextflow config ⚠ secret on the VM |
| Entra SP | Yes | Short-lived token from the Azure metadata service (no creds on the VM) |
| Entra SP | Yes, but invalid / missing RBAC | Fails — no silent fallback to the SP at runtime |
So the existing caution is correct for runtime. What's worth adding is the config-time half; that's where the "fallback" really lives:
When a head managed identity is not configured, Platform passes the service principal credentials to the head job so it can authenticate to Azure services. The managed identity removes the long-lived secret from the compute node, so configuring it is recommended for production deployments. The same applies to the pool managed identity used by compute tasks.
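The config-time distinction described above could be sketched as the two shapes of Nextflow config the head job ends up with. All values are placeholders, and the scopes shown are the nf-azure `activeDirectory` and `managedIdentity` config scopes:

```groovy
// Without a head managed identity: service principal credentials are passed
// to the head job, so a long-lived secret lands on the VM
azure {
    activeDirectory {
        servicePrincipalId     = '<sp-client-id>'
        servicePrincipalSecret = '<sp-client-secret>'   // secret on the VM
        tenantId               = '<tenant-id>'
    }
}

// With a head managed identity: only the client ID is configured;
// short-lived tokens come from the Azure instance metadata service
azure {
    managedIdentity {
        clientId = '<managed-identity-client-id>'
    }
}
```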
16. Enable **Dispose resources** for Seqera to automatically delete the Batch pools if the compute environment is deleted on the platform.

:::info
Batch Forge creates separate Azure Batch pools for the Nextflow head job and compute tasks by default (named `tower-pool-{envId}-head` and `tower-pool-{envId}-worker`). This prevents the head node from competing for resources with compute tasks and allows independent sizing of each pool.
Add: "Please bear in mind that when running multiple Azure Batch compute environments, the Azure Batch initial limits are 100 by default, with a maximum of 500. For further context on limits, see here."
The default limit for us is zero! I had to argue profusely to lift it and they genuinely tried to tell me "you only need three for dev, staging, prod" 😆
😱 Maybe just warn about limits when running multiple CEs then as they'll be exhausted twice as fast?
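For readers hitting the quota discussion above, the per-region Azure Batch quotas (including the Batch account quota) can be inspected from the CLI. A sketch; the region is a placeholder:

```shell
# Show the Azure Batch quotas for a region, including the Batch account quota
# discussed in this thread (region is an example value)
az batch location quotas show --location westeurope --output table
```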
Fixes: https://seqera.atlassian.net/browse/COMP-806
Fixes: https://seqera.atlassian.net/browse/EDU-1058