diff --git a/platform-enterprise_docs/enterprise-sidebar.json b/platform-enterprise_docs/enterprise-sidebar.json index 3939a3408..106f0d2ad 100644 --- a/platform-enterprise_docs/enterprise-sidebar.json +++ b/platform-enterprise_docs/enterprise-sidebar.json @@ -21,7 +21,8 @@ "items": [ "enterprise/platform-helm", "enterprise/platform-kubernetes", - "enterprise/platform-docker-compose" + "enterprise/platform-docker-compose", + "enterprise/disaster-recovery" ] }, { diff --git a/platform-enterprise_docs/enterprise/disaster-recovery.md b/platform-enterprise_docs/enterprise/disaster-recovery.md new file mode 100644 index 000000000..e63781baf --- /dev/null +++ b/platform-enterprise_docs/enterprise/disaster-recovery.md @@ -0,0 +1,91 @@ +--- +title: "Platform disaster recovery" +description: Plan backup, restore, and recovery steps for Seqera Platform Enterprise deployments +date created: "2026-04-07" +tags: [installation, deployment, disaster recovery, backup, restore] +--- + +Use this guide to define a disaster recovery (DR) plan for Seqera Platform Enterprise before you need to restore service after an infrastructure loss or a region-level incident. + +Seqera Platform does not create a DR plan for you. Your recovery procedure depends on the infrastructure that hosts Platform, your database and Redis services, your container registry access, and the backup capabilities offered by your cloud provider or platform team. + +## What to protect + +Back up and document the parts of your deployment that you will need to rebuild Platform: + +- The Platform SQL database and its restore procedure. +- Your Platform configuration, including `tower.env`, `tower.yml`, Helm values, Kubernetes manifests, or `docker-compose.yml`. +- Your `TOWER_CRYPTO_SECRETKEY` value and any rotation-related keys. Existing encrypted secrets in the Platform database cannot be decrypted without the correct key material. 
+- TLS certificates, identity provider settings, registry credentials, and any other secrets required to start Platform. +- The storage locations and infrastructure dependencies referenced by your Platform deployment, such as load balancers, DNS records, persistent volumes, and mirrored container images. + +:::warning +Back up your Platform database before changing the crypto secret key or running key rotation. For more information, see [Configuration overview](./configuration/overview#secret-key-rotation). +::: + +## Define recovery targets + +Document the following targets with your operations team: + +- Recovery point objective (RPO): how much recent Platform state you can afford to lose. +- Recovery time objective (RTO): how long Platform can remain unavailable. +- Recovery owner: who can restore the database, recreate infrastructure, and validate the application. + +Your deployment model directly affects these targets: + +- Kubernetes and Helm deployments can be rebuilt on new infrastructure more easily than Docker Compose deployments, especially when Platform runs with external managed database and Redis services. +- Docker Compose deployments are single-instance by design. Restoring them normally requires application downtime while the host, configuration, and backing services are rebuilt. + +## Recommended backup strategy + +At minimum, maintain: + +1. Regular database backups or snapshots for the SQL database used by Platform. +2. Version-controlled copies of your deployment manifests and configuration overrides. +3. A secure copy of the active crypto secret key and any required supporting secrets. +4. A written restore runbook that includes DNS, ingress, load balancer, and certificate steps. + +For production environments, use the backup and replication features provided by your infrastructure: + +- Managed SQL backups, snapshots, and cross-region replicas where required by your RPO and RTO. +- Backups for any persistent volumes or host-attached storage used by your deployment.
+- Registry mirroring for Platform images if your environment cannot rely on direct access to `cr.seqera.io` during recovery. + +## Recovery workflow + +### Kubernetes or Helm deployments + +1. Recreate or fail over the Kubernetes cluster and its supporting infrastructure. +2. Restore the SQL database from the selected backup or snapshot before Platform connects to it, so that database migrations run against the restored data. +3. Restore access to the SQL database, Redis service, secrets, ingress, and DNS records. +4. Reapply your Helm values or Kubernetes manifests. +5. Confirm that Platform starts with the same crypto secret key used to encrypt the existing database contents. +6. Validate login, workspace access, and workflow launch behavior. + +### Docker Compose deployments + +1. Provision a replacement host or recover the existing host. +2. Restore `tower.env`, `tower.yml`, `docker-compose.yml`, certificates, and secret material. +3. Restore the Platform SQL database from the selected backup or snapshot, and restore or recreate the Redis service used by Platform. +4. Start Platform with `docker compose up` and allow migrations and startup checks to finish. +5. Validate login, workspace access, and workflow launch behavior before switching traffic back. + +## Validation checklist + +Test your DR plan on a schedule that matches your organization's risk requirements. During each exercise, confirm that you can: + +- Restore the database from a recent backup. +- Start Platform with the correct crypto secret key and configuration. +- Reach the frontend through the expected DNS and TLS path. +- Log in and access organizations, workspaces, and compute environments. +- Launch a small workflow to verify end-to-end operation. + +The [Test deployment](./testing) guide provides a simple post-recovery smoke test you can adapt for DR exercises.
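You can script parts of the validation checklist so that each DR exercise runs the same checks. The following Python sketch covers two of them: whether the frontend responds through the expected DNS and TLS path, and whether the API returns a JSON response. It assumes the API is served under `/api` and exposes a `service-info` endpoint. Confirm both against the API reference for your Platform version. The helper names are hypothetical. You can replace the `fetch` parameter with a stub so the checks run without network access.

```python
"""Minimal post-recovery smoke check (sketch, not an official tool).

Assumptions to verify for your deployment:
- The frontend is served at "/" on your Platform hostname.
- The API exposes "/api/service-info" and returns JSON.
"""
import json
import urllib.request
from typing import Callable, List, Tuple

# A fetcher takes a URL and returns (HTTP status, response body).
Fetcher = Callable[[str], Tuple[int, bytes]]


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Fetch a URL; urllib verifies TLS certificates by default."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def smoke_check(base_url: str, fetch: Fetcher = http_fetch) -> List[str]:
    """Return failure messages; an empty list means every check passed."""
    failures: List[str] = []
    base = base_url.rstrip("/")

    # Check 1: frontend reachable through the expected DNS and TLS path.
    try:
        status, _ = fetch(base + "/")
        if status != 200:
            failures.append(f"frontend returned HTTP {status}")
    except Exception as exc:  # DNS, TLS, connection, or HTTP errors
        failures.append(f"frontend check failed: {exc}")

    # Check 2: API answers with a JSON body (assumed endpoint).
    try:
        status, body = fetch(base + "/api/service-info")
        if status != 200:
            failures.append(f"service-info returned HTTP {status}")
        else:
            json.loads(body)
    except json.JSONDecodeError:
        failures.append("service-info did not return JSON")
    except Exception as exc:
        failures.append(f"API check failed: {exc}")

    return failures
```

After a restore, call `smoke_check("https://<your-platform-host>")`. An empty list means both checks passed. You still need to verify login, workspace access, and a small workflow launch separately.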
+ +## Related guides + +- [Platform installation overview](./install-platform) +- [Platform: Helm](./platform-helm) +- [Platform: Kubernetes](./platform-kubernetes) +- [Platform: Docker Compose](./platform-docker-compose) +- [Test deployment](./testing) diff --git a/platform-enterprise_docs/enterprise/install-platform.md b/platform-enterprise_docs/enterprise/install-platform.md index d91df41b3..d73d628f4 100644 --- a/platform-enterprise_docs/enterprise/install-platform.md +++ b/platform-enterprise_docs/enterprise/install-platform.md @@ -18,6 +18,8 @@ Seqera Platform Enterprise can be deployed using Docker Compose, Kubernetes, or See each deployment guide for detailed requirements. +For backup, restore, and recovery planning, see [Platform disaster recovery](./disaster-recovery). + ## Prerequisites :::info diff --git a/platform-enterprise_docs/enterprise/platform-docker-compose.md b/platform-enterprise_docs/enterprise/platform-docker-compose.md index 751847ca6..f7dc7be84 100644 --- a/platform-enterprise_docs/enterprise/platform-docker-compose.md +++ b/platform-enterprise_docs/enterprise/platform-docker-compose.md @@ -119,3 +119,7 @@ Seqera Platform offers a service that optimizes pipeline resource requests. Refe :::note Studios is available from Seqera Platform v24.1. If you experience any problems during the deployment process please contact your account executive. Studios in Enterprise is not installed by default. ::: + +## Disaster recovery planning + +Docker Compose deployments are single-instance by design, so recovery normally requires service downtime while you restore the host, configuration, and backing services. For backup, restore, and validation guidance, see [Platform disaster recovery](./disaster-recovery). 
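The Docker Compose recovery workflow starts by restoring `tower.env`, `tower.yml`, and `docker-compose.yml`, so keep a current copy of these files outside the host. The following Python sketch assumes all three files are in a single deployment directory. `archive_config` is a hypothetical helper. It writes a timestamped `.tar.gz` archive and a SHA-256 checksum file that you can verify before a restore. The archive contains secret material such as `TOWER_CRYPTO_SECRETKEY`, so store it encrypted and restrict access to it.

```python
"""Sketch: archive Docker Compose configuration files with a checksum.

Assumption: tower.env, tower.yml, and docker-compose.yml are in one
deployment directory. The archive holds secrets; store it securely.
"""
import hashlib
import tarfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Optional, Tuple

DEFAULT_FILES = ("tower.env", "tower.yml", "docker-compose.yml")


def archive_config(
    deploy_dir: Path,
    out_dir: Path,
    files: Iterable[str] = DEFAULT_FILES,
    now: Optional[datetime] = None,
) -> Tuple[Path, str]:
    """Write a timestamped .tar.gz of the config files; return (path, sha256)."""
    names = tuple(files)  # materialize once; iterated twice below
    now = now or datetime.now(timezone.utc)

    # Fail loudly: a partial configuration backup is not restorable.
    missing = [n for n in names if not (deploy_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing config files: {missing}")

    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"platform-config-{now:%Y%m%dT%H%M%SZ}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for name in names:
            tar.add(deploy_dir / name, arcname=name)

    # Record a checksum so the archive can be verified before a restore.
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    checksum_file = archive.with_name(archive.name + ".sha256")
    checksum_file.write_text(f"{digest}  {archive.name}\n")
    return archive, digest
```

Run the sketch with each database backup so every database snapshot has a matching configuration archive. The checksum file uses the `<hash>  <filename>` format that `sha256sum -c` accepts.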
diff --git a/platform-enterprise_docs/enterprise/platform-helm.md b/platform-enterprise_docs/enterprise/platform-helm.md index ca68df1d0..b36e5ecba 100644 --- a/platform-enterprise_docs/enterprise/platform-helm.md +++ b/platform-enterprise_docs/enterprise/platform-helm.md @@ -58,6 +58,10 @@ helm upgrade my-release oci://public.cr.seqera.io/charts/platform \ --values my-values.yaml ``` +## Disaster recovery planning + +Define your backup, restore, and validation procedure before promoting a Helm deployment to production. For DR guidance, including database backups, crypto key handling, and post-restore checks, see [Platform disaster recovery](./disaster-recovery). + ## Uninstalling the Helm chart To uninstall the Seqera Platform Enterprise Helm chart, run the following command, replacing `my-release` and `my-namespace` with your release name and namespace: diff --git a/platform-enterprise_docs/enterprise/platform-kubernetes.md b/platform-enterprise_docs/enterprise/platform-kubernetes.md index 166014164..5c424b79c 100644 --- a/platform-enterprise_docs/enterprise/platform-kubernetes.md +++ b/platform-enterprise_docs/enterprise/platform-kubernetes.md @@ -206,6 +206,8 @@ To configure Seqera Enterprise for high availability, note that: - The `cron` service may only have a single instance - The `groundswell` service may only have a single instance +For backup, restore, and validation planning, see [Platform disaster recovery](./disaster-recovery). 
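The recovery workflows in [Platform disaster recovery](./disaster-recovery) require Platform to start with the same crypto secret key that encrypted the restored database. To confirm this without printing the key, compare the escrowed value with the value you are about to deploy. The following Python sketch assumes the key is stored as a `TOWER_CRYPTO_SECRETKEY=value` line, as in `tower.env`. For Kubernetes, you would first extract the value from the Secret you reapply. The parser is deliberately simple: it does not handle quoting, `export` prefixes, or multi-line values. The helper names are hypothetical.

```python
"""Sketch: compare crypto secret keys without revealing them.

Assumption: the key is stored as a KEY=value line (as in tower.env).
Quoting, "export" prefixes, and multi-line values are not handled.
"""
import hashlib
import hmac
from pathlib import Path
from typing import Optional

KEY_NAME = "TOWER_CRYPTO_SECRETKEY"


def read_env_value(path: Path, name: str = KEY_NAME) -> Optional[str]:
    """Return the value of `name` from a simple KEY=value file, or None."""
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip()
    return None


def key_fingerprint(value: str) -> str:
    """Short SHA-256 fingerprint of a high-entropy key, for runbook records."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def keys_match(escrowed: str, deployed: str) -> bool:
    """Constant-time comparison of the full key values."""
    return hmac.compare_digest(escrowed.encode("utf-8"), deployed.encode("utf-8"))
```

At backup time, record `key_fingerprint(value)` in your restore runbook. During recovery, if the fingerprints match or `keys_match` returns `True`, the deployed key matches the escrowed copy.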
+ [aws-configure-ingress]: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/annotations/ [azure-configure-ingress]: https://docs.microsoft.com/en-us/azure/application-gateway/ingress-controller-annotations [google-configure-ingress]: https://cloud.google.com/kubernetes-engine/docs/concepts/ingress diff --git a/platform-enterprise_versioned_docs/version-25.3/enterprise-sidebar.json b/platform-enterprise_versioned_docs/version-25.3/enterprise-sidebar.json index bde267108..0f2869bd2 100644 --- a/platform-enterprise_versioned_docs/version-25.3/enterprise-sidebar.json +++ b/platform-enterprise_versioned_docs/version-25.3/enterprise-sidebar.json @@ -21,7 +21,8 @@ "items": [ "enterprise/platform-helm", "enterprise/platform-kubernetes", - "enterprise/platform-docker-compose" + "enterprise/platform-docker-compose", + "enterprise/disaster-recovery" ] }, { diff --git a/platform-enterprise_versioned_docs/version-25.3/enterprise/disaster-recovery.md b/platform-enterprise_versioned_docs/version-25.3/enterprise/disaster-recovery.md new file mode 100644 index 000000000..e63781baf --- /dev/null +++ b/platform-enterprise_versioned_docs/version-25.3/enterprise/disaster-recovery.md @@ -0,0 +1,91 @@ +--- +title: "Platform disaster recovery" +description: Plan backup, restore, and recovery steps for Seqera Platform Enterprise deployments +date created: "2026-04-07" +tags: [installation, deployment, disaster recovery, backup, restore] +--- + +Use this guide to define a disaster recovery (DR) plan for Seqera Platform Enterprise before you need to restore service after an infrastructure loss or a region-level incident. + +Seqera Platform does not create a DR plan for you. Your recovery procedure depends on the infrastructure that hosts Platform, your database and Redis services, your container registry access, and the backup capabilities offered by your cloud provider or platform team. 
+ +## What to protect + +Back up and document the parts of your deployment that you will need to rebuild Platform: + +- The Platform SQL database and its restore procedure. +- Your Platform configuration, including `tower.env`, `tower.yml`, Helm values, Kubernetes manifests, or `docker-compose.yml`. +- Your `TOWER_CRYPTO_SECRETKEY` value and any rotation-related keys. Existing encrypted secrets in the Platform database cannot be decrypted without the correct key material. +- TLS certificates, identity provider settings, registry credentials, and any other secrets required to start Platform. +- The storage locations and infrastructure dependencies referenced by your Platform deployment, such as load balancers, DNS records, persistent volumes, and mirrored container images. + +:::warning +Back up your Platform database before changing the crypto secret key or running key rotation. For more information, see [Configuration overview](./configuration/overview#secret-key-rotation). +::: + +## Define recovery targets + +Document the following targets with your operations team: + +- Recovery point objective (RPO): how much recent Platform state you can afford to lose. +- Recovery time objective (RTO): how long Platform can remain unavailable. +- Recovery owner: who can restore the database, recreate infrastructure, and validate the application. + +Your deployment model directly affects these targets: + +- Kubernetes and Helm deployments can be rebuilt on new infrastructure more easily than Docker Compose deployments, especially when Platform runs with external managed database and Redis services. +- Docker Compose deployments are single-instance by design. Restoring them normally requires application downtime while the host, configuration, and backing services are rebuilt. + +## Recommended backup strategy + +At minimum, maintain: + +1. Regular database backups or snapshots for the SQL database used by Platform. +2. Version-controlled copies of your deployment manifests and configuration overrides. +3.
A secure copy of the active crypto secret key and any required supporting secrets. +4. A written restore runbook that includes DNS, ingress, load balancer, and certificate steps. + +For production environments, use the backup and replication features provided by your infrastructure: + +- Managed SQL backups, snapshots, and cross-region replicas where required by your RPO and RTO. +- Backups for any persistent volumes or host-attached storage used by your deployment. +- Registry mirroring for Platform images if your environment cannot rely on direct access to `cr.seqera.io` during recovery. + +## Recovery workflow + +### Kubernetes or Helm deployments + +1. Recreate or fail over the Kubernetes cluster and its supporting infrastructure. +2. Restore the SQL database from the selected backup or snapshot before Platform connects to it, so that database migrations run against the restored data. +3. Restore access to the SQL database, Redis service, secrets, ingress, and DNS records. +4. Reapply your Helm values or Kubernetes manifests. +5. Confirm that Platform starts with the same crypto secret key used to encrypt the existing database contents. +6. Validate login, workspace access, and workflow launch behavior. + +### Docker Compose deployments + +1. Provision a replacement host or recover the existing host. +2. Restore `tower.env`, `tower.yml`, `docker-compose.yml`, certificates, and secret material. +3. Restore the Platform SQL database from the selected backup or snapshot, and restore or recreate the Redis service used by Platform. +4. Start Platform with `docker compose up` and allow migrations and startup checks to finish. +5. Validate login, workspace access, and workflow launch behavior before switching traffic back. + +## Validation checklist + +Test your DR plan on a schedule that matches your organization's risk requirements. During each exercise, confirm that you can: + +- Restore the database from a recent backup. +- Start Platform with the correct crypto secret key and configuration. +- Reach the frontend through the expected DNS and TLS path.
+- Log in and access organizations, workspaces, and compute environments. +- Launch a small workflow to verify end-to-end operation. + +The [Test deployment](./testing) guide provides a simple post-recovery smoke test you can adapt for DR exercises. + +## Related guides + +- [Platform installation overview](./install-platform) +- [Platform: Helm](./platform-helm) +- [Platform: Kubernetes](./platform-kubernetes) +- [Platform: Docker Compose](./platform-docker-compose) +- [Test deployment](./testing) diff --git a/platform-enterprise_versioned_docs/version-25.3/enterprise/install-platform.md b/platform-enterprise_versioned_docs/version-25.3/enterprise/install-platform.md index 4f7b4d09d..a5311a588 100644 --- a/platform-enterprise_versioned_docs/version-25.3/enterprise/install-platform.md +++ b/platform-enterprise_versioned_docs/version-25.3/enterprise/install-platform.md @@ -18,6 +18,8 @@ Seqera Platform Enterprise can be deployed using Docker Compose, Kubernetes, or See each deployment guide for detailed requirements. +For backup, restore, and recovery planning, see [Platform disaster recovery](./disaster-recovery). + ## Prerequisites :::info diff --git a/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-docker-compose.md b/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-docker-compose.md index 751847ca6..f7dc7be84 100644 --- a/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-docker-compose.md +++ b/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-docker-compose.md @@ -119,3 +119,7 @@ Seqera Platform offers a service that optimizes pipeline resource requests. Refe :::note Studios is available from Seqera Platform v24.1. If you experience any problems during the deployment process please contact your account executive. Studios in Enterprise is not installed by default. 
::: + +## Disaster recovery planning + +Docker Compose deployments are single-instance by design, so recovery normally requires service downtime while you restore the host, configuration, and backing services. For backup, restore, and validation guidance, see [Platform disaster recovery](./disaster-recovery). diff --git a/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-helm.md b/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-helm.md index ca68df1d0..b36e5ecba 100644 --- a/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-helm.md +++ b/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-helm.md @@ -58,6 +58,10 @@ helm upgrade my-release oci://public.cr.seqera.io/charts/platform \ --values my-values.yaml ``` +## Disaster recovery planning + +Define your backup, restore, and validation procedure before promoting a Helm deployment to production. For DR guidance, including database backups, crypto key handling, and post-restore checks, see [Platform disaster recovery](./disaster-recovery). + ## Uninstalling the Helm chart To uninstall the Seqera Platform Enterprise Helm chart, run the following command, replacing `my-release` and `my-namespace` with your release name and namespace: diff --git a/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-kubernetes.md b/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-kubernetes.md index 48ce997af..28f76edc5 100644 --- a/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-kubernetes.md +++ b/platform-enterprise_versioned_docs/version-25.3/enterprise/platform-kubernetes.md @@ -204,6 +204,8 @@ To configure Seqera Enterprise for high availability, note that: - The `cron` service may only have a single instance - The `groundswell` service may only have a single instance +For backup, restore, and validation planning, see [Platform disaster recovery](./disaster-recovery). 
+ [aws-configure-ingress]: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/annotations/ [azure-configure-ingress]: https://docs.microsoft.com/en-us/azure/application-gateway/ingress-controller-annotations [google-configure-ingress]: https://cloud.google.com/kubernetes-engine/docs/concepts/ingress diff --git a/platform-enterprise_versioned_docs/version-25.3/getting-started/production-checklist.md b/platform-enterprise_versioned_docs/version-25.3/getting-started/production-checklist.md index 5c25f700c..cfe7b1fc3 100644 --- a/platform-enterprise_versioned_docs/version-25.3/getting-started/production-checklist.md +++ b/platform-enterprise_versioned_docs/version-25.3/getting-started/production-checklist.md @@ -2,7 +2,7 @@ title: "Production checklist" description: "A pre-production checklist for Seqera Platform." date created: "2025-07-03" -last updated: "2026-03-25" +last updated: "2026-04-07" tags: [production, checklist, deployment, limitations, retry] --- @@ -83,6 +83,17 @@ Do not rotate credentials during active pipeline runs. Schedule rotations during Use [Pipeline Secrets](../secrets/overview) to manage sensitive values such as API keys for third-party services. Secrets are injected at runtime and are not exposed in pipeline logs or configuration files. +## Disaster recovery planning + +Teams often discover gaps in disaster recovery planning only when they are asked to prepare for an audit or simulation exercise. Before go-live: + +- Define your recovery time objective (RTO) and recovery point objective (RPO). +- Decide whether your DR scenario assumes in-place recovery or full account recreation. +- Verify that you back up the Seqera database, deployment configuration, secrets, TLS assets, and external dependency configuration on a schedule that matches your RPO. +- Run at least one recovery drill in a non-production environment and record the real recovery time and manual steps required. 
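To check a backup schedule against your RPO, find the worst-case data loss window. This is the largest gap between consecutive successful backups, or the time since the most recent backup if that is larger. A failure just before a backup completes loses everything written since the previous backup. The following Python sketch applies that rule to a list of backup completion times. It also compares the step durations recorded during a recovery drill with your RTO. The function names are hypothetical. Backup times would come from your database provider's backup history, and you should include only backups that completed successfully.

```python
"""Sketch: check backup history against an RPO and a drill against an RTO."""
from datetime import datetime, timedelta
from typing import Sequence


def worst_case_data_loss(backups: Sequence[datetime], now: datetime) -> timedelta:
    """Largest window of changes that a failure could have lost.

    This is the largest gap between consecutive successful backups,
    or the time since the most recent backup, whichever is larger.
    """
    if not backups:
        raise ValueError("no successful backups recorded")
    ordered = sorted(backups)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    gaps.append(now - ordered[-1])  # exposure since the latest backup
    return max(gaps)


def meets_rpo(backups: Sequence[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if the worst-case data loss window fits within the RPO."""
    return worst_case_data_loss(backups, now) <= rpo


def meets_rto(step_durations: Sequence[timedelta], rto: timedelta) -> bool:
    """True if the total of the recorded drill step durations fits within the RTO."""
    return sum(step_durations, timedelta()) <= rto
```

For example, suppose backups completed at 00:00, 06:00, 12:00, and 20:00, and you check at 22:00. The worst-case window is the 8-hour gap between 12:00 and 20:00, so a 6-hour RPO is not met.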
+ +See [Disaster recovery](../enterprise/disaster-recovery) for a deployment-focused recovery planning guide. + ## Compute environment permissions Permissions within shared compute environments are a frequent source of unexpected behavior, particularly when multiple teams use the same workspace.