[codex] Add Platform disaster recovery guidance #1283
Draft: llewellyn-sl wants to merge 19 commits into master from EDU-789-docs-draft
Commits:
b27ea85 docs: add enterprise disaster recovery guidance [EDU-789] (llewellyn-sl)
cb4a8ed EDU-789: add platform disaster recovery docs (llewellyn-sl)
9545e39 EDU-789: add platform disaster recovery docs (llewellyn-sl)
ed19fe9 EDU-789: add platform disaster recovery docs (llewellyn-sl)
b1df8b7 EDU-789: add platform disaster recovery docs (llewellyn-sl)
7ce472b EDU-789: add platform disaster recovery docs (llewellyn-sl)
655f3ad EDU-789: add platform disaster recovery docs (llewellyn-sl)
9f6fd77 EDU-789: add platform disaster recovery docs (llewellyn-sl)
90805ac EDU-789: add platform disaster recovery docs (llewellyn-sl)
3b2ccdd EDU-789: add platform disaster recovery docs (llewellyn-sl)
2d6065a EDU-789: add platform disaster recovery docs (llewellyn-sl)
f6d9f92 EDU-789: add platform disaster recovery docs (llewellyn-sl)
54c6581 EDU-789: add platform disaster recovery docs (llewellyn-sl)
102eb43 EDU-789: add platform disaster recovery docs (llewellyn-sl)
8859eb4 EDU-789: add platform disaster recovery docs (llewellyn-sl)
0e80304 Merge branch 'master' into EDU-789-docs-draft (justinegeffen)
c4fd00f Merge branch 'master' into EDU-789-docs-draft (justinegeffen)
f0e55f1 Merge branch 'master' into EDU-789-docs-draft (justinegeffen)
492b814 Merge branch 'master' into EDU-789-docs-draft (justinegeffen)
@@ -0,0 +1,91 @@
---
title: "Platform disaster recovery"
description: Plan backup, restore, and recovery steps for Seqera Platform Enterprise deployments
date created: "2026-04-07"
tags: [installation, deployment, disaster recovery, backup, restore]
---

Use this guide to define a disaster recovery (DR) plan for Seqera Platform Enterprise before you need to restore service after an infrastructure loss or a region-level incident.

Seqera Platform does not create a DR plan for you. Your recovery procedure depends on the infrastructure that hosts Platform, your database and Redis services, your container registry access, and the backup capabilities offered by your cloud provider or platform team.

## What to protect

Back up and document the parts of your deployment that you will need to rebuild Platform:

- The Platform SQL database and its restore procedure.
- Your Platform configuration, including `tower.env`, `tower.yml`, Helm values, Kubernetes manifests, or `docker-compose.yml`.
- Your `TOWER_CRYPTO_SECRETKEY` value and any rotation-related keys. Existing encrypted secrets in the Platform database cannot be decrypted without the correct key material.
- TLS certificates, identity provider settings, registry credentials, and any other secrets required to start Platform.
- The storage locations and infrastructure dependencies referenced by your Platform deployment, such as load balancers, DNS records, persistent volumes, and mirrored container images.

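An inventory like the one above is easier to trust when it is checked mechanically. The following Python sketch is one possible approach, not part of Platform: it assumes a hypothetical backup directory layout, and only the names `tower.env`, `tower.yml`, and `docker-compose.yml` come from this guide. The database dump and certificate paths are illustrative placeholders.

```python
from pathlib import Path

# Hypothetical inventory. Only tower.env, tower.yml, and docker-compose.yml
# are named in this guide; the other paths are illustrative assumptions.
REQUIRED_ITEMS = [
    "tower.env",
    "tower.yml",
    "docker-compose.yml",
    "db/latest.sql.gz",   # assumed location of the most recent database dump
    "tls/fullchain.pem",  # assumed location of the TLS certificate chain
]


def missing_items(backup_root: Path, required: list[str] = REQUIRED_ITEMS) -> list[str]:
    """Return the required items that are absent or empty under backup_root."""
    missing = []
    for rel in required:
        path = backup_root / rel
        # An empty file is treated as missing: it usually means a failed copy.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```

Calling `missing_items(Path("/backups/platform"))` (a hypothetical path) returns the list of gaps, which a scheduled job could report. The crypto secret key is deliberately left out of this file list, on the assumption that it lives in a secrets manager rather than on disk.
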

:::warning
Back up your Platform database before changing the crypto secret key or running key rotation. For more information, see [Configuration overview](./configuration/overview#secret-key-rotation).
:::

## Define recovery targets

Document the following targets with your operations team:

- Recovery point objective (RPO): how much recent Platform state you can afford to lose.
- Recovery time objective (RTO): how long Platform can remain unavailable.
- Recovery owner: who can restore the database, recreate infrastructure, and validate the application.

Your deployment model directly affects these targets:

- Kubernetes and Helm deployments are generally easier to rebuild on new infrastructure than Docker Compose deployments, especially when Platform runs with external managed database and Redis services.
- Docker Compose deployments are single-instance by design. Restoring them normally requires application downtime while the host, configuration, and backing services are rebuilt.

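Once the targets are written down, they can be compared against measurements. The Python sketch below shows the arithmetic under assumed values: the 4-hour RPO, 8-hour RTO, and all timestamps are hypothetical examples, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def rpo_met(last_backup_at: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if the time since the last backup does not exceed the RPO."""
    return now - last_backup_at <= rpo


def rto_met(outage_started_at: datetime, restored_at: datetime, rto: timedelta) -> bool:
    """True if the measured restore duration fits within the RTO."""
    return restored_at - outage_started_at <= rto


# Hypothetical targets and timestamps, for illustration only.
RPO = timedelta(hours=4)
RTO = timedelta(hours=8)

last_backup = datetime(2026, 4, 7, 0, 0, tzinfo=timezone.utc)
incident = datetime(2026, 4, 7, 3, 30, tzinfo=timezone.utc)
print(rpo_met(last_backup, incident, RPO))  # True: 3.5 hours of state at risk

drill_start = datetime(2026, 4, 7, 9, 0, tzinfo=timezone.utc)
drill_end = datetime(2026, 4, 7, 18, 30, tzinfo=timezone.utc)
print(rto_met(drill_start, drill_end, RTO))  # False: the drill took 9.5 hours
```

A failed RPO check suggests more frequent backups or replication. A failed RTO check during a drill suggests the runbook needs automation or a different deployment model.
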

## Recommended backup strategy

At minimum, maintain:

1. Regular database backups or snapshots for the SQL database used by Platform.
2. Version-controlled copies of your deployment manifests and configuration overrides.
3. A secure copy of the active crypto secret key and any required supporting secrets.
4. A written restore runbook that includes DNS, ingress, load balancer, and certificate steps.

For production environments, use the backup and replication features provided by your infrastructure:

- Managed SQL backups, snapshots, and cross-region replicas where required by your RPO and RTO.
- Backups for any persistent volumes or host-attached storage used by your deployment.
- Registry mirroring for Platform images if your environment cannot rely on direct access to `cr.seqera.io` during recovery.

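Backed-up configuration is only useful if the restored copy matches what was saved. As a minimal sketch, assuming the configuration files are collected in a single directory, the Python code below records a SHA-256 digest per file at backup time and reports files that are missing or altered after a restore. The function names are hypothetical helpers, not part of Platform.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are missing or differ under root."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

The manifest can be stored next to the backup as JSON. Version control already provides this guarantee for tracked files, so the sketch is mainly relevant to material kept outside the repository.
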

## Recovery workflow

### Kubernetes or Helm deployments

1. Recreate or fail over the Kubernetes cluster and its supporting infrastructure.
2. Restore access to the SQL database, Redis service, secrets, ingress, and DNS records.
3. Restore the SQL database from the selected backup or snapshot, so that Platform does not start against an empty database.
4. Reapply your Helm values or Kubernetes manifests.
5. Confirm that Platform starts with the same crypto secret key used to encrypt the existing database contents.
6. Validate login, workspace access, and workflow launch behavior.

### Docker Compose deployments

1. Provision a replacement host or recover the existing host.
2. Restore `tower.env`, `tower.yml`, `docker-compose.yml`, certificates, and secret material.
3. Restore or recreate the external SQL database and Redis service used by Platform.
4. Start Platform with `docker compose up` and allow migrations and startup checks to finish.
5. Validate login, workspace access, and workflow launch behavior before switching traffic back.

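Both workflows depend on starting Platform with the same crypto secret key that encrypted the existing database contents. One way to confirm that a restored key matches the original, without writing the key itself into the runbook, is to store a salted fingerprint at backup time and compare it during recovery. The Python sketch below is an assumption about how a team might do this, not a Platform feature. It reads the key from the `TOWER_CRYPTO_SECRETKEY` environment variable, and the helper names are hypothetical.

```python
import hashlib
import hmac
import os


def key_fingerprint(secret_key: str, salt: bytes, iterations: int = 200_000) -> str:
    """Derive a one-way fingerprint of the key with PBKDF2-HMAC-SHA256."""
    derived = hashlib.pbkdf2_hmac("sha256", secret_key.encode("utf-8"), salt, iterations)
    return derived.hex()


def key_matches(secret_key: str, salt: bytes, expected: str, iterations: int = 200_000) -> bool:
    """Compare the fingerprint of secret_key with the stored one in constant time."""
    return hmac.compare_digest(key_fingerprint(secret_key, salt, iterations), expected)


def check_restored_key(salt_hex: str, expected: str) -> bool:
    """Check the key in the current environment against the stored fingerprint."""
    restored = os.environ.get("TOWER_CRYPTO_SECRETKEY", "")
    return bool(restored) and key_matches(restored, bytes.fromhex(salt_hex), expected)
```

At backup time, generate a salt with `os.urandom(16)` and record `salt.hex()` together with the fingerprint. The fingerprint only confirms that the value is unchanged. It does not replace the secure copy of the key, and it does not prove that Platform can decrypt its stored secrets, so the validation steps still apply.
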

## Validation checklist

Test your DR plan on a schedule that matches your organization's risk requirements. During each exercise, confirm that you can:

- Restore the database from a recent backup.
- Start Platform with the correct crypto secret key and configuration.
- Reach the frontend through the expected DNS and TLS path.
- Log in and access organizations, workspaces, and compute environments.
- Launch a small workflow to verify end-to-end operation.

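The DNS and TLS item in this checklist lends itself to a scripted probe. The Python sketch below uses only the standard library and relies on the default certificate verification in `urllib`, so an invalid certificate is reported as a failure. The host name and the `/api/service-info` path in the usage note are assumptions to replace with values from your own deployment. Login, workspace access, and workflow launch are not covered here and still need a manual or API-driven test.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if the request cannot complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or TLS verification error.
        return None


def run_checks(
    base_url: str,
    paths: list[str],
    probe: Callable[[str], Optional[int]] = http_status,
) -> dict[str, bool]:
    """Probe each path under base_url and record whether it returned HTTP 200."""
    base = base_url.rstrip("/")
    return {path: probe(base + path) == 200 for path in paths}
```

For example, `run_checks("https://platform.example.com", ["/", "/api/service-info"])` returns a dictionary of pass or fail results per path. The `probe` parameter exists so the logic can be exercised in a drill rehearsal without network access.
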

The [Test deployment](./testing) guide provides a simple post-recovery smoke test you can adapt for DR exercises.

## Related guides

- [Platform installation overview](./install-platform)
- [Platform: Helm](./platform-helm)
- [Platform: Kubernetes](./platform-kubernetes)
- [Platform: Docker Compose](./platform-docker-compose)
- [Test deployment](./testing)
platform-enterprise_versioned_docs/version-25.3/enterprise/disaster-recovery.md (91 changes: 91 additions & 0 deletions)
Identical to the diff above (same 91 added lines).
Open question: do you also need backups for Redis?
I think we just use it for caching, but it would be good to confirm.