
SingleNodeRestateDeployment: AMI update via SSM parameter causes unexpected instance replacement and data loss #100

@acomagu

Description


SingleNodeRestateDeployment uses the latest Amazon Linux 2023 AMI via an SSM parameter (/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-6.1-arm64). When AWS publishes a new AMI, the next cdk deploy silently replaces the EC2 instance, and the Restate data volume is left orphaned even when deleteOnTermination: false is set.

Steps to reproduce

  1. Deploy a stack using SingleNodeRestateDeployment
  2. Wait for AWS to publish a new Amazon Linux 2023 AMI (or check the SSM parameter's history to confirm that its value has changed since the last deploy)
  3. Run cdk deploy without any intentional changes
  4. Observe that the EC2 instance is replaced (Replacement: True in the CloudFormation change set)

Expected behavior

  • The existing EBS volume is always reused across deploys
  • Or: an option is provided to accept an externally managed ec2.Volume, so the volume lifecycle is decoupled from the instance

Actual behavior

  • The instance is silently replaced on the next deploy
  • A new blank EBS volume is created and mounted at /var/restate
  • The previous volume (with all Restate state) is left detached and orphaned
  • All registered services and workflow state are lost

Root cause

The CloudFormation stack uses an SSM dynamic reference for the AMI:

/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-6.1-arm64

When this resolves to a new AMI ID, CloudFormation treats ImageId as changed, which always requires instance replacement (RequiresRecreation: Always).

The change set shows:

{
  "ResourceType": "AWS::EC2::Instance",
  "Replacement": "True",
  "Details": [{
    "Target": { "Name": "ImageId", "RequiresRecreation": "Always" },
    "ChangeSource": "DirectModification"
  }]
}
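Until this is fixed, a pre-deploy guard can catch the replacement before it happens. The sketch below assumes the change set was created without being executed (for example with cdk deploy --no-execute) and that the Changes array returned by aws cloudformation describe-change-set has been parsed from JSON. The function and type names are my own and are not part of restate-cdk or the AWS SDK; the logical ID in the usage data is hypothetical.

```typescript
// Minimal shapes for the fields of `aws cloudformation describe-change-set`
// output that this check reads; all other fields are ignored.
interface ResourceTargetDefinition {
  Attribute?: string;
  Name?: string;
  RequiresRecreation?: 'Never' | 'Conditionally' | 'Always';
}

interface ResourceChangeDetail {
  Target?: ResourceTargetDefinition;
  ChangeSource?: string;
}

interface ResourceChange {
  Action?: string;
  LogicalResourceId?: string;
  ResourceType?: string;
  Replacement?: 'True' | 'False' | 'Conditional';
  Details?: ResourceChangeDetail[];
}

interface Change {
  Type?: string;
  ResourceChange?: ResourceChange;
}

interface InstanceReplacement {
  logicalId: string;
  causes: string[]; // properties whose change always forces recreation
}

// Lists every EC2 instance that the change set would (or might) replace.
function findInstanceReplacements(changes: Change[]): InstanceReplacement[] {
  const found: InstanceReplacement[] = [];
  for (const change of changes) {
    const rc = change.ResourceChange;
    if (!rc || rc.ResourceType !== 'AWS::EC2::Instance' || rc.Action !== 'Modify') continue;
    if (rc.Replacement !== 'True' && rc.Replacement !== 'Conditional') continue;
    const causes = (rc.Details ?? [])
      .filter((d) => d.Target?.RequiresRecreation === 'Always')
      .map((d) => d.Target?.Name ?? '(unknown)');
    found.push({ logicalId: rc.LogicalResourceId ?? '(unknown)', causes });
  }
  return found;
}

// Usage with the change from this issue (the logical ID is hypothetical).
const changes: Change[] = [
  {
    Type: 'Resource',
    ResourceChange: {
      Action: 'Modify',
      LogicalResourceId: 'RestateInstance',
      ResourceType: 'AWS::EC2::Instance',
      Replacement: 'True',
      Details: [
        {
          Target: { Attribute: 'Properties', Name: 'ImageId', RequiresRecreation: 'Always' },
          ChangeSource: 'DirectModification',
        },
      ],
    },
  },
];

const replacements = findInstanceReplacements(changes);
if (replacements.length > 0) {
  console.log(`Do not execute: instance replacement ${JSON.stringify(replacements)}`);
}
```

The check only reports; wiring it into CI so that a non-empty result blocks execution of the change set is left to the deployment pipeline.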

Suggested solutions

Short-term: Document the machineImage prop (already supported in v1.6.1) more prominently,
with a warning that leaving the AMI unpinned risks data loss.

A synth-time warning annotation may also be useful:

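  import { Annotations } from 'aws-cdk-lib';

  // Inside the SingleNodeRestateDeployment constructor, where `props` are the construct's props:
  if (!props.machineImage) {
    Annotations.of(this).addWarning(
      'AMI is not pinned via the machineImage prop. ' +
      'An AWS AMI update will cause instance replacement and data loss on the next cdk deploy.'
    );
  }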
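For reference, pinning the AMI from the user side could look roughly like this. It is a minimal sketch that assumes a stack deployed to ap-northeast-1; the AMI ID is a hypothetical placeholder, and only the image is built because the other SingleNodeRestateDeployment props depend on the setup.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

const app = new App();
// Assumption: the stack targets a concrete region, so the AMI below is
// emitted as a literal ImageId instead of an SSM-backed parameter.
const stack = new Stack(app, 'RestateStack', {
  env: { region: 'ap-northeast-1' },
});

// Hypothetical placeholder ID: substitute the AL2023 arm64 AMI you have tested.
// ImageId now changes only when someone edits this line.
const pinnedImage = ec2.MachineImage.genericLinux({
  'ap-northeast-1': 'ami-0123456789abcdef0',
});

// pinnedImage is what would be passed as the machineImage prop of
// SingleNodeRestateDeployment (its other props are omitted here).
const { imageId } = pinnedImage.getImage(stack);
```

Alternatively, ec2.MachineImage.fromSsmParameter('/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-6.1-arm64', { cachedInContext: true }) resolves the parameter once and records the AMI ID in cdk.context.json; this requires an explicit account and region on the stack. Either way, an intentional AMI upgrade still replaces the instance, so pinning makes replacement deliberate rather than preventing it.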

Long-term: Support an externally managed ec2.Volume passed via props, so the volume
lifecycle is fully decoupled from the instance. This would allow CloudFormation to
reattach the same volume after any instance replacement.
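A rough sketch of that decoupled layout, written against plain aws-cdk-lib constructs because I do not know SingleNodeRestateDeployment's internals. Inside the construct this might surface as a prop such as dataVolume (a hypothetical name). The VPC layout, volume size, and instance type are illustrative assumptions, and the user data that mounts the volume at /var/restate is omitted.

```typescript
import { App, RemovalPolicy, Size, Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

const app = new App();
const stack = new Stack(app, 'RestateStack');

// Single public subnet without NAT gateways, just to keep the sketch small.
const vpc = new ec2.Vpc(stack, 'Vpc', {
  maxAzs: 1,
  natGateways: 0,
  subnetConfiguration: [{ name: 'public', subnetType: ec2.SubnetType.PUBLIC }],
});
const subnet = vpc.publicSubnets[0];

// The data volume is its own resource with a RETAIN policy, so replacing or
// deleting the instance never deletes it. Its AZ comes from the subnet (not
// from the instance), so an instance replacement does not force a new volume.
const dataVolume = new ec2.Volume(stack, 'RestateData', {
  availabilityZone: subnet.availabilityZone,
  size: Size.gibibytes(20), // hypothetical size
  volumeType: ec2.EbsDeviceVolumeType.GP3,
  encrypted: true,
  removalPolicy: RemovalPolicy.RETAIN,
});

const instance = new ec2.Instance(stack, 'RestateInstance', {
  vpc,
  vpcSubnets: { subnets: [subnet] },
  instanceType: new ec2.InstanceType('t4g.small'), // hypothetical size
  machineImage: ec2.MachineImage.latestAmazonLinux2023({
    cpuType: ec2.AmazonLinuxCpuType.ARM_64,
  }),
});

// A separate attachment resource points the same volume at whichever
// instance currently exists.
new ec2.CfnVolumeAttachment(stack, 'RestateDataAttachment', {
  instanceId: instance.instanceId,
  volumeId: dataVolume.volumeId,
  device: '/dev/sdf',
});
```

One caveat: a gp3 volume can be attached to only one instance at a time, and on replacement CloudFormation creates the new instance and the new attachment before deleting the old ones, so the new attachment may fail until the old one is detached. The replacement path should be tested in a scratch stack before relying on it.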

Environment

  • @restatedev/restate-cdk: 1.6.1
  • aws-cdk-lib: 2.234.1
  • Region: ap-northeast-1

Thank you.
