
AWS EC2 Based Deployment

Jeffrey Bian edited this page Dec 15, 2025 · 1 revision

This section provides a step-by-step guide to deploying the system on AWS EC2 instances. This setup has been verified in production environments that require scalability, reliability, and high availability. We provide two example deployments using OpenTofu (https://opentofu.org/).

Prerequisites

  • AWS Account
  • AWS CLI configured with appropriate permissions
  • OpenTofu CLI installed and configured (>= 1.9.0)
  • Familiarity with Terraform/OpenTofu and AWS services
  • Packer (https://developer.hashicorp.com/packer) CLI installed and configured for building AMIs

Before You Begin

Make sure the customer's primary database is configured as outlined below.

Set wal_level to logical

In the RDS console, choose "Parameter groups" in the left navigation pane, then "Create parameter group". Create a new parameter group whose family matches the primary database (for example, postgres16). For Aurora PostgreSQL, select the "cluster parameter group" type.

The new group starts as a copy of the family's default parameter group. Choose "Edit", search for the rds.logical_replication parameter, and set it to 1; on RDS this sets wal_level to logical. Save the changes and associate the new parameter group with the primary database. Because rds.logical_replication is a static parameter, it only takes effect after the primary is rebooted, so apply the change and reboot as soon as your maintenance constraints allow.
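
The console steps above can also be scripted with the AWS CLI. The function below is a minimal sketch, assuming an RDS for PostgreSQL instance (not Aurora); the parameter group name, family and instance identifier are placeholders you supply. For Aurora PostgreSQL, the equivalent commands are `create-db-cluster-parameter-group`, `modify-db-cluster-parameter-group` and `modify-db-cluster --db-cluster-parameter-group-name`.

```shell
# Sketch: enable logical replication on an RDS for PostgreSQL primary (not Aurora).
# All arguments are placeholders you supply.
enable_logical_replication() {
  local pg_name="$1"      # name for the new parameter group (hypothetical)
  local family="$2"       # must match the primary's family, e.g. postgres16
  local instance_id="$3"  # DB instance identifier of the primary

  # Create a custom parameter group; it starts with the family's default values.
  aws rds create-db-parameter-group \
    --db-parameter-group-name "$pg_name" \
    --db-parameter-group-family "$family" \
    --description "Logical replication for Springtail" || return 1

  # rds.logical_replication=1 sets wal_level=logical. It is a static
  # parameter, so only ApplyMethod=pending-reboot is accepted.
  aws rds modify-db-parameter-group \
    --db-parameter-group-name "$pg_name" \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot" || return 1

  # Attach the group to the primary instance.
  aws rds modify-db-instance \
    --db-instance-identifier "$instance_id" \
    --db-parameter-group-name "$pg_name" \
    --apply-immediately
}
```

Once the instance status is back to available, reboot it so the static parameter takes effect, for example with `aws rds reboot-db-instance --db-instance-identifier <instance-id>`. The reboot briefly interrupts the primary.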

Set the max_replication_slots Value

As with wal_level above, modify the max_replication_slots parameter in the same parameter group so that there are enough replication slots for the Springtail service. The value should be equal to or greater than the number of databases on the primary.
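
To pick a value, you can count the databases on the primary and optionally add headroom for other logical replication consumers. A minimal sketch, assuming `psql` reaches the primary through the standard `PGHOST`/`PGUSER`/`PGPASSWORD` environment variables and that a custom (non-Aurora) parameter group is already attached; the group name and headroom are your own choices. `max_replication_slots` is a static parameter, so the new value applies after the next reboot.

```shell
# Sketch: size max_replication_slots from the number of databases on the primary.
# Assumes psql connects via PGHOST/PGUSER/PGPASSWORD; pg_name is your custom group.
count_databases() {
  # -A (unaligned) and -t (tuples only) make psql print just the number.
  # On RDS this may also count the internal rdsadmin database; over-counting is harmless.
  psql -d postgres -At -c "SELECT count(*) FROM pg_database WHERE NOT datistemplate"
}

set_max_replication_slots() {
  local pg_name="$1"
  local headroom="${2:-0}"   # extra slots for other logical replication consumers
  local dbs
  dbs=$(count_databases) || return 1
  local slots=$((dbs + headroom))

  # Static parameter: must be applied with pending-reboot.
  aws rds modify-db-parameter-group \
    --db-parameter-group-name "$pg_name" \
    --parameters "ParameterName=max_replication_slots,ParameterValue=${slots},ApplyMethod=pending-reboot"
}
```

For example, `set_max_replication_slots <parameter-group-name> 2`. Make sure the result is not lower than the value currently in effect (`SHOW max_replication_slots;`), since other features on the primary may also use slots.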

Set the max_slot_wal_keep_size Value

This parameter caps the amount of WAL that replication slots can retain on the primary, so that WAL retention cannot exhaust the primary's disk space. Note the trade-off: if a slot falls further behind than this limit, PostgreSQL invalidates the slot and it can no longer be used.

As with wal_level above, modify the max_slot_wal_keep_size parameter in the same parameter group, choosing a value that cannot fill up the primary's disk.
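
One way to choose a value is to cap retained WAL at a fixed fraction of the instance's allocated storage. The sketch below assumes RDS for PostgreSQL 13 or later (the parameter does not exist in older versions), a custom parameter group already attached to the primary, and a 20% fraction that is purely illustrative. PostgreSQL interprets a unitless value for this setting as megabytes.

```shell
# Sketch: cap WAL retained by replication slots at a fraction of allocated storage.
# Assumptions: RDS for PostgreSQL 13+, value in MB, 20% default is illustrative only.
wal_keep_size_mb() {
  local allocated_gib="$1"
  local percent="${2:-20}"
  # GiB -> MB (1 GiB = 1024 MB), then take the chosen percentage.
  echo $(( allocated_gib * 1024 * percent / 100 ))
}

set_max_slot_wal_keep_size() {
  local pg_name="$1" instance_id="$2" percent="${3:-20}"
  local gib mb
  gib=$(aws rds describe-db-instances \
          --db-instance-identifier "$instance_id" \
          --query 'DBInstances[0].AllocatedStorage' --output text) || return 1
  mb=$(wal_keep_size_mb "$gib" "$percent")

  # Reloadable (dynamic) parameter, so it can be applied without a reboot.
  aws rds modify-db-parameter-group \
    --db-parameter-group-name "$pg_name" \
    --parameters "ParameterName=max_slot_wal_keep_size,ParameterValue=${mb},ApplyMethod=immediate"
}
```

With 100 GiB of allocated storage and the default fraction, this sets the limit to 20480 MB.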

Deployment Architecture

In production deployments, we launch an Ingestion EC2 instance, a Proxy EC2 instance, and one or more FDW EC2 instances. The ingestion instance mounts four EBS volumes and uses ZFS to combine them into a RAID10 layout. It exports the resulting file system to the FDW instances as a read-only NFSv4 share.
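
At the operating-system level, this storage layout corresponds to a ZFS pool built from two mirrored pairs (striped mirrors, i.e. RAID10) plus a read-only NFS export. The sketch below only illustrates that idea; it is not the provisioning code from the springtail repository. The device names (EBS volumes on Nitro instances appear as NVMe devices), the pool name `springtail`, the dataset `data`, and the client CIDR are hypothetical, and the commands must run as root on the ingestion instance.

```shell
# Sketch: RAID10-style ZFS pool on four EBS volumes, exported read-only over NFS.
# Hypothetical: device names, pool "springtail", dataset "data", client CIDR.
setup_ingestion_storage() {
  local client_cidr="$1"                              # e.g. the shard VPC CIDR
  local exports_file="${EXPORTS_FILE:-/etc/exports}"  # overridable for a dry run

  # Two mirrored pairs striped together: ZFS's equivalent of RAID10.
  zpool create -o ashift=12 springtail \
    mirror /dev/nvme1n1 /dev/nvme2n1 \
    mirror /dev/nvme3n1 /dev/nvme4n1 || return 1

  # The dataset is mounted at /springtail/data by default.
  zfs create springtail/data || return 1

  # Export it read-only to the FDW instances and reload the export table.
  printf '%s %s(ro,sync,no_subtree_check)\n' /springtail/data "$client_cidr" >> "$exports_file"
  exportfs -ra
}

# On each FDW instance (hypothetical host name and mount point):
#   mount -t nfs4 -o ro ingestion.internal:/springtail/data /mnt/springtail
```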

First, we create a VPC with shared resources such as a Redis cluster, security groups, and IAM roles. This VPC serves as the base for launching the EC2 instances. We call such a shared VPC a shard.

Important note:

  • To keep the examples simple, we store the OpenTofu state files locally. In a real production deployment, use a
    remote backend such as S3 with DynamoDB for state locking.
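
If you switch to a remote backend, the bucket and lock table have to exist before `tofu init`. A minimal sketch, assuming you add an empty `backend "s3" {}` block to the configuration (the examples use local state, so it is not there by default) and that the bucket, table, region and state key you pass are your own choices.

```shell
# Sketch: create an S3 state bucket and DynamoDB lock table, then point tofu at them.
# Assumes the configuration declares an empty backend "s3" {} block; all names are yours.
bootstrap_remote_state() {
  local bucket="$1" table="$2" region="$3" key="$4"

  # us-east-1 rejects an explicit LocationConstraint; other regions require one.
  if [ "$region" = "us-east-1" ]; then
    aws s3api create-bucket --bucket "$bucket" --region "$region" || return 1
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region" || return 1
  fi

  # Versioning lets you recover earlier state files.
  aws s3api put-bucket-versioning --bucket "$bucket" --region "$region" \
    --versioning-configuration Status=Enabled || return 1

  # The S3 backend's lock table needs a string hash key named LockID.
  aws dynamodb create-table --table-name "$table" --region "$region" \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST || return 1
  aws dynamodb wait table-exists --table-name "$table" --region "$region" || return 1

  # Partial backend configuration supplied on the command line.
  tofu init \
    -backend-config="bucket=$bucket" \
    -backend-config="key=$key" \
    -backend-config="region=$region" \
    -backend-config="dynamodb_table=$table"
}
```

Use a distinct `key` per example directory (for example `shard/terraform.tfstate`) so the shard and instance deployments do not overwrite each other's state.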

There are two flavors of deployment depending on the networking topology.

Scenario 1: Private Primary Database

In this scenario, the customer's primary database runs inside an AWS VPC and has no public IP addresses. For this scenario, we also expose a private endpoint to the customer using AWS PrivateLink.

Architecture Diagram 1

Note: We use a Transit VPC to peer with the customer's VPC, which keeps the requirements on the customer's AWS account minimal: the customer only needs to accept the VPC peering request and configure the routes. A PrivateLink endpoint between the Transit VPC and the shard VPC then lets the Ingestion and Proxy nodes reach the customer's database. If preferred, you can instead launch the NLB and VPC Endpoint Service directly in the customer's VPC.

Scenario 2: Public Primary Database

In this scenario, the customer's primary database has public IP addresses, as can be the case with RDS or Supabase. In this case, we expose the proxy node through a public NLB.

Architecture Diagram 2

Deployment Steps

The example deployments are located in the deployments/examples directory of the main springtail repository.

1. Building a Shard

Go to the deployments/examples/aws/shard directory and edit the variables.tfvars file to set the appropriate variables, such as the AWS region and VPC CIDR. Then run:

$ tofu init
$ tofu plan -var-file variables.tfvars -out /tmp/plan.out
$ tofu apply /tmp/plan.out

Take note of the outputs. Once the shard is built, we can proceed to launch the DB instances.
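
Since the same three commands are repeated for every example directory, you may want to wrap them and save the outputs to a file instead of copying them by hand. A minimal sketch using OpenTofu's global `-chdir` option and `tofu output -json`; the function name and the output file name are hypothetical.

```shell
# Sketch: run init/plan/apply in an example directory and save its outputs as JSON.
deploy_example() {
  local dir="$1"           # e.g. deployments/examples/aws/shard
  local outputs_file="$2"  # where to save the outputs (hypothetical name)

  # -chdir makes relative paths such as variables.tfvars resolve inside $dir.
  tofu -chdir="$dir" init || return 1
  tofu -chdir="$dir" plan -var-file variables.tfvars -out /tmp/plan.out || return 1
  tofu -chdir="$dir" apply /tmp/plan.out || return 1

  # The redirect is handled by the shell, so outputs_file is relative to
  # the directory you call the function from, not to $dir.
  tofu -chdir="$dir" output -json > "$outputs_file"
}
```

For example, `deploy_example deployments/examples/aws/shard shard-outputs.json`. Individual values can then be read with `jq -r '.<output_name>.value' shard-outputs.json`, where `<output_name>` is one of the names in the file.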

2. Launching the DB Instances

After the shard is built, we can launch the EC2 instances for the Ingestion, Proxy, and FDW nodes. Depending on the networking topology of the customer's AWS account, there are two deployment scenarios. Here we demonstrate how to build a PrivateLink-based connection to the customer's private database, as illustrated in Scenario 1. For more information about PrivateLink, refer to the AWS PrivateLink documentation.

2.1 Launch the network, PrivateLink components and instances

Go to the deployments/examples/aws/inb-pl-outb-pl directory and edit the variables.tfvars file, filling in the values described in the file's comments. Then run:

$ tofu init
$ tofu plan -var-file variables.tfvars -out /tmp/plan.out
$ tofu apply /tmp/plan.out

After the deployment is complete, take note of the outputs.

The example here assumes that the customer's database is private and runs in a different AWS account. We use a Transit VPC to peer with the customer's VPC and use PrivateLink to connect our ingestion instances to the NLB in the Transit VPC. From this point on, the VPC Endpoint represents the customer's database inside our shard VPC.
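
If you need the endpoint's DNS name later (for example for the verification step), you can read it from the AWS API as well as from the OpenTofu outputs. A minimal sketch; the endpoint ID is a placeholder taken from the outputs or the VPC console. The first DNS entry returned is normally the regional name, and the remaining entries are zonal.

```shell
# Sketch: print the DNS name of an interface VPC endpoint.
# $1 is the endpoint ID (vpce-...), taken from the OpenTofu outputs.
vpc_endpoint_dns() {
  aws ec2 describe-vpc-endpoints \
    --vpc-endpoint-ids "$1" \
    --query 'VpcEndpoints[0].DnsEntries[0].DnsName' \
    --output text
}
```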

2.2 Customer Side Setup

The customer needs to perform the following steps to complete the setup:

  1. Accept the VPC Peering request from the Transit VPC.
  2. Add routes to the route tables used by the database subnets so that traffic to the Transit VPC CIDR goes through the VPC peering connection.
  3. Add security group rules to allow traffic from the Transit VPC CIDR to the database port.

Please refer to Customer Actions for more details on how to perform these steps.
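
For reference, the three customer-side steps map onto standard AWS CLI calls run in the customer's account. A minimal sketch; every ID and CIDR is a placeholder supplied by the customer (peering connection, route table, database security group), and the default port of 5432 is an assumption.

```shell
# Sketch: customer-side setup for the Transit VPC peering (run in the customer's account).
# All IDs and CIDRs are placeholders supplied by the customer.
customer_accept_peering() {
  local pcx_id="$1"           # VPC peering connection ID (pcx-...)
  local route_table_id="$2"   # route table used by the database subnets
  local transit_cidr="$3"     # CIDR of the Transit VPC
  local db_sg_id="$4"         # security group attached to the database
  local db_port="${5:-5432}"  # PostgreSQL default port (assumed)

  # 1. Accept the peering request from the Transit VPC.
  aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id "$pcx_id" || return 1

  # 2. Route traffic for the Transit VPC CIDR through the peering connection.
  aws ec2 create-route \
    --route-table-id "$route_table_id" \
    --destination-cidr-block "$transit_cidr" \
    --vpc-peering-connection-id "$pcx_id" || return 1

  # 3. Allow the Transit VPC CIDR to reach the database port.
  aws ec2 authorize-security-group-ingress \
    --group-id "$db_sg_id" \
    --protocol tcp --port "$db_port" \
    --cidr "$transit_cidr"
}
```

If the database subnets use more than one route table, repeat step 2 for each of them.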

Once the customer-side actions are complete, wait 30 to 60 seconds for the Watcher Lambda function to detect the primary database address and register it with the NLB (PrivateLink).
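
Instead of waiting a fixed time, you can poll the NLB's target group until a target reports healthy. A minimal sketch, assuming you know the ARN of the target group that receives the database address (a placeholder here; take it from the OpenTofu outputs or the EC2 console). The timeout and poll interval are arbitrary.

```shell
# Sketch: wait until the NLB target group behind the PrivateLink service has a healthy target.
# $1: target group ARN (placeholder); $2: timeout in seconds (default 300, arbitrary).
wait_for_healthy_target() {
  local tg_arn="$1"
  local timeout="${2:-300}"
  local interval="${POLL_INTERVAL:-10}"
  local waited=0 states s

  while [ "$waited" -lt "$timeout" ]; do
    # One state per registered target, e.g. "initial", "healthy", "unhealthy".
    states=$(aws elbv2 describe-target-health \
      --target-group-arn "$tg_arn" \
      --query 'TargetHealthDescriptions[].TargetHealth.State' \
      --output text) || return 1
    for s in $states; do
      if [ "$s" = "healthy" ]; then echo "target healthy"; return 0; fi
    done
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s (states: ${states:-none})" >&2
  return 1
}
```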

3. Verification

After the deployment is complete, verify the setup by logging into the ingestion EC2 instance with AWS Systems Manager Session Manager (this requires the Session Manager plugin for the AWS CLI):

$ aws ssm start-session --target {{ingestion_instance_id}}

Then use the VPC Endpoint domain name (as noted from the OpenTofu output) as the primary database hostname to connect to the database using psql.

$ psql -h {{vpc_endpoint_dns}} -U {{db_username}} -d postgres

If the connection succeeds, the deployment of the network topology is complete.
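
While connected, it is also worth confirming that the parameter group changes from "Before You Begin" are in effect on the primary. A minimal sketch, assuming the same host and user placeholders as the psql command above and a password supplied through `PGPASSWORD` or `~/.pgpass`; it only reads settings.

```shell
# Sketch: read-only check of the replication settings on the primary.
# $1: VPC endpoint DNS name, $2: database user (both placeholders).
check_primary_settings() {
  local host="$1" user="$2" wal_level name
  wal_level=$(psql -h "$host" -U "$user" -d postgres -At -c "SHOW wal_level") || return 1
  echo "wal_level=$wal_level"
  for name in max_replication_slots max_slot_wal_keep_size; do
    echo "$name=$(psql -h "$host" -U "$user" -d postgres -At -c "SHOW $name")"
  done
  # The setup requires wal_level=logical; any other value is a failure.
  [ "$wal_level" = "logical" ]
}
```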

4. Prepare the Environment

After the verification is successful, you can proceed to set environment variables, populate the Redis cache and start the services.

The Springtail service can read its settings either from a JSON settings file or from Redis. For this deployment, you can use a settings file; prod.system.settings.json in the springtail repository is an example.

Point the SPRINGTAIL_PROPERTIES_FILE environment variable at the settings file:

$ export SPRINGTAIL_PROPERTIES_FILE=/path/to/settings-prod.json
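
A malformed settings file is easier to catch before the service starts. A minimal sketch; the function name is hypothetical, and the JSON check runs only if `python3` is installed (its standard-library `json.tool` module serves as the validator).

```shell
# Sketch: validate a settings file and point SPRINGTAIL_PROPERTIES_FILE at it.
# Call it in your current shell so the export persists.
use_settings_file() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "settings file not readable: $file" >&2
    return 1
  fi
  # Optional syntax check; skipped when python3 is not installed.
  if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$file" >/dev/null || { echo "invalid JSON: $file" >&2; return 1; }
  fi
  # Store an absolute path so the variable stays valid after a cd.
  SPRINGTAIL_PROPERTIES_FILE=$(cd "$(dirname "$file")" && pwd)/$(basename "$file")
  export SPRINGTAIL_PROPERTIES_FILE
}
```

For example, `use_settings_file /path/to/settings-prod.json`.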

Next, download the Springtail service package and start the ingestion service.

Repeat these steps to start the services on the Proxy and FDW instances.
