splitter.cpp is the code to split each individual table into multiple equal-sized files.
uploader.cpp is the code to download the partitioned data from S3 and upload the data to Redis.
- Create a security group for your redis cluster. In the security group, create an inbound rule to allow TCP connections from any host to port 6379. The detailed steps are as follows:
- In Amazon EC2, select
Security Groupson the left panel, and click onCreate security groupon the right. Let the name to beredis-sgand the description to besecurity group for redis. - The most important thing here's to create an inbound rule in the setting of that security group.
- The contents of the inbound rule: Custom TCP, port 6379; for
source, choosecustomand type0.0.0.0/0, which stands for any possible host. - This rule makes sure that all TCP connections to your redis cluster (which uses port 6379 by default) will not be rejected.
- In Amazon EC2, select
- In Amazon Elasticache, select
Redis clusterson the left panel, and click onCreate Redis clusteron the right. - In
Choose a cluster creation method, chooseConfigure and create a new cluster, the one in the middle. Disable cluster mode. Type any name and description you like. - In
Cluster settings, choose a machine with sufficient memory. For the 10GB dataset (i.e., the fast evaluation), we usecache.r7g.2xlargewith ~50 GB memory. For the 100GB dataset (i.e., the evaluation in our paper), we usecache.r5.4xlargewith ~100 GB memory. The port number should be 6379. - In
Connectivity, create a new subnet group. Choose the default VPC or create a new one. Ensure that the VPC shares the same subnet prefix as the EC2 instances you are running. Leave the rest options as default and click onNextin the bottom. - For
Security, choose the security group (e.g., namedredis-sg) you created in the step 0. Leave the rest as default and create the cluster. After a few minutes, the cluster will be ready to use.
The hostname of your redis server can be acquired by clicking on the cluster name in the Redis clusters page. It is the value of primary endpoint. Typically, the hostname is composed of endpoint url and port number, i.e., endpoint:port.