Skip to content

davidrochabio/terraform-aws-lambda-s3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data transformation with lambda function and S3 buckets using IaC (Terraform)

Archit

Prerequisites

  • Git
  • An AWS account.
  • AWS CLI configured with your AWS credentials.
  • Terraform installed on your local machine.
  • Docker to create lambda layer. If you want to create a layer without Docker, check AWS docs.

The requirements for the layer are inside the 'layer_requirements.txt' file. The layer should be placed in the same directory as 'main.tf' file and should be named 'pandasrequests_layer.zip'.

Data and Goal

The data contains information about bank accounts. The bank supports accounts in different currencies. -> The goal is to clean the data and transform ammounts in different currencies to a common currency - canadian dollar in this example.

The function makes use of a currency convertion rate API to dinamically get the rates for the unique currencies present in the dataset.

API reference

Lambda Function

The function listens to an S3 bucket for new files. When a new file is added to the bucket, the bucket triggers the lambda function. The Lambda function checks if the file name is 'banking_dirty.csv'. If the name matches, the function reads the file from the bucket and performs the following operations:

  • Normalizes column names to lower case
  • Normalizes the date columns to the desired format
  • Adds a column with the exchange rate for each account
  • Adds a column with the account amount converted to the desired common currency
  • Adds a column with the load date and time
  • Writes the transformed dataframe to another S3 bucket in csv format

Terraform and AWS resources

Terraform creates the following infrastructure in AWS:

  • Two s3 buckets
  • lambda function (Python 3.11)
  • lambda function layer version with pandas and requests
  • lambda function role and security policies
  • trigger from s3
  • notification

Usage

  • Clone repo and enter repo folder:
git clone https://github.com/davidrochabio/Terraform_AWS_Lambda_S3.git

cd Terraform_AWS_Lambda_S3
  • Create lambda layer with pandas and requests using docker and provided dockerfile:
docker build -t layer_image -f ./Dockerfile-layer .

docker run -dit --name layer_container layer_image /bin/bash

docker cp layer_container:/app/pandasrequests_layer.zip .

docker rm -f layer_container

docker rmi layer_image
  • Initialize Terraform:
terraform init
  • Validate main.tf and check plan:
terraform validate

terraform plan
  • Create resources in AWS:
terraform apply

PS: Terraform might throw an error if bucket names are already used in AWS. If that's the case, change bucket names in main.tf and in the lambda function.

  • Send file to s3 bucket
aws s3 cp ./banking_dirty.csv s3://input-banking-dirty/banking_dirty.csv
  • Check CloudWatch logs to see execution.

  • Destroy resources:

terraform destroy

About

Data transformation with lambda function and S3 buckets using IaC (Terraform)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors