diff --git a/s3files-lambda-sam/README.md b/s3files-lambda-sam/README.md
new file mode 100644
index 000000000..a56edb7df
--- /dev/null
+++ b/s3files-lambda-sam/README.md
@@ -0,0 +1,150 @@
+# Mount an S3 Bucket as a File System on AWS Lambda using Amazon S3 Files
+
+This pattern mounts an Amazon S3 bucket as a file system on an AWS Lambda function using **Amazon S3 Files**, then reads CSV files with **pandas** using standard Python file I/O.
+
+Learn more about this pattern at Serverless Land Patterns: https://serverlessland.com/patterns/s3files-lambda-sam
+
+Important: this application uses various AWS services and there are costs associated with these services after the Free Tier usage - please see the [AWS Pricing page](https://aws.amazon.com/pricing/) for details. You are responsible for any AWS costs incurred. No warranty is implied in this example.
+
+## Requirements
+
+* [Create an AWS account](https://portal.aws.amazon.com/gp/aws/developer/registration/index.html) if you do not already have one and log in. The IAM user that you use must have sufficient permissions to make necessary AWS service calls and manage AWS resources.
+* [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) installed and configured
+* [Git Installed](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
+* [AWS Serverless Application Model](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) (AWS SAM) installed
+* [Python 3.13](https://www.python.org/downloads/) installed
+
+## How it works
+
+1. An S3 bucket is linked to an **S3 file system** (Amazon S3 Files), providing full POSIX file system semantics over S3 data.
+2. A **mount target** is created in a private subnet, giving the Lambda function NFS access to the file system.
+3. The Lambda function is configured with `FileSystemConfigs` pointing to an **access point** on the file system, mounting the S3 bucket at `/mnt/s3data`.
+4. When invoked, Lambda reads a CSV file from `/mnt/s3data/input/` using `pandas.read_csv()` — a standard file path, no boto3 required.
+5. It returns the row count, column names, and a preview of the first 5 rows as JSON.
+
+## Build Instructions
+
+Build the pandas Lambda layer targeting Linux x86_64 (Lambda's runtime), then remove pyarrow to stay within Lambda's 250MB unzipped layer limit:
+
+```bash
+pip install pandas \
+  --platform manylinux2014_x86_64 \
+  --target layer/python/ \
+  --implementation cp \
+  --python-version 3.13 \
+  --only-binary=:all:
+
+# Remove pyarrow if present (not needed for CSV reads, exceeds layer size limit)
+rm -rf layer/python/pyarrow
+
+sam build
+```
+
+> The `--platform` flag ensures Linux-compatible wheels are downloaded regardless of your local OS (macOS, Windows, or Linux).
+
+## Deployment Instructions
+
+1. Clone the repository:
+    ```bash
+    git clone https://github.com/aws-samples/serverless-patterns
+    ```
+
+2. Change to the pattern directory:
+    ```bash
+    cd serverless-patterns/s3files-lambda-sam
+    ```
+
+3. Build (see Build Instructions above).
+
+4. Deploy:
+    ```bash
+    sam deploy --guided
+    ```
+
+5. During the prompts:
+    * Enter a stack name
+    * Enter the desired AWS Region
+    * Accept the default parameter values or adjust VPC CIDRs if needed
+    * Allow SAM CLI to create IAM roles with the required permissions
+
+    Once you have run `sam deploy --guided` and saved arguments to `samconfig.toml`, you can use `sam deploy` for future deployments.
+
+6. Note the outputs — you will need `DataBucketName` and `S3FilesReaderFunctionName` for testing.
+
+## Testing
+
+### Unit tests (no AWS required)
+
+```bash
+python3 -m venv .venv
+source .venv/bin/activate  # Windows: .venv\Scripts\activate
+pip install pytest pandas
+pytest src/tests/ -v
+```
+
+### Integration test against a deployed stack
+
+#### 1. Upload the sample CSV
+
+```bash
+BUCKET=$(aws cloudformation describe-stacks \
+  --stack-name STACK_NAME \
+  --query "Stacks[0].Outputs[?OutputKey=='DataBucketName'].OutputValue" \
+  --output text)
+
+aws s3 cp src/tests/sample_sales.csv s3://$BUCKET/lambda/input/sample_sales.csv
+```
+
+#### 2. Invoke the Lambda function
+
+```bash
+FUNCTION=$(aws cloudformation describe-stacks \
+  --stack-name STACK_NAME \
+  --query "Stacks[0].Outputs[?OutputKey=='S3FilesReaderFunctionName'].OutputValue" \
+  --output text)
+
+aws lambda invoke \
+  --function-name $FUNCTION \
+  --payload '{"file": "input/sample_sales.csv"}' \
+  --cli-binary-format raw-in-base64-out \
+  response.json
+
+cat response.json
+```
+
+Expected response:
+
+```json
+{
+  "statusCode": 200,
+  "body": {
+    "file": "/mnt/s3data/input/sample_sales.csv",
+    "rows": 10,
+    "columns": ["region", "revenue", "units"],
+    "preview": [...]
+  }
+}
+```
+
+#### 3. Check Lambda logs
+
+```bash
+sam logs --stack-name STACK_NAME --tail
+```
+
+## Cleanup
+
+1. Delete the stack:
+    ```bash
+    aws cloudformation delete-stack --stack-name STACK_NAME
+    ```
+
+2. Confirm the stack has been deleted:
+    ```bash
+    aws cloudformation list-stacks --query "StackSummaries[?contains(StackName,'STACK_NAME')].StackStatus"
+    ```
+
+----
+Copyright 2026 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+
+SPDX-License-Identifier: MIT-0
diff --git a/s3files-lambda-sam/example-pattern.json b/s3files-lambda-sam/example-pattern.json
new file mode 100644
index 000000000..ee3e3bbe9
--- /dev/null
+++ b/s3files-lambda-sam/example-pattern.json
@@ -0,0 +1,69 @@
+{
+  "title": "Mount an S3 Bucket as a File System on AWS Lambda using Amazon S3 Files",
+  "description": "This pattern mounts an Amazon S3 bucket as a file system on an AWS Lambda function using Amazon S3 Files, enabling standard Python file I/O to read S3 data — no S3 SDK calls required.",
+  "language": "Python",
+  "level": "200",
+  "framework": "AWS SAM",
+  "introBox": {
+    "headline": "How it works",
+    "text": [
+      "Amazon S3 Files (launched April 2026) is a shared file system built on Amazon EFS that provides full POSIX file system semantics over your S3 data. It lets file-based applications, agents, and tools work directly with S3 data without duplicating it or learning new APIs.",
+      "This pattern links an S3 bucket to an S3 file system and mounts it on a Lambda function at /mnt/s3data. When invoked, the Lambda function reads a CSV file from the mount path using pandas.read_csv() and returns the row count, column names, and a preview of the first 5 rows — all through standard Python file I/O, with no boto3 S3 calls.",
+      "This pattern deploys one S3 bucket, one S3 file system, one mount target, one S3 Files access point, one Lambda function with a pandas layer, and a VPC with a private subnet and VPC endpoints for S3."
+    ]
+  },
+  "gitHub": {
+    "template": {
+      "repoURL": "https://github.com/aws-samples/serverless-patterns/tree/main/s3files-lambda-sam",
+      "templateURL": "serverless-patterns/s3files-lambda-sam",
+      "projectFolder": "s3files-lambda-sam",
+      "templateFile": "template.yaml"
+    }
+  },
+  "resources": {
+    "bullets": [
+      {
+        "text": "Amazon S3 Files — product page",
+        "link": "https://aws.amazon.com/s3/features/files/"
+      },
+      {
+        "text": "Working with Amazon S3 Files — documentation",
+        "link": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files.html"
+      },
+      {
+        "text": "Mounting S3 file systems on AWS Lambda functions",
+        "link": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-files-mounting-lambda.html"
+      },
+      {
+        "text": "Configuring file system access for Lambda functions",
+        "link": "https://docs.aws.amazon.com/lambda/latest/dg/configuration-filesystem.html"
+      }
+    ]
+  },
+  "deploy": {
+    "text": [
+      "pip install pandas --platform manylinux2014_x86_64 --target layer/python/ --implementation cp --python-version 3.13 --only-binary=:all:",
+      "rm -rf layer/python/pyarrow",
+      "sam build",
+      "sam deploy --guided"
+    ]
+  },
+  "testing": {
+    "text": [
+      "See the GitHub repo for detailed testing instructions."
+    ]
+  },
+  "cleanup": {
+    "text": [
+      "Delete the stack: aws cloudformation delete-stack --stack-name STACK_NAME."
+    ]
+  },
+  "authors": [
+    {
+      "name": "Serda Kasaci Yildirim",
+      "image": "https://drive.google.com/file/d/1rzVS1hrIMdqy6P9i7-o7OBLNc0xY0FVB/view?usp=sharing",
+      "bio": "Serda is a Solutions Architect at Amazon Web Services based in Vienna, focused on serverless technologies, event-driven architecture, and application modernization.",
+      "linkedin": "serdakasaci"
+    }
+  ]
+}
diff --git a/s3files-lambda-sam/requirements-dev.txt b/s3files-lambda-sam/requirements-dev.txt
new file mode 100644
index 000000000..1f221627a
--- /dev/null
+++ b/s3files-lambda-sam/requirements-dev.txt
@@ -0,0 +1,3 @@
+pytest>=8.0
+pandas>=2.0
+pyarrow>=14.0
diff --git a/s3files-lambda-sam/src/handler.py b/s3files-lambda-sam/src/handler.py
new file mode 100644
index 000000000..75619e265
--- /dev/null
+++ b/s3files-lambda-sam/src/handler.py
@@ -0,0 +1,51 @@
+"""
+Amazon S3 Files + AWS Lambda
+
+Demonstrates accessing an S3 bucket as a file system from Lambda using
+Amazon S3 Files. Files are read via standard Python file I/O at the
+local mount path — no boto3 S3 calls needed.
+
+The S3 bucket is mounted at /mnt/s3data via the S3 Files file system.
+"""
+
+import json
+import logging
+import os
+
+import pandas as pd
+
+logger = logging.getLogger()
+logger.setLevel(os.environ.get("LOG_LEVEL", "INFO"))
+
+MOUNT_PATH = os.environ.get("MOUNT_PATH", "/mnt/s3data")
+
+
+def lambda_handler(event, context):
+    """
+    Expected event:
+        {
+            "file": "input/sample_sales.csv"  # path relative to the S3 Files mount root
+        }
+    """
+    file_key = event.get("file")
+    if not file_key:
+        return {"statusCode": 400, "body": {"error": "Missing required field: 'file'"}}
+
+    input_path = os.path.join(MOUNT_PATH, file_key)
+
+    if not os.path.exists(input_path):
+        return {"statusCode": 404, "body": {"error": f"File not found: {input_path}"}}
+
+    # Read the CSV directly from the S3 Files mount using standard file I/O
+    df = pd.read_csv(input_path)
+    logger.info("Read %s: %d rows, %d columns", input_path, len(df), len(df.columns))
+
+    return {
+        "statusCode": 200,
+        "body": {
+            "file": input_path,
+            "rows": len(df),
+            "columns": list(df.columns),
+            "preview": df.head(5).to_dict(orient="records"),
+        },
+    }
diff --git a/s3files-lambda-sam/src/tests/__init__.py b/s3files-lambda-sam/src/tests/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/s3files-lambda-sam/src/tests/sample_sales.csv b/s3files-lambda-sam/src/tests/sample_sales.csv
new file mode 100644
index 000000000..a9a6133a4
--- /dev/null
+++ b/s3files-lambda-sam/src/tests/sample_sales.csv
@@ -0,0 +1,11 @@
+region,revenue,units
+North,1000,10
+South,2000,20
+North,500,5
+East,750,8
+South,300,3
+West,1200,15
+East,900,12
+West,450,6
+North,800,9
+South,1100,11
diff --git a/s3files-lambda-sam/src/tests/test_handler.py b/s3files-lambda-sam/src/tests/test_handler.py
new file mode 100644
index 000000000..6d39e8998
--- /dev/null
+++ b/s3files-lambda-sam/src/tests/test_handler.py
@@ -0,0 +1,136 @@
+"""
+Unit tests for handler.py
+
+The S3 Files mount is simulated using pytest's tmp_path fixture,
+so no real AWS infrastructure is needed to run these tests.
+
+Run:
+    pip install pytest pandas
+    pytest src/tests/ -v
+"""
+
+import os
+import sys
+
+import pytest
+
+# Make src/ importable
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+SALES_CSV = """\
+region,revenue,units
+North,1000,10
+South,2000,20
+North,500,5
+East,750,8
+South,300,3
+"""
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+@pytest.fixture(autouse=True)
+def patch_mount(tmp_path, monkeypatch):
+    """Point MOUNT_PATH at a temp directory and reset the module constant."""
+    import handler
+    mount_dir = tmp_path / "s3data"
+    mount_dir.mkdir()
+    monkeypatch.setenv("MOUNT_PATH", str(mount_dir))
+    handler.MOUNT_PATH = str(mount_dir)
+    return mount_dir
+
+
+@pytest.fixture()
+def mount(tmp_path):
+    return tmp_path / "s3data"
+
+
+@pytest.fixture()
+def sales_file(mount):
+    """Write a standard sales CSV into the mock mount's input/ directory."""
+    csv_path = mount / "input" / "sales.csv"
+    csv_path.parent.mkdir(parents=True, exist_ok=True)
+    csv_path.write_text(SALES_CSV)
+    return mount
+
+
+# ---------------------------------------------------------------------------
+# Happy path
+# ---------------------------------------------------------------------------
+
+class TestHappyPath:
+
+    def test_returns_200(self, sales_file):
+        import handler
+        handler.MOUNT_PATH = str(sales_file)
+        result = handler.lambda_handler({"file": "input/sales.csv"}, {})
+        assert result["statusCode"] == 200
+
+    def test_row_count_is_correct(self, sales_file):
+        import handler
+        handler.MOUNT_PATH = str(sales_file)
+        result = handler.lambda_handler({"file": "input/sales.csv"}, {})
+        assert result["body"]["rows"] == 5
+
+    def test_columns_are_returned(self, sales_file):
+        import handler
+        handler.MOUNT_PATH = str(sales_file)
+        result = handler.lambda_handler({"file": "input/sales.csv"}, {})
+        assert result["body"]["columns"] == ["region", "revenue", "units"]
+
+    def test_preview_contains_up_to_5_rows(self, sales_file):
+        import handler
+        handler.MOUNT_PATH = str(sales_file)
+        result = handler.lambda_handler({"file": "input/sales.csv"}, {})
+        assert len(result["body"]["preview"]) == 5
+
+    def test_preview_capped_at_5_for_large_file(self, mount):
+        """Files with more than 5 rows should still return only 5 in preview."""
+        import handler
+        handler.MOUNT_PATH = str(mount)
+        csv_path = mount / "input" / "big.csv"
+        csv_path.parent.mkdir(parents=True, exist_ok=True)
+        rows = "id,value\n" + "\n".join(f"{i},{i*10}" for i in range(20))
+        csv_path.write_text(rows)
+        result = handler.lambda_handler({"file": "input/big.csv"}, {})
+        assert len(result["body"]["preview"]) == 5
+
+    def test_file_path_in_response(self, sales_file):
+        import handler
+        handler.MOUNT_PATH = str(sales_file)
+        result = handler.lambda_handler({"file": "input/sales.csv"}, {})
+        assert "input/sales.csv" in result["body"]["file"]
+
+
+# ---------------------------------------------------------------------------
+# Input validation
+# ---------------------------------------------------------------------------
+
+class TestInputValidation:
+
+    def test_missing_file_field_returns_400(self, mount):
+        import handler
+        handler.MOUNT_PATH = str(mount)
+        result = handler.lambda_handler({"other_field": "value"}, {})
+        assert result["statusCode"] == 400
+        assert "file" in result["body"]["error"].lower()
+
+    def test_empty_event_returns_400(self, mount):
+        import handler
+        handler.MOUNT_PATH = str(mount)
+        result = handler.lambda_handler({}, {})
+        assert result["statusCode"] == 400
+
+    def test_file_not_found_returns_404(self, mount):
+        import handler
+        handler.MOUNT_PATH = str(mount)
+        result = handler.lambda_handler({"file": "input/missing.csv"}, {})
+        assert result["statusCode"] == 404
+
+    def test_404_error_message_contains_path(self, mount):
+        import handler
+        handler.MOUNT_PATH = str(mount)
+        result = handler.lambda_handler({"file": "input/missing.csv"}, {})
+        assert "missing.csv" in result["body"]["error"]
diff --git a/s3files-lambda-sam/template.yaml b/s3files-lambda-sam/template.yaml
new file mode 100644
index 000000000..813588ff0
--- /dev/null
+++ b/s3files-lambda-sam/template.yaml
@@ -0,0 +1,292 @@
+AWSTemplateFormatVersion: '2010-09-09'
+Transform: AWS::Serverless-2016-10-31
+Description: >
+  Mount an S3 bucket as a file system on an AWS Lambda function using Amazon
+  S3 Files, and read CSV files with pandas using standard file I/O at /mnt/s3data.
+
+Globals:
+  Function:
+    Runtime: python3.13
+    Architectures:
+      - x86_64
+    Timeout: 60
+    MemorySize: 512
+    Environment:
+      Variables:
+        MOUNT_PATH: /mnt/s3data
+        LOG_LEVEL: INFO
+
+Parameters:
+  VpcCidr:
+    Type: String
+    Default: 10.0.0.0/16
+    Description: CIDR block for the VPC
+
+  PrivateSubnetCidr:
+    Type: String
+    Default: 10.0.1.0/24
+    Description: CIDR for the private subnet (Lambda and mount target)
+
+Resources:
+
+  # -----------------------------------------------------------------------
+  # VPC
+  # -----------------------------------------------------------------------
+  VPC:
+    Type: AWS::EC2::VPC
+    Properties:
+      CidrBlock: !Ref VpcCidr
+      EnableDnsSupport: true
+      EnableDnsHostnames: true
+      Tags:
+        - Key: Name
+          Value: !Sub "${AWS::StackName}-vpc"
+
+  PrivateSubnet:
+    Type: AWS::EC2::Subnet
+    Properties:
+      VpcId: !Ref VPC
+      CidrBlock: !Ref PrivateSubnetCidr
+      AvailabilityZone: !Select ["0", !GetAZs ""]
+      Tags:
+        - Key: Name
+          Value: !Sub "${AWS::StackName}-private-subnet"
+
+  PrivateRouteTable:
+    Type: AWS::EC2::RouteTable
+    Properties:
+      VpcId: !Ref VPC
+
+  PrivateSubnetRouteTableAssociation:
+    Type: AWS::EC2::SubnetRouteTableAssociation
+    Properties:
+      SubnetId: !Ref PrivateSubnet
+      RouteTableId: !Ref PrivateRouteTable
+
+  # -----------------------------------------------------------------------
+  # Security Groups
+  # -----------------------------------------------------------------------
+  LambdaSecurityGroup:
+    Type: AWS::EC2::SecurityGroup
+    Properties:
+      GroupDescription: Security group for Lambda function
+      VpcId: !Ref VPC
+      SecurityGroupEgress:
+        - IpProtocol: tcp
+          FromPort: 443
+          ToPort: 443
+          CidrIp: !Ref VpcCidr
+          Description: HTTPS outbound to VPC endpoints
+        - IpProtocol: tcp
+          FromPort: 2049
+          ToPort: 2049
+          CidrIp: !Ref VpcCidr
+          Description: NFS outbound to S3 Files mount target
+      Tags:
+        - Key: Name
+          Value: !Sub "${AWS::StackName}-lambda-sg"
+
+  MountTargetSecurityGroup:
+    Type: AWS::EC2::SecurityGroup
+    Properties:
+      GroupDescription: Security group for S3 Files mount target
+      VpcId: !Ref VPC
+      SecurityGroupIngress:
+        - IpProtocol: tcp
+          FromPort: 2049
+          ToPort: 2049
+          SourceSecurityGroupId: !Ref LambdaSecurityGroup
+          Description: NFS inbound from Lambda
+      Tags:
+        - Key: Name
+          Value: !Sub "${AWS::StackName}-mount-target-sg"
+
+  # -----------------------------------------------------------------------
+  # VPC Endpoints — keep all traffic on the AWS private network
+  # -----------------------------------------------------------------------
+
+  # S3 Gateway Endpoint — free, routes S3 API traffic via the route table
+  S3GatewayEndpoint:
+    Type: AWS::EC2::VPCEndpoint
+    Properties:
+      VpcId: !Ref VPC
+      ServiceName: !Sub "com.amazonaws.${AWS::Region}.s3"
+      VpcEndpointType: Gateway
+      RouteTableIds:
+        - !Ref PrivateRouteTable
+
+  # S3 Files Interface Endpoint — for NFS mount operations
+  S3FilesInterfaceEndpoint:
+    Type: AWS::EC2::VPCEndpoint
+    Properties:
+      VpcId: !Ref VPC
+      ServiceName: !Sub "com.amazonaws.${AWS::Region}.s3-outposts"
+      VpcEndpointType: Interface
+      SubnetIds:
+        - !Ref PrivateSubnet
+      SecurityGroupIds:
+        - !Ref LambdaSecurityGroup
+      PrivateDnsEnabled: true
+
+  # -----------------------------------------------------------------------
+  # S3 Bucket
+  # -----------------------------------------------------------------------
+  DataBucket:
+    Type: AWS::S3::Bucket
+    Properties:
+      VersioningConfiguration:
+        Status: Enabled
+      Tags:
+        - Key: Pattern
+          Value: s3files-lambda
+
+  # -----------------------------------------------------------------------
+  # IAM Role — grants S3 Files service access to the S3 bucket
+  # -----------------------------------------------------------------------
+  S3FilesRole:
+    Type: AWS::IAM::Role
+    Properties:
+      AssumeRolePolicyDocument:
+        Version: "2012-10-17"
+        Statement:
+          - Sid: AllowS3FilesAssumeRole
+            Effect: Allow
+            Principal:
+              Service: elasticfilesystem.amazonaws.com
+            Action: sts:AssumeRole
+            Condition:
+              StringEquals:
+                aws:SourceAccount: !Ref AWS::AccountId
+              ArnLike:
+                aws:SourceArn: !Sub "arn:aws:s3files:${AWS::Region}:${AWS::AccountId}:file-system/*"
+      Policies:
+        - PolicyName: S3FilesAccess
+          PolicyDocument:
+            Version: "2012-10-17"
+            Statement:
+              - Effect: Allow
+                Action:
+                  - s3:GetObject
+                  - s3:PutObject
+                  - s3:DeleteObject
+                  - s3:ListBucket
+                  - s3:GetBucketLocation
+                  - s3:GetBucketVersioning
+                  - s3:ListBucketVersions
+                  - s3:GetObjectVersion
+                  - s3:DeleteObjectVersion
+                Resource:
+                  - !GetAtt DataBucket.Arn
+                  - !Sub "${DataBucket.Arn}/*"
+
+  # -----------------------------------------------------------------------
+  # S3 Files resources
+  # -----------------------------------------------------------------------
+  S3FileSystem:
+    Type: AWS::S3Files::FileSystem
+    Properties:
+      Bucket: !GetAtt DataBucket.Arn
+      RoleArn: !GetAtt S3FilesRole.Arn
+      Tags:
+        - Key: Pattern
+          Value: s3files-lambda
+
+  MountTarget:
+    Type: AWS::S3Files::MountTarget
+    Properties:
+      FileSystemId: !GetAtt S3FileSystem.FileSystemId
+      SubnetId: !Ref PrivateSubnet
+      SecurityGroups:
+        - !Ref MountTargetSecurityGroup
+
+  AccessPoint:
+    Type: AWS::S3Files::AccessPoint
+    Properties:
+      FileSystemId: !GetAtt S3FileSystem.FileSystemId
+      PosixUser:
+        Uid: "1000"
+        Gid: "1000"
+      RootDirectory:
+        Path: /lambda
+        CreationPermissions:
+          OwnerUid: "1000"
+          OwnerGid: "1000"
+          Permissions: "755"
+      Tags:
+        - Key: Pattern
+          Value: s3files-lambda
+
+  # -----------------------------------------------------------------------
+  # Lambda Layer — pandas
+  # -----------------------------------------------------------------------
+  PandasLayer:
+    Type: AWS::Serverless::LayerVersion
+    Properties:
+      LayerName: !Sub "${AWS::StackName}-pandas"
+      Description: pandas for CSV processing
+      ContentUri: layer/
+      CompatibleRuntimes:
+        - python3.13
+      RetentionPolicy: Delete
+
+  # -----------------------------------------------------------------------
+  # Lambda Function
+  # -----------------------------------------------------------------------
+  S3FilesReaderFunction:
+    Type: AWS::Serverless::Function
+    DependsOn: MountTarget
+    Properties:
+      FunctionName: !Sub "${AWS::StackName}-s3files-reader"
+      CodeUri: src/
+      Handler: handler.lambda_handler
+      Description: >
+        Reads a CSV from /mnt/s3data/input/ using pandas and returns the row
+        count, column names, and a preview of the first 5 rows — all via
+        standard Python file I/O through the S3 Files mount. No boto3 S3 calls needed.
+      Layers:
+        - !Ref PandasLayer
+      FileSystemConfigs:
+        - Arn: !GetAtt AccessPoint.AccessPointArn
+          LocalMountPath: /mnt/s3data
+      VpcConfig:
+        SubnetIds:
+          - !Ref PrivateSubnet
+        SecurityGroupIds:
+          - !Ref LambdaSecurityGroup
+      Policies:
+        - Statement:
+            - Sid: S3FilesMount
+              Effect: Allow
+              Action:
+                - s3files:ClientMount
+                - s3files:ClientWrite
+              Resource: "*"
+            - Sid: S3DirectRead
+              Effect: Allow
+              Action:
+                - s3:GetObject
+                - s3:GetObjectVersion
+              Resource: !Sub "${DataBucket.Arn}/*"
+      Tags:
+        Pattern: s3files-lambda
+
+Outputs:
+  DataBucketName:
+    Description: S3 bucket linked to the file system — upload CSV files here
+    Value: !Ref DataBucket
+
+  S3FileSystemId:
+    Description: S3 file system ID
+    Value: !GetAtt S3FileSystem.FileSystemId
+
+  S3FilesReaderFunctionName:
+    Description: Lambda function name — use this to invoke for testing
+    Value: !Ref S3FilesReaderFunction
+
+  S3FilesReaderFunctionArn:
+    Description: Lambda function ARN
+    Value: !GetAtt S3FilesReaderFunction.Arn
+
+  MountPath:
+    Description: Local mount path inside Lambda
+    Value: /mnt/s3data
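
The core claim of this pattern is that, once the mount exists, the handler needs nothing but ordinary file I/O. That read path can be sanity-checked locally with no AWS resources at all — the sketch below uses a temporary directory as a stand-in for the `/mnt/s3data` mount, and `read_summary` is a hypothetical helper mirroring the handler's happy path, not part of the deployed code:

```python
# Local sketch of the handler's read path: pandas reads a CSV through a plain
# file path, exactly as the Lambda function does via the S3 Files mount.
# A temporary directory stands in for /mnt/s3data; no boto3 calls are made.
import os
import tempfile

import pandas as pd


def read_summary(mount_path: str, file_key: str) -> dict:
    """Mirror of the handler's happy path against an arbitrary mount root."""
    input_path = os.path.join(mount_path, file_key)
    df = pd.read_csv(input_path)  # standard file I/O — no S3 SDK
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "preview": df.head(5).to_dict(orient="records"),
    }


with tempfile.TemporaryDirectory() as mount:
    # Simulate a file that S3 Files would expose under the mount
    os.makedirs(os.path.join(mount, "input"))
    with open(os.path.join(mount, "input", "sales.csv"), "w") as f:
        f.write("region,revenue,units\nNorth,1000,10\nSouth,2000,20\n")

    summary = read_summary(mount, "input/sales.csv")
    print(summary["rows"], summary["columns"])  # 2 ['region', 'revenue', 'units']
```

Because the mount exposes plain POSIX paths, the same logic runs unchanged inside the Lambda function with the mount root set to `/mnt/s3data` — which is also why the unit tests above can exercise the handler against pytest's `tmp_path`.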