feat(sfn-s3vectors-rag-refresh-cdk): Add S3 Vectors RAG refresh pattern with CDK #3005
bfreiberg wants to merge 2 commits into aws-samples:main
…rn with CDK

- Add Step Functions state machine for automated document ingestion pipeline
- Create S3 Vectors knowledge base with distributed map for parallel processing
- Implement Lambda functions for embedding generation, validation, and rollback
- Add CDK infrastructure as code for complete stack deployment
- Include comprehensive README with deployment and testing instructions
- Add example pattern configuration and state machine visualization
- Configure TypeScript build setup with tsconfig and package dependencies
- Enable vector embedding via Amazon Bedrock Titan Text Embeddings V2
- Implement validation and automatic rollback on ingestion failure
marcojahn left a comment
Hi @bfreiberg, thank you for your contribution. I've created a few comments, please review & apply. TY
    embedFunction.addToRolePolicy(new iam.PolicyStatement({
      actions: ['s3vectors:PutVectors'],
      resources: ['*'],
    }));
The embedFunction IAM policy grants s3vectors:PutVectors with resources: ['*'], allowing writes to any S3 Vectors bucket/index in the account rather than just the intended one.
Example
    - embedFunction.addToRolePolicy(new iam.PolicyStatement({
    -   actions: ['s3vectors:PutVectors'],
    -   resources: ['*'],
    - }));
    + embedFunction.addToRolePolicy(new iam.PolicyStatement({
    +   actions: ['s3vectors:PutVectors'],
    +   resources: [vectorIndex.attrIndexArn],
    + }));
    validateFunction.addToRolePolicy(new iam.PolicyStatement({
      actions: ['s3vectors:QueryVectors', 's3vectors:GetVectors'],
      resources: ['*'],
    }));
The validateFunction IAM policy grants s3vectors:QueryVectors and s3vectors:GetVectors with resources: ['*'], allowing reads from any S3 Vectors bucket/index in the account.
    - validateFunction.addToRolePolicy(new iam.PolicyStatement({
    -   actions: ['s3vectors:QueryVectors', 's3vectors:GetVectors'],
    -   resources: ['*'],
    - }));
    + validateFunction.addToRolePolicy(new iam.PolicyStatement({
    +   actions: ['s3vectors:QueryVectors', 's3vectors:GetVectors'],
    +   resources: [vectorIndex.attrIndexArn],
    + }));
    rollbackFunction.addToRolePolicy(new iam.PolicyStatement({
      actions: ['s3vectors:DeleteVectors'],
      resources: ['*'],
    }));
The rollbackFunction IAM policy grants s3vectors:DeleteVectors with resources: ['*']. This is a destructive and potentially unrecoverable action — the function could delete vectors from any S3 Vectors bucket/index in the account, not just the intended one. A wildcard on a destructive operation like DeleteVectors is materially worse than a wildcard on read/write operations.
    - rollbackFunction.addToRolePolicy(new iam.PolicyStatement({
    -   actions: ['s3vectors:DeleteVectors'],
    -   resources: ['*'],
    - }));
    + rollbackFunction.addToRolePolicy(new iam.PolicyStatement({
    +   actions: ['s3vectors:DeleteVectors'],
    +   resources: [vectorIndex.attrIndexArn],
    + }));
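
The three IAM comments above share one pattern: an `s3vectors` action paired with `resources: ['*']`. As a rough illustration of that pattern, the sketch below models policy statements as plain objects (not the CDK `iam.PolicyStatement` class) and flags any statement whose resources contain a wildcard, rating delete-type actions higher. The `Statement` shape, the `findWildcardStatements` helper, the severity labels, and the placeholder ARN are all hypothetical and exist only for this example; they are not part of the PR.

```typescript
// Simplified stand-in for an IAM policy statement (assumption: not the CDK class).
interface Statement {
  actions: string[];
  resources: string[];
}

interface Finding {
  actions: string[];
  severity: 'high' | 'medium';
}

// Flags statements that use a wildcard resource. Statements containing a
// Delete* action are rated 'high' because the effect may be unrecoverable.
function findWildcardStatements(statements: Statement[]): Finding[] {
  return statements
    .filter((s) => s.resources.indexOf('*') !== -1)
    .map((s): Finding => ({
      actions: s.actions,
      severity: s.actions.some((a) => /:Delete/.test(a)) ? 'high' : 'medium',
    }));
}

// Placeholder ARN for illustration only; in the stack the value would come
// from vectorIndex.attrIndexArn.
const indexArn = 'EXAMPLE_VECTOR_INDEX_ARN';

// The three statements as originally submitted.
const before: Statement[] = [
  { actions: ['s3vectors:PutVectors'], resources: ['*'] },
  { actions: ['s3vectors:QueryVectors', 's3vectors:GetVectors'], resources: ['*'] },
  { actions: ['s3vectors:DeleteVectors'], resources: ['*'] },
];

// The same statements scoped to the single index, as the suggestions propose.
const after: Statement[] = before.map((s) => ({
  actions: s.actions,
  resources: [indexArn],
}));

console.log(findWildcardStatements(before).length); // 3
console.log(findWildcardStatements(after).length);  // 0
```

A check of this kind could run in a unit test against the synthesized template, though how the statements are extracted from the template is left out here.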
    new RagRefreshStack(app, 'RagRefreshStack', {
      env: {
        account: process.env.CDK_DEFAULT_ACCOUNT,
        region: process.env.AWS_REGION,
      },
    });
The env block mixes CDK and non-CDK environment variable conventions: account uses CDK_DEFAULT_ACCOUNT (set automatically by the CDK CLI from the resolved profile) while region uses AWS_REGION (a standard AWS SDK variable that is not set by the CDK CLI).
    - new RagRefreshStack(app, 'RagRefreshStack', {
    -   env: {
    -     account: process.env.CDK_DEFAULT_ACCOUNT,
    -     region: process.env.AWS_REGION,
    -   },
    - });
    + new RagRefreshStack(app, 'RagRefreshStack', {
    +   env: {
    +     account: process.env.CDK_DEFAULT_ACCOUNT,
    +     region: process.env.CDK_DEFAULT_REGION,
    +   },
    + });
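
To make the difference concrete, here is a minimal sketch of how the two region lookups behave when only the variables set by the CDK CLI are present. The `resolveEnv` helper and the sample account and region values are hypothetical; the function takes the environment as a parameter so the example does not depend on `process.env`.

```typescript
// Hypothetical helper: builds the stack `env` from a map of environment variables.
type EnvVars = Record<string, string | undefined>;

interface StackEnv {
  account: string | undefined;
  region: string | undefined;
}

// `regionVar` names the variable the region is read from, so both
// conventions can be compared side by side.
function resolveEnv(vars: EnvVars, regionVar: string): StackEnv {
  return {
    account: vars['CDK_DEFAULT_ACCOUNT'],
    region: vars[regionVar],
  };
}

// Sample environment in which only the CDK CLI variables are set
// (illustrative values, not taken from the PR).
const cliOnly: EnvVars = {
  CDK_DEFAULT_ACCOUNT: '123456789012',
  CDK_DEFAULT_REGION: 'eu-central-1',
};

const mixed = resolveEnv(cliOnly, 'AWS_REGION');
const consistent = resolveEnv(cliOnly, 'CDK_DEFAULT_REGION');

console.log(mixed.region);      // undefined
console.log(consistent.region); // eu-central-1
```

With the mixed lookup, the account is resolved but the region is left undefined unless `AWS_REGION` happens to be exported in the shell, so the two halves of `env` can come from different sources.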
…ctor index ARN and fix CDK region env var

Address PR review: replace wildcard resources with vectorIndex.attrIndexArn for s3vectors actions (PutVectors, QueryVectors, GetVectors, DeleteVectors) and use CDK_DEFAULT_REGION instead of AWS_REGION for consistent CDK CLI behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Issue #, if available: #3006
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.