
feat(sfn-s3vectors-rag-refresh-cdk): Add S3 Vectors RAG refresh pattern with CDK #3005

Open
bfreiberg wants to merge 2 commits into aws-samples:main from bfreiberg:main


@bfreiberg (Contributor) commented Mar 27, 2026

Issue #, if available: #3006

Description of changes:

  • Add Step Functions state machine for automated document ingestion pipeline
  • Create S3 Vectors knowledge base with distributed map for parallel processing
  • Implement Lambda functions for embedding generation, validation, and rollback
  • Add CDK infrastructure as code for complete stack deployment
  • Include comprehensive README with deployment and testing instructions
  • Add example pattern configuration and state machine visualization
  • Configure TypeScript build setup with tsconfig and package dependencies
  • Enable vector embedding via Amazon Bedrock Titan Text Embeddings V2
  • Implement validation and automatic rollback on ingestion failure
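The ingest, validate, and rollback flow listed above can be modeled without any AWS dependencies to make the control flow concrete. This is a minimal, hypothetical sketch: `InMemoryIndex`, `embedStub`, and `refresh` are illustrative names, not the PR's handlers, and the stub embedder stands in for a Bedrock call. The 1024 dimension matches Titan Text Embeddings V2's default output size. In the actual pattern, Step Functions orchestrates these steps and the distributed map fans out the embedding work.

```typescript
// Hypothetical in-memory stand-in for an S3 Vectors index (not the real API).
class InMemoryIndex {
  private vectors = new Map<string, number[]>();

  put(key: string, data: number[]): void {
    this.vectors.set(key, data);
  }

  get(key: string): number[] | undefined {
    return this.vectors.get(key);
  }

  delete(keys: string[]): void {
    for (const key of keys) this.vectors.delete(key);
  }

  size(): number {
    return this.vectors.size;
  }
}

// Titan Text Embeddings V2 default output dimension.
const DIMENSION = 1024;

// Stub embedder returning a deterministic fake vector; a real handler would call Bedrock.
function embedStub(text: string): number[] {
  return Array.from({ length: DIMENSION }, (_, i) => (text.length + i) % 7);
}

type RefreshResult = { status: "SUCCEEDED" | "ROLLED_BACK"; written: string[] };

// Embed and write each chunk, validate the batch, and roll back this run's keys on failure.
function refresh(
  index: InMemoryIndex,
  chunks: Record<string, string>,
  validate: (index: InMemoryIndex, keys: string[]) => boolean,
): RefreshResult {
  const written: string[] = [];
  for (const [key, text] of Object.entries(chunks)) {
    index.put(key, embedStub(text));
    written.push(key);
  }
  if (validate(index, written)) {
    return { status: "SUCCEEDED", written };
  }
  index.delete(written); // only keys written in this run are removed
  return { status: "ROLLED_BACK", written };
}
```

One design consequence worth noting: if a run overwrites existing keys, deleting them on rollback also removes the previous versions, so a production rollback may need to restore prior vectors rather than just delete.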

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@marcojahn (Contributor) left a comment


Hi @bfreiberg, thank you for your contribution. I've created a few comments, please review & apply. TY

Comment on lines +61 to +64
```typescript
embedFunction.addToRolePolicy(new iam.PolicyStatement({
  actions: ['s3vectors:PutVectors'],
  resources: ['*'],
}));
```

The embedFunction IAM policy grants s3vectors:PutVectors with resources: ['*'], allowing writes to any S3 Vectors bucket/index in the account rather than just the intended one.

Example

Suggested change

```diff
 embedFunction.addToRolePolicy(new iam.PolicyStatement({
   actions: ['s3vectors:PutVectors'],
-  resources: ['*'],
+  resources: [vectorIndex.attrIndexArn],
 }));
```
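To see why the wildcard matters, here is a deliberately simplified model of IAM resource matching, where `*` matches any run of characters and `?` matches one character. It ignores policy variables, conditions, `NotResource`, and explicit denies, so it illustrates the scoping argument rather than reproducing IAM. The ARNs follow the `arn:aws:s3vectors:<region>:<account>:bucket/<bucket>/index/<index>` shape; the account, region, bucket, and index names are made up. In the stack itself, `vectorIndex.attrIndexArn` is a CloudFormation token resolved at deploy time.

```typescript
// Simplified IAM-style resource matcher: '*' matches any run of characters,
// '?' matches exactly one. Not a real IAM evaluator: no policy variables,
// conditions, NotResource, or explicit deny handling.
function resourceMatches(pattern: string, arn: string): boolean {
  // Escape regex metacharacters except the two IAM wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$";
  return new RegExp(source).test(arn);
}

// True if any Resource entry in the statement covers the target ARN.
function statementCovers(resources: string[], target: string): boolean {
  return resources.some((pattern) => resourceMatches(pattern, target));
}

// Made-up account, region, bucket, and index names for illustration only.
const intendedIndex =
  "arn:aws:s3vectors:us-east-1:111122223333:bucket/rag-refresh-bucket/index/rag-index";
const unrelatedIndex =
  "arn:aws:s3vectors:us-east-1:111122223333:bucket/other-team-bucket/index/prod-index";
```

With these definitions, `statementCovers(["*"], unrelatedIndex)` is true, while `statementCovers([intendedIndex], unrelatedIndex)` is false: scoping the statement to the single index ARN removes every other index from reach.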

Comment on lines +87 to +90
```typescript
validateFunction.addToRolePolicy(new iam.PolicyStatement({
  actions: ['s3vectors:QueryVectors', 's3vectors:GetVectors'],
  resources: ['*'],
}));
```

The validateFunction IAM policy grants s3vectors:QueryVectors and s3vectors:GetVectors with resources: ['*'], allowing reads from any S3 Vectors bucket/index in the account.

Suggested change

```diff
 validateFunction.addToRolePolicy(new iam.PolicyStatement({
   actions: ['s3vectors:QueryVectors', 's3vectors:GetVectors'],
-  resources: ['*'],
+  resources: [vectorIndex.attrIndexArn],
 }));
```
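The validation function only needs read access to the one index it checks. The PR's validation logic is not shown in this thread, so the following is a hypothetical sketch of the kind of check a GetVectors/QueryVectors pair enables: confirm every expected key exists with the right dimension, then confirm a brute-force top-1 cosine search for each stored vector returns that same key. `VectorRecord`, `nearestKey`, and `validateBatch` are illustrative names.

```typescript
// Hypothetical record shape: a key plus its embedding values.
type VectorRecord = { key: string; data: number[] };

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-1 search, standing in for a similarity query with topK = 1.
function nearestKey(query: number[], records: VectorRecord[]): string | undefined {
  let bestKey: string | undefined;
  let bestScore = -Infinity;
  for (const record of records) {
    if (record.data.length !== query.length) continue; // skip malformed vectors
    const score = cosine(query, record.data);
    if (score > bestScore) {
      bestScore = score;
      bestKey = record.key;
    }
  }
  return bestKey;
}

// Returns a list of problems; an empty list means the batch passed.
function validateBatch(expectedKeys: string[], records: VectorRecord[], dimension: number): string[] {
  const byKey = new Map<string, VectorRecord>(
    records.map((r): [string, VectorRecord] => [r.key, r]),
  );
  const problems: string[] = [];
  for (const key of expectedKeys) {
    const record = byKey.get(key);
    if (record === undefined) {
      problems.push(`missing ${key}`);
    } else if (record.data.length !== dimension) {
      problems.push(`wrong dimension for ${key}`);
    } else if (nearestKey(record.data, records) !== key) {
      problems.push(`self-query did not return ${key}`);
    }
  }
  return problems;
}
```

Identical vectors stored under different keys tie in this search and the first one wins, so a sturdier check would compare similarity scores against a threshold rather than requiring an exact key match.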

Comment on lines +106 to +109
```typescript
rollbackFunction.addToRolePolicy(new iam.PolicyStatement({
  actions: ['s3vectors:DeleteVectors'],
  resources: ['*'],
}));
```

The rollbackFunction IAM policy grants s3vectors:DeleteVectors with resources: ['*']. This is a destructive and potentially unrecoverable action: the function could delete vectors from any S3 Vectors bucket or index in the account, not just the intended one. A wildcard on a destructive operation like DeleteVectors is materially worse than a wildcard on read or write operations.

Suggested change

```diff
 rollbackFunction.addToRolePolicy(new iam.PolicyStatement({
   actions: ['s3vectors:DeleteVectors'],
-  resources: ['*'],
+  resources: [vectorIndex.attrIndexArn],
 }));
```
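Scoping the ARN limits *where* the rollback can delete; the handler should also limit *what* it deletes to keys written by the current execution. This hypothetical sketch deduplicates those keys and splits them into batches. `MAX_KEYS_PER_DELETE` is an assumed per-request limit (check the current S3 Vectors quotas), and `deleteBatch` is a hypothetical callback that would wrap one DeleteVectors request. It is synchronous for brevity; a Lambda handler would await each SDK call.

```typescript
// Assumption: per-request key limit for the delete call. Verify against the
// current S3 Vectors quotas before relying on this value.
const MAX_KEYS_PER_DELETE = 500;

// Deduplicate keys and split them into batches of at most `size`.
function toBatches(keys: string[], size: number): string[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("batch size must be a positive integer");
  }
  const unique = Array.from(new Set(keys));
  const batches: string[][] = [];
  for (let i = 0; i < unique.length; i += size) {
    batches.push(unique.slice(i, i + size));
  }
  return batches;
}

// Delete only the keys this execution wrote. `deleteBatch` is a hypothetical
// callback that would issue one delete request against the scoped index.
function rollback(
  writtenKeys: string[],
  deleteBatch: (keys: string[]) => void,
  size: number = MAX_KEYS_PER_DELETE,
): number {
  let deleted = 0;
  for (const batch of toBatches(writtenKeys, size)) {
    deleteBatch(batch);
    deleted += batch.length;
  }
  return deleted;
}
```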

Comment on lines +8 to +13
```typescript
new RagRefreshStack(app, 'RagRefreshStack', {
  env: {
    account: process.env.CDK_DEFAULT_ACCOUNT,
    region: process.env.AWS_REGION,
  },
});
```

The env block mixes CDK and non-CDK environment variable conventions: account uses CDK_DEFAULT_ACCOUNT (set automatically by the CDK CLI from the resolved profile) while region uses AWS_REGION (a standard AWS SDK variable that is not set by the CDK CLI). Unless AWS_REGION happens to be exported in the shell, region is undefined and the stack is not pinned to a region.

Suggested change

```diff
 new RagRefreshStack(app, 'RagRefreshStack', {
   env: {
     account: process.env.CDK_DEFAULT_ACCOUNT,
-    region: process.env.AWS_REGION,
+    region: process.env.CDK_DEFAULT_REGION,
   },
 });
```
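The difference between the two env blocks can be shown with pure functions that take the environment map as a parameter, so the behavior is testable without running the CDK CLI. `resolveStackEnv` and `resolveMixedEnv` are hypothetical helpers, not part of the PR; in a `bin/` file one would pass `process.env` to them.

```typescript
// Shape of the `env` prop passed to the stack; both fields are optional in CDK.
type StackEnv = { account?: string; region?: string };
type EnvVars = Record<string, string | undefined>;

// Suggested fix: both values come from variables the CDK CLI sets for the app
// process, based on the resolved profile.
function resolveStackEnv(vars: EnvVars): StackEnv {
  return {
    account: vars["CDK_DEFAULT_ACCOUNT"],
    region: vars["CDK_DEFAULT_REGION"],
  };
}

// Original mix: AWS_REGION is only present if something else exported it.
function resolveMixedEnv(vars: EnvVars): StackEnv {
  return {
    account: vars["CDK_DEFAULT_ACCOUNT"],
    region: vars["AWS_REGION"],
  };
}
```

Given only the variables the CDK CLI provides, `resolveStackEnv` yields both an account and a region, while `resolveMixedEnv` leaves `region` undefined.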

…ctor index ARN and fix CDK region env var

Address PR review: replace wildcard resources with vectorIndex.attrIndexArn for
s3vectors actions (PutVectors, QueryVectors, GetVectors, DeleteVectors) and use
CDK_DEFAULT_REGION instead of AWS_REGION for consistent CDK CLI behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

3 participants