Multimodal AI Learning Assistant on AWS

An AI-powered learning platform that transforms multimedia content — videos, audio, PDFs, and images — into a personalized, interactive learning experience. Upload your content, and the assistant builds a searchable knowledge base, creates structured courses, delivers adaptive lessons, quizzes you with intelligent feedback, and tracks your progress over time.

short.mp4

What You Can Do

Turn any content into a knowledge base. Upload videos, podcasts, PDFs, or images and the platform automatically extracts, indexes, and organizes the content. Ask questions in natural language and get answers with citations that link directly to the source — click a video timestamp to jump to the exact moment, or a page reference to open the right spot in a PDF.

Create courses from any topic. Describe what you want to learn and the assistant researches it, structures it into a multi-part course with subtopics, and populates each section with real content. Courses are saved to your library and can be revisited anytime.

Learn through structured lessons. Each subtopic is delivered as a structured lesson: a motivating hook, a clear explanation of the core concept, a concrete example drawn from your knowledge base, and a concise takeaway. The assistant adapts based on your history — if you've struggled with a topic before, it adjusts its approach.

Test yourself with interactive quizzes. After each lesson, take quizzes generated from the actual content. Get immediate, cited feedback that explains why the correct answer is right. Answer incorrectly and the assistant re-explains using a different angle before trying again.

Review with flashcards. Reinforce key concepts with auto-generated flashcards. Mark cards as mastered or flag them for review — your progress persists across sessions.

Track your learning. Every quiz score, flashcard review, and checklist completion is recorded. The assistant identifies weak areas and suggests what to study next, creating a personalized learning path.

Talk to your assistant. Switch to voice mode for real-time speech-to-speech conversations powered by Amazon Nova Sonic. Ask questions, get explanations, and learn hands-free.

Choose your learning mode. Training Mode keeps you focused on your uploaded knowledge base with structured study. Self-Study Mode unlocks web research and the ability to save new findings directly to your knowledge base.

Architecture Overview

The application consists of three independent codebases deployed as a modular AWS CDK application:

Application Layer

agent/ — Python backend powered by Strands Agents and FastAPI. An orchestrator agent with 20 tools (9 core + 11 self-study) handles knowledge base search, course creation, lesson delivery, quiz generation, progress tracking, and web research. Serves the AG-UI protocol for real-time frontend communication. Voice mode uses BidiAgent with Amazon Nova Sonic for speech-to-speech.
frontend/ — Next.js 16 + React 19 + TypeScript with CopilotKit for the chat interface. Interactive UI components (QuizCard, FlashcardDeck, MediaViewer, DataPanel) for quizzes, flashcards, checklists, media viewing, file upload/processing, and progress dashboards. Deploys as a static site to S3/CloudFront.
cdk/ — AWS CDK (TypeScript) infrastructure-as-code. One-command deployment of the full stack.

Infrastructure Stacks (CDK)

Storage & Distribution Stack (storage-dist-stack.ts)
- Media Bucket: Secure bucket for uploaded source files
- Organized Bucket: Processed and extracted content
- Application Host Bucket: Frontend static site
- CloudFront distribution with Origin Access Control
Auth Stack (auth-stack.ts)
- Cognito User Pool for authentication
- Cognito Identity Pool for authorized access to AWS resources
- User Pool Client for application integration
OpenSearch Stack (opensearch-stack.ts)
- OpenSearch Serverless collection for vector search
- Custom resource creates kNN indices
Processing Stack (processing-stack.ts)
- Initial Processing Lambda: Handles S3 uploads and triggers Bedrock Data Automation
- BDA Processing Lambda: Processes extraction results into searchable text with metadata (timestamps, speakers, page numbers)
- Bedrock Knowledge Base with vector store integration
- DynamoDB table for courses, user progress, and preferences
- File Status DynamoDB table for tracking upload and processing state per file
AgentCore Stack (agentcore-stack.ts, optional — deployed when deployAgentCore=true)
- AgentCore Memory for conversation persistence (STM + LTM)
- Configurable memory expiry and LTM strategies
- Integrated with Cognito for user-specific memory namespaces

AWS Services Used

Service	Purpose
Amazon Bedrock	Foundation models (Claude) for the AI assistant
Amazon Nova Sonic	Real-time speech-to-speech voice conversations
Bedrock Knowledge Bases	Vector search over uploaded and researched content (retrieves 15 results, reranks to top 5 via Cohere Rerank)
Bedrock Data Automation	Extracts text from videos, audio, PDFs, and images
Bedrock AgentCore Runtime	Managed agent hosting with memory integration
Amazon OpenSearch Serverless	Vector store for knowledge base embeddings
Amazon DynamoDB	Courses, user progress, preferences, and learning state
Amazon S3 + CloudFront	Frontend hosting and media storage
Amazon Cognito	User authentication and authorization

Data Flow

Upload (video/audio/PDF/image)
  -> S3 Media Bucket
  -> EventBridge triggers Processing Lambda
  -> Bedrock Data Automation extracts text + metadata
  -> Organized Bucket stores processed content
  -> Bedrock Knowledge Base indexes for vector search
  -> Agent queries KB and delivers cited responses

Key Parameters

Parameter	Description	Default/Constraints
ModelId	The Amazon Bedrock supported LLM inference profile ID used for inference.	Default: "us.anthropic.claude-sonnet-4-6"
EmbeddingModelId	The Amazon Bedrock supported embedding LLM ID used in Bedrock Knowledge Base.	Default: "amazon.titan-embed-text-v2:0"
DataParser	The data processing strategy to use for multimedia content.	Default: "Bedrock Data Automation"
ResourceSuffix	Suffix to append to resource names (e.g., dev, test, prod)	Alphanumeric + hyphens, 1-20 chars

Features

AI-powered course creation with automatic web research per subtopic
Structured lesson delivery: Hook, Concept, Example, Takeaway
Interactive quizzes with cited feedback and adaptive remediation
Flashcard review with mastery tracking
Per-user progress tracking with concept-level insights
Learning checklists per course subtopic
Two learning modes: Training (KB-focused) and Self-Study (web research enabled)
Multimedia content processing (video, audio, PDF, image)
Cited responses with clickable video timestamps and PDF page links
Real-time speech-to-speech conversations via Amazon Nova Sonic
Conversation persistence with AgentCore Memory (STM + LTM)
Web research with Tavily search and save-to-KB
User authentication with Amazon Cognito

Security Features

IAM roles with least privilege access
Cognito User Pool for authentication
JWT-based authorization — user identity derived server-side from token
Tool filtering enforced server-side based on user mode stored in DynamoDB
CloudFront with Origin Access Control for secure content delivery

Prerequisites

AWS CLI with credentials configured
Node.js 18+ and npm
Python 3.12+ and uv
AWS CDK CLI (npm install -g aws-cdk)
Git
Docker (for AgentCore Runtime container builds)
Bedrock model access enabled for: Claude (inference), Titan Embed Text v2 (embeddings), Cohere Rerank v3.5 (KB reranking), and optionally Nova Sonic (voice)

Deployment

Option 1: Automated Deployment (Recommended)

The solution includes a comprehensive deployment script that handles all aspects of deployment:

Clone the repository and navigate to the project folder:

git clone <repository-url>
cd sample-multimodal-training-assistant

Run the deployment script:
```
./deploy.sh -e dev
```

This will deploy:

All infrastructure stacks with default options
Frontend application
Local development configuration

AgentCore Runtime Deployment

Quick Start

# First time: Deploy everything
./deploy.sh -e dev -r us-west-2

# Update agent code only
./deploy.sh --agentcore-runtime -e dev -r us-west-2

How It Works

Full Deployment (./deploy.sh)
- Deploys CDK infrastructure (Memory, execution role, etc.)
- Deploys AgentCore Runtime with the agent
- Stores Runtime ARN in SSM
Agent Updates (./deploy.sh --agentcore-runtime)
- Copies latest agent code + dependencies
- Rebuilds Docker image
- Updates both text and voice AgentCore Runtimes
- Same Runtime ARNs (no frontend changes needed)

What Gets Deployed to AgentCore

Agent code:

agents/ — Orchestrator agent with system prompt and tool routing
tools/ — KB search, course management, lesson delivery, quiz generation, progress tracking, web research
lib/ — KB client (with Cohere reranking), DynamoDB client, user context utilities Infrastructure references (from SSM/CloudFormation):
Region, KB ID, model ID, DynamoDB table, S3 buckets
Rerank model ID (default: cohere.rerank-v3-5:0)
Tavily API key (if configured via --tavily-key)
Memory ID, execution role from CDK stack
Deployment mode: Container (Docker)

Two runtimes are deployed:

Text Runtime: AG-UI protocol for chat (Claude model)
Voice Runtime: BidiAgent with Amazon Nova Sonic for speech-to-speech

Memory Integration

The deployed agent automatically:

Reads memory ID from SSM
Extracts actor_id from Cognito JWT
Configures user-specific memory namespaces
Persists conversations across sessions

Deployment Options

Option	Description	Default
-e ENV	Environment name for resource naming	dev
-r REGION	AWS region	from AWS CLI config
-p PROFILE	AWS profile name to use	default
-m DAYS	Memory expiry duration in days	30
-s	Skip infrastructure deployment (generate config only)	false
-i	Generate local configuration only (no deployment)	false
--agentcore-runtime	Update AgentCore Runtime agent code only	false
--frontend	Build and deploy static Next.js frontend to S3/CloudFront	false
--tavily-key KEY	Store Tavily API key in SSM (enables web research in Self-Study mode)	-
--memory disable	Disable AgentCore Memory	enabled by default
--ltm disable	Disable Long-Term Memory (use STM only)	LTM enabled by default
-h	Show help message	-

Example Deployment Commands

Full Deployment

./deploy.sh -e dev -r us-west-2

Deploys all infrastructure stacks plus AgentCore Memory, generates local config files.

Update Agent Code Only

./deploy.sh --agentcore-runtime -e dev -r us-west-2

Rebuilds and deploys just the agent to AgentCore Runtime — no infrastructure changes.

Deploy Frontend Only

./deploy.sh --frontend -e dev

Builds the Next.js static export and syncs to S3, then invalidates CloudFront cache.

Memory Configuration

# Short-Term Memory only (no long-term learning)
./deploy.sh -e dev --ltm disable

# Custom memory expiry (60 days)
./deploy.sh -e prod -m 60

# Stateless conversations (no memory)
./deploy.sh -e test --memory disable

Web Research (Self-Study Mode)

Self-Study mode enables web research and course creation from web sources. It requires a Tavily API key (free tier: 1,000 credits/month).

# Store the key in SSM (one-time setup)
./deploy.sh -e dev --tavily-key tvly-your-key-here

# Then deploy the agent so it picks up the key
./deploy.sh --agentcore-runtime -e dev -r us-west-2

The key is stored as a SecureString in SSM (/multimedia-rag/{env}/tavily-api-key). All subsequent deploy.sh -i and --agentcore-runtime commands read it automatically. Without a Tavily key, the platform works normally in Training mode — only Self-Study web research features are unavailable.

Local Config Generation

./deploy.sh -e dev -i

Generates frontend/.env.local and agent/.env for local development without deploying anything. If a Tavily key is stored in SSM, it's included in agent/.env automatically.

Local Development

# From the repo root — starts agent (port 8000) + frontend (port 3000)
./start-local.sh

If USE_RUNTIME=true is set in frontend/.env.local, the script skips the local agent and the frontend connects directly to AgentCore Runtime instead.

Or run them separately:

# Frontend
cd frontend
npm run dev

# Agent (AG-UI protocol on port 8000)
cd agent
uv run main.py

Note: Local AG-UI mode persists conversation history via S3SessionManager (when S3_SESSION_BUCKET is configured) but does not include AgentCore long-term memory. Use AgentCore Runtime for full memory integration (STM + LTM).

Speech-to-Speech Capabilities

Real-time voice conversations powered by Amazon Nova Sonic via AgentCore BidiAgent:

Bidirectional audio streaming: Talk naturally with your knowledge base using voice
Voice-based learning: Ask questions, get explanations — all by voice
WebSocket communication: Frontend connects via SigV4 pre-signed URL using Cognito Identity Pool credentials
Same knowledge base: Voice agent uses the same KB search and tools as text chat
Dedicated voice agent: Concise system prompt optimized for spoken responses

The voice agent is deployed as a separate AgentCore BidiAgent runtime. The frontend captures mic audio at 16kHz PCM and plays back LPCM 24kHz via an AudioWorklet.

IMPORTANT: Amazon Nova Sonic is currently only available in us-east-1 (N. Virginia).

AgentCore Memory

Conversation persistence and user preference learning:

Features

Short-Term Memory (STM): Maintains conversation history within a session
Long-Term Memory (LTM): Learns user preferences, facts, and session summaries across sessions
User-Specific: Integrated with Cognito for per-user memory namespaces
Configurable Expiry: Default 30 days, customizable from 3-365 days

Memory Strategies (LTM)

When LTM is enabled (default), three strategies enhance conversations:

Session Summaries: Automatically summarizes conversation sessions
User Preferences: Learns and stores user preferences across sessions
Semantic Facts: Extracts and stores factual information from conversations

Usage

Getting Started

Access the application at https://<CloudFront-Domain-Name>.cloudfront.net/
Sign in with your Cognito credentials
Upload content using the DataPanel sidebar — videos, PDFs, audio, or images
Wait for processing and KB sync to complete (status tracked per file in the DataPanel)
Start chatting — ask questions about your content, request a course, or dive into a lesson

Learning Workflow

Ask the assistant to create a course on any topic — it will research and structure it automatically
Select a subtopic from the course panel to start a lesson
Read the lesson, then accept the quiz offer to test yourself
Review feedback — correct answers get confirmation with sources, wrong answers get re-explanation
Track progress via checklists and the progress dashboard
Use flashcards to reinforce key concepts

Learning Modes

Training Mode: Focused on your uploaded knowledge base. Courses, quizzes, flashcards, and progress tracking.
Self-Study Mode: Everything in Training plus web research and save-to-KB. Toggle in the sidebar.

Supported File Formats

The application supports various file formats through Amazon Bedrock Data Automation:

Documents: PDF (.pdf), Microsoft Word (.doc, .docx), Text (.txt), CSV (.csv)

Images: JPEG/JPG (.jpg, .jpeg), PNG (.png), TIFF (.tiff), GIF (.gif), BMP (.bmp)

Video: MP4 (.mp4), WebM (.webm)

Audio: MP3 (.mp3), WAV (.wav), M4A (.m4a)

Troubleshooting

Common Deployment Issues

Bedrock Data Automation API Error:
- Error: "ValidationException when calling the CreateDataAutomationProject operation"
- Solution: The BDA API parameters may have changed. Update the audio configuration in processing-stack.ts.
Region Compatibility:
- Error: Services not available in the selected region
- Solution: Bedrock Data Automation is available in limited regions. Deploy to a supported region like us-west-2.
CloudFront Issues:
- Problem: Frontend not showing after deployment
- Solution: Check CloudFront distribution status and verify S3 bucket policy allows CloudFront access.
Speech-to-Speech Issues:
- Error: Voice agent deployment failed
- Solution: Verify you are deploying in us-east-1, as Nova Sonic is only available there.
AgentCore Runtime Issues:
- Error: Agent code update failed
- Solution: Ensure Docker is running and your AWS credentials have permissions for AgentCore Runtime operations.

Missing Environment Variables

If local development is missing configuration:

Run ./deploy.sh -e dev -i to regenerate config files without deployment
Verify .env.local exists in the frontend/ directory and .env exists in the agent/ directory

Cost

You are responsible for the cost of the AWS services used while running this Guidance. As of May 2025, the estimated infrastructure cost for running this Guidance with the default settings is approximately $176 per month (excluding Bedrock model inference and AgentCore Runtime charges).

This estimate is based on a deployment supporting 100 active users executing approximately 1,000 multimedia RAG conversations per month. Each conversation processes an average of 5 multimedia files (PDFs, images, videos, etc.) through the RAG pipeline.

AWS Service	Dimensions	Cost - USD
Amazon S3	100GB storage, 5K PUT, 50K GET requests	$4.75
Amazon OpenSearch Serverless	On-demand capacity (8hrs/day), 50GB storage	$152.40
AWS Lambda	5K retrieval + 5K processing invocations	$1.22
Amazon DynamoDB	On-demand, ~10K reads/writes per month	$1.25
Amazon CloudFront	100GB data transfer, 1M requests	$9.50
Amazon EventBridge	5,000 custom events	$0.01
Amazon CloudWatch	5GB logs, 10 metrics, 1 dashboard	$6.50
Amazon Cognito	100 monthly active users	$0.28
TOTAL		~$175.91

Note: This estimate covers AWS infrastructure costs only. Amazon Bedrock model inference charges, AgentCore Runtime charges, and Amazon Nova Sonic charges apply based on usage and are not included. Refer to Amazon Bedrock pricing for current costs.

We recommend creating a Budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.

Limitations

File Size Limits:
- Videos: Up to 5GB
- Audio: Up to 2GB
- Documents: Up to 20 pages or 100MB
- Images: Up to 50MB each
API Limitations:
- Bedrock Data Automation has quota limitations — check AWS documentation for current limits
- Knowledge Base sync operations can take several minutes to complete
Regional Availability:
- Bedrock Data Automation is available in limited regions
- Speech-to-Speech (Amazon Nova Sonic) is only available in us-east-1
- AgentCore Runtime availability varies by region
- Ensure all required services are available in your target region
Resource Management:
- Files must be manually deleted from media and organized buckets
- Knowledge Base must be manually synced to reflect deleted files

Cleanup

When you're done with the solution, follow these steps to remove all deployed resources and avoid ongoing charges.

Automated Cleanup

Empty S3 buckets first (required before stack deletion):

# Get bucket names from CloudFormation outputs (replace 'dev' with your environment name)
MEDIA_BUCKET=$(aws cloudformation describe-stacks \
  --stack-name "MultimediaRagStack-dev" \
  --query "Stacks[0].Outputs[?OutputKey=='MediaBucketName'].OutputValue" \
  --output text \
  --region <your-region> \
  --profile <your-profile>)

ORGANIZED_BUCKET=$(aws cloudformation describe-stacks \
  --stack-name "MultimediaRagStack-dev" \
  --query "Stacks[0].Outputs[?OutputKey=='OrganizedBucketName'].OutputValue" \
  --output text \
  --region <your-region> \
  --profile <your-profile>)

APP_BUCKET=$(aws cloudformation describe-stacks \
  --stack-name "MultimediaRagStack-dev" \
  --query "Stacks[0].Outputs[?OutputKey=='ApplicationHostBucketName'].OutputValue" \
  --output text \
  --region <your-region> \
  --profile <your-profile>)

# Empty all buckets
aws s3 rm s3://$MEDIA_BUCKET --recursive
aws s3 rm s3://$ORGANIZED_BUCKET --recursive
aws s3 rm s3://$APP_BUCKET --recursive

Delete the CloudFormation stacks:

cd cdk

# Delete AgentCore stack first (if deployed)
npx cdk destroy AgentCoreStack-dev --profile <your-profile> --region <your-region> --force

# Delete main stack
npx cdk destroy MultimediaRagStack-dev --profile <your-profile> --region <your-region> --force

Resources Requiring Manual Deletion

Some resources may require manual deletion through the AWS Console:

Amazon Bedrock Knowledge Base — Navigate to Bedrock console, select the knowledge base (docs-kb-{environment}), delete
AgentCore Runtimes — Delete any AgentCore Runtime deployments via the Bedrock AgentCore console
CloudWatch Log Groups — Delete log groups containing your environment suffix
OpenSearch Serverless Collection — If stuck in "DELETING" state, contact AWS Support

Verification

After cleanup, verify these resources are deleted:

S3 buckets (Media, Organized, Application Host)
CloudFront distribution
Lambda functions
OpenSearch Serverless collection
Bedrock Knowledge Base
Cognito User Pool and Identity Pool
DynamoDB table
AgentCore Runtimes and Memory resources
CloudWatch Log groups

This sample solution is intended to be used with public, non-sensitive data only

This is a demonstration/sample solution and is not intended for production use. Please note:

Do not use sensitive, confidential, or critical data
Do not process personally identifiable information (PII)
Use only public data for testing and demonstration purposes
This solution is provided for learning and evaluation purposes only

Third-Party Services

This solution integrates with the following third-party services in Self-Study Mode. You are responsible for reviewing and complying with their respective terms.

Tavily Search API — Used for web research capabilities. Requires a Tavily account and API key. Tavily offers a free tier (1,000 API credits/month) with paid plans for higher usage. Review Tavily Terms of Service and Tavily Pricing before use.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Notices

Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided "as is" without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
agent		agent
assets		assets
cdk		cdk
frontend		frontend
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
THIRD-PARTY-LICENSES		THIRD-PARTY-LICENSES
deploy.sh		deploy.sh

Folders and files

Latest commit

History

Repository files navigation

Multimodal AI Learning Assistant on AWS

What You Can Do

Architecture Overview

Application Layer

Infrastructure Stacks (CDK)

AWS Services Used

Data Flow

Key Parameters

Features

Security Features

Prerequisites

Deployment

Option 1: Automated Deployment (Recommended)

AgentCore Runtime Deployment

Quick Start

How It Works

What Gets Deployed to AgentCore

Memory Integration

Deployment Options

Example Deployment Commands

Full Deployment

Update Agent Code Only

Deploy Frontend Only

Memory Configuration

Web Research (Self-Study Mode)

Local Config Generation

Local Development

Speech-to-Speech Capabilities

AgentCore Memory

Features

Memory Strategies (LTM)

Usage

Getting Started

Learning Workflow

Learning Modes

Supported File Formats

Troubleshooting

Common Deployment Issues

Missing Environment Variables

Cost

Limitations

Cleanup

Automated Cleanup

Resources Requiring Manual Deletion

Verification

This sample solution is intended to be used with public, non-sensitive data only

Third-Party Services

Security

License

Notices

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages