Make your deployments reliable with AI-driven ring-based deployment strategies powered by AWS Bedrock and Amazon Nova models.
- Overview
- Architecture
- Features
- Prerequisites
- Installation
- Configuration
- Running the Application
- AI Agents
- Simulation
- API Documentation
- Project Structure
- Troubleshooting
- Security
💡 Quick Reference: See COMMANDS.md for common commands and quick fixes.
FlexDeploy is an intelligent deployment orchestration platform that uses AI to manage device categorization, analyze deployment failures, and optimize gating factors for ring-based deployment strategies.
Key Capabilities:
- 🤖 AI-powered device ring categorization
- 📊 Automated deployment failure analysis
- 🎛️ Natural language gating factor configuration
- 📈 Real-time deployment monitoring
- 🔄 Ring-based progressive rollout
┌─────────────────────────────────────────────────────────────────┐
│                        FlexDeploy System                        │
└─────────────────────────────────────────────────────────────────┘
                                 │
                 ┌───────────────┴───────────────┐
                 │                               │
         ┌───────▼────────┐             ┌────────▼────────┐
         │  Frontend UI   │             │ Backend Server  │
         │  (React App)   │             │    (FastAPI)    │
         │                │             │                 │
         │ - Material-UI  │◄───REST────►│ - SQLite DB     │
         │ - Recharts     │     API     │ - Bedrock       │
         │ - Vite         │             │   Agents        │
         └────────────────┘             └─────────┬───────┘
             Port: 5173                           │
                                                  │
                      ┌───────────────────────────┴────────────┐
                      │                                        │
              ┌───────▼────────┐                      ┌────────▼────────┐
              │  AWS Bedrock   │                      │    SQLite DB    │
              │                │                      │                 │
              │  Nova Pro v1   │                      │ - Devices       │
              │  Nova Lite v1  │                      │ - Deployments   │
              │                │                      │ - Rings         │
              │  Region:       │                      │ - Gating        │
              │  us-east-2     │                      │   Factors       │
              └────────────────┘                      └─────────────────┘
                      ▲
                      │
              ┌───────┴────────┐
              │  Credentials   │
              │  ~/.aws/       │
              │  credentials   │
              └────────────────┘
Frontend (React):
- Dashboard: Overview metrics and device distribution
- Devices: Device inventory with risk scores
- Deployments: Deployment management and execution
- Rings: Ring configuration and AI categorization
- Deployment Details: Ring-wise deployment status and failure analysis
Backend (FastAPI):
- REST API: Device, deployment, and ring management
- AI Agents: Three specialized agents for intelligent operations
- Database: SQLite for persistent storage
- Configuration: Centralized config management
AI Agents:
- Ring Categorization Agent
  - Pipeline: prompt → SQL agent → reasoning agent → result
  - Automatically assigns devices to deployment rings
- Deployment Failure Agent
  - Pipeline: gating factor → prompt → result
  - Analyzes failures and provides recommendations
- Gating Factor Agent
  - Pipeline: user text → prompt → gating factor → result
  - Converts natural language to numeric thresholds
Device Management:
- Device inventory with comprehensive metrics
- Automatic risk score calculation
- Ring assignment (manual or AI-powered)
- Filter and search capabilities
Deployment Management:
- Create deployments with configurable gating factors
- Ring-based progressive rollout
- Real-time status tracking
- Start/stop deployment control
Ring Configuration:
- 4 default rings (Canary, Low Risk, High Risk, VIP)
- Custom categorization prompts
- Global gating factor configuration
- AI-powered device categorization
AI Capabilities:
- Intelligent device ring assignment
- Automated failure root cause analysis
- Natural language gating factor parsing
- Validation and recommendations
Dashboard:
- Total devices, deployments, and active rings
- Device distribution visualization
- Real-time metrics
- Python 3.11 or higher
  python3 --version
- Node.js 18 or higher
  node --version
- npm or yarn
  npm --version
- uv (Python package manager)
  # Install uv
  curl -LsSf https://astral.sh/uv/install.sh | sh
- AWS Account with Bedrock Access
  - AWS account with IAM permissions
  - Amazon Nova models enabled
  - Valid credentials
- AWS Bedrock Access: Must be enabled in your account
- Model Access: Amazon Nova Pro and Nova Lite
- IAM Permissions: bedrock:InvokeModel
- Region: us-east-2 (recommended for Bedrock)
git clone https://github.com/mmohanram13/flexDeploy.git
cd flexDeploy
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install Python packages
uv pip install -e .
# Alternatively, use pip
pip install -e .
Backend packages installed:
- fastapi >= 0.115.0
- uvicorn >= 0.32.0
- boto3 >= 1.35.0
- botocore >= 1.35.0
- strands-agents >= 1.12.0
- aiohttp >= 3.13.0
cd ui
npm install
cd ..
Frontend packages installed:
- react 19.1.1
- react-router-dom 7.9.4
- @mui/material 7.3.4
- @mui/icons-material 7.3.4
- recharts 3.2.1
- vite 5.0.4
# Run interactive setup
./setup_config.sh
This creates:
- config.ini: Application settings (SSO, regions, models)
- ~/.aws/credentials: AWS credentials template
With AWS SSO:
- Edit config.ini:
  [aws]
  sso_start_url = https://your-org.awsapps.com/start/#
  sso_region = us-east-2
  bedrock_region = us-east-2
- Get SSO credentials:
  - Visit your SSO start URL
  - Log in and select an account
  - Click "Command line or programmatic access"
  - Copy the credentials to ~/.aws/credentials
With an IAM user:
- Create an IAM user with Bedrock permissions
- Generate access keys
- Add them to ~/.aws/credentials:
  [default]
  aws_access_key_id = AKIAIOSFODNN7EXAMPLE
  aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  # Session token is optional; it is required only for SSO (temporary) credentials
  aws_session_token = IQoJb3JpZ2luX2VjEDY...
Enable Bedrock model access:
- Log in to the AWS Console
- Navigate to AWS Bedrock → Model access
- Request access to:
  - Amazon Nova Pro (us.amazon.nova-pro-v1:0)
  - Amazon Nova Lite (us.amazon.nova-lite-v1:0)
- Wait for approval (usually instant)
# Test AWS Bedrock connection
python test_bedrock_agents.py
Expected output:
✓ Connection successful!
✓ Ring Categorization: PASSED
✓ Failure Analysis: PASSED
✓ Gating Factor Parsing: PASSED
✓ Gating Factor Validation: PASSED
config.ini is located in the project root and contains:
[aws]
# AWS SSO Configuration
sso_start_url = https://superopsglobalhackathon.awsapps.com/start/#
sso_region = us-east-2
# AWS Bedrock Configuration
bedrock_region = us-east-2
bedrock_model_pro = us.amazon.nova-pro-v1:0
bedrock_model_lite = us.amazon.nova-lite-v1:0
# Model Settings
default_max_tokens = 2000
default_temperature = 0.7
[server]
host = 0.0.0.0
port = 8000
[database]
db_name = flexdeploy.db
~/.aws/credentials is located in your home directory and contains:
[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
# Session token is required only for SSO (temporary) credentials
aws_session_token = YOUR_SESSION_TOKEN
Important:
- Never commit credentials to git
- Session tokens expire (refresh regularly with SSO)
- File permissions should be 600: chmod 600 ~/.aws/credentials
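The config.ini layout above can be read with Python's standard configparser module. The following is a minimal sketch, not the project's actual server/config.py: the `load_settings` helper and the keys of the dictionary it returns are hypothetical, and the sample text is parsed from a string so the example runs without a config.ini on disk.

```python
import configparser

# Sample mirroring the config.ini layout shown above. In practice you would
# call parser.read("config.ini") instead of parser.read_string().
SAMPLE_CONFIG = """
[aws]
sso_region = us-east-2
bedrock_region = us-east-2
bedrock_model_pro = us.amazon.nova-pro-v1:0
bedrock_model_lite = us.amazon.nova-lite-v1:0
default_max_tokens = 2000
default_temperature = 0.7

[server]
host = 0.0.0.0
port = 8000

[database]
db_name = flexdeploy.db
"""


def load_settings(text: str) -> dict:
    """Parse INI text and return typed settings (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "bedrock_region": parser.get("aws", "bedrock_region"),
        "model_pro": parser.get("aws", "bedrock_model_pro"),
        "max_tokens": parser.getint("aws", "default_max_tokens"),
        "temperature": parser.getfloat("aws", "default_temperature"),
        "port": parser.getint("server", "port"),
        "db_name": parser.get("database", "db_name"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["bedrock_region"], settings["port"])
```

Using `getint` and `getfloat` surfaces a malformed port or temperature as a `ValueError` at startup rather than later in a request.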
# Start both UI and server simultaneously
./run_app.sh
This starts:
- Backend server on http://localhost:8000
- Frontend UI on http://localhost:5173
python -m server.main
Output:
✓ Database connected
✓ Configuration loaded
- SSO Region: us-east-2
- Bedrock Region: us-east-2
✓ AWS Bedrock agents initialized
- Credentials from: ~/.aws/credentials
- Configuration from: config.ini
INFO: Uvicorn running on http://0.0.0.0:8000
cd ui
npm run dev
Output:
VITE v5.0.4 ready in 1234 ms
➜ Local: http://localhost:5173/
➜ Network: use --host to expose
Open your browser to: http://localhost:5173
If the server is not running, you'll see a non-closable popup warning.
Ring Categorization Agent
Purpose: Automatically categorize devices into deployment rings
Pipeline: prompt → SQL agent → reasoning agent → result
Usage:
curl -X POST http://localhost:8000/api/ai/categorize-devices \
-H "Content-Type: application/json" \
-d '{"deviceIds": ["DEV-001", "DEV-002"]}'
Response:
{
"message": "Successfully categorized 2 devices",
"categorizations": [
{
"deviceId": "DEV-001",
"assignedRing": 1,
"reasoning": "Device has stable metrics with CPU at 45% and memory at 60%, suitable for Ring 1 (Low Risk)"
}
]
}
Cost: ~$0.0004 per device
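The same categorization call can be made from Python with the standard library. This is a sketch under assumptions: the helper names `build_categorize_request` and `categorize` are hypothetical, and the base URL is the local default used throughout this README. Building the request is separated from sending it, so the payload can be inspected without a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api"  # local default used in this README


def build_categorize_request(device_ids: list[str]) -> urllib.request.Request:
    """Build the POST request for /ai/categorize-devices without sending it."""
    body = json.dumps({"deviceIds": device_ids}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/ai/categorize-devices",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def categorize(device_ids: list[str], timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (needs a running server)."""
    request = build_categorize_request(device_ids)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_categorize_request(["DEV-001", "DEV-002"])
print(request.get_method(), request.full_url)
```

Calling `categorize(["DEV-001", "DEV-002"])` against a running backend should return a dictionary shaped like the response shown above.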
Deployment Failure Agent
Purpose: Analyze why deployments failed
Pipeline: gating factor → prompt → result
Usage:
curl -X POST http://localhost:8000/api/ai/analyze-failure \
-H "Content-Type: application/json" \
-d '{
"deploymentId": "DEP-001",
"ringName": "Ring 1 - Low Risk Devices"
}'
Response:
{
"deploymentId": "DEP-001",
"ringName": "Ring 1 - Low Risk Devices",
"analysis": "Primary failure reason: 15% of devices exceeded the CPU threshold of 80%..."
}
Cost: ~$0.004 per analysis
Gating Factor Agent
Purpose: Convert natural language to numeric gating factors
Pipeline: user text → prompt → gating factor → result
Usage:
curl -X POST http://localhost:8000/api/ai/gating-factors \
-H "Content-Type: application/json" \
-d '{
"naturalLanguageInput": "Deploy only to very stable devices with low CPU usage"
}'
Response:
{
"gatingFactors": {
"avgCpuUsageMax": 60.0,
"avgMemoryUsageMax": 70.0,
"avgDiskFreeSpaceMin": 30.0,
"riskScoreMin": 70,
"riskScoreMax": 100
},
"explanation": "Conservative settings: CPU<60%, Memory<70%, Disk>30% for stable devices"
}
Cost: ~$0.0012 per request
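To illustrate how thresholds like these might gate a device, here is a small sketch. It is an assumption about the semantics, not the project's implementation: the `DeviceMetrics` fields and the `failed_gates` helper are hypothetical, "Max" keys are treated as inclusive upper bounds, and "Min" keys as inclusive lower bounds.

```python
from dataclasses import dataclass


@dataclass
class DeviceMetrics:
    """Hypothetical device snapshot; these field names are illustrative."""
    avg_cpu_usage: float        # percent
    avg_memory_usage: float     # percent
    avg_disk_free_space: float  # percent free
    risk_score: int             # 0-100


def failed_gates(device: DeviceMetrics, factors: dict) -> list[str]:
    """Return the gating-factor keys the device violates (empty if none)."""
    checks = {
        "avgCpuUsageMax": device.avg_cpu_usage <= factors["avgCpuUsageMax"],
        "avgMemoryUsageMax": device.avg_memory_usage <= factors["avgMemoryUsageMax"],
        "avgDiskFreeSpaceMin": device.avg_disk_free_space >= factors["avgDiskFreeSpaceMin"],
        "riskScoreMin": device.risk_score >= factors["riskScoreMin"],
        "riskScoreMax": device.risk_score <= factors["riskScoreMax"],
    }
    return [name for name, passed in checks.items() if not passed]


# Thresholds copied from the example response above.
factors = {
    "avgCpuUsageMax": 60.0,
    "avgMemoryUsageMax": 70.0,
    "avgDiskFreeSpaceMin": 30.0,
    "riskScoreMin": 70,
    "riskScoreMax": 100,
}

stable = DeviceMetrics(45.0, 60.0, 40.0, 85)
busy = DeviceMetrics(82.0, 65.0, 25.0, 90)
print(failed_gates(stable, factors))  # []
print(failed_gates(busy, factors))    # ['avgCpuUsageMax', 'avgDiskFreeSpaceMin']
```

Returning the violated keys rather than a single boolean keeps the reason for a blocked device visible, which is the kind of detail a failure analysis can build on.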
Amazon Nova Pro (default):
- Input: $0.0008 per 1K tokens
- Output: $0.0032 per 1K tokens
Monthly estimate (moderate usage):
- 1000 devices categorized: $0.40
- 50 failure analyses: $0.20
- 100 gating factor requests: $0.12
- Total: ~$0.72/month
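The monthly estimate above is plain arithmetic over the per-call costs and the token prices. A short sketch reproduces it; the dictionary keys and function names are illustrative, and the figures are the ones quoted in this section.

```python
# Per-call cost estimates quoted above (USD).
COST_PER_CALL = {
    "categorize_device": 0.0004,
    "failure_analysis": 0.004,
    "gating_factor_request": 0.0012,
}

# Nova Pro prices quoted above (USD per 1K tokens).
INPUT_PRICE_PER_1K = 0.0008
OUTPUT_PRICE_PER_1K = 0.0032


def monthly_cost(usage: dict[str, int]) -> float:
    """Sum call counts times per-call cost, rounded to cents."""
    total = sum(COST_PER_CALL[name] * count for name, count in usage.items())
    return round(total, 2)


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request from its token counts."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + (
        output_tokens / 1000
    ) * OUTPUT_PRICE_PER_1K


usage = {
    "categorize_device": 1000,
    "failure_analysis": 50,
    "gating_factor_request": 100,
}
print(monthly_cost(usage))  # 0.72
```

Changing the counts in `usage` gives a quick estimate for a different fleet size or analysis volume.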
FlexDeploy includes a powerful modular simulation system for testing deployment scenarios without affecting real devices.
Web UI: Navigate to http://localhost:5173/simulator
The simulator provides:
- Stress Profiles: Pre-configured load levels (Low, Normal, High, Critical)
- Custom Metrics: Set specific CPU, memory, disk usage for rings
- Status Control: Manually control deployment ring progression
- Real-time View: See device metrics and risk scores update instantly
- Start the server: ./run_app.sh
- Open the simulator: http://localhost:5173/simulator
- Select a deployment and ring
- Apply a stress profile or custom metrics
- View updated devices in the table below
| Level | CPU | Memory | Disk | Use Case |
|---|---|---|---|---|
| Low | 25% | 30% | 20% | Healthy baseline |
| Normal | 50% | 55% | 45% | Typical operation |
| High | 75% | 80% | 70% | Heavy load testing |
| Critical | 95% | 92% | 88% | Failure scenarios |
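The profile table can be expressed as a lookup, together with a helper that builds the request body for the stress-profile endpoint. This is a sketch: `STRESS_PROFILES` and `stress_profile_payload` are hypothetical names, the "disk" values are copied from the table as-is, and the payload keys follow the curl example in this section.

```python
# Values copied from the stress profile table above (percent).
STRESS_PROFILES = {
    "low": {"cpu": 25.0, "memory": 30.0, "disk": 20.0},
    "normal": {"cpu": 50.0, "memory": 55.0, "disk": 45.0},
    "high": {"cpu": 75.0, "memory": 80.0, "disk": 70.0},
    "critical": {"cpu": 95.0, "memory": 92.0, "disk": 88.0},
}


def stress_profile_payload(deployment_id: str, ring_id: int, level: str) -> dict:
    """Build the JSON body for POST /api/simulator/stress-profile."""
    normalized = level.lower()
    if normalized not in STRESS_PROFILES:
        valid = ", ".join(STRESS_PROFILES)
        raise ValueError(f"Unknown stress level {level!r}; expected one of: {valid}")
    return {
        "deploymentId": deployment_id,
        "ringId": ring_id,
        "stressLevel": normalized,
    }


payload = stress_profile_payload("DEP-001", 0, "High")
print(payload)
print(STRESS_PROFILES[payload["stressLevel"]])
```

Validating the level on the client side catches a typo such as "hgih" before the request is sent.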
All simulator endpoints are under /api/simulator:
| Method | Endpoint | Description |
|---|---|---|
| POST | /devices | Create/update device |
| POST | /device-metrics | Update device metrics |
| POST | /ring-metrics | Update ring metrics |
| POST | /deployment-status | Update ring status |
| POST | /stress-profile | Apply stress profile |
| GET | /deployment/{id}/ring/{ringId}/devices | Get ring devices |
# Apply stress profile
curl -X POST http://localhost:8000/api/simulator/stress-profile \
-H "Content-Type: application/json" \
-d '{
"deploymentId": "DEP-001",
"ringId": 0,
"stressLevel": "high"
}'
# Update ring metrics
curl -X POST http://localhost:8000/api/simulator/ring-metrics \
-H "Content-Type: application/json" \
-d '{
"deploymentId": "DEP-001",
"ringId": 0,
"avgCpuUsage": 85.0,
"avgMemoryUsage": 90.0,
"avgDiskSpace": 75.0
}'
The simulator is organized into three modules:
- UI: ui/src/pages/Simulator.jsx (React interface)
- Service: server/simulator_service.py (business logic)
- Endpoints: server/main.py (API routes)
For complete simulator documentation, see SIMULATOR.md.
Base URL: http://localhost:8000/api
Devices:
| Method | Endpoint | Description |
|---|---|---|
| GET | /devices | Get all devices |
Deployments:
| Method | Endpoint | Description |
|---|---|---|
| GET | /deployments | Get all deployments |
| POST | /deployments | Create new deployment |
| GET | /deployments/{id} | Get deployment details |
| POST | /deployments/{id}/run | Start deployment |
| POST | /deployments/{id}/stop | Stop deployment |
Rings:
| Method | Endpoint | Description |
|---|---|---|
| GET | /rings | Get all rings |
| PUT | /rings/{id} | Update ring configuration |
Gating Factors:
| Method | Endpoint | Description |
|---|---|---|
| GET | /gating-factors | Get default gating factors |
| PUT | /gating-factors | Update gating factors |
AI:
| Method | Endpoint | Description |
|---|---|---|
| POST | /ai/categorize-devices | AI device categorization |
| POST | /ai/analyze-failure | AI failure analysis |
| POST | /ai/gating-factors | Parse natural language |
| POST | /ai/validate-gating-factors | Validate thresholds |
Dashboard:
| Method | Endpoint | Description |
|---|---|---|
| GET | /dashboard/metrics | Get dashboard metrics |
| GET | /dashboard/device-distribution | Get device distribution |
Create Deployment:
curl -X POST http://localhost:8000/api/deployments \
-H "Content-Type: application/json" \
-d '{
"deploymentName": "Q4 Security Patch",
"status": "Not Started",
"gatingFactorMode": "default"
}'
Update Ring:
curl -X PUT http://localhost:8000/api/rings/1 \
-H "Content-Type: application/json" \
-d '{
"ringId": 1,
"ringName": "Ring 1 - Low Risk",
"categorizationPrompt": "Stable devices with good metrics"
}'
flexDeploy/
├── server/ # Backend (Python/FastAPI)
│ ├── main.py # FastAPI server & API endpoints
│ ├── database.py # SQLite database management
│ ├── bedrock_agents.py # AWS Bedrock AI agents
│ └── config.py # Configuration management
│
├── flexdeploy.db # SQLite database file (root)
│
├── ui/ # Frontend (React/Vite)
│ ├── src/
│ │ ├── App.jsx # Main app component
│ │ ├── main.jsx # Entry point
│ │ ├── api/
│ │ │ └── client.js # API client
│ │ ├── components/
│ │ │ └── Layout.jsx # Layout component
│ │ ├── pages/
│ │ │ ├── Dashboard.jsx # Dashboard page
│ │ │ ├── Devices.jsx # Devices page
│ │ │ ├── Deployments.jsx # Deployments page
│ │ │ ├── DeploymentDetail.jsx
│ │ │ └── Rings.jsx # Rings page
│ │ └── theme/
│ │ └── theme.js # Material-UI theme
│ ├── package.json # npm dependencies
│ └── vite.config.js # Vite configuration
│
├── simulator/ # Deployment simulator
│ ├── run_orchestrator.py # Orchestrator
│ ├── src/
│ │ ├── master/ # Master node
│ │ └── slave/ # Agent nodes
│ └── examples/ # Example scripts
│
├── config.ini # Application configuration
├── config.ini.example # Configuration template
├── pyproject.toml # Python dependencies
├── run_app.sh # Start both UI & server
├── setup_config.sh # Setup configuration
├── deploy_bedrock.sh # Deploy with verification
├── test_bedrock_agents.py # Test AI agents
└── README.md # This file
Problem: Error loading configuration
# Verify config.ini exists
ls -la config.ini
# Create from template if missing
cp config.ini.example config.ini
Problem: AWS credentials not found
# Check credentials file
cat ~/.aws/credentials
# Verify permissions
chmod 600 ~/.aws/credentials
# Test AWS connection
aws sts get-caller-identity
Problem: Database error
# Check database exists
ls -la flexdeploy.db
# If corrupted, restore from backup
cp flexdeploy.db.backup flexdeploy.db
Problem: "Server Not Running" popup
- Start the backend server: python -m server.main
- Verify it's running: curl http://localhost:8000/
Problem: npm packages missing
cd ui
rm -rf node_modules
npm install
Problem: Port 5173 already in use
# Kill existing process
lsof -ti:5173 | xargs kill -9
# Or use different port
npm run dev -- --port 3000
Problem: "AI service not available"
# Test Bedrock connection
python test_bedrock_agents.py
# Check credentials are valid
aws sts get-caller-identity
# Verify Bedrock access
aws bedrock list-foundation-models --region us-east-2
Problem: "AccessDeniedException"
- Enable model access in AWS Console → Bedrock → Model access
- Request access to Amazon Nova Pro and Lite
- Verify IAM permissions include bedrock:InvokeModel
Problem: "Token expired"
- SSO tokens expire after 1-12 hours
- Refresh credentials: aws sso login (or get new credentials from the SSO portal)
Problem: Missing tables
# Check database integrity
sqlite3 flexdeploy.db ".tables"
# If tables missing, run migrations
python server/migrate_data.py
Problem: Slow AI responses
- Switch to Nova Lite for faster responses
- Edit config.ini: bedrock_model_pro = us.amazon.nova-lite-v1:0
Problem: High AWS costs
- Use batch operations instead of individual calls
- Implement caching for repeated queries
- Switch to Nova Lite model
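One way to cache repeated queries is to key responses on a hash of the request payload. The sketch below is an illustration, not part of FlexDeploy: `ResponseCache` is a hypothetical class, and `fake_model` stands in for a Bedrock call so the example runs offline.

```python
import hashlib
import json
from typing import Callable


class ResponseCache:
    """In-memory cache keyed by a SHA-256 hash of the request payload."""

    def __init__(self, call_model: Callable[[dict], dict]):
        self._call_model = call_model
        self._store: dict[str, dict] = {}
        self.misses = 0  # number of times the underlying model was called

    def get(self, payload: dict) -> dict:
        # sort_keys makes the key independent of dictionary ordering.
        serialized = json.dumps(payload, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(serialized).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._call_model(payload)
        return self._store[key]


def fake_model(payload: dict) -> dict:
    """Stand-in for a Bedrock call so the sketch runs offline."""
    return {"echo": payload["naturalLanguageInput"].upper()}


cache = ResponseCache(fake_model)
first = cache.get({"naturalLanguageInput": "low cpu only"})
second = cache.get({"naturalLanguageInput": "low cpu only"})
print(cache.misses)  # 1
```

Because model output varies at a non-zero temperature, a cache like this pins the first answer for a given input; that is usually acceptable for gating factor parsing but worth keeping in mind.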
- Never Commit Credentials
  - config.ini is in .gitignore
  - ~/.aws/credentials is in .gitignore
  - Review files before committing
- Secure File Permissions
  chmod 600 ~/.aws/credentials
  chmod 644 config.ini
- Use AWS SSO
  - Temporary credentials
  - Automatic expiration
  - Centralized access control
- Rotate Credentials
  - SSO tokens expire automatically
  - Manual keys should rotate every 90 days
- Least Privilege IAM
  {
    "Effect": "Allow",
    "Action": ["bedrock:InvokeModel"],
    "Resource": [
      "arn:aws:bedrock:us-east-2::foundation-model/us.amazon.nova-*"
    ]
  }
- Environment Variables (Production)
  export AWS_ACCESS_KEY_ID="..."
  export AWS_SECRET_ACCESS_KEY="..."
- Enable CloudTrail
  - Track all Bedrock API calls
  - Monitor for unusual activity
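The file-permission recommendation can also be verified from Python, for example in a startup check. This is a sketch for POSIX systems only; `is_owner_only` is a hypothetical helper, and the demonstration uses a temporary file rather than the real ~/.aws/credentials.

```python
import os
import stat
import tempfile


def is_owner_only(path: str) -> bool:
    """Return True when group and others have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Demonstrate on a temporary file instead of the real credentials file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

os.chmod(demo_path, 0o644)
loose = is_owner_only(demo_path)   # group/others can read
os.chmod(demo_path, 0o600)
tight = is_owner_only(demo_path)   # owner read/write only
os.remove(demo_path)

print(loose, tight)  # False True
```

In a real check you would pass `os.path.expanduser("~/.aws/credentials")` and log a warning when the function returns False.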
For production, use IAM roles instead of credentials:
# server/bedrock_agents.py
import boto3
# No credentials needed: boto3 picks up the IAM role automatically
client = boto3.client('bedrock-runtime', region_name='us-east-2')
Deploy on:
- AWS EC2 with IAM role
- AWS ECS with task role
- AWS Lambda with execution role
# 1. Install dependencies
uv pip install -e .
cd ui && npm install && cd ..
# 2. Setup configuration
./setup_config.sh
# 3. Edit AWS credentials
nano ~/.aws/credentials
# 4. Test Bedrock connection
python test_bedrock_agents.py
# 5. Run application
./run_app.sh
# 6. Open browser
# http://localhost:5173
- Check logs: Server logs show detailed error messages
- Test components: Run test_bedrock_agents.py to verify AI agents
- Review configuration: Ensure config.ini and ~/.aws/credentials are correct
- Verify AWS: Check Bedrock model access and IAM permissions
# View configuration
python -c "from server.config import get_config; get_config().print_config()"
# Test database
sqlite3 flexdeploy.db "SELECT COUNT(*) FROM devices;"
# Check AWS credentials
aws configure list
# View server logs
python -m server.main 2>&1 | tee server.log
# Reinstall packages
uv pip install -e . --force-reinstall
cd ui && npm install --force && cd ..
- AWS Bedrock for AI infrastructure
- Amazon Nova models for intelligent analysis
- Material-UI for beautiful components
- FastAPI for high-performance backend
- React for modern frontend