FlexDeploy - AI-Powered Deployment Orchestrator

Make your deployments reliable with AI-driven ring-based deployment strategies powered by AWS Bedrock and Amazon Nova models.

📋 Table of Contents

Overview
Architecture
Features
Prerequisites
Installation
Configuration
Running the Application
AI Agents
Simulation
API Documentation
Project Structure
Troubleshooting
Security

💡 Quick Reference: See COMMANDS.md for common commands and quick fixes.

🎯 Overview

FlexDeploy is an intelligent deployment orchestration platform that uses AI to manage device categorization, analyze deployment failures, and optimize gating factors for ring-based deployment strategies.

Key Capabilities:

🤖 AI-powered device ring categorization
📊 Automated deployment failure analysis
🎛️ Natural language gating factor configuration
📈 Real-time deployment monitoring
🔄 Ring-based progressive rollout

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         FlexDeploy System                        │
└─────────────────────────────────────────────────────────────────┘
                                │
                ┌───────────────┴───────────────┐
                │                               │
        ┌───────▼────────┐            ┌────────▼────────┐
        │   Frontend UI  │            │  Backend Server │
        │   (React App)  │            │   (FastAPI)     │
        │                │            │                 │
        │  - Material-UI │◄──REST────►│  - SQLite DB    │
        │  - Recharts    │   API      │  - Bedrock      │
        │  - Vite        │            │    Agents       │
        └────────────────┘            └─────────┬───────┘
         Port: 5173                             │
                                                │
                    ┌───────────────────────────┴────────────┐
                    │                                        │
            ┌───────▼────────┐                      ┌────────▼────────┐
            │  AWS Bedrock   │                      │   SQLite DB     │
            │                │                      │                 │
            │  Nova Pro v1   │                      │  - Devices      │
            │  Nova Lite v1  │                      │  - Deployments  │
            │                │                      │  - Rings        │
            │  Region:       │                      │  - Gating       │
            │  us-east-2     │                      │    Factors      │
            └────────────────┘                      └─────────────────┘
                    ▲
                    │
            ┌───────┴────────┐
            │  Credentials   │
            │ ~/.aws/        │
            │  credentials   │
            └────────────────┘

Component Details

Frontend (React + Material-UI)

Dashboard: Overview metrics and device distribution
Devices: Device inventory with risk scores
Deployments: Deployment management and execution
Rings: Ring configuration and AI categorization
Deployment Details: Ring-wise deployment status and failure analysis

Backend (FastAPI + Python)

REST API: Device, deployment, and ring management
AI Agents: Three specialized agents for intelligent operations
Database: SQLite for persistent storage
Configuration: Centralized config management

AI Agents (AWS Bedrock + Amazon Nova)

Ring Categorization Agent
- Pipeline: prompt → SQL agent → reasoning agent → result
- Automatically assigns devices to deployment rings
Deployment Failure Agent
- Pipeline: gating factor → prompt → result
- Analyzes failures and provides recommendations
Gating Factor Agent
- Pipeline: user text → prompt → gating factor → result
- Converts natural language to numeric thresholds

✨ Features

Device Management

Device inventory with comprehensive metrics
Automatic risk score calculation
Ring assignment (manual or AI-powered)
Filter and search capabilities

Deployment Management

Create deployments with configurable gating factors
Ring-based progressive rollout
Real-time status tracking
Start/stop deployment control

Ring Configuration

4 default rings (Canary, Low Risk, High Risk, VIP)
Custom categorization prompts
Global gating factor configuration
AI-powered device categorization

AI-Powered Features

Intelligent device ring assignment
Automated failure root cause analysis
Natural language gating factor parsing
Validation and recommendations

Dashboard & Monitoring

Total devices, deployments, and active rings
Device distribution visualization
Real-time metrics

📦 Prerequisites

Required Software

Python 3.11 or higher
```
python3 --version
```
Node.js 18 or higher
```
node --version
```
npm or yarn
```
npm --version
```

uv (Python package manager)

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

AWS Account with Bedrock Access
- AWS account with IAM permissions
- Amazon Nova models enabled
- Valid credentials

AWS Requirements

AWS Bedrock Access: Must be enabled in your account
Model Access: Amazon Nova Pro and Nova Lite
IAM Permissions: bedrock:InvokeModel
Region: us-east-2 (recommended for Bedrock)

🚀 Installation

Step 1: Clone Repository

git clone https://github.com/mmohanram13/flexDeploy.git
cd flexDeploy

Step 2: Install Backend Dependencies

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Python packages
uv pip install -e .

# Alternatively, use pip
pip install -e .

Backend packages installed:

fastapi >= 0.115.0
uvicorn >= 0.32.0
boto3 >= 1.35.0
botocore >= 1.35.0
strands-agents >= 1.12.0
aiohttp >= 3.13.0

Step 3: Install Frontend Dependencies

cd ui
npm install
cd ..

Frontend packages installed:

react 19.1.1
react-router-dom 7.9.4
@mui/material 7.3.4
@mui/icons-material 7.3.4
recharts 3.2.1
vite 5.0.4

Step 4: Setup Configuration

# Run interactive setup
./setup_config.sh

This creates:

config.ini - Application settings (SSO, regions, models)
~/.aws/credentials - AWS credentials template

Step 5: Configure AWS

Option A: Using AWS SSO (Recommended)

Edit config.ini:

[aws]
sso_start_url = https://your-org.awsapps.com/start/#
sso_region = us-east-2
bedrock_region = us-east-2

Get SSO credentials:
- Visit your SSO start URL
- Login and select account
- Click "Command line or programmatic access"
- Copy credentials to ~/.aws/credentials

Option B: Using IAM User Keys

Create IAM user with Bedrock permissions
Generate access keys
Add to ~/.aws/credentials:

[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
aws_session_token = IQoJb3JpZ2luX2VjEDY...  # Optional, required for SSO

Step 6: Enable Bedrock Model Access

Login to AWS Console
Navigate to AWS Bedrock → Model access
Request access to:
- Amazon Nova Pro (us.amazon.nova-pro-v1:0)
- Amazon Nova Lite (us.amazon.nova-lite-v1:0)
Wait for approval (usually instant)

Step 7: Verify Installation

# Test AWS Bedrock connection
python test_bedrock_agents.py

Expected output:

✓ Connection successful!
✓ Ring Categorization: PASSED
✓ Failure Analysis: PASSED
✓ Gating Factor Parsing: PASSED
✓ Gating Factor Validation: PASSED

⚙️ Configuration

Configuration Files

config.ini (Application Settings)

Located in project root. Contains:

[aws]
# AWS SSO Configuration
sso_start_url = https://superopsglobalhackathon.awsapps.com/start/#
sso_region = us-east-2

# AWS Bedrock Configuration
bedrock_region = us-east-2
bedrock_model_pro = us.amazon.nova-pro-v1:0
bedrock_model_lite = us.amazon.nova-lite-v1:0

# Model Settings
default_max_tokens = 2000
default_temperature = 0.7

[server]
host = 0.0.0.0
port = 8000

[database]
db_name = flexdeploy.db

~/.aws/credentials (AWS Credentials)

Located in home directory. Contains:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
aws_session_token = YOUR_SESSION_TOKEN  # Required for SSO

Important:

Never commit credentials to git
Session tokens expire (refresh regularly with SSO)
File permissions should be 600: chmod 600 ~/.aws/credentials

🎮 Running the Application

Method 1: Run Both Services (Recommended)

# Start both UI and server simultaneously
./run_app.sh

This starts:

Backend server on http://localhost:8000
Frontend UI on http://localhost:5173

Method 2: Run Separately

Start Backend Server

python -m server.main

Output:

✓ Database connected
✓ Configuration loaded
  - SSO Region: us-east-2
  - Bedrock Region: us-east-2
✓ AWS Bedrock agents initialized
  - Credentials from: ~/.aws/credentials
  - Configuration from: config.ini
INFO: Uvicorn running on http://0.0.0.0:8000

Start Frontend UI

cd ui
npm run dev

Output:

VITE v5.0.4  ready in 1234 ms

➜  Local:   http://localhost:5173/
➜  Network: use --host to expose

Accessing the Application

Open your browser to: http://localhost:5173

If the server is not running, you'll see a non-closable popup warning.

🤖 AI Agents

1. Ring Categorization Agent

Purpose: Automatically categorize devices into deployment rings

Pipeline: prompt → SQL agent → reasoning agent → result

Usage:

curl -X POST http://localhost:8000/api/ai/categorize-devices \
  -H "Content-Type: application/json" \
  -d '{"deviceIds": ["DEV-001", "DEV-002"]}'

Response:

{
  "message": "Successfully categorized 2 devices",
  "categorizations": [
    {
      "deviceId": "DEV-001",
      "assignedRing": 1,
      "reasoning": "Device has stable metrics with CPU at 45% and memory at 60%, suitable for Ring 1 (Low Risk)"
    }
  ]
}

Cost: ~$0.0004 per device

2. Deployment Failure Agent

Purpose: Analyze why deployments failed

Pipeline: gating factor → prompt → result

Usage:

curl -X POST http://localhost:8000/api/ai/analyze-failure \
  -H "Content-Type: application/json" \
  -d '{
    "deploymentId": "DEP-001",
    "ringName": "Ring 1 - Low Risk Devices"
  }'

Response:

{
  "deploymentId": "DEP-001",
  "ringName": "Ring 1 - Low Risk Devices",
  "analysis": "Primary failure reason: 15% of devices exceeded the CPU threshold of 80%..."
}

Cost: ~$0.004 per analysis

3. Gating Factor Agent

Purpose: Convert natural language to numeric gating factors

Pipeline: user text → prompt → gating factor → result

Usage:

curl -X POST http://localhost:8000/api/ai/gating-factors \
  -H "Content-Type: application/json" \
  -d '{
    "naturalLanguageInput": "Deploy only to very stable devices with low CPU usage"
  }'

Response:

{
  "gatingFactors": {
    "avgCpuUsageMax": 60.0,
    "avgMemoryUsageMax": 70.0,
    "avgDiskFreeSpaceMin": 30.0,
    "riskScoreMin": 70,
    "riskScoreMax": 100
  },
  "explanation": "Conservative settings: CPU<60%, Memory<70%, Disk>30% for stable devices"
}

Cost: ~$0.0012 per request

Cost Estimates

Amazon Nova Pro (default):

Input: $0.0008 per 1K tokens
Output: $0.0032 per 1K tokens

Monthly estimate (moderate usage):

1000 devices categorized: $0.40
50 failure analyses: $0.20
100 gating factor requests: $0.12
Total: ~$0.72/month

🧪 Simulation

FlexDeploy includes a powerful modular simulation system for testing deployment scenarios without affecting real devices.

Access the Simulator

Web UI: Navigate to http://localhost:5173/simulator

The simulator provides:

Stress Profiles: Pre-configured load levels (Low, Normal, High, Critical)
Custom Metrics: Set specific CPU, memory, disk usage for rings
Status Control: Manually control deployment ring progression
Real-time View: See device metrics and risk scores update instantly

Quick Start

Start the server:

./run_app.sh

Open the simulator:

http://localhost:5173/simulator

Select a deployment and ring
Apply a stress profile or custom metrics
View updated devices in the table below

Stress Profiles

Level	CPU	Memory	Disk	Use Case
Low	25%	30%	20%	Healthy baseline
Normal	50%	55%	45%	Typical operation
High	75%	80%	70%	Heavy load testing
Critical	95%	92%	88%	Failure scenarios

API Endpoints

All simulator endpoints are under /api/simulator:

Method	Endpoint	Description
POST	`/devices`	Create/update device
POST	`/device-metrics`	Update device metrics
POST	`/ring-metrics`	Update ring metrics
POST	`/deployment-status`	Update ring status
POST	`/stress-profile`	Apply stress profile
GET	`/deployment/{id}/ring/{ringId}/devices`	Get ring devices

Example API Usage

# Apply stress profile
curl -X POST http://localhost:8000/api/simulator/stress-profile \
  -H "Content-Type: application/json" \
  -d '{
    "deploymentId": "DEP-001",
    "ringId": 0,
    "stressLevel": "high"
  }'

# Update ring metrics
curl -X POST http://localhost:8000/api/simulator/ring-metrics \
  -H "Content-Type: application/json" \
  -d '{
    "deploymentId": "DEP-001",
    "ringId": 0,
    "avgCpuUsage": 85.0,
    "avgMemoryUsage": 90.0,
    "avgDiskSpace": 75.0
  }'

Architecture

The simulator is modularly structured:

UI: ui/src/pages/Simulator.jsx - React interface
Service: server/simulator_service.py - Business logic
Endpoints: server/main.py - API routes

Detailed Documentation

For complete simulator documentation, see SIMULATOR.md

📚 API Documentation

Base URL

http://localhost:8000/api

Endpoints

Devices

Method	Endpoint	Description
GET	`/devices`	Get all devices

Deployments

Method	Endpoint	Description
GET	`/deployments`	Get all deployments
POST	`/deployments`	Create new deployment
GET	`/deployments/{id}`	Get deployment details
POST	`/deployments/{id}/run`	Start deployment
POST	`/deployments/{id}/stop`	Stop deployment

Rings

Method	Endpoint	Description
GET	`/rings`	Get all rings
PUT	`/rings/{id}`	Update ring configuration

Gating Factors

Method	Endpoint	Description
GET	`/gating-factors`	Get default gating factors
PUT	`/gating-factors`	Update gating factors

AI Agents

Method	Endpoint	Description
POST	`/ai/categorize-devices`	AI device categorization
POST	`/ai/analyze-failure`	AI failure analysis
POST	`/ai/gating-factors`	Parse natural language
POST	`/ai/validate-gating-factors`	Validate thresholds

Dashboard

Method	Endpoint	Description
GET	`/dashboard/metrics`	Get dashboard metrics
GET	`/dashboard/device-distribution`	Get device distribution

Example API Calls

Create Deployment:

curl -X POST http://localhost:8000/api/deployments \
  -H "Content-Type: application/json" \
  -d '{
    "deploymentName": "Q4 Security Patch",
    "status": "Not Started",
    "gatingFactorMode": "default"
  }'

Update Ring:

curl -X PUT http://localhost:8000/api/rings/1 \
  -H "Content-Type: application/json" \
  -d '{
    "ringId": 1,
    "ringName": "Ring 1 - Low Risk",
    "categorizationPrompt": "Stable devices with good metrics"
  }'

📁 Project Structure

flexDeploy/
├── server/                      # Backend (Python/FastAPI)
│   ├── main.py                 # FastAPI server & API endpoints
│   ├── database.py             # SQLite database management
│   ├── bedrock_agents.py       # AWS Bedrock AI agents
│   └── config.py               # Configuration management
│
├── flexdeploy.db                # SQLite database file (root)
│
├── ui/                          # Frontend (React/Vite)
│   ├── src/
│   │   ├── App.jsx             # Main app component
│   │   ├── main.jsx            # Entry point
│   │   ├── api/
│   │   │   └── client.js       # API client
│   │   ├── components/
│   │   │   └── Layout.jsx      # Layout component
│   │   ├── pages/
│   │   │   ├── Dashboard.jsx   # Dashboard page
│   │   │   ├── Devices.jsx     # Devices page
│   │   │   ├── Deployments.jsx # Deployments page
│   │   │   ├── DeploymentDetail.jsx
│   │   │   └── Rings.jsx       # Rings page
│   │   └── theme/
│   │       └── theme.js        # Material-UI theme
│   ├── package.json            # npm dependencies
│   └── vite.config.js          # Vite configuration
│
├── simulator/                   # Deployment simulator
│   ├── run_orchestrator.py    # Orchestrator
│   ├── src/
│   │   ├── master/             # Master node
│   │   └── slave/              # Agent nodes
│   └── examples/               # Example scripts
│
├── config.ini                   # Application configuration
├── config.ini.example          # Configuration template
├── pyproject.toml              # Python dependencies
├── run_app.sh                  # Start both UI & server
├── setup_config.sh             # Setup configuration
├── deploy_bedrock.sh           # Deploy with verification
├── test_bedrock_agents.py      # Test AI agents
└── README.md                   # This file

🔧 Troubleshooting

Server Won't Start

Problem: Error loading configuration

# Verify config.ini exists
ls -la config.ini

# Create from template if missing
cp config.ini.example config.ini

Problem: AWS credentials not found

# Check credentials file
cat ~/.aws/credentials

# Verify permissions
chmod 600 ~/.aws/credentials

# Test AWS connection
aws sts get-caller-identity

Problem: Database error

# Check database exists
ls -la flexdeploy.db

# If corrupted, restore from backup
cp flexdeploy.db.backup flexdeploy.db

UI Won't Load

Problem: "Server Not Running" popup

Start the backend server: python -m server.main
Verify it's running: curl http://localhost:8000/

Problem: npm packages missing

cd ui
rm -rf node_modules
npm install

Problem: Port 5173 already in use

# Kill existing process
lsof -ti:5173 | xargs kill -9

# Or use different port
npm run dev -- --port 3000

AI Agents Not Working

Problem: "AI service not available"

# Test Bedrock connection
python test_bedrock_agents.py

# Check credentials are valid
aws sts get-caller-identity

# Verify Bedrock access
aws bedrock list-foundation-models --region us-east-2

Problem: "AccessDeniedException"

Enable model access in AWS Console → Bedrock → Model access
Request access to Amazon Nova Pro and Lite
Verify IAM permissions include bedrock:InvokeModel

Problem: "Token expired"

SSO tokens expire after 1-12 hours

Refresh credentials:

aws sso login
# Or get new credentials from SSO portal

Database Issues

Problem: Missing tables

# Check database integrity
sqlite3 flexdeploy.db ".tables"

# If tables missing, run migrations
python server/migrate_data.py

Performance Issues

Problem: Slow AI responses

Switch to Nova Lite for faster responses

Edit config.ini:

bedrock_model_pro = us.amazon.nova-lite-v1:0

Problem: High AWS costs

Use batch operations instead of individual calls
Implement caching for repeated queries
Switch to Nova Lite model

🔒 Security

Best Practices

Never Commit Credentials
- config.ini is in .gitignore
- ~/.aws/credentials is in .gitignore
- Review files before committing

Secure File Permissions

chmod 600 ~/.aws/credentials
chmod 644 config.ini

Use AWS SSO
- Temporary credentials
- Automatic expiration
- Centralized access control
Rotate Credentials
- SSO tokens expire automatically
- Manual keys should rotate every 90 days

Least Privilege IAM

{
  "Effect": "Allow",
  "Action": ["bedrock:InvokeModel"],
  "Resource": [
    "arn:aws:bedrock:us-east-2::foundation-model/us.amazon.nova-*"
  ]
}

Environment Variables (Production)

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

Enable CloudTrail
- Track all Bedrock API calls
- Monitor for unusual activity

Production Deployment

For production, use IAM roles instead of credentials:

# server/bedrock_agents.py
# No credentials needed - uses IAM role automatically
boto3.client('bedrock-runtime', region_name='us-east-2')

Deploy on:

AWS EC2 with IAM role
AWS ECS with task role
AWS Lambda with execution role

🚀 Quick Start (TL;DR)

# 1. Install dependencies
uv pip install -e .
cd ui && npm install && cd ..

# 2. Setup configuration
./setup_config.sh

# 3. Edit AWS credentials
nano ~/.aws/credentials

# 4. Test Bedrock connection
python test_bedrock_agents.py

# 5. Run application
./run_app.sh

# 6. Open browser
# http://localhost:5173

📞 Support

Getting Help

Check logs: Server logs show detailed error messages
Test components: Run test_bedrock_agents.py to verify AI agents
Review configuration: Ensure config.ini and ~/.aws/credentials are correct
Verify AWS: Check Bedrock model access and IAM permissions

Common Commands

# View configuration
python -c "from server.config import get_config; get_config().print_config()"

# Test database
sqlite3 flexdeploy.db "SELECT COUNT(*) FROM devices;"

# Check AWS credentials
aws configure list

# View server logs
python -m server.main 2>&1 | tee server.log

# Reinstall packages
uv pip install -e . --force-reinstall
cd ui && npm install --force && cd ..

🙏 Acknowledgments

AWS Bedrock for AI infrastructure
Amazon Nova models for intelligent analysis
Material-UI for beautiful components
FastAPI for high-performance backend
React for modern frontend

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.github		.github
server		server
ui		ui
.gitignore		.gitignore
.python-version		.python-version
COMMANDS.md		COMMANDS.md
README.md		README.md
SIMULATOR.md		SIMULATOR.md
config.ini.example		config.ini.example
deploy_bedrock.sh		deploy_bedrock.sh
install.bat		install.bat
install.sh		install.sh
package-lock.json		package-lock.json
pyproject.toml		pyproject.toml
run_app.bat		run_app.bat
run_app.sh		run_app.sh
setup_config.bat		setup_config.bat
setup_config.sh		setup_config.sh
test_api.sh		test_api.sh
test_bedrock_agents.py		test_bedrock_agents.py
uv.lock		uv.lock

mmohanram13/flexDeploy

Folders and files

Latest commit

History

Repository files navigation

FlexDeploy - AI-Powered Deployment Orchestrator

📋 Table of Contents

🎯 Overview

🏗️ Architecture

Component Details

Frontend (React + Material-UI)

Backend (FastAPI + Python)

AI Agents (AWS Bedrock + Amazon Nova)

✨ Features

Device Management

Deployment Management

Ring Configuration

AI-Powered Features

Dashboard & Monitoring

📦 Prerequisites

Required Software

AWS Requirements

🚀 Installation

Step 1: Clone Repository

Step 2: Install Backend Dependencies

Step 3: Install Frontend Dependencies

Step 4: Setup Configuration

Step 5: Configure AWS

Option A: Using AWS SSO (Recommended)

Option B: Using IAM User Keys

Step 6: Enable Bedrock Model Access

Step 7: Verify Installation

⚙️ Configuration

Configuration Files

config.ini (Application Settings)

~/.aws/credentials (AWS Credentials)

🎮 Running the Application

Method 1: Run Both Services (Recommended)

Method 2: Run Separately

Start Backend Server

Start Frontend UI

Accessing the Application

🤖 AI Agents

1. Ring Categorization Agent

2. Deployment Failure Agent

3. Gating Factor Agent

Cost Estimates

🧪 Simulation

Access the Simulator

Quick Start

Stress Profiles

API Endpoints

Example API Usage

Architecture

Detailed Documentation

📚 API Documentation

Base URL

Endpoints

Devices

Deployments

Rings

Gating Factors

AI Agents

Dashboard

Example API Calls

📁 Project Structure

🔧 Troubleshooting

Server Won't Start

UI Won't Load

AI Agents Not Working

Database Issues

Performance Issues

🔒 Security

Best Practices

Production Deployment

🚀 Quick Start (TL;DR)

📞 Support

Getting Help

Common Commands

🙏 Acknowledgments