🤖 Production-grade agentic AI system for autonomous issue detection and resolution during e-commerce platform migrations.
This system demonstrates proper agent behavior that goes far beyond a single LLM call:
✅ State Management - Persistent state across the observe-reason-decide-act loop
✅ Multi-Step Reasoning - Pattern detection → Root cause → Risk assessment → Action planning
✅ Tool Orchestration - 8+ specialized tools working together autonomously
✅ Feedback Loops - Learning from outcomes and adapting behavior
✅ Safety Controls - Multiple layers including safe mode, circuit breakers, and human oversight
Get the complete system running in under 10 minutes:
cd migrationguard-ai
REM Start infrastructure
setup.cmd
REM Run demo
uv run python demo_agent_system.py
See it in action: The demo walks through authentication error detection → pattern analysis → root cause reasoning → automated ticket creation, with full state tracking and feedback loops.
📖 Detailed Guide: QUICKSTART.md
┌─────────────────────────────────────────────────────────────────┐
│                       AGENT ORCHESTRATOR                        │
│                (Observe-Reason-Decide-Act Loop)                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
           ▼                     ▼                     ▼
   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
   │   OBSERVE    │      │    REASON    │      │    DECIDE    │
   │              │      │              │      │              │
   │ • Signal     │      │ • Pattern    │      │ • Risk       │
   │   Ingestion  │─────▶│   Detection  │─────▶│   Assessment │
   │ • Normalize  │      │ • Root Cause │      │ • Action     │
   │ • Track      │      │   Analysis   │      │   Selection  │
   └──────────────┘      └──────────────┘      └──────────────┘
                                                       │
                                                       ▼
                                               ┌──────────────┐
                                               │     ACT      │
                                               │              │
                                               │ • Execute    │
                                               │ • Track      │
                                               │ • Learn      │◀─┐
                                               └──────────────┘  │
                                                       │         │
                                                       ▼         │
                                               ┌──────────────┐  │
                                               │   FEEDBACK   │  │
                                               │     LOOP     │──┘
                                               └──────────────┘
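The loop in the diagram can be pictured as a chain of small functions that pass one state object along. The following is a minimal sketch, not the project's orchestrator: `AgentState`, its fields, the step functions, and the "two or more 401 signals" rule are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    """State carried through one pass of the loop (field names are illustrative)."""
    signals: list = field(default_factory=list)
    pattern: Optional[str] = None
    root_cause: Optional[str] = None
    action: Optional[str] = None
    history: list = field(default_factory=list)


def observe(state: AgentState, raw_signals: list) -> AgentState:
    # Normalize every raw signal to the same shape: a source and a string code.
    state.signals = [
        {"source": s["source"], "code": str(s.get("code", "")).strip()}
        for s in raw_signals
    ]
    state.history.append("observe")
    return state


def reason(state: AgentState) -> AgentState:
    # Toy rule (an assumption): two or more 401 signals form an auth failure pattern.
    auth_errors = [s for s in state.signals if s["code"] == "401"]
    if len(auth_errors) >= 2:
        state.pattern = "auth_failure"
        state.root_cause = "authentication_error"
    state.history.append("reason")
    return state


def decide(state: AgentState) -> AgentState:
    state.action = "create_support_ticket" if state.root_cause else "no_action"
    state.history.append("decide")
    return state


def act(state: AgentState) -> AgentState:
    # A real agent would call a tool here; this sketch only records the step.
    state.history.append(f"act:{state.action}")
    return state


def run_once(raw_signals: list) -> AgentState:
    state = observe(AgentState(), raw_signals)
    for step in (reason, decide, act):
        state = step(state)
    return state
```

Because every step receives and returns the same state object, the `history` list doubles as a simple trace of what the loop did on each pass.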
- Multi-source signal ingestion (API errors, support tickets, webhooks)
- Real-time normalization and enrichment
- Time-series storage with TimescaleDB
- Pattern detection across signals using Elasticsearch
- Root cause analysis with Google Gemini 2.5 Flash (+ rule-based fallback)
- Evidence gathering and confidence scoring (75-92% confidence)
- Automated risk assessment (low/medium/high)
- Approval requirements for high-risk actions
- Safety controls (safe mode, circuit breakers)
- Rate limiting and retry logic
- Graceful degradation on failures
- Comprehensive audit trail
- Outcome tracking and analysis
- Confidence calibration from results
- Adaptive behavior based on feedback
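One way to picture "confidence calibration from results" is to blend the confidence the model reports with how often past actions actually succeeded. This is a sketch under assumptions: the `OutcomeTracker` class, the Laplace smoothing, and the 50/50 blend weight are illustrative choices, not the project's documented method.

```python
from dataclasses import dataclass


@dataclass
class OutcomeTracker:
    """Counts action outcomes for one pattern type (hypothetical helper)."""
    successes: int = 0
    failures: int = 0

    def record(self, success: bool) -> None:
        if success:
            self.successes += 1
        else:
            self.failures += 1

    def observed_success_rate(self) -> float:
        # Laplace smoothing: with no outcomes recorded the rate is 0.5, not undefined.
        return (self.successes + 1) / (self.successes + self.failures + 2)

    def calibrate(self, raw_confidence: float, weight: float = 0.5) -> float:
        # Weighted blend of the stated confidence and the observed success rate,
        # clamped to the [0, 1] range.
        blended = (1 - weight) * raw_confidence + weight * self.observed_success_rate()
        return max(0.0, min(1.0, blended))
```

With this scheme, a pattern whose actions keep failing sees its effective confidence pulled down over time, even if the model continues to report high confidence.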
- Safe Mode: Automatic activation on critical errors
- Circuit Breakers: Fault tolerance for external services
- Graceful Degradation: Fallback mechanisms (Gemini → rules, Elasticsearch → PostgreSQL, Kafka → Redis)
- Human Oversight: Approval workflows and manual controls
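The circuit breaker and graceful degradation items above could work together roughly as follows. This is a simplified sketch with assumed names and defaults (`CircuitBreaker`, `failure_threshold`, `reset_after_s`); it closes the breaker fully after the cool-down instead of modelling a separate half-open state.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Routes calls to a fallback after repeated primary failures (illustrative)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.reset_after_s:
            # Cool-down elapsed: close the breaker so the primary is tried again.
            self._opened_at = None
            self._failures = 0
            return False
        return True

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            # Breaker is open: skip the primary service entirely.
            return fallback()
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        return result
```

A caller would pass the LLM-based analysis as `primary` and the rule-based analysis as `fallback`, so an outage of the external service degrades the answer quality instead of stopping the loop.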
200+ Tests with 85%+ Coverage
- ✅ 150+ Unit Tests (core components, services, integrations)
- ✅ 50+ Property-Based Tests (RBAC, redaction, API, decisions, patterns)
- ✅ Integration Tests (error handling, end-to-end flows)
- ✅ All tests passing with comprehensive coverage
uv run pytest tests/unit/ -v
- Backend: Python 3.11+, FastAPI, Pydantic
- AI: Google Gemini 2.5 Flash (FREE tier, 15 req/min) with rule-based fallback
- Agent Framework: Custom orchestration with state management and feedback loops
- Database: PostgreSQL + TimescaleDB (time-series)
- Cache: Redis (caching, rate limiting, buffering)
- Search: Elasticsearch (pattern matching, full-text search)
- Streaming: Apache Kafka (event streaming, async processing)
- Metrics: Prometheus + Grafana
- Logs: Structured logging with ELK stack support
- Visualization: Kibana for log exploration
- Containers: Docker + Docker Compose
- Orchestration: Kubernetes-ready
- CI/CD: GitHub Actions ready
Input: 3 signals (2 API 401 errors + 1 support ticket)
Agent Behavior:
- 🔭 Observe: Ingest and normalize signals
- 🔍 Detect: Identify auth failure pattern (confidence: 0.85)
- 🧠 Reason: Analyze root cause → "authentication_error"
- ⚖️ Decide: Select "create_support_ticket" (risk: low)
- ⚡ Act: Create ticket with troubleshooting steps
- 🔄 Learn: Track outcome, calibrate confidence
Output: Support ticket created with authentication guidance
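The Decide step in this scenario can be sketched as a small gating function. Assumptions are labeled in the code: the `ACTION_RISK` table and the `rollback_migration` action are hypothetical, and the two defaults mirror the `AGENT_CONFIDENCE_THRESHOLD=0.7` and `AGENT_HIGH_RISK_APPROVAL_REQUIRED=true` settings from the configuration section. Treating low confidence as a reason to require approval is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical risk table; only "create_support_ticket" (low) comes from the scenario.
ACTION_RISK = {
    "create_support_ticket": "low",
    "rollback_migration": "high",  # illustrative action name
}


@dataclass(frozen=True)
class Decision:
    action: str
    risk: str
    requires_approval: bool


def gate_action(
    action: str,
    confidence: float,
    confidence_threshold: float = 0.7,
    high_risk_approval_required: bool = True,
) -> Decision:
    # Unknown actions are treated as high risk so they never run unreviewed.
    risk = ACTION_RISK.get(action, "high")
    low_confidence = confidence < confidence_threshold
    needs_approval = low_confidence or (risk == "high" and high_risk_approval_required)
    return Decision(action=action, risk=risk, requires_approval=needs_approval)
```

For the scenario's inputs, `gate_action("create_support_ticket", 0.85)` yields a low-risk decision that does not require approval, which matches the automatic ticket creation described above.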
Trigger: Confidence drift detected (expected: 0.90, actual: 0.75)
Agent Behavior:
- 🛡️ Safe mode automatically activated
- ⏸️ All actions require human approval
- 📋 Actions queued for review
- 🔔 Operator notified
- ✅ Manual deactivation by authorized operator
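The safe mode behavior listed above might be modelled like this. It is a sketch only: `SafeModeController`, the 0.10 drift tolerance, and the return strings are assumptions made for illustration, and queued actions simply stay in the review queue after deactivation.

```python
from dataclasses import dataclass, field


@dataclass
class SafeModeController:
    """Holds actions for human review while safe mode is active (illustrative)."""
    drift_tolerance: float = 0.10  # assumed tolerance, not taken from the project
    active: bool = False
    review_queue: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

    def check_drift(self, expected: float, actual: float) -> None:
        drift = abs(expected - actual)
        if drift > self.drift_tolerance and not self.active:
            self.active = True
            # Stand-in for notifying an operator through a real channel.
            self.notifications.append(f"safe mode activated, drift={drift:.2f}")

    def submit(self, action: str) -> str:
        if self.active:
            self.review_queue.append(action)
            return "queued_for_review"
        return "executed"

    def deactivate(self, operator_is_authorized: bool) -> None:
        if not operator_is_authorized:
            raise PermissionError("only an authorized operator can deactivate safe mode")
        self.active = False
```

With the trigger values from this scenario (expected 0.90, actual 0.75), the drift of 0.15 exceeds the assumed tolerance, so every submitted action is queued until an authorized operator deactivates safe mode.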
- QUICKSTART.md - Get started in 10 minutes
- INFRASTRUCTURE_SETUP.md - Detailed infrastructure guide
- README_DEMO.md - Demo explanation and agent behavior
- HACKATHON_SUBMISSION.md - Complete submission details
- DEVELOPMENT.md - Development guide
- API Docs: http://localhost:8000/docs (when running)
- Docker Desktop - Download
- Python 3.11+ with uv - Install uv
- Git (for cloning)
cd migrationguard-ai
setup.cmd
This will:
- ✅ Start all infrastructure services (PostgreSQL, Redis, Kafka, Elasticsearch)
- ✅ Run database migrations
- ✅ Create Kafka topics and Elasticsearch indices
- ✅ Verify connectivity
REM 1. Start infrastructure
docker-compose up -d
REM 2. Wait for services (30 seconds)
timeout /t 30
REM 3. Check connectivity
uv run python scripts/check_infrastructure.py
REM 4. Run migrations
uv run alembic upgrade head
REM 5. Setup Kafka and Elasticsearch
uv run python scripts/setup_infrastructure.py
See the complete agent in action:
uv run python demo_agent_system.py
Run the unit tests:
uv run pytest tests/unit/ -v
Start the API server:
uv run uvicorn src.migrationguard_ai.api.app:app --reload
API available at: http://localhost:8000
API docs: http://localhost:8000/docs
cd frontend
npm install
npm run dev
Frontend available at: http://localhost:3000
| Service | URL | Credentials |
|---|---|---|
| API | http://localhost:8000 | - |
| API Docs | http://localhost:8000/docs | - |
| Grafana | http://localhost:3001 | admin/admin |
| Kibana | http://localhost:5601 | - |
| Prometheus | http://localhost:9090 | - |
| Elasticsearch | http://localhost:9200 | - |
migrationguard-ai/
├── src/migrationguard_ai/
│ ├── agent/ # Agent orchestration (state, graph)
│ ├── api/ # FastAPI REST API
│ ├── core/ # Core components (auth, config, safety)
│ ├── db/ # Database models (SQLAlchemy)
│ ├── services/ # Business logic (decision, action, pattern)
│ ├── integrations/ # External integrations (support systems)
│ └── workers/ # Background workers (pattern detection)
├── tests/
│ ├── unit/ # 150+ unit tests
│ ├── integration/ # Integration tests
│ └── e2e/ # End-to-end tests
├── alembic/ # Database migrations
├── scripts/ # Setup and utility scripts
├── frontend/ # React dashboard (TypeScript)
├── docker-compose.yml # Infrastructure setup
├── demo_agent_system.py # Complete agent demo
└── setup.cmd # Automated setup script
REM All unit tests
uv run pytest tests/unit/ -v
REM With coverage
uv run pytest tests/unit/ --cov=src --cov-report=html
REM Specific test file
uv run pytest tests/unit/test_decision_engine.py -v
REM Property-based tests
uv run pytest tests/unit/test_*_properties.py -v
REM Format code
uv run black src tests
REM Lint code
uv run ruff check src tests
REM Type checking
uv run mypy src
REM Create migration
uv run alembic revision --autogenerate -m "Description"
REM Apply migrations
uv run alembic upgrade head
REM Rollback
uv run alembic downgrade -1
All configuration is done through environment variables in the .env file:
# Google Gemini API (FREE tier - get key at https://aistudio.google.com/apikey)
GOOGLE_API_KEY=your-api-key-here
# Database
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=migrationguard
POSTGRES_PASSWORD=changeme
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
# Kafka
KAFKA_BOOTSTRAP_SERVERS='["localhost:9092"]'
# Elasticsearch
ELASTICSEARCH_HOSTS='["http://localhost:9200"]'
# Agent Configuration
AGENT_CONFIDENCE_THRESHOLD=0.7
AGENT_HIGH_RISK_APPROVAL_REQUIRED=true
Exposed at /metrics:
- Signal ingestion rate
- Pattern detection latency
- Decision accuracy
- Action success rate
- System resource usage
Structured JSON logs for:
- Signal processing
- Pattern detection
- Root cause analysis
- Decision making
- Action execution
- Audit trail
Pre-configured dashboards:
- System health and performance
- Agent decision metrics
- Business impact (ticket deflection, resolution time)
- Infrastructure health
REM Stop all services
docker-compose down
REM Stop and remove all data
docker-compose down -v
REM Start Docker Desktop, then verify:
docker ps
REM Check logs:
docker-compose logs [service-name]
REM Restart services:
docker-compose restart
REM Reset database:
docker-compose down -v
docker-compose up -d postgres
timeout /t 10
uv run alembic upgrade head
REM Verify infrastructure:
uv run python scripts/check_infrastructure.py
REM Run with verbose output:
uv run pytest tests/unit/ -v -s
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for new functionality
- Run code quality checks (black, ruff, mypy)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with FastAPI
- AI powered by Google Gemini (FREE tier)
- Infrastructure by Docker
- Testing with pytest and Hypothesis
- Documentation: See the documentation files in the repository
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Built for the Hackathon | Production-Ready | Fully Tested | Open Source