Autonomous Self-Healing AI Infrastructure Platform
Detects LLM/API outages and automatically reroutes inference traffic to backup providers in real time.

    ┌─────────────────────────────────────────────────────────────────┐
    │                         Sentinel-Ops AI                         │
    │                                                                 │
    │  ┌──────────────┐    ┌─────────────────┐    ┌───────────────┐   │
    │  │   FastAPI    │    │    Failover     │    │    Health     │   │
    │  │   Gateway    │───▶│    Engine       │───▶│    Monitor    │   │
    │  │              │    │ (Circuit Break) │    │ (Background)  │   │
    │  └──────┬───────┘    └────────┬────────┘    └───────┬───────┘   │
    │         │                     │                     │           │
    │         │            ┌────────▼────────┐            │           │
    │         │            │   Provider      │            │           │
    │         │            │   Registry      │            │           │
    │         │            │                 │            │           │
    │         │            │  ┌───────────┐  │            │           │
    │         │            │  │  OpenAI   │  │◀───────────┘           │
    │         │            │  │ (Primary) │  │                        │
    │         │            │  └───────────┘  │                        │
    │         │            │  ┌───────────┐  │                        │
    │         │            │  │  Ollama   │  │                        │
    │         │            │  │ (Fallback)│  │                        │
    │         │            │  └───────────┘  │                        │
    │         │            └─────────────────┘                        │
    │         │                                                       │
    │  ┌──────▼───────┐    ┌─────────────────┐                        │
    │  │  WebSocket   │    │     Redis       │                        │
    │  │  Event Bus   │    │  (Incidents +   │                        │
    │  │              │    │   Event Cache)  │                        │
    │  └──────────────┘    └─────────────────┘                        │
    └─────────────────────────────────────────────────────────────────┘

| Pattern | Implementation |
|---|---|
| Circuit Breaker | 3-state (CLOSED/OPEN/HALF_OPEN) per provider |
| Failover Chain | OpenAI → Ollama → (extensible) |
| Retry Strategy | Exponential back-off with jitter |
| Event Streaming | WebSocket fan-out via async broadcast |
| Incident Storage | Redis list (ring-buffer, 500 events max) |
| Observability | Structured JSON logs + per-provider metrics |
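
The circuit breaker and retry rows above can be illustrated with a minimal sketch. It is not the code in `app/core/circuit_breaker.py`: the names `State`, `CircuitBreaker` and `backoff_delay` are hypothetical, the injectable `clock` exists only to make the example easy to exercise, and "full jitter" is one common back-off variant chosen here as an assumption. The defaults (3 failures, 30 seconds) mirror the documented configuration values.

```python
import random
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Hypothetical per-provider breaker; the project's real class may differ."""

    def __init__(self, failure_threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = State.CLOSED

    def allow_request(self) -> bool:
        # Once the recovery timeout has elapsed, an OPEN breaker moves to
        # HALF_OPEN so that a probe request can go through.
        if self.state is State.OPEN and self._clock() - self._opened_at >= self.recovery_timeout:
            self.state = State.HALF_OPEN
        return self.state is not State.OPEN

    def record_success(self) -> None:
        # Any success resets the failure count and closes the circuit.
        self._failures = 0
        self.state = State.CLOSED

    def record_failure(self) -> None:
        self._failures += 1
        # A failed probe, or too many consecutive failures, opens the circuit.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter back-off: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would check `allow_request()` before contacting a provider, report the outcome with `record_success()` or `record_failure()`, and sleep for `backoff_delay(attempt)` seconds between retries.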

    backend/
    ├── app/
    │   ├── api/
    │   │   ├── chat.py                # POST /api/chat
    │   │   ├── providers.py           # GET /api/providers/status
    │   │   ├── incidents.py           # GET /api/incidents
    │   │   ├── metrics.py             # GET /api/metrics
    │   │   ├── health.py              # GET /health
    │   │   └── middleware.py          # Tracing, rate limiting, security headers
    │   ├── core/
    │   │   ├── config.py              # Pydantic settings (env vars)
    │   │   ├── logging.py             # Structlog JSON logger
    │   │   ├── redis.py               # Async Redis pool + helpers
    │   │   └── circuit_breaker.py     # 3-state circuit breaker
    │   ├── providers/
    │   │   ├── base.py                # Abstract BaseProvider interface
    │   │   ├── openai_provider.py     # OpenAI implementation
    │   │   ├── ollama_provider.py     # Ollama local implementation
    │   │   └── registry.py            # Provider registry + chain
    │   ├── services/
    │   │   ├── failover_engine.py     # Core routing + failover logic
    │   │   ├── incident_service.py    # Incident persistence + broadcast
    │   │   └── metrics_service.py     # Rolling metrics aggregation
    │   ├── monitoring/
    │   │   └── health_monitor.py      # Async background health checker
    │   ├── websocket/
    │   │   ├── manager.py             # Connection pool + broadcast
    │   │   └── router.py              # WS /ws/system-events endpoint
    │   ├── models/
    │   │   └── schemas.py             # All Pydantic v2 domain models
    │   └── app_factory.py             # FastAPI app factory + lifespan
    ├── tests/
    │   └── test_sentinel.py           # Unit + integration tests
    ├── main.py                        # Uvicorn entrypoint
    ├── requirements.txt
    ├── Dockerfile
    ├── docker-compose.yml
    └── .env.example


    # 1. Clone and enter the project
    cd sentinel-ops/backend
    # 2. Configure environment
    cp .env.example .env
    # Edit .env: set OPENAI_API_KEY at minimum
    # 3. Pull the fallback model (requires a running local Ollama)
    ollama pull llama3.2
    # 4. Launch
    docker compose up --build

The API is live at http://localhost:8000 and the docs at http://localhost:8000/docs.

    # Prerequisites: Python 3.12+, Redis, Ollama
    # 1. Install dependencies
    pip install -r requirements.txt
    # 2. Configure
    cp .env.example .env
    # Set OPENAI_API_KEY, APP_ENV=development
    # 3. Start Redis (in its own terminal)
    redis-server
    # 4. Start Ollama (in its own terminal), then pull the model
    ollama serve
    ollama pull llama3.2
    # 5. Run
    python main.py
    # or
    uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Route a prompt through the AI failover engine.

    curl -X POST http://localhost:8000/api/chat \
      -H "Content-Type: application/json" \
      -d '{
        "messages": [
          {"role": "user", "content": "Explain circuit breakers in distributed systems."}
        ],
        "max_tokens": 512,
        "temperature": 0.7
      }'

Response:

    {
      "trace_id": "a1b2c3d4-...",
      "provider_used": "openai",
      "model": "gpt-4o-mini",
      "response_text": "A circuit breaker is ...",
      "latency_ms": 843.2,
      "status": "success",
      "failover_occurred": false,
      "failover_chain": [],
      "tokens_used": 187
    }

Response after a failover (when OpenAI is down):

    {
      "provider_used": "ollama",
      "failover_occurred": true,
      "failover_chain": ["openai"]
    }

Check provider status:

    curl http://localhost:8000/api/providers/status

Response:

    {
      "openai": {
        "status": "healthy",
        "latency_ms": 412.3,
        "success_rate_pct": 99.1,
        "circuit_breaker": { "state": "closed" }
      },
      "ollama": {
        "status": "healthy",
        "latency_ms": 1204.7,
        "circuit_breaker": { "state": "closed" }
      }
    }

List recent incidents:

    curl "http://localhost:8000/api/incidents?limit=20&type=failover_triggered"

Fetch aggregated metrics:

    curl http://localhost:8000/api/metrics

Response:

    {
      "total_requests": 1042,
      "total_successes": 1038,
      "total_failures": 4,
      "total_failovers": 2,
      "avg_latency_ms": 523.1,
      "active_provider": "openai",
      "uptime_seconds": 3601.0
    }

Trigger an on-demand health check:

    curl -X POST http://localhost:8000/api/providers/openai/probe

Connect from any WebSocket client:

    const ws = new WebSocket("ws://localhost:8000/ws/system-events");
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      console.log(data.event_type, data.payload);
    };

Event types:
- `provider_status`: health update for one provider
- `incident`: new incident recorded (outage, failover, recovery)
- `metrics`: aggregated system metrics (every health-check cycle)
- `heartbeat`: keep-alive ping every 30s
- `system`: connection lifecycle messages

    # Run all tests
    pytest tests/ -v
    # With coverage
    pytest tests/ --cov=app --cov-report=term-missing

Adding a provider:

- Create `app/providers/gemini_provider.py` extending `BaseProvider`
- Implement `complete()` and `health_check()`
- Register it in `app/providers/registry.py`:

      from app.providers.gemini_provider import GeminiProvider
      registry.register(GeminiProvider(), position=2)

The failover engine will automatically include it in the chain.
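
The provider steps above might look roughly like this. The `BaseProvider` shown is a stand-in written for this example, since the real interface in `app/providers/base.py` is not reproduced here; the method signatures, the `name` attribute and the stubbed return values are all assumptions.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Stand-in for app/providers/base.py; the real interface may differ."""

    name: str

    @abstractmethod
    async def complete(self, messages: list[dict], **params) -> str: ...

    @abstractmethod
    async def health_check(self) -> bool: ...


class GeminiProvider(BaseProvider):
    name = "gemini"

    async def complete(self, messages: list[dict], **params) -> str:
        # A real provider would call the Gemini API here (for example via httpx).
        last_message = messages[-1]["content"]
        return f"[gemini stub] {last_message}"

    async def health_check(self) -> bool:
        # A real check would send a cheap request and return False on errors or timeouts.
        return True


if __name__ == "__main__":
    provider = GeminiProvider()
    print(asyncio.run(provider.complete([{"role": "user", "content": "ping"}])))
```

Keeping both methods asynchronous matches the async stack the project lists (FastAPI, httpx), so a slow provider call does not block other requests.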
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | OpenAI API key |
| `OPENAI_MODEL` | `gpt-4o-mini` | Model to use |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `llama3.2` | Local model name |
| `REDIS_URL` | `redis://localhost:6379/0` | Redis connection string |
| `CIRCUIT_BREAKER_FAILURE_THRESHOLD` | `3` | Failures before circuit opens |
| `CIRCUIT_BREAKER_RECOVERY_TIMEOUT` | `30` | Seconds before half-open probe |
| `HEALTH_CHECK_INTERVAL_SECONDS` | `15` | Background monitoring frequency |
| `RATE_LIMIT_REQUESTS` | `100` | Max requests per window |
| `RATE_LIMIT_WINDOW_SECONDS` | `60` | Rate limit sliding window |
| `APP_ENV` | `production` | development / staging / production |
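
To show how these variables and defaults fit together, here is a dependency-free sketch that loads them from the environment. The project itself uses Pydantic settings in `app/core/config.py`; the plain dataclass below is a substitute chosen so the example runs with the standard library alone, and `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_model: str = "gpt-4o-mini"
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "llama3.2"
    redis_url: str = "redis://localhost:6379/0"
    circuit_breaker_failure_threshold: int = 3
    circuit_breaker_recovery_timeout: int = 30
    health_check_interval_seconds: int = 15
    rate_limit_requests: int = 100
    rate_limit_window_seconds: int = 60
    app_env: str = "production"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required")
    overrides = {}
    for f in fields(Settings):
        raw = env.get(f.name.upper())
        if raw is None:
            continue
        # Fields with an integer default are parsed as integers; the rest stay strings.
        overrides[f.name] = int(raw) if isinstance(f.default, int) else raw
    return Settings(**overrides)
```

For example, `load_settings({"OPENAI_API_KEY": "sk-test", "CIRCUIT_BREAKER_FAILURE_THRESHOLD": "5"})` keeps every default except the failure threshold.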

- FastAPI: async web framework
- Pydantic v2: data validation and settings
- httpx: async HTTP client for provider calls
- redis-py (async): event storage and pub/sub
- structlog: structured JSON logging
- Docker + Compose: containerised deployment