AetherGuard is an Autonomous Infrastructure Self-Healing Sentinel. It uses Large Language Models (LLMs) and a multi-level Actor-Critic governance pattern to detect, synthesize, and remediate infrastructure incidents in real time.
Behold the 'War Room': A professional, glassmorphism-inspired interface where the Agent's consciousness meets infrastructure operations.
Native OpenTelemetry integration provides deep correlation between traces, logs, and AI interventions.
AetherGuard operates on a Safety-First loop. Every action proposed by the Actor (Planner) is strictly audited by a Critic (Safety Sentinel) before touching the production cluster.
graph TD
A[Incident Detected] -->|OTel Alert| B(Coordination Hub)
B --> C{Triple-Prompt Pipeline}
C -->|Step 1| D[Synthesis Prompt]
C -->|Step 2| E[Planner Prompt]
C -->|Step 3| F[Safety Critic Audit]
F -->|REJECTED| G[Log Safety Violation]
F -->|APPROVED| H[K8s Fabric8 Client]
H --> I[Infrastructure Remediation]
I -->|Rollout Restart| J[Success Verify]
subgraph "AI Brain (Actor-Critic)"
D
E
F
end
subgraph "Audit & Dashboard"
B
G
end
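The flow in the diagram can be sketched in plain Java. This is a minimal illustration, not AetherGuard's actual code: the class `RemediationPipeline`, its fields, and the `Outcome` enum are hypothetical names, the three prompt steps are modeled as simple functions, and the Kubernetes call is replaced by an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of the three-step loop; names are not taken from the AetherGuard codebase. */
class RemediationPipeline {
    enum Outcome { REMEDIATED, REJECTED }

    private final Function<String, String> synthesizer; // Step 1: alert -> incident summary
    private final Function<String, String> planner;     // Step 2: summary -> proposed action
    private final Predicate<String> critic;             // Step 3: true means APPROVED
    private final List<String> executed = new ArrayList<>();
    private final List<String> violations = new ArrayList<>();

    RemediationPipeline(Function<String, String> synthesizer,
                        Function<String, String> planner,
                        Predicate<String> critic) {
        this.synthesizer = synthesizer;
        this.planner = planner;
        this.critic = critic;
    }

    Outcome handle(String alert) {
        String summary = synthesizer.apply(alert);
        String action = planner.apply(summary);
        if (!critic.test(action)) {
            violations.add(action); // REJECTED branch: record the violation, never execute
            return Outcome.REJECTED;
        }
        executed.add(action);       // APPROVED branch: stand-in for the Kubernetes client call
        return Outcome.REMEDIATED;
    }

    List<String> executed() { return List.copyOf(executed); }

    List<String> violations() { return List.copyOf(violations); }
}
```

The point of the structure is that the executor is only reachable through the critic: a rejected action ends up in the violation log and nowhere else.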
- Agentic SRE Core: Uses the ReAct (Reason + Act) pattern via LangChain4j.
- Multi-LLM Provider Support: Native compatibility with OpenAI, Groq (via OpenAI-compatible standard), and Ollama (for local/private LLM usage).
- Safety-First Governance: The Actor-Critic pattern is designed to reject destructive commands (DELETE/DROP) before they are executed.
- Native Observability: Correlates OpenTelemetry traces directly with AI remediation steps.
- War Room Dashboard: Real-time monitoring using Quarkus Renarde and HTMX.
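As an illustration of the Safety-First Governance item, a rule-based critic check might look like the sketch below. Only the DELETE and DROP verbs come from this README; the class name `DestructiveCommandCritic`, the word-boundary matching, and the decision to reject empty input are assumptions made for the example.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical denylist check; not the project's actual critic. */
class DestructiveCommandCritic {
    // DELETE and DROP are the verbs named in the feature list; matching is case-insensitive.
    private static final List<String> DENIED_VERBS = List.of("delete", "drop");

    // Word boundaries keep names such as "dropbox" from triggering a false rejection.
    private static final Pattern DENIED =
        Pattern.compile("\\b(" + String.join("|", DENIED_VERBS) + ")\\b");

    /** Returns true when the command may proceed, false when it must be rejected. */
    static boolean approve(String command) {
        if (command == null || command.isBlank()) {
            return false; // nothing to audit: fail closed
        }
        return !DENIED.matcher(command.toLowerCase(Locale.ROOT)).find();
    }
}
```

A static denylist like this only catches commands that spell out the verb, so it would be one layer next to the LLM-based audit rather than a replacement for it.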
- Java 21 & Maven 3.9+
- Docker (For Quarkus Dev Services / PostgreSQL)
- kubectl (configured for a local or remote cluster)
- Clone the repository:
git clone https://github.com/rjaco/aetherguard.git
cd aetherguard
### 3. Start the Infrastructure (Important!)
You have two options to run the required services (PostgreSQL + OpenTelemetry):
Option A: Automatic (Recommended)
- Ensure Docker Desktop (or Docker Engine) is running.
- Simply run `mvn quarkus:dev`. Quarkus will automatically spin up the necessary containers (Dev Services).
Option B: Manual Control
- If you prefer to manage the containers yourself, or if Dev Services fails to start them:
docker-compose up -d
- This will start PostgreSQL (port 5432) and Jaeger (UI on port 16686, OTLP gRPC on port 4317).
- Then run `mvn quarkus:dev`.
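If you go with Option B, it can help to confirm that the containers are actually listening before starting Quarkus. The snippet below is a small stand-alone sketch using only the JDK; `PortProbe` is a hypothetical helper that is not part of the repository, and the port numbers are the ones listed above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
class PortProbe {
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        // Ports from the manual setup: PostgreSQL, Jaeger UI, OTLP gRPC.
        int[] ports = {5432, 16686, 4317};
        for (int port : ports) {
            String state = isOpen("localhost", port, 500) ? "open" : "closed";
            System.out.printf("localhost:%d -> %s%n", port, state);
        }
    }
}
```

An open port only shows that something is listening there; it does not verify that the service behind it is healthy.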
AetherGuard supports OpenAI, Groq, and Ollama.
Create a .env file in the project root, or export the variables, using one of the following configurations.

OpenAI:
LLM_PROVIDER=openai
LLM_API_KEY=sk-proj-...
LLM_MODEL_NAME=gpt-4o-mini

Groq (via the OpenAI-compatible API):
LLM_PROVIDER=openai
LLM_BASE_URL=https://api.groq.com/openai/v1
LLM_API_KEY=gsk_...
LLM_MODEL_NAME=llama-3.3-70b-versatile

Ollama (local/private):
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL_NAME=llama3
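To make the relationship between these variables concrete, here is a rough sketch of how they could be resolved into one settings object. It is not how AetherGuard reads its configuration: `LlmSettings`, `Resolved`, and `fromEnv` are hypothetical names, and falling back to `https://api.openai.com/v1` when `LLM_BASE_URL` is absent is an assumption based on the OpenAI example omitting that variable.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolver for the variables shown above. */
class LlmSettings {
    record Resolved(String provider, String baseUrl, String model, Optional<String> apiKey) {}

    static Resolved fromEnv(Map<String, String> env) {
        String provider = require(env, "LLM_PROVIDER");
        switch (provider) {
            case "ollama":
                // Ollama runs locally, so this sketch does not expect an API key.
                return new Resolved(provider,
                    require(env, "OLLAMA_BASE_URL"),
                    require(env, "OLLAMA_MODEL_NAME"),
                    Optional.empty());
            case "openai":
                // Groq reuses the "openai" provider and only overrides the base URL.
                // Assumption: the public OpenAI endpoint is the default when none is set.
                String baseUrl = env.getOrDefault("LLM_BASE_URL", "https://api.openai.com/v1");
                return new Resolved(provider,
                    baseUrl,
                    require(env, "LLM_MODEL_NAME"),
                    Optional.of(require(env, "LLM_API_KEY")));
            default:
                throw new IllegalArgumentException("Unsupported LLM_PROVIDER: " + provider);
        }
    }

    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Missing required variable: " + key);
        }
        return value;
    }
}
```

Calling `LlmSettings.fromEnv(System.getenv())` would resolve the exported variables; passing a plain `Map` keeps the logic easy to test.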
Launch the Sentinel:
./mvnw quarkus:dev
Access the Interface:
- War Room Dashboard: http://localhost:8080/
- Swagger UI (API Docs): http://localhost:8080/q/swagger-ui
- Dev UI: http://localhost:8080/q/dev
Run the specialized chaos script to see AetherGuard in action:
./infra/chaos.sh
This will simulate a pod crash and an invalid image tag. Watch the dashboard as AetherGuard synthesizes the error and performs a safe rollout restart.
- /src/main/java/com/aetherguard/ai: LangChain4j Interfaces & Tooling.
- /src/main/java/com/aetherguard/model: Reactive Persistence Entities.
- /src/main/java/com/aetherguard/service: Core Orchestration Logic.
- /infra: Kubernetes Manifests, Terraform, and Chaos Scripts.
Built with ❤️ for the SRE and Java Development community.