🛡️ AetherGuard: The Agentic SRE Sentinel

AetherGuard is an Autonomous Infrastructure Self-Healing Sentinel. It leverages Large Language Models (LLMs) and a multi-level Actor-Critic Governance pattern to detect, synthesize, and remediate infrastructure incidents in real time.


📺 Dashboard Showcase

AetherGuard War Room: Behold the 'War Room', a professional, glassmorphism-inspired interface where the Agent's consciousness meets infrastructure operations.

Observability with Grafana: Native OpenTelemetry integration provides deep correlation between traces, logs, and AI interventions.


🧠 Architectural Vision

AetherGuard operates on a Safety-First loop. Every action proposed by the Actor (Planner) is strictly audited by a Critic (Safety Sentinel) before touching the production cluster.

graph TD
    A[Incident Detected] -->|OTel Alert| B(Coordination Hub)
    B --> C{Triple-Prompt Pipeline}
    C -->|Step 1| D[Synthesis Prompt]
    C -->|Step 2| E[Planner Prompt]
    C -->|Step 3| F[Safety Critic Audit]
    
    F -->|REJECTED| G[Log Safety Violation]
    F -->|APPROVED| H[K8s Fabric8 Client]
    
    H --> I[Infrastructure Remediation]
    I -->|Rollout Restart| J[Success Verify]
    
    subgraph "AI Brain (Actor-Critic)"
    D
    E
    F
    end
    
    subgraph "Audit & Dashboard"
    B
    G
    end

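The APPROVED/REJECTED branch in the diagram can be sketched in plain Java. This is a minimal illustration, not AetherGuard's implementation: `ProposedAction`, `Verdict`, `audit`, and `gate` are hypothetical names, and a simple keyword check stands in for the LLM-backed Safety Critic prompt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

public class SafetyGateSketch {

    /** Hypothetical shape of a remediation step proposed by the Planner. */
    record ProposedAction(String target, String command) {}

    enum Verdict { APPROVED, REJECTED }

    /** Stand-in for the Critic: rejects any command containing a destructive keyword. */
    static Verdict audit(ProposedAction action) {
        String command = action.command().toUpperCase(Locale.ROOT);
        for (String banned : List.of("DELETE", "DROP")) {
            if (command.contains(banned)) {
                return Verdict.REJECTED;
            }
        }
        return Verdict.APPROVED;
    }

    /** Runs the executor only for approved actions; rejected ones are logged instead. */
    static Verdict gate(ProposedAction action,
                        Consumer<ProposedAction> executor,
                        List<String> violationLog) {
        Verdict verdict = audit(action);
        if (verdict == Verdict.APPROVED) {
            executor.accept(action);
        } else {
            violationLog.add("Safety violation: " + action.command());
        }
        return verdict;
    }

    public static void main(String[] args) {
        List<String> violations = new ArrayList<>();
        // Target names below are made up for the example.
        ProposedAction restart = new ProposedAction("deployment/checkout", "rollout restart");
        ProposedAction drop = new ProposedAction("postgres", "DROP TABLE incidents");

        Consumer<ProposedAction> executor =
                a -> System.out.println("Executing: " + a.command() + " on " + a.target());

        System.out.println(gate(restart, executor, violations));
        System.out.println(gate(drop, executor, violations));
        System.out.println(violations);
    }
}
```

The point of the structure is that the executor is never reachable except through the audit, so a rejected plan can only end up in the violation log.
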
🚀 Key Features

  • Agentic SRE Core: Uses the ReAct (Reason + Act) pattern via LangChain4j.
  • Multi-LLM Provider Support: Native compatibility with OpenAI, Groq (via OpenAI-compatible standard), and Ollama (for local/private LLM usage).
  • Safety-First Governance: Actor-Critic pattern ensures no destructive commands (DELETE/DROP) are ever executed.
  • Native Observability: Correlates OpenTelemetry traces directly with AI remediation steps.
  • War Room Dashboard: Real-time monitoring using Quarkus Renarde and HTMX.

🛠️ Quick Start Guide

Prerequisites

  • Java 21 & Maven 3.9+
  • Docker (For Quarkus Dev Services / PostgreSQL)
  • kubectl (configured for a local or remote cluster)

Setup & Run

  1. Clone the repository:
    git clone https://github.com/rjaco/aetherguard.git
    cd aetherguard

  2. Start the infrastructure (important!):

You have two options to run the required services (PostgreSQL + OpenTelemetry):

Option A: Automatic (Recommended)

  • Ensure Docker Desktop (or Docker Engine) is running.
  • Simply run mvn quarkus:dev. Quarkus will automatically spin up the necessary containers (Dev Services).

Option B: Manual Control

  • If you prefer to manage the containers yourself or if Dev Services fails:
    docker-compose up -d
  • This will start PostgreSQL (port 5432) and Jaeger (ports 16686 and 4317).
  • Then run mvn quarkus:dev.

  3. Configure your LLM provider:

AetherGuard supports OpenAI, Groq, and Ollama. Create a .env file in the root or export variables:

Option A: OpenAI (Default)

LLM_PROVIDER=openai
LLM_API_KEY=sk-proj-...
LLM_MODEL_NAME=gpt-4o-mini

Option B: Groq (Recommended for Speed)

LLM_PROVIDER=openai
LLM_BASE_URL=https://api.groq.com/openai/v1
LLM_API_KEY=gsk_...
LLM_MODEL_NAME=llama-3.3-70b-versatile

Option C: Ollama (Local/Private)

LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL_NAME=llama3
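
As a rough illustration of how the three option blocks differ, the sketch below resolves the variables above into one settings object. `LlmSettings` and `resolve` are hypothetical names, not classes from the AetherGuard codebase, and the project may well wire these values through Quarkus configuration instead.

```java
import java.util.Map;

public class LlmSettingsSketch {

    /** Hypothetical holder for the resolved settings. The API key is left out on purpose. */
    record LlmSettings(String provider, String baseUrl, String modelName) {}

    /** Picks the variable names that apply to the selected provider. */
    static LlmSettings resolve(Map<String, String> env) {
        // The README lists OpenAI as the default provider.
        String provider = env.getOrDefault("LLM_PROVIDER", "openai");
        if (provider.equals("ollama")) {
            return new LlmSettings(provider,
                    env.get("OLLAMA_BASE_URL"),
                    env.get("OLLAMA_MODEL_NAME"));
        }
        // Groq reuses the "openai" provider and only overrides LLM_BASE_URL;
        // baseUrl stays null when the variable is unset.
        return new LlmSettings(provider,
                env.get("LLM_BASE_URL"),
                env.get("LLM_MODEL_NAME"));
    }

    public static void main(String[] args) {
        // Prints whatever the current shell environment resolves to.
        System.out.println(resolve(System.getenv()));
    }
}
```

Keeping the key out of the record means the resolved settings can be logged without leaking `LLM_API_KEY`.
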
  4. Launch the Sentinel:

    ./mvnw quarkus:dev
  5. Access the Interface:


🧪 Chaos Demonstration

Run the specialized chaos script to see AetherGuard in action:

./infra/chaos.sh

This will simulate a pod crash and an invalid image tag. Watch the dashboard as AetherGuard synthesizes the error and performs a safe rollout restart.
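
For context on what a rollout restart does: `kubectl rollout restart` works by setting the `kubectl.kubernetes.io/restartedAt` annotation on the workload's pod template, which makes Kubernetes roll the pods. The sketch below only builds an equivalent patch body in plain Java. `restartPatch` is a hypothetical helper, and how AetherGuard issues the restart through the Fabric8 client is not shown in this README.

```java
import java.time.Instant;

public class RolloutRestartPatchSketch {

    /** Builds a patch body that stamps the pod template with a restartedAt annotation. */
    static String restartPatch(Instant now) {
        // Instant.toString() produces an ISO-8601 UTC timestamp.
        return """
            {"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":"%s"}}}}}"""
            .formatted(now.toString());
    }

    public static void main(String[] args) {
        // Prints a patch body stamped with the current time.
        System.out.println(restartPatch(Instant.now()));
    }
}
```

Because only an annotation changes, the restart leaves the image, replica count, and the rest of the spec untouched.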


📂 Project Structure

  • /src/main/java/com/aetherguard/ai: LangChain4j Interfaces & Tooling.
  • /src/main/java/com/aetherguard/model: Reactive Persistence Entities.
  • /src/main/java/com/aetherguard/service: Core Orchestration Logic.
  • /infra: Kubernetes Manifests, Terraform, and Chaos Scripts.

Built with ❤️ for the SRE and Java Development community.
