The Adaptive Intelligence Layer for AI Agents -- eval, train, and memory on one platform.
Aegis is an open-source framework by Metronis, Inc. for building, evaluating, and improving AI agents.
| Product | What it does |
|---|---|
| Aegis Eval | 75 evaluation dimensions across 7 tiers + domain plugins, triangulated scoring, diagnostic reporting |
| Aegis Train | GRPO-based RL training engine with progressive capability unlocking and Observatory monitoring |
| Aegis Memory | 7 memory types, 12 RL-trained operations, knowledge graph, vector store, provenance tracking |
```
┌──────────────────────────────────────────────────────────┐
│                      Aegis Platform                      │
├─────────────────┬─────────────────┬──────────────────────┤
│ Aegis Eval      │ Aegis Train     │ Aegis Memory         │
│ 101 dims        │ GRPO engine     │ 7 types · 12 ops     │
│ 3 scorers       │ Observatory     │ KG · Vectors · Log   │
├─────────────────┴─────────────────┴──────────────────────┤
│              Adapters · API · CLI · Plugins              │
└──────────────────────────────────────────────────────────┘
```
```mermaid
flowchart LR
    A["Aegis Eval"] --> B["Diagnostics"]
    B --> C["Aegis Train"]
    C --> D["Improved Agent Policy"]
    D --> E["Aegis Memory"]
    E --> F["Production Agent Runtime"]
    F --> A
```
Aegis Eval scores agent behavior across 7 tiers of capability and safety dimensions. Scoring is triangulated through three independent backends -- rule-based, semantic similarity, and LLM judge -- to reduce single-method bias.
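To make the idea of triangulation concrete, here is a minimal sketch of how three backend scores could be combined. It does not use the Aegis API: the `triangulate` function, the `TriangulatedScore` type, the median aggregation rule, and the `max_spread` threshold are all assumptions chosen for illustration, and the actual aggregation in Aegis Eval may differ.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TriangulatedScore:
    score: float        # aggregated score in [0, 1]
    spread: float       # max minus min across the three backends
    needs_review: bool  # True when the backends disagree strongly


def triangulate(rule: float, semantic: float, judge: float,
                max_spread: float = 0.3) -> TriangulatedScore:
    """Combine three backend scores (hypothetical helper, not part of Aegis).

    The median is used so that a single outlier backend cannot
    drag the aggregate far from the other two.
    """
    scores = [rule, semantic, judge]
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    spread = max(scores) - min(scores)
    return TriangulatedScore(median(scores), spread, spread > max_spread)


# The LLM judge disagrees with the other two backends here.
result = triangulate(rule=0.9, semantic=0.8, judge=0.2)
print(result)
```

In this sketch the outlier judge score does not move the aggregate, but the large spread flags the item for closer inspection.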
Aegis Train implements AMIR-GRPO and GRPO-SG for training memory policy networks. The Observatory subsystem monitors for reward hacking, gradient health issues, and distribution drift.
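For readers new to GRPO, the sketch below shows the group-relative advantage used by the standard algorithm: rewards for several rollouts of the same prompt are standardized against their own group. It illustrates plain GRPO only, not the AMIR-GRPO or GRPO-SG variants. The function name, the `eps` value, and the use of the population standard deviation are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one group of rollouts for the same prompt.

    Hypothetical helper, not the Aegis Train implementation. Population
    standard deviation is an assumption; some implementations use the
    sample standard deviation instead.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt: two succeeded, two failed.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Rollouts that beat the group mean get a positive advantage and the rest get a negative one, so no separate value network is needed. When all rewards in a group are identical, every advantage is zero and the group contributes no learning signal.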
Aegis Memory provides managed memory infrastructure with seven memory types, backed by an event log, temporal index, knowledge graph, and vector store. Every operation is tracked with full provenance.
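One way to picture provenance tracking on top of an event log is the sketch below. It is not the Aegis Memory API: `MemoryEvent`, `EventLog`, the operation names, and the field layout are all hypothetical and only illustrate the idea that each operation is appended as an immutable event linked to its predecessor.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryEvent:
    event_id: int
    operation: str            # e.g. "write" or "update" (illustrative names)
    key: str
    value: Optional[str]
    source: str               # who or what caused the operation
    parent_id: Optional[int]  # previous event for the same key, if any
    timestamp: datetime


class EventLog:
    """Append-only log; events are never modified after being written."""

    def __init__(self) -> None:
        self._events: list[MemoryEvent] = []
        self._latest: dict[str, int] = {}  # key -> id of its newest event

    def append(self, operation: str, key: str,
               value: Optional[str], source: str) -> MemoryEvent:
        event = MemoryEvent(
            event_id=len(self._events),
            operation=operation,
            key=key,
            value=value,
            source=source,
            parent_id=self._latest.get(key),
            timestamp=datetime.now(timezone.utc),
        )
        self._events.append(event)
        self._latest[key] = event.event_id
        return event

    def provenance(self, key: str) -> list[MemoryEvent]:
        """Return the chain of events behind the current value, oldest first."""
        chain = []
        event_id = self._latest.get(key)
        while event_id is not None:
            event = self._events[event_id]
            chain.append(event)
            event_id = event.parent_id
        return list(reversed(chain))


log = EventLog()
log.append("write", "user.timezone", "UTC", source="onboarding-form")
log.append("update", "user.timezone", "Europe/Berlin", source="agent-run-17")
history = log.provenance("user.timezone")
print([(e.operation, e.source) for e in history])
```

Because events are immutable and linked through `parent_id`, the history of any key can be reconstructed without a separate audit table.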
```bash
pip install aegis-eval
```

Optional extras:

```bash
pip install aegis-eval[api]      # FastAPI server
pip install aegis-eval[scoring]  # sentence-transformers, numpy
pip install aegis-eval[db]       # PostgreSQL, Neo4j, Redis
pip install aegis-eval[all]      # API + scoring + DB + ingestion + data
pip install aegis-eval[full]     # Everything including GPU training
```

Development setup:

```bash
git clone https://github.com/metronis-space/aegis.git
cd aegis
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,all]"
```

Docker (full stack):

```bash
docker compose up -d
```

```python
from aegis import Evaluator, EvalConfig

evaluator = Evaluator(config=EvalConfig(dimensions="all"))
result = evaluator.run()
print(f"Overall score: {result.overall_score:.2%}")
for tier_name, tier_score in result.tier_scores.items():
    print(f"  {tier_name}: {tier_score:.2%}")
```

```bash
aegis eval run --config eval.yaml    # Run evaluation suite
aegis eval dimensions                # List all dimensions
aegis train start --model Qwen/Qwen2.5-7B --optimizer dr_grpo
aegis memory health                  # Check memory subsystem
```

| Topic | Link |
|---|---|
| Quickstart | docs/quickstart.md |
| Configuration | docs/configuration.md |
| CLI Reference | docs/cli-reference.md |
| API Reference | docs/api-reference.md |
| Eval Dimensions | docs/dimensions.md |
| Scoring | docs/scoring.md |
| Plugins | docs/plugins.md |
| Adapters | docs/adapters.md |
Full API docs are available at /docs when the server is running.
```
aegis/
├── src/aegis/
│   ├── adapters/    # Agent framework adapters (OpenAI, Anthropic, etc.)
│   ├── api/         # FastAPI server, routes, middleware
│   ├── cli/         # Typer CLI application
│   ├── core/        # Config, shared types, schema definitions
│   ├── eval/        # Evaluation engine, dimensions, scorers, judges
│   ├── ingestion/   # Document ingestion pipeline + storage sinks
│   ├── memory/      # Event log, graph, vector, temporal, provenance
│   ├── observatory/ # Training monitoring (reward hacking, drift)
│   ├── plugins/     # Domain plugins (legal, finance, safety)
│   ├── retrieval/   # Context retrieval (pgvector, Neo4j, cross-encoder)
│   ├── security/    # Governance and access control
│   ├── store/       # Persistence (SQLite, PostgreSQL)
│   └── training/    # RL engine (AMIR-GRPO, GRPO-SG, curriculum)
├── dashboard/       # Next.js dashboard
├── sdk/typescript/  # TypeScript SDK
├── examples/        # Python examples and sample configs
├── notebooks/       # Jupyter notebooks
├── tests/           # Automated tests
├── benchmarks/      # Domain benchmark suites
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
└── README.md
```
Contributions are welcome. See CONTRIBUTING.md for dev setup, code style, and PR workflow.
Apache License 2.0. See LICENSE for details.
Built by Metronis, Inc.