Aegis

Requires Python 3.11+. Licensed under Apache 2.0.

The Adaptive Intelligence Layer for AI Agents -- eval, train, and memory on one platform.

Aegis is an open-source framework by Metronis, Inc. for building, evaluating, and improving AI agents.

| Product | What it does |
|---|---|
| Aegis Eval | 75 evaluation dimensions across 7 tiers plus domain plugins; triangulated scoring; diagnostic reporting |
| Aegis Train | GRPO-based RL training engine with progressive capability unlocking and Observatory monitoring |
| Aegis Memory | 7 memory types, 12 RL-trained operations, knowledge graph, vector store, provenance tracking |

Architecture

```text
┌──────────────────────────────────────────────────────────┐
│                      Aegis Platform                      │
├─────────────────┬─────────────────┬──────────────────────┤
│   Aegis Eval    │   Aegis Train   │    Aegis Memory      │
│   101 dims      │   GRPO engine   │  7 types · 12 ops    │
│   3 scorers     │   Observatory   │  KG · Vectors · Log  │
├─────────────────┴─────────────────┴──────────────────────┤
│           Adapters · API · CLI · Plugins                 │
└──────────────────────────────────────────────────────────┘
```
```mermaid
flowchart LR
    A["Aegis Eval"] --> B["Diagnostics"]
    B --> C["Aegis Train"]
    C --> D["Improved Agent Policy"]
    D --> E["Aegis Memory"]
    E --> F["Production Agent Runtime"]
    F --> A
```

Aegis Eval scores agent behavior across 7 tiers of capability and safety dimensions. Scoring is triangulated through three independent backends -- rule-based, semantic similarity, and LLM judge -- to reduce single-method bias.
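The README does not say how the three backend scores are combined. As one possible reading, the sketch below takes the median of the three scores, so a single outlying backend cannot dominate the result, and flags samples where the backends disagree widely. `ScorerResult`, `triangulate`, and the `max_spread` threshold are hypothetical names for illustration, not part of the `aegis` API.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ScorerResult:
    # Hypothetical container: one score in [0, 1] per scoring backend.
    rule_based: float
    semantic: float
    llm_judge: float


def triangulate(result: ScorerResult, max_spread: float = 0.3) -> tuple[float, bool]:
    """Combine three backend scores into one (illustrative, not the Aegis method).

    The median ignores a single outlying backend; the boolean flag marks
    samples whose backends disagree by more than ``max_spread``.
    """
    scores = [result.rule_based, result.semantic, result.llm_judge]
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    combined = median(scores)
    disagreement = (max(scores) - min(scores)) > max_spread
    return combined, disagreement


if __name__ == "__main__":
    # The LLM judge disagrees sharply with the other two backends here.
    score, flagged = triangulate(ScorerResult(rule_based=0.9, semantic=0.85, llm_judge=0.2))
    print(f"{score:.2f} flagged={flagged}")
```

A median is one simple way to reduce single-method bias; a weighted mean or a learned combiner would be equally plausible choices.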

Aegis Train implements AMIR-GRPO and GRPO-SG for training memory policy networks. The Observatory subsystem monitors for reward hacking, gradient health issues, and distribution drift.
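The README does not describe AMIR-GRPO or GRPO-SG in detail, so the sketch below shows only the group-relative advantage at the core of standard GRPO: several completions are sampled for the same prompt, and each completion's advantage is its reward relative to the rest of its group. The function name is hypothetical. Setting `scale_by_std=False` gives the Dr. GRPO-style variant that drops the standard-deviation scaling; whether the `dr_grpo` optimizer in the CLI example matches this exactly is an assumption.

```python
import math


def group_relative_advantages(
    rewards: list[float], scale_by_std: bool = True, eps: float = 1e-6
) -> list[float]:
    """Group-relative advantages for completions sampled from one prompt.

    Each advantage is the reward minus the group mean. With ``scale_by_std``
    (original GRPO) it is also divided by the group's population standard
    deviation; ``eps`` avoids division by zero when all rewards are equal.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    n = len(rewards)
    mean = sum(rewards) / n
    centered = [r - mean for r in rewards]
    if not scale_by_std:
        return centered
    std = math.sqrt(sum(c * c for c in centered) / n)
    return [c / (std + eps) for c in centered]


if __name__ == "__main__":
    # Four completions for one prompt: two judged correct (1.0), two not (0.0).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0], scale_by_std=False))
```

Because advantages are computed within each group, no separate value network is needed; the group mean acts as the baseline.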

Aegis Memory provides managed memory infrastructure with seven memory types, backed by an event log, temporal index, knowledge graph, and vector store. Every operation is tracked with full provenance.
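To illustrate what "full provenance" over an event log can mean, here is a minimal sketch of an append-only log in which every event records its source and the earlier events it was derived from, so any memory can be traced back to its origins. `MemoryEvent`, `EventLog`, and the operation and memory-type strings are hypothetical illustrations, not the `aegis.memory` API or its actual seven memory types.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryEvent:
    # Hypothetical event schema; frozen so recorded events cannot be mutated.
    event_id: int
    operation: str                 # illustrative, e.g. "store" or "merge"
    memory_type: str               # illustrative, e.g. "episodic" or "semantic"
    content: str
    source: str                    # who or what produced this event
    parents: tuple[int, ...] = ()  # events this one was derived from
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventLog:
    """Append-only log: events are only ever added, never changed or deleted."""

    def __init__(self) -> None:
        self._events: dict[int, MemoryEvent] = {}
        self._ids = itertools.count(1)

    def append(
        self,
        operation: str,
        memory_type: str,
        content: str,
        source: str,
        parents: tuple[int, ...] = (),
    ) -> MemoryEvent:
        # Parents must already exist, so IDs always increase along a lineage.
        for p in parents:
            if p not in self._events:
                raise KeyError(f"unknown parent event {p}")
        event = MemoryEvent(next(self._ids), operation, memory_type, content, source, tuple(parents))
        self._events[event.event_id] = event
        return event

    def provenance(self, event_id: int) -> list[MemoryEvent]:
        """Return the event and all of its ancestors, oldest first."""
        seen: dict[int, MemoryEvent] = {}
        stack = [event_id]
        while stack:
            eid = stack.pop()
            if eid in seen:
                continue
            event = self._events[eid]
            seen[eid] = event
            stack.extend(event.parents)
        return sorted(seen.values(), key=lambda e: e.event_id)


if __name__ == "__main__":
    log = EventLog()
    a = log.append("store", "episodic", "User prefers metric units", source="session-42")
    b = log.append("store", "episodic", "User lives in Berlin", source="session-43")
    c = log.append("merge", "semantic", "Metric-unit user in Berlin", source="consolidator",
                   parents=(a.event_id, b.event_id))
    for e in log.provenance(c.event_id):
        print(e.event_id, e.operation, e.source)
```

Keeping the log immutable means derived structures (graph, vectors, temporal index) can be rebuilt from it, which is one common reason to put an event log underneath the other stores.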


Installation

```bash
pip install aegis-eval
```

Optional extras:

```bash
pip install "aegis-eval[api]"       # FastAPI server
pip install "aegis-eval[scoring]"   # sentence-transformers, numpy
pip install "aegis-eval[db]"        # PostgreSQL, Neo4j, Redis
pip install "aegis-eval[all]"       # API + scoring + DB + ingestion + data
pip install "aegis-eval[full]"      # Everything, including GPU training
```

The quotes stop shells such as zsh from treating the square brackets as a glob pattern.

Development setup:

```bash
git clone https://github.com/metronis-space/aegis.git
cd aegis
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,all]"
```

Docker (full stack):

```bash
docker compose up -d
```

Quick Start

```python
from aegis import Evaluator, EvalConfig

evaluator = Evaluator(config=EvalConfig(dimensions="all"))
result = evaluator.run()

print(f"Overall score: {result.overall_score:.2%}")
for tier_name, tier_score in result.tier_scores.items():
    print(f"  {tier_name}: {tier_score:.2%}")
```

Common CLI commands:

```bash
aegis eval run --config eval.yaml     # Run evaluation suite
aegis eval dimensions                 # List all dimensions
aegis train start --model Qwen/Qwen2.5-7B --optimizer dr_grpo
aegis memory health                   # Check memory subsystem
```

Documentation

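| Topic | Link |
|---|---|
| Quickstart | [docs/quickstart.md](docs/quickstart.md) |
| Configuration | [docs/configuration.md](docs/configuration.md) |
| CLI Reference | [docs/cli-reference.md](docs/cli-reference.md) |
| API Reference | [docs/api-reference.md](docs/api-reference.md) |
| Eval Dimensions | [docs/dimensions.md](docs/dimensions.md) |
| Scoring | [docs/scoring.md](docs/scoring.md) |
| Plugins | [docs/plugins.md](docs/plugins.md) |
| Adapters | [docs/adapters.md](docs/adapters.md) |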

When the API server is running, the full interactive API documentation is served at `/docs`.


Project Structure

```text
aegis/
├── src/aegis/
│   ├── adapters/          # Agent framework adapters (OpenAI, Anthropic, etc.)
│   ├── api/               # FastAPI server, routes, middleware
│   ├── cli/               # Typer CLI application
│   ├── core/              # Config, shared types, schema definitions
│   ├── eval/              # Evaluation engine, dimensions, scorers, judges
│   ├── ingestion/         # Document ingestion pipeline + storage sinks
│   ├── memory/            # Event log, graph, vector, temporal, provenance
│   ├── observatory/       # Training monitoring (reward hacking, drift)
│   ├── plugins/           # Domain plugins (legal, finance, safety)
│   ├── retrieval/         # Context retrieval (pgvector, Neo4j, cross-encoder)
│   ├── security/          # Governance and access control
│   ├── store/             # Persistence (SQLite, PostgreSQL)
│   └── training/          # RL engine (AMIR-GRPO, GRPO-SG, curriculum)
├── dashboard/             # Next.js dashboard
├── sdk/typescript/        # TypeScript SDK
├── examples/              # Python examples and sample configs
├── notebooks/             # Jupyter notebooks
├── tests/                 # Automated tests
├── benchmarks/            # Domain benchmark suites
├── Dockerfile
├── docker-compose.yml
├── pyproject.toml
└── README.md
```

Contributing

Contributions are welcome. See CONTRIBUTING.md for dev setup, code style, and PR workflow.


License

Apache License 2.0. See LICENSE for details.


Built by Metronis, Inc.
