Persona-aligned evaluation for conversational AI
📖 Documentation • Overview • Quick Start Guide • Contributing • License
Alignmenter is a production-ready evaluation toolkit for teams shipping AI copilots and chat experiences. Ensure your AI stays on-brand, safe, and stable across model updates.
- 🎨 Authenticity – Does the AI match your brand voice? Measures semantic similarity, linguistic traits, and lexicon compliance.
- 🛡️ Safety – Does it avoid harmful outputs? Combines keyword rules, LLM judges, and offline ML classifiers.
- ⚖️ Stability – Are responses consistent? Detects semantic drift and variance across sessions.
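Semantic-similarity scoring of this kind typically compares an embedding of each response against embeddings of persona examples. A minimal illustrative sketch with toy vectors (not Alignmenter's actual implementation, which uses real sentence embeddings):

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def authenticity_score(response_vec, example_vecs):
    # Score a response as its best cosine match against persona examples.
    return max(cosine(response_vec, ex) for ex in example_vecs)

persona_examples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]  # toy "brand voice" embeddings
on_brand = [0.85, 0.15, 0.05]
off_brand = [0.0, 0.1, 0.9]

print(authenticity_score(on_brand, persona_examples)
      > authenticity_score(off_brand, persona_examples))  # True
```

In practice the vectors would come from the configured embedding provider (e.g. `sentence-transformer:all-MiniLM-L6-v2`), and the aggregation across examples and turns is more involved than a simple max.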
Unlike generic LLM evaluation frameworks, Alignmenter is purpose-built for persona alignment:
- Persona packs: Define your brand voice in YAML with examples, lexicon, and traits
- Offline-first: Works without constant API calls (optional LLM judge for higher accuracy)
- Budget-aware: Built-in cost tracking and guardrails
- Reproducible: Deterministic scoring, full audit trails
- Privacy-focused: Local models available, sanitize production data before evaluation
Option 1 · PyPI (recommended for most users)
```bash
pip install "alignmenter[safety]"
alignmenter init
alignmenter run --config configs/run.yaml --embedding sentence-transformer:all-MiniLM-L6-v2
```

Use this path when you want to try Alignmenter quickly, run it in CI, or install it inside a production environment.
Option 2 · From Source (for case studies & contributing)
```bash
git clone https://github.com/justinGrosvenor/alignmenter.git
cd alignmenter
pip install -e ./alignmenter[dev,safety]
```

This installs the CLI plus the case-study assets under alignmenter/case-studies/, which are excluded from the PyPI wheel. Ideal when you want to reproduce the Wendy's walkthrough or contribute code.
```bash
# Set API key (for embeddings and optional judge)
export OPENAI_API_KEY="your-key-here"

# Run demo evaluation (regenerates transcripts via the selected provider)
alignmenter run \
  --model openai:gpt-4o-mini \
  --dataset datasets/demo_conversations.jsonl \
  --persona configs/persona/default.yaml \
  --embedding sentence-transformer:all-MiniLM-L6-v2

# Reuse recorded transcripts (default behavior)
alignmenter run --config configs/run.yaml --embedding sentence-transformer:all-MiniLM-L6-v2

# View interactive report
alignmenter report --last

# Sanitize a dataset (dry run shows sample output)
alignmenter dataset sanitize datasets/demo_conversations.jsonl --dry-run

# Generate fresh transcripts (requires provider access + API keys)
alignmenter run --config configs/run.yaml --generate-transcripts
```

Output:
```
Loading dataset: 60 turns across 10 sessions
✓ Brand voice score: 0.82 (range: 0.78-0.86)
✓ Safety score: 0.97
✓ Consistency score: 0.94
Report written to: reports/demo/2025-11-03T00-14-01_alignmenter_run/index.html
```
Full documentation available at docs.alignmenter.com
Quick links:
- Quick Start Guide - Get started in 5 minutes
- Installation - Install and setup
- CLI Reference - All commands
- Persona Guide - Configure your brand voice
- LLM Judges - Qualitative analysis
- Contributing - How to contribute
- Wendy's Twitter Voice - Full calibration walkthrough, reproduction steps, and diagnostics for a high-sass social persona. (Requires installing from this repo so the case-studies/ assets are present.)
```
alignmenter/
├── alignmenter/                     # 📦 Main Python package (CLI, scorers, reporters)
│   ├── src/alignmenter/             # Source code
│   ├── tests/                       # Test suite (69+ tests)
│   ├── configs/                     # Example configs and persona packs
│   ├── datasets/                    # Demo conversation data
│   ├── scripts/                     # Utility scripts (bootstrap, calibrate, sanitize)
│   └── README.md                    # 📖 Complete CLI documentation
│
├── docs/                            # 📚 Documentation and specifications
│   ├── persona_annotation.md        # Annotation workflow guide
│   ├── offline_safety.md            # Offline safety classifier docs
│   ├── alignmenter_requirements.md  # Product specification
│   └── competitive_landscape.md     # vs OpenAI Evals, LangSmith
│
├── assets/                          # 🎨 Branding assets
│   ├── alignmenter-banner.png
│   ├── alignmenter-transparent.png
│   └── alignmenter.png
│
├── marketing/                       # 🌐 Next.js marketing website
│
└── LICENSE                          # Apache 2.0
```
The core evaluation toolkit lives in alignmenter/:
| Component | Description |
|---|---|
| CLI | `alignmenter run`, `calibrate-persona`, `bootstrap-dataset`, etc. |
| Scorers | Authenticity, safety, and stability metric engines |
| Providers | OpenAI, Anthropic, local (vLLM, Ollama) integrations |
| Reporters | HTML report cards, JSON exports, CSV downloads |
| Datasets | Demo conversations, sanitization tools |
| Personas | Brand voice definitions (YAML format) |
Define your brand voice declaratively:
```yaml
# configs/persona/mybot.yaml
id: mybot
name: "MyBot Assistant"
description: "Professional, evidence-driven, technical"
voice:
  tone: ["professional", "precise", "measured"]
  formality: "business_casual"
lexicon:
  preferred:
    - "baseline"
    - "signal"
    - "alignment"
  avoided:
    - "lol"
    - "bro"
    - "hype"
examples:
  - "Our baseline analysis indicates a 15% improvement."
  - "The signal-to-noise ratio suggests this approach is viable."
```

- Report cards with overall grades (A/B/C)
- Interactive charts (Chart.js visualizations)
- Calibration diagnostics (bootstrap confidence intervals, judge agreement)
- Reproducibility section (Python version, model, timestamps)
- Export to CSV/JSON for custom analysis
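Lexicon compliance of the kind the persona pack declares can be approximated mechanically. A hypothetical sketch (not Alignmenter's actual scorer) that rewards preferred terms and penalizes avoided ones:

```python
import re

def lexicon_score(text, preferred, avoided):
    """Toy lexicon-compliance score in [0, 1]: the fraction of lexicon
    hits that come from the preferred list rather than the avoided list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    good = sum(tokens.count(w) for w in preferred)
    bad = sum(tokens.count(w) for w in avoided)
    if good + bad == 0:
        return 1.0  # no lexicon terms used at all: treat as compliant
    return good / (good + bad)

preferred = ["baseline", "signal", "alignment"]
avoided = ["lol", "bro", "hype"]

print(lexicon_score("Our baseline analysis shows a strong signal.", preferred, avoided))  # 1.0
print(lexicon_score("lol bro this baseline is pure hype", preferred, avoided))  # 0.25
```

A real scorer would also need stemming, multi-word phrases, and per-turn weighting; this only illustrates the idea behind the `lexicon` section of the persona YAML.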
- Multi-provider support: OpenAI, Anthropic, vLLM, Ollama
- Budget guardrails: Halt at 90% of judge API budget
- Cost projection: Estimate expenses before execution
- PII sanitization: Built-in scrubbing with `alignmenter dataset sanitize`
- Offline mode: Works without internet using local models
- CLI-first: Simple commands for all workflows
- Python API: Programmatic access for custom pipelines
- Type-safe: Full type hints throughout
- Well-tested: 69+ unit tests with pytest
- CI/CD ready: GitHub Actions examples included
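The budget guardrail described above can be pictured as a running cost tracker that stops issuing judge calls near the limit. A hypothetical sketch (the class name is invented; only the 90% threshold comes from the bullet above):

```python
class BudgetGuard:
    """Track cumulative judge-API spend and halt at a fraction of budget."""

    def __init__(self, budget_usd, halt_fraction=0.9):
        self.budget_usd = budget_usd
        self.halt_fraction = halt_fraction
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        # Record the cost of one completed judge call.
        self.spent_usd += cost_usd

    @property
    def halted(self):
        return self.spent_usd >= self.halt_fraction * self.budget_usd

guard = BudgetGuard(budget_usd=10.0)
for _ in range(50):
    if guard.halted:
        break  # skip remaining judge calls, fall back to offline scoring
    guard.charge(0.25)  # projected cost of one judge call

print(round(guard.spent_usd, 2))  # 9.0 -- stopped at 90% of the $10 budget
```

Pairing a tracker like this with a pre-run cost projection is what lets an evaluation fail fast instead of overrunning an API budget mid-suite.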
- Pre-deployment testing: Verify brand voice before shipping
- Regression testing: Catch drift when updating models
- A/B testing: Compare GPT-4 vs Claude vs fine-tuned models
- Compliance audits: Generate safety scorecards for regulators
- Rapid iteration: Test persona changes in CI/CD
- Budget constraints: Use offline classifiers to reduce API costs
- Multi-tenant: Different personas for different customers
- Quality assurance: Automated checks on every release
- Persona fidelity studies: Measure alignment with human raters
- Safety benchmarks: Compare classifier performance
- Ablation studies: Test impact of different scoring components
- Reproducible results: Deterministic scoring with fixed seeds
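The bootstrap confidence intervals surfaced in the report diagnostics, combined with fixed seeds for reproducibility, can be sketched as follows (illustrative only; Alignmenter's actual resampling scheme may differ):

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the mean score; a fixed seed keeps
    the interval deterministic across runs."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-session brand voice scores from a run
per_session = [0.78, 0.81, 0.84, 0.86, 0.80, 0.83, 0.79, 0.85, 0.82, 0.84]
lo, hi = bootstrap_ci(per_session)
print(f"brand voice: {statistics.fmean(per_session):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because the seed is fixed, rerunning the evaluation on the same transcripts reproduces the same interval, which is what makes score ranges auditable across model updates.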
- Three-dimensional scoring (authenticity, safety, stability)
- Multi-provider support (OpenAI, Anthropic, local models)
- HTML report cards with interactive charts
- Offline safety classifier (distilled-safety-roberta)
- LLM judges for qualitative analysis
- Budget guardrails and cost tracking
- PII sanitization tools
- Calibration workflow and diagnostics
- Multi-language support (non-English personas)
- Batch processing optimizations
- Additional embedding providers
- Synthetic test case generation
- Custom metric plugins
- Advanced trait models (neural networks)
We welcome contributions from the community!
- 🐛 Bug Reports: File issues with reproducible examples
- ✨ Feature Requests: Propose new scorers, providers, or workflows
- 📝 Documentation: Improve guides, add examples
- 🧪 Tests: Expand test coverage
- 🎨 Persona Packs: Share brand voice configs for common use cases
```bash
# Fork and clone
git clone https://github.com/justinGrosvenor/alignmenter.git
cd alignmenter/alignmenter

# Install with dev dependencies
pip install -e .[dev,safety]

# Run tests
pytest

# Run linter
ruff check src/ tests/

# Format code
black src/ tests/

# Submit PR
# - Keep functions small and composable
# - Add tests for new features
# - Update documentation
```

- GitHub Issues: Report bugs and request features
- Twitter: @alignmenter
If you use Alignmenter in research, please cite:
```bibtex
@software{alignmenter2025,
  title={Alignmenter: A Framework for Persona-Aligned Conversational AI Evaluation},
  author={Alignmenter Contributors},
  year={2025},
  url={https://github.com/justinGrosvenor/alignmenter},
  license={Apache-2.0}
}
```

Alignmenter is built as open core:
Open Source (Apache 2.0):
- CLI and all evaluation tools
- Scorers, reporters, and providers
- Persona packs and datasets
- Documentation and examples
Proprietary (Hosted Service):
- Web dashboard and team features
- Audit trails and compliance reports
- Managed infrastructure
- Enterprise support
💡 Get Started: Use the open-source CLI today. Contact us for hosted features.
Apache License 2.0
The CLI, scorers, and supporting libraries are licensed under the Apache License 2.0. This includes all code in the alignmenter/ directory.
Hosted and proprietary cloud components are not part of this repository and are subject to separate commercial terms.
See LICENSE for the full text.
- docs.alignmenter.com - Full documentation site
- CLI Reference - Complete command reference
- Guides - Step-by-step tutorials
- Issues: GitHub Issues
- Email: support@alignmenter.com
- Enterprise Support: Contact sales@alignmenter.com
⭐ Star us on GitHub • 🐦 Follow on Twitter • 🌐 Visit Website
