Evaliphy simplifies end-to-end AI testing by treating your AI system like a black box.
Instead of research-focused frameworks or fine-tuning pipelines, Evaliphy lets you:
- Test your real AI API (black-box, no internals)
- Write assertions in TypeScript
- Run in CI like your other tests
- Get human-readable reports with actionable reasoning
No Python. No ML overhead. No vendor lock-in.
Works with any black-box AI system — RAG, agents, chatbots, content generation, summarization, and more.
Most AI evaluation frameworks are built for research. They're amazing if you're fine-tuning models or benchmarking datasets. But they're not built for developers shipping products.
You have:
- ✅ A RAG system in production (or agents, chatbots, summarizers, etc.)
- ✅ A CI/CD pipeline
- ✅ TypeScript/JavaScript code
- ❌ No way to test your AI system like you test your APIs
We took a different path. Evaliphy lets you:
- Test via HTTP (your real API)
- Write in TypeScript (your language)
- Run in CI (your pipeline)
- Use simple assertions (your mental model)
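
As a rough illustration of what "test via HTTP with simple assertions" can look like, here is a minimal sketch in plain TypeScript. It is not Evaliphy's actual API: the helper names (`httpAsk`, `missingTerms`, `expectAnswerToMention`), the endpoint URL you would pass in, and the `{ question }` request / `{ answer }` response shape are all assumptions made for this example.

```typescript
// Illustrative sketch only: plain TypeScript, not Evaliphy's actual API.
// The { question } -> { answer } JSON shape is an assumption.

type Ask = (question: string) => Promise<string>;

// Returns a function that sends a question to the AI endpoint over HTTP.
// Uses the global fetch available in Node.js 18+.
function httpAsk(url: string): Ask {
  return async (question: string): Promise<string> => {
    const res = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    });
    if (!res.ok) {
      throw new Error(`AI API returned HTTP ${res.status}`);
    }
    const data = (await res.json()) as { answer: string };
    return data.answer;
  };
}

// Pure helper: which expected terms are absent from the answer?
// The comparison is case-insensitive.
function missingTerms(answer: string, terms: string[]): string[] {
  const haystack = answer.toLowerCase();
  return terms.filter((term) => !haystack.includes(term.toLowerCase()));
}

// Black-box assertion: throws if the answer omits any expected term.
async function expectAnswerToMention(
  ask: Ask,
  question: string,
  terms: string[],
): Promise<void> {
  const answer = await ask(question);
  const missing = missingTerms(answer, terms);
  if (missing.length > 0) {
    throw new Error(`Answer to "${question}" is missing: ${missing.join(", ")}`);
  }
}
```

Because the check only sees the HTTP response, the same test keeps working if the model, prompt, or retrieval pipeline behind the endpoint changes. In a CI job, a thrown error from `expectAnswerToMention` would fail the run like any other failing test.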
| Feature | Evaliphy | DeepEval / Ragas | Promptfoo |
|---|---|---|---|
| Best For | Developers testing in CI | Researchers / Fine-tuning | LLM Prompt Testing |
| Language | TypeScript / Node.js | Python | JavaScript / Python |
| Testing Approach | Black-box API calls | White-box Pipeline | Prompt-focused |
| Workflow | CI/CD Pipelines | Notebooks / Scripts | CLI / Web UI |
| Setup | `npm install -g evaliphy` | `pip install deepeval` | `npx promptfoo` |
| Use Case | "Does my RAG/AI API work?" | "How good is my model?" | "Is my prompt right?" |