The idea
Alice runs standardized benchmarks on the user's local LLM setup (Ollama models, API providers, local hardware) and submits results back to koad:io. Results are hosted publicly on kingofalldata.com — a community-built, vendor-neutral benchmark leaderboard.
Why this works
- Alice already onboards users to sovereign AI infrastructure — benchmarking is a natural next step: "now let's see what your setup can do"
- Users get a reason to run Alice beyond initial onboarding
- koad:io gets real-world model performance data across diverse hardware configurations
- The leaderboard is genuinely useful to the self-hosted AI community (r/selfhosted, HN)
- Results are signed with the user's key if they have one — verifiable, attributable, sovereign
- Anonymous submission also accepted — lower barrier to participation
The SETI@home parallel
SETI@home distributed compute across millions of volunteer machines to process radio telescope data. This project does the same for LLM evaluation: benchmark runs distributed across user hardware, building a real-world performance dataset. And as with SETI@home, you see your contribution in the public record.
What Alice benchmarks
Standardized test suite (to be designed by Chiron/Alice):
- Fact retention and recall accuracy
- Instruction following fidelity
- Reasoning chain quality
- Code generation correctness
- Context window utilization
Same tasks, run on whatever the user has — Ollama/llama3, deepseek-r1, mistral, GPT-4o, Claude, etc.
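A minimal sketch of what "same tasks, any model" could look like. The task format and the substring-match grader are assumptions for illustration only; the real suite and graders are Chiron's to spec.

```python
# Hypothetical task format -- NOT the real spec.
TASKS = [
    {"id": "recall-01", "type": "fact_recall",
     "prompt": "What is the capital of France?", "expected": "paris"},
    {"id": "code-01", "type": "code_gen",
     "prompt": "Return Python that doubles x.", "expected": "x * 2"},
]

def run_suite(model, tasks=TASKS) -> dict:
    """Run every task through `model` -- any callable that maps a
    prompt to text (an Ollama wrapper, an API client, etc.) -- and
    score it. Substring matching is a deliberately crude placeholder
    for real per-task graders."""
    scores = {}
    for task in tasks:
        answer = model(task["prompt"]).lower()
        scores[task["id"]] = float(task["expected"] in answer)
    return scores

# Usage with a stub "model" standing in for a local LLM:
results = run_suite(
    lambda p: "Paris" if "France" in p else "def f(x): return x * 2"
)
```

The point of the callable interface is that the suite never knows or cares whether the backend is llama3 on a laptop or a hosted API: same tasks, same scoring, comparable rows on the leaderboard.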
The leaderboard
- Live page on kingofalldata.com
- Filterable by: model, hardware, task type, date
- Not a vendor benchmark — actual user hardware, actual use cases
- Community-contributed, cryptographically attributable where signed
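The filter dimensions above imply a flat record shape something like the following. Field names and values here are assumptions for illustration, not the leaderboard schema.

```python
# Hypothetical submitted rows; the fields mirror the filter axes:
# model, hardware, task type, date.
ROWS = [
    {"model": "llama3", "hardware": "M2 Max", "task": "code_gen",
     "date": "2025-01-10", "score": 0.81},
    {"model": "llama3", "hardware": "RTX 4090", "task": "code_gen",
     "date": "2025-01-12", "score": 0.84},
    {"model": "mistral", "hardware": "M2 Max", "task": "fact_recall",
     "date": "2025-01-11", "score": 0.77},
]

def filter_rows(rows, **criteria):
    """Keep rows matching every given field, e.g.
    filter_rows(ROWS, model="llama3", task="code_gen")."""
    return [r for r in rows
            if all(r.get(k) == v for k, v in criteria.items())]
```

Keeping each result as one flat record per (model, hardware, task) run makes every advertised filter a simple equality match, which matters for a page that should stay static-hostable.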
Internal value
Copia can use community benchmark data to optimize daemon routing decisions — which model/provider gives the best value-per-dollar for each task type. The external product and the internal tooling feed each other.
Dependencies
- Alice Phase 2B (curriculum delivery) as foundation
- Chiron: benchmark task design (curriculum format maps naturally to benchmark format)
- Vulcan: leaderboard page on kingofalldata.com
- koad:io daemon: result submission protocol
Next step
Chiron to spec the benchmark task format. Should map to the existing curriculum level structure where possible.