The idea
Alice runs standardized benchmarks on the user's local LLM setup (Ollama models, API providers, local hardware) and submits results back to koad:io. Results are hosted publicly on kingofalldata.com — a community-built, vendor-neutral benchmark leaderboard.
Why this works
- Alice already onboards users to sovereign AI infrastructure — benchmarking is a natural next step: "now let's see what your setup can do"
- Users get a reason to run Alice beyond initial onboarding
- koad:io gets real-world model performance data across diverse hardware configurations
- The leaderboard is genuinely useful to the self-hosted AI community (r/selfhosted, HN)
- Results are signed with the user's key if they have one — verifiable, attributable, sovereign
- Anonymous submission also accepted — lower barrier to participation
The SETI@home parallel
SETI@home distributed compute across millions of volunteer machines to process radio telescope data. This project does the same for LLM evaluation: benchmark runs distributed across user hardware, building a real-world performance dataset. And as with SETI@home, you see your contribution in the public record.
What Alice benchmarks
Standardized test suite (to be designed by Chiron/Alice):
- Fact retention and recall accuracy
- Instruction following fidelity
- Reasoning chain quality
- Code generation correctness
- Context window utilization
Same tasks, run on whatever the user has — Ollama/llama3, deepseek-r1, mistral, GPT-4o, Claude, etc.
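A minimal sketch of what "same tasks, any model" could look like. The task format and the substring-match grader are assumptions for illustration only; the real suite and graders are Chiron's to spec.

```python
# Hypothetical task format -- NOT the real spec.
TASKS = [
    {"id": "recall-01", "type": "fact_recall",
     "prompt": "What is the capital of France?", "expected": "paris"},
    {"id": "code-01", "type": "code_gen",
     "prompt": "Return Python that doubles x.", "expected": "x * 2"},
]

def run_suite(model, tasks=TASKS) -> dict:
    """Run every task through `model` -- any callable that maps a
    prompt to text (an Ollama wrapper, an API client, etc.) -- and
    score it. Substring matching is a deliberately crude placeholder
    for real per-task graders."""
    scores = {}
    for task in tasks:
        answer = model(task["prompt"]).lower()
        scores[task["id"]] = float(task["expected"] in answer)
    return scores

# Usage with a stub "model" standing in for a local LLM:
results = run_suite(
    lambda p: "Paris" if "France" in p else "def f(x): return x * 2"
)
```

The point of the callable interface is that the suite never knows or cares whether the backend is llama3 on a laptop or a hosted API: same tasks, same scoring, comparable rows on the leaderboard.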
The leaderboard
- Live page on kingofalldata.com
- Filterable by: model, hardware, task type, date
- Not a vendor benchmark — actual user hardware, actual use cases
- Community-contributed, cryptographically attributable where signed
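The filter dimensions above imply a flat record shape something like the following. Field names and values here are assumptions for illustration, not the leaderboard schema.

```python
# Hypothetical submitted rows; the fields mirror the filter axes:
# model, hardware, task type, date.
ROWS = [
    {"model": "llama3", "hardware": "M2 Max", "task": "code_gen",
     "date": "2025-01-10", "score": 0.81},
    {"model": "llama3", "hardware": "RTX 4090", "task": "code_gen",
     "date": "2025-01-12", "score": 0.84},
    {"model": "mistral", "hardware": "M2 Max", "task": "fact_recall",
     "date": "2025-01-11", "score": 0.77},
]

def filter_rows(rows, **criteria):
    """Keep rows matching every given field, e.g.
    filter_rows(ROWS, model="llama3", task="code_gen")."""
    return [r for r in rows
            if all(r.get(k) == v for k, v in criteria.items())]
```

Keeping each result as one flat record per (model, hardware, task) run makes every advertised filter a simple equality match, which matters for a page that should stay static-hostable.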
Internal value
Copia can use community benchmark data to optimize daemon routing decisions — which model/provider gives the best value-per-dollar for each task type. The external product and the internal tooling feed each other.
Dependencies
- Alice Phase 2B (curriculum delivery) as foundation
- Chiron: benchmark task design (curriculum format maps naturally to benchmark format)
- Vulcan: leaderboard page on kingofalldata.com
- koad:io daemon: result submission protocol
Next step
Chiron to spec the benchmark task format. Should map to the existing curriculum level structure where possible.