Alice: distributed LLM benchmarking — SETI@home for local AI #66

@koad

Description

The idea

Alice runs standardized benchmarks on the user's local LLM setup (Ollama models, API providers, local hardware) and submits results back to koad:io. Results are hosted publicly on kingofalldata.com — a community-built, vendor-neutral benchmark leaderboard.
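A submitted result might look something like the sketch below. This is purely illustrative — the field names and schema are assumptions, since the koad:io daemon's submission protocol hasn't been specced yet:

```python
import json

# Hypothetical result payload — every field name here is an
# assumption, not a spec; the koad:io daemon would define the
# real schema for submissions to kingofalldata.com.
result = {
    "benchmark": "fact-recall",              # task suite identifier
    "model": "ollama/llama3",                # model under test
    "hardware": {"gpu": "RTX 3090", "ram_gb": 64},
    "score": 0.87,                           # normalized 0..1
    "duration_s": 42.3,
    "submitted_by": None,                    # None => anonymous submission
}

# Serialize deterministically for transport (and later signing).
payload = json.dumps(result, sort_keys=True)
print(payload)
```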

Why this works

  • Alice already onboards users to sovereign AI infrastructure — benchmarking is a natural next step: "now let's see what your setup can do"
  • Users get a reason to run Alice beyond initial onboarding
  • koad:io gets real-world model performance data across diverse hardware configurations
  • The leaderboard is genuinely useful to the self-hosted AI community (r/selfhosted, HN)
  • Results are signed with the user's key if they have one — verifiable, attributable, sovereign
  • Anonymous submission also accepted — lower barrier to participation
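For signed submissions to be verifiable, the result needs a canonical byte representation before signing. A minimal sketch of that step, assuming SHA-256 over sorted-key JSON (the actual signature scheme — e.g. Ed25519 over this digest — is not specified in the issue and is an assumption):

```python
import hashlib
import json

def result_digest(result: dict) -> str:
    """Canonical SHA-256 digest of a benchmark result.

    Sorting keys and using compact separators makes the byte
    string deterministic, so a signature over the digest is
    reproducible by any verifier. What key signs this digest
    (and with which scheme) is left open by the issue.
    """
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

digest = result_digest({"model": "ollama/llama3", "score": 0.87})
print(digest)
```

Because the serialization is canonical, two parties computing the digest from the same result always agree, regardless of key order in their dicts.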

The SETI@home parallel

SETI@home distributed computation across millions of volunteer machines to process radio telescope data. This project distributes benchmark runs across user hardware to build a real-world LLM performance dataset — and every contributor can see their results in the public record.

What Alice benchmarks

Standardized test suite (to be designed by Chiron/Alice):

  • Fact retention and recall accuracy
  • Instruction following fidelity
  • Reasoning chain quality
  • Code generation correctness
  • Context window utilization

Same tasks, run on whatever the user has — Ollama/llama3, deepseek-r1, mistral, GPT-4o, Claude, etc.
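As a strawman for Chiron to react to, a task format mirroring the curriculum-level structure might look like this. All names and example tasks are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical task format — Chiron would spec the real one.
# The shape deliberately mirrors a curriculum level: a prompt,
# a reference answer, and a category for scoring/filtering.
@dataclass
class BenchmarkTask:
    task_id: str
    category: str      # e.g. "fact-recall", "instruction-following"
    prompt: str
    expected: str      # reference answer used for scoring
    max_tokens: int = 512

SUITE = [
    BenchmarkTask("recall-001", "fact-recall",
                  "What year was SETI@home launched?", "1999"),
    BenchmarkTask("code-001", "code-generation",
                  "Write a Python function that reverses a string.",
                  "def reverse(s): return s[::-1]"),
]

print(len(SUITE), SUITE[0].category)
```

The same suite runs unchanged against any backend — only the model invocation layer differs.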

The leaderboard

  • Live page on kingofalldata.com
  • Filterable by: model, hardware, task type, date
  • Not a vendor benchmark — actual user hardware, actual use cases
  • Community-contributed, cryptographically attributable where signed
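The filters above reduce to simple field matching over result rows. A minimal sketch, with assumed field names:

```python
# Sketch of leaderboard filtering — row fields are assumptions
# standing in for whatever schema Vulcan's page actually uses.
rows = [
    {"model": "ollama/llama3", "hardware": "RTX 3090",
     "task": "fact-recall", "score": 0.78},
    {"model": "gpt-4o", "hardware": "api",
     "task": "fact-recall", "score": 0.93},
    {"model": "ollama/llama3", "hardware": "M2 Max",
     "task": "code-generation", "score": 0.70},
]

def filter_rows(rows, **criteria):
    """Keep rows matching every given field=value criterion."""
    return [r for r in rows
            if all(r.get(k) == v for k, v in criteria.items())]

print(filter_rows(rows, model="ollama/llama3", task="fact-recall"))
```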

Internal value

Copia can use community benchmark data to optimize daemon routing decisions — which model/provider gives the best value-per-dollar for each task type. The external product and the internal tooling feed each other.
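One way the routing decision could consume the data — an illustrative heuristic only, since Copia's actual policy is not specified in the issue, and all model names, scores, and costs below are made up:

```python
# Illustrative value-per-dollar routing — everything here is an
# assumption; Copia's real decision logic is not yet designed.
def value_per_dollar(score: float, cost_usd: float) -> float:
    """Benchmark score per dollar of inference cost."""
    return score / cost_usd if cost_usd > 0 else float("inf")

# Hypothetical community benchmark aggregates for one task type.
candidates = {
    "ollama/llama3": {"score": 0.78, "cost_usd": 0.0},   # local, no per-call cost
    "gpt-4o":        {"score": 0.93, "cost_usd": 0.02},
    "claude":        {"score": 0.91, "cost_usd": 0.015},
}

best = max(candidates, key=lambda m: value_per_dollar(**candidates[m]))
print(best)
```

A real policy would also weigh latency, context limits, and minimum-quality thresholds — this only shows the shape of the feedback loop from community data to routing.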

Dependencies

  • Alice Phase 2B (curriculum delivery) as foundation
  • Chiron: benchmark task design (curriculum format maps naturally to benchmark format)
  • Vulcan: leaderboard page on kingofalldata.com
  • koad:io daemon: result submission protocol

Next step

Chiron to spec the benchmark task format. Should map to the existing curriculum level structure where possible.

Metadata

Assignees

No one assigned

Labels

  • content — Reality Pillar posts and pipeline
  • governance — Trust bonds, policy, entity authorizations
  • someday-maybe — Parked, not committed

    Projects

    Status

    Someday

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
