What do model panels prefer when multiple answers look plausible, and where do those preferences diverge from human judgment?
Hamlet's Ghost is a local comparative-judgment lab for studying LLM aesthetic preference. It generates rival artifacts, asks a Muse/Athena/Apollo evaluator panel to judge them, and routes meaningful Apollo-vs-panel disagreements to a human operator. The public status of each major claim is tracked in docs/implementation-status.md; read that ledger as the backstop for everything this README says.
The lab is early, test-backed, and intentionally modest about what is real. The disagreement workflow, provider separation, artifact auth gate, and demo evidence path are implemented. The prompt compiler, seeded taxonomy, cross-family rule promotion, and generator distinctiveness work are prototypes unless and until earned evidence says otherwise.
git clone <repo-url> hamlets-ghost
cd hamlets-ghost
python3 -m venv .venv && ./.venv/bin/python -m pip install -r requirements.txt
cp .env.example .env
./start.sh demo
Demo mode uses synthetic fixture data and bootstraps demo_lab.db on first run. Live model calls require provider credentials in .env.
The lab's basic loop is prompt in, rival outputs out, evaluator votes recorded, and human review used as the characterization reference when the machine panel disagrees. Apollo is now the outside auditor lane: it runs through Hermes/GPT-5.4 by default, while Theron is the OpenClaw/Opus generator lane.
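The routing step in this loop can be sketched roughly as follows. This is a hypothetical illustration, not the lab's actual code: the `Vote` type, `needs_human_review` function, and the "Apollo differs from every panel vote" rule are all assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Vote:
    evaluator: str   # "muse", "athena", or "apollo"
    winner: str      # id of the preferred rival artifact

def needs_human_review(votes: list[Vote]) -> bool:
    """Route an experiment to the human queue when Apollo's pick diverges
    from the Muse/Athena panel (hypothetical logic, not the lab's code)."""
    apollo = next((v.winner for v in votes if v.evaluator == "apollo"), None)
    panel = [v.winner for v in votes if v.evaluator in ("muse", "athena")]
    if apollo is None or not panel:
        return False
    # Disagreement: Apollo's winner differs from every panel vote
    return all(p != apollo for p in panel)
```

Under this sketch, a Muse/Athena consensus that Apollo contradicts lands in the review queue, and unanimous rounds pass straight through.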
The longer-term direction is a prompt compiler: a traced system that proposes prompt transformations, tests them against rival outputs, and only promotes rules when panel behavior and human review have been separated cleanly. That compiler is not yet a learned product. It is currently a prototype harness with evidence plumbing.
See docs/implementation-status.md for the claim-by-claim ledger.
- Implemented: independent Apollo evaluator lane; Apollo-centered disagreement queue; admin-token protection on /api/artifact/{id}; demo fixture walkthrough.
- Prototype: prompt compiler traces; cross-family rule promotion; seeded anti-pattern taxonomy; Genesis/Theron distinctiveness.
- Aspirational: five-advisor council governance protocol; dashboard calibration rates such as constraint_recovery_rate and critique_help_rate.
The backend is a FastAPI app backed by SQLite. Experiments move through generator roles, evaluator roles, review queues, and wiki/taxonomy surfaces: Genesis/Theron generate, Muse/Athena/Apollo judge, human review characterizes disagreements, and the reflective wiki preserves seeded concept pages, each labeled seeded until lab evidence corroborates them. The operational code lives mostly in server.py, agents.py, database.py, and judgment_wiki.py.
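The kind of SQLite-backed vote log this pipeline implies can be sketched with the standard library. The table and column names below are assumptions for illustration, not the actual schema in database.py:

```python
import sqlite3

# Hypothetical vote log: one row per (experiment, evaluator) pair.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE evaluator_votes (
        experiment_id TEXT NOT NULL,
        evaluator     TEXT NOT NULL CHECK (evaluator IN ('muse','athena','apollo')),
        artifact_id   TEXT NOT NULL,
        PRIMARY KEY (experiment_id, evaluator)
    )
""")
conn.executemany(
    "INSERT INTO evaluator_votes VALUES (?, ?, ?)",
    [("exp-1", "muse",   "art-a"),
     ("exp-1", "athena", "art-a"),
     ("exp-1", "apollo", "art-b")],
)

# Experiments where Apollo's pick matches no Muse/Athena vote,
# i.e. candidates for the human review queue.
rows = conn.execute("""
    SELECT a.experiment_id
    FROM evaluator_votes a
    WHERE a.evaluator = 'apollo'
      AND NOT EXISTS (
          SELECT 1 FROM evaluator_votes p
          WHERE p.experiment_id = a.experiment_id
            AND p.evaluator IN ('muse','athena')
            AND p.artifact_id = a.artifact_id)
""").fetchall()
```

With the sample rows above, the query surfaces exp-1, where Apollo dissents from both panel votes.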
Theron and Apollo are provider-agnostic shell-outs to local CLI binaries that are not distributed with this repo. This separation is intentional: Apollo is meant to audit from a different model family than the OpenAI-backed Muse/Athena lane. Theron routes through openclaw (THERON_OPENCLAW_BIN, default openclaw) against a locally running OpenClaw gateway; Apollo routes through hermes (APOLLO_HERMES_BIN, default hermes). If a cloner does not have those binaries installed and reachable, Genesis/Muse/Athena still run on OpenAI with the credentials in .env, and the Theron/Apollo surfaces will report provider errors through /api/providers. What clones cleanly is the role definitions, adapter layer, prompts, schemas, and the full lab/review/wiki pipeline, not the external runtimes those two roles depend on. Demo mode (./start.sh demo) bypasses all external providers and runs entirely on synthetic fixtures.
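A provider shell-out with graceful degradation might look like the sketch below. The environment-variable names and defaults (THERON_OPENCLAW_BIN/openclaw, APOLLO_HERMES_BIN/hermes) come from the text above, but the function name, return shape, and stdin-based call convention are assumptions, not the repo's adapter layer:

```python
import os
import shutil
import subprocess

def run_provider(role_env: str, default_bin: str, prompt: str) -> dict:
    """Shell out to a local CLI provider binary; report an error dict
    instead of raising when the binary is missing (sketch only)."""
    binary = os.environ.get(role_env, default_bin)
    if shutil.which(binary) is None:
        # Mirrors the README's behavior: missing binaries surface as
        # provider errors rather than crashing the lab.
        return {"ok": False, "error": f"provider binary not found: {binary}"}
    proc = subprocess.run([binary], input=prompt, capture_output=True, text=True)
    if proc.returncode != 0:
        return {"ok": False, "error": proc.stderr.strip()}
    return {"ok": True, "output": proc.stdout}

# Hypothetical usage for the Apollo lane:
result = run_provider("APOLLO_HERMES_BIN", "hermes", "judge these rivals")
```

On a clone without hermes installed, `result["ok"]` is False and the error string is what a status surface like /api/providers could report.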
- program.md - detailed research protocol and operating model.
- docs/implementation-status.md - implemented/prototype/aspirational ledger.
- docs/2026-04-02-recalibration-and-apollo-audit-plan.md - early Apollo recalibration and audit plan.
- docs/DECISIONS.md - load-bearing decision log.
- wiki/ - reflective memory; concept pages are labeled seeded, not earned.
Apache-2.0. See LICENSE.
