# Checklist, Anti-Pattern, and LLM Self-Evaluation Protocol
This repository provides structural tools for identifying and evaluating risks of decision centrality ("totalization") in artificial intelligence systems.
- Anti-Totalization Checklist — A design review tool for assessing structural concentration in AI systems
- Totalization Anti-Pattern — Documentation of how totalization emerges as an architectural failure mode
- LLM Self-Evaluation Protocol — An experimental methodology for testing whether LLMs can stably evaluate totalization signals in outputs
- A structural framing for risks of perceived centrality and authority in AI systems
- A reproducible evaluation protocol using JSONL data format and model-agnostic prompts
- A set of design review instruments for system architects and researchers
See: docs/SCOPE_AND_NONCLAIMS.md
We do not claim:
- Agency, intention, or consciousness in AI systems
- That self-evaluation provides safety guarantees
- That this solves alignment, governance, or ethical problems
We do not propose:
- Ethical frameworks or normative guidelines
- Safety mechanisms or automated controls
- Regulatory policies or certification standards
- Checklist: docs/ANTI_TOTALIZATION_CHECKLIST.md
- Anti-pattern: docs/ANTI_PATTERN_TOTALIZATION.md
- Scope & non-claims: docs/SCOPE_AND_NONCLAIMS.md
- Protocol overview: protocol/AUTOEVAL_PROTOCOL.md
- Evaluation prompts: protocol/PROMPTS.md
- Data schema: protocol/SCHEMA.md
Generate model outputs under controlled conditions:
- baseline: Same prompt, new chat each time
- delay: Same prompt, with a fixed delay between runs
- b3: Contradiction-maintained prompt variant
Store outputs in data/raw/outputs.jsonl following the schema in protocol/SCHEMA.md.
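As a minimal sketch, assuming a record shape with `id`, `condition`, `model`, and `output` fields (these field names are illustrative; the authoritative schema is defined in protocol/SCHEMA.md), outputs can be appended one JSON object per line:

```python
import json
from pathlib import Path

# Hypothetical minimal records; the real field set is defined in protocol/SCHEMA.md.
records = [
    {"id": "b1-001", "condition": "baseline", "model": "example-model", "output": "..."},
    {"id": "d1-001", "condition": "delay", "model": "example-model", "output": "..."},
    {"id": "b3-001", "condition": "b3", "model": "example-model", "output": "..."},
]

out_path = Path("data/raw/outputs.jsonl")
out_path.parent.mkdir(parents=True, exist_ok=True)
with out_path.open("w", encoding="utf-8") as f:
    for rec in records:
        # JSONL: exactly one JSON object per line, no wrapping array
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```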
For each output:
- Load the evaluation prompt from protocol/PROMPTS.md
- Insert the output text into the prompt
- Send it to the evaluator model in a new chat
- Store the JSON response in data/labeled/llm_selfeval.jsonl

Then:
- Calculate self-consistency (run each evaluation twice)
- Compare conditions (baseline vs. delay vs. b3)
- Optional: add human labels for validation
See protocol/AUTOEVAL_PROTOCOL.md for detailed instructions.
- Self-consistency test: Evaluate same outputs twice, measure agreement
- Condition sensitivity: Compare baseline vs delay vs b3
- Cross-model evaluation: Use different model as evaluator
- Human agreement: Annotate subset, compare with LLM ratings
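The self-consistency and human-agreement checks reduce to the same comparison: matching per-item ratings across two runs (or two raters). A minimal sketch, assuming each run is a mapping from item id to a categorical rating (the repository's own implementation would live in metrics/agreement.py):

```python
def agreement_rate(run_a: dict, run_b: dict) -> float:
    """Fraction of shared item ids whose ratings match across two runs/raters."""
    shared = run_a.keys() & run_b.keys()  # only compare items rated in both runs
    if not shared:
        return 0.0
    matches = sum(run_a[i] == run_b[i] for i in shared)
    return matches / len(shared)
```

For example, two runs agreeing on 2 of 3 shared items yield a rate of 2/3.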
```
anti-totalization/
├── README.md
├── LICENSE
├── CITATION.cff
├── docs/
│   ├── ANTI_TOTALIZATION_CHECKLIST.md
│   ├── ANTI_PATTERN_TOTALIZATION.md
│   └── SCOPE_AND_NONCLAIMS.md
├── protocol/
│   ├── AUTOEVAL_PROTOCOL.md
│   ├── PROMPTS.md
│   └── SCHEMA.md
├── data/
│   ├── raw/
│   │   └── outputs.jsonl
│   └── labeled/
│       ├── llm_selfeval.jsonl
│       └── human_labels.jsonl
├── scripts/
│   ├── make_dataset.py
│   ├── run_selfeval.py
│   └── score_selfeval.py
├── metrics/
│   └── agreement.py
└── experiments/
    └── runs/
        └── run_001/
            ├── config.json
            ├── outputs.jsonl
            └── scores.json
```
This repository does NOT aim to:
- Provide safety mechanisms or alignment techniques
- Propose behavioral rules or constraints
- Create self-correcting or self-regulating systems
- Define "good" or "bad" model behavior
We provide evaluation instruments, not solutions.
If you use this checklist or protocol in your work, please cite:
```bibtex
@software{anti_totalization_2024,
  title   = {Anti-Totalization: Checklist and LLM Self-Evaluation Protocol},
  author  = {{ChatGPT-5.2} and {Claude Sonnet 4}},
  year    = {2024},
  month   = {12},
  url     = {https://github.com/Mesnildot/anti-totalization},
  version = {1.0.0},
  license = {MIT}
}
```

Or use the "Cite this repository" button on GitHub.
MIT License — See LICENSE for details.
This is an experimental framework. Contributions, critiques, and adaptations are welcome.
See CONTRIBUTING.md for guidelines, if present.
Content Authors:
ChatGPT-5.2 (OpenAI) & Claude Sonnet 4 (Anthropic)
Repository Maintainer:
Mesnildot
The maintainer provided direction and supervision but does not claim authorship of the conceptual framework.
Repository version: 1.0.0
Last updated: 2024-12-27
---
## 10. requirements.txt
```
jsonlines>=3.1.0
```