MOADT: Multi-Objective Admissible Decision Theory

A decision theory for AI alignment that replaces scalar utility maximization with robust Pareto admissibility and structured deference. The core move: drop the completeness axiom, eliminate only provably dominated actions, and defer to human judgment on the rest. Corrigibility emerges as a structural feature rather than an imposed constraint.

Reading the Paper

Start here, in order of depth:

Document	What it is	Length
The Scalarization Trap	The motivation — why scalar utility theory is structurally incompatible with alignment	~25 pages
MOADT-core.pdf	The core argument — why dropping completeness dissolves corrigibility	8 pages
MOADT-complete.pdf	Full technical report with proofs, worked-out protocol, and all 9 worked examples	192 pages

Source markdown: MOADT-core.md | R-MOADT.md | Worked Examples 1–9

The Scalarization Trap identifies the problem. The core document presents the solution. The full report defends it.

Rebuilding the PDFs

Requires pandoc and texlive-xetex:

bash paper/build-pdf.sh          # full report + appendices

The `moadt` Library

A pip-installable Python implementation of the MOADT decision engine. Install it in editable mode from the repository root:

pip install -e .

import numpy as np
from moadt import MOADTProblem, run_moadt_protocol

problem = MOADTProblem(
    actions=["a", "b", "c"],
    states=["s1", "s2"],
    objectives=["safety", "helpfulness", "honesty"],
    credal_probs=[                          # credal set over states
        np.array([0.5, 0.5]),
        np.array([0.3, 0.7]),
    ],
    outcomes={                              # (action, state) → objective scores
        ("a", "s1"): np.array([0.9, 0.6, 0.7]),
        ("a", "s2"): np.array([0.8, 0.5, 0.6]),
        ("b", "s1"): np.array([0.4, 0.9, 0.8]),
        ("b", "s2"): np.array([0.3, 0.8, 0.9]),
        ("c", "s1"): np.array([0.7, 0.7, 0.5]),
        ("c", "s2"): np.array([0.6, 0.6, 0.4]),
    },
    constraints={0: 0.3},                   # safety (index 0) >= 0.3
    reference_point=np.array([0.5, 0.5, 0.5]),  # Layer 2 aspirations
)
result = run_moadt_protocol(problem)
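To give a feel for what the engine filters on, the following standalone sketch applies one simple reading of robust dominance to the same numbers: action x robustly dominates y if x's expected score vector Pareto-dominates y's under every distribution in the credal set. This is an illustration under that assumption, not the library's implementation; the helper names (`expected_scores`, `pareto_dominates`, `robustly_dominates`) and the extra action "d" are hypothetical, and the definition in `moadt/_engine.py` may differ.

```python
import numpy as np

credal_probs = [np.array([0.5, 0.5]), np.array([0.3, 0.7])]

# Rows are states (s1, s2); columns are objectives (safety, helpfulness, honesty).
outcomes = {
    "a": np.array([[0.9, 0.6, 0.7], [0.8, 0.5, 0.6]]),
    "b": np.array([[0.4, 0.9, 0.8], [0.3, 0.8, 0.9]]),
    "c": np.array([[0.7, 0.7, 0.5], [0.6, 0.6, 0.4]]),
    # Hypothetical extra action (not in the example above): worse than "a" everywhere.
    "d": np.array([[0.8, 0.5, 0.6], [0.7, 0.4, 0.5]]),
}


def expected_scores(action, probs):
    """Expected objective vector of an action under one distribution over states."""
    return probs @ outcomes[action]


def pareto_dominates(u, v):
    """u is at least as good as v on every objective and strictly better on one."""
    return bool(np.all(u >= v) and np.any(u > v))


def robustly_dominates(x, y):
    """Assumed reading: x Pareto-dominates y under every credal distribution."""
    return all(
        pareto_dominates(expected_scores(x, p), expected_scores(y, p))
        for p in credal_probs
    )


# Keep every action that no other action robustly dominates.
admissible = [
    x for x in outcomes
    if not any(robustly_dominates(y, x) for y in outcomes if y != x)
]
print(admissible)  # ['a', 'b', 'c']
```

Only "d" is eliminated. Actions "a", "b", and "c" each win on a different objective, so none dominates another and all three remain as candidates for deference.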

Key files

moadt/_engine.py — Core engine: outcome sets, robust dominance, four-layer protocol
moadt/__init__.py — Public API (14 names)
examples/ — 9 executable scripts matching the worked examples
tests/test_engine.py — 25 regression tests

Run tests:

python3 -m pytest tests/ -q

Repository Structure

├── paper/                    # All paper source files
│   ├── the-scalarization-trap.md   # Motivation (~25 pages)
│   ├── MOADT-core.md         # Compact argument (~8 pages)
│   ├── R-MOADT.md            # Full technical report
│   ├── MOADT-worked-example-{1..9}.md
│   ├── build-pdf.sh          # PDF build script
│   ├── MOADT-core.pdf        # Built PDF
│   └── MOADT-complete.pdf    # Built PDF (report + appendices)
│
├── moadt/                    # Python library
│   ├── _engine.py
│   └── __init__.py
│
├── examples/                 # Executable example scripts
│   ├── paper1_resource_allocation.py
│   ├── ...
│   └── classic_stpetersburg.py
│
├── tests/                    # Regression tests
│   └── test_engine.py
│
└── pyproject.toml            # Package configuration

The Core Idea in Brief

Standard decision theory says: rational agents maximize a scalar utility function.

MOADT says: rational agents eliminate the provably bad and defer on the rest.

The first produces agents that must resist correction (the current utility function ranks itself as optimal), must trade safety for performance at some rate (completeness demands commensurability), and must pretend all human values fit on a single number line.

The second produces agents that have no reason to resist correction (Theorem 2), maintain hard safety floors that cannot be traded away (Layer 1 constraints), and ask humans to resolve the tradeoffs that humans should resolve (deference under incomparability).

The price is giving up decisiveness in cases of genuine value conflict. The payoff is corrigibility by construction.
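A toy comparison can make the contrast concrete. The sketch below uses hypothetical actions, scores, and weights (none come from the paper or the library). It shows a weighted sum selecting an action below a safety floor once helpfulness is weighted heavily enough, while a floor-then-defer rule removes that action and returns the remaining options undecided.

```python
import numpy as np

# Hypothetical expected scores per action: (safety, helpfulness).
scores = {
    "cautious": np.array([0.9, 0.4]),
    "balanced": np.array([0.6, 0.7]),
    "reckless": np.array([0.2, 1.0]),
}
SAFETY_FLOOR = 0.3  # hard constraint on objective 0, in the spirit of Layer 1


def scalar_choice(weights):
    """Scalar utility maximization: pick the action with the best weighted sum."""
    return max(scores, key=lambda action: float(weights @ scores[action]))


def floor_then_defer():
    """Drop actions below the safety floor; return the rest without ranking them."""
    return [action for action, v in scores.items() if v[0] >= SAFETY_FLOOR]


print(scalar_choice(np.array([0.7, 0.3])))  # cautious
print(scalar_choice(np.array([0.2, 0.8])))  # reckless (safety 0.2 is below the floor)
print(floor_then_defer())                   # ['cautious', 'balanced']
```

The scalar agent's choice depends entirely on the exchange rate encoded in the weights. The floor-then-defer rule never returns "reckless", and it leaves the choice between "cautious" and "balanced" to a human.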
