
moridinamael/moadt


MOADT: Multi-Objective Admissible Decision Theory

A decision theory for AI alignment that replaces scalar utility maximization with robust Pareto admissibility and structured deference. The core move: drop the completeness axiom, eliminate only provably dominated actions, and defer to human judgment on the rest. Corrigibility emerges as a structural feature rather than an imposed constraint.

Reading the Paper

Start here, in order of depth:

| Document | What it is | Length |
|---|---|---|
| The Scalarization Trap | The motivation: why scalar utility theory is structurally incompatible with alignment | ~25 pages |
| MOADT-core.pdf | The core argument: why dropping completeness dissolves corrigibility | 8 pages |
| MOADT-complete.pdf | Full technical report with proofs, worked-out protocol, and all 9 worked examples | 192 pages |

Source markdown: MOADT-core.md | R-MOADT.md | Worked Examples 1–9

The Scalarization Trap identifies the problem. The core document presents the solution. The full report defends it.

Rebuilding the PDFs

Requires pandoc and texlive-xetex:

bash paper/build-pdf.sh          # full report + appendices

The moadt Library

A pip-installable Python implementation of the MOADT decision engine.

pip install -e .

import numpy as np
from moadt import MOADTProblem, run_moadt_protocol

problem = MOADTProblem(
    actions=["a", "b", "c"],
    states=["s1", "s2"],
    objectives=["safety", "helpfulness", "honesty"],
    credal_probs=[                          # credal set over states
        np.array([0.5, 0.5]),
        np.array([0.3, 0.7]),
    ],
    outcomes={                              # (action, state) → objective scores
        ("a", "s1"): np.array([0.9, 0.6, 0.7]),
        ("a", "s2"): np.array([0.8, 0.5, 0.6]),
        ("b", "s1"): np.array([0.4, 0.9, 0.8]),
        ("b", "s2"): np.array([0.3, 0.8, 0.9]),
        ("c", "s1"): np.array([0.7, 0.7, 0.5]),
        ("c", "s2"): np.array([0.6, 0.6, 0.4]),
    },
    constraints={0: 0.3},                   # safety (index 0) >= 0.3
    reference_point=np.array([0.5, 0.5, 0.5]),  # Layer 2 aspirations
)
result = run_moadt_protocol(problem)
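
To see what the credal set does to these inputs, the expected objective vector of each action can be computed by hand under each distribution. The snippet below is a plain NumPy sketch that is independent of the library: it does not call `run_moadt_protocol`, the helper name `expected_vectors` is made up for illustration, and the engine may organize this computation differently.

```python
import numpy as np

states = ["s1", "s2"]
credal_probs = [np.array([0.5, 0.5]), np.array([0.3, 0.7])]
outcomes = {
    ("a", "s1"): np.array([0.9, 0.6, 0.7]),
    ("a", "s2"): np.array([0.8, 0.5, 0.6]),
    ("b", "s1"): np.array([0.4, 0.9, 0.8]),
    ("b", "s2"): np.array([0.3, 0.8, 0.9]),
    ("c", "s1"): np.array([0.7, 0.7, 0.5]),
    ("c", "s2"): np.array([0.6, 0.6, 0.4]),
}


def expected_vectors(action):
    """Hypothetical helper: array of shape (n_distributions, n_objectives)."""
    # Stack the per-state outcome vectors: shape (n_states, n_objectives).
    per_state = np.stack([outcomes[(action, s)] for s in states])
    # Each distribution p weights the states, giving one expected vector per p.
    return np.stack([p @ per_state for p in credal_probs])


for action in ["a", "b", "c"]:
    print(action, expected_vectors(action).round(3))
```

Under both distributions, action `a` leads on safety while `b` leads on helpfulness and honesty, so neither of the two dominates the other.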

Key files

Run tests:

python3 -m pytest tests/ -q

Repository Structure

├── paper/                    # All paper source files
│   ├── the-scalarization-trap.md   # Motivation (~25 pages)
│   ├── MOADT-core.md         # Compact argument (~8 pages)
│   ├── R-MOADT.md            # Full technical report
│   ├── MOADT-worked-example-{1..9}.md
│   ├── build-pdf.sh          # PDF build script
│   ├── MOADT-core.pdf        # Built PDF
│   └── MOADT-complete.pdf    # Built PDF (report + appendices)
│
├── moadt/                    # Python library
│   ├── _engine.py
│   └── __init__.py
│
├── examples/                 # Executable example scripts
│   ├── paper1_resource_allocation.py
│   ├── ...
│   └── classic_stpetersburg.py
│
├── tests/                    # Regression tests
│   └── test_engine.py
│
└── pyproject.toml            # Package configuration

The Core Idea in Brief

Standard decision theory says: rational agents maximize a scalar utility function.

MOADT says: rational agents eliminate the provably bad and defer on the rest.

The first produces agents that must resist correction (the current utility function ranks itself as optimal), must trade safety for performance at some rate (completeness demands commensurability), and must pretend all human values fit on a single number line.

The second produces agents that have no reason to resist correction (Theorem 2), maintain hard safety floors that cannot be traded away (Layer 1 constraints), and ask humans to resolve the tradeoffs that humans should resolve (deference under incomparability).

The price is giving up decisiveness in cases of genuine value conflict. The payoff is corrigibility by construction.
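
The eliminate-then-defer idea can be sketched in a few lines. This is an illustration under stated assumptions, not the library's engine or the paper's exact definitions. It assumes that one action robustly dominates another when its expected vector is at least as good on every objective under every distribution in the credal set and strictly better somewhere, and it treats Layer 1 constraints as floors that must hold under every distribution. The function names, action names, and numbers are all hypothetical.

```python
import numpy as np

# Hypothetical expected scores: action -> array (n_distributions, n_objectives).
# Objective order: [safety, helpfulness].
expected = {
    "cautious": np.array([[0.9, 0.5], [0.8, 0.5]]),
    "helpful":  np.array([[0.6, 0.9], [0.5, 0.8]]),
    "sloppy":   np.array([[0.5, 0.4], [0.4, 0.4]]),  # worse than "cautious" everywhere
    "reckless": np.array([[0.2, 1.0], [0.1, 1.0]]),  # falls below the safety floor
}
floors = {0: 0.3}  # objective index -> minimum acceptable score (assumed semantics)


def satisfies_floors(vecs, floors):
    """True if every floor holds under every distribution."""
    return all((vecs[:, i] >= t).all() for i, t in floors.items())


def robustly_dominates(x, y):
    """True if x is at least as good as y everywhere and strictly better somewhere."""
    return bool((x >= y).all() and (x > y).any())


def admissible_set(expected, floors):
    """Drop floor violators, then drop actions that some other action robustly dominates."""
    feasible = {a: v for a, v in expected.items() if satisfies_floors(v, floors)}
    return [
        a for a, v in feasible.items()
        if not any(robustly_dominates(w, v) for b, w in feasible.items() if b != a)
    ]


remaining = admissible_set(expected, floors)
if len(remaining) == 1:
    print("act:", remaining[0])
else:
    print("defer to a human among:", remaining)
```

No weighted sum appears anywhere in the sketch: it never decides how much helpfulness is worth a unit of safety, which is exactly the tradeoff left to the human.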
