Research experiments on heterogeneous LLM-agent panels using DisSysLab as the experimental platform.
Pre-pilot. Phase 1 (single-agent calibration) starts July 2026.
- When does a panel of heterogeneous agents (different temperatures, possibly different models) with one or more moderators outperform a single agent on a given problem class?
- How does inter-agent error correlation modulate the benefit of panel deliberation?
- What moderator strategies — concatenation, critique-and-revise, rationale-exchange — help most, and under what conditions?
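
As a reference point for the first two questions, the classical Condorcet setting gives a closed-form baseline. The sketch below assumes a binary-answer problem, an odd panel size, and fully independent agents that share one accuracy `p`. None of these assumptions is expected to hold for real LLM panels, which is why error correlation is one of the questions above. The function name `majority_accuracy` is illustrative and is not part of DisSysLab.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes each voter is correct with the same probability p on a
    binary question, and that n is odd so no ties can occur.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    # Compare a single agent (n=1) with panels of 3 and 5 at a few accuracies.
    for p in (0.4, 0.5, 0.6, 0.8):
        row = ", ".join(f"n={n}: {majority_accuracy(p, n):.3f}" for n in (1, 3, 5))
        print(f"p={p}: {row}")
```

Under these idealized assumptions a panel helps only when `p` exceeds 0.5 and hurts when it is below 0.5. Positive error correlation between agents would be expected to shrink the gain.
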
| Phase | What | Status |
|---|---|---|
| 1 | Calibrate p(T) for 4-5 (model, temperature) configurations on a fixed problem class (~200 problems each). | not started |
| 2 | Measure inter-agent error correlation for the configurations from Phase 1. | not started |
| 3 | Pilot panel study: 3 panel configurations × 2 moderator strategies × 100 problems. | not started |
| 4 | Working notes / draft paper with preliminary findings. | not started |
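
The measurements in Phases 1 and 2 could be computed along the following lines. This is a minimal sketch, not the project's analysis code. It assumes that p(T) denotes the fraction of problems a configuration answers correctly at temperature T, that each answer is graded as simply right or wrong, and that error correlation means the Pearson (phi) correlation between two agents' 0/1 error indicators on the same problems. The function names are hypothetical.

```python
from math import sqrt


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p_hat = correct / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def error_correlation(errors_a: list[int], errors_b: list[int]) -> float:
    """Phi coefficient between two 0/1 error vectors over the same problems.

    Returns NaN when either agent is always right or always wrong,
    because the correlation is undefined in that case.
    """
    n = len(errors_a)
    if n == 0 or n != len(errors_b):
        raise ValueError("error vectors must be non-empty and equal length")
    mean_a = sum(errors_a) / n
    mean_b = sum(errors_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(errors_a, errors_b)) / n
    var_a = mean_a * (1 - mean_a)  # variance of a 0/1 variable
    var_b = mean_b * (1 - mean_b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    # Illustrative numbers only: 140 correct out of 200 problems.
    low, high = wilson_interval(140, 200)
    print(f"accuracy 0.700, 95% interval ({low:.3f}, {high:.3f})")
    # Two toy agents that err on exactly the same problems.
    print(error_correlation([1, 0, 1, 0], [1, 0, 1, 0]))
```

With about 200 problems per configuration, the interval around a mid-range accuracy is roughly ±0.06 wide, which bounds how small a difference between configurations Phase 1 could resolve.
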
DisSysLab is a framework for building offices of agents in plain English. This repo depends on `dissyslab` as a regular PyPI package.
- Du et al. (2023) "Improving Factuality and Reasoning in Language Models through Multiagent Debate."
- Liang et al. (2024) "Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate."
- Condorcet (1785) — foundational voting theorem for jury aggregation.
- Ladha (1993) — extensions of Condorcet to correlated voters.
MIT