MAPPO and MAA2C comparison and cross-algorithm evaluation

This project was developed during the summer of 2025 for the Autonomous and Adaptive Systems course at the University of Bologna. We present a systematic comparison between two state-of-the-art on-policy multi-agent reinforcement learning algorithms, MAPPO and MAA2C, within the Overcooked-AI benchmark. Both algorithms are implemented from scratch under a shared centralized-training, decentralized-execution (CTDE) framework, which keeps the evaluation fair and transparent across three axes: absolute performance, generalization, and robustness, the last measured through what we call cross-algorithm coordination.
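In their standard formulations, the two algorithms differ mainly in the policy objective: MAA2C uses the plain advantage-weighted policy gradient, while MAPPO uses the PPO clipped surrogate. The following is a minimal plain-Python sketch of that difference, not the code in this repository; the function names and the `clip_eps=0.2` default are illustrative assumptions.

```python
import math


def a2c_policy_loss(log_probs, advantages):
    """Advantage actor-critic policy loss: -mean(log pi(a|s) * A)."""
    weighted = [lp * adv for lp, adv in zip(log_probs, advantages)]
    return -sum(weighted) / len(weighted)


def ppo_clip_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)).

    r is the probability ratio between the current policy and the policy
    that collected the data. clip_eps is an assumed, commonly used value.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes the incentive to move the ratio
        # outside the clip range in the direction that improves the objective.
        total += min(ratio * adv, clipped * adv)
    return -total / len(new_log_probs)
```

Because the clipped objective limits how far each update can move the policy, the same batch can be reused for several update passes, which is presumably what the `--iters` training option controls for MAPPO.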

Gameplay Demo

All the experiments are inside MARL_comparison.ipynb.

Quick Setup

Prerequisites

Python 3.10 (required for overcooked-ai compatibility - versions 3.11+ will not work)

Installing Python 3.10:

Mac users:

# Using Homebrew
brew install python@3.10

# Create virtual environment with Python 3.10
python3.10 -m venv overcooked-env
source overcooked-env/bin/activate

All platforms:

conda create -n overcooked python=3.10 -y
conda activate overcooked
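Once the environment is active, it can help to confirm that the interpreter really is 3.10 before installing anything. A minimal sketch of such a check follows; the helper name `is_supported_python` is hypothetical and not part of this repository.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True only for Python 3.10.x, the version this README requires."""
    return tuple(version_info[:2]) == (3, 10)


# Report the result for the interpreter that is currently running.
current = f"{sys.version_info.major}.{sys.version_info.minor}"
if is_supported_python():
    print(f"Python {current}: OK")
else:
    print(f"Python {current} detected; this project expects Python 3.10")
```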

Clone repository

git clone https://github.com/LorenzoVenturi/Overcooked.git
cd Overcooked
git submodule update --init --recursive

Install Python dependencies

# Core libraries for this project
pip install torch pygame numpy matplotlib seaborn

# Install the overcooked_ai environment
cd overcooked_ai
pip install -e .
cd ..

Verify installation

# Test that everything works
python utils/simple_render.py --list-models

Training Agents

Train new models from scratch using the terminal:

# List available layouts for training
python utils/train_agent.py --list-layouts

# Train MAA2C on single layout
python utils/train_agent.py --algorithm MAA2C --model-name my_cramped_model --layout cramped_room --batches 100

# Train MAPPO on multiple layouts
python utils/train_agent.py --algorithm MAPPO --model-name my_multi_model --layout cramped_room --layout coordination_ring --batches 200

# Train on all layouts (generalization)
python utils/train_agent.py --algorithm MAA2C --model-name my_generalized_model --all-layouts --batches 150

# Advanced training with custom parameters
python utils/train_agent.py --algorithm MAPPO --model-name advanced_model --layout forced_coordination --batches 300 --episodes-per-batch 15 --iters 12

Training Parameters

--algorithm or -a: MAA2C or MAPPO
--model-name or -m: Name for your trained model
--layout or -l: Layout(s) to train on (repeatable)
--all-layouts: Train on all available layouts
--batches or -b: Number of training batches (default: 100)
--episodes-per-batch or -e: Episodes per batch (default: 10)
--iters or -i: MAPPO iterations (default: 10)
--device or -d: cpu or cuda (default: cpu)
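To make the flag semantics concrete, here is a hedged `argparse` sketch that mirrors the options and defaults listed above. It is a reconstruction for illustration only and may differ from the actual `utils/train_agent.py`; in particular, treating `--layout` as an appended list is an assumption based on the flag being described as repeatable.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented training flags."""
    parser = argparse.ArgumentParser(description="Illustrative mirror of the training CLI")
    parser.add_argument("--algorithm", "-a", choices=["MAA2C", "MAPPO"])
    parser.add_argument("--model-name", "-m")
    # Repeatable flag: each occurrence appends one layout to the list.
    parser.add_argument("--layout", "-l", action="append", default=None)
    parser.add_argument("--all-layouts", action="store_true")
    parser.add_argument("--list-layouts", action="store_true")
    parser.add_argument("--batches", "-b", type=int, default=100)
    parser.add_argument("--episodes-per-batch", "-e", type=int, default=10)
    parser.add_argument("--iters", "-i", type=int, default=10)
    parser.add_argument("--device", "-d", choices=["cpu", "cuda"], default="cpu")
    return parser


# Parse the multi-layout MAPPO example from the section above.
args = build_parser().parse_args([
    "--algorithm", "MAPPO", "--model-name", "my_multi_model",
    "--layout", "cramped_room", "--layout", "coordination_ring",
    "--batches", "200",
])
print(args.layout, args.batches, args.episodes_per_batch, args.device)
```

Flags that are not passed fall back to the documented defaults, so the example above trains for 200 batches with 10 episodes per batch on the CPU.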

Testing existing trained models

Test your own trained models or the existing pre-trained ones:

# List available models and layouts
python utils/simple_render.py --list-models

# Test pre-trained models (already available)
python utils/simple_render.py --algorithm MAPPO --model cramped_room_selfplay --layout cramped_room
python utils/simple_render.py --algorithm MAA2C --model asymmetric_advantages_selfplay --layout asymmetric_advantages

# Test your own newly trained model (after training with train_agent.py)
python utils/simple_render.py --algorithm MAA2C --model my_cramped_model --layout cramped_room

Rendering Options

Required Arguments (for rendering)

--algorithm or -a: MAA2C or MAPPO
--model or -m: Model name to load
--layout or -l: Layout to render

Optional Arguments

--episodes or -e: Number of episodes (default: 1)
--frame-rate or -f: Frame rate (default: 10)
--device or -d: cpu or cuda (default: cpu)
--list-models: List all available models and layouts

Examples

List models:

python utils/simple_render.py --list-models

Basic rendering (using pre-trained models):

python utils/simple_render.py -a MAA2C -m cramped_room_selfplay -l cramped_room
python utils/simple_render.py -a MAPPO -m coordination_ring_selfplay -l coordination_ring

Multiple episodes:

python utils/simple_render.py -a MAA2C -m multi_layout_generalized5 -l forced_coordination -e 3 -f 15

Available Models and Layouts

MAA2C Models

cramped_room_selfplay, coordination_ring_selfplay, forced_coordination_selfplay, asymmetric_advantages_selfplay, multi_layout_generalized1-5

MAPPO Models

cramped_room_selfplay, coordination_ring_selfplay, forced_coordination_selfplay, asymmetric_advantages_selfplay

Layouts

cramped_room, coordination_ring, forced_coordination, asymmetric_advantages
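The lists above can be turned into a batch of render commands. The sketch below only builds and prints the command lines, using the flags documented for `utils/simple_render.py`. It assumes that `multi_layout_generalized1-5` expands to five models numbered 1 through 5, and that each self-play model is rendered on the layout it is named after; the `render_commands` helper is hypothetical.

```python
import shlex

LAYOUTS = ["cramped_room", "coordination_ring", "forced_coordination", "asymmetric_advantages"]
ALGORITHMS = ["MAA2C", "MAPPO"]
# Assumed expansion of "multi_layout_generalized1-5" (MAA2C only).
GENERALIZED = [f"multi_layout_generalized{i}" for i in range(1, 6)]


def render_commands(episodes=1):
    """Build one simple_render.py command per (algorithm, model, layout) combination."""
    commands = []
    # Self-play models: render each one on its own layout.
    for algorithm in ALGORITHMS:
        for layout in LAYOUTS:
            commands.append(["python", "utils/simple_render.py", "-a", algorithm,
                             "-m", f"{layout}_selfplay", "-l", layout, "-e", str(episodes)])
    # Generalized MAA2C models: render each one on every layout.
    for model in GENERALIZED:
        for layout in LAYOUTS:
            commands.append(["python", "utils/simple_render.py", "-a", "MAA2C",
                             "-m", model, "-l", layout, "-e", str(episodes)])
    return commands


for command in render_commands():
    print(shlex.join(command))
```

Each printed line can be pasted into a terminal, or the argument lists can be passed to `subprocess.run` to execute them in sequence.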

