Changes from all commits
45 commits
bde5c99
updated submodules
nathan-miller23 Sep 16, 2020
a1c70b2
Fixed overcooked commit pointer based on issue #14
micahcarroll Nov 4, 2020
56073fb
Fixed test issue
micahcarroll Feb 1, 2022
6dce6d5
package versions corrections
alexdigimaker Jun 1, 2022
75f17b1
package versions corrections
alexdigimaker Jun 1, 2022
2507b93
Merge pull request #23 from HumanCompatibleAI/alex_neurips2019
micahcarroll Jun 2, 2022
4367121
Updated overcooked pointer
micahcarroll Jun 10, 2022
a413fb5
Fixed testing issue
micahcarroll Jun 11, 2022
343904f
changes to the install script and readme
alexlichtenstein Aug 4, 2022
b2336ff
Added PR template
micahcarroll Aug 4, 2022
a703929
fix to install file git lfs/brew
alexlichtenstein Aug 8, 2022
67f5dd6
add additional documentation to the load_trainer method
alexlichtenstein Aug 10, 2022
b1bfae9
Merge branch 'master' into mesut_changes
micahcarroll Aug 10, 2022
8dd5b64
Merge pull request #27 from alexlichtenstein/mesut_changes
alexlichtenstein Aug 10, 2022
a4a5cd1
adding trained model and test on resume functionality
alexlichtenstein Aug 23, 2022
e8ecd69
training script for 5 classic layouts
alexlichtenstein Aug 23, 2022
a4e759a
update on readme
alexlichtenstein Aug 23, 2022
6d145c3
plotting, shifting function to utils, changes to README
alexlichtenstein Sep 2, 2022
f8734cb
readme, utils
alexlichtenstein Sep 2, 2022
f2e887f
readme + plotting
alexlichtenstein Sep 7, 2022
40bafd6
fix for test case
alexlichtenstein Sep 12, 2022
a9386f7
fix for test case
alexlichtenstein Sep 12, 2022
e490880
get debug info
alexlichtenstein Sep 12, 2022
81fec42
disable logging
alexlichtenstein Sep 12, 2022
4f1f354
disable logging
alexlichtenstein Sep 12, 2022
aa834b6
disable logging
alexlichtenstein Sep 12, 2022
41569c8
disable logging
alexlichtenstein Sep 15, 2022
679ea3d
logging check
alexlichtenstein Sep 15, 2022
43e7fa1
logging check
alexlichtenstein Sep 15, 2022
11286e3
logging check
alexlichtenstein Sep 15, 2022
2d29915
logging check
alexlichtenstein Sep 15, 2022
cca9b1c
logging check
alexlichtenstein Sep 15, 2022
71d2622
logging check
alexlichtenstein Sep 15, 2022
1ea9cd6
logging check
alexlichtenstein Sep 15, 2022
e141052
logging check
alexlichtenstein Sep 15, 2022
cda4390
fix logging for unit test
alexlichtenstein Sep 15, 2022
6715f7b
Merge pull request #29 from alexlichtenstein/master
alexlichtenstein Sep 16, 2022
d80da42
Setup improvement
jyan1999 Oct 5, 2022
7402685
Merge pull request #32 from jyan1999/setupOp
jyan1999 Oct 12, 2022
938bcee
Update README.md
jyan1999 Oct 12, 2022
32b1b0e
update ray
jyan1999 Oct 20, 2022
8e01e31
Update Ray
jyan1999 Oct 25, 2022
10a2109
Merge pull request #34 from jyan1999/updateRay
micahcarroll Oct 25, 2022
20c9558
Merge branch 'master' into neurips2019
jyan1999 Nov 27, 2022
55c37ae
Update README.md
jyan1999 Nov 27, 2022
9 changes: 3 additions & 6 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -1,7 +1,4 @@
*.ipynb linguist-vendored

# Track all CSVs with LFS
**/*.csv filter=lfs diff=lfs merge=lfs -text

# Exclude CSVs from this particular directory from LFS
human_aware_rl/static/human_data/dummy/*.csv filter= diff= merge= text
human_aware_rl/data/human/anonymized/trials_hh.csv filter=lfs diff=lfs merge=lfs -text
*.csv filter=lfs diff=lfs merge=lfs -text
human_aware_rl/data/human/anonymized/filtered_humanai_trials.pkl filter=lfs diff=lfs merge=lfs -text
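If it is unclear which paths the new attribute rules route through LFS, `git check-attr` can be used to check a pattern file without touching the real repo. A minimal sketch, run in a scratch repository and using only two of the patterns above:

```bash
# Scratch repo: verify which paths the LFS attribute patterns match.
attr_tmp=$(mktemp -d)
cd "$attr_tmp"
git init -q .
cat > .gitattributes <<'EOF'
*.csv filter=lfs diff=lfs merge=lfs -text
human_aware_rl/data/human/anonymized/filtered_humanai_trials.pkl filter=lfs diff=lfs merge=lfs -text
EOF
# check-attr reports the resolved "filter" attribute for each path
csv_filter=$(git check-attr filter -- human_aware_rl/data/human/anonymized/trials_hh.csv | awk '{print $NF}')
pkl_filter=$(git check-attr filter -- human_aware_rl/data/human/anonymized/filtered_humanai_trials.pkl | awk '{print $NF}')
txt_filter=$(git check-attr filter -- README.md | awk '{print $NF}')
echo "csv=$csv_filter pkl=$pkl_filter readme=$txt_filter"  # prints csv=lfs pkl=lfs readme=unspecified
```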
38 changes: 0 additions & 38 deletions .github/workflows/python-app.yml

This file was deleted.

19 changes: 7 additions & 12 deletions .gitignore
@@ -106,6 +106,13 @@ venv.bak/
# mypy
.mypy_cache/

# VSCode
**/.vscode/

# CHAI specific
**/data_dir.py
**/slack.json

# Other
.DS_Store
*.key
@@ -129,15 +136,3 @@ data/ppo_exp/

# Other files
transfer_agent.sh

# sacred config files
**/slack.json

# VSCode metadata
**/.vscode

# Data directories
**/data_dir.py

# PyCharm
.idea/
11 changes: 10 additions & 1 deletion .gitmodules
@@ -1,4 +1,13 @@
[submodule "baselines"]
path = baselines
url = https://github.com/micahcarroll/baselines.git
[submodule "overcooked_ai"]
path = overcooked_ai
url = https://github.com/HumanCompatibleAI/overcooked_ai.git
branch = master
branch = overcooked_ai_improvements
[submodule "stable-baselines"]
path = stable-baselines
url = https://github.com/micahcarroll/stable-baselines.git
[submodule "tfjs-converter"]
path = tfjs-converter
url = https://github.com/tensorflow/tfjs-converter.git
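The key change above is re-pointing the `overcooked_ai` submodule from `master` to the `overcooked_ai_improvements` branch. A sketch (not from the PR) of how such a pin can be inspected and updated with plain `git config -f`, using a throwaway copy of the file:

```bash
# Write a stand-in .gitmodules, then re-point the submodule branch.
mod_tmp=$(mktemp -d)
cat > "$mod_tmp/.gitmodules" <<'EOF'
[submodule "overcooked_ai"]
	path = overcooked_ai
	url = https://github.com/HumanCompatibleAI/overcooked_ai.git
	branch = master
EOF
# Update the branch key the same way this PR does:
git config -f "$mod_tmp/.gitmodules" submodule.overcooked_ai.branch overcooked_ai_improvements
git config -f "$mod_tmp/.gitmodules" submodule.overcooked_ai.branch  # prints overcooked_ai_improvements
```

After editing the real `.gitmodules`, `git submodule sync` and `git submodule update --remote` apply the new pin.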
199 changes: 62 additions & 137 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,18 @@
# Human-Aware Reinforcement Learning

This code is based on the work in [On the Utility of Learning about Humans for Human-AI Coordination](https://arxiv.org/abs/1910.05789).
## :warning: DEPRECATION WARNING

# Contents
This repo is being deprecated and should no longer be used independently. It is now a module under the [overcooked_ai](https://github.com/HumanCompatibleAI/overcooked_ai/tree/master) project, as we are consolidating several repos into one for convenience and better maintainability.

This repo should now **only** be used to reproduce the results in the 2019 paper [On the Utility of Learning about Humans for Human-AI Coordination](https://arxiv.org/abs/1910.05789).

*Note that this repository uses a specific older commit of the [overcooked_ai repository](https://github.com/HumanCompatibleAI/overcooked_ai)*, and should not be expected to work with the current version of that repository.

To play the game with trained agents, you can use [Overcooked-Demo](https://github.com/HumanCompatibleAI/overcooked-demo).

For more information about the Overcooked-AI environment, check out [this](https://github.com/HumanCompatibleAI/overcooked_ai) repo.

* [Installation](#installation)
* [Testing](#testing)
* [Repo Structure Overview](#repo-structure-overview)
* [Usage](#usage)
* [Troubleshooting](#troubleshooting)
* [Playing With Agents](#playing-with-agents)
* [Reproducing Results](#reproducing-results)
* [Human Data](./human_aware_rl/static/human_data/README.md)

# Installation
## Installation

When cloning the repository, make sure you also clone the submodules (this implementation is pinned to specific commits of the submodules, and will generally not work with more recent ones):
```
@@ -29,182 +24,112 @@ If you want to clone a specific branch with its submodules, use:
$ git clone --single-branch --branch BRANCH_NAME --recursive https://github.com/HumanCompatibleAI/human_aware_rl.git
```
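After a recursive clone, `git submodule status` prints one line per submodule: a commit hash, the submodule path, and a ref description (a leading `-` means the submodule was never initialized). A hedged parsing sketch — the hash and status line below are illustrative, not from the actual repo:

```bash
# Extract the pinned commit and name from a (fabricated) status line.
status_line=" 4367121000000000000000000000000000000000 overcooked_ai (remotes/origin/HEAD)"
sha=$(echo "$status_line" | awk '{print $1}' | tr -d '+-')   # strip the +/- state prefix if present
name=$(echo "$status_line" | awk '{print $2}')
sha_short=$(printf '%s' "$sha" | cut -c1-7)
echo "$name @ $sha_short"  # prints overcooked_ai @ 4367121
```

This is a quick way to confirm the submodules are checked out at the commits the repo expects rather than at a branch tip.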


## CUDA 10.0 Installation on Ubuntu 18.04
For Ubuntu 18.04, follow the direction [here](https://www.pugetsystems.com/labs/hpc/How-To-Install-CUDA-10-together-with-9-2-on-Ubuntu-18-04-with-support-for-NVIDIA-20XX-Turing-GPUs-1236/)

The only difference being the very last step.

Instead of running

```bash
$ sudo apt-get install cuda
It is useful to set up a conda environment with Python 3.7:
```

Please run
```bash
$ sudo apt-get install cuda-libraries-10-0
$ sudo apt-get install cuda-10-0
$ conda create -n harl python=3.7
$ conda activate harl
```

## Conda Environment Setup

Create a new conda environment and run the install script as before

[Optional Conda Installation for 18.04](https://www.digitalocean.com/community/tutorials/how-to-install-the-anaconda-python-distribution-on-ubuntu-18-04)

```bash
$ conda create -n harl_rllib python=3.7
$ conda activate harl_rllib
(harl_rllib) $ ./install.sh
To complete the installation, run:
```
$ cd human_aware_rl
human_aware_rl $ ./install.sh
```

Finally, install the latest stable version of tensorflow compatible with rllib
```bash
(harl_rllib) $ pip install tensorflow==2.0.2
Then install tensorflow and mpi4py (the GPU **or** non-GPU version depending on your setup):
```
Or, if working with gpus, install a version of tensorflow 2.*.* and cuDNN that is compatible with the available Cuda drivers. The following example works for Cuda 10.0.0. You can verify what version of Cuda is installed by running `nvcc --version`. For a full list of driver compatibility, refer [here](https://www.tensorflow.org/install/source#gpu)
```bash
(harl_rllib) $ pip install tensorflow-gpu==2.0.0
(harl_rllib) $ conda install -c anaconda cudnn=7.6.0
$ pip install tensorflow==1.13.1
$ conda install mpi4py
```
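Since only one of the two tensorflow packages should be installed, a small sketch of picking between them is shown below. The version pin matches the README; the detection heuristic (`nvidia-smi` on `PATH`) is an assumption, not something the repo itself does:

```bash
# Choose the GPU or non-GPU tensorflow package based on whether an
# NVIDIA driver tool is visible; illustrative only.
if command -v nvidia-smi > /dev/null 2>&1; then
    TF_PKG="tensorflow-gpu==1.13.1"
else
    TF_PKG="tensorflow==1.13.1"
fi
echo "pip install $TF_PKG"
```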

Your virtual environment should now be configured to run the rllib training code. Verify it by running the following command

```bash
(harl_rllib) $ python -c "from ray import rllib"
```
$ pip install tensorflow-gpu==1.13.1
$ conda install mpi4py
```

Note: if you ever get an import error, first check that you have activated the conda env.
Note that using tensorflow-gpu will prevent the DRL tests from passing, due to the intrinsic randomness introduced by GPU computations. We recommend first installing tensorflow (non-GPU), running the tests, and then installing tensorflow-gpu.

# Testing
## Verify Installation

If set-up was successful, all unit tests and local reproducibility tests should pass. They can be run as follows
To verify your installation, you can try running the following command from the inner `human_aware_rl` folder:

You can run all the tests with
```bash
(harl_rllib) $ ./run_tests.sh
```

## PPO Tests
Highest level integration tests that combine self play, bc training, and ppo_bc training
```bash
(harl_rllib) $ cd human_aware_rl/ppo
(harl_rllib) human_aware_rl/ppo $ python ppo_rllib_test.py
python run_tests.py
```
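The mechanism behind a `run_tests` entry point is typically plain unittest discovery. A minimal stand-in — the toy test file is fabricated purely to show the pattern, and `python3` is assumed on `PATH`:

```bash
# Create a throwaway test file and let unittest discover and run it.
ut_tmp=$(mktemp -d)
cat > "$ut_tmp/test_smoke.py" <<'EOF'
import unittest

class SmokeTest(unittest.TestCase):
    def test_truth(self):
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()
EOF
python3 -m unittest discover -s "$ut_tmp" -p "test_*.py" 2>&1 | tail -n 1  # prints OK
```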

## BC Tests
All tests involving creation, training, and saving of bc models. No dependency on rllib
```bash
(harl_rllib) $ cd imitation
(harl_rllib) imitation $ python behavior_cloning_tf2_test.py
```
Note that most of the DRL tests rely on having the exact randomness settings that were used to generate the tests (and thus will not pass on a GPU-enabled device).

## Rllib Tests
Tests rllib environments and models, as well as various utility functions. Does not actually test rllib training
```bash
(harl_rllib) $ cd rllib
(harl_rllib) rllib $ python tests.py
```

You should see all tests passing.
On OSX, you may run into an error saying that Python must be installed as a framework. You can fix it by [telling Matplotlib to use a different backend](https://markhneedham.com/blog/2018/05/04/python-runtime-error-osx-matplotlib-not-installed-as-framework-mac/).

Note: the tests are broken up into separate files because they rely on different tensorflow execution states (i.e. the bc tests run tf in eager mode, while rllib requires tensorflow to be running symbolically). Going forward, it would probably be best to standardize the tensorflow execution state, or re-write the code such that it is robust to execution state.
## Repo Structure Overview

# Repo Structure Overview

`ppo/`:
- `ppo_rllib.py`: Primary module where code for training a PPO agent resides. This includes an rllib compatible wrapper on `OvercookedEnv`, utilities for converting rllib `Policy` classes to Overcooked `Agent`s, as well as utility functions and callbacks
- `ppo_rllib_client.py`: Driver code for configuring and launching the training of an agent. More details about usage below
- `ppo_rllib_from_params_client.py`: train one agent with PPO in Overcooked with variable-MDPs
- `ppo_rllib_test.py` Reproducibility tests for local sanity checks
`ppo/` (both using baselines):
- `ppo.py`: train one agent with PPO in Overcooked with other agent fixed

`rllib/`:
- `rllib.py`: rllib agent and training utils that utilize Overcooked APIs
- `utils.py`: utils for the above
- `tests.py`: preliminary tests for the above
`pbt/` (all using baselines):
- `pbt.py`: train agents with population based training in overcooked

`imitation/`:
- `behavior_cloning_tf2.py`: Module for training, saving, and loading a BC model
- `behavior_cloning_tf2_test.py`: Contains basic reproducibility tests as well as unit tests for the various components of the bc module.
- `behaviour_cloning.py`: simple script to perform BC on trajectory data using baselines

`human/`:
- `process_data.py` script to process human data in specific formats to be used by DRL algorithms
- `data_processing_utils.py` utils for the above

`utils.py`: utils for the repo
`experiments/`: folder with experiment scripts used to generate experimental results in the paper

# Usage

Before proceeding, it is important to note that there are two primary groups of hyperparameter defaults, `local` and `production`. Which group is used is controlled by the `RUN_ENV` environment variable, which defaults to `production`. To use the local hyperparameters, run
```bash
$ export RUN_ENV=local
```
`baselines_utils.py`: utility functions used for `pbt.py`
`overcooked_interactive.py`: script to play Overcooked in terminal against trained agents
`run_tests.py`: script to run all tests

Training of agents is done through the `ppo_rllib_client.py` script. It has the following usage:
# Playing with trained agents

```bash
ppo_rllib_client.py [with [<param_0>=<argument_0>] ... ]
```
## In terminal-graphics

For example, the following snippet trains a self play ppo agent on seed 1, 2, and 3, with learning rate `1e-3`, on the `"cramped_room"` layout for `5` iterations without using any gpus. The rest of the parameters are left to their defaults
```
(harl_rllib) ppo $ python ppo_rllib_client.py with seeds="[1, 2, 3]" lr=1e-3 layout_name=cramped_room num_training_iters=5 num_gpus=0 experiment_name="my_agent"
```
To play with trained agents in the terminal, use `overcooked_interactive.py`. A sample command is:

For a complete list of all hyperparameters as well as their local and production defaults, refer to the `my_config` section of `ppo_rllib_client.py`
`python overcooked_interactive.py -t bc -r simple_bc_test_seed4`

Note that the terminal window must stay focused while playing; clicking away will interrupt keyboard input.
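The sample command passes the agent type with `-t` and the run name with `-r`. A sketch of that flag-parsing convention, redone in shell with `getopts`; the flag meanings mirror the command above, but the shell implementation itself is illustrative, not the script's actual code:

```bash
# Parse -t (agent type) and -r (run name), as in the sample command.
parse_play_args() {
    OPTIND=1
    agent_type="" run_name=""
    while getopts "t:r:" opt "$@"; do
        case "$opt" in
            t) agent_type="$OPTARG" ;;
            r) run_name="$OPTARG" ;;
        esac
    done
}
parse_play_args -t bc -r simple_bc_test_seed4
echo "type=$agent_type run=$run_name"  # prints type=bc run=simple_bc_test_seed4
```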

Training results and checkpoints are stored in a directory called `~/ray_results/my_agent_<seed>_<timestamp>`. You can visualize the results using tensorboard
```bash
(harl_rllib) $ cd ~/ray_results
(harl_rllib) ray_results $ tensorboard --logdir .
```
## With JavaScript graphics

This requires converting the trained models to Tensorflow JS format, and visualizing with the [overcooked-demo](https://github.com/HumanCompatibleAI/overcooked-demo) code. First install overcooked-demo and ensure it works properly.

# Troubleshooting
### Converting models to JS format

## Tensorflow
Many tensorflow errors are caused by the tensorflow state of execution. For example, if you get an error similar to
Unfortunately, converting models requires creating a new conda environment to avoid module conflicts.

Create and activate a new conda environment:
```
ValueError: Could not find matching function to call loaded from the SavedModel. Got:
Positional arguments (1 total):
* Tensor("inputs:0", shape=(1, 62), dtype=float64)
Keyword arguments: {}
$ conda create -n model_conversion python=3.7
$ conda activate model_conversion
```

or

Run the base `setup.py` (from the inner `human_aware_rl`) and then install `tensorflowjs`:
```
NotImplementedError: Cannot convert a symbolic Tensor (model_1/logits/BiasAdd:0) to a numpy array.
human_aware_rl $ cd human_aware_rl
human_aware_rl $ python setup.py develop
human_aware_rl $ pip install tensorflowjs==0.8.5
```

or

To convert models in the right format, use the `convert_model_to_web.sh` script. Example usage:
```
TypeError: Variable is unhashable. Instead, use tensor.ref() as the key.
human_aware_rl $ ./convert_model_to_web.sh ppo_runs ppo_sp_simple 193
```
where 193 is the seed number of the DRL run.
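To convert several seeds of the same run, the script can be wrapped in a loop over its three arguments. A hypothetical batch wrapper — the seeds other than 193 are made up for illustration, and the commands are only echoed here rather than executed:

```bash
# Dry-run: build one conversion command per seed.
RUN_TYPE=ppo_runs
RUN_NAME=ppo_sp_simple
cmds=""
for SEED in 192 193 194; do
    cmds="$cmds./convert_model_to_web.sh $RUN_TYPE $RUN_NAME $SEED
"
done
printf '%s' "$cmds"
```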

it is likely because the code you are running relies on tensorflow executing symbolically (or eagerly) while it is actually executing eagerly (or symbolically).
### Transferring agents to Overcooked-Demo

This can usually be fixed by changing the order of imports: `import tensorflow as tf` sets eager execution to true, while any `rllib` import disables eager execution. Once the execution state has been set, it cannot be changed. For example, if you require eager execution, make sure `import tensorflow as tf` comes BEFORE `from ray import rllib`, and vice versa.
The converted models can be found in `human_aware_rl/data/web_models/` and should be transferred to the `static/assets` folder with the same naming as the standard models.
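The transfer itself is a plain directory copy. An illustrative version with throwaway directories standing in for `human_aware_rl/data/web_models/` and Overcooked-Demo's `static/assets`; the model file names are fabricated:

```bash
# Stub out source and destination, then copy a converted model across.
src=$(mktemp -d)/web_models
dst=$(mktemp -d)/static/assets
mkdir -p "$src/ppo_sp_simple_seed193" "$dst"
echo '{}' > "$src/ppo_sp_simple_seed193/model.json"
cp -r "$src/ppo_sp_simple_seed193" "$dst/"
ls "$dst"  # prints ppo_sp_simple_seed193
```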

### Playing with newly trained agents

## 'human_aware_rl.data_dir' not found
If you encounter
```
ModuleNotFoundError: No module named 'human_aware_rl.data_dir'
```

please run

```
./run_tests.sh
```
To play with newly trained agents, just follow the instructions in the [Overcooked-Demo](https://github.com/HumanCompatibleAI/overcooked-demo) README.

to initialize those variables.
# Reproducing results

# Reproducing Results
All DRL results can be reproduced by running the `.sh` scripts under `human_aware_rl/experiments/`.

The specific results in that paper were obtained using code that is no longer in the master branch. If you are interested in reproducing them, please check out [this branch](https://github.com/HumanCompatibleAI/human_aware_rl/tree/neurips2019) and follow the install instructions there.
All non-DRL results can be reproduced by running cells in `NeurIPS Experiments and Visualizations.ipynb`.
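Running every experiment script in sequence is just a loop over `human_aware_rl/experiments/*.sh`. A sketch of that loop — the directory is stubbed with dummy scripts here so the mechanism can be shown end to end; the real scripts take much longer:

```bash
# Stub an experiments/ directory and run each script in it.
exp=$(mktemp -d)
printf '#!/bin/sh\necho ran %s\n' ppo_sp > "$exp/ppo_sp.sh"
printf '#!/bin/sh\necho ran %s\n' pbt   > "$exp/pbt.sh"
chmod +x "$exp"/*.sh
out=""
for script in "$exp"/*.sh; do
    out="$out$("$script")
"
done
printf '%s' "$out"
```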
1 change: 1 addition & 0 deletions baselines
Submodule baselines added at 472994
29 changes: 29 additions & 0 deletions convert_model_to_web.sh
@@ -0,0 +1,29 @@
#!/bin/sh
RUN_TYPE="$1"
RUN_NAME="$2"
SEED="$3"
if [ "$#" -eq 3 ]
then
if [ "$1" = "ppo_runs" ]
then
tensorflowjs_converter --input_format=tf_saved_model --output_node_names='ppo_agent/ppo2_model/action_probs' --saved_model_tags=serve human_aware_rl/data/$RUN_TYPE/$RUN_NAME/seed$SEED/ppo_agent \
human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED\_temp
elif [ "$1" = "pbt_runs" ]
then
tensorflowjs_converter --input_format=tf_saved_model --output_node_names='agent0/ppo2_model/action_probs' --saved_model_tags=serve human_aware_rl/data/$RUN_TYPE/$RUN_NAME/seed_$SEED/agent0/best \
human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED\_temp
else
echo "Should have 3 arguments: RUN_TYPE (ppo_runs/pbt_runs), RUN_NAME, and SEED"
exit 1
fi

cd tfjs-converter
yarn ts-node tools/pb2json_converter.ts ../human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED\_temp \
../human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED


rm -rf ../human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED\_temp
else
echo "Should have 3 arguments: RUN_TYPE (ppo_runs/pbt_runs), RUN_NAME, and SEED"
exit 1
fi
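The path scheme the script above relies on can be factored into a function and checked without TensorFlow installed. This mirrors the `ppo_runs`/`pbt_runs` branches of the script (pbt runs use `seed_<SEED>/agent0/best` where ppo runs use `seed<SEED>/ppo_agent`); the function itself is a sketch, not part of the repo:

```bash
# Reconstruct the saved-model input path for a run type, run name, and seed.
model_input_path() {
    run_type=$1; run_name=$2; seed=$3
    case "$run_type" in
        ppo_runs) echo "human_aware_rl/data/$run_type/$run_name/seed$seed/ppo_agent" ;;
        pbt_runs) echo "human_aware_rl/data/$run_type/$run_name/seed_$seed/agent0/best" ;;
        *) return 1 ;;
    esac
}
model_input_path ppo_runs ppo_sp_simple 193  # prints human_aware_rl/data/ppo_runs/ppo_sp_simple/seed193/ppo_agent
```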