Changes from all commits
45 commits
bde5c99
updated submodules
nathan-miller23 Sep 16, 2020
a1c70b2
Fixed overcooked commit pointer based on issue #14
micahcarroll Nov 4, 2020
56073fb
Fixed test issue
micahcarroll Feb 1, 2022
6dce6d5
package versions corrections
alexdigimaker Jun 1, 2022
75f17b1
package versions corrections
alexdigimaker Jun 1, 2022
2507b93
Merge pull request #23 from HumanCompatibleAI/alex_neurips2019
micahcarroll Jun 2, 2022
4367121
Updated overcooked pointer
micahcarroll Jun 10, 2022
a413fb5
Fixed testing issue
micahcarroll Jun 11, 2022
343904f
changes to the install script and readme
alexlichtenstein Aug 4, 2022
b2336ff
Added PR template
micahcarroll Aug 4, 2022
a703929
fix to install file git lfs/brew
alexlichtenstein Aug 8, 2022
67f5dd6
add additional documentation to the load_trainer method
alexlichtenstein Aug 10, 2022
b1bfae9
Merge branch 'master' into mesut_changes
micahcarroll Aug 10, 2022
8dd5b64
Merge pull request #27 from alexlichtenstein/mesut_changes
alexlichtenstein Aug 10, 2022
a4a5cd1
adding trained model and test on resume functionality
alexlichtenstein Aug 23, 2022
e8ecd69
training script for 5 classic layouts
alexlichtenstein Aug 23, 2022
a4e759a
update on readme
alexlichtenstein Aug 23, 2022
6d145c3
plotting, shifting function to utils, changes to README
alexlichtenstein Sep 2, 2022
f8734cb
readme, utils
alexlichtenstein Sep 2, 2022
f2e887f
readme + plotting
alexlichtenstein Sep 7, 2022
40bafd6
fix for test case
alexlichtenstein Sep 12, 2022
a9386f7
fix for test case
alexlichtenstein Sep 12, 2022
e490880
get debug info
alexlichtenstein Sep 12, 2022
81fec42
disable logging
alexlichtenstein Sep 12, 2022
4f1f354
disable logging
alexlichtenstein Sep 12, 2022
aa834b6
disable logging
alexlichtenstein Sep 12, 2022
41569c8
disable logging
alexlichtenstein Sep 15, 2022
679ea3d
logging check
alexlichtenstein Sep 15, 2022
43e7fa1
logging check
alexlichtenstein Sep 15, 2022
11286e3
logging check
alexlichtenstein Sep 15, 2022
2d29915
logging check
alexlichtenstein Sep 15, 2022
cca9b1c
logging check
alexlichtenstein Sep 15, 2022
71d2622
logging check
alexlichtenstein Sep 15, 2022
1ea9cd6
logging check
alexlichtenstein Sep 15, 2022
e141052
logging check
alexlichtenstein Sep 15, 2022
cda4390
fix logging for unit test
alexlichtenstein Sep 15, 2022
6715f7b
Merge pull request #29 from alexlichtenstein/master
alexlichtenstein Sep 16, 2022
d80da42
Setup improvement
jyan1999 Oct 5, 2022
7402685
Merge pull request #32 from jyan1999/setupOp
jyan1999 Oct 12, 2022
938bcee
Update README.md
jyan1999 Oct 12, 2022
32b1b0e
update ray
jyan1999 Oct 20, 2022
8e01e31
Update Ray
jyan1999 Oct 25, 2022
10a2109
Merge pull request #34 from jyan1999/updateRay
micahcarroll Oct 25, 2022
20c9558
Merge branch 'master' into neurips2019
jyan1999 Nov 27, 2022
55c37ae
Update README.md
jyan1999 Nov 27, 2022
9 changes: 3 additions & 6 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -1,7 +1,4 @@
*.ipynb linguist-vendored

# Track all CSVs with LFS
**/*.csv filter=lfs diff=lfs merge=lfs -text

# Exclude CSVs from this particular directory from LFS
human_aware_rl/static/human_data/dummy/*.csv filter= diff= merge= text
human_aware_rl/data/human/anonymized/trials_hh.csv filter=lfs diff=lfs merge=lfs -text
*.csv filter=lfs diff=lfs merge=lfs -text
human_aware_rl/data/human/anonymized/filtered_humanai_trials.pkl filter=lfs diff=lfs merge=lfs -text
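If it is unclear which paths the new attribute rules route through LFS, `git check-attr` can be used to check a pattern file without touching the real repo. A minimal sketch, run in a scratch repository and using only two of the patterns above:

```bash
# Scratch repo: verify which paths the LFS attribute patterns match.
attr_tmp=$(mktemp -d)
cd "$attr_tmp"
git init -q .
cat > .gitattributes <<'EOF'
*.csv filter=lfs diff=lfs merge=lfs -text
human_aware_rl/data/human/anonymized/filtered_humanai_trials.pkl filter=lfs diff=lfs merge=lfs -text
EOF
# check-attr reports the resolved "filter" attribute for each path
csv_filter=$(git check-attr filter -- human_aware_rl/data/human/anonymized/trials_hh.csv | awk '{print $NF}')
pkl_filter=$(git check-attr filter -- human_aware_rl/data/human/anonymized/filtered_humanai_trials.pkl | awk '{print $NF}')
txt_filter=$(git check-attr filter -- README.md | awk '{print $NF}')
echo "csv=$csv_filter pkl=$pkl_filter readme=$txt_filter"  # prints csv=lfs pkl=lfs readme=unspecified
```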
38 changes: 0 additions & 38 deletions .github/workflows/python-app.yml

This file was deleted.

19 changes: 7 additions & 12 deletions .gitignore
@@ -106,6 +106,13 @@ venv.bak/
# mypy
.mypy_cache/

# VSCode
**/.vscode/

# CHAI specific
**/data_dir.py
**/slack.json

# Other
.DS_Store
*.key
@@ -129,15 +136,3 @@ data/ppo_exp/

# Other files
transfer_agent.sh

# sacred config files
**/slack.json

# VSCode metadata
**/.vscode

# Data directories
**/data_dir.py

# PyCharm
.idea/
11 changes: 10 additions & 1 deletion .gitmodules
@@ -1,4 +1,13 @@
[submodule "baselines"]
path = baselines
url = https://github.com/micahcarroll/baselines.git
[submodule "overcooked_ai"]
path = overcooked_ai
url = https://github.com/HumanCompatibleAI/overcooked_ai.git
branch = master
branch = overcooked_ai_improvements
[submodule "stable-baselines"]
path = stable-baselines
url = https://github.com/micahcarroll/stable-baselines.git
[submodule "tfjs-converter"]
path = tfjs-converter
url = https://github.com/tensorflow/tfjs-converter.git
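The key change above is re-pointing the `overcooked_ai` submodule from `master` to the `overcooked_ai_improvements` branch. A sketch (not from the PR) of how such a pin can be inspected and updated with plain `git config -f`, using a throwaway copy of the file:

```bash
# Write a stand-in .gitmodules, then re-point the submodule branch.
mod_tmp=$(mktemp -d)
cat > "$mod_tmp/.gitmodules" <<'EOF'
[submodule "overcooked_ai"]
	path = overcooked_ai
	url = https://github.com/HumanCompatibleAI/overcooked_ai.git
	branch = master
EOF
# Update the branch key the same way this PR does:
git config -f "$mod_tmp/.gitmodules" submodule.overcooked_ai.branch overcooked_ai_improvements
git config -f "$mod_tmp/.gitmodules" submodule.overcooked_ai.branch  # prints overcooked_ai_improvements
```

After editing the real `.gitmodules`, `git submodule sync` and `git submodule update --remote` apply the new pin.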
199 changes: 62 additions & 137 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,18 @@
# Human-Aware Reinforcement Learning

This code is based on the work in [On the Utility of Learning about Humans for Human-AI Coordination](https://arxiv.org/abs/1910.05789).
## :warning: DEPRECATION WARNING

# Contents
This repo is being deprecated and should no longer be used independently. It is now a module under the [overcooked_ai](https://github.com/HumanCompatibleAI/overcooked_ai/tree/master) project, as we are consolidating several repos into one for convenience and better maintainability.

This repo should now **only** be used to reproduce the results in the 2019 paper [On the Utility of Learning about Humans for Human-AI Coordination](https://arxiv.org/abs/1910.05789).

*Note that this repository uses a specific older commit of the [overcooked_ai repository](https://github.com/HumanCompatibleAI/overcooked_ai)*, and should not be expected to work with the current version of that repository.

To play the game with trained agents, you can use [Overcooked-Demo](https://github.com/HumanCompatibleAI/overcooked-demo).

For more information about the Overcooked-AI environment, check out [this](https://github.com/HumanCompatibleAI/overcooked_ai) repo.

* [Installation](#installation)
* [Testing](#testing)
* [Repo Structure Overview](#repo-structure-overview)
* [Usage](#usage)
* [Troubleshooting](#troubleshooting)
* [Playing With Agents](#playing-with-agents)
* [Reproducing Results](#reproducing-results)
* [Human Data](./human_aware_rl/static/human_data/README.md)

# Installation
## Installation

When cloning the repository, make sure you also clone the submodules (this implementation is pinned to specific commits of the submodules, and will generally not work with more recent ones):
```
@@ -29,182 +24,112 @@ If you want to clone a specific branch with its submodules, use:
$ git clone --single-branch --branch BRANCH_NAME --recursive https://github.com/HumanCompatibleAI/human_aware_rl.git
```
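After a recursive clone, `git submodule status` prints one line per submodule: a commit hash, the submodule path, and a ref description (a leading `-` means the submodule was never initialized). A hedged parsing sketch — the hash and status line below are illustrative, not from the actual repo:

```bash
# Extract the pinned commit and name from a (fabricated) status line.
status_line=" 4367121000000000000000000000000000000000 overcooked_ai (remotes/origin/HEAD)"
sha=$(echo "$status_line" | awk '{print $1}' | tr -d '+-')   # strip the +/- state prefix if present
name=$(echo "$status_line" | awk '{print $2}')
sha_short=$(printf '%s' "$sha" | cut -c1-7)
echo "$name @ $sha_short"  # prints overcooked_ai @ 4367121
```

This is a quick way to confirm the submodules are checked out at the commits the repo expects rather than at a branch tip.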


## CUDA 10.0 Installation on Ubuntu 18.04
For Ubuntu 18.04, follow the direction [here](https://www.pugetsystems.com/labs/hpc/How-To-Install-CUDA-10-together-with-9-2-on-Ubuntu-18-04-with-support-for-NVIDIA-20XX-Turing-GPUs-1236/)

The only difference being the very last step.

Instead of running

```bash
$ sudo apt-get install cuda
It is useful to set up a conda environment with Python 3.7:
```

Please run
```bash
$ sudo apt-get install cuda-libraries-10-0
$ sudo apt-get install cuda-10-0
$ conda create -n harl python=3.7
$ conda activate harl
```

## Conda Environment Setup

Create a new conda environment and run the install script as before

[Optional Conda Installation for 18.04](https://www.digitalocean.com/community/tutorials/how-to-install-the-anaconda-python-distribution-on-ubuntu-18-04)

```bash
$ conda create -n harl_rllib python=3.7
$ conda activate harl_rllib
(harl_rllib) $ ./install.sh
To complete the installation, run:
```
$ cd human_aware_rl
human_aware_rl $ ./install.sh
```

Finally, install the latest stable version of tensorflow compatible with rllib
```bash
(harl_rllib) $ pip install tensorflow==2.0.2
Then install tensorflow and mpi4py (the GPU **or** non-GPU version depending on your setup):
```
Or, if working with gpus, install a version of tensorflow 2.*.* and cuDNN that is compatible with the available Cuda drivers. The following example works for Cuda 10.0.0. You can verify what version of Cuda is installed by running `nvcc --version`. For a full list of driver compatibility, refer [here](https://www.tensorflow.org/install/source#gpu)
```bash
(harl_rllib) $ pip install tensorflow-gpu==2.0.0
(harl_rllib) $ conda install -c anaconda cudnn=7.6.0
$ pip install tensorflow==1.13.1
$ conda install mpi4py
```
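Since only one of the two tensorflow packages should be installed, a small sketch of picking between them is shown below. The version pin matches the README; the detection heuristic (`nvidia-smi` on `PATH`) is an assumption, not something the repo itself does:

```bash
# Choose the GPU or non-GPU tensorflow package based on whether an
# NVIDIA driver tool is visible; illustrative only.
if command -v nvidia-smi > /dev/null 2>&1; then
    TF_PKG="tensorflow-gpu==1.13.1"
else
    TF_PKG="tensorflow==1.13.1"
fi
echo "pip install $TF_PKG"
```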

Your virtual environment should now be configured to run the rllib training code. Verify it by running the following command

```bash
(harl_rllib) $ python -c "from ray import rllib"
```
$ pip install tensorflow-gpu==1.13.1
$ conda install mpi4py
```

Note: if you ever get an import error, first check that you have activated the conda env.
Note that using tensorflow-gpu will prevent the DRL tests from passing, due to the intrinsic randomness introduced by GPU computations. We recommend first installing tensorflow (non-GPU), running the tests, and then installing tensorflow-gpu.

# Testing
## Verify Installation

If set-up was successful, all unit tests and local reproducibility tests should pass. They can be run as follows
To verify your installation, you can try running the following command from the inner `human_aware_rl` folder:

You can run all the tests with
```bash
(harl_rllib) $ ./run_tests.sh
```

## PPO Tests
Highest level integration tests that combine self play, bc training, and ppo_bc training
```bash
(harl_rllib) $ cd human_aware_rl/ppo
(harl_rllib) human_aware_rl/ppo $ python ppo_rllib_test.py
python run_tests.py
```
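The mechanism behind a `run_tests` entry point is typically plain unittest discovery. A minimal stand-in — the toy test file is fabricated purely to show the pattern, and `python3` is assumed on `PATH`:

```bash
# Create a throwaway test file and let unittest discover and run it.
ut_tmp=$(mktemp -d)
cat > "$ut_tmp/test_smoke.py" <<'EOF'
import unittest

class SmokeTest(unittest.TestCase):
    def test_truth(self):
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()
EOF
python3 -m unittest discover -s "$ut_tmp" -p "test_*.py" 2>&1 | tail -n 1  # prints OK
```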

## BC Tests
All tests involving creation, training, and saving of bc models. No dependency on rllib
```bash
(harl_rllib) $ cd imitation
(harl_rllib) imitation $ python behavior_cloning_tf2_test.py
```
Note that most of the DRL tests rely on having the exact randomness settings that were used to generate the tests (and thus will not pass on a GPU-enabled device).

## Rllib Tests
Tests rllib environments and models, as well as various utility functions. Does not actually test rllib training
```bash
(harl_rllib) $ cd rllib
(harl_rllib) rllib $ python tests.py
```

You should see all tests passing.
On OSX, you may run into an error saying that Python must be installed as a framework. You can fix it by [telling Matplotlib to use a different backend](https://markhneedham.com/blog/2018/05/04/python-runtime-error-osx-matplotlib-not-installed-as-framework-mac/).

Note: the tests are broken up into separate files because they rely on different tensorflow execution states (i.e. the bc tests run tf in eager mode, while rllib requires tensorflow to be running symbolically). Going forward, it would probably be best to standardize the tensorflow execution state, or re-write the code such that it is robust to execution state.
## Repo Structure Overview

# Repo Structure Overview

`ppo/`:
- `ppo_rllib.py`: Primary module where code for training a PPO agent resides. This includes an rllib compatible wrapper on `OvercookedEnv`, utilities for converting rllib `Policy` classes to Overcooked `Agent`s, as well as utility functions and callbacks
- `ppo_rllib_client.py`: Driver code for configuring and launching the training of an agent. More details about usage below
- `ppo_rllib_from_params_client.py`: train one agent with PPO in Overcooked with variable-MDPs
- `ppo_rllib_test.py` Reproducibility tests for local sanity checks
`ppo/` (both using baselines):
- `ppo.py`: train one agent with PPO in Overcooked with other agent fixed

`rllib/`:
- `rllib.py`: rllib agent and training utils that utilize Overcooked APIs
- `utils.py`: utils for the above
- `tests.py`: preliminary tests for the above
`pbt/` (all using baselines):
- `pbt.py`: train agents with population based training in overcooked

`imitation/`:
- `behavior_cloning_tf2.py`: Module for training, saving, and loading a BC model
- `behavior_cloning_tf2_test.py`: Contains basic reproducibility tests as well as unit tests for the various components of the bc module.
- `behaviour_cloning.py`: simple script to perform BC on trajectory data using baselines

`human/`:
- `process_data.py` script to process human data in specific formats to be used by DRL algorithms
- `data_processing_utils.py` utils for the above

`utils.py`: utils for the repo
`experiments/`: folder with experiment scripts used to generate experimental results in the paper

# Usage

Before proceeding, it is important to note that there are two primary groups of hyperparameter defaults, `local` and `production`. Which group is used is controlled by the `RUN_ENV` environment variable, which defaults to `production`. To use the local hyperparameters, run
```bash
$ export RUN_ENV=local
```
`baselines_utils.py`: utility functions used for `pbt.py`
`overcooked_interactive.py`: script to play Overcooked in terminal against trained agents
`run_tests.py`: script to run all tests

Training of agents is done through the `ppo_rllib_client.py` script. It has the following usage:
# Playing with trained agents

```bash
ppo_rllib_client.py [with [<param_0>=<argument_0>] ... ]
```
## In terminal-graphics

For example, the following snippet trains a self play ppo agent on seed 1, 2, and 3, with learning rate `1e-3`, on the `"cramped_room"` layout for `5` iterations without using any gpus. The rest of the parameters are left to their defaults
```
(harl_rllib) ppo $ python ppo_rllib_client.py with seeds="[1, 2, 3]" lr=1e-3 layout_name=cramped_room num_training_iters=5 num_gpus=0 experiment_name="my_agent"
```
To play with trained agents in the terminal, use `overcooked_interactive.py`. A sample command is:

For a complete list of all hyperparameters as well as their local and production defaults, refer to the `my_config` section of `ppo_rllib_client.py`
`python overcooked_interactive.py -t bc -r simple_bc_test_seed4`

Note that the terminal window must stay focused while playing; clicking away will interrupt keyboard input.
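The sample command passes the agent type with `-t` and the run name with `-r`. A sketch of that flag-parsing convention, redone in shell with `getopts`; the flag meanings mirror the command above, but the shell implementation itself is illustrative, not the script's actual code:

```bash
# Parse -t (agent type) and -r (run name), as in the sample command.
parse_play_args() {
    OPTIND=1
    agent_type="" run_name=""
    while getopts "t:r:" opt "$@"; do
        case "$opt" in
            t) agent_type="$OPTARG" ;;
            r) run_name="$OPTARG" ;;
        esac
    done
}
parse_play_args -t bc -r simple_bc_test_seed4
echo "type=$agent_type run=$run_name"  # prints type=bc run=simple_bc_test_seed4
```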

Training results and checkpoints are stored in a directory called `~/ray_results/my_agent_<seed>_<timestamp>`. You can visualize the results using tensorboard
```bash
(harl_rllib) $ cd ~/ray_results
(harl_rllib) ray_results $ tensorboard --logdir .
```
## With JavaScript graphics

This requires converting the trained models to Tensorflow JS format, and visualizing with the [overcooked-demo](https://github.com/HumanCompatibleAI/overcooked-demo) code. First install overcooked-demo and ensure it works properly.

# Troubleshooting
### Converting models to JS format

## Tensorflow
Many tensorflow errors are caused by the tensorflow state of execution. For example, if you get an error similar to
Unfortunately, converting models requires creating a new conda environment to avoid module conflicts.

Create and activate a new conda environment:
```
ValueError: Could not find matching function to call loaded from the SavedModel. Got:
Positional arguments (1 total):
* Tensor("inputs:0", shape=(1, 62), dtype=float64)
Keyword arguments: {}
$ conda create -n model_conversion python=3.7
$ conda activate model_conversion
```

or

Run the base `setup.py` (from the inner `human_aware_rl`) and then install `tensorflowjs`:
```
NotImplementedError: Cannot convert a symbolic Tensor (model_1/logits/BiasAdd:0) to a numpy array.
human_aware_rl $ cd human_aware_rl
human_aware_rl $ python setup.py develop
human_aware_rl $ pip install tensorflowjs==0.8.5
```

or

To convert models in the right format, use the `convert_model_to_web.sh` script. Example usage:
```
TypeError: Variable is unhashable. Instead, use tensor.ref() as the key.
human_aware_rl $ ./convert_model_to_web.sh ppo_runs ppo_sp_simple 193
```
where 193 is the seed number of the DRL run.
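To convert several seeds of the same run, the script can be wrapped in a loop over its three arguments. A hypothetical batch wrapper — the seeds other than 193 are made up for illustration, and the commands are only echoed here rather than executed:

```bash
# Dry-run: build one conversion command per seed.
RUN_TYPE=ppo_runs
RUN_NAME=ppo_sp_simple
cmds=""
for SEED in 192 193 194; do
    cmds="$cmds./convert_model_to_web.sh $RUN_TYPE $RUN_NAME $SEED
"
done
printf '%s' "$cmds"
```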

it is likely because the code you are running relies on tensorflow executing symbolically (or eagerly) while it is actually executing eagerly (or symbolically).
### Transferring agents to Overcooked-Demo

This can usually be fixed by changing the order of imports: `import tensorflow as tf` sets eager execution to true, while any `rllib` import disables eager execution. Once the execution state has been set, it cannot be changed. For example, if you require eager execution, make sure `import tensorflow as tf` comes BEFORE `from ray import rllib`, and vice versa.
The converted models can be found in `human_aware_rl/data/web_models/` and should be transferred to the `static/assets` folder with the same naming as the standard models.
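The transfer itself is a plain directory copy. An illustrative version with throwaway directories standing in for `human_aware_rl/data/web_models/` and Overcooked-Demo's `static/assets`; the model file names are fabricated:

```bash
# Stub out source and destination, then copy a converted model across.
src=$(mktemp -d)/web_models
dst=$(mktemp -d)/static/assets
mkdir -p "$src/ppo_sp_simple_seed193" "$dst"
echo '{}' > "$src/ppo_sp_simple_seed193/model.json"
cp -r "$src/ppo_sp_simple_seed193" "$dst/"
ls "$dst"  # prints ppo_sp_simple_seed193
```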

### Playing with newly trained agents

## 'human_aware_rl.data_dir' not found
If you encounter
```
ModuleNotFoundError: No module named 'human_aware_rl.data_dir'
```

please run

```
./run_tests.sh
```
To play with newly trained agents, just follow the instructions in the [Overcooked-Demo](https://github.com/HumanCompatibleAI/overcooked-demo) README.

to initialize those variables.
# Reproducing results

# Reproducing Results
All DRL results can be reproduced by running the `.sh` scripts under `human_aware_rl/experiments/`.

The specific results in that paper were obtained using code that is no longer in the master branch. If you are interested in reproducing them, please check out [this branch](https://github.com/HumanCompatibleAI/human_aware_rl/tree/neurips2019) and follow the install instructions there.
All non-DRL results can be reproduced by running cells in `NeurIPS Experiments and Visualizations.ipynb`.
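Running every experiment script in sequence is just a loop over `human_aware_rl/experiments/*.sh`. A sketch of that loop — the directory is stubbed with dummy scripts here so the mechanism can be shown end to end; the real scripts take much longer:

```bash
# Stub an experiments/ directory and run each script in it.
exp=$(mktemp -d)
printf '#!/bin/sh\necho ran %s\n' ppo_sp > "$exp/ppo_sp.sh"
printf '#!/bin/sh\necho ran %s\n' pbt   > "$exp/pbt.sh"
chmod +x "$exp"/*.sh
out=""
for script in "$exp"/*.sh; do
    out="$out$("$script")
"
done
printf '%s' "$out"
```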
1 change: 1 addition & 0 deletions baselines
Submodule baselines added at 472994
29 changes: 29 additions & 0 deletions convert_model_to_web.sh
@@ -0,0 +1,29 @@
#!/bin/sh
RUN_TYPE="$1"
RUN_NAME="$2"
SEED="$3"
if [ "$#" -eq 3 ]
then
if [ "$1" = "ppo_runs" ]
then
tensorflowjs_converter --input_format=tf_saved_model --output_node_names='ppo_agent/ppo2_model/action_probs' --saved_model_tags=serve human_aware_rl/data/$RUN_TYPE/$RUN_NAME/seed$SEED/ppo_agent \
human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED\_temp
elif [ "$1" = "pbt_runs" ]
then
tensorflowjs_converter --input_format=tf_saved_model --output_node_names='agent0/ppo2_model/action_probs' --saved_model_tags=serve human_aware_rl/data/$RUN_TYPE/$RUN_NAME/seed_$SEED/agent0/best \
human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED\_temp
else
echo "Should have 3 arguments: RUN_TYPE (ppo_runs/pbt_runs), RUN_NAME, and SEED"
exit 1
fi

cd tfjs-converter
yarn ts-node tools/pb2json_converter.ts ../human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED\_temp \
../human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED


rm -rf ../human_aware_rl/data/web_models/$RUN_NAME\_seed$SEED\_temp
else
echo "Should have 3 arguments: RUN_TYPE (ppo_runs/pbt_runs), RUN_NAME, and SEED"
exit 1
fi
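The path scheme the script above relies on can be factored into a function and checked without TensorFlow installed. This mirrors the `ppo_runs`/`pbt_runs` branches of the script (pbt runs use `seed_<SEED>/agent0/best` where ppo runs use `seed<SEED>/ppo_agent`); the function itself is a sketch, not part of the repo:

```bash
# Reconstruct the saved-model input path for a run type, run name, and seed.
model_input_path() {
    run_type=$1; run_name=$2; seed=$3
    case "$run_type" in
        ppo_runs) echo "human_aware_rl/data/$run_type/$run_name/seed$seed/ppo_agent" ;;
        pbt_runs) echo "human_aware_rl/data/$run_type/$run_name/seed_$seed/agent0/best" ;;
        *) return 1 ;;
    esac
}
model_input_path ppo_runs ppo_sp_simple 193  # prints human_aware_rl/data/ppo_runs/ppo_sp_simple/seed193/ppo_agent
```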