Options of Interest

This repo contains code accompanying the paper Options of Interest: Temporal Abstraction with Interest Functions (AAAI 2020). It includes the Interest-Option-Critic (IOC) code needed to run all the experiments described in the paper.

  • You can find demonstration videos of the trained agents on our project webpage.
  • All proofs, pseudo-code and reproducibility checklist details are available in the appendix on our project webpage.
  • For experiment details, please refer to the full paper provided on the webpage.

Contents:

Tabular Experiments (Four-Rooms)

Dependencies

To install dependencies for the tabular experiments, run the following commands:

conda create -n interest python=3.6
conda activate interest
pip install seaborn
pip install matplotlib
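
As a quick sanity check (not part of the original setup, and assuming the interest environment is active), you can confirm the plotting libraries import cleanly:

conda activate interest
python -c "import matplotlib, seaborn; print('tabular dependencies OK')"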

Usage

To run the IOC code, use:

python interestoptioncritic_tabular_fr.py --baseline --discount=0.99 --epsilon=0.01 --noptions=4 --lr_critic=0.5 --lr_intra=0.25 --lr_term=0.25 --lr_interestfn=0.15 --nruns=10 --nsteps=2000 --nepisodes=500 --seed=7200

To run the baseline OC (option-critic) code, use:

python optioncritic_tabular_fr.py --baseline --discount=0.99 --epsilon=0.01 --noptions=4 --lr_critic=0.5 --lr_intra=0.25 --lr_term=0.25 --nruns=10 --nsteps=2000 --nepisodes=500 --seed=7200
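
If you want several independent runs, one option (a sketch using only the flags shown above; the particular seeds are arbitrary) is a small shell loop over seeds:

# hypothetical sweep over three seeds for the IOC experiment
for seed in 7200 7201 7202; do
    python interestoptioncritic_tabular_fr.py --baseline --discount=0.99 --epsilon=0.01 --noptions=4 --lr_critic=0.5 --lr_intra=0.25 --lr_term=0.25 --lr_interestfn=0.15 --nruns=10 --nsteps=2000 --nepisodes=500 --seed=${seed}
done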

Performance and Visualizations

To visualize the environment itself, use the notebook: fr_env_plots.ipynb

To plot the performance curves, use the notebook: fr_analysis_performance.ipynb

To visualize the options learned, use the notebook: fr_analysis_heatmaps.ipynb
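
If Jupyter is not already available in the interest environment, one way to open these notebooks (an assumption; any Jupyter installation works) is:

pip install notebook
jupyter notebook fr_env_plots.ipynb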


Control Experiments (TMaze & HalfCheetah)

Dependencies

To install dependencies for the control experiments, run the following commands:

conda create -n intfc python=3.6
conda activate intfc
pip install tensorflow
pip install -e .  # run this in the main directory
pip install gym==0.9.3
pip install mujoco-py==0.5.1
brew install mpich
pip install mpi4py
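
As a quick sanity check before launching experiments (not part of the original instructions; import names are assumed from the packages installed above), verify the core dependencies load:

conda activate intfc
python -c "import tensorflow, gym, mujoco_py, mpi4py; print('control dependencies OK, gym', gym.__version__)"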

Usage

To run the code with TMaze experiments, use: python run_mujoco.py --env TMaze --opt 2 --seed 2 --switch

To run the code with HalfCheetah experiments, use: python run_mujoco.py --env HalfCheetahDir-v1 --opt 2 --seed 2 --switch
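
To launch several seeds locally rather than on a cluster, a simple sketch (using only the flags already shown above; the seed values are arbitrary) is:

# hypothetical local sweep over seeds for the TMaze experiment
for seed in 0 1 2; do
    python run_mujoco.py --env TMaze --opt 2 --seed ${seed} --switch
done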

Running experiments on slurm

To run the code on Compute Canada or any Slurm cluster, make sure you have installed all dependencies and created the conda environment intfc. Then edit the script launcher_mujoco.sh to add your account and username, and run:

chmod +x launcher_mujoco.sh
./launcher_mujoco.sh

To run the baseline option-critic, use the flag --nointfc in the above script:

k="xvfb-run -n "${port[$count]}" -s \"-screen 0 1024x768x24 -ac +extension GLX +render -noreset\" python run_mujoco.py --env "$envname" --saves --opt 2 --seed ${_seed} --mainlr ${_mainlr} --piolr ${_piolr} --switch --nointfc --wsaves"

Performance and Visualizations

To plot the learning curves, use the script: control/baselines/ppoc_int/plot_res.py with appropriate settings.

To load and run a trained agent, use:

python run_mujoco.py --env HalfCheetahDir-v1 --epoch 400 --seed 0

where epoch is the training epoch at which you want to visualize the learned agent. This assumes that the saved model directory is in the ppoc_int folder.


Visual Navigation Experiments (Miniworld)

Dependencies

To install dependencies for the MiniWorld experiments, run the following commands:

conda create -n intfc python=3.6
conda activate intfc
pip install tensorflow
pip install -e .  # run this in the top-level baselines directory
brew install mpich
pip install mpi4py
pip install matplotlib
# to run the code with miniworld
pip install gym==0.10.5
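
As before, a quick sanity check (not part of the original instructions; import names are assumed from the packages above) is:

conda activate intfc
python -c "import tensorflow, gym, mpi4py, matplotlib; print('miniworld dependencies OK, gym', gym.__version__)"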

To install MiniWorld, follow these installation instructions.

Since the CNN policy code is much slower than the MuJoCo experiments, the best way to run it is on a cluster. To run MiniWorld headless for training on a cluster, follow the instructions here.

Usage

To run the code headless for the OneRoom task with transfer, use:

xvfb-run -n 4005 -s "-screen 0 1024x768x24 -ac +extension GLX +render -noreset" python run_miniw.py --env MiniWorld-OneRoom-v0 --seed 5 --opt 2 --saves --mainlr 1e-4 --intlr 9e-5 --switch --wsaves

Running experiments on slurm

To run the code on Compute Canada or any Slurm cluster, make sure you have installed all dependencies and created the conda environment intfc. Then edit the script launcher_miniworld.sh to add your account and username, and run:

chmod +x launcher_miniworld.sh
./launcher_miniworld.sh

Please note that, to ensure the MiniWorld code runs correctly headless, we specify an exclusive port per run. If the port number overlaps between multiple jobs, those jobs will fail. There may be a better way to do this, but this is the approach we found easiest to get working. Depending on how many jobs you want to launch (e.g. runs/seeds), set the port range accordingly.
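
For example, one way to keep the port distinct per job (a sketch, not the actual launcher script; the base port 4000 and the seed range are arbitrary) is to derive it from the seed or job index:

# hypothetical per-job port assignment inside a launch loop
for seed in 0 1 2; do
    port=$((4000 + seed))   # give each job its own display/port
    xvfb-run -n ${port} -s "-screen 0 1024x768x24 -ac +extension GLX +render -noreset" python run_miniw.py --env MiniWorld-OneRoom-v0 --seed ${seed} --opt 2 --saves --mainlr 1e-4 --intlr 9e-5 --switch --wsaves
done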

To run the baseline option-critic, add the --nointfc flag to the run command in the above script.

Performance and Visualizations

To plot the learning curves, use the script: miniworld/baselines/ppoc_int/plot_res.py with appropriate settings.

To visualize the trajectories of trained agents, make the following changes in your local installation of the MiniWorld environment code: https://github.com/kkhetarpal/gym-miniworld/commits/master. Then load and run the trained agent to visualize its trajectory with a 2-D top view of the 3D OneRoom environment.

To load and run a trained agent, use:

python run_miniw.py --env MiniWorld-OneRoom-v0 --epoch 480 --seed 0

where epoch is the training epoch at which you want to visualize the learned agent. This assumes that the saved model directory is in the ppoc_int folder.

Contact

To ask questions or report issues, please open an issue on the issue tracker.

Additional Material

  • The poster presented at the NeurIPS 2019 Deep RL Workshop and the Learning Transferable Skills Workshop can be found (here).
  • Preliminary ideas were presented at AAAI 2019 in the Student Abstract track and selected as a finalist in the 3MT Thesis Competition (paper link), (poster link).

Citations

  • The four-rooms experiment is built on the Option-Critic (2017) tabular code.
  • The PPOC (2017) baselines code serves as the base for our function approximation experiments.
  • To install MuJoCo, please visit their website and acquire a free student license.
  • For any issues you face with setting up MiniWorld, please visit their troubleshooting page.
