Implicit Updates for Average-Reward Temporal Difference Learning

This repository provides the source code for the numerical experiments described in the paper.

Setup

We recommend using a virtual environment to ensure reproducibility. You can create and activate one as follows:

# Create a virtual environment named .venv
python3 -m venv .venv

# Activate the environment
# Linux / macOS
source .venv/bin/activate

# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1
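
To confirm that the environment is active before installing anything, a quick check from Python can help. The sketch below relies on the standard-library behavior that sys.prefix differs from sys.base_prefix inside a virtual environment. The function name in_virtualenv is hypothetical and not part of this repository. The repository includes a requirements.txt file, so the dependencies can presumably be installed afterwards with `python -m pip install -r requirements.txt`; that step is an assumption, since it is not spelled out here.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate .venv first.")
```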

Evaluation Experiment Examples

For the evaluation experiments, we provide two examples: a Markov reward process (MRP) and a Boyan chain. Each can be run with different step-size schedules:

python different_TD_fixed_points_im2.py --env MRP --step_size_schedule constant
python different_TD_fixed_points_im2.py --env Boyan --step_size_schedule s_decay
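
To cover both examples under both schedules, the commands above can be generated programmatically. The sketch below uses only the script name, flags, and values shown above. It assumes that every environment accepts every schedule, which the two example commands do not confirm. The helper name build_eval_commands and the --run switch belong to this sketch, not to the repository script. By default the commands are only printed.

```python
import itertools
import subprocess
import sys

ENVS = ["MRP", "Boyan"]              # environment names taken from the commands above
SCHEDULES = ["constant", "s_decay"]  # schedule names taken from the commands above


def build_eval_commands(envs=ENVS, schedules=SCHEDULES):
    """Build one argument list per (environment, schedule) pair."""
    return [
        [sys.executable, "different_TD_fixed_points_im2.py",
         "--env", env, "--step_size_schedule", schedule]
        for env, schedule in itertools.product(envs, schedules)
    ]


if __name__ == "__main__":
    # "--run" is a switch of this sketch only; without it nothing is executed.
    execute = "--run" in sys.argv[1:]
    for cmd in build_eval_commands():
        print(" ".join(cmd))
        if execute:
            # check=True stops at the first failing run.
            subprocess.run(cmd, check=True)
```

Run it from the repository root so that the script path resolves.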

Control Experiment Examples

For the control experiments, we provide two examples: Access-control queuing and Pendulum. For example:

python control_experiment.py --env pendulum --num_experiments 30 --num_episodes 25000
python control_experiment.py --env access_control --num_experiments 30 --num_episodes 25000
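
The commands above use 30 experiments of 25000 episodes each, so a much smaller run can be a cheap way to check the setup first. The sketch below builds such a command. It assumes that both flags accept any positive integer, which is not confirmed here. The helper name build_control_command is hypothetical, and the reduced sizes are arbitrary rather than values from the paper.

```python
import sys


def build_control_command(env, num_experiments=30, num_episodes=25000):
    """Return the argument list for one control run (defaults match the commands above)."""
    if num_experiments < 1 or num_episodes < 1:
        raise ValueError("num_experiments and num_episodes must be positive")
    return [
        sys.executable, "control_experiment.py",
        "--env", env,
        "--num_experiments", str(num_experiments),
        "--num_episodes", str(num_episodes),
    ]


if __name__ == "__main__":
    # Reduced sizes chosen arbitrarily for a quick check; not values from the paper.
    for env in ("pendulum", "access_control"):
        print(" ".join(build_control_command(env, num_experiments=2, num_episodes=100)))
```

The printed commands can be pasted into the activated shell, or passed to subprocess.run to execute them directly.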
