A collection of GPU/TPU-accelerated parallel game simulators for reinforcement learning (RL)
Brax, a JAX-native physics engine, provides extremely high-speed parallel simulation for RL in continuous state spaces. But what about RL in discrete state spaces like Chess, Shogi, and Go? Pgx provides a wide variety of JAX-native game simulators! Highlighted features include:
- β‘ Super fast parallel execution on accelerators
- π² Various game support including Backgammon, Chess, Shogi, and Go
- πΌοΈ Beautiful visualization in SVG format
```sh
pip install pgx
```

Note that all step functions in Pgx environments are JAX-native, i.e., they are all JIT-able.
```py
import jax
import pgx

env = pgx.make("go_19x19")
init = jax.jit(jax.vmap(env.init))  # vectorize and JIT-compile
step = jax.jit(jax.vmap(env.step))

batch_size = 1024
keys = jax.random.split(jax.random.PRNGKey(42), batch_size)
state = init(keys)  # vectorized states
while not (state.terminated | state.truncated).all():
    action = model(state.current_player, state.observation, state.legal_action_mask)
    state = step(state, action)  # state.reward (2,)
```
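The `model` in the loop above is a placeholder for your policy. For a quick smoke test, you can instead sample uniformly among legal actions. Below is a minimal sketch using only JAX; the boolean `masks` array is a stand-in for `state.legal_action_mask`, and `random_legal_action` is a helper name introduced here for illustration:

```python
import jax
import jax.numpy as jnp

def random_legal_action(rng, legal_action_mask):
    """Sample uniformly among legal actions (mask is boolean, shape (num_actions,))."""
    logits = jnp.where(legal_action_mask, 0.0, -jnp.inf)  # illegal actions get -inf logits
    return jax.random.categorical(rng, logits)

# Vectorize over a batch of states, mirroring the jax.vmap pattern above.
batched_sample = jax.jit(jax.vmap(random_legal_action))

keys = jax.random.split(jax.random.PRNGKey(0), 4)
masks = jnp.array([[True, False, True],
                   [False, True, False],
                   [True, True, True],
                   [False, False, True]])
actions = batched_sample(keys, masks)  # one legal action index per state
```

Because the sampler is pure JAX, it composes with `jax.vmap` and `jax.jit` exactly like the `init`/`step` functions in the snippet above.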
β οΈ Pgx is currently in beta. Therefore, the API is subject to change without notice. We aim to release v1.0.0 in May 2023. Opinions and comments are more than welcome!
(Visualizations of example Backgammon, Chess, Shogi, and Go boards.)
Use `pgx.available_envs() -> Tuple[EnvId]` to see the list of currently available games. Given an `<EnvId>`, you can create the environment via

```py
>>> env = pgx.make(<EnvId>)
```

You can check the current version of each environment by

```py
>>> env.version
```

| Game/EnvId | Visualization | Version | Five-word description |
|---|---|---|---|
| 2048 `"2048"` | | beta | Merge tiles to create 2048. |
| Animal Shogi `"animal_shogi"` | | beta | Animal-themed child-friendly shogi. |
| Backgammon `"backgammon"` | | beta | Luck aids bearing off checkers. |
| Chess `"chess"` | | beta | Checkmate opponent's king to win. |
| Connect Four `"connect_four"` | | beta | Connect discs, win with four. |
| Go `"go_9x9"` `"go_19x19"` | | beta | Strategically place stones, claim territory. |
| Hex `"hex"` | | beta | Connect opposite sides, block opponent. |
| Kuhn Poker `"kuhn_poker"` | | beta | Three-card betting and bluffing game. |
| Leduc hold'em `"leduc_holdem"` | | beta | Two-suit, limited deck poker. |
| MinAtar/Asterix `"minatar-asterix"` | | beta | Avoid enemies, collect treasure, survive. |
| MinAtar/Breakout `"minatar-breakout"` | | beta | Paddle, ball, bricks, bounce, clear. |
| MinAtar/Freeway `"minatar-freeway"` | | beta | Dodging cars, climbing up freeway. |
| MinAtar/Seaquest `"minatar-seaquest"` | | beta | Underwater submarine rescue and combat. |
| MinAtar/SpaceInvaders `"minatar-space_invaders"` | | beta | Alien shooter game, dodge bullets. |
| Othello `"othello"` | | beta | Flip and conquer opponent's pieces. |
| Shogi `"shogi"` | | beta | Japanese chess with captured pieces. |
| Sparrow Mahjong `"sparrow_mahjong"` | | beta | A simplified, children-friendly Mahjong. |
| Tic-tac-toe `"tic_tac_toe"` | | beta | Three in a row wins. |
- Bridge Bidding and Mahjong environments are under development π§
- Five-word descriptions were generated by ChatGPT π€
Pgx is intended to complement these JAX-native environments with (classic) board game suites:
- RobertTLange/gymnax: JAX implementation of popular RL environments (classic control, bsuite, MinAtar, etc) and meta RL tasks
- google/brax: Rigidbody physics simulation in JAX and continuous-space RL tasks (ant, fetch, humanoid, etc)
- instadeepai/jumanji: A suite of diverse and challenging RL environments in JAX (bin-packing, routing problems, etc)
Combining Pgx with these JAX-native algorithms/implementations might be an interesting direction:
- Anakin framework: Highly efficient RL framework that works with JAX-native environments on TPUs
- deepmind/mctx: JAX-native MCTS implementations, including AlphaZero and MuZero
- deepmind/rlax: JAX-native RL components
- google/evojax: Hardware-Accelerated neuroevolution
- RobertTLange/evosax: JAX-native evolution strategy (ES) implementations
- adaptive-intelligent-robotics/QDax: JAX-native Quality-Diversity (QD) algorithms
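The Anakin framework mentioned above gets its speed from keeping the whole rollout on-device: `jax.vmap` over a batch of environments, `jax.lax.scan` over time, all inside a single `jit`. A minimal sketch of that layout, using a toy counter environment as a stand-in for a Pgx env (the `toy_init`/`toy_step` names and their signatures are hypothetical, invented for this illustration):

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a Pgx env: the state is an integer step counter,
# and the reward simply echoes the action.
def toy_init(key):
    return jnp.zeros((), dtype=jnp.int32)

def toy_step(state, action):
    return state + 1, action.astype(jnp.float32)  # (next_state, reward)

def rollout(keys, actions):
    """vmap over a batch of envs, lax.scan over time -- the Anakin layout."""
    states = jax.vmap(toy_init)(keys)

    def one_step(states, actions_t):
        states, rewards = jax.vmap(toy_step)(states, actions_t)
        return states, rewards  # (carry, per-step output)

    states, rewards = jax.lax.scan(one_step, states, actions)  # actions: (T, B)
    return states, rewards

batch, horizon = 8, 5
keys = jax.random.split(jax.random.PRNGKey(0), batch)
actions = jnp.ones((horizon, batch), dtype=jnp.int32)
final_states, rewards = jax.jit(rollout)(keys, actions)  # rewards: (T, B)
```

Swapping the toy functions for a Pgx env's `init`/`step` (plus a policy producing `actions`) would follow the same structure, since Pgx step functions are JIT-able.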
```bibtex
@article{koyamada2023pgx,
  title={Pgx: Hardware-accelerated parallel game simulation for reinforcement learning},
  author={Koyamada, Sotetsu and Okano, Shinri and Nishimori, Soichiro and Murata, Yu and Habara, Keigo and Kita, Haruka and Ishii, Shin},
  journal={arXiv preprint arXiv:2303.17503},
  year={2023}
}
```
Apache-2.0