A reinforcement learning implementation for a simplified version of the Dominion card game.
This project implements:
- A simplified, single-player version of Dominion as a reinforcement learning environment
- A Deep Q-Network (DQN) agent that learns to play the game
- A baseline fixed strategy ("Buy Menu" strategy) that can be used for imitation learning
The agent uses a combination of imitation learning from a predefined strategy and randomized exploration to discover optimal play patterns.
A full write-up of the project, including graphs, can be found at https://www.solarflare.org.uk/dominion.
- Environment defines a reinforcement learning environment, including states, actions and rewards
- Strategy defines a fixed strategy for an Environment
  - Fixed strategies can either be evaluated on their own, or used as starting points for training
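As a rough illustration of how an Environment and a Strategy fit together, here is a minimal sketch. Only the class names Environment and Strategy and the method name choose_action come from this project; the other method names (reset, legal_actions, step), the choose_action signature, the toy CountdownEnv and the evaluate helper are assumptions made for the example, and the real classes may look different.

```python
from abc import ABC, abstractmethod


class Environment(ABC):
    """Hypothetical interface; the project's real method names may differ."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the initial state."""

    @abstractmethod
    def legal_actions(self, state):
        """Return the list of actions available in `state`."""

    @abstractmethod
    def step(self, action):
        """Apply `action` and return (next_state, reward, done)."""


class Strategy(ABC):
    """A fixed policy: maps a state to one of its legal actions."""

    @abstractmethod
    def choose_action(self, env, state):
        """Pick one of the legal actions for `state` (signature is assumed)."""


class CountdownEnv(Environment):
    """Toy stand-in for a game: start at 5, subtract 1 or 2 per step.

    Every step costs a reward of -1, so finishing quickly scores better.
    """

    def reset(self):
        self.value = 5
        return self.value

    def legal_actions(self, state):
        return [1, 2]

    def step(self, action):
        self.value -= action
        return self.value, -1.0, self.value <= 0


class AlwaysLargest(Strategy):
    """Fixed strategy that always takes the largest legal action."""

    def choose_action(self, env, state):
        return max(env.legal_actions(state))


def evaluate(env, strategy, episodes=10):
    """Evaluate a fixed strategy on its own: average total reward per episode."""
    total = 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = strategy.choose_action(env, state)
            state, reward, done = env.step(action)
            total += reward
    return total / episodes


print(evaluate(CountdownEnv(), AlwaysLargest()))  # prints -3.0
```

The same evaluate loop works for any Strategy, which is what makes it possible to score a fixed strategy by itself before using it as a starting point for training.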
DominionEnv defines an Environment for single-player Dominion, including:
- Card definitions with types, costs and effects
  - Including approx 20 cards so far, from a mixture of the base set, Alchemy and Seaside
- Game state management (deck, hand, discard pile etc.)
- Game phases (Action and Buy phases, together with special phases for handling certain cards)
- Game state representation for the RL agent
- Reward calculation (the agent is rewarded for scoring victory points)
- Game end conditions (currently the game just ends after a fixed number of turns; empty piles are ignored)
BuyMenuStrategy implements the Strategy class with a simple, priority-based approach to playing the game:
- Follows a predefined "buy menu" for purchasing cards
- Uses heuristics for playing action cards and trashing
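The idea of a priority-based "buy menu" can be sketched as follows. This is not the code in buy_menu_strategy.py: the menu format (an ordered list of card name and cost pairs), the choose_purchase function and the supply dictionary are all assumptions made for illustration.

```python
def choose_purchase(buy_menu, coins, supply):
    """Return the first card on the menu that is affordable and in stock.

    buy_menu: list of (card_name, cost) pairs, highest priority first.
    coins:    coins available in the current Buy phase.
    supply:   dict mapping card_name to the number of copies left.
    Returns None if nothing on the menu can be bought.
    """
    for card, cost in buy_menu:
        if cost <= coins and supply.get(card, 0) > 0:
            return card
    return None


# Hypothetical "Big Money" style menu, using the standard Dominion costs.
BIG_MONEY_MENU = [("Province", 8), ("Gold", 6), ("Silver", 3)]
supply = {"Province": 8, "Gold": 30, "Silver": 40}

print(choose_purchase(BIG_MONEY_MENU, 7, supply))  # prints Gold
```

Because the menu is just ordered data, trying a different fixed strategy only means passing in a different list.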
learning.py defines a reinforcement learning agent that attempts to learn an optimal policy for any given Environment, including:
- Neural network architecture for Q-value approximation
- Experience replay buffer for stable learning
- Epsilon-greedy exploration strategy
- (Optional) Hybrid learning that starts from a predefined Strategy and gradually transitions to a self-learned policy
- Functions for training, saving, and loading the model
- Tools for evaluating the agent and plotting graphs of training progress
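To make the interaction between the replay buffer, epsilon-greedy exploration and the optional predefined strategy more concrete, here is a small sketch in plain Python. It does not reproduce learning.py: the names ReplayBuffer, select_action and strategy_prob, and the order in which the three choices are tried, are assumptions for illustration, and Q-values are passed in as a plain dictionary rather than computed by a neural network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of past transitions; the oldest are dropped first."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, transition):
        self.items.append(transition)

    def sample(self, batch_size, rng=random):
        """Return up to batch_size transitions chosen uniformly at random."""
        return rng.sample(list(self.items), min(batch_size, len(self.items)))

    def __len__(self):
        return len(self.items)


def select_action(q_values, legal_actions, strategy_action,
                  epsilon, strategy_prob, rng=random):
    """Pick an action using a hybrid of imitation, exploration and exploitation.

    1. With probability strategy_prob, copy the predefined strategy's action
       (skipped when strategy_action is None, i.e. no strategy is in use).
    2. Otherwise, with probability epsilon, pick a random legal action.
    3. Otherwise, pick the legal action with the highest estimated Q-value.
    """
    if strategy_action is not None and rng.random() < strategy_prob:
        return strategy_action
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda action: q_values[action])


buffer = ReplayBuffer(capacity=2)
for step in range(3):
    buffer.add(("state", "action", float(step)))
print(len(buffer))  # prints 2, because the oldest transition was dropped
```

Lowering strategy_prob over time moves the agent from copying the predefined strategy towards its own policy, while epsilon controls how much random exploration remains.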
Run the main script to train the agent:
python main.py
The script will:
- Initialize the environment with a specific kingdom
- Create a DQN agent with reinforcement learning, starting from a randomized initial strategy
- Train the agent over 4,000 games
- Save graphs of training progress to a "plots" folder (which will be created if required)
- Save the trained model as dominion_dqn.pt
In the default kingdom, the agent learns a "Big Money + Wharf" strategy, which scores around 31 points in the time available.
To use the predefined strategy instead:
- Edit the DQNAgent creation code in main.py, changing predefined_strategy=None to predefined_strategy=buy_menu_strategy
- Delete the existing dominion_dqn.pt file if it exists
- Run python main.py again
In this case, the agent begins with an "imitation learning" phase in which it learns to copy the predefined strategy, scoring around 10 points. As training progresses, the agent shifts away from the predefined strategy and explores freely on its own. This allows it to discover a new strategy involving University, Alchemist, Wharf and Vineyard, which scores around 40 to 60 points on average. Seeing this result might require training for longer than the default 4,000 episodes.
You can also re-run main.py to resume training (starting from the saved dominion_dqn.pt model) if desired. (If doing this, it might be desirable to decrease the initial epsilon and/or predefined strategy probability, as you don't need as much exploration when restarting from a previous run.)
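The effect of lowering the starting values when resuming can be seen with a simple decay schedule. The sketch below assumes a linear schedule and a hypothetical linear_decay helper; the actual schedule and parameter names used in main.py may be different.

```python
def linear_decay(start, end, step, total_steps):
    """Interpolate linearly from start (at step 0) to end (at total_steps)."""
    if total_steps <= 0:
        return end
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * fraction


# Fresh run: explore heavily at first (values here are purely illustrative).
fresh = [linear_decay(1.0, 0.1, step, 4000) for step in (0, 2000, 4000)]

# Resumed run: the saved model already plays sensibly, so start lower.
resumed = [linear_decay(0.3, 0.1, step, 4000) for step in (0, 2000, 4000)]

print(fresh)
print(resumed)
```

The same helper could be applied to the predefined strategy probability, so that a resumed run spends less time imitating and more time refining what it has already learned.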
For further experimentation, training parameters can be adjusted by editing main.py, and the neural network architecture can be changed by editing the QNetwork class in learning.py. Also, of course, different combinations of kingdom cards can be tried, by modifying the list near the top of main.py.
To add new cards, you need to:
- Add an entry to the CardId enum in dominion.py
- Create a CardInfo object with appropriate properties and effects
- Update the CARD_INFO dictionary with the new card
- (Optional) Edit the code for the DominionEnv itself to implement special rules (see existing examples for Alchemist, Gardens and Vineyard)
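The steps above might look roughly like the following. The names CardId, CardInfo and CARD_INFO come from this project, but the fields of CardInfo shown here (name, cost, card_type, extra_cards, extra_actions) are assumptions for illustration, as is the choice of Village as the new card; check dominion.py for the real definitions.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CardId(Enum):
    COPPER = auto()
    SMITHY = auto()
    VILLAGE = auto()  # step 1: new enum entry


@dataclass(frozen=True)
class CardInfo:
    """Hypothetical card description; the real class may have other fields."""
    name: str
    cost: int
    card_type: str
    extra_cards: int = 0    # cards drawn when played
    extra_actions: int = 0  # additional actions granted when played


CARD_INFO = {
    CardId.COPPER: CardInfo("Copper", 0, "treasure"),
    CardId.SMITHY: CardInfo("Smithy", 4, "action", extra_cards=3),
}

# Steps 2 and 3: describe the new card and register it.
CARD_INFO[CardId.VILLAGE] = CardInfo(
    "Village", 3, "action", extra_cards=1, extra_actions=2
)

# Sanity check: every enum entry should have a matching description.
missing = [card for card in CardId if card not in CARD_INFO]
print(missing)  # prints []
```

A check like the last one is a cheap way to catch a card that was added to the enum but never registered.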
You can create custom strategies by:
- Implementing the Strategy abstract base class
- Overriding the choose_action method to select actions based on your strategy
Alternatively, if your strategy can be expressed as a "buy menu", then passing a new buy menu to BuyMenuStrategy (and perhaps adjusting the action and trashing heuristics in buy_menu_strategy.py) might be sufficient.
Implementing a new Strategy subclass that can execute some of the strategies from Geronimoo's Dominion simulator might be an interesting project.
- Python 3.10+
- PyTorch
- NumPy
- Matplotlib
After installing Python, the remaining requirements can be installed with pip install torch numpy matplotlib.
Potential areas for enhancement:
- Implement more Dominion cards and expansions
- Add a multi-player mode, including self-play for multi-agent learning
- Add more sophisticated exploration mechanisms, to reduce the need for imitation learning
- Experiment with different RL algorithms (PPO, A2C, etc.) and neural network architectures
- Improve the state representation for better learning
- Add full game logging so we can see the played strategies in more detail
- In the single-player case, experiment with mean-variance optimization (i.e. search for strategies that have high expected scores but also low variance of points scored)
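As a sketch of what the mean-variance idea in the last item could mean in practice, a strategy's scores over many games can be collapsed into a single number that rewards a high average and penalizes spread. The mean_variance_score function and the risk_aversion weight are hypothetical and not part of the project, and the score lists are made-up inputs rather than measured results.

```python
from statistics import mean, pvariance


def mean_variance_score(scores, risk_aversion=0.1):
    """Mean of the scores minus risk_aversion times their variance.

    A larger risk_aversion favours consistent strategies over ones that
    occasionally score very highly but are unreliable.
    """
    return mean(scores) - risk_aversion * pvariance(scores)


# Two made-up strategies with the same average score of 30.
steady = [30, 30, 30, 30]
swingy = [10, 50, 10, 50]

print(mean_variance_score(steady) > mean_variance_score(swingy))  # prints True
```

With risk_aversion set to 0 this reduces to the plain average, which corresponds to optimizing expected score only.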
This project was created by Stephen Thompson.
Email: stephen (at) solarflare.org.uk
Website: https://www.solarflare.org.uk/