A minimal AlphaZero-style training loop for chess using python-chess and PyTorch. This project is intentionally compact and educational rather than highly optimized.
Features:
- Chess environment wrapper around python-chess.
- Simple board and move encoding to build a fixed-size policy output.
- Lightweight CNN policy+value network in PyTorch.
- MCTS with PUCT for policy improvement.
- Self-play game generator and replay buffer.
- Supervised-style training on self-play targets.
- Simple evaluation arena to compare two checkpoints.
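As a rough sketch of the PUCT selection rule used during tree search (the function names, node layout, and the `c_puct` default here are illustrative, not this repo's actual API):

```python
import math

def puct_score(parent_visits, child_visits, child_value_sum, prior, c_puct=1.5):
    """PUCT: mean value Q plus a prior-weighted exploration bonus U."""
    q = child_value_sum / child_visits if child_visits > 0 else 0.0
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

def select_child(children, parent_visits, c_puct=1.5):
    """children maps move -> (visits, value_sum, prior); pick the max-PUCT move."""
    return max(children, key=lambda m: puct_score(parent_visits, *children[m], c_puct))
```

Unvisited children have Q = 0, so selection among them is driven by the network prior, which is what lets the policy head steer the search.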
Requirements:
- Python 3.9+
- PyTorch
- python-chess
- numpy
Install:
pip install torch python-chess numpy
For NVIDIA GPU training on Windows, install a CUDA-enabled PyTorch wheel (example for CUDA 12.8):
pip uninstall -y torch
pip install --index-url https://download.pytorch.org/whl/cu128 torch
Project structure:

minialphazero/
├── chess_env/
│ ├── board.py # State encoding, move encoding
│ ├── rules.py # Wrapper around python-chess
├── nn/
│ ├── model.py # PyTorch NN (policy + value)
│ ├── inference.py # NN interface used by MCTS
├── mcts/
│ ├── node.py
│ ├── mcts.py
│ ├── puct.py
├── selfplay/
│ ├── game.py
│ ├── buffer.py
├── training/
│ ├── train.py
│ ├── loss.py
├── eval/
│ ├── arena.py
├── config.py
├── main.py
└── README.md
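For orientation, training/loss.py presumably combines the two standard AlphaZero objectives. A minimal sketch, assuming batched tensors (the function name and shapes are illustrative, not this repo's exact API):

```python
import torch
import torch.nn.functional as F

def alphazero_loss(policy_logits, value_pred, target_pi, target_z):
    """Value MSE plus policy cross-entropy against MCTS visit counts.

    policy_logits: (B, num_moves) raw scores, illegal moves already masked
    value_pred:    (B,) predicted outcome in [-1, 1]
    target_pi:     (B, num_moves) normalized MCTS visit-count distribution
    target_z:      (B,) final game result from the mover's perspective
    """
    value_loss = F.mse_loss(value_pred, target_z)
    # Soft-target cross-entropy: -sum(pi * log softmax(logits))
    policy_loss = -(target_pi * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    return value_loss + policy_loss
```

The L2 regularization term from the AlphaZero paper is typically handled by the optimizer's weight decay rather than inside the loss function.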
Usage:
- Activate the virtual environment: .\.venv\Scripts\Activate.ps1
- Fast training: .\.venv\Scripts\python.exe -m main --preset fast --self_play_games 12 --mcts_sims 24 --policy_topk 12 --alpha_beta_depth 0 --alpha_beta_topk 8
- Higher-quality training: .\.venv\Scripts\python.exe -m main --preset quality
- Resume from checkpoint: .\.venv\Scripts\python.exe -m main --preset fast --load checkpoints\fast\final.pt
- Lower learning rate: .\.venv\Scripts\python.exe -m main --preset fast --lr 3e-4
- Save all moves from the last arena game to a file: .\.venv\Scripts\python.exe -m arena --model_a checkpoints\fast\iter_011.pt --model_b checkpoints\fast\iter_012.pt --games 1 --sims 16 --alpha_beta_depth 0 --alpha_beta_topk 8 --forced_attack_color none --save_last_game_moves last_game.json --save_attack_pair_moves attack_pair.json
- Simple GUI visualizer: .\.venv\Scripts\python.exe -m visualizer_gui --moves_file last_game.json
- Open the attack-pair game: .\.venv\Scripts\python.exe -m visualizer_gui --moves_file attack_pair.json --game_index 0
GUI controls:
- First / Prev / Next / Last
- Autoplay with delay slider
- White pieces are drawn as uppercase cream-colored text; black pieces as lowercase dark text
Notes:
- The move space is a fixed-size 20,480-entry vector indexed by (from_square, to_square, promotion), where promotion ∈ {None, N, B, R, Q}: 64 × 64 × 5 = 20,480. Illegal moves are masked out before the softmax.
- Input planes: 12 piece-type planes (6 per color) + 1 side-to-move plane = shape (13, 8, 8).
- This is educational code; for performance and strength, many enhancements are possible (history planes, castling/en-passant info, a better architecture, Dirichlet noise, resignation logic, etc.).
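The (from_square, to_square, promotion) flattening can be sketched as follows; the exact ordering of the index is an assumption, since the README only fixes the tuple, not the layout:

```python
# Promotion slot order is assumed; None covers non-promotion moves.
PROMOS = (None, "n", "b", "r", "q")

def move_to_index(from_square: int, to_square: int, promotion=None) -> int:
    """Flatten (from, to, promotion) into [0, 64 * 64 * 5) = [0, 20480).

    from_square/to_square use the usual 0..63 square numbering (a1 = 0, h8 = 63).
    """
    return (from_square * 64 + to_square) * 5 + PROMOS.index(promotion)

# The highest possible index confirms the 20,480-entry policy vector.
assert move_to_index(63, 63, "q") == 20479
```

The same arithmetic run in reverse (divmod by 5, then by 64) recovers the move tuple from a policy index.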