Gannonnn/miniAlphaZero

miniAlphaZero (Chess)

A minimal AlphaZero-style training loop for chess using python-chess and PyTorch. This project is intentionally compact and educational rather than highly optimized.

Features

  • Chess environment wrapper around python-chess.
  • Simple board and move encodings that build a fixed-size policy output.
  • Lightweight CNN policy+value network in PyTorch.
  • MCTS with PUCT for policy improvement.
  • Self-play game generator and replay buffer.
  • Supervised-style training on self-play targets.
  • Simple evaluation arena to compare two checkpoints.
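The "MCTS with PUCT" feature refers to the standard AlphaZero selection rule: each child is scored by its mean value Q plus an exploration bonus weighted by the network's prior P. A minimal sketch of that rule (function names and the dict-based node representation here are illustrative assumptions, not the repo's `mcts/puct.py` code):

```python
import math

def puct_score(q, p, n_parent, n_child, c_puct=1.5):
    """AlphaZero PUCT: exploitation (Q) plus prior-weighted exploration."""
    return q + c_puct * p * math.sqrt(n_parent) / (1 + n_child)

def select_child(children, n_parent, c_puct=1.5):
    """Pick the child maximizing the PUCT score.

    Each child is a dict with keys: "q" (mean value), "p" (prior), "n" (visits).
    """
    return max(children,
               key=lambda ch: puct_score(ch["q"], ch["p"], n_parent, ch["n"], c_puct))
```

Unvisited children with high priors get large bonuses, so the search is steered toward moves the network likes before their Q estimates are reliable.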

Requirements

  • Python 3.9+
  • PyTorch
  • python-chess
  • numpy

Install:

pip install torch python-chess numpy

For NVIDIA GPU training on Windows, install a CUDA-enabled PyTorch wheel (example for CUDA 12.8):

pip uninstall -y torch
pip install --index-url https://download.pytorch.org/whl/cu128 torch

Project Structure

minialphazero/
├── chess_env/
│   ├── board.py          # State encoding, move encoding
│   └── rules.py          # Wrapper around python-chess
├── nn/
│   ├── model.py          # PyTorch NN (policy + value)
│   └── inference.py      # NN interface used by MCTS
├── mcts/
│   ├── node.py
│   ├── mcts.py
│   └── puct.py
├── selfplay/
│   ├── game.py
│   └── buffer.py
├── training/
│   ├── train.py
│   └── loss.py
├── eval/
│   └── arena.py
├── config.py
├── main.py
└── README.md
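For orientation on the training side, `training/loss.py` presumably combines the two standard AlphaZero terms: mean-squared error on the value head and cross-entropy between the self-play visit distribution and the masked policy. The sketch below is a hypothetical single-position NumPy version (the function name, signature, and masking constant are assumptions; weight regularization is omitted):

```python
import numpy as np

def alphazero_loss(policy_logits, target_pi, value_pred, target_z, legal_mask):
    """Value MSE + policy cross-entropy for one position (regularization omitted)."""
    # Mask illegal moves before the softmax, as the Notes section describes.
    logits = np.where(legal_mask, policy_logits, -1e9)
    logits = logits - logits.max()          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    policy_loss = -np.sum(target_pi * np.log(probs + 1e-12))
    value_loss = (target_z - value_pred) ** 2
    return value_loss + policy_loss
```

Here `target_pi` is the MCTS visit-count distribution and `target_z` the game outcome from the mover's perspective, which matches the "supervised-style training on self-play targets" described above.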

Quick Start (Windows / PowerShell)

  1. Activate the virtual environment: .\.venv\Scripts\Activate.ps1

  2. Fast training: .\.venv\Scripts\python.exe -m main --preset fast --self_play_games 12 --mcts_sims 24 --policy_topk 12 --alpha_beta_depth 0 --alpha_beta_topk 8

  3. Higher-quality training: .\.venv\Scripts\python.exe -m main --preset quality

Resume from a checkpoint: .\.venv\Scripts\python.exe -m main --preset fast --load checkpoints\fast\final.pt

Use a lower learning rate: .\.venv\Scripts\python.exe -m main --preset fast --lr 3e-4

Arena mode (two checkpoints play each other)

Play a single arena game and save all of its moves to a file: .\.venv\Scripts\python.exe -m arena --model_a checkpoints\fast\iter_011.pt --model_b checkpoints\fast\iter_012.pt --games 1 --sims 16 --alpha_beta_depth 0 --alpha_beta_topk 8 --forced_attack_color none --save_last_game_moves last_game.json --save_attack_pair_moves attack_pair.json

Move Visualizers

Simple GUI visualizer: .\.venv\Scripts\python.exe -m visualizer_gui --moves_file last_game.json

Open the attack-pair game: .\.venv\Scripts\python.exe -m visualizer_gui --moves_file attack_pair.json --game_index 0

GUI controls:

  • First / Prev / Next / Last
  • Autoplay with delay slider
  • White pieces render as uppercase cream-colored text; black pieces as lowercase dark text

Notes

  • The move space is a fixed-size 20,480-dimensional vector indexed by (from_square, to_square, promotion), where promotion ∈ {None, N, B, R, Q}. Illegal moves are masked out before the softmax.
  • Input planes: 12 piece-type planes (6 per color) + 1 side-to-move plane = shape (13, 8, 8).
  • This is educational code; many enhancements are possible for performance and strength (history planes, castling/en-passant info, a better architecture, Dirichlet noise, resignation logic, etc.).
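The 20,480 figure is 64 × 64 × 5: every (from_square, to_square) pair crossed with the five promotion options. One natural flattening is shown below as an illustrative sketch; this particular ordering is an assumption, not necessarily the one `chess_env/board.py` uses:

```python
NUM_PROMOS = 5  # None, Knight, Bishop, Rook, Queen

def move_to_index(from_sq: int, to_sq: int, promo: int = 0) -> int:
    """Flatten (from_square, to_square, promotion) into [0, 64*64*5) = [0, 20480)."""
    return (from_sq * 64 + to_sq) * NUM_PROMOS + promo

def index_to_move(idx: int) -> tuple:
    """Invert the flattening back to (from_square, to_square, promotion)."""
    promo = idx % NUM_PROMOS
    to_sq = (idx // NUM_PROMOS) % 64
    from_sq = idx // (NUM_PROMOS * 64)
    return from_sq, to_sq, promo
```

With a scheme like this, the policy head emits one logit per index, and every index that does not correspond to a legal move in the current position is masked to a large negative value before the softmax.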
