Wire batched MCTS into training self-play by cweill · Pull Request #17 · weill-labs/alphazero

cweill · 2026-05-26T19:41:33Z

Summary

Training self-play built its MCTS config as {num_simulations, dirichlet_eps}, so MCTS fell back to batch_size=1 (sequential search), and the leaf-parallel batched MCTS from #9, measured at ~2.8–4.2× in the benchmark, never reached real training. (The 5 Connect Four robustness runs provably ran sequentially: their wandb configs logged no batch size.) This PR threads a self-play MCTS batch-size knob through both training entry points.
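The failure mode is a silent default: an omitted optional key resolves to sequential search. The sketch below is minimal and assumes MCTS reads an optional batch_size key that defaults to 1. effective_batch_size and DEFAULT_BATCH_SIZE are hypothetical names. The num_simulations/dirichlet_eps values are illustrative, not the repo's settings.

```python
# Hypothetical sketch: an optional key in a plain-dict MCTS config silently
# falls back to sequential search when it is omitted.
DEFAULT_BATCH_SIZE = 1  # assumed fallback: one leaf per network call


def effective_batch_size(mcts_cfg: dict) -> int:
    """Return the leaf batch size MCTS would use for this config (assumed behavior)."""
    return int(mcts_cfg.get("batch_size", DEFAULT_BATCH_SIZE))


# Old self-play config: no batch_size key, so search runs sequentially.
old_cfg = {"num_simulations": 200, "dirichlet_eps": 0.25}
# New self-play config: the knob is threaded through explicitly.
new_cfg = {**old_cfg, "batch_size": 16}

old_batch = effective_batch_size(old_cfg)  # 1
new_batch = effective_batch_size(new_cfg)  # 16
```

Because nothing errors when the key is missing, the only visible symptom is a logged config without a batch size. That is how the Connect Four runs were identified as sequential.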

Changes

arena.py training CLI: new --mcts-batch-size (default 16), added to self_play_mcts_cfg.
modal_app.py: new mcts_batch_size param (default 16) on train_remote/main, added to self_play_mcts_cfg and logged in the wandb run config.
Eval is untouched: MCTSPlayer (gating/ladder) has no batch_size parameter, so evaluation stays on exact sequential search even though it reads the same cfg.
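The self-play/eval split can be sketched as follows. Only the --mcts-batch-size flag and its default of 16 come from this PR. build_parser, make_self_play_mcts_cfg, SequentialEvalPlayer, the other flags, and their defaults are hypothetical stand-ins, not the actual arena.py or MCTSPlayer code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the training CLI: only --mcts-batch-size and its
    # default of 16 come from this PR; the other flags and defaults are assumed.
    parser = argparse.ArgumentParser(description="training self-play (sketch)")
    parser.add_argument("--num-simulations", type=int, default=200)
    parser.add_argument("--dirichlet-eps", type=float, default=0.25)
    parser.add_argument(
        "--mcts-batch-size",
        type=int,
        default=16,
        help="leaves per network call in self-play MCTS (1 = sequential)",
    )
    return parser


def make_self_play_mcts_cfg(args: argparse.Namespace) -> dict:
    # argparse maps --mcts-batch-size to args.mcts_batch_size.
    return {
        "num_simulations": args.num_simulations,
        "dirichlet_eps": args.dirichlet_eps,
        "batch_size": args.mcts_batch_size,
    }


class SequentialEvalPlayer:
    """Hypothetical stand-in for an eval player with no batch_size parameter."""

    def __init__(self, num_simulations: int) -> None:
        self.num_simulations = num_simulations

    @classmethod
    def from_cfg(cls, cfg: dict) -> "SequentialEvalPlayer":
        # Reads the same cfg dict but only picks out the keys it accepts,
        # so batch_size never reaches evaluation search.
        return cls(num_simulations=cfg["num_simulations"])


default_cfg = make_self_play_mcts_cfg(build_parser().parse_args([]))
sequential_cfg = make_self_play_mcts_cfg(
    build_parser().parse_args(["--mcts-batch-size", "1"])
)
eval_player = SequentialEvalPlayer.from_cfg(default_cfg)
```

With this shape, sharing one cfg between self-play and evaluation is safe. The eval constructor simply has nowhere to put batch_size.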

Design note

Leaf-parallel MCTS with virtual loss approximates sequential search, trading some search fidelity for speed. A default of 16 captures most of the speedup for data generation; --mcts-batch-size 1 restores fully sequential self-play. Keeping eval sequential preserves gating/Elo fidelity, where accuracy matters more than throughput.
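The approximation can be illustrated with a one-level toy sketch of leaf-parallel selection with virtual loss. It assumes PUCT selection over a root's children. Child, puct_select, collect_batch, and backup_batch are hypothetical names, not the repo's MCTS implementation. c_puct=1.5, virtual_loss=1.0, and the leaf values are arbitrary.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float
    visits: int = 0
    value_sum: float = 0.0

    def q(self) -> float:
        # Mean value; unvisited children default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(children: list[Child], c_puct: float = 1.5) -> int:
    # PUCT score Q + c * P * sqrt(N_parent) / (1 + N_child). Using N_parent + 1
    # is a common variant that lets priors break ties before any visits.
    sqrt_parent = math.sqrt(sum(ch.visits for ch in children) + 1)
    scores = [
        ch.q() + c_puct * ch.prior * sqrt_parent / (1 + ch.visits) for ch in children
    ]
    return max(range(len(children)), key=lambda i: scores[i])


def collect_batch(
    children: list[Child], batch_size: int, virtual_loss: float = 1.0
) -> list[int]:
    picked = []
    for _ in range(batch_size):
        i = puct_select(children)
        # Virtual loss: count a pending visit and pretend it lost, so later
        # selections in the same batch are steered toward other children.
        children[i].visits += 1
        children[i].value_sum -= virtual_loss
        picked.append(i)
    return picked


def backup_batch(
    children: list[Child], picked: list[int], values: list[float], virtual_loss: float = 1.0
) -> None:
    for i, v in zip(picked, values):
        # Undo the virtual loss and record the real value; the pending visit
        # added during selection now counts as a real one.
        children[i].value_sum += virtual_loss + v


root = [Child(prior=0.5), Child(prior=0.3), Child(prior=0.2)]
picked = collect_batch(root, batch_size=3)
# In a real system the picked leaves would go to the network in one call;
# here the leaf values are made up for illustration.
backup_batch(root, picked, values=[0.2, -0.1, 0.5])
```

Without the virtual loss, all three selections in this toy would land on the highest-prior child. With it, the batch spreads across children. Because later picks in a batch see pessimistic placeholder statistics rather than real values, the result only approximates what sequential search would have chosen.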

Testing

Full suite: 143 passed.
New tests: the CLI threads --mcts-batch-size into self_play_mcts_cfg (defaulting to 16); _self_play_cfg passes batch_size through; the Modal train_remote/entrypoint forwards and logs it.
End-to-end: a real 1-iteration tic-tac-toe training run with --mcts-batch-size 8 exits 0, generates examples, and reduces loss. The batched MCTS mechanics themselves are covered by test_mcts.py (from #9, "Batched / leaf-parallel MCTS inference (~3x net-eval throughput)").

Review focus

Is a default batch size of 16 for self-play reasonable, or would a more conservative default be preferable?
The self-play-batched / eval-sequential split.

Closes alphago-0oz.

Self-play built its MCTS config with only num_simulations + dirichlet_eps, so MCTS fell back to batch_size=1 (sequential) and #9's leaf-parallel speedup never reached real training (the 5 Connect Four robustness runs provably ran sequentially). Add a --mcts-batch-size knob (default 16) to the arena training CLI and an mcts_batch_size param to the Modal entrypoint, threaded into self_play_mcts_cfg and logged to wandb. Only self-play is batched: the evaluation player (MCTSPlayer) ignores batch_size, so gating/Elo/ladder stay on exact sequential search. Set --mcts-batch-size 1 to restore fully sequential self-play. alphago-0oz

cweill merged commit eeed986 into main May 26, 2026
2 checks passed

cweill merged 1 commit into main from feature/batched-mcts-training
