Wire batched MCTS into training self-play #17
Merged
Self-play built its MCTS config with only num_simulations + dirichlet_eps, so MCTS fell back to batch_size=1 (sequential) and #9's leaf-parallel speedup never reached real training (the 5 Connect Four robustness runs provably ran sequential). Add a --mcts-batch-size knob (default 16) to the arena training CLI and an mcts_batch_size param to the Modal entrypoint, threaded into self_play_mcts_cfg and logged to wandb. Only self-play is batched: the evaluation player (MCTSPlayer) ignores batch_size, so gating/Elo/ladder stay on exact sequential search. Set --mcts-batch-size 1 to restore fully sequential self-play. alphago-0oz
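The threading described above can be pictured with a minimal sketch. This is not the code in arena.py: the helper names `build_parser` and `build_self_play_mcts_cfg`, the `--num-simulations` and `--dirichlet-eps` flags, and their default values are assumptions made for illustration. Only `--mcts-batch-size` (default 16) and the `batch_size` config key come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the arena training CLI parser."""
    parser = argparse.ArgumentParser(description="arena training (illustrative sketch)")
    # Assumed flags and defaults; the real CLI may name or default these differently.
    parser.add_argument("--num-simulations", type=int, default=100)
    parser.add_argument("--dirichlet-eps", type=float, default=0.25)
    # The knob added by this PR: 16 by default, 1 means fully sequential search.
    parser.add_argument(
        "--mcts-batch-size",
        type=int,
        default=16,
        help="leaves evaluated per network call during self-play (1 = sequential)",
    )
    return parser


def build_self_play_mcts_cfg(args: argparse.Namespace) -> dict:
    """Build the self-play MCTS config, now including batch_size."""
    if args.mcts_batch_size < 1:
        raise ValueError("--mcts-batch-size must be >= 1")
    return {
        "num_simulations": args.num_simulations,
        "dirichlet_eps": args.dirichlet_eps,
        # Before this change the key was absent, so MCTS used batch_size=1.
        "batch_size": args.mcts_batch_size,
    }


if __name__ == "__main__":
    cfg = build_self_play_mcts_cfg(build_parser().parse_args())
    print(cfg)
```

Under these assumptions, running with no flags yields a config with `batch_size` set to 16, and passing `--mcts-batch-size 1` yields the sequential configuration that self-play used before this change.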
Summary
Training self-play built its MCTS config as {num_simulations, dirichlet_eps}, so MCTS fell back to batch_size=1 (sequential) and the leaf-parallel batched MCTS from #9, measured at ~2.8–4.2× in the benchmark, never reached real training. (The 5 Connect Four robustness runs provably ran sequential: their wandb configs logged no batch size.) This threads a self-play MCTS batch-size knob through both training entry points.
Changes
arena.py training CLI: new --mcts-batch-size (default 16), added to self_play_mcts_cfg.
modal_app.py: new mcts_batch_size param (default 16) on train_remote/main, added to self_play_mcts_cfg and logged in the wandb run config.
MCTSPlayer (gating/ladder) has no batch_size parameter, so evaluation stays on exact sequential search even though it reads the same cfg.
Design note
Leaf-parallel MCTS with virtual loss approximates sequential search, trading some search fidelity for speed. The default of 16 captures most of the speedup for data generation; --mcts-batch-size 1 restores fully sequential self-play. Keeping eval sequential preserves gating/Elo fidelity, where accuracy matters more than throughput.
Testing
--mcts-batch-size into self_play_mcts_cfg (and defaults to 16); _self_play_cfg passes batch_size through; Modal train_remote/entrypoint forward and log it.
--mcts-batch-size 8 runs (exit 0), generates examples, and reduces loss.
Batched MCTS mechanics themselves are covered by test_mcts.py (#9, "Batched / leaf-parallel MCTS inference (~3x net-eval throughput)").
Review focus
Closes alphago-0oz.