Add move-and-shoot training with curriculum learning #4
Merged
Enables training the robot to shoot while moving along a path through the alliance zone. The ball inherits the robot's velocity at launch for realistic physics. An automatic curriculum ramps speed as the agent improves: stationary → slow (0.5-1.0 m/s) → medium (1.0-3.0 m/s) → fast (3.0-5.0 m/s).

New modules:
- src/paths/: Path generation with zone boundary and hub avoidance
- src/callbacks/: Curriculum callback for automatic difficulty progression
- src/physics/projectile.py: Full 3D Euler integration with robot velocity
- 4D observation space [distance, bearing, vx, vy] when enabled

Activated via the --move-and-shoot flag (continuous env only). Also adds a --resume flag for continuing training from checkpoints.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
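The commit message above says the ball inherits the robot's velocity at launch and that the trajectory is integrated in 3D with Euler steps. A minimal plain-Python sketch of that idea follows. It is not the code in src/physics/projectile.py: the function name `simulate_moving_shot`, the (speed, pitch, yaw) launch parameterization, the 0.5 m release height, and the quadratic drag form are all assumptions for illustration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def simulate_moving_shot(launch_speed, pitch, yaw, robot_vx, robot_vy,
                         release_height=0.5, drag_k=0.0, dt=0.001, max_time=5.0):
    """Hypothetical sketch: integrate a ball launched from a moving robot with explicit Euler.

    pitch is measured up from horizontal and yaw counter-clockwise from +x,
    both in radians. Returns (x, y, z) points until the ball drops below
    z = 0 or max_time elapses. release_height and drag_k are assumed values.
    """
    # Muzzle velocity relative to the robot, expressed in the field frame.
    horizontal = launch_speed * math.cos(pitch)
    vx = horizontal * math.cos(yaw) + robot_vx  # ball inherits robot velocity
    vy = horizontal * math.sin(yaw) + robot_vy
    vz = launch_speed * math.sin(pitch)          # robot moves in the plane only

    x, y, z, t = 0.0, 0.0, release_height, 0.0
    points = [(x, y, z)]
    while z >= 0.0 and t < max_time:
        speed = math.sqrt(vx * vx + vy * vy + vz * vz)
        # Assumed quadratic drag a = -k|v|v; drag_k = 0 turns air resistance off.
        ax = -drag_k * speed * vx
        ay = -drag_k * speed * vy
        az = -G - drag_k * speed * vz
        # Explicit Euler: step position with the current velocity, then update velocity.
        x, y, z = x + vx * dt, y + vy * dt, z + vz * dt
        vx, vy, vz = vx + ax * dt, vy + ay * dt, vz + az * dt
        t += dt
        points.append((x, y, z))
    return points
```

With `robot_vx = robot_vy = 0` the sketch reduces to a stationary shot, which makes a convenient consistency check. Explicit Euler is only first-order accurate, so the step size matters; 1 ms keeps a 10 m/s, 45° shot within about a centimetre of the closed-form range v²·sin(2θ)/g.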
…tic divergence

- Use LoggingSAC instead of plain SAC so gradient norms are actually logged to TensorBoard (actor_grad_norm, critic_grad_norm, and their _max variants)
- Fix an AttributeError in LoggingSAC on replay_data.discounts (the attribute doesn't exist in SB3's ReplayBufferSamples)
- Add gradient clipping (max_norm=1.0) on actor and critic to prevent gradient explosion in move-and-shoot training
- Set target_entropy=-6.0 (2x the default) since the optimal policy is nearly deterministic; the default target drives entropy back up after convergence
- Reduce batch_size 4096->256 and gradient_steps 4->1 to fix critic Q-value divergence caused by over-updating on small replay buffers
- Bump learning_starts 500->1000 for better initial buffer coverage
- Update curriculum levels: the first level is now "crawl" (0.1-0.5 m/s) instead of stationary, with adjusted speed ranges
- Clear the replay buffer on curriculum advancement so stale transitions don't poison Q-value estimates

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
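This commit replaces the stationary starting level with a 0.1-0.5 m/s "crawl" level and clears the replay buffer whenever the curriculum advances. The sketch below shows one way those two pieces can fit together. It is not the PR's callback: the class names, the hit-rate threshold (0.8) and the 100-episode window are hypothetical, and the ranges after "crawl" are placeholders copied from the first commit, because the adjusted ranges are not listed here.

```python
import random
from collections import deque

# Speed ranges in m/s. Only "crawl" is stated in this commit; the others are
# placeholders from the first commit and may differ from the adjusted values.
LEVELS = [
    ("crawl", 0.1, 0.5),
    ("slow", 0.5, 1.0),
    ("medium", 1.0, 3.0),
    ("fast", 3.0, 5.0),
]


class ReplayBufferStub:
    """Hypothetical stand-in for an off-policy replay buffer with a reset()."""

    def __init__(self):
        self.transitions = []

    def add(self, transition):
        self.transitions.append(transition)

    def reset(self):
        self.transitions.clear()


class SpeedCurriculum:
    """Advance to the next speed level once the recent hit rate clears a threshold."""

    def __init__(self, buffer, threshold=0.8, window=100):  # hypothetical values
        self.buffer = buffer
        self.threshold = threshold
        self.results = deque(maxlen=window)
        self.level = 0

    def speed_range(self):
        _, lo, hi = LEVELS[self.level]
        return lo, hi

    def sample_speed(self, rng=random):
        # Robot speed for the next episode, drawn uniformly from the current level.
        lo, hi = self.speed_range()
        return rng.uniform(lo, hi)

    def record_episode(self, hit):
        """Log one episode outcome; return True if the curriculum advanced."""
        self.results.append(1.0 if hit else 0.0)
        window_full = len(self.results) == self.results.maxlen
        hit_rate = sum(self.results) / len(self.results)
        if window_full and hit_rate >= self.threshold and self.level < len(LEVELS) - 1:
            self.level += 1
            self.results.clear()
            # Transitions gathered at the old speed describe different dynamics;
            # dropping them keeps stale targets out of the critic's Q-estimates.
            self.buffer.reset()
            return True
        return False
```

In a Stable-Baselines3 setup, logic like this would typically live in a `BaseCallback` subclass (the PR keeps its callback under src/callbacks/). The environment would read the current speed range when it samples a path, and the reset would be applied to the model's replay buffer.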
…otes

- Add move-and-shoot training commands and the --resume flag to README
- Document curriculum learning levels and thresholds
- Add move-and-shoot results table (with and without air resistance)
- Document SAC hyperparameters and rationale (batch_size, target_entropy, etc.)
- Update project structure to include sac_logging, paths, and callbacks
- Update CLAUDE.md with LoggingSAC, gradient clipping, and curriculum details

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
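The SAC settings documented in these commits can be collected in one place, as sketched below. The keys are real Stable-Baselines3 `SAC` constructor parameters and the values come from the commit messages; the dict name and the `clip_by_global_norm` helper are hypothetical. The helper shows the global-norm rule behind `max_norm=1.0` on flat floats; PyTorch's `torch.nn.utils.clip_grad_norm_` applies the same rule to parameter tensors, up to a small numerical epsilon. The sketch does not show where LoggingSAC actually applies the clipping.

```python
import math

# Values from the commit messages; keys are Stable-Baselines3 SAC arguments.
SAC_KWARGS = {
    "batch_size": 256,        # was 4096: large batches over-updated a small buffer
    "gradient_steps": 1,      # was 4: fewer updates per env step curbs Q divergence
    "learning_starts": 1000,  # was 500: more initial buffer coverage before updates
    "target_entropy": -6.0,   # described as 2x the default; policy is near-deterministic
}
GRAD_CLIP_MAX_NORM = 1.0  # applied to both actor and critic gradients


def clip_by_global_norm(grads, max_norm=GRAD_CLIP_MAX_NORM):
    """Hypothetical helper: rescale gradients so their joint L2 norm is at most max_norm.

    Returns (clipped_grads, norm_before_clipping).
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads), total
    scale = max_norm / total  # one factor for all entries preserves the direction
    return [g * scale for g in grads], total
```

With Stable-Baselines3 these would be passed as `SAC("MlpPolicy", env, **SAC_KWARGS)`. The PR uses its own LoggingSAC subclass, which is assumed here to accept the same arguments. Clipping by the global norm rather than per element keeps the update direction intact and only shortens the step.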
Summary
- --move-and-shoot flag (continuous env only), backward compatible with existing training
- --resume flag for continuing training from checkpoints

New modules
- src/paths/
- src/callbacks/

Key changes
- src/physics/projectile.py: compute_trajectory_3d_moving(), full 3D integration with robot velocity
- src/env/shooter_env_continuous.py: 4D observation [dist, bearing, vx, vy], path-based episodes
- scripts/train.py: --move-and-shoot and --resume flags, curriculum callback wiring
- scripts/evaluate.py: --air-resistance and --move-and-shoot passthrough

Training results (74K steps)
Test plan
- … (pytest tests/ -v)
- check_env passes for both stationary and moving modes
- … compute_trajectory_3d results

🤖 Generated with Claude Code