N-Body Astrodynamics Reinforcement Learning Environment in Julia
Jove is a high-performance, physics-accurate reinforcement learning environment for autonomous gravity assist trajectory optimization. A Proximal Policy Optimization (PPO) agent learns to pilot a spacecraft through a live N-Body gravitational simulation, exploiting planetary flybys to reach Jupiter with minimal fuel expenditure.
The gravitational dynamics of five bodies (Sun, Earth, Mars, Jupiter, Spacecraft) are modeled symbolically using ModelingToolkit.jl and integrated with OrdinaryDiffEq.jl.
- Stiff ODE Integration: The Rosenbrock23 solver handles the stiff, tightly-coupled N-Body system reliably across the full range of agent-generated control inputs.
- Symbolic Softening: A gravitational softening term (+ 1e6) prevents singularities as bodies approach one another.
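To illustrate what a softening term does, here is a minimal sketch in plain Julia. It assumes the constant is added to the squared separation before the inverse-cube factor is formed; where the term sits in Jove's symbolic equations, and in which units, is not stated here. The function name `softened_accel` and its keyword arguments are hypothetical.

```julia
# Softened Newtonian acceleration in 2D (plain Julia, no packages needed).
# ASSUMPTION: the softening constant is added to the squared distance.
# Jove's actual equations may place or scale it differently.
function softened_accel(r_i, r_j, m_j; G = 6.67430e-11, softening = 1e6)
    dx = r_j[1] - r_i[1]
    dy = r_j[2] - r_i[2]
    d2 = dx^2 + dy^2 + softening        # strictly positive, even at zero separation
    inv_d3 = 1 / (d2 * sqrt(d2))        # 1 / d^3 with the softened distance
    return (G * m_j * dx * inv_d3, G * m_j * dy * inv_d3)
end

m_sun = 1.989e30                        # kg

# Coincident bodies: the result stays finite instead of becoming NaN or Inf.
a_same = softened_accel((0.0, 0.0), (0.0, 0.0), m_sun)

# At 1 AU the softening is negligible next to the squared distance (~2.2e22 m^2).
a_far = softened_accel((1.496e11, 0.0), (0.0, 0.0), m_sun)
```

Without the softening term, the first call would divide by zero. With it, the acceleration is bounded for every separation, which keeps the integrator from failing when the agent steers the spacecraft very close to a body.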
Injecting agent-commanded thrust into a live SciML integrator is non-trivial. Stiff solvers such as Rosenbrock23 use AutoForwardDiff by default, which maintains caches of Dual numbers for Jacobian evaluation. Mutating MTKParameters mid-integration (via integrator.ps[...] or setp(integrator, ...)) collides with these caches and raises a MethodError.
Jove resolves this with a two-part strategy:
- remake + solve per RL step: Rather than advancing a persistent integrator, each environment step issues a fresh solve call over the [t, t+dt] window. Each solve constructs its own cache from scratch, so parameters changed between steps never touch a live solver cache.
- Compiled setp setters: Parameter mutation uses setters compiled once at environment initialization time via ModelingToolkit.setp. These are called at O(1) cost on every step, with no symbolic dictionary lookups.
# Compiled once at startup:
set_tx = ModelingToolkit.setp(sys, sys.thrust_x)
set_ty = ModelingToolkit.setp(sys, sys.thrust_y)
# Called every environment step at O(1):
env.set_tx(env.prob, thrust_x)
env.set_ty(env.prob, thrust_y)
mini_prob = remake(env.prob; u0=env.current_u, tspan=(t, t+dt))
sol = solve(mini_prob, Rosenbrock23(); save_everystep=false, ...)

ModelingToolkit.structural_simplify can reorder state variables, so raw integer offsets (e.g., u[N + i]) will silently read the wrong quantities after simplification. Jove resolves the canonical position of every body's x, y, vx, vy at startup using findfirst over unknowns(sys) and stores these indices explicitly:

x_idx[i] = findfirst(v -> isequal(v, sys.x[i]), unknowns(sys))

All state reads, including collision detection, reward computation, and observation extraction, use these resolved indices exclusively.
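The index-resolution pattern can be tried without ModelingToolkit. The sketch below is a stand-in, not Jove's code: plain Julia Symbols replace the symbolic variables, and the hypothetical vector `simplified_order` plays the role of unknowns(sys) after simplification.

```julia
# Stand-in for unknowns(sys): a state order that does not match declaration order.
# ASSUMPTION: Symbols are used here in place of ModelingToolkit symbolic variables.
simplified_order = [:vx_2, :x_1, :y_2, :vy_1, :x_2, :vx_1, :y_1, :vy_2]

# Look up where a named state ended up; returns `nothing` if it is absent.
resolve_index(name, order) = findfirst(v -> isequal(v, name), order)

n_bodies = 2
x_idx  = [resolve_index(Symbol("x_$i"),  simplified_order) for i in 1:n_bodies]
y_idx  = [resolve_index(Symbol("y_$i"),  simplified_order) for i in 1:n_bodies]
vx_idx = [resolve_index(Symbol("vx_$i"), simplified_order) for i in 1:n_bodies]
vy_idx = [resolve_index(Symbol("vy_$i"), simplified_order) for i in 1:n_bodies]

# Fail fast at startup if any expected state is missing.
@assert all(!isnothing, vcat(x_idx, y_idx, vx_idx, vy_idx))

# Read the position of body 1 through resolved indices, never raw offsets.
u = collect(10.0:10.0:80.0)             # dummy state vector with 8 entries
pos_body1 = (u[x_idx[1]], u[y_idx[1]])
```

Resolving once and asserting that every lookup succeeded turns a silent misread into an immediate startup error.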
The agent is a custom PPO implementation built directly on Flux.jl and Optimisers.jl, with no dependency on an external RL algorithm library. It is decoupled from ReinforcementLearning.jl internals to remain stable across library versions.
- Actor-Critic network with shared trunk and separate policy/value heads
- Clipped surrogate objective (PPO-Clip)
- Generalized Advantage Estimation (GAE)
- Entropy bonus for sustained exploration
- PPOBuffer: Plain Julia Vectors, with no dependency on RL.jl trajectory types
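Two of the components listed above, GAE and the clipped surrogate objective, fit in a few lines of plain Julia. This is a minimal sketch of the standard formulations operating on ordinary Vectors, not the code in policy.jl. The function names, argument layout, and default hyperparameters (gamma = 0.99, lambda = 0.95, clip_eps = 0.2) are assumptions chosen for illustration.

```julia
# Generalized Advantage Estimation over one rollout.
# `values` holds one more entry than `rewards`: the bootstrap value of the last state.
function gae_advantages(rewards::AbstractVector, values::AbstractVector,
                        dones::AbstractVector{Bool}; gamma = 0.99, lambda = 0.95)
    T = length(rewards)
    @assert length(values) == T + 1 && length(dones) == T
    adv = zeros(T)
    running = 0.0
    for t in T:-1:1
        not_done = dones[t] ? 0.0 : 1.0   # stop bootstrapping at episode ends
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        running = delta + gamma * lambda * not_done * running
        adv[t] = running
    end
    return adv
end

# PPO-Clip surrogate, returned as a loss to minimize (negated objective).
function ppo_clip_loss(logp_new, logp_old, adv; clip_eps = 0.2)
    ratio = exp.(logp_new .- logp_old)                  # pi_new / pi_old
    clipped = clamp.(ratio, 1 - clip_eps, 1 + clip_eps)
    return -sum(min.(ratio .* adv, clipped .* adv)) / length(adv)
end

# Usage on a three-step episode with zero value estimates.
adv = gae_advantages([1.0, 1.0, 1.0], zeros(4), [false, false, true];
                     gamma = 1.0, lambda = 1.0)
loss = ppo_clip_loss([1.0], [0.0], [1.0])   # ratio = e, clipped to 1.2
```

With gamma = lambda = 1 and zero values, each advantage reduces to the sum of the remaining rewards, which makes the backward recursion easy to check by hand.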
Jove/
├── data/
│ └── ephemeris/ # NASA Horizons orbital state vectors
├── scripts/
│ ├── train.jl # 1M-step custom PPO training loop
│ └── evaluate.jl # Trajectory evaluation and visualization
├── src/
│ ├── Physics/
│ │ ├── equations.jl # Symbolic N-Body ODESystem (ModelingToolkit)
│ │ └── constants.jl # Physical constants
│ ├── Environment/
│ │ ├── wrapper.jl # GravityAssistEnv: ODE step, control injection
│ │ ├── states.jl # Observation extraction and normalization
│ │ └── reward.jl # Reward shaping and termination conditions
│ └── Agent/
│ └── policy.jl # JovePPOPolicy, PPOBuffer, jove_update!
├── test/ # Unit tests (energy conservation, RL API)
├── Project.toml
└── LICENSE
Requirements: Julia 1.10+
julia --project=. -e 'using Pkg; Pkg.instantiate()'

julia --project=. --threads auto scripts/train.jl

The training loop runs for 1,000,000 environment steps. Episode rewards are printed every 100 episodes. A trained agent checkpoint is saved to data/ upon completion.
| Package | Role |
|---|---|
| ModelingToolkit.jl | Symbolic N-Body ODE construction |
| OrdinaryDiffEq.jl | Stiff ODE integration (Rosenbrock23) |
| Flux.jl | Actor-Critic neural network |
| Optimisers.jl | Adam optimizer for PPO updates |
| ReinforcementLearning.jl | AbstractEnv / AbstractPolicy interfaces |
MIT License — Copyright (c) 2026 Utku Yılmaz. See LICENSE for details.