utkuyilmaz1903/Jove

Jove

N-Body Astrodynamics Reinforcement Learning Environment in Julia

Jove is a high-performance, physics-accurate reinforcement learning environment for autonomous gravity assist trajectory optimization. A Proximal Policy Optimization (PPO) agent learns to pilot a spacecraft through a live N-Body gravitational simulation, exploiting planetary flybys to reach Jupiter with minimal fuel expenditure.


Architecture

Physics Engine — src/Physics/

The gravitational dynamics of five bodies (Sun, Earth, Mars, Jupiter, Spacecraft) are modeled symbolically using ModelingToolkit.jl and integrated with OrdinaryDiffEq.jl.

  • Stiff ODE Integration: The Rosenbrock23 solver handles the stiff, tightly coupled N-Body system reliably across the full range of agent-generated control inputs.
  • Symbolic Softening: A gravitational softening term (+ 1e6) prevents singularities as bodies approach one another.
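
As a rough illustration of the softening idea, here is a minimal plain-Julia sketch that does not use ModelingToolkit. It assumes the 1e6 term is added to the squared separation before the inverse-cube factor is formed (Plummer-style softening). The function name softened_accel, the SI units, and the exact place where the term enters Jove's symbolic equations are assumptions, not taken from the source.

```julia
# Hypothetical helper, not Jove's actual code: 2D acceleration on body i
# caused by body j, with a softening term eps2 added to the squared distance.
function softened_accel(pos_i, pos_j, m_j; G = 6.67430e-11, eps2 = 1e6)
    dx = pos_j[1] - pos_i[1]
    dy = pos_j[2] - pos_i[2]
    r2 = dx^2 + dy^2 + eps2        # softened squared distance, never zero
    inv_r3 = r2^(-1.5)             # 1 / r^3 using the softened distance
    return (G * m_j * dx * inv_r3, G * m_j * dy * inv_r3)
end

m_sun = 1.989e30  # kg

# Coincident bodies yield a finite result instead of a division by zero.
a_same = softened_accel((0.0, 0.0), (0.0, 0.0), m_sun)

# At 1 AU the softening term is negligible compared with r^2.
a_far = softened_accel((1.496e11, 0.0), (0.0, 0.0), m_sun)
```

With these inputs the acceleration at 1 AU points toward the origin and matches the unsoftened inverse-square value to well within floating-point tolerance, so under this assumption the softening only matters at very small separations.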

Cache-Safe Continuous Control Injection — src/Environment/wrapper.jl

Injecting agent-commanded thrust into a live SciML integrator is non-trivial. Stiff solvers such as Rosenbrock23 default to forward-mode automatic differentiation (AutoForwardDiff) and keep caches of Dual numbers for Jacobian evaluation. Mutating MTKParameters mid-integration (via integrator.ps[...] or setp(integrator, ...)) collides with these caches and raises a MethodError.

Jove resolves this with a two-part strategy:

  1. remake + solve per RL step: Rather than advancing a persistent integrator, each environment step issues a fresh solve call over the [t, t+dt] window. Each solve builds its own cache from scratch, so parameter changes made between steps never touch a live solver cache.

  2. Compiled setp setters: Parameter mutation uses setters compiled once at environment initialization time via ModelingToolkit.setp. These are called at O(1) cost on every step, with no symbolic dictionary lookups.

# Compiled once at startup:
set_tx = ModelingToolkit.setp(sys, sys.thrust_x)
set_ty = ModelingToolkit.setp(sys, sys.thrust_y)

# Called every environment step at O(1):
env.set_tx(env.prob, thrust_x)
env.set_ty(env.prob, thrust_y)
mini_prob = remake(env.prob; u0=env.current_u, tspan=(t, t+dt))
sol = solve(mini_prob, Rosenbrock23(); save_everystep=false, ...)

Dynamic State Index Resolution

ModelingToolkit.structural_simplify may reorder state variables, so the layout of the simplified state vector is not guaranteed to match the order of declaration. Raw integer offsets (e.g., u[N + i]) can then silently read the wrong quantities. Jove resolves the position of every body's x, y, vx, vy at startup using findfirst over unknowns(sys) and stores these indices explicitly:

x_idx[i] = findfirst(v -> isequal(v, sys.x[i]), unknowns(sys))

All state reads — including collision detection, reward computation, and observation extraction — use these resolved indices exclusively.
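
The failure mode and the fix can be illustrated without ModelingToolkit. In the minimal sketch below, plain Symbols stand in for symbolic variables and a hand-shuffled vector stands in for unknowns(sys) after simplification. All names (declared, simplified, resolve) are hypothetical and exist only for illustration.

```julia
# Stand-in for the order in which a two-body system's states were declared.
declared = [:x1, :x2, :y1, :y2, :vx1, :vx2, :vy1, :vy2]

# Stand-in for unknowns(sys) after simplification: same variables, new order.
simplified = [:vx1, :x1, :vy2, :y1, :x2, :vy1, :y2, :vx2]

# State vector laid out in the simplified order. Each value encodes the
# variable's position in `declared` (x1 -> 10.0, x2 -> 20.0, and so on).
u = [10.0 * findfirst(==(s), declared) for s in simplified]

# Resolve each variable's index once, by identity rather than by offset.
resolve(vars, unknowns) = [findfirst(v -> isequal(v, s), unknowns) for s in vars]

x_idx = resolve([:x1, :x2], simplified)
y_idx = resolve([:y1, :y2], simplified)

x_by_lookup = u[x_idx]   # the x positions, wherever they ended up
x_by_offset = u[1:2]     # a raw offset reads vx1 and x1 instead
```

Here the raw offset returns a velocity in the first slot without raising any error, which is the kind of silent misread that the resolved indices are meant to rule out.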

PPO Agent — src/Agent/policy.jl

A custom PPO implementation built directly on Flux.jl and Optimisers.jl. It is decoupled from ReinforcementLearning.jl internals to remain stable across library versions.

  • Actor-Critic network with shared trunk and separate policy/value heads
  • Clipped surrogate objective (PPO-Clip)
  • Generalized Advantage Estimation (GAE)
  • Entropy bonus for sustained exploration
  • PPOBuffer: Plain Julia Vectors — no dependency on RL.jl trajectory types
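
Two of the components listed above, GAE and the clipped surrogate objective, can be sketched in plain Julia. This is a generic textbook-style sketch and not the contents of policy.jl. The function names gae and ppo_clip_loss, the keyword names, and the default hyperparameters are assumptions, and the entropy bonus and network code are left out.

```julia
using Statistics: mean

# Generalized Advantage Estimation over one trajectory of length T.
# `values` has length T + 1: the last entry is the bootstrap value of the
# state reached after the final step.
function gae(rewards, values, dones; gamma = 0.99, lambda = 0.95)
    T = length(rewards)
    adv = zeros(Float64, T)
    running = 0.0
    for t in T:-1:1
        nonterminal = dones[t] ? 0.0 : 1.0
        # One-step TD error; the bootstrap is dropped at episode ends.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        running = delta + gamma * lambda * nonterminal * running
        adv[t] = running
    end
    return adv
end

# Clipped surrogate loss (to be minimized) from new and old log-probabilities.
function ppo_clip_loss(logp_new, logp_old, adv; clip_eps = 0.2)
    ratio = exp.(logp_new .- logp_old)
    clipped = clamp.(ratio, 1 - clip_eps, 1 + clip_eps)
    return -mean(min.(ratio .* adv, clipped .* adv))
end

# Tiny example: two steps, zero value estimates, episode ends at step 2.
adv = gae([1.0, 1.0], [0.0, 0.0, 0.0], [false, true]; gamma = 1.0, lambda = 1.0)
loss_same = ppo_clip_loss([0.0, 0.0], [0.0, 0.0], adv)
loss_clipped = ppo_clip_loss([log(2.0)], [0.0], [1.0])
```

In the last call the probability ratio is 2, so with a positive advantage the clipped term caps the contribution at 1 + clip_eps, which is the mechanism that limits how far a single update can move the policy.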

Repository Structure

Jove/
├── data/
│   └── ephemeris/          # NASA Horizons orbital state vectors
├── scripts/
│   ├── train.jl            # 1M-step custom PPO training loop
│   └── evaluate.jl         # Trajectory evaluation and visualization
├── src/
│   ├── Physics/
│   │   ├── equations.jl    # Symbolic N-Body ODESystem (ModelingToolkit)
│   │   └── constants.jl    # Physical constants
│   ├── Environment/
│   │   ├── wrapper.jl      # GravityAssistEnv: ODE step, control injection
│   │   ├── states.jl       # Observation extraction and normalization
│   │   └── reward.jl       # Reward shaping and termination conditions
│   └── Agent/
│       └── policy.jl       # JovePPOPolicy, PPOBuffer, jove_update!
├── test/                   # Unit tests (energy conservation, RL API)
├── Project.toml
└── LICENSE

Getting Started

Requirements: Julia 1.10+

Installation

julia --project=. -e 'using Pkg; Pkg.instantiate()'

Training

julia --project=. --threads auto scripts/train.jl

The training loop runs for 1,000,000 environment steps. Episode rewards are printed every 100 episodes. A trained agent checkpoint is saved to data/ upon completion.


Key Dependencies

Package                     Role
ModelingToolkit.jl          Symbolic N-Body ODE construction
OrdinaryDiffEq.jl           Stiff ODE integration (Rosenbrock23)
Flux.jl                     Actor-Critic neural network
Optimisers.jl               Adam optimizer for PPO updates
ReinforcementLearning.jl    AbstractEnv / AbstractPolicy interfaces

License

MIT License — Copyright (c) 2026 Utku Yılmaz. See LICENSE for details.

About

A robust astrodynamics simulator engineered so our neural network can confidently run out of fuel and crash into the Sun 1,000,000 times without crashing the compiler.
