https://www.youtube.com/watch?v=yQw0JmvOVwM
3/2/2026: This repository contains the MPAIL2 algorithm implementation. Examples of IsaacLab or real-world workflows will be included later or in a separate repository.
```bash
git clone git@github.com:UWRobotLearning/mpail2.git
cd mpail2
conda create -n mp2 python=3.10
conda activate mp2
pip install -e .
```
Ant:
```bash
python train/train_mpail_gym.py  # defaults to Ant-v5
```
Humanoid with video:
```bash
python train/train_mpail_gym.py --env Humanoid-v5 --video
```
Hopper with video and wandb:
```bash
python train/train_mpail_gym.py --env Hopper-v5 --video --wandb True
```
- `runner.py`: Outermost loop. Steps the environment and calls `act()` on the learner.
- `learner.py`: Stores interactions, calls the planner, and updates the component models.
- `planner.py`: Performs online planning (MPPI) using the component models.
All loss computations and gradient updates are performed in `learner.py`; if you want to read the implementation, start there.
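The online planning step (MPPI) can be sketched as follows. This is a minimal, self-contained illustration in numpy, not the repo's actual `planner.py`: the `dynamics(z, a)` and `reward(z, z_next)` callables, dimensions, and hyperparameter names are all hypothetical stand-ins for the learned component models.

```python
import numpy as np

def mppi_plan(z0, dynamics, reward, horizon=5, num_samples=64,
              act_dim=2, temperature=1.0, seed=0):
    """One MPPI step: sample candidate action sequences, roll them out
    through the dynamics model, score them with the reward model, and
    return the exponentiated-return weighted average plan."""
    rng = np.random.default_rng(seed)
    # Candidate action sequences drawn from a Gaussian proposal.
    actions = rng.normal(size=(num_samples, horizon, act_dim))
    returns = np.zeros(num_samples)
    for i in range(num_samples):
        z = z0
        for t in range(horizon):
            z_next = dynamics(z, actions[i, t])
            returns[i] += reward(z, z_next)
            z = z_next
    # Softmax weights over returns (shifted for numerical stability).
    w = np.exp((returns - returns.max()) / temperature)
    w /= w.sum()
    # Weighted average over the candidate plans: shape (horizon, act_dim).
    return np.tensordot(w, actions, axes=1)
```

With a toy integrator dynamics `z_next = z + a` and reward `-||z_next||`, the returned plan is a `(horizon, act_dim)` array of actions that, on average, push the latent state toward the origin.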
The planner composes the component models shown in the figure above and discussed in Section 3 of the paper.
- `encoder.py`: expects `Dict[str, tensor]` observations
- `dynamics.py`: $f:\mathcal{Z} \times \mathcal{A}^{H} \rightarrow \mathcal{Z}^{H+1}$
- `reward.py`: $r:\mathcal{Z}\times\mathcal{Z}\rightarrow \mathbb{R}$
- `value.py`: ensembled, $Q:\mathcal{Z}\times\mathcal{A}\rightarrow \mathbb{R}$
- `sampling.py` (policy): composes the policy $\pi(\mathbf{a}_{t:t+H}|z)$. Samples plans from the policy and from a Gaussian fitted around the previous plan.
Except for `sampling.py`, these files primarily define `torch.nn.Module` subclasses whose `forward` methods mirror the mathematical signatures above.
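As an illustration of that pattern, here is a minimal torch-style sketch of a latent dynamics model with the signature $f:\mathcal{Z} \times \mathcal{A}^{H} \rightarrow \mathcal{Z}^{H+1}$. The class name, layer sizes, and activation are hypothetical; the repo's actual `dynamics.py` architecture may differ.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Sketch of f: Z x A^H -> Z^{H+1}: applies a one-step latent
    transition H times and returns all H+1 latents (hypothetical sizes)."""
    def __init__(self, z_dim=8, a_dim=2, hidden=32):
        super().__init__()
        self.step = nn.Sequential(
            nn.Linear(z_dim + a_dim, hidden), nn.ELU(),
            nn.Linear(hidden, z_dim),
        )

    def forward(self, z, actions):
        # z: (B, z_dim); actions: (B, H, a_dim) -> (B, H+1, z_dim)
        latents = [z]
        for t in range(actions.shape[1]):
            z = self.step(torch.cat([z, actions[:, t]], dim=-1))
            latents.append(z)
        return torch.stack(latents, dim=1)
```

Calling the module on a batch of latents and an H-step action sequence yields an (H+1)-step latent trajectory, matching the signature above.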
`layers.py`, `storage.py`
