Observation norm for RL problems with early stopping via Welford's parallel algorithm #98
Open
keraJLi wants to merge 3 commits into RobertTLange:main from
Conversation
Author: Before this is merged, we should update the RL example notebook to handle the missing rollout states for brax.
Author: I have updated the notebook, which now explicitly rolls out the search distribution mean. This did not create any notable extra complexity.
This addresses #97, which was prematurely closed by @maxencefaldor.
Switched to Welford's parallel algorithm for merging the statistics collected during policy rollouts into the global observation-normalization statistics. This way, we don't have to run every rollout for the maximum number of episode steps, which can be very expensive.
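For reference, a minimal sketch of the merge step (the Chan et al. parallel variant of Welford's algorithm); the function and argument names here are illustrative, not the ones used in this PR:

```python
import jax.numpy as jnp

def combine_welford(count_a, mean_a, m2_a, count_b, mean_b, m2_b):
    """Merge two sets of running statistics (count, mean, M2) in O(1).

    M2 is the sum of squared deviations from the mean, so the
    variance of the merged set is m2 / count.
    """
    count = count_a + count_b
    delta = mean_b - mean_a
    mean = mean_a + delta * count_b / count
    m2 = m2_a + m2_b + delta**2 * count_a * count_b / count
    return count, mean, m2
```

Because each rollout only contributes its own (count, mean, M2) triple, episodes that terminate early contribute exactly the steps they actually took, with no padding to the maximum episode length.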
This does not increase implementation complexity: the change amounts to +6 lines in total, even though my code is very verbose.
I have tested both GymnaxProblem and BraxProblem by running ES training and calculating the statistics of the normalized rewards.
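As an illustration (not the actual test code), a check of this kind could assert that the normalized quantities are roughly standardized; `normalized` and `roughly_standardized` are hypothetical names:

```python
import jax.numpy as jnp

def roughly_standardized(normalized, tol=0.1):
    """Hypothetical check: a well-normalized array should have mean ~0
    and standard deviation ~1 along the batch axis."""
    mean = jnp.mean(normalized, axis=0)
    std = jnp.std(normalized, axis=0)
    return bool(jnp.all(jnp.abs(mean) < tol) & jnp.all(jnp.abs(std - 1.0) < tol))
```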
Caveat: we don't return all env states of the complete rollout. Returning them might be nice for visualization, but we would then also have to return state validity and mask out invalid states, which is not currently implemented and could easily be forgotten by users of the library. I suggest rolling out policies again for visualization, which is orders of magnitude cheaper than running a single generation.
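A re-rollout for visualization could look roughly like the following sketch, which uses the public brax envs API with a dummy zero-action policy standing in for the trained policy evaluated at the search distribution mean:

```python
import jax
import jax.numpy as jnp
from brax import envs

# Sketch of a cheap re-rollout for visualization (assumes brax is installed).
env = envs.create("ant")
state = env.reset(jax.random.PRNGKey(0))
rollout = [state]
for _ in range(1000):
    # Placeholder action; substitute the trained policy applied to the
    # search distribution mean, e.g. policy.apply(mean_params, state.obs).
    action = jnp.zeros(env.action_size)
    state = env.step(state, action)
    rollout.append(state)  # keep every state for rendering
    if state.done:
        break
```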