Observation norm for RL problems with early stopping via Welford's parallel algorithm #98

Open
keraJLi wants to merge 3 commits into RobertTLange:main from keraJLi:main


@keraJLi keraJLi commented Nov 2, 2025

This addresses #97, which was prematurely closed by @maxencefaldor.

This PR switches to Welford's parallel algorithm for merging the statistics collected during policy rollouts into the global observation normalization statistics. As a result, a rollout no longer has to run for the maximum number of episode steps, which can be very expensive.
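
For readers unfamiliar with the technique, here is a minimal sketch of the pairwise merge step (the parallel variant of Welford's algorithm, often attributed to Chan et al.). It is plain Python on scalars for readability and is not the code in this PR; the names `RunningStats`, `batch_stats`, `combine`, and `variance` are hypothetical, and a real implementation would presumably operate on JAX arrays per observation dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningStats:
    """Hypothetical container: sample count, mean, and sum of squared deviations."""

    count: float
    mean: float
    m2: float


def batch_stats(values):
    """Statistics of one batch, e.g. the valid observations of one rollout."""
    n = len(values)
    if n == 0:
        return RunningStats(0.0, 0.0, 0.0)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return RunningStats(float(n), mean, m2)


def combine(a, b):
    """Merge two sets of statistics without revisiting the underlying samples."""
    count = a.count + b.count
    if count == 0:
        return a
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta**2 * a.count * b.count / count
    return RunningStats(count, mean, m2)


def variance(stats):
    """Population variance; returns 0.0 when no samples have been seen."""
    return stats.m2 / stats.count if stats.count > 0 else 0.0
```

Because `combine` only needs the count, mean, and `m2` of each side, rollouts of different lengths can contribute exactly the samples they produced, which is what makes early stopping compatible with global normalization statistics.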

This does not increase implementation complexity:

  • gymnax.py: 63 insertions(+), 61 deletions(-)
  • brax.py: 68 insertions(+), 64 deletions(-)

So +6 lines total, even though my code is very verbose.

I have tested both GymnaxProblem and BraxProblem by running ES training and calculating the statistics of the normalized rewards.

Caveat: we don't return all env states of the complete rollout. Doing this might be nice for visualization, but then we would also have to return state validity and mask out invalid states, which is not currently implemented and might easily be forgotten by users of the library. I suggest rolling out policies again for visualization, which is orders of magnitude cheaper than running a single generation.
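
To illustrate what "returning state validity and masking out invalid states" would involve, here is a rough sketch in plain Python. It is an assumption about how such a mask could be built, not something implemented in this PR: `validity_mask` and `mask_states` are hypothetical helpers, and the convention that the terminal step itself counts as valid is also assumed.

```python
def validity_mask(dones):
    """Return True for every step up to and including the first terminal step.

    Assumption: `dones` is a per-step sequence of booleans from a fixed-length
    rollout buffer, and steps after the first `done` hold stale data.
    """
    mask = []
    alive = True
    for done in dones:
        mask.append(alive)
        if done:
            alive = False
    return mask


def mask_states(states, dones):
    """Keep only the states that belong to the episode proper."""
    return [s for s, valid in zip(states, validity_mask(dones)) if valid]
```

A caller that forgets to apply such a mask would silently visualize or aggregate stale states, which is the risk described above.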


keraJLi commented Nov 2, 2025

Before this is merged, we should update the RL example notebook to handle the missing rollout states for brax.

@keraJLi keraJLi marked this pull request as draft November 2, 2025 21:42
@keraJLi keraJLi marked this pull request as ready for review November 2, 2025 22:27

keraJLi commented Nov 2, 2025

I have updated the notebook, which now explicitly rolls out the search distribution mean. This did not create any notable extra complexity.
