Issue Description:
There appears to be a bug in the NormalizeRewardsByEnv class related to operator precedence in the z-score normalization calculation.
Current Code:
normed = ((r-r.mean())/r.std()+1e-8) if self.z_score else r-r.mean()
Issue:
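Because division binds more tightly than addition, the current expression computes (r-r.mean())/r.std() first and then adds the epsilon value (1e-8) to the result. The epsilon therefore only shifts every normalized value by 1e-8; it is never added to the standard deviation, so it does nothing to prevent division by zero.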
Expected Fix:
normed = ((r-r.mean())/(r.std()+1e-8)) if self.z_score else r-r.mean()
Explanation:
The epsilon should be added to r.std() before division to handle cases where the standard deviation is zero or very close to zero, which would otherwise cause division by zero errors or numerical instability.
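Reproduction sketch:
The difference can be seen with a constant reward vector, whose standard deviation is exactly zero. This is only a minimal stdlib sketch, not the class's actual code: the helper names z_score_buggy and z_score_fixed are hypothetical, plain lists stand in for the array r, and statistics.pstdev (population standard deviation) is an assumption about what r.std() computes.

```python
from statistics import fmean, pstdev

EPS = 1e-8  # same epsilon as in the issue


def z_score_buggy(rewards):
    """Mirrors the current expression: epsilon is added AFTER the division."""
    mean, std = fmean(rewards), pstdev(rewards)
    return [(x - mean) / std + EPS for x in rewards]


def z_score_fixed(rewards):
    """Mirrors the proposed fix: epsilon is added to the std BEFORE dividing."""
    mean, std = fmean(rewards), pstdev(rewards)
    return [(x - mean) / (std + EPS) for x in rewards]


# Every reward is identical, so the standard deviation is 0.0.
constant = [1.0, 1.0, 1.0]

try:
    z_score_buggy(constant)
    buggy_failed = False
except ZeroDivisionError:
    # Plain Python floats raise here; array libraries may return NaN/inf instead.
    buggy_failed = True

fixed = z_score_fixed(constant)  # well-defined: all zeros
```

With plain Python floats the buggy form raises ZeroDivisionError, while the fixed form returns zeros. With an array library the buggy form would more likely yield NaN or inf values rather than an exception, but the cause is the same.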