Improve MC returns accuracy, avoid for loop #18

`compute_MC_returns` currently loops in reverse over the time steps to compute the returns, discounting at each step.
Comparing its output for `gamma=1.0` against simply taking `data["rewards"].sum(dim=0)` shows a discrepancy of

```
(data["rewards"].sum(dim=0)-compute_MC_returns(data, 1.0, test_critic)[0, :]).abs().max()
out: tensor(1.9073e-06, device='cuda:0')
```

The difference is small, but it is nonzero.

**Describe the solution you'd like**
Pre-compute the discounting vector, multiply the rewards by it, and then call `.sum()`.
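A minimal sketch of what this could look like, assuming `data["rewards"]` has shape `(T, N)` with time along `dim=0` (as the `.sum(dim=0)` comparison above suggests). The function names `discounted_return_from_start` and `loop_return_from_start` are hypothetical, and the sketch only covers the return from the first step (the `[0, :]` row); it ignores whatever `compute_MC_returns` does with the critic argument.

```python
import torch


def discounted_return_from_start(rewards: torch.Tensor, gamma: float) -> torch.Tensor:
    """Discounted return from step 0, without a Python loop.

    Assumes `rewards` has time along dim 0, e.g. shape (T, N).
    """
    T = rewards.shape[0]
    # Discount vector: gamma^0, gamma^1, ..., gamma^(T-1)
    steps = torch.arange(T, device=rewards.device, dtype=rewards.dtype)
    discounts = gamma ** steps
    # Reshape to (T, 1, ..., 1) so it broadcasts over the remaining dims
    discounts = discounts.view(T, *([1] * (rewards.dim() - 1)))
    return (rewards * discounts).sum(dim=0)


def loop_return_from_start(rewards: torch.Tensor, gamma: float) -> torch.Tensor:
    """Reference: reverse loop, kept only for comparison with the sketch above."""
    ret = torch.zeros_like(rewards[0])
    for t in reversed(range(rewards.shape[0])):
        ret = rewards[t] + gamma * ret
    return ret
```

With `gamma=1.0` the discount vector is all ones, so the vectorized version reduces to `rewards.sum(dim=0)`. Returns for every time step (not just the first) would need a different formulation, which this sketch does not attempt.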

