Seeking Clarification: Cumulative Rewards in Batch B1 and B2 - SDP Process (ISAC)

Hello everyone,

I hope this message finds you well. I've been working on implementing the SDP (Part 1 of the paper), and I've come across a point of potential confusion regarding the cumulative rewards for transitions in batch B1.

The paper states that every transition within batch B1 should have the same cumulative reward (following the mathematical definition of B_1). However, on reviewing the code, it appears that transitions are sampled randomly and can therefore have different cumulative rewards.
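To make the difference concrete, here is a minimal sketch of the two behaviours I am comparing. It is not taken from the repository. The `Transition` record, its `cum_reward` field, and both sampling functions are hypothetical names I made up for illustration. Grouping by exact float equality is also my own simplifying assumption; with continuous returns you would probably need to bucket values instead.

```python
import random
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Transition(NamedTuple):
    # Hypothetical transition record; cum_reward is assumed to be precomputed.
    state: int
    action: int
    cum_reward: float


def sample_random(buffer: List[Transition], batch_size: int,
                  rng: random.Random) -> List[Transition]:
    """Uniform sampling: transitions in the batch may have different cumulative rewards."""
    return rng.sample(buffer, batch_size)


def sample_same_return(buffer: List[Transition], batch_size: int,
                       rng: random.Random) -> List[Transition]:
    """Sample a batch in which every transition shares one cumulative reward value."""
    groups: Dict[float, List[Transition]] = defaultdict(list)
    for t in buffer:
        groups[t.cum_reward].append(t)  # assumption: exact equality defines a group
    # Keep only groups large enough to fill a batch without replacement.
    eligible = [g for g in groups.values() if len(g) >= batch_size]
    if not eligible:
        raise ValueError("no cumulative-reward group is large enough for this batch size")
    group = rng.choice(eligible)
    return rng.sample(group, batch_size)
```

With `sample_random`, a batch drawn from a buffer of mixed returns will generally contain several distinct `cum_reward` values, whereas `sample_same_return` always yields a single value. My question is essentially which of these two behaviours B1 is meant to follow.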

Before jumping to any conclusions, I wanted to open a discussion and ask the community and maintainers for clarification. Is the intended behavior that all transitions in B1 share the same cumulative reward, or is the current random sampling consistent with the paper's specification?
