Seeking Clarification: Cumulative Rewards in Batch B1 and B2 - SDP Process (ISAC) #1

@Feedback02

Hello everyone,

I hope this message finds you well. I've been working on implementing the SDP (Part 1 of the paper), and I've come across a point of potential confusion regarding the cumulative rewards for transitions in batch B1.

The paper states that every transition within batch B1 should have the same cumulative reward (following the mathematical description of B_1). Upon reviewing the code, however, it seems that transitions are selected at random, so a single batch may contain transitions with different cumulative rewards.
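To make the two readings concrete, here is a minimal sketch of what I mean. It is not taken from the repository or the paper: the buffer layout, the `cum_reward` field, and both function names are hypothetical, and it assumes cumulative rewards can be compared for exact equality.

```python
import random
from collections import defaultdict


def sample_uniform(buffer, batch_size, rng):
    """Reading A (what the code appears to do): draw transitions at random,
    without looking at their cumulative reward."""
    return rng.sample(buffer, batch_size)


def sample_same_return(buffer, batch_size, rng):
    """Reading B (how I read the definition of B_1): every transition in the
    batch shares one cumulative reward value."""
    # Group transitions by their (hypothetical) cumulative reward field.
    groups = defaultdict(list)
    for transition in buffer:
        groups[transition["cum_reward"]].append(transition)
    # Keep only groups large enough to fill a batch.
    eligible = [g for g in groups.values() if len(g) >= batch_size]
    if not eligible:
        raise ValueError("no cumulative-reward group can fill a batch")
    return rng.sample(rng.choice(eligible), batch_size)


# Toy replay buffer: 30 transitions with three distinct cumulative rewards.
buffer = [{"state": i, "cum_reward": float(i % 3)} for i in range(30)]
rng = random.Random(0)

b1_uniform = sample_uniform(buffer, 4, rng)
b1_same = sample_same_return(buffer, 4, rng)
```

Under reading B, the set of `cum_reward` values in `b1_same` always has exactly one element; under reading A, `b1_uniform` carries no such guarantee.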

Before jumping to any conclusions, I wanted to open up a discussion and seek clarification from the community and maintainers. Could someone please shed light on whether the intended behavior is to have uniform cumulative rewards for all transitions in B1, or if the current code aligns with the paper's specifications?
