fix: correct ground_truth comparison in BanditEnv by bigsawman · Pull Request #190 · TextArena/TextArena

bigsawman · 2026-03-31T11:36:25Z

Summary

`ground_truth` in `BanditEnv` is a dict mapping button names to probabilities (e.g. `{"red": 0.6, "blue": 0.3, ...}`), but the final-turn winner check compares a button string directly against this dict (`button == self.state.game_state['ground_truth']`), which always evaluates to False.
This means the player can never be recognized as having chosen the correct button: every game ends with an incorrect outcome and a regret-based reward.
Fix: find the button with the highest probability via `max(..., key=...)`, then compare the player's choice against that button.
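A minimal sketch of the corrected check, assuming `ground_truth` has the shape described above. The function name `picked_best_button` is hypothetical and not part of TextArena; the actual change lives inside `BanditEnv`'s final-turn logic, which may be structured differently.

```python
def picked_best_button(button: str, ground_truth: dict[str, float]) -> bool:
    """Hypothetical helper: True if `button` has the highest probability in `ground_truth`."""
    # max() over a dict iterates its keys; key=ground_truth.get ranks each key by its probability.
    best_button = max(ground_truth, key=ground_truth.get)
    # Compare string to string, instead of string to dict as in the pre-fix check.
    return button == best_button


ground_truth = {"red": 0.6, "blue": 0.3}
print(picked_best_button("red", ground_truth))   # True
print(picked_best_button("blue", ground_truth))  # False
```

On ties, `max()` returns the first maximal key in the dict's insertion order, so a player who picks a different button with the same top probability would still be scored as incorrect. Whether ties can occur depends on how `BanditEnv` generates its probabilities, which this PR does not state.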

Reproduction

```python
ground_truth = {"red": 0.6, "blue": 0.3}
button = "red"
print(button == ground_truth)  # False: str vs dict, always False
```
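Extending the reproduction to every button makes the failure explicit: the pre-fix comparison is False even for the highest-probability button. This is a standalone illustration, not code taken from `BanditEnv`.

```python
ground_truth = {"red": 0.6, "blue": 0.3}

# Pre-fix check: each button name (a str) is compared against the whole dict,
# so it never matches, not even for "red", the highest-probability button.
old_check = {button: button == ground_truth for button in ground_truth}
print(old_check)  # {'red': False, 'blue': False}
```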

Test plan

Verified that `_regret()` already treats `ground_truth` as a dict (it calls `.values()` and indexes by button name), which confirms the value is a dict, not a string.
Confirmed the fix matches the intended semantics: a reward of 1.0 when the player picks the highest-probability button.
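The test-plan reasoning can be checked with a small sketch that derives both correctness and regret from the same dict. The regret formula here (best probability minus the chosen button's probability) is an assumption consistent with `_regret()` calling `.values()` and indexing by button name; the helper name `score_choice` is hypothetical, and the sketch does not reproduce how `BanditEnv` converts regret into a reward for an incorrect choice.

```python
def score_choice(button: str, ground_truth: dict[str, float]) -> tuple[bool, float]:
    """Hypothetical helper returning (correct, regret) for the final choice.

    `correct` mirrors the fixed check (the case that earns a reward of 1.0);
    `regret` uses an assumed definition: the gap between the best button's
    probability and the chosen button's probability.
    """
    best_button = max(ground_truth, key=ground_truth.get)
    regret = max(ground_truth.values()) - ground_truth[button]
    return button == best_button, regret


ground_truth = {"red": 0.6, "blue": 0.3}
print(score_choice("red", ground_truth))   # (True, 0.0)
print(score_choice("blue", ground_truth))  # (False, 0.3)
```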


bigsawman wants to merge 1 commit into TextArena:main from bigsawman:fix/bandit-ground-truth-comparison
