Right now the policy heads straight for the next large map-level reward instead of exploring towns (Pokémon Centers, Marts), and the battle rewards don’t teach it to “fight the battle” early enough in training. I want a better exploration signal inside towns and clearer incentives to take the actions that actually win fights.
Why this matters
• The agent skips useful town interiors, which slows progress and leads to weak policies.
• Early in training, the policy learns menu spam and partial battle loops instead of consistent fighting.
• Current reward shaping is too coarse for town coverage and battle skill.
Current behavior
• Town exploration stops once easy novelty is collected.
• The agent often stalls in battle; wins do happen, but the policy does not learn efficiently from them.
Proposed direction
• Add a town/city exploration target or coverage signal (per-map coverage target or interior POIs).
• Make battle rewards more action-aligned: stronger rewards for damage dealt and for winning, and a clearer penalty for stalling.
• Optional: small penalty for “menu in battle without useful action” to avoid stalling.
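To make the proposal above concrete, here is a minimal sketch of both signals. It is not the existing reward code: `TownCoverage`, `battle_reward`, the `target_tiles` table, and every weight are hypothetical names and placeholder values, and it assumes the environment can report the current map id, the player's tile coordinates, the enemy HP, and whether a battle menu is open.

```python
from dataclasses import dataclass, field


@dataclass
class TownCoverage:
    """Pays a per-map bonus in proportion to newly covered tiles.

    target_tiles maps map_id -> tile count treated as "fully explored".
    These targets are assumed to be hand-authored; maps not listed earn nothing.
    """
    target_tiles: dict
    bonus_per_map: float = 1.0  # placeholder weight
    visited: dict = field(default_factory=dict)

    def step(self, map_id, x, y):
        if map_id not in self.target_tiles:
            return 0.0
        seen = self.visited.setdefault(map_id, set())
        if (x, y) in seen:
            return 0.0
        target = self.target_tiles[map_id]
        before = min(len(seen) / target, 1.0)
        seen.add((x, y))
        after = min(len(seen) / target, 1.0)
        # Total payout per map is capped at bonus_per_map.
        return self.bonus_per_map * (after - before)


def battle_reward(prev_enemy_hp, enemy_hp, enemy_max_hp, won, in_menu,
                  damage_weight=1.0, win_bonus=2.0,
                  stall_penalty=0.01, menu_penalty=0.005):
    """Action-aligned battle shaping for one battle step.

    Rewards damage as a fraction of enemy max HP, adds a bonus on a win,
    and subtracts small penalties when a step deals no damage.
    All weights are placeholders to be tuned.
    """
    damage = max(prev_enemy_hp - enemy_hp, 0) / max(enemy_max_hp, 1)
    reward = damage_weight * damage
    if won:
        reward += win_bonus
    elif damage == 0:
        reward -= stall_penalty
        if in_menu:
            # Optional extra penalty for sitting in a menu without acting.
            reward -= menu_penalty
    return reward
```

Because the coverage bonus is capped per map, it cannot be farmed by pacing inside one town. The stall penalty fires on every no-damage step, including text and animation frames, so it likely needs to stay small relative to the damage term or be applied per turn instead of per step.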
Acceptance criteria
• In a short run, the agent explores most of a town (including center/mart) before moving on.
• In battle, the agent consistently chooses actions that reduce enemy HP instead of stalling.
• No new degenerate loops (menu spam or idle in battle).
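The criteria above could be checked from evaluation rollouts with a few simple metrics. The sketch below is one possible way to measure them, not an agreed test: the function names are hypothetical, it assumes each rollout logs the tiles visited in a town plus a per-turn enemy HP trace for each battle, and the pass thresholds are left to the caller.

```python
def town_coverage_fraction(visited_tiles, walkable_tiles):
    """Fraction of a town's walkable tiles (interiors included) that were visited."""
    walkable = set(walkable_tiles)
    if not walkable:
        return 0.0
    return len(set(visited_tiles) & walkable) / len(walkable)


def battle_stall_rate(enemy_hp_trace):
    """Fraction of battle turns in which enemy HP did not drop.

    enemy_hp_trace is assumed to hold one HP reading per turn, in order.
    """
    turns = len(enemy_hp_trace) - 1
    if turns <= 0:
        return 0.0
    pairs = zip(enemy_hp_trace, enemy_hp_trace[1:])
    stalled = sum(1 for prev, cur in pairs if cur >= prev)
    return stalled / turns


def longest_idle_run(enemy_hp_trace):
    """Longest streak of consecutive turns with no damage, to flag degenerate loops."""
    longest = current = 0
    for prev, cur in zip(enemy_hp_trace, enemy_hp_trace[1:]):
        if cur >= prev:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

For example, a run might count as passing when coverage of the starting town exceeds some agreed fraction and the longest idle run stays below a small bound; those cutoffs still need to be chosen.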
Notes
• Exploration is currently tile novelty with caps and map exhaustion penalties.
• Battle shaping exists but does not reliably lead to active fighting early.