Open
Labels
enhancement: New feature or request
Description
Please view the Contributing Guidelines for information on Contributing.
Is your feature request related to a problem? Please describe.
The epsilon-greedy method cannot choose between several actions with high Q-values. When it exploits, it always picks the single highest Q-value and ignores every other high Q-value, however close. This can leave other high-valued actions under-exploited.
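To make the problem concrete, here is a minimal sketch of a textbook epsilon-greedy rule. It is not taken from this project's code: the function name `epsilon_greedy`, the `epsilon` value, and the sample Q-values are all assumptions chosen for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick an action index with a textbook epsilon-greedy rule (hypothetical helper)."""
    if rng.random() < epsilon:
        # Explore: uniform over all actions, ignoring the Q-values entirely.
        return rng.randrange(len(q_values))
    # Exploit: always the single largest Q-value (the first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Illustrative Q-values: actions 0 and 1 are nearly equal, action 2 is clearly bad.
q = [1.00, 0.99, -5.0]
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[epsilon_greedy(q, 0.1, rng)] += 1
print(counts)
```

With these assumed values, action 1 is chosen only during random exploration, so it ends up selected about as often as the clearly bad action 2, even though its Q-value is almost identical to the best one.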
Describe the solution you'd like
Implement Boltzmann softmax exploration, which lets Brain pick actions probabilistically: actions with high Q-values are given high probabilities, and every action keeps some probability of being picked.
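A rough sketch of what the selection rule could look like, assuming Q-values arrive as a plain list of floats. The names `boltzmann_probs` and `boltzmann_select` and the `temperature` parameter are hypothetical and not part of the existing Brain API; how this would plug into Brain is left open.

```python
import math
import random

def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over Q-values: p(a) is proportional to exp(Q(a) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann_select(q_values, temperature=1.0, rng=random):
    """Sample one action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

# Illustrative Q-values (assumed): two near-equal good actions and one bad one.
q = [1.00, 0.99, -5.0]
probs = boltzmann_probs(q, temperature=1.0)
action = boltzmann_select(q, temperature=1.0, rng=random.Random(0))
```

In this example the two near-equal actions receive nearly equal probability and the bad action receives a small but nonzero one. A lower temperature makes the choice closer to greedy, and a higher temperature makes it closer to uniform.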
Additional context
Boltzmann Softmax Exploration