Adding Boltzmann Softmax Exploration #16

@anmole17

Please view the Contributing Guidelines for information on Contributing.

Is your feature request related to a problem? Please describe.
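The epsilon-greedy method does not distinguish well between several high Q-values. When exploiting, it always picks the single highest Q-value and ignores every other high one; when exploring, it picks uniformly at random, so a near-optimal action is no more likely to be chosen than a poor one. This can lead to poor use of the other high-valued actions.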
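
To illustrate the problem, here is a minimal sketch of epsilon-greedy selection in plain Python. The function name `epsilon_greedy`, its parameters, and the sample Q-values are assumptions made for illustration, not code from this project.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Return an action index chosen by epsilon-greedy selection.

    Hypothetical helper for illustration only.
    """
    if rng.random() < epsilon:
        # Explore: every action is equally likely, good or bad.
        return rng.randrange(len(q_values))
    # Exploit: only the single highest Q-value is ever chosen.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Actions 0 and 1 are almost equally good, action 2 is clearly bad.
q = [1.0, 0.99, -5.0]
greedy_picks = [epsilon_greedy(q, epsilon=0.0) for _ in range(100)]
```

With `epsilon=0.0` every pick is action 0, even though action 1 is nearly as good. With a positive `epsilon`, action 1 is only reached through uniform exploration, at the same rate as the bad action 2.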

Describe the solution you'd like
Implement Boltzmann softmax exploration, which lets the Brain pick actions probabilistically: actions with higher Q-values get higher probabilities, and every action still has a nonzero chance of being picked.
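
A rough sketch of what this could look like, assuming the Q-values are available as a plain list of floats. The names `boltzmann_probabilities`, `boltzmann_action`, and the `temperature` parameter are hypothetical and not part of the existing codebase. Each action is weighted by `exp(Q / temperature)` and the weights are normalized into probabilities.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature=1.0):
    """Turn Q-values into selection probabilities with a softmax.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    max_q = max(q_values)
    weights = [math.exp((q - max_q) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Actions 0 and 1 are almost equally good, action 2 is clearly bad.
q = [1.0, 0.99, -5.0]
probs = boltzmann_probabilities(q, temperature=1.0)
action = boltzmann_action(q, temperature=1.0)
```

Here the two good actions each receive close to half of the probability mass, while the bad action keeps a small but nonzero chance. A lower `temperature` makes the choice greedier and a higher one makes it closer to uniform, so the temperature would probably need tuning or a decay schedule.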

Additional context
Boltzmann Softmax Exploration

Labels: enhancement