a: 0.1
e: 9.251777478947598e-05
g: 0.9
I don't know what these mean its just an example but comments and how to make it better etc is surely missing. I know its your learning project but do tell us what you learnt as I too wanna do this Q learning project as my first and want to make the perfect snake 🥇 .
on a closer look I think its just ,
a (α - Alpha): This is the learning rate, denoted by α. It determines how quickly the Q-values are updated based on new experiences. A higher value means that the agent will adjust its Q-values more rapidly in response to new information. A lower value makes the agent more resistant to changing its Q-values based on new experiences.
e (ε - Epsilon): This is the exploration factor, denoted by ε. It determines the likelihood that the agent will choose a random action instead of following its learned policy. Exploration is important to discover new actions and states, which helps the agent find better policies. A higher ε encourages more exploration, while a lower ε favors exploitation of the current knowledge.
g (γ - Gamma): This is the discount factor, denoted by γ. It determines the agent's consideration of future rewards in the decision-making process. A higher value of γ makes the agent prioritize long-term rewards, while a lower value makes it focus more on immediate rewards. It is used to calculate the cumulative discounted future rewards when updating Q-values.
If it was written somewhere in README.md , it would be greatly helpful
a: 0.1
e: 9.251777478947598e-05
g: 0.9
I don't know what these mean its just an example but comments and how to make it better etc is surely missing. I know its your learning project but do tell us what you learnt as I too wanna do this Q learning project as my first and want to make the perfect snake 🥇 .
on a closer look I think its just ,
a (α - Alpha): This is the learning rate, denoted by α. It determines how quickly the Q-values are updated based on new experiences. A higher value means that the agent will adjust its Q-values more rapidly in response to new information. A lower value makes the agent more resistant to changing its Q-values based on new experiences.
e (ε - Epsilon): This is the exploration factor, denoted by ε. It determines the likelihood that the agent will choose a random action instead of following its learned policy. Exploration is important to discover new actions and states, which helps the agent find better policies. A higher ε encourages more exploration, while a lower ε favors exploitation of the current knowledge.
g (γ - Gamma): This is the discount factor, denoted by γ. It determines the agent's consideration of future rewards in the decision-making process. A higher value of γ makes the agent prioritize long-term rewards, while a lower value makes it focus more on immediate rewards. It is used to calculate the cumulative discounted future rewards when updating Q-values.
If it was written somewhere in README.md , it would be greatly helpful