Better documentation would be nice 😄 #2

a: 0.1
e: 9.251777478947598e-05
g: 0.9

I don't know what these mean. It's just an example, but comments and notes on how to improve it are missing. I know it's your learning project, but do tell us what you learned, as I too want to do a Q-learning project as my first one and want to make the perfect snake 🥇.

On a closer look, I think it's just:

a (α - Alpha): This is the learning rate, denoted by α. It determines how quickly the Q-values are updated based on new experiences. A higher value means that the agent will adjust its Q-values more rapidly in response to new information. A lower value makes the agent more resistant to changing its Q-values based on new experiences.

e (ε - Epsilon): This is the exploration factor, denoted by ε. It determines the likelihood that the agent will choose a random action instead of following its learned policy. Exploration is important to discover new actions and states, which helps the agent find better policies. A higher ε encourages more exploration, while a lower ε favors exploitation of the current knowledge.

g (γ - Gamma): This is the discount factor, denoted by γ. It determines the agent's consideration of future rewards in the decision-making process. A higher value of γ makes the agent prioritize long-term rewards, while a lower value makes it focus more on immediate rewards. It is used to calculate the cumulative discounted future rewards when updating Q-values.
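
To check my understanding of how the three values fit together, here is a minimal sketch of tabular Q-learning. It is not taken from this repo: the function names (`choose_action`, `update_q`), the dict-based Q-table, and the `ACTIONS` list are all my own assumptions, and only the three numbers come from the example above. It also ignores terminal states, where the `gamma * best_next` term would normally be dropped.

```python
import random

# Values copied from the example in this issue; the constant names are assumptions.
ALPHA = 0.1                       # a: learning rate
EPSILON = 9.251777478947598e-05   # e: exploration probability
GAMMA = 0.9                       # g: discount factor

# Hypothetical action set for a snake game; the repo may use something else.
ACTIONS = ["up", "down", "left", "right"]


def choose_action(q_table, state, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Unseen (state, action) pairs default to a Q-value of 0.0.
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))


def update_q(q_table, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    old = q_table.get((state, action), 0.0)
    best_next = max(q_table.get((next_state, a), 0.0) for a in ACTIONS)
    new = old + alpha * (reward + gamma * best_next - old)
    q_table[(state, action)] = new
    return new


if __name__ == "__main__":
    q = {}
    # Empty table: 0 + 0.1 * (1.0 + 0.9 * 0 - 0), so the stored value is about 0.1.
    print(update_q(q, "s0", "up", reward=1.0, next_state="s1"))
    print(choose_action(q, "s0"))
```

With e as small as the value listed above, the random branch in `choose_action` is almost never taken, so the agent nearly always exploits what it has already learned. Whether that number is the end result of a decay schedule is not stated anywhere I could find, so treat that as a guess.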

If this were written somewhere in README.md, it would be very helpful.
