# Training scripts
This document details how the various training scripts in the root `train` directory of this project work, and how to extend them and make your own scripts!
The NEAT training scripts all follow quite similar steps to commence training (sketched in the code after this list):
- The NEAT config is loaded using the default function: `neat.load_config_with_defaults(config_path)`
- This config and the rest of the input args are then passed to a `run` or `evolve` function
- A network is created, or loaded from a checkpoint if one exists, using the config
- A `ParallelEvaluator` is created, which manages our multi-threading capabilities
- The network is trained using this evaluator and a custom genome evaluation function
  - This function is the biggest point of difference between the networks
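
Putting those steps together, a minimal sketch of such a `run`/`evolve` function might look like the following. This assumes the standard neat-python API plus the config helper named above; the function name, arguments, and defaults here are illustrative rather than the project's exact signature:

```python
import neat

def run(config_path, checkpoint_path=None, generations=50, workers=4):
    # Load the NEAT config via the helper named above
    config = neat.load_config_with_defaults(config_path)

    # Resume from a checkpoint if one exists, otherwise start fresh
    if checkpoint_path:
        population = neat.Checkpointer.restore_checkpoint(checkpoint_path)
    else:
        population = neat.Population(config)

    # Reporters gather the stats/graphic outputs described below
    stats = neat.StatisticsReporter()
    population.add_reporter(stats)
    population.add_reporter(neat.StdOutReporter(True))

    # The ParallelEvaluator runs eval_genome across worker processes
    evaluator = neat.ParallelEvaluator(workers, eval_genome)

    # Train for the requested number of generations; the best genome is returned
    return population.run(evaluator.evaluate, generations)
```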
Once the population has been trained for the specified number of generations, several outputs are produced: the best genome, along with the stats and graphic outputs.
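
As a hedged illustration of consuming those outputs, the best genome returned by the sketch above can be pickled for later replay (the filename and config path here are illustrative, not necessarily what the project uses):

```python
import pickle

# Train, then persist the best genome for later replay or analysis
winner = run('config-feedforward', generations=50)
with open('winner.pkl', 'wb') as f:
    pickle.dump(winner, f)
```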
As mentioned, this evaluation function is the biggest point of difference between the training scripts: it defines how training is actually performed, on what factors a genome succeeds, and when it does not.
This function takes a genome and a config: the config is the one created at the start, and the genome is an individual from the generation currently being trained. The genome can then be used to take the inputs from the emulator and produce a set of dynamically generated outputs, which are actioned within the emulator.
Depending on the outcome of the actions taken, several things can happen:
- If Mario is stuck in place for a certain amount of time; return
- If Mario dies; return
- On return, calculate fitness for this genome and report back to the population
As you can see, several controls are made available: when Mario should quit, how long to wait for him to move, and what calculation is used for the "fitness" of the genome (distance? points? both?), all of which impact the resulting network. The evaluation loop is central to actually producing a valid network.
To take the baseline `eval_genome` script:
```python
def eval_genome(genome, config):
    # Create the network
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    # Define a limit for how long Mario should wait for
    stuck_max = 600
    info = {}
    # Loop over every level in the game
    for i in range(0, 32):
        # Reset the environment and get the first observable frame
        observation = ENV_ARR[i].reset()
        done = False
        stuck = 0
        while not done:
            # Use the observation to generate an output set from the network
            outputs = neat_.clean_outputs(net.activate(observation.flatten()))
            # Make a move!
            observation, reward, done, info = ENV_ARR[i].step(outputs)
            # Check if Mario is progressing in level
            stuck += 1 if reward <= 0 else 0
            # If we haven't moved in 600 frames; close
            if stuck > stuck_max:
                ENV_ARR[i].close()
                return neat_.calculate_fitness(info)
            # If we have died; close
            if info['life'] == 0:
                break
        ENV_ARR[i].close()
    # Calculate fitness for this genome
    return neat_.calculate_fitness(info)
```

To make changes to how the network works, this is the best function to alter.
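For instance, a hypothetical variant of the `calculate_fitness` helper weighing distance against points might look like this (a sketch only; it assumes the `info` dict carries the `x_pos` and `score` keys that gym-super-mario-bros reports, and the weights are arbitrary):

```python
def calculate_fitness(info, distance_weight=1.0, score_weight=0.025):
    # Reward horizontal progress plus a small fraction of the in-game score.
    # `x_pos` and `score` are assumed keys from the emulator's info dict.
    distance = info.get('x_pos', 0)
    score = info.get('score', 0)
    return distance * distance_weight + score * score_weight
```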
We do not have any examples outside of NEAT; however, the interaction with the OpenAI environment should be easy enough to map onto other training systems. You would simply need a way to process the "input" array from the emulator and have it produce a set of "outputs" that the environment can understand, the inputs being each square in the game and the outputs being the button presses available to the player.
From the above example, these are the commands you are looking to action:
```python
# Reset the OpenAI environment
observation = ENV_ARR[i].reset()
# Generate outputs from the input variables (in a loop)
outputs = some_function(observation)
# Make Mario move (in a loop)
observation, reward, done, info = ENV_ARR[i].step(outputs)
# Close the environment once done
ENV_ARR[i].close()
```

This forms the basis of how we interact with the OpenAI environment.
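To make that loop concrete for a non-NEAT system, here is a minimal sketch that wraps those four calls around any callable policy (`evaluate_policy` and `policy` are illustrative names, not part of this project):

```python
def evaluate_policy(policy, env):
    # Run a single episode: `policy` is any callable mapping an
    # observation to the outputs the environment understands.
    observation = env.reset()
    done = False
    total_reward = 0
    info = {}
    while not done:
        outputs = policy(observation)
        observation, reward, done, info = env.step(outputs)
        total_reward += reward
    env.close()
    return total_reward, info
```

A NEAT network fits this shape as `lambda obs: neat_.clean_outputs(net.activate(obs.flatten()))`; any other training system only needs to supply an equivalent callable.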