# Training scripts
This document details how the various training scripts in the root `train` directory of this project work, and how to extend them and make your own scripts!
The NEAT training scripts all follow quite similar steps to commence training (sketched in the code after this list):
- The NEAT config is loaded using the default function: `neat.load_config_with_defaults(config_path)`
- This config and the rest of the input args are then passed to a `run` or `evolve` function
- A network is created, or loaded from a checkpoint if one exists, using the config
- A `ParallelEvaluator` is created, which manages our multi-threading capabilities
- The network is trained using this evaluator and a custom genome evaluation function
  - This function is the biggest point of difference between the networks
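
Putting those steps together, a minimal sketch of such a `run`/`evolve` function might look like the following. This assumes the standard neat-python API plus the config helper named above; the function name, arguments, and defaults here are illustrative rather than the project's exact signature:

```python
import neat

def run(config_path, checkpoint_path=None, generations=50, workers=4):
    # Load the NEAT config via the helper named above
    config = neat.load_config_with_defaults(config_path)

    # Resume from a checkpoint if one exists, otherwise start fresh
    if checkpoint_path:
        population = neat.Checkpointer.restore_checkpoint(checkpoint_path)
    else:
        population = neat.Population(config)

    # Reporters gather the stats/graphic outputs described below
    stats = neat.StatisticsReporter()
    population.add_reporter(stats)
    population.add_reporter(neat.StdOutReporter(True))

    # The ParallelEvaluator runs eval_genome across worker processes
    evaluator = neat.ParallelEvaluator(workers, eval_genome)

    # Train for the requested number of generations; the best genome is returned
    return population.run(evaluator.evaluate, generations)
```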
Once the population has been trained for the specified number of generations, several outputs are produced: the best genome, along with the stats and graphic outputs.
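
As a hedged illustration of consuming those outputs, the best genome returned by the sketch above can be pickled for later replay (the filename and config path here are illustrative, not necessarily what the project uses):

```python
import pickle

# Train, then persist the best genome for later replay or analysis
winner = run('config-feedforward', generations=50)
with open('winner.pkl', 'wb') as f:
    pickle.dump(winner, f)
```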
As mentioned, this evaluation function is the biggest point of difference between the training scripts: it defines how training is actually performed, on what factors a genome succeeds, and when it does not.
This function takes a genome and a config: the config is the one created at the start, and the genome is an individual from the generation currently being trained. The genome can then be used to take the inputs from the emulator and produce a set of dynamically generated outputs, which are actioned within the emulator.
Depending on the outcome of the actions taken, several things can happen:
- If Mario is stuck in place for a certain amount of time; return
- If Mario dies; return
- On return, calculate fitness for this genome and report back to the population
As you can see, several controls are made available: when Mario should quit, how long to wait for him to move, and what calculation is used for the "fitness" of the genome (distance? points? both?), all of which impact the resulting network. The evaluation loop is central to actually producing a valid network.
To take the baseline `eval_genome` script:
```python
def eval_genome(genome, config):
    # Create the network
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    # Define a limit for how long Mario should wait for
    stuck_max = 600
    info = {}
    # Loop over every level in the game
    for i in range(0, 32):
        # Reset the environment and get the first observable frame
        observation = ENV_ARR[i].reset()
        done = False
        stuck = 0
        while not done:
            # Use the observation to generate an output set from the network
            outputs = neat_.clean_outputs(net.activate(observation.flatten()))
            # Make a move!
            observation, reward, done, info = ENV_ARR[i].step(outputs)
            # Check if Mario is progressing in level
            stuck += 1 if reward <= 0 else 0
            # If we haven't moved in 600 frames; close
            if stuck > stuck_max:
                ENV_ARR[i].close()
                return neat_.calculate_fitness(info)
            # If we have died; close
            if info['life'] == 0:
                break
        ENV_ARR[i].close()
    # Calculate fitness for this genome
    return neat_.calculate_fitness(info)
```

To make changes to how the network works, this is the best function to alter.
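For instance, a hypothetical variant of the `calculate_fitness` helper weighing distance against points might look like this (a sketch only; it assumes the `info` dict carries the `x_pos` and `score` keys that gym-super-mario-bros reports, and the weights are arbitrary):

```python
def calculate_fitness(info, distance_weight=1.0, score_weight=0.025):
    # Reward horizontal progress plus a small fraction of the in-game score.
    # `x_pos` and `score` are assumed keys from the emulator's info dict.
    distance = info.get('x_pos', 0)
    score = info.get('score', 0)
    return distance * distance_weight + score * score_weight
```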
We do not have any examples outside of NEAT; however, the interaction with the OpenAI environment should be easy enough to map onto other training systems. You would simply need a way to process the "input" array from the emulator and have it produce a set of "outputs" that the environment can understand, the inputs being each square in the game and the outputs being the button presses available to the player.
From the above example, these are the commands you are looking to action:
```python
# Reset the OpenAI environment
observation = ENV_ARR[i].reset()
# Generate outputs from the input variables (in a loop)
outputs = some_function(observation)
# Make Mario move (in a loop)
observation, reward, done, info = ENV_ARR[i].step(outputs)
# Close the environment once done
ENV_ARR[i].close()
```

This forms the basis of how we interact with the OpenAI environment.
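To make that loop concrete for a non-NEAT system, here is a minimal sketch that wraps those four calls around any callable policy (`evaluate_policy` and `policy` are illustrative names, not part of this project):

```python
def evaluate_policy(policy, env):
    # Run a single episode: `policy` is any callable mapping an
    # observation to the outputs the environment understands.
    observation = env.reset()
    done = False
    total_reward = 0
    info = {}
    while not done:
        outputs = policy(observation)
        observation, reward, done, info = env.step(outputs)
        total_reward += reward
    env.close()
    return total_reward, info
```

A NEAT network fits this shape as `lambda obs: neat_.clean_outputs(net.activate(obs.flatten()))`; any other training system only needs to supply an equivalent callable.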