
4.5 Agent-environment Simulation

4.5.4 Mutable Environment

We’re looking to exploit and assert the self-learning and generalizing properties of the evolving SNNs, which serve as controllers in agents tasked with surviving in their environment for as long as possible. We therefore introduce a mutable environment, where the rules of survival are constantly changing.

In the real world, biological organisms inherit survival mechanisms that allow them to adapt to the environment, like animals with camouflaging bodies that mimic their habitat. Certain insects, like stick-bugs, have inherited bodies that make them difficult to make out visually as long as they stay in their habitat [33]. Some organisms have taken it a step further and can adapt to changes in the environment. Certain octopi, fish, frogs and chameleons are able to change the color and/or texture of their bodies to blend in with their surroundings (or for communication) [34][35], and furred mammals like the arctic fox change the density and/or color of their coat depending on the season to keep them from freezing or overheating, or to camouflage themselves in snow [36].

In the following sections, we propose two types of mutable environment inspired by such examples from nature. They both make use of binary inputs, but differ in the number of input signals per input sample.

Food Foraging

We first propose a simple environment for food foraging simulation. The environment provides the agent with two types of input, or food: black and white.

The agent can interact with food in one of two ways, by eating it or by avoiding it. The food can either be toxic or healthy, and whether a color of food is toxic or healthy is dependent on the state of the environment. For example, white food can start out being healthy, and the agent should eat it. But once the environment mutates, it can suddenly make white food toxic, and now the agent should avoid it. Figure 4.6 is a simple illustration of this type of environment.

The two colors of food are encoded so that the agent is able to distinguish between them, but the agent cannot know which food is healthy and which is toxic at any one time. The agent can only figure this out by interacting with the food.

An incorrect action is defined as eating a toxic food or avoiding a healthy food, while a correct action is defined as eating a healthy food or avoiding a toxic food. If the agent makes an incorrect action, it receives a penalty signal (representing pain, revulsion or hunger) and if it makes a correct action, it receives a reward signal. The environment has four possible states which describe which color of food is currently healthy: black, white, both or none. The correct action for each state is shown in Table 4.6.

Food Truth Table

Table 4.6: Truth table showing the correct action for each combination of input food color and healthy food.
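The rule behind the food truth table can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the code used in the implementation: the names FOOD_COLORS, HEALTHY_BY_STATE, correct_action and feedback are hypothetical, and the string labels stand in for whatever signal encoding the agent actually receives.

```python
from itertools import product

# Hypothetical labels for the two food colors described in the text.
FOOD_COLORS = ("black", "white")

# Each environment state names the set of food colors that is currently healthy.
HEALTHY_BY_STATE = {
    "black": {"black"},
    "white": {"white"},
    "both": {"black", "white"},
    "none": set(),
}


def correct_action(food_color, state):
    """Return 'eat' if the food is healthy in this state, otherwise 'avoid'."""
    return "eat" if food_color in HEALTHY_BY_STATE[state] else "avoid"


def feedback(food_color, state, action):
    """Return 'reward' for a correct action and 'penalty' for an incorrect one."""
    return "reward" if action == correct_action(food_color, state) else "penalty"


if __name__ == "__main__":
    # Enumerate every combination of healthy-food state and presented food color.
    for state, food in product(HEALTHY_BY_STATE, FOOD_COLORS):
        print(f"healthy={state:<5} food={food:<5} -> {correct_action(food, state)}")
```

Running the script prints one line per combination of environment state and food color, which corresponds to the eight cells of the truth table.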

Figure 4.6: Simple illustration of a food foraging type environment using binary encoding. Image taken from [18].

Logic Gates

In this environment, the mutable environment state is a two-input logic gate.

The environment provides the agent with two binary inputs of 0’s and 1’s. The agent’s task is to predict the correct output for the current logic gate given the current input. Similarly to the food foraging environment, it receives a reward signal if it’s currently predicting the correct output, and a penalty signal if it’s currently predicting the wrong output.
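The interaction described above can be sketched as a small Python class. This is only an illustration under stated assumptions: the class name LogicGateEnvironment, its methods, the string-valued feedback and the uniformly random choice of the next gate are all hypothetical, and the schedule on which the environment mutates is left to the caller.

```python
import random


class LogicGateEnvironment:
    """Hypothetical sketch of a mutable environment whose state is a logic gate."""

    def __init__(self, gates, seed=None):
        # gates maps a gate name to a function (a, b) -> 0 or 1.
        self.gates = dict(gates)
        self.rng = random.Random(seed)
        self.current_gate = self.rng.choice(sorted(self.gates))

    def mutate(self):
        """Switch to a different gate (assumption: chosen uniformly at random)."""
        others = [name for name in sorted(self.gates) if name != self.current_gate]
        if others:
            self.current_gate = self.rng.choice(others)

    def sample_input(self):
        """Draw one input sample consisting of two binary signals."""
        return self.rng.randint(0, 1), self.rng.randint(0, 1)

    def feedback(self, a, b, prediction):
        """Reward a prediction that matches the current gate, penalize otherwise."""
        expected = self.gates[self.current_gate](a, b)
        return "reward" if prediction == expected else "penalty"


if __name__ == "__main__":
    env = LogicGateEnvironment(
        {"XOR": lambda a, b: a ^ b, "XNOR": lambda a, b: 1 - (a ^ b)}, seed=0
    )
    a, b = env.sample_input()
    print(env.current_gate, (a, b), env.feedback(a, b, prediction=1))
    env.mutate()  # the same prediction may now earn the opposite signal
    print(env.current_gate, (a, b), env.feedback(a, b, prediction=1))
```

As in the food foraging environment, the agent is never told which gate is active; it can only infer the current rule from the reward and penalty signals.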

In order to measure the generalizing properties of agents, we propose the use of two slightly different environments: a training environment, which is used in calculating the fitness while running the EA, and a test environment which has a fully disjoint set of possible environmental states. A full overview of the logic gates found in both the training and the test environments, as well as exhaustive truth values for all input and output combinations, are found in Table 4.7 and Table 4.8.

Training Logic Gate Truth Table

Input  |
A  B   |  A   B   NOT A   NOT B   ONLY 0   ONLY 1   XOR   XNOR
0  0   |  0   0     1       1       0        1       0     1
0  1   |  0   1     1       0       0        1       1     0
1  0   |  1   0     0       1       0        1       1     0
1  1   |  1   1     0       0       0        1       0     1

Table 4.7: Truth table showing the correct output for each training logic gate.

Testing Logic Gate Truth Table

Input  |
A  B   |  AND   NAND   OR   NOR
0  0   |   0     1     0     1
0  1   |   0     1     1     0
1  0   |   0     1     1     0
1  1   |   1     0     1     0

Table 4.8: Truth table showing the correct output for each testing logic gate.
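The two gate sets can also be written down as plain Python functions, which makes it easy to check that no test gate shares a truth table with a training gate. This is a sketch for illustration; the names TRAINING_GATES, TESTING_GATES and truth_columns are hypothetical and are not taken from the implementation.

```python
from itertools import product

# Gates used for fitness evaluation during evolution (Table 4.7).
TRAINING_GATES = {
    "A": lambda a, b: a,
    "B": lambda a, b: b,
    "NOT A": lambda a, b: 1 - a,
    "NOT B": lambda a, b: 1 - b,
    "ONLY 0": lambda a, b: 0,
    "ONLY 1": lambda a, b: 1,
    "XOR": lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}

# Gates held out for measuring generalization (Table 4.8).
TESTING_GATES = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "OR": lambda a, b: a | b,
    "NOR": lambda a, b: 1 - (a | b),
}

# Input rows in the same order as the tables: (0, 0), (0, 1), (1, 0), (1, 1).
INPUTS = list(product((0, 1), repeat=2))


def truth_columns(gates):
    """Map each gate name to its output column over all four input rows."""
    return {name: tuple(gate(a, b) for a, b in INPUTS) for name, gate in gates.items()}


training = truth_columns(TRAINING_GATES)
testing = truth_columns(TESTING_GATES)

# The test environment must not contain any state seen during training.
assert set(training.values()).isdisjoint(testing.values())

if __name__ == "__main__":
    for name, column in {**training, **testing}.items():
        print(f"{name:<6} {column}")
```

Each printed column can be compared directly against the corresponding column of Table 4.7 or Table 4.8.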

Chapter 5

Implementation of the NAGI Framework

In this chapter we briefly discuss choices related to the implementation of the code, and give credit to existing code that influenced the implementation. The code is available at https://github.com/krolse/neat-nagi-python.

The implementation is part of the Socrates Project [37] on GitHub, and future development will be made in a fork of the original implementation which will be made available at https://github.com/SocratesNFR/neat-nagi-python at a later point.

5.1 Language

When deciding on the language for the implementation, there were two main deciding factors. It made a lot of sense to choose a language that was familiar to both the author and the supervisors, and we wanted the implementation to be written in a language that’s accessible and relevant to the scientific AI community. We decided on writing the implementation in Python 3 [38], because it is widely used in AI computing and has access to excellent AI frameworks such as PyTorch [39] and TensorFlow [40], as well as optimization packages such as multiprocessing and NumPy.
