Evolving Spiking Neural Networks on Active Categorical Perception Problems

For Correlating Biological Plausibility and Performance, With the Goal of Replicating on a Mixed-Signal Neuromorphic System

Daniel Sander Isaksen

Thesis submitted for the degree of

Master in Informatics: Robotics and Intelligent Systems

60 credits

Department of Informatics

Faculty of mathematics and natural sciences

UNIVERSITY OF OSLO


http://www.duo.uio.no/

Printed: Reprosentralen, University of Oslo


Abstract

Artificial intelligence has made impressive progress by applying artificial neural networks to many computational problems in recent years. The list of tasks that neural networks can solve better than humans is growing every day. It seems that there is no end to creative and technical applications.

However, the models are growing ever more demanding, and hardware cannot be scaled indefinitely. As a solution, more biological models, such as spiking neural networks, and non-logical computer architectures, such as neuromorphic computing, are being developed as a new paradigm. However, this paradigm has yet to be proven more effective than traditional solutions. In this thesis, we evaluate whether biologically plausible neural network models are linked to better performance in a task designed to measure several cognitive capacities: the active categorical perception task. In particular, it is shown that recurrence and noise, two crucial biological concepts, are meaningful for performance. Additionally, the results provide important insights into training spiking neural networks for such tasks, and these insights are transferable to implementations on neuromorphic systems. However, the implemented networks did not show a clear overall improvement in performance, training time, or computational cost over more traditional simple networks. In sum, more complex and realistic solutions are not necessarily better than keeping it simple, partially because implementation details themselves are so complex to get right. Thus, any computational savings might be lost in tuning.


Acknowledgements

I want to thank my friends and family for their support and patience.

I want to thank my daughter for being the flame in my lamp,

and my grandmother for being the rock that secures through the storm.

“Let it go Let it leave Let it happen Nothing In this world Was promised or

Belonged to you anyway - all you own is yourself.”

–M

I am deeply grateful for all my incredible friends who have inspired me by taking part in invaluable discussions that helped shape this work.

Many thanks to my supervisors, Kyrre Glette, Johan Storm, and Ole Jakob Elle for your trust, time and for sharing your wisdom.

Special thanks to my good friend and supervisor, André Sevenius Nilsen, for giving me the opportunity to venture out on this intellectual journey that encompassed so many incredible fields of science and philosophy. Thank you for being such an important part of this adventure. I have truly enjoyed it.


Abbreviations

HBP Human Brain Project
POC Proof-of-Concept
AI Artificial Intelligence
NN Neural Network
SLP Single-Layer Perceptron
MLP Multi-Layer Perceptron
ANN Artificial Neural Network
FFNN Feed-Forward Neural Network
RNN Recurrent Neural Network
CT-RNN Continuous-Time Recurrent Neural Network
IAF Integrate-and-Fire
SNN Spiking Neural Network
R-SNN Recurrent Spiking Neural Network
IZNN Izhikevich Neural Network
STDP Spike-Timing-Dependent Plasticity
LTP Long-Term Potentiation
LTD Long-Term Depression
EA Evolutionary Algorithm
SGA Simple Genetic Algorithm
NEAT NeuroEvolution of Augmenting Topologies
NEST Neural Simulation Technology
FPGA Field-Programmable Gate Array
FPTA Field-Programmable Transistor Array
EH Evolvable Hardware
BSS-1 BrainScaleS-1
MPI Message Passing Interface
API Application Programming Interface
ACP Active Categorical Perception
IIT Integrated Information Theory


List of Tables

3.1 The table shows comparable test results for each of the three parallelization schemes A, B and C.
4.1 ACP: 10 levels of complexity


List of Figures

2.1 A figure showing the power efficiency of various computing platforms. From a presentation by Steve Furber.
2.2 An overview of the most popular ANN algorithms. From [49].
2.3 Neurophysiological Primary Components
2.4 Schematic diagram of the AdExp circuit
2.5 Firing Modes of the AdExp circuit
2.6 The HICANN Chip
2.7 The Synapse Circuit of the ANC
3.1 Fully connected Neural Network (NN) agent
3.2 Basic Falling Blocks Game
3.3 The Tasks of the Falling Blocks Game
3.4 The left figure shows the action potential as it circulates through the ring.
3.5 The right figure shows the movement as perceived by an agent’s sensor, where the potential moves back and forth.
3.6 A photo of BSS-1
4.1 Each algorithm or model’s initial thousand.
4.2 Comparison of Simulators: SGA NEST vs. NEAT Python
4.3 Comparison of network types evolved with SGA
4.4 Tuning NEAT parameters
4.5 Comparison of network types evolved with NEAT
4.6 RNN evolved with NEAT on ACP with varying complexity, divided in levels and stages.


Chapter 1

Introduction

This chapter presents the ideas and goals of this project. A summary of background theory and the motivation for this study are presented in section 1.1, the goal and research question are explained in section 1.2, and the overall structure of the thesis is summarized in section 1.3.

1.1 Motivation

The fields of Machine Learning and Artificial Intelligence (AI) have in the last decades seen a considerable rise in their models’ ability to solve specific tasks, notably by triumphing over humans in games like Go and Starcraft 2 [1, 2]. These models are increasing in both complexity and cost, leading many researchers to predict a new AI winter (a period of stagnation in development and funding) due to limits on energy and computational resources (materials) following the end of Moore’s law [3, 4, 5]. However, this is where biology might come to the rescue. The mammalian brain is superior to computers in three respects: computational density (materials), power efficiency (energy requirement) and adaptation. The following examples from simulation and machine learning might help provide perspective:

"... a “human-scale” simulation with 100 trillion synapses (with relatively simple models of neurons and synapses) required 96 Blue Gene/Q racks of the Lawrence Livermore National Lab Sequoia supercomputer – and, yet, the simulation ran 1,500 times slower than real-time. A hypothetical computer to run this simulation in real-time would require 12 GW, whereas the human brain consumes merely 20 W." [6]

One may argue that full-scale brain simulations are not needed to achieve super-intelligence, but even the most sophisticated self-learning models today, on state-of-the-art machine learning hardware, require significantly more energy than a human learner: DeepMind’s AlphaGo required 40 days of training on 4 TPUs to achieve a 60-0 win-rate at online Go games [7].

This means it consumed 4 × 75 W = 300 W [8] for 40 days. In comparison, a human brain draws approximately a fifteenth of that power, 20 W [9]. Thus, for the same energy budget, a human brain would get 600 full days, or 1800 Western professional work-days (assuming eight-hour days), to achieve a 60-0 win-rate at online Go.
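The arithmetic behind this comparison can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope estimate: the power figures (75 W per TPU, 20 W for the brain) and the 40 days of training are taken from the text, while the eight-hour work-day and all variable names are assumptions introduced here for illustration.

```python
# Back-of-the-envelope energy comparison between AlphaGo training and a human brain.
TPU_POWER_W = 75.0       # power draw per TPU, as cited in the text
N_TPUS = 4               # number of TPUs used for training
TRAINING_DAYS = 40       # training duration in days
BRAIN_POWER_W = 20.0     # approximate power draw of a human brain
WORKDAY_HOURS = 8.0      # assumption: one professional work-day is 8 hours

total_power_w = N_TPUS * TPU_POWER_W                     # combined TPU power
power_ratio = total_power_w / BRAIN_POWER_W              # TPUs vs. brain
training_energy_kwh = total_power_w * TRAINING_DAYS * 24 / 1000.0

# Days a brain could run on the same energy budget
brain_days = TRAINING_DAYS * power_ratio
# The same budget expressed in eight-hour work-days
brain_workdays = brain_days * 24.0 / WORKDAY_HOURS

print(f"TPU power: {total_power_w} W ({power_ratio} x brain power)")
print(f"Training energy: {training_energy_kwh} kWh")
print(f"Equivalent brain time: {brain_days} days = {brain_workdays} work-days")
```

The estimate only compares power draw during training; it ignores, for example, the energy used by the rest of the body or by supporting hardware.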

Taking inspiration from biological systems could thereby address the current limitations of computational density and efficiency. However, it is well known that "we did not learn how to fly by imitating birds", and approaching biology has its challenges and limitations too. The question is, how close to biological brains should our computers be? Can we find efficient solutions with more biologically plausible models? This thesis aims at exploring how small neural networks with varying degrees of biological plausibility perform on a relatively complex problem. Specifically, I investigate how popular variations of the McCulloch-Pitts neuron, such as Linear Threshold Unit and Sigmoid neural networks (here called ANNs), in FFNNs and RNNs compare against Spiking Neural Networks (SNNs) [10]. SNNs model the human brain to a much higher degree than ANNs and are currently used in state-of-the-art brain simulations. The use of SNNs in AI is currently growing, which indicates a theoretical potential for usage in applications [11]. The general hypothesis is that more biological realism can achieve higher performance on a given task.

Research on Artificial Intelligence (AI). Research on AI has always been inspired by biology and psychology. One could say that the scientific field of AI started when George Boole tried to represent human thought with mathematics and created Boolean logic [12]. From a historical perspective, Boolean logic led to Shannon's theory [13], which is fundamental both to the development of the transistor and to the information theory used in computing programs today. The concept of AI was further examined by thinkers like Alan Turing [14]. One of the earliest neurologically inspired neural networks was the Perceptron, a machine developed by Rosenblatt [15]. The model was refined by Minsky and is fundamental to machine learning models still in use today [16].

Modern computers have immense computing power, and the algorithms based on the Perceptron have been transformed into complex deep-learning systems with recurrent convolutional rectified linear neural networks and the like [17]. The rapidly growing field of AI and Machine Learning is filled with accomplishments that outperform humans on demanding tasks like image recognition, systems control, reading and writing, and computer games. The results are many, and they are becoming ever more numerous thanks to ever-increasing computing power and algorithmic complexity. Alas, the results are also becoming ever more expensive, both in terms of materials (i.e., transistors) and energy consumption during training. Modern state-of-the-art AI is also very task-specific, and it seems that the cognitive abilities of AI systems are often uncorrelated with a system’s task efficiency or even its number of task-specific skills.

State-Of-The-Art AI. An example is GPT-3¹ from OpenAI, an autoregressive language model that uses deep learning to produce and analyse human-like texts [18]. As of 2020, its results are remarkable, and it has been widely discussed whether this model can pass several kinds of written Turing tests. However, as Floridi and Chiriatti point out in [19], there is a dark side to GPT-3 regarding cost and actual capabilities: GPT-3 uses 175 billion parameters, and training them has an estimated cost of 12 million USD. Moreover, the model only really works where the use-case mostly requires syntactical skills. Floridi and Chiriatti went on to test GPT-3 on semantics, ethics, and mathematics/logic, and the results are not as impressive as they are for natural language syntax. Thus it seems that GPT-3 has a high degree of task efficiency and a broad range of task-specific skills but little understanding of what it is doing, much like a highly efficient factory.

¹ https://openai.com/blog/openai-api/

One could argue whether the performance of one task-specific model alone is worth tens of millions of US dollars, a cost which of course is multiplied many-fold when taking into account all AI models and all their generations.

So far, this material development has been enabled by the exponential development of transistors, as predicted by Gordon Moore and popularized by Carver Mead [20]. How long can we push development in the same direction before the resources run out? Can we re-think and create different kinds of models, algorithms, or hardware that might reduce the number of necessary parameters or increase the efficiency of our AI systems?

Biologically Inspired Systems. A popular idea among researchers is that the ANNs used in deep learning are simply too different from mammalian neural systems ever to achieve the same kind of cognitive abilities or low-power computation that is characteristic of biological systems. One of the main differences is that information in biological neuronal networks depends on a spatially discrete and temporally continuous transmission of discrete states (i.e. behavior depends on which output neurons are active, and when), while information in most machine-learning systems is based on only spatially discrete transmission (i.e. behavior depends on which output neurons are active). In other words, while both types of networks have separate neurons, only biological networks operate continuously over time. This difference is bridged in SNNs, where information is transmitted by spatially discrete neurons and an approximation of temporally continuous transmission. However, SNNs are computationally incompatible with the traditional von Neumann architecture. This is tackled by the development of neuromorphic electronic systems. By definition, these are low-power, massively parallel, and modular, with collocated memory and processing units. Thus, by relying on intrinsically parallel computing where each neuron is hardware-simulated on a single chip, many of the bottlenecks of more serial computing can be avoided.

Further, the spatio-temporal discrete nature of SNNs might be a perfect fit for processing streams of data in a dynamic computing landscape where a high number of units are providing data for various kinds of applications.

Additionally, using SNNs to process data might be more energy efficient than traditional machine learning as SNNs can be trained while they are running, thanks to the biologically inspired plasticity in the neuronal and synaptic models. It is worth noting that plasticity has also recently been implemented in Deep ANNs, although plasticity is not an inherent characteristic of ANNs [21]. If these stream-processing SNNs can be run on energy-efficient neuromorphic systems, then the cost of processing big data can be reduced drastically. One such system is BSS-1, which is incredibly fast, massively parallel, energy-efficient, and implements neural models that have plasticity [22].

1.2 Goal and Research Question

The Research Question: Do neural network models with more biological plausibility, through recurrence and spiking neurons, perform better at real-time tasks that require working memory, active perception and categorization, compared to more standard ANNs?

The future aim of this study is to train networks implemented on a neuromorphic system, such as BSS-1, in an environment that requires some degree of memory and "understanding", or in other words, partial cognitive abilities. This solution is intended to reduce the high resource costs of both AI applications in industry and cognitive simulations in biological research.

However, in order to determine whether an SNN implemented on a neuromorphic system is beneficial compared to more standard ANNs, it is prudent to solve the given tests in silico first. This includes configuring optimal experimental environments, tuning network and neuron properties, and developing training algorithms. In addition, as the move from ANN to neuromorphic SNN involves several fundamental changes (e.g., computing platform or neuron model complexity) that might on their own confer benefits or drawbacks in terms of task performance, a step-wise approach is beneficial.

Ideally, the experimental environment should be set up so that one could easily switch between software-based simulation, hardware-based simulation, and network and model characteristics, without much impact on the code or on the possibility of a direct comparison. However, as networks and neuron models become more complex, so does the calibration of parameters needed to reduce the computational costs associated with training networks on a task. Thus, to some extent, in silico optimization is required, making a direct comparison between different implementations difficult.

In other words, many of the steps from ANN to neuromorphic SNN could be delved into "indefinitely".

The goal of the present study is to provide an answer mainly on one crucial step in the process of reaching the future target of low-power neuromorphic applications, by testing whether an SNN could outperform other ANNs on a chosen task that covers several abilities necessary for many real-world applications. In order to best answer the research question, there is a need for appropriate training algorithms and simulation methods. Note that the coverage of these topics in this work will be synthetic and by no means exhaustive.

The Goal: To simulate and tune an SNN to achieve the desired behavior on a relatively complex problem, and compare it to an ANN on the same task.


Although the present study is a sub-study, it can answer essential questions in itself. An SNN is more complicated than other variations of the McCulloch-Pitts neuron, i.e. threshold neurons and sigmoidal neurons; the study therefore examines whether the setup can be tuned so that a minimal recurrent SNN can achieve higher fitness (or reduced training time) on a relatively complex task than these other, more popular ANNs with about the same number of neurons. This sub-study disregards power- and time-efficiency (computational time), although this aspect would be considered once an instance on a neuromorphic system is complete.

The Challenges and the Hypothesis. I assume that networks with higher fitness on this kind of test have higher AI capabilities in general, but I also consider the number of hidden nodes in the network. When it comes to algorithms, I mainly focus on the number of generations it takes to converge to a solution with high fitness.

The hypothesis is that an RNN would potentially perform better on complex tasks than an FFNN, and that an SNN would perform better than an ordinary ANN.

Ideally, I would also measure the power-efficiency of the networks. However, I can already assume that an SNN is more costly than an ordinary ANN and that an RNN is more costly than an FFNN. Thus, measuring efficiency would make no sense before a physical, neuromorphic instance of the SNNs is run successfully. However, while one iteration of training might take more time in computational terms, it is of interest to look at how many iterations are required to reach "peak performance" for both SNNs and ANNs.

A challenge of this sub-study is linked to setting up and tuning an SNN to achieve an adequate performance on a relatively complex task. Here, a relatively complex task is taken to be a task that is not trivially easy to solve for a small standard ANN, such as dynamic tasks with partial information availability. Another challenge is finding appropriate tests, simulators, and training algorithms to conduct the experiments. The challenges typical of training SNNs are often related to tuning the vast number of parameters in the biological models of neurons and synapses. To overcome this, I chose neuron models that are simplified yet of sufficient electrophysiological detail. As an SNN is modeled by multi-dimensional differential equations that are relatively costly to run, I would need to simulate them with optimized algorithms. Training SNNs with gradient descent has mostly been done with an approximated gradient and is computationally expensive [23, 24, 25, 26]. Therefore, I want a training algorithm that provides a gradient-free framework that can efficiently search the huge parameter space. I chose to use EAs because they are a well-tested and popular family of algorithms of this kind. While evolution is not a biological learning mechanism of the brain, it is undoubtedly a biological mechanism proven efficient with biological systems. Besides, any solution reached by a classical training algorithm like backpropagation can, in principle, also be reached by evolutionary methods. The test chosen to probe basic cognitive abilities is a simplified ACP task that requires the processing of spatio-temporal partial information and has been extensively used in earlier cognitive psychology studies. To simplify the workload, this project’s primary measure is simply the solution’s fitness on this task, as the fitness on this task has already been shown to be correlated with higher cognitive abilities [27, 28, 29]. A challenge here is to properly integrate the SNN with the task, which mainly means encoding and decoding the data from and to the task in a way that is "meaningful" for an SNN.
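To make the idea of gradient-free search concrete, the following is a minimal sketch of a simple genetic algorithm with tournament selection, one-point crossover, Gaussian mutation, and elitism. It is not the SGA or NEAT configuration used in this project: the function names, the toy fitness function, and all parameter values are hypothetical and chosen only for illustration. In an actual experiment, the genome would encode network parameters and the fitness would be the agent's score on the ACP task.

```python
import random


def evolve(fitness, genome_len, pop_size=30, generations=60,
           mutation_std=0.1, elite=2, seed=0):
    """Evolve real-valued genomes (genome_len >= 2) to maximize `fitness`.

    Returns the best genome of the final population and the best fitness
    observed at the start of each generation.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Elitism: the best genomes survive unchanged.
        next_population = [genome[:] for genome in ranked[:elite]]
        while len(next_population) < pop_size:
            # Tournament selection: the fittest of three random genomes wins.
            parent_a = max(rng.sample(ranked, 3), key=fitness)
            parent_b = max(rng.sample(ranked, 3), key=fitness)
            # One-point crossover followed by Gaussian mutation of every gene.
            cut = rng.randrange(1, genome_len)
            child = parent_a[:cut] + parent_b[cut:]
            child = [gene + rng.gauss(0.0, mutation_std) for gene in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness), history


# Hypothetical toy problem: recover a fixed target vector.
TARGET = [0.5, -0.3, 0.8, 0.1]


def toy_fitness(genome):
    """Negative squared distance to TARGET (higher is better, maximum is 0)."""
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))


if __name__ == "__main__":
    best, history = evolve(toy_fitness, len(TARGET))
    print("best genome:", [round(g, 3) for g in best])
    print("fitness, first vs. last generation:", history[0], history[-1])
```

Because the elite genomes are copied without mutation, the best fitness can never decrease from one generation to the next. No gradient of the fitness function is required, which is the property that motivates the use of EAs here.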

Summary. While there are several outstanding applications of NNs in use all around us, the models in use have increasing resource requirements and relatively narrow use-cases. This can be seen as unsustainable, and therefore, new ways of doing AI might be the way to go. One new way is the more biologically inspired SNN. However, even though the models themselves have the potential to be computationally more powerful, there are few successful applications of SNNs. The lack of successful applications can be linked to training methods, ineffective ways of simulating the models, inappropriate use-cases, and the limited availability of suitable computational platforms such as neuromorphic hardware.

The ultimate goal of this project’s work is the future exploration of algorithms and use-cases on simulators that are fundamentally different from the ones in use today, namely neuromorphic electronic systems. These systems are currently very immature and the road to deploying successful applications is long. Given these constraints, this thesis focuses specifically on one part of the road to applied neuromorphic computing, namely training SNNs with EAs. The contributions of this work are to highlight important aspects to consider with regard to evaluation methods and simulation methods.

The results indicate a correlation between biological plausibility and performance, which is an important motivation for neuromorphic computing, but the data are still insufficient for making any strong claims.

1.3 Thesis Structure

First, I will detail some of the published theoretical background motivating this project and background on the tools, technologies, and methods used in this project. Then, I will describe implementation details, parameter selection, experimental setup, and general approach. Following the methodological description, I will present the experiments performed and their results. Lastly, a concluding chapter will discuss the results and limitations of the current work as well as highlight the future work needed.


Chapter 2

Background Theory

This chapter introduces some fundamental theory that is related to the goals of this project. Section 2.1.1 briefly covers some of the history of AI as is appropriate for a study on the cognition of AI agents. An overview of the computational cost of AI-research is presented in section 2.1.2. In section 2.1.3 there is a short introduction to biologically inspired systems and how we could develop more computationally efficient systems by drawing even more inspiration from nature. By adapting our computers to biological architectures, some would hope to come closer to developing artificial general intelligence, but how is this intelligence to be measured? Section 2.1.4 shines some light on the existing theories in this field. The remaining sections cover background theory that is directly applied to the implemented methods of this project.

Human Brain Project (HBP). The HBP is a European Flagship that is developing a European research infrastructure advancing brain research, medicine, and brain-inspired information technology for both industry and science. The project has now entered its third phase, with over 100 partnering institutions from over 20 countries in Europe and over 100 partnering projects. There are 12 subprojects in HBP that span the development of six ICT-based Platforms. One of these six platforms is the Neuromorphic Computing (NMC) Platform, with two systems: the mixed-signal VLSI BrainScaleS (Brain-inspired multiScale computation in neuromorphic hybrid systemS) and the massively parallel digital SpiNNaker (Spiking Neural Network architecture) [30]. The present study contributes to a subproject in the HBP.

2.1 General Background

2.1.1 Brief History of AI Research

Perhaps the finest work of the philosopher George Boole was his mathematical treatise An Investigation of the Laws of Thought, which advances the notion that human reasoning could be summed up by symbolic logic [12]. This work later inspired Claude Shannon to build the first "switching circuits" that facilitated the design of combinatorial logic circuits and led to the vacuum tube technology used in early computers. John von Neumann drew parallels to human neurons in his report on the EDVAC, which was used by Alan Turing as a theoretical source while designing the Automatic Computing Engine [31]. Combinatorial logic eventually led to the development of modern Information Theory, which was proposed by the same Shannon in his A Mathematical Theory of Communication. Thus, the foundations of our computer programs are also rooted in the search for intelligent systems. The most successful artificial intelligence programs were, for a long time, only compositions of logical if-then sentences [3].

Today, a significant part of the AI research put into artificial replication of mammalian function or behavior involves taking inspiration from biology, namely neural network research. In 1943, the first mathematical neuron model was published by McCulloch and Pitts, which modeled the neuron as a logic gate [32]. Through improvements to the McCulloch-Pitts neuron by scientists such as Hebb [33], we eventually got the Perceptron. Introduced by Rosenblatt [15] and refined by Minsky et al. [16], the Perceptron is a binary classifier which models synaptic plasticity and thus enabled the first "self-learning" ANN algorithm, called the Single-Layer Perceptron (SLP). In 1974, Paul Werbos developed backpropagation, which enables propagating the prediction error of the approximated function as a gradient through several hidden layers of SLPs [34]. This led to the Multi-Layer Perceptron (MLP), which is able to approximate any continuous function [35], and this in turn made ANNs more popular.
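The threshold-gate view of the neuron and the error-driven weight update of the Perceptron can be sketched in a few lines. This is an illustrative toy rather than a reproduction of the historical models: the function names, the learning rate, and the choice of the logical AND function as training data are assumptions made here.

```python
def threshold_unit(inputs, weights, bias):
    """Threshold neuron: outputs 1 if the weighted input sum exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: shift weights towards the target whenever the output is wrong."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - threshold_unit(inputs, weights, bias)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table of the logical AND function, used as a linearly separable toy task.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    # Hand-set weights turn the unit into a logic gate.
    print("hand-wired AND(1, 1):", threshold_unit((1, 1), (1.0, 1.0), -1.5))
    # The same function can instead be learned from examples.
    weights, bias = train_perceptron(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        print(inputs, "->", threshold_unit(inputs, weights, bias), "target:", target)
```

A single unit of this kind can only separate classes with a linear boundary, which is the limitation that hidden layers trained with backpropagation later removed.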

Today most artificial intelligence programs are based on arithmetic approximation of complex functions through millions of logic-gate transistors. ANNs, especially with the method of Deep Learning pioneered by LeCun et al. [36], are now among the most popular methods in modern artificial intelligence [3]. The use of deep learning, that is, networks with vast numbers of hidden layers, on modern-day computer architecture has proven very efficient for most computational tasks but very inefficient for intricate tasks that are easily solved by biological systems. However, modern computing is still rooted in the same "outdated" knowledge of the human brain. As our expectations of the computer’s functionality grow increasingly similar to the functionality of biological systems, we will require devices that are not arithmetic machines but rather "thinking machines." It is likely that the resources would run out before we could develop "thinking machines" with today’s architecture.

2.1.2 Cost of Computations

Three of the primary problems with massive computation today are power consumption, material requirements and computational density. The development towards higher computational density was previously driven by Moore’s law of exponential CMOS growth [37]. The continual cramming of transistors is also the source of the increasing requirements for energy and materials. Additionally, as massively parallel systems stack ever more CPUs onto their chips, the cost of the “von Neumann” bottleneck becomes apparent. The bottleneck refers to the fact that an instruction fetch and a data operation cannot occur at the same time on a von Neumann architecture [31].


Figure 2.1: A figure showing the power efficiency of various computing platforms. From a presentation by Steve Furber.


One possible solution is to move away from hierarchical deep learning convolutional networks of binary or numerical threshold gates, towards more recurrent flexible networks with biological online learning mechanisms. These types of networks are called Spiking Neural Networks (SNNs).

Wolfgang Maass shows in his 1996 article that SNNs are, in theory and with regard to the number of neurons that are needed, computationally more powerful than other types of ANNs [10]. However, when it comes to power efficiency, the current software implementations of SNNs are computationally incompatible with the traditional von Neumann architecture.

A proposed solution to all the abovementioned challenges is Neuromorphic Electronic Systems, introduced in 1990 by Carver Mead, the same man who coined the term “Moore’s law” [38]. By definition, these are low-power, massively parallel, and modular, with collocated memory and processing units. SNNs are modeled by multi-dimensional differential equations simulating cells, synapses, and electrophysiological dynamics. In neuromorphic systems, cells and synapses can be simulated by only a few transistors, often in a massively parallel and sped-up fashion [39]. Thus, by relying on intrinsically parallel computing where each neuron is hardware-simulated on a single chip, many of the bottlenecks of more serial computing can be avoided.

Recently, Rajendran et al. published excellent comparisons of such systems [5]. Albada et al. describe how neuromorphic systems use less energy than HPC systems per synaptic event by several orders of magnitude. This is visualized in fig. 2.1, which is taken from a presentation created by Steve Furber, the leader of the SpiNNaker platform in HBP. Cao published a good example of how these systems can replace Deep Convolutional Nets for Energy-Efficient Object Recognition [40].

2.1.3 Biologically Inspired Systems

A popular idea among researchers is that the neural networks used in deep learning are simply too different from mammalian neural systems ever to achieve the same kind of cognitive abilities. One of the most remarkable differences is that information in biological neurons is based on the spatio-temporal transmission of discrete states, while the information in most machine learning systems of the ANN type is based on the transmission of numerical values. This difference is bridged in SNNs, where information is transmitted by spatio-temporal discrete states. In other words, an ANN processes information in discrete time-steps and outputs numerical values, whereas an SNN processes information continuously but outputs binary signals.

As our understanding of neurophysiology has progressed, more advanced models have been developed, including details of the ion flows and proteins that affect the membrane potential [41]. These models are often called Leaky Integrate-and-Fire (IAF) Neurons, and the neural networks are often called SNNs. From one point of view, biology has been the source of inspiration for most successful solutions for AI, for example RNNs and EAs [42].
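The contrast between the two unit types described in this section can be illustrated with a minimal sketch: a sigmoid unit that maps its inputs to a numerical value in one discrete step, and a leaky integrate-and-fire neuron whose membrane potential is integrated over time (here with a simple forward-Euler scheme) and which emits binary spikes. The parameter values, units, and function names are illustrative assumptions, not the neuron models or simulator settings used in this project.

```python
import math


def sigmoid_unit(inputs, weights, bias):
    """Classical ANN unit: one discrete update, numerical output in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def simulate_lif(input_current, dt=0.1, tau_m=10.0, v_rest=-65.0,
                 v_thresh=-50.0, v_reset=-65.0, resistance=10.0):
    """Leaky integrate-and-fire neuron, integrated with forward Euler.

    Membrane equation: tau_m * dV/dt = -(V - v_rest) + resistance * I(t).
    Units are illustrative (ms, mV, MOhm, nA). Returns spike times in ms.
    """
    v = v_rest
    spike_times = []
    for step, current in enumerate(input_current):
        v += dt / tau_m * (-(v - v_rest) + resistance * current)
        if v >= v_thresh:
            # Threshold crossing: emit a binary spike and reset the membrane.
            spike_times.append(step * dt)
            v = v_reset
    return spike_times


if __name__ == "__main__":
    print("sigmoid output:", sigmoid_unit([1.0, 0.5], [0.8, -0.4], 0.1))
    for amplitude in (1.0, 2.0, 3.0):
        spikes = simulate_lif([amplitude] * 1000)  # 100 ms of constant input
        print(f"I = {amplitude} nA -> {len(spikes)} spikes")
```

With these values, a constant input of 1 nA leaves the membrane potential below threshold, so the neuron stays silent, while stronger inputs produce spikes at an increasing rate. The information is thus carried by when and how often the neuron fires rather than by a numerical output value.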

SNNs might be the next step towards an agent of low-cost human-level intelligence. Even though our computational power has grown exponentially for 70 years, the efficiency of our systems does not extend to extensive simulations running SNNs, because a detailed simulation of one neuron could be as costly as that of a whole ANN. Thus, a more suitable technology might be required, such as neuromorphic systems, which can allow neurons to be simulated by a few (or even single) transistors in a massively parallel and sped-up fashion [39, 43, 44].

Thus, what we would expect of the thinking computer of tomorrow is both robustness and efficiency [38], two characteristics that are highly developed in biological systems. Indeed, among the huge diversity of biological organisms and systems, evolutionary processes have kept a large panel of elements that complete the same conserved function. This functional redundancy enables structurally different elements to perform the same function under certain conditions, while they can have distinct functions under other conditions [45].

The high utilization of the characteristics of the elements in biological systems is the key to their high efficiency in terms of energy and material, and it inspires researchers in the design of artificial systems [38]. It is essential to consider that computer architecture has been modeled after the brain since the beginning, but always on a simplified notion compared to the current knowledge of biology and psychology.

While more biologically inspired systems might or might not be more efficient at solving specific tasks, classifications, or algorithms, and can even be trained explicitly to play complex games like Starcraft 2, they are far from managing even "simple" biological behavior.

2.1.4 Intelligence, Fitness and Consciousness

If a system can solve the same intricate tasks as a human, with ease, and also behaves in general like a human, it could likely pass the Turing Test, but would it then have a human level of intelligence? Based on the plethora of theories regarding intelligence that exist today, the answer could be both yes and no [3]. If a system were to have a human level of intelligence, one could argue that it would also be conscious, based on the definition of consciousness. Some consciousness theories predict that if a system is connected in the right way (e.g., is recurrent) and its dynamics are complex enough, that system would be conscious. Integrated Information Theory (IIT) is one such theory, and it even provides mathematical tools to assess the "degree" to which a system is conscious or not [46]. Further, the theory has also been used to show a link between this "degree" of consciousness (termed Phi) and a system’s cognitive abilities as measured by an active categorical perception task [28]. While the system employed in that study was more similar to a classical ANN (although it was recurrent), it is possible that more complex neuron models would result in a higher Phi and more robust capabilities, per the inferred link between intelligence and consciousness [46].

However, IIT’s Phi measure is hugely demanding to calculate, and only smaller systems of up to a few dozen nodes can be analyzed. As such, the development of "high Phi" systems is both computationally intensive and no guarantee for increased cognitive capacity. Most systems today that can be considered intelligent are developed on traditional computers. The problem with traditional computers is that they are highly modular, with different parts ultimately fulfilling their function in the bigger system. Modularity implies a low level of integration by definition, and it follows that all programs implemented on this architecture will have a low level of integration. IIT specifically rules out consciousness in simulated systems as long as the simulation is performed on the modular or serial architecture found in modern computers. Thus, using IIT to evaluate the intelligence of an autonomous computer program running on such an architecture seems to be useless. By contrast, some neuromorphic systems, built to resemble neural architecture, are massively integrated. A program written for one of these systems might be considered conscious by IIT and might lead to new insights into the requirements for creating a system with a human level of intelligence.

Figure 2.2: An overview of the most popular ANN algorithms. From [49]

2.2 Technical Background

2.2.1 Artificial Neural Networks (ANNs)

A popular AI algorithm is the ANN, whose creation was motivated by the quest to make self-learning computers able to induce and deduce. Although the biological brain has been an excellent inspiration for ANNs, most learning algorithms used by ANNs are, strictly speaking, developed from classical statistical pattern analysis. Most ANN algorithms have two things in common. First, the network models consist of many connected artificial neurons, and these networks learn from data by representing learned knowledge in their model parameters. Second, an ANN model is unbiased towards the data distribution before learning [47] (Chapter 6.01, Pages 1-17).

When various types of ANNs are mentioned in AI literature, most of them still use simple artificial neurons based on the McCulloch-Pitts neuron model. What usually separates the ANN types is the activation function and the connectivity model. The activation functions used in this study are the Linear Threshold Unit and the Sigmoid neuron [48]. The connectivity models used in this study are the MLP (classified as a FFNN) and the RNN.

A RNN is a feed-forward network where recurrent connections are allowed, that is, connections to other neurons in the same layer, including self-connections [49]. See figure 2.2 for an overview of the most popular ANN models.


Continuous-Time Recurrent Neural Network (CT-RNN) CT-RNNs are biologically plausible and provide an intermediary step between sigmoidal neurons and SNNs, but they are still classified as ANNs in this study.

The biological interpretation of CT-RNNs is plausible because their variable internal state is analogous to a neuron’s mean membrane potential, and the output can be associated with its firing rate. The network model is a promising intermediary step because, even though it is one of the simplest nonlinear, continuous dynamical models, it retains many qualities of Sigmoidal RNNs. Thus, CT-RNNs have been used successfully in biological modeling and AI applications like evolutionary robotics and other control tasks [27, 50].

2.2.2 Spiking Neural Networks (SNNs)

Third Generation ANN SNNs, often called the third generation of ANNs, consist of neuron and synaptic models from modern computational neuroscience. These advanced neuronal models result from the best of physiology’s knowledge of synaptic connections between neurons and of what stimulates the membrane potential of the cells, even at a molecular level [51]. The third generation of ANNs is far from representing all the characteristics of the biological nervous system, but it introduces one especially important aspect that earlier generations have trivialized, namely the actual output of the biological neuron [10]. In 1997, Wolfgang Maass proved theoretically that SNNs have at least the same computational power as MLP and Sigmoid networks. Additionally, it was proved that many functions can be computed with significantly fewer neurons in a SNN with continuous and piece-wise linear response and threshold functions [10]. Biological neurons are inherently different from networks of simple logic gates like MLP, and modeling them could open up new horizons of biologically inspired computing [32, 52].

Spatio-Temporal Coding In addition to being coded in terms of position, that is, where a rate of spikes occurs, information in the brain is also temporally coded. The output of a neuron consists of the set of points in time when the neuron fires. SNNs allow investigation of how time can be used as a resource in ANNs [10].

Noise-as-a-Resource It is widely discussed among neuroscientists whether noise (as probabilistic inference) is one of several coding schemes for knowledge in the biological brain. Probabilistic inference is a powerful tool used in many applications, especially computer vision and robotics. Maass reviews whether noise can be used as a resource for coding knowledge in SNNs, and shows results indicating that networks can store valuable information in noise [53]. With mixed-signal neuromorphic simulators like BSS-1, which allow very small time constants, noise could be readily available. Noise could be used as a resource if we develop training algorithms that can utilize it [22]. Noise is not a main focus of the present study, but it might be taken into moderate consideration.


Simulation As already mentioned, SNNs are costly to simulate [54]. Additionally, there will be a vast number of spikes, because the time constant has to be sufficiently small for the networks to be continuous and piecewise linear. The high volume of operations and communication happening in parallel can quickly strain the limited bandwidth. See section 2.1.2 on page 8. To date, SNNs still lag behind ANNs in both efficiency and accuracy, except in some special cases when processing spatio-temporal data [26]. Some may argue that impressive results have been achieved using traditional ANNs and the von Neumann architecture, and that we therefore should just continue in that direction while making our models better and our hardware cheaper. However, if we want to create autonomous systems that can efficiently mimic the capabilities of biological creatures, others may argue that systems closely related to the architecture of the very efficient brain are the way forward [55, 40]. It remains debated whether neuronal spikes (as opposed to, e.g., firing rate) are the correct way to describe neuronal communication [41]. However, neuromorphic architecture today is mainly defined as low-power, non-von Neumann architecture, e.g. massively parallel, highly connected, and modular with collocated processing and memory [56].

2.2.2.1 Biologically Plausible Neuron Models for SNNs

This section covers neuron models that replicate biological neurons to varying degrees. See fig. 2.3 for an illustration of the most important neurophysiological terms.

Neuron Models An example is the Hodgkin-Huxley model, which describes neurons’ electrical characteristics much like a conductance-based electrical circuit. Patch-clamp experiments (a measurement method in biology) have shown that it models neural dynamics with high accuracy [58]. Another example is the popular Izhikevich model, which is simpler than the Hodgkin-Huxley model but still reproduces many of its computational features at a lower computational cost [59].

Another popular class of spiking neuron models is IAF, which vary in their complexity, with Leaky IAF being the most popular. Leaky IAF comes with several additions, like Exponential Leaky IAF and Adaptive Exponential Leaky IAF. Typical SNNs usually incorporate one of these variations of IAF [56].

There are many more models, and some of the most popular are: Spiking Feed-Forward, Spiking Deep Belief, Spiking Hebbian, Spiking Hopfield, Associative Memories, Spiking Winner-Take-All, Spiking Probabilistic, and Spiking Random. Common to all of these is that the typical training methods are similar to traditional ANN training methods (i.e. gradient descent, backpropagation, etc.), which might not utilize the full potential of either the model or a neuromorphic system. [56]

Synaptic Models Synaptic models aim to describe how two neurons communicate through the synapse. Some merely copy the pre-synaptic signal, while other variants model neurotransmitter release and uptake, Long-Term Potentiation (LTP) and Long-Term Depression (LTD) (i.e. plasticity, such as Spike-Timing-Dependent Plasticity (STDP)), and even computations [60]. While synapse properties can be simplified to a "weight" (i.e. how much to modulate the incoming signal), plasticity rules in particular can be important for online learning. The BrainScaleS system implements several of these rules, including STDP with both LTP and LTD [22, 61, 56]. Utilizing models with plasticity rules might be the way forward for training SNNs, but this is not done in the present study.

Figure 2.3: An artistic illustration of the most important neurophysiological components. [57]

2.2.3 Summary of Neuron Models

This section summarizes the neuron models used in this study. As mentioned previously, threshold-activated neuron models are primarily differentiated by their activation functions, leading to the naming below:

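Linear Threshold Unit (Threshold Logic Unit) For neuron $n$, the activation function $f_n$ would yield the following for input $x$:

$$
f_n(x) =
\begin{cases}
1, & \text{if } b_n + \sum_{i=1}^{\text{inputs}} v_i \times w_i \geq t_n \\
0, & \text{otherwise}
\end{cases}
\tag{2.1}
$$

where $v$ is the input value, $w$ is the weight, $b$ is the bias and $t$ is the threshold.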
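Equation 2.1 can be sketched directly in Python. The function name, the argument names and the AND-gate example below are illustrative assumptions and are not taken from the thesis code.

```python
from typing import Sequence


def linear_threshold_unit(
    values: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float,
) -> int:
    """Return 1 if bias + sum(v_i * w_i) >= threshold, else 0 (eq. 2.1)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    activation = bias + sum(v * w for v, w in zip(values, weights))
    return 1 if activation >= threshold else 0


# Hypothetical example: with unit weights, zero bias and threshold 2,
# the unit fires only when both binary inputs are active (a logical AND).
def and_gate(a: int, b: int) -> int:
    return linear_threshold_unit([a, b], [1.0, 1.0], bias=0.0, threshold=2.0)
```

Because the output is a hard step, the unit is not differentiable at the threshold, which is one reason the bounded and differentiable sigmoid is often preferred when gradients are needed.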


Sigmoid Neuron These neurons are characterized by having an activation function that outputs an "S"-shaped curve. The most important requirements of a sigmoid function are that it is bounded, differentiable and real-valued [35]. Many functions are sigmoid, but the one used in this project is the logistic function:

$$
S(x) = \frac{1}{1 + e^{-x}}
\tag{2.2}
$$
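A minimal Python sketch of eq. 2.2 follows. The branch for negative inputs is an implementation choice made here, not something described in the thesis: it uses the algebraically identical form e^x / (1 + e^x) so that the exponential is never evaluated on a large positive argument.

```python
import math


def logistic(x: float) -> float:
    """Logistic sigmoid S(x) = 1 / (1 + exp(-x)) from eq. 2.2."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form for negative x; avoids overflow in exp(-x).
    z = math.exp(x)
    return z / (1.0 + z)
```

The function is bounded between 0 and 1 and symmetric around S(0) = 0.5, so S(x) + S(-x) = 1.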

Dynamical Neuron The CT-RNN is modeled as a system of ordinary differential equations, with neuron potentials as the dependent variables:

$$
T_i \frac{dy_i}{dt} = -y_i + f_i\left(\beta_i + \sum_{j \in A_i} w_{ij} y_j\right)
\tag{2.3}
$$

where $T_i$ is the time constant of neuron $i$, $y_i$ is the potential of neuron $i$, $f_i$ is the activation function of neuron $i$, $\beta_i$ is the bias of neuron $i$, $A_i$ is the set of indices of neurons that provide input to neuron $i$, and $w_{ij}$ is the weight of the connection from neuron $j$ to neuron $i$.

The time evolution of the network is computed using the forward Euler method:

$$
y_i(t + \Delta t) = y_i(t) + \Delta t \frac{dy_i}{dt}
\tag{2.4}
$$
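The update in eqs. 2.3 and 2.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: every neuron uses the logistic function as its activation $f_i$, the input set $A_i$ is represented implicitly by a dense weight matrix in which a zero weight means "no connection", and all names are hypothetical.

```python
import math
from typing import List


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_derivative(
    y: List[float],
    time_constants: List[float],
    biases: List[float],
    weights: List[List[float]],
) -> List[float]:
    """dy_i/dt = (-y_i + f(beta_i + sum_j w_ij * y_j)) / T_i  (eq. 2.3)."""
    n = len(y)
    dydt = []
    for i in range(n):
        # weights[i][j] is the weight of the connection from neuron j to neuron i.
        net_input = biases[i] + sum(weights[i][j] * y[j] for j in range(n))
        dydt.append((-y[i] + logistic(net_input)) / time_constants[i])
    return dydt


def euler_step(
    y: List[float],
    time_constants: List[float],
    biases: List[float],
    weights: List[List[float]],
    dt: float,
) -> List[float]:
    """One forward Euler step: y(t + dt) = y(t) + dt * dy/dt  (eq. 2.4)."""
    dydt = ctrnn_derivative(y, time_constants, biases, weights)
    return [yi + dt * di for yi, di in zip(y, dydt)]


def simulate(y0, time_constants, biases, weights, dt, steps):
    """Apply euler_step repeatedly and return the final state."""
    y = list(y0)
    for _ in range(steps):
        y = euler_step(y, time_constants, biases, weights, dt)
    return y
```

For a single unconnected neuron with zero bias, the equation reduces to $T\,dy/dt = -y + 0.5$, so the state should relax towards 0.5. Forward Euler is only stable when the step $\Delta t$ is small relative to the smallest time constant, which is worth keeping in mind when time constants are themselves evolved.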

The Izhikevich neuron The Izhikevich model reduces a Hodgkin-Huxley-type neuronal model to a two-dimensional system of ordinary differential equations of the form

$$
v' = 0.04v^2 + 5v + 140 - u + I
\tag{2.5}
$$

$$
u' = a(bv - u)
\tag{2.6}
$$

with the after-spike reset

$$
\text{if } v \geq 30\ \text{mV, then }
\begin{cases}
v \leftarrow c \\
u \leftarrow u + d
\end{cases}
\tag{2.7}
$$

Here $v$ and $u$ are dimensionless variables, $a$, $b$, $c$ and $d$ are dimensionless parameters, and $' = d/dt$, where $t$ is the time. $v$ represents the membrane potential of the neuron and $u$ represents a membrane recovery variable, which accounts for the activation of K⁺ ionic currents and the inactivation of Na⁺ ionic currents, and which provides negative feedback to $v$. After the spike reaches its apex (+30 mV), the membrane voltage and the recovery variable are reset according to eq. (2.7). Synaptic currents or injected dc-currents are delivered via the variable $I$. The parameter $a$ describes the time scale of the recovery variable $u$. The parameter $b$ describes the sensitivity of the recovery variable $u$ to the subthreshold fluctuations of the membrane potential $v$. The parameter $c$ describes the after-spike reset value of the membrane potential $v$ caused by the fast high-threshold K⁺ conductances. The parameter $d$ describes the after-spike reset of the recovery variable $u$ caused by slow high-threshold Na⁺ and K⁺ conductances. [59]
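The dynamics in eqs. 2.5 to 2.7 can be illustrated with a short forward Euler simulation. This is a sketch under assumptions: the default parameter values (a = 0.02, b = 0.2, c = -65, d = 8) are the commonly quoted "regular spiking" setting and are not stated in this section, the step size and initial conditions are arbitrary choices, and the function name is hypothetical.

```python
from typing import List


def simulate_izhikevich(
    current: float,
    duration_ms: float,
    dt: float = 0.5,
    a: float = 0.02,   # assumed: time scale of the recovery variable u
    b: float = 0.2,    # assumed: sensitivity of u to subthreshold v
    c: float = -65.0,  # assumed: after-spike reset value of v
    d: float = 8.0,    # assumed: after-spike increment of u
) -> List[float]:
    """Integrate eqs. 2.5-2.7 with forward Euler and return spike times in ms."""
    v = c
    u = b * v
    spike_times = []
    steps = int(round(duration_ms / dt))
    for step in range(steps):
        if v >= 30.0:
            # Spike apex reached: record the event and apply the reset (eq. 2.7).
            spike_times.append(step * dt)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current  # eq. 2.5
        du = a * (b * v - u)                               # eq. 2.6
        v += dt * dv
        u += dt * du
    return spike_times
```

With these assumed parameters, a constant input of I = 10 produces repeated spiking, while I = 0 lets the state settle at rest without spiking. Swapping in other (a, b, c, d) values changes the firing pattern, which is what makes the four parameters a natural target for tuning.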


The Conductance-based Exponential IAF model This model is a two-dimensional IAF model that combines an exponential spike mechanism with an adaptation equation. Like the Izhikevich model, this model simplifies the Hodgkin-Huxley model while retaining some detail in capturing neurons’ electrophysiological behavior. The simplification yields a smaller parameter set that can be tuned more efficiently, and the tuning might thus be automated, for example, by an evolutionary algorithm.

The membrane potential is given by the following differential equation:

$$
C \frac{dV}{dt} = -g_L (V - E_L) + g_L \Delta_T \exp\left(\frac{V - V_T}{\Delta_T}\right) - g_e(t)(V - E_e) - g_i(t)(V - E_i) - w + I_e
\tag{2.8}
$$

where $C$ is the membrane capacitance, $g_L$ is the leak conductance, $E_L$ is the resting potential, $\Delta_T$ is the slope factor, $V$ is the membrane potential, $V_T$ is the threshold potential, $E_e$ is the excitatory reversal potential, $E_i$ is the inhibitory reversal potential, $g_e(t)$ is the excitatory synaptic conductance, $g_i(t)$ is the inhibitory synaptic conductance, $w$ is the spike-adaption current, and $I_e$ is the constant external input current. The equation

$$
T_w \frac{dw}{dt} = a(V - E_L) - w
\tag{2.9}
$$

defines the adaption current $w$, where $T_w$ is the time constant and $a$ represents the level of the subthreshold adaption. [62]
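Equations 2.8 and 2.9 can be integrated with the same forward Euler scheme. The sketch below rests on several assumptions that are not given in this section: the synaptic conductances $g_e(t)$ and $g_i(t)$ are set to zero so that only the constant current $I_e$ drives the neuron; a spike is registered when $V$ crosses a cut-off `v_peak`, after which $V$ is reset to `v_reset` and $w$ is incremented by `b`; and the default values (in pF, nS, mV, pA and ms) are commonly quoted values for this model and are used purely for illustration.

```python
import math
from typing import List


def simulate_adex(
    i_e: float,              # constant input current I_e in pA
    duration_ms: float,
    dt: float = 0.05,        # ms
    c_m: float = 281.0,      # assumed membrane capacitance C in pF
    g_l: float = 30.0,       # assumed leak conductance g_L in nS
    e_l: float = -70.6,      # assumed resting potential E_L in mV
    v_t: float = -50.4,      # assumed threshold potential V_T in mV
    delta_t: float = 2.0,    # assumed slope factor Delta_T in mV
    tau_w: float = 144.0,    # assumed adaption time constant (T_w in eq. 2.9) in ms
    a: float = 4.0,          # assumed subthreshold adaption in nS
    b: float = 80.5,         # assumed spike-triggered adaption in pA
    v_reset: float = -60.0,  # assumed reset potential in mV
    v_peak: float = 0.0,     # assumed spike cut-off in mV
) -> List[float]:
    """Forward Euler integration of eqs. 2.8-2.9 with g_e = g_i = 0."""
    v = e_l
    w = 0.0
    spike_times = []
    steps = int(round(duration_ms / dt))
    for step in range(steps):
        exp_term = g_l * delta_t * math.exp((v - v_t) / delta_t)
        dv = (-g_l * (v - e_l) + exp_term - w + i_e) / c_m  # eq. 2.8
        dw = (a * (v - e_l) - w) / tau_w                    # eq. 2.9
        v += dt * dv
        w += dt * dw
        if v >= v_peak:
            # The exponential term has diverged: treat this as a spike.
            spike_times.append((step + 1) * dt)
            v = v_reset
            w += b
    return spike_times
```

Under these assumptions, a 1000 pA input drives the neuron above threshold and produces spikes, whereas zero input leaves it at rest. Each spike increases $w$, which lengthens the following inter-spike interval and gives the model its spike-frequency adaption.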

2.2.4 Software SNN Simulators

2.2.4.1 NEST

This section concerns why the NEST Simulator was used. For more practical information about NEST, see Chapter 3. To help test the hypothesis, some of the experiments in this project involve evolving SNNs, as the functionality of these networks is more closely related to biological neural networks. To mitigate the cost of simulating SNNs, NEST is used in the experiments: NEST is a highly optimized simulator with a kernel written in C++ and a large library of implemented neuron and synaptic models. The use of NEST was intended to reduce development time and increase the value of the experiments. In future work, it is desirable to evolve neuromorphic networks, which can further bridge the gap between the analysis of cognitive abilities in simulated neural networks and in biological brains thanks to the biologically inspired hardware. Most neuromorphic systems physically implement variations of SNNs, which further justifies the choice to evolve SNNs in this thesis, albeit in software. The idea is that if the experiment works with a simulator like NEST, then it can easily be set up with a neuromorphic simulator later. However, due to the complexity of NEST, it can be challenging to use it with other packages, which resulted in some of the experiments in which NEAT evolves SNNs using SNNs implemented in Python instead.

Other tools could have been used to further close the gap between evolving SNNs in software and neuromorphic systems. One of these tools is PyNN (section 3.1), an API that is mapped to functions in the software of neuromorphic systems like BrainScaleS as well as software simulators like NEST. This thesis touches on experiments that involve implementations using PyNN, although most implementations were deprecated because of bugs and incompatibilities in PyNN that made it difficult to optimize the experiments.

2.2.5 Hardware SNN Simulators

The following section is a long one, because it thoroughly describes BSS-1. The description might seem out of scope, because a full experiment was not implemented, but the described information is the bare minimum required for an implementation. Thus it was used in the Proof-of-Concept (POC), see section 3.9 on page 41. Before describing neuromorphic hardware, Evolvable Hardware (EH) should be briefly mentioned as a basis: inspired by biology, EH is an attempt to copy the traits of biological systems mentioned above by applying EAs to hardware design. See section 2.2.7 for an explanation of EAs.

Two Main Classes of EH There are two main classes of EH, Extrinsic and Intrinsic EH. Extrinsic EH is the approach of evaluating the evolved electronic circuit through simulation rather than actual building and testing. This approach might be advantageous in terms of costs and hardware design but is limited by the simulation and will not fully utilize the specific device characteristics. Intrinsic EH, on the other hand, is characterized by evaluating configurations on programmable hardware. This approach might lead to high utilization of the actual device but is limited by a pre-built design, which might be costly to change. Therefore, Intrinsic EH is often configurable, which can result in outputs that are very complicated to understand from a human perspective. A common challenge with Extrinsic EH is that the solution can only be as good as the simulation model: the EH model can "overfit" to the simulation, resulting in a "reality gap" [63].

The standard hardware for the development of EH is Field-Programmable devices, either digital or analog. Digital devices are typically Field-Programmable Gate Arrays (FPGAs). Analog devices are typically Field-Programmable Analog/Transistor Arrays (FPAAs/FPTAs). While there are excellent synthesis tools for digital circuitry, analog electronics development lacks these same kinds of tools. Therefore, EH has proven useful for designing analog circuits, which are required to create and process analog signals. The Heidelberg Field-Programmable Transistor Array (FPTA) has been one of several successful architectures in this field and has been used to realize a wide range of applications, including analog filters, comparators, Digital to Analog Converters (DACs), Analog to Digital Converters (ADCs), and Operational Amplifiers (Op-Amps) [64].

Furthermore, a natural way to realize such systems is through configurable circuits. A specialized branch of EH that emerged in the early 2000s was the Networks-on-Chip (NoC) paradigm, which was, similarly to neuromorphic systems, a promising solution to the high-throughput and high-interconnect requirements of large-scale multi-processor systems, previously mentioned as the “von Neumann bottleneck” [65][45][38]. Today, most developers consider the von Neumann architecture the most efficient and scalable architecture for doing arithmetic computation. However, the NoC paradigm contains many successful neuromorphic architectures [45]. Most are developed by and for researchers, but some are oriented towards commercial applications, such as the systems developed by Qualcomm, Intel and IBM [22][66][67][45].

2.2.5.1 BrainScaleS-1 (BSS-1)

BSS-1 was chosen as a hardware simulator for this study as it enables mixed-signal simulation of SNNs at a speed-up factor of 10⁴ compared to biological time, whereas similar simulations often run at a speed-up factor of 10⁻² compared to biological time [22].

High Input Count Analog Neural Network (HICANN) The analog NoC ASIC EH-chip called HICANN [68] is the minimal building block of BSS-1. The HICANN is a full-custom design featuring configurable neural network arrays. The neuron model implemented is based on a spiking neural network model and is realized using Op-Amps and capacitors. While using analog circuitry to model neurons and synapses, the HICANN uses a digital, asynchronous bus interconnect for both on-chip and external connections, featuring DACs and decoders. Manufactured in a 180 nm CMOS technology, the HICANN features 114,688 programmable dynamic synapses and up to 512 neurons [69]. [63][45]

The Primary Application of BSS-1 BSS-1 was designed primarily to study physical neural dynamics at a higher rate than available in biology. Thus, BSS-1 was designed with a speed-up factor of 1000 to 10 000 compared to biological time. The modeling of biological neurons and synapses is realized with physical, analog components, while the interconnections are digital and programmable to increase efficiency [69]. In addition to brain research, this neuromorphic system may enable new applications in robotics, artificial intelligence, and human-machine interfaces. Using a physical model keeps a one-to-one relationship between the neurons and synapses of the biological example and the model, preserving the fault tolerance to loss that is inherent to the biological brain.

Reduced Resource Requirements The power and material costs of simulating a neuron are reduced by several orders of magnitude by using only a few analog components per neuron, compared to the several million components involved when solving these equations numerically in a microprocessor core [70]. As the neurons on the HICANN chip are emulated with analog electronics rather than with a high number of arithmetic operations, the circuitry is power-efficient, and a complete HICANN consumes only 1.3 W/cm². The theoretical worst-case power consumption of a wafer module is 2 kW [69].

The Current Implementation The current BSS-1 implementation, located at the Kirchhoff-Institute for Physics at Heidelberg University, enables up to 20 wafer modules, with up to 200 000 neurons and 40 000 000 synapses per wafer. Wafer-Scale integration was chosen to allow for the extreme bandwidth requirements of the accelerated system [69]. The Wafer-Scale integration partly follows the proposal of C. Mead, who proposed interconnecting chips with analog components by integrating the production wafer [38]. The base chips on the BSS-1 wafer are the HICANNs described in section 2.2.5.1. A single 200 mm wafer carries 384 HICANN chips.

Attached to the wafer are FPGAs that handle communication with other wafers and with the host computer. In addition to delivering signals to and from each wafer, the FPGAs are used to configure the chips [69]. On one wafer module, there are 48 FPGAs, each equipped with Gigabit-Ethernet, to handle the high volume of event data that occurs on the accelerated system. 12 Gigabit connections are routed to each edge of the module to communicate with other wafer modules and the host computer.

On the wafer, one FPGA controls 8 HICANN chips that together account for one reticle. Every HICANN has two full-duplex serial LVDS links with separate clock and data lines to the FPGA module, and each link can transmit two GBit/s. One fundamental concept with the HICANN is that it allows the construction of neurons with over 10 000 pre-synaptic connections. The high connectivity leads to a transmission requirement of 1.4 GEvents/s per neuron, which is why the silicon wafer is kept as a whole to produce shorter transmission lines and a lower capacitive load [69].
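The figures quoted for BSS-1 can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (384 HICANN chips per wafer, up to 512 neurons and 114,688 synapses per chip, 48 FPGAs controlling 8 chips each, and a speed-up factor of 10⁴); the constant and function names are illustrative.

```python
HICANNS_PER_WAFER = 384
NEURONS_PER_HICANN = 512
SYNAPSES_PER_HICANN = 114_688
FPGAS_PER_WAFER = 48
HICANNS_PER_FPGA = 8
SPEEDUP_FACTOR = 10_000

# Upper bounds per wafer implied by the per-chip figures.
neurons_per_wafer = HICANNS_PER_WAFER * NEURONS_PER_HICANN    # 196608
synapses_per_wafer = HICANNS_PER_WAFER * SYNAPSES_PER_HICANN  # 44040192

# 48 FPGAs with 8 chips each account for every chip on the wafer.
hicanns_served_by_fpgas = FPGAS_PER_WAFER * HICANNS_PER_FPGA  # 384


def wall_clock_seconds(biological_seconds: float) -> float:
    """Wall-clock time needed to emulate a span of biological time."""
    return biological_seconds / SPEEDUP_FACTOR
```

The products are consistent with the rounded figures of roughly 200 000 neurons and 40 000 000 synapses per wafer, and at a speed-up of 10⁴ one second of biological time takes about 0.1 ms of wall-clock time.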

The Neuron Model The neuron model at the basis of the HICANN is called the Adaptive Exponential IAF model (AdExp) [62]. The AdExp model was co-developed by the FACETS project [70], a predecessor to the BSS-1 project, which is now part of the neuromorphic computing platform of HBP.

The model contains several additions compared to the standard IAF:

$$
-C_m \frac{dV}{dt} = g_l (V - E_l) - g_l \Delta_{th} \exp\left(\frac{V - V_{th}}{\Delta_{th}}\right) + g_e(t)(V - E_e) + g_i(t)(V - E_i) + w(t)
\tag{2.10}
$$

The variables $C_m$, $g_l$, $E_l$, $E_e$ and $E_i$ are the membrane capacitance, the leakage conductance, and the leakage, excitatory and inhibitory reversal potentials. The variables $g_e(t)$ and $g_i(t)$ represent the total excitatory and inhibitory synaptic conductances. The introduced addition to the standard IAF model is the exponential term on the right-hand side of the equation, which models the near-asymptotic growth of the membrane potential under certain conditions. The threshold potential $V_{th}$ represents the critical value above which this rapid growth can occur, and the slope factor $\Delta_{th}$ determines the rapidness of the triggered growth. Such a situation is interpreted as a spike.

Each time a spike is detected, a separately generated output event signal is transmitted to possible connected target neurons or recording devices. The membrane potential is forced to a reset potential $V_{reset}$ by an adjustable reset conductance. A second equation describes the temporal evolution of the so-called adaption current $w(t)$:

$$
-\tau_w \frac{dw}{dt} = w(t) - a(V - E_l)
\tag{2.11}
$$

Every time a spike is emitted by the neuron, $w$ changes its value: $w \to w + b$. The time constant and the efficacy of the so-called sub-threshold adaption mechanism are given by $\tau_w$ and $a$, while $b$ defines the amount of so-called spike-triggered adaption. The exponential term of equation 2.10 and the adaption function of equation 2.11 can be deactivated to reduce the AdExp model to the standard IAF model. [70][62]

Figure 2.4: Schematic diagram of the AdExp neuron circuit, taken from [70].

Fig. 2.4 shows the individual circuit components and fig. 2.5 illustrates the firing modes of this neuron circuit. In the currently effective implementation of BSS-1, the neurons are implemented in a 180 nm CMOS technology. By design, the system is scalable with newer generations of CMOS technology and HICANN chips [69]. Millner et al. [71] describe the hardware implementation of the neuron in great detail. The paper also reports that the emulation of an IAF neuron with the implemented hardware is 3000 times more power-efficient and accelerated by a factor of 10 000 compared to an Izhikevich neuron simulated on a supercomputer [71]. The Izhikevich neuron is a popular model because it gives an accurate representation of a neuron’s functionality while being simple to model [56].

The Composing Structure of the neurons, with their respective synapses, is called the Analog Network Core (ANC), shown in fig. 2.6. The neurons are constructed from Dendrite Membrane (DenMem) circuits that allow neurons with up to 14336 synaptic inputs per neuron. The synaptic weights, stored in a four-bit SRAM, are represented by a DAC current. STDP is implemented by the use of capacitors modulating the signals and an algorithm that manipulates the digitally stored weights. Fig. 2.7 shows a schematic diagram of the synaptic circuit, where $g_{max}$ is a programmable analog parameter controlling the DAC scale. In each ANC, communication is asynchronous and mixed-signal. The cores are highly integrated systems working in a continuous-time mode. [70]

The Communication Protocol Communication on a wafer happens on several levels: there is the communication between the local neurons and synapses, and then there are Layer 1 (L1) and Layer 2 (L2) channels, depending on how far the communication reaches. This subsection discusses the system’s inter-neuron communication in terms of how closely it models nature, which is vital information for any comparison analysis, for example one involving Integrated Information Theory (section 2.1.4 on page 11).

Figure 2.5: Example of firing modes of the AdExp neuron circuit, taken from [70].

Figure 2.6: To the left is a single HICANN chip, and to the right is a diagram of the ANC, which is located at the center of the chip, taken from [72].

Figure 2.7: Schematic diagram of a synapse circuit, taken from [70].

The communication protocol between ANCs is L1, a real-time serial event protocol operating at up to 2 Gb/s. The protocol uses two time-frame bits and six address data bits and uses continuous-time transmission [68]. This digital transmission protocol limits power consumption while retaining continuous communication [70]. Wafer-Scale integration was selected to support an accelerated system’s channel density requirements, where each neuron can receive over 10 000 pre-synaptic inputs [70]. The solution of the Wafer-Scale integration is explained in detail in [69] and [72]. The inter-ANC communication on a wafer is real-time, serial, and only dependent on the local circuits’ states.

Each HICANN chip has an L2 channel to allow for inter-wafer communication. Due to latency issues involved with real-time long-range communication, L2 is based on packet switching and digitized time stamps [70][72]. This digital communication between wafers, or with the host computer, is handled by FPGAs [69]. Therefore, inter-neuron communication between wafers has a lower level of integration.

2.2.6 Evaluation

The evaluation of models in learning algorithms needs to be fitted to the task they are trained for. We want to test the performance of SNN models, but on what? This section covers possible evaluation methods, what factors to evaluate, and why we chose an ACP problem.

Intelligence For the past 70 years, a myriad of researchers and developers have been on a quest for a human-made system that might replicate human function and behavior [3]. The development of autonomous systems has come so far that they can imitate human function efficiently, but only in a few use-cases at a time. While the specific abilities of AI have increased, it is argued that its intelligence has not [73]. Several scientists and philosophers have proposed theories and tools to evaluate the level of intelligence of a system. For example, Alan Turing proposed the Turing Test, inspired by the party game The Imitation Game, to assess whether a machine is intelligent [14] based on whether it can make another person believe that its actions were taken by a human being. Some computer programs exhibiting autonomous functions seem to stretch over a broad field of capabilities and to fool human agents at the Turing Test for prolonged periods. However, most illusions are broken with time because, ultimately, the AI that performs best on the Turing Test is not general but still engineered explicitly for that one task. The previously mentioned AlphaZero, AlphaStar, and GPT-3 are examples of this kind of AI. These mixed results do not seem to discourage researchers in their quest for an intelligent AI.

Integration ("Consciousness") Evaluating a neural network in terms of consciousness might be premature, considering that the field of consciousness research has no measure of consciousness even in humans. Nevertheless, certain theories suggest that consciousness is correlated with some general properties of networks that might be easy to capture. For example, the integrated information theory (IIT) suggests that the degree to which a network integrates information above and beyond the network’s parts predicts the "degree" to which the system is conscious. While the measure it proposes, Phi, is NP-hard, proxy measures exist that might be useful [74].

However, while some have proposed that consciousness level indicates a level of intelligence [75], the reverse might not be the case.

Efficiency The efficiency of a neural network can be judged on several levels. The first is how efficient it is to run a pre-trained network on a certain task. The more efficient it is, the more feasible it becomes to deploy such a network, for example for image classification on a mobile phone. The second is how efficient a network is to train. The longer it takes to tune, the more costly it is to develop new solutions for a specific task or training set. Finally, a network’s efficiency might be judged by how quickly it can be retrained on a novel task or training set. This final measure of efficiency could be important for solving tasks that evolve over time, or for reducing development time on new architectures for new task domains.

2.2.6.1 ACP and other Evaluation Methods

There are many different tasks to choose from in testing AI capabilities: handwriting recognition, image classification, speech decoding, translation, and many others. Most of these types of tasks are useful to test networks because they share aspects of tasks we would like to be performed by an AI.

However, when evaluating the performance of SNNs with the goal of showing good use cases for models with higher biological plausibility, assessing general cognitive abilities might be a better idea.

A well-known type of benchmarking task that has gained a lot of popular media attention is AI performance in games such as chess, Go, console games, and recently StarCraft II [1, 7]. While these games can be very complicated, and might seem to require general cognitive abilities, they might be too complex for understanding, comparing, and developing AI architectures.

A lesser known, but potentially powerful task is to build artificial environments, such as Polyworld, populated by agents, which have to com-

