Development of neuromorphic hardware and non-iterative learning designs for edge computing applications


DOCTORAL THESIS 2020

DEVELOPMENT OF NEUROMORPHIC HARDWARE AND NON-ITERATIVE LEARNING DESIGNS FOR EDGE COMPUTING APPLICATIONS

Fabio Galán Prado


PhD in Physics programme


Supervisor: Josep Lluis Rosselló Sanz

Tutor: Claudio Rubén Mirasso Santos

Doctor by the University of the Balearic Islands


Funding

This work has been partially supported by the Spanish Ministry of Science, Innovation and Universities, the State Research Agency and the Regional European Development Funds (FEDER) under grant contracts TEC2014-56244-R, TEC2017-84877-R and fellowship (BES-2015-076161).


Authorship Certificate

Dr José Luis Rosselló Sanz, Associate Professor at the Physics Department of the University of the Balearic Islands

I DECLARE:

That the Thesis entitled Development of neuromorphic hardware and non-iterative learning designs for edge computing applications, presented by Fabio Galán Prado to obtain a doctoral degree, has been completed under my supervision.

For all intents and purposes, I hereby sign this document.

Signature

Palma de Mallorca,


Abstract

The number of components on an integrated circuit has doubled every two years over the last decades, as predicted by Gordon Moore in 1975. However, there is evidence that Moore’s law will soon expire, leading to a point at which physical laws or low economic profitability will prevent CMOS technology from developing as it used to. As a consequence, research efforts have focused on finding alternatives to the classic cramming of components onto integrated circuits. These alternatives range from different manufacturing approaches, such as nanowires, memristors or 3D circuit integration, to newer unconventional circuit designs, such as those implementing new algorithms and forms of computing like neural networks.

In this Thesis, several novel digital circuit designs are proposed for the neural networks field. Firstly, a fast and efficient Stochastic Spiking Neuron circuit model is proposed, tested and employed to build a Stochastic Spiking Neural Network for pattern recognition. The proposed circuitry is configured on a Field Programmable Gate Array (FPGA) and shows good efficiency in terms of speed and hardware resources when compared with the state of the art. The second concept proposed in this Thesis is a new reservoir computing architecture applied to time series prediction, also implemented on an FPGA. The results show that the proposed design is more accurate and more energy-efficient than state-of-the-art implementations. Finally, a fully parallel digital circuit capable of performing neural network training on-chip is proposed. For this purpose, an unconventional algebra formulation is used for the sake of circuit simplicity. A sequential version of this circuitry is also proposed in order to save resources at the expense of speed. Both final circuits are programmed on an FPGA as well.


Abstract (Spanish)

The number of components on an integrated circuit has doubled every two years over the last decades, as Gordon Moore predicted in 1975. However, there are signs that Moore's law will soon cease to hold, leading to a point at which physical laws themselves or low economic profitability will prevent CMOS technology from developing as it has until now. As a consequence, research efforts in the field have focused on finding alternatives to the saturation of components on integrated circuits.

These alternatives range from different manufacturing processes, such as nanowires, memristors or 3D circuit integration, to the implementation of new algorithms and forms of computing such as neural networks.

In this Thesis, several novel digital circuit designs applied to the field of neural networks are proposed. First, a fast and efficient Stochastic Spiking Neuron model is proposed. Its correct operation is verified and it is also used to build Stochastic Spiking Neural Networks. This model is implemented digitally on a Field Programmable Gate Array (FPGA), showing good efficiency in terms of speed and hardware resources when compared with the state of the art. The second concept proposed in this Thesis is a new architecture for reservoir computing neural networks, which is applied to time series prediction and is also implemented on an FPGA. The results show that the proposed design is again more accurate and more energy-efficient than the state of the art. Finally, a fully parallel circuit capable of training neural networks on the chip itself is proposed. For this proposal, an unconventional algebraic formulation is used in order to simplify the circuitry. A sequential version is also proposed in order to save resources at the expense of speed. Both final proposals are also programmed on an FPGA.


Abstract (Catalan)

The number of components on an integrated circuit has doubled every two years over the last decades, as Gordon Moore predicted in 1975. However, there are signs that Moore's law will soon cease to hold, leading to a point at which physical laws themselves or low economic profitability will prevent CMOS technology from developing as it has until now. As a consequence, research efforts in the field have focused on finding alternatives to the saturation of components on integrated circuits. These alternatives range from different manufacturing processes, such as nanowires, memristors or 3D circuit integration, to the implementation of new algorithms and forms of computing such as neural networks.

In this Thesis, several new digital circuit designs applied to the field of neural networks are proposed. First, a fast and efficient Stochastic Spiking Neuron model is proposed; its correct operation is verified and it is used to build Stochastic Spiking Neural Networks.

This proposal is configured on a Field Programmable Gate Array (FPGA), showing good efficiency in terms of speed and hardware resources when compared with the state of the art. The second proposal of this Thesis is a new architecture for reservoir computing neural networks, which is applied to time series prediction and is also implemented on an FPGA. The results show that the proposed design is more accurate and more energy-efficient than state-of-the-art reservoir implementations. Finally, a fully parallel circuit capable of training neural networks on the chip itself is proposed. For this proposal, an unconventional algebraic formulation is used in order to simplify the circuitry. A sequential version is also proposed in order to save resources at the expense of speed. The two final proposals are also programmed on an FPGA.


Author’s Contributions

The contributions from Chapter 2 are included in [2], [4], [5] and [6], while the contributions from Chapter 3 are partially published in [1] and [3].

[1] F. Galán-Prado, J. Font-Rosselló, and J. L. Rosselló. “Tropical Reservoir Computing Hardware”. In: IEEE Transactions on Circuits and Systems II: Express Briefs 67.11 (Nov. 2020). Q1, pp. 2712–2716. ISSN: 1558-3791. DOI: 10.1109/TCSII.2020.2966320.

[2] Fabio Galán-Prado, Alejandro Morán, Joan Font, Miquel Roca, and Josep L. Rosselló. “Compact Hardware Synthesis of Stochastic Spiking Neural Networks”. In: International Journal of Neural Systems 29.08 (2019). Q1, p. 1950004. DOI: 10.1142/S0129065719500047.

[3] F. Galán-Prado, J. Font-Rosselló, and J. L. Rosselló. “Morphological Reservoir Computing Hardware”. In: 2019 29th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS). July 2019, pp. 141–144. DOI: 10.1109/PATMOS.2019.8862100.

[4] F. Galán-Prado, A. Morán, J. Font, M. Roca, and J. L. Rosselló. “Stochastic Radial Basis Neural Networks”. In: 2019 29th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS). July 2019, pp. 145–149. DOI: 10.1109/PATMOS.2019.8862129.

[5] A. Morro, V. Canals, A. Oliver, M. L. Alomar, F. Galán-Prado, P. J. Ballester, and J. L. Rosselló. “A Stochastic Spiking Neural Network for Virtual Screening”. In: IEEE Transactions on Neural Networks and Learning Systems 29.4 (Apr. 2018). Q1, pp. 1371–1375. ISSN: 2162-2388. DOI: 10.1109/TNNLS.2017.2657601.

[6] Fabio Galán-Prado and Josep L. Rosselló. “Smart Hardware Implementation of Spiking Neural Networks”. In: Advances in Computational Intelligence. Ed. by Ignacio Rojas, Gonzalo Joya, and Andreu Catala. Cham: Springer International Publishing, 2017, pp. 560–568. ISBN: 978-3-319-59153-7. DOI: 10.1007/978-3-319-59153-7_48.

Article [1] proposes a new simple cycle reservoir network architecture that dispenses with arithmetic multiplication, thus drastically reducing the hardware area. It shows very good results in terms of both accuracy and energy efficiency and has been published in a Q1 journal. Article [2] proposes a new digital stochastic spiking neuron model which shows several advantages when compared with the state of the art. It has been published in a Q1 journal. Articles [3] and [6] relate to [1] and [2], respectively, but at an earlier stage of development. Finally, article [4] thoroughly studies an application of the neuron model presented in [2], and article [5] is related to a real-life application of a Stochastic Spiking Neural Network.


List of Figures

1.1 Scheme of a biological neuron and its interconnection.
1.2 Scheme of an artificial neuron.
1.3 Cloud computing scheme.
1.4 Edge Computing Scheme.
2.1 Correlation among signals.
2.2 Boolean representation of neural spikes.
2.3 Neuron behavior scheme.
2.4 Closed-loop stochastic neuron’s scheme.
2.5 CAD’s flux diagram.
2.6 Example of a NN generated by the CAD tool.
2.7 Scheme of the network described by Tables 2.2, 2.3 and 2.4.
2.8 Network’s scheme reproducing function (1 − |x − a|)^C.
2.9 Open-loop stochastic spiking neural circuit.
2.10 Comparison between the results obtained by using a previously published SSNN model and the one proposed in this Thesis.
2.11 General vector comparator scheme.
2.12 Output from two vector comparators.
2.13 Timing response of the proposed neuron model.
3.1 Scheme for the general structure of a RC system.
3.2 Scheme for the special reservoir architecture and both the on-chip and off-chip non-iterative training processes for one-dimensional TSP.
3.3 Tropical SCR scheme.
3.4 Scheme for the TE concept.
3.5 Scheme for the selection of the smaller when using the TE.
3.6 Representation for the addition process in the temporal domain by means of DTC components.
3.7 Scheme for the fully parallel hardware implementation of the training and inference circuitry when using TE.
3.8 Scheme for the fully parallel hardware implementation of the training and inference circuitry when using SC.
3.9 Scheme for the sequential hardware implementation of the training and inference circuitry.
3.10 Scheme for the hardware implementation of the Min Adder component.
3.11 Scheme for the hardware implementation of the Max Adder component.
3.12 Digital design of the tropical neuron.
3.13 Matrix scheme of the training process for TSP.
3.14 NMSE as a function of τ for Santa-Fe series.
3.15 Error vs. reservoir size for the Mackey-Glass time series forecasting problem.
3.16 Scheme for the fully parallel hardware implementation of the training and inference circuitry for one-dimensional time series when employing a SCR.
3.17 NMSE vs N and L for the Mackey-Glass time series benchmark.
3.18 NMSE vs N and L for the Santa-Fe time series benchmark.
3.19 Santa-Fe TSP task employing a SCR and tropical-based on-line and on-chip training.
3.20 Mackey-Glass TSP task employing a SCR and tropical-based on-line and on-chip training.


List of Tables

1.1 Societal grand challenges.
2.1 PTM elements used to derive the stochastic function of a two-input combinatorial circuit at SCC levels 0, −1 and +1 [110].
2.2 Example of connectivity matrix.
2.3 Example of correlation matrix.
2.4 Example of output Matrix.
2.5 Comparison between the proposed model and the open-loop spiking neural model.
3.1 Six useful binoids [117].
3.2 Number of hardware lookup tables required to perform an 8-bit and 16-bit binary operation.
3.3 Equivalence between decimal and binary numeral systems for N = 3.
3.4 Equivalence between decimal and binary numeral systems for N = 3.
3.5 Comparison between the different training methodologies employed in this Thesis.
3.6 Spent logic elements for different sizes of the reservoir using a conventional digital design [157] and the proposed design.
3.7 Comparison with previously published results for the Mackey-Glass series.
3.8 Comparison with previously published results for the Santa-Fe series.
3.9 Hardware resources consumption as a function of N and L for TE.
3.10 Hardware resources consumption as a function of N and L for SC.
3.11 Hardware resources consumption as a function of N and L for the sequential flavor.
3.12 Comparison with previously published results for the Mackey-Glass series.
3.13 Comparison with previously published results for the Santa-Fe series.
3.14 Comparison between the different implementation methodologies employed for the Santa-Fe TSP in this section.


Acronyms

AI Artificial Intelligence.
ALM Adaptive Logic Module.
ALUT Adaptive Look-Up Table.
ANN Artificial Neural Network.
ASIC Application-Specific Integrated Circuit.
CAD Computer Aided Design.
CNN Convolutional Neural Network.
DLR Dedicated Logic Register.
DTC Digital to Temporal Converter.
EC Edge Computing.
EF Energy Efficiency.
ELM Extreme Learning Machine.
ESN Echo State Network.
FF Feed-Forward.
FPGA Field Programmable Gate Array.
IoT Internet of Things.
LFSR Linear-Feedback Shift Register.
LIF Leaky Integrate and Fire.
LSM Liquid State Machine.
ML Machine Learning.
MNN Modular Neural Network.
MR Membrane Register.
NMSE Normalized Mean Square Error.
NN Neural Network.
PTM Probabilistic Transfer Matrix.
RBF Radial Basis Function.
RC Reservoir Computing.
ReCA Cellular Automata Reservoir.
RNG Random Number Generator.
RNN Recurrent Neural Network.
SC Stochastic Computing.
SCC Stochastic Cross Correlation.
SCR Simple Cycle Reservoir.
SNN Spiking Neural Network.
SOM Self Organizing Map.
SS Stochastic Signal.
SSNN Stochastic Spiking Neural Network.
TDC Temporal to Digital Converter.
TE Temporal Encoding.
TIC Training and Inference Circuitry.
TSP Time Series Prediction.


Chapter 1

Introduction

Current demand calls for the development of powerful portable devices. During the second half of the 20th century and the beginning of the 21st, the electronics manufacturing industry has been able to keep up with the so-called Moore’s Law [1].

Experts predict that this will soon come to an end [2, 3]. Another factor at play is the breakdown of the so-called Dennard scaling law [4]: the power usage of a device stops being proportional to its area when devices become small enough.

All these facts, among others, suggest that the physical, technological and economic limits of CMOS technology will unavoidably be reached.

As a consequence, both private and public research institutions have focused on finding alternatives to the classic cramming of components onto integrated circuits. These alternatives range from different manufacturing technologies, such as nanowires [5], memristors [6] or 3D circuit integration [7], to newer algorithms or unconventional computing paradigms such as Machine Learning (ML) [8], Artificial Neural Networks (ANNs) [9] or reversible computing [10]. This Thesis focuses on ML, especially on the Neural Networks (NNs) sub-field, for the reasons presented below.

“ML has proven to be both broad enough to apply to many domains and narrow enough to benefit from domain-specific architectures, such as Google’s Tensor Processing Unit” [11]. Table 1.1 lists 16 of the most important problems that society currently faces. Most of them “may seem out of the skill-set of scientists and engineers”. Nevertheless, there are authors [11] who believe that “the advances in ML can be translated into advances in those challenges”.

“The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. It has been run annually from 2010 to 2017” [13]. In 2012, Alex Krizhevsky and collaborators achieved error rates considerably better than the state of the art by using deep neural networks [14]. This led to a sequence of further improvements, to the point that such machines can now even exceed human accuracy in image-recognition tasks [15–18]. Similar milestones have also boosted fields such as speech recognition [19], human language translation [20, 21], medical diagnosis [22–24] and even game playing [25].

1. Advance personalized training
2. Make solar energy affordable
3. Enhance virtual reality
4. Reverse engineer the brain
5. Engineer the tools for scientific discovery
6. Advance health informatics
7. Restore and improve urban infrastructure
8. Secure cyberspace
9. Provide access to clean water
10. Provide energy from fusion
11. Prevent nuclear terror
12. Manage the nitrogen cycle
13. Develop carbon sequestration methods
14. Engineer better medicines
15. Enable universal communication
16. Build flexible general purpose AI systems

Table 1.1: Societal grand challenges. The National Academy of Engineering proposed the first 14 in 2008 [12]. A decade later, the authors of [11] added two new challenges regarding universal communication and AI systems, which seem appropriate for today.

1.1 Neural networks

The brain is the organ where the central nervous system is located in vertebrates and most invertebrates. It is composed of an intricate network of billions of interconnected cells called neurons. The study of ANNs aims to understand how a large collection of interconnected elements (the neurons) can be used to process information efficiently.

In biology, neurons are electrically excitable cells that receive multiple signals in the form of electrical pulses from other neurons. These pulses are processed in such a way that their accumulation triggers a new electrical pulse in the receiving neuron. The pulse is then transmitted to the other neurons connected to it, as shown in Figure 1.1. Thus, each neuron processes the input pulses and generates an output pulse. A typical neuron consists of a cell body (soma), dendrites and a single axon (no neuron ever has more than one axon), as also shown in Figure 1.1. The soma is the bulbous part of the neuron from which the axon and the dendrites extrude. Neurons communicate with each other through synaptic connections. These interconnections may change when neurons receive external stimuli, which is how learning occurs in living organisms. Throughout this Thesis, the term NN refers to ANNs rather than biological networks.

NNs (or ANNs) are an abstraction of the human nervous system that employs computational units inspired by biological neurons. They aim to learn and process information in order to create Artificial Intelligence (AI), just as real neurons do.

Obviously, this is not a simple task. The human brain accounts for roughly 1/50 of total body weight, yet it consumes about 20% of the body's energy, which amounts to roughly 20 W. This is remarkably low when compared with the amount of energy that today’s most advanced computers require to perform specific brain tasks. The brain is especially skilled at tasks such as pattern recognition or time series prediction. In our daily routines, we recognize people, animals, objects and so on. These recognized elements are often non-static, so we have to infer what they may do and how we might react. NNs attracted a great deal of attention when they first appeared, soon after the advent of computers, especially because of their application to AI. However, the lack of the data and computational power that NNs require led to a loss of interest in the field. It was at the beginning of this century, when the requirements of data and computational power were partially met, that the field was reborn.

Figure 1.1: Scheme of a biological neuron and its interconnection.

Directional connections among neurons determine which neurons are the inputs of other neurons. Each connection has an associated synaptic weight, which may be excitatory or inhibitory. Thus, the output of a given neuron is a function of the weighted sum of the neuron’s inputs, which can in turn be the outputs of other neurons. This function is called the activation function, so that the output of a neuron can be expressed as $f\left(\sum_i W_i \cdot x_i\right)$. The weights represent how strong the synaptic connections among neurons are. Thus, learning occurs by changing the weights that connect the neurons. Learning in NNs occurs just as in biological organisms, that is, data must be provided as external stimuli. This is done in the training stage, in which, in the case of supervised learning, input-output pairs of the function to be learned are provided.
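The neuron output described above can be illustrated with a minimal Python sketch. The function names, the example weights and the choice of a sigmoid as the activation function are assumptions made only for illustration; they are not taken from the designs proposed in this Thesis.

```python
import math

def sigmoid(z):
    """Example activation function f (an illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(weights, inputs, activation):
    """Return f(sum_i W_i * x_i) for a single artificial neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

# Hypothetical values: positive weights act as excitatory connections,
# negative weights as inhibitory ones.
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 1.0, 0.5]

# Weighted sum is 0.5 - 1.0 + 1.0 = 0.5, so y = sigmoid(0.5).
y = neuron_output(weights, inputs, sigmoid)
print(f"neuron output: {y:.4f}")
```

Changing a weight changes the output for the same inputs, which is the sense in which learning amounts to adjusting the weights.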

Universal approximation theorems prove that NNs can, theoretically, approximate any continuous function to arbitrary accuracy given a sufficient number of neurons; an example is given in [26]. However, these theorems do not usually provide a construction for the weights; they merely state that such a construction is possible. An early version of these theorems was given by George Cybenko in 1989 [27] for feed-forward networks with a single hidden layer that employ an arbitrary continuous sigmoidal function as the activation function. Later, in 1991, Kurt Hornik proved [28] that it is the multilayer feed-forward architecture itself, rather than the specific choice of activation function, that makes these networks universal approximators. In addition, H.T. Siegelmann and E.D. Sontag proved [29] that first-order recurrent networks, also known as processor nets, can simulate any Turing machine, that is, they are Turing complete. In the words of Charu C. Aggarwal, “The mathematical NN abstraction can be viewed as a modular approach of enabling learning algorithms that are based on continuous optimization on a computational graph of dependencies between the input and output”. “In fact, this is not too far from the traditional work in control theory”. “Some of the methods used for the optimization in control theory are strikingly similar and preceded historically the most fundamental algorithms in NNs” [30].

Figure 1.2: Scheme of an artificial neuron. The inputs $x_1, \dots, x_k$ are multiplied by the weights $W_1, \dots, W_k$ and added together, and the sum $\sum_i W_i \cdot x_i$ is passed through the activation function to give $f\left(\sum_i W_i \cdot x_i\right)$.

Based on the activation function, neurons can be classified into first and second generation [31]. The so-called first generation uses the Heaviside step function as the activation function, whereas second-generation neurons use a smoother nonlinear activation function such as a sigmoid, a Gaussian, a hyperbolic tangent and so forth. The first generation of neurons is often called the perceptron as well, which was proposed by Rosenblatt in 1960. It is seen as a fundamental cornerstone of NNs.
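The difference between the two generations can be made concrete with a short Python sketch that evaluates a step activation and three smooth activations on the same weighted sums. The value assigned to the step function at exactly zero and the unit-width Gaussian are conventions assumed here for illustration only.

```python
import math

def heaviside(z):
    """First generation: binary step (the value at z == 0 is an assumed convention)."""
    return 1.0 if z >= 0.0 else 0.0

def sigmoid(z):
    """Second generation: smooth, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def gaussian(z):
    """Second generation: bell-shaped, output in (0, 1] (unit width assumed)."""
    return math.exp(-z * z)

# Evaluate every activation on a few example weighted sums.
samples = [-2.0, -0.1, 0.0, 0.1, 2.0]
table = [(z, heaviside(z), sigmoid(z), math.tanh(z), gaussian(z)) for z in samples]

for z, step, sig, tanh_value, gauss in table:
    print(f"z={z:+.1f}  step={step:.0f}  sigmoid={sig:.3f}  "
          f"tanh={tanh_value:+.3f}  gaussian={gauss:.3f}")
```

The step output jumps between 0 and 1 around zero, whereas the smooth activations change gradually, which is what allows graded responses in second-generation neurons.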

The first simplified model of a biological neuron was introduced in 1943 by Warren McCulloch and Walter Pitts [9]. In 1981 Christopher Von der Malsburg from the Technological Institute of California (USA) proposed a newer and more realistic neural model, which he termed spiking neurons [32]; nowadays they are also known as the third generation of artificial neurons. This newer model was intended to address the problem of the interconnection among neurons. This work stands out for pointing out that the neurons which encode the specific features or traits of a particular object spike in phase, thus enabling a single network to represent several objects [33]. These neurons are also much more biologically realistic than those of the first and second generations.

In 1986 G.E. Hinton and T.J. Sejnowski from the Carnegie-Mellon University (USA) published the paper entitled Learning and relearning in Boltzmann machines [34], in which they described a Boltzmann machine as the means to simulate the biological brain dynamics. This was the very first Stochastic Neural Network.

The ideas of spiking neurons and stochastic neural networks, knitted together, give rise to stochastic spiking neural networks. This is the kind of NN treated in Chapter 2 of this Thesis.

1.1.1 Types of neural networks

Since the NN concept first appeared, there has been a huge number of contributions to the field, making it vast and complex. It is therefore hard, or almost impossible, to have a standard, simple and readable classification of all the existing types of NN. In this section, a simplified scheme of the current landscape is sketched out.

• Feed-Forward (FF). This was the earliest type of NN and the simplest as well [35]. Neurons in this type of network do not form cycles. Thus, the information in these networks always goes forward, hence the name. They are composed of an input layer, an output layer and, optionally, a hidden part, which can itself be formed by multiple layers. When the network has a significant number of hidden layers, it is commonly called a Deep Neural Network (DNN).

• Radial Basis Function (RBF) Network [36]. This kind of network encompasses all those made of neurons with a radial basis function as the activation function, that is, a function whose value depends only on the distance to some fixed point. They are based on the fact that any continuous function, $y(x)$, on a compact interval can in principle be interpolated with arbitrary accuracy (depending on $N$) by a sum of the form:

$$y(x) = \sum_{i=1}^{N} w_i \, \phi(\lVert x - x_i \rVert)$$

where $y(x)$ is the approximated function, $w_i$ are the weights and $\phi(\lVert x - x_i \rVert)$ is the radial basis function, that is, the output of the neurons in an RBF Network.

• Recurrent Neural Network (RNN) [37]. The main feature of these networks is that their connections, unlike those of the Feed-Forward type, may form cycles. This enables temporal dynamic behavior. Reservoir Computing (RC) is a widely known and used type of RNN. RC networks consist of an input layer, a hidden layer and an output layer. The hidden layer is where the nonlinear elements, that is, the neurons, are placed. The connectivity of the neurons inside the hidden layer is usually random and recurrent. In the RC scheme, only the output layer is trained; the rest of the weights are fixed. The output of the hidden layer is linearly mapped onto the desired output by means of linear regression.

(30)

Cloud Server

Producer Data Data

Consumer

Figure 1.3: Cloud computing scheme.

A particular case of RC is the echo state network, in which the connectivity of the hidden layer is very sparse (typically less than 1%) and classical second-generation neurons are employed [38]. Another common example is the Liquid State Machine, which meets the criteria of echo state networks except that the units in the hidden layer are spiking neurons rather than second-generation neurons.

• Kohonen Self Organizing Map (SOM) [39]. The aim of these networks is to map the input onto a lower-dimensional space while preserving the topological properties of the input by means of a neighborhood function. The result is something like a map of the input, hence the name. Their main difference from the other network types is the use of competitive learning rather than error-correction learning.

• Convolutional Neural Network (CNN). These networks show a clear advantage in pattern recognition tasks. Their advantage comes from assembling complex patterns from smaller and simpler ones, which enables lower connectivity and complexity. They are inspired by the fact that, in the animal visual cortex, only a restricted region reacts to specific visual stimuli [40].

• Modular Neural Network (MNN). These are characterized by being composed of several independent NNs that interact by means of some intermediary. They are inspired by the biological fact, discussed by Azam Farooq in his PhD Thesis [41], that several different regions of the animal brain cooperate to accomplish complex tasks, each region performing one subtask.
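As a rough illustration of the Reservoir Computing scheme described in the list above (a fixed, random, recurrent hidden layer whose output is mapped onto the target by linear regression), the following Python sketch trains only the readout of a small reservoir on a one-step-ahead sine prediction task. It is a conventional software toy written for this purpose, not the hardware architecture proposed in this Thesis. The reservoir size, connection density, weight ranges, ridge term and toy signal are all assumed values, and the echo state property is not checked.

```python
import math
import random

def make_reservoir(n, density=0.2, scale=0.5, seed=0):
    """Fixed random input weights and sparse random recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    w_res = [[rng.uniform(-scale, scale) if rng.random() < density else 0.0
              for _ in range(n)] for _ in range(n)]
    return w_in, w_res

def run_reservoir(w_in, w_res, inputs):
    """Drive the reservoir with a scalar sequence and record the state at every step."""
    n = len(w_in)
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [math.tanh(w_in[i] * u + sum(w_res[i][j] * state[j] for j in range(n)))
                 for i in range(n)]
        states.append(state)
    return states

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def train_readout(states, targets, ridge=1e-4):
    """Ridge regression for the output layer: (S^T S + ridge * I) w = S^T y."""
    n = len(states[0])
    gram = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    return solve(gram, rhs)

def predict(states, w_out):
    """Linear readout: one weighted sum of the reservoir state per time step."""
    return [sum(w * x for w, x in zip(w_out, s)) for s in states]

# Toy task (assumed for illustration): predict the next sample of a sine wave.
signal = [math.sin(0.2 * t) for t in range(201)]
inputs, targets = signal[:-1], signal[1:]

w_in, w_res = make_reservoir(10)
states = run_reservoir(w_in, w_res, inputs)
w_out = train_readout(states, targets)
outputs = predict(states, w_out)

mse = sum((o - y) ** 2 for o, y in zip(outputs, targets)) / len(targets)
baseline = sum(y ** 2 for y in targets) / len(targets)
print(f"training MSE = {mse:.2e}, zero-predictor MSE = {baseline:.2e}")
```

Only `w_out` is learned here, and it is obtained in a single linear solve rather than by iterative weight updates; `w_in` and `w_res` stay fixed after their random initialization.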

1.2 Edge computing

The origins of Edge Computing (EC) go back to the late nineties, when content delivery networks faced an increasing demand to serve web and video content. This made them vulnerable to the flash crowd problem, in which the resulting overload of a site’s infrastructure can cause crashes or unusually high response times. As a consequence, companies like Akamai [42] devised a system to serve requests from a variable number of surrogate origin servers at the network edge. These servers cache content at the Internet’s edge, thus reducing the demand on the site’s infrastructure and providing a faster service for users, whose content comes from nearby servers.

An important factor in the recent success of EC is the proliferation of the Internet of Things (IoT). The term IoT was introduced by K. Ashton in a presentation he made at Procter & Gamble (P&G) in 1999 [43]. His point was that “computers (and, therefore, the Internet) are almost wholly dependent on human beings for information”. That is, “data available on the Internet were first captured and created by humans by means of typing, pressing a record button etc”. However, “people have limited time, attention and accuracy, which can be translated as they are not very good at capturing data about things in the real world”. Therefore, if computers were independent from us, tracking and counting everything would be possible, and waste, loss and cost would be greatly reduced. IoT is thus about empowering computers with their own means of gathering information, so they can see, hear and smell the world for themselves.

Currently, with IoT, the amount of data generated by things that are immersed in our daily life is orders of magnitude greater than what the global data center IP traffic can handle [44]. For example, about half a terabyte of data will be generated by a Boeing 787 per flight [45], but the bandwidth between the airplane and either a satellite or a base station on the ground is not large enough for transmitting that data. Therefore, data needs to be stored, processed, analyzed, and acted upon close to, or at the edge of, the network [46].

In the cloud computing paradigm, the devices at the edge of the network usually play a data consumer role, such as watching a YouTube video on a smart phone. This is represented in Figure 1.3. Nevertheless, people nowadays produce data from their mobile devices as well: they record videos and take photos with their smart phones and upload them later to social networks. For example, Instagram users upload over 100 million photos and videos every day [47], which can be fairly large and occupy a lot of bandwidth when uploaded to the cloud. In that case, the video clips should be downsized and adjusted to a suitable size and resolution at the edge before being uploaded to the cloud. This implies that the edge of the network is changing from data consumer only to data producer and consumer, which is represented in Figure 1.4.

Another fact worth mentioning about edge computing is its implications for privacy [44, 48]. Some of the data gathered by the IoT may be private, which introduces a shift in the security schemes employed in cloud computing. Data must travel encrypted between nodes, and those nodes may have restricted resources, thus limiting the choices in terms of security methods. On the other hand, there is the option of processing data at the edge which, in terms of privacy, might be better than uploading the raw data, always taking into account that the devices at the edge might have restrictions such as battery lifetime.

To sum up, EC brings computation closer to where the data is produced, which has multiple advantages such as faster responses and lower bandwidth requirements.

Figure 1.4: Edge Computing Scheme. (Diagram labels: Cloud Server, Data Producer, Data Producer/Consumer, Edge Computer.)

1.3 Thesis goals

This Thesis is about the design of new neuromorphic hardware for edge applications.

It has three main goals:

• Development of digital bio-inspired circuitry.

• Development of low cost reservoir alternatives for edge computing applications.

• Development of a digital circuitry capable of performing a non-iterative on-line supervised training process directly on-chip.

Let us now describe the goals a bit further. The first one is the design of digital circuitry that approaches the behavior of a real neuron, that is, bio-inspired circuitry. To do so, this Thesis makes use of so-called stochastic computing, due to its resemblance to biological neural behavior.

The second goal is related to RC. RC is a NN architecture which is especially suited for time series prediction and it is particularly interesting because only the output layer must be trained. This thesis proposes an RC circuit architecture that makes use of max-plus algebra and is highly suitable for digital hardware implementation.

Finally, the third goal is to integrate both the network and the training process in a single chip. For this purpose, unconventional algebra along with a low-hardware-cost encoding have been employed. The result is digital embedded circuitry which is capable of training ESNs in a non-iterative and fully parallel way, although a sequential counterpart is proposed as well.

1.4 Thesis structure

• Chapter 1. This chapter introduces the topic of the Thesis. The goals of the Thesis are also presented, as well as the Thesis structure.


• Chapter 2. This chapter includes all the work related to spiking neural networks. It includes a stochastic computing section, the detailed description of the models and finally the results.

• Chapter 3. This chapter encompasses both the second and the third goals because both make use of unconventional algebra. It includes a tropical algebra section, a reservoir computing section, a description of an encoding alternative to stochastic computing, a detailed explanation of the circuitry employed and finally the results.

• Chapter 4. Summarizes the conclusions extracted from the results of this Thesis.


Chapter 2

Spiking Neural Networks

2.1 Spiking neural networks

Spiking Neural Networks (SNNs) [49, 50] are the latest generation of neural models whose main feature is to replicate the intrinsic discrete nature of the behavior of real neurons [51–54]. SNNs have been used to reproduce the intrinsic brain processes and pathologies [55, 56] and also for the implementation of machine-learning applications [57]. In the SNN scheme, the network processes discrete events generated by each neuron, called action potentials. Each action potential has a very short duration and obeys a specific neural mechanism of generation and propagation. From this point of view, the continuous signals of non-spiking neural models can be understood as measurements of the frequency at which these action potentials are generated (frequency neural coding). In contrast, SNNs are not restricted to frequency coding, and richer timing or population [58] coding can be explored. A more advanced type of spiking models are the spiking neural P systems [59–61], which belong to the membrane computing family and are implemented through rules that depend on the contents of a neuron through regular expressions. For this type of model, hardware implementations seem to be relatively more complex than for classical SNNs and are out of the scope of this Thesis.

The relevance of timing codes is found in the pattern recognition performed by the brain. This process cannot be explained by using a frequency code [62] because pattern matching is performed at a time scale of the order of dozens of milliseconds while neurons oscillate at only 100 Hz. These are the results observed in different studies concerning visual pattern analysis carried out with macaque monkeys, which established a response time of the order of 30 ms [63, 64]. As the firing rate of neurons is usually below 100 Hz, the coding based on firing rates of the first-generation Artificial Neural Networks (ANNs) is discarded as an explanation of pattern recognition in biological networks. To explain this high-speed pattern-recognition processing, different timing codes have been developed, such as the one presented by Hopfield in Ref. [62].

SNNs are used for a wide range of applications such as biomedical [55, 65–67], robot navigation [68, 69], or control [70] fields. When configured as a Liquid-State Machine architecture [71], SNNs can be used to solve different complex real-life problems such as pattern recognition [72], time series processing [71] or the prediction of the motion of objects from images [73]. This approach is of high relevance because such a prediction task is assumed to be performed in a similar way in the human brain.

Although timing encoding is able to explain the fast pattern recognition developed by real neural systems, its physical implementation requires a high spatio-temporal precision with which neurons need to be wired up, and the learning process could be quite complex [74]. Owing to this inherent complexity, most of the existing SNN training algorithms use unsupervised Hebbian learning [75–77], based on the implementation of a simple adaptive mechanism dependent on the relative timing of input and output spikes for each specific neuron. This local training process contrasts with the global supervised learning that is commonly used in first-generation ANNs. The main reason for this difference (local versus global training for spiking and non-spiking neural models, respectively) is the difficulty of extrapolating closed-form mathematical expressions for the relationships between action potential timings.

Biological neurons usually present an unpredictable spiking pattern [78] with a clear stochastic (or random) nature. A fact that could explain the stochastic nature of spike trains is the mechanism of synaptic transmission: there exists a probability for transmitters to be released from the presynaptic terminal every time an action potential is transferred through the axon. This apparent lack of neural reliability can be understood as a clever way of implementing a weight for each connection. This fact may suggest that information is mainly codified through the spike firing rate [79]. That is the reason why a frequency-based encoding is used when building non-spiking ANNs. In recent works, these stochastic mechanisms have been introduced into SNNs, leading to Stochastic Spiking Neural Networks (SSNNs) [53, 80]. This kind of network combines both the firing rate of spike trains and the degree of correlation between neurons [53, 79, 81]. In the SSNN model proposed in Ref. [53], neurons are correlated or uncorrelated by means of the threshold voltage of the membrane potential. Two neurons are correlated if they share the same threshold; otherwise, they are uncorrelated. Depending on this aspect, the network functionality changes drastically. Making use of this double probabilistic coding (firing rate and mutual correlation), high-speed pattern recognition systems can be implemented [53]. By contrast, Neural Networks (NNs) using only a firing rate coding are incapable of providing high-speed information processing. The proposed probabilistic encoding is much simpler than other timing codes such as rank order coding [49, 82] or spike-time coding [83, 84]. Moreover, it provides evident advantages in the learning process due to its simplicity.

As will be shown in this work, the proposed dual neural coding of SSNN can still be used to obtain closed-form mathematical expressions for the functionality of large networks that may facilitate the implementation of global supervised learning processes in contrast to traditional timing-based SNN models.


2.2 Hardware implementation

ANNs can be implemented either in software or in hardware [85]. On the one hand, the software implementation is the most flexible and easily accessible one. However, it has a major drawback: it cannot exploit the inherent parallelism of NNs, which noticeably degrades performance. On the other hand, the hardware implementation can indeed exploit such parallelism. The hardware implementation has two main branches, analog and digital. The analog one is the most realistic from the biological point of view and the one that can perform better. The digital one, although it does not perform as well as the analog branch, does perform significantly better than software while being almost as flexible. These performance advantages are the reason why the hardware implementation has been chosen over a software implementation in many applications [57, 86–88]. This Thesis is focused especially on the digital hardware branch, which has been shown to deliver impressive results when implemented on Field Programmable Gate Arrays (FPGAs) [53, 80, 81, 89–91]. There are some applications that are constrained in terms of size or power supply, such as the control of machines and industrial processes [92], distributed sensory networks [93], portable medical applications [94–96] or handwriting and speech recognition systems [97]. In these cases the advantages of the hardware implementation become most obvious. Hardware implementations are also relevant in the data mining of high-volume databases [98], due to the fact that scientific and technological databases are increasing in size exponentially. To sum up, hardware dedicated to the efficient exploration of these databases in order to extract high-level information is of high interest.

2.3 Stochastic computation

Since the 1940s there has been a latent interest in the scientific community in the study of computation architectures that are an alternative to the traditional one, known as the Von Neumann architecture [99]. The most interesting features of architectures like Stochastic Computing (SC) are their high parallelism, the existence of low-complexity elements for computation, their nonlinear behavior and the simplicity of interconnection. The latter ones are of special interest for the implementation of NNs. Other than that, SC is especially suited for the area-efficient implementation of computational intelligence applications on digital hardware. Such applications include pattern recognition or any other system with learning capabilities. ANNs consist of many simple computational elements. However, due to their nonlinear nature they are really hard to implement in traditional digital systems.

Paradoxically, the basic concepts of probabilistic computation, or computation over random variables, were also introduced by John Von Neumann himself in a paper from 1952 that was later included in the book [100]. However, the theory for this form of computing was not developed thoroughly, nor was there any implementation possible until the 1960s. It was in 1965 when, thanks to the technological advances that took place in the field of electronics and computing, two independent research groups developed Stochastic Computation. The first one was located in England and consisted of B.R. Gaines and J.H. Andreae. They worked on the optimization of control systems and were especially focused on machine learning. The group internally published a document called “Stochastic Analog Computer” [101] in which they presented how to practically implement this kind of computation. The second group was located at the University of Illinois (United States of America). This group was composed of W.J. Poppelbaum and C. Afuso, who were working on image processing systems. There was also an internal publication on the topic titled “Noise Computer” [102].

SC arose as a way to implement complex computing tasks at lower hardware cost and higher noise tolerance [103, 104]. Its main feature is the use of switching probabilities. For this purpose, single-bit signals with a probability $p$ to be in the high state are employed. Since probabilities are always in the range $[0,1]$, the first step to convert decimal numbers to their stochastic equivalent is to normalize the decimal numbers so that they lie in $[0,1]$. These decimal numbers spanning from $0$ to $1$ are then discretized by binary numbers. Those binary numbers will depend on the number of bits employed in the digital design. That is, the number $1$ is assigned to the maximum binary number that can be represented given the bit precision, whereas $0$ is always $0$. Let us convert 3 numbers in their decimal representation to their stochastic equivalents as an example. The numbers are: $0$, $3/4$ and $3/2$. Firstly, they must be normalized, so they are all divided by the absolute maximum: $0$, $1/2$ and $1$. Secondly, their binary counterparts for 8 bits are assigned: $0$, $127$ and $255$. Thus, the higher the bit-width of the binary numbers in use, the lower the relative deviation when discretizing.
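The normalization and discretization steps of this example can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `quantize` is hypothetical, and truncation (rather than rounding) is an assumption, chosen because it reproduces the 8-bit value $127$ quoted above for $1/2$.

```python
def quantize(values, n_bits=8):
    """Normalize values to [0, 1] and map them to n_bits-wide integer codes.

    Hypothetical helper: truncation toward zero is an assumption made so
    that 1/2 maps to 127, as in the example in the text.
    """
    max_abs = max(abs(v) for v in values)   # absolute maximum used for normalization
    max_code = (1 << n_bits) - 1            # largest representable code, e.g. 255
    normalized = [v / max_abs for v in values]
    codes = [int(n * max_code) for n in normalized]
    return normalized, codes


normalized, codes = quantize([0, 3 / 4, 3 / 2])
print(normalized)  # [0.0, 0.5, 1.0]
print(codes)       # [0, 127, 255]
```

With a wider word (for instance `n_bits=16`) the code assigned to $1/2$ gets relatively closer to half of the maximum code, which is the reduction in discretization error mentioned above.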

Let us now describe the conversion process from binary numbers to stochastic signals. That is, each binary number, $X$, is converted to its time-dependent stochastic equivalent, $x(t)$, by using a random number generator, $R(t)$, and a comparator so that

\[
x(t) =
\begin{cases}
1 & X > R(t) \\
0 & X \le R(t)
\end{cases}
\tag{2.1}
\]

That is, if the binary number to be converted is greater than the random number, the output settles to $1$ for the time step (assigned to the TRUE value of the comparison), otherwise it settles to $0$ (FALSE). If the random signal, $R(t)$, is uniformly distributed in the interval of possible values of $X$, the switching probability of the stochastic signal, $x = p = \langle x(t) \rangle$, will be proportional to the binary number being converted ($X$).
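The comparator-based conversion of Eq. (2.1) can be illustrated with a short software model. The sketch below is not the hardware design of this Thesis: the function names are hypothetical, Python's `random` module stands in for the hardware random number generator, and the choice of drawing $R(t)$ from $0$ to $2^n - 2$ is an assumption made so that the maximum code is always high and $0$ is always low.

```python
import random


def to_stochastic(X, n_bits, n_cycles, rng):
    """Return a list of bits x(t) with x(t) = 1 if X > R(t), else 0 (Eq. 2.1)."""
    max_code = (1 << n_bits) - 1            # e.g. 255 for 8 bits
    # Assumption: R(t) is uniform over 0 .. max_code - 1, so P(X > R) = X / max_code
    return [1 if X > rng.randrange(max_code) else 0 for _ in range(n_cycles)]


def estimate_probability(stream):
    """Count the cycles in the high state and divide by the number of cycles."""
    return sum(stream) / len(stream)


rng = random.Random(0)                      # fixed seed only for repeatability
stream = to_stochastic(127, n_bits=8, n_cycles=20000, rng=rng)
print(estimate_probability(stream), 127 / 255)
```

The two printed numbers should be close but not identical, since the estimate fluctuates from run to run with the random sequence.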

In order to recover the original number, $X$, one must count (a binary counter may be used) how many clock cycles the signal $x(t)$ is in the high state for a fixed number of clock cycles $N$. Due to the intrinsically probabilistic nature of this encoding, there will always be some uncertainty when the stochastic signals are converted back to numbers, except for $0$ and $1$. Let us define $P_N(X)$ as the probability that the stochastic signal $x(t)$ corresponds to the binary number $X$, given that the switching probability of $x(t)$ is $p$. This corresponds to the binomial distribution:

\[
P_N(X) = \binom{N}{X} p^{X} (1-p)^{N-X}
\tag{2.2}
\]

The average value for the count is the expected value of $X$ associated with $p$. That is, $\langle X \rangle = \sum_X X \cdot P_N(X) = N p$. The variance of this distribution is given by $\sigma^2(p) = N p (1-p)$.

Using the 2 sigma empirical rule, the relative error can be approximated as $\mathrm{error} \approx 2\sigma_{\max}/N$. Sigma reaches its maximum value at $p = 1/2$, that is, $\sigma_{\max} = \sigma(p = 1/2) = N^{1/2}/2$. Substituting $\sigma_{\max}$ in the error yields $\mathrm{error} \approx N^{-1/2}$, which means that the relative error is inversely proportional to the square root of the integration time, $N$. Thus, the longer the integration time, the lower the relative error.
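As a quick numerical check of this scaling, the following sketch evaluates the 2 sigma relative error for a given integration time and inverts the relation to obtain the number of cycles needed for a target error. The function names are hypothetical and only the binomial expressions above are used.

```python
import math


def relative_error(n_cycles, p=0.5):
    """2-sigma relative error, 2*sigma/N, with sigma^2 = N p (1 - p)."""
    sigma = math.sqrt(n_cycles * p * (1 - p))
    return 2 * sigma / n_cycles


def cycles_for_error(target_error):
    """Integration time N needed so that N^(-1/2) <= target_error."""
    return math.ceil(1 / target_error ** 2)


print(relative_error(256))        # 0.0625, i.e. 1/sqrt(256)
print(cycles_for_error(1 / 256))  # 65536
```

The second result illustrates the cost of the encoding: under this estimate, reaching a relative error comparable to 8-bit resolution requires on the order of $2^{16}$ clock cycles.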

The probabilistic nature of SC is one of the reasons why it has failed historically vs. the classic computation when it comes to data processing. Nevertheless, when it comes to applications where accuracy is not a priority, such as pattern recognition, the results are outstanding when compared with those of the classic computation.

2.3.1 Correlation among stochastic signals

A key point when dealing with stochastic signals is their correlation. For example, the Stochastic Signal (SS) $x = 10111010$ is highly correlated in a negative sense with $y_1 = 01000101$, since their 1s and 0s never overlap. The SS $y_2 = 10011000$ is also correlated with $x$ because its 1s always overlap the 1s of $x$. The SS $y_3 = 01011101$, which is generated by rotating or shifting $x$ to the right by one bit, is not significantly cross correlated with $x(t)$, but the one-cycle-delayed version of $y_3$ is. Cross correlation may change the functionality of both combinational and sequential stochastic circuits by favoring certain input patterns. On the other hand, temporal correlation, or auto-correlation, refers to correlation between a bit-stream or part of a bit-stream and a delayed version of itself. For instance, $y_4 = 011001110$ contains some auto-correlation due to the fact that 01 is always followed by 1. Auto-correlation can severely affect the functionality of a sequential stochastic circuit by biasing it towards certain state-transition behavior. However, the effect of auto-correlation in stochastic circuits is much less well understood [105] and it is out of the scope of this Thesis.

Depending on the cross correlation, the functionality of the circuitry can be substantially different [53, 106]. For instance, let us take the results of two input signals with switching activity $p$ and $q$ respectively, which have been processed through an AND gate. If the signals are completely correlated, which is represented as $\parallel$, the AND gate provides the minimum frequency of both signals as shown in Figure 2.1a. Otherwise, if they are completely uncorrelated, which is represented as $\perp$, the AND gate performs the product between the input frequencies (Figure 2.1b).
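The two behaviors of the AND gate can be reproduced with a small software simulation. In this sketch (hypothetical names, software random numbers instead of hardware generators) two signals are considered completely correlated when both comparators share the same random number, and uncorrelated when each comparator has its own independent generator; this way of producing correlation is an assumption of the example.

```python
import random


def and_gate_rate(p, q, n_cycles, correlated, seed=0):
    """Fraction of cycles in which (p-stream AND q-stream) is high."""
    rng_p = random.Random(seed)
    rng_q = random.Random(seed + 1)          # independent generator for q
    ones = 0
    for _ in range(n_cycles):
        r_p = rng_p.random()
        # Correlated signals reuse the same random number in both comparators
        r_q = r_p if correlated else rng_q.random()
        x_p = p > r_p
        x_q = q > r_q
        ones += x_p and x_q
    return ones / n_cycles


# Same switching activities as in Figure 2.1: p = 1/4, q = 1/2
print(and_gate_rate(0.25, 0.5, 20000, correlated=True))   # close to min(p, q) = 0.25
print(and_gate_rate(0.25, 0.5, 20000, correlated=False))  # close to p * q = 0.125
```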

Figure 2.1: Correlation among signals. Both panels show the global clock and the signals $p = 1/4$, $q = 1/2$ and $p$ AND $q$ over an evaluation time $T_{eval}$ = # cycles = 8 $T_{clock}$. (a) $p \parallel q$, that is, $p$ and $q$ are completely correlated signals, giving $\mathrm{MIN}(p,q) = 1/4$. (b) $p \perp q$, that is, $p$ and $q$ are completely uncorrelated signals, giving $p \cdot q = 1/8$.

Performing the traditional multiplication poses some hurdle when implementing SC circuitry, because completely uncorrelated Random Number Generators (RNGs) are required, which consume lots of logic elements or hardware area. These RNGs employed are usually Linear-Feedback Shift Registers (LFSRs) because of their relatively small size and low cost. An LFSR is a deterministic finite-state machine whose behavior is pseudo-random, meaning that it only approximates a true random source [107].
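To make the deterministic nature of an LFSR concrete, the following is a minimal software sketch of an 8-bit Fibonacci LFSR. The register width and the feedback taps are assumptions chosen for illustration (they correspond to a primitive feedback polynomial, so the register cycles through all 255 non-zero states); they are not taken from the designs of this Thesis.

```python
def lfsr8(seed=0x01):
    """Generator yielding the successive states of an 8-bit Fibonacci LFSR."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("the all-zero state never changes (lock-up state)")
    while True:
        yield state
        # Feedback bit: XOR of the tapped bits (assumed taps at bit positions 0, 2, 3, 4)
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
        # Shift right by one and insert the feedback bit at the top
        state = (state >> 1) | (bit << 7)


def period(seed=0x01):
    """Number of clock cycles until the LFSR returns to its initial state."""
    gen = lfsr8(seed)
    first = next(gen)
    count = 1
    for state in gen:
        if state == first:
            return count
        count += 1


print(period())  # 255
```

Because the sequence repeats after 255 cycles and is fully determined by the seed, two LFSRs with the same taps and seed produce identical, and therefore completely correlated, random numbers.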

Given these facts, a tool capable of predicting the circuitry’s behavior regarding correlation would be of great help. A suitable correlation metric for signals would yield a value $+1$ for maximum overlapping of 1s and 0s (completely correlated), a value $-1$ for minimum overlapping of 1s and 0s (oppositely correlated), and a value $0$ for independent signals (completely uncorrelated). The metric should not be impacted by the actual value of the signal, and should also provide intuitive functional interpolation for correlation values other than $+1$, $-1$ or $0$. This metric was called by Alaghi and Hayes [105] “Stochastic Cross Correlation (SCC)” and it is defined as follows:

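\[
SCC(x, y) =
\begin{cases}
\dfrac{p_{x \wedge y} - p_x p_y}{\min(p_x, p_y) - p_x p_y} & \text{if } p_{x \wedge y} > p_x p_y \\[2ex]
\dfrac{p_{x \wedge y} - p_x p_y}{p_x p_y - \max(p_x + p_y - 1, 0)} & \text{otherwise}
\end{cases}
\tag{2.3}
\]

where $x$ and $y$ are stochastic signals, $p_x$ and $p_y$ are the probabilities of the $x$ and $y$ signals being $1$ respectively, and $p_{x \wedge y}$ is the probability of $x$ and $y$ being $1$ simultaneously.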

SCC is $0$ when $x$ and $y$ are completely uncorrelated and $+1$ or $-1$ when they are maximally similar or dissimilar respectively. Any other kind of correlation yields a value between $-1$ and $+1$. When SCC takes one of these intermediate values, the outcome of the stochastic function $Z = Z(x, y)$ is calculated with the following expression:

\[
Z(x, y) =
\begin{cases}
(1 + SCC) \cdot F_0 - SCC \cdot F_{-1} & \text{if } SCC(x, y) < 0 \\
(1 - SCC) \cdot F_0 + SCC \cdot F_{+1} & \text{otherwise}
\end{cases}
\tag{2.4}
\]

where $F_0$, $F_{+1}$ and $F_{-1}$ denote the stochastic functions implemented by the same circuit with SCC equal to $0$, $+1$ and $-1$ respectively.
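The SCC of Eq. (2.3) can be evaluated directly on finite bit-streams by replacing the probabilities with observed frequencies. The sketch below uses exact rational arithmetic; the function name and the convention of returning $0$ when the denominator vanishes (constant streams) are assumptions of this example.

```python
from fractions import Fraction


def scc(x_bits, y_bits):
    """Stochastic cross correlation of two equal-length bit strings (Eq. 2.3)."""
    n = len(x_bits)
    px = Fraction(x_bits.count("1"), n)
    py = Fraction(y_bits.count("1"), n)
    # Frequency with which both signals are 1 in the same cycle
    pxy = Fraction(sum(a == "1" and b == "1" for a, b in zip(x_bits, y_bits)), n)
    delta = pxy - px * py
    if delta > 0:
        denom = min(px, py) - px * py
    else:
        denom = px * py - max(px + py - 1, 0)
    # Assumption: streams that are constantly 0 or 1 are reported as SCC = 0
    return Fraction(0) if denom == 0 else delta / denom


x = "10111010"
print(scc(x, "01000101"))  # -1: the 1s of both streams never overlap
print(scc(x, "10011000"))  # 1: the 1s of the second stream always overlap those of x
```

The two streams used here are the signals $x$, $y_1$ and $y_2$ given as examples at the beginning of this section.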

Alaghi and Hayes [105] found a way to calculate $F_0$, $F_{+1}$ and $F_{-1}$ for every combinational circuit with two inputs by using Probabilistic Transfer Matrices (PTMs), which had already been explored by other authors [108]. In the PTM formulation the input data is represented by a stochastic vector $I$ of size $2^m$ [109]. The elements of $I$ are the probabilities of the possible input combinations. For instance, if $m = 2$ and the numbers are $x = 01101001$ and $y = 11011011$, then $I = \begin{pmatrix} \frac{1}{8} & \frac{3}{8} & \frac{1}{8} & \frac{3}{8} \end{pmatrix}$, that is, the probability of $xy = 00$ is $\frac{1}{8}$, the probability of $xy = 01$ is $\frac{3}{8}$, the probability of $xy = 10$ is $\frac{1}{8}$ and finally the probability of $xy = 11$ is $\frac{3}{8}$.

Every combinational circuit has a PTM representing its error-free function, that is, a matrix of size $2^m \times 2^l$, where $m$ and $l$ are the numbers of the inputs and outputs of the circuit, respectively. For example, the PTM of an AND gate is

\[
\begin{array}{c|cc}
xy \backslash z & 0 & 1 \\ \hline
00 & 1 & 0 \\
01 & 1 & 0 \\
10 & 1 & 0 \\
11 & 0 & 1
\end{array}
\tag{2.5}
\]

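where every matrix component ($p_{ij}$) indicates the probability of input $i$ producing output $j$. Multiplying an input PTM by a circuit PTM yields the output PTM. For example, the input vector described above going through an AND gate yields the following:

\[
\begin{pmatrix} \frac{1}{8} & \frac{3}{8} & \frac{1}{8} & \frac{3}{8} \end{pmatrix}
\times
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} \frac{5}{8} & \frac{3}{8} \end{pmatrix}
\]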
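The PTM example above can be reproduced numerically. The following sketch (hypothetical helper names, plain Python lists instead of a matrix library) builds the input vector from the two bit-streams and multiplies it by the PTM of the AND gate.

```python
from fractions import Fraction

# PTM of the AND gate: rows are xy = 00, 01, 10, 11; columns are z = 0, 1
AND_PTM = [[1, 0], [1, 0], [1, 0], [0, 1]]


def input_vector(x_bits, y_bits):
    """Frequencies of the input combinations xy = 00, 01, 10, 11."""
    counts = [0, 0, 0, 0]
    for a, b in zip(x_bits, y_bits):
        counts[2 * int(a) + int(b)] += 1
    n = len(x_bits)
    return [Fraction(c, n) for c in counts]


def apply_ptm(vec, ptm):
    """Row vector times matrix: output probabilities for z = 0 and z = 1."""
    n_out = len(ptm[0])
    return [sum(vec[i] * ptm[i][j] for i in range(len(vec))) for j in range(n_out)]


I = input_vector("01101001", "11011011")
out = apply_ptm(I, AND_PTM)
print([str(v) for v in I])    # ['1/8', '3/8', '1/8', '3/8']
print([str(v) for v in out])  # ['5/8', '3/8']
```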

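Now, as an example that might be useful for later sections, $F_0$, $F_{+1}$ and $F_{-1}$ will be calculated for the AND and OR gates with $m = 2$ and $l = 1$. The first step is to build the truth table for both cases. In the case of the AND gate it has already been calculated above, whereas for the OR gate it is as follows

\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
\]

The next step is multiplying a general input by the truth tables

\[
\begin{pmatrix} i_0 & i_1 & i_2 & i_3 \end{pmatrix}
\times
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} i_0 + i_1 + i_2 & i_3 \end{pmatrix}
\]

\[
\begin{pmatrix} i_0 & i_1 & i_2 & i_3 \end{pmatrix}
\times
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} i_0 & i_1 + i_2 + i_3 \end{pmatrix}
\]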

Now $i_k$ must be replaced with the entries of Table 2.1.

        F_0, SCC = 0        F_{-1}, SCC = −1      F_{+1}, SCC = +1
i_0     (1−x)·(1−y)         max(1−x−y, 0)         min(1−x, 1−y)
i_1     (1−x)·y             min(1−x, y)           max(y−x, 0)
i_2     (1−y)·x             min(1−y, x)           max(x−y, 0)
i_3     x·y                 max(x+y−1, 0)         min(x, y)

Table 2.1: PTM elements used to derive the stochastic function of a two-input combinatorial circuit at SCC levels $0$, $-1$ and $+1$ [110]

For the AND gate:

• For $SCC = 0$

\[ F_0 = \begin{pmatrix} 1 - x \cdot y & x \cdot y \end{pmatrix} \]

• For $SCC = +1$

\[ F_{+1} = \begin{pmatrix} \min(1-x, 1-y) + |x - y| & \min(x, y) \end{pmatrix} \]

• For $SCC = -1$

\[ F_{-1} = \begin{pmatrix} \max(1-x-y, 0) + \min(1-x, y) + \min(1-y, x) & \max(x+y-1, 0) \end{pmatrix} \]

For the OR gate:

• For $SCC = 0$

\[ F_0 = \begin{pmatrix} (1-x) \cdot (1-y) & x + y - x \cdot y \end{pmatrix} \]

• For $SCC = +1$

\[ F_{+1} = \begin{pmatrix} \min(1-x, 1-y) & \max(x, y) \end{pmatrix} \]

• For $SCC = -1$

\[ F_{-1} = \begin{pmatrix} \max(1-x-y, 0) & \min(1-x, y) + \min(1-y, x) + \max(x+y-1, 0) \end{pmatrix} \]

Finally, substituting the above calculated expressions in (2.4) yields the most general expression.
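As an illustration of this last step, the sketch below evaluates the probability of the AND gate output being $1$ for an arbitrary SCC, using the second components of $F_0$, $F_{+1}$ and $F_{-1}$ derived above. The function name is hypothetical. The negative branch is written as $(1 + SCC) \cdot F_0 - SCC \cdot F_{-1}$, which is the form that reduces to $F_{-1}$ at $SCC = -1$ and to $F_0$ at $SCC = 0$.

```python
def and_gate_output(x, y, scc):
    """Probability that the AND gate output is 1 for inputs x, y and a given SCC."""
    f_zero = x * y                     # uncorrelated inputs (SCC = 0)
    f_plus = min(x, y)                 # maximally correlated inputs (SCC = +1)
    f_minus = max(x + y - 1, 0)        # oppositely correlated inputs (SCC = -1)
    if scc < 0:
        # Linear interpolation between F_0 and F_-1
        return (1 + scc) * f_zero - scc * f_minus
    # Linear interpolation between F_0 and F_+1
    return (1 - scc) * f_zero + scc * f_plus


print(and_gate_output(0.25, 0.5, 0))     # 0.125, the product
print(and_gate_output(0.25, 0.5, 1))     # 0.25, the minimum
print(and_gate_output(0.25, 0.5, 0.5))   # 0.1875, halfway between both
print(and_gate_output(0.75, 0.5, -1))    # 0.25, max(x + y - 1, 0)
```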

2.4 Stochastic spiking neuron model

In this section a new bio-inspired low-cost digital neuron model for the implementation of high-volume SSNNs is presented. For the implementation of huge networks, an auxiliary Computer Aided Design (CAD) tool for their automatic generation has been set up. The comparison with previous hardware SSNN implementations shows a great advance in terms of circuit speed (about a factor of 85). This work is an enhanced version of the design presented in Ref. [111]. The main differences with respect to the conference paper are the inclusion of a configurable relaxation time of the membrane potential, the introduction of a conditional resetting feedback with respect to the incoming excitatory signal (which provides more stability to the neural behavior) and finally the testing of the model in a high-volume NN with more than one thousand neurons implemented in a high-end FPGA.

When probabilistic encoding is used, the network functionality depends on the correlation between signals [53]. The exact time at which signals fire is not so important due to the fact that information will only be encoded in the firing rate and spike correlation (i.e. neither in the shape nor the specific timing of the spikes).

For simplicity and energy efficiency in a digital implementation, spike signals are modeled by boolean values that can only be set to either a high or a low value during the smallest time step $t_{min}$. This time must be understood as the response time of the fastest neuron within the network. The boolean value of the neuron output ($x_j$) is at the high state when there is a spike between $t_i$ and $t_i + t_{min}$, otherwise $x_j$ is at the low state (see Figure 2.2). Note that the use of this binary representation does not reduce the information contained in the signals. For the sake of clarification, in all the figures depicted in this work correlated neurons share the same color.

Figure 2.2: Boolean representation of neural spikes. (Horizontal axis: time, divided in steps of $t_{min}$.)

Leak-integrate and fire digital neuron model

The SSNN model proposed in this work is intended to reproduce a Leaky Integrate and Fire (LIF) digital neuron model for the spiking process [112]. This model is considerably faster and its mechanism is closer to the real biological behavior than the model in Ref. [53]. The model proposed here is characterized by the fact that the neuron’s membrane potential decays exponentially to the resting value and by the fact that it remembers how many spikes have taken place in the near past (see Figure 2.3). With respect to the previously referred SSNN model [53], several improvements have been achieved: the exponential decay, a more realistic closed-loop model and a faster response. As can be observed in Figure 2.4, the exponential decay is implemented by an n-bit register that stores the membrane potential of the $j$-th neuron, $v_j(k)$, which is updated every clock cycle following the rule below:

\[
v_j(k+1) =
\begin{cases}
0 & \text{if } \bar{z}_j x_j = 1 \\
v_j(k) \left(1 - \dfrac{1}{2^i}\right) & \text{if } \bar{x}_j \bar{z}_j = 1 \\
v_j(k) + E & \text{if } z_j = 1
\end{cases}
\tag{2.6}
\]

where $z_j$ is evaluated from the input logic block that depends on the excitatory and inhibitory inputs $p_k$ and $p'_k$. Equation (2.6) is implemented by the circuit shown in Figure 2.4. As it can be observed, the input logic block evaluates the signal $z_j$ so that $z_j = 1$ when an excitatory input is activated ($\exists\, i \in \{1, \ldots, N_e\} / x_i = 1$) and no inhibition is present ($p'_j = 0 \ \forall\, j \in \{1, \ldots, N_i\}$). In this case, the shift-register storing the membrane potential is updated by adding a fixed value $E$ if not higher than the threshold reference. If no excitation is present, then the membrane potential is relaxed exponentially to zero. In order to implement a configurable time to the leaking mechanism, the contents of the register $v_j$ is decreased by a
