
2.4 Neural networks

2.4.1 Introduction to deep learning

Deep learning is a subfield of machine learning that has existed since the 1940s, though under different names. In the beginning, deep learning was known as cybernetics, and for a period in the 1980s–1990s it was known as connectionism or neural networks. Over the last decade it has become known as deep learning [8].

What makes deep learning different from other machine learning techniques is that in traditional machine learning, the features that are extracted from each data sample and used as inputs to the algorithm are chosen before the data is fed to the algorithm. In deep learning, the raw data is fed through the network, and the algorithm extracts the features from the data itself. This is illustrated in figure 2.5.

Figure 2.5: The process of a classification problem for a machine learning algorithm and for deep learning [10].

Neural networks are essential in deep learning, and the concept is based on the networks of neurons in the brain. In the beginning, to achieve artificial intelligence, the biological functions of the brain were studied and reproduced as a machine learning model. This gave the structure and the name of artificial neural networks (ANN), inspired by the networks of neurons in the brain. As artificial neural networks developed, they stopped trying to replicate the biological functions of neurons but kept the structure. An ANN is essentially an algorithm consisting of several connected layers of functions. This structure makes it possible to model quite complex concepts with simpler algorithms, which is very useful in, for example, object or speech recognition and computer vision [8] [13], where the task is complex and somewhat abstract.

A neuron can be expressed as:

y = \sum_{j=1}^{d} w_j x_j + b    (2.6)

where y is the output, x_j are the inputs, w_j are the weights of the connections between the inputs and the output, and b is the bias. The bias can alternatively be written as w_0 x_0, where x_0 always equals 1. In a neural network, the weights determine the steepness of the transfer function, while the biases allow the transfer function to be shifted left or right. Further explanation of transfer functions is given in chapter 2.4.2.

When d = 1 in equation (2.6), the equation represents the linear function, equation (2.1), where there is only one input for every output. When d is greater than 1, the equation describes a hyperplane, where there are multiple inputs for every output. An input can also be raised to a power, making the neuron a higher-order polynomial function [13]. Figure 2.6 shows the structure of a neuron. Neurons can also be connected in parallel, which results in multiple outputs for the given inputs, as shown in figure 2.7. A neuron or a single layer of parallel neurons can be called a perceptron, while multiple layers can be called a multi-layer perceptron or a neural network (figure 2.8). All layers between the input and the output layer in a neural network are called hidden layers.
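To make equation (2.6) concrete, the following minimal Python sketch computes a single neuron's output as the weighted sum of its inputs plus a bias; the function name and the example numbers are illustrative only and are not taken from the thesis.

    def neuron_output(weights, inputs, bias):
        # Equation (2.6): y = sum over j of w_j * x_j, plus the bias b
        return sum(w * x for w, x in zip(weights, inputs)) + bias

    # Example with d = 2 inputs (illustrative values):
    y = neuron_output(weights=[0.5, -1.2], inputs=[1.0, 2.0], bias=0.3)
    # y = 0.5*1.0 + (-1.2)*2.0 + 0.3 = -1.6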

Figure 2.6: Structure of a neuron.


Figure 2.7: Parallel neurons.

A common way of training neural networks is with online learning, which means that the data set is divided into segments, and the segments are fed to the network one after another, allowing the algorithm to update the parameters after each segment [13]. This method requires less memory for storing data, and it also allows an algorithm to be updated as new data samples are obtained. The error function for online learning is computed for one data pair rather than for the whole sample. For a regression problem, the error function is similar to equation (2.2). The difference is that in (2.2) the error is summed over all the data points, while for online learning the error function (2.7) is calculated for every data pair, and for every calculation there is an update Δw_j:

𝐸(𝑀|π‘₯, 𝑦) = 1

2(𝑦 βˆ’ yΜ‚)2 =1

2(𝑦 βˆ’ 𝐰𝑇𝐱)𝟐 (2.7)

βˆ†π‘€π‘— = πœ‚(𝑦 βˆ’ yΜ‚)π‘₯𝑗 (2.8)

In equation (2.8), η is a learning factor, y is the target output, ŷ is the predicted output, and x_j is the input. The learning factor is reduced over time. The magnitude of the update depends on the magnitudes of the learning factor, the input, and the difference between the predicted output and the target output. No update is made if the predicted value equals the target value.
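A minimal sketch of one online-learning pass, assuming a linear model ŷ = wᵀx as in equation (2.7) and the update rule (2.8). The bias is folded in as w_0 with x_0 = 1, as described above; the data pairs and learning factor are illustrative.

    def online_step(w, x, y, eta):
        # Predicted output for a linear model: y_hat = w^T x
        y_hat = sum(wj * xj for wj, xj in zip(w, x))
        # Equation (2.8): w_j becomes w_j + eta * (y - y_hat) * x_j
        return [wj + eta * (y - y_hat) * xj for wj, xj in zip(w, x)]

    # Data pairs are fed one at a time; x_0 = 1 carries the bias w_0.
    w = [0.0, 0.0]                         # [w_0 (bias), w_1]
    for x, y in [([1.0, 2.0], 3.0), ([1.0, -1.0], 0.5)]:
        w = online_step(w, x, y, eta=0.1)  # eta would be reduced over time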

A typical structure for neural networks is the chain structure expressed in (2.9), where the function f^{(1)} represents the first layer, f^{(2)} the second layer, and f^{(3)} the third layer [8]. The functions are the transfer functions discussed in chapter 2.4.2.

f(n) = f^{(3)}(f^{(2)}(f^{(1)}(n)))    (2.9)

As the chain of functions grows, the depth of the structure is also said to increase. The name deep learning refers to networks of a certain depth, though there are many ways of structuring a network, so the number of layers alone does not define deep learning.
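A sketch of the chain structure in equation (2.9). For clarity, each layer is reduced here to a plain scalar function; in an actual network, each f^{(i)} would also apply the weights and bias of equation (2.6) before its transfer function. The choice of tanh as transfer function is an assumption for illustration.

    import math

    def f1(n): return math.tanh(n)        # first layer (illustrative)
    def f2(n): return math.tanh(2.0 * n)  # second layer (illustrative)
    def f3(n): return math.tanh(n + 1.0)  # third layer (illustrative)

    def network(n):
        # Equation (2.9): f(n) = f3(f2(f1(n)))
        return f3(f2(f1(n)))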


Object recognition in an image can serve as an example of a neural network with multiple layers. The pixels of the image are the input, the first layer, to the network. The next layer is the first hidden layer; this layer can look for edges in the image. The edges can then be fed to the second hidden layer, which looks for lines, curves and corners. As the data proceeds through the layers, the algorithm can look for more complex combinations of edges in the image, such as eyes or a mouth. The last layer is the output, which is the object that was recognized in the image.

Figure 2.8: Neural network structure with two hidden layers.