
3.2 Methods

3.2.3 Classification

The classification is split into several classification problems: a four-class problem, a three-class problem, and four binary problems. The four-class problem is NC vs MCIs vs MCIc vs AD, the three-class problem is NC vs MCI vs AD, and the binary problems are NC vs AD, NC vs MCI, MCI vs AD, and MCIs vs MCIc.

The four-class problem and the binary problem MCIs vs MCIc are the main focus of this thesis, as they measure how well early stages of AD can be detected.

For each classification problem the subjects are split into a training set (85%) and a test set (15%). These sets are the same for all methods tested. The test set must not be confused with the validation set: the test set is used to evaluate how well the methods generalize to unseen data, while the validation set is taken from the training data within each method to choose the best-performing approach for the task, e.g. which parameters/hyperparameters are best suited.
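The subject-level split described above can be sketched as follows, using scikit-learn's `GroupShuffleSplit` as one possible implementation; the subject IDs and labels here are hypothetical placeholders, not the actual ADNI data:

```python
# Sketch of a subject-level 85/15 split, so that all scans from one
# subject end up on the same side of the split (hypothetical data).
from sklearn.model_selection import GroupShuffleSplit

# One entry per scan: class label and the subject the scan belongs to.
labels   = ["NC", "NC", "AD", "AD", "MCI", "MCI", "NC", "AD"]
subjects = [ 1,    1,    2,    2,    3,     3,     4,    5  ]

splitter = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=0)
train_idx, test_idx = next(splitter.split(labels, groups=subjects))

train_subjects = {subjects[i] for i in train_idx}
test_subjects  = {subjects[i] for i in test_idx}
# No subject appears in both sets.
assert train_subjects.isdisjoint(test_subjects)
```

Splitting by subject rather than by scan is what keeps later visit scans of the same person from leaking between the sets.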

Additional visit scans are added in each set (original training and test set) until the number of scans in each class equals the minimum number of available additional scans for a class, with priority given to the visits closest to the screening. This ensures an equal number of instances of each class in the training and test sets, and that no visit scan from the same subject appears in both the training and test set, which minimizes data leakage. Table 3.7 shows the number of subjects and scans in both the training and test set for each classification problem.
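The balancing step can be sketched as follows; the class names, subject IDs, and visit months below are hypothetical stand-ins for the actual ADNI tables:

```python
# Sketch of the per-class balancing step: additional visit scans are
# added, closest-to-screening first, until every class grows by the same
# number of scans (hypothetical data layout).
extra_visits = {
    # class -> list of (subject, months after screening) for extra scans
    "NC":  [("s1", 6), ("s2", 6), ("s1", 12)],
    "AD":  [("s3", 6), ("s4", 12)],
    "MCI": [("s5", 6), ("s6", 6), ("s5", 12), ("s6", 24)],
}

# Every class may only grow by the minimum number of extra scans
# available across classes, so the classes stay balanced.
budget = min(len(v) for v in extra_visits.values())   # here: 2

balanced_extras = {}
for cls, visits in extra_visits.items():
    # Priority to the visits closest to the screening visit.
    visits_sorted = sorted(visits, key=lambda v: v[1])
    balanced_extras[cls] = visits_sorted[:budget]

assert all(len(v) == budget for v in balanced_extras.values())
```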

A RF classifier is used for classification of the features extracted from the pre-processed MRI. For the neural networks no feature extraction is done before classification, but the same pre-processing methods are tested. The pre-processing methods tested for the MRI are MNI, smoothed MNI, WM, smoothed WM, skull stripped, and smoothed skull stripped, which are illustrated in Figure 3.5.

Random Forest

The features extracted using the methods presented in Section 3.2.2 are classified using a RF classifier, using the implementation from the scikit-learn library package8 in Python 3.6. The parameters of the RF are set to default values, except for the number of trees in the classifier. To choose the maximum number of trees to test, an initial test using relatively few features and many features was conducted on the NC vs AD problem. Figure 3.7 shows the mean of the mean 10-fold cross-validation result using up to 400 trees, with the standard deviation as error bars, using the 126 features from LBP-TOP (a) and the 25650 features from the combination of LBP-TOP and 3D VHOG (b). Since there seemed to be a small gain up to 150 trees, all tests were conducted with 10, 30, 50, 75, 100 and 150 trees. No feature selection or feature reduction is performed before the RF classifier.
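The tree-count sweep can be sketched as below; synthetic features generated with `make_classification` stand in for the actual LBP-TOP / 3D VHOG feature vectors, and all other RF parameters are left at their scikit-learn defaults, as in the text:

```python
# Sketch of the tree-count sweep: mean 10-fold cross-validation accuracy
# for a default RandomForestClassifier at each tested forest size.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 126-dimensional LBP-TOP feature vectors.
X, y = make_classification(n_samples=200, n_features=126, random_state=0)

for n_trees in (10, 30, 50, 75, 100, 150):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    scores = cross_val_score(clf, X, y, cv=10)      # 10-fold CV
    print(f"{n_trees:3d} trees: {scores.mean():.3f} +/- {scores.std():.3f}")
```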

Figure 3.7: Effect of the number of trees in the random forest on validation results. (a) 126 features from LBP-TOP; (b) 25650 features from LBP-TOP + 3D VHOG.

Neural Network

The neural networks tested in this thesis are implemented using Python 3.6 and the library package Tensorflow9. All networks were trained by splitting the training set into a new training set (85% of the original training set) and a validation set (15% of the original training set), with a single subject's scans only being in either the training or the validation set. The test set used for the neural networks is the same test set that was used for the RF classification of features. Because of the excessive computational cost related to 3D CNNs, only a limited set of hyperparameters is tested. The networks tested differ in the number of CL and in the dropout rate in the last fully connected layer. The models tested were:

8Available from: http://scikit-learn.org/stable/index.html

9Available from: https://www.tensorflow.org/install/

6 3D convolutional layers: The first two CL have 8 filters and a kernel size of 6x6x6, the next two CL have 16 filters and a kernel size of 5x5x5, the last two CL have 32 filters and a kernel size of 3x3x3. There is also a 2x2x2 maxpooling layer after the 2nd CL, and a 3x3x3 maxpooling layer after the 5th CL.

5 3D convolutional layers: The first two CL have 8 filters and a kernel size of 6x6x6, the next two CL have 16 filters and a kernel size of 5x5x5, and the last CL has 32 filters and a kernel size of 3x3x3. There is also a 2x2x2 maxpooling layer after the 2nd CL, and a 3x3x3 maxpooling layer after the 4th CL.

4 3D convolutional layers: The first CL has 8 filters and a kernel size of 6x6x6, the next two CL have 16 filters and a kernel size of 5x5x5, and the last CL has 32 filters and a kernel size of 3x3x3. There is also a 2x2x2 maxpooling layer after the 1st CL, and a 3x3x3 maxpooling layer after the 3rd CL.

3 3D convolutional layers: The first CL has 8 filters and a kernel size of 6x6x6, the next CL has 16 filters and a kernel size of 5x5x5, and the last CL has 32 filters and a kernel size of 3x3x3. There is also a 2x2x2 maxpooling layer after the 1st CL, and a 3x3x3 maxpooling layer after the 2nd CL.
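Since all convolutions use zero-padding and a stride of one, the spatial size of the feature maps only changes at the max-pooling layers. The shape trace below sketches this for the 6-CL variant; the 96x96x96 input size is a hypothetical placeholder, as the actual MRI dimensions are given elsewhere:

```python
# Shape trace for the 6-CL network: zero-padded stride-1 convolutions
# keep the spatial size; only the max-pooling layers shrink it.
# The 96x96x96 input size is a hypothetical placeholder.

def pool(shape, k):
    """Spatial size after a k x k x k max-pooling layer with stride k."""
    return tuple(d // k for d in shape)

shape = (96, 96, 96)
# (layer, pooling factor or None), in the order given in the text.
layers = [
    ("conv  8f 6x6x6", None), ("conv  8f 6x6x6", None),
    ("maxpool 2x2x2 ", 2),
    ("conv 16f 5x5x5", None), ("conv 16f 5x5x5", None),
    ("conv 32f 3x3x3", None),
    ("maxpool 3x3x3 ", 3),
    ("conv 32f 3x3x3", None),
]
for name, k in layers:
    if k:
        shape = pool(shape, k)
    print(f"{name} -> {shape}")
# 96 -> 48 after the 2x2x2 pool, 48 -> 16 after the 3x3x3 pool.
```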

Figure 3.8: Convolutional neural network with 6 convolutional layers and 3 fully connected layers.

For all CL, zero-padding, a stride of one, and ReLU as the activation function were used. All networks were tested with 3 fully connected layers after the convolutional layers, with 400 neurons in each and ReLU as the activation function. The learning rate was set to 10^-4, since this gave the best result in initial testing. The batch size was set to 16 due to limitations of GPU memory; smaller batch sizes were also initially tested, but this led to reduced training speed without increasing the classification performance of the network. All networks were tested with 0, 10% or 20% dropout in the last fully connected layer. Figure 3.8 shows an illustration of the network with 6 convolutional layers.

Figure 3.9: Illustration of the training of the neural networks

For training of the networks, all weights are initialized with a truncated normal distribution and the biases are initialized to 0.1. The training procedure of the CNN is illustrated in Figure 3.9. For each iteration, or step, a new mini batch is used to train the network. The network's performance is tested with the validation set at fixed intervals, every validation_step steps, as given by Equation 3.2.

validation_step = floor(training_data / batch_size)    (3.2)

If the accuracy on the validation data is better than the previous best validation accuracy, the network is saved. When all the mini batches in the training data have been used, or the last mini batch is smaller than the specified batch size, the training data is reshuffled and split into new mini batches. The training continues until a predefined maximum number of steps without improvement, or the maximum number of steps, is reached. When the network terminates training, the network parameters with the best validation accuracy are loaded and tested with the test set.
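The training procedure above can be sketched as a checkpoint-and-early-stopping loop; the `train_step`, `save_checkpoint`, and `load_checkpoint` operations mentioned in the comments are hypothetical stand-ins for the actual TensorFlow calls:

```python
# Sketch of the training loop: train on mini batches, validate at fixed
# intervals, keep the best weights, and stop after max_no_improve
# validation checks without improvement (or at max_steps).
def train(max_steps, validate_every, max_no_improve, evaluate):
    best_acc, checks_without_improvement, step = 0.0, 0, 0
    while step < max_steps and checks_without_improvement < max_no_improve:
        step += 1
        # ... train_step(next_mini_batch()) would run here ...
        if step % validate_every == 0:
            acc = evaluate()
            if acc > best_acc:
                best_acc = acc
                checks_without_improvement = 0
                # ... save_checkpoint() would run here ...
            else:
                checks_without_improvement += 1
    # ... load_checkpoint() and evaluation on the test set follow ...
    return best_acc

# Demo with a scripted sequence of validation accuracies:
accs = iter([0.50, 0.60, 0.60, 0.60, 0.60])
best = train(max_steps=100, validate_every=1, max_no_improve=3,
             evaluate=lambda: next(accs))
print(best)  # 0.6
```

With `max_no_improve=3`, training here stops after three validation checks in a row without improvement and reports the best accuracy seen.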

Results

In this chapter the results from the methods described in Chapter 3 are presented.

Section 4.1 presents the best validation results from each pre-processing method and all classification problems for each feature extraction method tested and for CNN.

For each feature extraction method and for the CNN, the pre-processing method with the best validation accuracy is highlighted for each classification problem.

The highlighted (best-performing) methods are presented with the results from the test set in Section 4.2.