
4. Results

4.1. Classifier selection – Initial assessment of the classifiers

4.1.1. Pressed black UOC SEM images at 250x magnification

In Figure 27 it can be seen that only the classifiers LR, SVM, and LDA obtained an accuracy over 90 %. These three classifiers achieved an accuracy of 94–95 % when all feature groups were used together, and 90–91 % using only LBP features.
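The comparison underlying Figure 27 can be sketched as a cross-validated evaluation of several classifiers on one feature group. This is a minimal illustration, not the thesis pipeline: the data, fold count, and preprocessing are made-up placeholders, and only three of the classifiers are shown.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 200 images described by 32 features (e.g. one LBP feature group)
X, y = make_classification(n_samples=200, n_features=32, n_informative=10,
                           n_classes=4, random_state=0)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "LDA": LinearDiscriminantAnalysis(),
}

for name, clf in classifiers.items():
    # Scale features, then estimate accuracy with 5-fold cross-validation
    scores = cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5)
    print(f"{name}: accuracy {scores.mean():.2f} (SD {scores.std():.2f})")
```

In the thesis the same loop would additionally run over every feature group and record the per-fold timing reported in the heatmap.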

The classifier AdaBoost achieved the poorest performance on average across the feature groups.

Classification based on AMT features resulted in the lowest accuracy scores on average across the different classifiers. The combination giving the poorest performance was AdaBoost with AMT features, at 31 % accuracy.

Figure 27: Heatmap showing the classification performance for different combinations of classifiers (rows) and feature groups (columns). Three values are shown for each classifier and feature group combination. The top value is accuracy, the middle value inside round brackets is standard deviation (SD), and the bottom value inside square brackets is the average time in seconds for going through one outer fold in the n-CV. The accuracies and SDs were calculated from the sample class accuracies for each classifier and feature group combination. The colouring is based on the value of accuracy and given in the colour bar on the right.

Figure 28 shows the classification performance obtained by each classifier for the different sample classes. Here, the accuracy is averaged across the feature groups and the standard deviation (SD) is taken over all the different feature groups. This heatmap gives an indication of how well different classifiers classify specific classes on average over the feature groups. The classes AusMak, AusOlD, USAFAP, and USAPet were more challenging to classify for all classifiers than the other classes. The variation is given inside the round brackets, where the SD varied from 4 % up to 33 % across the feature groups. On average, LDA gave the highest accuracy for classifying the classes, except for SafNuf and YugSpB, which were classified with higher accuracy by NB, and by NB, SVM, and LR, respectively.
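The reduction behind this heatmap is a mean and SD over the feature-group axis of the per-class accuracies. The sketch below only illustrates the reshaping; the array dimensions and values are made-up placeholders, not the thesis results.

```python
import numpy as np

rng = np.random.default_rng(0)
# acc[classifier, feature_group, sample_class]: placeholder per-class accuracies
acc = rng.uniform(0.3, 1.0, size=(8, 6, 12))

# Collapse the feature-group axis: one mean and one SD per (classifier, class) cell
mean_over_groups = acc.mean(axis=1)   # shape (8, 12)
sd_over_groups = acc.std(axis=1)      # SD across the 6 feature groups
print(mean_over_groups.shape, sd_over_groups.shape)
```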

The last heatmap, Figure 29, shows how specific classes were classified given different feature groups, on average across the different classifiers. It can be observed that LBP was the only feature group that obtained an accuracy above 50 % for all classes, except for the feature group “all”.

However, both of these feature groups contained all the LBP features. GLSZM also gave reasonably high accuracy, with only one class having an accuracy below 50 % (47 %).

Figure 28: Heatmap showing the classification performance on each of the sample classes (columns) using different classifiers (rows). The top value is accuracy and the bottom value inside round brackets is SD. The accuracy and SD for each classifier and sample class combination were calculated from each class sample accuracies over all feature groups. The colouring is based on the value of accuracy and given in the colour bar on the right.


Figure 29: Heatmap showing the classification performance on each of the sample classes (columns) using different feature groups (rows). The top value is accuracy and the bottom value inside round brackets is the SD. The accuracy and SD for each feature group and sample class combination were calculated from each class sample accuracies over all classifiers. The colouring is based on the value of accuracy and given in the colour bar on the right.

Table 10 shows how frequently particular classifier hyperparameters were used by LR, SVM, and LDA for both LBP features alone and all feature groups together. Hyperparameter frequency was investigated to check the stability of the classifiers. Depending on performance, stable classifiers can be preferable to better-performing ones. Neither LR nor SVM had a combination of hyperparameters that occurred more than 50 % of the time, indicating that these classifiers were somewhat unstable. For LDA, the solver “lsqr” and shrinkage “auto” were selected. LDA has no other tuneable parameters (sklearn.discriminant_analysis.LinearDiscriminantAnalysis, n.d.).
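A hyperparameter-frequency check like the one in Table 10 can be sketched by tallying the grid-search winner in each outer fold of a nested CV. The data, the LR parameter grid, and the fold counts below are illustrative assumptions, not the values used in the thesis.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=120, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = {"C": [0.1, 1.0, 10.0]}          # placeholder LR grid
chosen = Counter()

for train_idx, _ in outer.split(X, y):
    # Inner CV picks the hyperparameters for this outer training fold
    search = GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=3)
    search.fit(X[train_idx], y[train_idx])
    chosen[tuple(search.best_params_.items())] += 1

# A set selected in more than 50 % of the outer folds would indicate stability
for params, count in chosen.most_common():
    print(dict(params), f"{count}/{outer.get_n_splits()}")
```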

As seen in Figure 27 and Figure 28, the classifier LDA consistently achieved high classification accuracy. In addition, the combination of the “lsqr” solver and “auto” shrinkage was selected in 97 % of the cases where LDA was used, indicating that LDA was a stable and consistent classifier. Therefore, LDA was considered the most promising classifier for this dataset.
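In sklearn, which the thesis cites for LDA, this preferred configuration corresponds to the call below; the data are a made-up stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=100, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)

# 'lsqr' supports the automatic (Ledoit-Wolf) shrinkage estimate,
# which regularises the covariance matrix when features are many
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
lda.fit(X, y)
print(f"training accuracy: {lda.score(X, y):.2f}")
```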

Table 11 shows the confusion matrix obtained for LDA using all feature groups together. In general, the majority of samples in most classes were classified correctly. However, on average, four out of 20 USAFAP samples were incorrectly predicted as AusQue, perhaps suggesting that these misclassified samples had similarities to the AusQue class. A limitation of showing only a confusion matrix is that one cannot see whether the same four samples are misclassified each time, or whether the misclassifications happen at random within the class. Consistently misclassified samples could be treated as outliers, but this has not been examined beyond tracking the sample classifications. Table 20 on page 66 in chapter 4.2.1.1 gives an example showing the predicted class of each sample, which gives insight into whether only a few samples are consistently misclassified.


Table 10: Overview of how frequently different hyperparameter sets were used in the outer folds in the n-CV using LR (left), SVM (middle) and LDA (right). The feature groups used are given in the left-most column, the last column gives the occurrences, and the remaining columns give hyperparameters.


Table 11: Confusion matrix obtained for LDA for sample classification based on all feature groups together. Values were averaged over all the n-CV runs using different combinations of samples in the training and test folds. The first column gives the true classes, and the top row gives the predicted classes. There were 20 samples in every class in this dataset, so each row adds up to 20. For example, the top cell in the first column gives the true class AusMak, and the remaining cells on the same row tell how many times AusMak samples were predicted as the classes specified in the top row. On average, 19.2 of the 20 AusMak samples were correctly classified, while 0.1 samples were classified as SAfSUP and 0.8 samples as USAFAP.