
4. Results

4.2. Model building for SEM images of unpressed UOC samples

4.2.2. General results for SEM data

Table 28 summarizes the performance on the datasets at all magnifications originating from the unpressed SEM images. The accuracies for predicting all samples in each test are presented along with their mean. In addition, the averaged assigned class probabilities for the three tests are also shown. To clarify, each entry in Table 28 refers to one colour category at one magnification; for example, the mean assigned probability accuracy listed for colour category 1 under 100x applies to that magnification only.

The highest mean prediction accuracy averaged over all colour categories was obtained when the datasets for the different magnifications were concatenated. However, the achieved accuracy of 90 % was only 2 percentage points higher than the performance at 100x magnification alone. The mean assigned probability accuracy averaged over all colour categories was 97 % or above for 100x, 250x, and the concatenation of all magnifications. The difference in performance between using only the 100x magnification and the concatenation of all magnifications was therefore small.
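As an illustration of how the summary values in Table 28 are obtained, the following minimal Python sketch reproduces the CC1 mean and the overall mean at 100x from the per-test accuracies (all values taken from Table 28):

# Worked example: averaging used in Table 28 (values from the 100x block).
cc1_100x = [0.62, 0.69, 0.69]             # three hold-out test accuracies for CC1
mean_cc1 = sum(cc1_100x) / len(cc1_100x)  # ~0.67, the "Mean prediction acc." for CC1

per_cc = [0.67, 0.90, 0.93, 0.92, 1.00]   # mean prediction acc. for CC1, CC3, CC4, CC5, CC6
overall = sum(per_cc) / len(per_cc)       # ~0.88, the mean over all colour categories
print(f"CC1: {mean_cc1:.0%}, overall: {overall:.0%}")  # CC1: 67%, overall: 88%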

Colour category 1 had both the poorest mean prediction accuracy and the poorest mean assigned probability accuracy for the 100x, 250x, and all-magnifications datasets. Also, this category, along with colour category 3, contained 13 classes, whereas colour categories 4, 5, and 6 contained 9, 4, and 4 classes, respectively.

This must be considered when comparing the overall performance of the categories. The machine learning models trained on the black-dark brown and the light brown-dark yellow samples each had to classify 13 classes, whereas the models for the other categories had fewer classes to discriminate.

Furthermore, the accuracies averaged over the colour categories were not weighted according to the number of classes each category contained. As more classes were included in category 1, the average accuracy over all classes would be smaller. As a final note, the average number of features selected over the three runs might indicate the difficulty of classifying the datasets.

How many features occur across the magnifications:

Which test   L16_4_3   L36_9_1   L40_10_1   L44_11_1   L12_3_4
First        2         2         2          2
Second       2         2
Third        2         2         2
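To make the effect of the missing weighting concrete, the sketch below compares the unweighted mean reported in Table 28 with a mean weighted by the number of classes per category (13, 13, 9, 4, and 4), using the 100x values; the weighting scheme is an illustration, not part of the original analysis:

# Unweighted vs. class-count-weighted mean over the colour categories (100x, Table 28).
means  = [0.67, 0.90, 0.93, 0.92, 1.00]   # mean prediction acc. for CC1, CC3, CC4, CC5, CC6
counts = [13, 13, 9, 4, 4]                # number of classes in each colour category

unweighted = sum(means) / len(means)                                # ~0.88 (as reported)
weighted = sum(m * c for m, c in zip(means, counts)) / sum(counts)  # ~0.85
print(f"unweighted: {unweighted:.0%}, weighted: {weighted:.0%}")

The weighted mean is lower because the hardest category, CC1, contributes the largest number of classes.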

Table 28: Summary of performance on the SEM datasets for unpressed samples for all colour categories. The first column denotes the magnification, the second gives a description of the summarized performance, the third to seventh columns denote the colour categories, and the last column shows, averaged over all colour categories, the mean prediction accuracy, the mean assigned probability accuracy, and the average number of features.

Magnification  Description                CC1    CC3    CC4    CC5    CC6    Mean over all colour categories

100x           First test acc.            62%    92%    78%    100%   100%
               Second test acc.           69%    77%    100%   100%   100%
               Third test acc.            69%    100%   100%   75%    100%
               Mean prediction acc.       67%    90%    93%    92%    100%   88%
               Mean assigned prob. acc.   85%    100%   100%   100%   100%   97%
               Mean assigned prob. SD     29%    14%    11%    14%    0%
               Avg. # of feats.           6      4.7    5      6      6      5.5

250x           First test acc.            85%    69%    100%   100%   100%
               Second test acc.           54%    77%    89%    100%   100%
               Third test acc.            62%    92%    89%    75%    100%
               Mean prediction acc.       67%    79%    93%    92%    100%   86%
               Mean assigned prob. acc.   92%    100%   100%   100%   100%   98%
               Mean assigned prob. SD     23%    19%    12%    13%    3%
               Avg. # of feats.           7      6      5.7    4.7    3.7    5.4

500x           First test acc.            85%    78%    50%    75%
(four colour   Second test acc.           62%    89%    50%    50%
categories)    Third test acc.            92%    78%    100%   100%
               Mean prediction acc.       79%    81%    67%    75%           76%
               Mean assigned prob. acc.   92%    100%   50%    100%          86%
               Mean assigned prob. SD     23%    18%    26%    19%
               Avg. # of feats.           6.7    5.3    6.3    5             5.8

1000x          First test acc.            46%
(one colour    Second test acc.           77%
category)      Third test acc.            85%
               Mean prediction acc.       69%                                69%
               Mean assigned prob. acc.   85%                                85%
               Mean assigned prob. SD     25%
               Avg. # of feats.           4                                  4.0

all magn.      First test acc.            69%    62%    100%   100%   100%
               Second test acc.           77%    92%    89%    75%    100%
               Third test acc.            92%    100%   100%   100%   100%
               Mean prediction acc.       79%    85%    96%    92%    100%   90%
               Mean assigned prob. acc.   92%    100%   100%   100%   100%   98%
               Mean assigned prob. SD     22%    16%    6%     11%    8%
               Avg. # of feats.           5      4.3    5.7    3.3    3      4.3

Two tests were performed to assess whether the optimised features gave higher performance than randomly selected features. These tests were done by (i) randomly selecting features for performance estimation (instead of using the optimised features) and (ii) permuting the class labels while still using the optimised features. The test results are shown in Figure 34 and Table 29. Both tests were performed on colour category 1 at 100x magnification, for each run using a different hold-out test set. The resulting performance can be compared with the hold-out test accuracies obtained with optimised features, given in Table 28.

As Table 29 reports, the random feature selection for the first run gives a p-value of 0.215 when compared with the optimised test result, which means that, at the 95 % confidence level, we cannot reject the null hypothesis that the performance using optimised features is no better than the average accuracy obtained from predictions using randomly selected features. For the second and third runs, the results using optimised features were significantly better than randomly selecting features from all original features, at the same confidence level.
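A minimal sketch of how the two baselines and their empirical p-values can be computed is given below; it assumes a feature matrix split into training and hold-out parts, class labels y, and scikit-learn's LDA classifier (the exact pipeline used in this work may differ):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def random_feature_baseline(X_tr, y_tr, X_te, y_te, n_feats, n_rep=1000):
    # Baseline 1: refit LDA on randomly chosen feature subsets of the same
    # size as the optimised set, and score each on the hold-out test set.
    accs = np.empty(n_rep)
    for i in range(n_rep):
        cols = rng.choice(X_tr.shape[1], size=n_feats, replace=False)
        clf = LinearDiscriminantAnalysis().fit(X_tr[:, cols], y_tr)
        accs[i] = clf.score(X_te[:, cols], y_te)
    return accs

def permuted_label_baseline(X_tr, y_tr, X_te, y_te, opt_cols, n_rep=1000):
    # Baseline 2: keep the optimised features but permute the training labels,
    # destroying any real feature-class relationship.
    accs = np.empty(n_rep)
    for i in range(n_rep):
        clf = LinearDiscriminantAnalysis().fit(X_tr[:, opt_cols], rng.permutation(y_tr))
        accs[i] = clf.score(X_te[:, opt_cols], y_te)
    return accs

# Empirical p-value: the fraction of baseline runs that reach or exceed the
# accuracy obtained with the optimised features, e.g.
# p = np.mean(baseline_accs >= observed_acc)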

Figure 34: Histogram of obtained accuracies for randomly selected features (blue) and permuted class labels (orange) on the CC1 (at 100x) dataset, repeated 1000 times (each time with a different random selection or permutation). The vertical line denotes the performance obtained using the optimised features.


Table 29: Comparison between test run results obtained using the optimised feature set (column 3), features selected at random (columns 4 and 5), and permuted class labels (columns 6 and 7), on the CC1 (at 100x) dataset. The selection of random features and assessment of performance were repeated 1000 times (each time with a different random selection); the same applies to the permutation test. The number of random features matches the number of optimised features in each test.

As one of the hold-out test results was not significantly better than the performance achieved with random feature selection, it was possible that most of the features in the dataset contained meaningful information. Therefore, an in-depth investigation was conducted by manipulating the dataset: the original features were randomly reduced to only five, and 95 features of pure noise were then added. Table 30 reports the classification results achieved on this new, noisy dataset. The accuracy for each of the three test sets using optimised features (selected from the now 100 features) decreased, but not as much as when randomly chosen features were used. Thus, the difference in performance between models trained on random and on optimised features was greater in this case than for the original dataset tested in Table 29. In Table 30, the performance for all three optimised feature sets was significantly better than the average performance obtained by randomly selecting features, at the 95 % confidence level.
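A sketch of how such a noisy variant of the dataset can be constructed is shown below; the Gaussian noise distribution is an assumption, as the section does not specify how the noise features were generated:

import numpy as np

rng = np.random.default_rng(0)

def make_noisy_dataset(X, n_keep=5, n_noise=95):
    # Randomly keep n_keep of the original feature columns and append
    # n_noise columns of pure noise, yielding n_keep + n_noise features.
    keep = rng.choice(X.shape[1], size=n_keep, replace=False)
    noise = rng.standard_normal((X.shape[0], n_noise))
    return np.hstack([X[:, keep], noise])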

Which test run   Number of features   Test acc.   Random features        Permuted test labels
                                                  Avg. acc.   p-value    Avg. acc.   p-value
first            6                    62%         56%         0.215      7.2%        0.0
second           6                    69%         54%         0.027      7.4%        0.0
third            6                    69%         45%         0.027      7.4%        0.0


Figure 35: Histogram of obtained accuracies for randomly selected features (blue) and permuted class labels (orange) on the modified (noisy) CC1 (at 100x) dataset, repeated 1000 times (each time with a different random selection or permutation). The horizontal axis denotes accuracy and the vertical axis the counts. The vertical line denotes the performance obtained using the optimised features.

Table 30: Comparison between test run results obtained using the optimised feature set (column 3) on the noisy data (5 random original features and 95 noise features) of the CC1 (at 100x) dataset, features selected at random (columns 4 and 5), and permuted class labels (columns 6 and 7). The selection of random features and assessment of performance were repeated 1000 times; the same applies to the permutation test. The number of random features matches the number of optimised features in each test.

Which test run   Number of features   Test acc.   Random features        Permuted test labels
                                                  Avg. acc.   p-value    Avg. acc.   p-value
first            2                    54%         10%         0.0        7.2%        0.0
second           2                    62%         8.0%        0.0        7.5%        0.0
third            2                    46%         8.6%        0.0        7.9%        0.0


4.3. Model building for hyperspectral images

The LDA classifier was used for model development on the datasets originating from the hyperspectral images. As the hyperspectral images were acquired at only one magnification, the number of models developed equals the number of colour categories in the datasets. In this section, results are shown for colour category 3. In addition, an overview of the overall results for all colour categories is given in section 4.3.2. The remaining results can be found in appendix II.

4.3.1. Colour category 3

Feature selection was performed three times on the dataset for colour category 3, with a different test set held out in each run. Table 31 shows the validation curves arising from the three feature selections; the selected features are shown to the right of their respective validation curves. In all three cases, only two features were selected. The validation accuracy remained above 90 % as the number of features was reduced from 30 down to two, before dropping to around 60 % for the last feature. The union of the selected features indicates that the selected wavelengths fall in two bands, as visualized in the three plots in Table 32. These plots show the spectra for all samples, the mean spectra for each sample holder, and the average spectra for each class.
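The validation curves in Table 31 plot accuracy against the number of retained features. A minimal sketch of how such a curve can be produced is given below, assuming greedy forward selection scored by cross-validated LDA accuracy (the selection procedure actually used in this work may differ):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score

def validation_curve(X, y, max_feats=30):
    # Cross-validated LDA accuracy as a function of the number of selected features.
    curve = {}
    for k in range(1, max_feats + 1):
        sfs = SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                                        n_features_to_select=k, direction="forward")
        cols = sfs.fit(X, y).get_support(indices=True)
        curve[k] = cross_val_score(LinearDiscriminantAnalysis(), X[:, cols], y, cv=5).mean()
    return curve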

The first, second, and third test results are shown in Table 33, Table 34, and Table 35. All samples were correctly classified. On average, as Table 36 reports, eight out of eleven classes were correctly classified with assigned probabilities above 90 %. The variation of the assigned probabilities is given in Table 37. The class Aus_Run was on average assigned to USA_Fal with a probability of 32 %, at an SD of only 3.5 %.