
7.2 Grey level entropy matrices

7.2.1 The GLEM-features

The best classification results of the GLEM-features are shown in table 7.6.

For both all 134 patients and the 102 patients, and with respect to both the expected CCReq and the expected CCR, the best GLEM-feature among our four types of adaptive texture features is the negative GLEM-feature (closely followed by the difference GLEM-feature). This is interesting in itself, as it indicates that the positive part of the weight arrays¹ does not provide the GLEM-feature with new information. On the contrary, it confuses the GLEM-feature because

¹ For each bootstrap there are three weight arrays in a GLEM-feature, because the three cell area groups used are treated separately.

114 CHAPTER 7. RESULTS AND DISCUSSION

Table 7.6: The classification results of the negative GLEM-feature when using the classification method which attained the best expected CCReq; NMSC/LDC.

               All 134 patients           The 102 patients
CCReq          63.3 % [51.1 %, 75.4 %]    71.0 % [57.5 %, 83.6 %]
CCR            69.2 % [59.0 %, 76.9 %]    78.3 % [69.2 %, 86.5 %]
Specificity    71.9 % [57.6 %, 83.3 %]    83.7 % [70.7 %, 92.7 %]
Sensitivity    54.7 % [25.0 %, 83.3 %]    58.3 % [27.3 %, 81.8 %]

Using 28 (left column) and 25 (right column) learning patterns in each prognosis class.

it degrades its performance. There are two main possible reasons for this. The first is that the region corresponding to the positive parts contains no information. The second is that the information in the positive parts is the same as in the negative part, but more prominent in the negative part. Because the lower limits of the PI of the CCReq of the positive GLEM-feature (for the best classification method) are 49.6 % and 53.5 % for all 134 patients and the 102 patients, respectively, we expect that the positive parts carry only a little prognostic information, and that this information is also present in the negative part.
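The four measures reported in table 7.6 can be illustrated with a minimal Python sketch. It assumes that bad prognosis is treated as the positive class, and that the CCReq is the correct classification rate under equal class priors, i.e. the mean of the specificity and the sensitivity. This reading is consistent with the values in table 7.6 (for example, (71.9 % + 54.7 %)/2 = 63.3 %), but it is an assumption here. The function name and the counts in the usage example are hypothetical.

```python
def classification_rates(tp, fn, tn, fp):
    """Return sensitivity, specificity, CCR and CCReq from confusion counts.

    tp, fn: bad prognosis patients classified correctly / incorrectly
            (bad prognosis is ASSUMED to be the positive class).
    tn, fp: good prognosis patients classified correctly / incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Overall correct classification rate; depends on the class proportions.
    ccr = (tp + tn) / (tp + fn + tn + fp)
    # ASSUMED definition: correct classification rate with equal class priors.
    ccr_eq = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": ccr,
        "ccr_eq": ccr_eq,
    }


# Hypothetical counts, for illustration only.
rates = classification_rates(tp=6, fn=4, tn=27, fp=3)
```

With unbalanced classes the two rates differ: the CCR is dominated by the larger class, whereas the CCReq weights both classes equally.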

In the discussion in section 3.2.4, we noted that it is essential to inspect and interpret the designed weight arrays in order to get a better understanding of what an adaptive texture feature measures. Since we are using the bootstrap method for evaluation, we have multiple learning datasets and thus multiple weight arrays to interpret. We could overcome this problem by inspecting some of the weight arrays and plotting a few representative ones, by plotting the average of all weight arrays, or by plotting the weight arrays designed using the entire dataset.

We will use the last option, but it must be noted that because such weight arrays are designed from more scenes, their estimates will be more reliable than those of the weight arrays used in the evaluation. On the other hand, these weight arrays will give a better understanding of where the discrimination value of the property arrays is high.
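How a weight array of this kind could be designed and split into its positive and negative parts may be sketched as follows. This is a minimal illustration and not the procedure used in this study. It assumes that each patient is represented by one property array, that each weight is the signed one-dimensional estimated Mahalanobis distance in that element (difference of class means divided by the pooled standard deviation), that the sign is chosen so that negative weights mark elements with larger values for bad prognosis, and that a feature value is the sum of the elementwise product of a property array and the selected part of the weight array. All function names are hypothetical.

```python
from math import sqrt


def signed_distance(good, bad):
    """Signed 1-D Mahalanobis distance estimate using the pooled variance.

    Returns 0.0 where the estimate is undefined (too few samples or zero
    pooled variance), e.g. in elements without occurrences.
    """
    n1, n2 = len(good), len(bad)
    if n1 < 2 or n2 < 2:
        return 0.0
    m1 = sum(good) / n1
    m2 = sum(bad) / n2
    ss1 = sum((x - m1) ** 2 for x in good)
    ss2 = sum((x - m2) ** 2 for x in bad)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    if pooled_var == 0.0:
        return 0.0
    # ASSUMED sign convention: negative when bad prognosis has the larger mean.
    return (m1 - m2) / sqrt(pooled_var)


def design_weight_array(arrays_good, arrays_bad):
    """One weight per element, from the property arrays of both classes."""
    rows, cols = len(arrays_good[0]), len(arrays_good[0][0])
    return [
        [
            signed_distance(
                [a[i][j] for a in arrays_good],
                [a[i][j] for a in arrays_bad],
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def split_parts(weights):
    """Return the positive and the negative part of a weight array."""
    positive = [[max(w, 0.0) for w in row] for row in weights]
    negative = [[min(w, 0.0) for w in row] for row in weights]
    return positive, negative


def feature_value(property_array, weights):
    """Sum of the elementwise product of a property array and the weights."""
    return sum(
        p * w
        for p_row, w_row in zip(property_array, weights)
        for p, w in zip(p_row, w_row)
    )
```

Designing the weights from the entire dataset simply means passing the property arrays of all patients to design_weight_array, whereas an evaluation would pass only the learning patterns of one bootstrap.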

Figure 7.3 shows the designed weight arrays of the three area groups for the difference GLEM-feature when using the 102 patients. The grey surroundings are the elements where the weight arrays are zero (typically because of no occurrences), the darker lower region is where the weight arrays are negative, and the brighter upper region is where they are positive. It is clear from these figures that the GLEM-features mainly measure the average grey level; a lower grey level increases the probability of bad prognosis. However, we also see that large grey level entropy indicates bad prognosis, even for high grey levels. Because the intensity changes in our cell images are gradual, large grey level entropy is correlated with large grey level variance. This observation is also verified by replacing the grey level entropy axis with the grey level variance in the same local window (9×9), resulting in the corresponding grey level variance matrix (GLVM) [73], which gives insignificantly different classification results with respect to both the CCReq and the CCR. Therefore, the GLEM-features can be seen as combined measurements of the average and variance in grey level.
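The construction of such a property array can be sketched in Python as follows. This is only an illustration under stated assumptions. The array is taken to be a normalised two-dimensional histogram over the grey level of each pixel and a statistic of its local window (entropy for a GLEM-like array, variance for a GLVM-like array). The quantisation, the omission of border pixels and of any cell mask, and all names are hypothetical and not taken from the definition used in this study. The default half=4 gives the 9×9 window.

```python
from collections import Counter
from math import log2


def window_values(image, r, c, half):
    """Grey levels in the (2*half+1) x (2*half+1) window centred at (r, c)."""
    return [
        image[i][j]
        for i in range(r - half, r + half + 1)
        for j in range(c - half, c + half + 1)
    ]


def local_entropy(values):
    """Shannon entropy (in bits) of the grey level distribution in a window."""
    n = len(values)
    return -sum((k / n) * log2(k / n) for k in Counter(values).values())


def local_variance(values):
    """Variance of the grey levels in a window."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n


def property_array(image, local_stat, stat_max, n_grey, n_stat, half=4):
    """Normalised 2-D histogram over (grey level, quantised local statistic).

    image: list of rows of integer grey levels in [0, n_grey).
    stat_max: ASSUMED upper bound of the statistic, used for quantisation.
    """
    counts = [[0] * n_stat for _ in range(n_grey)]
    total = 0
    for r in range(half, len(image) - half):
        for c in range(half, len(image[0]) - half):
            s = local_stat(window_values(image, r, c, half))
            col = min(int(s / stat_max * n_stat), n_stat - 1)
            counts[image[r][c]][col] += 1
            total += 1
    if total == 0:
        raise ValueError("image is smaller than the local window")
    return [[k / total for k in row] for row in counts]
```

Passing local_entropy with stat_max=log2(81) gives a GLEM-like array for the 9×9 window, and passing local_variance gives the GLVM-like counterpart, so the two differ only in the second axis.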

The connection between the negative GLEM-feature and the average and


Figure 7.3: The designed weight arrays of the difference GLEM-feature when using the 102 patients. The arrays correspond to the cell area groups [2000, 2999], [3000, 3999] and [4000, 4999] from left to right. Each image is linearly scaled to fill the entire grey level range; the true ranges are, from left to right, [−1.0, 0.85], [−1.1, 1.1] and [−1.4, 1.2].

variance in grey level is visualised by the scatter plots in figure 7.4 when using the 102 patients. For these scatter plots and all following scatter plots containing adaptive texture feature(s), the values of the adaptive texture feature(s) are computed using weight array(s) designed using the entire dataset that is visualised (the 102 patients in figure 7.4). This makes the visualised separation of the adaptive texture feature optimistically biased, but typically only slightly, because the weight arrays will typically be well filled with occurrences due to our great concern for the overfitting problem. We emphasise that such computation of the values of adaptive texture features is only done to make plots and never during evaluation.

The weight arrays corresponding to figure 7.3 when using all 134 patients show exactly the same pattern as this figure, but each element has about 20 % less estimated discrimination value and the negative part is nearly uniform (instead of peaked). These changes are as expected because patients with tetraploid or polyploid histograms are typically good prognosis in our dataset, but typically have the same local grey level characteristics as the patients with

Figure 7.4: Scatter plot of the negative GLEM-feature against: left) the GreyLevelAverage-feature, right) the GreyLevelVariance-feature when using the 102 patients. The blue plus sign represents good prognosis and the red asterisk symbol represents bad prognosis.

aneuploid histograms. This is because all three ploidy histogram types indicate that a significant proportion of the cell images have large IODs and thus also low grey level (see section 3.1). However, as the pattern of the weight arrays is similar, the discussion of what the GLEM-features measure is valid also for all 134 patients. Also, because the grey level average and variance are obviously affected similarly by the inclusion of the patients with tetraploid and polyploid histograms, the connection between the GLEM-feature and the average and variance in grey level is also still valid.

Assumptions of the estimated Mahalanobis distance between the classes

Because the weight arrays are designed using the estimated Mahalanobis distance between the classes, it is interesting to investigate whether the underlying assumptions are met. To test these assumptions, we will assume that the samples within each element in the collection of the property arrays of all 134 patients can be seen as independent. We will then test the normality assumption of each prognosis class using the Lilliefors goodness-of-fit test [32] at significance level 0.05. This is a generalisation of the Kolmogorov-Smirnov test for the case of normality when the expectation and variance are unknown [32, p.399]. The assumption of equal variances will be tested using the standard F-test [11, pp.515–519] at significance level 0.05 (the null hypothesis will of course be that the two variances are equal). Note that this test is strongly dependent on the normality assumption [11, p.519]. In particular, the standard F-test is more dependent on the normality assumption than the pooled two-sample t-test, of which the estimated Mahalanobis distance between two classes can be seen as the T-statistic if the null hypothesis is taken to be equal expectations [11, p.519].

However, because ideally none of the tests would be rejected (as the appropriateness of using the estimated Mahalanobis distance between the classes can only be guaranteed in this case), we expect that the standard F-test performs acceptably, as the distributions are at least approximately normal when none of the normality tests are rejected.
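The relation between the estimated Mahalanobis distance, the pooled two-sample t-test and the F-test can be written out for a single element. The notation is introduced here for illustration and is an assumption: the sample mean, sample variance and number of samples of prognosis class k in that element are denoted by \bar{x}_k, s_k^2 and n_k, and \hat{d} is the signed one-dimensional estimated Mahalanobis distance.

```latex
\begin{align*}
s_p^2 &= \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
&
\hat{d} &= \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\\[4pt]
T &= \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
   = \hat{d}\,\sqrt{\frac{n_1 n_2}{n_1 + n_2}},
&
F &= \frac{s_1^2}{s_2^2}.
\end{align*}
```

For fixed class sizes, T is thus the estimated distance multiplied by a constant. Under normality and the respective null hypotheses, T follows a t-distribution with n_1 + n_2 − 2 degrees of freedom and F follows an F-distribution with (n_1 − 1, n_2 − 1) degrees of freedom.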

Figure 7.5 shows the results of testing the assumptions. We see from the images in the left and middle columns that the normality assumptions are slightly questionable. In comparison with figure 7.3, we note that the assumptions are not rejected in the most discriminative elements. This is comforting, but only a natural consequence of the central limit theorem, as these are also the elements with the most occurrences². The common variance assumption is slightly more frequently satisfied, and this assumption also seems most appropriate in the more interesting elements. In total, we conclude that the underlying assumptions of the estimated Mahalanobis distance between the classes seem to be generally acceptable when using the GLEM-features.
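The two tests applied to each element can be sketched in Python using only the standard library. This is a rough illustration under stated assumptions and not the implementation used in this study. The Lilliefors p-value is approximated by Monte Carlo simulation instead of tabulated critical values, the F-test is two-sided, and the F distribution function is computed from the regularised incomplete beta function with a standard continued fraction. All function names, the number of simulations and the seed are hypothetical.

```python
import random
from math import exp, lgamma, log
from statistics import NormalDist, mean, stdev, variance


def ks_normal_stat(sample):
    """Kolmogorov-Smirnov distance to a normal with estimated mean and sd."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))  # needs a non-constant sample
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        f = fitted.cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def lilliefors_pvalue(sample, n_sim=1000, seed=0):
    """Monte Carlo p-value: share of simulated normal samples whose
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = ks_normal_stat(sample)
    n = len(sample)
    exceed = sum(
        ks_normal_stat([rng.gauss(0.0, 1.0) for _ in range(n)]) >= observed
        for _ in range(n_sim)
    )
    return (exceed + 1) / (n_sim + 1)


def _beta_cf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for aa in (
            m * (b - m) * x / ((a - 1 + 2 * m) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 1 + 2 * m)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_test_pvalue(sample_1, sample_2):
    """Two-sided p-value of the F-test for equal variances.

    Both samples need a non-zero sample variance.
    """
    f = variance(sample_1) / variance(sample_2)
    d1, d2 = len(sample_1) - 1, len(sample_2) - 1
    cdf = reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))
    return min(1.0, 2.0 * min(cdf, 1.0 - cdf))


def assumptions_not_rejected(good, bad, alpha=0.05):
    """Per-element result: (normality good, normality bad, equal variances)."""
    return (
        lilliefors_pvalue(good) > alpha,
        lilliefors_pvalue(bad) > alpha,
        f_test_pvalue(good, bad) > alpha,
    )
```

Applying assumptions_not_rejected to the samples of both prognosis classes in every element would give three boolean maps, one per tested assumption, which could then be compared with the weight arrays element by element.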