
4.4 Potential Biomarkers

The subset of features obtained by removing intra- and inter-correlated features from the standard feature matrix was ranked, using the MultiSURF and XGB algorithms, according to relevance for classifying the HPV unrelated cohort. The procedure is outlined in Section 3.8.3.
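The removal of inter-correlated features can be sketched as a greedy filter over the feature correlation matrix. The 0.95 threshold and the Spearman metric below are illustrative assumptions, not the exact settings of Section 3.8.3:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Greedily keep a feature only if its absolute Spearman correlation
    with every already-kept feature stays below `threshold`."""
    corr = df.corr(method="spearman").abs()
    keep: list[str] = []
    for col in df.columns:
        if all(corr.loc[col, kept] < threshold for kept in keep):
            keep.append(col)
    return df[keep]

# Toy feature matrix: "b" is a monotone transform of "a", "c" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=50)
df = pd.DataFrame({"a": a, "b": 2 * a + 1, "c": rng.normal(size=50)})
reduced = drop_correlated(df)
```

Because "b" is a monotone function of "a", their Spearman correlation is exactly 1 and "b" is removed, while the independent "c" survives.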

Preliminary estimates of feature importance were obtained with MultiSURF for the 26 features most prognostic of disease-free survival. This number of features, given in Table 4.4, was the average configuration selected for MultiSURF during classification of the HPV unrelated cohort. Each feature was assigned a weight by MultiSURF, as


[Figure 4.27: pie chart with segments 42%, 23%, 15%, 8%, 8% and 4% over the feature categories Shape, CT First-Order, CT Texture, PET First-Order, PET Texture and Clinical.]

Figure 4.27: The categorical distribution of the 26 features determined by the MultiSURF algorithm as the most prognostic of disease-free survival in the HPV unrelated cohort.

described in Section 2.4.2, to represent the relevance of the feature to disease-free survival. The distribution of the selected features, grouped according to category, is illustrated in Figure 4.27.
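The core of the MultiSURF weighting can be illustrated with a compact NumPy implementation. This is a simplified sketch for continuous features, not the exact procedure of Section 2.4.2 (the full algorithm is available, for example, in the skrebate package):

```python
import numpy as np

def multisurf_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Simplified MultiSURF weighting for continuous features.

    For each instance, neighbours are all other instances closer than that
    instance's mean pairwise distance minus half its standard deviation.
    Hits (same class) decrease a feature's weight by the range-scaled
    feature difference; misses (different class) increase it.
    """
    n, p = X.shape
    value_range = X.max(axis=0) - X.min(axis=0)
    value_range[value_range == 0] = 1.0
    Xs = X / value_range                                     # range-scaled
    D = np.abs(Xs[:, None, :] - Xs[None, :, :]).sum(axis=2)  # Manhattan
    W = np.zeros(p)
    for i in range(n):
        others = np.delete(D[i], i)
        T = others.mean() - others.std() / 2.0  # instance-specific threshold
        for j in range(n):
            if j == i or D[i, j] >= T:
                continue
            diff = np.abs(Xs[i] - Xs[j])
            W += diff if y[j] != y[i] else -diff
    return W / n

# Toy check: feature 0 separates the two classes, feature 1 is pure noise
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = np.column_stack([y + 0.1 * rng.normal(size=60), rng.normal(size=60)])
w = multisurf_weights(X, y)
```

On this toy data the discriminative feature receives a higher weight than the noise feature, mirroring how the 26 features above were ranked.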

Figure 4.27 shows that the majority of the 26 features selected by MultiSURF originated from the CT texture category, followed by CT first-order and clinical factors.

The PET first-order features constitute the smallest category in Figure 4.27.

The weights of the 26 features selected by MultiSURF are shown in Figure 4.28.

Larger weights signify higher relevance to disease-free survival. Abbreviations for the texture feature categories are given in Table 4.2.

[Figure 4.28: horizontal bar chart of MultiSURF feature weights (0.00–0.12) for the 26 selected features, coloured by feature category (Clinical, First Order, GLCM, GLDM, GLRLM, GLSZM, NGTDM, Shape).]

Figure 4.28: The 26 features (vertical axis) with the highest MultiSURF weights (horizontal axis), which quantify feature relevance for classifying disease-free survival in the HPV unrelated cohort.

Figure 4.28 shows that Major Axis Length, median CT intensity, and the CT texture features Dependence Variance (Dependence Var) and Large Dependence High Gray Level Emphasis (LDHGLE) were the four highest ranked features according to MultiSURF weights. Note that the clinical factors T Stage, Tumour Stage (Stage) and ECOG were also selected.

The SCCs between the ROI size and the features shown in Figure 4.28 are given in Figure 4.29.


[Figure 4.29: horizontal bar chart of the correlation with tumour volume (SCC, 0.00–1.00) for each of the 26 features, coloured by feature category (Clinical, First Order, Shape, Texture).]

Figure 4.29: The SCC (horizontal axis) between the ROI size and the 26 features (vertical axis) selected by MultiSURF as the most prognostic of disease-free survival.

Note that PET Energy has the highest SCC in Figure 4.29, making it the feature most strongly correlated with the ROI size, followed by Major Axis Length. Both 64 bins CT LDHGLE and 32 bins CT Dependence Variance have SCCs below 0.6, and are less correlated with the ROI size than 128 bins CT Dependence Variance.
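A volume dependence of this kind can be checked directly with the Spearman rank correlation. The two feature constructions below are hypothetical stand-ins (a volume-driven feature loosely mimicking PET Energy, and an uncorrelated one), not the thesis data:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
volume = rng.lognormal(mean=3.0, sigma=0.5, size=100)  # hypothetical ROI volumes

# Hypothetical features: one monotone in volume (plus noise), one independent
features = {
    "volume_driven": volume ** (2 / 3) + rng.normal(scale=1.0, size=100),
    "independent": rng.normal(size=100),
}
# Absolute Spearman correlation of each feature with the ROI volume
scc = {name: abs(spearmanr(volume, f)[0]) for name, f in features.items()}
```

Because Spearman correlation is rank-based, any monotone transform of the volume yields an SCC close to 1, which is why size-driven radiomic features are flagged this way.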

The relevance of features to disease-free survival using XGB was estimated by calculating SHAP values, as described in Section 3.8.3, based on the 26 features selected by MultiSURF. The SHAP values represent the average contribution of features to a model prediction. Four of the 26 features ranked by MultiSURF were associated with a non-zero mean absolute SHAP value, as shown in Figure 4.30.
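In the thesis the SHAP values were obtained from the trained XGB model; the mean-|SHAP| ranking itself can be illustrated with a linear model, for which the SHAP value of feature j has the closed form w_j(x_j − E[x_j]) when features are independent. The coefficients below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
w = np.array([3.0, 1.0, 0.5, 0.0])            # hypothetical learned coefficients

# For a linear model f(x) = w @ x + b with independent features, the exact
# SHAP value of feature j for a sample x is w_j * (x_j - E[x_j]).
shap_values = w * (X - X.mean(axis=0))        # shape (200, 4)
mean_abs_shap = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(mean_abs_shap)[::-1]     # most relevant feature first
```

A feature the model ignores (here the zero coefficient) gets a mean absolute SHAP value of exactly zero, which matches why only 4 of the 26 features appear in Figure 4.30.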

[Figure 4.30: horizontal bar chart of the mean absolute SHAP values (mean |SHAP|, 0.00–1.06) for Major Axis Length, CT Dependence Var 32bins, CT LDHGLE 64bins and CT Median, coloured by feature category (First Order, GLDM, Shape).]

Figure 4.30: The relevance of features (vertical axis) to disease-free survival in terms of Shapley Additive Explanations (SHAP) values (horizontal axis). A higher mean absolute SHAP value indicates greater relevance.

Observe that the four features in Figure 4.30 correspond to the four highest ranked features in Figure 4.28. Moreover, the mean absolute SHAP value for Major Axis Length is more than three times as high as that of the other three features.

The distributions of the features in Figure 4.30 are available in Appendix C, Figure C.1. Moreover, scatter plots illustrating the relationships between pairs of features are included in Figure C.2.

Chapter 5

Discussion

5.1 The Model Comparison Protocol

Motivated by the No Free Lunch theorems [23], an emphasised objective in this thesis was to reduce the bias in the estimated classification error of the compared models. Studies on schemes to assess model performance have found that nested stratified K-fold cross-validation (CV) gives the least biased estimates [131], [92], [95], [94].

5.1.1 Nested Stratified K-Fold Cross-Validation

Despite the small bias, a drawback of the nested CV approach is the computational complexity of the nested iterations [132]. Given C hyperparameter configurations to evaluate, the standard CV procedure trains C · K models. A nested CV scheme, however, trains C · K models as part of model selection within each outer fold, as well as one model for each of the K validation folds in the outer loop. The running time of the nested CV protocol therefore depends quadratically on the choice of K, which should be selected according to the study objective [133].
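The model counts can be made explicit. A minimal sketch, with a single configuration (C = 1) reproducing the 5² = 25 and 10² = 100 selection models of 5- and 10-fold nested CV:

```python
def cv_models(C: int, K: int) -> int:
    """Models trained by plain K-fold CV over C configurations."""
    return C * K

def nested_cv_models(C: int, K: int) -> tuple[int, int]:
    """Models trained by nested K-fold CV: the C * K inner selection loop
    is repeated within each of the K outer folds, plus one refit per
    outer validation fold."""
    return C * K * K, K

# One configuration under 5- and 10-fold nested CV
selection_5, refits_5 = nested_cv_models(C=1, K=5)
selection_10, refits_10 = nested_cv_models(C=1, K=10)
```

Doubling K thus quadruples the selection cost, which is the quadratic dependence noted above.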

Five or 10 folds are typical choices of K [93]. A 5-fold nested CV scheme produces 25 models, which requires less computational time than the 100 models trained during 10-fold nested CV. On the other hand, incrementing K has been shown to reduce the bias in model performance estimates, since more observations are then available for model training.

In this thesis, classification of the complete cohort of 198 patients (Experiments 1-5, Section 3.8.3) was performed with five folds to reduce the computational complexity of the experiments. Moreover, the number of folds was increased to 10 in the HPV subgroup analyses (Section 3.8.3) to account for fewer observations. The smallest number of patients included in a classification experiment was 67. With this number of observations, 5- and 10-fold CV would produce training sets of approximately 54 and 60 patients, respectively. In a nested protocol, these sets are divided once more into training sets of 44 and 54 patients. Note that even though 10-fold CV produces a larger training set than 5-fold CV, the size of the validation set is proportionately diminished. A reduced validation set may potentially increase the variability in model performance estimates. Moreover, although bias reduction may generally be referred to as the main objective in model selection, variance reduction has been shown to be essential to reduce model over-fitting [95]. Furthermore, reducing the size of the validation set could hinder sample stratification [92].

5.1.2 Stratified Sampling

A property of stratification, as used in this thesis, is the reflection of the original distribution of clinical outcomes in each CV fold [92]. A further property is that stratification provides coverage of all subgroups in each fold. In their study on the impact of class distribution on classification trees, Provost and Weiss (2011) found that stratification of outcomes contributed to reducing model over-fitting [134]. This improved ability of the model to generalise was caused by preventing over- or under-representation of categories [134]. Nevertheless, stratification has, to the knowledge of the author, not been previously used in radiomics.
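The outcome-preserving property can be verified directly with scikit-learn's StratifiedKFold; the 30% event rate below is an arbitrary illustrative choice, not the cohort's actual outcome distribution:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced outcome: 30 events among 100 patients
y = np.array([1] * 30 + [0] * 70)
X = np.zeros((100, 1))                 # features play no role in the split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_event_rates = [y[test].mean() for _, test in skf.split(X, y)]
# Every validation fold reproduces the 30% event rate of the full cohort
```

An unstratified KFold on the same data could, by chance, place far more or fewer events in a fold, which is exactly the over- or under-representation stratification prevents.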

5.1.3 Hyperparameter Optimisation

The configuration of an algorithm affects the ability of the model to recognise patterns [82]. This makes hyperparameter optimisation a central part of model selection. However, as pointed out by Parmar et al. (2015) [17], manual optimisation may