Master’s Thesis 2021, 30 ECTS
Faculty of Science and Technology
Supervisor: Professor Cecilia Marie Futsæther

Radiomics Using MR Brain Scans and RENT for Identifying Patients Receiving ADHD Treatment

Nasibeh Mohammadi

Data Science

Acknowledgements

First and foremost, I want to thank my main supervisor Prof. Cecilia Marie Futsæther, for her invaluable guidance through the project.

My special thanks go to my co-supervisor Assoc. Prof. Oliver Tomic for his professional support and encouragement.

I would like to thank Postdoc Stefan Schrunner for his practical suggestions, and PhD candidate Anna Jenul and MSc Ahmed Albuni for answering my questions.

I am grateful to the Computational Radiology & Artificial Intelligence (CRAI) Research Group and Prof. Liesbeth Reneman’s group at Clinical Research Unit of the Academic Medical Center, University of Amsterdam, Amsterdam, Netherlands, and MSc. Inger Annett Grünbeck for providing the MR images and dataset for analysis.

Last but not least, I would like to express my deep gratitude to my family for their sincere love and support.


Abstract

The core purpose of this thesis was to investigate whether methylphenidate-based (MPH) treatment of male paediatric patients with attention-deficit/hyperactivity disorder (ADHD) led to changes in five subcortical brain structures (hippocampus, caudate, pallidum, putamen, and thalamus). The methylphenidate-treated group was compared to the placebo group. This was explored using magnetic resonance (MR) images obtained from the effects of Psychotropic drugs On the Developing brain (ePOD) study.

A radiomics approach was used to extract descriptors from T1-weighted MR images. Radiomics features, including Local Binary Pattern (LBP), shape and several texture features, were derived from the right and left sides of the chosen subcortical structures. In this context, a new feature extraction program for generating 3D LBP biomarkers was developed, and the Repeated Elastic Net Technique (RENT), a new feature selection method appropriate for short-wide datasets, was utilised.

Thereafter, four different classification experiments were conducted to predict the medication class (medicated vs placebo) using a nested cross-validation algorithm and nine supervised classifiers. The area under the receiver operating curve (AUC) metric was used to evaluate the performance of the classification tasks.

The performance scores suggested that there was a detectable change in the selected brain structures following MPH medication. The classification models showed AUC scores mostly above 85%, especially in experiments where LBP features were used as stand-alone features or in addition to standard radiomics features. The LBP features thus appear to be the most informative descriptors in this study.

The classification results were approximately the same in experiments with and without correlated features. Additionally, the higher performance obtained on the same dataset than in a previous study exploiting several feature selectors indicates the capability of our feature selection method (RENT) to select robust features.

Contents

Acknowledgements
Abstract
List of Abbreviations
1 Introduction and Motivation
2 Theory
2.1 Attention-Deficit/Hyperactivity Disorder
2.2 Radiomics
2.2.1 Step 1: Image Acquisition
2.2.2 Step 2: Segmentation
2.2.3 Step 3: Feature Extraction
2.2.4 Step 4: Feature Selection
2.2.5 Step 5: Modelling and Evaluation
3 Materials and Methods
3.1 Image Acquisition and Segmentation
3.1.1 The ePOD-MPH Study
3.1.2 Image Segmentation
3.2 Feature Extraction
3.2.1 Shape and Texture Features Extraction
3.2.2 3D LBP Features Extraction
3.2.3 The Feature Matrices
3.2.4 Datasets
3.3 Experiments
3.3.1 Correlation Analysis
3.3.2 Feature Selection Using RENT
3.3.3 Modelling and Evaluation
4 Results
4.1 The Hippocampus
4.1.1 Feature Selection by RENT
4.1.2 Classification Modelling and Evaluation
4.1.3 Heatmap Comparison of the Experiments
4.2 The Caudate
4.2.1 Selected Features using RENT
4.2.2 Heatmap Comparison of the Experiments
4.3 The Putamen
4.3.1 Selected Features using RENT
4.3.2 Heatmap Comparison of the Experiments
4.4 The Thalamus
4.4.1 Selected Features using RENT
4.4.2 Heatmap Comparison of the Experiments
4.5 The Pallidum
4.5.1 Selected Features using RENT
4.5.2 Heatmap Comparison of the Experiments
5 Discussion and Further work
5.1 Selected Features
5.2 Classification Performance
5.3 Further Work
6 Conclusion
Bibliography
Appendix
A. Code of 3D LBP feature extraction
B. Rotation Invariant Table
C. Modifications of Biorad feature extraction module
D. Code for Removing Correlated Features
E. RENT Configuration
F. RENT Validation Study

List of Abbreviations

AdaBoost   Adaptive Boosting
ADHD       Attention-Deficit/Hyperactivity Disorder
AUC        Area Under the Receiver Operating Curve
CBF        Cerebral Blood Flow
DT         Decision Tree Classifier
ePOD       The effects of Psychotropic drugs On the Developing brain
ET         Extremely Randomised Tree Classifier
FN         False Negative
FP         False Positive
FPR        False Positive Rate
GLC        Grey Level Co-occurrence
GLCM       Grey Level Co-occurrence Matrix
GLD        Grey Level Dependence
GLDM       Grey Level Dependence Matrix
GLRL       Grey Level Run Length
GLRLM      Grey Level Run Length Matrix
GLSZ       Grey Level Size Zone
GLSZM      Grey Level Size Zone Matrix
ID         Identification Number
KNN        K Nearest Neighbours
LBP        Local Binary Pattern
LGBM       Light Gradient Boosting Machine
LR         Logistic Regression
ML         Machine Learning
MLP        Multi-Layer Perceptron
MPH        Methylphenidate
MR         Magnetic Resonance
MRI        Magnetic Resonance Imaging
NGTD       Neighbouring Grey Tone Difference
NGTDM      Neighbouring Grey Tone Difference Matrix
RENT       Repeated Elastic Net Technique
RF         Random Forest
Ridge      Ridge Regression
ROC        Receiver Operating Curve
ROI        Region of Interest
SVC        Support Vector Machine Classifier
TN         True Negative
TP         True Positive
TPR        True Positive Rate


1 Introduction and Motivation

Attention-deficit/hyperactivity disorder (ADHD) is a common psychiatric disorder among adolescents [1]–[6]. The most common medication for ADHD is methylphenidate-based treatment (MPH). However, its precise influence on the brain in the long-term is under debate [7], [8]. Since the maturation of the brain structure takes place during childhood, the usage of the drug during this sensitive phase of life can have persistent effects on brain development [9], [10].

The current research is based on the effects of Psychotropic drugs On the Developing brain (ePOD) study [11]. Currently, there are few papers linked to the ePOD study [11]. In this context, the results of Bouziane et al. demonstrate the influence of methylphenidate on the white matter of the brain [4]. Walhovd et al. (2020) assessed the effect of MPH on cortical thickness in ADHD patients [7]. They found that the usage of methylphenidate affected the development of grey matter in the right medial cortex of children. Schrantee et al. (2016) [12] presented an age-dependent study of the cerebral blood flow (CBF) response to methylphenidate medication. They observed that subcortical thalamic CBF was reduced in children treated with MPH. Another study, by Tamminga et al. (2020) [8], explored the effect of MPH on patients’ performance after treatment. They concluded that the improvement in working memory and response speed in ADHD patients was limited to the treatment period and did not persist afterwards.

Furthermore, Grünbeck (2020) examined the changes in the grey matter of the human brain caused by MPH treatment using radiomics [13]. Grünbeck performed several classification tasks and used various feature selection methods to examine the impact of MPH medication on five subcortical structures of the brain: the hippocampus, caudate, thalamus, putamen, and pallidum. Her study found that some image features, particularly from the pallidum and putamen, appeared to be associated with MPH treatment, but these findings required further confirmation.

Radiomics is a developing field of study that aims to mine quantitative biomarkers from medical images to support the clinical decision-making process [14], [15]. Radiomics exploits advanced technologies in artificial intelligence to improve the accuracy of diagnosis and treatment based on the extracted radiomics features [16]. Radiomics features refer to the different types of features that can be derived from an image. Generally, they are categorised into four main groups (shape-based, intensity-based, texture-based, and higher-order features) [17]. Applying radiomics requires programming and machine learning (ML) knowledge. In this context, a standard and user-friendly tool that lets researchers, scientists, radiologists, and oncologists extract reproducible and comparable biomarkers from images is in demand. The Biorad framework [18], [19], built on the pyradiomics package [20], tried to address this issue. The pyradiomics package covers common methods for extracting image texture features, such as the Grey Level Co-occurrence Matrix and the Grey Level Run Length Matrix. However, the powerful feature extraction method Local Binary Patterns (LBP) [21] is not included.

In radiomics studies, issues regarding medical image acquisition and the privacy policies surrounding patient information complicate the sample-gathering process [17]. In addition, many biomarkers are extracted from the medical images in the feature extraction phase. Radiomics studies therefore suffer from high-dimensional data and few samples [22]. The Repeated Elastic Net Technique (RENT) [23] is a new, user-friendly feature selection tool that works by training several ensemble sub-models on unique subsets of the dataset. The authors claim that it is appropriate for short-wide datasets and that it provides high performance relative to other feature selection methods such as the Laplacian score, Relief, mRMR and the Fisher score [23].

In this thesis, our primary objectives were to construct a robust classification model and to extend the radiomics dataset of Grünbeck’s study of MPH effects on adolescent brains [13]. We utilised the radiomics approach to analyse the changes caused by ADHD medication in five subcortical structures of the brain, comparing treated patients to the control (placebo) group based on the images of the ePOD study [11]. We developed a 3D LBP extraction module, which is not included in the pyradiomics package [20], and added it to the Biorad framework [19]. Thereafter, we examined classification results based on LBP features and compared them to the results obtained from other texture features and shape features. We tested RENT for selecting features and examined the robustness of the features selected by RENT through modelling.

All in all, the goals of this thesis were: 1) examine whether MPH medication alters the brain structure of ADHD-diagnosed ten- to twelve-year-old male patients; 2) explore the entire radiomics pipeline for an ADHD study; 3) develop a feature extraction tool for 3D LBP features; 4) explore the efficiency of RENT as a feature selection tool; 5) employ methods for examining the short-wide datasets; 6) tackle the lack of unseen data in the radiomics study.


This thesis is structured according to the IMRaD (Introduction, Method, Results and Discussion) format [24]. Chapter 1 contains a brief introduction to our work and motivation. Chapter 2 contains the theoretical background of the thesis. The materials and methods used in this thesis are outlined in chapter 3. The thesis’s findings and experimental results are described in chapter 4. A discussion of the results and observations, together with suggestions for future work, is covered in chapter 5. Chapter 6 gives the conclusions with respect to the goals of this thesis. Results not covered in chapter 4 are presented in the appendices.

(11)

4

2 Theory

2.1 Attention-Deficit/Hyperactivity Disorder

Attention-deficit/hyperactivity disorder (ADHD) is among the most frequently diagnosed childhood neurodevelopmental disorders, with an overall prevalence of 5%–8% in children worldwide [1], [4]–[6]. Boys are twice as likely as girls to be affected by ADHD [6]. ADHD manifests itself in symptoms such as hyperactivity, severe impulsiveness, distractibility, and inattention, and therefore adversely affects social, educational, and emotional activities [1]–[3], [6], [25]. These symptoms may continue into adulthood and result in long-lasting impairment [5], [25].

Methylphenidate (MPH) treatment is a viable and safe medication prescribed broadly for ADHD patients; however, its exact neurochemical behaviour is under discussion, and knowledge about its long-term side effects on the children’s brains is limited [3], [9], [25].

Adolescence and childhood are exceptionally sensitive and susceptible periods of brain development, during which several parts of the brain mature. Hence, medications given during these delicate beginning stages of life may influence neurodevelopmental trajectories and can have more significant impacts later in life [9], [10].

Studies based on Magnetic Resonance Imaging (MRI) have shown that stimulant medication influences brain development, to such an extent that untreated children with ADHD show faster cortical thinning and smaller white matter volumes than children with ADHD on stimulant medication [10]. Studies of medical images can play a helpful role in diagnosing and treating diseases and in examining long-term changes in brain structure due to medication [1].


Recently, MRI has been widely used to study patients’ brain structures [25]. MRI empowers researchers to examine the structure of the brain noninvasively. Thereby, it is possible to study different brain tissues (white matter and grey matter) and various cortical and subcortical brain structures [5].

2.2 Radiomics

Recently, the advancement of digital medical records in clinics and hospitals and the availability of medical images have facilitated the introduction of a new approach to extracting data from medical images, called "Radiomics" [17], [26]. Radiomics is based on the concept that radiological images can reveal information that is not visible to the human eye. Radiomics investigates the quantitative features of digital images and converts the images into mineable data that can be incorporated into clinical decision support [17], [27]–[30].

The radiomics pipeline (Figure 1) comprises several steps, which will be discussed in this chapter. The steps are 1) image acquisition, 2) segmentation, 3) feature extraction, 4) feature selection, and 5) modelling and evaluation.

Figure 1. The radiomics pipeline comprises the sequential activities of image acquisition, segmentation, feature extraction, feature selection, and modelling and evaluation.

2.2.1 Step 1: Image Acquisition

Radiomics is the process of quantifying the characteristics of medical images [31]. It can be applied to different modalities of digital imaging [32]. There is no standardised image acquisition technology to use in a radiomics study [31].

The most common medical imaging protocols are Computed Tomography (CT) Scans, Positron Emission Tomography (PET) Scans, and Magnetic Resonance Imaging (MRI).

PET Scan: This technique uses radioactive substances (tracers) to image the response of organs and tissues to the drug. Utilising PET scans in combination with CT or MRI scans can lead to better disease diagnosis [33].

CT Scan: Multiple X-ray images are captured from various angles around the body and combined by a computer algorithm to constitute cross-sectional images of the region inside the body [34].


MRI: This screening technology uses a magnetic field and computer-generated radio waves to produce high-resolution images of parts of the body [35]. MRI has become an advanced technology that provides non-invasive analysis of pathology [36].

The imaging protocols between sites and studies are usually not standardised. Also, the devices and scanners used for image acquisition may introduce noise that affects the radiomics pipeline [28]. Hence, in radiomics studies, the raw images are revised by pre-processing procedures such as noise reduction, artefact correction, normalisation and so forth [36].

2.2.2 Step 2: Segmentation

Lesion segmentation is a critical step in a radiomics study as the image delineation affects the quality of features extracted from the corresponding region of interest (ROI) [14], [17], [26], [36].

Segmentation can be done manually, semi-automatically or automatically [14], [31], [32], [36].

Manual method: an expert or a group of experts annotates the boundaries of the lesion region [36], [37]. Manual segmentation is vital in studies where a high degree of lesion-border accuracy is necessary [31].

Semi-automated algorithms: refer to the usage of standard segmentation techniques such as thresholding or region-growing. These methods usually require manual correction [32].

Automated solutions: nowadays, several open-source or commercial software tools for lesion segmentation are available [14], [32].

2.2.3 Step 3: Feature Extraction

Feature extraction is at the heart of the radiomics pipeline. In this step, the images are converted to mineable data. The different types of biomarkers that can be extracted are categorised into four main groups as follows:

Shape features: are the most direct attributes related to the geometry of the ROI, such as volume, sphericity or compacity [27], [38].

First-order features: refer to the statistical distribution of the voxel intensity values within the segmented region, and include measures such as the mean, median, uniformity, randomness, skewness, and kurtosis [5], [28], [38].

Second-order features: generally described as texture features. These are statistical descriptors related to spatial relationships between voxels. There are many texture features; examples are the Local Binary Pattern (LBP), the Grey Level Co-Occurrence Matrix (GLCM), the Grey Level Run Length Matrix (GLRLM), the Neighbouring Grey Tone Difference Matrix (NGTDM) and so forth [26], [38], [39].

Higher-order features: are determined by applying filters and advanced methods to the images to extract patterns difficult to distinguish by eye, such as Laplacian of Gaussian filter, Fourier transform, and Gabor transform [27], [38], [39].

In the rest of this section, the different texture features used in our study will be described.

Three-Dimensional Local Binary Pattern (3D LBP)

LBP is categorised as a texture feature. The basic concept of LBP was introduced by Ojala et al. (1996) [40], and many extensions have since been proposed, including extensions of LBP that capture 3D textures and patterns in 3D images. In our study, we used the approach presented by Montagne et al. (2013) [41].

LBP Basic Process

The basic LBP calculation is the same in 2D and 3D space. The steps for calculating the LBP code of each pixel/voxel in an image are as follows [41]–[44]:

1) Calculate the difference between the intensity value of the central cell (𝑔𝑐) and its neighbours (𝑔𝑖), denoted as (𝑔𝑖− 𝑔𝑐)

2) Apply a sign function 𝑠(𝑥): if a neighbouring cell has an intensity value greater than or equal to that of the central voxel, it is set to 1, otherwise to 0. By concatenating the obtained zero/one values, we get a binary code of length P for each centre point.

3) Convert the binary code to a base-ten decimal number by the LBP operator (Equation 1). Each decimal number represents a unique textured pattern.

LBP_{P,R} = ∑_{i=0}^{P−1} s(g_i − g_c) 2^i,  where s(x) = 1 if x ≥ 0, and 0 otherwise.  (1)

P is the number of neighbouring voxels at a distance R from the central node.

4) Derive the LBP features by counting the frequency of each decimal output (histograms of patterns). This last step is necessary when we use the LBP operator for extracting texture features.


Figure 2 outlines the process of acquiring the decimal number in a 2D LBP.

Figure 2. Example of a 2D LBP computation, P = 8 and R = 1 [42]. The intensity value of central cell (6) works as a threshold for assigning zero or one to the eight adjacent cells and making a binary code. 𝑖 is the index of the neighbouring pixels in a clockwise direction. The decimal is the base-ten format of the binary code.
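The four steps above can be sketched in Python for a single 3×3 patch. This is a minimal illustration, not the thesis code; the clockwise indexing order starting at the top-left cell is an assumed convention that must simply be kept fixed when comparing histograms:

```python
import numpy as np

def lbp_code(patch):
    """Compute the LBP code of the centre pixel of a 3x3 patch.

    Each neighbour with intensity >= the centre contributes 2**i to the
    code, where i is the neighbour's (clockwise) index.
    """
    centre = patch[1, 1]
    # clockwise neighbour coordinates starting at the top-left cell
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for i, (r, c) in enumerate(coords):
        if patch[r, c] >= centre:   # sign function s(x) = 1 if x >= 0
            code += 2 ** i
    return code
```

The final LBP feature vector (step 4) is then the histogram of these codes over all pixels of the region of interest.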

3D LBP

According to Montagne et al. [41], in 3D LBP only the direct neighbours on the x, y and z axes are considered; the neighbouring voxels on the diagonals in 3D space are excluded.

The direct neighbours are surrounding voxels with R equal to 1 (Figure 3).

Figure 3. Direct neighbours of a specific cell in 3D space on the x, y, z axes. a) The spatial schematic of the central node and its adjacent cells. White nodes are 0 (lower intensity value than c), and black nodes indicate 1 (higher or equal to c). The number on each circle shows the indexing order of nodes (𝑖). c (the intensity value of central voxel) denotes the threshold value. b) The enumeration order and the position of direct neighbours on each axis with the central node as the origin point [41].

The LBP operator output (Equation 1) produces 2^P different values, referring to 2^P binary patterns (each neighbouring voxel can only take the value 0 or 1 in the LBP computation). Hence, if the number of adjacent cells equals 6, we have 2^6 = 64 patterns [45].

Rotation Invariance

It is important to distinguish unique patterns from redundant ones because this increases interpretability and decreases computation time. According to [42], rotation of the image results in a different interpretation of the LBP pattern locations (Figure 4).

Figure 4. The effect of rotation on the neighbourhood points [42].

The rotation-invariance concept states that some patterns counted as new are in fact generated repeatedly by rotating the image. These patterns occur through displacement of the neighbouring cells along the perimeter of a circle centred on the central voxel [45].

Grouping the patterns of similar scenarios enables us to remove the effect of image rotation.

According to [41], three distinct scenarios can occur at each axis by considering each coordinate (x, y, z) separately. These scenarios are:

I. Both neighbouring nodes on the axis are lower than the centre voxel (both zero).

II. One of the adjacent nodes is lower than the centre value; the other is higher than (or equal to) the centre node.

III. Both adjacent cells on the axis are higher than or equal to the centre point (both one).

We used the above scenarios to remove redundant patterns and reduce the number of patterns from 64 (2^6) to 10 distinct patterns. Figure 5 demonstrates these patterns. The scenarios were also used to name the patterns: the three digits after "LBP_" give the number of axes in scenarios I, II and III, respectively. For example, LBP_300 is the pattern with only zero-valued nodes on all three axes (pattern 1 in Figure 5), whilst LBP_003 represents the pattern with all nodes equal to 1 on all three axes (pattern 10 in Figure 5). By contrast, LBP_030 is the pattern in which there is one node of 1 and one node of 0 on each axis (pattern 6 in Figure 5). Table 1 lists the name of each pattern and the number of times it can arise.

Figure 5. All possible groups of rotation invariant patterns in a 6-neighbour 3D LBP. The intensity value of the central voxel (c) is the threshold value. White nodes are cells with a lower intensity value than c (0), and black nodes indicate cells with a value higher or equal to c (1).

The number on each circle shows the indexing order of nodes (𝑖) [41].

Table 1. Name and frequency of different arrangements of 6-neighbour 3D LBP patterns depicted in Figure 5.

Pattern     Name      Multiplicity
Pattern 1   LBP_300    1
Pattern 2   LBP_210    6
Pattern 3   LBP_201    3
Pattern 4   LBP_120   12
Pattern 5   LBP_111   12
Pattern 6   LBP_030    8
Pattern 7   LBP_102    3
Pattern 8   LBP_021   12
Pattern 9   LBP_012    6
Pattern 10  LBP_003    1
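The per-axis scenario counting behind these pattern names can be sketched in Python. The helper below is hypothetical (not the thesis module); it returns the rotation-invariant name LBP_abc of a voxel's pattern, where the digits count the axes falling in scenarios I, II and III:

```python
import numpy as np
from collections import Counter

def lbp3d_pattern(volume, x, y, z):
    """Rotation-invariant 3D LBP pattern name for voxel (x, y, z).

    The six direct neighbours are grouped per axis; each axis is scenario
    I (both neighbours below the centre), II (mixed) or III (both >= the
    centre).  LBP_300 thus means all six neighbours are below the centre.
    """
    c = volume[x, y, z]
    counts = Counter()
    for axis_pair in (((x - 1, y, z), (x + 1, y, z)),
                      ((x, y - 1, z), (x, y + 1, z)),
                      ((x, y, z - 1), (x, y, z + 1))):
        ones = sum(1 for p in axis_pair if volume[p] >= c)
        counts[{0: "I", 1: "II", 2: "III"}[ones]] += 1
    return "LBP_%d%d%d" % (counts["I"], counts["II"], counts["III"])
```

Counting these names over all voxels of the region of interest gives the 10-bin rotation-invariant histogram of Table 1.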


Grey Level Co-occurrence Matrix (GLCM)

The co-occurrence matrix is one of the oldest statistical texture representation techniques. This method characterises the texture by means of the grey level distribution [46]. The value of the (i, j)-th element of the matrix is the count of voxels with grey level j that exist at a distance d along the direction θ from a voxel with grey level i [21], [46]. Figure 6 shows an example of GLCM calculation.

Figure 6. An example of GLCM computation with d = 1 and θ = 0. The element (2, 1) of the GLC matrix equals 1 since only one combination of connecting voxels with intensity values of 2 and 1 in the horizontal direction exists. Modified from [13], [47].

According to [21], the features such as autocorrelation, difference entropy, contrast and so forth are calculated for the GLC matrix.
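The GLCM construction described above can be sketched as follows (a minimal illustration with θ = 0, i.e. horizontal pairs only, and grey levels assumed to be 0-indexed integers; production code such as pyradiomics also symmetrises and normalises the matrix):

```python
import numpy as np

def glcm(image, levels, d=1):
    """GLC matrix for direction theta = 0 at distance d.

    Entry (i, j) counts pairs where a pixel of grey level i has a pixel
    of grey level j at distance d to its right.
    """
    m = np.zeros((levels, levels), dtype=int)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols - d):
            m[image[r, c], image[r, c + d]] += 1
    return m
```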

Grey Level Dependence Matrix (GLDM)

The GLD matrix presents the dependencies in a grey scale image [21]. The calculation of GLDM is illustrated in Figure 7. The element (𝑖, 𝑗) of the output matrix is the number of times that the centre voxel with grey level 𝑖 has 𝑗 dependent neighbour voxels. The centre voxel with intensity level 𝑖 is dependent on its neighbour cell with grey level 𝛾 at the distance 𝑑 if |𝑖 − 𝛾| ≤ 𝛼 (𝛼 is a given scalar) [13], [21].

One can calculate Gray Level Variance, Dependence Non-Uniformity, Large Dependence Emphasis and so forth for the GLD matrix [21].


Figure 7. An example of GLDM computation for an image with four grey levels. Here, the distance d equals 1, and the scalar α is 0. The outlined element of the GLD matrix equals 1 since there exists only one centre voxel with value 2 and two dependencies. Modified from [13].
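A minimal 2D sketch of the GLDM construction (pyradiomics implements the full 3D version; the Chebyshev neighbourhood and 0-indexed grey levels are simplifying assumptions here):

```python
import numpy as np

def gldm(image, levels, alpha=0, d=1):
    """GLD matrix: entry (i, j) counts pixels of grey level i with
    exactly j dependent neighbours, i.e. neighbours gamma within
    Chebyshev distance d satisfying |i - gamma| <= alpha.
    """
    rows, cols = image.shape
    m = np.zeros((levels, (2 * d + 1) ** 2), dtype=int)
    for r in range(rows):
        for c in range(cols):
            i, dep = image[r, c], 0
            for dr in range(-d, d + 1):
                for dc in range(-d, d + 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                        if abs(int(i) - int(image[rr, cc])) <= alpha:
                            dep += 1
            m[i, dep] += 1
    return m
```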

Grey Level Run Length Matrix (GLRLM)

GLRLM is another statistical texture method. It constructs the output GLRL matrix by counting runs of consecutive cells with the same grey level along a specific direction α [46]. For instance, two voxels with the same intensity value in the horizontal direction provide one run of length two [47]. Figure 8 shows the calculation of a GLRL matrix.

Figure 8. An example of GLRLM computation for an image with four grey levels in the direction α = 0. The element (2, 3) of the GLRL matrix equals 1 since only one run with grey level 2 and length 3 exists in the horizontal direction. Modified from [13], [47].

Several features such as Short Run Emphasis, Run Percentage, Run Length Non- Uniformity and so forth are calculated for the GLRL matrix [21].
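The horizontal run counting can be sketched as follows (a minimal illustration with 0-indexed grey levels; run length j is stored in column j − 1):

```python
import numpy as np

def glrlm(image, levels):
    """GLRL matrix for horizontal runs (alpha = 0).

    Entry (i, j - 1) counts maximal horizontal runs of grey level i
    with length j.
    """
    rows, cols = image.shape
    m = np.zeros((levels, cols), dtype=int)
    for r in range(rows):
        c = 0
        while c < cols:
            run = 1
            while c + run < cols and image[r, c + run] == image[r, c]:
                run += 1
            m[image[r, c], run - 1] += 1
            c += run
    return m
```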


Grey Level Size Zone Matrix (GLSZM)

The grey level size zone matrix considers areas (zones) of connected cells with the same grey level. The construction of the GLSZM is similar to that of the GLRLM. The (i, j)-th element of the matrix gives the number of zones with size j and intensity value i (Figure 9). The matrix manifests a homogeneous texture if its elements show large areas of the same grey level, regardless of direction [46].

Figure 9. An example of GLSZM computation for an image with four grey levels. The element (3, 3) of the GLSZ matrix equals 1 since only one zone has grey level 3 and size three. Modified from [13].

For the GLSZ matrix, various features such as Zone Percentage, Large Area Emphasis, Size-Zone Non-Uniformity and so forth are calculated [21].
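The zone counting can be sketched with a simple flood fill (a 2D, 4-connectivity illustration; pyradiomics uses full 3D connectivity, and 0-indexed grey levels are assumed):

```python
import numpy as np

def glszm(image, levels):
    """GLSZ matrix: entry (i, j - 1) counts connected zones of grey
    level i with size j (4-connectivity)."""
    rows, cols = image.shape
    m = np.zeros((levels, rows * cols), dtype=int)
    seen = np.zeros_like(image, dtype=bool)
    for r in range(rows):
        for c in range(cols):
            if seen[r, c]:
                continue
            level, size, stack = image[r, c], 0, [(r, c)]
            seen[r, c] = True
            while stack:                      # flood fill one zone
                rr, cc = stack.pop()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = rr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr, nc] and image[nr, nc] == level):
                        seen[nr, nc] = True
                        stack.append((nr, nc))
            m[level, size - 1] += 1
    return m
```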

Neighbouring Grey Tone Difference Matrix (NGTDM)

The NGTDM provides the difference between the grey level of a voxel and the average intensity values of its neighbouring voxels at a distance 𝑑. The matrix consists of the sum of absolute differences for intensity value 𝑖 [21]. Figure 10 illustrates the computation of NGTDM.

The features such as coarseness, contrast, busyness, complexity, and strength are calculated for the NGTD matrix [21].


Figure 10. An example of a 2D NGTD matrix calculation. s_i indicates the sum of the absolute differences between grey level i and the averages of its adjacent cells. The figure shows the computation of s_3 (i = 3, d = 1):

s_3 = |3 − (4+4+1+2+3)/5| + |3 − (4+4+3+2+2+1+2+3)/8| + |3 − (4+4+3+1+2)/5| = 0.78.

Modified from [13].
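The computation of the s_i entries can be sketched as follows (a 2D illustration in which border pixels use only the neighbours that exist, and grey levels are assumed to be 0-indexed integers):

```python
import numpy as np

def ngtdm_s(image, levels, d=1):
    """NGTDM entries s_i: for each pixel of grey level i, add the
    absolute difference between i and the mean of the pixel's
    neighbours within Chebyshev distance d."""
    rows, cols = image.shape
    s = np.zeros(levels)
    for r in range(rows):
        for c in range(cols):
            neigh = [image[rr, cc]
                     for rr in range(max(0, r - d), min(rows, r + d + 1))
                     for cc in range(max(0, c - d), min(cols, c + d + 1))
                     if (rr, cc) != (r, c)]
            s[image[r, c]] += abs(image[r, c] - np.mean(neigh))
    return s
```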

Although LBP is a texture feature, from here on the term "texture features" refers to the second-order features GLCM, GLDM, GLRLM, GLSZM and NGTDM, excluding LBP. Whenever we refer to the LBP features, we mention them by name.

2.2.4 Step 4: Feature Selection

According to [48], feature selection is crucial for problems with short-wide datasets for three reasons: 1) to tackle the “curse of dimensionality”; 2) to compact the input data for reducing model execution time; 3) to improve the result comprehensibility.

Thus, feature selection is a critical step in the radiomics pipeline because plenty of features are obtained during the feature extraction step. In addition, due to the limitations on gathering samples in clinical studies, the dataset has a small number of samples compared to the number of features. In this context, datasets are short-wide (few samples with many features). Moreover, radiomics features are often highly correlated, redundant, or irrelevant, which affects model performance [17], [22].

Feature selection focuses on searching for a subset of the input data with fewer features that can represent the given dataset effectively and improve the learning accuracy by decreasing the side effects of noisy or redundant features [49], [50]. Over the years, several methods have been proposed to address the need for feature selection. In general, feature selection methods are categorised as filter methods, wrapper methods and hybrid methods.


The filter approach refers to algorithms that select features without training any predictive model [48]. Wrapper methods, on the other hand, rely on learning with a predefined model and use its performance as the criterion for selecting features [51]. Hybrid methods are a combination of filter and wrapper methods [52].

Repeated Elastic Net Technique (RENT)

RENT [23] is a new feature selection method introduced by researchers at the Norwegian University of Life Sciences. It is an ensemble-based approach designed for short-wide datasets [23]. It selects robust features by employing a logistic regression (LR) model with elastic net regularisation for binary classification tasks.

Figure 11. The blue frame demonstrates the RENT process. RENT splits the input dataset, trains K sub-models, and selects the robust features based on three criteria (𝜏1, 𝜏2, 𝜏3). The output is a dataset with the selected features [23].

According to [23], in a binary classification problem, RENT first splits the input dataset into several unique subsets (Figure 11). It then uses the penalised LR algorithm to train each sub-model 𝑀𝑖 separately. In each sub-model, a different subset of features may be selected by the elastic net regularisation. Finally, based on the quality criteria and the user-given cut-off values, a feature is added to the output dataset if it fulfils all of the following criteria together [23]:

1. The feature has a high score, which means that it is selected in most of the K models (𝜏1). A user-defined threshold (𝑡1) determines how many times the feature should be selected among all K models.

2. The feature is stable, meaning its weights rarely alternate in sign (𝜏2). A feature whose weights all have the same sign (either all positive or all negative) is ideal. The user specifies the required proportion of feature weights with the same sign (𝑡2).


3. The feature frequently has non-zero weights across the K sub-models with low variance (𝜏3). The user can specify a threshold value (𝑡3) for the level of significance.

All the quality metrics are bounded between 0 and 1 (𝜏1, 𝜏2, 𝜏3 ∈ [0, 1]) [23]. It must be stressed that a feature is selected if and only if 𝜏1 ≥ 𝑡1 and 𝜏2 ≥ 𝑡2 and 𝜏3 ≥ 𝑡3. Defining three threshold values (𝑡1, 𝑡2, 𝑡3), instead of specifying a desired number of features, lets the user adjust the strictness of the feature selection process.
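As an illustration of the three criteria (a sketch, not the RENT package itself), the selection rule can be expressed as follows. Here 𝜏3 is approximated by one minus the p-value of a t-test of the hypothesis that a feature's mean weight across the K sub-models is zero; the exact definitions are given in [23].

```python
import numpy as np
from scipy.stats import ttest_1samp

def rent_select(weights, t1, t2, t3):
    """Apply the three RENT cut-offs to a (K, n_features) matrix of
    elastic-net LR coefficients, one row per sub-model.
    Returns the indices of the selected features."""
    K = weights.shape[0]
    tau1 = (weights != 0).mean(axis=0)               # selection frequency
    tau2 = np.abs(np.sign(weights).sum(axis=0)) / K  # sign stability
    # simplified stand-in for the significance criterion
    tau3 = 1.0 - ttest_1samp(weights, 0.0, axis=0).pvalue
    return np.where((tau1 >= t1) & (tau2 >= t2) & (tau3 >= t3))[0]
```

A feature that is always selected with a stable, clearly non-zero weight passes all three cut-offs, while never-selected or sign-alternating features are rejected.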

Elastic Net

Elastic net is a regularisation method introduced by Zou and Hastie (2005) [53].

Equation 2 shows the elastic net regularisation term (𝜆𝑒𝑛𝑒𝑡) calculation [23].

𝜆𝑒𝑛𝑒𝑡(𝛽) = 𝛾[𝛼 𝜆1(𝛽) + (1 − 𝛼) 𝜆2(𝛽)] (2)

The penalty term 𝜆1 (the L1 regularisation) penalises the sum of the absolute values of the regression coefficients 𝛽, and the penalty term 𝜆2 (the L2 regularisation) penalises the sum of their squared values [23], [54].

In equation 2, 𝛾 denotes the regularisation strength and is a positive number, and 𝛼 is a mixing parameter in the range [0, 1] [23]. RENT uses the LR classifier implemented in scikit-learn [55]. In this package, the 𝛼 parameter is called l1_ratio, and instead of 𝛾 its inverse, the C parameter, is used [55].

Compared to the other regularisation models (Lasso and Ridge), the advantage of the elastic net is that it exploits both the 𝜆1 and 𝜆2 penalty terms, which allows the algorithm to combine shrinkage with variable selection [54].
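A minimal scikit-learn sketch of an elastic net penalised LR on synthetic short-wide data (the dataset and parameter values are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy short-wide data: 40 samples, 300 features, few informative ones
X, y = make_classification(n_samples=40, n_features=300, n_informative=5,
                           random_state=0)

# alpha in equation 2 corresponds to l1_ratio; C is the inverse of gamma
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=10000)
clf.fit(X, y)
print((clf.coef_ != 0).sum(), "of", X.shape[1], "features kept")
```

The L1 component drives many coefficients to exactly zero, which is what makes per-sub-model feature selection possible inside RENT.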

In [23], RENT was compared to various feature selection methods in several empirical experiments, where the Fisher score (F-Score) and recursive feature elimination (RFE) methods provided competitive results. We therefore describe these two methods as examples of filter methods (F-Score) and wrapper methods (RFE).

Fisher Score Method

The Fisher score (F-Score) method selects features by measuring how well each feature discriminates between the classes, based on its F-Score value [56]. It is a filter-based method,


which means that feature scores are calculated and the features are selected according to their score ranks [57].

The idea behind this method is to find a subset of features with large distances between samples of different classes and small distances between samples of the same class [57]. The final feature subset contains the features with the highest F-Scores [56]. Equation 3 is used for calculating the F-Score [58].

𝐹(𝑥𝑗) = [∑𝑘=1…𝑐 𝑛𝑘 (𝜇𝑘𝑗 − 𝜇𝑗)²] / (𝜎𝑗)² (3)

Where:

𝑥𝑗 is the j-th feature

𝑛𝑘 is the number of samples in the k-th class

c is the number of classes

𝜇𝑘𝑗 is the mean of the j-th feature in the k-th class

𝜎𝑘𝑗 is the standard deviation of the j-th feature in the k-th class

𝜇𝑗 is the mean of the j-th feature over the whole dataset

𝜎𝑗 is the standard deviation of the j-th feature over the whole dataset

Although widely used, this heuristic algorithm has some deficiencies, and the features it selects are often suboptimal. The F-Score method fails in cases where features have low individual scores but a very high score when considered together as a whole. Another drawback is its weakness in handling redundant features [57].
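Equation 3 can be computed for all features at once; a small sketch:

```python
import numpy as np

def fisher_score(X, y):
    """Equation 3: F(x_j) = sum_k n_k (mu_kj - mu_j)^2 / (sigma_j)^2,
    computed for every feature (column of X) simultaneously."""
    mu = X.mean(axis=0)          # mu_j: per-feature mean over the dataset
    var = X.var(axis=0)          # (sigma_j)^2 over the dataset
    num = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        num += len(Xk) * (Xk.mean(axis=0) - mu) ** 2
    return num / var
```

A feature whose class means are well separated relative to its overall spread receives a high score and is ranked ahead of uninformative features.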

Recursive Feature Elimination Method

The recursive feature elimination (RFE) method is a simple and popular wrapper method. RFE can use various ML algorithms as its core estimator [59].

The algorithm starts by fitting an ML model on the given dataset. Then it continues by eliminating the least important features or features with lower weight coefficients. This process is repeated until the desired number of features is reached [59], [60].

Even though this method is quick and straightforward, it is not appropriate for problems with many highly correlated features [61].
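A minimal scikit-learn sketch of RFE with an LR estimator on synthetic data (the parameter values are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=46, n_features=50, n_informative=5,
                           random_state=0)

# Repeatedly fit the estimator and drop the lowest-weight feature (step=1)
rfe = RFE(estimator=LogisticRegression(max_iter=5000),
          n_features_to_select=5, step=1)
rfe.fit(X, y)
print("selected feature indices:", list(np.where(rfe.support_)[0]))
```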

2.2.5 Step 5: Modelling and Evaluation

Machine learning is a subfield of artificial intelligence and has evolved remarkably fast [15]. It shows unique capabilities across research areas and plays an essential role as an interface between medical research and computer science [22]. Analysing image data with machine learning can help us understand illnesses and medications, and it can enable effective treatments and personalised medication [62].

Model Building

In radiomics studies, the objective is to exploit machine learning concepts to predict the target based on radiomics features [14]. Machine learning algorithms are categorised into two main groups:

Supervised learning uses labelled samples as the target variable to predict the output. The target can have continuous values in a regression model or categorical values in a classification model [15], [63].

Unsupervised learning does not use expert-labelled data. Instead, it tries to find patterns in the data with methods such as clustering and to predict the structure of new data [62], [63].

Supervised Classifiers

As mentioned before, classification problems belong to the family of supervised learning methods. With regard to the number of class labels, classification tasks can be binary or multi-class: there are only two class labels in binary classification, whereas multi-class tasks have more than two. The current study is a binary classification problem.

There are a variety of classifiers used in supervised learning for binary classification.

In this research, we used the following classifiers:

Logistic Regression (LR) Classifier is an easy-to-implement algorithm, broadly used in medical studies because it is well suited to modelling disease state [64]. Despite its name, it is a binary classifier that predicts the target value using the logistic function [63]. The predictors need not be normally distributed or linearly related, although this increases the model's power. The model assumes a linear relationship between the logit of the dependent variable (outcome) and the independent variables (predictors) [64].

Support Vector Machines Classifier (SVC) is an effective machine learning method whose objective is to maximise the distance between the decision boundary and the samples [55].

K Nearest Neighbors (KNN) Classifier finds the given number (k) of samples nearest to the desired example according to a distance metric and assigns the class label of that sample by majority voting [55].


Multi-Layer Perceptron (MLP) Classifier is a basic neural network algorithm. It has multiple nodes and layers (similar to a directed graph): the input layer, the hidden layer(s) and the output layer. All nodes in one layer are connected to the nodes in the preceding layer [65].

Decision Tree (DT) Classifier is a basic rule-based tree classifier. It groups the samples through a sequence of rules and decisions [55].

Random Forest (RF) Classifier refers to the algorithm that models an ensemble of decision tree sub-models and outputs the majority class label over all sub-models [66]. Each tree is trained on a random bootstrap sample [55].

Ridge Classifier corresponds to an L2 regularised model [55], [67]. This algorithm moderates the weight coefficients by minimising the sum of squared residuals [68].

Adaptive Boosting (AdaBoost) Classifier is an ensemble algorithm. It trains a sequence of weak learners, reweighting the importance of the samples at each step: larger weights are assigned to misclassified samples until the algorithm obtains a model that can classify them correctly [67].

Extremely Randomised Trees Classifier is also an ensemble model based on decision trees. It differs from the random forest in that it uses the entire sample instead of bootstrapping, and it chooses the cut-points for splitting the nodes at random. Like other ensemble models, it uses majority voting for the final prediction [69].

Light Gradient Boosting Machine (LGBM) Classifier uses a gradient-boosted decision tree procedure with histogram-based concepts, which convert continuous values into discrete groups (bins) [70].

Hyper Parameter Tuning

Hyperparameters are the parameters of an ML algorithm that are set before model training starts [71]. For instance, in an ANN model, the batch size or the number of layers are hyperparameters because they are fixed before training begins; in contrast, the weights are not hyperparameters since their values are learned during the training process [71], [72]. Because hyperparameters control the training process directly, they significantly affect model performance [72]. Some simple ML algorithms have no hyperparameters, whereas others require many to be set beforehand; in some cases, the hyperparameters depend on each other [73].


Hyperparameter tuning refers to the process of finding the combination of hyperparameters that yields the highest performance [72]. There is a wide variety of automatic tuning methods. Grid Search is a popular technique that finds the best combination by checking every combination of algorithm parameters from a predetermined parameter grid [72], [73]. Despite being simple, it is time-consuming when the dataset is large and the parameter grid contains many alternatives [73]. Since the dataset used in this study is small, we used this method for hyperparameter optimisation.
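A minimal scikit-learn sketch of a grid search (the grid and classifier are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=46, n_features=20, random_state=0)

# Every combination in the grid is evaluated with cross-validation
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), grid, cv=3, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_)
```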

Model Validation

The model performance should be evaluated on unseen data, ideally data from other institutions [74], [75]. Due to patient privacy policies, gathering many medical images as samples is difficult, and medical datasets often suffer from small sample sizes [17].

If independent data is not available, it is possible to split the data into train and validation groups. In this way, the algorithm can learn from the train set and predict the output based on the validation set, which is untouched during the learning process.

However, when the dataset contains few samples, this splitting approach does not work properly because there is insufficient training and validation data. In this situation, cross-validation techniques are used to increase the model's generalisability [74], [75].

Among the various types of cross-validation methods, the nested cross-validation technique is useful when the model is prone to overfitting (such as with small datasets) and whenever hyperparameter tuning is needed [55], [74].

Nested Cross-Validation

Cross-validation techniques assess the model's generalisability by dividing the data into training and validation sets [63]. In nested cross-validation there are multiple layers instead of a single one, typically two: an inner loop and an outer loop of cross-validation [76].

Figure 12 shows a 5×3 nested cross-validation (five folds in the outer layer and three folds in the inner layer). The data is split into five folds in the outer loop: four folds form the train set and one fold serves as the validation set, playing the role of unseen data. In the inner loop, the train set is again split into three folds, two for training and one as a test set. The model execution is repeated, changing the folds, until every fold has been used as train and validation set in the outer loop and as train and test set in the inner loop. Hyperparameter tuning is done in the inner loop; in the outer loop, the best hyperparameter set (obtained from the inner loop) is used to make the final prediction on the validation set [63], [75], [76].


The added outer loop removes the bias of the flat cross-validation method, since the validation data has not been used to select the optimal model. This gives a more reliable estimate than the basic form of cross-validation [77].
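A 5×3 nested cross-validation can be sketched in scikit-learn by wrapping a grid search (inner loop) inside cross_val_score (outer loop); the classifier and grid are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score)

X, y = make_classification(n_samples=46, n_features=20, random_state=0)

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: hyperparameter tuning; outer loop: unbiased validation
tuned = GridSearchCV(LogisticRegression(max_iter=5000),
                     {"C": [0.01, 0.1, 1, 10]}, cv=inner, scoring="roc_auc")
scores = cross_val_score(tuned, X, y, cv=outer, scoring="roc_auc")
print("outer-fold AUCs:", scores.round(2))
```

Each outer fold's score comes from a model whose hyperparameters were chosen without ever seeing that fold, which is exactly the bias-removal argument above.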

Figure 12. A 5×3 nested cross-validation (five folds in the outer loop and three folds in the inner loop).

Model Evaluation

There are a variety of metrics for evaluating the performance of classification models.

In medical studies, it is vital to differentiate between false positive (FP) and false negative (FN) misclassification [17], and the metrics used must take this into account.

In classification, true positive (TP) and true negative (TN) refer to samples that are classified correctly, whereas false positive (FP) and false negative (FN) correspond to misclassified samples. Various metrics are calculated from these counts; among them, the area under the receiver operating curve (AUC) is a common metric well suited to a balanced dataset [74].

Area Under Curve

According to [63], the AUC is computed by first plotting the receiver operating curve (ROC) from the true positive rate (TPR) and false positive rate (FPR), and then calculating the area under this curve. Equation 4 shows how TPR and FPR are computed [63].


𝑇𝑃𝑅 = 𝑇𝑃 / (𝑇𝑃 + 𝐹𝑁), 𝐹𝑃𝑅 = 𝐹𝑃 / (𝐹𝑃 + 𝑇𝑁) (4)

Figure 13 illustrates the receiver operating curve. The AUC ranges from 0.0 (no correct classifications) to 1.0 (no incorrect classifications); in this plot, an AUC of 0.5 corresponds to random classification.

Figure 13. An example of the receiver operating curve with an area under the curve of 0.79. The blue dashed line shows the random guess line [55].
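As a minimal sketch (using scikit-learn, with made-up labels and scores), the ROC points of Equation 4 and the resulting AUC can be computed as:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.4, 0.6, 0.35, 0.8, 0.9]   # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
auc = roc_auc_score(y_true, y_score)
print("AUC =", auc)
```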


3 Materials and Methods

3.1 Image Acquisition and Segmentation

3.1.1 The ePOD-MPH Study

In this study, images from the ePOD-MPH study [11] were considered. In 2011, a randomised, double-blinded, placebo-controlled project named “the effects of Psychotropic drugs On Developing brain (ePOD)” was conducted. The Clinical Research Unit of the Academic Medical Center at the University of Amsterdam in the Netherlands was the authority responsible for this project. The ePOD-MPH study was one of the three studies within the ePOD project.

In the ePOD-MPH study, the participants were randomised to receive methylphenidate or a placebo for 16 weeks, followed by a one-week wash-out period. MRIs were taken before treatment began (baseline), during the treatment (after eight weeks), and after the wash-out period (week 17). In this thesis, we refer to the baseline images as pre-treatment images and the week-17 MRIs as post-treatment images.

In the ePOD-MPH study, 100 male ADHD patients took part in the experiment. There was an equal number of children (10 to 12 years old) and adults (23 to 40 years old) among the participants. This master's study is limited to examining the MRIs obtained from the male adolescents. Four subjects were excluded according to the following rules: 1) subjects with no baseline or week-17 MRI were excluded, and 2) images disrupted by head motion were removed. Thus, 46 samples were included for analysis, comprising 24 placebo-treated and 22 MPH-treated patients, giving a balanced dataset.


Figure 14 shows the distribution of class labels. The subjects were labelled according to medication group: cases in the MPH group were labelled as class 1 (sample IDs 0 to 21) and subjects in the placebo group as class 0 (sample IDs 22 to 45).

Figure 14. The distribution of class labels. Class 0 indicates the placebo group, while class 1 denotes the MPH treated group.

3.1.2 Image Segmentation

The images and masks used in this thesis were obtained from the study by Grünbeck [13]. In her study, raw T1-weighted MR images were used, and five subcortical brain structures named hippocampus, caudate, putamen, thalamus, and pallidum were selected for analysis. Grünbeck [13] created binary masks for the left and right side of the mentioned subcortical structures separately.

3.2 Feature Extraction

The feature extraction in our study comprised two separate phases. The first concerned the shape and texture features extracted by Grünbeck [13]; the second concerned the LBP features, which were extracted by a new programme developed in this thesis. We also modified the Biorad application [19] to add our new 3D LBP feature extraction module. This framework was developed by Langberg [18] and upgraded into a user-friendly tool for extracting radiomics features by Albuni [19].

In this section, we first explain the generation of shape and texture features as done by Grünbeck; we then elaborate on the LBP module and the modifications made to Biorad to accommodate it.


3.2.1 Shape and Texture Features Extraction

Grünbeck [13] used Biorad [19] for generating radiomics features. The Biorad framework [18], [19] uses pyradiomics [20], an open-source python package for generating radiomics features. This package aims to provide a reference for radiomics studies and introduce an easy tool for extracting reproducible radiomics features.

Grünbeck used the default parameters of pyradiomics for generating radiomics features. This means that for GLCM, GLDM and NGTDM, the distance between voxels was set to 1, and the threshold scalar of dependence in GLDM was set to zero.

Before extracting texture features, Grünbeck discretised the images’ intensity by using bin sizes of two and four to reduce the intensity level range of images from 256 to 128 and 64 intensity levels, respectively. This process generates two distinct feature sets named 128-bin and 64-bin sets. Furthermore, the features were extracted from the left and the right side of each subcortical brain structure. The number of radiomics features extracted by Grünbeck for one side of one of the subcortical brain structures is shown in Table 2.

Table 2. The number of radiomics features extracted from the images for one subcortical structure on one side of the brain.

Feature class   128-bin   64-bin
Shape (3D)      14 (independent of grey-level binning)
GLCM            24        24
GLDM            14        14
GLRLM           16        16
GLSZM           16        16
NGTDM           5         5

3.2.2 3D LBP Features Extraction

As there was, at the time of writing this thesis, no Python tool for generating 3D LBP features, we developed a 3D LBP feature extraction module and integrated it into the Biorad framework [19]. It is therefore now possible to extract LBP features in addition to pyradiomics features [21] using the Biorad framework [19].

The code for extracting 3D LBP features is available in Appendix A. We used NiBabel, an open-source python package that supports standard neuroimaging file formats, for converting the images and binary masks into arrays [78].

As described in Chapter 2, only direct neighbours were considered (the 6 neighbours located along the x, y and z axes).

The steps of extracting 3D LBP features are as follows:

1. Read the image and corresponding mask


2. Convert the image and mask into arrays

3. Calculate the LBP value (considering direct neighbours, which means P=6 and R=1) for the voxels in the binary mask area (as mentioned in chapter 2)

4. Map the LBP value to the corresponding rotation invariant pattern (based on the rotation_invariant_pattern table in Appendix B)

5. Calculate the frequency of patterns

6. Compute, for each pattern, the fraction: (frequency of the pattern) / (total number of pattern occurrences)
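The steps above can be sketched as follows. This is a simplified version of the Appendix A code: it builds 6-bit LBP codes for the in-mask voxels and returns their normalised histogram, while the mapping of each code to one of the 10 rotation-invariant patterns via the Appendix B lookup table is omitted here.

```python
import numpy as np

def lbp3d_histogram(image, mask):
    """Steps 3, 5 and 6: 6-neighbour (P=6, R=1) LBP codes for the voxels
    inside the binary mask, returned as a normalised 64-bin histogram."""
    padded = np.pad(image.astype(float), 1, mode="edge")  # handle borders
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    counts = np.zeros(64)
    for x, y, z in zip(*np.nonzero(mask)):
        centre = padded[x + 1, y + 1, z + 1]
        code = 0
        for bit, (dx, dy, dz) in enumerate(offsets):
            # set the bit if the neighbour is at least as bright as the centre
            if padded[x + 1 + dx, y + 1 + dy, z + 1 + dz] >= centre:
                code |= 1 << bit
        counts[code] += 1
    return counts / counts.sum()   # step 6: relative pattern frequencies
```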

Ten LBP features were extracted for each side of the subcortical structure of the brain.

Modifications of Biorad Feature Extraction Module

After developing the feature extraction programme for 3D LBP features, we upgraded the latest version of the Biorad framework from the Albuni study [19] to make it compatible with generating 3D LBP features. The modified Biorad code is available in Appendix C. The modifications to the Biorad feature extraction module were as follows:

1. Imported the 3D LBP module.

2. Added the “LBP” to its feature list.

3. Added our code to the functions named “extract_radiomics_features” and “get_selected_features”.

4. Added the LBP column to the template.csv file.

5. Modified the requirement.txt file of the Biorad framework to install our necessary python packages (NiBabel, Collections, Six, Pandas and Numpy). For the required packages of Biorad, see [19].

The input to the feature extraction module of Biorad is a CSV file (named template.csv) containing the location of the images and masks files and the output file location. Also, the user should choose the desired radiomics features to be extracted by inserting a 1 value in the related columns (Figure 15).

Figure 15. A sample data of template.csv file used by Biorad as an input setting for generating radiomics features. In this setting, the LBP features are selected.

The template.csv file contained the following data as an input configuration to the Biorad feature extraction module:

• image_dir: the image files’ directory.


• mask_dir: the mask files' directory. The name of the masks should match exactly the name of the corresponding image.

• output_file_name: the name of the output file or its directory. If the user does not provide a file path, the output file is saved in the working folder.

• bin_width: the bin width used for grey-level discretisation.

• The remaining columns bear the names of radiomics features: if the user types a 1 in any of these columns, that feature is extracted.

The output file is a CSV file containing the name of images and the specified radiomics features with their corresponding values. Figure 16 shows an example output file for LBP extracted features and shape features.

Figure 16. An example output file from the Biorad feature extraction module showing extracted shape and LBP features with the name of the corresponding image.

3.2.3 The Feature Matrices

Grünbeck [13] used two sets of MR images to analyse changes in the brain structure of the MPH-treated and placebo groups. The pre-treatment set (the baseline images) was acquired before treatment started; the post-treatment set refers to the images captured at week 17 of treatment. After deriving the features from the pre-treatment and post-treatment images separately, we obtained output files containing the extracted radiomics features for each side of each subcortical structure. The goal of this study was to assess potential changes in brain structure due to MPH treatment. Therefore, to construct the final dataset for each subcortical structure, the pre-treatment (𝑃𝑟𝑒_𝑠𝑒𝑡) features were subtracted from the corresponding post-treatment (𝑃𝑜𝑠𝑡_𝑠𝑒𝑡) features using equation 5 [13]:

𝐶𝑚,𝑛 = 𝑃𝑜𝑠𝑡_𝑠𝑒𝑡𝑚,𝑛 − 𝑃𝑟𝑒_𝑠𝑒𝑡𝑚,𝑛 (5)

Here 𝐶𝑚,𝑛 denotes the change in the value of feature 𝑚 for sample 𝑛. Thus, the feature matrices contain the change of the corresponding radiomics feature. For each subcortical structure of the brain, we concatenated the feature matrices of the left and right parts of the structure to construct the final dataset. The example in Table 3 illustrates the structure of the final dataset.
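With pandas, equation 5 and the left/right concatenation can be sketched as follows (the feature names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical pre/post feature tables for one side, indexed by participant ID
pre  = pd.DataFrame({"f1": [10.0, 12.0], "f2": [3.0, 5.0]}, index=[0, 1])
post = pd.DataFrame({"f1": [11.0, 11.5], "f2": [4.0, 4.5]}, index=[0, 1])

change_left  = (post - pre).add_prefix("left_")    # equation 5, per feature
change_right = (post - pre).add_prefix("right_")   # same idea for the right side
final = pd.concat([change_left, change_right], axis=1)
print(final)
```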


Table 3. An example of the structure of the final dataset constructed based on the change between post-treatment and pre-treatment features. Class 1 indicates the MPH-treated group, and class 0 denotes the placebo group [13].

Participant ID   Class   Left Segment   Left Segment   Right Segment   Right Segment
                         Feature 1      Feature 2      Feature 1       Feature 2
0                1       𝐶𝑙1,0          𝐶𝑙2,0          𝐶𝑟1,0           𝐶𝑟2,0
1                1       𝐶𝑙1,1          𝐶𝑙2,1          𝐶𝑟1,1           𝐶𝑟2,1
…                …       …              …              …               …
22               0       𝐶𝑙1,22         𝐶𝑙2,22         𝐶𝑟1,22          𝐶𝑟2,22
23               0       𝐶𝑙1,23         𝐶𝑙2,23         𝐶𝑟1,23          𝐶𝑟2,23

3.2.4 Datasets

For each subcortical structure (hippocampus, caudate, putamen, thalamus, and pallidum), three datasets were used as input for modelling and comparison: the "initial dataset", the "expanded dataset" and the "LBP dataset".

Initial Dataset

The “initial dataset” contains the shape and texture features acquired in Grünbeck's thesis [13], allowing us to compare our results with those of her study. The “initial dataset” for each of the five subcortical brain structures has 46 rows corresponding to the 46 participants. The columns comprise an ID column identifying patients, a Label column indicating class labels (0 or 1) and 328 radiomics features (28 shape features and 300 texture features). Figure 17 illustrates the type and number of radiomics features included in the “initial dataset”.


Figure 17. The structure of the radiomics features (in total 328) in the “initial dataset” for every subcortical structure of the brain.

The distribution of features in the "initial dataset" is illustrated in Figure 18. The shape features comprise 8% of the dataset, compared with the 128-bin (46%) and 64-bin (46%) texture features. There was an equal number of features from each side of the brain.


Figure 18. Pie chart showing the distribution of the various radiomics features in the “initial dataset”. 128-bin and 64-bin refer to the texture features with, respectively, 128 and 64 grey level discretisation. Shape denotes the shape features.

LBP Dataset

The “LBP dataset” for each subcortical brain structure has 46 rows corresponding to the 46 participants. The columns are an ID column identifying patients, a Label column indicating class labels (0 or 1) and 20 radiomics features (10 LBP patterns for each side of the brain). Figure 19 illustrates the type and number of radiomics features included in the “LBP dataset”.

Figure 19. The structure of radiomics features (in total 20) in the “LBP dataset” for every subcortical structure of the brain.

Expanded Dataset

The “expanded dataset” contains the features of both the “initial dataset” and the “LBP dataset” (Figure 20). Thus, like the other datasets, the “expanded dataset” for each subcortical brain structure has 46 rows corresponding to the 46 participants, as well as an ID and a Label column. It contains 348 radiomics features (28 shape features, 300 texture features and 20 LBP features).

Figure 20. The structure of the “expanded dataset”. It contains the features from both the “initial dataset” (shape and texture features) and the “LBP dataset” (LBP features).

Figure 21 shows the distribution of features in the “expanded dataset”. The LBP features comprise 6% of the dataset, compared with the shape features (8%) and the 128-bin (43%) and 64-bin (43%) texture features. The number of features from each side of the brain was equal.

Figure 21. The pie chart shows the distribution of various radiomics features in the “expanded dataset”. 128-bin and 64-bin refer to the texture features with, respectively, 128 and 64 grey level discretisation. Shape denotes the shape features, and LBP corresponds to LBP features.


3.3 Experiments

In this study, we performed four different experiments for each subcortical structure of the brain, separately, and we compared the results of these experiments in the result section. The overall workflow for performing each experiment is shown in Figure 22.

Figure 22. The workflow used for assessment for all experiments.

The only difference between the experiments is the dataset used as input; all other steps are the same (Figure 22). Table 4 gives an overview of the different experiments.

Table 4. An overview of various experiments. Note that the LBP dataset contained only 20 features; thus, experiment 4 did not have any feature selection step.

               Input dataset                                              Feature selection method
Experiment 1   Initial dataset                                            RENT
Experiment 2   Expanded dataset                                           RENT
Experiment 3   Expanded dataset with highly correlated features removed   RENT
Experiment 4   LBP dataset                                                Not applicable

3.3.1 Correlation Analysis

Radiomics features are prone to being highly correlated. Therefore, in experiment 3, we examined the correlation between features. We aimed to investigate how RENT selects correlated features and to assess our models without the correlated features. The “expanded dataset” was used in this experiment to analyse the correlations.
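One common way to remove highly correlated features, shown here as a sketch (the thesis's exact criterion and threshold may differ):

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.95):
    """Drop one feature from every pair whose absolute Pearson correlation
    exceeds the threshold, keeping the first-occurring column of each pair."""
    corr = df.corr().abs()
    # upper triangle only, so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)
```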
