4.2 Further improving the best two-class models

Before combinations of three, four and five classes are considered, further experiments will be devoted to investigating whether the results of the models created in the previous section can be improved further.

Section 4.2.1 will specify the experiments conducted. The following section, section 4.2.2, will show the results of the experiments, before an analysis of the results is given in section 4.2.3.

4.2.1 Experiments

The following six experiments are largely based on the best-performing models, shown in table 4.1. Each experiment is described in detail in its own paragraph.

The aim of the first two experiments is to see if Relieff can outperform CFS as a feature selection method. Relieff does not output the optimal number of features; a scheme for finding it is described in section 2.4.4.

First experiment Use feature set four, but replace CFS with Relieff as the feature selection method.

Second experiment Use feature set five, but replace CFS with Relieff as the feature selection method.
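Since the choice of the number of features is central to both experiments, a minimal sketch of such a scheme is given below. It assumes that the scheme of section 2.4.4 amounts to ranking the features with Relieff and sweeping over the number of top-ranked features kept, scoring each candidate subset by cross-validated accuracy; the variable names X (observations by features) and y (class labels) are assumptions, and the actual scheme may differ in its details.

```matlab
% Rank features with Relieff, then search for how many top-ranked
% features to keep. X and y are assumed names (see the lead-in above).
[ranking, ~] = relieff(X, y, 5);    % five nearest neighbors, as fixed in the thesis

bestAcc = 0;
bestK   = 1;
for k = 1:size(X, 2)
    subset  = X(:, ranking(1:k));                         % top-k ranked features
    cvModel = crossval(fitctree(subset, y), 'KFold', 10); % 10-fold CV
    acc     = 1 - kfoldLoss(cvModel);                     % cross-validated accuracy
    if acc > bestAcc
        bestAcc = acc;
        bestK   = k;
    end
end
selectedFeatures = ranking(1:bestK);
```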

Third experiment Feature set four, the concatenation of feature sets one to three followed by feature selection, gave the second-best results. Adding extra features might therefore aid the model. The following features will be added to feature set four:

• Variance of circumferential acceleration in the midsystolic phase.

This feature is expected to discriminate between the classes because of the differences in heart rate and force of contraction.

• The correlation coefficients calculated between the longitudinal, circumferential and radial directions in the midsystolic phase. Preece et al. [50] used these features with success for human activity classification using accelerometers.

Note that the correlation coefficients are given as a 2x2 matrix. In order to be used as features, each matrix is reshaped into a 4x1 vector. The original coarse grid search will be performed, as seen in table 3.4.
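As an illustration, a minimal sketch of computing the added features for a single cycle is given below. The signal names accLong, accCirc and accRad (the midsystolic acceleration in the longitudinal, circumferential and radial directions) are assumptions.

```matlab
% Sketch of the features added to feature set four for one heart cycle.
% accLong, accCirc and accRad are assumed signal names.
featVar = var(accCirc);   % variance of the circumferential acceleration

% corrcoef returns a 2x2 matrix; reshape each into a 4x1 feature vector
corrLC = reshape(corrcoef(accLong, accCirc), 4, 1);
corrLR = reshape(corrcoef(accLong, accRad),  4, 1);
corrCR = reshape(corrcoef(accCirc, accRad),  4, 1);

newFeatures = [featVar; corrLC; corrLR; corrCR];   % appended to feature set four
```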

Fourth experiment Invoking feature selection aids the model; hence, adding all features before executing feature selection might further improve the results. More specifically, all features mentioned in chapter 3 will be appended into one set and tested twice: once using Relieff and once using CFS. Note that a summary of all features is given after the results of these experiments are shown. Also note that if the previous experiment, which added even more features to feature set four, improves the results, this experiment will include those features as well.

It was mentioned in section 3.5 that a finer grid search would be done over the most promising region of hyperparameters on the most promising models. This is performed on the two best models, and is specified in the two upcoming paragraphs.

Fifth experiment The learning rate hyperparameter, named LearnRate in MATLAB and used by AdaBoost, was not searched for in the grid search but instead fixed to 0.1. This hyperparameter will be searched for on the model achieving the best result using AdaBoost. The full grid search is shown in table 4.2.

Name        Values
NLearn      25, 50, 100, 200
LearnRate   0.1, 0.2, ..., 1

Table 4.2: Table showing the hyperparameters used on the most promising model which uses AdaBoost as its classifier.
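A minimal sketch of this grid search is shown below, assuming fitcensemble with the AdaBoostM1 method (the thesis parameter NLearn corresponds to NumLearningCycles in fitcensemble) and ten-fold cross-validated loss as the selection criterion; X and y are assumed to hold the selected features and the class labels.

```matlab
% Sketch of the finer grid search of table 4.2 for AdaBoost.
nLearnGrid    = [25 50 100 200];
learnRateGrid = 0.1:0.1:1;

bestLoss = inf;
for n = nLearnGrid
    for lr = learnRateGrid
        mdl  = fitcensemble(X, y, 'Method', 'AdaBoostM1', ...
                            'NumLearningCycles', n, 'LearnRate', lr);
        loss = kfoldLoss(crossval(mdl, 'KFold', 10));   % 10-fold CV error
        if loss < bestLoss
            bestLoss = loss;
            best     = struct('NLearn', n, 'LearnRate', lr);
        end
    end
end
```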

Sixth experiment The initial grid search for hyperparameters for the decision tree classifier is seen in table 3.4. The search leaves a relatively large gap between 8 and 64 in MaxNumSplits. A finer search over this interval will therefore be performed on the best-performing model using a decision tree as its classifier.

The complete description of the finer grid search is seen in table 4.3.

Name            Values
SplitCriterion  gini index, deviance, twoing
MaxNumSplits    10, 15, 20, ..., 60

Table 4.3: Table showing the hyperparameters used in the finer grid search for decision tree.
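A corresponding sketch for the decision tree search is given below. Note that MATLAB's fitctree names the gini index 'gdi'; the use of ten-fold cross-validation and the variables X and y are again assumptions.

```matlab
% Sketch of the finer grid search of table 4.3 for the decision tree.
criteria      = {'gdi', 'deviance', 'twoing'};   % 'gdi' is MATLAB's gini index
maxSplitsGrid = 10:5:60;

bestLoss = inf;
for c = criteria
    for s = maxSplitsGrid
        mdl  = fitctree(X, y, 'SplitCriterion', c{1}, 'MaxNumSplits', s);
        loss = kfoldLoss(crossval(mdl, 'KFold', 10));   % 10-fold CV error
        if loss < bestLoss
            bestLoss = loss;
            best     = struct('SplitCriterion', c{1}, 'MaxNumSplits', s);
        end
    end
end
```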

4.2.2 Results

A summary of the results of the six experiments described above is shown in table 4.4.

Exp.  Feature set      Classifier     Feature selection  Previous  New
1     4                Decision tree  Relieff            0.76      0.78
2     5                Decision tree  Relieff            0.75      0.74
3     Extended set 4   Decision tree  CFS                0.76      0.76
4     All features     AdaBoost       CFS                ∼         0.81
4     All features     AdaBoost       Relieff            ∼         0.80
4     All features     Decision tree  CFS                ∼         0.78
4     All features     Decision tree  Relieff            ∼         0.78
5     All features     AdaBoost       CFS                0.81      0.84
6     All features     Decision tree  Relieff            0.78      0.78

Table 4.4: Table showing the results of the experiments specified in section 4.2. The best result (0.84) is achieved in experiment five. Experiments five and six are based on the results from experiment four using AdaBoost with CFS and decision tree with Relieff, respectively.

4.2.3 Analysis

Experiments one and two in table 4.4 indicate that Relieff can outperform CFS as a feature selection method. Both methods will therefore be tested when performing further experiments.

Experiment three, which added new features to feature set four, did not see an improvement. Regardless, the new features will be permanently added to feature set four, as they might prove useful in further experiments.

The drawback of adding features is limited, as both feature selection methods evidently filter away any redundant features.

Experiment four, which appended all features created, gave good results.

Amongst the combinations of classifier and feature selection method, AdaBoost with CFS gave the best result yet. This combination will be tested with a finer grid search in experiment five. Regarding the decision tree, the two feature selection methods gave the same results.

Since CFS is already chosen in experiment five, the combination of decision tree and Relieff will be tested with a finer grid search in experiment six.

Experiments five and six performed a finer grid search over the two most promising models in order to further increase the results. The finer grid search further increased the results of AdaBoost, yielding the best result achieved so far. It did not aid the decision tree classifier, and this search will hence be discarded in further experiments.

In summary, the best results are achieved with the following approach (a code sketch follows the list):

• Add all features created (summarized in the paragraph below).

• Invoke either CFS or Relieff feature selection to extract the best subset of features.

• As a classifier, use decision tree with the initial grid search seen in table 3.4, or AdaBoost with the grid search seen in table 4.2.
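A minimal sketch of this approach, under stated assumptions, is given below. featureSets is a hypothetical cell array holding the feature sets summarized below, Relieff stands in for the feature selection step (CFS is not part of MATLAB's toolboxes), and k and the hyperparameter struct best are assumed to come from the searches sketched earlier.

```matlab
% Sketch of the full pipeline. featureSets, k and 'best' are assumed
% names (see the lead-in above and the earlier sketches).
X = [featureSets{:}];                 % append all features column-wise

[ranking, ~] = relieff(X, y, 5);      % rank features, five neighbors
Xsel = X(:, ranking(1:k));            % keep the k best-ranked features

mdl = fitcensemble(Xsel, y, 'Method', 'AdaBoostM1', ...
                   'NumLearningCycles', best.NLearn, ...
                   'LearnRate', best.LearnRate);
```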

The number of neighbors used by Relieff was fixed to five, due to the computational complexity involved in finding the optimal number of features to include. Testing with several numbers of neighbors might have increased the performance of the model, but would also have increased the computational complexity. However, Robnik et al. [56] state that Relieff is robust with respect to the number of nearest neighbors, as long as it remains relatively small.

Additionally, the search for the optimal number of features for Relieff was done with respect to the average accuracy over all combinations of classes.

Ideally, the search would have been done for each combination of two classes. This is, however, done for the most relevant combination of dysfunctions, and is elaborated in section 4.4.

Because adding a substantial number of features and invoking a feature selection method gave good results, additional experiments were conducted which added even more features. They were:

• Adding the displacement/position signal to feature set four, which uses the acceleration and velocity (all in the circumferential direction).

• Adding the raw displacement data to all features.

• Adding all features in all three directions.

The experiments were conducted using both CFS and Relieff as the feature selection method and a decision tree as the classifier. None of the experiments saw an increase in the results; hence, the position data and the features from the longitudinal and radial directions are discarded.

Summary of all features Since many of the best-performing models used all features created, a summary of all features is given.

1. Feature set one - Features from the midsystolic phase:

1.1. Peak acceleration within the first 150 ms from the R-peak
1.2. Peak velocity within the first 150 ms from the R-peak
1.3. Mean acceleration of the first 150 ms from the R-peak
1.4. Mean velocity of the first 150 ms from the R-peak
1.5. Difference in displacement from the R-peak to 150 ms after the R-peak
1.6. Minimum acceleration within the first 150 ms from the R-peak

2. Feature set two - Features from the IVR phase:

2.1. Peak acceleration in the IVR phase
2.2. Peak velocity in the IVR phase
2.3. Mean acceleration in the IVR phase
2.4. Mean velocity in the IVR phase
2.5. Difference in displacement from the R-peak to the middle of the IVR phase
2.6. Minimum acceleration in the IVR phase

3. Feature set three - Various features:

3.1. The three most prominent feature components of the acceleration in the circumferential direction

3.2. The number of samples used to represent a cycle.

3.3. The accumulated magnitude of the frequency components in the circumferential direction

4. Feature set four - The raw data, interpolated and/or decimated to 300 samples (a resampling sketch follows this list):

4.1. All acceleration samples from the raw data in the circumferential direction

4.2. All velocity samples from the raw data in the circumferential direction

5. Features added in further experiment:

5.1. The correlation coefficients between the longitudinal acceleration and circumferential acceleration

5.2. The correlation coefficients between the longitudinal acceleration and radial acceleration

5.3. The correlation coefficients between the circumferential acceleration and radial acceleration

5.4. Variance of circumferential acceleration in the midsystolic phase
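As referenced in feature set four above, a minimal sketch of resampling a cycle to a fixed length of 300 samples is given below; the variable name cycle and the use of linear interpolation are assumptions.

```matlab
% Sketch of interpolating/decimating one cycle to 300 samples.
% 'cycle' is an assumed vector of raw samples for a single heart cycle.
nSamples  = 300;
oldGrid   = linspace(0, 1, numel(cycle));
newGrid   = linspace(0, 1, nSamples);
resampled = interp1(oldGrid, cycle, newGrid);   % linear interpolation
```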