Features’ correlation to the target value and their variance are indicators of how useful they may be for classification. In addition, some features are likely redundant if they are strongly correlated with each other. These are some of the things one looks for in an Exploratory Data Analysis (EDA), as performed in this section.

Before the EDA, the output from the feature extraction in Section 3.3 was formatted using the implementation shown in Appendix B.4. The EDA itself was performed using the implementation shown in Appendix B.5. Any invariant features were removed before the EDA, reducing the number of features from 475 to 417.
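The removal of invariant features amounts to a zero-variance filter. The following is a minimal sketch of such a filter, assuming the extracted features are available as a pandas DataFrame; the file name and variable names are placeholders and not taken from the implementations in Appendix B.4 or B.5.

```python
import pandas as pd

# Placeholder input: a table of extracted features, one row per sample.
feature_df = pd.read_csv("features.csv")

# A feature whose value never changes across samples has zero variance
# and carries no information for classification.
variances = feature_df.var()
invariant_cols = variances[variances == 0.0].index
feature_df = feature_df.drop(columns=invariant_cols)

print(f"Removed {len(invariant_cols)} invariant features, "
      f"{feature_df.shape[1]} features remain.")
```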

A qualitative inspection of the features shows that most features have a mean close to zero, while a few have means far in excess of this. The same is true for the standard deviation: some features have much greater variability than the norm. Plots of the features’ means and standard deviations can be seen in Figures 3.5 and 3.6. This indicates that the features need to be standardised to work with some methods. Standardisation is a requirement for many techniques and learners, among them PCA, KNN, and SVM with a radial basis function kernel [17]. Exactly which specific features’ mean and standard deviation deviate from the rest is not of interest in this regard, since the existence of any in the set necessitates standardisation.
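Standardisation rescales each feature to zero mean and unit variance. A minimal sketch of this step, using scikit-learn's StandardScaler on a synthetic placeholder feature matrix (the array X below is illustrative only), could look as follows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix of shape (n_samples, n_features).
X = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(100, 417))

# Subtract each feature's mean and divide by its standard deviation,
# so every feature ends up with mean 0 and unit variance.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

print(X_std.mean(axis=0).round(6)[:5])  # approximately zero
print(X_std.std(axis=0).round(6)[:5])   # approximately one
```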

Figure 3.5: Calculated mean values across all samples for each feature. A few features have far larger means than the others. The plot is divided by red lines into three portions. The first portion from the left is the FFT-derived features, the second from the left is the DWT energy feature portion and the last is the collection of TSFRESH generated features.

The Pearson correlation of each feature to the target values, that is, how correlated the feature is to the number of ITSCs applied to the poles, showed largely uncorrelated features with a few exceptions. Many of the DWT energy features were correlated, and some TSFRESH-generated features were strongly correlated. The FFT-generated features were largely uncorrelated. An overview of the features’ correlation to the number of ITSCs is shown in Figure 3.7. Both negative and positive correlations are useful for classification, but correlation only shows linear relationships. There may be nonlinear relationships that are not revealed by this test.

Figure 3.6: Standard deviation across all samples for each feature. The plot is divided by red lines into three portions. The first portion from the left is the FFT-derived features, the second from the left is the DWT energy feature portion and the last is the collection of TSFRESH generated features.

The features with an absolute Pearson correlation above 0.3 are shown in Table 3.5. This result is in agreement with the conclusions reached in [5], where an increase in DWT energy was a strong indication of ITSCs.
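The per-feature Pearson correlation to the target can be computed directly with pandas. The sketch below is illustrative only; the file names, the column name n_itsc, and the 0.3 threshold mirror the description above but are not taken from the implementation in Appendix B.5.

```python
import pandas as pd

# Placeholder data: feature_df holds the features, y the number of ITSCs per sample.
feature_df = pd.read_csv("features.csv")
y = pd.read_csv("targets.csv")["n_itsc"]

# Pearson correlation of every feature to the target, sorted by absolute strength.
target_corr = feature_df.corrwith(y, method="pearson")
strong = target_corr[target_corr.abs() > 0.3].sort_values(
    key=lambda s: s.abs(), ascending=False)
print(strong)
```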

Figure 3.7: An overview of feature correlations. The plot is divided by red lines into three portions. The first portion from the left is the FFT-derived features, the second from the left is the DWT energy feature portion and the last is the collection of TSFRESH generated features.

The correlations above indicate that there are several relevant features. However, many of these may be redundant if the relevant features are strongly correlated amongst themselves. An effective visual method to investigate this is to construct a correlation matrix, in which each feature’s correlation to every other feature is shown. Figure 3.8 shows the correlation matrix of the features. The centre diagonal is each feature’s correlation with itself, which is necessarily 1. On either side of the diagonal are mirror images of the inter-feature correlations.

Table 3.5: The features most correlated to number of ITSCs.

Feature                                        Correlation
TWE, decomposition level 9                     0.890734
TWE, decomposition level 8                     0.886363
IWE, decomposition level 10                    0.861306
RWE, decomposition level 10                    0.856935
RWE, decomposition level 11                    0.826443
HWE, decomposition level 10                    0.811841
TWE, decomposition level 10                    0.810356
IWE, decomposition level 11                    0.786659
TWE, decomposition level 11                    0.556887
TSFRESH, longest strike above mean             0.549258
TSFRESH, approximate entropy (m=2, r=0.7)      0.412482
TWE, decomposition level 12                    0.407188
TSFRESH, longest strike below mean             0.395404
HWE, decomposition level 11                    0.359168
TSFRESH, approximate entropy (m=2, r=0.1)      0.300201

Here we see that both the FFT and DWT features are strongly correlated amongst themselves, while the TSFRESH features exhibit this to a lesser degree. With such strong inter-feature correlation, investigating feature selection and reduction methods is merited.
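A correlation matrix such as the one in Figure 3.8 can be produced with pandas and matplotlib. The following is a minimal, illustrative sketch under the assumption that the features are held in a pandas DataFrame; it is not the plotting code used for the figure.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder DataFrame of features, one row per sample.
feature_df = pd.read_csv("features.csv")

# Pairwise Pearson correlation between all features; the diagonal is 1 by construction.
corr_matrix = feature_df.corr(method="pearson").abs()

# Visualise as an image: darker cells indicate stronger inter-feature correlation.
plt.imshow(corr_matrix, cmap="Greys", vmin=0.0, vmax=1.0)
plt.colorbar(label="|Pearson correlation|")
plt.xlabel("Features")
plt.ylabel("Features")
plt.show()
```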

Since there is a high degree of correlation between the features, a Principal Component Analysis (PCA) gives an indication of how variable the samples are. The fewer principal components necessary to capture a given percentage of the original variance, the more the data set contains features of low variance or high inter-feature correlation. A PCA of the data set was made to span 95 % of the variance within the data set, resulting in 31 principal components. Of these 31 principal components, 85 % of the variance was contained within the first 10. This indicates that many of the features are uninformative or strongly correlated with each other, coinciding with the results from the correlation analysis.
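A PCA retaining 95 % of the variance can be expressed compactly with scikit-learn, which keeps just enough components to reach the requested fraction when n_components is given as a float between 0 and 1. The sketch below uses a synthetic placeholder matrix in place of the standardised feature set.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the standardised feature matrix.
X_std = np.random.default_rng(1).normal(size=(200, 417))

# A float n_components keeps the smallest number of components whose
# cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.95)
scores = pca.fit_transform(X_std)

print("Components needed for 95 % variance:", pca.n_components_)
print("Variance captured by the first 10 components:",
      pca.explained_variance_ratio_[:10].sum())
```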

High-dimensional data sets are unsuited to plotting directly. To visualise the data set and gain some intuition about its distribution, the highest-variance principal components from the PCA can be plotted. A plot of the data set along the first two principal components is shown in Figure 3.9. The plot shows 16 distinct clusters, or 24 if healthy and faulty clusters are counted separately, and the faulty and healthy sample distributions overlap in most of them.

There does not appear to be any clear decision boundary along which faulty samples can be discriminated from healthy ones. A consequence of this may be that SVM and KNN classifiers perform poorly on the data set.
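A scatter plot along the first two principal components, as in Figure 3.9, can be produced as sketched below. The arrays scores and is_faulty are synthetic placeholders for the PCA-transformed samples and their health labels; the sketch is illustrative and not the code behind the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder inputs: PCA-transformed samples and a boolean label per sample.
scores = np.random.default_rng(2).normal(size=(200, 31))
is_faulty = np.random.default_rng(3).integers(0, 2, size=200).astype(bool)

# Scatter the samples along the two highest-variance principal components.
plt.scatter(scores[~is_faulty, 0], scores[~is_faulty, 1], c="blue", s=10, label="Healthy")
plt.scatter(scores[is_faulty, 0], scores[is_faulty, 1], c="red", s=10, label="Faulty")
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.legend()
plt.show()
```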

Figure 3.8: The feature correlation matrix. Darker colour indicates a higher correlation between the features. The red lines separate FFT features (left/top), wavelet energy features (middle), and TSFRESH features (right/bottom).

Figure 3.9: Samples plotted along the first and second principal component. Each point represents one sample, with red points representing faulty machine condition samples and blue points representing healthy machine condition samples.