
Materials and methods

4.4 Feature extraction

Figure 4.3: Flowchart illustrating the procedure of authentication and identification of a subject when using one model for each enrolled subject.

Decomposition

Considering that the main brain rhythms lie in the range 0.5 - 30 Hz, as described in section 2.1.4 (see table 2.1), the signal is decomposed to level 5. The resulting sub-bands of the P300 and spatial datasets and their associated frequency ranges are presented in table 4.2. As the two datasets are sampled at different rates, the sub-bands cover different frequency ranges. The mother wavelet used is bior4.4. The decomposition level and mother wavelet were chosen based on experiments conducted in the pilot study [4].
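Assuming PyWavelets as the DWT implementation (the thesis does not name a library), the level-5 decomposition of one channel can be sketched as follows; the 200 Hz rate and 400-sample instance length are inferred from table 4.2 and the input sizes given later:

```python
# Sketch of the level-5 DWT decomposition with PyWavelets (an assumption:
# the thesis does not name the library used). One channel of EEG is
# decomposed with the bior4.4 mother wavelet into [A5, D5, D4, D3, D2, D1].
import numpy as np
import pywt

signal = np.random.randn(400)  # placeholder for one P300 channel (2 s at 200 Hz)

coeffs = pywt.wavedec(signal, wavelet='bior4.4', level=5)
names = ['A5', 'D5', 'D4', 'D3', 'D2', 'D1']  # wavedec returns coarsest first
for name, c in zip(names, coeffs):
    print(name, len(c))
```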

Sub-band   Frequency range [Hz]
           P300          Spatial
D1         50 - 100      64 - 128
D2         25 - 50       32 - 64
D3         12.5 - 25     16 - 32
D4         6 - 12.5      8 - 16
D5         3 - 6         4 - 8
A5         0 - 3         0 - 4

Table 4.2: Frequency ranges covered by each sub-band in DWT for the P300 and spatial dataset.
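The ranges in table 4.2 follow from repeated halving of the band below the Nyquist frequency; a minimal sketch, assuming sampling rates of 200 Hz (P300) and 256 Hz (spatial) inferred from the D1 edges. Note that the P300 entries below 12.5 Hz are rounded in the table (6.25 to 6, 3.125 to 3):

```python
# At decomposition level k, the detail band Dk covers [fs/2^(k+1), fs/2^k];
# the final approximation A5 covers [0, fs/2^6].
def subbands(fs, level=5):
    bands = {f"D{k}": (fs / 2 ** (k + 1), fs / 2 ** k) for k in range(1, level + 1)}
    bands[f"A{level}"] = (0.0, fs / 2 ** (level + 1))
    return bands

print(subbands(200))  # P300: D1 = 50-100 Hz, D3 = 12.5-25 Hz, ...
print(subbands(256))  # spatial: D1 = 64-128 Hz, A5 = 0-4 Hz
```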


Figure 4.4: Flowchart for DWT-based feature extraction, using level 5 for decomposition. This process is repeated for every channel.
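Table 4.3 lists IWE and TWE as the DWT features, giving 6 sub-bands x 2 features x 56 channels = 672 entries for the P300 dataset. The definitions below are common ones from the EEG literature and are an assumption here; the exact formulas used are those of the pilot study [4]:

```python
# Hedged sketch of the two wavelet-energy features per sub-band (assumed
# definitions, not necessarily those of [4]):
#   IWE: log of the total energy of the sub-band coefficients.
#   TWE: log of the absolute sum of the Teager energy operator
#        psi[n] = c[n]^2 - c[n-1]*c[n+1] over the coefficients.
import numpy as np

def iwe(c):
    return np.log10(np.sum(c ** 2))

def twe(c):
    psi = c[1:-1] ** 2 - c[:-2] * c[2:]
    return np.log10(np.abs(np.sum(psi)))

# Stand-in coefficient arrays for A5, D5, ..., D1 of one channel.
coeffs = [np.random.randn(n) for n in (20, 20, 35, 65, 125, 245)]
features = [f(c) for c in coeffs for f in (iwe, twe)]
print(len(features))  # 12 features per channel; 12 * 56 = 672 for P300
```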

Feature extraction using principal component analysis

For this method, samples from all channels are gathered in a matrix representing an individual instance. PCA is then applied to this matrix, and the extracted PCs form the feature vector.

The number of PCs used in the feature vector is based on the cumulative variance, see the plots in fig. 4.5. The plots show the fraction of variance explained by each of the PCs. A threshold of 95% is marked in the plots to show how many PCs should be included to preserve 95% of the total variance in the data. As PCA is performed on each instance individually, the number of components needed varies from instance to instance. All instances must have equal-sized feature vectors, meaning an equal number of PCs must be extracted from each. Considering efficiency and performance, the smallest possible number of PCs should be chosen. From the plots, one can see that selecting two components in the P300 dataset retains around 95% of the variance for most instances. For the spatial dataset, one PC is enough. However, two PCs have been used for both datasets in the experiments for simplicity. Using 56 channels and two principal components, the length of the feature vector is 56 × 2 = 112.

Figure 4.5: Cumulative explained variance for all instances in both datasets.
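A sketch of the per-instance PCA step, assuming scikit-learn (the library is not named) and assuming the two components are scored per channel so that the stated 56 x 2 = 112 entries result:

```python
# Per-instance PCA feature extraction (library and exact projection layout
# are assumptions). One instance is a channels x samples matrix; the first
# two PC scores of each channel are flattened into the feature vector.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
instance = rng.standard_normal((56, 400))   # P300: 56 channels x 400 samples

pca = PCA(n_components=2)
scores = pca.fit_transform(instance)        # shape (56, 2): 2 PC scores per channel
feature_vector = scores.ravel()
print(feature_vector.shape)                 # (112,) = 56 x 2
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```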

Feature extraction using Empirical Mode Decomposition

When using EMD for feature extraction, the signal from each channel is decomposed into IMFs using the EMD algorithm described in section 2.2.3. The number of IMFs that can be extracted may vary between channels and instances. As all feature vectors must be of the same size, a fixed set of IMFs is chosen; for this thesis, the first two are selected. From each of these IMFs, four energy and fractal features are calculated. This means that for every channel, a set of 2 × 4 = 8 features is extracted, resulting in a vector size of 448 for the P300 dataset and 512 for the spatial dataset when using all channels.
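Per table 4.3, the four features per IMF are IWE, TWE, PFD and HFD. A hedged sketch of the two fractal ones, using the standard Petrosian and Higuchi definitions (an assumption; the thesis may use variants), with a stand-in array in place of a real IMF:

```python
# Fractal features per IMF; the IMFs themselves would come from an EMD
# library (not shown). Standard definitions are assumed here.
import numpy as np

def pfd(x):
    """Petrosian fractal dimension, based on sign changes of the derivative."""
    diff = np.diff(x)
    n_delta = np.sum(diff[1:] * diff[:-1] < 0)   # number of sign changes
    n = len(x)
    return np.log10(n) / (np.log10(n) + np.log10(n / (n + 0.4 * n_delta)))

def hfd(x, k_max=8):
    """Higuchi fractal dimension via mean curve lengths at scales 1..k_max."""
    n = len(x)
    log_lengths = []
    for k in range(1, k_max + 1):
        lk = []
        for m in range(k):
            idx = np.arange(m, n, k)
            seg = np.abs(np.diff(x[idx])).sum()
            lk.append(seg * (n - 1) / ((len(idx) - 1) * k) / k)
        log_lengths.append(np.log(np.mean(lk)))
    slope, _ = np.polyfit(np.log(1.0 / np.arange(1, k_max + 1)), log_lengths, 1)
    return slope

imf = np.sin(np.linspace(0, 20 * np.pi, 400))   # stand-in for one IMF
print(pfd(imf), hfd(imf))                       # both near 1 for a smooth curve
```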

Table 4.3: Summary of the feature extraction methods. The vector sizes are for using all channels.

Method   Features               Vector size
                                P300    Spatial
DWT      IWE, TWE               672     768
PCA      PC                     112     128
EMD      IWE, TWE, PFD, HFD     448     512

4.5 Classification

The authentication of a subject is solved as a one-class classification problem. Two different ML models are used for classification: the first uses the features extracted in the previous section, while the other uses DL and learns features directly from the raw data.


Classification using OC SVM

Once the feature vector for each instance is extracted, the process is similar for the three feature extraction methods. The feature vector is fed to an OC SVM, which trains on the unlabeled data. When using the common model layout, the OC SVM trains on data from all enrolled subjects; when using the subject-specific model, only data from one subject is used. The hyper-parameters nu and gamma are preset and selected from an optimization problem described in detail in section 4.6. An overview of the process is illustrated in fig. 4.6.

Figure 4.6: Authentication model using OC SVM and common model layout.
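A minimal sketch of this stage, assuming scikit-learn's OneClassSVM (the implementation is not named in the thesis) and placeholder values for nu and gamma:

```python
# OC SVM authentication stage (library choice is an assumption). nu and
# gamma are the preset hyper-parameters from section 4.6; the values below
# are placeholders, not the optimized ones.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
train = rng.standard_normal((100, 112))   # feature vectors of enrolled subject(s)
probe = rng.standard_normal((5, 112))     # feature vectors of login attempts

model = OneClassSVM(kernel='rbf', nu=0.1, gamma=0.01).fit(train)
decisions = model.predict(probe)          # +1 = accepted as user, -1 = rejected
print(decisions)
```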

Classification using deep learning

The second approach for the authentication model uses a threshold value to determine whether a subject is a user or not. When enrolling a subject, an autoencoder is created. The autoencoder learns how to compress the data from the specific user and reconstruct it. A 50/50 split of the recorded EEG is used when enrolling a subject, in which 50% of the data is used to train the autoencoder. The remaining 50% is passed through the autoencoder to find a suitable threshold for the reconstruction error. In the login phase, the error between the original instances and the reconstructed ones is compared against this threshold value. The autoencoder and the threshold for the reconstruction error constitute the authentication model, see fig. 4.7.

In fig. 4.8, the reconstruction error for a set of instances is plotted. Data from the user class is in green and the remaining subjects in red. As the plot shows, the reconstruction error for the user class is much smaller than for the rest of the subjects, which indicates that a good threshold value can be found. To choose a threshold for the error, the Part Average Limit (PAL) [58] was used as a basis. To find the PAL, the reconstruction error for each of the instances used in the test set during login is


Figure 4.7: Authentication model using an autoencoder and threshold. The layout is for the subject-specific model.

calculated. The median and interquartile range (IQR) of this distribution are used to find a value that allows for variation. The IQR is a measure of statistical dispersion, equal to the difference between the 75th and 25th percentiles [59]. The PAL is calculated as in eq. (4.2).

PAL = median ± C × (IQR / 1.35)    (4.2)

The value of C is determined experimentally; the values used can be seen in chapter 5. As only positive values of the error are meaningful, only the upper PAL is used.
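Numerically, the upper limit of eq. (4.2) can be computed as below; C = 3 is a placeholder, not one of the experimental values from chapter 5, and the IQR/1.35 factor approximates the standard deviation of a normal distribution:

```python
# Upper PAL from a set of reconstruction errors (eq. 4.2): errors here are
# made-up placeholder values for illustration.
import numpy as np

errors = np.array([0.10, 0.12, 0.11, 0.13, 0.09, 0.15, 0.14, 0.12])
C = 3.0  # placeholder; determined experimentally in chapter 5

q1, med, q3 = np.percentile(errors, [25, 50, 75])
pal_upper = med + C * (q3 - q1) / 1.35
print(pal_upper)  # a login attempt with a larger error is rejected
```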

Figure 4.8: Reconstruction error for autoencoder. The model is trained on subject 2. User data plotted in green and intruder data in red.


Architecture for the autoencoder

The architecture used in the encoders of this work is inspired by CNNs used in similar work, presented in section 3.6. The decoder is built with the inverse operations to reconstruct the compressed data. Two different architectures for the autoencoder have been investigated. Both models are fitted using the Adam optimizer, minimizing the MSE between input and output data.

Autoencoder with CNN

The first neural network gathers data from all channels into an EEG data matrix of size channels × instance_length. The signal has one feature for each sample: the EEG voltage. Hence, the size of the input vector is 56 × 400 × 1 for the P300 dataset and 64 × 512 × 1 for the spatial dataset. The encoder performs two 2D convolutions on the matrix with filters of different sizes to extract spatial features. Each convolution layer is followed by an activation layer, LeakyReLU. The encoder is illustrated in fig. 4.9.

The decoder performs the inverse operations. A detailed description of each layer in the autoencoder is given in table 4.4. The output of the autoencoder is a reconstructed matrix of the same shape as the input.

Figure 4.9: Layers in the encoder of the CNN autoencoder. The input size shown is for the P300 data.
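A PyTorch sketch of this architecture; the framework, filter counts and kernel sizes below are assumptions (the actual layer parameters are given in table 4.4). It shows how the decoder's transposed convolutions restore the input shape:

```python
# Hypothetical CNN autoencoder in the spirit of fig. 4.9: two strided 2D
# convolutions with LeakyReLU encode the EEG matrix, and mirrored
# transposed convolutions decode it. All layer sizes are assumptions.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(4, 8), stride=2, padding=(1, 3)),
            nn.LeakyReLU(),
            nn.Conv2d(16, 32, kernel_size=(4, 8), stride=2, padding=(1, 3)),
            nn.LeakyReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=(4, 8), stride=2, padding=(1, 3)),
            nn.LeakyReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=(4, 8), stride=2, padding=(1, 3)),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(1, 1, 56, 400)   # one P300 instance: 56 channels x 400 samples
recon = ConvAutoencoder()(x)
print(recon.shape)               # reconstructed matrix, same shape as the input
```

Training would then minimize the MSE between `x` and `recon` with the Adam optimizer, as stated above.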

Multi-channel autoencoder

The second autoencoder is built around the idea presented in [60], that the convolution of a single channel can produce information that is more valuable and free of noise
