Experiment 1 - Robustness and Stability of Long Short-Term Memory Recurrent Neural Networks


5.1.1 Experiment 1

The mean square test error over the whole test set for the predicted liquid level of tank 1 (h₁) is summarised in Table 5.1. Two test cases are considered: the test input is either left unperturbed or perturbed by adversarial examples. The two models LSTM ℓ₂ and PE LSTM Opt-1 perform quite similarly in all test cases for the hyperparameters given in Table B.1 in Appendix B. In the nominal case of no perturbation, the test error of PE LSTM Opt-1 is 8.40% lower than that of LSTM ℓ₂. The gap narrows to 5.46% when the input is perturbed by the FGSM method (see Section 2.3.2) with a perturbation magnitude of ε = 0.01. In contrast, LSTM ℓ₂ performs 1.15% better than its counterpart, PE LSTM Opt-1, in the presence of adversarial noise with a perturbation magnitude of ε = 0.1. Both models produce parameters that are relatively stable from one training session to the next, as indicated by the standard deviations in parentheses. The PE LSTM Opt-2 model performs considerably worse than the two other models and shows a large spread in mean square test error over the 10 training sessions. The results are analogous for the second perturbation method, PGD (also described in Section 2.3.2).
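The percentages quoted above can be checked against the rounded means in Table 5.1. The short Python sketch below assumes that each gap is expressed relative to the LSTM ℓ₂ test error, which is the convention that matches the quoted figures up to rounding of the tabulated values. The dictionary keys and the function name are illustrative and do not come from the thesis.

```python
# Mean test MSE values (in 1e-6) copied from Table 5.1.
table_5_1 = {
    "none": {"lstm_l2": 11.9, "pe_opt1": 10.9},
    "fgsm_0.01": {"lstm_l2": 18.3, "pe_opt1": 17.3},
    "fgsm_0.1": {"lstm_l2": 173.0, "pe_opt1": 175.0},
}


def gap_relative_to_lstm_l2(row):
    """Absolute MSE gap between the two models, in percent of the LSTM l2 MSE.

    Assumption: the reference value in the denominator is the LSTM l2 error.
    """
    return 100.0 * abs(row["lstm_l2"] - row["pe_opt1"]) / row["lstm_l2"]


for case, row in table_5_1.items():
    better = min(row, key=row.get)  # model with the lower mean test MSE
    gap = gap_relative_to_lstm_l2(row)
    print(f"{case}: gap = {gap:.2f}% (lower MSE: {better})")
```

Because the table entries are rounded to three significant figures, the last digit of a recomputed percentage may differ slightly from the value stated in the text.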

The mean square test error over the whole test set for the predicted liquid level of tank 2 (h₂) is summarised in Table 5.2. The two test cases are the same as for the prediction problem related to tank 1. The differences between the models are more pronounced for this estimation problem. The best-performing model is PE LSTM Opt-1, i.e. the neural networks trained with the training procedure described in Section 3.1.1. PE LSTM Opt-1 has a 40% lower mean square test error than the second-best model, PE LSTM Opt-2, in the no-perturbation case, and 39% and 41% lower mean square test error than PE LSTM Opt-2 when the input is perturbed with FGSM at the two perturbation magnitudes ε = 0.01 and ε = 0.1, respectively. The results are analogous when the input is perturbed by the PGD method.
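To make the role of the perturbation magnitude ε concrete, the following is a minimal sketch of the two attack types on a toy linear predictor with a squared-error loss. It is not the thesis implementation (that is described in Section 2.3.2). The linear model stands in for the LSTM, the infinity-norm ball and the absence of a random start in PGD are assumptions, and all names and numbers are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict(w, b, x):
    """Toy linear predictor standing in for the trained network."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def grad_wrt_input(w, b, x, y):
    """Gradient of the squared error (predict(x) - y)**2 with respect to x."""
    residual = predict(w, b, x) - y
    return [2.0 * residual * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Single step of size eps along the sign of the input gradient."""
    g = grad_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(w, b, x, y, eps, step, n_steps):
    """Repeated signed-gradient steps, each followed by a projection
    back onto the infinity-norm ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = grad_wrt_input(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical weights, input and target, chosen only for illustration.
w, b, x, y = [0.5, -0.3], 0.1, [0.4, 0.6], 0.2
for eps in (0.01, 0.1):
    x_fgsm = fgsm(w, b, x, y, eps)
    x_pgd = pgd(w, b, x, y, eps, step=eps / 4, n_steps=10)
    print(eps, (predict(w, b, x_fgsm) - y) ** 2, (predict(w, b, x_pgd) - y) ** 2)
```

For a linear model the two attacks end at the same corner of the ε-ball. For a nonlinear network they generally differ, although Tables 5.1 and 5.2 report very similar errors for both.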

Table 5.1: (Experiment 1) Test error (in 1×10⁻⁶) of the prediction of the liquid level of tank 1 (h₁) of the cascaded tank system with min-max scaling in the range [0, 1]. The evaluation metric is the mean square error (MSE). The average test MSE (outside parentheses) and the standard deviation (inside parentheses) stem from 10 training sessions, producing 10 neural networks in total for each model type. The models are tested in two scenarios: in the first, the test data is not perturbed in any way; in the second, the test data is perturbed. Two methods are used to perturb the test data, FGSM and PGD, each with two different perturbation strengths (ε). The best results for the different situations are highlighted in bold.

| Model | No perturbation | FGSM, ε = 0.01 | FGSM, ε = 0.1 | PGD, ε = 0.01 | PGD, ε = 0.1 |
|---|---|---|---|---|---|
| LSTM ℓ₂ | 11.9 (0.853) | 18.3 (1.03) | **173 (2.30)** | 18.3 (1.03) | **166 (2.01)** |
| PE LSTM Opt-1 | **10.9 (0.732)** | **17.3 (0.934)** | 175 (2.11) | **17.2 (0.935)** | 167 (2.56) |
| PE LSTM Opt-2 | 18.3 (4.16) | 26.5 (5.04) | 201 (14.7) | 26.5 (5.04) | 195 (15.5) |
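Each table cell aggregates the test MSE of the 10 independently trained networks of one model type. A minimal sketch of that aggregation follows. It assumes the sample standard deviation (divisor n − 1), since the source does not say which estimator was used, and the per-run values are hypothetical placeholders rather than results from the thesis.

```python
import statistics


def summarise_runs(test_mse_per_run):
    """Return (mean, standard deviation) of the test MSE over training sessions.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    mean = statistics.mean(test_mse_per_run)
    std = statistics.stdev(test_mse_per_run)
    return mean, std


def format_cell(mean, std, sig=3):
    """Format a table cell as 'mean (std)' with `sig` significant figures."""
    return f"{mean:.{sig}g} ({std:.{sig}g})"


# Hypothetical test MSE values (in 1e-6) from 10 training sessions.
runs = [10.1, 11.4, 10.8, 12.0, 10.3, 11.1, 9.9, 11.7, 10.6, 11.2]
print(format_cell(*summarise_runs(runs)))
```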

Table 5.2: (Experiment 1) Test error (in 1×10⁻⁶) of the prediction of the liquid level of tank 2 (h₂) of the cascaded tank system with min-max scaling in the range [−1, 1]. The evaluation metric is the mean square error (MSE). The average test MSE (outside parentheses) and the standard deviation (inside parentheses) stem from 10 training sessions, producing 10 neural networks in total for each model type. The models are tested in two scenarios: in the first, the test data is not perturbed in any way; in the second, the test data is perturbed. Two methods are used to perturb the test data, FGSM and PGD, each with two different perturbation strengths (ε). The best results for the different situations are highlighted in bold.

| Model | No perturbation | FGSM, ε = 0.01 | FGSM, ε = 0.1 | PGD, ε = 0.01 | PGD, ε = 0.1 |
|---|---|---|---|---|---|
| LSTM ℓ₂ | 10.5 (2.70) | 14.7 (3.72) | 92.7 (21.6) | 14.7 (3.72) | 92.0 (21.4) |
| PE LSTM Opt-1 | **7.43 (1.79)** | **10.1 (2.21)** | **57.1 (7.64)** | **10.1 (2.21)** | **57.0 (7.68)** |
| PE LSTM Opt-2 | 8.90 (3.69) | 12.4 (4.61) | 81.5 (19.2) | 12.4 (4.61) | 80.5 (19.1)  |
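The two tables use different min-max scaling ranges, [0, 1] for h₁ and [−1, 1] for h₂. A minimal sketch of such a scaling is given below. The function name is illustrative, and taking the minimum and maximum from the sequence being scaled is a simplification. In practice they would normally be taken from the training data only.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly map `values` so that min(values) -> lo and max(values) -> hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant sequence")
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


# Hypothetical liquid levels, for illustration only.
levels = [2.0, 4.0, 6.0]
print(min_max_scale(levels))             # range [0, 1]
print(min_max_scale(levels, -1.0, 1.0))  # range [-1, 1]
```

Since the mapping is linear, an error measured in the [−1, 1] range is twice as large as the same error in the [0, 1] range, so the corresponding MSE is four times as large. MSE values obtained under different scaling ranges are therefore not directly comparable.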

Figure 5.1 and Figure 5.2 show the predicted liquid level of tank 1 versus the target (i.e. true) values, together with the corresponding training and validation losses. The model type used to produce the predictions is the best-performing model from Table 5.1; of the 10 networks obtained from the 10 training sessions, the one closest to the mean is shown. The two models LSTM ℓ₂ and PE LSTM Opt-1 have quite similar performance, and we therefore include two figures for the estimation of h₁. The predicted value is estimated using one feature in the input.

Figure 5.1: (Upper) Experiment 1 prediction capacity of the PE LSTM Opt-1 model given in Table 4.3. The target value is the liquid level of tank 1 (h₁). Orange dashed lines indicate the predicted values. Blue solid lines indicate the target values. (Lower) Corresponding training loss (blue line) and validation loss (orange line) for each epoch.

Figure 5.3 shows the predicted liquid level of tank 2 versus the target (i.e. true) values, together with the corresponding training and validation losses. The model type used to produce the predictions is the best-performing model from Table 5.2 (PE LSTM Opt-1); of the 10 networks obtained from the 10 training sessions, the one closest to the mean is shown. The predicted value is estimated using two features in the input.
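The rule for choosing which of the 10 trained networks to plot can be sketched as follows. The sketch assumes that "closest to the mean" refers to the test MSE of each network. The source does not state the quantity explicitly, and the values below are hypothetical.

```python
import statistics


def closest_to_mean(test_mse_per_run):
    """Return the index of the run whose test MSE is nearest to the mean.

    Assumption: closeness is measured on the test MSE of each network.
    """
    mean = statistics.mean(test_mse_per_run)
    return min(
        range(len(test_mse_per_run)),
        key=lambda i: abs(test_mse_per_run[i] - mean),
    )


# Hypothetical test MSE values (in 1e-6) for four training sessions.
runs = [7.0, 9.5, 7.4, 5.9]
print(closest_to_mean(runs))
```

Plotting a run near the mean, rather than the best run, gives a picture that is representative of the tabulated averages.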

As mentioned in Section 4.3, the worse results corresponding to the alternative scaling ranges are included in Appendix C, in Table C.1 and Table C.2 for the prediction of h₁ and h₂, respectively.

Figure 5.2: (Upper) Experiment 1 prediction capacity of the LSTM ℓ₂ model given in Table 4.3. The target value is the liquid level of tank 1 (h₁). Orange dashed lines indicate the predicted values. Blue solid lines indicate the target values. (Lower) Corresponding training loss (blue line) and validation loss (orange line) for each epoch.


Figure 5.3: (Upper) Experiment 1 prediction capacity of the PE LSTM Opt-1 model given in Table 4.3. The target value is the liquid level of tank 2 (h₂). Orange dashed lines indicate the predicted values. Blue solid lines indicate the target values. (Lower) Corresponding training loss (blue line) and validation loss (orange line) for each epoch.
