
4 Models and evaluation

5.1 Poisson regression

5.1.1 Selection of coefficients

The estimated regression coefficients for Models 1, 2 and 3 are found in Tables 5.1, 5.2 and 5.3, respectively. The gray entries indicate coefficients that were not significant at the 0.05 level. For more details, consult Tables A.1, A.2 and A.3 in Appendix A, which display the coefficients and corresponding standard errors.
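As an illustration of how a coefficient and its standard error translate into a significance decision at the 0.05 level, the following is a minimal sketch of a two-sided Wald test. It assumes that significance was judged with the usual normal approximation for GLM coefficients, which the text does not state explicitly. The function names and the numerical inputs are hypothetical and are not taken from the appendix tables.

```python
import math


def wald_p_value(coef: float, std_err: float) -> float:
    """Two-sided p-value for H0: coefficient = 0 (normal approximation)."""
    z = coef / std_err
    # Standard normal CDF written in terms of the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)


def is_significant(coef: float, std_err: float, alpha: float = 0.05) -> bool:
    """True if the coefficient differs from zero at level alpha."""
    return wald_p_value(coef, std_err) < alpha


# Hypothetical (coefficient, standard error) pairs, for illustration only.
examples = {"clear effect": (0.13, 0.03), "weak effect": (0.03, 0.04)}
for label, (b, se) in examples.items():
    print(label, round(wald_p_value(b, se), 4), is_significant(b, se))
```

A coefficient that would be shaded gray in the tables corresponds to `is_significant` returning `False`.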

In the basic model (Model 1) both the explanatory variables were significant for most of the stretches. The snow depth difference was only found to be nonsignificant for the single stretch A4, while the snow depth was non-significant for the three stretches A1, A2 and A10. Both weather covariates therefore seemed to be informative for the modelling of avalanches.

In Model 2, all the available explanatory variables were tested simultaneously, that is, both the weather covariates and the four spatio-temporal explanatory variables. Note that no normalization of the variables was used.

From Table 5.2 we see that the snow depth difference is the explanatory variable that is significant for the most stretches. Moreover, the variables $p^{5}_{jt}$ and $\varsigma^{5}_{ijt}$ were each found significant for seven of the stretches. These variables were the number of avalanches during the 5 previous days for the area and for the single stretch, respectively. The variables $p^{1}_{jt}$ and $\varsigma^{1}_{ijt}$, on the other hand, were significant for two and six stretches, respectively. These represented the number of avalanches on the previous day for the area and for the single stretch. It therefore seemed that the day-to-day dependence might be less informative than the smoother 5-day interval. It should also be noted that the inclusion of the additional explanatory variables caused the weather coefficients to be smaller and less significant.
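The previous-day and five-day avalanche counts described above can be sketched as lagged sums over a daily count series. This is only an illustration under stated assumptions: the window is taken to cover days t-5 to t-1 (excluding day t itself), days before the start of the series are treated as having zero avalanches, and the function name and the example series are hypothetical.

```python
def lagged_counts(daily_counts, window):
    """For each day t, sum the counts on days t-window, ..., t-1.

    Days before the start of the series contribute zero.
    """
    result = []
    for t in range(len(daily_counts)):
        start = max(0, t - window)
        result.append(sum(daily_counts[start:t]))
    return result


# Hypothetical daily avalanche counts for one stretch (or one area).
counts = [0, 2, 0, 0, 1, 0, 3, 0]

prev_day = lagged_counts(counts, 1)   # avalanches on the previous day
prev_five = lagged_counts(counts, 5)  # avalanches during the 5 previous days

print(prev_day)
print(prev_five)
```

Applying the same function to counts summed over all stretches in an area gives the area-level covariates, while applying it per stretch gives the stretch-level ones.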

Based on the results for Model 2, we excluded the explanatory variables for the day-to-day dependence, $p^{1}_{jt}$ and $\varsigma^{1}_{ijt}$, from the updated spatio-temporal Poisson regression model. For Model 3, all coefficients were significant for at least seven of the stretches, except for the snow depth, which was only significant for three of the stretches. This is displayed in Table 5.3.

Table 5.1: Coefficients for Model 1, which acted as the basic model depending only on weather covariates, fitted to the full data set. The coefficients that were not significant at a 0.05 level are marked gray.

Stretch             A1     A2     A3     A4     A5     A6     A7     A8     A9    A10
1 (intercept)    -4.14  -5.44  -4.23  -5.39  -5.22  -4.69  -5.80  -3.42  -5.27  -5.00
si               -0.25   0.36   1.97   1.23   2.40   0.60   0.77   0.50   0.48   0.10
∆si               0.08   0.15   0.07   0.03   0.13   0.09   0.05   0.11   0.12   0.10

Table 5.2: Coefficients for Model 2, where all the explanatory variables were included to investigate which were more significant, fitted to the full data set. The coefficients that were not significant at a 0.05 level are marked gray.


5.1.2 Model fit and predictive performance

The AIC values for the three models are listed in Table 5.4. Both Model 2 and Model 3 have a smaller value than Model 1, suggesting that the additional explanatory variables improve the model fit. The AIC scores for Model 2 and Model 3 were relatively similar, with Model 2 scoring slightly lower. However, since the extra explanatory variables in Model 2 were nonsignificant for many of the stretches, it is reasonable to assume that Model 3 fitted the data better.

Table 5.3: Coefficients for Model 3, which was the updated Poisson regression model, fitted to the full data set. The coefficients that were not significant at a 0.05 level are marked gray.


Table 5.4: AIC for the three models.

Model  AIC
1      6888.9
2      6264.3
3      6297.9
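The comparison in Table 5.4 can be made explicit by computing each model's AIC difference from the lowest value. The sketch below uses the standard definition AIC = 2k - 2 log L, where k is the number of estimated parameters and L the maximized likelihood. The helper name `aic` and the dictionary layout are illustrative choices; only the three AIC values are taken from the table.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# AIC values as reported in Table 5.4.
aic_values = {"Model 1": 6888.9, "Model 2": 6264.3, "Model 3": 6297.9}

best = min(aic_values, key=aic_values.get)
delta = {name: round(value - aic_values[best], 1)
         for name, value in aic_values.items()}

print(best)
print(delta)
```

The gap between Model 1 and the two extended models is far larger than the gap between Model 2 and Model 3, which matches the description of the latter two as relatively similar.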

Table 5.5: Skill scores from cross validation for the three different GLM models.

Model  RPS    MSE      HR    POD   UAA   PSS   Bias
1      0.02   4.15e-2  0.88  0.35  0.62  0.24  30.4
2      15.44  3.13e6   0.88  0.39  0.64  0.28  30.4
3      5.63   4.44e5   0.88  0.39  0.64  0.28  30.4

The results from the cross validation are displayed in Table 5.5. The best result is marked in bold for each skill score. As can be seen, Model 1 had the best score in terms of RPS and MSE. In fact, the values of these two scores were extremely large, and therefore poor, for Models 2 and 3 in comparison. This indicated that there was a problem with the cross validation.

The cause of this will be explained shortly. Moreover, both Model 2 and Model 3 had slightly higher values for the POD, UAA and PSS scores, which suggested that these models detected more avalanches in general. The large MSE therefore indicated that too many avalanches were being forecast on some of the days. However, this was not an operational problem, as the general interest was in determining whether avalanche activity would occur or not.
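The binary skill scores can be sketched from a 2x2 contingency table of forecast versus observed avalanche days. The definitions below are the standard ones and are an assumption here, since this section does not restate them: HR as the proportion of correct forecasts, POD as hits over observed events, PSS as POD minus the probability of false detection, and UAA as the mean of POD and the correct-negative rate. The tabulated POD, UAA and PSS values are numerically consistent with these PSS and UAA definitions. The counts in the example are invented, and the Bias score is left out because its definition cannot be inferred from the tables.

```python
def binary_scores(hits, misses, false_alarms, correct_negatives):
    """Skill scores for yes/no forecasts from a 2x2 contingency table."""
    n = hits + misses + false_alarms + correct_negatives
    pod = hits / (hits + misses)  # probability of detection
    pofd = false_alarms / (false_alarms + correct_negatives)  # false detection
    return {
        "HR": (hits + correct_negatives) / n,  # proportion correct (assumed)
        "POD": pod,
        "UAA": 0.5 * (pod + (1.0 - pofd)),     # unweighted average accuracy
        "PSS": pod - pofd,                     # Peirce skill score
    }


# Invented counts, for illustration only.
scores = binary_scores(hits=35, misses=65, false_alarms=99,
                       correct_negatives=801)
for name, value in scores.items():
    print(name, round(value, 3))
```

Because avalanche days are rare, HR is dominated by the correct negatives, which is why POD, UAA and PSS are more informative about how well events are detected.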


Figure 5.1: Predictions $\lambda_{ijt}$ based on Model 2 for the year 1994. Red vertical lines indicate avalanches.

After closer examination, it was found that the poor results for Models 2 and 3 were caused by the two years 1994 and 2011. Both years included several avalanches that were observed either on the same day or on consecutive days. This was discovered when inspecting the RPS and MSE scores for the individual iterations of the cross validation algorithm; see Tables B.1 and B.2 in Appendix B for more details. When either of these years was used as the test set, and hence excluded from the training set, the cross validation algorithm produced a large coefficient for the spatio-temporal explanatory variables. This caused the predictions to spike for the relevant consecutive days in the test set. The problem is illustrated in Figure 5.1, where the predictions based on Model 2 for 1994 are plotted.

For one of the days, close to 2000 avalanches were predicted, even though only eight actually occurred. Still, eight avalanches on the same day was unusual, and since the training data did not include enough similar incidents, the estimated model coefficients for the temporal explanatory variables were too high.
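The situation described above, where a held-out year contains counts far outside the range seen in training, can be sketched with a year-wise split. The sketch assumes that each cross validation iteration holds out one full year, which is suggested by the text but not stated in this section. The record layout and the toy data are hypothetical.

```python
def leave_one_year_out(records):
    """Yield (held_out_year, train, test), holding out one year at a time."""
    years = sorted({r["year"] for r in records})
    for held_out in years:
        train = [r for r in records if r["year"] != held_out]
        test = [r for r in records if r["year"] == held_out]
        yield held_out, train, test


# Toy data: the year 1994 contains an unusually large daily count.
records = [
    {"year": 1993, "count": 0},
    {"year": 1994, "count": 8},
    {"year": 1994, "count": 3},
    {"year": 1995, "count": 1},
]

folds = list(leave_one_year_out(records))
for year, train, test in folds:
    max_train = max(r["count"] for r in train)
    max_test = max(r["count"] for r in test)
    print(year, "max in train:", max_train, "max in test:", max_test)
```

In the fold that holds out 1994, the largest count in the training data is much smaller than the largest count in the test data, so the fitted model has to extrapolate.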

To reduce the impact of unusually many avalanches, we tested the model using the square root of the spatio-temporal explanatory variables instead, i.e. $\sqrt{p^{1}_{jt}}$, $\sqrt{p^{5}_{jt}}$, $\sqrt{\varsigma^{1}_{ijt}}$ and $\sqrt{\varsigma^{5}_{ijt}}$. This lowered the maximal values of these variables, so that the overall range of possible values would be smaller, hence decreasing the chance of extreme predictions. The results from the cross validation with the modified explanatory variables are displayed in Table 5.6. The updated models are denoted 2.1 and 3.1, respectively. The modification improved the scores, and all three models were now found to have identical RPS. The MSE was still lowest for Model 1, and the HR had decreased slightly for both of the updated models. However, for the remaining scores for binary outcomes, the updated models performed clearly better than Model 1. Furthermore, Model 3.1 appeared to be slightly better than Model 2.1, as was expected. It should be noted that this was not a problem when evaluating the LGMs in Section 5.2.2, since the problematic years 1994 and 2011 were included in the training set $D_{\text{train}}$.
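The effect of the square-root transformation can be illustrated numerically. The sketch assumes the usual log link for Poisson regression, so that the predicted mean is the exponential of the linear predictor and grows exponentially in a count covariate. The intercept and coefficient are hypothetical, and in practice the coefficient would be re-estimated after transforming the covariate, so the numbers only show the qualitative behaviour.

```python
import math


def poisson_mean(intercept: float, coef: float, x: float) -> float:
    """Predicted Poisson mean under a log link: exp(intercept + coef * x)."""
    return math.exp(intercept + coef * x)


# Hypothetical parameter values, chosen only to show the effect.
intercept, coef = -5.0, 1.5

predictions = {}
for x in (0, 1, 4, 8):  # number of avalanches in the lag window
    raw = poisson_mean(intercept, coef, x)
    damped = poisson_mean(intercept, coef, math.sqrt(x))
    predictions[x] = (raw, damped)
    print(x, round(raw, 3), round(damped, 3))
```

With the untransformed count, eight avalanches in the lag window push the predicted mean above one thousand, whereas the square-root version stays below one for the same parameter values.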

Table 5.6: Skill scores from cross validation for the updated GLMs: 2.1 and 3.1.

Model  RPS   MSE   HR    POD   UAA   PSS   Bias
1      0.02  0.04  0.88  0.35  0.62  0.24  30.4
2.1    0.02  0.09  0.87  0.45  0.66  0.32  30.4
3.1    0.02  0.05  0.86  0.47  0.67  0.34  30.4

To conclude, Model 3.1 was found to provide the most accurate predictions, in spite of the MSE and HR being better for Model 1. This suggested that the spatio-temporal explanatory variables could improve binary avalanche forecasts.

5.2 Latent Gaussian models