Comparing the performance of machine learning-based methods to traditional temperature threshold methods for precipitation phase partitioning across Norway

(1)

learning-based methods to traditional temperature threshold methods for precipitation phase partitioning across

Norway

Mina Tangen

Thesis submitted for the degree of Master in Computational Science

(Geosciences) 60 credits

Department of Physics

Faculty of mathematics and natural sciences

(2)

Autumn 2020

(3)

(4)

(5)

learning-based methods to traditional temperature threshold methods for precipitation phase partitioning across

Norway

Mina Tangen

(6)

(7)

Comparing the performance of machine learning-based methods to traditional temperature threshold methods for precipitation phase partitioning across Norway

http://www.duo.uio.no/

(8)

Partitioning precipitation into rain and snow is of pivotal importance in hydrological models. Errors in correctly determining precipitation phase at the surface can propagate into other parts of the model, and there is a call for an improvement of the simple, temperature threshold based methods most commonly used in hydrological models today. This study has investigated the potential and limitations of implementing machine learning to precipitation classification. The performances of machine learning based models have been compared to that of traditional temperature threshold based methods. A regional temperature threshold was applied to all stations. In addition, optimized temperature thresholds were found at each station. Machine learning (ML) models, developed using sets of different meteorological variables, were trained at selected stations. The performance of the models was evaluated at the station scale. The models were then tested at all other stations without being recalibrated to evaluate the regional performance of the ML models. The methods used to find the optimal thresholds at the stations gave different threshold values and performance. This demonstrates the variability of the threshold methods. It was also shown that the threshold methods are sensitive to the choice of calibration data. The ML models gave a higher performance at some stations, however, these models were shown to not be spatially robust. Although there is potential in machine learning for precipitation classification, fundamental challenges limit the implementation of these methods.

(9)

(10)

I am very thankful for everyone who has helped me throughout this project. First, I would like to thank my supervisors, Norbert Pirk, Sven Decker and Terje Koren Berntsen for all valuable help, comments and support. I also want to thank Mareille Astrid Wolff and Knut Ove Nyg˚ard for allowing me to join them to Haukeliseter Testfelt to get a closer look at disdrometers and other sensors there. Thanks to Stine Mikalsen at Statens Veg- vesen for providing the data used in this project. Also thanks to Nina Elisabeth Larsg˚ard at the Norwegian Meteorological Institute for helping me out with additional questions regarding the data set.

I then want to thank expert Ellen Birgitte for helping me out with GIS and sorting my life together, and for all laughs, discussions and wasted time in the study hall and restaurant. Also a huge thank to her Chinese Beetle. I also want to thank Hanne for after midnight scooter rides home and supportive words throughout the year. As this thesis concludes my time here at Blindern, I would also like to thank NKVO for all fun times and good memories during the past 5 (!) years.

Lastly, I wish to thank my family and friends for always supporting me, cheering me on and convincing me to keep going at times I wanted to give up (and get an excavator instead of a degree). A special thanks to my roommates who have seen my full range of moods the last two years, especially during lockdown. Thanks for fueling me with laughs and food during those strange times. Also a big thanks to Skeie and her wonderful arsenal of supportive GIFs and stickers. Finally, thanks to Marte for proof-reading and listening to my many rants on the way home from the study hall.

(11)

(12)

Abstract i

Acknowledgements iii

List of Figures ix

List of Tables xiii

Abbreviations and Acronyms xv

1 Introduction 1

2 Theoretical framework 5

2.1 Precipitation formation . . . 5

2.2 Precipitation phase near the ground surface . . . 5

2.2.1 Geographical effects on precipitation type . . . 7

2.2.2 Humidity variables . . . 7

2.3 Phase partitioning methods . . . 8

2.3.1 Single temperature threshold (STT) . . . 8

2.3.2 Double temperature threshold (DTT) . . . 9

2.3.3 Humidity based PPMs . . . 10

2.3.4 Other approaches . . . 11

2.4 Measured precipitation phase . . . 12

2.5 Machine learning . . . 14

2.5.1 Defining the learning problem . . . 14

2.5.2 Evaluation of model performance . . . 15

2.5.3 Cross-Validation . . . 15

(13)

2.5.4 Evaluation metrics . . . 16

3 Data and methods 21 3.1 Study area and data . . . 21

3.1.1 Selection of relevant data range . . . 25

3.1.2 Data preparation . . . 27

3.2 Software . . . 29

3.3 Single temperature threshold (STT) . . . 29

3.3.1 Air temperature threshold . . . 29

3.4 Machine learning . . . 30

3.4.1 Decision Trees and Random Forests . . . 31

3.4.2 Input feature selection . . . 32

3.4.3 Implementation of cross-validation and model tuning . . . 34

3.4.4 Selection of training stations . . . 35

3.4.5 Regional implementation . . . 36

4 Results 39 4.1 Single Temperature Threshold . . . 39

4.1.1 Distribution of threshold values . . . 39

4.1.2 Comparison of scoring and half point method to obtainT_c . . . . 40

4.1.3 Performance of threshold values . . . 41

4.2 RandomForestClassifier . . . 47

4.2.1 Local analysis . . . 47

4.2.2 Regional analysis . . . 53

5 Discussion 59 5.1 Potential sources of error and limitations . . . 61

5.1.1 Data set limitations . . . 61

5.1.2 Spatial representation . . . 62

5.2 Evaluation of the potential and limitations of machine learning methods for precipitation classification . . . 62

5.2.1 Fundamental challenges of ML in hydrological applications . . . 62

(14)

6 Conclusion 67

6.1 Suggestions for further work . . . 68

6.1.1 Combining a GIS-based physiographic landscape classification scheme and machine learning for precipitation classification . . . 68

6.1.2 Inclusion of mixed precipitation . . . 69

Bibliography 71 A Data clean-up and anomalies 77 A.1 Description of filtering processes . . . 77

A.2 Precipitation type temperature distributions . . . 78

A.2.1 Rain/drizzle distribution . . . 78

A.2.2 Snow/hail distribution . . . 79

B Methods descriptions 89 B.1 Calculation of wet-bulb temperature . . . 89

B.2 Half point method for obtainingT_c . . . 90

B.3 Calculating distance between stations . . . 90

C Training stations description 93

D STT confusion matrices (training) 101

E RFC confusion matrices (training) 105

F ROC curves (training) 111

G Regional performance 117

H Selected maps to show regional performance of RFC models 125

I Complete Figure 4.12 (b) 129

(15)

(16)

2.1 Vertical temperature profiles. . . 6

2.2 Functional principle of the OTT Parsivel disdrometer . . . 13

3.1 SVV weather station . . . 22

3.2 SVV station network . . . 23

3.3 Sample size distribution for calibration (a) and validation (b) sets . . . . 25

3.4 Snow fraction distribution for all stations . . . 26

3.5 Distribution of station elevations . . . 27

3.6 A very simple Decision Tree. This Decision Tree corresponds to the STT method with T_c=1.1^◦C. . . 31

3.7 A Decision Tree generated bysklearn. . . 31

3.8 Snow probability as a function of relative humidity and air temperature. . 34

3.9 Snow probability as a function of wind speed and air temperature. . . 35

3.10 Snow probability as a function of precipitation intensity and air temperature. 36 3.11 Location of selected training stations in Western Norway . . . 37

3.12 Location of selected training stations in Northern Norway . . . 38

4.1 Scoring method T_athresholds . . . 40

4.2 Half point method T_athresholds . . . 41

4.3 (a): T_a threshold ranges. (b): Difference in T_c using the SC and 50/50 method for T_athresholds. . . 42

4.4 (a): T_wb threshold ranges. (b): Difference in T_c using the SC and 50/50 method for T_wbthresholds. . . 42

4.5 Variation of T_aand T_wbthresholds with elevation. . . 43 4.6 F1 score for the different temperature thresholds optimized at each station. 43

(17)

4.7 Percentage of misclassified precipitation for the different temperature thresh-

olds optimized at each station. . . 44

4.8 Distribution of F1 scores for optimized thresholds. . . 44

4.9 Distribution of misclassified precipitation for optimized thresholds. . . 45

4.10 Differences in F1 score obtained from using the country value and optimized T_athresholds using the SC and 50/50 method.. . . 45

4.11 Differences in F1 score obtained from using the country value and optimized T_wbthresholds using the SC and 50/50 method. . . 45

4.12 T_areduction in error . . . 46

4.13 T_wbreduction in error . . . 46

4.14 F1 score for all models at all training stations. The vertical dashed lines corresponds to the STT F1 scores. . . 53

4.15 Percentage of misclassified precipitation for all models at all training stations.. . . 54

4.16 Performance of models for both metrics. . . 56

4.17 Misclassified snow and rain. . . 57

A.1 Rain/drizzle distributions (calibration) (1/3) . . . 80

A.4 Location of stations flagged as problematic . . . 83

A.5 Rain/drizzle distributions (validation) (1/2) . . . 84

A.6 Rain/drizzle distributions (validation) (2/2) . . . 85

A.7 Hail/snow distributions (calibration) (1/3) . . . 86

B.1 Graphical demonstration of the half point method . . . 90

C.1 SVV127 calibration and validation snow/rain ratio . . . 94

C.2 SVV127 calibration and validation snow/rain curves. . . 94

(18)

D.1 SVV127 confusion matrices (STT) . . . 101

E.1 SVV127 confusion matrices (RFC) . . . 105

F.1 SVV127 ROC curves . . . 111

(19)

G.1 SVV127 regional performance . . . 117

G.2 SVV1564 regional performance. . . 118

H.1 Regional performance of RFC pc6 trained at station SVV123. No specific regional pattern is found. . . 126

I.1 Figure 4.12 (b) with all stations included. . . 129

(20)

2.1 DTT snow/rain fractions . . . 10

2.2 Confusion matrix . . . 17

2.3 Confusion matrix for precipitation classification . . . 17

2.4 Evaluation metrics . . . 19

3.1 Variables provided at SVV stations. . . 24

3.2 Sensor types installed at SVV stations to measure precipitation type . . . 24

3.3 Measured precipitation types and their binary representation. . . 27

3.4 STT thresholds tested for all stations . . . 29

3.5 Feature combinations . . . 33

3.6 Hyperparameter grid . . . 35

4.1 Relative feature importance (1/2). . . 48

4.2 Relative feature importance (2/2). . . 49

4.3 Training results . . . 52

4.4 Mean reduction in error (%) for all models. . . 55

C.1 Training stations details . . . 93

(21)

(22)

ACCAccuracy

AUCArea Under the Curve CVCross Validation

DLDeep Learning

DTTDouble Temperature Threshold FNFalse Negative

FPFalse Positive

FPRFalse Positive Rate KCVK-fold Cross Validation

METNorwegian Meteorological Institute (Meteorologisk Institutt) MLMachine Learning

PPMPhase Partitioning Method PWSPresent Weather Sensor RFCRandom Forest Classifier ROCReceiver Operating Curve SCScoring Method

STTSingle Temperature Threshold

SVVNorwegian Public Roads Administration (Statens Vegvesen) SWESnow Water Equivalent

TNTrue Negative TPTrue Positive TPRTrue Positive Rate 50/50Half-point Method

IPrecipitation intensity

(23)

P_rainRain probability P_snowSnow probability RHRelative humidity T_aAir temperature

Ta,maxMaximum air temperature

T_a,minMinimum air temperature

TcCritical temperature threshold

T_c,rainAll-rain critical temperature threshold

Tc,snowAll-snow critical temperature threshold

T_d Dew point temperature T_wbWet-bulb temperature uWind velocity

(24)

The seemingly easy question of whether precipitation falling from the sky takes the form of rain or snow is of crucial importance to hydrological modelling in cold environments.

Rain falling over a catchment will directly join its rivers and streams, infiltrate soils or flow over the surface as overland flow. Snow, in contrast, accumulates on the ground and is stored in snowpacks. This storage unit plays an important part in the hydrological cycle in cold environments. It is a critical source of water for billions of people worldwide, and controls power production and flood management (Mankin et al., 2015; Feiccabrino and Grigg, 2017). A catchment will in other words respond very differently to rain than to snow, and correctly prediction of precipitation phase is therefore critical in hydrological models that try to model these responses. Errors in phase partitioning will propagate into other parts of the model (Harpold et al., 2017), such as the modeled snowpack energy balance. Because rainfall adds heat to the snowpack (Dingman, 2015), an earlier warming and ripening phase can be expected if the amount of rain is overestimated (Harder and Pomeroy, 2014). An under- or overestimation of snow can affect the magnitude of spring peak snow water equivalent (SWE), timing of snowmelt and snow cover duration. Sur- face energy balance calculations can also be affected, because snow increases the surface albedo, which is a major component of the energy balance equation.

A phase partitioning method (PPM) describes the way a hydrological model determines precipitation phase. The main input to a hydrological model is precipitation data, which is normally just given as a quantity, without any information on whether the precipitation fell as rain or snow. It is therefore up to the model to infer the phase of the precipitation from other parameters, such as the near-surface air temperature, T_a. The most commonly used method in hydrological models for classifying phase is to create a single or double temperature threshold based onT_a (Harpold et al., 2017). Precipitation

(25)

falling at temperatures below this threshold is classified as snow and precipitation falling at temperatures above the threshold is classified as rain. The double temperature threshold has a region between the all-snow/all-rain thresholds where mixed precipitation occurs.

Many studies, e.g. Harpold et al. (2017), Feiccabrino et al. (2013), Ding et al. (2014), Harder and Pomeroy (2014) and Jennings et al. (2018), argue that PPMs used in hydrological models today are too simple to capture the full extent of the meteorological processes that govern precipitation type at the surface. Surface variables influence precipitation phase, however, precipitation phase near the surface is a result of latent heat exchanges as precipitation is falling, and is affected by atmospheric temperature and humidity profiles.

This is not captured in point-scale measurements of single surface variables. PPMs based on static temperature thresholds assume constant atmospheric conditions, and if the same temperature threshold is applied to a large region, local topographic and physiographic effects are not taken into account (Grigg et al., 2020). Though it is clear that surface variables are not sufficient to fully describe the physics involved in phase change processes in the atmosphere, these variables are used because they are widely available (Harder and Pomeroy, 2014). In addition, they do not require additional detailed atmospheric information beyond what is already included in the model (Feiccabrino et al., 2015). Atmospheric variables required in complex microphysical schemes add complexity and uncertainty to a simple hydrological model, which hinder implementation in hydrological applications (Wang et al., 2019).

The use of data-driven models and machine learning is increasing in virtually any sci- entific field. It opens the possibility of discovering unknown patterns and relationships in data that can give more accurate predictions. Machine learning has seen tremendous success in commercial domains and has started to become a natural tool in the geoscientific toolbox (Karpatne et al., 2018). However, fundamental properties of geoscientific datasets make machine learning models challenging to implement.

The main objective of this study is to explore the potential and limitation of using machine learning based methods for precipitation classification in Norway. The study seek to investigate the following:

(26)

• Do machine learning based models improve precipitation phase predictions at selected stations?

• How well do these machine learning based models trained at selected stations per- form at stations located in different parts of Norway?

One regional threshold, in addition to optimized thresholds at each station, will be applied to evaluate the performance of threshold based PPMs. The machine learning models will be developed using a set of combinations of meteorological variables. These will be developed at selected stations. The performance will be evaluated at the stations where they were trained. Then, to evaluate how well these models generalize to regions outside the calibration area, the models will be applied to all other stations without being recalibrated.

The thesis is structured as follows: Chapter 2 gives an introduction to concepts and relevant studies on precipitation classification, to provide a background for the rest of the thesis. That chapter is followed by a description of the data set and methodology used in the analysis. The results are presented in Chapter 4, and the implications of these results are discussed in Chapter 5. Chapter 6 summarizes and concludes the study, and gives suggestions for further work.

(27)

(28)

2.1 Precipitation formation

Precipitation describes water in any form that reach the surface from the atmosphere, and is formed in clouds (Dingman, 2015). Cold clouds denotes clouds in which ice particles are present (Lamb and Verlinde, 2011). In contrast, warm clouds are entirely made up of liquid particles (droplets). Except for over the tropics, ice crystal processes, i.e.

cold clouds, are necessary for precipitation formation, also during summer (Lamb and Verlinde, 2011; Aguado and Burt, 2015). All precipitation that reaches the ground at Norwegian latitudes have therefore originated as ice particles, and it is the atmospheric characteristics that determines whether it reaches the ground as liquid or solid (Harder and Pomeroy, 2013). The processes of precipitation formation in clouds are complex and will not be described in detail. Instead, a description of the phase change processes in the lower atmosphere that might affect precipitation phase at the surface will be given below.

2.2 Precipitation phase near the ground surface

The phase of precipitation at the ground surface is determined by the characteristics of the atmospheric layers through which the precipitation falls (Gray and Prowse, 1993). A precipitation particle falling from a cloud can melt on its way towards the surface, but can also refreeze partially or completely once melted. The phase of precipitation reaching the surface depends on the melting rate of solid precipitation, which is related to the energy available for melting (Froidurot et al., 2014). The melting process is mainly a function of the air temperature and humidity. Air temperature gradients and vapor pressure gradients drive sensible heat and latent heat exchanges respectively (Harpold et al., 2017). If the air is dry and the vapor pressure deficits are high, the latent heat losses will be high, which

(29)

Figure 2.1:Vertical temperature profiles. The green line show a situation where the precipitation phase at the ground surface is rain, the blue line corresponds to a situation where precipitation would fall as snow near the surface.

means that there is a higher probability for snow at higher temperatures (because there is less energy available for melting). If the latent heat losses are lower, combined with a general increase in temperature towards the surface (which provides an input of sensible heat) there is a higher probability for rain rather than snow.

The temperature of the lower atmosphere typically decreases with height, following the dry adiabatic lapse rate before saturation and the saturation adiabatic lapse rate once the air is saturated and condensation begins. The lapse rates vary with temperature, initial vapor pressure and elevation, and are thus spatially and temporally variable (Dingman, 2015). Figure 2.1 shows an example of a vertical temperature profile. The top layer has temperatures below the freezing point. The middle layer is referred to as themixed layer (Harpold et al., 2017) ormelting layer (Th´eriault and Stewart, 2010). The bottom of this layer is called the critical height. The surface layer below the critical height can have temperatures above or below 0 ^◦C. The depth of this layer, which is determined by the critical height, and its temperature and humidity profile, determines if melting, partial melting or no melting can occur (Harpold et al., 2017; Dingman, 2015). The situations in Figure 2.1 are examples of conditions where it is easy to determine precipitation type.

Situations where the surface temperature is closer to 0^◦C and humidity levels are neither high or low, are situations where it is more difficult to determine the precipitation phase

(30)

2.2.1 Geographical effects on precipitation type

Precipitation phase is affected by geographic barriers (Grigg et al., 2020). The depth of the surface layer required to melt a falling precipitation particle can change due to landscape effects (Feiccabrino and Grigg, 2017). Orographically enhanced precipitation, which is the result of forced mechanical lifting of air due to a geographic barrier (Ahrens, 1998), will for instance increase the required depth of melting, because more energy is needed to melt the extra precipitation. Atmospheric stability can also affect the phase of precipitation at the ground surface (Harpold et al., 2017), which influences environmental lapse rates (Feiccabrino and Grigg, 2017). Environmental lapse rates can increase due to warming from open-water bodies or decrease due to a cooling effect from snow-covered (frozen) ground (Feiccabrino, 2020).

2.2.2 Humidity variables

As humidity is the second-most important variable governing precipitation phase near the surface (Feiccabrino et al., 2015), a description of humidity variables is given below. Hu- midity reflects the amount of water vapor in the air and can be expressed through different quantities. Vapor pressure is the partial pressure exerted by vapor particles in a parcel of air. The saturation vapor pressure is the vapor pressure that is thermodynamically stable, i.e., if (under most natural conditions) the vapor pressure exceeds the saturation vapor pressure, condensation will occur. Saturation vapor pressure is a function of temperature (Dingman, 2015) and describes how much water vapor an air parcel can hold at a certain temperatureT. The ratio of actual vapor pressure and saturation vapor pressure is called relative humidity(RH) and is a way to describe the moisture content of an air parcel.

The dew point temperature, T_d, is the temperature to which a parcel of air must be cooled in order to reach saturation (Dingman, 2015). T_d can be calculated from vapor pressure. WhenT_d=T_a, the air is saturated (and RH = 100 %).

The process of evaporation absorbs latent heat. If a parcel of air is cooled to saturation by evaporation of water, and the latent heat required for the evaporation process to take place is provided by the parcel itself, the resulting temperature of that parcel is called

(31)

thewet-bulb temperature,T_wb(Dingman, 2015). T_wbis closer to the surface temperature of a falling precipitation particle than T_a (Wang et al., 2019). Furthermore, the range of temperatures at which both snow and rain occurs is smaller for T_wb than for T_a, which means that fewer samples will fall within the uncertain range where precipitation can be misclassified.

2.3 Phase partitioning methods

Precipitation measured at meteorological stations are typically reported as a quantity of water (Dingman, 2015), without any information on whether the precipitation fell as liquid or solid water. Solid precipitation is usually measured by its liquid content after being melted by some heating source at the station. The phase of the precipitation must therefore be infered from other parameters through a phase partitioning method (PPM) within the hydrological model. Below follows a decription of different PPMs used in hydrological models today and other approaches suggested in the literature.

2.3.1 Single temperature threshold (STT)

The simplest and most commonly used PPM is the single temperature threshold method (STT), showed in Equation 2.1, which simply regards all precipitation falling at temperatures below some critical temperature threshold,T_c, as snow, and all precipitation falling above this threshold as rain.

type=







snow, ifT_a≤T_c rain, ifT_a≥T_c

(2.1)

T_c is site specific (Harder and Pomeroy, 2014; Harpold et al., 2017) and must therefore be calibrated. T_c can be found in various ways. Generally, two methods are used in the literature to determine threshold values. The first method (from now on referred to as thescoringorSCmethod) is based on some scoring metric(s). The STT is applied using a set of T_c values around 0^◦C, and the threshold giving the best score is chosen as the optimal value for T_c. The other method (from now on referred to as the half point or

(32)

observed data. The snow probability (P_snow) is defined as the amount of snow instances to the total amount of precipitation instances for some given temperature bin. The bins typically have a range of 1^◦C or 0.5^◦C. Conversely, the rain probability (P_rain) is the amount of rain instances to the total amount of precipitation instances within the temperature bin, or 1–P_snow. When the rain and snow probabilities for each temperature bin is found, they can be plotted, as in Figure C.2. The temperature corresponding to the point where the P_snowline intersects theP_rainline is defined as the optimal temperature threshold, because this is whereP_rain=P_snow=50%.

Feiccabrino and Grigg (2017) and Feiccabrino (2020) used a scoring method to determine the critical temperature threshold, which was defined as theT_cthat produced the lowest error of misclassified precipitation. The error of misclassified precipitation was found by adding the number of snow (rain) instances above (below) each T_c, divided by the total amount of precipitation instances. Zhong et al. (2018) also used a scoring method, where the optimal temperature threshold was chosen as the temperature threshold that gave the lowest error of misclassified snowfall days to actual observed snowfall days, among threshold values that met two additional criteria. Behrangi et al. (2018), Olafsson and Haraldsd´ottir (2003) and Jennings et al. (2018) are among the studies that´ used the half point method to find the temperature threshold.

Using one static temperature threshold assumes constant atmospheric conditions (Fe- iccabrino and Grigg, 2017). In addition, when one temperature threshold is used over a large area (such as all of Norway), the effects of local physiographic effects, such as topography, is ignored (Grigg et al., 2020). As explained above, precipitation phase at the surface is the result of complex microphysical processes driven by sensible and latent heat gradients. This is not captured when one static temperature threshold is used to determine precipitation phase.

2.3.2 Double temperature threshold (DTT)

A double temperature threshold method (DTT) similarly defines two critical temperature thresholds,T_c,snowandT_c,rain, where all precipitation falling belowT_c,snowand aboveT_c,rain is classified as snow and rain respectively. Precipitation falling in the temperature range

(33)

Method 1 Method 2 Snow fraction T_c,rain−T_a

T_c,rain−T_c,snow 1−T_a,max−T_c,snow T_a,max−T_a,min Rain fraction 1− T_c,rain−T_a

T_c,rain−T_c,snow

T_a,max−T_c,snow T_a,max−T_a,min

Table 2.1:DTT snow/rain fractions

betweenT_c,snowandT_c,rainis classified using some transition function (see Equation 2.2).

The transition function calculates a snow/rain fraction, i.e. how much of the total amount of precipitation measured at that time is rain and snow.

type=











snow, ifT_a≤T_c

snow/rain fraction, ifT_c,snow≤T_a≤T_c,rain

rain, ifT_a≥T_c

(2.2)

Harpold et al. (2017) present two different DTTs, shown in Table 2.1, one using a linear transition function (Method 1), the other using a transition function based on daily minimum and maximum temperature (Method 2).

2.3.3 Humidity based PPMs

As noted above, humidity plays the second-most important role in determining precipitation phase, and inclusion of humidity variables (as those described in Section 2.2.2) is therefore believed to give a better phase prediction (Jennings and Molotch, 2019). As a result, different PPMs which incorporates surface humidity information can be found in the literature. One of these methods was developed by Ding et al. (2014), shown in Equation 2.3. They analyzed how precipitation type depends on meteorological and geographic parameters, and found that elevation and relative humidity have a large impact on precipitation type. The PPM developed in their study (Equation 2.3) is based on wet-bulb temperature, and thus on relative humidity and elevation in addition to air temperature.

Ding et al. (2014) suggest to use wet-bulb temperature over air temperature to determine precipitation type, because falling precipitation droplets will have a temperature closer to

(34)

type=











snow, ifT_wb≤T_min

sleet/mixed precipitation, ifT_min<T_wb<T_max

rain, ifT_wb≥T_max

(2.3)

The first step is to calculate the wet-bulb temperature. Then a parameter,T₀, is calculated. T₀is empirically related to the relative humidity and elevation. From this parameter the temperature thresholds in Equation 2.3,T_minandT_max can be calculated.

Other studies, such as Froidurot et al. (2014), have used humidity variables in combination with air temperature and other parameters to determine precipitation phase. Chen et al. (2014) and other studies have comparedT_wbthresholds toT_a thresholds, with generally satisfying results (Harpold et al., 2017). Surface humidity is however not fully representative of atmospheric humidity (Feiccabrino et al., 2015).

2.3.4 Other approaches

The PPMs described above are based on surface variables only. Some studies have used upper-air atmospheric variables to estimate precipitation phase. Meteorological models use more complex microphysical schemes based on upper-air atmospheric variables to determine phase. However, these models are typically run at spatial scales too large to capture local effects in complex landscapes (Harpold et al., 2017). Froidurot et al. (2014) studied the effect of both surface and upper-air atmospheric variables, but found that the atmospheric based models did not provide any significant improvement to the phase predictions. Harder and Pomeroy (2013) used a psychrometric energy balance method to predict precipitation phase, which require data on falling hydrometeors. Moon and Kim (2020) tested combinations of 93 weather variables, e.g., upper-atmospheric wind measurements and pressure levels at different heights. Although studies implementing upper-air atmospheric variables have shown success in Harder and Pomeroy (2013) and Moon and Kim (2020), these methods are limited in a hydrological modelling context, as they require variables that are not widely available at ground-based stations (Harder and Pomeroy, 2013).

(35)

Some studies have investigated the effect of topography and other geographical site characteristics on the local and regional scale. Feiccabrino (2015) found that using dif- ferentT_athresholds for different geographic landscapes improved classification accuracy.

The geographic landscapes were classified based on proximity to water, elevation ranges and prevailing wind direction, to account for warming from warm open-water bodies (an effect shown to influence threshold values in Dai (2008)) and landscape-driven effects.

In Feiccabrino and Grigg (2017), the process of classifying stations as geographic landscapes, which was done manually in Feiccabrino (2015), was automated, and high variability in threshold values was found across landscapes. Their method has been further automated and expanded in Grigg et al. (2020).

2.4 Measured precipitation phase

Data sets of observed precipitation type are necessary to validate PPMs. Precipitation type has historically been reported manually. This method is highly subjective and might be inconsistent, e.g., Ding et al. (2014) used manually recorded precipitation type data and found year-long records where all precipitation were just recorded as rain.

Precipitation type can also be inferred from particle size and fall-speed measured by disdrometers. There are different types of disdrometer technologies. Here, the optical disdrometer Parsivel (ParticleSizeVelocity) is described. The functional principle of the Parsivel disdrometer is depicted in Figure 2.2. The Parisvel consists of a laser sensor and a receiver. The laser sensor produces a horizontal strip of light, which is blocked when- ever a hydrometeor passes through it (OTT, 2010). The reduction of the output voltage at the receiver can therefore be used to determine particle size, because the portion of the light that is blocked by the precipitation particle corresponds to the particle diameter.

The amount of time the light beam is partially blocked is measured to determine particle fall speed. From particle size and fall speed (through empirical relations from different studies (L¨offler-Mang and Joss, 2000)) precipitation type can be determined, in addition to precipitation intensity and some other parameters. Other optical disdrometer are based on the same functional principles. In addition, some disdrometers use cameras to observe

(36)

Figure 2.2:Functional principle of the OTT Parisvel disdrometer. Figure taken from OTT (2010)

At Haukeliseter testfelt, a test site set up by the Norwegian Meteorological Institute (MET) to study errors in snow measurements due to wind, several disdrometers are installed (see Wolff et al. (2015)). Studies on the variability of these (e.g., Haugen (2015) and Aulie (2016)) indicate that there are significant errors associated with disdrometer measurements. Haugen (2015) found that all disdrometers at Haukeliseter testfelt showed improved scores for precipitation events at above 5^◦C. In addition, it was found that some instruments recorded instances ofno precipitationorunknown precipitation, when other sensors (as well as images and manual observations used as additional reference data) recorded precipitation. This mainly occurred in snowy conditions.

The results from Haukeliseter testfelt suggests that disdrometers underperform in snow events compared to rain events. In fact, optical disdrometers are very often de- signed for rain observations. Thus, particle sizes and velocities inferred from reducing the electric signal from the light beam are based on assumptions that hold for rain particles only. For instance, as described by Battaglia et al. (2010) in an assessment of the Parsivel disdrometer for snow measurements, particles are assumed to be spheroidal and falling horizontally aligned. In addition, only vertical movement through the light beam influences the determined fall speed. Similar challenges were discussed in Losnedal (2019) (which also used data from Haukeliseter). Despite challenges in measuring snow events, however, as pointed out by Harpold et al. (2017), disdrometers offer observations at a higher temporal resolution and can be used in low-light conditions, where manual observations would not be possible.

In addition to manual observations and automated measurements of precipitation type

(37)

by disdrometers, some studies, such as Rajagopal and Harpold (2016) and Harder and Pomeroy (2013), have used a combination of variables and measurements of precipitation mass to develop evaluation data sets. For instance, an expert decision-making system that classified precipitation based on observations of snow depth and snow water equivalent (SWE) was developed by Rajagopal and Harpold (2016) to evaluate the performance of three temperature-threshold PPMs.

2.5 Machine learning

Machine learning describes the field of science that involves programming computers to make them learn from data (G´eron, 2019). The field of machine learning is highly related to statistics and builds on statistical learning theory. Machine learning and statistics can be distinguished in that the aim of a machine learning problem is prediction, in contrast to fitting, which is the goal in a statistical context (Mehta et al., 2019). Another way to distinguish machine learning and statistics is that the aim of machine learning is to imitate a system’s output, rather than to identify that system (Cherkassky et al., 2006).

2.5.1 Defining the learning problem

The aim of a ML problem is to find some reproducible structure in the data and use this to make predictions on unseen data (Kuhn and Johnson, 2013). Based on whether the output to be predicted is numerical or categorical, the prediction problem can be termed regression or classification respectively. To predict whether precipitation falls as rain or snow is thus aclassificationproblem.

Although there are many forms of precipitation, classification of precipitation type is in this study simplified to abinary classificationproblem. A binary classification problem is one where an instance can be classified as one of two, non-overlapping classes (Sokolova and Lapalme, 2009). That is, precipitation is classified as either snow or rain (see Section 3.1.2 for a note on mixed precipitation).

(38)

(input variables), X = (X₁, ...,X_p) and a target variableY, where the aim is to find, or ratherlearn, some statistical relationship or pattern from the data.

This relationship can be expressed as a function, f, which then can be applied on unseen dataX⁰to make predictions ˜Y:

Y˜ = f(X⁰) (2.4)

In the case of precipitation type classification, the different meteorological variables make up the p features and precipitation type is the target variable, Y. The learning process is assisted by a set of “true” outcome values (i.e., actual measurements of precipitation type). This type of learning problem is referred to as asupervisedlearning problem (Hastie et al., 2009).

2.5.2 Evaluation of model performance

The model is evaluated by comparing predicted outcomes to actual outcomes, using some performance metric. See Section 2.5.4 for a description of such metrics. A good model will generalize well to unseen data. To validate how well the model performs when applied to unseen data, a part of the data is held out and not used when training the model.

The held out subsample of the data is referred to as thevalidation set ¹and is not looked at before the final model is developed and will be validated. The rest of the data is referred to as thecalibration set².This part of the data is used to train and tune the model.

2.5.3 Cross-Validation

The calibration set is also typically split in two parts; a training set and atest set. The process of splitting the data into a training set and a test set is called cross-validation (CV). There are different ways to divide the calibration set in training and test sets. It is clear that the performance of the model applied to the test set will vary to some de-

1This is often referred to as the test set. However, to not get confused when the concept of cross- validation is introduced and to adopt the terminology from hydrological modelling, the termvalidation set is used to refer to the part of the data removed from the very beginning and only used to validate the final model.

2This is often referred to as thetraining set, but to avoid confusion, as explained in Footnote 1, the term calibration setis preferred.

(39)

gree depending on how that test set was selected, both in terms of size of the subsample and the actual values that constitutes that subsample. The training set will therefore not be able to describe the uncertainty and variability the estimate might contain (Kuhn and Johnson, 2013). To study this variability, the process of CV can be done multiple times.

In addition, resampling the data through an iterative CV technique, allows the use of all data for training so that none is “wasted” as test data (G´eron, 2019). After a number of CV iterations have been conducted, the scores from each iteration are summarized. The mean and standard deviation of the set of scores can be calculated. The mean will give a general performance score for the model. The standard deviation will tell how much the scores are varying with the different iterations. A robust model will have a low standard deviation, because this indicates that the model is performing well regardless of which set it was trained on.

An iterative way of implementing the CV technique is the K-fold Cross-Validation (KCV) technique. In this method, the data is divided into K different folds (subsets).

Then, the model is trainedKtimes, each time holding one fold out as test set while using the rest of the data as training set, yielding a set ofKevaluation scores.

2.5.4 Evaluation metrics

As noted above, model performance is evaluated by comparing predicted outcomes to measured outcomes. For binary classification problems, there are four possible outcomes of a prediction, summarized in the confusion matrix shown in Table 2.2. A prediction is referred to as a true positive (TP) if it was correctly classified as the positive class.

Conversely, a prediction is referred to as a true negative (TN) if it was correctly classified as the negative class. Thus, true positive and true negative predictions together make up the correct predictions. The other two possible outcomes are those where instances are not classified correctly. An instance that actually belonged to the negative class, but was classified as positive, is referred to as a false positive (FP). Conversely, an instance that actually belonged to the positive class, but was classified as negative, is referred to as a false negative (FN).

(40)

PREDICTED

TRUE

Negative Positive

Negative True negative (TN)

False positive (FP)

Positive False negative (FN)

True positive (TP)

Table 2.2:Confusion matrix

PREDICTED

TRUE

Snow Rain

Snow Snow (TN) Snow predicted

as rain (FP) Rain Rain predicted

as snow (FN) Rain (TP)

Table 2.3: Confusion matrix of precipitation classification problem. TP and TN represents instances that were correctly estimated as rain and snow respectively. FN represents rain that was estimated as snow, i.e. anoverestimationof snow. FP represents snow that was estimated as rain, i.e. anunderestimationof snow.

For the case of precipitation phase classification in this thesis, snow is denoted negative and given the value 0. Rain is denoted positive and given the value 1. FP predictions in this case therefore refer to those instances where precipitation really fell as rain, but were classified as snow. FN predictions refer to those instances where precipitation really fell as snow, but were classified as rain. This is summarized in Table 2.3. The first case will therefore overestimate the amount of snow in the catchment, whereas the second case will underestimate the snow amount.

From the confusion matrix in Table 2.2, different evaluation metrics for model performance can be defined. Table 2.4 shows some of these.

Evaluation metrics for imbalanced data sets

An imbalanced data set is one where one (or more) of the classes is rare, i.e., instances are not distributed evenly across the classes. It is difficult to define exactly how small the proportion of total instances falling within a class must be for that class to be defined as rare, but as indicated in Weiss (2013), problems related to class imbalance might im-

(41)

pact performance for data sets with only moderate imbalance (imbalance ratio of e.g. 2:1).

Some evaluation metrics do not deal well with imbalanced data sets. Accuracy (ACC), which is the fraction of correctly predicted instances to all predictions (Table 2.4), is the most commonly used evaluation metric for classification (Gu et al., 2009). ACC is a multi- class metric, as it does not reflect the relative importance of the classes in its computations (Japkowicz, 2013). This makes ACC less suitable for imbalanced data sets compared to metrics with a single-class focus.

Recall is the fraction of correctly classified instances in one class to the total of predicted instances in that same class (see again Table 2.4). Precision is the fraction of correctly classified instances in one class to the total of actual instances in that same class.

From recall and precision, the F1 score can be defined, as the harmonic mean of precision and recall. That is, the F1 score balances precision and recall equally. The F1 score ranges from 0 to 1, where 1 indicates a perfect model.

Recall is sometimes referred to as thetrue positive rate(TPR) (G´eron, 2019). When plotted against thefalse positive rate(FPR, the ratio of false positives), thereceiver operating characteristics(ROC) curve is plotted. The area under the ROC curve (AUC) can be used to compare the performance of different models. Similar to the F1 score, the AUC takes on values from 0 to 1, where 1 corresponds to a perfect model.

From the confusion matrix shown in Table 2.3, the percentage of misclassified precipitation can be defined as the number of snow and rain instances classified as rain and snow respectively (FP and FN) divided by the total number of precipitation instances (FP, FN, TP and TN) (see again 2.4). This score is used in Feiccabrino and Grigg (2017). The reduction in error in Feiccabrino and Grigg (2017) is computed as shown in Equation 2.5.

error reduction=1−Tc,optimized misclassified precipitation (%)

T_c,countrymisclassified precipitation (%) (2.5)

(42)

Metric Formula Recall / TPR _{T P+FN}^{T P}

Precision _{T P+FP}^{T P}

FPR _{FP+T N}^FP

Accuracy T P+FP+T N+FN^{T P+T N}

F1 score ^{T P}

T P+^FN+FP₂

% Misclassified precipitation^a T P+FP+T N+FN^FN+FP Table 2.4:Evaluation metrics

aUsed in Feiccabrino and Grigg (2017)

(43)

(44)

3.1 Study area and data

10-minute data from 2014 to 2019 was provided from the Norwegian Public Roads Ad- ministration (Statens Vegvesen; SVV). SVV has weather stations located all over Norway, generally placed on main roads and thoroughfares. Figure 3.1 shows an example of an SVV weather station. The SVV station network is shown in the map in Figure 3.2. As seen, the SVV station network is large, but many stations are excluded from the analysis mainly because they lack measurements of one or more parameters used in this study. The parameters taken from the raw data files provided by SVV is shown in Table 3.1. Only a selection of these were used in the study, see Section 3.4.2.

Only instances of recorded precipitation within a certain temperature range were chosen for the analysis. The process of selecting the relevant study period and temperature range is described below in Section 3.1.1. This process also included removal of erroneous wind and relative humidity values. After the data from the desired temperature range was selected, two filtering processes were conducted to select the stations used in the analysis. These processes are described in more depth in Appendix A. The first process mainly involved filtering out stations with too few samples and stations which lacked one or more of the variables used in the analysis (see Table 3.5 and Section 3.4.2).

In addition, some stations were removed because they only had measurements of one precipitation type. The second filtering process was based on characteristics of the precipitation type data of the stations passing the first filtering process. Precipitation type is a categorical variable, which makes it difficult to evaluate the quality of the measured instances. This can only be done by looking at precipitation type in light of other variables.

The temperature distribution for all precipitation types (rain, snow, hail and drizzle), i.e.

(45)

Figure 3.1: Example of weather station from the Norwegian Public Roads Administration (SVV277; not used in the analysis because of lack of wind measurements). The top sensor is the OTT Parsivel²disrometer.

the frequency at which each precipitation type occurred at different temperatures, was therefore analyzed to remove stations where measurements of precipitation type seemed erroneous. This process revealed that a lot of stations had suspicious records of precipitation type (see Appendix A for a more detailed description).

Because the full data set was obtained very late in the process, and much time was spent on breaking down large files to get usable data sets for each station, the main approach to suspicious and erroneous data, both at the single instance and station scale, was to simply remove either the suspicious records or the full station set all together from the analysis. In total, after both filtering processes,73 stations were included in the analysis.

The Present Weather Sensor (PWS) OTT Parsivel and Parsivel²laser-optical disdrometer are used at most SVV stations to measure precipitation type (see Table 3.2). The

(46)

Figure 3.2:SVV station network. The blue dots are stations used in the analysis.

(47)

Variable Unit Air temperature (2 m above the ground),T_a ^◦C

Relative humidity, RH %

Dew point temperature (calculated),T_d ^◦C Road surface temperature ^◦C Precipitation intensity,I mm/t

Precipitation type -

Visibility m

10 minutes accumulated precipitation mm

Wind velocity,u m/s

Wind direction 0-360^◦

Table 3.1:Variables provided at SVV stations. Only selected variables were used as features in the analysis.

Sensor type Number of stations Present Weather Sensor OTT Parsivel² 29

Present Weather Sensor OTT Parsivel 19

Present Weather PWD31 15

Present Weather and Visibility PWD12 8

Unknown / other 2

Table 3.2:Sensor types installed at SVV stations to measure precipitation type

(48)

(a) (b)

Figure 3.3:Sample size distribution for calibration (a) and validation (b) sets

3.1.1 Selection of relevant data range

As described before, T_c varies across space and time (Harder and Pomeroy, 2014; Har- pold et al., 2017), and it is clear that both snow and rain can occur at temperatures both above and below 0 ^◦C. However, as temperatures decrease, the probability that precipitation will fall as rain also decreases (Stewart et al., 2015). Similarly, as temperatures become sufficiently high, the probability of precipitation falling as snow will be low. To study the temperature range within which the phase transition occurs, only instances with measured precipitation at temperatures between−5^◦C and 5^◦C were used in the analysis.

Dai (2008) found that the phase transition from rain to snow over land occurs at temperatures ranging from−2^◦Cto+4^◦C. Ye et al. (2013) used 34 years of hourly data from 547 stations in Northern Eurasia, and found that critical temperatures of the half point (i.e., for 50% rain/snow) for most stations ranging from 0.5^◦C to 1.0^◦C. However,T_c was found to be lower at some stations, the lowest critical temperature being−1.0^◦Cat one station.

The selected temperature range of−5^◦C to 5^◦C will therefore capture the transition phase of precipitation. Other studies of precipitation type have used data in similar, but varying, temperature ranges, e.g. −3^◦C to+5^◦C(Froidurot et al., 2014; Feiccabrino, 2015; Feic- cabrino and Grigg, 2017) and −10^◦C to 10^◦C (Ding et al., 2014). Only instances from November - April were used, as this it the period where most snow fall. The analysis is restricted to instances where precipitation is measured.

Data from 2014-2018 was used for calibration for the STT method. This calibration data set was also used to train and test the RFC models. Data from 2019 was held out and used to evaluate the performance of the STT method and the RFC models. The ratio of the calibration sample size to validation sample size was similar at all stations,

(49)

Figure 3.4: Distribution of snow fraction across stations (after binary conversion). The snow fraction is defined as the ratio of snow instances to total precipitation instances (snow ratio=_snow+rain^snow ). The dashed line represents a snow ratio of 50 %.

although not exactly the same. Data sets in ML applications are often split into calibration and validation sets based on a fixed ratio (e.g., 80/20). Selecting data from the same time period was found more expediently in a hydrological setting, however, because of temporal and spatial dependencies in the data across stations. The distribution of sample sizes for the stations included in the analysis is depicted for calibration (a) and validation (b) data sets in Figure 3.3. Most stations have sample sizes between 5000 and 15000 for calibration, and between 1000 and 4000 instances for validation.

The snow ratio, i.e. the amount of snow instances to the total amount of precipitation instances within the chosen temperature range across all stations is shown in Figure 3.4.

As seen, most stations have a slight skew in the snow/rain distribution from the selected period. The dashed line in Figure 3.4 represents a snow ratio of 50 %. Most stations have snow ratios higher than this, meaning that snow samples are over-represented in the data set. The sets used for validation generally have a higher over-representation of snow samples compared to that of the calibration sets.

As a result of the filtering processes mentioned above, the elevation range of the stations included in the study is very narrow. The distribution of station elevations can be seen in Figure 3.5. 55% of the stations are located at elevations below 100 m.a.s.l.

(50)

Figure 3.5:Distribution of station elevations

Measured type Translated type Code representation Binary code representation

Sno Snow 2048 0

Regn Rain 512 1

Yr Drizzle 256 1

Hagl Hail 1024 0

Opphold No precipitation 0.0 -999

Uspesifisert Unspecified 4096 -999

Table 3.3:Measured precipitation types and their binary representation.

3.1.2 Data preparation

Binary conversion

The precipitation types available are snow, rain, drizzle¹and hail. The types are given as a set of numerical codes and must therefore be converted to binary codes before the algorithms can be implemented. The precipitation types and their codes given in the data are shown in Table 3.3. Solid precipitation is given the code 0 and liquid precipitation is given the code 1. Drizzle is considered liquid, and thus given code 1. Hail is considered solid, and therefore given code 0. The codes in the data do not specify if the liquid precipitation is freezing. It is assumed that precipitation measured as rain or drizzle is liquid.

1Drizzle are droplets with diameter smaller than 0.5 mm (Sumner, 1988).

(51)

Precipitation is here considered to be binary, i.e., it takes the form of either solid (snow) or liquid (rain). It is, however, possible that rain and snow can co-exist when phase change takes place (i.e., at temperatures around 0^◦C), because of uneven particle sizes (Thériault and Stewart, 2010). The proportion of snow and rain in a mixed precipitation event is generally not recorded, which makes it difficult to include in the studies (Froidurot et al., 2014; Jennings et al., 2018). The proportion of rain and snow may vary (Froidurot et al., 2014), and considering mixed precipitation to always be half snow and half rain is too simple (Feiccabrino et al., 2013). However, the temperature range over which the mixed precipitation change from containing mainly snow to mainly rain is relatively short (Feiccabrino et al., 2012). Froidurot et al. (2014) also showed that the distribution of mixed precipitation is bell-shaped with maximum at theP_rain/P_snow intersection point (i.e., where the probability of rain and snow is 50%). A similar bell-shaped curve for mixed precipitation was also shown in Ólafsson and Haraldsdóttir (2003), and found in Feiccabrino et al. (2013) when data from all 19 stations used were aggregated.

Nevertheless, the temperature range over which there is a probability of both rain and snow occurring is quite large, especially in cases with low relative humidity (Jennings et al., 2018). Many studies that use data sets with mixed precipitation observations con- sider these, but choose to exclude mixed precipitation instances after analyzing the effect of including them, which is typically found to be small (e.g. Feiccabrino et al. (2012)).

The data set used in this project does not contain mixed precipitation events, and even in the temperature ranges where the probabilities of rain and snow are similar, precipitation recorded is considered all snow or all rain. It is however important to keep this in mind when reviewing the results presented below.

Wet-bulb temperature

The wet-bulb temperature was calculated using the formula given in Ding et al. (2014), simplified by Dingman (2015). The formula for calculating the wet-bulb temperature is

(52)

3.2 Software

The high-level programming language Python was used to implement all methods. The Python model Scikit-learnwas used to implement the machine learning algorithms.

Scikit-learn(referred to assklearnfrom now) is a Python module building onnumpy andscipy modules, which provides several machine learning algorithms both for unsu- pervised and supervised learning (Pedregosa et al., 2011). It provides own submodules for preprocessing, cross-validation and evaluation metrics. sklearnversion0.19.1was used².

3.3 Single temperature threshold (STT)

The STT method was implemented for all stations, using different temperature thresholds.

Since the data set did not include mixed precipitation, the DTT method and the humidity method presented by Ding et al. (2014) were not implemented.

3.3.1 Air temperature threshold

Both the scoring method and half point method were tested on all stations to find the optimal threshold value for all stations. Details of the procedure used to find theP_rain/P_snow intersection point from snow/rain probability curves is described in Appendix B.2. The

2The most important difference to note if another version is used is the default values for the hyperpa- rameters of the ML algorithms.

Temperature threshold values T_c=1.1^◦C Country value T_c=T_a(SC)^◦C Calibrated at each station T_c=T_a(50/50)^◦C Calibrated at each station T_c=T_wb(SC)^◦C Calibrated at each station T_c=T_wb(50/50)^◦C Calibrated at each station

Table 3.4:STT thresholds tested for all stations

(53)

metric used in the scoring method was the F1 score.

STT was tested on all stations for the temperature thresholds shown in Table 3.4. T_a (SC) is the optimal temperature threshold found at each station through calibration using the scoring method. A range ofT_c values from−3^◦C to+3^◦Cwith a step of 0.1^◦C was applied to the 2014 - 2018 calibration data set for each station. The F1 score for eachT_c was recorded. The value ofT_c producing the highest F1 score was chosen as the optimal threshold for that station. T_a(50/50) is the optimal temperature threshold for each station found using the half point method. T_c =1.1^◦C is the “country value” for Norway used in Feiccabrino and Grigg (2017). The temperature thresholds in Table 3.4 were tested on all stations individually. These results, particularly the country value, were used as benchmark values for the rest of the analysis.

In addition to using the near surface air temperatureT_a, the STT method was applied using the wet-bulb temperature,T_wb as threshold variable. The only difference in apply- ing T_wb was to use a slightly different range of temperatures for the calibration period.

As the wet-bulb temperature by definition always is lower than the air temperature, the calibration range was set to−6.0^◦C to+3.0^◦C. The range was set quite large, because it is uncertain how much the wet-bulb temperature will deviate from the air temperature.

3.4 Machine learning

The algorithm chosen for the ML implementation is the Random Forest algorithm, described below. A common approach in ML applications is to test various algorithms, and choose the predictor that gave the most satisfactory results (G´eron, 2019). However, due to time constraints and simplicity, only one algorithm was tested. It is difficult to say in advance which algorithm is the one best suited. Random Forests was chosen in this project because it does not make any assumptions of the structure of the data, and thus requires no scaling of the input features. In addition, it builds on an intuitive concept of Decision Trees, which translates well to the STT method. In fact, the STT method is an

(54)

3.4.1 Decision Trees and Random Forests

Figure 3.6:A very simple Decision Tree. This Decision Tree corresponds to the STT method with Tc=1.1^◦C.

Figure 3.7:A Decision Tree generated bysklearn.

Decision Trees are based on a relatively intuitive concept. Figure 3.6 shows an example of a very simple Decision Tree (which corresponds to the STT method). A more complex Decision Tree generated bysklearncan be seen in Figure 3.7. Each square is called a node, and the top node is the root node. Between each node a yes/no question is asked, i.e., the data instance is evaluated based on some definite criteria. Based on the answer to the question, the instance moves down to another node. Depending on the problem, this node can again have two daughter nodes. Nodes that do not have any children, are termed leaf nodes. Once an instance has reached a leaf node, it is classified. Each x_i in Figure 3.7 corresponds to a feature variable.

A Random Forest is an ensemble of Decision Trees. Ensemble methods combine