
For the most part, the historic market data files seem formatted for human readability. As such, a number of pre-processing steps are necessary to extract the relevant data, then to transform and store it in data structures accepted by our machine learning libraries.32 The purpose of these steps is to get a complete and tidy time series from November 2, 2011 to December 31, 2017 with one row per consecutive hour of power delivery, one column per feature (i.e. input variable), and

31A detailed explanation of atmospheric air flow vectors is provided by George Mason University (n.d.).

32We used the open-source languages R and Python for the pre-processing, and the open-source machine learning libraries scikit-learn and Keras for building the neural networks.

one column for the output value (i.e. the response variable). As this thesis focuses on buyers in the Nordic countries in Elbas, we filter to data related to Nordic bidding areas, where relevant.

For the Elbas ticker data, trades between a Nordic buyer and German seller are also included, as explained in Section 2.1.

For each dataset provided by Nord Pool in Table 1, the data from the raw files are read in for processing using the open-source language R. In order to combine the datasets into one consistent time series with one row per delivery hour, the datasets must have the same grain, or level of detail. Some have a finer grain and must therefore be aggregated, such as the Elbas ticker data, which is recorded at the level of individual trades. Others have a long format in which a given delivery hour is repeated for different values of some variable, and must be transposed to a wide format where such combinations are stored as separate features. The Elspot files, for instance, express spot prices for multiple areas as separate rows for the same delivery hour; these rows must instead be transposed to separate columns on a single row.
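
To illustrate, the following is a minimal pandas sketch of the two reshaping operations described above. The thesis performs this step in R; the column names delivery_hour, area, spot_price, price, and volume are assumed purely for illustration.

```python
import pandas as pd

# Long-format Elspot data: one row per (delivery hour, bidding area) combination.
elspot_long = pd.DataFrame({
    "delivery_hour": pd.to_datetime(["2017-01-01 00:00"] * 2 + ["2017-01-01 01:00"] * 2),
    "area": ["NO1", "SE3", "NO1", "SE3"],
    "spot_price": [24.1, 25.3, 23.8, 24.9],
})

# Transpose to wide format: one row per delivery hour, one spot price column per area.
elspot_wide = elspot_long.pivot(index="delivery_hour", columns="area", values="spot_price")

# Elbas ticker data at trade level: aggregate to one row per delivery hour.
ticker = pd.DataFrame({
    "delivery_hour": pd.to_datetime(["2017-01-01 00:00"] * 3),
    "price": [26.0, 27.5, 25.0],
    "volume": [10.0, 5.0, 20.0],
})
ticker_hourly = ticker.groupby("delivery_hour").agg(
    total_volume=("volume", "sum"),
    mean_price=("price", "mean"),
)
```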

The rest of this section describes the key pre-processing steps. After the Nord Pool datasets have been transformed and compiled into one cohesive time series, the resulting market dataset is exported to Python and pre-processed further for use in modelling. The resulting 328 final input variables are summarised in Table 14 in Appendix F.

Limiting to the data available for market participants

To build models that can feasibly be used in practice, we limit their access to data based on what is realistically available to market participants when making a prediction six trading hours in advance of delivery. As such, the latest available Elbas price is that of the delivery hour that precedes it by six hours,33 hence we add corresponding lagged variables of the price and volume.

Regulating prices and dominating directions are also published one to two hours after delivery (Scharff & Amelin, 2016), hence the related variables are lagged by eight hours.
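
A minimal sketch of how such lags could be constructed with pandas, assuming an hourly-indexed frame with illustrative column names:

```python
import pandas as pd

# Assume one row per consecutive hour of power delivery.
df = pd.DataFrame(
    {"elbas_price": range(24), "elbas_volume": range(24), "regulating_price": range(24)},
    index=pd.date_range("2017-01-01", periods=24, freq="h"),
)

# The latest Elbas information available six trading hours before delivery.
df["elbas_price_lag6"] = df["elbas_price"].shift(6)
df["elbas_volume_lag6"] = df["elbas_volume"].shift(6)

# Regulating market outcomes are published one to two hours after delivery,
# so an eight-hour lag keeps them within what a trader could know at prediction time.
df["regulating_price_lag8"] = df["regulating_price"].shift(8)
```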

Dealing with missing input values

Nearly 3% of the 264 input variables’ values are missing, but the vast majority belong to six features with nearly 100% sparsity.34 Dropping these variables reduces the share of missing values to 0.3%,35 but our

33Market participants can see individual trades, but we aggregate the last six hours into one volume-weighted price. Hence, the latest available such price is that of six hours ago. Section 4.2.1 provides more details.

34Sparse data have a lot of gaps, or missing values.

35Table 16 and Table 19 in Appendix F.2 give an overview of the remaining 46 variables with missing values, and how many values are missing for each.

models cannot handle missing values directly. We therefore replace them with the value zero.36 For each affected feature, we also add a “missing” indicator stating whether the value was missing for the corresponding observation. This combination of zero imputation and missing indicators37 has proven successful in other work, both in deep learning (Lipton, Kale, & Wetzel, 2016) and in linear modelling (Wooldridge, 2016).
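
A minimal sketch of the zero-imputation-plus-indicator approach in pandas; the feature names are illustrative:

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({
    "spot_price_NO1": [24.1, np.nan, 23.8],
    "wind_forecast_DK1": [310.0, 295.0, np.nan],
})

# Add a binary "missing" indicator for every feature that has gaps.
for col in X.columns[X.isna().any()]:
    X[col + "_missing"] = X[col].isna().astype(int)

# Replace the remaining missing values with zero.
X = X.fillna(0)
```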

Outliers and missing output values

Similarly to Lago, De Ridder, Vrancx, and De Schutter (2018), we do not omit outliers of the output variable, in order to train models that are better at predicting significant spikes in the price. However, the dependent variable is missing for 156 of 54,048 observations. When the preceding and succeeding values exist, the missing value is simply imputed linearly. This more than halves the number of missing values, and the remaining gaps are removed by dropping the corresponding dates of power delivery.38 The resulting dataset contains 53,400 observations.
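
A sketch of the interpolation-then-drop logic in pandas, assuming an hourly series y of the output variable; only single-hour gaps with both neighbours present are filled, matching the description above:

```python
import numpy as np
import pandas as pd

y = pd.Series(
    [30.0, np.nan, 32.0, np.nan, np.nan, 31.0],
    index=pd.date_range("2017-01-01", periods=6, freq="h"),
)

# Fill only single-hour gaps where both the preceding and succeeding hours exist;
# linear imputation then equals the midpoint of the two neighbours.
single_gap = y.isna() & y.shift(1).notna() & y.shift(-1).notna()
y_filled = y.copy()
y_filled[single_gap] = ((y.shift(1) + y.shift(-1)) / 2)[single_gap]

# Drop every delivery date that still has missing hours, so each remaining day
# is complete (the sequential models are built on whole days of data).
incomplete_dates = y_filled[y_filled.isna()].index.normalize().unique()
y_clean = y_filled[~y_filled.index.normalize().isin(incomplete_dates)]
```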

One-hot-encoding categorical variables

Similarly to dummy variables in traditional methods, one-hot-encoding splits categorical variables into indicator variables corresponding to each level of the category. This makes it easier for the model to capture their meaning, as it might otherwise treat them as ordinal variables whose relative order of levels holds meaning, such as a quality rating. Thus, we one-hot-encode categorical variables related to the hour, weekday, month, and each Nordic buyer area’s regulating price dominating direction. The number of features therefore increases from 264 to 328.
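
A minimal sketch with pandas; the exact column names are assumptions based on the list above:

```python
import pandas as pd

df = pd.DataFrame({
    "hour": [0, 1, 2],
    "weekday": ["Mon", "Tue", "Wed"],
    "month": [1, 1, 2],
    "dominating_direction_NO1": ["up", "down", "none"],
})

# One binary indicator column per level of each categorical variable.
df_encoded = pd.get_dummies(
    df,
    columns=["hour", "weekday", "month", "dominating_direction_NO1"],
    dtype=int,
)
```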

Standardising continuous numeric variables

Neural networks have been found to perform better when continuous numeric variables have similar scales (Hastie, Tibshirani, & Friedman, 2009). Common approaches are to either scale variables to the range [0, 1], referred to as normalisation, or to centre each variable on zero mean with unit variance, referred to as standardisation.39 Normalising is simpler and makes no assumptions about the distribution of the variables, but is sensitive to outliers. Standardising is less susceptible to outliers, but assumes the variables follow a Gaussian distribution. Most

36For some variables the value 0 already has meaning, in which case replacing missing values with zeroes indubitably adds noise. However, there are so few such cases that we assess the downside to be negligible, compared to using a more convoluted strategy of handling missing values.

37There are more sophisticated methods that could potentially have been used. However, due to time constraints and the limited scale of the missing values, we chose the imputation/indicator approach.

38Alternatively we could drop only the observations, but we would like to build sequential models based on whole days of data. Hence, we drop the 27 dates that contain the remaining 72 missing delivery hours of prices.

39Appendix C provides details.

of our numeric variables are approximately normally distributed,40 hence we standardise them all,41 based on their parameters in the training set.42 Standardisation is done before missing values are replaced, so the non-missing values are unaffected by the imputation. The one-hot-encoded categorical variables and “missing” indicators are excluded, as these are already binary.
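
A sketch of standardising with parameters estimated on the training set only, using scikit-learn's StandardScaler; the column names and the split are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "spot_price_NO1": rng.normal(30, 5, size=100),
    "elbas_volume_lag6": rng.normal(200, 40, size=100),
    "weekday_Mon": rng.integers(0, 2, size=100),  # already binary, left untouched
})

continuous_cols = ["spot_price_NO1", "elbas_volume_lag6"]
train, test = df.iloc[:80].copy(), df.iloc[80:].copy()

# Fit the mean and variance on the training set only, then apply to all sets,
# so no information leaks from the out-of-sample data into training.
scaler = StandardScaler().fit(train[continuous_cols])
train[continuous_cols] = scaler.transform(train[continuous_cols])
test[continuous_cols] = scaler.transform(test[continuous_cols])
```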

4.2.1 Calculating Aggregated Elbas Buyer Prices

As there may be several Elbas trades for each hour of power delivery, we use a volume-weighted average price (VWP) of trades for a given hour of power delivery as the hourly intraday buyer price. This is the output variable our models will be trained to predict. A similar approach is taken by e.g. Knapik (2017) and Tangerås and Mauritzen (2014)43 in their analyses. For each hour of power delivery, $h$, the volume-weighted average price, $VWP_h$, is calculated as:

\[
VWP_h = \frac{\sum_{n=1}^{N} \left( P_{n,h} \times V_{n,h} \right)}{\sum_{n=1}^{N} V_{n,h}} \tag{1}
\]

where $P_{n,h}$ is the price (EUR/MWh) and $V_{n,h}$ is the volume (MWh) of trade $n$ for delivery in hour $h$, and $N$ is the total number of transactions for hour $h$.
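
Equation (1) translated into a short pandas sketch over trade-level ticker data; the column names are assumptions:

```python
import pandas as pd

trades = pd.DataFrame({
    "delivery_hour": pd.to_datetime(
        ["2017-01-01 18:00"] * 3 + ["2017-01-01 19:00"] * 2
    ),
    "price": [30.0, 32.0, 31.0, 29.0, 28.5],  # P_{n,h} in EUR/MWh
    "volume": [10.0, 5.0, 20.0, 7.0, 3.0],    # V_{n,h} in MWh
})

# VWP_h = sum(P_{n,h} * V_{n,h}) / sum(V_{n,h}) over the N trades for hour h.
grouped = trades.assign(pv=trades["price"] * trades["volume"]).groupby("delivery_hour")
vwp = grouped["pv"].sum() / grouped["volume"].sum()
```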

As we predict an aggregate VWP over the six hours remaining until delivery, it is likely that some trades for the corresponding hour have already been settled. This information is available to traders, and we engineer a feature that captures the VWP over all hours of trading up to the point where a prediction is made. We do not divide it into shorter intervals, as this resulted in excessive data sparsity, which would necessitate extensive imputation and make the feature too noisy to be useful. Hence, the target we predict is the near VWP, while this additional feature summarising preceding trades is the far VWP. An illustration of calculating these two prices is provided in Figure 4. The far price for delivery at hour 18 is calculated by aggregating trades settled between when the corresponding Elbas market opened at 14:00 CET the preceding day, and when we make a prediction at the beginning of hour 12. The near price is what our model aims to predict, and aggregates any trades settled from, and including, hour 12 until trading closes at the end of hour 16.
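
A sketch of splitting the trades for one delivery hour into the far and near windows of Figure 4, assuming each trade row carries a settlement timestamp; the column names are illustrative:

```python
import pandas as pd

def vwp(df):
    """Volume-weighted average price over a set of trades."""
    return (df["price"] * df["volume"]).sum() / df["volume"].sum()

# Trades for delivery during hour 18; trade_time is when each trade was settled.
trades = pd.DataFrame({
    "trade_time": pd.to_datetime([
        "2016-12-31 20:15",  # settled the evening before, after the market opened at 14:00
        "2017-01-01 09:40",
        "2017-01-01 13:05",
        "2017-01-01 15:30",
    ]),
    "price": [31.0, 30.5, 32.0, 33.0],
    "volume": [5.0, 10.0, 8.0, 4.0],
})

prediction_time = pd.Timestamp("2017-01-01 12:00")  # beginning of hour 12

# Far VWP: trades settled from market opening up to the prediction time (a known feature).
far_vwp = vwp(trades[trades["trade_time"] < prediction_time])

# Near VWP: the prediction target, trades settled from hour 12 until gate closure.
near_vwp = vwp(trades[trades["trade_time"] >= prediction_time])
```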

40Appendix C.1 shows the distributions of some of the continuous numeric variables.

41A possible improvement could be to instead normalise those variables that are clearly non-normal, but we saw no impact on performance when we tested doing so.

42We thus avoid passing information from the validation or test sets to the training process by not allowing variables’ values and distributions in those sets to influence what the model sees during training. These sets should be out-of-sample, and are therefore scaled using parameters computed on the training set.

43Tangerås and Mauritzen (2014) compute the volume-weighted average at a daily frequency.

Figure 4: Calculation of a far and a near volume-weighted average intraday price (VWP), with the price prediction for power delivered during hour 18 as an example.