**State of the Art**

**2.5 Background of Forecasting**

The different energy vectors and their prices are highly variable over time. Therefore, having reliable forecasts is critical for producers, consumers, and retailers. To optimally self-schedule production units, the operator needs accurate forecasts of prices before bidding time [76]. To optimally operate the power plant, the manager needs accurate forecasts of energy demands before scheduling the production. Moreover, if the generation depends on renewable sources driven by solar irradiation or wind, forecasts are necessary for scheduling production on the energy market.

Time series analysis is one way to approach such problems. This forecasting methodology focuses on the historical behavior of a dependent variable. Therefore, time series models can be used to analyse and predict future movements based on the past behavior of the dependent variable. Several forecasting methods have been used in the energy field, and literature surveys are presented in [77, 78]. Single time series forecasting models can be grouped as follows [79]:

Stochastic models

Regression models

Artificial intelligence based models

Stochastic models are inspired by the financial literature and are widely applied to forecasting energy-related indices and variables. Several stochastic models have been employed for modelling and forecasting: Random Walk [80], Mean Reverting Processes [81], Brownian Motion Processes [82], Ornstein–Uhlenbeck Processes [82], Inverse Gaussian Process [83], and Jump Diffusion Processes [84].
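As an illustration of how two of these processes relate, the following minimal sketch simulates a discretised mean-reverting (Ornstein–Uhlenbeck type) path. The Euler-type discretisation, the function name `simulate_ou`, and all parameter names are assumptions made for this example and are not taken from the cited works.

```python
import numpy as np

def simulate_ou(x0, mu, theta, sigma, n_steps, dt=1.0, seed=0):
    """Simulate a discretised mean-reverting path (illustrative only).

    x0: initial value, mu: long-run mean, theta: speed of mean reversion,
    sigma: noise scale. All names are hypothetical.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        # Random shock scaled by the time step
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        # Pull towards mu proportional to the current distance from it
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + noise
    return x

# A price-like path that reverts towards 50
path = simulate_ou(x0=80.0, mu=50.0, theta=0.2, sigma=2.0, n_steps=100)
```

With `theta = 0` the pull towards the mean vanishes and the same recursion reduces to a random walk, which is why the two families are often presented together.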

Regression-type models are based on the relationship between the dependent variable and a number of exogenous variables that are known or can be estimated. The most common approaches that employ regression models are autoregressive integrated moving average (ARIMA) models [85, 86, 87] and generalized autoregressive conditional heteroskedasticity (GARCH)-family models [80, 88].

For more than half a century, ARIMA models have dominated many areas of time series forecasting. This regression model is fitted to time series data for forecasting purposes. It is composed of an autoregressive part of order p, a moving average part of order q, and a degree of differencing D (the number of times the series is differenced to make it stationary), and it is written as ARIMA(p, D, q). In an ARIMA model, the future value of a variable is assumed to be a linear function of several past observations and random errors. However, stationarity is a necessary condition for building an ARIMA model used for forecasting. A stationary time series is characterised by statistical properties, such as the mean and the autocorrelation structure, that are constant over time [89].
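The role of differencing and of the linear dependence on past observations can be sketched in a few lines. The example below is a simplified illustration, not the estimation procedure used in this thesis: it assumes D = 1 and q = 0 (no moving average part), fits the autoregressive coefficients by ordinary least squares, and uses the hypothetical helper names `fit_ar` and `forecast_next`.

```python
import numpy as np

def fit_ar(z, p):
    """Least-squares fit of z[t] = c + phi_1*z[t-1] + ... + phi_p*z[t-p]."""
    # Each row holds [1, z[t-1], ..., z[t-p]] for one target z[t]
    rows = [np.concatenate(([1.0], z[t - p:t][::-1])) for t in range(p, len(z))]
    X = np.array(rows)
    coef, *_ = np.linalg.lstsq(X, z[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]

def forecast_next(y, p=1):
    """One-step forecast of y with an AR(p) model on the first difference."""
    y = np.asarray(y, dtype=float)
    z = np.diff(y)                      # D = 1: difference once to remove the trend
    coef = fit_ar(z, p)
    # Linear function of the p most recent differences
    z_next = coef[0] + coef[1:] @ z[-p:][::-1]
    return y[-1] + z_next               # undo the differencing

# Trending series whose increments follow z[t] = 1 + 0.5*z[t-1]
z = np.empty(20)
z[0] = 0.0
for t in range(1, 20):
    z[t] = 1.0 + 0.5 * z[t - 1]
y = np.concatenate(([0.0], np.cumsum(z)))
next_value = forecast_next(y, p=1)
```

A full ARIMA implementation would also estimate the moving average terms, which requires an iterative likelihood-based fit rather than a single least-squares step.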

In recent times, artificial intelligence models have been extensively used to capture unknown or excessively complex structures in the time series. These models are growing in popularity. Examples include artificial neural networks (ANNs), support vector regression, wavelets, and genetically evolved models.

In the past, ANNs were used extensively for oil price forecasting; currently, they are used across the whole energy field. Since the late 1990s, ANNs have been applied to energy forecasting, with different variables used as inputs to predict generation values [90]. However, designing an ANN model under a specific set of design constraints relies on expertise with similar applications and is subject to trial and error [91].

Artificial neural networks can model richer dynamics and approximate any continuous function of inputs. Evidence of the efficiency of these models can be found in [92, 93, 94]. Artificial neural networks have been found to outperform regression models at high resolutions [95, 96]. However, training such networks can be substantially difficult, as a large number of iterations may be required before the network converges [96].

Nonlinear autoregressive neural networks with exogenous inputs (NARX) are recurrent neural networks built on the autoregressive model with exogenous variables (ARX). The NARX model relates the current value of a time series to its own past values and to the current and past values of the exogenous series that influences the series of interest. Another common ANN approach which enables the discovery of the relationship between the inputs and the output data is the multi-layer perceptron network [95].

In a multi-layer perceptron, neurons are grouped in layers. Only forward connections exist, which creates a structure that can learn and model a phenomenon. To produce a forecast, a fixed number of past values are set as inputs. The output is the forecast of the future value in the time series [97]. In this thesis, the ANN model used is the NARX configuration, because it has been shown to outperform the multi-layer perceptron [98].
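The way a fixed number of past values becomes the input of such a network can be made concrete with a small sketch. The code below only builds the input matrix of a NARX-style model (lagged values of the series plus current and lagged values of an exogenous series); the network itself is omitted. The function name `build_narx_inputs` and the lag parameters `n_y` and `n_u` are hypothetical and chosen for this illustration.

```python
import numpy as np

def build_narx_inputs(y, u, n_y, n_u):
    """Build (X, target) pairs for a NARX-style regressor.

    Each row of X holds y[t-1], ..., y[t-n_y] followed by
    u[t], u[t-1], ..., u[t-n_u]; the matching target is y[t].
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    if len(y) != len(u):
        raise ValueError("y and u must have the same length")
    start = max(n_y, n_u)               # first t with all lags available
    rows = []
    for t in range(start, len(y)):
        past_y = y[t - n_y:t][::-1]     # most recent value first
        exo = u[t - n_u:t + 1][::-1]    # includes the current value u[t]
        rows.append(np.concatenate((past_y, exo)))
    return np.array(rows), y[start:]

# Toy series: two lags of y, current and one lagged value of u
X, target = build_narx_inputs(y=[0, 1, 2, 3, 4, 5],
                              u=[10, 11, 12, 13, 14, 15],
                              n_y=2, n_u=1)
```

Setting `n_u` aside and keeping only the lagged values of `y` gives the purely autoregressive input layout described for the multi-layer perceptron.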

In this thesis, two different kinds of models are selected due to their high performance in forecasting time series. Furthermore, because explanatory variables have been shown to improve the accuracy of energy forecasts, both models are supported with an explanatory variable [99]. The specific models used and improved by the explanatory variable are as follows:

Based on regression models: ARIMAX [100, 96, 101]

Based on artificial intelligence models: ANN [102, 103]

### 2.5.1 Explanatory Variable

Both forecasting models accept, and are improved by, a time series that
is related to the one being forecast. Such time series are referred to as
explanatory or exogenous variables. The correlation between the two time
series is studied by means of a Pearson or $R^{2}$ correlation study. The
Pearson coefficient ranges over [-1, 1], where 1 is a direct and perfect
correlation and -1 is a perfect and inverse correlation; $R^{2}$ ranges over
[0, 1] and does not indicate the direction of the relation. Figure 2.8
depicts some examples.

Figure 2.8: Examples of the correlation of two variables
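A minimal sketch of the Pearson correlation between a main series and a candidate explanatory series is given below. The function name `pearson_r` and the toy data are assumptions for illustration only; for a simple linear fit between two series, $R^{2}$ equals the square of the Pearson coefficient.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                   # centre both series
    yc = y - y.mean()
    # Covariance normalised by the product of the standard deviations
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical data: demand versus a candidate explanatory series
demand = [20.0, 22.0, 25.0, 30.0, 28.0]
temperature = [5.0, 6.0, 8.0, 11.0, 10.0]
r = pearson_r(demand, temperature)
r_squared = r ** 2                      # loses the sign of the relation
```

A value of `r` close to 1 or -1 suggests that the candidate series carries information about the main series, while a value near 0 suggests it is unlikely to help a linear model.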

The time series used as explanatory variables should have the following features:

Have a direct or inverse relation with the main time series

Have the same data granularity as the main time series

Have as many future values as the number of values to be forecast