

10.1 Theoretical Analysis

This subsection briefly discusses the assumptions and validity of the CQR method. For a more thorough and in-depth discussion, the reader is referred to the paper by (Romano, Patterson, & Candès, 2019).

The CQR algorithm assumes the sample pairs $(X_i, Y_i)$, $i = 1, \ldots, n+1$, to be exchangeable, just like conformal predictors, and constructs prediction intervals with the same marginal coverage guarantee, regardless of the distribution of the data. Additionally, the authors report that if the conformity scores calculated using Eq. (27) are almost surely distinct¹⁰, the resulting intervals are nearly perfectly calibrated, meaning that the actual coverage of the prediction interval is almost identical to its designed coverage level. Having a coverage close to the designed coverage level assures that valid coverage is obtained, and can avoid overly wide prediction intervals by reducing the occurrence of overcoverage.

¹⁰Conformity scores are distinct if no two indices $i \in I_2$ share the same score value.

For overly wide prediction intervals, the actual coverage level can be significantly above the designed coverage level, which is not always preferable. Generally, when constructing prediction intervals, the actual coverage should be approximately equal to the designed coverage, avoiding both undercoverage and overcoverage, to assure that the prediction intervals are representative and informative.
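In practice, the actual (empirical) coverage of a batch of prediction intervals can be checked directly by counting how often the true labels fall inside the intervals. The minimal sketch below assumes arrays of lower and upper interval bounds; all variable names are illustrative.

```python
import numpy as np


def empirical_coverage(y_true, lower, upper):
    """Fraction of test labels that fall inside their prediction intervals."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

# Illustrative usage for intervals designed for 90% coverage (alpha = 0.1):
# a well-calibrated method yields an empirical coverage close to 0.90,
# avoiding both undercoverage (well below 0.90) and overcoverage (well above).
# coverage = empirical_coverage(y_test, lower_bounds, upper_bounds)
```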

In their experiments, the authors focus on CQR in combination with quantile regression neural networks and quantile regression forests, and remark that, when using quantile neural networks as the underlying regression algorithm, the intervals tend to be too conservative, producing unnecessarily wide prediction intervals. They avoid this problem by tuning the nominal quantile levels of the underlying quantile neural networks as additional hyperparameters, which they prove does not invalidate the coverage guarantee.
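As an illustration of this tuning step, the hypothetical sketch below treats the nominal quantile levels as hyperparameters chosen to minimize the average interval width on a held-out set. For brevity, a gradient boosting quantile regressor stands in for the quantile neural network, and the conformalization step (not shown) would afterwards restore the coverage guarantee; all names and the tuning grid are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def fit_quantile_pair(X, y, q_lo, q_hi):
    """Fit lower/upper quantile regressors at nominal levels q_lo and q_hi."""
    lo = GradientBoostingRegressor(loss="quantile", alpha=q_lo).fit(X, y)
    hi = GradientBoostingRegressor(loss="quantile", alpha=q_hi).fit(X, y)
    return lo, hi


def mean_interval_width(lo, hi, X_val):
    """Average width of the (uncalibrated) quantile intervals on held-out data."""
    return float(np.mean(hi.predict(X_val) - lo.predict(X_val)))

# Hypothetical tuning loop for a 90% interval: try several nominal levels and
# keep the pair giving the narrowest intervals; the conformalization step
# restores the designed coverage regardless of the nominal levels chosen.
# best = min(((q, fit_quantile_pair(X_tr, y_tr, q, 1 - q)) for q in (0.02, 0.05, 0.1)),
#            key=lambda p: mean_interval_width(*p[1], X_val))
```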

11 Ensemble Batch Prediction Intervals

(Xu & Xie, 2020) present a conformal prediction-inspired method for building distribution-free prediction intervals for time series, termed Ensemble Batch Prediction Intervals, or EnbPI for short, reporting that their method is suitable for non-stationary, dynamic time series. As described in Section 9, conformal predictors based on either transductive or inductive inference assume the samples to be exchangeable, making them unsuitable for time series. In contrast, the EnbPI method does not assume exchangeability; instead, it places mild assumptions on the error process and the accuracy of the underlying regression algorithm, and can therefore be applied to time series data. Additionally, the method does not require data splitting as in split conformal prediction, which is advantageous for small-sample problems.

The EnbPI method, summarized in Algorithm 3, constructs probabilistic forecasts by aggregating point forecasts produced using bootstrap ensemble estimators. The ensemble estimators produce predictions by applying a regression algorithm to bootstrapped samples drawn from the training data and aggregating the results into a single prediction using the mean aggregation function. The method assumes that the samples, $(x_t, y_t)$, are generated according to a model of the form

$$Y_t = f(X_t) + \epsilon_t, \quad t = 1, 2, 3, \ldots \qquad (28)$$

The goal of the underlying regression algorithm is to estimate the function $f$ using leave-one-out estimators $\hat{f}_{-t}$, where the leave-one-out estimator $\hat{f}_{-i}$ excludes the $i$-th training sample $(x_i, y_i)$ from its training dataset. The prediction intervals produced by the EnbPI algorithm are of the following form:

$$C^{\alpha}_{T,t} = \hat{f}_{-t}(x_t) \pm (1-\alpha) \text{ quantile of } \{\hat{\epsilon}_i\}_{i=t-T}^{t-1} \qquad (29)$$

The symmetric prediction interval is centered at the point prediction $\hat{f}_{-t}(x_t)$, with a half-width equal to the $(1-\alpha)$-th empirical quantile of the latest $T$ available residuals, i.e. the quantile of a list of length $T$, indexed by $i$. The residuals are calculated using the absolute error between the training sample labels and the leave-one-out estimators as the conformity score, defined as

$$\hat{\epsilon}_i = |y_i - \hat{f}_{-i}(x_i)|.$$
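As a minimal numerical sketch of Eq. (29), the half-width of the interval is the $(1-\alpha)$ empirical quantile of the latest $T$ absolute leave-one-out residuals. The helper below assumes the residuals and the centre prediction are already available; all names are illustrative.

```python
import numpy as np


def enbpi_interval(center, residuals, alpha):
    """Symmetric interval centred at `center`, whose half-width is the
    (1 - alpha) empirical quantile of the supplied absolute residuals."""
    half_width = np.quantile(residuals, 1 - alpha)
    return center - half_width, center + half_width

# Conformity scores as in the equation above: eps_i = |y_i - f_{-i}(x_i)|.
# residuals = np.abs(y_train - loo_predictions)       # latest T values
# lower, upper = enbpi_interval(point_prediction, residuals, alpha=0.1)
```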

Algorithm 3: Ensemble Batch Prediction Intervals (EnbPI)

input : Training data $\{(x_i, y_i)\}_{i=1}^{T}$, regression algorithm $A$, confidence level $\alpha$, aggregation function $\phi$, number of bootstrap models $B$, batch size $s$, and test data $\{(x_t, y_t)\}_{t=T+1}^{T+T_1}$, with $y_t$ revealed only after the batch of $s$ prediction intervals with $t$ in the batch are constructed.

output: Ensemble prediction intervals $\{C^{\phi,\alpha}_{T,t}(x_t)\}_{t=T+1}^{T+T_1}$

Similarly to conformal prediction, the EnbPI method is used in an online setting, but includes a batch size parameter, s, determining the rate at which the model receives feedback. The feedback allows the method to be adaptive to dynamic time series while only being fitted once: the list of available residuals is updated after every s predicted time steps. When the model receives feedback, the list containing the latest T available residuals is updated, where the s new absolute residuals between the predicted and actual observations in the test dataset are added, and the s earliest residuals are removed.

For s = 1, the prediction intervals are built sequentially, with each new sample point (x, y) presented to the model immediately, but the batch size can also be increased, i.e. s > 1. If the model never receives feedback, i.e. s = ∞, the prediction intervals are all based on the training residuals, resulting in intervals of equal width for all time steps in the test data.

Producing prediction intervals with a fixed width is often unsatisfactory, and the authors therefore recommend keeping the batch size parameter as small as possible, although its value should depend on the data collection process.
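To make the overall procedure concrete, the following minimal sketch puts the pieces together in Python: a bootstrap ensemble approximating the leave-one-out estimators via out-of-bag aggregation, a window of the latest T absolute residuals, and feedback after every s predicted time steps. It is an illustrative simplification of Algorithm 3, not a faithful reproduction of the authors' implementation; the function and parameter names are our own, and a random forest stands in for a generic regression algorithm A.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor


def enbpi_sketch(X_train, y_train, X_test, y_test, alpha=0.1, B=20, s=1,
                 base_model=None, seed=0):
    """Simplified EnbPI-style loop: bootstrap ensemble, out-of-bag residuals,
    and a sliding residual window updated every s steps (illustrative only)."""
    rng = np.random.default_rng(seed)
    base_model = base_model or RandomForestRegressor(n_estimators=50)
    T = len(y_train)

    # Fit B bootstrap models and record which training indices each model saw.
    models, in_bag = [], []
    for _ in range(B):
        idx = rng.integers(0, T, T)
        models.append(clone(base_model).fit(X_train[idx], y_train[idx]))
        in_bag.append(set(idx.tolist()))

    # Out-of-bag (approximate leave-one-out) residuals on the training data.
    residuals = []
    for i in range(T):
        oob = [m.predict(X_train[i:i + 1])[0]
               for m, bag in zip(models, in_bag) if i not in bag]
        pred = np.mean(oob) if oob else np.mean(
            [m.predict(X_train[i:i + 1])[0] for m in models])
        residuals.append(abs(y_train[i] - pred))

    # Online prediction with batch feedback of size s.
    intervals, new_res = [], []
    for t in range(len(y_test)):
        center = np.mean([m.predict(X_test[t:t + 1])[0] for m in models])
        width = np.quantile(residuals, 1 - alpha)
        intervals.append((center - width, center + width))
        new_res.append(abs(y_test[t] - center))
        if len(new_res) == s:  # feedback: add s new residuals, drop s oldest
            residuals = residuals[s:] + new_res
            new_res = []
    return intervals
```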

11.1 Theoretical Analysis

This subsection briefly discusses the assumptions and validity of the EnbPI method. For a more thorough and in-depth discussion, the reader is referred to Section 4 in the paper by (Xu & Xie, 2020).

As mentioned above, the data-generating process of the time series modeled by the EnbPI algorithm is assumed to follow a model of the form:

$$Y_t = f(X_t) + \epsilon_t, \quad t = 1, 2, 3, \ldots,$$

where mild assumptions are made on the time series' stochastic errors and on the underlying regression algorithms. The error process $\{\epsilon_t\}_{t \geq 1}$ is assumed to be stationary and strongly mixing, replacing the exchangeability assumption required by conformal predictors. The term strong mixing was introduced by (M. Rosenblatt, 1956) and refers to asymptotic independence: a stochastic process is strongly mixing if the dependence between $X(t)$ and $X(t+T)$ goes to zero as the number of time steps between the two observations increases. The authors state that a highly non-stationary time series that exhibits arbitrary dependence can still be strongly mixing, or even have independent and identically distributed errors, and argue that the assumption made on the time series' error process is mild and general, even verifiable (Xu & Xie, 2020).
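For completeness, a common way of formalizing strong ($\alpha$-)mixing for the error process is via the mixing coefficients

$$\alpha(k) = \sup_{t \geq 1} \, \sup \left\{\, |P(A \cap B) - P(A)P(B)| \, : \, A \in \sigma(\epsilon_1, \ldots, \epsilon_t), \; B \in \sigma(\epsilon_{t+k}, \epsilon_{t+k+1}, \ldots) \,\right\},$$

where the process is strongly mixing if $\alpha(k) \to 0$ as $k \to \infty$. This standard textbook formulation is included here only to make the notion of asymptotic independence concrete; the precise conditions used by (Xu & Xie, 2020) differ in their details.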

Further, the estimated errors, $\hat{\epsilon}_t$, are assumed to be close to the true errors, $\epsilon_t$. For this assumption to be valid, overfitting must be avoided. To ensure that the estimated residuals resemble the test residuals, out-of-sample training residuals obtained via leave-one-out training estimators are used. Some regression algorithms, such as neural networks, construct the optimal model by finding the model parameters that minimize the training error. The in-sample training errors are therefore often small compared to out-of-sample errors, and by using the out-of-sample training residuals during the construction of the prediction intervals in Eq. (29), unrepresentative residuals are avoided.

The ensemble learners are used to estimate the unknown model $f$. The ensemble regression algorithms are only trained once and are used to predict the center of the prediction intervals for future time steps. Hence, the assumption placed on the ensemble learners is that they must model $f$ with satisfactory accuracy. The authors report that, in practice, this assumption can fail when the batch size parameter $s$ is large and time steps far into the future are predicted. The characteristics of non-stationary, dynamic time series can change significantly over time, reaching change points that alter the underlying model $f$ and rendering the predictions of the ensemble models unrepresentative of the new $f$ for $t > T$. However, valid coverage can still be obtained if a small batch size parameter is used, although the resulting intervals become inflated if the out-of-sample absolute residuals are large.

The EnbPI algorithm constructs approximately marginally valid prediction intervals, if the assumptions made about the error process and the underlying regression algorithms hold, and does so without assuming data exchangeability, which makes it suitable for time series data.

12 Proposed Method: Ensemble Conformalized Quantile Regression