Evaluation of distribution mapping based bias correction methods

NCCS report no. 1/2014

Asgeir Sorteberg¹,², Ingjerd Haddeland³, Jan Erik Haugen⁴, Stefan Sobolowski⁵,² and Wai K. Wong³

¹Geophysical Institute, University of Bergen, ²Bjerknes Centre for Climate Research, University of Bergen, ³Norwegian Water Resources and Energy Directorate, ⁴Norwegian Meteorological Institute, ⁵UNI Research

Norwegian Centre for Climate Services (NCCS) is a collaboration between the Norwegian Meteorological Institute, the Norwegian Water Resources and Energy Directorate and Uni Research.

The project's main purpose is to provide decision makers in Norway with information relevant for climate change adaptation. In addition to the partners, the Norwegian Environment Agency is represented on the Board.

The NCCS report series includes not only reports where one or more authors are affiliated with the Centre, but also reports that the Centre has initiated. All reports in the series have undergone a professional assessment by at least one expert associated with the Centre. Reports in this series can also be included in report series from the institutions with which the main authors are affiliated.

Title: Evaluation of distribution mapping based bias correction methods

Date: 2014-11-20

ISSN no.: 2387-3027

Report no.: 1/2014

Author(s): Asgeir Sorteberg, Ingjerd Haddeland, Jan Erik Haugen, Stefan Sobolowski and Wai K. Wong

Classification: ● Free ○ Restricted

Client(s): NCCS

Client's reference:

Abstract

The report outlines some of the challenges in bias correcting climate model output and tests several variants of distribution mapping based correction methods. This is done for three sites in Norway using data from one regional climate model. The main findings are that while the corrections generally improved the fit to the observations, they did not do so in about one fifth of the validation measures. The methods did a reasonable job of conserving the climate change signal: the corrected signal was within ±2% of the original in 29-39% of the individual climate change measures. In around 13-19% of the climate change measures the signal deviated by more than 10% from the original climate change signal after the bias correction was performed. Correcting the wet day frequency introduced a systematic overestimation of the positive changes and underestimation of the negative changes after bias correction. In general no method was superior to the others, but some methods were better than others at correcting the data towards the observations, and some did a better job than others of conserving the climate change signal.

Keywords

Bias correction, precipitation, downscaling, regional climate models

Disciplinary signature Responsible signature

Contents

1. Introduction
2. Considerations in doing bias corrections
3. Correction methods and evaluation procedure
4. Conclusions

Abstract

1. Introduction

This short note compares a few bias correction methods to see how much they differ in their ability to reproduce the observed statistics without introducing artificial climate change signals.

What is bias correction?

Pure bias corrections should ideally correct the discrepancy between a model and reality on the scales resolved by the model. According to the recommendation of the Joint Working Group on Forecast Verification Research (JWGFVR), the comparison should be performed between gridded data sets with the grid resolution of the models degraded by a factor of 3–4 to take into account numerical filter effects.

What is downscaling?

In contrast, downscaling attempts to resolve the scale discrepancy between the resolution required for impact assessment and the model’s resolution.

Mixing bias correction and downscaling

In the literature the distinction between bias correction and downscaling is less clear, as many bias correction attempts also include a downscaling component (e.g. bias correction of coarse resolution data to a point location or a finer grid).

Why bias correction?

The most obvious reason for bias correcting model output is that imperfect model representations seriously hamper the quality of impact models. Thus, in some cases one might argue that the loss of physical consistency between variables, which will occur in the corrected dataset, is less serious than a present day climate that is very far from reality.

Methods for reducing the biases:

1. Reduce the bias by improving the models.

2. Develop multi-model ensembles (averaging over the ensemble tends to reduce the bias compared to single model approaches, and in addition it provides an uncertainty estimate).

3. Model output statistics (MOS), which masks the model biases in a post processing step.

While the three methods above are pure bias correction methods, we may also add a fourth method, empirical-statistical downscaling (ESD), which is a perfect prognosis approach and not a bias correction method per se. Here some aspects of the model output are considered perfect (usually the large scale flow), and the intention is to establish links between observed large scale predictors and observed local scale predictands rather than to correct model errors.

Most used bias correction methods:

• Delta change approach
• Multiple linear regression
• Variance scaling
• Power transformation
• Local intensity scaling
• Distribution mapping

2. Considerations in doing bias corrections

Suitability of observationally based estimates

How close are the observationally based estimates to the truth? Observation-derived data may be biased due to non-representativeness of the underlying observations, instrument limitations (such as undercatch for precipitation) and assumptions made in the interpolation procedure.

Are there inhomogeneities in the observed dataset? Does the time series suffer from changing variability as the observational network changes? For example, extremes interpolated with influence from three stations may differ from those based on two, and the variability in a data-sparse region may change when it becomes influenced by a new station at some point in time.

Suitability of selected RCM/GCM

Is the RCM/GCM simulation chosen a suitable starting point for predicting climate change on the smaller scales? Does it include the wanted processes (explicitly or through some sort of parameterization)? Does the simulated control climate resemble the observed?

Misrepresentation of event size

Does the model data represent the scales of the events correctly? For example, is the bias correction method able to deal with possible overestimation of the extent of convective cells (minimum one grid cell) and with spatial misplacement of precipitation events, e.g. due to non-resolved orographic effects?

Temporal correlations

Are temporal correlations in the model correct? For example, if a model has the wrong length of dry and wet spells, this should be accounted for in the bias correction.

Spatial correlations

Are spatial correlations in the model correct? Most bias correction methods are not designed to correct errors in spatial correlations, since the bias corrected data will inherit much of the spatial correlation structure of the original model data. This is particularly true when the bias correction also acts as a downscaling method (for example bias correcting a 50 km grid to a point value or a 1 km grid square). As this is often done without correcting the spatial covariability, the area-mean frequency of dry versus wet days gets overcorrected and the area-mean extremes are overestimated, even if the distributions look reasonable at the individual points. Randomization in the form of adding uncorrelated noise might weaken the spatial coherence and partly alleviate the problem.

Stationarity and the choice of length of the control period

Does the finite length control period used to derive the bias correction cover the entire spectrum of variability? Most bias correction methods correct the distribution of the variable over a selected time interval. This distribution is, in fact, a mixture of various other distributions, depending on the different weather conditions. A very long control period is needed if decadal to multidecadal variability is seen in the observations. Additionally, since the relative frequency of different weather conditions might change in a future climate, the resulting mixed distribution might also change. Thus the corrections are potentially not valid under climate change. Does the method account for this?

Physical Consistency

Does the method correct the different variables in a physically consistent way? Links and feedbacks between the meteorological parameters are often broken with bias corrections.

A revealing example is that bias correcting maximum and minimum temperatures may end up with minimum temperatures being higher than the maximum on certain days. Various analog methods or having different corrections for different weather types have been shown to alleviate this problem to some extent. In the end the loss of physical consistency may have severe effects on things like snowmelt and evapotranspiration.

Selection of timescale for bias correction

At what time scale should the data be corrected? Most bias correction methods determine and correct biases at one aggregation time (days, months or seasons), thus assuming that biases occur mainly on this time scale. For example, bias correcting daily data may induce very different decadal variations in the bias corrected data than in the original model.

3. Correction methods and evaluation procedure

Table 1 lists the correction methods tested. All methods in Table 1 correct the model’s control data (1971-2000) towards the observed data (1971-2000) for each month individually. The exception is the EmpQuantERA method, which corrects the control simulation towards a simulation with the same model using “perfect” boundaries (ERA Interim).

Table 1: Summary of the different bias correction methods used. For each method the table gives the short name, the group, and short descriptions of the treatment of the control run and the scenario run.

EmpQuant (Empirical quantile mapping; group: NVE)
  Control run: Estimate empirical CDF for observed and RCM intensities for a fixed set of quantiles and develop a correction based on the differences. Wet day frequency is also corrected by removing RCM rainy days below a threshold.
  Scenario run: Correction developed for the control period applied to the RCM run for both intensities and frequencies. If intensities exceed empirical quantile values, extrapolation using tricubic spline is applied.

EmpQuant2 (Empirical quantile mapping; group: MET)
  Control run: Similar to EmpQuant, with empirical CDF for observed and RCM intensities for a fixed (201) set of quantiles estimated for the full precipitation distribution. Linear interpolation between quantiles.
  Scenario run: Correction developed for the control period applied to the full distribution with linear and bounded extrapolation for intensities outside the range of empirical quantiles.

EmpQuantERA (Empirical quantile mapping; group: MET)
  Control run: As EmpQuant2, but use of additional RCM data forced by re-analysis data during training. Will allow for additional GCM uncertainty, e.g. due to natural climate variability in the control period.
  Scenario run: As EmpQuant2.

BernGam (Bernoulli-gamma distribution; group: NVE)
  Control run: Find model’s wet-day threshold so wet-day frequency equals the observed for each month. Corrects the intensity of the wet days by fitting the control data to a gamma distribution which is based on the observed data.
  Scenario run: Correction developed for the control period applied to the RCM run for both intensities and frequencies.

DouGam (Double gamma distribution; group: NVE)
  Control run: Find model’s wet-day threshold so wet-day frequency equals the observed for each month. Corrects the intensity of the wet days by fitting the control data to a gamma distribution which is based on the observed data. Values over the 95th percentile are corrected using a second gamma function.
  Scenario run: Correction developed for the control period applied to the RCM run for both intensities and frequencies.

EmpGamPar (Empirical-mixed gamma-Pareto distribution; group: GFI)
  Control run: Find model’s wet-day threshold so wet-day frequency equals the observed for each month. Corrects the intensity of the wet days by fitting the control data to a mixed gamma-Pareto distribution which is based on the observed data.
  Scenario run: Apply wet-day threshold found for control run. Climate change signal for the whole distribution found by using the ratio between the values where the cumulative distribution functions of the control and scenario have the same value. Intensity of the wet days corrected by fitting the scenario data to the observed mixed gamma-Pareto distribution and multiplying back the climate change signal.
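As an illustration of the empirical quantile mapping idea summarised in Table 1, the following Python sketch estimates a fixed set of empirical quantiles for the observed and control-run data and maps a model value onto the observed distribution by linear interpolation between quantiles. It is a minimal sketch and not the code used for EmpQuant or EmpQuant2: the function names, the quantile estimator and the handling of values outside the control range (simple clamping, standing in for the extrapolation schemes mentioned in the table) are assumptions made for illustration.

```python
from bisect import bisect_right


def empirical_quantiles(values, n_quantiles=201):
    """Return n_quantiles evenly spaced empirical quantiles of a sample."""
    data = sorted(values)
    last = len(data) - 1
    quantiles = []
    for k in range(n_quantiles):
        pos = last * k / (n_quantiles - 1)  # fractional index into the sorted sample
        lo = int(pos)
        hi = min(lo + 1, last)
        quantiles.append(data[lo] + (data[hi] - data[lo]) * (pos - lo))
    return quantiles


def quantile_map(value, model_q, obs_q):
    """Map a model value to the observed value at the same empirical quantile."""
    # Assumption: values outside the control range are clamped to the
    # outermost observed quantiles (a crude form of bounded extrapolation).
    if value <= model_q[0]:
        return obs_q[0]
    if value >= model_q[-1]:
        return obs_q[-1]
    i = bisect_right(model_q, value) - 1  # model_q[i] <= value < model_q[i + 1]
    frac = (value - model_q[i]) / (model_q[i + 1] - model_q[i])
    return obs_q[i] + frac * (obs_q[i + 1] - obs_q[i])
```

In use, `model_q` and `obs_q` would be estimated once per calendar month from the control run and the observations, and `quantile_map` would then be applied to each daily value of the control or scenario run for that month.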

Validation sites: Bergen, Oslo and Tromsø

The bias corrections are validated by the following measures:

1. Ability to reproduce today’s climate.

Table 2 shows the 24 validation measures selected for validating the ability of the different methods to reproduce today’s climate. We select validation measures that test the different parts of the distribution, and some measures to see if the temporal evolution of the time series looks reasonable (length of dry and wet spells, variance and lagged autocorrelations). It should be noted that this validation point is not unproblematic, as it is unclear to what extent the corrected data should follow the observations. The observations from the (often randomly chosen) historic period provide only an estimate of the underlying statistical population of the variable, and there will be biases in the observed statistics associated with the often short record length from which the empirical or theoretical probability distributions were estimated. At the tails of the distribution the problem is potentially important even if the observed probability distribution is calculated from a large sample, since the tails are often sensitive to the addition of a few new observational extremes.

Table 2: Validation measures for validating the ability of the different methods to reproduce today’s climate (24 measures). All measures are Mean Errors: bias corrected vs observations.

1. Wet day frequency (days)
2. Mean length of wet spell (days)
3. Maximum length of wet spell (days)
4. Mean length of dry spell (days)
5. Maximum length of dry spell (days)
6. Mean 1-day precipitation (mm)
7. Mean 1-day intensity on wet days (mm)
8. 25 percentile 1-day intensity on wet days (mm)
9. 75 percentile 1-day intensity on wet days (mm)
10. 95 percentile 1-day intensity on wet days (mm)
11. 99 percentile 1-day intensity on wet days (mm)
12. 99.5 percentile 1-day intensity on wet days (mm)
13. Maximum 1-day precipitation (mm)
14. Variance 1-day precipitation (mm²)
15. 1-day lagged autocorrelation in 1-day precipitation
16. Accum. 10-day precipitation (mm)
17. 25 percentile 10-day accum. precipitation (mm)
18. 75 percentile 10-day accum. precipitation (mm)
19. 95 percentile 10-day accum. precipitation (mm)
20. 99 percentile 10-day accum. precipitation (mm)
21. 99.5 percentile 10-day accum. precipitation (mm)
22. Maximum 10-day accum. precipitation (mm)
23. Variance 10-day precipitation (mm²)
24. 5-day lagged autocorrelation in 10-day accum. precipitation
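Two of the temporal measures in Table 2, spell lengths and lagged autocorrelation, can be computed along the following lines. This is a hedged sketch rather than the procedure used in the report: the 1 mm wet-day threshold, the function names and the choice of autocorrelation estimator are assumptions.

```python
from itertools import groupby


def spell_lengths(precip, wet, threshold=1.0):
    """Lengths of consecutive wet (wet=True) or dry (wet=False) runs.

    Assumption: a day counts as wet when precipitation >= threshold (mm).
    """
    flags = [p >= threshold for p in precip]
    return [len(list(run)) for is_wet, run in groupby(flags) if is_wet == wet]


def lagged_autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag, normalised by the total variance."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var
```

The mean and maximum spell lengths then follow from `sum(lengths) / len(lengths)` and `max(lengths)` for each month and site.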

2. Ability to reproduce the original climate change signal.

A bias correction method should ideally reproduce some aspects of the original climate change signal. If it doesn’t, the whole point of running the RCM in the first place is lost (if we want to adjust the climate change signal, we should probably employ a downscaling technique, not a bias correction method). The most intuitive choice is to conserve the relative precipitation signal. We select validation measures that test whether changes in the different parts of the distribution are conserved (Table 3).

Table 3: Validation measures for validating the ability to reproduce the original climate change signal (15 measures). All measures are Mean Errors in the scenario-control change, original vs corrected, in %.

1. 1-day mean precipitation (%)
2. 1-day mean intensity on wet days (%)
3. 25 percentile 1-day intensity on wet days (%)
4. 75 percentile 1-day intensity on wet days (%)
5. 95 percentile 1-day intensity on wet days (%)
6. 99 percentile 1-day intensity on wet days (%)
7. 99.5 percentile 1-day intensity on wet days (%)
8. Max 1-day intensity on wet days (%)
9. 10-day mean accum. precipitation (%)
10. 25 percentile 10-day accum. precipitation (%)
11. 75 percentile 10-day accum. precipitation (%)
12. 95 percentile 10-day accum. precipitation (%)
13. 99 percentile 10-day accum. precipitation (%)
14. 99.5 percentile 10-day accum. precipitation (%)
15. Max 10-day accum. precipitation (%)

Validation Procedure:

The 39 validation measures are calculated for each month and for each station. The bias correction methods are ranked for each month and each validation measure at each validation site, and the rankings are then averaged. This means that the ranking of the ability to reproduce today’s climate is based on 864 (3 sites * 12 months * 24 measures) individual rankings, and the ranking of the ability to reproduce the original climate change signal is based on 540 (3 sites * 12 months * 15 measures) individual rankings. In the final overall ranking, the ability to reproduce today’s climate and the ability to reproduce the original climate change signal are weighted equally.
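The ranking procedure described above can be sketched as follows. Each case is one combination of site, month and validation measure; the methods are ranked within the case by the size of their error, and the ranks are then averaged over all cases. The function names and the tie handling (tied methods share the mean rank) are assumptions, since the report does not say how ties are treated.

```python
def rank_methods(errors):
    """Rank methods by absolute error for one case; rank 1 is best.

    errors maps method name -> signed error. Ties share the mean rank (assumption).
    """
    ordered = sorted(errors, key=lambda m: abs(errors[m]))
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(errors[ordered[j + 1]]) == abs(errors[ordered[i]]):
            j += 1
        for method in ordered[i:j + 1]:
            ranks[method] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def average_ranking(cases):
    """Average the per-case ranks over all (site, month, measure) cases."""
    totals = {}
    for errors in cases:
        for method, rank in rank_methods(errors).items():
            totals[method] = totals.get(method, 0.0) + rank
    return {method: total / len(cases) for method, total in totals.items()}
```

For the evaluation of today’s climate, `cases` would hold 3 * 12 * 24 = 864 dictionaries, and for the climate change signal 3 * 12 * 15 = 540.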

The validation procedure described above only addresses a few of the issues that should be considered in doing bias corrections. It does not assess the suitability of the observationally based estimates or of the selected RCM, nor does it investigate the issue of physical consistency. As we only validate at points, we do not address the ability of the methods to correct errors in spatial correlations, or the model’s ability to reproduce areal precipitation and conserve changes in areal precipitation.

Correction of the original data

How many of the validation measures are improved with bias correction?

To get an overview of the results, the fraction of the validation measures that is improved after bias correction is given in Figure 1. The figure shows how often the corrected data is better than the original for the 864 individual evaluations. Up to 80% of the individual validations show an improvement with the DouGam and EmpQuant2 corrections, while the fraction is slightly lower for the BernGam, EmpGamPar and EmpQuant methods. As expected, EmpQuantERA lies below the others, as this method does not correct the RCM data towards the observations but towards the perfect boundary simulation of the selected RCM (using ERA Interim as boundary conditions). It is surprising that in 20-25% of the validation measures the bias correction actually deteriorates the original data.

Figure 1: Percentage of the 864 individual evaluations where the corrected data is performing better than the original.

Is one method better than the others in correcting the data?

Figure 2 shows the average ranking, based on 864 individual rankings, of the ability to reproduce today’s climate. DouGam, EmpQuant and EmpQuant2 perform rather similarly and are slightly better than BernGam and EmpGamPar because they perform better for the 10-day accumulated values. For the 1-day values no method is clearly better than the others, but there is an indication that the BernGam method does not do well for the 1-day extremes, possibly because a gamma distribution fits the extremes poorly. The double gamma distribution (DouGam) seems to do a better job. As expected, EmpQuantERA is outperformed by the other methods, as this method does not correct the RCM data towards the observations but towards the perfect boundary simulation of the selected RCM.

Table 4 shows the average of the absolute values of the Mean Errors (absolute values are used so that large under- and overestimations do not cancel each other). The data are averaged over 12 months and 3 sites. Note that this does not say whether the corrections over- or underestimate the observed values, only the average deviation from the observed in absolute numbers for the different validation measures. It also differs slightly from the ranking: one big deviation may influence the values in this table (as it will increase the average) but has less influence on the ranking, where ranks rather than the actual numbers are averaged. Thus, the last row of Table 4 will not correspond directly to the ranking in Figure 2.

Averaged over all measures, the absolute mean error was a 10-16% deviation from the observed value. This means that on average we could expect the corrected values to be 10-16% away from the observed (sometimes underestimations and sometimes overestimations) for any given month.
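The error measure behind Table 4, abs((Mod-Obs)/Obs) averaged over months and sites, can be written as a short sketch. The function names and the two-value example are hypothetical and only illustrate why absolute values are used: a signed average lets over- and underestimations cancel.

```python
def mean_relative_error(pairs):
    """Signed mean of (mod - obs) / obs in percent over (mod, obs) pairs."""
    return 100.0 * sum((mod - obs) / obs for mod, obs in pairs) / len(pairs)


def mean_abs_relative_error(pairs):
    """Mean of abs((mod - obs) / obs) in percent over (mod, obs) pairs."""
    return 100.0 * sum(abs((mod - obs) / obs) for mod, obs in pairs) / len(pairs)


# Hypothetical corrected vs observed values for two months:
# one 10% overestimation and one 10% underestimation.
pairs = [(110.0, 100.0), (90.0, 100.0)]
signed = mean_relative_error(pairs)        # the two errors cancel
absolute = mean_abs_relative_error(pairs)  # the average deviation is retained
```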

Figure 2: Left: Ranking of the different models for the 24 validation measures (each of the 24 rankings are average ranking over 12 months and 3 sites). Red means best ranking and blue lowest.

Right: Average ranking based on 864 individual rankings of the ability to reproduce today’s climate (mean ranking for the 24 validation measures, over 12 months and 3 sites)

Table 4: The average of the absolute value of the Mean Error [abs((Mod-Obs)/Obs)] in % for the different validation measures (Table 2). The data are averaged over 12 months and 3 sites. Note that this does not say whether the corrections over- or underestimate the observed values, only the average deviation from the observed in absolute numbers for the different validation measures. It also differs slightly from the ranking, as one big deviation may influence the values in this table (as it will increase the average) but has less influence on the ranking, where ranks rather than the actual numbers are averaged. Thus, the last row will not correspond directly to the ranking in Figure 2.

| Evaluation measure | Short name | BernGam | DouGam | EmpGamPar | EmpQuant | EmpQuant2 | EmpQuantERA |
|---|---|---|---|---|---|---|---|
| Wet day frequency (%) | wday_freq | 0.4 | 0.9 | 0.6 | 0.4 | 4 | 7 |
| Mean length of wet spell (%) | wetspl_mean | 10.9 | 11.1 | 10.8 | 10.7 | 11.8 | 14.6 |
| Maximum length of wet spell (%) | wetspl_max | 25.3 | 25.4 | 26.3 | 25.3 | 25.1 | 26.4 |
| Mean length of dry spell (%) | dryspl_mean | 10.1 | 10 | 10.5 | 10.3 | 11.9 | 12.5 |
| Maximum length of dry spell (%) | dryspl_max | 24.6 | 25.1 | 26.1 | 25.1 | 26.8 | 25 |
| Mean 1-day precipitation (%) | wday_int_mean | 0.6 | 1.8 | 4.8 | 2.6 | 4.5 | 9 |
| Mean 1-day intensity on wet days (%) | precip_mean | 0.5 | 1.8 | 4.6 | 2.8 | 0.6 | 12.5 |
| 25 percentile 1-day intensity on wet days (%) | wday_int_25 | 11.6 | 12.6 | 12.1 | 7.8 | 25.2 | 34.8 |
| 75 percentile 1-day intensity on wet days (%) | wday_int_75 | 8.1 | 3.9 | 6.5 | 2.5 | 3.9 | 10.7 |
| 95 percentile 1-day intensity on wet days (%) | wday_int_95 | 14 | 3.8 | 11.5 | 7.8 | 2.2 | 8.1 |
| 99 percentile 1-day intensity on wet days (%) | wday_int_99 | 13.8 | 4.5 | 7.8 | 13.9 | 6.4 | 11.4 |
| 99.5 percentile 1-day intensity on wet days (%) | wday_int_99.5 | 16.9 | 4.9 | 6.4 | 14.8 | 8.8 | 11.1 |
| Maximum 1-day precipitation (%) | wday_int_max | 27.7 | 13.5 | 14.6 | 0 | 5.3 | 21.3 |
| Variance 1-day precipitation (%) | precip_var | 18.5 | 3.4 | 16.3 | 12.2 | 2.5 | 17.8 |
| 1-day lagged autocorrelation in 1-day precipitation (%) | lag_autocorr | 20.2 | 18.9 | 21.6 | 19.9 | 20.4 | 19.7 |
| Accum. 10-day precipitation (%) | precip_mean_accum | 2.9 | 3.3 | 5.1 | 3.1 | 2.8 | 12.1 |
| 25 percentile 10-day accum. precipitation (%) | wday_int_25_accum | 12 | 11.6 | 12.4 | 11.8 | 11.7 | 24 |
| 75 percentile 10-day accum. precipitation (%) | wday_int_75_accum | 5 | 4.9 | 6.1 | 5.4 | 4.7 | 11.5 |
| 95 percentile 10-day accum. precipitation (%) | wday_int_95_accum | 9.7 | 7.8 | 10.7 | 7.9 | 8 | 11.6 |
| 99 percentile 10-day accum. precipitation (%) | wday_int_99_accum | 17.5 | 12.6 | 15.1 | 11.6 | 12.2 | 16.3 |
| 99.5 percentile 10-day accum. precipitation (%) | wday_int_99.5_accum | 19 | 14 | 16.5 | 13.7 | 14.3 | 17.5 |
| Maximum 10-day accum. precipitation (%) | wday_int_max_accum | 17.9 | 15.7 | 17.3 | 14.5 | 15.1 | 18.2 |
| Variance 10-day precipitation (%) | precip_var_accum | 20 | 15.4 | 22.2 | 15.2 | 16 | 23.9 |
| 5-day lagged autocorr. in 10-day accum. precipitation (%) | lag_autocorr_accum | 13.6 | 13 | 12.9 | 12.7 | 12.8 | 12.9 |
| Average over all measures | | 13.4 | 10.0 | 12.5 | 10.5 | 10.7 | 16.2 |

Conservation of climate signal

How well do the correction methods conserve the climate signal?

Figure 3 shows how many of the 540 (3 sites * 12 months * 15 measures) climate change signals from the corrected time series have the same sign as in the original data. For all methods, around 9 of 10 climate change signals have the same sign after the correction. Thus, if corrections are done for 12 months individually, we could expect about 1 of them to change sign after the bias correction is performed.

Figure 3: Percentage of the 540 (3sites*12months*15 measures) climate change signals from the corrected time series that have the same sign after correction.

To see in more detail how well the correction methods were able to conserve the signal, we divide the deviations from the original climate signal for our 15 evaluation measures, 3 sites and 12 months into 4 bins (less than 2% difference, 2-5%, 5-10%, and above 10% difference). For example, if the climate change signal is +30% in the original data and +23% in the corrected data, it is categorized in the 5-10% difference bin. Similarly, if the climate change signal is +30% in the original data and +37% in the corrected data, it is also categorized in the 5-10% difference bin.
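The binning can be sketched as below. The deviation is the absolute difference, in percentage points, between the original and the corrected climate change signal. How values falling exactly on a bin edge are assigned is an assumption, as is the function name.

```python
def deviation_bin(original_change, corrected_change):
    """Classify the deviation between two climate change signals (both in %).

    Assumption: each bin includes its lower edge and excludes its upper edge.
    """
    deviation = abs(corrected_change - original_change)
    if deviation < 2:
        return "<2%"
    if deviation < 5:
        return "2-5%"
    if deviation < 10:
        return "5-10%"
    return ">10%"
```

With the numbers from the text, `deviation_bin(30, 23)` and `deviation_bin(30, 37)` both fall in the 5-10% bin, since the deviation is 7 percentage points in each case.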

For EmpGamPar and EmpQuantERA, around 37-39% of the climate change signals in the various measures are within ±2% of the original climate change signal. This is reduced to around 28-29% for BernGam, DouGam, EmpQuant and EmpQuant2. Around 13-15% of the climate change signals deviate by more than 10% from the original for EmpGamPar and EmpQuantERA. This increases to 18-19% for BernGam, DouGam, EmpQuant and EmpQuant2. Thus, on average, around 1 to 2 of 12 months in every change signal have been distorted by more than 10%, while 4 to 5 of 12 months will have distortions of less than 2%.

Figure 4: Percentage of the 540 (3sites*12months*15 measures) climate change signals from the corrected time series that have deviations from the original climate change signal within the given bins (less than 2% difference, 2-5%, 5-10%, and above 10% difference)

Are there systematic changes in the climate change signal?

Averaged over all the months, sites and measures that had a positive climate change signal, the original data showed a change of +17.1%. Averaging over all measures that had a negative climate change signal gave a mean change in the original data of -9.3%. After bias correction all methods show a tendency to overestimate the positive changes. On average the change went from 17.1% to 19.4%, so the positive climate change signals were exaggerated by 14%. The negative changes were reduced from -9.3% to -6.9%, an underestimation of the negative changes by 27%.

The reason lies in the wet day frequency correction. If there is a general increase in precipitation in the scenario run, the wet day frequency correction will tend to set more values to 0 in the control than in the scenario run. This tends to give a more positive climate change signal after the correction. To understand this, consider an example: if we have 5 values [0.95 3 5 8 11] mm in the control and the relative change is 10% for all values, the mean precipitation change is 10%. Now assume that we apply a wet day frequency correction in which values less than 1 mm are set to 0 to get the correct frequency. The smallest control value (0.95 mm) is set to 0, while the corresponding scenario value (1.045 mm) is kept, and the mean change becomes 13.9%. Thus, when the frequency correction changes more values to zero in the control than in the scenario, it will inevitably distort the original climate change signal towards a more positive change.

If the frequency correction changes more values to zero in the scenario than in the control (not the case at our test sites), it will have the opposite effect.
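The five-value example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the text; the function names are hypothetical and the 1 mm threshold is the one used in the example.

```python
def mean_change_percent(control, scenario):
    """Relative change in mean precipitation from control to scenario, in percent."""
    return 100.0 * (sum(scenario) / sum(control) - 1.0)


def apply_wet_day_threshold(values, threshold):
    """Set values below the wet-day threshold to zero."""
    return [v if v >= threshold else 0.0 for v in values]


control = [0.95, 3.0, 5.0, 8.0, 11.0]      # mm, from the example above
scenario = [1.10 * v for v in control]     # 10% increase for every value

before = mean_change_percent(control, scenario)
after = mean_change_percent(apply_wet_day_threshold(control, 1.0),
                            apply_wet_day_threshold(scenario, 1.0))
print(f"{before:.1f}% -> {after:.1f}%")    # 10.0% -> 13.9%
```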

Is one method better than the others in conserving the climate change signal?

Figure 5 below shows the average ranking, based on 540 (3 sites * 12 months * 15 measures) individual rankings, of the ability to reproduce the original climate change signal. For the 1-day values EmpGamPar is clearly better than the other methods on most measures, while EmpQuantERA is the best for the 10-day accumulated values. Thus, in the overall ranking (Figure 5, right) these two are the best methods, but as noted above for different reasons. For the remaining methods there is no clear pattern in the ranking.

It should be noted that it is possible to get better conservation properties if wet day frequencies are not corrected. Much of the problem in conserving the climate change signal is related to the wet day frequency correction. To understand this, consider an example: if we have 4 values [2 4 6 0.5] in the control and [3 4 6.5 2] in the scenario, the change is 24%. Assume all data are corrected with a factor 0.8. The change after the correction is then still 24%. However, if we also set all values below 1 to zero to correct the wet day frequency, we get a change of 29%. Thus, the frequency correction has distorted the climate change signal.

Table 5 shows the average (over all sites and months) absolute difference between the original climate change signal and the corrected one, in %. This means that if the deviations from the original climate change signal were, for example, +5%, -4% and +12%, the mean absolute deviation would be 7%. The absolute numbers are used so that compensating under- and overestimations do not give a misleadingly good average. Averaged over all 15 measures, all sites and months, the typical monthly climate signal was 14.6% and the deviation from the true signal was between 5.3% and 7.1%. Thus, the average deviation of the monthly change from the true signal after correction amounted to an artificial climate change signal on the order of 1/3 to 1/2 of the original signal.
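The four-value example above can be checked with the following sketch, which separates the effect of a pure multiplicative correction (which leaves the relative change untouched) from the effect of the additional wet-day threshold. The function names are hypothetical; the factor 0.8 and the threshold of 1 are taken from the example.

```python
def percent_change(control, scenario):
    """Relative change in total (and hence mean) precipitation, in percent."""
    return 100.0 * (sum(scenario) / sum(control) - 1.0)


def correct(values, factor, threshold=None):
    """Scale all values by a factor and optionally zero values below a threshold."""
    scaled = [factor * v for v in values]
    if threshold is None:
        return scaled
    return [v if v >= threshold else 0.0 for v in scaled]


control = [2.0, 4.0, 6.0, 0.5]
scenario = [3.0, 4.0, 6.5, 2.0]

original = percent_change(control, scenario)
scaled_only = percent_change(correct(control, 0.8), correct(scenario, 0.8))
with_threshold = percent_change(correct(control, 0.8, 1.0),
                                correct(scenario, 0.8, 1.0))
print(f"{original:.0f}% {scaled_only:.0f}% {with_threshold:.0f}%")  # 24% 24% 29%
```

The scaling cancels in the ratio of scenario to control, so only the thresholding step changes the signal: it removes the smallest control value but keeps the corresponding scenario value.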

Figure 5: Left: Ranking of the different methods’ ability to reproduce the original climate change signal for the 15 validation measures (each of the 15 rankings is the average ranking over 12 months and 3 sites). Red means best ranking and blue lowest. Right: Average ranking based on 540 individual rankings of the ability to reproduce the original climate change signal (mean ranking for the 15 validation measures, over 12 months and 3 sites)

(17)

17 Table 5: Average climate change (%) in the original data (averaged over all months and all sites) is in the Org column. The other columns are average (over all sites and months) deviations in the

absolute difference between the original climate change signal and the corrected in % for the different climate change measures (table 3). Note that this does not tell if the bias correction is under or overestimating the change just the average deviation from the original climate change signal in %.

| Evaluation measure | Short name | Org | BernGam | DouGam | EmpGamPar | EmpQuant | EmpQuant2 | EmpQuantERA |
|---|---|---|---|---|---|---|---|---|
| 1-day mean intensity on wet days (%) | wday_int_mean | 10 | 4 | 3.7 | 3.3 | 3.7 | 4.1 | 3.7 |
| 1-day mean precipitation (%) | precip_mean | 12.9 | 3.4 | 3.1 | 0.9 | 3 | 2.9 | 2.4 |
| 25 percentile 1-day intensity on wet days (%) | wday_int_25 | 13.5 | 16.4 | 16.7 | 14.7 | 20.7 | 18.8 | 19.5 |
| 75 percentile 1-day intensity on wet days (%) | wday_int_75 | 11.4 | 7.3 | 7.4 | 4.7 | 6.5 | 6.2 | 5.7 |
| 95 percentile 1-day intensity on wet days (%) | wday_int_95 | 11.6 | 5.5 | 6 | 4 | 5.9 | 4.8 | 5 |
| 99 percentile 1-day intensity on wet days (%) | wday_int_99 | 14.3 | 6.2 | 7.5 | 5.2 | 7.5 | 8.3 | 5.8 |
| 99.5 percentile 1-day intensity on wet days (%) | wday_int_99.5 | 15.1 | 6.6 | 6.9 | 6.2 | 7.3 | 7.4 | 6.2 |
| max 1-day intensity on wet days (%) | wday_int_max | 22.7 | 6.9 | 8 | 6.4 | 7.4 | 6.4 | 5.8 |
| 10-day mean accum. precipitation (%) | precip_mean_accum | 13.3 | 3.5 | 3.1 | 1.4 | 3.1 | 3 | 2.4 |
| 25 percentile 10-day accum. precipitation (%) | precip_int_25_accum | 20.8 | 12.6 | 12.6 | 8.6 | 11.2 | 9.5 | 10 |
| 75 percentile 10-day accum. precipitation (%) | precip_int_75_accum | 14.8 | 3.3 | 3 | 2.7 | 3 | 3.2 | 3.1 |
| 95 percentile 10-day accum. precipitation (%) | precip_int_95_accum | 11.9 | 4.1 | 4.1 | 2.9 | 5.1 | 3.8 | 3 |
| 99 percentile 10-day accum. precipitation (%) | precip_int_99_accum | 14.3 | 5.6 | 7.1 | 6 | 6.5 | 6.9 | 3.8 |
| 99.5 percentile 10-day accum. precipitation (%) | precip_int_99.5_accum | 15.8 | 6.4 | 6.4 | 5 | 6.4 | 5.3 | 4.9 |
| max 10-day accum. precipitation (%) | precip_int_max_accum | 17.3 | 6.4 | 7.3 | 7.6 | 9.2 | 6.5 | 5.9 |
| Average over all measures | | 14.6 | 6.5 | 6.9 | 5.3 | 7.1 | 6.5 | 5.8 |
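As an illustration of how a deviation of the kind reported in Table 5 could be computed, the following is a minimal Python sketch. It assumes the climate change signal is the relative change (scen-cntrl)/cntrl in %, and that the deviation is the absolute difference between the corrected and the original signal, averaged over all site-and-month cases. The function names and the numbers are made up for the example and are not taken from the report.

```python
def change_signal(control, scenario):
    """Relative climate change signal in percent: (scen - cntrl) / cntrl * 100."""
    return (scenario - control) / control * 100.0


def mean_abs_deviation(original, corrected):
    """Average absolute difference between corrected and original signals.

    Both arguments are lists of (control, scenario) pairs, one pair per
    site-and-month case, given in the same order.
    """
    deviations = [
        abs(change_signal(*corr) - change_signal(*orig))
        for orig, corr in zip(original, corrected)
    ]
    return sum(deviations) / len(deviations)


# Made-up values for two cases (hypothetical, not data from the report).
original = [(4.0, 4.4), (2.0, 1.9)]     # original signals: +10 % and -5 %
corrected = [(5.0, 5.6), (2.5, 2.45)]   # corrected signals: +12 % and -2 %
result = mean_abs_deviation(original, corrected)
print(round(result, 2))
```

Because the absolute value is taken before averaging, over- and underestimations do not cancel, which is why the table cannot show the direction of the deviation.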

Can we perform some useful overall ranking of the methods?

Below is the ranking based on the ability to reproduce the original climate change signal and the ability to reproduce today’s climate, weighted equally.

DouGam, EmpGamPar and EmpQuant2 perform almost equally well but for different reasons.

DouGam performs well on the ability to reproduce today’s climate, EmpGamPar is good at reproducing the original climate change signal, and EmpQuant2 performs around average both in reproducing the original climate change signal and in reproducing today’s climate.

Figure 6: The average ranking based on 540 (3 sites × 12 months × 15 measures) individual rankings of the ability to reproduce the original climate change signal and 864 (3 sites × 12 months × 24 measures) individual rankings of the ability to reproduce today’s climate (both equally weighted).
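The combined ranking can be sketched in Python as follows. This is only one possible reading of "equally weighted": each method is ranked within every individual case (rank 1 is assumed to be the smallest deviation, i.e. best), the ranks are averaged separately for the two criteria, and the two averages are then combined with weight 0.5 each, so that the criterion with more individual rankings (864 versus 540) does not dominate. The function names, the method labels A, B and C, and all numbers are hypothetical.

```python
def rank_methods(scores):
    """Rank methods for one case; rank 1 = smallest deviation (assumed best).

    `scores` maps method name -> deviation. Ties are not handled specially.
    """
    ordered = sorted(scores, key=scores.get)
    return {name: position + 1 for position, name in enumerate(ordered)}


def mean_rank(cases):
    """Average each method's rank over a list of per-case score dicts."""
    totals = {}
    for scores in cases:
        for name, rank in rank_methods(scores).items():
            totals[name] = totals.get(name, 0) + rank
    return {name: total / len(cases) for name, total in totals.items()}


def combined_rank(signal_cases, present_cases):
    """Equal-weight mean of the two criterion-wise average ranks."""
    signal = mean_rank(signal_cases)
    present = mean_rank(present_cases)
    return {name: 0.5 * (signal[name] + present[name]) for name in signal}


# Made-up deviations for three hypothetical methods A, B and C.
signal_cases = [{"A": 3.0, "B": 1.0, "C": 2.0}, {"A": 2.5, "B": 0.5, "C": 4.0}]
present_cases = [{"A": 1.0, "B": 6.0, "C": 3.0}]
overall = combined_rank(signal_cases, present_cases)
print(overall)
```

Pooling all 1404 individual rankings into a single average would instead give the criterion for today’s climate a larger weight, so the choice between the two approaches affects the final ordering.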


4. Conclusions



75-80% of the individual validations (table 2) show that the model data are closer to the observations after the bias correction has been performed.



In 20-25% of the validation measures (table 2) the bias correction actually makes the data worse than the original.



We can expect the monthly corrected values to deviate by 10-16% from the observed values (sometimes as underestimations and sometimes as overestimations).



9 of 10 climate change signals have the same sign after the correction (thus, if corrections are done for 12 months individually, we should expect about 1 of them to change sign after the bias correction is performed).

After correction, the average deviation of the monthly change from the original signal amounted to an artificial climate change signal on the order of 1/3 to 1/2 of the original signal.



28-39% (approx. 4 to 5 of 12 months) of the climate change signals in the various climate change measures (table 3) are within ±2% of the original climate change signal after the bias correction.



13-19% of the climate change signals in the various climate change measures (table 3) deviate by more than 10% from the original climate change signal after the correction (approx. 1 to 2 of 12 months).



After bias correction, the climate change signal shows a systematic overestimation of the positive changes (the mean change over all positive climate change signals increased from 17.1% to 19.4%) and a systematic underestimation of the negative changes (the mean change over all negative climate change signals was reduced from -9.3% to -6.9%).

The reason lies in the wet day frequency correction (see explanation in the report).



Overall, the methods DouGam, EmpGamPar and EmpQuant2 perform almost equally well, and the ranking depends on which performance measure we emphasize: the ability to reproduce today’s climate or the ability to conserve the climate change signal.



3 sites and 1 RCM are probably too few to give a robust ranking of the methods, but the evaluation shows that all methods perform in accordance with their intention, and there do not seem to be any serious bugs in any of the implementations.
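The translation from a fraction of cases to a number of months, as used in the conclusions above, can be illustrated with a short Python sketch. It takes the statement that 9 of 10 signals keep their sign as a per-month fraction of 0.1 for a sign change. The probability of at least one sign change additionally assumes that the 12 months behave independently, which is an assumption made here for illustration and is not examined in the report. The function names are hypothetical.

```python
def expected_months(fraction, n_months=12):
    """Expected number of months affected, given a per-month fraction."""
    return fraction * n_months


def prob_at_least_one(fraction, n_months=12):
    """Chance that at least one month is affected, assuming independent months."""
    return 1.0 - (1.0 - fraction) ** n_months


sign_change_fraction = 0.1  # from "9 of 10 signals keep their sign"
flips = expected_months(sign_change_fraction)
at_least_one = prob_at_least_one(sign_change_fraction)
print(round(flips, 1), round(at_least_one, 2))
```

The expected count of roughly one month per year matches the statement in the conclusions; under the independence assumption, a year with no sign change at all would be the less likely outcome.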


APPENDIX

OBSERVED and MODELS FOR THE CONTROL PERIOD (1971-2000)

[Figure A1 panels: Wet day frequency; Mean wet day intensity; Mean daily precipitation; 99.5 percentile on wet days]

Figure A1: Seasonal cycle and deviation from the observed in mean 1971-2000 1-day precipitation, wet day frequency, mean intensity and 99.5 percentile on wet days for Bergen, Oslo and Tromsø.

Red: Observations, Blue: Original RCM control data, others: Bias corrected data.


CLIMATE CHANGE SIGNALS

[Figure A2 panels: Mean daily precipitation; Mean daily intensity; 99.5 percentile on wet days]

Figure A2: Seasonal cycle in (scen-cntrl)/cntrl (%) for 2071-2100 mean 1-day precipitation, mean intensity and 99.5 percentile on wet days for Bergen, Oslo and Tromsø (upper row) and the deviation in climate change signal between the corrected and original (lower row). Blue: Original RCM control data, Others: Bias corrected data.
