
In other words, the coefficient of determination measures the proportion of the variation in y that is explained by the variation in x (Keller, 2005).

The coefficient of determination does not have a critical value that enables us to draw conclusions about whether the model is good or poor. The value of R² has to be evaluated based on professional judgment, and in the context of other statistical tests. However, the higher the value of R², the better the model fits the data sample (Keller, 2005).
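In symbols, using standard textbook notation (this is the conventional definition, not a formula quoted from Keller, 2005):

```latex
R^2 = 1 - \frac{SSE}{SST}
    = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```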

Looking at the regression printout, we can find the coefficient of determination:

R-Sq = 79,5%

As we can see, R² tells us that 79.5 % of the variation in the dependent variable, TP, is explained by the variation in the independent variable, BV.

Brief summary

The standard error of estimate is somewhat large compared to the sample mean, which suggests that the model fits poorly. On the other hand, a linear relationship exists between the dependent and the independent variable. In addition, the coefficient of determination implies that 79.5 % of the variation in TP is explained by BV.

We have seen that the requirements for the error variable might not be satisfied. The standard error of estimate is also large. Thus, I would say that the model does not fit the data very well. On the other hand, I have managed to prove that a linear relationship exists, and the explanatory power is surprisingly good.

6.6.2 Simple linear regression with x = FV

As I did in paragraph 6.6.1, when assessing the regression equation with BV as independent variable, I have to start by drawing a scatter diagram to determine whether a linear model appears to be appropriate. Notice that the procedure is the same as before. The only difference is FV instead of BV on the x-axis in diagram 10 below:


Diagram 10 – Scatter plot of TP vs FV

In the left panel of diagram 10, all observations are plotted. In the right panel, the most extreme observation is removed to check for influential observations. However, the linear relationship does not seem to change, which implies that the observation is an outlier rather than an influential observation. During the discussion of outliers in the previous regression, I had no evidence to suggest that the data were wrong. Even though it is possible to classify some of the observations in diagram 10 as outliers, the same conclusion as before is valid. Some of the observations are simply very large, which lies in the nature of private equity investments. Thus, I will keep the observation as part of the data sample.

Based on the scatter plot, I would say that a linear relationship does exist. However, as before, the relationship is not very strong.
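A minimal sketch of the influential-observation check described above: refit the line with and without the most extreme observation and compare the slopes. The data below are hypothetical placeholders; the actual thesis sample is not reproduced here.

```python
import numpy as np

def slope(x, y):
    """Least squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

# Hypothetical placeholder data (n = 55), not the thesis sample.
rng = np.random.default_rng(0)
fv = rng.lognormal(mean=3, sigma=1, size=55)        # fair value estimates
tp = 44.7 + 1.02 * fv + rng.normal(0, 80, size=55)  # transaction prices

mask = fv != fv.max()  # drop the single most extreme observation
print(f"slope, all points:      {slope(fv, tp):.3f}")
print(f"slope, largest removed: {slope(fv[mask], tp[mask]):.3f}")
# If the slope barely changes, the extreme point is an outlier rather than influential.
```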

The regression equation²³

Calculating the regression line yields the following equation:

ŷ = 44.7 + 1.02x

In other words, if we assume FV = 4 million, the equation yields

ŷ = 44.7 + 1.02 · 4 = 48.78,

which implies a transaction price equal to 48.78 million. For every one million increase in the fair value estimate, the transaction price will increase by 1.02 million.
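As a quick check of this arithmetic, the rounded coefficients can be plugged in directly (a minimal sketch; predict_tp is a hypothetical helper, not from the thesis):

```python
# Coefficients rounded as in the text: intercept 44.7, slope 1.02.
b0, b1 = 44.7, 1.02

def predict_tp(fv_million: float) -> float:
    """Predicted transaction price in millions for a given fair value estimate."""
    return b0 + b1 * fv_million

print(round(predict_tp(4.0), 2))  # 48.78, matching the worked example above
```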

Regression diagnostics

To evaluate the requirements involving the probability distribution of the error variable, I am going to use the same type of residual plots for the dependent variable as before:

23 Complete regression printout is enclosed in appendix B.


Changing the independent variable from BV to FV has not improved the histogram or the normal probability plot. In fact, the distribution seems to be quite similar to before. The histogram is still bell shaped but not completely centred around zero. The normal probability plot is supposed to follow a straight line. As you can see, some of the points are off the blue line. As I concluded previously, some variation has to be expected. Thus, I would say that the normal probability plot does not invalidate the assumption that the error variable is normally distributed.
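The two plots discussed above can be reproduced along these lines (a sketch assuming the residuals are already available as an array; scipy's probplot draws the normal probability plot):

```python
import matplotlib.pyplot as plt
from scipy import stats

def normality_plots(residuals):
    """Histogram and normal probability plot of the regression residuals."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.hist(residuals, bins=15)                      # should be bell shaped, centred around zero
    ax1.set_title("Histogram of residuals")
    stats.probplot(residuals, dist="norm", plot=ax2)  # points should follow the straight line
    plt.show()
```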

Heteroscedasticity

The plot at the top right has basically the same pattern as before. Ideally, the points should have been evenly distributed around zero. The points in our diagram are, as before, not evenly distributed around zero, but seem to lie mostly below zero. Still, there is no severe change in the spread of the plotted points. Thus, I would assume that the variance is close to constant.
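A corresponding sketch for the constant-variance check plots the residuals against the fitted values; an even band around zero suggests homoscedasticity:

```python
import matplotlib.pyplot as plt

def residuals_vs_fitted(fitted, residuals):
    """Residuals against fitted values; a widening fan would indicate heteroscedasticity."""
    plt.scatter(fitted, residuals)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Fitted value")
    plt.ylabel("Residual")
    plt.show()
```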

Non-independence of the error variable

Changing the independent variable has not changed the classification of the data. Because the sample consists of cross-sectional data, I expect error dependency to be absent. To be sure, I will use the same Durbin-Watson statistic to test for autocorrelation. Since the sample size (n = 55), the number of independent variables (k = 1) and the level of significance (α = 0.05) are the same as before, I can use the same critical values d_L and d_U (page 74).


The calculation of the Durbin-Watson statistic shows d = 1.94. Unlike the test where x = BV, we do not have to reject the null hypothesis. Thus, the sample does not exhibit autocorrelation, which means that the errors are independent of each other.
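The Durbin-Watson statistic itself is straightforward to compute from the residuals; a minimal sketch of the textbook formula:

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```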

The biggest change when applying FV as the independent variable, compared to BV, is the Durbin-Watson statistic, which shows no evidence of autocorrelation. The requirements regarding normality and constant variance are still vague, but lean in the direction of being fulfilled. Thus, we have to consider the possibility that the least squares method is not the best estimator, and that inference is not valid. Regardless, the estimates are still unbiased.

I. Standard Error of Estimate

When assessing the regression equation with FV, the standard error of estimate s_ε decreases from 112.998 to 82.8. The sample mean is the same as before, which still makes s_ε appear large. However, as Keller (2005) wrote, there is no predefined upper limit for s_ε. The best we can do is to compare the values of s_ε with each other:

s_ε,BV = 112.998 > s_ε,FV = 82.8

Because s_ε,BV > s_ε,FV, the regression model with FV as independent variable fits the observations best. Thus, if the decision was to be made entirely based on s_ε, we should choose the model with FV as independent variable. Notice that the subscript only denotes which regression equation the standard error of estimate belongs to.
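For completeness, s_ε is computed from the residuals as the square root of SSE divided by the degrees of freedom; a minimal sketch:

```python
import numpy as np

def standard_error_of_estimate(y, y_hat):
    """s_e = sqrt(SSE / (n - 2)) for a simple linear regression (one predictor)."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return np.sqrt(np.sum(e ** 2) / (len(e) - 2))
```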

II. Testing the Coefficient

I have to test whether there is a linear relationship between the value of y (TP) and the value of x (FV). As before, I have to test the following hypotheses:

H0: β1 = 0
H1: β1 ≠ 0

If the null hypothesis is true, no linear relationship exists. If the alternative hypothesis is true, some linear relationship exists.

Predictor   Coef      SE Coef   T       P
Constant    44,70     12,49     3,58    0,001
FV          1,01686   0,04911   20,71   0,000


As the printout shows, the test statistic is t = 20.71 with p-value = 0.000. The large t-value and low p-value provide overwhelming evidence to infer that a linear relationship exists between the transaction price and the fair value estimate.
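The printout values can be verified directly: t = Coef / SE Coef, with n − 2 = 53 degrees of freedom (a sketch using scipy for the p-value):

```python
from scipy import stats

coef, se_coef, n = 1.01686, 0.04911, 55
t_stat = coef / se_coef                          # about 20.71, as in the printout
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value, effectively zero
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```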

III. Coefficient of Determination

Since I have determined that a linear relationship exists, the next step is to measure the strength of the relationship. R² is given by the regression printout:

R-Sq = 89,0%

As we can see, R² tells us that 89.0 % of the variation in the dependent variable, y, is explained by the variation in the independent variable, x. Comparing R²_FV with R²_BV shows that R²_FV is larger than R²_BV:

R²_FV = 89.0 % > R²_BV = 79.5 %

Consequently, FV is a better predictor of the transaction price than BV.

If the difference between the two coefficients of determination had been smaller, it would have been necessary to test whether the difference was statistically significant by applying, e.g., bootstrapping. Briefly explained, bootstrapping is the practice of estimating properties of an estimator by measuring these properties when sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution of the observed data. In the case where a set of observations can be assumed to be from an independent and identically distributed population, this can be implemented by constructing a number of “resamples” of the observed data set. Each “resample” is obtained by random sampling with replacement from the original data set (Wikipedia, 2010). One possible test statistic²⁴ could have been:

t = (R²_FV − R²_BV) / ((s_R²,FV + s_R²,BV) / 2)

The numerator is simply the difference between the two calculated coefficients of determination. The denominator, on the other hand, is the average of the standard deviations of each coefficient of determination computed by bootstrapping.

24 The test statistic is a result of conversations with my thesis advisor Knivsflå.


If you would like to read more about bootstrapping, Efron and Tibshirani (1986) have written an article that explains the method in more detail.
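A sketch of how this bootstrap comparison could be implemented, following the test statistic described above (tp, bv and fv are assumed to be NumPy arrays of equal length holding the sample; the function names are illustrative, not from the thesis):

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (the squared correlation)."""
    return np.corrcoef(x, y)[0, 1] ** 2

def bootstrap_r2_statistic(tp, bv, fv, n_boot=10_000, seed=0):
    """t = (R2_FV - R2_BV) / ((s_FV + s_BV) / 2), where s_FV and s_BV are the
    bootstrap standard deviations of the two coefficients of determination."""
    rng = np.random.default_rng(seed)
    n = len(tp)
    r2_bv, r2_fv = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observation pairs with replacement
        r2_bv.append(r_squared(bv[idx], tp[idx]))
        r2_fv.append(r_squared(fv[idx], tp[idx]))
    s_avg = (np.std(r2_fv) + np.std(r2_bv)) / 2
    return (r_squared(fv, tp) - r_squared(bv, tp)) / s_avg
```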