Can stock returns be used as a proxy for corporate bond yields?


Handelshøgskolen i Tromsø


Stian Ek Grønning

Master’s thesis in Economics… May 2014


Contents

Introduction
Existing literature
Background Theory
Stocks
Options
Valuing options at the exercise date
Valuing options before the exercise date
Bonds
Valuing bonds
Yield to Maturity
The yield curve
Theoretical analysis
Valuing bonds when there is a significant risk of default using option pricing theory
Estimating fictitious stock price for a non-publicly traded firm when the bond price is known
Methodology and results
Data description
Regression analysis
Autocorrelation
Stationary vs. nonstationary variables
The Dickey-Fuller test
Cointegration
Multicollinearity
Time lagged variables
Information Criterions
Dummy variables
The regression models
Results
Discussion
Conclusion
References

Introduction

This paper investigates the relationship between stocks and bonds issued by the same firm.

Knowing a predictable relationship between two markets is of interest to any investor, especially if this relationship involves a time lag. Suppose we could estimate the price of a derivative in terms of its underlying asset and this relationship was delayed in time. It would then be possible to forecast future prices in one of the markets, and hence there could be arbitrage opportunities.

I expect to find some correlation between stocks and bonds issued by the same firm because they are both functions of the firm’s risk. Suppose a firm experiences a period of adverse business and runs a considerable deficit. The risk of bankruptcy will increase, and hence it is less desirable to own the company. The stock will then be traded at a lower price. The same applies to the firm’s bonds. If the firm goes bankrupt, it will fail to make the promised payments to the bondholders. An investor will require a risk premium to hold a risky bond, and this is achieved only by buying it at a lower price. Thus, an increase in the overall risk will reduce the value of the bond.

By the same argument, I expect to find a correlation between the creditworthiness of the firm and the degree of correlation between the firm’s bond yield and its share price. Rating agencies provide ratings describing the creditworthiness of corporate bonds. In this paper I have used Standard & Poor’s rating system. AAA is the best grade. Following that come AA, A, BBB, BB, B, CCC, CC, C and D. The grade D means that the firm has failed to pay its debt and is bankrupt. Only 1.1% of the top-rated firms have defaulted after 20 years, while over 80% of the C-rated firms have defaulted in the same period. This results in the first null hypothesis:

H0: There is no correlation between a firm’s rating and the degree of correlation between its change in yield and stock return

I also expect that if there is a substantial amount of information asymmetry among the investors, the informed could choose one particular market at the expense of the other. It is then possible that there may be a time-lagged relationship between the markets. Hence, the second null hypothesis:

H0: There are no time lags between the stock and the bond movement within the same firm

The remainder of this paper consists of six sections. The second is a literature review where I discuss what others have found. The third section contains some important background theory. This is included to help the reader understand the theoretical concepts in this paper. In the fourth section I discuss some theoretical concepts regarding the relationship between the yield and the stocks within the same firm. The fifth section, methodology and results, is a statistical analysis where I investigate the relationship between stocks and bonds using various regression analyses. Here I present the models used for estimation, give some background theory on the procedures and present my results. In the sixth section, discussion, I present some ideas regarding the analysis. The seventh section concludes the paper.

Existing literature

Kwan (1995), the basis for this study, has studied this relationship and found several significant correlations. He finds that changes in bond yields are significantly (at the 0.1% level) and negatively correlated with the issuing firm’s contemporaneous and lagged stock returns, but that there is no correlation between the bond yield and the leading stock return.

This means that when new information is made available, it impacts the stock market first. If the market were frictionless, the prices of the bonds and stocks would be expected to adjust to this change instantly and thus simultaneously.

He also finds that firm size has explanatory value. He investigates the weekly returns and yields of 327 firms and splits them into four quartiles according to size. The R² increases with firm size, from 0.28 in the quartile for the smallest firms to 0.47 for the largest firms. All values are significant at the 0.1% or 1% level. The coefficients do not change much in magnitude across the size quartiles, but the explanatory power of the relationship increases.

Variable    Q1        Q2        Q3        Q4
Rt          -0.3235   -0.242    -0.2944   -0.1212
Rt-1        -0.137    -0.2434   -0.2811   -0.2533
R²          0.28      0.34      0.42      0.47

The regression coefficients of the correlation between stocks and bonds. Contemporaneous and lagged, based on firm size

He further finds that there is a significant correlation between the credit rating of the firms and the correlation between their bonds and stocks. The yield of a top-rated bond is uncorrelated with the firm’s stock price. The bond yield depends only on the risk-free rate. In the table below the coefficients of his regression are presented (t-values in parentheses). The correlation increases when the rating goes down.

One issue here is the R² of BB- and B-rated firms. If we follow the scale downwards from AA, the R² diminishes slightly and then suddenly drops to almost zero. The coefficients are still highly significant. This implies that there are factors other than the rating that explain the correlation. A possible explanation is that the rating covers aspects like the coverage, leverage, liquidity, profitability and cash-flow-to-debt ratios, not just the volatility of the firm.

Hotchkiss and Ronen (2002) have analyzed the informational efficiency of the high-yield corporate bond market using daily and hourly price data. They did not find support for the claim that stock portfolio returns lead bond portfolio returns, but they did detect a contemporaneous correlation between stock and bond returns.

Norden & Weber (2009) have done an even more comprehensive analysis than Kwan’s. They find the same general results. In addition, they find that American firms have a stronger negative correlation with their bonds than European firms, and that telecommunication firms exhibit much stronger correlations than others.

Variable   AAA       AA        A         BBB       BB        B
Rt         -0.1963   -0.0878   -0.1033   -0.3489   -0.5011   -0.4079
           (-1.47)   (-2.64)   (-3.58)   (-8.76)   (-7.24)   (-5.85)
Rt-1       -0.2015   -0.1981   -0.2483   -0.3313   -0.3309   -0.1656
           (-1.84)   (-6.37)   (-9.15)   (-5.08)   (-5.08)   (-5.55)
R²         0.61      0.5       0.41      0.4       0.04      0.04

The regression coefficients of the correlation between stocks and bonds based on the rating of the firm (t-values in parentheses)

Background Theory

Stocks

Ownership of a stock (sometimes called a share) represents a claim on a proportion of a firm’s assets and earnings. In other words, the stockholders are the owners of the firm. If a firm has 10,000 shares outstanding and an investor owns 5,000 of these, he owns half of the firm. In an efficient market, the combined value of the stocks will represent the firm’s value. If a business reports a decline in its profits or an unexpected deficit, the business is considered less valuable and the price of the stock will decline. Corporations have limited liability. This means that the owner of a stock can never be held personally responsible for more than the value of the stock, so a creditor cannot demand that a stockholder put up more money to pay off the firm’s debt.

Options

Options have properties that are highly interesting in the pursuit of an answer to the research question. There are different options associated with the ownership of bonds and stocks. I will return to this issue later.

An option is a contract which gives the owner the right to buy or sell an underlying asset for a specific price before a specific time. The issuer of the contract is obligated to buy or sell (depending on the contract) the underlying asset if the owner chooses to use this right. We then say that the option is exercised.

There are two kinds of options. A call option gives the owner the right to buy the underlying asset and a put option gives the right to sell it. Usually, one contract is an agreement to buy or sell 100 shares. Unlike futures and forwards, where the owner is obligated to complete the transaction, the option only gives the owner the right, not the obligation, to exercise it. This freedom may be of great value to the owner, and hence the contract itself has value. The underlying asset may be several things, such as stocks, currencies, stock indices, futures etc.

Valuing options at the exercise date

A call option will have value on the final day of its lifetime if the underlying asset is worth more than the exercise price. If the value of the stock is 140 and the strike price is 100, the owner can use the option to buy the stock for 100 and sell it immediately for 140, earning 40 in no time. Hence, the option is worth 40, assuming there are no commission fees. A put option works in the opposite way. If the stock is worth 100 and the strike price is 140, the owner sells the stock for 140, buys it back for 100, and earns 40. This also means that the option is worthless if the value of the underlying asset is on the “wrong” side of the strike price. If the stock is worth 130 and the strike price is 140, it would make no sense to exercise a call option.
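The exercise-date values described above can be written as two one-line payoff functions. The following is a minimal sketch that ignores commission fees, as the text does; the function names are illustrative and not taken from any library.

```python
def call_payoff(spot: float, strike: float) -> float:
    """Value of a call on the exercise date: exercise only if spot > strike."""
    return max(spot - strike, 0.0)


def put_payoff(spot: float, strike: float) -> float:
    """Value of a put on the exercise date: exercise only if spot < strike."""
    return max(strike - spot, 0.0)


if __name__ == "__main__":
    # The three cases discussed in the text (no commission fees assumed).
    print("call, stock 140, strike 100:", call_payoff(140, 100))
    print("put,  stock 100, strike 140:", put_payoff(100, 140))
    print("call, stock 130, strike 140:", call_payoff(130, 140))
```

The `max(..., 0.0)` reflects that the owner has a right but no obligation, so the option is simply left unexercised when the price is on the wrong side of the strike.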

The value of an option is bounded. A call can never be worth more than the underlying asset. If this were not true, you could earn money by buying the stock and selling the option. A put can never be worth more than the strike price.

Valuing options before the exercise date

Determining the value of an option before the expiration date is a much more complicated matter. This problem was solved by Fischer Black and Myron Scholes in 1973 in their famous work “The Pricing of Options and Corporate Liabilities”. In addition to the current value of the underlying asset and the strike price, the value now depends on some more factors:

• T: Time to expiration of the option

• r: The risk-free rate

• σ: The volatility of the underlying asset

The value of the option on the expiration date and at an earlier point in time may differ due to the possibility of a change in the value of the underlying asset. For example, if the strike price on a call option is 80, the underlying is worth 70 and there are three months remaining on the contract, there is always a possibility of a positive change in the underlying’s price of more than 10 during the remaining time. An option is therefore never totally worthless. There will always be at least a faint hope for a profitable price change. The amount of time to expiration and the volatility of the value of the underlying are hence important factors in estimating the possibility of a desirable price development. The price itself is obtained by solving the Black & Scholes differential equation. I will not go through this process in detail here.
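For readers who want to see how the listed factors enter the price, the sketch below implements the standard closed-form Black & Scholes solution for a European call on a non-dividend-paying asset. The risk-free rate and volatility in the example are assumed values chosen only for illustration; the text gives just the strike (80), the asset value (70) and the three months remaining.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(spot: float, strike: float, rate: float, sigma: float, maturity: float) -> float:
    """Black & Scholes value of a European call (no dividends)."""
    d1 = (log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


def bs_put(spot: float, strike: float, rate: float, sigma: float, maturity: float) -> float:
    """European put obtained from the call through put-call parity."""
    return bs_call(spot, strike, rate, sigma, maturity) - spot + strike * exp(-rate * maturity)


if __name__ == "__main__":
    # Strike 80, underlying 70, three months left; r = 3% and sigma = 30% are assumptions.
    value = bs_call(spot=70.0, strike=80.0, rate=0.03, sigma=0.30, maturity=0.25)
    print(f"out-of-the-money call value: {value:.4f}")
```

Even though exercising today would be pointless, the computed value is strictly positive, which is the point made above: with time and volatility left, the option is never totally worthless.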

Bonds

A bond is an instrument of indebtedness of the bond issuer to the holder, a way of acquiring capital. The issuer can be a government, municipality or corporation. It basically works like this: The bond is written and sold for a certain value to a holder. At the end of the bond’s lifetime the issuer pays the holder back with interest. The ownership of bonds can be transferred in the secondary market.

There are several types of bonds. The most common is the coupon bond. Here the issuer is obligated to make fixed payments to the holder according to an agreed-upon indenture, in addition to repaying the principal (face value) at the end of the bond’s lifetime. Another type is the discount bond. This bond is sold at a lower price than the principal. The holder’s return is the difference between the price and the face value. There are also other types of bonds whose coupon payments are not fixed, but are derived from other financial quantities, such as various indices and rates.

Some bonds are callable. This means that the issuer can choose to pay off the principal early. If the macro environment changes and the general cost of money declines, the bond will be relatively costlier for the issuer. The bond may then be called, and new bonds with different indentures are written. This implies a risk for the holder of the bond because he might not find an equally good investment opportunity if the bond is called, and hence the required return is higher if the bond is callable.

Stocks and bonds are both securities. One important difference between the two is that the owners of the stocks are investors, and the owners of the bonds are lenders. The bondholders have priority and will be repaid before the stockholders in the event of bankruptcy. We will return to this matter later on.

Valuing bonds

The price of a safe bond is the present value of all the cash flows paid by the issuer to the holder. The discount rate is here the risk-free rate, since the bond is risk free. In practice the risk-free rate may vary over different periods, and this needs to be incorporated into the calculation.
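The present-value calculation can be sketched in a few lines. The example assumes annual payments and lets every payment be discounted at its own risk-free rate, so that a rate which varies over periods is handled as described above. The function names and the numbers in the example are illustrative assumptions.

```python
def coupon_bond_cash_flows(face: float, coupon_rate: float, years: int) -> list:
    """Annual coupons, with the face value repaid together with the last coupon."""
    flows = [face * coupon_rate] * years
    flows[-1] += face
    return flows


def present_value(cash_flows: list, risk_free_rates: list) -> float:
    """Discount the payment at the end of year k+1 with risk_free_rates[k]."""
    if len(cash_flows) != len(risk_free_rates):
        raise ValueError("one discount rate is needed per cash flow")
    return sum(cf / (1.0 + r) ** (k + 1)
               for k, (cf, r) in enumerate(zip(cash_flows, risk_free_rates)))


if __name__ == "__main__":
    flows = coupon_bond_cash_flows(face=100.0, coupon_rate=0.05, years=3)
    # Flat 5% curve: the bond prices at its face value.
    print(present_value(flows, [0.05, 0.05, 0.05]))
    # Hypothetical rising curve: later payments are discounted harder.
    print(present_value(flows, [0.03, 0.04, 0.06]))
```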

The price of the bond changes over time because the holder continuously earns a share of the next coupon payment. This is called the accrued interest. If a coupon bond is sold just before a coupon payment is made, the seller, who owned the bond during the coupon period, is entitled to the interest accrued. The invoice price therefore equals the stated price plus the accrued interest.

Yield to Maturity

The most commonly used measure of the return on a bond is called the yield to maturity. This is the internal rate of return (IRR) on the investment in the bond, assuming that all coupons are reinvested at that yield.

The yield curve

The yield curve shows the relationship between the time to maturity and the yield. The most common shape of this curve is depicted below. The upward slope of the curve shows that bonds with a longer maturity promise a higher yield than those with a shorter maturity. This is because investors require a risk premium due to the uncertainty of future events. The further the maturity is into the future, the higher the risk premium demanded. The shape of the yield curve changes over time and is of great interest because it gives an idea of the expected future interest rates and economic activity. Therefore, the slope of the yield curve is important. If the slope is steep, there is a large gap between the yields of short- and long-term bonds.

Theoretical analysis

Valuing bonds when there is a significant risk of default using option pricing theory:

One essential aspect regarding bonds is the difference between the promised and the actual payment. If the payment is risk free, as with Norwegian government bonds, the promised and actual payments are always the same. Due to the general risk of default, corporate bonds have on average a higher promised yield than government bonds. The compensation for this risk, the risk premium, will depend on the particular corporation issuing the bond, the current market conditions, the rate on safe bonds, etc.

Merton (November 1973) presents a theory of how to value bonds when there is a significant probability of default. He defines the risk as “…the possible gains or losses to bondholders as a result of (unanticipated) changes in the probability of default…”. He further claims that a difference in price between two bonds with equal term structures will solely be caused by the difference in the probability of default. Both Merton (spring 1973) and Black and Scholes (1973) recognize that such bonds can be valued with the same method as used in option pricing.

The theory rests on eight assumptions. They are as follows:

1. There are no transaction costs or taxes.

2. There is a sufficient number of investors participating in the market who can buy or sell as much as they wish of an asset at the market price.

3. The borrowing and the lending rate are the same.

4. Short-sales of all assets are possible.

5. Trading takes place continuously in time.

6. The Modigliani-Miller theorem, that the value of the firm is invariant to its capital structure, holds.

7. The term structure is flat and known with certainty. This means that future values are discounted continuously at the risk-free rate r.

8. The dynamics of the value of the firm, V, can be described as

$$dV = (\alpha V - C)\,dt + \sigma V\,dz$$

where α is the expected rate of return on the firm per unit of time, C is the total dollar payouts by the firm to its shareholders, σ² is the variance of the return on the firm and dz is a standard Gauss-Wiener process.

That all these assumptions hold at every point in time may perhaps be a little too much to wish for, but the author claims that assumptions 1-4 may be substantially weakened and argues that the rest of them will hold.

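The theory is described by assuming there exists a security whose market value, Y, can be written as a function of the value of the firm and time, $Y = F(V, t)$. The dynamics of the security’s value can be written as:

1) $$dY = (\alpha_y Y - C_y)\,dt + \sigma_y Y\,dz_y$$

where $\alpha_y$, $C_y$, $\sigma_y$ and $dz_y$ are the security’s counterparts to the firm’s $\alpha$, $C$, $\sigma$ and $dz$. Since $Y = F(V, t)$, there is a relationship between $\alpha_y$, $\sigma_y$ and $dz_y$ and the corresponding variables described in assumption number 8. By use of Itô’s lemma, we can write the dynamics of Y as:

2) $$dY = \left[\tfrac{1}{2}\sigma^2 V^2 F_{VV} + (\alpha V - C)F_V + F_t\right]dt + \sigma V F_V\,dz$$

Subscripts denote partial derivatives. If we compare equations 1 and 2, we can show that the instantaneous returns on Y and V are perfectly correlated:

3) $$\begin{aligned} \alpha_y Y &= \tfrac{1}{2}\sigma^2 V^2 F_{VV} + (\alpha V - C)F_V + F_t + C_y \\ \sigma_y Y &= \sigma V F_V \\ dz_y &= dz \end{aligned}$$

Now, assume that we take a loan and invest the money in the firm and the particular security. $W_1$ is the amount of money invested in the firm, $W_2$ is the amount invested in the security and $W_3$ is the position in riskless debt, i.e. the loan ($W_3 = -(W_1 + W_2)$). If dx is the instantaneous return from the portfolio, we have:

4) $$dx = W_1\frac{dV + C\,dt}{V} + W_2\frac{dY + C_y\,dt}{Y} + W_3 r\,dt = \left[W_1(\alpha - r) + W_2(\alpha_y - r)\right]dt + \left[W_1\sigma + W_2\sigma_y\right]dz$$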

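The investment strategy is chosen so that the stochastic parts of the firm and the security cancel each other. If this is accomplished the portfolio is nonstochastic. If there are no arbitrage profits, the expected return on a zero net investment is zero.

5)

a. $$W_1\sigma + W_2\sigma_y = 0$$

b. $$W_1(\alpha - r) + W_2(\alpha_y - r) = 0$$

If $W_i \neq 0$, the solution to the equations in 5 exists if and only if

6) $$\frac{\alpha - r}{\sigma} = \frac{\alpha_y - r}{\sigma_y}$$

Now, substitute for $\alpha_y$ and $\sigma_y$ from equation 3, using $Y = F$, and rewrite:

$$\frac{\alpha - r}{\sigma} = \frac{\tfrac{1}{2}\sigma^2 V^2 F_{VV} + (\alpha V - C)F_V + F_t + C_y - rF}{\sigma V F_V}$$

Rewrite again and simplify:

7) $$\tfrac{1}{2}\sigma^2 V^2 F_{VV} + (rV - C)F_V - rF + F_t + C_y = 0$$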

This is a parabolic partial differential equation for F. It must be satisfied by all securities whose values can be written as a function of the value of the firm and time. To solve this equation, we need an initial condition and two boundary conditions. These conditions are what distinguish one security from another (e.g., the debt of a firm from its equity). Equation 7 says that in addition to time and the value of the firm, the function for the value of the security, $Y = F(\cdot)$, depends on the risk-free rate, the variance of the firm’s value, the payout policy of the firm and the promised payout policy to the holders of the security. It also shows that F does not depend on α, the expected return of the firm, on the utility functions of the investors, or on other assets that are not present in the portfolio.

I will now review the simplest case of theoretical bond pricing using what we have found so far. Suppose there is a firm with only two classes of claims: a simple discount bond and equity. If F is the value of the debt issue, we can write equation 7 as:

8) $$\tfrac{1}{2}\sigma^2 V^2 F_{VV} + rV F_V - rF - F_\tau = 0$$

where $C_y = 0$ because this is a zero-coupon bond, and $C = 0$ if we assume that the firm cannot issue any new claims on the firm, pay cash dividends or repurchase shares prior to time T. $\tau = T - t$ is the length of time until maturity, hence $F_t = -F_\tau$. We will need two boundary conditions and an initial condition to solve equation 8. By definition, $V = F(V, \tau) + f(V, \tau)$, where f is the equity’s value. F and f cannot take negative values due to limited liability, so we have the lower boundary:

$$F(0, \tau) = f(0, \tau) = 0$$

Moreover, the bond cannot be worth more than the issuing firm. Hence the upper boundary:

$$\frac{F(V, \tau)}{V} \le 1$$

The initial condition arises from the following argument: The firm promises to pay an amount of money to the bondholders at the maturity date T. If the firm defaults and the payment is not met, the bondholders take over the company and the shareholders lose everything. So, if we have that $V(T) > B$, where B is the amount of money to be paid to the bondholders at time T, the firm will pay out, and the value of the equity will be $V(T) - B$. If $V(T) \le B$, the firm defaults and the bondholders effectively acquire the firm. Thus, the initial condition for the debt at $\tau = 0$ is:

$$F(V, 0) = \min[V, B]$$

Now the equation can be solved with the standard method of separation of variables.

However, these calculations can be avoided by looking at a problem already solved in the literature. The value of equity will in this case equal $f(V, \tau) = V - F(V, \tau)$. If we substitute for F in equation 8 and use the same boundary conditions, we have that:

9) $$\tfrac{1}{2}\sigma^2 V^2 f_{VV} + rV f_V - rf - f_\tau = 0$$

Subject to the initial condition:

$$f(V, 0) = \max[0, V - B]$$

If we rearrange 9) and change the notation to $w = f$ and $x = V$ (with $v^2 = \sigma^2$, and subscripts 1 and 2 denoting partial derivatives with respect to $x$ and calendar time $t$, so that $w_2 = -f_\tau$), we have exactly the same as equation 7 in Black & Scholes (1973, p. 643):

$$w_2 = rw - rxw_1 - \tfrac{1}{2}v^2x^2w_{11}$$

This is the equation for a European call option on a non-dividend-paying stock, where the firm value in equation 9 corresponds to the stock price and B corresponds to the exercise price. I have not taken on the challenge of carrying out this calculation, but I have shown that this is a valid way of estimating the price of a high-risk bond.
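Although the calculation is not carried out in the text, the correspondence with the European call makes a numerical illustration straightforward. The sketch below values the equity as a Black & Scholes call on the firm value V with exercise price B, and the discount bond as the remainder V minus equity. All parameter values are hypothetical, and `merton_values` and `promised_yield` are names introduced here for illustration only.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def merton_values(firm_value, promised_payment, rate, sigma, maturity):
    """Return (equity, debt) for a firm financed by equity and one discount bond."""
    pv_payment = promised_payment * exp(-rate * maturity)
    d1 = (log(firm_value / promised_payment)
          + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    # Equity: European call on the firm's assets with exercise price B.
    equity = firm_value * norm_cdf(d1) - pv_payment * norm_cdf(d2)
    # Debt: the part of the firm value not owned by the stockholders.
    debt = firm_value - equity
    return equity, debt


def promised_yield(debt, promised_payment, maturity):
    """Continuously compounded promised yield implied by the bond value."""
    return log(promised_payment / debt) / maturity


if __name__ == "__main__":
    # Hypothetical firm: V = 100, B = 100 due in one year, r = 5%, sigma = 20%.
    equity, debt = merton_values(100.0, 100.0, 0.05, 0.20, 1.0)
    print(f"equity {equity:.4f}, debt {debt:.4f}")
    print(f"promised yield {promised_yield(debt, 100.0, 1.0):.4f} vs risk-free 0.0500")
```

With these assumed inputs the bond is worth less than the same promised payment discounted at the risk-free rate, so its promised yield exceeds r; the gap is the premium for default risk.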

Estimating fictitious stock price for a non-publicly traded firm when the bond price is known:

As mentioned in the previous section of this paper, when a firm issues bonds, the bondholders effectively acquire the firm and the stockholders obtain an option to buy it back. If the stockholders can provide more equity, they have the option to pay off the bondholders whenever they find it suitable. This is because most corporate bonds are callable. The stockholders have in effect bought a call option on the assets of the firm from the bondholders. Owning a corporate bond is, by the same argument, equivalent to holding a safe bond and at the same time giving the firm’s stockholders a put option on the firm’s assets. Thus, the stockholders are given the opportunity to sell the assets to the bondholders if the firm defaults.

The value of the put is the value of limited liability for the stockholders. They cannot be held economically responsible for the firm’s debt. So, if the firm defaults, the stockholders will walk away from their firm’s debt and hand over the firm’s assets to its creditors/bondholders. The stockholders lose their entire investment and the bondholders lose their claim on the firm, less the value of the assets they take over. This implies that the call option is valuable if the business runs profitably and the firm is able to pay its debt. On the other hand, in the event of a default, the put option will be the valuable one. But, as previously mentioned, because financial and economic conditions may change over time, neither of the options will be completely worthless before the default is a fact.

If we now look at the put-call parity,

“Value of call + present value of exercise price = value of put + stock price”

and regard the “present value of exercise price” as the present value of receiving the promised payment to bondholders for sure next year, the parity can be rewritten as follows (here the “stock” underlying both options is the firm’s assets):

“Value of call + present value of promised payment to bondholder = value of put + stock price”

Since holding a corporate bond is equivalent to holding a safe bond and at the same time giving the firm’s stockholders a put option on the firm’s assets, we can again rewrite the equation:

“Present value of promised payment to bondholder - value of put = stock price - value of call = value of the bond”

To sum up: The put option has a positive value, and the call option is worthless, on the maturity date if the assets are worth less than the debt. If there is a positive amount of time before the debt expires, the value of the options is not necessarily the same as at the time of maturity. This means that if the bond price is known from the exchange and the stock is not publicly traded, a theoretical stock price can easily be calculated if the price of the option is known. This can be done with the Black & Scholes formula for option valuation. The only unobservable variable here is the volatility of the stock, but it is possible to make reasonable assumptions about it.
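One possible way to carry out such a calculation is sketched below. It is an illustration under stated assumptions, not the procedure used in this paper: the bond is treated as a single discount bond, the volatility is the assumed volatility of the firm’s assets, and the unobserved asset value is found by bisection so that the model bond value matches the observed one. The fictitious equity value is then the asset value minus the bond value, which equals the value of the call. All function names and numbers are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def debt_value(assets, promised_payment, rate, sigma, maturity):
    """Bond value = assets - call = assets * N(-d1) + PV(B) * N(d2)."""
    d1 = (log(assets / promised_payment)
          + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    return assets * norm_cdf(-d1) + promised_payment * exp(-rate * maturity) * norm_cdf(d2)


def implied_equity_value(bond_value, promised_payment, rate, sigma, maturity, tol=1e-10):
    """Return (equity, assets) consistent with an observed bond value."""
    riskless_value = promised_payment * exp(-rate * maturity)
    if not 0.0 < bond_value < riskless_value:
        raise ValueError("bond value must lie between 0 and the riskless value")
    # The bond value rises with the asset value and never exceeds it,
    # so the asset value is bracketed from below by the bond value itself.
    lo, hi = bond_value, 2.0 * bond_value
    while debt_value(hi, promised_payment, rate, sigma, maturity) < bond_value:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if debt_value(mid, promised_payment, rate, sigma, maturity) < bond_value:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    assets = 0.5 * (lo + hi)
    return assets - bond_value, assets


if __name__ == "__main__":
    # Hypothetical inputs: bond trades at 89.5, promises 100 in one year.
    equity, assets = implied_equity_value(89.5, 100.0, 0.05, 0.20, 1.0)
    print(f"implied assets {assets:.4f}, fictitious equity {equity:.4f}")
```

The result is only as good as the volatility assumption, so in practice one would report the fictitious equity value for a range of plausible volatilities.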

Methodology and results

To investigate the relationship between the stock return and the yield of bonds issued by the same firm, I have done various regression analyses on stock and bond data. I have tried several different models, including variables in level form, in difference form, with different time lags, distributed lag models, autoregressive distributed lag models and various regressions using dummy variables.

Data description

The dataset comprises data for 47 consecutive business days, containing the price of the stock and the yield of the bond for 27 American companies. I have used the closing price of the stock and the yield calculated from the last corresponding bond transaction of the day. All stock prices are adjusted for dividends, mergers and splits. It would be preferable to use intraday data, but because bonds have a very low trade frequency, good intraday data exists for very few bonds, and seldom for several days in a row. The requirements for a firm to be included in the analysis are as follows:

1. Its stock has to be traded publicly on a stock exchange.

2. The firm’s bond has to be traded approximately every day. If there were periods of no trading, the data would provide poor information and the two time series would not be suitable for comparison.

3. It must be rated by Standard & Poor’s prior to the first day of estimation, and the rating must not change within the period. I could have used Moody’s or Fitch’s rating system as well. They are equivalent for the purpose of this paper. The choice of S&P is arbitrary.

4. The bond’s maturity date needs to be in 2017 or close to this year. It is natural to use bonds with approximately the same maturity so that the differences in yield among the firms are not due to the yield curve. The choice of 2017 is arbitrary, but I chose this year since it was a little more frequently represented, and hence it was easier to find bonds with a common maturity year.

5. The bond is a coupon bond with fixed payments. The change in yield cannot be subject to special coupon calculation.

6. The bond is of senior security level. This means that if a firm has issued several bonds with different security levels and faces payment difficulties, the payouts will follow a priority scheme where the senior bonds are paid before other, subordinated bonds.

7. The bond needs to be callable. The callability imposes a risk on the holder, and hence it affects the price and the yield. This is a very common property of corporate bonds, and I assume that the presence of this feature may affect the change in yield, and hence I have used only callable bonds.

Beyond these requirements, the choice of firms is arbitrary. I have not taken parameters like firm size or age, amount of debt, industry sector or geographical location into account. This is a weakness of this paper, but I had to draw a line for the scope of this paper somewhere.

I have used the yield instead of the price of the bond for two main reasons. Firstly, the yield to maturity is the most common measure of the return on bonds, and I have no reason to deviate from other literature. Secondly, one price of the bond may give different yields, depending on the amount of time until maturity. Hence, an investor who requires a certain return on his portfolio will place different bids on the bond at different points in time.

The table below shows the yield arising from a purchase of a bond at the price in the left column on the date in the top row. The bond in the example matures on 7 April 2024, pays a 5% coupon and has a face value of 100. The time premium/yield curve is not taken into consideration in this example, but the argument is still valid.

Price   4.7.2014  4.7.2015  4.7.2016  4.7.2017  4.7.2018  4.7.2019  4.7.2020  4.7.2021  4.7.2022  4.7.2023
90      6.37 %    6.48 %    6.63 %    6.82 %    7.07 %    7.43 %    7.97 %    8.87 %    10.69 %   16.23 %
100     5.00 %    5.00 %    5.00 %    5.00 %    5.00 %    5.00 %    5.00 %    5.00 %    5.00 %    5.00 %
110     3.79 %    3.68 %    3.55 %    3.38 %    3.16 %    2.84 %    2.37 %    1.57 %    0.00 %    -4.65 %

Suppose that a firm has a constant risk, and that the investor has a certain opinion about the future of this firm and therefore requires a yield of at least 3% on this bond. The table shows that the investor would pay 110 for the bond in 2018, but not in 2019. Hence, the change in price between two subsequent periods is not fully comparable to the change in price between two other subsequent periods if there is a considerable time span between them. This difference will be larger if the price is far from the face value. This may be nit-picking for the purposes of this paper, but it will be an important aspect for research on long maturities.
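The yields in the table can be recomputed with a short root-finding routine. The sketch below assumes semi-annual coupon payments and a whole number of years to maturity at each purchase date; this convention is an assumption on my part, chosen because it appears to match the tabulated figures, and is not stated in the text. The function names are illustrative.

```python
def bond_price(annual_yield, years, coupon_rate=0.05, face=100.0, freq=2):
    """Price of a fixed-coupon bond for a given yield to maturity."""
    periods = years * freq
    coupon = face * coupon_rate / freq
    y = annual_yield / freq
    coupons = sum(coupon / (1.0 + y) ** k for k in range(1, periods + 1))
    return coupons + face / (1.0 + y) ** periods


def yield_to_maturity(price, years, coupon_rate=0.05, face=100.0, freq=2):
    """Solve bond_price(y) = price for y by bisection (price falls as y rises)."""
    lo, hi = -0.5, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bond_price(mid, years, coupon_rate, face, freq) > price:
            lo = mid  # model price too high, so the yield must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Years remaining until April 2024 for purchases in April 2014 ... April 2023.
    for price in (90.0, 100.0, 110.0):
        row = [f"{100 * yield_to_maturity(price, years):6.2f} %" for years in range(10, 0, -1)]
        print(f"{price:5.0f}", " ".join(row))
```

The investor in the example who requires at least 3% can be checked the same way: at a price of 110 the computed yield is above 3% with six years remaining (2018) and below 3% with five years remaining (2019).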

Regression analysis

The various relationships investigated in this paper are estimated with regression analyses. I will now explain how this works, address some common problems one might encounter, and suggest some ways of solving such problems.

We are often interested in knowing how changes in economic variables (e.g., price) will affect other economic variables (e.g., consumption). A regression analysis is an approach for modeling this relationship, described in mathematical terms. The most basic method, linear regression, explains the dependent variable ($y$) as a function of the parameters $\beta_0$ and $\beta_1$, the independent variable $x$, and an unexplained error term $e$:

$$y = \beta_0 + \beta_1 x + e$$

Given a sample from the population we wish to investigate, we obtain an estimate of the parameters:

$$\hat{y} = b_0 + b_1 x$$

The residual, $\hat{e} = y - \hat{y}$, is the difference between the observed and the predicted dependent variable. The ordinary least squares (OLS) method obtains the parameter values that give the smallest sum of squared residuals, and hence the best-fitting line through the plotted data.
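For a single independent variable the least-squares estimates have a closed form, which the sketch below computes directly. The data are made-up numbers used only to show the mechanics; they are not taken from the dataset in this paper, and the function name is illustrative.

```python
def ols_fit(x, y):
    """Simple linear regression: return (intercept, slope, residuals)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("the independent variable must take at least two different values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


if __name__ == "__main__":
    # Hypothetical daily stock returns (x) and changes in bond yield (y).
    stock_return = [0.012, -0.008, 0.003, -0.015, 0.007, 0.001]
    yield_change = [-0.004, 0.003, -0.001, 0.006, -0.002, 0.000]
    b0, b1, res = ols_fit(stock_return, yield_change)
    print(f"intercept {b0:.5f}, slope {b1:.4f}")
    print(f"sum of squared residuals {sum(e * e for e in res):.3e}")
```

The check on `sxx` corresponds to the requirement that the independent variable takes at least two different values; without variation in x the slope is not identified.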

There are several assumptions that are necessary for the regression result to be unbiased and consistent. They are:

• The errors are random, follow a normal distribution and have a mean of zero.

• The errors are uncorrelated.

• The variance of the errors is constant through time.

• The sample is representative of the population.

• The independent variables are independent of each other. It should not be possible to express one variable as a linear combination of the others.

• The independent variable must take at least two different values.

There are several other aspects that need to be considered before the analysis is done. The input data should be diagnosed to reveal possible problems (we shall return to this issue shortly), and one should check whether some form of data correction is needed. A thorough analysis of the dataset will also help to determine which regression model should be applied. It is further important to distinguish between cross-section data, which is data on a number of economic units at a particular point in time, and time-series data, which is data collected over time on one particular economic unit. This paper predominantly investigates time-series data. The first issue one should then consider is autocorrelation and the possible presence of a unit root.

Autocorrelation

Autocorrelation is a problem that arises in time series regression when the error term is correlated with its own past values, for example because the dependent variable depends on its former value in a way the model does not capture. A first-order process can be described as:

$e_t = \rho e_{t-1} + v_t$

where $v_t$ is an uncorrelated random error. This is a violation of the assumption of uncorrelated errors. It is detected because it produces autocorrelation in the observable residuals. It means that the current error affects not just the current value of the dependent variable, but also its future values. As an example, suppose that a natural disaster creates a fear of shortage of a certain resource, driving the price of this resource up for an extended period of time. This event is not predicted by the model, and hence it is part of the error term.

Autocorrelation in the error can be due to an autocorrelated dependent variable whose autocorrelation is not sufficiently explained by the independent variables and their lags. It can also be caused by omission of an autocorrelated independent variable.

Autocorrelation does not make the coefficients (betas) biased, but when the autocorrelation is positive, the standard errors tend to be underestimated, and hence the t-values tend to be overestimated. This may lead us to conclude that a coefficient is significant when it actually is not, and hence to commit a type I error. If it is possible to correctly model the autocorrelated errors, an alternative estimator with a lower variance may exist. A lower variance gives a higher probability of obtaining an accurate coefficient estimate.

How do we detect and measure autocorrelation? The most common procedure is to calculate the Durbin-Watson d-statistic. This is a bounds test for the null hypothesis that the errors are serially uncorrelated. The alternative is that the errors follow a first-order autoregressive process. The test statistic is:

$d = \frac{\sum_{t=2}^{T} (\hat{e}_t - \hat{e}_{t-1})^2}{\sum_{t=1}^{T} \hat{e}_t^2}$

$T$ is the number of observations and $\hat{e}_t$ is the residual at time t. The statistic ranges from 0 to 4, where low values indicate that successive error terms are, on average, close in value to one another. They are then said to be positively correlated. High values indicate the opposite.

The error terms are then, on average, much different in value from one another, and they are negatively correlated. As mentioned earlier, the errors need to be normally distributed with a zero mean for the regression analysis to be valid. A rule of thumb is that the value of the statistic should lie between 1 and 3, preferably close to 2. The exact critical values depend on the number of observations, the desired level of significance and the number of independent variables. The values can be found in appendices of statistical texts. Two values are reported, and if the statistic lies between them, the autocorrelation test is inconclusive. If the null hypothesis of no autocorrelation is rejected, the whole regression analysis may be useless because it does not give a trustworthy result. If we use a lagged dependent variable as an independent variable, the d-statistic is not always reliable. Durbin’s h-statistic should then be used. This statistic is asymptotically normally distributed, which means that it follows a standard normal distribution in large samples. The regression can then be tested with the null hypothesis of no autocorrelation against the two-sided alternative of autocorrelated errors.

Hence, at the 5% level, the decision rule is: if -1.96 < h < 1.96, do not reject the null hypothesis. This statistic cannot always be computed, because the square root of a negative number may be required.
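As an illustration of how the d-statistic behaves, the sketch below computes it for two hypothetical residual series, one that changes slowly and one that alternates in sign. The function name and the data are assumptions made for this example only.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals: neighbours close in value (positive autocorrelation).
smooth = [1.0, 0.9, 0.8, 0.7, -0.7, -0.8, -0.9, -1.0]
# Hypothetical residuals: neighbours of opposite sign (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_smooth = durbin_watson(smooth)
d_alternating = durbin_watson(alternating)
print(d_smooth, d_alternating)
```

The first series gives a value well below 1 and the second a value above 3, in line with the rule of thumb that values far from 2 signal autocorrelation.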

So what do we do when we reject the null hypothesis of no autocorrelation? Donald Cochrane and Guy Orcutt published a seminal paper in 1949 which describes a procedure to adjust a linear model for serial correlation in the error term. The steps in the estimation procedure are as follows:

Step 1:

Use an ordinary least squares regression to obtain the residuals $\hat{e}_t$.

Step 2:

Run the regression:

$\hat{e}_t = \rho \hat{e}_{t-1} + v_t$

This gives the least squares estimate of ρ as

$\hat{\rho} = \frac{\sum_{t=2}^{N} \hat{e}_t \hat{e}_{t-1}}{\sum_{t=2}^{N} \hat{e}_{t-1}^2}$

Step 3:

Use the estimate $\hat{\rho}$ to obtain observations of Y* and X* as:

$Y^*_t = Y_t - \hat{\rho} Y_{t-1}$, and

$X^*_t = X_t - \hat{\rho} X_{t-1}$, for t=2,3,...,N

An estimate of β is obtained from an OLS regression of Y* on X*. A new set of residuals is then calculated and steps 2 and 3 are repeated until successive estimates of ρ differ by less than 0.001. After the procedure is done, the Durbin-Watson statistic will be closer to 2. If the initial residuals were subject to a large degree of autocorrelation (D.W. close to 0 or 4), the correction may not be perfect, but the test of zero autocorrelation will at least be inconclusive. If the amount of autocorrelation is very large, and the independent variables do not seem to explain the variation in the dependent variable, the variable could be nonstationary.
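The three steps above can be sketched as follows for a model with one independent variable. This is a minimal illustration under stated assumptions, not the software routine used for the regressions in this paper: the function names are hypothetical, and the data are simulated with a true slope of 2 and ρ = 0.7.

```python
import random

def ols_simple(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def cochrane_orcutt(x, y, tol=0.001, max_iter=100):
    """Repeat steps 2 and 3 until successive estimates of rho differ by less than tol."""
    b1, b2 = ols_simple(x, y)              # step 1: plain OLS
    rho = 0.0
    for _ in range(max_iter):
        e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
        # Step 2: regress e_t on e_{t-1} (no intercept) to estimate rho.
        rho_new = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
                   / sum(e[t - 1] ** 2 for t in range(1, len(e))))
        # Step 3: quasi-difference the data and re-estimate by OLS.
        y_star = [y[t] - rho_new * y[t - 1] for t in range(1, len(y))]
        x_star = [x[t] - rho_new * x[t - 1] for t in range(1, len(x))]
        a_star, b2 = ols_simple(x_star, y_star)
        b1 = a_star / (1.0 - rho_new)      # transformed intercept equals b1 * (1 - rho)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return b1, b2, rho

# Simulated data (an assumption for illustration): y = 1 + 2x + e, e_t = 0.7 e_{t-1} + v_t.
random.seed(0)
x = [random.uniform(0.0, 10.0) for _ in range(200)]
e, prev = [], 0.0
for _ in range(200):
    prev = 0.7 * prev + random.gauss(0.0, 1.0)
    e.append(prev)
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

b1_hat, b2_hat, rho_hat = cochrane_orcutt(x, y)
print(b1_hat, b2_hat, rho_hat)
```

The first observation is lost in the transformation, which is why the transformed series start at t=2.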

Stationary vs. nonstationary variables

A time series variable is stationary if its mean and variance are constant over time, and if the covariance between two arbitrary values in the time series depends only on the time span between them.

Most economic variables are random. This means that it is not possible to predict them perfectly, so the true value is not known until it is observed. A model that produces such a time-series variable is called a stochastic process. A univariate time-series model is an example of a stochastic process where the value of a single variable depends only on its former values and past error terms. This can be described as:

$y_t = \rho y_{t-1} + v_t, \quad |\rho| < 1$

This is an autoregressive model where the errors $v_t$ are independent, with zero mean and constant variance $\sigma_v^2$. The fact that $|\rho| < 1$ implies that the process is stationary. The variance of this process can be shown to be the constant $\sigma_v^2 / (1 - \rho^2)$, which does not depend on t.

Figure: Example of a random, stationary process with zero mean

A nonstationary variable is described as not having the property of mean reversion. This means that it does not tend to return to its previous mean after a shock, as a stationary process will. Consider an autoregressive model fluctuating around a linear trend:

$y_t = \alpha + \rho y_{t-1} + \lambda t + v_t$

This process contains a growth term, $\lambda t$, and its mean will depend on t. When $|\rho| < 1$, the process will be stationary around its trend.

A process will also exhibit nonstationarity if $\rho = 1$. We then say the process has a unit root. This means, if we disregard the intercept and the growth term for now, that the model takes the form:

$y_t = y_{t-1} + v_t$

This is called a random walk model. The value of $y_t$ depends only on its former value plus the stochastic error term. The shocks therefore accumulate as the process evolves through time, so the variance grows with t and the sample mean wanders. Since a random error is added in each period, and this is the only reason for change in the variable besides its former value, the time series will move in unpredictable directions, and it will be impossible to predict the direction of the variable's next change. The mean values of subsamples will depend on the sample period. A unit root will cause problems in statistical inference. A regression can be highly inaccurate or spurious if one or more of the variables exhibits this kind of different “behavior” at different points in time.

Figure: Example of a nonstationary process with stationary trend

If nonstationary variables were to be used in a simple regression model, the results could indicate that there is a significant relationship between them when there actually is none. If the two time series have fairly similar shape, an OLS-regression could indicate a quite significant relationship, but in reality they just happen to drift in the same direction.
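The difference between a stationary process and a random walk can be made concrete with a small simulation. The sketch below is only an illustration: it feeds the same simulated shocks into an autoregressive process with ρ = 0.5 and into one with ρ = 1, and compares their sample variances. The function names, seed and sample size are assumptions.

```python
import random

def simulate_ar1(rho, shocks):
    """Build y_t = rho * y_{t-1} + v_t from a list of shocks, starting at zero."""
    y, prev = [], 0.0
    for v in shocks:
        prev = rho * prev + v
        y.append(prev)
    return y

def sample_variance(series):
    mean = sum(series) / len(series)
    return sum((s - mean) ** 2 for s in series) / (len(series) - 1)

random.seed(42)
shocks = [random.gauss(0.0, 1.0) for _ in range(1000)]

stationary = simulate_ar1(0.5, shocks)   # |rho| < 1: mean-reverting
random_walk = simulate_ar1(1.0, shocks)  # rho = 1: unit root

# Taking first differences of the random walk recovers the stationary shocks.
first_diff = [random_walk[t] - random_walk[t - 1]
              for t in range(1, len(random_walk))]

print(sample_variance(stationary),
      sample_variance(random_walk),
      sample_variance(first_diff))
```

The random walk has a far larger sample variance than the stationary series even though both are driven by identical shocks, while its first difference behaves like the shocks themselves.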

The Dickey-Fuller test

The tests for stationarity and unit roots were developed by David Dickey and Wayne Fuller in 1979. As mentioned before, an autoregressive model like $y_t = \rho y_{t-1} + v_t$ is stationary when $|\rho| < 1$ and nonstationary when $\rho = 1$. The interesting thing to examine is therefore the value of ρ. More precisely, we test whether ρ is equal to one or significantly less than one.

Tests for this purpose are called unit root tests for stationarity. There are three different versions of the Dickey-Fuller test for a unit root. Note that the most common version of this test is on difference form, where $\gamma = \rho - 1$:

1. $\Delta y_t = \gamma y_{t-1} + v_t$

This is a plain test for a unit root.

2. $\Delta y_t = \alpha + \gamma y_{t-1} + v_t$

The second version is a test for a unit root with drift.

3. $\Delta y_t = \alpha + \gamma y_{t-1} + \lambda t + v_t$

The last one is a test for a unit root with drift and a time trend.

Figure: Example of random walk model

The procedure tests the null hypothesis of a unit root in each of the three specifications. In a unit root test on difference form we test whether the coefficient on $y_{t-1}$, $\gamma = \rho - 1$, equals zero, not one. A problem with this test is that if the null hypothesis is true, $y_t$ is nonstationary and has a variance that increases as the sample size increases. This changes the distribution of the usual t-statistic. Therefore, we use another statistic called a τ (tau) statistic with its own critical values. We reject the null hypothesis of nonstationarity if $\tau \le \tau_c$, where $\tau_c$ is the critical value. In other words, in the case of $\gamma < 0$ the time series is stationary.

A further developed version of the test, called the augmented Dickey-Fuller test (ADF test), allows for the possibility that the error term is autocorrelated. This problem may arise if the applied model fails to capture the full dynamic nature of the process because it lacks some important lag terms. The sufficient number of lagged terms can be determined by examining the autocorrelation function of the residuals. In practice, statistical software uses the augmented Dickey-Fuller test to ensure that the error terms are uncorrelated.
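To illustrate the mechanics, the τ statistic for the version of the test with drift is simply the ordinary t-ratio from a regression of the change in y on its lagged level; only the critical value differs. The sketch below computes it for a simulated stationary series. It is a plain (not augmented) Dickey-Fuller calculation, the function names and data are assumptions, and the value -2.86 is the commonly tabulated large-sample 5% critical value for this version of the test.

```python
import math
import random

def dickey_fuller_tau(y):
    """tau statistic from the regression dy_t = alpha + gamma * y_{t-1} + v_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lagged = y[:-1]
    n = len(dy)
    x_bar = sum(lagged) / n
    d_bar = sum(dy) / n
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    sxy = sum((x - x_bar) * (d - d_bar) for x, d in zip(lagged, dy))
    gamma = sxy / sxx
    alpha = d_bar - gamma * x_bar
    sse = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, dy))
    se_gamma = math.sqrt(sse / (n - 2) / sxx)   # standard error of gamma
    return gamma / se_gamma

# Simulated stationary series (an assumption for illustration): rho = 0.5.
random.seed(1)
series, prev = [], 0.0
for _ in range(500):
    prev = 0.5 * prev + random.gauss(0.0, 1.0)
    series.append(prev)

tau = dickey_fuller_tau(series)
CRITICAL_5PCT = -2.86   # tabulated asymptotic value, test with drift
print(tau, tau <= CRITICAL_5PCT)
```

For a series with a strong mean-reverting tendency like this one, τ lies far below the critical value and the null hypothesis of a unit root is rejected.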

Cointegration

An interesting feature of a model that exhibits a unit root, such as $y_t = y_{t-1} + v_t$, is that it becomes stationary if we take the first difference, which means that we subtract its previous value from both sides of the equation. It can then be written as $\Delta y_t = y_t - y_{t-1} = v_t$. Series that have this property are said to be integrated of order one, denoted I(1). Stationary variables are integrated of order zero, denoted I(0). The order of integration is the number of times the time series has to be differenced to make it stationary.

In general, nonstationary variables should not be used in a regression due to the possibility of obtaining a spurious result. However, there is an exception to this rule. If we have two variables which are nonstationary and integrated of order one, we expect their difference, or any linear combination of them, such as $e_t = y_t - \beta_1 - \beta_2 x_t$, to be integrated of order one as well. But there are cases where the linear combination is integrated of order zero. If this is the case, the said variables are cointegrated. This means that they share a common stochastic trend, and hence they never diverge too far from each other. The residuals of the regression must then be stationary, and the regression will not be spurious. A Dickey-Fuller test, as previously described, can reveal the stationarity of the residuals.

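To sum up:

• Stationary, I(0), variables can be used directly in an OLS regression.
• Nonstationary, I(1), variables that are cointegrated can be used in a regression on levels, since the residuals are stationary.
• Nonstationary, I(1), variables that are not cointegrated should be differenced before they are used, otherwise the regression may be spurious.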

Multicollinearity

Multicollinearity is a phenomenon that arises in a multivariable regression analysis when two or more of the independent variables are highly correlated. This is a problem because it makes it difficult to make inferences about the individual coefficients. In a regression analysis we are often interested in knowing what happens to the dependent variable when we change one of the independent variables and hold the others fixed. This does not give a sensible interpretation if the independent variables move together. Here are some examples of issues that may arise if the model has a problem with highly correlated independent variables:

• The coefficient of an independent variable that is expected to be an important predictor may turn out to be insignificant or have the wrong sign.
• Omission or inclusion of an independent variable may change the values of the other regression coefficients drastically.
• The standard errors of the affected variables may be much larger than if they were “alone” in the equation. If we test whether the coefficient is equal to zero, we may fail to reject a false null hypothesis of no effect.

If the same pattern of multicollinearity is maintained through the time series, it may not be a severe problem for the overall model. The predictive power is not diminished, as the dependent variable is a function of a bundle of independent variables. So we could choose to leave the model as is, despite multicollinearity, depending on the research question. If it is a problem, ridge regression or principal component regression can be used to address it.

In time series data, a high correlation between a variable and its own lag is the same thing as autocorrelation in that variable, so including both in a regression introduces multicollinearity.

Time lagged variables

A lagged variable is a time series variable that has been shifted in time. This may be useful because it can take some time for a cause to produce its effect. If this is the case, and we look only at the instantaneous relationship between x and y, we might not find the relationship between the variables. Therefore, we “push” one or several variables one or several steps in time, so that the regression is corrected for this “time error”. To reveal a possible lagged relationship between the variables, it is beneficial to use statistical software that tries different lags and lag combinations and then reports the best model based on information criteria.

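A model that is relevant for this paper is the autoregressive distributed lag model: a model that contains both lagged y’s and x’s as independent variables. With p lags of y and q lags of x, the model can be written as:

$y_t = \delta + \theta_1 y_{t-1} + \dots + \theta_p y_{t-p} + \delta_0 x_t + \delta_1 x_{t-1} + \dots + \delta_q x_{t-q} + v_t$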
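In practice, an autoregressive distributed lag model is estimated by lining up each observation of y with its own lags and with the current and lagged values of x. The sketch below shows only this data-preparation step; the function name `ardl_rows` and the numbers are hypothetical.

```python
def ardl_rows(y, x, p, q):
    """Return (targets, rows) for a model with p lags of y and q lags of x.

    Each row holds [1, y_{t-1}, ..., y_{t-p}, x_t, x_{t-1}, ..., x_{t-q}],
    and the matching target is y_t. The first max(p, q) observations are
    dropped because their lags are not available.
    """
    start = max(p, q)
    targets, rows = [], []
    for t in range(start, len(y)):
        row = [1.0]                                   # intercept
        row += [y[t - i] for i in range(1, p + 1)]    # lagged dependent variable
        row += [x[t - j] for j in range(0, q + 1)]    # current and lagged x
        targets.append(y[t])
        rows.append(row)
    return targets, rows

# Hypothetical series, chosen so that the alignment is easy to verify by eye.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [10.0, 20.0, 30.0, 40.0, 50.0]
targets, rows = ardl_rows(y, x, p=1, q=2)
for target, row in zip(targets, rows):
    print(target, row)
```

Each extra lag costs one observation at the start of the sample, which matters in short samples.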

Information Criterions

It can be difficult to determine the best model to use in an econometric analysis. Information criteria are measures of the relative quality of a statistical model. They cannot be used for hypothesis testing, since they do not provide a test statistic in an absolute sense. The criteria are based on a trade-off between goodness of fit and the complexity of the model. AIC is given by

$AIC = \ln\left(\frac{SSE}{N}\right) + \frac{2K}{N}$

and the Schwarz criterion is given by

$SC = \ln\left(\frac{SSE}{N}\right) + \frac{K \ln(N)}{N}$

Where:

• SSE: Sum of squared errors
• N: Number of observations
• K: Number of variables

The formulas are quite similar, but SC penalizes extra variables more heavily than AIC for N ≥ 8, since ln(N) then exceeds 2. One should use the model with the smallest value of AIC or SC. One important aspect of this method is that it will not tell us whether the model fits the data poorly in general, only that it fits the data better than the other candidates.
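The two criteria are easy to compute once SSE, N and K are known. The sketch below uses the versions of the formulas based on ln(SSE/N); the two candidate models and their SSE values are hypothetical, and N = 47 is used only because it matches the number of business days in the sample.

```python
import math

def aic(sse, n, k):
    """Akaike information criterion in the ln(SSE/N) form."""
    return math.log(sse / n) + 2.0 * k / n

def sc(sse, n, k):
    """Schwarz criterion in the ln(SSE/N) form."""
    return math.log(sse / n) + k * math.log(n) / n

n = 47
# Hypothetical candidates: the larger model fits slightly better but uses more variables.
small = {"sse": 10.0, "k": 2}
large = {"sse": 9.6, "k": 4}

aic_small, aic_large = aic(small["sse"], n, small["k"]), aic(large["sse"], n, large["k"])
sc_small, sc_large = sc(small["sse"], n, small["k"]), sc(large["sse"], n, large["k"])
print(aic_small, aic_large)
print(sc_small, sc_large)
```

In this example both criteria prefer the smaller model, because the modest reduction in SSE does not outweigh the penalty for two extra variables.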

Burnham & Anderson (2002) recommend that AICc (AIC corrected for finite sample sizes) should be the preferred measure for model selection regardless of sample size. This criterion is AIC with an additional penalty term:

$AICc = AIC + \frac{2K(K+1)}{N-K-1}$

AICc converges to AIC as N gets large. Using AIC when N is not many times larger than K² increases the probability of selecting models that have an excessive number of parameters, and this can be a problem in some cases.

Dummy variables

A dummy variable is binary. This means that it can only take two values, namely zero and one. It is used to represent non-quantitative properties, such as gender, model, location, color or a grade on an ordinal scale. The one and the zero indicate the presence or the absence of a property. Usually the variable D is defined as:

$D = 1$ if the property is present, $D = 0$ otherwise

In a regression analysis, a dummy variable can be used to capture changes in the intercept and/or the slopes. Another neat property of the dummy variable is that it can be used to describe interactions between qualitative factors. Intercept variables for qualitative properties are additive. This means that the effect of each factor is summed up and added to the regression intercept. Hence, the regression model assumes that the dummy variables are independent of each other, but this is not always the case in reality. If they are independent, however, it is easy to interpret the coefficients estimated from the regression: if the qualitative factor is present, just add its coefficient to the intercept.

If we are interested in knowing whether a qualitative factor is significant in explaining the variation of the dependent variable, we can perform an F-test of the hypothesis that all the dummy coefficients are equal to zero at the same time. If this hypothesis cannot be rejected, the qualitative factor does not explain the variation of the dependent variable.

The regression models

This econometric analysis is done in two main steps. The first is an analysis of each individual firm and the second is a study of the relationship between the firms. I start with 10 different regression models and various correlation estimates for each firm. I have done a kind of “model mining”, where I have searched for a model that can explain the relationship between stocks and bonds issued by the same firm. This must not be mistaken for data mining, where one searches for data that fits a desired finding. Due to the presence of autocorrelation, I have used the Cochrane-Orcutt iterative estimation in the first 8 models, but not in models 9 and 10, where a lagged dependent variable is included in the regression. The neat property of the Cochrane-Orcutt estimation method is that it corrects for autocorrelation in the errors regardless of whether this is a problem or not. This means that autocorrelation is corrected when it is present, and otherwise the regression gives approximately the same result as OLS estimation.

I have used the following notation: Y is the yield, S is the price of the stock and ε is the error term. Since the literature suggests that the change in the yield is a function of the stock return, I have focused predominantly on this relationship. I have used the following regression models:

1. A simple regression on levels.

2. A regression of the change in yield on the stock return, measured as a percentage.

3. Difference form, with contemporaneous and leading x-variable.

4. Difference form, with contemporaneous and lagged x-variable.

5. Difference form, with lagged, contemporaneous and leading x-variable.

I have chosen to do these three regressions (3, 4 and 5) partly to investigate the multicollinearity present in the independent variables, and partly to check whether some of the various lags capture the true timing property of the relationship between stocks and bonds.

6. Model number 6 is a series of regressions done in a loop. The first regression is the same as model 2. The second is the same but with the dependent variable lagged once. The third uses the same variable lagged twice, and so on. This is done to reveal the length of a possible time-lagged relationship between stocks and bonds. This method shows the most probable lag length of the relationship, regardless of the significance of the variable. There are no economic theories that suggest that this time span should last several days, but I have checked it nevertheless. Ideally, the loop would be run several times for each company, with a different number of time lags of the stock return (the independent variable) included each time. I have not done this because it would require hundreds of regressions, and I could not find any software that could do this. It would also be convenient to lag the variables both ways, say from t+5 to t-5. This was not possible with the software either, but I solved the problem with model 7.

I needed a method to determine which of the tested time lags best explains the relationship. I solved this by using the Akaike information criterion (AIC) and the Schwarz criterion (SC).

7. This model is in principle the same as model 6, but now the independent variable is lagged. This is the same as leading the dependent variable.

8. The literature suggests that the stock return leads the change in bond yield, and I have here tried to see if I could obtain the same result with a one-day lag.

9. This is an autoregressive distributed lag model on difference form. I have included this model to see if I could correct the autocorrelation by adding a lagged dependent variable to the equation. This is not essential for answering the research question.

10. This is the same as model 9, but on level form.

Results

First, all the variables were diagnosed with the augmented Dickey-Fuller test for a unit root. As mentioned before, the presence of a unit root means that the variable exhibits a stochastic trend, and hence the variance is not constant through time and one of the main preconditions for a regression analysis is violated. One third of the yield variables and almost all the stocks (as expected) had this property. This problem was overcome by doing the regressions mainly on difference form or by checking the residuals for stationarity, and hence the variables for cointegration. But since only a few firms tested positively for cointegration, I will focus on the findings on difference form. This is also the most common approach in the literature.

Model 2 found, as expected, that the highest rated firms show no significant correlation between the change in yield and the stock return (at the 5% level). The highest rated firm with a significant relationship has the rating BB+, which is just below investment grade and is the highest speculative grade. All the other firms that have this relationship also have a lower rating than BB+. In other words, the rating plays a role here. I have tested this and managed to reject my first null hypothesis. This was done with a dummy-variable regression to obtain the coefficients. Then I tested whether all the coefficients were equal to zero at the same time. The reported p-value was 0.000. I could not number the various rating grades and use an ordinary OLS regression, because the grade scale is not numerical but ordinal: the “distance” between the grades is not known.

I expected to find a negative coefficient in model 2. It is reasonable to believe that an increase in the stock price will be associated with an overall reduced risk for the company, and hence a lower risk that the bonds default on their payments. In this case we would also expect the yield to decline due to a reduced risk premium. Although this is the case for most of the companies, there is one exception where the coefficient is positive. I cannot explain this, and it may be a random finding.

Rating        BB+     BB      BB      B       B       CCC+    CCC     CCC     CCC-    CC
Coefficient   -2.081  -1.212  1.883   -1.041  -8.149  0.553   -6.536  -5.941  -3.687  -2.841
P             0.032   0.062   0.045   0.164   0.015   0.826   0.672   0.272   0.371   0.000
              0.362   0.325   0.237   0.142   0.241   0.05    0.22    0.342   0.141   0.492

Table: Regression results for model 2

Models 3, 4 and 5, the models with different combinations of leading, contemporaneous and lagged variables, did not find any new significant coefficients. One interesting result in model 3 is that the inclusion of an insignificant time lag alongside the significant contemporaneous variable changed the beta coefficient. The reported p-value changed considerably, and the contemporaneous variable became insignificant. This indicates that there is a problem with multicollinearity in these models, and hence I have not managed to find a model that captures the true timing of the relationship.

Models 6 and 7 estimated the length of the best lag between the variables, and found, besides the contemporaneous relationship already found, that there was a significant one-day lag for the stock return for one of the firms. This may be due to chance. The next model looked specifically at the one-day lag. The same result was found here. In other words, I have not managed to detect a lead-lag relationship between the change in yield and the stock return. This means that I have not managed to reject my second null hypothesis of no time lags.

Models 9 and 10 came up with some results that may be of interest. Highly significant lagged ΔY’s as independent variables seem to be a common feature here. This would be expected on level form since the time series is autocorrelated, but on difference form it implies that some yield time series are autocorrelated at a higher order.

I did not find the relationships for any rating to be significant on the 0.1% level, like Kwan (1995) and Norden & Weber (2009) did. Their more precise results may be due to a better dataset. I have constructed my dataset manually by looking at transaction data, and because this is extremely time-consuming work, I have a maximum of 4 firms per grade, including the sub-grades with a minus or a plus. Other papers have used data from hundreds of firms over a longer time span, which provides richer information.

It is also possible that they have used better and more complex models. Both Norden & Weber (2009) and Hotchkiss & Ronen (2002) have used a vector autoregressive model. This captures both lead and lag relationships within and between stationary variables in a simultaneous multivariate framework.

Norden & Weber (2009) have already found that there are parameters other than the rating that explain the relationship between stocks and bonds. Firm size, leverage ratio and industry sector are some parameters that play a role here.

Discussion

The time plot of ΔY for Beazer Homes has a fairly consistent variance, and the series tests as stationary. The mean of Y has a weak declining tendency, but is fairly stable, and has a value of 5.32% in the given period. It is also not correlated with the firm’s stock return, which has a totally different trend behavior. The price of the bond must necessarily follow the same basic pattern as the yield. Hence the question: Why is this bond bought and sold for several different prices?

The correlation matrix (below) for the different time lags shows that subsequent ΔY’s are negatively correlated with each other. This means that a change in the yield tends to be followed by a change with the opposite sign. I will therefore assume that the yield, and hence the price, will be possible to forecast to a certain extent. The direction of the price change should at least be possible to predict quite often. Remember that this pattern is consistent throughout the whole period of more than two months. It is possible that this pattern has existed before and after the given period too. I will return to this subject in the discussion part.

        LdY0    LdY1    LdY2    LdY3
LdY0    1.00
LdY1   -0.57    1.00
LdY2    0.08   -0.59    1.00
LdY3    0.01    0.08   -0.56    1.00

The correlation matrix for the time lags of the change in yield.
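A correlation between a series and its own first lag, like the entries just below the diagonal of the matrix, can be computed as sketched here. The yield changes in the example are hypothetical numbers with a tendency to reverse sign; they are not the Beazer Homes data.

```python
import math

def correlation(a, b):
    """Sample (Pearson) correlation between two equally long lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def lag_correlation(series, k):
    """Correlation between the series and itself shifted k periods."""
    return correlation(series[k:], series[:-k])

# Hypothetical daily changes in yield that tend to reverse from day to day.
dy = [0.5, -0.4, 0.3, -0.35, 0.45, -0.5, 0.2, -0.1, 0.3, -0.4]

r1 = lag_correlation(dy, 1)

# Naive sign forecast: predict that the next change has the opposite sign.
hits = sum(1 for t in range(1, len(dy)) if dy[t] * dy[t - 1] < 0)
hit_rate = hits / (len(dy) - 1)
print(r1, hit_rate)
```

A strongly negative first-order correlation is what makes the simple "opposite sign" forecast succeed often in a series like this one.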

A visual inspection of the plots of the various yields and the corresponding stock prices shows that they often coincide to a large extent, although their relationship is not statistically significant. This may imply that there is an economic relationship between them even though it cannot be established statistically. Due to different opinions and expectations among the investors, the stochastic part accounts for a large portion of the movement. I think a logical reason why investors disagree on the price of a bond is that calculating a bond price under risk is a very difficult task. I will therefore assume that a considerable proportion of the bond bids are based on belief and guessing. Again, this is just an assumption.

An interesting question that naturally follows the research question is: what do the firms that exhibit a significant relationship between the stocks and the bonds have in common? The only common property I have managed to detect is that these firms have one huge peak in the stock return. The magnitude of this movement in the level of the stock price is so large that it probably instantaneously affects the overall risk and hence the creditworthiness of the firm, and therefore also the bonds. The plot of ΔY against stock return for the firm Best Buy illustrates this perfectly. The points are almost perfectly collected around the origin in a circular fashion, but there is ONE point in the far north-west corner. On January 15th the firm published a financial statement bearing negative news, and the stock fell rapidly by 28.6% while the yield increased by 6%. If this one point is omitted from the regression, the result is no longer significant. This is also the case for several other firms.
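The influence of a single extreme observation can be illustrated with a stylized example. In the sketch below, eight hypothetical points are scattered symmetrically around the origin, and one point at (-28.6, 6.0) mimics the size of the Best Buy observation. The clustered points and the function name are assumptions, not the actual data.

```python
import math

def slope_and_t(x, y):
    """OLS slope of y on x (with intercept) and its t-ratio."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return b, b / se

# Hypothetical cluster around the origin: (stock return, change in yield).
cluster = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
           (0.7, 0.7), (-0.7, -0.7), (0.7, -0.7), (-0.7, 0.7)]
outlier = (-28.6, 6.0)   # magnitude taken from the event described in the text

x_without = [p[0] for p in cluster]
y_without = [p[1] for p in cluster]
x_all = x_without + [outlier[0]]
y_all = y_without + [outlier[1]]

slope_without, t_without = slope_and_t(x_without, y_without)
slope_all, t_all = slope_and_t(x_all, y_all)
print(slope_without, t_without)
print(slope_all, t_all)
```

Without the extreme point the slope is zero; with it, the slope is clearly negative and the t-ratio is large in absolute value, so the apparent relationship rests on one observation.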

Conclusion

In this paper, I have investigated the relationship between the change in yield and stock returns for 27 firms over 47 consecutive business days. Firstly, I have found that there is an inverse relationship between the rating of a firm and the degree of correlation between the stock return and the change in yield for bonds issued by the same firm. Secondly, I have not managed to demonstrate that this relationship has a timing property in general. It was found for only one of the firms. Thirdly, it seems that the most common feature among the firms that exhibit a significant relationship is a large and sudden change in the stock price, but I have no statistical evidence for this claim.

References

Articles:

Banz, R. W. (1981): The Relationship Between Return and Market Value of Common Stocks.

Barsky, R. B. (1989): Why Don’t the Prices of Stocks and Bonds Move Together?

Black, F. and Scholes, M. (1973): The Pricing of Options and Corporate Liabilities.

Burnham, K. P. and Anderson, D. R. (2002): Model Selection and Multimodel Inference: A Practical Information-theoretic Approach (2nd ed.).

Cochrane, D. and Orcutt, G. H. (1949): Application of Least Squares Regression to Relationships Containing Auto-Correlated Error Terms.

Durbin, J. (1970): Testing for Serial Correlation in Least-Squares Regression When Some of the Regressors are Lagged Dependent Variables.

Fama et al. (1989): Business Conditions and Expected Return on Stocks and Bonds.

Keim, S. and Stambaugh, R. F. (1986): Predicting Returns in the Stock and Bond Markets.

Kwan, S. H. (1995): Firm-Specific Information and the Correlation Between Individual Stocks and Bonds.

Merton, R. C. (Spring 1973): A Rational Theory of Option Pricing.

Merton, R. C. (November 1973): On the Pricing of Corporate Debt: The Risk Structure of Interest Rates.

Norden, L. and Weber, M. (2009): The Co-movement of Credit Default Swap, Bond and Stock Markets: an Empirical Analysis.

Books:

Hull, J. C. (2012): Options, Futures, and Other Derivatives. Eighth edition. Chapters 4, 9, 10 and 23.

Bodie, Kane, Marcus (2011): Investments and Portfolio Management. Ninth edition. Chapters 14 and 15.

Myers, B. (2003): Principles of Corporate Finance. Seventh edition. Chapters 1, 20, 22 and 24.

Hill, R. C., Griffiths, W. E., Lim, G. C. (2012): Principles of Econometrics. Fourth edition. Chapters 6, 7, 9 and Appendix 9A.

Lind, Marchal, Wathen (2010): Statistical Techniques in Business and Economics.

Web pages:

http://www.standardandpoors.com/ratings/definitions-and-faqs/en/us