
Statistical Arbitrage: High Frequency Pairs Trading

Ruben Joakim Gundersen

Supervisor: Associate Professor Michael Kisser

NORWEGIAN SCHOOL OF ECONOMICS

Master Thesis, MSc in Economics and Business Administration, Financial Economics

This thesis was written as a part of the Master of Science in Economics and Business Administration at NHH. Please note that neither the institution nor the examiners are responsible − through the approval of this thesis − for the theories and methods used, or results and conclusions drawn in this work.


Abstract

In this thesis we examine the performance of a relative value strategy called Pairs Trading. Pairs Trading is one of several strategies collectively referred to as Statistical Arbitrage strategies. Candidate pairs are formed by matching stocks with similar historical price paths. The pairs, once matched, are automatically traded based on a set of trading rules. We conduct an empirical analysis using high frequency intraday data from the first quarter of 2014. Our findings indicate that the strategy is able to generate positive risk adjusted returns, even after controlling for moderate transaction costs and placing constraints on the speed of order execution.


Preface

This thesis marks the end of my master studies at NHH. The work has at times been challenging, but at the same time it has also been very rewarding. I would like to thank my supervisor, Michael Kisser, for valuable input and suggestions during the writing process.


Contents

1. Introduction
2. The concept of pairs trading
2.1 Normalization of stock prices
3. Pairs trading in previous literature
4. Different approaches to pairs trading
4.1 The Cointegration approach
4.2 The Distance approach
4.3 The Stochastic approach
5. Simulation testing and choice of approach
5.1 Model one - The Granger representation theorem
5.1.1 Results
5.2 Model two - The Stock & Watson Common trends model
5.2.1 Results
5.3 Consequences of inaccurate cointegration coefficient estimates
5.4 Choice of method
6. Testing a high frequency pairs trading strategy
6.1 Data
6.2 Formation and trading periods
6.3 Practical Implementation of the distance approach
6.3.1 Pairs formation
6.3.2 Pairs Trading
6.4 Return computation
6.5 An algorithmic representation of the test setup
6.6 Results
6.6.1 Unrestricted pairs matching
6.6.2 Restricted case – Pairs from corresponding industries
6.7 Impact of transaction costs and timing constraints
6.7.1 Commissions and short fees
6.7.2 Speed of execution
6.7.3 Concluding remarks on commissions and execution speed
6.7.4 Trade slippage
6.8 Exposure to market risk
6.9 Is pairs trading a masked mean reversion strategy?
7. Conclusion
References
Appendices


1. Introduction

In this paper we examine a popular quantitative investment strategy commonly referred to as “pairs trading”. The basic concept of pairs trading is remarkably simple: one identifies a pair of stocks that exhibit historical co–movement in prices. Subsequently, if significant deviations from the historical relationship are observed, a position is opened. The position is formed by simultaneously selling short the relative winner and buying long the relative loser. When the prices eventually converge the position is closed and a profit is made. The strategy builds upon the notion that the relative prices in a market are in equilibrium, and that deviations from this equilibrium eventually will be corrected. Applying a pairs trading strategy is therefore an attempt to profit from temporary deviations from this equilibrium.

According to Gatev, Goetzmann & Rouwenhorst (2006) pairs trading strategies have been used by practitioners on Wall Street in various forms since the mid–1980s. The strategy is often said to have originated within Morgan Stanley in a group led by Nunzio Tartaglia. The focus of the group was to develop quantitative trading strategies by employing advanced statistical models and information technology. The group sought to “mechanize” the investment process by developing trading rules that could be automated. Pairs trading was one of the resulting strategies. The group used this strategy with great success in 1987 – when the group is said to have generated a profit of $50 million – but was dissolved in 1989 after a period of poor performance. In the last decades, as technology has become more accessible, the strategy has become increasingly popular with investors.

Pairs trading is often placed in a group of quantitative trading approaches collectively referred to as statistical arbitrage strategies. The arbitrage part in this context is somewhat misleading as arbitrage implies a risk free profit opportunity at zero upfront cost. A pairs trading strategy is by no means risk free. There is no guarantee that the stocks in a pair will converge. They could even continue to diverge, resulting in significant losses. Furthermore, the strategy is also often claimed to be “market–neutral”, meaning that the investor is unexposed to the general market risk. However, while it certainly is possible to create market neutral pairs, the total market risk of a position depends on the amount of capital placed in each stock and the sensitivity of the stocks to such risk.

*

In the first part of this thesis we explore the background and the theoretical basis for a pairs trading strategy. In addition we compare the performance of two existing pairs trading methods by applying them to sets of simulated data.

In the latter part of the paper we conduct an empirical analysis of a concrete pairs trading strategy. Through this analysis, we seek to determine if a pairs trading strategy delivers returns that are superior when compared to a buy–and–hold strategy. We use high frequency data for stocks listed on the Oslo Stock Exchange.

The obtained results indicate that it is possible to generate positive risk–adjusted returns by following a pairs trading strategy. The results are robust after controlling for transaction costs and placing restrictions on the execution speed. Specifically, we report annualized returns as high as 12 % after costs. In addition, the standard deviations of the returns are low. This combination leads to an impressive Sharpe ratio exceeding 3. We find that the constructed portfolios have close to zero exposure to market risk.


2. The concept of pairs trading

The pairs trading strategy is based on the concept of relative pricing. If two securities have identical payoffs in all states their price should also be identical. This is a variant of the principle commonly referred to as the Law of One Price (LOP). Lamont and Thaler (2003, 191) define the LOP as follows: “[…] identical goods must have identical prices”. It is important to note that the prices do not need to be “correct”, from an economic point of view, for the LOP to be valid. The LOP simply asserts that stocks yielding identical payoffs should have the same current price. The law is therefore applicable to the relative pricing of the stocks in a market, even if the pricing is economically incorrect (Gatev et al., 2008). We can further extend the example with identical payoffs to a situation where the payoffs are very similar but not identical. In such a situation the prices of the securities should also be similar. If a temporary deviation from this relative pricing relationship occurs it should be possible to exploit this by taking a position that generates a profit when the deviation is corrected. Pairs trading is one example of a strategy aiming to profit from such temporary deviations.

Before a pairs trading strategy can be implemented on a practical level we need to address some fundamental questions: What pairs of stocks are suitable? When should a position be opened or closed? How should one determine the amount of capital placed in the individual long/short positions? As we will see in section 4, there are multiple approaches to pairs trading, all offering different answers to these questions. Even so, the basic structure of a pairs trading strategy is common for all approaches. The first step involves identifying a pair of stocks whose prices appear to move together according to some fixed relationship. The period of time used to establish such a relationship is referred to as the formation period. After the suitable pairs are identified we enter the trading period. In this period we continue to observe the spread. If a significant deviation from the relationship is observed a position is opened. The investor then buys long some quantity of the relative loser and sells short some quantity of the relative winner.

The following figure graphically illustrates the concept of the pairs trading strategy.

Figure 2.1 – A pairs trading example

The figure shows two simulated stock prices on the left scale. In addition a dummy variable (right scale) indicates whether a position is open or closed. A position is opened if the spread exceeds a previously calculated entry–threshold value. The position is closed at the next crossing of the prices. In this specific example, a position is opened when the spread first exceeds the threshold and closed when the prices subsequently cross; later in the sample a second position is opened and then closed in the same way. An intuitive way to understand the payoffs that would result from a trade is to think of the spread between the two stocks as a synthetic asset. When a position is opened the trader is effectively selling the spread short, speculating that it will decrease. When the stock prices later cross, the value of the spread is zero. The trader then closes the position, and earns a profit equal to the value of the spread at the time the position was entered.

Since pairs trading is a relative value strategy, a framework for assessing the relative value development in a pair is essential. In the hypothetical example above the two stock price series both start at unity. This makes calculating the relative changes in values simple. At any given point in time the cumulative returns to the series are directly observable. Any return differences between the stocks are therefore easily calculated.
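The open/close logic and the short–the–spread payoff described above can be sketched in a few lines of Python. The series, the threshold and the `pair_trade_pnl` helper below are hypothetical illustrations, not the trading rules tested later in the thesis.

```python
def pair_trade_pnl(norm_a, norm_b, entry_threshold):
    """Short the spread when |spread| exceeds the entry threshold, close at the
    next crossing of the two series; returns the summed profit per unit traded."""
    pnl, open_spread = 0.0, None
    for a, b in zip(norm_a, norm_b):
        spread = a - b
        if open_spread is None:
            if abs(spread) > entry_threshold:
                open_spread = spread       # enter: short the winner, long the loser
        elif spread == 0 or spread * open_spread < 0:
            pnl += abs(open_spread)        # prices crossed: spread is zero, book profit
            open_spread = None
    return pnl

# Hypothetical normalized series: the spread widens past 0.05, then the prices cross.
a = [1.00, 1.02, 1.06, 1.03, 0.99]
b = [1.00, 1.00, 1.00, 1.01, 1.00]
print(round(pair_trade_pnl(a, b, 0.05), 2))  # 0.06, the spread at the time of entry
```

The profit equals the spread at entry because closing at a crossing means buying the spread back at a value of zero, exactly as in the synthetic–asset intuition above.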


Obviously, in a real situation the stock prices will not be as well behaved as in this example, but instead start at values that vary widely. This makes the comparison more complicated. In addition, Do, Faff & Hamza (2006, 4) point out that the raw price spread between two stocks is not expected to stay at a constant level, even if the stocks yield identical returns1. This makes the raw price spread unsuitable as an indicator of when a position should be opened or closed. In order to overcome these issues we need to apply a transformation to the series. By transforming the price series we achieve price level independence and we are able to consistently assess the relative value development in the stocks.

2.1 Normalization of stock prices

In previous academic literature (Engelberg, Gao & Jagannathan, 2009; Gatev et al., 2006), a common transformation to achieve price level independence is to construct cumulative return indexes for the stocks. These indexes reflect the total return since the beginning of a period, adjusted for dividends and splits. The indexes are then rebased to some constant common for all stocks considered. In the literature this transformation is usually referred to as normalization of the stock prices.

Example – Normalized price series

As the concept of normalization is central to this thesis we will provide a concrete practical example of the procedure. In the example we will consider the intraday development of the two stocks Seadrill and Fred Olsen Energy on January 6th 2014.

The next figure shows the raw price series.

1 To see this, think of the spread between two stocks A and B currently trading at 15 and 20 NOK respectively. The spread at time 0 is equal to 20 − 15 = 5 NOK. Now we assume a 100 % return on both stocks in period 1. The spread then also doubles, because 40 − 30 = 10 NOK.

Figure 2.2 – Raw price series

We now normalize the series by rebasing both stocks to a value of one at the first observation. Mathematically this is done according to equation 2.1

p_i,t = P_i,t / P_i,0    2.1

Where p_i,t is the normalized price of stock i at time t and P_i,t the raw price of stock i at time t. The next chart shows the normalized price series.

Figure 2.3 – Normalized price series

With the transformation applied the relative value development of the stocks is directly comparable. It is now possible to consistently quantify the level of divergence.
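As a concrete sketch, the rebasing in equation 2.1 amounts to a one-line division. The price levels below are made up for illustration and are not the actual FOE/SDRL quotes from the figures.

```python
def normalize(prices):
    """Rebase a price series to 1.0 at its first observation (equation 2.1)."""
    base = prices[0]
    return [p / base for p in prices]

# Hypothetical intraday price levels for two stocks (not actual FOE/SDRL quotes).
foe = [250.0, 249.0, 251.5]
sdrl = [41.0, 41.2, 40.8]

print(normalize(foe)[0], normalize(sdrl)[0])  # both series now start at exactly 1.0
```

After the transformation the two series are on the same scale, so their difference at any time is the cumulative return differential.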



3. Pairs trading in previous literature

In this section we will review some important work previously done on the topic of pairs trading.

Gatev, Goetzmann & Rouwenhorst – “Pairs Trading: Performance of a Relative Value Arbitrage Rule” (1998, 2006)

This study is one of the earliest academic investigations of pairs trading, laying the foundation for much of the subsequent research. In the paper a simple trading strategy is back–tested. Pairs are identified by finding stocks that exhibit historical co–movement in prices. Specifically, stock pairs that minimize the total distance between normalized price series are identified as potential candidates. This approach is therefore commonly referred to as the distance approach.

The dataset consists of daily closing prices from the US stock market over the period of 1962 to 2002. Significant excess returns of up to 11% annually (before costs) for self–financing pairs are reported. The authors attribute the excess returns to an unknown systematic risk factor not yet identified. They support this view by pointing out that there is a high degree of correlation between the returns to portfolios of non–overlapping pairs. The correlation is present even after accounting for common risk factors by applying an augmented2 version of the Fama and French three factor model. In addition, the analysis shows that the returns to the pairs trading strategy were lower in the latter part of the sample, something the authors attribute to lower returns to the mentioned unknown risk factor. The study was first circulated as a working paper in 1998. In 2006 the sample period was extended and the paper was officially published.

2The model usually referred to as the Fama-French three-factor model was published by Eugene Fama and Kenneth French in 1992. The aim of the model is to attribute stock returns to exposure to different systematic risk factors. The three original factors included were the general market risk exposure, book-to-market ratio, and market capitalization. The augmented version used in this study includes momentum and reversal as two additional factors. For more information consult Fama & French (1992).

Vidyamurthy – “Pairs Trading: Quantitative Methods and Analysis” (2004).

In this book the author proposes the use of the cointegration framework introduced by Engle and Granger in 1987. The property of cointegration is used both to identify pairs, and to generate trade signals. Intuitively, the idea of cointegration as used in this book can be explained in the following way. Consider two series each consisting of two components: one random component and one non–random component. In addition, assume that the random component is common for both series. Then, by combining the series in a specific ratio, we obtain a new series only consisting of the non–random components. By applying the cointegration framework Vidyamurthy attempts to identify pairs of stocks where the random components cancel out. Pairs with this property would be attractive for a pairs trading strategy as their spread would be expected to stay at a constant value. The book is essentially a practitioner’s guide to pairs trading and offers no empirical results.

Elliott, Van der Hoek & Malcolm – “Pairs Trading” (2005).

In this paper the authors present an approach to pairs trading where the spread is modelled as a random variable with mean–reverting properties. Specifically, it is assumed that the spread approximately follows an Ornstein–Uhlenbeck process.

The approach offers some advantages. Because the spread is modelled as a variable with certain statistical properties it is possible to forecast time to convergence and probabilities for further divergence. The paper is purely theoretical and offers no empirical analysis of the approach.

Lin, McCrae & Gulati – “Loss protection in pairs trading through minimum profit bounds: A cointegration approach” (2006).

The authors present a variant of the cointegration approach suggested by Vidyamurthy (2004). The cointegration coefficient (the slope in a regression between the two stocks in a pair) determines the ratio in which the stocks are bought or sold. Using this approach the necessary conditions for a trade to deliver a minimum profit are derived. The minimum profit bound is meant to cover trading costs and still leave a profit. The empirical part of this study is limited to an analysis where one single pair of stocks is examined.

Engelberg, Gao & Jagannathan – “An Anatomy of Pairs Trading: The Role of Idiosyncratic News, Common Information and Liquidity” (2009).

Following the approach outlined by Gatev et al. (2006) the authors document significant excess returns. In addition, the paper aims to explain the factors affecting the returns. Four main findings are reported. First, the return to a trade is sensitive to the time passed between divergence and convergence. The return potential decreases exponentially with time after divergence. The authors introduce a rule where a position failing to converge in the first 10 days is closed automatically. This leads to an increase in profits from 70 bps per month to 175 bps per month. Second, it is shown that the profits to a trade are related to news affecting the companies. If the observed divergence is the result of firm specific news, the divergence is more likely to be permanent. The third observation shows that if the information shock is common to both stocks, then some of the profits to the trade can be attributed to differences in the time the market needs to adjust the prices to reflect the news. Fourth, profits are affected by owner structure and analyst coverage. If both stocks in a pair are owned by the same institutional investor the profits are reduced. Similarly, if the stocks in a pair are both covered by the same analyst, the returns are generally lower.

Do & Faff – “Does Naïve Pairs Trading Still work?” (2010)

In this study the authors attempt to replicate the results found by Gatev et al. (2006) by using the same dataset as in the original study. Their results agree with those found in the original study with only minor discrepancies. In addition, the authors expand the data sample to include observations up to the first half of 2008. In the subsample stretching from 2003 to 2008 the excess returns have declined to a point where they are essentially zero. The authors note that there seems to be an increased risk of non–convergence in this sub–period, i.e. that the spread continues to widen after a position is opened.

Bowen, Hutchinson & O’Sullivan – “High Frequency Equity Pairs Trading: Transaction Costs, Speed of Execution and Patterns in Returns” (2010)

This is one of very few academic studies we have found that examines a pairs trading strategy using high frequency data. Following the approach used by Gatev et al. (2006), the authors analyze the year of 2007 in the UK stock market. Moderate excess returns are documented. The returns are found to be highly sensitive to timing and transaction costs.

Hoel – “Statistical Arbitrage Pairs: Can Cointegration Capture Market Neutral Profits?” (2013)

Following a cointegration approach this paper back–tests the performance of a pairs trading strategy over the years 2003 through 2013 in the Norwegian stock market. Hoel adopts the cointegration weighting approach proposed by Lin et al. (2006). The study shows that this implementation would have resulted in large losses, both cumulative and in most sub–periods.

George Miao – “High Frequency and Dynamic Pairs Trading Based on Statistical Arbitrage Using a Two–Stage Correlation and Cointegration Approach” (2014).

Using high frequency data from the US market Miao shows that pairs trading during 2012 and 2013 was extremely lucrative. The author reports that the strategy outperformed the S&P500 by 34 % over a 12–month trading period (before costs). The pairs formation procedure is divided into two steps. In the first step, potential pairs are pre–selected based on their correlation coefficients. In the second step, a test for cointegration is applied to identify the best pairs. The selected pairs are then traded when deviations from the estimated relationship arise.


4. Different approaches to pairs trading

In this section we present the details of the most common pairs trading approaches.

4.1 The Cointegration approach

This approach relies on the statistical concept of stationary processes. Harris & Sollis (2003, 27) define a time series y_t as stationary if the following three conditions are satisfied:

1. E[y_t] = μ for all t
2. Var[y_t] = σ² for all t
3. Cov[y_t, y_(t−k)] depends only on the lag k, not on t

If the spread between two assets can be confirmed to follow a stationary process it is possible that the pair can be used successfully in a pairs trading strategy. By satisfying condition 1 the expected value of the spread is constant at all times. This implies that the value of the spread is expected to revert to the mean should a deviation occur.

Next, consider two series that are both integrated of first order, I(1), and therefore non–stationary3. Generally any linear combination of two such series would also be I(1) and non–stationary. However, if the series share common stochastic trends it might be possible that some linear combination of the series could result in a stationary series. In that case the stochastic terms cancel out and we are left with the stationary part. This concept is referred to as cointegration. A more formal definition of cointegration found in Lin et al. (2006) is quoted below.

Let y_(1,t), y_(2,t), …, y_(n,t) be a sequence of I(1) time series. If there are nonzero real numbers β_1, β_2, …, β_n such that β_1·y_(1,t) + β_2·y_(2,t) + … + β_n·y_(n,t) becomes an I(0) series, then y_(1,t), y_(2,t), …, y_(n,t) are said to be cointegrated.

This concept is essentially what we want to exploit in pairs trading. We want to identify stocks that are exposed to some set of common factors so that their relative valuations can be reasonably well described as a fixed relationship. The prices of the stocks are then expected to follow similar paths and thus yield a stationary, and therefore mean–reverting, spread.

3 A non-stationary series is a series not meeting the conditions for a stationary process. If a non-stationary series is integrated of order 1 the series must be differenced once in order to become a stationary series. When first differencing a series the previous observation is subtracted from the current one. This yields a new series consisting of the period-to-period changes.

The standard framework for evaluating cointegration and estimating the linear relationship between stocks is based on regression analysis. The regression takes the form of equation 4.1. As discussed in section 2.1 the raw spread between two stocks is unsuitable as an indicator of the relative value development. Recall that the spread between two stocks is not expected to stay at a constant level even if the stock returns are identical. In the cointegration framework, this problem is addressed by using the natural logarithm of the prices instead of raw prices. This transformation ensures price level independence. See appendix A for further details on the transformation.

ln(P_A,t) = μ + γ·ln(P_B,t) + ε_t    4.1

The slope coefficient γ is referred to as the cointegration coefficient between the two securities. In economic terms, γ is the expected percentage increase in the price of stock A when the price of stock B increases by one percent. This translates to the expected return in stock A over some period, given the return in stock B over the same period. Vidyamurthy (2004, 106) argues that the intercept μ should be interpreted as a premium that the investor receives for holding one unit of stock A instead of γ units of stock B. It would also be possible to interpret μ in a purely technical sense without any economical meaning. A third option is to run a regression with no intercept.

After obtaining the coefficient estimates the spread between the two securities is defined as

spread_t = ln(P_A,t) − μ̂ − γ̂·ln(P_B,t)    4.2

Given that the estimated relationship in 4.1 is valid, the variable spread_t will be a stationary zero mean random variable. Notice that spread_t in equation 4.2 is equal to ε_t in equation 4.1.

Practically, we can test for cointegration by analyzing the residuals resulting from the regression described in equation 4.1. The residuals are tested for stationarity by using an appropriate test such as the Dickey–Fuller test. In the Dickey–Fuller test we attempt to describe the time series as an AR(1) process4 and then test for a unit root.

ε_t = φ·ε_(t−1) + u_t    4.3

If φ < 1 in equation 4.3 and the results are significant5, we conclude that the series is stationary.

The specific test of cointegration outlined here is usually referred to as the Engle–Granger two–step approach, and is widely used to determine cointegration between two variables. While the Engle–Granger procedure is simple and seems to be preferred in previous literature, other tests for stationarity are possible.
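A minimal sketch of the two–step procedure, using only the Python standard library, is given below. The simulated pair and the plain AR(1) coefficient check are illustrative assumptions; a proper test would compare the Dickey–Fuller statistic against the custom critical values mentioned in footnote 5 rather than eyeball the point estimate.

```python
import random

def ols_fit(x, y):
    """Step one: ordinary least squares for y_t = mu + gamma * x_t + e_t (eq. 4.1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    gamma = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - gamma * mx, gamma

def ar1_coefficient(series):
    """Estimate phi in s_t = phi * s_(t-1) + u_t (eq. 4.3, no intercept)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(s * s for s in series[:-1])
    return num / den

# Simulate a cointegrated pair: log price of B is a random walk, log price of A
# is 0.05 + 1.2 * B plus stationary noise (all parameters are made up).
random.seed(1)
log_pb = [0.0]
for _ in range(2000):
    log_pb.append(log_pb[-1] + random.gauss(0, 0.01))
log_pa = [0.05 + 1.2 * b + random.gauss(0, 0.005) for b in log_pb]

mu, gamma = ols_fit(log_pb, log_pa)
spread = [a - mu - gamma * b for a, b in zip(log_pa, log_pb)]  # eq. 4.2
phi = ar1_coefficient(spread)
print(round(gamma, 1), phi < 1)  # gamma recovers ~1.2; the spread shows no unit root
```

Because the noise in the simulated spread is serially uncorrelated, the estimated φ lands far below one, consistent with a stationary residual spread.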

Vidyamurthy (2004) analyzes the resulting series of residuals obtained from the regression in equation 4.1. The number of times the series crosses its mean is measured. A high number of crossings is interpreted as evidence of stationarity.
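The crossing–count heuristic is straightforward to sketch; the two toy series below are illustrative, not data from the thesis.

```python
def mean_crossings(series):
    """Count how often a series crosses its own mean; frequent crossings are
    taken as informal evidence of stationarity in Vidyamurthy's procedure."""
    m = sum(series) / len(series)
    centered = [s - m for s in series]
    return sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)

print(mean_crossings([1, 3, 1, 3, 1, 3]))  # 5: oscillates around its mean
print(mean_crossings([1, 2, 3, 4, 5, 6]))  # 1: trends through its mean once
```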

If the results from the stationarity test indicate cointegration the pair is selected for trading. In this step we monitor the value of the spread, spread_t. Any value except zero indicates a departure from the relationship estimated in 4.1. If the deviations exceed some threshold value q a position is opened. If spread_t > q then, according to our estimates, stock A is overvalued compared to stock B. The trader then opens a short position in stock A and a long position in stock B. Conversely, if spread_t < −q then

4 An AR(1) process is a process where the current value of the series is dependent on the previous value: y_t = φ·y_(t−1) + u_t. If φ = 1 we have a unit root and the series is a random walk.

5 The standard critical values do not apply in the DF test. Instead one must use custom critical values valid for use with the test.

stock A is relatively undervalued compared to stock B and the opposite positions are entered. The positions are closed when |spread_t| decreases to a value lower than some threshold value.
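The open/close rules above can be sketched as a small state machine. The function name, the closing level `q_close` and the spread values are hypothetical illustrations of the rule, not the thesis's implementation.

```python
def position_state(spread_series, q, q_close):
    """Track the position implied by the regression spread of equation 4.2:
    short A / long B when spread > q, long A / short B when spread < -q,
    flat again once |spread| falls below q_close."""
    states, state = [], "flat"
    for s in spread_series:
        if state == "flat":
            if s > q:
                state = "short A / long B"
            elif s < -q:
                state = "long A / short B"
        elif abs(s) < q_close:
            state = "flat"
        states.append(state)
    return states

print(position_state([0.0, 0.06, 0.04, 0.01, -0.06, -0.02], q=0.05, q_close=0.02))
```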

Vidyamurthy (2004, 75) makes an attempt to link the cointegration method to the arbitrage pricing theory (APT)6. It is argued that the cointegration coefficient γ should be interpreted as the relative risk factor exposure in the two stocks, so that one unit of stock A exposes the investor to the same amount of systematic risk as γ units of stock B. Do et al. (2006, 6) criticize this argument by pointing out that it does not account for the risk free rate of return in a way consistent with the APT. Specifically, according to the APT an investor holding γ units of stock B will receive γ units of the risk free return in addition to any return due to systematic risk exposure. On the other hand, an investor holding one unit of stock A will receive one unit of the risk free return in addition to the return due to systematic risk exposure. Equations 4.4 and 4.5 illustrate the problem.

Assume that

r_A = r_f + β_A·r_s    4.4

Where r_s is the excess return to the systematic risk factors, β_A the sensitivity to those factors and r_f the risk free return. We now compare a position equal to one unit of stock A to a position consisting of γ units of stock B.

r_γB = γ·r_f + γ·β_B·r_s    4.5

We can see that the return to position A will differ from the return to position B due to the difference in the risk free return components.

6 Stephen Ross introduced the APT in 1976 and argues that stocks are exposed to various systematic risk factors, and that the development in these factors dictates the returns to individual stocks. It then follows that stocks with identical factor exposures should have identical returns. If this is not the case then arbitrageurs would exploit this and thus eliminate the deviation. For more information on the APT we suggest that the reader consult the original paper by Ross (1976).

In the light of the discussion above, we must point out that the cointegration approach is not the only pairs trading method with little support in current asset pricing models. Sparse support in such models is common for all existing pairs trading methods. If a pairs trading strategy is able to generate excess returns this could be an indication that the current asset pricing models fail to capture all sources of systematic risk.

4.2 The Distance approach

The distance approach is the most commonly used method in previous academic literature. Gatev et al., who introduced this method in their 1998 study, explain that it is based on conversations with traders actively applying a pairs trading strategy.

Pairs are identified by calculating the sum of squared differences between normalized stock prices over some time period. The pairs are then ranked in descending order, based on their sum of squared differences. The procedure for calculating the sum of squared differences is shown in equation 4.6.

SSD_A,B = Σ_(t=1..T) (p_A,t − p_B,t)²    4.6

Where SSD_A,B is the cumulative sum of squared differences between the normalized prices and p_i,t refers to the normalized price of stock i at time t. The pairs with the lowest sums will be the pairs with the highest degree of comovement and thus be the pairs with greatest potential for use in a pairs trading strategy.
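Equation 4.6 and the ranking step translate directly into code. The tickers and normalized price paths below are hypothetical, purely to show the mechanics.

```python
from itertools import combinations

def ssd(p, q):
    """Sum of squared differences between two normalized series (equation 4.6)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def rank_pairs(normalized):
    """Rank all candidate pairs, lowest SSD (closest co-movement) first."""
    scores = {(x, y): ssd(normalized[x], normalized[y])
              for x, y in combinations(sorted(normalized), 2)}
    return sorted(scores, key=scores.get)

# Hypothetical normalized price paths for four tickers.
normalized = {
    "AAA": [1.00, 1.01, 1.02, 1.01],
    "BBB": [1.00, 1.01, 1.03, 1.00],
    "CCC": [1.00, 0.97, 0.95, 0.96],
    "DDD": [1.00, 0.98, 0.96, 0.97],
}
print(rank_pairs(normalized)[0])  # ('AAA', 'BBB'): the closest-moving pair
```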

A property of this approach is the implicit assumption of return parity; i.e. the method matches stocks that yield the same return in the same period. This point is sometimes mentioned as a weakness of the method (Do et al., 2006). On the other hand, the authors of the mentioned study point out that the nonparametric nature of this approach leaves less room for estimation errors than more complex methods.

It is important to note that while cointegration is not explicitly tested in equation 4.6, the distance approach also relies on the cointegration property. Gatev et al. (2006) argue that most, if not all, high–potential pairs identified will be pairs of cointegrated stocks. Theoretical justification for this assertion is found by assuming a pricing framework where asset prices are driven by the development in common non–stationary factors. Bossaerts & Green (1987) and Jagannathan & Viswanathan (1988) are cited as examples of such pricing frameworks. The pairs with the lowest sums of squared differences are expected to be pairs with near–equal exposure to the same systematic factors. The pairs will therefore move together in a fashion that leads to cointegration.

Assuming that an attractive pair is identified, we shift to the trading period. In this step the spread between the securities is calculated and monitored continuously. If the spread exceeds some predefined value δ a position is opened.

If p_A,t − p_B,t > δ, sell stock A short and buy stock B    4.7
If p_B,t − p_A,t > δ, sell stock B short and buy stock A

The position is closed when the spread converges to a value equal to a predetermined closing condition. In previous literature a position is often opened when the spread deviates by more than two standard deviations as measured over the formation period (Gatev et al., 2006; Do & Faff, 2010). In the mentioned studies the position is closed at the next crossing of the normalized price series.

Naturally, a higher threshold for entering a position would yield a higher profit per trade than a lower value. On the other hand, a lower threshold value will lead to more trades, potentially increasing the total profits. It is therefore difficult to determine whether total profits increase or decrease with higher threshold values.
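A simplified sketch of these trading rules, assuming the spread is the difference between the normalized prices and that `sigma` has been estimated over the formation period, could look as follows (illustrative names, not the actual trading code):

```python
def trade_signals(spread, sigma, entry=2.0):
    """Generate position signals: open when |spread| exceeds entry*sigma,
    close at the next zero crossing of the spread."""
    position = 0          # +1: long A / short B; -1: short A / long B; 0: flat
    signals = []
    for s in spread:
        if position == 0 and s > entry * sigma:
            position = -1  # spread too wide: short A, long B
        elif position == 0 and s < -entry * sigma:
            position = +1  # spread too negative: long A, short B
        elif position == -1 and s <= 0:
            position = 0   # normalized prices crossed: close
        elif position == +1 and s >= 0:
            position = 0
        signals.append(position)
    return signals
```

The entry threshold (here the default of two standard deviations) and the crossing-based exit are the parameters whose trade-off is discussed above.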

4.3 The Stochastic approach

In this framework the spread between two stocks is modelled as a stochastic variable with mean reverting properties.

Previous literature does not offer any guidelines describing how to identify potential pairs. Instead it is assumed that such a pair is already identified. The pairs identification could be done qualitatively, by choosing stocks with similar fundamental characteristics. Alternatively, one of the two previously discussed formation methods (the distance approach and the cointegration method) could be used.

Assuming that a pair of appropriate stocks is identified, we provide an outline of the method. The approach assumes a continuous time framework where the spread is modelled as an Ornstein–Uhlenbeck process. The Ornstein–Uhlenbeck process is often described as the continuous time counterpart to the discrete AR(1) process7 (Neumaier & Schneider, 1998). The spread is defined as the difference in the prices8 between stock A and stock B. This difference is assumed to be driven by a state variable x_k that follows the state process described in equation 4.8.

x_{k+1} = A + B x_k + C ε_{k+1}    4.8

Where A, B and C are constants, and ε_{k+1} is a Gaussian noise term with zero mean and a standard deviation of one.

x_k will then be a normally distributed variable with the following properties

x_k ~ N(μ_x, σ_x²)    4.10

With

μ_x = A / (1 − B)    4.11

And

σ_x² = C² / (1 − B²)    4.12

If the value of B is such that |B| < 1, equation 4.8 can also be represented as

7 For a description of the AR(1) process, see footnote 4.

8 In a review of the method, Do et al. (2006) stress the importance of using log-transformed prices in order to achieve a price-level independent spread.


x_{k+1} − x_k = (a − b x_k)τ + σ√τ ε_{k+1}    4.13

Where a = A/τ, b = (1 − B)/τ and σ = C/√τ.

We then let x_k = ξ_{kτ}, with {ξ_t} satisfying the stochastic differential equation

dξ_t = (a − b ξ_t) dt + σ dW_t    4.14

where W_t is a standard Brownian motion. The parameters of the Ornstein–Uhlenbeck process (A, B and C in equation 4.13) can now be estimated. The parameters can easily be estimated using OLS (an example is provided in appendix B). However, the previous literature also employs more sophisticated iterative algorithms such as the Kalman filter9. The state variable x_k is then normally distributed with an expectation conditional on the previous observation. Appendix C shows the calculation of the conditional expectation and the conditional probability distribution.

The actual and observable spread y_k is then assumed to be equal to the state variable plus some noise.

y_k = x_k + D ω_k    4.15

Where D is a constant and ω_k is N(0, 1). The spread is therefore also normally distributed with the same expectation as x_k. The trader can now compute the conditional expectation of the spread given the previous observation. A position is opened if the current spread deviates significantly from the estimated value.

Do et al. (2006) point out that this method offers several advantages. Because the spread is modelled explicitly as a mean-reverting stochastic process, the method captures the property of mean reversion directly. Furthermore, it facilitates forecasting, allowing the trader to calculate, amongst other things, the estimated time until convergence. On the other hand, the mentioned authors also point out that this method, like the distance approach, assumes return parity. A perhaps more fundamental issue involves the implicit assumption of normality in the spread between two equities. This assumption does not hold empirically (Steele, 2014).

9 An iterative algorithm taking a series of noisy measurements observed over time as input and returning an estimate of the true underlying value. The estimate at time t is a weighted average of the prediction based on the estimate at t-1 and the actual observation at time t. The method is named after Rudolf Kalman, who developed the procedure.

Despite the mentioned advantages, the stochastic approach has seen little empirical testing; we have not been able to find any papers that test it empirically.

5. Simulation testing and choice of approach

In this thesis we will focus on the distance and cointegration approaches. The methods both offer a complete strategy for pairs trading, including an algorithm for pairs selection. Furthermore, both approaches have been tested empirically and have been found capable of delivering excess returns (Gatev et al., 2006; Miao, 2014; Bowen et al., 2010).

In this part of the thesis we generate simulated data. We use this data to test the performance of the distance method against the cointegration approach. Based on the results of the testing we select the approach to be used in the empirical part of this paper. The tests we conduct focus only on the procedures for pairs selection. The trading step of a pairs trading strategy requires us to specify several different parameters10. Determining the superior approach is then difficult, as each parameter configuration will yield different, and perhaps contradictory, results. The pairs selection algorithms proposed in the previous literature, on the other hand, yield unambiguous results without requiring us to specify any parameter values. We therefore base our choice of method on the performance of the two selection procedures.

10 Parameters are amongst others: entry and exit thresholds, relative weight of capital placed in the long and short leg of a portfolio, number of pairs traded etc.

We generate simulated data using two different models. Common to both models is that the simulated stock pairs move according to a time–invariant relationship. The generated pairs will therefore mimic real world stock pairs that would be suitable for pairs trading.

In the first model we produce pairs of stocks where the cumulative returns are in parity (i.e. if stock A increases by 1% over some time period, stock B is expected to increase by 1% as well). In the second model we allow for some pairs to have returns that are not in parity.

Both methods perform satisfactorily under the first setup, with the distance approach slightly outperforming the cointegration method. Using data generated with the second model we find that the distance approach still performs well, but slightly less so than the cointegration approach. However, the most important insight resulting from the simulation testing is not directly related to the pairs ranking: the results show that the cointegration coefficient estimates are very sensitive to noisy data. The estimates quickly deteriorate as the level of noise increases. Based on these results we decide to use the distance approach in the empirical part of this thesis. In the following section we present the procedure for the simulation testing and discuss the results in detail. The Python code used to implement the pairs identification procedures is found in appendix D.

5.1 Model one – The Granger representation theorem

The data used in the first part of the simulation testing is generated using a set of equations commonly referred to as the Granger representation theorem.

We have

Δp^A_t = −α_A (p^A_{t−1} − p^B_{t−1} − μ) + ε^A_t
Δp^B_t =  α_B (p^A_{t−1} − p^B_{t−1} − μ) + ε^B_t    5.1

Where α_A and α_B are the rates of error correction, specifying at which rate the series return to the equilibrium after deviations occur. μ specifies the equilibrium distance between the two series. For simplicity we set μ = 0, meaning that the equilibrium distance between the series is zero. Furthermore, α_B is always equal to α_A. The error terms are two normally distributed random variables with zero mean and a standard deviation of one. We examine different specifications for the error terms. In the base case the terms are uncorrelated. In the alternative cases we specify various levels of correlation between the terms. All series generated consist of an arbitrarily selected number of 5 000 observations.

We define ten groups, each consisting of ten simulated pairs. The groups are distinguished by containing only pairs with a specific value of α. The sets of error terms are common for all groups: the first pair in the first group uses the same set of error terms as the first pair in the second group, and so on. Therefore, the first pair in any group will be identical to the first pair in any other group except for the different values of α. The same is true for the second, third etc. pairs in each group. This is important as it ensures a consistent ranking by isolating the impact of changing the error correction rates.

Table 5.1 – Simulation parameter values

Group    α        No. of pairs in group
1        0.001    10
2        0.005    10
3        0.01     10
4        0.02     10
5        0.05     10
6        0.1      10
7        0.2      10
8        0.3      10
9        0.4      10
10       0.5      10

Notes: This table presents the parameter configuration for the tests on data generated by the modified Granger representation model. We test the methods for pairs formation by their ability to assign the pairs to their respective groups.

α: Error correction rate. Determines the magnitude of the error correction that follows deviations from the equilibrium between the series. Note that α_A = α_B = α.

We will vary the error correction rate according to the table above. Recall that α_A = α_B, so we are effectively varying both α_A and α_B. The maximum value we specify for the error correction rates is 0.5. This value results in the closest possible relationship11 between the stocks in a pair. Values lower or higher than 0.5 lead to under– or over–corrections compared to the full correction achieved when α = 0.5.

The outlined setup results in a total of 100 pairs. We apply the pairs formation algorithms to the simulated pairs and observe which pairs are identified as having the highest potential for pairs trading. Pairs with low values of α are expected to be placed at the bottom of the ranking. The reason is that the series in a pair only loosely follow each other when the error correction rates are low. In contrast, pairs with higher error correction rates follow each other closely and should be ranked higher. Applying the pairs ranking algorithms to the 100 simulated pairs should therefore sort all pairs in descending order based on their error correction rate values. As we test ten different error correction rate values, this translates to assigning the pairs to ten different groups.
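The data generating process described above can be sketched as follows; the function name and the option for correlated error terms are illustrative choices, not the exact appendix D code:

```python
import numpy as np

def simulate_ecm_pair(alpha, n=5000, mu=0.0, rho=0.0, seed=0):
    """Simulate a cointegrated pair from the error correction model in
    equation 5.1; alpha is the common error correction rate, rho the
    correlation between the error terms."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    eps = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    a = np.zeros(n)
    b = np.zeros(n)
    for t in range(1, n):
        gap = a[t - 1] - b[t - 1] - mu        # deviation from equilibrium
        a[t] = a[t - 1] - alpha * gap + eps[t, 0]
        b[t] = b[t - 1] + alpha * gap + eps[t, 1]
    return a, b
```

With alpha = 0.5 each series closes half the gap every period, so deviations are fully corrected in one step; with alpha near zero the spread behaves almost like a random walk.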

5.1.1 Results

The complete ranking of the pairs is found in Appendix E. The distance approach places all 100 pairs in the expected groups with no exceptions. The cointegration approach also assigns all pairs to their respective groups when regressing one series in a pair on the other. However, reversing the variable ordering turns out to misplace two pairs. This observation exposes an undesirable property of the cointegration approach: the ranking is sensitive to the ordering of the variables in the regression setup. In other words, regressing series A on series B gives a different test statistic and cointegration coefficient than regressing series B on series A.

The above discussed problem of order sensitivity has previously been discussed in Hoel (2013) and in Gregory (2011). We note that in the first paper the cointegration coefficient is used as the hedging factor, i.e. it determines how much to go long/short in each stock in a pair. Hoel points out that if the OLS regression is used to determine the coefficients, the resulting hedge factors12 would be inconsistent. The solution suggested in the two mentioned papers is to replace the OLS regression procedure with a different procedure called orthogonal regression. OLS yields an expression that minimizes the vertical distance to the fitted line. Orthogonal regression instead minimizes the perpendicular distance from the data points to the fitted line. This results in invertible coefficients and yields test statistics that are insensitive to variable ordering. A more elaborate description of the orthogonal regression procedure is found in Appendix F.

11 This argument is based on simulation results. We run 1 000 000 trials changing the error correction rate randomly between 0 and 1 and found 0.5 to be the optimal value. Code and results are available on request.

When using the orthogonal regression, 98 out of 100 pairs are assigned to the expected groups. The same pairs that were "misplaced" when using OLS are also misplaced when applying the orthogonal regression.

In addition to the test setup described above, we explored several cases where the error terms in equation 5.1 were correlated. These cases are important to investigate, as pairs formed from related stocks are likely to experience correlated idiosyncratic shocks. Consider the two soft drink producers Pepsi and Coca Cola. It is plausible that a negative event affecting Coca Cola will result in higher sales for Pepsi. An example of such a scenario could be a case where Coca Cola experiences sudden delivery problems. Obviously this would negatively impact the revenues of Coca Cola. At the same time it is possible that some consumers originally planning to buy Coca Cola instead buy Pepsi and thereby contribute positively to the revenues of Pepsi. We model such a relationship by allowing for correlated error terms. We found no significant changes when such correlations (both positive and negative) were specified. Both methods produced rankings identical to the case with uncorrelated error terms.

5.2 Model two – The Stock & Watson Common trends model

Looking at the results from the previous analysis it becomes apparent that the Granger representation theorem produces pairs of simulated stocks with parity in the returns. This conclusion is motivated by the obtained cointegration coefficient estimates (appendix E). The cointegration coefficient values are essentially one for the majority of the pairs. The interpretation of the slope variable in a log–log regression is the percentage change expected in the dependent variable conditional on a one percent change in the independent variable. A cointegration coefficient equal to one therefore implies that the cumulative returns of the simulated stocks are in parity. This could be a problem for the validity of our results. Do et al. (2006, 4) criticize the distance approach for implicitly assuming return parity between the stocks in a pair. It is claimed that this assumption is a serious limitation of the method and that the distance approach is only able to identify pairs with parity in the returns. Our setup, producing only pairs with parity in returns, might therefore unintentionally be biased in favor of the distance approach. For that reason it is necessary to explore how the methods fare when we allow for pairs to have returns related in ways other than a one to one ratio. For this purpose we will use a modified version of a model commonly referred to as the Stock & Watson common trends model (Vidyamurthy, 2004).

12 In this setting this means that the positions taken in each stock of a pair would be different in the case where the trader regresses stock A on stock B compared to the case where stock B is regressed on stock A.

The original Stock & Watson model is shown in equation 5.2. The series y^A_t and y^B_t both consist of a common non–stationary component n_t and individual stationary components ε^A_t and ε^B_t.

y^A_t = γ_A n_t + ε^A_t
y^B_t = γ_B n_t + ε^B_t    5.2

The series share the common trend n_t, but their exposure to the trend varies depending on the values of γ_A and γ_B. This implies that it is possible to construct a new series by forming a linear combination of y^A_t and y^B_t such that the common non–stationary component cancels out. This would leave us with a series consisting only of the stationary components.

y^A_t − k y^B_t = (γ_A − k γ_B) n_t + ε^A_t − k ε^B_t    5.3

In order for the common trend to cancel out we must have that

k = γ_A / γ_B    5.4

By setting k equal to the value found in 5.4, we are left with only the stationary components.

y^A_t − (γ_A / γ_B) y^B_t = ε^A_t − (γ_A / γ_B) ε^B_t    5.5

Recall that a stationary series has the property of mean reversion. Two stocks that can be combined to produce a stationary spread therefore have the potential to be used for pairs trading.

As previously mentioned, we will use a slightly modified version of the Stock & Watson model. The modifications enable us to control how closely the stock returns are related. The model we use is specified in equation 5.6.

y^i_t = γ_i n_t + δ_i m^i_t + ε^i_t    5.6

Where m^i_t is a non–stationary trend that is specific to each series. As in the original model, n_t is the common non–stationary trend. The stationary component ε^i_t of each series is equal to a constant plus some Gaussian noise: ε^i_t = c + u^i_t. This model enables us to control the return relationship between the simulated stocks by adjusting γ_A and γ_B. In addition we can control the strength of the relationship between the stocks by increasing or decreasing the sensitivity to the specific non–stationary factors. This is done by specifying different values for δ_A and δ_B. As in the previous test we create a total of 100 pairs with 5 000 observations in each series. The table below shows the parameter configuration used to generate the data. The parameters not specified in the table are discussed separately.

Table 5.2 – Simulation parameter values

γ_B (γ_A = 1)    δ      Group    No. of pairs
0.1              0.1    1        5
0.1              0.5    2        5
0.1              1      3        5
0.1              2      4        5
0.1              3      5        5
0.25             0.1    6        5
0.25             0.5    7        5
0.25             1      8        5
0.25             2      9        5
0.25             3      10       5
0.5              0.1    11       5
0.5              0.5    12       5
0.5              1      13       5
0.5              2      14       5
0.5              3      15       5
1                0.1    16       5
1                0.5    17       5
1                1      18       5
1                2      19       5
1                3      20       5

Notes: This table presents the parameter configuration for the tests on data generated by the modified Stock & Watson common trends model. We test the methods for pairs formation by their ability to rank the pairs according to the setup in this table.

γ: Sensitivity to the non–stationary factor common to both simulated stocks.
δ: Sensitivity to the non–stationary factor specific to each simulated stock.

As seen in the table above, the sensitivity to the common factor in series A is always set to one (γ_A = 1). Therefore, as an example, if we set γ_B = 0.5 and run a regression of the form of equation 4.113, we expect the estimated cointegration coefficient to be close to 0.5. Likewise, we expect a value close to 2 if we reverse the order of the regression.

The non–stationary terms n_t and m^i_t are the cumulative sums of series of random variables that are N(0, 1). Finally we set c = 1 000. This means that the stationary component of both series oscillates around a value of 1 00014. The second and third components of the series are the random walk elements introduced by the two non–stationary terms. Five sets of noise are generated and used in all groups. As discussed earlier this is helpful when trying to isolate the impact resulting from changes in the parameters. This setup will generate pairs with close relationships when the exposure to the individual non–stationary factors δ_A and δ_B is low.

13 That is, a regression of series B on series A on the form of equation 4.1.

14 The value 1 000 is selected in order to avoid negative observations.
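A sketch of the data generating process in equation 5.6 is given below; the parameter names follow the notation above and the constant 1 000 keeps observations positive, but this is an illustrative sketch rather than the actual appendix D code:

```python
import numpy as np

def simulate_common_trends_pair(gamma_b, delta_a, delta_b, n=5000, seed=0):
    """Modified Stock & Watson common trends model (equation 5.6):
    each series = exposure to a common random walk + an own random walk
    + a stationary component (constant plus Gaussian noise)."""
    rng = np.random.default_rng(seed)
    common = np.cumsum(rng.standard_normal(n))    # shared non-stationary trend
    trend_a = np.cumsum(rng.standard_normal(n))   # series-specific trends
    trend_b = np.cumsum(rng.standard_normal(n))
    level = 1000.0                                # keeps observations positive
    a = 1.0 * common + delta_a * trend_a + level + rng.standard_normal(n)
    b = gamma_b * common + delta_b * trend_b + level + rng.standard_normal(n)
    return a, b
```

Low values of delta_a and delta_b make the series move almost in lockstep with the common trend, producing the closely related pairs in groups 1, 6, 11 and 16.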

5.2.1 Results

The cointegration approach correctly identifies all the pairs least sensitive to the specific non–stationary factors, i.e. the most closely related series. However, as the amount of noise added to the series increases we observe some deviations.

Generally, the approach seems capable of identifying pairs even when the returns are not in parity. Observing the results from the distance approach, we find that pairs with return parity (γ_B = 1) and low sensitivity to the specific non–stationary terms are ranked as the most promising candidates for trading. This is not surprising; recall the criticism put forward by Do et al. (2006). The way the ranking algorithm in the distance approach is constructed leads it to rank pairs with parity in the returns higher than pairs without such parity. However, in spite of the parity assumption the distance measure also returns a decent ranking of the pairs not in parity. As in the previous case, specifying correlations between the error terms does not change the results. The complete ranking is found in appendix G.

5.3 Consequences of inaccurate cointegration coefficient estimates

Analyzing the results from the tests we notice an unexpected property of the cointegration coefficient estimates. The cointegration approach produces good estimates of the coefficient when the level of noise added to the series is low. However, as the noise level increases the estimates quickly deteriorate. When we examine the bottom half of the ranking tables (appendix E and G) we see that several of the coefficient estimates are very poor. The most extreme cases are observed when applying the orthogonal regression; many of the estimates have the wrong sign. In addition, some of the estimated coefficients have two–digit values and are clearly absurd. The results using OLS are somewhat better, but a significant fraction of the estimates still have the wrong sign and are generally inaccurate15.

When practically implementing a pairs trading strategy the trader would use financial time series to determine the relationship between the stocks in a pair.

15 It is important to notice that the inaccuracies are observed even if the Dickey Fuller tests show that the series are highly cointegrated with test statistic values below -20.

Even in the case where two stocks are identically exposed to the same risk factors, it is reasonable to assume that the observed relationship between the stocks would be a noisy realization of the underlying true relationship. This is due to random idiosyncratic shocks16 affecting the stocks. As our results show, noisy data might result in inaccurate estimates of the cointegration coefficient.

When basing a pairs trading strategy on the cointegration approach, the cointegration coefficient estimate is crucial. The estimate is used in all steps of the process. A signal instructing the trader to open a position is generated when the value of the spread exceeds the threshold value. The amount of capital placed in stock A relative to stock B is dictated by the estimated coefficient value. Finally, the position is closed when the spread decreases to a level below the exit threshold. Imagine what would happen should the estimated coefficient be incorrect: entering and exiting positions would be dictated by a false or inaccurate relationship. In addition, the ratio in which the stocks are bought and sold would be determined by an invalid relationship. It is easy to imagine that this could lead to unexpected results, and perhaps significant losses.

5.4 Choice of method

Given the ranking results produced by the two methods and the observed inaccuracies of the coefficient estimates, we decide to use the distance approach for the empirical analysis. This has two potential consequences: on the one hand, we might exclude profitable trading opportunities because we risk overlooking pairs without parity in the returns. On the other hand, given that the distance approach requires fewer parameter estimates, we reduce the risk of losses resulting from inaccurate estimates.

16 This could be market shocks such as liquidity shocks or business related shocks such as fires, equipment malfunction etc.


6 Testing a high frequency pairs trading strategy

In the last part of this thesis we back–test a pairs trading strategy applying the distance approach described in the theoretical part. We first discuss the setup of the test before presenting the results.

6.1 Data

Our dataset consists of three months of high frequency intraday data. The sample period starts on January 2nd 2014 and ends on March 31st. During this period there were a total of 63 business days. The universe of stocks considered consists of the stocks listed on the OBX17 index. The OBX index contains the 25 largest stocks listed on the Oslo Stock Exchange. This results in a total of 276 possible pairs. Considering only the largest stocks ensures that the stocks we use have an adequate level of liquidity. The liquidity of the stocks selected for trading is a critical factor when testing any strategy that involves selling shares short. In low liquidity stocks there is often a low supply of shares made available for shorting. This could make a seemingly profitable opportunity impossible to exploit.

All data were downloaded daily from the Norwegian web broker Netfonds through an automated process. The data were manually adjusted for dividends and corporate actions (see appendix H for a complete list of the adjustments made).

The raw data is listed in a chronological tick–by–tick format: whenever a change in the price quotes occurs, a new entry is appended. Each entry consists of a timestamp, the bid price and the ask price. The uneven update frequency of the lists leads to two practical problems. First, the price observations of the various stocks do not coincide in time; there can easily be a change in one stock at one point in time without a corresponding change in any other stock. Second, the number of observations varies from stock to stock, as a different number of price changes occurs in different stocks. Both these problems are, however, easy to overcome. The lists contain all changes happening over the trading day. This implies that if no change is recorded at a given point in time, the bid and ask prices offered must still equal the prices in the previous entry. It is therefore possible to compare the observations and "stretch" out the lists by copying the last observation. This ensures that all stocks have the same number of observations, and that the observations coincide in time. We illustrate the expansion procedure with a concrete example.

17. One stock – the pharmaceutical company Algeta – is excluded from our universe. The background for this exclusion is an offer to buy all shares in Algeta for 362 NOK per share, put forward on the 19th of December 2013. This offer effectively pegged the price of the Algeta stock to a small range just below the bid price. The trade volume also dropped significantly after this bid. This makes the stock unsuitable for pairs trading, as the stock no longer is affected by factors common with other stocks but only reacts to news regarding the transaction. Due to the low trade volume it is also unclear whether it would have been possible to short Algeta during this period. We therefore argue that a trader following a pairs trading strategy would have excluded Algeta from the possible pairs based on information available at the time. The bid was accepted in February 2014 and Algeta was eventually delisted in the beginning of March 2014.

Example – Expanding lists of observations

Stock A                               Stock B
Time (HHMMSS)   Bid     Ask           Time (HHMMSS)   Bid     Ask
090000          100     100.1         090000          51      51.5
090002          100     100.2         090001          51.1    51.4
090003          99.9    100.1         090004          51.1    51.2
…               …       …             …               …       …

Lists with raw observations

In this example we have lists of observations for two fictional stocks. As we can see, the timing of the observations is not equal. While both stocks have an observation at time 09:00:00, only stock B has an observation one second later. Because the lists are updated only when changes in quotes occur, we can easily solve this problem: the quote for stock A at time 09:00:01 must be equal to the quote at 09:00:00. We therefore create a new observation at 09:00:01 with values identical to the 09:00:00 observation. This process is repeated for both stocks and for all timestamps. The tables below show the lists of observations that result from applying the process to the lists in this example.
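Using pandas, the expansion can be sketched by taking the union of all timestamps and forward filling the last known quote (an illustrative sketch with hypothetical function names, not the actual data processing code):

```python
import pandas as pd

def expand_quotes(quote_lists):
    """Align tick-by-tick quote lists on a common timeline by forward
    filling: a quote stays in force until a new entry replaces it.
    quote_lists maps ticker -> list of (timestamp, bid, ask) tuples."""
    frames = {}
    for ticker, ticks in quote_lists.items():
        df = pd.DataFrame(ticks, columns=["time", "bid", "ask"]).set_index("time")
        frames[ticker] = df
    # Outer join on the union of all timestamps, then carry the last
    # known quote forward in time.
    combined = pd.concat(frames, axis=1).sort_index().ffill()
    return combined
```

Applied to the example above, the combined frame has one row per distinct timestamp; at 09:00:01 stock A simply repeats its 09:00:00 quote, exactly as in the manual expansion.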
