Master Thesis

Thesis Master of Science

The Text Premium and Stock Returns

Name: Tobias Ingebrigtsen, Kristian Andersen

Start: 15.01.2019 09.00

Finish: 01.07.2019 12.00


The Text Premium and Stock Returns

Kristian Andersen

BI Norwegian Business School

Tobias Ingebrigtsen

BI Norwegian Business School

Thesis advisor: Iván Alfaro

BI Norwegian Business School

June 30, 2019

Abstract

This thesis proposes a novel approach to portfolio sorting based on (i) the time-varying structure of company filings and (ii) their exposure towards a common text source. We construct a similarity measure that allows us to identify under- and over-performing stocks in such a way that we can construct portfolios with an increasing rate of return. We find that a long-short strategy based on these portfolios yields 10.05% annually, a significantly higher risk-adjusted return than the benchmark index, and that this return is not accounted for by the risk factors in the conventional five-factor model proposed by Fama and French (2015).

Keywords: machine learning, textual analysis, common risk-factors, latent dirichlet allocation (LDA), natural language processing (NLP), computational linguistics

Email: kristian.andersen2@student.bi.no

Email: tobias.ingebrigtsen@bi.no

Email: ivan.alfaro@bi.no


Acknowledgement

We gratefully acknowledge the help from our supervisor, Iván Alfaro. Your guidance and willingness to quickly respond round the clock have been invaluable to the quality of our thesis. We would also like to thank Øyvind Norli for motivating the use of computational linguistics. Furthermore, we want to thank Espen Henriksen, Patrick Konermann, SeHyoun Ahn, Andrew Sang Jun Lee, Christian Bjørland, and Martin Blomhoff Holm for helpful discussions and comments.

Tobias: I want to acknowledge the exceptional patience the Modeling Unit of Norges Bank has shown while I have worked on my thesis, in particular, Kenneth Sæterhagen Paulsen and Karsten Gerdrup. Lastly, I want to thank Julie for your endless love and support.


Contents

1 Introduction
2 Related Literature
3 Methodology
3.1 LDA Topic Classification
3.1.1 Formal Notation of LDA
3.2 Sentiment Analysis
4 Data
4.1 Data Sources
4.1.1 U.S. 10-K and 10-Q Reports
4.1.2 FOMC Reports
4.1.3 Financial News
4.1.4 Center for Research in Security Prices
4.2 Data Structuring
4.3 Assessment of the Data
4.3.1 Assessment of U.S. 10-K
4.3.2 FOMC Reports
4.3.3 Financial News
5 Constructing the Portfolios
5.1 The Similarity Measure
5.2 Assessing the results
6 Adding News to FOMC Corpora
7 The Long-Short Strategy
8 Diagnostics
8.1 Splitting the Sample
8.2 Varying Amount of Topics
8.3 Varying Amount of Portfolios
8.3.1 5 portfolios
8.3.2 15 portfolios
9 Fama-French Multifactor Models
9.1 Fama-French Three-Factor Model
9.1.1 10 portfolios
9.1.2 5 portfolios
9.2 Fama-French Five-Factor Model
9.2.1 10 portfolios
9.2.2 5 portfolios
9.3 Interpreting the results
10 Conclusion
10.1 Open questions
11 Appendix A
11.1 Norwegian Data
11.1.1 Norwegian Annual Reports
11.1.2 Norwegian Quarterly Reports
11.2 U.S. Data
11.2.1 U.S. 10-K Reports
11.2.2 U.S. 10-Q Reports
11.3 Data Collection
11.3.1 Web Scrape
11.4 Assessment of Data - A few interesting observations
12 Appendix B
12.1 Sentiment Analysis Predictability
13 Appendix C
13.1 Firm-Specific prediction model
14 Appendix D
14.1 Portfolio, Semi-Annual Frequency


List of Figures

1 Textual Analysis & Finance (Maximilian Rohrer, 2018)
2 The graphical model for LDA. Illustration: David M. Blei.
3 Visualization of 10-K LDA topics
4 The widths of the gray bars represent the corpus-wide frequencies of each term, and the widths of the red bars represent the topic-specific frequencies of each term
5 Visualization of FOMC Reports
6 FOMC Report Visualized
7 LDA Topics Financial News - t-SNE
8 Portfolio Returns without News
9 The cumulative returns for the Long-Short strategy against the S&P 500 Total Return Index
10 Portfolio Returns with News
11 Portfolio Returns on the first sub-sample (07.22.1999 - 02.24.2008)
12 Portfolio Returns on the second sub-sample (07.15.2008 - 07.13.2018)
13 LDAvis 100 Topics
14 LDAvis 25 Topics
15 LDAvis 15 Topics
16 Portfolio Returns on the model using 100 topics
17 Portfolio Returns on the model using 25 topics
18 Portfolio Returns on the model using 15 topics
19 Portfolio returns with 5 portfolios, no news added
20 Portfolio returns with 5 portfolios, with news
21 Portfolio returns with 5 portfolios, no news added
22 Portfolio returns with 5 portfolios, with news
23 Bloomberg headline
24 Illustration of relationship between Equinor's stock price, sentiment of the annual reports, and Brent spot price
25 Sentiment from both the transcript and report, including the stock price
26 Jaccard's Distance
27 Equinor Prediction Results
28 Hydro Prediction Results
29 Telenor Prediction Results
30 Marine Harvest Prediction Results

List of Tables

1 LDA Equinor 2010-2017
2 LDA Equinor 1980-1985
3 Value-Weighted with News - Monthly
4 Equal-Weighted with News - Monthly
5 Value-Weighted with News - Monthly
6 Fama-French Three-Factor - monthly frequency
7 Fama-French 5 Factor - monthly frequency
8 FF5 coefficients for the 10 Value-Weighted portfolios
9 FF5 coefficients for the 5 Value-Weighted portfolios
10 Value-Weighted without News
11 Equal-Weighted without News
12 Equal-Weighted with News
13 Value-Weighted without News
14 Equal-Weighted without News
15 Value-Weighted with News
16 Equal-Weighted with News
17 Value-Weighted without News
18 Equal-Weighted without News
19 Results from Long-Short strategy with 10 portfolios
20 Results from Long-Short strategy with 5 portfolios
21 Results from Long-Short strategy with 15 portfolios
22 Results from Long-Short strategy with 10 portfolios
23 Results from Long-Short strategy with 5 portfolios
24 Results from Long-Short strategy with 15 portfolios
25 Portfolio performance on each sub-sample
26 Portfolio returns on using 100 topics
27 Portfolio returns on using 25 topics
28 Portfolio returns on using 15 topics
29 Granger causality test, S&P500
30 Granger causality test, firm level
31 Prediction Results OBX
32 Industrial Predictions - S&P500
33 Firm Specific Prediction - S&P500 Companies
34 FF5 coefficients for the 10 Value-Weighted portfolios. Results are unaffected by introducing heteroskedasticity and autocorrelation robust (HAC) standard errors.
35 FF5 coefficients for the 5 Value-Weighted portfolios. Results are unaffected by introducing heteroskedasticity and autocorrelation robust (HAC) standard errors.
36 Value-Weighted with News Semi-Annual
37 Equal-Weighted with News Semi-Annual
38 Value-Weighted without News Semi-Annual
39 Equal-Weighted without News Semi-Annual
40 Value-Weighted with News Semi-Annual
41 Equal-Weighted with News Semi-Annual
42 Value-Weighted without News Semi-Annual
43 Equal-Weighted without News Semi-Annual
44 Value-Weighted with News Semi-Annual
45 Equal-Weighted with News Semi-Annual
46 Value-Weighted without News Semi-Annual
47 Equal-Weighted without News Semi-Annual
48 Fama-French 5 Factor - semi-annual frequency
49 Fama-French 3 Factor - semi-annual frequency


1 Introduction

In recent years, there has been a sharp increase in interest in extracting the value of the information embedded in textual data sources, along with a growing literature that improves model performance by incorporating textual data into the analysis. Recent publications (Larsen & Thorsrud, 2015; Larsen & Thorsrud, 2017; Luss & Aspremont, 2009; Bollen, Mao & Zeng, 2011) find predictive power for key economic variables, asset prices, and volatility in news and other text-based sources. Recent progress in machine learning techniques allows for a thorough analysis of text and the extraction of meaningful information. In this thesis, we show how such information can be extracted from company filings (10-K and 10-Q) to measure sentiment and to identify topics in the text. Further analysis of the text follows, and we see how the text reveals company events and how this can help in predicting subsequent price movements.

Figure 1: Textual Analysis & Finance (Maximilian Rohrer, 2018)

It is well documented that exposure towards certain factors can explain the cross-section of returns (Lintner, 1965; Fama & French, 1993; Fama & French, 2015). However, although there have been attempts to investigate text-based equivalents in the literature (e.g., Gao, 2016), the topic still calls for more rigorous theoretical and empirical investigation. In his 2016 paper, Gao constructs a document risk score by measuring the covariance between words appearing in the management discussion and analysis (MD&A) section and stock returns, and shows that by introducing a text factor he was able to improve on the standard Fama-MacBeth cross-sectional regression. We, however, want to investigate whether the similarity between company filings and a common text source can explain the cross-section of stock returns. Deciding what this common source should be is challenging. In the well-known Capital Asset Pricing Model (CAPM), the common source of risk is the market risk, and the expected risk premium for any risky asset is the market price of risk times the loading of market risk,

E_t[R_i] = \beta_i \lambda_t,

where β_i is the loading of risk and λ_t is the market price of risk (the market risk premium). This simple model is beautiful because of the clear economic intuition it comes with: variation in the market is the common source of risk, and assets should be priced relative to their loading on this systematic source of risk. Unfortunately, despite its beauty, the CAPM does not hold in the data (Black, Jensen & Scholes, 1972; Ross, 1978; Fama & French, 2003; Dempsey, 2013). Since the CAPM came to life in the mid-1960s, there have been large advancements in computational technology, which have allowed researchers to explore more combinations of factors. Ultimately, this has led to countless papers based on arbitrage pricing (APT models), of which perhaps the most well-known are the multifactor models proposed by Eugene Fama and Kenneth French. One of these models is the Fama-French five-factor model,

E_t[R_i] = b_i E_t[R_m] + s_i \, SMB + h_i \, HML + r_i \, RMW + c_i \, CMA.

This model holds up better in the data, but some of the beautiful economic intuition and simplicity has been lost. The common component in these models is that the loading on factors should be priced in all assets. However, these models are based on quantifiable numbers. By applying tools of computational linguistics, we want to explore whether differences in the cross-section of stock returns can be explained by their loading on a common text factor. We have investigated the appropriateness of Federal Open Market Committee reports as this common text source and show how the results can be improved by adding 2.1 million financial news headlines. Our results contribute to the literature seeking systematic risk factors that explain the cross-section of returns, one of the building blocks of finance, widely used by practitioners and deeply investigated by researchers in the modern era of asset pricing. Our results also contribute to the expanding literature on textual analysis in finance. In this literature, researchers have looked for evidence of a text-implied risk embedded in certain sections of the company filings. We will show that using the entire filings can help to identify deeply underperforming stocks, and to the best of our knowledge we are the first to do so.

We begin with a thorough discussion of related literature. Section 3 introduces Latent Dirichlet Allocation, which we use to extract topics from text, as well as a sentiment measure; we show how the latter is connected to stock returns through a Granger causality test. Section 4 gives a complete description of the data, both for Norwegian listed firms and for U.S. listed firms. In Sections 5-6, we introduce the similarity measure and explain how portfolios are constructed and sorted. Section 7 takes on the long-short strategy and shows how portfolio sorting can be used to generate a higher risk-adjusted return than the benchmark index. Section 8 robustness-checks our results, and Section 9 tests whether the returns are abnormal by running the Fama-French three- and five-factor models. In Appendix A, we give a much more thorough explanation of the data and show how the LDA model can capture news events. In Appendix B we run an experiment to show that sentiment measured in financial news has predictive power, and in Appendix C we propose a method to build firm-specific models to trade on intraday movements based on company filings filed the same day. Finally, the code written for this project is extensive, and everything is available on GitHub.1

2 Related Literature

Prediction of future stock price movements has been a prominent subject of many previous empirical studies. The Random Walk Theory (Malkiel, 1973) states that prices are determined by a purely stochastic process and, as such, predicting them is not possible; it follows that it is impossible to beat the market. However, with advances in machine learning tools, it has been shown empirically that stock prices are predictable (De Bondt and Thaler, 1985; Jegadeesh, 1990; Lo and MacKinlay, 1990; Jegadeesh and Titman, 1993).

Financial markets and the prediction of asset prices present a very complex issue, as financial time series are inherently noisy, non-stationary, and deterministically chaotic (Kumar et al., 2006). An investor able to predict the movement of stock prices accurately could earn vast profits. However, due to the complexity of stock market data, the development of useful models for predicting stock prices is considered very complicated (Kara et al., 2011). The literature presents several attempts to develop an efficient model for

1 https://github.com/tobiasi/TextPremium. Published under the MIT license.


predicting the stock price; Kara et al. (2011) found that artificial neural networks had the best average performance and were significantly better than the other models.

More recent work (Das and Chen, 2007; Tetlock, 2007; Tetlock et al., 2008; Si et al., 2013; Xie et al., 2013; Wang and Hua, 2014) has applied Natural Language Processing techniques to analyze the effect of web texts on stock market prediction, finding that events reported in financial news are essential evidence for stock price movement prediction.

Asset pricing models often model the arrival of new information as a jump process, such as a Poisson process (Gardiner, 2009), although the characteristics of the underlying jump process are weakly related to the underlying source of information, if at all (Luss & Aspremont, 2009). However, Luss & Aspremont (2009) showed empirically how text from news articles can be used to predict intraday price movements of financial assets using support vector machines. They observed that while the direction of returns is not predictable using either text or returns, the size of the returns is. Moreover, adding the text classification factor produced significantly improved performance compared to using historical returns alone. We believe, however, that there is a shortcoming to their approach: they used general news as their text source, and their method builds upon Bollerslev (1986) by using the news data as an additional component in a GARCH-type framework. It is well known that volatility clustering models such as GARCH do not take a stand on the direction of returns, but rather on their general magnitude. In Appendix C we run an experiment in which we confirm that utilizing company filings as well as financial news gives us a better understanding of the company's current situation, and thus a more refined information set with which to judge what direction the stock price should take on the date the filing is made public. Furthermore, Luss & Aspremont explicitly point out the selection of the words in the dictionary as an essential decision for further research, which may be addressed by implementing a financial dictionary, such as the ones developed by Loughran & McDonald (2011).

In the previous decades, we have witnessed a vast increase in the number of documents and the digitization of these documents. Managing this increase requires new tools for automatically organizing, searching, indexing, and browsing extensive collections of data (Blei & Lafferty, 2006). As machines advance in complexity, research in machine learning and statistics has developed new techniques for finding patterns of words in document collections using hierarchical probabilistic models (Blei et al., 2003). Blei, Ng & Jordan (2003) describe Latent Dirichlet Allocation (LDA) as a generative probabilistic model for collections of discrete data such as text corpora. LDA is computed as a three-level hierarchical Bayesian model, where each item of a collection is modeled as a finite mixture over an underlying set of topics (Blei et al., 2003). Moreover, there are several possible extensions to the LDA model, such as extending it to continuous or other non-multinomial data, or even conditioning the topic distribution on features such as "paragraph" or "sentence" (Blei et al., 2003).

Sentiment analysis seeks to identify the viewpoints underlying a text span (Pang & Lee, 2004). A common approach uses a negative word count to measure the tone of the text, based on a dictionary such as the Harvard Dictionary, which defines which words are considered positive and negative. However, Loughran & McDonald (2011) showed empirically that dictionaries developed for other disciplines misclassify words that are commonly used in financial contexts. They found that in a large sample of 10-Ks from 1994 to 2008, almost three-fourths of all words classified as negative in the well-known Harvard Dictionary are words that are typically not considered negative in financial contexts. They developed several financial word lists intended to more accurately reflect the underlying tone of financial texts, including word lists linked to firms accused of accounting fraud and to firms reporting material weaknesses in their accounting controls. Essentially, their results suggested that textual analysis can contribute to our ability to understand the impact of information on stock returns.

Since Markowitz laid the foundation of Modern Portfolio Theory in 1952, it has been established that the returns of securities are too intercorrelated for diversification to eliminate all variance. Intuitively, Markowitz argued that an investor should consider expected return a desirable thing and variance of return an undesirable thing. This led to William F. Sharpe's 1966 paper on mutual fund performance and the "reward-to-variability ratio," which builds on Markowitz's mean-variance paradigm, assuming that the mean and standard deviation of the distribution of one-period return are sufficient statistics for evaluating the prospects of an investment portfolio (Sharpe, 1994). The paper published in 1994 by Sharpe introduced the Sharpe Ratio, referring both to the original version and to more generalized versions. The Sharpe Ratio is essentially a measure of the average return earned in excess of the risk-free rate per unit of volatility. Intuitively, we would like to find the portfolios that have the highest risk-adjusted return.

We know from the literature that the Sharpe ratio is ten times higher on days when important state variables are announced (Savor & Wilson, 2013), with the average announcement-day excess return from 1958 to 2009 being 11.4 basis points versus 1.1 basis points for all other days. This suggests that more than 60% of the cumulative annual equity risk premium is earned on announcement days, which means there exists an announcement premium, where stocks with high past announcement-period volume earn the highest announcement premium (Choi, 2014; Lamont & Frazzini, 2007). The announcement premium was first documented by Beaver (1968) and has since been documented by several other academics (Chari et al., 1988; Ball & Kothari, 1991; Cohen et al., 2007; Frazzini & Lamont, 2007; Savor & Wilson, 2016). Kalay & Loewenstein (1985) empirically established the same finding for firms announcing dividends. Moreover, as stated in Savor & Wilson (2016), none of these papers finds statistical evidence that the excess returns related to announcement days can be explained conventionally by increases in systematic risk. Savor & Wilson suggested that since investors learn more about future economic conditions around announcements, they should be less willing to hold assets, such as stocks, that covary positively with this news, even if the variance of their returns is itself not much higher. This implies that if such shocks are persistent, even a modest increase in volatility around announcements can result in vast increases in the market risk premium.

Fama & French (1993) found evidence of five common risk factors in the returns on stocks and bonds, of which three are stock-market factors: an overall market factor; a firm size factor, which measures small market capitalization firms minus big market capitalization firms; and the book-to-market equity factor, which measures high book-to-market ratio (value) stocks minus low book-to-market ratio (growth) stocks. In more recent research, Fama & French (2015) present a five-factor asset pricing model directed at capturing the size, value, profitability, and investment patterns in average stock returns, which they suggest performs better than the previously mentioned three-factor model. We want to investigate whether it is possible to use a sample of FOMC reports and financial news data as a common risk factor amongst the listed firms, whether their exposure over time can explain the cross-section of returns, and further whether we can measure the exposure towards the different topics in such a way that we are able to build portfolios by sorting on these exposures.

3 Methodology

Text classification is the task of automatically sorting a set of documents into categories from a predefined set (Sebastiani, 2002). It is estimated that 80% of the world's data is unstructured, with roughly 2.5 billion gigabytes of data being created every day (IBM, 2016). Essentially, classifying text might be more relevant now than ever. Text classification is one of the most fundamental tasks in Natural Language Processing, and two of its most powerful applications are topic labeling and sentiment analysis.


3.1 LDA Topic Classification

In this paper, we will be using a state-of-the-art method (Yi & Allan, 2009), namely Latent Dirichlet Allocation. This method is preferred over Latent Semantic Analysis (LSA), as LSA has received much criticism due to its inadequate statistical foundation. The main criticism of LSA is that it erroneously assumes Gaussian noise on term frequencies, which empirically follow a Poisson distribution. Even the extension of LSA (pLSA) is incomplete, as there is no statistical model at the document level, which may result in overfitting. Consequently, we employ LDA, as it addresses this shortcoming by using a Dirichlet prior for the topic distribution within documents (Blei et al., 2003).

Latent Dirichlet Allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. We treat the data as observations arising from a probabilistic generative process that includes latent variables, where these variables reflect the thematic structure of the collection. What topic classification is ultimately after is to identify the hidden structure that likely generated the observed collection of words.

3.1.1 Formal Notation of LDA

The topics are β_{1:K}, where each β_k is a distribution over the vocabulary. The topic proportions for the dth document are θ_d, where θ_{d,k} is the topic proportion for topic k in document d. The topic assignments for the dth document are z_d, where z_{d,n} is the topic assignment for the nth word in document d. Lastly, the observed words for document d are w_d, where w_{d,n} is the nth word in document d, which is an element from the fixed vocabulary. The generative process for LDA corresponds to the following joint distribution for the hidden and observed variables,

p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D}) = \prod_{i=1}^{K} p(\beta_i) \prod_{d=1}^{D} p(\theta_d) \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d) \, p(w_{d,n} \mid \beta_{1:K}, z_{d,n}).


Figure 2: The graphical model for LDA. Illustration: David M. Blei.

In Figure 2, each node is a random variable and is labeled according to its role in the generative process. The hidden nodes (the topic proportions, assignments, and topics) are unshaded. The observed nodes (the words of the documents) are shaded. The rectangles are "plate" notation, which denotes replication: the N plate denotes the collection of words within documents; the D plate denotes the collection of documents within the collection.

We will use a sampling-based algorithm, in which we collect samples from the posterior in order to approximate it with an empirical distribution. More specifically, we will use Gibbs sampling, a Markov Chain Monte Carlo (MCMC) technique, mainly in order to make inference with Bayesian models on the corpus.
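To make this concrete, the following is a minimal sketch of a collapsed Gibbs sampler for LDA, written for illustration only; the inputs (tokenized documents `docs` as lists of word ids, vocabulary size `V`, topic count `K`) and the hyperparameter values are hypothetical, and our actual estimation relies on standard library implementations.

```python
import numpy as np

def gibbs_lda(docs, V, K, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for LDA.
    docs: list of documents, each a list of word ids in [0, V).
    Returns point estimates of theta (D x K) and beta (K x V)."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    ndk = np.zeros((D, K))   # document-topic counts
    nkw = np.zeros((K, V))   # topic-word counts
    nk = np.zeros(K)         # total tokens assigned to each topic
    z = [rng.integers(0, K, size=len(doc)) for doc in docs]  # random initial assignments
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            ndk[d, t] += 1
            nkw[t, w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                t = z[d][n]
                ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1   # remove the token
                # full conditional p(z_dn = k | everything else)
                p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
                t = rng.choice(K, p=p / p.sum())
                z[d][n] = t
                ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1   # add it back
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + K * alpha)
    beta = (nkw + eta) / (nkw.sum(axis=1, keepdims=True) + V * eta)
    return theta, beta
```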

In Table 1 we report an LDA topic classification of Equinor's (formerly known as Statoil, the largest company listed on the Oslo Stock Exchange) annual reports for 2010-2017, and in Table 2 for 1980-1985. What Table 1 and Table 2 suggest is that words such as "Statoil" and "compani" appear to carry significant weight, although these words do not help explain the topics of the annual reports. This implies that we have to clean the corpus more carefully in order for the LDA topic classification to find words that are better related to the underlying topics of the annual reports.

What is clear is that technology and new developments in the company are considered very important to Equinor. This comes as no surprise: as we know, the largest countries in the world are shifting their focus to renewable sources of energy, the complete opposite of the oil and gas related business in which the company operates. Moreover, we see that risk management is also one of the most important topics, most likely due to the highly fluctuating commodity prices we have seen in the last few years. In contrast, a topic classification of the company's annual reports from 1980-1985 reveals a different picture. This brings up the question: are some topics more important than others in explaining a particular aspect of Equinor? Here we limited the number of topics to the eight most important ranked by probabilities, although we can easily extend it to a case of 100 topics and track how their relative importance changes over time.

Table 1 - LDA topic classification on Equinor’s Annual reports, 2010-2017

Technology Exploration Commodities Reporting Financials Performance Risk Mngt. Stock

1 statoil statoil gas statoil tax nok financi share

2 develop field oil report asset reserv cash board

3 oper product market norwegian incom billion statoil statoil

4 busi develop natur state cost product risk committe

5 activ oper product financi estim increas usd corpor

6 result well crude statement rate prove invest member

7 risk project statoil petroleum loss oper liabil compani

8 product north transport intern recognis incom interest execut

9 project interest volum account amount price nok sharehold

10 technolog sea price requir futur million capit meet

Table 1: LDA Equinor 2010-2017

Table 2 - LDA topic classification on Equinor’s Annual reports, 1980-1985

Operations Market Stock Financials Management Exploration Director Oil Fields

1 statoil oil nok cost compani statoil manag field

2 norwegian product million account board block vice gas
3 develop statoil consolid financi general norsk director oper

4 activ market share profit shall gas pres billion

5 insur crude statoil depreci meet oil statoil platform

6 compani will compani liabil director hydro presid million

7 work transport valu incom assembl explor engin develop

8 project increas amount current two licenc sen will

9 oper price invest statement art saga rock product

10 year per interest consolid matter oper area start

Table 2: LDA Equinor 1980-1985


3.2 Sentiment Analysis

Sentiment analysis is a way to systematically capture the emotions in a text. One could imagine that since the company itself writes the annual report, the report would reflect the company's own outlook, and thus the sentiment in the annual report should reflect the sentiment of the management of the company. If this is true, a measure of the sentiment would work as a forward-looking measure. The simplest case of sentiment analysis is

S_t = \frac{\sum \text{Positive words}_t}{\sum \text{Negative words}_t}.
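As an illustration, the following is a minimal sketch of this ratio, assuming positive and negative word lists are available as Python sets; the tiny lists shown here are hypothetical placeholders for a proper financial dictionary.

```python
import re

# hypothetical placeholder word lists; in practice one would load a financial
# dictionary such as the Loughran-McDonald lists
POSITIVE = {"achieve", "gain", "improve", "strong"}
NEGATIVE = {"decline", "impairment", "litigation", "loss", "weak"}

def sentiment_score(text: str) -> float:
    """S_t = (# positive words) / (# negative words) for one document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos / neg if neg else float("inf")  # undefined if no negative words
```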

The first challenge that arises is how to classify the emotional state of words. This part is crucial to get the most out of the analysis, as words can have different meanings depending on context. The trivial example is bank, which can mean a financial institution or a sandy beach by a river. We must, therefore, be careful when choosing the sentiment library (Loughran, 2011). This can be taken one step further with Open Information Extraction (Open IE), which allows us to work with tuples rather than individual words. Using tuples allows for representation by structured events, so that the actor and object of events can be better captured (Ding, 2014). A simple example is "Apple sues Samsung", which the previously introduced methods could not use to accurately predict the price movements of Apple and Samsung. However, a structured representation can be { Actor = Apple, Action = sues, Object = Samsung }. Given this representation, it is easier to predict movements in the share prices of both Apple and Samsung. We will, however, not go in that direction in this thesis, as we are interested in common exposures rather than firm-specific events. Moreover, the availability of data that can be obtained for free is limited.

In Appendix B we run an experiment to show that there is valuable information embedded in the sentiment of news. It could also be interesting to implement this structured representation in the model proposed in Appendix B to improve its forecasting abilities.


4 Data

The main data sources are the 10-K and 10-Q reports for all the companies on the S&P 500, and the FOMC reports. We will also be using financial news for the S&P 500 companies, as well as annual/quarterly reports from the ten largest Norwegian firms based on trading volume.

4.1 Data Sources

4.1.1 U.S. 10-K and 10-Q Reports

The S&P 500 companies' 10-K and 10-Q reports are all gathered from the SEC EDGAR database. The 10-K/10-Q are very comprehensive reports filed annually/quarterly by publicly traded firms and contain a detailed overview of the company's performance. The SEC requires the 10-K/10-Q in order to keep investors aware of a company's financial condition, as well as to allow investors to obtain information before they decide to invest in shares or bonds of the corporation. A detailed explanation of the reports, as well as how the data were collected, may be found in Appendix A.

4.1.2 FOMC Reports

The Federal Open Market Committee (FOMC) is a committee within the Federal Reserve which is charged under United States law with overseeing the nation's open market operations and making decisions about the direction of monetary policy. In other words, the FOMC makes key decisions about U.S. interest rates and the growth of the United States money supply. A change in policy would usually result in either buying or selling U.S. government securities on the open market to promote the growth of the national economy. The committee meets eight times a year and is obliged to submit written reports to Congress when the meeting contains discussions about the conduct of monetary policy, economic developments, and prospects for the future. Essentially, we are interested in using the FOMC reports as a proxy for the "state of the economy."

4.1.3 Financial News

Financial news can be gathered from several newspapers in the U.S. However, those newspapers have archives that usually keep news only for a few months before it disappears. Consequently, we had to find a news corpus that has saved the news for several years. Furthermore, we would like the news to be financial news that specifically covers the S&P 500 companies. We were able to identify such a news corpus from Thomson Reuters, which contains 6.1 million headlines dating from 2007 to 2017, and we would like to test whether these headlines can be appended to the FOMC reports in order to generate an even stronger proxy for the "state of the economy."

4.1.4 Center for Research in Security Prices

The Center for Research in Security Prices (CRSP) provides historical market data and is a part of the Booth School of Business at the University of Chicago. CRSP is known for its vast and extensive historical stock market databases, which is why academic researchers have relied on CRSP for accurate databases for decades. Naturally, we will gather our stock market data from CRSP to make sure our data set is as accurate as possible.

4.2 Data Structuring

Our EDGAR web scrape gathers our initial set of data, which consists of all the S&P 500 companies' 10-Ks and 10-Qs. The data are downloaded as regular text files directly from EDGAR, and since the main purpose is to extract meaningful information from these text documents, we must effectively manage and categorize them by their subjects or themes. We will now carefully explain the necessary steps for discovering the hidden topics.

1. First of all, we have to clean each text document. The purpose of cleaning the data is to remove any undesirable words that will not contribute to the topic modeling.

(a) Removal of stop words. Stop words are commonly used words such as "the," "a," "an," and "in." These words are not considered essential in determining the theme of each text document, and thus we remove all English stop words.

(b) Lemmatization. In a similar manner to the removal of stop words, lemmatization aims to reduce inflectional forms. Lemmatization is the process of grouping together the different inflected forms of a word in such a way that all of them can be analyzed as a single item. For example, "car", "cars", "car's", and "cars'" would all be represented by "car". This is a beneficial method as it implies that the context of the sentence is preserved.

2. The second part is the building of a word dictionary. In each of the text documents, all the words are just regular words represented by letters. In the dictionary, however, all the unique words from each text document are given IDs as well as a frequency count. Essentially, the dictionary is the vocabulary of the corpus, where every single word is represented by a number, i.e., its ID, rather than the word itself. Furthermore, we visualize the most common words of the dictionary; if some of those words are found to be content-neutral, we add them to a list of words to be removed as well.

3. When the dictionary is complete, it is used to transform the text documents into a corpus that consists of the IDs of the words rather than the words themselves.

4. Finally, we arrive at the LDA estimation itself, which is the training phase of the topic modeling. From this, we get a table that shows all the topics and the top ten terms related to each topic. Ultimately, we would like each term listed under the same topic to be highly associated with the others, with some variation between the different topics. A minimal sketch of these steps is given below.
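The sketch below illustrates steps 1-4 using gensim; note that gensim's LdaModel estimates the model with online variational Bayes rather than Gibbs sampling, and the `raw_filings` input and the extra stop-word list are hypothetical placeholders.

```python
import re
from gensim import corpora, models
from gensim.parsing.preprocessing import STOPWORDS
from nltk.stem import WordNetLemmatizer  # requires the NLTK WordNet data

lemmatizer = WordNetLemmatizer()
EXTRA_STOPWORDS = {"company", "include"}  # hypothetical content-neutral terms found in step 2

def clean(text):
    """Step 1: lowercase, keep alphabetic tokens, drop stop words, lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [lemmatizer.lemmatize(t) for t in tokens
            if len(t) > 2 and t not in STOPWORDS and t not in EXTRA_STOPWORDS]

def build_lda(raw_filings, num_topics=40):
    """Steps 1-4: clean each filing, build the dictionary, build the corpus, fit LDA.
    raw_filings: plain-text filings, e.g. as downloaded from EDGAR (hypothetical input)."""
    docs = [clean(text) for text in raw_filings]          # step 1
    dictionary = corpora.Dictionary(docs)                 # step 2: word -> id mapping
    corpus = [dictionary.doc2bow(doc) for doc in docs]    # step 3: bag-of-word-ids corpus
    lda = models.LdaModel(corpus, id2word=dictionary,     # step 4: train the topic model
                          num_topics=num_topics, passes=10, random_state=0)
    return lda, dictionary

# Example: lda, _ = build_lda(filings); lda.show_topics(num_topics=5, num_words=10)
```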

In order to use our Thomson Reuters news corpus we had to clean the data. First of all, it contained financial news about many firms that are not as relevant to our analysis as the companies on the S&P 500, which could potentially contribute much noise. To get rid of the noise we filtered the news corpus so that we were left with only the S&P 500 companies, which reduced the number of news headlines from 6.1 million observations to 2.1 million.

4.3 Assessment of the Data

As we expect the terms to be highly associated within each topic, we perform a t-Distributed Stochastic Neighbor Embedding (van der Maaten & Hinton, 2008). We want to make sure that each of our data sources contributes to the analysis rather than generate noise, and thus we test each of the data sources separately.

4.3.1 Assessment of U.S. 10-K

The following test was performed on 3M Co's 10-K for 2017. From Figure 3, we see a clear pattern that terms belonging to a particular topic are highly associated, as well as an indication that there is variation between topics. This suggests that our data can be interpreted and that it is not just random noise. LDAvis further supports this in Figure 3, which visualizes the fit of an LDA topic model to a corpus of documents. We also made a few interesting observations worth noting, which are shown in Appendix A.


Figure 3: Visualization of 10-K LDA topics (panels (a) and (b))


Figure 4: The widths of the gray bars represent the corpus-wide frequencies of each term, and the widths of the red bars represent the topic-specific frequencies of each term

In Figure 3 and Figure 4, two visual features provide a global perspective on the topics. First, the areas of the circles are proportional to the relative prevalence of the topics in the corpus (Sievert & Shirley, 2015). Second, one can select a term to reveal its conditional distribution over topics as well as the most relevant terms for that topic. The relevance of term w to topic k given a weight parameter λ is defined as

r(w, k \mid \lambda) = \lambda \log(\phi_{kw}) + (1 - \lambda) \log\left(\frac{\phi_{kw}}{p_w}\right),

where λ determines the weight given to the probability of term w under topic k relative to its lift.
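For reference, a minimal sketch of this relevance computation, assuming phi is the K×V matrix of topic-term probabilities and p the length-V vector of marginal term probabilities (both names are illustrative):

```python
import numpy as np

def term_relevance(phi: np.ndarray, p: np.ndarray, lam: float = 0.6) -> np.ndarray:
    """r(w, k | lambda) = lambda * log(phi_kw) + (1 - lambda) * log(phi_kw / p_w).
    phi: (K, V) topic-term probabilities; p: (V,) marginal term probabilities."""
    return lam * np.log(phi) + (1.0 - lam) * np.log(phi / p)

# Sorting each row of the output in descending order gives the most relevant
# terms per topic for the chosen lambda.
```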


In our test sample of 3M Co, we see that the first topic comprises 5.5% of the corpus. Initially, however, the first topic comprised only 4.9% of the corpus. This is because it contained many standard, non-specific terms such as the name of the company, "company", and "include", which are commonly used words in 10-Ks and 10-Qs but are not considered normal English stop words. This is similar to the LDA classification we did on Equinor in Table 1 and Table 2, where "Statoil" and "compani" are words that consistently appear but do not provide any value for our analysis. Thus, we have applied the LDAvis visualization in order to make sure we get rid of terms that do not help to explain the underlying topics of the corpus.

4.3.2 FOMC Reports

The FOMC reports are published twice a year and contain much macroeconomic information, which we will use as a proxy for the state of the economy. In order to make sure these reports are comprehensive enough to serve as such a proxy, we have visualized the report for the second half of 2009. As this was shortly after the financial crisis, we would expect to see topics such as unemployment, inflation, and interest rates. However, we also expect to see measures taken by the Fed to improve the situation, such as increasing federal funding and encouraging private investment. From the visualization in Figure 6, we found that unemployment, inflation, and interest rates were important topics, and that topics such as encouraging mortgages and consumer spending were essential as well. This is an indication that the FOMC reports are quite comprehensive and that they can be used as a proxy for the state of the economy.

As a lot of our data are from the early 1990s, it is essential that we assess whether all of the data are interpretable, or just a collection of noise. In order to do this, we visualized all the FOMC reports and found that the reports from before 1999-07-22 consistently had less association between topics and more noise than the ones after. Consequently, we chose the 1999-07-22 report as the starting point. Figure 5 illustrates why we made this decision: there is significantly less association between topics and a much larger degree of noise in the 1999-02-23 report. Accordingly, we set the starting point at 1999-07-22 in order to reduce the noise in our sample.


Figure 5: Visualization of FOMC Reports. (a) 1999-02-23, (b) 1999-07-22


Figure 6: FOMC Report Visualized

4.3.3 Financial News

The FOMC reports are only published twice a year and mostly contain macroeconomic information. We would therefore like to extend them with a financial news corpus that reports information about more factors than just macroeconomics. Once again, we see a clear pattern that terms belonging to a particular topic are highly associated, as well as an indication that there is variation between topics. In this particular case of financial news, we see an even more precise pattern than in the case of the 3M Co 10-K. This is thus a dataset that can be interpreted and that contributes to the analysis rather than generating additional noise. Assessing the financial news we also made a few very interesting observations, which are shown in Appendix A.


Figure 7: LDA Topics Financial News - t-SNE


5 Constructing the Portfolios

In this section, we cover the procedure for sorting portfolios based on a similarity measure, which will later be used to investigate whether exposure to common topics can help explain the difference in returns across firms. A rationale behind this idea is that some firms' ex-ante beliefs, predictions, or opinions about what might drive future economic growth do not necessarily fit the ex-post realizations given by the Federal Open Market Committee reports. The idea is that these disagreements can realize different returns, or at least that firms with a much higher degree of disagreement on aggregate will experience different returns than firms with a much lesser degree of disagreement. Another interpretation is that the similarity with the Federal Open Market Committee report, which we think of as a common text, can reflect different firms' risk attributes.

5.1 The Similarity Measure

The similarity measure must be constructed in a way which captures the notion of disagreement between the Federal Open Market Committee reports and the reports published by each firm. To recognize a topic, we use the Jaccard distance, and we restrict each topic to be the set of the 25 words which contribute the most to that topic. If we let A_i be the set of words for company i, and F be the set of words for the FOMC reports, then we can calculate the distance as

d_J(A_i, F) = 1 - J(A_i, F) = \frac{|A_i \cup F| - |A_i \cap F|}{|A_i \cup F|}.

The problem we faced was that the two texts could be fundamentally different, which makes it hard to get meaningful matches across all topics. By construction, d_J(A_i, F) ∈ [0, 1], where d_J(A_i, F) = 0 (1) implies perfect similarity (dissimilarity). To isolate the effect of highly dissimilar topics, we subtract one and take absolute values,

\tilde{d}_J(A_i, F) = |d_J(A_i, F) - 1|,

which leaves us with an adjusted Jaccard distance representative of the total amount of match found by the Jaccard distance. For each company filing and corresponding FOMC report, we are then left with a 40×40 matrix of adjusted Jaccard distances, one entry per topic pair. Since we are interested in the similarity between the company filings and each topic of the FOMC report, we fix each FOMC topic column and sum across all rows to construct the similarity score, S:

S_t^i = \sum_{j=1}^{40} \tilde{d}_J(A_{j,t}^i, F_t),

for each company i. Naturally, firms differ in the number of available reports. In the case of missing reports, we disregard them and focus only on the available reports in each period t. For each t we are left with a vector of scores S_t where each row corresponds to a company. We then sort this vector from smallest to largest and divide it into ten equally sized portfolios.
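The sketch below illustrates one reading of this procedure: compute the adjusted distance for every firm-topic/FOMC-topic pair, sum the entries into the score, and sort firms into deciles. The inputs `firm_topics` and `fomc_topics`, mapping topic indices to their top-25 word sets, are hypothetical.

```python
import numpy as np

def adjusted_jaccard(a: set, b: set) -> float:
    """|d_J - 1|, i.e. the Jaccard similarity of two word sets."""
    union = a | b
    if not union:
        return 0.0
    d_j = (len(union) - len(a & b)) / len(union)
    return abs(d_j - 1.0)

def similarity_score(firm_topics: dict, fomc_topics: dict) -> float:
    """Sum the adjusted distances over all topic pairs (the 40 x 40 matrix)."""
    return sum(adjusted_jaccard(fa, ff)
               for fa in firm_topics.values()
               for ff in fomc_topics.values())

def sort_into_portfolios(scores: dict, n_portfolios: int = 10) -> list:
    """scores: {ticker: similarity score}. Returns n_portfolios lists of tickers,
    sorted from the lowest score to the highest."""
    ranked = sorted(scores, key=scores.get)
    return [list(chunk) for chunk in np.array_split(ranked, n_portfolios)]
```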

5.2 Assessing the results

Here we will look at how the value-weighted and equal-weighted portfolios in each decile performed. To compute portfolio returns, we use daily data from CRSP to calculate the equity value for each company, where the weights correspond to a value weight within each decile. We rebalance the portfolios each time the Federal Open Market Committee publishes a new report, call it time t. If time t+1 is the time when the next report is published, then we use time t data to calculate weights and time t+1 data to find the realized logarithmic returns for each company. We then sum across all firms in each portfolio to get the realized portfolio returns. Tables 10-11 and Figure 8 summarize the value-weighted and equal-weighted returns.
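A minimal sketch of this rebalancing step, assuming `mktcap_t` holds the market capitalizations observed at report date t and `logret_next` the realized log returns over (t, t+1], both as pandas Series indexed by ticker (hypothetical inputs):

```python
import pandas as pd

def portfolio_returns(portfolios, mktcap_t, logret_next):
    """portfolios: lists of tickers from the similarity sort at time t.
    mktcap_t, logret_next: pandas Series indexed by ticker.
    Returns value-weighted and equal-weighted returns per portfolio over (t, t+1]."""
    rows = []
    for tickers in portfolios:
        caps = mktcap_t.reindex(tickers).dropna()
        rets = logret_next.reindex(caps.index)
        weights = caps / caps.sum()              # value weights within the decile
        rows.append({"vw": (weights * rets).sum(), "ew": rets.mean()})
    return pd.DataFrame(rows, index=range(1, len(portfolios) + 1))
```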

What the results indicate is that if we were to fit a line to these points, its slope would be positive. This implies that stock returns might increase with the similarity loading of each firm's filings on the common FOMC report.

Further, we see that the average market capitalization of the companies in each portfolio is approximately the same, as is the number of companies that appear in each portfolio. This indicates that none of the deciles is biased towards a particular type of company. One could perhaps imagine a scenario where a specific type of company prone to substantial stock returns reappears in the last decile, whereas the opposite happens in the first decile. However, what we see is that the companies move around between the deciles, indicating that our sorting picks up on the time-varying structure of each company filing.

6 Adding News to FOMC Corpora

Our results from running the similarity measure on firm filings and FOMC reports seem to indicate that there is a positive relationship between the similarity score and stock returns. However, the results are noisy and do not indicate monotonically increasing behavior. Since the FOMC reports are very general, we might fail to capture relevant firm-specific features. Therefore, we choose to update the FOMC corpus with 2.1 million news headlines, which might help to reduce the shortcomings of the generality of the FOMC reports. We then re-run the entire estimation procedure to get an updated similarity measure \hat{S}_t^i. With this new measure, we hope to decrease the noisiness of the results and to obtain increased monotonicity. We sort portfolios based on \hat{S}_t^i and construct both value-weighted and equal-weighted portfolios. The results are summarized in Table 3 and Table 4.

From Table 3 we see that the difference between the upper and lower decile increased substantially, from 0.29% per month (3.54% p.a.) to 0.74% per month (9.94% p.a.). Figure 10 (a) shows increased monotonic behavior compared to Figure 8. The equal-weighted results do not change much, and the value-weighted counterpart seems to be the better one. It is not clear why we only see improvements in the value-weighted case. One explanation could be that occurrences in news headlines are much more frequent for larger companies, so the news we incorporated carries much more information about the larger companies than the smaller ones. If this is the case, then we would expect to see this reflected in our results the way we do, since a value-weighted approach gives a much higher weight to larger companies than to smaller ones.

Further, we see again that the average market capitalization of the companies in each portfolio is approximately the same, as is the number of companies that appear in each portfolio. As in the previous case of non-news portfolios, this implies that none of the deciles is biased towards a particular type of company.


10 Portfolio Returns, Value-Weighted

Portfolio            1        2        3        4        5        6        7        8        9        10       (10-1)
Mean                 0.00     0.10     0.40     0.29     0.25     0.69**   0.26     0.54     0.72     0.73     0.74
t-stat              -0.017    0.285    1.241    0.872    0.735    2.184    0.776    1.484    2.198**  2.108**  2.678**
t-stat*             -0.016    0.266    1.297    0.874    0.720    2.500    0.691    1.430    2.209**  2.384**  2.384**
SE*                  0.004    0.004    0.003    0.003    0.004    0.016    0.004    0.004    0.003    0.003    0.003
Skewness            -3.25    -1.663   -1.45    -1.53    -0.37    -1.51    -2.44    -1.57    -1.24    -1.46     1.86
Kurtosis             23.93    9.714    10.25    8.96     8.73     9.93     17.02    10.21    11.48    10.84    15.94
Companies            315      358      368      391      389      384      385      379      364      369
Average market cap   $28.32   $27.13   $24.98   $26.04   $26.82   $26.76   $27.72   $27.21   $25.18   $24.99
N                    247

* The standard errors are heteroskedasticity and autocorrelation robust (HAC) up to 6 lags.
We calculate standard errors using a constant regression. Mean and standard deviation in percentage. Average market cap in millions.

Table 3: Value-Weighted with News - Monthly.


10 Portfolio Returns, Equal-Weighted

Portfolio            1        2        3        4        5        6         7        8         9         10       (10-1)
Mean                 0.49*    0.66**   0.69**   0.61**   0.76**   0.92***   0.70**   0.82***   0.78***   0.62**   0.13
t-stat               1.736    2.331    2.270    2.108    2.642    3.271     2.536    2.701     2.668     2.151    0.924
t-stat*              1.754    2.287    2.160    2.046    2.510    3.545     2.530    2.740     2.635     2.131    0.984
SE*                  0.003    0.003    0.003    0.003    0.003    0.003     0.003    0.003     0.003     0.003    0.001
Skewness            -2.18    -1.829   -1.73    -1.55    -1.50    -1.06     -1.24    -1.771    -1.51     -1.37    -0.68
Kurtosis             14.54    12.53    12.68    10.63    10.37    7.95      8.57     12.08     10.86     8.43     6.43
Companies            315      358      368      391      389      384       385      379       364       369
Average market cap   $28.32   $27.13   $24.98   $26.04   $26.82   $26.76    $27.72   $27.21    $25.18    $24.99
N                    247

* The standard errors are heteroskedasticity and autocorrelation robust (HAC) up to 6 lags.
We calculate standard errors using a constant regression. Mean and standard deviation in percentage. Average market cap in millions.

Table 4: Equal-Weighted with News - Monthly.


7 The Long-Short Strategy

In this section, we provide insight into how a long-short strategy can yield greater risk-adjusted returns than the benchmark index. First, the statistical properties at the monthly frequency reported in Table 19 are inspected, and then we follow up with a performance evaluation at the semi-annual frequency in Table 22.

Only the value-weighted portfolio with news exhibits a statistically significant mean at the 5% significance level. This gives us two indications. First, the value-weighted portfolio composition seems to be the better one. Second, by adding news we go from statistically insignificant results to statistical significance. From these two observations we are especially interested in investigating the value-weighted portfolios with news further, as they provide the most intriguing results.

Since we use S&P 500 listed companies exclusively, we use the S&P 500 Total Return index as the benchmark. Table 22 reports the first four moments, the correlation with the S&P 500 TR index, and the Sharpe ratio as a risk-adjusted measure, with a zero risk-free rate for simplification. Table 22 tells us that only the value-weighted portfolio based on the updated corpus has a higher realized return over the period, yielding more than 100% in additional returns on average each half year. Moreover, the risk-adjusted return is also significantly higher, indicating that this portfolio is better than a passive investment in the S&P index alone. We also find a negative correlation between the S&P 500 returns and three of the four portfolios, indicating that there could be significant diversification benefits from combining the index with these portfolios.

Due to the possible diversification benefit, we choose to look at two other portfolios, P1 and P2. P1 is the optimal portfolio when we allocate between the value-weighted long-short portfolio with news and the S&P 500 Total Return index. P2 is the optimal portfolio when we allocate between the ten value-weighted portfolios with news. We set this problem up as a simple Karush-Kuhn-Tucker problem,

\max_{\omega} \; \omega' R \, (\omega' \Sigma \omega)^{-1} \quad \text{subject to} \quad |\omega_i| \le 3, \quad \sum_{n=1}^{N} \omega_n = 1,

where ω is a vector of weights, R is a vector of returns, and Σ is the covariance matrix associated with these returns. We restrict |ω_i| ≤ 3 for all i = 1, ..., N, and we want the weights to sum to 1. We solve this problem numerically, and the results are summarized in Table 19. We see that by just using P1 we can significantly improve our results, raising the risk-adjusted ratio from 0.36 to 0.65. We also see that by using all ten value-weighted portfolios in P2 we can further increase the Sharpe ratio, from 0.36 to 1.67. However, P2 requires trading all stocks on the S&P 500 index, which will be costly both in terms of transaction costs and through shorting fees. P1, on the other hand, only requires trading 40% of the stocks on the S&P 500 index and an ETF on the index itself. Moreover, these two portfolios are based entirely on ex-post realizations, so it is not an entirely fair comparison. However, it highlights the potential diversification benefit that our sorted portfolios can yield.
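A minimal sketch of this numerical step is given below; it takes the objective to be the in-sample Sharpe ratio with a zero risk-free rate (the ratio reported above) and imposes the weight constraints with scipy. The `period_returns` input, a T×N array of realized returns for the N candidate portfolios, is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_weights(period_returns: np.ndarray, bound: float = 3.0) -> np.ndarray:
    """Maximize w'R / sqrt(w'Sigma w) subject to sum(w) = 1 and |w_i| <= bound."""
    mu = period_returns.mean(axis=0)
    sigma = np.cov(period_returns, rowvar=False)
    n = len(mu)

    def neg_sharpe(w):
        return -(w @ mu) / np.sqrt(w @ sigma @ w)

    result = minimize(
        neg_sharpe,
        x0=np.full(n, 1.0 / n),                    # start from equal weights
        method="SLSQP",
        bounds=[(-bound, bound)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return result.x
```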

8 Diagnostics

In this section, we robustness-check our results. First, we show that our results are robust to sub-samples by splitting the sample into two parts. Then we investigate the sensitivity to the number of topics by estimating a set of new models with 25, 75, and 100 topics. Lastly, we see how the results change if we change the number of sorted portfolios.

8.1 Splitting the Sample

First, we evaluate our results on different sub-samples of our time series. This is to ensure that our results are not strongly affected by particular events, such as significant outliers. We split the sample in half, using data up to the first half of 2008 for the first part and the remaining data from the second half of 2008 onwards for the second part. This way of splitting allows us to check the performance through two financial crises, namely the technology bubble at the beginning of the current millennium and the financial crisis in the later part of 2008. What we will be looking for is the same relationship as in Figure 9, that is, a positive relationship between portfolio returns and similarity score. Figures 11-12 as well as Table 25 summarize the results.

What we see is that for the value-weighted portfolios we have a similar result as for the entire sample in both sub-periods. For the equal-weighted portfolios, we see that the strategy fails in the first sub-period, but gives similar results in the subsequent period. These results strengthen our hypothesis that portfolios can be sorted based on the company filings' exposure towards a common text source and that the value-weighted portfolios are robust to different sub-samples, although the equal-weighted portfolios fail on the first sub-sample.


8.2 Varying Amount of Topics

In this subsection, we will look at how our results change if we vary the number of topics in the LDA-models. We are interested in two things:

1. How well does the LDA model on the updated FOMC corpora handle different numbers of topics?

2. How well does our sorting algorithm handle these numbers of topics?

First, to investigate the performance of the LDA model for different numbers of topics (15, 25, 40, and 100), we use LDAvis as a visualization tool. LDAvis provides a visualization of how well the LDA topics fit the corpus. From the visualization, it is clear that too many topics add minimal benefit and mostly introduce noise. Figure 13 shows the visualization of 100 LDA topics and makes it very clear that the topics beyond the first 40 are close to identical, so there is no marginal value added from increasing the number of topics beyond 40. However, we also test with fewer than 40 topics, as we want our model to be as parsimonious as possible to avoid unnecessary noise. Figures 14 and 15 illustrate that fewer topics generate less noise and more variation between topics. Nevertheless, Figure 15 indicates that 15 topics might be too few, as we would risk not having enough topics to capture everything in the data, whereas 40 topics, illustrated in Figure 3, might be slightly too many. This is further supported by the results summarized in Tables 26-28.
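As a sketch of how this comparison could be produced, the snippet below re-estimates the LDA model for each candidate number of topics and writes an LDAvis page for visual inspection. It assumes that the pre-processed bag-of-words corpus and dictionary from the earlier estimation are available as corpus and dictionary; depending on the installed version, the gensim bridge in pyLDAvis is exposed as pyLDAvis.gensim or pyLDAvis.gensim_models.

from gensim.models import LdaModel
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis  # pyLDAvis.gensim on older versions

# corpus and dictionary are assumed to come from the earlier pre-processing step
for k in (15, 25, 40, 100):
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=k, passes=10, random_state=1)
    vis = gensimvis.prepare(lda, corpus, dictionary)
    pyLDAvis.save_html(vis, "ldavis_{}_topics.html".format(k))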

Second, we want to see how the different portfolios perform with different numbers of topics. From the visualization, we would expect the portfolios based on 100 topics to underperform due to the inefficiency of the LDA models, whereas it is hard to predict how the results with fewer than 40 topics will perform. Tables 26-27 and Figures 16-18 summarize these results. As expected, the portfolios based on 100 topics show no clear pattern, both for the value-weighted and the equal-weighted portfolios. This confirms our conclusion from the visualization: the high number of topics creates too much noise, as the LDA model struggles to separate the topics from each other. The portfolios based on 25 topics, however, do still show an evident pattern in the value-weighted case, but not for the equally-weighted portfolios.

From this subsection we conclude that our results are sensitive to the number of topics. Since we use LDA topic classification to break down the text, it is essential to choose the number of topics carefully. Too many topics introduce noise through overlapping topics, whereas too few may leave out important information that could be used to improve the portfolio sorting.


8.3 Varying Amount of Portfolios

It is standard in the literature to use ten portfolios when sorting (e.g., Fama & French, 1996; Leong et al., 2009), and so we chose to use that as our benchmark. However, it would be interesting to see how varying the number of portfolios affects our results. We therefore also look at the results from using 5 and 15 portfolios.
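For reference, a stylized version of the sorting step for an arbitrary number of portfolios is sketched below. The DataFrame df and its columns similarity, ret, and mktcap are illustrative names for one cross-section of filings, not the exact variables used in the thesis, and in practice the sort is repeated every period to build the return series.

import pandas as pd

def sorted_portfolio_returns(df, n_portfolios=10):
    df = df.copy()
    # rank companies into n_portfolios buckets by similarity score (1 = lowest)
    df["bucket"] = pd.qcut(df["similarity"], q=n_portfolios,
                           labels=list(range(1, n_portfolios + 1)))
    # value-weighted return within each bucket
    vw = df.groupby("bucket").apply(
        lambda g: (g["ret"] * g["mktcap"]).sum() / g["mktcap"].sum())
    # long the highest-similarity bucket, short the lowest
    spread = vw.loc[n_portfolios] - vw.loc[1]
    return vw, spread

# Example usage for the two alternative sortings:
# vw5, spread5 = sorted_portfolio_returns(df, n_portfolios=5)
# vw15, spread15 = sorted_portfolio_returns(df, n_portfolios=15)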

8.3.1 5 portfolios

First, we look at how the results change when we sort companies into only 5 portfolios. The monotone tendency is apparent, and the resulting long-short strategy gives an annual return of about 6.18%, as shown in Table 5, with a higher risk-adjusted semi-annual return than in the 10-portfolio case, shown in Table 23. Note that the monthly return generated by the (5-1) spread portfolio is lower than that of the (10-1), but it also has a lower standard deviation. This results in a return which, although lower, has a higher Sharpe ratio. See Table 20 and Table 23 for complete results.

5 Portfolio Returns, Value-Weighted

Portfolio                 1        2        3        4        5     (5-1)
Mean                   0.15     0.37     0.41     0.37   0.66**   0.50***
t-stat                0.455    1.234    1.322    1.150    2.064     2.629
t-stat*               0.431    1.286    1.234    1.054    2.179     2.870
SE*                   0.004    0.003    0.003    0.004    0.003     0.002
Skew                  -2.59    -1.59    -0.56    -1.74    -1.49      0.17
Kurt                  18.73    11.21     9.83    10.19    12.26     14.20
Companies               387      405      412      411      398
Average market cap   $26.56   $25.68   $27.66   $27.31   $25.41
N                       247

Mean and standard deviation of returns in percentage. Average market cap in millions. We calculate standard errors using a constant regression. *The standard errors are heteroskedasticity- and autocorrelation-robust (HAC) up to 6 lags.

Table 5: Value-Weighted with News - Monthly.


8.3.2 15 portfolios

Here we look at the case where we increase the number of portfolios used in the sorting to 15. Statistical evidence at the monthly frequency, as well as performance at the semi-annual frequency, is reported in Table 21 and Table 24, respectively. The results indicate that the value-weighted spread portfolio (15-1) performs much worse than the (10-1) and the (5-1), with an annual return of only 3.79%, as shown in Table 15. Even though there is a tendency towards a positive relationship between the similarity score and returns, the increased number of portfolios appears to have introduced significant noise, see Figure 22.

9 Fama-French Multifactor Models

Now that we have established that we can generate returns by sorting portfolios according to a similarity score, we want to check whether these returns are abnormal, that is, whether they are not accounted for by well-known risk factors. We test this by running both the Fama-French three-factor model and the Fama-French five-factor model. The Fama-French three-factor model (Fama & French, 1993) is, in all its simplicity, an extension of the CAPM that aims to describe stock returns through three factors. The first is the market risk factor (MKT), which is the difference between the expected return of the market and the risk-free rate. In other words, it is the excess return required as compensation by the investor for the additional volatility of returns above the risk-free rate. The second is the outperformance of companies with a small market capitalization relative to companies with a large market capitalization (SMB). The justification for SMB is that, in the long term, companies with a small market capitalization are more likely to experience higher returns than companies with a large market capitalization. The third factor is the outperformance of high book-to-market companies versus low book-to-market companies (HML). The rationale for HML is that value companies, i.e., companies with a high book-to-market ratio, tend to have higher returns than growth companies, i.e., companies with a low book-to-market ratio.

The Fama-French three-factor model was further extended by Fama and French in 2015 when they introduced the Fama-French five-factor model, which adds two new factors. The fourth factor is the difference between the returns on diversified portfolios of stocks with robust and weak profitability (RMW), and the fifth is the difference between the returns on diversified portfolios of the stocks of low and high investment firms, which they refer to as conservative and aggressive (CMA).

According to the Fama-French setup, abnormal returns are indicated by an intercept that is statistically significantly different from zero. We report all estimates in separate tables and use a significance level of α = 5% when referring to statistical significance.

9.1 Fama-French Three-Factor Model

We use data on the three factors and the risk-free rate from Kenneth French’s data library.2 The Fama-French three-factor model is specified as

R_i = α + b_i R^m_t + S_i SMB_t + h_i HML_t,

where R^m_t is the market excess return, SMB_t is the small-minus-big factor, and HML_t is the high-minus-low factor. Finally, R_i is the excess return of asset i, which here is the excess return of the 5- and 10-portfolio long-short strategies. First, we run the regression on the 10-portfolio case, and then on the 5-portfolio case.
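As an aside on the data step, the monthly factor series can also be pulled programmatically through pandas_datareader's famafrench reader; the dataset identifiers and column labels below follow that library's conventions, and the sample dates are placeholders rather than the exact sample period used here.

import pandas_datareader.data as web

# The famafrench source returns a dict of DataFrames; element 0 holds the
# monthly series, quoted in percent. Dates below are illustrative only.
ff3 = web.DataReader("F-F_Research_Data_Factors", "famafrench",
                     start="1998-01-01", end="2018-12-31")[0]
ff5 = web.DataReader("F-F_Research_Data_5_Factors_2x3", "famafrench",
                     start="1998-01-01", end="2018-12-31")[0]

mkt_rf, smb, hml, rf = ff3["Mkt-RF"], ff3["SMB"], ff3["HML"], ff3["RF"]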

9.1.1 10 portfolios

We specify the model as

R_{10} = α + b_i R^m_t + S_i SMB_t + h_i HML_t,

where R_{10} is the excess return of the 10-portfolio long-short strategy. To calculate the excess return, we subtract the risk-free rate obtained from Kenneth French’s web site from the returns we calculated earlier. The results are summarized in Table 6. As shown in Table 6, the three-factor estimates for the 10-portfolio long-short strategy are not jointly statistically different from zero. We notice that the intercept is statistically different from zero, which implies that we can generate returns in excess of what the asset pricing model would imply.
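A minimal sketch of this estimation with statsmodels is shown below, using HAC standard errors with 6 lags as in the portfolio tables. The series ls10 is an assumed name for the monthly (10-1) strategy returns in decimal form, aligned on the same monthly index as the factor data above; this illustrates the regression setup rather than reproducing the thesis code.

import statsmodels.api as sm

# excess return of the (10-1) long-short strategy (factors are quoted in percent)
y = ls10 - ff3["RF"] / 100
X = sm.add_constant(ff3[["Mkt-RF", "SMB", "HML"]] / 100)

ff3_fit = sm.OLS(y, X, missing="drop").fit(cov_type="HAC", cov_kwds={"maxlags": 6})
print(ff3_fit.summary())  # the constant ("const") is the alpha estimate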

9.1.2 5 portfolios

We specify the model as

R_5 = α + b_i R^m_t + S_i SMB_t + h_i HML_t,

where R_5 is the excess return of the 5-portfolio long-short strategy. To calculate the excess return, we subtract the risk-free rate obtained from Kenneth French’s web site from the returns

2https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html


we calculated earlier. The results are summarized in Table 6. As shown in Table 6, the three-factor estimates for the 5-portfolio long-short strategy are jointly significantly different from zero.

Moreover, as the intercept is statistically different from zero, it implies that we are also able to generate abnormal returns with the 5-portfolio long-short strategy.

Table 6: Fama-French Three-Factor - monthly frequency

5 Portfolios
             Coefficient      SE     t-stat
Intercept      0.0039**    0.002      2.291
MKT           -0.0806      0.064     -1.252
SMB            0.1989***   0.073      2.738
HML           -0.2683***   0.085     -3.157
N                   247
R2                 0.10
F-stat            13.21

10 Portfolios
             Coefficient      SE     t-stat
Intercept      0.0065**    0.003      2.139
MKT           -0.0617      0.122     -0.507
SMB            0.054       0.160      0.343
HML           -0.1891      0.172     -1.099
N                   247
R2                 0.01
F-stat           0.6520

Standard errors are autocorrelation and heteroskedasticity robust (HAC).

9.2 Fama-French Five-Factor Model

The Fama-French five-factor model is specified as

R_i = α + b_i R^m_t + S_i SMB_t + h_i HML_t + r_i RMW_t + c_i CMA_t,

where R^m_t, SMB_t, and HML_t are the same factors as before, but we now add the robust-minus-weak profitability factor (RMW) and the conservative-minus-aggressive investment factor (CMA). R_i is still the excess return of asset i and will be the excess return of the 5- and 10-portfolio long-short strategies. First, we run the regression on the 10-portfolio case, and then on the 5-portfolio case.

9.2.1 10 portfolios

We specify the model as

R_{10} = α + b_i R^m_t + S_i SMB_t + h_i HML_t + r_i RMW_t + c_i CMA_t,
