
Assessment of Data - A few interesting observations


We made a few observations which might be worth noting. In 2017, the $5 billion lawsuit filed against 3M Co over water pollution was the fourth most important topic and comprised 5.0% of the corpus. In 2018, however, the lawsuit became the most important topic, comprising 5.5% of the corpus. Shortly after the importance of the lawsuit increased, 3M Co saw a significant decrease in its stock price.

Figure 23: Bloomberg headline

This indicates that a topic and its importance might be used to predict the future direction of the stock price. Identifying topics of such magnitude is one of the greatest strengths of our estimation and gives us reason to believe that, applied properly, it can extract valuable information that the market would take considerably longer to obtain.

Furthermore, after extending the FOMC reports with the news, we made a few interesting observations. Before the extension, the topic "government bailout" was the fifth most important, comprising 3.4% of the corpus. After adding the news, the "government bailout" topic became the fourth most important, comprising 6.1% of the corpus.

Furthermore, by adding the news corpus, the topic model was able to extract the names of several of the companies most affected by the government bailout, such as JPMorgan Chase and Bear Stearns. This suggests that adding a financial news corpus not only improves the reports as a proxy for the state of the economy but can also be used to acquire firm-specific information. Ultimately, we believe the financial news corpus should be incorporated into the model to provide more accurate results.

12 Appendix B

12.1 Sentiment Analysis Predictability

In this section, we present three experiments to show that sentiment in text sources can have predictive power for stock price movements.

Figure 24: Illustration of the relationship between Equinor's stock price, the sentiment of the annual reports, and the Brent spot price.

From Figure 24, we see that there appears to be a relationship between these variables, as we would expect. However, due to the low frequency of the annual reports, sentiment derived from them is a sparse variable for prediction. Fortunately, the largest companies on the Oslo Stock Exchange also publish quarterly reports, which provide more observation points. Ideally, we would like a data source with daily observations of both topics and sentiment, such as the Thomson Reuters financial news corpus, which contains 6.1 million financial news articles from 2007-01-01 to 2016-12-31. This simple experiment suggests that there is information to be extracted from text sources and that it appears to be correlated with the firm's stock price.

We go one step further and look at the quarterly reports and the associated transcripts from the quarterly presentations. We believe the higher frequency will allow us to capture more of the variation in the stock price. We summarize our findings in Figure 25. The figure shows that a large movement in the sentiment measured in the transcript is followed by a significant movement in the stock price of Equinor. This strengthens our assumption that there might be valuable information embedded in the transcripts of these meetings, and it also motivated us to obtain data with an even higher frequency. Please note that the transcripts and the stock price are not perfectly aligned in time, since the quarterly reports are often published at the beginning of the next quarter, while the stock price is measured at the end of the quarter.

Figure 25: Sentiment from both the transcript and report, including the stock price.

To go even one step further, we experiment with our Thomson Reuters financial news corpus. To make sure the sentiment provides us with a factor that affects stock prices, we apply the Granger causality test (Granger, 1988). Granger causality is a statistical concept that determines whether one time series is useful in forecasting another. Granger defined the causality relationship with two principles: first, the cause happens before its effect; second, the cause has unique information about the future values of its effect.

Given these assumptions, the following hypothesis can be tested for the identification of a causal effect of X on Y:

Pr[Y(t+1) ∈ A | I(t)] ≠ Pr[Y(t+1) ∈ A | I_{−X}(t)],

where Pr refers to probability, A is an arbitrary non-empty set, and I(t) and I_{−X}(t) respectively denote the information available as of time t including and excluding the history of X. If this hypothesis cannot be rejected, there is statistical evidence to say that X Granger causes Y. Ultimately, we would like to use Granger causality to show how sentiment affects stock prices.

To do this, we observe that although news topics are available, there is a significant underreaction in the market price to news, with clear patterns of continuation in the period following the initial news release (Larsen & Thorsrud, 2017). Intuitively, stock prices are expected to react after the sentiment is published, and we therefore compare the sentiment of day t−i, where i is the number of lags, with the stock price of day t. We perform this experiment on two levels: first on an aggregate level, where we test whether the sentiment can Granger cause the S&P500 index, and second on a firm-specific level, where we test whether the sentiment can Granger cause individual stock prices.

In order to perform this experiment, several steps are necessary. First of all, we calculate the log returns of the opening prices from day t−i to day t. This is a crucial step, as we would like to test whether the change in sentiment values Granger causes the change in stock prices. Moreover, we have to calculate the sentiment values, which are given by

S_t = (Σ Positive words_t) / (Σ Negative words_t),

and further divide them into categories according to the probability of the sentiment having a positive or a negative impact on the given asset. Moreover, we construct a variable consisting of the mean of all the highest-scoring sentiment values for each date, i.e., we pick the highest probability for each piece of news, regardless of whether the score is positive or negative, and then calculate the log changes in the average sentiment value between each day. However, because the scores are probabilities, given as numbers between 0 and 1, they do not by themselves indicate whether the sentiment of the news on a given date is more likely to impact the asset positively or negatively. Thus, we construct a dummy variable which counts the number of positive and negative sentiment scores for each date. If

Σ_{i=1}^{n} P(+)_i > Σ_{i=1}^{n} P(−)_i,

where P(+) represents the probability of the sentiment being positive and P(−) the probability of the sentiment being negative, then the dummy variable appends the number 1, and if

Σ_{i=1}^{n} P(+)_i < Σ_{i=1}^{n} P(−)_i,

the dummy variable appends the number −1. This is then incorporated into the variable containing the log changes in the average sentiment value between each day, in order to indicate whether the probabilities of the highest sentiment value scores were mostly positive or negative.
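To make this construction concrete, the following is a minimal sketch in Python, assuming a hypothetical pandas DataFrame of scored news items; the column names and toy values are our own and not part of the original data set.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per news item, with word counts and the model's
# probability that the item impacts the asset positively or negatively.
news = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04", "2016-01-04", "2016-01-05"]),
    "n_pos_words": [12, 3, 7],
    "n_neg_words": [4, 9, 7],
    "p_positive": [0.81, 0.35, 0.55],
    "p_negative": [0.19, 0.65, 0.45],
})

# Per-item sentiment score S_t = (sum of positive words) / (sum of negative words).
news["S"] = news["n_pos_words"] / news["n_neg_words"]

# Highest-probability score per item and its sign (+1 positive, -1 negative).
news["top_prob"] = news[["p_positive", "p_negative"]].max(axis=1)
news["item_sign"] = np.where(news["p_positive"] >= news["p_negative"], 1, -1)

# Aggregate to the daily level.
daily = news.groupby("date").agg(
    mean_top_prob=("top_prob", "mean"),
    net_sign=("item_sign", "sum"),
)

# Dummy variable: 1 if positive items dominate the day, -1 otherwise.
daily["dummy"] = np.where(daily["net_sign"] >= 0, 1, -1)

# Log change in the average highest-scoring sentiment value between days,
# signed by the dummy so the direction of the day's sentiment is retained.
daily["d_sentiment"] = daily["dummy"] * np.log(daily["mean_top_prob"]).diff()
```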

Once this has been incorporated, we can perform the Granger causality test to examine whether the changes in sentiment values Granger cause the changes in stock prices.
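A sketch of the test itself, using the grangercausalitytests function from statsmodels; the two series below are random placeholders standing in for the actual return and sentiment series.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Placeholder daily series standing in for the S&P500 log returns and the
# signed log change in average sentiment constructed above.
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "log_return": rng.normal(0.0, 0.01, 250),
    "d_sentiment": rng.normal(0.0, 0.05, 250),
})

# Tests whether the second column Granger causes the first, for lags 1 to 5.
results = grangercausalitytests(data[["log_return", "d_sentiment"]], maxlag=5)

for lag, res in results.items():
    f_stat, p_value, _, _ = res[0]["ssr_ftest"]
    print(f"lag {lag}: F = {f_stat:.2f}, p-value = {p_value:.3f}")
```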

By running the test, we found that, on an aggregate level, there was statistical evidence that the sentiment of news Granger causes the changes in the S&P500. The results of the Granger causality test are summarized below.

Table: Granger causality test, aggregate level (columns: Number of Lags, p-value, F-test, Number of days with news)

Furthermore, we would like to test whether the sentiment of news Granger causes firm-specific stock returns. In this case, we look at firm-specific news rather than the collection of all news. To do this, we have to separate the news corpus so that we extract only the news relevant to the specific firms we would like to test. To reduce noise, we only pick the most relevant articles for firms with more than 30,000 news articles in the sample period, where an article is scored as relevant ('1') if the name of the company is mentioned in the headline. From there, the procedure is equivalent to the one used at the aggregate level. By running the Granger causality test for a collection of firms, we found that although the firm-specific results are not as consistent as the aggregate ones, there is statistical evidence that the sentiment of news Granger causes firm-specific stock returns. The results are summarized in Table 30.

Table 30: Granger causality test, firm level (columns: Firm, Number of Lags, p-value, Number of days with news)
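As a rough sketch of the firm-specific filtering step described above, the snippet below keeps only articles that mention a company name in the headline; the DataFrame layout, the firm list, and the toy headlines are hypothetical.

```python
import pandas as pd

# Hypothetical news corpus: one row per article with a date and a headline.
articles = pd.DataFrame({
    "date": pd.to_datetime(["2015-03-02", "2015-03-02", "2015-03-03"]),
    "headline": [
        "JPMorgan Chase beats earnings estimates",
        "Oil prices slide on supply glut",
        "Apple unveils new product line",
    ],
})

firms = ["JPMorgan Chase", "Apple"]

firm_news = {}
for firm in firms:
    # Relevance scored as '1' if the company name is mentioned in the headline.
    mask = articles["headline"].str.contains(firm, case=False, regex=False)
    firm_news[firm] = articles.loc[mask]

# In the actual experiment, only firms with more than 30,000 relevant articles
# in the sample period would be retained at this point.
firm_news = {firm: df for firm, df in firm_news.items() if len(df) > 0}
```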

13 Appendix C

13.1 Firm-Specific Prediction Model

Although our thesis is mainly about creating portfolios and testing whether the exposure to macroeconomic topics can help explain the difference in returns, we are also interested in exploring the predictive power of natural language processing for firm-specific stock returns.

This appendix will thus be dedicated to a time-series model that uses ensemble learning to detect complex relationships within a data set.

In general, Σ can be virtually anything, and it is reasonable to believe that we will be interested in changing or extending it as we progress. As a starting point, however, we are interested in finding predictive power for Norwegian and S&P500-listed firms. More specifically, we want to find firm-specific risk factors in stock returns which have yet to be documented. Our goal is to provide greater insight by building a model that gives researchers and practitioners more precise estimates, allowing for better-informed decisions and interpretation of the events that drive returns.

Related Literature

There are several types of information which firms are legally obliged to publish at certain points in time. Consequently, they are not able to time the disclosure of good and bad news. However, they might be able to communicate the distinct pieces of information in different styles (Larcker & Zakolyukina, 2012). Hence, many firms use voluntary press releases to reduce information asymmetry and, as a result, the risk associated with their business (Graham et al., 2005). When such disclosures contain favorable information about the firm, the firm-specific risk – represented by the cost of capital and the volatility of stock returns – shrinks significantly.

On the contrary, the disclosure of unfavorable information is followed by an increase in these measures (Kothari et al., 2009). Furthermore, Penman (1987) found that news in earnings reports released during the first two weeks of calendar quarters was, on average, good news that affected stock prices favorably, whereas reports issued later in the quarter were more likely to have an unfavorable effect on the stock price. This implies that firms release earnings reports early when they have good news and delay reports when the news is bad. This is consistent with the findings of Patell & Wolfson (1982), who found that the likelihood of "bad news" corporate disclosures increases after the close of trading for the day, whereas "good news" is more likely to be released while the security markets are open. Intuitively, firms might try to publish unfavorable information and reports at points in time that suit them better, depending on the content of the announcement. Our model will not be affected by the point in time at which the reports are released, and thus we might be able to work around the issue of timed disclosures of good and bad news. However, it might be affected by the way the announcements are communicated, and consequently, this is a factor we have to be aware of.

In finance, a broadly acknowledged theory suggested by Eugene Fama is the Efficient-Market Hypothesis. More specifically, the semi-strong form of the Efficient-Market Hypothesis has become a broadly accepted financial theory. The semi-strong form suggests that all publicly available information is immediately and thoroughly reflected in stock prices (Fama, 1970). Recent research, however, reveals that news is reflected in stock prices approximately half an hour after the publication of a press release and that the adjustment of the trading volume needs even more time (Muntermann & Guettler, 2007). Furthermore, Thorsrud & Larsen (2017) showed that although news topics are available in the morning, before the market opens, there is a significant underreaction in the market price to news and clear patterns of continuation in the days following the initial news release. Although our primary data sources are annual and quarterly reports, the same reasoning applies, and this is one of the most important findings related to our thesis. In our sample, there is a clear pattern that reports are published before the market opens.

Consequently, our model depends on the market not being semi-strong efficient, i.e., on the Efficient-Market Hypothesis being violated to some degree, such that we can react more quickly than the market. We believe that we will be able to extract and interpret the information contained in company filings, news, and other text sources faster than the market can react.

Thus, it seems theoretically possible to earn abnormal returns on the disclosure of firm-specific information, such as company filings and insider trading.

Methodology

Ensemble Learning

We want to apply an ensemble method, since it helps us detect more complex relationships within a data set. An ensemble is simply a supervised learning algorithm. In general, we want to predict Σ based on a sample of X topics (y_i, x_i). The algorithm takes a loss function L(ŷ, y) as input and searches for a function f̂ that has a low expected prediction loss E_{(y,x)}[L(f̂(x), y)] on a new data point from the same distribution.
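A minimal sketch of the idea on simulated data: several regression trees are fit on bootstrap samples and their predictions averaged, which is one simple way of lowering the expected prediction loss relative to a single tree. The data-generating process and parameters below are purely illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Simulated data: the columns of X stand in for topic scores and y for Sigma.
X = rng.normal(size=(200, 5))
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=200)

# A simple ensemble: fit each tree on a bootstrap sample and average the
# predictions, lowering the expected squared-error loss E[L(f_hat(x), y)].
trees = []
for _ in range(50):
    idx = rng.integers(0, len(y), size=len(y))
    trees.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))

ensemble_pred = np.mean([tree.predict(X) for tree in trees], axis=0)
print("in-sample MSE:", np.mean((ensemble_pred - y) ** 2))
```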

Random forest

To identify possible drivers of Σ, we propose using a regression tree, more specifically a random forest method.


Here, each x_i can be thought of as a topic or event ranked by its relative importance, and Σ is the predicted outcome.
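A sketch of this setup with scikit-learn's RandomForestRegressor; the design matrix and the target are simulated, so the column meanings are only illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Simulated design matrix: each column x_i stands in for a topic ranked by its
# relative importance; sigma is the outcome to be predicted.
X = rng.normal(size=(500, 20))
sigma = 0.3 * X[:, 0] - 0.2 * X[:, 3] * X[:, 7] + rng.normal(scale=0.05, size=500)

forest = RandomForestRegressor(n_estimators=500, max_depth=5, random_state=0)
forest.fit(X, sigma)

# The feature importances hint at which topics drive the predicted outcome.
top_topics = np.argsort(forest.feature_importances_)[::-1][:5]
print("most important topic indices:", top_topics)
```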

Choosing the optimal depth of the tree

Since this is a thesis on prediction, we are concerned with the out-of-sample performance.

In theory, a random forest can be overfitted to achieve a perfect in-sample fit. However, overfitting generally results in poor out-of-sample performance, giving us an incentive to keep the trees as parsimonious as possible. We therefore create an out-of-sample experiment inside the original sample, by fitting the model on one part of the data and assessing which level of regularization leads to the best performance on the other part. Initially, we wanted to increase the efficiency of this procedure through cross-validation, by randomly partitioning the sample into equally sized sub-samples, which we refer to as folds. The estimation process then involves successively holding out one of the folds for evaluation while fitting the prediction function for a range of regularization parameters on all remaining folds, and finally picking the parameter with the best estimated average performance. However, we abandoned this idea, since it would potentially give us an unfair advantage: many of the tools we are using have only recently been developed, and the advantage comes from competing against a market with much simpler tools. To use the analogy of Espen Henriksen, if we participated in a Formula 1 race in the 1960s with a modern car, even we would most likely win the race; that does not imply that we are better race car drivers. We therefore use the last ten percent of the sample as a validation sample and the first 90% for estimation.
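A sketch of that validation scheme, assuming the observations are already ordered in time; the candidate depths and other parameters are arbitrary illustrative values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def choose_depth(X, y, depths=(2, 3, 5, 8, None), val_share=0.10):
    """Fit on the first 90% of the (time-ordered) sample and evaluate each
    candidate tree depth on the last 10%, without any shuffling."""
    cut = int(len(y) * (1 - val_share))
    X_train, X_val = X[:cut], X[cut:]
    y_train, y_val = y[:cut], y[cut:]
    scores = {}
    for depth in depths:
        model = RandomForestRegressor(
            n_estimators=300, max_depth=depth, random_state=0
        ).fit(X_train, y_train)
        scores[depth] = mean_squared_error(y_val, model.predict(X_val))
    # Return the depth with the lowest validation error, plus all scores.
    return min(scores, key=scores.get), scores
```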

Initial analysis & results

In this section we explore the predictive power of just the topics. We are initially interested in two things:

• Can LDA-classified topics explain the time-series of returns?

• Can LDA-classified topics explain the cross-section of returns?

Our aim in this section is to shed light on how models consisting purely of LDA-classified topics perform. This can serve as a benchmark for further implications. After thorough investigation, we have documented that topics are indeed highly time-varying, serving as a good indication of the focus of the companies from quarter to quarter. To visualize this, we compute the Jaccard distance of three LDA models estimated on the annual reports from 2014, 2015, and 2016, respectively. The Jaccard distance is a way of measuring the dissimilarity between two sets; in this case, each set is the words contained in each topic of each model. Let A and B denote the sets of words in two topics. Then, the Jaccard distance can be computed as

d_J(A, B) = 1 − J(A, B) = (|A ∪ B| − |A ∩ B|) / |A ∪ B|.

The result is illustrated in Figure 26. Initially, we were hoping for time-varying topics, but we were surprised by how much each report actually varies from year to year. We chose to continue with the Jaccard distance as a similarity measure to keep track of topics over time.
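A small helper implementing the distance above; the two example topics are invented for illustration.

```python
def jaccard_distance(a, b):
    """d_J(A, B) = 1 - |A ∩ B| / |A ∪ B| for two sets of topic words."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Invented top words from two yearly LDA topics.
topic_2015 = {"oil", "price", "brent", "production", "field"}
topic_2016 = {"oil", "price", "cost", "reduction", "field"}
print(jaccard_distance(topic_2015, topic_2016))  # 0.571...
```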

Constructing the data matrix

To construct the data matrix which we will use to estimate our models, we had the following in mind. When the annual report is released, it contains a great deal of information about the economies in which the company operates and about the company itself, considerably more than each quarterly report. We therefore want to use it as a benchmark, and then, for each quarterly report, see how these topics change over time. If we let D_t ⊇ {D_{t,1}, D_{t,2}, ..., D_{t,i}} be the superset of all topics from an LDA model estimated in year t containing i topics, then we use the Jaccard distance to see how each topic i changes through the subsequent year.

First we will show how the scores are constructed, then an intuitive explanation will follow.

We construct an n × i matrix X, where each row corresponds to a time period n ∈ {(t, Q)}. We construct the scores by looking at how each topic i changes over time relative to its importance to the document. Formally,

X_n = i − T_{n,i},

where T_{n,i} indicates the index of topic i in period n,

T_{n,i} = argmin_{T ∈ [1, i]} d_J(A_n, D_t).

Remember that n is simply a finer filtration than t, due to the higher frequency of the quarterly reports. We are free to scale X as we like; what matters is that we capture how each topic's relative importance changes over time.
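One possible reading of this construction, sketched below: for each benchmark topic i from the annual report, T_{n,i} is taken as the (1-based) index of the closest topic in period n under the Jaccard distance, and the score is i − T_{n,i}. The helper names and the exact matching rule are our interpretation rather than a definitive implementation.

```python
import numpy as np

def jaccard_distance(a, b):
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def build_row(benchmark_topics, period_topics):
    """Build one row of X for period n: benchmark_topics are the i topics from
    the annual report, period_topics the topics estimated in period n."""
    row = np.empty(len(benchmark_topics))
    for i, topic in enumerate(benchmark_topics, start=1):
        distances = [jaccard_distance(topic, q) for q in period_topics]
        t_ni = int(np.argmin(distances)) + 1  # index of the closest topic
        row[i - 1] = i - t_ni                 # X_n score for topic i
    return row
```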

In our baseline model, we are mainly interested in whether we can predict the outcome, that is, whether the stock return will be positive or negative on days when the quarterly reports are published. We use an indicator function 1{y_n > 0} to construct our binary vector,

I_n = 2 · 1{y_n > 0} − 1,

where y_n is the return of a stock in period n. We see that I_n takes the value −1 if we observe a stock return less than or equal to zero, and 1 otherwise. It then follows that an optimal strategy would be to short the stock when I_n = −1 and go long when I_n = 1.
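A sketch of the response construction and the baseline classifier; the return values and the small design matrix are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder returns y_n on report days; I_n = 2 * 1{y_n > 0} - 1.
y = np.array([0.021, -0.004, 0.000, 0.013, -0.027, 0.008])
I = 2 * (y > 0).astype(int) - 1   # -1 for non-positive returns, +1 otherwise

# With the topic-score matrix X from the construction above (placeholder here),
# the baseline model classifies the sign of the return on report days.
X = np.random.default_rng(2).normal(size=(len(y), 4))
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, I)
print(clf.predict(X))
```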

Now that we have our data matrix and our response variable, we can proceed to our regression.
