Let’s Google it
Can Google search indices nowcast
Norwegian retail sales and unemployment rate?
Jon Ellingsen
Master thesis
Department of Economics University of Oslo
May 10, 2017
© Jon Ellingsen 2017
http://www.duo.uio.no
Print: Reprosentralen, Universitetet i Oslo
Preface
First of all, thanks to my supervisor, Leif Anders Thorsrud, for introducing me to Big Data and its applications in economics. Thank you for supporting me on my way to becoming a better econometrician.
I would also like to thank the research department at Norges Bank for letting me occupy a desk during the writing, and my office mate, Eyo Herstad, for great discussions during many coffee breaks. Also, I want to thank Nina Larsson Midthjell at the bank, for being a great mentor for me the last couple of years and helping me get in touch with my supervisor.
From Blindern, I would like to thank Professor Steinar Holden for awarding me the scholarship in macro and monetary policy issues for this thesis.
Last, I want to thank my wife, Kaja, for all her support during the process.
Although I have occupied a desk at Norges Bank during the writing, this thesis is not linked to Norges Bank in any way. And of course, any errors in the thesis are my own.
Abstract
Nowcasting economic aggregates with timely data is an important challenge for central banks, and for forecasting in general. I address this challenge by using timely search value indices (SVIs) from Google Trends to nowcast retail sales and the unemployment rate in Norway. Using least angle regression techniques to dynamically rank the top predictors from a large set of SVIs, I perform out-of-sample nowcasts in which the nowcasting models include the top-ranked SVIs. My findings show that all the nowcasting models perform far better than a random walk, and some of the models that include only SVIs as predictors perform as well as an AR(1) in nowcasting the unemployment rate. However, my findings suggest that, on average, none of the SVIs provide any valuable information complementary to simple autoregressive models, like an AR(1). Nevertheless, due to their timeliness, the results indicate that SVIs may be valuable for nowcasting. Interestingly, when I focus only on the financial crisis period, when the Norwegian economy was hit by a large, unexpected shock, some of the SVI models outperform an AR(1) by a substantial amount in nowcasting the unemployment rate. This finding suggests that the SVIs' property as an early indicator for detecting turning points should be investigated further.
Contents
1 Introduction
2 Literature and contribution
3 Data
4 Variable selection
5 Out-of-sample nowcasting
6 Empirical results
7 Discussion
8 Conclusion
Appendices
A Stationarity tests
B Correlations between the SVIs over time
C Correlations between the SVIs and the target variables
D List of SVIs
E Breakdown of the nowcasting models
1 Introduction
Predicting key economic aggregates is an important task for policy makers, including central banks. The lack of macroeconomic variables measured in real time has led economists to look beyond the standard national accounts data for other types of data, such as surveys, interest rate spreads and high-frequency financial data, in order to assess current economic fluctuations. In the literature, this is known as nowcasting. The basic principle of nowcasting is to use data that are published earlier (more timely), and possibly also at a higher frequency, than the target variable as an early indicator before the official statistics are published, see Bańbura et al. (2013). Nowcasting the economy with timely information sets may improve policy decisions that, by nature, have to be made in real time.
Nowadays, due to technological improvements, we face many new types of timely data. A particularly interesting trend is the evolution of so-called Big Data, see Section 2. From a nowcasting perspective, these new sources of information have the potential to provide more accurate assessments of economic fluctuations. However, as the name indicates, these data are BIG. The implication is that, in order to avoid bringing a lot of noise into the predictive models, we need selection methods to help us make sense of the large volume of data available.
A special type of Big Data is internet search data. Through its online service, Google Trends[1], Google publishes disaggregated, near-real-time and high-frequency data on internet search behavior. I refer to these series as search value indices (SVIs).
According to Choi and Varian (2012), SVIs are often correlated with economic indicators. Due to their timeliness and the large volume of data available, SVIs may provide valuable information used to nowcast economic fluctuations. There are reasons to believe that there is a widespread use of Google’s search engine in Norway. According to Statistics Norway (2015) and Statistics Norway (2016), a large share of the households in Norway exploit the internet to obtain information.
In 2016, 64 pct. of the respondents had ordered goods online from a Norwegian producer during the last 12 months, and 86 pct. used the Internet to find information about goods and services. Also, in 2015, 28 pct. of the respondents used the Internet to look for a job or to send a job application. It is plausible that many of these used search engines to find that information. According to StatCounter (2016), Google has approximately 90 pct. of the market share on search engines in Norway, making it a representative source for search activity.

[1] See www.google.com/trends.
For Norway, few studies have used SVIs to nowcast the economy[2]. I use monthly SVIs from Google Trends back to 2004 to nowcast two target variables: retail sales and the unemployment rate. I choose these target variables because they are reported at a monthly frequency and are closely monitored by market analysts and policy makers. My hypothesis is the following: to the extent that people use Google as a source of information related to choices they are about to make, the development in the Google SVIs may reveal intentions driving economic events before they are captured by the official statistics. Hence, the SVIs work as proxies for the public interest in factors related to retail sales and unemployment.
As noted in Da et al. (2015), the key to building an accurate predictive model with SVIs is the identification of relevant sentiment-revealing search terms. In this paper, I proceed in the following high-level steps. First, I collect timely data on SVIs for Norway from Google Trends. I download SVIs that I subjectively believe to be potential predictors for the target variables. In particular, I use SVIs at different levels of aggregation, single queries and aggregated categories defined by Google, in separate analyses. This gives me a high-dimensional data set, a typical characteristic of Big Data, consisting of more than 200 SVIs in total. Second, in order to build a nowcasting model that can give accurate out-of-sample predictions, I use the TS-LARS algorithm, developed by Gelper and Croux (2008), which uses least angle regression, see Efron et al. (2004), to fit linear regression models to high-dimensional data. The TS-LARS ranks the SVIs according to predictive power and, based on an information criterion, gives me a subset of the top-ranked SVIs to include in the final nowcasting model.
In order to evaluate the performance of the nowcasting models, I divide the time series into two samples: a training sample and a test sample. I use the training sample to fit the models with the TS-LARS algorithm, and the test sample to evaluate their out-of-sample performance. The estimation is done both with an expanding and a rolling window. I compare the performance of the nowcasts from the SVI models to the performance of two simple benchmark models: an autoregressive model of order 1 (AR(1)), following Choi and Varian (2012), and a random walk.
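The evaluation design described above can be illustrated with a small sketch. The thesis itself was carried out in R; the Python code below is only an illustration under stated assumptions. The function names are hypothetical, the loss measure (root mean squared error) is assumed rather than taken from the text, and the series is simulated rather than actual Norwegian data.

```python
import numpy as np

def ar1_nowcast(history):
    """Fit y_t = a + b * y_{t-1} by OLS on `history` and predict the next value."""
    y_lag, y_now = history[:-1], history[1:]
    X = np.column_stack([np.ones(y_lag.size), y_lag])
    a, b = np.linalg.lstsq(X, y_now, rcond=None)[0]
    return a + b * history[-1]

def random_walk_nowcast(history):
    """Predict that the next value equals the last observed value."""
    return history[-1]

def out_of_sample_rmse(y, predictor, first_test, window=None):
    """RMSE of one-step-ahead predictions over y[first_test:].

    window=None re-estimates on an expanding sample; an integer re-estimates
    on a rolling sample of that many most recent observations.
    """
    y = np.asarray(y, dtype=float)
    errors = []
    for t in range(first_test, y.size):
        start = 0 if window is None else max(0, t - window)
        errors.append(y[t] - predictor(y[start:t]))
    return float(np.sqrt(np.mean(np.square(errors))))

# Simulated stationary AR(1) series (assumption: stands in for a transformed target).
rng = np.random.default_rng(1)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.3 * y[t - 1] + rng.normal()

rmse_ar1_expanding = out_of_sample_rmse(y, ar1_nowcast, first_test=100)
rmse_ar1_rolling = out_of_sample_rmse(y, ar1_nowcast, first_test=100, window=60)
rmse_rw = out_of_sample_rmse(y, random_walk_nowcast, first_test=100)
```

The only difference between the two window schemes is where the estimation sample starts: the expanding window always starts at the first observation, while the rolling window drops the oldest observations as new ones arrive.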
I have three main findings. First, single query SVIs tend to be highly unstable over time, especially in the beginning of the sample period, from 2004 to 2006. This finding, as well as a desire to capture broader trends, suggests that the SVIs should be aggregated into broader measures, to capture the common underlying signals.

[2] The only study that I know of is Anvik and Gjelstad (2010), who used SVIs to nowcast unemployment in Norway. However, this study had a very short sample compared to what is available today.
Second, all the nowcasting models that include the top-ranked SVIs as predictors perform far better than a random walk. Further, I find that the models that include SVIs, but no autoregressive terms, as predictors in some cases perform, on average, as well as the AR(1) in nowcasting the unemployment rate. However, I find that the SVIs, on average, do not provide any valuable information complementary to the AR(1). I stress that this paper uses end-of-month data. Hence, the advantage of using SVIs for nowcasting during the current month, before the data on the target variable for the previous month are released, is not quantified. Since some of the SVIs are as good predictors as the lag of the target variable, and are available earlier, using them for nowcasting throughout the current month might be valuable. Third, inspired by Choi and Varian (2012), I find that some of the nowcasting models that include SVIs as predictors outperform the AR(1) during the financial crisis in 2008/2009. Hence, it might be that the largest potential of SVIs lies in their ability to nowcast sudden fluctuations more accurately than simple autoregressive models.
My findings suggest that, for Norwegian data, further work is still to be done, in order to extract the valuable signals from the large set of SVIs available. An interesting extension to my study might be to use statistical methods to aggregate the query SVIs into categories based on their common variation, e.g. by using principal components or dynamic factor models. Further, it would be interesting to investigate the nowcasting performance of the SVIs throughout the month, in order to exploit their timeliness.
I have used the software R, see R Core Team (2016), to perform all the analyses in the thesis. The code is available upon request. The rest of this paper is organized as follows. Section 2 describes Big Data, nowcasting and the literature on the use of SVIs for prediction. Section 3 describes the data. Section 4 describes the variable selection methodology. Section 5 describes the out-of-sample exercise. Section 6 presents the empirical results. Section 7 provides a discussion of my approach to the problem, and possible future extensions to the paper. Section 8 concludes. The appendices at the end provide additional information.
2 Literature and contribution
Big Data is a collective term for massive data sets that consist of large, more varied and complex structures, see Zikopoulos, Eaton, et al. (2011). These types of data are usually characterized by three properties, the three v's: variety, velocity and volume. Variety refers to the different types of structure in the data, e.g. newspaper text, see Thorsrud (2016), and Twitter feeds, see Antenucci et al. (2014). Velocity describes a gathering process that is close to real time, and volume refers to the high dimensionality of the data available.
Big Data becomes relevant in the field of economics due to the lack of “hard” economic data available in real time. Specifically, one type of Big Data, SVIs, may be relevant for nowcasting, both due to the velocity of the data, which is near real time, and because of the large volume of disaggregated series. As pointed out in Wu and Brynjolfsson (2015), the SVIs may reveal valuable information about the individual's intentions to make an economic transaction[3]. If this is the case, these revealed intentions may be used to predict economic aggregates.
My study relates to several attempts to use internet search data for prediction, in various fields. A famous example outside the field of economics is Ginsberg et al. (2009), who used search data to predict the incidence of influenza-like diseases.
Within the field of economics there has been much focus on variables like unemployment, retail sales, consumption and house prices[4]. The reason is twofold. Firstly, these variables are all important for the state of the economy and linked to important variables like GDP and inflation. Secondly, they are typically reported at a monthly frequency. Since Google Trends starts in 2004, it is hard to evaluate the predictive performance for variables that are measured at a lower frequency. This paper relates most closely to Anvik and Gjelstad (2010)[5], in terms of the research question and area of application. They use 19 different queries, aggregated into 4 categories motivated by search theory, to nowcast Norwegian unemployment. One of their main findings is that adding SVIs to an ARIMA model of unemployment improves the prediction accuracy by up to 18 pct. over twelve months. However, due to a short sample[6], these results may be driven by a few large events, e.g. the fact that their pseudo-out-of-sample exercise started in June 2009, right after the Norwegian unemployment rate had been through a substantial negative shock during the financial crisis, see Figures 2a and 2b. Fortunately, we now have a longer sample of data available, and hence I am able to assess the stability of the performance of the SVI models over a longer period of time. Further, I improve upon the analysis in Anvik and Gjelstad (2010) by using a dynamic variable selection method to choose the top predictors throughout the sample, which allows the nowcasting model to be more flexible over time.

[3] The economic transaction can e.g. be related to buying goods, or it can be related to the employment situation.
[4] According to Choi and Varian (2012), the first attempt to use internet search data for prediction in economics was Ettredge et al. (2005).
[5] As far as I know, Anvik and Gjelstad (2010) is the only study on the use of SVIs for nowcasting the Norwegian economy.
There are several other examples of papers studying the use of SVIs to predict economic variables, and I will briefly present some of them. Perhaps the most famous example is Choi and Varian (2012). A particularly interesting finding in this paper is that some SVIs seem to help in identifying some of the turning points in initial claims for unemployment. Carrière-Swallow and Labbé (2013) create an index consisting of 9 SVIs, specifically queries containing 9 different car manufacturers, to nowcast automobile sales in Chile. They find that the models that incorporate the constructed index outperform simple benchmark models, specifically ARMA models, in terms of both in-sample and out-of-sample fit. Vosen and Schmidt (2012) use principal components analysis on a set of SVIs and use the extracted factors as predictors to nowcast consumption in Germany. They find that the models that include the SVI factors improve the out-of-sample nowcasts relative to survey-based indicators like the consumer confidence indicator and the retail trade confidence indicator, although not all the differences are statistically significant. Da et al. (2015) use online dictionaries to obtain a large set of daily SVIs, and perform backward-rolling regressions to pick the top predictors that they use to construct an index measuring investor sentiment. McLaren and Shanbhogue (2011) demonstrate the interest in SVIs from the perspective of a central bank, here represented by the Bank of England. They use SVIs related to unemployment and housing to nowcast unemployment and house prices in the UK. In an out-of-sample exercise they show that an SVI representing one single query (for each target variable) outperforms an AR(2) model in nowcasting unemployment and house prices, respectively, over a period of 31 months. It should be noted that they use the whole sample to choose the SVIs, including the evaluation sample of 31 periods.
In terms of methodology, there are several approaches for variable selection. Some widely used examples are GETS, see Campos et al. (2003), forward stepwise regression and the Least Absolute Shrinkage and Selection Operator (LASSO), see Tibshirani (1996). Least angle regression (LAR) has several advantages over these methods. Firstly, unlike GETS, LAR is designed to handle the case where the number of variables exceeds the number of observations, because the full unrestricted model is never estimated. Secondly, LAR does not involve any testing, and thus avoids the challenge of inflated type-1 errors. Thirdly, LAR is less “greedy” than e.g. forward stepwise regression, because it does not project all the covariates on the response, and hence does not force the residuals to be orthogonal to the predictors. Fourthly, unlike the LASSO, the LAR algorithm is easier to apply in a time series context because it does not require the specification of any tuning parameters, which are usually chosen by cross validation.

[6] The sample used in Anvik and Gjelstad (2010) was only approximately 6 years.
3 Data
I use monthly data from January 2004 - January 2017. This sample is chosen because the data from Google Trends start in January 2004. The structure of this Section is the following. First, in Section 3.1, I describe the SVIs collected from Google Trends, including the selection process, see Section 3.1.1, and the transformations I apply, see Section 3.1.2. Second, in Section 3.2, I describe the two target variables, i.e. retail sales and the unemployment rate, and the transformations I apply.
3.1 Search value indices
I use the service Google Trends to collect the search value indices (SVIs)[7]. To collect the SVIs, I use the R-package gtrendsR, developed by Massicotte and Eddelbuettel (2017). Google Trends provides indices of search activity on specific terms across time and geographical location. These indices measure the fraction of queries that include the specified term relative to the total number of queries, within a specific geographical area at a specific time. This feature of the data adjusts the SVIs for a general common trend in search activity. Furthermore, the SVIs are scaled such that the highest point in each SVI is set to 100. As a consequence, it is not possible to compare the volume of different queries from Google Trends[8]. To summarize: if an SVI is increasing, this should be interpreted as an increase in searches for the specified search term as a percentage of the total number of searches. The SVIs go back to January 2004. When downloading an SVI for a sample period longer than 5 years, the frequency is monthly, and the monthly SVIs are updated on a daily basis. This timeliness makes them interesting from a nowcasting perspective.

[7] Stephens-Davidowitz and Varian (2014) provide a detailed introduction to using Google Trends for research.
[8] It is possible to scale multiple SVIs so that they may be compared in volume, but the number of SVIs compared simultaneously is restricted to 5.
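The scaling described above can be made concrete with a short sketch. It is written in Python for illustration only (the thesis used R), the function name and the query counts are hypothetical, and the sketch deliberately ignores Google's sampling, rounding and privacy threshold, none of which are documented in detail here.

```python
import numpy as np

def search_value_index(term_queries, total_queries):
    """Scale a term's share of all queries so that its peak equals 100."""
    term = np.asarray(term_queries, dtype=float)
    total = np.asarray(total_queries, dtype=float)
    share = term / total          # fraction of all queries that include the term
    return 100.0 * share / share.max()

# Hypothetical monthly counts: searches for the term increase, but total
# search activity increases faster, so the index falls after the first month.
term_queries = [50, 60, 66]
total_queries = [1000, 1500, 2200]
svi = search_value_index(term_queries, total_queries)
```

Because each series is divided by its own maximum, two indices that both peak at 100 say nothing about which term is searched for more often, which is the comparability limitation noted in the text.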
There are some more aspects of the search data from Google Trends worth mentioning. Firstly, due to privacy issues, all searches below an unreported threshold, in total volume, will be reported as 0. Hence, in smaller countries, like Norway, one might encounter this problem more often than in e.g. the US. Secondly, repeated searches from the same person over a short period of time are eliminated. Thirdly, the data reported come from an unbiased sample of the population of searches. Hence, the SVIs will vary from sample to sample, making the analysis more vulnerable to outliers. I download the SVIs on different days and find that they are relatively stable over time, and hence that the effect of the sampling property probably is negligible[9]. See Figure 12 in Appendix B for a summary of the correlations between each particular SVI downloaded at different dates.
There are several challenges associated with using SVIs for prediction. Three of these are especially relevant for the problem addressed in this paper. First, what is the appropriate delimitation for choosing the set of potential predictors among all the available SVIs? Second, what is the appropriate level of aggregation of these SVIs? Third, how should the top predictors in a predictive model be chosen from the large set of potential predictors? See Section 4 for my approach to the third challenge concerning the variable selection. I address the first and second challenge below, in the opposite order.
3.1.1 The selection of relevant SVIs
I address the second challenge, of choosing the appropriate level of aggregation, in the following way. On the one hand, the low level of aggregation is one of the main reasons that we use SVIs for prediction. On the other hand, too low a level of aggregation may lead the model to pay too much attention to random noise. There are many statistical methods for aggregating time series, e.g. simple unweighted means, principal components, dynamic factors etc. To keep the analysis simple, transparent and easy to interpret, I choose the levels of aggregation available directly from Google Trends. Google Trends provides two types of SVIs: single queries and aggregated categories. Each query is assigned to one or multiple categories by Google. The categories are divided into main categories and subcategories, which refer to the level of aggregation. For example, one of the main categories is “Shopping”. This main category has several subcategories, and one of them is the subcategory “Apparel”. One of the top queries, in terms of number of searches, within the subcategory “Apparel”, is “Nike”. Figure 1 displays these three SVIs. The left column displays the raw indices, and the right column displays the transformed SVIs I use in the analysis.

[9] One could also use an average of the SVIs downloaded on consecutive days to adjust for the sampling effect.
I address the first challenge, of defining the appropriate delimitation for choosing the set of potential predictors among all the available SVIs, in the following way. First, I pick, from a list of over 1400 predefined categories and subcategories in Google Trends, a subset of SVIs that are related to the target variables. Many of these categories lack the amount of data necessary to be reported in Google Trends. For retail sales, I end up with 51 category SVIs that form one set of potential predictors for retail sales. Next, I obtain the top 25 queries in the chosen categories[10], subjectively remove the queries that are unrelated to retail sales and add my own queries based on intuition. As for the categories, a large share of these queries lack the amount of data necessary to be reported in Google Trends. I end up with 148 query SVIs that form another set of potential predictors for retail sales.

I repeat the same exercise for the unemployment rate. There, I end up with 6 category SVIs and 31 query SVIs. The reason for the substantial difference in the number of SVIs related to retail sales and unemployment is simply that there are fewer predefined categories directly related to (un)employment[11]. In sum, I end up with 51 category SVIs and 148 query SVIs related to retail sales, and 6 category SVIs and 31 query SVIs related to unemployment. I refer to Table 6 in Appendix D for a list of the categories and queries I have used.

[10] Google Trends reports top queries, in terms of the volume of searches, for each category.
[11] In order to decrease the risk of spurious relationships, I choose to only include categories that are directly related to (un)employment.
[Figure 1, six panels, each plotted over 2004-2016: (a) Main category: Shopping, original index. (b) Main category: Shopping, transformed monthly growth rate. (c) Subcategory: Apparel, original index. (d) Subcategory: Apparel, transformed monthly growth rate. (e) Single query: Nike, original index. (f) Single query: Nike, transformed monthly growth rate. Source: Google Trends]
Figure 1. Examples of search value indices from Google Trends. The rows show SVIs on three different levels of aggregation, from highest to lowest. The series in the right panel are the seasonally adjusted and winsorized growth rates from the original indices in the left panel, as described in section 3.1.2. January 2004 - January 2017.
3.1.2 Transformations of the SVIs
All the SVIs are transformed in the following way, as in Da et al. (2015). First, in order to remove trends in the data, I take the first difference of the logarithm of the series, and get an approximation of the monthly growth rate. Second, I winsorize the data at the 95 pct. level in order to remove extreme outliers that are present in the data. This means that I, for each SVI, set all observations below the 2.5 percentile equal to the 2.5 percentile and the observations above the 97.5 percentile equal to the 97.5 percentile. Third, due to the presence of seasonality in the SVIs, I seasonally adjust all the winsorized growth rates by regressing them on 12 monthly dummies[12], where the associated residuals are kept as the seasonally adjusted series. Finally, I multiply the series by 100, to get percentages as the unit. These winsorized and seasonally adjusted growth rates are the series I continue to work with. This simple method for seasonal adjustment is adequate as long as the seasonal pattern remains constant over time. Given the large number of series, which would make individual investigation time consuming, I choose to make this simplification. The right column in Figure 1 shows three examples of the transformed SVIs. Table 2 gives a summary of the main statistics for the transformed SVIs. Figure 13 in Appendix C shows the distribution of the correlations between the transformed SVIs and the transformed target variables. These correlations suggest that the SVIs and the target variables are weakly correlated contemporaneously when we look at the whole sample.
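The four transformation steps can be sketched as follows. This is an illustrative Python version (the thesis used R), the function name and the simulated input are hypothetical, and the raw index is assumed to be strictly positive so that the logarithm is defined. It relies on the fact that regressing on 12 monthly dummies without an intercept yields fitted values equal to the month-specific means.

```python
import numpy as np

def transform_svi(index, first_month=1, lower=2.5, upper=97.5):
    """Log growth rate -> winsorize -> remove monthly means -> percentages.

    `first_month` is the calendar month (1-12) of the first index observation.
    """
    index = np.asarray(index, dtype=float)
    growth = np.diff(np.log(index))              # approximate monthly growth rate
    lo, hi = np.percentile(growth, [lower, upper])
    growth = np.clip(growth, lo, hi)             # winsorize at the 95 pct. level
    # Zero-based calendar month (0 = January) of each growth observation;
    # the first growth rate belongs to the month after `first_month`.
    months = (first_month + np.arange(growth.size)) % 12
    # Residuals from a regression on 12 monthly dummies (no intercept) are
    # the deviations from the month-specific means.
    adjusted = growth.copy()
    for m in range(12):
        mask = months == m
        if mask.any():
            adjusted[mask] -= growth[mask].mean()
    return 100.0 * adjusted

# Simulated raw index with a seasonal pattern, 157 months (January 2004 - January 2017).
rng = np.random.default_rng(0)
raw = 50 + 10 * np.sin(2 * np.pi * np.arange(157) / 12) + rng.uniform(0, 5, 157)
transformed = transform_svi(raw)
```

Because the dummy regression removes a fixed mean for each calendar month, the result has mean zero within every month, which is also why this adjustment is only adequate when the seasonal pattern is constant over time.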
3.2 Target variables
I use the monthly and seasonally adjusted index of retail sales and the registered unemployment rate[13], reported by the official sources, Statistics Norway and NAV, respectively. Both variables are normally reported with a lag of approximately 1 month. Hence, the timeliness of the SVIs gives me data to use for nowcasting approximately 1 month before the lag of the target variable is reported. In order to get stationary target variables, I transform the series to remove trends present in the data. For the index of retail sales, I take the first difference of the logarithm of the series, and get the approximate monthly growth rate. As with the SVIs, I multiply this series by 100. For the unemployment rate, I simply take the first difference, and get the change in the rate, measured in percentage points. I perform an Augmented Dickey-Fuller test, see Dickey and Fuller (1979), to check that the transformed variables are stationary, see Table 5 in Appendix A. The reason that I choose to nowcast the seasonally adjusted series is that these are the series that market analysts and policy makers pay most attention to, since they want to correct for the “normal” seasonal variation. Hence, I am interested in how well the SVIs nowcast the cyclical components, rather than the seasonal components. Using unadjusted data might lead us to think that the SVIs perform well, even in the case where their only predictive power is related to the seasonal components of the data. See Figure 2 for plots of the target variables and their transformations.

[12] To avoid perfect multicollinearity, I exclude the intercept.
[13] The unemployment rate from NAV that I use is the one measuring the rate of “totally unemployed” registered at NAV.

[Figure 2, four panels, each plotted over 2004-2016: (a) Index of retail sales. (b) Monthly growth rate of retail sales. (c) Unemployment rate. (d) Monthly change in the unemployment rate. Sources: Statistics Norway and NAV]

Figure 2. Target variables. Retail sales and unemployment rate. All the data are seasonally adjusted from the source. January 2004 - January 2017.
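As an illustration of the transformation and stationarity check for a target variable, the sketch below computes the growth rate of a simulated index and a simplified Dickey-Fuller statistic. It is a Python illustration only (the thesis used R), the function name and the data are hypothetical, and the test includes a constant but no lagged differences, so it is the special case of the Augmented Dickey-Fuller test with zero lags rather than the exact test reported in Appendix A.

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic on rho in: diff(y)_t = a + rho * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy, y_lag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones(y_lag.size), y_lag])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ beta
    sigma2 = resid @ resid / (dy.size - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

# Simulated index whose logarithm is a random walk with drift (assumption:
# stands in for a trending series such as a retail sales index).
rng = np.random.default_rng(0)
log_level = np.cumsum(rng.normal(0.002, 0.01, size=157))
retail_index = 100.0 * np.exp(log_level)

growth = 100.0 * np.diff(np.log(retail_index))   # monthly growth rate in percent
t_level = dickey_fuller_tstat(np.log(retail_index))
t_growth = dickey_fuller_tstat(growth)

# Asymptotic 5 pct. critical value for the case with a constant and no trend.
CRITICAL_5PCT = -2.86
growth_is_stationary = t_growth < CRITICAL_5PCT
```

A statistic below the critical value rejects the null hypothesis of a unit root. For the unemployment rate, the analogous transformation would be `np.diff(rate)` without logarithms.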
| Variable | Source | Frequency | Comments |
|---|---|---|---|
| Google SVIs | Google Trends | Monthly | Relative frequency index. Monthly data updated every day. Transformations: monthly growth rate in percentages*; winsorization at 95 pct.**; seasonal adjustment by monthly dummies***. Number of SVIs related to retail sales: 51 categories, 148 queries. Number of SVIs related to unemployment: 6 categories, 31 queries. Extracted: 03.04.17 |
| Retail sales | Statistics Norway | Monthly | Volume index. Normally published 28-30 days after the end of the month. Seasonally adjusted from source. Transformations: monthly growth rate in percentages*. Extracted: 13.03.17 |
| Unemployment rate | NAV | Monthly | Registered totally unemployed. Normally published 4 weeks after the end of the month. Seasonally adjusted from source. Transformations: monthly change in percentage points. Extracted: 23.03.17 |

* Approximated by 100 times the first difference of the logarithm of the series.
** This is done in order to remove extreme outliers that are present in several of the SVIs.
*** I regress the transformed SVIs on 12 monthly dummies where I exclude the intercept. The associated residuals are the seasonally adjusted series.

Table 1. Description of the data. January 2004 - January 2017.
| Variable | Mean | Median | Standard deviation |
|---|---|---|---|
| $\Delta\log(RS_t)$ | 0.1532 | 0.2788 | 1.0747 |
| $\Delta\log(SVI_{c,t}^{RS})$* | -0.0000 | 0.0077 | 9.1767 |
| $\Delta\log(SVI_{q,t}^{RS})$* | 0.0000 | -0.1325 | 16.8900 |
| $\Delta U_t$ | -0.0062 | -0.0082 | 0.0702 |
| $\Delta\log(SVI_{c,t}^{U})$ | -0.0000* | 0.0000* | 8.8558 |
| $\Delta\log(SVI_{q,t}^{U})$ | 0.0000* | -0.0000* | 16.5911 |

* Numbers are too low to be reported with 4 decimals.

Table 2. Descriptive statistics. For the SVIs, all the statistics refer to the median of the relevant statistics for the individual SVIs. The unit of measure is percentages.
Table 2 displays some descriptive statistics for both the target variables and the SVIs. The standard deviations indicate that the SVIs are a lot more volatile than the target variables and that, as we should expect, the volatility is negatively related to the level of aggregation. Hence, categories, which are weighted averages of single queries, are less volatile than single queries. In order to simplify the variable names, I introduce some notation that I will use throughout the paper. Let $\Delta\log(RS_t)$ and $\Delta U_t$ denote the transformed retail sales index and unemployment rate in period $t$, respectively. Further, let $\Delta\log(SVI_{x,t}^{y})$ denote the transformed SVIs in period $t$, where $x \in \{c, q\}$ refers to whether the SVI is at the category or query level, and $y \in \{RS, U\}$ refers to which target variable the SVI is related to.
4 Variable selection
After collecting all the SVIs that form the set of potential predictors, see Section 3.1, I use a variable selection mechanism in order to reduce the dimensionality of the nowcasting model. I use an algorithm building on LARS[14], first introduced by Efron et al. (2004), which is an algorithm for fitting linear models to high-dimensional data. The LARS is designed for cross-sectional data. Because I work with time series, I build on the time series extension of the LARS, known as the TS-LARS, developed by Gelper and Croux (2008). There are two differences between the LARS and the TS-LARS. Firstly, the TS-LARS includes the predictors as blocks, consisting of the lags of the predictors. Secondly, while the original LARS uses Mallows's $C_p$ as the selection criterion, the TS-LARS uses the Bayesian Information Criterion (BIC), which is more suitable for time series. I make small modifications to the TS-LARS algorithm to make it fit my problem. First, in Section 4.1, I explain the original TS-LARS algorithm. Then, in Section 4.2, I explain my extensions.

[14] LARS is an abbreviation for Least Angle Regression, and the S represents “LASSO” and “Stagewise”, which are related algorithms.
4.1 The TS-LARS algorithm
The goal of a predictive model is to make, in period $t$, a prediction of a target variable in period $t+h$, denoted $y_{t+h}$, where $h = 0$ represents the nowcasting model. To do so, a large number, $m$, of potential predictors, $x_{j,t}$, where $j = 1, \dots, m$, may be considered. This problem can be represented by the following unrestricted time series model:

$$ y_{t+h} = \beta_{0,0} y_t + \dots + \beta_{0,p_0} y_{t-p_0} + \beta_{1,0} x_{1,t} + \dots + \beta_{1,p_1} x_{1,t-p_1} + \dots + \beta_{m,0} x_{m,t} + \dots + \beta_{m,p_m} x_{m,t-p_m} + \varepsilon_{t+h} \tag{1} $$

where $h \geq 1$ is the forecast horizon and the intercept is excluded for simplicity. The history of the target variable is included up to lag $p_0$ and the history of predictor $j$ is included up to lag $p_j$, where $j = 1, \dots, m$. Each of the predictors enters the model as a block, i.e. a matrix where the columns are the lagged values of the predictor.
Denote the $j$'th predictor block as $x_j$. All the time series are covariance stationary[15] by assumption. In order to simplify calculations, the TS-LARS standardizes all the variables to have a mean of zero and a unit variance[16]. Therefore, there is no intercept in the model. The hypothesis is that only a subset of the predictors in model (1) are relevant for predicting the target variable. Hence, the aim of the TS-LARS algorithm is to obtain a reduced model from model (1) that hopefully will improve the predictive power. The TS-LARS algorithm can be divided into two main steps. First, it ranks the potential predictors by the use of least angle regression, see Section 4.1.1. Second, it chooses the optimal number of predictors and lags to include in the predictive model, by minimizing the BIC over the set of ranked predictors, see Section 4.1.2.
[15] A time series is covariance stationary if the mean and covariance of the process do not depend on time.
[16] This is done by subtracting the mean and dividing by the standard deviation.
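The block structure and the standardization described above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the thesis; the helper names `lag_block` and `standardize` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def lag_block(x, p):
    """Stack x_t, x_{t-1}, ..., x_{t-p} as the columns of a (T - p) x (p + 1) matrix."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # Row r corresponds to period t = p + r; column `lag` holds x_{t - lag}.
    return np.column_stack([x[p - lag : T - lag] for lag in range(p + 1)])


def standardize(block):
    """Subtract the column means and divide by the sample standard deviations."""
    block = np.asarray(block, dtype=float)
    return (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)


raw = lag_block([1.0, 2.0, 4.0, 8.0, 16.0], p=1)  # columns: x_t and x_{t-1}
block = standardize(raw)                           # zero mean, unit variance per column
```

Each row of `raw` pairs an observation with its first lag, so one predictor with $p_j = 1$ contributes a two-column block to model (1).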
4.1.1 Ranking the predictors
The following explains how the TS-LARS algorithm ranks the predictors. First, the algorithm fits an autoregressive model to the target variable by OLS[17]. Denote the standardized[18] residuals from that model by $z_0$.
Now, the aim of the TS-LARS is to find the predictors that best predict this residual. The TS-LARS thus ranks the variables according to how much they improve the in-sample fit from the simple autoregressive model. To do so, the algorithm finds the first ranked predictor block, which is the block $x_j$ that maximizes the $R^2$ from an OLS regression of $z_0$ on $x_j$ for $j = 1, \dots, m$, denoted $R^2(z_0 \sim x_j)$. Recall that $x_j$ is a matrix whose columns are the lagged values of predictor $j$. The first ranked predictor block is denoted by $x_{(1)}$, and it is included as the first predictor block in the active set, denoted $A$. The active set, $A$, will expand by one predictor block for each stage, and it will always include all the predictor blocks ranked so far. Denote the complement of the active set by $A^c$, i.e. all the predictor blocks not ranked yet. Let $H_{(1)}$ be the projection matrix on the column space of $x_{(1)}$ such that

$$ H_{(1)} = x_{(1)} (x_{(1)}' x_{(1)})^{-1} x_{(1)}' \tag{2} $$

Hence, $\hat{z}_0 = H_{(1)} z_0$ will be the vector of fitted values. Furthermore, the current target variable, $z_0$, is updated by removing the effect of $x_{(1)}$:

$$ z_1 = z_0 - \gamma_1 \hat{z}_0 \quad \text{where } 0 \leq \gamma_1 \leq 1 \tag{3} $$

In OLS, $\gamma_1 = 1$, but this will not be the case in least angle regression. $\gamma_1$ is called the shrinkage factor, and represents the nice property of the TS-LARS; it shrinks the OLS parameter, $\hat{z}$, towards zero. $\gamma_1$ is chosen as the smallest positive value such that:

$$ R^2(z_0 - \gamma_1 \hat{z}_0 \sim x_{(1)}) = R^2(z_0 - \gamma_1 \hat{z}_0 \sim x_j) \quad \text{where } j \in A^c \tag{4} $$

The solution of condition (4), as shown in Gelper and Croux (2008), is the same solution obtained by solving the following quadratic equation in $\gamma$:
$$ z_0'(H_{(1)} - H_j) z_0 + z_0'(H_{(1)} H_j + H_j H_{(1)} - 2 H_{(1)}) z_0 \gamma + z_0'(H_{(1)} - H_{(1)} H_j H_{(1)}) z_0 \gamma^2 = 0 \tag{5} $$

where $H_j = x_j (x_j' x_j)^{-1} x_j'$. As shown in Gelper and Croux (2008), there are two solutions to equation (5), at least one of which is between zero and one. As described above, $\gamma_1$ is chosen as the smallest positive solution to equation (5), where $j$ runs over the whole non-active set, $A^c$.
[17] The order of the AR model that the TS-LARS starts by fitting is specified by the user.
[18] Standardized refers to a time series that has a mean of zero and unit variance.
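The first ranking step can be sketched numerically as below. This is an illustrative reading of equations (2) to (5), not the code used in the thesis: the function names are hypothetical, the data are simulated, and the projection is computed with a pseudo-inverse for numerical convenience.

```python
import numpy as np


def projection(X):
    """Projection matrix onto the column space of X, as in equation (2)."""
    return X @ np.linalg.pinv(X)


def smallest_positive_root(c2, c1, c0, tol=1e-12):
    """Smallest positive real root of c2*g^2 + c1*g + c0 = 0 (inf if there is none)."""
    roots = np.roots([c2, c1, c0])
    real = roots[np.abs(roots.imag) < 1e-10].real
    positive = real[real > tol]
    return positive.min() if positive.size else np.inf


def first_step(z0, blocks):
    """Return (index of x_(1), index of x_(2), gamma_1) using equation (5)."""
    H = [projection(X) for X in blocks]
    # z0' H_j z0 is proportional to R^2(z0 ~ x_j) because z0 has mean zero.
    first = int(np.argmax([z0 @ Hj @ z0 for Hj in H]))
    H1 = H[first]
    gamma, second = np.inf, None
    for j, Hj in enumerate(H):
        if j == first:
            continue
        c0 = z0 @ (H1 - Hj) @ z0
        c1 = z0 @ (H1 @ Hj + Hj @ H1 - 2.0 * H1) @ z0
        c2 = z0 @ (H1 - H1 @ Hj @ H1) @ z0
        g = smallest_positive_root(c2, c1, c0)
        if g < gamma:
            gamma, second = g, j
    return first, second, gamma


rng = np.random.default_rng(0)                      # simulated data, for illustration only
z0 = rng.standard_normal(60)
z0 = (z0 - z0.mean()) / z0.std(ddof=1)
blocks = [rng.standard_normal((60, 2)) for _ in range(4)]
first, second, gamma1 = first_step(z0, blocks)
z1 = z0 - gamma1 * (projection(blocks[first]) @ z0)  # update in equation (3)
```

At the chosen `gamma1`, the updated target `z1` is explained equally well by the first and the second ranked block, which is condition (4).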
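We can simplify equation (5) and avoid using multiple matrix multiplications. Let $\tilde{x}_{(1)}$ be the standardized $\hat{z}_0$, such that:

$$ \tilde{x}_{(1)} = \frac{\hat{z}_0}{s_1} \quad \text{where } s_1^2 = \frac{\hat{z}_0' \hat{z}_0}{T - 1} \tag{6} $$

where $T$ is the number of observations in $\hat{z}_0$. Further, Gelper and Croux (2008) show that equation (5) is equivalent to:

$$ (T-1) s_1^2 - z_0' H_j z_0 + 2\left(z_0' H_j \tilde{x}_{(1)} - (T-1) s_1\right)(s_1 \gamma) + \left((T-1) - \tilde{x}_{(1)}' H_j \tilde{x}_{(1)}\right)(s_1 \gamma)^2 = 0 \tag{7} $$

which is computationally faster to solve.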
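The step from equation (5) to equation (7) can be traced term by term. The sketch below only uses the definitions in equations (2) and (6), together with the standard fact that a projection matrix is symmetric and idempotent ($H_{(1)}' = H_{(1)}$ and $H_{(1)} H_{(1)} = H_{(1)}$).

```latex
% From (2) and (6): H_{(1)} z_0 = \hat{z}_0 = s_1 \tilde{x}_{(1)}
\begin{align*}
z_0' H_{(1)} z_0 &= \hat{z}_0' \hat{z}_0 = (T-1) s_1^2, \\
z_0' H_{(1)} H_j z_0 &= z_0' H_j H_{(1)} z_0 = s_1 \, z_0' H_j \tilde{x}_{(1)}, \\
z_0' H_{(1)} H_j H_{(1)} z_0 &= s_1^2 \, \tilde{x}_{(1)}' H_j \tilde{x}_{(1)}.
\end{align*}
% Substituting into the three coefficients of (5):
\begin{align*}
\text{constant:} \quad & (T-1) s_1^2 - z_0' H_j z_0, \\
\text{linear:} \quad & \left(2 s_1 z_0' H_j \tilde{x}_{(1)} - 2 (T-1) s_1^2\right)\gamma
  = 2\left(z_0' H_j \tilde{x}_{(1)} - (T-1) s_1\right)(s_1 \gamma), \\
\text{quadratic:} \quad & \left((T-1) s_1^2 - s_1^2 \tilde{x}_{(1)}' H_j \tilde{x}_{(1)}\right)\gamma^2
  = \left((T-1) - \tilde{x}_{(1)}' H_j \tilde{x}_{(1)}\right)(s_1 \gamma)^2 .
\end{align*}
```

The only matrix product left involving $H_j$ is its application to the two vectors $z_0$ and $\tilde{x}_{(1)}$, which is why equation (7) is cheaper to evaluate than equation (5).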
The shrinkage parameter, $\gamma_1$, in equation (3), is chosen simultaneously with the next ranked predictor block that is included in the active set. Denote the second ranked predictor block by $x_{(2)}$. Now, the active set, $A$, contains two predictor blocks. Furthermore, the current target variable is updated according to equation (3), and then scaled as in equation (6). These operations form the first step in the ranking part of the TS-LARS algorithm.
All the further steps in the ranking process have the same structure. Let the current step be denoted $k \geq 2$. Now, $A$ contains $k$ ranked predictor blocks, denoted $x_{(1)}, x_{(2)}, \dots, x_{(k)}$. Denote the current target variable by $z_{k-1}$, and let $\tilde{x}_{(i)}$ be the standardized vector of fitted values for $i = 1, 2, \dots, k$. Now, the TS-LARS will obtain the so-called equiangular vector, $u_k$, which is defined as the vector that is equally correlated with the vectors $\tilde{x}_{(1)}, \tilde{x}_{(2)}, \dots, \tilde{x}_{(k)}$. Let that specific correlation coefficient be denoted by $a_k$:

$$ a_k = Cor(u_k, \tilde{x}_{(1)}) = Cor(u_k, \tilde{x}_{(2)}) = \dots = Cor(u_k, \tilde{x}_{(k)}) \tag{8} $$

Let $R_k$ be the correlation matrix for $\tilde{x}_{(1)}, \tilde{x}_{(2)}, \dots, \tilde{x}_{(k)}$, and let $1_k$ be a vector of ones of length $k$. Then,

$$ u_k = (\tilde{x}_{(1)}, \tilde{x}_{(2)}, \dots, \tilde{x}_{(k)}) w_k \quad \text{where } w_k = \frac{R_k^{-1} 1_k}{\sqrt{1_k' R_k^{-1} 1_k}} \tag{9} $$

Next, the current target variable is updated as in equation (3), but now it moves along the direction of the equiangular vector, $u_k$, such that:

$$ z_k = z_{k-1} - \gamma_k u_k \tag{10} $$

Again, $\gamma_k$ is obtained as the smallest positive value such that the following condition holds:

$$ R^2(z_{k-1} - \gamma u_k \sim \tilde{x}_{(k)}) = R^2(z_{k-1} - \gamma u_k \sim x_j) \tag{11} $$

The chosen predictor block, denoted $x_{(k+1)}$, is then added to the active set, $A$, and the current target variable is updated according to equation (10), standardized and denoted $z_k$. Gelper and Croux (2008) prove the following lemma:
Lemma 4.1 For every step $k \geq 1$ in the TS-LARS algorithm, it holds that
1. The current response $z_{k-1}$ has equal and positive correlation with all $(\tilde{x}_{(1)}, \tilde{x}_{(2)}, \dots, \tilde{x}_{(k)})$ in the active set:
$$ r_k = Cor(z_{k-1}, \tilde{x}_{(1)}) = \dots = Cor(z_{k-1}, \tilde{x}_{(k)}) \geq 0 \tag{12} $$
2. For every $j$ not in the active set, it holds that $R^2(z_{k-1} \sim x_j) \leq r_k^2$.
3. For the solution $\gamma_k$ to (11) it holds that $0 \leq \gamma_k \leq r_k / a_k$.

They show that, from the above lemma, we can replace the index $k$ in equation (11) by any other number from 1 to $k$. Further, they show that, by using $r_k$ from equation (12), condition (11) is equivalent to solving the following quadratic equation in $\gamma$:

$$ (T-1) r_k^2 - z_{k-1}' H_j z_{k-1} + 2\left(z_{k-1}' H_j u_k - (T-1) a_k r_k\right)\gamma + \left((T-1) a_k^2 - u_k' H_j u_k\right)\gamma^2 = 0 \tag{13} $$

The TS-LARS solves equation (13) for all $j \in A^c$ and picks the smallest positive solution. The chosen predictor block is, as before, added to the active set, and denoted $x_{(k+1)}$. The TS-LARS will continue this process until all the potential predictor blocks are ranked.
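The equiangular vector in equation (9) can be checked numerically. The sketch below is illustrative only: the function name `equiangular` is hypothetical, the input is simulated, and the expression $a_k = (1_k' R_k^{-1} 1_k)^{-1/2}$ is derived here from equations (8) and (9) rather than quoted from the thesis.

```python
import numpy as np


def equiangular(X_tilde):
    """Return (u_k, a_k, w_k) for a T x k matrix of standardized columns."""
    T, k = X_tilde.shape
    R = (X_tilde.T @ X_tilde) / (T - 1)      # correlation matrix R_k
    ones = np.ones(k)
    R_inv_ones = np.linalg.solve(R, ones)    # R_k^{-1} 1_k without an explicit inverse
    scale = np.sqrt(ones @ R_inv_ones)
    w = R_inv_ones / scale                   # weights w_k in equation (9)
    u = X_tilde @ w                          # equiangular vector u_k
    a = 1.0 / scale                          # common correlation a_k in equation (8)
    return u, a, w


rng = np.random.default_rng(1)               # simulated data, for illustration only
X = rng.standard_normal((50, 3))
X_tilde = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
u, a, w = equiangular(X_tilde)
cors = [np.corrcoef(u, X_tilde[:, i])[0, 1] for i in range(3)]
```

Every entry of `cors` equals `a`, and `u` has unit sample variance, since $w_k' R_k w_k = 1$.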
4.1.2 Selecting the optimal number of predictors and lags
The last step in the TS-LARS algorithm chooses how many of the top ranked predictors, and how many lags of them, to include in the final predictive model. This selection is done by minimizing the Bayesian information criterion (BIC). The BIC is defined as:

$$ BIC = -2 \cdot \text{log-likelihood} + \ln(n)\, k \tag{14} $$

where $n$ is the number of observations and $k$ is the number of parameters. Hence, because a lower value is preferred, the BIC rewards the in-sample fit, by the first term, and penalizes the number of parameters in the model, by the second term.
The selection process is performed in the following way. For every step in the TS-LARS where a new predictor block is included, the algorithm estimates the model with the predictor blocks in the active set with OLS, and stores the BIC value. In addition, every step is performed for different lag lengths, and hence for a different number of columns in the predictor blocks. The selection process thus boils down to choosing the model with the lowest BIC value among all the estimated models. For simplicity, the lag length is fixed across predictors. As an example, if I specify the model to rank 5 variables and include at most 2 lags, the number of models to evaluate by the BIC will be 10 (5×2).
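The BIC comparison can be sketched as follows. This is a hedged illustration, not the thesis code: it assumes Gaussian errors, so that $-2$ times the maximized log-likelihood equals $n \ln(RSS/n)$ up to a constant that is identical across models estimated on the same sample, and the names `bic_ols` and `select_by_bic` are hypothetical.

```python
import numpy as np


def bic_ols(y, X):
    """BIC of an OLS fit under Gaussian errors, dropping model-independent constants."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def select_by_bic(y, candidates):
    """candidates maps a label to a design matrix; return the label with the lowest BIC."""
    scores = {label: bic_ols(y, X) for label, X in candidates.items()}
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(2)                 # simulated data, for illustration only
n = 100
x1 = rng.standard_normal(n)
junk = rng.standard_normal(n)
y = 2.0 * x1 + 0.1 * rng.standard_normal(n)
best, scores = select_by_bic(y, {"x1": x1[:, None], "junk": junk[:, None]})
```

Because the dropped constant depends on $n$, candidate models with different lag lengths should be compared on a common estimation sample.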
4.2 Modifications of the TS-LARS algorithm
I perform two modifications of the TS-LARS algorithm, presented in Gelper and Croux (2008), to suit my research question. Firstly, since I perform nowcasting, I include the predictors contemporaneously in the predictive model, as well as their lags. Secondly, in addition to conditioning on the AR(1), as in the TS-LARS algorithm, I also run the TS-LARS algorithm without conditioning on the AR(1), i.e. I run a predictive model that only includes the SVIs as predictors. The first approach, where I condition on the AR(1), will highlight the value added, relative to an AR(1) model, produced by the SVIs. The second approach will highlight how well the SVIs perform in predicting the target variable on their own. Due to the timeliness of the SVIs, the second approach will make nowcasts possible during the current month, before the lag of the target variable is published. However, I stress that this analysis uses end-of-month data for the SVIs, due to the practice of Google Trends concerning historical data.
I set the maximum number of predictors to include in the model to 5[19]. Furthermore, I allow a maximum of 1 lag of the predictors to be included. I have performed a crosscheck where I find that increasing the maximum number of predictors to include in the model to 10 does not improve the performance. In terms of interpretability, I prefer a parsimonious model over a large one, and hence choose to set the restriction in the TS-LARS to a maximum of 5 predictors each period.
[19] Lags of the predictors and the autoregressive term come in addition to these 5 predictors.
5 Out-of-sample nowcasting
In order to evaluate the nowcasting performance of the models, I split the sample in two: a training sample and a test sample. The training sample is used to fit the model, and the test sample is used to evaluate the out-of-sample performance of the model. From a prediction point of view, we are typically interested in the latter. Evaluating the model by its out-of-sample performance reduces the risk of overfitting[20]. The initial training sample runs from February 2004 to December 2009, and gives me 71 observations. The initial test sample runs from January 2010 to January 2017, and gives me 85 observations. The reason for this division is that I get a reasonable amount of data for the estimation, as well as a long test sample to evaluate the models over time. By starting the out-of-sample exercise in 2010, I do not include the financial crisis in 2008/2009 in the test sample. However, I perform an additional out-of-sample exercise for this particular period. I compute the out-of-sample nowcasts using two different methods: expanding window and rolling window. The first has the advantage of increasing the sample size along the way. The latter has the advantage of paying more attention to the recent past than the far past. This can be an advantage in those cases where the time series are subject to large shocks in the middle of the sample.
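The difference between the two schemes can be made concrete with index ranges. The sketch below is illustrative only: the generator name `estimation_windows` is hypothetical, indices are zero-based, and each pair `(start, end)` means that observations `start, ..., end - 1` are used for estimation and observation `end` is nowcast.

```python
def estimation_windows(n_train, n_test, scheme):
    """Yield (start, end) pairs for an expanding or a rolling estimation window."""
    if scheme not in ("expanding", "rolling"):
        raise ValueError("scheme must be 'expanding' or 'rolling'")
    for i in range(n_test):
        end = n_train + i                        # first index outside the estimation sample
        start = i if scheme == "rolling" else 0  # rolling: drop the oldest observation each step
        yield start, end


# 71 training and 85 test observations, as in the sample split described above.
expanding = list(estimation_windows(71, 85, "expanding"))
rolling = list(estimation_windows(71, 85, "rolling"))
```

With these sample sizes the expanding window grows from 71 to 155 observations, while the rolling window always holds 71.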
As pointed out in Scott and Varian (2014), an effective nowcasting model will consider both the past behaviour of the target variable and the contemporaneous signals from e.g. SVIs. Hence, I run two different specifications of the predictive model. In the first type of model I condition on an AR(1) model. I denote these models by AR-SVI models. The second type of model has SVIs as the only predictors. I denote these models by SVI models. To distinguish between the models that include the category SVIs and the query SVIs, I add a subscript, c or q, respectively. The first approach, where I condition on the AR(1), will highlight the value added, relative to an AR(1) model, produced by the SVIs. The second approach will highlight how well the SVIs perform in predicting the target variable on their own. The following elaborates on the out-of-sample exercise.
5.1 Estimation
Let $\underline{t}$ and $t$ denote the first observations in the training and test sample, respectively. Further, let $\bar{t} = t - 1$ and let $\bar{T}$ denote the sample from $\underline{t}$ to $\bar{t}$. Finally, let $t^*$ denote a period within $\bar{T}$. I use the TS-LARS algorithm to fit the appropriate model up to period $\bar{t}$. Next, I estimate the following nowcasting models with the optimal number of ranked predictors, say $m^*$, and lags, say $p^*$, by OLS:

$$ \Delta\log(RS_{t^*}) = \beta_0 \Delta\log(RS_{t^*-1}) + \sum_{i=1}^{m^*} \sum_{j=0}^{p^*} \beta_{i,j} \Delta\log(SVI^{RS}_{x,i,t^*-j}) + \varepsilon_{t^*} \tag{15} $$

$$ \Delta U_{t^*} = \beta_0 \Delta U_{t^*-1} + \sum_{i=1}^{m^*} \sum_{j=0}^{p^*} \beta_{i,j} \Delta\log(SVI^{U}_{x,i,t^*-j}) + \varepsilon_{t^*} \tag{16} $$

where $p^* \in \{0, 1\}$ depending on whether a lag of the SVIs is included or not, and $x \in \{c, q\}$ depending on whether category SVIs or query SVIs are used. Note that, in the SVI model, $\beta_0 = 0$ by construction in equations (15) and (16). In the AR-SVI model, there is no restriction on $\beta_0$. In the rolling estimation case, the start of the training sample, $\underline{t}$, will increase incrementally by one for each period, such that the length of $\bar{T}$ remains constant. In the expanding estimation case, $\underline{t}$ remains constant, so that the length of $\bar{T}$ increases for each period. I estimate all the models up to the second to last observation in the test sample.
[20] Overfitting refers to a model with too many parameters relative to the number of observations. This increases the risk of obtaining noisy predictions out-of-sample.
5.2 Prediction
I use the estimated OLS-parameters from models (15) and (16) to make nowcasts of the target variables in period $\bar{t} + 1$:

$$ \Delta\log(\widehat{RS}_{\bar{t}+1}) = \hat{\beta}_0 \Delta\log(RS_{\bar{t}}) + \sum_{i=1}^{m^*} \sum_{j=0}^{p^*} \hat{\beta}_{i,j} \Delta\log(SVI^{RS}_{x,i,\bar{t}+1-j}) \tag{17} $$

$$ \Delta\hat{U}_{\bar{t}+1} = \hat{\beta}_0 \Delta U_{\bar{t}} + \sum_{i=1}^{m^*} \sum_{j=0}^{p^*} \hat{\beta}_{i,j} \Delta\log(SVI^{U}_{x,i,\bar{t}+1-j}) \tag{18} $$

From equations (17) and (18) we observe the following. Firstly, $RS_{\bar{t}}$ and $U_{\bar{t}}$ become available at the end of period $\bar{t}$, see Section 3.2. This means that I can only perform a nowcast with the AR-SVI models at the end of the current period, when the lag is released. Remember that this still gives a prediction of the target variables 1 month prior to the release of the official statistics. However, the SVI model can give a nowcast of the target variable in period $\bar{t}$ from the start of period $\bar{t}$, as the SVIs are published in near-real-time. How the SVIs perform in nowcasting throughout the current month is something I do not investigate, due to the way Google Trends reports historical data. However, if the searches are relatively constant within a month, nowcasts from the SVI model throughout the current month may be valuable.
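One estimation and prediction step, in the spirit of equations (15) to (18), can be sketched as below. This is an illustrative sketch rather than the thesis code: the function names are hypothetical, the series are simulated without noise so that the result can be verified, and the ranked SVIs are taken as given instead of being selected by the TS-LARS.

```python
import numpy as np


def build_design(y, svi, p_star, with_ar=True):
    """Regressors [y_{t-1}] and svi_{t-j}, j = 0..p_star, aligned with the target y_t."""
    y = np.asarray(y, dtype=float)
    svi = np.asarray(svi, dtype=float)           # T x m* matrix of selected SVIs
    T = len(y)
    start = max(p_star, 1 if with_ar else 0)     # first period with all lags available
    cols = [y[start - 1 : T - 1]] if with_ar else []
    cols += [svi[start - j : T - j] for j in range(p_star + 1)]
    return np.column_stack(cols), y[start:]


def nowcast(y_hist, svi_hist, svi_now, p_star, with_ar=True):
    """Fit by OLS on the history, then predict the next period from the current SVIs."""
    X, target = build_design(y_hist, svi_hist, p_star, with_ar)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    svi_all = np.vstack([svi_hist, svi_now])
    x_new = [np.atleast_1d(float(y_hist[-1]))] if with_ar else []
    x_new += [svi_all[-1 - j] for j in range(p_star + 1)]
    return float(np.concatenate(x_new) @ beta)


rng = np.random.default_rng(3)                   # simulated data, for illustration only
svi = rng.standard_normal((40, 2))
y = np.zeros(40)
for t in range(1, 40):
    y[t] = 0.5 * y[t - 1] + 2.0 * svi[t, 0] - 1.0 * svi[t, 1]
pred = nowcast(y[:39], svi[:39], svi[39], p_star=0)
```

Setting `with_ar=False` gives the SVI model, where $\beta_0 = 0$ by construction; the default corresponds to the AR-SVI model.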
5.3 Evaluation
The out-of-sample exercise provides predictions of the target variables for each period in the test sample. In order to evaluate the predictive performance of the models, I compare the out-of-sample predictions from the SVI/AR-SVI models to two benchmark models: the AR(1), following Choi and Varian (2012), and a random walk. The predictions from the AR(1) are equal to the predictions from equations (17) and (18), with the restriction that all the $\beta_{i,j} = 0$. Further, the predictions from a random walk are equal to the predictions from the AR(1), but with the restriction that $\beta_0 = 1$.
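The two benchmarks can be sketched in a few lines. This is a minimal illustration with hypothetical function names; it assumes the series is already transformed and has no intercept, in line with equations (17) and (18) under the stated restrictions.

```python
def ar1_nowcast(history):
    """AR(1) without intercept: OLS slope of y_t on y_{t-1}, applied to the last value."""
    lagged, current = history[:-1], history[1:]
    beta0 = sum(c * l for c, l in zip(current, lagged)) / sum(l * l for l in lagged)
    return beta0 * history[-1]


def random_walk_nowcast(history):
    """Random walk benchmark: the AR(1) prediction with beta_0 restricted to 1."""
    return history[-1]


series = [1.0, 0.5, 0.25, 0.125]          # toy series that halves every period
ar1_pred = ar1_nowcast(series)            # estimated slope is 0.5
rw_pred = random_walk_nowcast(series)
```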
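I use the root mean squared error (RMSE), as described in Bjørnland and Thorsrud (2015), to measure the accuracy of the predictive models. Denote the length of the test sample by $P$. The RMSE is defined as:

$$ RMSE^{RS} = \sqrt{\frac{1}{P} \sum_{i=1}^{P} \left(\Delta\log(RS_{\bar{t}+i}) - \Delta\log(\widehat{RS}_{\bar{t}+i})\right)^2} = \sqrt{\frac{1}{P} \sum_{i=1}^{P} \left(e^{RS}_{\bar{t}+i}\right)^2} \tag{19} $$

$$ RMSE^{U} = \sqrt{\frac{1}{P} \sum_{i=1}^{P} \left(\Delta U_{\bar{t}+i} - \Delta\hat{U}_{\bar{t}+i}\right)^2} = \sqrt{\frac{1}{P} \sum_{i=1}^{P} \left(e^{U}_{\bar{t}+i}\right)^2} \tag{20} $$

where $e^{RS}_{\bar{t}+i}$ and $e^{U}_{\bar{t}+i}$ are the prediction errors in period $\bar{t}+i$ for the prediction of retail sales and the unemployment rate, respectively[21]. A lower RMSE indicates better predictive performance.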
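The RMSE and the percentage comparisons against a benchmark can be computed as in the sketch below. The function names are hypothetical; the two RMSE values in the usage lines are the expanding-window figures reported in Table 3 for the category SVI model, the AR(1) and the random walk.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of a sequence of predictions."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def pct_change(model_rmse, benchmark_rmse):
    """Percentage change in the RMSE relative to a benchmark (negative means better)."""
    return 100.0 * (model_rmse / benchmark_rmse - 1.0)


vs_ar1 = pct_change(1.1752, 0.9826)           # SVIc model against the AR(1)
vs_random_walk = pct_change(1.1752, 1.8293)   # SVIc model against the random walk
```

Rounded to whole numbers these give +20 and -36, matching the corresponding entries in Table 3.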
In order to evaluate whether there are any statistically significant differences in the predictions given by the different models, I use the Diebold-Mariano (DM) test, see Diebold and Mariano (1995) and West (1996). The test is performed in the following way. Let $d_i$ denote the difference between the squared prediction errors from two models, say model 1 and 2, such that $d_i = e^2_{\bar{t}+i,1} - e^2_{\bar{t}+i,2}$, where $i = 1, \dots, P$. The DM-test runs the following regression by OLS:

$$ d_i = \beta_0 + u_i \tag{21} $$

and performs a test of the following hypotheses:

$$ H_0: \beta_0 = 0 \quad \text{against} \quad H_1: \beta_0 \neq 0 \tag{22} $$

If $H_0$ is rejected, the differences in the predictions given by model 1 and 2 are statistically significant. I use heteroskedasticity and autocorrelation consistent (HAC) standard errors to compute the t-statistics, in order to be on the safe side. I perform the DM-test on all the nowcasts given by the different SVI/AR-SVI models relative to the predictions given by an AR(1) and a random walk.
[21] The RMSE is a symmetric loss function and larger deviations are penalized relatively harder.
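A regression on a constant has the sample mean of $d_i$ as its OLS estimate, so the test in equations (21) and (22) can be sketched directly. The sketch below is an assumption-laden illustration: the thesis does not state which HAC estimator or lag truncation is used, so a Newey-West estimator with Bartlett weights, a user-chosen `max_lag`, and a normal approximation for the p-value are all choices made here.

```python
import math


def dm_test(e1, e2, max_lag=0):
    """t-statistic and two-sided p-value for H0: mean(e1^2 - e2^2) = 0."""
    d = [a * a - b * b for a, b in zip(e1, e2)]
    P = len(d)
    mean_d = sum(d) / P                       # OLS estimate of beta_0 in equation (21)
    u = [x - mean_d for x in d]
    # Newey-West long-run variance with Bartlett weights (assumed, see lead-in).
    lrv = sum(v * v for v in u) / P
    for lag in range(1, max_lag + 1):
        autocov = sum(u[t] * u[t - lag] for t in range(lag, P)) / P
        lrv += 2.0 * (1.0 - lag / (max_lag + 1)) * autocov
    t_stat = mean_d / math.sqrt(lrv / P)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))   # normal approximation
    return t_stat, p_value


errors_model_1 = [2.0, 2.5, 2.0, 2.5, 2.0, 2.5, 2.0, 2.5]   # toy prediction errors
errors_model_2 = [0.5] * 8
t_stat, p_value = dm_test(errors_model_1, errors_model_2)
```

A positive `t_stat` means that model 1 has the larger squared errors on average; swapping the two arguments flips the sign of the statistic.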
6 Empirical results
The following presents the empirical results for the nowcasts of retail sales, see Section 6.1, and the unemployment rate, see Section 6.2.
Tables 3, in Section 6.1, and 4, in Section 6.2, report the RMSE for the different nowcasting models for retail sales and the unemployment rate, respectively. The columns refer to the different model specifications, i.e. the benchmark models, the SVI models and the AR-SVI models. The rows refer to the RMSE for both expanding and rolling window, as well as a comparison with the benchmark models, measured as the percentage change in the RMSE. Lastly, the stars refer to the p-values from a DM-test on the difference between the nowcasts. See Section 5.3 for details about the test.
Figures 3 and 4, in Section 6.1, and 7 and 8, in Section 6.2, show the differences in the cumulative squared prediction errors of the AR(1) and the random walk, relative to the SVI/AR-SVI models, both in terms of expanding and rolling estimation.
Figures 5 and 6, in Section 6.1, and Figures 9 and 10, in Section 6.2, show the distributions of categories/queries included in the nowcasting models for retail sales and the unemployment rate, respectively. I refer to Appendix E for a breakdown of all the nowcasting models into predictors.
6.1 Retail sales
The results for the nowcasting models of retail sales indicate the following, see Table 3. All the models, on average, significantly outperform the predictions given by a random walk by approximately 30-40 percent, in terms of the RMSE. All the SVI models, except one[22], on average, perform significantly poorer than an AR(1) model. Furthermore, I find that, on average, none of the SVIs add any significant predictive power to an AR(1), see the AR-SVI models.
[22] The difference in the predictions given by the SVIq model and an AR(1) with expanding window estimation is not statistically significant.
                               Benchmark models        SVI models               AR-SVI models
                               AR(1)     Random walk   Categories   Queries     Categories   Queries
Expanding window               0.9826    1.8293        1.1752       1.1963      0.9841       1.0433
  % change from AR(1)                                  (+20)***     (+22)*      (0)          (+6)**
  % change from random walk                            (-36)***     (-35)***    (-46)***     (-43)***
Rolling window                 0.9825    1.8293        1.2315       1.3583      1.0103       1.0525
  % change from AR(1)                                  (+25)**      (+38)**     (+3)         (+7)
  % change from random walk                            (-33)**      (-26)**     (-45)***     (-42)***
* p < 0.1, ** p < 0.05, *** p < 0.01
Table 3. Root mean squared error for the nowcast of the monthly growth rate in retail sales. The numbers in parentheses indicate whether the SVI/AR-SVI model performs better or worse than the benchmark models, in terms of the percentage change in the RMSE. The stars refer to the p-values from a Diebold-Mariano test and indicate whether the difference is statistically significant. All the computed standard errors are heteroskedasticity and autocorrelation robust. January 2010 - January 2017.
Figure 3 shows the difference in the cumulative squared prediction error from the benchmark models relative to the SVIc/AR-SVIc models, i.e. when the SVIs used are at the category level. The figure shows that both the SVIc and the AR-SVIc, both in terms of expanding and rolling estimation, outperform a random walk during the whole test sample, see the increasing dotted lines. There are some large jumps in the series in early 2012, 2013 and 2015. Figures 2a and 2b show large movements in these particular periods, and the event in early 2015 is the most extreme, well in line with what we observe here. Hence, the results indicate that the SVIc/AR-SVIc models perform substantially better than a random walk when the target variable is subject to large shocks.
In terms of the performance relative to an AR(1), the picture is more mixed. With few exceptions, an AR(1) outperforms the SVIc model during the whole sample, and the largest jumps are exactly during the events described above, i.e. early 2012, 2013 and 2015, see Figures 3a and 3b. However, the AR-SVIc model outperforms an AR(1) in the same periods, see Figures 3c and 3d. This indicates that the SVIs contribute valuable information complementary to the autoregressive component during these shocks. Interestingly, an AR(1) seems to "catch up" with the AR-SVIc in the subsequent period. Hence, it might be that the AR-SVIc model overestimates the persistence of the shock. In addition, an AR(1) model vastly outperforms the AR-SVIc model at the end of the test sample, in line with the historical pattern after an episode of dominance by the AR-SVIc relative to an AR(1).
[Figure 3: four panels, each plotting the AR(1) series (left-hand scale) and the random walk series (right-hand scale) over 2010-2016. (a) Relative to the SVIc model, expanding window. (b) Relative to the SVIc model, rolling window. (c) Relative to the AR-SVIc model, expanding window. (d) Relative to the AR-SVIc model, rolling window.]
Figure 3. Retail sales. Difference in the cumulative squared prediction errors relative to the SVIc/AR-SVIc models, for expanding and rolling estimation. An increasing series means that the SVIc/AR-SVIc model performs better, in terms of lower squared prediction error, than the respective benchmark model, and vice versa. The series are plotted on different scales.