Introduction - two ingredients

(1)

and non-conjugate priors

Andrea Carriero¹ Todd E. Clark² Massimiliano Marcellino³

Norges Bank, 3 October 2017

¹ Queen Mary, University of London

² Federal Reserve Bank of Cleveland

³ Bocconi University and CEPR

Carriero, Clark, Marcellino () Large VARs September 2017 1 / 22

(2)

Introduction - two ingredients

Two main ingredients are key for the specification of a good Vector Autoregressive model (VAR) for forecasting and structural analysis of macroeconomic data:

A large cross section: Banbura, Giannone, and Reichlin (2010), Carriero, Clark, and Marcellino (2015), Giannone, Lenza, and Primiceri (2015), and Koop (2013).

Time variation in the volatilities: Clark (2011), Clark and Ravazzolo (2015), Cogley and Sargent (2005), D'Agostino, Gambetti, and Giannone (2013), and Primiceri (2005).

There are no papers that jointly allow for both time variation and large datasets.

(3)

Introduction - heteroskedasticity

The reason lies in the structure of the likelihood function

Homoskedastic VARs are SUR models with the same set of regressors in each equation → Kronecker structure in the likelihood → OLS equation by equation.

Equation-specific stochastic volatility breaks this symmetry because each equation is driven by a different volatility.

The system would need to be vectorised, and the conditional posterior involves manipulation of a matrix of dimension $pN^2$ ($N$ = number of variables, $p$ = number of lags).

The computational complexity is therefore $(N^2)^3 = N^6$.

(4)

Introduction - asymmetric priors

In a Bayesian framework, symmetry is needed not only in the likelihood, but also in the prior.

Kronecker structure in the likelihood + Kronecker structure in the prior = Kronecker structure in the posterior

For example, the VAR estimated by Banbura, Giannone, and Reichlin (2010) is a VAR with 130 variables, but in order to make this estimation possible one needs to assume:

(i) Homoskedasticity of the disturbances

(ii) A specific structure for the prior

Without either (i) or (ii) the system would need to be vectorised prior to estimation

(5)

The problem

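Consider the VAR of an $N$-dimensional vector $y_t$:

$$y_t = \Pi(L) y_{t-1} + v_t; \qquad v_t \sim \text{iid } N(0, \Sigma_t) \qquad (1)$$

Define $X_t = [1, y_{t-1}', \dots, y_{t-p}']'$ and $\Pi = [\Pi_0 | \Pi_1 | \dots | \Pi_p]$.

In general we have the posterior $\text{vec}(\Pi) \,|\, \Sigma, y \sim N(\text{vec}(\bar{\mu}_\Pi), \bar{\Omega}_\Pi)$ with posterior precision:

$$\bar{\Omega}_\Pi^{-1} = \underbrace{\underline{\Omega}_\Pi^{-1}}_{\text{Prior}} + \underbrace{\sum_{t=1}^{T} \left(\Sigma_t^{-1} \otimes X_t X_t'\right)}_{\text{Likelihood}} \qquad (2)$$

The precision matrix $\bar{\Omega}_\Pi^{-1}$ is of size $N(Np+1)$. Its manipulation requires $(pN^2)^3 = O(N^6)$ elementary operations.

For $N$ very large, modern computers (laptops/desktops) cannot even store such a matrix in RAM (e.g. $N = 125$ needs 330 GB of RAM).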
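As a rough check on the storage figure, the arithmetic can be reproduced in a few lines. This is a minimal sketch under stated assumptions: dense storage in 8-byte doubles, 1 GB taken as 10^9 bytes, and p = 13 lags (the lag length of the empirical application later in the deck). The function name is illustrative only.

```python
def system_wide_storage_gb(n_vars: int, n_lags: int, bytes_per_entry: int = 8) -> float:
    """Memory in GB (1 GB = 1e9 bytes) of a dense K x K matrix with K = N(Np+1)."""
    k = n_vars * (n_vars * n_lags + 1)
    return k * k * bytes_per_entry / 1e9


dim = 125 * (125 * 13 + 1)           # dimension of vec(Pi) for N = 125, p = 13
gb = system_wide_storage_gb(125, 13)
print(dim, round(gb, 1))             # prints: 203250 330.5
```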

(6)

The usual solution

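In general we have the posterior $\text{vec}(\Pi) \,|\, \Sigma, y \sim N(\text{vec}(\bar{\mu}_\Pi), \bar{\Omega}_\Pi)$ with

$$\bar{\Omega}_\Pi^{-1} = \underbrace{\underline{\Omega}_\Pi^{-1}}_{\text{Prior}} + \underbrace{\sum_{t=1}^{T} \left(\Sigma_t^{-1} \otimes X_t X_t'\right)}_{\text{Likelihood}} \qquad (2)$$

Now assume that

(i) $\Sigma_t = \Sigma$ (homoskedasticity)

(ii) $\underline{\Omega}_\Pi = \Sigma \otimes \Omega_0$ (conjugate prior)

Then

$$\bar{\Omega}_\Pi^{-1} = \underbrace{\underline{\Omega}_\Pi^{-1}}_{\Sigma^{-1} \otimes \Omega_0^{-1}} + \sum_{t=1}^{T} \Big(\underbrace{\Sigma_t^{-1}}_{\Sigma^{-1}} \otimes X_t X_t'\Big) = \Sigma^{-1} \otimes \left(\Omega_0^{-1} + \sum_{t=1}^{T} X_t X_t'\right), \qquad (3)$$

and the two terms can be manipulated separately, reducing complexity by a factor of $O(N^3)$.

Classical homoskedastic VARs can be estimated equation by equation.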
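The Kronecker shortcut in (3) can be checked numerically on a toy system. The sketch below uses made-up dimensions and random positive definite matrices (all hypothetical) and compares the brute-force inverse of the full precision matrix with the product of two small inverses.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, T = 3, 2, 50
K = N * p + 1                        # regressors per equation


def random_spd(n):
    """Random symmetric positive definite matrix (illustrative helper)."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)


Sigma = random_spd(N)                # homoskedastic error covariance
Omega0 = random_spd(K)               # prior factor, so the prior variance is Sigma kron Omega0
X = rng.standard_normal((T, K))      # row t plays the role of X_t'

Sigma_inv = np.linalg.inv(Sigma)
Omega0_inv = np.linalg.inv(Omega0)

# Brute force: build and invert the full NK x NK posterior precision
prec_full = np.kron(Sigma_inv, Omega0_inv)
for x in X:
    prec_full = prec_full + np.kron(Sigma_inv, np.outer(x, x))
post_var_full = np.linalg.inv(prec_full)

# Shortcut: invert an N x N and a K x K matrix separately
post_var_kron = np.kron(Sigma, np.linalg.inv(Omega0_inv + X.T @ X))

max_gap = np.max(np.abs(post_var_full - post_var_kron))
print(max_gap)
```

The two results agree up to floating-point error because the inverse of a Kronecker product is the Kronecker product of the inverses.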

(7)

Problems with the usual solution

The natural-conjugate homoskedastic approach allows the use of large datasets, but it has important limitations:

It imposes homoskedasticity, against the overwhelming evidence in macroeconomic and financial data.

The prior structure $\Sigma \otimes \Omega_0$ is restrictive (Rothenberg (1963), Sims and Zha (1998)).

It prevents any asymmetry in the prior across equations, because the coefficients of each equation feature the same prior variance $\Omega_0$ (up to a scale factor given by the elements of $\Sigma$).

It has the unappealing consequence that prior beliefs must be correlated across equations, with a correlation structure proportional to that of the shocks (as described by $\Sigma$).

(8)

A new algorithm

In this paper we propose a new algorithm that makes it possible to use:

A heteroskedastic model

The more general and less restrictive independent Normal-Inverse Wishart (and Normal-diffuse) prior

Our procedure is based on a simple factorization of the likelihood, which allows the VAR coefficients to be drawn equation by equation.

This reduces the computational complexity from $N^6$ to $N^4$.

Our new algorithm is very simple and can be easily inserted into any pre-existing algorithm for estimation of BVAR models.

(9)

The Model

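Consider the following VAR model for an $N$-dimensional $y_t$ with stochastic volatility:

$$y_t = \Pi_0 + \Pi(L) y_{t-1} + v_t, \qquad (1)$$

$$v_t = A^{-1} \Lambda_t^{0.5} e_t, \qquad e_t \sim \text{iid } N(0, I_N), \qquad (2)$$

where $\Lambda_t$ is a diagonal matrix with generic $j$-th element $h_{j,t}$ and $A^{-1}$ is a lower triangular matrix with ones on its main diagonal.

The bottleneck is drawing $\text{vec}(\Pi) \,|\, A, \Lambda_T, y_T \sim N(\text{vec}(\bar{\mu}_\Pi), \bar{\Omega}_\Pi)$. To obtain a draw one needs to (i) invert

$$\bar{\Omega}_\Pi^{-1} = \underbrace{\underline{\Omega}_\Pi^{-1}}_{\text{Prior}} + \underbrace{\sum_{t=1}^{T} \left(\Sigma_t^{-1} \otimes X_t X_t'\right)}_{\text{Likelihood}}, \qquad (3)$$

(ii) compute its Cholesky factor, and (iii) multiply the Cholesky factor by a random vector.

Each of the above operations is of complexity $N^6$.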

(10)

An algorithm for large VARs

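Consider again the decomposition $v_t = A^{-1} \Lambda_t^{0.5} e_t$:

$$\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ \vdots \\ v_{N,t} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \dots & 0 \\ a_{2,1} & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ a_{N,1} & \dots & a_{N,N-1} & 1 \end{bmatrix} \begin{bmatrix} h_{1,t}^{0.5} & 0 & \dots & 0 \\ 0 & h_{2,t}^{0.5} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \dots & 0 & h_{N,t}^{0.5} \end{bmatrix} \begin{bmatrix} e_{1,t} \\ e_{2,t} \\ \vdots \\ e_{N,t} \end{bmatrix},$$

where $a_{j,i}$ denotes the generic element of the matrix $A^{-1}$, which is available under knowledge of $A$.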
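To make the decomposition concrete, the sketch below simulates $v_t$ for a single period and compares the sample covariance of the draws with the implied $\Sigma_t = A^{-1} \Lambda_t A^{-1\prime}$. All numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4

# A^{-1}: lower triangular with ones on the main diagonal
A_inv = np.eye(N) + np.tril(rng.standard_normal((N, N)), k=-1)
h_t = np.exp(rng.standard_normal(N))          # diagonal of Lambda_t for one period

# Many independent draws of e_t ~ N(0, I_N), stacked as rows
e = rng.standard_normal((200_000, N))
v = (e * np.sqrt(h_t)) @ A_inv.T              # each row is (A^{-1} Lambda_t^0.5 e_t)'

Sigma_t = A_inv @ np.diag(h_t) @ A_inv.T      # implied covariance of v_t
sample_cov = v.T @ v / len(v)
rel_err = np.max(np.abs(sample_cov - Sigma_t)) / np.max(np.abs(Sigma_t))
print(rel_err)
```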

(11)

An algorithm for large VARs

The VAR can be written as:

$$y_{1,t} = \pi_1^{(0)} + \sum_{i=1}^{N} \sum_{l=1}^{p} \pi_{1,l}^{(i)} y_{i,t-l} + h_{1,t}^{0.5} e_{1,t}$$

$$y_{2,t} = \pi_2^{(0)} + \sum_{i=1}^{N} \sum_{l=1}^{p} \pi_{2,l}^{(i)} y_{i,t-l} + a_{2,1} h_{1,t}^{0.5} e_{1,t} + h_{2,t}^{0.5} e_{2,t}$$

$$\vdots$$

$$y_{N,t} = \pi_N^{(0)} + \sum_{i=1}^{N} \sum_{l=1}^{p} \pi_{N,l}^{(i)} y_{i,t-l} + a_{N,1} h_{1,t}^{0.5} e_{1,t} + \dots + a_{N,N-1} h_{N-1,t}^{0.5} e_{N-1,t} + h_{N,t}^{0.5} e_{N,t},$$

with the generic equation for variable $j$:

$$\underbrace{y_{j,t} - \left(a_{j,1} h_{1,t}^{0.5} e_{1,t} + \dots + a_{j,j-1} h_{j-1,t}^{0.5} e_{j-1,t}\right)}_{y_{j,t}^{*}} = \pi_j^{(0)} + \sum_{i=1}^{N} \sum_{l=1}^{p} \pi_{j,l}^{(i)} y_{i,t-l} + h_{j,t}^{0.5} e_{j,t}. \qquad (4)$$

When drawing the coefficients of equation $j$, the term $y_{j,t}^{*}$ is known, since it is given by the difference between the dependent variable of that equation and the realized residuals of all the previous $j-1$ equations. Hence (4) is a standard generalized linear regression model with independent Gaussian disturbances $h_{j,t}^{0.5} e_{j,t}$, where $e_{j,t}$ is i.i.d. standard normal.
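The construction of $y_{j,t}^{*}$ in (4) can be illustrated on simulated data. In this sketch the conditional means are arbitrary stand-in numbers rather than fitted values, and the helper name `y_star` is an assumption made for the example. The check confirms that, once the rescaled residuals of the earlier equations are removed, what remains in equation $j$ is exactly $h_{j,t}^{0.5} e_{j,t}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 3, 5

A_inv = np.eye(N) + np.tril(rng.standard_normal((N, N)), k=-1)
h = np.exp(0.3 * rng.standard_normal((T, N)))    # h[t, j] = h_{j,t}
e = rng.standard_normal((T, N))
scaled = np.sqrt(h) * e                          # h^0.5_{j,t} e_{j,t}
v = scaled @ A_inv.T                             # v_t = A^{-1} Lambda_t^0.5 e_t

cond_mean = rng.standard_normal((T, N))          # stand-in for the conditional mean of each equation
y = cond_mean + v


def y_star(j, y, scaled_resid, A_inv):
    """y*_{j,t}: y_{j,t} minus the rescaled residuals of equations before j (0-based j)."""
    return y[:, j] - scaled_resid[:, :j] @ A_inv[j, :j]


j = 2
remaining = y_star(j, y, scaled, A_inv) - cond_mean[:, j]
gap = np.max(np.abs(remaining - scaled[:, j]))
print(gap)
```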

(12)

An algorithm for large VARs

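The full conditional posterior distribution of the conditional mean coefficients can be factorized as:

$$p(\Pi \,|\, A, \Lambda_T, y) = p(\pi^{(N)} \,|\, \pi^{(N-1)}, \pi^{(N-2)}, \dots, \pi^{(1)}, A, \Lambda_T, y) \times p(\pi^{(N-1)} \,|\, \pi^{(N-2)}, \dots, \pi^{(1)}, A, \Lambda_T, y) \times \dots \times p(\pi^{(1)} \,|\, A, \Lambda_T, y),$$

and one can draw the coefficients in $\Pi$ in separate blocks:

$$\Pi^{\{j\}} \,|\, \Pi^{\{1:j-1\}}, A, \Lambda_T, y \sim N\left(\bar{\mu}_{\Pi^{\{j|1:j-1\}}}, \bar{\Omega}_{\Pi^{\{j|1:j-1\}}}\right)$$

with

$$\bar{\mu}_{\Pi^{\{j|1:j-1\}}} = \bar{\Omega}_{\Pi^{\{j|1:j-1\}}} \left( \sum_{t=1}^{T} X_{j,t} h_{j,t}^{-1} y_{j,t}^{*\prime} + \underline{\Omega}_{\Pi^{\{j|1:j-1\}}}^{-1} \underline{\mu}_{\Pi^{\{j|1:j-1\}}} \right)$$

$$\bar{\Omega}_{\Pi^{\{j|1:j-1\}}}^{-1} = \underline{\Omega}_{\Pi^{\{j|1:j-1\}}}^{-1} + \sum_{t=1}^{T} X_{j,t} h_{j,t}^{-1} X_{j,t}',$$

where $\underline{\mu}_{\Pi^{\{j|1:j-1\}}}$ and $\underline{\Omega}_{\Pi^{\{j|1:j-1\}}}$ are the moments of the prior $\Pi^{\{j\}} \,|\, \Pi^{\{1:j-1\}} \sim N\left(\underline{\mu}_{\Pi^{\{j|1:j-1\}}}, \underline{\Omega}_{\Pi^{\{j|1:j-1\}}}\right)$.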
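The block-by-block draw can be sketched as a short loop over equations. This is an illustrative implementation of the formulas as stated on this slide, not the authors' code. It assumes independent Gaussian priors across equations and the same regressor matrix in every equation. The function name, argument layout and simulated data are all hypothetical, and the sketch does not itself verify that the resulting draws match the system-wide posterior.

```python
import numpy as np


def draw_pi_triangular(Y, X, A_inv, h, prior_mean, prior_prec, rng):
    """One draw of Pi (K x N; column j holds equation j) given A^{-1} and h (T x N).

    Assumption: independent Gaussian priors across equations, with mean
    prior_mean[:, j] and precision prior_prec[j] (K x K) for equation j.
    """
    T, N = Y.shape
    K = X.shape[1]
    Pi = np.zeros((K, N))
    scaled = np.zeros((T, N))   # h^0.5_{i,t} e_{i,t} of equations already drawn
    for j in range(N):
        # y*_{j,t}: remove the rescaled residuals of the previous equations
        y_star = Y[:, j] - scaled[:, :j] @ A_inv[j, :j]
        w = 1.0 / h[:, j]       # weights h_{j,t}^{-1}
        post_prec = prior_prec[j] + (X * w[:, None]).T @ X
        rhs = X.T @ (w * y_star) + prior_prec[j] @ prior_mean[:, j]
        post_mean = np.linalg.solve(post_prec, rhs)
        # If post_prec = L L', then solve(L', z) has covariance post_prec^{-1}
        L = np.linalg.cholesky(post_prec)
        Pi[:, j] = post_mean + np.linalg.solve(L.T, rng.standard_normal(K))
        # Rescaled residual of equation j implied by the new draw
        scaled[:, j] = y_star - X @ Pi[:, j]
    return Pi


# Simulated data (all values hypothetical)
rng = np.random.default_rng(0)
T, N, K = 2000, 3, 4
X = np.column_stack([np.ones(T), rng.standard_normal((T, K - 1))])
Pi_true = rng.standard_normal((K, N))
A_inv = np.eye(N) + np.tril(0.5 * np.ones((N, N)), k=-1)
h = np.exp(0.2 * rng.standard_normal((T, N)))
Y = X @ Pi_true + (np.sqrt(h) * rng.standard_normal((T, N))) @ A_inv.T

Pi_draw = draw_pi_triangular(
    Y, X, A_inv, h, np.zeros((K, N)), [0.01 * np.eye(K)] * N, rng
)
print(np.max(np.abs(Pi_draw - Pi_true)))
```

Each pass through the loop only touches a $K \times K$ matrix, which is why the largest object handled has dimension $Np+1$ rather than $N(Np+1)$.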

(13)

An algorithm for large VARs

The conditional posterior of $\Pi$ obtained is the same as the one from the system-wide algorithm.

The algorithm will produce draws numerically identical to those of the system-wide sampler.

This is true regardless of the ordering, which is irrelevant to the conditional posterior of $\Pi$.

The total computational complexity of this estimation algorithm is $O(N^4)$, with a gain of $N^2$.

It uses equations with at most $Np+1$ regressors, and the correlation across equations typical of SUR models is implicitly accounted for by the factorization.

The dimension of the posterior variance matrix $\bar{\Omega}_{\Pi^{\{j\}}}$ is $(Np+1)$, which means that its manipulation only involves operations of order $O(N^3)$.
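The stated gain of $N^2$ follows from simple counting, sketched below. The counts assume cubic cost in the matrix dimension and ignore constants, so they are stylized operation counts rather than timings. The function names are illustrative.

```python
def system_wide_ops(n: int, p: int) -> int:
    """Stylized cost of manipulating one matrix of dimension N(Np+1)."""
    return (n * (n * p + 1)) ** 3


def triangular_ops(n: int, p: int) -> int:
    """Stylized cost of manipulating N matrices of dimension Np+1."""
    return n * (n * p + 1) ** 3


for n in (5, 20, 125):
    ratio = system_wide_ops(n, 13) // triangular_ops(n, 13)
    print(n, ratio)          # the ratio equals n**2
```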

(14)

Computational complexity and speed of simulation

Time for producing 10 draws as a function of N

[Figure: two panels, both with the size of the cross-section (N, from 0 to 10) on the x axis. Left panel: time in seconds for producing 10 draws as a function of N, system-wide algorithm vs triangular algorithm. Right panel: theoretical and actual difference in computational complexity (number of elementary operations).]

(15)

Computational complexity and speed of simulation

Time for producing 10 draws as a function of N - log scale

[Figure: two panels on a log scale, both with the size of the cross-section (N, from 0 to 40) on the x axis. Left panel: time in seconds for producing 10 draws as a function of N, system-wide algorithm vs triangular algorithm. Right panel: theoretical and actual difference in computational complexity (number of elementary operations).]

(16)

Convergence and mixing

Regardless of the power of the computers used to perform the simulation, the triangular algorithm will always produce many more draws than the traditional system-wide algorithm in a given unit of time.

This has important consequences in terms of producing draws with good mixing and convergence properties.

The triangular algorithm can produce draws many times closer to i.i.d. sampling in the same amount of time.

These computational and storage gains increase quadratically with the system size.

(17)

Convergence and mixing

Inefficiency factors = distance from i.i.d. sampling: ideally they should be around 1.

[Figure: inefficiency factors under the system-wide algorithm and under the triangular algorithm, shown in four pairs of panels: conditional mean parameters; covariances; volatility factors (averaged across time); volatility innovation variance.]

System-wide algorithm: 5000 draws. Triangular algorithm: results are based on 1305000 draws with skip-sampling of 261, producing an effective sample of 5000 draws.
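For readers unfamiliar with inefficiency factors, the sketch below estimates one as 1 plus twice the sum of the sample autocorrelations, and shows how skip-sampling moves a persistent chain closer to i.i.d. behaviour. The truncation at 100 lags, the AR(1) test chain and the skip of 20 are assumptions made for illustration. The deck does not state which estimator was used for the figures.

```python
import numpy as np


def inefficiency_factor(chain, max_lag=100):
    """1 + 2 * (sum of sample autocorrelations up to max_lag); about 1 for i.i.d. draws."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = x @ x / n
    rho = np.array([x[:-k] @ x[k:] / (n * var) for k in range(1, max_lag + 1)])
    return 1.0 + 2.0 * rho.sum()


rng = np.random.default_rng(3)
n = 50_000
iid = rng.standard_normal(n)

# A persistent AR(1) chain stands in for a slowly mixing sampler
shocks = rng.standard_normal(n)
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.9 * ar[t - 1] + shocks[t]

if_iid = inefficiency_factor(iid)
if_ar = inefficiency_factor(ar)
if_thin = inefficiency_factor(ar[::20])      # keep every 20th draw (skip-sampling)
print(if_iid, if_ar, if_thin)
```

A faster sampler can afford a larger skip for the same run time, which is the mechanism behind the comparison at equal effective sample size.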

(18)

Empirical applications

As an illustration we estimate a VAR with stochastic volatilities, using 13 lags and a cross-section of 125 variables from FRED-MD

For a model of this size the system-wide algorithm would have a covariance matrix of the coefficients of dimension 203250, which would require about 330 GB of RAM ($203250^2 \times 8 / 10^9$).

Our estimation algorithm can produce 5000 draws in just above 7 hours on a 3.5 GHz Intel Core i7.

We find that:

The variance of the shocks was clearly unstable over time.

There is a factor structure in the volatilities.

The combined use of both time variation in volatilities and a large dataset improves point and density forecasts, more than these two ingredients do if used separately.

(19)

(20)

(21)

PCA of the variance matrix of the shocks to volatilities

[Figure 11: Principal components loadings of the variance-covariance of the volatilities (matrix ). Five panels plot loadings against the 125 variables; one panel is annotated "Var explained 73.2837%". Variable groups marked in the figure: real variables; prices; monetary aggregates; interest rates, exchange rates, and financial indicators; surveys. Labelled series: FFR, NB reserves, CES1021000001, PCEPI, Hours, RPI.]

(22)

Score comparison: homoskedastic model (y axis) vs heteroskedastic model (x axis)

[Figure 17: Comparison of density forecast accuracy. Each panel describes a different variable. The x axis reports the (log) density score obtained using the BVAR with stochastic volatility (heteroskedastic), the y axis reports the (log) density score obtained using the homoskedastic BVAR. Each point corresponds to a different forecast horizon from 1 to 12 steps ahead.]

Panels: RPI, DPCERA3M086SBEA, CMRMTSPLx, INDPRO, CUMFNS, UNRATE, PAYEMS, CES0600000007, CES0600000008, PPIFGS, PPICMM, PCEPI, FEDFUNDS, HOUST, S&P 500, EXUSUKx, T1YFFM, T10YFFM, BAAFFM, NAPMNOI.

(23)

RMSFE comparison: homoskedastic model (y axis) vs heteroskedastic model (x axis)

[Figure 16: Comparison of point forecast accuracy. Each panel describes a different variable. The x axis reports the RMSFE obtained using the BVAR with stochastic volatility (heteroskedastic), the y axis reports the RMSFE obtained using the homoskedastic BVAR. Each point corresponds to a different forecast horizon from 1 to 12 steps ahead (in most cases, a higher RMSFE corresponds to a longer forecast horizon).]

Panels: RPI, DPCERA3M086SBEA, CMRMTSPLx, INDPRO, CUMFNS, UNRATE, PAYEMS, CES0600000007, CES0600000008, PPIFGS, PPICMM, PCEPI, FEDFUNDS, HOUST, S&P 500, EXUSUKx, T1YFFM, T10YFFM, BAAFFM, NAPMNOI.
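The two accuracy measures compared in these figures can be sketched as follows. This illustration assumes Gaussian predictive densities, which is a simplification: the deck does not say how the density scores were computed from the posterior draws. The function names and simulated numbers are hypothetical. The example shows that two forecasts with the same mean have the same RMSFE, while the one with the correct time-varying variance attains a higher average log score.

```python
import numpy as np


def rmsfe(forecasts, outcomes):
    """Root mean squared forecast error."""
    err = np.asarray(outcomes, dtype=float) - np.asarray(forecasts, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def avg_log_score_gaussian(pred_mean, pred_sd, outcomes):
    """Average log predictive density under a Gaussian predictive (higher is better)."""
    pred_sd = np.asarray(pred_sd, dtype=float)
    z = (np.asarray(outcomes, dtype=float) - pred_mean) / pred_sd
    return float(np.mean(-0.5 * np.log(2.0 * np.pi) - np.log(pred_sd) - 0.5 * z ** 2))


rng = np.random.default_rng(4)
n = 5000
true_sd = np.where(np.arange(n) % 2 == 0, 0.5, 2.0)    # volatility switches each period
outcomes = true_sd * rng.standard_normal(n)
point_forecast = np.zeros(n)

const_sd = np.full(n, np.sqrt(np.mean(true_sd ** 2)))  # homoskedastic predictive
score_sv = avg_log_score_gaussian(point_forecast, true_sd, outcomes)
score_const = avg_log_score_gaussian(point_forecast, const_sd, outcomes)
print(rmsfe(point_forecast, outcomes), score_sv, score_const)
```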

(24)

Conclusions

The assumptions of conjugacy and homoskedasticity in VARs are hard to defend, but a more general specification is only manageable with a small cross-section.

We have proposed a new estimation method for VARs with non-conjugate priors and drifting volatilities which can be applied to large models.

The method is based on a straightforward triangularization of the system, and it is very simple to implement.

Indeed, if a researcher already has algorithms to produce draws from a VAR with an independent N-IW prior and stochastic volatility, only a single step needs to be slightly modified with a few lines of code.

Given its simplicity and the advantages in terms of speed, mixing, and convergence, we argue that the proposed algorithm should be preferred in empirical applications, especially those involving large datasets.

(25)

Prior dependence

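We assumed that the prior variance was diagonal. This can be relaxed.

With a prior dependent across equations, the general form of the posterior can be obtained by applying the triangularization to the joint prior distribution as well, and is:

$$\Pi^{\{j\}} \,|\, \Pi^{\{1:j-1\}}, A, \Lambda_T, y \sim N\left(\bar{\mu}_{\Pi^{\{j|1:j-1\}}}, \bar{\Omega}_{\Pi^{\{j|1:j-1\}}}\right)$$

with

$$\bar{\mu}_{\Pi^{\{j|1:j-1\}}} = \bar{\Omega}_{\Pi^{\{j|1:j-1\}}} \left( \sum_{t=1}^{T} X_{j,t} h_{j,t}^{-1} y_{j,t}^{*\prime} + \underline{\Omega}_{\Pi^{\{j|1:j-1\}}}^{-1} \underline{\mu}_{\Pi^{\{j|1:j-1\}}} \right)$$

$$\bar{\Omega}_{\Pi^{\{j|1:j-1\}}}^{-1} = \underline{\Omega}_{\Pi^{\{j|1:j-1\}}}^{-1} + \sum_{t=1}^{T} X_{j,t} h_{j,t}^{-1} X_{j,t}',$$

where $\underline{\mu}_{\Pi^{\{j|1:j-1\}}}$ and $\underline{\Omega}_{\Pi^{\{j|1:j-1\}}}$ are the moments of $\Pi^{\{j\}} \,|\, \Pi^{\{1:j-1\}} \sim N\left(\underline{\mu}_{\Pi^{\{j|1:j-1\}}}, \underline{\Omega}_{\Pi^{\{j|1:j-1\}}}\right)$, i.e. the conditional priors implied by the joint prior specification.

The moments of $\Pi^{\{j\}} \,|\, \Pi^{\{1:j-1\}}$ can be found recursively from the joint prior.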
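One way to obtain these conditional prior moments is the standard conditioning formula for a joint Gaussian, sketched below. This is an assumption about how the recursion could be carried out, not a description of the authors' implementation. The function name, the index arguments and the two-coefficient example are hypothetical.

```python
import numpy as np


def conditional_prior(mu, Omega, idx_j, idx_prev, pi_prev):
    """Moments of block idx_j given block idx_prev = pi_prev under a joint N(mu, Omega).

    idx_prev must be non-empty; for the first equation the marginal prior applies.
    """
    mu = np.asarray(mu, dtype=float)
    O_jj = Omega[np.ix_(idx_j, idx_j)]
    O_jp = Omega[np.ix_(idx_j, idx_prev)]
    O_pp = Omega[np.ix_(idx_prev, idx_prev)]
    gain = np.linalg.solve(O_pp, O_jp.T).T           # O_jp @ inv(O_pp), using symmetry of O_pp
    cond_mean = mu[idx_j] + gain @ (np.asarray(pi_prev, dtype=float) - mu[idx_prev])
    cond_var = O_jj - gain @ O_jp.T
    return cond_mean, cond_var


# Two-coefficient illustration: equation 2 given the drawn value of equation 1
mu = np.zeros(2)
Omega = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
m, V = conditional_prior(mu, Omega, idx_j=[1], idx_prev=[0], pi_prev=[2.0])
print(m, V)      # prints: [1.] [[1.75]]
```

When the joint prior variance is block diagonal across equations, the gain is zero and the conditional prior reduces to the marginal prior of equation $j$, which is the diagonal case assumed earlier.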

(26)

Model size, stochastic volatility, and forecasting

Pseudo out-of-sample exercise performed recursively, starting with the estimation sample 1960:3 to 1970:2 and ending with 1960:3 to 2014:5.

We consider four models.

1 A small homoskedastic VAR including the growth rate of industrial production (∆ln IP), the inflation rate based on consumption expenditures (∆ln PCEPI) and the effective Federal Funds Rate (FFR).

2 A large (20 variables) homoskedastic VAR along the lines of Carriero, Clark, and Marcellino (2015), Giannone, Lenza, and Primiceri (2015), and Koop (2013).

3 A small VAR with time variation in volatilities along the lines of Clark (2011), Cogley and Sargent (2005) and Primiceri (2005).

4 The fourth model includes both time variation in the volatilities and a large (20 variables) information set.

(27)

Forecasting

Direct effects:

The use of a larger dataset improves point forecasts via a better specification of the conditional means.

The inclusion of time variation in volatilities improves density forecasts via a better modelling of error variances.

Interactions:

A better point forecast improves the density forecast as well, by centering the predictive density around a more reliable mean.

Time-varying volatilities improve the point forecasts at longer horizons, because the heteroskedastic model provides more efficient estimates (through a GLS argument) and therefore a better characterization of the predictive densities.