and non-conjugate priors
Andrea Carriero¹, Todd E. Clark², Massimiliano Marcellino³
Norges Bank, 3 October 2017
¹Queen Mary, University of London
²Federal Reserve Bank of Cleveland
³Bocconi University and CEPR
Carriero, Clark, Marcellino () Large VARs September 2017 1 / 22
Introduction - two ingredients
Two main ingredients are key for the specification of a good Vector Autoregressive model (VAR) for forecasting and structural analysis of macroeconomic data:
A large cross section. Banbura, Giannone, and Reichlin (2010), Carriero, Clark, and Marcellino (2015), Giannone, Lenza, and Primiceri (2015) and Koop (2013)
Time variation in the volatilities. Clark (2011), Clark and Ravazzolo (2015), Cogley and Sargent (2005), D'Agostino, Gambetti and Giannone (2013), and Primiceri (2005)
There are no papers that jointly allow for both time variation and large datasets
Introduction - heteroskedasticity
The reason lies in the structure of the likelihood function
Homoskedastic VARs are SUR models with the same set of regressors in each equation → Kronecker structure in the likelihood → OLS equation by equation
Equation-specific stochastic volatility breaks this symmetry because each equation is driven by a different volatility
The system would need to be vectorised, and the conditional posterior involves manipulation of a matrix of dimension $pN^2$ ($N$ = number of variables, $p$ = number of lags)
The computational complexity is therefore $O\big((pN^2)^3\big) = O(N^6)$
Introduction - asymmetric priors
In a Bayesian framework, symmetry is needed not only in the likelihood but also in the prior
Kronecker structure in the likelihood + Kronecker structure in the prior = Kronecker structure in the posterior
For example, the VAR estimated by Banbura, Giannone, and Reichlin (2010) is a VAR with 130 variables, but in order to make this estimation possible one needs to assume:
(i) Homoskedasticity of the disturbances
(ii) A specific structure for the prior
Without either (i) or (ii) the system would need to be vectorised prior to estimation
The problem
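Consider the VAR of an $N$-dimensional vector $y_t$:
$$y_t = \Pi(L)\, y_{t-1} + v_t, \qquad v_t \sim N(0, \Sigma_t) \text{ independently over time} \qquad (1)$$
Define $X_t = [1, y_{t-1}', \ldots, y_{t-p}']'$ and $\Pi = [\Pi_0 \mid \Pi_1 \mid \ldots \mid \Pi_p]$.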
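In general we have the posterior $\mathrm{vec}(\Pi) \mid \Sigma, y \sim N(\mathrm{vec}(\bar\mu_\Pi), \bar\Omega_\Pi)$ with posterior precision:
$$\bar\Omega_\Pi^{-1} = \underbrace{\Omega_\Pi^{-1}}_{\text{Prior}} + \underbrace{\sum_{t=1}^{T} \left(\Sigma_t^{-1} \otimes X_t X_t'\right)}_{\text{Likelihood}} \qquad (2)$$
The precision matrix $\bar\Omega_\Pi^{-1}$ is of size $N(Np+1) \times N(Np+1)$. Its manipulation requires $O\big((pN^2)^3\big) = O(N^6)$ elementary operations.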
For very large $N$, modern computers (laptops/desktops) cannot even store such a matrix in RAM (e.g., $N = 125$ with $p = 13$ lags needs about 330 GB of RAM).
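The storage figure above can be reproduced with simple arithmetic. This is a minimal Python sketch, assuming a dense matrix of 8-byte (double precision) entries and $p = 13$ lags as in the empirical application; the helper names are hypothetical, not the authors' code.

```python
def system_dim(N, p):
    """Rows (= columns) of the system-wide precision matrix: N equations x (Np + 1) regressors."""
    return N * (N * p + 1)


def equation_dim(N, p):
    """Rows (= columns) of a single equation's matrix: Np + 1 regressors."""
    return N * p + 1


def dense_gb(dim, bytes_per_entry=8):
    """Storage in GB (10^9 bytes) of a dense dim x dim matrix of 8-byte floats."""
    return dim * dim * bytes_per_entry / 1e9


N, p = 125, 13
print(system_dim(N, p), dense_gb(system_dim(N, p)))      # 203250 330.4845
print(equation_dim(N, p), dense_gb(equation_dim(N, p)))  # 1626 0.021151008
```

The second line shows that a single $(Np+1) \times (Np+1)$ block of the same model needs only about 0.02 GB.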
The usual solution
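In general we have the posterior $\mathrm{vec}(\Pi) \mid \Sigma, y \sim N(\mathrm{vec}(\bar\mu_\Pi), \bar\Omega_\Pi)$ with
$$\bar\Omega_\Pi^{-1} = \underbrace{\Omega_\Pi^{-1}}_{\text{Prior}} + \underbrace{\sum_{t=1}^{T} \left(\Sigma_t^{-1} \otimes X_t X_t'\right)}_{\text{Likelihood}} \qquad (2)$$
Now assume that
(i) $\Sigma_t = \Sigma$ (homoskedasticity)
(ii) $\Omega_\Pi = \Sigma \otimes \Omega_0$ (conjugate prior)
Then
$$\bar\Omega_\Pi^{-1} = \underbrace{\Omega_\Pi^{-1}}_{\Sigma^{-1} \otimes \Omega_0^{-1}} + \sum_{t=1}^{T} \Big(\underbrace{\Sigma_t^{-1}}_{\Sigma^{-1}} \otimes X_t X_t'\Big) = \Sigma^{-1} \otimes \Big(\Omega_0^{-1} + \sum_{t=1}^{T} X_t X_t'\Big), \qquad (3)$$
and the two terms can be manipulated separately, reducing the complexity by a factor of $O(N^3)$.
Classical homoskedastic VARs can be estimated equation by equation.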
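A small numerical check of (3): under (i) and (ii) the posterior covariance factors as $\Sigma \otimes (\Omega_0^{-1} + \sum_t X_t X_t')^{-1}$, so only an $N \times N$ and a $(Np+1) \times (Np+1)$ matrix are involved. This NumPy sketch uses simulated inputs; the dimensions, the random $\Sigma$ and the diagonal $\Omega_0$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import numpy as np


def system_precision(Omega_prior_inv, Sigmas, X):
    """Posterior precision (2): Omega_Pi^{-1} + sum_t Sigma_t^{-1} kron X_t X_t'."""
    P = Omega_prior_inv.copy()
    for Sigma_t, x_t in zip(Sigmas, X):
        P += np.kron(np.linalg.inv(Sigma_t), np.outer(x_t, x_t))
    return P


def conjugate_cov_factors(Sigma, Omega0, X):
    """Under (i)-(ii) the precision is Sigma^{-1} kron M with M = Omega0^{-1} + sum_t X_t X_t',
    so the posterior covariance is Sigma kron M^{-1}: only an N x N and a K x K matrix are needed."""
    M = np.linalg.inv(Omega0) + X.T @ X
    return Sigma, np.linalg.inv(M)


rng = np.random.default_rng(0)
N, K, T = 3, 4, 50                          # K plays the role of Np + 1
X = rng.standard_normal((T, K))             # rows are X_t'
B = rng.standard_normal((N, N))
Sigma = B @ B.T + N * np.eye(N)             # homoskedastic error covariance, assumption (i)
Omega0 = np.diag(rng.uniform(0.5, 2.0, K))  # common prior covariance, assumption (ii)

P = system_precision(np.kron(np.linalg.inv(Sigma), np.linalg.inv(Omega0)), [Sigma] * T, X)
S, M_inv = conjugate_cov_factors(Sigma, Omega0, X)
print(np.allclose(np.linalg.inv(P), np.kron(S, M_inv)))  # True
```

The brute-force route builds and inverts an $NK \times NK$ matrix; the factored route never forms it.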
Problems with the usual solution
The natural-conjugate homoskedastic approach allows the use of large datasets, but it has important limitations:
It imposes homoskedasticity, against the overwhelming evidence in macroeconomic and financial data
The prior structure $\Sigma \otimes \Omega_0$ is restrictive (Rothenberg (1963), Sims and Zha (1998))
It prevents any asymmetry in the prior across equations, because the coefficients of each equation feature the same prior variance $\Omega_0$ (up to a scale factor given by the elements of $\Sigma$).
It has the unappealing consequence that prior beliefs must be correlated across equations, with a correlation structure proportional to that of the shocks (as described by $\Sigma$).
A new algorithm
In this paper we propose a new algorithm that makes it possible to use:
A heteroskedastic model
The more general and less restrictive independent Normal-Inverse Wishart (and Normal-diffuse) prior
Our procedure is based on a simple factorization of the likelihood, which allows the VAR coefficients to be drawn equation by equation
This reduces the computational complexity from $O(N^6)$ to $O(N^4)$.
Our new algorithm is very simple and can be easily inserted in any pre-existing algorithm for estimation of BVAR models.
The Model
Consider the following VAR model for an $N$-dimensional $y_t$ with stochastic volatility:
$$y_t = \Pi_0 + \Pi(L)\, y_{t-1} + v_t, \qquad (1)$$
$$v_t = A^{-1} \Lambda_t^{0.5} e_t, \qquad e_t \sim \text{iid } N(0, I_N), \qquad (2)$$
where $\Lambda_t$ is a diagonal matrix with generic $j$-th element $h_{j,t}$ and $A^{-1}$ is a lower triangular matrix with ones on its main diagonal.
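The bottleneck is drawing $\mathrm{vec}(\Pi) \mid A, \Lambda^T, y^T \sim N(\mathrm{vec}(\bar\mu_\Pi), \bar\Omega_\Pi)$. To obtain a draw one needs to
i) invert
$$\bar\Omega_\Pi^{-1} = \underbrace{\Omega_\Pi^{-1}}_{\text{Prior}} + \underbrace{\sum_{t=1}^{T} \left(\Sigma_t^{-1} \otimes X_t X_t'\right)}_{\text{Likelihood}}, \qquad \Sigma_t = A^{-1} \Lambda_t (A^{-1})', \qquad (3)$$
ii) compute its Cholesky factor, and
iii) multiply the Cholesky factor by a random vector.
Steps i) and ii) each require $O(N^6)$ operations (step iii, a matrix-vector product, is $O(N^4)$), so the cost of a draw is $O(N^6)$.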
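For reference, a brute-force version of the system-wide draw can be sketched in NumPy. It rests on assumptions: $\mathrm{vec}(\Pi)$ stacks the $Np+1$ coefficients of equation 1, then equation 2, and so on (the ordering implied by $\Sigma_t^{-1} \otimes X_t X_t'$), $\Sigma_t$ is supplied directly, and instead of inverting the precision and then factorizing the covariance (steps i and ii) it factorizes the precision itself, an equivalent and common variant. Either way, factorizing an $N(Np+1)$-dimensional matrix is the $O(N^6)$ bottleneck. The function name is hypothetical.

```python
import numpy as np


def draw_system_wide(mu_prior, Omega_prior_inv, Sigmas, X, Y, rng):
    """One draw of vec(Pi) ~ N(mu_bar, Omega_bar), built as in (3).

    vec(Pi) stacks the K coefficients of equation 1, then of equation 2, ...
    X: T x K regressors (rows X_t'), Y: T x N data (rows y_t'), Sigmas: T matrices N x N.
    """
    P = Omega_prior_inv.copy()                # posterior precision, NK x NK
    b = Omega_prior_inv @ mu_prior            # precision-weighted mean
    for Sigma_t, x_t, y_t in zip(Sigmas, X, Y):
        S_inv = np.linalg.inv(Sigma_t)
        P += np.kron(S_inv, np.outer(x_t, x_t))
        b += np.kron(S_inv @ y_t, x_t)
    L = np.linalg.cholesky(P)                 # O((NK)^3) = O(N^6): the bottleneck
    mean = np.linalg.solve(P, b)              # also O((NK)^3)
    z = rng.standard_normal(P.shape[0])
    # With P = L L', the vector L'^{-1} z has covariance P^{-1} = Omega_bar.
    return mean + np.linalg.solve(L.T, z), mean


rng = np.random.default_rng(1)
T, N, K = 200, 2, 3                            # K = Np + 1 with p = 1
X = rng.standard_normal((T, K))
Y = X @ rng.standard_normal((K, N)) + 0.1 * rng.standard_normal((T, N))
draw, mean = draw_system_wide(np.zeros(N * K), 1e-8 * np.eye(N * K), [np.eye(N)] * T, X, Y, rng)
ols = np.linalg.lstsq(X, Y, rcond=None)[0]     # K x N, equation-by-equation OLS
print(np.allclose(mean.reshape(N, K), ols.T, atol=1e-6))  # True: diffuse prior, Sigma_t = I
```

With a nearly flat prior and $\Sigma_t = I$, the posterior mean collapses to equation-by-equation OLS, a quick sanity check on the ordering of $\mathrm{vec}(\Pi)$.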
An algorithm for large VARs
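Consider again the decomposition $v_t = A^{-1} \Lambda_t^{0.5} e_t$:
$$
\begin{bmatrix} v_{1,t} \\ v_{2,t} \\ \vdots \\ v_{N,t} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_{2,1} & 1 & \ddots & \vdots \\ \vdots & \ddots & 1 & 0 \\ a_{N,1} & \cdots & a_{N,N-1} & 1 \end{bmatrix}
\begin{bmatrix} h_{1,t}^{0.5} & 0 & \cdots & 0 \\ 0 & h_{2,t}^{0.5} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & h_{N,t}^{0.5} \end{bmatrix}
\begin{bmatrix} e_{1,t} \\ e_{2,t} \\ \vdots \\ e_{N,t} \end{bmatrix},
$$
where $a_{j,i}$ denotes the generic element of the matrix $A^{-1}$, which is available under knowledge of $A$.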
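The decomposition can be checked numerically: $\Sigma_t = A^{-1} \Lambda_t (A^{-1})'$, and because $A^{-1}$ is unit lower triangular the structural shocks can be recovered from $v_t$ by forward substitution, with $e_{j,t}$ depending only on $v_{1,t}, \ldots, v_{j,t}$. A NumPy sketch with simulated $A^{-1}$ and $h_t$; the helper names are hypothetical.

```python
import numpy as np


def reduced_form_cov(A_inv, h_t):
    """Sigma_t = A^{-1} Lambda_t (A^{-1})', with Lambda_t = diag(h_t)."""
    return A_inv @ np.diag(h_t) @ A_inv.T


def structural_shocks(A_inv, h_t, v_t):
    """Recover e_t from v_t = A^{-1} Lambda_t^{0.5} e_t by forward substitution.

    Since A^{-1} is unit lower triangular, e_{j,t} depends only on v_{1,t}, ..., v_{j,t}.
    """
    N = len(v_t)
    u = np.empty(N)                            # u_t = Lambda_t^{0.5} e_t
    for j in range(N):
        u[j] = v_t[j] - A_inv[j, :j] @ u[:j]   # v_j = u_j + sum_{i<j} a_{j,i} u_i
    return u / np.sqrt(h_t)


rng = np.random.default_rng(2)
N = 4
A_inv = np.eye(N) + np.tril(rng.standard_normal((N, N)), k=-1)  # ones on the diagonal
h_t = rng.uniform(0.5, 2.0, N)
e_t = rng.standard_normal(N)
v_t = A_inv @ (np.sqrt(h_t) * e_t)
print(np.allclose(structural_shocks(A_inv, h_t, v_t), e_t))     # True
```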
An algorithm for large VARs
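The VAR can be written as:
$$
\begin{aligned}
y_{1,t} &= \pi_1^{(0)} + \sum_{i=1}^{N}\sum_{l=1}^{p} \pi_i^{(1,l)} y_{i,t-l} + h_{1,t}^{0.5} e_{1,t} \\
y_{2,t} &= \pi_2^{(0)} + \sum_{i=1}^{N}\sum_{l=1}^{p} \pi_i^{(2,l)} y_{i,t-l} + a_{2,1} h_{1,t}^{0.5} e_{1,t} + h_{2,t}^{0.5} e_{2,t} \\
&\;\;\vdots \\
y_{N,t} &= \pi_N^{(0)} + \sum_{i=1}^{N}\sum_{l=1}^{p} \pi_i^{(N,l)} y_{i,t-l} + a_{N,1} h_{1,t}^{0.5} e_{1,t} + \cdots + a_{N,N-1} h_{N-1,t}^{0.5} e_{N-1,t} + h_{N,t}^{0.5} e_{N,t},
\end{aligned}
$$
with the generic equation for variable $j$:
$$\underbrace{y_{j,t} - \left(a_{j,1} h_{1,t}^{0.5} e_{1,t} + \cdots + a_{j,j-1} h_{j-1,t}^{0.5} e_{j-1,t}\right)}_{y_{j,t}^{*}} = \pi_j^{(0)} + \sum_{i=1}^{N}\sum_{l=1}^{p} \pi_i^{(j,l)} y_{i,t-l} + h_{j,t}^{0.5} e_{j,t}. \qquad (4)$$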
When drawing the coefficients of equation $j$, the term $y_{j,t}^{*}$ is known, since it is given by the difference between the dependent variable of that equation and the contributions $a_{j,i} h_{i,t}^{0.5} e_{i,t}$ of the realized shocks of all the previous $j-1$ equations. Hence (4) is a standard linear regression model with independent Gaussian disturbances whose variances $h_{j,t}$ are known (a GLS problem).
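Computing $y_{j,t}^{*}$ only needs the coefficients already drawn for equations $1, \ldots, j-1$ and $A^{-1}$: the realized residuals of earlier equations give $h_{i,t}^{0.5} e_{i,t}$ by forward substitution. A NumPy sketch on simulated data (0-based equation index; the function name and array layout are hypothetical):

```python
import numpy as np


def y_star(j, Y, X, Pi_prev, A_inv):
    """Left-hand side y*_{j,t} of (4) for equation j (0-based), for all t at once.

    Y: T x N data, X: T x K regressors, Pi_prev: j x K coefficients already drawn
    for equations 0..j-1, A_inv: unit lower triangular N x N matrix.
    """
    V_prev = Y[:, :j] - X @ Pi_prev.T           # reduced-form residuals v_{i,t}, i < j
    U_prev = np.empty_like(V_prev)              # u_{i,t} = h_{i,t}^{0.5} e_{i,t}
    for i in range(j):
        U_prev[:, i] = V_prev[:, i] - U_prev[:, :i] @ A_inv[i, :i]
    # subtract a_{j,1} h_{1,t}^{0.5} e_{1,t} + ... + a_{j,j-1} h_{j-1,t}^{0.5} e_{j-1,t}
    return Y[:, j] - U_prev @ A_inv[j, :j]


rng = np.random.default_rng(3)
T, N, K = 100, 3, 2
X = rng.standard_normal((T, K))
Pi = rng.standard_normal((N, K))                            # row j = coefficients of equation j
A_inv = np.eye(N) + np.tril(rng.standard_normal((N, N)), k=-1)
U = np.sqrt(rng.uniform(0.5, 2.0, (T, N))) * rng.standard_normal((T, N))
Y = X @ Pi.T + U @ A_inv.T                                  # y_t = Pi X_t + A^{-1} u_t
print(np.allclose(y_star(2, Y, X, Pi[:2], A_inv), X @ Pi[2] + U[:, 2]))  # True
```

The check confirms that, once the earlier equations are fixed, $y_{j,t}^{*}$ is just the regression part of equation $j$ plus its own shock.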
An algorithm for large VARs
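The full conditional posterior distribution of the conditional mean coefficients can be factorized as:
$$p(\Pi \mid A, \Lambda^T, y) = p(\pi^{(N)} \mid \pi^{(N-1)}, \pi^{(N-2)}, \ldots, \pi^{(1)}, A, \Lambda^T, y) \times p(\pi^{(N-1)} \mid \pi^{(N-2)}, \ldots, \pi^{(1)}, A, \Lambda^T, y) \times \cdots \times p(\pi^{(1)} \mid A, \Lambda^T, y),$$
and one can draw the coefficients in $\Pi$ in separate blocks:
$$\Pi^{\{j\}} \mid \Pi^{\{1:j-1\}}, A, \Lambda^T, y \sim N\big(\bar\mu_{\Pi\{j|1:j-1\}}, \bar\Omega_{\Pi\{j|1:j-1\}}\big)$$
with
$$\bar\mu_{\Pi\{j|1:j-1\}} = \bar\Omega_{\Pi\{j|1:j-1\}} \left\{ \sum_{t=1}^{T} X_{j,t}\, h_{j,t}^{-1}\, y_{j,t}^{*\prime} + \Omega_{\Pi\{j|1:j-1\}}^{-1} \mu_{\Pi\{j|1:j-1\}} \right\}$$
$$\bar\Omega_{\Pi\{j|1:j-1\}}^{-1} = \Omega_{\Pi\{j|1:j-1\}}^{-1} + \sum_{t=1}^{T} X_{j,t}\, h_{j,t}^{-1}\, X_{j,t}',$$
where $\mu_{\Pi\{j|1:j-1\}}$ and $\Omega_{\Pi\{j|1:j-1\}}$ are the prior moments of $\Pi^{\{j\}} \mid \Pi^{\{1:j-1\}} \sim N(\mu_{\Pi\{j|1:j-1\}}, \Omega_{\Pi\{j|1:j-1\}})$.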
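The per-equation step can be sketched in NumPy. This is a minimal sketch of the formulas above, assuming the prior moments of equation $j$ are passed in directly (as under the diagonal prior used here) and that $y_{j,t}^{*}$ has already been computed; the function name and inputs are hypothetical.

```python
import numpy as np


def draw_equation(ystar_j, X, h_j, mu_prior_j, Omega_prior_j, rng):
    """Draw the coefficients of equation j from N(mu_bar, Omega_bar) using the slide's formulas.

    ystar_j: length-T vector y*_{j,t}; X: T x K regressors (rows X_{j,t}');
    h_j: length-T volatilities h_{j,t}; (mu_prior_j, Omega_prior_j): prior moments.
    Only K x K matrices appear (K = Np + 1), so the cost is O(N^3) per equation.
    """
    Om_inv = np.linalg.inv(Omega_prior_j)
    Xw = X / h_j[:, None]                       # rows h_{j,t}^{-1} X_{j,t}'
    P = Om_inv + Xw.T @ X                       # Omega_bar^{-1}
    b = Xw.T @ ystar_j + Om_inv @ mu_prior_j
    L = np.linalg.cholesky(P)
    mean = np.linalg.solve(P, b)                # mu_bar
    return mean + np.linalg.solve(L.T, rng.standard_normal(len(b))), mean


rng = np.random.default_rng(5)
T, K = 300, 3
X = rng.standard_normal((T, K))
h = rng.uniform(0.5, 2.0, T)
ystar = X @ np.array([0.5, -1.0, 2.0]) + np.sqrt(h) * rng.standard_normal(T)
draw, mean = draw_equation(ystar, X, h, np.zeros(K), 1e8 * np.eye(K), rng)
Xw = X / h[:, None]
gls = np.linalg.solve(Xw.T @ X, Xw.T @ ystar)
print(np.allclose(mean, gls, atol=1e-6))  # True: nearly flat prior gives the GLS estimate
```

A full sweep would loop over $j = 1, \ldots, N$, computing $y_{j,t}^{*}$ from the coefficients already drawn for equations $1, \ldots, j-1$ before each call.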
An algorithm for large VARs
The conditional posterior of $\Pi$ obtained is the same as the one from the system-wide algorithm
The algorithm will produce draws numerically identical to those of the system-wide sampler
This is true regardless of the ordering, which is irrelevant to the conditional posterior of $\Pi$
The total computational complexity of this estimation algorithm is $O(N^4)$, a gain of a factor $N^2$.
It uses equations with at most $Np+1$ regressors, and the correlation across equations typical of SUR models is implicitly accounted for by the factorization.
The dimension of the posterior precision matrix $\bar\Omega_{\Pi\{j\}}^{-1}$ is $(Np+1) \times (Np+1)$, which means that its manipulation only involves operations of order $O(N^3)$ per equation.
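Counting only the cubic-cost factorizations ($m^3$ operations for an $m \times m$ matrix) and ignoring constants, the gain can be made explicit:

```latex
\underbrace{O\!\left(\left[N(Np+1)\right]^3\right)}_{\text{system-wide}} = O(p^3 N^6),
\qquad
\underbrace{N \cdot O\!\left((Np+1)^3\right)}_{\text{triangular}} = O(p^3 N^4),
\qquad
\frac{\left[N(Np+1)\right]^3}{N\,(Np+1)^3} = N^2 .
```

For example, with $N = 125$ and $p = 13$ (so $Np + 1 = 1626$), as in the empirical application, the ratio is $125^2 = 15{,}625$.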
Computational complexity and speed of simulation
[Figure: time for producing 10 draws as a function of N (x axis: size of the cross-section N, up to 10; y axis: seconds), system-wide algorithm vs triangular algorithm; and theoretical vs actual difference in computational complexity (x axis: size of the cross-section N; y axis: number of elementary operations).]
Computational complexity and speed of simulation
[Figure: the same comparison on a log scale, for N up to 40: time for producing 10 draws as a function of N (seconds), system-wide vs triangular algorithm; and theoretical vs actual difference in computational complexity (number of elementary operations).]
Convergence and mixing
Regardless of the power of the computers used to perform the simulation, the triangular algorithm will always produce many more draws than the traditional system-wide algorithm in a given unit of time.
This has important consequences in terms of producing draws with good mixing and convergence properties.
In the same amount of time, the triangular algorithm can produce draws that are much closer to i.i.d. sampling.
These computational and storage gains increase quadratically with the system size
Convergence and mixing
Inefficiency factors = distance from i.i.d. sampling: ideally they should be around 1.
[Figure: inefficiency factors for the conditional mean parameters, the covariances, the volatility factors (averaged across time), and the volatility innovation variance, each shown for the system-wide and the triangular algorithm.]
System-wide algorithm: 5000 draws. Triangular algorithm: results are based on 1305000 draws with skip-sampling of 261, producing an effective sample of 5000 draws.
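A common way to compute inefficiency factors is $1 + 2\sum_k \rho_k$, with the autocorrelations $\rho_k$ of a chain down-weighted by a Bartlett (Newey-West) kernel. The sketch below assumes that estimator and a truncation lag of 100; the slides do not state which estimator or bandwidth was used, and the function name is hypothetical.

```python
import numpy as np


def inefficiency_factor(draws, max_lag=100):
    """Inefficiency factor of a scalar MCMC chain: 1 + 2 * sum_k w_k * rho_k,
    with Bartlett weights w_k = 1 - k / (max_lag + 1). Values near 1 mean the
    draws are close to i.i.d.; a value of 20 means roughly 20 correlated draws
    carry the information of one independent draw."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    gamma0 = x @ x / n                         # sample variance
    total = 1.0
    for k in range(1, max_lag + 1):
        rho_k = (x[:-k] @ x[k:]) / n / gamma0  # lag-k sample autocorrelation
        total += 2.0 * (1.0 - k / (max_lag + 1)) * rho_k
    return total


rng = np.random.default_rng(6)
iid = rng.standard_normal(20_000)
ar = np.zeros(20_000)
for t in range(1, len(ar)):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()   # a slowly mixing chain
print(inefficiency_factor(iid), inefficiency_factor(ar))
```

Skip-sampling as used for the triangular algorithm is a simple slice: keeping every 261st of 1,305,000 draws (`chain[::261]`) leaves 1,305,000 / 261 = 5,000 draws.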
Empirical applications
As an illustration, we estimate a VAR with stochastic volatilities, using 13 lags and a cross-section of 125 variables from FRED-MD.
For a model of this size the system-wide algorithm would have a covariance matrix of the coefficients of dimension 203250 × 203250, which would require about 330 GB of RAM ($203250^2 \times 8 / 10^9$ GB).
Our estimation algorithm can produce 5000 draws in just over 7 hours on a 3.5 GHz Intel Core i7.
We find that:
The variance of the shocks was clearly unstable over time
There is a factor structure in the volatilities
The combined use of both time variation in volatilities and a large dataset improves point and density forecasts more than these two ingredients do when used separately.
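The point and density forecast comparisons rest on the RMSFE and the average log predictive score. A minimal sketch of both, assuming (for illustration only) Gaussian predictive densities summarized by a mean and a standard deviation; in a BVAR the predictive density is usually simulated, and the slides do not state how the scores were computed. Function names are hypothetical.

```python
import numpy as np


def rmsfe(forecasts, outcomes):
    """Root mean squared forecast error over an evaluation sample (lower is better)."""
    err = np.asarray(outcomes, dtype=float) - np.asarray(forecasts, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def avg_gaussian_log_score(means, stds, outcomes):
    """Average log predictive density (higher is better), ASSUMING Gaussian
    predictive densities N(mean, std^2) for each forecast."""
    m, s, y = (np.asarray(a, dtype=float) for a in (means, stds, outcomes))
    log_pdf = -0.5 * np.log(2.0 * np.pi * s ** 2) - 0.5 * ((y - m) / s) ** 2
    return float(log_pdf.mean())


print(rmsfe([0.0, 0.0], [3.0, 4.0]))                 # sqrt(12.5), about 3.536
print(avg_gaussian_log_score([0.0], [1.0], [0.0]))   # -0.5 * log(2 pi), about -0.919
```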
PCA of the variance matrix of the shocks to volatilities
[Figure: loadings of the first five principal components across the 125 variables, grouped as real variables, prices, monetary aggregates, interest rates, exchange rates, and financial indicators, and surveys (labeled series include FFR, NB reserves, CES1021000001, PCEPI, Hours, RPI). Variance explained: 73.2837%, 19.0428%, 2.6287%, 1.6826%, 0.52826%.]
Figure 11: Principal components loadings of the variance-covariance matrix of the volatilities.
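The decomposition behind the figure can be sketched as an eigen-decomposition of the covariance matrix of the volatility shocks, with variance shares given by normalized eigenvalues and loadings by eigenvectors. In the sketch, the matrix name `Phi`, the function name, and the one-factor test matrix are hypothetical.

```python
import numpy as np


def pca_variance_explained(Phi):
    """Eigen-decompose a symmetric covariance matrix Phi; return the shares of
    variance explained (descending) and the matching loading vectors (columns)."""
    eigval, eigvec = np.linalg.eigh(Phi)       # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    return eigval / eigval.sum(), eigvec


lam = np.ones(10)                              # a single common factor in 10 volatilities
shares, loadings = pca_variance_explained(np.outer(lam, lam) + 0.01 * np.eye(10))
print(shares[:3])                              # first component dominates
```

A dominant first share, as in the 73% reported above, is what "a factor structure in the volatilities" refers to.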
Score comparison: homoskedastic model (y axis) vs heteroskedastic model (x axis)
[Figure: one scatter panel per variable (RPI, DPCERA3M086SBEA, CMRMTSPLx, INDPRO, CUMFNS, UNRATE, PAYEMS, CES0600000007, CES0600000008, PPIFGS, PPICMM, PCEPI, FEDFUNDS, HOUST, S&P 500, EXUSUKx, T1YFFM, T10YFFM, BAAFFM, NAPMNOI), plotting the log score of the homoskedastic BVAR against that of the heteroskedastic BVAR.]
Figure 17: Comparison of density forecast accuracy. Each panel describes a different variable. The x axis reports the (log) density score obtained using the BVAR with stochastic volatility (heteroskedastic), the y axis reports the (log) density score obtained using the homoskedastic BVAR. Each point corresponds to a different forecast horizon from 1 to 12 steps ahead.
RMSFE comparison: homoskedastic model (y axis) vs heteroskedastic model (x axis)
[Figure: one scatter panel per variable (RPI, DPCERA3M086SBEA, CMRMTSPLx, INDPRO, CUMFNS, UNRATE, PAYEMS, CES0600000007, CES0600000008, PPIFGS, PPICMM, PCEPI, FEDFUNDS, HOUST, S&P 500, EXUSUKx, T1YFFM, T10YFFM, BAAFFM, NAPMNOI), plotting the RMSFE of the homoskedastic BVAR against that of the heteroskedastic BVAR.]
Figure 16: Comparison of point forecast accuracy. Each panel describes a different variable. The x axis reports the RMSFE obtained using the BVAR with stochastic volatility (heteroskedastic), the y axis reports the RMSFE obtained using the homoskedastic BVAR. Each point corresponds to a different forecast horizon from 1 to 12 steps ahead (in most cases, a higher RMSFE corresponds to a longer forecast horizon).
Conclusions
The assumptions of conjugacy and homoskedasticity in VARs are hard to defend, but a more general specification is manageable only with a small cross-section.
We have proposed a new estimation method for VARs with non-conjugate priors and drifting volatilities, which can be applied to large models
The method is based on a straightforward triangularization of the system, and it is very simple to implement.
Indeed, if a researcher already has algorithms to produce draws from a VAR with an independent N-IW prior and stochastic volatility, only a single step needs to be slightly modified, with a few lines of code.
Given its simplicity and the advantages in terms of speed, mixing, and convergence, we argue that the proposed algorithm should be preferred in empirical applications, especially those involving large datasets.
Prior dependence
We assumed that the prior variance was diagonal. This can be relaxed.
With a prior dependent across equations, the general form of the posterior can be obtained by applying the triangularization to the joint prior distribution as well, and is:
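$$\Pi^{\{j\}} \mid \Pi^{\{1:j-1\}}, A, \Lambda^T, y \sim N\big(\bar\mu_{\Pi\{j|1:j-1\}}, \bar\Omega_{\Pi\{j|1:j-1\}}\big)$$
with
$$\bar\mu_{\Pi\{j|1:j-1\}} = \bar\Omega_{\Pi\{j|1:j-1\}} \left\{ \sum_{t=1}^{T} X_{j,t}\, h_{j,t}^{-1}\, y_{j,t}^{*\prime} + \Omega_{\Pi\{j|1:j-1\}}^{-1} \mu_{\Pi\{j|1:j-1\}} \right\}$$
$$\bar\Omega_{\Pi\{j|1:j-1\}}^{-1} = \Omega_{\Pi\{j|1:j-1\}}^{-1} + \sum_{t=1}^{T} X_{j,t}\, h_{j,t}^{-1}\, X_{j,t}',$$
where $\mu_{\Pi\{j|1:j-1\}}$ and $\Omega_{\Pi\{j|1:j-1\}}$ are moments of $\Pi^{\{j\}} \mid \Pi^{\{1:j-1\}} \sim N(\mu_{\Pi\{j|1:j-1\}}, \Omega_{\Pi\{j|1:j-1\}})$, i.e. the conditional priors implied by the joint prior specification.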
The moments ofΠfjgjΠf1:j 1g can be found recursively from the joint prior
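Finding these moments amounts to standard Gaussian conditioning. A minimal NumPy sketch, computed directly rather than recursively, assuming the joint prior is $N(\mu, \Omega)$ over $\mathrm{vec}(\Pi)$ with each equation's coefficients at known indices (the `blocks` layout and the function name are hypothetical conventions):

```python
import numpy as np


def conditional_prior(mu, Omega, blocks, j, pi_prev):
    """Moments of Pi{j} | Pi{1:j-1} implied by a joint Gaussian prior N(mu, Omega).

    blocks[i]: index array of equation i's coefficients inside vec(Pi) (0-based i).
    pi_prev: concatenated values of the coefficients of equations 0..j-1.
    """
    cur = blocks[j]
    if j == 0:
        return mu[cur], Omega[np.ix_(cur, cur)]
    prev = np.concatenate(blocks[:j])
    S_pp = Omega[np.ix_(prev, prev)]
    S_cp = Omega[np.ix_(cur, prev)]
    W = np.linalg.solve(S_pp, S_cp.T).T         # S_cp S_pp^{-1}
    mean = mu[cur] + W @ (pi_prev - mu[prev])
    cov = Omega[np.ix_(cur, cur)] - W @ S_cp.T  # S_cc - S_cp S_pp^{-1} S_pc
    return mean, cov


rng = np.random.default_rng(7)
B = rng.standard_normal((6, 6))
Omega = B @ B.T + 6 * np.eye(6)                 # a dense (cross-equation) joint prior covariance
mu = rng.standard_normal(6)
blocks = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]  # 3 equations, 2 coefficients each
m, C = conditional_prior(mu, Omega, blocks, 2, rng.standard_normal(4))
# For the last block, the conditional precision equals the matching block of Omega^{-1}.
print(np.allclose(np.linalg.inv(np.linalg.inv(Omega)[4:6, 4:6]), C))  # True
```

With a block-diagonal $\Omega$ (the independent prior assumed earlier), $W = 0$ and the conditional moments reduce to each equation's own prior moments.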
Model size, stochastic volatility, and forecasting
Pseudo out-of-sample exercise performed recursively, starting with the estimation sample 1960:3 to 1970:2 and ending with 1960:3 to 2014:5.
We consider four models.
1. A small homoskedastic VAR including the growth rate of industrial production (∆lnIP), the inflation rate based on consumption expenditures (∆lnPCEPI) and the effective Federal Funds Rate (FFR).
2. A large (20 variables) homoskedastic VAR along the lines of Carriero, Clark, and Marcellino (2015), Giannone, Lenza, and Primiceri (2015), and Koop (2013).
3. A small VAR with time variation in volatilities along the lines of Clark (2011), Cogley and Sargent (2005) and Primiceri (2005).
4. A VAR that includes both time variation in the volatilities and a large (20 variables) information set.
Forecasting
Direct effects:
The use of a larger dataset improves point forecasts via a better specification of the conditional means.
The inclusion of time variation in volatilities improves density forecasts via a better modelling of error variances.
Interactions:
A better point forecast improves the density forecast as well, by centering the predictive density around a more reliable mean
Time-varying volatilities improve the point forecasts at longer horizons, because the heteroskedastic model provides more efficient estimates (through a GLS argument) and therefore a better characterization of the predictive densities