Evaluation of
Vector Error Correction Models
in comparison with
Simkins: Forecasting with Vector
Autoregressive (VAR) Models
Subject to Business Cycle Restrictions
Stefan Zeugner
9851051
Seminar paper for:
A. Geyer: Seminar aus Operations Research 3622
Stefan Zeugner
Student at Vienna University of Economics and B.A.
April 2002
e-mail: h9851051@wu.edu
Stefan Zeugner, 9851051 2
Index
Introduction
Simkins’ unrestricted VAR
A VAR in logs
VEC models
Seasonality – A VEC in the Differences
Conclusion
References
Appendix 1: Forecasting Results of the Considered Models
Appendix 2: Model Forecasts for Three Periods in Diagrams
Abstract:
This paper is based on the cited article by Scott Simkins (1995). To produce
macroeconomic forecasts, he constructed a five-variable VAR restricted by common
characteristics of business cycles in a Monte Carlo procedure. Simkins then
evaluated its performance against an unrestricted VAR and a Bayesian VAR and
concluded that his procedure was only marginally superior to an unrestricted VAR
and that a BVAR analysis performed much better in predicting GNP, unemployment
and inflation.
In this paper I will show that a slight improvement in the specification of the
unrestricted VAR and, even more, Vector Error Correction models produce forecasts
able to compete with the BVAR mentioned above.
Introduction
Originally, I was supposed to reproduce the basis of this paper, the article
“Forecasting with Vector Autoregressive (VAR) Models Subject to Business Cycle
Restrictions”.
The author, Prof. Simkins, was primarily interested in an application of business cycle
theory to forecasting of five US quarterly macroeconomic time series: Real GNP,
GNP deflator, unemployment rate, real fixed investment and money supply (M1) from
1948 to 1990.
By applying a turning point procedure (Bry and Boschan), he then distinguished
seven completed business cycles in the data. These cycles were normalized by their
mean and divided into nine stages – the first trough (start), the peak and the second
trough (end) and three successive thirds for the expansion and the contraction phase
– and finally the mean of these stages (plus/minus standard deviation bounds) was
computed for each of the five variables.
Then Simkins estimated an ordinary 6-lag VAR in levels and drew its parameters and
errors from multivariate normal distributions in order to conduct a Monte Carlo
simulation for the whole sample period. The same turning point and stage
procedure as above was then applied to the simulated paths, and only those
matching the “historical” business cycle patterns for each variable (about 10%
of the simulated paths) were retained for dynamic forecasts (1 to 8 steps ahead)
over three arbitrarily chosen periods (1987:3-1989:2, 1988:2-1990:1,
1989:1-1990:4). The author then evaluated the “fit” of these predictions by
Theil’s U statistic for the GNP, deflator and unemployment rate forecasts.
Besides his own restricted VAR, Simkins evaluated two more established VAR
techniques: an ordinary unrestricted VAR and a Bayesian VAR (BVAR) with Minnesota
priors (a technique that imposes prior distributions close to a random walk on the
VAR parameters and obtains the “best” distribution by successive re-estimation of
the VAR’s final distribution by Monte Carlo methods).
As I mentioned above, I was supposed to reproduce the paper and I invested a lot of
energy in understanding the theory of Bayesian and Monte Carlo techniques in order
to calculate the restricted VAR model and the BVAR. But the means I had were
inadequate: there were no courses or experts on the matter at my university, and my
software was limited to EViews, Mathematica, MS Excel and VBA. After three days I
concluded that accomplishing my task would be a matter of weeks rather than days.
Therefore I chose a different path:
Simkins wrote his paper in 1994, when Vector Error Correction (VEC) models
already existed. Nevertheless, he did not consider them, either for evaluating the
performance of his own model or as a basis for applying his procedure. In the following
pages, I will show how I estimated VEC models that even beat the Theil U statistics of
the BVAR. Moreover, I will demonstrate that a slight change in the specification of the
VAR (using the logarithms of the variables instead of their levels) improves its
performance considerably. These types of models are much simpler to estimate than a
BVAR or Simkins’ variant (and they could provide a better basis for applying Simkins’
or Bayesian procedures).
I will first introduce Simkins’ unrestricted VARs and consider some improvements.
Then I will evaluate certain variants of VECs and their different performance
regarding different questions.
Simkins’ unrestricted VAR
Simkins estimated a simple 6-lag VAR with constants, corresponding to a
macroeconomic model by Litterman. Figure 1 shows its estimation output.
                Deflator    Investment  M1          GNP         Unemployment
R-squared       0.999950    0.999396    0.999834    0.996594    0.975708
Adj. R-squared  0.999937    0.999247    0.999793    0.995750    0.969685
Sum sq. resids  6.444210    53068.20    610.8618    9375.864    10.92836
S.E. equation   0.230777    20.94230    2.246874    8.802640    0.300528
F-statistic     80186.42    6677.206    24289.37    1180.301    161.9992
Log likelihood  24.53443    -660.6930   -321.3939   -528.9517   -15.60716
Akaike AIC      0.085073    9.101224    4.636762    7.367785    0.613252
Schwarz SC      0.701786    9.717937    5.253474    7.984498    1.229965
Mean dependent  51.99013    2337.464    256.1728    375.7796    5.724013
S.D. dependent  29.12900    763.0011    156.0972    135.0275    1.726051
Determinant Residual Covariance   102.4292
Log Likelihood                    -1430.210
Akaike Information Criteria       20.85803
Schwarz Criteria                  23.94159
Figure 1: Estimation output of Simkins’ VAR
The model was evaluated against the two others by Theil’s U statistic. This measure
divides the root mean squared error (RMSE) of the model’s forecast by the RMSE of
the naïve forecast, which simply takes the last value in the sample
(the forecast’s starting point) as the prediction for the time series. The closer the
value of Theil’s U statistic is to zero, the better the fit of the underlying forecast. A
value below 1 indicates that the model performs better than the naïve forecast.
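A minimal sketch of this measure in Python, assuming that for each forecast the actual value, the model forecast and the last in-sample value are available. The function names and the sample numbers are purely illustrative and are not taken from Simkins’ paper.

```python
import math

def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def theil_u(actual, forecast, last_observed):
    """Theil's U: RMSE of the model forecast divided by RMSE of the naive forecast.

    actual        -- realised values at the forecast horizon
    forecast      -- model forecasts for the same dates
    last_observed -- last in-sample value at each forecast origin (the naive forecast)
    """
    model_errors = [a - f for a, f in zip(actual, forecast)]
    naive_errors = [a - n for a, n in zip(actual, last_observed)]
    return rmse(model_errors) / rmse(naive_errors)

# Hypothetical numbers: two forecasts started from a last observed value of 100
actual = [102.0, 104.0]
forecast = [101.0, 103.0]
last_observed = [100.0, 100.0]
print(theil_u(actual, forecast, last_observed))  # below 1: better than the naive forecast
```

Feeding the naive forecast itself into `theil_u` returns exactly 1, which is the benchmark against which the values in the tables are read.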
This measure is applied to Real GNP, GNP Deflator and unemployment rate
forecasts – the performance of Simkins’ three models is shown in Figure 2.
The Theil U statistics for the unrestricted VAR were computed from seven forecasts,
each 1 to 8 steps ahead, in the period 1987:3 to 1990:4. Thus the starting periods for
the seven forecasts run from 1987:2 to 1988:4. It can easily be seen that the dynamic
forecasts become less accurate the further they reach beyond their starting period.
Simkins’ method is only marginally superior to the predictions of an unrestricted
VAR, whereas the Bayesian VAR performs much better than both the simple and the
“theoretical” VAR.
Variable           Steps ahead (k)  Unrestricted VAR model  Restricted VAR model  Bayesian VAR model
Real GNP           1                1.062                   1.043                 0.303
                   2                1.227                   1.186                 0.298
                   3                1.198                   1.132                 0.311
                   4                1.158                   1.060                 0.366
                   5                1.124                   0.993                 0.445
                   6                1.116                   0.977                 0.549
                   7                1.135                   1.007                 0.648
                   8                1.157                   1.039                 0.794
GNP Deflator       1                0.551                   0.510                 0.290
                   2                0.651                   0.582                 0.289
                   3                0.721                   0.605                 0.284
                   4                0.786                   0.621                 0.274
                   5                0.838                   0.634                 0.262
                   6                0.869                   0.631                 0.253
                   7                0.900                   0.634                 0.252
                   8                0.941                   0.646                 0.266
Unemployment Rate  1                3.152                   3.117                 0.656
                   2                4.332                   4.212                 0.635
                   3                5.601                   5.342                 0.779
                   4                6.569                   6.124                 0.939
                   5                7.861                   7.160                 1.302
                   6                8.227                   7.406                 1.523
                   7                8.606                   7.750                 1.945
                   8                8.529                   7.775                 2.458
Figure 2: Theil U statistics of Simkins’ three models’ 1-8-step-ahead forecasts
A VAR in logs
However, some problems were not considered in this approach. First, a VAR is a
linear model, i.e. it does not capture non-linear elements, which certainly exist
in the level series of GNP, deflator, money supply and investment (especially concerning
their exponential growth). The easiest way to respond to this problem is to linearize
the data by taking the logs of the levels.
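The effect of the log transformation can be illustrated with a small Python sketch. The series below is hypothetical (a starting value of 100 growing at a constant 2% per period) and is not one of the five data series: its increments in levels keep growing, while its increments in logs are constant, which is the kind of relationship a linear model can capture.

```python
import math

growth = 0.02  # hypothetical constant growth rate per period
level = [100.0 * (1.0 + growth) ** t for t in range(8)]

# Period-to-period changes in levels and in logs
level_steps = [b - a for a, b in zip(level, level[1:])]
log_steps = [math.log(b) - math.log(a) for a, b in zip(level, level[1:])]

print([round(s, 3) for s in level_steps])  # increasing: exponential growth in levels
print([round(s, 5) for s in log_steps])    # constant: linear growth in logs
```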
Second, the lag length of 6 is not the optimal choice if one considers selection
criteria based on the log-likelihood. Estimating the log-VAR with different lag lengths
and comparing the Schwarz criterion (SC) and the Akaike information criterion (AIC)
leads to the conclusion that a VAR with three lags would be optimal: SC and AIC take
their most favourable values for a 2-lag VAR¹, plus one lag to be sure of capturing
additional information (in case the minimum lies between lag two and lag three). One
might also consider lag selection by an LR statistic, which tests the null hypothesis
that adding parameters to the model does not improve it significantly. Given the high
number of added parameters per lag (25, i.e. five equations times five variables, which
is the number of degrees of freedom of the chi-squared distribution for each added
lag), the resulting p-values prefer the 3-lag model to every higher-lag model².
But Simkins wanted to conduct dynamic forecasts up to eight periods ahead.
Considering this aim, more lags would certainly add a bit more “real” information to
far-ahead forecasts, even if their short-term performance suffered. In that
respect, a 6-lag (and maybe an 8-lag) VAR seems to be the best choice because their
log-likelihoods increase considerably over those of lags 5 and 7. Nevertheless, for the
sake of a short paper I will only analyze the three-lag VAR. The estimation output of
such a VAR is shown in Figure 3.
1 The AIC is even slightly more favourable for higher lag numbers (6 and 8).
2 The p-values are computed by the following procedure: one minus the chi-squared
distribution function evaluated at two times the difference between the log-likelihood
of the higher-lag VAR and that of the lower-lag VAR, with as many degrees of freedom
as the difference in the number of parameters between the two. The resulting p-values
are 0.99, 0.567 and 0.687 for the four-, six- and eight-lag VARs versus the three-lag VAR,
respectively.
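The likelihood ratio procedure can be sketched in Python as follows. This is an illustration only, using the standard library: `chi2_sf` and `lr_pvalue` are names chosen here, the upper-tail probability is computed from the closed-form recursion for integer degrees of freedom, and the log-likelihood of the 4-lag VAR is a hypothetical number (only the 3-lag value, 2448.957, appears in Figure 3).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # closed form for df = 1
    # Each pass raises the degrees of freedom by two
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lr_pvalue(loglik_small, loglik_large, extra_params):
    """p-value of the LR test of the smaller model against the larger one."""
    lr = 2.0 * (loglik_large - loglik_small)
    return chi2_sf(lr, extra_params)

ll_3lag = 2448.957   # log-likelihood of the 3-lag log-VAR (Figure 3)
ll_4lag = 2455.0     # hypothetical value, for illustration only
print(lr_pvalue(ll_3lag, ll_4lag, 25))  # one extra lag adds 5 x 5 = 25 parameters
```

A large p-value means that the additional lag does not improve the likelihood enough to reject the smaller model.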
Deflator Investment M1 GNP Unemployment
R-squared 0.999911 0.997028 0.999874 0.999419 0.967683
Adj. R-squared 0.999902 0.996721 0.999861 0.999359 0.964340
Sum sq. resids 0.004090 0.072261 0.006796 0.012190 0.476889
S.E. equation 0.005311 0.022324 0.006846 0.009169 0.057349
F-statistic 109195.5 3243.435 76727.68 16637.24 289.4563
Log likelihood 623.2854 392.1148 582.4173 535.3801 240.2119
Akaike AIC -7.543918 -4.672234 -7.036240 -6.451926 -2.785241
Schwarz SC -7.237691 -4.366007 -6.730013 -6.145700 -2.479014
Mean dependent 3.837678 5.875014 5.425485 7.710055 1.696497
S.D. dependent 0.537401 0.389856 0.580659 0.362215 0.303693
Determinant Residual Covariance 4.22E-20
Log Likelihood 2448.957
Akaike Information Criteria -29.42804
Schwarz Criteria -27.89691
Figure 3: Estimation output for a VAR in logs (3 lags)
As seen in the Theil U table in the appendix³, this log-VAR provides a much better fit
than the original one, a feature that also shows in the covariance matrix of the
residuals: comparing the determinants of the two residual covariance matrices shows a
value above 100 for the original VAR (102.4292) and a value close to zero for the
log-VAR (4.22E-20). Since a singular covariance matrix seems unlikely, the near-zero
value must be due to very small covariances. But these are caused by the transformation
into log units and need not reflect a real improvement of the model. The same goes for
the information criteria: the lower AIC and SC values of the log-VAR cannot be
considered an improvement since the dependent variables have changed.
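How strongly a determinant reacts to the units of measurement can be shown with a small Python sketch. It is only an analogy: the log transformation is not a pure rescaling, and the factor 0.01 is a hypothetical change of units. The 2x2 matrix is the top-left block of the level covariance matrix in Figure 4; the helper names are chosen here.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def rescale(cov, scales):
    """Covariance matrix after multiplying variable i by scales[i]."""
    n = len(cov)
    return [[cov[i][j] * scales[i] * scales[j] for j in range(n)] for i in range(n)]

# Top-left 2x2 block of the level covariance matrix in Figure 4
cov_levels = [[0.046707, 0.321017],
              [0.321017, 62.17747]]

# Hypothetical change of units: both series expressed in hundredths
cov_small_units = rescale(cov_levels, [0.01, 0.01])

ratio = det2(cov_small_units) / det2(cov_levels)
print(ratio)  # about 1e-08, i.e. 0.01 ** (2 * 2)
```

With five variables the same rescaling would shrink the determinant by a factor of 0.01 to the power of ten, without any change in the quality of the model.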
VEC models
A closer look at Figures 1 and 3 shows that the high R² values of the VAR models in
(log-linearized) levels hint at a spurious regression problem. This does not mean that
there is no relationship between our five variables, but part of the R² might only be
due to the correlation of integrated data. A unit-root test on the five variables confirms
this suspicion: not even the coefficient⁴ of the unemployment rate is negative enough
to reject the null hypothesis of an integrated time series! The residuals of the level
VAR are also integrated, whereas the residuals of the log-VAR are not.
3 The Theil U values for the unrestricted VAR in the appendix differ slightly from its Theil U values in
Figure 2. This is because the Theil U statistics in the appendix come from the VAR I reproduced
relying on Simkins’ paper (and computed in MS Excel), whereas Figure 2 is copied from Simkins. The
difference may be attributed to MS Excel’s lower accuracy in matrix operations.
4 The coefficient of a LS regression of the first difference of the unemployment rate versus its value
lagged by one period (plus 4 lagged first differences). The unit-root test carried out was an
Augmented-Dickey-Fuller Test.
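The test regression described in footnote 4 can be sketched in plain Python as follows. This is a sketch under assumptions, not the EViews routine: it includes an intercept (which the footnote does not state), solves the least-squares problem with a small hand-written matrix inversion, and is run on simulated series rather than on the unemployment data. The resulting t-statistic has to be compared with Dickey-Fuller critical values (roughly -2.9 at the 5% level when a constant is included), not with the usual t-distribution.

```python
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def adf_t_stat(y, lags=4):
    """t-statistic on y[t-1] in: dy[t] = c + rho*y[t-1] + sum_i g_i*dy[t-i] + e[t]."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # dy[j] = y[j+1] - y[j]
    rows, target = [], []
    for j in range(lags, len(dy)):
        # regressors: constant, lagged level, then the lagged first differences
        rows.append([1.0, y[j]] + [dy[j - i] for i in range(1, lags + 1)])
        target.append(dy[j])
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [t - sum(b * v for b, v in zip(beta, r)) for r, t in zip(rows, target)]
    s2 = sum(e * e for e in resid) / (len(rows) - k)   # residual variance
    return beta[1] / (s2 * inv[1][1]) ** 0.5

random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(300)]   # stationary series
walk = [sum(noise[:t + 1]) for t in range(300)]        # integrated series (random walk)
print(adf_t_stat(noise), adf_t_stat(walk))
```

For the stationary series the statistic is strongly negative, so the unit-root null is rejected; for an integrated series it usually stays above the critical value.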
Covariance matrix of the VAR in levels
Deflator GNP Investment M1 Unempl.
Deflator 0.046707 0.321017 -0.155511 -0.045697 -0.007788
GNP 0.321017 62.17747 87.38736 3.884951 -0.859825
Investment -0.155511 87.38736 362.9078 4.918358 -2.739014
M1 -0.045697 3.884951 4.918358 5.019088 -0.107527
Unempl. -0.007788 -0.859825 -2.739014 -0.107527 0.073740
Covariance matrix of the VAR in logs
Deflator GNP Investment M1 Unempl.
Deflator 2.54E-05 1.25E-05 6.94E-07 -6.41E-07 -4.84E-05
GNP 1.25E-05 0.000449 4.32E-05 9.43E-05 -0.000508
Investment 6.94E-07 4.32E-05 4.22E-05 1.37E-05 -7.15E-05
M1 -6.41E-07 9.43E-05 1.37E-05 7.57E-05 -0.000276
Unempl. -4.84E-05 -0.000508 -7.15E-05 -0.000276 0.002962
Figure 4: Covariance matrices for Simkins’ VAR and the log-VAR
The first response to this problem would be to estimate a VAR in the first differences
(or the returns, respectively) of our five variables. Nevertheless, some important
information may also be contained in the levels of the data, e.g. the so-called
“productivity slowdown” in US post-war growth rates (the higher the GNP, the lower its
growth). In view of that, a Vector Error Correction (VEC) model would be the right
response: in principle a VAR in first differences, but with error correction
restrictions based on the cointegration concept.
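In its general textbook form, such a model can be written as below. The notation is introduced here for illustration and does not appear in Simkins’ article: y_t is the 5x1 vector of the (log) variables, r the number of cointegration relations, β the 5xr matrix of cointegrating vectors, α the 5xr matrix of adjustment coefficients, Γ_i the 5x5 coefficient matrices of the lagged differences, c a vector of constants and ε_t the error vector.

```latex
\Delta y_t = c + \alpha \beta^{\prime} y_{t-1}
           + \sum_{i=1}^{p} \Gamma_i \, \Delta y_{t-i} + \varepsilon_t
```

Without the term \alpha \beta^{\prime} y_{t-1} the equation reduces to a VAR in first differences; the term feeds deviations from the long-run relations between the levels back into the short-run dynamics.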
In order to know whether a VEC is appropriate, a cointegration test has to be conducted.
Figure 5 summarizes such a test for the number of cointegration relations; the
columns correspond to the five different assumptions concerning the structure of the
VEC equations (the number of lags does not change the outcome significantly).
Based on its output, assumption 4⁵ is selected because it seems reasonable that
imbalances among our four integrated variables (excluding the unemployment rate) may
grow or fall over time. In addition, two cointegration equations
may be more realistic given the character of our time series, and the SC and AIC
of assumption four seem more convincing.
5 The cointegration equation contains constants and a linear trend.
Series: LOG(DEFL) LOG(GNP) LOG(INV) LOG(MONE) LOG(UNEMP)
Lags interval: 1 to 6
Data Trend:    None          None          Linear        Linear        Quadratic
Rank or        No Intercept  Intercept     Intercept     Intercept     Intercept
No. of CEs     No Trend      No Trend      No Trend      Trend         Trend
Akaike Information Criteria by Model and Rank
0              -29.24852     -29.24852     -29.34955     -29.34955     -29.34029
1              -29.34006     -29.46143     -29.55635     -29.55415     -29.55694
2              -29.38920     -29.50464     -29.59161     -29.62532     -29.61359
3              -29.42351     -29.52707     -29.58078     -29.61583     -29.59574
4              -29.40625     -29.49769     -29.48565     -29.55768     -29.54971
5              -29.28650     -29.38928     -29.38928     -29.44938     -29.44938
Schwarz Criteria by Model and Rank
0              -26.42494     -26.42494     -26.43184     -26.43184     -26.32847
1              -26.32823     -26.43078     -26.45041     -26.42938     -26.35687
2              -26.18913     -26.26693     -26.29743     -26.29349     -26.22529
3              -26.03521     -26.08229     -26.09836     -26.07694     -26.01920
4              -25.82971     -25.84585     -25.81499     -25.81172     -25.78493
5              -25.52172     -25.53038     -25.53038     -25.49635     -25.49635
L.R. Test:     Rank = 4      Rank = 4      Rank = 2      Rank = 2      Rank = 2
Figure 5: Johansen Cointegration test summarizing five assumptions
Therefore a VEC with two cointegration equations under assumption four is
estimated, once with 3 lags and once with 6 lags. Figure 7 shows their estimation outputs.
The two cointegration equations yield the same output regardless of which variables
are included in each of them, since they can be transformed linearly. A brief look at
the two lower tables shows that almost all of the variables depend significantly on at
least one cointegration equation. It seems that the trend variable is significant
although it has only a slight impact on the outcome (eliminating it worsens the results
only slightly), and that at least one cointegration equation is justified. In addition the
cointegration relationships provide an opportunity of economic interpretation: If one
looks, e.g., at the 6-lag VEC, what effect does the level of GNP have on GNP
                       3-lag-VEC                 6-lag-VEC
(Standard errors & t-statistics in parentheses)
Cointegrating Eq:      CointEq1     CointEq2     CointEq1     CointEq2
LOG(DEFLATOR(-1))      1.000000     0.000000     1.000000     0.000000
LOG(INVESTMENT(-1))    0.000000     1.000000     0.000000     1.000000
LOG(M_ONE(-1))         -0.210937    -0.575160    -0.466773    -0.062476
                       (0.29787)    (0.23871)    (0.14714)    (0.08027)
                       (-0.70815)   (-2.40943)   (-3.17220)   (-0.77833)
LOG(REALGNP(-1))       6.397353     -6.304075    3.968452     -0.932555
                       (2.88127)    (2.30905)    (1.20802)    (0.65899)
                       (2.22033)    (-2.73016)   (3.28508)    (-1.41513)
LOG(UNEMPL(-1))        0.648705     -0.789048    0.323661     -0.028291
                       (0.40834)    (0.32724)    (0.16503)    (0.09003)
                       (1.58864)    (-2.41119)   (1.96119)    (-0.31425)
@TREND(48:1)           -0.057139    0.048589     -0.035204    -1.13E-05
                       (0.02594)    (0.02079)    (0.01102)    (0.00601)
                       (-2.20279)   (2.33740)    (-3.19345)   (-0.00187)
C                      -48.31831    43.10782     -29.50128    1.703781
The cointegration equations in the 3-lag-VEC model
Error Correction:  D(LOG(DEFLATOR))  D(LOG(INVESTMENT))  D(LOG(M_ONE))  D(LOG(REALGNP))  D(LOG(UNEMPL))
CointEq1           0.012441          -0.074175           0.022298       -0.039480        0.180855
                   (0.00656)         (0.02735)           (0.00871)      (0.01198)        (0.07367)
                   (1.89511)         (-2.71165)          (2.56138)      (-3.29632)       (2.45480)
CointEq2           0.002520          -0.172149           0.002547       -0.037476        0.320545
                   (0.00766)         (0.03193)           (0.01016)      (0.01398)        (0.08601)
                   (0.32881)         (-5.39081)          (0.25066)      (-2.68022)       (3.72691)
The cointegration equations in the 6-lag-VEC model
Error Correction:  D(LOG(DEFLATOR))  D(LOG