JSS Journal of Statistical Software
May 2011, Volume 41, Issue 10. http://www.jstatsoft.org/
State Space Methods in Stata
David M. Drukker
Stata
Richard B. Gates
Stata
Abstract
We illustrate how to estimate parameters of linear state-space models using the Stata
program sspace. We provide examples of how to use sspace to estimate the parameters
of unobserved-component models, vector autoregressive moving-average models, and
dynamic-factor models. We also show how to compute one-step, filtered, and smoothed
estimates of the series and the states; dynamic forecasts and their confidence intervals;
and residuals.
Keywords: state-space, unobserved-components models, local-level model, local-linear-trend
model, basic structural model, dynamic-factor model, vector autoregressive moving-average
model, sspace.
1. Introduction
Stata is a general purpose package for statistics, graphics, data management, and matrix
language programming. Stata’s coverage of statistical areas is one of the most complete
available, with many commands for regression analysis (StataCorp 2009k,l,m), multivariate
statistics (StataCorp 2009i), panel-data analysis (StataCorp 2009h), survey data analysis
(StataCorp 2009n), survival analysis and epidemiology statistics (StataCorp 2009o), and
time-series analysis (StataCorp 2009p). It is used for data management (Mitchell 2010), health
research (Juul and Frydenberg 2010; Cleves, Gould, Gutierrez, and Marchenko 2010), as well
as in economic analysis (Cameron and Trivedi 2009; Baum 2006). Stata is also a programming
language used by researchers to implement and disseminate their methods; see any of the more
than 40 issues of The Stata Journal for examples of peer-reviewed user-written programs and
see StataCorp (2009j,f,g) for Stata’s programming capabilities.
The Stata command sspace, released in version 11, estimates the parameters of linear
state-space models by maximum likelihood (StataCorp 2009e). As demonstrated by Harvey (1989)
and Commandeur, Koopman, and Ooms (2011), linear state-space models are very flexible,
and many linear time-series models can be written as linear state-space models. In this
article, we show how to use sspace to estimate the parameters of linear state-space models.
We also note that Stata has some additional commands, such as dfactor, which provide
simpler syntaxes for estimating the parameters of particular linear state-space models.
Because of this flexibility, sspace has two syntaxes; we call them the covariance-form syntax
and the error-form syntax. They are illustrated by estimating the parameters of a
local-linear-trend model with a seasonal component and a vector autoregressive moving-average
(VARMA) model, respectively. In each syntax, the user must specify one or more state
equations, one or more observation equations, and the stochastic components.
2. Case 1: The local-level model
The local-level model is described by Commandeur et al. (2011, Section 2.1) and we briefly
review it here. The observation and state equations of this model are
\begin{equation}
\begin{aligned}
y_t &= \mu_t + \varepsilon_t, \\
\mu_t &= \mu_{t-1} + \xi_t,
\end{aligned}
\tag{1}
\end{equation}
respectively, where $\varepsilon_t \sim N(0, \sigma^2_\varepsilon)$ and $\xi_t \sim N(0, \sigma^2_\xi)$ and both are independent. We express
the level component at time $t$, $\mu_t$, as a function of that at time $t-1$. This notation is a subtle
change from that in Commandeur et al. (2011), but it is more consistent with the syntax of
Stata’s sspace for describing the model and how sspace executes the state-space recursions
by starting with index 0 instead of 1. The parameters in this model are $\sigma^2_\varepsilon$, $\sigma^2_\xi$, and $\mu_0$.
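To make the roles of the two error terms concrete, the following minimal sketch simulates one path from Equation (1). The seed, the starting level of 1000, and the two standard deviations are illustrative assumptions and are not taken from any dataset or estimate in this article.

```stata
* Simulate one path from the local-level model in Equation (1).
* All numeric values below are illustrative assumptions, not estimates.
clear
set seed 2011
set obs 100
generate t = _n
tsset t

* State equation: mu_t = mu_{t-1} + xi_t, started from an assumed mu_0 = 1000
generate double xi = rnormal(0, 40)
generate double mu = 1000 + sum(xi)

* Observation equation: y_t = mu_t + epsilon_t
generate double y = mu + rnormal(0, 120)

* Plot the noisy observations against the latent level
tsline y mu
```

The running sum of the state errors produces the random-walk level, and the observation error is added only to the observed series, so it does not accumulate over time.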
2.1. Covariance-form syntax
The covariance-form syntax of sspace is as follows:
sspace state_eq [state_eq ... state_eq]
obs_eq [obs_eq ... obs_eq] [if] [in] [, options]
where state_eq are state equations of the form
(statevar [lagged_statevars] [indepvars], state [noerror noconstant
covstate(covform)])
and obs_eq are observation equations of the form
(depvar [statevars] [indepvars] [, noerror noconstant
covobserved(covform)])
A list of state equations, observation equations, and options specifies an sspace model. The
square brackets indicate optional arguments, so the syntax diagram indicates that at least one
state equation and one observation equation are required. Each equation must be enclosed
in parentheses. In Stata parlance, a comma in the command toggles the parser from model
specification mode to options specification mode. Options included within an equation are
applied to that equation. Options specified outside the individual equations are applied to
the model as a whole.
Each state equation specifies the name of a latent variable and must have the state option
specified. A state equation optionally contains a list of lagged state variables and a list of
exogenous covariates. By default, a constant is included in the equation unless the noconstant
option is specified. By default, an error term is included in the equation unless the noerror
option is specified. The option covstate() allows you to specify the covariance structure
of the state equations. The covform in the syntax diagram may be identity, dscalar,
diagonal, or unstructured. The default is diagonal. The option dscalar states that the
covariance is diagonal and that all the variance terms are equal.
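As a worked illustration of the four covariance forms, consider a model with two state errors. The symbols $\sigma^2$, $\sigma^2_1$, $\sigma^2_2$, and $\sigma_{12}$ are notation introduced here for the illustration only.

$$
\text{identity: } \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\text{dscalar: } \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix},
$$
$$
\text{diagonal: } \begin{pmatrix} \sigma^2_1 & 0 \\ 0 & \sigma^2_2 \end{pmatrix}, \qquad
\text{unstructured: } \begin{pmatrix} \sigma^2_1 & \sigma_{12} \\ \sigma_{12} & \sigma^2_2 \end{pmatrix}.
$$

The forms therefore differ in the number of free covariance parameters: none for identity, one for dscalar, two for diagonal, and three for unstructured in this two-error case.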
Each observation equation specifies the name of an observed dependent variable. An observation
equation optionally contains a list of contemporaneous state variables and a list of exogenous
covariates. By default, a constant is included in the equation unless the noconstant option
is specified. By default, an error term is included in the equation unless the noerror option
is specified. The option covobserved() allows you to specify the covariance structure of the
observation equations. The covariance forms are the same as those for the option covstate().
The [if] and the [in] specifications allow you to estimate the parameters using a subsample
of the observations.
The options in the main syntax diagram include model, optimization, and display options.
An important model option is constraints(), which applies the parameter constraints that
identify the model. A popular optimization option is technique(). Two good techniques for
sspace are technique(bhhh), the Berndt-Hall-Hall-Hausman technique, and technique(nr), the
Newton-Raphson technique. Optimization techniques may be mixed, as in the default,
technique(bhhh 5 nr), which specifies the BHHH method for the first 5 iterations and NR for the
remaining iterations. An example of a display option is level(), which allows you to set the
confidence level to something other than the default of 95%.
We clarify this syntax in the following example.
2.2. Estimating the variances of a local-level model using sspace
Here we illustrate the sspace syntax by estimating the parameters of the local-level model
on the well-known Nile dataset containing observations on the annual Nile River flow volume
at Aswan, Egypt, from 1871 to 1970. The Stata command use loads the dataset into memory
and the command describe describes it.
. use http://www.stata.com/ddrukker/nile.dta
(Nile river annual flow volume at Aswan from 1870 to 1970)
The describe command will display a dataset’s size, its variables, their storage type and
format, any labels associated with the variables, sorting information, and any descriptive
information that you have added to document your data.
. describe
Contains data from data/nile.dta
  obs:           100                          Nile river annual flow volume
                                                at Aswan from 1870 to 1970
 vars:             2                          16 Jun 2008 10:49
 size:         1,200 (99.9% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
AFV             long   %12.0g                 Annual Flow Volume
year            long   %ty
------------------------------------------------------------------------------
Sorted by:  year
Stata computes time-series operators of variables using a time variable specified by the tsset
command. Below we specify year to be our time variable; we tsset the data, in Stata
parlance.
. tsset year
time variable: year, 1871 to 1970
delta: 1 year
We could now use sspace to estimate the parameters using the code
constraint define 1 [level]L.level = 1
constraint define 2 [AFV]level = 1
sspace (level L.level, state noconstant) ///
(AFV level, noconstant), ///
constraints(1 2)
While this code is transparent to Stata users, we discuss it in some detail for readers who are
unaccustomed to Stata.
The first two lines define constraints on the model parameters, as discussed below. The third
line begins with the command sspace and is followed by the definition of the state equation
(level L.level, state noconstant)
which is best understood from right to left. The option noconstant specifies that there is no
constant term in the equation; the option state specifies the equation as a state equation;
and the comma separates the options from equation specification. By specifying the equation
as level L.level, we specify level as the name for the unobserved state and we specify that
the state equation is
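$$\text{level}_t = \alpha\,\text{level}_{t-1} + \xi_t$$
where $\xi_t$ is the error term that sspace includes by default because noerror is not specified.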
We use Stata’s lag operator, L. in this example, to model level as a linear function of the
lagged level.
At the end of the third line, the three slashes, ///, denote a line continuation in Stata. In this
example, we see that lines 3, 4, and 5 compose a single Stata command.
The fourth line specifies that the observation equation in the model is
$$\text{AFV}_t = \beta\,\text{level}_t + \varepsilon_t$$
where the $\varepsilon_t$ are independent and identically distributed (IID) normal errors. As in the state
equation above, we used the noconstant option to suppress the constant term.
The model in Equation (1) requires that α = β = 1. Lines 1 and 2 declare these constraints;
on line 5, the option constraints(1 2) applies them to this model.
Repeating the code, we proceed with estimation:
. constraint define 1 [level]L.level = 1
. constraint define 2 [AFV]level = 1
. sspace (level L.level, state noconstant) ///
> (AFV level, noconstant), ///
> constraints(1 2)
searching for initial values ...
(setting technique to bhhh)
Iteration 0: log likelihood = -635.14379
Iteration 1: log likelihood = -633.9615
Iteration 2: log likelihood = -633.60088
Iteration 3: log likelihood = -633.57318
Iteration 4: log likelihood = -633.54533
(switching technique to nr)
Iteration 5: log likelihood = -633.51888
Iteration 6: log likelihood = -633.46465
Iteration 7: log likelihood = -633.46456
Iteration 8: log likelihood = -633.46456
Refining estimates:
Iteration 0: log likelihood = -633.46456
Iteration 1: log likelihood = -633.46456
State-space model

Sample: 1871 - 1970                             Number of obs      =       100
Log likelihood = -633.46456
 ( 1)  [level]L.level = 1
 ( 2)  [AFV]level = 1
------------------------------------------------------------------------------
             |                 OIM
         AFV |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
level        |
       level |
         L1. |          1          .        .       .            .           .
-------------+----------------------------------------------------------------
AFV          |
       level |          1          .        .       .            .           .
-------------+----------------------------------------------------------------
  var(level) |   1469.176   1280.375     1.15   0.251    -1040.313    3978.666
    var(AFV) |   15098.52   3145.548     4.80   0.000     8933.358    21263.68
------------------------------------------------------------------------------
Note: Model is not stationary.
Note: Tests of variances against zero are conservative and are provided only
      for reference.
                     Commandeur et al.
e() result name      notation
e(A)                 T
e(B)
e(C)                 R
e(chol_Q)            Q^{1/2}
e(D)                 Z
e(F)
e(G)
e(chol_R)            H^{1/2}
Table 1: Kalman filter matrices in Stata’s e() results and their Commandeur et al. (2011)
equivalents.
The output table reports that sspace estimates $\sigma^2_\xi$ to be 1,469.2 and $\sigma^2_\varepsilon$ to be 15,098.5.
Having provided a simple example of how to use sspace, we now provide some technical details
about its implementation. sspace uses the Mata optimizer optimize() (StataCorp 2009c).
sspace uses analytic first derivatives, from which it numerically computes the second-order
derivatives necessary for Newton-Raphson optimization. If you are using the multiprocessor
version of Stata (Stata MP), the numerical second derivatives are computed in parallel.
optimize() will not declare convergence until the length of the scaled gradient is smaller
than $10^{-6}$, that is, until $g_k^\top \hat{H}_k^{-1} g_k < 10^{-6}$, where $g_k$ is the gradient on the $k$-th step and
$\hat{H}_k$ is the approximated negative Hessian. The requirement that $\hat{H}_k$ be nonsingular prevents
sspace from declaring convergence when the parameters are not identified, as discussed in
Drukker and Wiggins (2004).
The standard errors are computed from the negative Hessian unless the variance-covariance
option, vce(), specifies otherwise. The OIM in the table header for the standard errors
indicates that the standard errors are computed from the observed information matrix. If
non-normal errors are suspected, use vce(robust) to obtain the Huber-White robust standard
errors (StataCorp 2009q, robust).
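As a hedged illustration of the vce() and level() options, the local-level model could be refit as sketched below. The sketch assumes that the Nile data are in memory and tsset and that constraints 1 and 2 are defined as in Section 2.2; the 90% level is an arbitrary choice made for the illustration.

```stata
* Assumes the Nile data are in memory and tsset, and that
* constraints 1 and 2 are already defined as in Section 2.2.
* vce(robust) requests Huber-White standard errors;
* level(90) reports 90% rather than 95% confidence intervals.
sspace (level L.level, state noconstant)   ///
       (AFV level, noconstant),            ///
       constraints(1 2) vce(robust) level(90)
```

The point estimates should be unchanged by these options; only the reported standard errors and interval widths are expected to differ.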
Stata estimation commands store their results in a memory region called ereturn. The results
may be accessed by the user and are used by other Stata commands, which are referred to as
postestimation commands in Stata parlance. Typing
. ereturn list
lists the results saved in e(). You may view or access any e() result by identifying the object
as e(name), where name is the name of the object.
The matrices saved by sspace are listed in Table 1 along with the Commandeur et al.
(2011, Equations 1 and 2) equivalents.
Mixing both notations, a linear state-space model is
$$
\begin{aligned}
\alpha_t &= T\alpha_{t-1} + Bx_t + R\eta_t \\
y_t &= Z\alpha_t + Fw_t + G\varepsilon_t,
\end{aligned}
$$
where $x_t$ and $w_t$ are column vectors of covariates. The vector $w_t$ may contain lagged
independent variables specified on the left-hand side of observation equations. Commandeur et al.
(2011) incorporate the regression coefficient matrices B and F into the state transition matrix
T and the observation equation matrix Z, respectively.
The Kalman filter recursions are initialized with $\alpha_1 = T\alpha_0 + Bx_1$.
In this example the matrices are all 1 × 1, and we have e(A) = 1, e(D) = 1,
e(chol_Q) = $\sqrt{\text{var(level)}}$, and e(chol_R) = $\sqrt{\text{var(AFV)}}$. The remaining matrices do not exist for this
model.
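As a check on this mapping, the general form can be specialized to the local-level model. The sketch below takes $\alpha_t = \mu_t$ and assumes that the absent covariate matrices drop out and that the absent error-loading matrices act as identities; that last point is an assumption made here for illustration.

$$
\begin{aligned}
\alpha_t &= 1 \cdot \alpha_{t-1} + \eta_t, & \operatorname{Var}(\eta_t) &= \texttt{e(chol\_Q)}^2 = \sigma^2_\xi, \\
y_t &= 1 \cdot \alpha_t + \varepsilon_t, & \operatorname{Var}(\varepsilon_t) &= \texttt{e(chol\_R)}^2 = \sigma^2_\varepsilon .
\end{aligned}
$$

With $\eta_t$ playing the role of $\xi_t$, these two lines are Equation (1), which is why constraining e(A) and e(D) to 1 leaves only the two variances to be estimated.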
Stata’s sspace uses the square-root filter to numerically implement the Kalman filter
recursions (DeJong 1991b; Durbin and Koopman 2001, Section 6.3). Moreover, when the model
is not stationary, as is the case here, the filter is augmented as described by DeJong (1991a),
DeJong and Chu-Chun-Lin (1994), and Durbin and Koopman (2001, Section 5.7). The two
techniques are used together to evaluate the likelihood (DeJong 1988) and to provide
maximum likelihood (ML) estimates of the parameters of the state-space model. The techniques
also provide an estimate of the initial state. The initial state, $\alpha_0 = \mu_0$, is diffuse and is
modeled as $\text{var}(\mu_0) \to \infty$ and $E[\mu_0] = \delta$. The ML estimate of $\delta$ is 1120.0. This quantity is not
reported by sspace, but is stored as e(d).
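The stored matrices discussed above can be inspected directly. The following sketch assumes that the local-level sspace results from Section 2.2 are still in memory; the matrix names cQ, cR, Q, and H are arbitrary names chosen here, and the sketch assumes that e(d) is stored as a matrix.

```stata
* Assumes the local-level sspace results from Section 2.2 are in memory.
* Transition and observation matrices (both 1 x 1, constrained to 1)
matrix list e(A)
matrix list e(D)

* Square the Cholesky factors to recover the variance estimates
matrix cQ = e(chol_Q)
matrix cR = e(chol_R)
matrix Q = cQ*cQ'
matrix H = cR*cR'
* Q should correspond to var(level) and H to var(AFV) in the output table
matrix list Q
matrix list H

* Estimated mean of the diffuse initial state
* (assumed here to be stored as a matrix)
matrix list e(d)
```

Copying each e() matrix into a named Stata matrix before multiplying keeps every step explicit and leaves the stored estimation results untouched.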
We can obtain predictions using the predict command, after estimating the parameters. All
the standard objects and their standard errors can be predicted using predict after sspace.
These objects and the syntax for predict after sspace are discussed in StataCorp (2009d).
2.3. Case 1 postestimation
With the local-level model estimates still in memory we predict the smoothed trend of the
Nile annual flow volume using the DeJong (1989) diffuse Kalman filter. Here we use the rmse
option to obtain the smoothed trend root-mean-square error (RMSE) that is subsequently
used to compute 90% confidence intervals. A second call to predict obtains the standardized
residuals. We graph the series, trend, and trend confidence intervals in one graph and the
standardized residuals in a second graph. We then combine the two graphs into one and allow
it to render. This graph is displayed in Figure 1.
. predict trend, state equation(level) smethod(smooth) rmse(rmse)
.
. scalar z = invnormal(.95)
. gen lb = trend - z*rmse
. gen ub = trend + z*rmse
.
. predict res, rstandard
.
. twoway (tsline AFV trend) (tsrline lb ub), tlabel(1870(50)1970) ///
> ytitle(Annual Flow Volume) name(AFV) nodraw legend(off)
.
. tsline res, yline(3 -3) yline(0) tlabel(1870(50)1970) name(RES) nodraw
.
. graph combine AFV RES, name(AFVR) rows(2)
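The smethod() option of predict also selects one-step and filtered estimates of the state, in addition to the smoothed estimates used above. The sketch below assumes that the local-level results are still in memory; the new variable names level_1, level_f, and level_s are arbitrary choices made here.

```stata
* Assumes the local-level sspace results are still in memory.
* One-step, filtered, and smoothed estimates of the level state
predict level_1, states equation(level) smethod(onestep)
predict level_f, states equation(level) smethod(filter)
predict level_s, states equation(level) smethod(smooth)

* Compare the three estimates against the observed series
tsline AFV level_1 level_f level_s
```

The one-step estimate at time t uses data through t-1, the filtered estimate uses data through t, and the smoothed estimate uses the full sample, so the three series are expected to become progressively smoother.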
Figure 1: In the upper panel we display the Nile annual flow volume time series (blue) with
smoothed trend estimates (red) and trend 90% confidence intervals. The lower panel displays
the standardized residuals.
Next, we demonstrate forecasting. First we use the preserve command to save the original
dataset. We then extend the data by 10 years using the tsappend command. We compute
the one-step predictions, compute dynamic forecasts from 1971 to 1980, and compute the
RMSEs for the predictions and forecasts. We then compute the 50% confidence
intervals for the forecasts and graph the results. Finally, we restore the original dataset. The
graph is shown in Figure 2.
. preserve
. tsappend, add(10)
. predict flow, dynamic(1971) rmse(rflow)
. scalar z = invnormal(.75)
. gen lb = flow - z*rflow
(1 missing value generated)
. gen ub = flow + z*rflow
(1 missing value generated)
. twoway (tsline AFV flow) (tsrline lb ub if year>=1970), ///
> tlabel(1870(10)1980) ytitle(Annual Flow Volume) name(FOR1) xline(1970) ///
> legend(label(1 "AFV") label(2 "predicted/forecast") label(3 "50% CI"))
. restore
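The critical values used for the 90% and 50% intervals both come from the same rule: a two-sided interval with coverage c uses the normal quantile at 1 - (1 - c)/2. A minimal sketch, in which the local macro name level is an arbitrary choice made here:

```stata
* Two-sided normal critical value for a given coverage level
local level = 0.90
scalar z = invnormal(1 - (1 - `level')/2)
display "z for a 90% interval: " z

local level = 0.50
scalar z = invnormal(1 - (1 - `level')/2)
display "z for a 50% interval: " z
```

The two values are approximately 1.645 and 0.674, which correspond to invnormal(.95) and invnormal(.75) in the commands above.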
Figure 2: The Nile river annual flow volume (blue), one-step predictions and dynamic forecasts
(red), and forecast 50% confidence intervals.
3. Case 2: A local-linear-trend model
In this section we review the structure of a local-linear-trend model with an autoregressive
component, AR(1), and a seasonal component. The state-space form of a time-domain
seasonal component is described in Commandeur et al. (2011, Section 2.1). Our state-space
model is
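\begin{align}
\mu_t &= \mu_{t-1} + \nu_{t-1} + \xi_t, \tag{2} \\
\nu_t &= \nu_{t-1}, \tag{3} \\
\eta_t &= \phi\,\eta_{t-1} + \zeta_t, \tag{4} \\
\gamma_{1,t} &= -\gamma_{1,t-1} - \gamma_{2,t-1} - \gamma_{3,t-1} + \omega_t, \tag{5} \\
\gamma_{2,t} &= \gamma_{1,t-1}, \tag{6} \\
\gamma_{3,t} &= \gamma_{2,t-1}, \tag{7} \\
y_t &= \mu_t + \eta_t + \gamma_{1,t}, \tag{8}
\end{align}
where $\zeta_t \sim \text{NID}(0, \sigma^2_\zeta)$, $\xi_t \sim \text{NID}(0, \sigma^2_\xi)$, and $\omega_t \sim \text{NID}(0, \sigma^2_\omega)$.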
Equation (8) is the observation equation and it depends on the states µ (the linear trend),
η (the AR(1) term), and γ1 (the seasonal component). The observation equation has no
error term. The model has six state equations: two for the linear trend, one for the AR(1)
component and three for the seasonal component.
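To connect Equations (2) through (8) with the general linear state-space form, the model can be stacked into matrices. The sketch below orders the states as $\alpha_t = (\mu_t, \nu_t, \eta_t, \gamma_{1,t}, \gamma_{2,t}, \gamma_{3,t})^\top$ and reads Equations (6) and (7) as one-period shifts of the seasonal states, as the lagged-state form requires. The error vector $e_t = (\xi_t, \zeta_t, \omega_t)^\top$ is notation introduced here to avoid a clash with the AR(1) state $\eta_t$.

$$
T = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & \phi & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & -1 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}, \qquad
R = \begin{pmatrix}
1 & 0 & 0 \\
0 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}, \qquad
Z = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 0 \end{pmatrix},
$$

so that $\alpha_t = T\alpha_{t-1} + Re_t$ and $y_t = Z\alpha_t$. The only free element of $T$ is $\phi$; every other entry is fixed at 0, 1, or $-1$, and the rows of $R$ that are zero correspond to the state equations without an error term.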
3.1. Estimating parameters of the local-linear-trend model using sspace
We now use sspace to estimate the parameters of a local-linear-trend model with an AR(1)
component and a seasonal component. We fit this model to quarterly data on the food and
tobacco production (FTP) in the United States for the years 1947 to 2000. Cox (2009) uses
the dataset to demonstrate graphing seasonal time-series data in Stata.
First we read the dataset into memory and describe it:
. use http://www.stata.com/ddrukker/ftp.dta
(Food and tobacco production in the United States for 1947-2000)
. describe
Contains data from data/ftp.dta
obs: 216 Food and tobacco production in
the United States for
1947-2000
vars: 2 11 Jan 2010 10:02
size: 2,592 (99.9