ApEc 8212 Econometric Analysis --- Lecture #5
Instrumental Variables (Part 2)
I. Problems with Weak Instruments in IV Estimates
The justification for all the IV formulas in Lecture 4
is asymptotic; sample sizes have to be “large” for IV
to work. In small samples, IV estimates tend toward
OLS estimates as the number of instruments goes up.
(They are identical if the number of instruments = the
number of observations.) “Weak” IVs, that is, IVs
with low predictive power for the endogenous
variables, cause several problems. This lecture
reviews these problems.
A recent review of most (but not all) of the material in
this lecture is Cameron and Trivedi (2004, sect. 4.9).
Bound, Jaeger and Baker (1995)
Consider a simple IV model with only one x variable,
which you suspect is endogenous. For simplicity,
assume that the means of y, x and z equal zero, so
that we can ignore the constant terms:

y = βx + ε
x = z′π + ν

Note that z can contain several variables. Calculate
β̂_OLS and β̂_IV using the standard formulas.
Asymptotic Problems when Corr(ε, z) ≠ 0
Bound et al. show (and you should be able to show)
that:
plim(β̂_OLS) = β + σ_{ε,x}/σ²_x

plim(β̂_IV) = β + σ_{ε,x̂}/σ²_{x̂}, where x̂ = z′π
In each expression, the second term shows the
potential inconsistency. If any of the variables in z
are correlated with ε, then x̂ will be correlated with ε,
so that the numerator in the second term in plim(β̂_IV)
will not be zero. More importantly, the size of this
inconsistency will depend on the denominator of that
second term. The better the z’s are at predicting x,
the larger that denominator will be (the total variance
of x is fixed, and the better the first-stage regression
fits, the more of the variance of x is due to the
variance of x̂ and the less is due to the variance of ν).
So the weaker one’s instruments are, the greater the
inconsistency if Assumption 2SLS.1 (E[z′ε] = 0) does
not hold. This result can be (partially) quantified by
defining the inconsistency in β̂_OLS and β̂_IV as follows:

Incons(β̂_OLS) = β − plim(β̂_OLS)
Incons(β̂_IV) = β − plim(β̂_IV)

Define the relative inconsistency of β̂_IV (relative to
the inconsistency in β̂_OLS) as Incons(β̂_IV)/Incons(β̂_OLS).
Using the expressions for the plims of β̂_OLS and
β̂_IV given above, the relative inconsistency is:
Relative inconsistency of β̂_IV = (σ_{ε,x̂}/σ_{ε,x}) / R²_{x,z}

where R²_{x,z} is the R-squared from regressing x on z.
In fact, it will often be the case that there are other x
variables. Assume that those other variables can all
be considered exogenous (uncorrelated with the error
term). Recall that efficient estimation implies that
they should all be used as instruments. In this case
you need to replace R²_{x,z} in the above formula with
the “partial R² coefficient”, which is calculated as
follows:
1. Regress the potentially endogenous variable in x
on all the other variables in x, and save the
residuals (call them ex).
2. Regress each of the variables in z that are not
part of the exogenous variables in x on the
exogenous variables in x, and save each set of
residuals (call them ez).
3. Regress ex on ez (all of the residual sets at
once). The R² from this regression replaces
R²_{x,z} in the above formula.
This partial R² coefficient measures the correlation
between the part of x (the endogenous variable) that
is not correlated with the other variables in x and the
parts of the z variables that are not correlated with the
other variables in x (the other x variables have been
“partialed out” of both x and z using a linear
projection).
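The three-step recipe above can be sketched in code with simulated data (a minimal illustration; the function, variable names, and parameter values are my own, not from Bound et al.):

```python
import numpy as np

def partial_r2(x_endog, X_exog, Z_excl):
    """Partial R^2 of the excluded instruments for one endogenous
    regressor, after partialing out the exogenous x variables.
    X_exog should include the constant."""
    def resid(v, A):
        # residuals from an OLS regression of v on the columns of A
        return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]
    e_x = resid(x_endog, X_exog)      # step 1
    e_z = resid(Z_excl, X_exog)       # step 2 (handles all columns at once)
    fit = e_z @ np.linalg.lstsq(e_z, e_x, rcond=None)[0]  # step 3
    return 1.0 - np.sum((e_x - fit) ** 2) / np.sum(e_x ** 2)

# Simulated example: one endogenous x, one exogenous x, two instruments
rng = np.random.default_rng(0)
n = 2000
x2 = rng.normal(size=n)
X_exog = np.column_stack([np.ones(n), x2])
Z = rng.normal(size=(n, 2))
x1 = 0.8 * Z[:, 0] + 0.5 * x2 + rng.normal(size=n)
r2p = partial_r2(x1, X_exog, Z)   # population value here is 0.64/1.64
```

Only the first instrument actually predicts x1, so the partial R² settles near 0.39 rather than near 1.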
To see how to use this formula, suppose that you
“guess” that the numerator is 0.1, that is, that
instrumenting x reduces the covariance between x
and ε by 90%. This seems promising. But suppose
that your instruments are weak, in that the R² of the
regression of x on the instruments (z) is only 0.10. In
this case the relative inconsistency is 0.1/0.1 = 1, so
IV is no less inconsistent than OLS.
Finite Sample Problems when Corr(ε, z) = 0
Suppose that your instruments are perfect in the sense
that E[ε| z] = 0. However, in finite samples weak
instruments can still lead to bias.

First define τ² as π′Z′Zπ (τ², or τ²/σ²_ν, is sometimes
called the “concentration parameter”). The bigger τ²
is, the bigger the variance in x̂; i.e. the better job the
instruments z do of predicting x. An approximation
of the bias in β̂_IV in finite samples is:

(ρ_{ε,ν}/τ²)(K − 2)

where ρ_{ε,ν} is the correlation between ε and ν, and K is
the number of instruments. This is valid only if K > 2.

Bound et al. show that 1/(1 + τ²/K) is approximately
equal to the magnitude of the finite sample bias in β̂_IV
relative to that in β̂_OLS. That is:

(bias in β̂_IV)/(bias in β̂_OLS) ≈ 1/(1 + τ²/K)

Note that both estimators are biased in the same
direction, since τ² (= π′Z′Zπ) must be > 0.
It turns out that the F statistic in the first stage
regression (the F statistic in a regression of x on the
instruments z) has an expected value of approximately
(1 + τ²/K). This implies that:

(bias in β̂_IV)/(bias in β̂_OLS) ≈ 1/F(first-stage regression)
The “weakness” of instruments in explaining the
potentially endogenous variable can be measured by
this F-statistic. So, for example, if you get an F-
statistic not much bigger than 1 (e.g. 1.2), your IV
estimates will be almost as biased as your OLS
estimates. In contrast, if you get an F-statistic of 10,
the IV bias is only about one tenth of the OLS bias,
which would be a clear improvement.
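A small Monte Carlo sketch of this 1/F relationship (parameter values are invented for illustration, and the rule is only an approximation):

```python
import numpy as np

# Weak-instrument design: K instruments, each with a small coefficient,
# and errors eps, v that are strongly correlated; true beta = 1.
rng = np.random.default_rng(1)
n, K, reps, beta = 200, 5, 2000, 1.0
pi = np.full(K, 0.1)   # concentration parameter tau^2 is about n*K*0.01 = 10

bias_ols, bias_iv, f_stats = [], [], []
for _ in range(reps):
    Z = rng.normal(size=(n, K))
    v = rng.normal(size=n)
    eps = 0.9 * v + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
    x = Z @ pi + v
    y = beta * x + eps
    xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage fits
    bias_ols.append(x @ y / (x @ x) - beta)           # OLS slope minus truth
    bias_iv.append(xhat @ y / (xhat @ x) - beta)      # 2SLS slope minus truth
    # First-stage F statistic for H0: pi = 0 (no constant in this design)
    rss = np.sum((x - xhat) ** 2)
    f_stats.append(((x @ x - rss) / K) / (rss / (n - K)))

ratio = np.mean(bias_iv) / np.mean(bias_ols)
# ratio should be close to 1/mean(F); both biases have the same sign
```

With these made-up parameters the average first-stage F is around 3, and the IV bias is correspondingly about a third of the OLS bias.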
Bound et al. examined a study by Angrist and
Krueger (1991). They found evidence of bias even
though the sample size was huge (about 330,000
observations). In addition, they found that they could
get very similar results with instruments generated
from random numbers. The evidence that something
was wrong in the Angrist and Krueger results was the
low F-statistics in some of their results.
Practical Implications:
1. To check for finite sample bias when you
“know” that E[ε|Z] = 0, look at the F-test of a
regression of x on z. If it is close to 1, your
estimates may be very bad. If it is much higher,
at least 5 or 10, you don’t have much to worry
about in terms of small sample bias.
2. This F-test procedure to check for finite sample
bias applies to the simplest case, with only one x
variable. More generally, suppose that there are
several x variables but only one is potentially
endogenous. The appropriate F-test in this case is
one from a regression of the sole potentially
endogenous variable on the excluded instruments
only (the variables in z that are not part of the
other x variables). For more than one endogenous
variable, see Angrist & Pischke (2008, pp.217-8).
3. To check for possible inconsistency due to
correlation between ε and z, look at the R² in the
regression of x on the instrumental variables z.
Compare this to a “guesstimate” of the reduction
in inconsistency brought about by the use of IV
estimation (i.e. look at (σ_{ε,x̂}/σ_{ε,x})/R²_{x,z}). If you
think that this ratio is much closer to 0 than to 1,
then it is better to use IV than to use simple OLS.
4. Point 3 is for the simple case where there is only
one x variable. If there are other x variables, and
they are all assumed to be exogenous, then you
need to calculate the “partial R²” and replace
R²_{x,z} with it in the formula given above.
5. Strictly speaking, this inconsistency test applies
only to the case where only one of the x
variables may be endogenous. The more general
case is discussed in Shea (1997).
Shea (1997) (checking for the degree of inconsistency
when Corr(ε, z) ≠ 0)
The Bound et al. (1995) paper derived its two “tests”
based on a model in which only one of the
explanatory variables may be endogenous. Yet in
many cases we may have more than one explanatory
variable that may be endogenous. It turns out that the
“R2” test of Bound et al. can be misleading in this
case. To see this, consider a model with many x
variables in which more than one could be
endogenous:
y = x′β + ε

where x is a vector of K variables. Assume also that
we have z, a vector of L instrumental variables (some
of which could be variables in x that are “known” to
be uncorrelated with ε). “Adapting” the Bound et al.
recommendations, you could regress each potentially
endogenous variable in x on the instruments in z and
check the R2. (This is for the case where there are no
exogenous variables in x; if there are some
exogenous variables then you should calculate the
partial R2.) A small R2 is a sign of trouble. In this
case, one checks each variable in x separately.
To see the intuition for why this may be misleading,
consider the case where both x and z contain two
variables. Suppose that z1 is highly correlated with
both x1 and x2 but that z2 is completely uncorrelated
with both x1 and x2. This is clearly a bad situation
because there are two variables to instrument but
there is only one good instrument. In this case IV
estimation cannot be used because you need at least
as many “good” instruments as you have variables
that need instrumenting (why?). However,
“adapting” the Bound et al. R2 test will not catch this
problem because it looks at x1 and x2 separately.
Shea suggests the following approach. Consider the
regression:
y = x₁β₁ + x₂′β₂ + ε

where x₁ is the first variable in x and x₂ contains the
other K−1 variables in x. (Note: this allows for the
possibility that some or even all of the x variables are
endogenous.)
Switch to matrix notation. Define:

x̃₁ = x₁ − X₂(X₂′X₂)⁻¹(X₂′x₁)
x̂₁ = Z(Z′Z)⁻¹Z′x₁
X̂₂ = Z(Z′Z)⁻¹Z′X₂
x̄₁ = x̂₁ − X̂₂(X̂₂′X̂₂)⁻¹(X̂₂′x̂₁)

The N×1 vector x̃₁ is the component of the N×1
vector x₁ that is orthogonal to the N×(K−1) matrix X₂,
while x̂₁ and X̂₂ are linear projections of x₁ and X₂,
respectively, on the N×L matrix Z (i.e. they are least
squares predictions of x₁ and X₂ using the variables
in Z as the regressors). Finally, x̄₁ is the component
of x₁’s projection on Z that is orthogonal to X₂’s
projection on Z. (Intuitively, x̄₁ is the “ability” of Z
to predict x₁ beyond its ability to predict X₂.)
Suppose we estimate y = x₁β₁ + x₂′β₂ + ε using IV
(2SLS) with z as instruments for x. One can show
using formulas for partitioned matrices that:

β̂₁(IV) = (x̄₁′x̄₁)⁻¹(x̄₁′y) = β₁ + (x̄₁′x̄₁)⁻¹(x̄₁′ε)

One can go further to show:

plim(β̂₁(IV) − β₁) =
plim(β̂₁(OLS) − β₁)·[Cov(x̄₁, ε)/Cov(x̃₁, ε)]/R²_p

where R²_p is the square of the correlation between x̄₁
and x̃₁.

It is clear from this equation that plim(β̂₁(IV) − β₁) = 0
if z is uncorrelated with ε, because this implies that
Cov(x̄₁, ε) = 0 (why?).
Now suppose that there is at least a little bit of
correlation between z and ε. If it is “just a little”,
then IV is still better than OLS even though it is not
quite consistent. In addition, Shea makes the point
that for each potentially endogenous variable (e.g. x₁)
we need to be sure that there are some components of
z that predict x₁ and are linearly independent of the
components that are needed to predict x₂. This is
what R²_p measures.
Shea suggests the following steps, which you need to
carry out for each potentially endogenous variable in
x (each one has a “turn” to be x₁):

1. Regress x₁ on z. Save the fitted values x̂₁.
2. Regress x₁ on the other variables in x. Save the
residuals x̃₁.
3. Regress x̂₁ on the other variables in x̂. Save the
residuals x̄₁.
4. Compute R²_p as the square of the correlation
between x̃₁ and x̄₁.
5. Shea also suggests a finite sample correction for
R²_p: R̄²_p = 1 − [((N−1)/(N−L))(1 − R²_p)], where L
is the number of instruments (not just excluded
instruments, but all instruments) and N is the
number of observations.
6. Use this R²_p in place of the “R²” of Bound et al.
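Steps 1-4 can be sketched as follows (my own illustrative code with simulated data; it also reproduces the two-instrument example above, where the per-variable R² looks fine but Shea's R²_p catches the problem):

```python
import numpy as np

def fit(A, v):
    """OLS fitted values from regressing v on the columns of A."""
    return A @ np.linalg.lstsq(A, v, rcond=None)[0]

def shea_r2(X, Z, j):
    """Shea's partial R^2 for column j of the endogenous matrix X,
    following steps 1-4 (no finite-sample correction)."""
    x1, X2 = X[:, j], np.delete(X, j, axis=1)
    x1_hat = fit(Z, x1)                       # step 1
    x1_tilde = x1 - fit(X2, x1)               # step 2
    x1_bar = x1_hat - fit(fit(Z, X2), x1_hat) # step 3
    c = np.corrcoef(x1_tilde, x1_bar)[0, 1]   # step 4
    return c ** 2

rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=(n, 2))
# Bad case: z1 drives BOTH endogenous variables, z2 drives neither
x1_bad = Z[:, 0] + rng.normal(size=n)
x2_bad = Z[:, 0] + rng.normal(size=n)
rp2_bad = shea_r2(np.column_stack([x1_bad, x2_bad]), Z, 0)
# Good case: each endogenous variable has its own instrument
x1_good = Z[:, 0] + rng.normal(size=n)
x2_good = Z[:, 1] + rng.normal(size=n)
rp2_good = shea_r2(np.column_stack([x1_good, x2_good]), Z, 0)
# rp2_bad is near zero even though x1_bad's plain R^2 on Z is about 0.5
```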
Godfrey (1999, ReStat) pointed out two things about
Shea’s paper:
1. There is an error in Shea’s equations (6) and (7):
the expressions for the σ’s are in fact the
variances, not the standard errors. (Note also
that Shea’s equation is somewhat misleading
because it assumes that the estimates of σ² from
OLS and IV are the same, whereas in fact they
will be different.)

2. Most importantly, there is an easier way to
calculate Shea’s R²_p:

R²_p = [var(β̂₁(OLS))/var(β̂₁(IV))]·(s²_IV/s²_OLS)

where s²_OLS = (1/N)Σᵢ(yᵢ − xᵢ′β̂_OLS)² and
s²_IV = (1/N)Σᵢ(yᵢ − xᵢ′β̂_IV)², with the sums
running over i = 1, …, N.
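Godfrey's shortcut can be sketched as follows (my own implementation, using homoskedastic variance formulas and s² computed with 1/N; the design is invented for illustration):

```python
import numpy as np

def godfrey_rp2(y, X, Z, j):
    """Shea-type partial R^2 for regressor j via Godfrey's shortcut:
    ratio of estimated coefficient variances times the ratio of s^2's.
    Z must contain ALL instruments, including any exogenous x's."""
    n = len(y)
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fits
    b_iv = np.linalg.lstsq(Xhat, y, rcond=None)[0]    # 2SLS
    s2_ols = np.sum((y - X @ b_ols) ** 2) / n
    s2_iv = np.sum((y - X @ b_iv) ** 2) / n           # residuals use X, not Xhat
    var_ols = s2_ols * np.linalg.inv(X.T @ X)[j, j]
    var_iv = s2_iv * np.linalg.inv(Xhat.T @ Xhat)[j, j]
    return (var_ols / var_iv) * (s2_iv / s2_ols)

rng = np.random.default_rng(3)
n = 2000
Z = rng.normal(size=(n, 3))
x1 = Z[:, 0] + 0.3 * Z[:, 2] + rng.normal(size=n)
x2 = Z[:, 1] + rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)
rp2 = godfrey_rp2(y, np.column_stack([x1, x2]), Z, 0)
```

Note that the two s² ratios cancel against the s² terms inside the variance estimates, so this only requires the standard OLS and 2SLS output.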
One last point on Shea’s paper: Z could contain
elements of X that are “known” to be exogenous.
II. More Weak IV Results
A. Stock and Yogo (2005)
This paper focuses on the case where the instruments
are valid in the sense that they are uncorrelated with
the error term in the equation of interest. Thus it
focuses on bias in finite samples. It makes two
contributions:
1. It gives two distinct definitions of weak
instruments, and shows how test statistics differ
for those two definitions.
2. It considers the case of more than one
endogenous variable, and gives a more precise
procedure than that given in the seminal paper
by Staiger and Stock (Econometrica, 1997).
The Model

y = Yβ + Xγ + u
Y = ZΠ + XΦ + V

where there are n variables in Y, K₁ in X, and K₂ in
Z, and the sample size is T. For future reference,
define Ȳ = [y Y] and Z̄ = [X Z].
There is a clever trick that allows us to “partial out”
the X variables from both equations, which simplifies
the exposition. Let the superscript “⊥” denote
residuals from the projection of any variable or
variables on X. For example Y⊥ = M_X·Y, where M_X
= I − X(X′X)⁻¹X′. You should be able to show that
the OLS estimator of β, which can be denoted as
β̂_OLS, is given by:

β̂_OLS = (Y⊥′Y⊥)⁻¹Y⊥′y⊥

Next, to be very general, we define the “k-class” set
of estimators of β, which includes β̂_2SLS as well as
other estimators, as:

β̂_k-class = [Y⊥′(I − k·M_Z̄)Y⊥]⁻¹Y⊥′(I − k·M_Z̄)y⊥

where k indicates the type of k-class estimator (e.g.
setting k = 1 yields β̂_2SLS).
The Wald statistic to test the hypothesis that β = β₀ is:

W_k-class = (β̂_k-class − β₀)′[Y⊥′(I − k·M_Z̄)Y⊥](β̂_k-class − β₀)/(n·σ̂_{uu,k-class})

where σ̂_{uu,k-class} = (û_k-class′û_k-class)/(T − K₁ − n), with
û_k-class = y⊥ − Y⊥β̂_k-class.
Note: Stock and Yogo consider four specific types of
estimators: 2SLS, LIML, modified LIML, and bias-
adjusted 2SLS. We will only consider 2SLS, so we
will be setting k = 1.
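To make the notation concrete, here is a sketch (simulated data, my own parameter values) checking that the k-class formula with k = 1 reproduces textbook 2SLS via the partialed-out variables:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])   # K1 = 2 exogenous
Zx = rng.normal(size=(T, 3))                            # K2 = 3 instruments
v = rng.normal(size=T)
u = 0.7 * v + rng.normal(size=T)                        # endogeneity
Y = (Zx @ np.array([1.0, 0.5, 0.5]) + X @ np.array([0.2, 0.3]) + v)[:, None]
y = 0.5 * Y[:, 0] + X @ np.array([1.0, -1.0]) + u       # true beta = 0.5

def resid(A, V):
    # residual-maker: V minus its projection on the columns of A
    return V - A @ np.linalg.lstsq(A, V, rcond=None)[0]

# "perp" variables: X partialed out of y and Y
Yp, yp = resid(X, Y), resid(X, y)
Zbar = np.column_stack([X, Zx])
M = np.eye(T) - Zbar @ np.linalg.inv(Zbar.T @ Zbar) @ Zbar.T   # M_Zbar
k = 1.0
A = np.eye(T) - k * M
b_kclass = np.linalg.solve(Yp.T @ A @ Yp, Yp.T @ A @ yp)[0]

# Textbook 2SLS of y on [Y X] with instruments [Zx X], for comparison
W = np.column_stack([Y, X])
What = Zbar @ np.linalg.lstsq(Zbar, W, rcond=None)[0]
b_2sls = np.linalg.lstsq(What, y, rcond=None)[0][0]   # coefficient on Y
```

The two routes agree to machine precision, which is a useful check on the partialing-out algebra.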
The Cragg-Donald statistic is a function of G_T,
which is defined as:

G_T = (Σ̂_VV^(−1/2)′ · Y⊥′P_Z⊥Y⊥ · Σ̂_VV^(−1/2))/K₂

where Σ̂_VV = (Y′M_Z̄Y)/(T − K₁ − K₂) and P_Z⊥ =
Z⊥(Z⊥′Z⊥)⁻¹Z⊥′.

More specifically, the Cragg-Donald statistic, which
can be denoted as g_min, is given by the minimum
eigenvalue of the matrix G_T:

g_min = mineval(G_T)
While this is something of a nuisance to calculate, in the
special case of only one endogenous variable (only one
variable in Y), gmin is simply the F-statistic of the first
stage regression (regression of the sole Y variable on Z).
Staiger and Stock (1997) gave a “rule of thumb” that the
F-test should be ≥ 10, and more generally that gmin
should be ≥ 10. You often see reference to this in
empirical papers that use the F-test to test for weak IVs.
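A sketch of the g_min calculation (simulated data, my own code), checking that with a single endogenous variable it collapses to the first-stage F statistic:

```python
import numpy as np

rng = np.random.default_rng(5)
T, K1, K2 = 400, 2, 4
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Zx = rng.normal(size=(T, K2))
Y = (Zx @ np.full(K2, 0.3) + X @ np.array([0.1, 0.2])
     + rng.normal(size=T))[:, None]          # one endogenous variable (n = 1)

def resid(A, V):
    return V - A @ np.linalg.lstsq(A, V, rcond=None)[0]

Yp, Zp = resid(X, Y), resid(X, Zx)
Pz = Zp @ np.linalg.inv(Zp.T @ Zp) @ Zp.T            # P_{Z-perp}
Zbar = np.column_stack([X, Zx])
Svv = (Y.T @ resid(Zbar, Y)) / (T - K1 - K2)         # Sigma_VV-hat (1x1 here)

def inv_sqrt(S):
    # symmetric inverse square root via an eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

G = inv_sqrt(Svv) @ (Yp.T @ Pz @ Yp) @ inv_sqrt(Svv) / K2
g_min = np.linalg.eigvalsh(G).min()

# First-stage F statistic for H0: all K2 instrument coefficients are zero
rss_u = np.sum(resid(Zbar, Y[:, 0]) ** 2)
rss_r = np.sum(resid(X, Y[:, 0]) ** 2)
F = ((rss_r - rss_u) / K2) / (rss_u / (T - K1 - K2))
# with one endogenous variable, g_min equals this F statistic exactly
```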
But this is too simple, which leads to the other
contribution of the Stock and Yogo paper.
Two Definitions of Weak IVs
1. A set of instruments is “bias weak” if the ratio of the
bias of the IV estimate over the bias of the OLS
estimate exceeds a certain value b, where 0 < b < 1.
Staiger and Stock used b = 0.10, without any
particular justification. In my opinion it could be
much larger, certainly 0.2 and perhaps even 0.5.
2. A set of instruments is “coverage weak” if the
conventional Wald test of size α (e.g. α = 0.05)
based on IV statistics has an actual size that exceeds
some threshold, r, where r > α (e.g. r = 0.10).
Final important note: When using either the F-statistic
(case of one variable in Y) or the somewhat more
troublesome gmin statistic (more than one variable in Y)
you cannot use standard F-test critical values. Instead
you need to use the values presented in Tables given in
Stock and Yogo (see e.g. Table 5.1 for the first definition
of weak IVs and Table 5.2 for the second definition).
B. Andrews and Stock (2007)
This paper is a nice review of the literature (at least of
the literature up to about 2006). One of the most
interesting points is that it argues that instead of testing
for weak IVs we should all simply use statistical tests
that are “robust” to weak IVs. This is analogous to the
recommendation that it is not worth testing for
heteroscedasticity in the error term; instead, just use
a variance-covariance matrix that is robust to
heteroscedasticity. Note that all of the following
discussion focuses on the case with only one endogenous
variable.
The general model in this paper is essentially the same as
that in Stock and Yogo, except that it is limited to the
case of one endogenous variable and the notation is
somewhat different:
y₁ = y₂β + Xγ₁ + u
y₂ = Zπ + Xξ + v₂

where there are n observations, y₁, y₂, u and v₂ are n×1
column vectors, β is a scalar, X is an n×p matrix
(including a constant term), and Z is an n×k matrix.
Note that X has already been “partialed out” of Z, so
each variable in Z has a mean of zero and Z′X = 0. For
now we assume that u and v2 are normally distributed.
Just Identified Model: Anderson-Rubin (AR) test
If your model is just identified (only one instrument),
Andrews and Stock recommend using the test developed
by Anderson and Rubin (1949), or a modified version
that is robust to heteroscedasticity.
We want to test the null hypothesis that β = β₀ for
some β₀. The AR test statistic is simply the standard
F-test of the hypothesis that κ = 0 in the following
regression:

y₁ − β₀y₂ = Zκ + Xγ + u
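A sketch of the AR test on a simulated just-identified example (my own code; this is the homoskedastic version of the test):

```python
import numpy as np

def ar_stat(beta0, y1, y2, Z, X):
    """Anderson-Rubin F statistic for H0: beta = beta0.
    Regress y1 - beta0*y2 on [Z X] and F-test that the Z
    coefficients are all zero."""
    dep = y1 - beta0 * y2
    n, k, p = len(dep), Z.shape[1], X.shape[1]
    full = np.column_stack([Z, X])
    rss_u = np.sum((dep - full @ np.linalg.lstsq(full, dep, rcond=None)[0]) ** 2)
    rss_r = np.sum((dep - X @ np.linalg.lstsq(X, dep, rcond=None)[0]) ** 2)
    return ((rss_r - rss_u) / k) / (rss_u / (n - k - p))

rng = np.random.default_rng(6)
n, beta = 500, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 1))             # one instrument: just identified
v2 = rng.normal(size=n)
u = 0.8 * v2 + rng.normal(size=n)
y2 = Z[:, 0] + X @ np.array([0.2, 0.1]) + v2
y1 = beta * y2 + X @ np.array([1.0, -0.5]) + u

ar_true = ar_stat(beta, y1, y2, Z, X)        # an ordinary F(1, n-k-p) draw
ar_far = ar_stat(beta - 1.0, y1, y2, Z, X)   # large when beta0 is far off
```

Inverting the test (collecting all β₀ values where the AR statistic is below the F critical value) gives a confidence set for β that remains valid even when the instrument is weak.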
Overidentified Model: Conditional Likelihood Ratio
(CLR) test
If your model is overidentified (more than one
instrument), the AR test is not as “powerful”, in the
sense that it is less likely to
reject the null hypothesis when it is false. The
“standard” conditional likelihood ratio (CLR) test has
more power, and is also robust to non-normal errors.
However, it is not robust to heteroscedasticity.
Fortunately, some modifications are robust to
heteroscedasticity. Unfortunately, the discussion of CLR
in Andrews and Stock (2007) is very unclear!
III. Generated Regressors (Wooldridge, Ch. 6, Sec. 1)
Consider a linear model in which one variable, q, is
missing from the data set:
y = x′β + γq + u, where E[u| x, q] = 0
However, suppose that there is another data set that
has the variable q as well as some “instruments” w
that determine q. Assume as well that we know the
(possibly nonlinear) relationship by which w
determines q, but we do not know the parameters δ
that govern that relationship: That is:
q = f(w, δ), where f( ) is known, δ is unknown
Note that f( ) could be a nonlinear function.
You can estimate β and γ using a 2-step procedure if:
1. You can obtain consistent estimates of δ, and
2. Your original data set also includes all the
variables in w
This is done by using the consistent estimate of δ,
call it δ̂, to construct q̂ = f(w, δ̂). This q̂ can then be
used by regressing y on x and q̂. The question is:
under what conditions will this approach lead to
consistent estimates?
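The two-step procedure can be sketched with simulated data (all functional forms and parameter values here are invented for illustration; f is taken to be linear in δ, though the procedure allows nonlinear forms):

```python
import numpy as np

rng = np.random.default_rng(7)

def f(w, delta):
    # Known functional form relating w to q (assumed for this sketch)
    return delta[0] + delta[1] * w + delta[2] * w ** 2

delta_true = np.array([0.5, 1.0, -0.3])

# Auxiliary data set: contains q and w; used to estimate delta (step 1)
w_aux = rng.normal(size=3000)
q_aux = f(w_aux, delta_true)
W_aux = np.column_stack([np.ones_like(w_aux), w_aux, w_aux ** 2])
delta_hat = np.linalg.lstsq(W_aux, q_aux, rcond=None)[0]

# Main data set: contains y, x, w but NOT q
n = 3000
w = rng.normal(size=n)
x = rng.normal(size=n)
q = f(w, delta_true)                 # unobserved in the main data
y = 1.0 + 2.0 * x + 1.5 * q + rng.normal(size=n)

# Step 2: construct the generated regressor and run the second-stage OLS
q_hat = f(w, delta_hat)
R = np.column_stack([np.ones(n), x, q_hat])
coef = np.linalg.lstsq(R, y, rcond=None)[0]   # [constant, beta, gamma]
```

With a large auxiliary sample, δ̂ is accurate, so q̂ tracks q closely and the second-stage estimates of β and γ come out near their true values (though the usual OLS standard errors in step 2 ignore the estimation error in δ̂).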