ApEc 8212 Econometric Analysis --- Lecture #5
Instrumental Variables (Part 2)
I. Problems with Weak Instruments in IV Estimates
The justification for all the IV formulas in Lecture 4
is asymptotic; sample sizes have to be “large” for IV
to work. In small samples, IV estimates tend toward
OLS estimates as the number of instruments goes up.
(They are identical if the number of instruments = the
number of observations.) “Weak” IVs, that is, IVs
with low predictive power for the endogenous
variables, cause several problems. This lecture
reviews these problems.
A recent review of most (but not all) of the material in
this lecture is Cameron and Trivedi (2004, sect. 4.9).
Bound, Jaeger and Baker (1995)
Consider a simple IV model with only one x variable,
which you suspect is endogenous. For simplicity,
assume that the means of y, x and z equal zero, so
that we can ignore the constant terms:

y = βx + ε
x = z′π + ν

Note that z can contain several variables. Calculate
β̂_OLS and β̂_IV using the standard formulas.
Asymptotic Problems when Corr(ε, z) ≠ 0
Bound et al. show (and you should be able to show)
that:
plim(β̂_OLS) = β + σ_{ε,x}/σ²_x

plim(β̂_IV) = β + σ_{ε,x̂}/σ²_{x̂}, where x̂ = z′π
In each expression, the second term shows the
potential inconsistency. If any of the variables in z
are correlated with ε, then x̂ will be correlated with ε,
so that the numerator in the second term in plim(β̂_IV)
will not be zero. More importantly, the size of this
inconsistency will depend on the denominator of that
second term. The better the z’s are at predicting x,
the larger that denominator will be (the total variance
of x is fixed, and the better the first-stage regression
fits, the more of the variance of x is due to the
variance of x̂ and the less is due to the variance of ν).
So the weaker one’s instruments are, the greater the
inconsistency if Assumption 2SLS.1 (E[z′ε] = 0) does
not hold. This result can be (partially) quantified by
defining the inconsistency in β̂_OLS and β̂_IV as follows:

Incons(β̂_OLS) = β − plim(β̂_OLS)
Incons(β̂_IV) = β − plim(β̂_IV)

Define the relative inconsistency of β̂_IV (relative to
the inconsistency in β̂_OLS) as Incons(β̂_IV)/Incons(β̂_OLS).
Using the expressions for the plims of β̂_OLS and
β̂_IV given above, the relative inconsistency is:
Relative inconsistency of β̂_IV = (σ_{ε,x̂}/σ_{ε,x}) / R²_{x,z}

where R²_{x,z} is the R-squared from regressing x on z.
In fact, it will often be the case that there are other x
variables. Assume that those other variables can all
be considered exogenous (uncorrelated with the error
term). Recall that efficient estimation implies that
they should all be used as instruments. In this case
you need to replace R²_{x,z} in the above formula with
the “partial R² coefficient”, which is calculated as
follows:
1. Regress the potentially endogenous variable in x
on all the other variables in x, and save the
residuals (call them ex).
2. Regress each of the variables in z that are not
part of the exogenous variables in x on the
exogenous variables in x, and save each set of
residuals (call them ez).
3. Regress ex on ez (all of the residual sets at
once). The R² from this regression replaces
R²_{x,z} in the above formula.
This partial R² coefficient measures the correlation
between the part of x (the endogenous variable) that
is not correlated with the other variables in x and the
parts of the z variables that are not correlated with the
other variables in x (the other x variables have been
“partialed out” of both x and z using a linear
projection).
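The three-step recipe above can be sketched in code with simulated data (a minimal illustration; the function, variable names, and parameter values are my own, not from Bound et al.):

```python
import numpy as np

def partial_r2(x_endog, X_exog, Z_excl):
    """Partial R^2 of the excluded instruments for one endogenous
    regressor, after partialing out the exogenous x variables.
    X_exog should include the constant."""
    def resid(v, A):
        # residuals from an OLS regression of v on the columns of A
        return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]
    e_x = resid(x_endog, X_exog)      # step 1
    e_z = resid(Z_excl, X_exog)       # step 2 (handles all columns at once)
    fit = e_z @ np.linalg.lstsq(e_z, e_x, rcond=None)[0]  # step 3
    return 1.0 - np.sum((e_x - fit) ** 2) / np.sum(e_x ** 2)

# Simulated example: one endogenous x, one exogenous x, two instruments
rng = np.random.default_rng(0)
n = 2000
x2 = rng.normal(size=n)
X_exog = np.column_stack([np.ones(n), x2])
Z = rng.normal(size=(n, 2))
x1 = 0.8 * Z[:, 0] + 0.5 * x2 + rng.normal(size=n)
r2p = partial_r2(x1, X_exog, Z)   # population value here is 0.64/1.64
```

Only the first instrument actually predicts x1, so the partial R² settles near 0.39 rather than near 1.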
To see how to use this formula, suppose that you
“guess” that the numerator is 0.1, that is, that
instrumenting x reduces the covariance between x
and ε by 90%. This seems promising. But suppose
that your instruments are weak, in that the R² of the
regression of x on the instruments (z) is only 0.10. In
this case the relative inconsistency is 0.1/0.1 = 1, so
IV is no less inconsistent than OLS.
Finite Sample Problems when Corr(ε, z) = 0
Suppose that your instruments are perfect in the sense
that E[ε| z] = 0. However, in finite samples weak
instruments can still lead to bias.

First define τ² as π′Z′Zπ (τ², or τ²/σ²_ν, is sometimes
called the “concentration parameter”). The bigger τ²
is, the bigger the variance in x̂; i.e. the better job the
instruments z do of predicting x. An approximation
of the bias in β̂_IV in finite samples is:

(ρ_{ε,ν}/τ²)(K − 2)

where ρ_{ε,ν} is the correlation between ε and ν, and K is
the number of instruments. This is valid only if K > 2.

Bound et al. show that 1/(1 + τ²/K) is approximately
equal to the magnitude of the finite sample bias in β̂_IV
relative to that in β̂_OLS. That is:

(bias in β̂_IV)/(bias in β̂_OLS) ≈ 1/(1 + τ²/K)

Note that both estimators are biased in the same
direction, since τ² (= π′Z′Zπ) must be > 0.
It turns out that the F statistic in the first stage
regression (the F statistic in a regression of x on the
instruments z) has an expected value of approximately
(1 + τ²/K). This implies that:

(bias in β̂_IV)/(bias in β̂_OLS) ≈ 1/F(first-stage regression)
The “weakness” of instruments in explaining the
potentially endogenous variable can be measured by
this F-statistic. So, for example, if you get an F-
statistic not much bigger than 1 (e.g. 1.2), your IV
estimates will be almost as biased as your OLS
estimates. In contrast, if you get an F-statistic of 10,
the IV bias is only about one tenth of the OLS bias,
which would be a clear improvement.
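A small Monte Carlo sketch of this 1/F relationship (parameter values are invented for illustration, and the rule is only an approximation):

```python
import numpy as np

# Weak-instrument design: K instruments, each with a small coefficient,
# and errors eps, v that are strongly correlated; true beta = 1.
rng = np.random.default_rng(1)
n, K, reps, beta = 200, 5, 2000, 1.0
pi = np.full(K, 0.1)   # concentration parameter tau^2 is about n*K*0.01 = 10

bias_ols, bias_iv, f_stats = [], [], []
for _ in range(reps):
    Z = rng.normal(size=(n, K))
    v = rng.normal(size=n)
    eps = 0.9 * v + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n)
    x = Z @ pi + v
    y = beta * x + eps
    xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage fits
    bias_ols.append(x @ y / (x @ x) - beta)           # OLS slope minus truth
    bias_iv.append(xhat @ y / (xhat @ x) - beta)      # 2SLS slope minus truth
    # First-stage F statistic for H0: pi = 0 (no constant in this design)
    rss = np.sum((x - xhat) ** 2)
    f_stats.append(((x @ x - rss) / K) / (rss / (n - K)))

ratio = np.mean(bias_iv) / np.mean(bias_ols)
# ratio should be close to 1/mean(F); both biases have the same sign
```

With these made-up parameters the average first-stage F is around 3, and the IV bias is correspondingly about a third of the OLS bias.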
Bound et al. examined a study by Angrist and
Krueger (1991). They found evidence of bias even
though the sample size was huge (about 330,000
observations). In addition, they found that they could
get very similar results with instruments generated
from random numbers. The evidence that something
was wrong in the Angrist and Krueger results was the
low F-statistics in some of their results.
Practical Implications:
1. To check for finite sample bias when you
“know” that E[ε|Z] = 0, look at the F-test of a
regression of x on z. If it is close to 1, your
estimates may be very bad. If it is much higher,
at least 5 or 10, you don’t have much to worry
about in terms of small sample bias.
2. This F-test procedure to check for finite sample
bias applies to the simplest case, with only one x
variable. More generally, suppose that there are
several x variables but only one is potentially
endogenous. The appropriate F-test in this case is
one from a regression of the sole potentially
endogenous variable on the excluded instruments
only (the variables in z that are not part of the
other x variables). For more than one endogenous
variable, see Angrist & Pischke (2008, pp.217-8).
3. To check for possible inconsistency due to
correlation between ε and z, look at the R² in the
regression of x on the instrumental variables z.
Compare this to a “guesstimate” of the reduction
in inconsistency brought about by the use of IV
estimation (i.e. look at (σ_{ε,x̂}/σ_{ε,x})/R²_{x,z}). If you
think that this ratio is much closer to 0 than to 1,
then it is better to use IV than to use simple OLS.
4. Point 3 is for the simple case where there is only
one x variable. If there are other x variables, and
they are all assumed to be exogenous, then you
need to calculate the “partial R²” and replace
R²_{x,z} with it in the formula given above.
5. Strictly speaking, this inconsistency test applies
only to the case where only one of the x
variables may be endogenous. The more general
case is discussed in Shea (1997).
Shea (1997) (checking for the degree of inconsistency
when Corr(ε, z) ≠ 0)
The Bound et al. (1995) paper derived its two “tests”
based on a model in which only one of the
explanatory variables may be endogenous. Yet in
many cases we may have more than one explanatory
variable that may be endogenous. It turns out that the
“R2” test of Bound et al. can be misleading in this
case. To see this, consider a model with many x
variables in which more than one could be
endogenous:
y = x′β + ε

where x is a vector of K variables. Assume also that
we have z, a vector of L instrumental variables (some
of which could be variables in x that are “known” to
be uncorrelated with ε). “Adapting” the Bound et al.
recommendations, you could regress each potentially
endogenous variable in x on the instruments in z and
check the R2. (This is for the case where there are no
exogenous variables in x; if there are some
exogenous variables then you should calculate the
partial R2.) A small R2 is a sign of trouble. In this
case, one checks each variable in x separately.
To see the intuition for why this may be misleading,
consider the case where both x and z contain two
variables. Suppose that z1 is highly correlated with
both x1 and x2 but that z2 is completely uncorrelated
with both x1 and x2. This is clearly a bad situation
because there are two variables to instrument but
there is only one good instrument. In this case IV
estimation cannot be used because you need at least
as many “good” instruments as you have variables
that need instrumenting (why?). However,
“adapting” the Bound et al. R2 test will not catch this
problem because it looks at x1 and x2 separately.
Shea suggests the following approach. Consider the
regression:
y = x₁β₁ + x₂′β₂ + ε

where x₁ is the first variable in x and x₂ contains the
other K−1 variables in x. (Note: this allows for the
possibility that some or even all of the x variables are
endogenous.)
Switch to matrix notation. Define:

x̃₁ = x₁ − X₂(X₂′X₂)⁻¹(X₂′x₁)
x̂₁ = Z(Z′Z)⁻¹Z′x₁
X̂₂ = Z(Z′Z)⁻¹Z′X₂
x̄₁ = x̂₁ − X̂₂(X̂₂′X̂₂)⁻¹(X̂₂′x̂₁)

The N×1 vector x̃₁ is the component of the N×1
vector x₁ that is orthogonal to the N×(K−1) matrix X₂,
while x̂₁ and X̂₂ are linear projections of x₁ and X₂,
respectively, on the N×L matrix Z (i.e. they are least
squares predictions of x₁ and X₂ using the variables
in Z as the regressors). Finally, x̄₁ is the component
of x₁’s projection on Z that is orthogonal to X₂’s
projection on Z. (Intuitively, x̄₁ is the “ability” of Z
to predict x₁ beyond its ability to predict X₂.)
Suppose we estimate y = x₁β₁ + x₂′β₂ + ε using IV
(2SLS) with z as instruments for x. One can show
using formulas for partitioned matrices that:

β̂₁(IV) = (x̄₁′x̄₁)⁻¹(x̄₁′y) = β₁ + (x̄₁′x̄₁)⁻¹(x̄₁′ε)

One can go further to show:

plim(β̂₁(IV) − β₁) =
plim(β̂₁(OLS) − β₁)·[Cov(x̄₁, ε)/Cov(x̃₁, ε)]/R²_p

where R²_p is the square of the correlation between x̄₁
and x̃₁.

It is clear from this equation that plim(β̂₁(IV) − β₁) = 0
if z is uncorrelated with ε, because this implies that
Cov(x̄₁, ε) = 0 (why?).
Now suppose that there is at least a little bit of
correlation between z and ε. If it is “just a little”,
then IV is still better than OLS even though it is not
quite consistent. In addition, Shea makes the point
that for each potentially endogenous variable (e.g. x₁)
we need to be sure that there are some components of
z that predict x₁ and are linearly independent of the
components that are needed to predict x₂. This is
what R²_p measures.
Shea suggests the following steps, which you need to
carry out for each potentially endogenous variable in
x (each one has a “turn” to be x₁):

1. Regress x₁ on z. Save the fitted values x̂₁.
2. Regress x₁ on the other variables in x. Save the
residuals x̃₁.
3. Regress x̂₁ on the other variables in x̂. Save the
residuals x̄₁.
4. Compute R²_p as the square of the correlation
between x̃₁ and x̄₁.
5. Shea also suggests a finite sample correction for
R²_p: R̄²_p = 1 − [((N−1)/(N−L))(1 − R²_p)], where L
is the number of instruments (not just excluded
instruments, but all instruments) and N is the
number of observations.
6. Use this R²_p in place of the “R²” of Bound et al.
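Steps 1-4 can be sketched as follows (my own illustrative code with simulated data; it also reproduces the two-instrument example above, where the per-variable R² looks fine but Shea's R²_p catches the problem):

```python
import numpy as np

def fit(A, v):
    """OLS fitted values from regressing v on the columns of A."""
    return A @ np.linalg.lstsq(A, v, rcond=None)[0]

def shea_r2(X, Z, j):
    """Shea's partial R^2 for column j of the endogenous matrix X,
    following steps 1-4 (no finite-sample correction)."""
    x1, X2 = X[:, j], np.delete(X, j, axis=1)
    x1_hat = fit(Z, x1)                       # step 1
    x1_tilde = x1 - fit(X2, x1)               # step 2
    x1_bar = x1_hat - fit(fit(Z, X2), x1_hat) # step 3
    c = np.corrcoef(x1_tilde, x1_bar)[0, 1]   # step 4
    return c ** 2

rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=(n, 2))
# Bad case: z1 drives BOTH endogenous variables, z2 drives neither
x1_bad = Z[:, 0] + rng.normal(size=n)
x2_bad = Z[:, 0] + rng.normal(size=n)
rp2_bad = shea_r2(np.column_stack([x1_bad, x2_bad]), Z, 0)
# Good case: each endogenous variable has its own instrument
x1_good = Z[:, 0] + rng.normal(size=n)
x2_good = Z[:, 1] + rng.normal(size=n)
rp2_good = shea_r2(np.column_stack([x1_good, x2_good]), Z, 0)
# rp2_bad is near zero even though x1_bad's plain R^2 on Z is about 0.5
```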
Godfrey (1999, ReStat) pointed out two things about
Shea’s paper:
1. There is an error in Shea’s equations (6) and (7):
the expressions for the σ’s are in fact the
variances, not the standard errors. (Note also
that Shea’s equation is somewhat misleading
because it assumes that the estimates of σ² from
OLS and IV are the same, whereas in fact they
will be different.)

2. Most importantly, there is an easier way to
calculate Shea’s R²_p:

R²_p = [var(β̂₁(OLS))/var(β̂₁(IV))]·(s²_IV/s²_OLS)

where s²_OLS = (1/N)Σᵢ(yᵢ − xᵢ′β̂_OLS)² and
s²_IV = (1/N)Σᵢ(yᵢ − xᵢ′β̂_IV)², with the sums
running over i = 1, …, N.
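Godfrey's shortcut can be sketched as follows (my own implementation, using homoskedastic variance formulas and s² computed with 1/N; the design is invented for illustration):

```python
import numpy as np

def godfrey_rp2(y, X, Z, j):
    """Shea-type partial R^2 for regressor j via Godfrey's shortcut:
    ratio of estimated coefficient variances times the ratio of s^2's.
    Z must contain ALL instruments, including any exogenous x's."""
    n = len(y)
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fits
    b_iv = np.linalg.lstsq(Xhat, y, rcond=None)[0]    # 2SLS
    s2_ols = np.sum((y - X @ b_ols) ** 2) / n
    s2_iv = np.sum((y - X @ b_iv) ** 2) / n           # residuals use X, not Xhat
    var_ols = s2_ols * np.linalg.inv(X.T @ X)[j, j]
    var_iv = s2_iv * np.linalg.inv(Xhat.T @ Xhat)[j, j]
    return (var_ols / var_iv) * (s2_iv / s2_ols)

rng = np.random.default_rng(3)
n = 2000
Z = rng.normal(size=(n, 3))
x1 = Z[:, 0] + 0.3 * Z[:, 2] + rng.normal(size=n)
x2 = Z[:, 1] + rng.normal(size=n)
y = x1 + x2 + rng.normal(size=n)
rp2 = godfrey_rp2(y, np.column_stack([x1, x2]), Z, 0)
```

Note that the two s² ratios cancel against the s² terms inside the variance estimates, so this only requires the standard OLS and 2SLS output.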
One last point on Shea’s paper: Z could contain
elements of X that are “known” to be exogenous.
II. More Weak IV Results
A. Stock and Yogo (2005)
This paper focuses on the case where the instruments
are valid in the sense that they are uncorrelated with
the error term in the equation of interest. Thus it
focuses on bias in finite samples. It makes two
contributions:
1. It gives two distinct definitions of weak
instruments, and shows how test statistics differ
for those two definitions.
2. It considers the case of more than one
endogenous variable, and gives a more precise
procedure than that given in the seminal paper
by Staiger and Stock (Econometrica, 1997).
The Model

y = Yβ + Xγ + u
Y = ZΠ + XΦ + V

where there are n variables in Y, K₁ in X, and K₂ in
Z, and the sample size is T. For future reference,
define Ȳ = [y Y] and Z̄ = [X Z].
There is a clever trick that allows us to “partial out”
the X variables from both equations, which simplifies
the exposition. Let the superscript “⊥” denote
residuals from the projection of any variable or
variables on X. For example Y⊥ = M_X·Y, where M_X
= I − X(X′X)⁻¹X′. You should be able to show that
the OLS estimator of β, which can be denoted as
β̂_OLS, is given by:

β̂_OLS = (Y⊥′Y⊥)⁻¹Y⊥′y⊥

Next, to be very general, we define the “k-class” set
of estimators of β, which includes β̂_2SLS as well as
other estimators, as:

β̂_k-class = [Y⊥′(I − k·M_Z̄)Y⊥]⁻¹Y⊥′(I − k·M_Z̄)y⊥

where k indicates the type of k-class estimator (e.g.
setting k = 1 yields β̂_2SLS).
The Wald statistic to test the hypothesis that β = β₀ is:

W_k-class = (β̂_k-class − β₀)′[Y⊥′(I − k·M_Z̄)Y⊥](β̂_k-class − β₀)/(n·σ̂_{uu,k-class})

where σ̂_{uu,k-class} = (û_k-class′û_k-class)/(T − K₁ − n), with
û_k-class = y⊥ − Y⊥β̂_k-class.
Note: Stock and Yogo consider four specific types of
estimators: 2SLS, LIML, modified LIML, and bias-
adjusted 2SLS. We will only consider 2SLS, so we
will be setting k = 1.
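To make the notation concrete, here is a sketch (simulated data, my own parameter values) checking that the k-class formula with k = 1 reproduces textbook 2SLS via the partialed-out variables:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])   # K1 = 2 exogenous
Zx = rng.normal(size=(T, 3))                            # K2 = 3 instruments
v = rng.normal(size=T)
u = 0.7 * v + rng.normal(size=T)                        # endogeneity
Y = (Zx @ np.array([1.0, 0.5, 0.5]) + X @ np.array([0.2, 0.3]) + v)[:, None]
y = 0.5 * Y[:, 0] + X @ np.array([1.0, -1.0]) + u       # true beta = 0.5

def resid(A, V):
    # residual-maker: V minus its projection on the columns of A
    return V - A @ np.linalg.lstsq(A, V, rcond=None)[0]

# "perp" variables: X partialed out of y and Y
Yp, yp = resid(X, Y), resid(X, y)
Zbar = np.column_stack([X, Zx])
M = np.eye(T) - Zbar @ np.linalg.inv(Zbar.T @ Zbar) @ Zbar.T   # M_Zbar
k = 1.0
A = np.eye(T) - k * M
b_kclass = np.linalg.solve(Yp.T @ A @ Yp, Yp.T @ A @ yp)[0]

# Textbook 2SLS of y on [Y X] with instruments [Zx X], for comparison
W = np.column_stack([Y, X])
What = Zbar @ np.linalg.lstsq(Zbar, W, rcond=None)[0]
b_2sls = np.linalg.lstsq(What, y, rcond=None)[0][0]   # coefficient on Y
```

The two routes agree to machine precision, which is a useful check on the partialing-out algebra.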
The Cragg-Donald statistic is a function of G_T,
which is defined as:

G_T = (Σ̂_VV^(−1/2)′ · Y⊥′P_Z⊥Y⊥ · Σ̂_VV^(−1/2))/K₂

where Σ̂_VV = (Y′M_Z̄Y)/(T − K₁ − K₂) and P_Z⊥ =
Z⊥(Z⊥′Z⊥)⁻¹Z⊥′.

More specifically, the Cragg-Donald statistic, which
can be denoted as g_min, is given by the minimum
eigenvalue of the matrix G_T:

g_min = mineval(G_T)
While this is something of a nuisance to calculate, in the
special case of only one endogenous variable (only one
variable in Y), gmin is simply the F-statistic of the first
stage regression (regression of the sole Y variable on Z).
Staiger and Stock (1997) gave a “rule of thumb” that the
F-test should be ≥ 10, and more generally that gmin
should be ≥ 10. You often see reference to this in
empirical papers that use the F-test to test for weak IVs.
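A sketch of the g_min calculation (simulated data, my own code), checking that with a single endogenous variable it collapses to the first-stage F statistic:

```python
import numpy as np

rng = np.random.default_rng(5)
T, K1, K2 = 400, 2, 4
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Zx = rng.normal(size=(T, K2))
Y = (Zx @ np.full(K2, 0.3) + X @ np.array([0.1, 0.2])
     + rng.normal(size=T))[:, None]          # one endogenous variable (n = 1)

def resid(A, V):
    return V - A @ np.linalg.lstsq(A, V, rcond=None)[0]

Yp, Zp = resid(X, Y), resid(X, Zx)
Pz = Zp @ np.linalg.inv(Zp.T @ Zp) @ Zp.T            # P_{Z-perp}
Zbar = np.column_stack([X, Zx])
Svv = (Y.T @ resid(Zbar, Y)) / (T - K1 - K2)         # Sigma_VV-hat (1x1 here)

def inv_sqrt(S):
    # symmetric inverse square root via an eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

G = inv_sqrt(Svv) @ (Yp.T @ Pz @ Yp) @ inv_sqrt(Svv) / K2
g_min = np.linalg.eigvalsh(G).min()

# First-stage F statistic for H0: all K2 instrument coefficients are zero
rss_u = np.sum(resid(Zbar, Y[:, 0]) ** 2)
rss_r = np.sum(resid(X, Y[:, 0]) ** 2)
F = ((rss_r - rss_u) / K2) / (rss_u / (T - K1 - K2))
# with one endogenous variable, g_min equals this F statistic exactly
```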
But this is too simple, which leads to the other
contribution of the Stock and Yogo paper.
Two Definitions of Weak IVs
1. A set of instruments is “bias weak” if the ratio of the
bias of the IV estimate over the bias of the OLS
estimate exceeds a certain value b, where 0 < b < 1.
Staiger and Stock used b = 0.10, without any
particular justification. In my opinion it could be
much larger, certainly 0.2 and perhaps even 0.5.
2. A set of instruments is “coverage weak” if the
conventional Wald test of size α (e.g. α = 0.05)
based on IV statistics has an actual size that exceeds
some threshold, r, where r > α (e.g. r = 0.10).
Final important note: When using either the F-statistic
(case of one variable in Y) or the somewhat more
troublesome gmin statistic (more than one variable in Y)
you cannot use standard F-test critical values. Instead
you need to use the values presented in Tables given in
Stock and Yogo (see e.g. Table 5.1 for the first definition
of weak IVs and Table 5.2 for the second definition).
B. Andrews and Stock (2007)
This paper is a nice review of the literature (at least of
the literature up to about 2006). One of the most
interesting points is that it argues that instead of testing
for weak IVs we should all simply use statistical tests
that are “robust” to weak IVs. This is analogous to the
recommendation that it is not worth testing for
heteroscedasticity in the error term; instead, just use
a variance-covariance matrix that is robust to
heteroscedasticity. Note that all of the following
discussion focuses on the case with only one endogenous
variable.
The general model in this paper is essentially the same as
that in Stock and Yogo, except that it is limited to the
case of one endogenous variable and the notation is
somewhat different:
y₁ = y₂β + Xγ₁ + u
y₂ = Zπ + Xξ + v₂

where there are n observations, y₁, y₂, u and v₂ are n×1
column vectors, β is a scalar, X is an n×p matrix
(including a constant term), and Z is an n×k matrix.
Note that X has already been “partialed out” of Z, so
each variable in Z has a mean of zero and Z′X = 0. For
now we assume that u and v2 are normally distributed.
Just Identified Model: Anderson-Rubin (AR) test
If your model is just identified (only one instrument),
Andrews and Stock recommend using the test developed
by Anderson and Rubin (1949), or a modified version
that is robust to heteroscedasticity.
We want to test the null hypothesis that β = β₀ for
some β₀. The AR test statistic is simply the standard
F-test of the hypothesis that κ = 0 in the following
regression:

y₁ − β₀y₂ = Zκ + Xγ + u
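A sketch of the AR test on a simulated just-identified example (my own code; this is the homoskedastic version of the test):

```python
import numpy as np

def ar_stat(beta0, y1, y2, Z, X):
    """Anderson-Rubin F statistic for H0: beta = beta0.
    Regress y1 - beta0*y2 on [Z X] and F-test that the Z
    coefficients are all zero."""
    dep = y1 - beta0 * y2
    n, k, p = len(dep), Z.shape[1], X.shape[1]
    full = np.column_stack([Z, X])
    rss_u = np.sum((dep - full @ np.linalg.lstsq(full, dep, rcond=None)[0]) ** 2)
    rss_r = np.sum((dep - X @ np.linalg.lstsq(X, dep, rcond=None)[0]) ** 2)
    return ((rss_r - rss_u) / k) / (rss_u / (n - k - p))

rng = np.random.default_rng(6)
n, beta = 500, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 1))             # one instrument: just identified
v2 = rng.normal(size=n)
u = 0.8 * v2 + rng.normal(size=n)
y2 = Z[:, 0] + X @ np.array([0.2, 0.1]) + v2
y1 = beta * y2 + X @ np.array([1.0, -0.5]) + u

ar_true = ar_stat(beta, y1, y2, Z, X)        # an ordinary F(1, n-k-p) draw
ar_far = ar_stat(beta - 1.0, y1, y2, Z, X)   # large when beta0 is far off
```

Inverting the test (collecting all β₀ values where the AR statistic is below the F critical value) gives a confidence set for β that remains valid even when the instrument is weak.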
Overidentified Model: Conditional Likelihood Ratio
(CLR) test
If your model is overidentified (more than one
instrument), the AR test is not as “powerful”, in the
sense that it is less likely to
reject the null hypothesis when it is false. The
“standard” conditional likelihood ratio (CLR) test has
more power, and is also robust to non-normal errors.
However, it is not robust to heteroscedasticity.
Fortunately, some modifications are robust to
heteroscedasticity. Unfortunately, the discussion of CLR
in Andrews and Stock (2007) is very unclear!
III. Generated Regressors (Wooldridge, Ch. 6, Sec. 1)
Consider a linear model in which one variable, q, is
missing from the data set:
y = x′β + γq + u, where E[u| x, q] = 0
However, suppose that there is another data set that
has the variable q as well as some “instruments” w
that determine q. Assume as well that we know the
(possibly nonlinear) relationship by which w
determines q, but we do not know the parameters δ
that govern that relationship: That is:
q = f(w, δ), where f( ) is known, δ is unknown
Note that f( ) could be a nonlinear function.
You can estimate β and γ using a 2-step procedure if:
1. You can obtain consistent estimates of δ, and
2. Your original data set also includes all the
variables in w
This is done by using the consistent estimate of δ,
call it δ̂, to construct q̂ = f(w, δ̂). This q̂ can then be
used by regressing y on x and q̂. The question is:
under what conditions will this approach lead to
consistent estimates?
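The two-step procedure can be sketched with simulated data (all functional forms and parameter values here are invented for illustration; f is taken to be linear in δ, though the procedure allows nonlinear forms):

```python
import numpy as np

rng = np.random.default_rng(7)

def f(w, delta):
    # Known functional form relating w to q (assumed for this sketch)
    return delta[0] + delta[1] * w + delta[2] * w ** 2

delta_true = np.array([0.5, 1.0, -0.3])

# Auxiliary data set: contains q and w; used to estimate delta (step 1)
w_aux = rng.normal(size=3000)
q_aux = f(w_aux, delta_true)
W_aux = np.column_stack([np.ones_like(w_aux), w_aux, w_aux ** 2])
delta_hat = np.linalg.lstsq(W_aux, q_aux, rcond=None)[0]

# Main data set: contains y, x, w but NOT q
n = 3000
w = rng.normal(size=n)
x = rng.normal(size=n)
q = f(w, delta_true)                 # unobserved in the main data
y = 1.0 + 2.0 * x + 1.5 * q + rng.normal(size=n)

# Step 2: construct the generated regressor and run the second-stage OLS
q_hat = f(w, delta_hat)
R = np.column_stack([np.ones(n), x, q_hat])
coef = np.linalg.lstsq(R, y, rcond=None)[0]   # [constant, beta, gamma]
```

With a large auxiliary sample, δ̂ is accurate, so q̂ tracks q closely and the second-stage estimates of β and γ come out near their true values (though the usual OLS standard errors in step 2 ignore the estimation error in δ̂).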