ln15 Tobits and Other Corner Solution Models


ApEc 8212 Econometric Analysis II -- Lecture #15
Tobits and Other Corner Solution Models
Readings: Wooldridge, Chapter 17 (Sections 1-8)

I. Introduction

Sometimes your y variable hits an upper or lower bound for a large number of observations. For example, you may be interested in estimating the demand for tobacco products, using prices, income and other variables as your explanatory variables. Yet most people don’t buy tobacco products, so your dependent variable, expenditures on tobacco products, will be zero for most of your observations.

Wooldridge calls models for these situations corner solution models. Sometimes they are called “censored regression models”, but in fact the data are not really censored. We will discuss “real” censoring models in the next lecture.

You might ask: What’s wrong with just using linear regression (i.e. assume that E[y| x] = x′β) when faced with a corner solution? The problems are:

1. It does not make sense that E[y| x] is linear in x (partial effects are constant) for values of x that lead most y’s to be equal to zero.
2. For some values of x (and β) you can have E[y| x] < 0, which would not make sense.

One approach is to try to fix this up with clever functional forms, e.g. E[y| x] = exp(x′β). This would have to be estimated using nonlinear least squares (NLS). [Question: Why not just take the log of both sides?] But y is likely to be heteroscedastic (Var[y| x] is unlikely to be constant), which means NLS is not efficient. Also, this approach does not allow us to estimate some relationships of interest, such as P[y = 0| x] and E[y| x, y > 0].

To estimate such kinds of relationships we need to make some more assumptions about the distribution of y conditional on x. The standard Type I Tobit model has the following set-up:

y_i = max(0, x_i′β + u_i)
u_i| x ~ N(0, σ²)

Note that with the normality assumption we are specifying the entire distribution of y conditional on x.
So another way to express this model is:

D(y| x) = Tobit(x′β, σ²)

This is just notation, where D(y| x) denotes the distribution of y conditional on x. Sometimes it is useful to express this in latent variable form:

y_i* = x_i′β + u_i, u_i| x ~ N(0, σ²)
y_i (observed y) = max(0, y_i*)

Yet this way of writing the Tobit model could be misleading if we are not careful, since we really are interested in E[y| x], not E[y*| x].

II. Some Very Useful Expected Value Formulas

Consider E[y| x] under the assumptions of a Type I Tobit model. Assume for the moment only that E[u| x] = 0. The function g(z) ≡ max(0, z) is convex in z, and thus from Jensen’s inequality (applied to conditional expectations) we have:

E[y| x] ≥ max(0, E[x′β + u| x]) = max(0, x′β)

Draw a picture to show this.

Using the assumption that u is independent of x and is normally distributed, we can write E[y| x] as:

E[y| x] = P[y = 0| x]×0 + P[y > 0| x]×E[y| x, y > 0] = P[y > 0| x]×E[y| x, y > 0]

What is P[y > 0| x]? Define w = 1 if y > 0, and w = 0 if y = 0. Then:

P[w = 1| x] = P[y > 0| x] = P[y* > 0| x] = P[u > -x′β| x] = P[u/σ > -x′β/σ] = Φ(x′β/σ)

To solve for E[y| x, y > 0], use the following result for any variable z that is distributed as N(0, 1):

$$E[z \mid z > c] = \frac{\phi(c)}{1 - \Phi(c)}$$

More generally, for any variable u that is distributed as N(0, σ²), we have:

$$E[u \mid u > c] = \sigma \frac{\phi(c/\sigma)}{1 - \Phi(c/\sigma)}$$

Thus, noting that y = y* when y > 0, we have:

$$E[y \mid x, y > 0] = x'\beta + E[u \mid u > -x'\beta] = x'\beta + \sigma \frac{\phi(x'\beta/\sigma)}{\Phi(x'\beta/\sigma)}$$

[Note that Φ(x′β/σ) = 1 - Φ(-x′β/σ) and φ(-x′β/σ) = φ(x′β/σ).] So:

E[y| x] = Φ(x′β/σ)E[y| x, y > 0] = Φ(x′β/σ)x′β + σφ(x′β/σ)

Denote λ(c) = φ(c)/Φ(c) for any c. This function λ(c) is called the inverse Mills ratio. For any continuous x_j we have:

$$\frac{\partial E[y \mid x, y > 0]}{\partial x_j} = \beta_j + \sigma \frac{\partial \lambda(x'\beta/\sigma)}{\partial x_j} = \beta_j \{1 - \lambda(x'\beta/\sigma)[x'\beta/\sigma + \lambda(x'\beta/\sigma)]\}$$

You can show that {1 - λ(x′β/σ)[x′β/σ + λ(x′β/σ)]} lies between 0 and 1 (a good homework problem?), so the above expression is smaller (in absolute value) than ∂y*/∂x_j (= β_j).
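The expected value formulas above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the evaluation point (x′β = 0.5, σ = 2) are illustrative assumptions. It computes P[y > 0| x], E[y| x, y > 0], E[y| x] and the scale factor on β_j from the formulas, and compares the first three with averages from simulated Tobit draws.

```python
import numpy as np
from scipy.stats import norm

def inverse_mills(c):
    """Inverse Mills ratio: lambda(c) = phi(c) / Phi(c)."""
    return norm.pdf(c) / norm.cdf(c)

def tobit_moments(xb, sigma):
    """Return P[y>0|x], E[y|x,y>0] and E[y|x] for the index xb = x'beta."""
    c = xb / sigma
    p_pos = norm.cdf(c)
    e_pos = xb + sigma * inverse_mills(c)
    e_all = p_pos * xb + sigma * norm.pdf(c)
    return p_pos, e_pos, e_all

def scale_factor(xb, sigma):
    """Factor multiplying beta_j in dE[y|x,y>0]/dx_j."""
    c = xb / sigma
    lam = inverse_mills(c)
    return 1.0 - lam * (c + lam)

# Hypothetical evaluation point (an assumption, chosen only for illustration).
xb, sigma = 0.5, 2.0
p_pos, e_pos, e_all = tobit_moments(xb, sigma)
factor = scale_factor(xb, sigma)

# Monte Carlo check: draw y = max(0, x'beta + u) with u ~ N(0, sigma^2).
rng = np.random.default_rng(0)
y = np.maximum(0.0, xb + sigma * rng.standard_normal(1_000_000))

print("P[y>0|x]:    formula", p_pos, " simulated", (y > 0).mean())
print("E[y|x,y>0]:  formula", e_pos, " simulated", y[y > 0].mean())
print("E[y|x]:      formula", e_all, " simulated", y.mean())
print("scale factor on beta_j:", factor)
```

The simulated averages should agree with the formulas up to sampling noise, and the scale factor should fall strictly between 0 and 1.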
If x_j is a dummy variable, the best way to calculate the impact of a change in x_j on E[y| x, y > 0] is to show how its value changes when x_j changes from zero to one.

What about ∂E[y| x]/∂x_j? When x_j is continuous, we just need to add terms that account for the change in the probability that y > 0 when x_j changes:

$$\frac{\partial E[y \mid x]}{\partial x_j} = \frac{\partial P[y > 0 \mid x]}{\partial x_j} E[y \mid x, y > 0] + P[y > 0 \mid x] \frac{\partial E[y \mid x, y > 0]}{\partial x_j} = \Phi(x'\beta/\sigma)\beta_j$$

When x_j is a dummy variable, just show the change in E[y| x] when it is evaluated at x_j = 1 and at x_j = 0.

Finally, note that for two different continuous variables, x_j and x_k, the ratios {∂E[y| x, y > 0]/∂x_j}/{∂E[y| x, y > 0]/∂x_k} and {∂E[y| x]/∂x_j}/{∂E[y| x]/∂x_k}, that is, the relative partial effects, simply equal β_j/β_k.

Wooldridge explains (pp.674-675) why Tobit coefficients are typically larger than OLS coefficients. (The intuition can be seen by looking at the drawing.)

III. Estimation and Inference of Tobit Model

Standard Tobits are estimated using maximum likelihood methods. The probability that y = 0, conditional on x, is 1 - P[y_i > 0| x_i]:

P[y_i = 0| x_i] = 1 - Φ(x_i′β/σ)

The density (probability) of y when y > 0 is the same for y and y*: f(y| x_i) = f(y*| x_i). The assumption that y_i*| x_i ~ N(x_i′β, σ²) implies that:

f(y_i*| x_i) = (1/σ)φ((y_i* - x_i′β)/σ)

The likelihood function for y_i is thus:

$$L_i(\beta, \sigma) = f(y_i \mid x_i; \beta, \sigma) = [1 - \Phi(x_i'\beta/\sigma)]^{1[y_i = 0]} \, [(1/\sigma)\phi((y_i - x_i'\beta)/\sigma)]^{1[y_i > 0]}$$

where 1[ ] is an indicator function that = 1 if the term in brackets is true and = 0 if it is false. As usual, it is convenient to work in logs of the likelihood function:

ℓ_i(β, σ) = 1[y_i = 0]ln[1 - Φ(x_i′β/σ)] + 1[y_i > 0]{ln[φ((y_i - x_i′β)/σ)] - ln(σ²)/2}
= 1[y_i = 0]ln[1 - Φ(x_i′β/σ)] - 1[y_i > 0]{(y_i - x_i′β)²/(2σ²) + ln(σ²)/2}

(dropping the constant term that comes from writing out φ( )).
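The log likelihood above can be coded and maximized directly. The sketch below is illustrative only and is not the lecture's own code: the simulated data, the true parameter values, and the choice to optimize over log(σ) (so that σ stays positive) are all assumptions. It keeps the constant -ln(2π)/2 because `norm.logpdf` includes it, which does not change the maximizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, y, X):
    """Negative Tobit log likelihood; params = (beta, log sigma)."""
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = X @ beta
    zero = y <= 0
    # 1[y = 0] * ln[1 - Phi(x'b/sigma)], using 1 - Phi(c) = Phi(-c)
    ll_zero = norm.logcdf(-xb[zero] / sigma)
    # 1[y > 0] * {ln phi((y - x'b)/sigma) - ln(sigma)}
    ll_pos = norm.logpdf((y[~zero] - xb[~zero]) / sigma) - np.log(sigma)
    return -(ll_zero.sum() + ll_pos.sum())

# Simulated data (hypothetical parameter values).
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.5
y = np.maximum(0.0, X @ beta_true + sigma_true * rng.standard_normal(n))

# OLS on all observations gives starting values (and a point of comparison).
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
start = np.append(beta_ols, np.log(y.std()))

res = minimize(tobit_negloglik, start, args=(y, X), method="BFGS")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])

print("OLS slope:  ", beta_ols[1])
print("Tobit beta: ", beta_hat, " sigma:", sigma_hat)
```

In a run like this the OLS slope is typically well below the Tobit slope, in line with the remark that Tobit coefficients are usually larger than OLS coefficients.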
The derivatives of the log likelihood function with respect to β and σ² are:

∂ℓ_i(β, σ)/∂β = -1[y_i = 0]φ(x_i′β/σ)x_i/{σ[1 - Φ(x_i′β/σ)]} + 1[y_i > 0](y_i - x_i′β)x_i/σ²

∂ℓ_i(β, σ)/∂σ² = 1[y_i = 0]φ(x_i′β/σ)x_i′β/{2σ³[1 - Φ(x_i′β/σ)]} + 1[y_i > 0]{(y_i - x_i′β)²/(2σ⁴) - 1/(2σ²)}

You then use some optimization method (see Chapter 12, section 7, of Wooldridge) to find the values of β and σ² that set these derivatives equal to zero.

To get the covariance matrix for your estimated values of the parameters (call them β̂_ML and σ̂²_ML) you need to calculate the expected value of the Hessian matrix (matrix of second derivatives). This is:

$$-E[H_i(\beta, \sigma^2) \mid x_i] = A(x_i; \beta, \sigma^2) = \begin{pmatrix} a_i x_i x_i' & b_i x_i \\ b_i x_i' & c_i \end{pmatrix}$$

where:

a_i = -(1/σ²){x_i′(β/σ)φ_i - [φ_i²/(1 - Φ_i)] - Φ_i}
b_i = (1/σ³){[x_i′(β/σ)]²φ_i + φ_i - [x_i′(β/σ)φ_i²/(1 - Φ_i)]}/2
c_i = -(1/σ⁴){[x_i′(β/σ)]³φ_i + x_i′(β/σ)φ_i - [x_i′(β/σ)φ_i²/(1 - Φ_i)] - 2Φ_i}/4

and φ_i = φ(x_i′β/σ), Φ_i = Φ(x_i′β/σ). [Homework?]

Finally, the estimated asymptotic variance for β̂_ML and σ̂²_ML is:

$$\widehat{\mathrm{Avar}}(\hat\beta_{ML}, \hat\sigma^2_{ML}) = \Big[\sum_{i=1}^{N} \hat{A}(x_i; \hat\beta_{ML}, \hat\sigma^2_{ML})\Big]^{-1}$$

Testing of parameter restrictions can be done using the Wald, Lagrange multiplier (LM) or likelihood ratio (LR) tests using the same approach used for the logit and probit models. Again, for nonlinear restrictions the easiest is usually the Wald test.

IV. Reporting Tobit Results

We are primarily interested in β̂_ML and its covariance matrix (especially the standard errors). It is also useful to report the derivatives of E[y| x, y > 0] and E[y| x] with respect to each explanatory variable (one for each element of β̂_ML), either evaluated at the sample mean of x or averaged across the sample. As usual, if you have an x variable that is a dummy variable, it is best to report how E[y| x, y > 0] and E[y| x] change when the dummy variable is changed from 0 to 1. Finally, always report the value of the log likelihood function.

V. Specification Issues in Tobit Models

The Tobit model has many assumptions, and if the assumptions fail the estimates will be inconsistent.
In recent years many economists and others have criticized the use of Tobits because their assumptions are likely to be violated. In this section we examine how easy it is to relax specific assumptions.

Heterogeneity

Suppose that you have an unobserved variable, q, that affects y but is independent of all of the (observed) x variables. This model is:

y = max(0, x′β + γq + u)
u| x, q ~ N(0, σ²)

Let q| x ~ N(0, τ²). Note that q is independent of x. This specification simply increases the variance of the error term in the Tobit model, and standard Tobit estimation will estimate β consistently, as well as the variance of the sum of γq and u: σ² + γ²τ². So heterogeneity of this sort, which is independent of x, is not a problem.

For Tobits we often want to estimate the expected value of y (not y*) given x, and the expected value of y given x and y > 0. What if we want to condition on q, that is to estimate E[y| x, q] and E[y| x, q, y > 0]? In general, you cannot estimate these expected values conditioning on q. That is, you can only estimate E[y| x] and E[y| x, y > 0]. In general, we need to assume that q is normally distributed, has a constant variance, and is uncorrelated with all the variables in x.

Another form of heterogeneity is a model with:

y = q×max(0, x′β + u)

where q ≥ 0 and q is independent of x and u. See Wooldridge, p.681, for a brief discussion.

Endogenous Explanatory Variables

Suppose that one of the explanatory variables, call it y_2, is correlated with the error term. Let the model be:

y_1 = max(0, z_1′δ_1 + α_1y_2 + u_1)
y_2 = z′δ_2 + v_2 = z_1′δ_21 + z_2′δ_22 + v_2

where u_1 and v_2 are normally distributed (with means of 0) and are independent of the z variables.

Question: Does the correlation between y_2 and u_1 imply anything about the correlation of u_1 and v_2?

To estimate this model when y_2 is correlated with u_1, we need some instruments, i.e. we need δ_22 ≠ 0. As usual, what we want to estimate is δ_1 and α_1.
However, we are also interested in estimating the (average) partial effects, so we need to estimate σ_u² as well. [Recall that E[y| x] = Φ(x′β/σ)x′β + σφ(x′β/σ) and E[y| x, y > 0] = x′β + σλ(x′β/σ).]

Smith and Blundell (1986) proposed a 2-step method to test for endogeneity of y_2. To start, note that if u_1 and v_2 are jointly normally distributed then:

u_1 = θ_1v_2 + e_1

where θ_1 = Cov(u_1, v_2)/Var(v_2) ≡ η_1/τ_2², and e_1 is normally distributed, independent of v_2 and has some variance, call it τ_1². Note that since both u_1 and v_2 are independent of the z’s, then so is e_1.

Insert this expression for u_1 into the equation for y_1:

y_1 = max(0, z_1′δ_1 + α_1y_2 + θ_1v_2 + e_1)

where e_1| z, v_2 ~ N(0, τ_1²). Since y_2 = z′δ_2 + v_2, e_1 is also independent of y_2. Thus if we could observe v_2 we could use standard Tobit estimation to estimate δ_1 and α_1 (and θ_1).

The method proposed by Smith and Blundell is to estimate the equation for y_2 by OLS to obtain an estimate of v_2, and insert that estimate into the equation for y_1. More specifically, the method is:

1. Estimate the y_2 equation using OLS.
2. Estimate v_2 as v̂_2 = y_2 - z′δ̂_2, where δ̂_2 is the OLS estimate.
3. Use standard Tobit methods to estimate the equation for y_1, with z_1, y_2 and v̂_2 as regressors.

The last step gives consistent estimators for δ_1, α_1, θ_1 and τ_1². The standard t-statistic for θ_1 is a valid test for the null hypothesis that θ_1 = 0 (i.e. the two error terms are not correlated). In fact, this test doesn’t even need the assumption that v_2 is normally distributed, because if θ_1 = 0 then v_2 doesn’t even belong in the equation for y_1.

If θ_1 ≠ 0 then the standard errors (and test statistics) given by the standard Tobit procedure for the y_1 equation are incorrect because they do not account for the fact that v̂_2 is an estimate of v_2. The correct standard errors can be derived using general methods for two-step estimators (see Wooldridge, Chapter 12).
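The three steps of the Smith and Blundell procedure can be sketched on simulated data. This is a hedged illustration, not the lecture's code: the data-generating values, the helper name `tobit_fit`, the log(σ) parameterization and the finite-difference Hessian used for the standard errors are all assumptions. As the text notes, the second-step standard errors are valid for testing θ_1 = 0 but do not account for the estimation of v̂_2 when θ_1 ≠ 0.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_fit(y, X):
    """Tobit MLE by BFGS; returns coefficients, sigma and coefficient std. errors."""
    def nll(p):
        b, s = p[:-1], np.exp(p[-1])          # p[-1] is log(sigma)
        xb, zero = X @ b, y <= 0
        ll0 = norm.logcdf(-xb[zero] / s)
        ll1 = norm.logpdf((y[~zero] - xb[~zero]) / s) - np.log(s)
        return -(ll0.sum() + ll1.sum())

    start = np.append(np.linalg.lstsq(X, y, rcond=None)[0], np.log(y.std()))
    p = minimize(nll, start, method="BFGS").x

    # Numerical Hessian of the negative log likelihood (central differences).
    k, h = p.size, 1e-4
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = np.eye(k)[i] * h, np.eye(k)[j] * h
            H[i, j] = (nll(p + ei + ej) - nll(p + ei - ej)
                       - nll(p - ei + ej) + nll(p - ei - ej)) / (4 * h * h)
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return p[:-1], np.exp(p[-1]), se[:-1]

# Simulated data with an endogenous y2 (hypothetical parameter values).
rng = np.random.default_rng(2)
n = 4000
z11, z2 = rng.standard_normal(n), rng.standard_normal(n)
v2, e1 = rng.standard_normal(n), rng.standard_normal(n)
theta1, alpha1 = 0.8, 1.0
y2 = 0.5 + 0.5 * z11 + 1.0 * z2 + v2
y1 = np.maximum(0.0, 0.2 + 0.5 * z11 + alpha1 * y2 + theta1 * v2 + e1)

# Steps 1 and 2: OLS of y2 on all exogenous variables z, keep the residuals.
Z = np.column_stack([np.ones(n), z11, z2])
delta2_hat = np.linalg.lstsq(Z, y2, rcond=None)[0]
v2_hat = y2 - Z @ delta2_hat

# Step 3: Tobit of y1 on z1, y2 and the first-stage residual.
X_cf = np.column_stack([np.ones(n), z11, y2, v2_hat])
coef, tau1_hat, se = tobit_fit(y1, X_cf)
alpha1_hat, theta1_hat = coef[2], coef[3]
t_theta1 = theta1_hat / se[3]

# For comparison: a Tobit that ignores the endogeneity of y2.
coef_naive, _, _ = tobit_fit(y1, np.column_stack([np.ones(n), z11, y2]))

print("alpha1: two-step", alpha1_hat, " naive", coef_naive[2])
print("theta1:", theta1_hat, " t-statistic:", t_theta1)
```

In this design the t-statistic on v̂_2 should reject θ_1 = 0, and the coefficient on y_2 from the Tobit that omits v̂_2 should be noticeably further from the assumed true value of 1.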
Even here it is not necessary for v_2 to be normally distributed; all you need is for u_1 to be normally distributed conditional on z and v_2.

To calculate average partial effects (APEs) we need an estimate of σ_u²; see p.683 of Wooldridge for an explanation of how to do this.

Three other points to note are:

1. You can also use general maximum likelihood methods to deal with endogenous regressors, but this is computationally more cumbersome.
2. If y_2 is a dummy variable, then this does not work well (see p.533 of Wooldridge).
3. This procedure can be extended to the case of two or more endogenous variables. See Wooldridge, p.685.

Heteroscedasticity and Non-normality

In the standard Tobit model, y_i* = x_i′β + u_i, the error term u_i is assumed to be normal and homoscedastic (constant variance). If the error term is not normally distributed or is heteroscedastic, then maximum likelihood estimates for β will be inconsistent. Unfortunately, neither of these assumptions is likely to be true, so this is a serious problem.

The problem is not only that we get inconsistent estimates of β but also that our derivations for E[y| x, y > 0] and E[y| x] are incorrect even if we have the correct β, because those derivations use the assumptions that u is normally distributed and homoscedastic.

The first thing to do is to test whether these assumptions are violated. To test for heteroscedasticity, let the alternative assumption (H1) be Var[u| x] = σ²exp(x_1′δ), where x_1 is a Q×1 column vector that contains some of the elements of x. [Question: Does it make sense for x_1 to include a constant term?] The null hypothesis is H0: δ = 0.

Since it may be hard to estimate the unrestricted model, let’s try using a Lagrange multiplier (LM) test, which only requires estimates of the restricted model. Recall that this is also called the “score” test because it is based on evaluating the scores (first derivatives) of the log likelihood function. These are given above (Section III).
We also need the derivatives with respect to δ. You should be able to show that, evaluated at δ = 0, ∂ℓ_i(β, σ, δ)/∂δ = σ²x_1i∂ℓ_i(β, σ)/∂σ².

Using the results in Chapter 13 (section 6) of Wooldridge, you can test for heteroscedasticity by regressing a constant term on all of the scores:

Regress 1 on ∂ℓ̂_i(β, σ)/∂β, ∂ℓ̂_i(β, σ)/∂σ² and σ̂²x_1i∂ℓ̂_i(β, σ)/∂σ²

where the hat (ˆ) indicates that the scores are computed using the restricted estimates for β and σ². Under the null hypothesis (H0), N - SSR_0 is asymptotically distributed as χ² with Q degrees of freedom, where SSR_0 is the standard sum of the squared residuals from this regression.

Unfortunately, simulation studies have shown that, for finite samples, this test has a tendency to “overreject” the null when it is true. Thus the best thing to do is write out the likelihood function with this heteroscedasticity in it, estimate both this unrestricted likelihood function and the restricted version, and do a likelihood ratio test. [A good homework problem for after spring break.]

It is also possible to test whether u is normally distributed. Wooldridge doesn’t show the details, but Greene (2008) shows this on pp.880-881.

If you do reject homoscedasticity, just estimate the unrestricted likelihood function (which in fact you will have already done if you did a likelihood ratio test). You can also work out how to modify E[y| x] and E[y| x, y > 0]. (Another homework problem.)

Semiparametric (Conditional Median) Approaches

If you are willing to estimate the median of y given x, instead of the usual mean of y given x, there is an approach to estimating Tobit type models that does not require you to specify the distribution of the error term. In most cases, the median and mean will be close; for example, if the error term is symmetric they will be the same. The “modified” Tobit model is:

y* = x′β + u, Med[u| x] = 0

This implies that Med[y*| x] = x′β, that is, the median is a linear function of x.

Question: Suppose that u is symmetric. What is the relationship between Med[y*| x] and E[y*| x]?
In general, for all nondecreasing functions g(z), Med[g(z)] = g(Med[z]). [Question: Does this same property hold for E[z]?] Since y = max(0, y*) is a nondecreasing function of y*, we have:

Med[y| x] = max(0, Med[y*| x]) = max(0, x′β)

We saw (briefly) in the lecture on M-estimation that LAD (least absolute deviations) is a useful method to estimate the parameters of a conditional median. Thus we can estimate β without any additional assumptions on u by choosing β to solve:

$$\min_{\beta} \sum_{i=1}^{N} \left| y_i - \max(0, x_i'\beta) \right|$$

This gives a √N-consistent estimate of β. It is also asymptotically normal.

Note that Med[y| x] is very different from E[y| x] when x′β is close to 0 or < 0 (draw a picture). The LAD estimator does not have any way to estimate E[y| x] or E[y| x, y > 0].

VI. Alternatives to Tobit

In the standard Tobit model, the same process determines whether y = 0 or y > 0 and the value of y if it is > 0. A generalization of the Type I Tobit model allows for separate processes. These are sometimes called hurdle models or two-tiered models. Here is a simple example:

Prob[y = 0| x] = 1 - Φ(x′γ)
log(y)| x, y > 0 ~ N(x′β, σ²)

Let w be a variable that = 1 if y > 0 and = 0 otherwise. Then the (conditional) density of observed y is:

f(y| x) = Prob[w = 0| x]f(y| x, w = 0) + Prob[w = 1| x]f(y| x, w = 1)
= 1[y = 0]×[1 - Φ(x′γ)] + 1[y > 0]×Φ(x′γ)φ[{log(y) - x′β}/σ]/(yσ)

This follows because φ[{log(y) - x′β}/σ]/(yσ) is the density of y when y follows a lognormal distribution, i.e. when log(y) is normal with a mean of x′β and a variance of σ².

To estimate this by maximum likelihood, this can be expressed as:

$$f(y \mid x; \beta, \gamma, \sigma) = [1 - \Phi(x'\gamma)]^{1[y = 0]} \, \{\Phi(x'\gamma)\,\phi[\{\log(y) - x'\beta\}/\sigma]/(y\sigma)\}^{1[y > 0]}$$

The log likelihood of this for observation i is:

ℓ_i(β, γ, σ) = 1[y_i = 0]log[1 - Φ(x_i′γ)] + 1[y_i > 0]{log[Φ(x_i′γ)] - log(y_i) - (½)log(σ²) - (½)[log(y_i) - x_i′β]²/σ²}

where the term -(½)log(2π) has been dropped since it is a constant.
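Because γ enters the log likelihood above only through the Φ(x′γ) terms, and β and σ enter only through the lognormal terms, the two pieces can be maximized separately. The sketch below illustrates this on simulated data; the parameter values and variable names are assumptions made for the example, and the probit piece is maximized with a generic optimizer rather than a dedicated probit routine.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated hurdle data (hypothetical parameter values).
rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
gamma_true = np.array([0.3, 0.8])
beta_true, sigma_true = np.array([1.0, 0.5]), 0.7

w = X @ gamma_true + rng.standard_normal(n) > 0          # participation
log_y_star = X @ beta_true + sigma_true * rng.standard_normal(n)
y = np.where(w, np.exp(log_y_star), 0.0)                 # y = 0 if w = 0

pos = y > 0

# Piece 1: the terms in gamma form a probit log likelihood for 1[y > 0].
def probit_negloglik(g):
    xg = X @ g
    return -(norm.logcdf(xg[pos]).sum() + norm.logcdf(-xg[~pos]).sum())

gamma_hat = minimize(probit_negloglik, np.zeros(X.shape[1]), method="BFGS").x

# Piece 2: the terms in (beta, sigma) form a normal log likelihood for log(y)
# over the observations with y > 0, so beta is OLS of log(y) on x there.
Xp, logy = X[pos], np.log(y[pos])
beta_hat = np.linalg.lstsq(Xp, logy, rcond=None)[0]
sigma_hat = np.sqrt(np.mean((logy - Xp @ beta_hat) ** 2))

print("gamma_hat:", gamma_hat)
print("beta_hat: ", beta_hat, " sigma_hat:", sigma_hat)
```

Note that `sigma_hat` divides by the number of observations with y > 0, which is the maximum likelihood version rather than the degrees-of-freedom-adjusted OLS version.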
Note that the MLE of γ is just a probit, and the MLE of β is just the OLS estimate of log(y) on x using the observations for which y > 0. Finally, a consistent estimate of σ is the standard estimate from the OLS estimate of β:

$$\hat\sigma = \sqrt{\frac{1}{N} \sum_{y_i > 0} (\log(y_i) - x_i'\hat\beta_{OLS})^2}$$

where N here is the number of observations with y_i > 0.

The expected values are:

E[y| x, y > 0] = exp(x′β + σ²/2)
E[y| x] = Φ(x′γ)exp(x′β + σ²/2)

It turns out that it is difficult to test this specification against the standard Tobit specification, since the standard Tobit is not a special case of this.

VII. Censored Regressions for Panel Data

Tobits can be applied to panel data, but you have to be careful about the assumptions that you make, since different assumptions imply different methods to obtain consistent estimates and “correct” variance-covariance matrices for those estimates.

Pooled Tobit

As with logits and probits, a simple “pooled Tobit” can be used on panel data. The model is characterized by the following assumptions:

y_it = max(0, x_it′β + u_it), t = 1, 2, … T
u_it| x_it ~ N(0, σ²)

There are two important points about this set-up:

1. It does not assume strict exogeneity, so it is “OK” if u_it is correlated with x_is for s ≠ t. This means that it is OK for x to contain lagged values of y (e.g. x_it could contain y_i,t-1).
2. It allows for serial dependence in the u_it’s; that is, Cov(u_it, u_is) is not required to be 0 for s ≠ t.

This model can be estimated by maximizing the partial log likelihood function:

$$\sum_{i=1}^{N} \sum_{t=1}^{T} \Big\{ 1[y_{it} = 0]\ln[1 - \Phi(x_{it}'\beta/\sigma)] - 1[y_{it} > 0]\Big[\frac{(y_{it} - x_{it}'\beta)^2}{2\sigma^2} + \frac{\ln(\sigma^2)}{2}\Big] \Big\}$$

This is not efficient, but it will give a c
