ApEc 8212 Econometric Analysis II -- Lecture #15
Tobits and Other Corner Solution Models
Readings: Wooldridge, Chapter 17 (Sections 1-8)
I. Introduction
Sometimes your y variable hits an upper or lower
bound for a large number of observations. For
example, you may be interested in estimating the
demand for tobacco products, using prices, income
and other variables as your explanatory variables.
Yet most people don’t buy tobacco products, so your
dependent variable, expenditures on tobacco
products, will be zero for most of your observations.
Wooldridge calls models for these situations corner
solution models. Sometimes they are called
“censored regression models”, but in fact the data are
not really censored. We will discuss “real” censoring
models in the next lecture.
You might ask: What’s wrong with just using linear
regression (i.e. assume that E[y| x] = x′β) when faced
with a corner solution? The problems are:
1. It does not make sense that E[y| x] is linear in x
(partial effects are constant) for values of x that
lead most y’s to be equal to zero.
2. For some values of x (and β) you can have E[y| x]
< 0, which would not make sense.
One approach is to try to fix this up with clever
functional forms, e.g. E[y| x] = exp(x′β). This would have
to be estimated using nonlinear least squares (NLS).
[Question: Why not just take the log of both sides?]
But Var[y| x] is likely to be heteroscedastic, which
means NLS is not efficient. Also, this approach does
not allow us to estimate some relationships of
interest, such as P[y = 0| x] and E[y| x, y > 0]. To
estimate such kinds of relationships we need to make
some more assumptions about the distribution of y
conditional on x.
The standard Type I Tobit model has the following
set-up:
yi = max(0, xi′β + ui)
ui| x ~ N(0, σ²)
Note that with the normality assumption we are
specifying the entire distribution of y conditional on
x. So another way to express this model is:
D(y| x) = Tobit(x′β, σ²)
This is just notation, where D(y| x) denotes the
distribution of y conditional on x.
Sometimes it is useful to express this in latent
variable form:
yi* = xi′β + ui, ui| x ~ N(0, σ²)
yi (observed y) = max(0, yi*)
Yet this way of writing the Tobit model could be
misleading if we are not careful, since we really are
interested in E[y| x], not E[y*| x].
II. Some Very Useful Expected Value Formulas
Consider E[y| x] under the assumptions of a Type I
Tobit model, which imply that E[u| x] = 0. The
function g(z) ≡ max(0, z) is convex in z, and
thus from Jensen’s inequality (applied to conditional
expectations) we have:
E[y| x] ≥ max(0, E[x′β + u| x]) = max(0, x′β)
Draw a picture to show this.
Using the assumption that u is independent of x and
is normally distributed, we can write E[y| x] as:
E[y| x] = P[y = 0| x]×0 + P[y > 0| x]×E[y| x, y > 0]
= P[y > 0| x]×E[y| x, y > 0]
What is P[y > 0| x]? Define w = 1 if y > 0, and w = 0
if y = 0. Then:
P[y > 0| x] = P[y* > 0| x] = P[u > -x′β| x]
= P[u/σ > - x′β/σ] = Φ(x′β/σ)
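A quick simulation can be used to check this probability. The sketch below is only illustrative: the values chosen for x′β and σ, the function names and the number of draws are all assumptions made for the example.

```python
import math
import random

def norm_cdf(c):
    """Standard normal CDF, Phi(c), via the complementary error function."""
    return 0.5 * math.erfc(-c / math.sqrt(2.0))

def prob_positive(xb, sigma):
    """P[y > 0 | x] = Phi(x'beta / sigma) under the Type I Tobit model."""
    return norm_cdf(xb / sigma)

def simulated_share_positive(xb, sigma, n_draws=200_000, seed=0):
    """Share of draws with y* = x'beta + u > 0, where u ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if xb + rng.gauss(0.0, sigma) > 0)
    return hits / n_draws

xb, sigma = -0.5, 2.0  # hypothetical index value and error standard deviation
print(prob_positive(xb, sigma), simulated_share_positive(xb, sigma))
```

The two printed numbers should be close, and they get closer as the number of draws grows.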
To solve for E[y| x, y > 0], use the following result
for any variable z that is distributed as N(0, 1):
E[z| z > c] = φ(c)/[1 - Φ(c)]
More generally, for any variable u that is distributed
as N(0, σ²), we have:
E[u| u > c] = σφ(c/σ)/[1 - Φ(c/σ)]
Thus, noting that y = y* when y > 0, we have:
E[y| x, y > 0] = x′β + E[u| u > -x′β] = x′β + σφ(x′β/σ)/Φ(x′β/σ)
[Note that Φ(x′β/σ) = 1 - Φ(-x′β/σ).]
So E[y| x] = Φ(x′β/σ)E[y| x, y > 0] = Φ(x′β/σ)x′β + σφ(x′β/σ).
Denote λ(c) = φ(c)/Φ(c) for any c. This function λ(c)
is called the inverse Mills ratio.
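These formulas are easy to code and to check against simulated data. The following is a minimal sketch, assuming illustrative values for x′β and σ; the function names are hypothetical and not taken from any package.

```python
import math
import random

def norm_pdf(c):
    """Standard normal density, phi(c)."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    """Standard normal CDF, Phi(c)."""
    return 0.5 * math.erfc(-c / math.sqrt(2.0))

def inverse_mills(c):
    """Inverse Mills ratio, lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)

def mean_given_positive(xb, sigma):
    """E[y | x, y > 0] = x'beta + sigma * lambda(x'beta / sigma)."""
    return xb + sigma * inverse_mills(xb / sigma)

def mean_unconditional(xb, sigma):
    """E[y | x] = Phi(x'beta/sigma) * x'beta + sigma * phi(x'beta/sigma)."""
    c = xb / sigma
    return norm_cdf(c) * xb + sigma * norm_pdf(c)

def simulate_means(xb, sigma, n_draws=200_000, seed=1):
    """Monte Carlo estimates of (E[y | x], E[y | x, y > 0])."""
    rng = random.Random(seed)
    ys = [max(0.0, xb + rng.gauss(0.0, sigma)) for _ in range(n_draws)]
    positives = [y for y in ys if y > 0]
    return sum(ys) / len(ys), sum(positives) / len(positives)

xb, sigma = 1.0, 2.0  # hypothetical values
print(mean_unconditional(xb, sigma), mean_given_positive(xb, sigma))
print(simulate_means(xb, sigma))
```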
For any continuous xj we have:
∂E[y| x, y > 0]/∂xj = βj + σ[∂λ(x′β/σ)/∂xj]
= βj{1 - λ(x′β/σ)[x′β/σ + λ(x′β/σ)]}
where the second line uses dλ(c)/dc = -λ(c)[c + λ(c)].
You can show that {1 - λ(x′β/σ)[x′β/σ + λ(x′β/σ)]}
lies between 0 and 1 (a good homework problem?),
so the above expression is smaller (in absolute value)
than ∂y*/∂xj (= βj).
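Before attempting a proof, the claim about the scale factor can be checked numerically. The sketch below evaluates it on a grid of values of c = x′β/σ; the grid range (-5 to 5) and the function names are assumptions made for illustration, and a grid check is of course not a proof.

```python
import math

def norm_pdf(c):
    """Standard normal density, phi(c)."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    """Standard normal CDF, Phi(c)."""
    return 0.5 * math.erfc(-c / math.sqrt(2.0))

def inverse_mills(c):
    """Inverse Mills ratio, lambda(c) = phi(c) / Phi(c)."""
    return norm_pdf(c) / norm_cdf(c)

def conditional_scale(c):
    """Factor 1 - lambda(c)[c + lambda(c)] that multiplies beta_j."""
    lam = inverse_mills(c)
    return 1.0 - lam * (c + lam)

# Evaluate the factor on a grid of index values c = x'beta / sigma.
grid = [k / 10.0 for k in range(-50, 51)]
factors = [conditional_scale(c) for c in grid]
print(min(factors), max(factors))
```

On this grid the factor stays strictly between 0 and 1, and it rises toward 1 as x′β/σ becomes large (when almost no observations are at the corner).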
If xj is a dummy variable, the best way to calculate
the impact of a change in xj on E[y| x, y > 0] is to
show how its value changes when xj changes from
zero to one.
What about ∂E[y| x]/∂xj? When xj is continuous, we
just need to add terms that account for the change in
the probability that y > 0 when xj changes:
∂E[y| x]/∂xj = {∂P[y > 0| x]/∂xj}E[y| x, y > 0] + P[y > 0| x]{∂E[y| x, y > 0]/∂xj}
= Φ(x′β/σ)βj
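The intermediate steps behind this simplification can be sketched as follows. The shorthand c for x′β/σ is introduced here only for compactness, and the third line uses Φ(c)λ(c) = φ(c).

```latex
\begin{aligned}
\frac{\partial P[y>0\mid x]}{\partial x_j}
  &= \frac{\beta_j}{\sigma}\,\phi(c), \qquad c \equiv x'\beta/\sigma,\\
\frac{\partial E[y\mid x]}{\partial x_j}
  &= \frac{\beta_j}{\sigma}\,\phi(c)\,\bigl[x'\beta + \sigma\lambda(c)\bigr]
     + \Phi(c)\,\beta_j\bigl\{1 - \lambda(c)\,[c + \lambda(c)]\bigr\}\\
  &= \beta_j\bigl\{c\,\phi(c) + \phi(c)\lambda(c) + \Phi(c)
     - c\,\phi(c) - \phi(c)\lambda(c)\bigr\}\\
  &= \Phi(c)\,\beta_j .
\end{aligned}
```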
When xj is a dummy variable, just show the change in
E[y| x] when it is evaluated at xj = 1 and at xj = 0.
Finally, note that for two different variables, xj and
xk, the ratios {∂E[y| x, y >0]/∂xj}/{∂E[y| x, y >0]/∂xk}
and {∂E[y| x]/∂xj}/{∂E[y| x]/∂xk}, that is, the ratios of
the partial effects, simply equal βj/βk.
Wooldridge explains (pp.674-675) why Tobit
coefficients are typically larger than OLS
coefficients. (The intuition can be seen by looking at
the drawing.)
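A small simulation illustrates the same point. The design below (one standard normal regressor, a true slope of 1, a standard normal error and no intercept in the latent equation) is an assumption made purely for illustration, as are the function and variable names.

```python
import random

def ols_slope(xs, ys):
    """Slope from a simple OLS regression of y on x with an intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(2)
beta = 1.0  # hypothetical true slope in y* = beta * x + u
xs = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
y_star = [beta * x + rng.gauss(0.0, 1.0) for x in xs]
ys = [max(0.0, v) for v in y_star]  # observed y piles up at zero

slope_latent = ols_slope(xs, y_star)   # regression on the latent variable
slope_observed = ols_slope(xs, ys)     # regression on the observed y
print(slope_latent, slope_observed)
```

In this design about half of the observations are at the corner, and the OLS slope on the observed y is well below the Tobit coefficient β.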
III. Estimation and Inference of Tobit Model
Standard Tobits are estimated using maximum
likelihood methods. The probability that y = 0,
conditional on x, is 1 - P[yi > 0| xi]:
P[yi = 0| xi] = 1 – Φ(xi′β/σ)
The density (probability) of y when y > 0 is the same
for y and y*: f(y| xi) = f(y*| xi). The assumption that
yi*| xi ~ N(xi′β, σ²) implies that:
f(yi*| xi) = (1/σ)φ((yi* - xi′β)/σ)
The likelihood function for yi is thus:
Li(β,σ) = f(yi| xi; β,σ) = [1 - Φ(xi′β/σ)]^{1[yi=0]} × [(1/σ)φ((yi - xi′β)/σ)]^{1[yi>0]}
where 1[ ] is an indicator function that = 1 if the term
in brackets is true and = 0 if it is false. As usual, it is
convenient to work in logs of the likelihood function:
ℓi(β, σ) = 1[yi = 0]ln[1 - Φ(xi′β/σ)]
+ 1[yi > 0]{ln[φ((yi - xi′β)/σ)] – ln(σ²)/2}
= 1[yi = 0]ln[1 - Φ(xi′β/σ)] - 1[yi > 0]{(yi - xi′β)²/(2σ²) + ln(σ²)/2}
(dropping the constant term that comes from writing
out φ( )). The derivatives of the log-likelihood function
with respect to β and σ² are:
∂ℓi(β, σ)/∂β = -1[yi = 0]φ(xi′β/σ)xi/{σ[1 - Φ(xi′β/σ)]}
+ 1[yi > 0](yi - xi′β)xi/σ²
∂ℓi(β, σ)/∂σ² = 1[yi = 0]φ(xi′β/σ)xi′β/{2σ³[1 - Φ(xi′β/σ)]}
+ 1[yi > 0]{(yi - xi′β)²/(2σ⁴) – 1/(2σ²)}
You then use some optimization method (see Chapter
12, section 7, of Wooldridge) to find the values of β
and σ² that set these derivatives equal to zero.
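The log-likelihood contribution of one observation and its derivatives can be coded in a few lines. The sketch below is a minimal illustration, not a full estimator: the function names are hypothetical, x and β are plain Python lists, the model is parameterized in σ², and the score expressions are the ones obtained by differentiating the log-likelihood coded in the first function. The data values at the bottom are made up.

```python
import math

def norm_pdf(c):
    """Standard normal density, phi(c)."""
    return math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)

def norm_cdf(c):
    """Standard normal CDF, Phi(c)."""
    return 0.5 * math.erfc(-c / math.sqrt(2.0))

def tobit_loglik_i(y, x, beta, sigma2):
    """Log-likelihood of one observation (constant -0.5*log(2*pi) dropped)."""
    sigma = math.sqrt(sigma2)
    xb = sum(xk * bk for xk, bk in zip(x, beta))
    if y > 0:
        return -((y - xb) ** 2) / (2.0 * sigma2) - 0.5 * math.log(sigma2)
    # 1 - Phi(c) is computed as Phi(-c), which is numerically more stable.
    return math.log(norm_cdf(-xb / sigma))

def tobit_score_i(y, x, beta, sigma2):
    """Derivatives of the log-likelihood w.r.t. beta (list) and sigma^2."""
    sigma = math.sqrt(sigma2)
    xb = sum(xk * bk for xk, bk in zip(x, beta))
    if y > 0:
        resid = y - xb
        d_beta = [resid * xk / sigma2 for xk in x]
        d_sigma2 = resid ** 2 / (2.0 * sigma2 ** 2) - 1.0 / (2.0 * sigma2)
    else:
        ratio = norm_pdf(xb / sigma) / norm_cdf(-xb / sigma)
        d_beta = [-ratio * xk / sigma for xk in x]
        d_sigma2 = ratio * xb / (2.0 * sigma ** 3)
    return d_beta, d_sigma2

# Made-up data: (y, x) pairs with a constant term as the first regressor.
data = [(0.0, [1.0, 0.8]), (2.0, [1.0, -0.3]), (0.0, [1.0, 1.5])]
beta, sigma2 = [0.5, -1.0], 1.5  # hypothetical parameter values
print(sum(tobit_loglik_i(y, x, beta, sigma2) for y, x in data))
```

An optimizer would repeatedly evaluate the sum of these contributions (and scores) over the sample; comparing the analytic scores with finite differences of the log-likelihood is a useful check before doing so.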
To get the covariance matrix for your estimated values
of the parameters (call them β̂ML and σ̂²ML) you need
to calculate the expected value of the Hessian matrix
(matrix of second derivatives). This is:
-E[Hi(β, σ²)| xi] = A(xi; β, σ²) =
[ ai xi xi′    bi xi ]
[ bi xi′       ci    ]
where: ai = -(1/σ²){xi′(β/σ)φi – [φi²/(1 – Φi)] – Φi}
bi = (1/σ³){[xi′(β/σ)]²φi + φi - [xi′(β/σ)φi²/(1 - Φi)]}/2
ci = -(1/σ⁴){[xi′(β/σ)]³φi + xi′(β/σ)φi - [[xi′(β/σ)]²φi²/(1 - Φi)] – 2Φi}/4
and φi = φ(xi′β/σ), Φi = Φ(xi′β/σ). [Homework?]
Finally, the estimated asymptotic variance for β̂ML and σ̂²ML is:
Avâr(β̂ML, σ̂²ML) = [Σ_{i=1}^{N} Â(xi; β̂ML, σ̂²ML)]⁻¹
Testing of parameter restrictions can be done using
the Wald, Lagrange multiplier (LM) or likelihood
ratio (LR) tests using the same approach used for the
logit and probit models. Again, for nonlinear
restrictions the easiest is usually the Wald test.
IV. Reporting Tobit Results
We are primarily interested in β̂ML and its covariance
matrix (especially the standard errors). It is also
useful to report the derivatives of E[y| x, y > 0] and
E[y| x] with respect to each element of x, either
evaluated at the sample mean of x or averaged
across the sample. As usual, if you have
an x variable that is a dummy variable, it is best to
report how E[y| x, y > 0] and E[y| x] change when the
dummy variable is changed from 0 to 1. Finally,
always report the value of the log likelihood function.
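The two ways of summarizing the derivative of E[y| x] can give noticeably different numbers. The sketch below compares them, assuming the fitted index values xi′β̂ are already available; the three index values, the coefficient and σ are hypothetical, and the function names are made up for the example.

```python
import math

def norm_cdf(c):
    """Standard normal CDF, Phi(c)."""
    return 0.5 * math.erfc(-c / math.sqrt(2.0))

def ape_unconditional(index_values, beta_j, sigma):
    """Sample average of dE[y|x]/dx_j = Phi(x_i'beta/sigma) * beta_j."""
    scale = sum(norm_cdf(xb / sigma) for xb in index_values) / len(index_values)
    return scale * beta_j

def pea_unconditional(index_values, beta_j, sigma):
    """dE[y|x]/dx_j evaluated at the sample mean of x.

    The mean of x_i'beta equals (mean of x)'beta, so averaging the index
    values is the same as evaluating the index at the mean of x.
    """
    mean_index = sum(index_values) / len(index_values)
    return norm_cdf(mean_index / sigma) * beta_j

# Hypothetical fitted index values x_i'beta-hat for three observations.
index_values = [-1.0, 0.0, 2.0]
ape = ape_unconditional(index_values, beta_j=2.0, sigma=1.0)
pea = pea_unconditional(index_values, beta_j=2.0, sigma=1.0)
print(ape, pea)
```

Whichever summary is reported, it should be stated clearly, because Φ is nonlinear and the two need not agree.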
V. Specification Issues in Tobit Models
The Tobit model has many assumptions, and if the
assumptions fail the estimates will be inconsistent. In
recent years many economists and others have
criticized the use of Tobits because their assumptions
are likely to be violated. In this section we examine
how easy it is to relax specific assumptions.
Heterogeneity
Suppose that you have an unobserved variable, q, that
affects y but is independent of all of the (observed) x
variables. This model is:
y = max(0, x′β + γq + u), u| x, q ~ N(0, σ²)
Let q| x ~ N(0, τ²). Note that q is independent of x.
This specification simply increases the variance of
the error term in the Tobit model, and standard Tobit
estimation will estimate β consistently, as well as the
variance of the sum of γq and u: σ² + γ²τ². So
heterogeneity of this sort, which is independent of x,
is not a problem.
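A simulation can illustrate that the composite error γq + u behaves like a single normal error with variance σ² + γ²τ². The parameter values and names in this sketch are assumptions chosen for the example.

```python
import math
import random

def norm_cdf(c):
    """Standard normal CDF, Phi(c)."""
    return 0.5 * math.erfc(-c / math.sqrt(2.0))

def share_positive_with_q(xb, gamma, tau, sigma, n_draws=200_000, seed=3):
    """Share of draws with x'beta + gamma*q + u > 0, q and u independent normals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        q = rng.gauss(0.0, tau)
        u = rng.gauss(0.0, sigma)
        if xb + gamma * q + u > 0:
            hits += 1
    return hits / n_draws

xb, gamma, tau, sigma = 0.8, 1.5, 1.0, 2.0  # hypothetical values
composite_sd = math.sqrt(sigma ** 2 + gamma ** 2 * tau ** 2)
print(norm_cdf(xb / composite_sd), share_positive_with_q(xb, gamma, tau, sigma))
```

The simulated share of positive outcomes matches the Tobit probability computed with the composite standard deviation, which is what a standard Tobit would estimate.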
For Tobits we often want to estimate the expected
value of y (not y*) given x, and the expected value of
y given x and y > 0. What if we want to condition on
q, that is to estimate E[y| x, q] and E[y| x, q, y > 0]?
In general, you cannot estimate these expected values
conditioning on q. That is, you can only estimate
E[y| x] and E[y| x, y > 0].
For standard Tobit estimation to remain valid, we need
to assume that q is normally distributed, has a
constant variance, and is independent of all the
variables in x.
Another form of heterogeneity is a model with:
y = q×max(0, x′β + u)
where q ≥ 0 and q is independent of x and u. See
Wooldridge, p.681, for a brief discussion.
Endogenous Explanatory Variables
Suppose that one of the explanatory variables, call it
y2, is correlated with the error term. Let the model be:
y1 = max(0, z1′δ1 + α1y2 + u1)
y2 = z′δ2 + v2 = z1′δ21 + z2′δ22 + v2
where u1 and v2 are normally distributed (with means
of 0) and are independent of the z variables.
Question: Does the correlation between y2 and u1
imply anything about the correlation of u1 and v2?
To estimate this model when y2 is correlated with u1,
we need some instruments, i.e. we need δ22 ≠ 0.
As usual, we want to estimate δ1 and α1. However,
we are also interested in estimating the (average)
partial effects, so we need to estimate σu² as well.
[Recall that E[y| x] = Φ(x′β/σ)x′β + σφ(x′β/σ) and
E[y| x, y > 0] = x′β + σλ(x′β/σ).]
Smith and Blundell (1986) proposed a 2-step method
to test for endogeneity of y2. To start, note that if u1
and v2 are jointly normally distributed then:
u1 = θ1v2 + e1
where θ1 = Cov(u1, v2)/Var(v2) ≡ η1/τ2², and e1 is
normally distributed, independent of v2 and has some
variance, call it τ1². Note that since both u1 and v2 are
independent of the z’s, then so is e1.
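The decomposition of u1 can be illustrated by simulation. In the sketch below the standard deviations and covariance of (u1, v2) are hypothetical values, and the pair is drawn jointly normal from two independent standard normals; all names are made up for the example.

```python
import math
import random

sd_u, sd_v, cov_uv = 1.5, 2.0, 1.2   # hypothetical second moments of (u1, v2)
theta1 = cov_uv / sd_v ** 2          # theta1 = Cov(u1, v2) / Var(v2)

rng = random.Random(4)
e1_draws, v2_draws = [], []
for _ in range(100_000):
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u1 = sd_u * a
    # v2 is built so that Var(v2) = sd_v^2 and Cov(u1, v2) = cov_uv.
    v2 = (cov_uv / sd_u) * a + math.sqrt(sd_v ** 2 - (cov_uv / sd_u) ** 2) * b
    e1_draws.append(u1 - theta1 * v2)
    v2_draws.append(v2)

n = len(e1_draws)
mean_e = sum(e1_draws) / n
mean_v = sum(v2_draws) / n
cov_e_v = sum((e - mean_e) * (v - mean_v) for e, v in zip(e1_draws, v2_draws)) / n
var_e = sum((e - mean_e) ** 2 for e in e1_draws) / n
print(theta1, cov_e_v, var_e)
```

The sample covariance between e1 and v2 is close to zero, and the sample variance of e1 is close to Var(u1) - θ1²Var(v2), as the decomposition implies.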
Insert this expression for u1 into the equation for y1:
y1 = max(0, z1′δ1 + α1y2 + θ1v2 + e1)
where e1| z, v2 ~ N(0, τ1²). Since y2 = z′δ2 + v2, e1 is
also independent of y2. Thus if we could observe v2
we could use standard Tobit estimation to estimate δ1
and α1 (and θ1).
The method proposed by Smith and Blundell is to
estimate the equation for y2 by OLS to obtain an
estimate of v2, and insert that estimate into the
equation for y1. More specifically, the method is:
1. Estimate the y2 equation using OLS.
2. Estimate v2 as v̂2 = y2 - z′δ̂2, where δ̂2 is the OLS estimate.
3. Use standard Tobit methods to estimate the
equation for y1, with z1, y2 and v̂2 as regressors.
The last step gives consistent estimators for δ1, α1, θ1
and τ1².
The standard t-statistic for θ1 is a valid test for the
null hypothesis that θ1 = 0 (i.e. the two error terms
are not correlated). In fact, this test doesn’t even
need the assumption that v2 is normally distributed,
because if θ1 = 0 then v2 doesn’t even belong in the
equation for y1.
If θ1 ≠ 0 then the standard errors (and test statistics)
given by the standard Tobit procedure for the y1
equation are incorrect because they do not account
for the fact that v̂2 is an estimate of v2. The correct
standard errors can be derived using general methods
for two-step estimators (see Wooldridge, Chapter
12). Even here it is not necessary for v2 to be
normally distributed; all you need is for u1 to be
normally distributed conditional on z and v2.
To calculate average partial effects (APEs) we need
an estimate of σu²; see p.683 of Wooldridge for an
explanation of how to do this.
Three other points to note are:
1. You can also use general maximum likelihood
methods to deal with endogenous regressors, but
this is computationally more cumbersome.
2. If y2 is a dummy variable, then this does not
work well (see p.533 of Wooldridge).
3. This procedure can be extended to the case of
two or more endogenous variables. See
Wooldridge, p.685.
Heteroscedasticity and Non-normality
In the standard Tobit model, yi* = xi′β + ui, the error
term ui is assumed to be normal and homoscedastic
(constant variance). If the error term is not normally
distributed or is heteroscedastic, then maximum
likelihood estimates for β will be inconsistent.
Unfortunately, neither of these assumptions is likely
to be true, so this is a serious problem.
The problem is not only that we get inconsistent
estimates of β but also that our derivations for
E[y| x, y > 0] and E[y| x] are incorrect even if we
have the correct β, because those derivations use the
assumptions that u is normally distributed and
homoscedastic.
The first thing to do is to test whether these assumptions
are violated. To test for heteroscedasticity, let the
alternative assumption (H1) be Var[u| x] = σ²exp(x1′δ),
where x1 is a Q×1 column vector that contains some
of the elements of x. [Question: Does it make sense
for x1 to include a constant term?]
The null hypothesis is H0: δ = 0. Since it may be
hard to estimate the unrestricted model, let’s try using
a Lagrange multiplier (LM) test, which only requires
estimates of the restricted model. Recall that this is
also called the “score” test because it is based on
evaluating the scores (first derivatives) of the
log-likelihood function. These are given in Section III above.
We also need the derivatives with respect to δ. You
should be able to show that ∂ℓi(β, σ, δ)/∂δ = σ²x1i∂ℓi(β, σ)/∂σ² when evaluated at δ = 0. Using the results in Chapter 13
(section 6) of Wooldridge, you can test for
heteroscedasticity by regressing a constant term on
all of the scores:
Regress 1 on ∂ℓ̂i(β, σ)/∂β, ∂ℓ̂i(β, σ)/∂σ² and σ̂²x1i∂ℓ̂i(β, σ)/∂σ²
where ˆ indicates that the scores are computed using
the restricted estimates for β and σ². Under the null
hypothesis (H0), N - SSR0 is asymptotically distributed
as χ² with Q degrees of freedom, where SSR0 is the
usual sum of squared residuals from this
regression.
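The mechanics of the "regress 1 on the scores" step can be sketched as follows. This is only an illustration of the arithmetic: the function names are hypothetical, the score rows at the bottom are made-up numbers rather than scores from an estimated Tobit, and the small linear solver assumes the cross-product matrix of the scores is nonsingular.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z

def lm_statistic(score_rows):
    """N - SSR0 from regressing a column of ones on the score columns."""
    n_obs = len(score_rows)
    n_cols = len(score_rows[0])
    # Normal equations (S'S) coef = S'1 for the regression without intercept.
    sts = [[sum(row[j] * row[k] for row in score_rows) for k in range(n_cols)]
           for j in range(n_cols)]
    st1 = [sum(row[j] for row in score_rows) for j in range(n_cols)]
    coef = solve_linear_system(sts, st1)
    ssr0 = sum((1.0 - sum(c * s for c, s in zip(coef, row))) ** 2
               for row in score_rows)
    return n_obs - ssr0

# Made-up score rows (one row per observation, one column per score).
toy_scores = [[0.5, 0.2], [-0.5, 0.1], [0.3, -0.4], [-0.3, 0.3]]
print(lm_statistic(toy_scores))
```

In an application each row would hold the scores for β, σ² and δ of one observation, evaluated at the restricted estimates, and the statistic would be compared with a χ² critical value with Q degrees of freedom.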
Unfortunately, simulation studies have shown that,
for finite samples, this test tends to reject the
null too often when it is in fact true. Thus the
best thing to do is write out the likelihood function
with this heteroscedasticity in it, estimate both this
unrestricted likelihood function and the restricted
version, and do a likelihood ratio test. [A good
homework problem for after spring break.]
It is also possible to test whether u is normally
distributed. Wooldridge doesn’t show the details, but
Greene (2008) shows this on pp.880-881.
If you do reject homoscedasticity, just estimate the
unrestricted likelihood function (which in fact you
will have already done if you did a likelihood ratio
test). You can also work out how to modify E[y| x]
and E[y| x, y > 0]. (Another homework problem.)
Semiparametric (Conditional Median) Approaches
If you are willing to estimate the median of y given
x, instead of the usual mean of y given x, there is an
approach to estimating Tobit type models that does
not require you to specify the distribution of the error
term. In most cases, the median and mean will be
close; for example, if the error term is symmetric
they will be the same.
The “modified” Tobit model is:
y* = x′β + u, Med[u| x] = 0
This implies that Med[y*| x] = x′β, that is the median
is a linear function of x.
Question: Suppose that u is symmetric. What is the
relationship between Med[y*| x] and E[y*| x]?
In general, for all nondecreasing functions g(z),
Med[g(z)] = g(Med[z]). [Question: Does this same
property hold for E[z]?] Since y = max(0, y*) is a
nondecreasing function, we have:
Med[y| x] = max(0, Med[y*| x]) = max(0, x′β)
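The contrast between the median and the mean can be seen in a short simulation. The location and spread of y* below are hypothetical, and an odd sample size is used so that the sample median is a single observation.

```python
import random
import statistics

rng = random.Random(5)
# Hypothetical latent draws y* with median 0.5.
y_star = [0.5 + rng.gauss(0.0, 1.0) for _ in range(100_001)]
y = [max(0.0, v) for v in y_star]

median_of_y = statistics.median(y)
censored_median = max(0.0, statistics.median(y_star))
mean_of_y = statistics.fmean(y)
censored_mean = max(0.0, statistics.fmean(y_star))

print(median_of_y, censored_median)  # the two medians coincide
print(mean_of_y, censored_mean)      # the mean of y exceeds max(0, mean of y*)
```

The median passes through the max(0, ·) function unchanged because that function preserves the ordering of the observations; the mean does not.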
We saw (briefly) in the lecture on M-estimation that
LAD (least absolute deviations) is a useful method to
estimate the parameters of a conditional median.
Thus we can estimate β without any additional
assumptions on u by choosing β to minimize:
min over β of Σ_{i=1}^{N} |yi – max(0, xi′β)|
This gives a √N-consistent estimate of β. It is
also asymptotically normal.
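The objective function is simple to code, although it is not smooth in β. The sketch below minimizes it by a crude grid search for a model with a single regressor and no intercept; the data-generating process (uniform x, Laplace errors, true slope of 1), the grid and the function names are all assumptions made for illustration, and a grid search is not how one would estimate a model with several regressors.

```python
import random

def clad_objective(beta, xs, ys):
    """Sum of |y_i - max(0, x_i * beta)| for a scalar regressor x_i."""
    return sum(abs(y - max(0.0, x * beta)) for x, y in zip(xs, ys))

def clad_grid_search(xs, ys, grid):
    """Return the grid value of beta with the smallest objective."""
    return min(grid, key=lambda b: clad_objective(b, xs, ys))

rng = random.Random(6)
true_beta = 1.0  # hypothetical
xs = [rng.uniform(-1.0, 2.0) for _ in range(2_000)]
# Laplace errors: median zero, but not normally distributed.
us = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in xs]
ys = [max(0.0, true_beta * x + u) for x, u in zip(xs, us)]

grid = [k / 100.0 for k in range(0, 301)]  # candidate values 0.00, 0.01, ..., 3.00
beta_hat = clad_grid_search(xs, ys, grid)
print(beta_hat)
```

Even though the errors are not normal, the estimate should land near the true slope, which is the point of the median-based approach.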
Note that Med[y| x] is very different from E[y| x]
when x′β is close to 0 or negative (draw a picture). The
LAD estimator does not have any way to estimate
E[y| x] or E[y| x, y > 0].
VI. Alternatives to Tobit
In the standard Tobit model, the same process
determines whether y = 0 or y > 0 and the value of y
if it is > 0. A generalization of the Type I Tobit
model allows for separate processes. These are
sometimes called hurdle models or two-tiered
models.
Here is a simple example:
Prob[y = 0| x] = 1 – Φ(x′γ)
log(y)| x, y > 0 ~ N(x′β, σ²)
Let w be a variable that = 1 if y > 0 and = 0
otherwise. Then the (conditional) density of
observed y is:
f(y| x) = Prob[w = 0| x]f(y| x, w = 0) + Prob[w = 1| x]f(y| x, w = 1)
= 1[y = 0]×[1 - Φ(x′γ)] + 1[y > 0]×Φ(x′γ)φ[{log(y) - x′β}/σ]/(yσ)
This follows because φ[{log(y) - x′β}/σ]/(yσ) is the
density of y when y follows a lognormal distribution,
that is, when log(y) is normal with a mean of x′β and
a variance of σ².
To estimate this by maximum likelihood, this can be
expressed as:
f(y| x;β,γ,σ) = [1 - Φ(x′γ)]^{1[y = 0]} × {Φ(x′γ)φ[{log(y) - x′β}/σ]/(yσ)}^{1[y > 0]}
The log likelihood of this for observation i is:
ℓi(β, γ, σ) = 1[yi = 0]log[1 - Φ(xi′γ)] +
1[yi>0]{log[Φ(xi′γ)] - log(yi) - (½)log(σ²) - (½)[log(yi)-xi′β]²/σ²}
where the term -(½)log(2π) has been dropped since it
is a constant.
Note that the MLE of γ is just a probit, and the MLE
of β is just the OLS estimate of log(y) on x using the
observations for which y > 0. Finally, a consistent
estimate of σ is the standard estimate from the OLS
estimate of β (square root of (1/N1)Σ_{yi>0} (log(yi) - xi′β̂OLS)²,
where N1 is the number of observations with yi > 0).
Finally, the expected values are:
E[y| x, y > 0] = exp(x′β + σ²/2)
E[y| x] = Φ(x′γ)exp(x′β + σ²/2)
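These two expected values can be checked by simulating the hurdle model directly. In the sketch below the values of x′γ, x′β and σ are hypothetical, the function names are made up, and the participation decision is drawn independently of the amount, as the model above assumes.

```python
import math
import random

def norm_cdf(c):
    """Standard normal CDF, Phi(c)."""
    return 0.5 * math.erfc(-c / math.sqrt(2.0))

def hurdle_mean_given_positive(xb, sigma):
    """E[y | x, y > 0] = exp(x'beta + sigma^2 / 2)."""
    return math.exp(xb + 0.5 * sigma ** 2)

def hurdle_mean(xg, xb, sigma):
    """E[y | x] = Phi(x'gamma) * exp(x'beta + sigma^2 / 2)."""
    return norm_cdf(xg) * hurdle_mean_given_positive(xb, sigma)

def simulate_hurdle_mean(xg, xb, sigma, n_draws=200_000, seed=7):
    """Monte Carlo estimate of E[y | x] under the lognormal hurdle model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # First tier: y > 0 with probability Phi(x'gamma).
        if rng.gauss(0.0, 1.0) < xg:
            # Second tier: log(y) ~ N(x'beta, sigma^2).
            total += math.exp(xb + rng.gauss(0.0, sigma))
    return total / n_draws

xg, xb, sigma = 0.3, 1.0, 0.5  # hypothetical values
print(hurdle_mean(xg, xb, sigma), simulate_hurdle_mean(xg, xb, sigma))
```

Note that the term σ²/2 matters: exp(x′β) alone would understate the mean of y among the positive observations.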
It turns out that it is difficult to test this specification
against the standard Tobit specification, since the
standard Tobit is not a special case of this.
VII. Censored Regressions for Panel Data
Tobits can be applied to panel data, but you have to
be careful about the assumptions that you make, since
different assumptions imply different methods to
obtain consistent estimates and “correct” variance-
covariance matrices for those estimates.
Pooled Tobit
As with logits and probits, a simple “pooled Tobit”
can be used on panel data. The model is
characterized by the following assumptions:
yit = max(0, xit′β + uit), t = 1, 2, … T
uit| xit ~ N(0, σ²)
There are two important points about this set-up:
1. It does not assume strict exogeneity, so it is
“OK” if uit is correlated with yis for s ≠ t. This
means that it is OK for x to contain lagged
values of y (e.g. xit could contain yi,t-1).
2. It allows for serial dependence in the uit’s; that
is Cov(uit, uis) is not required to be 0 for s ≠ t.
This model can be estimated by maximizing the
partial log likelihood function:
Σ_{i=1}^{N} Σ_{t=1}^{T} {1[yit = 0]ln[1 - Φ(xit′β/σ)] - 1[yit > 0][(yit - xit′β)²/(2σ²) + ln(σ²)/2]}
This is not efficient, but it will give a c