ApEc 8212 Econometric Analysis --- Lecture #1
Conditional Expectations and Related Concepts
Website for typos in (1st edition of) Wooldridge book:
http://www.msu.edu/~ec/faculty/wooldridge/book2.htm
I. Introductory Remarks
There are four main uses of econometrics and statistics:
1. To measure the characteristics of interesting
economic variables (for example, the degree of
income inequality), and the correlations between
different variables of interest (for example, the
correlation in the returns to two investments).
2. To test the validity of economic theories.
3. To quantify the magnitudes (and direction) of
economic relationships and parameters.
4. To forecast/predict future values of variables.
Work on the 2nd and 3rd (and maybe the 4th) tasks soon
leads to attempts to measure causal relationships. To
avoid confusion, researchers must be very careful, and
very explicit, when they analyze data. You should start
simple and then go on to more complicated problems.
In this class we will use Wooldridge’s textbook,
which is somewhat more rigorous than Greene’s
textbook. In his Chapter 1, Wooldridge explains why
he wants assumptions to be made in terms of the
population from which the data were drawn, as
opposed to making assumptions about the data.
Greene does both. An example in which Greene
makes an assumption about the data is his assumption
on p.20 that E[εi| X] = 0. Both εi and X refer to the
data at hand, not the population from which the data
were drawn. Greene starts out this way because it is
easier to show some proofs of results, but as his
textbook continues he moves more into the direction
of making assumptions about the population. For
example, all asymptotic results rely on allowing the
sample size to go to infinity, which basically expands
the data to include the entire population and thus
amounts to making assumptions about the population.
Consistent with this approach, all of Wooldridge’s
discussion is asymptotic, whereas Greene starts with
finite sample properties of estimation methods and
then later covers asymptotic results.
A final implication of Wooldridge’s approach is that
explanatory variables (usually denoted by “x”) are
always treated as variables, never as fixed constants.
II. Conditional Expectations in Econometrics
Most econometric analysis involves estimating a
conditional expectation, that is, the expected value of
a “dependent” (explained) variable, denoted by y,
conditional on a set of “explanatory” (control, or
independent) variables, denoted by the vector x:
E[y| x]
The familiar linear model is an important example. If
y = β′x + ε and E[ε| x] = 0 then E[y| x] = β′x.
In most econometric settings, economic theory either
assumes or demonstrates that x causes y. Thus data
on y and x from a large enough sample allow for
calculation of E[y| x] for many values of x, which
provides information on the causal effect of x on y
for a wide range of x. Wooldridge calls these causal
relationships structural conditional expectations.
But many problems can arise in trying to estimate
causal relationships, such as measurement error in x,
feedback from y to x, or non-random samples (all
will be discussed in detail in this class). To get
around these problems, identification assumptions
are needed to estimate the structural relationships
(structural conditional expectations).
III. “Features” of Conditional Expectations
Let y be a random variable (the “explained” variable).
Let x be a K×1 vector of “explanatory” variables. As
long as E[|y|]<∞, there exists a function μ(x) such that:
E[y| x] = μ(x)
The function μ(x) is the expected (“average”) value
of y for a particular set of explanatory variables (x).
Since x is a vector of random variables, μ(x) is also a
random variable.
A special case of this is the standard linear model in
econometrics. If y = β′x + ε and E[ε| x] = 0 then
E[y| x] = μ(x) = β′x.
Partial Effects, Elasticities and Semielasticities
Assume that μ(x) is differentiable. If an element of x,
call it xj, is continuous, the effect of a small increase in
xj, conditional on the other variables in x, is
approximated by the derivative of μ(x) with respect to xj:
ΔjE[y| x] ≈ [∂μ(x)/∂xj]·Δxj
(I added the j subscript to Δ.)
Note: ∂μ(x)/∂xj is called the partial effect of xj on E[y| x].
Note that if xj is not continuous but takes only a few
values, the partial effect is calculated at specific
changes (pairs) of those values. For example, if xj is
a dummy variable (equals either 0 or 1) the partial
effect is calculated as:
ΔjE[y| x] = μ(x1, … xj-1, 1, xj+1, … xK) - μ(x1, … xj-1, 0, xj+1, … xK)
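The two kinds of partial effect can be sketched numerically. The conditional mean mu below and its coefficients are hypothetical, chosen only so that the analytic answers are easy to check; x1 plays the role of a continuous xj and d the role of a dummy.

```python
# Hypothetical conditional mean mu(x) with x = (x1, d):
# x1 is continuous, d is a dummy. Coefficients are made up for illustration.
def mu(x1, d):
    return 1.0 + 0.5 * x1 - 0.02 * x1**2 + 0.3 * d + 0.1 * d * x1

def partial_effect_continuous(x1, d, h=1e-5):
    # Central finite difference approximating the derivative of mu
    # with respect to x1, holding d fixed.
    return (mu(x1 + h, d) - mu(x1 - h, d)) / (2.0 * h)

def partial_effect_dummy(x1):
    # Discrete change: mu with d = 1 minus mu with d = 0, holding x1 fixed.
    return mu(x1, 1) - mu(x1, 0)

pe_cont = partial_effect_continuous(x1=10.0, d=0)  # analytic: 0.5 - 0.04*10 = 0.1
pe_dummy = partial_effect_dummy(x1=10.0)           # analytic: 0.3 + 0.1*10 = 1.3
```

Both partial effects depend on the point at which they are evaluated, because this mu is nonlinear in x1 and includes an interaction with d.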
Sometimes we want to know the elasticity of y with
respect to xj, that is (∂y/∂xj)(xj/y). The (partial)
elasticity of E[y| x] with respect to xj, controlling for
all other variables in x, is:
[∂E[y| x]/∂xj]·(xj/E[y| x]) = [∂μ(x)/∂xj]·(xj/μ(x)) ( = ∂log(E[y| x])/∂log(xj) )
(The 2nd equality holds only if E[y| x] > 0 and xj > 0.)
Question: Does ∂log(E[y| x])/∂log(xj) = ∂E[log(y)| x]/∂log(xj)?
Answer: In general, NO. But it is yes if our model is
log(y) = g(x) + u, and we assume that u and x are
independent (this may be a homework problem).
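A sketch of why the answer is yes in that special case, under the additional assumption (not stated above) that E[exp(u)] is finite:

```latex
\begin{align*}
y &= e^{g(x)} e^{u}
  \;\Rightarrow\; E[y \mid x] = e^{g(x)}\, E[e^{u} \mid x] = e^{g(x)}\, E[e^{u}] \\
\log E[y \mid x] &= g(x) + \log E[e^{u}] \\
E[\log y \mid x] &= g(x) + E[u \mid x] = g(x) + E[u] \\
\frac{\partial \log E[y \mid x]}{\partial \log x_j}
  &= \frac{\partial g(x)}{\partial \log x_j}
   = \frac{\partial E[\log y \mid x]}{\partial \log x_j}
\end{align*}
```

Independence of u and x is what makes E[exp(u)| x] and E[u| x] constants, so the two expressions differ only by a constant that drops out when differentiating.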
A final useful concept is the (approximate) percentage
change in E[y| x] when xj increases by one unit:
100·[∂E[y| x]/∂xj]·(1/E[y| x]) = 100·∂log(E[y| x])/∂xj
This way of expressing the causal impact of xj on y
(which is defined only if E[y| x] > 0) is called the
semi-elasticity of E[y| x] with respect to xj. Note
that, unlike elasticities, semi-elasticities have “units”.
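To make the two definitions concrete, here is a minimal numerical sketch for a hypothetical exponential mean μ(x) = exp(B0 + B1·x) with a single scalar x. The coefficients are assumptions for illustration only. For this functional form the elasticity is B1·x and the semi-elasticity is 100·B1.

```python
import math

B0, B1 = 0.2, 0.08   # hypothetical coefficients

def mu(x):
    # Hypothetical conditional mean E[y | x], strictly positive.
    return math.exp(B0 + B1 * x)

def dmu_dx(x, h=1e-6):
    # Central finite difference for the partial effect of x.
    return (mu(x + h) - mu(x - h)) / (2.0 * h)

def elasticity(x):
    # (d mu / d x) * (x / mu): unit-free.
    return dmu_dx(x) * x / mu(x)

def semi_elasticity(x):
    # 100 * (d mu / d x) / mu: percent change per one-unit change in x.
    return 100.0 * dmu_dx(x) / mu(x)

x0 = 12.0
print(elasticity(x0))       # close to B1 * x0 = 0.96
print(semi_elasticity(x0))  # close to 100 * B1 = 8
```

If x were rescaled (say, months instead of years), the elasticity would be unchanged but the semi-elasticity would change, which is what "having units" means here.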
Error Form of Models of Conditional Expectations
What is the difference between the variable y and the
conditional expectation of y (conditional on x)? To
see, decompose y into its (conditional) expected
value and an error term:
y = E[y| x] + u = μ(x) + u, where E[u| x] = 0
This way of expressing y is not really assuming
anything, it is just defining u as y - E[y| x]. Note that
E[u| x] = 0 follows from this definition of u because
taking the expectations (conditional on x) of both
sides of y = E[y| x] + u yields E[u| x] = 0 (since
E[E[y| x]| x] = E[y| x], as seen below).
Three other things to note are:
1. The error term u is uncorrelated with any
function of the variables in x.
2. E[u| x] = 0 implies that E[u] = 0. (see below)
3. When applying an econometric model to a particular
data set, we cannot use the result that E[u| x] = 0
to “prove” that the error term is uncorrelated with x
in that data set. The result above follows from the
definition of u, but in our data the unobserved
variables that make up the “real” u may be
correlated with the variables in x. In other
words, μ(x) may not be a causal relationship
(E[y| x] may not be a structural conditional
expectation).
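The first point can be checked by simulation. The sketch below assumes a hypothetical conditional mean and a hypothetical heteroskedastic error, neither of which comes from the lecture. It defines u = y - E[y| x] and computes sample correlations of u with several functions of x.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def mu(x):
    # Hypothetical conditional mean E[y | x]; chosen only for illustration.
    return np.sin(x) + x**2

x = rng.normal(size=n)
e = rng.normal(size=n)                 # independent of x, mean zero
y = mu(x) + (1.0 + np.abs(x)) * e      # error spread depends on x

u = y - mu(x)                          # u is defined as y - E[y | x]

# Sample correlations of u with several functions of x; all should be near 0.
corrs = {
    "x": np.corrcoef(u, x)[0, 1],
    "x^2": np.corrcoef(u, x**2)[0, 1],
    "cos(x)": np.corrcoef(u, np.cos(x))[0, 1],
}

# u is nevertheless not independent of x: |u| tends to be larger when |x| is.
corr_abs = np.corrcoef(np.abs(u), np.abs(x))[0, 1]
```

In this design E[u| x] = 0 holds by construction, so u is uncorrelated with functions of x even though the variance of u changes with x.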
A simple example illustrates this last point. Suppose
that the causal (structural) determinants of wages are:
log(wage) = β0 + β1educ + β2 IQ + u
and E[u| educ, IQ] = 0. We want to estimate β1.
Suppose that IQ causes schooling:
educ = γ0 + γ1IQ
(for simplicity, I have not added an error term; in practice
there would be one, so that educ and IQ are not perfectly collinear).
OLS can be used to obtain consistent (“unbiased”)
estimates of β1 if you have data on wages, educ
(years of schooling) and IQ.
What if you do not have data on IQ? Then the only
conditional expectation you can estimate is:
E[log(wage)| educ]
= β0 + β1educ + β2E[IQ| educ] + E[u| educ]
= β0 + β1educ + β2(educ – γ0)/γ1 + 0
= (β0 – β2γ0/γ1) + (β1 + β2/γ1)educ
Regressing log(wage) on years of schooling only will
estimate (β1 + β2/γ1), not β1. Even so, we can always
define a (nonstructural or noncausal) conditional
expectation relationship, E[log(wage)| educ] =
(β0 – β2γ0/γ1) + (β1 + β2/γ1)educ, and we can always add
an error term to it (call it v):
log(wage) = (β0 – β2γ0/γ1) + (β1 + β2/γ1)educ + v
where by definition E[v| educ] = 0.
Clearly, the fact that E[v| educ] = 0 does not imply
that the conditional expectation E[log(wage)| educ] is
a structural conditional expectation, and it does not
imply that regressing log(wage) on educ will estimate
the causal impact of educ (years of schooling) on
wages.
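A small simulation illustrates the wage example. All parameter values below are hypothetical. The sketch generates data from the structural equation with educ = γ0 + γ1·IQ, then regresses log(wage) on educ alone and compares the estimates with the slope β1 + β2/γ1 and the intercept obtained by substituting IQ = (educ - γ0)/γ1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical parameter values, chosen only for illustration.
b0, b1, b2 = 1.0, 0.06, 0.005     # structural wage equation
g0, g1 = 2.0, 0.1                 # educ = g0 + g1 * IQ (no error term, as in the text)

iq = rng.normal(100.0, 15.0, size=n)
educ = g0 + g1 * iq
u = rng.normal(0.0, 0.3, size=n)  # E[u | educ, IQ] = 0 by construction
log_wage = b0 + b1 * educ + b2 * iq + u

# Regress log(wage) on a constant and educ only (IQ omitted).
X = np.column_stack([np.ones(n), educ])
coef, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
intercept_hat, slope_hat = coef

slope_implied = b1 + b2 / g1             # 0.11, not b1 = 0.06
intercept_implied = b0 - b2 * g0 / g1    # from substituting IQ = (educ - g0) / g1
```

The regression recovers the nonstructural conditional expectation E[log(wage)| educ] quite precisely, yet its slope is far from the causal parameter b1.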
IV. Some Properties of Conditional Expectations
This section presents some results of conditional
expectations that will be used in later lectures.
Linearity of Conditional Expectations
This one is very useful (and was already used above).
Let a1(x), a2(x), … aG(x) be scalar functions of x (a
vector of random variables), and let y1, y2, … yG be
any (scalar) random variables (not just some
“dependent” variables). Then:
E[Σ_{j=1}^{G} aj(x)yj + b(x) | x] = Σ_{j=1}^{G} aj(x)E[yj| x] + b(x)
as long as E[|yj|] < ∞, E[|aj(x)yj|] < ∞ and E[|b(x)|] < ∞.
In Wooldridge, this is the property CE.1 in Appendix
2A (p.30)
Note that a special case of this is when all the a( )
functions are constants and there is no b(x) function.
This gives a very useful result that we will use a lot:
E[Σ_{j=1}^{G} ajyj | x] = Σ_{j=1}^{G} ajE[yj| x]
Law of Iterated Expectations (LIE)
Let y be a random variable and let w be a vector of
random variables. Let x be another vector of random
variables that is a function of w: i.e. x = f(w) for
some function f( ). [Note: One example is that x is
simply a subset of the variables in w.] That is, if we
“know” w then, using f( ), we “know” x. But it is
not necessarily true that if we “know” x then we
“know” w. That is, w contains at least as much, and
possibly more, “information” than x. This implies:
E[y| x] = E[E[y| w]| x]
This is the Law of Iterated Expectations. Another
way to express it: define μ1(w) = E[y| w] and define μ2(x) = E[y| x]; then E[μ1(w)| x] = μ2(x). The intuition
is that “filtering” w through x in E[E[y| w]| x] “loses”
all the information in w that is not in x.
This is Property CE.3 in Appendix 2A of Wooldridge.
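LIE can be checked numerically in a small discrete example. In the sketch below, w = (x, z), and the joint probabilities p and the conditional means m are hypothetical numbers chosen for illustration. The iterated expectation E[E[y| x, z]| x] is computed exactly from p and m and compared with sample averages of y within each value of x.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

# Hypothetical joint pmf of (x, z): rows are x in {0, 1}, columns z in {0, 1, 2}.
p = np.array([[0.10, 0.25, 0.15],
              [0.20, 0.05, 0.25]])
# Hypothetical conditional means E[y | x, z].
m = np.array([[1.0, 2.0, 4.0],
              [0.5, 3.0, 6.0]])

# Draw (x, z) from the joint pmf, then y with mean m[x, z].
idx = rng.choice(6, size=n, p=p.ravel())
x, z = np.divmod(idx, 3)
y = m[x, z] + rng.normal(size=n)

# E[E[y | x, z] | x]: average m over z using P(z | x).
p_z_given_x = p / p.sum(axis=1, keepdims=True)
lie = (m * p_z_given_x).sum(axis=1)

# Sample analogue of E[y | x], computed without using z at all.
direct = np.array([y[x == k].mean() for k in (0, 1)])

# Conditioning on nothing: E[y] = E[E[y | x]].
ey_lie = (lie * p.sum(axis=1)).sum()
```

The vector direct should be close to lie, and y.mean() should be close to ey_lie, up to sampling error.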
Another Useful Result
The following is also true of conditional expectations:
E[y| x] = E[E[y| x]| w]
This is very similar to LIE, but this time the “first”
conditioning is on the smaller information set and
“second” conditioning is on the larger information
set. Intuitively, “filtering” x through w does not give
any more information than was already in x.
One way to remember both LIE and this result is the
“rule”: The smaller information set dominates.
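The reasoning behind the second result can be written out in a few lines. This is a sketch that uses the notation μ2(x) ≡ E[y| x] and the assumption x = f(w) from the statement of LIE:

```latex
\begin{align*}
E\big[\,E[y \mid x] \,\big|\, w\big]
  &= E\big[\mu_2(f(w)) \mid w\big] && \text{since } x = f(w) \\
  &= \mu_2(f(w))                   && \text{by CE.1, a function of } w \text{ is known given } w \\
  &= E[y \mid x].
\end{align*}
```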
Implications of LIE
A useful special case of LIE occurs when w is {x, z}:
E[y| x] = E[E[y| x, z]| x]
Define μ1(x, z) ≡ E[y| x, z] and μ2(x) ≡ E[y| x]. Then:
μ2(x) = E[μ1(x, z)| x]
An econometric example is that sometimes we want
to know E[y| x, z], which allows us (assuming that
this is a structural conditional expectation) to
calculate the impact of some variable xj on y holding
both x and z constant. If we have no data on z but
we have data on y and x, this special case of LIE
shows us the relationship between what we can
estimate, E[y| x], and the causal relation, E[y| x, z].
If you know the functional form of μ1(x, z), the
above special case shows that you can obtain μ2(x) by
integrating μ1(x, z) over z (conditional on x), but in
many cases obtaining μ2(x) is even easier.
Example. Consider the following structural (causal)
conditional expectation:
E[y| x1, x2, z] = β0 + β1x1 + β2x2 + β3z
If z is not observed, by LIE (CE.3) and linearity of
conditional expectations (CE.1) we have:
E[y| x1, x2] = E[β0 + β1x1 + β2x2 + β3z| x1, x2]
= β0 + β1x1 + β2x2 + β3E[z| x1, x2]
Suppose that E[z| x1, x2] is linear in x1 and x2, in
particular that E[z| x1, x2] = δ0 + δ1x1 + δ2x2. Then:
E[y| x1, x2] = β0 + β1x1 + β2x2 + β3(δ0 + δ1x1 + δ2x2)
= (β0+ β3δ0) + (β1 + β3δ1)x1 + (β2 + β3δ2)x2
Thus, if you estimate the expected value of y
conditional on x1 and x2 (i.e. regress y on x1 and x2),
you will not obtain estimates of the structural
(causal) relationship between the x variables and y.
This is the problem of omitted variable bias.
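The omitted variable formula can be illustrated by simulation. The parameter values and distributions below are hypothetical. The sketch generates z with a conditional mean that is linear in x1 and x2, regresses y on x1 and x2 only, and compares the estimates with (β0 + β3δ0, β1 + β3δ1, β2 + β3δ2).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

beta = np.array([1.0, 2.0, -1.0, 3.0])   # b0, b1, b2, b3 (hypothetical)
delta = np.array([0.5, 0.4, -0.2])       # d0, d1, d2 (hypothetical)

x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
# E[z | x1, x2] is linear in x1 and x2 by construction.
z = delta[0] + delta[1] * x1 + delta[2] * x2 + rng.normal(size=n)
y = beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * z + rng.normal(size=n)

# Regress y on a constant, x1 and x2 only (z omitted).
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# (b0 + b3*d0, b1 + b3*d1, b2 + b3*d2) = (2.5, 3.2, -1.6)
implied = beta[:3] + beta[3] * delta
```

The estimated coefficients line up with implied rather than with the structural values (1.0, 2.0, -1.0).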
Another useful implication of LIE is the
following. Let f(x) be a (vector) function and let g( )
be a (scalar) function such that E[y| x] = g(f(x)).
Then:
E[y| f(x)] = E[y| x] = g(f(x))
This is property CE.4 in Appendix 2A of Wooldridge.
The intuition is that if E[y| x] = g(f(x)) then all of the
“information” in x that predicts y is contained in f(x),
which implies that E[y| x] = E[y| f(x)]. To prove this,
take the expectation, conditional on f(x), of both sides of E[y| x] = g(f(x)):
E[E[y| x]| f(x)] = E[g(f(x)) | f(x)]
E[y| f(x)] = g(f(x))
LIE implies that E[E[y| x]| f(x)] = E[y| f(x)], where w
in LIE is x here and x (= f(w)) in LIE is f(x) here.
Another way to express this is to define z ≡ f(x).
Then E[y| x] = g(f(x)) implies that E[y| z] = g(z).
Note that z can have either a larger or a smaller
number of variables than x.
Example. Consider a wage equation:
E[wage| educ, exper] = β0 + β1educ + β2exper
+ β3exper² + β4educ·exper
This is g(f(x)) with x = {educ, exper}. What is f(x)?
It is the vector (educ, exper, exper², educ·exper).
Thus CE.4 implies that:
E[wage| educ, exper, exper², educ·exper] =
β0 + β1educ + β2exper + β3exper² + β4educ·exper
which is the same as E[wage| educ, exper]. Thus, once
we condition on educ and exper (x), it is redundant to
condition on functions of those variables (f(x)).
For linear models, a more general result holds.
Assume, for some functions g1(x), …gM(x), we have:
E[y| x] = β0 + β1g1(x) + β2g2(x) + … + βMgM(x)
This is a very flexible model, since all of the x
variables appear in all of the g( ) functions.
Next, define z1≡ g1(x), … zM≡ gM(x). Then the last
implication of LIE discussed above implies that:
E[y| z1, z2, … zM] = β0 + β1z1 + β2z2 + … + βMzM
That is, any conditional expectation that is linear in
parameters and some complicated functions is also
linear in some conditioning variables. More
importantly, we can write the above expression as:
y = β0 + β1z1 + β2z2 + … + βMzM + u
where u is defined as the difference between y and
E[y| x] (= β0 + β1g1(x) + β2g2(x) + … + βMgM(x)).
This implies that E[u| x] = 0, and since the z’s are
functions of x we have E[u| z1, z2,…zM] = 0. We will
use this result in Lecture 3 (Chapter 4 of Wooldridge).
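As a sketch of how this result is used in practice, the wage example can be simulated with hypothetical coefficients and distributions. The columns of Z below are the z variables, each a known function of x = (educ, exper), and an ordinary least squares regression of wage on Z recovers the conditional mean.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

beta = np.array([1.0, 0.5, 0.2, -0.004, 0.01])   # hypothetical b0..b4

educ = rng.uniform(8.0, 18.0, size=n)
exper = rng.uniform(0.0, 30.0, size=n)

# z = f(x): each column is a known function of x = (educ, exper).
Z = np.column_stack([np.ones(n), educ, exper, exper**2, educ * exper])
wage = Z @ beta + rng.normal(size=n)             # u = wage - E[wage | x]

# The model is nonlinear in x but linear in the z's, so OLS on Z applies.
b_hat, *_ = np.linalg.lstsq(Z, wage, rcond=None)

# Largest gap between the fitted and the true conditional mean in the sample.
max_gap = np.max(np.abs(Z @ b_hat - Z @ beta))
```

The comparison is made on fitted values rather than on individual coefficients because exper and exper² are highly correlated, which makes single coefficients noisier than the fitted conditional mean.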
A final point. Statistical independence of u and x
implies that E[u| x] = E[u]. However it is not true
that E[u| x] = E[u] implies that u and x are
statistically independent.
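A simple counterexample shows why the converse fails. The construction below is an illustration, not taken from Wooldridge, and assumes a scalar, non-constant x with finite variance:

```latex
\begin{align*}
u &= x\,e, \qquad e \text{ independent of } x,\quad E[e] = 0,\quad \operatorname{Var}(e) = 1 \\
E[u \mid x] &= x\,E[e \mid x] = x\,E[e] = 0 = E[u] \\
\operatorname{Var}(u \mid x) &= x^{2}\,\operatorname{Var}(e) = x^{2}
\end{align*}
```

The conditional mean of u does not depend on x, but the conditional variance does, so u and x are not statistically independent.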
Simplest Version of LIE
Let k = f(x) be a set of constants, which means that k
provides no information for any conditional
expectations. Then LIE (using k for x and x for w) implies:
E[y] = E[y| k] = E[E[y| x] | k] = E[E[y| x]] = E[μ(x)]
This is Property CE.2 in Appendix 2A of Wooldridge.
V. Average Partial Effects
In many, if not most, econometric settings in which
we want to say something about causal relationships
of the x variables on y, it is important to consider the
expectation of y conditional not only on some
observed variables, denoted by x, but also on some
unobserved variables, which we can denote as “q”
(for simplicity, think of q as a single variable). These
q variables are often referred to as unobserved
heterogeneity.
Consider a structural (i.e. causal) relationship in
which x and q “cause” y. We are interested in
estimating the causal impact of the x variables on y.
The (structural) conditional mean of y is:
E[y| x, q] = μ1(x, q)
For some variable in x, denoted by xj, we are
interested in the (causal) impact of xj on y, holding
constant both the other variables in x and q. Assuming
that μ1(x, q) is differentiable in xj and that xj is
continuous, this impact can be expressed as:
θj(x, q) ≡ ∂E[y| x, q]/∂xj = ∂μ1(x, q)/∂xj
Since θj(x, q) depends on q, and we don’t observe q,
it is very unlikely that we can estimate θj(x, q) for
specific values of q. Sometimes we can assume that
E[q] = 0 and perhaps even estimate θj(x, 0), but this
really only applies to a small segment of the
population for whom q = 0.
Instead, it is usually more interesting (and more
useful for policy decisions) to calculate the partial
effect averaged across the distribution of q in the
population, which is called the average partial effect
(APE). For a given value of x, denoted by x0, APE
of xj at x0, denoted by δj(x0), is defined as:
δj(x0) ≡ Eq[θj(x0, q)]
where Eq[ ] denotes taking the expectation with
respect to the different values of q in the population.
Note that this relationship holds regardless of
whether x and q are independent.
If q is continuous with density g( ), APE becomes:
δj(x0) = ∫ θj(x0, q)g(q)dq
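The integral can be approximated by averaging θj(x0, q) over draws of q. The sketch below assumes a hypothetical structural mean μ1(x, q) = exp(βx + q) with a scalar x and q distributed N(0, σ²). For this form the APE has a closed form that the Monte Carlo average can be checked against.

```python
import numpy as np

rng = np.random.default_rng(4)

beta, sigma, x0 = 0.5, 0.5, 1.0     # hypothetical values

def theta(x, q):
    # Partial effect of x for the hypothetical mu1(x, q) = exp(beta * x + q).
    return beta * np.exp(beta * x + q)

# Draws from the population distribution of q.
q = rng.normal(0.0, sigma, size=1_000_000)

ape_mc = theta(x0, q).mean()                                   # Monte Carlo E_q[theta(x0, q)]
ape_exact = beta * np.exp(beta * x0) * np.exp(sigma**2 / 2.0)  # uses E[exp(q)] = exp(sigma^2 / 2)
pe_at_mean_q = theta(x0, 0.0)                                  # partial effect at q = E[q] = 0
```

In this nonlinear example the APE is larger than the partial effect evaluated at q = 0, which is why evaluating at the mean of q is not a substitute for averaging over q. The sketch draws q directly, which is only possible in a simulation, since q is unobserved in real data.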
So, is it possible to estimate δj(x0) if we observe only
x and do not observe q? The general answer is: NO!
So, what can we do?
One possibility is to make some assumptions about
the relationship between x and q. For example, a
(possibly mistaken) common assumption in
nonlinear models is that q and x are independent. A
weaker assumption is that q and x are
independent conditional on some vector of observed
variables, w. That is:
D(q| x, w) = D(q| w)
where D(·| ·) denotes a conditional distribution.
Intuitively, we can think of the variables in w as
“proxies” or “controls” for q, so that if we add them
to the regression then we do not have to worry about
correlation between q and x.
An additional assumption is needed to estimate the
structural (causal) impact of x on y in E[y| x, q], the
structural conditional mean relationship. That
assumption is that the w variables do not add any
“explanatory power” to this relationship:
E[y| x, q, w] = E[y| x, q]
One way of expressing this is to say that w is
redundant or “ignorable” in this structural conditional
expectation.
If both of these assumptions are true we can evaluate
the APE at any x0 as:
δj(x0) = Ew[∂E[y| x0, w]/∂xj]
That is, we average (over the distribution of w) the
partial derivative with respect to xj of the expectation
of y conditional on the observed variables x and w,
evaluated at x = x0. To be specific, if we have a random
sample of y, x and w from the population of interest, we
estimate ∂μ̂2(x0, wi)/∂xj, where μ2(x0, w) ≡ E[y| x0, w],
for each observation i in the sample and then take the
average over the sample.
Wooldridge gives a proof of this result on p.24.
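The estimation procedure can be sketched in a simulation where both assumptions hold by construction. Everything in the data-generating process below is hypothetical: q equals w plus independent noise, so D(q| x, w) = D(q| w), and w does not enter the equation for y once x and q are included. The structural partial effect is 2 + 0.5q, so the APE is 2 + 0.5·E[q] = 2.5 at every x0.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

w = rng.normal(1.0, 1.0, size=n)          # observed proxy
x = 0.5 * w + rng.normal(size=n)          # x is related to q only through w
q = w + rng.normal(size=n)                # unobserved; D(q | x, w) = D(q | w)
y = 1.0 + 2.0 * x + q + 0.5 * x * q + rng.normal(size=n)   # w is redundant given (x, q)

# Here E[y | x, w] = 1 + 2x + w + 0.5*x*w, so regressing y on
# (1, x, w, x*w) is a correctly specified model of E[y | x, w].
X = np.column_stack([np.ones(n), x, w, x * w])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated derivative of E[y | x, w] with respect to x at each w_i, then averaged.
ape_hat = np.mean(b[1] + b[3] * w)

ape_true = 2.0 + 0.5 * 1.0                # E_q[2 + 0.5 q], with E[q] = E[w] = 1

# Naive regression of y on x alone, for comparison.
b_naive, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)
```

The averaged derivative ape_hat should be close to 2.5, while the slope from the naive regression, b_naive[1], is noticeably larger because x is correlated with q.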
Here is an intuitive example of how this works.
Suppose you want to estimate a wage equation. You
think that wages are “caused” by two things:
education and “IQ”. You have data on education
(this will be x) but not on IQ (this will be q). What
you are really interested in is estimating the impact of
education on wages, holding q constant:
θed(educ, IQ) = ∂E[wage| educ, IQ]/∂educ ≡ ∂μ1(educ, IQ)/∂educ
You can’t estimate μ1(educ, IQ) because you do not
observe IQ, and it is likely that educ and IQ are
correlated. However, suppose you do have some
“test” that should reflect IQ, perhaps the person’s
SAT score when they were in high school. This may
be a good w. Putting “SAT” into the regression as a
“proxy” for IQ will give unbiased estimates of APE if
the following two assumptions hold:
D(IQ| educ, SAT) = D(IQ| SAT)
E[wage| educ, IQ, SAT] = E[wage| educ, IQ]
The first assumption is that education does not have
any additional power to explain the distribution of IQ
beyond the explanatory power of the SAT score. The
second assumption is that the SAT score does not
have any power to explain wages after conditioning
on education and IQ.
Do you think these two assumptions are reasonable?
You always should ask these ki