ln1 Conditional Expectations and Related Concepts

ApEc 8212 Econometric Analysis --- Lecture #1

Conditional Expectations and Related Concepts

Website for typos in (1st edition of) Wooldridge book: http://www.msu.edu/~ec/faculty/wooldridge/book2.htm

I. Introductory Remarks

There are four main uses of econometrics and statistics:

1. To measure the characteristics of interesting economic variables (for example, the degree of income inequality), and the correlations between different variables of interest (for example, the correlation in the returns to two investments).
2. To test the validity of economic theories.
3. To quantify the magnitudes (and direction) of economic relationships and parameters.
4. To forecast/predict future values of variables.

Work on the 2nd and 3rd (and maybe the 4th) tasks soon leads to attempts to measure causal relationships. To avoid confusion, researchers must be very careful, and very explicit, when they analyze data. You should start simple and then go on to more complicated problems.

In this class we will use Wooldridge’s textbook, which is somewhat more rigorous than Greene’s textbook. In his Chapter 1, Wooldridge explains why he wants assumptions to be made in terms of the population from which the data were drawn, as opposed to making assumptions about the data. Greene does both. An example in which Greene makes an assumption about the data is his assumption on p.20 that E[εi| X] = 0. Both εi and X refer to the data at hand, not the population from which the data were drawn.

Greene starts out this way because it is easier to show some proofs of results, but as his textbook continues he moves more in the direction of making assumptions about the population. For example, all asymptotic results rely on allowing the sample size to go to infinity, which basically expands the data to include the entire population and thus amounts to making assumptions about the population.
Consistent with this approach, all of Wooldridge’s discussion is asymptotic, whereas Greene starts with finite sample properties of estimation methods and then later covers asymptotic results. A final implication of Wooldridge’s approach is that explanatory variables (usually denoted by “x”) are always treated as random variables, never as fixed constants.

II. Conditional Expectations in Econometrics

Most econometric analysis involves estimating a conditional expectation, that is, the expected value of a “dependent” (explained) variable, denoted by y, conditional on a set of “explanatory” (control, or independent) variables, denoted by the vector x:

E[y| x]

The familiar linear model is an important example. If y = β′x + ε and E[ε| x] = 0 then E[y| x] = β′x.

In most econometric settings, economic theory either assumes or demonstrates that x causes y. Thus data on y and x from a large enough sample allow for calculation of E[y| x] for many values of x, which provides information on the causal effect of x on y for a wide range of x. Wooldridge calls these causal relationships structural conditional expectations.

But many problems can arise in trying to estimate causal relationships, such as measurement error in x, feedback from y to x, or non-random samples (all will be discussed in detail in this class). To get around these problems, identification assumptions are needed to estimate the structural relationships (structural conditional expectations).

III. “Features” of Conditional Expectations

Let y be a random variable (the “explained” variable). Let x be a K×1 vector of “explanatory” variables. As long as E[|y|] < ∞, there exists a function μ(x) such that:

E[y| x] = μ(x)

The function μ(x) is the expected (“average”) value of y for a particular set of values of the explanatory variables (x). Since x is a vector of random variables, μ(x) is also a random variable.

A special case of this is the standard linear model in econometrics. If y = β′x + ε and E[ε| x] = 0 then E[y| x] = μ(x) = β′x.
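To make the idea of μ(x) as "the average value of y for a particular x" concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration and not part of the lecture: the parameter values, the discrete support of x, and the normal error. With a discrete x, the sample mean of y within each x cell should be close to β0 + β1x when E[ε| x] = 0.

```python
import numpy as np

# Hypothetical setup (not from the lecture): scalar x taking values 0..3,
# y = beta0 + beta1*x + eps, with eps drawn independently of x so E[eps|x] = 0.
rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 1.0, 0.5

x = rng.integers(0, 4, size=n)      # discrete explanatory variable
eps = rng.normal(size=n)            # error term, independent of x
y = beta0 + beta1 * x + eps

# Sample analogue of E[y|x]: the average of y within each x cell.
support = np.arange(4)
cell_means = np.array([y[x == v].mean() for v in support])

# Population conditional expectation mu(x) = beta0 + beta1*x.
mu = beta0 + beta1 * support

for v, m_hat, m in zip(support, cell_means, mu):
    print(f"x={v}: sample mean of y = {m_hat:.3f}, mu(x) = {m:.3f}")
```

The cell means differ from μ(x) only by sampling noise, which shrinks as the number of observations in each cell grows.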
Partial Effects, Elasticities and Semielasticities

Assume that μ(x) is differentiable. If an element of x, call it xj, is continuous, the effect of a small increase in xj, conditional on the other variables in x, is approximated by the derivative of μ(x) with respect to xj:

ΔjE[y| x] ≈ [∂μ(x)/∂xj]·Δxj    (I added the j subscript to Δ)

Note: ∂μ(x)/∂xj is called the partial effect of xj on E[y| x].

Note that if xj is not continuous but takes only a few values, the partial effect is calculated at specific changes (pairs) of those values. For example, if xj is a dummy variable (equals either 0 or 1) the partial effect is calculated as:

ΔjE[y| x] = μ(x1, … xj-1, 1, xj+1, … xK) - μ(x1, … xj-1, 0, xj+1, … xK)

Sometimes we want to know the elasticity of y with respect to xj, that is (∂y/∂xj)(xj/y). The (partial) elasticity of E[y| x] with respect to xj, controlling for all other variables in x, is:

[∂E[y| x]/∂xj]·(xj/E[y| x]) = [∂μ(x)/∂xj]·(xj/μ(x))    ( = ∂log(E[y| x])/∂log(xj) )

(The 2nd equality holds only if E[y| x] > 0 and xj > 0.)

Question: Does ∂log(E[y| x])/∂log(xj) = ∂E[log(y)| x]/∂log(xj)?

Answer: In general, NO. But it is yes if our model is log(y) = g(x) + u, and we assume that u and x are independent (this may be a homework problem).

A final useful concept is the (approximate) percentage change in E[y| x] when xj increases by one unit:

100·[∂E[y| x]/∂xj]·(1/E[y| x]) = 100·∂log(E[y| x])/∂xj

This way of expressing the causal impact of xj on y (which is defined only if E[y| x] > 0) is called the semi-elasticity of E[y| x] with respect to xj. Note that, unlike elasticities, semi-elasticities have “units”.

Error Form of Models of Conditional Expectations

What is the difference between the variable y and the conditional expectation of y (conditional on x)? To see this, decompose y into its (conditional) expected value and an error term:

y = E[y| x] + u = μ(x) + u, where E[u| x] = 0

This way of expressing y is not really assuming anything; it is just defining u as y - E[y| x].
Note that E[u| x] = 0 follows from this definition of u because taking the expectations (conditional on x) of both sides of y = E[y| x] + u yields E[u| x] = 0 (since E[E[y| x]| x] = E[y| x], as seen below). Three other things to note are:

1. The error term u is uncorrelated with any function of the variables in x.
2. E[u| x] = 0 implies that E[u] = 0 (see below).
3. In applying econometric models to a particular data set, we cannot use the result that E[u| x] = 0 to “prove” that u is uncorrelated with x in that data set. The above result is a definition of u, but in our data the unobserved variables that make up the “real” u may be correlated with the variables in x. In other words, μ(x) may not be a causal relationship (E[y| x] may not be a structural conditional expectation).

A simple example illustrates this last point. Suppose that the causal (structural) determinants of wages are:

log(wage) = β0 + β1educ + β2IQ + u and E[u| educ, IQ] = 0.

We want to estimate β1. Suppose that IQ causes schooling: educ = γ0 + γ1IQ (for simplicity, I have not added an error term). OLS can be used to obtain consistent (“unbiased”) estimates of β1 if you have data on wages, educ (years of schooling) and IQ.

What if you do not have data on IQ? Then the only conditional expectation you can estimate is:

E[log(wage)| educ] = β0 + β1educ + β2E[IQ| educ] + E[u| educ]
= β0 + β1educ + β2(educ - γ0)/γ1 + 0
= (β0 - β2γ0/γ1) + (β1 + β2/γ1)educ

Regressing log(wage) on years of schooling only will estimate (β1 + β2/γ1), not β1. Even so, we can always define a (nonstructural or noncausal) conditional expectation relationship, E[log(wage)| educ] = (β0 - β2γ0/γ1) + (β1 + β2/γ1)educ, and we can always add an error term to it (call it v):

log(wage) = (β0 - β2γ0/γ1) + (β1 + β2/γ1)educ + v

where by definition E[v| educ] = 0.
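The wage example can be checked with a small simulation. The sketch below is only illustrative: the numerical values of β0, β1, β2, γ0, γ1 and the distributions of IQ and u are assumptions chosen for the demonstration, not values from the lecture. It regresses log(wage) on educ alone and compares the estimated intercept and slope with (β0 - β2γ0/γ1) and (β1 + β2/γ1).

```python
import numpy as np

# Hypothetical parameter values (assumptions for illustration only).
rng = np.random.default_rng(1)
n = 100_000
b0, b1, b2 = 1.0, 0.08, 0.01     # structural wage equation
g0, g1 = 2.0, 0.1                # educ = g0 + g1*IQ (no error term, as in the text)

iq = rng.normal(100.0, 15.0, size=n)
educ = g0 + g1 * iq
u = rng.normal(0.0, 0.3, size=n)           # E[u | educ, IQ] = 0 by construction
logwage = b0 + b1 * educ + b2 * iq + u

# OLS of log(wage) on a constant and educ only (IQ is "unobserved").
X = np.column_stack([np.ones(n), educ])
coef, *_ = np.linalg.lstsq(X, logwage, rcond=None)
intercept_hat, slope_hat = coef

# What the nonstructural conditional expectation E[log(wage)|educ] predicts.
intercept_ce = b0 - b2 * g0 / g1
slope_ce = b1 + b2 / g1

print(f"slope:     estimate {slope_hat:.4f}, b1 + b2/g1 = {slope_ce:.4f}, b1 = {b1}")
print(f"intercept: estimate {intercept_hat:.4f}, b0 - b2*g0/g1 = {intercept_ce:.4f}")
```

In this setup the estimated slope settles near β1 + β2/γ1 and not near β1, even though the regression error v satisfies E[v| educ] = 0 by construction.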
Clearly, the fact that E[v| educ] = 0 does not imply that the conditional expectation E[log(wage)| educ] is a structural conditional expectation, and it does not imply that regressing log(wage) on educ will estimate the causal impact of educ (years of schooling) on wages.

IV. Some Properties of Conditional Expectations

This section presents some results on conditional expectations that will be used in later lectures.

Linearity of Conditional Expectations

This one is very useful (and was already used above). Let a1(x), a2(x), … aG(x) and b(x) be scalar functions of x (a vector of random variables), and let y1, y2, … yG be any (scalar) random variables (not just some “dependent” variables). Then:

E[Σ_{j=1}^{G} aj(x)yj + b(x) | x] = Σ_{j=1}^{G} aj(x)E[yj| x] + b(x)

as long as E[|yj|] < ∞, E[|aj(x)yj|] < ∞ and E[|b(x)|] < ∞. In Wooldridge, this is Property CE.1 in Appendix 2A (p.30).

Note that a special case of this is when all the a( ) functions are constants and there is no b(x) function. This gives a very useful result that we will use a lot:

E[Σ_{j=1}^{G} ajyj | x] = Σ_{j=1}^{G} ajE[yj| x]

Law of Iterated Expectations (LIE)

Let y be a random variable and let w be a vector of random variables. Let x be another vector of random variables that is a function of w: i.e. x = f(w) for some function f( ). [Note: One example is that x is simply a subset of the variables in w.] That is, if we “know” w then, using f( ), we “know” x. But it is not necessarily true that if we “know” x then we “know” w. That is, w contains at least as much, and possibly more, “information” than x. This implies:

E[y| x] = E[E[y| w]| x]

This is the Law of Iterated Expectations. Another way to express it: define μ1(w) = E[y| w] and define μ2(x) = E[y| x]; then E[μ1(w)| x] = μ2(x).

The intuition is that “filtering” w through x in E[E[y| w]| x] “loses” all the information in w that is not in x. This is Property CE.3 in Appendix 2A of Wooldridge.
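The LIE can be illustrated numerically with discrete conditioning variables. The sketch below is a hypothetical setup, not taken from the lecture: w = (x, z) with x binary and z taking three values, and an arbitrary outcome equation. It replaces the population expectations with sample cell means, for which the identity E[y| x] = E[E[y| w]| x] holds exactly (up to floating-point rounding), because the mean of y within an x cell is the frequency-weighted average of the (x, z) cell means.

```python
import numpy as np

# Hypothetical data (assumptions for illustration): w = (x, z), and x is a subset of w.
rng = np.random.default_rng(2)
n = 50_000
x = rng.integers(0, 2, size=n)                 # x in {0, 1}
z = rng.binomial(2, 0.3 + 0.4 * x)             # z in {0, 1, 2}, correlated with x
y = 1.0 + 2.0 * x + 3.0 * z + x * z + rng.normal(size=n)

# Step 1: sample analogue of mu1(w) = E[y | x, z], attached to every observation.
mu1 = np.empty(n)
for a in (0, 1):
    for b in (0, 1, 2):
        cell = (x == a) & (z == b)
        if cell.any():
            mu1[cell] = y[cell].mean()

# Step 2: compare E[y | x] with E[mu1(w) | x] for each value of x.
lhs = np.array([y[x == a].mean() for a in (0, 1)])     # E[y | x]
rhs = np.array([mu1[x == a].mean() for a in (0, 1)])   # E[E[y | w] | x]

print("E[y|x]       :", lhs)
print("E[E[y|w]|x]  :", rhs)
```

Averaging μ1(w) within an x cell "loses" the variation in z, which is the filtering intuition described above.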
Another Useful Result

The following is also true of conditional expectations:

E[y| x] = E[E[y| x]| w]

This is very similar to LIE, but this time the “first” conditioning is on the smaller information set and the “second” conditioning is on the larger information set. Intuitively, “filtering” x through w does not give any more information than was already in x.

One way to remember both LIE and this result is the “rule”: The smaller information set dominates.

Implications of LIE

A useful special case of LIE occurs when w is {x, z}:

E[y| x] = E[E[y| x, z]| x]

Define μ1(x, z) ≡ E[y| x, z] and μ2(x) ≡ E[y| x]. Then:

μ2(x) = E[μ1(x, z)| x]

An econometric example is that sometimes we want to know E[y| x, z], which allows us (assuming that this is a structural conditional expectation) to calculate the impact of some variable xj on y holding both the other variables in x and z constant. If we have no data on z but we have data on y and x, this special case of LIE shows us the relationship between what we can estimate, E[y| x], and the causal relation, E[y| x, z].

If you know the functional form of μ1(x, z), the above special case shows that you can obtain μ2(x) by integrating μ1(x, z) over z (conditional on x), but in many cases obtaining μ2(x) is even easier.

Example. Consider the following structural (causal) conditional expectation:

E[y| x1, x2, z] = β0 + β1x1 + β2x2 + β3z

If z is not observed, by LIE (CE.3) and linearity of conditional expectations (CE.1) we have:

E[y| x1, x2] = E[β0 + β1x1 + β2x2 + β3z| x1, x2]
= β0 + β1x1 + β2x2 + β3E[z| x1, x2]

Suppose that E[z| x1, x2] is linear in x1 and x2, in particular that E[z| x1, x2] = δ0 + δ1x1 + δ2x2. Then:

E[y| x1, x2] = β0 + β1x1 + β2x2 + β3(δ0 + δ1x1 + δ2x2)
= (β0 + β3δ0) + (β1 + β3δ1)x1 + (β2 + β3δ2)x2

Thus, if you estimate the expected value of y conditional on x1 and x2 (i.e. regress y on x1 and x2), you will not obtain estimates of the structural (causal) relationship between the x variables and y.
This is the problem of omitted variable bias.

Another useful implication of LIE is the following. Let f(x) be a (vector) function and let g( ) be a (scalar) function such that E[y| x] = g(f(x)). Then:

E[y| f(x)] = E[y| x] = g(f(x))

This is Property CE.4 in Appendix 2A of Wooldridge. The intuition is that if E[y| x] = g(f(x)) then all of the “information” in x that predicts y is contained in f(x), which implies that E[y| x] = E[y| f(x)].

To prove this, take the expectation, conditional on f(x), of both sides of E[y| x] = g(f(x)):

E[E[y| x]| f(x)] = E[g(f(x))| f(x)]
E[y| f(x)] = g(f(x))

The left-hand side uses LIE, which implies that E[E[y| x]| f(x)] = E[y| f(x)], where w in LIE is x here and x (= f(w)) in LIE is f(x) here.

Another way to express this is to define z ≡ f(x). Then E[y| x] = g(f(x)) implies that E[y| z] = g(z). Note that z can have either a larger or a smaller number of variables than x.

Example. Consider a wage equation:

E[wage| educ, exper] = β0 + β1educ + β2exper + β3exper² + β4educ·exper

This is g(f(x)); x = {educ, exper}. What is f(x)? [Here f(x) = {educ, exper, exper², educ·exper}.] Thus CE.4 implies that:

E[wage| educ, exper, exper², educ·exper] = β0 + β1educ + β2exper + β3exper² + β4educ·exper

which is the same as E[wage| educ, exper]. Thus, once we condition on educ and exper (x), it is redundant to condition on functions of those variables (f(x)).

For linear models, a more general result holds. Assume, for some functions g1(x), … gM(x), we have:

E[y| x] = β0 + β1g1(x) + β2g2(x) + … + βMgM(x)

This is a very flexible model, since each of the g( ) functions can depend on all of the x variables. Next, define z1 ≡ g1(x), … zM ≡ gM(x). Then the last implication of LIE discussed above implies that:

E[y| z1, z2, … zM] = β0 + β1z1 + β2z2 + … + βMzM

That is, any conditional expectation that is linear in parameters and some complicated functions is also linear in some conditioning variables.
More importantly, we can write the above expression as:

y = β0 + β1z1 + β2z2 + … + βMzM + u

where u is defined as the difference between y and E[y| x] (= β0 + β1g1(x) + β2g2(x) + … + βMgM(x)). This implies that E[u| x] = 0, and since the z’s are functions of x we have E[u| z1, z2, … zM] = 0. We will use this result in Lecture 3 (Chapter 4 of Wooldridge).

A final point. Statistical independence of u and x implies that E[u| x] = E[u]. However, it is not true that E[u| x] = E[u] implies that u and x are statistically independent.

Simplest Version of LIE

Let k = f(x) be a set of constants, which means that k provides no information for any conditional expectations. Then LIE (using k for x and x for w) implies:

E[y] = E[y| k] = E[E[y| x]| k] = E[E[y| x]] = E[μ(x)]

This is Property CE.2 in Appendix 2A of Wooldridge.

V. Average Partial Effects

In many, if not most, econometric settings in which we want to say something about causal relationships of the x variables on y, it is important to consider the expectation of y conditional not only on some observed variables, denoted by x, but also on some unobserved variables, which we can denote as “q” (for simplicity, think of q as a single variable). These q variables are often referred to as unobserved heterogeneity.

Consider a structural (i.e. causal) relationship in which x and q “cause” y. We are interested in estimating the causal impact of the x variables on y. The (structural) conditional mean of y is:

E[y| x, q] = μ1(x, q)

For some variable in x, denoted by xj, we are interested in the (causal) impact of xj on y, holding constant both the other variables in x and q. Assuming that μ1(x, q) is differentiable in xj and that xj is continuous, this impact can be expressed as:

θj(x, q) ≡ ∂E[y| x, q]/∂xj = ∂μ1(x, q)/∂xj

Since θj(x, q) depends on q, and we don’t observe q, it is very unlikely that we can estimate θj(x, q) for specific values of q.
Sometimes we can assume that E[q] = 0 and perhaps even estimate θj(x, 0), but this really only applies to the small segment of the population for whom q = 0. Instead, it is usually more interesting (and more useful for policy decisions) to calculate the partial effect averaged across the distribution of q in the population, which is called the average partial effect (APE).

For a given value of x, denoted by x0, the APE of xj at x0, denoted by δj(x0), is defined as:

δj(x0) ≡ Eq[θj(x0, q)]

where Eq[ ] denotes taking the expectation with respect to the different values of q in the population. Note that this definition applies regardless of whether x and q are independent. If q is continuous with density g( ), the APE becomes:

δj(x0) = ∫ θj(x0, q)g(q)dq

So, is it possible to estimate δj(x0) if we observe only x and do not observe q? The general answer is: NO!

So, what can we do? One possibility is to make some assumptions about the relationship between x and q. For example, a (possibly mistaken) common assumption in nonlinear models is that q and x are independent. A weaker assumption is that q and x are independent conditional on some vector of observed variables, w. That is:

D(q| x, w) = D(q| w)

where D(·| ·) denotes a conditional distribution. Intuitively, we can think of the variables in w as “proxies” or “controls” for q, so that if we add them to the regression then we do not have to worry about correlation between q and x.

An additional assumption is needed to estimate the structural (causal) impact of x on y in E[y| x, q], the structural conditional mean relationship. That assumption is that the w variables do not add any “explanatory power” to this relationship:

E[y| x, q, w] = E[y| x, q]

One way of expressing this is to say that w is redundant or “ignorable” in this structural conditional expectation.
If both of these assumptions are true we can evaluate the APE at any x0 as:

δj(x0) = Ew[∂E[y| x0, w]/∂xj]

That is, we integrate (over the distribution of w) the partial derivative, with respect to xj, of the expectation of y conditional on the observed variables x0 and w. To be specific, if we have a random sample of y, x and w from the population of interest, we estimate ∂μ̂2(x0, w)/∂xj, where μ2(x0, w) ≡ E[y| x0, w] and μ̂2 is its estimate, for each observation in the sample and then take the average of these estimates. Wooldridge gives a proof of this result on p.24.

Here is an intuitive example of how this works. Suppose you want to estimate a wage equation. You think that wages are “caused” by two things: education and “IQ”. You have data on education (this will be x) but not on IQ (this will be q). What you are really interested in is estimating the impact of education on wages, holding IQ (q) constant:

θed(educ, IQ) = ∂E[wage| educ, IQ]/∂educ ≡ ∂μ1(educ, IQ)/∂educ

You can’t estimate μ1(educ, IQ) because you do not observe IQ, and it is likely that educ and IQ are correlated. However, suppose you do have some “test” that should reflect IQ, perhaps the person’s SAT score when they were in high school. This may be a good w. Putting “SAT” into the regression as a “proxy” for IQ will give unbiased estimates of the APE if the following two assumptions hold:

D(IQ| educ, SAT) = D(IQ| SAT)
E[wage| educ, IQ, SAT] = E[wage| educ, IQ]

The first assumption is that education does not have any additional power to explain the distribution of IQ beyond the explanatory power of the SAT score. The second assumption is that the SAT score does not have any power to explain wages after conditioning on education and IQ. Do you think these two assumptions are reasonable? You always should ask these ki
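The proxy-variable recipe for the average partial effect described in this section can be sketched in a simulation. All of the following is assumed purely for illustration and does not come from the lecture or from Wooldridge: the data-generating process, the parameter values, and the variable names. By construction q is independent of x given w, and w does not enter the structural equation, so both assumptions hold. In this setup θ(x, q) = β + γq does not depend on x, so the APE equals β at every x0 because E[q] = 0.

```python
import numpy as np

# Hypothetical data-generating process (assumptions for illustration only).
rng = np.random.default_rng(3)
n = 200_000
beta, gamma = 1.0, 0.5

w = rng.normal(size=n)               # observed proxy (e.g. a test score)
q = w + rng.normal(size=n)           # unobserved heterogeneity, depends on w only
x = w + rng.normal(size=n)           # so D(q | x, w) = D(q | w)
u = rng.normal(size=n)
y = beta * x + gamma * x * q + q + u # E[y | x, q, w] = E[y | x, q]

# Naive approach: regress y on x alone (ignores q and w).
X_naive = np.column_stack([np.ones(n), x])
naive_slope = np.linalg.lstsq(X_naive, y, rcond=None)[0][1]

# Proxy approach: here E[y | x, w] = beta*x + gamma*x*w + w, so estimate it by OLS,
# take the derivative with respect to x at each observation's w, then average.
X_proxy = np.column_stack([np.ones(n), x, x * w, w])
b = np.linalg.lstsq(X_proxy, y, rcond=None)[0]
partial_effects = b[1] + b[2] * w    # d mu2_hat(x0, w_i) / dx for each observation
ape_hat = partial_effects.mean()

print(f"true APE            : {beta:.3f}")
print(f"naive slope         : {naive_slope:.3f}")
print(f"proxy-based APE est.: {ape_hat:.3f}")
```

In this design the naive slope is pulled away from β by the correlation between x and q (which works through w), while averaging the estimated partial derivative over the sample distribution of w should land close to β. If either assumption failed in real data, for example if w had its own direct effect on y, this averaging would no longer recover the APE.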
