Lecture 6: Maximum Likelihood Estimation


The Likelihood Function and Identification of the Parameters (极大似然函数及参数识别)

1. Representation of the likelihood function

In a random sample of $n$ observations, each observation has density function $f(x_i, \theta)$. Since the $n$ random observations are independent, their joint density is

$$f(x_1, x_2, \ldots, x_n, \theta) = f(x_1, \theta) f(x_2, \theta) \cdots f(x_n, \theta) = \prod_{i=1}^{n} f(x_i, \theta) = L(\theta \mid x_1, x_2, \ldots, x_n).$$

The function $L(\theta \mid x_1, x_2, \ldots, x_n)$ is called the likelihood function and is usually written $L(\theta \mid X)$, or simply $L(\theta)$.

Comparison with the definition in Greene:

The probability density function, or pdf, for a random variable $y$, conditioned on a set of parameters $\theta$, is denoted $f(y \mid \theta)$. This function identifies the data generating process that underlies an observed sample of data and, at the same time, provides a mathematical description of the data that the process will produce. The joint density of $n$ independent and identically distributed (iid) observations from this process is the product of the individual densities;

$$f(y_1, y_2, \ldots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = L(\theta \mid \mathbf{y}). \tag{17-1}$$

This joint density is the likelihood function, defined as a function of the unknown parameter vector $\theta$, where $\mathbf{y}$ is used to indicate the collection of sample data. Note that we write the joint density as a function of the data conditioned on the parameters, whereas when we form the likelihood function, we write this function in reverse, as a function of the parameters, conditioned on the data. Though the two functions are the same, it is to be emphasized that the likelihood function is written in this fashion to highlight our interest in the parameters and the information about them that is contained in the observed data. However, it is understood that the likelihood function is not meant to represent a probability density for the parameters as it is in Section 16.2.2. In this classical estimation framework, the parameters are assumed to be fixed constants which we hope to learn about from the data.

It is usually simpler to work with the log of the likelihood function:

$$\ln L(\theta \mid \mathbf{y}) = \sum_{i=1}^{n} \ln f(y_i \mid \theta). \tag{17-2}$$

Again, to emphasize our interest in the parameters, given the observed data, we denote this function $L(\theta \mid \text{data}) = L(\theta \mid \mathbf{y})$. The likelihood function and its logarithm, evaluated at $\theta$, are sometimes denoted simply $L(\theta)$ and $\ln L(\theta)$, respectively, or, where no ambiguity can arise, just $L$ or $\ln L$.

It will usually be necessary to generalize the concept of the likelihood function to allow the density to depend on other conditioning variables. To jump immediately to one of our central applications, suppose the disturbance in the classical linear regression model is normally distributed. Then, conditioned on its specific $x_i$, $y_i$ is normally distributed with mean $\mu_i = x_i'\beta$ and variance $\sigma^2$. That means that the observed random variables are not iid; they have different means. Nonetheless, the observations are independent, and as we will examine in closer detail,

$$\ln L(\theta \mid \mathbf{y}, \mathbf{X}) = \sum_{i=1}^{n} \ln f(y_i \mid x_i, \theta) = -\frac{1}{2} \sum_{i=1}^{n} \left[ \ln \sigma^2 + \ln(2\pi) + \frac{(y_i - x_i'\beta)^2}{\sigma^2} \right], \tag{17-3}$$

where $\mathbf{X}$ is the $n \times K$ matrix of data with $i$th row equal to $x_i'$.
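To make (17-3) concrete, here is a minimal Python sketch of the normal linear regression log-likelihood. The function name `normal_lnL` and the simulated data are illustrative assumptions of this sketch, not part of the text; the formula inside the function is exactly (17-3).

```python
import numpy as np

def normal_lnL(y, X, beta, sigma2):
    """Equation (17-3): log-likelihood of the classical linear regression
    model with normal disturbances, y_i | x_i ~ N(x_i'beta, sigma2)."""
    e = y - X @ beta                        # residuals y_i - x_i'beta
    n = len(y)
    return -0.5 * (n * np.log(sigma2) + n * np.log(2 * np.pi)
                   + np.sum(e ** 2) / sigma2)

# Illustration on simulated data (all values here are hypothetical).
rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta_true, s2_true = np.array([1.0, 2.0, -0.5]), 1.5
y = X @ beta_true + rng.normal(scale=np.sqrt(s2_true), size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]    # for this model the MLE of beta is OLS
s2_ml = np.mean((y - X @ b) ** 2)           # MLE of sigma^2 (divides by n, not n-K)
print(normal_lnL(y, X, b, s2_ml))           # maximized log-likelihood
```

Note that `s2_ml` divides by $n$ rather than $n - K$; this is the downward-biased MLE of $\sigma^2$ mentioned in Section 17.4 below.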
2. The identification problem

The rest of this chapter will be concerned with obtaining estimates of the parameters $\theta$ and with testing hypotheses about them and about the data generating process. Before we begin that study, we consider the question of whether estimation of the parameters is possible at all: the question of identification. Identification is an issue related to the formulation of the model, and it must be resolved before estimation can even be considered. The question posed is essentially this: suppose we had an infinitely large sample, that is, for current purposes, all the information there is to be had about the parameters. Could we uniquely determine the values of $\theta$ from such a sample? As will be clear shortly, the answer is sometimes no.

Note: You should be able to write out fluently the density functions of the common distributions and the corresponding likelihood functions; this is a basic skill in microeconometrics. The normal and logistic distributions matter most and, more generally, the density functions of the exponential family.

17.3 Efficient Estimation: The Principle of Maximum Likelihood

The principle of maximum likelihood provides a means of choosing an asymptotically efficient estimator for a parameter or a set of parameters. The logic of the technique is easily illustrated in the setting of a discrete distribution. Consider a random sample of the following 10 observations from a Poisson distribution: 5, 0, 1, 1, 0, 3, 2, 3, 4, and 1. The density for each observation is

$$f(y_i \mid \theta) = \frac{e^{-\theta} \theta^{y_i}}{y_i!}.$$

Since the observations are independent, their joint density, which is the likelihood for this sample, is

$$f(y_1, y_2, \ldots, y_{10} \mid \theta) = \prod_{i=1}^{10} f(y_i \mid \theta) = \frac{e^{-10\theta}\,\theta^{\sum_i y_i}}{\prod_i y_i!} = \frac{e^{-10\theta}\,\theta^{20}}{5!\,0!\,1!\cdots 1!} = \frac{e^{-10\theta}\,\theta^{20}}{207360}.$$

The last result gives the probability of observing this particular sample, assuming that a Poisson distribution with as yet unknown parameter $\theta$ generated the data. What value of $\theta$ would make this sample most probable? Figure 17.1 plots this function for various values of $\theta$. It has a single mode at $\theta = 2$, which would be the maximum likelihood estimate, or MLE, of $\theta$.

Consider maximizing $L(\theta \mid \mathbf{y})$ with respect to $\theta$. Since the log function is monotonically increasing and easier to work with, we usually maximize $\ln L(\theta \mid \mathbf{y})$ instead; in sampling from a Poisson population,

$$\ln L(\theta \mid \mathbf{y}) = -n\theta + \ln\theta \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \ln(y_i!),$$

$$\frac{\partial \ln L(\theta \mid \mathbf{y})}{\partial \theta} = -n + \frac{1}{\theta}\sum_{i=1}^{n} y_i = 0 \quad\Longrightarrow\quad \hat\theta_{ML} = \bar{y}_n.$$

For the assumed sample of observations,

$$\ln L(\theta \mid \mathbf{y}) = -10\theta + 20\ln\theta - 12.242,$$

$$\frac{\partial \ln L(\theta \mid \mathbf{y})}{\partial \theta} = -10 + \frac{20}{\theta} = 0 \quad\Longrightarrow\quad \hat\theta = 2,$$

and

$$\frac{\partial^2 \ln L(\theta \mid \mathbf{y})}{\partial \theta^2} = -\frac{20}{\theta^2} < 0, \quad\text{so this is a maximum}.$$

The solution is the same as before. Figure 17.1 also plots the log of $L(\theta \mid \mathbf{y})$ to illustrate the result.

The reference to the probability of observing the given sample is not exact in a continuous distribution, since a particular sample has probability zero. Nonetheless, the principle is the same. The values of the parameters that maximize $L(\theta \mid \text{data})$ or its log are the maximum likelihood estimates, denoted $\hat\theta$. Since the logarithm is a monotonic function, the values that maximize $L(\theta \mid \text{data})$ are the same as those that maximize $\ln L(\theta \mid \text{data})$. The necessary condition for maximizing $\ln L(\theta \mid \text{data})$ is

$$\frac{\partial \ln L(\theta \mid \text{data})}{\partial \theta} = 0. \tag{17-4}$$

This is called the likelihood equation. The general result then is that the MLE is a root of the likelihood equation. The application to the parameters of the dgp for a discrete random variable is suggestive that maximum likelihood is a "good" use of the data. It remains to establish this as a general principle. We turn to that issue in the next section.
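Before moving on, the Poisson numbers above are easy to verify numerically. This minimal Python sketch reproduces the constant $\ln 207360 \approx 12.242$ and locates the mode at $\hat\theta = \bar y = 2$; the helper name `poisson_lnL` is ours, not the text's.

```python
import math
import numpy as np

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])     # the sample from the text

def poisson_lnL(theta, y):
    """ln L(theta | y) = -n*theta + ln(theta)*sum(y) - sum(ln y_i!)."""
    const = sum(math.lgamma(v + 1) for v in y)   # lgamma(y+1) = ln(y!)
    return -len(y) * theta + np.log(theta) * y.sum() - const

print(sum(math.lgamma(v + 1) for v in y))   # ln(207360) = 12.242...
grid = np.linspace(0.1, 6.0, 1181)          # grid of candidate theta values
lnL = poisson_lnL(grid, y)
print(grid[np.argmax(lnL)], y.mean())       # mode at theta = 2 = ybar
```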
17.4 Properties of Maximum Likelihood Estimators

Maximum likelihood estimators (MLEs) are most attractive because of their large-sample, or asymptotic, properties. If certain regularity conditions are met, the MLE will have these properties. The finite-sample properties are sometimes less than optimal. For example, the MLE may be biased; the MLE of $\sigma^2$ in Example 17.2 is biased downward. The occasional statement that the properties of the MLE are only optimal in large samples is not true, however. It can be shown that when sampling is from an exponential family of distributions (see Definition 18.1), there will exist sufficient statistics. If so, MLEs will be functions of them, which means that when minimum variance unbiased estimators exist, they will be MLEs. [See Stuart and Ord (1989).] Most applications in econometrics do not involve exponential families, so the appeal of the MLE remains primarily its asymptotic properties.

We use the following notation: $\hat\theta$ is the maximum likelihood estimator; $\theta_0$ denotes the true value of the parameter vector; $\theta$ denotes another possible value of the parameter vector, not the MLE and not necessarily the true value. Expectation based on the true values of the parameters is denoted $E_0[\cdot]$. If we assume that the regularity conditions discussed below are met by $f(x, \theta_0)$, then we have the following theorem.

Theorem 4.2 (Cramér-Rao lower bound; the information number and information matrix). If the density of $x$ satisfies certain regularity conditions, the variance of an unbiased estimator of the parameter $\theta$ is always at least

$$[I(\theta)]^{-1} = \left\{ -E\!\left[ \frac{\partial^2 \ln L(\theta)}{\partial \theta^2} \right] \right\}^{-1} = \left\{ E\!\left[ \left( \frac{\partial \ln L(\theta)}{\partial \theta} \right)^{2} \right] \right\}^{-1}.$$

(Proof omitted here.)

Definition 4.12 (asymptotic normality and asymptotic efficiency). If $\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, V)$, then the estimator $\hat\theta$ is asymptotically normal. The estimator $\hat\theta$ is asymptotically efficient if the covariance matrix of every other consistent, asymptotically normal estimator exceeds $\frac{1}{n}V$ by a nonnegative definite matrix.

For most estimation problems, asymptotic normality and asymptotic efficiency are the usual criteria for choosing an estimator.

Asymptotic expectation. The asymptotic expectation and asymptotic variance of a random variable are the expectation and variance of its asymptotic distribution. Thus, for an estimator $\hat\theta$ whose limiting distribution is given by $\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, V)$, the asymptotic expectation is $\theta$ and the asymptotic variance is $\frac{1}{n}V$; this means the estimator is asymptotically unbiased.

The relation between consistency and asymptotic unbiasedness (three possible definitions of the latter):

(1) the limiting distribution of $\sqrt{n}(\hat\theta - \theta)$ has mean 0;
(2) $\lim_{n\to\infty} E[\hat\theta] = \theta$;
(3) $\operatorname{plim}\hat\theta = \theta$.

What is the meaning of each of these definitions?

Asymptotic variance (one common definition):

$$\operatorname{Asy.Var}[\hat\theta] = \frac{1}{n} \lim_{n\to\infty} E\!\left[ n\left( \hat\theta - \lim_{n\to\infty} E[\hat\theta] \right)^{2} \right].$$

Properties of the ML estimator. ML estimators are highly attractive because of their large-sample (asymptotic) properties. When $f(x, \theta_0)$ satisfies the regularity conditions, we have:

Theorem 4.18 (properties of the maximum likelihood estimator). If the likelihood function $f(x, \theta)$ satisfies the regularity conditions, the maximum likelihood estimator has the following asymptotic properties:

M1. Consistency: $\operatorname{plim}\hat\theta = \theta$.

M2. Asymptotic normality: $\hat\theta \xrightarrow{a} N\!\left(\theta, [I(\theta)]^{-1}\right)$, where $I(\theta) = -E\!\left[ \dfrac{\partial^2 \ln L}{\partial\theta\,\partial\theta'} \right]$.

M3. Asymptotic efficiency: $\hat\theta$ is asymptotically efficient and achieves the Cramér-Rao lower bound for consistent estimators:

$$\operatorname{Asy.Var}[\hat\theta] = \left\{ -E\!\left[ \frac{\partial^2 \ln L(\theta)}{\partial\theta\,\partial\theta'} \right] \right\}^{-1} = \left\{ E\!\left[ \frac{\partial \ln L(\theta)}{\partial\theta} \frac{\partial \ln L(\theta)}{\partial\theta'} \right] \right\}^{-1}.$$

M4. Invariance: if $\hat\theta$ is the MLE of $\theta$ and $c(\theta)$ is a continuous function, then the MLE of $\gamma = c(\theta)$ is $\hat\gamma = c(\hat\theta)$.

Understanding these properties. These asymptotic properties explain why ML is so prevalent in econometrics:

The first gives the limiting distribution of the estimator.
The second greatly facilitates hypothesis testing and the construction of interval estimates.
The third is a particularly powerful result: the MLE attains the smallest variance achievable by a consistent estimator.
The fourth makes it convenient to construct estimators of functions of the parameters, in two senses: (1) if a set of parameters has already been estimated and an estimate of some function of them is required, the model need not be re-estimated; (2) the invariance principle implies that we are free to re-parameterize the likelihood function however we like, in order to simplify estimation.

These are all asymptotic properties, however. The finite-sample properties are usually unknown, and when they are known we sometimes find that the MLE is not the best estimator in small samples.

To prove the properties above we need some useful properties of probability density functions; with those in hand, the proofs can proceed.

17.4.1 Regularity Conditions

First the regularity conditions, then the useful properties. To sketch proofs of these results, we first obtain some useful properties of probability density functions. We assume that $(y_1, \ldots, y_n)$ is a random sample from the population with density function $f(y_i \mid \theta_0)$ and that the following regularity conditions hold. [Our statement of these is informal. A more rigorous treatment may be found in Stuart and Ord (1989) or Davidson and MacKinnon (1993).]

Let $(x_1, \ldots, x_n)$ be drawn from a (univariate or multivariate) population with density $f(x_i \mid \theta)$; the density obeys the following regularity conditions:

R1. The first three derivatives of $\ln f(x_i, \theta)$ with respect to $\theta$ are continuous and finite for almost all $x_i$ and for all $\theta$. (This ensures the existence of certain Taylor-series approximations and the finite variance of the derivatives.)

R2. The conditions necessary to obtain the expectations of the first and second derivatives of $\ln f(x_i, \theta)$ are met.

R3. For all values of $\theta$, $\left| \partial^3 \ln f(x_i, \theta) / \partial\theta_j \partial\theta_k \partial\theta_l \right|$ is less than a function that has a finite expectation. (This lets us truncate the Taylor series.)

Understanding the regularity conditions:

What they are.
1. $\ln f(\cdot)$ has three continuous derivatives with respect to the parameters.
2. The conditions needed to obtain expectations of derivatives are met (e.g., the range of the variable is not a function of the parameters); a numerical illustration of what goes wrong otherwise appears below.
3. The third derivative has a finite expectation.

What they mean.
Moment conditions and convergence: we need to obtain expectations of derivatives.
We need to be able to truncate Taylor series.
We will use central limit theorems.
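As that illustration of why R2 matters, consider sampling from a uniform distribution on $[0, \theta]$, whose support depends on the parameter; the proof below notes that the uniform distribution violates the interchange condition. The following Python sketch is our own illustrative example under that assumption, not a worked example from the text.

```python
import numpy as np

# For y ~ U(0, theta): f(y|theta) = 1/theta on [0, theta], so
# ln f = -ln(theta) and the score is d ln f / d theta = -1/theta for every y.
# Its expectation is -1/theta, not 0, so D2 (below) fails: R2 is violated
# because the support [0, theta] depends on the parameter, and the
# likelihood equation -n/theta = 0 has no root. The MLE is the boundary
# value max(y_i), not a root of the likelihood equation.
rng = np.random.default_rng(1)
theta0, n, reps = 2.0, 50, 10_000
samples = rng.uniform(0, theta0, size=(reps, n))
theta_mle = samples.max(axis=1)        # MLE = max(y): a corner solution
print(theta_mle.mean())                # about theta0 * n/(n+1) = 1.96, biased down
```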
With these regularity conditions in place, we obtain the following fundamental characteristics of $f(y_i \mid \theta)$. D1 is simply a consequence of the definition of the likelihood function. D2 leads to the moment condition which defines the maximum likelihood estimator: on the one hand, the MLE is found as the maximizer of a function, which mandates finding the vector which equates the gradient to zero; on the other, D2 is a more fundamental relationship which places the MLE in the class of generalized method of moments estimators. D3 produces what is known as the information matrix equality. This relationship shows how to obtain the asymptotic covariance matrix of the MLE.

Under these regularity conditions, the fundamental characteristics of $f(x_i, \theta)$ are:

D1. $g_i = \dfrac{\partial \ln f(y_i \mid \theta)}{\partial \theta}$ and $H_i = \dfrac{\partial^2 \ln f(y_i \mid \theta)}{\partial \theta\,\partial \theta'}$, $i = 1, 2, \ldots, n$, are all random samples of random variables. (This follows from our assumption of random sampling.)

D2. $E_0\!\left[ \dfrac{\partial \ln f(y_i \mid \theta_0)}{\partial \theta_0} \right] = E_0[g_i(\theta_0)] = 0$.
(The precondition is $f(A(\theta_0) \mid \theta_0) = f(B(\theta_0) \mid \theta_0) = 0$, or $\dfrac{\partial A(\theta_0)}{\partial \theta_0} = \dfrac{\partial B(\theta_0)}{\partial \theta_0} = 0$, where $A(\theta_0)$ and $B(\theta_0)$ are the lower and upper limits of the range of the random variable.)

D3. $\operatorname{Var}_0[g_i(\theta_0)] = -E_0[H_i(\theta_0)] = -E_0\!\left[ \dfrac{\partial^2 \ln f(y_i \mid \theta_0)}{\partial \theta_0\,\partial \theta_0'} \right]$.

In words (be sure the meaning is clear):
D2: the expectation of the first derivative is zero.
D3: the negative of the expected second-derivative matrix equals the variance of the first derivative.

Proof. First, consider the possibility that the range of $x_i$ depends on the parameters: for each $y_i$, $A(\theta_0) \le y_i \le B(\theta_0)$. By definition,

$$\int_{A(\theta_0)}^{B(\theta_0)} f(y_i \mid \theta_0)\, dy_i = 1.$$

Differentiate this with respect to $\theta_0$. By Leibnitz's theorem,

$$\frac{\partial}{\partial \theta_0} \int_{A(\theta_0)}^{B(\theta_0)} f(y_i \mid \theta_0)\, dy_i = \int_{A(\theta_0)}^{B(\theta_0)} \frac{\partial f(y_i \mid \theta_0)}{\partial \theta_0}\, dy_i + f(B(\theta_0) \mid \theta_0)\frac{\partial B(\theta_0)}{\partial \theta_0} - f(A(\theta_0) \mid \theta_0)\frac{\partial A(\theta_0)}{\partial \theta_0}.$$

If the second and third terms go to zero, then we may interchange the operations of differentiation and integration. A necessary condition is that the density vanish at the endpoints of the range, $\lim_{y_i \to A(\theta_0)} f(y_i \mid \theta_0) = \lim_{y_i \to B(\theta_0)} f(y_i \mid \theta_0) = 0$. (Note that the uniform distribution suggested above violates this condition.) A sufficient condition is that the range of the observed random variable $y_i$ not depend on the parameters, which means $\partial A(\theta_0)/\partial \theta_0 = \partial B(\theta_0)/\partial \theta_0 = 0$; this is exactly regularity condition R2. The latter is usually assumed, and we will assume it in what follows. So,

$$\frac{\partial}{\partial \theta_0} \int f(y_i \mid \theta_0)\, dy_i = \int \frac{\partial f(y_i \mid \theta_0)}{\partial \theta_0}\, dy_i = \int \frac{\partial \ln f(y_i \mid \theta_0)}{\partial \theta_0} f(y_i \mid \theta_0)\, dy_i = E_0\!\left[ \frac{\partial \ln f(y_i \mid \theta_0)}{\partial \theta_0} \right] = 0,$$

that is, $E_0[g_i(\theta_0)] = 0$, which proves D2.

Because the order of differentiation and integration can be interchanged, differentiate $\int \frac{\partial \ln f(y_i \mid \theta_0)}{\partial \theta_0} f(y_i \mid \theta_0)\, dy_i$ once more, now with respect to $\theta_0'$:

$$\int \left[ \frac{\partial^2 \ln f(y_i \mid \theta_0)}{\partial \theta_0\,\partial \theta_0'} f(y_i \mid \theta_0) + \frac{\partial \ln f(y_i \mid \theta_0)}{\partial \theta_0} \frac{\partial f(y_i \mid \theta_0)}{\partial \theta_0'} \right] dy_i = 0.$$

But $\dfrac{\partial f(y_i \mid \theta_0)}{\partial \theta_0'} = f(y_i \mid \theta_0)\, \dfrac{\partial \ln f(y_i \mid \theta_0)}{\partial \theta_0'}$, and the integral of a sum is the sum of the integrals. Therefore,

$$-\int \frac{\partial^2 \ln f(y_i \mid \theta_0)}{\partial \theta_0\,\partial \theta_0'} f(y_i \mid \theta_0)\, dy_i = \int \frac{\partial \ln f(y_i \mid \theta_0)}{\partial \theta_0} \frac{\partial \ln f(y_i \mid \theta_0)}{\partial \theta_0'} f(y_i \mid \theta_0)\, dy_i,$$

i.e., $-E_0[H_i(\theta_0)] = E_0[g_i(\theta_0) g_i(\theta_0)']$. The left-hand side is the negative of the expected second-derivative matrix. By D2 the first derivative has zero expectation, so the right-hand side is the variance of the first derivative. Hence the negative of the expected second-derivative matrix equals the variance of the first derivative, and D3 is proved:

$$\operatorname{Var}\!\left[ \frac{\partial \ln f(x, \theta)}{\partial \theta} \right] = E\!\left[ \frac{\partial \ln f(x, \theta)}{\partial \theta} \frac{\partial \ln f(x, \theta)}{\partial \theta'} \right] = -E\!\left[ \frac{\partial^2 \ln f(x, \theta)}{\partial \theta\,\partial \theta'} \right].$$

17.4.3 The Likelihood Equation (deriving the asymptotic properties of the MLE)

Let the log-likelihood function be

$$\ln L = \sum_{i=1}^{n} \ln f(y_i \mid \theta).$$

Then

$$g = \frac{\partial \ln L}{\partial \theta} = \sum_{i=1}^{n} g_i \tag{17-9}$$

and

$$H = \frac{\partial^2 \ln L}{\partial \theta\,\partial \theta'} = \sum_{i=1}^{n} H_i.$$

From D1 and D2,

$$E_0\!\left[ \frac{\partial \ln L}{\partial \theta_0} \right] = E_0[g(\theta_0)] = \sum_{i=1}^{n} E_0[g_i(\theta_0)] = 0, \tag{17-10}$$

which is the likelihood equation mentioned earlier.

17.4.4 The Information Matrix Equality

Consider $E_0[g g'] = \sum_{i=1}^{n}\sum_{j=1}^{n} E_0[g_i g_j']$. By D1 (the random sampling property), the terms with unequal subscripts drop out, so

$$E_0[g g'] = \sum_{i=1}^{n} E_0[g_i g_i'] = \sum_{i=1}^{n} \left( -E_0[H_i] \right) = -E_0[H].$$

Therefore,

$$\operatorname{Var}_0\!\left[ \frac{\partial \ln L}{\partial \theta_0} \right] = E_0[g g'] = -E_0\!\left[ \frac{\partial^2 \ln L}{\partial \theta_0\,\partial \theta_0'} \right]. \tag{17-11}$$

This very useful result is known as the information matrix equality.

With these preparations, we can prove M1, M2, M3, and M4. (The proof has many details and takes some time; see pp. 477-480.) To understand it, we will sketch formal proofs of these results: the log-likelihood function again, the likelihood equation, and the information matrix; a linear Taylor-series approximation to the first-order conditions,

$$g(\hat\theta_{ML}) = g(\theta_0) + H(\theta_0)(\hat\theta_{ML} - \theta_0) + \cdots \approx 0$$

(under regularity, the higher-order terms vanish in large samples); our usual approach, namely that the large-sample behavior of the left- and right-hand sides is the same; a proof of consistency (Property M1); and the limiting variance of $\sqrt{n}(\hat\theta_{ML} - \theta_0)$, where we use the central limit theorem, which leads to asymptotic normality (Property M2).
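The information matrix equality (17-11) is easy to check by simulation for the Poisson model of Section 17.3, where the density gives $g_i = y_i/\theta - 1$ and $H_i = -y_i/\theta^2$ directly. A minimal Python sketch (the parameter value and number of draws are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, N = 2.0, 1_000_000          # true parameter and number of draws (our choice)
y = rng.poisson(theta0, size=N)

g = y / theta0 - 1.0                # per-observation scores at the true theta
H = -y / theta0**2                  # per-observation second derivatives

print(g.mean())    # close to 0:   D2, zero expected score
print(g.var())     # close to 0.5: variance of the score, 1/theta0
print(-H.mean())   # close to 0.5: -E[H] equals Var[g], equality (17-11)
```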
We will also derive the asymptotic variance of the MLE; discuss efficiency (we have not developed the tools to prove this) via the Cramér-Rao lower bound for efficient estimation (an asymptotic version of Gauss-Markov); consider estimating the variance of the maximum likelihood estimator; and establish invariance (a VERY handy result): coupled with the Slutsky theorem and the delta method, the invariance property makes estimation of nonlinear functions of parameters very easy.

Deriving the properties of the maximum likelihood estimator: for an example, see the information matrix of the multivariate normal distribution (Example 4.21).

Estimating the asymptotic variance of the ML estimator: the BHHH estimator. A concrete example of computing a variance estimator for an ML estimator follows below.
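As that concrete example, here is a minimal Python sketch for the Poisson sample of Section 17.3, comparing the expected-information and observed-Hessian variance estimators with the BHHH (outer-product-of-gradients) estimator; the per-observation score $g_i = y_i/\hat\theta - 1$ comes from the Poisson density above. For this model the first two coincide at $\hat\theta = \bar y$, while BHHH differs in finite samples.

```python
import numpy as np

y = np.array([5, 0, 1, 1, 0, 3, 2, 3, 4, 1])
n, theta_hat = len(y), y.mean()                  # theta_hat = 2

info_expected = n / theta_hat                    # expected information: n/theta
info_hessian = y.sum() / theta_hat**2            # negative observed Hessian at theta_hat
g = y / theta_hat - 1.0                          # per-observation scores at theta_hat
info_bhhh = np.sum(g**2)                         # BHHH: sum of squared scores

for name, info in [("expected", info_expected),
                   ("Hessian", info_hessian),
                   ("BHHH", info_bhhh)]:
    print(name, 1.0 / info)                      # three estimates of Asy.Var[theta_hat]
```

All three are consistent estimators of the same asymptotic variance; BHHH is convenient because it requires only first derivatives.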