
Structural Equation Modeling of Categorical Variables (English)

Journal of Econometrics 22 (1983) 43-65. North-Holland Publishing Company

LATENT VARIABLE STRUCTURAL EQUATION MODELING WITH CATEGORICAL DATA*

Bengt MUTHÉN
University of California, Los Angeles, CA 90024, USA

Structural equation modeling with latent variables is overviewed for situations involving a mixture of dichotomous, ordered polytomous, and continuous indicators of latent variables. Special emphasis is placed on categorical variables. Models in psychometrics, econometrics and biometrics are interrelated via a general model due to Muthén. Limited-information least-squares estimators and full-information estimation are discussed. An example is estimated with a model for a four-wave longitudinal data set, where dichotomous responses are related to each other and to a set of independent variables via latent variables with a variance component structure.

1. Introduction

This article gives a general overview of the specification and estimation of latent variable structural equation models, with particular emphasis on the case of dichotomous and ordered polytomous observed variables (indicators). With some recent exceptions, the methodology available to date is intended for continuous indicators only. Developments for categorical indicators are important because in many applications, particularly in the social and behavioral sciences, observed variables frequently have a small number of categories with non-equidistant scale steps, and often they are dichotomous (binary). The categories of such variables may be scored and subsequently treated as continuous, interval-scale variables. Pearson product-moment correlations and covariances are, however, unsuited for these quasi-continuous variables, particularly when the variables are skewed. When such variables are forced into the mold of traditional structural equation models, a distorted analysis will result.
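The distortion can be seen in a small simulation. The sketch below is a hypothetical illustration and is not taken from the article. It draws latent pairs from a standard bivariate normal with an assumed correlation of 0.6. Each variable is scored 1 above an assumed threshold of 1.0, so only about 16% of cases fall in the upper category. The script then compares the Pearson correlation of the latent values with that of the 0/1 scores.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(rho=0.6, tau=1.0, n=20000, seed=1):
    """Draw latent (y1*, y2*) from a standard bivariate normal with correlation rho,
    then score y = 1 if y* > tau else 0 (a skewed dichotomy when tau is far from 0)."""
    rng = random.Random(seed)
    y1s, y2s, y1, y2 = [], [], [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = e1
        b = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2   # corr(a, b) = rho
        y1s.append(a)
        y2s.append(b)
        y1.append(1 if a > tau else 0)
        y2.append(1 if b > tau else 0)
    return pearson(y1s, y2s), pearson(y1, y2)


latent_r, binary_r = simulate()
print(f"latent r = {latent_r:.3f}, Pearson r of 0/1 scores = {binary_r:.3f}")
```

With these settings, the Pearson correlation of the 0/1 scores comes out well below the latent 0.6. The size of the gap depends on the thresholds, which is why the models reviewed here work with the latent response variables rather than with scored categories.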
This article draws on new developments presented in Muthén (1981a), where a general structural equation model and its estimation were proposed. Muthén's model allows for dichotomous, ordered polytomous, and continuous indicators of latent variables. With this general model, a large body of methodological contributions from psychometrics, biometrics, and econometrics can be conveniently interrelated. This is carried out with respect to modeling in section 3. Section 4 considers estimation approaches, while section 5 presents the estimation of a social-psychological longitudinal model with features that are relevant to many fields of application, including econometrics.

*This research was supported by Grant 81-IJ-CX-0015 from the National Institute of Justice and by Grant DA 01070 from the U.S. Public Health Service.

0165-7410/83/$03.00 © Elsevier Science Publishers

2. A general model

Muthén (1981a) considered the following model for G groups (populations) of observation units. The model is presented in a somewhat re-arranged way here. For each group g, a random dependent (endogenous) variable vector $y^{(g)}$ ($p \times 1$) and a random independent (exogenous) variable vector $x^{(g)}$ ($q \times 1$) are observed. Observations from different groups are assumed to be independent. In what follows, the superscript g should be attached to each array of the model, but it will be omitted for simplicity where no confusion can arise. Each observed variable may be continuous or categorical with ordered categories. The observed variables are assumed to be generated by a set of underlying latent continuous variables in the following way.
For each group, assume the linear structural equation system for a set of m latent dependent variables $\eta$ and a set of n latent independent variables $\xi$,

$$\eta = \alpha + B\eta + \Gamma\xi + \zeta, \qquad (1)$$

where $\alpha$ ($m \times 1$) is a parameter vector of intercepts and $B$ ($m \times m$) is a parameter matrix of coefficients for the regressions among the $\eta$'s. The diagonal elements of $B$ are zero and $I - B$ is non-singular. $\Gamma$ ($m \times n$) is a parameter matrix of coefficients for the regressions of the $\eta$'s on the $\xi$'s, and $\zeta$ is a random vector of residuals (errors in the equations). Also assume the linear 'inner' measurement relations for a set of p latent response variables $y^*$ and a set of q latent response variables $x^*$,

$$y^* = \nu_y + \Lambda_y\eta + \varepsilon, \qquad (2)$$

$$x^* = \nu_x + \Lambda_x\xi + \delta, \qquad (3)$$

where $\nu_y$ ($p \times 1$) and $\nu_x$ ($q \times 1$) are parameter vectors of intercepts. $\Lambda_y$ ($p \times m$) and $\Lambda_x$ ($q \times n$) are parameter matrices of coefficients (loadings) for the regressions of the latent response variables on the latent variables in the structural relations. $\varepsilon$ ($p \times 1$) and $\delta$ ($q \times 1$) are random vectors of residuals (errors of measurement). The observed variables are assumed to be related to the latent response variables by a set of p + q 'outer' measurement relations. For a certain latent response variable, $z^*$ say, two alternative types of measurement, z say, are allowed. With a categorical z with, say, C categories, we assume the monotonic relation

$$z = C - 1 \quad \text{if } \tau_{C-1} < z^*, \ \ldots$$

…

$$V\begin{pmatrix} y^* \\ x^* \end{pmatrix} = \begin{pmatrix} \Lambda_y\Sigma_{\eta\eta}\Lambda_y' + \Theta_\varepsilon & \text{(symmetric)} \\ \Lambda_x\Sigma_{\xi\eta}\Lambda_y' & \Lambda_x\Phi\Lambda_x' + \Theta_\delta \end{pmatrix}, \qquad (9)$$

where

$$\Sigma_{\eta\eta} = (I-B)^{-1}(\Gamma\Phi\Gamma' + \Psi)(I-B)'^{-1}, \qquad (10)$$

$$\Sigma_{\xi\eta} = \Phi\Gamma'(I-B)'^{-1}, \qquad (11)$$

and

$$E(y^* \mid x) = \nu_y + \Lambda_y(I-B)^{-1}\alpha + \Lambda_y(I-B)^{-1}\Gamma x, \qquad (12)$$

$$V(y^* \mid x) = \Lambda_y(I-B)^{-1}\Psi(I-B)'^{-1}\Lambda_y' + \Theta_\varepsilon. \qquad (13)$$

3. Overview of related models

In its special cases, the general model reviewed above is related to several other models used in different application areas. Modeling will be overviewed here using this general model. The categorical case will be emphasized, but it is straightforward and convenient to also include, in condensed form, the more familiar case of continuous variables.

3.1. Continuous variables

A basic model is Jöreskog's so-called LISREL model, presented in Jöreskog (1973, 1977). In LISREL, all indicators are considered to be continuous, so that (5) holds for all outer measurement relations, i.e., the latent response variables are all observed. The original LISREL model was concerned with the special case of a single group (G = 1). It used the standardization $\alpha = 0$, $\kappa = 0$, so that $E(\eta) = 0$, $E(\xi) = 0$. Case A and Case B were both considered, using the normality assumptions. Case B, when further specialized to involve no measurement structure and no measurement errors in (2), has p = m and $y = \eta$. The case of p = m (and q = n) will be referred to as the single-indicator case, as opposed to the multiple-indicator case. It has been extensively studied by econometricians in the analysis of linear simultaneous equation systems [for familiar references, see e.g. the overview in Jöreskog (1973, pp. 93-95)]. LISREL is a hybrid model that combines linear factor analysis (inner) measurement relations [see e.g. Lawley and Maxwell (1971)], see (2) and (3), with a linear simultaneous equation system for the factors, see (1). This has proven very useful, particularly in social and behavioral science applications. For overviews with illustrations and additional detail, see e.g. Aigner and Goldberger (1977), Bentler (1980), Bentler and Weeks (1980), Bielby and Hauser (1977), Browne (1982) and Jöreskog (1978).

More recently, the LISREL framework has been extended while retaining the requirement of continuous indicators. The extensions are simultaneous analysis of several groups, g = 1, 2, ..., G, and the inclusion of structured means via the parameter arrays $\alpha^{(g)}$ and $\kappa^{(g)}$. The multiple-group factor analysis of Jöreskog (1971) was extended by Sörbom (1974) to study differences and similarities not only in covariance structure but also in factor means.
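As a numerical companion to the covariance structure of section 2, the sketch below evaluates (9) to (11) and the conditional moments (12) and (13) with NumPy. It assumes the usual LISREL reading of the symbols: $\Phi = V(\xi)$, $\Psi = V(\zeta)$, $\Theta_\varepsilon = V(\varepsilon)$, $\Theta_\delta = V(\delta)$. In `conditional_moments`, x enters directly through $\Gamma$, as in (12). The function names and the small one-factor parameter values are hypothetical.

```python
import numpy as np


def implied_covariance(B, Gamma, Phi, Psi, Lam_y, Lam_x, Theta_eps, Theta_delta):
    """V(y*, x*) from (9), built from Sigma_eta_eta (10) and Sigma_xi_eta (11)."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)          # (I - B)^{-1}
    S_ee = A @ (Gamma @ Phi @ Gamma.T + Psi) @ A.T      # (10)
    S_xe = Phi @ Gamma.T @ A.T                          # (11)
    yy = Lam_y @ S_ee @ Lam_y.T + Theta_eps
    xy = Lam_x @ S_xe @ Lam_y.T
    xx = Lam_x @ Phi @ Lam_x.T + Theta_delta
    return np.block([[yy, xy.T], [xy, xx]])             # symmetric by construction


def conditional_moments(nu_y, alpha, B, Gamma, Psi, Lam_y, Theta_eps, x):
    """E(y* | x) from (12) and V(y* | x) from (13)."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    mean = nu_y + Lam_y @ A @ alpha + Lam_y @ A @ Gamma @ x
    cov = Lam_y @ A @ Psi @ A.T @ Lam_y.T + Theta_eps
    return mean, cov


# Hypothetical example: one eta, one xi, two indicators of each.
B = np.array([[0.0]])
Gamma = np.array([[0.5]])
Phi = np.array([[1.0]])
Psi = np.array([[0.75]])
Lam_y = np.array([[1.0], [0.8]])
Lam_x = np.array([[1.0], [0.9]])
Theta_eps = np.diag([0.3, 0.4])
Theta_delta = np.diag([0.2, 0.2])

Sigma = implied_covariance(B, Gamma, Phi, Psi, Lam_y, Lam_x, Theta_eps, Theta_delta)
mean, cov = conditional_moments(np.zeros(2), np.zeros(1), B, Gamma, Psi,
                                Lam_y, Theta_eps, x=np.array([1.0]))
print(Sigma)
print(mean, cov)
```

With these values, $\Sigma_{\eta\eta} = 0.25 + 0.75 = 1$. The upper-left block of `Sigma` is therefore simply $\Lambda_y\Lambda_y' + \Theta_\varepsilon$, which makes the output easy to check by hand.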
Multiple-group analysis with structured means was developed into more general LISREL models in Sörbom (1982), with applications to latent variable ANCOVA [Sörbom (1978)] and the analysis of longitudinal data [Jöreskog and Sörbom (1980)]; see also Jöreskog and Sörbom (1981).

3.2. Categorical variables: Single indicators

Turning to situations with categorical response variables, consider first the single-indicator case. Here we find Case B models. The simplest situation is that of univariate and multivariate regression with categorical response variables. Methodology for this situation is well known to econometricians. An excellent review with econometric applications covering dichotomous, ordered and unordered polytomous response is given in Amemiya (1981). These models originated in biometric work, notably probit/logit regression in bioassay [see e.g. Bliss (1935)]. Probit regression is a special case of the general model of section 2, while logit regression and related log-linear modeling fall outside this model. In the multivariate case, the general model gives the multivariate probit model of Ashford and Sowden (1970). Multivariate logit models are not directly related to this model structure. There is no multivariate logistic distribution with logistic marginal distributions that have unconstrained correlation coefficients [see Gumbel (1961) and also Amemiya (1981, pp. 1525-1531) and Morimune (1979)]. As opposed to multivariate regression, simultaneous equation models generally place a structure on the reduced-form regression coefficients and possibly also on the reduced-form error covariances/correlations. With categorical response variables, such models have recently attracted growing interest in econometrics, but do not seem to have been utilized in biometrics or psychometrics. Some important contributions are Amemiya (1978), Heckman (1974, 1978) and Maddala and Lee (1976).

3.3. Categorical variables: Multiple indicators

We now consider the more complex situation of categorical response variables with multiple indicators of latent variables. Developments here have mainly come from psychometric work. Consider first the measurement part of the general model. Here, the latent response variables underlying the observed response variables are related to the latent variable constructs by a factor-analysis-type measurement model. With dichotomous indicators, probit models have been considered here as well, although the independent continuous variables are now latent. In item response (latent trait) theory language [see, e.g., Lord (1980)], the general model with dichotomous indicators implies the so-called two-parameter normal ogive item characteristic curve model of Lawley (1943, 1944), Lord and Novick (1968) and Bock and Lieberman (1970). For a set of items (dichotomous variables) designed to measure a certain trait (factor), conditional independence is assumed to hold, given the factor. In the general model, the analogous assumption is the diagonality of the measurement error covariance matrix ($\Theta_\varepsilon$ or $\Theta_\delta$). Note, however, that correlated errors can be handled. For related one-, two- and three-parameter logistic item response models, see e.g. Andersen (1980). The general multiple-factor model has been studied by Bock and Aitkin (1981), Christoffersson (1975) and Muthén (1978), both for exploratory ('unrestricted') and confirmatory ('restricted') factor analysis. Muthén and Christoffersson (1981) generalized the model to handle simultaneous multiple-group analysis, where various degrees of invariance over populations can be studied. As in the continuous variable case, modeling of factor mean differences over populations is then of interest; see e.g. Muthén (1981b). The extension of the measurement model to more than two ordered categories by (4), in combination with both (2) and (3), is straightforward and natural.
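The threshold relation (4) turns a normal latent response into ordered category probabilities. The minimal sketch below assumes $z^* \sim N(\mu, \sigma^2)$ and strictly increasing thresholds $\tau_1 < \dots < \tau_{C-1}$, with $\tau_0 = -\infty$ and $\tau_C = +\infty$. Under these assumptions, $P(z = c)$ is the normal probability mass between $\tau_c$ and $\tau_{c+1}$. The function names are illustrative. The dichotomous case with mean $\lambda\eta$ traces out a two-parameter normal ogive item characteristic curve.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def category_probs(mean, sd, thresholds):
    """P(z = c), c = 0..C-1, when z* ~ N(mean, sd**2) and z = c iff
    tau_c < z* <= tau_{c+1}, with tau_0 = -inf and tau_C = +inf."""
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cdf = [0.0] + [STD_NORMAL.cdf((t - mean) / sd) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


def normal_ogive(eta, loading, threshold, resid_sd=1.0):
    """P(z = 1 | eta) for a dichotomous item whose latent response is
    loading * eta plus N(0, resid_sd**2) error."""
    return category_probs(loading * eta, resid_sd, [threshold])[1]


probs = category_probs(0.0, 1.0, [-1.0, 0.5])   # three ordered categories
print([round(p, 4) for p in probs])
print([round(normal_ogive(e, 1.2, 0.3), 3) for e in (-2, -1, 0, 1, 2)])
```

Because only differences of the normal distribution function enter, shifting $\mu$ and all thresholds by the same constant leaves the probabilities unchanged. This is one reason intercepts and thresholds cannot both be freely estimated.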
For special cases, this was first proposed by Edwards and Thurstone (1952), and later studied by e.g. Bock and Jones (1968), Samejima (1969) and Bartholomew (1980). [Note the biometric counterparts of Aitchison and Silvey (1957) and Gurland, Lee and Dahm (1960).] The unordered polytomous case, not covered by the general model above, was studied by Bock (1972). Further contributions are found in Samejima (1972).

The extension to structural equation modeling with categorical response variables as latent variable indicators was first brought forward in Muthén (1976a), and further developed in Muthén (1977, 1979, 1982a). Here, Case B was considered with dichotomous observed variables for each latent response variable. Muthén (1979) considered a multiple-indicator-multiple-cause (MIMIC) model analogous to the MIMIC model discussed in Jöreskog and Goldberger (1975) for continuous response variables. Muthén (1976b) studied a model with reciprocal interaction between two dependent latent variable constructs. The general model of section 2 covers not only Case A and Case B of the general structural equation model but also any combination of dichotomous, ordered categorical, and continuous indicators in the measurement part. Further generalizations of the measurement part are possible. One example is the inclusion of categorical-continuous or limited dependent observed variables [see, e.g., Tobin (1958) and Amemiya (1973, 1982)].

4. Estimation

The general model of section 2 can be estimated in various ways. Two basically different approaches have been attempted for special cases of this model. The first is limited-information (univariate and bivariate) multi-stage weighted least-squares (WLS) estimation. The second is full-information maximum likelihood (ML) estimation.
Limited-information estimation has been motivated by the fact that when categorical response variables are involved, a straightforward application of ML may lead to heavy computations.

4.1. Limited information estimation

Muthén (1981a) proposed a three-stage limited-information WLS estimator. Muthén summarized the structure of the general model in three parts, encompassing both Case A and Case B. The three parts are, respectively, a mean/threshold/reduced-form regression intercept structure, a reduced-form regression slope structure, and a covariance/correlation structure. Any of the three parts may be used alone or together with any of the other parts. The computer program LACCI [Muthén (1982b)] may be used for all computations (LACCI was utilized for the analyses of section 5). The model structure will first be presented in its full generality and then explained through a set of special cases. For each group, deleting the group index, consider the three population vectors $\sigma_1$, $\sigma_2$ and $\sigma_3$:

Part 1 (mean/threshold/reduced-form regression intercept structure):

$$\sigma_1 = \ldots \qquad (14)$$

Part 2 (reduced-form regression slope structure):

$$\sigma_2 = \operatorname{vec}\{\Delta\Lambda_z(I - B_z)^{-1}\Gamma_z\}, \qquad (15)$$

Part 3 (covariance/correlation structure):

$$\sigma_3 = K \operatorname{vec}\{\Delta[\Lambda_z(I - B_z)^{-1}\Psi_z(I - B_z)'^{-1}\Lambda_z' + \Theta_z]\Delta\}. \qquad (16)$$

The arrays in (14) to (16) are defined as follows:

- $\Delta$ is a diagonal matrix of scaling factors, particularly useful in multiple-group analyses with categorical variables.
- $\Delta^*$ contains the same elements as $\Delta$, but diagonal elements are duplicated for categorical variables with more than one threshold (more than two categories).
- $K_1$ and $K_2$ similarly distribute elements from the vectors they pre-multiply.
- The vec operator strings out matrix elements row-wise into a column vector.
- $K$ selects the lower-triangular elements of the symmetric matrix it pre-multiplies. A diagonal element is included only if the corresponding observed variable is continuous.

For Case A, part 2 is not needed.
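Parts 2 and 3 can be evaluated directly once the parameter arrays are given. The sketch below illustrates (15) and (16) under stated assumptions and is not the LACCI implementation. vec is taken row-wise, as the text specifies. The selection matrix $K$ is applied as a loop that walks the lower triangle row by row and keeps a diagonal element only for continuous indicators. The exact element ordering used by Muthén (1981a) is an assumption here, and part 1 is not covered. The function names and numbers are hypothetical.

```python
import numpy as np


def sigma2(Delta, Lam, B, Gamma):
    """Part 2, eq. (15): vec{Delta Lam (I - B)^{-1} Gamma}, vec taken row-wise."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return (Delta @ Lam @ A @ Gamma).reshape(-1)    # C order = row by row


def sigma3(Delta, Lam, B, Psi, Theta, continuous):
    """Part 3, eq. (16): K vec{Delta [Lam (I-B)^{-1} Psi (I-B)'^{-1} Lam' + Theta] Delta}."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    S = Delta @ (Lam @ A @ Psi @ A.T @ Lam.T + Theta) @ Delta
    kept = []
    for i in range(S.shape[0]):
        for j in range(i + 1):                      # lower triangle, row by row
            if i == j and not continuous[i]:
                continue                            # drop diagonal of categorical indicators
            kept.append(S[i, j])
    return np.array(kept)


# Hypothetical one-factor Case B example: indicator 1 continuous, 2 and 3 categorical.
Delta = np.eye(3)
Lam = np.array([[1.0], [0.8], [0.6]])
B = np.zeros((1, 1))
Psi = np.array([[1.0]])
Theta = np.diag([0.5, 0.36, 0.64])   # unit conditional variance for the categorical ones
Gamma = np.array([[0.5, -0.2]])      # two observed x variables

s2 = sigma2(Delta, Lam, B, Gamma)
s3 = sigma3(Delta, Lam, B, Psi, Theta, continuous=[True, False, False])
print(s2)
print(s3)
```

The categorical diagonals are dropped because their variance is not identified from categorical data. Only the correlations involving those indicators carry information.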
We may stack the dependent variables followed by the independent variables into a single vector. Then, for Case A, the arrays of the three-part model structure organize the parameters as

$$\tau_z = \begin{bmatrix}\tau_y \\ \tau_x\end{bmatrix}, \quad \nu_z = \begin{bmatrix}\nu_y \\ \nu_x\end{bmatrix}, \quad \Lambda_z = \begin{bmatrix}\Lambda_y & 0 \\ 0 & \Lambda_x\end{bmatrix}, \quad \Theta_z = \begin{bmatrix}\Theta_\varepsilon & \text{(symmetric)} \\ 0 & \Theta_\delta\end{bmatrix},$$

$$\alpha_z = \begin{bmatrix}\alpha \\ \kappa\end{bmatrix}, \quad B_z = \begin{bmatrix}B & \Gamma \\ 0 & 0\end{bmatrix}, \quad \Psi_z = \begin{bmatrix}\Psi & \text{(symmetric)} \\ 0 & \Phi\end{bmatrix},$$

and $\Gamma_z$ has no counterpart. For Case B, $\tau_z = \tau_y$, $\nu_z = \nu_y$, $\Lambda_z = \Lambda_y$, $\Theta_z = \Theta_\varepsilon$, $\alpha_z = \alpha$, $B_z = B$, $\Gamma_z = \Gamma$, $\Psi_z = \Psi$.

With the normality specification on the latent response variables, any model that fits in the general framework is identified if and only if its parameters are identified in terms of $\sigma^{(1)}, \sigma^{(2)}, \ldots, \sigma^{(G)}$, where $\sigma^{(g)\prime} = (\sigma_1^{(g)\prime}, \sigma_2^{(g)\prime}, \sigma_3^{(g)\prime})$. Muthén (1981a) used this fact by producing statistics $s^{(g)}$ as consistent estimators of $\sigma^{(g)}$ and then estimating the model parameters from them in a final estimation stage. Preceding estimation stages give $s^{(g)}$, where only limited information from bivariate sample distributions is needed. In the final estimation stage, a WLS fitting function with a general, full weight matrix is used,

$$F = \sum_{g=1}^{G} \left(s^{(g)} - \sigma^{(g)}\right)' \, W^{(g)-1} \left(s^{(g)} - \sigma^{(g)}\right), \qquad (17)$$

where the (limited-information) generalized least-squares (GLS) estimator is obtained when $W^{(g)}$ is a consistent estimator of the asymptotic covariance matrix of $s^{(g)}$. For the estimator based on the minimization of (17), there is no requirement that the $s_3^{(g)}$ elements form a positive definite matrix. In large samples, however, the absence of this would indicate a mis-specified model. With GLS, F calculated at the minimum provides a large-sample chi-square test of model fit to the first- and second-order statistics. Large-sample standard errors of estimates are also readily available.

With continuous indicators only, the model structure in the single-group case can usually be encompassed by the covariance matrix structure alone, i.e., part 3 of Muthén's three-part structure. With $\Delta = I$, part 3 includes the LISREL model structure.
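The fitting function (17) is a sum of quadratic forms, one per group. The sketch below assumes each $W^{(g)}$ is supplied as an estimated asymptotic covariance matrix of $s^{(g)}$, so the inverse in (17) is applied through a linear solve. The one-parameter model and all numbers are hypothetical, and a crude grid search stands in for a real optimizer.

```python
import numpy as np


def wls_fit(samples, models, weights):
    """F in (17): sum over groups of (s - sigma)' W^{-1} (s - sigma)."""
    if not (len(samples) == len(models) == len(weights)):
        raise ValueError("need one s, sigma and W per group")
    F = 0.0
    for s, sigma, W in zip(samples, models, weights):
        r = np.asarray(s, dtype=float) - np.asarray(sigma, dtype=float)
        F += float(r @ np.linalg.solve(W, r))   # r' W^{-1} r without forming W^{-1}
    return F


# Hypothetical single-group example: two correlations constrained equal to theta.
s = [np.array([0.5, 0.3])]
W = [np.diag([0.01, 0.04])]

grid = np.linspace(0.0, 1.0, 1001)
values = [wls_fit(s, [np.array([t, t])], W) for t in grid]
theta_hat = grid[int(np.argmin(values))]
print(theta_hat, min(values))
```

For this linear toy model, the minimizer is the precision-weighted mean (0.5/0.01 + 0.3/0.04) / (1/0.01 + 1/0.04) = 0.46. The grid search recovers it, and the precision weighting pulls the estimate toward the more precisely estimated statistic.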
In a multiple-group analysis, the model usually also implies a structure on the observed variable means, so that both part 1 and part 3 would be used. Part 1 in this case simplifies to $\nu_z + \Lambda_z(I - B_z)^{-1}\alpha_z$. The bivariate sample statistics vectors $s_1^{(g)}$ and $s_3^{(g)}$ have elements from the sample mean vector and the ordinary sample covariance matrix $S^{(g)}$. Part 2 is not needed here. Jöreskog (1973, 1977) considered the full informati