首页 A_global_geometric_framework_for_nonlinear_dim_reduction

A_global_geometric_framework_for_nonlinear_dim_reduction

举报
开通vip

A_global_geometric_framework_for_nonlinear_dim_reduction 23; right 36, 13, and 27); superior frontal gyrus (left 29, 31, and 45; right 17, 35, and 37). 17. Although the improvement in WM performance with cholinergic enhancement was a nonsigniÞcant trend in the current study (P 5 0.07), in a previous study (9) with...

A_global_geometric_framework_for_nonlinear_dim_reduction
23; right 36, 13, and 27); superior frontal gyrus (left 29, 31, and 45; right 17, 35, and 37). 17. Although the improvement in WM performance with cholinergic enhancement was a nonsigniÞcant trend in the current study (P 5 0.07), in a previous study (9) with a larger sample (n 5 13) the effect was highly signiÞcant (P , 0.001). In the current study, we analyzed RT data for six of our seven subjects because the behavioral data for one subject were unavailable due to a computer failure. The difference in the signiÞcance of the two Þndings is simply a result of the difference in sample sizes. A power analysis shows that the size of the RT difference and variability in the current sample would yield a signif- icant result (P 5 0.01) with a sample size of 13. During the memory trials, mean RT was 1180 ms during placebo and 1119 ms during physostigmine. During the control trials, mean RT was 735 ms during placebo and 709 ms during physostigmine, a differ- ence that did not approach signiÞcance (P 5 0.24), suggesting that the effect of cholinergic enhance- ment on WM performance is not due to a nonspeciÞc increase in arousal. 18. Matched-pair t tests (two-tailed) were used to test the signiÞcance of drug-related changes in the vol- ume of regions of interest that showed signiÞcant response contrasts. 19. H. Sato, Y. Hata, H. Masui, T. Tsumoto, J. Neuro- physiol. 55, 765 (1987). 20. M. E. Hasselmo, Behav. Brain Res. 67, 1 (1995). 21. M. G. Baxter, A. A. Chiba, Curr. Opin. Neurobiol. 9, 178 (1999). 22. B. J. Everitt, T. W. Robbins, Annu. Rev. Psychol. 48, 649 (1997). 23. R. Desimone, J. Duncan, Annu. Rev. Neurosci. 18, 193 (1995). 24. P. C. Murphy, A. M. Sillito, Neuroscience 40, 13 (1991). 25. M. Corbetta, F. M. Miezin, S. Dobmeyer, G. L. Shul- man, S. E. Peterson, J. Neurosci. 11, 2383 (1991). 26. J. V. Haxby et al., J. Neurosci. 14, 6336 (1994). 27. A. Rosier, L. Cornette, G. A. Orban, Neuropsychobiol- ogy 37, 98 (1998). 28. M. E. Hasselmo, B. P. Wyble, G. V. Wallenstein, Hip- pocampus 6, 693 (1996). 29. S. P. Mewaldt, M. M. Ghoneim, Pharmacol. Biochem. Behav. 10, 1205 (1979). 30. M. Petrides, Philos. Trans. R. Soc. London Ser. B 351, 1455 (1996). 31. M. E. Hasselmo, E. Fransen, C. Dickson, A. A. Alonso, Ann. N.Y. Acad. Sci. 911, 418 (2000). 32. M. M. Mesulam, Prog. Brain Res. 109, 285 (1996). 33. R. T. Bartus, R. L. Dean III, B. Beer, A. S. Lippa, Science 217, 408 (1985). 34. N. Qizilbash et al., JAMA 280, 1777 (1998). 35. J. V. Haxby, J. Ma. Maisog, S. M. Courtney, in Mapping and Modeling the Human Brain, P. Fox, J. Lancaster, K. Friston, Eds. ( Wiley, New York, in press). 36. We express our appreciation to S. Courtney, R. Desi- mone, Y. Jiang, S. Kastner, L. Latour, A. Martin, L. Pessoa, and L. Ungerleider for careful and critical review of the manuscript. We also thank M. B. Scha- piro and S. I. Rapoport for input during early stages of this project. This research was supported by the National Institute on Mental Health and National Institute on Aging Intramural Research Programs. 7 August 2000; accepted 15 November 2000 A Global Geometric Framework for Nonlinear Dimensionality Reduction Joshua B. Tenenbaum,1* Vin de Silva,2 John C. Langford3 Scientists working with large volumes of high-dimensional data, such as global climate patterns, stellar spectra, or human gene distributions, regularly con- front the problem of dimensionality reduction: finding meaningful low-dimen- sional structures hidden in their high-dimensional observations. The human brain confronts the same problem in everyday perception, extracting from its high-dimensional sensory inputs—30,000 auditory nerve fibers or 106 optic nerve fibers—a manageably small number of perceptually relevant features. Here we describe an approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set. Unlike classical techniques such as principal component analysis (PCA) and multidimensional scaling (MDS), our approach is capable of discovering the nonlinear degrees of freedom that underlie com- plex natural observations, such as human handwriting or images of a face under different viewing conditions. In contrast to previous algorithms for nonlinear dimensionality reduction, ours efficiently computes a globally optimal solution, and, for an important class of data manifolds, is guaranteed to converge asymptotically to the true structure. A canonical problem in dimensionality re- duction from the domain of visual perception is illustrated in Fig. 1A. The input consists of many images of a person’s face observed under different pose and lighting conditions, in no particular order. These images can be thought of as points in a high-dimensional vector space, with each input dimension cor- responding to the brightness of one pixel in the image or the firing rate of one retinal ganglion cell. Although the input dimension- ality may be quite high (e.g., 4096 for these 64 pixel by 64 pixel images), the perceptually meaningful structure of these images has many fewer independent degrees of freedom. Within the 4096-dimensional input space, all of the images lie on an intrinsically three- dimensional manifold, or constraint surface, that can be parameterized by two pose vari- ables plus an azimuthal lighting angle. Our goal is to discover, given only the unordered high-dimensional inputs, low-dimensional representations such as Fig. 1A with coordi- nates that capture the intrinsic degrees of freedom of a data set. This problem is of central importance not only in studies of vi- sion (1–5), but also in speech (6, 7), motor control (8, 9), and a range of other physical and biological sciences (10–12). The classical techniques for dimensional- ity reduction, PCA and MDS, are simple to implement, efficiently computable, and guar- anteed to discover the true structure of data lying on or near a linear subspace of the high-dimensional input space (13). PCA finds a low-dimensional embedding of the data points that best preserves their variance as measured in the high-dimensional input space. Classical MDS finds an embedding that preserves the interpoint distances, equiv- alent to PCA when those distances are Eu- clidean. However, many data sets contain essential nonlinear structures that are invisi- ble to PCA and MDS (4, 5, 11, 14). For example, both methods fail to detect the true degrees of freedom of the face data set (Fig. 1A), or even its intrinsic three-dimensionality (Fig. 2A). Here we describe an approach that com- bines the major algorithmic features of PCA and MDS—computational efficiency, global optimality, and asymptotic convergence guar- antees—with the flexibility to learn a broad class of nonlinear manifolds. Figure 3A illus- trates the challenge of nonlinearity with data lying on a two-dimensional “Swiss roll”: points far apart on the underlying manifold, as mea- sured by their geodesic, or shortest path, dis- tances, may appear deceptively close in the high-dimensional input space, as measured by their straight-line Euclidean distance. Only the geodesic distances reflect the true low-dimen- sional geometry of the manifold, but PCA and MDS effectively see just the Euclidean struc- ture; thus, they fail to detect the intrinsic two- dimensionality (Fig. 2B). Our approach builds on classical MDS but seeks to preserve the intrinsic geometry of the data, as captured in the geodesic manifold distances between all pairs of data points. The crux is estimating the geodesic distance be- tween faraway points, given only input-space distances. For neighboring points, input- space distance provides a good approxima- 1Department of Psychology and 2Department of Mathematics, Stanford University, Stanford, CA 94305, USA. 3Department of Computer Science, Car- negie Mellon University, Pittsburgh, PA 15217, USA. *To whom correspondence should be addressed. E- mail: jbt@psych.stanford.edu R E P O R T S www.sciencemag.org SCIENCE VOL 290 22 DECEMBER 2000 2319 tion to geodesic distance. For faraway points, geodesic distance can be approximated by adding up a sequence of “short hops” be- tween neighboring points. These approxima- tions are computed efficiently by finding shortest paths in a graph with edges connect- ing neighboring data points. The complete isometric feature mapping, or Isomap, algorithm has three steps, which are detailed in Table 1. The first step deter- mines which points are neighbors on the manifold M, based on the distances dX (i, j) between pairs of points i, j in the input space X. Two simple methods are to connect each point to all points within some fixed radius e, or to all of its K nearest neighbors (15). These neighborhood relations are represented as a weighted graph G over the data points, with edges of weight dX(i, j) between neighboring points (Fig. 3B). In its second step, Isomap estimates the geodesic distances dM (i, j) between all pairs of points on the manifold M by computing their shortest path distances dG(i, j) in the graph G. One simple algorithm (16) for find- ing shortest paths is given in Table 1. The final step applies classical MDS to the matrix of graph distances DG 5 {dG(i, j)}, constructing an embedding of the data in a d-dimensional Euclidean space Y that best preserves the manifold’s estimated intrinsic geometry (Fig. 3C). The coordinate vectors yi for points in Y are chosen to minimize the cost function E 5 \t~DG! 2 t~DY!\L2 (1) where DY denotes the matrix of Euclidean distances {dY(i, j) 5 \yi 2 yj\} and \A\L2 the L2 matrix norm =Si, j Ai j 2 . The t operator Fig. 1. (A) A canonical dimensionality reduction problem from visual perception. The input consists of a sequence of 4096-dimensional vectors, rep- resenting the brightness values of 64 pixel by 64 pixel images of a face rendered with different poses and lighting directions. Applied to N 5 698 raw images, Isomap (K 5 6) learns a three-dimen- sional embedding of the dataÕs intrinsic geometric structure. A two-dimensional projection is shown, with a sample of the original input images (red circles) superimposed on all the data points (blue) and horizontal sliders (under the images) repre- senting the third dimension. Each coordinate axis of the embedding correlates highly with one de- gree of freedom underlying the original data: left- right pose (x axis, R 5 0.99), up-down pose ( y axis, R 5 0.90), and lighting direction (slider posi- tion, R 5 0.92). The input-space distances dX(i, j ) given to Isomap were Euclidean distances be- tween the 4096-dimensional image vectors. (B) Isomap applied to N 5 1000 handwritten Ò2Ós from the MNIST database (40). The two most signiÞcant dimensions in the Isomap embedding, shown here, articulate the major features of the Ò2Ó: bottom loop (x axis) and top arch ( y axis). Input-space distances dX(i, j ) were measured by tangent distance, a metric designed to capture the invariances relevant in handwriting recognition (41). Here we used e-Isomap (with e 5 4.2) be- cause we did not expect a constant dimensionality to hold over the whole data set; consistent with this, Isomap Þnds several tendrils projecting from the higher dimensional mass of data and repre- senting successive exaggerations of an extra stroke or ornament in the digit. R E P O R T S 22 DECEMBER 2000 VOL 290 SCIENCE www.sciencemag.org2320 converts distances to inner products (17), which uniquely characterize the geometry of the data in a form that supports efficient optimization. The global minimum of Eq. 1 is achieved by setting the coordinates yi to the top d eigenvectors of the matrix t(DG) (13). As with PCA or MDS, the true dimen- sionality of the data can be estimated from the decrease in error as the dimensionality of Y is increased. For the Swiss roll, where classical methods fail, the residual variance of Isomap correctly bottoms out at d 5 2 (Fig. 2B). Just as PCA and MDS are guaranteed, given sufficient data, to recover the true structure of linear manifolds, Isomap is guar- anteed asymptotically to recover the true di- mensionality and geometric structure of a strictly larger class of nonlinear manifolds. Like the Swiss roll, these are manifolds whose intrinsic geometry is that of a convex region of Euclidean space, but whose ambi- ent geometry in the high-dimensional input space may be highly folded, twisted, or curved. For non-Euclidean manifolds, such as a hemisphere or the surface of a doughnut, Isomap still produces a globally optimal low- dimensional Euclidean representation, as measured by Eq. 1. These guarantees of asymptotic conver- gence rest on a proof that as the number of data points increases, the graph distances dG(i, j) provide increasingly better approxi- mations to the intrinsic geodesic distances dM(i, j), becoming arbitrarily accurate in the limit of infinite data (18, 19). How quickly dG(i, j) converges to dM(i, j) depends on cer- tain parameters of the manifold as it lies within the high-dimensional space (radius of curvature and branch separation) and on the density of points. To the extent that a data set presents extreme values of these parameters or deviates from a uniform density, asymp- totic convergence still holds in general, but the sample size required to estimate geodes- ic distance accurately may be impractically large. Isomap’s global coordinates provide a simple way to analyze and manipulate high- dimensional observations in terms of their intrinsic nonlinear degrees of freedom. For a set of synthetic face images, known to have three degrees of freedom, Isomap correctly detects the dimensionality (Fig. 2A) and sep- arates out the true underlying factors (Fig. 1A). The algorithm also recovers the known low-dimensional structure of a set of noisy real images, generated by a human hand vary- ing in finger extension and wrist rotation (Fig. 2C) (20). Given a more complex data set of handwritten digits, which does not have a clear manifold geometry, Isomap still finds globally meaningful coordinates (Fig. 1B) and nonlinear structure that PCA or MDS do not detect (Fig. 2D). For all three data sets, the natural appearance of linear interpolations between distant points in the low-dimension- al coordinate space confirms that Isomap has captured the data’s perceptually relevant structure (Fig. 4). Previous attempts to extend PCA and MDS to nonlinear data sets fall into two broad classes, each of which suffers from limitations overcome by our approach. Local linear techniques (21–23) are not designed to represent the global structure of a data set within a single coordinate system, as we do in Fig. 1. Nonlinear techniques based on greedy optimization procedures (24–30) attempt to discover global structure, but lack the crucial algorithmic features that Isomap inherits from PCA and MDS: a noniterative, polyno- mial time procedure with a guarantee of glob- al optimality; for intrinsically Euclidean man- Fig. 2. The residual variance of PCA (open triangles), MDS [open triangles in (A) through (C); open circles in (D)], and Isomap (Þlled cir- cles) on four data sets (42). (A) Face images varying in pose and il- lumination (Fig. 1A). (B) Swiss roll data (Fig. 3). (C) Hand images varying in Þnger exten- sion and wrist rotation (20). (D) Handwritten Ò2Ós (Fig. 1B). In all cas- es, residual variance de- creases as the dimen- sionality d is increased. The intrinsic dimen- sionality of the data can be estimated by looking for the ÒelbowÓ at which this curve ceases to decrease signiÞcantly with added dimensions. Arrows mark the true or approximate dimensionality, when known. Note the tendency of PCA and MDS to overestimate the dimensionality, in contrast to Isomap. Fig. 3. The ÒSwiss rollÓ data set, illustrating how Isomap exploits geodesic paths for nonlinear dimensionality reduction. (A) For two arbitrary points (circled) on a nonlinear manifold, their Euclidean distance in the high- dimensional input space (length of dashed line) may not accurately reßect their intrinsic similarity, as measured by geodesic distance along the low-dimensional manifold (length of solid curve). (B) The neighbor- hood graph G constructed in step one of Isomap (with K 5 7 and N 5 1000 data points) allows an approximation (red segments) to the true geodesic path to be computed efÞciently in step two, as the shortest path in G. (C) The two-dimensional embedding recovered by Isomap in step three, which best preserves the shortest path distances in the neighborhood graph (overlaid). Straight lines in the embedding (blue) now represent simpler and cleaner approximations to the true geodesic paths than do the corresponding graph paths (red). R E P O R T S www.sciencemag.org SCIENCE VOL 290 22 DECEMBER 2000 2321 ifolds, a guarantee of asymptotic conver- gence to the true structure; and the ability to discover manifolds of arbitrary dimensional- ity, rather than requiring a fixed d initialized from the beginning or computational resourc- es that increase exponentially in d. Here we have demonstrated Isomap’s per- formance on data sets chosen for their visu- ally compelling structures, but the technique may be applied wherever nonlinear geometry complicates the use of PCA or MDS. Isomap complements, and may be combined with, linear extensions of PCA based on higher order statistics, such as independent compo- nent analysis (31, 32). It may also lead to a better understanding of how the brain comes to represent the dynamic appearance of ob- jects, where psychophysical studies of appar- ent motion (33, 34) suggest a central role for geodesic transformations on nonlinear mani- folds (35) much like those studied here. References and Notes 1. M. P. Young, S. Yamane, Science 256, 1327 (1992). 2. R. N. Shepard, Science 210, 390 (1980). 3. M. Turk, A. Pentland, J. Cogn. Neurosci. 3, 71 (1991). 4. H. Murase, S. K. Nayar, Int. J. Comp. Vision 14, 5 (1995). 5. J. W. McClurkin, L. M. Optican, B. J. Richmond, T. J. Gawne, Science 253, 675 (1991). 6. J. L. Elman, D. Zipser, J. Acoust. Soc. Am. 83, 1615 (1988). 7. W. Klein, R. Plomp, L. C. W. Pols, J. Acoust. Soc. Am. 48, 999 (1970). 8. E. Bizzi, F. A. Mussa-Ivaldi, S. Giszter, Science 253, 287 (1991). 9. T. D. Sanger, Adv. Neural Info. Proc. Syst. 7, 1023 (1995). 10. J. W. Hurrell, Science 269, 676 (1995). 11. C. A. L. Bailer-Jones, M. Irwin, T. von Hippel, Mon. Not. R. Astron. Soc. 298, 361 (1997). 12. P. Menozzi, A. Piazza, L. Cavalli-Sforza, Science 201, 786 (1978). 13. K. V. Mardia, J. T. Kent, J. M. Bibby, Multivariate Analysis, (Academic Press, London, 1979). 14. A. H. Monahan, J. Clim., in press. 15. The scale-invariant K parameter is typically easier to set than e, but may yield misleading results when the local dimensionality varies across the data set. When available, additional constraints such as the temporal ordering of observations may also help to determine neighbors. In earlier work (36) we explored a more complex method (37), which required an order of magnitude more data and did not support the theo- retical performance guarantees we provide here for e- and K-Isomap. 16. This procedure, known as FloydÕs algorithm, requires O(N3) operations. More efÞcient algorithms exploit- ing the sparse structure of the neighborhood graph can be found in (38). 17. The operator t is deÞned by t(D) 5 2HSH/2, where S is the matrix of squared distances {Sij 5 Di j 2}, and H is the Òcentering matrixÓ {Hij 5 dij 2 1/N} (13). 18. Our proof works by showing that for a sufÞciently high density (a) of data points, we can always choose a neighborhood size (e or K) large enough that the graph will (with high probability) have a path not much longer than the true geodesic, but small enough to prevent edges that Òshort circuitÓ the true geometry of the manifold. More precisely, given ar- bitrarily small values of l1, l2, and m, we can guar- antee that with probability at least 1 2 m, estimat
本文档为【A_global_geometric_framework_for_nonlinear_dim_reduction】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑, 图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。
下载需要: 免费 已有0 人下载
最新资料
资料动态
专题动态
is_733847
暂无简介~
格式:pdf
大小:627KB
软件:PDF阅读器
页数:5
分类:互联网
上传时间:2011-10-22
浏览量:15