23; right 36, 13, and 27); superior frontal gyrus (left
29, 31, and 45; right 17, 35, and 37).
17. Although the improvement in WM performance with cholinergic enhancement was a nonsignificant trend in the current study (P = 0.07), in a previous study (9) with a larger sample (n = 13) the effect was highly significant (P < 0.001). In the current study, we analyzed RT data for six of our seven subjects because the behavioral data for one subject were unavailable due to a computer failure. The difference in the significance of the two findings is simply a result of the difference in sample sizes. A power analysis shows that the size of the RT difference and variability in the current sample would yield a significant result (P = 0.01) with a sample size of 13. During the memory trials, mean RT was 1180 ms during placebo and 1119 ms during physostigmine. During the control trials, mean RT was 735 ms during placebo and 709 ms during physostigmine, a difference that did not approach significance (P = 0.24), suggesting that the effect of cholinergic enhancement on WM performance is not due to a nonspecific increase in arousal.
18. Matched-pair t tests (two-tailed) were used to test the significance of drug-related changes in the volume of regions of interest that showed significant response contrasts.
19. H. Sato, Y. Hata, H. Masui, T. Tsumoto, J. Neurophysiol. 55, 765 (1987).
20. M. E. Hasselmo, Behav. Brain Res. 67, 1 (1995).
21. M. G. Baxter, A. A. Chiba, Curr. Opin. Neurobiol. 9, 178 (1999).
22. B. J. Everitt, T. W. Robbins, Annu. Rev. Psychol. 48, 649 (1997).
23. R. Desimone, J. Duncan, Annu. Rev. Neurosci. 18, 193 (1995).
24. P. C. Murphy, A. M. Sillito, Neuroscience 40, 13 (1991).
25. M. Corbetta, F. M. Miezin, S. Dobmeyer, G. L. Shulman, S. E. Peterson, J. Neurosci. 11, 2383 (1991).
26. J. V. Haxby et al., J. Neurosci. 14, 6336 (1994).
27. A. Rosier, L. Cornette, G. A. Orban, Neuropsychobiology 37, 98 (1998).
28. M. E. Hasselmo, B. P. Wyble, G. V. Wallenstein, Hippocampus 6, 693 (1996).
29. S. P. Mewaldt, M. M. Ghoneim, Pharmacol. Biochem. Behav. 10, 1205 (1979).
30. M. Petrides, Philos. Trans. R. Soc. London Ser. B 351, 1455 (1996).
31. M. E. Hasselmo, E. Fransen, C. Dickson, A. A. Alonso, Ann. N.Y. Acad. Sci. 911, 418 (2000).
32. M. M. Mesulam, Prog. Brain Res. 109, 285 (1996).
33. R. T. Bartus, R. L. Dean III, B. Beer, A. S. Lippa, Science 217, 408 (1982).
34. N. Qizilbash et al., JAMA 280, 1777 (1998).
35. J. V. Haxby, J. Ma. Maisog, S. M. Courtney, in Mapping and Modeling the Human Brain, P. Fox, J. Lancaster, K. Friston, Eds. (Wiley, New York, in press).
36. We express our appreciation to S. Courtney, R. Desimone, Y. Jiang, S. Kastner, L. Latour, A. Martin, L. Pessoa, and L. Ungerleider for careful and critical review of the manuscript. We also thank M. B. Schapiro and S. I. Rapoport for input during early stages of this project. This research was supported by the National Institute of Mental Health and National Institute on Aging Intramural Research Programs.
7 August 2000; accepted 15 November 2000
A Global Geometric Framework
for Nonlinear Dimensionality
Reduction
Joshua B. Tenenbaum,1* Vin de Silva,2 John C. Langford3
Scientists working with large volumes of high-dimensional data, such as global
climate patterns, stellar spectra, or human gene distributions, regularly con-
front the problem of dimensionality reduction: finding meaningful low-dimen-
sional structures hidden in their high-dimensional observations. The human
brain confronts the same problem in everyday perception, extracting from its
high-dimensional sensory inputs—30,000 auditory nerve fibers or 10⁶ optic
nerve fibers—a manageably small number of perceptually relevant features.
Here we describe an approach to solving dimensionality reduction problems
that uses easily measured local metric information to learn the underlying
global geometry of a data set. Unlike classical techniques such as principal
component analysis (PCA) and multidimensional scaling (MDS), our approach
is capable of discovering the nonlinear degrees of freedom that underlie com-
plex natural observations, such as human handwriting or images of a face under
different viewing conditions. In contrast to previous algorithms for nonlinear
dimensionality reduction, ours efficiently computes a globally optimal solution,
and, for an important class of data manifolds, is guaranteed to converge
asymptotically to the true structure.
A canonical problem in dimensionality re-
duction from the domain of visual perception
is illustrated in Fig. 1A. The input consists of
many images of a person’s face observed
under different pose and lighting conditions,
in no particular order. These images can be
thought of as points in a high-dimensional
vector space, with each input dimension cor-
responding to the brightness of one pixel in
the image or the firing rate of one retinal
ganglion cell. Although the input dimension-
ality may be quite high (e.g., 4096 for these
64 pixel by 64 pixel images), the perceptually
meaningful structure of these images has
many fewer independent degrees of freedom.
Within the 4096-dimensional input space, all
of the images lie on an intrinsically three-
dimensional manifold, or constraint surface,
that can be parameterized by two pose vari-
ables plus an azimuthal lighting angle. Our
goal is to discover, given only the unordered
high-dimensional inputs, low-dimensional
representations such as Fig. 1A with coordi-
nates that capture the intrinsic degrees of
freedom of a data set. This problem is of
central importance not only in studies of vi-
sion (1–5), but also in speech (6, 7), motor
control (8, 9), and a range of other physical
and biological sciences (10–12).
The classical techniques for dimensional-
ity reduction, PCA and MDS, are simple to
implement, efficiently computable, and guar-
anteed to discover the true structure of data
lying on or near a linear subspace of the
high-dimensional input space (13). PCA
finds a low-dimensional embedding of the
data points that best preserves their variance
as measured in the high-dimensional input
space. Classical MDS finds an embedding
that preserves the interpoint distances, equiv-
alent to PCA when those distances are Eu-
clidean. However, many data sets contain
essential nonlinear structures that are invisi-
ble to PCA and MDS (4, 5, 11, 14). For
example, both methods fail to detect the true
degrees of freedom of the face data set (Fig.
1A), or even its intrinsic three-dimensionality
(Fig. 2A).
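Both classical techniques come down to an eigenproblem, which is what makes them simple and efficiently computable. As a concrete illustration, here is a minimal sketch in Python; the function names and the SVD route are our own choices, not code from this paper:

```python
import numpy as np

def pca_embed(X, d):
    """PCA: project N x D data onto the top-d directions of maximal variance."""
    Xc = X - X.mean(axis=0)                            # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal directions
    return Xc @ Vt[:d].T

def mds_embed(D, d):
    """Classical MDS: embed points given an N x N matrix of pairwise distances."""
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N                # centering matrix
    B = -H @ (D ** 2) @ H / 2                          # distances -> inner products
    vals, vecs = np.linalg.eigh(B)                     # eigenvalues, ascending
    idx = np.argsort(vals)[::-1][:d]                   # keep the top-d eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# With Euclidean distances, mds_embed agrees with pca_embed up to a rotation,
# which is the equivalence noted in the text.
```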
Here we describe an approach that com-
bines the major algorithmic features of PCA
and MDS—computational efficiency, global
optimality, and asymptotic convergence guar-
antees—with the flexibility to learn a broad
class of nonlinear manifolds. Figure 3A illus-
trates the challenge of nonlinearity with data
lying on a two-dimensional “Swiss roll”: points
far apart on the underlying manifold, as mea-
sured by their geodesic, or shortest path, dis-
tances, may appear deceptively close in the
high-dimensional input space, as measured by
their straight-line Euclidean distance. Only the
geodesic distances reflect the true low-dimen-
sional geometry of the manifold, but PCA and
MDS effectively see just the Euclidean struc-
ture; thus, they fail to detect the intrinsic two-
dimensionality (Fig. 2B).
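The generating equations for the Swiss roll are not given in the text; the following is a hypothetical but standard parameterization that reproduces the effect described above, with points on adjacent windings close in straight-line Euclidean distance yet far apart along the manifold:

```python
import numpy as np

def swiss_roll(N=1000, seed=0):
    """Sample N points from a 2D sheet rolled up in 3D (assumed parameterization)."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1 + 2 * rng.random(N))   # position along the rolled direction
    h = 21.0 * rng.random(N)                    # position along the roll's axis
    X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])  # 3D ambient coordinates
    return X, np.column_stack([t, h])           # inputs and true 2D coordinates
```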
Our approach builds on classical MDS but
seeks to preserve the intrinsic geometry of the
data, as captured in the geodesic manifold
distances between all pairs of data points. The
crux is estimating the geodesic distance be-
tween faraway points, given only input-space
distances. For neighboring points, input-
space distance provides a good approximation to geodesic distance.
1Department of Psychology and 2Department of Mathematics, Stanford University, Stanford, CA 94305, USA. 3Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15217, USA.
*To whom correspondence should be addressed. E-mail: jbt@psych.stanford.edu
For faraway points,
geodesic distance can be approximated by
adding up a sequence of “short hops” be-
tween neighboring points. These approxima-
tions are computed efficiently by finding
shortest paths in a graph with edges connect-
ing neighboring data points.
The complete isometric feature mapping, or Isomap, algorithm has three steps, which are detailed in Table 1. The first step determines which points are neighbors on the manifold M, based on the distances dX(i, j) between pairs of points i, j in the input space X. Two simple methods are to connect each point to all points within some fixed radius ε, or to all of its K nearest neighbors (15). These neighborhood relations are represented as a weighted graph G over the data points, with edges of weight dX(i, j) between neighboring points (Fig. 3B).
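A minimal sketch of this first step (our own variable names; pairwise distances computed by brute force for clarity):

```python
import numpy as np

def neighborhood_graph(X, K=None, eps=None):
    """Step one: weighted graph over the data; non-neighbors get weight inf."""
    dX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # all dX(i, j)
    N = len(X)
    G = np.full((N, N), np.inf)
    if K is not None:                            # K-Isomap
        nn = np.argsort(dX, axis=1)[:, 1:K + 1]  # skip column 0, the point itself
        rows = np.repeat(np.arange(N), K)
        G[rows, nn.ravel()] = dX[rows, nn.ravel()]
    else:                                        # epsilon-Isomap: fixed radius
        G[dX <= eps] = dX[dX <= eps]
    G = np.minimum(G, G.T)                       # symmetrize the neighbor relation
    np.fill_diagonal(G, 0.0)
    return G
```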
In its second step, Isomap estimates the geodesic distances dM(i, j) between all pairs of points on the manifold M by computing their shortest path distances dG(i, j) in the graph G. One simple algorithm (16) for finding shortest paths is given in Table 1.
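Floyd's algorithm, the simple O(N³) procedure cited in note 16, can be sketched in a few lines under the same conventions (inf marks a missing edge):

```python
import numpy as np

def graph_distances(G):
    """Step two: shortest-path distances dG(i, j) in the neighborhood graph."""
    D = G.copy()
    for k in range(len(D)):
        # Allow paths that pass through intermediate point k
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D                                     # approximates geodesic dM(i, j)
```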
The final step applies classical MDS to the matrix of graph distances DG = {dG(i, j)}, constructing an embedding of the data in a d-dimensional Euclidean space Y that best preserves the manifold's estimated intrinsic geometry (Fig. 3C). The coordinate vectors yi for points in Y are chosen to minimize the cost function

E = \lVert \tau(D_G) - \tau(D_Y) \rVert_{L^2} \qquad (1)

where DY denotes the matrix of Euclidean distances {dY(i, j) = ‖yi − yj‖} and ‖A‖L2 the L² matrix norm √(Σi,j Aij²).
Fig. 1. (A) A canonical dimensionality reduction problem from visual perception. The input consists of a sequence of 4096-dimensional vectors, representing the brightness values of 64 pixel by 64 pixel images of a face rendered with different poses and lighting directions. Applied to N = 698 raw images, Isomap (K = 6) learns a three-dimensional embedding of the data's intrinsic geometric structure. A two-dimensional projection is shown, with a sample of the original input images (red circles) superimposed on all the data points (blue) and horizontal sliders (under the images) representing the third dimension. Each coordinate axis of the embedding correlates highly with one degree of freedom underlying the original data: left-right pose (x axis, R = 0.99), up-down pose (y axis, R = 0.90), and lighting direction (slider position, R = 0.92). The input-space distances dX(i, j) given to Isomap were Euclidean distances between the 4096-dimensional image vectors. (B) Isomap applied to N = 1000 handwritten “2”s from the MNIST database (40). The two most significant dimensions in the Isomap embedding, shown here, articulate the major features of the “2”: bottom loop (x axis) and top arch (y axis). Input-space distances dX(i, j) were measured by tangent distance, a metric designed to capture the invariances relevant in handwriting recognition (41). Here we used ε-Isomap (with ε = 4.2) because we did not expect a constant dimensionality to hold over the whole data set; consistent with this, Isomap finds several tendrils projecting from the higher dimensional mass of data and representing successive exaggerations of an extra stroke or ornament in the digit.
The τ operator converts distances to inner products (17), which uniquely characterize the geometry of the data in a form that supports efficient optimization. The global minimum of Eq. 1 is achieved by setting the coordinates yi to the top d eigenvectors of the matrix τ(DG) (13).
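Putting this into code: a sketch of the final step, applying the τ operator of note 17 to DG and scaling each of the top-d eigenvectors by the square root of its eigenvalue, as in classical MDS:

```python
import numpy as np

def isomap_embed(DG, d):
    """Step three: classical MDS on the matrix of graph distances DG."""
    N = DG.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N          # centering matrix (note 17)
    T = -H @ (DG ** 2) @ H / 2                   # tau: distances -> inner products
    vals, vecs = np.linalg.eigh(T)
    idx = np.argsort(vals)[::-1][:d]             # top-d eigenpairs minimize Eq. 1
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))  # rows are the y_i
```

Chaining the three sketches gives the full pipeline, e.g., Y = isomap_embed(graph_distances(neighborhood_graph(X, K=7)), d=2) for the Swiss roll of Fig. 3.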
As with PCA or MDS, the true dimensionality of the data can be estimated from the decrease in error as the dimensionality of Y is increased. For the Swiss roll, where classical methods fail, the residual variance of Isomap correctly bottoms out at d = 2 (Fig. 2B).
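The residual variance is defined in note 42, which falls outside this excerpt; a common reading, assumed here, is one minus the squared correlation between the graph distances and the distances in the d-dimensional embedding:

```python
import numpy as np

def residual_variance(DG, Y):
    """Assumed definition: 1 - R^2 between dG(i, j) and embedding distances."""
    DY = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    iu = np.triu_indices_from(DG, k=1)           # count each pair (i, j) once
    r = np.corrcoef(DG[iu], DY[iu])[0, 1]
    return 1.0 - r ** 2

# Evaluating this for d = 1, 2, 3, ... and looking for the "elbow" reproduces
# the dimensionality estimate illustrated in Fig. 2.
```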
Just as PCA and MDS are guaranteed,
given sufficient data, to recover the true
structure of linear manifolds, Isomap is guar-
anteed asymptotically to recover the true di-
mensionality and geometric structure of a
strictly larger class of nonlinear manifolds.
Like the Swiss roll, these are manifolds
whose intrinsic geometry is that of a convex
region of Euclidean space, but whose ambi-
ent geometry in the high-dimensional input
space may be highly folded, twisted, or
curved. For non-Euclidean manifolds, such as
a hemisphere or the surface of a doughnut,
Isomap still produces a globally optimal low-
dimensional Euclidean representation, as
measured by Eq. 1.
These guarantees of asymptotic conver-
gence rest on a proof that as the number of
data points increases, the graph distances
dG(i, j) provide increasingly better approxi-
mations to the intrinsic geodesic distances
dM(i, j), becoming arbitrarily accurate in the
limit of infinite data (18, 19). How quickly
dG(i, j) converges to dM(i, j) depends on cer-
tain parameters of the manifold as it lies
within the high-dimensional space (radius of
curvature and branch separation) and on the
density of points. To the extent that a data set
presents extreme values of these parameters
or deviates from a uniform density, asymp-
totic convergence still holds in general, but
the sample size required to estimate geodes-
ic distance accurately may be impractically
large.
Isomap’s global coordinates provide a
simple way to analyze and manipulate high-
dimensional observations in terms of their
intrinsic nonlinear degrees of freedom. For a
set of synthetic face images, known to have
three degrees of freedom, Isomap correctly
detects the dimensionality (Fig. 2A) and sep-
arates out the true underlying factors (Fig.
1A). The algorithm also recovers the known
low-dimensional structure of a set of noisy
real images, generated by a human hand vary-
ing in finger extension and wrist rotation
(Fig. 2C) (20). Given a more complex data
set of handwritten digits, which does not have
a clear manifold geometry, Isomap still finds
globally meaningful coordinates (Fig. 1B)
and nonlinear structure that PCA or MDS do
not detect (Fig. 2D). For all three data sets,
the natural appearance of linear interpolations
between distant points in the low-dimension-
al coordinate space confirms that Isomap has
captured the data’s perceptually relevant
structure (Fig. 4).
Previous attempts to extend PCA and MDS to nonlinear data sets fall into two broad classes, each of which suffers from limitations overcome by our approach. Local linear techniques (21–23) are not designed to represent the global structure of a data set within a single coordinate system, as we do in Fig. 1. Nonlinear techniques based on greedy optimization procedures (24–30) attempt to discover global structure, but lack the crucial algorithmic features that Isomap inherits from PCA and MDS: a noniterative, polynomial time procedure with a guarantee of global optimality; for intrinsically Euclidean manifolds, a guarantee of asymptotic convergence to the true structure; and the ability to discover manifolds of arbitrary dimensionality, rather than requiring a fixed d initialized from the beginning or computational resources that increase exponentially in d.
Fig. 2. The residual variance of PCA (open triangles), MDS [open triangles in (A) through (C); open circles in (D)], and Isomap (filled circles) on four data sets (42). (A) Face images varying in pose and illumination (Fig. 1A). (B) Swiss roll data (Fig. 3). (C) Hand images varying in finger extension and wrist rotation (20). (D) Handwritten “2”s (Fig. 1B). In all cases, residual variance decreases as the dimensionality d is increased. The intrinsic dimensionality of the data can be estimated by looking for the “elbow” at which this curve ceases to decrease significantly with added dimensions. Arrows mark the true or approximate dimensionality, when known. Note the tendency of PCA and MDS to overestimate the dimensionality, in contrast to Isomap.
Fig. 3. The “Swiss roll” data set, illustrating how Isomap exploits geodesic paths for nonlinear dimensionality reduction. (A) For two arbitrary points (circled) on a nonlinear manifold, their Euclidean distance in the high-dimensional input space (length of dashed line) may not accurately reflect their intrinsic similarity, as measured by geodesic distance along the low-dimensional manifold (length of solid curve). (B) The neighborhood graph G constructed in step one of Isomap (with K = 7 and N = 1000 data points) allows an approximation (red segments) to the true geodesic path to be computed efficiently in step two, as the shortest path in G. (C) The two-dimensional embedding recovered by Isomap in step three, which best preserves the shortest path distances in the neighborhood graph (overlaid). Straight lines in the embedding (blue) now represent simpler and cleaner approximations to the true geodesic paths than do the corresponding graph paths (red).
Here we have demonstrated Isomap’s per-
formance on data sets chosen for their visu-
ally compelling structures, but the technique
may be applied wherever nonlinear geometry
complicates the use of PCA or MDS. Isomap
complements, and may be combined with,
linear extensions of PCA based on higher
order statistics, such as independent compo-
nent analysis (31, 32). It may also lead to a
better understanding of how the brain comes
to represent the dynamic appearance of ob-
jects, where psychophysical studies of appar-
ent motion (33, 34) suggest a central role for
geodesic transformations on nonlinear mani-
folds (35) much like those studied here.
References and Notes
1. M. P. Young, S. Yamane, Science 256, 1327 (1992).
2. R. N. Shepard, Science 210, 390 (1980).
3. M. Turk, A. Pentland, J. Cogn. Neurosci. 3, 71 (1991).
4. H. Murase, S. K. Nayar, Int. J. Comp. Vision 14, 5
(1995).
5. J. W. McClurkin, L. M. Optican, B. J. Richmond, T. J.
Gawne, Science 253, 675 (1991).
6. J. L. Elman, D. Zipser, J. Acoust. Soc. Am. 83, 1615
(1988).
7. W. Klein, R. Plomp, L. C. W. Pols, J. Acoust. Soc. Am.
48, 999 (1970).
8. E. Bizzi, F. A. Mussa-Ivaldi, S. Giszter, Science 253, 287
(1991).
9. T. D. Sanger, Adv. Neural Info. Proc. Syst. 7, 1023
(1995).
10. J. W. Hurrell, Science 269, 676 (1995).
11. C. A. L. Bailer-Jones, M. Irwin, T. von Hippel, Mon.
Not. R. Astron. Soc. 298, 361 (1997).
12. P. Menozzi, A. Piazza, L. Cavalli-Sforza, Science 201,
786 (1978).
13. K. V. Mardia, J. T. Kent, J. M. Bibby, Multivariate Analysis (Academic Press, London, 1979).
14. A. H. Monahan, J. Clim., in press.
15. The scale-invariant K parameter is typically easier to set than ε, but may yield misleading results when the local dimensionality varies across the data set. When available, additional constraints such as the temporal ordering of observations may also help to determine neighbors. In earlier work (36) we explored a more complex method (37), which required an order of magnitude more data and did not support the theoretical performance guarantees we provide here for ε- and K-Isomap.
16. This procedure, known as Floyd's algorithm, requires O(N³) operations. More efficient algorithms exploiting the sparse structure of the neighborhood graph can be found in (38).
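For large N, one standard sparse-graph alternative (an illustration of the idea, not the specific algorithms of (38)) is Dijkstra's method as implemented in SciPy:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def sparse_graph_distances(G):
    """Shortest paths on the neighborhood graph; inf entries mean 'no edge'."""
    W = np.where(np.isinf(G), 0.0, G)   # csgraph convention: 0 = absent edge
    return dijkstra(csr_matrix(W), directed=False)
```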
17. The operator τ is defined by τ(D) = −HSH/2, where S is the matrix of squared distances {Sij = Dij²}, and H is the “centering matrix” {Hij = δij − 1/N} (13).
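Why τ(D) yields inner products (a standard derivation, not spelled out in the note): writing each squared distance in terms of the underlying vectors and centering on both sides,

```latex
S_{ij} = D_{ij}^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2
       - 2\langle x_i, x_j \rangle,
\qquad
[\tau(D)]_{ij} = \bigl[-\tfrac{1}{2} H S H\bigr]_{ij}
             = \langle x_i - \bar{x},\; x_j - \bar{x} \rangle,
```

so τ(D) is the Gram matrix of the centered points, whose top eigenvectors give the embedding that minimizes Eq. 1.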
18. Our proof works by showing that for a sufficiently high density (α) of data points, we can always choose a neighborhood size (ε or K) large enough that the graph will (with high probability) have a path not much longer than the true geodesic, but small enough to prevent edges that “short circuit” the true geometry of the manifold. More precisely, given arbitrarily small values of λ1, λ2, and μ, we can guarantee that with probability at least 1 − μ, estimates of the form

(1 - \lambda_1)\, d_M(i, j) \le d_G(i, j) \le (1 + \lambda_2)\, d_M(i, j)

hold uniformly over all pairs of data points.