LETTER
doi:10.1038/nature11258
Reconstructing Native American population history
David Reich1,2, Nick Patterson2, Desmond Campbell3,4, Arti Tandon1,2, Stéphane Mazieres3,5, Nicolas Ray6, Maria V. Parra3,7,
Winston Rojas3,7, Constanza Duque3,7, Natalia Mesa3,7, Luis F. García7, Omar Triana7, Silvia Blair7, Amanda Maestre7, Juan C. Dib8,
Claudio M. Bravi3,9, Graciela Bailliet9, Daniel Corach10, Tábita Hünemeier3,11, Maria Cátira Bortolini11, Francisco M. Salzano11,
María Luiza Petzl-Erler12, Victor Acuña-Alonzo13, Carlos Aguilar-Salinas14, Samuel Canizales-Quinteros15,16, Teresa Tusié-Luna15,
Laura Riba15, Maricela Rodríguez-Cruz17, Mardia Lopez-Alarcón17, Ramón Coral-Vazquez18, Thelma Canto-Cetina19,
Irma Silva-Zolezzi20†, Juan Carlos Fernandez-Lopez20, Alejandra V. Contreras20, Gerardo Jimenez-Sanchez20†,
Maria José Gómez-Vázquez21, Julio Molina22, Ángel Carracedo23, Antonio Salas23, Carla Gallo24, Giovanni Poletti24,
David B. Witonsky25, Gorka Alkorta-Aranburu25, Rem I. Sukernik26, Ludmila Osipova27, Sardana A. Fedorova28, René Vasquez29,
Mercedes Villena29, Claudia Moreau30, Ramiro Barrantes31, David Pauls32, Laurent Excoffier33,34, Gabriel Bedoya7,
Francisco Rothhammer35, Jean-Michel Dugoujon36, Georges Larrouy36, William Klitz37, Damian Labuda30, Judith Kidd38,
Kenneth Kidd38, Anna Di Rienzo25, Nelson B. Freimer39, Alkes L. Price2,40 & Andrés Ruiz-Linares3
The peopling of the Americas has been the subject of extensive
genetic, archaeological and linguistic research; however, central
questions remain unresolved1–5. One contentious issue is whether
the settlement occurred by means of a single6–8 migration or
multiple streams of migration from Siberia9–15. The pattern of
dispersals within the Americas is also poorly understood. To
address these questions at a higher resolution than was previously
possible, we assembled data from 52 Native American and
17 Siberian groups genotyped at 364,470 single nucleotide
polymorphisms. Here we show that Native Americans descend
from at least three streams of Asian gene flow. Most descend
entirely from a single ancestral population that we call ‘First
American’. However, speakers of Eskimo–Aleut languages from
the Arctic inherit almost half their ancestry from a second stream
of Asian gene flow, and the Na-Dene-speaking Chipewyan from
Canada inherit roughly one-tenth of their ancestry from a third
stream. We show that the initial peopling followed a southward
expansion facilitated by the coast, with sequential population splits
and little gene flow after divergence, especially in South America. A
major exception is in Chibchan speakers on both sides of the
Panama isthmus, who have ancestry from both North and South
America.
The settlement of the Americas occurred at least 15,000 years ago
through Beringia, a land bridge between Asia and America that existed
during the ice ages1–5. Most analyses of Native American genetic
diversity have examined single loci, particularly mitochondrial DNA
or the Y chromosome, and some interpretations of these data model
the settlement of America as a single migratory wave from Asia6–8. We
assembled native population samples from Canada to the southern tip
of South America, genotyped them on single nucleotide polymorphism
(SNP) microarrays, and merged our data with six other data sets. The
combined data set consists of 364,470 SNPs genotyped in 52 Native
American populations (493 samples; Fig. 1a and Supplementary
Table 1), 17 Siberian populations (245 samples; Supplementary Fig. 1
and Supplementary Table 2) and 57 other populations (1,613 samples)
(Supplementary Notes).
A complication in studying Native American genetic history is
admixture with European and African immigrants since 1492. Cluster
analysis16 shows that many of the samples we examined have some
non-native admixture (an average of 8.5%; Fig. 1b and Supplementary
Tables 1 and 3). This admixture is a challenge for learning about the
historical relationships among the populations, and to address this
complication we used three independent approaches. First, we
restricted analyses to 163 Native Americans from 34 populations
without evidence of admixture (Supplementary Notes). Second, we
subtracted the expected contribution of European and African
ancestry to the statistics we used to learn about population relationships
(Supplementary Notes). Third, we inferred the probability of
non-native ancestry at each genomic segment and ‘masked’ segments
with more than a negligible probability of this ancestry (Fig. 1b,
1Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 2Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA.
3Department of Genetics, Evolution and Environment, University College London WC1E 6BT, UK. 4Department of Psychiatry and Centre for Genomic Sciences, The University of Hong Kong, Pokfulam, Hong
Kong SAR. 5Anthropologie Bio-culturelle, Droit, Ethique et Santé (ADES), UMR7268, Aix-Marseille Université/CNRS/EFS, Marseille 13344, France. 6Institute for Environmental Sciences, and Forel Institute,
University of Geneva, Geneva 1227, Switzerland. 7Universidad de Antioquia, Medellín, Colombia. 8Fundación Salud para el Trópico, Santa Marta, Colombia. 9Instituto Multidisciplinario de Biología Celular
(CCT La Plata-CONICET, CICPCA), 1900 La Plata, Argentina. 10Servicio de Huellas Digitales Genéticas and CONICET, Universidad de Buenos Aires, Argentina. 11Departamento de Genética, Instituto de
Biociências, Universidade Federal do Rio Grande do Sul, Porto Alegre 91501-970, Brazil. 12Departamento de Genética, Universidade Federal do Paraná, Curitiba 81531-980, Brazil. 13National Institute of
Anthropology and History, México City 06100, México. 14Departamento de Endocrinología y Metabolismo, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, México City 14100, México.
15Unidad de Biología Molecular y Medicina Genómica, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán/Universidad Nacional Autónoma de México, México City 14000, México.
16Departamento de Biología, Facultad de Química, Universidad Nacional Autónoma de México, México City 04510, México. 17Unidad de Investigación Médica en Nutrición, Hospital de Pediatría, CMNSXXI,
Instituto Mexicano del Seguro Social, México City 06720, México. 18Sección de Posgrado, Escuela Superior de Medicina del Instituto Politécnico Nacional, México City 11340, México. 19Laboratorio de
Biología de la Reproducción, Departamento de Salud Reproductiva y Genética, Centro de Investigaciones Regionales, Mérida Yucatán 97000, México. 20Instituto Nacional de Medicina Genómica, México
City 14610, México. 21Universidad Autónoma de Nuevo León, San Nicolás de los Garza, Nuevo León 66451, México. 22Centro de Investigaciones Biomédicas de Guatemala, Ciudad de Guatemala,
Guatemala. 23Instituto de Ciencias Forenses, Universidade de Santiago de Compostela, Fundación de Medicina Xenómica (SERGAS), CIBERER, Santiago de Compostela, Galicia 15782, Spain.
24Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima 15102, Peru. 25Department of Human Genetics, University of Chicago, Chicago
60637, USA. 26Laboratory of Human Molecular Genetics, Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia. 27Institute of
Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090, Russia. 28Department of Molecular Genetics, Yakut Research Center of Complex Medical Problems and
North-East Federal University, Yakutsk, Sakha (Yakutia) 677010, Russia. 29Instituto Boliviano de Biología de la Altura, Universidad Autonoma Tomás Frías, Potosí, Bolivia. 30Département de Pédiatrie,
Centre de Recherche du CHU Sainte-Justine, Université de Montréal, Montréal, Quebec H3T 1C5, Canada. 31Escuela de Biología, Universidad de Costa Rica, San José, Costa Rica. 32Center for Human
Genetic Research, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA. 33Computational and Molecular Population Genetics Laboratory, Institute of Ecology and
Evolution, University of Bern, 3012 Bern, Switzerland. 34Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland. 35Instituto de Alta Investigación, Universidad de Tarapacá, Programa de Genética
Humana ICBM Facultad de Medicina Universidad de Chile and Centro de Investigaciones del Hombre en el Desierto, Arica 1001236, Chile. 36Anthropologie Moléculaire et Imagerie de Synthèse, CNRS UMR
5288, Université Paul Sabatier Toulouse III, Toulouse 31000, France. 37School of Public Health, University of California, Berkeley, California 94720, USA. 38Department of Genetics, Yale University School of
Medicine, New Haven, Connecticut 06520, USA. 39Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California
90095, USA. 40Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, USA. †Present addresses: BioAnalytical Science Department Nestec Ltd,
Nestlé Research Center, 1000 Lausanne, Switzerland (I.S.-Z.); Global Biotech Consulting Group, México City 09010, México (G.J.-S.).
00 MONTH 2012 | VOL 000 | NATURE | 1
©2012 Macmillan Publishers Limited. All rights reserved
Supplementary Notes and Supplementary Fig. 2). Our inferences from
these three approaches are concordant (Supplementary Figs 3 and 4).
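The third approach (masking) can be illustrated with a minimal sketch. It assumes that a per-segment probability of non-native ancestry has already been inferred by some local-ancestry method; the function name `mask_segments`, the data layout and the threshold value of 0.01 are hypothetical choices for illustration only, since the text says just "more than a negligible probability" and gives no number here.

```python
def mask_segments(genotypes, nonnative_prob, threshold=0.01):
    """Return a copy of `genotypes` with entries set to None (masked)
    wherever the inferred probability of non-native ancestry exceeds
    `threshold`. The threshold is an illustrative assumption."""
    if len(genotypes) != len(nonnative_prob):
        raise ValueError("one probability is required per genotype")
    return [
        g if p <= threshold else None
        for g, p in zip(genotypes, nonnative_prob)
    ]


# Hypothetical toy data: genotype calls (0, 1 or 2 copies of an allele)
# and the probability that each position lies in a non-native segment.
toy_genotypes = [0, 1, 2, 1]
toy_probs = [0.0, 0.5, 0.005, 0.9]
masked = mask_segments(toy_genotypes, toy_probs)
```

Downstream statistics would then be computed only on the unmasked entries, so that masked positions contribute no information.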
We built a tree (Fig. 1c) using Fst distances between pairs of populations,
which broadly agrees with geography and linguistic categories17
(trees based on masked and unmasked data were similar; Supplementary
Fig. 3). An early split separates Asians from Native Americans and
extreme northeastern Siberians (Chukchi, Naukan, Koryak), which is
consistent with studies that have identified pan-American variants
shared with some northeastern Siberians6,7,10,18. Eskimo–Aleut speakers
and far-northeastern Siberians form a cluster that is separated from
other Native American populations by a long internal branch. Within
America the tree shows a series of splits in an approximate north–south
sequence beginning with the Arctic, followed by northern North
America, northern/central and southern Mexico and lower Central
America/Colombia, and ending in three South American clusters
(the Andes, the Chaco region and eastern South America). This pattern
of splits is consistent with a north–south population expansion, an
inference that is also supported by the negative correlation between
heterozygosity and distance from the Bering Strait (r = −0.48,
P = 0.007). This correlation increases if we use ‘least cost distances’
that consider the coasts as facilitators of migration19–21, and persists if
we exclude four Native North American populations with ancestry
from later streams of Asian gene flow (Supplementary Notes and
Supplementary Fig. 5).
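The heterozygosity-versus-distance correlation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: heterozygosity is taken as the mean of 2p(1 − p) over SNPs, distances are plain numbers rather than least-cost paths, and all data values and function names are hypothetical.

```python
import math


def expected_heterozygosity(freqs):
    """Mean expected heterozygosity 2p(1-p) over SNP allele frequencies."""
    if not freqs:
        raise ValueError("at least one allele frequency is required")
    return sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: four populations at increasing distance (km)
# from the Bering Strait, each with allele frequencies at three SNPs.
distance_km = [1000.0, 4000.0, 8000.0, 12000.0]
allele_freqs = [
    [0.5, 0.4, 0.6],
    [0.4, 0.3, 0.7],
    [0.3, 0.2, 0.8],
    [0.2, 0.1, 0.9],
]
het = [expected_heterozygosity(f) for f in allele_freqs]
r = pearson_r(distance_km, het)  # negative for this toy data set
```

A negative `r` in such an analysis is what a serial loss of diversity along an expansion route would be expected to produce.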
Trees provide a simplified model of history that does not accommodate
the possibility of gene flow after population separation.
Circumstantial evidence that some Native American populations
may not fit a simple tree comes from cluster analysis, which infers
Siberian-related ancestry in some northern North Americans (Fig. 1b),
and from single-locus studies that have identified genetic variants
shared between Eurasia and North America that are absent from
South America11,22,23. The advent of genome-wide data sets has allowed
the development of a formal four-population test for whether sets of
four populations are consistent with a tree. This test is robust to the
[Figure 1 image not reproduced. Legend, linguistic families: Uralic–Yukaghir, Altaic, Chukchi–Kamchatkan, Eskimo–Aleut, Na-Dene, Northern Amerind, Central Amerind, Chibchan–Paezan, Equatorial–Tucanoan, Ge–Pano–Carib, Andean, Isolate. Additional group labels: Sub-Saharan African, West Eurasian. Tree scale bar: 0.01 in Fst units.]
[Populations in the tree, with sample sizes in parentheses: Arhuaco (5), Kogi (4), Maleku (3), Huetar (1), Guaymi (5), Teribe (3), Bribri (4), Cabecar (31), Surui (24), Karitiana (13), Parakana (1), Arara (1), Jamamadi (1), Ticuna (6), Inga (9), Piapoco (7), Guahibo (6), Wichi (5), Toba (4), Chane (2), Guarani (6), Kaingang (2), Quechua (40), Aymara (23), Diaguita (5), Huilliche (4), Chono (4), Chilote (8), Yaghan (4), Yaqui (1), Pima (33), Tepehuano (25), Purepecha (1), Zapotec1 (22), Zapotec2 (21), Mixe (17), Mixtec (5), Maya2 (12), Maya1 (37), Kaqchikel (13), Chorotega (1), Embera (5), Waunana (3), Wayuu (11), Yoruba (25), French (29), Papuan (17), Japanese (31), Cambodian (11), Han (34), Yi (10), Khanty (35), Ket (2), Selkup (9), Yukaghir (13), Tundra Nentsi (3), Nganasan1 (8), Nganasan2 (14), Dolgan (4), Evenki (15), Yakut (34), Buryat (17), Mongolian (8), Altaian (12), Tuvinians (15), Chukchi (30), Koryak (10), Naukan (16), East Greenland Inuit (7), West Greenland Inuit (8), Aleutian (8), Chipewyan (15), Algonquin (5), Cree (4), Ojibwa (5), Palikur (3).]
Figure 1 | Geographic, linguistic and genetic overview of 52 Native
American populations. a, Sampling locations of the populations, with colours
corresponding to linguistic groups. b, Cluster-based analysis (k = 4) using
ADMIXTURE shows evidence of some West-Eurasian-related and sub-
Saharan-African-related ancestry in many Native Americans before masking
(top), but little afterwards (bottom). Thick vertical lines denote major linguistic
groupings, and thin vertical lines separate individual populations.
c, Neighbour-joining tree based on Fst distances relating Native American to
selected non-American populations (sample sizes in parentheses). Native
American and Siberian data were analysed after masking, but consistent trees
were obtained on a subset of completely unadmixed samples (Supplementary
Fig. 3). Some populations have evidence for substructure, and we represent
these as two different groups (for example Maya1 and Maya2).
ascertainment bias affecting SNP arrays24. For each of the 52 Native
American populations in turn, we tested the hypothesis that they
conform to the tree: ((test population, southern Native American),
(outgroup1, outgroup2)) for 45 pairs of ten Asian outgroups. We used
a Hotelling T-test to evaluate whether all four-population test f4
statistics of this form are consistent with the expectation of zero
(Supplementary Notes). The test is not significant for 47 populations,
which is consistent with their stemming from the same, presumably
first, wave of American settlement; we call this ancestry ‘First
American’ (Table 1). In contrast, four populations from northern
North America show highly significant evidence of ancestry from
additional streams of gene flow from Asia, subsequent to the initial
peopling of America, which we confirm through the Hotelling T-test
and a complementary test (Supplementary Notes): East Greenland
Inuit (P < 10⁻⁹), West Greenland Inuit (P < 10⁻⁹), Aleutian
Islanders (P = 9 × 10⁻⁵) and Chipewyan (P < 10⁻⁹). The recently
sequenced genome of a 4,000-year-old Saqqaq Palaeo-Eskimo from
Greenland25 also has evidence of ancestry that is distinct from more
southern Native Americans (P = 2 × 10⁻⁹) (Supplementary Notes).
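To make the four-population test concrete, here is a minimal sketch of an f4 statistic computed from allele frequencies, f4(A, B; C, D) = mean over SNPs of (pA − pB)(pC − pD), which is expected to be zero when the tree ((A, B), (C, D)) holds. The delete-one-block jackknife for a standard error is a common companion procedure and is included here as an assumption; it is not the Hotelling T-test used in the paper, and all function names and inputs are hypothetical.

```python
import math


def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): average of (pA - pB) * (pC - pD) across SNPs."""
    n = len(p_a)
    if n == 0 or not (len(p_b) == len(p_c) == len(p_d) == n):
        raise ValueError("need four equal-length, non-empty frequency lists")
    total = sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))
    return total / n


def jackknife_se(p_a, p_b, p_c, p_d, n_blocks):
    """Delete-one-block jackknife standard error of f4, using contiguous
    SNP blocks of (nearly) equal size. Illustrative assumption only."""
    n = len(p_a)
    if not 2 <= n_blocks <= n:
        raise ValueError("need 2 <= n_blocks <= number of SNPs")
    bounds = [(k * n) // n_blocks for k in range(n_blocks + 1)]
    estimates = []
    for lo, hi in zip(bounds, bounds[1:]):
        keep = [i for i in range(n) if i < lo or i >= hi]
        subsets = [[p[i] for i in keep] for p in (p_a, p_b, p_c, p_d)]
        estimates.append(f4(*subsets))
    mean = sum(estimates) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return math.sqrt(var)


# Hypothetical toy frequencies at two SNPs for four populations.
toy_f4 = f4([0.2, 0.8], [0.1, 0.9], [0.6, 0.3], [0.5, 0.4])
```

In practice an estimate would be divided by its standard error to obtain a Z-score; the paper instead combines all 45 outgroup pairs in a single multivariate test.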
Examination of the values of the f4 statistics allows us to infer the
minimum number of gene flow events from Asia into America consistent
with the data. Each stream of gene flow is expected to produce a
distinct vector of f4 statistics, constituting a ‘signature’ of how the
ancestral migrating population relates to present-day Asian populations.
By finding the minimum number of vectors whose linear combinations
are necessary to produce the vector observed in each
population, we infer that a minimum of three gene flow events from
Asia are necessary to explain the data from all Native American populations
jointly, including the Saqqaq Palaeo-Eskimo (Supplementary
Notes). These three episodes correspond to First American ancestry
(distributed throughout the Americas) and to two additional streams
of gene flow detected in a subset of northern North Americans
(East Greenland Inuit, West Greenland Inuit, Aleutian Islanders,
Chipewyan and Saqqaq). Table 1 shows that f4 statistics in the Inuit
and Aleutian islanders are consistent with deriving the non-First-
American portions of their ancestry from the same later stream of
Asian gene flow, providing support for deep shared ancestry between
these linguistically linked groups12,26. The Na-Dene-speaking
Chipewyan have a different pattern of f4 statistics from Eskimo–
Aleut speakers, implying that they descend at least in part from a
separate stream of Asian gene flow (P < 10⁻⁹ for comparisons with
the Greenland Inuit; Table 1). This is consistent with the hypothesis
that Na-Dene languages mark a distinct migration from Asia9,17.
Because we only have data from one Na-Dene-speaking group, an
important direction for future work will be to test whether the distinct
Asian ancestry that we detect in the Chipewyan is a shared signature
throughout Na-Dene speakers. Finally, the Saqqaq25 have a vector of f4
statistics consistent with that in the Chipewyan, raising the possibility
that the Saqqaq and Chipewyan both carry genetic material from the
same later stream of Asian gene flow into the Americas, postdating the
First American migration (Supplementary Notes).
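The idea of counting the minimum number of f4 'signature' vectors can be sketched as a matrix-rank computation. This is a simplified reading of the argument above, not the statistical procedure in the Supplementary Notes: it assumes noise-free f4 values, treats the reference (First American) stream as contributing a zero vector, and therefore takes the number of streams to be the numerical rank plus one. The tolerance, the function names and the toy values are all hypothetical.

```python
def numerical_rank(matrix, tol=1e-9):
    """Rank of a small matrix by Gaussian elimination with partial pivoting."""
    rows = [[float(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        candidates = range(rank, len(rows))
        pivot = max(candidates, key=lambda i: abs(rows[i][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            for j in range(col, n_cols):
                rows[i][j] -= factor * rows[rank][j]
        rank += 1
    return rank


def min_streams(f4_matrix, tol=1e-9):
    """Minimum number of streams under the stated assumptions: one
    reference stream plus one per independent f4 vector."""
    return numerical_rank(f4_matrix, tol) + 1


# Hypothetical f4 vectors (rows: populations; columns: outgroup pairs).
first_american = [0.0, 0.0, 0.0]
inuit = [0.002, 0.001, 0.003]
aleutian = [0.001, 0.0005, 0.0015]   # proportional to the Inuit vector
chipewyan = [0.0005, 0.002, -0.001]  # a different direction

eskimo_aleut_only = min_streams([first_american, inuit, aleutian])
all_groups = min_streams([first_american, inuit, aleutian, chipewyan])
```

With real data the f4 estimates are noisy, so a hard tolerance is not adequate and a formal test of each candidate rank is needed, which is what the P values in Table 1 report.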
To develop an explicit model for the settlement of the Americas, we
used the admixture graph (AG) framework24. AGs are generalizations
of trees that accommodate the possibility of a limited number of
unidirectional gene flow events. They are powerful tools for learning
about history because they make predictions about the values of
f-statistics (such as f4) that can be used to test the fit of a proposed
model24 (Supplementary Notes). Figure 2 presents an AG relating
selected Native American and Old World populations that is a good
fit to the data in the sense that none of the f-statistics predicted by the
Table 1 | Native Americans descend from at least three streams of Asian gene flow
Population groupings tested                                 P value for this many Asian streams    Minimum number of streams of Asian
                                                            being enough to explain the data       gene flow needed to explain the data
                                                            1         2           3
East Greenland Inuit/West Greenland Inuit/First American    <10⁻⁹     0.64        1                2
East Greenland Inuit/Aleutian/First American                <10⁻⁹     0.57        1                2
West Greenland Inuit/Aleutian/First American                <10⁻⁹     0.41        1                2
Chipewyan/East Greenland Inuit/First American               <10⁻⁹     0.02        1                3
Chipewyan/West Greenland Inuit/First American               <10⁻⁹     0.006       1                3
Chipewyan/Aleutian/First American                           <10⁻⁹     0.03        1                3
Saqqaq/East Greenland Inuit/First American                  <10⁻⁹     6 × 10⁻⁶    1                3
Saqqaq/West Greenland Inuit/First American                  <10⁻⁹     2 × 10⁻⁶    1                3
Saqqaq/Aleutian/First American                              <10⁻⁹     0.17        1                2
Saqqaq/Chipewyan/First American                             <10⁻⁹     0.29        1                2
Saqqaq/Eskimo–Aleut/Chipewyan/First American                <10⁻⁹     8 × 10⁻⁶    0.27             3
We use the method described in Supplementary Notes to test formally whether specified groupings of Native American populations are consistent with descending from one, two or three streams of gene flow from
Asia. We use ‘First American’ to refer to a pool of 43 populations from