User:Carwil/Genetic Structure of Human Populations (2002 scientific article)

From Wikipedia, the free encyclopedia

In a 2002 scientific article, "Genetic Structure of Human Populations," Noah Rosenberg and six other genetics researchers…


In a 2005 paper, Rosenberg and his team acknowledged that findings of a study on human population structure are highly influenced by the way the study is designed.[1][2] They reported that the number of loci, the sample size, the geographic dispersion of the samples and assumptions about allele-frequency correlation all have an effect on the outcome of the study.


Clusters by Rosenberg et al. (2002, 2005)

A major finding of Rosenberg and colleagues (2002) was that when five clusters were generated by the program (specified as K=5), "clusters corresponded largely to major geographic regions." Specifically, the five clusters corresponded to Africa, Europe plus the Middle East plus Central and South Asia, East Asia, Oceania, and the Americas. The study also confirmed prior analyses by showing that, "Within-population differences among individuals account for 93 to 95% of genetic variation; differences among major groups constitute only 3 to 5%."
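The within-group versus between-group split quoted above can be illustrated with a simple heterozygosity-based partition for a single two-allele marker. This is only a sketch under stated assumptions: it uses a textbook-style G_ST calculation with equally weighted populations, which is not necessarily the method used in the study, and the function names and allele frequencies are hypothetical values chosen for illustration.

```python
def heterozygosity(p):
    """Expected heterozygosity at a two-allele marker with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def apportion(freqs):
    """Split diversity at one marker into within- and between-population shares.

    `freqs` holds the frequency of one allele in each population
    (populations are weighted equally, an assumption of this sketch).
    """
    mean_p = sum(freqs) / len(freqs)
    h_total = heterozygosity(mean_p)                       # pooled diversity
    h_within = sum(heterozygosity(p) for p in freqs) / len(freqs)
    within = h_within / h_total
    return within, 1.0 - within

# Made-up allele frequencies for two populations (not study data).
within_share, between_share = apportion([0.4, 0.6])
```

With these illustrative frequencies most of the diversity lies within populations, even though the two populations differ in allele frequency.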

Human population structure can be inferred from multilocus DNA sequence data (Rosenberg et al. 2002, 2005). Individuals from 52 populations were examined at 993 DNA markers. These data were used to partition individuals into K = 2, 3, 4, 5, or 6 gene clusters. In this figure, the average fractional membership of individuals from each population is represented by horizontal bars partitioned into K colored segments.
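The averaging behind such a bar plot can be sketched in a few lines. The following is a minimal illustration, assuming each individual already has a vector of K membership fractions that sum to 1; the function name `average_membership` and the toy values are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def average_membership(individuals):
    """Average per-individual cluster membership fractions within each population.

    `individuals` is a list of (population_label, [q_1, ..., q_K]) pairs.
    """
    sums = {}
    counts = defaultdict(int)
    for pop, q in individuals:
        if pop not in sums:
            sums[pop] = [0.0] * len(q)
        for k, value in enumerate(q):
            sums[pop][k] += value
        counts[pop] += 1
    # Divide each summed fraction by the number of individuals in the population.
    return {pop: [s / counts[pop] for s in total] for pop, total in sums.items()}

# Made-up membership fractions for K = 2 (illustrative only, not study data).
toy = [
    ("PopA", [0.9, 0.1]),
    ("PopA", [0.7, 0.3]),
    ("PopB", [0.2, 0.8]),
]
averages = average_membership(toy)
```

Each resulting list corresponds to one bar, with one colored segment per cluster.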

Rosenberg and colleagues (2005) have argued, based on cluster analysis, that populations do not always vary continuously and a population's genetic structure is consistent if enough genetic markers (and subjects) are included. "Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions."

They also wrote, regarding a model with five clusters corresponding to Africa, Eurasia (Europe, Middle East, and Central/South Asia), East Asia, Oceania, and the Americas: "For population pairs from the same cluster, as geographic distance increases, genetic distance increases in a linear manner, consistent with a clinal population structure. However, for pairs from different clusters, genetic distance is generally larger than that between intracluster pairs that have the same geographic distance. For example, genetic distances for population pairs with one population in Eurasia and the other in East Asia are greater than those for pairs at equivalent geographic distance within Eurasia or within East Asia. Loosely speaking, it is these small discontinuous jumps in genetic distance—across oceans, the Himalayas, and the Sahara—that provide the basis for the ability of STRUCTURE to identify clusters that correspond to geographic regions".[3]
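The idea of a "jump" in genetic distance across a barrier can be made concrete with a small sketch: fit a straight line to same-cluster population pairs, then measure how far different-cluster pairs sit above that line at the same geographic distance. This is an illustrative simplification and not the authors' own analysis; the function names and all distance values below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_excess(within, between):
    """Mean amount by which between-cluster pairs exceed the within-cluster trend.

    Both arguments are lists of (geographic_distance, genetic_distance) pairs.
    """
    slope, intercept = fit_line([g for g, _ in within], [d for _, d in within])
    residuals = [d - (intercept + slope * g) for g, d in between]
    return sum(residuals) / len(residuals)

# Made-up distances (km, arbitrary genetic-distance units); not study data.
within_pairs = [(1000, 0.010), (2000, 0.020), (3000, 0.030)]
between_pairs = [(2000, 0.030), (3000, 0.040)]
excess = mean_excess(within_pairs, between_pairs)
```

A positive `excess` in this toy setup means that pairs from different clusters are genetically farther apart than the within-cluster trend predicts for their geographic separation.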

Rosenberg stated that their findings "should not be taken as evidence of our support of any particular concept of biological race (...). Genetic differences among human populations derive mainly from gradations in allele frequencies rather than from distinctive 'diagnostic' genotypes."[4] The study's overall results confirmed that within-population differences account for 93 to 95% of genetic variation, while differences among major groups account for only 3 to 5%.[1]

Criticism

The Rosenberg study has been criticised on several grounds.

The existence of allelic clines and the observation that the bulk of human variation is continuously distributed have led some scientists to conclude that any categorization schema attempting to partition that variation meaningfully will necessarily create artificial truncations (Kittles & Weiss 2003). It is for this reason, Reanne Frank argues, that attempts to allocate individuals into ancestry groupings based on genetic information have yielded varying results that are highly dependent on methodological design.[5] Serre and Pääbo (2004) make a similar claim:

The absence of strong continental clustering in the human gene pool is of practical importance. It has recently been claimed that "the greatest genetic structure that exists in the human population occurs at the racial level" (Risch et al. 2002). Our results show that this is not the case, and we see no reason to assume that "races" represent any units of relevance for understanding human genetic history.

In a response to Serre and Pääbo (2004), Rosenberg et al. (2005) maintain that their clustering analysis is robust. They agree with Serre and Pääbo that membership of multiple clusters can be interpreted as evidence for clinality (isolation by distance), though they comment that it may also be due to admixture between neighbouring groups (small island model). Finally, they comment that evidence of clusteredness is not evidence for any specific concept of "biological race".[6]

Clustering does not particularly correspond to continental divisions. Depending on the parameters given to their analytical program, Rosenberg and Pritchard were able to construct divisions of between 4 and 20 clusters of the genomes studied, although they excluded analyses with more than 6 clusters from their published article. Probability values for various cluster configurations varied widely, with the single most likely configuration having 16 clusters, although other 16-cluster configurations had low probabilities. Overall, "there is no clear evidence that K=6 was the best estimate" according to geneticist Deborah Bolnick (2008:76-77).[7] The number of genetic clusters used in the study was arbitrarily chosen. Although the original research used different numbers of clusters, the published study emphasized six genetic clusters. The number of genetic clusters is determined by the user of the computer software conducting the study. Rosenberg later revealed that his team used pre-conceived numbers of genetic clusters from six to twenty "but did not publish those results because Structure [the computer program used] identified multiple ways to divide the sampled individuals". Dorothy Roberts, a law professor, asserts that "there is nothing in the team's findings that suggests that six clusters represent human population structure better than ten, or fifteen, or twenty."[8] When instructed to find two clusters, the program identified two populations anchored by Africa and by the Americas. In the case of six clusters, the entirety of the Kalash people, an ethnic group living in Northern Pakistan, was added to the previous five.[1][9]

Commenting on Rosenberg's study, law professor Dorothy Roberts wrote that "the study actually showed that there are many ways to slice the expansive range of human genetic variation."


Genetic clustering studies, and particularly the five-cluster result published by Rosenberg's team in 2002, have been interpreted by journalist Nicholas Wade, evolutionary biologist Armand Marie Leroi, and others as demonstrating the biological reality of race.[10][11][12] For Leroi, "Race is merely a shorthand that enables us to speak sensibly, though with no great precision, about genetic rather than cultural or political differences." He states that, "One could sort the world's population into 10, 100, perhaps 1,000 groups", and describes Europeans, Basques, Andaman Islanders, Ibos, and Castilians each as a "race".[12] In response to Leroi's claims, the Social Science Research Council convened a panel of experts to discuss race and genomics online.[13] In their 2002 and 2005 papers, Rosenberg and colleagues disagree that their data implies the biological reality of race.[14][15]


Genetic cluster studies

Gene clusters from Rosenberg (2006) for K=7 clusters. (Cluster analysis divides a dataset into any prespecified number of clusters.) Individuals have genes from multiple clusters. The cluster prevalent only among the Kalash people (yellow) splits off only at K=7 and greater.

Genetic structure studies are carried out using statistical computer programs designed to find clusters of genetically similar individuals within a sample of individuals. Studies such as those by Risch and Rosenberg use a computer program called STRUCTURE to find human populations (gene clusters). It is a statistical program that works by placing individuals into one of an arbitrary number of clusters based on their overall genetic similarity; many possible pairs of clusters are tested per individual to generate multiple clusters.[16] The basis for these computations is data describing a large number of single nucleotide polymorphisms (SNPs), genetic insertions and deletions (indels), and microsatellite markers (or short tandem repeats, STRs) as they appear in each sampled individual. Cluster analysis divides a dataset into any prespecified number of clusters.

These clusters are based on multiple genetic markers that are often shared between different human populations even over large geographic ranges. The notion of a genetic cluster is that people within the cluster have, on average, allele frequencies more similar to each other than to those of people in other clusters.

  1. ^ a b c Roberts, Dorothy (2011). Fatal Invention. London, New York: The New Press.
  2. ^ Noah A. Rosenberg; Saurabh Mahajan; Sohini Ramachandran; Chengfeng Zhao; Jonathan K. Pritchard; Marcus Feldman (2005). "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure". PLOS Genetics. 1 (6): 660, 668. doi:10.1371/journal.pgen.0010070. PMC 1310579. PMID 16355252.
  3. ^ Wade, Nicholas (2015-04-28). A Troublesome Inheritance: Genes, Race and Human History. Penguin. ISBN 978-0-14-312716-1.
  4. ^ Raff, Jennifer (1 July 2014). "Nicholas Wade and Race: Building a Scientific Façade". Human Biology. 86 (3): 227–232. doi:10.13110/humanbiology.86.3.0227. ISSN 0018-7143.
  5. ^ "Back with a Vengeance: the Reemergence of a Biological Conceptualization of Race in Research on Race/Ethnic Disparities in Health Reanne Frank". Archived from the original on 1 December 2008. Retrieved 3 May 2011.
  6. ^ Social Science Research Council. "Is Race "Real"?".
  7. ^ Rosenberg, Noah A.; Pritchard, Jonathan K.; Weber, James L.; Cann, Howard M.; Kidd, Kenneth K.; Zhivotovsky, Lev A.; Feldman, Marcus W. (2002-12-20). "Genetic Structure of Human Populations". Science. 298 (5602): 2381–2385. Bibcode:2002Sci...298.2381R. doi:10.1126/science.1078311. ISSN 0036-8075. PMID 12493913.
  8. ^ Rosenberg, NA; Mahajan, S; Ramachandran, S; Zhao, C; Pritchard, JK; et al. (2005). "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure". PLoS Genet. 1 (6): e70. doi:10.1371/journal.pgen.0010070. PMC 1310579. PMID 16355252.
  9. ^ Witherspoon, D.J.; Wooding, S.; Rogers, A.R.; Marchani, E.E.; Watkins, W.S.; Batzer, M.A.; Jorde, L.B. (2007). "Genetic Similarities Within and Between Human Populations". Genetics. 176 (1): 351–359. doi:10.1534/genetics.106.067355. PMC 1893020. PMID 17339205.
  10. ^ Raff, Jennifer (1 July 2014). "Nicholas Wade and Race: Building a Scientific Façade". Human Biology. 86 (3): 227–232. doi:10.13110/humanbiology.86.3.0227. ISSN 0018-7143.
  11. ^ Leroi, Armand Marie (14 March 2005). "A Family Tree in Every Gene". New York Times. Retrieved 26 June 2016.
  12. ^ a b Social Science Research Council. "Is Race "Real"?".
  13. ^ Rosenberg, Noah A.; Pritchard, Jonathan K.; Weber, James L.; Cann, Howard M.; Kidd, Kenneth K.; Zhivotovsky, Lev A.; Feldman, Marcus W. (2002-12-20). "Genetic Structure of Human Populations". Science. 298 (5602): 2381–2385. Bibcode:2002Sci...298.2381R. doi:10.1126/science.1078311. ISSN 0036-8075. PMID 12493913.
  14. ^ Rosenberg, NA; Mahajan, S; Ramachandran, S; Zhao, C; Pritchard, JK; et al. (2005). "Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure". PLoS Genet. 1 (6): e70. doi:10.1371/journal.pgen.0010070. PMC 1310579. PMID 16355252.
  15. ^ Witherspoon, D.J.; Wooding, S.; Rogers, A.R.; Marchani, E.E.; Watkins, W.S.; Batzer, M.A.; Jorde, L.B. (2007). "Genetic Similarities Within and Between Human Populations". Genetics. 176 (1): 351–359. doi:10.1534/genetics.106.067355. PMC 1893020. PMID 17339205.
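  16. ^ Witherspoon, D.J.; Wooding, S.; Rogers, A.R.; Marchani, E.E.; Watkins, W.S.; Batzer, M.A.; Jorde, L.B. (2007). "Genetic Similarities Within and Between Human Populations". Genetics. 176 (1): 351–359. doi:10.1534/genetics.106.067355. PMC 1893020. PMID 17339205.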