Abstract
Many algorithms have been proposed to analyze population structure from the single nucleotide polymorphism (SNP) genotyping data of a set of individuals and to assign those individuals to genetically similar groups. These algorithms fall into two computational paradigms: parametric and non-parametric approaches. Although the parametric approach is the gold standard for population structure analysis, the computational burden of running these algorithms is unacceptable for large, complex datasets. As genotyping platforms incorporate more SNPs, analyzing ever larger and more complex datasets is becoming standard practice. Hence, computationally efficient non-parametric methods are needed to reveal the population structure in genotypic datasets. In this study, we evaluated two leading non-parametric population structure analysis techniques, namely ipPCA and AWclust, on their ability to characterize the genetic diversity and population structure of two complex SNP genotype datasets (with as many as 243,855 SNPs). The head-to-head comparisons covered two major aspects: the ability to infer the number of genetically related subpopulations (K) and the ability to correctly assign individuals to these subpopulations. The experimental results suggest that AWclust could be more suitable when applied to a small and less complex dataset. However, with a large and more complex dataset, ipPCA is a much better choice, yielding higher accuracy in assigning genetically similar individuals to the inferred groups.
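To make the second evaluation criterion concrete, the following is a minimal sketch of how assignment accuracy might be scored on a toy problem. It is not ipPCA or AWclust: the simulated genotypes, the function names, and the rule of splitting individuals on the sign of the first principal component are all illustrative assumptions, and the accuracy measure only handles the two-group case.

```python
import numpy as np


def simulate_genotypes(n_per_group=30, n_snps=500, shift=0.25, seed=0):
    """Toy data (assumption): two populations whose allele frequencies
    differ by `shift` at every SNP. Genotypes are coded 0/1/2."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.3, 0.7, size=n_snps)
    direction = rng.choice([-1.0, 1.0], size=n_snps)
    freq_b = freq_a + direction * shift  # stays within [0.05, 0.95]
    geno_a = rng.binomial(2, freq_a, size=(n_per_group, n_snps))
    geno_b = rng.binomial(2, freq_b, size=(n_per_group, n_snps))
    genotypes = np.vstack([geno_a, geno_b]).astype(float)
    truth = np.repeat([0, 1], n_per_group)
    return genotypes, truth


def split_on_first_pc(genotypes):
    """Assign each individual to one of two groups by the sign of its
    score on the first principal component (illustrative rule only)."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1_scores = u[:, 0] * s[0]
    return (pc1_scores > 0).astype(int)


def two_group_accuracy(predicted, truth):
    """Fraction of correctly assigned individuals, ignoring which of the
    two inferred groups is called 0 and which is called 1."""
    agreement = float(np.mean(predicted == truth))
    return max(agreement, 1.0 - agreement)


if __name__ == "__main__":
    genotypes, truth = simulate_genotypes()
    predicted = split_on_first_pc(genotypes)
    print("assignment accuracy:", two_group_accuracy(predicted, truth))
```

With more than two subpopulations, the same idea would need a proper matching between inferred and reference group labels before counting agreements.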
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Deejai, P., Assawamakin, A., Wangkumhang, P., Poomputsa, K., Tongsima, S. (2010). On Assigning Individuals from Cryptic Population Structures to Optimal Predicted Subpopulations: An Empirical Evaluation of Non-parametric Population Structure Analysis Techniques. In: Chan, J.H., Ong, YS., Cho, SB. (eds) Computational Systems-Biology and Bioinformatics. CSBio 2010. Communications in Computer and Information Science, vol 115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16750-8_6
DOI: https://doi.org/10.1007/978-3-642-16750-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16749-2
Online ISBN: 978-3-642-16750-8
eBook Packages: Computer Science, Computer Science (R0)