Abstract
Using a relational database to manage SNP datasets has significant advantages over alternative methods, including the ability to have the database itself validate data as it is loaded and to use the SQL query language to export data. SNPpy is a Python program that uses the PostgreSQL database and the SQLAlchemy Python library to automate SNP data management. This chapter shows how to use SNPpy to store and manage large datasets.
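The two advantages named above can be illustrated with a small sketch: database constraints rejecting bad rows at load time, and a single SQL query driving an export. This is not SNPpy's code or schema. It uses Python's built-in `sqlite3` module in place of PostgreSQL and SQLAlchemy so that it runs without a database server. The two-table schema (`snp`, `genotype`), the `gt` column, the `AA`/`AB`/`BB`/`00` genotype coding, and the helper names `build_db`, `try_insert` and `export_csv` are all hypothetical simplifications.

```python
import csv
import io
import sqlite3

# Hypothetical, simplified schema for illustration only; NOT SNPpy's schema.
SCHEMA = """
CREATE TABLE snp (
    snp_id     TEXT PRIMARY KEY,
    chromosome TEXT NOT NULL,
    position   INTEGER NOT NULL CHECK (position > 0)
);
CREATE TABLE genotype (
    sample_id TEXT NOT NULL,
    snp_id    TEXT NOT NULL REFERENCES snp (snp_id),
    -- Assumed coding: AA/AB/BB for calls, 00 for a missing call.
    gt        TEXT NOT NULL CHECK (gt IN ('AA', 'AB', 'BB', '00')),
    PRIMARY KEY (sample_id, snp_id)
);
"""


def build_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def export_csv(conn):
    """Export all genotypes, joined with SNP positions, as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample_id", "snp_id", "chromosome", "position", "gt"])
    rows = conn.execute(
        """SELECT g.sample_id, g.snp_id, s.chromosome, s.position, g.gt
           FROM genotype AS g JOIN snp AS s ON g.snp_id = s.snp_id
           ORDER BY g.sample_id, s.chromosome, s.position"""
    )
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    db = build_db()
    try_insert(db, "INSERT INTO snp VALUES (?, ?, ?)", ("rs123", "1", 1000))
    try_insert(db, "INSERT INTO genotype VALUES (?, ?, ?)", ("S1", "rs123", "AB"))
    # Rejected: 'XY' is not an allowed genotype code.
    print(try_insert(db, "INSERT INTO genotype VALUES (?, ?, ?)", ("S2", "rs123", "XY")))
    # Rejected: rs999 is not in the snp table.
    print(try_insert(db, "INSERT INTO genotype VALUES (?, ?, ?)", ("S2", "rs999", "AA")))
    print(export_csv(db), end="")
```

Because the checks live in the schema, every loading path is validated the same way. In PostgreSQL the same `CHECK`, `REFERENCES` and `PRIMARY KEY` clauses apply, and foreign keys are enforced by default without any pragma.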
References
1. Purcell S et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575
2. Aulchenko YS et al (2007) GenABEL: an R library for genome-wide association analysis. Bioinformatics 23(10):1294–1296
3. Mitha F, Herodotou H, Borisov N, Jiang C, Yoder J, Owzar K (2011) SNPpy - Database Management for SNP Data from Genome Wide Association Studies. PLoS ONE 6(10):e24982. DOI 10.1371/journal.pone.0024982, URL http://dx.doi.org/10.1371%2Fjournal.pone.0024982
Acknowledgements
This article is adapted from [3]. The author wishes to thank his coauthors on the original SNPpy project paper, namely Herodotos Herodotou, Nedyalko Borisov, Chen Jiang, Josh Yoder, and Kouros Owzar. The author also wishes to thank the PostgreSQL community, specifically the denizens of the postgresql-general and postgresql-performance mailing lists and particularly of the Freenode IRC channel #postgresql. The people who contributed advice and suggestions are too numerous to list, but specific mention goes to Andrew Gierth (RhodiumToad), #postgresql’s resident expert on everything PostgreSQL-related, Erikjan Rijkers (breinbaas), Jeff Trout (threshar), David Fetter (davidfetter), Jon T Erdman (StuckMojo), David Blewett (BlueAidan), depesz, Casey Allen Shobe (Raptelan), Chua Khee Chin (merlin83), Marko Tiikkaja (johto), Robert Haas, and Robert Schnabel. The author also wishes to thank Michael Bayer, the author of SQLAlchemy, who has been extremely helpful in answering numerous queries on the SQLAlchemy user mailing list. Finally, the author wishes to thank the members of the StackExchange question-and-answer sites, especially stackoverflow.com, unix.stackexchange.com, and tex.stackexchange.com, for answering many questions in connection with this project; the members of tex.stackexchange.com were particularly helpful with LaTeX issues.
Copyright information
© 2013 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Mitha, F. (2013). Managing Large SNP Datasets with SNPpy. In: Gondro, C., van der Werf, J., Hayes, B. (eds) Genome-Wide Association Studies and Genomic Prediction. Methods in Molecular Biology, vol 1019. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-447-0_4
DOI: https://doi.org/10.1007/978-1-62703-447-0_4
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-446-3
Online ISBN: 978-1-62703-447-0
eBook Packages: Springer Protocols