Abstract
Next-generation sequencing (NGS) technologies are being applied to many fields of biology, notably to survey polymorphism across individuals of a species. However, while single nucleotide polymorphisms (SNPs) are almost routinely identified in model organisms, detecting SNPs in non-model species remains very challenging because almost all methods rely on a reference genome. We address here the problem of identifying SNPs without a reference genome. To this end, we propose an approach that compares two sets of raw reads. We show that a SNP corresponds to a recognisable pattern in the de Bruijn graph built from the reads, and we propose algorithms to identify these patterns, which we call mouths. We outline the potential of our method on real data. The method is tailored to short reads (typically Illumina) and works well even when coverage is low, in which case it reports few but highly confident SNPs. Our program, called KisSnp, can be downloaded from http://alcovna.genouest.org/kissnp/.
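To make the graph pattern concrete, the following is a minimal Python sketch of how an isolated SNP shows up in a de Bruijn graph built from pooled reads: two paths of k nodes each leave a shared k-mer and rejoin at another shared k-mer. It is an illustration under simplifying assumptions, not the KisSnp algorithm or the paper's exact definition of a mouth. The function names (`shred`, `build_graph`, `follow`, `spell`, `find_bubbles`) and the toy sequences are hypothetical, and the sketch ignores reverse complements, sequencing errors, coverage, and which read set each k-mer comes from.

```python
from itertools import combinations


def shred(seq, read_len):
    """Cut a sequence into overlapping error-free 'reads' (toy read simulator)."""
    return [seq[i:i + read_len] for i in range(len(seq) - read_len + 1)]


def build_graph(reads, k):
    """Node-centric de Bruijn graph: nodes are k-mers, edges are (k-1)-overlaps."""
    nodes = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            nodes.add(read[i:i + k])
    # Successors of a node: k-mers present in the reads that extend it by one base.
    return {n: [n[1:] + b for b in "ACGT" if n[1:] + b in nodes] for n in nodes}


def follow(succ, start, steps):
    """Return a path of `steps` nodes from `start`, or None if it branches or ends."""
    path = [start]
    while len(path) < steps:
        nxt = succ[path[-1]]
        if len(nxt) != 1:
            return None
        path.append(nxt[0])
    return path


def spell(path):
    """Sequence spelled by a path of overlapping k-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


def find_bubbles(succ, k):
    """Find pairs of k-node paths that share a predecessor and a successor."""
    found = []
    for outs in succ.values():
        for a, b in combinations(sorted(outs), 2):
            pa, pb = follow(succ, a, k), follow(succ, b, k)
            if pa is None or pb is None:
                continue
            end_a, end_b = succ[pa[-1]], succ[pb[-1]]
            # Both branches must rejoin at the same single node.
            if len(end_a) == 1 and end_a == end_b:
                found.append((spell(pa), spell(pb)))
    return found


# Two toy "individuals" differing by one base (A vs T at index 9).
individual_1 = "ATCGGATTCAGCCTAGTAC"
individual_2 = "ATCGGATTCTGCCTAGTAC"
k = 5
reads = shred(individual_1, 8) + shred(individual_2, 8)
for path_1, path_2 in find_bubbles(build_graph(reads, k), k):
    # The variant base sits in the middle of each spelled path (index k - 1).
    print(path_1, path_2, path_1[k - 1], path_2[k - 1])
```

Each reported pair spells 2k - 1 bases around the candidate variant. On real data a detector would additionally need to handle both strands, filter branches caused by sequencing errors or repeats, and use the per-sample origin of the k-mers, none of which this sketch attempts.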
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peterlongo, P., Schnel, N., Pisanti, N., Sagot, MF., Lacroix, V. (2010). Identifying SNPs without a Reference Genome by Comparing Raw Reads. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_14
DOI: https://doi.org/10.1007/978-3-642-16321-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16320-3
Online ISBN: 978-3-642-16321-0
eBook Packages: Computer Science, Computer Science (R0)