Identifying SNPs without a Reference Genome by Comparing Raw Reads

Peterlongo, Pierre; Schnel, Nicolas; Pisanti, Nadia; Sagot, Marie-France; Lacroix, Vincent

doi:10.1007/978-3-642-16321-0_14

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6393)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

Next-generation sequencing (NGS) technologies are being applied to many fields of biology, notably to survey polymorphism across individuals of a species. However, while single nucleotide polymorphisms (SNPs) are almost routinely identified in model organisms, detecting SNPs in non-model species remains very challenging because almost all methods rely on a reference genome. We address here the problem of identifying SNPs without a reference genome. To do so, we propose an approach that compares two sets of raw reads. We show that a SNP corresponds to a recognisable pattern in the de Bruijn graph built from the reads, and we propose algorithms to identify these patterns, which we call mouths. We outline the potential of our method on real data. The method is tailored to short reads (typically Illumina) and works well even at low coverage, where it reports few but highly confident SNPs. Our program, called KisSnp, can be downloaded here: http://alcovna.genouest.org/kissnp/ .
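To illustrate why a SNP leaves a recognisable pattern in a de Bruijn graph, the following minimal Python sketch builds the graph from a pooled set of reads and looks for two paths of exactly k nodes that leave a common k-mer and rejoin at a common k-mer. It is not the KisSnp algorithm. The function names (`build_graph`, `walk`, `spell`, `find_snp_bubbles`) are hypothetical. The sketch assumes error-free reads on a single strand (no reverse complements) and pools both read sets into one graph instead of tracking which set each k-mer comes from.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in a read."""
    succ = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            succ[read[i:i + k]].add(read[i + 1:i + k + 1])
    return succ


def walk(succ, start, steps):
    """Follow unique successors for `steps` edges.

    Returns the list of visited k-mers, or None if the path branches
    or stops before `steps` edges have been followed.
    """
    path = [start]
    for _ in range(steps):
        nxt = succ.get(path[-1], ())
        if len(nxt) != 1:
            return None
        path.append(next(iter(nxt)))
    return path


def spell(path):
    """Spell the sequence read along a path of overlapping k-mers."""
    return path[0] + "".join(node[-1] for node in path[1:])


def find_snp_bubbles(reads, k):
    """Return pairs of (2k-1)-mers that differ only at their middle base.

    A single substitution changes exactly k consecutive k-mers, so the two
    alleles form two paths of k nodes between a shared branching k-mer and
    a shared merging k-mer.
    """
    succ = build_graph(reads, k)
    bubbles = []
    for branch in sorted(succ):
        for a, b in combinations(sorted(succ[branch]), 2):
            # k edges from the first variant k-mer reach the merging k-mer.
            path_a = walk(succ, a, k)
            path_b = walk(succ, b, k)
            if path_a and path_b and path_a[-1] == path_b[-1]:
                # Drop the shared merging k-mer before spelling each allele.
                bubbles.append((spell(path_a[:k]), spell(path_b[:k])))
    return bubbles


# Two toy "individuals" that differ by one base (A vs T at index 7).
reads = ["ACGTTGCATGGACCTA", "ACGTTGCTTGGACCTA"]
print(find_snp_bubbles(reads, 4))  # [('TGCATGG', 'TGCTTGG')]
```

With real data, sequencing errors and repeats create similar-looking branches, so a practical tool would also need coverage thresholds and reverse-complement handling, which this sketch deliberately leaves out.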

Author information

Authors and Affiliations

INRIA Rennes - Bretagne Atlantique, EPI Symbiose, Rennes, France
Pierre Peterlongo & Nicolas Schnel
Dipartimento di Informatica, Università di Pisa, Italy
Nadia Pisanti
INRIA Rhône-Alpes, 38330 Montbonnot Saint-Martin, France and Université de Lyon, F-69000 Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Evolutive, F-69622, Villeurbanne, France
Marie-France Sagot & Vincent Lacroix

Editor information

Editors and Affiliations

School of Physics and Mathematics, Edificio "B", Universidad Michoacana, Ciudad Universitaria, 5800, Morelia, Mich., Mexico
Edgar Chavez
Dept. of Computer Science and Engineering, University of California, 92521, Riverside, CA, USA
Stefano Lonardi

About this paper

Cite this paper

Peterlongo, P., Schnel, N., Pisanti, N., Sagot, MF., Lacroix, V. (2010). Identifying SNPs without a Reference Genome by Comparing Raw Reads. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_14

DOI: https://doi.org/10.1007/978-3-642-16321-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16320-3
Online ISBN: 978-3-642-16321-0
eBook Packages: Computer Science; Computer Science (R0)
