Abstract
High-throughput sequencing technologies produce a large number of short reads that may contain errors. These sequencing errors constitute one of the major problems in analyzing such data. Many algorithms and software tools have been proposed to correct errors in short reads. However, the computational complexity limits their performance. In this paper, we propose a novel and efficient hybrid approach which is based on an alignment-free method combined with multiple alignments. We construct suffix arrays on all short reads to search the correct overlapping regions. For each correct overlapping region, we form multiple alignments for the substrings following the correct overlapping region to identify and correct the erroneous bases. Our approach can correct all types of errors in short reads produced by different sequencing platforms. Experiments show that our approach provides significantly higher accuracy and is comparable or even faster than previous approaches.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Mardis, E.R.: The impact of next-generation sequencing technology on genetics. Trends Genet. 24, 133–141 (2008)
Tammi, M.T., Arner, E., Kindlund, E., Andersson, B.: Correcting errors in shotgun sequences. Nucleic Acids Res. 31, 4663–4672 (2003)
Pevzner, P.A., Tang, H., Waterman, M.S.: A new approach to fragment assembly in DNA sequencing. In: RECOMB 2001, pp. 256–267 (2001)
Chaisson, M.J., Pevzner, P.A., Tang, H.: Fragment assembly with short reads. Bioinformatics 20, 2067–2074 (2004)
Chaisson, M.J., Brinza, D., Pevzner, P.A.: De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Res. 19, 336–346 (2009)
Butler, J., MacCallum, I., Kleber, M., Shlyakhter, I.A., Belmonte, M.K., Lander, E.S., Nusbaum, C., Jaffe, D.B.: ALLPATHS: De novo assembly of whole-genome shotgun microreads. Genome Res. 18, 810–820 (2008)
Yang, X., Dorman, K.S., Aluru, S.: Reptile: representative tiling for short read error correction. Bioinformatics 26, 2526–2533 (2010)
Kelley, D., Schatz, M., Salzberg, S.: Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11(11), R116 (2010)
Shi, H., Schmidt, B., Liu, W., Muller-Wittig, W.: A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware. J. Comput. Biol. 17, 603–615 (2009)
Schroder, J., Schroder, H., Puglisi, S.J., Sinha, R., Schmidt, B.: SHREC: a short-read error correction method. Bioinformatics 25, 2157–2163 (2009)
Salmela, L.: Correction of sequencing errors in a mixed set of reads. Bioinformatics 26(10), 1284–1290 (2010)
Ilie, L., Fazayeli, F., Ilie, S.: HiTEC: accurate error correction in high-throughput sequencing data. Bioinformatics 27(3), 295–302 (2011)
Manber, U., Myers, G.: Suffix arrays: a new method for on-line search. SIAM J. Comput. 22(5), 935–948 (1993)
Simon, J., Puglisi, W.F., Smyth, A.T.: A taxonomy of suffix array construction algorithms. ACM Comput. Surv. 39(2), 1–31 (2007)
Mori, Y.: Short description of improved two-stage suffix sorting algorithm, http://homepage3.nifty.com/wpage/software/itssort.txt
Kasai, T., Lee, G.H., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001)
Needleman, S.B.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48(3), 443–453 (1970)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, Z., Yin, J., Li, Y., Xiong, W., Zhan, Y. (2011). An Efficient Hybrid Approach to Correcting Errors in Short Reads. In: Torra, V., Narakawa, Y., Yin, J., Long, J. (eds) Modeling Decision for Artificial Intelligence. MDAI 2011. Lecture Notes in Computer Science(), vol 6820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22589-5_19
Download citation
DOI: https://doi.org/10.1007/978-3-642-22589-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22588-8
Online ISBN: 978-3-642-22589-5
eBook Packages: Computer ScienceComputer Science (R0)