Abstract
In the identification of living species through the analysis of their DNA sequences, the mitochondrial “cytochrome c oxidase subunit 1” (COI) gene has proved to be a good DNA barcode. Nevertheless, the quality of full-length barcode sequences often cannot be guaranteed because of DNA degradation in biological samples, so that only short sequences (mini-barcodes) are available. This paper proposes a prototype-based classification approach for the analysis of DNA barcodes that exploits a spectral representation of DNA sequences and a memory-based neural network. The neural network is a modified version of the General Regression Neural Network (GRNN), used as a classification tool. Furthermore, the relationship between the characteristics of different species and their spectral distribution is investigated. Namely, a subset of the whole spectrum of a DNA sequence, composed of very high frequency DNA k-mers, is considered, providing a robust system for the classification of barcode sequences. The proposed approach is compared with standard classification algorithms, such as the Support Vector Machine (SVM), and obtains better results, especially when applied to mini-barcode sequences.
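To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the spectral representation is a vector of normalized k-mer frequencies, that the high-frequency subset is chosen by total frequency over the training set, and that classification follows the standard GRNN idea of a Gaussian-kernel weighted vote over stored training patterns. The abstract does not specify the authors' modifications to the GRNN, so none are reproduced here. The function names, the toy sequences, and the values of `k`, `m` and `sigma` are illustrative assumptions.

```python
from collections import Counter
from itertools import product
import math


def kmer_spectrum(seq, k=3):
    """Normalized frequency of every k-mer over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def top_kmer_indices(spectra, m):
    """Indices of the m k-mers with the highest total frequency (assumed criterion)."""
    totals = [sum(col) for col in zip(*spectra)]
    return sorted(range(len(totals)), key=lambda i: -totals[i])[:m]


def grnn_classify(x, prototypes, labels, sigma=0.1):
    """Standard GRNN-style decision: Gaussian-kernel weighted vote per class."""
    scores = {}
    for p, y in zip(prototypes, labels):
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))
        scores[y] = scores.get(y, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    return max(scores, key=scores.get)


# Toy data (hypothetical): two "full-length" training sequences and one short query.
train_seqs = ["ACGTACGTACGTACGTACGT", "AAAATTTTAAAATTTTAAAA"]
train_labels = ["species_A", "species_B"]
query = "ACGTACGTAC"  # stands in for a mini-barcode

spectra = [kmer_spectrum(s) for s in train_seqs]
keep = top_kmer_indices(spectra, m=8)

# Restrict every spectrum to the selected high-frequency k-mers.
prototypes = [[s[i] for i in keep] for s in spectra]
query_vec = [kmer_spectrum(query)[i] for i in keep]

predicted = grnn_classify(query_vec, prototypes, train_labels)
print(predicted)
```

Because the network is memory-based, "training" here amounts to storing the reduced spectra as prototypes; the only tunable quantity in this sketch is the kernel width `sigma`.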
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Rizzo, R., Fiannaca, A., La Rosa, M., Urso, A. (2015). The General Regression Neural Network to Classify Barcode and mini-barcode DNA. In: DI Serio, C., Liò, P., Nonis, A., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2014. Lecture Notes in Computer Science, vol 8623. Springer, Cham. https://doi.org/10.1007/978-3-319-24462-4_13
DOI: https://doi.org/10.1007/978-3-319-24462-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24461-7
Online ISBN: 978-3-319-24462-4
eBook Packages: Computer Science, Computer Science (R0)