Abstract
Genomic sequences are usually compared using evolutionary distance, a procedure that requires aligning the sequences. Aligning long sequences is time-consuming, and the resulting dissimilarity is not a metric. The normalized compression distance was recently introduced as a way to compute the distance between two generic digital objects, and it appears suitable for comparing genomic strings. In this paper, the clusterings and mappings obtained with a self-organizing map (SOM) using the traditional evolutionary distance and using the compression distance are compared, in order to understand whether the two sets of distances are similar. The first results indicate that the two distances capture different aspects of the genomic sequences and that further investigation is needed to reach a definitive result.
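As an illustration of the normalized compression distance mentioned above, the following minimal sketch uses its usual definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the length of the compressed input. The choice of zlib as compressor, the helper names, and the synthetic sequences are assumptions made for this example only; the abstract does not state which compressor or data the authors used.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic toy sequences (hypothetical, not data from the paper).
rng = random.Random(0)


def random_dna(n: int) -> bytes:
    """Random string over the alphabet A, C, G, T."""
    return "".join(rng.choices("ACGT", k=n)).encode()


seq_a = random_dna(2000)

# seq_b: a copy of seq_a with one substitution every 100 bases.
swap = bytes.maketrans(b"ACGT", b"CGTA")
mutated = bytearray(seq_a)
for pos in range(0, len(mutated), 100):
    mutated[pos] = swap[mutated[pos]]
seq_b = bytes(mutated)

# seq_c: an independent random sequence of the same length.
seq_c = random_dna(2000)

print("NCD(a, b) =", ncd(seq_a, seq_b))
print("NCD(a, c) =", ncd(seq_a, seq_c))
```

With a general-purpose compressor such as zlib, the near-identical pair should score noticeably lower than the unrelated pair. A compressor designed for DNA would likely give sharper estimates, and no alignment step is needed in either case.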
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
La Rosa, M., Rizzo, R., Urso, A., Gaglio, S. (2008). Comparison of Genomic Sequences Clustering Using Normalized Compression Distance and Evolutionary Distance. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2008. Lecture Notes in Computer Science, vol 5179. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85567-5_92
DOI: https://doi.org/10.1007/978-3-540-85567-5_92
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85566-8
Online ISBN: 978-3-540-85567-5
eBook Packages: Computer Science, Computer Science (R0)