Abstract
A common way of expressing string similarity in structural pattern recognition is the edit distance. It allows one to apply the kNN rule in order to classify a set of strings. However, compared to the wide range of elaborate classifiers known from statistical pattern recognition, this is only a very basic method. In the present paper we propose a method for transforming strings into n-dimensional real vector spaces based on prototype selection. This allows us to subsequently classify the transformed strings with more sophisticated classifiers, such as support vector machines and other kernel-based methods. In a number of experiments, we show that the recognition rate can be significantly improved by means of this procedure.
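The transformation described in the abstract can be illustrated with a minimal sketch. It assumes that a string is mapped to the vector of its edit distances to n prototype strings, and that the edit distance uses unit costs for insertion, deletion, and substitution. The function names `edit_distance` and `embed`, the unit costs, and the hand-picked prototypes are assumptions made for illustration; the abstract does not specify the cost function or how the prototypes are selected.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Edit distance with unit costs (assumed), computed by dynamic programming."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca by cb (or match)
            ))
        prev = curr
    return prev[-1]


def embed(s: str, prototypes: Sequence[str]) -> List[float]:
    """Map a string to an n-dimensional real vector, one coordinate per prototype."""
    return [float(edit_distance(s, p)) for p in prototypes]


if __name__ == "__main__":
    # Hypothetical prototypes, chosen by hand purely for illustration.
    prototypes = ["kitten", "sitting", "mitten"]
    for s in ["kitten", "sitten", "fitting"]:
        print(s, embed(s, prototypes))
```

The resulting vectors can be passed to any classifier that operates on real-valued feature vectors, such as a support vector machine. The dimension n equals the number of prototypes, so the choice of prototypes determines the vector space.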
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Spillmann, B., Neuhaus, M., Bunke, H., Pękalska, E., Duin, R.P.W. (2006). Transforming Strings to Vector Spaces Using Prototype Selection. In: Yeung, DY., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR /SPR 2006. Lecture Notes in Computer Science, vol 4109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11815921_31
DOI: https://doi.org/10.1007/11815921_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37236-3
Online ISBN: 978-3-540-37241-7
eBook Packages: Computer Science, Computer Science (R0)