How to Quantitatively Compare Data Dissimilarities for Unsupervised Machine Learning?

Mokbel, Bassam; Gross, Sebastian; Lux, Markus; Pinkwart, Niels; Hammer, Barbara

doi:10.1007/978-3-642-33212-8_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7477)

Included in the following conference series:

IAPR Workshop on Artificial Neural Networks in Pattern Recognition

Abstract

For complex data sets, the pairwise similarity or dissimilarity of data often serves as the interface between the application scenario and the machine learning tool. Hence, the final result of training is strongly influenced by the choice of the dissimilarity measure. While dissimilarity measures for supervised settings can ultimately be compared by the classification error, the situation is less clear in unsupervised domains, where a clear objective is lacking. This raises the question of how to compare dissimilarity measures and their influence on the final result in such cases. In this contribution, we propose to use a recent quantitative measure, introduced in the context of unsupervised dimensionality reduction, to compare whether and on which scale dissimilarities coincide for an unsupervised learning task. Essentially, the measure evaluates to what extent neighborhood relations are preserved, based on rankings rather than raw dissimilarity values, which makes the measure robust against scaling of the data. Apart from a global comparison, local versions make it possible to highlight regions of the data where two dissimilarity measures induce the same results.
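
To make the idea of a rank-based comparison concrete, the following is a minimal sketch and not the authors' implementation. It assumes the comparison can be illustrated by the overlap of K-nearest-neighbor sets under two dissimilarity matrices; the function names (`neighbor_ranks`, `local_agreement`, `global_agreement`) and the exact form of the score are illustrative assumptions, since the abstract does not state the formula.

```python
import numpy as np


def neighbor_ranks(D):
    """Return ranks[i, j]: position of point j in the neighbor ordering of point i.

    Each point is forced to be its own rank-0 neighbor, so true neighbors
    have ranks 1 .. n-1. Ties are broken by index order (stable sort).
    """
    D = np.asarray(D, dtype=float).copy()
    np.fill_diagonal(D, -np.inf)
    order = np.argsort(D, axis=1, kind="stable")
    # Inverting the sort permutation row by row turns an ordering into ranks.
    return np.argsort(order, axis=1, kind="stable")


def local_agreement(D1, D2, k):
    """Per-point fraction of the k nearest neighbors shared by D1 and D2 (assumed score)."""
    r1, r2 = neighbor_ranks(D1), neighbor_ranks(D2)
    shared = (r1 >= 1) & (r1 <= k) & (r2 >= 1) & (r2 <= k)
    return shared.sum(axis=1) / k


def global_agreement(D1, D2, k):
    """Average of the local agreement over all points, for neighborhood size k."""
    return float(local_agreement(D1, D2, k).mean())


if __name__ == "__main__":
    x = np.array([0.0, 1.0, 3.0, 7.0, 15.0])
    D = np.abs(x[:, None] - x[None, :])
    # Squaring is a monotone transform: rankings, and hence the score, are unchanged.
    print(global_agreement(D, D ** 2, k=2))
    # A different arrangement of the same points changes the neighborhoods.
    y = x[::-1]
    D_other = np.abs(y[:, None] - y[None, :])
    print(local_agreement(D, D_other, k=1))
```

Because only rankings enter the score, any strictly increasing rescaling of one dissimilarity leaves the result unchanged. Varying `k` probes agreement at different neighborhood scales, and the per-point values from `local_agreement` indicate where in the data the two measures agree.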

Author information

Authors and Affiliations

CITEC Centre of Excellence, Bielefeld University, Germany
Bassam Mokbel, Markus Lux & Barbara Hammer
Computer Science Institute, Clausthal University of Technology, Germany
Sebastian Gross & Niels Pinkwart

Editor information

Editors and Affiliations

Fondazione Bruno Kessler (FBK), 38123, Trento, Italy
Nadia Mana
Institute of Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Friedhelm Schwenker
Dipartimento di Ingegneria dell’Informazione, Università di Siena, 53100, Siena, Italy
Edmondo Trentin

About this paper

Cite this paper

Mokbel, B., Gross, S., Lux, M., Pinkwart, N., Hammer, B. (2012). How to Quantitatively Compare Data Dissimilarities for Unsupervised Machine Learning?. In: Mana, N., Schwenker, F., Trentin, E. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2012. Lecture Notes in Computer Science, vol 7477. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33212-8_1

DOI: https://doi.org/10.1007/978-3-642-33212-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33211-1
Online ISBN: 978-3-642-33212-8
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition
