Abstract
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly involving some errors referred to as “gaps” and “offshoots”. We develop a method to globally maximize the likelihood of a scenario involving gains and losses of the general concept manifested in a fuzzy cluster of leaf nodes of the taxonomy. Probabilities of the gain and loss events are derived from multiple runs of our earlier method of maximum parsimony starting with randomly generated values for the two parameters involved. Supplemented with fuzzy c-means clustering, this allows us to obtain meaningful generalizations for six fuzzy thematic clusters of Data Science topics using over 17000 abstracts from 17 research journals published by Springer.
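To make the "gaps" and "offshoots" terminology concrete, here is a minimal Python sketch under simplifying assumptions. It is not the authors' algorithm and involves no likelihood maximization. It assumes a leaf-level reading: a gap is a leaf covered by the candidate head subject but absent from the query set, and an offshoot is a query-set leaf lying outside the head subject's subtree. The toy taxonomy, the membership values, and the function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy taxonomy (parent -> children); not taken from the paper.
TAXONOMY: Dict[str, List[str]] = {
    "root": ["A", "B"],
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
}


def leaves_under(node: str, tree: Dict[str, List[str]]) -> Set[str]:
    """Return the set of leaves in the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}  # a node without children is itself a leaf
    result: Set[str] = set()
    for child in children:
        result |= leaves_under(child, tree)
    return result


def gaps_and_offshoots(
    head: str,
    membership: Dict[str, float],
    tree: Dict[str, List[str]],
    threshold: float = 0.0,
) -> Tuple[Set[str], Set[str]]:
    """Leaf-level gaps and offshoots for a candidate head subject.

    Assumption: a leaf belongs to the query set when its fuzzy
    membership exceeds `threshold`.
    """
    query = {leaf for leaf, u in membership.items() if u > threshold}
    covered = leaves_under(head, tree)
    gaps = covered - query        # covered by the head, but not in the query set
    offshoots = query - covered   # in the query set, but not covered by the head
    return gaps, offshoots


# Illustrative fuzzy memberships over leaves (made-up values).
membership = {"A1": 0.8, "A2": 0.5, "B1": 0.2}
gaps, offshoots = gaps_and_offshoots("A", membership, TAXONOMY)
print(sorted(gaps), sorted(offshoots))
```

Choosing `"root"` as the head subject instead removes every offshoot but adds more gaps, which illustrates the trade-off that a "tight" cover has to balance.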
Acknowledgements
D.F. and B.M. gratefully acknowledge support from the Basic Research Program of the National Research University Higher School of Economics. S.N. acknowledges the support from NOVA LINCS (UIDB/04516/2020).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hayrapetyan, Z., Nascimento, S., Fenner, T., Frolov, D., Mirkin, B. (2022). Modeling Generalization in Domain Taxonomies Using a Maximum Likelihood Criterion. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 469. Springer, Cham. https://doi.org/10.1007/978-3-031-04819-7_15
DOI: https://doi.org/10.1007/978-3-031-04819-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04818-0
Online ISBN: 978-3-031-04819-7
eBook Packages: Intelligent Technologies and Robotics (R0)