Abstract
This paper presents a relatively rare case of an optimization problem in data analysis to admit a globally optimal solution by a recursive algorithm. We are concerned with finding a most specific generalization of a fuzzy set of topics assigned to leaves of domain taxonomy represented by a rooted tree. The idea is to “lift” the set to its “head subject” in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors, either “gaps” or “offshoots” or both. Our method globally minimizes a penalty function combining the numbers of head subjects and gaps and offshoots, differently weighted. We apply this to a collection of 17645 research papers on Data Science published in 17 Springer journals for the past 20 years. We extract a taxonomy of Data Science (TDS) from the international Association for Computing Machinery Computing Classification System 2012. We find fuzzy clusters of leaf topics over the text collection, optimally lift them to head subjects in TDS, and comment on the tendencies of current research following from the lifting results.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
The 2012 ACM Computing Classification System. http://www.acm.org/about/class/2012. Accessed 30 Apr 2018
Blei, D.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)
Chernyak, E.: An approach to the problem of annotation of research publications. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 429–434. ACM (2015)
Frolov, D., Mirkin, B., Nascimento, S., Fenner, T.: Finding an appropriate generalization for a fuzzy thematic set in taxonomy. Working paper WP7/2018/04, Moscow, Higher School of Economics Publ. House, 58 p. (2018)
Lloret, E., Boldrini, E., Vodolazova, T., MartÃnez-Barco, P., Munoz, R., Palomar, M.: A novel concept-level approach for ultra-concise opinion summarization. Expert. Syst. Appl. 42(20), 7148–7156 (2015)
Mei, J.P., Wang, Y., Chen, L., Miao, C.: Large scale document categorization with fuzzy clustering. IEEE Trans. Fuzzy Syst. 25(5), 1239–1251 (2017)
Mirkin, B., Nascimento, S.: Additive spectral method for fuzzy cluster analysis of similarity data including community structure and affinity matrices. Inf. Sci. 183(1), 16–34 (2012)
Mueller, G., Bergmann, R.: Generalization of workflows in process-oriented case-based reasoning. In: FLAIRS Conference, pp. 391–396 (2015)
Pampapathi, R., Mirkin, B., Levene, M.: A suffix tree approach to anti-spam email filtering. Mach. Learn. 65(1), 309–338 (2006)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 25(5), 513–523 (1998)
Song, Y., Liu, S., Wang, H., Wang, Z., Li, H.: Automatic taxonomy construction from keywords. US Patent No. 9,501,569. Washington, DC, US Patent and Trademark Office (2016)
Vedula, N., Nicholson, P.K., Ajwani, D., Dutta, S., Sala, A., Parthasarathy, S.: Enriching taxonomies with functional domain knowledge. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 745–754. ACM (2018)
Waitelonis, J., Exeler, C., Sack, H.: Linked data enabled generalized vector space model to improve document retrieval. In: Proceedings of NLP & DBpedia 2015 Workshop in Conjunction with 14th International Semantic Web Conference (ISWC), vol. 1486. CEUR-WS (2015)
Wang, C., He, X., Zhou, A.: A Short survey on taxonomy learning from text corpora: issues, resources and recent advances. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1190–1203 (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Frolov, D., Mirkin, B., Nascimento, S., Fenner, T. (2020). Globally Optimal Parsimoniously Lifting a Fuzzy Query Set Over a Taxonomy Tree. In: Le Thi, H., Le, H., Pham Dinh, T. (eds) Optimization of Complex Systems: Theory, Models, Algorithms and Applications. WCGO 2019. Advances in Intelligent Systems and Computing, vol 991. Springer, Cham. https://doi.org/10.1007/978-3-030-21803-4_78
Download citation
DOI: https://doi.org/10.1007/978-3-030-21803-4_78
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21802-7
Online ISBN: 978-3-030-21803-4
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)