Abstract
Representation learning computes vectorized representations of entities or relationships, and it is one of the most basic and essential natural language processing tasks. Current techniques for modeling computer domain knowledge have two flaws: (1) they neglect fine-grained knowledge hierarchies, and (2) they lack a unified reference standard for modeling domain information. The fine-grained knowledge hierarchy comprises knowledge domains, units, and topics. We use the Computer Science Guidelines as the standard for knowledge annotation and topic mapping of an unstructured, unlabeled corpus in the computer domain, and we organize the corpus into a computer domain knowledge system with a three-level hierarchy. We propose a knowledge representation method that incorporates both contextual semantic information and topic information; the method can be applied to discover connections between knowledge entities of different granularity. We compare it with several existing text representation methods. Experimental results on extracting knowledge representations in the computer domain show that combining contextual semantic information with topic information is more effective than using either alone.
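As a concrete illustration of such a fused representation, the minimal Python sketch below concatenates a contextual sentence embedding with a per-document LDA topic distribution. The model choices (bert-base-uncased, 10 LDA topics, mean pooling) and the concatenation-based fusion are illustrative assumptions for exposition, not the paper's exact configuration.

# Minimal sketch: fuse contextual semantic vectors with topic vectors by
# concatenation. Model and fusion choices here are assumptions, not the
# paper's exact method.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "A binary search tree supports logarithmic lookup.",
    "TCP provides reliable, ordered byte-stream delivery.",
    "Process scheduling allocates CPU time among tasks.",
]

# Contextual semantic vectors: mean-pooled BERT hidden states.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    enc = tok(corpus, padding=True, truncation=True, return_tensors="pt")
    hidden = bert(**enc).last_hidden_state             # (n, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)         # mask out padding
    ctx = (hidden * mask).sum(1) / mask.sum(1)         # (n, 768)

# Topic vectors: per-document topic distributions from LDA over word counts.
counts = CountVectorizer(stop_words="english").fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=10, random_state=0)
topics = lda.fit_transform(counts)                     # (n, 10)

# Fused representation: both views of each document, side by side.
fused = np.concatenate([ctx.numpy(), topics], axis=1)  # (n, 778)
print(fused.shape)

The fused vectors can then feed any downstream similarity or clustering step; concatenation is only the simplest of several plausible fusion strategies (weighted sums or gated combinations are alternatives).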
Acknowledgment
The work described in this paper was partially supported by grants from the Guangzhou Education Scientific Research Project [No. 1201730714] and the Guangdong Basic and Applied Basic Research Foundation [No. 2022A1515011697].
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, L., Zhong, Q., Zhang, S. (2023). A Data-Based Approach for Computer Domain Knowledge Representation. In: Xiong, N., Li, M., Li, K., Xiao, Z., Liao, L., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 153. Springer, Cham. https://doi.org/10.1007/978-3-031-20738-9_93
DOI: https://doi.org/10.1007/978-3-031-20738-9_93
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20737-2
Online ISBN: 978-3-031-20738-9
eBook Packages: Intelligent Technologies and Robotics (R0)