Abstract
This article shows how the choice of textual data representation affects text mining quality. To this end, the information-carrying capacity of data is formalized, and a procedure for comparing the information-carrying capacity of data with different structures is described. A method of preparing the γ-gram representation of a text, involving machine learning methods and an ontology created by a domain expert, is also presented. The method integrates expert knowledge with automatic methods to extend traditional text-mining technology, which cannot understand text semantics. A representation built in this way can improve the quality of text mining, as shown in the test research.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gibert, M. (2015). Improving Information-Carrying Data Capacity in Text Mining. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_63
DOI: https://doi.org/10.1007/978-3-319-24306-1_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24305-4
Online ISBN: 978-3-319-24306-1
eBook Packages: Computer Science (R0)