Improving Information-Carrying Data Capacity in Text Mining

Gibert, Marcin

doi:10.1007/978-3-319-24306-1_63

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9330)

Abstract

This article shows how the choice of textual data representation affects the quality of text mining. To this end, the information-carrying capacity of data is formalized, and a procedure is described for comparing the information-carrying capacity of data with different structures. A method is also presented for preparing the γ-gram representation of a text using machine learning methods and an ontology created by a domain expert. The method integrates expert knowledge with automatic methods to extend traditional text-mining technology, which cannot understand text semantics. A representation built in this way can improve the quality of text mining, as shown in the test research.
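
The general idea of combining expert knowledge with an automatically built text representation can be illustrated with a minimal sketch. The abstract does not define the γ-gram construction, so the code below is not the chapter's method: it is an assumed stand-in that maps tokens to concepts from a small, hypothetical expert-made dictionary (`ONTOLOGY`) and then counts ordinary word n-grams over the result. All names (`ONTOLOGY`, `to_concepts`, `ngrams`, `represent`) are illustrative assumptions.

```python
from collections import Counter

# Hypothetical expert-made mapping from surface terms to ontology concepts.
# A real ontology (e.g. in OWL) would be far richer; this is only a stand-in.
ONTOLOGY = {
    "car": "Vehicle",
    "truck": "Vehicle",
    "engine": "Component",
    "motor": "Component",
}

def to_concepts(text, ontology):
    """Lowercase, split on whitespace, and replace known terms with their concept."""
    tokens = text.lower().split()
    return [ontology.get(tok, tok) for tok in tokens]

def ngrams(tokens, n):
    """Return the list of contiguous n-token tuples (empty if too few tokens)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def represent(text, ontology, n=2):
    """Count n-grams over the concept-normalised token sequence."""
    return Counter(ngrams(to_concepts(text, ontology), n))

a = represent("car engine failed", ONTOLOGY)
b = represent("truck motor failed", ONTOLOGY)
```

In this sketch the two example sentences share no surface bigrams, yet `a` and `b` hold the same concept-level features, which is the kind of semantic generalisation a purely token-based representation cannot provide.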

Author information

Authors and Affiliations

Department of Information Systems Engineering, Faculty of Computer Science, West Pomeranian University of Technology in Szczecin, Szczecin, Poland
Marcin Gibert

Corresponding author

Correspondence to Marcin Gibert.

Editor information

Editors and Affiliations

Universidad Complutense de Madrid, Madrid, Spain
Manuel Núñez
Wroclaw University of Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Computer Science Department, Universidad Autónoma De Madrid, Madrid, Spain
David Camacho
Wroclaw University of Technology, Wroclaw, Poland
Bogdan Trawiński

Cite this paper

Gibert, M. (2015). Improving Information-Carrying Data Capacity in Text Mining. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_63

DOI: https://doi.org/10.1007/978-3-319-24306-1_63
Published: 24 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24305-4
Online ISBN: 978-3-319-24306-1
eBook Packages: Computer Science, Computer Science (R0)
