Abstract
Information retrieval is the task of obtaining relevant information from a large collection of databases. Preprocessing plays an important role in extracting that relevant information. This paper proposes a text preprocessing approach, Text Preprocessing for Information Retrieval (TPIR). The approach works in two steps: first, a spell-check utility is used to enhance stemming; second, synonyms of similar tokens are combined. The proposed technique is applied to a case study on the International Monetary Fund. The experimental results show the efficiency of the proposed approach in terms of complexity, time, and performance.
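The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' TPIR implementation: the vocabulary, the synonym table, the suffix rules, and the function names below are all hypothetical, and the stemmer is a deliberately naive suffix stripper rather than the algorithm used in the paper. The sketch only shows why correcting spelling before stemming, and then merging synonyms, reduces the number of distinct tokens.

```python
from difflib import get_close_matches

# Hypothetical toy resources (assumptions): the paper's actual dictionary
# and synonym source are not specified in the abstract.
VOCABULARY = ["monetary", "fund", "loans", "credit", "lending"]
SYNONYMS = {"credit": "loan", "lend": "loan"}  # stem -> canonical token

# Naive suffix rules (assumption): (suffix, replacement), tried in order.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("s", "")]


def correct_spelling(token, vocabulary=VOCABULARY, cutoff=0.8):
    """Step 1a: map a misspelled token to its closest vocabulary word."""
    if token in vocabulary:
        return token
    matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def naive_stem(token):
    """Step 1b: strip one common suffix, keeping at least three characters."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)] + replacement
    return token


def merge_synonym(stem, synonyms=SYNONYMS):
    """Step 2: replace a stem with its canonical synonym, if one is known."""
    return synonyms.get(stem, stem)


def preprocess(tokens):
    """Spell-correct, stem, then merge synonyms for each lowercased token."""
    return [merge_synonym(naive_stem(correct_spelling(t.lower()))) for t in tokens]


if __name__ == "__main__":
    print(preprocess(["monetery", "loans", "credit", "lending"]))
```

In this toy run the misspelled "monetery" is corrected before stemming, and "loans", "credit", and "lending" all collapse to the single token "loan", so four surface forms become two index terms.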
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Virmani, D., Taneja, S. (2019). A Text Preprocessing Approach for Efficacious Information Retrieval. In: Panigrahi, B., Trivedi, M., Mishra, K., Tiwari, S., Singh, P. (eds) Smart Innovations in Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 669. Springer, Singapore. https://doi.org/10.1007/978-981-10-8968-8_2
DOI: https://doi.org/10.1007/978-981-10-8968-8_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8967-1
Online ISBN: 978-981-10-8968-8
eBook Packages: Intelligent Technologies and Robotics (R0)