Abstract
With the increase in internet usage, the amount of available textual data has continued to grow rapidly. In addition, more powerful computers have made processing such data much easier. The tourism field has strong potential to utilize the data available on the internet; however, a high proportion of these data is unlabelled and unprocessed. To use them effectively, new methods and approaches are needed. In this regard, Natural Language Processing (NLP) helps researchers to utilize textual data and develop an understanding of text analysis. With machine learning approaches, the potential of text mining can expand enormously, leading to deeper insights, a better understanding of social phenomena, and thus a better basis for decision-making. This chapter provides the reader with the basics of NLP and presents the text pre-processing procedure in detail.
Further Readings and Other Sources
- Quantum Stat provides a list of more than 300 NLP Colab notebooks, offering an excellent overview that describes each notebook, the language model used, and the NLP tasks it is designed for. https://notebooks.quantumstat.com/
- Ivan Bilan, the author of Chapter 19 (Entity Matching), has established "The NLP Pandect", an incredibly comprehensive and helpful collection covering almost all topics in NLP. It includes compendiums, conference papers, NLP datasets, links to podcasts, newsletters, meetups, YouTube channels, and much more. https://github.com/ivan-bilan/The-NLP-Pandect
- The University of Michigan offers a complete NLP course on YouTube https://tinyurl.com/NLP-michigan, and we also recommend checking for free courses on coursera.org, such as those from DeepLearning.AI https://tinyurl.com/deeplearningai-course or HSE University https://tinyurl.com/NLP-HSE-course.
- Finally, a great NLP course is also provided by Lena Voita, who teaches at the Yandex School of Data Analysis. https://lena-voita.github.io/nlp_course.html
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Egger, R., Gokce, E. (2022). Natural Language Processing (NLP): An Introduction. In: Egger, R. (eds) Applied Data Science in Tourism. Tourism on the Verge. Springer, Cham. https://doi.org/10.1007/978-3-030-88389-8_15
DOI: https://doi.org/10.1007/978-3-030-88389-8_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88388-1
Online ISBN: 978-3-030-88389-8
eBook Packages: Business and Management, Business and Management (R0)