Abstract
We previously developed a system that extracts tourist information from the web. However, the tourist information it obtained from Twitter was often insufficient. We believe this is because previous methods could not consider tweets about a tourist spot that do not contain the tourist spot name. In this study, we propose a method for extracting tourist information from tweets without tourist spot names. In our experiment, we evaluated whether tourist information was contained in three kinds of tweets: tweets posted before and after tweets containing tourist spot names, tweets of followers of the users who tweeted tourist spot names, and tweets with images that do not contain tourist spot names. The experiments provided the following three results: (1) Tweets without tourist spot names that are posted before and after tweets containing tourist spot names contain tourist information. (2) Replies to tweets containing tourist spot names contain tourist information. (3) Tweets with images that do not contain tourist spot names contain information regarding the food and entertainment available at tourist spots.
1 Introduction
According to a national tourism survey by the Japan Tourism Agency, the average number of trips taken by Japanese travelers decreased after 2006, and this decrease stopped after 2010 [1]. One factor is considered to be the appearance of new forms of travel in Japan, in which fans visit the locations of dramas and animated programs [2]. In addition, the number of foreign travelers to Japan has increased year by year since 2012, according to the Japan Tourism Agency's figures on visitor arrivals and Japanese overseas travelers [3]. Because the behavior of tourists has changed, tourist spot operators want to know the needs of tourists and any problems with their tourist spots so that those problems can be solved. In our previous study, we developed a system both to extract tourist information from the web and to visualize similarities among tourist spots. However, insufficient tourist information was obtained from Twitter [4]. We believe this is because we did not consider tweets about tourist spots that do not contain tourist spot names. In this study, we propose a tourist information extraction method that applies to tweets that do not contain tourist spot names. We evaluated whether we can extract tourist information from such tweets using tweets with position information from around the tourist spot, tweets with images of the tourist spot, tweets posted before and after tweets containing the tourist spot name, and tweets of followers of the user who tweeted the tourist spot name. In the remainder of this paper, TNCT refers to Tweets that do Not Contain Tourist spot names.
2 Related Work
Shimada et al. proposed a tourist information analysis system using tweets containing tourist spot names and words related to the tourist spots [5]. They extracted these tweets and analyzed their polarity. Ritter et al. proposed a method to extract events using unique representations of tweets and dates [6]. They extracted tweets containing unique representations that are strongly related to specific times as tweets related to the event. These studies used tweets containing tourist spot names and words related to the tourist spots; in contrast, we extracted information about tourist spots not only from tweets containing tourist spot names but also from tweets that do not contain tourist spot names.
Oku et al. developed a tourist spot recommendation system using tweets that included position information and images that included position information [7]. They estimated the active region of a tourist spot from tweets with both position information and the name of the target tourist spot, in addition to images that include position information. The active region is an area where users often tweet about the target tourist spot. They then recommended tourist spots based on the features of the tweets from within the region. Lee and Sumiya proposed a measurement method for geographical regularity using time and position information from tweets to detect social events [8]. They estimated the normal geographical regularity from the times of tweets based on position information, pose, and the action of the user, and detected irregular events in the target region. These studies extracted tourist information using position information contained in tweets, images, and tweet times. However, extraction based on position information is limited, because the number of tweets containing position information is very small. We extracted information about tourist spots not only from tweets containing tourist spot names but also from TNCTs.
3 Tourist Information Extraction Method
In this study, we evaluated whether we could extract tourist information from TNCTs. Figure 1 shows the tourist information extraction method.
Procedure 1: Collection of tweets containing tourist spot names
This procedure collects tweets containing tourist spot names from both tweets that include position information and tweets that do not [Fig. 1(A)-a]. It estimates the user location at the time of the tweet about the tourist spot using the active region estimation method suggested by Oku et al. [7]. It also collects tweets posted within the estimated region [Fig. 1(A)-b].
Procedure 2: Collection of tweets before and after tweets containing tourist spot names
This procedure collects tweets posted within 3 h before and after tweets containing tourist spot names [Fig. 1(A)-c]. We set the collection window to 3 h because we assume that the sights at a tourist spot can be seen within 3 h of arriving. In addition, people tend to tweet multiple times in succession within a few hours, rather than leaving a long gap between tweets.
Procedure 3: Collection of tweets of followers
This procedure collects the tweets of the followers of a user whose tweets contained tourist spot names [Fig. 1(A)-d]. From the followers' tweets, it collects those posted around the same time as the tweets containing tourist spot names, because the user might be traveling together with their followers.
Procedure 4: Collection of tweets with images of tourist spots
This procedure collects tweets with images of the tourist spot [Fig. 1(A)-e]. We use the Google Cloud Vision API to detect landmarks in the images and thereby identify tourist spots.
Procedure 5: Extraction of feature words
This procedure selects tweets that may include tourist information from the tweets collected in Procedures 1 to 4 [Fig. 1(B)]. It then extracts the feature words of the collected tweets containing tourist information [Fig. 1(C)]. In this study, tweets containing tourist information are defined as tweets containing opinions, impressions, complaints, etc. regarding the tourist spots. The procedure splits the tweets into words by applying a morphological analysis system and uses tf-idf to extract feature words. During extraction, we restricted the parts of speech to nouns, adjectives, and verbs, because words of these types may describe the features of tourist spots.
Procedure 6: Collection of tweets containing feature words
This procedure collects tweets containing the feature words extracted in Procedure 5 from among tweets without position information [Fig. 1(D)]. We target TNCTs containing feature words in this collection.
Procedure 7: Re-collection of tourist information
This procedure re-collects tourist information from TNCTs by repeating Procedure 5 and Procedure 6.
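The core of Procedures 2, 5, and 6 can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the `top_k` parameter, and the lowercase word tokenizer are hypothetical stand-ins (the study applies a morphological analysis system and keeps only nouns, adjectives, and verbs), and the tf-idf variant shown, with term frequency summed over the tourist-information tweets and inverse document frequency computed over all tweets, is only one common formulation.

```python
import math
import re
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(hours=3)  # the 3 h window described in Procedure 2


def neighbors_in_window(anchor_times, timeline, window=WINDOW):
    """Procedure 2 (sketch): keep tweets posted within `window` of any anchor tweet.

    `anchor_times` are timestamps of tweets containing a tourist spot name;
    `timeline` is a list of (timestamp, text) pairs from the same user.
    """
    return [
        (ts, text)
        for ts, text in timeline
        if any(abs(ts - anchor) <= window for anchor in anchor_times)
    ]


def tokenize(text):
    # Assumption: a plain lowercase word split stands in for the
    # morphological analysis and part-of-speech filtering of Procedure 5.
    return re.findall(r"[a-z0-9]+", text.lower())


def feature_words(tourist_tweets, background_tweets, top_k=3):
    """Procedure 5 (sketch): rank words of tourist-information tweets by tf-idf."""
    docs = [tokenize(t) for t in tourist_tweets + background_tweets]
    doc_freq = Counter(w for doc in docs for w in set(doc))
    term_freq = Counter(w for t in tourist_tweets for w in tokenize(t))
    n_docs = len(docs)
    scores = {w: term_freq[w] * math.log(n_docs / doc_freq[w]) for w in term_freq}
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]


def collect_tncts(candidates, words, spot_name):
    """Procedure 6 (sketch): keep tweets with a feature word but no spot name."""
    spot = spot_name.lower()
    wanted = set(words)
    return [
        t for t in candidates
        if spot not in t.lower() and wanted & set(tokenize(t))
    ]
```

Under these assumptions, Procedure 7 corresponds to calling `feature_words` again on the tweets returned by `collect_tncts` that are judged to contain tourist information, and then repeating the collection. Note that `neighbors_in_window` also returns the anchor tweets themselves if they appear in `timeline`.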
4 Experiment
In this experiment, we evaluated whether tourist information was contained in tweets posted before and after tweets containing tourist spot names, tweets of followers of the users who tweeted tourist spot names, and tweets with images that do not contain tourist spot names. In the previous study, we found tweets with position information and feature words to be effective [9]. We extracted tourist information for a total of 10 tourist spots: the top-ranked spot in each of nine areas and the top-ranked spot in the Japan tourism ranking, both according to TripAdvisor. Table 1 shows the target tourist spots in this experiment. We searched for tweets containing tourist spot names using the official keyword search on Twitter, and examined the tweets before and after them and the tweets of followers posted between 0:00 and 23:59 on November 23, 2016, which is a national holiday in Japan. This date was chosen because we think many people used this day to visit tourist spots with family and friends. For tweets with images, we evaluated only those tweets containing images but not tourist spot names that were found among the tweets posted before and after tweets containing tourist spot names. We made this restriction because the total number of tweets containing images but not tourist spot names was too large to parse.
5 Experimental Results and Discussion
5.1 Tourist Information from Tweets Before and After Tweets Containing Tourist Spot Names
Table 2 shows the number of tweets collected for each tourist spot, the amount of tourist information in tweets containing the tourist spot name, etc. Tourist information was collected from TNCTs for all tourist spots. Table 3 shows several examples. From tweet (b) alone, we cannot learn the user's impressions; however, we can infer their impressions of Ohori Park from tweet (a), which was posted before the tweets about Ohori Park. From this result, we know that tourist information may be contained in TNCTs tweeted before and after tweets containing tourist spot names.
5.2 Tourist Information from Tweets of Followers
Table 4 shows the amount of tourist information contained in tweets of followers and in tweets containing images. Some tweets are counted in both categories, because tweets from followers may also contain images. Tweets containing images contained tourist information for all tourist spots; however, tweets by followers contained tourist information for only seven of the ten tourist spots. In addition, the collected tweets of followers that contained tourist information were replies to tweets about wanting to go to, or having gone to, the tourist spots, rather than tweets describing going to the tourist spots together. Table 2 shows the amount of tourist information contained in replies for each tourist spot. TNCTs in replies accounted for 30 of the 134 pieces of tourist information collected in this experiment; Table 5 shows several examples. When user B reacted to user A's tweet about going to Fushimi Inari Taisha, user A replied with tweet (A-2), giving their impression of Fushimi Inari Taisha. User B then replied with tweet (B-2), which also gave an impression of the tourist spot. From this result, we know that tourist information was contained in the replies to tweets containing tourist spot names.
5.3 Tourist Information from Tweets with Images
Tweets containing images also contained tourist information for all tourist spots, as shown in Table 4. The images were often of the tourist spots themselves; however, there were also images of the food sold and the entertainment offered at tourist spots. Table 6 shows several examples. Food and entertainment are common topics of such tweets. Among the tweets about food, we learn that jasmine tea can be ordered at Shurijo Castle, because a user posted tweet (F-2) with an image of the tea after posting tweet (F-1). Among the tweets about entertainment, we learn that celebrations for children aged three, five, and seven, as well as shrine visits, were held at Atsuta Jingu, because a user posted tweet (E-2) with an image of the shrine after tweeting with the hashtag for Atsuta Jingu. From these results, we know that TNCTs containing images contain not only impressions of the tourist spots themselves but also information about food and entertainment at tourist spots.
5.4 Summary of Discussion
In this experiment, we determined whether there was tourist information contained in TNCTs. The results of the experiments revealed the following.
(1) TNCTs tweeted before and after tweets containing tourist spot names contain tourist information.
(2) Replies to tweets containing tourist spot names contain tourist information.
(3) TNCTs containing images contain information regarding the food and entertainment available at tourist spots.
These results point to a solution to the problem of insufficient tourist information being obtained from Twitter (as seen in the previous study): tourist information can be found in TNCTs, as in (1) and (2). As (3) shows, tourist information can be classified into multiple kinds, such as the history, food, and entertainment at tourist spots, and these can be the main features of the tourist spots. Based on this information, we can express the features of each tourist spot using multiple indices when we visualize the tourist information. Because we can extract different kinds of information regarding the tourist spots, the scope of awareness (including similarities and problems) will broaden. In addition, the amount of tourist information in TNCTs is about 0.6 times the amount of tourist information in tweets containing tourist spot names. Adding the two together, we could collect approximately 1.6 times the amount of tourist information collected by a normal search. Therefore, we conclude that the proposed method is effective for collecting more tourist information from Twitter.
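The arithmetic behind the factor of 1.6 can be written out as follows. The symbols are introduced here for illustration only and do not appear in the original analysis: $I_{\text{name}}$ denotes the amount of tourist information in tweets containing tourist spot names (the result of a normal search), and $I_{\text{TNCT}}$ denotes the amount found in TNCTs.

```latex
\[
I_{\text{total}} = I_{\text{name}} + I_{\text{TNCT}}
\approx I_{\text{name}} + 0.6\, I_{\text{name}}
= 1.6\, I_{\text{name}}
\]
```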
6 Conclusions
In this study, we proposed a tourist information extraction method based on TNCTs. We evaluated whether tourist information can be extracted from TNCTs. The results of the experiments revealed the following.
(1) TNCTs tweeted before and after tweets containing tourist spot names contain tourist information.
(2) Replies to tweets containing tourist spot names contain tourist information.
(3) TNCTs containing images contain information regarding the food and entertainment available at tourist spots.
In future work, we will examine how to associate food and entertainment images with tourist spots, because landmarks cannot be identified from such images when they do not show the tourist spot itself. We will also determine whether the method can collect tourist information about minor tourist spots from TNCTs.
References
1. Japan Tourism Agency: Research study on economic impacts of tourism in Japan (2014). http://www.mlit.go.jp/common/001136064.pdf
2. Kazuya, H., Yusuke, K.: Research on the regional promotion through anime-tourism. In: 19th Conference of Japan Association for Evolutionary Economics, pp. 1–56 (2015)
3. Japan Tourism Agency: Change of visitor arrivals and Japanese overseas travelers. http://www.mlit.go.jp/kankocho/siryou/toukei/in_out.html
4. Watanabe, S., Yoshino, T.: Tourist information visualization system for improvement discovery based on the similarity among tourist spots. In: Multimedia, Distributed, Cooperative, and Mobile Symposium, pp. 1357–1362 (2016)
5. Shimada, K., Inoue, S., Maeda, H., Endo, T.: Analyzing tourism information on Twitter for a local city. In: 1st ACIS International Symposium on Software and Network Engineering (SSNE 2011), pp. 61–66 (2011)
6. Ritter, A., Mausam, Etzioni, O., Clark, S.: Open domain event extraction from Twitter. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012), pp. 1104–1112 (2012)
7. Oku, K., Ueno, K., Hattori, F.: Mapping geotagged tweets to tourist spots for recommender systems. In: 2014 IIAI 3rd International Conference on Advanced Applied Informatics (IIAI 2014), pp. 789–794 (2014)
8. Lee, R., Sumiya, K.: Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection. In: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks, pp. 1–10 (2010)
9. Watanabe, S., Yoshino, T.: Proposal of tourist information extraction methods from tweets without position information by tweets with position information and tweets containing tourist spots names. In: IPSJ Kansai-Branch Convention 2016, G-15, pp. 1–3 (2016)
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Watanabe, S., Yoshino, T. (2017). Tourist Information Extraction Method from Tweets Without Tourist Spot Names for Tourist Information Visualization System. In: Yoshino, T., Yuizono, T., Zurita, G., Vassileva, J. (eds) Collaboration Technologies and Social Computing. CollabTech 2017. Lecture Notes in Computer Science(), vol 10397. Springer, Cham. https://doi.org/10.1007/978-3-319-63088-5_10
DOI: https://doi.org/10.1007/978-3-319-63088-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63087-8
Online ISBN: 978-3-319-63088-5
eBook Packages: Computer Science (R0)