1 Introduction

According to a national tourism survey by the Japan Tourism Agency, the average number of trips taken by Japanese travelers decreased after 2006, and this decrease stopped after 2010 [1]. The appearance of new forms of travel in Japan, in which fans visit the locations of dramas and animated programs, is considered to be a factor [2]. In addition, the number of foreign travelers to Japan has increased year by year since 2012, according to Japan Tourism Agency statistics on visitor arrivals and Japanese overseas travelers [3]. Because the behavior of tourists has changed in these ways, tourist spot operators want to understand the needs of tourists and any problems with their tourist spots so that those problems can be solved.

In our previous study, we developed a system both to extract tourist information from the web and to visualize similarities. However, the tourist information obtained from Twitter was insufficient [4]. We believe this is because we did not consider tweets that are about tourist spots but do not contain tourist spot names. In this study, we propose a tourist information extraction method that applies to tweets that do not contain tourist spot names. We evaluated whether tourist information can be extracted from such tweets using tweets with position information near the tourist spot, tweets with images of the tourist spot, tweets posted before and after tweets containing the tourist spot name, and tweets of followers of the user who tweeted the tourist spot name. In the remainder of this paper, TNCT refers to Tweets that do Not Contain Tourist spot names.

2 Related Work

Shimada et al. proposed a tourist information analysis system using tweets containing tourist spot names and words related to the tourist spots [5]. They extracted these tweets and analyzed their polarity. Alan et al. proposed a method to extract events using unique representations of tweets and dates [6]. They extracted tweets containing unique representations that are strongly related to specific times as tweets related to the event. These studies used tweets containing tourist spot names and words related to the tourist spots; in contrast, we extract information about tourist spots not only from tweets containing tourist spot names but also from tweets that do not contain them.

Oku et al. developed a tourist spot recommendation system using tweets and images that include position information [7]. They estimated the active region of a target tourist spot from tweets containing both position information and the name of the spot, in addition to images that include position information. The active region is an area where users often tweet about the target tourist spot. They then recommended tourist spots based on the features of the tweets posted within the region. Ryong et al. proposed a measurement method for geographical regularity using time and position information from tweets to detect social events [8]. They estimated the normal geographical regularity from the times of tweets based on position information, pose, and the action of the user, and detected irregular events in the region of the target. These studies extracted tourist information using position information contained in tweets, images, and tweet times. However, extraction based on position information is limited because the number of tweets containing position information is very small. We extract information about tourist spots not only from tweets containing tourist spot names but also from TNCTs.

3 Tourist Information Extraction Method

In this study, we evaluated whether we could extract tourist information from TNCTs. Figure 1 shows the tourist information extraction method.

Fig. 1. Tourist information extraction procedure.

  1. Procedure 1

    Collection of tweets containing tourist spot names

    This procedure collects tweets containing tourist spot names from tweets that include position information and tweets that do not include position information [Fig. 1(A)-a]. This procedure estimates the user location at the time of the tweet about the tourist spot using the active region estimation method suggested by Oku et al. [7]. The procedure also collects tweets within the defined region [Fig. 1(A)-b].

  2. Procedure 2

    Collection of tweets before and after tweets containing tourist spot names

    This procedure collects tweets posted within 3 h before and after tweets containing tourist spot names [Fig. 1(A)-c]. In this study, we set the collection window to 3 h because we assume that a visitor can see the sights of a tourist spot within 3 h. In addition, people tend to tweet multiple times in succession within a few hours, rather than tweeting with a long gap between each tweet.

  3. Procedure 3

    Collection of tweets of followers

    This procedure collects the tweets of the followers of a user whose tweets contained tourist spot names [Fig. 1(A)-d]. From the tweets of followers, it collects those posted around the time of the tweets containing tourist spot names, because users might travel together with their followers.

  4. Procedure 4

    Collection of tweets with images of tourist spots

    This procedure collects tweets with images of the tourist spot [Fig. 1(A)-e]. We use the Google Cloud Vision API to detect landmarks for identification of tourist spots.

  5. Procedure 5

    Extraction of feature words

    This procedure selects tweets that may include tourist information from the tweets collected in Procedures 1 to 4 [Fig. 1(B)]. It then extracts the feature words of the selected tweets [Fig. 1(C)]. In this study, tweets containing tourist information are defined as tweets containing opinions, impressions, complaints, etc. regarding the tourist spots. The procedure separates the tweets into words by applying a morphological analysis system and uses tf-idf to extract feature words. During the extraction, we restricted the parts of speech to nouns, adjectives, and verbs, which may be used to describe the features of tourist spots.

  6. Procedure 6

    Collection of tweets containing feature words

    This procedure collects tweets containing the feature words extracted in Procedure 5 from tweets without position information [Fig. 1(D)]. We target TNCTs containing feature words in this collection.

  7. Procedure 7

    Re-collection of tourist information

    This procedure re-collects tourist information from TNCTs by repeating Procedure 5 and Procedure 6.
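As an illustration of Procedures 2, 5, and 6, the following is a minimal sketch and not the implementation used in this study. It assumes tweets are held in memory as simple records, replaces the morphological analysis and part-of-speech filtering with whitespace tokenization, and treats all tweets collected for one tourist spot as a single document when computing tf-idf. The names `Tweet`, `neighbors_within`, `feature_words`, and `tweets_with_feature_words` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Tweet:
    user: str
    time: datetime
    text: str


def neighbors_within(tweets, spot_name, hours=3):
    """Procedure 2 (sketch): TNCTs by the same user within +/- `hours`
    of one of that user's tweets containing the tourist spot name."""
    window = timedelta(hours=hours)
    seeds = [t for t in tweets if spot_name in t.text]
    return [
        t for t in tweets
        if spot_name not in t.text
        and any(t.user == s.user and abs(t.time - s.time) <= window
                for s in seeds)
    ]


def tokenize(text):
    # Assumption: whitespace splitting stands in for the morphological
    # analysis and the noun/adjective/verb filter described in Procedure 5.
    return text.lower().split()


def feature_words(texts_by_spot, spot, top_k=3):
    """Procedure 5 (sketch): rank the words of one spot's tweets by tf-idf,
    treating each spot's collected tweets as one document (an assumption)."""
    tokens = {name: [w for text in texts for w in tokenize(text)]
              for name, texts in texts_by_spot.items()}
    n_docs = len(tokens)
    # Document frequency: in how many spots' documents each word appears.
    df = Counter(w for words in tokens.values() for w in set(words))
    tf = Counter(tokens[spot])
    total = len(tokens[spot])
    scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def tweets_with_feature_words(tweets, words, spot_name):
    """Procedure 6 (sketch): TNCTs that contain at least one feature word."""
    wanted = set(words)
    return [t for t in tweets
            if spot_name not in t.text and wanted & set(tokenize(t.text))]
```

Procedure 7 would correspond to calling `feature_words` again on the tweets returned by `tweets_with_feature_words` and repeating the collection. For Japanese tweets, `tokenize` would need to be replaced by a real morphological analyzer.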

4 Experiment

In this experiment, we evaluated whether tourist information was contained in tweets posted before and after tweets containing tourist spot names, in tweets of followers of the user who tweeted tourist spot names, and in tweets with images that do not contain tourist spot names. In the previous study, we found tweets with position information and feature words to be effective [9]. We targeted a total of 10 tourist spots: the top-ranked spot in each of nine areas on TripAdvisor and the top-ranked spot in the Japan tourism ranking on TripAdvisor. Table 1 shows the target tourist spots for collection in this experiment. We searched for tweets containing tourist spot names using the official keyword search on Twitter, and examined the tweets before and after those tweets and the tweets of followers posted between 0:00 and 23:59 on November 23, 2016, which is a national holiday in Japan. This date was chosen because we think many people used this day to visit tourist spots with family and friends. For tweets containing images, we evaluated only those TNCTs with images that were posted before and after tweets containing tourist spot names, because the total number of tweets containing images but not tourist spot names was too large to parse.

Table 1. Collection of target tourist spots.
Table 2. Number of tweets collected for each tourist spot name, the amount of tourist information, and the amount of increase of tourist information.
Table 3. Examples of tourism information in tweets before and after tweets containing tourist spot names.
Table 4. Amount of tourist information contained in the tweets of followers and tweets containing images before and after tweets containing tourist spot names.
Table 5. Examples of tourism information contained in replies before and after tweets containing tourist spot names.

5 Experimental Results and Discussion

5.1 Tourist Information from Tweets Before and After Tweets Containing Tourist Spot Names

Table 2 shows the number of tweets collected for each tourist spot, the amount of tourist information in tweets containing the tourist spot name, and the increase in tourist information. Tourist information was collected from TNCTs for all tourist spots. Table 3 shows several examples. Tweet (b) alone tells us nothing about the user's impressions; however, we can determine their impressions of Ohori Park from tweet (a), which was posted before the tweet about Ohori Park. From this result, we know that tourist information may be contained in TNCTs tweeted before and after tweets containing tourist spot names.

5.2 Tourist Information from Tweets of Followers

Table 4 shows the amount of tourist information that was contained in tweets of followers and tweets containing images. Some tweets are counted in both categories because tweets from followers may also contain images. Tweets containing images contained tourist information for all tourist spots; however, tweets by followers contained tourist information for only seven out of ten tourist spots. In addition, the collected tweets of followers containing tourist information were replies to tweets about wanting to go to, or having gone to, the tourist spots, rather than tweets about going to the tourist spots together. Table 2 shows the amount of tourist information that is contained in replies for each tourist spot. Of the 134 pieces of tourist information collected in this experiment, 30 came from TNCTs that were replies; Table 5 shows several examples. When user B reacted to user A's tweet about going to Fushimi Inari Taisha, user A replied (A-2) with their impression of Fushimi Inari Taisha. User B then replied (B-2) with their own impression of the tourist spot. From this result, we know that tourist information was contained in the replies to tweets containing tourist spot names.

Table 6. Examples describing food and entertainment available at tourist spots contained in tweets before and after tweets containing tourist spot names.

5.3 Tourist Information from Tweets with Images

Tweets containing images also contained tourist information for all tourist spots, as shown in Table 4. The images were often of the tourist spots themselves; however, there were also images of the food sold and the entertainment offered at tourist spots. Table 6 shows several examples. Tweets about food and entertainment are common kinds of tweets. Among the tweets about food, we learn that jasmine tea can be ordered at Shurijo Castle because a user posted tweet (F-2), showing an image of the tea, after posting tweet (F-1). Among the tweets about entertainment, we learn that celebrations for children of three, five, and seven years of age and shrine visits were held at Atsuta Jingu because a user posted tweet (E-2), showing an image of the shrine, after tweeting with the hashtag for Atsuta Jingu. From these results, we know that TNCTs containing images include not only impressions of the tourist spots themselves but also information about food and entertainment at tourist spots.

5.4 Summary of Discussion

In this experiment, we determined whether there was tourist information contained in TNCTs. The results of the experiments revealed the following.

  (1) TNCTs tweeted before and after tweets containing tourist spot names contain tourist information.

  (2) Replies to tweets containing tourist spot names contain tourist information.

  (3) TNCTs containing images contain information regarding the food and entertainment available at tourist spots.

These results point to a solution to the problem of insufficient tourist information from Twitter seen in the previous study, namely, finding tourist information contained in TNCTs, as in (1) and (2). As (3) shows, tourist information can be classified into multiple kinds, such as the history, food, and entertainment of tourist spots, and these kinds can be the main features of the tourist spots. Based on this information, we can express the features of each tourist spot using multiple indices when we visualize the tourist information. Because we can extract different kinds of information regarding the tourist spots, the scope of awareness (including similarities and problems) will broaden. In addition, the amount of tourist information in TNCTs is about 0.6 times the amount of tourist information in tweets containing tourist spot names. Adding the two, we could collect approximately 1.6 times the amount of tourist information collected by a normal search. Therefore, we conclude that the proposed method is effective for collecting more tourist information from Twitter.
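The step from 0.6 times to 1.6 times can be written out as a small calculation. The sketch below is illustrative only; the function name `collection_gain` and the counts in the usage note are hypothetical and are not the values in Table 2.

```python
def collection_gain(named_info, tnct_info):
    """Ratio of all tourist information (name-based search plus TNCTs)
    to the tourist information found by the name-based search alone."""
    if named_info <= 0:
        raise ValueError("named_info must be positive")
    return (named_info + tnct_info) / named_info
```

For example, with a hypothetical 100 pieces of tourist information from tweets containing tourist spot names and 60 pieces from TNCTs (a ratio of 0.6), `collection_gain(100, 60)` gives a gain of 1.6.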

6 Conclusions

In this study, we proposed a tourist information extraction method based on TNCTs. We evaluated whether tourist information can be extracted from TNCTs. The results of the experiments revealed the following.

  (1) TNCTs tweeted before and after tweets containing tourist spot names contain tourist information.

  (2) Replies to tweets containing tourist spot names contain tourist information.

  (3) TNCTs containing images contain information regarding the food and entertainment available at tourist spots.

In the future, we will examine how to handle food and entertainment images that do not show the tourist spots themselves, because we cannot identify landmarks from such images. We will also determine whether the method can collect tourist information about minor tourist spots from TNCTs.