1 Introduction

Many aspects of our lives, especially those associated with fun and leisure, are posted on the World Wide Web through our personal social networks, forums or online distribution channels, and are available almost to everyone. Nowadays, other travelers’ experiences constitute the necessary input for designing unique and personal tourist products that increase significantly the likelihood of future customers’ satisfaction in consuming their own products. In addition, being constantly online (with high-speed networks such as 5G and high-end smart devices) enables the interaction with the produced and consumed tourist content, making it dynamic and modifiable (Ciasullo et al., 2021; Marchiori & Cantoni, 2015). The online sharing of travel instances, whether on social media or photo-sharing apps, gives users the opportunity to instantly access the generated content, almost at the moment that this is being experienced. Hence, users are becoming “living” aspects of their future travel choices and at the same time participate in a continuous evaluation on new and existing experiences and products.

Planning a trip is a process through which a tourist experience is designed, shaped, and possibly consumed without restrictions. This holds under two assumptions: that daily life retains the regularity of free and unhindered movement (an assumption called into question mainly by the COVID-19 pandemic and the subsequent lockdowns), and that there is immediate access to all available digital and physical infrastructures. One of the most important steps in this process is the search for accommodation. At a second level, to enrich their experience, visitors also include Food and Beverage Points of Interest (F&B POIs) in their plans, especially when these involve visiting urban areas.

From the stakeholders’ side, the importance of Information and Communication Technologies (ICTs) and of World Wide Web applications in the distribution of the tourism product is indisputable and seems to be an inevitable step towards the future of tourism development. Cities, as tourism destinations, should be highly competitive and, beyond that, should co-create their tourism products along with their potential visitors (Buhalis, 2000; Smith, 2015). Furthermore, the available urban tourism resources, as well as local infrastructures, should be efficiently managed and routed according to the demands and needs of visitors and native citizens (Bădiţă, 2013; Smith, 2015). Along with the development of ICTs, a significant increase in the functionality of the city as a tourism destination has given rise to the concept of smart tourism. Smart tourism is still being thoroughly discussed (Buhalis & Amaranggana, 2015; Gretzel et al., 2015a; Li et al., 2017; Yoo et al., 2017; Akdu, 2020) and is still shaping modern cities and urban environments. New ideas have been established where “digital ecosystems and smart business networks” are combined to develop Smart Tourism Ecosystems (Gretzel et al., 2015b). These kinds of environments allow tourism stakeholders to elevate and enrich their services using the city’s existing digital environment (Gretzel et al., 2015b; Brandt et al., 2017).

The sustainability of these environments also remains a crucial issue, and the available spatio-temporal data are proving important for understanding the future of various tourist ecosystems. These data usually reveal the impact of visitors’ habits and presence in relation to available infrastructures, urban landmarks, etc. Official statistics provide accurate and reliable data, but they are published with a significant delay, so there is an inherent difficulty in extracting information about tourist behavior, especially in urban environments (García-Palomares et al., 2015). In a smart city environment, behavioral information is required and, ideally, should be accessible immediately. Furthermore, researchers are allowed to “cut” their dependence on official statistics (Shelton et al., 2015). Ultimately, the combination of less variable features and behavioral information should be the key to understanding and developing sustainable tourism areas.

The analysis of large spatio-temporal data is not new, yet the analysis framework and its reliability are still being studied. In addition, interesting results have recently been published towards a better understanding of the urban tourism phenomenon. Thus, in this work, volunteered geographic information data are employed to investigate the behavior of tourists in major urban areas. Specifically, a spatial clustering-based methodology is proposed and applied to spatio-temporal data sets collected from photo-sharing social platforms, in order to analyze geo-tagged data that are related to F&B POIs in major city centers. The aim of this work is to show (mostly spatially) where F&B locations lie and how they are related to the visitors’ positions during their stay in a city. The proposed methodology is applied to the city of Athens using data from the Flickr photo-sharing site and OpenStreetMap web services.

2 Literature Review

The popularization and development of Web 2.0 have partially changed its content, which has become user driven and is known as User-Generated Content (Bruns, 2007; Straumann et al., 2014). At the same time, users’ devices (smartphones, GPS devices, etc.) generate geolocated information, aptly named big (geo)data (García-Palomares et al., 2015). Nowadays, the most common sources of geolocated data are social networks (Batty, 2013), and these data can be employed in various ways (Kitchin, 2013).

The overall tourism experience spans from trip preparation to the return home, and usually involves the internet and social networks (Fotis et al., 2012; Leung et al., 2013; Zeng & Gerritsen, 2014; Munar & Jacobsen, 2014). So, beyond the extraction of tourists’ destination preferences and points of interest in an urban area (Junker et al., 2017; Paldino et al., 2015), geotagged data can also be used as consulting material for future visitors (Buhalis & Law, 2008; Xiang & Gretzel, 2010).

Several studies have analyzed social media data in connection with tourism activity, using geo-tagged data sourced from web services like Flickr and Panoramio: Wood et al. (2013) used photographs to estimate the number of visitors at tourism sites; Girardin et al. (2008) studied tourists’ behavior in Rome by analyzing their flows between various points of interest; Popescu et al. (2009) identified the places people visited as well as the duration of their stay; Gavric et al. (2011) extracted Berlin’s preferred locations and tourist dynamics; Kisilevich et al. (2013) identified popular city landmarks and events; Kurashima et al. (2013) treated geotagged photos as a sequence of visited locations and then recommended travel routes between landmarks; De Choudhury et al. (2010) introduced the automated creation of meaningful travel itineraries for individuals and professionals; Mamei et al. (2010) also recommended personalized routes using tourist experiences, behavior, and tastes; Lu et al. (2010) suggested tourist trips based on photo-sharing geodata; Li (2013) used geotagged photographs to approximate the optimal solution for tourists’ multi-day and multi-stay travel planning using the Iterated Local Search heuristic algorithm; Tammet et al. (2013) developed a sightmap exploiting photo density; Koerbitz et al. (2013) approximated overnight stays in Austria and compared their results with the official statistics; Straumann et al. (2014) distinguished and studied foreign and domestic visitors in Zurich; Sun and Fan (2014) identified social events; García-Palomares et al. (2015) identified tourist hot spots in European metropolises using spatial statistical techniques to analyze location patterns.

Recently, for the problem of discovering Tourism Areas of Interest (TAOIs), Koutras et al. (2019) extracted TAOIs using volunteered geographic information. Devkota et al. (2019) investigated the same topic by additionally using remotely sensed nighttime light data, with promising results, especially for areas with low social media penetration, while Karayazi et al. (2020, 2021) discussed the issue of extracting information regarding the attractiveness and representation of heritage sites. Lastly, Liu et al. (2021) discussed the performance of methods that detect the base locations of individuals, contributing, at a second level, to the broader question of the reliability of the methods discussed in this work.

3 Methodology

3.1 Experimental Setup

For our analysis, data from the Flickr (www.flickr.com) photo-sharing platform were used. Flickr offers public geo-tagged photo data to registered users via its Application Programming Interface (API) (www.flickr.com/services/api/) for non-commercial use. A great number of the photos uploaded to the platform are taken by tourists during their trips with GPS-enabled cameras or smartphones. Picture information such as user name, geolocation of the photo, date and time that the photo was taken and uploaded, type of camera that was used, photo metadata, photo EXIF data, etc., can be queried and retrieved using requests in REST, XML-RPC or SOAP format. The platform’s API response formats include REST, XML-RPC, SOAP, JSON, or PHP, depending on the developer’s request. In this work, we used REST requests to the API, while the responses were returned in JSON format.

In detail, the flickr.photos.search method was employed to collect photos for our experiments. This method provides various optional arguments that can be used to narrow the search spatially as well as temporally. For the spatial case, a user can query for pictures taken inside a confined geographic area in the form of a rectangular bounding box with user-defined bottom-left and top-right corners.

To extract photos using the Flickr API, we wrote a script in PHP using the following parameters in our query:

  1. Area of interest: since we want to study visitors of the city of Athens, we define a bounding box with long/lat 23.5923,37.8058 (corner A), and 23.9178,38.1479 (corner B) that includes the city’s administrative boundaries.

  2. Temporal period of interest: we queried photos taken in the above area of interest in the period from 1 January 2009 to 15 October 2017.

The query results were returned in JSON format and contain important information about the photos, including photo ID, owner ID, photo title, photo geolocation (latitude and longitude), date that the picture was taken, date that the picture was uploaded, and textual tags, as well as other information, such as the ID of the host server. Among the metadata, photo title, textual tags, and location are optionally provided by users, while the other fields (e.g., photo ID and server ID) are automatically filled in by Flickr when the photos are uploaded. Location information is available for every metadata record, since we only retrieved geotagged photos. In this work, only the owner ID, the date and time that the photo was taken, the latitude/longitude, and the place_ID of the photo were retained.
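As an illustration of the query described above, the following is a minimal Python sketch (the original script was written in PHP and is not reproduced here). It only builds the request parameters for flickr.photos.search and trims a returned photo record; it does not contact the API. The helper names `build_search_params` and `keep_fields`, the choice of `extras`, the page size, and the JSON field names `datetaken` and `place_id` are assumptions made for this example and should be checked against the Flickr API documentation.

```python
from urllib.parse import urlencode

# Endpoint of the Flickr REST API.
FLICKR_REST_URL = "https://api.flickr.com/services/rest/"


def build_search_params(api_key, page=1):
    """Parameters for one page of a flickr.photos.search request (sketch)."""
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,  # placeholder: supply your own key
        # bbox order: min longitude, min latitude, max longitude, max latitude
        "bbox": "23.5923,37.8058,23.9178,38.1479",
        "min_taken_date": "2009-01-01 00:00:00",
        "max_taken_date": "2017-10-15 23:59:59",
        "has_geo": 1,                      # geotagged photos only
        "extras": "geo,date_taken,tags",   # assumed set of extra fields
        "format": "json",
        "nojsoncallback": 1,               # plain JSON, no JSONP wrapper
        "per_page": 250,                   # assumed page size
        "page": page,
    }


def keep_fields(photo):
    """Retain only the fields used in the analysis (field names assumed)."""
    return {
        "owner": photo["owner"],
        "taken": photo["datetaken"],
        "lat": float(photo["latitude"]),
        "lon": float(photo["longitude"]),
        "place_id": photo.get("place_id"),
    }


# Build the URL of the first result page; "YOUR_API_KEY" is a placeholder.
url = FLICKR_REST_URL + "?" + urlencode(build_search_params("YOUR_API_KEY"))
```

Iterating over `page` until the reported number of pages is reached would retrieve the full result set; rate limits and paging caps are not handled in this sketch.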

The sample size was 201,100 photos. All data were pre-processed prior to the analysis, first to remove multiple photos taken by the same user at the same location within a very short period of time (multiple pictures taken within a minute). Additionally, to separate locals from tourists, who are our main subject of interest, we searched for users who appear in the database over a period longer than one month (García-Palomares et al., 2015), categorized them as locals, and excluded them. After the preprocessing step, the final number of photos in the database was 193,554.
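The two preprocessing steps can be sketched as follows. This is a hedged illustration, not the authors' code: the record layout (`owner`, `taken`, `lat`, `lon`), the definition of "same location" as coordinates equal after rounding to four decimals, and the reading of "one month" as 30 days are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def drop_bursts(photos, window=timedelta(minutes=1), decimals=4):
    """Keep one photo per user and (rounded) location within each short burst."""
    kept = []
    last_kept = {}  # (owner, lat, lon) -> time of the last photo kept for that key
    for p in sorted(photos, key=lambda p: (p["owner"], p["taken"])):
        key = (p["owner"], round(p["lat"], decimals), round(p["lon"], decimals))
        previous = last_kept.get(key)
        if previous is not None and p["taken"] - previous < window:
            continue  # same user, same place, less than a minute later
        last_kept[key] = p["taken"]
        kept.append(p)
    return kept


def drop_locals(photos, max_span=timedelta(days=30)):
    """Exclude users whose photos span more than max_span (treated as locals)."""
    first, last = {}, {}
    for p in photos:
        owner, t = p["owner"], p["taken"]
        first[owner] = min(first.get(owner, t), t)
        last[owner] = max(last.get(owner, t), t)
    return [p for p in photos if last[p["owner"]] - first[p["owner"]] <= max_span]


photos = [
    {"owner": "u1", "taken": datetime(2015, 5, 1, 10, 0, 0), "lat": 37.9715, "lon": 23.7267},
    {"owner": "u1", "taken": datetime(2015, 5, 1, 10, 0, 20), "lat": 37.9715, "lon": 23.7267},
    {"owner": "u1", "taken": datetime(2015, 5, 1, 12, 0, 0), "lat": 37.9755, "lon": 23.7348},
    {"owner": "u2", "taken": datetime(2015, 1, 1, 9, 0, 0), "lat": 37.9838, "lon": 23.7275},
    {"owner": "u2", "taken": datetime(2015, 6, 1, 9, 0, 0), "lat": 37.9838, "lon": 23.7275},
]
tourist_photos = drop_locals(drop_bursts(photos))
print(len(tourist_photos))  # 2: the burst duplicate and the long-term user are removed
```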

3.2 Experiments

The purpose of this research is to explore big volunteered geographic information data sets that are publicly available on the World Wide Web by extracting information about tourist concentration in urban areas, taking into account the city’s F&B POIs. Exploratory data analysis techniques are applied to photograph geolocation data, constrained to a specific urban area, aiming to uncover more points of interest, like F&B spots, beyond the traditional TAOIs within a city that show a high concentration of visitors. In this direction, a number of clustering and spatio-temporal techniques have been applied to the collected geolocation data.

Various spatial clustering algorithms (K-means, fuzzy C-means, Neural Network based, etc.) have been used for the problem of finding points of interest in an urban area. A comprehensive reference to these methods can be found in Devkota et al. (2019). In this work, the choice of the algorithm that has been used for our analysis was based on the following specifications:

  1. The clustering technique should not require a pre-determined number of clusters; rather, it should work in an unsupervised manner, as the exact number of clusters (TAOIs) cannot be known beforehand.

  2. The clustering method should result in clusters with a user-defined minimum distance from each other. This is important as it can help us find TAOIs nearer to or further away from each other inside the city boundaries.

  3. The user should be able to define the minimum number of members in every cluster and thus the minimum desired concentration of tourists in every TAOI.

  4. The clustering method must be robust to noise and classify only the significant examples, rejecting any noisy data.

Taking the above specifications into consideration, DBSCAN (Ester et al., 1996), a density-based spatial clustering method, is used. The algorithm’s performance depends on two hyper-parameters: (a) parameter Eps, the search radius of the algorithm, and (b) parameter MinPts, the minimum number of points within the search radius. These two parameters define a minimum density threshold, and clusters are identified at locations where the density of points is larger than this threshold.

The values of Eps and MinPts were selected heuristically using a series of tests on the available data. Parameter Eps is tied to the scale of the regions that are to be clustered: the size of the AOIs found depends on the size of Eps. The second parameter, MinPts, defines the minimum number of points (cluster members) required to form a new cluster. Again, the conducted tests show that a large value of MinPts secures a higher significance for the detected clusters, but in this case some interesting and meaningful small areas are lost, since the algorithm tends to unify them into a larger one; a smaller value leads to a larger number of clusters, but some of them may also include noisy results.
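To make the roles of Eps and MinPts concrete, the following is a minimal pure-Python sketch of DBSCAN. It is not the Matlab toolbox implementation used in this work; it assumes planar Euclidean distance on the raw coordinates and counts a point as its own neighbor when comparing against MinPts, both of which are choices made for this illustration.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 1, 2, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                queue.extend(j_neighbors)
    return labels


# Two dense groups of four points and one isolated point (toy coordinates).
points = [(0, 0), (0, 0.001), (0.001, 0), (0.001, 0.001),
          (1, 1), (1, 1.001), (1.001, 1), (1.001, 1.001),
          (5, 5)]
print(dbscan(points, eps=0.002, min_pts=3))
# [1, 1, 1, 1, 2, 2, 2, 2, -1]
```

The number of clusters is not supplied in advance, isolated points are rejected as noise, and MinPts sets the minimum concentration, which matches the four specifications listed above. The quadratic neighbor search is adequate only for small examples; a spatial index would be needed for a data set of the size used here.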

The DBSCAN algorithm was used as implemented in the Statistics and Machine Learning Toolbox of Matlab (Matlab, 2020). The resulting figures were created using the QGIS software (QGIS Development Team, 2009). QGIS is a free and open-source geographic visualization system that has been used by a large community to create, visualize, analyze, and publish geospatial information in various applications that include but are not limited to tourist data, health data, etc.

4 Results

The result of applying the DBSCAN algorithm to the Flickr data set describing the locations of tourists is shown in Fig. 1. The algorithm returns 7 clusters, which cover 7 significant TAOIs in the city of Athens (Koutras et al., 2019): Acropolis of Athens, Panepistimiou Street, Lycabettus Monument, Panathenaic Stadium, Syntagma Square, Temple of Olympian Zeus, and finally National Archaeological Museum of Athens. The above landmarks describe the city of Athens in its entirety, involving areas with ancient monuments as well as important downtown sites and buildings (www.visitgreece.gr).

Fig. 1 DBSCAN algorithm applied on photo-sharing data from the city of Athens

Next, the set of all F&B spots located in the center of Athens is investigated. This set is retrieved from the OSM project (OpenStreetMap, 2021) and it was initially pre-filtered to keep only those data labeled as ‘bar’, ‘cafe’, and ‘restaurant’. On this set, the DBSCAN algorithm is applied to reveal potential centers of F&B spots inside the center of Athens. The results are shown in Fig. 2. For the analysis of this data set, the algorithm’s parameter Eps was set to 0.0014 and MinPts to 25. The clusters with their different group members are shown in Table 1.
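The text does not state the unit of Eps. If, as an assumption, Eps = 0.0014 is expressed in decimal degrees of latitude/longitude, its approximate ground scale near Athens can be estimated with the short sketch below, which uses a spherical Earth and the mid-latitude of the bounding box used for the Flickr query. Under that assumption the radius corresponds to roughly 120 to 160 m, depending on direction.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def degrees_to_metres(delta_deg, latitude_deg):
    """Approximate ground length of delta_deg degrees at a given latitude."""
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = delta_deg * metres_per_degree
    # Meridians converge towards the poles, so east-west degrees are shorter.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Mid-latitude of the bounding box (37.8058 to 38.1479) used for the query.
mid_latitude = (37.8058 + 38.1479) / 2
ns, ew = degrees_to_metres(0.0014, mid_latitude)
print(f"north-south: {ns:.0f} m, east-west: {ew:.0f} m")
```

Because a degree of longitude is shorter than a degree of latitude at this latitude, a circular search radius in degrees is slightly elliptical on the ground; projecting the coordinates to a metric system before clustering would avoid this.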

Fig. 2 DBSCAN algorithm applied on F&B data in the city of Athens

Table 1 Results of the DBSCAN algorithm on F&B spots in the city of Athens from lighter (1) to darker (5) color

To further analyze visitors’ behavior, as well as to provide an interpretation of the way F&B spots have developed in the city of Athens, especially in its historical center, the above two results are illustrated on the same map, as shown in Fig. 3. At first glance, it is evident that the most important F&B spots lie inside the TAOIs found by the clustering process.

Fig. 3 Clustered photo-sharing data and F&B spots in the city of Athens

Additionally, the route traces of all visitors are overlaid on the same map, as shown in Fig. 4. The visitors’ routes inside the city of Athens are constructed by sorting all data by the time each photo was taken, and then keeping only one photo per area and time instance. The resulting route network shapes an area which eventually contains all the found TAOIs and F&B spots, providing additional evidence of the validity of our proposed approach.
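The route construction can be sketched as follows. This is an illustrative reading of the procedure rather than the implementation used for Fig. 4: it assumes that routes are built per user, that an "area" is a grid cell of 0.001 degrees, and that a "time instance" is one clock hour. The record layout and the function name `build_routes` are likewise hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def build_routes(photos, cell_deg=0.001):
    """Return {owner: [(lat, lon), ...]} with one point per grid cell and hour."""
    by_owner = defaultdict(list)
    for p in photos:
        by_owner[p["owner"]].append(p)

    routes = {}
    for owner, items in by_owner.items():
        items.sort(key=lambda p: p["taken"])  # chronological order
        route, seen = [], set()
        for p in items:
            cell = (math.floor(p["lat"] / cell_deg), math.floor(p["lon"] / cell_deg))
            slot = p["taken"].strftime("%Y-%m-%d %H")  # assumed time instance: hour
            if (cell, slot) in seen:
                continue  # already have a photo for this area and time instance
            seen.add((cell, slot))
            route.append((p["lat"], p["lon"]))
        if len(route) >= 2:  # a single point does not form a route segment
            routes[owner] = route
    return routes


photos = [
    {"owner": "u1", "taken": datetime(2015, 5, 1, 12, 5), "lat": 37.9755, "lon": 23.7348},
    {"owner": "u1", "taken": datetime(2015, 5, 1, 10, 0), "lat": 37.97150, "lon": 23.72670},
    {"owner": "u1", "taken": datetime(2015, 5, 1, 10, 30), "lat": 37.97152, "lon": 23.72671},
    {"owner": "u2", "taken": datetime(2015, 5, 2, 9, 0), "lat": 37.9838, "lon": 23.7275},
]
routes = build_routes(photos)
print(routes)
```

Each resulting list of coordinates can be exported as a polyline and overlaid on the clustered points in a GIS such as QGIS.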

Fig. 4 Visitors’ routes in the city of Athens

On closer inspection of the above figures and table, the following remarks can be made:

The visitors’ routes form a polygon which in essence covers the center of the city of Athens, acting as an outer boundary of the center, with lines that become denser in the historical center. The resulting shape includes important landmarks and museums and a great number of F&B spots.

F&B spots contained in clusters 2, 3, and 5 include a significant number of restaurants, bars, and cafés (Table 1), which are situated near the most significant landmarks of Athens, like the Acropolis, Lycabettus Monument, Panathenaic Stadium, Syntagma Square, Temple of Olympian Zeus, and the National Archaeological Museum of Athens. Cluster 2 covers the area of Monastiraki, while cluster 3 corresponds to the area near Stadiou Street, between the National Garden and Omonoia Square. Cluster 5 covers the area from Exarcheia to Omonoia Square. The remaining clusters cover the area between the Acropolis of Athens and the Temple of Olympian Zeus (cluster 4) and the area of Keramikos. Keramikos is on the border of the delineated center of Athens, but it is a place with vivid night life, close to the city center.

Finally, it is evident that almost all routes run through all TAOIs as well as all the F&B POIs inside the center of the city of Athens.

5 Discussion

In this work, a spatio-temporal analysis of geolocation data was presented. The data used were extracted from the Flickr photo-sharing web service, a volunteered geographic information data set, and concern downtown Athens, the capital of Greece.

The proposed methodology applies a density-based spatial clustering algorithm (DBSCAN) to both the photo-sharing data and the F&B points of interest located in the center of the city. The two sets of results were then depicted on a common map. The algorithm yielded seven clusters corresponding to seven lively places and five clusters corresponding to five F&B areas in the city of Athens. Additionally, a route network was constructed, describing the movement of tourists based on the trace they leave through the photos they take.

This research is a first step towards studying visitors’ tendencies, since more data from other social media platforms should also be considered. Furthermore, studying the statistical correlation between the found TAOIs and the well-known landmarks and points of interest is among our next research plans, towards understanding existing and future infrastructure spatial planning inside an urban area. This may support sustainable planning for large historical cities dealing with the timely problem of overtourism.