Introduction

Tourism is treated as an important part of the development strategy in many cities. According to the Cities Report 2018 from the World Travel and Tourism Council, the Travel and Tourism sector directly contributed 3.2% of global GDP in 2017. The dynamics of tourism are therefore of wide interest to academia, business, and governments.

The emergence of Location-Based Social Networks (LBSNs) has provided abundant data for tourism studies, such as travel reviews (Marine-Roig and Clavé 2015), footprints (Girardin et al. 2009), and photos of destinations (Nikjoo and Bakhshi 2019). Compared with traditional field surveys, these data sources offer more opportunities to explore large-scale human activities in detail. Using LBSN data, many studies have investigated the spatial distribution of tourist attractions or the travel patterns of tourist activities (Hasnat and Hasan 2018; Vu et al. 2015; García-Palomares et al. 2015). However, the functional relations between tourist places (i.e. flows of tourists between these places) have not been investigated in depth. Visiting a tourist attraction (e.g. a landmark) is just one component of tourist activity, since tourist visits necessarily imply itineraries. Other places, such as restaurants, public transport and accommodation, form the functional base of tourism in a city. Relations between such tourist facilities are usually described on the basis of practical experience. Quantifying these relations is therefore beneficial for better understanding the dynamics of tourist activities.

This study aims to quantify the important tourist-functional relations between urban places, which are usually not apparent from spatial proximity alone. Here, the functional relation between tourist places is defined as the flow of tourist users between Foursquare POIs. Further, we use the functional interaction value to measure the functional proximity of groups of POIs classified by category of usage. The case study presented in this paper investigates functional relations between tourist places in Barcelona using 18 months of check-in activity from Foursquare, an LBSN. On Foursquare, users make “check-ins” at venues that are either predefined or created by the users themselves. Because Foursquare data provide the attributes and geographic locations of POIs, they are a popular source of information for mining online user behavior (Preotiuc-Pietro and Cohn 2013), studying urban mobility (Hasan et al. 2013), and improving categorization systems (Hu et al. 2015), among other applications.

The main contribution of the study lies in its practical method of quantifying the functional relations of places, rather than in depicting the dynamics of tourist activities in a specific city. The study process comprises four main steps: identification of tourist users, classification of POIs by usage, calculation of flows among categories of POIs, and analysis of functional relations. A network graph is used to present the significant relations among usages. Finally, we compare the spatial proximity of POIs with their functional proximity to identify where the two differ.

The remainder of this paper is structured as follows. The literature review summarizes the main methods used to identify tourist users and related studies of tourist activities using LBSN data. The Methods section introduces the methodology and implements the identification of tourists within the entire sample. The Results section analyzes the spatial characteristics of tourist movement and the functional relationships among tourist places in Barcelona. Finally, the conclusion discusses the limitations and future work.

Study of tourist activities based on LBSN data

The spatial distribution of tourist activities based on LBSN data has been investigated in depth around the world. At the macro level, tourist movements show similar spatial characteristics in different cities. In general, they are more spatially concentrated than the movements of locals. Urban centers, airports, and tourist attractions, such as famous churches or museums, are typical clustering places for tourists. For instance, Girardin et al. (2009) identified differences in spatial activity between tourists and locals in New York using cell phone data and Flickr data. They confirmed that the movement range of visitors is limited, especially for foreign visitors. Vu et al. (2015) combined GPS and Flickr data to cluster tourists’ main routes in Hong Kong and showed that tourists tend to travel to and between adjacent areas. Hasnat and Hasan (2018) described tourist movements in Florida as clustered around tourist attractions, based on Twitter data. Béjar et al. (2014) extracted the spatial–temporal characteristics of Twitter and Instagram users in Barcelona; they found that the main tourist attractions were important connecting nodes in both datasets, though their study did not distinguish tourist users from locals. Kádár (2014) concluded that most of the geo-tagged images posted by Flickr tourist users in Budapest were gathered around tourist attractions or landmarks.

However, such spatial–temporal characteristics fail to uncover how places interact with each other functionally, since they only show the degree of spatial aggregation of human activities in separate places or the general tendency of people’s movements. Although flows between places have been quantified in studies of travel patterns and land-use detection, the focus of those studies differs from functional linkages. Travel-pattern studies focus on the temporal patterns of human mobility (Sagl et al. 2012; Luo et al. 2016; Thuillier et al. 2017). Land-use detection aims to reveal the relation between the spatial distribution of POIs and urban land uses (Gao et al. 2017), rather than the connections between different urban usages.

To date, few studies have investigated the functional linkages among POIs, so the topic still offers room for exploration. The study most closely related to the present work is that of Ferreira et al. (2015). They collected Foursquare data in London, New York, Rio de Janeiro, and Tokyo, classified POIs into nine categories, and compared the check-in patterns of tourists and residents across these categories. They found that some categories, such as Arts and Transport, show significant temporal differences in check-in patterns between tourists and residents. However, they focused on mobility patterns between specific places rather than on the functional closeness between categories. Preotiuc-Pietro and Cohn (2013) discussed the probability of users transitioning between different types of Foursquare POIs, but did not explain the functional relationships among POIs. Our study therefore proposes a new perspective for analyzing the functional relations between places.

With regard to the identification of tourists, LBSN data offer the potential to identify vast numbers of tourists at low cost. In previous studies, identification mainly relies on geo-location (Vu et al. 2015; Da Rugna et al. 2012; Luo et al. 2016) or on a time threshold (Kádár 2014; Girardin et al. 2008; García-Palomares et al. 2015). For example, Luo et al. (2016) distinguished resident Twitter users from visitors in Chicago by determining their locations during the night: users are identified as locals if most of their night-time check-ins are in residential areas. However, this method is effective only if hospitality services are segregated from residential areas, and it is difficult to apply in a compact city with large amounts of mixed-use land, such as Barcelona. Manca et al. (2017) combined a time threshold with the geo-location of Twitter users to distinguish tourist users from locals: users who posted tweets in Barcelona over a period of less than 20 days were considered tourists. In general, a longer time threshold could be more reliable, but it requires a dataset covering a longer period. Moreover, the time threshold is usually derived from empirical experience or the advice of tourism experts, and so lacks objectivity. In this study, we therefore classify users as tourists or local residents by examining their behavior on Foursquare and applying a threshold based on statistical analysis of the dataset. A semi-supervised model, described in the next section, was adopted to identify tourist users. This method allows fuller exploitation of the dataset, because both active and inactive users are included.

Methods

Study scope and description of datasets

Barcelona is an ideal city for an analysis of tourism; it has been known as a tourist destination since the early 1900s. The government established a Commission for the Attraction of Foreigners and Tourists in 1906, aiming to build up the city as a tourist destination known as the “Pearl of the Mediterranean”. Its importance as a tourist destination increased further after the 1992 Summer Olympic Games. According to the Annual Tourism Sector of Barcelona Report 2014, the number of overnight tourists staying in hotel accommodation exceeded 7.5 million, making Barcelona the 20th most-visited city in the world. In 2017, this number rose to nearly 9 million. According to the local government of Barcelona, tourism currently accounts for about 12% of the city’s GDP and approximately 9% of its employment.

This paper extracted data from a global Foursquare check-in dataset (Yang et al. 2016). The study area covers the inner part of the Barcelona Metropolitan Region (RMB), as determined by the coverage of the Foursquare data (Fig. 1), which is slightly larger than the first zone of the RMB. The first zone, also called the Metropolitan Area of Barcelona (AMB), comprises 36 municipalities. According to official statistics, its population was 3,239,337 in 2014, about half of whom lived in the city of Barcelona. Land use in Barcelona city is highly mixed. Most check-ins (57,764) occurred in Barcelona city (Fig. 1). Apart from Barcelona, only five municipalities in the region have more than 1000 check-ins: L’Hospitalet de Llobregat, El Prat de Llobregat, Badalona, Sant Cugat del Vallès and Cornellà de Llobregat.

Fig. 1 Distribution of check-ins in Barcelona Metropolitan Region. Source: Authors

The dataset includes data collected between 2012-04-03 and 2013-09-16. A total of 80,936 check-ins were made by 4527 users in the Barcelona area. To reduce noise, users who checked in only once in the area were excluded. This left 3350 “valid” users with a combined total of 79,798 valid check-ins (Table 1).

Table 1 Summary of valid users.

The method for measuring the functional relations of POIs comprised eight steps: (1) identifying tourist users as a group distinct from local users, and classifying all Foursquare POIs into 22 categories for the purpose of analysis; (2) constructing the chronological activity path of each tourist user from their check-ins; (3) calculating the number of connections between each pair of POI categories using the paths; (4) building a heat map of the matrix of flows; (5) applying a model of interaction values to investigate the functional relations among these categories; (6) representing the matrix of interaction values in two dimensions through PROXCAL multidimensional scaling (MDS); (7) visualizing the prominent functional relationships via a network graph; and (8) comparing functional proximity with spatial proximity for the different categories of POIs.

Identification of tourists

This study uses a semi-supervised method, consisting of K-means clustering with manual refinement, to distinguish tourist users from locals in the Foursquare data. Building on previous research, it assumes that tourists’ number of check-ins, total travel distance, and duration of stay are lower, on average, than those of residents. Duration is calculated as follows:

$${\text{Total}}\;{\text{Duration}} = T_{last} - T_{first}$$
(1)

where \(T_{last}\) is the latest timestamp of a check-in of a user and \(T_{first}\) is the earliest timestamp.

The travel distance is calculated via ArcGIS software:

$${\text{Total}}\;{\text{Travel}}\;{\text{Distance}} = \mathop \sum \limits_{t = 1}^{n - 1} \sqrt {\left( {X_{{P_{t + 1} }} - X_{{P_{t} }} } \right)^{2} + \left( {Y_{{P_{t + 1} }} - Y_{{P_{t} }} } \right)^{2} }$$
(2)

where \(P_{t}\) is the position of the user’s t-th check-in in chronological order, with coordinates (\(X_{{P_{t} }}\), \(Y_{{P_{t} }}\)) in the Universal Transverse Mercator (UTM) system, and n is the user’s number of check-ins. The distance between \(P_{t + 1}\) and \(P_{t}\) is the straight-line distance between the two points.
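As an illustration only, Eqs. (1) and (2) can be sketched in Python for a single user. The record layout `(timestamp, x, y)`, the function names and the sample coordinates are hypothetical; the coordinates are assumed to be already projected to UTM metres, whereas the study itself computed distances in ArcGIS.

```python
import math
from datetime import datetime

# Hypothetical record layout: (timestamp, x, y), with x and y in UTM metres.
CheckIn = tuple[datetime, float, float]


def total_duration_days(checkins: list[CheckIn]) -> float:
    """Eq. (1): T_last - T_first, expressed in days."""
    times = [t for t, _, _ in checkins]
    return (max(times) - min(times)).total_seconds() / 86400


def total_travel_distance(checkins: list[CheckIn]) -> float:
    """Eq. (2): sum of straight-line distances between consecutive check-ins."""
    ordered = sorted(checkins, key=lambda c: c[0])  # chronological order
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(ordered, ordered[1:])
    )


user = [
    (datetime(2012, 5, 1, 10, 0), 431000.0, 4582000.0),
    (datetime(2012, 5, 1, 14, 0), 431300.0, 4582400.0),
    (datetime(2012, 5, 3, 9, 30), 431300.0, 4583400.0),
]
print(total_duration_days(user))
print(total_travel_distance(user))  # 1500.0 (legs of 500 m and 1000 m)
```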

Before clustering, a sample of users was chosen to test the validity of the K-means clustering by checking whether they were classified correctly. One user was randomly selected from each combination of check-in frequency group (Fig. 2a) and stay-duration group (Fig. 2b), so that every selected user came from a different combination of check-in and duration groups.

Fig. 2 Foursquare users’ check-ins and stay duration. Source: Authors

After excluding inquiries that returned null values, 23 users were selected and their identities were manually verified. Based on the number of check-ins, the places where they checked in, and their duration of stay, we determined whether each selected user was a tourist or a local. For example, if a user checked in 10 times over a 2-day stay, and all of the places checked in were places of amusement or tourist attractions, we classified this user as a tourist.

Next, we used K-means clustering to divide all users into two groups. Data can be grouped by either classification or clustering. Classification requires a training dataset containing samples whose category is known. As the characteristics of tourist behavior were unknown in our case, clustering was the better approach. K-means clustering is widely applied because of its simplicity, and K-means algorithms perform well on very large datasets (Abbas 2008). The Z-score was used to standardize the three indicators for clustering:

$${\text{z}} = \frac{{{\text{x}} -\upmu}}{\upsigma}$$
(3)

where z is the standardized score of each of the indicators, x is the value of indicator, μ is the mean of x, and σ is the standard deviation.
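A minimal sketch of the clustering step follows: each indicator is standardized with Eq. (3), and users are split into two groups with plain Lloyd's K-means. It is written in standard-library Python for self-containment; the sample values, the use of the population standard deviation, and the seeded random initialization are assumptions, not the study's documented settings.

```python
import random
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Eq. (3); uses the population standard deviation (an assumption)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points: list[list[float]], k: int = 2, max_iter: int = 100, seed: int = 0):
    """Lloyd's K-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(pt, centroids[c])) for pt in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            updated.append([statistics.fmean(col) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return labels, centroids


# Hypothetical users: number of check-ins, travel distance (km), duration (days).
checkins = [3, 5, 4, 60, 80, 70]
distance_km = [5.0, 8.0, 6.0, 120.0, 150.0, 130.0]
duration_days = [2, 4, 3, 400, 500, 450]
columns = [z_scores(col) for col in (checkins, distance_km, duration_days)]
features = [list(row) for row in zip(*columns)]
labels, centroids = kmeans(features, k=2)
print(labels)
```

In such a run, the cluster whose centroid has the lower standardized values would be read as the tourist group, before the manual checks and thresholds described next.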

Manual examination showed that two local users in the sample group were assigned to the tourist group because of their comparatively short duration of stay. Conversely, four tourist users in the sample group were assigned to the local group because their duration of stay was too long. This discrepancy is partly caused by our method of calculating duration: because Eq. (1) measures the time between the first and last check-in, a visitor who returned in the second year would be assigned a stay spanning both visits. It was therefore necessary to refine the classification with thresholds on check-ins and duration. Four combinations of thresholds were tested (Table 2). Users whose indicators were above the threshold were categorized as locals.

Table 2 Four thresholds of improvement.

The classification results were more stable when the time span exceeded 84 days. Therefore, the strictest combination, threshold 4, was adopted as the final standard for refinement: users who stayed more than 113 days and made more than 24 check-ins are classified as locals. In total, 2770 tourist users were identified, who generated 19,180 check-ins during the monitoring period (Table 3).

Table 3 The summary of locals and tourists.

Classification of Foursquare POIs

As for Foursquare POIs, check-ins tend to be concentrated in a few categories: Travel and Transport, Food, and Shopping. Abbasi and Alesheikh (2018) reported that shopping and eating places contributed 59% of check-ins in Manhattan. Li et al. (2013) investigated the popularity of Foursquare POIs in 14 urban regions around the world and found that the Travel and Transport category has the highest frequency of check-ins. Preotiuc-Pietro and Cohn (2013) collected one month of Foursquare data from frequent users worldwide. Using the basic categories from the Foursquare website, they found that Shopping and Services, Food, and Travel and Transport accounted for 53.8% of all check-ins.

There are 13,887 unique Foursquare POIs in Barcelona, labeled with 385 sub-categories. Restaurants form a large portion of all POIs. According to the official website, Foursquare groups venues into nine major categories: Art and entertainment, Faculty and University, Event, Food, Night spots, Outdoor Recreation, Professional and others, Store and services, and Travel and transportation. However, these categories need to be refined for tourist-functional analysis. For example, Hotel falls under Travel and transportation, although it actually belongs to accommodation. Moreover, some category names are too vague. For example, “Event” actually contains temporary and permanent markets, so “Market” is a clearer name.

Therefore, this paper regroups these categories into 22 main types according to their usage; for example, all kinds of restaurants are grouped as “Restaurant”. Table 4 lists the new classification with brief descriptions. Transport, Restaurant, Hotel and Outdoor Resorts together account for 47.5% of check-ins. Notably, we extracted Plaza as a separate category, because a plaza is a compound urban place that mixes multiple functions, such as leisure, transport hub, food, and shopping services. The volume of check-ins in the Plaza category also indicates that it is an important functional hinge for tourists in Barcelona.

Table 4 New category of POIs.

Construction of paths of Foursquare users

In this study, a path is defined as the time-ordered sequence of a user’s check-ins; each user has a unique path. Following Scholtes (2017), the construction of paths rests on two assumptions:

  1. Each user’s path has a chronological order; hence, it is directed.

  2. Paths are not transitive: only direct connections count. For example, given the path a → b → c, [a,b] and [b,c] are valid connections, whereas [a,c] is not counted.

This avoids duplicate counting of connections among nodes. We then constructed each user’s functional path by replacing each POI with its usage category.
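The two path assumptions can be sketched as follows, assuming each check-in is a hypothetical `(timestamp, category)` pair already mapped to one of the 22 usage categories. Sorting by time makes each path directed, and counting only consecutive pairs enforces non-transitivity; the data and function names are illustrative.

```python
from collections import Counter


def category_path(checkins: list[tuple[int, str]]) -> list[str]:
    """Order a user's check-ins chronologically, so the path is directed (assumption 1)."""
    return [cat for _, cat in sorted(checkins, key=lambda c: c[0])]


def direct_connections(path: list[str]) -> Counter:
    """Count consecutive pairs only: a -> b -> c gives (a, b) and (b, c), never (a, c) (assumption 2)."""
    return Counter(zip(path, path[1:]))


# Hypothetical users with (unix_timestamp, usage_category) check-ins.
users = {
    "u1": [(1_335_000_000, "Transport"), (1_335_007_200, "Hotel"), (1_335_020_000, "Restaurant")],
    "u2": [(1_336_000_000, "Restaurant"), (1_335_990_000, "Hotel")],
}
connections = Counter()
for user_checkins in users.values():
    connections.update(direct_connections(category_path(user_checkins)))
print(connections)
```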

The matrix of usage-flows

Based on the paths, we calculated the number of direct connections between each pair of usage categories, including connections within the same category. The “inflow” from category i to j is the total number of connections from i to j, and the reverse is the “outflow”. The “flow” between a pair of usages is the sum of the “inflow” and the corresponding “outflow”:

$$f_{ij} = C_{ij} + C_{ji}$$
(4)

where \(f_{ij}\) is the flow between i and j, and \(C_{ij}\) is the number of connections from i to j.
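Given directed connection counts keyed by `(origin, destination)`, Eq. (4) reduces to adding each count to its reverse. This hypothetical helper applies the formula literally, so a within-category cell equals twice \(C_{ii}\); whether the study treated the diagonal differently is not stated, so this is an assumption.

```python
from collections import Counter


def flow_matrix(connections: Counter, categories: list[str]) -> list[list[int]]:
    """Eq. (4): f_ij = C_ij + C_ji, where connections[(i, j)] holds C_ij."""
    return [
        [connections[(a, b)] + connections[(b, a)] for b in categories]
        for a in categories
    ]


# Hypothetical directed counts.
C = Counter({("Restaurant", "Hotel"): 3, ("Hotel", "Restaurant"): 2, ("Hotel", "Hotel"): 1})
f = flow_matrix(C, ["Hotel", "Restaurant"])
print(f)  # [[2, 5], [5, 0]]
```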

Visualization of spatial paths

To visualize the spatial distribution of tourist paths, all traces taken by tourists between two POIs were counted. There are several methods to aggregate paths, such as edge bundling (Graser et al. 2019) or aggregating paths through characteristic points (Andrienko and Andrienko 2011). However, the former is computationally expensive, because it must compare the similarity of paths before aggregating them. The latter first extracts the characteristic points of each path and then calculates the centroids of these points according to the distances between them, which reduces the computational cost; the aggregated flows are then generated from these centroids. The shortcoming of this method is that the centroids are not actual places, so it is hard to reflect the precise spatial relationships between places at small scales. Therefore, this study used the original traces to display the spatial relationships between POIs and tourist movement. To delimit the major flows of tourists, only traces that were repeated more than once are visualized.
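A sketch of the filtering rule for the map follows, assuming each tourist path is a hypothetical list of POI identifiers in chronological order. It also assumes, for drawing purposes, that traces are undirected and that consecutive check-ins at the same POI produce no line; neither choice is stated in the text.

```python
from collections import Counter


def repeated_traces(poi_paths: list[list[str]], min_count: int = 2) -> dict:
    """Count POI-to-POI traces over all tourist paths and keep those seen at least min_count times."""
    counts = Counter()
    for path in poi_paths:
        for a, b in zip(path, path[1:]):
            if a != b:  # assumption: a repeat check-in at the same POI draws no line
                counts[tuple(sorted((a, b)))] += 1  # assumption: direction ignored when drawing
    return {trace: n for trace, n in counts.items() if n >= min_count}


paths = [["P1", "P2", "P3"], ["P2", "P1"], ["P3", "P4"], ["P4", "P4"]]
print(repeated_traces(paths))  # {('P1', 'P2'): 2}
```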

The interaction values analysis

This study adopts the improved interaction-value model of Roca Cladera and Moix (2005) to depict the functional interactions among different usages. The first interaction-value model was created by Smart (1974) and was developed from the gravity model. Its advantage is that it explores the functional relation between two areas or objects without the interference of physical distance. It also eliminates distortion caused by differences in the “masses” of objects (i.e. the number of tourists visiting each of the POIs in our case), because each squared flow is divided by the product of the total outflow of the “sender” and the total inflow of the “receiver”. Thus, the model can effectively uncover the interaction between two objects. It has been used in different studies of interactions, such as commuting flows between two areas (Roca Cladera et al. 2009), immigration flows (Dou et al. 2018) and air passenger flows (Burns et al. 2008). It takes the form

$$IV_{ij} = \frac{{f_{{{\text{i}}j}}^{2} }}{{O_{i} \cdot I_{j} }} + \frac{{f_{{j{\text{i}}}}^{2} }}{{I_{i} \cdot O_{j} }}$$
(5)

where \(IV_{ij}\) is the interaction value between categories i and j; \(f_{ij}\) here denotes the directed flow from category i to j (i.e. \(C_{ij}\) in Eq. (4), not the bilateral flow); \(O_{i}\) is the sum of outflows of category i; and \(I_{i}\) is the sum of inflows of category i. Moreover, a statistical threshold was set to delimit the prominent interactions:

$${\text{Prominent}}\;{\text{interaction}}\;{\text{value}} = {\text{Mean}}\;{\text{value}} + 1\;{\text{standard}}\;{\text{deviation}}\;{\text{of}}\;{\text{IVs}}$$
(6)

where the mean value and standard deviation are computed over the whole matrix of interaction values.
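Eqs. (5) and (6) can be sketched together as below. The input `C` is a hypothetical directed connection matrix (rows are origins, columns are destinations) with made-up counts; the population standard deviation, the inclusive cut-off, and reporting each unordered pair once (diagonal included) are assumptions rather than documented choices.

```python
import statistics


def interaction_values(C: list[list[float]]) -> list[list[float]]:
    """Eq. (5). C[i][j] is the directed number of moves from category i to category j."""
    n = len(C)
    out_tot = [sum(row) for row in C]                            # O_i
    in_tot = [sum(C[i][j] for i in range(n)) for j in range(n)]  # I_j

    def term(a: int, b: int) -> float:
        # C_ab^2 / (O_a * I_b); treated as 0 when a total is 0 (assumption).
        denom = out_tot[a] * in_tot[b]
        return C[a][b] ** 2 / denom if denom else 0.0

    return [[term(i, j) + term(j, i) for j in range(n)] for i in range(n)]


def prominent_pairs(iv: list[list[float]], names: list[str]) -> list[tuple[str, str, float]]:
    """Eq. (6): keep pairs whose IV is at least mean + 1 SD of the whole matrix."""
    values = [v for row in iv for v in row]
    cutoff = statistics.fmean(values) + statistics.pstdev(values)
    n = len(iv)
    return [(names[i], names[j], iv[i][j])
            for i in range(n) for j in range(i, n) if iv[i][j] >= cutoff]


names = ["Hotel", "Transport", "Gym"]
C = [[0, 4, 0],
     [4, 0, 1],
     [0, 1, 0]]
iv = interaction_values(C)
print(prominent_pairs(iv, names))
```

Because each term is divided by the sender's outflow and the receiver's inflow, a pair of small categories can still obtain a high value, which is the scale-independence the text describes.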

To visualize the functional proximity between usages, we used PROXCAL multidimensional scaling (MDS) to reduce the original matrix of interaction values to two dimensions. MDS is a fast way to visualize the degree of similarity between objects; here, similarity is understood as the strength of the interaction values between usages. The closer two usages are to each other, the stronger their functional relationship, and the closer a usage is to the center of the graph, the stronger its relationship with all other usages.

Comparison between spatial proximity and the functional proximity

To further analyze the difference between the functional proximity and the spatial proximity of POIs, we plotted the average geographic distance and the “functional distance” of each type of POI. The functional distance is represented as the distance to the origin of the PROXCAL plot:

$$Functional\_dist_{i} = \sqrt {PX_{i}^{2} + PY_{i}^{2} }$$
(7)

where (\(PX_{i}\), \(PY_{i}\)) are the coordinates of usage i on the two-dimensional PROXCAL plot. The spatial distance is calculated as the Euclidean distance from the weighted median center of all POIs with tourist check-ins. Since Foursquare check-ins are highly concentrated in Barcelona city (Fig. 1), it is reasonable to use the weighted median center as the central point for approximately measuring the spatial distribution of different types of POIs. The advantage of the median center is that it indicates the spatial central tendency while being robust to outliers. Following the ArcGIS algorithm, the weighted median center (Burt and Barber 1996; Kuhn and Kuenne 1962) is computed iteratively using the distance:

$$D_{p}^{t} = \sqrt {(X_{p} - X^{t} )^{2} + (Y_{p} - Y^{t} )^{2} }$$
(8)

where (\(X_{p} , Y_{p}\)) are the geo-coordinates of POI p, (\(X^{t}\), \(Y^{t}\)) is the candidate median center at step t of the iterative process, and \(D_{p}^{t}\) is the distance between the candidate center and POI p at step t. Each POI is weighted by \(Z_{p}\), its number of tourist check-ins, when the candidate center is updated. The final median center minimizes the weighted sum of Euclidean distances to all POIs. The average geographic distance of each type of POI is then the total distance of the POIs of that type divided by their number:

$$Spatial\_dist_{i} = \frac{1}{{N_{i} }}\mathop \sum \limits_{k = 1}^{{N_{i} }} \sqrt {\left( {X_{{P_{ik} }} - X_{M} } \right)^{2} + \left( {Y_{{P_{ik} }} - Y_{M} } \right)^{2} }$$
(9)

where (\(X_{{P_{ik} }}\), \(Y_{{P_{ik} }}\)) are the geo-coordinates of the k-th POI of usage type i, (\(X_{M}\), \(Y_{M}\)) is the median center, and \(N_{i}\) is the total number of POIs of type i.
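The geometric steps of this comparison can be sketched as follows. The median-center function is a standard Weiszfeld-type iteration, used here as a self-contained stand-in for the ArcGIS tool rather than its actual code: it starts at the weighted mean, reweights POIs by the inverse of their distance at each step, and stops when a POI coinciding with the candidate outweighs the pull of the others or the update stalls. The full Vardi–Zhang correction for coincident points is omitted, and all names and sample values are hypothetical.

```python
import math
import statistics

Point = tuple[float, float]


def weighted_median_center(points: list[Point], weights: list[float],
                           max_iter: int = 1000, tol: float = 1e-9) -> Point:
    """Weiszfeld-type iteration for the point minimising the weighted sum of distances."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total  # start at weighted mean
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    for _ in range(max_iter):
        num_x = num_y = denom = pull_x = pull_y = w_here = 0.0
        for (px, py), w in zip(points, weights):
            d = math.hypot(px - x, py - y)  # D_p^t in Eq. (8)
            if d < tol:
                w_here += w  # POI located at the candidate itself
                continue
            num_x += w * px / d
            num_y += w * py / d
            denom += w / d
            pull_x += w * (px - x) / d
            pull_y += w * (py - y) / d
        # Stop if every POI sits on the candidate, or a POI there outweighs the pull of the rest.
        if denom == 0.0 or (w_here > 0 and math.hypot(pull_x, pull_y) <= w_here):
            return x, y
        nx, ny = num_x / denom, num_y / denom
        if math.hypot(nx - x, ny - y) < tol:
            return nx, ny
        x, y = nx, ny
    return x, y


def spatial_dist(pois: list[Point], center: Point) -> float:
    """Eq. (9): mean straight-line distance from the POIs of one usage to the median center."""
    return statistics.fmean([math.hypot(px - center[0], py - center[1]) for px, py in pois])


def functional_dist(px: float, py: float) -> float:
    """Eq. (7): distance of a usage from the origin of the 2-D PROXCAL plot."""
    return math.hypot(px, py)


center = weighted_median_center([(0.0, 0.0), (10.0, 0.0)], weights=[1, 5])
print(center)  # approximately (10.0, 0.0): the heavier POI dominates
print(spatial_dist([(3.0, 4.0), (0.0, 0.0)], (0.0, 0.0)))  # 2.5
print(functional_dist(3.0, 4.0))  # 5.0
```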

Results

The spatial characteristics of tourist trajectories

Figure 3 depicts the spatial distribution of tourists’ check-ins in Barcelona. The place with the highest concentration of tourists is the airport, which accounts for about 8% of all check-ins. Of all check-ins, 77% were concentrated in the municipality of Barcelona, while nearby municipalities shared the remainder.

Fig. 3 Distribution of tourist check-ins. Source: Authors

Regarding the specific locations of tourist activity, 64% of POIs have no records from tourists, suggesting that the range of tourists’ activity is limited. Only 18 POIs had more than 100 check-ins (Fig. 4), and all of them except the airport are located near the city center. Figure 4 also overlays the aggregated flows of tourist trajectories and the restaurant POIs that have more than two check-ins: purple lines are tourist trajectories, large white dots are the most popular POIs, and small white dots are restaurant POIs.

Fig. 4 Major tourist spatial flows and popular POIs. Source: Authors

Tourist attractions are clearly important nodes driving the spatial flows of visitors. La Sagrada Família, Casa Milà, Casa Batlló and Park Güell are among the most famous heritage sites of Barcelona, while Camp Nou is a world-famous football stadium. These POIs were also on the official list of the top 15 most-visited places in Barcelona during 2012–2015 (Table 5). Hence, the hot spots of Foursquare POIs match the official data, suggesting that tourist attractions organize the spatial movement of tourists.

Table 5 Number of visitors to major attractions in Barcelona.

Plaça de Catalunya ranks second in check-ins after La Sagrada Família. It is one of the most important transport hubs and public spaces in Barcelona city, and it connects to the historical center. The Apple store, the main building of El Corte Inglés (the main department store in Spain), and the Hard Rock Café are located around Plaça de Catalunya. These locations probably serve as meeting points for tourists because of their prominent position and/or buildings. Hence, Plaça de Catalunya can be considered another activity center for tourists. Two other commercial places received high numbers of check-ins. Maremagnum is a shopping mall near the Aquarium of Barcelona that provides multiple services for tourists. Fira Gran Via Barcelona is an international conference center that hosts many important events every year, such as the Mobile World Congress.

The Matrix of usage-flows

Fig. 5 shows the heat map of the matrix of usage-flows. It is a symmetric matrix that displays the bilateral flows of usages; each cell represents the number of direct connections between two categories or within the same category. Restaurant–Transport, Restaurant–Hotel, and Restaurant–Outdoor Resorts have the highest flows, each with over 400 connections. Transport has its highest flow with itself.

This result fits the spatial distribution of check-ins and also reveals some relationships among categories. First, the usages with higher flows, such as Transport, Tourist attractions and Plaza, also appear among the popular venues. Meanwhile, as Fig. 4 shows, restaurant POIs are mainly concentrated along Passeig de Gràcia and La Rambla, which are major pedestrian routes lined with many tourist attractions. Second, the matrix shows the typical usages of tourist activity: Restaurant, Transport, Hotel and Outdoor Resorts are more intensely connected with each other than with non-tourist locations such as Gyms, Opera, Concert, Cinema (OCC) and Workplace (Fig. 5).

Fig. 5 Heat map of the usage-flow matrix. Source: Authors

Interaction values

Similarly, the bilateral interaction values are visualized via a heat map (Fig. 6). The highest interaction value is between Hotel and Transport, followed by Outdoor Resorts–Museum, Arts and Historical place (MAH), and Restaurant–Hotel. This result indicates that tourists tend to move directly between Transport and Hotel more than between other categories. The interaction between MAH and Outdoor Resorts is also stronger than that between most other categories. Compared with the matrix of flows, the matrix of interaction values clearly shows the functional proximity between two categories of usages, without distortion from the scale of flows. For example, Outdoor Resorts–MAH does not have a high volume of flows, but its interaction value ranks second (Table 6).

Fig. 6 Heat map of the interaction values matrix. Source: Authors

Table 6 Prominent interaction values.

PROXCAL-MDS reduces the original 22 × 22 matrix of interaction values to two dimensions (Fig. 7). Hotel, Transport and Restaurant have the closest functional interaction with each other, while Outdoor Resorts, MAH and Plaza form a second close group. Both groups are located in the central part of the graph, indicating that they dominate tourist flows. Usages located at peripheral positions, such as Educational places, Gym, and Workplace, have minimal relations with other usages. This may imply that our tourist sample contains different kinds of temporary visitors to the city: the first clearly attracted by heritage, cultural and leisure venues; the second more linked to places intensively used by the local population.

Fig. 7 PROXCAL plot of the interaction value matrix. Source: Authors

Next, we apply the chosen threshold (Formula 6) to identify usages with prominent interaction values (Table 6). A total of 25 pairs of usages qualified. Residential place and Workplace also appear on the list; however, they connect only with Services and Transport, respectively. Residential may appear on the list because of trips to visit friends and relatives, or because of errors in the classification of tourists. According to the official survey, business travel accounted for 40% of all visitors in 2014, so it is plausible that Transport–Workplace appears on the list.

To depict the main relations simply, Fig. 8 shows the prominent interaction values as a network in which nodes are usage categories and edges are interaction values. The graph is drawn with the NetworkX library using the Fruchterman–Reingold force-directed algorithm. Because this layout arranges nodes for readability (for example, by reducing edge crossings) rather than to encode distances, the positions of nodes and the distances between them have no specific meaning, although nodes with fewer edges tend to be placed at the periphery. Edge values are distinguished by color: black lines represent values equal to or above 0.0216, the mean of all prominent interaction values, and grey dashed lines represent values below 0.0216. This graph embodies the basic tourist activities: eating, travelling, visiting, shopping and staying overnight. Categories that do not correspond to typical tourist places are at the periphery. “Restaurant” has the largest number of connections with other categories; it is the central vertex of this network.

Fig. 8 Paired usages with prominent interaction values (Source: Authors)

Comparison between spatial proximity and functional proximity

As an example, Fig. 9 depicts the location of the median center and the distributions of Restaurant, Transport (excluding the airport), Outdoor Resorts and Hotel POIs. Transport POIs are the sparsest and lie far from the center. Restaurants and hotels are concentrated in the city center, close to the median center, whereas Outdoor Resorts POIs are comparatively farther away.
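The text does not state how the median center was computed. One common definition in spatial statistics is the point that minimizes the total Euclidean distance to all features (the geometric median), which can be approximated with Weiszfeld's iterative algorithm. The sketch below assumes that definition and projected (metric) coordinates; if the study used another definition, such as the coordinate-wise median, the result will differ.

```python
import math


def total_distance(center, points):
    """Sum of Euclidean distances from `center` to every point."""
    cx, cy = center
    return sum(math.hypot(px - cx, py - cy) for px, py in points)


def median_center(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median of (x, y) points with Weiszfeld's
    algorithm. Assumes projected coordinates (e.g. metres), not lon/lat."""
    # Start from the mean centre
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Simplification: skip a point the iterate sits on to avoid
                # division by zero (robust variants treat this case properly)
                continue
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:  # all points coincide with the current iterate
            break
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y
```

Unlike the mean center, this point is pulled only weakly by distant outliers, which is presumably why a median center is preferred for skewed POI distributions such as the Transport category.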

Fig. 9 The spatial location of the median center and four types of POIs (Source: Authors)

Figure 10 plots, for each type of POI, the functional distance based on the PROXCAL plot (Y-axis) against the average spatial distance to the median center. For easier reading, the average spatial distance is rescaled by min–max normalization. The plot shows that Transport POIs have the largest average spatial distance to the median center, yet Transport is highly important in tourist-functional relations: Hotel–Transport has the highest interaction value of all category pairs. Conversely, the Tourist information center is nearest to the median center, but it is functionally distant because few check-ins were generated there. One possible explanation is that tourists are reluctant to check in at places that only provide practical information.
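The two axes of Fig. 10 can be sketched as follows, under stated assumptions: spatial distance is the mean Euclidean distance of a category's POIs to the median center, rescaled with min–max normalization, and functional distance is taken here as the distance of the category's point in the two-dimensional PROXCAL configuration to the centroid of that configuration. The second definition is an assumption, and the function names are hypothetical.

```python
import math


def mean_distance_to(center, points):
    """Average Euclidean distance from a category's POIs to `center`."""
    cx, cy = center
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)


def min_max(values):
    """Rescale a {category: value} dict to [0, 1] via (v - min) / (max - min)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: ((v - lo) / span if span else 0.0) for k, v in values.items()}


def functional_distance(coords):
    """Assumed definition: distance of each category's 2-D PROXCAL coordinate
    to the centroid of the whole configuration."""
    n = len(coords)
    cx = sum(x for x, _ in coords.values()) / n
    cy = sum(y for _, y in coords.values()) / n
    return {k: math.hypot(x - cx, y - cy) for k, (x, y) in coords.items()}
```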

Fig. 10 Comparison between functional proximity and spatial proximity (Source: Authors)

Secondly, some important tourist attractions, such as La Sagrada Família, Park Güell and Camp Nou, lie in a “peripheral” area of tourist activities. However, as Fig. 4 shows, these attractions play a pivotal role in the spatial flows of tourists, which highlights their functional importance. Although Public services appears at a position similar to that of Outdoor Resorts, the two categories lie in opposite directions in the PROXCAL plot (Fig. 7), so they are functionally distant from each other. Similarly, Residential places lies close to the PROXCAL center but far from Outdoor Resorts, Transport and other categories.
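The remark about Public services and Outdoor Resorts follows from the triangle inequality. Writing $x_i$ and $x_j$ for the positions of two categories in the two-dimensional PROXCAL configuration and $c$ for its center (notation introduced here for illustration), equal distance from the center does not make two categories close to each other:

$$
\lVert x_i - c \rVert = \lVert x_j - c \rVert = r
\;\Longrightarrow\;
0 \le \lVert x_i - x_j \rVert \le \lVert x_i - c \rVert + \lVert c - x_j \rVert = 2r .
$$

The upper bound $2r$ is attained when $x_j = 2c - x_i$, that is, when the categories lie in opposite directions from the center. Distance to the center therefore reflects a category's overall functional importance, whereas functional proximity between two categories has to be read from their pairwise distance.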

Discussion and conclusions

This research analyzes the tourist-functional relations of POIs in Barcelona and adopts a new method to distinguish tourist from non-tourist users of an LBSN. Consistent with previous research, the delimitation of tourists relies on the definition of a tourist, in which the duration of stay is the key variable: the longer a person stays in one place, the more traces tend to be recorded. Hence, the unsupervised method (K-means clustering) can identify users with extreme attributes. However, for users without typical characteristics, manual intervention is unavoidable.
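The clustering step can be illustrated with a plain implementation of Lloyd's K-means. The single feature used here, length of stay in days, and the sample values are hypothetical stand-ins for the study's actual user attributes, which are described earlier in the paper; with several features on different scales, standardizing them first would be advisable. Deterministic farthest-point initialization is used so that the example is reproducible.

```python
import numpy as np


def kmeans(x, k=2, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.
    x: (n_samples, n_features) array-like. Returns (labels, centroids)."""
    x = np.asarray(x, dtype=float)
    # Initialise: first sample, then repeatedly the sample farthest from
    # its nearest already-chosen centroid
    centroids = [x[0]]
    while len(centroids) < k:
        dists = np.linalg.norm(x[:, None, :] - np.array(centroids)[None, :, :], axis=-1)
        centroids.append(x[np.argmax(dists.min(axis=1))])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each sample
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
        labels = np.argmin(dist, axis=1)
        # Update step: mean of assigned samples (keep old centroid if empty)
        new = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical stay lengths in days: short visits vs long-term presence
stay_days = [[2], [3], [4], [250], [300], [365]]
labels, centroids = kmeans(stay_days, k=2)
```

Users whose values fall between the two centroids are exactly the ambiguous cases for which manual inspection is still needed.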

Secondly, this study confirms that the functional centroids of tourist activities differ from the centroids of their spatial distribution. The spatial distribution of tourist activities is concentrated at the airport and in the central area of Barcelona, where most of the landmarks are located. In essence, the spatial relations of tourist activities are determined by the locations of tourist attractions, since tourists by their nature tend to check in around them. However, their core functional usages are driven by daily habits. Eating out is one of the most important activities, and thus Restaurant POIs become the hinge of all functions.

Thirdly, the tourist-functional proximity of POIs highlights the major nodes among places with tourism functions. As Fig. 8 shows, a chain of functional places emerges when the threshold for interaction values is raised. Restaurant, Transport, Outdoor Resorts, Hotel and similar categories are the places that serve tourists' basic functions. Moreover, these functional proximities are distinct from the spatial proximities of POIs: closer spatial proximity between two categories of POIs does not imply closer functional proximity.

Furthermore, the interaction values depict the main patterns of tourist flows. For example, Outdoor Resorts has a high interaction value with Museums, Arts and Historical Places (MAH), indicating that tourists tend to travel directly from one tourist attraction to another. Transport–Hotel has the highest interaction value, probably because people usually need to drop off their luggage at the hotel when they arrive and carry it from the hotel to transport hubs when they leave.
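The link between individual movements and category-level flows can be sketched by counting transitions between each user's consecutive check-ins. This is a simplified, assumed version for illustration only: the study's interaction value is defined by its own formula earlier in the paper, whereas the sketch below treats travel from A to B and from B to A as one unordered pair and normalizes by the total number of transitions. The function name and example records are hypothetical.

```python
from collections import Counter, defaultdict


def category_flows(checkins):
    """checkins: iterable of (user_id, timestamp, category).

    Counts transitions between each user's consecutive check-ins, merging
    A->B and B->A into one unordered pair, and returns each pair's share of
    all transitions (a hypothetical normalisation, not the paper's formula)."""
    by_user = defaultdict(list)
    for user, ts, category in checkins:
        by_user[user].append((ts, category))
    counts = Counter()
    for visits in by_user.values():
        visits.sort()  # chronological order per user
        for (_, a), (_, b) in zip(visits, visits[1:]):
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# Invented trajectories: arrival, hotel, dinner, hotel, departure
records = [
    ("u1", 1, "Transport"), ("u1", 2, "Hotel"), ("u1", 3, "Restaurant"),
    ("u1", 4, "Hotel"), ("u1", 5, "Transport"),
    ("u2", 1, "Transport"), ("u2", 2, "Hotel"),
]
flows = category_flows(records)
```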

Admittedly, Foursquare data are biased. Because Foursquare's purpose is to provide users with practical information about places, a high proportion of check-ins fall in the Restaurant category. However, the average number of check-ins at a restaurant is only 0.82, while the mean numbers of check-ins at Outdoor Resorts and MAHs are much higher. We therefore consider that this bias does not substantially affect the general results of this study. Moreover, the results reflect the typical land uses of tourists in a city, such as hotels, transport and tourist attractions, and these categories show a higher intensity of flows. Overall, Foursquare POIs are able to reflect tourist activities to some degree.

In summary, this paper reveals how tourist POIs functionally interact with each other. The method for evaluating tourist-functional relations could be extended to other groups of people, e.g. a comparison between locals and tourists based on the same functional analysis. Moreover, the spatial distribution found in our research coincides with the pattern of tourist distribution in some other cities, as described in the literature review. This result is also similar to the conclusion of Béjar et al. (2014), which used Instagram data, suggesting that different datasets may be comparable as long as their volume is substantial.

It should be noted that limitations of the data themselves restrict the scope of the present study. Personal profiles had been removed from the database we received because of potential privacy issues, so this study cannot analyze the backgrounds of users. For example, we cannot use a user's profile to identify their home country or to examine whether the Foursquare behavior of foreign visitors differs from that of domestic visitors. Secondly, as Foursquare has declined in popularity, the availability of its data in Barcelona has shrunk, and the representativeness of more recent Foursquare data is questionable. New data sources from currently popular LBSNs, such as Instagram and Twitter, will be necessary to examine and understand the latest dynamics of tourists. Last but not least, the availability of Wi-Fi may affect the density of check-ins. Although most public spaces have free Internet access provided by the City Council and private firms, there is a small risk of overrepresentation in the best-serviced premises, such as the Airport.