1 Introduction

Today’s metropolises, with complex transport networks and numerous places for leisure activities, pose great challenges to individuals who wish to organize and participate in joint activities. In this study, we consider as joint leisure activities all out-of-work activities involving the participation of two or more travelers.

Transport for London [1] reported that 29.2 % of all daily trips were related to leisure activities, while 28 % were conducted for shopping and personal business and 10.7 % for other activities including escort. Similar results were observed in the New York Regional Travel survey [2]. Given the surveys’ insights, it is evident that almost 70 % of all conducted trips (typical weekday trips = 2.51 \(\times \) number of inhabitants in the city of London) can strengthen agents’ interpersonal relations if general, non-recurrent trips are shifted to joint leisure trips. Such a shift is expected to promote interpersonal relations among agents by increasing the number of physical meetings and improving the planning efficiency of out-of-work activities.

Fixed trips with recurrent characteristics (i.e., trips to work/school) can be easily recorded, enabling the central transport authority or the individual agent to ease congestion or reduce traveling times by altering the transport/working schedules or shifting the departure times, respectively. In contrast, leisure trips have a non-recurrent nature, which complicates the implementation of policy measures for congestion relief beforehand.

In the case of recurrent trips, travelers observe the repeated congestion patterns, since they confront them on a daily basis while traveling over similar areas, and adapt their schedules in order to reduce their waiting times. For instance, travelers are well-informed regarding the traffic conditions for trips to/from work due to their prior experience of traversing the same path on a daily basis, while they are less aware of the feasible set of trip-selection options when planning their out-of-work or other non-recurrent trips. Consequently, the lack of information yields three main inefficiencies:

  • Fluctuation of Travel Demand: Out-of-work trips cannot be easily predicted from the central operator since travelers’ actions cannot be forecasted and vary heavily from day to day

  • Interpersonal Activities Loss: Not aware of the daily schedules of other individuals, one examined agent is either not able to schedule a joint leisure activity or schedules an inefficient one with high opportunity cost and limited participants

  • Trip Selection Inefficiency: Agents enumerate a number of possible trips and select a most-preferred option via simple permutation or perceived utility-maximization without holding perfect information during the decision-making process

At this point, it should be stated that the individual-level planning of trips in metropolitan areas cannot be perceived as fully inefficient since it is based on a perceived utility-maximization approach; however, the lack of perfect information in the decision-making phase affects the efficiency of trip selection while attempting to maximize the utility function. Failure to construct a utility function which corresponds to the real-world conditions leads to the maximization of an ill-defined problem.

In this paper, it is assumed that the utility function is perceived correctly if the individual is well-informed during the trip-selection via holding information over three separate dimensions (refer to Fig. 1):

  • The current traffic on the road network and delays on public transport services

  • The exact location of all places of interest for leisure activities in the examined metropolis (i.e., location of bars, restaurants, cinemas)

  • The daily schedules and the preferences of all friends and acquaintances with whom a joint leisure activity can be organized (the degrees of freedom might differ depending on the social network of the examined agent)

Fig. 1 The three dimensions of information flow for trip selection at the individual level

Several attempts have been made to define special laws to model and explain the movement of people (refer to [3]). However, the lack of information at the trip-selection phase hinders the maximization of utility. At this stage, the utilization of new information streams can be seen as a valuable resource for improving the awareness of agents over the three dimensions of the decision-making process. As an example of improving a service by raising the level of information dissemination, early research based on a survey of bus riders demonstrated various positive effects, such as increased ridership and traveler satisfaction, attributed to enhanced information availability. It pointed out that easy access to relevant travel information is a decisive factor for the success and adoption of public transport systems (refer to [4]).

Given the above, this study examines the state-of-the-art in the area of non-recurrent trips which can be turned into opportunities for joint leisure activities with the use of insights from user-generated data. Attention is given to studies utilizing near real-time user-generated data (i.e., data from smartphones, smartcards, PDAs) for tackling transportation problems. The aim is to formulate the problem of optimizing leisure travel considering the provision of user-generated data, present the direction of the state-of-the-art, understand why user-generated data has not been used for increasing the volume and the efficiency of joint leisure activities, and propose actions to move in this direction.

In Sect. 2, the utilization of user-generated Cellular Data (CD) in transportation problems is examined. In Sect. 3, we investigate works utilizing Social Media (SM) data and in Sect. 4 works in the area of Smart Card (SC) data. In Sect. 5, the use of Geo-location data via personal navigators and smartphones is examined. In Sect. 6, a problem formulation for the joint leisure travel optimization with the use of user-generated data is proposed. Finally, the use of data from personal navigators is examined and a detailed catalog with future work directions is presented.

2 Utilizing Cellular Data in Transportation Problems

Cellular data is the form of user-generated data which has been studied the most for predicting individuals’ mobility patterns, even though mobile tracking via cell towers is not as accurate as satellite-based positioning. Regardless of the posed challenges, cellular data has been utilized to improve the understanding of human mobility and to develop individual-level models for capturing the mobility and activity habits of individuals.

The most common individual-level models for predicting the mobility of individuals (which are not based solely on spatio-temporal travel pattern recognition) are the activity-based models (refer to [5–8]). Those models are the basis for forecasting individuals’ daily trip schedules from cellular data and perceive each trip as a means to participate in pre-scheduled activities.

In the literature, Musolesi and Mascolo [9] utilized cellular data logs for correlating the mobility patterns of an individual with the mobility patterns of his/her friends and acquaintances. The underpinning theory of the correlation process includes the assumption that users’ travel patterns depend not only on time and space, but also on the travel patterns of other individuals inside their social network. The findings of the research showed that the mobility patterns of one examined agent can be predicted more accurately when the mobility patterns of his/her social network are considered as explanatory variables. De Domenico et al. [10] also worked in the same direction using data from the Nokia Mobile Data Challenge dataset. The work of Musolesi and Mascolo [9] can provide some evidence for the theoretical concepts developed by Carrasco et al. [11], Arentze and Timmermans [12] and Chen et al. [13] on predicting agents’ mobility based on their social networks. Those theoretical concepts place the traveler at the center of decision-making (ego-centric approaches) and offer a new framework for microsimulation, while harvesting large-scale user-generated data is expected to facilitate their implementation.

In addition, Calabrese et al. [14, 15], González et al. [3], Zhang et al. [16], Pan et al. [17] and White and Wells [18] utilized cellular data for predicting the mobility patterns of individuals in urban scenarios over time and space. Those studies, including the studies of White and Wells [18] and Djuknic and Richton [19], attempted to exploit the emergence of cell tower positioning and the market penetration of mobile phones by developing methods for estimating the origin-destination (OD) matrices in study areas. In the same way, Sohn and Kim [20] used cellular communication systems and cell phone towers to transfer information and estimate OD matrices. To give a practical example, in the work of Calabrese et al. [14], an algorithm was developed for estimating a population’s travel demand in terms of ODs by aggregating the trips of individual mobile phone users in the Boston Metropolitan area. During the validation, it was also shown that the OD flows correlated well with the US Census estimates.
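The aggregation step behind such OD estimates can be illustrated with a minimal sketch. It is not the algorithm of Calabrese et al. [14]; it only assumes that each trip has already been reduced to a hypothetical (origin zone, destination zone) pair and counts the trips per pair.

```python
from collections import Counter

def od_matrix(trips):
    """Count trips per (origin_zone, destination_zone) pair.

    trips: iterable of (origin_zone, destination_zone) tuples; the zone
    labels are illustrative placeholders, not taken from any dataset.
    """
    return Counter((origin, destination) for origin, destination in trips)

# Illustrative trips of several phone users between three zones.
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]
od = od_matrix(trips)
```

Pairs that never occur simply have a count of zero, so the sparse `Counter` stands in for the full matrix.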

The limits of predictability in human dynamics were also analyzed and evaluated by Song et al. [21] through the mobility patterns of mobile phone users. More recently, Dong et al. [22] and Wu et al. [23] proposed a methodology for using mobile phone data to analyze the mechanism of trip generation, trip attraction and the OD information, with a pilot study in Beijing which used the K-means clustering algorithm to divide the traffic zones. In addition, Ohashi et al. [24] worked on a method for separating trips (capturing the starting and ending points of a trip) on the basis of GPS data collected from smartphones. The method takes into account that, even when the subject stays in one place, the collected GPS coordinates are not always exactly the same because of the surrounding environment; a precision of 81 % and a recall rate of 62 % were reported. Apart from detecting departure and arrival times, methods for automatically classifying modes of transportation on the basis of smartphone GPS data were also proposed by Ohashi et al. [25].

Finally, in another set of studies from [26, 27], Bluetooth devices were distributed to people to collect mobility data and study the characteristics of co-location patterns among them.

To summarize, works on utilizing cellular data in transportation have been focused on different areas:

  • Estimating the OD matrix in a study area

  • Exploring the mobility patterns of one individual based on the mobility patterns of his/her social network

  • Extracting the current mode of transportation

  • Extracting the starting and ending time of a trip

However, to our knowledge there is no work in the area of activity-participation analysis which can facilitate the development of new applications for suggesting common activities to users with social ties, based on their willingness to participate simultaneously in similar activities at nearby locations.

3 User-Generated Data from Social Media and Its Applications in Transportation

The research on data from social networks on understanding users’ mobility is in its early stages. The first studies focused on the power of micro-blogging on offering near real-time insights on crisis events when all other means of communication have failed. Routinely, the importance and the volume of the crisis event is captured through the magnitude of micro-blogging messages and their content information. A study from [28] explored crisis informatics using Twitter data after the Oklahoma Moore tornado demonstrating the potential of social media data on extracting relevant information during natural disasters.

In a similar fashion, social media data from social networks like Facebook, Twitter, and the image sharing service, Flickr, have already been used in research works describing crisis or natural disasters such as Virginia Tech shooting ([29]), Southern California wildfires ([30]), major Earthquakes in China ([31, 32]), Red River floods and Oklahoma grassfires ([33]).

In another line of work, Pereira et al. [34] captured the crowd levels during planned special events. In general, local events are not tracked by transport authorities since manual, labor-intensive tracking is needed. Pereira et al. [34] utilized the Internet as a resource for contextual information about special events and developed a model that predicts public transport arrivals in event areas. The results were demonstrated with a case study from the city-state of Singapore using public transport tap-in/tap-out data coupled with local event information obtained from the Internet, performing primitive data fusion.

In another work, Gkiotsalitis and Alexandrou [35] focused on developing and testing analytic techniques for fusing user-generated data from Social Media and smartcards in order to capture the mobility patterns in urban areas. Automatic models were developed for retrieving users’ mobility patterns from historic, user-generated data logs, comparing users’ profiles based on the similarity of their observed mobility patterns, and categorizing users in clusters. During the testing phase, user-generated data from London Smart Card and Social Media users collected between November 2012 and February 2014 were utilized to cluster users based on their mobility-activity pattern similarities. Results showed that it is possible to integrate data logs from multiple sources to capture the main mobility-activity patterns observed in an area. However, the topic of joint participation in non-recurrent activities has not been addressed until now.

Social media have also been used for capturing the activity types performed by users at different locations via advanced spatio-temporal analysis and educated rules (refer to [36]). In the same work, techniques for estimating individuals’ daily schedules and the sequence of activities were developed. Alesiani et al. [37] focused also on the same topic introducing a probabilistic model for modeling individuals’ daily schedules based on input data from several sources (i.e., Social Media, Cellular Data).

Summarizing, social media data which is individualistic in nature has been utilized for:

  • Capturing the volume and the effects of crisis events

  • Estimating individuals’ mobility patterns and correlating them with patterns observed with the use of other datasets

  • Retrieving activity types of users

  • Capturing the arrival times and the expected demand at local events

4 User-Generated Data from Smart Cards

With the deployment of automatic fare collection systems, large-scale data has become available on real-world transport usage ([38]). As more and more sensors have been integrated into public transport infrastructures, large-scale transport data is produced at high rates ([39]). Nonetheless, studies on estimating individual travel patterns with smartcard data are sparse in public transport research compared to studies on cellular data and social media.

In the past, research has mainly focused on aggregate demand forecast ([40]). Based on a gravity model, Smith et al. [41] showed that some of the variation in mobility flows is influenced by distance and population of local residents via analyzing smartcard data, while Ceapa et al. [42] analyzed time series of automated fare collection data to identify events of overcrowding at public transport stations. Morency et al. [43] and Jang [44] also measured the transit use variability with smart-card data.

The potential of smart card data for travel behavior analysis in Britain was studied by Bagchi and White [45], who examined the pensioner concessionary pass in Southport, Merseyside, and the commercially operated scheme in Bradford. It was stated that the nature of smart card data puts an emphasis on concept definition and rules-based processing; limitations, such as the trip lengths which are not recorded by the system, were also recognized. The latter also complicates efforts to perform individual-level analysis and to predict individuals’ daily travel schedules.

Foell et al. [46] utilized travel card data from a large population of bus riders from Lisbon, Portugal. The main intention of the work was to predict the future bus stops accessed by individual riders, and it was demonstrated that accurate predictions can be delivered by combining knowledge from personal ride histories and the mobility patterns of other riders. In another work, Ivanchev et al. [47] utilized smart card data from a bus line in Singapore for developing a modeling platform for testing bus transportation.

Finally, as discussed before, in the work of Gkiotsalitis and Alexandrou [35] a more individual-based approach was considered for clustering users based on the similarities of their mobility patterns as they were retrieved from pattern recognition on their historic smart card data logs. For the case study, data from 200 Oyster card users in London were utilized.

It is evident that smartcard data offers less qualitative information compared to social media or cellular data, generating problems for predicting the daily schedules and the social networks of individuals. Nevertheless, it has great potential for the first dimension of information retrieval: “Capture in real-time the traffic on road networks and the delays on public transport”.

5 Use of Geo-Location Data via Personal Navigators and Smartphones in Transportation

More classic methods on dynamic OD estimation using automatic vehicle identification data can be found in the work of Zhou and Mahmassani [48] and Baek et al. [49]. Schuessler and Axhausen [50], Zheng et al. [51], and Brunauer et al. [52] proposed methods for distinguishing pedestrians, bicycles, cars, buses, and trains on the basis of GPS data only. Stenneth et al. [53] introduced an idea of using GIS information to enhance the accuracy of classification. They utilized information about the real-time location of buses and locations of rail lines and bus stops where they reported that they could improve the classification accuracy by 17 %.

Nitsche et al. [54] and Feng and Timmermans [55] proposed methods that use acceleration data together with GPS data following the work of Wu et al. [56] on estimating individuals’ activity patterns. Wu et al. [56] attempted to estimate the activity patterns of smartphone users. They developed a method for classifying “indoor”, “outdoor static”, “outdoor walking”, and “in-vehicle” status. Similarly, Hato [57] developed a special device, called a behavioral context addressable logger (BCALs), for collecting various kinds of data such as GPS coordinates, acceleration, atmospheric pressure, angular velocity, UV index, direction, and loudness. BCALs can distinguish situations in which smartphone users are classified as “walking”, “up/down-staircase”, “bicycling”, and “in-store”.

There are also studies on trip-separation methods (mainly by capturing the starting and ending time of a trip) by Li et al. [58], Bohte et al. [59], Chen et al. [60] and Li et al. [61]. Among these studies, only Witayangkurn et al. [62] reported an evaluation of a trip-separation method. The basic idea forming the basis of their method is to find the so-called “stay points”. They regard consecutive GPS coordinates as stay points if they satisfy the following two conditions: (i) they fall within a circle with diameter of 196 m; and (ii) the time difference between the first and last stay points is more than 14 min. The key idea behind this stay-point detection is that it eliminates outliers, which can cause mis-detection. This trip-separation method achieved precision of 92.4 % and recall rate of 90.5 %.
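The stay-point rule described above can be sketched as follows. This is a minimal illustration and not the implementation of Witayangkurn et al. [62]: it assumes that the circle is centred on the first point of each candidate group (a simplification), that distances are computed with the Haversine formula, and that the function names and the input layout are hypothetical.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def stay_points(track, diameter_m=196.0, min_minutes=14.0):
    """Detect stays in a track of (timestamp_seconds, lat, lon), sorted by time.

    Returns (first_index, last_index) pairs. A group of consecutive points is
    a stay if all points lie within diameter_m / 2 of the group's first point
    and the group spans more than min_minutes.
    """
    stays, i, n = [], 0, len(track)
    while i < n:
        j = i
        # Extend the group while the next point stays inside the circle
        # centred on the anchor point i (assumption: anchor-centred circle).
        while j + 1 < n and haversine_m(track[i][1:], track[j + 1][1:]) <= diameter_m / 2:
            j += 1
        if track[j][0] - track[i][0] > min_minutes * 60:
            stays.append((i, j))
            i = j + 1  # continue after the detected stay
        else:
            i += 1     # no stay anchored here; try the next point
    return stays

# Four fixes at nearly the same spot over 15 minutes, then a distant fix.
track = [(0, 51.5, -0.12), (300, 51.5, -0.12), (600, 51.50001, -0.12),
         (900, 51.5, -0.12), (1200, 51.6, -0.12)]
stays = stay_points(track)
```

Trips can then be read off as the segments between consecutive stays.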

Given the above, one can conclude that geo-location data from smartphones or personal navigators have been mainly utilized for:

  • Estimating OD matrices

  • Capturing the type of utilized transportation

  • Activity-pattern estimation

  • Separation of trips

6 Optimizing Joint Leisure Travel with BigData

Continuously updated, user-generated data can be utilized to capture less frequent trips and improve the understanding of individuals’ mobility behavior. Collected data from Smartphones, Social Media, personal navigators and Smartcards has an individualistic nature since it is generated by distinct users. In a generalized example, it is assumed that the data footprint generated by an individual at each time instance returns information about the timestamp of data publishing, the utilized transport mode, the geo-location and the user ID.

$$\begin{aligned} \delta _{i,t} = {\left\{ \begin{array}{ll} t\\ \zeta \\ L\\ i\\ \end{array}\right. } \end{aligned}$$

where \(\delta _{i,t}\) is the data footprint generated by agent \(i\), \(t\) the timestamp, \(\zeta \) the transport mode, where \(\zeta \in Z\), and \(L\) the geo-location, where \(L \in \varLambda \) and \(\varLambda \) is the set of geo-locations, each defined by a pair of coordinates.

Fig. 2 Estimating the daily evolution of states over a weekday with the use of probability matrix \(P_{i,t}\)

Following the above notation, the data generated by an individual can offer insights into his/her daily mobility patterns via unsupervised pattern recognition models. Those models can be trained on datasets containing historical data from one individual’s data footprints accumulated over a significant time period (i.e., more than 6 months). The outcome of the pattern recognition phase can be summarized in a probability matrix with spatio-temporal characteristics, \(P_{i,t}[L,\zeta ]\), which returns the probability of individual \(i\) being at location \(L\) and using transport mode \(\zeta \) at time \(t\). Since individuals’ mobility patterns can vary significantly on weekends, two matrices can be assigned to each individual \(i\): one capturing the travel patterns of the user during the week and one during the weekend. Further discretization is allowed and can be decided on a case-by-case basis if certain individuals have significantly different mobility patterns over some days of the week.

Each matrix \(P_{i,t}[L,\zeta ]\) has \([T\times \lambda \times Z]\) elements, where \(T\) is the number of time instances over a day, \(\lambda \) the number of geo-locations in \(\varLambda \) and \(Z\) the set of available transport modes including on-foot travel (its size entering the product). Having calculated one individual’s probability matrix for day type \(k\), the daily mobility plan of the individual can be estimated with deterministic modeling using single-point estimates. For each single-point estimate the matrix \(P_{i,t}\) is utilized and the output is a sequence of states, \(q_{i,\rho }(t)\), where \(\rho \) is the list of feasible states at time \(t\) and day type \(k\) as they are derived from the analysis of the individual’s data footprints \(\delta _{i,t}\) (refer to Fig. 2).
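As an illustration of how such a matrix and its single-point estimates might be obtained, the sketch below replaces the unsupervised pattern recognition step with plain relative frequencies over historical footprints. This substitution, the function names and the input layout are assumptions made for the example only.

```python
from collections import Counter, defaultdict

def probability_matrix(footprints):
    """Estimate P[t][(L, zeta)] for ONE individual and ONE day type.

    footprints: iterable of (time_slot, location, mode) observations.
    Probabilities are relative frequencies per time slot (an assumption
    standing in for a trained pattern recognition model).
    """
    counts = defaultdict(Counter)
    for t, location, mode in footprints:
        counts[t][(location, mode)] += 1
    return {t: {state: c / sum(ctr.values()) for state, c in ctr.items()}
            for t, ctr in counts.items()}

def single_point_schedule(P):
    """Pick the most probable (location, mode) state for each time slot."""
    return {t: max(states, key=states.get) for t, states in sorted(P.items())}

# Illustrative weekday history: three observations at slot 8, one at slot 9.
history = [(8, "home", "foot"), (8, "home", "foot"),
           (8, "cafe", "foot"), (9, "work", "bus")]
P = probability_matrix(history)
schedule = single_point_schedule(P)
```

A separate matrix would be built per day type \(k\) (for instance weekday and weekend) by filtering the footprints before calling `probability_matrix`.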

Apart from the probability matrix, historical, user-generated data can offer insights into the social network of individuals. For instance, the list of friends, acquaintances and common preferences can be retrieved from user-generated data analysis (refer for instance to the released dataset from [63] using data from Facebook.com). Golder et al. [64] also showed that users message only a small number of friends on Facebook (close friends) while they have a large number of declared friends (acquaintances), and Huberman et al. [65] showed that most of the links declared in Twitter are meaningless from an interaction point of view but hidden social networks can be revealed when tracing the spread of ideas. In the same direction, different forms of user-generated data can be utilized to identify the social network of one individual (ego-centric approach) and attach a weight representing the strength of bonds among individuals:

$$\begin{aligned} W_i = {\left\{ \begin{array}{ll} w_{i,j}\\ ...\\ ...\\ w_{i,N}\\ \end{array}\right. } \end{aligned}$$

where \(\{j,...,N\}\) is the set of individuals having social ties with user \(i\) and \(w_{i,j}\ge 0\) is the weight symbolizing the connection strength between users \(i\) and \(j\).

On another note, the preferred undertaken activity of one individual \(i\) at one re-visited location \(L\in \varLambda \) can be estimated after analyzing historical user-generated data. In the work of Gkiotsalitis and Stathopoulos [66] empirical rules for allocating one activity \(A_m \in A\) at one re-visited location \(L\) were defined by categorizing all activities in a discrete set of four (Fixed; Quasi-Fixed; Flexible; Home-related). Hence, each re-visited location \(L\) by one individual \(i\) is associated to one and only one activity \(A_m\):

$$\begin{aligned} i[L] = A_m \in A \end{aligned}$$
(1)

For allocating activities to locations, one can utilize spatio-temporal analysis on historical data. Such an approach has been used in the case of user-generated data from social media ([36]) and cellular data ([67]).

Moving further towards that direction, user-generated data can also provide information on how far one individual can travel to participate at a leisure activity at different day times and day types. For instance, one individual might not be willing to travel more than 500 m at working hours during the week for participating in leisure activities. In an attempt to model one individual’s choice of traveling a certain distance for participating in a leisure activity, a utility function can be defined. After applying a time discretization scheme \(t=\{1,...,T\}\), the choice options can be indexed by \(j=\{1,...,J\}\) where \(F_j\) is the traveled distance between two consecutive activities. The distance between two consecutive activities can be either calculated with the Haversine formula (Great-circle distance) or via the map-based shortest path distance with the use of a shortest path algorithm (refer to [68] for such algorithms).

$$\begin{aligned} j = {\left\{ \begin{array}{ll} 1: F_j\le 250\,\mathrm{{m}}\\ 2: 250\,\mathrm{{m}}< F_j\le 500\,\mathrm{{m}}\\ 3: 500\,\mathrm{{m}}< F_j\le 750\,\mathrm{{m}}\\ 4: 750\,\mathrm{{m}}< F_j\le 1\,\mathrm{{km}}\\ 5: ....\\ 6: ....\\ \end{array}\right. } \end{aligned}$$
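The mapping from a travelled distance \(F_j\) to its choice index \(j\) can be sketched as below. Only the four bins stated explicitly above are implemented; since bins 5 and 6 are left unspecified in the text, the hypothetical helper `distance_choice` returns `None` for distances beyond 1 km rather than guessing their limits.

```python
def distance_choice(f_metres):
    """Return the alternative index j (1..4) for a travelled distance in metres.

    Returns None beyond 1 km, because the upper bins are not specified
    in the text (assumption: they are handled elsewhere).
    """
    if f_metres < 0:
        raise ValueError("distance must be non-negative")
    # Upper limits of bins 1..4: 250 m, 500 m, 750 m and 1 km.
    for j, upper in enumerate((250, 500, 750, 1000), start=1):
        if f_metres <= upper:
            return j
    return None

choices = [distance_choice(d) for d in (120, 250, 251, 740, 1000)]
```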

For each day type, \(k\), an index of satisfaction for participating in different activity types with respect to their distance from the previous location can be defined in the form of a linear utility function:

$$\begin{aligned} V_{tj}(k)=\alpha _j(k)+\beta _j(k) A_t(k) \end{aligned}$$
(2)

where \(A_t(k)\) varies across different times of the day and represents the activity type (i.e., home, fixed, quasi-fixed or flexible) in the form of a categorical variable. In addition, \(\alpha \) is a scalar utility term representing individual’s preference for alternative \(j\).

The utility of traveling distance \(F_j\) for an individual can be described by a random utility model:

$$\begin{aligned} U_{tj}(k)=V_{tj}(k)+\varepsilon _{tj}(k) \end{aligned}$$
(3)

where \(\varepsilon _{tj}(k)\) is the unobserved component of the utility function and can be treated as a random variable since it includes the impact of all the unobserved variables which influence the utility of selecting a specific alternative.

Assuming that the errors are independent and identically distributed and follow a Gumbel distribution, the probability of selecting an alternative \(\lambda =F_j\) at a certain point in time, \(\rho _{t\lambda }(k)\), can be expressed via a multinomial logit model:

$$\begin{aligned} \begin{aligned} \rho _{t\lambda }(k)&=\rho \bigg (V_{t\lambda }(k)+\varepsilon _{t\lambda }(k)\ge \max _{j\in \{1,...,J\}}\big (V_{tj}(k)+\varepsilon _{tj}(k)\big )\bigg )\\&=\dfrac{e^{V_{t\lambda }(k)}}{\sum _{j=1}^{J} e^{V_{tj}(k)}} \end{aligned} \end{aligned}$$
(4)
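The closed form of Eq. 4 is a softmax over the systematic utilities. A minimal sketch, with hypothetical utility values and a hypothetical function name, is:

```python
import math

def logit_probabilities(utilities):
    """Multinomial logit choice probabilities for utilities V_tj(k), j = 1..J."""
    v_max = max(utilities)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - v_max) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative utilities for four distance alternatives at one time step.
probs = logit_probabilities([1.0, 0.5, -0.2, -1.0])
```

Subtracting the maximum utility leaves the probabilities unchanged, since the common factor cancels between numerator and denominator.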

The parameters \(\alpha _j(k)\), \(\beta _j(k)\) can be estimated for each individual as the values that maximize the log-likelihood function:

$$\begin{aligned} \max _{\alpha _j(k),\beta _{j}(k)}\ell \big (\alpha _{j}(k),\beta _{j}(k)\big ) \end{aligned}$$
(5)

resulting in a non-linear optimization problem to which the BHHH optimization algorithm proposed by Berndt et al. [69] can be applied. The coefficient values are updated iteratively, beginning with a starting set of values, and iterations continue until convergence.
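The iterative estimation can be illustrated with a small sketch. Note that it uses plain gradient ascent on the average log-likelihood instead of BHHH, encodes the activity type \(A_t(k)\) as a single 0/1 indicator, and fixes the parameters of the first alternative to zero for identification; all three are simplifying assumptions made for this example.

```python
import math

def fit_logit(observations, n_alternatives, steps=2000, learning_rate=0.5):
    """Estimate alpha_j and beta_j of V_j = alpha_j + beta_j * A by gradient ascent.

    observations: list of (A, chosen_j) with chosen_j in 0..n_alternatives-1.
    Alternative 0 is the reference (alpha_0 = beta_0 = 0).
    """
    alpha = [0.0] * n_alternatives
    beta = [0.0] * n_alternatives
    n = len(observations)
    for _ in range(steps):
        grad_alpha = [0.0] * n_alternatives
        grad_beta = [0.0] * n_alternatives
        for a, chosen in observations:
            v = [alpha[j] + beta[j] * a for j in range(n_alternatives)]
            v_max = max(v)
            exps = [math.exp(x - v_max) for x in v]
            total = sum(exps)
            for j in range(n_alternatives):
                # Gradient of the log-likelihood: indicator minus probability.
                residual = (1.0 if j == chosen else 0.0) - exps[j] / total
                grad_alpha[j] += residual
                grad_beta[j] += residual * a
        for j in range(1, n_alternatives):  # keep the reference at zero
            alpha[j] += learning_rate * grad_alpha[j] / n
            beta[j] += learning_rate * grad_beta[j] / n
    return alpha, beta

# Illustrative data: alternative 1 is chosen 1 of 4 times when A = 0
# and 3 of 4 times when A = 1.
data = [(0, 0)] * 3 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 3
alpha, beta = fit_logit(data, n_alternatives=2)
```

With two alternatives and a binary indicator the model can reproduce the observed choice shares exactly, which makes the toy data a convenient sanity check.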

To summarize, pattern recognition models can be applied to generate some value from user-generated data:

  • Estimate individuals’ daily schedule over different day types via single-point estimates of their state evolution over time

  • Capture the gravity of personal relationships and assign weights to friends and acquaintances of each individual

  • Replicate the decision-making process of each individual and return their willingness to travel certain distances at different times of the day to participate in leisure activities

To optimize joint leisure travel, a time and place for a joint leisure activity which maximizes the gain of all attendees should be defined. To this end, the perceived utility of all individuals participating in an activity at location \(L\in \varLambda \) at each point in time \(t\) over a day should be estimated in order to select the spatio-temporal pair \(L_{*},t_{*}\) which maximizes the perceived utility among all attendees. The computational cost is approximately \(\lambda \times T \times N^2\), where \(\lambda \) is the number of candidate locations, \(T\) the number of discretized time steps and \(N\) the number of individuals (refer to Fig. 4). Although the effect of time discretization on the overall time complexity is linear, a discretization step of 30 min to one hour is proposed to avoid significant increases in computational cost. The time is therefore discretized, and at each step the utility of attending each leisure activity location is computed (refer to Fig. 3).

Fig. 3 Selecting place of interest \(L_{*}\) among a set of locations \(L\in \varLambda \) at time instance \(t\) which maximizes the perceived utility among all attendees

In the problem of estimating the location of a leisure activity and the time of day that maximizes the utility of performing a joint leisure activity, the relationship weights among the attendees can be perceived as positive factors while the required travel distance from each individual’s current location to the meeting place can be perceived as negative.

Let us assume that the probability of individual \(i\) traveling a certain distance \(\tau _{i}\in F_j\) at time instance \(t\) is \(\rho _{tj}(k)\), as already derived from his/her utility-maximization model. Then, a threshold value \(\varUpsilon \) can be introduced: if \(\rho _{tj}(k)< \varUpsilon \) for a place of interest \(L\in \varLambda \) whose distance from the individual’s previous location is within \(F_j\), then location \(L\) is perceived as a non-feasible place for transition. Hence, for each individual \(i\) the distance between his/her current location, \(c\), and a place of interest \(L\in \varLambda \) is calculated. Then, if distance \(\tau (c,L)\in F_j\) and \(\rho _{tj}(k)< \varUpsilon \), or if \(L\) is not a leisure activity location for individual \(i\), the place of interest \(L\) is added to the list of infeasible transitions for time \(t\): \(\phi _{i,t}=\phi _{i,t}\cup \{L\}\). Then, a location \(L_{*}\in \varLambda \) and time \(t_{*}\in T\) form the optimal joint leisure activity set if:

$$\begin{aligned} \{L_{*},t_{*}\}=\text {argmax} (\alpha \sum _{i=1}^{N} \sum _{m=1}^{N} \dfrac{1}{2}w_{i,m}(L,t) - \beta \sum _{i=1}^{N}\tau _{i}(L,t)) \end{aligned}$$
(6)

where:

$$\begin{aligned} w_{i,m}(L,t)&= {\left\{ \begin{array}{ll} w_{i,m}\ge 0 \text {: weight of connection strength between users i, m}\\ 0 \quad \text {if} \quad L\in \phi _{i,t} \quad \text {or} \quad L\in \phi _{m,t}\\ \end{array}\right. } \\ \tau _{i}(L,t)&= {\left\{ \begin{array}{ll} \tau _{i}\ge 0 \text {: the traveled distance between the current location and L}\\ 0 \quad \text {if} \quad L\in \phi _{i,t} \\ \end{array}\right. } \end{aligned}$$

and \(\alpha , \beta > 0\) are the objective function coefficients. Choosing \(\alpha \) large relative to \(\beta \) prioritizes the participation of attendees with strong social ties, while choosing \(\beta \) large relative to \(\alpha \) prioritizes reducing the covered travel distance. Algorithm 1 summarizes the optimization procedure.

Algorithm 1 Optimization procedure for joint leisure travel (figure a)
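As an illustration of Eq. (6), the following Python sketch evaluates the objective for every candidate pair \((L,t)\) and returns the maximizer. It is an exhaustive search written under stated assumptions and is not necessarily the procedure of Algorithm 1: the infeasible sets \(\phi _{i,t}\) are taken as given, travel distances are supplied as a lookup table, and all names (`objective`, `best_activity`, `weights`, `travel`, `phi`) are hypothetical.

```python
from itertools import product


def objective(L, t, weights, travel, phi, alpha, beta):
    """Value of the bracketed expression in Eq. (6) for one candidate (L, t).

    weights[i][m] -- connection strength w_{i,m} >= 0
    travel[i][L]  -- distance from individual i's current location to L
    phi[i][t]     -- set of infeasible locations of individual i at time t
    """
    n = len(weights)
    social = 0.0
    for i in range(n):
        for m in range(n):
            # w_{i,m}(L, t) is zero if L is infeasible for i or for m
            if L not in phi[i][t] and L not in phi[m][t]:
                social += 0.5 * weights[i][m]
    # tau_i(L, t) is zero if L is infeasible for individual i
    distance = sum(travel[i][L] for i in range(n) if L not in phi[i][t])
    return alpha * social - beta * distance


def best_activity(locations, times, weights, travel, phi, alpha=1.0, beta=1.0):
    """Exhaustive argmax over all (L, t); ties go to the first candidate."""
    return max(
        product(locations, times),
        key=lambda lt: objective(lt[0], lt[1], weights, travel, phi, alpha, beta),
    )


# Toy data for two individuals (fabricated purely for illustration)
locations = ["park", "cinema"]
times = [0, 1]
weights = [[0.0, 2.0], [2.0, 0.0]]
travel = [{"park": 1.0, "cinema": 3.0}, {"park": 2.0, "cinema": 1.0}]
phi = [{0: {"cinema"}, 1: set()}, {0: set(), 1: {"park"}}]

best = best_activity(locations, times, weights, travel, phi, alpha=1.0, beta=0.5)
print(best)
```

The search visits \(|\varLambda | \times T\) candidates and sums over all pairs of individuals for each one, so in this sketch the cost grows quadratically with the number of individuals.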

The computational cost of Algorithm 1 for joint leisure travel optimization was tested on a machine with a 2556 MHz processor and 1024 Megabytes of RAM. During testing, the number of locations was \(\varLambda =200\) and time was discretized into ninety-six periods of 15-min duration, \(T=96\). The main variable is the number of friends and acquaintances for which Algorithm 1 computes the location and time of a leisure activity; the resulting computational cost is plotted in Fig. 4, which indicates the number of individuals that can be served within a reasonable time frame. Finally, it should be mentioned that Algorithm 1 runs centrally to avoid unnecessary re-computations. In general, the approach follows a central architecture in which user-generated data is stored centrally, and the travel patterns, lists of friends and acquaintances, and willingness to travel certain distances to participate in leisure activities are estimated after processing the stored data, which enables the implementation of Algorithm 1 at a central level.

Fig. 4 Computational cost considering \(\varLambda =200\) and \(T=96\) for different numbers of individuals (tested on a 2556 MHz processor machine with 1024 Megabytes RAM).

7 Discussion and Conclusions

This survey study investigated how different forms of user-generated data (cellular, social media, smart card and personal navigator data) have been utilized until now, and examined whether these data sources and the techniques developed for them have the potential to increase the efficiency of joint leisure activities in today’s metropolises.

In a first attempt to summarize the results, Table 1 provides aggregated information on the usage of user-generated data from different sources according to the state-of-the-art studies.

Table 1 Aggregated information on the usage of user-generated data from different sources according to the state-of-the-art studies

In the introduction of this survey paper, three information dimensions were considered necessary for an individual to be perfectly informed when making an optimal decision on selecting a joint leisure activity. Table 2 shows which kind of information is expected to be retrievable from the different sources of user-generated data. From Tables 1 and 2 one can observe that, although the full information needed to form an objective function is obtainable, research work has not focused on that direction.

Table 2 Potential of user-generated data on providing information for joint leisure activity planning

The above highlights the importance of developing new models that tap the potential of user-generated data for improving the efficiency of joint leisure activity planning. New models and techniques are recommended to focus on the following:

  • Data processing tools

  • Algorithmic tools for data aggregation and fusion

  • Processing tools that can calculate the maximum of the utility function and return an optimal joint leisure activity to the traveler

Considering those issues, we tried to formulate the problem of optimizing joint leisure travel by taking into consideration the special characteristics of user-generated data. The proposed formulation is flexible and can handle inputs even after data fusion since it requires a minimum information set (UserID; Timestamp; Geo-location; transport mode). The proposed algorithm is also designed to ensure scalability by enabling the computation of leisure travel optimization for up to \(30\) individuals in less than \(20\) minutes considering a 15-minute time discretization and up to 200 locations to choose from.

Proceeding in this direction, around 70 % of the total number of trips in metropolitan areas could be planned more efficiently and the number of interpersonal activities could increase substantially, yielding considerable gains for both the individual traveler and the central transport authorities.