Introduction

In location-based social networks (LBSNs), users are allowed to provide location data to traditional social networks, fostering several location-based services, e.g., Foursquare, Gowalla, and Geolife. It is very convenient for users to record and share their life experiences with such applications on mobile devices. With advances in location acquisition and wireless communication technology, location-based social networks are increasingly important in people’s everyday life. One of the most important challenges in LBSNs, personalized route query aims to provide an optimal route that passes through the locations specified by the query user and satisfies the travel distance constraints. This topic has recently become popular and has a wide range of applications. In addition, personalized route query has an important application in the field of cognitive computation that helps individuals decide on the optimal route plan. Cognitive computation is a multidisciplinary field of research aiming at devising computational models and decision making mechanisms [3, 15, 34, 36, 46]. The objective of cognitive computational models is to endow computer systems with the faculties of knowing, thinking, and feeling.

In the literature on personalized route query, Li et al. [26] proposed the trip planning query (TPQ) to find the shortest path from the starting point to the destination through several venues that can cover the category requirements of the query user. Cao et al. [4] proposed a novel keyword-aware optimal route query (KOR) able to find an optimal route covering a set of user-specified keywords and satisfying a specified budget constraint while optimizing the route’s objective score. Zeng et al. [45] studied optimal route searching with the coverage of user preferences. Additionally, most studies have focused on preferred route recommendation [25, 44]. The information provided by an untrusted friend may not influence certain behavioral decisions (personalized route query). We study the personalized route query by incorporating trust that has been neglected by the existing studies. Following the existing work [9], social trust as a probability value of one user relative to another is accepted by the great majority of authors and adopted in our work. Social trust measures the credibility between two persons and has been widely used in providing personalized recommendations [1, 24, 33, 41]. However, most studies focus on social tie recommendations.

In this paper, we propose a new type of personalized route query by incorporating social trust in LBSNs, called s ocial t rust aware p ersonalized r oute q uery (STPRQ). Specifically, given a set of venue categories (such as a restaurant, a gym, and a cafe), STPRQ aims to find a proper route R from the starting venue to the destination. R should pass through several venues of the respective categories and be credible and popular in the social circle of the query user. Furthermore, the route should satisfy a road network distance constraint.

As mentioned above, STPRQ is a novel problem that can help individuals make decisions. However, the existing approaches are not suitable for tackling the STPRQ problem. The challenge is threefold. First, the social trust of each pair of users is unknown. Therefore, the first challenge is to evaluate the social trust of each pair of users in the underlying LBSNs. Second, to the best of our knowledge, we are the first to study personalized route query by incorporating social trust. This problem is not trivial. Thus, the second challenge is to incorporate social trust into personalized route query. Third, the existing studies related to personalized route query are unsuitable for addressing the STPRQ problem. A naive solution is to enumerate all possible routes from the starting venue to the destination and subsequently select an appropriate route. Clearly, the naive solution is time-consuming and inefficient. Therefore, the third challenge is to efficiently search for an appropriate route satisfying the query constraints.

To overcome the above challenges, we design a social trust-based optimal trip selection (STOTS) framework that consists of three key components.

In the first component, we analyze social trust in location-based social networks. Extreme learning machine (ELM) provides a unified learning platform with a widespread type of feature mapping and can be applied in a regression analysis directly [20,21,22]. Inspired by this platform, we design an approach based on ELM to evaluate social trust effectively. To discover the social trust, we construct features that exploit social information and user behavioral patterns from the following aspects: user profiles, social structure and user behaviors.

In the second component, we design a rank model incorporating social trust into personalized route query that can be leveraged to estimate the credibility of a route. Specifically, we evaluate the popularity score of a venue (a point-of-interest, POI) with respect to a user. Assuming that we know the social trust between users, we can estimate the credibility score of a venue. After that, for a given route, we group venues by category. The sum of the highest ones in each of these groups can be regarded as the credibility score of the route.

In the third component, we propose an advanced search algorithm called RouteHunter to find the preferred route that can satisfy user constraints. RouteHunter adopts a greedy strategy that extends the route from the starting venue. We use a venue ranking scheme that considers the gain from adding a venue into a route. That is, we consider both the category diversity and distance constraint of a route.

Finally, extensive experiments on real-world datasets demonstrate the efficiency and effectiveness of out proposed algorithms.

The remainder of this paper is organized as follows. We begin in “Related Works”, where we discuss the related studies. We then introduce the preliminaries and problem formulation in “Preliminaries and Problem Formulation”. We present our framework for the problem of STPRQ in “Framework” and describe our approach in detail in “Approach”. We discuss the experiments in “Experiments” and conclude this paper in “Conclusion”.

Related Works

Personalized route query has attracted significant attention in recent years. For instance, Li et al. [26] propose the TPQ query that aims to find the shortest path from the starting point to the destination through several venues that can satisfy the category requirements of the query user. Cao et al. [4] consider both the user preferences and the budget constraint to find the optimal route. Dai et al. [6] propose a highly expressive personalized route planning query that considers both personalization and the sequence constraint. However, in the existing studies related to personalized route query, the social trust has not received sufficient attention. In this paper, as far as we know, we are the first to study the personalized route query by incorporating social trust. Our proposed problem is different from the problems in the existing studies. Additionally, the previous methods related to personalized route query are not suitable for tackling our proposed problem.

Social trust is a critical factor in decision-making and has recently attracted research interest from many fields, e.g., sociology, psychology, computer science, and cognitive sciences. Studying trust propagation [47], Golback and Hendler [12] propose a trust inference mechanism for assigning trust in Web-based social networks and investigate how trust information can be mined and integrated into applications. Guha et al. [13] develop a formal framework of trust propagation schemes, introducing the formal and computational treatment of distrust propagation. Meanwhile, several studies focus on social trust path selection [27, 39]. Hang et al. [16] propose a social trust path selection method in online social networks, selecting the social trust path with the maximum of propagated trust values as the optimal solution. Liu et al. [27] present a complex social network structure that considers trust information, social relationships and recommendation roles and propose an efficient heuristic algorithm in which multiple foreseen social paths are formed to help social trust path selection.

Several recent studies combine recommendation systems with social trust and reputation mechanisms. For instance, Walter et al. [35] propose a recommendation system that combines the concepts of social networking and trust relationships. Jamali and Ester [23] propose a random walk model combining the trust-based and collaborative filtering approaches to recommendations. It can be observed that although many recommendation systems incorporate social trust, such systems mostly focus on item or user recommendations. To the best of our knowledge, no study has considered personalized route query by incorporating social trust.

Preliminaries and Problem Formulation

In this section, we briefly review the extreme learning machine (ELM), which is used as a starting point of our proposed approach. Subsequently, we formally define the problem of social trust-aware personalized route query.

Brief Description of Extreme Learning Machine

Extreme learning machine (ELM), developed by Huang et al. [18,19,20,21,22], is based on a generalized single-hidden-layer feedforward network (SLFN). In ELM, the hidden-layer node parameters are mathematically calculated, instead of being iteratively tuned. Thus, a good generalization performance can be provided at speeds thousands of times higher than those of traditional popular learning algorithms of feedforward neural networks [20]. ELM has been widely applied in various fields, such as theoretical research [31, 43], image analysis processing [7, 38], prediction analysis [28, 29], and classification [8, 14, 40].

Given the activation function \(g(x)\) and N arbitrary distinct samples \((\mathbf {x}_{i},\mathbf {t}_{i})\), where \(\mathbf {x}_{i}=[x_{i1},x_{i2},\cdots ,x_{in}]^{T}\)Rn and \(\mathbf {t}_{i}=[t_{i1},t_{i2},\cdots ,t_{im}]\in \mathbf {R}^{\mathbf {m}}\), a standard SLFN with L hidden nodes is mathematically modeled as

$$ f_{L}(x)=\sum\limits_{i = 1}^{L}{\beta}_{i}g(\mathbf{a}_{i},b_{i},\mathbf{x})={o}_{j} $$
(1)

where \(\mathbf {a}_{i}\) and \(b_{i}\) are the learning parameters of hidden nodes, \( {\beta }_{i}\) denotes the output weight connecting the i th hidden node to the output layer, and \( {o}_{j}\) is the j th output vector of the SLFNs.

Usually, an SLFN can approximate these N samples with zero error. That is, the error of ELM is \({\sum }_{j = 1}^{L}\| {o}_{j}- {t}_{j}\|= 0\). Thus, there exist \(\mathbf {a}_{i}\), \(b_{i}\) and \( {\beta }_{i}\) such that

$$ \sum\limits_{i = 1}^{L}{\beta}_{i}g(\mathbf{a}_{i},b_{i},\mathbf{x})={t}_{j}, j = 1,2,\cdots,N. $$
(2)

Equation 2 can be written compactly as

$$ {H}{\beta}={T} $$
(3)

where

$$\begin{array}{@{}rcl@{}} &&{H}({a}_{1},{a}_{2},\cdots,{a}_{L},b_{1},b_{2},\cdots,b_{L},{x}_{1},{x}_{2},\cdots,{x}_{L}) \\ && = \left[\begin{array}{c}{h}({x}_{1})\\ {\vdots} \\ {h}({x}_{N}) \end{array}\right] = \left[\begin{array}{ccc}g(\mathbf{a}_{1},b_{1},\mathbf{x}_{1}) & {\cdots} & g(\mathbf{a}_{L},b_{L},\mathbf{x}_{1})\\ {\vdots} & {\cdots} & {\vdots} \\ g(\mathbf{a}_{1},b_{1},\mathbf{x}_{N}) & {\cdots} & g(\mathbf{a}_{L},b_{L},\mathbf{x}_{N}) \end{array}\right]_{N\times L} \end{array} $$
(4)
$$ {\beta}=[{\beta}_{1}^{T},\cdots,{\beta}_{L}^{T}]_{m\times L}^{T} $$
(5)
$$ {T}=[{t}_{1}^{T},\cdots,{t}_{N}^{T}]_{m\times N}^{T} $$
(6)

H is the hidden-layer output matrix of the network. The i th column of \( {H}\) is the i th hidden nodes’ output vector with respect to inputs \(( {x}_{1}, {x}_{2},\cdots , {x}_{N})\), while the j th row of \( {H}\) is the output vector of the hidden layer with respect to the j th input \( {x}_{j}\).

The activation function \(g(x)\) is infinitely differentiable and is usually designed to be the sigmoid function, the Gaussian function or the radial basis function (RBF). In this paper, we consider the RBF as the activation function; thus, we obtain

$$ g({a}_{i},b_{i},{x})=g(b_{i}\| {x}-{a}_{i}\|),b_{i}\in R^{+} $$
(7)

As shown in Fig. 1, the parameters of hidden-layer nodes in ELM, i.e., \( {a}_{i}\) and \(b_{i}\), can be randomly selected regardless of the training datasets. The output weight \( {\beta }\) can be estimated with the matrix computation formula \( {\beta }= {H}^{\dag } {T}\). Therefore, the output function of ELM can be formulated as \(f( {x})= {h}( {x}) {H}^{\dag } {T}\).

Fig. 1
figure 1

A single-hidden layer feedforward network

Problem Statement

Data Model

As shown in Fig. 2, a location-based social network (LBSN) consists of a road network and a road-social network (the venues checked-in by users must belong to the road network). We model the road network as an undirected graph RN(L, E), where L is the vertex set representing the venues, i.e., POIs, in the road network. Each venue \(l \in L\) has a keyword \(l.\phi \) indicating its category, e.g., restaurant or pub. Each \(e(l_{i},l_{j}) \in E\) is an edge between \(l_{i}\) and \(l_{j}\), indicating that there exists a path between the two venues. Each edge \(e \in E\) is associated with a positive weight \(w(e)\) showing the road network distance. We consider \(R=(l_{1}, l_{2},l_{n})\) to be a route if and only if \(l_{i} \in L\) and \((l_{i}, l_{i}+ 1) \in E\) for every \(i \in 1,2,\cdots ,n\). The venue categories covered by route R are denoted \(R.{\Phi }\). The sum of weights of all edges in a route R is regarded as the length of R. 4

Fig. 2
figure 2

Graph models of an LBSN

We model the road-social network as an undirected graph \(SN(U,C)\), where U is the vertex set representing the users in a social network, each \(u_{i} \in U\) has a check-in list recording the venues the user has visited, and each \(c(u_{i},u_{j}) \in C\) is an edge between \(u_{i}\) and \(u_{j}\) indicating that the two users have a relationship, such as friendship or being classmates. As shown in Fig. 2b , the hollow dots are the users of the social network, the solid dots are the locations visited by users of the social network, the solid lines connecting two hollow dots are the social edges indicating that the two users have a relationship, and the blue dotted line connecting a hollow dot and a solid dot means that the user has visited the location, with the number on the dotted line representing the number of visits.

Query Model

A social trust-aware personalized route query (STPRQ) in LBSNs is described as a tuple q =< qu, s, t,Φ,Δ > where

  • \(q_{u}\) is the query user;

  • s is the starting venue;

  • t is the destination;

  • \({\Phi }\) is a set of venue categories specified by the query user; and

  • \({\Delta }\) is the travel distance constraint.

Ranking Model

Our goal is to identify a proper route that is credible and popular in the social circle of the query user. Thus, we consider the following aspects to quantify the credibility and popularity of a route.

For the query user, the route popularity reflects the interest score of the route in the user’s social circle. The user similarity \(Sim(u,v)\) between users u and v can be calculated by the Jaccard distance[32]; thus, we obtain

$$ \text{Sim}(u,v)=\frac{|N(u)\cap N(v)|}{|N(u)\cup N(v)|} $$
(8)

where \(N(v)\) is the neighbor set of user v in the underlying social network.

A route may pass through multiple venues. Thus, we first describe the measurement of the popularity of a venue \(l_{i}\) with respect to a user v, denoted \(p(v,l_{i})\). We have

$$ p(v,l_{i})=\frac{{\sum}_{u\in U}\text{Sim}(v,u)\cdot \text{count}(u,l_{i})}{{\sum}_{u\in U}\text{count}(u,l_{i})} $$
(9)

where \(\text {count}(u,l_{i})\) represents the number of times u visited \(l_{i}\).

An important measure for social analysis is social similarity. In addition, another important measure is social trust. It is worth noting that social trust is different from social similarity. Specifically, a high social similarity between two users implies that their behavior patterns are similar, even though they may distrust each other. The information provided by an untrusted friend may not influence certain behavioral decisions. In contrast, the information from a trusted friend can be accepted, even if the users’ social similarity is very low.

Social trust \(t(u,v)\) is the probability value ([0,1]) between two users u and v. For simplicity, social trust is symmetric in this paper. In other words, \(t(u,v)=t(v,u)\). Evaluation of social trust will be described in “Social Trust Evaluation”. Next, we discuss the estimation of the credibility score of a venue for a user. We have

$$ c(v,l_{i})=\frac{{\sum}_{u\in U}t(v,u)\cdot p(u,l_{i})}{{\sum}_{u\in U}p(u,l_{i})} $$
(10)

where \(c(v,l_{i})\) denotes the credibility score of \(l_{i}\) with respect to v.

Now, we are ready to present the credibility score of route R that consists of a sequence of venues, i.e., \(R=\{l_{1},l_{2},\cdots ,l_{m}\}\). The intuition is that we group the venues in a route by category. Subsequently, we select the groups that match the query categories \({\Phi }\). The highest score in each selected group is added to the credibility score of route R associated with user v, denoted \(RS(v,R)\).

Now, we formally define our social trust-aware personalized route query .

Problem 1 (Social Trust-aware Personalized Route Query (STPRQ).)

Given a road network \(RN(L,E)\), a corresponding road-social network \(SN(U,C)\), a query user \(u_{q}\), a starting venue s, a destination t, a set of keywords \({\Phi }\), and a distance constraint \({\Delta }\), a social trust-aware personalized route query (STPRQ) \(q=\langle u_{q},s,t,{\Phi },{\Delta } \rangle \) aims to identify a route R from s to t such that

  1. 1.

    R starts from s and ends at t, denoted \(R:s\rightsquigarrow t\);

  2. 2.

    the length of R should not exceed \({\Delta }\), i.e., \(\text {Dist}(R) \leq {\Delta }\);

  3. 3.

    the query keyword set \({\Phi }\) should be covered by R, i.e., \({\Phi } \subseteq R.{\Phi }\); and

  4. 4.

    for any other route \(R^{\prime }:s\rightsquigarrow t\) with \(\text {Dist}(R^{\prime }) \leq {\Delta }\) and \({\Phi } \subseteq R^{\prime }.{\Phi }\), \(RS(u_{q},R^{\prime }) \leq RS(u_{q},R)\).

Example 1

Considering the location-based social networks shown in Fig. 2 as an example, assume an STPRQ q = 〈u1, l1, l9,Φ = {φ3, φ5},17〉, meaning that user \(u_{1}\) wants to find a route from venue \(l_{1}\) to \(l_{9}\). Additionally, the route must pass through two venue categories, one being \(\varphi _{3}\) and the other \(\varphi _{5}\). The credibility score of the route should be as large as possible, while the length should be no greater than 17.

Framework

When a user proposes a query \(q=\langle q_{u},s,t,\phi ,{\Delta } \rangle \), our goal is to respond efficiently to the social trust-aware personalized route query. As shown in Fig. 3, we propose a social trust-based optimal trip selection (STOTS) framework that consists of three components: social trust evaluation, route credibility estimation, and optimal route query.

Fig. 3
figure 3

The framework

Social Trust Evaluation

At this stage, we first extract features that exploit social information and user behavioral patterns, including user profiles, social structure, and user behaviors in LBSNs. Subsequently, we propose an approach based on ELM to evaluate social trust between the two users of any social ties in LBSNs. Additionally, we leverage a trust propagation model to evaluate the social trust of any pair of users. It is worth noting that this part can be performed offline.

Route Credibility Estimation

At this stage, our aim is to estimate the credibility of a route for the query user. As mentioned in “Problem Statement”, the route credibility can be obtained by calculating the credibility score of a venue for the query user. Thus, we design a novel index, namely, the category-oracle inverted index that can be used to accelerate the calculation of location credibility.

Optimal Route Query

When a user proposes a query q, the venue score can be calculated quickly based on the above two stages; an efficient method will be described to query a proper route in this part. We will illustrate this part in “Optimal Route Query”.

Approach

In this section, we present the social trust evaluation, detail the estimation of the credibility of a route, and then describe the optimal route query.

Social Trust Evaluation

Recall that our aim is to discern the social trust value of any two users. There are two cases: in the first, two users are connected by a social edge directly; the second is the opposite case.

Case 1. Existing direct edge connection

As mentioned above, social trust is a probability value ([0,1]) between two users u and v implying the creditability between two users. We consider social trust evaluation as a regression analysis problem. To this end, we propose a novel semi-supervised social trust prediction method based on extreme learning machine (ELM), denoted STP-ELM. As shown in Fig. 4, STP-ELM consists of two phases: feature extraction and trust prediction.

Fig. 4
figure 4

The process of STP-ELM

Feature Extraction

We now describe the extracted features attached to two users connected by a social edge directly. As explained below, various features are summarized from the aspects of user profile, social features, and user behavior.

  1. 1)

    User profile. In real life, user profiles have an important influence on the social trust between two users. For instance, two users with the same gender will trust each other. Specifically, we extract two kinds of features in this class, which are shown in Table 1. Feature \(f_{1}\) is the absolute value of the difference of two users’ ages. Feature \(f_{2}\) is the difference of two users’ genders. That is, \(f_{2}= 1\) if the two users’ genders are the same; otherwise, \(f_{2}= 0\).

  2. 2)

    Social features. In this class, we utilize the betweenness centrality [2, 11, 37] that has been proposed to measure the social importance of a user to reflect the social feature. Particularly, a user is likely to be trusted if the user is important in the social network. In the social graph, betweenness centrality measures the fraction of shortest social paths that pass through a given node, averaged over all pairs of node.

    Let \(SN(U,C)\) be the graph associated with a road-social network, given three nodes \(u,v,w \in U \). Brandes et al. in [2] introduces the pair-dependency of u and w on v, demonstrated as Eq. 5:

    $$ \delta(v) = \frac{\sigma_{uw}(v)}{\sigma_{uw}} $$
    (11)

    where \(\sigma _{uw}\) is the number of shortest social paths from u to w, while \(\sigma _{uw}(v)\) is the cardinality of the subset of social paths from u to w passing through v.

    The betweenness centrality of node v is obtained as the sum of the pair-dependency values of each pair of nodes on v. To eliminate the direct computation of all sums, Brandes introduces the dependence of vertex u on v as:

    $$ \delta_{u\bullet}(v) = \sum\limits_{w\in U} \delta_{uw}(v) $$
    (12)

    Thus, the betweenness centrality B of node v is obtained by summing contributions from all source nodes:

    $$ B(v) = \sum\limits_{u\in U} \delta_{u\bullet}(v) $$
    (13)

    It is worth noting that we can preprocess the shortest path for every pair of vertices. Subsequently, the betweenness of every node can be computed according to the preprocessed path.

    Recall that our aim is to extract features with respect to two users connected by a social edge directly. For a given edge \(e(u,v)\), we have \(f_{3}=B(u)\) and \(f_{4}=B(v)\).

  3. 3)

    User behaviors in LBSNs. To predict the score of social trust, we rely on basic intuition. That is, two users who often visit certain venues together in real life may have a high social trust. In most LBSNs, such venue-visiting behavior is denoted as a check-in. Therefore, we gather statistics of common check-ins (generated by the two endpoints of an edge) as the features of this social edge, such as the time of the generated check-ins. In addition, the attributes of venues, such as categories, will be also extracted as features of corresponding edges.

Table 1 Features in user profile class

We extract the five most visited venues from the common check-in lists of two users. That is, we have five features (f5-f9) in this class. Each feature \(f_{i}\) can be formulated as a vector \(f_{i}=<C_{t}imes, V_{c}at>\), where \(5\leq i \leq 9\), \(C_{t}imes\) denotes the common check-in times, and \(V_{c}at\) denotes the category of the visited venue.

Trust Prediction

After the feature construction, STP-ELM uses the above three classes of features to learn ELM-based social trust regression models. As mentioned above, STP-ELM leverages the ensemble ELM mechanism to further improve the accuracy of social trust evaluation.

Specifically, STP-ELM learns k regression models. We obtain partial edges with trust scores between the two endpoints via a crowdsourcing strategy and use these edges as the training data. That is, we use the three kinds of features as input and manually annotate the social trust score of two users with a direct social edge connection.

Finally, the social trust score of two users is the average value of the social trust value learned from the ensemble of k ELM-based regression models.

Case 2. No Direct Edge Connection

We leverage the learned social trust score in Case 1 to infer the social trust value of two users without a direct edge connection. Following the existing work [42], social trust can be propagated through the network. That is, two users are more likely to trust each other if they have a common trusted peer.

As shown in Fig. 5, for any two users(u and v) without a direct edge connection in the underlying social network, we obtain the shortest social path between u and v, denoted \(\text {path}(u,v)=\{(u=u_{0},u_{1}),(u_{1},u_{2}),\cdots ,(u_{h-1},u_{h}=v)\}\). Given the social trust value \(t(u_{i},u_{i + 1})\) of two users in each edge of this path, which would be learned in Case 1, we obtain

$$ t(u,v)={\Pi}_{(u_{i},u_{i + 1})\in path(u,v)}t(u_{i},u_{i + 1}). $$
(14)
Fig. 5
figure 5

The process of trust propagation

Route Credibility Estimation

As mentioned in “Problem Statement”, Eqs. 9 and 10 show how to calculate the popularity score and the credibility score, respectively, of a venue for a user, respectively. One important factor is the number of check-ins of a user at a venue. However, the number of social network users increases rapidly, as does the scale of check-in data. It is time-consuming to obtain the check-in information for a venue and the category of a venue by searching the check-in data directly. Inspired by the study [48], we design a category-oracle inverted index to quickly retrieve the visiting times of a venue by a user, which can be used to accelerate the processing of route credibility estimation.

Each check-in record has at least three items: the check-in user, the venue id, and the category of the venue. Figure 6a is an example of check-in data in location-based social networks of Fig. 2. For instance, user \(u_{1}\) has visited venue \(l_{1}\) once and \(l_{2}\) twice, while the categories of the two venues are pub and hotel, respectively.

Fig. 6
figure 6

An example of the category-oracle inverted index

The venues are classified by their categories; each category contains several venues in the road network. The counts of times each venue was visited by various users are summarized as one of the terms of our category-oracle inverted index.

The structure of the category-oracle inverted index is shown in Fig. 6b. To summarize, a category-oracle inverted index has four primary items:

  • Category. The first item in our index is the venue category, which can be used to obtain the venue category of a check-in record directly.

  • Venue id. Venue id is the only identifier of a venue, which can be used to obtain more information about the venue.

  • Check-in user. This item indicates the user in a check-in record.

  • Check-in times. In real life, a venue may be visited by the same user multiple times. The check-in times can reveal the degree of interest of a user in the venue. Thus, we use this item to record the number of times a venue was visited by the same user.

Considering the check-in data in Fig. 6a as an example, there are four categories: pub, hotel, cafe, and airport. Figure 6b shows the category-oracle inverted index, which can illustrate the check-in information intuitively. For instance, there are two hotels, i.e., \(l_{2}\) and \(l_{8}\), where \(l_{2}\) has been visited by user \(u_{1}\) twice and \(l_{8}\) has been visited by user \(u_{2}\) once.

A personalized route consists of a series of venues, whose category set should cover the query category set. The intuition for estimating the credibility of a route is that we first group venues in a route by category and subsequently select the groups that match the query category set \({\Phi }\). The sum of the highest credibility scores of venues in each of these selected groups can be regarded as the credibility score of a route. Here, we first describe the calculation of the credibility score of a venue and subsequently discuss the estimation of the route credibility score.

Computing the Credibility Score of a Venue

As mentioned in “Social Trust Evaluation”, the social trust between two users t(u, v) has already been evaluated. The process of calculating the credibility score between a user and a venue is described in Algorithm 1. We first compute the social similarity of any user pair by Eq. 8 and subsequently calculate the popularity score between each user and \(l_{i}\). It is worth noting that the popularity score reflects the interest score of a user in venue \(l_{i}\). By incorporating the social trust, we compute the credibility score of \(l_{i}\) with respect to v by Eq. 10, which incorporates the interest score and the social trust.

figure a

Computing the Route Score

In our query model described in “Problem Statement”, Φ denotes the set of categories specified by the query user. For a route \(R=\{l_{1},\cdots ,l_{m}\}\), we denote the category covered by R as \(RC=\{l_{1}.c,\cdots ,l_{m}.c\}\). We prune the route R without computing the route score if \({\Phi }\not \subset RC\). That is, we only compute the route score if the route can cover the query category set; thus, in the remainder of this subsection, we discuss this case by default.

Algorithm 2 illustrates the procedure of estimating the route credibility score. First, we group the venues in a route by category. For each group, if the category of venues belongs to \({\Phi }\), we calculate the credibility score of each venue in this group with respect to user v, and the highest credibility score in this group is added to the route score.

figure b

Example 2 shows a running instance of how Algorithm 2 operates.

Example 2

Suppose that the venue category set \({\Phi }\) is \(\{\varphi _{2},\varphi _{3}\}\) specified by user v in an STPRQ . As shown in Fig. 7a, a sample route R from s to t is \(\{l_{1},l_{2},l_{3},l_{4},l_{5},l_{6},l_{7},\) \(l_{8}\}\). The category set covered by R is \(RC=\{\varphi _{1},\varphi _{2},\varphi _{3},\varphi _{4}\}\). Next, we group the venues in R by category. The groups with the categories belonging to \({\Phi }\) are selected. Subsequently, for each venue in the selected groups, we calculate the credibility score. We sum the highest venue credibility scores in each selected group to obtain the credibility score of the route. As shown in Fig. 7b, assuming that the venue credibility score of each venue in the \(\varphi _{2}\) and \(\varphi _{3}\) groups has already been calculated, we obtain the credibility score \(RS(v,R)= 0.67 + 0.46 = 1.13\).

Fig. 7
figure 7

An example of route credibility score evaluation

Optimal Route Query

As the route score for the query user can be computed and a category-oracle inverted index has been constructed, we are now ready to present our proposed optimal route query algorithm called RouteHunter. The notations used throughout “Optimal Route Query” are listed in Table 2.

Table 2 Main notations

The primary idea of RouteHunter is that we evaluate the credibility score of route R if and only if the venue categories covered by R can contain the query category set \({\Phi }\). Furthermore, for a given route R, we only need to calculate the credibility score of a venue if its category belongs to \({\Phi }\). That is, assuming that the query user arrives at venue \(l_{i}\) from the starting venue, we calculate the credibility score of \(l_{i}\) for the user if and only if the category of \(l_{i}\) matches one of the categories specified by the user, i.e., \(l_{i}.\varphi \in {\Phi }\).

In the route extension phase, we traverse the neighborhoods of a venue; a neighbor \(l_{j}\) can be regarded as a candidate only if the sum of the distance from the start location s to the current location \(l_{i}\), \(\omega (e(l_{i},l_{j}))\), and the shortest distance from \(l_{j}\) to the destination \(l_{d}\) is less than the distance constraint \({\Delta }\). If the sum exceeds \({\Delta }\), the route from s to the neighbor \(l_{j}\) does not satisfy the requirement of the query user. By applying this approach, many venues can be pruned. It is worth noting that we precompute the shortest distance between each venue and s(and t) in the underlying road network.

As shown in Algorithm 3, for a query \(q=<q_{u},s,t,{\Phi },{\Delta }>\), RouteHunter first initializes the parameters (lines 1 and 2) and assigns s as the first element of a min-priority queue Q (line 3) used to maintain the candidate venues satisfying the distance constraint. Next, we select the optimal venue by the \(\textbf {Q.Ranking}\) function in line 5 that ranks venues in the following order: we denote \(l_{k} \prec l_{t}\) if and only if \(|p(l_{s}\rightarrow l_{k}).\varphi | > |p(l_{s}\rightarrow l_{t}).\varphi |\) or [\(|p(l_{s}\rightarrow l_{k}).\varphi |\) = \(|p(l_{s}\rightarrow l_{t}).\varphi |\) and \(RS(u,p(l_{s}\rightarrow )) \geq \) \(RS(u,p(l_{s}\rightarrow l_{t}))\)] or [\(|p(l_{s}\rightarrow l_{k}).\varphi |\) = \(|p(l_{s}\rightarrow l_{t}).\varphi |\) and \(RS(u,p(l_{s}\rightarrow l_{k}))\) = \(RS(u,p(l_{s}\rightarrow l_{t}))\) and \(Dist(p(l_{s}\rightarrow l_{k})) < Dist(p(l_{s}\rightarrow l_{t}))\)]. Subsequently, we extend the selected venue by applying the category cover strategy, i.e., we prefer to extend the venue with category belonging to the query category set (lines 6–8). In each extension, we compare the distance lower bound of the current partial route to the query distance constraint (lines 15–19). When the destination t emerges in a route T, we check whether this route satisfies the query category constraint and the distance constraint and subsequently evaluate the route score and update the optimal route obtained thus far (lines 9–14).

figure c

Example 3

Considering the location-based social networks in Fig. 2 as an example, the category-oracle inverted index are shown in Fig. 6. Assuming a query \(q=<u_{1}, l_{1}, l_{9}, {\Phi } = \{\varphi _{3}, \varphi _{5}\}, 15>\), the detailed steps of RouteHunter are shown in Fig. 8. We first extend the starting location \(l_{1}\); locations \(l_{2}\) and \(l_{3}\) are the neighbors of \(l_{1}\). For each neighbor \(l_{i}\), we examine the sum of \(w(l_{1},l_{i})\) and the shortest distance from \(l_{i}\) to the destination \(l_{t}\) that we have obtained using the well-known Floyd algorithm [10]. If the sum exceeds the distance constraint \({\Delta }\), we can prune this partial route from \(l_{1}\) to \(l_{i}\), as the lower bound of such a partial route with respect to distance has violated constraint \({\Delta }\).

Fig. 8
figure 8

An example of how RouteHunter operates

In this example, as the distance lower bound for both partial routes from \(l_{1}\) to \(l_{2}\) (or \(l_{3}\)) satisfies the distance constraint, we put both of them into min-priority queue Q, calculate the interest score for all locations in Q (we rank them by the Rank function). After each iteration, we record the categories covered by the respective partial routes. As the ranking order of \(l_{3}\) is superior to that of \(l_{2}\), we extend \(l_{3}\) in the second iteration. The locations \(l_{4}\) and \(l_{5}\) can be reached via \(l_{3}\); as the distance of the partial route \(\{l_{1}, l_{3}, l_{4}\}\) is 9 and the shortest distance from \(l_{4}\) to the destination \(l_{9}\) is 8, the sum is 17, exceeding the distance constraint 15; hence, this partial route would be pruned. In Fig. 8b, \(l_{4}\) is marked by a yellow color. In contrast, we put \(l_{5}\) into the priority queue Q, calculate the interest score and rank those locations in Q. After that, we check the categories covered by the least partial route to determine whether these categories contain the categories specified by the query user, i.e., \({\Phi } = \{\varphi _{3}, \varphi _{5}\}\). If the answer is positive, this means that a feasible solution has been obtained. For instance, in Fig. 8c, location \(l_{5}\) meets the constraint of the lower bound being less than 15, and the categories covered by the partial route contain Φ = {φ3, φ5}. Subsequently, we continue to extend the location with the greatest priority in Q until Q becomes empty. In this example, we extend location \(l_{2}\) to \(l_{4}\), checking that the distance of the partial route \(\{l_{1}, l_{2}, l_{4}\}\) is 8, and the distance lower bound of this partial route exceeds the constraint 15; hence, we terminate the \(RouteHunter\) algorithm. The partial route \(\{l_{1}, l_{3}, l_{5}\}\) merged with the shortest path from \(l_{5}\) to \(l_{9}\) is the query answer to be returned.

Experiments

In this section, we experimentally evaluate the effectiveness and efficiency of our proposed algorithms for the STPRQ problem. We perform a series of sensitivity tests to study the impact of query parameters with real social graph datasets. In the following, we first describe the experimental settings and subsequently analyze the experimental results.

Experimental Settings

Dataset

We use three real-world datasets, namely, Gowalla, Brightkite and Foursquare. Table 2 provides the details.

  • Gowalla. The Gowalla dataset, available at the Stanford Large Network Dataset CollectionFootnote 1, contains data generated worldwide from February 2009 to October 2010. This dataset contains 6,442,892 check-ins generated by 196,591 users at 1,280,969 venues. Additionally, there are 950,327 social edges and 1,356,399 road network edges.

  • Brightkite. The Brightkite dataset is available at the Stanford Large Network Dataset Collection3. Brightkite was once a location-based social networking service provider with users sharing their locations by checking-in. This dataset contains 4,491,143 check-ins generated by 58,228 users at 772,789 venues. Additionally, there are 214,078 social edges and 1,025,364 road network edges.

  • Foursquare. We crawl the Foursquare dataset via the Foursquare APIFootnote 2 from November 2014 to January 2016. This dataset has 76,503 users and 1,531,357 social edges. For each user, we collect the check-ins during this period of time. Each check-in is a pair of user and venue. In total, we obtained 299,995 venues located in Singapore and 969,549 check-ins. Each venue has its category type description, such as museum, bar, and hotel. In addition, we leverage the OpenStreetMapFootnote 3 to obtain a road network of Singapore. It is worth noting that we regard the venues as the vertices of the road network. Thus, the number of road network edges is 416,723.

Comparative Analysis

In this paper, we conduct three sets of experiments. First, we test the performance of our proposed social trust regression analysis method based on extreme learning machine. Second, we test the performance of our proposed index, including the effectiveness of our index, the index construction time and the index space cost. Third, we provide an overall evaluation of our proposed algorithms, including the route query precision analysis and the effect of parameter Φ.

Experiment 1: Evaluation of Social Trust Regression Analysis

As discussed in “Social Trust Evaluation”, a plurality voting method using multiple regression models is used to increase the social trust prediction accuracy. To test the performance of STP-ELM (illustrated in “Social Trust Evaluation”), we consider ensemble SVM (support vector machine) [5, 17], denoted EN-SVM, as the comparison algorithm, which uses multiple SVM regression models to obtain a better predictive performance. We set 40 as the default value of the number of regression models in both EN-SVM and STP-ELM.

Table 3 shows the accuracy rate of EN-SVM and STP-ELM applied to the three datasets. Table 4 shows the efficiency of EN-SVM and STP-ELM. The above two sets of experiments show that our proposed STP-ELM algorithm can obtain a comparable accuracy yet needs less training time and testing time than EN-SVM.

Table 3 Accuracy evaluation for EN-SVM and STP-ELM
Table 4 Performance evaluation for EN-SVM and STP-ELM

In Fig. 9, the effect of the number of used regression models is investigated. The testing accuracy varies similarly for the three datasets. It can be observed that the accuracy of the STP-ELM algorithm increases with the number of regression models. The reason is that the accuracy of social trust regression analysis can be improved by increasing the number of regression models. For the number of regression models ranging from 40 to 80, the accuracy remains relatively stable. Therefore, we set 40 as the default value of the number of regression models in the following experiments.

Fig. 9
figure 9

Testing accuracy vs. the number of regression models

Experiment 2: Evaluation of the Index

In this subsection, we first test the effectiveness of our proposed index. As mentioned in “Optimal Route Query”, our optimal route query algorithm, RouteHunter, leverages the constraints specified by the query user to reduce the search space, and the heuristic extends from the starting venue. We consider RouteHunter without our index as the baseline and denote by “RouteHunter+index” the algorithm version with the index described in “Route Credibility Estimation”.

In this set of experiments, we set the distance constraint \({\Delta }\) to be 7, randomly select the staring and ending venues, and assign the size of the venue category set to be 3. As shown in Fig. 10, the average runtime of 50 queries for algorithm “RouteHunter+index” is less than that of the route query algorithm without the index (RouteHunter). It can be concluded that our proposed index can speed up the query processing.

Fig. 10
figure 10

Effectiveness of the index

Next, we examine the index construction time and index space cost. The results are shown in Table 5. It can be observed that the index construction for the Gowalla datasets needs much more time and space than that for the Foursquare datasets. The reason is that our proposed index is a venue category-aware inverted file implying that each term in our index is a check-in record. The index construction time and space cost are proportional to the size of check-in data.

Table 5 Index construction time and memory cost

Experiment 3: Performance of the Query Algorithm

In this paper, to the best of our knowledge, we are the first to study the personalized route query by incorporating social trust. Our proposed problem is different from the problems in the existing studies. Additionally, the previous methods related to personalized route query are not suitable for tackling our proposed problem. Thus, we implement a baseline algorithm that only considers social similarity instead of social trust, called similarity-based personalized route query (Sim-PRQ). It is worth noting that our proposed approach, STOTS, not only considers the user similarity but also incorporates the social trust into personalized route query. By comparing it to Sim-PRQ, we will evaluate the performance of STOTS in this subsection.

Efficiency of Methods

In this set of experiments, we compare our STOTS approach to the baseline method (Sim-PRQ) by considering the run time under various values of the travel distance constraint Δ. Note that both methods are able to obtain the optimal solution. Figure 11 shows that the run time of both methods increases quickly with \({\Delta }\). The reason is that a query with a larger travel distance constraint \({\Delta }\) is likely to have more potential routes.

Fig. 11
figure 11

Query time

Effectiveness of Methods

Note that any route can be represented by a sequence of venues (POIs). From user history check-in data, we extract successive check-in behaviors that can provide venue sequences. We consider such venue sequences as route test sequences. For all datasets, we extract 1000 test sequences with the length of six. For each venue sequence with length six, the size of the category set in a generated query is four after subtracting the starting venue and the destination. We measure the difference between the routes generated by our algorithms and the test sequences to evaluate the route query accuracy. Thus, the edit distance is applied as the evaluation metric, as it measures the distance between two sequences in terms of the minimum number of edit operations required to transform one sequence into the other [25, 30].

Table 6 shows the performance of the two compared approaches, i.e., Sim-PRQ and STOTS, for all three datasets. The results show that our proposed approach STOTS attains a higher accuracy (i.e., a lower edit distance) than Sim-PRQ. The reason is that STOTS not only considers the user similarity but also incorporates the social trust into personalized route query. In contrast, the Sim-PRQ approach only considers the user similarity.

Table 6 Average edit distance

Effect of Parameter Φ

In this set of experiments, we study the accuracy and the run time of our STOTS algorithm under various sizes of the query category set \({\Phi }\). Figure 12 shows the results. In our experience, the size of the category set specified by query users is usually small. We set \(|{\Phi }|\) to values of 1, 3, 5, and 7. We observe that the run time of the STOTS algorithm increases quickly with \(|{\Phi }|\). The reason is that a query with a larger \(|{\Phi }|\) will cause our algorithm to converge more slowly. We also observe that the edit distance increases with \(|{\Phi }|\), indicating that the accuracy decreases with the increase in \(|{\Phi }|\). A larger \(|{\Phi }|\) will introduce more inexact factors into the query result, reducing the accuracy.

Fig. 12
figure 12

Effect of parameter |Φ|

Conclusion

Personalized route query is an important topic due to the growing popularity of location-based social networks and related applications on mobile devices. However, the existing studies did not adequately explore the trust between users in the context of personalized route query. In this paper, we propose a novel personalized route query considering the factor of social trust. We note three major challenges in solving such personalized route query incorporating social trust (STPRQ), namely, evaluating social trust, incorporating social trust, and finding the required route. Therefore, we propose the social trust-based optimal trip selection (STOTS) framework that uses extreme learning machine (ELM) to evaluate social trust, applies a rank model to incorporate social trust, and includes an algorithm to find the required route. Experimental results on a real-world dataset encouragingly demonstrate the efficiency and effectiveness of our proposed approach.

In our future work, we plan to study the evaluation of social trust in location-based social networks. Another direction of our future research is to seek other approximate algorithms for solving this new problem.