
1 Introduction

Location-based social networks (LBSNs) enable users to share location-based information with their friends. Numerous studies have been conducted on the use of LBSNs for the discovery of popular attractions, travel planning and tour recommendations [2, 4, 6, 12]. However, most of these studies have focused on mining movement patterns from crowds in LBSNs, largely disregarding the potential impact of the social influence hidden in LBSNs. Social influence refers to situations in which a group of people influences the decisions of individuals within the group, based on their interdependence or cohesion with the group. Online social networks offer a rich forum for observing social interactions, and social influence analysis has considerable potential in fields such as marketing and recommendation systems.

According to the 2015 Trust in Advertising report from Nielsen, “recommendations from people I know” have the greatest influence on consumers, with a global average of 83 percent. This suggests that recommendations from specific individuals, such as idols or friends with similar hobbies, may attract individuals to locations that are largely ignored by the general public. For example, many stores and restaurants now offer discounts to people who “like” them or check in on Yelp or Facebook. Similarly, when searching for travel tips, a friend's opinion may be more convincing than rankings on an official website. However, identifying the members with the greatest influence on an individual can be challenging for users with a large friend base.

Many researchers have adopted social factors (the similarity of visited POIs between individuals and their friends) as weighting factors in recommendation systems [1, 2, 6, 10, 11]. They concluded that social factors have far less impact than other factors such as geographic distance and user interest. In contrast, the researchers in [9] considered social influence from the viewpoint of “users” rather than “POIs”, and found that (1) social relationships may differ in their degree of influence on an individual’s decisions; and (2) influence is not necessarily generated directly by friends but may originate with friends of friends. These two properties are referred to as the directionality and transition of social influence.

This study develops an innovative framework for social influence mining on location-based social networks. We consider the fact that a person is influenced by her friends in choosing where to visit, and we weight the factors affecting this influence in terms of space, time and POI categories. Using the information available through LBSNs, the proposed framework, ST-SIM, mines the top-k influential users for a user’s query related to a specific geospatial region.

In summary, the contributions of this paper are four-fold:

  • We propose a novel Spatio-temporal Social Influence Mining framework (ST-SIM) to identify influential users in an LBSN. The model captures the interaction among the social network, physical location and the effects of time to quantify the influence among user pairs.

  • We define the spatio-temporal social follow relationship to formulate the spatio-temporal social influence on user behavior. Building on our empirical findings, ST-SIM uses spatial and temporal features to quantify each connection between user pairs according to the probability of one user following the other’s lead. We model social influence over the network as information propagation based on a heat diffusion model.

  • Considering the diversity of an individual’s location interests as well as the impact of the social effect, we present a dynamic weight tuning method. Social effect and self effect are used to compute unified followship probability scores for top-k user recommendation.

  • We conducted empirical experiments on real-world LBSN datasets to evaluate the effectiveness of the ST-SIM framework.

2 Problem Formulation

Location-based social networks provide a platform on which a user’s location and the time of her activities are recorded and shared. This means that a user can share her real-world mobility through an online social network website or application.

Fig. 1. An example of a heterogeneous graph that captures the user-user virtual community, user-POI mobility activities and time effects in an LBSN.

A location-based social network can be structured as a heterogeneous graph (HG) with multiple types of nodes, edges, static attributes and dynamic, interconnected activities (see Fig. 1). We characterize LBSNs according to three aspects: (1) a social layer S comprising nodes that represent the members \(u\in U\) of the service and edges that represent their friendship links, (2) a location layer L containing all the POIs \(p\in P\) that have been visited, and (3) a set of check-in activities C that connects the social layer and the location layer; a check-in activity \(c(u,p,t)\) represents user u visiting location p at time t.
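As a minimal sketch of this three-layer structure, the Python snippet below stores the social layer S as an undirected adjacency map, the location layer L as a set of POI identifiers, and C as a list of check-in records. The class and method names (`CheckIn`, `LBSNGraph`, `add_friendship`, `add_checkin`) are hypothetical, and the time unit is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CheckIn:
    """A check-in activity c(u, p, t): user u visits POI p at time t."""
    user: str
    poi: str
    time: float  # timestamp; the unit (e.g. days) is an assumption


@dataclass
class LBSNGraph:
    """Hypothetical container for the heterogeneous graph HG = (S, C, L)."""
    friends: dict = field(default_factory=lambda: defaultdict(set))  # social layer S
    pois: set = field(default_factory=set)                           # location layer L
    checkins: list = field(default_factory=list)                     # check-ins C

    def add_friendship(self, u, v):
        # Friendship links are undirected, so record both directions.
        self.friends[u].add(v)
        self.friends[v].add(u)

    def add_checkin(self, user, poi, time):
        # Each check-in links the social layer to the location layer.
        self.pois.add(poi)
        self.checkins.append(CheckIn(user, poi, time))

    @property
    def users(self):
        # Members appear in the social layer, the check-in log, or both.
        return set(self.friends) | {c.user for c in self.checkins}


hg = LBSNGraph()
hg.add_friendship("alice", "bob")
hg.add_checkin("bob", "cafe", 1.0)
hg.add_checkin("alice", "cafe", 3.0)
```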

Definition 1

Spatio-temporal social influence: Social influence refers to the effect of implicit recommendations obtained on a social network. The closer the relationship between two users, the more effective a recommendation is in influencing the user. This study focuses on the spatio-temporal social influence of LBSNs; i.e., if user \(u_i\) is socially influenced by \(u_j\), then \(u_i\) will tend to visit a POI in accordance with the recommendation obtained from \(u_j\). This influence manifests as a relationship in which \(u_i\) checks in at the same POI after \(u_j\) shares her own check-in.

Definition 2

Spatio-temporal social follow relationship: A spatio-temporal social follow relationship, hereafter called a followship, represents a directed link from user \(u_i\) to her friend \(u_j\) iff \(u_i\) visits a location that was previously visited by \(u_j\).

Formally, a followship exists under the following conditions:

$$\begin{aligned} \begin{aligned}&followship(c(u_i,l,t),u_j,\delta ) \\&= {\left\{ \begin{array}{ll} true &{} \exists t':c(u_j,l,t') \bigwedge \delta =t-t'>0 \\ false &{} otherwise. \end{array}\right. } \end{aligned} \end{aligned}$$
(1)

where \(\delta \) represents a valid time period. We call \(u_i\) the follower and \(u_j\) the influencer.
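Eq. (1) can be checked directly against a check-in log. The sketch below assumes check-ins are (user, location, time) tuples and friendships are a user-to-set mapping, and returns \(\delta\) for a candidate influencer. When \(u_j\) visited l several times before t, it takes the most recent prior visit (smallest positive \(\delta\)); this choice is our assumption, since Eq. (1) only requires that some earlier visit exists. Function names are hypothetical.

```python
def followship(checkin, influencer, checkins):
    """Eq. (1): return delta = t - t' if check-in (u_i, l, t) follows an earlier
    check-in of `influencer` at the same location l, otherwise None.

    Assumption: if u_j visited l several times before t, the most recent
    prior visit (smallest positive delta) is used.
    """
    _, loc, t = checkin
    deltas = [t - t_prev for (u, l, t_prev) in checkins
              if u == influencer and l == loc and t - t_prev > 0]
    return min(deltas) if deltas else None


def followships(checkins, friends):
    """List every followship (c(u_i, l, t), u_j, delta) between friends."""
    found = []
    for c in checkins:
        for friend in sorted(friends.get(c[0], ())):
            delta = followship(c, friend, checkins)
            if delta is not None:
                found.append((c, friend, delta))
    return found


checkins = [("bob", "cafe", 1.0), ("alice", "cafe", 3.0), ("alice", "park", 2.0)]
friends = {"alice": {"bob"}, "bob": {"alice"}}
print(followships(checkins, friends))  # [(('alice', 'cafe', 3.0), 'bob', 2.0)]
```

This scans the whole log for every check-in, which is quadratic; a larger-scale version would index check-ins by location first.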

Definition 3

ST-social strength: ST-social strength is defined as a quantitative measure of the influence of check-in histories; it is directed and varies with distance and time. The ST-social strength with which user \(u_j\) influences user \(u_i\) is denoted \(s_{ij}\). Note that friendship \(f_{ij}\) is undirected whereas ST-social strength \(s_{ij}\) is directed, and the set of friend pairs and the set of pairs with ST-social strength are not necessarily the same.

Problem Definition

Spatio-temporal Social Influence Mining: Given the heterogeneous graph HG, the problem of social influence mining on LBSNs with spatial and temporal factors (ST-SIM) is to infer the ST-social strength \(s_{ij}\) for any two users \(u_i\) and \(u_j\) from the characteristics of their movements, i.e., whether they exhibit followships.

Based on this inference, the k users with the greatest influence on each user \(u_i\) are identified. The result can be personalized using optional queries associated with a geospatial region or user preferences.

3 Spatio-Temporal Social Follow Relationship and User Mobility

Since the existence of social influence has been verified in [9], in this section we characterize the spatio-temporal social follow relationship by examining how spatial and temporal features affect user mobility.

3.1 Dataset Description

This study used the four real-world LBSN datasets listed in Table 1. The FB dataset was collected via the Facebook API. We used the Facebook accounts of 96 volunteers as seeds (most of whom live in Taiwan). Once a user granted us permission to use her private information, we obtained the locations of all of the user’s friends from check-ins and geo-tagged photos for the period from Jan. 2012 to Dec. 2014. For example, if a seed user has 300 friends, we create 301 user nodes from this user, together with POI nodes for all of the related locations. The GWL [3], FS [5, 8] and FS-CA [12] datasets are check-in datasets with undirected friendship networks. Note that GWL and FS are larger but lack information on POI categories.

3.2 Effects of Spatial and Temporal Features

We then sought to identify the factors that determine how much influence each followship has on the selected users. Prior to observation, we made the following assumptions:

Assumption 1:

Check-ins made closer to the target time are more relevant, and thus more effective as recommendations [12].

Assumption 2:

Users tend to visit their nearby POIs [10].

Assumption 3:

POI characteristics should be taken into consideration. Hot spots, such as train stations and shopping malls, are very popular and therefore more likely to result in followship [7].

Table 1. Details of the Heterogeneous Social Networks

To examine Assumption 1, we measured the length of time over which individuals maintain followships. Figure 2(a) plots the number of followships as a function of time for FB, GWL, FS and FS-CA. The distribution follows a power law with weekly periodic peaks and decays faster after the first week. Another interesting observation is that the larger the dataset (GWL > FS > FB > FS-CA), the flatter the distribution. Nonetheless, the periodic peaks are similar in all datasets.

Fig. 2. Distribution of (a) time period, (b) distance and (c) POI characteristics of spatio-temporal social follow relationships.

Travel distance is defined as the distance between the user’s hometown and the target location (Assumption 2). Because hometown information is not given explicitly, we infer it as the location associated with the user’s most frequent check-ins [3]. As shown in Fig. 2(b), we calculated the distribution of distances between the friends’ hometowns and the locations where followship events took place. The distribution varies greatly between datasets, owing to differences in population cluster size among countries. For FB, the probability approaches zero at distances beyond \(10^4\) km, whereas for the other three datasets the distribution extends farther (\(10^5\) km).
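The hometown heuristic and the distance computation can be sketched as follows. We assume check-ins are (latitude, longitude) pairs, bucket them into grid cells by rounding (one decimal place, roughly 11 km in latitude, is an arbitrary illustrative choice), and use the haversine great-circle distance with a mean Earth radius of 6371 km; the paper does not state which distance measure it uses. Function names are hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def infer_hometown(user_coords, precision=1):
    """Hometown = grid cell holding the user's most frequent check-ins.

    user_coords: list of (lat, lon) check-in coordinates for one user.
    Coordinates are rounded to `precision` decimal places (a hypothetical
    grid choice); ties go to the cell encountered first.
    """
    cells = Counter((round(lat, precision), round(lon, precision))
                    for lat, lon in user_coords)
    return cells.most_common(1)[0][0]


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp guards against h exceeding 1 through floating-point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))
```

Since the largest possible great-circle distance on Earth is about 20,000 km, distances computed this way are bounded by \(\pi\) times the Earth radius.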

Figure 2(c) shows, for each POI category, the ratio of followship events and the user entropy computed from the users’ visit frequencies. For example, the “restaurant” category has the highest followship frequency, indicating that visits to restaurants are socially influential. The “airport” category also has a high followship frequency, but its user entropy is high as well. We infer that airports are popular locations, so followship events there may occur by coincidence.

From these results, we draw the following observations:

Observation 1:

Individuals are more likely to visit the same place shortly after their friends do, and this tendency decays exponentially with time.

Observation 2:

Most users tend to visit nearby POIs; however, when an individual follows another user to a distant POI, the leader may have stronger social influence.

Observation 3:

POIs with high user entropy are considered hot spots. Consequently, followship events associated with hot spots are considered less influential.

These three observations yield three features for weighting the importance of each followship event (temporal, spatial and POI entropy factors), which are applied in the computation of ST-social strength in our ST-SIM framework.

4 ST-SIM Model

This section describes how the ST-SIM model quantifies social influence on LBSNs. A heterogeneous graph HG = (S,C,L) is built from raw LBSN records in order to extract the interactions between user nodes and location nodes, i.e., followship events. In Sect. 4.1, we treat followship events as the main contributors to ST-social strength. To measure the importance of followship events, we group the background features into two classes: (1) personal background, from the viewpoint of each individual user with respect to different locations, and (2) global background, from the viewpoint of all the users who have visited each location. Moreover, we have already observed that this importance decays over time and that influence may propagate from strangers. Thus, in Sect. 4.2, we develop a diffusion-based model to simulate the propagation of influence. Finally, ST-social strength is measured from the interaction between the two users (inter factor) and the similarity of the individuals’ preferences over POI categories (intra factor).

4.1 Background Featurization

Personal Background. The personal background models an individual’s propensity to be influenced. Users frequent some locations more than others, depending on the specific meaning those locations hold for them. Thus, it is important to examine each user’s location history to determine how different locations affect followship. Based on the observations in Sect. 3, we extracted two factors for modeling the personal background.

The temporal feature is based on the time difference \(\varDelta t\) of the followship event and decays exponentially over time (Observation 1): \(f_t = \exp(-\varDelta t)\).

The spatial feature considers the distance from the user’s hometown to the location and the probability of followship at that distance (Observation 2): \({f_s = \frac{1}{d(l_u,l)}\times P_d(d(l_u,l))}\), where \({d(l_u,l)}\) represents the distance from user u’s hometown \(l_u\) to location l, and \(P_d\) is the probability distribution of followship distances shown in Fig. 2(b).
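A sketch of the two personal-background features follows. Because the paper specifies neither units nor how \(P_d\) is estimated, we assume \(\varDelta t\) is measured in days, estimate \(P_d\) as the fraction of observed followship distances that fall into the same histogram bin (the bin edges are a free, hypothetical choice; log-spaced edges seem natural given the range of distances reported in Sect. 3.2), and clamp distances to a small minimum so that \(f_s\) stays finite when the location coincides with the hometown.

```python
import bisect
import math


def temporal_feature(delta_t):
    """f_t = exp(-delta_t); delta_t is assumed to be measured in days."""
    return math.exp(-delta_t)


def empirical_distance_prob(followship_distances, bin_edges):
    """Return P_d: d -> fraction of observed followship distances in d's bin.

    bin_edges: sorted bin boundaries in km (a hypothetical binning choice).
    Distances outside [bin_edges[0], bin_edges[-1]) get probability 0.
    """
    counts = [0] * (len(bin_edges) - 1)
    for d in followship_distances:
        i = bisect.bisect_right(bin_edges, d) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts) or 1

    def p_d(d):
        i = bisect.bisect_right(bin_edges, d) - 1
        return counts[i] / total if 0 <= i < len(counts) else 0.0

    return p_d


def spatial_feature(distance_km, p_d, min_km=0.1):
    """f_s = P_d(d) / d, with d clamped to min_km to avoid division by zero."""
    d = max(distance_km, min_km)
    return p_d(d) / d


p_d = empirical_distance_prob([0.5, 2.0, 3.0, 50.0, 5000.0],
                              [0, 1, 10, 100, 1000, 10000])
```

With these toy distances, two of the five followships fall in the 1 to 10 km bin, so \(P_d(5) = 0.4\) and \(f_s = 0.4 / 5 = 0.08\).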

Global Background. We also noted that the location histories aggregated over all users exhibit different characteristics. The global background captures the popularity of specific locations, as inferred from all of the users. Followship events at popular locations such as train stations are often less indicative of the strength of a mobility relationship. Conversely, followship events at less popular locations suggest a strong relationship between the two individuals (Observation 3).

To model the popularity of a place, we define POI entropy as the Shannon entropy \({H_l= -\sum _{u,P_{u,l}\ne 0}{P_{u,l}\log (P_{u,l})}}\), where \(P_{u,l}\) is the probability that user u has visited location l. A high POI entropy indicates that a location is visited by many different users.
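POI entropy can be computed from raw (user, location) check-ins as sketched below. The paper does not say how \(P_{u,l}\) is estimated; we assume the common location-entropy convention \(P_{u,l} = C_{u,l}/C_l\), i.e., the share of all check-ins at l made by u, and use the natural logarithm. The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def poi_entropy(checkins):
    """Shannon entropy H_l of every location.

    checkins: iterable of (user, location) pairs.
    Assumption: P_{u,l} = (check-ins of u at l) / (all check-ins at l).
    """
    per_location = defaultdict(Counter)
    for user, loc in checkins:
        per_location[loc][user] += 1
    entropy = {}
    for loc, counts in per_location.items():
        total = sum(counts.values())
        # Only users with P_{u,l} != 0 appear in `counts`, matching the sum's range.
        entropy[loc] = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy
```

A station visited once by each of four users gets \(H_l = \log 4\), while a home visited only by its owner gets \(H_l = 0\), matching the intuition that followships at the home are more telling.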

4.2 Diffusion-Based Influence Model

The process of exerting social influence can be viewed as a specific type of information diffusion. By analogy with the physical diffusion of heat, a member of a social network acts as a heat source that diffuses influence to friends via shared activities such as check-in events. Through these friends, the influence gradually propagates. Over time, influence reaches the periphery of the social network, where complete strangers may be affected.

Spatio-Temporal Social Influence Propagation. As mentioned previously, this study focuses on followship events rather than simple friendships. A plain friendship network is insufficient to capture the effects of social influence or its propagation among users. We therefore define a novel followship graph \({G_F=(U,E_F)}\) to represent the possibility that an individual visits a location because she is influenced by her friends, where U is the set of users and \(E_F\) is the set of spatio-temporal follow relationships among users in U.

Via the followships in \(E_F\), social influence may propagate among the users within \(G_F\). Formally, we define \({p_{ij}=\frac{n_{ij}}{\sqrt{n_i}\sqrt{n_j}}}\) as the probability of influence moving from \({u_i}\) to \({u_j}\), where \({n_{ij}}\) denotes the number of followships from \({u_i}\) to \({u_j}\) and \({n_i}\) denotes the total number of locations \({u_i}\) has visited. We assume that initially user \({u_i \in U}\) is influenced only by herself, after which influence propagates to others in \(G_F\).
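Under the reading that \(n_{ij}\) counts the followships from \(u_i\) to \(u_j\) and \(n_i\) counts the distinct locations \(u_i\) has visited (both are assumptions about the exact counting), the transition probabilities can be computed as sketched below. The function and argument names are hypothetical.

```python
import math
from collections import Counter


def influence_probabilities(followship_pairs, visited_locations):
    """p_ij = n_ij / (sqrt(n_i) * sqrt(n_j)).

    followship_pairs: list of (u_i, u_j), one entry per followship from u_i to u_j.
    visited_locations: dict user -> set of distinct visited locations (n_i = size).
    """
    n = Counter(followship_pairs)  # n_ij for every observed pair
    p = {}
    for (ui, uj), n_ij in n.items():
        n_i = len(visited_locations[ui])
        n_j = len(visited_locations[uj])
        p[(ui, uj)] = n_ij / (math.sqrt(n_i) * math.sqrt(n_j))
    return p


p = influence_probabilities([("a", "b"), ("a", "b"), ("b", "a")],
                            {"a": {1, 2, 3, 4}, "b": {1}})
```

Here \(n_{ab} = 2\), \(n_a = 4\) and \(n_b = 1\), so \(p_{ab} = 2 / (2 \cdot 1) = 1.0\), while \(p_{ba} = 1 / (1 \cdot 2) = 0.5\). The geometric-mean normalization damps pairs in which either user simply visits many places.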

The diffusion-based influence model has two key parameters: (1) the initial state probability of each followship event, and (2) the state transition probability from the influencer to the follower. During propagation, users receive stimulation from their neighbors. Let the vector s(t) denote the proportion of the social influence score of the users in U at time t. The change in s between times t and \({t+\varDelta t}\) is given by the following diffusion equation:

$$\begin{aligned} \frac{s(t+\varDelta t)-s(t)}{\varDelta t}=\alpha Inf s(t) \end{aligned}$$
(2)

where \({\alpha }\) is the propagation coefficient and Inf is an \({N_{G_F}\times N_{G_F}}\) matrix that defines the one-hop process of information diffusion (Fig. 3).

$$\begin{aligned} Inf_{ij}= {\left\{ \begin{array}{ll} p_{ij}&{} {(u_i,u_j) \in E_F}\\ -\tau _i&{} {i = j}\\ 0&{} otherwise. \end{array}\right. } \end{aligned}$$
(3)

where \({\tau _i}\) denotes the amount of influence diffused from \({u_i}\) via its links to other users: \({\tau _i}\) = 0 if \({u_i}\) has no neighbors; otherwise, \({\tau _i=\sum _{(u_i,u_j)\in E_F,i \ne j}{p_{ij}}}\).
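The following sketch builds Inf from Eq. 3 as a dense list-of-lists matrix, given the \(p_{ij}\) values of the followship edges. The user ordering defines the row and column indices; function and argument names are hypothetical. Because the diagonal entry \(-\tau_i\) cancels the off-diagonal \(p_{ij}\) values, every row of the result sums to zero.

```python
def build_inf_matrix(users, p):
    """Dense Inf matrix of Eq. 3 as a list of lists.

    users: ordered list of users (defines row/column indices).
    p: dict mapping (u_i, u_j) followship edges to p_ij.
    """
    index = {u: k for k, u in enumerate(users)}
    n = len(users)
    inf = [[0.0] * n for _ in range(n)]
    for (ui, uj), pij in p.items():
        if ui != uj:
            inf[index[ui]][index[uj]] = pij
    for i in range(n):
        # tau_i: total p_ij over u_i's edges (0 when u_i has no neighbors).
        tau_i = sum((inf[i][j] for j in range(n) if j != i), 0.0)
        inf[i][i] = -tau_i
    return inf


inf = build_inf_matrix(["a", "b", "c"], {("a", "b"): 0.5, ("a", "c"): 0.25})
```

A dense matrix keeps the sketch short; for a real followship graph a sparse representation would be the natural choice.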

Using Eq. 2, we obtain the following differential equation when \({\varDelta t \rightarrow 0}\):

$$\begin{aligned} \frac{ds(t)}{dt}=\alpha Inf s(t), \quad s(t)=e^{\alpha t Inf}s(0) \end{aligned}$$
(4)
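Eq. 4 can be evaluated without forming the matrix exponential explicitly. The pure-Python sketch below approximates \(s(t)=e^{\alpha t Inf}s(0)\) with a truncated Taylor series applied directly to the vector; this is adequate only for small matrices and moderate \(\alpha t\). A practical implementation would more likely call a library routine such as `scipy.linalg.expm`. The function name and the default number of series terms are our own choices.

```python
def diffuse(inf, s0, alpha=1.0, t=1.0, terms=60):
    """Approximate s(t) = exp(alpha * t * Inf) s(0) from Eq. 4.

    Sums the truncated Taylor series sum_k (alpha t Inf)^k s(0) / k!,
    multiplying the vector by the matrix at each step instead of forming
    matrix powers. Suitable only for small matrices and moderate alpha * t.
    """
    n = len(s0)
    scale = alpha * t
    term = list(s0)      # k = 0 term: s(0)
    result = list(s0)
    for k in range(1, terms):
        # term_k = (scale * Inf) term_{k-1} / k
        term = [scale * sum(inf[i][j] * term[j] for j in range(n)) / k
                for i in range(n)]
        result = [r + x for r, x in zip(result, term)]
    return result
```

For a \(1\times 1\) matrix \(Inf = [-1]\) with \(\alpha = t = 1\), the series reduces to \(e^{-1} \approx 0.3679\).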
Fig. 3. An example of valid influence propagation among four users. Framed nodes indicate the occurrence of spatio-temporal social follow relationships, and the numbers in the nodes indicate the followship weights.

4.3 Spatio-Temporal Social Strength

Let \(s_{ij}\) denote the spatio-temporal social strength (ST-social strength) of user \(u_j\) for query user \(u_i\) in region r; i.e., the likelihood that \(u_i\) maintains a followship with \(u_j\) is proportional to \(s_{ij}\). Intuitively, we take \(s_{ij}\) as the sum of the influence of others and one’s own interests (influence by herself), denoted \(s_{inter}\) and \(s_{intra}\), respectively. \(s_{inter}\) and \(s_{intra}\) act as two weighting parameters (0 \(\le \) \(s_{inter}\) + \(s_{intra}\) \(\le \) 1). Here, \(s_{inter}\) = 1 corresponds to the case where \(s_{ij}\) depends entirely on the prediction based on the social effect of \(u_j\), while \(s_{intra}\) = 1 corresponds to the case where \(s_{ij}\) is based only on user interests. To combine these two measures into an overall ST-social strength, we must determine the relative importance of each component.

Applying the above diffusion process to the followship graph yields results that can be used in a dynamic weighting mechanism. \(s_{ij}\) represents the likelihood that \(u_i\) follows \(u_j\), which fits the characteristics of the social effect. For user \(u_i\), the \(\textit{inter factor}\) from any user \(u_j, j \ne i\) represents the tendency of \(u_i\) to follow \(u_j\), while the \(\textit{intra factor}\) represents \(u_i\)’s own interests, in other words, how \(u_i\) follows herself.

Furthermore, although \(s_{intra}\) represents \(u_i\)’s own interests, the \(\textit{intra factor}\) should increase when \(u_i\) and \(u_j\) have similar preferences. We measure this similarity with the cosine similarity and use it to weight the \(\textit{intra factor}\) for each user pair.

The unified ST-social strength is then revised as follows:

$$\begin{aligned} s_{ij}= {\left\{ \begin{array}{ll} s_{inter}+s_{intra}\times (\sum _p^m{w_{ij}^p}) &{} i \ne j \\ 0 &{} i = j \end{array}\right. } \end{aligned}$$
(5)
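A sketch of Eq. 5 follows. The paper does not define \(w_{ij}^p\) explicitly; we assume it is the contribution of POI category p to the cosine similarity between the two users’ category-visit vectors, so that \(\sum_p^m w_{ij}^p\) equals that cosine similarity over m categories. We also treat \(s_{inter}\) and \(s_{intra}\) as already-computed inputs (for instance, scores read off the diffused vector). All names are hypothetical.

```python
import math


def cosine_terms(x_i, x_j):
    """Per-category terms w_ij^p of the cosine similarity between two users'
    POI-category visit vectors (an assumed reading of w_ij^p in Eq. 5)."""
    norm = math.sqrt(sum(v * v for v in x_i)) * math.sqrt(sum(v * v for v in x_j))
    if norm == 0:
        return [0.0] * len(x_i)
    return [a * b / norm for a, b in zip(x_i, x_j)]


def st_social_strength(i, j, s_inter, s_intra, categories):
    """Eq. 5: s_ij = s_inter + s_intra * sum_p w_ij^p for i != j, else 0.

    categories: dict user -> list of visit counts per POI category (m entries).
    """
    if i == j:
        return 0.0
    return s_inter + s_intra * sum(cosine_terms(categories[i], categories[j]))


cats = {"a": [1, 0], "b": [2, 0], "c": [0, 3]}
```

Users a and b visit only the first category, so their cosine similarity is 1 and the intra term contributes fully; a and c share no category, so only \(s_{inter}\) remains.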

In the proposed ST-SIM framework, we treat followship events as new sources of influence in the followship graph. For each followship \(<(u_i,l,t),u_j,\varDelta t>\), \(s_{ij}\)(0) is initialized to the followship weight computed from the background features described in Sect. 4.1, which jointly cover the three features:

$$\begin{aligned} s_{ij}(0)&= H_l\times f_s\times f_t \nonumber \\&= -\sum _{u,P_{u,l}\ne 0}{P_{u,l}log(P_{u,l})} \times \frac{1}{d(l_i,l)}\times P_d(d(l_i,l)) \times exp(-\varDelta t) \end{aligned}$$
(6)

where time period \(\varDelta t\) = current timestamp − t.
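Eq. 6 can be applied to a list of followships to obtain the initial strengths. In the sketch below, \(H_l\), the hometown distance \(d(l_i,l)\) and \(P_d\) are passed in as precomputed inputs, \(\varDelta t\) is assumed to use the same (unspecified) time unit as t, and the weights of repeated followships between the same pair are summed; that aggregation is our assumption, since the text only defines the weight of a single followship. Names are hypothetical.

```python
import math
from collections import defaultdict


def initial_strengths(followships, entropy, distance_to_hometown, p_d, now, min_km=0.1):
    """Initial ST-social strengths s_ij(0) from Eq. 6.

    followships: iterable of ((u_i, l, t), u_j) tuples.
    entropy: dict location -> H_l.
    distance_to_hometown: function (user, location) -> d(l_i, l) in km.
    p_d: function km -> probability of a followship at that distance.
    now: current timestamp, so delta_t = now - t.
    """
    s0 = defaultdict(float)
    for (u_i, loc, t), u_j in followships:
        d = max(distance_to_hometown(u_i, loc), min_km)  # avoid division by zero
        f_s = p_d(d) / d
        f_t = math.exp(-(now - t))
        s0[(u_i, u_j)] += entropy[loc] * f_s * f_t  # H_l * f_s * f_t
    return dict(s0)


s0 = initial_strengths([(("a", "cafe", 5.0), "b")], {"cafe": 2.0},
                       lambda user, loc: 4.0, lambda d: 0.5, now=5.0)
```

With \(H_l = 2\), \(d = 4\) km, \(P_d = 0.5\) and \(\varDelta t = 0\), the weight is \(2 \times (0.5/4) \times 1 = 0.25\).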

5 Experimental Evaluation

In this section, we evaluate the predictive performance of the ST-SIM framework, i.e., how accurately it predicts the set of users with the greatest influence on an individual’s travel behavior.

5.1 Settings and Evaluation Methods

Experimental Setup. We used the real-world FS-CA dataset described in Sect. 3.1. The check-in activities were sorted by creation time and divided into two subsets: a training set containing the first 70% of the check-in activities and an evaluation set containing the remaining 30%.
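The chronological 70/30 split can be sketched as follows, assuming each check-in is a (user, location, time) tuple. Sorting before splitting guarantees that no evaluation check-in is earlier than any training check-in, so the model never sees future behavior. The function name is hypothetical.

```python
def chronological_split(checkins, train_fraction=0.7):
    """Sort check-ins by creation time and split them into training and evaluation sets.

    checkins: list of (user, location, time) tuples.
    Returns (first train_fraction of check-ins, remaining check-ins).
    """
    ordered = sorted(checkins, key=lambda c: c[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


data = [("u", "l", float(t)) for t in range(100, 0, -1)]
train, test = chronological_split(data)
```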

Performance Metrics. We use two popular measures to evaluate the performance of our techniques: the average Precision@k over all queries and MAP (Mean Average Precision) for ranked results. The metrics are defined as follows.

Precision@k is the fraction of the top-k returned users who are influential according to the ground truth.

$$\begin{aligned} Precision@k=\frac{\#\text { influential users in~top-k results}}{k} \end{aligned}$$

MAP stands for the mean of the AP values of all queries. AP is defined as the average of the precision values for all relevant results of a single query.

$$\begin{aligned} AP=\frac{\sum _{i=1}^{k}{(Precision@i\times rel(i))}}{\#\text { influential users in~top-k results}} \end{aligned}$$

where Precision@i is the precision at cut-off i in the list, and rel(i) is an indicator function equal to 1 if the user at rank i is relevant and 0 otherwise.
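The metrics can be computed as sketched below, assuming each query provides a ranked list of recommended user IDs and a ground-truth set of influential users derived from the evaluation set. As in the AP formula above, the denominator of AP is the number of relevant users in the top-k list; returning 0 when that number is zero is our own convention. Function names are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked users that are in the ground-truth set."""
    return sum(1 for u in ranked[:k] if u in relevant) / k


def average_precision(ranked, relevant, k):
    """AP over the top-k list; denominator = number of relevant users in the top k."""
    hits = [1 if u in relevant else 0 for u in ranked[:k]]
    if sum(hits) == 0:
        return 0.0
    return sum(precision_at_k(ranked, relevant, i) * rel
               for i, rel in enumerate(hits, start=1)) / sum(hits)


def mean_average_precision(queries, k):
    """MAP: mean AP over (ranked, relevant) pairs, one pair per query user."""
    return sum(average_precision(r, rel, k) for r, rel in queries) / len(queries)
```

For the ranking [a, b, c] with ground truth {a, c}, Precision@3 is 2/3 and AP is (1 + 2/3) / 2 = 5/6.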

5.2 Comparison Methods

In addition to ST-SIM, the recommendation approaches under evaluation are listed below.

Baseline1 - Order by public frequency: this approach represents the public’s trend by selecting the top-k users with the highest visit counts in the query region.

Baseline2 - Order by following counts: this approach directly ranks users by their number of geo-social following relations. The results are confined to the query user’s friend circle.

Entropy-Based Model for Co-occurrence (EBM): this is one of the state-of-the-art models for inferring social connections from LBSNs [7]. EBM quantifies the strength of each social connection by considering co-occurrences in the context of locations.

Only consider social effect (Inter): this is a special case of ST-SIM in which the intra factor is set to zero. In other words, only the social effect from others is considered for recommendation.

Only consider self effect (Intra): this is also a special case of ST-SIM, with the inter factor set to zero. Only the user’s interests are considered in the recommendation.

5.3 Performance Evaluation

Tuning Propagation Coefficient. Although the self-tuning technique of ST-SIM properly assigns the weights of the inter factor and intra factor, the diffusion model of ST-SIM has two parameters: \(\alpha \) and t. Parameter \(\alpha \) controls the diffusion rate of our model, and time t varies from 0 to 1.0. At t = 0, the influence score is concentrated at the query user’s vertex. As t increases, more and more people are influenced by their neighbors. Similarly, the magnitude of \(\alpha \) determines how fast the influence diffuses. In this set of experiments, we examine how the propagation coefficient \(\alpha \) controls the rate of influence diffusion and identify the optimal value of \(\alpha \) for the dataset.

Fig. 4. Ranking results for different \(\alpha \) values.

We set t = 1.0 in all our experiments. Figure 4(a) and (b) show the results for the query users with the top-200 check-in counts and the top-200 followship counts, respectively. Changing the value has little effect on the final ranking when \(\alpha t\) \(\le \) 5.0. When \(\alpha t\) increases further, however, performance decreases, because most of the influence score diffuses away and multi-degree friends may receive scores similar to those of first-degree friends. We therefore choose \(\alpha \) = 1.0 in the following experiments.

Goodness of Prediction with Baseline Heuristics. The goal of this experiment is to evaluate how well the geo-social strength computed from the training set fits the observed strength in the evaluation set (ground truth).

Fig. 5. Ranking results for different recommendation methods.

Figure 5 depicts the MAP and average Precision@k results of the different recommendation methods at k = 3, 10, 30 under the following scenario: given recommendation systems built from the training set and a member of the LBSN, which users should be chosen as the top-k influential candidates, and how well do they match the ground truth (derived from the individual’s future behavior in the test set)? Each figure corresponds to an approach. In general, ST-SIM and Inter perform the best in terms of all metrics, and EBM performs better than the two baseline methods. All of these perform better than Intra. The fact that Intra has the worst hit rate may indicate that social influence affects travel decisions more strongly than an individual’s own preferences.

6 Conclusion and Future Work

This paper presents a recommendation framework based on social influence (ST-SIM) to facilitate the identification of influential users in a location-based social network. We first built a heterogeneous graph to model the interaction between user-user pairs as well as user-category pairs. A diffusion-based influence model was then developed to extract interactive features for user ranking. A dynamic weight tuning mechanism is included in the model to provide personalized recommendations for each user. We evaluated ST-SIM using real LBSN check-in datasets. The experimental results show that the proposed method provides more effective recommendations than the compared strategies (the two baseline heuristics and EBM).