1 Introduction

In recent years, we have witnessed the increased development of location-based social networking (LBSN) services, such as Foursquare, Twitter, Yelp and Sina Weibo. LBSNs allow users to explore their surrounding places through the sharing check-in data. The check-in data contains rich knowledge which can be used for various recommendation problems, e.g., Point-of-Interest (POI) recommendation. The task of POI recommendation is to provide personalized recommendations of places of interest to a user, and it returns his/her unvisited places that he/she may be interested. In this paper, we focus on POI recommendation.

Considerable research efforts [4, 12, 15, 20, 37] have been devoted to develop models for providing accurate locations. The existing POI recommendation methods can be categorized into memory-based and model-based systems. Memory-based systems [27, 34] compute the missing rates of user-item pairs from similar users (user-based systems [27, 34]) or similar items (item-based systems). Compared to memory-based systems, model-based systems [2, 4, 12, 14] learn a model based on the patterns recognized from the check-in data. Although successes, a widely acknowledged challenge of POI recommendation is data sparsity. The check-in data is very sparse due to a majority of users only visited a very small number of POIs. When we make personalized recommendation for users, the sparse data may cause low accuracy of recommendation system.

Many researchers exploit to use the context information, e.g., geographical coordinates of POIs, for solving the data sparse problem. Lian et al. [12] incorporated the spatial clustering phenomenon [27] into the weighted matrix factorization model and proposed activity area vectors of users and influence area vector of POIs to augment user’s and POI’s latent factors in matrix factorization model. Li et al. [15] proposed a ranking based geographical factorization model (Rank-GeoFM) for addressing the data sparsity. Their ranking model used both the with and without check-in data, thus the data sparsity can be alleviated. Zhang et al. [37] proposed to exploit the geographical correlations, social correlations and categorical correlations among users and POIs. Wang et al. [20] utilized multi-fold context information to solve the data sparsity problem. In the pipelines of most existing POI methods that use the geographical information, all assume that the matrix of user-POI is globally low-rank. However, since most of the entries in the user-POI matrix are unknown, it is hard to complete the matrix, and it will give suboptimal solutions of the users’ and POIs’ latent factors in the factorization model. The assumption of globally low-rank thus still has limitations.

Although the check-in data is very sparse, fortunately, the check-in data always has the local structure for the users who live nearly, that is “everything is related to everything else, but near things are more related than distant thing” [27]. This motivates us to recover more unknown values in user-POI matrix by using the local geographical constraint. In this paper, we propose to recommend the unvisited POIs for a user based on three local properties. 1) Local low-rank matrix. Instead of assuming that the user-POI matrix is globally low-rank, we explicitly enforce that the user-POI matrix is locally low-rank [10, 11], which the locality is defined in the vicinity of certain row-column combinations. We construct several low-rank sub-matrices, each sub-matrix corresponds to a local region in the user-POI matrix. To calculate the similarity, the anchor point is used as representation for local region. In the local low-rank model, user-POI sub-matrices are much denser than the original matrix. Thus, the sparsity problem can be relieved. 2) Spatial similarity. In the real world, there exists spatial locality in users’ behaviour, that is people living in the neighbourhood would have the similar behavior. For example, there are two users A and C. They live closely. Since there are limited number of check-ins for each user, they have no common check-in. This doesn’t mean they are not similar with each other any more. As their active areas overlap, maybe a nearby POI visited by A will also be visited by C in the future. To represent the spatial locality, we propose a new similarity: spatial similarity. The corresponding method is proposed for calculating the spatial similarity between users (the same to POIs). With the spatial similarity, we can calculate the similarity more accurately between user-POI point with anchor point, then we can find more similar user-POI points undiscovered before and make the local matrix denser. 3) Spatial favor. When a user chooses a restaurant, the distance is an important consideration. In this paper, we propose the spatial favor to model user’s spatial preference. We utilize a personalized geographical model to get the spatial favor of user to POI. Compared to the existing works [12, 15] that fuse the spatial factor into their models directly, our model can learn the parameter more accurately. Finally, we use the local collaborative ranking (LCR) [11] with these three factors and propose to mix geographical information into local collaborative ranking (MG-LCR) for the sparse data.

Our contributions in this paper can be summarized as follows:

  • We propose a local matrix factorization method that explicitly enforces the user-POI matrix to have a locally low-rank structure. Due to sufficient samples are available in the small neighbourhoods, the local matrix approximation can relieve data sparsity.

  • We propose to mix geographical information into local collaborative ranking, in which the spatial similarity and the spatial favor are proposed for modelling the geographical correlations. Our spatial similarity can get more similar user-POI points in the local region and make the local matrix denser. We also utilize personalized geographical model to get the spatial favor of user to POI.

  • We conduct extensive evaluations on two benchmark datasets for POI recommendation. The empirical results show that the proposed method outperforms other state-of-the-art methods, including LCR, GeoMF, Rank-GeoFM.

The rest of this paper is organized as follows: In Section 2, we firstly propose locally low-rank model to predict users’ rating to POI, then we mix geographical information into local collaborative ranking. The experimental results are given in Section 3. In Section 4, we briefly review related work. Finally, we conclude the paper in Section 5.

2 The proposed approach

In this paper, we propose to mix geographical information into local collaborative ranking (MG-LCR) for dealing with small amount of check-in data. The MG-LCR has three building blocks for solving the sparsity of data: 1) local low-rank matrix factorization; 2) spatial similarity and 3) spatial favor. In the following, we will firstly introduce the research problem with required notations, and then we will present the details of these three parts, respectively.

2.1 Problem definition

Let be the set of all users, and be the set of all POIs. In POI recommendation, S is the observed implicit feedback data, S ⊂× is available. Each POI is associated with latitude and longitude coordinates (x,y). We denote u+ as the set of POIs that user u have visited:

$$ \mathcal{V}_{u}^{+} :=\{v\in \mathcal{V}:(u,v)\in S \}. $$
(1)

Each v in u+ is also associated with frequency information fv, which indicates the frequency of user u visited POI v. POI recommendation system’s task is to recommend POIs that user has not visited.

To predict users’ ratings to unvisited POIs, Li et al. [15] and Lian et al. [12] utilized methods based on matrix factorization, which assumes user-POI matrix is globally low-rank. It measures users’ preference by their historical check-ins over the entire POI space. They used matrix factorization to learn the rating matrix X : ×. With matrix factorization, X is approximated by the matrix product of two low-rank latent matrices

$$ \hat{X}:=UV^{T} , $$
(2)

where U : ||× r and V : ||× r and r is the rank of latent matrix.

2.2 Local low-rank matrix factorization

In fact, as user’s preference includes various aspects, he/she may have a special taste on restaurants, and another personalized favor on sceneries. Analogously, each POI’s attribute contains several aspects, it may provide a good service, but its location is remote. In other words, there exists local property in users’ check-in behavior and POIs’ attribute.

Therefore, we utilize an alternative approach to predict users’ ratings to POIs. By assuming user-POI matrix is locally low-rank instead of globally low-rank, we apply local low-rank matrix approximation to estimate ratings in personalized ranking. Thus, X is approximated by a number of low-rank matrices. Each low-rank matrix represents one local subgroup, and each of these matrices describes the original matrix for a subgroup of user-POI points [1, 24].

$$ \hat{X}_{u,v}=\sum\limits_{t=1}^{q} w^{t}_{u,v} \hat{X}^{t}_{u,v}, $$
(3)

where q is the number of low-rank matrices. X̂u,v represents the estimation of element in original matrix X, X̂u,vt is the estimation obtained from t-th low-rank matrix Xt, wu,vt denotes the weight of low-rank matrix element X̂u,vt in Xu,v. To get X̂u,vt and wu,vt, we operate as follow:

Step 1::

Sample q anchor points \((u_{t_{1}},v_{t_{1}}),...,(u_{t_{q}},v_{t_{q}}) \in \mathbb {R}^{|\mathcal {U}| \times |\mathcal {V}|}\) uniformly from the training set. Each anchor point will be used to construct corresponding sub-matrix. And it is conceivable that more sophisticated adaptive sample techniques will result in a more accurate model.

Step 2::

Identify neighborhoods surrounding t-th anchor point to form a sub-matrix. If the similarity between user-POI point (u,v) with anchor point (ut,vt) is not less than a threshold, then (u,v) is a neighborhood of t-th anchor point. To calculate the similarity between user-POI point with anchor point, we use the product of users’ similarity with POIs’ similarity to represent it. Here we use function s(u,ut) to indicate the similarity between users in each point, the same to POIs.

$$ \begin{array}{@{}rcl@{}} &&s(u,u_{t})=arcos\left( \frac{\langle U_{u},U_{u_{t}} \rangle}{||U_{u}||\cdot||U_{u_{t}}||}\right)\\ &&s(v,v_{t})=arcos\left( \frac{\langle V_{v},V_{v_{t}} \rangle}{||V_{v}||\cdot||V_{v_{t}}||}\right)\\ &&s((u,v),(u_{t},v_{t}))=s(u,u_{t})s(v,v_{t}), \end{array} $$
(4)

where s(u,ut) describes user similarity, which can be computed solely based on the partially observed user-POI matrix by cosine similarity. However, the partial observed matrix is very sparse, leading to poor estimates of similarity. Therefore, we factorize X into latent factor U,V firstly, then we calculate the cosine similarity between users, the same to POIs (omitted here). Moreover, the similarity can also be calculated by other information which will be introduced in the latter section. As mentioned in [10], we also introduce smoothing kernel into the similarity. Since Epanechnikov kernel’s performance is better than uniform and triangular kernel, we adapt Epanechnikov kernel as smoothing kernel. The similarity between two users becomes:

$$ \begin{array}{@{}rcl@{}} && K_{h_{1}}(u,u_{t})= \frac{3}{4}(1-s(u,u_{t})^{2})\textbf{1}[s(u,u_{t}) < h_{1}\\ && K'_{h_{2}}(v,v_{t})= \frac{3}{4}(1-s(v,v_{t})^{2})\textbf{1}[s(v,v_{t}) < h_{2}, \end{array} $$
(5)

where h1,h2 are bandwidth of smoothing kernel. So the similarity between user-POI point with anchor point becomes

$$ s((u,v),(u_{t},v_{t}))=K_{h_{1}}(u,u_{t})K'_{h_{2}}(v,v_{t}). $$
(6)
Step 3::

With the similarity K(⋅,⋅), we can get the weight wu,vt as following formula

$$ w^{t}_{u,v}=\frac{s((u,v),(u_{t},v_{t}))}{{\sum}_{s=1}^{q} s((u,v),(u_{s},v_{s}))}. $$
(7)

where X̂t can be obtained by the local low-rank matrix, in left part of Figure 1, the sub-matrices are decomposed by matrix factorization:

$$ \hat{X}^{t}_{u,v}={U^{t}_{u}} {{V^{t}_{v}}}^{T}. $$
(8)
Step 4::

Via local low-rank matrix approximation, the estimation rating of user-POI X̂u,v will be aggregated, as shown in right part of Figure 1:

$$ \hat{X}_{u,v}=\sum\limits_{t=1}^{q} \frac{s((u,v),(u_{t},v_{t}))}{{\sum}_{s=1}^{q} s((u,v),(u_{s},v_{s}))}[U^{t} {V^{t}}^{T}_{u,v}. $$
(9)
Figure 1
figure 1

Local low-rank matrix factorization. Original matrix is realigned by the anchor points to form sub-matrices, there are five sub-matrices indicated by different colors. Black-edged box indicates the anchor point

We calculate the similarity between users (the same to POIs) by latent vector. In fact, besides the general local property, there exists spatial local property for users and POIs in the geographical space. In the following, we first introduce the spatial similarity, then we describe spatial favor in our model. At last, we mix the geographical information into LCR, and provide corresponding learning algorithm.

2.3 Spatial similarity

Each user may have different favors in different geographical regions. For example, when user is in Las Vegas, he/she would like casino, having the same favor with local users. While he/she probably likes Lake Powell, if the user is in Phoenix.

At the same time, there is also spatial local property in POIs. Ye et al. [27] mentioned “nearby POIs are more related to each other, which exhibits strong geographical influence.” The adjacent POIs are more or less similar. POIs closed with shopping malls are likely related to shopping. We use data from Yelp Dataset Challenge, including Phoenix, Las Vegas et al. Via latent vector obtained from matrix factorization, we calculate the similarity between POIs by cosine (\(\frac {V_{v_{1}} \cdot V_{v_{2}}}{||V_{v_{1}}||\cdot ||V_{v_{2}}||}\)), getting the correlation between distance (POI to POI) and similarity (POI to POI). The x axis means that POI-POI’s distance is not larger than a corresponding value. In Figure 2, the correlation between distance and similarity is negative, especially when the distance is 100m, the absolute value of correlation is more than 0.55. This indicates when POI to POI’s distance isn’t larger than 100m, they are more likely similar. Thus it can be seen that there exists spatial local property in the geographical space. We can also see when the distance is less than 100m, the absolute value of correlation is not big. This may be because there are not too many POI pairs, of which distance is less than 100m, and there are many noises in the similarities between POIs.

Figure 2
figure 2

Correlation between sim (v1,v2) and distance (v1,v2) on Yelp. When the POI to POI’s distance is not large than a certain value, we calculate the corresponding correlation value between similarity and distance

To represent the local property on geographical space, we will introduce spatial similarity, just as similarity for general local property. Next, we will introduce how to compute the spatial similarity. The spatial similarity contains two parts: user’s spatial similarity and POI’s spatial similarity.

User’s Spatial Similarity depicts how similar between the active areas of two users. This concept is first proposed in POI recommendation. In earlier works, they used the general similarity to make user-based collaborative filtering recommendation, ignoring that there exists spatial similarity. Through there already exists spatial similarity in other domains, such as image retrieval [6] and biogeography [9], but the spatial similarity in these domains is not same with POI recommendation. Because the check-in data is just some points in spatial area, and each point is associated with frequency information. And we concern their spatial active areas’ relationship, not just their overlapped area. Here we propose a kernel estimation method [13, 34] to calculate it. Defining user a’s check-in set as \(L_{a}=\{(v_{1}^{(a)},f_{1}^{(a)}),(v_{2}^{(a)},f_{2}^{(a)}),...,(v_{n}^{(a)},f_{n}^{(a)})\}\), where \(v^{(a)}_{j}= \langle x_{j},y_{j} \rangle \), xj and yj represent the latitude and longitude of POI vj(a), which is visited by the user fj(a) times. Since user may visit a POI several times, the frequency information is important to depict user’s geographical attribute, and it can reflect the probability of a user occurs in somewhere. Each POI v’s spatial similarity with user a is defined as below:

$$ f_{s}(v|L_{a},h)=\frac{1}{|F_{a}|}\sum\limits_{j=1}^{n} K^{\prime\prime}_{h}(v,v^{(a)}_{j})\cdot f^{(a)}_{j} $$
(10)
$$ K^{\prime\prime}_{h}(v,v^{(a)}_{j})=\frac{1}{\sqrt{2\pi} h}e^{-\frac{\|v-v^{(a)}_{j}\|^{2}}{2h^{2}}}, $$
(11)

where |Fa| is the sum of all \(f^{(a)}_{j}\) in La, \(K^{\prime \prime }_{h}\) is a kernel function defined in formula (11), h is the bandwidth of kernel function, \(\|v-v^{(a)}_{j}\|^{2}\) is the spatial distance between v and \(v^{(a)}_{j}\). In a similar way, user b’s check-in set is \(L_{b}=\{(v_{1}^{(b)},f_{1}^{(b)}),(v_{2}^{(b)},f_{2}^{(b)}),...,(v_{m}^{(b)},f_{m}^{(b)})\}\). So the spatial similarity sp(a,b) between a and b will be:

$$ \begin{array}{@{}rcl@{}} sp(a,b)&=&\frac{1}{|F_{b}|} \sum\limits_{i=1}^{m} f_{s}(v_{i}^{(b)}|L_{a},h)\cdot f_{i}^{(b)} \\ &=&\frac{1}{|F_{b}||F_{a}|} \sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n} K^{\prime\prime}_{h}(v_{i}^{(b)},v_{j}^{(a)})\cdot f_{j}^{(a)} \cdot f_{i}^{(b)} \\ &=&\frac{1}{|F_{a}|} \sum\limits_{j=1}^{n} f_{s}(v_{j}^{(a)}|L_{b},h)\cdot f_{j}^{(a)} \\ &=&sp(b,a), \end{array} $$
(12)

where |Fb| is the sum of all \(f^{(b)}_{i}\) in Lb. From the formula above, we can see the sp(a,b) is symmetric. So the calculation will be reduced by half.

To prove the assumption that similarity contains general similarity and spatial similarity, we calculate the correlation between user’s general similarity with user’s spatial similarity. In Figure 3, as the spatial similarity increases, the correlation improves synchronously. This indicates the bigger users’ spatial similarity are, the more similar their preference are. Besides, we show the heatmap of three users’ check-in in Figure 4. In Figure 4(a), user u1’s active area is shown in yellow/green-edge area, user u2’s active area is shown in blue-edge area, their active areas overlap at the center part of the map. Based on the formula of general similarity and spatial similarity, we get general similarity is 0.68, and their spatial similarity is 0.27. In Figure 4(b), user u1’s active area is still shown in yellow/green-edge area, user u3’s active area is shown in blue-edge area. In contrast to u2, user u1’s general similarity with u3’s is 0. However, they have two overlapped active areas, and their spatial similarity is 0.25. So the spatial similarity can remedy the insufficient of general similarity. By the two examples, we can see the spatial similarity can depict users’ spatial relationship well. At the same time, the spatial similarity can also reflect users’ similarity properly when general similarity doesn’t work.

Figure 3
figure 3

Correlation between users’ general similarity and spatial similarity on Yelp. When the user to user’s spatial similarity is not large than a certain value, we calculate the corresponding correlation value between general similarity and spatial similarity

Figure 4
figure 4

Heatmaps of (u1,u2,u3) on Yelp. In Figure 4(a), user u1’s active area is shown in yellow/green-edge area, user u2’s active area is shown in blue-edge area, their active areas overlap at the center part of the map. In Figure 4(b), user u1’s active area is still shown in yellow/green-edge area, user u3’s active area is shown in blue-edge area. There are two overlapped active areas

POI’s Spatial Similarity depicts how similar between two POIs on spatial space. Basically, the normalized distance between two POIs can be used as the spatial similarity. However, the spatial similarity is not linear relationship with distance. Ye et al. [27] utilized a power-law probabilistic model between two POIs’ relationship. But it is based on users’ visited POI set. Here the POI’s spatial similarity is independent with users and is calculated from POI perspective. To obtain POIs’ spatial similarity from the distance information and reflect their non-linear relationship, we use the kernel estimation method, which is a non-parametric way to estimate the probability density function of a random variable. Defining the POI vi’s coordinate as \(v_{i}=\langle x_{i},y_{i}\rangle \), POI vj’s coordinate as \(\langle x_{j},y_{j}\rangle \), the spatial similarity \(sp(v_{i},v_{j})\) between vi and vj will be:

$$ sp(v_{i},v_{j})=\frac{1}{\sqrt{2\pi} h_{*}}e^{-\frac{({d}_{v_{i} v_{j}})^{2}}{2h_{*}^{2}}}, $$
(13)

where \({d}_{v_{i} v_{j}}\) is the spatial distance between vi and vj. How to select the bandwidth h will be introduced in section 2.4.

2.4 Spatial favor

As we know, user visits a POI considering not only personalized preference but also the distance factor. As the distance between user and POI becomes smaller, user will probably prefer the nearby POI much more, which we call spatial favor in this paper. To better learn user’s personalized preference, we obtain user’s spatial favor to POI in advance. Since we cannot get user’s current position, we use user’s history check-ins to depict user’s active area, then calculate distance between target POI with the POIs visited by the user. Ye et al. [27] used power-law distribution to get spatial favor by the distance. However, it needs to learn the parameter of the power-law distribution firstly. Here, we use kernel function to depict user’s spatial favor to POI.

$$ f_{S}(v|L_{u},h)=\frac{1}{|F|}\sum\limits_{j=1}^{n} K^{\prime\prime}_{h}(v,{v^{u}_{j}})\cdot f_{j} $$
(14)
$$ K^{\prime\prime}_{h}(v,{v^{u}_{j}})=\frac{1}{\sqrt{2\pi} h}e^{-\frac{({d}_{v {v^{u}_{j}}})^{2}}{2h^{2}}}, $$
(15)

where v is the POI to calculate, \(f_{S}(v|L_{u},h)\) depicts user u’s spatial favor to POI v. h is the bandwidth parameter. Moshe et al. [13] suggested h be the Euclidean distance of the k th nearest neighbor to vju in the training data. However, user’s check-in data is very sparse, this method works not very well in the individual’s level. Here we utilize fixed h and perform a grid-search to selecting h by a validation set. We will use log-probability score function LPu(h) (formula is shown below) and precision of top-k for bandwidth selection.

$$ LP_{u}(h)=\frac{1}{n_{t}}\sum\limits_{r=1}^{n_{t}}\log f_{S}(v_{r}|L_{u},h), $$
(16)

where the nt is the number of POIs in the validation set. By Table 1, we can see when h = 0.05, the performance is best.

Table 1 log-probability scores on validate set, comparing the fixed, adaptive approaches for kernel density estimation on Foursquare check-in data

2.4.1 Mix geographical information into local collaborative ranking

We use local low-rank matrix factorization to predict users’ ratings on POIs. The rating can only reflect user’s general preference excluding geographical factor. As we know, users tend to visit the POIs around them. Then the spatial favor proposed before can reflect the probability of user appearing at the POI in the spatial perspective. Then we fuse the spatial favor into LCR, getting a new rating of user to POI, which is defined as follows:

$$ \hat{X}_{u,v}=\sum\limits_{t=1}^{q} \frac{s((u,v),(u_{t},v_{t}))}{{\sum}_{s=1}^{q} s((u,v),(u_{s},v_{s}))}[U^{t} {V^{t}}^{T}_{u,v}+f_{S}(v|L_{u}). $$
(17)

At the same time, when calculating the relationship between user-POI point with anchor points s((u,v),(ut,vt)), we should consider not only the general similarity calculated between latent vectors but also spatial similarity. The similarity s(u,ut), s(v,vt) in formula (5) between users (and POIs) can be updated by the mixed similarity sm(u,ut), sm(v,vt) below:

$$ \begin{array}{@{}rcl@{}} s_{m}(u,u_{t})&=&(1-\alpha)s(u,u_{t})+\alpha \cdot sp(u,u_{t})\\ s_{m}(v,v_{t})&=&(1-\alpha)s(v,v_{t})+\alpha \cdot sp(v,v_{t})\\ 0&\leq&\alpha\leq1 , \end{array} $$
(18)

where α is a parameter to balance the weight of spatial similarity. However, this is a linear combination of different factors, and there is an extra variable α to be inferred. When facing new datasets, the variable needs to be recomputed. Here the proposed method utilizes the proportion of two factors in exponential space to replace the extra variable. As shown below, it can avoid extra variable to infer, the same to POI.

$$ \begin{array}{@{}rcl@{}} s_{m}(u,u_{t}) & =&\frac{\exp(s(u,u_{t}))}{Z}s(u,u_{t}) +\frac{\exp(sp(u,u_{t}))}{Z} sp(u,u_{t})\\ Z & =&\exp(s(u,u_{t}))+\exp(sp(u,u_{t})). \end{array} $$
(19)

2.5 Parameter inference

To optimize parameter in the model, we use bayesian analysis of the problem [18], utilizing the likelihood function for p(viuvj|Θ) as objective function, and p(Θ) is the prior probability of the model. Our model assumes that a user prefers a visited POI than an unvisited one, and user’s rating to visited POI should be higher than rating to unvisited POI. It learns the POIs’ ranking for each user, so our proposed method is called local collaborative ranking mixed with geographical information. As a result, we will maximize likelihood function p(≻u|Θ) for all users

$$ \prod\limits_{u\in \mathcal{U}} p(\succ_{u}|{\Theta})=\prod\limits_{(u,v_{i},v_{j})\in \mathcal{U}\times \mathcal{V}\times \mathcal{V}} p(v_{i} \succ_{u} v_{j}| {\Theta}). $$
(20)

In order to get a personalized total rank, we define the individual probability that a user really prefers item vi to item vj as:

$$ \begin{array}{@{}rcl@{}} p(v_{i}\succ_{u}v_{j}|{\Theta})&=&\sigma(\hat{x}_{uv_{i}v_{j}}({\Theta}))\\ \hat{x}_{uv_{i}v_{j}}({\Theta})&=&\hat{X}_{uv_{i}}({\Theta})-\hat{X}_{uv_{j}}({\Theta}), \end{array} $$
(21)

where σ is the logical function:\(\sigma (x)=1/1+e^{-x}\), \(\hat {x}_{uv_{i}v_{j}}({\Theta })\) is an arbitrary real function, the model parameter vector Θ which captures the special relationship between user u, item vi and item vj. Here, \(\hat {X}_{uv_{i}}({\Theta })\), \(\hat {X}_{uv_{j}}({\Theta })\) is estimated by local collaborative ranking mentioned in Section 2.4.1. In order to complete ranking learning task, we introduce a general prior density p(Θ), its mean is 0, the variance matrix of the normal distribution is ∑ Θ. Then to reduce the number of unknown parameters, we formulate the maximum posterior (MAP) estimation for the personalized ranking, which is derived below.

$$ \begin{array}{@{}rcl@{}} MAP & =& \ln p({\Theta}|\succ_{u}) \\ & =& \ln p(\succ_{u}|{\Theta})p({\Theta})\\ & =& \ln \prod\limits_{(u,v_{i},v_{j})\in D_{s}} \ln \sigma(\hat{x}_{uv_{i}v_{j}})p({\Theta})\\ & =& \sum\limits_{(u,v_{i},v_{j})\in D_{s}} \ln \sigma(\hat{x}_{uv_{i}v_{j}})+ \ln p({\Theta})\\ & =& \sum\limits_{(u,v_{i},v_{j})\in D_{s}} \ln \sigma(\hat{x}_{uv_{i}v_{j}})- \lambda_{\Theta}\|{\Theta}\|^{2}\\ D_{s}&=&\{(u,v_{i},v_{j})|u \in \mathcal{U} \wedge v_{i}\in \mathcal{V} \wedge v_{j}\in \mathcal{V}\backslash \mathcal{V}_{u}^{+} \}, \end{array} $$
(22)

where λΘ are model specific regularization parameters.

2.6 Learning algorithm

We have just derived an objective function for ranking learning. To learn the objective function, we adopt stochastic gradient descent algorithm. As the objective function is differentiable, the gradient of MAP with respect to the model parameters is:

$$ \begin{array}{@{}rcl@{}} \frac{\partial MAP}{\partial {\Theta}} &=&\sum\limits_{(u,v_{i},v_{j})\in D_{s}} \frac{\partial}{\partial {\Theta}}\ln \sigma(\hat{x}_{uv_{i}v_{j}})-\lambda_{\Theta}\frac{\partial}{\partial {\Theta}}\|{\Theta}\|^{2} \\ & \propto& \sum\limits_{(u,v_{i},v_{j})\in D_{s}} \frac{-e^{-\hat{x}_{uv_{i}v_{j}}}}{1+e^{-\hat{x}_{uv_{i}v_{j}}}} \frac{\partial}{\partial {\Theta}}\hat{x}_{uv_{i}v_{j}}-\lambda_{\Theta} {\Theta}. \end{array} $$
(23)

With local collaborative ranking, the target matrix X is grouped by several local matrices Xt, each is approximated by the matrix product of two low-rank matrices Ut : |tr and Vt : |tr. t is the user set of local matrix Xt, t is the POI set of local matrix Xt, r is the rank of feature matrix.

$$ \begin{array}{@{}rcl@{}} \hat{X}_{u,v}&=&w^{t}_{u,v}\cdot[U^{t} {V^{t}}^{T}_{u,v}\\ w^{t}_{u,v}&=& \frac{s((u,v),(u_{t},v_{t}))}{{\sum}_{s=1}^{q} s((u,v),(u_{s},v_{s}))}, \end{array} $$
(24)

where wu,vt represents the weight of local low-rank matrix element X̂u,vt in X̂u,v. Each row [Uut in Ut can be seen as a feature vector describing a user u in local matrix and similarly each row [Vvt of Vt describes a POI v. Thus the prediction formula can also be written as:

$$ [U^{t} {V^{t}}^{T}_{u,v}=\sum\limits_{f=1}^{r} [U^{t}_{uf} \cdot [V^{t}_{vf}. $$
(25)

Then we can get:

$$ \frac{\partial}{\partial {\Theta}}\hat{x}_{uv_{i}v_{j}}=\left\{ \begin{array}{lll} & w^{t}_{u,v}\cdot([V^{t}_{v_{i}f}-[V^{t}_{v_{j}f}), & if\ {\Theta}=[U^{t}]_{uf} ,\\ & w^{t}_{u,v}\cdot [U^{t}_{uf}, & if\ {\Theta}=[V^{t}]_{v_{i}f} ,\\ & -w^{t}_{u,v}\cdot [U^{t}_{uf}, & if\ {\Theta}=[V^{t}]_{v_{j}f} ,\\ & 0, & else \end{array} \right. $$
(26)

The stochastic gradient descent algorithm is defined as follow:

Step 1::

initialize each [Ut,[Vt as Gaussian distribution matrix.

Step 2::

choose the triple (u,vi,vj) randomly (uniformly distributed) from users’ check-in set, learn the parameters in [Ut and [Vt in iteration.

$$ {\Theta}\leftarrow{\Theta}-\alpha(\frac{e^{-\hat{x}_{uv_{i}v_{j}}}}{1+e^{-\hat{x}_{uv_{i}v_{j}}}}\cdot \frac{\partial}{\partial {\Theta}}\hat{x}_{uv_{i}v_{j}}+\lambda_{\Theta}{\Theta}). $$
(27)
Step 3::

judge the iteration converged or not. If it converges, then the algorithm finishes, otherwise turn to Step 2.

3 Experiment and results

In the experiments, we compare MG-LCR with other recommendation models. First, without considering geographical information, we choose three basic models, weighted matrix factorization (WMF), probabilistic factor model (PFM) and bayesian personalized ranking (BPR). We use LCR comparing with them. Second, to show the effect of introducing geographical information, we compare MG-LCR with LCR. At last, in order to understand the superiority of our models, we compare MG-LCR with the state-of-art model Rank-GeoFM.

3.1 Data sets

Experiments are conducted on two datasets, One is FoursquareFootnote 1 check-in data [15], the other is YelpFootnote 2 check-in data. In Foursquare check-in data, there are total 49,062 users, 206,416 POIs and 221,128 check-ins in New York City(NYC). The density of user-POI matrix is 2.1 × 10− 5, average number of visited POIs per user is 4.5, average number of check-ins per POI is 1.07. In Yelp check-in data, there are total 552,000 users, 77,000 POIs and about 2,200,000 check-ins. The density of user-POI matrix is 5.1 × 10− 5, average number of visited POIs per user is 3.98, average number of check-ins per POI is 28.57. We can see that both of the datasets are sparse, and the user-POI matrix in Foursquare is much sparser than Yelp.

3.2 Evaluation metrics

Here we use top-k evaluation criterion, for POIs each user accessed, we randomly selected 70% of check-ins as the training set, and 20% as a test set, the rest as validate set. We learn model’s parameters from the training set, then we predict where the user may go by recommendation model. The recommendation model is to rate each POI unvisited and rank them by the ratings, then it returns the top-k POIs as recommendation list to the user. By comparing recommendation list and test set, we assess the model’s accuracy. Evaluation metrics include recall@k and precision@k. The first evaluation metric shows how many percent of POIs user really visited are in the recommendation list for each user. And the second evaluation criterion indicates how many percent of POIs in the recommendation list are really visited by each user. The formulas are defined as below:

$$ \begin{array}{@{}rcl@{}} recall@k=\frac{1}{M}\sum\limits_{u=1}^{M} \frac{|R_{u}(k)\cap T_{u}|}{|T_{u}|}\\ precision@k=\frac{1}{M}\sum\limits_{u=1}^{M} \frac{|R_{u}(k)\cap T_{u}|}{k}. \end{array} $$
(28)

where Ru(k) represents the top-k POIs recommended by the model for user u. Tu represents the POI set really visited by user u. M is the number of users. This process is just an experiment of one time. The performance of the final recommendation model is through 5 independent experiments and we get the result by average.

3.3 Baseline methods

For POI recommendation, we compare our model with the following baseline methods.

  • Weighted Matrix Factorization(WMF) WMF is especially designed for matrix factorization based on implicit feedback [8, 17]. WMF considers all the unvisited locations as negative examples, and it assigns weights of small value to all negative examples, while the weights assigned to positive examples are monotonically increasing respect to frequency.

  • Probabilistic Factor Models(PFM) Since the distribution of frequency data doesn’t obey Gaussian distribution, PFM [2] places Beta distributions as priors on the latent vectors, then PFM defines a Poisson distribution on the frequency.

  • Bayesian Personalized Ranking(BPR) BPR is the most popular ranking-based matrix factorization, our local collaborative ranking model is also improved based on the Bayesian Personalized Ranking criterion [18].

For integrated models considering geographical information on POI recommendation, the following methods are used as baselines.

  • GeoMF GeoMF includes geographical factor into weighted matrix factorization, then it extends the dimensions of latent vectors to learn users’ spatial activity and POIs’ spatial influences [12].

  • Rank-GeoFM Rank-GeoFM is the state-of-the-art method on personalized ranking for POI recommendation [15].

3.4 Parameter setting

Next, we show the parameter values. We set the bandwidth parameter in Epanechanikov kernel as h1 = h2 = 0.8. Then we compare the performance under different number of anchor points. In Figure 5, the performance is increasing as more anchor points, and when the number is bigger than 100, the performance doesn’t increase too much, however, the calculation time will consume more. Then we select 100 anchor points for Foursquare dataset and 100 anchor points for Yelp dataset. At the same time, we compare LCR with Personalized Ranking (PR, also called BPR) at different rank values, including 5,10,15,20,30. Since the performance doesn’t increase too much when r > 15, we don’t try any larger ranks. In Figure 5, LCR gets the best performance at r = 30 on two datasets. However, from the experiment results, we observe the increase becomes smaller when the rank exceeds 15. Therefore, to balance the costing time for learning parameters and performance, we set rank as 15 in our models. In the same way, we set the rank of traditional matrix factorization as 15. In spatial favor, we set h = 0.05 as fixed bandwidth for individual’s frequency distribution, which is shown in Table 1. We compare the performance of MG-LCR at different values of α and our improved fusing method with softmax function. In Figure 6, when α = 0.2, α = 0.3, the precision@5 is best on each of two datasets. This indicates spatial similarity is indeed an important factor. And our fusing method in formula 19 can also obtain nearly best performance, it is robust and not effected by different datasets. At the same time, it doesn’t need extra calculation to determine the weight parameter, so we adopt the improved fusing method in our later experiment.

Figure 5
figure 5

Compare preformance of LCR with PR which considers matrix as global low-rank, at different value of rank on two datasets, Foursquare and Yelp

Figure 6
figure 6

Compare precision@5 at different value of α and softmax method on two datasets, Foursquare and Yelp

3.5 Results and discussion

In Figure 7, we compare LCR with BPR, WMF, PFM in different number of top-k. LCR is best all the time. On each number of top-k, WMF is better than PFM. Even through PFM is optimized for frequency matrix, WMF is more suited to POI recommendation where data of check-ins in POI recommendation is implicit feedback. And BPR is better than WMF, PFM. Not only because it is a ranking-based factorization, but also the train set of BPR consists of positive examples, negative examples and missing values to be predicted. However, in WMF and PFM, all the missing values are assumed as negative examples, this would bring some false negative examples. We can see that LCR is much more better than BPR all the time. The improvements, in terms of precision@5, are more than 5.10% and 7.20% on Foursquare and Yelp datasets, respectively. This may be caused by the local property existed in check-in matrix, and the local low-rank factorization can capture the local property exactly. This can prove the locally low-rank is much suited than globally low-rank in POI recommendation.

Figure 7
figure 7

Compare the performance of LCR with others approaches on two datasets, Foursquare and Yelp

In Figure 8, we compare MG-LCR with LCR+fav, LCR+sim and LCR. LCR+fav is the model mixed spatial favor into local collaborative ranking, without considering spatial similarity. LCR+sim is the model mixed spatial similarity into local collaborative ranking, without considering spatial favor. MG-LCR is the model fused geographical information into local collaborative ranking, which considers not only spatial favor but also spatial similarity in fact. We can see all of MG-LCR, LCR+fav, LCR+sim are better than LCR, this proves geographical information is important in POI recommendation. Considering precision@5, in Foursquare dataset, LCR+fav and LCR+sim improve LCR by 14.9%, 4.98% respectively, in Yelp dataset they improve by 17.5%, 8.30% respectively. The experiments also show LCR+fav is a little better than LCR+sim. This can indicate spatial favor is a little more important than spatial similarity. Then we compare them with MG-LCR, which is the model mixed both spatial favor and spatial similarity into local collaborative ranking. We can see that, considering precision@5, MG-LCR is superior to LCR+fav and LCR+sim, improving LCR by 20.6% in Foursquare, 25.4% in Yelp.

Figure 8
figure 8

Compare the performance of our proposed model MG-LCR with Rank-GeoFM and GeoMF on two datasets, Foursquare and Yelp

At last, we compare them with Rank-GeoFM (the state-of-art method), GeoMF. We can see that MG-LCR, LCR+fav and Rank-GeoFM are all better than GeoMF, this may be caused by that they are optimized for ranking. Further more, LCR+fav is better than Rank-GeoFM, by 4.85% and 5.48%, in terms of precision@5 on each dataset. MG-LCR is much better than Rank-GeoFM, by 11.0% and 11.9%, in terms of precision@5 on each dataset. Maybe because MG-LCR supposes the user-POI matrix is locally low-rank and includes spatial similarity to make sub-matrix more denser, which relieves the sparsity greatly. Moreover, the spatial favor of user to POI can characterise users’ geographical attribute better.

4 Related work

POI recommendation is a very important research topic in location-based service. The researchers’ works mainly focus on predicting users’ ratings to locations. In these works, there are mainly two kinds of Collaborative Filtering (CF). One is memory-based CF, such as user-based CF and item-based CF, including Ye et al. [27], Zhang et al. [34] and so on. As social networks and mobility devices become popular, a large number of location-based data can be achieved, [27, 34] got user similarity from the friendship of users, besides the check-in data, to improve the accuracy of prediction. The other is model-based CF, such as Probabilistic Factor Models (PFM) [2], non-negative matrix factorization (BNMF) [14], matrix factorization (MF) [4], weighted matrix factorization(WMF) [12].

The methods mentioned above are designed for approximating the absolute ratings of the locations. However, POI recommendation is actually optimizing for locations ranking. Traditional ranking-oriented CF algorithms directly generate a preference ordering of items for each user, including CoFiRank [23], ListRank-MF [19]. However, these ranking-oriented CF methods are based on explicit feedback, such as rating. In POI recommendation, the data we possess is implicit feedback, i.e. check-in data. Li et al. [15] used Rank-GeoFM to optimize POIs’ rank, they ranked POIs for user by the frequency of check-in. On one hand, it works better for implicit feedback data, and on the other hand, the sparsity problem can be alleviated to some degree. Because both visited(positive examples) and unvisited POIs will contribute to the learning of a ranking function.

Besides the check-in data we can utilize to make POI recommendation, there are many other sources of data can be used to improve the accuracy of ranking.

Above all, the geographical information is proved to be very important in POI recommendation [27]. Ye et al. [27] argued that the geographical proximities of POIs have a significant influence on users’ check-in behavior. The probability of POIs visited by user is closely related to the distance between user and POIs. They proposed that each user’s check-in activities exist geographical clustering phenomenon. Then they used the power-law distribution to fit the check-in probability. Different from [27], Cheng et al. [2] captured the multi-center characteristics of individual check-in POIs and used the multi-center model for encoding the spatial clustering phenomenon. To make personalized recommendation, not model users’ check-in behavior in a universal way, Zhang et al. [34] used a kernel density estimation approach to personalize the geographical influence on users’ check-in behaviors. Moreover, to learn individual spatial density model, Lichman et al. [13] proposed a mixture kernel density estimation to interpolate between an individual’s data and broader patterns in the population as a whole. Besides, to make use of collaborative filtering to learn users’ spatial activity, Lian et al. [12] augmented users’ and POIs’ latent factors to users’ spatial activity factors and POIs’ spatial influence factors, then they used weighted matrix factorization to incorporate the clustering phenomenon into human mobility behavior for POI recommendation.

Other types of context utilized in POI recommendation include text [5, 7, 14, 20, 26], time [3, 22, 25, 31, 33, 36], friendship [34] and so on [21]. Yin et al. [28, 32] focused on solving out-of-town problem with content information. Yin et al. [29] proposed geographical locality from users’ and POIs’ perspectives. Even though it considers spatial similarity, it is still based on global low-rank. Besides, to complete the online recommendation, Yin et al. [30, 31] proposed a clustering-based branch and bound algorithm to prune the POI search space and facilitate fast retrieval of the top-k recommendations.

Different from them, for the first time in this article we introduce local collaborative ranking for POI recommendation, which assumes user-POI matrix is locally low-rank instead of globally low-rank and is more suited for the local property existed in the check-in data. Even though they [10, 11, 16, 35] also considered the locally low-rank exists in user-item matrix, other useful information is ignored, such as geographical information. Unlike them, we introduce the geographical information systematically by the forms of spatial favor and spatial similarity, then we mix them into local collaborative ranking to improve the recommendation accuracy. From theoretical analysis, our model can also work on out-of-town recommendation scenario. Since the estimated value of user’s preference to POI is constructed by the corresponding values of the related local low-rank matrices, we can utilize spatial and content information to find more potential related local low-rank matrices, which are constructed by the out-of-town users and POIs. Therefore, our model can utilize local low-rank matrices of out-of-town to relieve data sparsity in out-of-town recommendation scenario. Besides, it also needs to do more experiments to prove the real effect, and this is a significant work we would conduct in future.

5 Conclusions

There are the following differences between our work and the existing works.

First, we put forward to utilize local collaborative ranking for predicting user’s rating to POI, which assumes the user-POI matrix is locally low-rank instead of globally low-rank. The local collaborative ranking can relieve the sparsity of data. The experiment also reflects that local collaborative ranking is indeed superior to global low-rank factorization algorithms on POI recommendation.

Second, we analysis geographical relationship between user with POI, user with user, POI with POI systematically in POI recommendation, and mix geographical information into local collaborative ranking seamlessly. By geographical information, we model users’ spatial favor to POIs (spatial favor) and spatial similarity between users (and POIs). We consider users’ preference to POIs includes general favor and spatial favor. And the geographical influence to users varies greatly. We utilize personalized geographical model to get the spatial favor of user to POI. By experiment, we can see LCR+fav is much better than LCR. Besides, the similarity between users (the same to POIs) includes traditional similarity (obtained by latent factor) and spatial similarity. We make use of kernel density estimation to calculate the spatial similarity between users (and POIs). By spatial similarity, we can get more similar users (and POIs) in subgroup, and make the local matrix denser. The experiment also reflects LCR+sim is indeed better than LCR which only considers general similarity.

Third, experiment shows MG-LCR is much better than state-of-the-art method. At the same time, with good scalability, MG-LCR can incorporate more different types of context information, such as review text. By review text, we can obtain topic similarity ts(u,ut) between users (and POIs) and find more potential similar users and POIs with anchor points, making the local sub-matrix much denser. This is a future work we will go on studying.

$$ \begin{array}{@{}rcl@{}} s_{m}(u,u_{t})&=&(1-\alpha-\beta)s(u,u_{t})+\alpha \cdot sp(u,u_{t})+\beta \cdot ts(u,u_{t})\\ \alpha&=&\frac{\exp(sp(u,u_{t}))}{Z}\\ \beta&=&\frac{\exp(ts(u,u_{t}))}{Z}\\ Z&=&\exp(s(u,u_{t}))+\exp(sp(u,u_{t}))+\exp(ts(u,u_{t})). \end{array} $$
(29)