1 Introduction

With the advances in wireless communication technologies, location-based social networks (LBSNs) such as Foursquare and Gowalla have grown rapidly and become pervasive in our daily lives. Millions of LBSN users share their experiences on points of interest (POIs) when they check in at locations. The huge amount of accumulated check-in data can be utilized to determine users’ preferences on POIs and further predict their mobility behaviors. Therefore, many POI recommendation methods have been proposed [1,2,3,4]. As an essential service of LBSNs, POI recommendation not only improves user experience but also increases the advertising revenue of e-commerce platforms [5].

Among the variety of POI recommendation tasks, next POI recommendation is arguably the most challenging one. It aims to capture the sequential dependencies of user travel behaviors and recommend exactly the next POI to visit [6]. In recent years, a large number of methods have been proposed to model such sequential transitions. Earlier studies mainly adopted Markov chain techniques [7, 8], which exploit only the user’s latest check-in in the check-in sequence. Such models degrade recommendation performance because the next check-in also depends heavily on other visited POIs [9]. Owing to their superiority in modeling long-term sequential transitions, recurrent neural network (RNN)-based methods are currently the state-of-the-art solutions [9,10,11]. Based on the assumption that spatiotemporal factors along with nearby check-ins restrict user movements [10], these models first integrate various contextual information, such as the transition time and transition distance between consecutive visited POIs, with the user’s check-in sequence, and then apply an RNN to estimate the user’s temporal preferences on POIs and predict the next visit.

Although existing methods have achieved satisfactory results, there are still two major limitations.

Firstly, most previous works only capture the transition distances between consecutively visited POIs in users’ independent check-in sequences [10,11,12,13,14,15]. However, since a POI is geographically adjacent to multiple other POIs, the geographically adjacent relations between POIs are not thoroughly captured. In addition, the collaborative relations between POIs, whose importance has been verified in collaborative filtering-based models [16], are barely considered in next POI recommendation. In real scenarios, the correlations between POIs are fairly complex. First, people commonly visit geographically adjacent POIs in succession. As shown at the bottom part of Fig. 1, POI \(l_{3}\) is geographically close to both \(l_{4}\) and \(l_{5}\). More generally, a POI is geographically adjacent to multiple other POIs in the real world. Therefore, it would be better to construct a POI graph to fully capture the geographically adjacent relations for better recommendations.

Fig. 1 A toy example of POI correlation modeling

In addition, collaborative filtering-based methods have demonstrated that items interacted with by the same users are similar to some extent [17, 18]. As shown at the top part of Fig. 1, \(l_{2}\) and \(l_{5}\) are both visited by all users, i.e., \(u_{1}\), \(u_{2}\) and \(u_{3}\). Thus, \(l_{2}\) and \(l_{5}\) tend to share similar user preferences. If another user has visited POI \(l_{2}\), the user may also be fond of \(l_{5}\). Inspired by this, we define the “interactive correlation” to capture the important collaborative signals from similar POIs to refine POI representations. By thoroughly exploring the geographical distances and interactive correlations through a POI graph, the POI characterizations are refined and the recommendation performance is further improved.

Secondly, most existing models only capture the user’s short-term preference based on several recent check-ins [12, 19]. Nevertheless, the user’s long-term preference, which contains both relevant and irrelevant information, also contributes to the user’s behaviors. Several recent studies tried to capture both long- and short-term preferences for user representation [20, 21]. However, they fail to adaptively differentiate the importance of POIs and thus cannot identify the most valuable check-ins. In fact, different POIs carry different meanings for a given user depending on their significance at different time points. Therefore, dynamically weighting relevant POIs is necessary for enhancing recommendation performance.

To overcome the above limitations, we propose a novel method named MPGI (Mining Preferences from Geographical and Interactive Correlations) for next POI recommendation. Specifically, we design a POI correlation modeling layer to capture the geographical distances and interactive correlations between all POI pairs. Then, we connect highly correlated POIs through weighted edges and employ Node2Vec to train the embeddings of POIs. Furthermore, we develop position-aware attention units and an attention network to adaptively select the most valuable information, so as to obtain both the user’s long-term and short-term preferences dynamically. Finally, we compute the visiting probability distribution over all POIs using a softmax function in the prediction layer.

Our primary contributions could be summarized as follows:

  • We propose a novel next POI recommendation model that learns POI embeddings by explicitly capturing the complex correlations, i.e., geographical distances and interactive correlations between all POI pairs.

  • We design position-aware attention units and an attention network to adaptively select the most valuable check-ins for modeling the user’s long- and short-term preferences.

  • We conduct extensive experiments on two real-world datasets, which demonstrate the effectiveness of MPGI against state-of-the-art methods in next POI recommendation.

2 Related work

In this section, we review relevant previous work on general POI recommendation, next POI recommendation and the attention mechanism in recommendations.

2.1 General POI recommendation

General POI recommendation captures user’s general preferences for POIs without considering the sequential transitions of user behaviors. Generally, the most well-known recommendation method is collaborative filtering (CF). Earlier studies mainly integrated CF with various contextual information to estimate user’s general preferences [22,23,24,25,26]. For example, Ren et al. [22] presented a context-aware probabilistic matrix factorization method which fused five kinds of contextual information comprehensively. Furthermore, Davtalab et al. [27] integrated POI similarity and user similarity into the framework of probabilistic matrix factorization to capture the implicit correlations. Nowadays, deep learning has been introduced into POI recommendation to model nonlinear complex relationships between users and POIs [5, 28,29,30,31,32]. For example, Pang et al. [5] developed a structure of local-to-global to attentively mine more hidden information from multiple features of POIs. Considering that user preference was the key factor influencing user behaviors, Liu et al. [29] proposed a pair-wise ranking-based method to enhance the feature learning of POIs and preference learning of users.

2.2 Next POI recommendation

Compared with general POI recommendation, next POI recommendation emphasizes the sequential dependencies of check-ins and models the dynamic evolution of user preferences [9]. Earlier studies mainly used first-order Markov chains to capture the transition relationships between consecutive POIs [7, 8, 33]. For instance, Cheng et al. [8] developed the FPMC-LR model, which not only applied personalized Markov chains to the check-in sequence but also considered the user’s movement constraints. However, these methods only exploited the user’s latest check-in of a sequence and thus failed to capture high-order transition relationships between previously visited POIs and the predicted one.

Recently, recurrent neural networks (RNNs) have been widely applied in sequential recommendation to model long-term sequential patterns. Numerous studies integrated various contextual information into RNNs for next POI recommendation. For example, Liu et al. [13] considered time intervals and geographical distances between consecutive check-ins and integrated them into a vanilla RNN. DSPR explored the user’s preferences and real-time demands simultaneously, both of which are crucial factors in determining user behaviors [11]. To overcome the data sparsity problem, Yu et al. [34] incorporated POI categories and geographical influence into an RNN to reduce the search space. DeepMove was an early attempt to combine the user’s long-term and short-term preferences, which, respectively, represent the user’s behavior periodicity and real-time needs [14]. Pursuing this further, Sun et al. [21] developed a nonlocal network to model long-term preference and a geo-dilated RNN to model short-term preference. Similarly, PLSPL [35], NATR [20] and RTPM [36] learned both long-term and short-term preferences to obtain a comprehensive user representation. However, complex relationships between POI pairs are seldom considered in the above works. GT-HAN [9] was a beneficial attempt that developed three matrices to explicitly capture geographical interactions between POI pairs in the user’s check-in history.

Although RNN-based methods have demonstrated great superiority in next POI recommendation, one major issue is that the geographical distances and interactive correlations between arbitrary pairs of POIs in the entire POI set are not thoroughly explored. In contrast, the proposed model constructs a POI graph to incorporate geographical distances and interactive correlations between POIs for more accurate POI representations. At the same time, it is of great importance to capture the user’s long- and short-term preferences simultaneously. However, existing methods fail to differentiate the importance of POIs in the user’s check-in trajectories and thus cannot capture the user’s dynamic preferences to boost recommendation.

2.3 Attention mechanism in recommendations

The attention mechanism [37], which derives from the study of human vision, has shown great power in recommendation systems. The advantage of attention mechanism lies in its ability to adaptively capture the important parts of input data. For example, Zhou et al. [38] employed attention mechanism in the interest evolving layer to strengthen the effects of relative information during interest evolution, and further captured the latent user interests dynamically. Considering the importance of the latest interacted item that usually plays a decisive role in the next behavior, SR-GNN [39] and CaSe4SR [40] took last item’s representation as query vector to attentively represent the global session preference. To recommend POIs in the target time, Shi et al. [41] assigned visited POIs into corresponding time slots and designed a temporal-level attention mechanism to adaptively give different weights to history time slots. Zheng et al. [42] analyzed the contributions of attributes and interaction behaviors using attention mechanism to better represent users and items. Similarly, Ni et al. [43] developed multi-attention-based convolutional neural networks to refine the features of users and items. Inspired by previous attempts, we will leverage attention mechanism to adaptively select valuable check-in information to improve recommendation quality.

3 Preliminaries

Formally, the set of users is defined as \(U = \left\{ {u_{1} ,u_{2} , \ldots ,u_{\left| U \right|} } \right\}\) and the set of POIs is defined as \(L = \left\{ {l_{1} ,l_{2} , \ldots ,l_{\left| L \right|} } \right\}\), where \(\left| U \right|\) and \(\left| L \right|\) are the total number of users and POIs, respectively. Since POIs are spatial entities with specific longitudes and latitudes, each POI \(l\) could be geocoded as \(\left( {{\text{lon}}_{l} ,{\text{lat}}_{l} } \right)\). For each user \(u \in U\), we split his/her long check-in sequence into multiple trajectories \(S = \left\{ {S_{1} ,S_{2} , \ldots ,S_{n} } \right\}\), according to time intervals between consecutive check-ins. Here \(n\) represents the total number of trajectories. Each trajectory \(S_{h} \in S\) is denoted by \(S_{h} = \left\{ {l_{1}^{h} ,l_{2}^{h} , \ldots ,l_{{\left| {S_{h} } \right|}}^{h} } \right\}\), representing a set of POIs visited by \(u\) in chronological order. Key notations and explanations are listed in Table 1.

Table 1 Notations and explanations

Given history trajectories \(S_{his} = \left\{ {S_{1} ,S_{2} , \ldots ,S_{n - 1} } \right\}\) and current trajectory \(S_{n}\) of target user \(u \in U\), the recommendation task is to generate Top-k POIs that \(u\) will probably visit next.

4 The proposed method

In this part, we detail the proposed model MPGI. The overall framework is illustrated in Fig. 2. MPGI consists of three parts: the POI correlation modeling layer, the long- and short-term preferences modeling layer and the prediction layer. Specifically, the POI correlation modeling layer captures the geographical distances and interactive correlations between all POI pairs for learning POI representations. Subsequently, position-aware attention units and an attention network are developed to select the most valuable information for better inferring the user’s long- and short-term preferences. Lastly, a comprehensive user representation is obtained to calculate the probabilities of POIs being visited next by the target user. The Top-k POIs are selected as the recommendation result.

Fig. 2 The framework of the proposed model MPGI

4.1 POI correlation modeling layer

Most studies only capture the transition time and transition distance between consecutive check-ins, neglecting the complex correlations between POIs. Consequently, the features of highly correlated POIs can hardly be fused into the target POI, which heavily constrains the expressiveness of POI embeddings. To this end, we take the complex correlations between POIs into consideration. Here, the “complex” property stems from two aspects. First, the geographical distances and interactive correlations are considered simultaneously. Second, these two correlations are thoroughly captured for all POI pairs, not just a subset of them. After capturing such correlations via Node2Vec training, similar POIs have similar representations in specific aspects, laying a foundation for the subsequent user preference learning.

First, the geographical distance matrix \({\mathbf{D}} \in {\mathbb{R}}^{\left| L \right| \times \left| L \right|}\) is defined as follows:

$$ {\mathbf{D}} = \left[ {\begin{array}{*{20}l} {d_{1,1} } & {d_{1,2} } & \cdots & {d_{1,\left| L \right|} } \\ {d_{2,1} } & {d_{2,2} } & \cdots & {d_{2,\left| L \right|} } \\ \vdots & \vdots & \ddots & \vdots \\ {d_{\left| L \right|,1} } & {d_{\left| L \right|,2} } & \cdots & {d_{\left| L \right|,\left| L \right|} } \\ \end{array} } \right] $$
(1)
$$ \begin{aligned} d_{pq}& = {\text{Haversine}}\left( {{\text{lon}}_{{l_{p} }} ,{\text{lat}}_{{l_{p} }} ,{\text{lon}}_{{l_{q} }} ,{\text{lat}}_{{l_{q} }} } \right) \\ & = 2R\arcsin \left( {\sqrt {\sin^{2} \left( {\frac{{{\text{lat}}_{{l_{p} }} - {\text{lat}}_{{l_{q} }} }}{2}} \right) + \cos \left( {{\text{lat}}_{{l_{p} }} } \right)\cos \left( {{\text{lat}}_{{l_{q} }} } \right)\sin^{2} \left( {\frac{{{\text{lon}}_{{l_{p} }} - {\text{lon}}_{{l_{q} }} }}{2}} \right)} } \right) \\ \end{aligned} $$
(2)

where geographical distance \(d_{pq}\) between POI \(l_{p}\) and \(l_{q}\) is calculated by Haversine distance. \(R\) denotes the radius of the earth; \({\text{lon}}_{{l_{p} }}\) and \({\text{lat}}_{{l_{p} }}\) are the longitude and latitude of POI \(l_{p}\); \({\text{lon}}_{{l_{q} }}\) and \({\text{lat}}_{{l_{q} }}\) are the longitude and latitude of POI \(l_{q}\).

The Haversine distance is widely used in existing studies [1, 21, 44], which assume that the earth is a sphere and calculate great-circle distances between POIs along its surface. Compared with other metrics such as the Euclidean distance, the advantage of the Haversine distance is that it takes the spherical shape of the earth into consideration to approximate real path distances between POIs. Although users care more about the distance of the actual travel path between POIs, travel distance data are not available. Therefore, we use the Haversine distance as the calculation method.
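For clarity, the sketch below shows how Eq. (2) can be computed; the function name, the earth radius of 6371 km and the sample coordinates are illustrative choices of ours rather than details specified in this paper.

```python
import math

def haversine(lon_p, lat_p, lon_q, lat_q, radius_km=6371.0):
    """Great-circle distance (km) between two POIs given in decimal degrees (Eq. 2)."""
    # Convert decimal degrees to radians before applying the formula.
    lon_p, lat_p, lon_q, lat_q = map(math.radians, (lon_p, lat_p, lon_q, lat_q))
    a = (math.sin((lat_p - lat_q) / 2) ** 2
         + math.cos(lat_p) * math.cos(lat_q) * math.sin((lon_p - lon_q) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Two nearby points in Manhattan are roughly 0.86 km apart along the great circle.
print(haversine(-73.9857, 40.7484, -73.9772, 40.7527))
```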

Next, the interactive correlation is defined as the collaborative relationship between POIs reflected in the overall user check-in behaviors. For example, if two POIs are frequently visited by the same users, they are supposed to have a high interactive correlation in some aspects. In this way, we consider not only the geographical distances between POIs from the perspective of physical space, but also the interactive correlations observed from all users’ check-in data.

The interactive correlation matrix \({\mathbf{I}} \in {\mathbb{R}}^{\left| L \right| \times \left| L \right|}\) is defined as follows:

$$ {\mathbf{I}} = \left[ {\begin{array}{*{20}c} {i_{1,1} } & {i_{1,2} } & \cdots & {i_{1,\left| L \right|} } \\ {i_{2,1} } & {i_{2,2} } & \cdots & {i_{2,\left| L \right|} } \\ \vdots & \vdots & \ddots & \vdots \\ {i_{\left| L \right|,1} } & {i_{\left| L \right|,2} } & \cdots & {i_{\left| L \right|,\left| L \right|} } \\ \end{array} } \right] $$
(3)
$$ i_{pq} = \frac{{\left| {U_{{l_{p} }} \cap U_{{l_{q} }} } \right|}}{{\left| {U_{{l_{p} }} \cup U_{{l_{q} }} } \right|}} $$
(4)

where the interactive correlation \(i_{pq}\) between POI \(l_{p}\) and \(l_{q}\) is estimated by the Jaccard similarity of the visitor sets \(U_{{l_{p} }}\) and \(U_{{l_{q} }}\).
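As an illustration, the nonzero entries of Eq. (4) can be computed directly from raw check-in records as sketched below; the data layout (an iterable of user–POI pairs) and the helper name are our assumptions.

```python
from collections import defaultdict

def interactive_correlations(checkins):
    """checkins: iterable of (user_id, poi_id) pairs; returns {(p, q): i_pq} for p < q (Eq. 4)."""
    visitors = defaultdict(set)            # poi_id -> set of users who checked in there
    for user, poi in checkins:
        visitors[poi].add(user)
    pois = sorted(visitors)
    corr = {}
    for idx, p in enumerate(pois):
        for q in pois[idx + 1:]:
            common = len(visitors[p] & visitors[q])
            if common:                     # pairs without common visitors have i_pq = 0
                corr[(p, q)] = common / len(visitors[p] | visitors[q])
    return corr
```

Only pairs with at least one common visitor yield a nonzero \(i_{pq}\), so in practice \({\mathbf{I}}\) is stored sparsely rather than as a dense \(\left| L \right| \times \left| L \right|\) matrix.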

We define two hyperparameters: the threshold of geographical distances \({\text{dist}}\) and the threshold of interactive correlations \(\theta\). For any pair of POIs \(l_{p}\) and \(l_{q}\), if \(d_{pq} \le {\text{dist}}\) and \(i_{pq} \ge \theta\) are simultaneously satisfied, \(l_{p}\) and \(l_{q}\) are considered highly correlated POIs, i.e., \(l_{p}\) is part of the correlation context \(L_{q}^{{{\text{Pos}}}}\) of \(l_{q}\) and, at the same time, \(l_{q}\) is part of the correlation context \(L_{p}^{{{\text{Pos}}}}\) of \(l_{p}\). By traversing the geographical distance matrix and the interactive correlation matrix, the correlation context of each POI is obtained. Then, POIs and their corresponding correlation contexts are connected by edges to form a network. After that, we employ Node2Vec to learn the representations of all POIs.

It is worth noting that the higher the interactive correlation value between \(l_{p}\) and \(l_{q}\), the more similarities they share. Therefore, the interactive correlation value \(i_{pq}\) is assigned as the weight of the edge between them. Specifically, when constructing the weighted graph, we have already accounted for geographical distances through the condition \(d_{pq} \le {\text{dist}}\); thus, we only take the interactive correlations as the edge weights for simplicity. More sophisticated methods that explicitly fuse geographical distances and interactive correlations are also possible. After constructing the weighted graph, more information from the most correlated POIs can be fused to accurately characterize the target POI.

We adopt the negative sampling technique, and the embeddings of all POIs are learned by minimizing the following loss function:

$$ {\text{loss}} = - \sum\limits_{{l_{p} \in L}} {\sum\limits_{{l_{q} \in L_{p}^{{{\text{Pos}}}} }} {\left( {\log \sigma \left( {{\mathbf{e}}_{p} {\mathbf{e}}_{q} } \right) + \frac{1}{r}\sum\limits_{{l_{q}^{{{\text{neg}}}} \in L_{p}^{{{\text{Neg}}}} }} {\log \sigma \left( { - {\mathbf{e}}_{p} {\mathbf{e}}_{q}^{{{\text{neg}}}} } \right)} } \right)} } $$
(5)

where \(l_{p}\) ranges over all POIs in \(L\), \(l_{q}\) is a POI in the correlation context \(L_{p}^{{{\text{Pos}}}}\) of \(l_{p}\), and \(L_{p}^{{{\text{Neg}}}}\) is a set of \(r\) negative POI samples for \(l_{p}\). \({\mathbf{e}}_{p}\), \({\mathbf{e}}_{q}\) and \({\mathbf{e}}_{q}^{{{\text{neg}}}}\) are the POI representations trained by the Node2Vec model, and \(\sigma \left( \cdot \right)\) is the sigmoid function.

The POI correlation modeling algorithm is depicted in Algorithm 1.

Algorithm 1 POI correlation modeling
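The sketch below outlines Algorithm 1 under our own assumptions: the pairwise distances and correlations are precomputed as dictionaries keyed by POI pairs (p, q) with p < q, the graph is held in networkx, and the embeddings are trained with the open-source node2vec package. The hyperparameter values mirror Sect. 5.1.4, but the exact libraries and calls are not prescribed by the paper.

```python
import networkx as nx
from node2vec import Node2Vec  # assumed third-party implementation of Node2Vec

def build_poi_graph(pois, distance, correlation, dist_thresh=5.0, theta=0.05):
    """Connect l_p and l_q when d_pq <= dist and i_pq >= theta; edge weight is i_pq."""
    graph = nx.Graph()
    graph.add_nodes_from(pois)
    for (p, q), i_pq in correlation.items():
        if i_pq >= theta and distance[(p, q)] <= dist_thresh:
            graph.add_edge(p, q, weight=i_pq)
    return graph

def train_poi_embeddings(graph, dim=128):
    # Walk length 15, 20 walks per node, p = 0.2, q = 2, window 2, 5 negatives (Sect. 5.1.4).
    n2v = Node2Vec(graph, dimensions=dim, walk_length=15, num_walks=20, p=0.2, q=2)
    model = n2v.fit(window=2, min_count=1, epochs=30, negative=5)
    return {poi: model.wv[str(poi)] for poi in graph.nodes}
```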

4.2 Long- and short-term preferences modeling layer

So far, we have obtained the POI embedding matrix \({\mathbf{E}} \in {\mathbb{R}}^{\left| L \right| \times d}\), which is composed of the embeddings of all POIs learned via Node2Vec training. Next, taking the POI embedding matrix as input, we retrieve the embedding representations of check-in records to capture the user’s preferences.

Previous studies have shown that it is crucial to model both the long- and short-term preferences of users in the task of next POI recommendation. Nevertheless, they fail to dynamically select the most valuable check-ins to obtain the user’s temporal preferences. To this end, for the history trajectories \(S_{his} = \left\{ {S_{1} ,S_{2} , \ldots ,S_{n - 1} } \right\}\) and the current trajectory \(S_{n}\), we design position-aware attention units to capture the preference representation of each trajectory. Moreover, an attention network is developed to adaptively select the important trajectory representations that conform to the user’s short-term demand for long-term preference modeling.

4.2.1 Preference representation of each trajectory

In this paper, the user’s long check-in sequence is divided into multiple trajectories based on a threshold on the time intervals. This means that the user’s behaviors and preferences within each trajectory are homogeneous but may change abruptly across trajectories. Therefore, trajectories are relatively independent of each other, and we model each trajectory separately. Note that the existing literature has demonstrated that extracting preferences from multiple sessions or trajectories models the user’s sequential behaviors more effectively [45, 46]. Next, we employ the position-aware attention unit to extract user preferences from each trajectory.

Taking the current trajectory \(S_{n}\) as an example, the impact of check-ins in the trajectory increases over time. In other words, POIs at different positions contribute differently to the trajectory representation. Therefore, we define a position embedding matrix \({\mathbf{P}} = \left[ {{\mathbf{p}}_{1} ,{\mathbf{p}}_{2} , \ldots ,{\mathbf{p}}_{m} } \right] \in {\mathbb{R}}^{m \times d}\) to indicate the specific position of each POI in the trajectory, where \(m\) is the maximum trajectory length in the dataset and \(d\) is the dimensionality of the position vectors. Inspired by GCE-GNN [47], the last POI can be regarded as the user’s current preference; meanwhile, the distance between a POI and the last POI in the sequence carries more meaningful information than its absolute position. For example, given two trajectories \(S^{\prime} = \left\{ {l_{1} ,l_{2} ,l_{4} } \right\}\) and \(S^{\prime\prime} = \left\{ {l_{1} ,l_{2} ,l_{3} ,l_{7} ,l_{8} } \right\}\), the distance between \(l_{2}\) and the last check-in is smaller in \(S^{\prime}\) than in \(S^{\prime\prime}\), which means that \(l_{2}\) is more representative of the user’s current preference in \(S^{\prime}\). Therefore, POI \(l_{2}\) is more important for recommendation in \(S^{\prime}\) than in \(S^{\prime\prime}\). Given that, reversed position embeddings are assigned to the check-ins in the trajectory. In addition, we map the representation of the last POI to a query vector and map the representation of each POI in the trajectory to the key and value, which is an effective way to remove noisy data. Similarly, the preference representations of history trajectories are also captured by position-aware attention units. Following the design of LSTPM [21], the attention parameters are shared across all history trajectories but not with those of the current trajectory.

The attention weight of the p-th POI \(l_{p}^{h}\) in trajectory \(S_{h}\) is defined as follows:

$$ \alpha_{p}^{h} = {\mathbf{q}}^{T} \sigma \left( {{\mathbf{W}}_{1} {\mathbf{e}}_{p}^{h} + {\mathbf{W}}_{2} {\mathbf{e}}_{{\left| {S_{h} } \right|}}^{h} + {\mathbf{W}}_{3} {\mathbf{p}}_{{\left| {S_{h} } \right| - p + 1}} + {\mathbf{b}}_{1} } \right) $$
(6)

where \({\mathbf{e}}_{p}^{h}\) and \({\mathbf{e}}_{{\left| {S_{h} } \right|}}^{h}\) are, respectively, representations of \(l_{p}^{h}\) and the last POI \(l_{{\left| {S_{h} } \right|}}^{h}\), \({\mathbf{p}}_{{\left| {S_{h} } \right| - p + 1}}\) is the reversed position embedding for \(l_{p}^{h}\), \(\sigma \left( \cdot \right)\) is the sigmoid activation function. \({\mathbf{W}}_{1} ,{\mathbf{W}}_{2} ,{\mathbf{W}}_{3} \in {\mathbb{R}}^{d \times d}\) and \({\mathbf{q}},{\mathbf{b}}_{1} \in {\mathbb{R}}^{d}\) are trainable parameters.

The importance weight \(\alpha_{p}^{h}\) of POI \(l_{p}^{h}\) is then normalized by the softmax function:

$$ \beta_{p}^{h} = \frac{{\exp \left( {\alpha_{p}^{h} } \right)}}{{\sum\nolimits_{k = 1}^{{\left| {S_{h} } \right|}} {\exp \left( {\alpha_{k}^{h} } \right)} }} $$
(7)

Finally, the preference representation of \(S_{h}\) is obtained from the weighted sum of representations of all POIs in the trajectory:

$$ {\mathbf{s}}_{h} = \sum\limits_{p = 1}^{{\left| {S_{h} } \right|}} {\beta_{p}^{h} } {\mathbf{e}}_{p}^{h} $$
(8)

4.2.2 Long- and short-term preferences representation

The user’s short-term preference is extracted from the current trajectory \(S_{n}\), i.e., \({\mathbf{s}}_{{{\text{Short}}}} = {\mathbf{s}}_{n}\). Accordingly, the user’s long-term preference is extracted from the set of all history trajectories \(S_{{{\text{his}}}} = \left\{ {S_{1} ,S_{2} , \ldots ,S_{n - 1} } \right\}\). Following Sect. 4.2.1, the preference representations of all history trajectories have been obtained, i.e., \({\mathbf{s}}_{{{\text{his}}}} = \left\{ {{\mathbf{s}}_{1} ,{\mathbf{s}}_{2} , \ldots ,{\mathbf{s}}_{n - 1} } \right\}\). We suppose that trajectories that are highly relevant to the current trajectory play important roles in deciding the user’s long-term preference [48]. Assigning the same importance to all history trajectories would inevitably introduce noise. Therefore, an attention network is introduced to adaptively give relevant history trajectories larger weights.

Specifically, we calculate the affinity scores between history trajectories and the current trajectory:

$$ \gamma_{h} = \frac{{\exp \left( {{\mathbf{s}}_{{{\text{Short}}}}^{T} {\mathbf{s}}_{h} } \right)}}{{\sum\nolimits_{j = 1}^{n - 1} {\exp \left( {{\mathbf{s}}_{{{\text{Short}}}}^{T} {\mathbf{s}}_{j} } \right)} }} $$
(9)

We transform \({\mathbf{s}}_{h}\) through a feedforward neural network to get the final trajectory representation. Then, we multiply the final representation by the trajectory’s corresponding weight and sum all multiplication results to obtain user’s long-term preference:

$$ {\mathbf{s}}_{{{\text{Long}}}} = \sum\limits_{h = 1}^{n - 1} {\gamma_{h} \left( {{\mathbf{W}}_{4} {\mathbf{s}}_{h} + {\mathbf{b}}_{2} } \right)} $$
(10)

where \({\mathbf{W}}_{4} \in {\mathbb{R}}^{d \times d}\) and \({\mathbf{b}}_{2} \in {\mathbb{R}}^{d}\) are trainable parameters.
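Correspondingly, Eqs. (9) and (10) can be sketched as follows, assuming the history trajectory representations are stacked into one tensor; the module name is again our own.

```python
import torch
import torch.nn as nn

class LongTermAggregator(nn.Module):
    """Weight history trajectories by their affinity to the current one (Eqs. 9-10)."""
    def __init__(self, d):
        super().__init__()
        self.ffn = nn.Linear(d, d)                        # W_4 and b_2

    def forward(self, s_his, s_short):                    # s_his: (n-1, d), s_short: (d,)
        gamma = torch.softmax(s_his @ s_short, dim=0)     # Eq. (9): affinity scores
        return (gamma.unsqueeze(-1) * self.ffn(s_his)).sum(dim=0)   # Eq. (10): s_Long
```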

4.3 Prediction layer

In this part, we obtain the user’s preference representation by concatenating the short-term preference \({\mathbf{s}}_{{{\text{Short}}}}\) and the long-term preference \({\mathbf{s}}_{{{\text{Long}}}}\). Then, we calculate the user’s visiting probability distribution \({\hat{\mathbf{y}}}\) over all \(\left| L \right|\) POIs:

$$ {\hat{\mathbf{y}}} = {\text{softmax}}\left( {{\mathbf{W}}_{u} \left[ {{\mathbf{s}}_{{{\text{Long}}}} ;{\mathbf{s}}_{{{\text{Short}}}} } \right] + {\mathbf{b}}_{u} } \right) $$
(11)

where \(\left[ { \cdot ; \cdot } \right]\) is the concatenation operation, and \({\mathbf{W}}_{u} \in {\mathbb{R}}^{\left| L \right| \times 2d}\) and \({\mathbf{b}}_{u} \in {\mathbb{R}}^{\left| L \right|}\) are trainable parameters. Let \({\hat{\mathbf{y}}}^{{{\text{true}}}} \in {\hat{\mathbf{y}}}\) denote the visiting probability of the ground-truth POI; the negative log-likelihood is then used as the training loss:

$$ L = - \sum\limits_{i = 1}^{N} {\log \left( {{\hat{\mathbf{y}}}_{i}^{{{\text{true}}}} } \right)} $$
(12)

where \(N\) represents the total number of training samples and \({\hat{\mathbf{y}}}_{i}^{{{\text{true}}}}\) is the probability of the ground-truth POI generated by our model for the i-th training sample.

To recommend the next POI for user \(u\), we take all of the user’s check-in trajectories as input and generate the visiting probability distribution \({\hat{\mathbf{y}}}\). The POIs with the k largest probability values in \({\hat{\mathbf{y}}}\) are selected as the Top-k recommendation list.
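A minimal sketch of the prediction layer and the training objective (Eqs. 11 and 12), under the same assumptions as the previous snippets, reads:

```python
import torch
import torch.nn as nn

class PredictionLayer(nn.Module):
    """Score all |L| POIs from the concatenated user representation (Eq. 11)."""
    def __init__(self, d, num_pois):
        super().__init__()
        self.out = nn.Linear(2 * d, num_pois)             # W_u and b_u

    def forward(self, s_long, s_short):
        logits = self.out(torch.cat([s_long, s_short], dim=-1))
        return torch.log_softmax(logits, dim=-1)          # log-probabilities over POIs

# Training minimizes the negative log-likelihood of the ground-truth POI (Eq. 12),
# e.g., nn.NLLLoss() applied to batched log-probabilities; at inference, the Top-k
# list is torch.topk(log_probs, k).indices.
```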

4.4 Model size analysis

Firstly, in the POI correlation modeling layer, the POI embedding matrix has parameters of size \(\left| L \right| \times d\). Secondly, in the long- and short-term preferences modeling layer, the position embedding matrix has parameters of size \(m \times d\). Within each trajectory, the weight matrices \({\mathbf{W}}_{1}\), \({\mathbf{W}}_{2}\) and \({\mathbf{W}}_{3}\) of the attention mechanism have the same size \(d \times d\), and the parameters \({\mathbf{q}}\) and \({\mathbf{b}}_{1}\) have size \(d\). Since MPGI keeps two sets of these parameters (one for the history trajectories and one for the current trajectory), the total number of parameters of the position-aware attention units is \(md + 2\left( {3d^{2} + 2d} \right)\). The attention network adds \(d^{2} + d\) parameters. Finally, in the prediction layer, \({\mathbf{W}}_{u}\) of size \(\left| L \right| \times 2d\) and \({\mathbf{b}}_{u}\) of size \(\left| L \right|\) are required for classification. Therefore, the total model size is approximately \(\left( {3\left| L \right| + m + 5} \right)d + 7d^{2} + \left| L \right|\). Considering that \(d\) is a small number generally no greater than 256 and \(m\) is usually less than 10, the proposed model is fairly light. As verified in the following experiments, the 160 thousand check-ins in NYC are enough to train the model to fairly good performance.
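The parameter count derived above can be checked numerically; the POI-set size used below is a hypothetical value chosen for illustration, not a statistic of the NYC or TKY datasets.

```python
def mpgi_param_count(num_pois, d=128, m=10):
    """Total parameter count from Sect. 4.4: (3|L| + m + 5)d + 7d^2 + |L|."""
    return (3 * num_pois + m + 5) * d + 7 * d ** 2 + num_pois

# A hypothetical POI set of 5,000 locations yields about 2.04 million parameters.
print(mpgi_param_count(5000))   # 2041608
```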

5 Experiments

We conduct extensive experiments on two real-world datasets, to demonstrate the effectiveness of the proposed MPGI model. In this section, we first introduce the experimental setup, then report the experimental results and corresponding analysis.

5.1 Experimental setup

5.1.1 Datasets

We evaluate our model on two real-world datasets: the Foursquare check-ins in New York and Tokyo [49] from April 12, 2012, to February 16, 2013. They are, respectively, denoted as NYC and TKY in the following contents. Each original check-in record in the two datasets consists of user ID, POI ID, POI category ID, POI category name, longitude and latitude of the POI and check-in timestamp.

For both datasets, we remove inactive users with fewer than 10 check-ins and unpopular POIs visited fewer than 10 times. Next, for each user, we split the user’s long check-in sequence into multiple trajectories according to the time intervals between consecutive check-ins. In detail, if the time interval exceeds 48 h, the two consecutive POIs are divided into different trajectories. Then, we eliminate trajectories with fewer than 3 POIs and remove users with fewer than 5 trajectories. The statistics of both datasets after pre-processing are summarized in Table 2.
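A minimal sketch of this splitting and filtering step, assuming each user’s check-ins are already sorted by time and stored as (POI, timestamp) pairs, could look as follows; the function and argument names are ours.

```python
from datetime import timedelta

def split_trajectories(checkins, gap_hours=48, min_len=3, min_trajs=5):
    """checkins: {user: [(poi, timestamp), ...] sorted by time} -> {user: [list of POIs, ...]}."""
    result = {}
    for user, seq in checkins.items():
        if not seq:
            continue
        trajs, current = [], [seq[0]]
        for prev, curr in zip(seq, seq[1:]):
            if curr[1] - prev[1] > timedelta(hours=gap_hours):
                trajs.append(current)        # close the trajectory at a long gap
                current = []
            current.append(curr)
        trajs.append(current)
        trajs = [[poi for poi, _ in t] for t in trajs if len(t) >= min_len]
        if len(trajs) >= min_trajs:          # drop users with too few trajectories
            result[user] = trajs
    return result
```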

Table 2 Statistics of datasets after pre-processing

In Table 2, the sparsity of the datasets is calculated by:

$$ {\text{sparsity}} = 1 - \frac{{\# {\text{check-ins}}}}{{\# {\text{users}} \cdot \# {\text{POIs}}}} $$
(13)

where #check-ins, #users and #POIs represent the total number of check-ins, users and POIs, respectively.

Similar to most studies, the first 80% of each user’s trajectories are used as the training set to train the model, and the last 20% are used as the test set to evaluate the model’s performance.

5.1.2 Baselines

We compare MPGI with the following state-of-the-art next POI recommendation models.

LSTM [50]: LSTM is a well-known variant of RNN. It alleviates the vanishing and exploding gradient problems and is thus widely applied in sequential models. Here, we use LSTM as a baseline model for next POI recommendation.

TMCA [19]: TMCA embeds spatiotemporal transitions as well as POI categories, and combines these embeddings with POI representations for next POI recommendation. Here, we remove the POI category because none of the other models incorporates it. Notably, TMCA leverages only a few recent check-ins to capture the user’s short-term preference for recommendation.

DeepMove [14]: DeepMove models both of user’s long-term and short-term preferences for next POI recommendation. In particular, it learns short-term preference using RNN and aggregates long-term preference from history activities using the attention mechanism.

STAN [51]: STAN is one of the state-of-the-art models. To capture complex relationships between non-consecutive check-ins in the entire check-in sequence, STAN fuses spatiotemporal transition matrices into self-attention and further refines representations of visited POIs.

LSTPM [21]: LSTPM is currently one of the state-of-the-art methods, learning both short-term and long-term preferences. Specifically, LSTPM takes the spatiotemporal relationships between history check-ins and the latest check-in into consideration and then fuses these relationships together.

5.1.3 Evaluation criterion

To evaluate the recommendation performance of different models, we adopt two widely used metrics: Recall@k and Normalized Discounted Cumulative Gain (NDCG@k). Here, k represents the number of recommended POIs. The two metrics measure the performance of recommendation models from different perspectives. Recall@k indicates whether the truly visited POI appears in the recommendation list. NDCG@k is a position-based ranking metric whose value is larger when the truly visited POI appears at a higher position of the recommendation list. In this paper, we report Recall@k and NDCG@k with \(k \in \left\{ {1,5,10,20} \right\}\).
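Since each test instance has exactly one ground-truth POI, both metrics reduce to simple forms, as the sketch below illustrates; the function name and data layout are our assumptions.

```python
import math

def recall_and_ndcg_at_k(ranked_lists, ground_truths, k):
    """ranked_lists[i]: POIs sorted by predicted score; ground_truths[i]: the next visited POI."""
    recall, ndcg = 0.0, 0.0
    for ranked, truth in zip(ranked_lists, ground_truths):
        top_k = ranked[:k]
        if truth in top_k:
            recall += 1.0
            rank = top_k.index(truth) + 1        # 1-indexed position of the hit
            ndcg += 1.0 / math.log2(rank + 1)    # single relevant item, so IDCG = 1
    n = len(ground_truths)
    return recall / n, ndcg / n
```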

5.1.4 Parameter settings

The optimal parameters of all models are obtained by experimental analysis or by adopting the default settings in the original papers. Our MPGI model is implemented in Python with the PyTorch deep learning library. On both datasets, the embedding size of POIs d is searched in the range {32, 64, 128, 256}, and the optimal value is d = 128. The threshold of interactive correlations \(\theta\) is searched in {0.01, 0.03, 0.05, 0.07, 0.09, 0.11}, and the best setting on both datasets is \(\theta\) = 0.05. The optimal threshold of geographical distances on both datasets is dist = 5.

The best settings of the hyperparameters in the Node2Vec model are as follows: the walk length l and the number of walks r are set to l = 15 and r = 20; the return parameter p and in–out parameter q are set to p = 0.2 and q = 2; the context size k, the number of iterations iter and the number of negative samples num are set to k = 2, iter = 30 and num = 5, respectively. The learning rate lr of our MPGI model is initialized to 0.001 and multiplied by 0.1 every 10 epochs. A detailed analysis of the representative parameter settings is given in Sect. 5.2.3.

5.2 Results and discussions

5.2.1 Methods for comparison

The results of different models for next POI recommendation on both of the NYC and TKY datasets are illustrated in Table 3 and Fig. 3.

Table 3 Performance comparison on NYC and TKY datasets
Fig. 3 Performance comparisons on NYC and TKY datasets

Firstly, the proposed MPGI model significantly outperforms all baselines in terms of Recall and NDCG on both datasets. On the NYC dataset, MPGI achieves the best performance on all metrics except Recall@20. Compared with LSTPM, which performs best among the baseline models, MPGI improves Recall@1, Recall@5 and Recall@10 by 6.3%, 4.3% and 4.3%, respectively, and improves NDCG@1, NDCG@5, NDCG@10 and NDCG@20 by 6.3%, 5.1%, 5.0% and 5.9%, respectively. On the TKY dataset, MPGI consistently outperforms all baselines on all metrics, with an average improvement of 6.8% over LSTPM. Moreover, although STAN shows an advantage in terms of Recall@20 on the NYC dataset, it performs poorly on the other metrics. In addition, the metrics we mainly focus on for recommendation are Recall@k and NDCG@k with \(k \in \left\{ {1,5,10} \right\}\). Therefore, the experimental results consistently demonstrate the superiority of the proposed MPGI model.

Secondly, we observe that TMCA consistently outperforms LSTM, which implies that spatiotemporal transitions are important for next POI recommendation. Furthermore, DeepMove and LSTPM, both of which simultaneously capture long-term and short-term preferences, achieve higher prediction accuracy than TMCA, which only captures the user’s short-term preference. This highlights the necessity of modeling the user’s long-term preference from history check-ins. Moreover, although both LSTPM and DeepMove try to simultaneously capture long- and short-term preferences, LSTPM outperforms DeepMove. The reason is that DeepMove fails to make full use of important spatiotemporal context. In contrast, LSTPM not only captures the relationships between history check-ins and the user’s current status, but also applies a geo-dilated RNN to capture the geographical connections between POIs in the current trajectory.

Besides, the performance improvement of MPGI over LSTPM is attributed to the POI correlation modeling layer. LSTPM captures spatial impacts by incorporating geographical distances into the attention weights of history check-ins. In contrast, MPGI captures interactive correlations in addition to geographical distances to thoroughly model the complex correlations between POIs, and thus learns higher-quality POI embeddings and achieves better recommendation performance.

5.2.2 Ablation analysis

To verify the effects of the key components in MPGI, we implement six degraded variants of MPGI, each eliminating one component, in the following ablation study. We divide these ablation versions into two groups for discussion.

(1) Ablation analysis of three components of MPGI

In this part, the following variants of the proposed MPGI are constructed to validate the effects of both the long- and short-term preferences of users as well as the POI correlation modeling:

MPGI-WL: the first degraded version by removing the long-term preference modeling part.

MPGI-WS: the second version by removing the short-term preference modeling part.

MPGI-WC: the third version by removing the POI correlation modeling layer. Instead, POI embeddings are randomly initialized similar to most studies.

The experimental results are shown in Fig. 4. We observe that all variants perform worse than the complete MPGI model on both datasets. Among the three ablation versions, MPGI-WL performs worst on both datasets. For instance, on the NYC dataset, MPGI outperforms MPGI-WL by 27.6% and 26.7% on Recall@5 and NDCG@5; on the TKY dataset, MPGI outperforms MPGI-WL by 40.6% and 45.2% on Recall@5 and NDCG@5. This observation indicates that the user’s long-term preference plays a significant role in recommendation. Therefore, it is important to select the valuable information in the user’s long-term check-in trajectories for user preference modeling. Moreover, MPGI-WS also performs poorly on both datasets, indicating that capturing the user’s short-term preference is a necessity in next POI recommendation; in fact, many models such as TMCA only capture the user’s short-term preference and still achieve decent performance. Furthermore, the performance of MPGI-WC also decreases significantly, especially on the NYC dataset. Since the POI embeddings in MPGI-WC are randomly initialized, explicit POI correlation modeling is indispensable for obtaining accurate POI representations and further boosting recommendation.

(2) Ablation analysis of three key technologies

Fig. 4 Performance comparison among four versions of MPGI variants

So far, the effects of finer-grained components of MPGI, such as the position-aware attention unit, have not been fully explored. To this end, we further remove three key technologies to verify their effectiveness. The ablation versions are described as follows:

MPGI-WP: MPGI that removes position-aware attention units. Instead, the mean pooling is used to capture the preference representation of each trajectory.

MPGI-WA: MPGI that removes the attention network. Instead, mean pooling is applied to all history trajectories to get user long-term preference.

MPGI-WW: In the POI correlation modeling layer, we construct an unweighted graph, i.e., the edge weights are removed.

As shown in Table 4, MPGI achieves the best performance, demonstrating that all the key components are effective in boosting recommendation. On the NYC dataset, the attention network contributes most to the performance improvement. This verifies that it is inappropriate to assign the same weights to different history trajectories; an attention network is needed to adaptively aggregate the representations of history trajectories for long-term preference modeling. On the TKY dataset, however, MPGI-WP suffers the largest performance drop. This clearly demonstrates that position-aware attention units are able to adaptively select valuable information and avoid noise for effective trajectory representations. Lastly, MPGI-WW also performs worse than MPGI. By assigning different weights to edges, important signals from more correlated POIs are incorporated into the central POI to refine its representation.

Table 4 Validation of three key technologies in MPGI

5.2.3 Sensitivity analysis of parameters

Now we take Recall@5 and NDCG@5 as examples to investigate how representative hyperparameters affect the performance of MPGI, including the embedding size of POIs d, the threshold of interactive correlations \(\theta\), and the threshold of geographical distances dist. In the following experiments, one hyperparameter is varied while the others are kept at the optimal settings introduced in Sect. 5.1.4.

(1) Effect of embedding size d

We evaluate how the embedding size of POIs d impacts recommendation performance by varying d in {32, 64, 128, 256}. As shown in Fig. 5, on the NYC dataset, the performance of MPGI first climbs, yields the best results at d = 128, and then drops significantly with a further increase to d = 256. On the TKY dataset, the performance first grows dramatically and then improves only slightly as d increases to 256. The results demonstrate that a larger d is able to encode more valuable information and sufficiently capture the characteristics of POIs. However, oversized embeddings are too complex and may over-represent the POIs. In this work, we choose the embedding size d = 128 for both datasets.

(2) Effect of the threshold of interactive correlations \(\theta\)

Fig. 5 Parameter sensitivity of d on Recall@5 and NDCG@5

In the POI correlation modeling layer, the threshold of interactive correlations \(\theta\) plays an important role in selecting the correlation context of the target POI. Therefore, we vary \(\theta\) in the range {0.01, 0.03, 0.05, 0.07, 0.09, 0.11} to study its impact on recommendation performance. As shown in Fig. 6, on both datasets, the performance improves as \(\theta\) increases from 0.01 to 0.05, since a larger \(\theta\) filters out weakly correlated POIs and keeps only the useful information from highly correlated POIs to enrich the representation of the target POI. The optimal performance of MPGI is obtained at \(\theta\) = 0.05 on both datasets. However, the performance drops with a further increase in \(\theta\), as an overly strict threshold excludes genuinely correlated POIs, impoverishing the correlation context and misleading the inference of POI features.

(3) Effect of the threshold of geographical distances dist

Fig. 6 Parameter sensitivity of \(\theta\) on Recall@5 and NDCG@5

The threshold of geographical distances dist is another factor affecting the selection of highly correlated POIs. We examine its influence by varying dist in {1, 5, 9, 13, 17}. From the results shown in Fig. 7, both Recall@5 and NDCG@5 increase with larger dist, and then drop significantly on NYC and slightly on TKY after dist exceeds the optimal setting dist = 5. Similar to \(\theta\), the value of dist should be set carefully to facilitate the effective construction of POI correlation contexts.

Fig. 7 Parameter sensitivity of dist on Recall@5 and NDCG@5

5.2.4 Visualization of POI embeddings

Next, we examine whether the POI correlation modeling layer is able to learn effective POI representations. Taking the NYC dataset as an example, we randomly select the visited POIs of 20 users. Note that each selected POI is visited by exactly one of the 20 users; therefore, each POI is labeled with the corresponding user’s ID. For these selected POIs, we use the t-SNE method [52] to visualize the embeddings learned by four models, namely LSTPM, MPGI-WC, MPGI-Correlation and MPGI, where MPGI-Correlation denotes our degraded model that only contains the POI correlation modeling layer. Specifically, we map the POI embeddings to two-dimensional vectors and visualize them as points with different colors. The visualization results are presented in Fig. 8.

Fig. 8 Visualization of POI embeddings trained by different models
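For reference, such a projection can be produced with scikit-learn’s t-SNE as in the sketch below; the plotting details are illustrative choices of ours.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_poi_embeddings(embeddings, user_labels):
    """embeddings: (num_pois, d) array; user_labels: ID of the visiting user, used as color."""
    points = TSNE(n_components=2, random_state=42).fit_transform(embeddings)
    plt.scatter(points[:, 0], points[:, 1], c=user_labels, cmap="tab20", s=8)
    plt.title("t-SNE projection of POI embeddings")
    plt.show()
```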

As shown in Fig. 8a, for LSTPM, POIs visited by the same users are scattered randomly, which indicates that LSTPM fails to explicitly capture the complex correlations between POIs and thus achieves sub-optimal performance. Similarly, for MPGI-WC, the distribution of POIs in Fig. 8b is irregular.

In contrast, Fig. 8c verifies the effectiveness of POI correlation modeling: POIs visited by the same users are close to each other and tend to form multiple clusters. This clustering verifies that the POI correlation modeling layer of the proposed model is able to capture the correlations between POIs visited by the same users, which is reasonable because users tend to visit POIs with short geographical distances and high interactive correlations in real scenarios. Furthermore, the POIs in Fig. 8d also exhibit discernible clustering, although the clustering pattern differs from that in Fig. 8c. This is because the POI embeddings are further adjusted during training. In fact, the POI correlation modeling layer only captures the general features of POIs, which are independent of the specific next POI recommendation task. Only in the subsequent training process can the proposed model capture the user’s temporal preferences and characterize POIs more precisely.

6 Conclusions

In this paper, we propose a novel next POI recommendation framework named MPGI to capture the complex correlations between POIs and further model users’ dynamic preferences. Specifically, we explore the complex correlations, i.e., the geographical distances and interactive correlations between all POI pairs, to obtain the correlation context of each POI. Then, we construct a weighted graph and apply Node2Vec to learn high-quality POI representations, from which the representations of the POIs in a user’s trajectories are retrieved. Furthermore, position-aware attention units and an attention network are designed to adaptively select the most valuable information for modeling the user’s long- and short-term preferences. Experimental results on two real-world datasets demonstrate that MPGI consistently outperforms all the baselines. Furthermore, six ablation experiments and a visualization study demonstrate the effectiveness of the key components in our model.

For future work, we will enhance MPGI from the following aspects. First, we will employ graph neural networks to implement our idea in an end-to-end framework. Second, we will design more effective edge weights for the constructed weighted graph. Finally, exploring textual descriptions of POIs and users’ comments is another promising direction.