1 Introduction

With the rapid development of mobile internet technology and the widespread use of GPS-enabled devices, Location-Based Social Networks (LBSNs) such as Foursquare and Yelp have become ubiquitous and popular platforms where users leave their footprints and share their experiences. This results in large amounts of user-location interaction data covering various kinds of Points-of-Interest (POIs), such as restaurants, museums, shopping malls, parks, and many others. This user-generated content is usually associated with geo-tags. Analyzing such rich data can benefit many downstream applications, for example, building personalized POI recommender systems. POI recommendation has spurred significant research interest in both industry and academia [1, 13, 31, 46, 54], as it can provide various value-added services, for example, recommending vacation rentals to tourists (e.g., Airbnb), advertising scenic areas (e.g., TripAdvisor), and promoting experiences (e.g., Mafengwo).

One of the objectives of POI recommendation is to discover yet-unvisited places of potential interest to users. Unlike typical recommendation tasks (e.g., movie, music and e-commerce item recommendation), POI recommendation exhibits several special characteristics, such as strong spatial-temporal dependence among POIs and geographical constraints on users. For example, recommending restaurants to users should take into account the geographical locations of both users and restaurants. Prior studies have shown that there is a spatial clustering phenomenon in user check-ins: people prefer to visit POIs close to their home locations, and the POIs visited by an individual tend to cluster together. In addition, social relationships and visiting time also play important roles in personalized POI recommendation. People prefer to visit places that their friends have visited or recommended, and they are more likely to visit recreation parks on weekends and tech or financial firms on weekdays. How to incorporate these features into POI recommender systems to better understand the relations among users and POIs has become a trending research topic. POI recommender systems also face several challenges shared with traditional recommender systems: (i) data sparsity, where a user usually visits a very small number of locations among millions of POIs in an LBSN (for example, the density of the user-POI check-in count matrix is about 0.1% [31]); and (ii) cold start, where some users have no visiting history or some (new) POIs have never been visited by any user.

Researchers have proposed various methods to improve the POI recommendation performance by mainly focusing on exploiting different implicit context features embedded in user check-ins. For example, Collaborative Filtering (CF)-like techniques such as Matrix Factorization (MF) [16, 19] are used to predict user rating on POIs through explicit/implicit feedback while taking into consideration various constraints such as social influence [53, 61], temporal features [56, 57], sequential dependence [33, 63], and geographical constraints [15, 22, 26, 28]. All of these methods follow a similar procedure where they first extract latent features underlying interactions between users and POIs, and then predict users’ preference based on the inner product of latent factors. However, they may not fully discover the complicated user-POI interaction from the data, since the inner product combines latent features linearly and limits the expressiveness of the methods [31, 34].

Recent advances in deep learning have inspired efforts on applying various neural networks for discovering non-linear and non-trivial relationships between users and POIs. For example, word2vec [38] has been used for transforming users and POIs to vectorized representations [52], while recurrent neural networks (RNN) are used for learning sequential behavior of user check-ins [10, 37, 66]. To distinguish users’ preference over different POIs, an attention-based model has been proposed in [34] where the denoising autoencoder is employed to capture the geographical influence. Another recent work [71] borrows the idea of adversarial learning from deep generative adversarial neural networks [12] and attempts to improve the POI recommendation performance by exploiting the social influence and geographical information in a reinforcement learning manner. Although existing efforts have shown promising performance improvement and are able to handle non-linear interactions between users and POIs, most of them just integrate the auxiliary information (e.g., POI context, social influence and spatial-temporal characteristics) by transforming from pre-existing features with historical data, and thus fail to encode the high-order structure information and capture users’ potential long-distance interest. All the while, the data sparsity and cold-start problems are still major challenges for existing solutions.

In this work, we propose Hybrid Graph convolutional networks with Multi-head Attention for POI recommendation (called HGMAP), a general and flexible framework that captures user-POI interactions effectively by mining the social influence and geographical attributes with graph-based neural networks. Inspired by the success of graph neural networks (GNNs) [6, 18, 51], we use two independent Graph Convolutional Networks (GCNs) to explicitly incorporate the spatial and social influence aspects of the auxiliary information into our POI recommender system. Unlike the existing GNN-based recommendation models [9, 39, 43, 47, 48, 49, 55], which directly employ convolutional layers to exploit interactions between users and items, we instead use the two GCNs to learn the geographical relationship and the social influence, respectively. Specifically, we build a POI graph based on the pairwise distances of POIs with a Radial Basis Function (RBF) kernel, and learn the geographical relationship and the implicit relations among POIs using a graph neural network. We also model the social network of users and aggregate the feature information of connected users from the local neighborhood using another GCN. By recursively propagating the embeddings of geographical and social information, HGMAP can conceptually capture the high-order connectivity in an efficient, explicit, and end-to-end manner. In addition, we leverage a multi-head attentive encoder to capture non-linear user-POI interactions while learning the importance of each POI during information retrieval for personalized recommendation. The proposed model is able to learn good user and POI representations and, due to its inductive learning capability, to make recommendations to users efficiently.

Overall, the main contributions of this paper are four-fold:

  • First, we present a novel POI recommender system using hybrid graph neural networks to learn both users’ and POIs’ latent representations, which effectively encode: (1) the social influence and geographical constraints – the most important features in POI recommendation [31]; and (2) the underlying relationship between users and POIs. Moreover, the cold-start problem at both user and POI sides can be largely alleviated by aggregating features from two heterogeneous graphs.

  • Second, we provide a new perspective of incorporating geographical locations into POI recommendation by constructing a POI adjacency graph and learning complex POI relations beyond the Euclidean distance via layered graph convolutions. By doing so, our model can sample neighboring POIs from the graph to augment data for each user while, to an extent, overcoming the data sparsity issue.

  • Third, we introduce a multi-head attention encoder to adaptively compute a preference score for each check-in and obtain user latent preference representation over unvisited POIs. User preference representation and user social representation are used to model the user influence on POI recommendation. In addition, we leverage this preference score combined with POIs’ location representation to learn the influence of checked-in POIs on unvisited POIs. This enables our model to capture non-linear user-POI interactions and nuances between different POIs while bounding the user preference with geographical regularization.

  • Last, we conducted extensive experiments on several large-scale benchmark datasets, i.e., Gowalla, Foursquare and Yelp, demonstrating that HGMAP can significantly improve recommendation performance as compared to state-of-the-art POI recommendation baselines.

The remainder of this paper is organized as follows. We review the relevant works in Section 2. The details of our HGMAP model are presented in Section 3. Experimental evaluations demonstrating the superiority of our model are discussed in Section 4, followed by conclusion and direction for future work in Section 5.

2 Related work

In this section, we review the relevant studies in POI recommendation, the graph learning-based recommender systems, as well as the attention-based recommendation models, and position our work in that context.

2.1 Personalized POI recommendation

POI recommendation (a.k.a. location recommendation or venue recommendation) helps users to discover new POIs of their interest, which can be beneficial to both users and businesses [31]. Collaborative Filtering (CF)-like techniques such as Matrix Factorization (MF) [19], Bayesian probabilistic matrix factorization [42], and Bayesian Personalized Ranking (BPR) [41] are widely used in modern recommendation systems. Previous works on POI recommendation have shown that the contextual information associated with users (e.g., visiting time and social connections) and POIs (e.g., geographical locations) play important roles in enhancing the effectiveness of POI recommendation [22, 24, 26, 28, 53, 56,57,58, 61, 72]. These methods assume that users who have the same check-ins share similar preferences, so they are inclined to visit similar locations in the future, and therefore leveraging these learned latent features of users and POIs to predict user preference to unvisited locations may improve performance. In addition, some studies [24, 26] have shown that check-ins can be treated as implicit feedback, which can be incorporated into MF-based models to improve POI recommendation accuracy, while other research works [30] leverage Probabilistic Factor Models (PFM) [14] to consider auxiliary factors such as geographical influence. A comprehensive survey [31] compared representative CF-based POI recommendation models and summarized that (i) geographical information and social influence are the two most effective factors for capturing user preference; (ii) GeoMF [26] and RankGeoFM [22] exhibit superior performance on POI recommendation over other CF-based methods.

However, the performance of the CF-based recommendation methods often drops significantly when user-POI interactions are extremely sparse. Meanwhile, they cannot be directly used for recommending new POIs that have not been visited by any users or making recommendations to new users who have no visiting records, which is a well-known cold-start problem. More importantly, these latent factor models are inherently linear, which limits their modeling capacity to capture non-linear user-POI interactions. To overcome these issues, a growing body of recent works have applied deep neural networks to the collaborative filtering setting for POI recommendation [13, 32, 34, 37, 40, 52, 56, 64, 71]. For example, recent efforts [37, 52, 56, 64] use POI embedding and recurrent neural networks to capture the check-in context and user sequential visiting behavior. A translation-based POI recommendation framework is proposed [40] to model the relations among users, POIs, and spatial-temporal context, where knowledge graph embedding techniques are used to encode users and POIs in a latent space. Similarly, a denoising autoencoder has been adopted [34] to capture spatial-temporal context and interactions among users and POIs. Adversarial learning [12] has also been employed to learn underlying user preference distribution in [32, 71], unifying the reinforcement learning and matrix factorization methods into an adversarial learning framework for POI recommendation.

Despite their effectiveness and some inspiring results, existing methods are not able to yield optimal recommendation performance, in part due to the data sparsity and cold-start issues in POI recommendation. In addition, the aforementioned methods mainly focus on exploiting deep learning techniques to enhance the interaction function, so as to capture the nonlinear relations between users and POIs, which, however, neither explicitly captures the transitivity property of both users and POIs, nor guarantees the closeness of similar users and POIs in the embedded space.

2.2 GNNs in recommender systems

Graph Neural Networks (GNNs) [6, 18, 45, 51], which aggregate node features from the locally connected neighbors of nodes using deep neural networks, have attracted considerable attention in recent years due to their effectiveness and remarkable success in various tasks such as graph classification, semi-supervised node classification, traffic forecasting [74], meta-graph learning [67], information cascade prediction [4], network alignment [68] and image segmentation [35]. The main idea of GNNs is to generate graph convolutional layers based on graph spectral theory, and adaptively transform node feature vectors with different neighborhood aggregation and graph-level pooling schemes. Most recently, several works have leveraged GNN architectures for building recommender systems [9, 39, 43, 47, 48, 49, 55]. GC-MC [43] first applies Graph Convolutional Networks (GCNs) [18] to the user-item interaction graph. PinSage is an industrial solution that combines random walks and GCNs to generate node embeddings for a bipartite graph at Pinterest. NGCF [50] exploits the user-item graph by expressively modeling high-order connectivity in user-item interactions with GCNs, which can inject collaborative signals into the procedure of propagating embeddings on the graph. Another category of works [47, 48, 49] applies GNNs to knowledge graphs in order to provide additional guidance for recommendations; these methods rely heavily on external knowledge graphs and manual design of meta-paths/meta-graphs, and are thus hard to implement in practice.

Existing GNN-based recommendation models mainly focus on exploiting the CF signals from user-item interaction graphs. Although the CF effect between users and items can be efficiently captured, they cannot be directly applied to POI recommendation due to the extremely sparse user-POI interactions [8, 31]. In this spirit, our approach is different from existing works in that we sidestep the graph convolutions on a sparse user-POI interaction graph but alternatively capture the implicit user social relationship and POI connections, which not only provides useful information for cold-start users/POIs, but also alleviates the sparsity problem with the constructed geographically adjacent POI graphs.

2.3 Attention mechanism for recommendation

The attention mechanism allows a model to learn the importance of specific positions of the input. Combined with various neural network architectures, it has proven effective in many tasks, such as machine translation [2], human mobility learning [11, 72], image retrieval [62], object detection [59], as well as recommender systems [3, 34, 37, 49, 65, 69, 73]. Earlier works [3, 37] utilize vanilla attention vectors to dynamically model the influence of items and learn the interactions between users and items. However, these models rely on the standard attention mechanism and can only capture a single aspect of item importance and linear interactions among items, whereas users’ preference is too complex to be captured by a single importance vector and high-order item feature interactions are essential for improving recommendation performance [30].

Multi-head self-attention mechanism [44] is a natural language processing (NLP) model fully relying on self-attention module to learn structures of sentences and complex word representations. It has achieved state-of-the-art performance on a wide range of NLP tasks (e.g., translation, word-embedding, etc.) and inspired a variety of excellent models such as BERT [7] and ALBERT [20]. In this work, we utilize multi-head self-attention to learn users’ multiple-aspects preference over POIs. By projecting POI embedding into multiple subspaces, different interactions between different subspaces can be retrieved to reflect users’ various aspects of preference over POIs. In addition, it helps us better differentiate users that have similar preference while making more personalized recommendations.

3 HGMAP recommendation framework: model and methodology

We now present the details of our HGMAP model for POI recommendation. We first define the basic terminology used throughout this paper and formally introduce the POI recommendation problem. Subsequently, we discuss the basic aspects of HGMAP, which consists of four components: two graph convolutional networks, a multi-head attentive encoder and a prediction module (cf. Figure 1). Specifically, we utilize \(\text {GCN}_{\text {location}}\) to learn POIs’ location representation from a POI location network constructed from POI geographic coordinates. To obtain users’ social representation and incorporate social influence and check-in similarity information, \(\text {GCN}_{\text {social}}\) is applied to the user social network and user-POI interactions. We then implement a multi-head attentive encoder to learn users’ preference representation and a preference score for every check-in from user-POI interactions. Users’ social representation and preference representation are combined to learn SIP (Social Influence on user Preference) ratings on POIs. Meanwhile, we combine POIs’ location representation with the preference score to learn GIP (Geographical Influence on user Preference) ratings. Finally, we make predictions based on the learned SIP and GIP ratings. In this section, we also present the three loss functions used to optimize HGMAP.

Fig. 1 Overview of our proposed HGMAP model

3.1 Preliminaries

Definition 1

POI recommendation: Let \(\mathcal {U}=\{u_1,\cdots ,u_m\}\) denote a set of users, \(\mathcal {P}=\{p_1,\cdots ,p_n\}\) be a set of POIs and \(\mathcal {D}=\{d_1,\cdots ,d_n\}\) be the set of corresponding geographical coordinates (latitude and longitude) of the POIs. Let \(\mathbf {c}_i=[c_1,\cdots ,c_n]\) denote the check-in vector of user \(u_i\) over the n POIs. Given the historical check-in information of all m users, \(\mathcal {C}=\{\mathbf {c}_1,\cdots ,\mathbf {c}_m\}\), POI recommendation aims to recommend to each user a list of POIs that the user might be interested in but has never visited.

POI recommendation is commonly studied using a user-POI check-in frequency matrix \(\mathbf {G}\in \mathbb {R}^{m \times n}\) constructed from the interactions between m users and n locations. Each element \(g_{i,j} \in \mathbf {G}\) represents the number of times that user \(u_i\) has been to location \(p_j\). In this work, we instead use the binary user-POI check-in matrix \(\mathbf {B} \in \mathbb {R}^{m \times n}\), where each element \(b_{i,j} \in \{0,1\}\) indicates whether user \(u_i\) has been to location \(p_j\). All notations used throughout the paper are listed in Table 1.
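
As a minimal illustration of the relation between the two matrices, the following NumPy sketch derives a binary matrix from a toy frequency matrix. The toy values and the variable names `G`, `B` and `density` are made up for illustration and are not taken from any dataset used in this paper.

```python
import numpy as np

# Hypothetical toy check-in frequency matrix G (m = 3 users, n = 4 POIs).
G = np.array([[2, 0, 0, 1],
              [0, 0, 5, 0],
              [0, 3, 0, 0]])

# Binary check-in matrix B: b_ij = 1 if user u_i visited POI p_j at least once.
B = (G > 0).astype(np.int8)

# Fraction of non-zero entries, i.e., the density of the interaction matrix.
density = B.sum() / B.size
```

On real LBSN data the same density computation yields values on the order of 0.1%, which is the sparsity issue discussed in Section 1.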

Table 1 List of notations

3.2 Learning POI and User Representation via Hybrid GNNs

We now describe how to leverage a variant of GNNs (i.e., GCNs) to learn POI and user representations. We build two connectivity networks, one for users and one for POIs, to capture the similarities among users and among POIs, respectively. Figure 2 is a toy example showing the connectivity of a POI \(p_1\) and a user \(u_1\).

Fig. 2 A toy example showing the connectivity of POI \(p_1\) and user \(u_1\)

3.2.1 Modeling POI location representation

To learn POI representations, we leverage \(\text {GCN}_{\text {location}}\) to capture local and global structural information in a network, especially the geographic relations among POIs (e.g., distant or close POIs). We first construct a POI geographic location network \(\mathcal {G}=(\mathcal {P},\mathbf {A})\), where \(\mathcal {P}=\{p_{1},\cdots ,p_{n}\}\) is the set of POIs, \(\mathbf {A} \in \mathbb {R}^{n \times n}\) is a sparse adjacency matrix, and \(a_{i,j}\) denotes the location similarity of a pair of POIs \(p_i\) and \(p_j\). In this study, we choose a Gaussian Radial Basis Function (RBF) kernel to measure the location similarity \(a_{i,j} \in [0,1]\) between POI \(p_i\) and POI \(p_j\), as follows:

$$ a_{i,j}=\exp(-\eta \parallel d_{i}-d_{j}\parallel^{2}), $$
(1)

where \(d_i\) and \(d_j\) are the geographic coordinates of the two POIs \(p_i\) and \(p_j\), and \(\eta > 0\) is a hyper-parameter that controls the level of geographical relevance between two given POIs. A larger value of \(a_{i,j}\) indicates that the two POIs are geographically closer. For simplicity, we set \(a_{i,j} = 0\) if it is less than a threshold value \(\lambda\) (we use \(\lambda = 0.125\)), which also keeps \(\mathbf {A}\) sparse.
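
The construction of \(\mathbf {A}\) in Eq. (1) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the coordinates are treated as plain 2-D vectors with the Euclidean norm, the value `eta=1.0` and the toy coordinates are hypothetical, and the function name `rbf_adjacency` is ours. A dense \(n \times n\) matrix is used for readability, whereas a large-scale setting would need a sparse construction.

```python
import numpy as np

def rbf_adjacency(coords, eta=1.0, lam=0.125):
    """Location similarity a_ij = exp(-eta * ||d_i - d_j||^2), zeroed below lam."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise coordinate differences, shape (n, n, 2).
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    A = np.exp(-eta * sq_dist)
    # Thresholding: similarities below lam are set to zero.
    A[A < lam] = 0.0
    return A

# Hypothetical toy coordinates for three POIs.
coords = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
A = rbf_adjacency(coords, eta=1.0)
```

In this toy case only the first two POIs remain connected, since the similarities involving the third POI fall below the threshold.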

For \(\text {GCN}_{\text {location}}\) with K layers, we take the location similarity matrix \(\mathbf {A}\) as the input to the first layer:

$$ \mathbf{H}^{(0)}=\mathbf{A}, $$
(2)

The multi-layer \(\text {GCN}_{\text {location}}\) follows a layer-wise propagation rule. Let \(\mathbf {S}\) denote the normalized adjacency matrix:

$$ \mathbf{S}=\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}, $$
(3)

where \(\tilde {\mathbf {A}} = \mathbf {A} + \mathbf {I}\) is the adjacency matrix of the graph \(\mathcal {G}\) with added self-connections, \(\mathbf {I}\) is the identity matrix and \(\tilde {\mathbf {D}}\) is the degree matrix of \(\tilde {\mathbf {A}}\). The representation update of all location nodes becomes a simple sparse matrix multiplication:

$$ \overline{\mathbf{H}}^{(k)} \leftarrow \mathbf{S H}^{(k-1)}, $$
(4)

We adopt ReLU as the non-linear activation function of each layer. The updating rule of the k-th layer is as follows, where \({\Theta }^{(k)}\) is the trainable weight matrix of that layer:

$$ \mathbf{H}^{(k)} \leftarrow \text{ReLU} \left (\overline{\mathbf{H}}^{(k) } {\Theta}^{(k)} \right) $$
(5)

Following [51], our \(\text {GCN}_{\text {location}}\) module is a 2-layer simple graph convolution (SGC), a variant of GCNs that is more efficient to compute and has significantly fewer parameters than traditional GCNs. The output of the K-th layer, \(\mathbf {H}^{(K)}\), is taken as the final location representation \(\mathbf {P}=[\mathbf {p}_{1},\cdots ,\mathbf {p}_{n}]^{\top } \in \mathbb {R}^{n\times L}\), where L is the latent dimension of the POI location representation.
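
To make the propagation rule concrete, Eqs. (2)-(5) can be sketched in NumPy as below. This is a sketch under assumptions rather than our training code: dense matrices replace the sparse ones, the weight matrices are random instead of learned, and the names `normalize_adjacency` and `gcn_forward` as well as the hidden size of 4 are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """S = D~^{-1/2} (A + I) D~^{-1/2}, cf. Eq. (3)."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)           # diagonal of D~ (always >= 1 here)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A, thetas):
    """Apply Eqs. (2), (4) and (5) for K = len(thetas) layers."""
    S = normalize_adjacency(A)
    H = A                                    # Eq. (2): H^(0) = A
    for theta in thetas:
        H_bar = S @ H                        # Eq. (4): neighborhood aggregation
        H = np.maximum(H_bar @ theta, 0.0)   # Eq. (5): linear map + ReLU
    return H

rng = np.random.default_rng(0)
# Hypothetical location similarity matrix for n = 3 POIs.
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
L = 2  # latent dimension of the location representation
thetas = [rng.normal(size=(3, 4)), rng.normal(size=(4, L))]  # 2 layers
P = gcn_forward(A, thetas)  # shape (n, L)
```

Each additional layer multiplies by \(\mathbf {S}\) once more, so a K-layer network aggregates information from POIs up to K hops away in the location graph.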

3.2.2 Modeling user social representation

To learn user representations, we take a user social network and social similarity as the input of a multi-layer \(\text {GCN}_{\text {social}}\). The user social network is defined as \(\mathcal {G}^{*} =(\mathcal {U},\mathbf {A}^{*})\), where \(\mathcal {U}=\{u_{1},\cdots ,u_{m}\}\) is the set of users, \(\mathbf {A}^{*} \in \mathbb {R}^{m \times m}\) is a sparse adjacency matrix, and \(a^{*}_{i,j}\) is the edge weight between users \(u_i\) and \(u_j\), representing how close the two users are.

Based on the idea of collaborative filtering, user preference can be discovered by aggregating the behavior of similar users [53], which cannot be fully achieved by traditional GCNs where the edge weight is binary, \(a^{*}_{i,j} \in \{0,1\}\). To better capture the relationships among users, we make the weights continuous, \(a^{*}_{i,j} \in [0,1]\), and compute them from two kinds of semantic information: check-ins and friendship. Let \(\mathcal {F}_i\) and \(\mathcal {R}_i\) denote the friend set and the check-in set of user \(u_i\) in an LBSN. If users \(u_i\) and \(u_j\) are not friends, the edge weight \(a^{*}_{i,j}\) between them is calculated as follows:

$$ a^{*}_{i,j}= \frac{\left| \mathcal{R}_{i} \cap \mathcal{R}_{j}\right|}{\left| \mathcal{R}_{i} \cup \mathcal{R}_{j}\right|} $$
(6)

If users \(u_i\) and \(u_j\) are friends, \(a^{*}_{i,j}\) is:

$$ a^{*}_{i, j}=\beta \cdot \frac{\left|\mathcal{F}_{i} \cap \mathcal{F}_{j}\right|}{\left|\mathcal{F}_{i} \cup \mathcal{F}_{j}\right|}+(1-\beta) \cdot \frac{\left|\mathcal{R}_{i} \cap \mathcal{R}_{j}\right|}{\left|\mathcal{R}_{i} \cup \mathcal{R}_{j}\right|} $$
(7)

where \(\beta \in (0,1]\) is a tunable hyper-parameter that balances the relative weight of friend-circle similarity and visiting similarity. We denote the output of \(\text {GCN}_{\text {social}}\), \(\mathbf {U}=[\mathbf {u}_{1},\cdots ,\mathbf {u}_{m}]^{\top } \in \mathbb {R}^{m \times L}\), as the social representation of all m users, where L is the dimension of the representation. Note that the Foursquare data contains no social relationships, so for that dataset we only incorporate the check-in information.
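
The two cases of Eqs. (6) and (7) can be sketched with plain Python sets as follows. The helper names `jaccard` and `social_edge_weight`, the value `beta=0.5` and the toy friend and check-in sets are hypothetical; returning 0 when both sets are empty is an assumption, since this corner case is not specified above.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0 when both are empty (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def social_edge_weight(i, j, friends, checkins, beta=0.5):
    """Edge weight a*_ij following Eq. (6) (non-friends) and Eq. (7) (friends)."""
    visit_sim = jaccard(checkins[i], checkins[j])
    if j in friends[i]:
        friend_sim = jaccard(friends[i], friends[j])
        return beta * friend_sim + (1.0 - beta) * visit_sim
    return visit_sim

# Hypothetical toy data: friend sets F_i and check-in sets R_i of four users.
friends = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
checkins = {0: {"p1", "p2"}, 1: {"p2", "p3"}, 2: {"p4"}, 3: {"p1", "p2"}}

w_01 = social_edge_weight(0, 1, friends, checkins)  # friends: Eq. (7)
w_03 = social_edge_weight(0, 3, friends, checkins)  # not friends: Eq. (6)
```

Users 0 and 3 are not friends but share identical check-ins, so they still receive the maximum weight; this is how check-in similarity connects users when friendship links are missing, as in the Foursquare data.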

3.3 User preference learning with multi-head attention

We have now obtained the user representation and the POI representation. Since our goal is to efficiently and comprehensively learn user preference over different POIs, it is necessary to measure the relevance between users and POIs while capturing the joint effect on user-POI interactions. Recently, attention mechanisms have been widely used in recommender systems [3, 34, 37, 49]. For example, SAE-NAD [34] exploits self-attentive autoencoders to learn complex user preference for POIs. However, the standard attention mechanism usually assigns a single importance value to a POI, which makes the model focus on only one (latent) aspect of POIs. This is not sufficient to reflect the sophisticated human sentiment on POIs [30]. In particular, some important (latent) aspects of POIs that might directly or indirectly influence user preference are missed.

The above-mentioned evidence inspires us to learn various aspects of user preference, which, through assigning multiple scores to each POI that user has visited, allow us to model the dependencies and importance of long-short term POI interactions. Towards this goal, we adopt multi-head self-attention [44] to effectively capture high-order interactions between POIs and retrieve the multi-aspect preference of users over POIs.

Technically, we first obtain a POI embedding matrix denoted by \(\mathbf {W}^{(1)} \in \mathbb {R}^{L \times n}\), which is also the weight matrix of the embedding layer. Then, we utilize the multi-head attention mechanism with h attention heads to learn the preference over visited POIs for each user. The h attention matrices are:

$$ \mathbf{T}= \left[\mathbf{t}_{1},\cdots,\mathbf{t}_{h}\right]^{\top} $$
(8)

where \(\mathbf {T} \in \mathbb {R}^{R \times L}\) and \(\mathbf {t}_{1} \in \mathbb {R}^{[R/h] \times L}\) represents the first attention head, which learns user preference on POIs along some dimensions (e.g., traffic, food and scenery); \(\mathbf {t}_{2} \in \mathbb {R}^{[R/h] \times L}\) learns preference along different dimensions, and so on. R is the latent dimension of preferences.

\(\mathbf {c}_i=[c_1,\cdots ,c_n]\) is a binary vector representing the set of check-in POIs of user \(u_i\), where \(c_j\) \((1 \le j \le n)\) is 1 if user \(u_i\) has visited POI \(p_j\) and 0 otherwise. We utilize \(\mathbf {c}_i\) and \(\mathbf {W}^{(1)}\) to obtain the check-in POI representation of user \(u_i\):

$$ \hat{\mathbf{o}}_{i}=\left[c_{1}\mathbf{W}^{(1)}_{(*,1)}, \cdots, {c_{n}} \mathbf{W}^{(1)}_{(*,n)}\right] $$
(9)

where \(\mathbf {W}^{(1)}_{(*,n)}\) is the nth column of \(\mathbf {W}^{(1)}\) and is the representation of the nth POI. Note that \(\hat {\mathbf {o}}_{i} \in \mathbb {R}^{L \times n}\) might have some zero columns. We delete them and get \(\mathbf {o}_{i} \in \mathbb {R}^{L \times n_{i}}\), where \(n_i\) is the number of check-in POIs of user \(u_i\) and is the same as the number of non-zero columns in \(\hat {\mathbf {o}}_{i}\). The set of check-in POI representations is denoted as \(\mathcal {O} = \{\mathbf {o}_1,\cdots ,\mathbf {o}_m\}\).

Then, we learn the preference of user \(u_i\) using the h attention heads \(\mathbf {T}\) and the check-in POI representation \(\mathbf {o}_i\):

$$ \begin{cases} \mathbf{s}_{r}=\text{softmax}(\tanh(\mathbf{t}_{r}\cdot\mathbf{o}_{i})), \quad r=1,\cdots,h \\ \mathbf{Score}= [\mathbf{s}_{1}, {\cdots} , \mathbf{s}_{h}]^{\top} \end{cases} $$
(10)

where \(\mathbf {Score} \in \mathbb {R}^{R \times n_{i}}\) is the user preference score matrix. Lastly, the preference of user \(u_i\) over check-in POIs can be computed as follows:

$$ \mathbf{v}_{i}= \mathbf{Score} \cdot {\mathbf{o}_{i}}^{\top} $$
(11)

where \(\mathbf {v}_{i} \in \mathbb {R}^{R \times L}\) denotes the preference representation of user \(u_i\). We use \(\mathbf {V}=[\mathbf {v}_{1},\cdots ,\mathbf {v}_{m}]^{\top } \in \mathbb {R}^{m \times R \times L}\) to denote the preference representation of all m users.
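
The computation of Eqs. (9)-(11) for a single user can be sketched as follows. This is a shape-level illustration under assumptions: the softmax is taken over the \(n_i\) visited POIs (the normalization axis is our reading of Eq. (10)), R is assumed to be divisible by h, all parameters are random rather than learned, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def user_preference(c_i, W1, T, h):
    """Return (Score, v_i) for one user following Eqs. (9)-(11)."""
    # Eq. (9) plus removal of zero columns: keep embeddings of visited POIs.
    o_i = W1[:, c_i.astype(bool)]              # (L, n_i)
    heads = np.split(T, h, axis=0)             # h blocks of shape (R/h, L)
    # Eq. (10): one score block per head, normalized over visited POIs.
    scores = [softmax(np.tanh(t_r @ o_i), axis=1) for t_r in heads]
    Score = np.concatenate(scores, axis=0)     # (R, n_i)
    v_i = Score @ o_i.T                        # Eq. (11): (R, L)
    return Score, v_i

rng = np.random.default_rng(0)
L, n, R, h = 4, 6, 8, 2                        # hypothetical sizes
W1 = rng.normal(size=(L, n))                   # POI embedding matrix W^(1)
T = rng.normal(size=(R, L))                    # stacked attention heads
c_i = np.array([1, 0, 1, 1, 0, 0])             # user visited POIs 1, 3 and 4
Score, v_i = user_preference(c_i, W1, T, h)
```

Each of the R rows of `Score` is a separate distribution over the user's visited POIs, which is what allows different rows to emphasize different POIs.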

3.4 Prediction module

POI recommendation in LBSNs differs from other recommendation tasks [34] in that there exist physical distances between users and POIs. This unique property gives rise to a well-known geographical clustering phenomenon: users usually appear in several specific areas and prefer to visit unvisited POIs that are around their checked-in POIs. Incorporating such a property is likely to improve the POI recommendation performance [23, 26, 34]. According to this clustering phenomenon, we speculate that the check-in POIs of each user may affect other unvisited POIs with respect to geographic locations. Different from most of the previous studies that mainly exploit geographical influence from the perspective of POIs, our model combines user preference and geographical influence from both users and POIs. Specifically, we construct two ratings: (1) the GIP (Geographical Influence on user Preference) rating, which includes location influence and preference influence; and (2) the SIP (Social Influence on user Preference) rating, which considers social relationship and user preference.

  1. (1)

    From the perspective of POI geographic location, we first obtain the check-in POI representation of user \(u_i\) from \(\mathbf {c}_i=[c_1,\cdots ,c_n]\) and the POI location representation \(\mathbf {P}\).

    $$ {\hat{\mathbf{j}_{i}}=[c_{1}\mathbf{P}_{(*,1)}, \cdots, {c_{n}} \mathbf{P}_{(*,n)}] } $$
    (12)

    where \(\mathbf {P}_{(*,n)}\) is the nth column of \(\mathbf {P}\) and represents the location representation of the nth POI. Note that \(\hat {\mathbf {j}}_{i} \in \mathbb {R}^{L\times n}\) might have some zero columns. We delete them as well and obtain \(\mathbf {j}_{i} \in \mathbb {R}^{L \times n_{i}}\), where \(n_i\), the number of check-in POIs of user \(u_i\), equals the number of non-zero columns in \(\hat {\mathbf {j}}_{i}\). We leverage the check-in POI representation \(\mathbf {j}_i\) to compute the influence of check-in POIs on unvisited POIs and incorporate the influence of user preference into the geographical influence as follows:

    $$ \mathbf{f}_{i} = \text{sum}\left(\mathbf{Score} \cdot \left({\mathbf{j}_{i}}^{\top} \cdot \mathbf{W}^{(4)}\right)\right) $$
    (13)

    where sum is an addition function that adds elements by row, reducing the \(R \times n\) product to a \(1 \times n\) vector. \(\mathbf {Score} \in \mathbb {R}^{R \times n_{i}}\) denotes the user preference matrix, \({\mathbf {j}_{i}}^{\top } \in \mathbb {R}^{n_{i} \times L}\), and \(\mathbf {W}^{(4)} \in \mathbb {R}^{L \times n}\) is the parameter matrix in the MLP. Each \(\mathbf {f}_i \in \mathbb {R}^{1 \times n}\) denotes the GIP rating vector of user \(u_i\), and \(\mathbf {F} = [\mathbf {f}_1,\cdots ,\mathbf {f}_m]^{\top } \in \mathbb {R}^{m \times n}\) denotes the GIP ratings of all users.

  2. (2)

    From the perspective of the user, we leverage the user preference representation \(\mathbf {v}_{i} \in \mathbb {R}^{R \times L}\) of user-POI interactions, combined with the user social representation \(\mathbf {u}_i \in \mathbb {R}^{1 \times L}\), to compute a rating vector of users on POIs as follows:

    $$ \begin{cases} \mathbf{z}_{i}=\mathbf{w}_{a} \cdot \text{Concat}(\mathbf{v}_{i},\mathbf{u}_{i}) \\ \mathbf{e}_{i}=\text{MLP}(\mathbf{z}_{i}) \end{cases} $$
    (14)

    where \(\mathbf {w}_a \in \mathbb {R}^{(R+1)}\) is the parameter vector of the aggregation layer. We use \(\mathbf {z}_i\) as the input of the MLP to get the SIP rating vector \(\mathbf {e}_{i} \in \mathbb {R}^{1 \times n}\) of user \(u_i\) for all POIs. \(\mathbf {W}^{(2)} \in \mathbb {R}^{L \times D}\), \(\mathbf {W}^{(3)} \in \mathbb {R}^{D \times L}\) and \(\mathbf {W}^{(4)} \in \mathbb {R}^{L \times n}\) are the parameter matrices of the MLP, and D is the latent dimension of the hidden layer. \(\mathbf {Z}=[\mathbf {z}_{1},\cdots ,\mathbf {z}_{m}]^{\top } \in \mathbb {R}^{m \times L}\) stacks the MLP inputs of all users, and \(\mathbf {E}=[\mathbf {e}_{1},\cdots ,\mathbf {e}_{m}]^{\top } \in \mathbb {R}^{m \times n}\) denotes all users’ SIP ratings for POIs.

  3. (3)

    Finally, we combine the GIP rating \(\mathbf{f}_{i}\) and the SIP rating \(\mathbf{e}_{i}\) to get a final rating \(\hat {\mathbf {y}}_{i}\), which is used to recommend a list of POIs for user \(u_{i}\).

    $$ \hat{\mathbf{y}}_{i}=\text{sigmoid}(\mathbf{f}_{i} + \mathbf{e}_{i}) $$
    (15)

    where \(\mathbf{e}_{i}\) captures user \(u_{i}\)’s preference from user-POI interactions and social influence, and \(\mathbf{f}_{i}\) models the geographical influence together with the influence of user preference. sigmoid is an activation function and \(\hat {\mathbf {Y}}=[\hat {\mathbf {y}}_{1}, {\cdots } ,\hat {\mathbf {y}}_{m}]^{\top } \in \mathbb {R}^{m \times n}\) denotes the predicted ratings for all m users.
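
To make the shapes in Eqs. (13) and (15) concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes that "sum" collapses the R rows of the R × n product into a single 1 × n vector, which is the reading implied by the stated dimensions. The helper names (`matmul`, `gip_rating`, `final_rating`) and the toy values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gip_rating(score, j_i_t, w4):
    """Eq. (13): f_i = sum(Score . (j_i^T . W4)).

    Assumption: `sum` adds the R rows together, giving a 1 x n vector.
    """
    product = matmul(score, matmul(j_i_t, w4))   # shape R x n
    return [sum(col) for col in zip(*product)]   # shape 1 x n

def final_rating(f_i, e_i):
    """Eq. (15): element-wise sigmoid of the summed GIP and SIP ratings."""
    return [1.0 / (1.0 + math.exp(-(f + e))) for f, e in zip(f_i, e_i)]

# Toy sizes (hypothetical): R = 2, n_i = 2 visited POIs, L = 2, n = 3 POIs.
score = [[1.0, 0.0], [0.0, 1.0]]            # R x n_i
j_i_t = [[1.0, 0.0], [0.0, 1.0]]            # n_i x L
w4 = [[0.5, -0.5, 0.0], [0.5, 0.5, 0.0]]    # L x n
f_i = gip_rating(score, j_i_t, w4)          # GIP rating, 1 x n
e_i = [0.0, 0.0, 0.0]                       # stand-in for the SIP rating from the MLP
y_hat = final_rating(f_i, e_i)
```

Because the sigmoid is applied after the two rating vectors are added, a POI can be ranked highly when either the geographical or the social/preference signal is strong.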

figure a

3.5 Optimization

We now turn to optimizing the three components: the two GCNs and HGMAP. To do so, we first define the objective loss function of each component. The training processes are summarized in Algorithm 1 and Algorithm 2.

  1. (1)

    For the GCNs learning POI location representation (denoted by GCNlocation), we utilize the Cross Entropy loss to capture both POIs’ location similarity and POIs’ geographic location representation.

    $$ \begin{cases} \mathbf{X_{1}}=\mathbf{P} \cdot \mathbf{P}^{\top} \\ {\mathcal{L}_{\text{GCN}_{\text{location}}} = {{\sum}_{i=1}^{n}} - \left[\mathbf{A} \log(\mathbf{X_{1}})+(1-\mathbf{A})\log(1-\mathbf{X_{1}}) \right]} \end{cases} $$
    (16)

    where \(\mathbf {A} \in \mathbb {R}^{n \times n}\) is the location similarity matrix and \(\mathbf {P} \in \mathbb {R}^{n \times L}\) is the location representation.

  2. (2)

    For the GCNs learning user social representation (denoted by GCNsocial), we incorporate users’ social similarity and users’ social representation into the loss function.

    $$ \begin{cases} \mathbf{X_{2}}=\mathbf{U} \cdot \mathbf{U}^{\top} \\ {\mathcal{L}_{\text{GCN}_{\text{social}}}={\sum}_{i=1}^{m}-\left[\mathbf{A}^{*} \log(\mathbf{X_{2}})+(1-\mathbf{A}^{*})\log(1-\mathbf{X_{2}}) \right]} \end{cases} $$
    (17)

    where \(\mathbf {A}^{*} \in \mathbb {R}^{m \times m}\) is the social similarity matrix and \(\mathbf {U} \in \mathbb {R}^{m \times L}\) is the social representation. During the GCNs training process, we take 5,000 POI locations (or 5,000 users) in each batch to calculate their corresponding representation.

  3. (3)

    Following prior work [34], we use the Mean Square Error (MSE) loss to optimize the MLP. We leverage a general weighting scheme [16] to distinguish visited and unvisited POIs, where we assign a confidence level to each POI [34] to tackle the One Class Collaborative Filtering (OCCF) problem. \(\mathbf {Q} \in \mathbb {R}^{m \times n}\) denotes the confidence matrix and is computed from the observed check-in frequency matrix \(\mathbf {G} \in \mathbb {R}^{m \times n}\), so that the confidence of an observed check-in grows with its frequency:

    $$ q_{i, j}=\begin{cases} { \log \left( 1+g_{i, j}/\xi \right)} & {\text { if } g_{i, j}>0} \\ {1} & {\text { otherwise }} \end{cases} $$
    (18)

    where ξ is a hyper-parameter. The objective function \(\mathcal{L}_{\text {HGMAP}}\) for optimizing the MLP measures the discrepancy between the predicted value \(\hat {\mathbf {Y}}\) and the ground-truth value Y.

$$ \begin{array}{@{}rcl@{}} \mathcal{L}_{\text{HGMAP}}&=&\sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n}\left\|q_{i, j}\left( \mathbf{y}_{i, j}- \hat{\mathbf{y}}_{i, j}\right)\right\|_{2}^{2} + \gamma(\|\mathbf{W}^{(*)}\|_{F}^{2}+\|\mathbf{w}_{a}\|_{2}^{2}) \\ &=& \|\mathbf{Q} \otimes (\mathbf{Y}-\hat{\mathbf{Y}})\|_{F}^{2} + \gamma(\|\mathbf{W}^{(*)}\|_{F}^{2}+\|\mathbf{w}_{a}\|_{2}^{2}) \end{array} $$
(19)

where ⊗ is the element-wise multiplication and \(\|\cdot\|_{F}\) is the Frobenius norm. γ is the regularization parameter and \(\mathbf{W}^{(*)}\) includes \(\mathbf{W}^{(1)}\), \(\mathbf{W}^{(2)}\), \(\mathbf{W}^{(3)}\) and \(\mathbf{W}^{(4)}\). \(\mathbf{W}^{(1)}\) is the parameter matrix of the embedding layer and \(\mathbf{w}_{a}\) is the learned parameter vector in the aggregation layer. \(\mathbf{W}^{(2)}\), \(\mathbf{W}^{(3)}\) and \(\mathbf{W}^{(4)}\) are the parameter matrices of the MLP. We use Adam [17] to automatically adjust the learning rate during training.
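
As an illustration of Eqs. (18) and (19), the sketch below computes the confidence weights and the weighted squared-error objective in plain Python. It is a simplified reading rather than the training code: all trainable weights are passed as one flat list `params` (an assumption made for brevity), and the toy matrices and hyper-parameter values are hypothetical.

```python
import math

def confidence(g, xi):
    """Eq. (18): log(1 + g / xi) for observed check-ins, 1 otherwise."""
    return math.log(1.0 + g / xi) if g > 0 else 1.0

def hgmap_loss(Y, Y_hat, G, xi, gamma, params):
    """Eq. (19): confidence-weighted squared error plus L2 regularization.

    `params` is a flat list holding every trainable weight (a simplification
    of the separate W^(*) matrices and the vector w_a).
    """
    data_term = 0.0
    for y_row, p_row, g_row in zip(Y, Y_hat, G):
        for y, p, g in zip(y_row, p_row, g_row):
            data_term += (confidence(g, xi) * (y - p)) ** 2
    reg_term = gamma * sum(w * w for w in params)
    return data_term + reg_term

# Toy example: one user, two POIs, the first visited three times.
G = [[3, 0]]
Y = [[1.0, 0.0]]
Y_hat = [[0.5, 0.5]]
loss = hgmap_loss(Y, Y_hat, G, xi=1.0, gamma=0.001, params=[1.0, 2.0])
```

With ξ = 1, the visited POI receives weight log 4 ≈ 1.39 while the unvisited POI keeps weight 1, so an equal prediction error costs more on the observed entry.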

figure b

4 Experiments

In this section, we report observations from experiments conducted on three real-world datasets to quantitatively address the following questions:

  • Q1. How does HGMAP perform compared with the state-of-the-art POI recommendation models?

  • Q2. How do the hybrid GCNs in HGMAP affect the recommendation performance?

  • Q3. How do the key hyper-parameters affect HGMAP’s performance?

  • Q4. Can HGMAP provide reasonable interpretability regarding user preference towards POIs?

4.1 Dataset and Evaluation Metric

To evaluate the effectiveness of HGMAP, we conducted experiments on three benchmark LBSN datasets, including:

  • Yelp dataset. It is obtained from the Yelp challenge. This dataset does not provide the exact check-in times but only coarse check-in dates.

  • Gowalla dataset. It is widely used for POI recommendation and was collected between February 2009 and October 2010.

  • Foursquare dataset. It was collected between April 2012 and September 2013 within the mainland United States. Note that this dataset does not contain social information, so we do not model social influence for it.

Following the settings in [31, 34], we filter out users with fewer than 20 check-in POIs and POIs with fewer than 20 visitors for the Gowalla dataset. For the Foursquare and Yelp datasets, we discard users with fewer than 10 check-in POIs and POIs with fewer than 10 visitors. We then partition each dataset into a training set and a test set: for each user, we randomly select 80% of the check-ins for training and use the remainder for testing. The descriptive statistics of the three datasets after pre-processing are given in Table 2, from which we can see that they are all extremely sparse, i.e., the frequency of most POIs being visited is about 0.1%.
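
The filtering and per-user split described above can be sketched as follows. This is an illustrative version only: it assumes check-ins are given as (user, POI) pairs, applies the two thresholds in a single pass (the paper does not say whether filtering is iterated), and uses hypothetical function names and toy data.

```python
import random
from collections import defaultdict

def filter_checkins(checkins, min_pois, min_users):
    """Keep pairs whose user has >= min_pois distinct POIs and whose POI
    has >= min_users distinct visitors (single pass, an assumption)."""
    user_pois, poi_users = defaultdict(set), defaultdict(set)
    for u, p in checkins:
        user_pois[u].add(p)
        poi_users[p].add(u)
    return [(u, p) for u, p in checkins
            if len(user_pois[u]) >= min_pois and len(poi_users[p]) >= min_users]

def split_per_user(checkins, train_ratio=0.8, seed=0):
    """Randomly assign train_ratio of each user's check-ins to training."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for u, p in checkins:
        by_user[u].append(p)
    train, test = [], []
    for u, pois in by_user.items():
        pois = pois[:]                 # copy before shuffling
        rng.shuffle(pois)
        cut = int(len(pois) * train_ratio)
        train += [(u, p) for p in pois[:cut]]
        test += [(u, p) for p in pois[cut:]]
    return train, test

# Toy data: two active users with 10 POIs each, one user with a single check-in.
checkins = [(u, f"p{k}") for u in ("u0", "u1") for k in range(10)] + [("u2", "p0")]
kept = filter_checkins(checkins, min_pois=10, min_users=2)
train_set, test_set = split_per_user(kept)
```

Splitting per user rather than globally guarantees that every retained user appears in both the training and the test portion.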

Table 2 Descriptive statistics of three datasets

Similar to previous works [31, 34], we use three standard metrics, i.e., precision (P@k), recall (R@k) and mean average precision (M@k), to evaluate the models. P@k is the percentage of the top-k recommended locations that are visited by the user. R@k is the ratio of recovered POIs to visited locations, and M@k accounts for the rank of recommendations by assigning higher scores to hits at higher positions.
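
For reference, the three metrics can be computed per user as in the sketch below and then averaged over users. The normalization of average precision varies across papers; dividing by min(|relevant|, k) is an assumption here, not necessarily the convention used in the experiments. The function names and the example ranking are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that the user actually visited."""
    hits = sum(1 for p in ranked[:k] if p in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of the user's visited POIs recovered in the top-k list."""
    hits = sum(1 for p in ranked[:k] if p in relevant)
    return hits / len(relevant) if relevant else 0.0

def average_precision_at_k(ranked, relevant, k):
    """Rank-aware score: hits near the top of the list contribute more."""
    hits, total = 0, 0.0
    for rank, p in enumerate(ranked[:k], start=1):
        if p in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)      # assumed normalization
    return total / denom if denom else 0.0

# One user's ranked recommendations and held-out visits (toy example).
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
p4 = precision_at_k(ranked, relevant, 4)
r4 = recall_at_k(ranked, relevant, 4)
ap4 = average_precision_at_k(ranked, relevant, 4)
```

M@k is then the mean of `average_precision_at_k` over all test users.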

4.2 Baselines

We conduct extensive comparisons to the following 12 state-of-the-art POI recommendation models:

  • MGMMF [5] is a multi-center Gaussian model fused with matrix factorization, taking into account social influence and incorporating multi-center geographical influence into the fused framework. The main idea is based on the observation that a user tends to check-in around several geographical centers.

  • BPRMF [41] is a Bayesian personalized ranking with matrix factorization method. It adopts a generic optimization criterion and models the implicit feedback to recommend top-N items. Note that BPRMF only focuses on user preference modeling, without utilizing any context information.

  • WRMF [16] is a weighted regularized matrix factorization model. It couples the estimate of user preference for items with a confidence level based on matrix factorization while minimizing the mean square error. It assigns different confidence values to observed and unobserved check-ins.

  • IRenMF [29] is based on weighted matrix factorization and incorporates the geographical characteristics of neighboring POIs at both the individual level (i.e., a user has similar preferences for neighboring POIs) and the region level (i.e., POIs that are geographically close may share similar user preferences) into the model.

  • GeoMF [26] is a state-of-the-art MF-based POI recommendation model based on weighted matrix factorization. It considers check-ins as an implicit feedback and incorporates geographical influence by fitting nonzero check-ins with large weights and zero check-ins with smaller weights.

  • RankGeoFM [22] is a ranking based geographical factorization method that incorporates the geographical influence of neighboring POIs to learn user preference rankings for POIs. It uses another latent matrix to represent user geographical preference, in addition to user preference matrix.

  • PACE [52] is a deep neural architecture based on user preference and context embedding with representation methods [38]. It is a general semi-supervised learning framework that jointly models social influence and user trajectory behavior to predict both user preference over POIs and various context associated with users and POIs.

  • SAE-NAD [34] is an attention-based POI recommendation model consisting of a self-attentive encoder and a neighbor-aware decoder. It uses a self-attentive encoder to differentiate the user preference, and adopts the neighbor-aware decoder to model the geographical influence of POIs.

  • STGN [64] is a Spatio-Temporal Gated Network that enhances long short-term memory for learning sequential visiting behavior. It uses coupled gates, i.e., a time gate and a distance gate, to capture the spatial-temporal relationship among successive check-ins.

  • APOIR [71] is the first adversarial learning-based POI recommendation model. It consists of two parts, a recommender and a discriminator, which are jointly trained for learning user preference by playing a minimax game considering geographical influence and social relation as rewards in a reinforcement learning manner.

  • Geo-ALM [32] is a geographical information based adversarial learning model which is very similar to APOIR, except that Geo-ALM directly fuses geographical features (both POI features and region features) and uses generative adversarial networks [12] without explicitly considering the social influence.

  • NGCF [50], Neural Graph Collaborative Filtering, is the most recent item-based recommendation model built upon graph convolutional networks. NGCF only focuses on convolutional operations on user-item interactions while HGMAP learns additional information from both the user side and the POI side.

4.3 Parameter setting

We implement HGMAP with PyTorch on a machine with an NVIDIA GeForce GTX 1080Ti. In our experiments, the latent dimension L of both users’ social representation and POIs’ location representation is set to 200. For the two GCNs, the minimum values λ for user similarity and location similarity are both set to 0.125 unless otherwise specified. The geographical relevance level η is set to 60 in GCNlocation, and the parameter β, which balances the importance of friend circle similarity and user visiting similarity, is set to 0.3. The latent dimension R of the user preference vector and the number of attention heads are set to 36 and 6, respectively. The batch size of HGMAP is set to 256. The learning rate and the regularization parameter γ are both set to 0.001. The two-layer GCNsocial has the architecture [m, 3000, 200] and the two-layer GCNlocation has the architecture [n, 3000, 200], where m and n are the numbers of users and POIs in the input layer, respectively. For all three datasets, we use an embedding layer and a 3-layer MLP with architectures [200, n] and [200, 50, 200, n], respectively.
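
For readers who want to reproduce this setup, the settings above can be gathered in one configuration object. The sketch below is a hypothetical container, with field names of our own choosing; only the numeric values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HGMAPConfig:
    """Hyper-parameters reported in Section 4.3 (field names are hypothetical)."""
    latent_dim: int = 200          # L, size of social/location representations
    sim_threshold: float = 0.125   # lambda, similarity lower bound in both GCNs
    geo_relevance: int = 60        # eta, geographical relevance level
    social_balance: float = 0.3    # beta, friend-circle vs. visiting similarity
    pref_dim: int = 36             # R, user preference dimension
    num_heads: int = 6             # attention heads
    batch_size: int = 256
    learning_rate: float = 0.001
    reg_gamma: float = 0.001       # gamma, regularization parameter
    gcn_hidden: int = 3000         # hidden width of each two-layer GCN
    mlp_hidden: int = 50           # D, hidden width of the MLP

    def gcn_layers(self, num_nodes):
        """Layer sizes [m or n, 3000, 200] of a two-layer GCN."""
        return [num_nodes, self.gcn_hidden, self.latent_dim]

    def mlp_layers(self, num_pois):
        """Layer sizes [200, 50, 200, n] of the 3-layer MLP."""
        return [self.latent_dim, self.mlp_hidden, self.latent_dim, num_pois]

cfg = HGMAPConfig()
```

For example, `cfg.gcn_layers(m)` and `cfg.mlp_layers(n)` return the layer lists for a dataset with m users and n POIs.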

4.4 Performance comparison (Q1)

Tables 3, 4 and 5 report the performance of HGMAP in comparison with the existing state-of-the-art POI recommendation models for top-K POI recommendation on Gowalla, Foursquare and Yelp, respectively. A paired t-test is performed and the results are statistically significant (p < 0.005). By scrutinizing the results, we can make the following observations:

  1. (O1):

    General MF-based models, such as WRMF and BPRMF, achieve poor performance on all three datasets, because they ignore context information, e.g., social influence and geographical constraints. Meanwhile, simply incorporating the geographical clustering phenomenon of check-ins (e.g., MGMMF) does not perform well, since it overlooks the fine-grained POI-level context. In contrast, geographical MF-based implicit ranking methods, such as IRenMF, GeoMF and RankGeoFM, perform relatively well, which indicates that modeling user check-ins as implicit feedback is more appropriate in POI recommendation and that geographical influence is the most important factor for POI recommendation.

  2. (O2):

    Compared to MF-based models, neural networks-based methods, including HGMAP, exhibit better performance. This demonstrates the importance of non-linear feature interactions between users and POI embeddings. In other words, the inner product in MF-based methods is insufficient to capture the complex interactions between users and POIs.

  3. (O3):

    Among the deep recommendation models, PACE does not perform as well as expected, because it only learns shallow embeddings of users and POIs, while the collaborative filtering signals are not fully exploited. Similarly, STGN, which mainly focuses on the sequential check-in behavior of users, does not show the expected performance. A possible reason is that STGN fails to explicitly explore the important interactions between users and POIs, as well as other user and POI contexts, e.g., social influence and POI-level neighboring information.

  4. (O4):

    Furthermore, SAE-NAD shows good performance on POI recommendation, mainly because it captures the non-linear interactions between users and POIs with deep autoencoder and attention mechanism. However, it ignores the social influence, as well as the high-order connectivity among POIs. In addition, two adversarial POI recommendation models, APOIR and Geo-ALM, generally achieve better performance than SAE-NAD, due to their high-quality negative sampling and capability of general user preference learning. The slight improvement of APOIR over Geo-ALM indicates the effectiveness of social influence modeling in APOIR.

  5. (O5):

    Our HGMAP consistently yields the best performance across all datasets. For example, HGMAP improves over the second best baseline w.r.t. R@10 by 7.2%, 3.8% and 10% on Gowalla, Foursquare and Yelp datasets, respectively. Compared to APOIR and SAE-NAD – two representative non-linear interaction learning methods – HGMAP explicitly models the POI adjacent graph by propagating the connectivity over the graph. Note that although SAE-NAD considers the POI distance, it neither learns high-order connectivity among POIs, nor does it incorporate the social influence. This result also demonstrates the effectiveness of our graph convolutions on both social graph and POI graph.

  6. (O6):

    Lastly, we note that NGCF does not perform well on POI recommendation, although it adopts the graph convolution for non-linear user-POI interaction learning. The performance gain of HGMAP over NGCF demonstrates the effectiveness of social influence learning in HGMAP. Moreover, our method does not learn the collaborative interactions via graph neural networks – which is the case of NGCF, but instead applies graph learning on social relationship and POI neighboring connection. This result also provides another perspective of incorporating graph neural networks into recommender systems. Due to the extremely sparse check-ins, the collaborative signal, arguably, cannot be effectively captured only by graph neural networks.

Table 3 Performance comparison between HGMAP and baselines on the Gowalla dataset
Table 4 Performance comparison between HGMAP and baselines on the Foursquare dataset
Table 5 Performance comparison between HGMAP and baselines on the Yelp dataset

4.5 Ablation study (Q2)

To investigate the impact of social influence and geographical constraints, we conducted an ablation study comparing HGMAP with three variants. In particular, the first variant HGMAP-I disables the graph convolutional network that models social influence (note that there is no social relationship information in the Foursquare data); the second variant HGMAP-II replaces the POI adjacent graph neural networks with a simple distance matrix, as used in [34]; the third variant HGMAP-III replaces the multi-head attention module in HGMAP with another GCN, which propagates the user interest over POIs in the user-POI interaction graph, similar to the GCN used in NGCF [50]. We summarize the experimental results in Table 6, from which we have the following findings:

  1. (F1):

    The discrepancy between HGMAP and HGMAP-I implies the effectiveness of social influence, which makes sense since social relationships play an important role in (POI) recommendation [31, 60], especially for cold-start users who have few or even no check-in records. This result also explains why deep recommendation methods such as PACE, STGN, Geo-ALM and NGCF do not perform well. Note that there exist many social graph learning models, such as DeepWalk and node2vec, that explore the local connectivity among nodes. However, these methods mainly focus on preserving the local structure and therefore ignore high-order connectivity among nodes.

  2. (F2):

    Compared to HGMAP-II, HGMAP yields remarkable improvements, which demonstrates the effectiveness of the proposed POI graph neural networks in HGMAP. It is commonly acknowledged that geographical influence is one of the most important factors in POI recommendation [22, 26, 31]. However, existing methods vary significantly in how they incorporate this constraint. Earlier efforts incorporated the geographical information into MF, which is limited by the linear interactions of the inner product, whereas recent deep learning-based methods either simply compute the POI distance [34] or model it as a reward function [71], neither of which is sufficient to capture the implicit connections and possible patterns among POIs. In contrast, HGMAP explicitly learns the relationship from the POI graph, which not only captures meaningful but unobserved check-in behavior of users, but also provides a way of augmenting the sampling data by propagating the information on the POI graph. In this vein, it can be considered as an LBSN data augmentation that alleviates the sparse check-in problem [70].

  3. (F3):

    Moreover, HGMAP-III does not show comparable performance even with another graph convolution on user-POI interactions. This result supports our conjecture that the sparse check-in problem in LBSN datasets renders the graph collaborative filtering method ill-suited for capturing user-POI interactions. The reason behind this phenomenon can be understood intuitively: aggregating the embeddings of the interactions between users and POIs is largely hindered for users with few check-ins or POIs with few visitors. Therefore, the collaborative signals are easily “blocked” for cold-start users and/or POIs during embedding propagation, which could be further aggravated by stacking multiple layers of graph convolutions on sparse check-in data.

Table 6 Ablation study of HGMAP

4.6 Sensitivity of parameters (Q3)

Now we investigate several important parameters of HGMAP, i.e., the number of attention heads h, the threshold λ on user and POI similarity in the GCNs, the geographical relevance level η and the balance weight β, as well as the convergence of HGMAP.

Effect of h:

HGMAP adopts a multi-head self-attention mechanism to capture multiple aspects of user-POI interactions. Figure 3a, b and c plot the influence of the number of heads, where we can observe that 4 or 6 heads are enough for our model to achieve good performance.

Fig. 3 Impact of number of heads h

Effect of λ:

Parameter λ specifies the lower bound for identifying similar users and POIs, below which the similarity between two users (or POIs) is set to 0; i.e., the lower the value, the more non-zero similarity scores, and therefore the more computation required by the model. Figure 4a, b and c show the effect of λ, which indicates that HGMAP attains the best performance when λ = 0.125. Note that this hyper-parameter could in principle be set separately for users and POIs. However, we found the difference to be very subtle in our experiments.
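
The role of λ can be illustrated with a small thresholding sketch. It assumes the similarity matrix is a dense list of rows and that entries strictly below λ are zeroed; the function names and the toy matrix are hypothetical.

```python
def threshold_similarity(sim, lam=0.125):
    """Set every similarity score below the lower bound lam to 0."""
    return [[s if s >= lam else 0.0 for s in row] for row in sim]

def count_nonzero(matrix):
    """Number of non-zero entries, a rough proxy for graph-convolution cost."""
    return sum(1 for row in matrix for s in row if s != 0.0)

# Toy 3 x 3 similarity matrix between users (or POIs).
sim = [[1.0, 0.1, 0.2],
       [0.1, 1.0, 0.125],
       [0.2, 0.125, 1.0]]
sparse_sim = threshold_similarity(sim, lam=0.125)
dense_sim = threshold_similarity(sim, lam=0.05)
```

Lowering λ from 0.125 to 0.05 keeps the two 0.1 entries, so the resulting graph has more edges and each propagation step touches more neighbors.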

Fig. 4 Impact of similarity lower bound value λ

Effect of η:

Parameter η controls the geographical relevance level between POIs in GCNlocation, which is used to jointly capture both POIs’ location similarity and POIs’ geographic location representation. Figure 5a, b and c show the impact of η on the three datasets, which indicates that HGMAP attains the best performance when η is within the range [60, 80].

Fig. 5 Impact of geographical relevance level control value η

Effect of β:

Parameter β balances the relative weight of friend circle similarity and user visiting similarity in GCNsocial. Figure 6a and b reveal the influence of β on model performance. HGMAP achieves the best performance when β = 0.3, which suggests that the visiting similarity of users has a higher influence than friend circle similarity for modeling user representation. Note that the Foursquare dataset has no user social information.
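
One simple way to picture this trade-off is a convex combination of the two similarity matrices. The sketch below is an assumption made for illustration: the exact form in which β enters GCNsocial is defined earlier in the paper and may differ. Here β weights friend circle similarity and 1 − β weights visiting similarity, which is consistent with visiting similarity dominating at β = 0.3. The function name and toy values are hypothetical.

```python
def combine_similarity(friend_sim, visit_sim, beta=0.3):
    """Hypothetical convex combination of two user-user similarity matrices.

    beta weights friend-circle similarity; (1 - beta) weights visiting similarity.
    """
    return [[beta * f + (1.0 - beta) * v for f, v in zip(f_row, v_row)]
            for f_row, v_row in zip(friend_sim, visit_sim)]

# Toy case: one pair of users similar only by friends, another only by visits.
friend_sim = [[1.0, 0.0]]
visit_sim = [[0.0, 1.0]]
combined = combine_similarity(friend_sim, visit_sim, beta=0.3)
```

Under this reading, the pair that is similar only through visiting behavior ends up with the larger combined score.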

Fig. 6 Impact of relative weight value β between user social similarity and visiting similarity

Convergence:

Another merit of HGMAP is its high computational efficiency. HGMAP consists of three main components, i.e., two GCNs for social influence and geographical influence learning, and one multi-head attention encoder for user-POI interaction learning. The two GCNs have only 2-layer convolutions without a non-linear transformation in the first layer, which improves computational efficiency. In addition, HGMAP contains a 3-layer MLP, which also converges quickly. Figure 7 illustrates the training of HGMAP, which indicates that our model converges quickly to its best performance. For example, it achieves the best performance on Precision and MAP after around 40 epochs.

Fig. 7 Convergence of HGMAP

4.7 Interpretability (Q4)

To better understand HGMAP, we visualize the user and POI embeddings learned by HGMAP using t-SNE [36]. Figure 8 plots the 2D visualization of the representations learned on Yelp, Foursquare, and Gowalla. The closeness of users and POIs is well reflected in the learned representation space, and users (POIs) of the same type are usually mapped to nearby positions in the two-dimensional space. Each point denotes a user in Fig. 8a, c and e, and a POI in Fig. 8b, d and f. Figure 8a, c and e show that the embeddings of users are well clustered, meaning that our model can distinguish users. Each color represents a type of users who have a similar circle of friends and similar visiting records. In other words, users do exhibit certain discernible patterns in their POI check-ins, which HGMAP aims to capture. Similarly, we observe that the proximity of POI embeddings corresponds well with the similarity of user check-ins. In the same fashion, each color in Fig. 8b, d and f denotes a type of POIs that have similar geographical positions. This means that a POI presented to a user was relevant enough for that user to check in, so that HGMAP can retrieve it later, which benefits the accuracy of HGMAP’s recommendations.

Fig. 8 Visualization of the learned user and POI representation on the Yelp, Foursquare and Gowalla datasets

5 Conclusion

In this study, we present a novel hybrid graph-based model, HGMAP, for POI recommendation, which consists of two graph neural networks and one multi-head attention encoder. Instead of only modeling user-item (POI) interactions as previous works do, we exploit graph neural networks to capture auxiliary information including social influence and geographical constraints. A POI adjacent graph is constructed to capture the implicit user mobility patterns by propagating the check-in embeddings on the POI graph. The experimental results on three real-world datasets demonstrate that the proposed model outperforms the state-of-the-art baselines, and that the latent space learned from both user and POI embedding propagation reflects discernible clustering patterns. This, in turn, suggests that training and optimizing recommendation models with graph-based auxiliary information learning is a promising direction, especially for sparse data and cold-start users (items).

One immediate direction for future work is to incorporate other auxiliary information for better POI recommendation, such as temporal features, POI categories and sequential check-in behavior. An important question that we plan to tackle is the shallow-architecture issue of graph neural networks caused by the vanishing gradient problem when stacking multiple layers. We also plan to investigate methods for coping with sparse user-POI interactions by leveraging deep generative models [21, 25, 27] to discover underlying non-linear user-POI interactions while improving the recommendation performance.