1 Introduction

The rapid development of mobile technologies and location-based social networks (LBSNs) has created a growing demand for personalized point-of-interest (POI) recommendations [1,2,3]. As individuals navigate urban environments and go about their daily activities, systems that provide personalized POI recommendations have become increasingly important. These tailored suggestions not only enhance the user experience but also help attract customers to local businesses. Within the context of smart cities, next POI recommendation systems integrate into the city’s digital infrastructure. They offer residents a more intuitive and personalized urban navigation experience, and they change the way users engage with their surroundings.

Next POI recommendation predicts the subsequent location a user is likely to visit based on their past check-ins and visited POIs. It considers both the geographical attributes of the POIs and the temporal patterns of user movements, aiming to offer timely, context-aware suggestions for location-based services. It’s crucial to differentiate between “next POI recommendation” and general “POI recommendation.” While both suggest locations based on past check-ins, next POI recommendation focuses on predicting the immediate next location: it views check-ins as a continuous sequence, captures dynamic user preferences, and accounts for geographical factors. In contrast, traditional POI recommendation focuses on a user’s overall historical preferences and gives less weight to sequence, spatial context, and time sensitivity.

Next POI recommendation has distinct characteristics that pose specific challenges. At its core, it interprets user check-ins as a continuous sequence and predicts the user’s immediate next visit. However, the task extends beyond simple sequential forecasting. It places a strong emphasis on spatiotemporal dynamics, recognizing that user preferences are fluid and evolve over time. Capturing these dynamic shifts in user preferences is paramount, and geographical context strongly influences the recommendation process. From the user’s perspective, a recommendation system must integrate both immediate actions and overarching behaviors. This integration is further complicated by user biases in data recording and the scarcity of data for certain users. Additionally, POIs exhibit intricate geographical interrelations, and some POIs show strong temporal associations. In subsequent sections, we examine these characteristics and challenges in detail.

Historically, next POI recommendation primarily relied on techniques such as matrix factorization [4], Markov chains [5], and deep learning methods like recurrent neural networks (RNNs) [1]. While these methods are effective to a certain extent, they tend to emphasize local spatiotemporal relationships and often overlook broader global spatiotemporal contexts. For instance, they mainly depend on users’ individual historical visitation data, potentially neglecting global trajectory patterns that do not appear in a user’s own records, and they struggle to capture high-order geographic relationships between POIs. In contrast, graph neural networks (GNNs), a more recent technology, can capture the high-order connectivity between POIs and integrate diverse information. Leveraging these capabilities, GNNs can model both local and global spatiotemporal relationships, making them a promising solution for next POI recommendation.

In this survey, we strive to provide a comprehensive overview of the advancements in GNN-based next POI recommendation systems. Section 2 introduces the background of smart cities, next POI recommendation, and GNNs: we start with the concept of smart cities, then categorize the distinct characteristics and analyze the challenges of next POI recommendation, and finally describe the GNN variants used in this context. Section 3 discusses GNN-based methods for next POI recommendation from two perspectives: graph construction, and the GNN-based approaches that address different characteristics of the task. Section 4 introduces the commonly used datasets and evaluation metrics for next POI recommendation. Future research directions are discussed in Sect. 5, and we conclude the paper in Sect. 6. Compared to existing surveys on next POI recommendation [6, 7], ours is the first to offer insights from a GNN-based perspective.

2 Background

This section is organized around three interconnected themes, each highlighting a critical element of next POI recommendation in the context of smart cities. It begins by examining the transformative impact of smart cities, where digital innovations are leveraged to improve urban life, and positions next POI recommendation as an essential feature of this ecosystem. The discussion then turns to next POI recommendation itself, outlining its definition, reviewing early research efforts, and analyzing its characteristics. The final part covers the GNN variants used in next POI recommendation.

2.1 Smart cities

A smart city is an urban area that uses different types of electronic methods and sensors to collect data, gain insights, and manage assets, resources, and services efficiently; that data is in turn used to improve operations across the city [8]. This includes data collected from citizens, devices, and buildings, which is processed and analyzed to monitor and manage traffic and transportation systems, support crime detection, run information systems, and deliver other community services.

A key component of smart cities is the development of Intelligent Environments. These environments leverage advanced technologies to enhance the interaction between residents and the urban infrastructure. For instance, Internet of Things (IoT) technology enables data to flow across various systems, making city operations more efficient [9, 10]. The Internet of Medical Things (IoMT) further enhances data collection, providing valuable health insights [11]. Advanced human sensing techniques, such as 3D point cloud analysis, help create more responsive and adaptive environments [12]. Moreover, methods like the Fuzzy Cognitive Network Process are used for software reliability and quality measurement in smart city applications, ensuring that the underlying systems are robust and dependable [13].

The concept of a smart city is intrinsically linked to improving the quality of life for its residents [8]. One way to achieve this is to provide personalized services and recommendations based on the vast amount of data collected, and next POI recommendation is one such service. As residents of a smart city move about and interact with various city services, and as IoT infrastructure advances, data about their preferences, behaviors, and routines is collected [14]. When processed using advanced models like GNNs, this data can help predict the next place a resident might want to visit or the next service they might require. For instance, if a person frequently visits a particular coffee shop after work, a next POI recommendation system might suggest trying a new dessert place nearby. Such recommendations not only enhance the user’s experience in the city but also support efficient city planning: by understanding where residents are likely to go next, city planners can optimize public transport routes, manage crowds, or plan urban development projects more effectively.

2.2 Next POI recommendation

This section presents the problem definition of next POI recommendation and reviews early research in this field; the characteristics of the task are discussed in Sect. 2.3.

2.2.1 Problem definition

Next POI recommendation refers to the task of predicting the next possible location or place a user is likely to visit based on their historical check-in sequences or past visited POIs [1,2,3]. This task takes into account both the geographic information of the POIs and the temporal patterns of the user’s movements. The goal is to provide timely and context-aware recommendations to users, enhancing their experience in location-based services.

It’s crucial to understand that “Next POI Recommendation” and “POI Recommendation” are distinct concepts. While both involve suggesting POIs based on a user’s historical check-ins, “Next POI Recommendation” specifically aims to predict the exact location a user is likely to visit next. This method treats user check-ins as a continuous sequence, emphasizing sequential dependency, i.e., the high correlation between successive locations visited by a user. It also accounts for temporal dependency by considering different check-in preferences at different times, such as day and night. Spatial dependency is another key focus, recognizing that users generally prefer visiting locations close to them. Together, these aspects help capture dynamic user preferences and underscore geographical influences as crucial in the recommendation process [15,16,17]. In contrast, traditional POI recommendation predominantly relies on a user’s broader historical preferences, placing less emphasis on sequentiality, spatial proximity, and temporal sensitivity.

2.2.2 Early research in next POI recommendations

Next POI recommendation aims to suggest the subsequent POI a user might visit, based on historical check-in sequences that carry geographic information and loose temporal constraints. Traditional methods for this task predominantly include matrix factorization and Markov chain-based techniques, such as FPMC [18] and FPMC-LR [19]. Matrix factorization [4] decomposes the user–POI interaction matrix into two low-rank matrices, which can be roughly interpreted as the latent representations of users and items, respectively. However, this approach neglects the temporal relationships among check-ins, making it difficult to accurately recommend the next POI. The Markov chain approach [5], on the other hand, captures users’ sequential mobility patterns: it uses transition probabilities estimated from past check-ins to predict the next POI a user might visit, placing strong emphasis on patterns within the sequence. To improve recommendation accuracy, researchers have incorporated various factors into these techniques, such as social dynamics, time-based attributes [20], and geographical considerations [19]. While these methods are good at extracting latent features from user–POI interactions and predicting preferences, their linear combinations may not fully capture the high-order sequential patterns in user–POI engagements.
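The Markov chain idea of estimating transition probabilities from past check-ins can be illustrated with a minimal first-order chain. This sketch assumes each user’s check-ins are given as a time-ordered list of POI IDs (the IDs below are hypothetical); it is not FPMC or FPMC-LR, which additionally factorize user-specific transitions and, in FPMC-LR, add a localized-region constraint.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate first-order transition probabilities P(next | current) from check-in sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[cur] = {nxt: c / total for nxt, c in nxt_counts.items()}
    return probs


def recommend_next(probs, current, k=1):
    """Return the k POIs with the highest estimated transition probability from `current`."""
    candidates = probs.get(current, {})
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(candidates, key=lambda p: (-candidates[p], p))[:k]


# Hypothetical toy check-in sequences (one list per user session).
sequences = [
    ["home", "cafe", "office"],
    ["home", "cafe", "gym"],
    ["cafe", "office", "bar"],
]
probs = fit_transitions(sequences)
print(recommend_next(probs, "cafe"))  # ['office'] (2 of the 3 transitions out of "cafe")
```

Because the prediction depends only on the current POI, such a chain ignores longer-range and high-order patterns, which is the limitation noted above.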

The advent of deep learning has brought new strategies for uncovering nuanced, non-linear relationships between users and POIs. RNNs have emerged as a powerful tool for modeling complex dependencies within long sequences and play a pivotal role in capturing the chronological patterns of user check-ins, which made them the go-to approach for next POI recommendation. STRNN [1] combines RNN units with spatiotemporal matrices, where the distance and time matrices are parameterized according to the specific spatial and temporal distances between successive POIs in check-in sequences. Long Short-Term Memory networks (LSTMs), an advanced RNN variant, address the long-term dependency problem and enable finer modeling of users’ long-term and short-term preferences. LSTPM [2] builds on LSTMs to bridge users’ long-term and short-term preferences, using a context-aware network architecture to probe the temporal and spatial interplay between preceding and current trajectories. STGN [21] extends the standard LSTM with two time gates and two distance gates to capture spatiotemporal information in both short-term and long-term sequences. HST-LSTM [3], on the other hand, embeds the spatiotemporal intervals between successive visits into an LSTM with a hierarchical structure.
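Several of the RNN-based models above (STRNN, STGN, HST-LSTM) consume the time and distance intervals between successive check-ins. A minimal sketch of computing those intervals follows, assuming each check-in is a (timestamp, latitude, longitude) tuple sorted by time and using the haversine great-circle distance; the coordinates are hypothetical, and how each model turns these intervals into gates or transition matrices differs and is not shown.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def intervals(checkins):
    """Return (hours elapsed, km travelled) for each pair of successive check-ins.

    Each check-in is a (timestamp, lat, lon) tuple, assumed to be sorted by time.
    """
    out = []
    for (t1, la1, lo1), (t2, la2, lo2) in zip(checkins, checkins[1:]):
        dt_hours = (t2 - t1).total_seconds() / 3600.0
        out.append((dt_hours, haversine_km(la1, lo1, la2, lo2)))
    return out


# Hypothetical trajectory: out in the morning, back to the first POI in the evening.
trajectory = [
    (datetime(2024, 5, 1, 8, 0), 40.7580, -73.9855),
    (datetime(2024, 5, 1, 9, 30), 40.7527, -73.9772),
    (datetime(2024, 5, 1, 18, 0), 40.7580, -73.9855),
]
for dt, dd in intervals(trajectory):
    print(f"{dt:.1f} h, {dd:.2f} km")
```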

The studies above primarily treat next POI recommendation as a sequential recommendation task. While they account for temporal and spatial influences on user choices, they may still oversimplify spatial contextual information. As a result, they may not fully exploit the spatial structure and can struggle to capture the high-order geographical and heterogeneous relationships between POIs introduced in Sect. 2.3.2.

2.3 Characteristics in next POI recommendation

Next POI recommendation treats user check-ins as a continuous sequence, aiming to suggest the specific location a user is likely to visit next. However, the complexity of the next POI recommendation goes beyond mere sequence prediction. It places a heightened emphasis on the spatiotemporal relationship, recognizing that user needs and preferences evolve. Capturing these dynamic user preferences is crucial, with geographical influences emerging as key determinants in the recommendation process. From the user’s perspective, there’s a need to balance and integrate both short-term and long-term preferences, while also addressing data challenges such as user biases in recording data and limited data availability for some users. Regarding the POIs themselves, there exists a high-order geographical interrelation between them, and certain types of POIs exhibit strong temporal associations. In the following sections, we will categorize and delve deeper into the distinct features. The characteristics of the next POI recommendation task can be found in Table 1.

Table 1 Characteristics of the next POI recommendation

2.3.1 Inter-user preference

“Inter-User Preference” refers to the relationships and similarities between the preferences of different users [22]. Users of next POI recommendation systems tend to explore new locations they have not visited before, which makes it hard to predict their next move from their own historical trajectories alone. Because users are curious and may seek novel experiences, relying only on their past behavior can limit the accuracy and relevance of recommendations [24, 38]. Furthermore, inter-user relationships come into play: users may be influenced to explore locations based on the visits or preferences of other users [22]. Addressing both this exploration bias and these inter-user relationships is crucial for making next POI recommendation systems more adaptable and effective.

2.3.2 POIs complex interactions and dependencies

The high-order geographical and heterogeneous relationships inherent to POIs present both a distinctive feature and a challenge for the next POI recommendation systems [15, 25]. High-order dependencies reveal complex patterns of user behavior that transcend simple geographic proximity, reflecting preferences that can link disparate POIs through common themes or user interests. For example, a user may consistently choose venues that host live music, regardless of their location, indicating a high-order connection based on entertainment value rather than distance. On the other hand, heterogeneous relationships underscore the varied attributes of POIs, such as a local library’s quiet ambiance contrasted with a nearby bustling marketplace. These relationships add layers of complexity to the recommendation process, as systems must discern and predict user preferences that are influenced by a rich tapestry of factors, from cultural interests to activity types. Addressing these multifaceted dependencies is crucial for developing sophisticated recommendation systems that can navigate the intricate web of user–POI interactions and offer truly personalized suggestions.

2.3.3 Implicit feedback

Users exhibit implicit feedback issues concerning POI data [29]. “Implicit feedback” refers to indirect signals from users, such as browsing history and visit frequency, as opposed to explicit preferences such as ratings. It is a crucial component of POI data, because users are often unwilling or lack the time to provide direct feedback in the form of ratings and reviews. Implicit feedback primarily captures positive interactions, i.e., the fact that a user visited a particular POI, so there is no negative feedback indicating that a user dislikes a specific POI. This makes user preferences hard to capture, as the data consists only of check-in actions without any indication of preference level, which complicates modeling user–POI interactions and can lead to inaccurate recommendations. Implicit feedback also introduces noise into the data. For instance, a user might visit a POI by chance rather than because they are genuinely interested in it. Other factors can contribute to the noise as well; for example, users might deliberately manipulate their check-in records for personal reasons.
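To make the implicit-feedback problem concrete, the sketch below turns a raw check-in log into the binary user–POI matrix that many models train on. It assumes the log is a list of (user, POI) pairs with hypothetical names. A 0 entry only means that no visit was recorded, so it cannot be read as a negative preference.

```python
from collections import Counter


def implicit_matrix(checkins, users, pois):
    """Binarize a check-in log into a user x POI implicit-feedback matrix.

    1 means "visited at least once"; 0 means "no observed visit", which may reflect
    dislike, unawareness, or an unrecorded visit, not a confirmed negative.
    """
    visited = set(checkins)
    return [[1 if (u, p) in visited else 0 for p in pois] for u in users]


def visit_counts(checkins):
    """Visit frequency per (user, POI) pair: a weak and possibly noisy confidence signal."""
    return Counter(checkins)


# Hypothetical toy log of (user, POI) check-ins.
log = [("alice", "cafe"), ("alice", "cafe"), ("alice", "park"), ("bob", "cafe")]
users, pois = ["alice", "bob"], ["cafe", "park", "museum"]
print(implicit_matrix(log, users, pois))  # [[1, 1, 0], [1, 0, 0]]
print(visit_counts(log)[("alice", "cafe")])  # 2
```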

2.3.4 Sparsity of raw relational data

Data sparsity in next POI recommendation refers to a significant lack of data points in a dataset due to incomplete or infrequent user actions. Users have the option to “check in” at specific locations to indicate their presence. However, not every user will check in every time they visit a place. Some might forget, some might not want to share their location due to privacy concerns, and others might not use the application frequently. Additionally, to know and rate a POI, users must physically visit it, which is more costly than rating a movie online [39]. Even when users visit a POI, they often do not check in due to privacy or safety concerns. As a result, only a fraction of a user’s actual visits is recorded. Compared to other recommendation tasks, check-in data in POI recommendation is notably sparser: the density of check-in data is typically around 0.1% [40], whereas for movie recommendation, as on Netflix, the density is around 1.2% [41]. This sparsity poses a significant challenge for next POI recommendation [30, 37]. Sparse data results in short sequences, making it difficult to capture a user’s sequential patterns. Moreover, POI data tends to be binary implicit feedback rather than explicit ratings, so the regularity of user behavior is harder to discern and leverage.
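Density here is the share of the user × POI matrix that contains at least one check-in. A small sketch computing it follows, assuming interactions are (user, POI) pairs; the toy log and the hypothetical 1,000-user, 5,000-POI example are illustrative and are not taken from [40] or [41].

```python
def density_percent(pairs, num_users, num_pois):
    """Share (in %) of the user x POI matrix that holds at least one check-in."""
    observed = len(set(pairs))  # repeated visits to the same POI count only once
    return 100.0 * observed / (num_users * num_pois)


# Hypothetical toy log: 4 users, 10 POIs, 3 distinct (user, POI) pairs.
toy = [("u1", "p1"), ("u1", "p1"), ("u2", "p3"), ("u4", "p9")]
print(density_percent(toy, 4, 10))  # 7.5

# Hypothetical dataset: 1,000 users, 5,000 POIs, 5,000 distinct pairs -> 0.1 %.
pairs = [(i % 1000, i) for i in range(5000)]
print(density_percent(pairs, 1000, 5000))
```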

2.3.5 Cold start

The “cold-start” problem refers to the challenge of making accurate recommendations for new or infrequently visited locations, or for new users with limited interaction history. It arises because the recommendation model cannot gather sufficient knowledge about these users or POIs, resulting in poor prediction performance. Without enough historical interaction data, traditional recommendation algorithms struggle to predict the preferences and future interests of new users or the appeal of new or niche POIs, which makes recommendations less personalized and less effective. The cold-start dilemma is a significant hurdle in the next POI recommendation domain [23, 29], and addressing it is pivotal for effectively onboarding and engaging new users.

2.3.6 Temporal sensitivity of POI categories

Temporal sensitivity refers to the pattern observed in the frequency or likelihood of check-ins at a particular POI depending on the time of day or other temporal factors. In location-based social networks, certain POIs exhibit strong temporal patterns in user check-ins [23, 33]: the frequency with which users check into them can vary significantly with the time of day, the day of the week, or even the season. For instance, train stations might see a surge in check-ins during rush hours when people commute to and from work, while check-in frequency late at night or early in the morning might be much lower. This pattern indicates a strong temporal correlation for the train station category. Recommendation systems that do not account for temporal sensitivity can make irrelevant suggestions, such as recommending a bar during working hours. Incorporating temporal sensitivity allows for more accurate and contextually appropriate recommendations that align with the user’s current temporal context.
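One simple way to expose the temporal sensitivity of a category is its hour-of-day check-in distribution. The sketch below assumes a log of (category, hour) pairs; the train-station and bar data are synthetic, chosen to mirror the examples in the text.

```python
from collections import Counter, defaultdict


def hourly_profile(checkins):
    """Normalized hour-of-day check-in distribution for each POI category.

    `checkins` is an iterable of (category, hour) pairs with hour in 0..23.
    """
    counts = defaultdict(Counter)
    for category, hour in checkins:
        counts[category][hour] += 1
    profiles = {}
    for category, by_hour in counts.items():
        total = sum(by_hour.values())
        profiles[category] = [by_hour[h] / total for h in range(24)]
    return profiles


def peak_hour(profile):
    """Hour with the highest share of check-ins (earliest hour wins ties)."""
    return max(range(24), key=lambda h: profile[h])


# Synthetic (hypothetical) log: commuter peaks for stations, late evenings for bars.
log = (
    [("train_station", 8)] * 5
    + [("train_station", 18)] * 4
    + [("train_station", 3)]
    + [("bar", 22)] * 3
    + [("bar", 23)] * 2
)
profiles = hourly_profile(log)
print(peak_hour(profiles["train_station"]))  # 8
print(peak_hour(profiles["bar"]))  # 22
```

A recommender could, for example, down-weight candidates whose category has a near-zero share at the current hour, which avoids suggestions like a bar during working hours.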

2.3.7 User dynamic preferences

GE [39] posits that the dynamic preferences of users are the most critical challenge in the next POI recommendation problem. Location-based recommendations need to rely on users’ current preferences and spatiotemporal context to make effective recommendations. Users exhibit dynamic preferences, which can be primarily characterized by three aspects: the fluidity of preferences, the interplay between short-term and long-term inclinations, and the influence of time and context.

  • Fluidity of Preferences: This denotes that a user’s affinity towards a POI is not static but evolves over time and with experiences. For instance, a user might frequently visit a specific cafe over a period, but as time progresses, they might shift their preference to other venues or cease visiting altogether.

  • Interplay of Short-term and Long-term Preferences: Users’ behaviors are often influenced by a combination of their immediate inclinations and enduring interests. Short-term preferences might arise from recent experiences or current emotions, while long-term preferences encapsulate a user’s consistent interests and habits. For example, a user might have a long-standing preference for outdoor activities, but on a particular weekend, influenced by a movie advertisement, they might opt to visit a cinema.

  • Influence of Time and Context [37]: Users’ preferences can vary based on different times and settings. For instance, a user might favor fast-food outlets on weekdays but lean towards upscale restaurants on weekends. Similarly, their choice of activities might be indoors during the winter and outdoors during the summer.
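To illustrate the interplay of short-term and long-term preferences, the sketch below blends a long-term profile (the mean embedding of the whole history) with a short-term one (the mean of the last k visits). The two-dimensional POI embeddings, the window k, and the mixing weight alpha are all hypothetical; the surveyed models learn such representations and their combination rather than fixing them by hand.

```python
def mean_vec(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def user_preference(history, emb, k=2, alpha=0.5):
    """Blend a short-term profile (last k visits) with a long-term one (whole history).

    alpha weights the short-term part; k and alpha are hypothetical, hand-set values.
    """
    long_term = mean_vec([emb[p] for p in history])
    short_term = mean_vec([emb[p] for p in history[-k:]])
    return [alpha * st + (1 - alpha) * lt for st, lt in zip(short_term, long_term)]


def rank(pref, emb, candidates):
    """Order candidate POIs by dot-product affinity with the preference vector."""
    def score(p):
        return sum(a * b for a, b in zip(pref, emb[p]))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical 2-D embeddings: axis 0 ~ "outdoor", axis 1 ~ "indoor entertainment".
emb = {
    "park": [1.0, 0.0], "trail": [0.9, 0.1], "lake": [0.8, 0.2],
    "cinema": [0.0, 1.0], "arcade": [0.1, 0.9], "theater": [0.2, 0.8],
}
history = ["park", "trail", "park", "trail", "cinema", "arcade"]  # recent visits are indoor
for alpha in (0.0, 1.0):
    print(alpha, rank(user_preference(history, emb, alpha=alpha), emb, ["lake", "theater"]))
```

With alpha = 0 the long-standing outdoor preference dominates and the lake ranks first; with alpha = 1 the recent indoor visits dominate and the theater ranks first.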

2.4 Graph neural network

Graph structures have become a prevalent framework for representing entities and their interconnections. With advances in deep learning and the surge in graph data, GNNs [42,43,44] have gained significant popularity and success. Unlike traditional convolutional neural networks (CNNs) [45], which operate on regular grids, GNNs can handle non-Euclidean structures, process the topology of graphs, and learn high-dimensional node representations. They have demonstrated strong performance in various graph-related tasks. Their ability to handle spatiotemporal patterns, capture the intricate relationships between users and POIs, and learn high-level representations has led to extensive research on GNNs for next POI recommendation.

2.4.1 Different variants of GNNs in next POI recommendation

Given graph data, the core idea of a GNN is to iteratively aggregate feature information from neighbors and combine this aggregated information with the current representation of the central node during propagation. From an architectural perspective, a GNN consists of multiple propagation layers, each composed of an aggregation and an update operation. The propagation can be formulated as:

$$\begin{aligned} h_u^{(l)} = \text {Updater}^{(l)} \left( h_u^{(l-1)}, \text {Aggregator}^{(l)} \left( \{ h_v^{(l-1)} \mid v \in \text {Neighbors}(u) \} \right) \right) \end{aligned}$$

Here, \( h_u^{(l)} \) represents the representation of node \( u \) at the \( l^{th} \) layer. The functions \( \text {Aggregator}^{(l)} \) and \( \text {Updater}^{(l)} \) correspond to the aggregation and update operations at the \( l^{th} \) layer, respectively. In this section, we will briefly introduce GCN [42], GAT [44], and GraphSAGE [43], three mainstream GNNs:
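The generic propagation rule can be written as a function that takes the aggregator and updater as parameters. In this sketch, the mean aggregator and the averaging updater are illustrative choices rather than those of any particular model, and the graph is a hypothetical toy; stacking L calls lets each node receive information from nodes up to L hops away.

```python
def propagate(h, neighbors, aggregator, updater):
    """One layer: h_u <- Updater(h_u, Aggregator({h_v : v in Neighbors(u)}))."""
    return {u: updater(h[u], aggregator([h[v] for v in neighbors[u]])) for u in h}


def mean_aggregator(vectors):
    """Element-wise mean of the neighbour vectors (None for an isolated node)."""
    if not vectors:
        return None
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def average_updater(h_self, h_agg):
    """Average the node's own vector with the aggregated message."""
    if h_agg is None:
        return list(h_self)
    return [(a + b) / 2 for a, b in zip(h_self, h_agg)]


# Hypothetical toy graph a - b - c with 2-D features.
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
h1 = propagate(h0, nbrs, mean_aggregator, average_updater)
h2 = propagate(h1, nbrs, mean_aggregator, average_updater)  # a now reflects c (two hops)
print(h1["a"], h1["b"], h1["c"])  # [0.5, 0.5] [0.5, 0.75] [0.5, 1.0]
```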

2.4.2 GCN

GCN [42] is one of the pioneering works in the domain of GNNs. It aims to generalize the traditional convolution operation from regular grids, like images, to irregular graphs. In GCN, the feature representation of a node is updated by aggregating information from its neighbors and itself. The aggregation is typically a weighted average, with weights determined by the node’s degree. The formula for a single layer of GCN is:

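$$\begin{aligned} h_u^{(l)} = \sigma \left( \sum _{v \in \text {Neighbors}(u) \cup \{u\}} \frac{1}{\sqrt{\hat{d}_u \hat{d}_v}} W^{(l)} h_v^{(l-1)} \right) \end{aligned}$$

where \( \sigma \) is a non-linear activation function, \( W^{(l)} \) is a learnable weight matrix for the \( l^{th} \) layer, and \( \hat{d}_u = |\text {Neighbors}(u)| + 1 \) is the degree of node \( u \) including its self-loop, so that each neighbor’s contribution is scaled by the degrees of both endpoints.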
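A dependency-free sketch of the degree-normalized GCN layer above follows. It assumes an undirected graph given as index pairs, weights passed as plain lists, and ReLU as the activation; the path graph is a hypothetical toy, and a practical implementation would use sparse matrix operations instead of Python loops.

```python
import math


def relu(x):
    return max(0.0, x)


def gcn_layer(H, edges, W, activation=relu):
    """One GCN layer: h_u = sigma(sum over v in N(u) plus u of W h_v / sqrt(d_u * d_v)).

    H: node features as a list of vectors; edges: undirected (u, v) index pairs;
    W: weight matrix as a list of rows (f_out x f_in). Degrees count the self-loop.
    """
    n = len(H)
    nbrs = [{u} for u in range(n)]  # every node is its own neighbour (self-loop)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = [len(s) for s in nbrs]
    # Transform every node once: WH[v] = W h_v
    WH = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in H]
    out = []
    for u in range(n):
        acc = [0.0] * len(W)
        for v in nbrs[u]:
            coef = 1.0 / math.sqrt(deg[u] * deg[v])  # symmetric degree normalization
            acc = [a + coef * m for a, m in zip(acc, WH[v])]
        out.append([activation(a) for a in acc])
    return out


# Hypothetical path graph 0 - 1 - 2 with one-dimensional features and W = identity.
H = [[1.0], [0.0], [1.0]]
out = gcn_layer(H, edges=[(0, 1), (1, 2)], W=[[1.0]])
print(out)  # nodes 0 and 2: 0.5; node 1: 2 / sqrt(6), roughly 0.816
```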

2.4.3 GAT

GAT [44] introduces attention mechanisms to the graph domain, allowing nodes to assign different importance weights to their neighbors. This is particularly useful for graphs where the relationship strength between nodes varies. In GAT, the attention coefficients between two nodes are computed using a shared attention mechanism, which is then used to weight the feature aggregation. The key formula for the attention mechanism in GAT is:

$$\begin{aligned} \alpha _{uv} = \text {softmax}_u \left( \text {LeakyReLU}\left( a^T [W h_u; W h_v] \right) \right) \end{aligned}$$

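where \( a \) is a shared attention vector, \( W \) is a shared weight matrix, and \( [W h_u; W h_v] \) denotes the concatenation of the two transformed feature vectors. The subscript of the softmax indicates that normalization runs over all \( v \in \text {Neighbors}(u) \).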

The softmax function normalizes the attention coefficients so that they sum to 1 across all neighbors of a node, giving the attention scores a probabilistic interpretation. LeakyReLU is an activation function that keeps a small, positive slope for negative inputs, so, unlike a standard ReLU, it does not zero out the gradient there.
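The attention computation and the attention-weighted aggregation can be sketched as follows. The feature vectors, the identity W, and the attention vector a are hypothetical toy values, the LeakyReLU slope of 0.2 is an assumed setting, tanh stands in for the activation, and including node 0 in its own neighbour list is a choice of this sketch.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_attention(h, u, neighbors, W, a):
    """alpha_uv = softmax over v of LeakyReLU(a^T [W h_u ; W h_v])."""
    wh_u = matvec(W, h[u])
    scores = []
    for v in neighbors:
        concat = wh_u + matvec(W, h[v])  # list concatenation = [W h_u ; W h_v]
        scores.append(leaky_relu(sum(x * y for x, y in zip(a, concat))))
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return {v: e / z for v, e in zip(neighbors, exps)}


def gat_update(h, u, neighbors, W, a, activation=math.tanh):
    """Attention-weighted aggregation: sigma(sum_v alpha_uv W h_v)."""
    alpha = gat_attention(h, u, neighbors, W, a)
    out = [0.0] * len(W)
    for v in neighbors:
        out = [o + alpha[v] * x for o, x in zip(out, matvec(W, h[v]))]
    return [activation(o) for o in out]


# Hypothetical toy inputs.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]  # identity transform, for readability
a = [0.0, 0.0, 1.0, 1.0]  # scores depend only on the neighbour half of the concatenation
alpha = gat_attention(h, 0, [0, 1, 2], W, a)
print(alpha)  # node 2 gets the largest weight; the three weights sum to 1
print(gat_update(h, 0, [0, 1, 2], W, a))
```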

2.4.4 GraphSAGE

GraphSAGE [43] is designed to generate embeddings by sampling and aggregating features from a node’s neighbors. Unlike GCN and GAT, which consider all neighbors, GraphSAGE samples a fixed-size set of neighbors at each depth level, making it more scalable for large graphs. The method introduces several aggregation functions, such as mean, LSTM, and pooling, to combine the features from the node’s neighbors. The general aggregation function can be expressed as:

$$\begin{aligned} h_u^{(l)} = \sigma \left( W \cdot \text {CONCAT}\left( h_u^{(l-1)}, \text {Aggregator}^{(l)}\left( \{ h_v^{(l-1)} \mid v \in \text {Sample}(u) \} \right) \right) \right) \end{aligned}$$

where \( \text {Sample}(u) \) is a function that samples a fixed number of neighbors for node \( u \), and \( \text {Aggregator}^{(l)} \) is an aggregation function at the \( l^{th} \) layer.
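A simplified GraphSAGE-style layer with the mean aggregator follows, assuming adjacency lists, ReLU as the activation, and a seeded random generator for sampling; the star graph is a hypothetical toy. It omits details of the original method such as mini-batch construction and embedding normalization.

```python
import random


def sage_layer(h, neighbors, W, num_samples, rng):
    """GraphSAGE-style layer with a mean aggregator (simplified sketch).

    h: {node: feature list}; neighbors: {node: list of nodes};
    W: list of rows of shape f_out x (2 * f_in), applied to CONCAT(h_u, mean of samples).
    """
    out = {}
    for u, h_u in h.items():
        nbrs = neighbors[u]
        # Sample a fixed-size subset; keep every neighbour when there are fewer.
        sampled = rng.sample(nbrs, num_samples) if len(nbrs) > num_samples else list(nbrs)
        if sampled:
            agg = [sum(col) / len(sampled) for col in zip(*(h[v] for v in sampled))]
        else:
            agg = [0.0] * len(h_u)
        z = h_u + agg  # CONCAT(h_u, Aggregator(...))
        out[u] = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in W]  # ReLU(W z)
    return out


# Hypothetical star graph: hub 0 connected to leaves 1..5; features are the node id.
neighbors = {0: [1, 2, 3, 4, 5], **{leaf: [0] for leaf in range(1, 6)}}
h = {u: [float(u)] for u in neighbors}
W = [[1.0, 1.0]]  # adds the node's own feature and the neighbour mean
out = sage_layer(h, neighbors, W, num_samples=2, rng=random.Random(0))
print(out[0], out[3])
```

Because the hub only looks at two sampled leaves, its cost per layer stays fixed no matter how many neighbours it has, which is the scalability argument made above.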

2.4.5 Why GNN is suitable for next POI recommendation

GNNs combine deep representation learning, the ability to capture complex relationships, the integration of diverse information sources, and the ability to mitigate data sparsity, which makes them particularly effective for next POI recommendation.

2.4.6 Deep representation of graph-structured data

In next POI recommendation, interactions between users and POIs, as well as historical visits, can naturally be represented as a graph. GNNs excel at processing graph data and produce rich embeddings for each node that capture the intricate interactions and preferences of users towards different POIs. By iteratively aggregating information from neighboring nodes, GNNs produce a comprehensive representation that reflects both the individual characteristics of a user or POI and its relationships within the larger network.

2.4.7 Capturing high-order connectivity

Traditional recommendation systems often focus on direct interactions between users and items. However, in the realm of POI recommendations, it’s essential to consider not just direct interactions but also indirect ones. For instance, if a user’s friend frequently visits a particular POI, it might also be of interest to the user. GNNs, with their ability to propagate information through the graph, can capture these multi-hop relationships, ensuring that both direct and indirect connections influence the recommendation. This high-order connectivity provides a richer context, enhancing the accuracy of the recommendations.
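Multi-hop connectivity can be seen directly in powers of the adjacency matrix: entry (i, j) of A^k counts the walks of length k from node i to node j. The toy graph below, with hypothetical users and POIs, puts a friendship edge and visit edges in one undirected graph. User u1 has no direct edge to poi_x, but a two-hop path exists, so a two-layer GNN lets poi_x influence u1’s representation.

```python
def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical toy graph: one friendship edge and two visit edges, undirected.
nodes = ["u1", "u2", "poi_x", "poi_y"]
idx = {n: i for i, n in enumerate(nodes)}
edges = [("u1", "u2"), ("u2", "poi_x"), ("u1", "poi_y")]
n = len(nodes)
A = [[0] * n for _ in range(n)]
for s, t in edges:
    A[idx[s]][idx[t]] = A[idx[t]][idx[s]] = 1

A2 = matmul(A, A)
print(A[idx["u1"]][idx["poi_x"]])  # 0: no direct interaction
print(A2[idx["u1"]][idx["poi_x"]])  # 1: one 2-hop path u1 -> u2 -> poi_x
```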

2.4.8 Integration of auxiliary information

Beyond the direct interactions between users and POIs, there’s a plethora of auxiliary information that can refine recommendations. This includes attributes of a POI, user profiles, social connections between users, and more. GNNs can seamlessly integrate this auxiliary information into their framework, providing a holistic view of the user and the environment. By doing so, they ensure that the recommendation is influenced by a broader set of factors, leading to more personalized and accurate suggestions.

2.4.9 Addressing data sparsity

One of the significant challenges in recommendation systems is the sparsity of user–POI interaction data. Many users might have interacted with only a small subset of POIs, leading to a lack of sufficient data to make accurate recommendations. GNNs can alleviate this issue by leveraging information from neighboring nodes in the graph. Even if a user hasn’t directly interacted with a particular POI, the GNN can infer potential interest based on the interactions of similar users or related POIs. This capability allows GNNs to provide robust recommendations even in the face of sparse direct interaction data.

3 GNN-based approaches for next POI recommendation

GNN-based next POI recommendation methods often learn POI representations and combine these embeddings with sequence modeling. The first step usually represents users and POIs as nodes in a graph. GNN layers then generate node embeddings by aggregating information from neighboring nodes, capturing both local and global spatiotemporal relationships. These embeddings can be fed into sequential models such as RNNs to capture the temporal dynamics of user behavior. Combining GNNs for spatial relationships with RNNs for temporal patterns gives a more holistic view of user preferences and can lead to more accurate POI recommendations. The overall framework for GNN-based next POI recommendation is depicted in Fig. 1.
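The pipeline described above, with graph-based POI embeddings followed by a recurrent encoder over the trajectory, can be sketched end to end. Everything here is an illustrative assumption: the POIs and graph are hypothetical, the GNN step is a single mean aggregation, the recurrent model is a plain Elman RNN, and the weights are random and untrained, so the printed ranking is arbitrary. The surveyed methods learn these parameters from check-in data.

```python
import math
import random


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def gnn_embed(emb, adj):
    """Graph step: average each POI's embedding with its neighbours' (self included)."""
    out = {}
    for p, vec in emb.items():
        group = [vec] + [emb[q] for q in adj.get(p, [])]
        out[p] = [sum(col) / len(group) for col in zip(*group)]
    return out


def rnn_encode(vectors, W_x, W_h):
    """Sequence step: Elman RNN h_t = tanh(W_x x_t + W_h h_{t-1}); returns the last state."""
    h = [0.0] * len(W_h)
    for x in vectors:
        h = [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
    return h


def recommend(trajectory, emb, adj, W_x, W_h, k=2):
    """Score every POI by the dot product of its graph embedding with the sequence state."""
    poi_emb = gnn_embed(emb, adj)
    state = rnn_encode([poi_emb[p] for p in trajectory], W_x, W_h)
    scores = {p: sum(s * e for s, e in zip(state, vec)) for p, vec in poi_emb.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy setup: 5 POIs, 4-dimensional embeddings, random untrained weights.
rng = random.Random(42)
dim = 4
pois = ["home", "cafe", "office", "gym", "bar"]
emb = {p: [rng.gauss(0.0, 1.0) for _ in range(dim)] for p in pois}
adj = {"home": ["cafe"], "cafe": ["home", "office"], "office": ["cafe", "bar"], "bar": ["office"]}
W_x, W_h = rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng)
print(recommend(["home", "cafe", "office"], emb, adj, W_x, W_h))
```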

Fig. 1

Overall framework for GNN-based approaches for the next POI recommendation

In this article, we discuss GNN-based methods for next POI recommendation from two perspectives. First, we analyze how various studies construct graphs to fit the next POI recommendation task. Second, we explore how these studies use GNN-based approaches to address the specific characteristics and challenges inherent to the task.

3.1 Graph construction

In the next POI recommendation systems, one of the core components when leveraging GNN is the choice of graph structure. Different graph structures capture varying contextual information and interaction patterns, leading to distinct recommendation characteristics. The methods for graph construction can be found in Table 2. Some examples of graph construction in the next POI recommendation are depicted in Fig. 2 and will be explained in the following subsections.

Table 2 Graph construction methods

3.1.1 POI transition graph

The POI Transition Graph represents user movements: its nodes are generally POIs, and its edges encode the order in which users move between these locations. ATST-GGNN [46] proposes a Spatiotemporal Graph, a variant of the POI Transition Graph that blends temporal and spatial dimensions into the graph structure. Each node represents a POI, and edges encode not only the visit sequence but also the visit frequency, enriched by dedicated temporal and spatial weight matrices. The temporal matrix captures how user behavior varies across times, and the spatial matrix emphasizes geographical proximity between POIs.

Fig. 2 Graph construction examples in next POI recommendation

GETNext [23] proposes a user-agnostic trajectory flow map that extends the basic POI Transition Graph: the weight on each edge quantifies the frequency or likelihood of a transition, reflecting how often users move between specific POIs. This graph structure is central to modeling and predicting users' next POI visits from the observed movement patterns and frequencies among POIs.
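A user-agnostic trajectory flow map of this kind can be sketched in a few lines of Python. The sketch assumes each trajectory is a time-ordered list of POI identifiers and weights each directed edge by its raw transition count; the optional normalization into transition probabilities is one way to express the "likelihood" view and is an assumption, not necessarily GETNext's exact weighting.

```python
from collections import Counter

def build_transition_graph(trajectories):
    """Count directed transitions between consecutive check-ins over all trajectories.

    Returns {(src_poi, dst_poi): count}; the graph is user-agnostic because
    counts from every user are pooled together.
    """
    counts = Counter()
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[(src, dst)] += 1
    return dict(counts)

def transition_probabilities(counts):
    """Normalize each POI's outgoing edge weights so they sum to one."""
    out_totals = Counter()
    for (src, _), c in counts.items():
        out_totals[src] += c
    return {(src, dst): c / out_totals[src] for (src, dst), c in counts.items()}

trajectories = [
    ["home", "cafe", "office"],
    ["home", "cafe", "gym"],
    ["cafe", "office"],
]
flow = build_transition_graph(trajectories)
probs = transition_probabilities(flow)
# flow == {("home", "cafe"): 2, ("cafe", "office"): 2, ("cafe", "gym"): 1}
# probs[("cafe", "office")] == 2/3
```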

One of the graphs in DynaPosGNN [47] is likewise constructed from transitions between POIs. Its distinctive feature is that two nodes can be connected by multiple edges, each representing a different visit time. This design yields a more nuanced representation of user movements, capturing how users' interactions with POIs depend on the specific times of their visits.
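A multigraph like this can be stored as a mapping from node pairs to a list of edge timestamps. The following is a minimal sketch under assumptions: the data layout, the function name, and the choice to label each edge with the arrival time are hypothetical and not taken from DynaPosGNN.

```python
from collections import defaultdict
from datetime import datetime

def build_timed_multigraph(user_sequences):
    """Build a POI multigraph with one parallel edge per observed transition.

    user_sequences: one list per user of (poi, visit_time) pairs sorted by time.
    Returns {(src, dst): [time, ...]}; each list entry is a separate edge,
    labelled (by assumption) with the arrival time at dst.
    """
    edges = defaultdict(list)
    for seq in user_sequences:
        for (src, _), (dst, t_dst) in zip(seq, seq[1:]):
            edges[(src, dst)].append(t_dst)
    return dict(edges)

sequences = [
    [("A", datetime(2012, 4, 3, 8, 30)), ("B", datetime(2012, 4, 3, 9, 0))],
    [("A", datetime(2012, 4, 5, 20, 0)), ("B", datetime(2012, 4, 5, 21, 0))],
]
multigraph = build_timed_multigraph(sequences)
# Two parallel A -> B edges, one at 09:00 and one at 21:00.
```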

The static POI transition graphs above map only the direct transitions observed in historical data. In contrast, MSDP [31] introduces a dynamic structure that evolves with the sequence of visited POIs and uses a learning-based method to infer the graph structure. It employs RotatE [57] knowledge graph embedding and Eigenmap [58] methods to identify and exploit latent relationships between POIs, overcoming the sparsity of direct observations.
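The RotatE scoring idea that MSDP builds on can be sketched as follows: each relation rotates entity embeddings in complex space, and a triple is plausible when the rotated head lands near the tail. Which POI pairs and relations MSDP actually scores, and how it turns scores into graph edges, is not specified here, so the toy embeddings and the margin value are hypothetical.

```python
import numpy as np

def rotate_distance(head, phase, tail):
    """RotatE distance: rotate the complex head embedding element-wise by
    exp(i * phase), a unit-modulus relation, and sum the per-dimension moduli
    of the residual to the tail embedding."""
    return float(np.abs(head * np.exp(1j * phase) - tail).sum())

def rotate_score(head, phase, tail, gamma=6.0):
    """Plausibility score: a margin gamma minus the distance (higher is better).
    The value of gamma here is an arbitrary placeholder."""
    return gamma - rotate_distance(head, phase, tail)

head = np.array([1 + 0j, 0 + 1j])
phase = np.array([np.pi / 2, np.pi])
tail_match = np.array([0 + 1j, 0 - 1j])   # exactly head rotated by phase
tail_other = np.array([-1 + 0j, 1 + 0j])
close = rotate_score(head, phase, tail_match)
far = rotate_score(head, phase, tail_other)
```

Because every relation component has modulus one, the rotation preserves embedding norms, which is what lets RotatE model relation patterns such as symmetry and inversion.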

3.1.2 POI relationship graph

The POI relationship graph is primarily concerned with the relationships among POIs. Its node set exclusively consists of POIs, while the edges depict different relationships such as geographical proximity, shared category, or temporal relationship.

STP-UDGAT [22] created three types of POI relationship graphs including POI-POI Spatial Graph, POI-POI Temporal Graph, and POI-POI Preference Graph. These graphs capture the relationships between POIs from different dimensions. The Preference Graph is built based on user preferences, reflecting the connections between POIs favored by similar users. The Temporal Graph considers the temporal aspect, constructing relationships based on time-related associations among POIs, such as popularity during specific time frames. Lastly, the Spatial Graph is formed based on the geographical locations of POIs, capturing the relationships between physically proximate points.

MobGT [48] proposes two POI relationship graphs: the Global Spatial Graph and the Global Temporal Graph. The Global Spatial Graph uses POIs as nodes and connects them by geographical proximity, capturing spatial relationships between POIs. The Global Temporal Graph uses the same nodes, but its edges reflect the temporal sequence of user visits to these POIs, emphasizing the temporal aspect of user mobility patterns.

This type of graph focuses on the POIs themselves, capturing inherent similarities and contexts that may not be evident from user interactions alone. For example, GSTN [15], DRAN [25], and KBGNN [27] construct a Distance-based Graph with each POI as a node. An edge is created between two POIs if the distance between them is less than a threshold, and a Gaussian kernel function of that distance reflects how close the two POIs are.
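Such a distance-based graph can be sketched as below. The haversine distance, the 1 km threshold, the bandwidth σ, and the 2σ² kernel form are assumptions chosen for illustration; the surveyed models may use different distance measures, kernel parameterizations, and thresholds.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_graph(coords, threshold_km, sigma_km):
    """Undirected POI graph: connect POIs closer than threshold_km and weight each
    edge with a Gaussian kernel exp(-d^2 / (2 * sigma^2)), so nearer POIs get weights closer to 1."""
    edges = {}
    ids = sorted(coords)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = haversine_km(*coords[a], *coords[b])
            if d < threshold_km:
                w = math.exp(-d ** 2 / (2 * sigma_km ** 2))
                edges[(a, b)] = edges[(b, a)] = w
    return edges

coords = {
    "museum": (40.7794, -73.9632),
    "park_gate": (40.7812, -73.9665),
    "airport": (40.6413, -73.7781),
}
graph = distance_graph(coords, threshold_km=1.0, sigma_km=0.5)
```

The threshold keeps the graph sparse, while the kernel turns raw distances into similarity weights that a GNN can aggregate over directly.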

3.1.3 User–POI bipartite graph

The User–POI Bipartite Graph is a foundational structure for many POI recommendation systems. Its nodes are divided into two disjoint sets, users and POIs, and edges connect users to the POIs they have interacted with or visited, so the structure directly emphasizes user–POI relationships. DynaPosGNN [47] constructs two dynamic graphs, one of which is a User–POI graph whose edges are dynamic rather than simple connections, reflecting the timing of user visits to POIs. GFUC [49] and MEMO [50] also use a User–POI Bipartite Graph with two distinct node sets representing users and POIs, in which an edge indicates a user's visit to a POI and its weight reflects the visit frequency.
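A frequency-weighted User–POI bipartite graph is commonly stored as an interaction matrix and, when a GNN needs a single adjacency matrix, as a block matrix with users and POIs stacked together. The function names below are hypothetical, and the sketch does not reproduce the specific constructions of GFUC or MEMO.

```python
import numpy as np

def user_poi_matrix(checkins):
    """checkins: list of (user, poi) pairs. Returns (R, users, pois) where R[u, p]
    counts user u's visits to POI p, i.e., the bipartite edge weight."""
    users = sorted({u for u, _ in checkins})
    pois = sorted({p for _, p in checkins})
    u_idx = {u: i for i, u in enumerate(users)}
    p_idx = {p: j for j, p in enumerate(pois)}
    r = np.zeros((len(users), len(pois)))
    for u, p in checkins:
        r[u_idx[u], p_idx[p]] += 1
    return r, users, pois

def bipartite_adjacency(r):
    """Square adjacency over users followed by POIs: [[0, R], [R^T, 0]].
    Users connect only to POIs, never to other users, which keeps the graph bipartite."""
    n_u, n_p = r.shape
    return np.block([[np.zeros((n_u, n_u)), r], [r.T, np.zeros((n_p, n_p))]])

checkins = [("u1", "p1"), ("u1", "p1"), ("u1", "p2"), ("u2", "p2")]
R, users, pois = user_poi_matrix(checkins)
A = bipartite_adjacency(R)
# R == [[2, 1], [0, 1]] with rows (u1, u2) and columns (p1, p2)
```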

3.1.4 User–user social graph

Opting for a more user-centric view, the User–User Graph revolves around user interactions and similarities. Here, nodes exclusively represent users, while edges connect those with analogous visitation or interaction patterns. By prioritizing user similarities, this graph structure enables recommendation systems to infer preferences based on peer behavior, highlighting communal patterns. STP-UDGAT [22] constructed an undirected user–user graph. In this graph, nodes represent individual users, while edges indicate the similarity between users. Specifically, an edge is formed between two users if their Jaccard similarity coefficient exceeds 0.2. In MEMO [50], various types of user–user social graphs are constructed, with nodes representing individual users. These social relationships are primarily based on familial and professional contexts, leading to the formation of distinct user–user social graphs.
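The thresholded similarity rule can be sketched as follows. Which sets the Jaccard coefficient is computed over is not stated above; this sketch assumes each user's set of visited POIs, and the function names are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def user_user_graph(visited, threshold=0.2):
    """Undirected edges between users whose Jaccard similarity exceeds threshold."""
    edges = set()
    for u, v in combinations(sorted(visited), 2):
        if jaccard(visited[u], visited[v]) > threshold:
            edges.add((u, v))
    return edges

visited = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3", "p4"},   # Jaccard with u1: 2 / 4 = 0.5 -> edge
    "u3": {"p9"},               # no overlap with anyone -> isolated
}
social = user_user_graph(visited)
# social == {("u1", "u2")}
```

Note that the comparison is strict, so a pair with similarity exactly 0.2 is not connected.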

3.1.5 Knowledge graph

A Knowledge Graph is a specialized graph structure that captures and represents intricate relationships between users and locations in next POI recommendation. By integrating POI attributes, users' historical behaviors, and the geographical and semantic connections between locations, knowledge graphs provide rich contextual information that helps recommendation systems understand user preferences and interests and make more tailored POI recommendations.

ARNN [51] proposes a knowledge graph whose nodes are primarily users, POIs, and vocabulary elements such as POI categories and tags. The edges represent different types of relationships between these entities: for example, user–location relationships based on visits, geographical proximity relationships between locations, and associations between locations and vocabulary elements. This structure allows the knowledge graph to comprehensively represent user behaviors, location characteristics, and their interactions.

STKGRec [52] proposes a spatiotemporal knowledge graph (STKG) constructed by integrating spatial and temporal data about user interactions with POIs, capturing user movements and preferences across times and locations. Its nodes represent users and POIs, and its edges depict interactions, such as a visit to a POI at a certain time. This structure allows the graph to represent complex patterns of user behavior and preference in both time and space.

The knowledge graph constructed in Graph-Flashback [53] includes users and POIs as nodes, with edges representing user–POI relationships, temporal and spatial relationships between POIs, and social relationships among users. This structure provides rich contextual information for the recommendation system.

3.1.6 Category–category graph

A category–category graph is a specialized type of graph designed to represent and analyze relationships between POI categories. Its nodes are distinct POI categories, which can range from broad classes such as “restaurants” or “parks” to finer ones such as “Italian restaurants” or “modern art museums.” Its edges capture users' sequential behavior across categories: an edge between two categories indicates that users frequently move from a location in one category to a location in the other. For example, an edge connecting “gyms” and “health food stores” might imply that users often visit a health food store right after a workout at the gym.

MobGT [48] proposes three graphs, one of which is the Global Category Graph: a directed category–category graph whose nodes represent POI categories and whose edges depict transitions between them. Each edge is weighted by how often the corresponding pair of categories appears consecutively in user trajectories, capturing how users move between different types of POIs. POI–RGNN [54] extends its prediction scope from specific POI locations to POI categories. It therefore constructs a POI category graph, enabling the model to learn and predict user movement patterns between categories of locations, which gives a more general view of user behavior in terms of category preferences rather than specific locations.

In contrast to the basic category–category graph, the graph in CHA [55] uses POIs as leaf nodes and their categorical information as non-leaf nodes. Edges represent parent–child relationships in a directed acyclic graph, with parent nodes providing more general information. This setup allows POI embeddings to be combined with their hierarchical categories through an attention mechanism. Whereas category–category graphs model relationships between categories, the CHA graph leverages hierarchical category information to enhance POI recommendations and to mitigate data sparsity.
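The idea of combining a POI embedding with its ancestor categories through attention can be sketched as follows. The dot-product scoring against a query vector is an assumption (CHA's actual scoring function may differ), as are the toy embeddings and the hypothetical POI name.

```python
import numpy as np

def hierarchical_embedding(path_embs, query):
    """Attention over a POI and its ancestor categories.

    path_embs: (L, d) array, row 0 is the POI (leaf), later rows are increasingly
    general categories. Returns the attention-weighted embedding and the weights.
    """
    logits = path_embs @ query                  # one relevance score per node on the path
    weights = np.exp(logits - logits.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ path_embs, weights

path = np.array([
    [1.0, 0.0, 0.0],   # POI: "Luigi's Trattoria" (hypothetical)
    [0.0, 1.0, 0.0],   # category: "Italian restaurant"
    [0.0, 0.0, 1.0],   # parent category: "restaurant"
])
combined, weights = hierarchical_embedding(path, query=np.zeros(3))
# A zero query gives uniform weights, so combined is the mean of the three rows.
```

For a rarely visited POI, a learned query could shift weight toward the category rows, which is one way a category hierarchy can compensate for sparse POI-level data.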

3.1.7 User flow geographical graph

The User Flow Geographical Graph captures users' movement patterns across geographical areas. Its nodes represent distinct geographical regions, such as neighborhoods, districts, cities, or any other demarcated area of interest, and its directed edges represent the flow of users from one area to another. Each edge carries a weight that quantifies the volume of users moving between the connected regions. ADQ-GNN [56] proposes an Area Graph, a type of User Flow Geographical Graph in which each node represents a geographical area determined by a quadtree structure. The edges between nodes indicate the flow of users from one area to another, capturing spatial relationships and user movements.
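A quadtree-based Area Graph can be sketched in two steps: recursively split the bounding box into four quadrants until each cell holds at most a given number of POIs, then count user moves between the resulting leaf areas. The capacity, the maximum depth, and the decision to ignore moves within a single area are assumptions; ADQ-GNN's actual partitioning rules may differ.

```python
from collections import Counter

def quadtree_assign(points, bbox, capacity, max_depth=8):
    """Assign each (x, y) point to a quadtree leaf.

    A cell is split into four quadrants while it holds more than `capacity`
    points and `max_depth` is not reached. Leaf ids are tuples of quadrant
    indices (0=SW, 1=SE, 2=NW, 3=NE) along the path from the root.
    """
    labels = [()] * len(points)

    def split(indices, box, path, depth):
        if len(indices) <= capacity or depth == max_depth:
            for i in indices:
                labels[i] = path
            return
        min_x, min_y, max_x, max_y = box
        mid_x, mid_y = (min_x + max_x) / 2, (min_y + max_y) / 2
        children = {0: [], 1: [], 2: [], 3: []}
        for i in indices:
            x, y = points[i]
            children[(x >= mid_x) + 2 * (y >= mid_y)].append(i)
        boxes = {0: (min_x, min_y, mid_x, mid_y), 1: (mid_x, min_y, max_x, mid_y),
                 2: (min_x, mid_y, mid_x, max_y), 3: (mid_x, mid_y, max_x, max_y)}
        for q, idx in children.items():
            if idx:
                split(idx, boxes[q], path + (q,), depth + 1)

    split(list(range(len(points))), bbox, (), 0)
    return labels

def area_flow_graph(trajectories, poi_area):
    """Directed, weighted area graph: edge weight = number of moves between areas."""
    flows = Counter()
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            src, dst = poi_area[a], poi_area[b]
            if src != dst:   # assumption: moves inside one area are not counted
                flows[(src, dst)] += 1
    return dict(flows)

poi_xy = {"p0": (0.1, 0.1), "p1": (0.2, 0.2), "p2": (0.9, 0.9), "p3": (0.8, 0.85)}
names = list(poi_xy)
leaf = quadtree_assign([poi_xy[n] for n in names], (0.0, 0.0, 1.0, 1.0), capacity=2)
poi_area = dict(zip(names, leaf))
flows = area_flow_graph([["p0", "p2", "p1"], ["p1", "p0"]], poi_area)
# poi_area: p0, p1 -> (0,); p2, p3 -> (3,)
# flows == {((0,), (3,)): 1, ((3,), (0,)): 1}
```

Because dense regions are split more often, the quadtree yields small areas downtown and large areas in the suburbs, which keeps the number of POIs per area roughly balanced.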

3.2 Advanced graph structures

  • Adaptive graph. Unlike traditional methods, AGRAN [28] does not rely on pre-defined graphs. Instead, it employs an adaptive graph, optimizing the graph structure to automatically infer the inherent geographical relationships among POIs. Combining the learned graph structure with a GCN, it then generates adaptive graph-based POI representations that are highly expressive in capturing geographical dependencies.

  • Hierarchical graph. The graph constructed in HMT-GRN [30] is not a simple POI–User graph or POI-POI graph. Instead, it utilizes a more complex hierarchical graph structure designed to address the sparsity issue in the User-POI matrix and to better learn the relationships between users’ behaviors and locations. In this Hierarchical Spatial Graph, different levels represent different tasks, with vertices in each level connected to vertices in the next level, forming a hierarchical structure. This structure allows the model to more effectively capture the complex interactions between users and locations, as well as among locations themselves.

  • Hypergraphs. In traditional graphs, each edge connects two nodes, whereas in a hypergraph, a hyperedge can connect any number of nodes. This allows hypergraphs to represent more complex relationships. STHGCN [32] utilizes hypergraphs to capture and learn users’ historical trajectories and collaborative trajectory information among different users. In the hypergraph, nodes represent POIs, and hyperedges signify the complex high-order relationships between nodes.
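The standard hypergraph convolution (the HGNN formulation X' = σ(D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ), where H is the node–hyperedge incidence matrix, W the hyperedge weights, and D_v, D_e the node and hyperedge degree matrices) illustrates how information flows from nodes to hyperedges and back. This is a generic sketch, not STHGCN's hypergraph transformer; treating each trajectory as one hyperedge over its POIs is an illustrative assumption.

```python
import numpy as np

def hypergraph_conv(incidence, features, theta, edge_weights=None):
    """One hypergraph convolution with ReLU.

    incidence: (n_nodes, n_edges) 0/1 matrix H, H[v, e] = 1 if node v is in hyperedge e.
    """
    n_edges = incidence.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = incidence @ w                          # weighted node degrees
    de = incidence.sum(axis=0)                  # hyperedge sizes
    dv_inv_sqrt = np.divide(1.0, np.sqrt(dv), out=np.zeros_like(dv, dtype=float), where=dv > 0)
    de_inv = np.divide(1.0, de, out=np.zeros_like(de, dtype=float), where=de > 0)
    left = dv_inv_sqrt[:, None] * incidence     # D_v^{-1/2} H
    prop = left @ np.diag(w * de_inv) @ left.T  # D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}
    return np.maximum(prop @ features @ theta, 0.0)

# Hyperedges = trajectories: e0 = {p0, p1, p2}, e1 = {p2, p3}.
H = np.array([[1, 0],
              [1, 0],
              [1, 1],
              [0, 1]])
X = np.eye(4)                       # one-hot POI features
out = hypergraph_conv(H, X, np.eye(4))
```

With one-hot features, row v of the output shows which POIs influence POI v: p0 and p3 share no trajectory, so p3 contributes nothing to p0, while p2 sits in both hyperedges and receives signal from every POI.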

3.3 Application of GNN in next POI recommendation

In the previous sections, we outlined the characteristics of the next POI recommendation problem. In this section, we survey research focused on these characteristics, exploring how various models address them and improve solutions to the next POI recommendation challenge.

3.3.1 Inter-user preference

A challenge inherent to next POI recommendation is that users may explore areas unfamiliar to them, so their future visits may not be reflected in their historical trajectories. Consequently, recommending an individual's next POI cannot rely solely on that user's past behavior; it must also draw heavily on other users' preferences.

In this context, a global POI transition graph becomes a potent tool for learning and capturing global movement patterns and trends. STP-UDGAT [22], GETNext [23], GSTA-GNN [24], GSTN [15], and DRAN [25] harness this idea, exploiting global trajectories through user-agnostic trajectory flow maps, also termed POI transition graphs. They use GNN methods such as GCN to learn an embedding for each node (POI), capturing the relationships between POIs from a global perspective across all user trajectories. These embeddings reflect the significance of POIs in a global context, offering a comprehensive view of user movements. However, their approaches differ in several respects.

In GETNext [23], the edge weights of the POI transition graph are obtained by aggregating all trajectories: the more frequent the transition from one POI to another across all trajectories, the higher the weight of the edge connecting them. In STP-UDGAT [22], the POI transition graph is termed the “POI-POI Preference Graph” and is constructed similarly to that of GETNext. STP-UDGAT additionally introduces two other POI relationship graphs: one captures the spatial relationships between POIs, focusing on their distances from one another, and the other emphasizes the temporal aspect, detailing the time intervals between user visits to different POIs. GSTA-GNN [24] also creates POI transition graphs whose nodes represent POIs and whose edges indicate transitions between them; the edge weights in its two transition graphs are based on the spatial distance and the temporal interval between POIs, respectively. A distinguishing feature of GSTN [15] and DRAN [25] is their use of distance-based POI semantic graphs to capture semantic relationships between POIs based on proximity. What sets these models apart is how they determine edge weights: they use a Gaussian kernel to compute the weights, and an edge is established only if the distance between the two nodes is below a certain threshold.

Unlike approaches that build a global POI transition graph, SGRec [26] performs data augmentation for each sequence. Its augmented graphs incorporate relevant POIs from other users' sequences as context for the target POIs. This allows the model to learn interest similarities across users, namely inter-user preferences, which helps users explore a wider range of POIs instead of relying only on their own historical sequences. The sequence augmentation mechanism therefore strengthens the modeling of user interest similarities.

3.3.2 POIs complex geographical dependencies

Capturing the complex geographical influences between POIs is critical for the next POI recommendation, as geographic location is highly correlated with actual user mobility patterns and interest preferences. Many existing next POI recommendation models, such as STGN [21] and HST-LSTM [3], rely solely on physical distance or direct sequential relationships between POIs. In reality, however, there are also complex long-distance, cross-category transitions between POIs that stem from their inherent attributes and functions. For instance, users rarely move directly from a restaurant to a neighboring restaurant of the same type, even though the two are close. Linear distance metrics and naive models alone therefore cannot sufficiently describe people's mobility decisions and evolving interests.

The recently proposed GSTN [15] and DRAN [25] present a novel approach to modeling complex geographical influences between POIs. They construct two types of POI semantic graphs, a distance-based graph and a transition-based graph, and use graph embedding to learn latent representations that capture non-linear and high-order relationships beyond physical proximity. For example, the transition-based graph encodes the mobility flow between POIs derived from historical check-in data. By integrating these spatial dependencies with temporal sequence signals, GSTN provides a holistic spatiotemporal modeling framework. Moreover, attention mechanisms enable adaptive, personalized aggregation of geographical influences.

KBGNN [27] adopts a comparable approach. Its geographical graph corresponds to the POI distance graph, and its user-aware sequential graph mirrors the POI transition graph discussed earlier. What sets KBGNN apart is its use of a random walk graph kernel to model sequential influences.
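A random walk graph kernel compares two graphs by counting the walks they have in common, which equals counting walks in their direct product graph. The truncated geometric version below, k(G1, G2) = Σ_{l=0}^{L} λ^l 1^T (A1 ⊗ A2)^l 1, is a textbook formulation; the decay λ, the walk length L, and the example graphs are assumptions and do not describe how KBGNN applies the kernel.

```python
import numpy as np

def random_walk_kernel(adj1, adj2, lam=0.1, max_len=4):
    """Truncated geometric random walk kernel between two graphs.

    Sums lam**l * (number of common walks of length l) for l = 0..max_len,
    using the Kronecker (direct) product adjacency of the two graphs.
    """
    prod = np.kron(adj1, adj2)
    ones = np.ones(prod.shape[0])
    walks = ones.copy()          # A_x^0 @ 1
    total = 0.0
    for length in range(max_len + 1):
        total += lam ** length * (ones @ walks)
        walks = prod @ walks     # advance to A_x^(length + 1) @ 1
    return float(total)

edge = np.array([[0.0, 1.0], [1.0, 0.0]])      # two POIs linked in both directions
path3 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])  # directed a -> b -> c
k_same = random_walk_kernel(edge, edge)
k_mixed = random_walk_kernel(edge, path3)
```

The decay λ down-weights long walks, so short shared transition patterns dominate the similarity.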

Instead of using pre-defined graphs, AGRAN [28] uses an adaptive graph whose structure is optimized to better capture the natural geographical relationships between POIs. The learned graph structure is then combined with a GCN to produce adaptive graph-based POI representations, which are particularly effective at capturing geographical dependencies.
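One generic way to learn a graph instead of fixing it in advance is to derive edge weights from trainable node embeddings, keep each node's strongest candidates, and normalize them. The sketch below shows only that forward computation; the ReLU-of-similarity scores, the top-k sparsification, and the row softmax are common choices in learned-graph models but are assumptions here, not AGRAN's published formulation.

```python
import numpy as np

def adaptive_adjacency(node_emb, top_k):
    """Learned-graph sketch: score all POI pairs from embeddings, keep each POI's
    top_k neighbours (excluding itself), and row-normalize with a softmax."""
    n = node_emb.shape[0]
    if not 0 < top_k < n:
        raise ValueError("top_k must be between 1 and n - 1")
    scores = np.maximum(node_emb @ node_emb.T, 0.0)   # ReLU of pairwise similarity
    np.fill_diagonal(scores, -np.inf)                  # forbid self-loops
    adj = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(-scores[i])[:top_k]          # strongest candidates
        logits = scores[i, nbrs]
        weights = np.exp(logits - logits.max())
        adj[i, nbrs] = weights / weights.sum()
    return adj

def graph_conv(adj, features, weight):
    """Propagate POI features over the learned graph and project them."""
    return np.maximum(adj @ features @ weight, 0.0)

rng = np.random.default_rng(1)
emb = rng.normal(size=(6, 4))      # in training these would be learned parameters
A = adaptive_adjacency(emb, top_k=2)
H = graph_conv(A, rng.normal(size=(6, 3)), rng.normal(size=(3, 5)))
```

During training, gradients from the recommendation loss would flow back into the node embeddings, so the graph structure itself adapts to the task.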

3.3.3 Implicit feedback

Implicit feedback significantly affects next POI recommendation. Derived from user behaviors such as visits or dwell time, implicit feedback lacks explicit indications of preference, which complicates accurate inference of user interests. It therefore makes user preferences harder to model and next POIs harder to predict, requiring careful data preprocessing and modeling techniques to mitigate its effects and improve recommendation quality.

In the domain of next POI recommendation, early research based on collaborative filtering and matrix factorization frequently addressed implicit feedback; references [1, 59] are notable examples in which implicit feedback is used extensively.

Graph-based approaches to implicit feedback remain relatively underexplored. EEDN [29] is one of the few efforts that tackle this issue with a graph-based method. It uses a hypergraph convolution encoder to enhance the aggregation of graph convolutions and learn robust user representations, while its decoder mines local and global interactions, modeling implicit feedback through both graph-based and sequential patterns. By combining hypergraph convolutions with matrix factorization and leveraging both local and global interaction patterns, the model captures a more comprehensive and nuanced understanding of user preferences and behaviors, improving the accuracy of POI recommendations despite the inherent challenges of implicit feedback.

3.3.4 Sparsity of raw relational data

In the next POI recommendation task, user–POI interactions are often sparse, making it difficult to learn complex sequential patterns at the POI level. POI categories, however, are much denser and more stable in their data distribution. Incorporating category information into model learning therefore provides richer context that supplements the transition dependencies learned from sparse POI data alone. Data augmentation is another effective way to alleviate data sparsity in recommender systems. Its key idea is to generate more abundant and diverse data samples in various ways, providing broader coverage for model training and compensating for the difficulty of feature learning caused by the inherent sparsity of user–item interactions.

SGRec [26] alleviates the data sparsity issue by constructing augmented graphs for each sequence. It incorporates relevant POIs from other users' sequences to provide additional context for the target POIs, and the enriched contexts allow a more comprehensive representation of the POIs' semantic properties. SGRec also introduces category awareness by incorporating POI category embeddings into node representations and making auxiliary predictions of the next POI category. By modeling the denser category-level transitions, SGRec further compensates for the difficulty of learning complex sequential dependencies from sparse POI interactions alone. Thus, the sequence augmentation mechanism and multi-task category-aware learning allow SGRec to tackle interaction sparsity through enriched feature representations, additional context, and multi-granularity pattern learning.

GETNext [23] constructs a global trajectory flow map by aggregating generic transition patterns from all historical records. This provides supplementary information, especially for inactive users with insufficient context or new users with short trajectories; in effect, crowd knowledge serves as augmented data that compensates for individual data scarcity. GETNext also incorporates POI category information to model check-in temporal patterns at the category level. Because category-level data are much denser and more stable than sparse user–POI interactions, modeling transitions across categories compensates for the lack of fine-grained detail at the individual POI level.

HMT-GRN [30] takes a graph-based approach, learning not only user–POI but also user–region matrices at different levels of granularity, and uses a Graph Recurrent Network (GRN) module to capture both sequential dependencies and global spatio-temporal POI–POI relationships. It reduces data sparsity through multi-task learning, reduces the search space through Hierarchical Beam Search (HBS), and incorporates a selectivity layer to balance personalization with exploration. The model achieves significant improvements in prediction accuracy and efficiency for the next POI recommendation.
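The search-space reduction behind hierarchical beam search can be illustrated with a toy hierarchy of regions and POIs: only the best-scoring regions at each level are expanded, so POIs inside pruned regions are never scored. The tree, the scores, and the additive log-probability accumulation are hypothetical and simplify HMT-GRN's actual procedure.

```python
def hierarchical_beam_search(children, log_probs, root, beam_width, depth):
    """Descend a region hierarchy keeping only the beam_width best partial paths.

    children: dict node -> list of child nodes (regions, then POIs at the leaves).
    log_probs: dict node -> model log-probability for that node at its level.
    Returns the surviving leaf nodes ranked by accumulated log-probability.
    """
    beam = [(0.0, root)]
    for _ in range(depth):
        candidates = [
            (score + log_probs[child], child)
            for score, node in beam
            for child in children.get(node, [])
        ]
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return [node for _, node in beam]

children = {"root": ["region_A", "region_B"],
            "region_A": ["p1", "p2"], "region_B": ["p3", "p4"]}
log_probs = {"region_A": -0.1, "region_B": -2.0,
             "p1": -1.0, "p2": -0.5, "p3": -0.1, "p4": -3.0}
narrow = hierarchical_beam_search(children, log_probs, "root", beam_width=1, depth=2)
wide = hierarchical_beam_search(children, log_probs, "root", beam_width=2, depth=2)
# narrow == ["p2"]; p3 is never reached because region_B was pruned.
# wide == ["p2", "p1"]
```

The example also shows the trade-off: a narrow beam is cheaper but can discard a POI (p3) whose own score is high because its parent region scored poorly.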

MSDP [31] proposes a novel approach combining RotatE [57] knowledge graph embedding and Eigenmap [58] methods to extract POI relationships from sparse check-in data. This approach builds a POI similarity graph, enhancing the model’s capability to generalize POI features by aggregating similar POIs. By dynamically selecting neighboring nodes for aggregation based on users’ previous POI sequences, the model can make more accurate and context-aware predictions, thereby significantly improving the effectiveness of the next POI recommendations.

3.3.5 Cold start

The cold start problem is prevalent in the next POI recommendation task because of insufficient user data. For new or inactive users with scarce historical records, personalized models often fail to capture preferences and mobility patterns, leading to poor recommendation performance. Data augmentation and the incorporation of side information are two common remedies.

Specifically, data augmentation enriches training data by supplementing original sparse interaction data with augmented samples synthesized from public knowledge or via generative strategies. This facilitates feature learning under cold start settings. Additionally, leveraging supplementary information beyond interactions compensates for the lack of contextual details, e.g., categorization imposes structural constraints on user movements.

To address the ubiquitous cold start challenge, GETNext [23] builds a trajectory flow map from crowd knowledge to provide generic POI transition patterns. A graph neural network encodes these common flows into POI representations that benefit new users. GETNext also fuses time encoding with category embeddings to better model time-aware interests over categories. Through multi-view learning on augmented data and categories, GETNext alleviates the cold start problem and outperforms sequential models that rely solely on individual trajectories.

EEDN [29] addresses the cold start problem with a novel hypergraph convolution encoder. This encoder selects effective neighbors and aggregates collaborative signals more effectively, improving the learning of robust user and POI representations even when interaction data are limited. The hypergraph convolution is effective because it captures complex, high-order collaborative signals among users and POIs, overcoming the limitations of sparse interaction data. By leveraging these signals, EEDN can better infer the preferences of new users and the attractiveness of less-visited POIs, which improves its generalization and yields accurate recommendations despite limited data.

Furthermore, STHGCN [32] also leverages a hypergraph to capture high-order relationships among user trajectories, incorporating both intra-user and inter-user collaborative information. By modeling high-order relationships and utilizing hypergraph methods, STHGCN effectively alleviates the cold start problem, improving prediction accuracy for both short and long trajectories through a novel hypergraph transformer that combines spatiotemporal context.

3.3.6 Temporal sensitivity of POI categories

Temporal sensitivity is a critical aspect of next POI recommendation: users' preferences for POIs change dynamically over time, which is crucial for predicting the POIs a user is likely to visit next. Dividing the day into time intervals and constructing virtual trajectories for each user can reflect users' changing preferences at different times of the day. Ignoring temporal sensitivity leads to static, inflexible recommendations that fail to adapt to these changes.
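Splitting check-ins into time-of-day slots can be sketched as below. The number of slots, the equal-width slot boundaries, and the name virtual_trajectories are hypothetical; the text above does not specify how [33] defines its intervals.

```python
from collections import defaultdict
from datetime import datetime

def time_slot(ts, slots_per_day=4):
    """Map a timestamp to one of slots_per_day equal-width slots (0 = from midnight)."""
    return ts.hour * slots_per_day // 24

def virtual_trajectories(checkins, slots_per_day=4):
    """checkins: list of (user, poi, datetime).
    Returns {(user, slot): [poi, ...]} with POIs in chronological order, so each user
    gets one 'virtual' trajectory per time-of-day slot, pooled across days."""
    out = defaultdict(list)
    for user, poi, ts in sorted(checkins, key=lambda c: c[2]):
        out[(user, time_slot(ts, slots_per_day))].append(poi)
    return dict(out)

checkins = [
    ("u1", "bar", datetime(2012, 5, 1, 19, 0)),
    ("u1", "cafe", datetime(2012, 5, 1, 8, 0)),
    ("u1", "office", datetime(2012, 5, 1, 9, 30)),
]
vt = virtual_trajectories(checkins)
# vt == {("u1", 1): ["cafe", "office"], ("u1", 3): ["bar"]}
```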

Reference [33] utilizes a graph-based approach by transforming Voronoi diagrams into undirected graphs. This method calculates geographic similarity between POIs within the same temporal interval by determining adjacency through shared edges of Voronoi polygons and adding undirected edges between adjacent points. The geographic similarity between POIs visited by the same user is then calculated to enhance the recommendation process. GETNext [23] incorporates a fusion of time encoding and POI category embeddings to model users’ time-aware preferences over different categories. This is because distinct POI categories exhibit evident temporal correlation patterns such as the check-in rush hours at train stations.

3.3.7 Dynamic user preferences

In the domain of dynamic user preferences for the next POI recommendation, there is a growing focus on integrating both long-term and short-term user behaviors. Recognizing the importance of temporal dynamics, studies such as LSPL [34], PLSPL [35], and LSTPM [2] leverage deep learning’s sequential models, notably RNNs, to better capture these evolving preferences.

GNN-based methods, particularly those integrating GCNs with RNNs (especially LSTMs), have made significant strides by combining spatial and temporal dynamics through GNNs and sequential models. For example, LSPHGA [36] models dynamic user preferences with a heterogeneous graph neural network complemented by attention mechanisms, integrating long- and short-term user behaviors. It applies a self-attention mechanism to recent check-in sequences so that the model can adjust promptly to the latest user behaviors, which is crucial for capturing short-term preferences. LSPHGA then merges long- and short-term preferences with personalized weights for each user, adaptively balancing enduring interests against immediate needs according to each user's behavior pattern.

GCN-LSTM [37] addresses dynamic user preferences by capturing both the temporal dynamics of user behavior and the spatial dynamics of POIs. The GCN extracts complex spatial relationships and user–POI interactions from a heterogeneous graph, capturing the influence of geographical and social contexts, while the LSTM models the temporal sequences of user check-ins, capturing how user preferences evolve over time. By integrating the two, the framework adapts dynamically to changes in user preferences, leading to more accurate and personalized POI recommendations.
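The long- and short-term fusion described for LSPHGA can be illustrated with a simplified sketch: self-attention over recent check-in embeddings yields a short-term vector, and a per-user gate blends it with a long-term vector. The parameter-free attention and the sigmoid gate are simplifying assumptions, not LSPHGA's actual architecture.

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention without learned projections."""
    logits = x @ x.T / np.sqrt(x.shape[1])
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ x

def fuse_preferences(long_term, recent_emb, gate_logit):
    """Blend long- and short-term preferences with a personalized weight in (0, 1).

    gate_logit stands in for a learned, user-specific scalar (hypothetical).
    """
    short_term = self_attention(recent_emb)[-1]   # context-aware view of the latest check-in
    alpha = 1.0 / (1.0 + np.exp(-gate_logit))     # sigmoid
    return alpha * long_term + (1.0 - alpha) * short_term

long_term = np.zeros(3)
recent = np.ones((4, 3))          # four identical recent check-in embeddings
fused = fuse_preferences(long_term, recent, gate_logit=0.0)
# alpha = 0.5 and short-term = [1, 1, 1], so fused == [0.5, 0.5, 0.5]
```

A large positive gate makes the user's recommendation follow long-term habits, while a negative gate lets the most recent check-ins dominate.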

4 Common datasets and evaluation metrics

In this section, we will first introduce the commonly used datasets for the next POI recommendation, followed by an overview of the prevalent evaluation metrics in the field.

4.1 Datasets

Previous studies in next POI recommendation have used check-in records from a diverse array of LBSNs, including Foursquare, Gowalla, and Brightkite. These datasets are predominantly tabular, detailing interactions between users and POIs as well as connections between users within the LBSNs. User–POI interaction data typically record user check-ins with timestamps, geographical coordinates, and associated semantic attributes. Semantic details for POIs may include categories, tags from user posts, establishment dates, geospatial data, check-in frequency, and more, whereas user-centric semantic data may cover post counts, friend counts, and check-in frequencies. To preserve social interactions, some datasets, such as those from Foursquare and Gowalla, also include user-to-user connections, forming a network in which each user is linked to their full friend list. The statistics of some commonly used datasets are given in Table 3.

Foursquare¹ collects and distributes location data worldwide and is primarily used for location check-ins and sharing. Many POI recommendation models discussed in recent literature use Foursquare datasets spanning 2010 to 2014. These datasets predominantly feature check-ins from regions such as the USA (e.g., New York) and Japan (e.g., Tokyo). The dataset also includes the full friend list of each user within the LBSN.

Gowalla (Footnote 2), a location-centric social media platform, was established in 2007 and later acquired by Facebook in 2012. Its primary function was location check-ins. Available Gowalla datasets cover check-ins from February 2009 to October 2010. Like Foursquare, the Gowalla dataset includes the friend list of every user, and it is further enriched by detailed descriptions of each POI and user profiles.

Brightkite (Footnote 3) was a location-based social networking website launched in 2007. It allowed users to check in at different locations via text messaging or a mobile app, thereby sharing their whereabouts and seeing who else was nearby or who had been there before. It also enabled users to post notes and photos at these locations, which other users could comment on. The platform was known for its public API, which let developers create applications that integrated with Brightkite’s services.

Table 3 Commonly used datasets and their statistics

4.2 Evaluation metrics

Selecting appropriate metrics to evaluate the performance of different models is essential. For next POI recommendation, the accuracy and relevance of the suggested locations are of paramount importance: users rely on these systems to guide their next visits or activities, so recommendation quality directly affects user satisfaction and system utility. Metrics that evaluate the accuracy of top-k recommendations are therefore widely adopted, as they indicate how well a recommendation system predicts users’ next POIs.

In next POI recommendation tasks, Recall@k measures whether the actual next POI (ground truth) appears within the top-k recommended POIs. For each test instance (i.e., each recommendation made for a user), the model generates a ranked list of the top-k POIs. If the ground-truth POI is in this list, the instance scores 1; otherwise, it scores 0. Recall@k averages these scores over all instances, evaluating the model’s ability to include the correct POI in the top-k recommendations without considering its exact rank within the list.

Accuracy@k (Acc@k) is defined in the same way for next POI recommendation: it measures whether the actual next POI (ground truth) appears within the top-k recommended POIs. Because each instance has exactly one ground-truth POI, Recall@k and Accuracy@k are essentially the same metric in this setting, as Eqs. (1) and (2) show.

$$\begin{aligned} Recall@k=\frac{1}{m} \sum _{i=1}^{m} \mathbbm {1}\left( {\text {rank}}_i \le k\right) \end{aligned}$$
(1)
$$\begin{aligned} Accuracy@k=\frac{1}{m} \sum _{i=1}^{m} \mathbbm {1}\left( {\text {rank}}_i \le k\right) \end{aligned}$$
(2)

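where m is the number of test instances, \(\text{rank}_i\) is the position of the ground-truth POI of the i-th instance in the ranked recommendation list, and \(\mathbbm {1}\) is the indicator function that returns 1 if the condition inside is true and 0 otherwise. Thus, \(\mathbbm {1}(\text{rank}_i \le k)\) indicates whether the ground-truth POI of the i-th instance is within the top-k recommendations.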
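The computation in Eqs. (1) and (2) can be sketched in a few lines of Python. The sketch assumes each instance has already been reduced to the rank of its ground-truth POI, with `math.inf` marking a ground truth absent from the candidate list; the helper `rank_of_target` and its optimistic tie-breaking are illustrative assumptions, since papers handle ties differently.

```python
import math
from typing import Sequence

def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of the ground-truth POI `target` when POIs are sorted by
    descending score. Ties are broken in the target's favour (an assumption)."""
    return 1 + sum(1 for s in scores if s > scores[target])

def hit_at_k(ranks: Sequence[float], k: int) -> float:
    """Eqs. (1)-(2): fraction of instances whose ground truth has rank <= k.
    A rank of math.inf never satisfies the condition, so it counts as a miss."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# With one ground-truth POI per instance, both metrics share one computation.
recall_at_k = hit_at_k
accuracy_at_k = hit_at_k
```

For example, ranks of `[1, 3, 12, math.inf]` give Recall@5 = 0.5 and Recall@1 = 0.25.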

MRR (Mean Reciprocal Rank) measures the position of the correctly recommended POI in the ranked result list. Unlike Recall@k, it accounts for the order of recommendations, giving more credit when the correct item is ranked higher. For each instance, the reciprocal rank is 1 divided by the position of the first correct recommendation; since next POI recommendation has a single ground-truth POI, this is simply the position of that POI. If the correct next POI is ranked first, the reciprocal rank is 1; the lower it is ranked, the closer the reciprocal rank is to 0; and if it does not appear in the recommendation list at all, the reciprocal rank for that instance is 0. MRR averages these values over all instances, which makes it well suited for evaluating how well a recommendation system orders its suggestions.

$$\begin{aligned} MRR = \frac{1}{m} \sum _{i=1}^{m} \frac{1}{{\text {rank}}_i} \end{aligned}$$
(3)

where \(\text{rank}_i\) is the position of the first correctly recommended next POI for the i-th instance in the ranked recommendation list. If the correct next POI does not appear in the list, \(\text{rank}_i = \infty \), and thus \(\frac{1}{\text{rank}_i} = 0\).
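A minimal Python sketch of Eq. (3) follows, again assuming precomputed ranks with `math.inf` for a missing ground truth (Python evaluates `1.0 / math.inf` to `0.0`). The `mrr_at_k` variant, which zeroes out ground truths ranked below position k, is an assumption covering the case where only a top-k list is produced.

```python
import math
from typing import Sequence

def mean_reciprocal_rank(ranks: Sequence[float]) -> float:
    """Eq. (3): average of 1 / rank_i over all m instances."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def mrr_at_k(ranks: Sequence[float], k: int) -> float:
    """Same average, but a ground truth ranked below position k contributes 0."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)
```

With ranks `[1, 2, 4, math.inf]`, MRR = (1 + 1/2 + 1/4 + 0) / 4 = 0.4375.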

NDCG@k (Normalized Discounted Cumulative Gain) evaluates the quality of a ranked list of recommendations. It considers not only whether the recommendations are correct but also their positions in the list: the more relevant the items placed near the top, the higher the NDCG@k score. Unlike metrics that treat recommendations as binary (relevant or not), NDCG@k supports graded relevance, meaning items can have varying degrees of relevance. In next POI recommendation, however, relevance is usually binary, since each instance has a single ground-truth POI.

$$\begin{aligned} NDCG@k=\frac{DCG@k}{IDCG@k} \end{aligned}$$
(4)

where DCG@k (Discounted Cumulative Gain) is calculated as:

$$\begin{aligned} DCG@k=\sum _{i=1}^k \frac{2^{rel_i}-1}{\log _2(i+1)} \end{aligned}$$
(5)

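where \(rel_i\) is the graded relevance of the result ranked at position i. In next POI recommendation, \(rel_i = 1\) if the POI at position i is the ground-truth POI and \(rel_i = 0\) otherwise. IDCG@k (Ideal Discounted Cumulative Gain) is the DCG of the ideal ranking, in which the top-k results are perfectly ordered by relevance. With binary relevance and a set \(R\) of ground-truth POIs, it is calculated as:

$$\begin{aligned} IDCG@k=\sum _{i=1}^{\min (k,|R|)} \frac{1}{\log _2(i+1)} \end{aligned}$$
(6)

Because next POI recommendation has exactly one ground-truth POI (\(|R| = 1\)), \(IDCG@k = 1\), and for an instance whose ground truth is ranked at position \(\text{rank}_i \le k\), NDCG@k reduces to \(1/\log _2(\text{rank}_i+1)\); it is 0 otherwise.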
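The sketch below implements Eqs. (4) and (5) for graded relevance, plus the single-ground-truth shortcut. Function names are hypothetical, and the general version assumes the caller supplies the relevance grades of all relevant items for the instance (`all_rels`), because the ideal ranking may include items the model did not retrieve.

```python
import math
from typing import Sequence

def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """Eq. (5): rels[j] is the graded relevance of the item at position j + 1."""
    return sum((2 ** rel - 1) / math.log2(pos + 1)
               for pos, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels: Sequence[float], all_rels: Sequence[float], k: int) -> float:
    """Eq. (4): DCG of the produced ranking divided by DCG of the ideal ranking."""
    idcg = dcg_at_k(sorted(all_rels, reverse=True), k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0

def ndcg_next_poi(rank: float, k: int) -> float:
    """Single ground truth, binary relevance: IDCG@k = 1, so NDCG@k is
    1 / log2(rank + 1) when the ground truth is in the top k, else 0."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0

def mean_ndcg_next_poi(ranks: Sequence[float], k: int) -> float:
    """Average the per-instance NDCG@k over all test instances."""
    return sum(ndcg_next_poi(r, k) for r in ranks) / len(ranks)
```

For a ground truth at position 3, NDCG@10 = 1/log2(4) = 0.5, and the general and shortcut versions agree on this case.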

4.3 Performance evaluation

In this survey, we did not run experiments to compare the baselines directly. Instead, we report the performance figures published in the original studies. Even on the same dataset, these studies differ in preprocessing methods, experimental conditions, and parameter choices, so the reported numbers should not be compared directly. The reported performance of selected baselines on the Foursquare-Tokyo dataset is shown in Table 4.

Table 4 Performance of different baselines on the Foursquare-Tokyo dataset

5 Future research directions

As the field of GNNs continues to evolve, several promising research directions are emerging for next POI recommendation. We discuss them below:

5.1 Scalability and reliability in GNNs

For recommendation systems dealing with rapidly growing user and location data, the substantial memory requirements and high computational complexity of GNNs become especially pronounced [60]. As user–location interaction graphs grow, efficient GNN algorithms capable of real-time recommendation on large-scale graphs become critical. The problem is compounded by the dynamic nature of these graphs: new POIs are frequently added and outdated ones removed, which demands agile model updates. Future research should therefore develop GNN architectures and training methods that are memory-efficient, computationally lighter, and able to adapt quickly to changes in graph structure. Such advances are pivotal for GNN-based recommendation systems to scale to large, evolving datasets without sacrificing performance or accuracy.

Reliability is crucial in next POI recommendation, and enhancing the interpretability of GNNs is a key factor in improving it. References [61,62,63] address this issue by providing transparent, interpretable decision-making processes that let users and developers understand the basis of the model’s recommendations, thereby increasing trust in the model. Improving the ability of GNNs to handle noise and outliers is equally important, since real-world data often contain noise and anomalies that degrade model performance. Reference [64] discusses ways to make GNNs more robust to these challenges so that they perform well even in noisy, anomalous data environments. Further improving reliability, including developing new techniques tailored to GNN-based next POI recommendation models, is a critical direction for future research.

5.2 Dynamic graph neural networks

Users’ location preferences and behavioral patterns evolve continually, so models must adapt to these changes to provide relevant and timely recommendations. Dynamic GNNs promise significant advances here: unlike static models, they can adapt in real time to changes in user behavior and preferences, offering a more accurate reflection of current needs [65]. Fully harnessing this potential to reflect real-time user behavior and changes in POI status, however, still requires further research. Integrating dynamic GNNs into online POI recommendation systems that consume real-time data streams is a promising approach, but such systems must be not only responsive to the immediate context but also scalable and efficient in processing high-velocity data. Future research should therefore seek a balance between adaptability and computational efficiency, enabling recommendation systems to provide timely, relevant, and personalized suggestions that reflect users’ latest behaviors and preferences.

5.3 Data privacy

Addressing user privacy is paramount for building trust in many intelligent systems [66, 67], and next POI recommendation systems are no exception. Users’ hesitance to share GPS traces stems from valid concerns about privacy invasion, as such data can unintentionally reveal sensitive personal information [68]. Because these systems depend on analyzing large volumes of sensitive user data, they pose substantial privacy risks, and when users withhold their data in response, the accuracy of next POI recommendations suffers. Advanced privacy-preserving techniques, such as decentralized collaborative learning, could mitigate these concerns and thereby enable more accurate recommendations.

Addressing privacy concerns thus improves user trust and participation, which in turn improves the quality and reliability of recommendation systems. One noteworthy approach is DCLR [69], which lets users train models locally. This reduces reliance on cloud-based systems and significantly enhances privacy by keeping sensitive data on the user’s device. Better privacy protection in next POI recommendation systems can also help address data sparsity and improve model personalization, since users become more willing to contribute their data. Balancing high-quality recommendations with user privacy remains a critical focus for future research.

5.4 Deployment on mobile devices

GNN-based next POI recommendation on mobile devices is a promising future direction: it can leverage location-based social networks for personalized service while overcoming the privacy concerns and computational limitations of centralized systems [48]. On-device recommendation reduces reliance on centralized servers and delivers personalized, context-aware recommendations without sharing sensitive data.

6 Conclusion

Next POI recommendation has become increasingly important in smart cities, driven by the spread of mobile technologies and LBSNs. Given the inherent strengths of GNNs in processing graph data, there has been growing interest in applying GNN techniques to next POI recommendation. In this review, we explored the latest advances in GNN-based next POI recommendation. We analyzed the characteristics of the next POI recommendation problem and categorized them to understand how different studies exploit them with GNNs. We also summarized common graph construction methods for GNNs in this context, and we provided an overview of frequently used datasets and evaluation metrics to give researchers a clearer understanding of the field. Finally, we identified several future research directions, including GNN scalability and reliability, dynamic architectures, and pressing concerns such as data privacy. In essence, this survey offers a comprehensive view of the state of the art in GNN-based next POI recommendation while highlighting potential directions for future innovation in smart city applications.