1 Introduction

With the widespread use of smartphones and mobile wearable devices, location-based social networks (LBSNs) such as Foursquare, Gowalla, and Dianping have developed rapidly, allowing people to easily share their check-in information. However, the tremendous amount of location information also introduces the challenge of information selection when using LBSNs. One important application of recommendation systems on LBSNs is to recommend the geo-locations and services that users may be interested in based on location check-in data (Wang et al. 2017), comments, and other information (Zeng et al. 2017). As a powerful tool for handling such massive data (Li et al. 2020), point-of-interest (POI) recommendation systems help users select the locations they may be interested in and filter out useless information, and they also promote tourism planning, business marketing, and scenic tourism services (Uddin and Habibullah 2020). The innovation of POI recommendation algorithms has become a research focus in recent years, and many researchers have undertaken efforts to improve algorithm performance (Liao et al. 2018). Popular POIs, concentrated check-in times, and POI categories with a high check-in frequency can be identified based on users’ check-in data (Zhang et al. 2017). By combining such information with users’ profiles (Kurashima et al. 2013), POI location and category (Li et al. 2016a, b), social contacts, and comments, various hybrid recommendation models can provide personalized POI recommendations to users (Nassar et al. 2020).

1.1 POI recommendation with temporal information

Time is the most important situational factor considered in the research of POI recommendation. Users’ check-in activities have a time sequence, which reflects the temporal attributes of user preferences and needs (Chen et al. 2016). Latent factor models are widely used in recommender systems because most data of a recommender system, such as ratings, impacts, and link hit-counts, can be expressed as the interaction of a user latent vector and an item latent vector (Luo et al. 2019a, b). Latent factor models and the Markov method are also widely used in sequence data analysis and recommendation tasks (Luo et al. 2016). The factorizing personalized Markov chain (FPMC) proposed by Rendle et al. (2010) combines matrix factorization and Markov chains for next-basket recommendation. On this basis, Cheng et al. (2016) added restrictions on user activities to the Markov chains in FPMC and applied the modified model to next-POI recommendation.

Latent factor models with the item’s profile property mainly use matrix factorization to generate a weighted matrix factorization model (Luo et al. 2019a, b). The combination of temporal property and matrix factorization usually divides user check-ins into different time intervals and obtains a check-in matrix for each interval. Next, the probability score of the candidate POI in each time interval is calculated via matrix factorization. The total score, which is the sum of the scores in all time intervals, is used to make a recommendation. The four-dimensional tensor decomposition model proposed by Li et al. (2017) incorporates users’ long- and short-term preferences with the check-in matrix derived from temporal factorization and considers the POI category to reduce the data sparsity and cold-start problems that would occur if only the check-in sequence data were used.

Recently, recurrent neural networks (RNNs) have been successfully applied in sequence data modeling and have become a popular method for sequence data analysis and recommendation (Li et al. 2017). In particular, they can significantly alleviate the effect of data sparsity in the matrix factorization model (Luo et al. 2015). The approach of Quadrana et al. (2017), which uses RNNs in session-based recommendation tasks, depends on the user’s session data for recommendation and does not require a user ID or embedding to characterize users. Similarly, researchers have proposed various POI recommendation methods that use the tracking of user check-in POIs rather than user ID and embedding to characterize users for recommendation (Zeng et al. 2017). Based on the long short-term memory network (LSTM), Zhao et al. (2018) proposed a time-LSTM network architecture for recommendation problems that involve time factors; it sets a “control gate” for temporal information in check-in data and constructs a time-segment-based model for “next-item” recommendation. On this basis, Xia et al. (2017) further improved RNN neurons and combined the RNNs with the attention network to determine the spatio-temporal relationships between users’ long-term and short-term preferences and check-in behavior.

1.2 POI recommendation with geographical information

In contrast to sequence-based recommendation for purposes such as shopping and natural language, POI recommendation has geo-location information as a distinct and important feature. Geo-location information contains the objective physical relation between POIs and is closely associated with the activities of users. Many studies have used geo-location information to improve the quality of recommendations. The commonly used geo-location-based modeling methods for POI recommendation include the power law distribution method, kernel density estimation (KDE) (Zhang and Chow 2015a, b, 2016), and LBSN-based methods (Zhang et al. 2014a, b). In various studies, the distance between the user’s current location and the candidate recommended POI location has been calculated to obtain a candidate set of users’ check-in distance preferences based on the power law distribution of user check-in behavior. Some studies considered the nearby POIs of a candidate POI, found using the distance between two POIs, in the matrix factorization and determined the geographical effect based on the weight of the location, i.e., the proximity. This method is called weighted matrix factorization (Liu et al. 2014). The GeoMF method (Lian et al. 2014) divides the impact on user check-in behavior at a particular POI into users’ impact and POIs’ impact, describing the scope of a user’s activity area in order to determine both the users’ impact on the visits of POIs and the geographical impact on the users’ check-in behavior. The relation between a user and a POI is represented by the vector product, and the final POI score is the sum of the vector products derived from matrix factorization. KDE can model the check-in preference of a user in two-dimensional (2D) space. It describes the geographical distribution of the user’s visits in a specific area based on the longitude and latitude coordinates of the check-in POIs and estimates the user’s check-in probability based on the kernel function (Zhang et al. 2014a, b).

Most LBSN-based POI recommendation systems, including USG (Ye et al. 2011), iGSLR (Zhang and Chow 2013), LORE (Zhang et al. 2014a, b), and Geo-So-Ca (Zhang and Chow 2015a, b), use hybrid methods that integrate geographic location, POI category, and social relations of users. For example, Geo-PFM (Liu et al. 2015) starts from the consideration that the user’s check-in behavior is mainly related to user preference and location; it constructs a 2D POI check-in distribution and uses different polynomial functions to represent user check-in behavior based on the geographical information of all POIs in the user’s check-in records. In ASMF (Li et al. 2016a, b), the final POI score of a user is derived from a comprehensive consideration of the check-in records of the user’s neighbors at the POI, the user’s POI category preference shown in her/his past check-in records, and the distance between the candidate POI and the user’s “home”.

In this paper, a POI recommendation method that considers multiple spatio-temporal factors is proposed. Using users’ check-in sequences, POI locations, and POI category data, the correlation score between a user’s candidate POI and historical visits is calculated to generate a Top-N recommendation. LSTM neural networks are used to construct a time-based check-in behavior model for users’ periodic, repetitive, and complex daily activities. After extracting the information regarding the user’s long- and short-term preferences from the user’s historical check-in sequence data, the model predicts the probability that the candidate POI conforms to the user’s historical preference, namely, the probability that the user will visit the candidate POI in the future; this is the probability estimated with temporal information. Regarding the geographical clustering of users’ check-in behavior, this paper uses the KDE method to build a 2D spatial model of the user’s POI location preference, which demonstrates and quantifies the user’s preference for geographical location and provides a spatial-location-based prediction of check-in probability. In addition, we construct a model of a user’s personal preference on POI category to calculate the user’s visit probability based on POI categories. Adding the factors of POI location and category preferences into the neural network model based on check-in sequence helps to resolve the sparsity problem of check-in data. Moreover, this paper offers a comprehensive user preference model constructed from three dimensions (time, location, and category) to achieve more precise and personalized recommendations.

The contributions of this paper include the following. (1) In addition to users’ check-in sequence, the recommendation model uses the POI location (longitude and latitude) and POI category as inputs, which alleviates the data sparsity problem caused by using only the user-POI check-in sequence. The experimental results also show that using POI location and POI category data can greatly improve the precision and recall of recommendation. (2) In addition to sequence-based check-in probability prediction by an LSTM neural network, the proposed model predicts the visit probability based on users’ location and POI category preferences. User preference is interpreted from the perspectives of time, POI location, and POI category, and a recommendation in line with user preference is generated, enhancing the understandability of the comprehensive recommendation model based on deep learning and the interpretability of the recommendation results. (3) This study derives two conclusions that have reference value for POI recommendation research. First, although check-in behavior has a clustering phenomenon, a user’s location preference is stable; therefore, both long- and short-term location preferences determined from historical check-ins have very few fluctuations and can be used to improve recommendation performance. Second, the POI category preference of a user is prone to “drift”; as a result, the category preference obtained from historical check-ins is unstable and can improve only the performance of “Top-1” recommendation.

2 The proposed recommendation algorithm

This paper proposes an LSTM-based POI recommendation model that considers multiple situational factors of time, location, and category. The framework of the model is shown in Fig. 1. First, mathematical models of check-in sequence, POI location, and POI category are constructed to predict the check-in probability of users in different dimensions. The LSTM network is used to mine the behavioral pattern of a user in terms of check-in sequence and predict the user’s check-in probability at a candidate POI in the temporal dimension. KDE is applied to describe the spatial distribution of the user’s visits and construct a personalized location preference model for check-in probability prediction at the candidate POI in the spatial dimension. The proposed method also calculates the category preference to predict the probability that the user will visit a candidate POI in the category dimension. Finally, the comprehensive user check-in probability \({\text{Score}}_{k}\) at the candidate POI k, i.e., the weighted sum of the check-in probabilities in the above three dimensions given by Eq. (1), is determined and used to generate a Top-N recommendation list. The performance of the proposed algorithm is evaluated using two indicators, precision and recall. In addition, the sensitivities of time, location, and category in the improvement of the recommendation effect are investigated. The notations used in this paper are listed in Table 1.

$$ {\text{Score}}_{k} = \alpha \cdot {\text{Score}}_{{{{time}}}} + \beta \cdot {\text{Score}}_{{{{location}}}} + \gamma \cdot {\text{Score}}_{{{{category}}}} , $$
(1)

where α, β, γ represent the coefficients used in the integration of the probability scores in different dimensions.
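
As an illustration of Eq. (1), the following minimal Python sketch combines the three per-dimension scores and ranks candidate POIs to produce a Top-N list. The function names (`combine_scores`, `top_n`), the default weights of 1.0, and the dictionary layout of the candidates are assumptions made for this example; the paper does not prescribe an implementation or specific values of α, β, and γ here.

```python
from typing import Dict, List, Tuple


def combine_scores(
    time_score: float,
    location_score: float,
    category_score: float,
    alpha: float = 1.0,  # hypothetical default weight
    beta: float = 1.0,   # hypothetical default weight
    gamma: float = 1.0,  # hypothetical default weight
) -> float:
    """Weighted sum of the three per-dimension scores, as in Eq. (1)."""
    return alpha * time_score + beta * location_score + gamma * category_score


def top_n(
    candidates: Dict[str, Tuple[float, float, float]],
    n: int,
    **weights: float,
) -> List[str]:
    """Rank candidate POIs by combined score and keep the best n.

    `candidates` maps a POI id to (time, location, category) scores.
    """
    scored = {
        poi: combine_scores(*scores, **weights)
        for poi, scores in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:n]


candidates = {
    "poi_a": (0.9, 0.1, 0.0),
    "poi_b": (0.2, 0.8, 1.0),
    "poi_c": (0.1, 0.1, 0.0),
}
print(top_n(candidates, 2))
```

Setting one coefficient to zero removes the corresponding dimension, which is one simple way to reproduce ablations such as using the temporal score alone.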

Fig. 1 Framework of the proposed recommendation model

Table 1 Notations

2.1 LSTM-based check-in probability prediction in the temporal dimension

The time sequence of a user’s check-ins is a check-in track from the past to the present. Each segment of the track represents the user’s status during the corresponding period. For example, in the complete check-in track of user u1 consisting of the check-ins (x1, x2, x3, …, xn-1, xn), there are (x1, x2, x3, …, xn-1) → xn at time t = n and (x1, x2, x3, …, xn-2) → xn-1 at time t = n-1. Each user check-in track corresponds to a conditional probability determined by its prefix sequence and its last check-in. This probability is given by Eq. (2):

$$ P\left( {x_{t} } \right) = P\left( {x_{t} \left| {X_{t - 1} } \right.} \right) = P\left( {x_{t} \left| {\left( {x_{1} ,x_{2} \cdots x_{t - 1} } \right)} \right.} \right). $$
(2)
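
The decomposition of a track into prefix and next check-in can be sketched as follows. This is a minimal illustration assuming that every prefix of a track is used as one sample; the function name `prefix_target_pairs` is hypothetical, and the paper does not state whether all prefixes or only some of them are used for training.

```python
from typing import List, Sequence, Tuple


def prefix_target_pairs(track: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Split a check-in track into (prefix, next check-in) pairs.

    For a track (x1, ..., xn) this yields (x1) -> x2, (x1, x2) -> x3, ...,
    (x1, ..., x_{n-1}) -> xn, one pair per conditional probability in Eq. (2).
    """
    return [(list(track[:t]), track[t]) for t in range(1, len(track))]


# POI ids are arbitrary integers in this example.
for prefix, target in prefix_target_pairs([3, 7, 7, 2]):
    print(prefix, "->", target)
```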

According to the time sequence of check-ins, the user’s current check-in is affected by the prefix sequence. The user’s personal preference, real-time needs, and changes are reflected in all check-in segment sets; hence, the user’s historical check-ins are not independent of each other. By the chain rule, the joint probability factorizes into the conditional probabilities of Eq. (2) and, in general, differs from the product of the marginal probabilities, as expressed by Eq. (3):

$$ P\left( {x_{1} ,x_{2} \cdots x_{T} } \right) = \prod\limits_{i = 1}^{T} {P\left( {x_{i} \left| {X_{i - 1} } \right.} \right)} \ne \prod\limits_{i = 1}^{T} {P\left( {x_{i} } \right)} . $$
(3)

In this paper, LSTM is used to mine the personal attributes in a user’s check-in sequence. After processing with the basic neurons of LSTM, a state vector (the candidate cell state) \(\mathop {c_{t} }\limits^{ \sim }\) and a state value (the cell state) ct are obtained at each POI on the user’s check-in track; their transfer functions are given by Eqs. (4) and (5). The state vector \(\mathop {c_{t} }\limits^{ \sim }\) represents the temporal feature of the current check-in extracted by the neural unit. The state value ct is passed to the processing step of the next check-in as a summary of this feature, which represents the impact of the current check-in on the subsequent check-in.

$$ \mathop {c_{t} }\limits^{ \sim } = \tanh \left( {W_{c} x_{t} + U_{c} h_{t - 1} + b_{c} } \right), $$
(4)
$$ c_{t} = f_{t} \odot c_{t - 1} + i_{t} \odot \mathop {c_{t} }\limits^{ \sim } . $$
(5)

In the basic neurons of LSTM, the memory and forgetting of the current location and prefix sequence are controlled via the input gate it, the forget gate ft, and the output gate ot. The parameters are determined during the training process, and the time feature vector (history feature) ht containing check-in preference information is extracted from the check-in track. The vectors are updated according to Eqs. (6) to (9), where Wi, Wf, and Wo are the input weight matrices of the input gate, forget gate, and output gate; Ui, Uf, and Uo are the corresponding recurrent weight matrices applied to the previous history feature ht-1; bi, bf, and bo are the corresponding biases; σ is the sigmoid function; and ⊙ denotes element-wise multiplication.

$$ i_{t} = \sigma \left( {W_{i} x_{t} + U_{i} h_{t - 1} + b_{i} } \right), $$
(6)
$$ f_{t} = \sigma \left( {W_{f} x_{t} + U_{f} h_{t - 1} + b_{f} } \right), $$
(7)
$$ o_{t} = \sigma \left( {W_{o} x_{t} + U_{o} h_{t - 1} + b_{o} } \right), $$
(8)
$$ h_{t} = o_{t} \odot \tanh \left( {c_{t} } \right). $$
(9)
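
To make the update order of Eqs. (4) to (9) concrete, the following pure-Python sketch performs one LSTM step and then encodes a whole embedded check-in track. It is an illustration only: the parameter layout (a dictionary mapping "i", "f", "o", and "c" to a tuple (W, U, b)), the zero initial states, and the helper names are assumptions for this example, and a practical model would use a deep learning framework rather than nested lists.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Gate = Tuple[Matrix, Matrix, Vector]  # (W, U, b) for one gate


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(gate: Gate, x_t: Sequence[float], h_prev: Sequence[float]) -> Vector:
    """Compute W x_t + U h_{t-1} + b row by row."""
    W, U, b = gate
    return [
        sum(w * x for w, x in zip(w_row, x_t))
        + sum(u * h for u, h in zip(u_row, h_prev))
        + b_j
        for w_row, u_row, b_j in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params: Dict[str, Gate]):
    """One LSTM step; returns the new history feature h_t and cell state c_t."""
    i_t = [sigmoid(z) for z in affine(params["i"], x_t, h_prev)]        # Eq. (6)
    f_t = [sigmoid(z) for z in affine(params["f"], x_t, h_prev)]        # Eq. (7)
    o_t = [sigmoid(z) for z in affine(params["o"], x_t, h_prev)]        # Eq. (8)
    c_tilde = [math.tanh(z) for z in affine(params["c"], x_t, h_prev)]  # Eq. (4)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]  # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                  # Eq. (9)
    return h_t, c_t


def encode_track(embedded_track, params: Dict[str, Gate], hidden_dim: int) -> Vector:
    """Run the LSTM over an embedded check-in track and return the last h_t."""
    h = [0.0] * hidden_dim  # assumed zero initial history feature
    c = [0.0] * hidden_dim  # assumed zero initial cell state
    for x_t in embedded_track:
        h, c = lstm_step(x_t, h, c, params)
    return h
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so each step simply halves the cell state; this gives a quick sanity check of the gating logic.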

For any training sample X = <H, T> composed of an actual check-in sequence H and a candidate POI T, there is a corresponding tag y. If T is the user’s actual check-in POI after H, then y = 1; otherwise, y = 0. The probability that the user visits the candidate POI is \({\text{Score}_{time}} = P\left( {\mathop y\limits^{ \wedge } = 1\left| {\left\langle {H,T} \right\rangle } \right.} \right)\), which is the probability of the case y = 1 in a binary classification problem. First, the historical check-in behavior matrix WeH and the vector WeT of the candidate POI are obtained through embedding. After the time feature extraction of LSTM from WeH, the user’s historical check-in feature vector is obtained and concatenated with the candidate POI vector. Next, multiple fully connected layers and a sigmoid activation function are applied to calculate the predicted tag \(\mathop y\limits^{ \wedge }\) between the candidate POI and the historical check-ins, as given by Eq. (10).

$$ \mathop y\limits^{ \wedge } = sigmoid\left( {concat\left( {LSTM\left( {W_{e} H} \right),W_{e} T} \right)} \right). $$
(10)
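
The concatenation and sigmoid in Eq. (10) can be sketched as below. For brevity the sketch assumes that the LSTM history feature has already been computed and replaces the multilayer fully connected network with a single dense layer; the function name `time_score`, the weight vector, and the bias are hypothetical.

```python
import math
from typing import Sequence


def time_score(
    history_feature: Sequence[float],
    candidate_embedding: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Concatenate both vectors, apply one dense layer, then a sigmoid.

    Simplification: the paper uses several fully connected layers; a single
    layer is assumed here to keep the example short.
    """
    features = list(history_feature) + list(candidate_embedding)  # concat
    if len(weights) != len(features):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


print(time_score([0.2, -0.1], [0.4, 0.3], [1.0, 1.0, 1.0, 1.0]))
```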

The LSTM model updates its parameters based on the error between the predicted and actual check-in tags. Iterative training is performed to minimize the objective function L and optimize the model parameters until the model converges and achieves the expected prediction precision. Cross entropy loss is used as the objective function and the loss function is specified as Eq. (11).

$$ L = - \sum\limits_{i = 1}^{n} {\left[ {y_{i} \log \mathop {y_{i} }\limits^{ \wedge } + \left( {1 - y_{i} } \right)\log \left( {1 - \mathop {y_{i} }\limits^{ \wedge } } \right)} \right]} . $$
(11)
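
For reference, the standard binary cross-entropy between the actual tags y and the predicted probabilities ŷ can be computed as in the sketch below. The clipping constant `eps` is an assumption added for numerical safety and is not part of the paper.

```python
import math
from typing import Sequence


def cross_entropy_loss(
    y_true: Sequence[int],
    y_pred: Sequence[float],
    eps: float = 1e-12,  # hypothetical clipping constant
) -> float:
    """Summed binary cross-entropy over all training samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total


print(cross_entropy_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```

The loss is zero only when every predicted probability matches its tag, and it grows without bound as a confident prediction moves toward the wrong tag.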

2.2 KDE-based check-in probability prediction in the spatial dimension

The localized nature of a user’s interests and activities leads to geographical “clustering” of the user’s check-ins. Figure 2 shows two users’ actual check-in records in the Foursquare dataset. The left panel shows that all check-in activities of the user are clustered in the centers of two connected circular distribution areas; the right panel shows only one obviously clustered distribution area, and the visited locations in the other two areas are scattered. The following can be seen in the figure. First, analysis of the check-in location clustering can provide an important reference for determining the correlation between a candidate POI and historical check-ins. A candidate POI closer to the center of the cluster is more likely to be visited. Second, selecting a candidate POI based on only the distance between the candidate POI and historical check-ins might introduce errors due to missing information, because the relationship between the user’s historical check-ins and the candidate POI is better represented in 2D space. Third, the user’s check-in location distribution shows a distinct characteristic of personalization; therefore, user-level geographical preference modeling can significantly improve the quality of personalized POI recommendation.

Fig. 2 Example of user preference on POI location

Based on the study of Zhang and Chow (2014), this paper applies the KDE method with a bivariate (two-dimensional) Gaussian kernel to fit the check-in location distribution at the user level. The check-in probability density function on a 2D plane is obtained using the historical check-in data of the target user, and then the check-in probability of the candidate POI is calculated.

The location set Lu, given by Eq. (12) and expressed by the latitudes and longitudes of POIs, is obtained from the user’s historical check-in records. For a given candidate POI, the user’s check-in probability \({\text{Score}_{location}} = f_{Geo} \left( {l\left| u \right.} \right)\) is approximated by the user’s check-in frequency at its nearby POIs, as given by Eq. (13).

$$ L_{u} = \left\{ {l_{1} ,l_{2} ,l_{3} , \cdots ,l_{n} } \right\} = \left\{ {\left( {x_{1} ,y_{1} } \right),\left( {x_{2} ,y_{2} } \right),\left( {x_{3} ,y_{3} } \right), \cdots ,\left( {x_{n} ,y_{n} } \right)} \right\}, $$
(12)
$$ f_{Geo} \left( {l\left| u \right.} \right) = \frac{1}{N}\sum\limits_{i = 1}^{n} {\left( {C_{{u,l_{i} }} \cdot K_{{Hh_{i} }} \left( {l - l_{i} } \right)} \right)} ,\quad N = \sum\limits_{i = 1}^{n} {C_{{u,l_{i} }} } , $$
(13)

where n is the total count of historical check-in locations of user u, \(C_{u,l_{i}}\) is the user’s check-in frequency at location li, \(K_{Hh_{i}}\) is the kernel function given by Eq. (14), which is calculated using the standard deviations of the longitudes and latitudes of all POIs, and H1 and H2 are two global coefficients calculated according to the longitudes and latitudes of POIs (Zhang and Chow 2015a, b).

$$ K_{{Hh_{i} }} \left( {l - l_{i} } \right) = \frac{1}{{2\pi H_{1} H_{2} h_{i}^{2} }}exp\left( { - \frac{{\left( {x - x_{i} } \right)^{2} }}{{2H_{1}^{2} h_{i}^{2} }} - \frac{{\left( {y - y_{i} } \right)^{2} }}{{2H_{2}^{2} h_{i}^{2} }}} \right). $$
(14)

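The local bandwidth coefficient hi and the location coefficient g represent the personalized check-in characteristics of user u and make the kernel function self-adaptive to user u for probability estimation. They are given by Eqs. (15) and (16), where \(\mathop {f_{Geo} }\limits^{ \sim } \left( {l_{i} \left| u \right.} \right)\) is a preliminary (pilot) estimate of the density at li and g is the geometric mean of these estimates. The exponent α in Eq. (15) is a sensitivity parameter of the local bandwidth and is distinct from the weight α in Eq. (1).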

$$ h_{i} = \left( {g^{ - 1} \cdot \mathop {f_{Geo} }\limits^{ \sim } \left( {l_{i} \left| u \right.} \right)} \right)^{ - \alpha } , $$
(15)
$$ g = \sqrt[n]{{\prod\limits_{i = 1}^{n} {\mathop {f_{Geo} }\limits^{ \sim } \left( {l_{i} \left| u \right.} \right)} }} $$
(16)
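
The following pure-Python sketch shows one way to evaluate Eqs. (13) to (16) for a single user. Several choices are assumptions made for this example and are not taken from the paper: the pilot estimate is computed with all local bandwidths set to 1, the sensitivity parameter defaults to 0.5, the global coefficients H1 and H2 are supplied by the caller, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) coordinates of a POI


def gaussian_kernel(l: Point, l_i: Point, H1: float, H2: float, h_i: float) -> float:
    """Bivariate Gaussian kernel of Eq. (14)."""
    dx, dy = l[0] - l_i[0], l[1] - l_i[1]
    norm = 2.0 * math.pi * H1 * H2 * h_i ** 2
    expo = -(dx ** 2) / (2.0 * H1 ** 2 * h_i ** 2) - (dy ** 2) / (2.0 * H2 ** 2 * h_i ** 2)
    return math.exp(expo) / norm


def density(l: Point, locations: Sequence[Point], counts: Sequence[int],
            H1: float, H2: float, local_bw: Sequence[float]) -> float:
    """Frequency-weighted kernel density of Eq. (13)."""
    total = sum(counts)
    return sum(
        c * gaussian_kernel(l, l_i, H1, H2, h_i)
        for l_i, c, h_i in zip(locations, counts, local_bw)
    ) / total


def local_bandwidths(locations: Sequence[Point], counts: Sequence[int],
                     H1: float, H2: float, alpha: float = 0.5) -> List[float]:
    """Local bandwidths h_i of Eq. (15), with g taken from Eq. (16)."""
    ones = [1.0] * len(locations)  # assumed pilot setting: h_i = 1 everywhere
    pilot = [density(l_i, locations, counts, H1, H2, ones) for l_i in locations]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))  # geometric mean
    return [(p / g) ** (-alpha) for p in pilot]


def location_score(candidate: Point, locations: Sequence[Point],
                   counts: Sequence[int], H1: float, H2: float,
                   alpha: float = 0.5) -> float:
    """Check-in probability density of the candidate POI for one user."""
    h = local_bandwidths(locations, counts, H1, H2, alpha)
    return density(candidate, locations, counts, H1, H2, h)


visited = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0)]
freq = [3, 2, 1]
print(location_score((0.0, 0.05), visited, freq, 1.0, 1.0))
```

Locations in dense regions receive a pilot estimate above the geometric mean and therefore a local bandwidth below 1, so the kernel is narrower where the user checks in often and wider around isolated check-ins.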

2.3 Category preference modeling based on check-in frequency

A user’s check-in preference includes the POI category preference. In the user’s check-in POI category set, the check-in frequency of each category is obtained. A higher check-in frequency of a POI category indicates that the user more strongly prefers that POI category and the POI category is more important to the user. The preference vector Uc of the POI category is obtained by Eq. (17), and the number of dimensions of this vector is the same as the total number of POI categories (Fig. 3).

$$ U_{ci} = \frac{{\sum\nolimits_{k = 1}^{n} {u_{ki} } }}{{\sum\nolimits_{i = 1}^{m} {\sum\nolimits_{k = 1}^{n} {u_{ki} } } }}, $$
(17)

where uki takes the value 1 or 0 to indicate whether the POI of the kth check-in of user u belongs to category i, n is the number of check-ins of user u, and m is the total number of POI categories.

Fig. 3 Example of user preference on POI category. All POIs are grouped into nine categories in Foursquare: 1: nightlife spot, 2: food, 3: shop and service, 4: travel and transport, 5: art and entertainment, 6: professional and other places, 7: college and university, 8: outdoors and recreation, and 9: residence. We use a 9-dimensional vector to represent the POI category and the user’s category preference, respectively.

The POI category is represented by a one-hot vector Pc. Uc and the category vector Pc of the candidate POI have the same dimensions. The inner product of the two vectors gives the user’s historical check-in frequency in the same category as the candidate POI, which is the probability estimate \({\text{Score}_{category}} = U_{c} \cdot P_{c}\) of the user’s check-in at the candidate POI.
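
The category preference vector of Eq. (17) and the inner product with the one-hot category vector can be sketched as follows. The sketch assumes 0-based category indices and exactly one category per check-in; the function names are hypothetical.

```python
from typing import List, Sequence


def category_preference(checkin_categories: Sequence[int], num_categories: int) -> List[float]:
    """Normalized check-in frequency per category, as in Eq. (17).

    `checkin_categories` holds the (0-based) category index of each check-in.
    """
    counts = [0] * num_categories
    for cat in checkin_categories:
        counts[cat] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * num_categories  # user without check-ins
    return [c / total for c in counts]


def one_hot(category: int, num_categories: int) -> List[int]:
    """One-hot category vector of a POI."""
    return [1 if i == category else 0 for i in range(num_categories)]


def category_score(preference: Sequence[float], poi_one_hot: Sequence[int]) -> float:
    """Inner product of the preference vector and the POI category vector."""
    return sum(u * p for u, p in zip(preference, poi_one_hot))


pref = category_preference([1, 1, 2, 0], 3)
print(pref, category_score(pref, one_hot(1, 3)))
```

Because the POI vector is one-hot, the inner product simply selects the user's preference for the candidate's category.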

3 Experiment and result analysis

3.1 Data analysis

The real-world Foursquare check-in dataset was used to validate the personalized POI recommendation method proposed in this paper. The check-in records in the Foursquare dataset were from March 2012 to October 2013, and the check-in data in the two cities of New York and San Francisco were used in our experiments. In the two sub-datasets of NY (New York) and SF (San Francisco), the total check-in counts show a similar fluctuation trend over time, as shown in Fig. 4. To minimize the interference of irrelevant factors, this study used only the data from October 2012 to June 2013 in the two datasets, when the total check-in counts were relatively stable. In addition, users with fewer than 10 check-ins and POIs with fewer than five check-ins during the study period were discarded to reduce data sparsity. The data details are presented in Table 2.
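
The filtering step can be sketched as follows. Because removing sparse POIs can push a user below the threshold and vice versa, this sketch repeats the filter until nothing changes; that iteration is an assumption, since the paper does not state whether the thresholds were applied once or repeatedly. The record layout `(user_id, poi_id)` and the function name are also hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

CheckIn = Tuple[str, str]  # (user_id, poi_id), an assumed record layout


def filter_sparse(
    checkins: Sequence[CheckIn],
    min_user: int = 10,
    min_poi: int = 5,
) -> List[CheckIn]:
    """Drop users with fewer than min_user and POIs with fewer than min_poi check-ins."""
    kept = list(checkins)
    while True:
        user_counts = Counter(user for user, _ in kept)
        poi_counts = Counter(poi for _, poi in kept)
        filtered = [
            (user, poi)
            for user, poi in kept
            if user_counts[user] >= min_user and poi_counts[poi] >= min_poi
        ]
        if len(filtered) == len(kept):  # stable: no record was removed
            return filtered
        kept = filtered


data = [("u1", "a")] * 3 + [("u2", "a"), ("u2", "b")]
print(filter_sparse(data, min_user=2, min_poi=2))
```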

Fig. 4 Total check-in counts in NY and SF

Table 2 Description of experimental datasets

Figure 5 shows the distribution and cumulative distribution of the length of the user check-in sequence in the two sub-datasets. In NY, the maximum length of the user check-in track is 1144, with an average length of 43. A cutoff length of 200 fully retains the check-in information of more than 85% of users. In SF, the maximum length of the user check-in track is 1041, with an average length of 27. A cutoff length of 120 fully retains the check-in information of 85% of users. To retain as much check-in information as possible while minimizing the impact of data sparsity on calculation, “padding” was applied to user check-in sequences of different lengths. For NY, the fixed length for “padding” was 200. Check-in sequences with fewer than 200 check-ins were padded with zeros, and for those with more than 200 check-ins, only the first 200 check-in POIs were included. The fixed length for padding in SF was 120.
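
The padding and truncation described above can be sketched in a few lines. The sketch assumes that zeros are appended after the real check-ins and that 0 is reserved as a padding id, so that real POI ids start at 1; the paper does not state on which side the zeros are added, and the function name is hypothetical.

```python
from typing import List, Sequence


def pad_or_truncate(track: Sequence[int], length: int, pad_id: int = 0) -> List[int]:
    """Keep the first `length` check-ins and right-pad shorter tracks with pad_id."""
    clipped = list(track[:length])
    return clipped + [pad_id] * (length - len(clipped))


print(pad_or_truncate([5, 6], 4))           # short track: padded
print(pad_or_truncate([1, 2, 3, 4, 5], 3))  # long track: truncated
```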

Fig. 5 Distribution and cumulative distribution of check-in length in NY and SF

3.2 Result analysis

3.2.1 Comparative experiment and evaluation indicators

We selected three models for comparison to verify the performance of the proposed method in improving recommendation quality. To minimize the interference on the results caused by differences in the amount of information used, the location and category factors were added to the original models used in previous research. The three models used for a parallel comparison with the model proposed in this paper are described as follows:

(1) Deep neural networks (DNN). DNN uses the deep section of the wide and deep model for pop-up recommendation. The model gives recommendations in pop-up windows based on a user’s click sequence. In the deep section of the model, fully connected networks are used to abstract the features of a user’s “query” and “item” and calculate the matching degree between the two. The model is described in De and Kao (2019).

(2) The deep context-aware POI recommendation model (DCPR). DCPR is a location recommendation model based on check-in sequences. It mainly consists of three collaborative layers. The first is the convolutional neural network (CNN) layer for POI feature mining. The second is the RNN layer that can extract sequential dependency and can be used to construct a user preference model. The last layer is an interactive layer based on matrix factorization to optimize the overall model using a stochastic gradient descent algorithm. The model and its experimental parameters are described in Wang et al. (2017).

(3) Gated recurrent units (GRU). GRU is a variant of RNN that achieves superior performance among RNNs. Its architecture is simpler than that of LSTM, with only two gates, the update gate and the reset gate. Some parameters used in this study were based on Van and Fellow (2014), whose representative GRU has been widely used for comparisons with other RNN models.

To evaluate the impacts of the time, location, and category factors on the quality of location recommendation, the proposed recommendation method was analyzed in depth by investigating the impacts of not only each individual factor but also different combinations of the factors. Three methods were used progressively for comparison:

  • LSTM. LSTM networks and users’ check-in data (only user ID and location ID) were used for POI recommendation based on check-in sequence.

  • LSTM + GEO. LSTM networks and KDE were used for POI recommendation based on both the check-in sequence and geo-location data.

  • LSTM + GEO + CATE. LSTM networks and KDE were used to give POI recommendations based on check-in sequence, geo-location, and POI category. This is the final recommendation method proposed in this paper.

The precision and recall of the recommendation lists derived from the above methods were compared to evaluate the performance of different algorithms. The experimental results are shown in Sect. 3.2.2.
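
The two indicators can be computed per user as in the sketch below, which uses the common definitions of precision@N (hits divided by N) and recall@N (hits divided by the number of POIs the user actually visited in the test period). The paper does not spell out its formulas, so these definitions and the function name are assumptions; per-user values would typically be averaged over all test users.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_n(
    recommended: Sequence[str],
    visited: Set[str],
    n: int,
) -> Tuple[float, float]:
    """Precision@N and recall@N for one user.

    `recommended` is the ranked recommendation list and `visited` the set of
    POIs the user actually checked in at during the test period.
    """
    top = list(recommended)[:n]
    hits = len(set(top) & set(visited))
    precision = hits / n if n else 0.0
    recall = hits / len(visited) if visited else 0.0
    return precision, recall


print(precision_recall_at_n(["a", "b", "c"], {"a", "c", "d", "e"}, 2))
```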

3.2.2 Experimental results

We counted the POI categories in the experimental dataset and, because some categories are fairly infrequent, conducted several pre-experiments to determine the optimal number of categories, which is the dimension of both the user’s category preference vector and the POI category vector. Figure 6 depicts the check-in frequency of the categories; POIs in the “residence” category are fairly infrequent in both sub-datasets. We set up three experiments: (1) All category: use all categories to construct a 9-dimensional category vector. (2) Frequent category: use all categories except “residence” to construct an 8-dimensional category vector. (3) No category: do not use category information. The results of the pre-experiments are shown in Fig. 7. The model with all categories achieved the highest precision (0.568 in NY and 0.256 in SF) and recall (0.036 in NY and 0.059 in SF). After the infrequent category was removed, the precision (0.561 in NY and 0.254 in SF) and recall (0.035 in NY and 0.059 in SF) were slightly lower or unchanged but still higher than with No category. Although there are indeed fewer POIs in the infrequent category, constructing a 9-dimensional category feature vector according to the original dataset does not cause data sparseness and preserves complete user preference information. Therefore, in the following experiments, we retained all categories to represent the POI’s category feature and the user’s category preference. We use a 9-dimensional one-hot vector to represent the category feature of each POI: each dimension is a binary value, where 0 means that the POI does not belong to the category and 1 means that it does. The user’s category preference is represented by a 9-dimensional vector in which each dimension represents the frequency of the user's check-ins in the corresponding category.

Fig. 6 Description of POI categories in Foursquare. Bars indicate the check-in frequency of each category. Curve shows the cumulative check-in frequency of each category. The mapping between numbers and categories is: 1: food, 2: shop and service, 3: professional and other places, 4: nightlife spot, 5: travel and transport, 6: art and entertainment, 7: college and university, 8: outdoors and recreation, 9: residence

Fig. 7 Pre-experimental results for determining the number of categories (Precision@Top-1 and Recall@Top-1)

The experimental results of different models are shown in Fig. 8. The location recommendation model proposed in this paper is significantly superior to the other models in terms of precision and recall. In particular, when the Top-N recommendation list is short, the proposed model achieves a precision of 0.568, which is 25.39% higher than the GRU model and 35.23% higher than the DNN model (Top-1 on the NY sub-dataset). The recall of the proposed model reaches 0.036, which is 28.37% higher than the GRU model and 40.31% higher than the DNN model (Top-1 on the NY sub-dataset). The results derived from both sub-datasets confirm that the recommendation quality can be stably improved by the proposed algorithm. Compared with the GRU model, the LSTM architecture used in this paper is more suitable for the prediction of users’ check-in location. This indicates that users’ check-in preference might be either long term or short term, and LSTM can control the attenuation of long-term user preference through its forget gate. GRUs perform slightly worse than the proposed model because they do not distinguish between long-term and short-term preference. CNN and DNN are two completely static neural networks, which extract features of the matrix only after sequence embedding and do not consider the sequence features between locations. Therefore, DCPR and DNN have the worst recommendation quality among all models. In the SF sub-dataset, because the check-in sequences have a small average length and do not have distinct sequence features, DCPR performs better than GRU owing to the advantage of CNN in processing the embedding matrix of the sequences.

Fig. 8

Comparison of the precision and recall of recommendation results between different models [NY (top), SF (bottom)]

The results on NY (Fig. 9) show that the LSTM + GEO probability estimation method, which considers the geographical factor in addition to the temporal factor modeled by LSTM, has remarkably higher recommendation precision and recall. LSTM + GEO + CATE probability estimation, which adds the category factor, significantly improves the precision and recall of the recommendation results for Top-1, i.e., the “next point” recommendation. However, when N > 1, the addition of category has a weak or even negative influence on the recommendation quality: LSTM + GEO + CATE probability estimation always performs worse than LSTM + GEO. This indicates that the location preference obtained from users’ historical check-in data is stable under the geographical clustering effect and can be used to stably and positively affect the location prediction quality of future check-ins. Users’ preference for location category plays a relatively stable role in the “next point” prediction, but its drift over future check-ins may degrade the recommendation performance when N > 1.

Fig. 9

Precision and recall of the recommendation methods on different sub-datasets [NY (top), SF (bottom)]

However, in SF, where the average number of check-ins per user is far lower than in NY, the importance of category information in probability estimation increases because the temporal and spatial information in users’ check-ins is sparser. As a result, the LSTM + GEO + CATE method, which adds the category factor, is superior in terms of precision and recall to the other recommendation methods, which consider only time and location information. Hence, the category correlation score can be applied to significantly improve the recommendation quality when users’ check-in sequences are short and only a small amount of check-in data is available.

3.3 Parameter analysis

The ranking scores are calculated using the formula Score_k = α · Score_time + β · Score_location + γ · Score_category, where α, β, and γ are the coefficients of time, location, and category, respectively, used in the calculation of the comprehensive probability. The value of each coefficient reflects the degree to which the corresponding relevance (time, location, or category) is considered in the ranking score, and hence the sensitivity of that factor in the improvement of recommendation quality. For example, the larger the value of α, the more the time correlation of candidate POIs is considered in the recommendation process, whereas α = 0 indicates that the time correlation is not considered at all. Table 3 shows the optimal recommendation results for different factor combinations.
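The weighted combination can be illustrated with a minimal sketch. The function name, the dictionary layout, and the per-factor scores below are hypothetical; only the linear form α · Score_time + β · Score_location + γ · Score_category is taken from the text.

```python
from typing import Dict, List


def rank_candidates(
    scores: Dict[str, Dict[str, float]],
    alpha: float,
    beta: float,
    gamma: float,
) -> List[str]:
    """Rank candidate POIs by alpha*time + beta*location + gamma*category."""

    def combined(poi: str) -> float:
        s = scores[poi]
        return alpha * s["time"] + beta * s["location"] + gamma * s["category"]

    return sorted(scores, key=combined, reverse=True)


# Hypothetical per-factor scores for three candidate POIs.
scores = {
    "poi_a": {"time": 0.6, "location": 0.1, "category": 0.2},
    "poi_b": {"time": 0.2, "location": 0.7, "category": 0.3},
    "poi_c": {"time": 0.1, "location": 0.2, "category": 0.9},
}

# Time-dominated weighting versus a weighting that ignores time (alpha = 0).
time_heavy = rank_candidates(scores, alpha=1.0, beta=0.1, gamma=0.0)
no_time = rank_candidates(scores, alpha=0.0, beta=1.0, gamma=0.1)
```

With the time-dominated weights the candidate with the largest time score is ranked first, while setting α to 0 makes the ranking depend only on the location and category scores.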

Table 3 Performance of different recommendation methods under optimal parameter combination

3.3.1 Effects of α, β, and γ on recommendation precision and recall on NY

The experimental results show that the impact of these three parameters on precision and recall depends on the dataset. In the experiments using NY, precision and recall are affected by all three parameters and show a monotonic trend (Fig. 10). As can be seen from Fig. 10 (1) and (2), this change is driven by α when α is non-zero. The maximum precision is 0.548 when (α, β) = (0.981, 0.750) and 0.334 when (α, γ) = (0.899, 0.021). Without considering the time correlation, the precision is 0.488 when (β, γ) = (0.990, 0.010). Recommendation using time and location information thus achieves the highest precision of the three combinations. Time correlation is therefore a key factor for accurate recommendation on the NY dataset and plays a decisive role in the recommendation result. The same conclusion holds for recall: the highest recall is 0.036 when (α, β) = (0.986, 0.742), 0.021 when (α, γ) = (0.915, 0.006), and 0.029 when (β, γ) = (0.018, 1.000). In general, the sensitivities of time, category, and location information in the improvement of recommendation precision and recall are in descending order on NY.
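One simple way to locate the best coefficient pair when only two factors are used is an exhaustive grid search, sketched below. The search procedure behind the reported optima is not stated in this section, so the grid, the function names, and the toy objective are assumptions made purely for illustration; in practice the objective would be Precision@Top-1 or Recall@Top-1 measured on a validation split.

```python
import itertools
from typing import Callable, Iterable, Tuple


def grid_search_pair(
    evaluate: Callable[[float, float], float],
    first_values: Iterable[float],
    second_values: Iterable[float],
) -> Tuple[float, Tuple[float, float]]:
    """Return (best_metric, (first, second)) over the Cartesian grid."""
    best_metric, best_pair = float("-inf"), (0.0, 0.0)
    for a, b in itertools.product(first_values, second_values):
        metric = evaluate(a, b)
        if metric > best_metric:
            best_metric, best_pair = metric, (a, b)
    return best_metric, best_pair


def toy_precision(alpha: float, beta: float) -> float:
    # Placeholder for a real Precision@Top-1 evaluation (hypothetical).
    # This made-up surface peaks at alpha = 1.0, beta = 0.75.
    return 0.5 - (alpha - 1.0) ** 2 - (beta - 0.75) ** 2


grid = [i / 4 for i in range(5)]  # 0.0, 0.25, 0.5, 0.75, 1.0
best_metric, best_pair = grid_search_pair(toy_precision, grid, grid)
```

The third coefficient is implicitly fixed at 0 here, mirroring the pairwise analysis of (α, β), (α, γ), and (β, γ); a finer grid or a different optimizer would be needed to resolve optima to three decimal places.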

Fig. 10

Effects of the combinations of α and β, α and γ, and β and γ on Precision@Top-1 and Recall@Top-1 (sub-dataset: NY)

3.3.2 Effects of α, β, and γ on recommendation precision and recall on SF

The results on the SF dataset are slightly different, as can be seen from Fig. 11. When (α, β) = (0.005, 0.800), (α, γ) = (0.700, 0.600), and (β, γ) = (0.950, 0.0005), the highest recommendation precisions achieved using the correlation scores of only two kinds of information are 0.204, 0.231, and 0.269, respectively. Similarly, when (α, β) = (0.005, 0.991), (α, γ) = (0.950, 0.800), and (β, γ) = (0.872, 0.0005), the highest recommendation recalls achieved using only two kinds of information are 0.054, 0.067, and 0.068, respectively. It can be seen that the sensitivities of location, time, and category information in the improvement of recommendation precision and recall are in descending order on SF.

Fig. 11

Effects of the combinations of α and β, α and γ, and β and γ on Precision@Top-1 and Recall@Top-1 (sub-dataset: SF)

The time, location, and category information have different effects on the recommendation results on different datasets. In NY, the average length of a user’s check-in sequence is 200, so the check-in behavior reflects more of the user’s time preference, and the time correlation between the candidate POIs and the user’s historical check-in records determines the recommendation result. In SF, the average length of a user’s check-in sequence is 120, which is significantly shorter than in NY. The user’s time preference is therefore less well reflected, whereas the geo-location of the check-ins is more stable. As a result, the location correlation determines the ranking score of candidate POIs in SF.

In summary, recommendation with the combined use of time, location, and category information achieves the highest precision and recall. Compared with recommendation using only check-in sequence information, the addition of location and category information significantly improves the precision and recall (Fig. 12). When the number of check-ins is large and the check-in sequences are long, which provides abundant time sequence information, the sensitivities of time, category, and location information in the improvement of recommendation precision and recall are in descending order. When the number of check-ins is small and the check-in sequences are short (that is, when check-in sequence data are lacking), the sensitivities of location, time, and category information are in descending order. This indicates that abundant sequence data can effectively improve the quality of location recommendation. In addition, the remarkable geographical “clustering” phenomenon of user check-in behavior makes location information the second most important factor in location recommendation, after time information. When the sequence data are insufficient, however, POI category information can significantly improve the prediction of the “next point” and plays a more important role than sequence data, although users’ category preference may drift over multiple check-ins in the long term.

Fig. 12

Precision@Top-1 (left) and Recall@Top-1 (right) when using optimal parameter combinations

4 Conclusion

In this paper, a location recommendation model based on a comprehensive probability that combines time, location, and category information is proposed. LSTM networks are used to extract the temporal features of users’ check-in sequences, which are used to make a time-based prediction of the probability that a user will visit the candidate POI. The binary Gaussian kernel function is used to estimate the probability density and fit the user-level personalized check-in location distribution, from which the user’s check-in probability is estimated based on the geographic location of the candidate POI. In addition, the probability that a user will visit the candidate POI is estimated from the user’s category preference. The validation experiments using check-in data from two cities in the Foursquare dataset show that the proposed recommendation method can effectively improve the precision and recall of POI recommendation. Further, this paper explores the sensitivities of time, location, and category information in the improvement of recommendation quality and clarifies the effects of the time, location, and category factors by analyzing the characteristics of the data. The results indicate that users’ check-in locations exhibit a long-term stable geographical “clustering” phenomenon, and that information regarding a user’s location preference can steadily improve the recommendation quality. However, information regarding the user’s category preference improves the recommendation performance only for a short recommendation list, because the user’s category preference drifts in the long term and may negatively affect the quality of recommendation.
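The location component summarized above can be sketched as a kernel density estimate over a user’s historical check-in coordinates. The sketch below assumes a two-dimensional isotropic Gaussian kernel with a single fixed bandwidth and planar coordinates; the function name, the bandwidth value, and the toy check-ins are hypothetical, and the kernel form and bandwidth selection used in the model itself may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def gaussian_kde_2d(query: Point, checkins: List[Point], bandwidth: float) -> float:
    """Density at `query` from a 2-D isotropic Gaussian kernel over check-ins."""
    if not checkins:
        return 0.0
    # Normalizing constant of a 2-D Gaussian with standard deviation `bandwidth`.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for x, y in checkins:
        sq_dist = (query[0] - x) ** 2 + (query[1] - y) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total / len(checkins)


# Hypothetical check-ins clustered around the origin (planar coordinates).
history = [(0.0, 0.0), (0.2, -0.1), (-0.3, 0.1), (0.1, 0.4)]

near = gaussian_kde_2d((0.1, 0.0), history, bandwidth=1.0)
far = gaussian_kde_2d((5.0, 5.0), history, bandwidth=1.0)
```

A candidate POI close to the user’s cluster of past check-ins receives a higher density than a distant one, which is the behavior the geographical “clustering” effect relies on.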

Considering the limitations in this paper and the potential research directions, the following topics can be investigated in the future. (1) Other situational factors relevant to POI recommendation can be considered in the recommendation model, including users’ social relations, POI descriptions, and check-in environment. (2) The POI category information can be further mined. The Foursquare dataset includes data regarding different levels of POI categories, which can be used to construct category trees to describe users’ category preference in detail.