1 Introduction

Point-of-interest (POI) recommendation is an important application in location-based social networks (LBSNs): it mines users' historical check-in records and recommends locations that users are likely to check in at in the future. LBSNs have become very popular recently with the explosive increase of smart phones. Traditional online social networks such as Facebook have introduced location-based services, e.g., Facebook Places, to evolve into LBSNs. In the meantime, LBSNs such as Yelp and Foursquare, serving specific purposes such as restaurant recommendation, have emerged and attracted a great number of users. In this context of prosperous LBSNs, POI recommendation is proposed to recommend a personalized POI list to each user according to his or her tastes. POI recommendation creates value in two aspects: (1) helping users discover interesting locations and services in the city, and (2) helping businesses in LBSNs deliver advertisements to target customers. The potential value of POI recommendation attracts much academic attention, which motivates a variety of POI recommendation systems [4, 32, 38].

To establish a practical POI recommendation system, temporal influence plays an important role. Previous studies show that user mobility in LBSNs exhibits distinct temporal features [3, 4, 32]. For example, users usually stay in the office on Monday afternoons and enjoy entertainment in bars at night. In summary, the temporal features in users' check-in data can be summarized in three aspects.

  • Periodicity Users share the same periodic pattern, visiting the same or similar POIs at the same time slot [3, 32]. For instance, a user usually visits restaurants at noon, and so do other users. Hence, the periodicity inspires time-aware collaborative filtering methods for POI recommendation [32].

  • Consecutiveness A user’s current check-in is strongly correlated with his or her recent check-ins [2, 4]. Gao et al. [4] model this property by assuming that user preferences are similar in two consecutive hours. Cheng et al. [2] assume that two POIs checked in within a short period are highly correlated in the latent feature space.

  • Non-uniformness A user’s check-in preference changes at different hours of a day [4]. For example, at noon a user may visit restaurants while at night the user may have fun in bars.

By capturing the observed temporal features, a variety of systems have been proposed to enhance POI recommendation performance [2, 4, 32], and they perform better than general collaborative filtering (CF) methods [30]. Nevertheless, previous work [2, 4, 32] cannot model the three features together. Moreover, prior work ignores an important fact: the temporal influence exists at different time scales. For example, at the day level, you may check in at POIs around your home in the early morning, visit places around your office during the daytime, and have fun at nightclubs in the evening. At the week level, you may stay in the city for work on weekdays and go out for vacation on weekends. Hence, to better model the temporal influence, it is necessary to capture the temporal features at different time scales.

In this paper, we propose an Aggregated Temporal Tensor Factorization (ATTF) model for POI recommendation that captures the three temporal features together, as well as at different time scales. We construct a user-time-POI tensor to represent the check-ins as shown in Fig. 1, and then employ interaction tensor factorization [26] to model the temporal effect. Different from prior work that represents the temporal influence at a single scale, we index the temporal information for latent representation at different scales, i.e., hour of day, day of week, and month. Furthermore, we employ a linear combination operator to aggregate the contributions of the different temporal latent features, which captures the temporal influence at different scales. Specifically, our ATTF model learns the three temporal properties as follows: (1) periodicity is learned from the temporal CF mechanism; (2) consecutiveness is manifested in two aspects: times within a slot have the same effect through sharing the same time factor, and the relation between two consecutive time slots can be learned from the tensor model; (3) non-uniformness is depicted by different time factors representing different time slots at each time scale. Moreover, an aggregate operator is introduced to combine the temporal influence at the different scales and represent the temporal effect as a whole.

Fig. 1

Tensor illustration for check-ins

To sum up, we propose the ATTF model to seek a better way to capture the temporal influence for POI recommendation. Note that we presented a preliminary version of the ATTF model for POI recommendation in [36]. In this paper, we extend the work [36] with a comprehensive related work review, an in-depth empirical data study, and more detailed performance analysis. Moreover, we establish an embedding neural network to represent the ATTF model, which gives new insights for understanding the proposed model from the neural network perspective. Specifically, this paper makes the following additional contributions. First, we provide an empirical analysis of the check-in data, which motivates our proposed ATTF model. Second, we demonstrate the three-slice time indexing scheme and show our memory reducing trick via a binary representation transformation. Third, we present the detailed inference procedure of our ATTF model and provide an extensive discussion of the proposed model's performance and parameter effects. Last but not least, we interpret the ATTF model from the embedding neural network perspective, providing new insights for interpreting latent ranking models.

The contributions of this paper are summarized as follows:

  • To the best of our knowledge, this is the first temporal tensor factorization method for POI recommendation, subsuming all the three temporal properties: periodicity, consecutiveness, and non-uniformness.

  • We propose a novel model to capture the temporal effect in POI recommendation at different time scales. Experimental results show that our model outperforms prior temporal models by more than 20%.

  • The ATTF model is a general framework to capture the temporal features at different scales; it outperforms single temporal factor models and gains about 10% improvement in the top-5 POI recommendation task on the Gowalla data.

  • We interpret the ATTF model from the embedding neural network perspective, verifying that the embedding neural network is a general framework for latent factor models, including rating estimation models (e.g., MF [9]) and ranking models (e.g., our ATTF model).

The remainder of this paper is organized as follows. Section 2 reviews the most relevant work. Section 3 introduces our empirical analysis of check-in data and demonstrates the time labeling scheme. Next, Sect. 4 presents the details of our model. Then, Sect. 5 reports experimental results conducted on two real-world data sets. Finally, Sect. 6 draws the conclusion and outlines future work.

2 Related Work

In this section, we first review the literature of POI recommendation. Then, we summarize the progress of modeling temporal effect for POI recommendation. Finally, we review the literature of embedding learning and its applications, which inspires us to understand the proposed ATTF model from the neural network perspective.

POI Recommendation Most POI recommendation systems are based on collaborative filtering (CF) techniques, which fall into two categories: memory-based and model-based. On the one hand, Ye et al. [30] formulate the POI recommendation problem in LBSNs, solve it with a user-based CF method, and further improve the system by linearly combining geographical influence, social influence, and preference similarity. To enhance performance, more advanced techniques are then applied, e.g., incorporating temporal influence [32] and utilizing a personalized geographical model via a kernel density estimator [33, 34]. On the other hand, model-based CF is proposed to tackle the POI recommendation problem, benefiting from its scalability. Cheng et al. [1] propose a multi-center Gaussian model to capture user geographical influence and combine it with a social matrix factorization (MF) model [19] to recommend POIs. Gao et al. [4] propose an MF-based model, the Location Recommendation framework with Temporal effects (LRT), utilizing the similarity between time-adjacent check-ins to improve performance. Lian et al. [14] and Liu et al. [18] enhance POI recommendation by incorporating geographical information in a weighted regularized matrix factorization model [8]. In addition, some researchers subsume users' comments to improve recommendation performance [5, 13, 31]. Other researchers model the correlations of consecutive check-ins to enhance the system [2, 16, 37, 38].

Temporal Effect Modeling In 2011, Cho et al. [3] observe the periodicity of check-in data in LBSNs. People usually visit restaurants at noon, so it suffices to recommend a user restaurants that he/she has not visited at noon, and Pearson correlation helps to recommend similar POIs at the same time slot. However, the experiments in [3] depend on dense check-in data, which does not fit most users. In 2013, Yuan et al. [32] combine temporal similarity and non-temporal similarity to measure the similarity better. In the same year, Gao et al. [4] observe the non-uniformness property (a user's check-in preference changes at different hours of a day) and consecutiveness (a user's preference at time \(t\) is similar to that at time \(t-1\)). Further, Gao et al. propose the LRT model based on the MF technique to model non-uniformness and consecutiveness. Meanwhile, Cheng et al. [2] propose the Factorized Personalized Markov Chain model [25] with Local Region constraint (FPMC-LR) to capture consecutiveness, supposing a strong correlation between two consecutive checked-in POIs. However, previous work neither models the three features together nor models the temporal influence at different scales.

Embedding Neural Network The embedding neural network, e.g., the word2vec framework [20], has turned out to be a successful semi-supervised learning method. It is widely used in natural language processing [15, 17]. Owing to its efficacy in capturing item correlations, the embedding neural network has also been employed for network embedding [7, 23, 28] as well as in recommendation systems [27, 29]. Moreover, recent studies [11, 12] show that neural word embeddings can be treated as a kind of matrix factorization [9]. This equivalence between neural embeddings and latent factor models inspires us to understand our ATTF model from the embedding neural network perspective. Our interpretation of the ATTF model from the neural network perspective verifies that the embedding neural network can be treated as a general framework for latent factor models, including rating estimation models [9] and ranking models (e.g., our proposed ATTF model).

3 Preliminaries

In this section, we first analyse the temporal features of user check-in data. Then, we introduce the time labeling scheme, which is a prerequisite of our ATTF model. We analyse user check-in data from Foursquare and Gowalla, which demonstrate similar check-in patterns. In the following, we show the empirical data analysis results based on a randomly selected user from Foursquare.

3.1 Empirical Data Analysis

We leverage empirical data analysis to explore the three temporal properties of check-in data. Our analysis verifies previous discoveries, for instance, the non-uniformness: user check-in preference changes at different times of a day [4]. Moreover, we observe some new properties not covered in prior work, e.g., that the non-uniformness exists at different time scales.

Data sparsity is a big concern for previous temporal models. Figure 2a demonstrates a user's check-in pattern over a day. We observe that the user always has many check-ins in the morning and evening, which verifies the periodicity: the check-in activity repeats in the morning and evening. Figure 2b shows the consecutive hour pair similarity,Footnote 1 i.e., the check-in similarity between time \(t\) and \(t-1\) (\(t\) denotes the hour, \(1, 2,\ldots , 24\)). We observe that the user's check-in preference has high similarity at some consecutive hours, e.g., between 5 o'clock and 4 o'clock, and between 8 o'clock and 7 o'clock. However, we also find that at some times (e.g., 9:00), the user has few check-ins and the similarity is zero. Therefore, the sparse data make it hard to model the periodicity and consecutiveness via a counting method (e.g., Pearson correlation or cosine similarity). As shown in Fig. 2b, most of the similarities are zero, so the consecutiveness cannot be modeled most of the time. The dilemma of counting methods in the face of sparse data motivates us to exploit a latent factor learning model. In our model, we use a time latent factor to represent the temporal effect of a time slot, rather than modeling the temporal effect from the user or POI perspective. Further, the temporal factor is learned from all users' check-ins in that time slot. Therefore, it overcomes the sparsity problem of counting methods.
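
To make the counting method concrete, the following sketch computes the consecutive hour pair similarity as the cosine similarity between hourly check-in count vectors (the per-hour count matrix and the function name are our own illustration, not part of the original analysis):

```python
import numpy as np

def consecutive_hour_similarity(hour_poi_counts: np.ndarray) -> np.ndarray:
    """Cosine similarity between hour t and hour t-1 for one user.

    hour_poi_counts: assumed 24 x |POI| matrix; entry (t, l) is the number of
    check-ins at POI l during hour t, aggregated over the user's history.
    """
    sims = np.zeros(24)
    for t in range(1, 24):
        a, b = hour_poi_counts[t], hour_poi_counts[t - 1]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        # Similarity is zero whenever either hour has no check-ins at all.
        sims[t] = a.dot(b) / denom if denom > 0 else 0.0
    return sims
```

As the sketch shows, a single empty hour already forces the similarity to zero, which is exactly the sparsity issue discussed above.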

Fig. 2

Sparsity demonstration. a A user’s check-in pattern in a day, b consecutive hour pair similarity

Fig. 3

Demonstration of non-uniformness at different time scales. a Non-uniformness in hour of a day, b non-uniformness in day of week, c non-uniformness in month

We observe that the non-uniformness (i.e., the change of check-in characteristics) exists at different time scales in users' check-in data. Following [4], we demonstrate an example of a random user's aggregated check-in activities at his/her top 5 most visited POIs in Fig. 3. Figure 3a verifies the non-uniformness: a user's check-in preference changes at different hours of a day [4]. As shown in Fig. 3a, the most visited POI changes at different hours. For example, the most visited POI is POI 1 at 1:00, while the most visited POI is POI 4 at 5:00. Besides, we discover that there are other change characteristics. As shown in Fig. 3b, c, a user's check-in preference changes across different months of a year, and across different days of a week as well. The change of check-ins at different time scales depicts the user preference from different perspectives: (1) A user may check in at POIs around his/her home in the morning, visit places around the office during the daytime, and have fun in bars at night. (2) A user may visit more locations around his/her home or office on weekdays, while on weekends he/she may check in more at shopping malls or vacation places. (3) In different months, a user may have different customs; for instance, he/she may visit ice cream shops in the summer months and hot pot restaurants in the winter months. Hence, modeling the non-uniformness at only a single scale cannot capture all temporal features; they need to be formulated at different scales.

3.2 Time Labeling Scheme

Time labeling is a prerequisite of our ATTF model. We use a time latent factor to represent the temporal effect at a specific time, which is then learned by a latent factor model. The time labeling scheme determines how to assign a latent factor to a specific time. Before diving into the model, we describe the time labeling scheme first.

Figure 4 demonstrates the time labeling scheme. In order to capture temporal features at different time scales, we represent a time spot with several parts and then aggregate their contributions. According to the empirical data analysis, we consider temporal features at three time scales: month of a year, day of a week, and hour of a day. The temporal effect is then formulated by three latent time factors. As shown in Fig. 4, we leverage three slices to denote a time spot: month of year, day of week, and hour of day. Further, they are depicted by three kinds of temporal latent vectors respectively. So a time spot t is labeled by a tuple \((t_1, t_2, t_3),\) which satisfies \(t_1 \in \{0, 1, \cdots , 11\}\), \(t_2 \in \{0, 1, \cdots , 6\}\), \(t_3 \in \{0, 1, \cdots , 23\}\) (we have 12 months in a year, 7 days in a week, and 24 hours in a day). Furthermore, we define \(T_1 \in R^{12 \times d}\), \(T_2 \in R^{7 \times d}\), and \(T_3 \in R^{24 \times d}\) to denote the corresponding temporal latent factor matrices, where d is the latent vector dimension.

Fig. 4

Time labeling scheme demonstration

To aggregate several temporal factors, we define an operator \(A(\cdot ): R^d \times R^d \times R^d \rightarrow R^d\) to combine different temporal features. Take “2011-04-05 18:10:23” as an example: its label ids for month, day of week, and hour are 3 (April), 2 (Tuesday), and 18 (after 18:00). Hence its temporal latent factor is formulated as \(A(T_{1, 3}, T_{2, 2}, T_{3,18})\). It is important to note that our scheme is flexible: we are able to ignore one feature by removing a slice, or introduce a new feature by adding a new slice.
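
For concreteness, a minimal sketch of the labeling, assuming zero-based month indices and Sunday as day 0 so that the example above maps to (3, 2, 18) (the helper name is ours):

```python
from datetime import datetime

def time_label(t: datetime) -> tuple:
    """Map a time spot to the label tuple (t1, t2, t3)."""
    t1 = t.month - 1             # month of year, 0..11 (April -> 3)
    t2 = (t.weekday() + 1) % 7   # day of week, 0..6 with Sunday = 0 (Tuesday -> 2)
    t3 = t.hour                  # hour of day, 0..23 (18:10 -> 18)
    return t1, t2, t3

print(time_label(datetime(2011, 4, 5, 18, 10, 23)))   # (3, 2, 18)
```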

Memory Reducing Trick We reduce the input data size through a binary coding trick, employing one label id instead of three to represent the three slices. In detail, we use 4 bits to represent the month, 3 bits to represent the day of week, and 5 bits to represent the hour slot, so the time label id fits into a 16-bit integer. For instance, “2011-04-05 18:10:23” is coded as “0011 010 10010”, and its label id is 850. When we learn the model, we transform the label id into its binary representation and find the corresponding label for each slice. After labeling the time, we are able to model the temporal effect with a user-time-POI latent factorization model; see the details in Sect. 4.
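
A sketch of this 4/3/5-bit packing and its inverse (helper names are ours):

```python
def pack_label(t1: int, t2: int, t3: int) -> int:
    """Pack (month, day of week, hour) into one integer: 4 + 3 + 5 bits."""
    return (t1 << 8) | (t2 << 5) | t3

def unpack_label(label: int) -> tuple:
    """Recover the three slice labels from the packed id."""
    return (label >> 8) & 0xF, (label >> 5) & 0x7, label & 0x1F

assert pack_label(3, 2, 18) == 850          # "2011-04-05 18:10:23" -> 850
assert unpack_label(850) == (3, 2, 18)
```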

4 Methods

In this section, we first demonstrate the ATTF model. Then we give the detailed model inference and learning procedure. Finally, we discuss the properties of the model.

4.1 Aggregated Temporal Tensor Factorization Model

Let \(\mathcal {U}\) denote the set of users and \(\mathcal {L}\) the set of POIs. In addition, \(\mathcal {T}_1\), \(\mathcal {T}_2\), and \({\mathcal {T}_3}\) are the sets of months, days of week, and hours respectively. Further, we define \(\mathcal {T}\) as the set of time label tuples, consisting of elements \(t:=(t_1, t_2, t_3)\), namely the temporal representation at different scales. The ATTF model estimates the preference of a user u for a POI l at a specific time label t through a score function f(utl), where \(u \in {\mathcal {U}}\) is the user id, \(t \in {\mathcal {T}} \) is a time label tuple, and \(l \in \mathcal {L}\) is the POI id.

We are typically given N training examples \({(u_i, t_i, l_i)} \in \{1, \ldots , |\mathcal {U}|\} \times \{1, \ldots , |\mathcal {T}|\} \times \{1, \ldots , |\mathcal {L}|\}, {i=1, 2, \ldots , N}\), and corresponding outputs \(y_i \in R, {i=1, 2, \ldots , N}\). Here, \({(u_i, t_i, l_i)}\) is the index of a particular element of a user-time-POI tensor, and \(y_i\) is the preference score of the user for the POI at the given time label. One could simply collate the training data to build a suitable tensor, so the training task becomes filling in the blank entries of the tensor.

We exploit the Pairwise Interaction Tensor Factorization (PITF) [26] model to decompose the user-time-POI tensor. The PITF model, which proved successful in the ECML/PKDD Discovery Challenge, runs much faster than other tensor factorization methods and performs better on large-scale prediction tasks [26]. Thus, the score of a POI l given user u and time t is factorized into three interactions: user-time, user-POI, and time-POI, where each interaction is modeled through a latent vector product. Further, we infer the model via the Bayesian Personalized Ranking (BPR) criterion [24], a general framework to train a recommendation system from implicit feedback, because prior work [14, 37, 38] indicates that treating check-ins as implicit feedback is better than explicit treatments for POI recommendation. Since we recommend POIs for users at a specific time, every candidate POI shares the same user-time interaction. As a result, the preference score is independent of the user-time interaction. Then the score function for a given time label t, user u, and a target POI l can be formulated as:

$$\begin{aligned} f( u, t, l) = \langle U^{(L)}_u, {L}^{(U)}_{l} \rangle + \langle A({T}^{(L)}_{1, t_1}, {T}^{(L)}_{2, t_2},{T}^{(L)}_{3, t_3}), {L}^{(T)}_{l}\rangle , \end{aligned}$$
(1)

where \(\langle \cdot \rangle \) denotes the vector inner product, \(A(\cdot )\) is the aggregate operator. Suppose that d is the latent vector dimension, \(U^{(L)}_u \in R^d\) is user u’s latent vector for POI interaction, \({L}^{(U)}_{l}, {L}^{(T)}_{l} \in R^d\) are POI l’s latent vectors for user interaction and time interaction, \({T}^{(L)}_{1, t_1}\), \({T}^{(L)}_{2, t_2}\), \({T}^{(L)}_{3, t_3} \in R^d\) are time t’s latent vector representations in three aspects: month, day of week, and hour.

The aggregate operator combines the several temporal features together. In this paper, we propose a linear convex combination operator, formulated as follows,

$$\begin{aligned} A(\cdot ) = \alpha _1 \cdot {{T}^{(L)}_{1, t_1}+\alpha _2 \cdot {T}^{(L)}_{2, t_2}+\alpha _3 \cdot {T}^{(L)}_{3, t_3}}, \end{aligned}$$
(2)

where \(\alpha _1\), \(\alpha _2\), and \(\alpha _3\) denote the weights of the temporal factors, which satisfy \(\alpha _1 + \alpha _2 + \alpha _3 = 1\) and \(\alpha _1, \alpha _2, \alpha _3 \ge 0\).
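
Putting Eqs. (1) and (2) together, the score of a candidate POI can be computed as in the following sketch (the latent matrices are assumed to be NumPy arrays indexed by the labels of Sect. 3.2, and the default weights are illustrative only):

```python
import numpy as np

def attf_score(u, t, l, U, L_u, L_t, T1, T2, T3, alphas=(0.2, 0.1, 0.7)):
    """Eq. (1) with the linear aggregate operator of Eq. (2).

    u, l: user / POI indices; t = (t1, t2, t3) is the time label tuple.
    U: |U| x d user factors U^(L); L_u, L_t: |L| x d POI factors L^(U), L^(T);
    T1, T2, T3: 12 x d, 7 x d, 24 x d temporal factors T^(L)_1, T^(L)_2, T^(L)_3.
    """
    t1, t2, t3 = t
    a1, a2, a3 = alphas
    agg = a1 * T1[t1] + a2 * T2[t2] + a3 * T3[t3]   # aggregate operator A(.)
    return U[u].dot(L_u[l]) + agg.dot(L_t[l])
```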

4.2 Learning

We infer the model via the BPR criterion [24], which treats the check-in activity as a kind of implicit feedback. Namely, we assume the user prefers visited POIs over unvisited ones. We treat the visited POIs as positive and the unvisited ones as negative. Then, we suppose that the score f(utl) of positive observations is higher than that of negative POIs, given u and t. Further, we formulate the relation that user u prefers a positive POI \(l_i\) over a negative one \(l_j\) at time t as follows

$$\begin{aligned} {l_i >_ {u, t} l_j}. \end{aligned}$$
(3)

Based on the pairwise preference defined above, we can extract the set of preference constraints from the training examples

$$\begin{aligned} D_S := \{(u, t, l_i, l_j)| {l_i >_{u,t} l_j}, u \in \mathcal {U}, t \in \mathcal {T}, l_i, l_j \in \mathcal {L} \}. \end{aligned}$$
(4)

For simplicity, we denote \(y_{u,t,l} = f(u,t,l).\) Then any quadruple in \(D_S\) satisfies \(y_{u,t,l_i} > y_{u,t,l_j}\). Using a logistic function to model this relation, we get

$$\begin{aligned} p({l_i >_{ u, t} l_j}) := \sigma (y_{u, t, l_i} - y_{u, t, l_j}), \end{aligned}$$
(5)

which measures the probability that \(l_i\) is a positive observation and \(l_j\) is a negative observation for user u at time t. In Eq. (5), \(\sigma \) is the logistic function \(\sigma (x)= \frac{1}{1 + e^{-x}}\).

Assuming the quadruples in \(D_S\) are independent of each other, learning the ATTF model amounts to maximizing the likelihood of all the pairwise orders

$$\begin{aligned} \underset{{\mathbf {\varTheta }}}{{{\mathrm{arg\,max}}}} \underset{( u,t, l_i, l_j) \in D_S}{\prod } p({l_i >_{ u,t} l_j}), \end{aligned}$$
(6)

where \({\mathbf {\varTheta }}\) denotes the parameters to learn, namely \(U^{(L)}\), \({L}^{(U)}\), \({L}^{(T)}\), \({T}^{(L)}_{1}\), \({T}^{(L)}_{2}\), and \({T}^{(L)}_{3}\). Maximizing this likelihood is equivalent to minimizing the negative log likelihood. To avoid the risk of overfitting, we add a Frobenius norm term to regularize the parameters. Then the objective function is

$$\begin{aligned} \begin{aligned} \underset{{\mathbf {\varTheta }}}{{{\mathrm{arg\,min}}}} \underset{( u,t, l_i, l_j) \in D_S}{\sum } -ln (\sigma (y_{ u, t, l_i} - y_{u, t, l_j})) + \lambda _{{\mathbf {\varTheta }}}||{\mathbf {\varTheta }}||^2_F, \end{aligned} \end{aligned}$$
(7)

where \(\lambda _{{\mathbf {\varTheta }}}\) is the regularization parameter.

We leverage the Stochastic Gradient Descent (SGD) algorithm to optimize the objective function for efficiency. First, we define \(y_{u, t, l_p, l_n} = y_{u, t, l_p} - y_{u,t, l_n},\) which models the pairwise relation in \(D_S\). Further, we denote a common part of the gradient descent updates for all parameters as \(\delta = 1-\sigma (y_{u, t, l_p, l_n})\). As \({T}^{(L)}_{1, t_1}\), \({T}^{(L)}_{2, t_2}\), and \({T}^{(L)}_{3, t_3}\) are symmetric, they have the same gradient form. For simplicity, we use \({T}^{(L)}_{t} \in \{{T}^{(L)}_{1, t_1}, {T}^{(L)}_{2, t_2},{T}^{(L)}_{3, t_3}\}\) to represent any of them, \(\alpha \in \{\alpha _1, \alpha _2,\alpha _3\}\) to denote the corresponding weight, and \(A(\cdot )\) to denote \(A({T}^{(L)}_{1, t_1}, {T}^{(L)}_{2, t_2},{T}^{(L)}_{3, t_3})\). Then the updating rules for the parameters are as follows,

$$\begin{aligned} \begin{aligned}&U^{(L)}_u \leftarrow U^{(L)}_u + \gamma \cdot \left( \delta \cdot \left( L^{(U)}_{l_p}-L^{(U)}_{l_n}\right) - \lambda \cdot U^{(L)}_u \right) ,\\&L^{(U)}_{l_p} \leftarrow L^{(U)}_{l_p} + \gamma \cdot \left( \delta \cdot U^{(L)}_{u} - \lambda \cdot L^{(U)}_{l_p} \right) ,\\&L^{(T)}_{l_p} \leftarrow L^{(T)}_{l_p} + \gamma \cdot \left( \delta \cdot A(\cdot ) - \lambda \cdot L^{(T)}_{l_p} \right) , \\&L^{(U)}_{l_n} \leftarrow L^{(U)}_{l_n} - \gamma \cdot \left( \delta \cdot U^{(L)}_{u} + \lambda \cdot L^{(U)}_{l_n} \right) ,\\&L^{(T)}_{l_n} \leftarrow L^{(T)}_{l_n} - \gamma \cdot \left( \delta \cdot A(\cdot ) + \lambda \cdot L^{(T)}_{l_n} \right) , \\&T^{(L)}_t \leftarrow T^{(L)}_t + \gamma \cdot \left( \delta \cdot \alpha \cdot \left( L^{(T)}_{l_p}-L^{(T)}_{l_n}\right) - \lambda \cdot T^{(L)}_t \right) , \end{aligned} \end{aligned}$$
(8)

where \(\gamma \) is the learning rate and \(\lambda \) is the regularization parameter. To train the model, we use bootstrap sampling to draw quadruples from \(D_S\), following [24]. Algorithm 1 gives the detailed procedure to learn the ATTF model. We aim to learn the latent representations of users, temporal features, and POIs, namely \(U^{(L)}, {T}^{(L)}_{1}, {T}^{(L)}_{2},{T}^{(L)}_{3}, L^{(U)}, L^{(T)}\). Let \(|\mathcal {U}|\) denote the number of users; then we draw about \( \lfloor 100*\sqrt{|\mathcal {U}|} \rfloor \) tuples from \(D_S\) to generate a set \(D_e\) for the loss estimation, namely the negative log likelihood value. We follow [24] to set the number of samples for loss estimation as \(\lfloor 100*\sqrt{|\mathcal {U}|} \rfloor \). In each iteration, we sample S check-ins and then generate negative samples to learn the model. After that, we calculate the loss value over \(D_e\): \(\sum \nolimits _{( u,t, l_i, l_j) \in D_e} -ln (\sigma (y_{ u, t, l_i} - y_{u, t, l_j})) + \lambda _{{\mathbf {\varTheta }}}||{\mathbf {\varTheta }}||^2_F\). The convergence condition is satisfied when the loss value over the fixed sampled tuples no longer decreases.

Algorithm 1: Learning the ATTF model
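
A minimal sketch of one SGD step over a sampled quadruple \((u, t, l_p, l_n)\), following the update rules in Eq. (8) (the array layout matches the score sketch in Sect. 4.1; helper names and default step sizes are ours):

```python
import numpy as np

def sgd_step(u, t, lp, ln, U, L_u, L_t, T, alphas, gamma=0.05, lam=0.001):
    """One BPR-SGD update for a quadruple (u, t, l_p, l_n), cf. Eq. (8).

    T = [T1, T2, T3] are the temporal factor matrices and t = (t1, t2, t3).
    """
    agg = sum(a * T_k[t_k] for a, T_k, t_k in zip(alphas, T, t))     # A(.)
    y = U[u].dot(L_u[lp] - L_u[ln]) + agg.dot(L_t[lp] - L_t[ln])     # y_{u,t,l_p,l_n}
    delta = 1.0 - 1.0 / (1.0 + np.exp(-y))                           # 1 - sigma(y)

    # Cache old values so every update uses the pre-update parameters.
    Uu_old, Lu_diff, Lt_diff = U[u].copy(), L_u[lp] - L_u[ln], L_t[lp] - L_t[ln]

    U[u]    += gamma * (delta * Lu_diff - lam * U[u])
    L_u[lp] += gamma * (delta * Uu_old - lam * L_u[lp])
    L_u[ln] -= gamma * (delta * Uu_old + lam * L_u[ln])
    L_t[lp] += gamma * (delta * agg - lam * L_t[lp])
    L_t[ln] -= gamma * (delta * agg + lam * L_t[ln])
    for a, T_k, t_k in zip(alphas, T, t):
        T_k[t_k] += gamma * (delta * a * Lt_diff - lam * T_k[t_k])
```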

Complexity The runtime for predicting a triple (utl) is O(d), where d is the latent vector dimension. The updating procedure is also O(d). Hence training one quadruple takes O(d), and training one example (utl) takes \(O(k \cdot d),\) where k is the number of sampled negative POIs. For each iteration, we sample S training examples. The calculation cost of the loss estimation is less than that of the parameter updating procedure. Therefore, training the model costs \(O(I \cdot S \cdot k \cdot d)\), where I is the number of iterations. In practice, I is always small for different datasets, in the range of [5, 30]. We usually treat the number of iterations as a constant, so the complexity of the model training is \(O(S \cdot k \cdot d)\).

4.3 Model Discussion

The ATTF model can be treated as a linear combination of two matrix factorization models that learn user preference and temporal effect respectively, as shown in Eq. (1). The first term depicts the user-POI interaction, which is similar to low-rank matrix factorization of the user-POI matrix through the collaborative filtering technique. The second term depicts the time-POI interaction, which acts like leveraging a latent factor model to describe the relations between time labels and POIs. Further, the aggregate operator \(A(\cdot )\) combines several temporal factors together.

Two points are important to note for our model: (1) The ATTF model and the time labeling scheme form a general framework to subsume several temporal characteristics together. We take three common ones in this work, but it is easy to add others, e.g., different days in a month, or workdays and vacations in a year. (2) Even though the model equation of ATTF for POI recommendation can be expressed as a combination of two MF models, it is different from a simple ensemble of two MF models' recommendation results, because in our case the model parameters are learned jointly. Thus the learned parameters jointly represent the user preference and the temporal effect, which better reflects the fact that user check-in behavior is a complex decision made under many conditions.

The ATTF model can also be interpreted from the embedding neural network perspective. The embedding network, e.g., the word2vec framework [20], has turned out to be a successful semi-supervised learning method in natural language processing [10, 22], network embedding [7, 23, 28], and recommendation systems [27, 29]. Moreover, recent studies [11, 12] show that neural word embeddings can be treated as a kind of matrix factorization [9]. This equivalence between neural embeddings and latent factor models inspires us to understand our ATTF model from the embedding neural network perspective. Figure 5 demonstrates the equivalent embedding neural network for the ATTF model. The input layer is the one-hot representation of the user, POI, and temporal information. The second layer is the embedding layer, which projects each one-hot vector to a continuous latent vector in a Euclidean subspace. Next, we exploit the product and sum operations to represent the check-in preference as \(\langle U^{(L)}_u, {L}^{(U)}_{l} \rangle + \langle A({T}^{(L)}_{1, t_1}, {T}^{(L)}_{2, t_2},{T}^{(L)}_{3, t_3}), {L}^{(T)}_{l}\rangle \), equivalent to Eq. (1). Finally, we construct a BPR loss layer to learn the embedding representations.

Fig. 5

Embedding neural network for the ATTF model
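
To make the network view concrete, below is a sketch of the equivalent embedding model with a BPR loss, written with PyTorch purely for illustration (module and layer names are ours; the weights \(\alpha \) are passed in as fixed constants):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ATTFNet(nn.Module):
    """Embedding-network view of ATTF: lookup layers plus inner products."""
    def __init__(self, n_users, n_pois, d, alphas=(0.2, 0.1, 0.7)):
        super().__init__()
        self.U  = nn.Embedding(n_users, d)   # U^(L)
        self.Lu = nn.Embedding(n_pois, d)    # L^(U)
        self.Lt = nn.Embedding(n_pois, d)    # L^(T)
        self.T1 = nn.Embedding(12, d)        # month factors
        self.T2 = nn.Embedding(7, d)         # day-of-week factors
        self.T3 = nn.Embedding(24, d)        # hour factors
        self.alphas = alphas

    def score(self, u, t1, t2, t3, l):
        a1, a2, a3 = self.alphas
        agg = a1 * self.T1(t1) + a2 * self.T2(t2) + a3 * self.T3(t3)   # A(.)
        return (self.U(u) * self.Lu(l)).sum(-1) + (agg * self.Lt(l)).sum(-1)

    def bpr_loss(self, u, t1, t2, t3, l_pos, l_neg):
        diff = self.score(u, t1, t2, t3, l_pos) - self.score(u, t1, t2, t3, l_neg)
        return -F.logsigmoid(diff).mean()
```

Training this module with SGD and weight decay corresponds, up to the regularization details, to the objective in Eq. (7).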

5 Experiments

We conduct systematic experiments to answer the following questions: (1) How does the proposed ATTF model perform compared with state-of-the-art models? (2) Is the ATTF model better than single temporal factor models? (3) How do the parameters affect the model performance?

5.1 Data Description and Experiment Setting

Two real-world data sets are used in the experiments: one is Foursquare data from January 1, 2011 to July 31, 2011 provided in [6], and the other is Gowalla data from January 1, 2011 to September 30, 2011 from [35]. We filter out the POIs checked in by fewer than 5 users and then choose users who check in more than 10 times as our samples. After this preprocessing, the data sets have the statistical properties shown in Table 1. We randomly choose 80% of each user's check-ins as training data and the remaining 20% as test data. Moreover, we use each check-in (utl) in the training data to learn the latent features of users, time, and POIs. Then, given (ut), we estimate the score of each candidate POI, select the top N candidates, and compare them with the check-in tuples in the test data.

Table 1 Statistics of data sets
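
The preprocessing described above can be sketched as follows (a pandas sketch assuming a check-in table with columns user, poi, and time; the thresholds come from the text, while the column names and helper itself are ours):

```python
import pandas as pd

def preprocess(checkins: pd.DataFrame, min_poi_users=5, min_user_checkins=10,
               train_ratio=0.8, seed=42):
    """Filter rare POIs/users and split each user's check-ins 80/20."""
    # Keep POIs checked in by at least 5 distinct users.
    poi_users = checkins.groupby("poi")["user"].nunique()
    checkins = checkins[checkins["poi"].isin(poi_users[poi_users >= min_poi_users].index)]
    # Keep users with more than 10 remaining check-ins.
    user_cnt = checkins.groupby("user")["poi"].size()
    checkins = checkins[checkins["user"].isin(user_cnt[user_cnt > min_user_checkins].index)]
    # Random per-user split into training and test data.
    shuffled = checkins.sample(frac=1.0, random_state=seed)
    rank = shuffled.groupby("user").cumcount()
    size = shuffled.groupby("user")["poi"].transform("size")
    train = shuffled[rank < train_ratio * size]
    test = shuffled[rank >= train_ratio * size]
    return train, test
```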

5.2 Performance Metrics

In this work, we leverage three metrics to evaluate the model performance: precision, recall, and F-score. The precision and recall of a top-K recommendation system are denoted as P@K and R@K respectively. P@K measures the ratio of recovered POIs to the K recommended POIs, and R@K measures the ratio of recovered POIs to the set of POIs in the test data. For each user \(u \in \mathcal {U}\), \(\mathcal {L}^T(u)\) denotes the set of visited POIs in the test data, and \(\mathcal {L}^R(u)\) denotes the set of recommended POIs. Then P@K and R@K are formulated as follows

$$\begin{aligned} P@K = \frac{1}{|\mathcal {U}|} \sum _{u \in {\mathcal {U}}} \frac{|\mathcal {L}^R(u) \cap \mathcal {L}^T(u)|}{ K}, \end{aligned}$$
(9)
$$\begin{aligned} R@K = \frac{1}{|\mathcal {U}|} \sum _{u \in {\mathcal {U}}} \frac{|\mathcal {L}^R(u) \cap \mathcal {L}^T(u)|}{|\mathcal {L}^T(u)|}. \end{aligned}$$
(10)

Further, the F-score is the harmonic mean of precision and recall, defined as

$$\begin{aligned} \textit{F-score}@K = \frac{2*P@K * R@K}{P@K + R@K}. \end{aligned}$$
(11)
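
The three metrics can be computed as in the following sketch (assuming per-user ranked recommendation lists and ground-truth sets; variable names are ours):

```python
def evaluate(recommended: dict, ground_truth: dict, K: int):
    """Average P@K, R@K, and F-score@K over users, cf. Eqs. (9)-(11).

    recommended[u]: ranked POI list for user u; ground_truth[u]: set of test POIs.
    """
    users = list(ground_truth)
    hits = {u: len(set(recommended[u][:K]) & ground_truth[u]) for u in users}
    p = sum(hits[u] / K for u in users) / len(users)
    r = sum(hits[u] / len(ground_truth[u]) for u in users) / len(users)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f
```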

5.3 Baselines

We compare our ATTF model with state-of-the-art collaborative filtering (CF) methods and POI recommendation methods incorporating temporal effects. Prior work [37, 38] indicates that treating check-ins as implicit feedback is better for POI recommendation. Hence we exploit Weighted Regularized Matrix Factorization (WRMF) [8, 21] and Bayesian Personalized Ranking Matrix Factorization (BPR-MF) [24] as comparative CF methods. To illustrate the efficacy of our ATTF model, we compare it with LRT [4] and FPMC-LR [2], which are state-of-the-art POI recommendation methods incorporating temporal effects.

  • WRMF The WRMF model is designed for processing large-scale implicit feedback data sets. We define the weight of user \(u_i\) at POI \(l_j\) as \(w_{i,j} = (1+ 10 \cdot C_{i,j})^{0.5}\), where \(C_{i,j}\) is the check-in count, following the setting in [18].

  • BPR-MF The BPR-MF model is a popular MF-based recommendation method that learns the pairwise relation in which users prefer observed items over unobserved ones.

  • LRT The LRT model is designed to model the “non-uniformness” and “consecutiveness” in a matrix factorization model for POI recommendation.

  • FPMC-LR The FPMC-LR model adds the Local Region constraint (i.e., geographical information) to the Factorized Personalized Markov Chain (FPMC) model [25]. FPMC-LR incorporates the geographical information and temporal consecutiveness through a local region constraint and the FPMC model respectively.

Moreover, to demonstrate the advantage of ATTF in aggregating several temporal latent factors, we also compare it with three single temporal latent factor models: TTFM, TTFW, and TTFH. Each of them is essentially a PITF model that considers only the month, day of week, or hour, respectively, as the temporal latent factor. Because these three models are special cases of our ATTF model, we obtain their results by setting the corresponding weight to 1 and the others to 0 in ATTF.

5.4 Experimental Results

5.4.1 Performance Comparison

In the following, we demonstrate the performance comparison on precision, recall, and F-score. We set the latent factor dimension to 60 for all compared models. We leverage a grid search to find the best weights for the ATTF model, with \(\alpha _1\), \(\alpha _2\), and \(\alpha _3\) constrained to the range [0, 1]. In the grid search, we first vary \(\alpha _1\) from zero to one with step size 0.1. Then, for each \(\alpha _1\) value, for instance \(\alpha _1=0.1\), we vary \(\alpha _2\) from zero to \(1-\alpha _1\) with step size 0.1, and \(\alpha _3\) is determined by \(1-\alpha _1-\alpha _2\). The grid search thus tries all value combinations with step size 0.1 satisfying the constraints \(\alpha _1 + \alpha _2 + \alpha _3 = 1\) and \(\alpha _1,\alpha _2,\alpha _3 \ge 0\). As a result, the ATTF model on the Foursquare data achieves its best result when \(\alpha _1 = 0.7\), \(\alpha _2 = 0.1\), and \(\alpha _3 = 0.2\), while on the Gowalla data it achieves its best result when \(\alpha _1 = 0.2\), \(\alpha _2 = 0.1\), and \(\alpha _3 = 0.7\).
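
The weight search can be sketched as follows (train_and_evaluate is a hypothetical callback that trains ATTF with the given weights and returns a validation score, e.g., F-score@5):

```python
import numpy as np

def grid_search_weights(train_and_evaluate, step=0.1):
    """Try all (a1, a2, a3) on a 0.1 grid with a1 + a2 + a3 = 1 and a_i >= 0."""
    best_score, best_alphas = -np.inf, None
    for a1 in np.arange(0.0, 1.0 + 1e-9, step):
        for a2 in np.arange(0.0, 1.0 - a1 + 1e-9, step):
            a3 = max(0.0, 1.0 - a1 - a2)   # guard against tiny negative floats
            score = train_and_evaluate((a1, a2, a3))
            if score > best_score:
                best_score, best_alphas = score, (a1, a2, a3)
    return best_alphas, best_score
```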

Fig. 6

Precision on Foursquare and Gowalla. a Foursquare, b Gowalla

Fig. 7

Recall on Foursquare and Gowalla. a Foursquare, b Gowalla

Fig. 8

F-score on Foursquare and Gowalla. a Foursquare, b Gowalla

Figures 6, 7, and 8 show the experimental results on the Foursquare and Gowalla data for precision, recall, and F-score respectively. We see the following. (1) Our proposed ATTF model outperforms state-of-the-art CF methods and POI recommendation models. Compared with the best state-of-the-art competitor in the POI recommendation area (i.e., FPMC-LR), the ATTF model gains more than 20% improvement on Foursquare data and more than 36% improvement on Gowalla data for all three measures, Precision@5, Recall@5, and F-score@5. We observe that all models perform better on the Foursquare data set than on the Gowalla data set, even though the Foursquare data are sparser. The reason is that the Gowalla data contain many more POIs, and a large candidate POI set makes the recommendation harder. (2) The ATTF model outperforms single temporal factor models. Compared with the best single temporal factor model, the ATTF model gains about 3% improvement on Foursquare data and about 10% improvement on Gowalla data in a top-5 POI recommender system. So the ATTF model gains more advantage when data are denser, because it uses a tuple to represent a time spot, which gives more precise information, and dense data strengthen this precise labeling scheme. In addition, the different weight assignments on the two data sets give us two interesting insights: (i) When data are sparse, the temporal feature on month dominates the POI recommendation performance, because check-ins at the hour or day-of-week granularity are sparse, as shown in Fig. 3, and the corresponding characteristics are not easily captured; accordingly, the Foursquare data set has a high weight on the month temporal factor. When data are denser, check-ins at the hour granularity are not so sparse, so the temporal characteristic on hour of day becomes prominent. (ii) We usually pay much attention to the temporal characteristics on hour of day and day of week; our experimental results indicate that the temporal characteristic on month is also important, especially for sparse data. (3) Our proposed ATTF model, the single temporal factorization models (TTFM, TTFW, and TTFH), and FPMC-LR perform much better than the other competitors, especially on the recall measure. They recommend POIs in more specific contexts, which is the key to improving performance: our models recommend POIs to a user at a specific time, and FPMC-LR recommends POIs given a user's recently checked-in POIs, while the other three models give general recommendations.

5.4.2 Parameter Effect

The regularization parameter and the latent vector dimension are two important factors affecting the model performance. We explore how they affect the proposed model while keeping the other parameters fixed.

Figure 9 demonstrates the effect of the regularization parameter on model performance. For simplicity, we set the same parameter \(\lambda \) for all latent vectors. The regularization term does not significantly affect the model. The model achieves its best performance at \(\lambda = 0.001\); as \(\lambda \) increases further, the performance decreases.

Fig. 9

The effect of regularization parameter \(\lambda \). a Foursquare, b Gowalla

Figure 10 demonstrates how the latent vector dimension affects the model. The performance of ATTF rises steadily as the latent vector dimension increases. As a trade-off between high performance and low computation cost, we suggest setting the dimension to \(d=60\).

Fig. 10

The effect of latent factor dimension. a Foursquare, b Gowalla

6 Conclusion and Future Work

In this paper, we propose the ATTF model for POI recommendation. The proposed model introduces time factors to model the temporal effect in POI recommendation, subsuming all three temporal properties: periodicity, consecutiveness, and non-uniformness. Moreover, the ATTF model captures the temporal influence at different time scales by aggregating several time factors' contributions. Experimental results on two real-world data sets show that the ATTF model outperforms state-of-the-art models. Our model is a general framework to aggregate several temporal characteristics at different scales. In this work, we consider only three temporal characteristics: hour of day, day of week, and month. In the future, we may add more to this model, e.g., splitting a day into several slots rather than hours. In addition, we may try nonlinear aggregate operators, e.g., \(\max \), in future work.