1 Introduction

With the rapid development of social network services, users can capture and share their activities and locations through various location-based social networks, such as Foursquare, Facebook Places, Douban and so on. In recent years, POI recommendation has become a hot topic: it mines users’ preferences from a large number of check-in behaviors to help users find locations of interest. The spatial and temporal environment describes the basic elements of an event, that is, when and where. These factors are the basis for modeling behavior in practical applications. Users behave differently in different locations [26, 27], and a user’s location can be predicted [28]. Using complex spatial and temporal information to predict a person’s position at a particular point in time is both challenging and critical.

Compared to other recommendation domains, such as movie or music recommendation, POI recommendation faces a more serious data sparsity problem [34]. [10] observed that the sparsity of its user-POI matrix, which was extracted from Foursquare, reached as high as 99.87%, where each user visited only 55.94 POIs on average out of the total 46194 POIs. This phenomenon leads to insufficient user-POI interaction information, which is important for learning user properties in POI recommendation. To alleviate this issue, one representative approach is to take the content information of POIs into consideration. Content information can be helpful in a variety of ways [16]. For example, a user may search a POI’s title or reviews before deciding whether he is interested in visiting the place. Hence, in reality, a POI’s reviews or title can be among the factors that affect a user’s check-in decision. In addition, content information can help identify semantically similar POIs, e.g., Rock and Roll often appears in the reviews and descriptions of pop music. However, most of these works are based on traditional models that still cannot solve the problem of data sparsity or handle new users and new POIs. One effective solution is to transfer knowledge from related domains, and cross-domain recommendation techniques [1, 7, 15] address such problems.

In real life, a user typically participates in several systems to acquire different information services. For example, a user may listen to a song on the way to an event. This provides an opportunity to improve recommendation performance in the target service (or in all services) through cross-domain learning. Following the above example, we can represent the POI feedback using a binary matrix where the entries indicate whether a user has visited a POI. Similarly, we use another binary matrix to indicate whether a user has listened to a song. Usually these two matrices are highly sparse, and it is beneficial to learn them at the same time. This idea is formalized in the collective matrix factorization (CMF) [25] approach, which jointly factorizes these two matrices by sharing the user latent factors. However, CMF is a shallow model and has difficulty learning the complex user-item interaction function [5, 6]. Moreover, its knowledge sharing is limited to the lower level of user latent factors. A recent work, CoNet [7], tried to extend CMF to higher levels of user latent factors. Although the aforementioned methods can alleviate the data sparsity of POI recommendation to some extent, one common limitation of these existing solutions is that they ignore the influence of the specific features of each POI. POI recommendation needs to consider not only personal preferences but also the spatiotemporal context, as a user tends to have different choices and needs at different times and places.

Motivated by the benefits of both knowledge transfer and interaction-function learning, in this paper we propose a POI recommendation model via transfer learning, named PIR-TL, to learn effective representations of POIs and users. In particular, the proposed model splits the properties of a POI into two parts. The spatiotemporal part includes time and location and does not participate in transfer; the other part includes the POI’s id, category and name. We assume that the general preference hidden layers in the two base networks are connected by dual mappings, which do not require them to be identical. The composition of multiple layers first maps the data (POI) into a highly non-linear latent space, and then the user representations are learned through user general preference transfer across domains. Moreover, to incorporate the spatial and temporal factors into our model, we introduce the location and temporal influence to learn the spatiotemporal preference. The model is realized as multi-layer feedforward networks with dual shortcut connections and joint loss functions, which can be trained efficiently by back-propagation. To summarize, the main contributions of this paper are as follows.

  • We propose a novel transfer-learning-embedded model for POI recommendation. To the best of our knowledge, this is the first work to fuse properties of other domains to achieve POI recommendation.

  • The PIR-TL model incorporates carefully designed spatial preference modelling and preference transfer to capture the spatio-temporal and general information in POIs. As a result, our proposed model captures a user’s spatio-temporal and general interests simultaneously.

  • Experiments on real-world datasets are conducted to evaluate the performance of our proposed model. Our experimental results show that our method outperforms state-of-the-art methods.

2 Problem formulation

Next we will introduce the notations used in this paper and describe the problem definition.

We are given two domains, a source domain S and a target domain T. The set of users \(U=\{u_1,...,u_m\}\) (of size \(m=|U|\)) is shared by both domains. We denote the set of items in the source domain S by \(I_{S}=\{j_1,...,j_{n_S}\}\) (of size \(n_{S} =|I_{S}|\)), and in the same way \(I_{T}=\{i_1,...,i_{n_T}\}\) (of size \(n_{T} =|I_{T}|\)) represents the set of items in the target domain T. Note that items in the source domain S and items in the target domain T differ in form. Each item i in the target domain T is a POI, represented as a tuple \((Id_i, Category_i, Name_i, Time_i, Location_i)\) whose terms are the POI’s id, category, name, time and location, respectively. In contrast, each item j in the source domain S is represented by \(Id_j\) only.

We implement the recommendation task by implicit feedback in this paper. So, we use a binary matrix \({\mathbf{R }_{S}} \in {{\mathbb{R}}}^{m \times n_{S}}\) to describe user-item interactions in the source domain S, where an entry \(r_{uj} \in \{0, 1\}\) is 1 if a user u has an interaction with item j and 0 otherwise. In the same way, let \({\mathbf{R }_{T}} \in {{\mathbb {R}}}^{m \times n_{T}}\) describe user-item interactions in the target domain T.

Problem definition: Given two observed domains including the user-item interaction matrices \({\mathbf{R }_{S}}\) and \({\mathbf{R }_{T}}\) in both domains, our goal for the cross-domain recommendation task is to recommend the items unobserved by user u in the target domain T to user u.
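As a small illustration of the formulation above, the following sketch builds the two binary interaction matrices from logged interactions and lists the target-domain items that a user has not yet observed, which are the recommendation candidates. The user and item ids and the helper names `build_matrix` and `unobserved_items` are hypothetical and not part of the original work.

```python
# Toy implicit-feedback logs; every id below is made up for illustration.
users = ["u1", "u2", "u3"]
source_items = ["j1", "j2"]          # items of the source domain S
target_items = ["i1", "i2", "i3"]    # POIs of the target domain T

source_log = [("u1", "j1"), ("u2", "j2"), ("u3", "j1")]
target_log = [("u1", "i1"), ("u1", "i3"), ("u2", "i2")]


def build_matrix(log, users, items):
    """Return an m x n binary matrix: 1 if (user, item) was observed, else 0."""
    observed = set(log)
    return [[1 if (u, it) in observed else 0 for it in items] for u in users]


def unobserved_items(matrix, users, items, user):
    """Candidate items for `user`: the items whose entry in the user's row is 0."""
    row = matrix[users.index(user)]
    return [it for it, r in zip(items, row) if r == 0]


R_S = build_matrix(source_log, users, source_items)
R_T = build_matrix(target_log, users, target_items)
candidates_u1 = unobserved_items(R_T, users, target_items, "u1")
```

Here user `u3` has source-domain feedback but an all-zero row in the target matrix, which is the situation where knowledge transfer from the source domain is expected to help.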

3 Methods

In this section, we present the architecture of our model. As illustrated in Fig. 1, we first introduce the source domain item, user and POI embeddings (Sect. 3.1), then introduce spatial preference modelling and transfer learning based non-spatial preference modelling, which are the two key ingredients of our proposed model (Sects. 3.2 and 3.3), then present POI recommendation (Sect. 3.4), and finally describe the model training in detail (Sect. 3.5).

Fig. 1 System architecture

3.1 Embedding

This study proposes user embedding, source domain item embedding and POI Embedding that consider both general information embedding and unique spatiotemporal information embedding.

3.1.1 User and source domain item embedding

For user embedding and source domain item embedding, we adopt one-hot encoding. It takes user u and item j and maps them into one-hot encodings \({\mathbf{e}}_{u}\in \{0, 1\}^{m}\) and \({\mathbf{e}}_{j}\in \{0, 1\}^{n_{S}}\), where only the element corresponding to that index is 1 and all others are 0. We then embed the one-hot encodings into continuous representations via two embedding matrices \({\mathbf{P}}\in {\mathbb{R}}^{m\times 3d}\) and \({\mathbf{Q}}_1\in {\mathbb{R}}^{n_{S}\times 3d}\), giving \({\mathbf{x}}_{u}={\mathbf{P}}^{T}{\mathbf{e}}_{u}\) and \({\mathbf{x}}_{j}={\mathbf{Q}}_1^{T}{\mathbf{e}}_{j}\), and merge them as \({\mathbf{x}}_{uj}=[{\mathbf{x}}_{u}, {\mathbf{x}}_{j}]\) to be the input of the successive hidden layers.
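The lookup described above can be sketched in a few lines. Multiplying the transposed embedding matrix by a one-hot vector simply selects one row, so the product is equivalent to a table lookup. The sizes, random seed and helper names below are illustrative assumptions; only the shapes (m × 3d and n_S × 3d) and the Gaussian initialization follow the text.

```python
import random


def one_hot(index, size):
    """One-hot encoding: 1 at `index`, 0 elsewhere."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def transpose_matvec(matrix, vec):
    """Compute matrix^T @ vec for a row-major list-of-lists matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [sum(matrix[r][c] * vec[r] for r in range(n_rows)) for c in range(n_cols)]


rng = random.Random(0)
m, n_S, d = 4, 5, 2  # toy sizes, chosen only for this sketch

# P is m x 3d and Q1 is n_S x 3d, initialized from N(0, 0.01^2).
P = [[rng.gauss(0.0, 0.01) for _ in range(3 * d)] for _ in range(m)]
Q1 = [[rng.gauss(0.0, 0.01) for _ in range(3 * d)] for _ in range(n_S)]

u, j = 2, 3
x_u = transpose_matvec(P, one_hot(u, m))     # x_u = P^T e_u
x_j = transpose_matvec(Q1, one_hot(j, n_S))  # x_j = Q1^T e_j
x_uj = x_u + x_j                             # list concatenation: [x_u, x_j]
```

In practice an embedding layer performs the row selection directly instead of materializing the one-hot vector.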

3.1.2 POI general information embedding

Then we propose a POI intrinsic embedding model which generates a low dimensional vector to represent POI features. A user may search a POI’s name and category before deciding whether he is interested in visiting the place. We therefore propose a CNN-based POI intrinsic embedding model that takes the name of the POI into account. The benefit of POI intrinsic embedding is that it can learn the semantic relationship between POIs, helping the model to find users’ preferences. Figure 2 illustrates the CNN for POI intrinsic embedding. Given a name n of POI i, each word \(w_{k}\) in name n is represented by a d-dimensional vector. Assuming that there are N words in a POI’s name, the embedding matrix of n with size \(N \times d\) is represented as:

$$\begin{aligned} \varPi (n) =\,\phi (w_{1})\,\oplus \phi (w_{2})\,\oplus ...\oplus \phi (w_{N-1}) \,\oplus \phi (w_{N}) \end{aligned}$$
(1)

where \(\varPi (n)\) denotes the embedding matrix of name n, \(\phi (w_{k})\) is a word embedding function mapping word \(w_{k}\) into a d-dimensional vector, and \(\oplus\) is the concatenation operator. Using the word embeddings as the input of the CNN, the convolution layer applies a convolution operation to the input: each convolution filter \(K_p\) is applied to a window of h words to generate a new feature \(z^k_p\) as follows:

$$\begin{aligned} z^k_p = ReLU(\varPi (n)*K_p + b) \end{aligned}$$
(2)

where \(*\) denotes the convolution operation and b is the bias term. Then we use the pooling layer to extract the largest value in each feature map as follows:

$$\begin{aligned} L_h = \{l_1, l_2, \ldots , l_{nf}\} \end{aligned}$$
(3)
$$\begin{aligned} l_p = \max \{z^1_p, z^2_p, \ldots , z^{N-h+1}_p\} \end{aligned}$$
(4)

where nf denotes the number of filters and \(l_p\) denotes the feature corresponding to filter \(K_p\). We concatenate the max-pooling outputs for the different window sizes \(h \in \{1,2,3\}\):

$$\begin{aligned} L = L_1\,\oplus \,L_2\,\oplus \,L_3 \end{aligned}$$
(5)

We then feed them into a fully connected layer to synthesize a high-level feature as follows:

$$\begin{aligned} {\mathbf{x}}_{n} = W_{name}L + g_{name} \end{aligned}$$
(6)

where \(W_{name}\) is the weight matrix, \(g_{name}\) is the bias term and \({\mathbf{x}}_{n}\) is the output of CNN model.
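The pipeline of Eqs. (2) to (6) can be traced on a toy example. The sketch below is a plain-Python illustration, not the authors' implementation: the word vectors, filters and projection weights are made-up values, one filter is used per window size, and a name is assumed to contain at least as many words as the largest window (the source does not say how shorter names are handled).

```python
def relu(x):
    return max(0.0, x)


def feature_map(name_matrix, kernel, bias):
    """Eq. (2): slide an h x d kernel over the N x d name matrix."""
    h, n_words = len(kernel), len(name_matrix)
    features = []
    for k in range(n_words - h + 1):          # N - h + 1 window positions
        total = bias
        for offset in range(h):
            word_vec = name_matrix[k + offset]
            total += sum(w * c for w, c in zip(word_vec, kernel[offset]))
        features.append(relu(total))
    return features


def name_embedding(name_matrix, filters, W_name, g_name):
    """Eqs. (3)-(6): max-pool every feature map, concatenate, then project."""
    pooled = [max(feature_map(name_matrix, kernel, bias)) for kernel, bias in filters]
    return [sum(w * l for w, l in zip(row, pooled)) + g
            for row, g in zip(W_name, g_name)]


# Toy name with N = 3 words and d = 2 (illustrative values only).
name_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    ([[1.0, 1.0]], 0.0),                            # window size h = 1
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),                # window size h = 2
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], -1.0),   # window size h = 3
]
W_name = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]         # d x (3 * nf) with nf = 1
g_name = [0.0, 0.5]
x_n = name_embedding(name_matrix, filters, W_name, g_name)
```

Because max-pooling keeps a single value per filter, the length of the pooled vector depends only on the number of filters and window sizes, not on the number of words in the name.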

Fig. 2 The structure of POI’s name representation learning component

For each POI’s id, we map it into a one-hot encoding \({\mathbf{e}}_{i}\in \{0, 1\}^{n_{T}}\) and then use an embedding matrix \({\mathbf{Q}}_2\in {\mathbb{R}}^{n_{T}\times d}\) to embed it as a d-dimensional vector \({\mathbf{x}}_{i}={\mathbf{Q}}_2^{T}{\mathbf{e}}_{i}\). We likewise embed the POI’s category into a d-dimensional vector \({\mathbf{x}}_{c}\). Finally, we merge the POI’s id, category and name embeddings with the user embedding as \({\mathbf{x}}_{ui}=[{\mathbf{x}}_{u}, {\mathbf{x}}_{i}, {\mathbf{x}}_{c}, {\mathbf{x}}_{n}]\) to be the input of the successive hidden layers.

3.1.3 POI spatiotemporal information embedding

Different kinds of POIs have different activity timing. For example, people prefer hiking in sunshine to hiking in moonlight. Consequently, we embed all the location and time units into a latent space. Each POI v is represented as a tuple (t, l), where t is its time unit and l is its location unit. For the time unit, many researchers have analyzed day-of-week check-in patterns of POIs at different hours and inferred that weekdays (Monday to Friday) share a similar pattern, while weekends (Saturday and Sunday) share another [31]. Therefore, we divide one day into 24 h and one week into weekday and weekend, giving 48 time units in total to model temporal features. For the location unit, we assign each POI to a region. With these two types of units, our model learns a d-dimensional representation for every time unit and every region. We then merge them as \([{\mathbf{x}}_{t},{\mathbf{x}}_{l}]\) to be the embedding of the POI’s spatial information.
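The 48 time units can be computed directly from a timestamp. In the sketch below, the function name `time_unit` and the index layout (weekday hours mapped to 0..23, weekend hours to 24..47) are assumptions made for illustration; the text only fixes the total number of units.

```python
from datetime import datetime


def time_unit(ts):
    """Map a timestamp to one of 48 units: hour of day x {weekday, weekend}."""
    is_weekend = ts.weekday() >= 5          # Monday = 0, ..., Sunday = 6
    return ts.hour + (24 if is_weekend else 0)


monday_morning = time_unit(datetime(2024, 1, 1, 9, 30))    # a Monday, 09:30
saturday_morning = time_unit(datetime(2024, 1, 6, 9, 30))  # a Saturday, 09:30
```

The resulting integer can then index a 48-row embedding table in the same way a POI id indexes its own table.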

3.2 Spatial preference modelling

NCF is a multi-layer neural network framework for item recommendation [6]. Its idea is to feed the user embedding and item embedding into a dedicated neural network (which needs to be customized) to learn the interaction function from data. The NCF framework is more general than the traditional MF model, which simply applies a data-independent inner product as the interaction function. As such, we opt for the NCF framework to perform end-to-end learning on both embeddings (that represent users and POIs) and interaction functions (that predict user-POI interactions).

Following the MLP design, which concatenates its input embeddings, we merge the original user embedding \({\mathbf{x}}_{u}\) and the POI’s spatial information embedding \([{\mathbf{x}}_{t},{\mathbf{x}}_{l}]\) as

$$\begin{aligned} {\mathbf{e}}_{0}={\mathbf{x}}_{ut}=[{\mathbf{x}}_{u},{\mathbf{x}}_{t},{\mathbf{x}}_{l}] \end{aligned}$$
(7)

to be the input of later hidden layers.

$$\begin{aligned} \left\{ \begin{aligned}&{\mathbf{e}}_{1}=ReLU({\mathbf{W}}_{1}{\mathbf{e}}_{0}+{\mathbf{b}}_{1})\\&{\mathbf{e}}_{2}=ReLU({\mathbf{W}}_{2}{\mathbf{e}}_{1}+{\mathbf{b}}_{2})\\&...\\&{\mathbf{e}}_{h}=ReLU({\mathbf{W}}_{h}{\mathbf{e}}_{h-1}+{\mathbf{b}}_{h})\\ \end{aligned} \right. \end{aligned}$$
(8)

where \({\mathbf{W}}_{h},{\mathbf{b}}_{h}\), and \({\mathbf{e}}_{h}\) denote the weight matrix, bias vector, and output neurons of the h-th hidden layer, respectively. We use the ReLU function as the non-linear activation function, which has been empirically shown to work well. Finally, the output of the last hidden layer \({\mathbf{Z}}_{ut}={\mathbf{e}}_{h}\) is transformed to a prediction score via:

$$\begin{aligned} {\hat{r}}_{ut}=\phi _1({\mathbf{Z}}_{ut})=\frac{1}{1 + exp(-h^{T}{\mathbf{Z}}_{ut})} \end{aligned}$$
(9)

where h is a learnable weight vector (not to be confused with the layer index h used above).
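Equations (7) to (9) amount to a standard feedforward pass followed by a sigmoid output. The following plain-Python sketch illustrates that computation on toy numbers; the helper names and all weight values are hypothetical, and a real implementation would use a tensor library.

```python
import math


def affine(W, x, b):
    """Compute W x + b for a list-of-lists matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu_vec(v):
    return [max(0.0, x) for x in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def spatial_score(x_u, x_t, x_l, layers, h_out):
    """Eq. (7): concatenate; Eq. (8): ReLU layers; Eq. (9): sigmoid output."""
    e = x_u + x_t + x_l                      # e_0 = [x_u, x_t, x_l]
    for W, b in layers:
        e = relu_vec(affine(W, e, b))
    return sigmoid(sum(hk * ek for hk, ek in zip(h_out, e)))


# Toy one-dimensional embeddings and a single hidden layer with two neurons.
x_u, x_t, x_l = [1.0], [2.0], [-1.0]
layers = [([[1.0, 1.0, 1.0], [-1.0, 0.0, 0.0]], [0.0, 0.0])]
score = spatial_score(x_u, x_t, x_l, layers, h_out=[1.0, 0.0])
```

With these values the hidden layer outputs [2, 0], so the score is the sigmoid of 2.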

3.3 Transfer learning based non-spatial preference modelling

In this section, we use collaborative cross networks for general preference transfer, motivated by the cross-stitch networks [13].

The main idea is simple: use a matrix rather than a scalar to transfer information. Similarly to the cross-stitch network, the target network receives information from the source network and vice versa. In detail, let \({\mathbf{e}}_{ui}\) be the representations of the l-th hidden layer and \({\mathbf{e}}^{\sim }_{ui}\) be the input to the \((l+1)\)-th hidden layer in the target domain, respectively. Similarly, they are \({\mathbf{e}}_{uj}\) and \({\mathbf{e}}^{\sim }_{uj}\) in the source domain. The cross unit is implemented as follows:

$$\begin{aligned} \begin{aligned} {\mathbf{e}}^\sim _{ui}=W_i{\mathbf{e}}_{ui}+H{\mathbf{e}}_{uj}\\ {\mathbf{e}}^{\sim }_{uj}=W_j{\mathbf{e}}_{uj}+H{\mathbf{e}}_{ui} \end{aligned} \end{aligned}$$
(10)

where \(W_i\) and \(W_j\) are weight matrices, and the matrix H controls the information from source network to target network and vice versa. When target domain data is sparse, the target network can still learn a good representation from that of the source network through the cross units. The role of matrix H is to enable knowledge transfer between domains.
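A cross unit as in Eq. (10) can be sketched as two matrix-vector products per domain. The helper names and the toy matrices below are illustrative assumptions. Note that, as the equation is written, reusing the same H in both directions implies that the two networks have matching layer widths at the point of transfer.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]


def cross_unit(e_ui, e_uj, W_i, W_j, H):
    """Eq. (10): each network adds H times the other network's activations."""
    next_ui = vec_add(matvec(W_i, e_ui), matvec(H, e_uj))   # target side
    next_uj = vec_add(matvec(W_j, e_uj), matvec(H, e_ui))   # source side
    return next_ui, next_uj


# Toy two-dimensional activations (values are illustrative only).
e_ui, e_uj = [1.0, 0.0], [0.0, 1.0]
W_i = [[1.0, 0.0], [0.0, 1.0]]
W_j = [[2.0, 0.0], [0.0, 2.0]]
H = [[0.0, 1.0], [1.0, 0.0]]

with_transfer = cross_unit(e_ui, e_uj, W_i, W_j, H)
no_transfer = cross_unit(e_ui, e_uj, W_i, W_j, [[0.0, 0.0], [0.0, 0.0]])
```

Setting H to zero recovers two independent networks, which makes the contribution of the transfer term easy to isolate in an ablation.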

We use \({\mathbf{x}}_{uj}\) (Sect. 3.1.1) and \({\mathbf{x}}_{ui}\) (Sect. 3.1.2) as the inputs of two multi-layer perceptrons to get recommendation predictions on training samples, namely \({\hat{r}}_{uj}\) and \({\hat{r}}_{ui}\), between users and items in both domains.

$$\begin{aligned} \begin{aligned}&{\mathbf{e}}^0_{ui}={\mathbf{x}}_{ui},{\mathbf{e}}^0_{uj}={\mathbf{x}}_{uj} \\&{\mathbf{e}}^1_{ui}=ReLU(W^1_i{\mathbf{e}}^0_{ui}+b^1_i+H^1{\mathbf{e}}^0_{uj}),\\&{\mathbf{e}}^1_{uj}=ReLU(W^1_j{\mathbf{e}}^0_{uj}+b^1_j+H^1{\mathbf{e}}^0_{ui}) \\&...\\&{\mathbf{e}}^L_{ui}=ReLU(W^L_i{\mathbf{e}}^{L-1}_{ui}+b^L_i+H^L{\mathbf{e}}^{L-1}_{uj}),\\&{\mathbf{e}}^L_{uj}=ReLU(W^L_j{\mathbf{e}}^{L-1}_{uj}+b^L_j+H^L{\mathbf{e}}^{L-1}_{ui}) \\&{\mathbf{Z}}_{ui}={\mathbf{e}}^L_{ui},{\mathbf{Z}}_{uj}={\mathbf{e}}^L_{uj}\\&{\hat{r}}_{ui}=\phi ^i({\mathbf{Z}}_{ui}),{\hat{r}}_{uj}=\phi ^j({\mathbf{Z}}_{uj}) \end{aligned} \end{aligned}$$
(11)

where \(W_i,W_j\) are weight matrices, \(b_i,b_j\) are biases, and the matrix H controls the sharing information from both domain networks. L is the total number of layers. \(\phi ^i\), \(\phi ^j\) are two one-layer perceptrons to map \({\mathbf{Z}}_{ui},{\mathbf{Z}}_{uj}\) to two scalars \({\hat{r}}_{ui},{\hat{r}}_{uj}\) .

3.4 POI recommendation

In this section, we describe how we develop the POI recommendation system. Different from other general POI recommendation systems, our main contribution is to recommend a suitable POI by considering both the user-item interactions in another domain and the POI characteristics. Hence, we use preference transfer (Sect. 3.3) to model the user’s general preference and spatial preference modelling (Sect. 3.2) to model the user’s spatial preference, and then combine them as follows:

$$\begin{aligned} {\hat{r}}_{u_{POI}} = \lambda {\hat{r}}_{ui} + (1-\lambda ){\hat{r}}_{ut} \end{aligned}$$
(12)

where \({\hat{r}}_{u_{POI}}\) is the final predicted score for POI recommendation, and \(\lambda\) is a trade-off parameter that balances the importance of general preference transfer and spatial preference modelling in the proposed method.
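As a quick worked example of Eq. (12), the sketch below combines the two scores and shows how the trade-off parameter can change a ranking. The function name `combine_scores` and the candidate scores are made up; the default value 0.3 is the setting reported in the parameter setup of this paper.

```python
def combine_scores(r_ui, r_ut, lam=0.3):
    """Eq. (12): convex combination of general and spatial preference scores."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * r_ui + (1.0 - lam) * r_ut


# Hypothetical (general score, spatial score) pairs for two candidate POIs.
candidates = {"poi_a": (0.9, 0.2), "poi_b": (0.4, 0.6)}

ranking_default = sorted(
    candidates, key=lambda p: combine_scores(*candidates[p]), reverse=True
)
ranking_general_only = sorted(
    candidates, key=lambda p: combine_scores(*candidates[p], lam=1.0), reverse=True
)
```

With the default weight the spatial score dominates and `poi_b` ranks first, whereas relying on the general preference alone would rank `poi_a` first.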

3.5 Model training

Due to the nature of implicit feedback, the squared loss \(({\hat{r}}_{uj}-r_{uj})^2\) is not suitable. Instead, we adopt the cross-entropy loss.

The goal of PIR-TL is to improve the prediction performance on both domains via joint learning. Naturally, the loss function L is designed as a joint cross-entropy loss over the recommendation predictions of both domains, namely \(L_{uj}\) and \(L_{u_{POI}}\), together with a regularization term \(L_{reg}\):

$$\begin{aligned} \begin{aligned}&L=L_{uj}+L_{u_{POI}}+L_{reg}\\&\quad =-\mathop {\sum }_{(j,u,POI)\in T}\Big [r_{uj}\log {\hat{r}}_{uj}+(1-r_{uj})\log (1-{\hat{r}}_{uj})\\&\qquad +r_{u_{POI}}\log {\hat{r}}_{u_{POI}}+(1-r_{u_{POI}})\log (1-{\hat{r}}_{u_{POI}})\Big ]+\mu \sum |\theta | \end{aligned} \end{aligned}$$
(13)

where T denotes the training dataset including positive and negative samples, \(r_{uj}\) and \(r_{u_{POI}}\) denote the corresponding labels, \(\mu\) denotes the regularization coefficient, and \(\theta\) denotes all the trainable parameters. We use Adam as the optimizer to update the parameters.
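The joint objective of Eq. (13) can be written out as a short function. This is a minimal sketch under stated assumptions: each training sample is taken to be a tuple (source label, source prediction, POI label, POI prediction), the clipping constant `eps` is added here only for numerical safety and is not mentioned in the text, and the names `bce` and `joint_loss` are hypothetical.

```python
import math


def bce(label, pred, eps=1e-12):
    """Binary cross-entropy for one (label, prediction) pair."""
    pred = min(max(pred, eps), 1.0 - eps)    # keep log() away from 0
    return -(label * math.log(pred) + (1 - label) * math.log(1.0 - pred))


def joint_loss(samples, params, mu):
    """Eq. (13): cross-entropy on both domains plus an L1 penalty."""
    data_term = sum(
        bce(r_uj, p_uj) + bce(r_poi, p_poi)
        for r_uj, p_uj, r_poi, p_poi in samples
    )
    reg_term = mu * sum(abs(theta) for theta in params)
    return data_term + reg_term


# One toy sample: positive in the source domain, negative in the target domain.
samples = [(1, 0.5, 0, 0.5)]
loss = joint_loss(samples, params=[1.0, -2.0], mu=0.1)
```

For this sample both predictions are 0.5, so the data term is 2 ln 2 and the penalty is 0.1 × 3.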

4 Experiment

In this section, we first present the experimental settings, and then compare our model PIR-TL with other state-of-the-art models.

4.1 Experimental settings

Dataset

We evaluate on three real-world cross-domain datasets. The first is the Music-Event dataset, which contains logs of users’ music listening and their event attendance lists. The dataset we use contains 8352 user-event interactions and 169,337 user-music ratings. There are 728 shared users, 7891 events, and 80,619 music items. We aim to improve POI recommendation by transferring knowledge from the related music listening domain. The data sparsity is over 99.8%.

The second is the Movie-Event dataset, which contains logs of users’ movie watching and their event attendance lists. The dataset we use contains 11,044 user-event interactions and 872,781 user-movie ratings. There are 1023 shared users, 10,005 events, and 67,631 movies. We aim to improve POI recommendation by transferring knowledge from the related movie watching domain. The data sparsity is over 99.8%.

The third is the Book-Event dataset, which contains logs of users’ book reading and their event attendance lists. The dataset we use contains 10,802 user-event interactions and 123,315 user-book ratings. There are 956 shared users, 9804 events, and 56,945 books. We aim to improve POI recommendation by transferring knowledge from the related book reading domain. The data sparsity is over 99.8%. The statistics are summarized in Table 1. As we can see, all three datasets are very sparse, and hence we hope to improve performance by transferring knowledge from auxiliary domains.

Table 1 Basic statistics of the three datasets
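The sparsity figures can be checked from the reported counts. The sketch below assumes that the quoted sparsity refers to the user-event (target domain) matrix, which the text does not state explicitly; the counts themselves are the ones given in the dataset description.

```python
# Counts as reported in the dataset description.
datasets = {
    "Music-Event": {"users": 728, "events": 7891, "interactions": 8352},
    "Movie-Event": {"users": 1023, "events": 10005, "interactions": 11044},
    "Book-Event": {"users": 956, "events": 9804, "interactions": 10802},
}


def sparsity(users, items, interactions):
    """Fraction of empty cells in a users x items interaction matrix."""
    return 1.0 - interactions / (users * items)


event_sparsity = {
    name: sparsity(s["users"], s["events"], s["interactions"])
    for name, s in datasets.items()
}
```

Under this assumption, each of the three user-event matrices comes out above the 99.8% level quoted in the text.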

Baselines

We categorize the baseline methods into two groups: (1) cross-domain methods, e.g., CMF [25], CDCF [12], CoNet and SCoNet [7], which do not consider spatial information; and (2) a POI method, NCF [6], which only considers spatio-temporal properties without preference transfer. We compare these methods with our model to show the importance of spatial-temporal information and the advantage of preference transfer. We introduce these methods as follows:

  • CMF [25] : Collective matrix factorization is a multi-relation learning approach which jointly factorizes matrices of individual domains.

  • CDCF [12] : Cross Domain Factorization Machine proposes an extension of FMs that incorporates domain information in this pattern, which assumes that user interaction patterns differ sufficiently to make it advantageous to model domains separately.

  • CoNet [7] : Collaborative Cross Networks (CoNet) enables knowledge transfer across domains by cross connections between base networks.

  • SCoNet [7] : A modified version of CoNet with sparsity-induced regularization.

  • NCF [6] : Neural Collaborative Filtering (NCF) is a neural network architecture to model latent features of users and items using collaborative filtering method. The NCF models are trained separately for each domain without transferring any information.

Evaluation protocol

We adopt the widely used leave-one-out method to perform the evaluation. That is, we reserve one interaction as the test item for each user. We follow the common strategy which randomly samples 99 (negative) items that are not interacted by the user and then evaluate how well the recommender can rank the test item against these negative ones.

Since we aim at top-N POI recommendation, the typical evaluation metrics are hit ratio (HR), normalized discounted cumulative gain (NDCG), and mean reciprocal rank (MRR), where the ranked list is cut off at topN = 10. HR intuitively measures whether the reserved test item is present in the top-N list, while NDCG and MRR also account for the rank of the hit position.
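Under the leave-one-out protocol there is a single relevant item per user, so the three metrics reduce to simple functions of the rank of the held-out item. The sketch below illustrates this for one user; the helper names are hypothetical, ties are resolved in favour of the test item (an assumption), and the per-user values would be averaged over all users to obtain the reported numbers.

```python
import math


def rank_of_test_item(test_score, negative_scores):
    """1-based rank of the held-out item among itself and the sampled negatives."""
    return 1 + sum(1 for s in negative_scores if s > test_score)


def metrics_at_n(rank, top_n=10):
    """Return (HR, NDCG, MRR) for one user with a single relevant item."""
    if rank > top_n:
        return 0.0, 0.0, 0.0
    return 1.0, 1.0 / math.log2(rank + 1), 1.0 / rank


# Toy scores: the test item is outscored by exactly one negative item.
rank = rank_of_test_item(0.9, [0.95, 0.1, 0.2])
hr, ndcg, mrr = metrics_at_n(rank)
```

With one relevant item the ideal DCG equals 1, so NDCG is just the discount 1 / log2(rank + 1) at the hit position.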

Parameter setup

We implement the PIR-TL model with TensorFlow in Python. We use the same hyper-parameters for all datasets. Parameters are randomly initialized from the Gaussian \({{\mathcal{N}}}(0,0.01^2)\). We use Adam as the optimizer, where the learning rate is set to 0.001, the mini-batch size is 128, the negative sampling ratio is 1, and \(\lambda\) is 0.3. Specifically, the configuration of hidden layers in the base network is [48 24 12 6].

4.2 Comparing different approaches

The performance of our proposed model and the five baselines on the three datasets, evaluated by HR, NDCG and MRR, is shown in Table 2.

Our proposed neural model PIR-TL significantly outperforms all the baselines on the three datasets in terms of HR, NDCG and MRR, and we make the following observations: (1) CoNet performs better than the shallow cross-domain models (CMF and CDCF), showing the effectiveness of deep neural approaches. (2) SCoNet beats CoNet, showing that the sparse structure on the task relationship matrices is useful. (3) Compared to the non-transfer NCF, PIR-TL achieves much higher recommendation accuracy, showing the benefits of preference transfer. (4) SCoNet falls behind PIR-TL because SCoNet gives limited consideration to a POI’s spatial preference.

Table 2 Comparison results of different methods on three datasets

4.3 Model analysis and discussion

In this section, we take an in-depth model analysis to further understand our model PIR-TL.

Impact of embedding size

Figure 3a presents the performance of our model PIR-TL with different embedding sizes. From the result, we observe that the performance of our model increases as the embedding size d grows, then slightly decreases when d is larger than 16. This may be because a relatively small embedding size limits the model’s ability to capture user preferences for POIs, while the model may suffer from overfitting when d exceeds a threshold. Beyond this point, increasing the embedding size does little to improve model performance but increases the time cost of model training. Thus, to achieve the best trade-off between effectiveness and efficiency of model training, we set the embedding size d = 16 on the three datasets.

Fig. 3 The impacts of embedding size and loss and performance

Impact of different factors

We compare our model with three variants to explore the benefits of POI general attributes (e.g. POI category, name), POI time and POI location, respectively. PIR-TL-S1 is the first simplified version, where we remove the POI’s category embedding and name embedding to eliminate the impact of POI general attributes. PIR-TL-S2 is the second variant, where we remove the POI’s time embedding to neglect the impact of the POI’s time. PIR-TL-S3 ignores the location of POIs by removing the POI’s location embedding.

The comparison results of PIR-TL and its variants are shown in Table 3. From the results, we first observe that PIR-TL consistently outperforms PIR-TL-S1, PIR-TL-S2 and PIR-TL-S3, indicating that PIR-TL benefits from simultaneously considering the three factors. Second, we find that the contribution of each factor to improving the NDCG test performance (HR and MRR have similar trends) can be ranked as follows: POI time > POI general attributes > POI location.

Table 3 Performance comparison of PIR-TL and its variants in terms of NDCG

Impact of loss and performance

We analyze how the optimization performance of our model varies with training epochs. Results are shown on the Douban-Movie-Event dataset only, because the trends on the other two datasets are similar. Figure 3b shows the training loss (averaged/normalized over all training examples) and the NDCG performance on the test set (HR and MRR have similar trends) at each optimization iteration. We can see that with more iterations, the training loss gradually decreases and the recommendation performance improves accordingly. The most effective updates occur in the first 12 iterations, and performance gradually improves until 22 iterations. With more iterations, the model is relatively stable.

5 Related works

In this section, we discuss related work from two aspects: (1) POI recommendation; and (2) transfer learning.

POI recommendation

Different from traditional recommendation (e.g. music recommendation, movie recommendation), POI recommendation is characterized by geographic information and the lack of explicit rating information [29]. Moreover, additional information, such as temporal information [33], social influence [8, 9] and review information [4], has been leveraged for POI recommendation. [29] integrated the social influence with a user-based collaborative filtering model and modeled the geographical influence by a Bayesian model. [33] utilized the temporal preference to enhance the efficiency and effectiveness of the solution.

Recently, some works have turned to jointly analyzing the effects of the above factors to address data sparsity, cold start and spatiotemporal context-aware recommendation [30, 32]. For example, [32] proposed a probabilistic generative model for jointly modeling geographical influence, temporal cyclic effect and semantic effect.

In addition, there are existing studies in related directions, such as integrating spatial trajectory data [3, 17, 19, 20, 21, 23] with text data, using POI data and geo-tagged social media data to discover significant locations/regions [2, 18, 24], and route recommendation/planning [18, 22].

Transfer learning

Transfer learning aims at improving the performance of the target domain by exploiting knowledge from source domains [11, 14]. To solve the cold-start problem in item recommendation, cross-domain recommendation has been proposed, either learning shallow embeddings with factorization machines [12, 25] or learning deep embeddings with neural networks [7]. When learning shallow embeddings, CMF [25] jointly factorizes the user-item interaction matrices from different domains. To model the domain information explicitly, CDCF [12] was designed; it either factorizes the user-item-domain triadic relation or models the source domain information as context information of users. When learning deep embeddings of users and items, CoNet [7] uses cross connections across different networks, where shared mapping matrices are introduced to transfer the knowledge.

6 Conclusions

In this paper, we propose a POI recommendation model which benefits from the strengths of both spatial preference modelling and cross-domain knowledge transfer based preference modelling. In particular, a transfer learning based model is designed to capture useful preference information from other domains, so that the issue of data sparsity can be alleviated. Besides, we use a multi-layer neural network to capture the user’s spatiotemporal preference. Moreover, we design a method to effectively balance users’ general preferences in different domains and their spatiotemporal preferences, which can achieve accurate POI recommendation. We conduct extensive experiments on three real-life datasets, and the experimental results show the superiority of our proposed model in POI recommendation.