Keywords

1 Introduction

The demand for Point of Interest (POI) recommendation services is proliferating. Location-based Social Networks (LBSN) providers, such as Yelp and Google Local have effectively increased their market shares. According to Yelp’s Q4’19 report, there are 36 million unique mobile app users bringing in revenues over one billion dollars in 2019. Moreover, the total number of user reviews it collected since 2004 has surpassed 205 million [19]. From the Newzoo’s Global Mobile Market Report 2020: (1), there are 3.5 billion smartphone users worldwide by the end of 2020; (2), in 2019, about 56% of the global website traffic was generated by mobile devices [13]. Figure 1 shows the primary structure and components of a typical LBSN, in which the records of check-in activities are usually used to generate recommendations.

Conventionally, the data collected is saved and stored in a central server. Such centralized recommender systems are vulnerable when facing data breach issues, and the cost or penalty is substantial. The Capital One data breach, which caused approximately 500 million dollars financial damage on top of other indirect costs [9]. The case study on this incident shows that, nowadays, companies worldwide are not yet adequately adapted to securing their cloud computing environments [14].

Fig. 1.
figure 1

Location-based social networks (LBSN).

In addition to improving compliance controls, another solution is to push the data processing task to the user end, securing their privacy by eliminating the need for a central server [3, 12, 17, 20]. However, data itself is an essential resource, and the path of giving up storing user information to avoid legal fallouts could be a blind alley. Moreover, these frameworks still require private data, such as social network data or real-time GPS locations. Fetching such data is risky, even under the presence of a privacy disclaimer. In 2019, the Federal Trade Commission issued a 5 billion dollar fine on Facebook due to its violation on consumers’ privacy [5].

In this paper, We propose a vendor-based recommendation network scheme. Instead of maintaining a central server or carrying out all computing activities on user sides, we localize the recommendation tasks on vendors for each small area. The vendor can be any Point-of-Interest, such as restaurants, gas stations, and grocery stores. Challenges arise when small business owners decide to build their recommender systems, such as cold start problem and sparsity problem. Due to the lack of correlation in the high dimensional space, predictions directly made on a sparse feedback matrix often suffer from low accuracy. We alleviated this problem by introducing ‘virtual users’, a concept elaborated in later sections that can implicitly reflect real users’ preferences.

Both recommendation providers and consumers benefit from this scheme. For service providers, their users are more likely to use the services because local businesses are more trusted than large corporations [16]. For users, their data can be analyzed more efficiently and the risk of having all data hacked at once is remarkably reduced. The customers’ chosen vendors also reflect users’ active visiting areas making the system naturally location-aware.

Our main contributions are summarized as follows:

  • We propose a localized POI recommender system framework, where the computing tasks and data storage of a central server are distributed to smaller areas.

  • We introduce the idea of “virtual users”, a concept that enables localized recommender systems to collaborate with each other resolving the sparsity problem.

  • We conduct experiments on real-world datasets, demonstrating the importance of geographical restriction and the effectiveness of our framework.

The rest of this paper is organized as follows: the background and related works are introduced in Sect. 2. The problem description and our proposed solution are discussed in Sect. 3. Next, in Sect. 4, the experiments are carried out, and the results are analyzed. Section 5 gives the conclusion and future work.

2 Background

2.1 Centralized Recommendations

The major difference between a centralized RS and a distributed or decentralized RS is how data is attained and processed. The data is stored on a single server for a centralized RS where new recommendations are generated immediately after data pre-processing. There are many ways to implement centralized POI recommendation models. To investigate the tradeoff between privacy preservation and recommendation accuracy, we selected several straightforward models to demonstrate the proposed framework. With that being said, the Recommender System Network (RSN) we propose is compatible with various methods.

For example, for a classic Matrix Factorization (MF) model, a regression technique is realized to collaboratively learn the latent factors of users and items (i.e., POIs) [15]. While users’ feedback is reflected by their ratings, the latent factors indicate each user or item’s hidden characteristics. A general MF based recommendation model can be represented by the following optimization problem as shown in Eqs. (1) and (2), where \(\boldsymbol{r}\)\(_{ui}\) denotes the known rating given by user \(\boldsymbol{u}\) to item \(\boldsymbol{i}\), and \(\boldsymbol{\hat{r}}\)\(_{ui}\) denotes the predicted rating. Vectors \(\boldsymbol{p}\)\(_{u}\) and \(\boldsymbol{q}\)\(_{i}\) represent the user and item latent factors, respectively.

$$\begin{aligned} \min \limits _{p_{u},q_{i}}\sum _{r_{ui}{\in }R_{train}}(r_{ui}-\hat{r}_{ui})^2+\lambda (\left\Vert q_{i} \right\Vert ^2+\left\Vert p_{u} \right\Vert ^2) \end{aligned}$$
(1)
$$\begin{aligned} \hat{r}_{ui}=q^T_{i} p_{u} \end{aligned}$$
(2)

In the well-known biased MF model [11, 15], the predicted rating, however, is formalized as follows:

$$\begin{aligned} \hat{r}_{ui}=\mu +b_{u}+b_{i}+q^T_{i} p_{u} \end{aligned}$$
(3)

where \(\boldsymbol{\mu }\), \(\boldsymbol{b}\)\(_{u}\), and \(\boldsymbol{b}\)\(_{i}\) represent the global mean, the user bias, and the item bias, respectively. \(\boldsymbol{R}\)\(_{train}\) is the set of observed ratings. Accordingly, the objective function is then updated as Eq. (4). The notations and the parameters will be discussed in detail in later sections.

$$\begin{aligned} \min \limits _{p_{u},q_{i}}\sum _{r_{ui}{\in }R_{train}}(r_{ui}-\hat{r}_{ui})^2+\lambda (b^2_{u}+b^2_{i}+\left\Vert q_{i} \right\Vert ^2+\left\Vert p_{u} \right\Vert ^2) \end{aligned}$$
(4)

We involve Biased MF heavily in our experiment due to its excellent combination of simplicity and reliability.

2.2 Decentralized Recommendations

To convert a centralized RS into a decentralized RS, service providers need to push the data storage and processing tasks to the users’ end. Either the user data can be distributed efficiently, or a secure protocol such as a safe peer-to-peer structure is provided to allow information exchange. In a decentralized RS, every user keeps a fraction of the training data and is responsible for generating their own recommendations locally. Some researchers managed to shift the learning process to the users’ end to resolve privacy concerns [3, 17, 18].

However, there are inevitable vulnerabilities in these models. For example, when users exchange ratings directly, a malicious user is able to gather other users’ ratings by giving positive feedback to all locations. Alternatively, when only latent factors are exchanged, a malicious user can tell that another user visited a specific place if they share a similar latent factor associated with the same location. Each of the researchers made their breakthroughs and have solved different problems, but many of them remain.

3 Model and Methodology

3.1 Preliminaries

In a centralized or traditional recommender system, suppose we use \(\boldsymbol{u}\) to denote a user (customer) and \(\boldsymbol{i}\) an item (POI), then \(\boldsymbol{U}\) and \(\boldsymbol{I}\) are the user and item sets, where we have \(\boldsymbol{u}\) \(\in \) \(\boldsymbol{U}\) and \(\boldsymbol{i}\) \(\in \) \(\boldsymbol{I}\). \(\boldsymbol{m}\) and \(\boldsymbol{n}\) represent the sizes of \(\boldsymbol{U}\) and \(\boldsymbol{I}\), respectively. A rating \(\boldsymbol{r}\)\(_{ui}\) indicates the preference of user \(\boldsymbol{u}\) over item/POI \(\boldsymbol{i}\). In our dataset, each rating \(\boldsymbol{r}\)\(_{ui}\) \(\in \) [1, 5], where 1 indicates least favored and 5 most favored. As it was introduced in the previous section, we use \(\boldsymbol{\hat{r}}\)\(_{ui}\) for predicted ratings and \(\boldsymbol{r}\)\(_{ui}\) for their observed counterparts. Aside from the objective function in Eq. (4), if we denote the rating matrix by \(\boldsymbol{R}\), then we have the following formula:

$$\begin{aligned} R_{m\times n}\approx P_{m\times k} \cdot Q^T_{n\times k} \end{aligned}$$
(5)

where \(\boldsymbol{k}\) is the number of latent factors that are retained, \(\boldsymbol{p}\)\(_{u}\) and \(\boldsymbol{q}\)\(_{i}\) are column vectors of the two matrices, respectively. For the MF models, unless specified, we use \(\boldsymbol{P}\) \(\in \) \(\mathbb {R}^{m\times k}\) to denote user latent factor matrix, and \(\boldsymbol{Q}\) \(\in \) \(\mathbb {R}^{n\times k}\) to denote the item latent factor matrix. Furthermore, we define \(\boldsymbol{T}\)\(_{r}\) as the training set and \(\boldsymbol{T}\)\(_{e}\) as the test set. Typically, all MF methods require learning user and item’s latent factors by regressing over the known user-item ratings from the pre-processed training dataset [2]. Because of this, both Eqs. (4) and (5) aim to find the optimal \(\boldsymbol{P}\) and \(\boldsymbol{Q}\) that minimizes \(\left\Vert R-P \times Q^T \right\Vert \). Finally, we denote the constant in the regularizing terms in Equations such as (1) and (4) by \(\boldsymbol{\lambda }\). Both \(\boldsymbol{k}\) and \(\boldsymbol{\lambda }\) are adjusted and tuned using cross-validation. We use Stochastic Gradient Descent (SGD) to solve the least squares optimization.

In our proposed RSN framework, we break down the centralized RS into multiple local entities. Each area of the city has an independent RS which maintains its own users, and is considered as a local group, denoted by \(\boldsymbol{g}\)\(_{i}\) (\(\boldsymbol{g}\)\(_{1}\) \(\cup \) \(\boldsymbol{g}\)\(_{2}\) \(\cdots \) \(\cup \) \(\boldsymbol{g}\)\(_{n}\) \(=\) \(\boldsymbol{U}\)). The set of all groups is represented by \(\boldsymbol{G}\). In addition to the physical user (real customers) set \(\boldsymbol{U}\), we introduce a virtual user (generated fake customers) set \(\boldsymbol{V}\). If we define the matrix that stores the virtual users’ ratings as \(\boldsymbol{R}\)\(_{v}\) and the real users \(\boldsymbol{R}\)\(_{r}\), then we have:

$$\begin{aligned} R_{train} = \genfrac[]{0.0pt}0{R_{r}}{R_{v}} \end{aligned}$$
(6)

In practice, users are encouraged to choose a nearby vendor they trust to receive recommendations. In order to simulate the real scenario in the experiment, users in the dataset are clustered beforehand. The clustering is based on the Pearson Correlation Coefficient (PCC) of users’ ratings. Specifically, for any pair of users \(\boldsymbol{a}\) and \(\boldsymbol{b}\), the similarity between the two is defined as:

$$\begin{aligned} S_{ab\_PCC} = \frac{\sum _{i{\in }I_{ab}}(r_{ai}-\mu _{a})\cdot (r_{bi}-\mu _{b})}{\sqrt{\sum _{i{\in }I_{ab}}(r_{ai}-\mu _{a})^2}\cdot \sqrt{\sum _{i{\in }I_{ab}}(r_{bi}-\mu _{b})^2}} \end{aligned}$$
(7)

where \(\boldsymbol{\mu }\)\(_{a}\) and \(\boldsymbol{\mu }\)\(_{b}\) are the average ratings of users \(\boldsymbol{a}\) and \(\boldsymbol{b}\), and \(\boldsymbol{I}\)\(_{ab}\) is the item set that \(\boldsymbol{a}\) and \(\boldsymbol{b}\) both rated. After the affinity matrix is constructed, we then perform the kernel k-means clustering to minimize their in-cluster variance. The PCC is chosen since it has the best performance with respect to mean absolute error (MAE) in neighborhood based RS models [4]. Once all the users are clustered, the cluster centroids are treated as virtual users and sent to all the other RSs.

3.2 Problem Description

We chose two separate cities and their nearby districts to evaluate our model. Most users are only active in a particular town and remain in specific places. This phenomenon is usually referred to as “location aggregation” [3]. For example, Fig. 2 shows the points are aggregated where each point is a visiting record. The x-axis and y-axis represent the user and POI IDs, respectively.

Fig. 2.
figure 2

Yelp dataset user visiting behavior analysis. Each point in the figure represents a check-in record.

Users select local businesses they trust to share their data before getting recommendation services. Accordingly, we split all the users into different groups to simulate real user activities. This step in simulation is not required in practice since users choose their trusted vendors spontaneously. The paper estimates a user’s active location (i.e., the latitude and longitude) by their previously visited POIs (Eq. 8 and 9):

$$\begin{aligned} Lat_{u}= \frac{1}{\left| I_{u}\right| }\sum _{i\in I_{u}}{Lat_{i}} \end{aligned}$$
(8)
$$\begin{aligned} Lon_{u}= \frac{1}{\left| I_{u}\right| }\sum _{i\in I_{u}}{Lon_{i}} \end{aligned}$$
(9)

where \(\boldsymbol{I}\)\(_{u}\) denotes all the POIs a user visited.

However, the side effect of such action is what it makes the already sparse POI rating matrix even sparser. To increase the number of ratings that can be used to train each model, we provide each local RS with virtual users’ ratings. A local RS generates a certain number of virtual users by clustering the existing users. We denote the cluster as \(\boldsymbol{c}\) and the user set of that cluster as \(\boldsymbol{U}\)\(_{c}\).

$$\begin{aligned} r_{vi}= \frac{\sum _{i \in I_{uv},u \in U_{c}} r_{ui}}{\left| U_{c}\right| } \end{aligned}$$
(10)

Equation 10 shows how a virtual user rating is estimated. For a virtual user \(\boldsymbol{v}\), its rating toward location \(\boldsymbol{i}\) is approximated by computing the mean value of the ratings left by users in the same cluster (\(\boldsymbol{u}\) \(\in \) \(\boldsymbol{U}\)\(_{c}\)) who visited the same location (\(\boldsymbol{i}\) \(\in \) \(\boldsymbol{I}\)\(_{uv}\)). The virtual users, whose ratings are shared by all RSs in a RSN, summarize the preferences of physical users.

3.3 Generating Recommendations

When acquiring recommendations, a user downloads the two factorized matrices \(\boldsymbol{P}\) and \(\boldsymbol{Q}\) as defined in Eq. (5), from the local recommender system. This way, the dimensions of the original rating matrix are indirectly reduced, decreasing the download time. The original ratings are also slightly perturbed, making it harder to backtrace the user’s checking-in records. The user then reconstructs the rating matrix \(\boldsymbol{\hat{R}}\) on his or her personal device. By searching and finding the most similar user, according to (7), the user can acquire all the predicted ratings for any unvisited POIs. Users can choose whether to share the personal information they hold.

The information downloaded is not only anonymous but also a combination of real users and virtual users. The reconstructed matrix \(\boldsymbol{\hat{R}}\) is different from the rating matrix stored on the local server since the number of virtual users is dynamic, and all ratings are perturbed.

3.4 The Algorithm

Since the simulation of users choosing trustworthy vendors and generating virtual users play an essential role in our experiment, we organize the work and show it in Algorithm 1.

figure a

Each vendor-based RS possesses a training set \(\boldsymbol{Tr}\)\(_{g}\) in practice, learns the factorized matrices according to (4) and (5), and then make the matrices \(\boldsymbol{P}\) and \(\boldsymbol{Q}\) ready for download for its users. The final recommendations are generated on every user’s personal device which further decreases the workload for each vendor-based RS.

3.5 System Update and Maintenance

While the data is static in our simulation, the real-world users continuously move and change their active visiting areas. Furthermore, for privacy concerns, there should not exist a link between cluster centroids and physical users, which prevents the updating process from using the same users. Therefore, each time a clustering is completed, its components, centroids, and the number of clusters will be different from the previous one. To make the cluster centroids better represent real users’ personal preferences, two mechanisms are implemented:

  • The clustering needs to be regularly performed to generate new virtual users.

  • The old virtual users need to be turned inactive after the clustering becomes obsolete.

In real-world scenarios, each virtual user is attached with a timestamp. Once it reduces to zero, the virtual user expires and is then removed. When a virtual user is created, the local RS will broadcast it to all the RSs in the same network. The recipients will decide if the information is useful, depending on the overlap between virtual users’ visited locations and the item set of the current RS.

4 Experiments

4.1 Datasets

Fig. 3.
figure 3

The user visiting location plots (Las Vegas). Each point represents a user’s visiting at real-world geographical position.

We use two subsets of the Yelp business review dataset [1]. The first set was collected in the Urbana-Champaign area, and the second was from the city of Las Vegas and its surrounding areas. The ratings’ type is explicit rating (from 1 star to 5 stars) collected by Yelp between January 2007 and December 2017. We removed the users with too few ratings and repeated ratings.

In our test, all local RSs are in the same city or metropolitan area. However, in real-world scenario, if the RSs share any identical items or have overlapped item sets, communication can be established, and RSs in the same RNS can then enhance each other.

During pre-processing, we adopted multiple ways to test the appropriate number of RSs in a network. As discussed in previous sections, we need to group the users based on their visiting locations to simulate the real-world scenario. Figure 3 shows the users’ visited POIs, which almost reflects the streets’ shape in Las Vegas and nearby areas. However, when we attempt to guess users’ real locations by plotting their Euclidean centers of all the visited places, the results did not illustrate apparent segregation. Clustering methods, including k-means, spectral, and density-based spatial, were all tested, and eventually, we chose k-means for its simplicity and straightforwardness. When comparing different clustering methods, we evaluate both the results of accuracy and the balance of user numbers among each area.

Table 1. Datasets statistics

After pre-processing, the details of the datasets are listed in Table 1. We sort all the ratings in chronological order and split them by the ratio of 0.2 with the first 80% of ratings for training and 20% for testing.

4.2 Evaluation Metrics

We adopted two metrics to evaluate the model performance, the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Although accuracy is not always the best metric to evaluate recommender systems [10], minor accuracy improvements, measured by RMSE or MAE, can still pose significant impacts on the quality of top-k recommendations [8, 15]. Equations (10) and (11) show the details of the definition:

$$\begin{aligned} RMSE= \sqrt{\frac{\sum _{u,i\in Te}(r_{ui}-\hat{r}_{ui})^2}{m}} \end{aligned}$$
(11)
$$\begin{aligned} MAE=\frac{\sum _{u,i\in Te}\left| r_{ui}-\hat{r}_{ui}\right| }{m} \end{aligned}$$
(12)

4.3 Results and Discussion

In our experiment, we compare our results with three existing models:

  • The MF model, promoted by one of the Netflix winners Simon Funk [6]. We chose the one that has integrated the baseline model proposed in [15]. The two latent matrices in (5) are factorized and learned using the objective function in (4). We used a well-built version to represent a typical centralized RS [7].

  • The DMF model, a decentralized scheme that only allows users to exchange gradient loss among neighbors during training [3]. Like most decentralized RSs, there is no data stored on the server. All personal information is kept on users’ devices.

  • The baseline model, of which the prediction function is defined by Eq. 3 but without the last term. A predicted rating is merely calculated by adding the global mean, column bias, and row bias, and there are no iterative updates involved.

We opt for straightforward recommendation methods over complex models. In our experiment, the core model can be replaced or combined with other schemes. Each local RS can use different algorithms to generate recommendations.

Fig. 4.
figure 4

Local RSs accuracy results (Urbana-Champaign). Four models’ RMSE results for each local RS in Urbana-Champaign area.

Fig. 5.
figure 5

Local RSs accuracy results (Las Vegas). Four models’ RMSE results for each local RS in Las Vegas area.

In the Urbana-Champaign dataset, users were divided into four sections based on their frequently visited locations. In contrast, the Las Vegas metropolitan area has been categorized into ten smaller regions based on the same criterion. The results of every local RS are shown in Fig. 4 and Fig. 5. The different number of groups is due to the different number of users and the cities’ scale. On one hand, if we keep the user size too small for an area, the number of users that can be clustered would be too small, leading to insufficient clusters. On the other hand, if this size is too large, each cluster will have too many users, causing the centroids to be too general to reflect physical users’ preferences and interests.

In Fig. 4 and Fig. 5, for local RSs in RSN, their performance oscillates up and down on the curves formed by centralized RSs. In most cases, their accuracy is only slightly better than each local RS in an RSN. Occasionally, for some specific areas, such as areas 2 and 3 in Fig. 4 or areas 4 and 5 in Fig. 5, RSs in RSN produced higher accuracy than the centralized RSs. In practice, each local RS in an RSN can virtually work with any model and does not have to uniformly use the same method, so theoretically our proposed model has the potential to outperform a centralized RS. The reason is similar to why a hybrid model performs consistently better than a pure model.

One thing to point out is the way we calculate the average RMSE and MAE. We assume that each region has a local RS to generate its recommendations using real ratings and virtual ratings. It is necessary to estimate the performance of the RSN using all local RSs’ average MSE and RMSE. For MAE, the average is the exact mean value of all the MAEs from every local RS. For the RMSE, however, the average RMSE is estimated by calculating the MSE first, and then compute the average RMSE by taking the square root of the mean value of MSE.

As far as hyper-parameters, in Fig. 5, where every RS in the RSN uses biased MF as the default model, the number of latent factors \(\boldsymbol{k}\)(40) and learning rate \(\boldsymbol{\lambda }\)(0.1) were the same as the centralized MF model. In the DMF model, we used the same value for \(\boldsymbol{k}\) and set the regularizer to 0.01 and the learning rates to 0.05. We probed each model with \(\boldsymbol{k}\) \(\in \) \(\{5, 40\}\), the learning rate \(\boldsymbol{\lambda }\) \(\in \) \(\{0.01, 0.5\}\), and the regularizer between \(\{0.001, 10\}\).

Table 2. Datasets statistics
Fig. 6.
figure 6

Rating distributions. (Red: Urbana-Champaign, Blue: Las Vegas). The ratings are ordered from old to new. (Color figure online)

Table 2 shows the results for different models. It is apparent that all MF models performed better on the Urbana-Champaign dataset. There could be two reasons. First, the Urbana-Champaign dataset is more compact, meaning a small area with relatively sufficient users and POIs to analyze their preferences. Although the Las Vegas dataset is from a densely populated city, it is still too sparse geographically. In fact, this dataset includes visiting records from the city of Las Vegas, North Las Vegas, Spring Valley, Paradise, and all small towns nearby. Second, as shown in Fig. 6, the percentage of new businesses in the Las Vegas area is higher than that in Urbana-Champaign. As mentioned previously, we ordered the ratings chronologically, enabling the models to use old data to predict new ratings for simulating real-world scenarios.

In our first dataset, compared to the centralized Biased MF model, the accuracy tradeoff for our RSN model is very small if not trivial (as low as 0.0175 in RMSE and 0.0261 in MAE). The tradeoff is more significant when the recommendation method is changed from Biased MF to the Baseline, but it is still smaller than 0.1. This test result is under the circumstances that all local RSs in our RSN uniformly use the same method and much fewer ratings (1/4 of the total ratings in Urbana-Champaign dataset, 1/10 of the total ratings in Las Vegas dataset). With different recommendation methods implemented, the RSN can achieve better performance. Since each local RS maintains a much smaller set of users to generate recommendations, it reduces the training time. Also, with the help of “virtual users,” it scored similar accuracy as a centralized RS. In contrast to a completely decentralized RS such as DMF, the RSN sacrifices much less accuracy for privacy preservation.

In this experiment, we estimated users’ preferred vendors who hold their personal history using the locations of their most frequently visited stores. This although is not the most accurate way, is the best option in our assessment due to the limited information. We are positive that in practice, the overall performance of the proposed RSN framework will be better. As a summary, the RSN can lower the privacy risk and boost user confidence with minimal loss of prediction accuracy. Each local RS has a light workload and fast speed of convergence. Moreover, should there be any data breach, it is easier to investigate and control the damage.

5 Conclusion and Future Work

Conventional centralized RSs face high risks in protecting users’ privacy because they carry all personal information in the same system. On the other hand, even though decentralized models can eliminate the risks of data breaches, they almost give up all further opportunities for data analysis or mining.

In this paper, we proposed a recommender system network scheme that takes advantage of centralized RSs and decentralized RSs. By introducing virtual users, we can lower the data leakage risk while still generate accurate recommendations. The scheme focuses on both the collaboration among users and among the RSs hosted by small local businesses that people trust.

Future work includes integrating a distributed neural network into our RSN, RS-to-RS communication, and exploring other ways to create virtual or synthetic users, i.e., users who are not physical but can reflect real users’ interests and preferences