
1 Introduction

Collaborative filtering (CF) leverages the historical interactions between users and items to predict their preferences on a new set of items [12]. It provides recommendations by modeling users’ preferences on items based on their historical user-item interactions (e.g., explicit five-star ratings and implicit 0–1 feedback on items) [5]. However, directly collecting and modeling the original data might reveal personally sensitive information and raise privacy concerns, since a neural model generally overfits to specific training examples in the sense that some of these examples are implicitly memorized.

Recently, much research has focused on DP-based recommendation approaches. Differential privacy (DP) has emerged as a strong privacy notion with a provable privacy guarantee. McSherry et al. [8] applied differential privacy to recommender systems for the first time, and Nguyên et al. [9] applied local differential privacy (LDP) to help users hide their own information even from first-party services during the data collection process. However, existing DP-based models generally focus on training data privacy but overlook data privacy in practice, i.e., when the trained model is applied (during testing). A DP-based CF model needs to be retrained when it is applied to new users that are not in the training data, which is computationally expensive. Intuitively, an LDP-based method can protect user privacy during testing by applying randomized response on the client side. However, LDP only guarantees the accuracy of aggregate statistics while avoiding individual record disclosure, so a single input from a client is often flooded with too much random noise, which reduces recommendation utility.

In this work, we propose PrivRec, a user-centric differentially private collaborative filtering approach that preserves recommendation utility. PrivRec leverages a user-centric privacy-enhancing algorithm to model users privately on the client side, which protects the privacy of both the training data and the data used during testing. Specifically, an LSH-based user data modeling process is applied to generate user representations with intuitive privacy, and the Laplace mechanism is leveraged to provide a theoretical privacy guarantee. Moreover, a knowledge distillation architecture is applied for a further privacy guarantee, as the released model never has access to the original sensitive user information in the training data. Our contributions can be summarized as follows:

  • We propose PrivRec, a user-centric differentially private collaborative filtering approach that protects user privacy, while retaining recommendation utility.

  • We address the challenge of privacy-enhanced, utility-preserving user modeling on the client side with a locality sensitive hashing-based user representation algorithm that applies the Laplace mechanism.

  • We prevent privacy disclosure through potential model overfitting by introducing a knowledge distillation architecture. The released student model is trained without any access to the original sensitive data.

  • Experimental results on two real-world datasets demonstrate that PrivRec outperforms other neural collaborative filtering-based methods in retaining recommendation utility under privacy guarantees.

2 Preliminaries

2.1 Differential Privacy

Differential privacy (DP) [3] has become the de facto standard for privacy preservation. Local differential privacy (LDP) is a special case of DP in which the random perturbation is performed by users on the client side.

Definition 1

A randomized mechanism \(\mathcal {M}\) satisfies \(\epsilon \)-differential privacy (\(\epsilon \)-DP) if, for any adjacent datasets d and \(d'\) differing in only one record and for any subset of outputs \(S\subseteq R\),

$$\begin{aligned} \Pr [\mathcal {M}(d)\in S] \le e^\epsilon \cdot \Pr [\mathcal {M}(d')\in S], \end{aligned}$$
(1)

where \(\Pr \) denotes probability and \(\epsilon \) is a positive privacy budget. Lower values of \(\epsilon \) indicate a higher degree of privacy.

Laplace Mechanism. The Laplace mechanism is the most commonly used tool in differential privacy and has been applied in a number of works on differential privacy analysis [8, 15].

Definition 2

For a real-valued query function \(q: \mathcal {D}^n \rightarrow R\) with sensitivity \(\varDelta \), the output of the Laplace mechanism is

$$\begin{aligned} \mathcal {M}(d) = q(d)+Lap(\frac{\varDelta }{\epsilon }), \end{aligned}$$
(2)

where \(Lap(\lambda )\) is a random variable drawn from the Laplace distribution with scale \(\lambda =\frac{\varDelta }{\epsilon }\), whose probability density function is

$$\begin{aligned} Lap(x) = \frac{1}{2\lambda }e^{-\frac{|x|}{\lambda }}, \forall x\in R. \end{aligned}$$
(3)
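As a brief illustration of Eq. (2), the following Python sketch (function and variable names are ours, not from the paper) adds Laplace noise with scale \(\varDelta /\epsilon \) to a query result:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release a real-valued query result under epsilon-DP (Eq. 2)."""
    scale = sensitivity / epsilon           # noise scale Delta / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: a counting query (sensitivity 1) released with epsilon = 0.5.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)
```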

2.2 Locality Sensitive Hashing

Locality Sensitive Hashing (LSH) [4] is an effective approach for approximate nearest neighbor search and has been applied in many privacy-preserving tasks [14]. The main idea of LSH is to find a hashing function, or a family of hashing functions, such that a hash collision occurs on similar data with higher probability than on dissimilar data, i.e., it is likely that (i) two neighboring points are still neighbors after hashing, and (ii) two non-neighboring points are still not neighbors after hashing. For data in a domain S with distance measure D, an LSH family is defined as:

Definition 3

A family of hashing functions \(\mathcal {H}=\{h:S\rightarrow U\}\) is called \((d_1,d_2,p_1,p_2)\)-sensitive, if for any \(x,y\in S\),

$$\begin{aligned} \text {If } D(x,y)\le d_1, \text {then } \Pr [h(x)=h(y)]\ge p_1, \end{aligned}$$
(4)
$$\begin{aligned} \text {If } D(x,y)\ge d_2, \text {then } \Pr [h(x)=h(y)]\le p_2. \end{aligned}$$
(5)

MinHash. Minwise hashing [2] is the LSH family for resemblance (Jaccard) similarity, which is usually leveraged to measure text similarity. MinHash applies a random permutation (i.e., a hashing function) \(\pi :\varOmega \rightarrow \varOmega \) to a given set S and stores only the minimum value of the permutation mapping. Specifically, a permutation mapping randomly shuffles all possible items of the input domain and returns the corresponding permuted indices of the input items. Formally, the result of MinHash (namely the signature) is defined as,

$$\begin{aligned} h^{min}_\pi (S)=\min (\pi (S)). \end{aligned}$$
(6)

Given sets \(S_1\) and \(S_2\), the probability that the two sets have the same signature is

$$\begin{aligned} \Pr (h^{min}_\pi (S_1)=h^{min}_\pi (S_2))=\frac{|S_1\bigcap S_2|}{|S_1\bigcup S_2|}. \end{aligned}$$
(7)

By applying multiple independent MinHash permutations, the fraction of matching signatures is an unbiased estimate of the Jaccard similarity between the two sets [13].
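As an illustration (a minimal sketch with our own function names; permutations are represented explicitly as index arrays), the signatures of Eq. (6) and the Jaccard estimate of Eq. (7) can be computed as follows:

```python
import numpy as np

def minhash_signature(item_set, permutations):
    """MinHash signature (Eq. 6): the minimum permuted index for each permutation."""
    return np.array([min(perm[i] for i in item_set) for perm in permutations])

def estimate_jaccard(sig1, sig2):
    """Fraction of matching entries, an unbiased estimate of Jaccard similarity (Eq. 7)."""
    return np.mean(sig1 == sig2)

rng = np.random.default_rng(0)
n_items, k = 1000, 64                              # universe size and number of permutations (illustrative)
perms = [rng.permutation(n_items) for _ in range(k)]
s1, s2 = {1, 5, 9, 42}, {5, 9, 42, 77}
print(estimate_jaccard(minhash_signature(s1, perms), minhash_signature(s2, perms)))
```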

2.3 Knowledge Distillation

Knowledge Distillation (KD) was introduced by Hinton et al. [6] to transfer “knowledge” from one machine learning model (the teacher) to another (the student). The main idea of KD is that the student model learns the distilled information directly from the teacher model, in order to reduce the number of parameters while retaining performance. Specifically, the teacher model is trained on the original training data with ground-truth labels, and the student model is then fed the same input but uses the outputs of hidden layers in the teacher model as its targets.
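A minimal sketch of this feature-based distillation objective in PyTorch (the function name and the choice of a mean squared error loss are our assumptions, not details from the paper):

```python
import torch.nn.functional as F

def kd_feature_loss(student_hidden, teacher_hidden):
    # Train the student to reproduce the teacher's hidden-layer output;
    # detach so no gradients flow back into the (fixed) teacher.
    return F.mse_loss(student_hidden, teacher_hidden.detach())
```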

3 Threat Model

We address two major privacy threats: training data privacy and privacy in practice. Training data privacy is the common concern, since identity-revealing user representations are stored by first-party service providers or memorized through the potential overfitting of machine learning models; a data breach therefore threatens the training data. Privacy in practice, i.e., when a trained machine learning model is applied, is often overlooked. Consider a client-server scenario: a client-side user faces one of the following problems: (i) if the user is an existing user whose historical data are in the training set, the user is expected to send an identifier (e.g., user ID, cookies) so that the server can retrieve the corresponding representations, which reveals the user's identity; (ii) if the user is a new user, it either cannot get personalized recommendations, since it has not been modeled, or has to send its historical data for the server to infer its preferences, which brings the threat of privacy disclosure.

4 Design of PrivRec

4.1 Framework

Our approach follows the teacher-student knowledge distillation (KD) structure. We illustrate the framework of PrivRec in Fig. 1. It consists of two parts: the teacher model and the student model. We leverage a multi-layer perceptron (MLP) to perform neural collaborative filtering (NCF) by predicting the interaction between a user and an item; this is the prototype of many recent CF-based neural recommender systems [1, 5, 7, 11]. Next, we present the specifics of PrivRec.

Fig. 1. The general framework of our approach. The teacher-only modules are replaced with the student-only modules in the student model.

4.2 Components

Teacher. The teacher model is a plain NCF model trained on the training data without any privacy constraints. Considering a user u and an item v, the network structure is as follows. The model takes the identifiers u and v as input and embeds them into \(d_u\)- and \(d_v\)-dimensional latent spaces with embedding layers \(emb_u:u\rightarrow p_u\) and \(emb_v:v\rightarrow p_v\). Then, \(p_u\) and \(p_v\) are projected to matrix factorization spaces of the same dimension \(d_m\), denoted as \(m_u: p_u\rightarrow f_u\) and \(m_v:p_v\rightarrow f_v\). The calculation layer concatenates the summed dot-product of the factors \(dot(f_u,f_v)\) with the outputs of two bias layers \(bias:f\rightarrow b\), one for each factor. Finally, the result of the calculation layer is passed to a regression layer, a fully-connected layer with sigmoid activation, for the final prediction \(\sigma :l_{u,v}\rightarrow \hat{r}_{u,v}\). Overall, the teacher learns latent representations of users and items for the prediction. The teacher model is not published, in order to protect the privacy of the training data. Formally, we have:

$$\begin{aligned} f_u=m_u(emb_u(u)),\quad f_v=m_v(emb_v(v)), \end{aligned}$$
(8)
$$\begin{aligned} \hat{r}_{u,v}=\sigma (calc(f_u,f_v))=\sigma (cat(dot(f_u,f_v), bias(f_u), bias(f_v))). \end{aligned}$$
(9)
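To make Eqs. (8)-(9) concrete, the following is a minimal PyTorch sketch of the teacher (class and variable names, layer dimensions, and the use of linear layers for the bias terms are our assumptions):

```python
import torch
import torch.nn as nn

class TeacherNCF(nn.Module):
    """Teacher sketch following Eqs. (8)-(9); trained on raw user/item IDs, never published."""

    def __init__(self, n_users, n_items, d_u=64, d_v=64, d_m=32):
        super().__init__()
        self.emb_u = nn.Embedding(n_users, d_u)   # emb_u: u -> p_u
        self.emb_v = nn.Embedding(n_items, d_v)   # emb_v: v -> p_v
        self.m_u = nn.Linear(d_u, d_m)            # m_u: p_u -> f_u
        self.m_v = nn.Linear(d_v, d_m)            # m_v: p_v -> f_v
        self.bias_u = nn.Linear(d_m, 1)           # bias: f_u -> b_u
        self.bias_v = nn.Linear(d_m, 1)           # bias: f_v -> b_v
        self.regress = nn.Linear(3, 1)            # regression over [dot, b_u, b_v]

    def forward(self, u, v):
        f_u = self.m_u(self.emb_u(u))
        f_v = self.m_v(self.emb_v(v))
        dot = (f_u * f_v).sum(dim=-1, keepdim=True)                   # summed dot-product dot(f_u, f_v)
        l_uv = torch.cat([dot, self.bias_u(f_u), self.bias_v(f_v)], dim=-1)
        r_hat = torch.sigmoid(self.regress(l_uv))                     # Eq. (9)
        return r_hat, f_u, f_v, l_uv                                  # f_u and l_uv serve as distillation targets
```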

Student. The structure of the student model is similar to the teacher's, but it takes different inputs and targets. It is this model that is published to the deployment environment in practice. We reuse the item embeddings \(p_v\) from the teacher model, since they do not reveal user-specific information. The user embedding layer is replaced by a differentially private locality sensitive hashing (LSH)-based representation procedure described in Algorithm 1. The MinHash functions are publicly available and identical for all users, so the permutations stay unchanged both in the server-side training process and in client-side use. The hashing signatures are arranged into a vector, where each entry represents one perspective of the user data. Accordingly, the user representation can be computed entirely on the client side from the publicly available LSH permutations and the user's private positive item set.

Algorithm 1. Differentially private LSH-based user representation.
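Since Algorithm 1 appears only as a figure, the sketch below reconstructs the behavior described above under our own assumptions: each MinHash signature of the user's positive item set is perturbed with Laplace noise. The function name, the per-signature sensitivity of 1, and the exact allocation of the privacy budget across signatures are assumptions rather than details taken from the paper.

```python
import numpy as np

def dp_user_representation(positive_items, permutations, epsilon, sensitivity=1.0):
    """Client-side sketch: MinHash signatures of the user's positive item set,
    perturbed with Laplace noise (the noise calibration here is an assumption)."""
    sig = np.array([min(perm[i] for i in positive_items) for perm in permutations],
                   dtype=float)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon, size=sig.shape)
    return sig + noise   # noisy signature vector sig_u, computed entirely on the client
```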

The remaining difference between the teacher model and the student model is that the student is not trained on the ground-truth labels. Instead, it learns from the output of the teacher's user factorization layer \(f_u\) and the teacher's last hidden output \(l_{u,v}\). The LSH-based user representation \(sig_u\) obtained from the above algorithm is passed through a fully-connected layer to obtain the student's user factor \(fc:sig_u\rightarrow f'_u\), and the output of the student's calculation layer is \(l'_{u,v}\).

We summarize the working process of the student model as follows: (i) it maps user u's private positive item set to the MinHash signatures \(sig_u\) with the LSH-based Algorithm 1; (ii) the signatures are fed into a fully-connected layer to get the student's user factor \(fc:sig_u\rightarrow f'_u\); (iii) the item v's factor \(f_v\) is obtained from the teacher model; (iv) the factors \(f'_u\) and \(f_v\) are passed to the calculation layer; (v) finally, the regression layer produces the final prediction.
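Putting steps (i)-(v) together, here is a minimal PyTorch sketch of the student (class and variable names are ours; dimensions are illustrative). The noisy signature is mapped to \(f'_u\) by a fully-connected layer and combined with the teacher's item factor \(f_v\); the student is then trained against the teacher's \(f_u\) and \(l_{u,v}\) (e.g., with the MSE objective sketched in Sect. 2.3) rather than against ground-truth ratings.

```python
import torch
import torch.nn as nn

class StudentNCF(nn.Module):
    """Student head: fc maps the noisy LSH signature to f'_u; the calculation and
    regression layers mirror the teacher's structure."""

    def __init__(self, k_signatures, d_m=32):
        super().__init__()
        self.fc = nn.Linear(k_signatures, d_m)   # fc: sig_u -> f'_u
        self.bias_u = nn.Linear(d_m, 1)
        self.bias_v = nn.Linear(d_m, 1)
        self.regress = nn.Linear(3, 1)

    def forward(self, sig_u, f_v):
        # sig_u: noisy client-side signature (step i); f_v: item factor from the teacher (step iii).
        f_u_prime = self.fc(sig_u)                                     # step (ii)
        dot = (f_u_prime * f_v).sum(dim=-1, keepdim=True)              # step (iv)
        l_uv_prime = torch.cat([dot, self.bias_u(f_u_prime), self.bias_v(f_v)], dim=-1)
        r_hat = torch.sigmoid(self.regress(l_uv_prime))                # step (v)
        return r_hat, f_u_prime, l_uv_prime
```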

Detailed privacy analysis of PrivRec is omitted due to page limit. Please refer to the extended version on arXiv.

5 Experiments

5.1 Experimental Settings

Datasets. We conduct our experiments on the commonly used MovieLens 1M dataset of movie ratings and the Jester dataset of joke ratings. We filter out invalid data, including users and items with no ratings and users who give the same rating to all items.

Metric. To measure the effectiveness of knowledge distillation, recommendation utility is measured as the accuracy of rating prediction. We adopt the mean absolute error \(\text {mae}(y_{predict}, y_{truth})=E|y_{predict}-y_{truth}|\) as the metric of recommendation accuracy. A lower mae indicates a more precise rating prediction in collaborative filtering.

Baselines. We evaluate our model by comparing with several widely used baseline methods. Specifically, neural collaborative filtering models with following pretrained user representations are considered:

  • NCF: The original Neural Collaborative Filtering method that models users and items into representation vectors. It is the prototype of our teacher model without privacy constraints.

  • SVD: Singular Value Decomposition, a matrix factorization method, is one of the most popular collaborative filtering methods. It takes the user-item interaction matrix as input and returns low-dimensional representations of users.

  • LDA: Latent Dirichlet Allocation is an unsupervised algorithm originally developed for natural language processing that discovers topics from content. We employ LDA to learn a user's latent preferences as its representation.

  • LSH: Locality Sensitive Hashing is the prototype of our LSH-based user representation. It computes multiple MinHash signatures of a user's historical data as the user's representation vector.

  • DP-SVD, DP-LDA and DP-LSH: We apply the differentially private Laplace mechanism on the baselines above to explore the utility preservation of our LSH-based differentially private user representation.

5.2 Results

We compare our PrivRec with the mentioned baselines for neural collaborative filtering recommender systems. As shown in Fig. 2(a), comparing the first six methods in the legend ([DP-]SVD, [DP-]LDA, [DP-]LSH), we observe that the Laplace mechanism significantly degrades the recommendation utility of SVD and LDA, while the LSH-based method suffers a smaller trade-off when differential privacy is applied. This demonstrates that the LSH-based user representation retains more user preference information than the traditional methods after introducing the same amount of noise. According to the last three methods in the legend (DP-LSH, NCF, PrivRec), the knowledge distillation (KD) architecture of PrivRec substantially improves the recommendation utility of the DP-LSH method. PrivRec shows average recommendation utility comparable to the baselines under the same differential privacy budget, with intuitively stronger privacy from its knowledge distillation training process.

Fig. 2. Experimental results.

Further experimental results on the influence of the noise scale and the number of MinHash signatures are shown in Figs. 2(b) and (c). The influence of \(\epsilon \) is measured by averaging the results over different K, and vice versa. A larger noise scale eps corresponds to a smaller \(\epsilon \), which introduces more noise into the MinHash signatures and provides a stronger guarantee on user privacy, while slightly degrading prediction accuracy. A larger K implies more facets of hashing and more information about the user data, and thus further reduces the rating prediction error.

6 Conclusion

In this work, we focus on both training data privacy and privacy in practice during testing in neural collaborative filtering. The proposed PrivRec is an early effort to protect user privacy in practice from the client side. It models a user locally and privately with an LSH-based algorithm and the DP principle. PrivRec shows promising results with privacy guarantees on NCF. In the future, we will extend it to a wider variety of recommender systems.