Keywords

1 Introduction

Today, modern mobile phones have the capability to detect proximity of other users and offer means to communicate and share data in ad-hoc way, with the people in the proximity, which naturally integrates those two trends: wireless opportunistic networking and decentralized online social networks, and leads to the great development and deployment of D2D based MSNP (Mobile social network in proximity), which is explicitly defined as: A wireless peer-to-peer (P2P) networking of spontaneously and opportunistically connected users (e.g., through the Bluetooth/WiFi interfaces on their mobile devices), exploits both geo-proximity and social interests as the primary filters in determining who is discoverable on the social network [1]. In contrast to traditional web-based online social networking, D2D based MSNP can enable more tangible face-to-face social interactions in public places such as parks, stadiums, and train stations, etc.

In MSNPs, individuals can maintain and store their sensitive data by themselves, which can alleviate the problem of big brother (privacy concern) in traditional MSN. This implies that the omniscient OSN provider that has become “a big brother”, collects and stores all user’s data (messages, profiles, location, relations, etc.), which may cause serious privacy concern, e.g., selling users’ personal information, and targeted advertising, However, MSNP users still face growing privacy concerns.

Basically, the first step toward effective D2D based MSNP is for mobile users to choose whom to interact with. As an example, Alice wants to conduct a proximal talk with nearby passengers at the airport. Since she can simultaneously interact with only one or a few persons, it is crucial for her to select those who can lead to the most meaningful social interactions: The natural way is to select those whose social profiles most match hers. Widely known as profile matching, this method is rooted in the social fact that people normally prefer to socialize with others having similar interests or background over complete strangers.

A major challenge for profile matching is to ensure the privacy of personal profiles which often contain highly sensitive information related to gender, interests, political tendency, health conditions, and so on. This challenge necessitates private matching, in which two users compare personal profiles without disclosing them to each other. Generally, there are two mainstreams of approaches to solving the privacy-preserving profile-based friend matching problem. The first category is converted into Private Set Intersection or Private Set Intersection Cardinality, whereby two mutually mistrusting parties, each holding a private data set, jointly compute the intersection, or the intersection cardinality of the two sets without leaking any additional information to either party. These schemes could enable only coarse-grained private matching and are unable to further differentiate users with the same attribute(s). To solve this problem and thus further enhance the usability of MSNP, the second category includes fine-grained private matching mechanisms, which consider a user’s profile as a vector with fine-grained attribute values, and measures the social similarity by private vector dot product [2].

Although, both kinds of approaches could effectively enforce privacy-preserving profile-matching among nearby users without the support of the trusted third party, they have the following disadvantage: Always rely on public-key cryptosystem and homomorphic encryption [36]. Usually, multiple rounds of interactions are required to perform the public key exchange and private matching between each pair of parties, which incurs high communication and computation costs to resource-limited mobile terminals in MSNP.

Based on non-homomorphic encryption-based privacy-preserving scalar product computation [7], an EWPM (Efficient Weight-based Private Matching) protocol was proposed to employ Confusion Matrix Transformation algorithm instead of computation-consuming homomorphic cryptographic system, to achieve the privacy preserving goal with a higher efficiency [8]. The main weakpoint in EWPM is that the inferred matching value doesn’t have strict semantic meaning, and can only roughly represents the profile similarity among users. For example, in the following Subsect. 3.3, we give a special case, in which the obtained matching values by EWPM for two pairs of users are identical, but according to strict similarity metric (e.g., cosine similarity), those two matching values are not same.

Based on the above observation, this paper designs a Lightweighted fIne-grained Privacy-Preserving Profile matching mechanism for D2D based MSNP, LIP3, which, in comparison with the existing CMT schemes (e.g., EWPM), can provide strict and accurate profile matching value-cosine similarity result among individuals. The numerical results show that LIP3 can provide more accurate similarity measurement than EWPM, and bring no more computation and communication overheads.

The rest of this paper is organized as follows. Section 2 gives the system model of LIP3, and the adversary models dealt with in this paper. In Sect. 3, we describe the details of the proposed system, LIP3, and give an example to illustrate the advantage of LIP3 over EWPM. In Sect. 4, the security and complexity analysis are schematically provided. Finally, we briefly conclude this paper.

2 LIP3 System Model

2.1 System Architecture of LIP3

When people join MSNPs, they usually begin by creating a profile, and then interact with other users. The content of profile could be very broad, such as personal background, hobbies, contacts, places they have been to, etc. Privacy-preserving profile matching is a common and helpful way to make new friends with common interest or experience, find lost connections, or search for expert, without revealing participants’ personal and private profiles.

Specifically, each user’s interest profile is defined from a public attribute set consisting of \( n \) attributes. The number of \( n \) may range from several tens to several hundreds. Each attribute is associated with a user-specific integer value \( i \in [1,l] \) (called as the weight of an attribute) indicating the corresponding user’s association with this attribute. The higher the value of this attribute is, the more interest the user has in the attribute. Usually, letting \( l \) equal 10 may be sufficient to differentiate user’s interest level. Suppose two users Alice and Bob’s interest sets are characterized as the following profile vectors \( \vec{\varvec{u}}_{A} = \left( {u_{{A_{1} }} ,u_{{A_{2} }} , \ldots ,u_{{A_{n} }} } \right) \) and \( \vec{\varvec{u}}_{B} = \left( {u_{{B_{1} }} ,\,u_{{B_{2} }} , \cdots ,u_{{B_{n} }} } \right) \), respectively. Each individual can modify her/his profile later on when needed. The most widely applied similarity metric to infer the matching value between individuals, say Alice and Bob, is cosine similarity:

$$ similarity\,\left( {A,B} \right) = \frac{{\vec{u}_{A} \cdot \vec{u}_{B} }}{{\left\| {\vec{u}_{A} } \right\| \cdot \left\| {\vec{u}_{B} } \right\|}} = \frac{{\mathop \sum \nolimits_{i = 1}^{l} u_{{A_{i} }} \cdot u_{{B_{i} }} }}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{l} \left( {u_{{A_{i} }} } \right)^{2} } \cdot \sqrt {\mathop \sum \nolimits_{i = 1}^{l} \left( {u_{{B_{i} }} } \right)^{2} } }} $$
(1)

Assume that Alice wants to find someone to chat, e.g., when waiting for the flight to depart. As the first step (Peer discovery), she broadcasts a chatting request via the MSNP application on her smartphone to discover proximate users of the same MSNP application. Suppose that she receives multiple responses including one from Bob who may also simultaneously respond to other persons. Due to time constraints or other reasons, both Alice and Bob can only interact with one stranger whose profile best matches hers or his. The next step (Profile Matching) is thus for Alice (or Bob) to compare her (or his) profile with those of others who responded to her (or whom he responded to). LIP3 will enable two users to measure the accurate similarity value between the above fine-grained privacy-preserving personal profiles using cosine similarity metrics.

Figure 1 illustrates the system architecture of the proposed privacy-preserving profile matching scheme LIP3, which is composed of two mobile users with specific interest profiles, and several component which facilitate the similarity calculation in LIP3 scheme.

Fig. 1.
figure 1

System architecture of LIP3 scheme

The plain profile vectors in the initiator (say Alice) and responder (say Bob) are firstly transformed into corresponding attribute matrices through CMT, which can completely describe users’ profiles. Then the initiator encrypts her attribute matrix and sends it to responder. The responder Bob will calculate the multiplication between the received encrypted matrix and the attribute matrix of herself. The obtained matrix (in Level I privacy) or the scalar value (in Level II privacy) will be sent to initiator who, then decrypts and obtains the cosine similarity between initiator and responder. Note that, in our proposal, the module of responder’s profile vector should be explicitly sent to initiators.

2.2 Adversary Models

There exists attacks from outside adversaries, such as eavesdropping the wireless communication channel or modifying, replying and injecting the captured messages. We assume the users in our protocol are honest-but-curious (HBC), which means they will comply with the algorithmic procedure, but they are curious about other users and try to learn more information than allowed. Furthermore, some users may be inside attackers who monitor the matching process and obtain the intermediate results without complying the agreements. They try to infer users’ profiles through these observations. Based on the adversary models, similar as [8], the following two privacy levels are defined.

  • Level-I privacy: when LIP3 ends, both the initiator and responder learn nothing about each other’s attribute, when they are HBC.

  • Level-II privacy: when the LIP3 ends, both the initiator and responder learn nothing about each other’s attribute, even when they are inside attackers.

3 The Detailed Procedure of LIP3

In our scheme, each individual, say Alice’s profile vector is explicitly encoded into a profile matrix \( \varvec{A}_{l \times n} \) whose elements depend on the individual’s personal attributes and weights. This matrix can completely describe an user’s profile, in which the row vectors indicate the weight of interest and column vectors mean the public attribute. Specifically, if the value of the jth attribute of Alice is set as \( i \) (\( i \in [1,l] \)), then she sets \( a_{ij} = 1 \) and \( a_{mj} = 0 \) where \( a_{mj} \in \varvec{A}_{l \times n} \), \( m \ne i \).

A MSNP session involves two users and usually consists of three phases. First, two users need discover each other in the neighbor-discovery phase. Second, they need compare their personal profiles in the matching phase. Last, two matching users enter the interaction phase for real information exchange.

3.1 Preliminary LIP3 that Satisfy the Level-I Privacy

The main contribution of our paper is that LIP3 explicitly defines a weight matrix \( \varvec{W}_{l \times l} = (w_{ij} )_{l \times l} \) through which the accurate cosine similarity can be inferred, without revealing individuals’ private profiles. Specifically, the element \( w_{ij} \) is given as the Eq. (2).

$$ \left( {w_{ij} } \right)_{l \times l} = i \cdot j $$
(2)

In LIP3, the initiator (Alice) and responder (Bob) respectively hold the attribute matrices \( \varvec{A}_{l \times n} \) and \( \varvec{B}_{l \times n} \), which are transformed from the both users’ plain profile vectors. \( p \) and \( q \) are two large primes. \( \varvec{C}_{l \times n} \) and \( \varvec{R}_{l \times n} \) are two matrixes used for hiding personal information. The vector \( \overrightarrow {\varvec{k}} \) is the secret key kept by initiator to decrypt the original results. The detailed procedure of LIP3 is given as follows.

  • The initiator initializes her personal profile according to Algorithm 1, which can be run offline, and broadcasts her friend discovery request to others. When Algorithm 1 ends, the initiator keeps \( \overrightarrow {\varvec{k}} = [k_{1} ,k_{2} , \cdots ,k_{l} ] \), and \( q \) secretly and sends \( \varvec{A}_{{\varvec{l} \times \varvec{n}}}^{\varvec{*}} \) to the responder;

  • After receiving \( \varvec{A}_{{\varvec{l} \times \varvec{n}}}^{\varvec{*}} \), the responder computes \( \varvec{D}_{{\varvec{l} \times \varvec{l}}} = (d_{ij} ) _{l \times l} \) according to Algorithm 2 and sends \( \varvec{D}_{{\varvec{l} \times \varvec{l}}} \) to the initiator;

  • The initiator operates the following steps: \( \varvec{T}_{{\varvec{l} \times \varvec{l}}} = \left( {t_{ij} } \right)_{l \times l} = \left( {d_{ij} + k_{i} } \right) mod\;q \). It is shown that the above constructed equation \( \varvec{T}_{{\varvec{l} \times \varvec{l}}} = \varvec{A}_{{\varvec{l} \times \varvec{n}}} \times \varvec{B}_{{\varvec{l} \times \varvec{n}}}^{\varvec{T}} \). Moreover, let \( \varvec{T}_{{\varvec{l} \times \varvec{l}}}^{\varvec{*}} = \left( {t_{ij}^{*} } \right)_{l \times l} \), and \( t_{ij}^{*} = \frac{{t_{ij} - (t_{ij} mod\;p^{2} )}}{{p^{2} }} \);

  • The initiator considers the corresponding weights and computes:

    $$ H_{l \times l} = W_{l \times l} \cdot *T_{l \times l}^{ *} = \left( {\begin{array}{*{20}c} {w_{11} \cdot t_{11}^{ *} } & \cdots & {w_{1l} \cdot t_{1l}^{ *} } \\ \vdots & \cdots & \vdots \\ {w_{l1} \cdot t_{l1}^{ *} } & \cdots & {w_{ll} \cdot t_{ll}^{ *} } \\ \end{array} } \right) $$
    (3)
  • in which the operator \( \cdot * \) denote multiplying the corresponding elements of two matrices \( \varvec{W}_{{\varvec{l} \times \varvec{l}}} \) and \( \varvec{T}_{{\varvec{l} \times \varvec{l}}}^{\varvec{*}} \) to obtain the matrix \( \varvec{H}_{{\varvec{l} \times \varvec{l}}} \).

  • The initiator calculates the matching value \( \tau = \mathop \sum \limits_{i = 1}^{l} \mathop \sum \limits_{j = 1}^{l} h_{ij} \), which equals the value \( \overrightarrow {\varvec{u}}_{A} \cdot \overrightarrow {\varvec{u}}_{B} \), then the cosine similarity between two interacting individuals can be obtained.

The Algorithms 1 and 2 used in the LIP3 operation procedures are given as follows.

3.2 Enhanced LIP3 Satisfying Level-II Privacy

Note that the above procedures can only satisfies the privacy level I. In order to resist the malicious users to achieve the Level-II privacy, instead of directly sending the matrix \( \varvec{D}_{l \times l} \) to initiator, the responder can send the scalar value \( \sigma = \mathop \sum \limits_{i = 1}^{l} \mathop \sum \limits_{j = 1}^{l} t_{ij} \) to initiator, in which \( \left( {t_{ij} } \right)_{l \times l} = D_{l \times l} \cdot *W_{l \times l} \) is calculated based on Eq. (2). And then, on receiving the message \( \sigma \), the initiator decrypts the matching value \( \tau \) via the following operators:

$$ \tau_{1} = \left( {\sigma + l\left( {\sum\nolimits_{i = 1}^{l} {k_{i} } } \right)} \right) mod\;q;\;\tau = \overrightarrow {\varvec{u}}_{A} \cdot \overrightarrow {\varvec{u}}_{B} = \frac{{\tau_{1} - (\tau_{1} mod\;p^{2} )}}{{p^{2} }} $$

Then the cosine similarity between Alice and Bob is following:

$$ similarity\left( {A, B} \right) = \frac{\tau }{{\left\| {\overrightarrow {\varvec{u}}_{A} } \right\| \cdot \left\| {\overrightarrow {\varvec{u}}_{B} } \right\|}} $$

3.3 The Advantage of LIP3 over EWPM

We use a simple example to verify the correctness of our scheme. We assume three users Alice, Bob and Charles are within the communication range. The number of attributes n, is 3, and the maximal attribute value l, is 2. Suppose Alice is the initiator, with profile \( \overrightarrow {\varvec{u}}_{A} = (1,1,2) \), translate to matrix is \( \varvec{A}_{2 \times 3} = \left( {\begin{array}{*{20}c} 1 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} } \right) \), Bob and Charles are the responders and the profiles of Bob and Charles are \( \overrightarrow {\varvec{u}}_{B} = (1,1,1) \), matrix \( \varvec{B}_{2 \times 3} = \left( {\begin{array}{*{20}c} 1 & 1 & 1 \\ 0 & 0 & 0 \\ \end{array} } \right) \), \( \overrightarrow {\varvec{u}}_{c} = (1,2,1) \) matrix \( \varvec{C}_{2 \times 3} = \left( {\begin{array}{*{20}c} 1 & 0 & 1 \\ 0 & 1 & 0 \\ \end{array} } \right) \), respectively. Since the calculation process between Alice and Bob is similar to that of Alice and Charles, we just describe the process between Alice and Bob in detail, and give the matching value between Alice and Charles directly. Similarly as [8], we can get: \( \varvec{T}_{2 \times 2}^{ *} = \left( {\begin{array}{*{20}c} 2 & 0 \\ 1 & 0 \\ \end{array} } \right) \) which numerically equals the result as \( \varvec{A}_{2 \times 3} \times \varvec{B}_{2 \times 3}^{T} \).

Then, according to Eq. (2), we obtain \( \varvec{W}_{2 \times 2} = \left( {\begin{array}{*{20}c} 1 & 2 \\ 2 & 4 \\ \end{array} } \right) \), then \( \varvec{H}_{l \times l} = \varvec{W}_{2 \times 2} \cdot *\varvec{T}_{2 \times 2}^{ *} = \left( {\begin{array}{*{20}c} 2 & 0 \\ 2 & 0 \\ \end{array} } \right) \) ; \( \tau_{AB} = \mathop \sum \limits_{i = 1}^{l} \mathop \sum \limits_{j = 1}^{l} h_{ij} = 4 \).

Note that, interestingly, the term \( \tau \) equals the value of \( \overrightarrow {\varvec{u}}_{A} \cdot \overrightarrow {\varvec{u}}_{B} \). Therefore, the similarity value between Alice and Bob is: \( similarity (A, B) = \frac{{\tau_{AB} }}{{\left\| {\overrightarrow {\varvec{u}}_{A} } \right\| \cdot \left\| {\overrightarrow {\varvec{u}}_{B} } \right\|}} = \frac{4}{\sqrt 3 \times \sqrt 6 } = 0.943 \).

Similarly, we can get the value \( \tau_{AC} = 5 \), and the similarity value between Alice and Charles is: \( similarity (A, C) = \frac{{\tau_{AC} }}{{\left\| {\overrightarrow {\varvec{u}}_{A} } \right\| \cdot \left\| {\overrightarrow {\varvec{u}}_{C} } \right\|}} = = \frac{5}{\sqrt 6 \times \sqrt 6 } = 0.833 \).

Obviously, For initiator Alice, Bob is the better matching person than Charles.

However, using the protocol EWPM proposed in [8], We can only obtain \( S_{AB} = 3 \) (the matching value between Alice and Bob), and \( S_{AC} = 3 \) (the matching value between Alice and Charles). Those values neither have strict semantic meaning, nor distinguish whether Alice is more matching with Bob or Charles. Thus, LIP3 is obviously advantage over EWPM in terms of matching accuracy (measured with profile similarity).

4 Preliminary Performance Analysis

4.1 Security Analysis

① Schematic Proof of Privacy Level I.

Depending on secure property of the confused matrix transformation, the correctness of the LIP3 is straightforward. However, in level I privacy, through \( \varvec{D}_{{\varvec{l} \times \varvec{l}}} \), the initiator can obtain the \( \varvec{T}_{{\varvec{l} \times \varvec{l}}} \) that numerically equals \( \varvec{A}_{{\varvec{l} \times \varvec{n}}} \times \varvec{B}_{{\varvec{l} \times \varvec{n}}}^{\varvec{T}} \), and then it is possible for initiator to infer the responder’s profile matrix \( \varvec{B} \). However, as they are both HBC users, the initiator will not monitor the matching process and decrypt the intermediate results get the original results of \( \varvec{A}_{{\varvec{l} \times \varvec{n}}} \times \varvec{B}_{{\varvec{l} \times \varvec{n}}}^{\varvec{T}} \), so she learns nothing about the responder other than the matching value. The privacy of the responder can be protected too.

② Schematic Proof of Privacy Level II.

The key point of proving the privacy level II of LIP3 lies in that: In level II, the responder only sends \( \sigma \) instead of \( \varvec{D}_{{\varvec{l} \times \varvec{l}}} \) to the initiator. Even the initiator Alice has \( \overrightarrow {\varvec{k}} \) to get the original data, she has no way to learn the computation process. While the responder Bob knows the process, but he cannot obtain the \( \overrightarrow {\varvec{k}} \). In this way, the users’ privacy is protected from the internal attackers.

4.2 Complexity Analysis

Similar as EWPM [8], we can also use the offline, online computation cost as well as the communication overhead to measure the complexity of the proposed scheme LIP3. The computation cost is evaluated using the number of the multiplication and exponentiation operations, since these operations are always resource-consuming in mobile devices. The communication overhead is evaluated by counting the transmitting and receiving bits.

In our paper, h represents the hash function SHA-256, \( exp_{1} \) means 1024-bit exponentiation operation, \( exp_{2} \) means 2048-bit exponentiation operation, \( add \) indicates modular addition, and \( mul_{1} \) and \( mul_{2} \) mean 1024-bit and 2048-bit multiplication operation, respectively.

Assume that each user’s interest profile has \( n \) attributes, and the highest attribute value is \( l \). Table 1 gives the corresponding complexities in the existing Fine-grained [4] scheme, EWPM [8], and our proposal LIP3. Note that LIP3 uses similar matching method as EWPM, so we compare the complexities of both Level-I and Level-II in those two schemes.

Table 1. Complexity comparison among LIP3, EWPM and fine-grained privacy-preserving profile matching schemes

From Table 1, we can observe that, similar as EWPM, compared with Fine-grained scheme, LIP3 reduces computation and communication costs significantly. Specifically, in comparison EWPM, our scheme LIP3 only brings additional computation of the modules of the initiator’s and responder’s profile vectors, and additional transmission of a scalar value, which are all constant operations, independent of the parameters used in LIP3, e.g., the number of attributes \( n \), and the maximal attribute value \( l \). Those trivial additional overhead can be totally negligible.

5 Conclusion

In this paper, we propose an effective and secure CMT based privacy-preserving profile matching scheme for D2D based MSNP, LIP3, which can infer the accurate cosine similarity between two users by considering both the number of the common interests and the corresponding weights. In comparison with the existing CMT schemes (e.g., EWPM), LIP 3 can provide strict and accurate profile matching value, i.e., cosine similarity result, among individuals, without incurring extra computation and communication overhead. Therefore, LIP3 is suitable to be implemented by resource-constrained mobile devices, especially for various MSNP applications.