1 Introduction

To capture the interactions between the user, OSN, and TPA, we have chosen the Facebook platform [1] for observation. When users access a TPA for the first time, they are presented with a list of attributes to be shared with the TPA. Users are allowed to access the TPA only if they agree to share the attributes; otherwise, the service is restricted. Although such data could be used constructively, for example to capture user experience and apply that knowledge to improve the experience of other users of the same service, users are exposed to growing privacy threats in such situations. The act of sharing data is not in itself a breach of privacy. A privacy breach occurs whenever contextual integrity is broken, in other words, whenever information not intended to be disclosed in a particular context gets exposed.

Risk Analysis. The risk involved in sharing attributes may lead to attacks [3, 5, 10, 13] such as privacy violations, de-anonymization, fake profile creation, information leakage, cyber bullying, and identity theft. Women and children are even more vulnerable to these kinds of attacks [2, 15, 17]. In an online survey conducted by BullyingUK, 87% of teens between 11 and 16 who reported cyber abuse said they were targeted on Facebook, and 20% blamed Twitter [14]. There have been reported incidents where social media applications violated the accepted terms of service [9, 11]. Consider a scenario where a user shares attributes (sensitive or insensitive) with many applications from the same application developer. The developer has access to all the information about the user from all the applications they have built. That information can be correlated with other information available through online resources to infer information the user never intended to share, thus violating contextual integrity.

Motivation. Applying access control does not solve the problem of sharing user attributes: restricting the sharing of a required attribute with a TPA prevents the user from accessing the service. Hence a solution is needed that enables the user to share all the required attributes in a privacy-preserving way. PPI applies data perturbation techniques such as standard differential privacy and randomization to transform the data. As the Privacy Preserving Interceptor resides between the OSN and TPA, users benefit by using more TPAs without major privacy concerns, while the OSN and TPAs gain a larger number of active users. Considering these benefits, a mutual consensus must be set up between the OSN, PPI, and TPA for such interactions to be feasible. The rest of the paper is organized as follows. Section 2 presents the design of the privacy preserving interceptor. Section 3 surveys the existing work in the literature, and Sect. 4 presents the conclusion and future extensions of the paper.

2 Privacy Preserving Interceptor

2.1 Overview

The goal of the proposed mechanism is to preserve the privacy of the user while enabling access to multiple TPAs without much compromise to utility. PPI applies standard differential privacy concepts, so the user's disclosure risk remains the same as before sharing. The utility level of the data is maintained because all the required attributes are shared as perturbed values. A perturbed value resembles the original value statistically, providing the necessary level of utility. The degree of perturbation depends on the user's privacy requirement. Hence, our design provides a customized solution that strikes a balance between privacy and utility matching the user's needs.

The overall interaction between the components OSN, TPA, and PPI is shown in Fig. 1. The user's request to the OSN is always intercepted by PPI, which identifies the attributes and applies perturbation matching the user's level of privacy. The user's privacy level is derived from the sharing behavior captured via PPI. Initially, for a new user, the privacy level is measured through a short survey conducted when the user starts to use the application. PPI eventually learns the user's actual sharing level from the attributes the user shares.

Fig. 1. OSN, PPI, TPA interaction pattern

2.2 Design of PPI

Differential Privacy. A randomized function f gives \(\epsilon \)-differential privacy if, for all data sets \(D_1\) and \(D_2\) differing on a single user and all \(S \subseteq Range(f)\),

$$\begin{aligned} Pr(f(D_1) \in S)\le e^{\epsilon } \, Pr(f(D_2) \in S) \end{aligned}$$
(1)

where \(f(D_1)\) is the function applied to the data set including the user's attributes, \(f(D_2)\) is the function applied to the data set excluding the user's attributes, and \(\epsilon \) gives the required privacy level for all user attributes in the set S. The value of \(\epsilon \) is chosen based on the factors determining the privacy requirement, such as the user's privacy choice and attribute sensitivity.

Laplacian noise, as in Eq. 2, is generated with the chosen \(\epsilon \) and added to the original user attributes to generate the perturbed user attributes.

$$\begin{aligned} \delta _{noise}=\frac{1}{2b} \, e ^{\frac{-|x-\mu |}{b}} \end{aligned}$$
(2)

where \(\mu \) is the mean of the noise signal and b represents the spread of the noise. The spread parameter is based on the global sensitivity parameter \(\varDelta F(x) \) and the privacy parameter \(\epsilon \), as shown in Eq. 3.

$$\begin{aligned} b= \frac{\varDelta F(x)}{\epsilon } \end{aligned}$$
(3)

The global sensitivity is the difference between the maximum value and the minimum value that could be assigned to an attribute as in Eq. 4.

$$\begin{aligned} \varDelta F(x)= max - min \end{aligned}$$
(4)

The attribute requested by the TPA is replaced with the modified value \(Attr_{perturbed}\), as in Eq. 5, obtained by adding the Laplace noise \( \delta _{noise} \) to the original attribute (Fig. 2).

$$\begin{aligned} Attr_{perturbed}=Attr_{original}+\delta _{noise} \end{aligned}$$
(5)
Fig. 2. PPI block diagram
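To make the perturbation step concrete, the following is a minimal Python sketch of Eqs. 2-5; the function name and the attribute range are ours for illustration, not part of PPI's implementation.

```python
import numpy as np

def perturb_attribute(value, epsilon, attr_min, attr_max):
    """Perturb a numeric attribute with Laplacian noise (Eqs. 2-5)."""
    delta_f = attr_max - attr_min                  # Eq. 4: global sensitivity
    b = delta_f / epsilon                          # Eq. 3: noise spread
    noise = np.random.laplace(loc=0.0, scale=b)    # Eq. 2: zero-mean Laplace sample
    return value + noise                           # Eq. 5: perturbed attribute

# Age 23 with an assumed range [13, 23] (so delta_f = 10) and epsilon = 1.22
# gives b = 10 / 1.22, roughly 8.19, matching the example in Sect. 2.3.
perturbed_age = perturb_attribute(23, epsilon=1.22, attr_min=13, attr_max=23)
```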

Choosing the Privacy Parameter. An individual's choice of privacy varies with multiple factors such as geographic location, community, age, and gender. Therefore, there cannot be a single fixed privacy level for all users. To address this, the user's privacy level estimator module updates the user's sharing level based on their sharing behavior. Initially, a survey is conducted to capture each user's desired sharing level. The desired sharing level differs from the actual sharing level depending on the context in which attributes are shared. So, the share score estimator calculates the actual sharing level of the user from the attributes shared in the OSN. The privacy level of the user is measured on a three-point scale based on the survey questions and the attributes shared by the user in the OSN. The attribute table shown in Fig. 3 is used to calculate the share score as in Eq. 6.

$$\begin{aligned} Share\,score_{(actual)}=\frac{{w}_{1}\varSigma att_{1}+{w}_{2}\varSigma att_{2}+\dots + {w}_{n}\varSigma att_{n}}{n \times m} \end{aligned}$$
(6)

where \(w_{1},w_{2},\dots ,w_{n}\) are the weights assigned to the attributes, \(\varSigma att_{1},\varSigma att_{2},\dots ,\varSigma att_{n}\) are the sharing counts of the attributes summed across applications, n is the number of attributes, and m is the number of applications. There may be variations between the actual and desired share scores, so the average of the actual score and the desired score recorded from the survey questions is used as the share score. The attribute sensitivity estimator finds the sensitivity of the data. The sensitivity of a data attribute is measured by how frequently the attribute has been shared (the most sensitive attribute is the least shared attribute) and by the default classification given by Facebook. The attributes accessed by users are stored in the structure shown in Fig. 3.
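A minimal sketch of the share score computation in Eq. 6, under our reading of the attribute table as a 0/1 matrix of applications by attributes; all names and the table encoding are illustrative assumptions.

```python
def share_score(attr_table, weights, desired_score):
    """Eq. 6: weighted fraction of attributes shared across applications.

    attr_table:    one row per application; row[j] is 1 if attribute j
                   was shared with that application, else 0.
    weights:       weight w_j assigned to attribute j.
    desired_score: sharing level recorded from the initial survey.
    """
    m = len(attr_table)    # number of applications
    n = len(weights)       # number of attributes
    # Sum each attribute's column across applications, weighted by w_j.
    weighted = sum(weights[j] * sum(row[j] for row in attr_table)
                   for j in range(n))
    actual = weighted / (n * m)
    # Actual and desired levels may differ, so their average is used.
    return (actual + desired_score) / 2
```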

Fig. 3. Attribute table

A value of ‘1’ stored in the table denotes that the attribute has been shared. Following Facebook's classification shown in Fig. 4, an attribute is assigned a weight factor of 1 for basic profile elements, 2 for extended profile elements, and 3 for extended permissions. Let N be the number of applications accessed by the user. The sensitivity value of an attribute is calculated as in Eq. 7.

Fig. 4. Attribute classification

$$\begin{aligned} Sensitivity\,score_{att}=\frac{\sum _{i=1}^{N} app_{i} \times wt_{classification}}{\sum _{i=1}^{N} app_{i}} \end{aligned}$$
(7)

Finally, the privacy level \(\epsilon \) is calculated as in Eq. 8.

$$\begin{aligned} \epsilon =Sensitivity\,score_{att}+Share\,score \end{aligned}$$
(8)
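The following sketch ties Eqs. 7 and 8 together. The encoding of app_shared (an indicator per application) is our assumption about the attribute table, and the share score of 0.22 in the usage line is an assumed value chosen to reproduce the \(\epsilon = 1.22\) of the example in Sect. 2.3.

```python
def sensitivity_score(app_shared, wt_classification):
    """Eq. 7: classification-weighted sharing frequency of one attribute.

    app_shared:        app_shared[i] is 1 if the attribute was shared
                       with application i, else 0 (i = 1..N).
    wt_classification: 1 (basic profile), 2 (extended profile), or
                       3 (extended permissions), per Fig. 4.
    """
    num = sum(a * wt_classification for a in app_shared)
    den = sum(app_shared)
    return num / den if den else wt_classification

def privacy_level(sens_score, share_score):
    """Eq. 8: the privacy parameter epsilon for an attribute."""
    return sens_score + share_score

# Public profile element (weight 1) shared with three apps, plus an
# assumed share score of 0.22, gives epsilon = 1 + 0.22 = 1.22.
eps = privacy_level(sensitivity_score([1, 1, 1], 1), 0.22)
```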

Replacement with Random Values. For certain attributes such as name and email id, when shared with a TPA, there is a high chance that they could be correlated with auxiliary information to reveal far more than what is currently available. Instead of adding noise to them, PPI distorts the original values by replacing them with random attributes. Based on the user's sharing choice, these attributes are either shared as-is or replaced with random values picked from the database.
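A sketch of this random-replacement path; the substitute-value pool and the user preference flag below are hypothetical, since the paper does not specify PPI's database schema.

```python
import random

# Hypothetical pool of substitute values for identifying attributes.
RANDOM_POOL = {
    "name":  ["XYZ", "QRS", "JKL"],
    "email": ["user1@example.com", "user2@example.com"],
}

def release_attribute(attr_name, value, user_allows_sharing):
    """Share the attribute as-is or replace it with a random value."""
    if user_allows_sharing:
        return value
    return random.choice(RANDOM_POOL[attr_name])
```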

2.3 Discussion

Optional attributes can be withheld, and required attributes are perturbed, thereby balancing the privacy and utility requirements of the user. Consider the app Livestream, where the requested attributes are the public profile elements (name, age, gender, picture) and email. The sensitivity value calculated for public profile elements is 1 and for email is 2. The privacy level \(\epsilon \) is calculated as 1.22, and Laplacian noise (0, 8.19), i.e., zero mean and a spread of 8.19, is added to the original value. For example, the data ('AAA', 23, 'f', pic1.jpg, xyz@gmail.com) is transformed into data' ('XYZ', 28, 'f', pic1.jpg, xyz@gmail.com). The user can also opt to have the profile photo and email replaced with random elements. PPI attempts to perturb the required attributes. The mechanism can be applied to birthdate, location, timezone, age, etc., and random replacement has been applied for all other attributes. In a future extension we wish to apply other techniques to identify similarity and replace attributes like likes, action.music, action.books, etc., with similar items to preserve utility. The trustworthiness of the TPA could also be included as a parameter in deciding how much to disclose. Privacy level vs. perturbation % is plotted in Fig. 5; \(\epsilon =1\) provides the highest perturbation level with 23% noise, and \(\epsilon =6\) provides the lowest perturbation with 4.5% noise.
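The intermediate values behind this example are not stated; one consistent reconstruction, assuming a share score of 0.22 and an attribute range of width \(\varDelta F(x)=10\) for age, is

$$\begin{aligned} \epsilon = 1 + 0.22 = 1.22, \qquad b=\frac{\varDelta F(x)}{\epsilon }=\frac{10}{1.22}\approx 8.19 \end{aligned}$$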

Fig. 5. Privacy level vs perturbation %

3 Related Work and Comparison

There has been considerable work to resolve the privacy issues in the OSN-TPA scenario. To understand the seriousness of the privacy risk involved in sharing user attributes with TPAs, Chaabane et al. [6] conducted an experiment to study the interaction between OSN applications and external parties. The research revealed the startling result that Facebook and RenRen applications interacted with hundreds of different fourth-party tracking entities. A similar study by Aldhafferi et al. [3] showed that personal data collected through TPAs could be used for data matching to reveal sensitive information, posing serious privacy risks. Kong et al. [12] proposed a framework that uses a structural feature learning model to capture the relations among permission requests, their textual descriptions, and functionalities. The work provides insights for users to be aware of the potential risks of permission requests.

Defining privacy settings correctly is an important yet cumbersome task for a naive user. Hence, Anthonysamy et al. [4] proposed the CPM framework, which helps users gain control over data shared with TPAs by utilizing the social construct of friends to identify the best configurations and reuse them.

Cheng et al. [7] proposed a framework based on an access control mechanism wherein applications are split into internal and external components, allowing the internal components to access private information while restricting access for external parties. Such a solution may help limit the optional attributes, whereas limiting the required attributes leads to the user being denied access to the service. Another framework, based on PBAC (Permission-Based Access Control) and proposed by Tomy et al. [16], is designed to give users complete control over their data and over decisions on information disclosure to third-party applications. The work aims to give users the awareness needed to understand the privacy risk involved in sharing sensitive data.

In an approach similar to ours, Egele et al. [8] designed a browser plugin that intercepts the data flow between the OSN and TPA; through its control mechanisms, users can protect their profile data from malicious applications. However, the control mechanism of restricting the data once again stops the user from accessing the service, whereas our approach does not withhold the attribute; it grants access to the required attributes by generalizing them.

4 Conclusion

Preserving privacy in OSNs is a challenging task. The very business model of an OSN is based on analyzing the data shared by users and monetizing it. The data is not only used within the OSN; the complexity increases when it is shared with third-party application services to provide an extended set of services to users. Once the data is passed to a TPA, the user loses control over it, and the TPA could even pass it to advertising agencies or data aggregation companies. The user's privacy threat spectrum widens with the inclusion of TPAs. In this paper, we have collected information about the required attributes requested by applications. We have computed the privacy level of individual users and the sensitivity of attributes to define the privacy parameter \(\epsilon \). Our proposed system, the Privacy Preserving Interceptor (PPI), intercepts the user's request and forwards perturbed data to the TPA. PPI perturbs sensitive attributes by adding Laplacian noise or by replacing them with random values. In future extensions, we wish to model the user's privacy level to include other features such as age, gender, geographic location, and cultural background. Attribute sharing is contextual, so the TPA's trustworthiness could also be included as a parameter to decide the degree of disclosure. Improvements to the perturbation techniques are being explored for certain attributes, where rather than replacing them with random values we are considering replacement with similar values to provide better utility.