1 Introduction

Privacy expectations may be influenced by users’ sharing activity with Online Social Network (OSN) audiences, by each OSN user’s privacy preferences, and by the terms of and agreements with the OSN provider. Sustaining high privacy levels in OSNs is of great importance in an era when data oversharing has exploded. Privacy risk increases daily because OSN users willingly publish a wealth of Personally Identifiable Information (PII), yet in most cases fail to use the privacy features in a manner consistent with their intentions. This is attributed not only to the providers’ neglect in designing usable privacy setting interfaces, but also to back-end privacy breaches and vague privacy policy guidelines [1].

The aim of this paper is to propose a simple and easy-to-use method for identifying and analyzing privacy risks in OSNs, and then to suggest corrective action plans for them. Privacy risk has been defined in the literature as the “potential loss of control over personal information” [2]. In this study, we define privacy risk as the potential for PII exposure when the privacy levels do not meet or exceed the agreed audience visibility, which could result in embarrassment of the data subject. We focus on the privacy risks incurred when a user leaves the default privacy settings unchanged.

The method we used in this study is based on the recommendations of the ISO 31000:2009 standard for Risk Management [3]. The proposed Privacy Risk Assessment Method involves identifying the assets and assessing the likelihood and impact of a privacy violation incident. To complete the picture, the risk mitigation strategy that should be followed for treating the identified risks is also provided [4]. The method is described herein as it applies to the most popular OSN, namely Facebook; however, its application to other OSNs is largely straightforward.

To this end, the shared data are categorized and classified in a common base data model, according to their potential for privacy invasion when shared in OSNs. Furthermore, with a view towards creating a common risk register for OSN privacy, we considered the ten most popular Facebook actions [5] and identified the corresponding privacy risks in case of an incident. Then, a visualized risk scoring matrix was designed, aiming to raise awareness among the data subjects who willingly share their PII through their daily social networking interactions.

The remainder of this paper is structured as follows: the related work is presented in Sect. 2. In Sect. 3 we introduce a data classification model to be used for identifying and classifying critical PII. Section 4 describes the proposed method for assessing privacy risks and its application to the case of Facebook. Section 5 summarizes our conclusions and outlines directions for future work.

2 Related Work

The types of risks vary depending on the nature of the affected assets; as different types of information are published in online communities, it would be very useful to classify all this data. In [6] a general taxonomy for social networking data is presented. A different approach, based on the nature of the data and the reason why it is shared, is investigated in [7], where a classification of different types of OSNs and of the different data contained in OSNs is provided. A privacy framework that classifies users’ data, associates privacy concerns with data, classifies viewers of data, determines privacy levels and defines tracking levels is presented in [8]; the cases of Myspace, Facebook and LinkedIn are examined. Beyond the new taxonomy of data types proposed in [9], a metric to assess their privacy relevance is developed for the leading social networks, namely Facebook, Google+, Twitter, LinkedIn and Instagram. Serious concerns about which types of PII are processed highlight the need to create customized groups for this content; the data taxonomy we recommend in this study is an extension of the classification presented in [10].

Privacy grading systems aim to provide detailed information for enhancing users’ awareness [11]. A framework to compute a privacy score for an OSN user is proposed in [12]. Several studies on the relationship between the social network graph topology and the achievable privacy in OSNs were presented in [13], but much work remains to be done. In [14] a new model and a privacy scoring formula are proposed, aiming to calculate the amount of PII that may be exposed to Facebook apps. Moreover, a useful tool that detects and reports unintended information loss in OSNs, and that also quantifies the privacy risk attributed to friend relationships in Facebook, was presented in [15]. In [16], the Privacy Index and the Privacy Quotient were introduced to measure a user’s privacy exposure in an OSN and the privacy of the user’s profile, respectively.

A quantitative analysis approach is necessary to assess privacy impact in OSNs, and this is what is provided in [17]. However, risk identification alone does not seem sufficient to avoid emerging threats that may be concealed in OSN interactions. Privacy policy visualization tools have been proposed that present users’ privacy issues in a more comprehensible manner than before [6,7,8,9], but they have proved insufficient, as they do not cover every aspect of privacy in online social networks. Based on the predicates of a privacy policy model, a privacy policy visualization model for Facebook was presented in [18]; it aimed to help both the data providers and the data collectors better understand the content of the designed policies. Three different approaches that highlight the need for the usable privacy and security field are presented in [19]. The effects of visualization on security and privacy were investigated in [20]; the results showed that visualization may positively influence users, who can better comprehend the safeguards applied to their PII, which in turn makes them trust the providers. Moreover, as mobile applications become more and more popular, the need to enhance users’ awareness of the privacy risks that may arise when installing applications on their mobile devices is apparent. A privacy meter that visualizes such risks for mobile applications on Android devices was presented in [21] and seems to make privacy management easier for users. Unlike other OSNs, Facebook merely aims to remind users to choose an audience before they share their content. Facebook’s checkup tool, added in 2014, consists of an audience selector that enables users to review their privacy practices and settings [29]. Although privacy strength estimators seem to provide acceptable privacy guidelines by helping users keep their accounts protected [30], research on OSN privacy more often than not remains at the level of identifying and analyzing privacy problems, rather than proposing solutions.

In a privacy evaluation framework, the most difficult task is to select the proper metrics to meet the objectives of the evaluation. A wide variety of privacy metrics has been proposed in the literature to rank the level of protection offered by OSNs. Although OSN users seem to personalize the privacy of the PII they share via their OSN accounts, privacy assurance cannot be guaranteed in various cases [22], as the privacy control mechanisms do not seem to reflect users’ real privacy intentions. Not only the lack of privacy risk awareness, but also the misconception that the privacy controls they set are sufficient to prevent unauthorized access and misuse of their shared PII, may cause serious privacy leakage.

3 Data Classification

Our proposed classification builds upon [10], extending the taxonomy proposed by Årnes et al. for the personal data processed by Facebook; it also aims to amend deficiencies highlighted in existing taxonomies proposed for OSN data [6, 7, 9]. First, an overall data analysis and review of the common data categories, as defined in each OSN, was performed. Then, a user-oriented approach was followed, analyzing the data used in the ten most popular OSN activities.

Based on the deficiencies we identified in the aforementioned taxonomies, this classification was built around three core pillars:

  a. User-friendly terminology. Vague terms and inaccurate naming of classes may create a wrong perception among users regarding the content of each class. The ultimate goal of the proposed classification was for users to be familiar with the class names it uses. For instance, OSN users can understand a term such as “meta tags” more easily than “metadata”, as used by Årnes, since the tag feature is one of the most popular functions of Facebook and prepares users for the data types expected in such a category.

  b. High granularity. Existing taxonomies lack granularity in the definition of data categories, which makes it difficult to verify what data may be included in these categories.

  c. Completeness. Previous taxonomies do not cover all available OSN data types; for example, they miss data related to “Third Party Data” and “Communication Data”.

The data categories we defined are the following:

  • Registration Data is the data that a new user must provide to an OSN provider in order to become a member of the OSN and use it.

  • Enriched Profile Data includes data types that are not mandatory for maintaining an OSN user account, but which users are encouraged to fill in to enhance their OSN activities. This category contains a. text-based data, b. multimedia and c. data shared on OSN pages, e.g. contact information, familial information, education information, employment information, visual information etc.

  • Social Graph Data is the data that describes a user’s social interactions or declares her ratings/interests, such as connection relationships, public endorsement of pages, group membership etc.

  • Publish Sharing includes all the data that an OSN user shares on her own pages, or that other OSN users share on her pages; the OSN user may or may not have control over the content once it is shared.

  • Meta-Tags is the data added by others through labels attached to shared content, disclosing which users are interlinked with this content. More specifically, this category includes status tagging, photo tagging, geo-tagging and hashtagging [22].

  • Third Party Data includes the data types that are used when an OSN user enjoys Facebook-integrated third-party services such as applications, games etc.

  • Financial Data is any type of purchase data such as credit card information.

  • Connection Data is the data that shows the activities that are associated with users’ Facebook accounts.

  • Communication Data includes the data that is used to provide communication between two OSN users.

Table 1 depicts our proposed data classification for the case of Facebook. The second column aggregates all the data derived from the Facebook data analysis, and the last column presents the data categories we defined, identifying each with a Data ID from D1 to D9. Subcategories are also defined for cases where sensitive information is included in the main categories; a prefix “S”, derived from the term sensitive, precedes the Data ID of these subcategories in order to distinguish them. Since the D2 category also includes political and religious PII, we classified these in subcategory SD2.

Table 1. Proposed data classification for Facebook

It is worth mentioning that although the Name belongs to the contact information category, in this study we do not include it in this area. This content item is excluded from the scoring matrix because of the searchability feature incorporated in every OSN, which makes the name public by default; providing one’s name is the critical element when an individual decides to sign up for an OSN.
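To make the classification concrete, the following minimal Python sketch (not part of the original study) encodes the Data ID scheme of Table 1. The mapping of the nine categories listed above to D1–D9 in order of presentation is our assumption, and only SD2 is explicitly named in the text, so any further “S” subcategories would be illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataCategory:
    data_id: str     # e.g. "D2", or "SD2" for a sensitive subcategory
    name: str
    sensitive: bool  # True for "S"-prefixed subcategories

# Assumption: categories map to D1..D9 in the order they are listed above.
CATEGORIES = [
    DataCategory("D1", "Registration Data", False),
    DataCategory("D2", "Enriched Profile Data", False),
    DataCategory("SD2", "Enriched Profile Data: political/religious PII", True),
    DataCategory("D3", "Social Graph Data", False),
    DataCategory("D4", "Publish Sharing", False),
    DataCategory("D5", "Meta-Tags", False),
    DataCategory("D6", "Third Party Data", False),
    DataCategory("D7", "Financial Data", False),
    DataCategory("D8", "Connection Data", False),
    DataCategory("D9", "Communication Data", False),
]

def lookup(data_id: str) -> DataCategory:
    """Return the category record for a given Data ID, e.g. lookup("SD2")."""
    return next(c for c in CATEGORIES if c.data_id == data_id)
```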

4 Assessment of Privacy Risks

As mentioned in Sect. 1, the method proposed in this study is based on the ISO 31000:2009 standard. The scope of this risk-based privacy review is to evaluate the privacy levels offered by default in OSNs, in conjunction with the different types of data shared in these online communities, by examining popular OSN actions, from the creation of an OSN account until its deletion.

The approach follows four steps:

  • risk identification based on possible privacy issues;

  • assessment of impact;

  • assessment of likelihood;

  • assessment of privacy-related risks.

In this paper, we first identify the privacy-related risks that are often encountered in OSNs; these are presented and analyzed in Sect. 4.1. Section 4.2 describes the impact and likelihood assessment for this study. Impact evaluation was based on two factors, namely the nature of the data used in OSNs (see Table 4) and the type of incident that may occur (see Table 3), and we considered four levels, namely low, moderate, major and critical; likelihood evaluation was based on the default visibility levels offered in Facebook (see Table 5), and we considered three levels, namely low, moderate and high. Finally, by applying the results of Sect. 4.2, we evaluated the privacy risks (see Table 6) presented in Sect. 4.3.

4.1 Risk Identification

As seen in Table 2, we identified the asset risks for the most common social networking activities and considered the corresponding privacy issues that may arise during them. We assumed that when an OSN user performs a social networking action, at least one data category as defined in Table 1 is affected. For the D2 and D7 data categories, which include sensitive data, we considered OSN activities involving users’ sensitive information as affected assets. The privacy issues identified in association with each OSN action are indicative examples. It should also be noted that this study is not an exhaustive analysis aggregating the total number of risks that may be concealed in OSNs; rather, it presents part of them and highlights the likely repercussions that privacy violations may have on users’ PII.

Table 2. Asset risk matrix

Despite the fact that OSN users may feel protected when they apply strict privacy settings, it is quite possible that privacy breaches will occur. As seen in Table 2, risks were identified in the most frequent OSN actions, from the stage of account creation, as described in AR1, which examines the case where transparent policy terms and conditions are not provided to the user, until its deletion, as described in AR10, which examines the case of fake reports. In the latter case, on Facebook, after clicking “Report”, two options are offered to the user, namely “Report content shared” or “Report this account.” The user is then requested to explain why she is reporting the account, and suggestions for resolving the problem appear in a new window. Based on the complaint the user has described, the recommended options include “Unfollow”, “Unfriend”, “Block”, and “Submit to Facebook for Review.” Until Facebook evaluates the related reports, the user’s PII remains visible on her profile.

Table 3. Impact levels

Missing legal grounds for the processing of users’ sensitive data create serious risks. According to AR2, users would be well advised to avoid updating their political and religious opinions.

Public endorsement of OSN pages seems to help advertisers meet their commercial goals. It is no coincidence that targeted suggestions are created for pages whose content is similar to what an OSN user is reading on the internet, owing to the keyword scanning mechanisms used by OSNs. AR3 focuses on the excessive and unauthorized access to users’ PII.

In the case of Facebook, “Nearby friends” is a feature that was recently added at the top of users’ friend lists; it provides information about users’ location and is available through the Facebook mobile application. However, the most popular feature for declaring a user’s location remains the “check-in”. When we tried combining different content types in a single post, we concluded that, according to AR4, the most important risk arises because the privacy limitations of the individual data items are not satisfied for the dataset as a whole.

Even when one is not willing to share one’s PII on one’s OSN profile, data disclosure may result from one’s friends’ OSN activities, and as a result private things about one may become visible to unknown audiences. In most cases, a user’s PII is discovered through content added by others, such as tagged data [22]. AR5 describes such a case of unintentional PII disclosure, due to a possible conflict between the privacy options of the user who tags the content and those of the tagged user.

In order to play a game on Facebook, users are requested to install the suggested app, and users’ PII is given to third parties through these applications. According to [23], when a user accepts the installation of such an app, she gives permission for access to her public information and her profile information; she accepts receiving related emails at her registered email address; and she grants access to her posts in the News Feed, to her family and relationship information, to her photos and videos, and to her friends’ information. AR6 focuses on the excessive collection of users’ PII and on the principle of data economy that is not respected during the processing of PII.

When a user buys a page advertisement, the service providers collect and process her PII for the financial transaction. The user’s payment information includes her credit/debit card number, expiration date, security code, authentication data, and billing, shipping and contact information [24]. AR7, the risk that arises in this case, is a possible breach of privacy during the processing of sensitive data, assuming that the relevant operator is responsible for the data leakage.

AR8 describes a feature that provides users with direct and quick access to their accounts by offering to remember the device they last logged in from. However, the collection of identifiers creates risks, as anonymity is not ensured. As seen in AR8, users’ location data as well as their physical presence are disclosed.

Facebook Messenger is now also available as an independent application for access via mobile devices. However, the content of private messages (inbox messages) does not seem to be invisible to all Facebook stakeholders; data mining of the links contained in these messages is a common phenomenon. As described in AR9, unauthorized use of and access to sensitive discussions threatens the protection of PII when communications are monitored.

Facebook reporting helps users protect the social network from digital threats that can harm users’ privacy. The OSN Service Provider (OSNSP) team is available to handle incidents and problems such as pornography, hate speech, threats, graphic violence, bullying and spam. After a report has been submitted to the OSNSP team, it is reviewed and its conformance to the Facebook policies and statements is examined; it is then decided whether the reported content will be removed or not. Furthermore, the users who have shared these content types receive notices; actions after reporting include revocation of the user’s sharing privileges, disablement of certain profile features for the user, cancellation of a Facebook account, and reporting to law enforcement. Sometimes a user’s shared content is deleted due to a number of fake reports. AR10 describes such a case, and as a result PII loss is possible [25].

4.2 Impact and Likelihood Assessment

According to [26], the Privacy Risk equals the likelihood of an incident occurring times its impact.
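Written out, this is the standard qualitative risk equation:

\[
\text{Privacy Risk} = \text{Likelihood} \times \text{Impact}
\]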

The level of impact from a privacy-related incident, e.g. a privacy breach due to data leakage, is the magnitude of harm that results from the consequences of unauthorized disclosure, modification, destruction, or loss of PII. The likelihood of a privacy-related incident occurring is a risk factor estimated by analyzing the probability that a given threat is capable of exploiting a given vulnerability.

The nature of the data was used for assessing the impact, and the current default privacy controls were used for assessing the likelihood: the stricter the default privacy levels are, the lower the likelihood of PII disclosure; conversely, the more accessible the PII is to larger audiences, the more likely its disclosure becomes.

For the purpose of our approach, we consider four levels of impact, namely Low, Moderate, Major and Critical, as shown in Table 3 below. As stated above, the level of impact is correlated with the repercussions that a possible data breach may have on the PII of the data subjects involved. For instance, loss of tangible PII assets, be they sensitive PII or not, is considered to have major impact for the privacy risks defined in our study, as there is no possibility of individual access and participation by which data subjects could recover their PII. However, in the case of unauthorized access and disclosure of PII, data subjects may apply stronger privacy restrictions in order to mitigate this type of risk; as a result, the impact is assessed as low.

Table 4 presents a classification of the data categories defined in Table 1 according to their nature. Whether a given item of PII is sensitive or not was determined based on the recommendation presented in [27].

Table 4. PII asset categorization

The assessment of likelihood is based on the default visibility levels, namely “Public”, “Friends of friends”, “Friends” and “Only me”, considered per data category. Three levels are defined, namely Low, Moderate and High. High likelihood means that many people are allowed to see the shared OSN content; this corresponds to the “Public” option (Table 5).

Table 5. Likelihood levels

Having assessed the impact in case of a data breach and the likelihood of it happening, the evaluation of risk becomes possible; we assumed the worst-case scenario, with the highest possible impact on the data subjects.
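The following Python sketch (ours, not the paper’s) illustrates how the two assessments of this subsection could be operationalized. Only the anchor points stated in the text are fixed, namely the four impact levels, the three likelihood levels, and “Public” visibility implying High likelihood; the remaining mappings are assumptions, since Tables 3–5 are not reproduced here.

```python
IMPACT_LEVELS = ("Low", "Moderate", "Major", "Critical")  # Table 3
LIKELIHOOD_LEVELS = ("Low", "Moderate", "High")           # Table 5

# Assumed mapping: the wider the default audience, the higher the
# likelihood of PII disclosure. Only "Public" -> "High" is stated in
# the text; the other entries are illustrative.
LIKELIHOOD_BY_VISIBILITY = {
    "Public": "High",
    "Friends of friends": "Moderate",  # assumption
    "Friends": "Moderate",             # assumption
    "Only me": "Low",                  # assumption
}

def assess_impact(sensitive: bool, recoverable: bool) -> str:
    """Assumed rule following the worst-case reasoning above:
    unrecoverable loss of PII scores higher, and sensitive PII
    raises the level further."""
    if not recoverable:
        return "Critical" if sensitive else "Major"
    return "Moderate" if sensitive else "Low"
```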

4.3 Risk Assessment

By applying the results of Sect. 4.2, we obtain Table 6:

Table 6. Privacy risks in Facebook

For the Asset Value, we used “P” for Personal information and “S” for Sensitive information. For the visibility level, we considered the four choices offered by Facebook. “Only me” (M) is the strictest privacy level, declaring that no one apart from the content owner can see the shared PII. The “Public” (P) choice corresponds to the minimum privacy level and declares that the shared content is visible to everyone. The intermediate levels are the “Friends” (F) and “Friends of friends” (FoF) privacy choices.

As seen in Table 6, two levels were considered for the default visibility level of D1, because a combination of PII is used for the users’ registration. More specifically, while the majority of the information that must be provided during the registration process is visible to users’ friends, the display name and the registration email address are visible to everyone on Facebook. Furthermore, in May 2014, Facebook changed the default post privacy setting from “Public” to “Friends”.

The last column in Table 6 shows the risk level of the privacy-related incident scenarios. The risk is measured on a three-value qualitative scale and is calculated according to the rules shown in Fig. 1. The meaning of each value of risk is explained in Table 7.

Fig. 1. Privacy risk matrix

Table 7. Risk levels
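Since Fig. 1 and Table 7 are not reproduced in this text, the sketch below gives one plausible instantiation of the three-value scale and combination rules, continuing the Python sketch of Sect. 4.2; the matrix entries are assumptions in the usual ISO 31000 style, not the paper’s exact figure.

```python
RISK_SCALE = ("Low", "High", "Critical")  # assumed three-value scale

# Assumed qualitative "likelihood x impact" matrix standing in for Fig. 1.
RISK_MATRIX = {
    ("Low", "Low"): "Low",             ("Low", "Moderate"): "Low",
    ("Low", "Major"): "High",          ("Low", "Critical"): "High",
    ("Moderate", "Low"): "Low",        ("Moderate", "Moderate"): "High",
    ("Moderate", "Major"): "High",     ("Moderate", "Critical"): "Critical",
    ("High", "Low"): "High",           ("High", "Moderate"): "High",
    ("High", "Major"): "Critical",     ("High", "Critical"): "Critical",
}

def risk_level(likelihood: str, impact: str) -> str:
    """Look up the qualitative risk for a likelihood/impact pair."""
    return RISK_MATRIX[(likelihood, impact)]

# Example: sensitive PII left at the default "Public" visibility.
assert risk_level("High", "Critical") == "Critical"
```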

4.4 Risk Management

The majority of the risk assessment results show that the privacy levels need improvement; thus, risk management activities should be undertaken to address the identified privacy risks. Risk treatment options include modification, retention, avoidance and sharing.

As seen in Table 8, we recommend a risk treatment action for each asset risk identified in Table 2. For the treatment of these asset risks, the active participation of the user is necessary.

Table 8. Risk treatment

Risk modification is risk mitigation achieved by reducing either the incident’s likelihood or its impact. In this study, because the risk impact was defined based on PII sensitivity, it is not possible to reduce the impact; thus, we chose to reduce the risk likelihood by amending the visibility level.

The majority of the default settings in Facebook tend to make users’ content visible to the public audience [28]. Thus, as seen in Table 9, when a user customizes her privacy settings by selecting limited audiences, she can decrease the privacy score without leaving her PII exposed to everyone. This grading component is a dynamic field, as it can easily be changed on the user’s initiative. Figure 2 depicts the new risk scores, decreased after customizing the visibility level; the short sketch after Fig. 2 illustrates the same effect.

Table 9. Mitigated risks in Facebook
Fig. 2. Asset risk visualization
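Continuing the same illustrative sketch, risk modification amounts to re-running the lookup with a reduced likelihood while the impact, fixed by PII sensitivity, stays unchanged:

```python
# Tightening the default visibility lowers the likelihood and hence the
# risk score; the values below follow the assumed mappings above.
before = risk_level(LIKELIHOOD_BY_VISIBILITY["Public"], "Critical")
after = risk_level(LIKELIHOOD_BY_VISIBILITY["Only me"], "Critical")
print(f"{before} -> {after}")  # Critical -> High under the assumed matrix
```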

Best-practice benchmarks also recommend monitoring changes that would impact individuals’ privacy, as well as monitoring the effectiveness of the implemented privacy controls.

Risk retention is the handling of risks that cannot be avoided. Additional controls should be implemented in case the risk levels are higher than assessed, based on the risk acceptance criteria. The risk acceptance criteria include: a cost-benefit analysis that compares the estimated benefit with the estimated risk; different risk management techniques for the different risk levels; and provision for future additional treatment, when necessary. When the level of risk does not meet the risk acceptance criteria, the establishment of a retention program that could cover possible privacy gaps through corrective action plans is required. The main goal of this treatment option is to maintain consistency with retention practices. As seen in Table 8, when a risk cannot be avoided, we recommend guidelines that make the stakeholders aware of best-practice techniques for the retained risks.

Risk avoidance means elimination of the risk. This can be achieved in two different ways: by setting either the likelihood or the impact of the risk to zero. As seen in Table 8, we recommend not sharing sensitive PII on Facebook, or applying the “Only me” visibility option to such content types.

When the risk is shared with another party or parties, insurance arrangements are made; insurance partners are used in order to spread responsibility and liability. A risk can be shared either in whole or in part.

5 Conclusions

The main purpose of this study was to show that when OSN accounts are managed fairly, efficiently and effectively, the risk of privacy breaches can be managed. The research objective was to assess whether the current framework of privacy controls offered for OSN PII management is appropriate and consistent with users’ needs. The results of this examination provide detailed and simple information that allows a non-technical or non-privacy-aware person to understand how their PII and privacy might be invaded.

The need for enhancements to the current default privacy controls provided by Facebook became evident, as high and critical privacy risks were identified. In view of the service provider’s reluctance to increase the default privacy levels, users should take the initiative to increase the privacy levels of their shared PII by following the provided recommendations.

The method proposed in this paper can be applied to other OSNs as well. It is our intention to pursue this direction of research in the future, so as to develop a comprehensive understanding of the privacy risks of default privacy settings in all popular OSNs, to be subsequently used for recommending appropriate mitigation action to their users.