Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

USEMP is a multi-disciplinary research project, integrating the perspectives of lawyers, engineers, computer scientists, marketing experts and social scientists, aiming at developing a framework that will empower Online Social Network (OSN) users by enhancing their control over the data they distribute or interact with, with an eye on what can be inferred from the personal data shared in OSNs. At the core of this objective lies the idea of reducing the existing asymmetries of processing and control between OSNs organizations and citizens.

For that purpose, we will briefly indicate how transparency tools operate in the legal framework of Data Protection by Design (DPbD), notably for profile transparency. Additionally, we will present how an ecosystem - as envisaged in USEMP - can evolve in the future, identifying business opportunities and challenges that could arise when the user has the means to assert more meaningful control in a sustainable manner. In addition to the proposed considerations for a personal data value model in OSNs, the USEMP research team has also developed a scoring framework that can be used to collect and compute indicators for the information disclosure and value of shared personal data, which is also described in the last part of the paper.

2 The Legal Angle: Data Licensing and Profile Transparency

At the global level privacy and data protection is in turmoil, creating unprecedented uncertainty over the business ethics that drive the new economy. We believe that it is pivotal that the upcoming EU GDPR stabilizes the expectations within the internal market of the EU, making sure that citizens have a legitimate expectation of the consequences of sharing data. USEMP has developed two types of legal tools to support the data-driven ecosystem of OSNs, which underpins the previous– technical and economic– account.

2.1 Data Licensing Agreement (DLA)

EU data protection law requires that any processing of personal data is based on one of six legal grounds. Current business models are often based on consent and on the legitimate interest of the data controller (i.e. the party that determines the purpose of processing). These are not very reliable grounds: Consent can be withdrawn at any time and the legitimate interest of the data controller needs to be balanced against the fundamental rights and interests of the data subject. In both cases the legal relationship between user and service provider remains opaque and distant, usually determined by privacy policies and terms of service that are oriented to assumptions of US law. In the context of USEMP we have opted for a Data Licensing Agreement that requires the active participation of the user, by means of mutually obligatory agreement (quid quo pro) that determines what each party commits to deliver in the context of the economic value chain. The DLA thus licenses the use– and if applicable– the re-use of personal data for explicitly defined and specified purposes by specific data controllers under the conditions set forth in the DLA. The core obligation on the side of the user is to download and install the USEMP tools on her device and to license her data for the purpose of computing a set of scores related to disclosure dimensions that can help the user understand what can be inferred based on shared data. The core obligation on the side of the provider is to provide transparency tools that deliver user-friendly visualizations of the (perceived) sensitive information that can be inferred from the user’s data points. The DLA contains explicit consent for the processing of (legally) sensitive data (for the specified purpose only) and explicit consent for downloading the USEMP tools (or any other tracking mechanism).

The next version of the DLA will be modular, enabling a more granular type of licensing, comparable with the various types of Creative Commons Licenses for copyright protection. This should enable users to specify e.g. for which purposes their data may and may not be used, and/or re-used, by which parties and for what time.

2.2 Profile Transparency Tools

The current and the upcoming EU legislative framework of data protection requires that any automated decisions that have a significant impact on the user must comply with a number of conditions. Crucially, under the upcoming framework, three transparency requirements must be met for decisions based predominantly on automated profiling: the existence of such profiling, the logic of processing and the envisaged consequences (Hildebrandt 2012). The upcoming framework will also require the implementation of Data Protection by Design mechanisms that ensure that data controllers by default comply with their legal obligations (see [5]). This means that once state of the art mechanisms have been developed, their employment will become mandatory. USEMP is developing tools to provide profile transparency. By inferring disclosure scores based on the disclosure dimensions, USEMP is capable of indicating what kind of profiles people match, thus also indicating how they may be targeted by third parties (advertising, insurance companies, employers etc.). We hope that once USEMP-type platforms emerge as viable playgrounds for the testing of inferred user profiles, OSN providers will be forced to collaborate with them to increase the trust of their users and to improve their reputation. This may also contribute to compliance with their legal obligation to provide profile transparency. In the next section of the paper we discuss the results of research with respect to the effect of transparency in the current advertising value chain and we briefly discuss end-consumers perception of privacy.

3 Increasing Transparency of Use for Personal Data in OSNs

The research regarding transparency undertaken in the USEMP project has been based on conducted expert interviews with nine (9) diverse actors from the advertising ecosystem to get first hand insights; interviews covered roles from ad.networks, marketing companies, organizations related to the practices of advertising and social networks, all of which have a business interest in the utilization of users’ data (the interviews method and details are described in detail in [6], we have also added a summary in Table 1).

Table 1. Overview of USEMP conducted interviews with advertising/marketing industry stakeholders

We start with a description of the advertising and marketing ecosystem in terms of its use of personal data as it is today. The problems that users face in the value network underpinning personal data are central to the USEMP project: non-transparent re-use of personal data, often through unaccountable and untrustworthy third parties. This causes information asymmetries and power imbalances, disfavouring the user. Our research has shown that many actors on the industry side also see shortcomings in the status-quo, and we note that their interests are not per se contrary to those of users.

A concrete and objective value of personal data cannot be easily defined. All actors involved in the OSNs value model depend to some degree on some kind of personal data, but each of them on different types and for different ends and purposes. Thus the value of personal data from OSN users differs for each actor. This also relates to the fact that it makes no sense to reduce ‘value’ to either monetary value or ethical value, as both are simultaneously at stake [11]. We propose a mechanism that can increase transparency of personal data usage in the ecosystem [1]. This would not only contribute to a business ecosystem that is more respectful to the content creators such as social network users, while enabling innovation that complies with EU Data Protection law. It would also increase the value of (personal) data for each actor and thus for the ecosystem on the whole.

The low level of trust in the industry is a main barrier in reaching such a goal (see also [2]). Trust is not only from the OSN end-user towards the industry (rightfully so in many regards), but also between commercial actors in the value network. The reason for this is among others the non-transparency of data-related operations. Linked to non-transparency, the low quality of utilized data is a major issue, impeding the industry’s functioning, reducing efficiency and thus also profits. For the user, non-transparency creates not only information asymmetries, but also diminishes the user experience (for example, irrelevant advertising and longer page loading times from the effect of third party tracking).

Additionally, the USEMP team has performed a number of focus group interviews and related analysis to understand what the social requirements are for privacy enhancing tools. This social requirement analysis has highlighted the need for a personal data management platform that currently the USEMP team is building (see [16] for details of requirements and method of analysis).

3.1 Personal Data Business Model Evolution and Related Legal Aspects

Firstly, we will assess how the ecosystem will evolve, if the market is left to operate alone, without appropriate regulation or mechanisms as the ones suggested by USEMP.

The low quality and reliability of data is a real problem for the industry. In order to avoid being responsible and accountable for personal data issues, actors often outsource the collection and pseudo-anonymizationFootnote 1 of data to data brokers or similar companies. Subsequently OSN providers purchase end-user data they need for their operations from such third parties (advertising networks, data brokers or similar companies), in some cases pseudo-anonymized. This can be seen in a similar light as rights clearance in the field of intellectual property. After buying data in this fashion, the actor may reason that it is no longer personal data, or if it is, at least lay the blame on the third party. This is– obviously– incorrect, and partly caused by the fact that some of the big players believe they can afford to base their operations on US law, even when processing personal data of EU citizens.

As the interviews with industry experts have illustrated, the functioning of the market is strongly affected by major OSNs, mostly Google, Twitter, Facebook and their related competition (for example see [8]). These actors might be able to offer a remedy regarding data quality. They have full access to personal data collected through user profiles in their respective OSNs. In addition, they have access to additional behavioural data spanning over additional data points (like web sites that include OSN tracking code).

End-User Demand for Data Protection and the Effect to Smaller Companies.

In such an environment end-users might increasingly demand stronger data protection and prefer those actors and services that they trust and offer tools that respect their data protection rights. If the market is left alone to deal with that, the already strong position of the prominent actors today will allow them to fortify their position. In contrast to smaller companies, these actors have the necessary means at their disposal to invest and develop new tools and mechanisms. Additionally data protection and privacy tools would be controlled by those companies, which already misuse their access to user data today. This could be regarded in the light of feigned vendor relationship management, pretending that users have control (for examples see [3]). The advantage for OSN organizations increases even further because they can leverage user lock-in to ask for more personal data or further reaching users without losing end-users to competitors.

Our research and interviews with ecosystem actors indicates that the currently weak enforcement of data protection and privacy leads to adverse effects; it harms the European industry, weakening innovation, hurting SMEs and start-ups, while favouring big companies, especially from non-European countries that feel they can ignore the EU Data Protection legislationFootnote 2. Moreover evidence of this harm is now documented in industry reports that show the increased use of ad-blockers plugins from end-users and the corresponding revenue losses (see for example [15]).

Need for Personal Data Management Tools that Provide Awareness.

The concept of privacy by design, incorporating privacy from the first step of the design project, aims for a sustainable data life cycle management. Prior consent in the form of opt-in can be seen as a prime example of the failure of these types of solutions. Though consent supposedly enables the user to take all decisions over his/her data, the user basically has very little or no clue as to the consequences. One of the problems is that privacy is reduced to the ability to hide one’s data, whereas the real issues reside in the inferences that may be drawn from the data. Data Protection, moreover, requires that users become aware of the purpose of processing, while forcing providers to process only those data necessary (data minimisation). We therefore believe that consent is not the best way to engage the users. A more sustainable and meaningful way to ensure user participation is the use of a Data Licensing Agreement (DLA) that involves clearly demarcated mutual obligations based on a fair exchange.

Business Challenges for Personal Data Management Tools.

The implementation of current personal data management tools entails several challenges. Most significantly, the costs can be challenging for small companies. Efforts to ensure data protection, as demanded by legislation, are coupled with investments in respective technologies to support such tools. Cookie regulation sets an example in this regard, but also the right to be forgotten: the request to delete all data related to one user might be difficult to comply with for practical reasons rather than not being willing to do soFootnote 3.

Another decisive challenge regarding opt-in is that it cannot work as OSN providers, whose operations are mostly data-driven, are located between advertisers and publishers. The latter companies do not offer any kind of value proposition that is relevant directly to the user. As a result, they have no leverage that interests users to care sufficiently about their personal data. The most successful data-driven companies of today, on the other hand, have several advantages. As social networks, they are able to provide opt-ins simple through existing user lock-ins. The value proposition is, at least partly, that the only alternative for the user to being tracked is to stop using the serviceFootnote 4. Their scale furthermore allows them to create independent data management tools, owned and controlled by them.

Value Proposition for Personal Data Management Tools.

Thus, the value proposition of personal data management tools, i.e. the incentive to use them, is a central issue in this regard. First, it depends on the interests of the users not only in knowing what is happening with their data, but also in investing effort and time in actually controlling it. Second, even those groups in society that are concerned about their personal data do often lack the knowledge to assess their own “value”. Expectation management would thus be integral to a tool in question, as an individual’s data is worth much less to the industry than many would think (the value is in the connections and inferences, not in the individual data points). In addition, it needs to be clarified how much value a user already derives from using free services, as these are sponsored through the personal data market. If these two aspects are taken into account, a value proposition for a personal data management tool may become feasible.

Last but not least, such a tool could make clear that, if a user voluntarily provides certain data, for instance through tracking, a company would make money of it, while the user could benefit from free services and more personalized marketing, provided there is transparency with respect to which personal data are used and for what purpose. This could provide re-usable data for the benefit of several actors. However, we believe that such an exchange is only sustainable if users are informed about what they inadvertently disclose when interacting with their OSN.

3.2 USEMP: A Centralised Tool for Transparency

In the following paragraph we present how the USEMP tool for personal data management can become effective and interesting enough for users and businesses to utilize. The research conducted in USEMP illustrates (see [6]) that direct monetization of personal data by the data subject/OSN user is unrealistic, due to the low monetary value of an individual’s information. Moreover, the value that users derive from free services, such as Google, Facebook, Twitter, news etc., is already immense, yet, indirect. Instead of monetization, a tool that focuses on transparency in the first place (with the potential to be extended) seems to promise most potential, both for the user and the business. USEMP’s interdisciplinary use cases and scenarios highlight the importance of transparency and awareness (see [16]). They demonstrate the need to hand over a certain amount of control to users, notably over the definition of ‘privacy’ and ‘sensitiveness of data’ (even if this will not overrule mandatory law regarding the treatment of sensitive data as defined by law). Foremost, a tool should create knowledge and awareness about the personal data market, which in itself would already be a major step forward. In privacy literature this has been coined as relating to ‘institutional privacy’, as opposed to ‘social privacy’.

The latter concerns disclosure of personal information to one’s peers, the former concerns the capturing of (inferred) personal information by public and private service providers. Whereas users have developed intricate privacy strategies with regard to social privacy, they are hardly aware of their lack of institutional privacy (see for example [9]). Not only would the actual value of personal data become more clear, not only would it clarify the actual value of personal data more clear, it would also illustrate the implications of free online services. The development of comprehensive visualisations is certainly of main importance in this regard, as it improves the user experience and comprehensibility of these complex dynamics.

From the value network perspective, which we have adopted for this work, it is a central question where such a tool:

  • needs to be “inserted” in the value network, in order to be most effective,

  • which actor is most directly affected by the tool, and

  • how can this actor pass obligations and benefits to its partners in the value network.

Due to the strong connectedness of the ecosystem in question, the right location in the value network can affect the whole structure. Depending on where the tool is located, different advantages may also arise for different actors, and for users.

Options for Implementing a Transparency Tool in the Ecosystem.

The broad distinction we can make regarding the value network is between ‘in the middle’, ‘at the sides’, and at the front-end (user side).

Ad.networks and Technology Providers.

So far, much attention has gone to the actors in the middle part of the ecosystem depicted in Fig. 1. These, the publishers and app developers, are data-driven companies, and all user-related data might flow through their systems. However, they are only intermediaries, facilitators of the actions of others. Also these actors are obliged to implement consent mechanisms in their operations, due to the so-called cookie legislation (which has been twisted by the industry to force users’ hand, so in point of fact such consent has little meaning [5]. This is probably the least effective spot to realize effective data protection. Although the core of their business is data, and often pseudonymous data, they have no direct contact with the user. They depend also in this regard on the sides of the value network. Focusing on the actors in the middle is unfortunate, as they create value indirectly for the user, backing the free model by increasing efficiency and relevance of advertising and other content. Without the data, they cannot work, as targeting, delivering and evaluating all depend on it.

Fig. 1.
figure 1

The Value Network of Online Advertising and options to introduce transparency tools

Publishers and Advertisers.

The actors on the sides, i.e. publishers and advertisers (A and B respectively in Fig. 1), are arguably a better location to implement a personal data management tool. Indeed, they are legally obliged to implement data protection, including transparency tools. USEMP tools will thus mitigate their liability for violations by integrating empowering tools for DPbD. This is far more effective because they have direct contact with the user or customer. They also need a good reputation to stay attractive for the user. Furthermore, almost all other actors depend on the sides. Thus, they do not only have leverage over the users, but also over the actors in the value network. Due to the strong connections and dependencies, business-to-business pressure down the value chain should be utilized as a powerful accomplice in strengthening personal data protection rights. The user can only build a trust relationship with those actors he/she has direct contact with, if these actors provide reliable insights about their operations and how these may affect the user.

Benefits of a Centralized Tool.

The most significant outcome of all possible scenarios thus seems to be that a centralized mechanism on the side of the OSN providers promises the most desirable outcomes; this would provide a tool with overarching effects on the whole ecosystem. This is not least the case because:

  • the economic value of personal data is indirect, and through centralising it in a transparent way, it becomes clearer;

  • it can have benefits for both the user and the industry by increasing transparency that is missing most in the ecosystem;

  • most importantly, a centralised transparency tool could also promote smaller and European companies, which might otherwise not be able to invest in appropriate tools themselves;

  • centralisation that clearly complies with the legal framework of data protection, notably by providing DPbD and therefore backed by competent Data Protection Authorities, would be a good starting point for creating trust on the side of users;

  • finally, a tool where all data management related operations are centralised could create legal certainty and ensure accountability through transparency.

The Case for an Independent Platform.

We note, however, that it is not obvious that OSN providers are willing to provide the kind of transparency that is required. Below, we will explain that the emphasis is on profile transparency, which is part of mandatory data protection law. OSN providers may wish to keep their inferences behind the walls of trade-secret and IP rights, finding them to be a central part of their competitive advantage. At the same time, users may not be willing to trust OSN providers’ information about what they infer. They may prefer an independent platform to secure more impartial inferring tools. The USEMP tools aim to provide precisely such an independent platform.

Challenges for the Adoption of Centralized Platforms and USEMP Specifications.

The challenge for a centralized platform such as the one proposed by USEMP, is how this can be applied in free market conditions. There are two types of driving force that can help the adoption of such a platform: a) regulatory policies that require such tools to be implemented, and b) self-regulatory policies that can be enforced by the industry itself (see for example [17]). In addition to any regulation, the proposed solution acts also as a set of specifications that can be implemented by more than one platform by consortia of business actors that can see the benefits of introducing such solutions (for example to increase trustworthiness for the end-users).

The USEMP consortium plans to experiment in such a centralised solution where there will be contractual relationship with the user and the platform that will allow the sharing of user data on the basis of a granular license to use and/or reuse the data. This will, for instance, enable a one-time licence (or prohibition) of the processing of specific types of data, specified reuse, or particular third parties. Such granular licensing can be implemented clearly all over the value network. To render the consent that is involved in concluding a data licensing agreement (DLA) meaningful, it is important to develop indicators of personal data privacy and value to end users. In the following sections of the paper we describe the proposed USEMP framework for the collection of the necessary information and computation of such indicators.

4 USEMP Disclosure and Personal Value Indicators Framework

In order to develop tools that will increase the transparency of usage of personal data by advertisers and other third parties, it is important to develop personal data value and disclosure indicators that can be used to provide end-users meaningful insights. In the following paragraphs, we present a framework that has been developed as part of the USEMP project. Since these indicators are part of the personal data for each end-user of the USEMP tools (referred to also collectively as Databait), their use is also governed by the proposed DLA approach in Sect. 2.

In the early stages of the development of the USEMP disclosure scoring framework we identified a list of personal data attributes that have been qualified as sensitive or valuable by the users.Footnote 5 For users, to better perceive the different aspects of their privacy, it is useful to organize the attributes in a semantic manner. To this end we organize the identified attributes in a number of high-level categories that we refer to as disclosure dimensionsFootnote 6. This organization allows for clear and intuitive presentation and handling of the different aspects of a user’s personal information.

For instance, one of the disclosure dimensions to be considered is demographics, which includes user attributes such as age, sex, etc., and another is health factors, which includes attributes such as smoking and drinking, etc. Such a grouping has multiple benefits for the end user. First, it enables him/her to form a succinct, easy to grasp mental model of the disclosed personal information and to prioritize its different parts. Second, it enables the use of different compact visualization methods that will further augment the user’s awareness with respect to his/her personal information.

From a legal perspective we note that much of this data falls within the ambit of sensitive data, which has stringent legal implications. Providing these disclosure dimensions will enable users, OSN providers and other stakeholders to get a better understanding of the sensitive data that are inferred from user data and will thereby enable a clear attribution of liability for the processing of such inferred data.

On top of this disclosure dimensions framework, we develop the USEMP disclosure scoring model, by enriching it with disclosure and data value scores. Having organized the user attributes in the disclosure dimensions structure, we proceed by enriching it with disclosure and data value scores. Disclosure scores are about quantifying the potential negative impact entailed by the disclosure of different parts of the personal information of a user. The economic (though not monetary) value of a user’s shared personal data in OSNs (e.g. posts) is inferred by measuring its impact on the user’s social graph, i.e. audience (in terms of reactions, e.g. likes, shares, comments).

4.1 Disclosure Dimensions

We define eight (8) key categories of personal attributes, which we name disclosure dimensions. These are: (A) Demographics, (B) Psychological Traits, (C) Sexual Profile, (D) Political Attitudes, (E) Religious Beliefs, (F) Health Factors and Condition, (G) Location, and (H) Consumer Profile.

These dimensions cover a wide variety of personal information, which OSN users in many cases consider of private nature (perceived privacy), and also encompass information that is considered sensitive from a legal perspective (legally sensitive data). In addition, based on current business practices (mainly stemming from the marketing industry), the identified dimensions are associated with certain value levels, i.e. they carry a certain level of utility for (marketing) companies that are interested in targeting consumers. Table 2 summarizes the eight identified disclosure dimensions, along with the value levels associated with them.

Table 2. Overview of USEMP disclosure dimensions

This set of eight disclosure dimensions constitutes the current top-level schema of the USEMP privacy model, and although we do not foresee considerable changes at this level, the implementation of the overall framework is generic enough and can accommodate such changes if needed (e.g., addition of a new dimension, splitting of an existing dimension into more).

The disclosure dimensions framework effectively defines a hierarchy where the top level represents the OSN personal data profile, the next level contains the eight (8) disclosure dimensions, the level below contains the set of attributes of each dimension and the lower level contains a predefined set of possible values for each attribute (please note that the word value here refers to the possible values an attribute may take and should not be confused with the concept of personal data value).

We link OSN data to this framework by considering a variety of detection and analysis mechanisms, e.g. multimedia information extraction techniques and inference techniques, and OSN presence data (typically in the form of observed user activities, e.g. likes [18], posts, user interactions, or volunteered profile information). In short, we use a user’s OSN presence data (e.g. posted content, likes, set of friends), in order to predict the values of the user’s attributes. This involves both utilizing explicitly provided information and also producing a number of inferences.

4.2 Disclosure Scoring

Initially, the USEMP research team identified the need for a scoring framework that would help the end-user understand better which OSN or web behaviour actions may disclose personal information related to the disclosure dimensions presented above. Such a mechanism is based on perceived privacy and highlights what can be disclosed about users based on inferences on their data. We make clear to the users that this disclosure is based on algorithmic decision making and need not at all be correct, but emphasize that this is how the current personal data ecosystem operates: the value chain is based on such probabilistic disclosures. Additionally, overall disclosure scores are computed at each level of the proposed hierarchy. In the next subsection we provide more details about the scoring framework.

Structure:

A schematic illustration of the disclosure scoring framework is shown in Fig. 2. USEMP disclosure scoring framework builds on top of the disclosure dimensions and assigns a number of scores to the elements of each level. These scores express different aspects of disclosure, e.g. the perceived sensitivity of different types of information, the confidence that some value holds for some user, an overall disclosure score and other aspects that will be described shortly Clearly, the two important characteristics of this framework are the following: (a) it is tailored to the hierarchical structure of the disclosure dimensions, (b) there are multiple scores associated with the elements of each level of the hierarchy. Hence, the framework enables the following two kinds of user awareness: (a) navigation through the levels of the hierarchy and understanding of how the scores for some particular value affect or are affected by the levels above and below it, and (b) focus on specific aspects of the factors that are related to perceived privacy; e.g., it will be possible to focus on visibility, the overall disclosure score, etc.

Fig. 2.
figure 2

Overview of USEMP disclosure scoring framework

Here, we consider an additional level at the root of the framework, which contains any type of data that is generated as a result of a user’s behaviour and interaction with the services of an OSN operator. This includes posted content (text, images), explicitly declared profile information, user network data, sets of likes, etc. We call this the OSN presence data layer and consider it as the primary source for populating the disclosure scores for the given user. Naturally, between the perceived privacy values level and the online presence data, there is a layer of modules that perform various mining and inference procedures. The overall framework is visualized in Fig. 2.

Computation is carried out in a bottom-up manner. That is, information that comes from the OSN and the inference mechanisms is used to fill the scores at the values level and then the computed scores at the values level are used to fill the scores at the upper levels, one level at a time. Starting from the level of values, the scores that characterize each value are the following:

  1. (a)

    Confidence. This is a continuous value in the range from 0 to 1. It represents how confident we are that the corresponding value is true and is typically computed by the inference algorithm along with the produced inference. It needs to be noted that the confidence values under the same attribute should sum to 1 (except for the case that an attribute can take multiple values simultaneously).

  2. (b)

    Sensitivity. This score represents how sensitive the user feels that this particular piece of information is. It also ranges from 0 to 1.

  3. (c)

    Visibility. This score attempts to quantify the set of people to which the relevant information is accessible and consists of three sub-scores. The first is the overall visibility score and is also a continuous value in the range from 0 to 1 (lower values denote that the corresponding piece of information is accessible by fewer people, whereas a value of 1 denotes public information). The second visibility sub-score is a qualitative label that is related to the overall disclosure score and expresses the widest possible audience to which this information is accessible. For instance, a value whose overall visibility score is 0 has a visibility label of “Private”, a value whose overall visibility score is 1 has a visibility label of “Public” and an intermediate value denotes the widest group of people that have access to the value, e.g. “Friends” or “Friends of friends”, etc. This sub-score is called “visibility label”. The third visibility sub-score expresses an estimate of the actual audience that sees this value and we refer to it as “actual visibility”. It is an integer number representing the actual number of users that are aware of that value and depends on the estimates of the actual audience of the content that has been used to infer that value. It should be noted that the current implementation of the scoring framework does not compute the actual audience, however, different approaches are considered for estimating it.

  4. (d)

    Declared/Inferred. This is a binary value that defines whether our knowledge about the particular value comes from explicitly provided information that the user has provided or has been inferred (derived). It is not an actual score but reflects information that is important for maintaining a complete view of disclosure with respect to some particular value. Additionally, in some cases a value may be both declared and inferred. In such cases, the value will be considered as declared (i.e. declared will override inferred).

  5. (e)

    Support. This field is not really a score, but rather provides a link to the OSN presence data that support the particular value. In the case that the support for the value is associated with an inference mechanism, this field points both to the inference mechanism and the data that the inference mechanism used. This field is particularly important because it allows the user to understand the types of content that are important for his privacy.

  6. (f)

    Level of control. This score represents the ability of a user to control the disclosure of data about him/her. It ranges from 0 to 1; low values will denote a limited ability to control the disclosure of this particular data about the user. The ability of a user to control the disclosure of data about him/her may be limited by the fact that the support of some value may involve also data posted by other users that the user him/herself cannot control. This score is set by evaluating each piece of shared information with respect to (a) ownership of the data from the end-user (or someone else), (b) the permissions framework of the social network that may allow the user to stop this information from being shared or not.

  7. (g)

    Disclosure score. Eventually, the framework includes an overall perceived score that provides a succinct idea about the overall privacy status of an OSN user (see for example [13]). It is a function of other scores: confidence, sensitivity and visibility. Higher values of the score denote a higher exposure of personal information that is perceived to be of private nature. Note that although the disclosure score essentially summarizes the other scores, the model maintains a separate list of the individual scores (confidence, sensitivity, and visibility) in order to support richer visualization and analysis capabilities (e.g. separate visualization of visibility and sensitivity).

The three upper levels of the proposed disclosure scoring structure, namely the user, the dimensions and the attributes, are all associated with the following set of scores: (a) Visibility, (b) Disclosure score and (c) Level of control. These have the same meaning as the corresponding scores at the values level. In addition, the top level (user) is also associated with an overall personal data value score (please see the next subsection). For a full description of proposed estimators see [7, 19].

4.3 Personal Data Value Indicators Framework

In addition to disclosure scores, a set of personal data value indicators are required so that the end-users can be informed about the value of the data they are sharing. The proposed personal value indicators are based on the activities of the end user in the OSN environment and his/her OSN social graph. Two basic indicators are initially proposed:

  1. (a)

    a measure of influence for a specific person, referred to as Influence score and denoted with I, that is based on the history of the objects that the specific person has created in the OSN;

  2. (b)

    a measure of the importance of an object (picture/video/post), denoted with M, that is posted to the OSN. M is calculated taking into consideration the type of action on the specific object of the first- and second-hop friends of the object creator.

The Influence score of a specific person should be estimated based on the history of the objects that the specific person has created, while taking into consideration the

  • number of connections comparing to the total number of users of the network;

  • types of actions (share, like, comment) of the first and the second hop friends on the objects that the corresponding person has uploaded/created to the OSN.

For the calculation of user influence I, the following parameters are proposed to be collected and used:

  • number of objects (i.e., picture/video/post) that a user has created

  • number of first- and second-hop friends

  • total number of first- and second-hop friends that had an action on each object (i.e., picture/video/post)

  • type of action (i.e., share, like, comment) of user j on the object i

The parameters listed above can be collected and combined to different formulas to compute values for variables I and M.

The proposed personal data value combines these two factors (I, M) and is calculated as follows:

$$ V = I \cdot M $$

This initial set of defined value indicators (data value V, user influence score I, object importance M) is defined so that it can be computed from actual OSN data (like Facebook). As part of future work these parameters will be collected and computed with actual data from pilots on top of Facebook and with simulated data from theoretical models and various formulas for I, M will be tested (for a full description of proposed estimators see [7]).

4.4 Experimental Tools and Visualization

In order to evaluate the ideas developed in the USEMP project, a set of pilots have been scheduled and a set of tools are developed to provide a testbed for collecting data and end-user feedback, these are referred by USEMP partners as the Databait tool. These include:

  • Databait web browser plugin: a browser plugin that is used to collect users’ browsing data during the pilot and that allows end-users to block tracking behaviour or offer users recommendations with respect to sharing data.

  • Databait webapp: a web application that allows end-users to view indicators of their online social network sharing behaviour with respect to transparency and data value.

  • Databait backend: a framework and set of services that collect data about user behaviour and compute a number of indicators with respect to disclosure and data value that are shared with end-users via the Databait webapp visualizations.

In the following Figs. 3, 4 and 5, we present some examples of the UI/UX of the web browser plugin and webapp visualizations for the Databait tools (this is work in progress to be validated after the completion of the USEMP pilots):

Fig. 3.
figure 3

Databait web plugin – allows end users to view third-party tracking services and block them (or unblock them)

Fig. 4.
figure 4

Databait web application – access to visualizations for transparency and data value

Fig. 5.
figure 5

Databait web application – access to visualizations for transparency and data value: (a) left screen shows the concepts detected in a user’s shared images, (b) right screen show the locations detected in a user’s shared posts

  • Figure 3: presents the look and feel of the Databait web plugin that is responsible for selecting trackers and blocking (or unblocking them).

  • Figure 4: presents the look and feel of the Databait web application that provides the end-user with access to a number of visualizations/tools and information from OSN shared data.

  • Figure 5: presents the look and feel of two types of visualization that allow the user to understand if the shared data disclose any location (from Facebook posts analysis) or interests (derived from image analysis) and present them in an intuitive way.

4.5 Evaluation of USEMP Tools

We have performed a series of user studies within the USEMP project in order to build the Databait tools on solid grounds of user acceptance in the form three focus group sessions with end users. Hence the study acted as a formative evaluation approach by involving end users to the design and evaluation of tools while the project moves forward. Three focus group interviews were conducted in English within March 2015 together with 15 participants from Botnia Living LabFootnote 7. The design of the focus groups was based on gathering end user insights on main theme of transparency of personal information through USEMP tools. For this purpose we presented the participants with the mock-ups of the USEMP tools/Databait (as presented in Figs. 3, 4 and 5). Each function demonstration was then followed by questions targeted on insights for values, motivations and barriers to use. We briefly discuss the participant feedbacks here.

We first asked users about the normal social media and internet usages in order to indirectly and directly capture their awareness of the ways they disclose information. During the course of a normal social media usage there are different communicative actions which are established. Users have personal motivations and external influencing factors that force them to use social media. Therefore their usage is not totally optional and for this reason their personal information is inevitably disclosed. Among personal motivational factors that can be enumerated are the willingness to reach a wider audience in order to promote themselves for example with different political activities, keeping in touch with the families, friends, acquaintances, keeping track of their events like their friend’s birthdays, to keep themselves updated about what is happening in surroundings and etc. Disclosing various types of information is evident in these types of social media usages. We asked them about the kinds of information they think they are disclosing in their everyday usage of social media to capture the level of their awareness towards privacy issues. Our analysis showed that most users think of privacy as only the basic personal information they disclose voluntarily like name, age, relationship status. The awareness towards the observed and inferred data sources is extremely low among the users.

Next, the Databait tools were presented using the mocks of the future tool and the expected results (e.g. inferences). The participants were allowed to freely discuss about the features and ask questions about the functions. Therefore we created a milieu for the lively discussion and to capture their concerns and how the tool could serve them in different scenarios. For example we could observe that most of the participants were curious about the features and at some points were shocked by the level of the tool’s sophistication. One of their pivotal points was related to the unconscious disclosure of personal information that might have an impact on one’s public image and how data processors could gain value of their information. The users saw the benefits in this awareness awakening through manipulation of informed disclosure. The benefits were also associated to the disclosed information at various levels determined by each and every user’s beliefs, cultures, economical values gained, political outreach and etc. Databait’s personality trait function showed to be beneficial in this sense since users can be sensitive to different subjects.

Photo and location leaks functionalities could draw user’s attention on various levels of disclosure both those revealed intentionality and those that are unintentional. From intentional point of view users find this helpful with respect to the values of the contents to the Social Media owners. So what made them more aware of their shared content was the ability to see the profits of their contents from the social media owner’s perspective; to see what could be gained from the contents and how those could be inferred. Even though they are aware of their shared contents, their perception of the contents’ secondary usage was limited so that social media owner’s bad intentions could hide in the user’s low institutional privacy awareness. Unintentionally revealed sensitive information interpreted by Databait could help the participants learn more about the adverse effects of their actions and seek to possible solutions e.g. deleting photo/location leak or limiting audience.

To summarize, the result from evaluation of designed concepts, showed that users are curious about the revelation of values that could be drawn from personal information and generated content. We found that users are willing to be more educated through the tool about adverse effects of their sharing habits triggered by a sense of dread that could raise their awareness. Here the idea is that the users are more intrigued when they see dangers more explicitly. This has then led the users to perceive such privacy tools to be more effective. Simplicity showed to have an impact on how the users are willing to adopt a tool as well. Most of the users agreed that the tool needs to have a ‘simple to use’ settings with self-explanatory features. Our aim in USEMP project is to take this end user’s perspective into account for the next versions of the tools.

5 Conclusions and Future Work

In this paper we have presented the results of the multi-disciplinary project USEMP in developing a value model for the use of personal data in advertising and OSNs that empowers the end user and offers more transparency to the use of personal data. The presented research describes:

  • what are the legal aspects of users’ privacy in OSNs that can be addressed by transparency tools that are based in the principle of Data Protection by Design (DPbD);

  • how a DLA model is more appropriate than that of simple prior-consent to improve trust and user control on sharing personal data in OSNs;

  • how a centralized tool developed as an independent platform is more appropriate for the business ecosystem to improve trust of the end-consumers to the advertising and marketing industry;

  • how a disclosure scoring framework can be developed to support such a transparency tool;

  • an overview of visualization methods that can be used as part of such a tool.

The ideas examined in this paper are currently implemented in the form of Databait tools and they are under evaluation from the pilot experiments organized from the USEMP project. Their impact on the end-user perception of privacy and the creation of new innovative business models that can support DPbD in OSNs and online advertising/marketing will be presented in future publications.