Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

The awareness for privacy concerns is growing in the public. With this awareness comes a call for more transparency on what, why and how software-systems collect, use, and process personal information. Hansen [1] identifies transparency as one of three privacy protection goals ensuring “that all privacy-relevant data processing including the legal, technical and organizational setting can be understood and reconstructed” [2]. Hence, it is not sufficient to increase user’s privacy awareness, it is also necessary to provide the information needed to users in order to understand how they personal data is processed. Transparency, as all software qualities, is a complex property. It leads to requirements for the representation of static information about the software’s intended purpose, but also to requirements on informing users about run-time events, e.g., malfunctions. In addition to the requirements about informing what happens, there are also requirements on how the information is shown to users to ensure that mechanisms to improve the software’s transparency have an impact on the user’s privacy-awareness. Especially concerning legal compliance, requirements engineers have to provide an as complete set of requirements as possible to ensure that the software that is built based on these requirements is compliant. I.e., the software requirements have to bridge the gap between the legal requirements and the technical mechanisms to realize them. To empower requirements engineers to identify all transparency requirements relevant for the software to be built, we have to refine the high-level privacy goal transparency into more concrete transparency requirements that assist requirements engineers in the elicitation process.

To obtain an as complete taxonomy of transparency requirements as possible, we consider different sources that requirements engineers also should consider. To be compliant with legislation requirements engineers have to consider privacy and data protection laws relevant to them, depending on the application domain of the software to be developed also standards have to be considered, to increase user acceptance, the user’s needs have to be considered. We used as sources for the creation of our taxonomy the ISO/IEC 29100:2011 standard [3] and the draft of the EU Data Protection Regulation [4]. We then considered relevant research in the field of privacy, transparency, and awareness including empirical research on user’s privacy concerns to validate the completeness of the proposed taxonomy.

The rest of the paper is structured as follows. Our privacy requirements taxonomy is derived and presented in Sect. 2 and validated using related work identified using a systematic literature review in Sect. 3. Section 4 concludes the paper.

2 Deriving and Structuring Requirements on Transparency

In Sect. 2.1, we systematically analyze the privacy principles described by ISO/IEC 29100:2011 [3] and the draft of the EU data protection regulation [4] to derive the transparency requirements they contain. To derive the requirements, we analyzed the description of the privacy principles and the formulations of the regulation. We looked for verbs like inform, notify, document, present, provide, explain, communicate and related nouns. We keep the formulation of the identified transparency requirements close to the original documents from which we identified them. In Sect. 2.1, we enumerate these derived requirements using the notation Tn. As the ISO principles and EU articles partly overlap, we identified several refinements of identified requirements. We relate those requirements using a refines relation. If a transparency requirements T\(n_1\) refines a part of another requirement T\(n_2\), this means that T\(n_1\) adds further details on how or what information has to be made transparent. The refines relation is visualized in form of an initial ontology of transparency requirements in Fig. 1. In Sect. 2.2, we structure the transparency requirements identified in Sect. 2.1 into a taxonomy of transparency requirements. This taxonomy is presented as an extensible metamodel.

ISO/IEC 29100:2011 and the draft of the EU data protection regulation do not use the same terminology. To avoid ambiguities, we will use the following term definitions from the draft of the EU data protection regulation in this paper.

  • Data subject “means an identified natural person or a natural person who can be identified, directly or indirectly, by means reasonably likely to be used by the controller or by any other natural or legal person, in particular by reference to an identification number, location data, online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that person.” This term is called PII principal in ISO/IEC 29100:2011.

  • Personal data “means any information relating to a data subject.” This term is called personally identifiable information (PII) in ISO/IEC 29100:2011.

  • Processing “means any operation or set of operations which is performed upon personal data or sets of personal data, whether or not by automated means, such as collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, erasure or destruction.”

  • Controller “means the natural or legal person, public authority, agency or any other body which alone or jointly with others determines the purposes, conditions and means of the processing of personal data; where the purposes, conditions and means of processing are determined by Union law or Member State law, the controller or the specific criteria for his nomination may be designated by Union law or by Member State law.” This term is called PII controller in ISO/IEC 29100:2011.

Fig. 1.
figure 1

Initial ontology of transparency requirements

2.1 Requirements Identification from Privacy Principles and Legislation

ISO/IEC 29100 Privacy Principles. To derive our taxonomy of transparency requirements, we first consider the international standard ISO/IEC 29100:2011 [3], which defines 11 privacy principles which are a superset of the OECD principles [5] and the US fair information practices (FIPs) [6].

We start our analysis of the privacy principles with the openness, transparency and notice principle, which is obviously concerned with transparency. From this principle, we obtain the following transparency requirements.

  1. T1

    Inform data subjects about the controller’s policies, procedures and practices with respect to the processing of personal data.

  2. T2

    The information about the management of personal data has to be clear and easily accessible for data subjects (and the public).

  3. T3

    Explain the purpose of data processing to data subjects.

  4. T4

    Specify the persons to whom the personal data might be disclosed.

  5. T5

    Provide the identity of the controller including contact information to data subjects.

  6. T6

    Provide information about the choices to limit the processing of personal data to data subjects.

  7. T7

    Provide information about the means to access, correct and remove personal data to data subjects.

  8. T8

    Provide information in the case that a decision that a data subject can make has an impact on the data subject.

  9. T9

    Document and communicate all contractual obligations that impact personal data processing externally to the extent those obligations are not confidential.

  10. T10

    Provide information about the personal data required for the specified purpose to data subjects.

  11. T11

    Provide information about how and what personal data is collected to data subjects.

  12. T12

    Provide information about how, what and to whom personal data is communicated to data subjects.

  13. T13

    Provide information about how and what personal data is stored to data subjects.

  14. T14

    Provide information about authorized natural persons who will access personal data to data subjects.

  15. T15

    Provide information about data retention and disposal requirements.

T1 and T2 are the most general requirements in our initial ontology. Hence, they form the root elements (cf. Fig. 1). T1 is considered with what information has to be presented and is refined by T3-T15 that are all also concerned with about what data subjects have to be informed. In contrast, T2 is concerned with how that information has to be presented to data subjects.

The consent and choice principle strengthens that data subjects have to give their consent on a “knowledgeable basis” and hence, they have to be informed before obtaining consent. This information has also to contain information about “the implications of granting or withholding consent”. We identify the following requirement.

  1. T16

    Before data subjects are asked to give consent to use their data, provide all information necessary to make this decision to them, including the implications of granting or withholding consent.

This requirement refines T2 in the sense that the point in time when the information has to be provided is specified. Additionally, T16 refines T8 by describing which data has to be provided to data subjects when they make the decision to give consent.

The principle purpose legitimacy and specification stresses that data subjects have to be informed about the purpose of data collection and use before it is used for the first time or for a new purpose. This information has to be presented using language “which is both clear and appropriately adapted to the circumstances.” In the case that sensitive data is processed, sufficient explanations have to be provided to the data subject. Hence, we obtain following requirements.

  1. T17

    Inform data subjects about the purpose of data collection and use before it is collected or used for the first time for this purpose.

  2. T18

    The language used for providing information to data subjects has to be clear and appropriately adapted to the circumstances.

  3. T19

    Provide sufficient explanations whenever sensitive data is used to data subjects.

Requirement T17 complements T3 with the information when data subjects have to be informed. T18 is a refinement of T2 by adding the notice that the presentation has to be adapted to the circumstances in which this information is shown. T19 places emphasis on providing explanations whenever sensitive data is used and hence refines the top-level requirement T1.

The principle collection limitation is concerned with limiting the collected personal data to the minimum needed. We obtain the following additional requirement.

  1. T20

    Provide information to data subjects about if it is optional to provide personal data.

This requirement complements T11 and T16, because it is important to inform data subjects before data collection and giving consent whether it is optional to provide the questioned personal data.

The principle accountability contains the following transparency requirements that are concerned with the occurrence of privacy breaches, which is not yet covered by other transparency requirements, because the other requirements are concerned with the normal behavior of the system under consideration.

  1. T21

    Inform data subjects and other relevant stakeholder (as required in some jurisdictions) about privacy breaches that can lead to substantial damage to data subjects as well as the measures taken for resolution.

The principle information security implies the following transparency requirement that refines the transparency requirement T1.

  1. T22

    Inform data subjects about the (security) mechanisms to protect their personal data.

Draft of the EU Data Protection Regulation. To identify further transparency requirements and to refine the already identified requirements, we analyze the draft of the EU Data Protection Regulation [4] that is currently under review and will be when accepted by all member states be mandatory to be implemented by all EU member states. In contrast to the situation in the US where no privacy regulations covering all industrial branches exist [7], the EU Data Protection will cover all industrial branches.

Article 5 (b) adds the need that the purpose has to be legitimate to requirement T3. Hence, we obtain the following refined requirement.

  1. T23

    Explain data subjects why the purpose of data collection is legitimate.

Article 12 prescribes the implementation of procedures and mechanisms for exercising the rights of data subjects and says that “If the controller refuses to take action on the request of the data subject, the controller shall inform the data subject of the reasons for the refusal and on the possibilities of lodging a complaint to the supervisory authority and seeking a judicial remedy”. Hence, we identify a transparency requirement that, similar to T21, is not concerned with the normal system behavior.

  1. T24

    If requests of data subjects for exercising their rights are rejected, then the reasons for the refusal has to be provided.

From Article 14, we can derive following transparency requirements that refine previously identified requirements.

  1. T25

    Provide the period for which the personal data will be stored to data subjects.

  2. T26

    Provide information about “the existence of the right to request from the controller access to and rectification or erasure of the personal data concerning the data subject or to object to the processing of such personal data”

  3. T27

    Provide information about data transfer “to a third country or international organisation and the level of protection afforded by that third country or international organization”.

  4. T28

    Inform the data subject about the source the personal data used originates from.

  5. T29

    Provide information to data subjects “at the time when the personal data are obtained from the data subject; or where the personal data are not collected from the data subject, at the time of the recording or within a reasonable period after the collection, having regard to the specific circumstances in which the data are collected or otherwise processed, or, if a disclosure to another recipient is envisaged, and at the latest when the data are first disclosed.”

T25 refines T13 by adding the need for specifying the duration of data storage. T26 adds a legal need to T7. T27 refines T12 by requiring special treatment when data is transferred to third countries or international organizations. T28 refines T11 by adding the need to provide information of the source of the personal data used. T29 refines T2 with information about when to provide information to data subjects.

Article 31 is concerned with the notification of personal data breaches and refines T21 by adding a duration after which the supervisory authorities have to be informed.

  1. T30

    Notify supervisory authorities (and data subjects) about the occurrence of a personal data breach not later than 24 hours after having become aware of it.

2.2 Setting up a Transparency Requirements Taxonomy

In this section, we structure the identified preliminary transparency requirements into a transparency requirements taxonomy. Figure 2 shows our taxonomy in the form of a metamodel using a UML class diagram. We structured the transparency requirements into a hierarchy, which is derived from the initial ontology shown in Fig. 1. We describe our taxonomy in the following from the top to the bottom. An overview of the mapping between the transparency requirements taxonomy to the initial transparency requirements is given in Table 1.

Fig. 2.
figure 2

Our proposed taxonomy of transparency requirements.

Table 1. Mapping of transparency requirements to preliminary requirements

Transparency Requirement. The top-level element of our hierarchy is the general TransparencyRequirement which corresponds to the initial requirement T1. In our metamodel we declared this requirement as abstract, i.e., it is not possible to instantiate it, only its specializations can be instantiated. It has six attributes. First, the dataSubject who has to be informed. Second, a set of counterstakeholders who are involved in the processing of the data subject’s data and the data subject has to be informed about them. For example, T4 and T14 prescribe to specify the (authorized) persons to whom personal data might be disclosed. This is the case for many requirements in our taxonomy and hence, we put this attribute to the top-level requirement. If there is no need to specify persons who are somehow involved in the data processing, the attribute counterstakeholder is left empty. Our taxonomy suggests to consider data subjects and counterstakeholders as persons. The data subject should be a natural person, whereas the counterstakeholders can be natural, legal, or artificial persons, e.g., organizations or authorities. Third, the set of personal data of the data subject for which the transparency requirement is relevant. Almost all transparency requirements that we identified previously refer to the data subject and his/her personal data. Hence, all transparency requirements in our taxonomy have the data subject and his/her personal data as attribute. Fourth, we document whether the specified personal data represents senstiveData, because of T19 sensitive data needs special consideration. Fifth, the attribute linkability documents whether the personal data is linkable to a single data subject, a group of possible data subjects, or is anonymous. This attribute is not explicitly motivated from the requirements, but T16 mentions that in the case of giving consent all information necessary to make this decision has to be provided to data subject and we think that the linkability of the personal data to the data subject is such an information. Sixth, in accordance with T5 the data subject has to be informed about who the controller is.

Presentation Requirement. The initial transparency requirements T2, T16, T18, T29, and T30 are in contrast to the other requirements not mainly concerned with what information shall be provided to the data subject, but with how this information has to be presented. To decouple the how from the what in our taxonomy, we introduce PresentationRequirements. Every TransparencyRequirement has exactly one PresentationRequirement assigned, which describes how the information has to be provided to the data subject. On the other side, the same PresentationRequirement can be related to multiple TransparencyRequirements. The attribute time reflects T16, T29, and T30 that prescribe the time when information has to be provided. The possible values for this attribute are summarized in the enumeration PresentationTime (cf. Fig. 2). We derived these values from T16, T29 and T30. Nevertheless, we do not consider this enumeration, such as all other enumerations presented in our taxonomy, as complete and whenever necessary they can be extended. The attribute languages is not explicitly mentioned in a transparency requirement, but to provide information clearly and adapted to the circumstances to data subjects (in accordance with T18) one should present this information using at best the first language of each possible data subject. The attribute accessibility serves to document the requirements on how data subject shall be able to access the information, indicated by T2. An information may has to be publiclyAvailable, onRequest of the data subject, or the information is forwarded to the user when needed.

ExceptionalInformationRequirement. Most transparency requirements are concerned with providing information about the normal behavior of the considered system. This information can be considered as rather static. In contrast, T21, T24, and T30 require to inform data subjects in cases where unexpected events occur. For this purpose, we refine the general TransparencyRequirement into the requirement ExceptionalInformationRequirement. The attribute case stores the kind of unintended event the data subject has to be informed about. This can be a dataBreach as mentioned in T21 and T30, a systemChange that e.g., changes the purpose of data processing (cf. T17), or a rejectedRequest of a data subject as described in T24. In addition to the data subject that has to be informed, T21 also states that authorities may have to be informed. The attribute authorities is used to document the natural, legal, or artificial persons that have to be informed if the respective exceptional case occurs.

ProcessingInformationRequirement. The requirement ProcessingInformationRequirement refines TransparencyRequirements and contains the properties that all static transparency requirements, which refine the initial requirement T1 (cf. Fig. 1), have in common. The attribute controlOptions summarizes (using the data type ControlOption) the options the data subject has to limit the processing of personal data (T6), means to access, correct and remove personal data (T7 and T26), and the consequences implied by these options (T8). T3, T17, and T23 require that the purpose for data processing is explained to data subjects. The attribute purpose is used to provide a set of Statements that could consist of functional requirements and knowledge about the software environment for which’s fulfillment the personal data of the data subject is needed. Furthermore, the attribute reason is used to provides information about why the personal data is needed for the purpose and why it is legitimate to use it. Due to T10 and T20, data subjects have to be clearly informed whether the provision of personal data is optional and whether the information is needed for the specified purpose. The attribute mandatory is used to capture this information. The attribute security is used to represent how the personal data is protected as required by T22. Possible protection mechanisms are e.g., encryption and accessControl.

CollectionInformationRequirement. Requirement T11 prescribes that data subjects have to be informed about how and what data is collected from them. For this purpose, we refined the ProcessingInformationRequirement into the CollectionInformationRequirement. In addition to the information that is already inherited from TransparencyRequirement and ProcessingInformationRequirement, we derived from T28, which is a refinement of T11 (cf. Fig. 1), the attribute method that reflects whether the data collection is direct, indirect, or whether existing data of the subjects is reused.

FlowInformationRequirement. Requirement T12 implies a further refinement of ProcessingInformationRequirement that we call FlowInformationRequirement. This requirement prescribes to inform data subjects about the flow of their data. From T9 and T27, we derived that for each information flow, it is important to inform the data subject about the contractual obligations and policies the data receiver is bound to. This information is represented in the attribute contract. Furthermore, T27 puts an emphasis on taking care of data transfer to third countries and international organizations. Hence, we added the attribute countries to capture the geographical destination of the data flow.

StorageInformationRequirement. From T13, we derive the requirement StorageInformationRequirement that is also a refinement of ProcessingInformationRequirement. This requirement is used to represent the information that is needed to inform the data subject about the storage of his/her personal data. In addition to the attributes inherited from TransparencyRequirement and ProcessingInformationRequirement, T15 and T25 require that the data subject is informed about the duration of storage and the data retention and disposal requirements. To reflect this information, we use the attribute retention. The possible values of this attribute can indicate that personal data is stored for an unlimited time, as long as it is needed for the purpose it was collected for (forAction), or until it is deleted (untilDeleted) after there is no reason to keep the data anymore, but not directly.

The complete taxonomy is shown in Fig. 2. Note that the taxonomy is easily extensible by further refinements of requirements, adding further attributes and relations, and adapting the suggested enumerations to the needs implied by the application domain and relevant legislation of the software to be developed. Table 1 provides an overview of how the initial requirements Tn that we derived from ISO 29100 and the draft of the EU Data Protection Regulation are reflected by the proposed taxonomy.

3 Validation of the Taxonomy Using Related Literature

In this section, we give an overview of existing research that also contains considerations about the privacy goal of transparency. To validate our proposed taxonomy, we map the notions and concepts used in the related literature to our taxonomy to check whether it is suitable to reflect the shapes of transparency used in the literature.

Table 2. Mapping of transparency notions from the literature to our proposed taxonomy

To identify the relevant related work, we performed a systematic literature review using backward snowballing [46]. To obtain the starting set of papers for our review, we manually searched the proceedings and issues of the last 10 years of computer science conferences and journals that are mainly concerned with at least one of the topics privacy, requirements, and software engineering and ranked at least as B-level in the CORE2014Footnote 1 ranking. First, we checked whether title or abstract of a paper indicated that the paper is concerned with privacy (requirements), transparency, or awareness. If this was the case, we analyzed the full text of the paper. Due to the manual search process, we have to deal with the threat of validity that our starting set of papers does not contain all relevant literature, because it was published in a source that we do not consider or was published earlier than in the last 10 years, To mitigate this threat, we applied backward snowballing. I.e., we also considered the papers referenced in the papers that we identified as relevant until no new candidates were found. In total, we identified 403 papers that seemed to be relevant after reading title and abstract. After the analysis of the full text, we finally identified 39 papers as related work.

Due to space limitations, we cannot present all details of the literature review in this paper. The details can be found in a technical reportFootnote 2. The list of considered conferences and journals can also be found in this technical report. We were able to map each explicitly mentioned transparency related concept in the literature to an element of our taxonomy. This mapping is provided in Table 2. We categorized the identified literature into the four categories Privacy (Requirements) Engineering, Empirical Research on Privacy Awareness, Privacy from the Legal Perspective, and Privacy Policies and Obligations.

From Table 2, we can see that almost all papers in the category Privacy (Requirements) Engineering have considered what information has to be provided to data subjects, but only the halve of these papers mentioned that it is important how this information is provided. Only three contained aspects related to notification of data subjects in exceptional cases, e.g., data breaches. Note that none of the papers in this category covered all elements of our taxonomy. The papers in the category Empirical Research on Privacy Awareness mainly investigate the users’ awareness of data processing. The papers did not give recommendations on how data subjects shall be informed about exceptional cases. In the category Privacy from the Legal Perspective, we have papers that consider single laws or aspects that can be reflected by single elements of our taxonomy, and papers that consider a larger legal framework or privacy impact assessments and hence, cover (almost) all elements of our taxonomy. The papers in the category Privacy Policies and Obligations provide the most structured, detailed, and complete concepts related to transparency requirements. Nevertheless, we did not find any literature that provides an as structured, detailed, and complete overview of transparency requirements as our proposed taxonomy shown in Fig. 2.

4 Conclusions

In this paper, (1) we systematically derived requirements for the privacy goal transparency from the ISO/IEC 29100:2011 standard [3] and the draft of the EU Data Protection Regulation [4]. These two documents belong to the most relevant sources for privacy requirements that have to be considered by software developers. (2) We then structured these requirements in a metamodel for transparency requirements. This metamodel provides an overview of the identified kinds of transparency requirements and shall help requirements engineers to identify and document the transparency requirements relevant for them and the information needed to address the transparency requirements. (3) We performed a systematic literature review and provide an overview of the relevant research related to transparency requirements. (4) We validated that our taxonomy contains all necessary aspects mentioned in the identified literature. The literature review showed that all aspects of the privacy goal transparency mentioned in the literature are reflected in the proposed taxonomy. Furthermore, we did not find any literature that presents transparency requirements in an as structured, detailed, and complete manner. Our proposed metamodel of the taxonomy can easily be adopted and extended.

As future work, we plan to develop a systematic process that assists requirements engineers to identify the relevant transparency requirements based on a given set of functional requirements. Furthermore, we will develop a tool to generate human-readable representations of the instantiated transparency requirements of our proposed metamodel based on text templates.