1 Introduction

A Data Protection Impact Assessment (DPIA) method aims to identify the main risks of a project with respect to the rights of data subjects concerning their personal data. It is a systematic process to elicit threats to the privacy of individuals, identify the procedures and practices in place to mitigate these threats, and document how the risks were addressed in order to minimise harm to data subjects [12, 22]. DPIAs have been recognised as a key topic for data protection governance in Europe, as they will become mandatory under the ongoing reform of the data protection legal framework, in the form of the proposed General Data Protection Regulation (GDPR) [13]. The version adopted at the European Parliament’s first reading also incorporates the concept of risk into the DPIA mechanism (cf. Article 32a), by mandating data controllers to carry out a DPIA in those cases likely to present specific risks to the rights and freedoms of data subjects. The concept of risk is thus embedded in the DPIA process as a pre-assessment stage: a risk analysis can serve as an awareness step that determines whether a DPIA needs to be carried out. Note that in the context of the present analysis the terms DPIA and Privacy Impact Assessment (PIA) are used interchangeably [15].

A DPIA seems to perform a dual function. On the one hand, it can serve as an accountability mechanism, especially where data breaches or losses occur, in that it allows organisations acting as data controllers or data processors to demonstrate their awareness of the risks concerning privacy and data protection and their commitment to ensuring an effective level of protection of personal data [45]. On the other hand, it can help safeguard privacy and data protection rights [35] in the case of potentially privacy-intrusive projects and services, because it requires the controller to systematically consider the intended data processing, the associated privacy risks and the measures to be taken to mitigate these risks from the very outset of its activities [45]. Accountable organisations should embrace DPIAs as part of their overall risk management practices. Unfortunately, today there is a lack of tool support for organisations to perform DPIAs of cloud services.

In this paper, we present the design of a Data Protection Impact Assessment Tool (DPIAT) developed as part of the EU-funded Cloud Accountability (A4Cloud) project. The tool draws on a number of information sources from which cloud-specific risks and existing countermeasures can be collected and evaluated, in order to support impact assessments for projects that consider processing personal data in the cloud. We also propose DPIA questionnaires updated with respect to existing standards and recommendations, building on expertise from several disciplines, from legal research to information security, risk management and user experience design.

The remainder of the paper is organised as follows: we discuss related work in Sect. 2. We describe the rationale and approach used to construct the proposed DPIA, based on legal and socio-economic considerations, in Sect. 3. Our approach consists of three steps: (1) conduct a pre-assessment to determine the need for a fully-fledged DPIA (see Sect. 3.2); (2) conduct the full DPIA if warranted by the previous step (see Sects. 3.3 and 3.4); and (3) perform a risk-based comparison of potential cloud service providers (CSPs) (see Sect. 4). The DPIA takes the form of a dynamic questionnaire, which collects information from the user about the project under evaluation and the organisation’s practices. The risk evaluation of potential cloud solutions takes into account information collected in the DPIA and the implementation status of security controls at each CSP. Section 5 presents the design of the DPIA tool, its dynamic questionnaire, and its automation of steps 1–3 above. The tool produces a report containing several privacy indicators and risks based on the completed questionnaires and the selected CSP.

2 Related Work

Privacy impact assessments are already being rolled out as part of a process to encourage privacy by design [22]: in November 2007 the UK Data Protection Authority, the Information Commissioner’s Office (ICO), launched a PIA process (incorporating privacy by design) to help organisations assess the impact of their operations on personal privacy. This process assesses the privacy requirements of new and existing systems; it is primarily intended for use in public sector risk management, but is increasingly seen to be of value to private sector businesses that process personal data. Similar methodologies exist, and can have legal status, in Australia, Canada and the US [39]. The methodology aims to counter the slow uptake of designing in privacy protections from first principles at the enterprise level. Usage is increasingly being encouraged, and even mandated in certain circumstances, by regulators, as considered further in the following section.

The role of a risk-based approach in data protection has been considered by a number of parties, including assessments of the relative value of such an approach [4], modifications of the original OECD data protection principles to take it into account [29], analyses of its relationship with accountability [18], and recent regulatory analysis [1, 7].

In terms of automation within the privacy impact assessment process, there are a few systems that have attempted this in various contexts, which we shall consider further below.

In Canada, the Treasury Board Secretariat provided in 2003 an e-learning tool for government employees interested in learning more about privacy and PIAs and how to complete them [30]. Furthermore, a new self-assessment tool, aimed at Small and Medium Enterprises (SMEs), was launched in Canada in May 2011. It was developed jointly by the federal, Alberta and British Columbia privacy commissioners and is a detailed online questionnaire that helps organisations gauge how well they are protecting personal information and meeting compliance standards under Canada’s private-sector privacy law at both federal and provincial levels.

The US Department of Homeland Security (DHS) employs a PIA tool called the Privacy Threshold Analysis, which helps users determine whether a PIA is required under the E-Government Act of 2002 and the Homeland Security Act of 2002 [42]. In the UK, the PIA Guidelines provide a number of screening questions to help users decide whether a Full-Scale PIA or a Small-Scale PIA is warranted. The Guidelines also include a number of questions for a privacy law compliance check and a Data Protection Act (1998) compliance check. Templates are also included within the Guidelines for Data Protection compliance and for the Privacy and Electronic Communications Regulations (PECR) [22].

Most of these PIA tools are based upon a simple “decision-tree” approach, are mainly procedure-based with coarse granularity, and are offered as Web applications that do not take into account the cloud or any of its characteristics. The following PIA automation systems are worthy of particular mention:

  • A prototype decision support tool developed by the PRAIS project [20]. This tool enables personnel working with personal information to assess the privacy implications of information sharing actions dynamically and to share information and manage users’ consent and other participant needs.

  • HP Privacy Advisor (HP PA). It assesses risk and the degree of compliance for projects that handle personal data and guides employees in their decisions on how to handle different types of data. HP PA uses a rule-based system to capture global privacy knowledge that is too complex to be easily captured via decision trees, and dynamically presents only the questions relevant to eliciting privacy-relevant information about a project [31–33].

  • A privacy impact assessment tool prototype based upon ICO guidelines related to the UK Data Protection Act, which allows appropriate stakeholder views and input, and which uses confidences within the knowledge representation to allow assessment of the value of the input as well as customisation of risk indicator values [38].

  • AvePoint Privacy Impact Assessment System [3] and TRUSTe Assessment Manager [41] help to automate the impact assessment workflow and to track the tasks involved in answering questions across multiple organisational roles. However, they do not focus on cloud services, which intrinsically involve third parties and data transfers.

Decision support systems for PIAs in cloud computing are a new field and there are few systems available, although there is some work targeting clinical decision applications and life science enterprise solutions [5]. Prior work includes tools for cloud assessment: the Microsoft Security Assessment Tool, designed to help find weaknesses in an IT security environment; privacy impact assessment of cloud environments [40]; and decision support tools for cloud service provisioning [34]. In addition, several bodies propose cloud security guidance: the European Network and Information Security Agency (ENISA) [16], the National Institute of Standards and Technology (NIST) [27], the ICO [24], the Commission Nationale de l’Informatique et des Libertés (CNIL) [11], and the CSA Governance, Risk and Compliance (GRC) stack [10].

In the next sections we explain how our DPIAT builds on the body of knowledge and recommended practices mentioned above, adjusting the DPIA process and questionnaire to make it informative, user-centric and concise. It differs from previous work by focusing on the profile of SMEs wishing to move to the cloud. Additionally, our approach for assessing cloud risks is founded on information disclosed voluntarily by CSPs in the CSA Security, Trust & Assurance Registry (STAR).

3 Multidisciplinary Approach to DPIAs

The proposed GDPR provides for a series of accountability measures that aim to strengthen the protection of personal data. DPIAs fall under the scope of those measures, aiming at mitigating risks resulting from certain processing operations. In practice, a DPIA screening consists of a set of questions allowing multiple-choice or free-text answers, which help to assess the risks to personal data involved in the intended processing. Taking this into account, as well as the various examples of existing PIAs, this section proposes a DPIA questionnaire tailored to the particular data protection risks associated with cloud computing services.

The DPIA tool incorporates two questionnaires. The first questionnaire (see Table 1) is a pre-screening (risk) assessment, which must be carried out to establish whether a full-scale DPIA is mandatory. The main questionnaire (see Table 2) is an extensive set of questions that comprises the full-scale DPIA [19]. The user of the DPIA tool will probably not be an expert in privacy and data protection. Therefore, the questions are formulated in a form and language understandable to lay users, to help them provide the right information [36]. We have targeted the DPIA tool at SMEs, which typically lack in-house data protection experts and the resources to hire them. The tool should thus guide the user through the process as much as possible and provide meaningful feedback that helps the user improve the privacy characteristics of their project and facilitates legal compliance with data protection regulation.

Table 1. Data Protection Impact Assessment Pre-Screening Questions
Table 2. DPIA Screening Questionnaire

3.1 Methodology

Cloud computing has several characteristics [25] that may adversely impact the privacy of personal data, including its distributed nature, multitenancy, third-party hosting and potentially long supply chains. A cloud can be spread across multiple jurisdictions with different degrees of data protection, with no transparency about this [16]. Multitenancy leads to risks of isolation failure and insecure data deletion, which can compromise personal data. Third-party hosting can cause the cloud consumer to lose control over personal data, especially when CSPs are not transparent about the data processing performed, the data protection measures used and the data security breaches that have occurred [16]. This becomes even more apparent in the case of complex supply chains formed from different CSPs. When developing the DPIA questionnaire (see Sect. 3.3) and the cloud adoption risk assessment model (see Sect. 4) we considered these cloud characteristics and their impact on data protection.

Given that the current data protection framework within the EU is under review and that the proposed GDPR was still under extensive negotiation at the time of writing, we had to decide whether the questionnaires should take into account the new DPIA framework proposed within the GDPR. Following discussions within the A4Cloud consortium, all partners agreed that the DPIA tool should be as future proof as possible; we therefore took into account both the Data Protection Directive (DPD) [14], as it is still the main legal instrument within the EU, and the drafts of the upcoming GDPR, rather than focusing exclusively on the legislation currently in force. Our aim was to develop a tool that could be used effectively under both regimes.

The DPD provided us with the basic concepts and principles defining the current general data protection framework, while the GDPR provided additional concepts and concrete procedural guidelines for a practical DPIA questionnaire. In particular, the principles relating to the processing of personal data, such as purpose limitation and data minimisation, derive from the DPD. Articles 6 and 7 of the current DPD, which deal with the legitimacy of data processing, provided grounds for an extensive set of questions aimed at mapping the user’s intention to the legal terms incorporated in the DPD. Furthermore, the ICO’s Code of Practice [23], in conjunction with the PIA Guide of the Office of the Australian Information Commissioner (OAIC) [2], also proved to be a useful tool in phrasing particular questions. The ICO’s PIA Handbook [22] constituted the key inspirational instrument in drafting the questions related to the grounds of processing.

The GDPR (in the form of the European Parliament’s first reading) was used as the starting point for both questionnaires. Its Articles 32a and 33 provide the conditions under which a DPIA would be mandatory.

The analysis of the DPD, the GDPR, and various DPIA and PIA models [8, 43–45] is reflected in the construction of the questionnaire’s framework: the legal norms and the PIA/DPIA models utilised allowed us to develop the “Question” field (for the related “Explanation” field see Tables 1 and 2), while the sources on risks in cloud environments [9, 10, 16] were used to give a logical structure to the questionnaire and to weight the answers provided by users. The “Answer” fields were developed to steer the user through the questionnaire in a logical order, formulated mainly through examination of the DPD and the GDPR, while assessing the impact and the likelihood of an unwarranted event happening.

Many PIAs work on the assumption that the user is aware of certain basic data protection notions, such as ‘personal data’, and directly ask the user whether they process personal data, for which purposes, on what grounds, and so forth. Our DPIA starts from the premise that the user does not know these concepts, and it therefore tries, within limits, to derive a legal qualification from the user’s responses to questions phrased in simple terms. Based on the kind of information the user intends to process, the tool will ‘decide’ that it constitutes personal data, rather than having the user specify so in advance. The tool does provide feedback incorporating the proper legal terminology where applicable.

The risk assessment, which provides the basis for probing the user about mitigation measures, is based on a series of documents (see Sect. 3.5 below) regarding the most commonly occurring incidents in cloud ecosystems; from a data protection viewpoint, these incidents provided valuable insights into the cloud’s potential threats to informational self-determination, their likelihood and their foreseeable impact. We conceived risk as the product of the interplay between the likelihood of an event and the impact that event would have. We based the construction of the questionnaire on that conception: we used literature and reports to investigate, on the one hand, the most harmful privacy-related incidents and, on the other, the most likely ones, in order to develop a better understanding of what to ask when assessing the impact of an undertaking’s activities on data subjects’ privacy and data protection rights. Since the questionnaire aims to assess, grosso modo, how and how much a cloud user’s undertaking deviates or could deviate from the conduct dictated by data protection norms (as embodied currently in the DPD and, for the future, in the GDPR), and the impact of its activities on data subjects, it seemed proper to consider, amongst other prominent factors, the most likely and/or the most harmful incidents in cloud environments.

Based on these considerations, we formulated questions embracing the notions of risk and likelihood in a manner intelligible to the tool user. For instance, the question “How severe do you deem the consequences, in case you process outdated information, for the individuals it refers to?” is a clear example of the tool’s underlying perception of the notion of impact, while a question such as “For how long do you store the information you are dealing with?” captures the related perception of the notion of likelihood. The situations that are most likely to threaten individuals in the cloud, or that would harm individuals the most if they occurred, provided a useful list of the risks to be incorporated in the tool. Determining their impact and likelihood turned out not to be straightforward, though. Due to the lack of available and sufficiently targeted metrics, the likelihood parameter was inferred through the review of several documents issued by public bodies tasked with safeguarding the rights to privacy and data protection or dealing with information security, for instance [11, 12, 17, 26] among others. The impact parameter, on the other hand, is historically hard to define in relation to the notion of privacy and, albeit to a lesser extent, that of data protection: as prominent doctrinal sources have noted, they appear “to be about everything, and therefore […] to be nothing” [37]. Moreover, harms deriving from privacy and data protection violations are hardly quantifiable, in that they are inherently linked to other rights whose infringement causes the starkest impact on data subjects [37], forming “a cluster of related activities that impinge upon people in related ways”. Hence, an ontological definition of the impact deriving from a data privacy violation appears hardly feasible in the tool’s context, aside, of course, from what can be directly inferred from the relevant regulations. We have therefore made reasoned assumptions about potential impacts.

It is important to stress here that this process could not capture the whole of the relevant law, which is far too complex, lengthy and granular to be represented in the tool. Qualitative decisions had to be made about which legal norms should be included, and at what level of detail. In addition, framing the questions and devising explanations of their meaning lost further detail and richness of meaning. The version of the legal norms embodied in the tool is thus only a partial summary of the law’s requirements in this area, shaped to the needs of the tool. This means that the tool cannot be relied on to identify all potentially applicable legal obligations, and that its risk assessment outputs are by definition not fully comprehensive.

Despite the existence of several PIA/DPIA models dealing with traditional cases of processing, there is hardly a sufficient number of cloud-tailored DPIA models, especially considering the growing importance and pervasiveness of the cloud computing model in the market and the differences between traditional IT environments and the cloud. ENISA’s recommendations [16] nevertheless constituted a helpful methodological tool for identifying and evaluating risks to data protection rights. Also, ENISA’s framework for Cloud Security Incident Reporting [17] formed the key element for the development of the evaluation scheme we propose. Several other scholarly publications [26] were consulted for targeted guidance on particular topics in order to articulate cloud-relevant questions.

3.2 The Pre-assessment Stage

The pre-assessment stage includes a set of seven questions, presented in full in Table 1. It aims to identify whether the processing operations to be undertaken can be perceived as potentially risky to the protection of individuals’ personal data, and to trigger the full-scale DPIA when this is the case. It first assesses whether the information the user deals with constitutes personal data at all, and then evaluates the kind of information processed, its sensitivity, the purposes of the processing, the actors involved and the extent to which the information is likely to be diffused. Our purpose was mainly to provide the user with a very short and incisive quick scan for the presence or absence of some general factors that indicate the use of personal information, e.g. whether the information dealt with by the tool’s user qualifies as personal data at all, or whether it includes sensitive data.
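
To illustrate, here is a minimal sketch of how such a pre-screening step can be automated. The question texts and the trigger rule (any flagged risk factor triggers the full DPIA) are illustrative assumptions, not the exact wording or logic of Table 1:

```python
# Illustrative pre-screening questions; placeholders, not the exact
# wording of Table 1.
PRE_SCREENING = [
    "Does the project process information relating to identifiable individuals?",
    "Does it involve sensitive data (e.g. health, beliefs, ethnicity)?",
    "Will the data be shared with or disclosed to third parties?",
    # ... remaining pre-screening questions ...
]

def full_dpia_required(answers: list[bool]) -> bool:
    """Assumed trigger rule: any risk factor present -> full-scale DPIA."""
    return any(answers)

if __name__ == "__main__":
    print(full_dpia_required([False, True, False]))  # -> True
```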

3.3 The Assessment Stage

The full-scale DPIA, which follows conditionally on the pre-assessment, includes 50 questions (see Table 2 for an excerpt and [19] for the full version, including an explanation of the implications of each answer option). The questions are grouped into five topical areas (the key inspirational document enabling this taxonomy was [28]): (1) the type of project, (2) the collection and use of data, (3) the project’s storage and security policies, (4) data transfers, and (5) cloud-specific issues. The aim of this set of questions is to assess how the interactions between the subjects performing the DPIA and CSPs affect data subjects’ rights to privacy and data protection.

Each question has several suggested answers (single-selection or multiple-choice); open questions, which are hard to process automatically, are avoided. While answering some questions the user can get guidance from the DPIAT (see Sect. 5) on how to address the privacy issues related to specific answers. In particular, questions 35 and 39 cover, respectively, privacy controls and security controls supporting data protection; this helps the user document existing controls and understand which others could be implemented.

3.4 Evaluation of the Results

Each question has a formula for computing its privacy impact score based on the answer given, and a weight prioritising its importance relative to other questions. For example, Question 4 in Table 2, “Are you relying exclusively on consent in order to process information of individuals?”, has the following possible answers:

  (a) Consent is given directly by the individual by a statement (e.g. by a consent form)

  (b) Consent is given directly by the individual by an affirmative action (e.g. by ticking a box)

  (c) Consent has been obtained implicitly from the individual (e.g. by mere use of the service or inactivity)

We assign the privacy impact score for the answer to this question using the following rule: if option ‘a’ is selected the score is 0; if option ‘b’, the score is 1/4; if option ‘c’, the score is 3/4.

Intuitively, the option ‘c’ would have a bigger impact on privacy than option ‘b’ and ‘a’ so the score is chosen to be proportional to the perceived impact. We compute the final privacy impact score (FI) taking into account the answers to all the questions:

$$ FI = \frac{\sum_{i=1}^{N} s_{i} \alpha_{i}}{\sum_{i=1}^{N} \alpha_{i}} $$
(1)

Here N is the number of questions in the DPIA questionnaire, \( s_{i} \) is the score for the answer to question i, and \( \alpha_{i} = 1 \) if question i is answered and \( \alpha_{i} = 0 \) otherwise.
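
For concreteness, a minimal sketch of Eq. (1) in Python, using the per-answer scores described above; the list-based data layout is our illustrative choice, not the tool’s actual implementation:

```python
from typing import Optional

def final_impact_score(scores: list[Optional[float]]) -> float:
    """Eq. (1): mean of per-answer scores s_i over answered questions;
    scores[i] is None when question i is unanswered (alpha_i = 0)."""
    answered = [s for s in scores if s is not None]
    if not answered:
        raise ValueError("no questions answered")
    return sum(answered) / len(answered)

# Example: Question 4 answered with option 'c' (score 3/4), two more
# questions answered with scores 0 and 1/4, one question unanswered.
print(final_impact_score([0.75, 0.0, 0.25, None]))  # ~0.33
```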

In addition, we associate the questions with several privacy indicators, capturing different privacy aspects: data sensitivity, compliance, trans-border data flow, transparency, data control, security, and data sharing. For example, the answer to the question above influences the data control and transparency indicators. Some of the indicators enhance privacy (compliance, transparency, data control and security), while the others diminish it (data sensitivity, trans-border data flow and data sharing). Therefore, the privacy indicator scores are either proportional to the privacy impact scores of the individual answers or inversely proportional to them. So, in the example above, a higher score for the answer (option ‘c’) implies less data control and transparency.

We compute the final privacy indicator score \( FI_{j} \) for each indicator j:

$$ FI_{j} = \frac{\sum_{i=1}^{N} s'_{ij} \alpha_{i} \beta_{ij}}{\sum_{i=1}^{N} \alpha_{i} \beta_{ij}} $$
(2)

Here \( s'_{ij} = s_{i} \) if indicator j negatively affects privacy and \( s'_{ij} = 1 - s_{i} \) otherwise; \( \beta_{ij} = 1 \) if the answer to question i impacts indicator j and \( \beta_{ij} = 0 \) otherwise. The ratio \( \sum_{i=1}^{N} \alpha_{i} / N \) represents the coverage of the questionnaire and indicates the reliability of the indicators.
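
A sketch of Eq. (2) and the coverage ratio under the same illustrative data layout; the indicator names and the question-to-indicator mapping below are examples, not the tool’s full mapping:

```python
from typing import Optional

# Indicators that enhance privacy get inverted scores (s' = 1 - s).
ENHANCING = {"compliance", "transparency", "data_control", "security"}

def indicator_score(scores: list[Optional[float]],
                    impacts: dict[str, set[int]],
                    indicator: str) -> Optional[float]:
    """Eq. (2): mean of (possibly inverted) scores over answered
    questions that impact the given indicator (beta_ij = 1)."""
    relevant = [s for i, s in enumerate(scores)
                if s is not None and i in impacts[indicator]]
    if not relevant:
        return None  # indicator not covered by any answered question
    if indicator in ENHANCING:
        relevant = [1.0 - s for s in relevant]
    return sum(relevant) / len(relevant)

def coverage(scores: list[Optional[float]]) -> float:
    """Fraction of questions answered; signals indicator reliability."""
    return sum(s is not None for s in scores) / len(scores)

# Example: question 0 (Question 4 above) impacts data control.
impacts = {"data_control": {0}, "transparency": {0, 2}}
scores = [0.75, 0.0, 0.25, None]
print(indicator_score(scores, impacts, "data_control"))  # 1 - 0.75 = 0.25
print(coverage(scores))                                  # 0.75
```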

Finally, we define the overall privacy impact level and the privacy indicator levels for the assessment by translating FI and \( FI_{j} \), respectively, to a uniform qualitative scale, Low < Medium < High, and use colour coding to facilitate the presentation: Low → Green, Medium → Yellow and High → Red.
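
A sketch of this translation follows; the numeric cut-off points are illustrative assumptions, since the paper does not specify them:

```python
def impact_level(score: float) -> str:
    """Map a [0, 1] score to a qualitative level; the 1/3 and 2/3
    cut-offs are illustrative assumptions, not the tool's values."""
    if score < 1 / 3:
        return "Low"     # rendered Green
    if score < 2 / 3:
        return "Medium"  # rendered Yellow
    return "High"        # rendered Red
```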

In order to provide users with actionable guidelines, the DPIAT final report contains an additional section that delivers textual guidance generated according to the user’s answers. Far from constituting legal advice, as the tool specifically disclaims, this section can still make the tool’s user focus on specific privacy and data protection issues they might have overlooked. For instance, when a user indicates that data protection is not considered from the outset of the assessed project’s development, the section highlights the importance of the concepts of Data Protection by Design and Data Protection by Default.

3.5 Discussion

Under the GDPR, as amended by the outcome of the European Parliament’s first reading, there is a trend towards making DPIAs compulsory when the processing operations of controllers are likely to present specific risks to the rights and freedoms of data subjects (Article 32a, “Respect to risk”, of the Parliament’s text). This approach confirms the importance of DPIAs in protecting data subjects’ rights and freedoms; for us, it meant embedding in the DPIA process the risk analysis introduced by Article 32a of the European Parliament’s amended text.

As to the first area of questions, relating to the type of project undertaken by the tool’s user, our aim was to capture both the kind of activity performed by the CSP’s client and the aim of that activity. We considered the fact that a controller could handle personal data (for instance, obtaining information such as users’ names and e-mail addresses through online subscription forms) for a number of different reasons and aims (e.g. commercial purposes). Therefore, we decided to include two separate inquiries: one regarding the activities through which data is processed, and another regarding the purpose of the processing.

The second area of questions regards the collection of the information, the usage that processors make of that information and the means by which personal data is handled. This section draws heavily on the basic principles of both the DPD and the GDPR. For instance, it attempts to discover whether there are solid, legitimate grounds for processing, to identify the main risks of non-compliance with the data protection principles, and to assess the tool user’s plans for compliance with the rights of the data subject sanctioned by law.

Storage and security (deletion included), moreover, is considered a third area that deserves specific consideration, especially in relation to the traits of cloud computing.

The investigation we propose was developed according to an “individual-centric approach”, which tries to deepen the level of protection accorded to data subjects irrespective of who (either CSPs or their customers) exerts concrete control over the particular aspect considered. That is to say, we considered it more useful to ask SME users (and individuals using the tool) questions pertaining to the CSPs’ areas of control, accepting the chance that they might not know the answer, in which case the user simply refrains from answering. Leaving questions open yields a less accurate assessment, but still provides guidance. Users can also return to the questionnaire after obtaining from others the answers to questions they could not answer, providing a more complete picture. The tool is thus not a one-way street, but can be used iteratively.

A major concern we had related to the “updatedness” of the information dealt with by the tool user. The questionnaire includes two questions regarding the foreseen negative consequences of outdated information processed by the tool user’s undertaking; specifically, it addresses the consequences of outdated information about individuals and how such outdated information can lead to regulatory liability. Whether or not outdated information may result in civil or criminal liability, however, is outside the scope of the DPIA. An individual-centric approach has also been adopted for the fourth set of questions, which relates to the transfer of information. This is because transfers of information are controlled by the law, which attempts to limit the risks to data subjects by prescribing conditions for data transfer. Furthermore, given the target audience of the DPIA tool, this class of inquiries caters for the possibility that the tool’s user does not possess an adequate level of knowledge to answer all questions. Much as with the third set of questions, we considered the absence of an answer acceptable.

The final set of questions refers exclusively to cloud computing services. Given the complexities of cloud computing technology, it was a challenge to formulate these questions in language understandable to an ordinary user. Each deployment model has various ramifications which are not necessarily known in advance to the user of the DPIA tool, who must decide whether or not to opt for a particular cloud computing service.

It is important for the users of a cloud service to know how to secure the information they process within the cloud environment. Taking that into account, the cloud-relevant questions aim at ascertaining the level of risk exposure that the user may have by virtue of using a specific type of cloud service. Two major aspects are important to establish in this regard. Firstly, it is important to know whether the cloud service used by the user of the DPIA tool is public, and thus shared with third parties, or private, and thus solely used by the user. Secondly, it is important to establish what the user utilises the cloud service for.

The inclusion of a part of the questionnaire targeted specifically at the cloud environment also enables the applicability of the DPIA tool to non-cloud settings, in an attempt to ensure that the DPIA questionnaire remains future proof so far as technological change is concerned. This technology-neutral approach enables the application of the tool to future Internet services: if the cloud-relevant questions are removed, the questionnaire can potentially be used to assist in achieving compliance with the legal framework irrespective of whether the assessed undertaking operates in the cloud or not.

Future proofing the tool in terms of its legal content is more problematic. Even once the GDPR has been agreed and becomes law, the content of the law will not be static, because laws are regularly amended. More challenging still, the meaning of legal provisions develops and changes over time, in response to court decisions about specific sets of facts and to policy decisions and guidance issued by regulators. For this reason, a mechanism will need to be developed to review and update the legal content of the tool at appropriate intervals to ensure that it does not become dangerously inaccurate.

4 Cloud Adoption Risk Assessment Model

We employ the Cloud Adoption Risk Assessment Model (CARAM) to evaluate the risks resulting from the adoption of cloud services (see [6] for full details). CARAM is designed to assist (potential) cloud customers in assessing all kinds of risks, not only privacy-related ones, that they face by selecting a specific CSP. The results of the CARAM risk assessment constitute a part of the DPIA report (see Sect. 5).

CARAM is a qualitative deductive risk assessment model based on ENISA’s cloud risk assessment model [16] and the Cloud Security Alliance’s (CSA) Consensus Assessments Initiative Questionnaire (CAIQ). As in [16], we conceive risk as the product of the interplay between the likelihood of an event and the impact that event would have. CARAM complements ENISA’s approach by taking into account cloud customers’ assets (modelled on the list of assets from the ENISA report) and the implementation status of security controls in the CSA STAR public registry, in order to perform a relative risk assessment of (potential) cloud solutions. This can help cloud consumers determine which CSPs have acceptable risk profiles for security, privacy and quality of service.

Most of the entries in STAR use a template that provides 148 questions, grouped into several control areas, covering the state of implementation of various security controls. We have categorised the answers of more than 50% of the CSPs in STAR, including several big players, into the following categories:

  • Implemented: the control is in place

  • Conditionally Implemented: the control can be implemented under some conditions

  • Not Implemented: the control is not in place

  • Not Applicable: the control is not applicable to the provided service

Since the answers were given in verbose free-text form instead of a simple Yes/No, and the number of answers was large (circa 9000), we used supervised machine learning algorithms provided by the WEKA tool [21] to automate this classification.
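
For illustration, here is a minimal sketch of such a supervised text classification step. The paper used WEKA; this sketch uses scikit-learn in Python for brevity, with a TF-IDF plus Naive Bayes baseline (WEKA offers equivalent learners). The training answers shown are invented for the example; the real training data were labelled CAIQ free-text answers from STAR:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented example answers, one per target category.
train_answers = [
    "Yes, this control is fully in place and audited annually.",
    "Only if the customer purchases the premium service tier.",
    "No, we do not currently support this capability.",
    "This question does not apply to our SaaS offering.",
]
train_labels = ["Implemented", "Conditionally Implemented",
                "Not Implemented", "Not Applicable"]

# TF-IDF features + multinomial Naive Bayes: a common baseline for
# classifying short free-text answers into fixed categories.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_answers, train_labels)

print(model.predict(["The control is in place for all customers."])[0])
```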

We used these answers, together with other information from the ENISA report, to calculate a vulnerability index for each risk scenario (see Table 3 for the list of risk scenarios). The vulnerability index decreases with the number of implemented security controls that mitigate the vulnerabilities involved in the risk scenario. It is later used to adjust the probability of the risk scenario, using the values provided by the experts in the ENISA report as a baseline. Eventually, the risks are grouped into three categories, risks to security, privacy and service, to provide a high-level risk profile which is easier to interpret. Based on these results, customers can compare different cloud solutions and select those satisfying their risk tolerance.
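
A minimal sketch of this computation under one simple reading of the model, treating the index as the weighted fraction of applicable controls that are not implemented, with conditional implementations counted at half weight; the exact formula is given in [6], and these weights are our assumption:

```python
# Assumed weights per control status; "Not Applicable" is excluded.
W = {"Implemented": 0.0, "Conditionally Implemented": 0.5,
     "Not Implemented": 1.0}

def vulnerability_index(statuses: list[str]) -> float:
    """Weighted fraction of applicable controls that are missing for a
    given risk scenario; 0 means all mitigating controls are in place."""
    relevant = [s for s in statuses if s != "Not Applicable"]
    if not relevant:
        return 0.0
    return sum(W[s] for s in relevant) / len(relevant)

# Example: a scenario mitigated by four of this CSP's controls.
print(vulnerability_index(
    ["Implemented", "Implemented", "Conditionally Implemented",
     "Not Applicable"]))  # 0.5 / 3 = 0.167
```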

Table 3. ENISA’s list of risk scenarios and their categories

Figure 1 displays the level of exposure (vulnerability index) to privacy risks among the analysed CSPs (the vulnerability index can be computed similarly for security and service risks). According to these results, the lowest vulnerability index for a cloud solution is 0.011, while the vulnerability index for the highest-risk cloud solution is 0.491. Although the latter index is more than 44 times higher than the former, it is still less than 0.5. This means that the likelihood value for even the highest-risk cloud solution in STAR is reduced significantly, and becomes “LOW” according to the risk matrix from [6]. This is expected, since all analysed CSPs report that they have implemented at least 70% of the controls from the CAIQ.

Fig. 1. Privacy vulnerability index for 44 CSPs in STAR (the actual CSP names were omitted for confidentiality reasons)

In this approach we rely on the self-assessments provided by the CSPs, since it is not possible to verify the status of each control independently: only three of the analysed CSPs had third-party certification from the CSA when we performed the data collection, and certification report details are not available to the public.

5 DPIA Tool and Report

DPIAT’s web interface provides an easy and user-friendly experience for a questionnaire on what is perceived as a complex issue. Screenshots are shown in Figs. 2 and 3. The landing page asks users whether they would like to start with the pre-screening questions, which determine whether they need to answer the full-scale questionnaire. The full-scale assessment questionnaire (see Sect. 3.3) contains slightly more than 50 questions displayed in five stages that categorise them: Type of Project; Collection and Use of Information; Storage and Security; Transfer of Information; and Cloud-Specific Questions. During the completion of the questionnaire, the user is given feedback on the answers and choices they make. This includes, for instance, pointing out that a chosen option increases the privacy risk, thus subtly suggesting that the user reconsider their choice. The tool does not judge, but rather aims at stimulating the user to think about their project from the perspective of privacy and data protection.

Fig. 2. DPIAT initial screen

Fig. 3. DPIAT tooltip displaying information about the selected options

The output is a report that includes the data protection risk profile, assistance in deciding whether or not to proceed, and suggested mitigations. The report contains three sections. The first, “Risk Related to Your Proposed Application”, is based on the answers to the questionnaire and contains the overall data protection impact score and several privacy indicator scores (see Sect. 3.4), namely risks related to Sensitivity, Compliance, Trans-border Data Flow, Transparency, Data Control, Security, and Data Sharing (see Fig. 4). The second part, “Risk Related to the Selected Cloud Provider”, displays the risks based on the security controls used by the CSP (see Sect. 4). It contains the 35 ENISA [16] risk scenarios with their associated scores. The last section contains additional information related to Article 33 of the GDPR. It also explains to the user that a DPIA is meant to be an ongoing process and guides the user through the general phases of the assessment. The final decision on whether to proceed with the desired transaction (which triggered the DPIA in the first place) is up to the user or their manager (i.e. an approver, in case the DPIA results indicate high risk).

Fig. 4. DPIAT output report - details of first section

The server-side application and web service (Questionnaire Provider) are written in Java. This application provides access to the questionnaire data and a rules engine that determines the flow of the questionnaire for the client, as well as providing further details and information based on the user’s responses to the questions. The rules engine is based on the Drools library. The client-side application is implemented using HTML5 and JavaScript and utilises a number of open-source libraries to simplify the underlying business logic layer. We use a RESTful API as the transport layer and JSON as the data-interchange format.
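
As an illustration of this client-server exchange, here is a hedged Python sketch of how a client might drive the dynamic questionnaire over such a RESTful/JSON interface. The endpoint paths and field names are hypothetical, since the paper does not publish the API; only the overall pattern (the server’s rules engine picks the next question based on answers so far) follows the description above:

```python
import requests

BASE = "https://dpiat.example.org/api"  # hypothetical base URL

def run_questionnaire(answer_fn):
    """Fetch and answer questions one at a time; the server-side rules
    engine selects each next question from the answers submitted so far."""
    session = requests.post(f"{BASE}/assessments").json()  # hypothetical
    question = session["firstQuestion"]
    while question is not None:
        reply = requests.post(
            f"{BASE}/assessments/{session['id']}/answers",
            json={"questionId": question["id"],
                  "selectedOptions": answer_fn(question)},
        ).json()
        question = reply.get("nextQuestion")  # None when finished
    return requests.get(f"{BASE}/assessments/{session['id']}/report").json()
```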

During the development of the tool, user experience testing was conducted. The tool was presented to several users, including partners in the HP Privacy Office, and the feedback received was incorporated in the final implementation. Positive feedback was given on the amount of guidance provided to the user in the form of informational text for both the questions and the answers. Dividing the 50 questions into five stages was also considered to benefit the tool’s usability. Additional testing was carried out with privacy researchers from a variety of interdisciplinary backgrounds, and further changes to the tool are planned in response to this feedback. In particular, there was a strongly perceived need for more explanation about both how the tool derives its recommendations and how these recommendations should be interpreted and acted upon.

6 Conclusions

We have presented a contemporary Data Protection Impact Assessment methodology focused on the use of cloud services, supported by a tool that aims to help users understand the privacy risks of their intended project and consider means to mitigate them. The DPIAT is based on existing PIAs, legal sources and specific cloud risk scenarios. It is aimed specifically at SME users, who typically have limited knowledge about privacy and data protection and restricted resources for consulting experts in the field, yet will have a legal obligation (once the GDPR comes into effect) to conduct a DPIA. Although the tool does not incorporate advanced intelligence to assist the user, we believe that the way we have structured the issues, framed the questions, and provided situation-specific feedback and a crude likelihood/impact score will actually help the target audience understand the importance of privacy and data protection in their context and help improve legal compliance.