
1 Introduction

The General Data Protection Regulation (GDPR) is pushing data controllers and processors to review and rethink their procedures. According to the Regulation, data controllers and processors need to assure data subjects (i.e., the persons the personal data are about) that the processing is lawful, fair, and transparent.

This paper is about the last of those principles, transparency. Unlike lawfulness and fairness, which express legalistic concepts, transparency is a socio-technical concept: understood socially, it means empowering data subjects with the means to know whether, and how, their personal data are lawfully and fairly processed; understood technically, it means that ways to achieve transparency should be enforced in existing systems whenever appropriate [1].

The interest in a technical implementation of transparency was not born with the GDPR. For example, it was already discussed in cloud computing as a way to enforce accountability [3], a goal it shares with the GDPR. Giving a full overview of the principle’s history is beyond the scope of this article, but one important observation is that, by the time the GDPR entered into force, tools for enhancing transparency already existed. They are called Transparency Enhancing Tools (TETs): system-independent applications dedicated to informing users about how their personal data are handled by the online services they access.

Can they help improve a system’s transparency according to the GDPR? The answer is unclear, as is whether they can give those who implement them a presumption of compliance with the GDPR’s legal transparency principle. At least in part, this uncertainty is due to the nature of the GDPR: its legal provisions are expressed in a way that admits several interpretations. Like other regulations, the GDPR was conceived for a broad audience and designed to be technology independent.

Thus, discussing whether a certain technology, like TETs, helps systems in the task of providing transparency requires a methodology. In this paper we apply one: building on a previous study of ours about transparency for medical data systems [26], we elicit a list of requirements from the GDPR Articles and provisions that talk about transparency. Then, we select a few TETs among those recently presented in the literature and discuss whether they implement the requirements we extracted from the GDPR. In so doing, we systematically analyse transparency support and identify the GDPR concepts still in need of further development.

This work extends our conference paper [25]: we explain our methodology in more detail and revisit our results by exploring further technical and legal aspects of transparency. In this extended version, we focus on the process of eliciting requirements from the GDPR and automatically comparing them with technical requirements. We also give more context to our work by appending the full categorisation of the 27 studied TETs and the complete list of technical requirements.

2 Transparency and the GDPR

Transparency is a transverse principle in the GDPR, that is, it is referred to directly or indirectly in several Recitals and Articles, but the law does not clearly characterise it. We therefore had to review the Articles of the Regulation, which we did by following a four-round approach (see also Fig. 1): 1. Selection; 2. Filtering; 3. Revision; and 4. Validation. These rounds were conducted as follows:

Fig. 1. Methodology for selecting transparency-related Articles from the GDPR.

1. Selection. Two of this paper’s authors independently compiled lists of the Articles that, according to their understanding, were about transparency. Both authors had previous experience with transparency and TETs, so the expectation was that their combined knowledge covered the general perception of transparency across different technical domains.

2. Filtering. The two lists were compared and combined; at least one author reviewed every Article. Both authors defended their interpretation of transparency, agreed on a common understanding, and extracted categories of Articles covering that understanding, including those about properties and artefacts that support the implementation of the concept. The categories eventually selected by the authors are the following:

  1. Concerning data subjects – Articles describing the knowledge that should be made available to the data subjects;

  2. Concerning authorities – Articles describing the knowledge that should be made available to authorities (e.g., Data Protection Officers, or auditors);

  3. Empowerment – Articles mandating the provision of means for the data subjects to react (e.g., rectification and erasure);

  4. Quality of transparency – Articles which qualify transparency and describe how information should be presented to data subjects (e.g., concise, easy to understand);

  5. Certification – Articles which foresee certification as a means to demonstrate the service’s practices;

  6. Consent – Articles commenting on the need for the data subjects to consent to the usage and processing of their data.

3. Revision. To check whether our selection is in line with the state of the art, we selected one work considered authoritative in the matter, the guidelines by the Article 29 Working Party [1], and looked into which Articles are therein referred to as being about transparency. We did so in the following way. Two authors (not the same pair that executed the Filtering, to reduce the risk of selection bias) independently selected the Articles that, according to their interpretation, the guidelines mention as related to transparency. Both reviewers produced very similar lists. We believe this happened because the guidelines are more explicit about their interpretation of transparency.

4. Validation. The lists from Selection and from Revision were compared. The comparison was intended to assess the relevance of our selection of Articles by calculating how many Articles mentioned in the guidelines we had covered (in the first and second rounds). We also compared our list with the one presented by the German Standard Data Protection Model (SDM) regarding the protection goals of transparency and intervenability.
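The coverage figure computed in this round amounts to a simple set comparison. A minimal sketch in Python, where the article identifiers are purely illustrative (not our actual lists):

```python
# Sketch of the Validation round's coverage computation.
# The identifiers below are illustrative, not our actual selections.
our_selection = {"5.1.a", "12.1", "13.1", "13.2", "14.1", "15.1"}
guidelines = {"12.1", "12.5", "13.1", "13.2", "14.1", "15.1"}

covered = our_selection & guidelines       # guideline Articles we also selected
coverage = len(covered) / len(guidelines)  # fraction of the guidelines covered
print(f"Coverage: {coverage:.0%}")         # here: 5 of 6 Articles, i.e. 83%
```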

2.1 Transparency in GDPR’s Articles

As a result, we compiled a list of selected transparency-related GDPR Articles (paragraphs and sub-paragraphs) that comprises 79 items; it can be found in Table 2. Our selection covers approximately 93% of the Articles in the guidelines, which we consider sufficiently relevant. We comment here only on the Articles mentioned in the guidelines that we opted not to include in our study. Article 12.5 describes when a fee may (or may not) be charged for providing information to data subjects regarding personal data. Even though this Article relates to transparency, it does not describe a technical feature of a TET or system. Article 20 describes the right to portability; it contains provisions on the characteristics of the information provided by transparency, which should be verified for compliance in every tool. Articles 25.1 and 25.2 both regard the implementation of data protection by design and by default, a concept related instead to the security property of privacy; hence those Articles were not selected. However, we include Article 25.3, which foresees the use of certification mechanisms to demonstrate compliance with Articles 25.1 and 25.2. We understand that this Article defends the right of data subjects to be aware of how their data are processed (in line with data protection principles) and, as such, is in line with our interpretation of transparency.

Our selection does not contradict the list presented by the SDM; it is simply more detailed. The majority of Articles listed by the SDM are also considered in our selection, with the exception of Articles 5.1(d), 5.1(f), and 20, regarding accuracy of data, security of personal data, and portability of data. These Articles also contain provisions on the quality of the data provided by transparency, which should be verified for compliance in every tool. Article 40, referring to the design of codes of conduct for controllers and processors, could hardly be accomplished through the use of TETs; and Article 42, on certification mechanisms, is considered in Sect. 3.

2.2 Technical Requirements for Transparency

We match the selected GDPR Articles with a list of technical requirements for transparency presented in previous work by the authors [26]. Due to space limitations, Appendix A presents the complete list of requirements to help the reader picture what they look like, but we do not detail their specification and characteristics; we refer the reader to the original work for full details.

To match the Articles from the GDPR and the technical requirements for transparency in medical systems, we developed a simplified parser based on natural language processing techniques.

Our process consists of (1) the analysis of the text corpora, (2) the extraction of corpus-based glossaries and parsing of the corpora, and (3) final adjustments.

We did not conduct any statistical analysis, nor part-of-speech tagging (techniques applied in more sophisticated natural language processing algorithms). Instead, we iterated a few times, making small adjustments in our glossaries, re-evaluating the results of the parsing and, whenever needed, manually adding or removing a match.

Our approach is only feasible because our glossaries are context-based, limited to the terminology found in the GDPR and in our requirements. We are aware of existing efforts in interpreting and translating laws, regulations, and other legal documents (e.g., [2, 16, 30]). We do not mean to compete with them, but rather state that our parser, for the specific problem addressed herein, has given sufficiently accurate results.

Text Corpora Analysis. The first step was carried out manually. We first analysed the two text corpora: the Articles and provisions in the GDPR, and a set of technical requirements for transparency in the medical domain (see Appendix A). A text corpus is described as a “large body of linguistic evidence typically composed of attested language use”, but the term is nowadays used for a wide variety of text collections [13]. Our set of requirements is not a text corpus in the typical meaning, as it is not composed of standardised terms; rather, it constitutes a text corpus in the modern interpretation: a text collection tailored to one specific domain. The GDPR, on the other hand, better represents a classic text corpus, as it is stable, well established, and composed of standard legal terminology.

We analysed the text corpora and familiarised ourselves with the differences between the terminologies, as one corpus comprises technical terms and the other legalistic jargon. The terms found in one corpus were interpreted and linked to terms in the other. As a result of this task, we highlighted potential connections between requirements and GDPR Articles and established a preliminary list of matches.

Extraction of Corpus-Based Glossaries and Parsing. To ensure the consistency of our matching procedure, we automated the comparisons by extracting possibly-equivalent terms and structuring them in glossaries. Terms found in the GDPR were matched to their equivalent technical terms found in the list of requirements. The knowledge base needed for this step came from revisiting the preliminary list of matches, from which we extracted the key-terms that seem to have triggered each match. We identified matches according to a few textual elements present in the GDPR Articles: the information to be provided to the data subject; the rights the data subject must have; the techniques described in the Articles; and a few selected keywords. We organised each of these in hash tables that represent, in a way, simplified corpus-based glossaries (see Table 1).

Table 1. Glossary of equivalent terms (GDPR terms on the left, and technical terms on the right). Information between brackets is contextual and does not constitute the key-term.

Some key-terms were intentionally marked as not applicable, as they brought almost no contribution to the final list of matches. Consider, for example, the term “transparency” found in Article 5.1(a): “Personal data shall be processed lawfully, fairly and in a transparent manner in relation to the data subject (‘lawfulness, fairness and transparency’)”. This Article is comprehensive and would relate to every single requirement in our list, as it mandates data to be processed transparently. To ensure our list had only the most meaningful matches, we decided to explicitly mark this term as not applicable (N/A). The same applies to the term “shall not apply”, which is present in Articles (or paragraphs and sub-paragraphs) describing an exception to another Article, that is, the circumstances in which our requirements do not need to be implemented. Any match with an Article of this sort is likely to be a false positive, so we marked the term as not applicable too. It is important to note that terms marked like this are not the same as terms absent from our glossaries: while the former force a mismatch between any GDPR Article containing them and every requirement in our list, the latter are simply disregarded when computing the matches.
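The difference between N/A-marked terms and absent terms can be sketched as follows; the glossary entries and the lookup function are illustrative, not our actual implementation:

```python
# Illustrative glossary: GDPR key-terms mapped to sets of equivalent technical
# terms. None marks a term as not applicable (N/A): finding it in an Article
# vetoes every match. Terms absent from the glossary are simply disregarded.
NA = None
glossary = {
    "recipients": {"who has access"},
    "transparency": NA,      # too broad: would match every requirement
    "shall not apply": NA,   # exception clauses: matches would be false positives
}

def article_matches(article_terms, requirement_terms):
    for term in article_terms:
        if term not in glossary:
            continue                               # absent term: disregarded
        if glossary[term] is NA:
            return False                           # N/A term: forced mismatch
        if not glossary[term] & requirement_terms:
            return False                           # no equivalent term found
    return True

print(article_matches({"recipients"}, {"who has access"}))                  # True
print(article_matches({"recipients", "transparency"}, {"who has access"}))  # False
print(article_matches({"recipients", "data breach"}, {"who has access"}))   # True
```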

The matches are computed by an automatic parser. Initially, it parses each GDPR Article to identify all the key-terms it contains. Then the requirements are parsed, searching for those which present at least one equivalent term for each key-term found. Our criterion for a match between an Article and a requirement is that all key-terms from the former are represented in the latter. The matching procedure is abstracted in Algorithm 1.

Algorithm 1
Algorithm 2
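Under these assumptions, the core of Algorithm 1 can be sketched as follows; the glossary, the Article excerpt, and the requirement texts are illustrative:

```python
# Sketch of Algorithm 1 under one glossary: a requirement matches an Article
# when every glossary key-term found in the Article has at least one
# equivalent term appearing in the requirement.
glossary = {
    "recipients": {"who has access"},
    "period of storage": {"how long", "retention"},
}

def match_article(article_text, requirements):
    key_terms = [t for t in glossary if t in article_text.lower()]
    if not key_terms:
        return []  # no key-terms from this glossary: nothing to contribute
    return [
        req_id
        for req_id, req_text in requirements.items()
        if all(any(eq in req_text.lower() for eq in glossary[t])
               for t in key_terms)
    ]

requirements = {
    "111.2": "Inform how data are stored and who has access to them.",
    "111.9": "Inform for how long the data are kept.",
}
article = "The data subject shall obtain the recipients of the personal data."
print(match_article(article, requirements))  # ['111.2']
```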

The computation of matches is realised in steps (as shown in Algorithm 2): we run the same parsing algorithm for each glossary, and later merge the results of the comparisons into one final list. By doing so, we kept the matching criteria decoupled, which simplified the process of re-evaluating the terms and their possible equivalents. It also helped in balancing the asymmetry between GDPR Articles and our technical requirements, as the Articles are generally more verbose and encompass many key-terms. Separating the terms into four glossaries ensured our criterion is not too restrictive, and that Articles can be matched through one or several categories of textual elements.
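A sketch of Algorithm 2's decoupled computation, with the four glossaries reduced to two illustrative ones (articles, requirements, and glossary entries are again hypothetical):

```python
# Sketch of Algorithm 2: one parsing run per glossary, results merged by union,
# so an Article-requirement pair matches if any category of textual elements
# (i.e., any glossary) links them.
def parse_with_glossary(glossary, articles, requirements):
    """Return the set of (article_id, req_id) matches found under one glossary."""
    matches = set()
    for art_id, art_text in articles.items():
        key_terms = [t for t in glossary if t in art_text.lower()]
        if not key_terms:
            continue  # this glossary contributes nothing for this Article
        for req_id, req_text in requirements.items():
            if all(any(eq in req_text.lower() for eq in glossary[t])
                   for t in key_terms):
                matches.add((art_id, req_id))
    return matches

# Two illustrative glossaries for two categories of textual elements.
information_glossary = {"recipients": {"who has access"}}
rights_glossary = {"erasure": {"delete data"}}

articles = {
    "15.1(c)": "obtain the recipients to whom the personal data are disclosed",
    "17.1": "obtain the erasure of personal data",
}
requirements = {
    "111.2": "inform how data are stored and who has access",
    "112.3": "provide means to delete data",
}

final = set()
for g in (information_glossary, rights_glossary):
    final |= parse_with_glossary(g, articles, requirements)  # merge by union
print(sorted(final))
```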

Final Adjustments. After computing the matches based on the glossaries of terms, we reviewed the resulting list and compared it with our preliminary list. Each match was analysed, but we focused on the discrepancies between the lists. For those, we semantically interpreted the matched Article and requirement to understand the context in which the key-terms appeared, and whether or not they had a similar meaning. We conducted this procedure in a peer-review manner, and the matches were adjusted accordingly. We highlight here a few of the manually adjusted matches.

According to our initial list, requirement 111.2, on information about how data are stored and who has access to them, should match Article 15.1(c), which describes the right of the data subject to obtain from the controller the recipients of the personal data. The requirement and the Article have a clear relation. However, it was being disregarded by our parser because the Article contains the key-term “third countries”, which does not appear in the requirement. As this key-term is responsible for several other well-fitted matches, we opted for adjusting this exception manually. Similarly, the matches involving requirement 111.18, on describing the ownership of the data, had to be adjusted. We understand that describing the ownership means clarifying what it means to be the owner of a piece of data, in other words, informing and describing the rights the data subjects have regarding the control of their data. In this sense, requirement 111.18 also relates to Articles 13.2(c), 14.2(c) and 21.4. Our parser captured a few relevant matches for this requirement, but not all of them; we manually added the remaining ones.

Some other matches were also considered for adjustment, as they were not present in our preliminary list, but were left untouched after a closer semantic analysis. For example, requirement 111.7, about describing the procedures and mechanisms planned in case of security breaches, matched Articles 33.3 and 33.5; and requirement 111.15, about informing who has the authority to investigate policy compliance, also matched Article 33.3. These Articles describe the information to be provided to data subjects in case of a data breach. Initially, the matches were not considered, as the requirements are ex ante (information to help the users understand what will happen to their data beforehand), while the Articles are, in a sense, ex post, applying after a data breach has happened. However, if the information described in the requirements is made available beforehand, it will, in the event of a data breach, facilitate compliance with Article 33 of the GDPR. For this reason, we kept these matches.

Similarly, requirements 221.2, 221.5, and 221.8 are matched with Article 5.2 of the GDPR (the controller shall be accountable and responsible for demonstrating compliance with the lawfulness, fairness and transparency principles). At first glance, the requirements seem unrelated to the Article, and to each other. However, the three requirements demand that users be presented with evidence of security breaches, of recovery from them, and of permission history. As evidence is, by definition, a piece of information or data used to prove or disprove something, we understand they contribute to demonstrating compliance. Even though these matches were not identified in our initial list, we decided to keep them. Our final list of matches is shown in Table 2.

Table 2. Final list of matches between GDPR Articles and technical requirements. 72% of the requirements are matched (26 out of 36). (Table originally presented in [25])

3 Transparency and Technology (TETs)

At least at an intuitive level, the most natural technology for transparency is represented by TETs. According to [18], TETs are tools to “make the underlying processes [of personal data or a subject] more transparent, and to enable data subjects to better understand the implications that arise due to their decision to disclose personal data, or that have arisen due to choices ‘made in the past’”. The cited work already provides an extensive list of tools. We also reviewed other surveys about TETs and compiled a draft list of such tools [4, 7, 17, 22, 31].

In addition, we browsed the literature for “transparency enhancing tools”, looking for works that may have referred to the tools indirectly or within the text. The search included works published since 2014, the year the GDPR started to be strongly supported by the European Parliament. We selected 27 tools which can be potentially linked to the transparency principle. We categorised them using TETCat [31], a methodology to classify TETs according to their properties and functionalities: among others, the assurance level (not trusted, semi trusted, or trusted), the application time of the tool (ex ante, ex post, or real time), and the interactivity level (read-only or interactive).

Our categorisation is summarised in Appendix B and described in the next paragraphs. Its full version is made available in [24].

Assertion Tools. Tools are classified as the assertion type whenever the correctness and completeness of the information they provide cannot be verified (not trusted), and they can only provide information on the controller’s alleged processing practices. The TETCat does not further distinguish assertion tools, so tools of this type have diverse goals.

Examples of assertion tools are third-party tracking blockers, e.g., Mozilla Lightbeam (ML), Disconnect me (DM), and Privacy Badger (PB); and tools that educate users on matters related to privacy protection, e.g., Privacy Risk Analysis (PRA) [5], Me and My Shadow (MMS), Privacy Score (PS) and Access My Info (AMI).

Awareness Tools. This is the first type of tool providing information verifiable for completeness and correctness, at two assurance levels (i.e., trusted and semi trusted). Awareness tools provide ex ante transparency and a read-only interactivity level. Tools in this category help the user become aware of the privacy policy of the service provider but do not give users control over the processing of data. Examples of such tools are machine-readable or interpreted policy languages, e.g., the Platform for Privacy Preferences Project (P3P). Another example of an awareness tool is the Usable Privacy Project [20], which automatically annotates privacy policies. Finally, tools providing certification seals and marks, such as the European Privacy Seal (EuroPriSe) [6] or TrustArc (TArc) [27], are also examples of tools in this category.

Declaration Tools. Only one tool falls under this category: PrimeLife Policy Language (PPL) [10], which is similar to awareness tools, comparable to the P3P tool, but offers some level of interactivity.

Audit Tools. Audit TETs present users with ex post or real time transparency. Tools in this category include those that allow for access and verifiability of data but do not provide means for the users to interact and intervene with the data processing (i.e., read-only tools), such as the Data Track (DT) [9] and the Personal Data Table (PDT) [22]. Another tool in this category is the Blue Button, an initiative to standardise the right to access personal medical data in the USA; it displays a logo stating that users are allowed to visualise and download their data.

Finally, the Private Verification of Access (PVA) [11] proposes a scheme for a posteriori access control compliance checks that operates under a data minimisation principle and provides a private independent audit. This tool also falls under the audit tools category.

Intervention Tools. These tools allow users to verify properties about the processing of their data as well as to interact with and control the terms of data collection and usage. Examples are: Privacy Through Transparency (PTT) [21], supporting Break-the-Glass (BTG) policies; and Privacy eSuite (PeS).

Remediation Tools. According to the TETCat, these tools comprise functionality to exercise control over data collection and usage, and also to modify and delete personal data stored by a data controller. Tools belonging to this category are, for instance, PrivacyInsight (PI) [4] and the GDPR Privacy Dashboard (GPD) [19], both privacy dashboards; and openPDS (oPDS) [14] and Meeco (Mee), which are examples of data vault/marketplace applications.
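The distinctions drawn by the six categories above can be condensed into a much-simplified decision sketch; the actual TETCat uses more properties and finer distinctions, so the mapping below is our reading rather than the original scheme:

```python
def tetcat_category(trusted, time, interactive, can_modify=False):
    """Much-simplified sketch of the TETCat distinctions described above."""
    if not trusted:
        return "assertion"             # provided information cannot be verified
    if time == "ex ante":
        return "declaration" if interactive else "awareness"
    if not interactive:                # ex post / real time, read only
        return "audit"
    return "remediation" if can_modify else "intervention"

print(tetcat_category(trusted=False, time="ex ante", interactive=False))  # assertion
print(tetcat_category(True, "ex ante", False))                            # awareness
print(tetcat_category(True, "ex post", False))                            # audit
print(tetcat_category(True, "ex post", True))                             # intervention
print(tetcat_category(True, "ex post", True, can_modify=True))            # remediation
```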

4 TETs for the GDPR

Our goal is to select, from our list of TETs, those which can presumably help achieve compliance with the provisions of the GDPR. We do this indirectly, by selecting those TETs which satisfy the requirements for transparency that we elicited from the analysis of the Articles and Recitals of the GDPR.

Methodology. The selected TETs were compared against the technical requirements for transparency in search of matches; a match occurs when a tool satisfies one or more requirements. We first pre-selected tools and requirements by their application time, distinguishing between ex ante and ex post/real time. Then we compared TETs and requirements one by one. We did this work manually, but having categorised the TETs helped us carry out the task more systematically.
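The pre-selection by application time can be sketched as follows; the tool and requirement records are illustrative:

```python
# Sketch of the pre-selection step: pair TETs with requirements only when their
# application times are compatible; the manual one-by-one comparison then runs
# over these candidate pairs. The records below are illustrative.
tets = [
    {"name": "P3P", "time": "ex ante"},
    {"name": "Data Track", "time": "ex post"},
]
requirements = [
    {"id": "111.2", "time": "ex ante"},  # 1** requirements are ex ante
    {"id": "221.5", "time": "ex post"},  # 2** requirements are ex post
]

candidate_pairs = [
    (t["name"], r["id"])
    for t in tets
    for r in requirements
    if t["time"] == r["time"]
]
print(candidate_pairs)  # [('P3P', '111.2'), ('Data Track', '221.5')]
```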

4.1 Comparing TETs and Requirements: Results and Discussion

Table 3 summarises the findings (we have put the ex ante requirements (1**) in bold and the ex post ones (2**) in slanted type). A full report may be found in [24], where we expand the GDPR Articles into the paragraphs and sub-paragraphs relevant to this work.

Looking at the table, two particular exceptions in this matching—exceptions with respect to what one would expect from the methodology we followed—stand out and need a comment.

The first concerns requirement 112.1, on the provision of mechanisms for accessing personal data. In the context of medical systems, data about the patients are typically generated by other users of the system. As a consequence, allowing these patients to access their data can be interpreted as a pre-condition for them to anticipate what will happen to their data, hence ex ante. However, in the context of TETs, tools which allow for the access of personal data are considered ex post. We interpret requirement 112.1 and those tools as closely related, even if their application times do not match. The second concerns certification seals, which we consider ex ante. Certification seals are tools which testify that a system complies with a given criterion. If the criteria regard the processing of data, these seals can help a data subject anticipate how their data will be processed. However, from the perspective of the system being evaluated for certification, the processing of data is already happening. For this reason, we accept the match between such tools and a few relevant ex post requirements.

In what follows, we comment on our findings.

Table 3. From [25]. Transparency Enhancing Tools (TETs), technical requirements, and the GDPR Articles they help realise (* added manually).

Requirements vs TETs: What Matches and What Does Not. Three requirements regarding terms and conditions seem not to be addressed by any TET: 111.1 on information regarding the physical location where data are stored; 111.4 on the existence of third-party services and sub-providers; and 111.14 on clarifications of responsibility when third-party services exist.

We believe this information could be provided together with the terms and conditions of the service. Even though the tool provided by the Usable Privacy Project (UP) aims at facilitating the reading of these, we did not identify tags for the requirements above; for this reason, we do not consider these requirements addressed. There are other relevant developments on this subject, such as the CLAUDETTE project, which uses artificial intelligence to automatically evaluate the clauses of a policy for clarity and completeness in the light of the GDPR provisions. Another relevant tool in this regard is Me and My Shadow (MMS), which provides a functionality called Lost in Small Print that reveals and highlights the most relevant information of a given policy. We do not include these tools in our study: the first only evaluates the quality of a policy, without necessarily easing the understanding of its contents, and the second only provides a few selected examples of policies of popular services. Nevertheless, they indicate that this matter is already a subject of attention. We expect to see a different scenario concerning tools for terms and conditions in the future.

We also observed a lack of tools covering technical aspects of data processing. For example, requirement 111.5, about informing how the system ensures data are not accessed without authorisation, and requirement 111.20, on evidence of separating personal data from metadata, are not addressed by any of the tools we studied. The reason for this is not clear, as other requirements about the use of specific security mechanisms (111.12) and about how data are protected (111.13) also cover technical aspects and seem to receive attention from TETs. We speculate this lack of attention may be due to the target audience, which in general has no technical education and would not value such information. Another possible explanation is that this sort of information is provided together with other information, and we failed to identify it in our selected tools.

Finally, requirements regarding security breaches and attacks also seem to have received less attention. They constitute the majority of the requirements not addressed by any TET: 111.7, 211.1, 211.4, 221.2, and 221.8. As security breaches are unforeseen events, it does not come as a surprise that there are no tools for aiding the understanding of issues related to them. Nonetheless, it is important to notice that the GDPR reserves two Articles for provisions on personal data breaches (Art. 33 and 34), one of which is dedicated to describing how to communicate such matters to the affected data subjects. As the health-care industry is among those with the most reported breaches, and medical data are among the top three most compromised varieties of data (for more details, see the results of the data breach investigation [28]), we consider this to be an area in need of further development.

TETs vs Articles: Which Suggests Compliance and Which Does Not. Only a few Articles from the GDPR are not related to any of the selected transparency tools, meaning that none of their paragraphs or sub-paragraphs is matched to a TET. These are the Articles about data protection mechanisms and certification. Article 25 regards data protection by design and by default, and Article 32 has provisions on security of processing; both mention that compliance may be demonstrated through the use of approved certification mechanisms as referred to in Article 42.

Despite having included two certification seals in our list of TETs (i.e., EuroPriSe and TrustArc), EuroPriSe’s criteria catalogue has not yet been approved pursuant to Article 42(5) GDPR, as its issuer has not yet been accredited as a certification body pursuant to Article 43 GDPR. As for TrustArc, we could not find enough information to confirm whether it is an approved certification mechanism.

A few transparency quality and empowerment related Articles are also not addressed by our selected tools. Article 12, for example, qualifies the communications with the data subject, stating that they should be concise, easily accessible, in clear and plain language, and by electronic means whenever appropriate. In our understanding, this Article does not match any specific tool because it is transverse to all of them: it has provisions regarding the quality of communications, and all tools communicating information to data subjects should be affected by it. In [23] we discuss metrics for transparency which, in line with this reasoning, consider the information provided to final users “being concise” or “being easily accessible” as indicators that transparency is properly implemented.

With regard to empowerment related Articles, while a few do relate to some tools (e.g., Art. 17, 19 and 21), they are either partially addressed by transparency tools or not addressed at all. In fact, empowerment and transparency are different properties [12, 26], which may explain why only a few of those Articles are addressed by TETs. But at least with regard to Articles describing the rights of the data subject towards the processing of personal data (e.g., Art. 22 and 26), we believe policy and terms-and-conditions tools could also address them; yet we found no tool addressing those subjects.

There are, however, developments on the topic of empowerment [12]. In that work, empowerment (referred to by the authors as intervenability) is discussed as a privacy goal and compared to transparency. In this context, Article 12 relates to their requirements T4 and T5, and Article 17 relates to requirement I10. However, the full implementation of empowerment, as it requires providing ways for users to exercise their rights regarding personal data, may not be suitable for a TET. The analysis of the requirements proposed in [12] and their relationship with TETs falls outside this work’s scope.

It is important to note that a few Articles which appear not to be covered by any TET are not considered in this analysis because they do not match any of our requirements by key-terms. We investigated two of them manually: Articles 11 and 9. Article 11 has provisions on processing which does not require identification. It is relevant to our study because its paragraph 2 states that the controller shall inform data subjects when it is not in a position to identify them. It further states that, in such a case, Articles 15 to 20 (on the exercise of data subjects’ rights) shall not apply. In this sense, Article 11 describes a case in which empowerment tools (related to Articles 15 to 20) are not required, so it makes little sense to discuss its relationship with the TETs in our list.

Article 9, on the other hand, has provisions on the data subject’s consent for the processing of special categories of personal data, including data concerning health. The Privacy eSuite (PeS) tool is a web-service consent engine specifically tailored to collect and centralise consent for the processing of health data; hence, it is connected with Article 9. In the interest of completeness, we manually added this match in Table 3. However, PeS is a proprietary tool designed in line with Canadian regulations, and we found no means to determine to what extent it can help achieve the provisions in the GDPR.
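As an aside, the key-term matching between Articles and requirements mentioned above can be illustrated with a minimal sketch. The article excerpts, requirement names, and key-terms below are hypothetical stand-ins for illustration only, not the actual data set or code used in our study.

```python
# Illustrative key-term matching between (excerpts of) GDPR Articles and
# elicited requirements. All texts and key-terms here are made up.

ARTICLES = {
    "Art. 11": "processing which does not require identification of the data subject",
    "Art. 12": "transparent information, communication and modalities for the exercise of rights",
}

# Each requirement is associated with a set of lowercase key-terms.
REQUIREMENT_KEYTERMS = {
    "R1 (inform on processing)": {"information", "communication"},
    "R2 (identify controller)": {"controller", "identity"},
}

def match_articles(articles, requirements):
    """For each requirement, return the Articles whose text contains
    at least one of its key-terms (case-insensitive substring match)."""
    matches = {}
    for req, terms in requirements.items():
        matches[req] = [
            art for art, text in articles.items()
            if any(term in text.lower() for term in terms)
        ]
    return matches
```

Under this toy matching, Article 12 would be linked to a requirement about informing data subjects, while Article 11 would match no requirement and thus call for the kind of manual inspection described above.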

Since consent is described in the GDPR as one of the bases for lawful processing of personal data, the number of tools addressing this subject seems suspiciously low. This does not imply that medical systems and other services are currently operating illegally; we are aware that collecting consent for processing data is common practice. However, we are interested in tools designed to facilitate the task of collecting consent and to help users be truly informed of the consequences of giving it.

We investigated this more closely; our findings mostly comprise tools and frameworks aiding the collection of informed consent for digital advertisingFootnote 21. We also found mentions of the EnCoRe (Ensuring Consent and Revocation) project, which presents insights on the role of informed consent in online interactions [29]. The project appears to have concluded, and we found no tool proposed by it to address the collection of informed consent.

One could claim that tools proposed for terms and conditions, or privacy policies (e.g., P3P, PPL, and UP), can also help collect consent. While this is a possible solution, special attention is required to ensure that the request for consent is distinguishable from other matters (as per GDPR Article 7). It is also important to note that consent to the processing of personal data shall be freely given, specific, informed, and unambiguousFootnote 22. Implicitly collecting consent for data processing is arguably against the provisions of the GDPR [29]. In that work, the authors discuss to what extent terms and policies are even read and understood; in this sense, consent is unlikely to be truly informed and freely given.

5 Related Works

To the best of our knowledge, only a few works discuss matters of compliance with the GDPR principles (i.e., [4, 12, 19]). In [12], the authors derive technical requirements from the international standard ISO/IEC 29100 and the GDPR. Even though that work uses both technical (international standard) and legal (GDPR) documents, the two are not compared against each other; the requirements are instead extracted directly from them.

In [4] the authors propose a Transparency Enhancing Tool (TET) in the form of a privacy dashboard. To define the relevant features to be implemented, they derive eight technical requirements from the right of access in the GDPR, the previous European Data Protection Directive, and Germany’s Federal Data Protection Act. Similarly, Raschke et al. propose a GDPR-compliant dashboard in [19]. In this work, however, only four high-level features are extracted from the GDPR: the right to access data, obtaining information about the processors involved, rectification and erasure of data, and consent review and withdrawal. Both works extract requirements from data protection laws, but do not compare them with any other sources.

Four works review TETs [7, 15, 17, 31]. The work by Murmann and Fischer-Hübner [15] surveys the literature for transparency tools and explores aspects of usable transparency, derived from legal provisions in the GDPR and well-accepted usability principles. The authors identify meaningful categories of tools and propose a classification based, for instance, on functionalities and implementation. Although this work is comprehensive in exploring the characteristics of usable TETs, it does not explicitly map technical aspects of the tools to the GDPR provisions they help accomplish.

There are works, however, which compare and map legal and technical requirements, principles and designs. In particular, [8] reviews usability principles in a few selected TETs. To this end, the authors gather requirements from workshops and by reviewing documents related to data protection, such as the proposal of the GDPR (the document available at the time) and the opinions of the Article 29 Data Protection Working Party. These requirements are mapped to three Human-Computer Interaction (HCI) concepts, which in turn are discussed in the context of the TETs. Even though the mappings presented in this work are thoroughly discussed, the authors do not present a structured procedure for defining them; it is our interpretation that those mappings were identified manually.

The SDMFootnote 23 also classifies the GDPR’s provisions in terms of data protection goals (e.g., availability, transparency, intervenability), and comments on technical measures that help to guarantee transparency, such as documentation of procedures and logging of access and modifications. These measures relate to our requirements but are more high-level. We believe our requirements could be classified according to them, allowing us to select TETs that accomplish transparency as described by the SDM. We leave this task to future work.

6 Discussion and Conclusion

Even before the GDPR entered into force, several activities and initiatives bloomed with the aim of providing advice, guidance, instruments, or all of these, to enterprises concerned about the high fines promised to follow a provable lack of compliance with the Regulation.

In this paper we focus on one particular aspect of compliance: the Regulation’s principle of transparency. Despite the principle being only transversely referred to in the GDPR—that is, it is not the subject of one particular Article or Recital, but is rather referred to across many items—compliance with it is a serious matter. This could not have become clearer than in January 2019, when the French data protection authority, the Commission Nationale de l’Informatique et des Libertés (CNIL), fined Google an impressively high penalty of about 50 million euros for lack of transparency. The CNIL concluded that users of services like Google Search, YouTube, Google Maps, the Play Store, etc., are not in a position to have a fair perception of the nature and volume of the data collectedFootnote 24. The CNIL also objected to the transparency of the consent form that Google offers its users, arguing that it is not informative enough because it is stated in an unclear and ambiguous way, in addition to the fact that users have no choice but to accept it.

Discussing the full extent of this famous legal case is beyond our goal, and it is not our business to speculate on why Tech Giants like Google fail to comply with a Regulation; but one could question whether this might be due, at least in part, to a lack of instruments to inform users. In this paper, we looked into what could be the most natural choice, namely Transparency Enhancing Tools (TETs), while at the same time discussing the technical requirements that emerge from a technical reading of the GDPR’s provisions.

This comprehensive analysis of transparency helps identify current and future developments for better complying with transparency and related GDPR requirements by using TETs. The tools were proposed to protect users’ privacy in general and were thus not designed specifically for the GDPR; rather, they have been tailored to one specific use case or goal, or thought to fulfil a specific legislation or regulation according to the priorities of those who designed and developed them. Consequently, they cannot be immediately included in most systems, nor can they be considered ready, without scrutiny, to interpret the GDPR’s provisions. Our analysis, however, highlights which TETs match the GDPR’s requests on transparency, and in which respects they do so. Adapting the tools to become instruments of compliance with the specifics of transparency in the GDPR still needs to be developed and discussed in the near future. We are not there yet, but this paper has started to identify and clarify the way towards that goal, so that future developments need not be built from a blank board: of the 21 GDPR Articles we studied and discussed here, 12 are already, at least partially, addressed by the selected TETs.