“Today’s trends have opened an entirely new chapter, and there is a need to explore whether the principles are robust enough for the digital age. The notion of personal data itself is likely to change radically as technology increasingly allows individuals to be re-identified from supposedly anonymous data. In addition, machine learning and the merging of human and artificial intelligence will undermine concepts of the individual’s rights and responsibility” [1, p. 13].

1 Introduction

Today’s technologies enable unprecedented exploitation of information, be it small or big data, for any conceivable purpose, but mostly in business [2, 3] and surveillance [4], with the ensuing juridical and ethical anxieties.

Algorithms are regularly used for mining data, offering unexplored patterns and deep non-causal analyses to those businesses able to exploit these advances. They are advocated as advanced tools for regulation and legal problem-solving with innovative evidence gathering and analytical capacities.

Yet, these innovations need to be properly framed within the existing legal background, fit into the existing set of constitutional guarantees of fundamental rights and freedoms, and be coherently related to existing policies in order to enable our societies to reap the richness of big and open data while equally empowering all players.

Indeed, if the past is normally the prologue of the future, when our everyday life is possibly influenced by filter bubbles and unverifiable analytics (in a word: it is heteronomously “pre-set”), a clear ethical and legal response is desperately needed to govern the phenomenon. Without it, our societies will waver between two extremes—either ignoring the problem or fearing it unduly. Without a clear transnational ethical and legal framework, we risk either losing the immense possibilities entailed by big and small data analytics or surrendering the very gist of our praised rights and freedoms.

To secure the benefits of these innovations and avoid the natural perils every innovation brings along, this paper makes a call for regulating algorithms and expanding their use in legal problem-solving at various levels by first exploring existing legal rules. Accordingly, this paper, building on existing literature: (1) frames the main legal and ethical issues raised by the increasing use of algorithms in information society, in digital economy, and in heightened public and private surveillance; (2) sets the basic principles that must govern them globally in the mentioned domains; (3) calls for exploring additional uses of algorithms’ evidence-based legal problem-solving facility in order to expand their deployment on behalf of the public interest (e.g. by serving pharmacovigilance) and to empower individuals in governing their data production and their data flow.

It is important to stress that the tremendous benefits of big data—for instance, of the interplay among data mining and machine learning—are not questioned here. To the contrary, this paper asserts that in order to reap all the promised richness of these benefits it is important to correctly frame the reality of the interplay of these technologies. It is also important to emphasize that data protection, whatever form it takes in a given legal system, can be a key to development rather than an obstacle.Footnote 1 Such a bold statement is unavoidable, since in the digital age static profiles are basically disappearing due to the expansion of machine learning and artificial intelligence. Among other things, such an evolution implies that the classification model used at any given moment no longer exists, as such, at a (relatively contemporaneous) subsequent one. Accountability, then, requires both the technical (already extant) and the legal ability to establish with certainty, at each moment, which profile has been used in the decision process.

Thus, the emerging regulatory approach must necessarily blend together various legal, technological, and economic strategies for which the time frame is of crucial importance.

The EDPS [1, p. 13] rightly stressed that “The EU in particular now has a ‘critical window’ before mass adoption of these technologies to build the values into digital structures which will define our society. This requires a new assessment of whether the potential benefits of the new technologies really depend on the collection and analysis of the personally-identifiable information of billions of individuals. Such an assessment could challenge developers to design products which depersonalise in real time huge volumes of unorganized information making it harder or impossible to single out an individual.” Yet, at this stage we should wonder whether what is at stake is privacy protection or rather dignity, threatened by the biased targeting of anonymities, not just of individuals.

2 The Classifying Society

The Free Dictionary defines the verb “to classify” as: “to arrange or organize according to class or category.” Google, faithful to its nature, clarifies: “[to] arrange (a group of people or things) in classes or categories according to shared qualities or characteristics”.

According to the Cambridge Advanced Learner’s Dictionary, the verb “classify” originally referred, however, to “things”, not to humans.Footnote 2 The societies in which we live have altered the original meaning, extending it to, and mostly applying it to, humans. Such a phenomenon is apparent well beyond big dataFootnote 3 and the use of analytics. For instance, in medicine and research the term “classification” now applies to old and new (e.g. emerging from DNA sequencing) shared qualities and characteristics that are compiled with the aim of enabling increasingly personalized advertising, medicine, drugs, and treatments.Footnote 4

The rising possibilities offered by machine learning, data storage, and computing ability are entirely changing both our real and virtual landscapes. Living and being treated according to one or more classes is the gist of our daily life, as personalized advertisingFootnote 5 and the attempt by companies to anticipate our desires and needsFootnote 6 clearly illustrate.

Often a group of “shared qualities or characteristics” even takes priority over our actual personal identity. For instance, due to a group of collected characteristics, we can be classified and targeted accordingly as (potential) terrorists, poor credit risks,Footnote 7 breast cancer patients, candidates for a specific drug/product, or potentially pregnant women.Footnote 8 These classifications can be produced and used without us having the faintest clue of their existence, even though we are not actual terrorists (perhaps, for instance, we like to travel to particular areas and happen to have an Arab-sounding name), we are affluent (but prefer to live in a much poorer neighbourhood), we do not have breast cancer, we have no need for or interest whatsoever in a drug/product, we are not even female, or we are not a sex addict (see infra).

The examples referred to above are well documented in the literature and have been chosen to illustrate that classifications characterize every corner of our daily life.Footnote 9 They expose most of the legal problems reported by scholars and concerned institutions. Most of these problems revolve around notions of privacy, surveillance, and danger to freedom of speech.Footnote 10

Yet, the literature to date has failed to discuss the very fact that the classifying society we live in threatens to make our actual identities irrelevant, fulfilling an old prophecy in a cartoon from a leading magazine. That cartoon displayed a dog facing a computer while declaring: “On the Internet, nobody knows you’re a dog”. With hindsight, we could now add: “if you are classified as a dog, it is irrelevant that you are actually the human owner of a specific kind of dog”. Indeed, Mac users are shown higher prices regardless of whether they are personally identified as affluent [18]. Although in some countries (such as the USA) price discrimination and customer-steering are not forbidden unless they involve prohibited forms of discrimination,Footnote 11 we should begin to question the ethics of such processes once they are fully automatic and unknown to the target.Footnote 12 In addition, they lock individuals into myriad anonymous models based upon traits that they might not be able to change, even if, theoretically, such traits are not as fixed as a fingerprint—as, for instance, where the use of an Apple product is needed for the specific kind of work performed, or happens to be required by one’s employer.

Even in those instances in which the disparate classifying impact is technically legal, it can be ethically questionable, to say the least. Such questionability depends not so much on the alleged aura of infallibility attributed to automatic computerized decision-making [23, p. 675] as on its pervasiveness and unavoidability.

Apparently there is nothing new under the sun. Humans, for good or bad, have always been classified. Classes have been a mode of government as well. What is different about the classificatory price discrimination (and not only price!)Footnote 13 that can be systematically produced today—as opposed to the old merchant sizing up my worth from my dress and adjusting the price requested accordingly—is discrimination’s dimension, pace, pervasiveness and relative economy.Footnote 14

Literature has also failed to address another key issue: due to the role of big data and the use of algorithms, actual personal identification as a requirement for targeting individuals is becoming irrelevant in the classifying society. The emerging targets are clusters of data (the matching combination between a group of data and a given model), not physical individuals. The models are designed to identify clusters of characteristics, not the actual individuals who possess them—and even when the model incorrectly identifies a combination, it is refining its own code,Footnote 15 perfecting its ability to match clusters of shared characteristics in an infinite loop.

Classification based on “our” “shared characteristics” covers every corner of us (what we are and what we do, let alone how we do it) once we can be targeted at no cost as being left-handed, for instance. Yet, the power (resources, technology, and network economies) to do such classifying is quickly becoming concentrated in fewer hands than ever.Footnote 16 The pace of classification is so rapid as to have begun happening in real time and at virtually no cost for some players. Its pervasiveness is evidenced by the impact our web searches have on our personalized advertising on the next web site we visit.Footnote 17 The progressive switch to digital content and services makes this process even faster and easier: “Evolving integration technologies and processing power have provided organizations the ability to create more sophisticated and in-depth individual profiles based on one’s online and offline behaviours” [27, p. 2].Footnote 18 Moreover, we are getting increasingly used to the myth of receiving personalized services for freeFootnote 19 and to the absence of any alternative to surrendering our data [29].

Classifications are not problematic by definition, but some of their modes of production or uses might be. In particular, this applies extensively to the digital domain, wherein transactions are a continuum linking the parties (businesses and “their” customers)—a reality that is clearly illustrated by the continuous unilateral changes made to Terms of Service (ToS) and Privacy Policy Terms and Conditions (PPTCs) that deeply alter the content of transactionsFootnote 20 after (apparent) consumption and even contemplate the withdrawal of the product/service without further notice.Footnote 21

On a different account, the expanding possibility of unlocking information by data analysis can have a chilling effect on daily choices when the virtual world meets the real one at a very local level. For instance, a clear representation of the outer extreme of the spectrum introduced at the beginning (fear of technology) could be the hesitation to join discussion groups (on drugs, alcoholism, mental illnesses, sex, and other topics) for fear that such an affiliation can be used in unforeseen ways and somehow disclosed locally (maybe just as a side effect of personalized marketing).Footnote 22

Even when cookies are disabled and the web navigation is “anonymous”, traces related to the fingerprints of the devices we use are left. Thus, the elaboration and enrichment of those traces (that are by definition “anonymous”) could be related to one or more identifiers of devices, rather than to humans. At this point there is no need to target the owner of the device. It is simpler and less cumbersome from a legal standpoint to target the various evolving clusters of data related to a device or a group of devices instead of the personally identified/able individual. Such a state of affairs calls for further research on the need to “protect” the anonymous (and, to the extent that we are unaware of its existence, imperceptible) identity generated by data analytics.
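Purely by way of illustration, the following minimal sketch (with invented attribute values, not any vendor’s actual implementation) shows how a handful of traits a device routinely exposes can be combined into a stable identifier, so that the same device is recognised across visits even with cookies disabled and with no personal data in sight.

```python
import hashlib

def device_fingerprint(attributes: dict) -> str:
    """Combine routinely exposed device traits into a stable identifier.

    No name, account, or cookie is involved: the identifier is derived
    solely from characteristics the device broadcasts anyway.
    """
    # Sort keys so the same device always yields the same fingerprint.
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical traits observable from an ordinary web request.
visit_monday = {
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15)",
    "screen": "2560x1600",
    "timezone": "Europe/Rome",
    "fonts": "Helvetica,Garamond,Futura",
    "language": "it-IT",
}
visit_friday = dict(visit_monday)  # same device, cookies long since cleared

# The two visits collapse onto the same "anonymous" identity.
print(device_fingerprint(visit_monday) == device_fingerprint(visit_friday))  # True
```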

In this unsettling situation the legal and ethical issues to be tackled could be tentatively subsumed under the related forms of the term “classify”.

The adjective “classifiable” and its opposite “non-classifiable” indicate the legal and ethical limits of classification that have so far been made publicly manifest in well-known incidents like the NSA scandal,Footnote 23 Facebook tracking,Footnote 24 the Domain Awareness System (DAS)Footnote 25 and TrapWire software.Footnote 26 Yet, the number of possible classifications and biases is virtually infinite [38].

The verb “misclassify” and its related adjectives (“misclassified,” “misclassifying”) denote instances of error in classification. As paradoxical as it might sound, these algorithms’ “mistakes” (false positives and false negatives) reinforce the strength of the classifying society, while simultaneously opening a Pandora’s box of new and often unknown/undiscoverable patterns of discrimination.

The verbs “overclassify” and “pre-classify” entail, for instance, risks of excessive and anticipatory classification capable of limiting autonomy, and can certainly be attached to the use of predictive coding in any capacity, be it automatic or human controlled/readable.

Indeed, since the literature has clearly illustrated that in the classifying society full personal identificationFootnote 27 is not necessary to manipulate the environment in which we are operating [29], it is paramount to focus on the tracked/traceableFootnote 28 algorithmic “anonymous” identities upon which businesses and governmentsFootnote 29 rely to deal with us—that is, to focus on the classes, those various sets of data that pigeonhole us (these sets of data are, to use the more appealing technological term, the “models” upon which algorithms act). After all, if a model is already built on data available to our counterpart and only very few of “our” data, even a pseudo-anonymized or fully anonymized model [42] is sufficient to pigeonhole us; the classifying society is, across the continents, altogether bypassing data protection as we know it because it targets subsets of data fitting specific models rather than expressly targeting individuals. Moreover, as anticipated, these clusters are mostly related to things rather than to individuals. Hence, for instance, no warning (notice) is needed if the model does not need (and is therefore not even interested) to know that it is targeting a given individual in real life; it only needs to identify a cluster of data fitting the model, related to one or more things, regardless of their correspondence with an identified or identifiable individual. Yet, if behind the anonymous subset of data there is an undiscovered real person, that person is effectively being targeted—even if the law as currently formulated does not regard it as such.Footnote 30
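The targeting logic just described can be rendered in a few lines. The sketch below is purely illustrative: the model name, traits, weights, and threshold are all invented, and real systems are far more elaborate, but the salient point survives the simplification: nothing in the process consults an identity.

```python
# Hypothetical model: weights over shared characteristics, learned elsewhere
# from large quantities of (not necessarily related) data.
AFFLUENT_SHOPPER_MODEL = {
    "os:macos": 0.9,
    "browser:safari": 0.4,
    "avg_basket_eur>100": 1.2,
    "connects_from:business_fibre": 0.7,
}
THRESHOLD = 1.5

def matches(model: dict, cluster: set, threshold: float) -> bool:
    """Return True when a cluster of observed traits fits the model.

    Note what is absent: no name, e-mail, account id, or any other piece
    of personally identifying information is ever consulted.
    """
    score = sum(weight for trait, weight in model.items() if trait in cluster)
    return score >= threshold

# A cluster of data emitted by "a device", not by an identified person.
observed_cluster = {"os:macos", "browser:safari", "connects_from:business_fibre"}

if matches(AFFLUENT_SHOPPER_MODEL, observed_cluster, THRESHOLD):
    print("show premium price tier")  # the person behind the device is never identified
```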

These technologies are deployed in the name of a “better user experience,” and of fulfilling the future promise of augmented reality and enhanced human ability. Yet, living immersed in the classifying society, we wonder whether the reality that “better matches” our (even anonymous) profiles (shared characteristics in a model) is also a distorted one that reduces our choices.

Such a troubling doubt calls into question the role that law has to play in facing the power of algorithms. It requires exploring both the adapted applications of existing rules and the design of new ones. It also suggests that ethical and legal principles should be shared and embedded in the development of technology [44]. The power to classify in a world that is ruled by classification should entail duties along with rights.

Nevertheless, even the newest regulation on data protection, the EU GDPR,Footnote 31 does not address these concerns. To the contrary, it might amplify them, legitimizing the entire classifying society model. But we cannot deal with such issues here.Footnote 32

Finally, the emergence of the classifying society is sustained by an economic argument that is as widespread as it is false: massive data collection rewards companies that provide “free” services and access where an alternate business model (for payment) would fail. First of all, this argument itself makes clear that since the service is rewarded with data collection, it is not actually free at all.Footnote 33 Secondly, markets based entirely on a pay-for-service model do exist: the Chinese internet services system is a clear example.

Against this general framework we now need to re-frame a number of the legal issues—already stressed by the literature—generated by the widespread use of algorithms in the classifying society.

2.1 (Re)sorting Out the Legal Issues

A large portion of the legal and ethical issues arising from the use of algorithms to read big (or small) data have already been identified by both legal and information technology experts. Similarly, there are a variety of taxonomies for algorithms related to their ability to look for hypotheses and explanations, to infer novel knowledge (deductive algorithms) or transform data into models (inductive algorithms), and to use third-party data to colour information (so-called socialized searches such as the ones used by Amazon’s book recommendations).Footnote 34 All of these taxonomies can be related to the “classify” vocabulary mentioned above, but that is not our task here. The following brief overview will explicate the main issues, demonstrating how the two identified deficiencies in the literature call for a different approach and a multilayer regulatory strategy.

A growing literatureFootnote 35 has already illustrated the need for more transparency in the design and use of data mining, despite the fact that transparency as such can undermine the very goal of predictive practices in a manner that is disruptive to the public interest (for instance by making public the random inspection criteria used to select taxpayers).Footnote 36 Nevertheless, what we are questioning here is not the use of classification and algorithms for public goals and by public authorities. Instead, we focus on the pervasiveness of these processes in daily life and at the horizontal level among private entities with strong asymmetries of knowledge and power. Such asymmetries in the use of descriptive and predictive algorithms can generate micro-stigmas which have not been fully explored, let alone uncovered. These micro-stigmas or classifications are so dispersed and undisclosed that mobilization and reaction appear unrealistic. Indeed, for instance, it is a new stereotype that Apple product users are more affluent than PC users; yet it is a stereotype that escapes existing legal and ethical rules and can lead to higher prices without even triggering data protection rules.

In their reply to Professor Richards, Professors Citron and Gray [54, p. 262] recall the various forms of surveillance leading to “total-information awareness”: “coveillance, sousveillance, bureaucratic surveillance, surveillance-industrial complex, panvasive searches, or business intelligence”. In stressing the role of fusion centersFootnote 37 as a key to this shift to total surveillance,Footnote 38 they emphasize the fall of the public/private surveillance divide.

Yet, this is not the Orwellian nightmare of 1984 but a distributed mode of living and “manipulating” in which we are normally targeted indirectly—that is, by matching a subset of our data (that do not need to be personal identifying information in the eyes of law) with the models generated by algorithms during the analysis of large quantities of data (data not necessarily related to us either), all without the need to identify us as individuals.

When we read the actual landscape of the information society in this way, continuous surveillance, as it is strikingly put by Julie E. Cohen [53], alters the way people experience public life.Footnote 39 However, the implications for human daily life and activities are deeper and more subversive—and yet mostly unnoticed—in the relationships we develop under private law [29].Footnote 40

Predictive algorithms are applied widely [58]Footnote 41 and extensively resort to data mining [62, pp. 60–61]. They have intensively and deeply changed the notion of surveillance and have introduced a novel world of covered interaction between individuals and those who sense them (private or public entities or their joint alliance). As Balkin [63, p. 12] puts it, “Government’s most important technique of control is no longer watching or threatening to watch. It is analyzing and drawing connections between data…. [D]ata mining technologies allow the state and business enterprises to record perfectly innocent behavior that no one is particularly ashamed of and draw surprisingly powerful inferences about people’s behavior, beliefs, and attitudes.”Footnote 42

A significant American literature has tackled the problem by examining potential threats to the Fourth Amendment to that country’s Constitution [64,65,66,67]. Yet, from a global perspective wherein the American constitution does not play a role, it is important to investigate legal and ethical rules that can be applied to private entities as well, beyond their potential involvement in governmental operations.Footnote 43

Despite the fact that scholars have extensively stressed the potential chilling effectFootnote 44 of the phenomenon, we claim that the “risk” accompanying the opportunities transcends a disruption of the public arena and casts on individual and collective lives the shadow of an unseen pressure to conform that relentlessly removes the right to diverge from the available modes of acceptable behaviour, action and reaction in our commercial and personal dealings. The individual and social impact goes beyond the risks posed by continuous surveillance technology and reduces “the acceptable spectrum of belief and behavior”. As has been stressed, continuous surveillance may result in a “subtle yet fundamental shift in the content of our character”. It arrives “not only to chill the expression of eccentric individuality, but also, gradually, to dampen the force of our aspirations to it” [15].Footnote 45

Yet, what we are describing in the classifying society is more than, and different from, surveillance. While surveillance might impair individuals’ and groups’ ability to “come together to exchange information, share feelings, make plans and act in concert to attain their objectives” [74, p. 125], the classifying society can prevent us from thinking and behaving in our private relationships in a way that diverges from the various models we are pigeonholed into.

Platform neutrality is also in danger in the classifying society, since large players and intermediaries have the ability to distort both commerce and the public sphere by leveraging their size, their network power, or big data.Footnote 46 The issue becomes more problematic once we look at the legal protections claimed by intermediaries such as Google.Footnote 47 Indeed, the claim of such intermediaries to being merely neutral collectors of preferencesFootnote 48 is often undermined by the parallel claim they make that, as corporations, they enjoy a free speech defence [81] that would allow them to manipulate results in favour of (or contrary to), for example, a political campaign, or a competitor, or a cultural issue. Such a result is somehow acknowledged in the American legal system by Zhang v. Baidu.com Inc.,Footnote 49 which affirms that a for-profit platform’s selection and arrangement of information is not merely copyrightable, but also represents a form of free speech.Footnote 50

This latter example of corporate strategic behaviour, switching from one to another self-characterization, illustrates very clearly the need for a transnational legal intervention,Footnote 51 or, at least, the need to overcome the stand-alone nature of legal responses that allows a company to use one approach under competition law and another one under constitutional law, for instance.Footnote 52

Yet, this very opportunistic approach should make us wonder whether it could be claimed as well by the individuals sensed by algorithms. Since sensed and inferred data are mostly protected as trade or business secrets, it would be worth exploring the possibility of protecting individual anonymities as individual trade secrets, thereby shaping individuals’ own bargaining power.

2.2 On Discrimination, (Dis)Integration, Self-Chilling, and the Need to Protect Anonymities

Analytical results may have a discriminatory—yet sometimes positive [85]—impact based on unacceptable social factors (such as gender, race, or nationality), on partial information (which, being by definition incomplete, is often thereby wrong), on immutable factors over which individuals have no control (genetics, psychological biases, etc.), or on information whose impact is unknown to the individual (e.g. a specific purchasing pattern). With reference to the latter, it can also be a matter of the plain and tacit application of a generalization to a specific individual [86, p. 40],Footnote 53 or of plain errors in the data mining process [68], amplifying the risks of misclassification or over-classification anticipated above.

It is also rather obvious that data mining casts data processing outside of “contextual integrity” [88], magnifying the possibility of a classification based on clusters of data that are taken literally out of context and are therefore misleading. For instance, assume that, for personal reasons unrelated to sexual habits, someone regularly passes through an area where prostitution is practised, every week, at night. The “traffic” in the area forces a slow stop-and-go movement, compatible with attempts to pick up a prostitute, and such movement is tracked by the geo-localization of one or more of the devices the individual is carrying. Accordingly, once connected to the web with one of those devices (recognized by one or more of their components’ unique identification numbers, such as their Wi-Fi or Bluetooth antenna), the device is targeted with advertising for pornographic materials, dating websites, sexual enhancement drugs, or remedies for sexually transmitted diseases, because it fits the model of a sex addict prone to mercenary love.
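A toy rendering of how such a misclassification can arise is sketched below; the coordinates, thresholds, and the “model” itself are invented for illustration only.

```python
from datetime import datetime

# Hypothetical bounding box for the flagged area and matching thresholds.
FLAGGED_AREA = {"lat": (45.4620, 45.4660), "lon": (9.1880, 9.1930)}
NIGHT_HOURS = range(22, 24)
SLOW_KMH = 8          # stop-and-go speed compatible with kerb-crawling
WEEKS_TO_MATCH = 4    # recurring pattern, week after week

def in_area(ping):
    return (FLAGGED_AREA["lat"][0] <= ping["lat"] <= FLAGGED_AREA["lat"][1]
            and FLAGGED_AREA["lon"][0] <= ping["lon"] <= FLAGGED_AREA["lon"][1])

def fits_model(pings):
    """Count distinct weeks with slow night-time passes through the area."""
    weeks = set()
    for p in pings:
        t = datetime.fromisoformat(p["time"])
        if in_area(p) and t.hour in NIGHT_HOURS and p["speed_kmh"] <= SLOW_KMH:
            weeks.add(t.isocalendar()[1])  # ISO week number
    return len(weeks) >= WEEKS_TO_MATCH

# Device pings generated by a weekly commute in heavy traffic (invented data).
pings = [
    {"time": f"2016-03-{day:02d}T22:40", "lat": 45.4641, "lon": 9.1905, "speed_kmh": 6}
    for day in (3, 10, 17, 24)  # four consecutive Thursdays
]

if fits_model(pings):
    print("target device with 'mercenary love' advertising")  # a false positive
```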

Note that we intentionally switched from referring to the human individual to referring to his/her devices and to the data collected, which are directly and exclusively related to the device. There is no need even to relate the cluster of data to personal identifying information such as ownership of the car or contracts for the services provided to the device. Nor is there any need to investigate the gender of the individual, since to apply the model it is sufficient that a component in the device is characterized by features “normally” referable to a specific gender (e.g. the pink colour of the smartphone back cover). Of course, the more our society moves towards the Internet of Things, the more the classifications will be related to individual things and groups of related things (related, for instance, because they are regularly physically close to one another—such as a smartphone, a wallet, a laptop, …). Targeting groups of related things can be even more accurate than targeting the human individual directly. Indeed, some common sharing of characteristics among things increases the salience of each bit of individual information; for instance, their “place of residence” (home) is detected by the fact that they all regularly stay together at the same geo-localized point at night, but variations in their uses offer the possibility of fine-tuning the targeting by, for instance, noting that while at “home” the group shows a different behavioural pattern (e.g. related to a different gender/age because a companion or family member is using the device). Accordingly, targeting the device at different hours of the day may expand the reach of the classifying society, again without the need to resort to personal identifying information and with a higher granularity. On the contrary, the classifying society avoids personal identifying information, because such information reduces the value of the different clusters of data referable in the various models to the device or to the group of things.

This example clearly illustrates how and why a change of paradigm is required: to consider and regulate anonymities—not only identities—in data protection.

This rather simple example of analytical mismatch also raises the issue of the harm caused to the undiscovered individual behind the devices. It might be problematic to claim his/her privacy has been intruded upon. Arguably, the new (erroneous) knowledge produced to sell her/his devices’ data for targeted advertising could even be protected as a trade secret. Apparently the model is acting on an error of “perception” not very different from the one a patrolling police officer might make in seeing an individual regularly in a given area, week after week, and deciding to stop him for a “routine control”. Yet, while the impact of a regular and lawful search by police is trivial and might end up correcting the misperception, the erroneous fit to the model of the mercenary-love user would lead to a variety of problems—such as unease in displaying one’s own web searches in public or with relatives, children, spouse, … due to the advertising they trigger, which cannot be corrected by traditional privacy-preserving actions (cookie removal, cache cleaning, anonymous web searches, …), and a reduced ability to exploit searches, since “personalized” results pollute the results of such searches on the given device [39].

These are all forms of privacy harm that are difficult to uncover and even more problematic to prove and remedy. In the words of Ryan Calo [89]: “The objective category of privacy harm is the unanticipated or coerced use of information concerning a person against that person. These are negative, external actions justified by reference to personal information. Examples include the unanticipated sale of a user’s contact information that results in spam and the leaking of classified information that exposes an undercover intelligence agent.” In ordinary people’s lives this might involve, for instance, “the government [leverage of] data mining of sensitive personal information to block a citizen from air travel, or when one neighbor forms a negative judgment about another based on gossip” [89, p. 1143].Footnote 54

We claim here that the expanding use of algorithms amplifies the “loss of control over information about oneself or one’s own attributes” [89, p. 1134] to a level beyond personal identification. Indeed, in several legal systems data protection is only triggered when personally identifiable information is at stake.Footnote 55 In the classifying society, vulnerability does not require personal identification.Footnote 56

Data mining is already contributing to a change in the meaning of privacy, expanding the traditional Warren and Brandeis legacy of a “right to be let alone” to privacy as “unwanted sensing” by a third party [90, pp. 225–226]. However, in the classifying society, the interplay among the various techniques for gathering and enriching data for analysis and during the mining processFootnote 57 (from self tracking [91] to direct interaction or company logs [11], or the intermediary logging of data such as google searches or cookies or the purchase of data by a broker to integrate and augment existing data bases)Footnote 58 has reached a point in which data protection as we know it is powerless and often effectively inapplicable to modern data mining technologies.

Data mining, the “nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data” [93],Footnote 59 is producing a deep change in our thinking paradigm. We used to consider data in context and to trace them to individuals. Algorithms, big data, and the large computing ability that connects them do not necessarily follow causal patterns and are even able to identify information unknown to the human individual it refers to.Footnote 60 Pattern-based (or “event-based”) and subject-based data mining [69, p. 438] search for patterns that describe events and relate them to one another. They yield both descriptive and predictive results. In the latter case, data analysis generates new information based on previous data. This information should be able to predict outcomes (e.g. events or behaviours) by combining previous patterns and new information. The underlying assumption is that results found in older data apply to new data as well, although no causal explanation is provided even for the first set of results.
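By way of illustration, the following sketch shows this pattern-based, non-causal logic in miniature: a rule is extracted from co-occurrences in older (invented) records and then projected onto a new record, with no causal claim attached.

```python
from itertools import combinations
from collections import Counter

# Older data: each record is a set of observed "events" (invented examples).
history = [
    {"night_pharmacy_visit", "baby_formula", "prepaid_card"},
    {"night_pharmacy_visit", "baby_formula"},
    {"night_pharmacy_visit", "baby_formula", "lottery_ticket"},
    {"gym_checkin", "protein_bars"},
]

def best_rules(records, min_confidence=0.8):
    """Extract pairwise rules A -> B ranked by confidence (pure co-occurrence)."""
    singles, pairs = Counter(), Counter()
    for r in records:
        singles.update(r)
        pairs.update(combinations(sorted(r), 2))
    rules = []
    for (a, b), n_ab in pairs.items():
        for lhs, rhs in ((a, b), (b, a)):
            conf = n_ab / singles[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return rules

rules = best_rules(history)
# Apply a learned pattern to a new, unrelated record: prediction, not explanation.
new_record = {"night_pharmacy_visit"}
for lhs, rhs, conf in rules:
    if lhs in new_record:
        print(f"predict '{rhs}' with confidence {conf:.0%} (no causal claim)")
```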

Datasets are actively constructed and data do not necessarily come from the same sources, increasing the risks related to de-contextualization. Moreover, since an analyst must define the parameters of the search (and of the dataset to be searched), human biases can be “built into” the analytical tools (even unwillingly),Footnote 61 with the additional effect of their progressive replication and expansion once machine learning is applied. Indeed, if the machine learning process is classified as “non-interpretable” (by humans, sic!), for instance because the machine-learned scheme is assessing thousands of variables, there will be no human intervention or meaningful explanation of why a specific outcome is reached (e.g. a person, object, event, … is singled out and a group of devices is targeted as a sex addict).

In the case of interpretable processes (translatable in human understandable language) a layer of human intervention is possible (although not necessary), as in, for example, the police search. Again this possibility is a double-edged sword since human intervention can either correct biases or insert new ones by interfering with the code, setting aside or inserting factors [95].

The lack of (legal and ethical) protocols to guide human action in designing and revising algorithms clearly calls for their creation, but this requires a common setting of values and rules, since algorithms basically enjoy a-territoriality in the sense that they are not necessarily used in one given physical jurisdiction. Moreover, the expansion of the autonomous generation of algorithms calls for building similar legal and ethical protocols to guide the machine generation of models. In both cases, there is also a need for technological verifiability of the effectiveness of the legal and ethical protocols, making at least the results of the model’s application human-readable when the model itself is not readable by humans.

Interpretable processes are certainly more accountableFootnote 62 and transparentFootnote 63 (although costlier),Footnote 64 but they still present risks. Moreover, by using (or translating into) interpretable processes, data mining will increase the possibility of getting back to searching for causality in the results instead of accepting a mere statistical associationFootnote 65: the group of devices passing through a city’s solicitation block could be cleared of the social stigma the model has attached to them, with all of its ensuing consequences. This is an important point, since data mining results are NOT based on causality in the traditional sense but instead on mere probability.Footnote 66 Human readability would also enable a confidence check on the level of false positives in the rule produced by the algorithm and, ex post, on the number of false negatives that were missed by the predictive model.Footnote 67
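The confidence check evoked here is, in essence, a confusion-matrix exercise. A minimal sketch over invented labels:

```python
def error_profile(predicted, actual):
    """Compare model matches against ground truth to expose error rates."""
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)   # wrongly targeted
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)   # missed by the model
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return {"false_positive_rate": fp / (fp + tn),
            "false_negative_rate": fn / (fn + tp)}

# Invented example: 10 devices, the model's verdict ("matches the profile")
# versus what was later verified to be the case.
predicted = [True, True, False, True, False, False, True, False, False, False]
actual    = [True, False, False, False, False, False, True, True, False, False]
print(error_profile(predicted, actual))
# ex ante: how many false positives the rule produces;
# ex post: how many true cases the predictive model missed.
```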

What comes out of the preceding analysis is also a more nuanced notion of anonymity for the classifying society. After all, as clearly illustrated in the literature, anonymous profiles are “quantified identities imposed on individuals, often without their consultation, consent, or even awareness” [48, p. 1414].Footnote 68 The concept of anonymityFootnote 69 has influenced the very notion of personal data, which is still defined in the EU GDPR as “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier, or one or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person”.Footnote 70 Meanwhile, a “nothing-to-hide” cultural approach has been blooming with the expansion of the so-called web 2.0,Footnote 71 generating a less prudent approach to the data our devices generate and share. Most (not to say all) apps and software we use “require” access to (and also generate) an enormous amount of data that are unrelated to the purpose of the software or app. Those data are widely shared and subsequently fed into a myriad of models that are then applied to the same and other sets of data-generating “things” (not necessarily to their individual owners).

Thus the technological promise of data protection through anonymization has been defeated by technology itself. Moreover, “…where anonymity for the sake of eliminating biases is desirable, one cannot assume that technical anonymity by itself guarantees the removal of bias. Rather, if technical anonymity allows for biased misattribution, then at least in some contexts, there may need to be additional steps introduced to mitigate this possibility” [105, pp. 178–179].

Finally, it is the very notion of pseudo-anonymous dataFootnote 72 that provides the key to bypassing personal data protection entirely in the classifying society.

2.3 Data and Markets: A Mismatch Between Law, Technology, and Business Practices

A modern clash of rights seems to emerge in the classifying society—between, on the one hand, the right of individuals to control their own data, and, on the other, the interest of business in continuously harnessing that dataFootnote 73 as an asset. The latter interest is increasingly protected by IP or quasi-IP rights such as trade secrets.Footnote 74

This clash is echoed among economists. Some support business, arguing that more disclosure of customers’ data reduces information asymmetries [108, 109], while others argue that it is data protection that generates social welfare [110]. In this debate, legal scholars mostly advocate more individual control over one’s data, at least in terms of an extended propertization [111, 112].Footnote 75 However, theories [113, 116, 117] about how a market for personal data acknowledging data subjects’ property rights would work do not seem to consider the actual technological state of the art [21] and the corresponding legal landscape.

In this landscape the legal framework remains dominated by Privacy Policy Terms and Conditions whose legal enforceability remains at least doubtful in many legal orders. Meanwhile, PPTCs are de facto enforced by the lack of market alternatives and by the widespread lack of data subjects’ awareness both of the existence of tracking procedures and of the role of data aggregators and data brokers.Footnote 76 The hurdles and costs of a lawsuit close this vicious loop in which, although doubtful in legal terms, PPTCs actually govern data processing beyond the statutory constraints of any given legal system.

In addition, the composite nature of most transactions makes information collection and sharing processes opaque, to say the least. Very often, data processing is externalized [119], making it even more opaque and difficult to track data once they are shared.

Note also that some key players such as Google, Amazon, and eBay do not identify themselves as data brokers even though they regularly transfer dataFootnote 77 and generate models.

Further and subsequent processing, even if data are anonymized or pseudo-anonymized, generates models that are later applied to the same (or other) groups of shared characteristics, a process which impacts individuals whose data, although technically non-personal, match the model. In other words, the production cycle we are describing (the collection of personal/things data, their enrichment with other data and their pseudo-anonymization, the generation of the model and its continuous refinement) allows the model to be applied to individuals without the need to actually involve further personal data (in legal technical terms), entirely bypassing data protection laws—even the most stringent ones.

Paradoxically, it might seem that such a process could easily pay formal homage to a very strict reading of the necessity principle and the ensuing data minimization. Incidentally, the design of privacy policy terms that require (better yet: impose) consent in order to access the required services, in a technological ecosystem that does not provide alternatives,Footnote 78 trumps those principles, while a business model based on the classifying society and on things’ data (that is, pseudo-anonymized data) disregards them altogether. In such a scenario, the data minimization rule normally required under the EU data protection directive,Footnote 79 which distinguishes between information needed to actually deliver the good or service and additional data, becomes irrelevant; in any event, it has never acquired actual grip in business practice due to the low enforcement rate, despite its potential ability to increase data salience [119].

Technology experts have developed various concepts to enhance privacy protection—e.g. k-anonymity [123], l-diversity [124] and t-closeness [125]—but these have proved to be inconclusive solutions. A similar fate is shared by other privacy-enhancing technologies (PETs) [126, 127] that rely on an assumed preference for anonymity, clearly (yet problematically) challenged by the use of social networks,Footnote 80 while Do-Not-Track initiatives have not met with much success either [133]. And yet, all of these attempts are becoming progressively less relevant in the classifying society. Accordingly, users must keep relying on regulators to protect them and on businesses to follow some form of ethical behaviour,Footnote 81 a situation that calls for more appropriate answers from both sets of stakeholders.
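For reference, the first of these concepts, k-anonymity, can be stated in a few lines. The sketch below uses invented quasi-identifiers and also hints at why the guarantee is inconclusive: small groups re-emerge as soon as one more attribute is joined in.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """A dataset is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) >= k

# Invented release: generalized ZIP code and birth year kept, names removed.
records = [
    {"zip": "201**", "birth_year": "198*", "diagnosis": "flu"},
    {"zip": "201**", "birth_year": "198*", "diagnosis": "asthma"},
    {"zip": "201**", "birth_year": "197*", "diagnosis": "flu"},
    {"zip": "201**", "birth_year": "197*", "diagnosis": "diabetes"},
]
print(is_k_anonymous(records, ["zip", "birth_year"], k=2))   # True

# Join one more attribute (e.g. a device fingerprint) and the guarantee collapses.
for r, fp in zip(records, ["a1", "a2", "a3", "a4"]):
    r["device_fp"] = fp
print(is_k_anonymous(records, ["zip", "birth_year", "device_fp"], k=2))  # False
```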

As mentioned, device fingerprints enable identification of individuals and of the devices themselves even when, for instance, cookies are disabled. If we consider that each device is composed of several other devices with their own digital fingerprints (e.g. a smartphone has a unique identification number, while its Bluetooth and Wi-Fi antennas and other components also have their own), and that each of them continuously exchanges information that can be compiled, the shift from personal data to things’ data, and to clusters of things’ (non-personal) data, as the subject of data processing begins to emerge in all its clarity and pervasiveness. When the GSM module of a mobile phone allows geo-localization, we are used to thinking and behaving as if the system were locating the individual owner of the phone. Yet technically it is not; it is locating the device, and the data can easily be pseudo-anonymized. Once it becomes possible to target the device and the cluster(s) of data contingently related to it in a model, without the need to expressly target a specifically identified individual, the switch from personal data to things’ data is complete, requiring an alternative approach that has not yet fully emerged.
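A minimal sketch of the point, with invented identifiers and a hypothetical salt: pseudonymizing a handset’s component identifiers may satisfy the letter of “no personal data”, yet the resulting token remains stable enough to locate and target the same device day after day.

```python
import hashlib

SALT = "2016-rotating-salt"  # hypothetical; held by the processor, never shared

def pseudonymize(component_ids: dict) -> str:
    """Replace raw hardware identifiers with a token: formally 'not personal',
    but constant for the same physical device."""
    raw = "|".join(f"{k}:{v}" for k, v in sorted(component_ids.items()))
    return hashlib.sha256((SALT + raw).encode()).hexdigest()[:12]

# Invented component identifiers of one handset.
handset = {"imei": "356938035643809",
           "wifi_mac": "A4:5E:60:xx:xx:01",
           "bt_mac": "A4:5E:60:xx:xx:02"}

# Two location events on different days: same token, so the "thing" can be
# tracked and matched to models without any reference to its owner.
monday = {"token": pseudonymize(handset), "cell": "22210-4123", "hour": 23}
friday = {"token": pseudonymize(handset), "cell": "22210-4123", "hour": 23}
print(monday["token"] == friday["token"])  # True: the device is located, not the person
```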

The situation described above leads to a deadly stalemate of expanding asymmetry among players in markets and in society overall.

The overarching concept behind the classifying society is that individuals are no longer targetable as individuals but are instead selectively addressed for the way in which some clusters of data that they (technically, one or more of their devices) share with a given model fit into the model itself. For instance, if many of our friends on a social network have bought a given book, chances are that we will do so as well; the larger the number of people in the network, the more likely it is that we will conform. The result is that individuals (or, rather, the given cluster of information) might be associated with this classification (e.g. high buying interest) and be treated accordingly—for instance, by being shown a higher price (as happened to Mac OS users versus Windows users).

Note that we normally do not even know such a classification model exists, let alone that it is used. Moreover, if engineered well from a legal standpoint, a data processing might not even amount to a personal data processing, preventing any meaningful application of data protection rules. On the other hand, the construction of models on a (very) large array of data that might or might not include ours is beyond our control and is normally protected as a business asset (e.g. as a trade secret).

In addition, since the matching between individuals and the model requires just a reduced amount of data, the need to further enrich the cluster of data with personal identifying information is progressively reduced. Thus, it becomes increasingly easy for the model to be uninterested in “identifying” us—contenting itself instead with targeting a small cluster of data related to a given device/thing (or ensemble of devices)—because all of the relevant statistical information is already incorporated in the model. The model need not identify us nor search for truths or clear causality as long as a given data cluster statistically shares a given amount of data with the model. Moreover, even if the model gets it wrong, it can learn (automatically, with the help of AI and machine learning techniques) from the mistake, refining itself by adding/substituting/removing a subset of data or giving it a different weight that adjusts the statistical matching results: (big) data generate models, and the use of models generates new data that feed the models and change them in an infinite creative loop over which human (and especially data subject) control is lost [7].
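The self-refining loop can be caricatured in a few lines. In the sketch below the traits, weights, and the simple additive update rule are invented stand-ins for whatever learning technique is actually deployed.

```python
# Model: weights over shared characteristics; starts from some initial guess.
model = {"os:macos": 0.9, "night_browsing": 0.5, "pink_cover": 0.8}
THRESHOLD = 1.0
LEARNING_RATE = 0.2

def predict(cluster):
    return sum(model.get(trait, 0.0) for trait in cluster) >= THRESHOLD

def update(cluster, outcome_was_correct):
    """Feedback (a click, a purchase, a non-response) feeds the model,
    which adjusts the weight of every trait it just relied on."""
    sign = 1 if outcome_was_correct else -1
    for trait in cluster:
        model[trait] = model.get(trait, 0.0) + sign * LEARNING_RATE

# Stream of observed clusters together with the feedback later collected.
stream = [
    ({"os:macos", "night_browsing"}, True),    # targeted ad clicked
    ({"pink_cover", "night_browsing"}, False), # targeted ad ignored
    ({"os:macos"}, True),                      # targeted ad clicked
]

for cluster, feedback in stream:
    if predict(cluster):
        update(cluster, feedback)   # every use of the model regenerates the model
print(model)
```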

Such a deep change is unsettling for data protection because it undermines any legal and technological attempt (e.g. of anonymization techniques) to intervene. Also, it can lead to unpleasant—even if unsought—results. The societal impact and risks for freedom of speech have been highlighted elsewhere. Yet, the selection of information and the tailoring of the virtual (and real) world in which we interact, even if not concerted among businesses,Footnote 82 reduce the chances of producing divergent behaviours with the result of actually reinforcing the model, on the one hand, and reducing the possibility of divergence even further on the other.Footnote 83

Accordingly, technical solutions against privacy loss prove unhelpful if actions are based upon the correspondence of a small number of shared characteristics to a model. Indeed, privacy-preserving techniques, privacy-by-design, and other forms of technical protection for privacy all aim at reducing the amount of information available, so as to prevent identification or re-identification of data [135],Footnote 84 while personal identification as such is fading away in the encounter between technology and business models driven by algorithms.

This state of affairs calls for further investigation and a different research approach.

3 A Four-Layer Approach Revolving Around Privacy as a Framework

As anticipated, the data economy offers many advantages for enterprises, users, consumers, citizens, and public entities, providing unparalleled scope for innovation, social connection, original problem-solving, and problem discovery using algorithms and machine learning (e.g. on emerging risks or their migration patterns, as in pandemic emergencies).Footnote 85 The actual unfolding of these possibilities requires the gathering and processing of data on both devices and individuals, which raises divergent concerns [140] related, on the one hand, to the will to share and participate (in social networks, for instance, or in public alerting systems in the case of natural disasters or pharmacovigilance) and, on the other hand, to the reliability of the organisations involved, regardless of their private or public nature.Footnote 86

The complete pervasiveness of interactive, context-sensitive mobile devices, along with the exponential growth in (supposedly) user-aware interactions with service providers in the domains of transportation, health care, or personal fitness, for instance, dynamically generates new sources of information about individual behaviour or, in our framework, the behaviour of (their) devices—including personal preferences, location, historical data, and health-related behaviours.Footnote 87 It is already well known that, in the context of mobile cells, Wi-Fi connections, GPS sensors, Bluetooth, et cetera, several apps use location data to provide ‘spatial-aware’ information. These apps allow users to check in at specific locations or venues, to track other users’ movements and outdoor activities, and to share this kind of information [151]. Health-state measurement and monitoring capabilities are being incorporated into modern devices themselves, or provided through external devices in the form of smart watches, wearable clips, and bands.

The information provided can be employed for health care, for personal fitness, or in general for obtaining measurable information, leading to marvellous potential in both public and private use. Nevertheless, once cast in light of the blossoming role of algorithms in the classifying society, this phenomenon contributes to the above-mentioned concerns and calls for a clear, technologically-neutral regulatory framework that moves from privacy and relates to all legal fields involved [152], empowering balanced dealing between individual data subjects/users and organised data controllers.

The legal landscape of algorithm regulation requires that traditional approaches to privacy, conceived as one component of a multifaceted and unrelated puzzle of regulations, be transcended, regardless of the diversity of approaches actually taken in any given legal order (opt-in versus opt-out; market versus privacy as a fundamental right, …). Only if data protection features at the centre of the regulatory process can it offer a comprehensive approach that does not overlook the link between social and economic needs, law, and the technological implementation of legal rules, without underestimating the actual functioning of specific devices and of algorithm-based business models.

The general trends of technology and business modeling described so far demonstrate the need to embed effective protection in existing legal data control tools (privacy as a fundamental right, for instance, or privacy by default/design) and the eventual need/opportunity to introduce sui generis forms of proprietary protection (personal data as commodities) to counterbalance the expanding protection of businesses in terms of both trade secrets and recognition of fundamental freedoms.Footnote 88

In this framework, predictive algorithms are the first to raise concerns—especially in terms of consent, awareness (well beyond the notice model), and salience. Our privacy as readers and our ability to actually select what we want to read is an important issue that promptly comes to the fore when predictive algorithms are used [154].Footnote 89

Nevertheless, using privacy as a framework reference for other areas of law in dealing with the classifying society would require an integrated strategy of legal innovation and technical solutions.

3.1 Revise Interpretation of Existing Rules

The first layer of this strategy would leverage the existing sets of rules.

In the current context, in which algorithms are central to our societies at every relevant level, we claim that it is necessary to explore the potential interplay between the fundamental principles of privacy law and other legal rules such as those on unfair practices, competition, and consumer protection. For instance, although it may be based on individual consent, a practice that in reality makes it difficult for users to be aware of what data processing they are actually consenting to might be suspicious under any of the previously mentioned sets of legal rules. Business models and practices that segregate customers in a technically unnecessary way (e.g. by offering asymmetric internet connections to business and non-business clients), directing some categories to use the services in certain ways that make data collection easier and more comprehensive (e.g. through concentrated cloud-based services or asymmetric pricing), might appear suspicious under another set of regulations once privacy as a framework illustrates their factual and legal implications.

Furthermore, a business model that targets data clusters directly but refers to individuals only indirectly, using models generated by algorithms without clearly acknowledging this, can be questionable under several legal points of view.

Within this landscape, dominated by opacity and warranting transparency and accountability, a ‘privacy as a framework’ approach is required to overcome a siloed description of the legal rules and to cross-supplement the application of legal tools otherwise unrelated to each other. This layer of the approach envisions not unsettling regulatory changes by the rule makers but the use of traditional hermeneutic tools enlightened by the uptake of the algorithmic-technological revolution.

In this vein some authors have begun to explore the application of the unconscionability doctrine to contracts that unreasonably favour data-collecting companies [157].Footnote 90 To provide another example, once we tackle the topic of predictive algorithms, it is unavoidable to delve into the legal domain of unfair commercial practices. All western societies devote specific regulation to them.Footnote 91 In a March 2014 panel, the Federal Trade Commission [163] identified some topics relevant to our discussionFootnote 92:

  • “How are companies utilizing these predictive scores?

  • How accurate are these scores and the underlying data used to create them?

  • How can consumers benefit from the availability and use of these scores?

  • What are the privacy concerns surrounding the use of predictive scoring?

  • What consumer protections should be provided; for example, should consumers have access to these scores and the underlying data used to create them?”

Predictive algorithms decide on issues relevant to individuals “not because of what they’ve done, or what they will do in the future, but because inferences or correlations drawn by algorithms suggest they may behave in ways that make them poor credit or insurance risks, unsuitable candidates for employment or admission to schools or other institutions, or unlikely to carry out certain functions.” The panel concluded by insisting on “transparency, meaningful oversight and procedures to remediate decisions that adversely affect individuals who have been wrongly categorized by correlation.” The need to “ensure that by using big data algorithms they are not accidently classifying people based on categories that society has decided—by law or ethics—not to use, such as race, ethnic background, gender, and sexual orientation” was considered urgent [164, pp. 7–8].
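The panel’s closing concern can be audited, at least crudely, with the “four-fifths” disparate impact test familiar from US employment practice. The sketch below uses invented outcomes and a hypothetical protected attribute.

```python
from collections import defaultdict

def disparate_impact(decisions, protected_attr, favourable="approved"):
    """Ratio between the lowest and highest favourable-outcome rate across
    groups; below 0.8 is the conventional warning threshold."""
    totals, favourable_counts = defaultdict(int), defaultdict(int)
    for d in decisions:
        g = d[protected_attr]
        totals[g] += 1
        favourable_counts[g] += d["outcome"] == favourable
    rates = {g: favourable_counts[g] / totals[g] for g in totals}
    return min(rates.values()) / max(rates.values()), rates

# Invented outcomes of a predictive score used to grant a service.
decisions = [
    {"gender": "f", "outcome": "approved"}, {"gender": "f", "outcome": "denied"},
    {"gender": "f", "outcome": "denied"},   {"gender": "f", "outcome": "approved"},
    {"gender": "m", "outcome": "approved"}, {"gender": "m", "outcome": "approved"},
    {"gender": "m", "outcome": "approved"}, {"gender": "m", "outcome": "denied"},
]
ratio, rates = disparate_impact(decisions, "gender")
print(rates)               # {'f': 0.5, 'm': 0.75}
print(ratio, ratio < 0.8)  # 0.666..., True -> potential adverse impact
```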

Yet, if we are concerned about de-biasing algorithms in order to avoid forms of discrimination that are already prohibitedFootnote 93 and to promote transparency (understood both as accountability and as disclosure to customers),Footnote 94 we should be even more concerned about the risks posed by “arbitrariness-by-algorithm”. These risks are all the more pressing given the million facets of our characteristics, whether permanent or temporary (such as our momentary mood, or our movement through a specific neighbourhood), on which the classifying society makes it possible to sort us.
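By way of illustration only, the following sketch shows one elementary form such a de-biasing check can take: the selection rates produced by an algorithm are compared across groups and flagged when the ratio between them falls below a conventional threshold. The function names, the sample data, and the threshold are our own illustrative assumptions, not a prescribed auditing standard.

```python
# Minimal sketch of a disparate-impact check on algorithmic decisions.
# Assumes we can observe, for each individual, a group label and the
# binary outcome of the algorithm; all names here are illustrative.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group_label, favourable: bool)."""
    totals, favourable = defaultdict(int), defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        favourable[group] += int(outcome)
    return {g: favourable[g] / totals[g] for g in totals}

def disparate_impact_ratio(decisions):
    """Ratio of the lowest to the highest group selection rate
    (values below roughly 0.8 are often treated as a warning sign)."""
    rates = selection_rates(decisions)
    return min(rates.values()) / max(rates.values())

sample = [("group_a", True), ("group_a", True), ("group_a", False),
          ("group_b", True), ("group_b", False), ("group_b", False)]
print(selection_rates(sample))         # per-group favourable-outcome rates
print(disparate_impact_ratio(sample))  # 0.5, i.e. a result to flag for review
```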

Once we begin to ask ourselves whether it is fair to take advantage of vulnerabilities [169], of health-relatedFootnote 95 or other highly sensitive information [171], the need for a legal and ethical paradigm emerges vigorously and calls for reinterpreting the existing rules to expand their scope.

Indeed, once it is acknowledged that ToS and PPTCs are normally not read before being accepted [172],Footnote 96 and that the actual flow of data is often not even perceived by users because it runs in the background, the only apparent safeguard would seem to be an application of the necessity principle that not only limits data gathering and processing but is normatively prescriptive well beyond merely authorizing a given data processing operation.

For example, everybody realises that pre-installed appsFootnote 97 on ICT devices cannot even be activated before the ToS and PPTCs are accepted, let alone before users grasp the kind and extent of personal data collection and processing they involve.Footnote 98 A sound application of the necessity principle (and of the privacy by design and by default approach to regulation) would impose a deep change in the design of web sites, of distributed products, of the structure of licence agreements, and of patterns of iterative bargaining, even before dealing with the content of clauses on data processing and further uses of collected or provided data.
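To make the point concrete, the following sketch illustrates, under purely hypothetical assumptions about purposes and data fields, how a necessity-principle filter could be expressed in code so that only the data strictly needed for a declared purpose ever leaves the device.

```python
# Illustrative sketch of data minimisation (the "necessity principle")
# applied in code: only the fields strictly needed for a declared purpose
# are transmitted. Purposes and field lists are hypothetical examples.
NECESSARY_FIELDS = {
    "weather_forecast": {"coarse_location"},
    "app_update_check": {"app_version", "os_version"},
}

def minimise(payload: dict, purpose: str) -> dict:
    """Return only the fields strictly necessary for the declared purpose;
    anything else is dropped before transmission."""
    allowed = NECESSARY_FIELDS.get(purpose, set())
    return {k: v for k, v in payload.items() if k in allowed}

device_data = {"coarse_location": "Trento", "contacts": ["..."],
               "app_version": "2.1", "os_version": "14",
               "advertising_id": "abc-123"}
print(minimise(device_data, "weather_forecast"))
# {'coarse_location': 'Trento'}; contacts and advertising_id are never sent
```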

For this reason, the re-interpretation of existing legal rules and the eventual enactment of new ones are both necessary components of the overall strategy.

Such a process reverses the ongoing deregulation pursued through private law, embedded in the content of the terms and in the design of their deployment over the course of the business relationship (the pattern of luring customers to buy). Because it is rather intrusive on the part of the State, it might be better pursued by reinterpreting the existing private law rules governing the relationship between data subjects and data processors and/or service/product providers. This means interpreting existing remedies (for instance, against unfair business practices, or under tort law rules, consumer protection statutes, and mistake and error doctrines, …) in light of the deep and subterranean change in the technological and legal landscape described so far.

For instance, if data protection rules are insufficient to prevent every installed app from claiming access to the full range of personal data present on a device in order to work, perhaps the (unknown or not read) terms requesting such an extension of access can be considered illicit—as unfair terms, unfair practices, or as anticompetitive, thereby triggering traditional contractual or liability remedies.

Reinterpretation, however, presents its own limits and certainly cannot cover every facet of the classifying society.

3.2 Promote Changes in Regulation and Empower Data Subjects Technologically

On a different level, the approach could lead to: (1) a reclassification, along innovative lines, of the legal rules applicable when algorithms play a key role affecting privacy; and (2) the production of new rules in the same instances.

Such an approach would influence and inform the entire regulatory framework for both public and private activities, aiming at overcoming the pitfalls of an approach that maintains separate rules, for instance, on data protection, competition, unfair practices, civil liability, and consumer protection. However, to date, such an unsettling move is not realistically foreseeable.

Innovative regulation would be required, for example, to rearrange the protection of algorithms as trade secrets so as to serve the emerging trend of treating personal data as (at least partially) the property of the data subject, thereby enabling more explicit forms of economic exchange. This innovation, however, would also require the development and deployment of appropriate technological measures.

Data generation is not treated as an economic activity of the data subject, while data gathering by data processors is characterized as such. This is mostly because technologies to harness and exploit data are largely available on the business side, whereas technology on the data subject side has so far failed to provide data subjects with workable tools for maintaining economic control over the information related to them.

The need to strengthen the legal and technological tools of data subjects, given the scale and pervasiveness of algorithms in our societies, is clearly manifest in the literature across all scientific fields.

From our point of view, however, the background premise of data subject empowerment is similar to the one underlying consumer protection. Both consumer protection and data protection laws endow consumers/data subjects with a large set of rights aimed at reducing information asymmetries and equalizing bargaining power. The effective implementation of these rights is mainly based on information that businesses are required to provide at a significant cost to them. Indeed, both the relevant legal framework and the EUCJ consider consumers to be rational individuals capable of becoming “reasonably well-informed and reasonably observant and circumspect” (Case C-210/96, par. 31), for instance by routinely reading food labels and privacy policies (Case C–51/94),Footnote 99 and of making efficient decisions so long as the relevant (although copious) information is presented to them in “a comprehensible manner”.Footnote 100

However, providing information in order to remedy bargaining asymmetries and sustain a proper decision-making process does not seem to effectively close the gap between formal rules and their impact on the daily life of consumers/users.Footnote 101 Once information is technically providedFootnote 102 and contracts/ToS/PPTCsFootnote 103 are formally agreed upon, customers remain trapped in them as if they had actually profited from mandated information transparency and had freely and rationally decided upon such terms. Despite all the costs imposed on industry, the attempt to reduce asymmetries and make consumer rights effective often remains a form of wishful thinking,Footnote 104 with a significant impact on consumer trust and market growth.

Indeed, very few people read contracts when buying products or services online [172]. The expectation that they would do so is probably unrealistic: a typical individual would need, on average, 244 hours per year just to read every privacy policy they encounter [173], let alone the full set of information that actually forms part of the contract, and the contract itself [172].
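The order of magnitude of such an estimate is easy to appreciate with back-of-envelope arithmetic; the figures below (policies encountered per year and minutes per policy) are illustrative assumptions of ours, not the parameters of the cited study.

```python
# Back-of-envelope arithmetic showing how a figure of this order of
# magnitude arises; the inputs are illustrative assumptions, not the
# exact parameters used in the cited estimate.
policies_per_year = 1460   # assume roughly four new privacy policies per day
minutes_per_policy = 10    # assume a quick but attentive read of each policy
hours_per_year = policies_per_year * minutes_per_policy / 60
print(f"{hours_per_year:.0f} hours per year")  # ~243 hours, i.e. about 30 working days
```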

Moreover, users are likely discouraged from reading either the terms and conditions or the information related to the product/service by systematic patterns in language and typographic style that frame such terms as mere documentary formalities [187] and/or make them difficult to access (consider, for example, the practical difficulty of reading labels and information displayed on packaging).

The difficulty of reading terms and technical information stems not only from their surface structure but also from readers’ lack of familiarity with the underlying concepts [188] and from the fact that they appear in a format that is unfamiliar and not plainly understandable to the majority of consumers. Moreover, the vagueness and complex language of the clauses, along with their length, encourage inappropriate and inconsistent decision patterns on the part of users.

As a result, the lack of effective information, unawareness, and the perception of only loose control over the process of setting preferences have a decisive impact on consumer trust, weakening consumption patterns and expanding the asymmetries of power between individual users/customers/consumers and businesses.

Reversing this state of affairs, so as to actually empower users/data subjects to select and manage the information they want (and want to share), and to expand their bargaining power as a consequence of their ability to govern the flow of information, requires a deep change in approach. A change as deep as it is simple. Since businesses are not allowed to change the notice and consent approachFootnote 105 (although they largely manipulate it), a careful selection of useful information can only be carried out at the other end (data subjects/users).

The concept is rather simple: the law forces the disclosure of a large amount of information useful to users/data subjects, but the latter need an application tool to manage that information efficiently in order to actually become empowered and free in their choices.Footnote 106 While businesses cannot lawfully reduce the amount of information they owe to users as consumers or data subjects, users can reduce and select that information for their own improved use and for more meaningful comparison in a functioning market.
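A toy sketch may clarify the kind of user-side tool we have in mind: the mass of mandated disclosures is filtered down to the clauses touching topics the data subject has declared relevant. The clause categories and keywords are, of course, hypothetical.

```python
# Toy sketch of a user-side filter over mandated disclosures: the data
# subject declares which topics matter, and the tool surfaces only the
# clauses touching those topics. Categories and keywords are hypothetical.
USER_CONCERNS = {
    "third_party_sharing": ["third part", "share", "partner"],
    "location": ["location", "gps", "geolocation"],
}

def flag_clauses(clauses, concerns=USER_CONCERNS):
    """Return (topics, clause) pairs for clauses mentioning a declared concern."""
    flagged = []
    for clause in clauses:
        text = clause.lower()
        topics = [t for t, kws in concerns.items()
                  if any(kw in text for kw in kws)]
        if topics:
            flagged.append((topics, clause))
    return flagged

policy = ["We may share your data with selected partners.",
          "The app stores your GPS location history.",
          "The interface colour scheme can be customised."]
for topics, clause in flag_clauses(policy):
    print(topics, "->", clause)
```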

A key feature of this layer of our approach is to embrace, both technologically and in regulatory terms, a methodology that increases data subjects’ control over the flow of information, assisted by legal rules that sustain such control.

Yet, we are aware of the difficulties this layer entails. It is genuinely difficult to regain a degree of control over a large part of the data used by algorithms or by the models they create. Even as off-line customers, our data are collected and processed by being related to devices we wear (e.g. e-objects), carry (e.g. smart phones), or use (e.g. fidelity cards, credit cards, apps, …) when buying products or services (e.g. paying highway tolls or parking fees by electronic means).

It is important to note that a personal data-safe technological approach (i.e. data remain under the control of data subjects, who licence access to them) would require that businesses actually agree to opt in to a legal system that allows and enforces the ability of data subjects to avoid surrendering data beyond a clear and strict application of the necessity principle. Similarly, where the functioning of algorithms remains veiled behind trade secret protection, only a technological and cooperative approach can help. For instance, collective sharing of experiences, combined with shared (high) computational capacity, can “reverse engineer” algorithms’ outputs to detect old and new biased results.
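The following sketch illustrates, under stated assumptions, the cooperative “reverse engineering” just mentioned: paired test profiles that differ only in one sensitive attribute are submitted to an opaque scoring service and the pooled outcomes are compared. The scoring function here is a stand-in for the real, inaccessible one, and the attribute and profiles are invented for the example.

```python
# Minimal sketch of cooperative black-box probing: many participants submit
# paired test profiles differing only in one sensitive attribute and pool
# the results. `opaque_score` stands in for the real, inaccessible service.
import random

def opaque_score(profile):
    # Stand-in for the remote algorithm under scrutiny; it is given a bias
    # against one neighbourhood purely for demonstration purposes.
    base = 0.6 + 0.05 * random.random()
    return base - (0.2 if profile["neighbourhood"] == "district_b" else 0.0)

def paired_probe(base_profile, attribute, values):
    """Score two profiles identical except for one sensitive attribute."""
    return {v: opaque_score({**base_profile, attribute: v}) for v in values}

random.seed(0)
diffs = []
for _ in range(100):  # pooled probes contributed by many participants
    base = {"income_band": random.choice(["low", "mid", "high"])}
    scores = paired_probe(base, "neighbourhood", ["district_a", "district_b"])
    diffs.append(scores["district_a"] - scores["district_b"])
print(f"average score gap: {sum(diffs) / len(diffs):.2f}")  # ~0.20, a systematic gap
```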

In other words, without strict enforcement of the necessity principle in the delivery of goods and services (which requires both a technological and a legal grip), we will never see a different market for data emerge, and certainly not one in which data subjects obtain most of the economic benefits from the data (even pseudonymous data) related to them.

3.3 Embed Ethical Principles in Designing Algorithms and Their Related Services

There are several reasons to attempt to embed ethical principles in the design of algorithms and their related services [192, 193].

First of all, every technological and legal strategy to rebalance the allocation of information, knowledge, and power between users/data subjects and businesses/data processors and brokers has so far proved unsuccessful.Footnote 107

The reinterpretation of the solutions available in any given legal system, in light of technological developments and of a sound privacy-as-a-framework approach, would require a strong commitment by scholars, courts, and administrative authorities. In any event, it would take time to produce its effects.

By contrast, ethical principles and social responsibility are already emerging as drivers of business decision-making. Once they can be incorporated technologically into the design and testing of algorithms and their related services, ethical principles can exploit exactly the same technology that the algorithms themselves use in order to monitor the actual operation of algorithms and business models.
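As a minimal illustration of monitoring by the same technological means, the sketch below logs every automated decision together with the model version and a digest of the inputs used, so that adverse outcomes can later be traced and reviewed. The record structure is an assumption of ours, not an existing standard.

```python
# Small sketch of decision logging for later oversight: each automated
# decision is recorded with the model version and a digest of its inputs,
# so that individual outcomes can be traced back and reviewed.
import json, hashlib, datetime

def log_decision(model_version, inputs, outcome, log_file="decisions.log"):
    record = {
        "timestamp": datetime.datetime.utcnow().isoformat(),
        "model_version": model_version,
        "inputs_digest": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()).hexdigest(),
        "outcome": outcome,
    }
    with open(log_file, "a") as fh:       # append-only audit trail
        fh.write(json.dumps(record) + "\n")
    return record

print(log_decision("credit-model-2024-03", {"age_band": "30-39"}, "declined"))
```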

Moreover, the adoption of such principles by the community of code developers, for instance, does not in principle require the approval of businesses, since these would be ethical rules of the “profession”.

4 A Summary as a Conclusion

In the classifying society a new target for protection emerges: personal anonymities, the clusters of characteristics shared with an algorithm-generated model that are not necessarily related to personally identifiable information and thus are not personal data in the strict sense, even under EU law.

The centrality of personal anonymities is growing as the Internet of Things matures, exponentially increasing the potential of algorithms and of their uses unrelated to personally identifying information.

Personal anonymities stem from innovative business models and from the application of new technologies. In turn, they require a combined regulatory approach that blends together (1) the reinterpretation of existing legal rules in light of the central role of privacy in the classifying society; (2) the promotion of disruptive technologies, for disruptive new business models, enabling more market control by data subjects over their own data; and, eventually, (3) new rules aiming, among other things, to give data generated by individuals some form of property protection similar to that enjoyed by the data and models generated by businesses (e.g. trade secrets). The blend would be completed by (4) the timely insertion of ethical principles into the very generation of the algorithms sustaining the classifying society.

Different stakeholders are called upon to intervene in each of the above-mentioned layers of innovation.

None of these layers appears to play a leading role to date. However, the technical solutions enabling a data-subject-led market for personal data, if established, might catalyse the generation of alternative business models more in line with the values reflected in personal data protection rules. Yet the first step remains to put privacy at the centre of the wide web of legal rules and principles in the classifying society.