Part I: Foundations for Distributed Data Economics

Distributed ledgers offer new opportunities for the monetization of data, and new models whereby individual consumers gain more control over, and benefit from, their personal data. This contrasts with the predominant model of today, which awards the greatest economic gains to platform marketing companies such as Facebook and Google, which generally leverage the economics of data extraction. With the market cap of the digitally traded tokens of distributed ledger companies exceeding US$240 billion, significant private sector investment is supporting the creation of the new distributed data ecology.Footnote 1

Before we can explore distributed data economics, we need to understand where the data comes from, how it is produced and how it has been monetized in the past.

Blockchain systems, or distributed ledgers, are fundamentally databases. While a great deal of attention has been paid to the design, architecture, support, distribution and fundraising surrounding these databases, insufficient attention has been paid to the nature and quality of the data going into these systems, and how that data is being monetized. To paraphrase the chief innovation officer of one of the top banks in Europe, given how problematic consumer data often is in terms of its quality, we run the risk of creating immutable problems.Footnote 2

The Potential of Distributed Data

Imagine a world where consumers dictate how their personal data is used, not a handful of corporate conglomerates. Imagine a world where there is vibrant competition, and choice among service providers, for everything from personal banking to healthcare to energy services. Imagine a world where companies pay consumers directly, instead of marketing platforms, in order to acquire their business. Imagine a world where artists are paid royalties for their work directly, instead of seeing most of the profits of small artists disappear into the coffers of the corporations that run the recording labels and distribution systems. Imagine a world, even, where a public health crisis can be resolved through nearly automatic collective action by a community to contain the spread of an infectious disease. Virus epidemiology information and gene sequencing could be propagated automatically through a distributed data network in seconds or minutes, unlike the current system, which requires layers of human approvals and sometimes sees political intervention at the expense of public health, as with the COVID-19 coronavirus outbreak.Footnote 3

These are all possibilities in a distributed data economy, but there are many obstacles to overcome, not the least of which is the historical legacy that surrounds personal data.

The Data Aggregators

Data aggregators grew out of an opportunity to monetize the voluminous data that began to emerge from the connected world, such as data from payments and telecommunications systems, which offer rich sources of information about human behaviour.

These data sources have been generating ever-greater volumes of data from billions of individuals at ever-faster rates. Consultancy IDC projects that more than 44 zettabytes of data will be generated around the world annually by 2020, up from 4.4 zettabytes in 2013.Footnote 4 One zettabyte is 10^21 bytes (roughly 2 to the 70th power). If this book were in printed form, and filled with a zettabyte of data, you would have 10 volumes each tall enough to reach the sun.Footnote 5

Who are the titans of this first generation of data aggregation? Acxiom (the relevant division now owned by marketing conglomerate Interpublic Group) was the undisputed Zeus on the Mount Olympus of data aggregation. Credit bureaus such as Equifax, Experian and TransUnion join it on this lofty vantage. Not unlike the gods of Olympus, these data aggregators are extraordinarily difficult to reach, for example if you have a dispute about bad data that entered your record through fraud or identity theft. Yes, there may be a web form that you can eventually puzzle through, but oversight is weak and recourse limited. In some cases, the credit bureaus have purchased collection agencies, which enforce action based on…credit bureau data. Consumers are caught in a self-contained universe if they attempt to dispute a claim. Companies like Plaid, CreditKarma, Mint and MyLife.com now assemble consumer data and derive insights from it. Dozens of vendors sell aggregated “anonymized” mobility data: information about how blocks of consumers move around a city, neighborhood or specific location.

Insight into consumers enables one to quickly pierce the anonymity of the crowd. With a few demographic dimensions (age, approximate income, city), an individual can be traced to their home address. Other, more indirect privacy penetrations are possible. For example, researchers discovered that four points of shopping data (such as date and location of purchase) could uniquely re-identify an individual out of millions of records.Footnote 6
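To make the mechanics of such re-identification concrete, consider a minimal sketch in Python. The data here is entirely synthetic and all parameters are our assumptions; this illustrates the general “unicity” idea, not the method or dataset of the study cited above.

import random

# Illustrative sketch: estimate "unicity", the share of people whose
# transaction traces become unique once k (shop, day) points are known.
# All data is synthetic; not the dataset or method of the cited study.
random.seed(0)
N_PEOPLE, N_SHOPS, N_DAYS, TXNS_EACH = 5_000, 300, 90, 20

traces = [
    frozenset((random.randrange(N_SHOPS), random.randrange(N_DAYS))
              for _ in range(TXNS_EACH))
    for _ in range(N_PEOPLE)
]

def unicity(traces, k, trials=500):
    """Fraction of sampled people uniquely pinned down by k known points."""
    unique = 0
    for _ in range(trials):
        target = traces[random.randrange(len(traces))]
        known = set(random.sample(sorted(target), k))  # points an attacker knows
        matches = sum(1 for t in traces if known <= t)
        unique += (matches == 1)
    return unique / trials

for k in (1, 2, 3, 4):
    print(f"{k} known points -> {unicity(traces, k):.0%} of people re-identified")

Even with these modest, made-up parameters, a handful of points typically suffices to single out one individual, which is why “anonymized” transaction data is far less anonymous than it appears.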

The Emergence of Fine-Grained Human Behavioural Data

Fine-grained human behavioural insights can be extracted by understanding the digital traces, or “breadcrumbs”, that people leave on ubiquitous electronic networks that pervade every aspect of modern society.Footnote 7

The first modern payment card was issued in 1950.Footnote 8 Adoption was slow initially, but began picking up steam as data communications services improved in the 1970s and 1980s. Credit cards are expected to carry more than 850 billion purchase transactions by 2028, up from a current level of 369 billion.Footnote 9 In Europe, 2018 alone saw more than US$3 trillion of purchase volume.Footnote 10 With this growth in payments systems have come insights into consumer purchasing behaviours, derived from the purchasing data, that have proven highly valuable to marketers.

Mobile phones, likewise, have emerged as a rich source of human factors data within the past ten to fifteen years.Footnote 11 Other data sources began emerging as well. For example, Catalina Marketing harvested the “scan” data from checkout registers in retail stores, generating a fine-grained map of consumer shopping behaviours (albeit one that struggled with the consumer migration to e-commerce).Footnote 12 Loyalty programs (earning “points” or “miles”) have further generated actionable data on consumers that merchants have used to fine-tune marketing.Footnote 13

With the World Wide Web (popularly referred to as the “internet”) exploding into widespread adoption in the late 1990s and beyond, a new vehicle was created for the acquisition of personal consumer data.

This proliferation of consumer data created a virtual feast of digital information for the data aggregators to gorge themselves on. Initially, consumer brand companies such as Procter & Gamble and Nestlé, themselves hungry for smarter and better ways to market their products, supported this nascent industry with billions in revenue. Over time, other consumer-facing sectors such as financial services and automotive embraced this approach to identifying and targeting relevant audiences and individuals.

Oligopoly Platform Companies

Increasingly, oligopoly platform companies such as the BATs (Baidu, Alibaba, Tencent) and the FANGs (Facebook, Amazon, Netflix, Google) are themselves aggregating and tying together data from disparate sources and offering marketing analytics services to their corporate customers. Continuous location streams from mobile operating systems such as Android and messaging apps such as WeChat and WhatsApp enable a very fine-grained understanding of behaviour, including the ability to identify not only an individual, but their preferences and even predictions of future behaviours.Footnote 14

Part II: Legacy Data Economics

Data Depletion

A decade ago, the World Economic Forum published a white paper, “Personal Data: The Emergence of a New Asset Class”,Footnote 15 coincident with the emergence of the expression “Data is the New Oil”. Like oil and gas, data systems represent a long-term asset class: long-cycle investment is required to harvest them, and maintenance investment is required on an ongoing basis. Databases also have a concept analogous to oil and gas reserves: depletion. In the data world, this is commonly referred to as “decay” or “data decay”, namely the rate at which information in a database becomes obsolete. As the world of personal data economics has become more complex and interconnected, the World Economic Forum and others are looking at new approaches for creating and apportioning value from using data in new ways, such as using federated data to uncover value in latent health information (in turn creating economic incentives to cure rare diseases, for example),Footnote 16 as we will describe later in this chapter.

For example, in parts of Europe, as many as 23% of the population has moved within the past 5 years, comparable with one of the most mobile societies, the USA, at a rate of 24%.Footnote 17 Factors ranging from employment-driven movement (e.g. Polish workers in Paris) to humanitarian crises (e.g. Syria) have further accelerated these trends. This means that name-and-address information rapidly becomes obsolete.

This leads us to a world where consumer data sets can decay at 30% per year or more. For business data in certain markets (e.g. tech job contact details in San Francisco), the decay rate can exceed 70%.Footnote 18 If you are seeking to understand society or understand customers, you therefore need to invest not in data sets but in data systems that enable you to keep pace with the rapidly degrading data. Data is a river, not a rock: it should be viewed as a rapidly moving resource rather than a fixed object in space and time, and the systems supporting it should incorporate a mechanism for improving recency.
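As a back-of-the-envelope illustration (the decay rates are the figures quoted above; the simple compounding model is our own assumption), the arithmetic shows how quickly an unmaintained data set drifts from reality:

def surviving_fraction(annual_decay: float, years: float) -> float:
    """Fraction of records still accurate after compounding annual decay."""
    return (1.0 - annual_decay) ** years

for label, rate in [("consumer data at 30%/yr", 0.30),
                    ("SF tech contacts at 70%/yr", 0.70)]:
    timeline = ", ".join(f"year {y}: {surviving_fraction(rate, y):.0%}"
                         for y in (1, 2, 3))
    print(f"{label} -> {timeline}")

# At 30%/yr, only about a third of records remain accurate after three years;
# at 70%/yr, less than a third remain after a single year.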

The Value of Personal Data

Once a data system is architected, and a means of acquiring and compiling information (let us say, about consumers) is in place, what is the value of that data?

Industry Valuation of Consumers

How much, fundamentally, is a person valued economically, from the perspective of data economics?

The answer lies, of course, in the manner in which the data is consumed and how it is monetized. As of this writing, an individual is worth on average, globally, $359 per year to Google but $1,793 to Amazon (in terms of revenue).Footnote 19 The average American Facebook user is worth about $220 per year, but EU users are worth only about one-quarter as much, perhaps due to stricter advertising regulations. One would expect the figure to be much closer to the US revenue, given that per capita incomes in European countries such as Germany and Norway are comparable to, or even greater than, that of the USA on a PPP basis.Footnote 20

Amazon is an interesting case study. While it delivers more revenue volume through its shopping services, nearly two-thirds of its operating profit for 2019 came from Amazon Web Services (AWS), which also grew 25% faster than Amazon’s core products business.Footnote 21 International business segments are still operating at a loss.Footnote 22 And AWS is very high margin revenue—23% operating margin versus 5% operating margin overall for Amazon.Footnote 23 What this means is that user data generates a large volume of low-margin revenue for Amazon, while corporate revenue tied to cloud services now comprises the majority of Amazon’s profits.

Facebook, on the other hand, ran at a 34% operating margin as of 2019, even after a rise in expenses over 2018.Footnote 24 It has been able to monetize user data 790+% more effectively than Amazon. Facebook’s margins are more than double those of the typical media company.Footnote 25 There are some who believe it should be regulated like an oligopolistic media company and not simply as the “technology provider” it would like to be classified as.Footnote 26

Indeed, oligopoly platforms like Facebook and LinkedIn demonstrate an interesting aspect of personal data monetization: the “network effect”. The more people use these platforms and the more connected they are to each other, the more valuable the platform experience is to users (making them “stickier”, so they spend more time interacting with the platform) and the more marketers are willing to pay to access these audiences. Two-sided networks such as Airbnb or Lyft or Ola have an indirect network effect but still see this power-law value creation curve.Footnote 27
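A minimal sketch makes the power-law intuition concrete. Metcalfe-style n-squared scaling is one common model of direct network effects; the constants below are arbitrary assumptions chosen purely for illustration.

def linear_value(users: int, per_user: float = 1.0) -> float:
    """A non-network business: value grows with each user independently."""
    return per_user * users

def network_value(users: int, per_link: float = 0.001) -> float:
    """A networked platform: value tracks potential connections, ~n^2."""
    return per_link * users * (users - 1) / 2

for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}: linear={linear_value(n):>11,.0f}  "
          f"network={network_value(n):>15,.0f}")

# A 10x gain in users yields ~10x linear value but ~100x network value,
# which is why platforms invest so heavily in scale and "stickiness".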

For Facebook, at least, its monetization of user data may be reaching the dregs of the bottle. New data privacy laws, repeated cyberhacks, and growing awareness of the relatively weak responses Facebook has given with respect to the use of its platform to promote misinformation are beginning to shift Facebook’s interaction with regulators and policymakers, and may put pressure on its ability to monetize user data.Footnote 28 The government backlash against the Facebook-sponsored Libra Project,Footnote 29 an overt attempt to acquire even more consumer data off its network (this time in the payments arena), illustrates the dangers of a data monetization policy that fails to transparently and rigorously address data ethics. Indeed, the announcement of Libra stimulated a number of governments to accelerate their Central Bank Digital Currency (CBDC) projects with the express purpose of competing with or suppressing Libra.Footnote 30 The Reserve Bank of Canada went further, saying it would only launch a CBDC if Libra were successful.Footnote 31 Companies like Apple, by contrast, have not stimulated government response to such a degree, perhaps through more astute government affairs efforts coupled with data privacy actions perceived as beneficial to consumers.Footnote 32

Consumer Self-Worth

The converse question is instructive to explore: how much value do consumers attribute to their various personal data elements? Someone who publishes articles on LinkedIn might not place tremendous value on their own name, since it can be found attached to the articles they publish. Other data elements about an individual are much more sensitive. According to research conducted by the University of Trento, “where I am right now” (a user’s location in time and space) is the most “valuable” personal data. Media consumption, at the other end of the spectrum (where you read news or information), is valued little or not at all, and which apps you use falls somewhere in between. This landmark “Money Walks” study also determined that people value their own data more on outlier days, when unusual events or activities are occurring, than on ordinary days.Footnote 33

Generalizations are slippery in the world of personal data values. One has only to look at the disparity among personal data protection laws in Germany, the USA, and Nigeria, to pick three countries, to see consumer sensitivities or lack thereof. With that said, consumers have generally shown a willingness to share information around activities if it will help improve their experience with a product or service. Age and gender also play into how much or little individuals value their personal data.Footnote 34 Perhaps unsurprisingly, the Millennial generation of digital natives is more prone to data sharing without remuneration.Footnote 35 Some business models have been constructed around tangible benefits for tangible personal data sharing: Waze works so well because Waze users share traffic and other road condition data with each other; Netflix’s recommendation engine, a core component of its value proposition, requires that users allow for the cross-fertilization of viewing preferences (“Other viewers like you watched…”)—sometimes getting the service into trouble, even when publishing “anonymized” insights into this data.Footnote 36

The Societal Cost of Legacy Data Models

The Economics of Privacy

Economists have, for decades, been exploring the economic cost of privacy. Not unlike filtration systems for water, public safety patrols by law enforcement, or security gates on buildings, data privacy has associated costs. For example, if an applicant for a position at a pharmacy that dispenses prescription pharmaceuticals conceals that he or she has been arrested for selling drugs illegally, his or her personal privacy is protected, but the business in question assumes much greater economic risk than it intended (business, regulatory, and reputational). If a private citizen is in a traffic collision, but his or her medical records are locked in a secure system that first responders cannot access, there may be a direct and deleterious impact on health care if, for example, the individual is allergic to certain medications or has a Do Not Resuscitate (DNR) order on file. If an individual wishes to apply for a loan, certain efficiencies are introduced if a trusted third party such as a credit bureau can provide assurances to the lender (the bank, for example) that the individual has adequate creditworthiness.Footnote 37

For each of these circumstances, there are counter-arguments that suggest an equivalent economic burden. The job seeker may have been falsely accused, and lacked the funds to adequately defend against state prosecution. The person in the traffic collision might have their medical information improperly accessed at another time if it is too readily available, and suffer other harm as a result. The loan applicant might dispute information on the credit file, but be unsuccessful in having incorrect information removed. For example, at least one of the major credit bureaus has purchased a loan collection agency that buys debts: even if one wishes to dispute a bureau report, the counterparty making the contested claims…is the same bureau. Or the bureau’s algorithm might discriminate against the class of people to which the individual belongs.Footnote 38 Shoshana Zuboff and others have railed against the rise of “surveillance capitalism” and the societal decay accompanying the growth of oligopolistic data platform companies like Facebook and Google.Footnote 39

Generally speaking, the right to data privacy is weighed against the public good. Someone’s right to medical privacy is typically not wholly superseded by the public’s right to be aware of an infectious disease crisis: even if Patient Zero’s identity is protected, the public needs to know that the disease is spreading from location X, so that steps may be taken to contain the infection. In some jurisdictions, the identities, including photos, of sex offenders are published; in others, this is viewed as punitive rather than rehabilitative and other measures are taken. Yet where is the line drawn with respect to digital data? If someone threatens in a Facebook posting to blow up a school, that is certainly a safety concern; should public officials take steps around the individual, the school, or both? What if an individual simply states a preference for a political candidate who holds extremist views? In the first instance, most jurisdictions err on the side of caution for society; in the second, most would protect the identity of the individual. But is a posting on a Facebook page truly private? The affirmative statement that protects classes of personal data has only recently come to be codified in statute and regulation, as we will discuss later in the chapter.

Cyber (In)security

Hackers have noticed the value of personal data and have engaged in large-scale data theft over the past few years. Unfortunately, as we point out in our book New Solutions for Cybersecurity, these massive data stores have been inadequately secured. The Aadhaar biometric and demographic database covering 1.2 billion Indians was breached, with an individual record for sale for a reported Rs 500 (about €6.50).Footnote 40 Equifax saw its entire US database, covering about 148 million Americans, stolen.Footnote 41 As more and more personal data is acquired, analyzed and stored, it creates ever-more-tempting targets for cybercriminals, both those acting purely from a profit motive and a growing array of state-sponsored data thieves. Economic analysis can reveal the trade-offs between the cost of implementing better data security and the benefits of mitigating societal, business or individual harm.Footnote 42

Part III: New Generation Data Ecologies

We have now established the value of personal data, considered the costs of personal data privacy and the impacts of poor cybersecurity, and hinted at some of the opportunity embedded in rich data streams. In this section, we investigate powerful computational tools that assist with positive social change, and the necessary privacy and personal data governance regulations that accompany them, laying the foundation for the distributed data economy. Without appropriate protections, these richer data streams pose a significant challenge with respect to protecting consumers from exploitation.

Social Physics of Personal Data

More than a decade of research by Prof. Alex Pentland at the Massachusetts Institute of Technology and its collaborating institutions has produced a new computational social science of “social physics”.Footnote 43 Placing machine-learning rigor behind Adam Smith’s musings on communal good,Footnote 44 social physics has brought new transparency and insight into society, and has enabled interventions at scale, such as mapping how vaccines could reduce the spread of malaria in sub-Saharan AfricaFootnote 45 and helping a region in central Europe reduce energy usage by 17% to “go green” (enabling it to use only renewables rather than fossil fuels to power homes) at a fraction of the economic cost of conventional methods.Footnote 46 These insights have been derived from analyzing anonymized, aggregated datasets consisting of the tiny digital traces people leave throughout the day by using their credit cards and mobile phones.Footnote 47

This has naturally led Prof. Pentland and his collaborators to the necessary twin of social physics insights: the domain of personal data privacy. Pentland chaired the World Economic Forum privacy working group that evolved a set of principles he termed the “New Deal on Data”.Footnote 48 This thinking has a direct lineage to the emergence of new privacy regulations in Europe and elsewhere.

Emergent Regulation: GDPR, PSD2, Open Banking, and the California Consumer Privacy Act

The European Union has been highly sensitive to, and progressive on, the topic of personal data and personal data monetization. Cognizant of the data privacy issues and the impacts apparent from the exploitation of consumer data by private sector interests, the EU has promulgated a body of law and regulation to change the frame.

The first major piece of legislation, the General Data Protection Regulation (GDPR), helps establish basic rights for the individual against the corporate conglomerate, around ideas like ownership of personal data, governance (control) over personal data and “the right to be forgotten”. It also introduces penalties for data breaches, a significant step towards helping consumers understand the hidden economic cost of the shadowy realm of data brokers and data aggregators. An interesting artifact of GDPR is that it remains enforceable for the rights of European citizens even when they are not physically present in Europe; a vacationing French family in Florida using local internet or telecom services would, in theory, enjoy the same protections as if they were in Toulouse. Lesser known is the fact that GDPR creates a de facto open banking mandate around data portability.Footnote 49 In the USA, California passed a similar regulation (the California Consumer Privacy Act).

Its sister regulation, the Second Payment Services Directive (PSD2), enables personal portability of critical data, in this case bank data. The UK passed a very similar regulation, Open Banking. In each case, the goal is to move away from a model where an oligopoly of large corporations creates insurmountable switching costs for consumers, and towards a model where there is increased competition (accompanied, one hopes, by lower prices and/or better service) because consumers’ personal financial data becomes portable. An added benefit that delivers second-order economic cost improvement for both consumers and banks is better cybersecurity: APIs offer a more robust security protocol than the previous market of “screen scrapers”, which would pretend to be users and log into different banking websites to collect personal financial data on behalf of consumers. Quite harmonious in principle with GDPR, the open banking mandates require personal consumer financial data to be made transparent and portable, subject to the individual consumer’s wishes, as sketched below.
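The difference between the two access models can be sketched in a few lines of Python. The endpoint URL, token and response fields below are hypothetical placeholders, not any real bank’s API; the sketch shows the consent pattern, not a specific implementation.

import requests

# Old model ("screen scraping"): the aggregator stores the user's actual
# banking password and impersonates them at the login page. Full-account
# access, no scoping, and no revocation short of changing the password.

# Open-banking model: the user grants a scoped, revocable token through the
# bank's consent flow; the aggregator never sees the password and can read
# only what was consented. The endpoint below is a hypothetical placeholder.
ACCESS_TOKEN = "user-consented-oauth-token"

response = requests.get(
    "https://api.example-bank.com/open-banking/v1/accounts",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
response.raise_for_status()
for account in response.json().get("accounts", []):
    print(account.get("accountId"), account.get("balance"))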

In practice, compliance with these two regulations is proving to be challenging for companies.Footnote 51 New solutions that incorporate distributed ledgers and artificial intelligence may enable not only compliance with GDPR and PSD2, but also personal monetization of distributed data for the benefit of the individual, rather than a corporate actor such as one of the FAMGAs (Facebook, Amazon, Microsoft, Google, Alibaba). We will discuss this in the section on the distributed data economy.

Part IV: The Distributed Data Economy

In this final section, we will discuss the potential and the enablers surrounding distributed data. Notice the shift in language as we progress through this chapter, from economics to ecologies to economy. We are self-consciously envisioning an evolution to a higher order of societal organization.

Laying the Groundwork for Distributed Data

We are not quite at a place, in the progression of both our technologies and our legal and business frameworks, where we can harness the full potential of distributed data. We also need to improve data literacy if we hope for individuals to be able to appreciate, fully understand and take advantage of the benefits that can arise from distributed data. Before these actions can take place, we have core infrastructure challenges to address to create the necessary conditions for a distributed data future:

A. First, we need to source more and better data. Data quality remains one of the unspoken tragedies of the big data revolution. “Water, water, everywhere / Nor any drop to drink”.Footnote 52 All of those zettabytes of data being produced, much of it personal data, and yet very little attention is paid to the actual quality of the information. Vicki Raeburn, formerly a chief data quality officer, shared that she would have to go lie down if she spent too much time considering the current quality of the “big data” that is being proudly discussedFootnote 53… made immutable, thanks to blockchain. To bowdlerize Kirsty Rutter, former Chief Innovation Officer of Barclays UK, speaking on a panel about DLTs in the spring of 2019: “immutable [garbage]”. Another input into the data equation is the characteristic of data transience (as stated earlier, “data is a river, not a rock”). Particularly as we start to explore social physics, we find that dynamic data flows need new management and technology processes to ensure the accuracy of predictions. We need to move data systems to the point of near-real-time analysis, which perhaps can be partially enabled by systems such as OPAL, discussed below. Measurement and performance management of data systems must therefore treat managing and mitigating data senescence as an imperative (see the staleness sketch after this list).

B. Second, we need better analytics conducted on that data. At this writing, we are still in the very infancy of big data analytics. Let us recall that advances such as the Google-incubated TensorFlow are not even four years old.Footnote 54 Newer systems like Endor’s artificial intelligence prediction engine are just barely beginning to scale.Footnote 55 Hybridized systems that combine human intuition and synthesis with machine analysis are emerging, yet need substantially more research and development before they become industry standard. New maths are emerging, and new capabilities will arise as breakthroughs like quantum computing (today very much a laboratory technology) become commercially available.Footnote 56

C. Then, we can evolve into fully distributed data networks. While DLTs are computationally expensive, this cost is purchased with the coin of economic benefit that can be derived from better sharing of the right information at the right time under the right controls (data governance). IPFS and other hybridized schemes that combine “on chain” and “off chain” data storage deliver the transparency and resiliency of a distributed ledger with the scalability needed for massive data sets.
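As a concrete illustration of the data senescence point above, a data system can track per-record age and trigger re-verification rather than treating the data set as fixed. The schema and the 180-day freshness budget below are our assumptions, not a standard.

from datetime import date, timedelta

# Illustrative sketch: flag records for re-verification based on age.
# The schema and 180-day threshold are assumptions for illustration.
RECORDS = [
    {"id": 1, "name": "A. Consumer", "last_verified": date(2019, 3, 1)},
    {"id": 2, "name": "B. Consumer", "last_verified": date(2020, 1, 15)},
]

FRESHNESS_BUDGET = timedelta(days=180)

def is_stale(record, today=date(2020, 4, 1)):
    """True once a record has outlived its freshness budget."""
    return today - record["last_verified"] > FRESHNESS_BUDGET

refresh_queue = [r["id"] for r in RECORDS if is_stale(r)]
print("records needing re-verification:", refresh_queue)  # -> [1]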

Distributed Data Ethics

Increasing attention is being paid to the ethics of big data and artificial intelligence, and the complexity of addressing these issues will only grow in a distributed data world. Research ethicists, for example, are raising questions about how conventional university approaches to protecting individuals break down in the face of big dataFootnote 57; ethicists have not yet begun to explore seriously what this means in the distributed data world.

Data systems become geometrically more useful with the application of artificial intelligence, and advanced artificial intelligence (AI) systems such as machine learning and deep learning are powered by large volumes of data. Accordingly, the data discussion and the AI discussion quickly converge. Technology scholars such as Luciano Floridi of the Oxford Internet Institute have proposed a framework approach to AI ethics, based on reviewing numerous ethical frameworks and models and converging on five pillars of ethical AI,Footnote 58 summarized below:

(1) Beneficence: AI should be doing good for society.

(2) Non-maleficence: AI needs to go further than just doing good; it also needs to ensure that it doesn’t create harm (a consumer might have a shorter commute thanks to a driving map application, but what if the data about their commute patterns is then exploited to malignant ends?). This is where issues like data privacy and data security come into play.

(3) Autonomy: in creating systems of autonomous machines (which become even more difficult to manage in a distributed data economy), we need to ensure that human autonomy isn’t compromised. People still need to make decisions about things that affect them.

(4) Justice: AI systems should both be fair and promote fairness. Algorithmic discrimination by AI has been written about extensively,Footnote 59 and societal values need to be embedded into AI-driven systems to ensure that technology aligns with law and morality.

(5) Explicability: the decisions made by the AI need to be understandable by humans, which helps ensure accountability for them.

A similar model might be conceived for a distributed data economy. Indeed, as distributed systems become more widely adopted, an ethical framework for the distributed data economy becomes imperative, since distributed systems are intrinsically more difficult to control from a central source. How can you audit the code and activities of thousands of servers in a complex encrypted network, given the difficulties of doing so with even a handful? How do you monitor the content of secure, encrypted communications streams that may be making decisions adverse to society?

Governance for the New Data Order

The OPAL Project (www.OpalProject.org) provides a federated approach for managing the mechanics of a highly distributed, yet still useful, data environment, and allows for better control of the ethical dimensions discussed in the previous section. The basic principles of OPAL focus on improving data security and data governance, and they are highly congenial to a distributed data future.

The old style of handling data, which has led to a number of the data insecurity challenges seen with Equifax or Aadhaar, entails accumulating a large volume of data in a single repository, where analytics can then be conducted on it. The rationale for centralization is that it is more efficient from a computer systems management perspective, both for maintaining the database and for performing analytics. Economic analysis has shown that when there is low uncertainty, it is more economically beneficial for a company to centralize, albeit with reduced flexibility.Footnote 60

By bringing the code to the data, instead of the data to the code, we have an opportunity to dramatically improve information security. Contemporary information management systems, notably blockchain and current-generation distributed ledger technologies, provide a robust code architecture to manage these more complex information flows. Next-generation systems that implement near-homomorphic encryption, such as Enigma, offer the potential for exponential improvements in information security and encryption, while maintaining sufficient flexibility and accessibility for the data system to be useful in a number of applications (Fig. 5.1).Footnote 61

Fig. 5.1 The OPAL method to protect data: the old method moves the data to the code, while the new method moves the code to the data (Source: The author)

Distributed ledger technologies appear tailor-made for these types of distributed data systems, and standards like OPAL create coherence around how algorithms and data governance are managed. Together, these systems pose a viable platform on which GDPR and PSD2 compliance can be maintained, while extending more control to users and better security over the data.Footnote 62 The sketch below illustrates the core pattern.
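A minimal sketch of the “bring the code to the data” pattern follows. This is our simplified illustration of the principle; the query names and schema are assumptions, and it is not OPAL’s actual implementation.

# Raw records never leave the data custodian's boundary; analysts ship
# vetted questions to the data and receive only aggregate answers back.
RAW_RECORDS = [
    {"district": "north", "spend": 120.0},
    {"district": "north", "spend": 80.0},
    {"district": "south", "spend": 200.0},
]

def avg_spend_by_district(records):
    """A vetted algorithm: returns aggregates only, never raw rows."""
    totals, counts = {}, {}
    for r in records:
        totals[r["district"]] = totals.get(r["district"], 0.0) + r["spend"]
        counts[r["district"]] = counts.get(r["district"], 0) + 1
    return {d: totals[d] / counts[d] for d in totals}

# The governance layer: a catalogue of approved queries.
APPROVED_QUERIES = {"avg_spend_by_district": avg_spend_by_district}

def run_query(name: str):
    """The custodian executes approved code against its own data."""
    if name not in APPROVED_QUERIES:
        raise PermissionError(f"query '{name}' is not approved")
    return APPROVED_QUERIES[name](RAW_RECORDS)

print(run_query("avg_spend_by_district"))  # {'north': 100.0, 'south': 200.0}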

Data for the People

A distributed data economy is only possible with sufficient consumer data literacy to understand, and take action around, the monetization (and protection) of personal data. Market demand for data aggregators who act on behalf of consumers, a new kind of “data co-op”, will only emerge at levels sufficient to support a distributed data economy if consumers attain greater sophistication about how much various types of data are worth, and about how they can govern and manage their personal data economics.

The idea of consumer-powered personal data markets isn’t new; companies like Datacoup have been trying for years to reach critical mass.Footnote 63 A common expression in data-aware circles is “If you’re not paying for the product – you are the product” (attributed to various individuals).Footnote 64 Too few consumers are conscious of this dynamic. The roots of consumer data illiteracy have been (1) the lack of a mandate that empowers consumers around their data (now being addressed by regulations like GDPR) and (2) a lack of sophistication among users. The UN has highlighted this as a priority within its World Data Forum, with participants stating that “improving data literacy was needed”.Footnote 65

The models for providing greater information literacy already exist. Frameworks have been proposed for large-scale community engagement powered by data, posing questions about how data can be used for the public good and asserting that control mechanisms built into distributed data, like OPAL, can manage the tension between personal data privacy and benefits to society.Footnote 66 Through a concerted set of actions, data literacy can be introduced to different segments of society, creating the necessary ingredients to promote the distributed data economy.

Yet data literacy by itself is insufficient. It is not only data, but metadata (abstractions drawn from a collection or interpretation of data) that powers many of today’s artificial intelligence models, and this will be even more true in the future. Informed consent by users, and therefore user data literacy, needs to take into account not only the specific data elements an individual might expose to a company (for example, the person’s location to enable the use of a map application) but also the inferential insights derived from them (wealth, income, and credit score can be estimated based on pattern analysis of a collection of data points related to locationFootnote 67), as the toy example below illustrates.
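As a toy illustration of inference from metadata (entirely synthetic; the features, weights and thresholds are our assumptions, not a description of any deployed system), even a crude model can recover a sensitive attribute from location-derived features alone:

import random

random.seed(1)

def synth_person():
    """Synthetic features abstracted from a location trace."""
    high_income = random.random() < 0.5
    return {
        "airport_visits_per_year": random.gauss(6 if high_income else 1, 1),
        "evening_hours_downtown": random.gauss(8 if high_income else 3, 2),
        "high_income": high_income,
    }

people = [synth_person() for _ in range(1_000)]

def predict_high_income(p):
    """A crude threshold model; the point is the inference, not the method."""
    score = 0.5 * p["airport_visits_per_year"] + 0.3 * p["evening_hours_downtown"]
    return score > 3.0

accuracy = sum(predict_high_income(p) == p["high_income"] for p in people) / len(people)
print(f"income band inferred from location metadata alone: {accuracy:.0%} accurate")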

Distributed Data Policy

Policymakers globally are actively grappling with questions of how to engage with distributed data and how to regulate it, while also nurturing the potential for innovation and new enterprise formation it contains. Governments around the world, for example, are contemplating CBDC projects, bringing government coffers directly into contact with the distributed data opportunity. More than 120 national data privacy laws have been put in place,Footnote 68 risking a Tower of Babel in the absence of harmonization, as data moves across borders but is regulated locally.

Areas that the European Union, for example, could pursue with respect to data policy include:

Robust Governance. Encouragement by regulators of the private sector use of distributed ledger-based framework approaches, such as OPAL, would help to harmonize activities, streamline oversight and deliver the benefits of standards in terms of market formation and market growth. These efforts in Europe could be tied to the large-scale funding already allocated for blockchain investment.

Adaptation and Innovation Support. GDPR and PSD2 merit active review and augmentation, as do privacy and data portability regulations more broadly (particularly in light of how GDPR and PSD2 have been used as models by other jurisdictions). Now that Europe has had time to see how corporations are attempting to comply in practice, adjustments can be made to the frameworks and the interpretation guidelines. For example, an unintended consequence of GDPR has been to make compliance more difficult for smaller companies, introducing new costs and business and financial risks for start-up ventures, although it has potentially also created areas of opportunity.Footnote 69 Steps that regulators and policymakers can take to modulate the effects of these regulations on innovation include:

(1) Progressive regulatory models, similar to how some jurisdictions have addressed financial services licensing (e.g. the Bank of England’s e-money, “halfway” and “full” banking licenses, in an effort to support challenger entry into the banking sector).

(2) More “sandboxing” opportunities for start-ups to engage with regulators in a contained environment, where issues can be candidly raised and addressed, and greater compliance capacity within start-ups developed. “Tech sprints” and hackathons around distributed data, data privacy, and data portability, with regulator and policymaker involvement, are another mechanism to simultaneously build government capacity around new technologies and align private sector activity with areas of opportunity.

(3) Safe harbor exemptions for defined activities for small and medium-sized enterprises (SMEs), so that the cost of compliance does not drive them out of the market.

(4) Encouragement of, and potentially funding for, private sector compliance-as-a-service providers, which could also help reduce the costs for individual SMEs while maintaining quality and rigor.

Coordination. Further coordination among the European Union, the OECD, the UN data agencies and the G20, along with other bodies such as ASEAN, the African Union, the OAS and Caricom, would help mitigate the risks of distributed data policy disharmonization on the one hand, and “jurisdiction shopping” by large corporate interests (which risks undermining data policy) on the other. The EU’s Gaia-X project, on the one hand, provides some independence from the purely corporate models of virtualized data and data governance such as those offered by Amazon or Microsoft.Footnote 70 On the other hand, it is an EU-only initiative at present. Could it be offered to other nations as well? In a distributed data world, data is global, and the government response needs to be global, not just regional.

By aligning these areas of Robust Governance, Adaptation and Innovation Support, and Coordination, the disruption that distributed data represents can be better managed, and its economic benefits realized, while mitigating potential harms to society.