1 Introduction

In the digital ecosystem, data is considered to be the key ingredient for many of today’s revenue models, crucially determining whether a service is successful. At the same time, the protection of (personal) data, users and competition is becoming increasingly important for policy makers and competition authorities. For example, the European antitrust investigations against Google are motivated by the observation either that consumers might be disadvantaged or that competition and innovation are hampered (c.f., Drozdiak and Schechner 2016, for an overview of European antitrust probes against Google). In fact, personal data entered or revealed at a specific online service may lead to a lock-in effect for users, as switching to competing services induces costs to re-enter the data required by the new online service (c.f., Klemperer 1987a, for related research). As a result, (dominant) online services may benefit, but innovation and service variety might be reduced as market entry is deterred. Illustrative examples of data-induced switching costs are provided by online banking accounts (where switching makes it necessary to re-enter recurring transfers), online mail or storage services (where switching makes it necessary to re-enter general user information and to re-upload files, photos, contacts or categories), or cloud computing environments (where preferences and adaptations have to be re-applied). These services suggest that a lock-in does not necessarily stem from network effects alone, i.e., the number of participating users or complementary services provided. Instead, as Chen and Hitt (2002) analyze empirically, there is a variety of factors (additionally) influencing a user’s loyalty. We build on these observations and argue that the (amount of) already revealed (personal) data is a crucial factor for (1) online services active in data-driven markets, because it determines the service’s competitive strength and thus its profitability, and also for (2) users, because they might be locked in to a certain service.

It is well known that established systems designed to lock in users may hamper the success of new services and lead to excessive rents for incumbent firms (c.f., Katz and Shapiro 1994; Farrell and Klemperer 2007) and, eventually, to market failures. In this spirit, the European Commission has recently formulated a general “right to data portability” for personal data. Consequently, a standardized way of porting actively provided information from one online service to another is required (c.f., European Commission 2016b, p.45, Article 20); an issue that most voluntarily provided functionalities for users to export previously revealed data do not explicitly account for (c.f., Facebook 2018; Google 2018), and an issue also highlighted by the Deputy Chief Technology Officer of the United States (c.f., Macgillivray and Shambaugh 2016). Ultimately, and especially in combination with the “right to erasure” (c.f., European Commission 2016b, p.43, Article 17), the European Commission’s initiative aims to promote users’ negotiation power vis-à-vis (dominant) online services by reducing lock-in effects, i.e., protecting the “fundamental rights and freedoms of natural persons” (c.f., European Commission 2016b, p.32, Article 1). However, the economic effects of such an intervention on consumers’ surplus, on the amount of data online services collect from their customers, on online services’ profits, and on service variety are unclear to date. Although the regulation has been binding for all European member states since May 2018, academic analyses have so far been limited to the legal and technical dimensions of data portability. An analysis of strategic incentives, business strategies and economic outcomes is lacking, as Nobel Prize laureate Jean Tirole outlined in his speech on competition and regulation of online platforms (c.f., Valero 2016).

This paper addresses this research gap and analyzes the competitive effects of a user’s ability to port data from an incumbent online service or content provider (CP) to a market entrant. In particular, we analyze the CPs’ incentives (not) to promote data portability and their business strategies in data-driven markets. Additionally, we shed light on the ensuing effects on consumers, as this is pivotal to the argumentation of the European Commission and the U.S. Deputy Chief Technology Officer alike. To this end, we develop a game-theoretic model that captures the economic effects arising from a right to data portability by considering two CPs that generate revenues primarily through data revealed by users active at their platform. Thus, we abstract from any explicit revenue model (e.g., based on advertisements, or based on selling aggregated user data to third parties), and from additional revenue streams (e.g., services based on a subscription model) by simply assuming that data revealed by users can be transformed into revenue. Hence, additional data has a positive effect on a CP’s profits. On the other hand, revealing data bears costs (i.e., a disutility) for users: either they incur some effort in revealing data as such (say, the time needed to enter the data), or, more generally, users give away data to which they attribute some value (say, privacy costs in a broader sense). Consequently, whereas collecting more data is beneficial for CPs, users experiencing a higher disutility might switch to competing CPs or even leave the market. However, users’ ability to do so is impeded by established switching costs and lock-ins. The ability to port data arguably lifts the established restrictions on users, but may also impact the CPs’ data consumption. These effects have to be taken into account when analyzing the competitive effects.

Our results show that data portability is not necessarily beneficial for users because CPs entering the market have an incentive to increase the amount of data users have to reveal. Thus, the ultimate goal of protecting users is not necessarily achieved. In contrast, the CPs’ incentives (not) to promote data portability are unambiguous if the costs of implementing a right to data portability are zero or comparatively low: whereas dominant CPs (incumbents) always suffer from data portability, emerging CPs (entrants) challenging incumbents are better off. However, as total surplus increases under a data portability regime, predominantly due to the benefits arising for the entrant, who is able to generate higher revenues, the decision to enforce a right to data portability is far more complex than currently realized.

2 Literature Review

We refer to data portability as consumers’ ability to transfer (personal) data revealed at one CP to another CP. To the best of our knowledge, the IS literature has so far not considered this concept explicitly in terms of strategic incentives, business strategies, or economic outcomes. The technical literature, however, has demonstrated the feasibility of the concept by proposing models to conveniently port data, e.g., between cloud computing vendors. In this vein, Ranabahu and Sheth (2010) propose semantic web techniques to achieve portability, and Petcu and Vasilakos (2014) inter alia highlight open standards and open application programming interfaces as technical solutions. Thus, most technical studies provide a proof of concept that data portability is technically feasible but do not explicitly discuss the possible trade-offs for the involved parties.

In light of the General Data Protection Regulation, which became effective in May 2018, several legal investigations have been carried out. Graef (2015) conducts a legal analysis of data portability in social networks with respect to (European) competition law and summarizes relevant cases. Vanberg and Ünver (2017) inter alia highlight arising security issues as well as “disproportionate costs for small and medium sized companies” (Vanberg and Ünver 2017, p.14) induced by introducing a right to data portability. Swire and Lagos (2013) explicitly refer to consumer welfare and “express serious concerns about the RDP [right to data portability]” (Swire and Lagos 2013, p.338) because (1) the problems addressed by the regulation (e.g., monopoly power through lock-ins) were legally already covered by competition law, (2) personal data could easily be exported, i.e., security problems arise, and (3) it was unclear how a common standard could be achieved if a variety of different service providers were involved. The authors conclude that “the proposed RDP appears to reduce consumer welfare” (Swire and Lagos 2013, p.379), but do not offer or discuss economic incentives or outcomes, which additionally highlights the necessity of economic backing in this context.

Moreover, this study is related to two strands of the economic literature, which are highlighted in the following. First, as we assume users to be locked in when using a data-intensive online service due to the costs of porting these data, we draw on the literature investigating the role of switching costs. The results derived from this literature show that an incumbent firm has an incentive to lower its price when anticipating that an entrant enters the market (Klemperer 1989). In essence, firms thus fiercely compete in early periods to gain market shares which can then be harvested in later periods (Klemperer 1987a, b). Hence, switching costs soften competition in later periods, which allows the remaining firms to set higher prices. Indeed, as Gehrig and Stenbacka (2004) show analytically, competing firms have an incentive to establish high switching costs. The authors show that these can be achieved by (maximum) horizontal differentiation (additionally, see Hotelling 1929; d’Aspremont et al. 1979). Within the taxonomy introduced by Ray et al. (2012), our study deals with “user-related” switching costs as they include the effort a user needs to invest to “ensure a satisfactory switch of service and to recreate or transfer features” (Ray et al. 2012, p. 199). More precisely, one may argue that within the framework provided by Ray et al. (2012), transfer costs are of particular importance to users. What demarcates our approach from the previous literature on switching costs and lock-ins is that we assume data to be the considered good, which inherently determines the degree of switching costs as well as firms’ profits (c.f., Sect. 3 for details). The strategy derived from the traditional switching cost literature would suggest setting lower prices in early periods (i.e., collecting less data) to deter entry and gain market shares which can thereupon be harvested. This, in turn, is not necessarily the equilibrium strategy of an incumbent in a data-driven market environment, as (1) switching costs would be lower in succeeding periods and (2) profits in later periods from data already gained in early periods would be reduced. These specific aspects of the competitive environment further delineate our approach from, e.g., Caminal and Matutes (1990), who consider endogenous switching costs.

Second, our study on data portability is related to the strand of literature on the (economic) effects stemming from interoperability. Within this strand, the literature on compatibility and standardization between different services, especially the ensuing effects of the availability of converters as considered by Farrell and Saloner (1992), should be highlighted. In their theoretical model, Farrell and Saloner show that the availability of (imperfect) converters allows users to benefit from other users using a competing technology, i.e., a converter induces benefits through compatibility. Thus, direct network effects resulting from interoperability are a central aspect of the depicted model. Another important view on interoperability is highlighted in the study conducted by Pollock (2009). Pollock evaluates the effects of controlling the possibility to convert “‘software’ or ‘services’ associated with one platform to run on another” assuming a two-sided market (Pollock 2009, p.155). Thus, Pollock considers interoperability as being determined by indirect network effects. Additionally, the impact of the ability to control the mode of interoperability itself is investigated. Thus, the author allows the platform to directly control the costs of the flow of information, i.e., the costs for interoperability. However, although interoperability plays a pivotal role in online markets, the mentioned studies do not depict the concept of data portability for several reasons. In general, interoperability should not be confused with the portability of data (c.f., Graef 2015). Additionally, next to several technical dimensions, the central economic distinction can be seen in (1) the role of network externalities, which are not necessarily relevant in the context of data portability as a user’s lock-in in data-driven markets is crucially influenced by the (amount of) data revealed at a certain online service and not solely by network externalities (c.f., examples provided in Sect. 1), and (2) the scope of the platform’s ability to control the flow of data: since the mentioned European regulation is binding for all services alike, most existing online services are left with no possibility to strategically set the amount of data that can be ported, i.e., online services are unable to control the costs for portability.

Our proposed game-theoretic model, which will be outlined in the following section, captures the trade-offs for the involved parties and considers the specific aspects of data-driven revenue models. We use this model to answer the following two main research questions:

RQ 1: How does a right to data portability affect the amount of data that online services collect?

RQ 2: How does a right to data portability affect consumers?

Additionally, we investigate the effects on an incumbent’s and an entrant’s profits, which arguably influences service variety and innovation, and investigate which regime (data portability or no data portability) is more efficient with regard to total welfare.

3 Outline of the Economic Model: Assumptions and Notation

We propose a two-stage, game-theoretic model in order to analyze the effects of introducing a right to data portability (\(d=P\)) vis-à-vis a regime without the possibility to port data (\(d=NP\)). The market environment is assumed to consist of two content providers (CPs) and users having heterogeneous preferences over the set of content providers.

Content Providers. We consider a market with two competing, differentiated CPs (\(i=A,B\)) that offer substitutable services. To highlight the competitive effects of a right to data portability and to capture the implications on market entry and innovation, we consider two time periods (\(t\in \{1,2\}\)) and assume that CP A is active in \(t=1\) and \(t=2\), whereas CP B enters in \(t=2\). Thus, CP A might be classified as an incumbent content provider, whereas CP B is an entrant. Although both CPs offer substitutable services and CP B enters the market at a later point in time, due to the users’ preferences over the set of CPs (see explanation below), the offered services are horizontally differentiated (additionally, c.f., Irmen and Thisse 1998; Gehrig and Stenbacka 2004), i.e., users have different tastes for the services offered by the CPs. Formally, we therefore use the model proposed by Hotelling (1929) and assume that on the unit interval, on which users are uniformly distributed, the incumbent CP A is located at \(x=0\) and the entrant CP B is located at \(x=1\) (see, e.g., Montes et al. 2018, for a similar setup). Moreover, in order to highlight the effects of introducing a right to data portability on data collection, we consider that services are free of charge, i.e., users need to reveal data and CPs are solely financed by the exploitation of this data, e.g., by showing (targeted) advertisements, which is the prevalent revenue model on the internet (c.f., Dou 2004; Evans 2009; Anderson 2012) and a frequently used assumption in the related literature (c.f., Choi and Kim 2010; Kourandi et al. 2015; Krämer et al. 2018).

Users. Users are uniformly distributed on the interval between zero and one. Consequently, users are heterogeneous in their preferences over the set of CPs. Users patronize the CP which provides them the higher utility in each period \(t\in \{1,2\}\). This utility \(U^t_i\) is determined by a CP’s exogenously given base utility \(v_i\) (e.g., determined by the service’s functionalities, the quality of the content, the ease-of-use), the amount of data a CP requires from users, i.e., a CP’s data consumption \(r^t_i\) (which is the strategic variable of a CP and results in a disutility for users, see introductory examples stated in Sect. 1), and the inherent preference of a user over the set of CPs, i.e., their tastes, which is determined by a user’s location x on the unit interval. Please note that the relevance of this location can differ between market environments. To be able to analyze this aspect formally, i.e., to account for markets with differing characteristics, the users’ preferences over the set of CPs are influenced by the parameter \(\tau\) specifying the mismatch costs for users (see Sun 2012, for a similar setup). If \(\tau\) is low, the users’ mismatch costs are low. Thus, users’ preferences become relatively less important in the considered market environment and vice versa. Ultimately, it can be argued that low mismatch costs lead to a higher competitive intensity in the considered market because a user’s decision which service to patronize is then predominantly determined by the CPs’ qualities and data collection (c.f., “Appendix 1” which is available online via http://springerlink.com for an overview of the notation used for the model).

In period \(t=1\), CP A serves the market as monopolist. With the introduced notation, a user located at x choosing to become active at CP A derives a utility of \(U_A^{1}(x)=v_A-\tau \cdot x-r_A^{1}\). Note that the base utility \(v_A\) does not depend on the amount of revealed data. That is, the level of data consumption by CP A does not affect the service’s quality because (1) all users need to reveal the same amount of data in order to keep programming efforts low, and (2) the principle of data minimisation manifested in the GDPR (c.f., European Commission 2016b, Article 5(1)c) makes it impossible for CP A to require unnecessary data. As a result, determining the active users in \(t=1\) is straightforward: only users deriving a utility greater than or equal to zero will use the service offered by CP A, i.e., if \(U_A^1 (x) \ge 0\) a user is active at CP A, and a user with \(U^1_A(x)<0\) does not use any service in period \(t=1\). We denote the resulting location of the indifferent user by \(x^{*,d,1}\); only users located at \(x\le x^{*,d,1}\) are active at CP A in \(t=1\). Note that the location of the indifferent user equals the market share of CP A in period \(t=1\). The strategic variable of CP A is \(r_A^1\), i.e., setting a comparably low data consumption level (\(r_A^1\)) leads to more users being active at that CP (i.e., the market share increases, all else being equal). However, the profits per user will then be lower.
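The first-period trade-off between market share and data per user can be made concrete with a minimal sketch. The function names, the clipping of the indifferent user's location to the unit interval, and all parameter values are assumptions made for illustration only; exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

def utility_A_period1(x, v_A, tau, r_A1):
    """U_A^1(x) = v_A - tau * x - r_A^1 for a user located at x."""
    return v_A - tau * x - r_A1

def indifferent_user_period1(v_A, tau, r_A1):
    """Location solving U_A^1(x) = 0, i.e. x = (v_A - r_A^1) / tau.

    Clipping to [0, 1] is an added convenience (assumption), so that the
    result can be read directly as CP A's first-period market share.
    """
    x_star = (v_A - r_A1) / tau
    return max(F(0), min(F(1), x_star))

# Assumed, purely illustrative parameter values.
v_A, tau = F(9, 5), F(1)
for r_A1 in (F(1, 2), F(9, 10), F(13, 10)):
    share = indifferent_user_period1(v_A, tau, r_A1)
    # A lower data requirement raises the share but lowers revenue per user.
    print(f"r_A1={r_A1}  share={share}  first-period profit={share * r_A1}")
```

Among the three assumed values, the intermediate data requirement yields the highest first-period profit, which reflects the trade-off described above.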

In period \(t=2\), CP B enters the market. Consequently, users can now choose between two competing CPs and select the one from which they derive the higher utility. In order to investigate the competitive effects of introducing a right to data portability, we assume the market to be fully covered, i.e., at least one user can potentially port her data from CP A to CP B (additionally, see “Appendix 2”). The utility a user derives from staying (in case the user has been active at CP A in \(t=1\) and does not switch to the competing CP B) or becoming active at CP A in period \(t=2\) is given by

$$U_A^{2}(x)= {\left\{ \begin{array}{ll} v_A-\tau \cdot x - r_A^{2} & \text {, if }\,U_A^{1}(x)\ge 0 \\ v_A-\tau \cdot x - r_A^{2} - r_A^{1} & \text {, else.} \end{array}\right. }$$

Note that \(r_A^{2}\) is the strategic variable of CP A in \(t=2\). CP A is free in its decision how much data to require in that period. However, we assume that users that stay at CP A (i.e., are active at CP A in period \(t=1\) and \(t=2\)) do not experience a disutility in \(t=2\) from data already revealed in \(t=1\). For example, if a user entered (personal) data (e.g., her name, address, date of birth, interests, or uploaded photos and documents), she does not have to re-enter, re-validate or re-upload this information. Conversely, users who were not active in \(t=1\) but decide to become active in \(t=2\) have to reveal all required data if they decide to become active in the second period. Thus, these users need to reveal data of \(r_A^{1} + r_A^{2}\). However, users may also use the competing CP B. A user located at x who becomes active at CP B in \(t=2\) derives a utility of

$$U_B^{d,2}(x)= {\left\{ \begin{array}{ll} v_B-\tau \cdot (1-x) - r_B^2 + r_A^{1} &{} {\text {, if }}\,U_A^{1}(x)\ge 0\,{\text {with\, data\, portability }}\,(d=P) \\ v_B-\tau \cdot (1-x) - r_B^2 &{} {\text {, else}}\, (d=NP\,{\text {or}}\,U_A^{1}(x)<0). \end{array}\right. }$$

The utility function \(U_B^{d,2}(x)\) captures the effect that users becoming active at CP B need to enter all required data (i.e., \(r^2_B\)) either if they have not been active in \(t=1\), or if there is no ability to port already revealed data \((d=NP)\). Additionally, the equation captures the effects of a right to data portability if users switch CPs: if users have been active at CP A in the first period, i.e., \(U_A^{1}(x)\ge 0\), and are able to port already entered data to the new CP without incurring any costs (as envisaged by the European Commission, \(d=P\)), they do not have to reveal this data again. Based on the utility functions, the location of the indifferent user in period \(t=2\) (\(x^{*,d,2}\)) can be calculated. Again, the location of the indifferent user directly translates into the CPs’ market shares, i.e., \(x^{*,d,2}\) equals the market share of CP A and \(1-x^{*,d,2}\) equals the market share of CP B.
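The two piecewise utility functions for period \(t=2\) can be sketched as follows. This is only an illustration of the case distinctions stated above; the helper `was_active_in_t1` and all numerical values are hypothetical and not part of the model description.

```python
from fractions import Fraction as F

def was_active_in_t1(x, v_A, tau, r_A1):
    """A user was active at CP A in t=1 if U_A^1(x) >= 0."""
    return v_A - tau * x - r_A1 >= 0

def utility_A_period2(x, v_A, tau, r_A1, r_A2):
    """U_A^2(x): stayers reveal only r_A^2, newcomers reveal r_A^1 + r_A^2."""
    base = v_A - tau * x - r_A2
    if was_active_in_t1(x, v_A, tau, r_A1):
        return base
    return base - r_A1

def utility_B_period2(x, v_A, v_B, tau, r_A1, r_B2, portability):
    """U_B^{d,2}(x): ported data r_A^1 offsets the disutility for switchers."""
    base = v_B - tau * (1 - x) - r_B2
    if portability and was_active_in_t1(x, v_A, tau, r_A1):
        return base + r_A1
    return base

# Assumed, purely illustrative parameter values.
v_A, v_B, tau = F(9, 5), F(2), F(1)
r_A1, r_A2, r_B2 = F(9, 10), F(1, 30), F(16, 15)
for x in (F(1, 2), F(19, 20)):  # one former CP A user, one newcomer
    gain = (utility_B_period2(x, v_A, v_B, tau, r_A1, r_B2, True)
            - utility_B_period2(x, v_A, v_B, tau, r_A1, r_B2, False))
    print(f"x={x}: utility gain from portability when joining CP B = {gain}")
```

In this sketch, only the user who was active at CP A in the first period gains from portability when joining CP B, and the gain equals the ported amount \(r_A^{1}\).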

Content Providers’ Profits. Based on the market shares given by the location of the indifferent user, CPs’ payoffs can be specified by defining their profit functions. In our base model, we assume that CPs with data-driven revenue models benefit from data entered in one period also in later periods as the obtained information is still valuable to them (e.g., in terms of the ability to target ads, or tailor or customize services). However, we relax this assumption in Extension 5.3. Moreover, we do not consider any costs associated with the introduction of the right to data portability in our base model. However, we relax this assumption in Extension 5.1. Thus, for now, total profits of CP A after two periods are given by

$$\pi _A^d = \underbrace{x^{*,d,1}\cdot r_A^{d,1}}_{\pi _A^{d,1}} + \underbrace{x^{*,d,2}\cdot (r_A^{d,1}+r_A^{d,2})}_{\pi _A^{d,2}},$$

and CP B, which is only active in \(t=2\), makes total profits of

$$\begin{aligned}\pi _B^d&= (1-x^{*,d,1}) \cdot r_B^{d,2} + (x^{*,d,1} - x^{*,d,2}) \cdot ((r^{d,2}_B-r_A^{d,1})+r_A^{d,1}), \\ \pi _B^d&= (1-x^{*,d,2})\cdot r_B^{d,2}. \end{aligned}$$

Note that we implicitly made two further assumptions. First, we assumed that CPs cannot discriminate between old, new and switching users, i.e., the amount of data a CP requires from a specific user in \(t=2\) is independent of this user’s decision in \(t=1\). Thus, all users active at a CP need to reveal the same amount of data (we refer to the limitations in Sect. 6.2 for a discussion of the implications if this assumption is relaxed). Second, we assumed that all data that is transferred to CP B is valuable for the entrant. We relax this assumption in the second extension of the base model (see Extension 5.2).
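As a quick plausibility check of the profit functions, the following sketch evaluates both expressions for CP B's profits and confirms that the first line of the equation collapses to the second. The function names and the numerical inputs are illustrative assumptions.

```python
from fractions import Fraction as F

def profit_A(x1, x2, r_A1, r_A2):
    """Total profit of CP A: period-1 users plus period-2 users."""
    period1 = x1 * r_A1
    period2 = x2 * (r_A1 + r_A2)  # data from t=1 remains valuable in t=2
    return period1 + period2

def profit_B_long(x1, x2, r_A1, r_B2):
    """First line of CP B's profit: newcomers plus switchers from CP A."""
    newcomers = (1 - x1) * r_B2
    # Switchers enter r_B2 - r_A1 themselves; r_A1 arrives via porting.
    switchers = (x1 - x2) * ((r_B2 - r_A1) + r_A1)
    return newcomers + switchers

def profit_B(x2, r_B2):
    """Second (simplified) line of CP B's profit."""
    return (1 - x2) * r_B2

# Assumed, purely illustrative values for market shares and data amounts.
x1, x2 = F(58, 85), F(111, 170)
r_A1, r_A2, r_B2 = F(19, 17), F(16, 85), F(59, 85)
print(profit_A(x1, x2, r_A1, r_A2))
print(profit_B_long(x1, x2, r_A1, r_B2) == profit_B(x2, r_B2))
```

The equality holds for any inputs because the ported amount \(r_A^{d,1}\) cancels, which is exactly the second assumption stated above: all transferred data is valuable for the entrant.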

Timing of the Game. To summarize, the considered two-stage game proceeds as follows:

Stage 1: The incumbent CP A sets the amount of required data \(r_A^{1}\) for period \(t=1\) anticipating CP B’s action in period \(t=2\). Then, users decide whether to become active at CP A (if \(U_A^{1}(x)\ge 0\)).

Stage 2: Both CPs simultaneously set the amount of required data for period \(t=2\), i.e., CP A sets \(r_A^{2}\) and CP B sets \(r^2_B\). Again, users then decide at which CP they choose to become active. Under the full market coverage assumption, users in \(t=2\) are active at exactly one CP. If \(U_A^{2}(x) \ge U_B^{d,2}(x)\), users are active at CP A and vice versa.

Figure 1 illustrates the assumed market setting. Here, squares above the user depict the (net) amount of data (illustrated by symbols) different users \((j=1,2)\) would have to reveal in the considered period for becoming active at the respective CP. In contrast, circles underneath the CPs indicate the amount of data a CP requires. In the illustrated scenario, user 1 is active in period one, whereas user 2 becomes active only in period two. Without data portability (upper illustration in Fig. 1), user 1 has to re-enter the data already revealed to CP A at CP B, if she wants to switch to CP B in the second period (thus, she needs to re-enter: star, moon and heart, and additionally needs to enter: thunderbolt). In contrast, with data portability (bottom illustration in Fig. 1), user 1 has the ability to port her already entered data and thus only has to enter the net amount of required data (here: thunderbolt) if she wants to switch to CP B. For user 2, who has not been active in the first period, both cases are identical, i.e., user 2 has to enter all of the CP’s required data independent of the considered regime (i.e., star, moon, heart and sun to become active at CP A or star, moon, heart and thunderbolt to become active at CP B). Note that Fig. 1 only illustrates the (net) amount of data that is required by the CPs and needs to be entered by users in the respective period. A user’s actual decision which CP to patronize is not illustrated in Fig. 1 because it depends (inter alia) on the base utilities.

Fig. 1 Illustration of the regimes without (top) and with (bottom) a right to data portability. Note: The effect of introducing a right to data portability is relevant in period \(t=2\) for users becoming active at the entrant CP B (see highlighted amount of data users need to reveal). Without data portability \((d=NP)\), user 1 has to re-enter her already revealed data if she switches to CP B in \(t=2\). However, with data portability \((d=P)\), user 1 has the ability to port her already entered data. User 2 did not enter any data in \(t=1\). Consequently, irrespective of whether a right to data portability is introduced, she cannot port any data in \(t=2\).

4 Model Analysis, Results, and Discussion

We solve for the subgame perfect Nash equilibrium through backward induction beginning in Stage 2 to deduce the equilibrium amounts of required data (c.f., Sect. 4.1). The results are subsequently used to analyze the effects on CPs’ profits (c.f., Sect. 4.2), consumers’ surplus (c.f., Sect. 4.3) and total surplus (c.f., Sect. 4.4).

In Stage 2, both CPs compete for users and revenues. Consequently, a CP’s decision is affected by the decision of its competitor and the corresponding actions of users, i.e., the CPs take into account the amount of data required by the competing CP. The payoffs of the CPs are therefore affected by both CPs’ strategic variables \(r_i^t\). Analytically, these effects are captured by simultaneously solving the first-order conditions \({\partial \pi _A^{d,2}}/{\partial r_A^{2}}=0\) and \({\partial \pi _B^{d}}/{\partial r_B^2}=0\), which yields the CPs’ equilibrium amounts of required data for period \(t=2\) (c.f., Sect. 4.1 as well as “Appendix 5” highlighting the second order conditions). In doing so, we need to calculate the location of the indifferent user in \(t=2\) by accounting for the different regimes: If users have the possibility to port their data \((d=P)\) and were active in period one, the indifferent user in \(t=2\) can be calculated by solving \(v_A-\tau \cdot x - r_A^{2} = v_B - \tau \cdot (1-x) - r_B^2 + r_A^{1}\). If users do not have the possibility to port their data \((d=NP)\), but were active in period one, the indifferent user in \(t=2\) can be calculated by solving \(v_A - \tau \cdot x - r_A^{2} = v_B - \tau \cdot (1-x) - r_B^2\). Technically, the indifferent user in period two might also be located to the right of the indifferent user in period one, i.e., \(U_A^{1}(x^{*,d,2})<0\). We do not explicitly analyze this case within the main analysis (see “Appendix 3” for more details). To summarize, the indifferent user in \(t=2\) is located at:

$$x^{*,d,2}= {\left\{ \begin{array}{ll} - \frac{r_A^{2}+r_A^{1}-r_B^2-\tau -v_A+v_B}{2\tau } & \text {, if }\,U_A^{1}(x^{*,d,2})\ge 0 \qquad (d=P), \\ - \frac{r_A^{2}-r_B^2-\tau -v_A+v_B}{2\tau } & \text {, else } \qquad (d=NP). \end{array}\right. }$$
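The expression for the indifferent user can be checked against the two indifference conditions it was derived from. The sketch below is a hypothetical helper that implements the formula for both regimes and confirms that the utility gap between the two CPs vanishes at the computed location; the parameter values are assumptions chosen for illustration.

```python
from fractions import Fraction as F

def indifferent_user_period2(v_A, v_B, tau, r_A1, r_A2, r_B2, portability):
    """Location x^{*,d,2} for users who were active at CP A in t=1."""
    ported = r_A1 if portability else 0
    return -(r_A2 + ported - r_B2 - tau - v_A + v_B) / (2 * tau)

def utility_gap(x, v_A, v_B, tau, r_A1, r_A2, r_B2, portability):
    """U_A^2(x) - U_B^{d,2}(x) for a user who was active at CP A in t=1."""
    ported = r_A1 if portability else 0
    u_A = v_A - tau * x - r_A2
    u_B = v_B - tau * (1 - x) - r_B2 + ported
    return u_A - u_B

# Assumed, purely illustrative parameter values.
params = dict(v_A=F(9, 5), v_B=F(2), tau=F(1),
              r_A1=F(9, 10), r_A2=F(1, 30), r_B2=F(16, 15))
for regime in (True, False):
    x2 = indifferent_user_period2(**params, portability=regime)
    gap = utility_gap(x2, **params, portability=regime)
    print(f"portability={regime}: x2={x2}, utility gap at x2={gap}")
```

Holding the required data amounts fixed, portability shifts the indifferent user to the left, i.e., towards a larger market share for CP B, which is the direct effect that the CPs' equilibrium responses then build on.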

In Stage 1, CP A serves the market as monopolist. However, it anticipates the effects on second-period profits in its decision how much data to collect. Analytically, we use the equilibrium results of Stage 2 (i.e., \(r_B^{*,d,2}\) and \(r_A^{*,d,2}\)) to specify CP A’s profits over two periods \((\pi _A^d)\) and then solve the first-order condition \({\partial \pi _A^{d}}/{\partial r_A^{1}}=0\) to obtain the optimal amount of required data for CP A in period \(t=1\) (i.e., \(r_A^{*,d,1}\), c.f., Sect. 4.1 as well as “Appendix 5” highlighting the second order conditions). In doing so, we need to calculate the location of the indifferent user in period \(t=1\) by solving \(U_A^1(x)=0\) with respect to x, which leads to \(x^{*,d,1} = \frac{v_A-r_A^{1}}{\tau }\).
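The backward induction can be retraced numerically. In the sketch below, the Stage 2 solutions as functions of \(r_A^{1}\) are our own derivation from the two first-order conditions stated above (they are not quoted from the paper), and the parameter values are assumed. Since every profit function is quadratic in the firm's own strategic variable, a symmetric difference quotient evaluated with exact fractions returns the exact derivative.

```python
from fractions import Fraction as F

V_A, V_B, TAU = F(9, 5), F(2), F(1)  # assumed, illustrative values

def x2(r_A1, r_A2, r_B2, portability):
    """Indifferent user in t=2 (users active at CP A in t=1)."""
    ported = r_A1 if portability else 0
    return (TAU + V_A - V_B + r_B2 - r_A2 - ported) / (2 * TAU)

def stage2(r_A1, portability):
    """Stage 2 solution (r_A2, r_B2) given r_A1; own derivation."""
    delta = V_A - V_B
    if portability:
        return TAU + delta / 3 - r_A1, TAU - delta / 3
    return TAU + delta / 3 - 2 * r_A1 / 3, TAU - delta / 3 - r_A1 / 3

def profit_A_t2(r_A1, r_A2, r_B2, portability):
    return x2(r_A1, r_A2, r_B2, portability) * (r_A1 + r_A2)

def profit_B(r_A1, r_A2, r_B2, portability):
    return (1 - x2(r_A1, r_A2, r_B2, portability)) * r_B2

def profit_A_total(r_A1, portability):
    """Two-period profit of CP A with Stage 2 behaviour substituted in."""
    r_A2, r_B2 = stage2(r_A1, portability)
    x1 = (V_A - r_A1) / TAU
    return x1 * r_A1 + profit_A_t2(r_A1, r_A2, r_B2, portability)

def slope(f, r, h=F(1, 100)):
    """Symmetric difference quotient; exact for quadratic f."""
    return (f(r + h) - f(r - h)) / (2 * h)

candidates = {False: (3 * TAU + 10 * V_A - V_B) / 17, True: V_A / 2}
for regime, r_A1 in candidates.items():
    r_A2, r_B2 = stage2(r_A1, regime)
    foc_A2 = slope(lambda r: profit_A_t2(r_A1, r, r_B2, regime), r_A2)
    foc_B2 = slope(lambda r: profit_B(r_A1, r_A2, r, regime), r_B2)
    foc_A1 = slope(lambda r: profit_A_total(r, regime), r_A1)
    print(f"portability={regime}: FOCs = {foc_A2}, {foc_B2}, {foc_A1}")
```

Under these assumptions, all three first-order conditions vanish at the candidate values in both regimes, which is what the backward-induction argument requires.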

4.1 Amount of Required Data by the CPs

As outlined above, to calculate the amount of required data, we maximize the CPs’ profit functions considering both periods (for CP A) or only period \(t=2\) (for CP B). Subsequently, the equilibrium amounts of required data can be compared. Here, it can be seen that CP A requires a higher amount of data under the regime without data portability \((d=NP)\). Interestingly, the data consumption of CP A without data portability in the first period is even higher than the monopoly data consumption \(r_{Monopoly}^*\), i.e., the amount of data CP A would require without the entry of CP B:

$$r_A^{*,NP,1}=\frac{3\tau +10v_A-v_B}{17} > \frac{v_A}{2} = r_A^{*,P,1} = r_{Monopoly}^*.$$

This highlights the effect of anticipated entry: Intuitively, CP A requires a high amount of data to generate (higher) switching costs to weaken competition in later periods (i.e., generates data-induced switching costs). The effect of weakened competition even dominates the (negative effect of) reduced period one market shares and, compared to a regular one-period monopoly, reduced profits. The observation that CP A requires an even higher amount of data than in monopoly is, at first sight, in contrast to the traditional switching cost literature. Here, anticipated entry results in price wars lowering early-period prices to gain market shares, which can thereupon be harvested in later periods (c.f., Klemperer 1989, 1995). But, within our considered setting of a data-driven market environment, lock-ins are not generated by participation alone (e.g., positive network externalities or the functionalities of a service), which can be stimulated by low prices (additionally, c.f., Extension 5.4), but additionally by a user’s invested effort to enter, i.e., a user’s disutility to reveal (personal) data. Thus, lock-in effects do play a pivotal role for CPs in these market environments, although the underlying rationale differs compared to traditional market environments. This is because (1) data required by a CP (i.e., “prices” set) in early periods are directly relevant to CPs’ profits in later periods, and (2) the incumbent’s “price setting” is (additionally) constrained by entrants in later periods. With data portability \((d=P)\), the incumbent CP requires the monopoly amount of data. Because lock-in effects vanish through the users’ ability to port data to the competing CP in the following period, the incumbent CP cannot benefit from establishing lock-ins anymore. Consequently, CP A maximizes its profits in the first period by requiring the same amount of data it would require in a one-period game, where it acts as monopolistic CP.
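The trade-off described above can be illustrated by splitting CP A's profit without data portability into its two period components. The reduced-form second-period profit used below follows from substituting the Stage 2 equilibrium into \(\pi_A^{NP,2}\); it is our own derivation and, like the parameter values, should be read as an assumption of this sketch.

```python
from fractions import Fraction as F

V_A, V_B, TAU = F(9, 5), F(2), F(1)  # assumed, illustrative values

def profit_t1(r_A1):
    """First-period profit x^{*,1} * r_A1 with x^{*,1} = (v_A - r_A1) / tau."""
    return (V_A - r_A1) / TAU * r_A1

def profit_t2_no_portability(r_A1):
    """Reduced-form second-period profit of CP A for d = NP (own derivation)."""
    return (3 * TAU + V_A - V_B + r_A1) ** 2 / (18 * TAU)

r_monopoly = V_A / 2
r_np = (3 * TAU + 10 * V_A - V_B) / 17
for label, r in (("monopoly level", r_monopoly), ("no-portability level", r_np)):
    p1, p2 = profit_t1(r), profit_t2_no_portability(r)
    print(f"{label}: r_A1={float(r):.3f}  pi_1={float(p1):.3f}  "
          f"pi_2={float(p2):.3f}  total={float(p1 + p2):.3f}")
```

For these values, raising \(r_A^{1}\) above the monopoly level lowers first-period profits but raises second-period profits by more, consistent with the argument that weakened competition dominates the loss in period-one market share.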

Insight 1

Without a right to data portability, incumbent CPs anticipating the entry of a competitor have an incentive to create data-induced switching costs by increasing their data consumption to a level higher than in monopoly.

With respect to the amount of required data in the second period, this restricting effect is also observable: the incumbent CP always requires less data if users are able to port their data, i.e.,

$$r_A^{*,NP,2}=\frac{15\tau -v_A-5v_B}{17} > \frac{6\tau -v_A-2v_B}{6} = r_A^{*,P,2}.$$

Conversely, evaluating optimal data collection by the entrant (CP B) reveals that the required amount of data with a right to data portability is always higher than in the case without data portability:

$$r_B^{*,NP,2}=\frac{16\tau -9v_A+6v_B}{17} < \tau -\frac{v_A-v_B}{3} = r_B^{*,P,2}.$$

Intuitively, CP B requires more data with data portability because users that switch from CP A experience less disutility due to the possibility to port the already entered data. Thus, these users now only reveal the net amount of required data which is lower (i.e., \(r_B^2-r_A^{1} \le r_B^2\)), all else being equal, leading to higher market shares and profits for the entrant under this regime. Proposition 1 summarizes these findings:

Proposition 1

Under a data portability regime, incumbents always require less user data, whereas entrants unambiguously increase their data consumption level.
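The closed-form expressions of this section can be evaluated numerically to see Proposition 1 at work. The following is a minimal sketch that takes the equilibrium formulas above as given; the function name `equilibrium_data` and the value \(\tau = 3/2\) are illustrative assumptions (\(v_A=1\) and \(v_B=4\) follow the caption of Fig. 2), and the sketch does not check whether the parameters lie within the feasible range of “Appendix 2”.

```python
from fractions import Fraction as Fr

def equilibrium_data(tau, v_a, v_b, portability):
    """Return (r_A1, r_A2, r_B2) from the closed forms stated in Sect. 4.1."""
    tau, v_a, v_b = Fr(tau), Fr(v_a), Fr(v_b)
    if portability:  # regime d = P
        r_a1 = v_a / 2
        r_a2 = (6 * tau - v_a - 2 * v_b) / 6
        r_b2 = tau - (v_a - v_b) / 3
    else:            # regime d = NP
        r_a1 = (3 * tau + 10 * v_a - v_b) / 17
        r_a2 = (15 * tau - v_a - 5 * v_b) / 17
        r_b2 = (16 * tau - 9 * v_a + 6 * v_b) / 17
    return r_a1, r_a2, r_b2

# Illustrative parameters (tau is an assumed value, not taken from the paper).
no_port = equilibrium_data(Fr(3, 2), 1, 4, portability=False)
port = equilibrium_data(Fr(3, 2), 1, 4, portability=True)

# Proposition 1: the incumbent requires less, the entrant more, under portability.
incumbent_requires_less = no_port[0] > port[0] and no_port[1] > port[1]
entrant_requires_more = no_port[2] < port[2]
print(no_port, port, incumbent_requires_less, entrant_requires_more)
```

Exact rational arithmetic is used here only so that the comparison is not blurred by rounding.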

Next, to deduce possible business strategies for CPs (additionally, c.f., managerial implications in Sect. 6.1) and to analyze the factors influencing a CP’s data consumption in equilibrium, we conduct comparative statics, i.e., analyze the effects on a CP’s data consumption by changing the exogenous model parameters. First, we find that CP A’s period one data consumption increases in its base utility \(v_A\), whereas its second-period data consumption decreases in \(v_A\), i.e., \({\partial r_A^{*,d,1}}/{\partial v_A} > 0\) and \({\partial r_A^{*,d,2}}/{\partial v_A} < 0\) irrespective of the considered regime. The negative effect on the second-period amount of required data by CP A can be explained by the incumbent’s rationale to protect its market share in a competitive environment, i.e., after a competitor has entered the market: Through an increased base utility, CP A is able to require a large(r) amount of data in period one. Protecting this market share in period two (through a comparably low amount of required data in this period) dominates the positive effects arising from requiring more data in the second period. On the contrary, if its base utility is decreasing, protecting market shares does not dominate the positive effects of requiring additional data in period two. Second, an increase in CP B’s base utility \(v_B\) lowers CP A’s data collection: in period one to increase the share of users that are locked-in, in period two due to stronger competitive forces. Since the lock-in effect vanishes with data portability, the period one amount of required data is unaffected by \(v_B\). In conclusion: \({\partial r_A^{*,NP,1}}/{\partial v_B}< 0, {\partial r_A^{*,P,1}}/{\partial v_B} = 0, {\partial r_A^{*,d,2}}/{\partial v_B} < 0\). 
Third, the mismatch costs of users \((\tau )\) have an unambiguous effect on CP A’s data consumption: the higher the mismatch costs, the higher the amount of required data, i.e., \({\partial r_A^{*,NP,1}}/{\partial \tau } > 0\) and \({\partial r_A^{*,d,2}}/{\partial \tau } > 0\), because high mismatch costs reduce the competitive intensity in the market as a user’s location, i.e., a user’s preferences over the set of CPs, gets relatively more important. Finally, for CP B, comparative statics show that an increase in the competitor’s base utility \((v_A)\) reduces the amount of required data. In contrast to the incumbent, an increase in the own base utility \((v_B)\) unambiguously increases the amount of required data. The effect of the mismatch costs for users on CP B’s data consumption is qualitatively the same as the effect on CP A’s data consumption, i.e., the higher the mismatch costs, the higher the amount of required data. Thus, it can be summarized that:

Insight 2

A (in terms of service quality) strong competitor or low mismatch costs for users reduce a CP’s amount of required data. If a CP increases its own quality, it requires more data in the first period in which it is active.
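Because the equilibrium amounts of required data in Sect. 4.1 are linear in \(\tau\), \(v_A\) and \(v_B\), the comparative statics can be read off as constant slopes. The sketch below is one way to tabulate them, assuming the closed forms of Sect. 4.1; the dictionary `FORMULAS` and the helper `unit_slopes` are hypothetical names introduced for illustration.

```python
from fractions import Fraction as Fr

# Closed-form data requirements from Sect. 4.1, keyed by (CP, regime, period).
FORMULAS = {
    ("A", "NP", 1): lambda tau, v_a, v_b: (3 * tau + 10 * v_a - v_b) / Fr(17),
    ("A", "P", 1): lambda tau, v_a, v_b: v_a / Fr(2),
    ("A", "NP", 2): lambda tau, v_a, v_b: (15 * tau - v_a - 5 * v_b) / Fr(17),
    ("A", "P", 2): lambda tau, v_a, v_b: (6 * tau - v_a - 2 * v_b) / Fr(6),
    ("B", "NP", 2): lambda tau, v_a, v_b: (16 * tau - 9 * v_a + 6 * v_b) / Fr(17),
    ("B", "P", 2): lambda tau, v_a, v_b: tau - (v_a - v_b) / Fr(3),
}

def unit_slopes(formula):
    """Partial derivatives w.r.t. (tau, v_A, v_B).

    A unit bump gives the exact derivative only because each formula is linear.
    """
    base = [Fr(1), Fr(1), Fr(1)]
    slopes = []
    for i in range(3):
        bumped = list(base)
        bumped[i] += 1
        slopes.append(formula(*bumped) - formula(*base))
    return tuple(slopes)

for key, formula in FORMULAS.items():
    print(key, unit_slopes(formula))
```

The signs of the printed slopes correspond to the statements in the text, for example a positive slope of CP A’s period one data consumption in \(v_A\) and a zero slope in \(v_B\) under data portability.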

4.2 CPs’ Profits

To analyze CPs’ profits \((\pi _i^d)\), we evaluate optimal profits given the just derived equilibrium amount of required data. Within the feasible parameter range (c.f., “Appendix 2”), the incumbent always suffers from data portability (i.e., \(\pi _A^P \le \pi _A^{NP}\)), whereas the entrant always benefits from data portability (i.e., \(\pi _B^P \ge \pi _B^{NP}\); see “Appendix 6” for analytical details). Thus, since data portability unambiguously increases an entrant’s profits, service variety (and innovation) is arguably increased because entrants are more likely to enter the market due to higher profits. Hence, if the market is dominated by a single firm, data portability may be a suitable device to foster competition.

Comparative statics show that an increase in the CP’s own base utility always has a positive effect on its profits. Conversely, an increase in the competitor’s base utility decreases a CP’s profits (i.e., \({\partial \pi _i^d}/{\partial v_i} > 0\) and \({\partial \pi _i^d }/{\partial v_{-i}} < 0\) for \(i \in \{A,B\}\), with \(-i\) denoting the CP competing with CP i). Interestingly, the effect of higher mismatch costs for users (i.e., an increase in \(\tau\)) is ambiguous: with respect to \(\pi _A^P, \pi _A^{NP}\) and \(\pi _B^{NP}\), higher mismatch costs are beneficial only if the competing CP (CP \(-i\)) is strong in terms of its base utility, i.e., \(v_{-i} \gg v_i\); arguably because the considered CP then focuses on users which are located close to it. Otherwise, the effect of the mismatch costs \(\tau\) depends on the characteristics of the considered market (see footnote 4). With regard to \(\pi _B^{P}\), the effect of the mismatch costs is unambiguous: the higher the mismatch costs for users, the higher the profits.

Insight 3

A right to data portability unambiguously increases an entrant’s profits arguably increasing service variety and innovation. In contrast, an incumbent always suffers under a data portability regime.
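The profit comparison can be illustrated numerically. The sketch below rests on several assumptions that are not spelled out in this section: the period one indifferent user is located at \((v_A - r_A^1)/\tau\), the period two indifferent user follows the expression of Sect. 5.4 with \(\omega = 0\), CP A’s profit follows the function of Sect. 5.3 with \(\rho = 1\), and CP B’s profit is taken to be its market share times its required data. The function name `profits` and the value \(\tau = 3/2\) are illustrative.

```python
from fractions import Fraction as Fr

def profits(tau, v_a, v_b, portability):
    """Return (pi_A, pi_B) at the equilibrium data requirements of Sect. 4.1."""
    tau, v_a, v_b = Fr(tau), Fr(v_a), Fr(v_b)
    if portability:
        r_a1 = v_a / 2
        r_a2 = (6 * tau - v_a - 2 * v_b) / 6
        r_b2 = tau - (v_a - v_b) / 3
        ported = r_a1          # switching users port the data entered at CP A
    else:
        r_a1 = (3 * tau + 10 * v_a - v_b) / 17
        r_a2 = (15 * tau - v_a - 5 * v_b) / 17
        r_b2 = (16 * tau - 9 * v_a + 6 * v_b) / 17
        ported = Fr(0)         # no data can be ported
    # Assumed indifferent-user locations (Sect. 5.4 expressions with omega = 0).
    x1 = (v_a - r_a1) / tau
    x2 = (v_a - v_b + tau + r_b2 - ported - r_a2) / (2 * tau)
    pi_a = x1 * r_a1 + x2 * (r_a1 + r_a2)   # Sect. 5.3 profit with rho = 1
    pi_b = (1 - x2) * r_b2                  # assumed form of CP B's profit
    return pi_a, pi_b

pi_a_np, pi_b_np = profits(Fr(3, 2), 1, 4, portability=False)
pi_a_p, pi_b_p = profits(Fr(3, 2), 1, 4, portability=True)
print(pi_a_np, pi_a_p, pi_b_np, pi_b_p)
```

For these illustrative parameters the incumbent’s profit is lower and the entrant’s profit is higher with data portability, in line with Insight 3; the sketch does not verify feasibility in the sense of “Appendix 2”.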

4.3 Consumer’s Surplus

To examine the effects on consumer’s surplus \((CS_i^d)\), we compare the users’ utility accounting for the different regimes. With respect to users active at CP A, consumer’s surplus for both periods is given by:

$$CS_A^d = \underbrace{\int _0^{x^{*,d,1}}U_A^{1}(x)dx}_{\text {period } t=1} + \underbrace{\int _0^{x^{*,d,2}}U_A^{2}(x)dx}_{\text {period } t=2}$$

Note that users active at CP B differ with regard to their utility under the regime with data portability depending on whether they have not been active in the first period (and consequently have a utility of \(U_B^{NP,2}\)), or whether they have been active in the first period, switch from CP A to CP B and port their data. Hence, the latter group has a lower disutility for a given amount of data required by CP B (and thus, has a utility of \(U_B^{P,2}\)). If data portability is not enforced, all users becoming active at CP B derive a utility of \(U_B^{NP,2}\). In conclusion, consumer’s surplus can be calculated by:

$$CS_B^d= {\left\{ \begin{array}{ll} \int _{x^{*,P,2}}^{x^{*,d,1}}U_B^{P,2}(x)dx + \int _{x^{*,d,1}}^1 U_B^{NP,2}(x)dx & {\text {, with\, data\, portability}}\, (d=P), \\ \int _{x^{*,NP,2}}^1 U_B^{NP,2}(x)dx & {\text {, without\, data\, portability}}\, (d=NP). \end{array}\right. }$$

By comparing consumer’s surplus in equilibrium, it can be seen that a regime without data portability may leave users actually better off. Thus, the sum of consumer’s surplus at both CPs can decrease with introducing a right to data portability, i.e., \(CS_{A+B}^P = CS_A^P+CS_B^P<CS_A^{NP}+CS_B^{NP}=CS_{A+B}^{NP}\) (see “Appendix 7” for analytical details). Consequently, although data portability is most commonly justified by the potential benefits for end customers (c.f., Macgillivray and Shambaugh 2016; European Commission 2016b), this goal is not necessarily achieved.

Moreover, it can be shown that (relatively) high mismatch costs for users may lead to users being worse off with a right to data portability, i.e., the consumer’s surplus is reduced if the critical threshold (\(\tau _{CS}\)) is exceeded. More precisely, if \(\tau \ge \tau _{CS} := {(174v_B-822v_A+17 \sqrt{6658v_A^2-752v_A v_B+16v_B^2})}/{726}\), users are better off without a right to data portability (additionally, c.f., “Appendix 8”). Intuitively, as we have shown above, CPs require higher amounts of data if the mismatch costs for users are high (because \({\partial r_i^{*,d,t}}/{\partial \tau } > 0\)). This, in turn, increases the disutility a user derives from being active at the considered CP, which, consequently, reduces consumer’s surplus (i.e., \({\partial CS_{A+B}^{d}}/{\partial \tau } < 0\)). However, the threshold \(\tau _{CS}\) is not always within the feasible parameter range: If the CPs’ base utilities are relatively equal (i.e., \(v_B < v_{B,CS} := {447}/{160}\cdot v_A\)), consumers unambiguously benefit under a data portability regime. Additionally, higher base utilities always positively affect consumer’s surplus (i.e., \({\partial CS_{A+B}^{d}}/{\partial v_i} > 0\)). Proposition 2 summarizes these findings:

Proposition 2

The possibility to port data from one online service to another online service has ambiguous effects on consumer’s surplus. If both services offer a comparable service quality for users (i.e., \(v_B<v_{B,CS}\)), consumer’s surplus always increases. However, if the entrant offers a better service (i.e., \(v_B \ge v_{B,CS}\)), users may suffer under a data portability regime if their mismatch costs of using a service are higher than \(\tau _{CS}\).
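The two thresholds in Proposition 2 are straightforward to evaluate. The following sketch simply codes the expressions for \(v_{B,CS}\) and \(\tau _{CS}\) as stated above; the function names are hypothetical, and the check ignores the feasible parameter range, so a `True` result only indicates that the necessary conditions of the proposition are met.

```python
import math

def v_b_cs(v_a):
    """Critical entrant base utility v_{B,CS} = 447/160 * v_A."""
    return 447 / 160 * v_a

def tau_cs(v_a, v_b):
    """Critical mismatch cost tau_CS as stated in Sect. 4.3.

    math.sqrt raises ValueError if the radicand is negative; such parameter
    combinations are assumed to lie outside the feasible range.
    """
    root = math.sqrt(6658 * v_a**2 - 752 * v_a * v_b + 16 * v_b**2)
    return (174 * v_b - 822 * v_a + 17 * root) / 726

def portability_may_hurt_users(tau, v_a, v_b):
    """True if both threshold conditions of Proposition 2 are satisfied."""
    return v_b >= v_b_cs(v_a) and tau >= tau_cs(v_a, v_b)

# Parameters of Fig. 2 (v_A = 1, v_B = 4); the tau values are illustrative.
print(v_b_cs(1), tau_cs(1, 4))
print(portability_may_hurt_users(1.5, 1, 4), portability_may_hurt_users(1.0, 1, 4))
```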

Figure 2 illustrates the possible negative effect on consumer’s surplus for a specific parameter constellation by showing total consumer’s surplus, as well as the consumer’s surplus at each CP with and without data portability for different mismatch costs.

Fig. 2 Illustration of consumer’s surplus for different mismatch costs. Note: Total consumer’s surplus with (\(d=P\), solid line) and without data portability (\(d=NP\), dash-dotted line) for \(v_A=1\) and \(v_B=4\). As \(v_B > v_{B,CS}\), users are worse off if \(\tau > \tau _{CS}\). Additionally, consumer’s surplus at each CP i for the different regimes is illustrated [dashed (dotted) lines refer to CP A (CP B, respectively)].

4.4 Total Surplus

Finally, total surplus \((TS^d)\) being the sum of consumer’s surplus and CPs’ profits, i.e.,

$$TS^d= \sum _{i=A,B} (\pi _i^d + CS_i^d)$$

is examined (see “Appendix 8” for analytical details). Within the feasible parameter range, it can be concluded that total surplus is unambiguously increasing with a right to data portability, i.e., \(TS^P>TS^{NP}\). Thus, although consumers might be worse off in some cases and CP A always experiences lower profits under a regime with a right to data portability, the increased profits of CP B always outweigh these effects.

Insight 4

Total surplus unambiguously increases with a right to data portability.

5 Extensions

In the following, we explore four extensions to the base model, which confirm the robustness of the main insights highlighted by Propositions 1 and 2 and provide more nuanced results: Sect. 5.1 considers costs for CPs implementing a right to data portability (subscript F), Sect. 5.2 assumes that not all data that is ported to a CP is relevant to that CP (subscript ID), Sect. 5.3 considers cases where the value of collected data is diminishing over time (subscript DV), and Sect. 5.4 considers services that are characterized by network effects (subscript NWE).

5.1 Costs for Providing the Possibility to Port Data

Until now, we assumed that the possibility to port (personal) data does not incur any costs for the CPs. However, giving users the possibility to port personal data may result in additional costs such as costs for the programming effort to implement the technical functionalities. To account for such costs, we extend the model by assuming that both CPs face some exogenous costs F if a right to data portability is introduced. Consequently, the CPs’ profit functions with a right to data portability now incorporate an additional fixed cost term F (see “Appendix 9”).

The timing of the game remains unchanged. By solving for the subgame perfect Nash equilibrium, it is easy to see that the CPs’ data consumption remains unchanged by introducing (fixed) costs to implement the possibility to port data. This implies that also (1) all insights with respect to the amount of required data (c.f., Proposition 1), and (2) all insights with respect to consumer’s surplus remain unchanged (c.f., Proposition 2). Consequently, users can still be worse off if a right to data portability is introduced. In contrast, CPs’ profits change if a right to data portability is introduced. Obviously, CPs’ profits are affected negatively by introducing costs, i.e., \({\partial \pi _{i,F}^{P}}/{\partial F} < 0\) with \(i \in \{A,B\}\). Thus, the entrant is not necessarily better off if a right to data portability is introduced. Instead, the entrant is worse off (i.e., \(\pi _{B,F}^{P} < \pi _{B,F}^{NP}\)), if the fixed costs for the implementation of a functionality to port data exceed the critical threshold \({\hat{F}}\) (see “Appendix 9”), i.e., if

$$F > {\hat{F}}:= \frac{(10v_A-v_B+3\tau )\cdot (35v_B-44v_A+99\tau )}{5202 \cdot \tau }.$$

Thus, if the costs associated with providing the possibility to port personal data are too high, the right to data portability does not necessarily stimulate market entry or innovation as entrants may find it unprofitable to enter the market at all. Note that the same result holds if we assume that fixed costs are only relevant for entrants but not for established firms (i.e., incumbents). Moreover, total surplus may now decrease with the introduction of a right to data portability because all CPs as well as users can be worse off. Therefore, policy makers need to deliberately define the scope of data that can actually be ported and additionally specify the concrete mechanism of data portability in order to reduce costs. For example, in many cases the transmission of personal data should not occur directly between different CPs as this arguably increases implementation costs, particularly as the transmission needs to be secure in order to protect users’ sensitive data.

Insight 5

If implementing a right to data portability is associated with fixed costs F, even entrants can suffer from introducing a right to data portability if the resulting costs exceed \({\hat{F}}\). Then, total surplus is also likely to be reduced as all CPs are worse off and users can be worse off under a data portability regime.
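The role of the fixed costs can be illustrated by computing the entrant’s profit gain from data portability and comparing it with F. The sketch below assumes, as is not stated explicitly in this section, that CP B’s profit equals its period two market share times its required data and that the indifferent user follows the expression of Sect. 5.4 with \(\omega = 0\); `entrant_profit` and the parameter values are illustrative.

```python
from fractions import Fraction as Fr

def entrant_profit(tau, v_a, v_b, portability, fixed_cost=0):
    """CP B's profit at the equilibrium of Sect. 4.1; F is paid only if d = P."""
    tau, v_a, v_b = Fr(tau), Fr(v_a), Fr(v_b)
    if portability:
        r_a1 = v_a / 2
        r_a2 = (6 * tau - v_a - 2 * v_b) / 6
        r_b2 = tau - (v_a - v_b) / 3
        # Assumed indifferent user with ported data r_a1.
        x2 = (v_a - v_b + tau + r_b2 - r_a1 - r_a2) / (2 * tau)
        return (1 - x2) * r_b2 - Fr(fixed_cost)
    r_a2 = (15 * tau - v_a - 5 * v_b) / 17
    r_b2 = (16 * tau - 9 * v_a + 6 * v_b) / 17
    x2 = (v_a - v_b + tau + r_b2 - r_a2) / (2 * tau)
    return (1 - x2) * r_b2

tau, v_a, v_b = Fr(3, 2), 1, 4   # illustrative parameters
gain = entrant_profit(tau, v_a, v_b, True) - entrant_profit(tau, v_a, v_b, False)
# The entrant prefers the portability regime only while F stays below this gain.
for fixed_cost in (Fr(1, 10), Fr(1, 2)):
    better_off = entrant_profit(tau, v_a, v_b, True, fixed_cost) > entrant_profit(tau, v_a, v_b, False)
    print(fixed_cost, better_off)
```

Under these assumptions the gain plays the role of the critical threshold: fixed costs below it leave the entrant better off with data portability, fixed costs above it do not.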

5.2 Porting Irrelevant Data

Although this paper investigates the effects of a right to data portability on two CPs providing substitutable services, these CPs may not necessarily require identical data from users becoming active at their platform. Whereas we address a benchmark case in our base model by assuming that all data that is transferred to the competing CP is valuable, we now modify our model to account for cases where also irrelevant data (ID) is ported to the entrant (CP B).

In doing so, we introduce the parameter \(\gamma \in [0,1]\) defining the share of ported data that is (also) useful for the CP where the data is ported to (here: the entrant CP B). For example, a user may have entered her name, date of birth and cellphone number at CP A in \(t=1\) (i.e., \(r_A^{1}\)) and now ports this data to CP B in \(t=2\). However, CP B requires the name, date of birth and address from users becoming active at the platform (i.e., \(r_B^2\)) and cannot analyze or monetize a user’s cellphone number. Consequently, only some share of the ported data is relevant to the new CP. Thus, the net amount of required data is not given by \(r_B^2-r_A^{1}\) as in the base model, but by \(r_B^2 - \gamma \cdot r_A^{1}\). Hence, if data portability is possible, the utility of users that have been active at CP A in \(t=1\) and switch to CP B changes compared to the base model. By assuming that only a share of the ported data is useful for the new CP, a user located at x becoming active at CP B in \(t=2\) derives a utility of

$$U_{B,ID}^{d,2}(x)= {\left\{ \begin{array}{ll} v_B - \tau \cdot (1-x) - r_B^2 + \gamma \cdot r_A^{1} & {\text {, if }}\, U_A^{1}(x) \ge 0\, {\text {with\, data\, portability }\, (d=P),} \\ v_B - \tau \cdot (1-x) - r^2_B & {\text {, else }}\, (d=NP\, {\text { or }}\, U_A^{1}(x) < 0). \end{array}\right. }$$

Consequently, with data portability, the location of the indifferent user changes in \(t=2\), which also affects CPs’ profits as well as the amount of required data (c.f., “Appendix 10” for analytical details). Note that this extension is a generalization of the base model outlined above. Thus, assuming \(\gamma =0\), the results are identical to the benchmark case without data portability because none of the ported data is useful for the entrant. Conversely, assuming \(\gamma =1\), the results are identical to the benchmark case with data portability where all ported data is relevant to the entrant.

To deduce more nuanced results, we solve the game through backward induction. Due to the extreme cases already analyzed, we restrict our analysis to \(\gamma \in (0,1)\). In summary, we obtain:

$$\begin{aligned} r_{A,ID}^{*,P,1}&= \frac{(3\tau +v_A-v_B)\gamma -3\tau -10v_A+v_B}{\gamma ^2-2\gamma -17} \text { with } \,r_{A}^{*,NP,1}> r_{A,ID}^{*,P,1}> r_{A}^{*,P,1}, \\ r_{A,ID}^{*,P,2}&= \frac{(-3\tau +2v_A+v_B)\gamma -15\tau +v_A+5v_B}{\gamma ^2-2\gamma -17} \text { with } \,r_{A}^{*,NP,2}> r_{A,ID}^{*,P,2} > r_{A}^{*,P,2}, \\ r_{B,ID}^{*,P,2}&= \frac{2\tau \gamma ^2-(4\tau +3v_A)\gamma -16\tau +9v_A-6v_B}{\gamma ^2-2\gamma -17} \text { with }\, r_{B}^{*,NP,2}< r_{B,ID}^{*,P,2} < r_{B}^{*,P,2}. \end{aligned}$$

It can be seen that a higher \(\gamma\) increases the entrant’s amount of required data, i.e., an entrant CP’s data consumption increases with the amount of data that is ported and valuable, whereas the incumbent’s amount of required data is reduced (i.e., \({\partial r_{A,ID}^{*,P,t}}/{\partial \gamma } < 0\) with \(t \in \{1,2\}\) and \({\partial r_{B,ID}^{*,P,2}}/{\partial \gamma } > 0\)). Additionally, the incumbent’s period one amount of required data now also (negatively) depends on \(v_B\): Due to the assumption that not all data is relevant to CP B, CP B’s decision in \(t=2\) now affects CP A’s decision in \(t=1\). This has not been the case in the base model. In the base model, \(v_B\) does not affect the data consumption in period one, because all of the data collected by CP A is transferred and valuable for CP B. Consequently, CP A behaves like a one-period monopolist with respect to its data consumption irrespective of CP B’s decision in \(t=2\). However, Proposition 1 still continues to hold, i.e., the incumbent still requires less data and the entrant requires more data if users have the possibility to port (some share of their) personal data. Moreover, CPs’ profits behave intuitively with respect to the introduced parameter \(\gamma\), i.e., the incumbent’s profits decrease, whereas the entrant’s profits increase the more data is relevant to the entrant, i.e., \({\partial \pi _{A,ID}^{P}}/{\partial \gamma } < 0\) and \({\partial \pi _{B,ID}^{P}}/{\partial \gamma } > 0\). 
Consequently, the incumbent may be able to protect its profits by strategically reducing the amount of explicitly stored information that can be ported with a right to data portability, e.g., by inferring information from a user’s action on the website instead of requiring data to be actively entered by users (because only data provided by users may be subject to data portability, c.f., European Commission 2016b) or by requiring data from users that is only useful in combination with other data that is not subject to data portability.
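That this extension nests both benchmark cases can be checked directly from the closed forms. The following sketch evaluates the expressions for \(r_{A,ID}^{*,P,1}\), \(r_{A,ID}^{*,P,2}\) and \(r_{B,ID}^{*,P,2}\) at \(\gamma = 0\), \(\gamma = 1/2\) and \(\gamma = 1\); the function name `id_equilibrium` and the parameter values (\(\tau = 3/2\), \(v_A = 1\), \(v_B = 4\)) are illustrative assumptions.

```python
from fractions import Fraction as Fr

def id_equilibrium(tau, v_a, v_b, gamma):
    """Return (r_A1, r_A2, r_B2) with portability when a share gamma is relevant."""
    tau, v_a, v_b, g = (Fr(z) for z in (tau, v_a, v_b, gamma))
    den = g**2 - 2 * g - 17   # strictly negative for gamma in [0, 1]
    r_a1 = ((3 * tau + v_a - v_b) * g - 3 * tau - 10 * v_a + v_b) / den
    r_a2 = ((-3 * tau + 2 * v_a + v_b) * g - 15 * tau + v_a + 5 * v_b) / den
    r_b2 = (2 * tau * g**2 - (4 * tau + 3 * v_a) * g - 16 * tau + 9 * v_a - 6 * v_b) / den
    return r_a1, r_a2, r_b2

tau, v_a, v_b = Fr(3, 2), 1, 4   # illustrative parameters
none_relevant = id_equilibrium(tau, v_a, v_b, 0)         # coincides with d = NP
half_relevant = id_equilibrium(tau, v_a, v_b, Fr(1, 2))  # intermediate case
all_relevant = id_equilibrium(tau, v_a, v_b, 1)          # coincides with d = P
print(none_relevant, half_relevant, all_relevant)
```

For the intermediate value, each amount of required data lies strictly between its two benchmark values, which matches the ordering stated alongside the formulas.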

Assuming that not all data is relevant to the entrant also affects consumer’s surplus. However, Proposition 2 still holds, i.e., if the entrant provides a better service quality, users may actually be worse off with a right to data portability. Here, it can be seen that \(\gamma \in (0,1)\) can dampen the negative effects of data portability on consumer’s surplus compared to the base model with data portability: If users suffer most with a right to data portability assuming \(\gamma = 1\), i.e., if the mismatch costs are very high, they suffer less with \(\gamma \in (0,1)\). Consequently, from a policy perspective, restricting the amount of data that can be ported may be a device to protect users. However, this requires policy makers to precisely analyze the competitive intensity of the market a priori, because restricting the amount of data that can be ported also dampens consumer’s surplus in cases where users benefit from a right to data portability (additionally, c.f., “Appendix 10”). Finally, it can be shown that total surplus is always higher with a right to data portability, although only some share of the ported data is actually relevant to the entrant.

Insight 6

If users can port their personal data from an incumbent to an entrant but only some share of this data \(\gamma \in (0,1)\) is relevant to the entrant, incumbents (entrants) reduce (increase) their data consumption with a right to data portability which may lead to users being worse off compared to a regime without a right to data portability.

5.3 Diminishing Value of Collected Data

The benchmark case analyzed in the base model assumes that the data an incumbent collected in \(t=1\) is equally important in \(t=2\), i.e., has an identical effect on profits. In the following, we relax this assumption by assuming that the value of data is diminishing (DV), i.e., the incumbent can only monetize a share \(\rho \in [0,1]\) of collected data in succeeding periods. Thus, \(\rho\) represents the share of data collected in period one that is (still) valuable for CP A in period two. Herewith, CP A’s profit function changes to

$$\pi _{A,DV}^d = \underbrace{x^{*,d,1}\cdot r_A^{d,1}}_{\pi _{A,DV}^{d,1}=\pi _{A}^{d,1}} + \underbrace{x^{*,d,2}\cdot (\rho \cdot r_A^{d,1}+r_A^{d,2})}_{\pi _{A,DV}^{d,2}}.$$

It is worth mentioning that assuming \(\rho =1\) leads to the benchmark cases analyzed in Sect. 4. Thus, we concentrate on cases with \(\rho <1\). Note that users’ utility functions remain unaffected by introducing \(\rho\). Consequently, the formulas derived in the benchmark case to calculate the locations of the indifferent users can also be used for this extension. Moreover, CP B’s profit function does not change compared to the base model. However, CP A now incorporates the diminishing value of the data collected in \(t=1\) in its profit function for \(t=2\). Due to solving the game through backward induction, this affects the amount of required data for all CPs in each period. For the regime with a right to data portability, we obtain:

$$\begin{aligned}r_{A,DV}^{*,P,1} &= \frac{(-3\tau -v_A+v_B)\rho +3\tau -8v_A-v_B}{\rho ^2-2\rho -17} \text { with } \, r_{A,DV}^{*,P,1} < r_{A}^{*,P,1}, \\ r_{A,DV}^{*,P,2}& = \frac{(3\tau +v_A-v_B)\rho ^2+(-3\tau +5v_A+v_B)\rho -18\tau -3v_A+6v_B}{\rho ^2-2\rho -17} \text { with }\, r_{A,DV}^{*,P,2}> r_{A}^{*,P,2}, \\ r_{B,DV}^{*,P,2} &= \frac{2\tau \rho ^2+(-4\tau +3v_A)\rho -16\tau +3v_A-6v_B}{\rho ^2-2\rho -17} \text { with } \,r_{B,DV}^{*,P,2} > r_{B}^{*,P,2},\end{aligned}$$

and for the regime without a right to data portability:

$$\begin{aligned} r_{A,DV}^{*,NP,1} &= \frac{(-3\tau -v_A+v_B)\rho -9v_A}{\rho ^2-18} \text { with }\, r_{A,DV}^{*,NP,1} < r_{A}^{*,NP,1}, \\ r_{A,DV}^{*,NP,2} & = \frac{(3\tau +v_A-v_B)\rho ^2+6v_A\rho -18\tau -6v_A+6v_B}{\rho ^2-18} \text { with } \, r_{A,DV}^{*,NP,2}> r_{A}^{*,NP,2}, \\ r_{B,DV}^{*,NP,2} &= \frac{2\tau \rho ^2+3v_A\rho -18\tau +6v_A-6v_B}{\rho ^2-18} \text { with } \, r_{B,DV}^{*,NP,2} > r_{B}^{*,NP,2}.\end{aligned}$$

Note that introducing the parameter \(\rho\) has a negative impact on CP A’s period one data consumption, i.e., the incumbent requires less data in \(t=1\) compared to the benchmark case. Conversely, the period two amount of required data increases with \(\rho\), i.e., the incumbent as well as the entrant require more data in \(t=2\). In conclusion, \({\partial r^{*,d,1}_{A,DV}}/{\partial \rho } < 0\) and \({\partial r^{*,d,2}_{i,DV}}/{\partial \rho } > 0\). Intuitively, compared to the base model, the benefits from data collected in \(t=1\) that the incumbent CP A can convey to the succeeding period are lower. This leads to a lower data consumption in period one; however, in period two, the incumbent then increases its data consumption compared to the benchmark case. This also leads to an increasing data consumption of the entrant CP B. Please note that without a right to data portability, CP A still requires at least the amount of data a monopolist would require. This corroborates our insight that incumbent firms have an incentive to generate data-induced switching costs (i.e., \(r^{*,NP,1}_{A,DV} \ge r^*_{Monopoly}\)). Moreover, it can easily be shown that Proposition 1 continues to hold, i.e., the incumbent reduces its data consumption with a right to data portability (i.e., \(r_{A,DV}^{*,NP,t} > r_{A,DV}^{*,P,t}\)) whereas the entrant increases its data consumption (i.e., \(r_{B,DV}^{*,NP,2} < r_{B,DV}^{*,P,2}\)). With respect to CPs’ profits, it can be shown that CP A (CP B) suffers (benefits) with \(\rho < 1\), i.e., \(\pi ^{d}_{A,DV} < \pi ^{d}_{A}\) and \(\pi ^{d}_{B,DV} > \pi ^{d}_{B}\), respectively. Comparative statics reveal that the effect of \(\rho\) on the CPs’ profits is monotone, i.e., \({\partial \pi ^{d}_{A,DV}}/{\partial \rho } > 0\) and \({\partial \pi ^{d}_{B,DV}}/{\partial \rho } < 0\) within the feasible parameter range. 
Moreover, with respect to consumer’s surplus, also Proposition 2 continues to hold, i.e., users can – again – be worse off with the possibility to port data (see “Appendix 11”).
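The closed forms of this extension can be evaluated to check that \(\rho = 1\) reproduces the benchmark of Sect. 4 and that Proposition 1 carries over for an illustrative \(\rho < 1\). The sketch below is hedged in two respects: the function name `dv_equilibrium` and the parameter values (\(\tau = 3/2\), \(v_A = 1\), \(v_B = 4\), \(\rho = 1/2\)) are illustrative assumptions, and the leading numerator term of \(r_{B,DV}^{*,P,2}\) is taken to be quadratic in \(\rho\), mirroring the corresponding expression without data portability.

```python
from fractions import Fraction as Fr

def dv_equilibrium(tau, v_a, v_b, rho, portability):
    """Return (r_A1, r_A2, r_B2) when only a share rho of period-one data keeps its value."""
    tau, v_a, v_b, rho = (Fr(z) for z in (tau, v_a, v_b, rho))
    if portability:
        den = rho**2 - 2 * rho - 17
        r_a1 = ((-3 * tau - v_a + v_b) * rho + 3 * tau - 8 * v_a - v_b) / den
        r_a2 = ((3 * tau + v_a - v_b) * rho**2 + (-3 * tau + 5 * v_a + v_b) * rho
                - 18 * tau - 3 * v_a + 6 * v_b) / den
        # Assumption: the leading term is 2*tau*rho**2 (quadratic in rho).
        r_b2 = (2 * tau * rho**2 + (-4 * tau + 3 * v_a) * rho
                - 16 * tau + 3 * v_a - 6 * v_b) / den
    else:
        den = rho**2 - 18
        r_a1 = ((-3 * tau - v_a + v_b) * rho - 9 * v_a) / den
        r_a2 = ((3 * tau + v_a - v_b) * rho**2 + 6 * v_a * rho
                - 18 * tau - 6 * v_a + 6 * v_b) / den
        r_b2 = (2 * tau * rho**2 + 3 * v_a * rho - 18 * tau + 6 * v_a - 6 * v_b) / den
    return r_a1, r_a2, r_b2

tau, v_a, v_b = Fr(3, 2), 1, 4   # illustrative parameters
base_p = dv_equilibrium(tau, v_a, v_b, 1, portability=True)    # rho = 1: benchmark, d = P
base_np = dv_equilibrium(tau, v_a, v_b, 1, portability=False)  # rho = 1: benchmark, d = NP
dv_p = dv_equilibrium(tau, v_a, v_b, Fr(1, 2), portability=True)
dv_np = dv_equilibrium(tau, v_a, v_b, Fr(1, 2), portability=False)
print(base_p, base_np, dv_p, dv_np)
```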

Insight 7

If the value of data an incumbent collects in period one is not equally valuable in period two, the incumbent reduces its data consumption in period \(t=1\), but increases its data consumption in period \(t=2\). In contrast, the entrant unambiguously increases its data consumption compared to a scenario where the value of collected data does not change over time. However, compared to a regime without a right to data portability, the incumbent (entrant) still reduces (increases) its data consumption which can lead to users being worse off.

5.4 The Role of Network Effects

As highlighted in the previous sections, network effects are not a precondition for online CPs to become successful and are not necessarily the (main) source for users to become locked-in. However, the utility a user derives from being active at an online service may nevertheless be affected by the number of other users active at that platform, i.e., direct network effects may exist and influence a user’s decision, but also the CPs’ strategies in setting the amount of required data. Intuitively, the presence of positive network effects may reduce a user’s incentive to switch to an entrant CP because the derived utility from the already installed base at the incumbent may outweigh the potentially higher base utility from the joining CP, although data already entered can be ported to that joining CP with a right to data portability. To investigate the role of network effects formally, we modify the users’ utility functions and incorporate positive direct network effects (NWE). In doing so, we assume that the total number of users active at the considered CP has a positive effect on a user’s utility, i.e., \(U_{A,NWE}^{d,t} (x) = U_A^{d,t} (x) + \omega \cdot x^{*,d,t}\) for CP A and \(U_{B,NWE}^{d,2} (x)=U_B^{d,2} (x) + \omega \cdot (1-x^{*,d,2})\) for CP B, respectively, with \(\omega > 0\). By changing the utility functions, the location of the indifferent user also changes. Relying on the concept of fulfilled expectations (i.e., in equilibrium, the network size determined by the location of the indifferent user equals the expected one, additionally, c.f., Katz and Shapiro 1985), the indifferent user in period \(t=1\) is now located at \(x_{NWE}^{*,d,1} =\frac{v_A-r_A^{1}}{\tau -\omega }\) and the indifferent user in period \(t=2\) is now located at:

$$x_{NWE}^{*,d,2}= \left\{ \begin{array}{ll} \frac{v_B+\omega -\tau -r^2_B+r_A^{1}+r_A^{2}-v_A}{2(\omega -\tau )} & {\text {, if }}\, U_{A,NWE}^{1}(x_{NWE}^{*,d,2})\ge 0 \qquad (d=P), \\ \frac{v_B+\omega -\tau -r^2_B+r_A^{2}-v_A}{2(\omega -\tau )} & {\text {, else }} \qquad (d=NP). \end{array}\right.$$

The resulting profit functions as well as our proposed two stage game remain qualitatively unchanged (additionally, c.f., “Appendix 12”).
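The fulfilled-expectations logic behind the period one indifferent user can be illustrated with a simple fixed-point iteration. The sketch below assumes a period one utility of \(v_A - \tau x - r_A^{1}\) for a user located at x, which is consistent with the stated closed form but not spelled out in this section; the function names and the values \(\tau = 1.5\), \(v_A = 1\), \(r_A^{1} = 0.5\) are illustrative, while \(\omega = 0.05\) follows the caption of Fig. 3.

```python
def closed_form_location(v_a, r_a1, tau, omega):
    """Period-one indifferent user with network effects: (v_A - r_A^1) / (tau - omega)."""
    return (v_a - r_a1) / (tau - omega)

def fulfilled_expectations_location(v_a, r_a1, tau, omega, steps=200):
    """Iterate expected network size until expectations are (numerically) fulfilled.

    Given an expected network size, the marginal user solves
    v_a - tau * x - r_a1 + omega * x_expected = 0; the resulting x becomes
    the next expectation. The iteration contracts when omega < tau.
    """
    x_expected = 0.0
    for _ in range(steps):
        x_expected = (v_a - r_a1 + omega * x_expected) / tau
    return x_expected

closed = closed_form_location(1.0, 0.5, 1.5, 0.05)
iterated = fulfilled_expectations_location(1.0, 0.5, 1.5, 0.05)
print(closed, iterated)
```

The iteration converges to the closed-form location, which is the sense in which the expected and the realized network size coincide in equilibrium.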

Again, we solve for the subgame perfect Nash equilibrium using backward induction and derive the period one and period two level of data consumption as shown in Sect. 18.1. Compared to the base model without incorporating network effects (c.f., Sects. 3 and 4), one can easily show that CPs never require more data, i.e., the existence of positive direct network effects has a negative impact on CPs’ data consumption (\({\partial r_{i,NWE}^{*,d,t}}/{\partial \omega } < 0\)). Intuitively, CP A now has the possibility to lock in users without increasing its data consumption. This improved competitive situation also leads to CP B reducing its data consumption which is beneficial to users (see Fig. 3). However, our results with respect to CPs’ data consumption highlighted in Sect. 4.1 continue to hold, i.e., \(r_{A,NWE}^{*,NP,t} > r_{A,NWE}^{*,P,t}\) and \(r_{B,NWE}^{*,NP,2} < r_{B,NWE}^{*,P,2}\), and consequently, Proposition 1 continues to hold.

As the CPs’ data consumption changes, incorporating network effects has ramifications on all players within our considered market. However, our other results on introducing a right to data portability also remain qualitatively unchanged, which further corroborates the robustness of the model: The incumbent always suffers from introducing a right to data portability, the entrant is always better off, and total surplus always increases. Moreover, the effect of data portability on consumers remains ambiguous. Although consumer’s surplus with a right to data portability is now higher in more cases, i.e., the intersection of both functions is shifted to the edge of the feasible parameter range (c.f., Fig. 3 for an illustration and comparison), users nevertheless may experience a lower consumer’s surplus compared to a regime without a right to data portability if their mismatch costs exceed \(\tau _{CS,NWE}\), i.e., also Proposition 2 continues to hold (additionally, c.f., Sect. 18.3).

Insight 8

If being active at a CP induces positive direct network effects for users, the CPs’ level of data consumption is lower. However, introducing a right to data portability increases (reduces) an entrant’s (incumbent’s) level of data consumption which may lead to users being worse off compared to a regime without a right to data portability.

Fig. 3 Comparison: Consumer’s surplus for services with and without network effects. Note: Total consumer’s surplus with and without incorporating positive direct network effects, with and without a right to data portability, for different users’ mismatch costs using the parameters \(v_A=1,v_B=4\) and \(\omega =0.05\). The upper (lower) two curves refer to a model with (without) incorporating network effects.

6 Conclusion

Data portability allows users to transfer their data entered at a certain service to another service. Although some online services have implemented such features voluntarily, and built-in autofill features of internet browsers can reduce the effort to create new accounts, a standardized and mandatory ability for users to port (personal) data is pursued by the European Commission for all online services available in the EU’s member states through the General Data Protection Regulation (European Commission 2016b). Additionally, this topic also gains momentum for non-European policy makers, as the request for information in the United States suggests (c.f., Macgillivray and Shambaugh 2016).

Despite the importance of this issue resulting from the far-reaching implications on business strategies of online services and thus on the total economy, we are – to the best of our knowledge – the first to analyze the resulting competitive effects theoretically. In doing so, we not only shed light on current policy issues, but also highlight relevant implications on the interface of the IS, the technical and the economic realm to better understand and develop systems’ value propositions. For this purpose, we propose a game-theoretic model that captures competing online services’ strategic incentives and identify the feasible market outcomes together with the implications for all stakeholders.

In conclusion, we find that if the CPs’ costs to implement data portability are not too large, data portability fosters market entry, which arguably enhances service variety and innovation, but incumbent services unambiguously suffer from it. Whereas such an outcome might be desired by policy makers to alleviate concerns about dominant online services, we highlight that end users may actually suffer from a right to data portability, because new services have an incentive to increase the amount of collected data compared to a regime without data portability. However, as the total surplus increases due to higher overall profits, a decision to introduce a mandatory right to data portability requires a complex assessment. In the following, we outline policy implications as well as strategies for services active in data-driven markets based on the obtained results and discuss avenues for future research.

6.1 Policy and Managerial Implications

From a policy perspective, the rationale to introduce a (general) right to data portability is clearly focused on the protection of end users (see, e.g., European Commission 2016b, Article 1). Consequently, our results imply that data portability should not be applied to all online services because consumers might actually be worse off. On the other hand, considering the total economy, overarching goals such as the Digital Single Market Strategy (DSM strategy) within the European Union (cf. European Commission 2016a) or former President Obama’s executive order on competition from April 2016 (cf. Obama 2016) highlight the importance of open, fair and non-discriminatory (data-driven) markets. As we show that the entrant’s profits increase under data portability, a right to data portability may contribute to these goals. However, these goals are only achieved if the resulting costs (for implementation as well as administration) of a right to data portability are low. Therefore, our findings evoke the necessity for policy makers to carefully weigh whether they want to promote market entry to stimulate innovation and subsequently service variety, or purely focus on consumer’s surplus.

If new services should be incentivized to enter the market, data portability should be enforced strictly with few exceptions. To date, the concept of data portability proposed by the European Commission solely focuses on personal data revealed by users themselves. Hence, data revealed by third persons (say, reviews for a private lift, or endorsements on professional networking sites) are excluded in the current version of the regulation. Therefore, policy makers might think of extending the scope of data that can be ported. In fact, as highlighted in the mid-term review on the implementation of the Digital Single Market Strategy, the European Commission already “subject to impact assessment, prepare[s] a legislative proposal [...] which takes into account [...] the principle of porting non-personal data” (European Commission 2017, p.11). In most cases, extending the scope of portable data would be in line with the goal of enhancing consumer’s surplus. However, it has to be taken into consideration that (1) porting sensitive data (e.g., credit card numbers, tax IDs, social security numbers) bears important privacy and security risks, although users entered these data voluntarily, and (2) there are cases where users are actually worse off with a right to data portability, as we have shown throughout all of our model specifications and analyses. Our results suggest that users are likely to be worse off if base utilities are asymmetric, e.g., if the entrant has a superior value proposition providing the user a higher base utility. Arguably, entry is then beneficial for the entrant even without a right to data portability. Consequently, one may hypothetically think of a concept where data portability is only granted to some services. 
Although this seems possible in theory, the likelihood of success of such an approach is questionable as (1) this concept would contradict popular “neutrality regimes”, which might become increasingly important on a service level (cf. Easley et al. 2018), (2) the current political view aims at giving end users back the control of their (personal) data, independent of the considered service (cf. European Commission 2016b), and (3) the nature of the internet with independent parties and hard-to-control data flows makes supervision costly. However, as we have shown that the negative effects of data portability on consumer’s surplus can be dampened by restricting the amount of data that can be ported, this might be a possible way to facilitate market entry and to limit potential adverse effects on consumers.

From a managerial perspective, it has to be emphasized that incumbent services have an unambiguous incentive to inhibit the concept of data portability because their opportunity to soften competition vanishes, leading to reduced profits. In contrast, entrant services or start-ups should promote the concept of data portability because their flexibility in setting the amount of data that is collected rises, leading to higher profits and thus earlier profitability. If services have no possibility to influence the scope of data that can be ported, incumbents should pursue a differentiation strategy if the entrant is superior in terms of its base utility. This arguably increases users’ mismatch costs, which reduces the competitiveness of the market and ultimately benefits the incumbent. For this purpose, incumbents may try to change (aspects of) their service offering (i.e., differentiate) to escape the fierce competition with the new service. In contrast, a strategy designed to imitate the competitor can be seen as an incumbent’s opportunity if the entrant is relatively equal in terms of its base utility and if the mismatch costs of users are already comparatively high. This might be achieved by matching all of the entrant’s value propositions to reduce mismatch costs, which increases competition and thus profits (see the effects of the users’ mismatch costs on profits outlined in Sect. 4.2). Additionally, incumbents may try to (1) infer information from a user’s browsing behavior, as data that has not been actively provided by users is not covered by the right to data portability, and (2) require “proxy data” from users that is only useful for services if it is analyzed in combination with other data (that is not subject to data portability). The entrant always benefits from higher mismatch costs of users and should thus differentiate as much as possible from the incumbent, e.g., by acting as the industry’s innovation leader.

6.2 Limitations and Avenues for Future Research

Finally, we conclude by highlighting possible model extensions and limitations that should be taken into consideration and analyzed in future studies. First, the market environment could be changed to capture the effects of data portability on two existing, competing services. In our terminology, CP B would then already be active in period one, and data could be ported from CP A to CP B and vice versa. Arguably, as the CPs’ flexibility in setting the amount of required data is reduced, CPs should suffer under a regime that enforces data portability. Conversely, such a market environment would be beneficial for end users. Second, the possibility to discriminate between new users and existing users might be seen as a possible model extension. However, this extension would assume that services have a non-uniform data consumption for data from different user groups, which may increase programming efforts and potentially complicate the provision of a streamlined and consistent (service) portfolio. With data portability, the entrant would then collect a relatively high amount of data from new (i.e., not switching) users and additionally maintain flexibility for the share of users that may switch services, leading to reduced consumer’s surplus. Third, we assumed that data entered once has no effect on a user’s utility in succeeding periods. Whereas we believe that this is a suitable benchmark, one may argue that the disutility of already entered data only diminishes over time, i.e., the effects of trust in a certain service or the possibility of data breaches at a CP might be included in the analysis. Incorporating trust can be achieved by assuming that there is a lower (or no) disutility if the same service is used again, whereas there is some disutility if the same data is ported to another service. Fourth, we only considered the costs of revealing personal data.
However, entering (more) personal data may also lead to a higher base utility of services because the service can be better personalized to a user’s needs. This effect can be introduced, e.g., by assuming that the valuation of a service is an increasing concave function of the costs. Finally, a right to data portability arguably also induces positive effects on other CPs, which supply independent or complementary services, but are not modeled within this study focusing on competing CPs. Thus, the positive effect of data portability on service variety and innovation may be stronger than assumed in this study.