Introduction

Background and initial problematization

Indeed, the desire of the organizations to deepen the capacity to accommodate digital information was accompanied by an exponential upward tendency of new data colliding in the 1980s with the mounting constraints regarding data warehousing infrastructures (Hilbert and López 2011). Hence, Big Data (BD) emerged in the 1990s as the data boom, defining the amount of data greater than the capacity of what a computer could process or store, nor their analytics performed by standard software, regardless of its configuration (Cox and Ellsworth 1997; Mayer-Schönberger and Cukier 2013).

Beyond the benefit of solving data-related constraints, Big Data opened a myriad of opportunities for information building (Abrantes 2020). With the rapid development of a triad of (advanced; exploratory and, discovery) Big Data Analytics techniques, a broad instrumentalization potential has emerged and its commercial use came naturally (Russom 2011). In the first decade of the twenty-first century, the BD resembled the “holy grail” of competitive advantage and many firms have re-strategized their position in business (Lambrecht and Tucker 2015).

BD allowed the creation of associative patterns between units of information, such as, meaningful data systems. This triggered the fashioning of modern business analytics strategies for maximizing market intelligence. Firstly, companies were able to remedy their inner vulnerabilities (enterprise-wide) and simultaneously leverage their system’s operational efficiency. Secondly, it democratized the use of data-driven insights for the gain of competitive parities or advantages (Boyd and Crawford 2012). An example of the utility of BD analytics is nowadays the prevention of defects on vehicles, since the handling of data from sensors measuring multiple parameters of functionality (e.g. temperature, vibrations, oil pressure, or fuel consumption) bypasses eventual breakdowns before they actually occur.

In this context, BD has expanded horizons of analyzability but also transforming the analysis function itself, which shifted their predictions from the rigor of sampling techniques to focus on the factum, as Big Data dealt with a sheer volume, almost entirely unstructured and heterogeneous, with the scale compensating data inaccuracies, unraveling patterns, forecasting scenarios and eliminating human bias and error (Baruh and Popescu 2017; Gandomi and Haider 2015).

Unsurprisingly, the virtues of BD spawned a wave of euphoria labeled as dataism, intrinsically assimilating the level of trust assigned to businesses and public institutions pertaining to the use of private data. Within the myriad of units of petabytes exchangeable every day, it is also the personal data deriving from the individual’s internet navigation through online channels (i.e. digital footprints) (Muhammad et al. 2018).

In recent years, an emerging “anti-dataist” discourse has emerged, concerning the rising datafication processes, targeting, particularly, the overuse and misuse of personal data and digital footprints. Such perspective emphasized the increasing manipulation of Big Data, such as life events or discrete social interactions, as valuable and monetizable insights for the firms, and potentially hazardous and noxious for the individual, here identified as the personal data-owner (Van Dijck 2014; Gandomi and Haider 2015).

In this regard, Zuboff (2019) termed the prior controversy (dataism) as a totalitarian pattern of a surveillance capitalism (dataveillance) imprisoning citizens and subordinating those to behavioral modifications. Accordingly, the ability to control behaviors and predict events is a valuable asset allowing the tradeable of separate units of data, equivalent to human stocks. Data-brokers, as the entities responsible for the direct and/or indirect manipulation of those human stocks are perceived as hazardous entities with a questionable moral conduct. This occurs, not solely on the grounds that organizations are able with BD analytics to anticipate future actions; but foremost, because of the asserted anti-democratic power to alter what one is able to think, see or feel. Market wise is a deceptive way of modifying the patterns of choice and consumption (Ahn and Lee 2020; Dekimpe 2020; Mayer-Schönberger and Cukier 2013; Leinweber 2007).

This constitutes the initial problematization for pursuing such a research avenue, i.e. to understand how individuals and firms perceive (sense and feel) the access and utilization of digital footprints and the consequent behavior adopted by both parts. A literature review will expose an alleged dispute of individuals and firms and the rationale for such conflict of interests.

Research gap and aims/objectives

As an emergent field, BD earned a growing audience of academics and rise on publications in recent years, with a significant coverage of digital footprints, namely as to sentiment analysis. Nevertheless, the studies are few on the Nordics, especially on the Danish market, cross-observing rationem (reason) and (motus) emotions, regarding the utilization of personal data’s perceptions (digital footprint’s awareness and sentiment) and individual behavior (action and self-defensiveness). Therefore, the uniqueness of our research resides on the joint investigation of these two underlying components, i.e. data-owners and brokers. Moreover, the firm’s perspective (perception and behavior) is a fairly untapped research angle in Denmark. This would justify per se the choice of the marketplace of observation; however, the country is a highly digitalized society which constitutes an ideal scenario for running such test.

Hence, our purpose is to grasp whether the individuals acknowledge and/or accept the utilization of their personal data by third parties. Likewise, our enquiry mirrors the same components on the data-broker’s side. Thus, the general aim (A) of this research is to understand and explore perceptions and behavior of individuals (data-owners) and firms (data-brokers) in relation to their mutual data exchange. Therefore, a set of objectives is derived from the above:

  • Objective 1 (O1—Data-owners’ perceptions): To comprehend the consumer’s general perception of data-broker’s conduct processing their personal data.

    • Objective 1.1 (O1.1—Data-owners’ perception versus action): To identify behavioral consistency or deviation in attitudes.

  • Objective 2 (O2—Data-brokers’ perceptions): To grasp the data-brokers’ knowledge about the consumers’ perceptions and sentiment.

    • Objective 2.1 (O2.1—Data-brokers’ perception versus actions): To determine whether firms adapt the utilization of personal data based on consumer’s degree of avowed (un)satisfaction/(un)willingness.

The figure represents the association of aims, objectives and the formulation of research questions. A general aim is divided into four specific objectives, of which two general objectives (O1, and O2) and two specific objectives (O1.1 and O2.1.) deriving from the first ones. The general objectives cover the perceptions of both data-owners (O1) and data-brokers (O2) while the specific objectives address the gap between perceptions and in/actions. A multi-case research with a comparative design utilizes embedded units for testing the above framework. A one-phase mixed methods study instrumentalizes two methods with a concurrent mode of collection, as Danish firms and final consumers’ perceptions and actions are scrutinized through internet-mediated interviews and questionnaires with the researchers’ assistance (Fig. 1).

Fig. 1
figure 1

Research framework. Source Own elaboration

Accordingly, the general structure of this manuscript proceeds with a theoretical revision of the literature on Big Data Analytics on spatial BD emphasizing the digital footprint’s utilization of personal data focused on the knowledge and behavioral- related aspects of awareness and sentiment. The article proceeds with a description of the research design, the typology of case research, the demographic profiling of the participating firms and sampled units of analysis, and with the clarification of the rationale for the choice of methods. Subsequently, the research continues with the application of methodological procedures to data analysis, extrapolation of results from data manipulation and a discussion, as to the findings deriving from the data outputs. Finally, the conclusions of the study are presented in connection to the aim and objectives, complemented by the managerial implications and the opened avenues for further exploring this topic and subsequent knowledge-building.

Literature review

Big Data is a phenomenon of increasing ubiquity and henceforth, literature associated with the phenomenon is becoming inherently copious (De Mauro et al. 2015). The vastness of the theme led us to immerse into a theoretical revision centered on the initial problem addressed in the previous section, i.e. the sentiment of digital footprint’s (ab)use. However, we start by conveying first a consensual definition of BD (De Mauro et al. 2015, p. 103):

Big Data represents the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value.

This definition, covers most components addressed in others and conveys four discernible components:

  1. 1.

    Information: the fuel of BD, requires organizing/cross-referencing to generate insights.

  2. 2.

    Technology: The exponential increase of data with the proliferation of Internet of Things (IoT) requires an equally exponential increase in data storage and data processing capabilities.

  3. 3.

    Methods: The bottleneck of BD lies presently on the actual analysis.

  4. 4.

    Impact: Actual and potential (impact) across multiple industry areas and science.

The forth component (impact) refers to the three-dimensional perspective of BD (termed as the “3 V’s”), a combination of societal advancements and hastening of BD analytics (i.e. volume, velocity and variety), which opened new horizons for a data-driven economy (with a rising market value), labeled as the Second Economy (Lerman 2013; Manovich 2011; Jacobs 2009).

Therefore, BD is acknowledged as a key economic asset for the future, and an instrument for competitive advantage-gains; however, with challenges for the firm’s ability to acquire new skills and incorporate new technologies; especially as to the scalability (volume) to accommodate drastic changes in magnitude one can manipulate, on a fast pace (velocity) and on a broad span of informational assets (variety) (Cavanillas et al. 2016; Kitchin and McArdle 2016).

These datasets with massive quantities of data (large volume) generated in a continuous way, on real-time basis (velocity) and, accommodating variety (structured/unstructured and semi-structured) are also exhaustive records of the entire population, rather than samples, capturing entire systems of behavior (Mayer-Schonberger and Cukier 2013). Furthermore, these datasets provide a tight and fine-grained instrumentalization (resolution) and uniquely indexical (identification) of their significance (Dodge and Kitchin 2005); with a strong relationality as to the conjoining with other datasets; extensionality, allowing the flexible incorporation of new fields and expansion in breadth of coverage (Marz and Warren 2012).

Data-brokers are unsurprisingly called nowadays the “wizards of mining” (Marr 2017). It is undeniable the dependence of BD, as a key economic asset (though value-free) to BD analytics. The latter is the key activity responsible for the effective value creation, yielding insightful depth of information extracted from units of petabytes. On the next section, are here introduced the main phenomena surrounding the problematization of the commercial exploitation of personal data and contextualized into EU’s legal framework.

Digital footprints: wrangling and challenges

To arbitrate the European Big Data Ecosystem (EBDE) a new legal framework from the European Parliament and Council’s side was issued on May 25th May 2018, through regulation for the protection of unrestricted movement of such data. This legal act was designated as the General Data Protection Regulation (GDPR)—Regulation (EU) 2016/679 (EU 2020; Eur-Lex 2016; Barlow, 1996), including, among others, the instruction on data portability (relationality); data breaching and data sharing, consent and transparency (resolution and indexicality) (De Hert et al. 2017; Abrantes and Venkataraman, in press). Therein, article 5 established seven key-principles of data protection policy: (a) lawfulness, fairness and transparency; (b) purpose limitation; (c) data minimization; (d) accuracy; (e) storage limitation; (f) integrity and confidentiality; and, the accountability principle. Despite, the main similarities to the earlier legal regime, i.e. the Data Protection Act 1998 (1998 Act), the GDPR neglected a principle for the protection of individual rights as to the international transference of personal data (ICO 2018).

Such an omission in the previous regulation is attempted to be disentangled as to personal data protection, as the principles above, are extended on November 21st on the Official Journal of the European Union (L 295/39) with the Regulation (EU) 2018/1725 which came to force on the 28th October 2018 emphasizing that,

The protection of natural persons in relation to the processing of personal data is a fundamental right. Article 8(1) of the Charter of Fundamental Rights of the European Union (the ‘Charter’) and Article 16(1) of the Treaty on the Functioning of the European Union (TFEU) provide that everyone has the right to the protection of personal data concerning him or her. (p. 39)

Regardless of such legal framework, a criticism towards the gap between law making and supervision is seemingly growing in EU, especially as to the commercial usage of personal data by corporations (as presented at the “Introduction” section). Such friction seems to be triggered by a confrontational scenario, fueled by the firm’s practices of hyper-targeting of consumers, in conflict with privacy rights (including individual ones) observed in the acts above (Crawford and Schultz 2014).

Furthermore, the GDPR touted in article 17 the “right of erasure” or the “right to be forgotten (RTBF)” described below. Likewise, article 37 defined the creation of a mandatory role, the Data Protection Officer (DPO) for all organizations acting as data-brokers, which core activities involve regular and systematic monitoring of personal or sensitive data on a large scale (ICO, 2018; EDPS, online). Here, the notion of data-broker covers both the concepts of data suppliers and technology providers. The first (data suppliers) referring to the “Person or organization [Large and small and medium-sized enterprises (SME)] that create, collect, aggregate, and transform data from both public and private sources” (Cavanillas et al. 2016, p. 52). The second (technology providers) as “typically organizations (Large and SME) as providers of tools, platforms, services, and know-how for data management.” (Cavanillas et al. 2016, p. 52).

Such hyper-targeting (or micro-targeting) gives attention to an exhaustive utilization of detailed customer-data and marketing automation tools, based on predictive modeling, statistical tools, and algorithms to target and deliver highly personalized messages across a large number of digital channels (Semerádová and Weinlich 2019). This represented another underlying assumption about Big Data, which is, on its own BD is value-free unless algorithmically transformed into “actionable insights” which in turn can be applied to solving problems in a wide variety of applications by identifying patterns imperceptible to human logic and thereby predict future conditions (Baruh and Popescu 2017).

However, the controversy towards digital footprints, reveals an exploitation of data with both commercial and non-commercial purposes (Muhammad et al. 2018). We recall here one of the most striking examples of the latter, the 2015s Cambridge Analytica data (breach) scandal on the North-American presidential campaign, with the harvesting of social media information without personal consent (Demant 2019; González 2017). Nonetheless, some scholars raise doubts as to the absolute effectiveness of BD, as some studies of BD’s advertising performance exhibit contrary results. Here, Semerádová and Weinlich’s (2019) research of over 840 Facebook ads, concluded in fact a limited effect and counterproductive results in terms of user reactiveness. This suggests a negative correlation of overexposure to hyper-targeting and future purchasing causality, as the first may lead to negative attitudes toward the advertiser. Yet, a negative sentiment finds real ground implications on privacy loss, self-determination and price discrimination. Hence, some scholars have postulated the importance of “stopping the unjustified accumulation and commercialization of personal data” and furthermore advocating a higher responsibility of consumer protection, which ought to be vested in the legislator hands to immediately regulate and execute the compliance of practitioners to it (Lecuona and Villalobos-Quesada 2018, p. 291). These author’s assertion comes in a current scenario where further concerns were raised towards the public sectors’ equivalent exploitation of BD.

Yet, regardless of the ability of the data-brokers to determine future outputs of on consumers’ actions, the instrumentalization of human behavior is nowadays unavoidably criticized as a hazardous use of digital technology towards mind-reading, as an abusive anticipation of human behavior and personality profiling (González 2017). Here, the author stresses furthermore the peril of a malevolent instrumentalization of artificial intelligence (AI) for the manipulation of the masses. Conversely, consumers are further perceived furthermore as being unprepared to understand or evaluate the ethical implications and the knock-on effects on their lives of the usage of their digital footprints (Zwitter 2014). For instance, personal life actions as “feeds” and ‘‘likes’’ used by Twitter and Facebook’s analysis of sentiment, group manipulation and micro-targeting, working as monetary units of value for marketing companies. This instrumentalization of personal data has been associated with a furtive molding of consumer’s thoughts, and their intentional redefinition of behaviors (causality reasoning). Therefore, some scholars have alluded to a form of digital captivity or servitude, as a current state of Big Brother’s social regime, of post-modern digital voyeurism and surveillance capitalism of (quasi) totalitarianism (Zuboff 2019; Diamond 2019; Brayne 2017; Xue et al. 2016).

Whether they be commercial, symbolic or political issues relating to Big Data analytics within the EU, the issues themselves become more pertinent due to the GDPR. Jones (1991, p. 367) defined business ethics as an attribute of a decision that is both legal and morally acceptable, and unethical “is either illegal or morally unacceptable to the larger community”. The EU’s GDPR, as a legal framework attempting to promote digital democracy, proclaimed in article 17 the citizen’s Right to be Forgotten (RTBF), which then spurred a worldwide debate as to the right of the individual to privacy (Xue et al. 2016). The RTBF includes the supreme right to own digital anonymity. Thus, the data-broker is obliged upon requested to delete all personal data without undue delay. This latter agent retains in this case no overriding legitimacy to further process of data, being such action considered an illegal act.

Nevertheless, companies seem to comply only partially with the GDPR. For instance, RTBF requesters of delisting (e.g. from a company’s newsletter) face the unprotected assault of data reutilization by third entities (Streisand effect). Whenever consentment is once provided to them, the annulment of the decision is in practice irreversible. Even under the eye of the compliance authority (i.e. the European Data Protection Supervisor) the data-sharing chains are hardly undone and the person is unable to fully exercise the right to anonymity, whether to hide, to remove, or to censor a piece of information, (Xue et al. 2016; Thatcher 2014).

Moreover, firms use analytics to yield new data. The original inputs from individuals are also reconditioned into new self-generated data bundles deriving from the interpretation of patterns of behavior and further monetizable predictions (data fumes). In this way, natural data obtained from online “base behaviors” face again the threat of undue exposure, as their reutilization through data fumes correspond to an exhaustive extraction procedure of third-party vendors with inherent dangers of building biased contrived datasets, with a likely distance from the real demonstrations of behavior (Thatcher 2014).

Consequently, the GDPR created a mixture of disappoint and skepticism among EU citizens, due to the EU’s inability to ensure a full compliance of principles and dispositions, namely written on the articles 17 and 37. Furthermore, as the article 17 forces the controlee to take action on his/her hands and report abusive practices to the supervisory authority to act by complaint, the regulatory system is questioned as to the RTBF’s effectiveness. The inaction of the consumer is penalized and stimulated the data-broker’s prevarication and non-compliance. This constitutes one of the objectives of the study (O.1.1.consumer’s perception vs. action) to comprehend the dissonance between digital footprint’s sentiment and individual actions taken.

Digital footprint’s awareness

Consumer’s background

Through data-driven marketing, consumers began to experience empowerment through optimal satisfaction of personal preferences (André et-al. 2018). Big Data provides insights into consumer preferences, which in turn enabled the industry to target individual customers with personalized options or recommendations better fit to satisfy them. Consumers enjoy nowadays an eased shopping experience, time-saver and focused on preferential choices, which translates into convenience and lower resource consumption (Chen et al. 2012). Furthermore, consumers benefit from an abundance of internet-based free of charge services, such as communication through social media, access to endless sources of information and entertainment enjoyed by most people.

On the other hand, data-driven marketing backfires through pigeonholing consumers into behaviors, often referred to as consumer welfare depreciation (André et al. 2018). Data-driven marketing naturally focusses on behavioral anticipation, and not necessarily on higher-order psychological processes (e.g. emotions, moral judgements, preferences or “meta-preferences”) being customers aspirational preferences differing from their actual behavioral (determined) preferences (André et al. 2018).

Van Dijk (2014) argues also that a loss of privacy and furthermore the normalization of privacy loss, is exemplified by the common practices of the consumers (e.g. accepting cookies on websites) without actually measuring the associated privacy challenge and assessing the privacy policy of a website, and thereby giving unrestricted consent to their data. Baruh and Popescu (2017) argue that this is a matter of structural failure instead of a lack of individual skills. These authors challenge then the notion of privacy as being incorrectly constructed as an exclusive individual concern and responsibility when it is in fact a collective value with a collective social dimension. Regulatory efforts should therefore consider privacy in the digital domain not by exclusive self-management by the individual but by government’s more effective legislation. Furthermore, Big Data is enabling not merely Price Steering but Price Discrimination. Price Steering as to the personalized content in e-commerce, as two consumers using the same search string for the same product receive different product results, or also, the same results presented in a different order, according to the algorithm’s prediction of the relative affluence of the consumer. This is a way of nudging the consumers towards products of higher value, and with a higher price. Subsequently, the Price Discrimination entails the differential offer of prices to potential consumers for the same product. The more affluent consumer is shown a higher price for the same product than the less affluent consumer (Hannak et al. 2014). Here, a significant threat is the First-Degree Price Discrimination aiming at micro-target the consumer’s maximum price or reservation price. This refers to the mapping of the maximum limit of willingness of the consumer to pay for a certain product.

Typically, this is theoretically possible only in a monopolist market structure, but Steinberg (2020) asserts that advancements in Big Data identifying individual consumer’s reservation prices make it possible even in a perfect competition scenario. Yet, Mayer-Schönberger and Cukier (2013) assert that the analytics’ exacerbation toward price controlling or fixing is more likely within central planning systems.

Data-broker’s background

Approximately ¾ of the surveyed companies argue Big Data represents an opportunity, as analytic algorithmization yield a variety of benefits ranging from the simple understanding of customer behavior until a better segmentation and targeting for tapping into new market opportunities (Russom 2011). Thus, harnessing big data effectively may represent additionally a competitive edge (Mayer-Schönberger and Cukier 2013). The International Data Corporation (IDC) asserts that the current digital transformation has pushed the IT industry towards a third platform technologies phase, which syndicates (and intensifies) mobile and cloud computing; social media, the internet of things (IoT) and BD analytics. This anticipated the “digisense” era of abundant interrelatedness of sensors, controllers, big data and data science (Gartner 2019).

Consequently, the market size of this technologies is expected to reach the USD 6 trillion USD mark by 2022 (IDC 2018). For instance, Walmart which is the large retailer in USA is also a pioneer on BD analytics on its industry. Walmart collects already 2.5 petabytes (2,500,000 gigabytes) of data per hour and applies real-time analysis on internal sources (e.g. product turnover, customer transactions, financial data and customer traffic) and external sources (e.g. social media comments, mobile phone data, e-mails, website clicks, weather and temperature). Such inputs allow the targeting of recommendations, in-store navigation, improvements on merchandise display, and the optimization of the supply chain processes, as to the price negotiation, and maximization of profit pools (Benjelloun et al. 2015; Marr 2017).

The traction gained by Big Data on the public administration favored also a variety of new applications ranging from education to health, increasing citizen’s engagement in public affairs, prevention fraud/crime, improving national security, and supporting other forms of well-being (Kim et al. 2014). For instance, the New York City (NYC) fire department developed a whole new fire-prevention strategy gathering 900.000 buildings’ typologies on a single dataset correlated with tax information, ambulance visits, local crime rates, rodent complaints, and records of fires in buildings, to establish fire risk predictors, anticipating incidents and optimizing the daily inspection work (Mayer-Schönberger and Cukier 2013). However, prior literature denotes a mix of externalities: strengths, weaknesses; opportunities and threats (SWOT) associated with both the Big Data analytics by both public and private organizations. This raises ethical issues as to its adoption and uncovering the debilities of the GDPR, namely the insufficiency of compliance supervision (Xue et al. 2016; De Mauro et al. 2015; Thatcher 2014).

Methodology

The research framework diagrammatically represented in “Research gap and aims/objectives” section aggregates the tenets of the theoretical testing here followed, as the researcher team intended to gain an insight as to the extent of conformity between sentiment and behavior towards digital footprints. Moreover, it aims to unveil whether consumers are willing or not to waive own rights and their efforts to ensure that these are effectively met.

We have adopted a post-positivistic stance, using a one-phase mix-methods with a concurrent application. Such research decision derived from a seminal reflection on a triad of considerations: the aims/objectives; the assumptions of the research team; and, the ethical commitment to this empirical testing. In this context, a multiple case research with a comparative design focused on two large technological Danish firms, with the incumbents classified as Firm 1 (F1) and Firm 2 (F2) for anonymity purposes. From those, quants and qual data was collected on and about these firms, respectively from the senior managers representing the data-brokers’ side, and from the final consumers corresponding to the data-owner’s side.

Quantitative data (quants) obtained from 137 eligible questionnaires (Qn), applied to the data-owners’ side, targeted potential respondents through a social media platform, using personalized messages. As to the latter, the response rate achieved a high mark (0.9648), as observed with 5 partial questionnaires excluded covering between > 50% and < 100% of the questions. No cases observed of break-off records (< 50% of response).

The quants is further discussed in “Data-owners” section. Data-Owners and represented as to an econometric formulation of the model as to the extent of a relation between two parts: perceptions and behavior, given respectively by opinion variables and action variables. Thus, the development of an appropriate method for the measurement of these relations, is set on the assumption that the (relationships of the) individuals’ endeavors to ensure an optimal degree of privacy (or behavior) constitute the y type of variables (action), and the opinion hold about the privacy of their digital footprints (perception) the x variables.

Qual units of analysis (UA) applied to each case-firm represent the data-brokers’ side, and multiple quantitative ones obtained from their customer-base, configured the use of identical analytical procedure between cases, assuming naturally an iterative logic with embedded UAs. In total, 2UAs were collected from the case-firms using open-end interviews (In). The profiling of both case-firms and the participants/respondents is presented in Tables 1 and 2. Moreover, a pre-testing was conducted on Feb-20. The first method was applied, as to the collection momentum, as a one-off vis-à-vis interview conducted respectively on March 6th (F1) and March 10th (F2). The second method was internet-mediated and self-administrated during the same timeframe, a 20 days’ time around was given from counting from the invite for further enquiries and subsequent delivery.

Table 1 Case-firms (Fn)
Table 2 Participants (Pn) and Respondents (Rn)

The selection of the cases followed a (non-probabilistic) purposive and snowball logic, including as to the accessibility to the senior manager on case-firm F2. Conversely,

The participants P1 (of F1) and P2 (F2) adjusted to the same age group of the modal class of the respondents, which contained 93 respondents. The mean of seniority ( = 14,5) of the participants corroborated the formal education background ( = 6,5) supports their purposive selection as potentially insightful participants.

As to the design of the collection methods, the one applicable to senior managers, as data-brokers, used prompting as technique through an interview guide containing 10 questions, and with no intertwining of any probing. To the analysis of the respondents, as data-owners, opinion and behavioral variables were used to comprehend both sentiment and actions, through 14 investigative questions (IQ) being safeguarded the seminal tenets of anonymity of the individual, confidentiality of answers and the restriction of access to data records to third parties. Figures of enquiry above exclude in both methods the profiling/demographic questions above summarized.

Both methods were fashioned in Danish language with subsequent retroversion of results to English language.

The signifiers of the data-brokers interviews (Exhibit 1) were audio-recorded, transcribed and converted into to English for the analysis of manifest content, as the content verbalized by the informants, on the light of thematic analysis (TA) while data-owners answers were manipulated using statistics tools for the generalization of their results. Both are, furthermore exposed in “Data analysis and discussion” section. Data analysis and findings.

Exhibit 1
figure 2

Excerpt of method 2 (quants—questionnaire). Source Own elaboration

Data analysis and discussion

Data-owners

According to the purpose of this study (O1; O1.1; O2; O.2.1) and testable propositions (RQ1; RQ2; RQ3) exhibited at the research framework (1. Introduction), the data-owner’s questionnaire was divided into three categories of variables, as exhibited below (Table 3):

Table 3 Data-owner’s categories and variables

Type 1 or attribute variables (var001x: navigation), Type 2 or opinion variables (var002x: online activity), and Type 3 or behavioral variables (var003x: exposure-willingness).

Hence, a theoretical model with the linear function (f) is chosen as,

$$f \, = \, \left( { \, X_{1,} X_{2, \ldots ,} X_{k,} } \right) = \, \left( { \, X_{1,} X_{2, \ldots ,} X_{k,} } \right)$$
(1)

With an empirically testable form, the specification of the stochastic structure of the variables of the linear regression is explained by:

$$y \, = f\left( {X_{1,} X_{2,} \beta_{1} \beta_{2} + \cdots + \, X_{k} \beta_{k,} } \right) + \varepsilon$$
(2)

In which, the approach to determine the linear model consists of,

$$y = f\left( {X_{{{\text{data-owners opinions}},}} \beta_{{\text{data-owners actions}}} } \right) + \varepsilon$$
(3)
$${\text{Activity}}_{(y1)} = \, \alpha_{0} + \beta_{1} {\text{awareness}} + \beta_{2} {\text{sentiment}} + \varepsilon$$
(4)
$${\text{Defensiveness}}_{(y2)} = \, \alpha_{0} + \beta_{1} {\text{awareness}} + \beta_{2} {\text{sentiment}} + \varepsilon$$
(5)

Here, the statistical model accounts a residual (ε > 0) accounting for the error of judgement of data-owners. The partial derivatives of y are considered independent (i = 1, 2, 3…) with respect to each of the parameters β1, β2,…βk,.

The attribute variables referred to the user’s profile as to the attention placed on internet-mediated activities (var001-time expenditure). The opinion variables express both awareness and sentiment towards navigation. The first addresses (awareness) depicting its utility (var002-1: convenience of exposure) and the latter (sentiment) the perspective of the individual towards own data commercial exploitation (var002-2 exposure-willingness), here including data fumes and Streisand effect. The behavioral variables emphasize the actions taken online, as the patterns of navigation (var003-1: Navigation track) and proactive defensiveness of own rights (var003-2:-self-protection).

These five variables allowed a dyad of data triangulation procedures within primary data. First, a crosschecking of demographics (e.g. gender; age; or, education) with the answers to IQs. Furthermore, a triangulation of the variables Type 2 with the variables Type 3, for understanding its conformity, between awareness-activity (conformity model 1); awareness-defensiveness (conformity model 2); sentiment-activity (conformity model 3); and, sentiment-defensiveness (conformity model 4); and therefore answer the RQ3 as to association of perceptions/opinions with behaviors.

Opinion variables were treated by four investigative questions (IQ# 9; IQ# 10;IQ#11; IQ# 14). One on awareness ((IQ# 9), emphasizing the utility or convenience versus nuisance (convenience-exposure) (“What is your opinion of following statement: "The collection of my personal data, in general has a positive impact on my convenience when using the internet"?). The remaining is on sentiment, emphasizing confidence, sentiment and openness to further paid services. The IQ#10 focused on the confidence of utilization (exposure willingness) using question (“How would you describe your trust on having your personal data processed legally using your own activities on the internet?”). The prior is articulated on sentiment by IQ#11 (“What is your opinion on, having your data being for targeted advertising and personalization of search results?”). But also with openness to paid services with IQ#14 (“Would you be willing to pay a small amount, in order to avoid your personal data to be collected and sold?”).

Data-owner’s responses to IQ#9, denoted a significant level of disagreement as to the convenience of utilization of their internet navigation patterns, with a relative frequency of 0.51 (disagreement) against 0.38 (neutral) and 0.11 (agreement). The “neutral” scale classification describes the stimulus of “Both Positive and Negative”; “Neither Positive nor Negative”, as well as, “I don’t know”. As to the sentiment (var002-2), the question IQ#10 about the level of confidence with a legal processing of data, responses had an identical result with 0.51 of disagreement; 0.24 (neutral) and 0.25 (positive). In addition, IQ1#11 revealed an accentuation of the “annoyed” to a relative frequency of 0.7, while 0.22 were neutral and 0.09 had a positive opinion about their personal data being collected and used for marketing purposes.

Moreover, a confirmatory question IQ#13 (“How do you consider targeted advertising on the internet?”) tested the internal consistency of the results on IQ#11. The mean difference between the two groups of results was of 0.049 and the Cronbach alpha of 0.83. Answers to IQ#13 confirmed similar results, with respondents assuming to be 0.65 (annoyed), 0.25 (neutral) and 0.1 (positive). Furthermore, enquired about the future, the responses to the question IQ#15 (“how do you think your personal data will be processed in the future?”) the differences of opinions seem more meager, with 0.41 of respondents claiming to be “less secure”, 0.26 “neutral” and 0.33 “more secure”. Again, asked about the future, as to the likelihood of paying for securing the non-utilization of their data, results exhibited clear negative opinion as 0.22 answered “Yes”, 0.2 “I don’t know”, and 0.58 answered “No”.

The interpretation of the results from interval variables of the respondents’ answers to IQs instrumentalized the correlation coefficients, as to Evans (1996) levels of significance. The latter varying from a very strong negative (− 1 and − 0.8) to very weak negative (− 0.19 and − 0.01) correlation; and from very weak positive (0 to 0.19), moderate (0.4 to 0.59), to very strong positive (0.80 to 1) one. Here, the results on opinion variables as to convenience (var002-1) revealed a respondent’s acquiescence bias as he results (0.51) illustrate neither a positive/negative sentiment on sharing of personal data, while 31% neither agree/disagree at all. Conversely, 18% demonstrated a clear positive judgement of agreement. As to the sentiment variable (var002-2), the tabulation of demographic data with sentiment revealed no significant correlation, as to gender-sentiment (r = 0.09) and age-sentiment (r =  0.05). Yet, regardless of their profile, either more than 70% of the respondents consider “annoying” or “very annoying” their commercial utilization, when enquired about (IQ #11) “What is your opinion on your personal data being collected and used on targeted advertising and personalization of search results?”. However, answers to IQ #14 (“Would you be willing to pay a small amount, in order to avoid your personal data to be collected and sold?”). The majority of the respondents (0.58) are unwilling to pay for internet services to avoid personal data collection and its commercial exploitation. Despite the weak correlation coefficient per gender, women revealed though a more openness to pay for internet services (gender(female) = 2r(2) =  = 0.13) against (gender(male) = 1r(1) =  = 0.11) but to comprehend its statistical significance the sample required to be added representative features to the target population.

The behavioral variables emphasized as to the typology of activity, as chatting, dating, education, email utilization, gaming, gambling, net banking, news, shopping, social media usage or other purpose (Exhibit 2).

Exhibit 2
figure 3

Sum of Internet activity per class (age group). Source Own elaboration

As to the self-protection, respondents indicated which protective measures they have applied and how often split between the following options (Table 4).

Table 4 Tabulation of Data-owners’ profile with self-protection endeavors

The majority acknowledged that they never (or almost never) have applied such protection measures. The least utilized entail the privacy policy (A) and the associated practices pertaining to cookies (B and C.). 74% of the respondents never or almost never read the cookie policy of a website, and 64% will never or almost never alter cookie settings or opt out of using a website due to its privacy policy. The protective measures most utilized, respectively by 0.82 and 0.59 of the respondents are automatic scanning (malware) and the deletion of browsing history. As to the regularity of action, the browsing history is mostly taken on monthly basis. However, the most utilized protective measure, is not a “real” endeavor of self-protection since its application is passive (option H), as the scanning for viruses uses an automated feature of an anti-virus program. Looking at the correlation between the protective measures and gender it appears that in general there is a very weak positive correlation (0.11 to 0.27) on the majority of the protective measures indicating that women apply the measures to a lesser degree than men do. With respect to age, it appears to be a weak positive correlation (0.23 and 0.27) indicating that self-protection endeavors, as to the modification of cookie settings and the use of incognito mode increase with age. Furthermore, respondents exhibited an average of 2½ internet- enabled device/respondent, similar to Gartner’s (2019) assessment on IoT devices/individual by 2020 (Gartner 2019).

In sum, data-owners’ vested interest, at present, in changing the status quo, to extend the privacy over personal data. They find it inconvenient to have others to access their data, and furthermore, they seemed disturbed by the fact that someone is yielding from internet navigation records; most specifically exploiting them for producing marketing campaigns and/or other online ads with a commercial content. However, such marked negative opinion (awareness and sentiment) of individuals is contrary to the current level of action taken to inhibit others from access their data.

From the eight mechanisms of self-protection enquired about, only one of them (automatic scanning for malware) revealed a clear utilization by the majority of the respondents (3 in 4). Noteworthy is, this is a passive self-defensiveness mechanism. In this sense, the “real” endeavors (action) to self-protection of digital footprints seems short, merely circumscribed to one single option (deleting browsing history) and taken merely by half of the respondents on a regular basis.

However, the future of personal data protection it does not seem to change, as the Danish consumers here enquired seem skeptical (less secured) about how their data will be processed. They denote a lower faith on the compliance supervision to compensate their own inaction. Against the current, the gender correlation of women respondents (in IQ#11 and 15), which hold the most negative perception over target advertising, are also the (only) sub-set of the sample willing to incur costs for securing their privacy. This provides a new insight for the private sector, namely for companies on business intelligence, data science and cybersecurity services. Indeed, for marketing professionals within these sectors as well. Moreover, this opens furthermore the horizon of discussion for further research about the privatization of data anonymity services.

(Data-brokers’ qualitative) manifest content and Thematic Analysis (TA)

As to the comprehension of the data-broker ‘side (perception and action), a thematic analysis (TA) method is applied, as to the exercise of fashioning discrete theoretical themes (codification design) and the subsequent use of a Gestalt analysis for unraveling the meaning of the content verbalized by the informants (i.e. the signifiers). The latter corresponding to a coding and interpretation of the evidences (quotation) of the coding exercise. Thus, TA implies the establishing of associations of those meanings (i.e. signifieds—deriving from signifiers) into theoretical foundations fitting into prior acknowledged phenomena, which assumes a recognition of the researcher/s of patterns in transcripts’ data, as the coding evidences of these conceptions (Braun and Clarke 2016; Buetow 2010). Hence, our instrumentalization of TA analysis is rather centered on primary saliency, as the linkage of launched signifiers and into realized signifieds (Buetow 2010).

An open coding procedure aiming at illustrating each theme, made use of a situational analysis tool, i.e. the well-known SWOT framework (Madsen 2016). Each component (S–W–O–T) is equaled to a theme. Theme 1 (T1) corresponding to the organizational strengths of the firms F1 and F2 (actions) utilizing BD analytics resources (BDAR). T2 emphasizing the organizational weaknesses regarding its exploitation (i.e. BDAR), including verbalized inactions, indecision or omissions. T3, the product-market opportunities for further applicability of BD to consumers as possible innovation routes (awareness and sentiment: data-brokers). T4 as the market-related weaknesses to its implementation, whether commercial, ethical or legal debilities (perception of awareness and sentiment: data-owners).

In this context, it is relevant to clarify that the explicit content, as sum of the signifiers per participant, accounts a prompting divided into three topics, discussed in this following order: (i) Firm’s use of Big Data analytics; (ii) perception of BD’s benefits/drawbacks for the firm; and, (iii) perception of benefits/drawbacks to the consumer. Such design reflects an introductory discussion of BD morphology and future traits, followed by an immersion on the firm’s seizing (thoughts versus actions) of BD benefits; and, finally, the understanding of the consumer sentiment, as to their willingness to be exposed and accordant endeavors (Table 5).

Table 5 Coding data outputs per theme

The open coding of the (qual) data-brokers’ interviews unraveled 10 signifieds with the majority (0.6) covering threats, although, with the second largest (0.3) theme covering opportunities. As to the strengths (T1) is pinpointed the development of new product solutions, refinement of the portfolio, service support and feedback systems towards the consumer. T3 highlighted the (un)know experience gains, as to the correction of defects, improvements of technology with direct expression in ambient and effort/energy-saving for the consumer. Furthermore, other opportunities acknowledge the benefits of BD’s economies of scope through the subsequent exploring of collaborate advantages regarding cross-enterprise cooperation. However, the latter shields the other side of the coin, as to the potential exploitation of these economies of scope across firms/industries against the consumer well-being. Thus, regarding the T4, the participant F1, referred to the real risks of misuse of the BD ecosystem for enhancing collaborative advantages. In fact, has provided a virtual example of such hazard: “…imagine an American fast-food retail giant partnering with a Danish biotech company whom produces blood pressure measuring devices and together monitoring sugar level fluctuations, suggesting their consumers, in real-time, to buy their burgers/pizzas, this is a harmful use of data…” (Quotation 2—T4:2:1:2; of CId T4:2:1).

Furthermore, the divergent regulations across the globe were asserted as hampering a wider and faster spreading of strengths and opportunities associated with BDAR. Another identified threat was the current degree of confidence (or belief) of the consumer on search engine optimization (SEO) ranking and reputation management software, rating systems and the perceived consumer peer reviewing experiences. Such services delivered among others by providers such as, Apple Store; BirdEye; Podium; TrustPilot; Yotpo; or Google Play, were portrayed as holding the power to influence data-sharing willingness, contributing therefore to polarization between acceptable/unacceptable standards, high/low ratings, with an inherent loss to data-brokers data inflow and data-owners delivered value. The biased data inputs returned by these type of software applications is claimed to be a matter of managing data pollution, as labeled as garbage-in determining garbage out.

Participant 2, considered to be furthermore “….as in an arms race” especially against North American and Asian companies, with the GDPR being costly and a time-consuming limitation. The GDPR hinders the development of Artificial Intelligence (AI) of EU firms, obliged to restructure and fully comply in terms of their data collection, handling, and the storage infrastructure (including their destruction policies). Consequently, both participants perceive legislation as the biggest threat to the exploitation of BD. Yet, both companies assert Big Data as “need” for their survival, and a “stick” behind the necessity.

Triangulation (analytical) procedure

As indicated in section methodology, this study adopts a data triangulation and data-theory triangulation procedure to achieve a better picture of the factual and the real (Altricher et al. 1996). This occurs for a dyadic purpose. First, to validate primary data obtained from data-brokers and data-owners in the light of a “sense-giving” analytical procedure (data triangulation). Secondly, to extend our findings and deliver a higher contribution both to practitioners in the industry (senior/middle managers) and to academics whom may be also conducting research within the Big Data realm (data-theory triangulation). This instrumentalization of the triangulation procedure is compliant with the universal practices for cross-examination of heterogeneous sources, whether neoclassic or contemporary ones, pursued by multiple methodologists (Turner and Turner 2009; Lincoln and Guba 2000; Altricher et al. 1996).

Regarding data triangulation, we have revised the TA method qual outputs of the data-brokers with the quants applied to the firm’s consumers. Recalling the type 3 variables (behavior) of defensiveness, var003-2 (data-owner’s self-protection) referring to RQ3 and O2.1, reveals a low degree of endeavor towards personal data protection, with the respondents figures (0.74 and 0.64) indicating that a majority of them have never used the factor A (“Reading privacy settings/cookie terms”) and factor C (“Opt out (privacy policy unacceptance)”). For the validation of the prior we have used qual data from the data-broker’s side. Here, Participant 1 signifier is inconclusive but the Participant 2 corroborates the results above acknowledging a literal null opposition of their customers to the firm cookies policy:”

All cookie responses are monitored, and the current status is, that 100% of the visitors to the websites give consent to cookies and thereby to sharing personal data. There have thus far not been any complaints from customers of [Firm 2] in relation to its privacy policy.

Yet, we ought to emphasize the incongruence (action vs. perception), since the perception denotes a 70% of the respondents declaring in IQ# 11 to have a negative sentiment about their personal data utilization for commercial purposes. Furthermore, this is accentuated by the rather coherent answers to IQ#14 which confirm such negative sentiment as more than half (0.58) of the sampled individuals are willing to pay a short fee for maintain their privacy.

As to respondent’s age significance, the tabulation of self-protection by age per factor (A; C) exhibited respectively a very weak positive, and a very strong negative relation, respectively. However, results are inconsistent when crossed with the signifiers of the qual method, since Participant 1 expressed a dissimilar perspective, arguing on the existence of two main consumer profiles, which he calls category A—digital natives; and category B—digital immigrants, with a perceived different sentiment toward their digital footprints. The digital natives (category A), refer to the generation z and millennials, those who fit in the sample onto one age group (under 25 years old) meaning that they are born or brought up during the age of digital technology and this participant considers they accept more easily to share data with internet service providers. Whereas, the category B (digital immigrants) experience most reluctance in accepting to share data. Those are the ones the participant termed them as “unwise” since they are born or brought up before the widespread use of digital technologies, they are more resistant in sharing data, thus more aware of such implications, but nevertheless, accept generally the terms and conditions without reading or understanding them.

Furthermore, we have conducted a cross-observation of quants/qual data outputs from the aggregate of primary data alongside with the theoretical review conducted in “Literature review” section (data-theory triangulation). This procedure is though a hybrid one, accounting both a pure data triangulation and pure theoretical triangulation, in which the latter represents the use of multiple concepts, as sub-themes within the field, gathered from the theoretical revision to confirm its observation in the empirical testing phase (Turner and Turner 2009; Dzurec and Abraham 1993). This data-theoretical procedure is as postulated by other methodologists an “attempt to map out, or explain more fully, the richness and complexity of human behavior by studying it from more than one standpoint” (Cohen and Manion 1986, p. 254). Table 6 below deliver a triangulative output for further interpretation and so building a broader meaning system as to the perception versus action of both parts.

Table 6 Qual Data-theory (BDAR) triangulation

The results above highlight the mapping of 42 implications of digital footprints (for data-brokers and data-owners) as to the utilization of Big Data Analytics’ resources for commercial purposes. From the absolute frequency, 3 of them co-occur (f = 0.07) between the two components of data/theory (i.e. data breach; GDPR; NPD). The theoretical-driven mapping seems dominant over then data-driven mapping, with the first corresponding to 76.19% of all encountered implications.

Noteworthy is also that the impact being mostly felt on the data-owners’ side, since 64.15% of the typology from the total of 42 implications observed, whether positive or negative falls on data-owners’ side. However, 12 units (22.64%) yield an impact on both data-owners and brokers.

A positive relation (S1: strengths) is conveyed by 8 sub-themes with full implication on individuals, and a more incipient figure (0.125) on the data firms, solely affecting the latter as to the ability to improve products and NPD). The current negative one (T2: weaknesses) contain also 8 sub-themes all affecting solely the data-owners. These units came solely from the theoretical revision since the coding of the interviews to the participants (as data-broker’s representant) did not generate any quotations, as perceived evidence of this phenomenon.

As to the perception of the future opportunities brought by these digital footprints (T3: opportunities) 11 sub-themes emerged with implications mostly felt over the data-brokers side, as to ¾ of benefits. The risk of future losses (T4: Threats) uncovered 18 sub-themes or potential implications, here affecting dominantly the data-owner’s side (0.8333) (Table 7).

Table 7 Perception of digital footprint’s implication per societal domain

The large majority of implications (or sub-themes) fall within the economic landscape, which covers over one-half of the evidence collected from the empirical exercise (including the triangulation with the theoretical revision). Although, the legal and technological landscape are relevant domains of influence for Big Data analytics resources and digital footprints’ exploration, accounting for 0.225 and 0.2, respectively. Furthermore, considering these outputs, we have modeled the impaction index (II) of BD analytics and digital footprints, considering the relative impact (RI) of each theme (T1 to T4).

$$ {\text{II}}_{{i = {\text{Sum}}}} \left( {\text{current positive implications/overall positive implication}} \right) \, + \Delta (T_{n} ) $$
(6)
$${\text{II}}_{i \, = } \left( {{\text{RI}}_{{\left( {\text{data-brokers}} \right){ } }} U\;{\text{RI}}_{{\text{data-owners}}} } \right) \, + \, \Delta_{{t = {1}}} \left( {{\text{RI}}_{{\text{data-brokers}}} U\;{\text{RI}}_{{\text{data-owners}}} } \right)$$
(7)

Considering furthermore that the typologies of impact (S–W–O–T) are by definition identical, where as

$${\text{RI}}_{{i \, \left( {\text{data-brokers}} \right)}} \equiv {\text{RI}}_{{i{\text{ (data-owners}})}}$$
$${\text{RI}}_{{i \, \left( {\text{data-owners}} \right)}} = \sum \left( {\left( {\frac{S}{{{\text{OS}}}} - \frac{W}{{{\text{WT}}}}} \right) + \left( {\frac{S}{{{\text{OS}}}} - \frac{W}{{{\text{WT}}}}} \right)} \right)$$
(8)

While the variation function, considers the arithmetic combination of the impact function in present (t = 0) and in future (t = 1) here represented in the polynomial expression.

$$f\left( x \right) \, = {\text{ SWOT}}_{(t1)} {-}{\text{ SWOT}}_{(t0)}$$
(9)

The computing of these results of strengths (theme 1) over weaknesses revealed a shared benefit exploitation for both data-owners (1.33) and data-brokers (1.00). Conversely, the perception as to the present/future revealed dissimilar results. For the first, the opportunity over threats function is clearly a negative one (0.33) contrasting with the latter ones (1.57). The impaction index ratios (0,9011 and 1,57,143) denoted a better exploitation function of BD for the firm’s side than the individuals.

Discussion

We have deduced from the empirical testing presented in “Data analysis and discussion” section that the individual benefits of the digital exposure of personal data came at the expense of the legal and economic consequences of the consumer. On the legal landscape, this encompassed the unprotecting of legal rights (namely the ones consecrated on the GDPR described in “Literature review” section) and the abusive utilization of data by third parties. Thus, consumers experience the non-safeguarding of their rights and misuse of unauthorized data. On the economic landscape, consumers are being targeted, manipulated and deceived in current and future product-offerings, offer delimitation and price discrimination for the optimization of corporate profits.

Data-brokers have also benefited from BD analytics resources (BDAR), modifying and/or extending portfolios and refining digital positions. The major drawbacks of BD analytics, pinpointed at the interviews, is the creation of data pollution. We argue that the economic consequences of digital exposure came as the tail of the legal landscape, especially due to the debilities discussed in previous sections of the compliance supervisory system.

However, the participants signaled relevant information as to the future threats of BD, both for corporations and consumers. For instance, the overvaluing of rating systems and its inherent bias and/or deception as fake data. These constitute an example of what might be a faulty use of data fumes, which can in turn, manipulate the consumer and polarize the preferences of products/services in favor of large and dominant incumbents. This is a phenomenon which one participant pinpointed as a type of data pollution. The participant claimed also data pollution has spread to multiple quadrants of our society and is used with multiple purposes. He argued that the spiraling up of this phenomenon is yet unpredictable. Here, another participant stressed how unprepared organizations are nowadays to deal with informal data breach, which may even make more complex even more than current typology, flow and directionality of BD in circulation. Yet, also opportunities to data-brokers were clearly manifested, as to cooperation, networking and from these extract collaborative advantages. Also, the room for service optimization brought by tech-developments on the new digisense era of artificial intelligence, machine learning and IoT and their application to the industry (4th industrial revolution—or industry 4.0) with inherent benefits for broad scientific developments as well.

The perceptions and actions revealed though however a clear gap emphasized by the most mature age groups. Consumers denote some laissez-faire and a passive acceptance of non-agreed practices of data-brokers. Results from self-protection, as the enquired summarized in Table 4. Tabulation of Data-owners profile with self-protection endeavors, demonstrated a very low extent of proactive defensiveness (from the range of given options—from A to H). These options were incipiently used, and the most frequent being an automated one (a computer automatic check of malware). Thus, the generalized negative perception of Big Data among consumers appears disproportionate to the protective measures adopted by the same consumers to mitigate possible undesired effects of dataveillance and analytics practices. Data-owners denoted in general some apathy and resignation adopting mostly passive-protection measures, as the most significant being automatic scanning of malware.

Furthermore, the respondent’s open- comments conveyed in comments boxes revealed that consumers seem to rely faithfully on GDPR protection rather than self-protection, a phenomenon also acknowledged before by Baruh and Popescu (2017). Women respondents exhibited though the desire to alter the current context, revealing a higher pre-disposition for spending on small payments on internet services to become problem-free of commercial exploitation of personal data.

Finally, it shall be emphasized though that both collection methods and analytical procedures that allowed us to map a broad range of implications of BD, were discussed in number (or breadth) but not in the degree (or depth) of influence inflicted on the consumer or the company. Thus, we recommend other researchers in the field to explore the results of this project on the Danish market and replicate them to other Nordic countries and furthermore pursue new avenues for enlightening the extent of influence per theme and compare them through replication on multiple research angles.

Conclusions

The initial euphoria surrounding Big Data has clearly vanished. Two decades ago, the marvel of the digital markets and online platforms have attracted many firms to enhance their competitive positioning through the electronic channels plus to re-segment towards other consumer’s targets, subsequently reaping the economic benefits of such strategic decisions. Online businesses assisted a steady growth against brick-and-mortar shopping, while the commoditization of the web through the user’s daily usage has, with time, depreciated the technological benefits. Firms in most digital societies (as Denmark, here in examination), feel an accentuated market pressure for the attraction of attention of the consumer and for the efficient usage of their data. As referred by one of the participants in this study, it is a “…arm race…”. Big Data is a “need” and companies identified an element of “stick” behind this necessity.

The results of our investigation on two large technological firms on the Danish market are consistent with the revised literature. Both firms and consumers seem discontented as to the current state of the EU’s data protection policy. Consumers have turned their attention to the safeguarding of individual rights over personal (profiling) data and navigation/ consumption footprints’ data. The major discontent of data-owners can be traced to a couple of issues. Firstly the “Right To Erasure” or Right to Be Forgotten” (RTBF). Secondly, the (alleged) predatory practices for exploiting third parties’ data. Consumers assert to be hyper-targeted and price discriminated/steered. Firms claim that global competition is tight and policies are a competitive limitation for European firms, and especially restrictive for advancements in artificial intelligence.

The perception of personal data utilization (objective 1—O.1), is not accompanied by according actions (O1.1.) prevailing the hazard of laisser faire. For instance, the category “sentiment” within the perception category (var002-2—exposure willingness) does not come together with the defensiveness behavior (var003-2 self-protection) of data-owners to secure their own privacy. For instance, the tabulation of these categories (sentiment with defensiveness) determined that the sole significant mechanisms of self-protection is, in vast majority, a passive mechanism (i.e. automatic malware scanning). Noticeably, the unique action of self-protection of approximately ½ of the respondents is the regular monitoring of the browsing history. As to the data-brokers’ side (objectives 2 and 2.1.-O2; O2.1.), data revealed an equivalent gap of perception versus action. Yet, we argue that companies’ myopia (perception) or isomorphism (inaction) seems to be clouded by the competitive needs of their businesses.

The bottom-line issue here uncovered by such perception–action gaps is that the consumer’s expected degree of privacy is significantly higher than the action taken to ensure the first. Moreover, data-owners seem unable to secure their own rights (RTBF) per se, and moreover, are not endowed with technical monitoring tools or mandated with executive power to do so. Finally, the role of self-supervision seems a burden for the individual (even demonstrating a clear negative perception of privacy, relying faithfully on a better future as to the GDPR’s Compliance Supervisory Model.

Scholars in earlier literature have identified a mixture of major causes for the negative perception over Big Data analytics which can be summarized as follows: (i) insufficient regulation; and, (ii) debilities on the current legal framework (i.e. the GDPR). These causes are argued to be the root for the proliferation of deviating and unethical practices. Scholars point in special the finger to the articles 17 and 37 of the GDPR, asserted to have failed to safeguard the individual data protection rights and the competitiveness of European firms; but also, targeting the supervisory system conferred to the European Data Protection Supervisor (EDPS) (Baruh and Popescu 2017; Xue et al. 2016; De Mauro et al. 2015; Hannak et al. 2014; Thatcher 2014).

In this context, we do concur with Baruh and Popescu’s (2017) assertion that this problem is holistic, and is a matter of failure of the regulatory system. The current legal framework encountered in the GDPR from 2018 is entirely built on the assumption that “privacy” is an individual intangible good, which is therefore justified to be secured by the skills of the person/owner (individual capabilities) to the self-administration of the asset. These authors reminded us though that privacy is, not an individual, but a phenomenon of collective value for the whole social regime, which is supported by the article 5’s principles of lawfulness and fairness, and is furthermore, a fundamental right defined on the article 8(1) of the Charter of Fundamental Rights of the European Union. Hence, the definition of simply directional policies seem sparse, such as, in article 37, the imposition to appoint an internal Data Protection Officer (DPO), or nominate an external one, as policies lack of specifics as to spectrum of responsibilities, and elucidation of the seriousness of a faulty practice. It is our understanding that there is still categorically a road to be traveled in Denmark and in the EU regarding the regulatory affairs on personal data (including digital footprints).

We advocate a reform of the current GDPR’s Compliance Supervisory Model, to reduce the burden of self-protection and enlarge the operational span of control of the regulator, currently limited in resources to an advisory organ (the European Data Protection Board) and a supervisory organ (the EDPS).

In parallel, firms hold undoubtedly the responsibility of complying scrupulously with legal conventions, reversing reputational and economic risks of past practices while exploring data on a risk-free manner. This means, to avoid data breaches and administrate transparently the easy-to-use access to data. Data-brokers, are recommended to build IT-related dynamic capabilities (namely on data science and cybersecurity) in order to explore, on a positive manner the window of opportunity given by BD analytics. The rationale is less imitation and more innovation. Here is important to focus on the gauging of the consumer’s sentiment and attend to the willingness of particular sub-segments to experiment new ideas and concepts, such as, the sub-set of respondents in our sample, whom signaling their openness to paid for data protection services (presented on “Data-owners” section).

It is though imperative that the refashioning of the regulatory system can carry regulation and stronger inspection mechanisms on collective protection, cover current legal gaps on BD analytics’ practices (e.g. data fumes or Streisand effects) and make a more equitable redistribution of accountabilities between authorities, firms and individuals. Negatives practices of algorithmic refinement for the predictive modeling ought to be more tightly scrutinized and not left to the self-arbitration of industry practitioners. Finally, the meta-national supervision hub (EDPS) is suggested to expand the interaction with EU State-Members’ authorities and share a larger extranet of both joint technical expertise, but also, share the responsibilities of inspection.