1 Introduction

The concept of big data is attributed to Laney (2001), who described its main characteristics as size (volume), speed (velocity), and varying shape (variety). More recently, the definition has been revisited with the addition of a fourth characteristic, quality (veracity), meaning that the inclusion of external and heterogeneous data, however big, still raises questions about the accuracy and completeness of datasets (Chen et al. 2014). More generally, different levels of analytics combine to form a body of data that can provide useful pointers for making organizational decision making more effective. Yet its use for policy setting depends closely on the nature of the data, the statements and algorithms involved, and, of course, on how the data are interpreted.

In the educational field, big data tends to coincide with the analytics framework in all its variations (learning, institutional, academic) and specifications (retrospective, real-time, and predictive). A clear conceptual framework for defining analytics in their different operational contexts is offered by van Barneveld et al. (2012), who distinguish learning effectiveness from operational excellence: «with the latter referring to the metrics that provide evidence of how the training/learning organization is aligning with and meeting the goals of the broader organization. Learning analytics in the academic domain is instead focused on the learner, gathering data from course management and student information systems in order to manage student success, including early warning processes where a need for interventions may be warranted» (p. 6).

The introduction of various educational software systems has dramatically changed the process of educational delivery for both distance and on-campus modes of instruction. Teaching in the digital era is, in fact, characterized by a predominance of complexity: it is no longer perceived as part of the supply-demand production chain. At the same time, learning becomes an interconnected experience in which the learner alternates between the physical and the digital space, between production and consumption, and between formal and informal learning. The concept of MOOCs and MOOC delivery platforms also offers the potential to reshape important aspects of the policy-making environment in the educational field, with analytics playing a new role in policy choices.

This explains why the sets of data (learning analytics) provided by online learning tracking systems (including logs, quizzes, and social network analysis) are growing in popularity as a way of giving face validity to learning outcomes in the digital world. Learning analytics has been defined as

an educational application of web analytics aimed at learner profiling, a process of gathering and analyzing details of individual student interactions in online learning activities (Horizon Report 2016, p. 38).

Learning analytics aims at building better pedagogies, empowering active learning, targeting at-risk student populations, and assessing the factors affecting completion and student success.

Due to the profound implications of analytics, it becomes urgent to define a clear policy framework, starting from the definition of a shared research agenda. In a situation where limited financial and human resources are the common ground on which higher education institutions are trying to reform themselves, it is no surprise that analytics is becoming a core issue for policy makers and stakeholders seeking to scale up data for a more efficient education system. But what does efficient learning really mean? And how can data-driven decisions really help to cope with this increase in complexity and a fast-changing social environment?

In the next sections, we will present some arguments in support of data use.

2 The Transformative Power of Data

Approaching big data in education from the business perspective is the best way to understand its disruptive potential. It also helps explain why strong resistance is usually raised from different points of the system. Analytics has, in fact, been recognized as one of the most relevant and fastest-growing submarkets in education. Visiongain (2015) considers analytics the foremost reason why universities offer open MOOCs, since big data technologies allow them to collect and analyze valuable data derived from online education, to identify strengths and weaknesses, and to reach conclusions that will raise the overall value of the institution. This submarket is estimated to grow from 27.5 billion dollars in 2015 to 109.3 billion in 2020 (a CAGR of 36.1%). In line with this approach, the WICHE Cooperative for Educational Technologies (WCET), using data accumulated by the U.S. Department of Education, can definitively state that distance education is no longer an institutional accessory, highlighting the current state of the art in the distance education industry, which has grown from 1.6 million students in 2002 to 5.8 million in 2014.

In the McKinsey Quarterly article Are you ready for the era of ‘Big Data’? (Brown et al. 2011), it is clearly stated that big data will become a new type of corporate asset representing a key basis for competition. The McKinsey report also suggests how the potential of big data can be explored and exploited to create potentially disruptive business models by transforming processes, altering corporate ecosystems, and facilitating innovation. Although the McKinsey narrative is not expressly devoted to education, the framing of education as an economic activity is behind the logic of using big data for business intelligence in education, especially in such areas as outreach and advertising, enrollment, management, personnel recruitment, and fundraising (Clow 2013). Moreover, as has already happened in several human activities based on the productive paradigm, investments in education are increasingly connected to data-driven decision making, thus shifting the focus from student success and innovative pedagogies to institutional effectiveness and return on investment (ROI) strategies. In some cases, such as online education, this implies an almost direct correlation between the success of the learning experience and the political and social approval of financial investment in digital learning.

Following this economic framework, academic institutions are becoming aware of the potential of analytics and are starting to test technologies, data mining processes, visualization models, and dashboards. They are following a natural path that moves analytics from hindsight to foresight, from description to prediction, through a diagnosis of what to change in teaching methods and organizations. The next sections are organized around three main spheres (the systems of measurement, influence, and evidence) that together shape the special datascape for policy making in education (Fig. 1).

Fig. 1 Datascape in education policy making

The figure highlights the role of pedagogies, the space reserved for software development, and the types of actors involved in decision-making processes.

3 System of Measurement

Leveraging Analytics in Community Colleges is an interesting Educause guide (Educause Review 2015) providing a literature review, a list of definitions, and resources for community college leaders. It distinguishes different types of analytics related to different uses and purposes of data: academic analytics (data collected to support operational and financial decisions), learning analytics (feedback about teaching and learning performance), and predictive analytics, which aims at identifying trends to forecast the future on the academic, operational, and financial planes (Long and Siemens 2011; Ferguson 2012a). Feedback loops, policy indications, and operational knowledge are expected from data use; meanwhile, several case studies to draw inspiration from are already available at every level [see LAK Conference 2016]. The impact of analytics on social organization is very much related to the option chosen.

3.1 Description

A few European projects are piloting learning analytics in the online education context. This is the case of the Emma Project (http://www.europeanproject.eu), which is experimenting with data at the user and platform levels to raise participants' awareness of their learning activities in MOOCs as well as to provide MOOC instructors with feedback about their course design. This is done from a dual perspective: real-time and retrospective analytics. Such an approach, although interesting in helping instructional designers create online courses with higher retention and completion rates, focuses mainly on describing learning behavior, with little capacity to distinguish prospective students and/or offer adaptive content. From the learner's perspective, however, the possibility of comparing oneself with the performance of the online classroom is valuable, even though it can exert a form of pressure toward conformity. Investigating the impact of data on teachers and students means, however, identifying not only the strengths and weaknesses of analytics tools and practices, but also their role in social control and/or policy effectiveness.
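To make the descriptive, retrospective layer concrete, the sketch below computes a few per-learner indicators from a hypothetical activity log and compares each learner with the class average, roughly the kind of feedback a MOOC dashboard might surface to participants and instructors. The log format, event names, and indicators are illustrative assumptions only; they do not reflect the Emma platform's actual data model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical activity log: (learner_id, event_type, value).
# Event types and values are invented for illustration.
events = [
    ("alice", "video_viewed", 1), ("alice", "quiz_score", 0.9),
    ("bob", "video_viewed", 1), ("bob", "video_viewed", 1),
    ("bob", "quiz_score", 0.4), ("carla", "quiz_score", 0.7),
]

# Aggregate raw events into per-learner descriptive indicators.
videos = defaultdict(int)
quiz_scores = defaultdict(list)
for learner, event, value in events:
    if event == "video_viewed":
        videos[learner] += value
    elif event == "quiz_score":
        quiz_scores[learner].append(value)

learners = sorted(set(videos) | set(quiz_scores))
class_quiz_mean = mean(s for scores in quiz_scores.values() for s in scores)

# Retrospective report: each learner compared with the class average.
for learner in learners:
    avg = mean(quiz_scores[learner]) if quiz_scores[learner] else 0.0
    print(f"{learner}: {videos[learner]} videos, quiz avg {avg:.2f} "
          f"(class avg {class_quiz_mean:.2f})")
```

Even in this minimal form, the output is purely descriptive: it summarizes what has already happened, without predicting who is likely to drop out or adapting the content offered.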

3.2 Prediction/Prescription

The Predictive Analytics Reporting (PAR) framework appears to be one of the largest-scale projects so far. PAR is a framework (PARframework.org) adopted by 35 US academic institutions and created with the participation of 350 campuses, with millions of student performance records already processed. It uses descriptive, inferential, and predictive analyses to create benchmarks and institutional predictive models, and to map and measure student interventions that should have a direct positive impact on behaviors correlated with success. Using data mining on a federated dataset of millions of de-identified student records, the framework should be able to identify the variables affecting student achievement and performance. Acting as a nonprofit, multi-institutional collaborative venture, the PAR framework focuses on leveraging common data definitions for core measures across institutions and predictive analytics in the service of student success, seeking patterns of student loss and success. However, Ellen Wagner, Chief Strategy Officer for the PAR Framework, in her presentation at Online Educa Berlin 2015, introduced the concept of data as a meme, focusing on the dangers of «naïve or nefarious uses of data to restrict access or to punish» and opening the door to further reflection on the hidden curriculum effect.
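As a rough illustration of what an institutional predictive model of this kind involves, the sketch below fits a logistic regression to a handful of made-up, de-identified student records and flags learners whose predicted probability of completion falls below a threshold. The feature names, the threshold, and the choice of model are assumptions for illustration only; they are not the PAR framework's actual variables or method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up, de-identified records: [credits_attempted, gpa, logins_per_week].
# The feature set is illustrative, not PAR's common data definitions.
X = np.array([
    [12, 3.4, 9], [15, 2.1, 2], [9, 3.8, 12], [12, 1.9, 1],
    [15, 3.0, 6], [6, 2.4, 3], [12, 3.6, 8], [9, 2.0, 2],
])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = completed the term, 0 = did not

# Fit a simple predictive model on historical outcomes.
model = LogisticRegression().fit(X, y)

# Score current students and flag those below a completion-probability threshold.
current = np.array([[12, 2.2, 2], [15, 3.5, 10]])
risk_threshold = 0.5  # arbitrary cut-off chosen for this sketch
for record, p_complete in zip(current, model.predict_proba(current)[:, 1]):
    flag = "at risk" if p_complete < risk_threshold else "on track"
    print(record, f"P(complete)={p_complete:.2f} -> {flag}")
```

Even a toy model like this makes the policy stakes visible: the variables and thresholds chosen determine who gets flagged, which is precisely the concern behind Wagner's warning about naïve or nefarious uses of data.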

3.3 Accountability

The Italian research assessment exercise, known as the Research Quality Evaluation (VQR), is an assessment system in which data are used mainly for system accountability by stakeholders in order to reorganize the academic world, reducing costs, increasing productivity, and allocating human resources and funds more efficiently. Since the VQR captures only one facet of academic work, which comprises a complex set of interconnected activities, from teaching to community networking, from project work to expert support in decision-making processes, as well as institutional organization, dissemination, and social engagement, it is not surprising that academic staff have often refused to legitimize the VQR exercise as representing only a ‘single dimension’ of academic work. System accountability is, in fact, strongly rooted in the datascape, with a substantial impact at the organizational and academic levels. Yet the system, because of its inherent push toward a flat standardization of metrics and productivity assessment, is producing ambiguous and sometimes unacceptable results, creating struggles and discontent among academic staff. Although the Italian Minister of Education has recently created a working group on big data in order to support system decision making, a more complex data-driven profile of academic work is still far from being realized and, above all, from being socially recognized. However, it seems that the more we call for a multimodal approach, the more space there is for the application of Educational Data Mining (EDM), focused on prediction, clustering, data mining, and the distillation of data for human judgment [see LACE Project 2014]. A holistic approach to the analytics field seems, then, the only way to offer a social vision of its use, putting the development of the human being (learners, teachers, researchers, etc.) at the very core of policy making.

4 Culture of Evidence

The McKinsey report (Manyika et al. 2013) considers the main challenge of this new century to be leveraging the transformative power of big data to create transparency, enable experimentation, and discover new needs. From the institutional policy perspective, big data helps to expose variability and improve performance by segmenting populations, to customize actions and replace or support human decision making with automated algorithms, and, finally, to support innovation in new business models, products, and services. In other words, big data and analytics require a paradigm shift and the foundation of a new culture of data, available at every level of societal interest. Only intelligent management, accompanied by organizational capacity and institutional commitment, can offer such a possibility, helping institutions to prevent the biases associated with the emergence of a new field.

4.1 Short Circuits

The National Pupil Database, established in 2002 by the UK government, is a central policy instrument of educational governance used to translate massive data into actionable policy indications as well as into school performance tables accessible to parents and the media. In Italy, taking PISA test results into account, the education policy program known as the National Plan for the Digital School (PNSD) has activated a plethora of actors, agencies, platforms, and networks acting as a learning ecosystem in which schools no longer hold the monopoly on teaching but receive support from different education providers. Activated by evidence-based reasoning (i.e., digital skills in education and performance in STEM domains), this learning ecosystem aims at innovating pedagogies, engaging teachers in reinventing processes, and introducing new forms of assessment.

Data governance is usually, or at least should be, oriented toward understanding why things happen, what the current trends and their impacts are, and how to orchestrate the appropriate answers to meet the organization's goals. In other words, it is positioned

«to short-circuit existing educational data practices, enabling data and feedback to flow synchronously and recursively within the pedagogic apparatus of the classroom itself» (Norris and Baer 2013).

It requires an efficient and complex combination of data stewardship, reporting, query, and analytics tools. Only this combination can help policy makers monitor events and take strategic decisions in both the short and the long term.

4.2 Collateral Risks

Indications about how students are performing in comparative terms could, more or less implicitly, suggest that some students are not good enough to keep studying, or to deserve our financial investment, and may thus reproduce social inequalities. Scholars have called this the hidden curriculum effect (Edwards 2015).

As researchers, we tend not to take seriously enough the social implications of software in education and its ability to shape our lives. Yet the discussion about the verifiability of the hidden curriculum effect needs to be at the very heart of academic discussion and political debate in order to evaluate both the evidence (the representation of issues by data) and the results (outputs and outcomes).

As interest in evidence-based policy making increases and online education gains momentum, it is imperative to pose the issue of technology embedded in education as an urgent research question, since we cannot consider computer technologies simply as a tool by which learning is delivered (Edwards 2015). Hence, a genuine culture of evidence should never be disentangled from a deep culture of social inquiry.

4.3 Methods

Experts argue that the most valuable contribution of analytics is that it can help managers distinguish causation from mere correlation, thus reducing the variability of outcomes while improving financial and institutional performance. The risk of socio- and techno-determinism is, instead, what the social sciences really want to avoid in order to provide complex, multimodal explanations. It is for this reason that the learning analytics community needs to build stronger connections with the learning sciences, to develop methods for working with a wider range of datasets, and to avoid simplistic data interpretation and misuse. Ferguson (2012) sees in this link a way to ensure the optimization of learning environments under the guidance of a clear set of ethical guidelines.
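A toy simulation, using entirely invented numbers rather than real student data, can illustrate why the correlation/causation distinction matters for data-driven decisions: if better-prepared students both use an online tool more often and score higher anyway, the raw correlation between tool use and scores overstates the tool's effect, and only a comparison within preparation levels (or a proper experiment) reveals this.

```python
import random
from statistics import mean

random.seed(0)

# Invented data: prior preparation drives both tool usage and final score.
students = []
for _ in range(1000):
    prepared = random.random() < 0.5                       # hidden confounder
    uses_tool = random.random() < (0.8 if prepared else 0.2)
    score = 70 + (15 if prepared else 0) + random.gauss(0, 5)  # tool adds nothing
    students.append((prepared, uses_tool, score))

def avg_score(rows):
    return mean(s for _, _, s in rows)

users = [r for r in students if r[1]]
nonusers = [r for r in students if not r[1]]
print("Naive gap (users - nonusers):",
      round(avg_score(users) - avg_score(nonusers), 1))

# Compare within preparation strata: the apparent effect largely vanishes.
for prepared in (True, False):
    u = [r for r in users if r[0] == prepared]
    n = [r for r in nonusers if r[0] == prepared]
    print(f"prepared={prepared}: gap =",
          round(avg_score(u) - avg_score(n), 1))
```

The naive comparison shows a sizable gap in favor of tool users, while the stratified comparison shows almost none, because in this simulation preparation, not the tool, drives the scores. This is the kind of multimodal scrutiny the learning sciences can bring to analytics.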

However, to really exploit the data universe and optimize methods and techniques, the social sciences need dedicated researchers with deep analytical skills and a data-driven mindset, but also scientists and practitioners able to transform data-based information into actionable knowledge. This implies creating a genuine culture of evidence and embedding it in institutional and organizational practice. It also implies the need to explore and assess the qualitative value of quantitative data.

5 System of Influence

Even though the what and the how of analytics have been sufficiently explored in the specialized literature, the who still remains a question mark. Who is in charge of translating data into policy, with what kind of expertise, and with what goals in mind, is probably a crucial research question that can no longer be avoided.

5.1 Hybrid Actors

The British Education Endowment Foundation (EEF) is one of the What Works centres, acting, with government legitimation, to share research with local decision makers. The EEF provides independent and accessible teaching and learning toolkits summarizing educational research from the UK and around the world. It also participates in the Alliance for Useful Evidence, an open and wide-ranging network of individuals and organizations interested in promoting the use of evidence in decision making across social policy. This case works quite well as an example of what Ben Williamson, in New governing experts in education: Policy labs, self-learning software, and transactional pedagogies (Williamson 2014), sees as the emergence of a hybrid actor, somewhere between think tank, R&D lab, and social enterprise, that produces material able to transform education by making it «problematic, thinkable, intelligible, and hence practicable in new ways» (p. 2). Such material is in large part derived from data-driven technologies to support forms of self-regulated public policy in which learners are considered a calculable governing resource. What is evident is the shift from the formal organs of government toward a larger network of actors playing their roles on different bases (i.e., commercial and nongovernmental) (Lynch 2015).

5.2 Software Space

The Pearson Learning Curve is an example of a project that offers country indexes, country profiles, and comparable datasets and visualizations in education by time and output. It is a project explicitly devoted «to help influence education policy and practices, at local, regional and national levels» (Williamson 2016a).

Clearly, from different points of the social system, there is a demand to expand the capacity of organizations to make sense of complexity. This goal, however, is strongly related to the creation of expert software systems for governing education. On this point, Lynch (2015), in his book The Hidden Role of Software in Education, explored how a new kind of ‘software space’ (code, algorithms, datasets) is joining the ‘political space’ of educational governance and influencing the ‘practice space’ of the classroom. In that sense, the ‘software space’ plays an agency role within a wide network of actors.

5.3 Transactional Pedagogies

Williamson (2016b) suggests that the role of data and software goes far beyond offering guidance to stakeholders throughout the decision-making process. What he finds particularly worrying is, in fact, the translation of problems, ideas, and practices into “inscription” devices such as visualizations, infographics, and images, all of which contribute to generating a representation of a «governable education system» (p. 11). His conclusions highlight the emergence of transactional pedagogies and transactional policies, both data-driven, as the ideal progression of ‘knowing capitalism.’

6 Are Big Data Empirical Evidence Only? Some Final Concerns

The aforementioned Educause report raised warnings about student profiling techniques, the affordability and misuse of data, and privacy issues, as well as business-like practices, data ownership, and its exploitation. This criticism is only a first step toward full social awareness of the ontological implications of big data concerning, for example, the pressure toward conformity, dataveillance, and the increased power of microstructures and macrostructures through data governance. The scientific debate, however, comprises a plurality of voices.

In his seminal book The Philosophy of Software: Code and Mediation in the Digital Age (Berry 2011), Berry defines the datascape as the computational narrative of the subject represented by all the data concerning their activity streams; a datascape that can very deeply influence our present and future lives. Boyd and Crawford (2012) claim that big data are represented as a higher form of intelligence able to generate insights with «the aura of truth, objectivity, and accuracy» (p. 663), but they are not, since numbers do not speak for themselves. In the same direction, Kirschner criticized the tendency «to look at data we have and not at data we need» when making inferences [LAK 2016 conference].

Concerns were also raised during a debate at Online Educa Berlin (OEB 2015). Mayer-Schönberger, Professor of Internet Governance and Regulation at the Oxford Internet Institute, linked the use of data to human progress to demonstrate how data have been, and will increasingly be, necessary. Similarly, Darrell West, author of Digital Schools: How Technology Can Transform Education (West 2013), considers the introduction of analytics at every level not only useful but also necessary, since «schools face a situation where they need to improve the overall accountability of their operations» (p. 9). For G. Siemens, one of the inventors of MOOCs, analytics is a cognitive process that makes data more manageable, enabling us to make sense of the world.

Even though one could try to balance pros and cons, the use of big data and analytics in education implies a dimension of social control never before experienced with such efficiency and pervasiveness. From the perspective of science and technology studies (STS), governance systems based on digital technologies must be considered policy instruments for social control, partial or fictitious representations of the reality one wants to change. The relevant implication is that software and data companies and agencies are empowered to become dominant and stable partners in governing education.