With improvements in and availability of educational technologies, demand has increased for technology integration to enhance and improve the instruction educators provide (Davies and West 2014), including the potential use of learning analytics (USDOE 2012; Woolf 2010). Learning analytics, sometimes called academic analytics (Campbell and Oblinger 2007), is a relatively recent application of an older data analytics discussion (see Skinner 1968; Tyler 1949); it is described as “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs” (Siemens 2011, p. 1). The principal difference between earlier calls for data use and current mandates is the massive amount of data available through technology-enabled systems. Interestingly, a key notion conveyed by those who now mandate and advocate that instruction be enhanced through the use of data is the awareness that educational data mining and learning analytics have the potential to improve instruction and learning but are not currently reaching that potential (Woolf 2010).

Technology use is becoming entrenched in education. But with the exception of limited use of assessment data, most of the current technology-enabled instructional systems use very little data to adapt and enhance instruction in the way Oblinger (2012) anticipates (see also Woolf 2010). The ability of instructional designers and researchers to obtain and work with large data sets is still in a nascent state; in practice, most educational data analysis is conducted as separate educational data mining research, not real-time application of learning analytics (Chung 2014; Mayer 2009). Although much progress has been made and considerable research conducted, the development of instructional systems that make full use of learning analytics remains an untapped prospect for instructional designers developing technology-enabled instructional systems (Woolf 2010). Part of the problem seems to be that instructional designers do not design for data use.

A common criticism of early computer-assisted instruction (CAI) was, and to some extent still is, that these instructional systems are simply technology-facilitated content delivery systems based on passive learning models of instruction (Chung 2014; Nicholson 2007; Robson and McElroy 2008). In essence, many of these systems are simply electronic page turners or menu-driven audio-visual players (Fairweather and Gibbons 2000). Modern variations of CAI usually continue with a didactic approach to instruction, with the presentation of content intended to inform the learner (Robson 2013). Instructional technology in these types of systems is seen as an affordable, flexible, and efficient way to present information. From this notion, a whole class of easy-to-use tools (e.g., Camtasia and Authorware) has been created based on the belief that instructors could easily develop a CAI version of their course.

Research into the design of these types of information-providing systems has often focused on user control of the content being presented as well as the intuitiveness of the human–computer interface. Certainly integrating technology into education still involves helping students gain quick access to information of various types (Davies and West 2014). These types of computer-based systems improve the efficiency of instruction but often do little to improve the efficacy of learning (Cuban et al. 2001; Kleiman 2004). Gibbons (2014) suggested that a lack of learning facilitated by these instructional designs may be due in part to the natural tendency of many designers to focus on the surface layers of instruction (i.e., the content and control layers), while failing to adequately design the internal (less visible) aspects (e.g., the data management layer).

Some designers have attempted to enhance the content delivery approach to CAI by adding assessment components to an instructional system. Testing components often take the form of practice questions intended as self-tests for students and summative assessments of the course learning objectives. Many instructional systems track assessment results to indicate students’ progress in completing the course but provide little actionable information to students or teachers. Intelligent tutoring systems grounded in competency-based instruction models attempt to do more than simply report assessment results (Graesser et al. 2012). But simple reporting of assessment results is still the most common consideration designers make as they create the data management layer for an instructional system. While this is a good first step in starting an instructional conversation that may improve the helpfulness of the instruction being designed, much more could be done (Gibbons et al. 2008). The personalization and adaptability of instruction depend on the system’s ability to obtain data (including and in addition to basic assessment data). These relevant data must be analyzed; the results must then be reported as actionable information in near real-time. At this level, more needs to be done in the design of data management so that technology-enabled instructional systems can reach the potential of data-driven instruction.

Basic learning analytics is becoming the standard for many online learning systems and will continue to become more important. While data analytics has many purposes (e.g., formal institutional and educational research, including traditional and blended learning environments), the focus of this paper is the design of CAI and other technology-enabled instructional systems that use data to adapt and improve instruction. Specifically, we outline aspects of instructional design within the data management layer that should be considered when integrating learning analytics into a technology-enabled learning system.

A Framework for Learning Analytics

Learning analytics is a relatively new application of the broader data analytics field, involving and extending the integration of educational data mining into general educational practices (Baker and Siemens 2014; Baker and Yacef 2009; ElAtia et al. 2012; Picciano 2012). Several educational data mining frameworks have been proposed but are still being refined (Elias 2011; Greller and Drachsler 2012). Campbell and Oblinger (2007) described academic analytics (an educational data mining framework) in five steps: capture, report, predict, act, and refine. One proposed modification to this framework adds a separate data selection (i.e., data mining) step prior to the data capture process to provide guidance for it (Dron and Anderson 2009). Another suggestion is to integrate data aggregation, organization, and access considerations into the capture, reporting, and acting processes (Hendricks et al. 2008). For the purposes of this paper we use a modified version of Campbell and Oblinger’s (2007) framework, which includes five steps: data selection, data capture and storage, data visualization and reporting, data use, and system refinement. This adaptation makes the framework more suitable for designing learning analytics into the data management layer. Data selection and data capture and storage are treated extensively, as much of the data-management design occurs in these steps. Data visualization, reporting, and use are discussed in more general terms, with the eventual refinement introduced as an evaluative feedback loop. These steps illustrate the design processes necessary for instructional designers to integrate learning analytics more effectively into an instructional system.

Data Selection

The first step in integrating data use into an instructional design using this framework requires a decision regarding which data should be collected. Data are required for learning analytics, but not all data provide equal benefits. A “digital ocean of data” exists (DiCerbo and Behrens 2012), largely because technology-enabled systems can capture “data exhaust” (digital traces from learners’ online activities). But the call to use data exhaust based simply on its availability may be ill-informed (Watters 2012). Educators appear to be drowning in that digital ocean, and much of the data-exhaust solution seems to be salt water (superfluous, irrelevant data); in many instances the use of data exhaust (e.g., click-stream data) has not delivered promised improvements to instruction (Thille et al. 2014).

Purpose and Questions

Rather than settling for the collection of “available and affordable data,” instructional designers should plan to capture data that will provide actionable information (Behrens and DiCerbo 2014). This requires identifying the purpose for data collection and the questions to be answered with these data (Papamitsiou and Economides 2014). If the purpose involves identifying patterns of use or non-use and developing prediction models for academic success or behavioral risk, then broad between-learner data sets are needed. If the educational need involves determining how to adapt instruction for or provide remediation to an individual student, then deep within-learner information is required. Multiple purposes may be identified for data capture within a single instructional system. The dimensionality of the data to be obtained depends on instructional purposes and goals (Thille et al. 2014), and selecting appropriate data that can be used to answer specific questions is more likely to produce actionable information. Designs that do not initially include the selection and capture of relevant data may require a costly retrofit; in some cases a complete redesign of the instructional system may be required.
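
To make these selection decisions concrete, a designer might record each planned data element alongside the question it is intended to answer and the breadth (between-learner or within-learner) it requires. The sketch below illustrates one way such a planning artifact could be expressed in code; the structure and example entries are our own illustration, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class DataRequirement:
    """One planned data element, tied to the instructional question it serves."""
    question: str   # the question this data element should help answer
    element: str    # what will be captured
    scope: str      # "between-learner" or "within-learner"
    source: str     # where the data will come from (LMS log, assessment engine, ...)

# Hypothetical planning entries for a single course design.
requirements = [
    DataRequirement(
        question="Which students are at risk of not completing the course?",
        element="weekly login counts and assignment submission times",
        scope="between-learner",
        source="LMS activity log",
    ),
    DataRequirement(
        question="Where does an individual learner struggle with threshold concepts?",
        element="item-level quiz responses tagged by concept",
        scope="within-learner",
        source="assessment engine",
    ),
]

for r in requirements:
    print(f"[{r.scope}] {r.element} <- {r.question}")
```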

Theory and Context

Two considerations are necessary when planning data collection: (a) pedagogical and instructional design theory, and (b) contextualization of data and appropriate “grain size.” Addressing the first concern, Reigeluth (1999) suggested that the reason for capturing specific data in an instructional system should be based on relevant theory. He advocated that research-based pedagogical and instructional design theories be used for explicit guidance on how to better help people learn and develop. With a theory in mind, designers should work on identifying appropriate data that would allow them to test those theories and inform action.

Speaking to “grain size” and the types of data to collect, Thille et al. (2014) suggested that the value of educational data obtained from an instructional system is most often determined not by the amount of data obtained about a given learner but by the contextual information and semantic meaning added to the data captured. Additionally, Chung and Kerr (2012) argued for using the “finest usable grain size,” meaning “a data element that has a clear definition associated with it” (p. 3). For example, a click event in a learning environment is by itself meaningless; but when that same piece of data is contextualized, it becomes informative. The click data may be useful if we know the user clicked a button at a specific moment in the learning process and that the button provided the learner with access to a screen that gives a definition for a term. For this reason, Chung and Kerr (2012) recommended that the data collected from a learning environment should be as detailed as possible so it can be linked with other information in the system.
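
The following sketch illustrates the difference between a bare interaction record and one captured at the finest usable grain size; the field names and values are hypothetical and would need to match the contextual metadata actually available in a given system.

```python
from datetime import datetime, timezone

# A bare click event: technically captured, but uninterpretable on its own.
raw_click = {"event": "click", "x": 412, "y": 230}

# The same event contextualized: who clicked, when, on what control,
# and what that control means within the instruction.
contextualized_click = {
    "event": "click",
    "learner_id": "student_042",                       # hypothetical identifier
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "course": "BIO-101",
    "activity": "module_3_reading",
    "control": "definition_button",
    "target_term": "osmosis",                          # term whose definition was opened
}

def is_usable(event: dict) -> bool:
    """Finest *usable* grain size: the element has a clear, linkable definition."""
    required = {"learner_id", "timestamp", "activity", "control"}
    return required.issubset(event)

print(is_usable(raw_click))             # False
print(is_usable(contextualized_click))  # True
```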

Outcome and Process

Chung (2014) identified two types of student interaction measures that may be of interest in a design: (a) outcome measures and (b) process measures. Capturing and reporting basic outcome measures is common practice in most instructional systems that claim learning analytics capabilities (Pardo 2013). However, capturing process-level data and applying it in real time is rather uncommon. Chung (2014) lamented that most learning management systems fail to capture data of interest because “they are designed to host content and not designed to measure learners’ interaction with that content” (p. 5). A properly designed instructional system would be planned to capture the data relevant to the learning analytics needed (both assessment and process-level data).
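
As a simple illustration, the records below contrast an outcome measure with a process-level trace for the same (hypothetical) learner; an adaptive system can act on the trace while instruction is still underway, not only after the assessment is scored.

```python
# Outcome measure: what the learner achieved.
outcome_record = {
    "learner_id": "student_042",
    "assessment": "module_3_quiz",
    "score": 0.80,
    "mastery": True,
}

# Process measures: how the learner got there, as a time-ordered trace.
process_trace = [
    {"learner_id": "student_042", "t": "2024-03-01T10:02:11Z", "action": "video_play",   "object": "osmosis_intro"},
    {"learner_id": "student_042", "t": "2024-03-01T10:05:43Z", "action": "video_seek",   "object": "osmosis_intro"},
    {"learner_id": "student_042", "t": "2024-03-01T10:09:20Z", "action": "hint_request", "object": "practice_item_2"},
    {"learner_id": "student_042", "t": "2024-03-01T10:12:05Z", "action": "item_submit",  "object": "practice_item_2"},
]

# A simple process-level indicator an adaptive system could act on in real time.
hints_used = sum(1 for e in process_trace if e["action"] == "hint_request")
print(f"Hints requested before submission: {hints_used}")
```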

In many ways the data selection step is an educational data mining activity. The creation of an initial theory-based prediction model is required, one that can be tested and refined over time. The process is likely to be iterative, not unlike design-based research or rapid prototyping, but a starting point is needed. In addition to prediction models, assessment tools must be carefully designed and created to measure core skills and threshold concepts relevant to the instructional objectives of the course (Meyer et al. 2006). Too often the assessments we use are not particularly suitable for the purposes of learning analytics (Cizek 2010; Keefe 2007; Marzano 2009). The success of an instructional design endeavor will depend a great deal on the pedagogic vision of the designer (Dron and Anderson 2011), inevitably constrained by the availability and practicality of obtaining requisite data (Behrens and DiCerbo 2014; Chung 2014).
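
As an illustration of such a starting point, the sketch below fits a simple logistic regression to three theory-motivated engagement features and scores a new learner. It assumes the NumPy and scikit-learn libraries; the features, coefficients, and data are synthetic and serve only to show the shape of an initial, testable prediction model, not a validated one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical engagement features chosen from theory, not from whatever the
# platform happens to log: weekly logins, proportion of practice items tried,
# and average days late on submissions.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.poisson(4, n),        # weekly_logins
    rng.uniform(0, 1, n),     # practice_completion
    rng.exponential(1.5, n),  # avg_days_late
])
# Synthetic outcome purely for illustration: did the student pass the unit?
logit = 0.4 * X[:, 0] + 2.0 * X[:, 1] - 0.8 * X[:, 2] - 1.0
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, y)

# Score a new learner; a low probability could trigger a remediation flag
# well before the summative assessment.
new_learner = np.array([[1, 0.2, 3.0]])
print(f"Predicted probability of passing: {model.predict_proba(new_learner)[0, 1]:.2f}")
```

Such a model would then be tested against actual course outcomes and refined over successive iterations of the design.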

Data Capture and Storage

Once design decisions about requisite data have been made, the second step in this framework is the capture phase of the design process, which requires decisions about logging and linking relevant data so users can access them efficiently and use them effectively. In many instances, the data-producing environment will be a single learning management system (LMS). Unfortunately, for a variety of reasons, most commercially available LMSs are used by educators as a means to monitor students’ completion of instructional activities or as an electronic resource repository, rather than as a medium to facilitate and improve student learning through data use (Chung 2014). Designers and educators typically settle for the existing data capture capabilities and functions of the technology platform (LMS) rather than attempting to capture and utilize essential data relevant to the pedagogical purposes of the instruction (Salkind 2010).

Several challenges must be addressed in the design of the data management layer of an instructional system if learning analytics is to be employed. A properly designed instructional system that makes full use of learning analytics will likely require data access from a variety of sources within and outside a specific LMS. Designers need to be mindful of where those data can be obtained, where they will be stored, how disparate data types will be structured, and how data will be made accessible within the system.

Challenge 1: Obtaining Interoperability Access

Substantial amounts of data exist within an instructional system, but much of the learning process for a student occurs outside the purview of the LMS. Rarely will a single LMS contain all the information about a student needed by an instructional system. The design process needs to be concerned with where pertinent student information exists and how to access these data. Recent efforts to help solve this issue include the Learning Tools Interoperability (LTI) standards developed by IMS Global, which allow students to have single sign-on access to external tools from their LMS. This allows a student to log into an LMS and gain access to a number of other systems. If the data do not exist within the system, the designer must determine how such data may be obtained.
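
As a minimal illustration of how interoperability standards support data linking, the sketch below shows an external tool reading learner and course identifiers from an LTI 1.3 launch payload. It assumes the launch token has already been verified against the platform’s public keys (verification is omitted here); the claim URIs follow the IMS LTI 1.3 specification, while the payload values are invented.

```python
# Claims delivered by a (hypothetical) verified LTI 1.3 launch.
launch_claims = {
    "sub": "lms-user-8842",  # the platform's stable identifier for the learner
    "https://purl.imsglobal.org/spec/lti/claim/context": {
        "id": "bio-101-spring",
        "title": "Introduction to Biology",
    },
    "https://purl.imsglobal.org/spec/lti/claim/resource_link": {
        "id": "module-3-quiz",
    },
}

# The external tool keys its own records to the LMS user and course,
# so data generated outside the LMS can later be joined back to it.
learner_key = launch_claims["sub"]
course_key = launch_claims["https://purl.imsglobal.org/spec/lti/claim/context"]["id"]
print(f"Storing tool activity under learner={learner_key}, course={course_key}")
```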

Challenge 2: Capturing Activity-Trace and Process-Level Data

Many systems capture potentially useful data but fail to store or utilize them effectively. Most LMSs can track basic activity-trace data (e.g., page views, resource access, and task submission information). Often, however, these data are only temporarily stored (e.g., for the duration of a user’s session). When these data are saved by the LMS, they are generally stored in the system’s own databases, which can be expansive. Often stakeholders cannot access these raw data unless specific permissions are granted. When these data are accessible, they are often less than useful because the format, linking information, and contextual metadata are not specified.

In addition, LMSs typically do not track process-level data on how students interact with the instructional aspects of the course, including how learners interact with videos, quizzes, or content, or how they go about solving an educational task. Work is being done to facilitate the capture of these types of data. For example, the Experience API (xAPI) is an interoperability standard developed by Advanced Distributed Learning (ADL) that specifies how to structure and store analytics data. It uses an actor-verb-object structure for analytics statements. These data can be sent to a learning record store (LRS), a database that allows users to store their analytics data.
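
A minimal sketch of such a statement is shown below, posted to a hypothetical LRS endpoint using the Python requests library; the actor, object identifiers, credentials, and URL are placeholders.

```python
import requests  # assumed available; any HTTP client would do

# An xAPI-style statement: actor, verb, object (plus an optional result).
statement = {
    "actor": {
        "objectType": "Agent",
        "name": "Student 042",
        "mbox": "mailto:student042@example.edu",
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": {"en-US": "completed"},
    },
    "object": {
        "id": "https://example.edu/courses/bio-101/module-3/quiz",
        "definition": {"name": {"en-US": "Module 3 Quiz"}},
    },
    "result": {"score": {"scaled": 0.8}, "success": True},
}

# Hypothetical LRS endpoint and credentials.
LRS_URL = "https://lrs.example.edu/xapi/statements"
response = requests.post(
    LRS_URL,
    json=statement,
    auth=("lrs_key", "lrs_secret"),
    headers={"X-Experience-API-Version": "1.0.3"},
)
print(response.status_code)
```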

Challenge 3: Storing Data for Real-Time Use

Developing real-time reporting and adaptive instruction requires real-time access to relevant data. Most LMSs do not give users real-time access to student data, but rather provide some sort of application programming interface (API) to extract batch data. Many of these APIs have rate limits that make pulling and analyzing data in real time less feasible. For example, at a university with 60,000 students, a week may be required to pull all of the necessary data using an API; the data provided are thus a week old. Even on a small scale, using APIs for real-time data analysis and reporting can still require minutes, which is longer than students are willing to wait. Batch processing works well for educational data mining research when developing a predictive model, but not for the real-time access needed when personalizing instruction.
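
A back-of-envelope calculation illustrates the scale of the problem. The figures below are assumptions chosen purely for illustration, not the limits of any particular LMS, but they show how quickly batch extraction stops being “real time.”

```python
# Illustrative only: a hypothetical paginated API with a modest rate limit.
students = 60_000
requests_per_student = 50      # assumed paginated activity, grade, and submission calls
rate_limit_per_second = 5      # assumed API rate limit

total_requests = students * requests_per_student
seconds_needed = total_requests / rate_limit_per_second
print(f"{seconds_needed / 3600:.0f} hours (~{seconds_needed / 86400:.1f} days) to pull the raw data")
```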

To facilitate quick access, the needed data must often be restructured and transformed into a more usable state. Many organizations choose to use data warehouses to store pre-processed data. A data warehouse is essentially a database used to store data obtained from a wide variety of sources, enabling the data to be used for learning analytics with greater speed and ease. Design decisions need to be made for packaging pre-processed data and storing these data for quick access.
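
The sketch below uses an in-memory SQLite database as a stand-in for a warehouse: raw events are transformed into a pre-aggregated summary table so that reporting queries read a small, purpose-built table rather than scanning the full event log. The table and column names are our own illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
conn.execute("""CREATE TABLE raw_events (
    learner_id TEXT, course TEXT, action TEXT, ts TEXT)""")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?, ?)",
    [
        ("student_042", "BIO-101", "video_play",  "2024-03-01T10:02:11Z"),
        ("student_042", "BIO-101", "item_submit", "2024-03-01T10:12:05Z"),
        ("student_043", "BIO-101", "video_play",  "2024-03-01T11:00:00Z"),
    ],
)

# Pre-aggregate into a reporting table so dashboards read one small table
# instead of reprocessing the raw event log on every request.
conn.execute("""CREATE TABLE learner_activity_summary AS
    SELECT learner_id,
           course,
           COUNT(*) AS event_count,
           MAX(ts)  AS last_seen
    FROM raw_events
    GROUP BY learner_id, course""")

for row in conn.execute("SELECT * FROM learner_activity_summary"):
    print(row)
```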

Designing the Visualization and Reporting of Data

The next step in this framework involves presenting results to stakeholders. Data visualization refers to relaying manipulated raw data obtained from a learning environment back to the stakeholders in a visually comprehensible and intuitive representation that can be quickly understood and interpreted (Pardo 2013), providing the intended data users with relevant, timely, comparable, and actionable information. Knowing from the outset what data need to be collected is essential (Buckingham Shum 2012); however, design decisions also need to be made about what, how, when, and to whom data will be reported.

Data visualization and reporting can be used for a variety of purposes at the macro (regional, state, national, international), meso (institution-wide), and micro (course, individual user) levels (Frech and Damaske 2012). By far the most common data reporting practice is communicating assessment results (Woolf 2010). At the course level, most instructional systems track assessment results simply to indicate students’ progress in completing the course. Successfully finishing assigned tasks in a timely manner is deemed indicative of satisfactory learning. This may not be the case. Data reported at the course level are most often provided in various forms of digital dashboards where histograms, graphs, timelines, traffic lights, etc. are used to communicate progress (Behrens and DiCerbo 2013; Pardo 2013; White and Larusson 2013). At the institution level, administrators use summary assessment data to calculate course completion rates or aggregate indications of basic student competencies.
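
As a small example of the logic behind such course-level dashboards, the sketch below maps task-completion ratios to a traffic-light status; the thresholds are assumptions a design team would need to set deliberately rather than accept as platform defaults.

```python
# Map a learner's completion ratio to a dashboard traffic-light status.
def progress_status(completed: int, expected: int) -> str:
    ratio = completed / expected if expected else 0.0
    if ratio >= 0.9:
        return "green"
    if ratio >= 0.6:
        return "yellow"
    return "red"

# Hypothetical (completed, expected) task counts per learner.
progress = {"student_042": (9, 10), "student_043": (6, 10), "student_044": (2, 10)}
for learner, (done, expected) in progress.items():
    print(learner, progress_status(done, expected))
```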

Effective and efficient data visualization requires that the data captured has been purposefully organized and stored in an accessible format; otherwise, data reporting is challenging and unlikely to be done in real time. At each level it is assumed that the information reported will be useful to the teacher, student, or administrator in making decisions regarding action. The organization and visualization of data are helpful in deciding on subsequent actionable steps only if the data being communicated are accurate and complete (Koskas 2004).

Incorporating Data Use in the Design

The purpose of any learning analytics endeavor is to provide actionable information. The design implication for this step of the framework involves deciding how the data will be used. Regrettably, the designs of many instructional systems end at reporting, with the assumption that users will interpret the data and decide what to do next. However, an instructional system has the potential to do more (Graesser et al. 2012).

An enhanced application of learning analytics might provide stakeholders with reporting that incorporates predictive models as well as interpretation of the data and possibly recommendations for action (Macfadyen and Dawson 2010; Pistilli et al. 2013). The design for an intelligent tutoring system, which may include adaptive algorithms and learning analytics engines, could also include plans for ways the system will adapt or personalize the instruction provided. Actions might include providing the instructor with a list of students who might benefit from remediation, recommending to students the next steps to take, or customizing (adapting) instruction in terms of scope and sequence. Such actions would need to be designed.
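
A minimal sketch of such a designed action appears below: a predicted probability of success (for example, from a model like the one sketched earlier) is mapped to a specific instructional response. The thresholds and actions are hypothetical and would be determined by the designer’s pedagogical model rather than by the analytics engine itself.

```python
# Translate a prediction into a designed instructional action.
def recommend_action(p_pass: float) -> str:
    if p_pass < 0.5:
        return "notify instructor; assign remediation module"
    if p_pass < 0.75:
        return "recommend optional practice set to the student"
    return "proceed to the next unit"

# Hypothetical predicted probabilities of passing for three learners.
predictions = {"student_042": 0.35, "student_043": 0.68, "student_044": 0.92}
for learner, p in predictions.items():
    print(learner, "->", recommend_action(p))
```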

System Refinement

The final step in this framework involves monitoring the learning analytics function of an instructional system in a continual self-improvement effort. The predictive models, as well as the capture, reporting, and application procedures, need to be updated on a regular basis (Campbell et al. 2007). The design implication of this step is that designs must be revisited and improved as the system and its use evolve.

By evaluating various aspects of the system over time (including its function, use, and effectiveness), instructional designers can assess the effects of their designs and deepen their understanding of the pedagogical practices they employ (Brooks et al. 2013). Evaluating design performance not only informs the pedagogical theory of the instruction, it can also improve and optimize the learning provided (Brooks et al. 2013; White and Larusson 2013). Evaluation of our products must be continuous so that education is refined and improved.

Concluding Summary

Education has fallen behind business in its use of data analytics, for a variety of reasons. However, basic learning analytics is becoming the standard for many online learning systems and will continue to grow in importance. In business, a minor increase in market share or purchases due to actionable information provided by data analytics is considered a success. But designing for learning analytics in education is much more challenging. The instruction we design is expected to be effective for a broad range of students in a variety of circumstances. Much of the data needed to facilitate and improve learning is difficult to obtain and tricky to automate. However, part of the problem seems to be that instructional designers do not design for data use. Designers naturally attend to the visible layers of instruction (the content and control layers) but fail to adequately design the data management layer of an instructional system (Gibbons 2014). To improve the potential for learning, learning analytics needs to be specifically addressed in the design of instruction.

Largely due to improvements in, and increased availability of, educational technologies, learning analytics has the potential to improve the teaching and learning process, but it has not yet reached that potential. A more strategic approach to designing technology-enabled instructional systems is needed if these systems are to benefit more fully from data analytics. Adapting the educational data mining framework for academic analytics proposed by Campbell and Oblinger (2007), we recommend areas in the instructional design process where learning analytics decisions need to be made.

In the data selection process, theory-based decisions must be made concerning requisite data. Too often a designer relies on the data capture capabilities of a specific technology rather than designing in terms of the data needed to answer instructional and pedagogical questions. Many instructional systems are designed to simply track task completion (i.e., progress in a course) as an indication that learning has occurred. Rarely do designers attend to the processes of learning or the aspects of learning that have not been accomplished (i.e., misconceptions, strategy flaws, and faulty practices). At this phase of the design, educational data mining research is needed to create prediction models that will inform designs and help guide decisions on which data are required. Additionally, care should be taken in designing and creating assessment tools that measure core skills and threshold concepts relevant to the instructional objectives of the course (Meyer et al. 2006).

When planning data capture and storage, design decisions need to include accessing data from a variety of sources. Using external tools that are compliant with interoperability standards can help in overcoming the challenges of using log data for learning analytics. Additional challenges of capturing difficult-to-obtain data, packaging pre-processed contextualized data, and storing (i.e., warehousing) these data for quick access also need to be considered. If essential data are difficult to access, they likely will not be used.

For effective data visualization, designs that support clear reporting of data need to be completed. Fundamental to the utility of this process is providing the intended users of these data with relevant, timely, comparable, and actionable information. Designers must make decisions about what, how, when, and to whom data will be reported. Designs must also anticipate data use; enhanced applications of learning analytics have the potential to go beyond typical simple reporting. Instructional designers might increase the utility of learning analytics in education by providing stakeholders with reporting that uses predictive models to interpret the data and possibly make recommendations for action. Designs that consider how the data will be used broaden the value of the instruction.

The success of an instructional design endeavor will depend a great deal on the pedagogic vision of the designer, though it will inevitably be constrained by the availability and practicality of obtaining requisite data. These designs need to be tested and revised. By evaluating various aspects of the system over time, instructional designers can improve their designs and deepen their understanding of the pedagogical practices they employ. If learning analytics is to have the effect on education intended and anticipated by many, designers must attend more carefully to the data management layer of instruction.