1 Introduction

Massive Open Online Courses (MOOCs), such as Coursera, Udacity, and edX, are “online courses designed for large numbers of participants, that can be accessed by anyone anywhere as long as they have an internet connection, are open to everyone without entry qualifications, and offer a full/complete course experience online for free” (Jansen & Schuwer, 2015, p. 4). In recent years, MOOCs have become an important research topic and have received increasing attention in both popular writing and academic scholarship (Ebben & Murphy, 2014; Kovanović, Joksimović, Gašević, Siemens & Hatala, 2015; Toven-Lindsey, Rhoads & Lozano, 2015). Indeed, MOOCs have gained increasing prominence and have been heralded as promising learning initiatives with great potential for improving learning and learning opportunities (Bozkurt, Akgün-Özbek & Zawacki-Richter, 2017; Brahimi & Sarirete, 2015). A recent review noted that learner retention, motivation, experience, satisfaction, and assessment were the most common foci of MOOC research (Zhu, Sari, & Lee, 2018). Relatedly, an important question regarding MOOCs is how effectively they support learning (Formanek, Wenger, Buxner, Impey, & Sonam, 2017). Although a long line of analytical and qualitative research on MOOCs offers valuable insights, more research is needed to provide deeper insights into how learner behaviors on MOOCs are related to learner outcomes.

A growing body of research has examined the use of learner-system interaction data. Digital learning environments can now track and log fine-grained data about learners’ interactions with the learning environment (Dutt, Ismail, & Herawan, 2017). Moreover, new computational methods are available to mine and analyze these data and potentially develop models of learner behavior and gain insights into learning behavior (Slater, Joksimović, Kovanovic, Baker, & Gasevic, 2016). The introduction of MOOCs has afforded an unprecedented opportunity to explore how learning behaviors influence learning outcomes using big data (DeBoer, Ho, Stump, & Breslow, 2014). The fine-grained big data (e.g., learner behaviors) collected in MOOCs can be leveraged to develop learner models for predicting various learner outcomes (e.g., academic performance). It is this research aspect that we focus on in the current paper.

We recognize the importance of learner behaviors in learning environments as key features for predictive modeling. Indeed, research on grade prediction has taken on new life, as predictive models can be leveraged to deliver pedagogical agents that support personalized learning experiences in online learning (Li, Xie, & Wang, 2016; Mothukuri et al., 2017; Romero & Ventura, 2010; Yang, Brinton, Joe-Wong, & Chiang, 2017). However, to be useful, MOOC research must move beyond single-course descriptions to comparisons across contexts, and from post hoc analyses to greater use of experimental designs that study learning in MOOCs rather than engagement alone (Reich, 2015). In the following section, we review the literature related to grade prediction in MOOCs.

2 Literature review

Several studies have investigated the use of learner behavior data from MOOCs for grade prediction (Ashenafi, Riccardi, & Ronchetti, 2015; Brinton & Chiang, 2015; Hong, Wei, & Yang, 2017; Li, Xie, & Wang, 2016; Meier et al., 2015; Xie, Zheng, Zhang, & Qu, 2017; Yang et al., 2017). However, little is known about how study behaviors in MOOC learning environments influence student course performance (Li, Xie, & Wang, 2016) even though study behaviors appear to directly influence learning outcomes (Azevedo & Hadwin, 2005).

Heterogeneity of learning behaviors has prompted researchers to consider other data sources that may correlate better with learning performance. Researchers in China built a model leveraging demographics, forum activity, and learning behavior that outperformed previous grade prediction models (Qiu et al., 2016). However, response sparsity per individual and the need for personalized models have led others to consider more plentiful data streams. Indeed, researchers (Yang et al., 2017) have proposed a model based on video-watching clickstreams that outperformed an average of past performance as a predictor of performance on multiple-choice evaluations.

Li, Xie, and Wang (2016) built a predictive model based on expert feature engineering, selecting 15 features grounded in social cognitive and behavioral theories of learning, to predict quiz grades. These theoretically informed features align with other proposed models that include demographic characteristics, video-watching behaviors, and previous academic performance. They found high prediction accuracy across mean prediction, regression, and neural networks, with regression and neural networks yielding only small gains over mean prediction. Unfortunately, the authors did not report statistics on the average contribution of each variable, though it is well established in the academic performance literature that prior performance is the single strongest predictor of future performance (Bazelais, Lemay, & Doleck, 2018), as is likely the case here too.

Whereas logistic regression has been favored for dropout prediction in MOOCs, as this is essentially a classification problem, both linear regression and neural network models have been used for grade prediction (Yang et al., 2017; Li, Xie, & Wang, 2016). Brinton and Chiang (2015) conducted exploratory modeling of video-watching behavior using artificial neural networks to predict student performance on course assignments. They demonstrated that video-viewing behaviors can accurately predict student performance from the first data points and can hence serve as an early detection system to target student difficulties. When researchers have compared regression and neural networks, they have found similarly high predictive accuracy with either approach (Li, Xie, & Wang, 2016). However, one advantage of linear models is the ability to evaluate the relative contribution of each factor, such as the amount of variance explained in structural equation models. The ability to disentangle the influence of different factors (or features) makes it possible to understand the underlying dynamics and to formulate a response better aligned with the learner’s needs.
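To illustrate this trade-off, the following minimal sketch (not the cited authors’ code; the features and grades are synthetic placeholders) fits both a linear model and a small neural network to the same data, showing comparable cross-validated fit while only the linear model exposes per-feature coefficients.

```python
# A minimal sketch contrasting a linear model, whose coefficients are
# directly inspectable, with a small neural network, on synthetic data
# standing in for per-learner behavior features.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.random((500, 10))  # ten illustrative behavior features
y = 0.6 * X[:, 0] + 0.2 * X[:, 3] + 0.1 * rng.standard_normal(500)  # synthetic grades

models = [("linear", LinearRegression()),
          ("mlp", MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0))]
for name, model in models:
    r2 = cross_val_score(model, X, y, cv=10, scoring="r2").mean()
    print(f"{name}: mean 10-fold R^2 = {r2:.3f}")

# Unlike the MLP, the linear model exposes per-feature contributions:
print(LinearRegression().fit(X, y).coef_)
```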

A k-nearest-neighbors algorithm was used to find the optimal moment for grade prediction, using weighted predictors of evaluation categories (Meier et al., 2015). Their mathematical derivation permitted the analysis of interactions between and within factors and identified exams and quizzes as the most accurate predictors of final grade. Indeed, their recommendation that students be tested early and often echoes the testing-effect literature (Agarwal, Karpicke, Kang, Roediger, & McDermott, 2008; Bangert-Drowns, Kulik, & Kulik, 1991; Roediger & Karpicke, 2006a, 2006b).
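As a rough illustration of the general approach (not Meier et al.’s actual derivation), a k-nearest-neighbors regressor can predict a learner’s final grade from the grades of the k most similar learners on hypothetical early assessments:

```python
# Illustrative sketch only: predict final grade from the k most similar
# learners' early assessment scores, weighting neighbors by distance.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
early_scores = rng.random((300, 4))  # hypothetical early quiz/exam scores
final_grade = early_scores.mean(axis=1) + 0.05 * rng.standard_normal(300)

knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
print(cross_val_score(knn, early_scores, final_grade, cv=10, scoring="r2").mean())
```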

Sinha et al. (2014) made an interesting attempt at assessing the types of behavior exhibited in video-watching clickstreams, conceived as sequences of video-viewing actions that collectively describe behaviors, drawing on limited-capacity information processing theory. The authors used a natural language processing technique to identify the most common 4-grams and coded them as distinct behaviors such as fast/slow watching, skipping, checking a reference, and clarifying a concept. These activities were then weighted to compute an information processing index, such that a given clickstream was assessed as a high or low instance of information processing: a positive score indicates more information processing and a negative score indicates less. They argued that such an index could be used to estimate the degree of difficulty students experienced with the course material. The authors’ study goes some way toward characterizing video-viewing patterns from clickstreams and helps develop intuitions about how learning behaviors correlate with course performance; however, it remains to be examined how such features relate to assignment and course completion.
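The following sketch conveys the general idea (the action vocabulary and weights are hypothetical, not Sinha et al.’s values): extract contiguous 4-grams from a click sequence, then weight them to form an information processing index.

```python
# Simplified sketch of an n-gram-based information processing index:
# slide a window over the click sequence, count 4-grams, and apply
# hypothetical weights (positive = re-processing, negative = skimming).
from collections import Counter

clickstream = ["play", "pause", "play", "rewind", "play", "ratechange", "pause", "play"]

# Extract all contiguous 4-grams from the action sequence.
ngrams = [tuple(clickstream[i:i + 4]) for i in range(len(clickstream) - 3)]
counts = Counter(ngrams)

# Hypothetical weights, for illustration only.
weights = {
    ("play", "pause", "play", "rewind"): +1.0,     # revisiting content
    ("play", "ratechange", "pause", "play"): +0.5,  # slowing down to process
    ("play", "play", "play", "play"): -0.5,         # passive streaming
}

ip_index = sum(weights.get(g, 0.0) * c for g, c in counts.items())
print(f"information processing index: {ip_index:+.2f}")
```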

Objective

In contrast to the emphasis of previous work on MOOC learning outcomes, our goal in this study is to examine how learner video-viewing behaviors influence grade prediction in MOOCs. Specifically, the aim of the present study was to ascertain the links between students’ video-viewing behaviors and course grades. As such, we address the following research question: Can students’ video-viewing behaviors predict course grades?

3 Methods

Context and sample

The data for this study were derived from a MOOC [Big Data and Education on the EdX platform] focusing on educational data mining and learning analytics, offered by the University of Pennsylvania and developed and taught by Dr. Ryan S. Baker. The course had a total of 10,432 registered users.

Sample and procedure

Video logs from the course were mined using Python Jupyter scripts (https://github.com/davidjlemay/EdX-Video-Feature-Extraction). The dataset included 6241 instances. In Table 1, we present the ten video-viewing features used in this study. We used the nine features from Brinton and Chiang’s (2015) study: the number of rewinds, fast-forwards, pauses, and plays, along with the fractional and total amounts played and paused, and the playback rate. In addition, a tenth feature was added to the feature set: the number of videos viewed per week. To provide more information about the data, we also present the first ten rows of the dataset (Table 2): the first ten columns are the video-viewing features and the last column is the outcome variable (i.e., course grade). A schematic sketch of this extraction step follows Table 2.

Table 1 Video-Viewing Features
Table 2 First ten tabulated features
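To make the extraction step concrete, the following schematic shows how per-learner event counts and the videos-viewed-per-week feature might be aggregated with pandas. This is an assumption about the log format, not the code in the linked repository; column names such as user_id, video_id, and event_type are hypothetical.

```python
# Schematic feature aggregation from raw EdX-style video event logs.
# The input file and its columns are illustrative assumptions.
import pandas as pd

events = pd.read_json("video_events.jsonl", lines=True)  # hypothetical log file
events["time"] = pd.to_datetime(events["time"])
events["week"] = events["time"].dt.isocalendar().week

# Count play/pause/seek/rate-change events per learner.
counts = (events.groupby(["user_id", "event_type"]).size()
                .unstack(fill_value=0))

# Videos viewed per week: distinct videos per learner-week, averaged.
per_week = (events.groupby(["user_id", "week"])["video_id"].nunique()
                  .groupby("user_id").mean()
                  .rename("videos_per_week"))

features = counts.join(per_week)
print(features.head(10))  # cf. Table 2
```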

4 Analysis and results

While predictive modeling is the core aim of the current research, our analytical approach also follows calls for complementing predictive analytics with explanation analytics (Shmueli, 2010).

Prediction analysis

The objective of the analysis was to evaluate how accurately the ten features could predict course grade. We analyzed the data using the WEKA workbench (Hall, Frank, Holmes, Pfahringer, Reutemann, & Witten, 2009). The results of ten-fold cross-validation for several commonly used classifiers (Logistic, SMO, NaiveBayes, J48, JRip, IBk, RandomForest, and WekaDeepLearning4j) are provided in Table 3. We note that performance was above chance (Kappa > 0 and AUC > 0.5) (Hulse et al., 2018; Jiang et al., 2018). As seen in Table 3, predictive accuracy reaches as high as 70.1941%. We find that the frequency of video viewing per week is a better predictor than individual viewing features such as plays, pauses, seeks, and rate changes.

Table 3 Cross-validation Results
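The study itself used WEKA; as a rough Python analogue on synthetic placeholder data (not the authors’ pipeline), the following sketch shows how ten-fold cross-validated accuracy, Cohen’s kappa, and AUC might be computed for two of the classifier families listed above.

```python
# Rough scikit-learn analogue of the WEKA evaluation: ten-fold CV with
# accuracy, Cohen's kappa, and AUC, on synthetic placeholder data.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score

rng = np.random.default_rng(1)
X = rng.random((6241, 10))  # placeholder for the ten features
y = (X[:, 0] + 0.3 * rng.standard_normal(6241) > 0.5).astype(int)  # pass/fail proxy

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
for name, clf in [("Logistic", LogisticRegression(max_iter=1000)),
                  ("RandomForest", RandomForestClassifier(random_state=1))]:
    pred = cross_val_predict(clf, X, y, cv=cv)
    prob = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
    print(f"{name}: acc={accuracy_score(y, pred):.4f} "
          f"kappa={cohen_kappa_score(y, pred):.3f} "
          f"auc={roc_auc_score(y, prob):.3f}")
```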

Explanation analysis

The objective was to assess the degree to which the ten features explain the variance in the dependent variable (course grade). A partial least squares structural equation modeling (PLS-SEM; Henseler, Hubona, & Ray, 2016) approach was employed, and the data were analyzed using the WarpPLS software (Kock, 2018a, 2018b). Table 4 reports the significance of the links between each feature (F1 to F10) and the dependent variable (course grade). The coefficient of determination (R2) for the dependent variable was 0.086; that is, the antecedent variables (the ten features) explain 8.6% of the variance in course grade. The findings suggest that while the feature set may yield a highly predictive model (accuracy = 70.1941%), the same features do not necessarily yield a highly explanatory model (R2 = 8.6%). This suggests the need for further inquiry into salient features that can lead to models that are both highly predictive and explanatory.

Table 4 Structural Model
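WarpPLS is proprietary and the PLS-SEM structural model is not reproduced here; as a simplified stand-in on synthetic data, an ordinary least squares fit illustrates how the explained variance (R2) and the significance of each feature-to-grade link can be read from a single model.

```python
# Simplified stand-in for the structural model: OLS regression of course
# grade on the ten features, reporting R^2 and per-feature p-values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = rng.random((6241, 10))  # placeholder feature matrix F1..F10
grade = 0.3 * X[:, 9] + rng.standard_normal(6241)  # synthetic course grade

model = sm.OLS(grade, sm.add_constant(X)).fit()
print(f"R^2 = {model.rsquared:.3f}")  # cf. the reported 0.086
print(model.pvalues.round(4))         # significance of each F1..F10 link
```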

5 Discussion

Research on grade prediction has recently begun to examine the influence of video-viewing behavior on academic performance, specifically on multiple-choice quizzes associated with video lectures in a MOOC. However, this research has for the most part focused on predictive accuracy as the metric for evaluating the robustness of the models. This approach is problematic because of the high heterogeneity in the samples: given the inherent limitations of correlational analysis not grounded in explanatory modeling, predictive models may not transfer across instructional situations, and boosting accuracy with many parameters risks overfitting the data. Indeed, the features employed here demonstrate a respectable amount of predictive accuracy, though weaker than models employing demographic data. As one of the few studies to perform an explanatory analysis, we find that feature engineering contributes an additional layer of complexity, as it increases the chance of introducing bias into the models. In the present case, the number of videos viewed alone explains more variance than all the other expert-engineered features. Our analytical approach of complementing predictive modeling with explanatory analysis has broader implications that should interest researchers pursuing analytical advancements.

Whereas video-viewing behavior is proving an important data source for predictive modeling in MOOCs, it is important to expand the search space to avoid searching only the well-trodden paths of intuition. From a pedagogical tutoring perspective, this means we ought to search for signals that are pedagogically meaningful, that is, signals that can support self-regulated learning and metacognition (Azevedo & Hadwin, 2005). In other words, our putative pedagogical agent would support learning by prompting students to make connections with prior knowledge and by helping them identify misconceptions in their knowledge. Such pedagogical actions would translate into MOOC user behaviors such as searching across multiple videos, as students are encouraged to review earlier lectures to build and refine their knowledge structures. In such cases, the salient objective for a pedagogical agent would be increasing the number of videos viewed in a given period, as evidence of student self-regulated learning behavior.

Certainly, focusing on video-viewing behaviors within a video, as opposed to across all videos in a MOOC, appears to be related to effective study behaviors and may indeed improve quiz performance (Brinton & Chiang, 2015). However, this may not by itself be sufficient to boost retention and course completion rates. Self-regulated learning theory suggests that, to increase MOOC completion, the salient learning signals, which may ultimately prove to best explain MOOC performance, should be sought in measures that demonstrate self-regulation and metacognition in the user, such as viewing across multiple videos, which has been found to best explain assignment completion (Lemay & Doleck, under review). Indeed, self-regulation and metacognition have been shown to be strong predictors of academic performance.

MOOC learning environments offer an excellent context for studying video-watching behavior, given the ubiquity of video-centered MOOC platforms like EdX and Coursera. Explaining how learning behaviors in MOOCs are related to course performance can help educationalists develop adaptive scaffolds for supporting learners’ self-regulation and metacognition. By comparing patterns of video-viewing activity and associated grades, we can develop responsive pedagogical agents, informed by self-regulated and metacognitive learning theory, to provide predictive contextual support and hopefully improve MOOC completion rates.

Limitations

The present study has some limitations that also provide avenues for future work. This study is constrained by its analysis of a single MOOC. It examined only video-viewing features and ignored other data and potential sources of variability, including demographic characteristics and prior knowledge. These findings may be extended by comparing other courses and by using more robust experimental designs to determine how user study behaviors are related to academic performance in MOOCs.

Conclusions and future directions

This study deepens our understanding of the links between learner behaviors on MOOCs and performance. There is a clear need for further insight into the salient features that contribute to a model that is both highly predictive and explanatory. Only by comparing predictive and explanatory analyses can we begin to understand how user behaviors are related to MOOC outcomes. Such causal analysis would help to explain how instructional decisions influence student learning behaviors and help design MOOCs that support student self-regulated learning to boost retention and course completion.