Introduction

Society as a whole has become increasingly entwined with technology and computers in everyday life and work, and this has driven the growth of online learning environments that assist students in acquiring knowledge and skills. The use of technology-based training and the adoption of learning management systems have more than doubled in the past decade (Brown et al. 2012). Higher education has seen a paradigm shift from traditional classrooms and tangible learning resources to asynchronous e-learning environments, and this shift has fundamentally changed how learners are engaged. Technology-enhanced learning presents many challenges, one of which is learners’ online engagement. Learner engagement is a primary factor in effective teaching and learning in an e-learning environment. Bruner (2013) postulated that “engagement is the ultimate test” (p. 34) of successful learning in an online environment.

As economies of the world continue to evolve, there is a continuum of educational and training needs for adult learners engaged in lifelong learning. According to OECD statistics, 57% of Singapore’s population aged 25–64 participated in formal and/or non-formal education in 2015 (Kuczera 2017). An adult education survey showed a similar upward trend in the EU-28 from 2011 to 2015. Increasingly, course developers and instructional designers have to grapple with the learning needs of this growing group of adult learners, as formal learning takes place in an increasingly networked environment over e-learning platforms and learning management systems in order to cater to the changing demographics of learners. The shift towards digital environments makes it possible to retrieve, store, manage and analyse increasingly large amounts of data over the cloud in the digital earth and relevant science domains (Yang et al. 2017). In particular, in the online learning context, the abundance of clickstream data provides an opportunity for educational practitioners and researchers to harvest data related to learner engagement. Aided by data-mining methods, the analysis and sense-making of the interaction data between the learner, the learning environment and the learning activities has become less cumbersome than before, and this can support a better understanding of online engagement behaviour (Gašević et al. 2015). Understanding these interactions, and what might increase their effectiveness in online education, is paramount for meaningful adult learning. In particular, this research focuses on discovering meaningful patterns of engagement and disengagement in learning activities, observed from traces of adult learners’ online engagement behavioural data.

Measuring learner engagement and its influence on learning is challenging. While the definition of learner engagement should stay consistent with that of more traditional learning environments, its measurement should be tailored to the data available in the online learning environment (Anderson 2017). Identifying proxies of online learner engagement can provide a degree of measurability that can be used to inform and improve upon existing teaching and learning practices (Beer et al. 2010, p. 75). Hence, the aim of this study is twofold: (1) to explore the potential of reconstructing a variation of the RFM analysis (a marketing segmentation technique based on customers’ recency, frequency and monetary purchasing behaviour) as a framework to codify and quantify adult learners’ online engagement; and (2) to explore the online engagement patterns of adult learners using data-mining techniques (i.e. an unsupervised learning method: non-fuzzy clustering).

Literature review

The literature review comprises four parts. The first part describes online learner engagement. The second part reviews studies on the RFM model as a basis for measurement. The third part focuses on log data, with sub-parts on the engagement metrics. The last part presents studies on applications of learning management system data analysis.

Online learner engagement

There are several definitions of learner engagement. A literature review by Trowler (2010) defined learner engagement as “the investment of time, effort and other relevant resources by both students and their institutions intended to optimise the student experience and enhance the learning outcomes and development of students, and the performance and reputation of the institution” (Trowler 2010, p. 3). The definition is learner centred while also acknowledging the role of the institution in learner engagement. It is possible to argue that activity within a learning management system (LMS) might serve as a measure of learner engagement because LMS log data contain a record of learner behaviours as well as institutional practices. Learner behaviours such as logging in, accessing course content, completing quizzes, submitting assignments and taking part in discussions related to coursework are captured.

According to Fredricks and McColskey (2012), “researchers, educators, and policymakers are increasingly focused on student engagement as the key to address problems of low achievement, high levels of student boredom, alienation, and high dropout rates” (p. 763). The definition and measurement of learner engagement become more complex in online learning environments. A survey on online teaching and learning conducted in 2009 by Allen and Seaman (2010) concluded that online learner engagement was a key challenge in higher education. A National Survey of Student Engagement (2012) showed that online learners, while faced with more challenges, are less engaged in their online learning activities, leading to dissatisfaction with the overall learning experience.

Learner engagement has been defined in research in terms of effort (Jung and Lee 2018), time-on-task (Lee 2018), and motivation (Jin 2017). Research has also linked learner engagement with involvement time in learning activities and practices that lead to increased academic achievement (Axelson and Flick 2011; Leach and Zepke 2011; You 2016).

Learner engagement comprises the interactions a learner has with their instructor, the course content and other learners (Siemens 2013). Online student engagement has been measured by single observed variables such as independent time in the course (You 2016), the number of on-task and off-task internet activities (Lee 2018), and self-esteem (Harbaugh and Cavanagh 2012). Students engage instructors and other learners by posting on discussion boards, attending faculty live-chat sessions, instantly messaging their instructor during office hours, viewing faculty-recorded chat sessions, viewing faculty comments and announcement postings, and e-mailing their instructor and classmates (Anderson and Dron 2011). In addition, students receive online instruction using streaming media. Most campuses offer students a mix of asynchronous and synchronous communication technologies in the delivery of online instruction (Cleveland-Innes and Ally 2013). Online interaction and collaboration occur in virtual classrooms that allow learners to decide which e-learning technologies they will employ to engage in online instructional activities (Anderson and Dron 2011).

Given the above, there does not appear to be a single, distinct definition of online engagement; however, the following characterization amalgamates the existing literature. Online engagement is seen to comprise active and collaborative learning, participation in challenging academic activities, formative communication with academic staff, involvement in enriching educational experiences, and feeling legitimated and supported by university teaching communities (Coates 2007, p. 122). At its core, online engagement manifests in a range of interactions: learner-to-content, learner-to-learner and learner-to-teacher. Closely related to this, this paper proposes a study that treats online engagement as a construct composed of three key indicators (i.e. immediacy, frequency and duration); these indicators are expected to be adequate in providing a degree of measurability for online engagement.

RFM and its application in learning analytics

In this study, the RFM (recency, frequency, monetary) model is used as a reference to derive the proposed online engagement indicators. The RFM model is widely used in marketing research and practice to quantify customer loyalty and purchase behaviour (Chang 2010). The three variables are harvested individually from marketing databases, combined linearly and then scored as a means of quantifying a customer’s overall purchasing intent (Hu et al. 2012). The concepts of RFM have also been proposed for other domains, such as the electronics industry (Chiu et al. 2009) and the automobile industry (Chan 2008).

The RFM model’s flexible yet simple approach is perhaps the fundamental reason for its effectiveness and reliability when replicated across domains. Other studies have extended its use to the learning context. Chang (2010) adopted and re-conceptualized the RFM model into an eLearning RFM (EL-RFM) model to score learners’ participation in an e-learning environment, and further customized it to quantify learners’ online learning outcomes. The model redefines the original dimensions as recency of eLearning (EL-R), frequency of eLearning (EL-F) and investment of time in eLearning (EL-M). The EL-RFM model was designed to measure learners’ motivation and to allow the instructor to analyse learners’ online behaviours (Chang 2010).

Similarly, Kim and his colleagues (Kim et al. 2015) proposed an LS-RFD (learning style-recency-frequency-durability) model to score the level of learners’ activities for analysis and modelling. The study extracted variables of teaching–learning activities and mapped them according to the learning style of the learner, to provide a measurement of learners’ preferences for each teaching–learning activity. The measurement is based on the recency, frequency and durability scores of each learning activity. From the results, user characteristics were extracted and grouped according to teaching–learning activities, and further categorized by each learner’s levels of preference and activity. In another study, conducted in Hungary, the RFM model was re-classified to score online learning behaviour (Toth 2013). The classification streams of the recency and frequency variables were applied to identify online learning behaviour. The segmentation of the two variables was done in five parts, ranging from high (top 20%), through medium–high, medium and medium–low, to low (bottom 20%).

Log data as a data source

Online learning invariably revolves around the learning management system (LMS) and its online access by learners. The measurement of learner engagement is unique to the data availability of the online learning environment. In such an environment, online materials (e.g. recorded lectures, quizzes, course content and readings) are provided to the learners, and access to these online materials typically happens between the start and end of a course. An access (i.e. the action) to the online materials (i.e. the learning object) by a learner (i.e. the actor) triggers a learning event, and this is logged by the LMS in the form of an interaction, with a timestamp. Each interaction (e.g. a learner accessing a certain content or watching a video) is recorded as an instance (i.e. a row of data) (refer to Fig. 1 for an illustrative example).

Fig. 1 Illustrative example of a source dataset (anonymized)
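For concreteness, a single logged instance can be thought of as an actor-action-object-timestamp record. The following is a minimal sketch in Python; the field names are illustrative and are not the actual Canvas LMS schema:

```python
# A hypothetical representation of one LMS log instance (a row of data),
# following the actor-action-object-timestamp structure described above.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Interaction:
    learner_id: str       # the actor
    action: str           # the action, e.g. "viewed" or "submitted"
    learning_object: str  # the learning object, e.g. a quiz or a recording
    timestamp: datetime   # when the learning event was logged

row = Interaction("L1", "viewed", "pre-class quiz 1",
                  datetime(2019, 1, 3, 10, 15))
print(row)
```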

In this study, however, the recency, frequency and monetary (RFM) segmentation is adapted to immediacy, frequency and duration (IFD): recency is replaced with the learner’s immediacy (of access), which reflects the learner’s sense of urgency or excitement to learn in an online learning context, and monetary is replaced with duration (better known as time-on-task). Besides using the RFM model as a reference to derive online engagement metrics, the RFM approach of discovering and segmenting profiles of customers’ purchasing behavioural intent (e.g. Birant 2011) is also adapted in part to describe learners’ engagement levels.

Associated with the discussion above, there is typically a designated starting point/date for online learning (e.g. the start of a course) and a designated ending point/date (e.g. the end of a course or of all assessments). These are necessary reference dates for extracting the indicators of online engagement proposed in this study.

Immediacy (of access)

The role of immediacy behaviour in the learning context was recognized as early as the 1970s and 1980s, and is widely documented in the area of instructional communication (e.g. Mehrabian 1981). At that time, the immediacy construct was characterized by teacher–student interaction (Hosek et al. 2017), and the focus of instructional methods was very much teacher-centric. However, the adoption of technology has in part transformed the role of the learner. Teaching and learning activities have become increasingly learner-centric, where a learner is expected to play a more active role in achieving their learning outcomes. Active learning is the interactive role learners assume with course materials, instructors and peer learners in constructing new and more complex knowledge and understanding to make sense of information, interpret it and solve problems (Keengwe et al. 2014). In an online learning environment, learners’ active learning can be observed through their access log data, by evaluating their interactions with the content, their peers and the instructors. Thus, in this study, we relate a learner’s immediacy to the time lapse between the start of the online materials’ availability and the learner’s first online access. In this context, it can be expected that an increased level of online engagement is associated with a shorter immediacy (i.e. an engaged learner is expected to be faster in accessing online materials once they become available).
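To make this operational definition concrete, the following is a minimal sketch assuming the log is held in a pandas DataFrame with hypothetical columns learner_id and timestamp, and an assumed materials’ release date:

```python
import pandas as pd

# Hypothetical log events for two learners.
log = pd.DataFrame({
    "learner_id": ["L1", "L1", "L2"],
    "timestamp": pd.to_datetime(["2019-01-03 10:15",
                                 "2019-01-05 21:40",
                                 "2019-01-10 08:05"]),
})

access_start = pd.Timestamp("2019-01-01 00:00")  # assumed release date

# Immediacy = time lapse from release to the learner's first logged access.
first_access = log.groupby("learner_id")["timestamp"].min()
immediacy_days = (first_access - access_start).dt.total_seconds() / 86400
print(immediacy_days)  # a smaller value suggests higher engagement
```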

Frequency (of access; or login frequency)

Research has found learners’ login frequency in an online course to be a key variable in predicting learning performance. An early study by Piccoli et al. (2001) found learners’ LMS login frequency to be highly correlated with course satisfaction. Kang et al. (2009) examined learners’ online participation and frequency and their effects on academic achievement in an e-learning course, and found that online access frequency is correlated with academic grades as well as attendance rate.

Similarly, Jo et al. (2015) highlighted the significance of learners’ login frequency as a predictor of academic performance. They studied how frequently learners logged into the LMS by totalling each learner’s login time and frequency. The study showed that the (ir)regularity of the learning interval is correlated with, and can predict, learning performance. The regularity of the learning interval reflects learners’ engagement, active participation and awareness of their learning level during the e-course.

In this study, a learner’s frequency is defined as the number of episodes of online access within the maximum relevant time period (i.e. from the access start time to the end of the course). In the current context, it can be expected that a higher level of online engagement is associated with a higher frequency (i.e. an engaged learner can be expected to access the online materials more often, maximizing his learning opportunities).
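The notion of an “episode” of access can be operationalized in different ways; one common heuristic, sketched below under the assumption of a 30-minute inactivity threshold (a choice not specified in the study), is to start a new episode whenever the gap between a learner’s consecutive events exceeds that threshold:

```python
import pandas as pd

log = pd.DataFrame({
    "learner_id": ["L1"] * 4,
    "timestamp": pd.to_datetime(["2019-01-03 10:15", "2019-01-03 10:20",
                                 "2019-01-05 21:40", "2019-01-05 21:55"]),
}).sort_values(["learner_id", "timestamp"])

# A new episode begins at a learner's first event, or after a >30-min gap.
gap = log.groupby("learner_id")["timestamp"].diff()
new_episode = gap.isna() | (gap > pd.Timedelta(minutes=30))
log["episode"] = new_episode.cumsum()

# Frequency = number of distinct access episodes per learner.
frequency = log.groupby("learner_id")["episode"].nunique()
print(frequency)  # L1 -> 2 episodes within the course window
```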

Duration (of access; or total time-on-task)

Time-on-task has long been recognized as a significant variable that is correlated with learner engagement and predictive of learners’ achievement. Time-on-task is a major component of how well learners learn and retain knowledge (Albert and Kussmaul 2008). Learners’ participation time during an online course is calculated based on “login time”, as learning in an eLearning course usually takes place while learners are logged on. Gunn et al. (2007) also posit that “learning takes time because it requires growing new dendrites, synapses, and neural networks” (p. 63). As learners spend time manipulating what they are learning, it becomes integrated into their memories. In a research project that studied the online learning behaviours of 824 post-secondary learners, Wagner et al. (2008) found a positive correlation between academic grades and the amount of time invested, where the quantity of online time was calculated based on “login time”; that research associated increased learning time with improved learning outcomes. In this study, duration represents a learner’s total online access time.
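Under the same assumed episode structure as in the frequency sketch above, duration can be estimated by summing the elapsed time of each access episode per learner; the following is a minimal, hypothetical illustration rather than the study’s exact computation:

```python
import pandas as pd

log = pd.DataFrame({
    "learner_id": ["L1"] * 4,
    "episode":    [1, 1, 2, 2],
    "timestamp": pd.to_datetime(["2019-01-03 10:15", "2019-01-03 10:20",
                                 "2019-01-05 21:40", "2019-01-05 21:55"]),
})

# Span of each episode = last event time minus first event time.
episode_span = (log.groupby(["learner_id", "episode"])["timestamp"]
                   .agg(lambda t: t.max() - t.min()))

# Duration (total time-on-task) = sum of episode spans, here in minutes.
duration_min = episode_span.groupby("learner_id").sum().dt.total_seconds() / 60
print(duration_min)  # L1 -> 20.0 minutes in total
```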

Applications of LMS analysis

The analysis of learners’ log data and patterns is often performed using educational data mining (EDM). Delavari et al. (2008) suggested that data mining can enable educators and their institutions to improve administrative decision-making, to predict learners’ academic performance more accurately, to develop strategies more effectively, and to allocate resources more efficiently. In particular, analysing the log data in the LMS can identify at-risk learners during the course so as to provide just-in-time support and guidance (van Barneveld et al. 2012). The data trails obtained from the LMS are significant and can provide key inputs with regard to the online teaching and learning processes.

Blackboard, a widely used LMS, provides for the early prediction of learners’ academic performance by analysing learners’ log data (Fountain 2016). Similarly, Purdue University implemented Course Signals, a mid-course warning system that uses traffic-light colour signals to alert both the instructor and the learners to their learning status. Findings from a research study conducted by Purdue University indicated that Course Signals helped at-risk learners improve their academic performance and helped decrease attrition rates (Purdue University 2013). The study mined data from the LMS and employed learning analytics to predict key teaching and learning challenges of at-risk learners. The application of educational and learning analytics enabled educators to provide timely interventions to improve learning outcomes (van Barneveld et al. 2012).

More importantly, besides harvesting useful indicators to inform educational practice and future research, this study takes the pragmatic approach of examining the deployability of the proposed indicators and, of equal importance, their extensibility to analysing learners’ behaviour in other online learning courses. In using the RFM approach to derive learner segments based on the three proposed indicators, the ease with which end-users (e.g. instructors and course administrators) who may not be technically conversant can interpret the findings carries considerable weight in our proposed methodology.

Methodology

Dataset

The data for this study were extracted from a six-week undergraduate course offered at a university in Singapore, primarily to adult learners. The course was delivered predominantly online using interactive study materials; adult learners received guidance and support largely from an online learning environment, supplemented with some face-to-face sessions. Learning resources, pre-class quizzes and online discussions were hosted on the University’s Canvas LMS platform. Data are captured by a learner’s click-actions: for example, when an adult learner accesses content or interacts with his peers in discussions, an interaction log is triggered in the LMS. The dataset contained such behavioural data logged from 418 adult learners, comprising more than 100,000 instances of online access.

Data preparation

The following sections discuss the data processing steps taken to prepare the data for modelling and analysis of learners’ engagement patterns. Table 1 summarizes the three indicators of online engagement proposed in this study.

Table 1 Description of online engagement metrics—immediacy, frequency and duration

Accordingly, the following online engagement metrics can be harvested for a particular learner based on Fig. 2.

Fig. 2 Schematic diagram—harvesting online engagement metrics (immediacy, frequency, duration) for a learner

Calculating and normalizing immediacy, frequency and duration

It is important to note that different forms of analysis can have different implications for how the data should be prepared. For the purpose of this analysis, we address some general considerations for data preparation.

The calculated values of immediacy, frequency and duration will naturally vary according to their characteristics. In their original form, the metrics take on different ranges and scales of measurement. For example, while immediacy may be bounded within a range from 0 to 45 and measured in days, the values of frequency and duration can range from 0 to relatively large numbers and are measured on different scales (i.e. in counts and in minutes, respectively).

In general, learning algorithms benefit from data normalization. Furthermore, for comparability, the individual metrics need to be brought to a common scale. In this regard, a min–max normalization procedure is adopted to transform the original values to within a range of 0–1. The equation is given as

$$ \text{Normalized metric value} = \left( \text{original metric value} - \min \right) / \left( \max - \min \right) $$

As an illustration, suppose the minimum and maximum frequency values for the entire class of learners are 1 and 13. Then, for learner I (with a frequency of 5), we have

$$ \text{Normalized } F_{I} = \left( \text{Original } F_{I} - \text{Min } F \right) / \left( \text{Max } F - \text{Min } F \right) = \left( 5 - 1 \right) / \left( 13 - 1 \right) = 4/12 \approx 0.33 $$

The normalized value will range from 0 (when the original value equals the minimum) to 1 (when it equals the maximum). The same applies to the other online engagement metrics.
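The normalization can be applied to all three metrics at once; a minimal sketch, with made-up values for three hypothetical learners, is:

```python
import pandas as pd

# Hypothetical I, F, D values (in days, counts and minutes, respectively).
metrics = pd.DataFrame({
    "immediacy": [2.0, 10.5, 45.0],
    "frequency": [13, 5, 1],
    "duration":  [320.0, 95.0, 12.0],
}, index=["L1", "L2", "L3"])

# Column-wise min-max normalization to the range [0, 1].
normalized = (metrics - metrics.min()) / (metrics.max() - metrics.min())
print(normalized.round(2))
# e.g. a frequency of 5 with min 1 and max 13 -> (5 - 1) / (13 - 1) = 0.33
```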

Computing an aggregate online engagement index

As discussed in the preceding section, a shorter immediacy is expected to correspond to a higher level of engagement. In deriving an aggregate online engagement index, all of its component metrics should be measured in the same, “right” direction (i.e. a higher component metric should be associated with a higher level of online engagement). Otherwise, the component metrics can offset each other, and the aggregate index will not be meaningful.

In the proposed framework, normalized frequency and normalized duration are already in the right direction, in that higher values indicate a higher level of online engagement. To set normalized immediacy in the right direction, the following reverse-scoring is applied:

$$ \text{Reverse-scored immediacy} = 1 - \text{Normalized } I_{I} $$
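A minimal sketch of this reverse-scoring step, using hypothetical normalized immediacy values (0 for the fastest first access, 1 for the slowest), is:

```python
import pandas as pd

normalized_immediacy = pd.Series([0.05, 0.60, 0.90],
                                 index=["L1", "L2", "L3"])
reverse_scored = 1 - normalized_immediacy  # higher now means more engaged
print(reverse_scored)  # L1 -> 0.95, consistent with a short immediacy
```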

Then, taking reference from the RFM model, it is possible to aggregate a learner’s I, F and D values into a single online engagement index (such that a higher index reflects a higher level of online engagement). The approach is to first reclassify each of the I, F and D values into bins (in this case, terciles). In this study, the values are split into three bins such that Bin 1 contains the lowest values and Bin 3 the highest. At the end of the reclassification, each learner belongs to three bins corresponding to their I, F and D values.

Next, each bin membership is quantified by its corresponding bin number. In other words, Bin 1 is given a value of 1 (reflecting a low level of online engagement as described by the variable), Bin 2 a value of 2 (reflecting a moderate level), and so on. Finally, the index is computed by summing the three bin numbers. Weights can be incorporated, if desired, to reflect the importance of the component metrics (i.e. Index = IFD score = WI × Bin(I) + WF × Bin(F) + WD × Bin(D), where WI, WF and WD are the weights of I, F and D, respectively).

As an illustration (see Fig. 3), suppose Student A is placed in Bin 3 for each of I, F and D. He obtains an index score of nine (Index = IFD score = 1 × 3 + 1 × 3 + 1 × 3 = 9, with WI = WF = WD = 1).

Fig. 3 Illustration of online engagement level based on IFD model
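A hedged sketch of the tercile binning and the (optionally weighted) IFD score follows; the input values are made up, and the frame is assumed to already hold reverse-scored immediacy alongside normalized frequency and duration:

```python
import pandas as pd

normalized = pd.DataFrame({
    "immediacy": [0.95, 0.40, 0.10],  # already reverse-scored
    "frequency": [1.00, 0.33, 0.00],
    "duration":  [1.00, 0.27, 0.00],
}, index=["L1", "L2", "L3"])

# Reclassify each metric into terciles: Bin 1 = lowest, Bin 3 = highest.
bins = normalized.apply(
    lambda col: pd.qcut(col, q=3, labels=[1, 2, 3]).astype(int))

# IFD score = weighted sum of the three bin numbers (equal weights here),
# so the index ranges from 3 (least engaged) to 9 (most engaged).
w = {"immediacy": 1, "frequency": 1, "duration": 1}
ifd_score = sum(w[c] * bins[c] for c in bins.columns)
print(bins.assign(IFD=ifd_score))
```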

Performing clustering to identify learners’ clusters

It is possible to carry out clustering to identify natural groupings of learners according to the behavioural characteristics described by their I, F and D values. A challenge of clustering is that the number of clusters is not known a priori. In this study, clustering was performed using a two-stage approach. First, the data are passed through the TwoStep clustering algorithm (available in IBM SPSS Modeler Version 18.0) to determine an optimal number of clusters, k. Then, k-means is applied to discover the groupings. This hard clustering algorithm assigns each instance (in this case, each learner) to exactly one cluster based on their IFD values, giving us homogeneous groupings.
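SPSS Modeler’s TwoStep algorithm is not available in common open-source stacks; as an approximate stand-in, the sketch below selects k by scanning candidate values with the average silhouette coefficient before running k-means (scikit-learn), on placeholder data standing in for the 418 learners’ IFD values:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
X = rng.random((418, 3))  # placeholder for the learners' (I, F, D) values

# Stage 1 (stand-in for TwoStep): pick k with the best silhouette score.
best_k, best_score = None, -1.0
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

# Stage 2: hard (non-fuzzy) k-means clustering with the chosen k.
model = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(X)
print(best_k, round(best_score, 2))
print(model.cluster_centers_)  # centroid (I, F, D) values per cluster
```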

Scoring IFD

To illustrate the operationalization (i.e. measurement) of online engagement, the focus was on one learning activity (a pre-class quiz) identified from the course. Table 2 presents samples of the original and normalized I, F and D values for four adult learners in tabular format (note that the normalized immediacy values have been reverse-scored).

Table 2 Example of learners’ I, F, and D values for the learning activity

As discussed earlier, en route to computing an aggregate index, each learner is “segmented” into three bins corresponding to their I, F and D values. It is then possible to interpret an individual learner’s online engagement level, relative to the entire class, as described by those values.

To illustrate its application (see Table 3), suppose that Learner ID130 has been grouped into Bin I = 3, Bin F = 3 and Bin D = 1. This indicates that this learner has demonstrated a higher level of engagement (i.e. a shorter immediacy and a higher frequency) compared with Learner ID241, and is accordingly associated with a higher index score.

Table 3 Example of results from IFD analysis

In addition, it is possible to further analyse the output and interpret online engagement patterns at a whole-of-class level (see Fig. 4). From it, we can see that close to 50% of the learners who exhibited high immediacy also demonstrated a high level of online activity. Conversely, close to 40% of those who exhibited low immediacy also demonstrated a low level of online activity.

Fig. 4 Visualization of engagement patterns (based on immediacy and frequency)

The application of RFM segmentation (re-adapted here as IFD) results in 27 possible groupings from a 3 × 3 × 3 matrix. While this has its benefits in our proposed framework, in that it becomes easier to interpret learners’ levels of online engagement in a relative manner, the groupings may still appear divergent, and it may become daunting to administer or tailor targeted actions to engage 27 distinct groups.

It is possible to perform clustering to group learners into a smaller number of segments. As discussed earlier, clustering is applied via a two-stage process; here, the TwoStep clustering algorithm recommends k = 5 (where k is the number of clusters).

To illustrate the utility of clustering and its output, refer to Table 4. The application of clustering gives us five natural groupings of learners, described by the centroid values of I, F, D and the IFD score. A centroid value is the average of the data points in a cluster. In the context of this application, the centroids of I, F and D range from 1 to 3, whereas the centroid of the IFD score ranges from 3 to 9. The group’s average IFD score is used to estimate its level of online engagement.

Table 4 Learner segments, centroid values and engagement level

It is possible to further analyse the output and interpret online engagement patterns. For example, the results show that learners in cluster C1 are more engaged than learners in the other clusters. C1 consists of learners who accessed the online materials more often; they demonstrated moderate urgency to engage with the online materials and spent a fair amount of time on them, much like an average learner in the class. In contrast, learners in cluster C5 are somewhat more dormant: they show a lack of urgency (i.e. they are unhurried) in learning and appear disinclined to exert effort to engage with the online materials. Thus, they are our learners with a low level of online engagement.

Discussion

The silhouette function is used to interpret and validate cluster validity (e.g. Battaglia et al. 2015; Jun and Lee 2010). An average silhouette coefficient of 0.6 was obtained, which empirically indicates a clear and distinct cluster structure. The study has also taken considerable steps to ensure the validity of the measurements: past works were consulted to support the derivation of the proxy measures of online engagement. Both the methodology and its results are expected to be replicable in other studies of the online engagement behaviour of adult learners.

The outputs from this analytics procedure are expected to advance analytical procedures for decision-making in the areas of teaching and learning, instructional design, and student support. For instance, online activities that have fallen short of the desired engagement level can be re-examined for their effectiveness at the whole-of-class level. Relatedly, online activities that have not helped students achieve their learning outcomes may require reinforcement of the instructional methods.

The measurement of an adult learner’s online engagement behaviour is an important precursor for deeper analysis of student learning at the individual level, from the perspective of adult learning and especially in the shoes of an adult learner who may have to juggle the demands of work, life and academic progression. Following this, it will be interesting to examine engagement levels alongside the antecedents and consequents of learning, that is, the degree of association between an adult learner’s engagement with the course and his course performance. It will also be interesting to uncover further unknown patterns or trends in learner engagement to aid educational practice and research at the university level.

Conclusion

The study presented findings from an online learning activity targeted at adult learners. Firstly, this study, while simple in its approach, explored possible data features as proxy measures of online engagement based on the LMS log data presented. Although the use of similar count-based and time-based measures has been frequently discussed in the existing literature, this study discussed the potential of using the immediacy variable as one of the component metrics for estimating online engagement. The study also re-adapts a well-proven approach to behavioural segmentation, the RFM model, and hopefully gives it fresh impetus in the learning analytics context for modelling learning engagement behaviour. Extending this approach, the use of clustering was also examined to discover distinct learner segments from the multidimensional data.

Secondly, good conceptualization and operationalization are expected to facilitate the productive use of the IFD approach and its metrics. Importantly, future research can examine and test the reliability, validity and utility of the proposed online engagement metrics. In line with this, future research can include the antecedents and consequents of online engagement, that is, study whether certain learner, course or other attributes are associated with certain patterns of online engagement. In addition, it will be useful to know whether particular patterns of online engagement, together with other potential determinants of academic performance, are associated with particular levels of academic performance. Such findings can aid teaching and learning and optimize their outcomes.

Thirdly, there are several limitations arising from the present study, affecting both the results and the directions for future research. Some of the limitations are embedded within the overarching limitation of how learner engagement is defined in an online learning environment. Whilst the primary focus of our study is on discovering useful and meaningful data features of online engagement from traces of learners’ online behavioural data, it is important to note that these measures are proxy indicators of online engagement and may not represent an all-encompassing measure of its extent. The study is further limited by the Canvas LMS data source that is accessible and available to us at present. Addressing the limitations highlighted above can provide future directions for the research. More generally, future research can derive new online engagement metrics to aid educational research and practice in the technology-enhanced learning environment.

Finally, it is the spirit of producing open, replicable and extensible research that drives this study. It is hoped that this study can provide the groundwork, from a practical perspective, to better model online engagement behaviour.