1 Introduction

With the digitalization of learning environments, large amounts of data about educational programs, courses, and learners have been collected over the long term. Learning Analytics (LA) emerged to explore this data, available at various levels of granularity, and to provide insight in educational contexts. LA aims to monitor learners' progress; predict their performance and dropout/retention rates; provide feedback and advice to learners; and facilitate the self-regulation of online learners (Chatti et al., 2012; Papamitsiou & Economides, 2014). However, analytics alone are not enough to improve learning processes (Wong et al., 2019). For example, analytics may show that a student interacts little with the system and is therefore likely to perform poorly, but human intervention is still required to improve the learner's interactions or academic performance. Implementing LA is a prerequisite to designing such interventions (Chatti et al., 2012; Clow, 2013; Omedes, 2018).

When starting LA, it is also necessary, given its broad scope, to draw the boundaries ("what purpose," "for whom," "what data," and "how to analyze") and to state objectives (Chatti et al., 2012). In this context, it is notable that, focusing on academic success, most researchers predict performance from LMS data (Conijn et al., 2017; Iglesias-Pradas et al., 2015; Mwalumbwe & Mtebe, 2017; Saqr et al., 2017; Strang, 2016; Zacharis, 2015), compare various techniques to increase predictive power (Cui et al., 2020; Hung et al., 2019; Miranda & Vegliante, 2019; You, 2016), or predict using individual characteristics together with LMS data (Ramirez-Arellano et al., 2019; Strang, 2017). However, there is still no consensus on how to design interventions that increase learning outcomes.

LA studies have shown that researchers should not use "learning analytics as a one-size-fits-all approach" (Gašević et al., 2016; Ifenthaler & Yau, 2020). The same analytics (e.g., predictive analytics) may yield markedly different results in different contexts. Ifenthaler and Yau (2020) found that the positive effects of LA on learning outcomes are mostly reported in small-scale studies; the data collected, the analyses performed, and the results obtained are therefore limited to their own contexts. For example, some LA research points to the difficulty of building generalizable models because course structures differ across online learning environments (Hung et al., 2019; Olive et al., 2019).

Another example is the way learning outcomes are handled. Learning outcomes are a relatively new paradigm for describing learner achievement at various levels of education (Macayan, 2017), referring to the knowledge, skills, and values students acquire by graduation or the end of a course (Premalatha, 2019). In other words, learning outcomes represent a more comprehensive experience than learners' assessment grades alone. Although this paradigm reflects an ideal, its adoption across countries has not fully achieved this goal. For example, in distance education programs in Turkey, process-based assessment, such as performance tasks, projects, homework, theses, and portfolios, together with unsupervised exam and assessment activities, may not count for more than 40% of overall success; until September 2020, this cap was 20% (Council of Higher Education, 2020). The proportional weight (60%-80%) thus given to the supervised exam (e.g., the final exam) significantly shapes how teachers structure lessons and how students approach learning activities. In this study, given the Turkish context, the learning outcome is therefore operationalized as final exam performance only.

Namoun and Alshanqiti (2021) systematically examined studies that predict learning outcomes. They found that the dominant factors in most of the studies were learning and activity behavior (e.g., time and number of online sessions), assessment data (e.g., assignment and exam grades), emotions (e.g., motivation), and previous academic performance (e.g., prior knowledge). Another study (Yau & Ifenthaler, 2020) grouped LA indicators of study success into three categories: student profile data (e.g., prior knowledge, motivation), learning profile data (e.g., LMS engagement data, assessment grade), and curriculum data (e.g., course characteristics, course structure). Both studies thus showed that prior knowledge, motivation, LMS engagement data, and assessment data have a predictive effect on learning outcomes.

Predictive analytics enable the design of meaningful, evidence-based interventions by collecting, storing, and combining data on the various factors mentioned above, thereby improving learning outcomes/study success (Ifenthaler & Yau, 2020). Expanding the profile of data collected and analyzed (e.g., with self-regulation and e-readiness) can therefore yield more evidence-based grounds for designing these interventions. This study tested multiple regression and classification models that combine self-regulation and online readiness variables with the usual data profiles (e.g., online activities, prior knowledge, motivation) to predict final performance.

In the context of an ICT course in Turkey, this study aims to determine indicators that affect students' final exam performance in an online learning environment by using predictive learning analytics. The research questions are as follows.

  1. Do prior knowledge, e-readiness, and self-regulation skills predict the final grade? If so, to what extent?

  2. Do LMS engagement analytics have a positive relationship with final grades and with other predictors?

  3. According to the classification model generated using learners' Moodle engagement and other predictors, what variables come to the fore in learners' final performance?

2 Related Literature

2.1 Predictive Analytics Using Data in Learning Management Systems

The log data produced in a Learning Management System (LMS) constitute the primary data source for LA research. Naturally, many studies have investigated predicting academic success from LMS data. In some studies (Mwalumbwe & Mtebe, 2017; Saqr et al., 2017; Zacharis, 2015), the classification power of LMS data for academic achievement is considerable, while in others (Conijn et al., 2017; Iglesias-Pradas et al., 2015; Strang, 2016) LMS data contributed only partially. For example, Saqr et al. (2017) found that engagement parameters showed significant positive correlations with student performance, especially those reflecting motivation and self-regulation; they classified performance with 63.5% accuracy and identified 53.9% of at-risk students. Another study (Mwalumbwe & Mtebe, 2017) found that peer interaction (beta = 19.6%) and forum posts (beta = 77.1%) significantly affected students' performance in Applied Biology, whereas in Service and Installation IIT, forum posts (beta = 48.5%) and exercises (beta = 51.5%) impacted students' performance.

In studies where LMS data contributed only partially, Conijn et al. (2017) showed that the accuracy of prediction models differed substantially between courses, explaining between 8 and 37% of the variance in the final grade; for early intervention or in-between assessment grades, the LMS data proved to be of little value. Another study (Strang, 2016) compared student test grades with engagement LA indicators to measure the strength and predictive nature of the hypothesized relationships and found very little correlation between students' online practices and their academic outcomes. Iglesias-Pradas et al. (2015) found no relation between online activity indicators and either teamwork or commitment acquisition. Therefore, LMS data alone may sometimes not provide sufficient information about final performance.

2.2 Using Different Techniques to Increase Predictive Performance

LA can use a combination of data analyses determined by the purpose, ranging from simple statistical methods to very complex techniques such as deep learning (Avella et al., 2016; Leitner et al., 2017). While advanced data analysis methods are widely used in computer science, analytical techniques such as statistics, data visualization, clustering, regression, and decision trees are used primarily to support decision-making with regard to learning (Du et al., 2019). Accordingly, many studies have compared different data conversion or classification techniques to increase the classification accuracy/precision or predictive power of LMS data for academic achievement (Cui et al., 2020; Helal et al., 2018; Hung et al., 2019; Miranda & Vegliante, 2019).

These studies (Cui et al., 2020; Helal et al., 2018; Hung et al., 2019; Miranda & Vegliante, 2019) showed that some classification techniques produce better results in some situations (e.g., after data conversion). For example, Cui et al. (2020) compared various machine learning classifiers (e.g., logistic regression, Naïve Bayes, neural network, ensemble model, gradient boosting machine) across three undergraduate courses and found the mean grade of quizzes/assignments to be one of the most essential features for all three courses, rather than time-related and frequency-related LMS data. Another study (Hung et al., 2019) investigated how absolute frequency variables versus relative-transformed variables affect prediction results; the classification algorithms (Neural Network, Random Forest) produced better results when using relative-transformed variables (e.g., frequency variables scaled from 0 to 10). However, each classification technique yields different results depending on the data collection type (self-report or event-based), even under otherwise identical conditions (Moreno-Marcos et al., 2020). It is therefore challenging to draw a precise roadmap for which analysis technique to use in LA under different conditions. In this study, the researchers took the practical approach of selecting, from multiple classification techniques, the one that produces the most precise results.

2.3 Prediction Using Individual Characteristics as Well as LMS Data

LA research has also focused on increasing the number of variables by combining LMS data with data collected from other sources (e.g., self-report data) to predict academic success more strongly (Ramirez-Arellano et al., 2019; Strang, 2017). Ramirez-Arellano et al. (2019) investigated the relations between students' motivation, cognitive–metacognitive strategies, behavior, and learning performance in blended higher education courses with 137 Mexican students. Only six (e.g., missing learning activities, self-efficacy, metacognitive self-regulation) of the 19 variables considered explained approximately 67% of the variance in students' overall grades, and a model with these six variables correctly classified 96% of students at risk of failing. Strang (2017) used a mixed-method approach to examine several student attributes and online activities that seemed to best predict higher grades, collecting qualitative data analyzed with text analytics to uncover patterns and testing Moodle engagement analytics indicators as predictors in the model. The findings revealed a significant General Linear Model with four online interaction predictors that captured 77.5% of grade variance in an undergraduate business course. In this context, using individual characteristics together with LMS data can yield better predictive results.

Because it is difficult to provide practical support to learners physically in online learning environments, learners need to self-regulate. Learners' self-regulation concerns individual factors that can vary in many ways (Wong et al., 2019). In online or blended environments, many variables that may affect academic success have been addressed (e.g., learning strategies, prior knowledge, self-regulated strategies, motivation) (Azevedo et al., 2010; De Barba et al., 2016; Pardo et al., 2016; Sun et al., 2018). Pardo et al. (2016) provided robust evidence of the advantages of combining self-reported and observed data sources to gain more precise insight into learning experiences, leading to more effective overall improvement. The current study addresses self-regulation skills and e-readiness as examples of these individual characteristics; both were measured through self-reported data and added as indicators in the predictive and classification models.

2.4 Self-Regulation and e-Readiness

Self-regulated learning research offers a means to understand, from various perspectives, why some students are more successful than others. For example, academic performance can decrease when learners do not apply self-regulated learning strategies (Pardo et al., 2016). Self-regulated learning (SRL) strategies are framed broadly as forethought, performance, and reflection (Lu & Yu, 2019), and in more detail as goal setting, strategic planning, self-evaluation, task strategies, elaboration, and help-seeking (Kizilcec et al., 2017; Papamitsiou & Economides, 2019). Kizilcec et al. (2017) examined the relations between SRL strategies, learner behavior, and goal achievement; their results showed that learners with high goal-setting and strategic planning scores are more likely to achieve their personal course goals, while the other SRL strategies, except help-seeking, were associated with the frequency of re-interacting with course material. Papamitsiou and Economides (2019) showed that goal-setting and time management have strong positive effects on autonomous control, effort regulation has a moderate positive effect on learner autonomy, and help-seeking can have a strong negative impact.

Online learning readiness appears to be an essential variable for instructional design processes, with high potential to influence learners' academic performance. Joosten and Cusatis (2020) stated that student success might be increased by evaluating learners' preparedness and readiness; for example, a student who has the skills (e.g., technical skills) to learn online, along with motivation and expectations for online learning, can succeed. The literature indicates that readiness is positively related to academic success and satisfaction (Horzum et al., 2015; Yilmaz, 2017). For instance, in a flipped learning study with 236 undergraduate students, Yilmaz (2017) found that e-learning readiness positively affected student satisfaction (β = 0.61; R2 = 0.43). Joosten and Cusatis (2020) found that online learning efficacy (the belief that online learning can be as effective as traditional classroom learning) significantly predicted learners' academic performance or course grades (β = 0.38, p < 0.0001).

3 Methods

The current study followed a learning analytics process to answer the research questions (see Fig. 1).

Fig. 1 Application of learning analytics in this study

3.1 Study Group

The study was conducted in an online computer literacy course (14 weeks in one semester) delivered to freshman students of all faculties and schools of a large state university. The course had 3765 registered users from 17 different faculties and 75 different departments; a total of 1209 students participated in the research. Of the participants, 382 (31.6%) are female and 827 (68.4%) are male; they are aged between 19 and 75 (mean = 20.9; median = 19); 1106 (91.5%) are unemployed and 103 (8.5%) are employed.

3.2 Teaching–Learning Process

Information and Communication Technologies 101 (ICT 101) is a beginner-level course delivered as a fully asynchronous online activity in Moodle. The course content was organized linearly: students were advised to study the relevant topic based on their pre-assessment scores and were required to reach 70% success or more before starting the next topic. In other words, students were expected to score at least 70% on their post-assessment tests to be labeled "successful" and thereby ready to start the next topic.

Based on these facts, learning goals were defined for each topic, varying in number and difficulty. The interactions to be investigated in the LMS were limited to reading course handouts, watching instructional videos, solving interactive questions, and completing achievement tests. The students discussed the topics (in the 8th and 12th weeks) before the midterm and the final exam (see Fig. 2).

Fig. 2 Teaching–learning process

3.3 Data Collection

Data were collected with the following tools: the Self-Regulation Survey, the e-Readiness Scale, Moodle Analytics, and Assessment Tests (Table 1). The scales used were developed in the participants' native language.

Table 1 Data collection process

3.3.1 Self-Regulation Survey

To reveal students' self-regulated learning skills (SRS), data were collected with the questionnaire "Self-regulated Learning Skills for Self-managed Courses" developed by Kocdar et al. (2018). The questionnaire contains 30 five-point Likert-type items. Kocdar et al. (2018) reported a total explained variance of 58.204% and a Cronbach's alpha of 0.918 for the scale. With the current study's data, the scale's total explained variance was 67.73% (Kaiser–Meyer–Olkin (KMO) = 0.968, p < 0.001), and its Cronbach's alpha was 0.96.
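
For readers who want to reproduce this kind of reliability check, the following is a minimal Python sketch of how Cronbach's alpha can be computed from a respondents × items score matrix. The data here are random placeholders, not the study's survey responses.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Placeholder data: 1209 respondents x 30 Likert items (1-5), NOT the study's responses.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(1209, 30)).astype(float)
print(f"alpha = {cronbach_alpha(responses):.3f}")
```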

3.3.2 e-Readiness Scale

"e-Readiness Scale" was originally developed by Gülbahar (2012). The KMO value of the scale was found to be 0.941, and the value of the Bartlett test was found to be significant (p < 0.001) by Gülbahar (2012). The researcher found the reliability of the scale to be 0.94 (Cronbach's alpha). For the current study, questions from two factors of the e-Readiness Scale, "Technical Skills (TS: α = 0.79)" and "Motivation and Attitude (MaA: α = 0.79)," which both consist of six questions, were taken into consideration. Thus, the scale version employed in the current study is composed of 5-point, Likert-type questions. In our study, the total variance of the Technical Skills scale (α = 0.94) was 71.66% (KMO = 0.993, p = 0.000) and the total variance of the “Motivation and Attitude (α = 0.88)” was 74.12% (KMO = 0.812, p = 0.000).

3.3.3 Moodle Analytics

Moodle Engagement Analytics refers to learners' system interactions in an online course. Eleven variables were determined regarding system interaction (actions for all components: creating, viewing, submitting). These variables were chosen according to the activities (Table 2) included in the Teaching–Learning Process (Sect. 3.2).

Table 2 The variables according to the activities

3.3.4 Assessment Tests

In this study, assessment tools were used to increase learning performance during the application process. These tools were organized as: (1) a multiple-choice pre-test measuring prior knowledge and (2) a multiple-choice final exam held face-to-face with paper and pencil. Since the final exam carries a high proportional weight (70%) in learners' assessment, the final grade was set as the dependent variable; all other variables served as independent variables to predict learners' final performance.

3.4 Data Analysis

Data analysis included regression, correlation, and classification analyses. Regression and correlation analyses were used to reveal linear relationships between the indicators and the final grade, using SPSS 20. Classification analysis was used to uncover nonlinear relationships, using Orange 3.24.1.

Before the classification analysis, dimension reduction was performed by applying Principal Component Analysis (PCA) to compute principal components from the system interaction variables. Accordingly, nine variables were grouped into three dimensions: "exam (E)", "content (C)", and "discussion (D)" (KMO = 0.743). The total explained variance was 72.42%. The remaining variables (total time spent, total action) were treated separately from exam, content, and discussion because they did not load on these dimensions (Table 3).

Table 3 Principal component analysis (rotated matrix)
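
This dimension-reduction step can be approximated outside SPSS as well. Below is a minimal scikit-learn sketch on a hypothetical frame of nine interaction variables; the column names and counts are illustrative, not the study's actual data, and the varimax-style rotation behind Table 3 is omitted because plain sklearn PCA does not rotate components.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical (1209 x 9) frame of system interaction counts; names are assumed.
rng = np.random.default_rng(1)
logs = pd.DataFrame(
    rng.poisson(20, size=(1209, 9)).astype(float),
    columns=["pretest_views", "posttest_views", "exam_attempts",
             "handout_views", "video_views", "interactive_views",
             "forum_views", "forum_posts", "forum_replies"],
)

pca = PCA(n_components=3)  # three dimensions: "exam", "content", "discussion"
components = pca.fit_transform(StandardScaler().fit_transform(logs))
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
print("loadings:\n", pd.DataFrame(pca.components_.T, index=logs.columns).round(2))
```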

The classification stages are shown in Fig. 3. In stage 1 (Select Column), the target variable was set to final performance, categorized as low performance (< 70) or high performance (≥ 70), since students were expected to reach 70 points for each subject (Fig. 2). The selected features were the interaction data (Exam, Content, Discussion, Total Action, and Total Time Spent), the assessment grade (Average PreTest), and individual characteristics (Technical Skills, Motivation and Attitude, and Self-Regulation). In stage 2 (data discretization), continuous variables were categorized using equal frequency, which divides an attribute into a certain number of intervals so that each interval contains approximately the same number of samples. In stage 3 (Feature Selection-Rank), all features were retained because we wanted to see the predictive power of all indicators in the classification model. In stage 4 (e.g., Tree, Naïve Bayes, SVM, kNN, Neural Network, CN2), the researchers took the practical approach of selecting the technique that produces the most precise results, evaluating the classification model by comparing the performance of various probability-based and rule-based algorithms. In stage 5 (test & score), five-fold cross-validation sampling was used for the internal validity of the model; cross-validation splits the data into a certain number of folds (usually 5 or 10) and tests the algorithm on one fold at a time. Stage 6 (Tree Viewer, Confusion Matrix) is presented in the results section.
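
Since Orange is built on Python, stages 2-5 can be approximated with scikit-learn. The sketch below uses placeholder data and omits CN2 (Orange-specific); feature shapes, bin counts, and hyperparameters are assumptions, not the study's settings.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1209, 9))            # placeholder features
y = (rng.random(1209) > 0.5).astype(int)  # 1 = final grade >= 70 (high performance)

models = {
    "Tree": DecisionTreeClassifier(max_depth=5),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "kNN": KNeighborsClassifier(),
    "Neural Network": MLPClassifier(max_iter=500),
}
for name, model in models.items():
    # Stage 2: equal-frequency discretization (quantile bins), then the classifier.
    pipe = make_pipeline(
        KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile"),
        model,
    )
    # Stage 5: five-fold cross-validated classification accuracy (CA).
    acc = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    print(f"{name:>14}: CA = {acc.mean():.3f}")
```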

Fig. 3 Classification analysis in Orange 3.24.1

4 Results

4.1 Effect of Prior Knowledge (AvePreTest), e-Readiness (TS and MaA), and Self-Regulation Skills (SRS) on Final Grade (FG)

The effect of the predictors on the final grade was investigated through the forward regression method. The P–P plot, Durbin–Watson statistic, and residual statistics (Mahalanobis distance, Cook's distance, and centered leverage value) were examined for the assumptions. The standardized residuals were observed to be normally distributed (see Fig. 4).

Fig. 4 Distribution of residuals

For the multicollinearity assumption, it is considered sufficient for the correlation coefficients (r) to be below 0.800 and the VIF values to be 2.5 or below (Allison, 1999; Berry & Feldman, 1985). In this study, the correlation coefficients between the independent variables (AvePreTest, TS, MaA, SRS) were below 0.600, and the VIF values were 1.092 (Table 4).

Table 4 Pearson coefficients between variables
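
As an illustration of this multicollinearity check, VIF values can be computed with statsmodels as below; the predictor frame is synthetic, with column names standing in for the study's four independent variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic stand-in for the four predictors; column names are assumed.
X = pd.DataFrame(
    np.random.default_rng(3).normal(size=(1209, 4)),
    columns=["AvePreTest", "TS", "MaA", "SRS"],
)
Xc = sm.add_constant(X)  # include an intercept, as in the regression model
vifs = {col: round(variance_inflation_factor(Xc.values, i), 3)
        for i, col in enumerate(Xc.columns) if col != "const"}
print(vifs)  # values at or below 2.5 satisfy the cutoff used in this study
```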

Mahalanobis, Cook's, and centered leverage values were examined for outlier control, and outliers were deleted until all values reached the desired levels. After this process, it was ensured that (1) the maximum Mahalanobis distance did not exceed the chi-square table value of 16.27, (2) the maximum Cook's distance was lower than 4/(sample size − number of predictors − 1), and (3) the centered leverage value was less than (2 × number of predictors + 2)/sample size (Hair et al., 2010) (Table 5).

Table 5 Distance values
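
A rough sketch of how these three diagnostics and their cutoffs can be computed, using synthetic data (n and the number of predictors are illustrative; for reference, scipy's chi-square critical value at p = .001 with 3 degrees of freedom is approximately the 16.27 cited above):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(4)
n, k = 1209, 3                        # sample size and number of predictors (illustrative)
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
infl = res.get_influence()

cooks = infl.cooks_distance[0]        # Cook's distance per observation
leverage = infl.hat_matrix_diag       # hat values (centered leverage = hat - 1/n)
diff = X - X.mean(axis=0)             # squared Mahalanobis distance from the centroid
mahal = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(np.cov(X, rowvar=False)), diff)

print("Mahalanobis cutoff:", chi2.ppf(0.999, df=k))  # ~16.27 for df = 3
print("Cook cutoff:       ", 4 / (n - k - 1))
print("Leverage cutoff:   ", (2 * k + 2) / n)
print("outliers flagged:  ", int(np.sum(mahal > chi2.ppf(0.999, df=k))))
```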

The forward regression method excluded MaA (β = 0.033; t = 0.946; p = 0.343) and SRS (β = −0.001; t = −0.023; p = 0.982) from the model, since these variables had no significant effect on the final score. Both the regression model using only AvePreTest as the independent variable (F = 218.001; p < 0.001) and the model using AvePreTest and TS together (F = 113.465; p < 0.001) were significant. According to the two models, AvePreTest alone explained 18.1% (Adj. R2 = 0.181) of the final grade, and 18.7% together with TS (Adj. R2 = 0.187) (Table 6).

Table 6 Model summary and excluded variables
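
SPSS's forward method enters, at each step, the candidate predictor with the smallest entry p-value and stops when no candidate passes the entry criterion. A simplified statsmodels sketch on synthetic data (variable names, coefficients, and the 0.05 entry criterion are illustrative assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
df = pd.DataFrame(rng.normal(size=(1209, 4)),
                  columns=["AvePreTest", "TS", "MaA", "SRS"])
y = 0.43 * df["AvePreTest"] + 0.08 * df["TS"] + rng.normal(size=1209)

selected, remaining = [], list(df.columns)
while remaining:
    # p-value each candidate would have if entered into the model next
    pvals = {c: sm.OLS(y, sm.add_constant(df[selected + [c]])).fit().pvalues[c]
             for c in remaining}
    best = min(pvals, key=pvals.get)
    if pvals[best] >= 0.05:          # stop when no candidate meets the entry criterion
        break
    selected.append(best)
    remaining.remove(best)
print("entered:", selected)
```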

Prior knowledge alone positively affected the final grade (β = 0.427; p < 0.001). In the model including both prior knowledge and technical skills, prior knowledge had a positive effect of 0.403 and technical skills of 0.082 on FG (βAvePreTest = 0.403; βTS = 0.082; p < 0.01). The significant, positive effect of prior knowledge on FG can be considered ordinary. However, although technical skills have a significant impact, the minimal effect size suggests that there may not be a linear relationship between TS and FG (Table 7).

Table 7 Coefficients

4.2 Relationship of Moodle Engagement Analytics, Other Predictors and Final Grade

Correlation analysis showed a positive, low-level significant relationship between AvePreTest (rTAction = 0.270; rTSpent = 0.283; rE = 0.228; rD = 0.156; p < 0.01) or FG (rTAction = 0.248; rTSpent = 0.194; rE = 0.218; rD = 0.098; p < 0.01) and all analytics except C (rC = 0.053 and rC = 0.021; p > 0.05). No significant relationship was found between TS and the analytics other than C (rC = −0.080; p < 0.05). The relationships of MaA with Total Action, Time Spent, and E were significant and positive (rTAction = 0.118; rTSpent = 0.104; rE = 0.093; p < 0.01). Significant relationships were found between SRS and the analytics other than E (rTAction = 0.135; rTSpent = 0.153; rC = 0.136; rD = 0.074; p < 0.05) (Table 8).

Table 8 Correlation of Moodle engagement analytics with other predictors and final grade

Table 8 shows the small but significant relationships observed between the final grade, Moodle analytics, and the other predictors. Moreover, although some variables are significantly related at the 0.01 or 0.05 level, the strength of these relationships is very low (rFG-D = 0.098; rTS-C = −0.080; rMaA-E = 0.093; p < 0.05).

4.3 Classification of Final Performance

The classification used Moodle engagement analytics (total action, time spent, exam, content, discussion), learner characteristics (TS, MaA, and SRS), and prior knowledge (AvePreTest). The performance of the classification is presented in Table 9.

Table 9 Performance of classification

Table 9 shows that the decision tree (CA = 0.644, Pre = 0.645) and Naive Bayes (CA = 0.633, Pre = 0.633) algorithms had the highest accuracy and precision rates. When the confusion matrix (Table 10) was examined, the decision tree correctly classified 67.8% of "learners with high performance (LwHP)" and 60% of "learners with low performance (LwLP)," while Naive Bayes correctly classified 67.3% of LwHP and 58.2% of LwLP.

Table 10 Classification rates of tree and Naive Bayes by actual status
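
The per-class rates reported in Table 10 follow directly from a confusion matrix. A tiny, self-contained illustration with made-up labels (not the study's predictions):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for the two classes (0 = LwLP, 1 = LwHP).
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_true, y_pred)       # rows: actual, columns: predicted
per_class = cm.diagonal() / cm.sum(axis=1)  # share of each class correctly classified
print(cm)
print("LwLP correct: {:.1%}, LwHP correct: {:.1%}".format(*per_class))
```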

In Figs. 5, 6, and 7, the decision tree rules for the LwLP cases are presented. Of the students who had low prior knowledge (pre-test average < 45.7), low total system interactions (< 687.5), and technical skills that were not high (TSmax = 40; TS < 33.5), 57.5% were LwLP. When technical skills were greater than 33.5, the probability of LwLP increased to 70.5% (4th depth in Fig. 5). Of the students who had low prior knowledge (pre-test average < 45.7), technical skills that were not high (TSmax = 40; TS < 33.5), and low total system interactions (< 486.5), 62.3% were LwLP; when the total system interactions were greater than 486.5, the probability of LwLP decreased to 52% (5th depth in Fig. 5). Based on these findings, first, even if students have very high technical skills, the probability of being LwLP is relatively high if their prior knowledge and total system interactions are low. Second, when the system interaction of students with low prior knowledge and technical skills increases slightly, their likelihood of LwHP may increase.
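
How such if/then rules are read off a fitted decision tree can be illustrated with scikit-learn's export_text; the data and feature names below are placeholders, not the study's model.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder data standing in for the study's indicators.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=4).fit(X, y)

# Human-readable if/then rules like those read off Figs. 5-10.
print(export_text(tree, feature_names=["AvePreTest", "TotalAction", "TS", "MaA"]))
```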

Fig. 5 Low prior knowledge, total system interaction, and technical skills

Fig. 6 "Low prior knowledge and total system interactions" and "high technical skills and high participation in the discussion"

Fig. 7 "High total system interaction and low prior knowledge" and "motivation and attitude"

Of the students who had low prior knowledge (pre-test < 45.7), low total system interactions (< 687.5), high technical skills (> 33.5), and discussion interactions (≥ −0.28), 62.7% were LwLP, whereas the proportion was higher (76.8%) for those with lower discussion interactions (< −0.28) (5th depth in Fig. 6). Accordingly, if learners who have low prior knowledge and system interactions but high technical skills participate in the discussion, their probability of LwHP may increase.

Of the students who had dramatically low prior knowledge (pre-test average < 35.7) but high system interaction levels (> 687.5), 59.6% were LwLP. In this group, the probability of being LwLP was higher for students with very low or high "motivation and attitude" (71.4%) than for those with low or very high "motivation and attitude" (48.3%) (5th depth in Fig. 7). In this context, a high level of system interaction reduces the probability of being LwLP for learners with low prior knowledge. In terms of motivation and attitude, it is difficult to state that "motivation and attitude" increases final performance linearly.

The decision tree rules for the LwHP cases are presented in Figs. 8, 9, and 10. Students who had high prior knowledge (pre-test average > 45.7), low or very high "motivation and attitude," and high technical skills were mostly successful (81.9%). In this group, the probability of being LwHP was very high (71.7%) even when their interactions with the exam were low (exam between −0.547 and −0.192) (Fig. 8). As in the LwLP cases, there were some inconsistencies in terms of both exam interactions and "motivation and attitude." For example, those with low or very high "motivation and attitude" (78.1%) were more likely to be successful than those with very low or high "motivation and attitude" (57.8%), whereas one would typically expect those with high or very high "motivation and attitude" to be LwHP. In this context, motivation or exam interactions, categorically divided into four levels, may not linearly increase the probability of high final performance.

Fig. 8 High prior knowledge, "low or very high motivation," and high technical skill

Fig. 9 High prior knowledge, "low or very high motivation," high total system interaction and technical skill

Fig. 10 High prior knowledge and system interaction

When the pre-test was high, "motivation and attitude" was very low or high, and total interaction was high, 57.8% of students were LwHP. However, 75.4% of the students in this group were LwHP when their technical skills were high, and 60.6% were successful when technical skills were low (Fig. 9). In this context, technical skills may have a positive effect on final performance in some cases (high prior knowledge and very low or high motivation and attitude) and a negative impact in others (low prior knowledge and system interaction).

51.9% of the students with low pre-test scores and high system interaction were LwHP. Where pre-test scores were very low (< 35.7), 40.4% of the students in this group were LwHP; when prior knowledge increased from < 35.7 to 35.5–45.7, the proportion rose to 62%. Therefore, for students with high total system interaction, the higher the level of prior knowledge, the higher the probability of being LwHP.

5 Discussion

Each student brings their own knowledge and experience to the learning process. Throughout the process, students differ in prior knowledge, motivation, self-regulation, and the ways they interact with content, instructors, and peers. Based on quantitative measures for understanding learners, the current study revealed the following key findings.

The regression results showed that prior knowledge has a significant positive impact on final performance. Yau and Ifenthaler's (2020) systematic review of the previous literature stated that, when learning outcomes are predicted especially for educational institutions (e.g., universities), study history is still the most fundamental indicator. Study history includes data such as previous academic achievement, learning progress assessed with various methods, and prior knowledge tests. For example, Shulruf et al. (2018) found that prior academic achievement had the greatest predictive value, with medium to substantial effect sizes (0.44–1.22), for binary outcomes (completing or not completing the course; passing or failing the examination) in five undergraduate medical schools in Australia and New Zealand. As expected, prior knowledge is the variable with the strongest effect on final performance in the current study.

Technical skills (an e-readiness dimension) include using technologies such as computers, the Internet, and social networks for information search or communication. Liu (2019) confirmed that, for the orientation of students taking online courses, their technical skills should be considered part of instructional design, alongside the social competence, working strategy, and communication dimensions. Learners with high levels of technical skills can therefore adapt more easily to the online environment and are more likely to benefit from the opportunities the system offers; indirectly, their final performance may be expected to be positively affected. The current study found that although technical skills significantly affect the final grade, the effect size is minimal. This finding suggests that there may not be a linear relationship between technical skills and final grades.

In MOOCs, since the effects of motivation on participation (e.g., De Barba et al., 2016) and of self-regulation on goal attainment (Kizilcec et al., 2017) and autonomous control (Papamitsiou & Economides, 2019) are known, it is considered essential to design interventions that increase motivation (e.g., Aguilar et al., 2021; Herodotou et al., 2020) or support self-regulated learning (e.g., Aguilar et al., 2021; Jivet et al., 2020). In the current study, however, it is difficult to state that "motivation and attitude" (an e-readiness dimension) or self-regulation skills increase or decrease success linearly. In the multiple regression analyses, "motivation and attitude" and self-regulation skills had no role in predicting final performance. Moreover, in the classification analysis, both variables were found unlikely to play a role in classifying learners with high performance (≥ 70). These findings may be explained by course-related characteristics (e.g., position relative to other courses, course structure), following the classification of learning analytics indicators paired with three data profiles (Yau & Ifenthaler, 2020).

The context of the ICT course is not one where learners choose a course appealing to individual interest or supporting professional development, as in MOOC settings. The course (ICT101) is planned only for asynchronous delivery within the first year of undergraduate university programs, and learners had face-to-face learning experiences in their other courses. Therefore, no matter how motivated the students may have been, they may have faced particular challenges adapting to the delivery style or medium, or did not care much for the course. Accordingly, they may have experienced lower levels of motivation as the course weeks progressed.

When a sufficient variety of learning resources and rich learning experiences are offered, students can prefer different media and learning paths. In the current study, some students may have chosen handouts while others preferred videos or interactive activities in the ICT course. Contrary to the current study's findings, learners with higher self-regulation skills could therefore be expected to be more successful in a course offering this level of diversity. However, this research does not show that self-regulation skills are unimportant for final performance. Viberg et al. (2020) draw attention to the small number of studies (20%) showing evidence of improvement in learning outcomes for LA (including interventions to support SRL); they also note that the evidence in these studies has been slight so far and that there is a generalization problem. Supporting self-regulation skills may therefore not guarantee high learning outcomes in the current study's context (e.g., the ICT course structure and assessment structure). For example, the linearly designed, week-by-week content in ICT101 may contribute positively to learners with low self-regulation skills, so the effect of self-regulation skills may not have been reflected in final performance.

Moreover, the design of online courses should provide a rich learning experience through both synchronous and asynchronous activities rather than replicating classroom-biased pedagogies (e.g., purely content-transfer oriented). These learning experiences can take various forms, such as acquiring, researching, applying, producing, discussing, and collaborating (Laurillard et al., 2013); the interaction of learners with assessment activities is also a learning experience (Holmes et al., 2019), and the activity a student undertakes differs for each learning experience. In this study, the learning experiences in the ICT course were designed as assimilation activities (watching interactive videos, reading notes) and assessment activities (a pre-test and post-test for each topic). Learners were expected to reach the required minimum of 70% success for each subject by repeating the assimilation activities and using their experience in the post-test (which could be repeated an unlimited number of times). This design does not offer flexibility in the learner's use of self-regulation skills; everything is laid out in advance, and students with high or low self-regulation skills are forced into the same experience. The role of motivation or self-regulation skills could be felt more strongly if the learner were offered more autonomy in a lesson designed around activities such as producing, discussing, and collaborating.

Although the relationship is significant but low-level, the correlation results show that the role of system interaction in final performance is worth exploring further. One would ordinarily expect stronger relationships between final performance and interactions with content, exams, or discussion; however, there was only a low-level positive significant relationship between students' final grades and their exam and discussion interactions in the online lessons. In the literature, results both parallel to (e.g., Schumacher & Ifenthaler, 2021; Strang, 2016) and opposing (e.g., Saqr et al., 2017) the current study's results can be found. For example, Schumacher and Ifenthaler (2021) investigated whether trace data can inform learning performance and found that "only participants' number of views of the handout was a significant predictor of their learning performance in the transfer test," stating that "trace data did not, as expected, provide explanation for learning performance" (Schumacher & Ifenthaler, 2021, pp. 10–11). However, another study found that Moodle engagement analytics showed significant positive correlations with student performance, especially for parameters reflecting motivation and self-regulation (Saqr et al., 2017). In line with the current study, this difference may be considered in terms of the impact of learning design (Er et al., 2019; Holmes et al., 2019) on both performance and learner behavior in the online learning environment.

While traditional instructional design focuses on content transfer, learning design (or the new interpretation of instructional design based on constructivist theories) focuses on the activities in the learning process (Holmes et al., 2019; van Merriënboer & Kirschner, 2017). Learning design broadly refers to designing sequences of learning activities (such as reading texts, analyzing data, practicing exercises, producing videos, participating in discussion forums, or collaborating in group projects) in line with the activity's aims, outcomes, teaching methods, assessment, learning approach, duration, and necessary resources (Holmes et al., 2019). Although the impact of a particular learning design on course success has not been observed, its effect on learner behavior has been demonstrated (Holmes et al., 2019). Therefore, the content-transfer-oriented, linear course design applied in the current study may have caused learners' system interactions to appear similar.

The regression and correlation analyses conducted for RQ1 and RQ2 showed the importance of e-readiness (in terms of technical skills) and prior knowledge for explaining final performance, while "motivation and attitude" and self-regulation skills were not significant in the context of the ICT course. However, these analyses only test for linear relationships between the variables and final performance. In this context, testing classification models that predict final performance probabilistically may reveal the importance of some of these variables for final performance.

In the classification model generated using learners' prior knowledge, Moodle LMS engagement analytics, and learner characteristic variables, 67.8% of learners were correctly classified at best. Strang (2017) showed a strong effect of Moodle engagement analytics on assessment grades; for example, a General Linear Model with four online interaction predictors captured 77.5% of grade variance in an undergraduate business course. On the contrary, Conijn et al. (2017) revealed that, for early intervention or when in-between assessment grades were taken into account, the LMS data proved to be of little value. The current study's results parallel Conijn et al. (2017): the effect of Moodle engagement analytics (partially excepting discussion interactions) on classifying performance was not observed.

The partial effect of discussion interaction was observed for learners with low prior knowledge and system interactions but high technical skills. For example, when these learners participated more in the discussion, the probability of low performance (< 70) decreased (from 76.8% to 62.7%). On the other hand, when high technical skills were combined with high prior knowledge, most learners achieved high performance (81.9%); students with high technical skills may not need much effort to succeed, comparing themselves with the class in the general discussions. If learners with low prior knowledge have a high level of total system interactions, their probability of high performance increases from 40.4% to 60.2%. Taken together, these findings suggest that computer-literate students score higher on their pre-tests and pass the course with less effort compared with newcomers to the topic. Thus, consistent with previous research (Duffy & Azevedo, 2015; Moos & Azevedo, 2008), prior knowledge can be said to increase the probability of success. Nevertheless, system interactions alone may not be sufficient for high performance.

6 Conclusions, Limitations, and Future Research

This study, in the context of an ICT course in Turkey, examined indicators that affect final performance in an online learning environment by investigating learners' characteristics (e.g., technical skills, "motivation and attitude," self-regulation), assessment scores (e.g., pre-test scores), and LMS usage behaviors (Moodle engagement analytics). Final performance appears to be related to prior knowledge, system interactions, and technical skills, but not to "motivation and attitude" or self-regulation skills; the effect of the former variables on final performance should be viewed as a typical situation. The contribution of this research is to show that leading reference indicators in learning analytics research (e.g., self-regulation and motivation) may not impact final performance. However, the results should not be read as meaning that self-regulation and motivation are unimportant in online learning; rather, they should be interpreted in light of the characteristics of the course and differences in learning design.

Learning analytics for online learning is primarily grounded in motivation and self-regulated learning (Wong et al., 2019). Self-regulated learning, in which students control, monitor, and influence their own thinking and learning processes, requires knowledge and skills (Kocdar et al., 2018). Learning analytics researchers often build on this approach in intervention design, provided that the responsibility for learning is not left entirely to the student. Nevertheless, recent research has found little evidence that LA improves learning outcomes (Schumacher & Ifenthaler, 2021; Viberg et al., 2020). For example, Viberg et al. (2020) found evidence that implementing LA to support students' SRL improves their learning outcomes in only 20% of the systematically reviewed papers (n = 11). The researchers suggested that LA research should focus more on measuring the different parts of SRL rather than on supporting it to improve learner performance, one of LA's key goals for online learning. Schumacher and Ifenthaler (2021) found that prompts based on self-regulation (e.g., cognitive and motivational prompts) may have had only a limited impact on declarative knowledge and knowledge transfer, stating that "prompts might not have been efficient, as they were not related to students' characteristics or behavior, resulting in inappropriate support" (Schumacher & Ifenthaler, 2021, p. 11). The current study supports this recent research (Schumacher & Ifenthaler, 2021; Viberg et al., 2020) and, unlike these studies, discusses interpreting learning analytics together with learning design (Lockyer & Dawson, 2011; Macfadyen et al., 2020; Mangaroska & Giannakos, 2019). The question is not so much which indicator is necessary as when a given indicator should be given importance. Instructors and researchers should therefore focus more on learning design when deciding on LA usage, interventions, and which indicators to use.

This research was a small-scale, techno-centric, exploratory study designed to deepen understanding of the online learning process in an ICT course in Turkey. Based on its insights, actions can be planned to increase system interactions for learners with low prior knowledge within the current course design (linear structure, asynchronous only, video-based content transfer, and pretest–posttest assessment). For example, such students can be provided with extra materials to fill their knowledge and skills gaps, those with low system interaction can be alerted weekly via email, and dynamic visualization tools that let learners compare themselves with the system interactions of high performers can be integrated into the system. However, it bears repeating that these findings cannot be generalized to different course designs (e.g., discussion-intense or productive-activity-intense designs; Holmes et al., 2019). By researching course designs that offer learners more flexibility, the effects of other dimensions of e-readiness (e.g., motivation) or self-regulation on final performance may be investigated again. Moreover, student opinions may help identify further variables that affect performance, and the resulting variables may be added to the regression and classification analyses.

This study has limitations beyond generalizability. For example, in the analyses classifying final performance, various methods (such as using indicators as categorical or continuous variables, or varying the number of indicator categories) were tried to achieve the best estimation results. It is known from the literature that different techniques show different classification performances (Cui et al., 2020; Hung et al., 2019). This study is therefore limited to the data transformation steps and classification algorithms it used. When different methods and techniques are applied, the findings related to variables whose significance could not be consistently shown here (e.g., motivation and attitude) may differ.