1 Introduction

The authors of 11 relevant studies of online courses reported some benefits, but also cited many problems, when assessing student learning through combinations of learning analytics, learning management system (LMS) activity logs, and graded performance results (Agudo-Peregrina et al., 2014; Fidalgo-Blanco et al., 2015; Gomez-Aguilar et al., 2015; Iglesias-Pradas et al., 2015; Nieto-Acevedo et al., 2015; Reyes, 2015; Ruiparez-Valiente et al., 2015; Scheffel et al., 2014; Xing et al., 2015; Yahya et al., 2015). These studies are examined in the literature review.

Two groups of researchers conducting studies similar to this one (Agudo-Peregrina et al., 2014; Gomez-Aguilar et al., 2015) found significant correlations between student online activity reported in learning analytics and academic performance. Another research team with a study similar to this one (Iglesias-Pradas et al., 2015) determined that there was no relationship between learning analytics, LMS activity data, and student learning outcomes.

Therefore, the focus of the current study is to measure the link between student learning and online activity using a large sample from a business course. Gunn (2014), Xing et al. (2015), and Chatti et al. (2012) asserted that better research design practices are needed to study learning analytics from a scholarly perspective, so this gap in the literature also needs to be addressed. Additionally, the purpose of the current study is to build on the existing literature by showing how to triangulate data from an online course to test the relationship between student activity and graded outcomes. Quantitative and qualitative data were collected to accomplish this.

2 Literature review

Given the scientific and predictive nature of learning analytics discussed by Shum (2012), at least one of the three strategic levels of analysis ought to be aligned with research design principles when practitioners undertake scholarly academic studies. Relevant quality indicators developed by Scheffel et al. (2014) should also be incorporated.

Researchers should design learning analytics studies by first identifying their overall ideology (positivist, pragmatic, interpretive, or constructivist), and then describing the strategy, which is developed from the level of analysis, the unit of analysis, and a within- or between-group focus, while bearing in mind the generalization goal of the research questions and hypotheses (Strang, 2015). The unit of analysis and hypotheses should include one or more of the quality-indicator variables from the taxonomy developed by Scheffel et al. (2014). Given that Shum (2012) and others (e.g., Xing et al., 2015) pointed out that learning analytics generally carries a predictive mandate, within-group correlation or between-group mean-comparison techniques will most likely be needed in positivist and pragmatic research designs (Strang, 2015). Other variations are possible; for example, the creator of Moodle, from Curtin University in Perth, WA, Australia, uses the constructivist ideology (Dougiamas & Taylor, 2003). Researchers need to be clear in their designs so that they can collaborate with colleagues, share their studies, and make their findings understandable, in order to grow this field within the higher education community of practice.

An exemplary but compatible alternative viewpoint for conducting learning analytics research was described by Chatti et al. (2012). They developed a learning analytics reference model intended for researchers. The model is predicated on the researcher answering four questions to design the study: what (data and environments), who (stakeholders), why (objectives), and how (methods). In contrast to the literature above, which predominantly focused on correlating with or predicting student performance, they asserted that learning analytics usually employs “techniques to detect interesting patterns hidden in educational data sets” (Chatti et al., 2012, p. 10). The most valuable aspect of their study was the literature review, followed by applied examples of research methods in learning analytics. They explained the four distinct techniques that have received the most attention in the scholarly learning analytics literature: statistics, information visualization, data mining, and social network analysis (Chatti et al., 2012). In big data analytics these four techniques are integrated along with data warehousing, owing to the significant volume, high velocity, value importance and variable complexity of the information collected (Sun et al., 2014). Ethics and respect for data privacy are issues that have arisen in big data analytics studies, and more recently Beattie et al. (2014) reminded researchers that these principles also apply to learning analytics.

Common techniques applied in learning analytics studies, such as regression, have constraints associated with the method, particularly assumptions about underlying distributions or a priori models that are often unmet and therefore produce unreliable or invalid estimates of student performance (Xing et al., 2015). The use of parametric statistical techniques requires rigorous designs that ensure the prerequisites of the data are satisfied, including distribution, population-sample homogeneity, sample group size, data type, and other inferential thresholds such as collinearity and variance tolerance (Strang, 2015). Learning analytics software generally involves nonparametric, distribution-free, nonlinear techniques used in big data analytics (Chatti et al., 2012, p. 10; Strang & Sun, 2015; Sun et al., 2014; Xing et al., 2015), including cluster analysis, neural network analysis with Bayes probability theory, nonlinear mathematical programming, correspondence analysis and genetic nonlinear programming (Nersesian & Strang, 2013; Strang, 2012; Vajjhala et al., 2015; Xing et al., 2015). The strategy for this study is to accept learning analytics as a ‘black box’ big data summarization tool by using its output as input to the unit of analysis during hypothesis testing. In other words, the learning analytics summary output becomes the input for testing whether student online activity in Moodle is related to, or can predict, performance against the course learning objectives.
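To make the ‘black box’ strategy concrete, the following minimal sketch treats a summarized analytics export simply as input data for a hypothesis test, without modeling how the analytics tool computed it. The variable names mirror those defined later in the paper, but the data frame and values here are purely illustrative assumptions, not the study data.

```python
# Minimal sketch (not the authors' code): the analytics summary output is
# accepted as-is and used only as input for hypothesis testing.
import pandas as pd
from scipy import stats

# Hypothetical analytics summary export, one row per student.
analytics = pd.DataFrame({
    "EngageC": [80, 95, 60, 100, 70],   # course-login engagement (%)
    "Grade":   [88, 97, 75, 99, 82],    # final course grade (%)
})

# Within-group correlation: is summarized online activity related to grade?
r, p = stats.pearsonr(analytics["EngageC"], analytics["Grade"])
print(f"Pearson r = {r:.3f}, p = {p:.4f}")
```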

The empirical evidence for predicting online student learning performance from learning analytics data is weak and needs further investigation. In a study similar to the current one, Agudo-Peregrina et al. (2014) measured the relationship between student activity and performance (grade). Their sample was drawn from several online courses (N = 138) and three LMS-assisted face-to-face (F2F) courses (N = 218) at a university in Spain, within a masters program focused on information technology and lifelong learning subjects. They used Pearson and Spearman correlation and backward regression to estimate the direction, strength and predictability of the hypothesized relationships. The learning activity consisted of factors including student-student, student-content (materials), student-teacher, and student-system (e.g., tests, assignments) interactions, as well as other factors beyond the scope of this review, which were extracted from Moodle log files rather than from engagement analytics as in the current study. Student activity followed a similar pattern in the online and F2F courses, except that student-system interactions were fewer in the F2F modality because “all tests and assignments” were offline (Agudo-Peregrina et al., 2014, p. 546). While they did find statistically significant positive correlations between all of the online student activity indicators and grade, this was not observed for the F2F modality. In their regression on grade, only two factors in the online courses, student-student with a standardized B = 0.209 (T = 2.94, p = .004) and student-teacher with a standardized B = 0.508 (T = 7.14, p < .001), were statistically significant but weak predictors of student learning performance. The high beta for student-teacher, at more than twice that of the student-student interactions, was problematic because it could indicate that customized tutoring was instead influencing the students’ grades. The researchers were cautious with their findings and recommended more replications.

In a follow-up study of Spanish masters-level students, Iglesias-Pradas et al. (2015) collected Moodle activity log data and surveyed students (N = 39) to test the relationship of several new factors in an online teaching-and-learning-with-technology course. They used their own procedure to collect student-student, student-content, student-teacher and student-system interactions, based on their previous work (Agudo-Peregrina et al., 2014). They added a new, a priori validated instrument to capture student self-reports of perceived teamwork and commitment competencies, both of which became the dependent variables in their analysis. Unfortunately, their findings were “counter intuitive, showing no relation between interactions in the LMS and the level of competency acquisition” (Iglesias-Pradas et al., 2015, p. 88). They reaffirmed that their findings resembled other studies in that learning analytics data on student activity in online courses had little correlation with learning performance and generally no predictive capability. They did find marginal relationships between the dependent variables teamwork and commitment. One insight they mentioned was that the levels of student activity seemed to interact; thus, a multivariate analysis or structural equation model may have been able to illuminate more of these hidden relationships.

Zacharis (2015) used regression to test the predictive capability of 29 learning analytics factors on learning performance with a sample of 134 engineering students in a blended (partly online) computer programming course. His study is comparable to the current one since all of the relevant student activity and grading took place online in Moodle. On the other hand, there were numerous forums, activities, and quizzes in his course, whereas the current study used a streamlined approach with fewer student interaction points. Nonetheless, he found that 14 of the 29 factors had significant positive correlations with grade. He used stepwise regression, which produced a statistically significant model capturing 51 % of adjusted variance on grade (R = 0.721, r² = 0.520, adjusted r² = 0.505, SE = 1.2036, p < 0.01). This model included four predictors: RePo messages (reading and posting messages in forums), CCC (content creation contribution), Quiz efforts (interacting with quizzes), and Files viewed (lesson materials viewed online). He performed a binary logistic analysis to measure whether these four factors could correctly discriminate at-risk students in this course. His logistic regression model “correctly classified 30 students who failed [68.8 %] and also correctly classified 79 students who did not fail [86.8 %]” (Zacharis, 2015, p. 50). Nevertheless, his sample was drawn from computer science students participating in a Java programming course, so it may not generalize to other populations, and correctly classifying only 69 % of students at risk of failing could be considered too low an academic benchmark.

In a study indirectly related to the current one, Xing et al. (2015) developed a genetic programming (GP) model to predict student performance from learning analytics data. Xing and his colleagues (2015) applied the GP model to a sample of 122 students in an algebra course within a proprietary LMS. They found that the GP model was “interpretable and has an optimized prediction rate”, and that it was “useful for teachers to identify reasons that students are struggling” (Xing et al., 2015, p. 180). GP is based on nonlinear decision tree analysis with the ability to loop back through if-then-else logic (Xing et al., 2015), but GP has certain deterministic characteristics, such as the parameters for the logic, so it is limited (Strang, 2012), although it is more accurate in identifying optimal relationships once customized for a sample. By comparison with GP, statistical algorithms use predictive inferential logic without requiring any parameters except a p-value (Strang, 2015). GP is also difficult for business school faculty without programming skills to use.

2.1 Research design and hypotheses

Guided by the literature review, the following hypotheses were developed:

  • H1: Sample will be representative of population and historical learning outcomes (Grade);

  • H2: Student demographic factors – Age, Gender and Culture – will not be cross-related so as to confound the dependent variable (Grade) or skew the sample limiting generalizations;

  • H3: Student self-efficacy of the course technology context will not negatively impact learning outcomes (Grade);

  • H4: Course logins from Moodle engagement analytics (EngageC) have a positive causal relationship with student learning (Grade);

  • H5: Forum postings identified by Moodle engagement analytics (EngageF) have a positive predictive relationship with student performance (Grade);

  • H6: Assignment activity identified by Moodle engagement analytics (EngageA) have a positive predictive relationship with student performance (Grade);

  • H7: Lesson reading activity identified by Moodle system logs (LessonR) have a positive predictive relationship with student performance (Grade);

  • H8: Lesson quiz activity identified by Moodle system logs (LessonQ) have a positive predictive relationship with student performance (Grade);

  • H9: Lesson quiz scores identified by Moodle system logs (LessonS) have a positive predictive relationship with student performance (Grade);

  • H10: Student reflections on course learning objectives will be related to outcomes (Grade).

We applied a theory-dependent positivist ideology, consisting of a deductive literature review (above) to inform the research question, develop the hypotheses, and select the methods (Strang, 2015). Since this study was a mixed-methods design collecting student activity and performance data, quantitative techniques were selected to answer the research question and test the hypotheses, and qualitative data analysis techniques were then used to triangulate the evidence (Strang, 2015).

3 Methods

Descriptive statistics, correlation, regression, ANOVA and cluster analysis techniques were applied at the 95 % confidence level. Minitab version 14 was used for the statistical tests, NVivo was used for the text analysis, and Moodle version 2.8 with the engagement analytics plugin was installed during this study.

3.1 Participants

The population for the sample frame was the State University of New York (SUNY), a system of regionally accredited public higher education institutions serving nearly 467,000 students and, at the time of writing, more than 3 million alumni around the world (www.suny.edu). The College at Plattsburgh is regionally accredited by the Middle States Commission on Higher Education (MSCHE). The ongoing stable enrollment at this college is approximately 6200 students, with roughly half of those in the School of Business and Economics (SBE), of which 350 were in the undergraduate BSBA program at the time of writing. Average class size is 21, the student-faculty ratio is 16:1, and 90 % of faculty hold the highest degree (e.g., PhD or doctorate) in their discipline (www.plattsburgh.edu/admissions/quickfacts.php).

A sample was taken from natural intact convenience groups (existing online class sections). The enrollment at this university was 6350 matriculated students, with 1050 of those in the School of Business and Economics, of which approximately 350 were in the undergraduate Bachelor of Science in Business Administration (BSBA) program at the time of writing.

The final sample size was 228 students, drawn from several sections of the same course taught by two professors (one of whom was the author). All participants were undergraduate students in an upper-division Professionalism Seminar and Human Resource Management (HRM) course. This course had been taught by the researcher for three years in this context using Moodle, and the professor had taught a similar version of this course at other universities using Blackboard, Angel, Moodle and a proprietary LMS. A pilot had been successfully completed in a previous term using an identical course syllabus and the same configuration in Moodle.

3.2 Instrumentation

The demographic variables for the students’ age, gender, and culture were exported from the course registration system, re-formatted slightly, saved in Excel spreadsheet format, and then imported into Minitab for analysis.
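The reformatting step is a routine clean-and-convert task; the short sketch below illustrates one way it could be scripted. The file name and column layout are assumptions for illustration only, since the study actually used a manual Excel export imported into Minitab.

```python
# Illustrative sketch of cleaning a registration export (assumed CSV layout);
# the study used Excel and Minitab rather than this script.
import pandas as pd

roster = pd.read_csv("registration_export.csv")             # hypothetical export file
roster.columns = ["StudentID", "Age", "Gender", "Culture"]  # assumed field order
roster["Gender"] = roster["Gender"].str.strip().str.title()
roster["Culture"] = roster["Culture"].fillna("Domestic USA")
roster.to_excel("demographics_clean.xlsx", index=False)     # ready for the stats package
```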

The materials for the course included lectures, videos, examples of resumes and cover letters, and URLs for job searches, as well as the rubrics for all assignments. Each lesson ended with a quiz. These were uploaded as mandatory lessons in Moodle. Student views were tracked as counts in Moodle activity logs and exported into variables prefixed with ‘Lesson’. This set of variables was intended to capture how much the students accessed the material for learning or clarification.

There were five formative assignments used in this study as enumerated below (with course weighting):

  1. Job search must match skill competencies - must show ability to plan career advancement (10 points);

  2. Resume development – must be customized for one of five available job descriptions (20 points);

  3. Cover letter development – must be customized for one of five available job descriptions (20 points);

  4. Interview – answers must clearly satisfy 5 mandatory criteria and 2 desirable criteria (40 points);

  5. Reflection – student self-evaluates their performance against the course learning objectives (10 points).

All assignments took place in a Moodle Workshop forum where students could not initially see peer submissions, but were later allowed to view and comment on one another’s posts. All assignments were graded using rubrics. A rubric applied to an example assessment for the most heavily weighted assignment 4 (interview) is shown in the appendix. The rubrics were designed to assess student submissions against the following course learning objectives:

  1. Understand resume building and job search techniques;

  2. Understand communication skills and career success;

  3. Understand the writing process including grammar and structure (this is the AWR course for SBE);

     3.1. Students will demonstrate the ability to synthesize ideas in writing;

     3.2. Students will be able to articulate clearly in writing concepts relevant to a particular discipline;

     3.3. Students will be able to use writing to communicate ideas to someone outside their particular discipline;

     3.4. Students will demonstrate their writing mastery of the basic rules of English or the language of instruction;

  4. Have a base knowledge of key ingredients of professionalism in the workplace;

  5. Be able to construct the framework for a career assessment plan;

  6. Understand professionalism, diversity and ethics;

  7. Understand effective interview techniques and follow-up.

Moodle learning engagement was configured to capture the student interactions for the five assignments above. The Moodle learning engagement attendance threshold was based on students logging into the course site twice per week over the 15 weeks (the sixteenth week was allocated to final exams); thus, a student who logged in 30 times on different days would receive 100 %. None of the Moodle learning engagement fields were used as part of assessing the students’ learning performance.
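The attendance rule described above amounts to a simple capped percentage; the sketch below shows that arithmetic. The capped-percentage form is an assumption about how the threshold translates to a score, since the plugin’s internal formula is not documented here.

```python
# Sketch of the attendance-scoring rule (assumed capped-percentage form).
def attendance_pct(distinct_login_days: int, target_logins: int = 30) -> float:
    """Two logins per week over 15 weeks = 30 target logins = 100 %."""
    return round(min(distinct_login_days / target_logins, 1.0) * 100, 1)

print(attendance_pct(30))   # 100.0 (met the threshold)
print(attendance_pct(18))   # 60.0  (logged in on 18 different days)
```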

The engagement analytics block was run when assignment grading was completed and again when the course was complete. The Moodle engagement analytics final totals screen was captured and pasted into Excel, refined for alignment and column names, saved, and then imported into Minitab. These were defined as ratio data types (percentages). The variables from Moodle engagement analytics were named EngageF (activity related to the lesson forums), EngageA (activity within assignment workshops), and EngageC (course logins).

Moodle system logs were accessed to examine student interactions within the weekly lesson modules. All course reading material was stored inside the lessons. Although lessons were not mandatory, the progress checkbox was used on the main course site to show the student if they had completed the lesson. A screen shot of a typical course page is shown in Fig. 1.

Fig. 1
figure 1

Moodle course lessons screen shot

Most lesson pages were laptop-screen sized, requiring the student to click the “NEXT” button to advance or the “PREVIOUS” button to go back. In this way, views were recorded. The views of lecture-based screens were captured in a variable named ‘Lesson reading activity’ (LessonR). Every weekly lesson ended with a quiz consisting of a few multiple-choice, matching, or keyword answers, usually all three types. Quizzes were marked but they did not contribute to the course grade, and students were clearly advised of this. Views of lesson quizzes were recorded in a system log variable ‘Lesson quiz activity’ (LessonQ). Finally, the scores for each lesson quiz were stored and downloaded from the system logs, averaged, and saved into a variable named ‘Lesson quiz score’ (LessonS). All variables were interval data types except the mean quiz score, which was a ratio data type.
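As an aid to replication, the sketch below shows one plausible way to derive LessonR, LessonQ and LessonS from an exported activity log. The column and event names are assumptions for illustration, not the actual Moodle log schema, and the study itself worked from the Moodle system log screens.

```python
# Sketch of deriving LessonR, LessonQ and LessonS from an exported log file.
import pandas as pd

logs = pd.read_csv("moodle_system_logs.csv")   # assumed columns: student_id, event, score

page_views = logs[logs["event"] == "lesson_page_viewed"]
quiz_tries = logs[logs["event"] == "lesson_quiz_attempted"]

lesson_vars = pd.concat(
    [page_views.groupby("student_id").size().rename("LessonR"),           # reading counts
     quiz_tries.groupby("student_id").size().rename("LessonQ"),           # quiz-view counts
     quiz_tries.groupby("student_id")["score"].mean().rename("LessonS")], # mean quiz score
    axis=1,
).fillna(0)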

The reflections variable was a qualitative data type collected as a short essay at the end of the course in the designated Moodle Workshop module. It was marked and included in the course grade, as noted earlier. However, additional text analysis was performed to identify keywords and determine whether there was any hidden relationship between what the students thought about the course learning objectives and the other factors and variables in the research design.

A pre-test and post-test were given to allow students to become familiar with the online course setup in the Moodle Learning Management System. These short 20-minute preliminary tests were mandatory and required the student to read the syllabus and try out all the Moodle objects used for the course, but without actually completing any of the assigned work. None of the actual course learning content was provided in these preliminary exercises. The results were recorded in the variables ItPreTest and ItPostTest as ratio data types.

The dependent variable ‘Grade’ was the total of the grades for all five assignments, recorded as a ratio data type and shown as a percent out of 100. The field was exported from the Moodle gradebook and imported into Minitab. No outliers were deleted, meaning that students who did not participate received a zero and were retained.

4 Results and discussion

4.1 Phase 1: Preliminary descriptive statistics analysis

The mean (M), standard deviation (SD), and other descriptive statistics of the demographic variables (age, gender, and culture) for the sample (n = 228) are shown in Table 1. The average age of the students was 22, which is young, as expected for an undergraduate program in the USA that attracts mostly domestic young adults (M = 21.95, SD = 5.083, median = 21). Most of the students in the sample were male (59 %), and 79 % were of domestic USA culture. The 21 % from foreign cultures were East Asian (Chinese) and Latin American. A z test of course grade against the 90.3 % historical average from past online and classroom-based courses indicated this sample was similar with respect to learning outcome, based on Z(228) = 1.99, p = .047 (no significant difference). This result supported the first hypothesis H1: Sample will be representative of population and historical learning outcomes (Grade).
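For readers unfamiliar with the one-sample z test against a historical benchmark, the following hedged sketch shows the calculation. The grade vector is simulated and the historical standard deviation is an assumed placeholder; neither reproduces the study data or the reported Z value.

```python
# Hedged sketch of a one-sample z test against the 90.3 % historical mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
grades = rng.normal(91, 9, 228).clip(0, 100)   # placeholder grades (n = 228), NOT study data
hist_mean, hist_sd = 90.3, 9.0                 # 9.0 is an assumed historical SD

z = (grades.mean() - hist_mean) / (hist_sd / np.sqrt(len(grades)))
p = 2 * (1 - stats.norm.cdf(abs(z)))
print(f"z = {z:.2f}, p = {p:.3f}")
```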

Table 1 Descriptive statistics of sample (n = 228)

Two preliminary conditions of interest were the ItPreTest and ItPostTest scores, which measured the students’ understanding of the syllabus, the Moodle Learning Management System (LMS), and the lesson structure including quizzes and assignments. A prerequisite of this course was that students be familiar with the style of syllabus and with the Moodle LMS. The questions on the pre-test and post-test were identical in nature but different in content. The items included questions such as how many graded items were in the course, where the graded items were located, how the student would know when a lesson had been completed, and what the difference was between a lesson quiz and an assignment. The purpose of the pre- and post-tests was to gauge the prior information technology (IT) knowledge of the student regarding the learning environment. Additionally, this was a mandatory tool that encouraged students to become more familiar with the LMS IT and online content, so that they could focus on learning throughout the course.

An independent two-sample t test confirmed that the post-test score was significantly higher than the pre-test result, based on T(454) = −4.26, p < .001 (ItPreTest M = 79.3, SD = 38.9; ItPostTest M = 90.52, SD = 7.43, n = 228). The 95 % confidence interval of the mean difference was (−16.3, −6.02). This may be interpreted as follows: 95 % of students in the business school, if sampled, would increase their understanding of the LMS by between 6.02 and 16.3 points from the pre-test to the post-test. This preliminary result indicated that students had increased their IT knowledge of the LMS and course site. It also established that the sampled students knew the IT environment of the course reasonably well, considering the post-test mean was 91 % with a much lower standard deviation of 7.43 as compared to the pre-test SD of 38.9. This preliminary result suggests that the students had self-efficacy with the IT of the course and therefore would not be hindered by it; in a sense, it brought the students closer to a common norm of IT understanding for this course. This supported the third hypothesis H3: Student self-efficacy of the course technology context will not negatively impact learning outcomes (Grade). Nonetheless, the data are examined later to verify this.
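A sketch of the pre/post comparison is shown below. The two score vectors are simulated with roughly the reported means and standard deviations, purely as placeholders; the independent two-sample form reported above is used, although a paired design could arguably also fit these data.

```python
# Sketch of the pre/post t test; placeholder vectors, not the study data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
it_pre = rng.normal(79.3, 38.9, 228).clip(0, 100)   # ItPreTest placeholders
it_post = rng.normal(90.5, 7.4, 228).clip(0, 100)   # ItPostTest placeholders

t, p = stats.ttest_ind(it_pre, it_post)             # independent two-sample t test
print(f"t = {t:.2f}, p = {p:.4f}")
```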

A Chi-Square independence test was conducted to determine whether the nominal factors gender and culture were related in any way, since prior research had indicated that popular business schools may be heavily loaded with specific combinations of culture and gender. The Chi-Square test indicated gender and culture combinations in the sample were not related, based on X²(5) = 26.199, p = .000 (significantly different). A Pearson product-moment correlation test was performed between grade, ItPreTest, ItPostTest and age to detect whether any of the commonly assumed student attributes impacted learning. A Spearman correlation test was applied to compare gender and culture with the three dependent variables. The key correlation results are shown in Table 2. There were no significant correlations between the demographic attributes of age, gender or culture with one another or with the dependent variables, so we may assert that student characteristics did not impact academic learning ability or outcomes in this sample. This and the previous tests supported the second hypothesis H2: Student demographic factors – Age, Gender and Culture – will not be cross-related so as to confound the dependent variable (Grade) or skew the sample limiting generalizations.
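The sketch below illustrates the demographic screening tests in generic form. The contingency counts are invented for illustration and do not reproduce the study’s gender-by-culture cross-tabulation or its reported chi-square statistic.

```python
# Sketch of the independence test on nominal demographics (illustrative counts only).
import numpy as np
from scipy import stats

crosstab = np.array([[70, 18, 10],    # hypothetical male counts by culture group
                     [60, 14, 8]])    # hypothetical female counts by culture group
chi2, p, dof, _ = stats.chi2_contingency(crosstab)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")

# Pearson correlation would be used for interval variables (e.g., Age vs Grade)
# and Spearman for coded nominal factors (e.g., Gender vs Grade), as in the text.
```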

Table 2 Correlation of sample demographic factors with dependent variables (n = 228)

However, ItPostTest score and grade were significantly correlated at R = +0.867, p = .021 (see Table 2). The effect size of r² = 0.751 (75 %) is a strong positive correlation. This result clearly shows that students who performed better on the ItPostTest also earned a higher final course grade. This implies that giving students an exercise to work through the IT learning curve at the start of an online course could help them learn more and obtain higher grades. A causal relationship between the preliminary pre-test or post-test conditions and final grade is not theoretically valid, because the grade is determined from assignment scores representing the lesson content, which are different from and unrelated to the IT or LMS structure. A positive correlation of 75 % merely indicates that students who were prepared for the IT and LMS usually learned more of the actual course content. This was further evidence to support the third hypothesis H3: Student self-efficacy of the course technology context will not negatively impact learning outcomes (Grade). This remains hypothetical since we do not know whether other environmental, physical, cognitive or personality attributes affected the students’ learning ability and therefore impacted their grades.

4.2 Phase 2: Quantitative learning performance analysis

Multiple regression and analysis of variance (ANOVA) techniques were used to examine the remainder of the hypotheses associated with the predictive factors as listed below (except reflections):

  • EngageC (Moodle engagement analytics),

  • EngageF (Moodle engagement analytics),

  • EngageA (Moodle engagement analytics),

  • LessonR (lesson reading activity),

  • LessonQ (lesson quiz activity),

  • LessonS (lesson mean quiz score).

All six proposed predictive factors were entered into stepwise regression and specified as free factors, allowing them to be systematically removed from a model based on the adjusted r² estimate of variance captured. This process produced five significant models with various combinations of the key factors, with a low r² of 52.5 % and a high r² of 80.8 %. These effect sizes are considerably strong according to Cohen et al. (2003).
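To clarify the selection logic, the following simplified forward-selection sketch chooses factors on adjusted r². Minitab’s stepwise and best-subsets routines were used in the study, so this is only an approximation of that procedure, and the data frame and column names are assumptions.

```python
# Simplified forward selection on adjusted r-squared (approximation of stepwise).
import pandas as pd
import statsmodels.api as sm

def forward_select(df: pd.DataFrame, target: str = "Grade") -> list[str]:
    remaining = [c for c in df.columns if c != target]
    chosen: list[str] = []
    best_adj_r2 = float("-inf")
    while remaining:
        # Adjusted r-squared for each candidate model that adds one more factor.
        scores = {c: sm.OLS(df[target], sm.add_constant(df[chosen + [c]])).fit().rsquared_adj
                  for c in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= best_adj_r2:
            break                         # no candidate improves the model
        chosen.append(best)
        remaining.remove(best)
        best_adj_r2 = scores[best]
    return chosen

# forward_select(df[["EngageC", "EngageF", "EngageA", "LessonR", "LessonQ", "LessonS", "Grade"]])
```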

A General Linear Model (GLM) multiple regression was produced with all factors because the lowest variance captured in the best-subsets regression was over 50 %. A single GLM multiple regression was chosen for its calibration power and to facilitate interpretation of the estimates. In the GLM, rather than interacting the terms after testing for a main effect, the Variance Inflation Factor (VIF) test was invoked to flag factors that demonstrated cross-loading on other variables. The benefit of the VIF approach is that it simplifies the model and avoids entering all factor-combination interactions as terms to test (Strang, 2015).
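The VIF screen itself is straightforward to reproduce; a hedged sketch using statsmodels is shown below. The predictor data frame with the paper’s variable names is assumed to exist, and the study used Minitab rather than this code.

```python
# Sketch of a VIF screen for the six predictors (assumed DataFrame of predictors).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(predictors: pd.DataFrame) -> pd.Series:
    X = sm.add_constant(predictors)      # include the intercept, as in the GLM
    vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
    return pd.Series(vifs, index=predictors.columns, name="VIF")

# vif_table(df[["EngageC", "EngageF", "EngageA", "LessonR", "LessonQ", "LessonS"]])
# Factors with VIF > 10 (here EngageF and EngageA) would be flagged for removal.
```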

The results of the GLM analysis were significant and the key estimates are summarized in Table 3. The F test outcome was F(6, 221) = 155.33, p = .000, with an effect size of r² = 80.8 %, adjusted to r² = 80.3 % to account for the number of factors in the model. This small difference between the effect size and the adjusted value corroborates the decision to keep all the factors in the model after the earlier best-subsets multiple regression.

Table 3 GLM of key factors regressed on dependent variable grade (n = 228)

The GLM results in Table 3 show that two hypotheses may clearly be rejected, namely H5: Forum postings identified by Moodle engagement analytics (EngageF) have a positive predictive relationship with student performance (Grade), and H6: Assignment activity identified by Moodle engagement analytics (EngageA) have a positive predictive relationship with student performance (Grade).

The GLM results in Table 3 clearly show support for the following hypotheses: H4: Course logins from Moodle engagement analytics (EngageC) have a positive causal relationship with student learning (Grade); H6: Assignment activity identified by Moodle engagement analytics (EngageA) have a positive predictive relationship with student performance (Grade); and H7: Lesson reading activity identified by Moodle system logs (LessonR) have a positive predictive relationship with student performance (Grade). However, all VIF results were higher than the desired benchmark of close to 1, which indicates that a predictor is orthogonal to the others in the matrix (no significant correlation), while a VIF higher than 10 indicates extreme multicollinearity (Tamhane & Dunlop, 2000). Some statisticians recommend removing factors with VIF > 5 (Snee, 1973); other statisticians suggest investigating factors with VIF > 1 (Carlson et al., 2004). Two factors had extremely high VIFs (> 10), namely EngageF and EngageA, so they had to be removed from the model. Removing these two highly intercorrelated factors would reduce the other factors’ VIFs.

Therefore the GLM was run a second time after removing EngageF and EngageA. The GLM results are summarized in Table 4. In this model, all VIFs were reduced to an acceptable range of 2.1 to 2.5, although some minor hidden cross-interaction between factors was still possible. The F test outcome was F(4, 223) = 196.6, p = .000, with an effect size of r² = 77.9 %, adjusted to r² = 77.5 %, indicating a large proportion of variance accounted for by these four factors in predicting grade. LessonQ had the highest coefficient of the four predictors. An interesting observation from the GLM in Table 4 is that the coefficient for LessonQ is 2.4 times higher than that for LessonS, indicating that putting time into the online lesson quizzes had a larger positive impact on grade than scoring well on the quizzes.

Table 4 GLM of significant four factors regressed on dependent variable grade (n = 228)

4.3 Phase 3: Qualitative learning performance analysis

Given that most of the important hypotheses, concerned with identifying the four significant online-interaction predictors of final course grade, had been tested, the qualitative phase of the mixed-methods study focused on examining and organizing the qualitative data to test the final hypothesis. This final hypothesis test sought to show that student reflections were also related in some way to learning outcome. This would serve as triangulation of the data (a different source), and since different techniques were used, triangulation of method would also be accomplished.

It is common in a mixed-methods study for variables to be developed during the study, after other phases or hypotheses have been examined. Therefore, these techniques, and the variable definitions, are often explained in the results and discussion rather than in the methods section of a scholarly paper. The reason for this sequence is that, in a mixed-methods study, secondary techniques may not be used at all if earlier dependent hypothesis tests are unsuccessful.

The data from the fifth course assignment, the 228 student essays reflecting on the course learning objectives, were downloaded from the Moodle Workshop. Each assignment was stored in a separate file, with an identifier that could be linked back to the student and to the student record in the Minitab data. The text files were first analyzed in NVivo for the most common words on a within-subjects basis. The most common keywords associated with the online learning process were retained; these words related to viewing the online materials, the course learning objectives, or problems with the online learning process. The list of common learning-process words was then analyzed on a between-subjects basis to identify the most frequently cited keywords. The following 17 between-subject keywords were finalized, each with a short explanation of the surrounding context:

  1. confusing due dates – student(s) did not understand that all assignments were due Mondays at midnight following the week due (possibly work-overloaded students);

  2. dislike layout – referring to the online course site and particularly the lesson structure;

  3. easy to learn – referring to content that was easy to master;

  4. fair rubrics – concerning assignment grading; the rubrics were always displayed and used;

  5. fair process – referring to students being given at least a week for every assignment;

  6. fun lessons – referring to the sequencing and interaction within the lessons;

  7. good experience – students were required to update their resumes with college courses, internships and part-time work, for use in gaining internships and other part-time college jobs;

  8. great interviews – referring to a guest speaker conducting simulated online interviews;

  9. hard assignments – referring to graded assignments that were challenging (students had to find an actual job and apply for it using a cover letter and updated resume);

  10. industry prep – similar to job related, but oriented to how to address specific business sub-disciplines with cover letters, such as marketing research, marketing sales, and so on;

  11. job related – course objectives were to develop career goals and find a well-fitting job linked to the student’s skills and experience;

  12. liked videos – each lesson contained one or more short videos to illustrate industry applications, such as a video of an interview or of a hiring manager giving advice;

  13. IT/laptop problems – students could use the campus lab but most chose to do their work on their laptops even though they lived on campus in residences;

  14. quiz hard – referring to the quizzes in the lessons (not assignments) taking too much time;

  15. relevant samples – referring to example cover letters and resumes provided in the lessons;

  16. too much work – referring to the graded assignment work taking too much time;

  17. visuals useful – each lesson started with a concept diagram of relevant theories and keywords; colours were used to highlight concepts in the lessons and forum discussions.

The between-subjects learning keywords were then listed beside each student, such that each student record contained a within-subject list of keywords (from the common list of 17) that the student had mentioned in the reflection essay on the course learning objectives. Most students had only a single keyword, but several had more than one. These keywords were then uploaded into the Minitab database, using the student id to associate them with the correct record.
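For readers who wish to approximate this two-pass coding outside NVivo, the sketch below re-expresses it in Python. The file layout, naming convention, and matching rule are assumptions, and only a subset of the 17 keywords is shown; the actual coding in the study was done with NVivo word-frequency analysis and researcher judgment.

```python
# Re-expression of the within-subject then between-subject keyword coding.
from collections import Counter
from pathlib import Path

KEYWORDS = ["confusing due dates", "dislike layout", "easy to learn",
            "fair rubrics", "job related", "liked videos", "too much work"]

def code_essay(text: str) -> list[str]:
    """Within-subject pass: which common keywords appear in one student's essay."""
    lowered = text.lower()
    return [k for k in KEYWORDS if k in lowered]

between_subject = Counter()                      # between-subject frequencies
student_codes: dict[str, list[str]] = {}         # keywords listed beside each student
for essay in Path("reflections").glob("*.txt"):  # assumed: one downloaded file per student
    codes = code_essay(essay.read_text(errors="ignore"))
    student_codes[essay.stem] = codes            # file stem = student identifier
    between_subject.update(codes)
```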

Several qualitative data analysis techniques were then performed to investigate the relationships between the reflective course learning keywords and grade. Unfortunately, cluster analysis was not able to produce a definitive model. Discriminant analysis comparing keywords to a pass/fail variable was also non-significant.

A surface response regression was then performed to quantify the number of reflective learning keywords associated with the level of grade. From this, a one-way analysis of means (ANOM) was performed to identify the mean grade associated with each reflective learning keyword (using all subjects). The results of the ANOM are illustrated in Fig. 2. The reflective learning keywords form the x-axis (they were abbreviated by the software and some are not listed due to space constraints), and the y-axis is the mean course grade. The mean grade of 91.168 for all students is superimposed as a horizontal green line in Fig. 2. The top red horizontal line shows the upper confidence interval per reflective keyword and, likewise, the bottom red line is the lower confidence interval. The vertical black drop lines (ending in a dot) show the extreme value for each keyword.
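The essence of the ANOM comparison, each keyword’s mean grade against the grand mean with limit lines, can be sketched as below. This is a simplified approximation with Bonferroni-style normal limits; Minitab’s exact ANOM decision limits differ, and the long-format data layout (one row per student-keyword pair) is an assumption.

```python
# Simplified analysis-of-means sketch: per-keyword means vs. the grand mean.
import numpy as np
import pandas as pd
from scipy import stats

def anom_table(df: pd.DataFrame, group: str = "Keyword", value: str = "Grade",
               alpha: float = 0.05) -> pd.DataFrame:
    grand_mean = df[value].mean()
    pooled_var = df.groupby(group)[value].var().mean()    # rough pooled within-group variance
    k = df[group].nunique()
    out = df.groupby(group)[value].agg(["mean", "count"])
    crit = stats.norm.ppf(1 - alpha / (2 * k))             # Bonferroni-adjusted critical value
    half = crit * np.sqrt(pooled_var * (k - 1) / (k * out["count"]))
    out["lower"], out["upper"] = grand_mean - half, grand_mean + half
    out["outside_limits"] = (out["mean"] < out["lower"]) | (out["mean"] > out["upper"])
    return out.sort_values("mean", ascending=False)
```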

Fig. 2
figure 2

ANOM response diagram of reflective learning keyword association to grade (n = 228)

For example, in Fig. 2 the first dot on the left shows that students citing the reflective learning keyword “dislike layout” obtained lower than average grades (M = 88.8), while the second dot from the left, for the keyword “job related”, had the highest mean (M = 100). The lower dots in Fig. 2 show the three keywords corresponding to the lowest mean grades in the course, namely “Confusing due dates” (M = 80), “Quiz too hard” (M = 76.3) and “Hard assignments (challenging)” (M = 72.50).

The quantitative estimates from the ANOM are summarized in Table 5, which is sorted by mean in descending order. The ANOM diagram (Fig. 2) and analysis (Table 5) present an informative pattern of learning-objective-related reflective keywords associated with high grades, and problem behavior linked to lower mean grades (for the most part). Based on the results in Fig. 2 and Table 5, there is sufficient evidence to support the last hypothesis H10: Student reflections on course learning objectives will be related to outcomes (Grade).

Table 5 ANOM reflective learning keyword association with grade (n = 228)

A form of manual cluster analysis was performed on the ANOM results in Table 5; each segment is shown indented in Table 5. The segment frequency sizes and mean grades are the most meaningful to analyze. The results from Table 5 could be broken into five similar categories. The first category would be the “goal-oriented performers”, containing the top reflective learning keywords with respect to students earning the highest grades: “Job related (career search)”, “Great interviewer (guest speaker)”, and “Relevant samples (resumes)”. The 20 goal-oriented performers amounted to 11 % of the sample and they achieved an average grade of 99.2. This segment contained the students who strongly identified with the value of the course content, especially the learning objectives, and they also learned the most according to their grade.

“Pedagogy admirers” was the next segment, which appeared to like the course pedagogy and structure most. These students identified with how the course was taught and what materials were used, rather than with the content goals of the course. This segment included the reflective learning keywords “Easy to learn”, “Visual concept maps useful”, “Liked videos in lessons”, and “Fun lesson content, sequence.” The pedagogy admirers constituted almost a quarter of the sample, at 52 students (23 %), and their mean grade was 95.7, which is also a high-performing learning result. Interestingly, these first two segments were built from basically positive reflective learning keywords and they earned the highest mean grades.

The third segment was “IT problematic”, containing “Too much work” (possibly not a perfect match to the segment, but it was a small group) and “IT/Laptop problems.” This segment contained 64 students, or 28 % of the sample, and their mean grade was 92.3 (still very good). A perplexing observation was that the “IT/laptop problems” keyword was cited by many students (56); nonetheless, their mean grade was 92.49 (SD = 3.473). Even with the SD considered for the confidence intervals, this grade mean is approximately 4 points higher than the start of the negative-issue keywords at “Dislike course layout”. This may imply that students were simply being honest in reporting issues like browser problems (a common complaint received from students when using advanced objects like embedded lesson content within Moodle). Thus, it appears these students developed IT self-efficacy to overcome their IT problems and still learn the course material, as evidenced by their excellent grades in the low 90s (the A- range in the SUNY grading system).

The fourth segment was the “take away champions”, containing students citing the reflective keywords “Industry prep application letters”, “Good experience for resume”, “Fair process using rubrics”, and “Fair tests.” Admittedly, some of these keywords could also belong to the second segment. The segment was named as such because the keywords and their underlying meanings focused more on the value students received from the course than on the learning objectives directly. The take away champions were a sizable group at 60 students (26 % of the sample) and they recorded a respectable mean grade of 91.

Last was the “nay sayers” segment, containing the reflective learning keywords “Dislike course layout”, “Confusing due dates”, “Quizzes too hard”, and “Hard assignments (too hard).” There were 32 students, or 14 % of the sample, in this category, with the lowest average grade of 76.3. These segments seem to make logical sense with respect to their association with mean grade. When the four significant online-interaction predictors from the earlier quantitative results (course logins, lesson reading, lesson quiz activity, lesson quiz scores) are considered alongside the reflective learning keyword associations with grade, the pattern suggests that higher goal-oriented online activity and less IT-fault-seeking behavior lead to higher grades. Another way of looking at these results is that building up student IT self-efficacy and emphasizing the whats-in-it-for-me learning goals would likely lead to higher course grades, if all other factors were relatively constant. The issue of all other factors being constant leads into the next topic of limitations.

4.4 Limitations with recommendations

A key limitation of this research, which affects any generalizations, was that the sample of 228 students was drawn from a single institution. Additionally, the context of SUNY may not be similar to other universities, and the course design and pedagogy may be dissimilar to other populations. The cultural mix of the sample, with 21 % international students, may also not mirror the SUNY population as a whole, which means this sample does not generalize perfectly even to SUNY. Additionally, the sample was drawn from the School of Business & Economics, so the results would not necessarily transfer to other disciplines.

Another potential limitation was the subjective process used to codify the reflective learning keywords using the NVivo text analysis of word frequency from the student essays. There is a wide margin of potential interpretation just in the keywords used in this paper, and there is additional room for error in how the author interpreted what the students wrote in their essays. Although another researcher looked over the keyword results, an inter-rater reliability analysis was not performed due to lack of time; this is a suggested technique to include in future mixed-methods studies where subjective coding is needed.

Finally, the course was a professional human resource management course, so students may behave differently during online courses in other subject areas. A technical limitation was that the rejection of the hypotheses was predicated upon the data provided by Moodle engagement analytics, yet it is unknown exactly how those results were calculated without examining the program code (which was beyond the scope of this study). Most of these limitations could be overcome through replications of the research design with other samples drawn from different populations. More testing of Moodle engagement analytics and online activity data is needed.

5 Conclusions

This is likely the first published mixed-methods study of online learning analytics that combined text analytics of student reflective learning essays to further explain a regression model that had already identified four significant predictors of course grade. The large sample of 228 students makes the analysis powerful. The explanation and application of relevant statistical techniques make this study credible. The verification of the sample against the population frame, and the discussion of descriptive statistics, make these results generalizable. The study has added to the current body of knowledge in several ways, as summarized below.

First, following a rigorous literature review, three common demographic factors related to student online course outcomes were tested, namely age, gender and culture. These three demographic factors were found to be unrelated to student online course outcomes. Testing demographic factors as predictors is a common preliminary step when performing regression analysis in online higher education. Furthermore, age, gender and culture were not correlated with one another either. This finding was similar to a comparable study in Spain by Iglesias-Pradas et al. (2015). Several additional tests were added to the current study to ensure there were no cross-correlations between these demographic factors and grade. Pre- and post-testing was also applied at the start of the course to motivate students to become familiar with the online IT, and to ensure that a potential lack of IT understanding did not result in lower grades (regression tests confirmed this).

Next, the Moodle engagement analytics indicators were, for the most part, useless for predicting student online learning outcomes. None of the analytics factors were significant except course logins, which is clearly a precondition of any online activity. This finding was similar to the results of Iglesias-Pradas et al. (2015) and Zacharis (2015). However, the sample size was larger here and the tests were more rigorous, using a ratio data type as the dependent variable.

After only course logins was found to be significantly related to course grade, six key factors were drawn from Moodle engagement analytics and system logs, based on the work of Zacharis (2015), who had identified 29 potential factors. In this study, a second GLM reduced these to four significant predictors of online student learning performance: course logins, lesson reading, lesson quiz activity, and lesson quiz scores. This four-factor model captured 78 % of the variance in course grade, with F(4, 223) = 196.6, p = .000, an effect size of r² = 77.9 %, and adjusted r² = 77.5 %, which is a large proportion of variance accounted for. This compares favorably to the final model of Zacharis (2015), which captured r² = 52 % of variance (his adjusted r² was 50 %).

One factor, online lesson quiz activity (not to be confused with quiz score), had the highest coefficient of the four predictors. An interesting finding was that the coefficient for online lesson quiz activity was 2.4 times higher than that for quiz score, indicating that students who put more time into the online lesson quizzes achieved higher grades, compared with simply obtaining higher scores on the quizzes. The implication is that if professors use problem-based learning pedagogy for online courses with frequent quizzes, then motivated students will learn more and score higher grades. Students should be advised to spend time on the quizzes.

A methodological contribution of the current study was to demonstrate a mixed-methods design in which quantitative hypothesis testing was followed by qualitative data collection, text analytics, and further quantitative analysis to uncover hidden patterns within student online essays. This process consisted of identifying within-subject keyword frequencies, and then between-subject frequencies, in 228 student reflective learning essays. A final list of 17 reflective learning keywords was analyzed against course grade using surface response regression and analysis of means, to identify strong associations with grade. A form of cluster analysis was performed on the analysis-of-means results to group the average grades into five segments of reflective learning keywords. From this, several logical deductions followed from the findings.

The first cluster segment, named “goal-oriented performers”, contained 11 % of the sample, and these were the highest performers, with an average grade of 99.2. This segment contained the students who strongly identified with the value of the course content, especially the learning objectives, and they learned the most according to their grade. “Pedagogy admirers” contained 23 % of the students and their mean grade was 95.7, which is also a high-performing result. The third segment was “IT problematic”, amounting to 28 % of the sample, with a mean grade of 92.3 (still very good). Beyond the obvious interpretation that these students experienced IT problems, they still performed well, which implied that they developed IT self-efficacy to overcome those problems. The fourth segment was the “take away champions”, at 26 % of the sample, citing whats-in-it-for-me reflective statements, yet they recorded a respectable mean grade of 91. The last segment was the “nay sayers” group, containing the reflective learning keywords “Dislike course layout”, “Confusing due dates”, “Quizzes too hard”, and “Hard assignments (too hard)”; the “nay sayers” constituted 14 % of the sample, with the lowest average grade of 76.3.

There are two insightful implications of these results. First, from a pedagogy standpoint, encouraging students to complete more online lessons, including quizzes, generally promotes learning and results in higher grades, a win-win for students and the university. Secondly, from an IT perspective, the student pre- and post-testing resulted in a statistically significant increase in IT-course knowledge, which is a solid starting position for an online course. Additionally, the link between students voicing IT problems yet nonetheless scoring very well in the course certainly implies the development of IT self-efficacy. This self-efficacy may have been developed partly through the pre- and post-testing process, as well as through the consistency of course design using the Moodle lesson modules. An inside tip this author can share is that a model lesson was developed, pilot tested, and then duplicated to create all remaining weekly lessons, which promoted a uniform look and feel for the course lessons. This is likely what many students reflected positively on, and what helped struggling students develop IT self-efficacy early in the course: making lessons clear and consistent, and including interactive quizzes at the end of all lessons.

It goes without saying that more research is necessary before these results can be considered fully validated. Replications in different university contexts, and especially with other courses, are needed.