Introduction

An intelligent tutoring system (ITS; e.g., ALEKS, ASSISTments, AutoTutor, Easy with Eve, MATHia, MetaTutor, SQL-Tutor) is educational software that provides computerized tutoring. The system employs an artificially intelligent tutor that guides a student through problem sets and provides customized feedback, interacting directly with the student to carry out the tutoring activities. Because instruction is delivered mostly through direct interaction between the tutee and the tutor, little help is needed from human teachers, which makes cost-effective, large-scale individualized learning possible.

An important consideration in implementing an ITS is whether a student follows the learning activities with sustained attention and engagement. Because an ITS is typically administered in a self-regulated environment, students can drift away from active learning when they encounter challenges. For example, a student presented with tasks that far exceed his or her skill level can become frustrated and lose the motivation to learn new skills. If the assigned tasks are too easy and require little effort, a student can likewise disengage from genuine learning and show low-effort behaviors. Examining students' interaction behaviors in these settings can help us understand the learning process and the functioning of the ITS. Since many ITS programs are designed to keep students engaged and in flow while learning, behaviors that deviate distinctly from normal operation signal the emergence of nonoptimal learning and a malfunctioning ITS.

The purpose of this study is to explore statistical models that can describe students' learning behaviors during artificial tutoring and to draw information that can inform future ITS refinement and intervention planning. In particular, we examine behaviors that illuminate students' learning flow. Flow (Csíkszentmihályi, 1990) is a mental state learners experience when immersed in deep learning. A student in a flow state shows high engagement with the learning activities and tends to attain positive learning outcomes. In an ITS, the built-in design makes flow a highly achievable state: many ITS programs customize tutoring activities to students' skill levels and learning progression, and students are generally expected to be in flow while learning with the tutor. Modeling students' flow states in this setting can help us understand the learning process and identify when students become subject to suboptimal learning.

For modeling learning flow, we apply latent variable models that give discrete profiles of a latent process. Three models are considered: (i) the latent class model (LCM), (ii) the latent transition model (LTM), and (iii) the hidden Markov model (HMM). These models allow analysis of large-scale multivariate time-series data and can describe systematic effects of contextual variables (e.g., problem effects, student covariate effects). The models differ in how they characterize the variables (e.g., permissible indicators, latent state transitions, covariate effects) and in the constraints of the estimation software. In this study, we suggest practical strategies for applying the models, addressing the related assumptions and constraints. We show how each model can be applied to accommodate distinct characteristics of the ITS data and to draw information relevant to learning flow.

To demonstrate the application, we employ example data from Cognitive Tutor Algebra (CTA). Using log data from a particular time period, we show an analytic process for drawing inferences about learning flow. We give a didactic demonstration of data preparation, model formulation, and estimation, and show that the outcomes of the models reveal substantive information about students' learning process. Based on our experimental analysis, we discuss the strengths and limitations of the models in describing the ITS data and illuminate areas for future consideration.

The rest of this article is organized as follows. In “Cognitive Tutor Algebra”, we introduce CTA and present basic information about the evaluation data. We discuss characteristics of the raw CTA data, arrangements needed for analysis, and analysis steps that apply the latent variable models in phases. Sections “Profiling flow”–“Flow across problems” present specific analyses performed under each model. We discuss model formulation, data preparation, model fitting, and corresponding results. Section “Conclusion” concludes with a summary of the findings and future considerations.

Cognitive Tutor Algebra

Data

The study used Cognitive Tutor Algebra I (CTA) to illustrate the application of latent variable models to ITS data. The example evaluation data were collected in 2007-08 as part of an effectiveness study (Pane et al., 2014). The raw data contained observations from N = 2860 students who received tutoring between July 2007 and May 2008. The tutoring was offered during regular curricula under the supervision of teachers. The system contained a total of 637 problems across 106 sections nested within 27 units (e.g., algebra level 1, level 2; equation solver level 1, level 2). Across the study period, students received on average 276.826 problems (SD = 193.642), 43.900 sections (SD = 32.523), and 8.927 units (SD = 6.746). The specific problem sets and the order of the problems differed across students, teachers, and school districts. Most of the problems were assigned by the artificial tutor according to the student's skill mastery, but teachers could reassign students to different sections, and the system could also promote a student to a new section when the student reached a maximum number of problems.

Preparation

The raw interaction data pose several complications for applying the latent variable models. Because the system administered problems according to each student's skill level, the assigned problems induce between-subject variance in the evaluation data. In addition, because the tutoring was offered in multiple sessions over a year, the interaction data exhibit large temporal variance in the students' flow progression. To examine learning flow, it is necessary to restructure the raw data and control the undesired variance.

Our strategy for controlling this variance was to choose one problem unit and examine students' work on single days. Restricting attention to one problem unit limits excessive measurement noninvariance; limiting tutoring time to a single day reduces temporal variance and the dimensionality of the latent states. Specifically, we chose an elementary problem unit, equation solver level 1 (es1 hereafter), and examined the interaction data collected on single days. The es1 problems showed the most homogeneous measurement properties (see Appendix A) and were thus expected to induce minimal variance in the indicator variables.

The data extraction proceeded as follows. If a student worked on es1 on multiple days, we picked the day on which the student attempted the most problems and examined the student's flow development during that day. If a student worked on multiple units on the same day, only the observations from es1 were retained, removing variance from the other units and problems. Although we carefully prepared the data to exhibit homogeneous measurement properties, we additionally addressed measurement noninvariance whenever a model allowed it (e.g., through random effects).
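To make the extraction concrete, the following is a minimal R (dplyr) sketch of this step. It assumes a long-format log, here called cta_log, with hypothetical columns student_id, unit, date, and problem_id; the actual CTA log uses different field names.

```r
# Keep, for each student, only the es1 rows from the day with the most es1 problems.
# `cta_log` and its column names are hypothetical placeholders for the raw log.
library(dplyr)

best_day <- cta_log %>%
  filter(unit == "es1") %>%
  count(student_id, date, name = "n_problems") %>%    # es1 problems per student-day
  group_by(student_id) %>%
  slice_max(n_problems, n = 1, with_ties = FALSE) %>% # the busiest es1 day
  ungroup() %>%
  select(student_id, date)

es1_day <- cta_log %>%
  filter(unit == "es1") %>%
  semi_join(best_day, by = c("student_id", "date"))   # that day's es1 observations only
```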

Variables

Applying this strategy yielded a subset of N = 2219 students. The students in the final data attempted 50.236 problems on average (SD = 16.102), with a minimum of four and a maximum of 151. For evaluating flow, we examined three indicator variables: the interaction time, the number of erroneous attempts, and the number of hints requested. Each indicator variable was transformed to meet the constraints of the calibration programs. The timing variable was placed on the log metric to approximate normality. The count variables were either used as observed and modeled by Poisson models, or categorized into three ordinal categories (i.e., none, one, and more than one) and modeled by proportional-odds models. Along with the interaction indicators, we also used student-level covariates when inferring state membership and transition behaviors. The covariates were: Pre- and Gain scores on the standardized test, Race (0 = White and Asian, 1 = Black and multiethnic, 2 = Hispanic and Native American), Sex (0 = Female, 1 = Male), and whether the student was enrolled in a free lunch program (0 = No, 1 = Yes).
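The indicator transformations can be sketched as follows; the column names (interaction_secs, n_errors, n_hints) are hypothetical stand-ins for the raw log fields.

```r
# Log-transform the timing variable and discretize the counts into
# none / one / more than one, as described above.
library(dplyr)

es1_day <- es1_day %>%
  mutate(
    log_time  = log(interaction_secs),
    error_cat = cut(n_errors, breaks = c(-Inf, 0, 1, Inf),
                    labels = c("none", "one", "more"), ordered_result = TRUE),
    hint_cat  = cut(n_hints, breaks = c(-Inf, 0, 1, Inf),
                    labels = c("none", "one", "more"), ordered_result = TRUE)
  )
```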

Analysis

With the cleaned data in hand, we performed the analysis in three stages. We first conducted a latent class analysis to examine heterogeneity in the interaction data and investigated whether the identified heterogeneity could be characterized as distinct in-flow and out-of-flow latent classes. Based on the findings from the latent class analysis, we then performed a latent transition analysis to track the progression of latent states across different tutoring stages. In both analyses, we used the sample-level data (i.e., data from all 2219 students) to account for effects of contextual variables (e.g., population characteristics, problem effects). The last-stage analysis was performed on individual student-level data (i.e., each student's interaction series) using hidden Markov models. Unlike the LCM and LTM, which require reshaping the data to reduce the number of measurement events, the HMM can model intensive time series with minimal data transformation. The third-stage analysis drew on this flexibility and applied HMMs to examine students' learning progression over individual problem-solving attempts. Because CTA customized problem assignments to each student's skill level, we surmised that the problems would exhibit weak measurement invariance when conditioned at the student level. We exploited this assumption to track each student's learning progression across individual problems.

Table 1 summarizes the analyses performed in each stage. Each analysis was carefully designed to address the assumptions of the model, the constraints of the calibration program, and the characteristics of the CTA data. It is important to note that, across the analyses, we applied the models assuming a small number of latent states. Because our study was mainly interested in modeling discrete flow states, we focused on the stability of the extracted states, the indicator modalities under each state, and the evolution of the latent states over time. Our supplementary analysis suggests that allowing more states tends to disintegrate the normal flow state into classes characterizing different problem-solving strategies. Although unraveling the flow state in this way can reveal different working processes, the identified features generally require subjective interpretation and are difficult to validate beyond face validity. We therefore limit our attention to a clear bimodality of flow: in-flow and out-of-flow. In the following sections, we discuss the specific analysis conducted under each model, including the model formulation, the analytic strategies for CTA, and the corresponding results.

Table 1 Analysis settings

Profiling flow

Throughout the article, we use the following notation to describe the models. Let Si = (Sit : t = 0,…,T) denote the sequence of flow states that student i went through over T measurement events. The flow state, Sit, takes a nominal value from a finite latent space, \( S_{it} \in {\mathcal S} \equiv \{1 , {\ldots } , M\}\), and is manifested by a set of indicator variables, Yit = (Yijt : j = 1,…,J), where j indexes the different kinds of indicators (e.g., interaction time, behavioral frequency). The collection of indicator outcomes over time, Yi = (Yit : t = 1,…,T), forms a multivariate time series and mirrors the trajectory of flow states over time. When students' background variables are available (e.g., gender, ethnicity), the covariates, Zi = (Zik : k = 1,…,K), can be used in drawing the state profiles. For simplicity, the study assumes time-invariant covariates that remain constant over time; the analyses suggested below, however, can easily be extended to accommodate time-variant covariates.

Random-effect latent class model

The first-stage analysis applied latent class models to examine heterogeneity in the interaction data. Let Yt = (Yijt : i = 1,…,N; j = 1,…,J) denote the sample cross-sectional data observed at measurement time t; the evaluation data were treated as such cross-sectional observations, and students' work was examined at each measurement time. The LCM then profiles students' latent states based on the homogeneity of the observed interaction patterns. The sequence of states identified over time, Si = (Sit : t = 0,…,T), gives the latent profile of student i and defines a unique latent class.

The formulation of the LCM involves two sub-models: (i) a structural model that describes the probability of a latent state, P(Sit = m) (m = 1,…,M), and (ii) a measurement model that describes the probability of the indicators given a latent state, P(Yit|Sit = m). Unlike the other two models discussed below, the LCM does not model transitions between latent states. Instead, it assumes that the latent states evolve independently over time and models the sequence of interaction patterns through a distinct latent class profile.

The LCM for CTA was formulated as follows. The structural model, which describes the probability of a latent state, was formulated as a multinomial logistic regression. In regular settings, the model is parameterized assuming within-class homogeneity; that is, students within the same class are expected to perform homogeneously when solving problems. In CTA (and in many ITS settings), this assumption is generally not tenable because students receive different problems according to their skill levels and progression. The differences in problem assignments induce extra variance in the outcome data, and the observations are no longer independent when conditioned on the latent states. A common approach to addressing this extra variance is to allow random variation when modeling the outcome probabilities (e.g., Qu et al., 1996); that is, a random-effect term is added to the structural model so that the heterogeneity in the outcome probabilities can be explained by both the latent state and the random effect.

In the present setting, the random-effect LCM can be formulated as follows. Let ηi model the idiosyncratic effect of student i on the state probability. The probability of latent state m can then be modeled as

$$ P(S_{it} = m | \eta_{i}, {\boldsymbol Z}_{i} = {\boldsymbol z}_{i}) = \frac{\exp \left( \beta_{m0} + \boldsymbol{\beta}_{m}^{\top} {\boldsymbol z}_{i} + \eta_{i} \right)}{{\sum}_{l=1}^{M} \exp \left( \beta_{l0} + \boldsymbol{\beta}_{l}^{\top} {\boldsymbol z}_{i} + \eta_{i} \right)} $$
(1)

for all t (= 1,…,T). Here, βm0 is an intercept parameter that determines the conditional probability of latent state m when Zi = 0 and ηi = 0, and the slope parameter βm models the effects of the covariates, Zi, on the logit. To ensure identifiability of the model parameters, we assume that \(\eta _{i} \sim \mathcal {N}(0, \sigma _{\eta }^{2})\), with \(\sigma _{\eta }^{2} = \text {Var}(\eta _{i}; i = 1 , {\ldots } , N)\) modeling the magnitude of the extra individual-level variance (i.e., beyond the covariate effects).
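As a small worked illustration of Eq. 1, the function below computes the state probabilities for the two-state case, fixing the linear predictor of the baseline state (State 1, as in the Results section) at zero for identifiability. All parameter values are hypothetical.

```r
# Probability of the non-baseline state (here, State 2) given covariates z,
# intercept beta0, slopes beta, and the student random effect eta.
p_state2 <- function(z, beta0, beta, eta) {
  plogis(beta0 + sum(beta * z) + eta)   # inverse-logit of the linear predictor
}

# Example: a student with a mildly positive random effect
p2 <- p_state2(z = c(-0.5, 1, 0), beta0 = -1.4, beta = c(-0.3, 0.4, 0.1), eta = 0.4)
c(state1 = 1 - p2, state2 = p2)
```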

The measurement model can be formulated in a similar way. Because students in an ITS typically receive different problem sets that differ in their measurement properties, we again introduce problem-level random effects when modeling the indicator variables. The measurement model for an indicator j is parameterized as

$$ P_{j} (Y_{ijt} | S_{it} = m, \delta_{jt}) = f_{j} (\boldsymbol{\psi}_{jm}, \delta_{jt}), $$
(2)

with fj specifying the functional form of the probability measure of indicator j and ψjm giving the parameters of fj (e.g., location, scale). The term δjt models the idiosyncratic effect of problem t on indicator j. The functional form of fj is defined according to the type of variable. For binary variables, common practice is to use a Bernoulli distribution with a probit or logit link. Ordinal variables are typically modeled by cumulative probability functions such as proportional-odds, adjacent-categories, or continuation-ratio logit models (Agresti, 2012). Count and continuous variables can be modeled by Poisson regression and a Gaussian model, respectively. The random-effect term, δjt, in Eq. 2 is parameterized for each j such that \(\delta _{jt} \sim \mathcal {N} (0, \sigma _{\delta _{j}}^{2})\), with \(\sigma _{\delta _{j}}^{2} = \text {Var}(\delta _{jt}; t = 1 , {\ldots } , T)\) modeling the variance across problems in indicator j. For a graphical illustration of the suggested model, see Fig. 1a.

Fig. 1 Latent variable models surveyed. Note. Sit: Latent state of a student i at measurement time t (= 1,…,T). Yijt: Student i's observation data on an indicator j (= 1 (log interaction time); 2 (number of hints asked); 3 (number of errors)) at time t. ηi: Student i's random effect or random intercept. λj: Loading of an indicator j on the random effect. Zi: Student i's background covariates

Integrating the two constituent sub-models, the random-effect LCM is formulated as a finite mixture model:

$$ P(\boldsymbol{Y} | \boldsymbol{Z}, \boldsymbol{\eta}, \boldsymbol{\delta}) = \prod\limits_{i=1}^N \prod\limits_{t=1}^T \sum\limits_{S_{t} \in \mathcal{S}} P(S_{t} | \eta_i, \boldsymbol{Z}_i) \, P(\boldsymbol{Y}_{it} | S_{t}, \boldsymbol{\delta}_t), $$
(3)

where Y, Z, and η each denote an array of variables for all students, and δ = (δt : t = 1,…,T) where δt = (δjt : j = 1,…,J).

The marginal model (3) can be estimated using maximum likelihood (ML) or Bayesian estimators. ML estimation is computationally efficient but can lead to zero variance estimates (Gelman et al., 2013, p. 313). Bayesian estimation requires more computation time, but it gives more stable estimates once converged. In this study, we apply the ML estimator when exploring candidate models and Bayesian estimation when deriving the parameters of the final model. The ML estimation was performed using the L-BFGS algorithm (Nocedal & Wright, 2006) in Stan, called from R (R Core Team, 2020). The Bayesian estimation was implemented using the No-U-Turn Markov chain Monte Carlo (MCMC) sampler (Hoffman & Gelman, 2014) in Stan. Details of the estimation routine, including a discussion of label switching, are presented in Appendix B.
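A sketch of this two-step workflow with rstan is shown below. It assumes a Stan program, relcm.stan, that implements the marginal likelihood in Eq. 3 with the latent states summed out, and a data list stan_data; both names, and the parameter names passed to print(), are hypothetical.

```r
library(rstan)

sm <- stan_model("relcm.stan")                # hypothetical Stan program for Eq. 3

# (i) exploratory fit: (penalized) ML via L-BFGS
ml_fit <- optimizing(sm, data = stan_data, algorithm = "LBFGS")

# (ii) final fit: full Bayesian estimation via the No-U-Turn sampler
mcmc_fit <- sampling(sm, data = stan_data, chains = 4, iter = 2000, seed = 1)

print(mcmc_fit, pars = c("beta0", "beta", "sigma_eta", "sigma_delta"))
```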

Profiling flow in CTA

The analysis model for CTA followed Eqs. 1 and 2, but some adjustments were made in the measurement model to accommodate the CTA design. In CTA, problems were administered adaptively to students' skill levels, and the interaction data formed sparse time series with many measurement points (\({\max \limits } T = 151\)). Introducing problem-level random effects in this case would create computational overhead and hinder convergence of the model estimation. To unify the measurement size and reduce the dimensionality to an estimable degree, we defined the measurement unit as a batch of problems that measure similar skills and applied the random-effect LCMs to the aggregated data. In CTA, problems were arranged according to knowledge components (e.g., Ritter et al., 2007) that measure homogeneous skill sets and were administered successively until a student mastered each knowledge component. We exploited this arrangement and examined the latent process underlying the mastery of each knowledge component.

Defining the measurement unit as a knowledge component resulted in a total of 18 problem batches and a 2219 × 18 (N × T) array of interaction data. We applied the random-effect LCMs to the reshaped data, allowing measurement noninvariance in the indicator variables (i.e., δjt (j = 1,…,3, t = 1,…,18)). The random-effect terms of the three indicators were jointly modeled by a trivariate normal distribution: \(\boldsymbol {\delta } = (\delta _{1t}, \delta _{2t}, \delta _{3t}: t = 1 , {\ldots } , T)^{\top } \sim \mathcal {N}_3(\boldsymbol {0}, \mathbf {\Sigma }_{\boldsymbol {\delta }})\). The measurement models for the indicator variables were formulated as follows. The interaction time, Yi1t, was modeled by a Gaussian model on the log metric:

$$ \log Y_{i1t} | S_{it} = m, \delta_{1t} \sim \mathcal{N} \left( \delta_{1t} + \mu_{1m}, \sigma_{1m}^2 \right) $$
(4)

with unique location and scale parameters for each state. The count variables (the number of errors, Yi2t, and the number of hints requested, Yi3t) were modeled by ordinal logistic regression after being categorized into the zero, one, and more-than-one groups. For example, the probability of the number of errors was modeled as

$$ P(Y_{i2t} = h | S_{it} = m, \delta_{2t}) = \begin{cases} 1 - \text{logit}^{-1} \left( \mu_{2m} + \delta_{2t} - c_{21} \right) & h = 0 \\ \text{logit}^{-1} \left( \mu_{2m} + \delta_{2t} - c_{21} \right) - \text{logit}^{-1} \left( \mu_{2m} + \delta_{2t} - c_{22} \right) & h = 1 \\ \text{logit}^{-1} \left( \mu_{2m} + \delta_{2t} - c_{22} \right) & h > 1 \end{cases} $$
(5)

with the location parameter μ2m varying across the latent states and the step parameters satisfying c22 > c21 > 0. As above, δ2t is the problem-level random effect for errors (j = 2). The number of hints was modeled analogously, with μ3m, δ3t, c31, and c32 replacing μ2m, δ2t, c21, and c22, respectively.
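The following small function evaluates the three category probabilities in Eq. 5 for a given state and problem; the parameter values in the example call are hypothetical.

```r
# Proportional-odds probabilities for the discretized error count (0, 1, >1),
# given the state location mu_m, the problem effect delta_t, and steps c(c21, c22).
p_errors <- function(mu_m, delta_t, cuts) {
  stopifnot(cuts[2] > cuts[1])
  p_ge1 <- plogis(mu_m + delta_t - cuts[1])   # P(Y >= 1)
  p_gt1 <- plogis(mu_m + delta_t - cuts[2])   # P(Y > 1)
  c(`0` = 1 - p_ge1, `1` = p_ge1 - p_gt1, `>1` = p_gt1)
}

# A state with a higher location parameter yields more errors on average
p_errors(mu_m = 1.0, delta_t = 0.2, cuts = c(0.5, 1.8))
```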

Results

Model comparison Table 2 reports fit statistics of the four LCMs considered for the flow evaluation. The models were fit assuming different structural effects while keeping the measurement model the same. The one-state model in the first line assumes that all students exhibited the same state, with no individual differences in the performance outcomes. Since it assumes a single homogeneous latent state, it has no structural component. The following lines give results for the two-state models that assume different student effects. The HomStud model assumes that all students had the same state probability, with logit P(Sit = 1) = β0. The StudEff model allows individual differences in the state probability, logit P(Sit = 1|ηi) = β0 + ηi. The StudCov model specifies the state probability as a function of both the student random effect and the student-level covariates, \(\text{logit}\, P(S_{it} = 1 | \eta _i, \boldsymbol {z}_i) = \beta _0 + \boldsymbol {\beta }^{\top } \boldsymbol {z}_i + \eta _i\).

Table 2 Fit statistics of LCMs

The results in Table 2 suggest that students indeed showed heterogeneity when working with CTA. All information criteria preferred the two-state models over the one-state model. Among the two-state models, the criteria that heavily penalize complexity (i.e., CAIC, BIC, ABIC) recommended the simplest model, HomStud, whereas AIC and the likelihood statistic preferred StudCov. Our additional analysis with a likelihood ratio test suggested that StudCov achieves significantly better fit than StudEff (χ2 = 62, df = 6, p < .001). The posterior state probability estimates also indicated clear heterogeneity across the student demographic profiles. These observations point to StudCov as the most sensible model. We therefore chose StudCov as the final model and examine its outcomes more closely to understand the students' learning modalities.
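For reference, the p-value of the reported likelihood ratio test can be reproduced from the chi-square statistic and degrees of freedom given above.

```r
# Likelihood ratio test of StudCov against StudEff (values from the text)
pchisq(62, df = 6, lower.tail = FALSE)   # far below .001
```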

Measurement model Having identified the final model, we re-estimated its parameters using MCMC to obtain stable parameter values. The ML estimation, though computationally affordable, yielded zero variance estimates for the student random effects (i.e., \(\hat {\sigma }^{2}_{\eta }=0\)) despite the strong indication of heterogeneity in the observed data. The Bayesian estimation, although it required a longer computation time, converged properly, yielding nonzero variance estimates and other point estimates essentially identical to the ML estimates. Below we examine the outcomes of the Bayesian estimation to infer students' learning flow.

Table 3 reports the measurement model parameter estimates. Each row gives the distributional parameters of the indicator variables defined in Eqs. 4 and 5. The last line reports the marginal probabilities of the latent states, E[P(S = m)] (m = 1, 2). The state probability estimates suggest that State 1 was the major latent state underlying the interaction behaviors. Across the evaluated tutoring sessions, students worked under State 1 about 81.02% of the time and under State 2 the remaining 18.97%. The difference statistics in the last column indicate that students working in the second state requested more hints, made more errors, and spent more time than students working in the first state. Note that in CTA students received problems adapted to their skill levels and were generally expected to be in flow while learning. Considering this built-in design, we conclude that State 1 represents the state of flow and State 2 represents deviation from flow. In State 1, students tended to progress in a timely manner, exerting adequate effort. Students in State 2 tended to exhibit prolonged learning behaviors. We note that the present results do not identify the cause of the deviation; we only surmise that multiple factors could have contributed to the deviating state, for example, fatigue, excessive difficulty, disengagement, gaming, or frustration with the problems.

Table 3 Parameters of the measurement model of the final LCM

To understand the degree of heterogeneity between the states, we compared the difference statistics with the between-knowledge differences, as parameterized by the standard deviations (SDs) of the knowledge-level random effects, δ. The random-effects SDs were estimated as \(\hat {\sigma }_{\delta _{1}} = .497\) (95% credible interval [.366, .676]) for the time spent, \(\hat {\sigma }_{\delta _{2}} = 1.387\) (95% CI [1.019, 1.858]) for errors, and \(\hat {\sigma }_{\delta _{3}}=1.529\) (95% CI [1.069, 2.128]) for hints. Observe that the estimated SDs were much smaller than the corresponding differences between the states (i.e., -1.493, -4.193, -12.275). This suggests that the difference between the states was much more important than the differences between the knowledge components.

Covariate effects Table 4 reports the effects of the student-level covariates. The estimation algorithm treated State 1 (i.e., the flow state) as the baseline, and the reported coefficients represent loadings on State 2. A large value means that a student with the corresponding characteristic was more likely to work in the out-of-flow state. Some distinct patterns in Table 4 are worth noting. Students with higher pretest scores and gain scores (i.e., whose standardized test scores increased the most from the beginning to the end of the study) were more likely to work in State 1; students with lower pretest and gain scores were more inclined to work in State 2. Students with different racial backgrounds also showed disparate patterns. Black and multiethnic (RaceBN) students, Hispanic and Native American (RaceHN) students, and, to a lesser extent, male students were more likely to work in State 2 than White or Asian students and female students. Lastly, students who were eligible for free or reduced-price lunches were more apt to work in State 1, though the trend was somewhat weak.

Table 4 Loading of covariates on the out-of-flow state

Examining the SD of the student random effects, \(\hat {\sigma }_{\eta } = .590\) (95% CI [.560, .622]), we found that the estimate was larger in magnitude than any of the covariate coefficients in Table 4 and also larger than the standard deviation of the student-level log-odds of working a problem in State 1 as predicted by the covariates, \(\text {SD} (\hat {\boldsymbol {{\beta }}}^{\top } \boldsymbol {Z} ) \approx .268\). This suggests that important unmeasured student characteristics may predict the latent states; we speculate that one such characteristic is the transitioning of the latent states over time.
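As a point of reference, the covariate-based summary above is simply the standard deviation of the students' predicted log-odds; Z and beta_hat below are hypothetical placeholders for the covariate matrix and the posterior mean coefficients.

```r
# Spread of the covariate-predicted log-odds across students
# (reported in the text as approximately .268)
sd(as.vector(Z %*% beta_hat))
```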

Flow across tutoring stages

Because the latent class analysis revealed heterogeneity in the students' work, a subsequent analysis was performed to examine the evolution of learning behaviors over the tutoring sessions. For this analysis, we partitioned the observed data into different tutoring stages and applied LTMs to track the development of latent states over time.

Random-intercept latent transition model

The LTM was formulated similarly to the LCM but additionally included a transition model. The structural model takes a form similar to Eq. 1 but models only the initial state probability, P(S0 = m) (m = 1,…,M), or P(Si0 = m|Zi = zi) (m = 1,…,M) if covariates are available. The transition model then describes the probability of the ensuing states as a function of the preceding state(s). Assuming a first-order Markov process, the probability of a latent state at time t is modeled as P(St|S0, S1,…,St− 1) = P(St|St− 1). The functional form of P(St|St− 1) is commonly a multinomial logistic regression. If covariates are available, the transition probability can be modeled as

$$ \begin{array}{@{}rcl@{}} P(S_{it} &=& m^{\prime} | S_{i(t-1)} = m, {\boldsymbol Z}_{i} = {\boldsymbol z}_{i})\\ &=& \frac{\exp \left( \gamma_{0m^{\prime}} + \gamma_{mm^{\prime}} + \boldsymbol{\beta}_{m}^{\top} {\boldsymbol z}_{i} \right)}{\displaystyle \sum\limits_{l=1}^{M} \exp \left( \gamma_{0l} + \sum\limits_{l^{\prime}=1}^{M-1} \gamma_{ll^{\prime}} d_{l^{\prime}} + \boldsymbol{\beta}_{m}^{\top} {\boldsymbol z}_{i} \right)}, \end{array} $$
(6)

where m and \(m^{\prime }\) denote distinct latent states in \({\mathcal S} \); \(\gamma _{0m^{\prime }}\) \((m^{\prime } = 1 , {\ldots } , M - 1)\) gives the intercept of the logit of the transition probability; \(\gamma _{mm^{\prime }}\) \((m, m^{\prime } = 1 , {\ldots } , M - 1)\) models the change in the logit between the two states; and βm models the effects of the covariates on the logit. The dummy variable \(d_{l^{\prime }}\) in the denominator indicates the first (M − 1) states (i.e., \(l^{\prime } = 1 , {\ldots } , M - 1\); the last state is the reference category).

The measurement model describes the conditional probability of the manifest indicators as a function of a latent state. One of the important assumptions in formulating the measurement model for a transition analysis is longitudinal measurement invariance. Because the state variable is evaluated across multiple time points, the problems must have constant effects on the measurement outcomes so that the variance in the observed performance can be attributed to the underlying latent state. As alluded to above, this assumption is generally not tenable in CTA because problems were administered adaptively according to the students' skill levels. To address the variance arising from the problem assignments and their measurement properties, an additional mechanism is needed.

Our strategy for addressing the measurement noninvariance was to adopt student-level random-intercept factors. The random-intercept LTM (RI-LTM; Muthén & Asparouhov, 2020) introduces person-level random intercepts into the measurement model so that they absorb the extra variance present in the measurement outcomes. For example, in the context of CTA modeling, the measurement model for indicator j can be formulated as

$$ P_{j} (Y_{ijt} | S_{it} = m, \eta_{i}) = f_{j} \left( \boldsymbol{\psi}_{jm} + \lambda_{j} \eta_{i} \right), $$
(7)

where fj gives the functional form of the probability measure for the indicator, ψjm gives the kernel indicating the effect of state m on indicator j, ηi denotes the student-level random intercept, and λj models the extent to which students induce extra variance in indicator j. The random-intercept factor ηi is set to follow \(\mathcal {N}(0, 1)\) so that λj models the size of the extra variance in each measurement. The parameterization in Eq. 7 can be seen as decomposing the variance into within- and between-subject components, with the slope coefficient λj modeling the average magnitude of the between-subject variance across the problems. Note that, in introducing the random-effect terms, the LCM and LTM assume different parameterizations (see Fig. 1 for a graphical comparison). In the LCM, ηi is assumed to follow \(\mathcal {N}(0, \sigma _{\eta }^{2})\) and has a unit slope, whereas in the RI-LTM, ηi follows \(\mathcal {N}(0, 1)\) and has distinct slopes. The latter parameterization allows flexibility in modeling the measurement noninvariance. For example, the slope coefficient λj in Eq. 7 can be reparameterized to allow time-specific loadings (i.e., λjt) when a variable induces time-varying between-subject variance.

Integrating the sub-models, the random-intercept LTM is formulated as

$$ P({\boldsymbol Y} | {\boldsymbol Z}, \boldsymbol{\eta}) = \prod\limits_{i=1}^{N} \sum\limits_{{\boldsymbol S}_{i} \in {\mathcal S}^{T+1}} P(S_{i0} | {\boldsymbol Z}_{i}) \left( \prod\limits_{t=1}^{T} P(S_{it} | S_{i(t-1)}, {\boldsymbol Z}_{i}) \right) \left( \prod\limits_{t=1}^{T} \prod\limits_{j=1}^{J} P(Y_{ijt} | S_{it}, \eta_{i}) \right), $$

where \({\boldsymbol S}_{i} = (S_{i0}, \ldots, S_{iT})\) denotes a latent state sequence.
(8)

The marginal model in Eq. 8 can be estimated using a standard marginal ML estimator. In this study, we use Mplus (Muthén & Muthén, 2017) to perform the estimation. We note that Mplus does not support intensive time-series data, so the data must be reshaped to accommodate the constraints of the software. The next section presents the strategies applied in the CTA analysis.

Flow transition in CTA

As in the preceding analysis, we used the es1 data observed on the selected single days. The observed data consisted of intensive time series with a maximum of 151 problems. Currently available estimation programs for LTMs (e.g., Mplus, PROC LTA) assume longitudinal data with a small number of time points (e.g., at most about ten) and have limited capacity for modeling intensive time-series data. To evaluate flow in CTA, it was therefore necessary to reshape the raw data and reduce the number of measurement points.

Our strategy was to segment the tutoring into several stages and aggregate the problems within each segment to create short problem series. Because the problems in the evaluation data uniformly measured the same equation-solving skills, we assumed that the problems would exhibit weak measurement invariance. In case the problems nonetheless induced substantial measurement noninvariance, we also experimented with more general RI-LTMs that allow time-variant slopes when modeling the random intercepts.

The data partitioning proceeded as follows. We first set the number of evaluation points to three, four, or five, balancing the granularity of the description against the convergence of the model estimation. We then split each student's data into three-, four-, and five-point time series by allocating approximately equal numbers of observations to each segment. For example, when a student received 56 problems, we sequentially aggregated (18, 18, 20) problems to create a three-point time series. Within each partition, observations were averaged and placed on the calibration scale: the timing variable was averaged and then placed on the log scale, and the count variables were averaged and then placed on the ordinal scale (i.e., indicating no, one, or more than one error/hint). Conducting the transition analysis on all three data sets, we found that the results suggested generally similar patterns of flow development and differed only in model convergence and in the specific parameter estimates. Given these findings, we present the results from the four-point time-series data as a representative example.
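A minimal R sketch of this segmentation is given below. It splits each student's problem sequence into k consecutive segments of roughly equal size and aggregates the indicators within each segment; the column names are hypothetical, and the exact allocation of problems to segments may differ slightly from the scheme used in the analysis.

```r
library(dplyr)

segment_student <- function(d, k = 4) {
  d %>%
    arrange(problem_order) %>%
    mutate(segment = ceiling(k * row_number() / n())) %>%   # k consecutive segments
    group_by(segment) %>%
    summarise(
      log_time  = log(mean(interaction_secs)),              # average, then log
      error_cat = cut(mean(n_errors), c(-Inf, 0, 1, Inf),
                      labels = c("none", "one", "more")),   # average, then discretize
      hint_cat  = cut(mean(n_hints), c(-Inf, 0, 1, Inf),
                      labels = c("none", "one", "more")),
      .groups = "drop"
    )
}

es1_four <- es1_day %>%
  group_by(student_id) %>%
  group_modify(~ segment_student(.x, k = 4)) %>%
  ungroup()
```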

Results

Model comparison Tables 5, 6, 7, and 8 report fit statistics of the models evaluated on the four-time-point interaction data. Table 5 compares the one- and two-state models; the subsequent tables refine the formulation of the selected model. In Table 5, the comparison reveals that the two-state model achieved substantially better fit. All criterion measures consistently preferred the two-state model over the one-state counterpart, suggesting heterogeneity in the students' behaviors. The adjusted likelihood ratio test (Lo et al., 2001) similarly indicated significantly better fit of the two-state model (χ2 = 23320.293, df = 6, p < .001).

Table 5 Fit statistics of LTMs: one- vs. two-state models
Table 6 Fit statistics of LTMs: measurement invariance
Table 7 Fit statistics of LTMs: covariate effects
Table 8 Fit statistics of LTMs: stationarity of transition probabilities

Exploring more complex LTMs with more latent states, we found that models with more states generally led to better fit. The fitted outcomes, however, had a similar bearing on the out-of-flow state. For example, when the LTM was fit with three states, the state identified as out-of-flow in the two-state model remained intact, while the flow state identified in the two-state model separated into two distinct states. Increasing the number of latent states in the LTM thus tended to disintegrate the flow state and differentiate behavioral strategies within the regular problem-solving mode. The inference on the deviant state remained the same across models assuming different numbers of latent states.

Having decided on the two-state model, we performed subsequent analyses to examine variants of the two-state model and determine the final model for CTA. In particular, we investigated models that differ in three key assumptions: (i) measurement invariance, (ii) covariate effects, and (iii) the modeling of the transition probabilities. The analyses are detailed below.

Measurement invariance Table 6 compares three models that assume different degrees of measurement (non)invariance: (i) the model that assumes longitudinal measurement invariance (labeled MI), (ii) the model that assumes time-invariant between-subject residual variance (i.e., the model that includes the random intercept; labeled RI), and (iii) the model that assumes time-variant between-subject residual variance (i.e., the model that allows time-varying effects of the random intercept; labeled VRI). Recall that the random intercept in the LTM was devised to describe the between-subject residual variance present at each time point. The loading of an indicator on the random intercept models the magnitude of the variance caused by the measurement stimuli at the time of evaluation. A nonzero loading coefficient indicates that the problem sets induced non-negligible variance in the students' behaviors. If the estimated loading coefficients are approximately equal across time, the problems entailed similar amounts of variance over time, and this constant variance can be modeled by time-invariant loading coefficients. If the estimated loading coefficients differ substantially over time, the problem sets entailed different amounts of variance across time and must be modeled by time-contingent loading coefficients.

In Table 6, the comparison of the LTMs with and without the random intercept reveals that the random-intercept model achieved significantly better fit (\(\chi ^{2}_{(\text {MI}, \text {RI})} = 589.69\), df = 3, p < .001). This suggests that the problem sets administered across the tutoring indeed induced nonzero residual variance in the outcome variables. When the indicators were allowed to have different loadings over time (i.e., RI vs. VRI), the model achieved even better fit, suggesting that the measurements induced different amounts of residual variance over time. Among the three indicators, the interaction time showed the largest variability across time (i.e., measurement noninvariance), followed by the number of errors and the number of hints. Altogether, the outcomes of the three models suggested that the indicators induced different amounts of measurement variance over time and that it is sensible to include the random intercepts and allow time- and indicator-specific effects.

Covariate effects In Table 7 we compare models with and without the covariates to evaluate covariate effects. The results show that including covariates consistently improved the model fit. All pairwise comparisons preferred the models that allowed covariate effects (e.g., \(\chi ^{2}_{(\text {MI}, \text {MI-C})} = 133.07\), df = 6, p < .001). The information criteria similarly preferred the covariate-integrated models, producing smaller fit statistics. Based on the observations made here, we retain the covariates in the final model and subsequently examine the transition model allowing covariate effects.

Transition probability The last assumption, the stationarity of the transition probabilities, was evaluated by comparing models with different temporal effects. Table 8 reports fit statistics of the evaluated models, which assume either stationary or time-varying state transitions. The stationary transition model (labeled ST) assumes that all predictors had constant effects on the transition probability over time. The time-variant transition models assume that the latent states and covariates had differential effects across the tutoring. Three scenarios were examined in the time-variant transition models: (i) both the preceding latent state and the covariates have time-varying effects on the transition probability (labeled VTP), (ii) only the latent state has a time-variant effect and the covariates have fixed effects (labeled VTS), and (iii) only the covariates have time-varying effects and the latent state has a constant effect (labeled VTC).

The comparison of the models in Table 8 suggests that the time-variant transition models are generally preferred over the stationary model. All criteria achieved the best results when the models allowed time-varying effects. Among the three models that allowed time-variant effects, the model that assumed time-varying state effects and constant covariate effects demonstrated the best fit. Overall, the results suggest that the preceding latent states had substantially different influences on the transition likelihood as time progressed, whereas the effects of the covariates can be modeled as constant if the state effects are given time-varying coefficients.

The comparison of the various LTMs suggested that the model that allows time-variant effects for the random intercept and the latent states (labeled VRI-C-VTS) demonstrates the best fit for the CTA data. Based on this finding, we subsequently examined the outcomes of this final model to infer students' latent flow states. We note that the model identified here also yielded the best results in the other data sets, i.e., when the data were partitioned into three and five segments.

Measurement model Table 9 reports the loading coefficients of the latent variables. The estimates reveal distinct patterns in the indicator variables between the two states. The first state entailed shorter task interaction times, no help requests, and fewer erroneous attempts. The second state led to distinctly longer interaction times, nonzero help requests, and more errors. Examining the prevalence of the states, we found that the first state appeared more frequently than the second (68.58% vs. 31.42%). The frequency of the second state tended to decrease over the first three quarters of the tutoring (36.73%, 29.02%, 27.13%) and increased in the final phase (32.81%). These observations overall suggest that the first state reflects the normal problem-solving state and the second state the state of out-of-flow. Because CTA was assigned only infrequently and students tended to receive a large number of problems once started (50.236 problems on average and 151 at maximum), we surmised that the distinct patterns of the second state are possibly due to warm-up or fatigue effects.

Table 9 Measurement model parameters of the final LTM (VRI, C, VTS): loading of latent variables on indicators

In Table 9, the coefficients related to the random intercept reveal that the indicator variables entailed different amounts of measurement noninvariance. The coefficients for the task interaction time showed the largest variation across time, suggesting that this variable induced the largest amount of longitudinal measurement noninvariance. The discretized error and hint variables showed relatively smaller and more constant loadings, suggesting that they induced little and approximately equal amounts of measurement noninvariance. We note that constraining the loading coefficients of the error and hint variables to constants did not improve the model fit. We therefore retained the final model allowing time-variant loadings on the random intercept.

Transition model Table 10 reports the transition probabilities estimated from the final model. The first entry of each transition matrix was consistently high (.733 on average), suggesting that students in the flow state tended to remain in it. Students exhibiting the deviant state (i.e., State 2) tended to switch to the flow state in the early phase of tutoring but gradually showed a stronger tendency to stay in the out-of-flow state as the tutoring progressed. This pattern supports our hypothesis about the deviant state. Earlier, we stated that the deviant state may reflect a warm-up or fatigue effect. The transition probabilities suggest that the aberrant state identified in the early phase of tutoring likely reflects a warm-up or learning period, whereas the state appearing in the later stage is likely related to loss of motivation (e.g., fatigue, exhaustion). The high transition rate from the deviant to the flow state at the beginning in particular suggests that students who showed slow progress at the outset began to engage in learning as they became familiar with the contents and problems. The decreased transition probability in the later phase suggests that weary students were unlikely to shift back to the flow state in the subsequent phases. The individual students' transition patterns further corroborated this hypothesis: about 63.68% of the students who showed early aberrance received es1 as the first unit of the day, and about 86.13% of the students who showed aberrance in the final stage received es1 as the last unit of the day.

Table 10 Transition probabilities in the final LTM

Table 11 reports the loadings of the latent states and covariates on the transition probabilities. During the estimation, the second state was treated as the baseline, and the coefficients were obtained for the first state (i.e., the normal state). Observe that the first state had consistently positive loadings on the subsequent first state. This means that students who showed the normal state in an earlier problem set tended to maintain the same state in the following problem set. The increasing loading coefficients suggest that the tendency to stay in the same flow state intensified as time progressed.

Table 11 Effects of latent states and covariates on transition probabilities

The effects of the covariates on the transition likelihood varied in both direction and size. The negative loadings of the two ethnicity groups suggest that Black and multiethnic students and Hispanic and Native American students were less likely to transition to the flow state than were White and Asian students. The strong negative loading for the Hispanic and Native American students implies that students in this group showed a greater tendency to exhibit the deviant state. The positive loadings of the other covariates suggest that students with those characteristics had a stronger tendency to move to the flow state. For example, male students showed a greater likelihood of moving to the flow state than female students, and students with higher pre- and gain scores showed a greater tendency to transition to the flow state as they received the next problem set(s). Most notably, the results pointed to a distinct tendency among Hispanic and Native American female students: the effect coefficients suggested that these students were at the greatest risk of showing out-of-flow states, signaling a need for remedial interventions.

State transition We conclude the latent transition analysis with a summary of the most pronounced state patterns. Among the 16 possible transition patterns, the largest group of students (27.27%) showed continued attention to the tasks, with the estimated state pattern S = (1, 1, 1, 1). The second-largest group (12.80%; S = (2, 1, 1, 1)) showed a transition from the deviant to the flow state in the early stage. The third group (9.19%; S = (1, 1, 1, 2)) showed a shift from flow to out-of-flow in the final stage. The next most frequent groups changed midway, showing state patterns (1, 2, 1, 1) (8.70%) and (1, 1, 2, 1) (6.17%). These results suggest that, despite the self-regulated tutoring, many students performed conscientiously on most problems and exhibited deviant behaviors only occasionally. We remark that the present analysis was performed on the aggregated problem sets. The results do not indicate the specific problems on which a student showed deviant behaviors, nor whether the aberrant behaviors occurred continuously across a partition or only on a subset of problems. In the next section, we examine individual students' transition patterns across the problems to draw finer-grained information.

Flow across problems

The last analysis applied hidden Markov models (Burke & Rosenblatt, 1958) to examine flow transition across the individual problems.

Hidden Markov model

As with the LTM, the HMM uses a sub-model to describe the transitions between the latent states. Consider a latent stochastic process, S, that evolves over discrete time points. The person-level subscript i is omitted from the state variable because the analysis is performed separately for each student (see Fig. 1c for a graphical illustration). The series of manifest indicators is then modeled jointly with the corresponding latent states, P(Yi, S). If covariates are available, P(Yi, S) can be modeled as

$$ P({\boldsymbol Y}_{i}, {\boldsymbol S} | {\boldsymbol Z}_{i} ) = P(S_{0} | {\boldsymbol Z}_{i}) \prod\limits_{t=1}^{T} P(S_{t} | S_{t-1}; {\boldsymbol Z}_{i}) \prod\limits_{t=1}^{T} P({\boldsymbol Y}_{it} | S_{t}). $$
(9)

The formulation of Eq. 9 implies three sub-models: the (i) structural model, (ii) transition model, and (iii) measurement model. The structural model defines the probability of an initial state, π = (πm = P(S0 = m) : m = 1,…,M). The transition model describes the transition probability between the latent states, \(\mathbb {P} = (p_{mm^{\prime }} = P(S_{t} = m | S_{t-1} = m^{\prime }): m, m^{\prime } = 1 , {\ldots } , M)\). Observe that the transition model assumes a homogeneous first-order Markov process; the LTM similarly assumed a first-order Markov process but allowed the transition probabilities to vary over time. The last component, the measurement model, describes the emission probability of an indicator given the latent state, P(Yjt|St). Commonly used emission models in HMM applications include Gaussian, Poisson, binomial, gamma, and multinomial models. It is important to note that, in modeling the emission probabilities, the HMM does not typically allow measurement-level parameters; instead, it treats all observations from the same latent state as homogeneous observations from the same emission model and estimates state-specific emission probabilities (or the corresponding distributional parameters) that apply to all within-state observations.
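To make the structure of Eq. 9 concrete, the function below computes the marginal log-likelihood of one student's indicator series with the standard scaled forward recursion, which sums over the latent state sequence. It assumes Y is a T x J numeric matrix with named columns and that the user supplies an emission function returning, for one row of Y, the length-M vector of emission densities; all names and the example emission are hypothetical.

```r
# Scaled forward algorithm for the HMM likelihood in Eq. 9 (covariates omitted).
# pi0: initial-state probabilities (length M); P: M x M transition matrix,
# with P[m, m'] = P(S_t = m' | S_{t-1} = m); emission(y): length-M densities.
hmm_loglik <- function(Y, pi0, P, emission) {
  alpha  <- pi0 * emission(Y[1, ])          # joint of first observation and state
  loglik <- 0
  for (t in seq_len(nrow(Y))[-1]) {
    c_t    <- sum(alpha)                    # scaling constant (avoids underflow)
    loglik <- loglik + log(c_t)
    alpha  <- as.vector((alpha / c_t) %*% P) * emission(Y[t, ])
  }
  loglik + log(sum(alpha))
}

# e.g., under conditional independence of the three indicators within a state:
# emission <- function(y) dnorm(y["log_time"], mu_time, sd_time) *
#                         dpois(y["errors"], lam_err) * dpois(y["hints"], lam_hint)
# where mu_time, sd_time, lam_err, lam_hint are length-M state-specific parameters.
```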

Let τ contain the parameters of the emission probabilities for all indicator variables and states. The HMM is then defined by the triple \(\boldsymbol {\theta } = (\boldsymbol {\pi }, \mathbb {P}, \boldsymbol {\tau })\). For estimating θ, we use the R package depmixS4 (Visser & Speekenbrink, 2010). The model is commonly fit to person-level data under the weak measurement invariance assumption. Although the program allows sample-level fitting, our empirical analysis based on simulated and real data suggests that such estimation tends to experience recurrent convergence problems. To make the inference adequately reliable across the analyses, we apply HMMs to the individual student-level data. This approach also follows existing practice in HMM applications. Based on our empirical analysis, we discuss some possible consequences of fitting at the person level when the problems have distinct measurement effects.
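A per-student fit with depmixS4 might look like the sketch below. The column names, the choice of Poisson emissions for the raw counts, and the hand-computed CAIC are assumptions for illustration; the package itself reports AIC and BIC directly.

```r
library(depmixS4)

# Multivariate HMM for one student: Gaussian emission for log time,
# Poisson emissions for the raw error and hint counts.
fit_student_hmm <- function(d, nstates) {
  mod <- depmix(
    list(log_time ~ 1, n_errors ~ 1, n_hints ~ 1),
    data    = d,
    nstates = nstates,
    family  = list(gaussian(), poisson(), poisson())
  )
  fit(mod)
}

# Fit 1-10 states for one (hypothetical) student data frame `d_student` and
# select the model minimizing CAIC = -2 logLik + npar * (log(T) + 1).
fits <- lapply(1:10, function(m) try(fit_student_hmm(d_student, m), silent = TRUE))
caic <- sapply(fits, function(f) {
  if (inherits(f, "try-error")) return(NA)
  -2 * as.numeric(logLik(f)) + freepars(f) * (log(nrow(d_student)) + 1)
})
best <- fits[[which.min(caic)]]
posterior(best)   # most likely state sequence and state probabilities
```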

Flow progression in CTA

The HMM analysis was performed as follows. The input data included the observations from the N = 2219 students identified in the data preparation stage. The analysis used the same indicator variables and covariates as before but applied minimal transformation, given the flexibility of the estimation software. Specifically, we log-transformed the task interaction time and used the count variables as collected.

Several distinctions of the HMM analysis are worth mentioning. First, in the preceding analyses we categorized the count variables into three levels to accommodate the constraints of the estimation programs. The HMM analysis, however, used the raw counts and is therefore expected to show greater sensitivity to the manifest variables. In some cases, large variation in the indicator values can lead to overextraction of latent states because of the increased heterogeneity.

Second, unlike the other models, the HMMs were fit to individual student-level data and do not account for systematic effects of the problems. In CTA, students received different problem sets in various orders, which made it difficult to unravel the resulting variance in the measurement outcomes with the current HMM estimation software. Our preliminary study based on simulated data suggests that ignoring such measurement variance can result in overextraction of latent states. The current study attempted to alleviate this tendency by (i) focusing on a single unit of homogeneous problems (i.e., es1), (ii) conducting the analysis in an exploratory manner, (iii) using a conservative information criterion for the final model selection, and (iv) giving greater priority to models with a clear binary interpretation. Specifically, when fitting the HMMs, we assumed one to ten latent states and determined the final model by applying the most restrictive criterion, the CAIC (Burnham & Anderson, 2002; Sugiura, 1978). The range of the number of latent states was determined in a preliminary analysis by weighing the interpretability and generalizability of the models. Although more complex models are conceivable, we did not explore these possibilities because our primary goal was to make a binary decision on the latent states that can be characterized as in- and out-of-flow. When the initial model fitting suggested multiple states, we refit the two-state HMM to see whether the estimated states could be simplified to two states.

Third, recall that all covariates used in this study were time-invariant. When HMMs are fit to person-level data with time-invariant covariates, the covariates become constants and play no part in estimating the transition probabilities. Our supplementary analysis suggests that, although the constant covariates function as incidental variables, they can help in choosing a parsimonious solution. As another way of mitigating the measurement noninvariance problem, we therefore retained the covariates when fitting the HMMs, though no particular inference can be attached to the covariate effects.

The outcomes of the HMM analysis were evaluated in three respects: (i) the complexity of the latent process, (ii) the progression of latent states across tutoring (e.g., when is a student most likely to lose attention?), and (iii) consistency with the preceding analyses. Below, we present the results of the HMM analysis and describe three representative students who showed typical transition patterns.

Results

As with the other analyses, we applied information criteria to determine the final model. Among the criteria evaluated, ABIC and AIC suggested the most complex models, with a minimum of one and a maximum of ten states. CAIC and BIC suggested simpler models with a maximum of four states. In the following, we present the results from the CAIC, which led to the most parsimonious and clearest model solutions. The outcomes under the CAIC suggest that about 39.97% of students (N = 887) displayed a single homogeneous latent state as they advanced through the problems, 51.83% (N = 1150) displayed two states, 8.07% (N = 179) three states, and .14% (N = 3) four states. There was a weak correlation between the estimated number of states and the number of observations (r = .241), suggesting that students with more problem attempts tended to display more diverse states.

Figures 2, 3, and 4 present three example students who were suspected of falling out of flow during tutoring. The three students were identified as displaying two, three, and four states, respectively, across the evaluated measurement times. The first three plots at the top of each figure present the observation series from the indicators. The last plot shows the sequence of latent states estimated across time. The dotted vertical lines indicate the points at which the problems were demarcated in the LTM analysis. Along with the figures, we also present estimates of the emission and transition model parameters in Table 12.

Fig. 2 An example student displaying two states. Note. State 1 was considered a flow state. The dotted vertical lines indicate the points at which the problems were demarcated in the LTM analysis

Fig. 3 An example student displaying three states. Note. State 1 was considered a flow state. The dotted vertical lines indicate the points at which the problems were demarcated in the LTM analysis

Fig. 4 An example student displaying four states. Note. State 1 was considered a flow state. The dotted vertical lines indicate the points at which the problems were demarcated in the LTM analysis

Table 12 Emission and transition model parameters of HMMs

The student in Fig. 2 worked on a total of 57 problems over one hour and 43 minutes. As can be seen, the student maintained the same state in most cases but displayed somewhat deviant behavior toward the end of tutoring. In the early stage, the student solved most problems on his own and rarely made errors (.087 on average), resulting in an average interaction time of 54.49 seconds per problem. As tutoring progressed, the student made more errors (6.909 on average) and requested more help (2.727 hints), spending an average of 2.794 minutes per problem. We restate that the problems assigned in CTA were adapted to the student's algebraic ability, so radical changes in the outcome values are not generally expected. The continued digression from the normal state suggests that the student was either paying little attention or struggling with the problems. The transition probability estimates in Table 12 suggest that once the student adopted a particular mode, he tended to maintain the same mode in the subsequent problems. The probabilities of staying in the same state were .934 and .798 for the two states, respectively.
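Because each row of a transition matrix sums to one, the two reported staying probabilities fully determine this student's 2 × 2 transition matrix (row 1 gives the transition probabilities from State 1, the flow state; row 2 from State 2):

$$
\mathbf{A} \;=\; \begin{pmatrix} .934 & .066 \\ .202 & .798 \end{pmatrix}.
$$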

Figure 3 presents the transition pattern of a student who displayed three states. The student worked on 79 problems in 63 minutes. As evident from the figure, the student displayed distinct behaviors in each state. When in State 1, he seldom made mistakes (.143 on average), asked for no hints, and spent little time on the problems (21.224 seconds on average). As he approached the midpoint of tutoring, he tended to make more errors and began asking for help. Specifically, when the student was in State 2 (28th–51st problems), he made .417 errors and asked for 1.792 hints on average. When he was in State 3, he showed a stronger tendency to make errors (4.167) and ask for help (3.833), and he spent substantially longer on the problems (3.584 minutes). Across the evaluated time periods, he showed State 1 most frequently (62.03%, T = 49 items), followed by State 2 (30.38%, T = 24) and State 3 (7.59%, T = 6). Combining the results, we conclude that States 2 and 3 reflect the out-of-flow state. In both states, the student spent more time than usual and made distinctly more attempts; the two states differed only in the magnitude of the indicators, with State 3 reflecting greater detachment. When we refit the HMM assuming two latent states, States 2 and 3 indeed merged into one deviant state, corroborating this interpretation. The transition model in Table 12 suggests that the student showed a strong propensity to remain in States 1 and 2 once they were adopted. The staying probabilities in these states equaled .959 and .957, respectively, indicating high persistence. When the student was in State 3, he maintained the same state with probability .598 and moved to State 2 or State 1 with probabilities of .204 and .198, respectively.

Figure 4 presents the state transition pattern of a student displaying four states. The student attempted 50 problems in 51 minutes. Figure 4 shows that the student rarely made errors and asked for only a few hints when working on the first three-quarters of the unit. As he worked on the last quarter, he began to show deviating behaviors, making many mistakes and asking for multiple hints. The indicator patterns in each state suggest that when the student was in State 1 or 2, he made no or very few mistakes (.250 on average) and asked for no help. When in State 3, the student made 5.333 errors and asked for more help (.833 hints on average). In State 4, the student showed a sharp increase in both indicators, making 24.333 mistakes and asking for five hints on average. The total task interaction time similarly indicated different degrees of disengagement. In the four states, the student spent on average 19.873 seconds, 49.216 seconds, 1.470 minutes, and 5.829 minutes on the problems, respectively, indicating that he spent substantially longer when in States 3 and 4. Across the 50 problems attempted, the student displayed States 1 and 2 most frequently (42%, T = 21 and 40%, T = 20, respectively) and States 3 and 4 less frequently (12%, T = 6 and 6%, T = 3, respectively). When a two-state HMM was fit to the same data, States 1 and 2 were grouped together to form one state, describing 82% of the observations (T = 41 problems), and States 3 and 4 combined to form the other state, explaining 18% of the observations (T = 9 problems). These results suggest that States 3 and 4 indicate the deviant state. In both states, the student tended to make distinctly large numbers of errors and lingered on the problems longer. The transition probability estimates in Table 12 suggest that the student showed high persistence when in State 1 or 2 (.780 staying probability on average). When he was in State 3 or 4, he showed greater mobility, most often moving to State 2 (.605) and State 3 (.667), respectively.

Conclusion

The purpose of this study was to explore a latent variable modeling approach for tracking learning flow in ITS data. The study considered three models that give discrete profiles of latent states and applied them to log data from CTA to demonstrate their use. In characterizing learning flow, the study focused on three aspects: (i) the progression of flow states across tutoring, (ii) interaction modalities under in- and out-of-flow states, and (iii) the relation between flow and students' demographic profiles. The experimental application suggests that the models can reveal substantive information about students' flow process. Despite differences in their assumptions and data constraints, the models yielded consistent findings on the flow patterns and on students' learning behaviors under different flow states.

The models were applied in three successive phases. The first phase applied the latent class models to identify latent profiles underlying the knowledge-level data. The study in particular applied random-effect LCMs to account for extra variance in the students and measurement stimuli. For fitting the models, the evaluation data were rearranged at the knowledge level to regulate the dimensionality of the measurement times. The results from the latent class analysis suggested that students generally showed uniform behaviors when working on the CTA. A small group of students (18.97%) displayed deviant behaviors, spending distinctly longer time, asking for more help, and making more errors. The latent profile estimates suggested that students with lower pre- or gain-scores, students in minority groups, and male students were more prone to deviate from the flow state.

As the latent class analysis revealed that students displayed distinct behaviors under different learning states, the subsequent analysis examined the evolution of latent states across tutoring windows. The state profiles identified in the latent class analysis gave only a snapshot of the states underlying each measurement time and did not provide a full picture of flow development across tutoring. The second analysis was conducted to evaluate this progression of flow states across the tutoring sessions. Specifically, the study applied the latent transition models to explicitly model the evolution of flow states across the problem series. To model differing amounts of measurement noninvariance, the models were modified to include person-level random intercepts. The results from the transition analysis suggested that the majority of students showed adequate persistence in general (about 68.58% of the time), spending a reasonable amount of time and making conscientious efforts. At other times, students showed deviating behaviors (31.42%), dawdling on the problems and making distinctly more attempts. The deviant behaviors appeared more prevalently in the beginning and final stages of tutoring, possibly suggesting warm-up effects or fatigue. The transition probability estimates suggested that students with higher pre- and gain-scores and male students had a stronger tendency to stay in flow, whereas female students and students in minority ethnic groups were more likely to transition to the deviant state.Footnote 6

The last analysis examined flow development across individual problems using the hidden Markov models. The latent transition analysis revealed heterogeneity in the students' behaviors and in their flow transitions over time, but that analysis was performed on aggregated data to accommodate the constraints of the estimation software. The HMM allows a finer-grained description of flow transition at the individual measurement level and is more flexible in modeling different kinds of indicator variables. In this study, HMMs were applied to examine the flow states underlying individual problem-solving attempts. The models were fit to the student-level data to address possible measurement noninvariance. The outcomes were generally consistent with those from the LCM and LTM analyses while giving a more detailed description of flow progression and revealing the specific problems at which students deviated from flow.

As described, the findings from the three analyses were generally consistent and indicated similar conclusions about the flow states and related patterns. The information from each model was nonetheless distinct enough to warrant separate attention, and the models appeared to complement one another, together giving a comprehensive overview of the students' behaviors and underlying states.

While the present study demonstrated the potential utility of the latent variable models for describing ITS data, it also revealed room for further improvement. As illustrated throughout the analyses, the models required additional modifications to accommodate their assumptions and estimation constraints. The latent class model needed random-effect terms to account for the extra variance inherent in the students and measurement stimuli; introducing a large number of random-effect terms, however, created a challenge in fitting the models and necessitated corrective steps. In addition, the model did not explicitly describe the progression of latent states, limiting the inference to each measurement time. The latent transition model addressed some of these limitations, for example, by explicitly modeling the state transitions and using a more affordable estimation routine. The model, however, placed strict constraints on the number of measurements and on measurement invariance. The currently available estimation programs are not generally suited for intensive time-series data and require transformation or recoding of the indicator variables to meet their distributional assumptions. Among the three latent variable models, the hidden Markov model was the most flexible in modeling different kinds of indicator variables and imposed the fewest constraints. This flexibility was, however, achieved at the cost of ignoring systematic effects of the measurement stimuli; consequently, the model was more susceptible to random fluctuations in the indicators and more likely to overextract latent states.

Our empirical analysis of the CTA data suggests that, while the current latent variable models have merits, developing a more comprehensive and systematic modeling framework would be desirable. For example, the current modeling frameworks can be extended to accommodate the various indicator variables and parametric distributions commonly observed in interactive ITS. The framework can also be advanced with a more efficient estimation program that supports the analysis of intensive time-series data and allows the integration of covariate information as well as prior information on the latent states. With an efficient estimation routine, the model could also be used online to signal changes in behavior in real time.
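As a rough illustration of such online use (not part of the present study), the sketch below applies the standard HMM forward-filter recursion to update the posterior state probabilities after each new problem attempt; the emission function, threshold, and flagging step are hypothetical placeholders.

```python
# Illustrative sketch: online state filtering with an already-fitted HMM.
# After each new problem attempt, the posterior over latent states is updated
# recursively; a warning could be raised when the probability of the
# out-of-flow state exceeds a chosen threshold.
import numpy as np

def filter_step(prev_posterior, trans_mat, emission_probs):
    """One forward-filter update.
    prev_posterior : (K,) posterior over states after the previous observation
    trans_mat      : (K, K) transition matrix, rows summing to 1
    emission_probs : (K,) likelihood of the new observation under each state
    """
    predicted = prev_posterior @ trans_mat   # one-step-ahead state prediction
    unnorm = predicted * emission_probs      # weight by the emission likelihood
    return unnorm / unnorm.sum()             # normalized posterior

# Hypothetical usage: state 0 = flow, state 1 = out of flow
# posterior = initial_probs
# for obs in problem_stream:
#     posterior = filter_step(posterior, trans_mat, emission_likelihood(obs))
#     if posterior[1] > 0.8:   # threshold chosen purely for illustration
#         flag_out_of_flow()
```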

Open Practices Statement

The program code and data that support the findings of this study are available from the authors’ GitHub pages: https://github.com/HyeonahKang/LatentVariableModel-IntelligentTutor; https://github.com/adamSales/SELS.