Abstract
The Bayesian diagnosis tracing (BDT) model replaces the generic “wrong” response in the classical Bayesian knowledge tracing (BKT) model with a vector of procedure misconceptions. Using a novel dataset of actual student responses, this paper shows that the BDT model offers better interpretability of the latent factor than the BKT model and, in some specifications, a minor improvement in out-of-sample predictive performance.
1 Introduction
1.1 Motivation
In our frequent exchanges with front-line teachers, a question often arises: “What does the 84% mastery mean in reality? Could you show us what students actually submitted?” Teachers are interested not only in predicting whether a student gets a question wrong, but also in how the student gets it wrong. For example, the most frequent wrong answer to 54 − 26 is 38: students forget that a ten has been traded from the tens digit. A less frequent wrong response is 32, which is caused by misunderstanding the rule of decomposition and treating the larger digit in each place as the minuend (5 − 2 = 3, 6 − 4 = 2). The latter error exposes a more critical procedure misconception about subtraction. However, the Bayesian Knowledge Tracing (BKT) model (Corbett and Anderson [1]) cannot answer the question of “how” because it implicitly assumes that the response is a binary variable, so all wrong responses are qualitatively the same.
1.2 Literature Review
Pelánek [2] and Desmarais and Baker [3] review the recent literature on extending the BKT model. Among these extensions, the most influential innovations are contextual slip and guess parameters (Baker et al. [4]), individualized models (Yudelson et al. [5], Pardos and Heffernan [6]), and deep knowledge tracing (Piech et al. [7]). Instead of elaborating the latent factor structure, this paper proposes to enlarge the set of observations. This idea draws inspiration from VanLehn's work on procedure misconceptions [8]. Liu et al. [9] encode the misconceptions in the structure of the knowledge components. In contrast, this paper treats the misconceptions as observable responses.
2 Diagnosis of Procedure Misconceptions
2.1 Dataset
The dataset comes from Optical Character Recognition (OCR) of mental arithmetic practice booklets. Mental arithmetic means that no vertical (column) procedure is written out. A student writes the answers in the booklet and takes a photo. An app automatically marks the photographed booklet so that the teacher does not need to. Figure 1 is a screenshot of a marked booklet.
The paper extracts two-digit subtraction items from the OCR data submitted during December 2018. It excludes students who practiced fewer than 5 times or more than 200 times. The remaining dataset includes 627,330 practices from 22,395 students, with an overall correct rate of 92%.
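The exclusion rule above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the record layout (`student_id`, `is_correct`) and the function names are assumptions, and the bounds are taken to be inclusive (5 to 200 practices are kept).

```python
from collections import Counter

def filter_students(practices, min_count=5, max_count=200):
    """Keep practices of students whose practice count lies in [min_count, max_count].

    `practices` is a list of (student_id, is_correct) tuples; this layout is
    a hypothetical stand-in for the real OCR records.
    """
    counts = Counter(student_id for student_id, _ in practices)
    return [
        (student_id, is_correct)
        for student_id, is_correct in practices
        if min_count <= counts[student_id] <= max_count
    ]

def correct_rate(practices):
    """Share of practices marked correct."""
    return sum(1 for _, is_correct in practices if is_correct) / len(practices)

# Toy data: student "a" has too few practices, student "c" has too many.
toy = (
    [("a", True)] * 4
    + [("b", True)] * 5
    + [("b", False)] * 5
    + [("c", True)] * 201
)
kept = filter_students(toy)
print(len(kept), correct_rate(kept))
```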
2.2 Misconception Diagnosis
This paper identifies the following procedure misconceptions: forgetting to borrow a ten (54 − 26 = 38), missing one (54 − 36 = 27/39), missing the tens digit (54 − 36 = 8), and a general misconception of subtraction. The last category includes unnecessarily trading a ten from the next digit (56 − 24 = 22) and treating the larger digit in each place as the minuend (54 − 26 = 32). “Skip” is not a procedure misconception, but it is frequent enough to merit its own category: the student leaves a line empty (“54 − 26 = _”) or fills it with a number from the expression (“54 − 26 = 26”). Table 1 lists the distribution of wrong responses.
It should be noted that more than half of the wrong responses are not diagnosed: even for such a simple arithmetic operation, the distribution of misconceptions has a very long tail.
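The categories above can be expressed as a small rule-based classifier. The sketch below is an assumption-laden illustration rather than the diagnosis procedure used in the paper: the category labels, the order in which rules are tried, and the reading of “miss one” as a response that differs from the correct answer by exactly one are all choices made here for the example.

```python
def diagnose(minuend, subtrahend, response):
    """Map a response to a two-digit subtraction item onto a category label.

    `response` is an int, or None when the line was left empty.
    Labels and rule order are illustrative assumptions.
    """
    correct = minuend - subtrahend
    if response == correct:
        return "correct"
    # Empty line, or a number copied from the expression itself.
    if response is None or response in (minuend, subtrahend):
        return "skip"

    m_tens, m_ones = divmod(minuend, 10)
    s_tens, s_ones = divmod(subtrahend, 10)
    needs_borrow = m_ones < s_ones

    # Borrowed for the ones digit but did not reduce the tens digit.
    if needs_borrow and response == correct + 10:
        return "forget_borrow"
    # Larger digit treated as the minuend in each place.
    if needs_borrow and response == 10 * (m_tens - s_tens) + (s_ones - m_ones):
        return "general_misconception"
    # Traded a ten although no borrowing was needed.
    if not needs_borrow and response == correct - 10:
        return "general_misconception"
    # Assumed reading of "miss one": off by exactly one.
    if abs(response - correct) == 1:
        return "miss_one"
    # Only the ones digit of a two-digit answer was written.
    if correct >= 10 and response == correct % 10:
        return "miss_tens_digit"
    return "undiagnosed"

for item in [(54, 26, 38), (54, 26, 32), (56, 24, 22), (54, 36, 8), (54, 26, None)]:
    print(item, diagnose(*item))
```

Any response that no rule matches falls into "undiagnosed", which mirrors the long tail noted above.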
3 Bayesian Diagnosis Tracing Model
The misconception-as-observation model is called the Bayesian Diagnosis Tracing (BDT) model, to distinguish it from the classical BKT model [5, 10, 11]. The BDT model has three sets of parameters: the priors (\( P \)), the transition matrix (\( T \)) and the emission matrix (\( E \)). The likelihood function of the BDT model is given in Eq. (1) [12] (Fig. 2).
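For readers who want to see how a likelihood of this kind is evaluated, the following is a minimal sketch of the standard forward algorithm for a discrete hidden Markov model, in which each observation is a response category rather than a binary correct/wrong flag. It is the textbook HMM likelihood, \( \sum_{x_{1:n}} P_{x_1} \prod_{t=2}^{n} T_{x_{t-1} x_t} \prod_{t=1}^{n} E_{x_t y_t} \), and is not claimed to reproduce Eq. (1) exactly. All numeric values and the three-category coding (0 = correct, 1 = forget borrowing, 2 = skip) are hypothetical.

```python
def sequence_likelihood(prior, transition, emission, observations):
    """Forward algorithm: probability of an observation sequence under a discrete HMM.

    prior[i]         : P(first latent state = i)
    transition[i][j] : P(next state = j | current state = i)
    emission[i][k]   : P(response category = k | state = i)
    observations     : list of response-category indices
    """
    n_states = len(prior)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [prior[i] * emission[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states)) * emission[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-state parameters (state 0 = no mastery, state 1 = mastery).
PRIOR = [0.4, 0.6]
TRANSITION = [[0.80, 0.20],
              [0.05, 0.95]]
EMISSION = [[0.60, 0.20, 0.20],   # no mastery: correct, forget borrowing, skip
            [0.95, 0.04, 0.01]]   # mastery

print(sequence_likelihood(PRIOR, TRANSITION, EMISSION, [1, 0, 0]))
```

A quick sanity check for any implementation is that the likelihoods of all possible sequences of a fixed length sum to one.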
3.1 Two-State Latent Factor Model
The classical BKT model does not allow for forgetting. However, that specification performs poorly on this dataset, so the BKT model reported in this paper has a full transition matrix. For the sake of comparison, the BDT parameters are reformatted in the form of BKT by ignoring the intermediate state. Table 2 shows that the two models have very similar parameters. The out-of-sample AUC of both models is around 0.943. In the simplest latent structure, the two models are essentially equivalent.
Table 3 reports the BDT emission probabilities. Students in the mastery state do not skip, nor do they exhibit two of the misconceptions (general misconception and missing the tens digit).
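One plausible way to relate a multi-category emission matrix to BKT's guess and slip parameters is to read them off the “correct” column. The sketch below illustrates that reading only; it is an assumption about how the comparison could be made, not the reformatting used for Table 2, and the emission values are hypothetical.

```python
def collapse_to_bkt(emission, correct_index=0, unmastered=0, mastered=1):
    """Read BKT-style guess and slip off a multi-category emission matrix.

    guess = P(correct | no mastery)
    slip  = 1 - P(correct | mastery), i.e. the total mass on all wrong categories
    """
    guess = emission[unmastered][correct_index]
    slip = 1.0 - emission[mastered][correct_index]
    return {"guess": guess, "slip": slip}

# Hypothetical emission rows: correct, forget borrowing, skip.
example_emission = [[0.60, 0.20, 0.20],
                    [0.95, 0.04, 0.01]]
bkt_params = collapse_to_bkt(example_emission)
print(bkt_params)
```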
3.2 Three-State Latent Factor Model
This section employs a three-state model (No Mastery, Intermediate, Mastery) to better illustrate the benefit of treating misconceptions as observations. For better parameter convergence, the latent factor can only transition to an adjacent state. For the theoretical motivation of this specification, see Chap. 1 of Feng [4].
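The adjacency restriction corresponds to a tridiagonal transition matrix: direct jumps between No Mastery and Mastery have probability zero. The sketch below builds such a matrix; it assumes that both upward and downward moves are allowed (consistent with the full transition matrix used earlier, but an assumption nonetheless), and the probabilities are hypothetical.

```python
def adjacent_transition_matrix(up, down):
    """Build a 3x3 transition matrix that only allows moves to adjacent states.

    States: 0 = No Mastery, 1 = Intermediate, 2 = Mastery.
    up[i]   : P(move from state i to state i + 1), for i = 0, 1
    down[i] : P(move from state i + 1 to state i), for i = 0, 1
    """
    matrix = [[0.0] * 3 for _ in range(3)]
    matrix[0][1] = up[0]
    matrix[1][2] = up[1]
    matrix[1][0] = down[0]
    matrix[2][1] = down[1]
    for i in range(3):
        stay = 1.0 - sum(matrix[i])  # remaining mass stays in the same state
        if stay < 0.0:
            raise ValueError("outgoing probabilities of a state exceed 1")
        matrix[i][i] = stay
    return matrix

T3 = adjacent_transition_matrix(up=[0.30, 0.20], down=[0.05, 0.02])
for row in T3:
    print(row)
```

Fixing the two corner entries at zero removes two free parameters, which is one reason such a restriction can help convergence.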
Table 4 reports the emission probabilities. The latent states of the BDT model are more interpretable than those of the BKT model: the no-mastery state skips a lot; the intermediate state is prone to various misconceptions; and the mastery state performs almost perfectly except for the most common misconceptions. Interpretable states are not only easy to communicate but also helpful in constructing remedial instruction. In this case, students who skip and students who slip should be treated differently: no-mastery students may need heavy intervention, such as an interactive course or video tutoring, while intermediate students can receive lightweight help, such as a hint or more practice.
Besides the gain in interpretability, the BDT model also has better out-of-sample predictive performance. The out-of-sample AUC of the BDT model is 0.9243, while that of the BKT model is 0.9038.
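To make the link between response categories and latent states concrete, the sketch below applies Bayes' rule to a single observed response. The emission values are hypothetical and are not those of Table 4; they are chosen only to mimic the qualitative pattern described above (skips concentrated in no mastery, misconceptions in the intermediate state).

```python
def posterior_state(belief, emission, obs):
    """Bayes update of the belief over latent states after one observed response.

    belief[i]      : prior P(state = i)
    emission[i][k] : P(response category = k | state = i)
    obs            : index of the observed response category
    """
    joint = [belief[i] * emission[i][obs] for i in range(len(belief))]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [value / total for value in joint]

# Hypothetical emissions; columns: correct, misconception, skip.
EMISSION_3 = [[0.50, 0.10, 0.40],   # No Mastery
              [0.70, 0.28, 0.02],   # Intermediate
              [0.97, 0.03, 0.00]]   # Mastery
UNIFORM = [1 / 3, 1 / 3, 1 / 3]

after_skip = posterior_state(UNIFORM, EMISSION_3, obs=2)
after_misconception = posterior_state(UNIFORM, EMISSION_3, obs=1)
print(after_skip)
print(after_misconception)
```

With these illustrative numbers, a skip moves most of the belief to No Mastery and a misconception moves it to Intermediate, whereas a binary “wrong” observation could not separate the two cases.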
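For reference, AUC can be computed directly from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below is a generic implementation on toy data; it is not the evaluation code behind the figures above, and which outcome is coded as the positive class is left to the user.

```python
def auc(labels, scores):
    """Area under the ROC curve from its pairwise (Mann-Whitney) definition.

    labels : 0/1 outcomes
    scores : predicted probability of the positive class
    The double loop is quadratic; a rank-based version is preferable for large data.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(auc(toy_labels, toy_scores))
```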
4 Discussion
This paper explores the benefit of using procedure misconceptions as observations in a hidden Markov model (HMM). Compared with the BKT model, the BDT model is more accurate in prediction and more interpretable in diagnosis when the latent state has more than two levels. However, there is more work to be done. First, little is known about the tail of the misconception distribution, and diagnosing it could improve BDT performance. Second, the BDT model has great potential for analyzing problems that involve multiple knowledge components, because the identified misconceptions can point to the component(s) at fault.
References
Corbett, A.T., Anderson, J.R.: Knowledge tracing: modeling the acquisition of procedural knowledge. User Model. User Adapt. Interact. 4(4), 253–278 (1994)
Pelánek, R.: Bayesian knowledge tracing, logistic models, and beyond: an overview of learner modeling techniques. User Model. User Adapt. Interact. 27(3–5), 313–350 (2017)
Desmarais, M.C., Baker, R.S.: A review of recent advances in learner and skill modeling in intelligent learning environments. User Model. User Adapt. Interact. 22(1–2), 9–38 (2012)
Feng, J.: Essays on learning through practice. Doctoral dissertation, The University of Chicago (2017)
Yudelson, M.V., Koedinger, K.R., Gordon, G.J.: Individualized Bayesian knowledge tracing models. In: Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (eds.) AIED 2013. LNCS (LNAI), vol. 7926, pp. 171–180. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39112-5_18
Pardos, Z.A., Heffernan, N.T.: Modeling individualization in a Bayesian networks implementation of knowledge tracing. In: De Bra, P., Kobsa, A., Chin, D. (eds.) UMAP 2010. LNCS, vol. 6075, pp. 255–266. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13470-8_24
Piech, C., et al.: Deep knowledge tracing. In: Advances in Neural Information Processing Systems, pp. 505–513 (2015)
VanLehn, K.: Mind Bugs: The Origins of Procedural Misconceptions. MIT Press, Cambridge (1990)
Liu, R., Patel, R., Koedinger, K.R.: Modeling common misconceptions in learning process data. In: Proceedings of the Sixth International Conference on Learning Analytics and Knowledge, pp. 369–377. ACM (2016)
Piech, C., et al.: Deep knowledge tracing. In: Advances in Neural Information Processing Systems, pp. 505–513 (2015)
Käser, T., Klingler, S., Schwing, A.G., Gross, M.: Beyond knowledge tracing: modeling skill topologies with Bayesian networks. In: Trausan-Matu, S., Boyer, K.E., Crosby, M., Panourgia, K. (eds.) ITS 2014. LNCS, vol. 8474, pp. 188–198. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07221-0_23
Ghahramani, Z.: An introduction to hidden Markov models and Bayesian networks. In: Hidden Markov Models: Applications in Computer Vision, pp. 9–41 (2001)
© 2019 Springer Nature Switzerland AG
Feng, J., Zhang, B., Li, Y., Xu, Q. (2019). Bayesian Diagnosis Tracing: Application of Procedural Misconceptions in Knowledge Tracing. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds) Artificial Intelligence in Education. AIED 2019. Lecture Notes in Computer Science(), vol 11626. Springer, Cham. https://doi.org/10.1007/978-3-030-23207-8_16
DOI: https://doi.org/10.1007/978-3-030-23207-8_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23206-1
Online ISBN: 978-3-030-23207-8
eBook Packages: Computer Science, Computer Science (R0)