Bayesian Diagnosis Tracing: Application of Procedural Misconceptions in Knowledge Tracing

Feng, Junchen; Zhang, Bo; Li, Yuchen; Xu, Qiushi

doi:10.1007/978-3-030-23207-8_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11626)

Included in the following conference series:

International Conference on Artificial Intelligence in Education

Abstract

The Bayesian diagnosis tracing (BDT) model replaces the generic “wrong” response in the classical Bayesian knowledge tracing (BKT) model with a vector of procedure misconceptions. Using a novel dataset of actual student responses, this paper shows that, compared with the BKT model, the BDT model yields a more interpretable latent factor and a minor improvement in out-of-sample prediction in some specifications.

1 Introduction

1.1 Motivation

In our frequent exchanges with front-line teachers, a question often arises: “What does the 84% mastery mean in reality? Could you show us what students actually submitted?” Teachers are interested not only in predicting whether a student gets a question wrong, but also in how the student gets it wrong. For example, the most frequent wrong answer to 54 − 26 is 38: students borrow for the ones digit but forget to deduct the traded ten from the tens digit. A less frequent wrong response is 32, which results from misunderstanding the rule of decomposition and treating the larger number in each digit as the minuend (5 − 2 = 3, 6 − 4 = 2). The latter error exposes a more critical procedure misconception of subtraction. However, the Bayesian Knowledge Tracing (BKT) model (Corbett and Anderson [1]) cannot answer the question of “how” because it implicitly assumes that the response is a binary variable, so all wrong responses are qualitatively the same.

1.2 Literature Review

Pelánek [2] and Desmarais and Baker [3] provide recent literature reviews on extensions of the BKT model. Among these extensions, the most influential innovations are contextual slip and guess parameters (Baker et al. [4]), individualized models (Yudelson et al. [5], Pardos and Heffernan [6]), and deep knowledge tracing (Piech et al. [7]). Instead of elaborating the latent factor structure, this paper proposes to enlarge the observation space. This idea draws inspiration from VanLehn’s work on procedure misconceptions [8]. Liu et al. [9] encode misconceptions in the structure of knowledge components. In contrast, this paper treats misconceptions as observable responses.

2 Diagnosis of Procedure Misconceptions

2.1 Dataset

The dataset comes from Optical Character Recognition (OCR) of mental arithmetic practice booklets. Mental arithmetic means that no vertical (column) procedure is written down. A student writes the answers in the booklet and takes a photo. An app automatically marks the photographed booklet so that a teacher does not need to. Figure 1 is a screenshot of a marked booklet.

The paper extracts two-digit subtraction items from the OCR data submitted during December 2018. It excludes students who practiced fewer than 5 times or more than 200 times. The remaining dataset includes 627,330 practices from 22,395 students, of which 92% are correct.
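The exclusion rule above can be sketched in a few lines of Python. This is only an illustration: the record layout `(student_id, item, response)` and the function name `filter_by_practice_count` are assumptions made here, not the format of the actual dataset.

```python
from collections import Counter


def filter_by_practice_count(records, min_count=5, max_count=200):
    """Keep records of students whose number of practices lies in [min_count, max_count].

    `records` is a list of (student_id, item, response) tuples (hypothetical layout).
    """
    counts = Counter(student_id for student_id, _item, _response in records)
    return [r for r in records if min_count <= counts[r[0]] <= max_count]


# Toy data: student "a" practiced 3 times, student "b" practiced 5 times.
toy_records = [("a", "54-26", 28)] * 3 + [("b", "54-26", 38)] * 5
kept = filter_by_practice_count(toy_records)
```

With the toy data, only the five records of student "b" remain.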

2.2 Misconception Diagnosis

This paper identifies the following procedure misconceptions: forgetting to borrow a ten (54 − 26 = 38), missing one (54 − 36 = 27/39), missing the tens digit (54 − 36 = 8), and a general misconception of subtraction. The last category includes unnecessarily trading a ten from the next digit (56 − 24 = 22) and treating the larger number in each digit as the minuend (54 − 26 = 32). “Skip” is not a procedure misconception but is frequent enough to merit its own category: the student leaves a line empty (“54 − 26 = _”) or fills it with a number from the expression (“54 − 26 = 26”). Table 1 lists the distribution of wrong responses.

Table 1. Distribution of wrong responses


Note that more than half of the wrong responses are not diagnosed: even for such a simple arithmetic operation, the distribution of misconceptions has a very long tail.
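The diagnosis rules of this section can be written as a small rule-based classifier. The following is a minimal sketch, not the authors' implementation. The function name `diagnose`, the rule order, and the reading of “miss one” as an answer that is off by one from the correct result are assumptions made for illustration.

```python
def diagnose(minuend, subtrahend, response):
    """Map a response to a two-digit subtraction item onto a category label.

    `response` is an int, or None when the line was left empty.
    """
    correct = minuend - subtrahend
    if response is None:
        return "skip"
    if response == correct:
        return "correct"
    if response in (minuend, subtrahend):
        return "skip"  # a number copied from the expression

    m_tens, m_ones = divmod(minuend, 10)
    s_tens, s_ones = divmod(subtrahend, 10)
    needs_borrow = m_ones < s_ones

    if needs_borrow and response == correct + 10:
        return "forget borrowing a ten"      # 54 - 26 = 38
    if correct >= 10 and response == correct % 10:
        return "miss the digit of tens"      # 54 - 36 = 8
    larger_first = 10 * abs(m_tens - s_tens) + abs(m_ones - s_ones)
    if needs_borrow and response == larger_first:
        return "general misconception"       # 54 - 26 = 32
    if not needs_borrow and response == correct - 10:
        return "general misconception"       # 56 - 24 = 22
    if abs(response - correct) == 1:
        return "miss one"                    # assumed: off by one from the answer
    return "undiagnosed"


print(diagnose(54, 26, 38))  # forget borrowing a ten
print(diagnose(54, 26, 32))  # general misconception
print(diagnose(54, 26, 99))  # undiagnosed
```

Some wrong answers satisfy more than one rule (for 50 − 25, the answer 35 fits both “forget borrowing a ten” and the larger-number rule), so the order of the checks is itself a modeling choice.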

3 Bayesian Diagnosis Tracing Model

The misconception-as-observation model is called the Bayesian Diagnosis Tracing (BDT) model, to distinguish it from the classical BKT model [5, 10, 11]. The BDT model has three sets of parameters: the priors ($ P $), the transition matrix ($ T $) and the emission matrix ($ E $). Writing $ S_{k} $ for the latent state and $ Y_{k} $ for the observed response at practice $ k $, the likelihood of the BDT model follows the standard hidden Markov model factorization in Eq. (1) [12] (Fig. 2). The numerator is the joint likelihood of a state path and the responses, and the denominator is the marginal likelihood of the responses:

$$ P\left( {S_{0:t} |Y_{0:t} } \right) = \frac{{P\left( {S_{0} } \right)P\left( {Y_{0} |S_{0} } \right)\mathop \prod \nolimits_{k = 1}^{t} P\left( {S_{k} |S_{k - 1} } \right)P\left( {Y_{k} |S_{k} } \right)}}{{P\left( {Y_{0:t} } \right)}} $$

(1)
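The marginal likelihood of a response sequence under a hidden Markov model with priors, a transition matrix and an emission matrix can be computed with the standard forward algorithm. The sketch below assumes a two-state model with three response categories; all parameter values are made up for illustration and are not the estimates reported in this paper.

```python
def sequence_likelihood(prior, transition, emission, responses):
    """Forward algorithm for a discrete HMM.

    prior[s]          = P(S_0 = s)
    transition[r][s]  = P(S_k = s | S_{k-1} = r)
    emission[s][y]    = P(Y_k = y | S_k = s)
    Returns (P(Y_0, ..., Y_t), filtered posterior P(S_t | Y_0, ..., Y_t)).
    """
    n = len(prior)
    # alpha[s] = P(Y_0..Y_k, S_k = s)
    alpha = [prior[s] * emission[s][responses[0]] for s in range(n)]
    for y in responses[1:]:
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n)) * emission[s][y]
            for s in range(n)
        ]
    likelihood = sum(alpha)
    posterior = [a / likelihood for a in alpha]
    return likelihood, posterior


# Hypothetical parameters. States: 0 = no mastery, 1 = mastery.
# Responses: 0 = correct, 1 = forget borrowing a ten, 2 = any other wrong answer.
prior = [0.4, 0.6]
transition = [[0.7, 0.3], [0.05, 0.95]]
emission = [[0.5, 0.2, 0.3], [0.95, 0.04, 0.01]]

likelihood, posterior = sequence_likelihood(prior, transition, emission, [1, 0, 0])
```

For long sequences the unnormalized forward variables can underflow; rescaling them at each step or working in log space is the usual remedy.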

3.1 Two-State Latent Factor Model

The classical BKT model does not allow for forgetting. However, such a specification performs poorly on this dataset. Therefore, the BKT model reported in this paper has a full transition matrix. For the sake of comparison, the BDT parameters are re-expressed in the form of BKT parameters by ignoring the intermediate state. Table 2 shows that the two models have very similar parameters. The out-of-sample AUC of both models is around 0.943. In the simplest latent structure, the two models are essentially equivalent.

Table 2. Parameter comparison in the forms of BKT model


Table 3 reports the BDT emission probabilities. Students in the mastery state do not skip or exhibit two of the misconceptions (the general misconception and missing the tens digit).

Table 3. Emission probabilities of two-state BDT model


3.2 Three-State Latent Factor Model

This section employs a three-state model (No Mastery, Intermediate, Mastery) to better illustrate the benefit of treating misconceptions as observations. For better parameter convergence, the latent factor can only transition to an adjacent state. For the theoretical motivation of this specification, see Chap. 1 of Feng [4].
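One way to encode the adjacency restriction is a tridiagonal transition matrix, in which a state can only stay put or move to a neighboring state. The sketch below assumes that moves in both directions are allowed; the helper name `adjacent_transition_matrix` and the probabilities are hypothetical, not the estimates of this paper.

```python
def adjacent_transition_matrix(up, down):
    """Build a transition matrix that only allows staying or moving to a neighbor.

    up[i]   = P(state i   -> state i+1)
    down[i] = P(state i+1 -> state i)
    """
    n = len(up) + 1
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            matrix[i][i + 1] = up[i]
        if i > 0:
            matrix[i][i - 1] = down[i - 1]
        stay = 1.0 - sum(matrix[i])
        if stay < 0.0:
            raise ValueError(f"transition probabilities out of state {i} exceed 1")
        matrix[i][i] = stay
    return matrix


# States: 0 = No Mastery, 1 = Intermediate, 2 = Mastery (made-up probabilities).
T = adjacent_transition_matrix(up=[0.2, 0.3], down=[0.05, 0.02])
```

In this matrix the entries for No Mastery to Mastery and for Mastery to No Mastery are zero, so a student must pass through the intermediate state. Setting every entry of `down` to zero would give a specification without forgetting.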

Table 4 reports the emission probabilities. The latent states of the BDT model are more interpretable than those of the BKT model: the no mastery state skips a lot; the intermediate state is prone to various misconceptions; the mastery state performs almost perfectly except for the most common misconceptions. Interpretable states are not only easy to communicate but also helpful in constructing remedial instruction. In this case, students who skip and students who slip should be treated differently: no mastery students may need heavy intervention, such as an interactive course or video tutoring, while intermediate students can receive lightweight help, such as a hint or more practice.

Table 4. Emission probabilities of the three-state model


Besides the gain in interpretability, the BDT model also performs better in out-of-sample prediction. The out-of-sample AUC of the BDT model is 0.9243, while that of the BKT model is 0.9038.
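For reference, AUC can be computed from held-out labels and predicted scores as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below assumes that the label is binary (1 for a correct response, 0 for a wrong one) and that the score is the predicted probability of a correct response; the paper does not spell out its evaluation protocol, so this is an illustration only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 1/2)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of observations; a rank-based formula is preferable at the scale of the dataset used here.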

4 Discussion

This paper explores the benefit of using procedure misconceptions as observations in a hidden Markov model (HMM). Compared with the BKT model, the BDT model is more accurate in prediction and more interpretable in diagnosis for higher-dimensional latent state models. However, there is more work to be done. First, little is known about the tail of the distribution of wrong responses, whose diagnosis could improve BDT performance. Second, the BDT model has great potential for analyzing problems that involve multiple knowledge components, because identified misconceptions can pinpoint the component(s) to blame.

References

1. Corbett, A.T., Anderson, J.R.: Knowledge tracing: modeling the acquisition of procedural knowledge. User Model. User Adapt. Interact. 4(4), 253–278 (1994)
2. Pelánek, R.: Bayesian knowledge tracing, logistic models, and beyond: an overview of learner modeling techniques. User Model. User Adapt. Interact. 27(3–5), 313–350 (2017)
3. Desmarais, M.C., Baker, R.S.: A review of recent advances in learner and skill modeling in intelligent learning environments. User Model. User Adapt. Interact. 22(1–2), 9–38 (2012)
4. Feng, J.: Essays on learning through practice. Doctoral dissertation, The University of Chicago (2017)
5. Yudelson, M.V., Koedinger, K.R., Gordon, G.J.: Individualized Bayesian knowledge tracing models. In: Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (eds.) AIED 2013. LNCS (LNAI), vol. 7926, pp. 171–180. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39112-5_18
6. Pardos, Z.A., Heffernan, N.T.: Modeling individualization in a Bayesian networks implementation of knowledge tracing. In: De Bra, P., Kobsa, A., Chin, D. (eds.) UMAP 2010. LNCS, vol. 6075, pp. 255–266. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13470-8_24
7. Piech, C., et al.: Deep knowledge tracing. In: Advances in Neural Information Processing Systems, pp. 505–513 (2015)
8. VanLehn, K.: Mind Bugs: The Origins of Procedural Misconceptions. MIT Press, Cambridge (1990)
9. Liu, R., Patel, R., Koedinger, K.R.: Modeling common misconceptions in learning process data. In: Proceedings of the Sixth International Conference on Learning Analytics and Knowledge, pp. 369–377. ACM (2016)
10. Piech, C., et al.: Deep knowledge tracing. In: Advances in Neural Information Processing Systems, pp. 505–513 (2015)
11. Käser, T., Klingler, S., Schwing, A.G., Gross, M.: Beyond knowledge tracing: modeling skill topologies with Bayesian networks. In: Trausan-Matu, S., Boyer, K.E., Crosby, M., Panourgia, K. (eds.) ITS 2014. LNCS, vol. 8474, pp. 188–198. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07221-0_23
12. Ghahramani, Z.: An introduction to hidden Markov models and Bayesian networks. In: Hidden Markov Models: Applications in Computer Vision, pp. 9–41 (2001)

Author information

Authors and Affiliations

17zuoye, Chaoyang District, Beijing, 100020, China
Junchen Feng, Bo Zhang, Yuchen Li & Qiushi Xu

Corresponding author

Correspondence to Junchen Feng.

Editor information

Editors and Affiliations

University of Sao Paulo, Sao Paulo, Brazil
Seiji Isotani
University of Malaga, Málaga, Spain
Eva Millán
Carnegie Mellon University, Pittsburgh, PA, USA
Amy Ogan
DePaul University, Chicago, IL, USA
Peter Hastings
Carnegie Mellon University, Pittsburgh, PA, USA
Bruce McLaren
University College London, London, UK
Rose Luckin

About this paper

Cite this paper

Feng, J., Zhang, B., Li, Y., Xu, Q. (2019). Bayesian Diagnosis Tracing: Application of Procedural Misconceptions in Knowledge Tracing. In: Isotani, S., Millán, E., Ogan, A., Hastings, P., McLaren, B., Luckin, R. (eds) Artificial Intelligence in Education. AIED 2019. Lecture Notes in Computer Science, vol 11626. Springer, Cham. https://doi.org/10.1007/978-3-030-23207-8_16

DOI: https://doi.org/10.1007/978-3-030-23207-8_16
Published: 21 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23206-1
Online ISBN: 978-3-030-23207-8
eBook Packages: Computer Science, Computer Science (R0)
