Abstract
With the rapid development of artificial intelligence in education, knowledge tracing, the core technology of adaptive education systems, has become a challenging research hotspot in intelligent education. The deep knowledge tracing model (DKT), the first successful application of neural networks to knowledge tracing, achieved a major breakthrough in prediction accuracy and set off a wave of neural-network-based knowledge tracing research. In DKT, a recurrent neural network (RNN) stores a student’s previous information in its hidden layer state. However, as hidden layer information accumulates, important information from earlier time steps becomes difficult to recover, which biases the predictions. Moreover, the model does not consider students’ recent state, which often has a stronger influence on their current problem-solving performance. Motivated by these problems, we improve the DKT model by using the gate units of the GRU model to decide which previous information is retained and which is forgotten, so that important early information remains usable despite the continuous accumulation of hidden layer information. We also add a module that enhances the student’s learning state by making effective use of recent learning information. Experimental results on the public Assistment2009 and Assistment2017 datasets show that the proposed model effectively improves prediction accuracy.
Keywords
1 Introduction
In teaching, the individual needs of many students must be met. Because teachers’ attention and energy are limited and students’ learning states change constantly, it is almost impossible for teachers to meet the personalized learning needs of every student. Teachers generally judge students’ mastery of knowledge points from their classroom performance and homework. Marking homework is complicated and time-consuming, and when there is too much of it, even a teacher who spends a great deal of time finds it difficult to extract and summarize how well each student has grasped each knowledge point. With the advent of the big data era, artificial intelligence is increasingly applied in education. Large quantities of data generated by students during learning are now stored, and the ability of computers to process these data has greatly increased. Advances in educational data mining and educational data analysis have driven the development of learning prediction, and knowledge tracing has become one of the important tools for meeting students’ individual needs.
Knowledge tracing refers to modeling students’ knowledge by computer from their previous learning information and predicting their performance on the next question from their previous problem-solving data [1]. Put simply, the knowledge tracing task is to infer a student’s current knowledge state from the student’s historical sequence data, using the interactions between students and questions to predict performance on the next answer.
Traditional knowledge tracing models include Bayesian knowledge tracing (BKT), which uses a hidden Markov model [2], and performance factors analysis (PFA), which uses a logistic regression model [3]. In 2015, a neural network was successfully applied to knowledge tracing for the first time, in a model named Deep Knowledge Tracing (DKT) [4]. DKT achieved a major breakthrough in prediction accuracy and set off a wave of neural-network-based knowledge tracing. For example, the DKT+ model points out two problems in DKT: the model fails to reconstruct the observed input, and its predicted performance across time steps is not consistent. DKT+ adds three regularization terms to the loss function to address these problems [5]. Other models use neural networks to capture different aspects of student learning: the CKT model considers students’ individual differences [14], the AKT model uses a monotonic attention mechanism to relate the current question to each question the learner answered in the past [7], and the GKT model uses a graph neural network to model student proficiency [8]. All of these methods have contributed to the development of neural networks in knowledge tracing.
DKT uses an RNN [15] to learn from the time-ordered sequence of knowledge points and predict students’ future performance on them. In DKT, students’ previous information is stored in the hidden layer state of the RNN. However, as hidden layer information accumulates, important information from earlier time steps becomes difficult to extract and therefore difficult to take into account in the current prediction, which biases the prediction. We use the gate units of the GRU model to decide which past information is retained and which is forgotten, reducing the accumulation of unimportant information in the hidden layer and allowing important earlier information to contribute to the prediction. We also found that students’ recent learning state affects their current problem-solving performance. If a student performed well on earlier problems but poorly on recent ones, the student’s recent learning state has probably deteriorated, and this is likely to affect current performance. We therefore use recent learning information to enhance students’ recent learning state. Experiments show that these methods improve the model’s predictions.
Our main contributions are summarized as follows:
(1) We use the GRU model to address the difficulty the RNN in DKT has in using important information from earlier time steps for prediction.
(2) We use recent learning information to enhance students’ recent learning state, which effectively improves the accuracy of model prediction.
2 Related Work
Knowledge tracing is one of the important applications of artificial intelligence in education. In recent years, alongside improvements to earlier knowledge tracing methods, the RNN-based DKT was proposed, the first successful application of deep neural networks to knowledge tracing. DKT builds an RNN-based model that predicts students’ future performance from their previous learning information. RNNs are very effective for sequential data because they can mine temporal and semantic information from it, and DKT exploits this ability.
In 1982, John Hopfield proposed a single-layer feedback neural network that is a precursor of the RNN. Although RNNs can process time-series information, vanishing and exploding gradients make it difficult for them to perform well in scenarios with long-range dependencies. In 1997, Hochreiter and Schmidhuber proposed Long Short-Term Memory (LSTM) [16]. LSTM uses three gate units: a forget gate, an input gate (sometimes called an update gate), and an output gate. Through these gate units, LSTM effectively alleviates the training problems of RNNs. Since then, many LSTM variants have appeared [9,10,11]. The GRU model [12], proposed in 2014, is one of the best-known variants. GRU and LSTM address the same problems and both use gate units. The GRU uses two gate units: a reset gate and an update gate. In DKT, students’ previous information is stored in the hidden layer state of the RNN, but as hidden layer information accumulates, important information from earlier time steps becomes difficult to extract and to take into account in the current prediction. To solve this problem, we use gate units, as in LSTM and GRU, to decide which previous information is retained and which is forgotten, reducing the accumulation of unimportant information in the hidden layer. In our experiments, LSTM and GRU achieved similar results, but GRU has a simpler structure and is easier to train than LSTM. We therefore adopt the GRU model to address the difficulty the RNN in DKT has in using important earlier information for prediction.
In the field of knowledge tracing, scholars have made many attempts to account for students’ current learning state. For example, LPKT [6] simulates the learning process of students. The model is divided into three parts: a learning module, a forgetting module, and a prediction module, and it tracks students’ current knowledge state through learning and forgetting. The LFKT model [13] also considers students’ learning and forgetting behaviors. It takes into account four factors that affect knowledge forgetting: knowledge repetition, the knowledge learning interval, the sequential learning interval, and the degree of knowledge mastery. However, we found that students’ recent performance also has a very important impact on their current learning state. For example, if a student gets three questions wrong in a row, there is a high probability that the student will also get the current question wrong. We therefore added a module to the model that uses students’ recent learning information to enhance their recent learning state.
3 Model
3.1 Review of Deep Knowledge Tracing Model
Deep knowledge tracing uses a recurrent neural network (RNN) to predict students’ future answer performance from time-ordered data on which knowledge points learners answered and whether they answered them correctly (as shown in Fig. 1). Here, \({\mathrm{x}}_{\mathrm{t}}\) represents the student’s data at time t, including the knowledge point attempted at time t and whether the answer was correct. \({{h}}_{\mathrm{t}}\) represents the hidden layer state of the RNN at time t and summarizes the student’s problem-solving information up to time t. \({\mathrm{y}}_{\mathrm{t}}\) represents the prediction of the student’s performance at the next time step. Because the model does not know in advance which knowledge point the student will attempt next, each prediction contains the probability of a correct answer for every knowledge point.
The information flow in the model can be described as follows: \({\mathrm{x}}_{\mathrm{t}}\) is fed into the recurrent neural network to produce the hidden state \({\mathrm{h}}_{\mathrm{t}}\), which serves as the raw prediction data (see Eq. 1). \({\mathrm{h}}_{\mathrm{t}}\) is then passed through a fully connected layer and an activation function that constrains each element of the output to lie between 0 and 1, giving the final prediction (see Eqs. 2 and 3).
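The display equations referenced here are not reproduced in this copy of the paper. One plausible reconstruction, consistent with the symbol definitions given for Eqs. 1 and 2, is shown below. The standard Elman form of Eq. 1 and the names \({\mathrm{W}}_{\mathrm{ih}}\), \({\mathrm{W}}_{\mathrm{hh}}\), \({\mathrm{b}}_{\mathrm{ih}}\), \({\mathrm{b}}_{\mathrm{hh}}\) and \(\upsigma\) are assumptions for illustration, not necessarily the authors' exact notation.

\[ {h}_{\mathrm{t}} = \tanh\left({\mathrm{W}}_{\mathrm{ih}}{\mathrm{x}}_{\mathrm{t}} + {\mathrm{b}}_{\mathrm{ih}} + {\mathrm{W}}_{\mathrm{hh}}{h}_{(\mathrm{t}-1)} + {\mathrm{b}}_{\mathrm{hh}}\right) \quad (1) \]

\[ \mathrm{I} = {h}_{\mathrm{t}}{\mathrm{A}}^{\mathrm{T}} + \mathrm{b} \quad (2) \]

\[ {\mathrm{y}}_{\mathrm{t}} = \upsigma(\mathrm{I}) = \frac{1}{1 + {e}^{-\mathrm{I}}} \quad (3) \]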
In Eq. 1, \({{h}}_{\mathrm{t}}\) is the hidden state at time t, \({\mathrm{x}}_{\mathrm{t}}\) is the input at time t, and \({{h}}_{(\mathrm{t}-1)}\) is the hidden state at time \(\mathrm{t}-1\).
In Eq. 2, \({{h}}_{\mathrm{t}}\) and \(\mathrm{I}\) are the input and output of the linear layer, \(\mathrm{A}\) is the weight, and \(\mathrm{b}\) is the bias.
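The information flow described in this section can be sketched in plain Python. This is a minimal illustration with untrained random weights, not the authors' PyTorch implementation. The Elman-style recurrence in `rnn_step`, the weight names, and the helper functions are all assumptions made for the example.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(w: Matrix, v: Vector, b: Vector) -> Vector:
    """Return W v + b for a row-major weight matrix W."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def rnn_step(x_t: Vector, h_prev: Vector, w_ih: Matrix, w_hh: Matrix, b_h: Vector) -> Vector:
    """Assumed Elman form of Eq. 1: h_t = tanh(W_ih x_t + W_hh h_{t-1} + b_h)."""
    from_input = affine(w_ih, x_t, b_h)
    from_hidden = affine(w_hh, h_prev, [0.0] * len(h_prev))
    return [math.tanh(a + r) for a, r in zip(from_input, from_hidden)]


def predict(h_t: Vector, a: Matrix, b: Vector) -> Vector:
    """Eqs. 2 and 3: a linear layer followed by an element-wise sigmoid."""
    return [sigmoid(i) for i in affine(a, h_t, b)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights, used only so the sketch can run end to end."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dkt_forward(xs: List[Vector], hidden_size: int, num_skills: int, seed: int = 0) -> List[Vector]:
    """Run an untrained DKT-style model over one student's input sequence."""
    rng = random.Random(seed)
    w_ih = random_matrix(hidden_size, len(xs[0]), rng)
    w_hh = random_matrix(hidden_size, hidden_size, rng)
    b_h = [0.0] * hidden_size
    a = random_matrix(num_skills, hidden_size, rng)
    b = [0.0] * num_skills
    h = [0.0] * hidden_size
    ys = []
    for x_t in xs:
        h = rnn_step(x_t, h, w_ih, w_hh, b_h)  # hidden state carries past information
        ys.append(predict(h, a, b))            # one probability per knowledge point
    return ys
```

Each element of the returned list is a vector with one predicted probability per knowledge point, matching the description that every prediction covers all knowledge points the student might attempt next.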
3.2 DKT Model Improvement Framework
We cut the student data into sequences and one-hot encode them, so that each record contains the skill the student attempted and whether the answer was correct, and the records are ordered in time. This forms an exercise sequence \(\left\{{\mathrm{x}}_{1},{\mathrm{x}}_{2},{\mathrm{x}}_{3},{\mathrm{x}}_{4},...,{\mathrm{x}}_{\mathrm{T}}\right\}\). The sequence is fed into our model, which outputs a prediction of the student’s performance on the next problem. A schematic diagram of our model is shown in Fig. 2.
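The preprocessing step can be illustrated with a short sketch. The paper does not specify the exact encoding layout or the cutting rule, so the choices here are assumptions: a vector of length 2 × (number of skills) in which the second half marks correct answers, and cutting into consecutive chunks of at most `max_len` records. The function names are hypothetical.

```python
from typing import List, Tuple

Interaction = Tuple[int, int]  # (skill_id, correct) with correct in {0, 1}


def encode_interaction(skill_id: int, correct: int, num_skills: int) -> List[float]:
    """One-hot encode a (skill, correctness) pair into 2 * num_skills entries.

    Assumed layout: index skill_id marks an incorrect answer on that skill,
    index num_skills + skill_id marks a correct one.
    """
    if not 0 <= skill_id < num_skills:
        raise ValueError("skill_id out of range")
    if correct not in (0, 1):
        raise ValueError("correct must be 0 or 1")
    x = [0.0] * (2 * num_skills)
    x[skill_id + correct * num_skills] = 1.0
    return x


def cut_and_encode(
    interactions: List[Interaction], num_skills: int, max_len: int
) -> List[List[List[float]]]:
    """Cut one student's history into chunks of at most max_len, then encode each record."""
    chunks = [interactions[i:i + max_len] for i in range(0, len(interactions), max_len)]
    return [[encode_interaction(s, c, num_skills) for s, c in chunk] for chunk in chunks]
```

Time order is preserved inside each chunk, so every chunk is a valid sequence \(\left\{{\mathrm{x}}_{1},...,{\mathrm{x}}_{\mathrm{T}}\right\}\) for the model.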
GRU Forms the Initial Forecast Data
Students’ problem-solving information is carried through our model as hidden layer information. Not all previous information is needed to predict a student’s current performance. We therefore use a reset gate and an update gate, together with the student’s data at the current time \({\mathrm{X}}_{\mathrm{t}}\), to determine which hidden layer information is needed to predict the student’s current answer. The hidden layer information is concatenated with the student’s data at the current time and passed through a fully connected layer to obtain a vector with the same dimension as the hidden layer. A sigmoid activation function then constrains each element of the vector to lie between 0 and 1. The reset gate and the update gate thus yield vectors \({\mathrm{R}}_{\mathrm{t}}\) and \({\mathrm{Z}}_{\mathrm{t}}\), respectively, each with the same dimension as the hidden layer (see Eqs. 4, 5). The update gate controls how much information from the previous time step is carried into the current state: the larger its value, the more information from the previous time step is carried over.
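The gate equations are not reproduced in this copy of the paper. Following the description above (concatenation, fully connected layer, sigmoid), a plausible form is given below. The weight and bias names \({\mathrm{W}}_{\mathrm{r}}\), \({\mathrm{b}}_{\mathrm{r}}\), \({\mathrm{W}}_{\mathrm{z}}\), \({\mathrm{b}}_{\mathrm{z}}\) are assumptions for illustration, and \([\cdot ,\cdot ]\) denotes concatenation.

\[ {\mathrm{R}}_{\mathrm{t}} = \upsigma\left({\mathrm{W}}_{\mathrm{r}}\left[{\mathrm{H}}_{\mathrm{t}-1},{\mathrm{X}}_{\mathrm{t}}\right] + {\mathrm{b}}_{\mathrm{r}}\right) \quad (4) \]

\[ {\mathrm{Z}}_{\mathrm{t}} = \upsigma\left({\mathrm{W}}_{\mathrm{z}}\left[{\mathrm{H}}_{\mathrm{t}-1},{\mathrm{X}}_{\mathrm{t}}\right] + {\mathrm{b}}_{\mathrm{z}}\right) \quad (5) \]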
The reset gate controls how much information from the previous state is written to the candidate hidden state \({\stackrel{\sim }{\mathrm{H}}}_{\mathrm{t}}\): the smaller the reset gate value, the less information from the previous state is written. The reset gate yields a vector \({\mathrm{R}}_{\mathrm{t}}\) with the same dimension as the hidden layer and each element between 0 and 1. The candidate hidden state \({\stackrel{\sim }{\mathrm{H}}}_{\mathrm{t}}\) is computed after first multiplying \({\mathrm{R}}_{\mathrm{t}}\) element-wise with the previous hidden state \({\mathrm{H}}_{\mathrm{t}-1}\) (see Eq. 6). It is called a candidate because it is not output directly as the hidden state at time t. The update gate yields a vector \({\mathrm{Z}}_{\mathrm{t}}\) with the same dimension as the hidden layer and each element between 0 and 1. Multiplying \({\mathrm{Z}}_{\mathrm{t}}\) element-wise by the previous hidden state \({\mathrm{H}}_{\mathrm{t}-1}\) gives the part of \({\mathrm{H}}_{\mathrm{t}-1}\) that is kept in the current hidden state. Multiplying \(1-{\mathrm{Z}}_{\mathrm{t}}\) element-wise by the candidate hidden state gives the part of the candidate hidden state that is kept in the current hidden state. Adding the two gives the current hidden state, which is also the initial forecast data (see Eq. 7).
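A single GRU step as described in this subsection can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the parameter names in the dictionary `p` are hypothetical, and the tanh activation for the candidate state is the standard GRU choice, which the text does not state explicitly.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(w: Matrix, v: Vector, b: Vector) -> Vector:
    """Return W v + b for a row-major weight matrix W."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def gru_step(x_t: Vector, h_prev: Vector, p: Dict[str, list]) -> Vector:
    """One GRU step; p holds hypothetical weights w_r, w_z, w_h and biases b_r, b_z, b_h."""
    concat = h_prev + x_t  # [H_{t-1}, X_t]
    # Reset and update gates: fully connected layer, then sigmoid (each element in 0..1).
    r = [sigmoid(v) for v in affine(p["w_r"], concat, p["b_r"])]
    z = [sigmoid(v) for v in affine(p["w_z"], concat, p["b_z"])]
    # Candidate state: the reset gate scales the previous state element-wise first.
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(v) for v in affine(p["w_h"], gated, p["b_h"])]
    # New state: Z_t keeps part of H_{t-1}, (1 - Z_t) keeps part of the candidate.
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]
```

With this convention, an update gate close to 1 leaves the previous hidden state almost unchanged, while an update gate close to 0 replaces it with the candidate state, which is the retention and forgetting behavior the section describes.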
Emphasizing Students’ Recent Learning State
If a student performed well on earlier problems but poorly on recent ones, the student’s recent learning state has probably deteriorated, and this is very likely to affect current problem-solving performance. Using students’ recent learning information to enhance their recent learning state therefore allows the model to achieve better prediction results. The data from the student’s recent problem solving are concatenated with the initial prediction result, and the final prediction of the student’s performance on the next problem is obtained through a fully connected layer (see Eq. 8). Experiments show that this method improves the model’s predictions.
In Eq. 8, n is the number of recent problem-solving records used to enhance the student’s recent learning state, and \({\mathrm{H}}_{\mathrm{t}}\) is the information of the hidden layer at time \(\mathrm{t}\).
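The enhancement step can be sketched as follows. Eq. 8 is not reproduced in this copy of the paper, so this is a hedged illustration only. It assumes that the recent data are the last n encoded inputs, that missing records at the start of a sequence are zero-padded, and that a sigmoid follows the fully connected layer. All function and parameter names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def enhanced_features(h_t: Vector, xs: List[Vector], t: int, n: int) -> Vector:
    """Concatenate H_t with the n most recent inputs x_{t-n+1}, ..., x_t (t is 0-based).

    Positions before the start of the sequence are zero-padded (an assumption).
    """
    input_size = len(xs[0])
    feats = list(h_t)
    for k in range(t - n + 1, t + 1):
        feats.extend(xs[k] if k >= 0 else [0.0] * input_size)
    return feats


def predict_next(h_t: Vector, xs: List[Vector], t: int, n: int, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer plus sigmoid over the enhanced feature vector."""
    feats = enhanced_features(h_t, xs, t, n)
    return [sigmoid(sum(wi * fi for wi, fi in zip(row, feats)) + bi) for row, bi in zip(w, b)]
```

The output layer therefore takes an input of size (hidden size) + n × (input size), so changing n changes the shape of the final weight matrix.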
4 Experiments
We use the Assistment2009 and Assistment2017 datasets to verify and illustrate the performance of our model. The experiments are based on Python 3.8, PyTorch v1.9.1, and CUDA v11.1, and the model is optimized with the Adam optimizer. 70% of the data were used as the training set and 30% as the validation set. The batch size was 64 and the number of training epochs was 70. The best value over the 70 epochs was taken as the experimental result. The specific experimental results are shown in Table 1.
According to the experimental data, our model performs well on the Assistment2009 and Assistment2017 datasets. For this task, it is feasible both to use the GRU model to address the difficulty the RNN in DKT has in using important earlier information for prediction and to enhance students’ recent learning state with their recent learning information.
Experiments with Different Numbers of Questions
As shown in Table 2, prediction accuracy was highest when the original prediction result was combined with the student’s two most recent records. Using recent learning records to enhance the student’s recent learning state improves prediction accuracy, which indicates that the recent learning state affects the student’s current performance.
Experiments with Different Hidden Layer Sizes
Students’ previous problem-solving information is stored in the hidden layer of the GRU model. The larger the hidden layer, the more information it can store, but a larger hidden layer is not always better: the size should be neither too large nor too small. As shown in Table 3, a hidden layer size of 40 gives the best prediction result on the ASSIST2009 dataset, whereas a hidden layer size of 50 gives the best result on the ASSIST2017 dataset. Taking both into account, we adopt a hidden layer size of 40 in the following experiment.
Experiments with Different Numbers of Hidden Layers
As shown in Table 4, on the ASSIST2009 dataset prediction performance decreases as the number of layers increases, whereas on the ASSIST2017 dataset it improves as the number of layers increases.
5 Conclusion
In this paper, we identify two problems of the DKT model: the RNN in DKT has difficulty using important information from earlier time steps for prediction, and the model cannot enhance students’ recent learning state. Experiments show that, for this task, it is feasible to use the GRU model to address the first problem and to use recent learning information to enhance students’ recent learning state. In the future, we will conduct more experiments to verify how other methods improve the DKT model. For example, ideas from residual neural networks could be used to address the RNN’s difficulty with long-term memory. Convolutional neural networks could also be used to extract students’ problem-solving patterns (for example, a student who answers question A incorrectly is highly likely to answer question B incorrectly as well).
References
1. Corbett, A.T., Anderson, J.R.: Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. User Adapt. Interact. 4(4), 253–278 (1994)
2. Yudelson, M.V., Koedinger, K.R., Gordon, G.J.: Individualized Bayesian knowledge tracing models. In: Lane, H.C., Yacef, K., Mostow, J., Pavlik, P. (eds.) AIED 2013. LNCS, vol. 7926, pp. 171–180. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39112-5_18
3. Pavlik Jr., P.I., Cen, H., Koedinger, K.R.: Performance factors analysis–a new alternative to knowledge tracing. Online Submission (2009)
4. Piech, C., Bassen, J., Huang, J., et al.: Deep knowledge tracing. In: Advances in Neural Information Processing Systems, 28 (2015)
5. Yeung, C.K., Yeung, D.Y.: Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In: Proceedings of the Fifth Annual ACM Conference on Learning at Scale, pp. 1–10 (2018)
6. Shen, S., Liu, Q., Chen, E., et al.: Learning process-consistent knowledge tracing. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 1452–1460 (2021)
7. Ghosh, A., Heffernan, N., Lan, A.S.: Context-aware attentive knowledge tracing. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2330–2339 (2020)
8. Nakagawa, H., Iwasawa, Y., Matsuo, Y.: Graph-based knowledge tracing: modeling student proficiency using graph neural network. In: 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 156–163. IEEE (2019)
9. Siami-Namini, S., Tavakoli, N., Namin, A.S.: The performance of LSTM and BiLSTM in forecasting time series. In: 2019 IEEE International Conference on Big Data (Big Data), pp. 3285–3292. IEEE (2019)
10. Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)
11. Liang, X., Shen, X., Feng, J., Lin, L., Yan, S.: Semantic object parsing with graph LSTM. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds) ECCV 2016. LNCS, vol. 9905, pp. 125–143. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_8
12. Dey, R., Salem, F.M.: Gate-variants of gated recurrent unit (GRU) neural networks. In: 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1597–1600. IEEE (2017)
13. Li, X.G., Wei, S.Q., Zhang, X., Du, Y.F., Yu, G.: LFKT: deep knowledge tracing model with learning and forgetting behavior merging. Ruan Jian Xue Bao/J. Softw. 32(3), 818–830 (2021). (in Chinese)
14. Shen, S., Liu, Q., Chen, E., et al.: Convolutional knowledge tracing: Modeling individualization in student learning process. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1857–1860 (2020)
15. Sherstinsky, A.: Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D 404, 132306 (2020)
16. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Grant No. 62166050), Yunnan Fundamental Research Projects (Grant No. 202201AS070021), the Scientific Research Foundation of Yunnan Provincial Department of Education (Grant No. 2022Y180), the Yunnan Innovation Team of Education Informatization for Nationalities, and the Kunming Key Laboratory of Education Informatization.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Han, X., Zhang, S., Zhou, J., Li, Z., Wang, J. (2023). Deep Knowledge Tracing with GRU and Learning State Enhancement. In: Xu, Y., Yan, H., Teng, H., Cai, J., Li, J. (eds) Machine Learning for Cyber Security. ML4CS 2022. Lecture Notes in Computer Science, vol 13657. Springer, Cham. https://doi.org/10.1007/978-3-031-20102-8_53
DOI: https://doi.org/10.1007/978-3-031-20102-8_53
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20101-1
Online ISBN: 978-3-031-20102-8
eBook Packages: Computer Science (R0)