
1 Introduction

Adaptive Learning (AL) has become an increasingly prominent topic in educational technology. Its core problem is to improve students' learning efficiency through personalized learning path planning. Knowledge tracing models a student's knowledge over time so that we can accurately predict the student's mastery of knowledge points and their next performance. Early knowledge tracing models relied on first-order Markov models, such as Bayesian Knowledge Tracing (BKT) [2]. Deep Learning (DL) is an emerging branch of machine learning research, and its development provides new methods for adaptive learning. Deep Knowledge Tracing (DKT) [5] is a model that uses a Recurrent Neural Network (RNN) to evaluate student abilities. Studies have shown that the DKT model improves substantially on the traditional Bayesian knowledge tracing model and can be used for course optimization [5, 9, 10]. For data spanning long time intervals, LSTM-RNNs are more suitable than basic RNNs [8].

The purpose of our research is to improve the effectiveness of the DKT model on the same data set. We introduce a deep knowledge tracing model based on the Bidirectional Recurrent Neural Network (BiRNN) that models the learning sequence and evaluates students' ability at each moment.

2 Related Work

In order to build an accurate student model, we need to understand the process of student learning. The knowledge tracing task can be summarized as follows: given the observation sequence \( x_{0},x_{1},x_{2},\ldots ,x_{t} \) of a student's performance on a particular learning task, predict their next performance \( x_{t+1} \).

2.1 Recurrent Neural Networks

RNNs are mainly used to process and predict sequence data. In a Fully Connected Neural Network or a Convolutional Neural Network, signals flow from the input layer through the hidden layers to the output layer; the layers are fully or partially connected, but the nodes within each layer are unconnected [1]. The motivation for the Recurrent Neural Network is to characterize the relationship between the current output of a sequence and the preceding information: structurally, an RNN remembers previous information and uses it to influence subsequent outputs. RNNs suffer from the now famous problems of vanishing and exploding gradients, which are inherent to deep networks.
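A minimal NumPy sketch of this recurrence, with arbitrary toy dimensions and random weights, makes the idea concrete: each hidden state mixes the current input with the previous hidden state, so earlier inputs can influence later outputs.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # New hidden state = tanh of (current input + previous hidden state).
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions: d inputs, h hidden units (illustrative values).
d, h = 4, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(d, h))
W_hh = rng.normal(size=(h, h))
b_h = np.zeros(h)

h_t = np.zeros(h)
for x_t in rng.normal(size=(5, d)):  # a sequence of five inputs
    h_t = rnn_step(x_t, h_t, W_xh, W_hh, b_h)
```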

2.2 Bidirectional Recurrent Neural Networks

A BiRNN connects two hidden layers of opposite directions to the same output. With this architecture, the output layer can draw on information from past (backward) and future (forward) states simultaneously [3]; the outputs of the two directional states are not connected to the inputs of the opposite direction. By using two time directions, input information from both the past and the future of the current time frame can be used, unlike a standard RNN, which requires delays to include future information.

2.3 Deep Knowledge Tracing

DKT is a sequence-to-sequence model whose structure is an LSTM [5]. The knowledge tracing problem can be described as follows: given a student's observation sequence \( x_{0}, x_{1}, x_{2}, \ldots \), predict the next performance \( x_{t+1} \). Usually \( x_{t}=\{q_{t},a_{t}\} \), where \( q_{t} \) represents the question component of the interaction (e.g., the corresponding knowledge point) and \( a_{t} \) represents whether the corresponding answer is correct, usually \( a_{t}\in \{0,1\} \). The DKT model uses this LSTM structure to model the knowledge tracing task.
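A toy observation sequence in this form (the skill ids and correctness values are purely illustrative):

```python
# A toy sequence x_0 .. x_3 in the (q_t, a_t) form above: the student
# attempts knowledge points 2, 2, 5, 7, with 1 = correct and 0 = wrong.
sequence = [(2, 1), (2, 0), (5, 1), (7, 1)]
# Task: given x_0 .. x_t, predict the correctness a_{t+1} of the next answer.
```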

3 Methodology

Over a time series, a student's mastery of knowledge points is not continuously consistent and can fluctuate considerably; a student's mastery at a given point in the sequence is therefore not necessarily complete. Here, we use a Bidirectional Recurrent Neural Network to integrate past and future contextual sequence information for prediction, in order to better simulate student learning.

3.1 Model

For a given time step t, the batch input is \( X_{t}\in R^{n\times d} \) (the number of samples is n, the number of inputs is d), and the hidden layer activation function is tanh. In the architecture of the Bidirectional Recurrent Neural Network, the forward hidden state at time step t is \( H_{t}\in R^{n\times h} \) (the number of forward hidden units is h), and the reverse hidden state is \( H^\prime _{t}\in R^{n\times h} \) (the number of reverse hidden units is h), as shown in Fig. 1.

Fig. 1. Time expansion sequence of the Bidirectional Recurrent Neural Network knowledge tracing model.

We can calculate the forward hidden state and the reverse hidden state separately:

$$\begin{aligned} H_t = tanh(X_tW_{xh}^{(f)} + H_{t-1}W_{hh}^{(f)} + b_h^{(f)}) \end{aligned}$$
(1)
$$\begin{aligned} H'_t = tanh(X_tW_{xh}^{(b)} + H'_{t+1}W_{hh}^{(b)} + b_h^{(b)}) \end{aligned}$$
(2)

The weights of the model are \( W_{xh}^{(f)} \in R^{d\times h}\), \( W_{hh}^{(f)}\in R^{h\times h} \), \( W_{xh}^{(b)}\in R^{d\times h} \) and \( W_{hh}^{(b)}\in R^{h\times h} \); its biases are \( b_h^{(f)} \in R^{1\times h}\) and \( b_h^{(b)}\in R^{1\times h} \). We then concatenate the hidden states \( H_{t} \) and \( H^\prime _{t} \) of the two directions to obtain the hidden state \( H\in R^{n\times 2h} \) and feed it to the output layer, which computes the output \( Y_{t}\in R^{n\times q} \) (the number of outputs is q):

$$\begin{aligned} Y_{t} = sigmoid(HW_{hq}+B_{q}) \end{aligned}$$
(3)

The weight \( W_{hq}\in R^{2h\times q} \) and the bias \( B_{q}\in R^{1\times q}\) are the model parameters of the output layer. The numbers of hidden units in the two directions may also differ.
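A minimal NumPy sketch of Eqs. (1)-(3), with weight names following the notation above, illustrates the forward computation (this is an illustration of the equations, not the implementation used in our experiments):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def birnn_forward(X, Wxh_f, Whh_f, bh_f, Wxh_b, Whh_b, bh_b, Whq, Bq):
    """X has shape (T, n, d): T time steps, n samples, d inputs."""
    T, n, _ = X.shape
    h = Whh_f.shape[0]
    Hf, Hb = np.zeros((T, n, h)), np.zeros((T, n, h))
    state = np.zeros((n, h))
    for t in range(T):            # Eq. (1): forward scan, uses H_{t-1}
        state = np.tanh(X[t] @ Wxh_f + state @ Whh_f + bh_f)
        Hf[t] = state
    state = np.zeros((n, h))
    for t in reversed(range(T)):  # Eq. (2): backward scan, uses H'_{t+1}
        state = np.tanh(X[t] @ Wxh_b + state @ Whh_b + bh_b)
        Hb[t] = state
    H = np.concatenate([Hf, Hb], axis=-1)  # concatenated state, (T, n, 2h)
    return sigmoid(H @ Whq + Bq)           # Eq. (3): outputs, (T, n, q)
```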

The input \( x_{t} \) is one-hot encoded. If the model input involves M knowledge components (such as knowledge points), and each question has two possible results, 0 and 1 (incorrect and correct, respectively), then the model input length is 2M. For example, suppose the knowledge component of a question is i: if the answer is correct, the \((M+i)^{th}\) bit of the input is 1 and the remaining positions are 0; if the answer is wrong, the \(i^{th}\) bit is 1 and the remaining positions are 0. The output of the model is \( y_{t} \), of length M, which corresponds to the degree of mastery of each knowledge component (i.e., the predicted correctness on questions involving that component).
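A small sketch of this encoding (0-based indexing; correct answers occupy the upper M positions as described above):

```python
import numpy as np

def encode_interaction(skill_id, correct, M):
    """One-hot encode an interaction over M knowledge components.
    Incorrect answers occupy positions 0..M-1, correct answers
    positions M..2M-1, giving an input vector of length 2M."""
    x = np.zeros(2 * M)
    x[skill_id + (M if correct else 0)] = 1.0
    return x

# Example: skill 3 answered correctly with M = 5 components
# sets position 8 (= 5 + 3) to 1.
print(encode_interaction(3, correct=True, M=5))
```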

4 Experiment

4.1 Datasets

The Skill Builder problem sets are organized around specific skills, and a question can carry multiple skill tags. Students must answer three questions correctly in a row to complete the assignment. If a student uses assistance (a hint or "break this problem into steps" scaffolding), the problem is flagged as incorrect.

We observed that a small number of abnormal student records account for too high a proportion of the raw data, and that about 20% of the ASSISTments data set is duplicated; when a question involves multiple knowledge points, the same record is reused, producing further duplication. We therefore cleaned the data and used the result as input to our model. The statistics of the datasets are shown in Table 1.

Table 1. The statistics of the datasets.
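The following sketch illustrates the kind of cleaning described above; the file name and column names follow the public ASSISTments Skill Builder CSV schema, and the thresholds are illustrative assumptions rather than our exact procedure:

```python
import pandas as pd

# Hypothetical sketch of the cleaning step; column names follow the
# public ASSISTments Skill Builder CSV.
df = pd.read_csv("skill_builder_data.csv", encoding="latin-1")

# Drop exact duplicate records (the ~20% mentioned above). Multi-skill
# questions appear once per skill tag, so also de-duplicate on the
# interaction id.
df = df.drop_duplicates()
df = df.drop_duplicates(subset=["order_id", "user_id"])

# Drop rows without a skill tag and students with too few interactions
# (threshold of 3 is an illustrative assumption).
df = df.dropna(subset=["skill_id"])
counts = df["user_id"].value_counts()
df = df[df["user_id"].isin(counts[counts >= 3].index)]
```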

In real life, students exhibit many behavioral characteristics while answering questions, and this information can greatly help in assessing student ability and predicting the correctness of the next answer. Therefore, we added overlap_time (the time the student spent on the question), hint_count (the number of hints the student requested for the question), and first_action (the type of first action: attempt or ask for a hint) to the model input.
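One way such features can be appended to the one-hot input is sketched below; the clipping and scaling constants are illustrative assumptions, not the values used in our experiments:

```python
import numpy as np

def add_external_features(x_onehot, overlap_time, hint_count, first_action,
                          max_time=300.0, max_hints=10.0):
    # Append the three behavioral features to the 2M-length one-hot input.
    extra = np.array([
        min(overlap_time, max_time) / max_time,  # time spent on the question
        min(hint_count, max_hints) / max_hints,  # hints requested
        float(first_action),                     # 0 = attempt, 1 = ask for hint
    ])
    return np.concatenate([x_onehot, extra])
```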

4.2 Implementation of Model

We built the model in Python using the Keras API; Keras is designed for rapid experimentation, turning ideas into experimental results with minimal delay [7]. We implemented the model with the loss function used in the original DKT algorithm. Our hidden layer contains 200 hidden nodes. We use the Adam algorithm to optimize the stochastic objective function and speed up training, with a batch size of 100. We set Dropout to 0.4 to avoid over-fitting and hold out 20% of the data set as a test set to verify model performance [4].
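A minimal Keras sketch consistent with these settings is shown below. The use of SimpleRNN inside Bidirectional, the masking value, and the plain binary cross-entropy loss are simplifying assumptions; the original DKT loss reads out only the predicted probability of the skill actually attempted at each step.

```python
from tensorflow import keras
from tensorflow.keras import layers

M = 110                  # number of knowledge components (dataset-dependent)
INPUT_DIM = 2 * M + 3    # one-hot interaction plus three external features

model = keras.Sequential([
    keras.Input(shape=(None, INPUT_DIM)),  # variable-length sequences
    layers.Masking(mask_value=0.0),        # padded time steps are ignored
    layers.Bidirectional(layers.SimpleRNN(200, activation="tanh",
                                          return_sequences=True)),
    layers.Dropout(0.4),
    layers.TimeDistributed(layers.Dense(M, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(), "accuracy"])
# model.fit(X_train, y_train, batch_size=100)
```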

Table 2. Model prediction results without introducing external features.
Table 3. Model prediction results with introducing external features.

4.3 Result

We trained the original DKT model and the BiRNN-based knowledge tracing model on the datasets both before and after preprocessing. We used Area Under Curve (AUC) and accuracy as evaluation criteria: the higher the AUC, the better the model fits student ability, and the higher the accuracy, the better the model's global correctness. The prediction results are summarized in Tables 2 and 3. The experimental results show that the model performs better after adding the Bidirectional Recurrent Neural Network, predicting the correctness of the next answer well. After data cleaning and the addition of external features, the model improves significantly further. We therefore conclude that the improved model better simulates the student's learning state and better predicts students' understanding of knowledge points.

5 Conclusion and Future Work

In this paper, we improved the original DKT model by adding a Bidirectional Recurrent Neural Network. The new model captures both past and future information about students' learning of knowledge points; it effectively reflects students' knowledge level after reviewing and previewing, and improves the model's prediction of student performance. The experimental data show that, compared with the traditional model, the BiRNN-based deep knowledge tracing model performs well in predicting students' answers and has certain advantages in measuring students' level. Moreover, preprocessing the data and adding external features further improves the model's performance.

In the future, we will explore how to improve the model to handle long sequences. We will verify the effectiveness of the model on more datasets, and compare and discuss the impact of more external features on model performance. We will also extend the problem to more general exercises, conducting knowledge tracing in a wider range of fields and from more perspectives to help students more effectively.