
1 Introduction

With the development of Intelligent Tutoring Systems (ITS) [1] and Massive Open Online Courses (MOOCs) [15], a large number of students are willing to learn new courses or acquire knowledge that interests them through online learning platforms. When students have completed some exercises or courses, the online learning platform can estimate their knowledge state from their learning records. For example, when a student tries to solve the exercise “\(x^{2} - 2x + 1 = 0\)”, the online learning system can estimate the probability that the student will answer this exercise correctly according to whether the student has mastered arithmetic operations and quadratic equations with one unknown. In an online learning platform, the task of knowledge tracing is to model the learning process of students and trace their future knowledge state based on their historical performance on exercises and the underlying knowledge concepts they have mastered. Knowledge tracing can be formalized as follows: given observations of interactions \(X = \{x_{1}, ... , x_{t}\}\) taken by a student on past exercises, predict the probability that the student answers the next interaction \(x_{t+1}\) correctly. In knowledge tracing, each interaction takes the form of a tuple \(x_{t} = (q_{t}, r_{t})\) that combines a tag for the exercise being answered \(q_{t}\) with whether or not the exercise was answered correctly \(r_{t}\) [18]. Knowledge tracing can therefore help teachers teach students according to their aptitude and give them personalized guidance, and it can help students strengthen their training on unfamiliar or less familiar knowledge concepts, which is very meaningful in the teaching process.

Knowledge tracing is inherently difficult because of the complexity of the human brain’s learning process and the diversity of knowledge. Several models have been proposed to model the knowledge state of students in a concept-specific manner, such as Bayesian Knowledge Tracing (BKT) [2, 5], Deep Knowledge Tracing (DKT) [18], and Dynamic Key-Value Memory Networks (DKVMN) [24]. BKT divides students’ knowledge states into different concept states, treats each concept state as a binary latent variable (known or unknown), and uses a Hidden Markov Model to update the posterior distribution of the binary concept state. Although BKT can express the state of each knowledge concept for each student, it requires the knowledge concepts to be defined in advance. Moreover, BKT assumes that once students have mastered the knowledge, they will never forget it, which limits its ability to capture the complicated relationships between different knowledge concepts and to model long-term dependencies in an exercise sequence.

In recent years, inspired by the successful application of deep learning [12], deep learning models have begun to be used to solve the knowledge tracing problem. DKT is the first deep knowledge tracing model; it applies Recurrent Neural Networks (RNNs) to the problem of predicting students’ performance on exercises based on their previous learning sequences. Because RNNs have hidden layers with a large number of neurons, they can comprehensively express the relationships between different concepts and thereby improve prediction accuracy. However, DKT summarizes a student’s knowledge state of all concepts in one hidden state, which makes it difficult to trace how much a student has mastered a certain concept and to pinpoint which concepts a student is good at or unfamiliar with [11]. To address this deficiency, Zhang et al. proposed the DKVMN model based on Memory Augmented Neural Networks (MANNs) [19, 21], which can discover the correlation between input exercises and underlying concepts and reveal the evolving knowledge state of students by using a key-value memory. However, some knowledge tracing problems still need to be addressed:

  • In DKVMN, knowledge growth is calculated by multiplying the student’s exercise \(x_{t} = (q_{t}, r_{t})\) with a trained embedding matrix, which means that knowledge growth depends only on the absolute growth from this exercise. According to human cognitive processes, students’ knowledge growth is also related to their current knowledge state [3], so this way of calculating knowledge growth has limitations.

  • The reading process of the DKVMN model ignores the effect of the forgetting mechanism on students’ knowledge state. Research shows that forgetting occurs along an exponential curve [7], meaning that a student’s knowledge decreases over time, so we believe that the knowledge state of students is affected by the forgetting mechanism.

  • Existing knowledge tracing models assume that all students have the same learning ability without considering their inherent differences, and then construct a unified model to predict the probability that students answer the exercise correctly at the next moment, which lacks personalization.

To solve the above problems, inspired by the literature [17, 20], we propose a novel method called Learning Ability Community for Personalized Knowledge Tracing (LACPKT) based on the learning ability community and learning and memory process. The main contributions of our paper are summarized as follows:

  • We present a knowledge tracing model called Knowledge Tracking based on Learning and Memory Process (LMKT), which accounts for the impact of students’ current knowledge state on knowledge growth and considers the effect of forgetting mechanisms on their current knowledge state.

  • We define the learning ability degree and the learning ability community, and propose a definition of personalized knowledge tracing based on group dynamics theory.

  • We propose a novel method called Learning Ability Community for Personalized Knowledge Tracing (LACPKT), which models the learning process of students in a personalized way.

  • We conduct experiments on four public datasets to verify the effectiveness of the LMKT model and LACPKT model.

2 Related Work

Many studies have sought to estimate the knowledge state of students. Bayesian Knowledge Tracing (BKT) [5] is the most popular knowledge tracing model based on machine learning, and it is also a highly constrained and structured model. BKT models every knowledge concept for every student, and each knowledge concept can only change from the unmastered state to the mastered state. Several variants of BKT have also been proposed. For example, Yudelson et al. [23] proposed individualized Bayesian knowledge tracing models. Baker et al. [2] presented a more accurate student state model by contextually estimating the guess and slip probabilities P(G) and P(S) in BKT. Pardos et al. [16] added the item difficulty of item response theory (IRT) [10] to BKT to capture the diversity of questions. Other information and techniques [6, 8, 9] have also been introduced into the Bayesian network framework.

Deep Knowledge Tracing (DKT) [18] applied vanilla Recurrent Neural Networks (RNNs) to trace knowledge concept states and reported substantial improvements in prediction performance. DKT uses an RNN with a hidden layer to map an input sequence of vectors \(X = \{x_{1}, x_{2}, ... , x_{t}\}\) to an output sequence of vectors \(Y = \{y_{1}, y_{2}, ... , y_{t}\}\) in order to model student learning and predict the student’s state for all knowledge concepts. Zhang et al. [25] improved the DKT model by incorporating more problem-level features and proposed an adaptive DKT model structure that converts the high-dimensional input into a low-dimensional feature vector, which can effectively improve accuracy. Cheung et al. [4] proposed an automatic and intelligent approach to integrating heterogeneous features into the DKT model so that it can capture students’ behaviors in the exercises. Nagatani et al. [14] extended the DKT model to consider forgetting by incorporating multiple types of information related to forgetting, which improves predictive performance.

Memory Augmented Neural Networks (MANNs) have made progress in many areas, such as question answering [21] and one-shot learning [19]. MANNs consist of two operations, reading and writing, which are achieved through additional attention mechanisms. Because of the recurrence introduced in the read and write operations, a MANN is a special variant of an RNN; it uses an external memory matrix that stores the knowledge state of a student. Zhang et al. put forward Dynamic Key-Value Memory Networks (DKVMN) [24], which use the concepts of MANNs to reveal the evolving knowledge state of students and learn the relationships between concepts. DKVMN has one static key matrix that stores the concept representations and one dynamic value matrix that stores and updates the student’s mastery state of each concept, so it has more capacity to handle knowledge tracing problems.

3 The Proposed Model

In this section, we first formalize the definitions and then introduce Knowledge Tracking based on Learning and Memory Process. Finally, we propose a novel method named Learning Ability Community for Personalized Knowledge Tracing to model the learning process of students. In the description below, we assume that a student’s exercise sequence \(X = \{x_{1}, x_{2}, ... , x_{t}\}\) contains N latent knowledge concepts \(C = \{c_{1}, c_{2}, ... , c_{N}\}\), where \(x_{t} = (q_{t}, r_{t})\) is a tuple containing the question \(q_{t}\) and the correctness of the student’s answer \(r_{t}\), and that the exercise sequences of all M students \(U = \{u_{1}, u_{2}, ... , u_{M}\}\) are \({\textit{\textbf{X}}} = \{X ^{1}, X ^{2}, ..., X ^{M}\}\). The details are elaborated in the following three subsections.

3.1 Definition Formulation

Learning ability represents the internal quality of an individual that can cause lasting changes in behavior or thinking, and it can be formed and developed through certain learning practices. In the knowledge tracing problem, because the cognitive level of each student is different, the results of each student’s exercise sequence reflect that student’s learning ability. We introduce the definition of the learning ability degree \(\delta \) according to the exercise sequences of students.

Definition 1

Learning Ability Degree. Assume that an exercise \(x_{t} = (q_{t}, r_{t})\) of student \(u_{i}\) contains knowledge concept \(c_{j}\). The learning ability degree of student \(u_{i}\) is \(\delta _{u_{i}}^{c_{j}} = s_{max}^{c_{j}}/s_{length}^{c_{j}}\), which represents the ability of student \(u_{i}\) to learn the knowledge concept \(c_{j}\) in question \(q_{t}\).

Here, a large \(\delta _{u_{i}}^{c_{j}} \in [1, s_{max}^{c_{j}}]\) indicates that student \(u_{i}\) has a strong ability to learn question \(q_{t}\); \(s_{length}^{c_{j}}\) is the number of times that student \(u_{i}\) repeatedly learns the knowledge concept \(c_{j}\); and \(s_{max}^{c_{j}}\) is the maximum number of times that a student repeatedly learns the knowledge concept \(c_{j}\). Because the exercise sequence \(X_{i} = \{x_{1}^{i}, ... , x_{t}^{i}\}\) of student \(u_{i}\) contains knowledge concepts \(\{c_{1}, ... , c_{j}\}\), the learning ability degree sequence of student \(u_{i}\) is \(\delta _{u_{i}} = \{\delta _{u_{i}}^{c_{1}}, \delta _{u_{i}}^{c_{2}}, ..., \delta _{u_{i}}^{c_{j}}\}\). According to the learning ability degree sequences of all students, we define the Learning Ability Community as follows:

Definition 2

Learning Ability Community. Suppose that the learning ability degree sequences of student \(u_{i}\) and student \(u_{j}\) are \(\delta _{u_{i}} = \{\delta _{u_{i}}^{c_{1}}, \delta _{u_{i}}^{c_{2}}, ..., \delta _{u_{i}}^{c_{j}}\}\) and \(\delta _{u_{j}} = \{\delta _{u_{j}}^{c_{1}}, \delta _{u_{j}}^{c_{2}}, ..., \delta _{u_{j}}^{c_{j}}\}\). If \(|\delta _{u_{i}} - \delta _{u_{j}}| \le \varepsilon \), we consider student \(u_{i}\) and student \(u_{j}\) to have similar learning abilities; in other words, they belong to the same learning ability community.
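As a rough illustration of Definitions 1 and 2, the following Python sketch computes learning ability degrees from attempt counts and checks whether two students fall within \(\varepsilon \) of each other. The attempt counts, the concept names, and the helper names `learning_ability_degree` and `same_community` are hypothetical. Reading \(|\delta _{u_{i}} - \delta _{u_{j}}|\) as a Euclidean distance between the two sequences is an assumption, since the norm is not specified in the definition.

```python
def learning_ability_degree(s_length, s_max):
    """Definition 1: delta = s_max / s_length for one student and one concept."""
    if s_length < 1 or s_length > s_max:
        raise ValueError("expected 1 <= s_length <= s_max")
    return s_max / s_length


def same_community(delta_a, delta_b, eps):
    """Definition 2, assuming |.| means the Euclidean distance (an assumption)."""
    dist = sum((a - b) ** 2 for a, b in zip(delta_a, delta_b)) ** 0.5
    return dist <= eps


# Hypothetical attempt counts per concept; s_max is the largest count observed.
concepts = ("c1", "c2")
s_max = {"c1": 8, "c2": 4}
attempts_u1 = {"c1": 2, "c2": 4}
attempts_u2 = {"c1": 4, "c2": 4}

delta_u1 = [learning_ability_degree(attempts_u1[c], s_max[c]) for c in concepts]
delta_u2 = [learning_ability_degree(attempts_u2[c], s_max[c]) for c in concepts]

print(delta_u1, delta_u2)
print(same_community(delta_u1, delta_u2, eps=2.5))
```

A student who needs fewer repetitions on a concept gets a larger degree, which matches the statement that a large \(\delta \) indicates a strong learning ability.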

According to the exercise sequences of students and the definition of the learning ability community, we use an unsupervised deep clustering algorithm that minimizes \(\varepsilon \) through continuous iteration to assign students to their range of learning ability, which yields multiple different learning ability communities. Within a learning ability community k, we input all exercise sequences into a basic knowledge tracing model for training and obtain a corresponding optimized model by adjusting the parameters of the basic model. Because all students in learning ability community k have similar learning abilities, we can use group learning characteristics to guide individual learning. Therefore, we define Personalized Knowledge Tracing as follows:

Definition 3

Personalized Knowledge Tracing. Suppose that m students in the learning ability community k have already learned the exercise sequence \(X = \{x_{1},x_{2}, ..., x_{T}\}\), which contains knowledge concepts \(\{c_{1}, c_{2}, ... , c_{j}\}\). If a student \(u_{m+i}\) wants to learn this exercise sequence, we can trace the personalized knowledge state of student \(u_{m+i}\) according to the knowledge states of the m students. In other words, we can predict the probability that student \(u_{m+i}\) correctly answers this exercise sequence, which is called Personalized Knowledge Tracing.

3.2 Knowledge Tracking Based on Learning and Memory Process

Despite being more powerful than DKT and BKT in storing and updating the exercise sequence of students, DKVMN still has deficiencies when solving the knowledge tracing problem. To address them, we propose the Knowledge Tracking based on Learning and Memory Process (LMKT) model, whose framework is shown in Fig. 1. We assume that the key matrix \(\mathbf{M} ^{k }\) (of size N \(\times d_{k}\)) is a static matrix that stores the concept representations and that the value matrix \(\mathbf{M} _{t }^{v }\) (of size N \(\times d_{v}\)) is a dynamic matrix that stores the student’s mastery level of each concept and is updated over time. The task of knowledge tracing is completed by three mechanisms of LMKT: attention, reading and writing.

Attention. For the input exercise \(q_{t} \) of a student \(u_{i} \), we use the attention mechanism to determine which concepts are relevant to it. We multiply \(q_{t}\) by an embedding matrix A to get an embedding vector \({\textit{\textbf{k}}}_{t}\). The probability that \(q_{t} \) is relevant to each concept in \(\mathbf{M} ^{k }\) is computed by comparing the question with each key matrix slot \(\mathbf{M} ^{k }(i )\), which defines the attention weight vector \({\textit{\textbf{w}}}_{t}\). \({\textit{\textbf{w}}}_{t}\) represents the attention weight between the student’s exercise and each concept and is used in both the read and write processes.

$$\begin{aligned} {\textit{\textbf{w}}}_{t}(i) = Softmax({\textit{\textbf{k}}}_{t}^{T}{} \mathbf{M} ^{k }(i )), \end{aligned}$$
(1)

where \( Softmax(x) = e^{x}/\sum _{y}(e^{y})\).
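To make Eq. (1) concrete, here is a minimal pure-Python sketch of the attention step. The numeric values, the sizes (N = 3 slots, \(d_{k}\) = 2), and the function names are hypothetical; subtracting the maximum score before exponentiating is a standard numerical safeguard and does not change the result.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(k_t, key_matrix):
    """Eq. (1): softmax over the inner products of k_t with each key slot M^k(i)."""
    scores = [sum(k * m for k, m in zip(k_t, slot)) for slot in key_matrix]
    return softmax(scores)


# Hypothetical embedding of one exercise and a key matrix with three concept slots.
k_t = [1.0, 0.0]
M_k = [[2.0, 0.0], [0.0, 2.0], [2.0, 1.0]]
w_t = attention_weights(k_t, M_k)
print(w_t)
```

The weights sum to one, and the slots whose keys align with \({\textit{\textbf{k}}}_{t}\) (the first and third here) receive most of the attention.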

Fig. 1. The framework of Knowledge Tracking based on learning and memory process

Reading. When an exercise \(q_{t}\) arrives, the value matrix \(\mathbf{M} _{t }^{v }\) that holds the student’s current knowledge state cannot be assumed to remain unchanged, because of the human forgetting mechanism. Therefore, we introduce a forgetting vector \(\mathbf{e} _{t}^{k}\) that represents the forgetting of the current knowledge state during the reading process, and the student’s current knowledge state \(\mathbf{M} _{t }^{v }(i)\) is updated to \(\hat{\mathbf{M }}_{t }^{v }(i)\).

$$\begin{aligned} \mathbf{e} _{t}^{k} = Sigmoid(\mathbf{E} _{k}^{T}{} \mathbf{M} _{t }^{v } + {\textit{\textbf{b}}}_{k}), \end{aligned}$$
(2)
$$\begin{aligned} \hat{\mathbf{M }}_{t }^{v }(i) = \mathbf{M} _{t }^{v }(i) [1- {\textit{\textbf{w}}}_{t}(i)\mathbf{e} _{t}^{k}], \end{aligned}$$
(3)

where \(\mathbf{E} _{k}^{T}\) is a transformation matrix and the elements of \(\mathbf{e} _{t}^{k}\) lie in the range (0, 1). Then, using the attention weight \({\textit{\textbf{w}}}_{t}\), we calculate from the value matrix \(\hat{\mathbf{M }}_{t }^{v }\) the read content \({\textit{\textbf{r}}}_{t}\), which summarizes the student’s mastery level of this exercise.

$$\begin{aligned} {\textit{\textbf{r}}}_{t} = \sum _{i=1}^{N}{\textit{\textbf{w}}}_{t}(i)\hat{\mathbf{M }}_{t }^{v }(i). \end{aligned}$$
(4)
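The reading step in Eqs. (2) to (4) can be sketched in plain Python as follows. Eq. (2) does not spell out the dimensions, so this sketch assumes that the forgetting vector is computed separately for each slot \(\mathbf{M} _{t }^{v }(i)\) with a \(d_{v} \times d_{v}\) matrix; that reading, the numeric values, and the function names are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forget_vector(E_k, b_k, slot):
    """Eq. (2) for one slot: Sigmoid(E_k^T M_t^v(i) + b_k), E_k assumed d_v x d_v."""
    d_v = len(slot)
    return [
        sigmoid(sum(E_k[r][c] * slot[r] for r in range(d_v)) + b_k[c])
        for c in range(d_v)
    ]


def read(M_v, w_t, E_k, b_k):
    """Apply forgetting to every slot (Eq. 3), then the weighted read (Eq. 4)."""
    M_hat = []
    for w_i, slot in zip(w_t, M_v):
        e = forget_vector(E_k, b_k, slot)
        M_hat.append([m * (1.0 - w_i * e_j) for m, e_j in zip(slot, e)])
    d_v = len(M_v[0])
    r_t = [sum(w_i * slot[j] for w_i, slot in zip(w_t, M_hat)) for j in range(d_v)]
    return M_hat, r_t


# Hypothetical values: N = 2 slots, d_v = 2. Zero weights give e = 0.5 everywhere.
M_v = [[1.0, 0.5], [0.2, 0.8]]
w_t = [0.75, 0.25]
E_k = [[0.0, 0.0], [0.0, 0.0]]
b_k = [0.0, 0.0]
M_hat, r_t = read(M_v, w_t, E_k, b_k)
print(M_hat, r_t)
```

Because each factor \(1 - {\textit{\textbf{w}}}_{t}(i)\mathbf{e} _{t}^{k}\) lies in (0, 1), the slots that receive more attention are decayed more strongly before they are read.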

Next, we concatenate the read content \({\textit{\textbf{r}}}_{t}\) and the input exercise embedding \({\textit{\textbf{k}}}_{t}\) and pass the result through a multilayer perceptron with the Tanh activation function to get a knowledge state summary vector \({\textit{\textbf{f}}}_{t}\), which contains the student’s mastery level and the difficulty of exercise \(q_{t}\).

$$\begin{aligned} {\textit{\textbf{f}}}_{t} = Tanh({\textit{\textbf{W}}}_{1}^{T}[{\textit{\textbf{r}}}_{t}, {\textit{\textbf{k}}}_{t}] + {\textit{\textbf{b}}}_{1}). \end{aligned}$$
(5)

Finally, after \({\textit{\textbf{f}}}_{t}\) passes through a layer with the Sigmoid activation function, we get the predicted scalar \({\textit{\textbf{p}}}_{t}\), which represents the probability of answering the exercise \(q_{t}\) correctly.

$$\begin{aligned} {\textit{\textbf{p}}}_{t} = Sigmoid({\textit{\textbf{W}}}_{2}^{T}{\textit{\textbf{f}}}_{t} + {\textit{\textbf{b}}}_{2}), \end{aligned}$$
(6)

where \({\textit{\textbf{W}}}_{1}^{T}, {\textit{\textbf{W}}}_{2}^{T}\) are the weights and \({\textit{\textbf{b}}}_{1}, {\textit{\textbf{b}}}_{2}\) are the biases.
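Eqs. (5) and (6) amount to a small two-layer network on the concatenation \([{\textit{\textbf{r}}}_{t}, {\textit{\textbf{k}}}_{t}]\). The sketch below shows one way to write it in plain Python; the layer sizes, the weight values, and the helper names `affine` and `predict` are hypothetical.

```python
import math


def affine(W, x, b):
    """Compute W^T x + b, where W is stored as len(x) rows by len(b) columns."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j] for j in range(len(b))]


def predict(r_t, k_t, W1, b1, W2, b2):
    """Eq. (5) then Eq. (6): returns the summary vector f_t and the probability p_t."""
    f_t = [math.tanh(z) for z in affine(W1, r_t + k_t, b1)]
    z = affine(W2, f_t, b2)[0]
    p_t = 1.0 / (1.0 + math.exp(-z))
    return f_t, p_t


# Hypothetical sizes: d_v = 2, d_k = 2, and a hidden layer of width 2.
r_t = [0.5, 0.4]
k_t = [1.0, 0.0]
W1 = [[0.5, 0.0], [0.0, 0.5], [0.1, 0.0], [0.0, 0.1]]
b1 = [0.0, 0.0]
W2 = [[1.0], [-1.0]]
b2 = [0.0]
f_t, p_t = predict(r_t, k_t, W1, b1, W2, b2)
print(f_t, p_t)
```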

Writing. The writing process updates the student’s knowledge state. In the DKVMN model, \((q_{t}, r_{t})\) is embedded with an embedding matrix B to obtain the knowledge growth \({\textit{\textbf{v}}}_{t}\) of the student after working on this exercise [24], which is insufficient to express the actual gains in the learning process. We therefore concatenate the original knowledge growth \({\textit{\textbf{v}}}_{t}\) and the read content \({\textit{\textbf{r}}}_{t}\) and pass the result through a fully connected layer with a Tanh activation to get the new knowledge growth \({\textit{\textbf{v}}}_{t}^{'}\).

$$\begin{aligned} {\textit{\textbf{v}}}_{t}^{'} = Tanh({\textit{\textbf{W}}}_{3}^{T}[{\textit{\textbf{v}}}_{t}, {\textit{\textbf{r}}}_{t}] + {\textit{\textbf{b}}}_{3}). \end{aligned}$$
(7)

Before writing the student’s knowledge growth into the value matrix \(\mathbf{M} _{t }^{v }\), we should consider forgetting, in line with human learning and cognitive processes. We assume that the forgetting vector \(\mathbf{e} _{t}^{v}\) is computed from \({\textit{\textbf{v}}}_{t}^{'}\).

$$\begin{aligned} \mathbf{e} _{t}^{v} = Sigmoid(\mathbf{E} _{v}^{T}{\textit{\textbf{v}}}_{t}^{'} + {\textit{\textbf{b}}}_{v}), \end{aligned}$$
(8)

where \(\mathbf{E} _{v}^{T}\) is a transformation matrix and the elements of \(\mathbf{e} _{t}^{v}\) lie in the range (0, 1). Using the forgetting vector \(\mathbf{e} _{t}^{v}\), the memory vectors \(\mathbf{M} _{t-1 }^{v }(i )\) from the previous timestamp are modified as follows:

$$\begin{aligned} \widetilde{\mathbf{M }}_{t }^{v }(i ) = \mathbf{M} _{t-1 }^{v }(i )[\mathbf{1} - {\textit{\textbf{w}}}_{t}(i)\mathbf{e} _{t}^{v}], \end{aligned}$$
(9)

where \({\textit{\textbf{w}}}_{t}(i)\) is the same as in the reading process. After forgetting, the add vector \({\textit{\textbf{a}}}_{t}\), which represents the actual gain from the new knowledge growth \({\textit{\textbf{v}}}_{t}^{'}\), is calculated as follows:

$$\begin{aligned} {\textit{\textbf{a}}}_{t} = Tanh({\textit{\textbf{W}}}_{a}^{T}{\textit{\textbf{v}}}_{t}^{'} + {\textit{\textbf{b}}}_{a}), \end{aligned}$$
(10)

where \({\textit{\textbf{W}}}_{a}^{T}\) is a transformation matrix. Finally, the value matrix is updated at each time t based on \(\widetilde{\mathbf{M }}_{t }^{v }(i )\) and \({\textit{\textbf{a}}}_{t}\).

$$\begin{aligned} \mathbf{M} _{t }^{v }(i ) = \widetilde{\mathbf{M }}_{t }^{v }(i ) + {\textit{\textbf{w}}}_{t}(i){\textit{\textbf{a}}}_{t}. \end{aligned}$$
(11)
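Taken together, Eqs. (7) to (11) describe an erase-then-add update of the value matrix. The following plain-Python sketch traces one write step; the matrix shapes (for example, \({\textit{\textbf{W}}}_{3}\) of size \(2d_{v} \times d_{v}\)), the parameter values, and the names `write` and `params` are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, b):
    """Compute W^T x + b, where W is stored as len(x) rows by len(b) columns."""
    return [sum(W[i][j] * x[i] for i in range(len(x))) + b[j] for j in range(len(b))]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def write(M_prev, w_t, v_t, r_t, p):
    """One write step: returns the updated value matrix M_t^v."""
    v_new = [math.tanh(z) for z in affine(p["W3"], v_t + r_t, p["b3"])]  # Eq. (7)
    e_v = [sigmoid(z) for z in affine(p["E_v"], v_new, p["b_v"])]        # Eq. (8)
    a_t = [math.tanh(z) for z in affine(p["W_a"], v_new, p["b_a"])]      # Eq. (10)
    M_next = []
    for w_i, slot in zip(w_t, M_prev):
        erased = [m * (1.0 - w_i * e_j) for m, e_j in zip(slot, e_v)]    # Eq. (9)
        M_next.append([m + w_i * a_j for m, a_j in zip(erased, a_t)])    # Eq. (11)
    return M_next


# Hypothetical parameters with d_v = 2; zero weights make e_v = 0.5 and a_t = tanh(b_a).
params = {
    "W3": zeros(4, 2), "b3": [0.0, 0.0],
    "E_v": zeros(2, 2), "b_v": [0.0, 0.0],
    "W_a": zeros(2, 2), "b_a": [1.0, -1.0],
}
M_prev = [[1.0, 0.5], [0.2, 0.8]]
M_next = write(M_prev, [0.75, 0.25], v_t=[0.3, 0.1], r_t=[0.5, 0.4], p=params)
print(M_next)
```

Both the erase term and the add term are scaled by the same attention weight \({\textit{\textbf{w}}}_{t}(i)\), so concepts unrelated to the current exercise are left almost untouched.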

Training. All parameters of the LMKT model, such as the embedding matrices A and B as well as other weight parameters, are trained by minimizing a standard cross entropy loss between the prediction label \(p _{t}\) and the ground-truth label \(r _{t}\).

$$\begin{aligned} \mathcal {L}=-\sum _{t}\left( r _{t} \log p _{t}+\left( 1-r _{t}\right) \log \left( 1-p _{t}\right) \right) \end{aligned}$$
(12)
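For reference, the loss in Eq. (12) can be evaluated for a short sequence as in the sketch below. The function name `cross_entropy_loss` and the sample values are hypothetical, and the clipping constant `eps` is an added numerical safeguard that is not part of Eq. (12).

```python
import math


def cross_entropy_loss(r, p, eps=1e-12):
    """Eq. (12): summed binary cross entropy between labels r_t and predictions p_t."""
    total = 0.0
    for r_t, p_t in zip(r, p):
        p_t = min(max(p_t, eps), 1.0 - eps)  # avoid log(0); not part of Eq. (12)
        total -= r_t * math.log(p_t) + (1 - r_t) * math.log(1.0 - p_t)
    return total


# Hypothetical labels and predictions for three interactions.
uninformed = cross_entropy_loss([1, 0], [0.5, 0.5])
confident = cross_entropy_loss([1, 0, 1], [0.9, 0.1, 0.8])
print(uninformed, confident)
```

Predicting 0.5 for every interaction costs \(\log 2\) per step, so a trained model should reach a lower per-step loss than that baseline.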

3.3 Learning Ability Community for Personalized Knowledge Tracing

In this subsection, we introduce Learning Ability Community for Personalized Knowledge Tracing (LACPKT) based on the previous two subsections. The framework of LACPKT is shown in Fig. 2, and its process is as follows:

Fig. 2. The framework of learning ability community for Personalized Knowledge Tracing

Firstly, we input the exercise sequences of all students into the LMKT model for training and get a basic \(LMKT_{0}\) model suitable for all students.

Secondly, according to Definition 1, we process each student’s exercise sequence to obtain their learning ability degree sequence \(\{\delta _{1},\delta _{2},...,\delta _{L}\}\).

Thirdly, we input the learning ability degree sequences of all students into the deep clustering network (DCN) [22], which jointly performs dimensionality reduction (DR) and K-means clustering, with DR accomplished by learning a deep neural network. According to Definition 2, we assume that we obtain k learning ability communities (LAC) as follows:

$$\begin{aligned} \{LAC_{1},LAC_{2},..., LAC_{k}\} \Leftarrow \{(\delta _{1},\delta _{2},...,\delta _{L}), DCN\}. \end{aligned}$$
(13)
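The grouping in Eq. (13) can be illustrated with a much simpler stand-in. The sketch below runs plain K-means (Lloyd's algorithm) directly on learning ability degree sequences; it is not the deep clustering network of [22], since it omits the learned dimensionality reduction entirely. The sample sequences, the choice of initial centroids, and the function name `kmeans` are hypothetical.

```python
def kmeans(points, centroids, max_iters=20):
    """Plain K-means with given initial centroids; returns labels and centroids."""
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels.append(dists.index(min(dists)))
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(c)  # keep an empty cluster's centroid unchanged
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical learning ability degree sequences for four students, two concepts each.
deltas = [[4.0, 1.0], [3.8, 1.2], [1.0, 1.0], [1.2, 1.1]]
labels, centroids = kmeans(deltas, centroids=[deltas[0], deltas[2]])

# Group student indices into learning ability communities.
communities = {}
for student, lab in enumerate(labels):
    communities.setdefault(lab, []).append(student)
print(communities)
```

Each resulting group of student indices plays the role of one \(LAC\) whose exercise sequences would then be used to fine-tune a community-specific model.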

Then, we input the exercise sequences of k learning ability communities into the basic \(LMKT_{0}\) model for training, and acquire k optimized LMKT models by adjusting the parameters of the basic model, as shown in Eq. (14). In any learning ability community, we can trace the personalized knowledge state of each student.

$$\begin{aligned} \{LMKT_{LAC_{1}},...,LMKT_{LAC_{k}}\} \Leftarrow \{(LAC_{1},..., LAC_{k}), LMKT_{0}\}. \end{aligned}$$
(14)

Next, according to Definition 3, if m students in the learning ability community k have already learned the exercise sequence \(X = \{x_{1}, x_{2}, ... , x_{t}\}\), which involves knowledge concepts \(\{c_{1}, c_{2}, ... , c_{j}\}\), we can construct m personalized knowledge models for these m students and obtain their current knowledge states \(\{f_{k}^{u_{1}}, f_{k}^{u_{2}}, ..., f_{k}^{u_{m}}\}\), as shown in Eq. (15) and Eq. (16).

$$\begin{aligned} \{LMKT_{LAC_{k}^{u_{1}}},...,LMKT_{LAC_{k}^{u_{m}}}\} \Leftarrow \{(u_{1},...,u_{m}), LMKT_{LAC_{k}}\}, \end{aligned}$$
(15)
$$\begin{aligned} \{f_{k}^{u_{1}}, f_{k}^{u_{2}}, ..., f_{k}^{u_{m}}\} \Leftarrow \{LMKT_{LAC_{k}^{u_{1}}},...,LMKT_{LAC_{k}^{u_{m}}}\}. \end{aligned}$$
(16)

Finally, when student \(u_{m+1}\) wants to learn the exercise sequence \(X = \{x_{1}, ... , x_{t}\}\), we concatenate the current knowledge states \(\{f_{k}^{u_{1}}, f_{k}^{u_{2}}, ..., f_{k}^{u_{m}}\}\) of these m students and pass them through a fully connected network with the Sigmoid activation function to predict the probability \(p_{k}^{u_{m+1}}\) that student \(u_{m+1}\) will correctly answer this exercise sequence.

$$\begin{aligned} p_{k}^{u_{m+1}} = Sigmoid({\textit{\textbf{W}}}_{k}^{T}[f_{k}^{u_{1}}, f_{k}^{u_{2}}, ..., f_{k}^{u_{m}}] + {\textit{\textbf{b}}}_{k}), \end{aligned}$$
(17)

where \({\textit{\textbf{W}}}_{k}^{T}\) and \({\textit{\textbf{b}}}_{k}\) stand for the weight and bias of student \(u_{m+1}\). The optimized objective function is the standard cross entropy loss between the prediction label \(p_{k}^{u_{m+1}}\) and the ground-truth label \(r _{t}\).

$$\begin{aligned} L_{k}^{u_{m+1}}=-\sum _{t}\left( r_{t} \log p_{k}^{u_{m+1}} +\left( 1-r_{t}\right) \log \left( 1-p_{k}^{u_{m+1}}\right) \right) . \end{aligned}$$
(18)

4 Experiments

In this section, we first evaluate the performance of our LMKT model against state-of-the-art knowledge tracing models on four experimental datasets. Then, we verify the validity of our LACPKT model and analyze the effectiveness of personalized knowledge tracing.

4.1 Experiment Settings

Datasets. We use four public datasets from the literature [18, 24]: Synthetic-5, ASSISTments2009, ASSISTments2015, and Statics2011. Synthetic-5 is a synthetic dataset, and the other three are real-world datasets obtained from online learning platforms. The statistics of the four datasets are shown in Table 1.

Implementation Details. First, we encoded the experimental datasets with one-hot encoding, where the length of the encoding depends on the number of distinct questions. In the Synthetic-5 dataset, 50% of the exercise sequences were used as the training dataset, whereas in the other three datasets, 70% of the exercise sequences were used as the training dataset. A total of 20% of the training dataset was split off to form a validation dataset that was used to find the optimal model architecture and hyperparameters; hyperparameters were tuned using five-fold cross-validation. Then, we constructed a deep clustering network consisting of four hidden layers to cluster the learning ability communities. The four hidden layers have 1000, 800, 500, and 100 neurons, respectively; every layer is pre-trained for 100 epochs, and the entire deep network is further fine-tuned for 50 epochs. Next, in all experiments with our model, we trained with the Adam optimizer and repeated the training five times to obtain the average test results. Finally, as the evaluation metric, we chose \(AUC \in [0,1]\), the area under the Receiver Operating Characteristic (ROC) curve, to measure the performance of all models.
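As a reminder of what the AUC metric measures, the sketch below computes it from its pairwise definition: the fraction of (correct, incorrect) answer pairs in which the correct answer received the higher predicted probability, with ties counted as one half. The function name `auc` and the sample labels and scores are hypothetical, and this quadratic-time version is meant for illustration rather than for large datasets.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ground-truth correctness labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random guessing, and higher values indicate better separation of correct from incorrect answers.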

Table 1. The statistics of four datasets.

4.2 Results and Analysis

LMKT Model Performance Evaluation. We evaluate our LMKT model against the other three knowledge tracing models: BKT, DKT and DKVMN. The test AUC results are shown in Table 2.

Table 2. The AUC results of the four models on four datasets.

Since the BKT and DKT models perform worse than the DKVMN model, the AUC results of these two models are taken from the best results reported in [24]. For the DKVMN model, we set its parameters to be consistent with those in the original paper and then trained it five times on the four datasets to obtain the final average test results. For our LMKT model, we adjusted the model parameters to find the best model structure and report the best results obtained. Our model outperformed the other models on all four datasets. Although the improvement over the DKVMN model is only 0.25% to 0.5%, it demonstrates the effectiveness of addressing knowledge tracing based on human learning and memory processes. Moreover, the LMKT model achieves these results with state dimensions and memory sizes of 10 or 20, whereas the state dimensions and memory size of the DKVMN model are 50 or 100, which leads to more model parameters and longer training time. As shown in Fig. 3, the LMKT model performs better than the DKVMN model and, in particular, reaches its optimal structure in fewer iterations. On the Statics2011 dataset, although the performance of the two models is similar, the gap between the training AUC and the validation AUC is larger for the DKVMN model than for the LMKT model, so the LMKT model suffers less from over-fitting.

Fig. 3. Training AUC and validation AUC of DKVMN and LMKT on four datasets

Learning Ability Community Analysis. By analyzing the exercise sequences of the four experimental datasets, we assign each student to a learning ability community through the deep clustering network. According to Definitions 1 and 2, \(\delta \) determines the difference in learning ability among students, so we normalize the learning ability degree to lie between 0 and 1. We then set the number of clusters of the deep clustering network to determine the number of learning ability communities and assign students to the communities to which they belong. We set the ASSISTments2015 dataset to contain six learning ability communities and the other three datasets to contain five, and we visualize each dataset with t-SNE [13]. Figure 4 shows the visualization of the learning ability communities, where each color represents one community. In the Synthetic-5 dataset, because the data is obtained by artificially simulating students’ answering process, the resulting communities show discrete characteristics. In the ASSISTments2009 and ASSISTments2015 datasets, because some exercise sequences are short or the same exercise questions are repeated multiple times within one exercise sequence, some data points lie far from the center of their learning ability community. Moreover, ASSISTments2015 contains 19,840 students, so the chosen number of learning ability communities is not enough to reflect the actual situation, which causes learning ability communities to overlap. In the Statics2011 dataset, since it contains only 333 students, the overall clustering is sparse.

Fig. 4. Learning ability community

Personalized Knowledge Tracing. Because the dataset ASSISTments2015 is an updated version of ASSISTments2009, and the dataset Statics2011 has only 333 student exercise sequences, we chose the datasets Synthetic-5 and ASSISTments2015 for the experimental verification of personalized knowledge tracing. Based on the analysis of the datasets and the learning ability communities, we set the number of learning ability communities for Synthetic-5 and ASSISTments2015 to 4 and 6, respectively. To ensure that the personalized knowledge tracing experiment is sufficient and reliable, we also feed the different learning ability communities to DKVMN, which we call the LAC_DKVMN model, and compare it with LACPKT. According to Definition 3, we conducted experiments to verify the validity of the LACPKT model; the experimental results are shown in Fig. 5.

According to Fig. 5(a) and (b), the personalized knowledge tracing capability of the LACPKT model is better than that of the LAC_DKVMN model in all learning ability communities of the two datasets. On Synthetic-5, the AUC results of the LACPKT model in the four learning ability communities are 83.96%, 84.98%, 82.56%, and 81.67%, respectively. On ASSISTments2015, the AUC results of the LACPKT model in the six learning ability communities are 76.25%, 70.34%, 73.23%, 71.87%, 70.68%, and 72.72%, respectively. These results show that the performance of the LACPKT model differs across learning ability communities. The reason is that the exercise sequences in different learning ability communities differ in length and in the questions they contain. Figure 5(c) and (d) show how the AUC of the LACPKT model changes on the validation dataset. On Synthetic-5, because it is an artificially simulated dataset in which the length and questions of each student’s exercise sequence are the same, the test results and the validation results of the LACPKT model remain consistent. On ASSISTments2015, however, because each student’s exercise sequence length and questions are different, the performance of the LACPKT model differs across learning ability communities. For different learning ability communities, the LACPKT model can model students’ learning processes according to their learning abilities and trace students’ personalized knowledge state based on the effect of group learning behavior on individuals. Therefore, the experimental results demonstrate the effectiveness of dividing students into learning ability communities and of the LACPKT model in tracing students’ personalized knowledge state.

Fig. 5. The results of personalized knowledge tracing.

5 Conclusions and Future Work

In this paper, we propose a novel method called Learning Ability Community for Personalized Knowledge Tracing (LACPKT) to model the learning process of students and trace their knowledge state. The LACPKT model consists of two parts. On the one hand, we propose the definition of the learning ability community, which uses the effect of group learning on individuals to trace the personalized knowledge state of students. On the other hand, we propose the Knowledge Tracking based on Learning and Memory Process model, which considers the relationship between the current knowledge state and knowledge growth as well as forgetting mechanisms. Finally, we demonstrate the effectiveness of the LACPKT model in tracing students’ personalized knowledge state on public datasets. In future work, we will integrate the content information of the problems to optimize the learning ability community and optimize the network structure to improve prediction ability.