1 Introduction

With the continuous advancement of education informatization and modernization (Cao 2016), educational data mining (Heiner and Heffernan 2014) has received extensive attention from researchers at home and abroad. To promote the development of educational technology, a large number of techniques related to educational data mining have been proposed (Slater et al. 2018). The concept map is an effective knowledge visualization tool, and educational data mining on concept maps has become a current research hotspot (Markham et al. 2010). The concept map was first proposed by Dr. Novak (Novak and Gowin 1984) of Cornell University in 1984; it expresses the associations between concepts through edges and describes those associations in near-natural language. The form of the concept map in recent years is still based on the network-based conceptual structure proposed by Novak: nodes represent concepts, directed edges represent the connections between concepts, and propositional labels represent the dependencies within those associations (Paul 2012).

Scholars at home and abroad have conducted extensive research on concept maps and applied them to different fields, such as teaching diagnosis (Hirashima et al. 2015), knowledge building (Zhang et al. 2007) and clinical nursing (Kaddoura et al. 2016), achieving notable results. However, early concept maps were mainly handmade by experts based on their experience, which was not only time-consuming but also made accuracy difficult to guarantee (Coffey et al. 2002). In recent years, automatic generation algorithms for concept maps relying on educational data mining techniques have been continuously proposed. Jiang et al. (2009) proposed a method to understand the structures of hand-drawn concept maps together with a structure-based intelligent manipulation technique; however, the concept map still needs to be manually generated by experts first. Chen et al. (2006) used text analysis techniques to automatically generate concept maps of the e-learning domain from the literature, but they only considered the association rules between words and did not reflect the association rules between concepts. Caputo and Ebecken (2011) used natural language processing to generate concept maps from e-commerce web pages; they applied information extraction to mine concepts and analyze their associations, but did not fully consider the role of dynamic data in the generation process. Huang et al. (2015) proposed an algorithm to automatically generate concept maps from simulated datasets. They calculated the correlations between concepts through an improved Apriori (Toivonen 2011) algorithm, but the associations between concepts and test questions were still classified manually by experts. Atapattu et al. (2017) extracted concept maps from lecture slides and evaluated the suitability of the auto-generated maps as a pedagogical tool. However, they only considered the content of the slides and did not pay enough attention to students' test data. In summary, researchers have made great achievements in the automatic generation of concept maps in recent years, but common shortcomings remain, such as excessive reliance on experts, the long time required to generate concept maps, and the lack of rational use of dynamic data such as students' answer records.

In this paper, we propose a text analysis-association rules mining (TA-ARM) algorithm, which combines text analysis and association rules mining on students' test data. The TA-ARM algorithm uses text classification to classify test questions into concepts, reducing the time spent classifying questions manually. At the same time, it combines students' answer records with association rules mining to automatically generate the concept map. The experimental results show that the TA-ARM algorithm can rapidly generate a high-quality concept map while reducing manual labor.

The remainder of this paper is organized as follows. Related literature is reviewed in Sect. 2. The TA-ARM algorithm is explained in Sect. 3. Section 4 conducts the computational experiments and analysis. Finally, we conclude our results and point out future research directions in Sect. 5.

2 Literature review

Data mining methods and models can uncover hidden nuggets of information in large data sets (Booth 2007) and have been successfully applied in medicine (Li et al. 2004), finance (Cowan 2002), biology (Hirschman et al. 2002) and other fields. However, data mining started late in the education field compared to these fields (Romero and Ventura 2007). In the field of education, data mining can find and help solve various problems. Through a variety of methods and models, educational data mining can be used to design better and smarter learning technologies and to better inform learners and educators (Baker 2014).

As a tool of knowledge organization in educational data mining, concept maps are generally used to represent knowledge structures (Acharya and Sinha 2017), and methods of generating them have received wide attention from researchers (Novak and Cañas 2007). Tseng et al. (2007) proposed a two-phase concept map generation (TP-CMC) method to construct the concept map of a course from learners' historical test records. Their dataset comes from 104 junior high school students in Taiwan, the domain of the examination is a Physics course, and the subordination between concepts and questions is given directly. Bai and Chen (2008) applied fuzzy rules and fuzzy reasoning techniques to automatically construct concept maps; they used a simulated questions-concepts matrix and grade matrix to calculate the relevant degrees between concepts, and the mined association rules between concepts were used to generate the maps. The authors were more inclined to establish the theory of the algorithm, and its feasibility was verified on small sample data that could be computed quickly. Chen and Sue (2013) improved on the algorithm of Bai and Chen (2008), using existing datasets combined with association rule mining methods to generate concept maps. Their method can dynamically generate concept maps based on students' answer records, which has practical significance. Oppl and Stary (2017) presented a tabletop interface designed to assist in the generation of concept maps; this tool plays an active role in collaborative construction, in which multiple builders participate and the experience of many people is used to improve the quality of the map. Acharya and Sinha (2015) used automatic hashing and pruning algorithms to automatically generate concept maps. Their algorithm helps improve the efficiency of concept map generation and has achieved certain results in practical application. Although the above-mentioned papers present a variety of algorithms and have achieved good results, their data are clean final datasets and rarely involve multi-semantic data such as text. Moreover, it takes a lot of time to acquire these final datasets.

In addition, there are many studies related to text analysis. Lai et al. (2017) proposed a new system based on information retrieval technology that automatically creates keyword concept maps for each part of a book. By analyzing the association rules of keywords in the book, a static concept map is generated; no dynamic data is involved in this calculation. Santos (2018) used natural language processing and machine learning techniques to discover the associations between concepts in text documents and ultimately generate concept maps. They used text classification techniques to process a set of 497 structured abstracts from the Computer Science and Software Testing areas. The experiments showed that the proposed method can assist researchers in generating concept maps. Qasim et al. (2013) presented a cluster-based approach to semi-automatically construct concept maps from unstructured text documents. They selected 65 sample documents from 2007 to 2011 in the information systems domain as the dataset and used an unsupervised clustering algorithm to extract the structural associations of the candidate terms in the documents. Nugumanova et al. (2015) analyzed the frequency of document terms in a collection of teaching materials and built a term-document matrix to generate a concept map between terms. The concept map generated by their algorithm has the following advantages: quickness, effectiveness, completeness and actuality. There are also many other methods that mine association rules from texts such as academic articles or teaching materials and then generate concept maps. These methods deal with unstructured texts, which can effectively save the time needed to obtain final datasets, and are of great significance for the generation of concept maps.
This paper is inspired by the above-mentioned literature: we use text analysis to obtain the final datasets, replacing the time-consuming work of experts, and combine it with a classic association rules mining method to ultimately achieve automatic generation of the concept map.

3 TA-ARM concept map automatic generation algorithm

The TA-ARM algorithm consists of two phases: test questions text analysis and association rules mining between concepts. As shown in Fig. 1, in the test questions text analysis phase, text features are first extracted from the test questions; a classification model is then built, and text classification is used to classify the test questions into concepts. In this phase, the associations between test questions and concepts are obtained. In the association rules mining phase, frequent item sets of test questions are first generated from the answer records, the associations between test questions found in the previous phase are mapped to relevant degrees between concepts, and the concept map is finally generated. The concept map can thus be generated automatically by combining text analysis with association rules mining, without the aid of expert experience. The notations in Table 1 are used throughout the paper.

Fig. 1
figure 1

TA-ARM algorithm schematic diagram

Table 1 Notations

3.1 Test questions text analysis

Text analysis (Matsumoto et al. 2017) is an important branch of traditional data mining, but differs from it in that it processes unstructured text. Common text analysis methods include text clustering (Steinbach 2000) and text classification (Cohen 2004). In the test questions text analysis phase of the TA-ARM algorithm, we use text classification to automatically classify test questions into concepts, replacing the manual classification that relies on expert experience. The process of test questions text classification is shown in Fig. 2.

Fig. 2
figure 2

The process of test questions text classification

3.1.1 Word segmentation and stop words filtering

The texts of test questions are unstructured and cannot be directly classified by a computer (Kurbatow 2015), so text preprocessing is required. The Chinese test question texts used in this paper are more complex than English texts: there are no fixed intervals between words, so word segmentation is needed (Islam et al. 2008). Questions after word segmentation still contain many meaningless words, which must be filtered out; this is stop words filtering.

Test questions after word segmentation and stop words filtering are expressed as \({\text{Q}}=({Q_1},{Q_2},~ \ldots ,{Q_j}, \ldots ,{Q_m})\), where \(m\) is the number of test questions and \({Q_j}\) is the \(j\)-th test question. In the next steps, the \({\text{Q}}\) will be analyzed instead of the original test questions.
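For illustration, the preprocessing step might be sketched as follows. This is a minimal sketch: the tokenizer defaults to Jieba (the open-source segmenter named in Sect. 4.2), but any callable mapping a string to a token list can be substituted; the function and parameter names are ours.

```python
def preprocess_questions(questions, stopwords, tokenizer=None):
    """Segment each question text and filter out stop words.

    `tokenizer` defaults to jieba.lcut (assumed installed); any
    callable mapping a string to a list of tokens works.
    """
    if tokenizer is None:
        import jieba  # open-source Chinese word segmentation tool
        tokenizer = jieba.lcut
    Q = []
    for text in questions:
        tokens = tokenizer(text)
        # drop stop words and whitespace-only tokens
        Q.append([t for t in tokens if t.strip() and t not in stopwords])
    return Q
```

Each element of the returned list corresponds to one preprocessed question \({Q_j}\).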

3.1.2 Text features extraction

Term frequency–inverse document frequency (TF-IDF) (El-Khair 2009) is a weighting function that depends on the term frequency in a given document calculated with its relative collection frequency. In this step, we choose the TF-IDF method to extract text features and transform \({\text{Q}}\) into a vector space model (Melucci 2017) that can be understood by computers.

The text features extracted by TF-IDF are expressed as \({\text{W}}=({W_1},{W_2}, \ldots ,{W_j}, \ldots ,{W_m})\) corresponding with \({\text{Q}}\), where \({W_j}\) is the \(j\)-th text feature. Each \({W_j}\) can be expressed as \({W_j}=({W_{j1}},{W_{j2}}, \ldots ,{W_{ji}}, \ldots ,{W_{jr}})\), where \({W_{ji}}\) is the weight of feature item \(i\) in the \(j\)-th test question and \(r\) is the dimension of the text feature. The weight of a feature item \({W_{ji}}\) is calculated as:

$${W_{ji}}=T{F_{j,i}} \times ID{F_i}.$$
(1)

The \(T{F_{j,i}}\) denotes the frequency of feature item \(i\) in the text of the \(j\)-th test question, and \(ID{F_i}\) denotes the inverse document frequency of feature item \(i\), computed from the number of texts in which \(i\) appears. The weight grows with the term frequency and shrinks as the feature item appears in more texts.
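Formula (1) can be sketched in pure Python as follows. The exact IDF variant is an assumption on our part, since the paper does not fix it; here we take \(ID{F_i}=\log (m/d{f_i})\), with \(d{f_i}\) the number of question texts containing feature item \(i\).

```python
import math
from collections import Counter

def tfidf_features(docs):
    """Compute W_{ji} = TF_{j,i} * IDF_i (formula (1)) for a list of
    tokenized question texts. TF is the raw term count in one text;
    IDF_i = log(m / df_i) is an assumed variant."""
    m = len(docs)
    vocab = sorted({t for d in docs for t in d})
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    W = []
    for d in docs:
        tf = Counter(d)
        W.append([tf[t] * math.log(m / df[t]) for t in vocab])
    return W, vocab
```

A feature item occurring in every text gets weight 0, matching the intuition that ubiquitous words carry no discriminative information.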

3.1.3 Classified by classification model

The test questions \({\text{Q}}\) are digitized into multidimensional vectors after text feature extraction, so the text features \({\text{W}}\) extracted in the previous step can be processed by a classification model. Commonly used classification models include Rocchio (Moschitti 2003), Logistic Regression (Bertsimas and King 2017), Naive Bayes (Mccallum 1998), k-NN (Zhang et al. 2017) and SVM (Noble 2006).

Among them, k-NN is a lazy learning model that does not require actual training (Larose 2004). Its classification time is directly proportional to the number of stored samples, and k-NN is among the simplest of all machine learning algorithms, which accords with the needs of this paper. The data trained on or classified by the k-NN model are the text features \({\text{W}}\) obtained in the previous step. Before classification, the features \({\text{W}}\) are divided into training samples \({{\text{W}}_{train}}\) and samples to be classified \({{\text{W}}_{test}}\); the latter correspond to the test questions that would need to be manually classified into concepts by experts in traditional algorithms. Each question in \({{\text{W}}_{train}}\) and \({{\text{W}}_{test}}\) has a class label, and each class label represents a concept. For convenience, the concepts (that is, class labels) are represented as \({\text{C}}=({C_1},{C_2}, \ldots ,{C_x}, \ldots ,{C_k})\), where \(k\) is the number of concepts and \({C_x}\) is the \(x\)-th concept.
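A minimal k-NN classifier in the spirit of this step might look as follows. This is a sketch only: Sect. 4.2 uses an off-the-shelf toolkit with \(k=5\), and the Euclidean distance here is our assumption.

```python
import math
from collections import Counter

def knn_classify(W_train, labels, w, k=5):
    """Predict the concept label of feature vector w by majority vote
    among its k nearest training vectors (Euclidean distance);
    k=5 matches the default used in Sect. 4.2."""
    neighbors = sorted(
        (math.dist(v, w), label) for v, label in zip(W_train, labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]
```

Because k-NN only stores the training vectors, "training" reduces to keeping `W_train` and `labels` in memory, which is why the model suits the rapid-generation goal of this paper.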

3.1.4 Result evaluation and classification result output

The training step of the k-NN model includes storing the text features \({{\text{W}}_{train}}\) of the training samples and corresponding class labels. The k-NN model after training can be used to classify \({{\text{W}}_{test}}\). The main evaluation index of the classification model is accuracy. The higher the accuracy of the automatic classification by the k-NN model, the closer the results of the automatic classification by the k-NN model to the results classified by the experts manually.

For the convenience of the next phase of calculation, the results classified by the k-NN classification model are converted into a questions-concepts matrix \({\text{QC}}\), which is expressed as follows:

$${\text{QC}}=\left[ {\begin{array}{*{20}{c}} {q{c_{11}}}&{q{c_{12}}}& \cdots &{q{c_{1k}}} \\ {q{c_{21}}}&{q{c_{22}}}& \cdots &{q{c_{2k}}} \\ \vdots & \vdots & \ddots & \vdots \\ {q{c_{m1}}}&{q{c_{m2}}}& \cdots &{q{c_{mk}}} \end{array}} \right],$$

where \(q{c_{jx}}\) indicates whether test question \({Q_j}\) belongs to concept \({C_x}\), \(q{c_{jx}} \in \{ 0,1\}\), and \(m\) is the total number of test questions. \(q{c_{jx}}=1\) indicates that test question \({Q_j}\) belongs to concept \({C_x}\), and \(q{c_{jx}}=0\) indicates that it does not. The matrix \({\text{QC}}\) obtained by the k-NN model replaces the manual classification of test questions into concepts by experts and will be combined with association rules mining in the next phase to automatically generate the concept map.
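The conversion from predicted concept labels to the matrix \({\text{QC}}\) is mechanical; a sketch (the function name is ours):

```python
def build_qc(predicted_labels, concepts):
    """Build the m x k questions-concepts matrix QC from the k-NN
    output: QC[j][x] = 1 iff question Q_j was classified into C_x."""
    return [[1 if label == c else 0 for c in concepts]
            for label in predicted_labels]
```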

3.2 Association rules mining between concepts

Association rules were first proposed by Agrawal et al. (1993) to discover meaningful associations hidden in data. The associations discovered by association rules mining take the form of rules over frequent item sets, expressed as \({\text{A}} \to {\text{B}}\), where \({\text{A}}\) and \({\text{B}}\) are disjoint item sets (\({\text{A}} \cap {\text{B}}=\emptyset\)); \({\text{A}}\) is called the antecedent of the rule and \({\text{B}}\) the consequent. Apriori (Toivonen 2011) is one of the best-known algorithms for learning association rules. This paper adopts the improved Apriori association rules mining algorithm proposed by Chen and Sue (2013). Their algorithm relies on expert experience combined with association rule mining to generate concept maps, and they confirmed the correctness of the generated maps. We use their algorithm in the second phase of TA-ARM (association rules mining between concepts) and compare against the concept map generated using their algorithm alone to verify the feasibility of TA-ARM. As shown in Fig. 3, students' answer records are used to discover the association rules between test questions, the questions-concepts matrix \({\text{QC}}\) generated in the text analysis phase maps the associations between test questions to associations between concepts, and the concept map is ultimately generated automatically.

Fig. 3
figure 3

The process of association rules mining between concepts

Before mining association rules, the students' answer records need to be digitized into a grade matrix \({\text{G}}\), expressed as follows:

$${\text{G}}=\left[ {\begin{array}{*{20}{c}} {{g_{11}}}&{{g_{12}}}& \cdots &{{g_{1n}}} \\ {{g_{21}}}&{{g_{22}}}& \cdots &{{g_{2n}}} \\ \vdots & \vdots & \ddots & \vdots \\ {{g_{m1}}}&{{g_{m2}}}& \cdots &{{g_{mn}}} \end{array}} \right],$$

where \({g_{jy}} \in \{ 0,1\}\), \({g_{jy}}=1\) means that student \({S_y}\) correctly answered question \({Q_j}\), \({g_{jy}}=0\) means that student \({S_y}\) erroneously answered question \({Q_j}\), and \(n\) is the total number of students. In the next steps, association rule mining is performed on the grade matrix \({\text{G}}\) together with the questions-concepts matrix \({\text{QC}}\).

3.2.1 Answer records consistency analysis

In order to remove unnecessary associations between test questions, we introduce answer records consistency. Answer records consistency is computed from the XNOR values between every two rows of the grade matrix \({\text{G}}\): it counts the students who answered both of two test questions correctly or both incorrectly. When the XNOR value for a student is 0, the student answered exactly one of the two test questions correctly; otherwise the student answered both correctly or both incorrectly. So we have

$${\text{Count}}\;({Q_a},{Q_b})=\mathop \sum \limits_{{l=1}}^{n} ({g_{al}} \odot {g_{bl}}).$$
(2)

We use \({\text{Count}}\;({Q_a},{Q_b})\) to indicate the answer records consistency between test questions \({Q_a}\) and \({Q_b}\), and define the threshold for the minimum answer records consistency as \(Mi{n_{count}}=n \times 40\%\), where \(n\) is the number of students. This threshold is consistent with the value set in the referenced algorithm. When \({\text{Count}}\left( {{Q_a},{Q_b}} \right)<Mi{n_{count}}\), relatively few students answered both questions correctly or both incorrectly; the association between the two questions is weak and is not considered in subsequent calculations.

Through the filtering of the answer records consistency with the threshold \(Mi{n_{count}}\), the number of test questions to be calculated can be reduced, and the efficiency of generating concept maps can be improved.
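Formula (2) and the \(Mi{n_{count}}\) filter can be sketched as follows, assuming the grade matrix \({\text{G}}\) is stored as a list of question rows over \(n\) students (as in the matrix above); the function names are ours.

```python
def count_consistency(G, a, b):
    """Count(Q_a, Q_b) of formula (2): the number of students who
    answered questions a and b the same way (XNOR of the grade rows)."""
    return sum(1 for ga, gb in zip(G[a], G[b]) if ga == gb)

def consistent_pairs(G, min_ratio=0.4):
    """Keep only question pairs whose consistency reaches
    Min_count = n * 40% (the threshold of Sect. 3.2.1)."""
    n = len(G[0])
    m = len(G)
    return [(a, b) for a in range(m) for b in range(a + 1, m)
            if count_consistency(G, a, b) >= n * min_ratio]
```

Only the pairs returned by `consistent_pairs` proceed to the confidence computation, which is what reduces the number of questions considered in the next step.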

3.2.2 Test questions association rules mining

Based on the grade matrix \({\text{G}}\) and the Apriori (Toivonen 2011) algorithm, the association rules between test questions are mined. The following formula calculates the confidence of an association rule between test questions:

$${\text{Conf}}\;({Q_a} \to {Q_b})=\frac{{{\text{Sup}}\;({Q_a},{Q_b})}}{{{\text{Sup}}\;({Q_a})}}$$
(3)

The \({\text{Conf}}\;({Q_a} \to {Q_b})\) in (3) denotes the confidence of the association rule \({Q_a} \to {Q_b}\), \({\text{Sup}}\;({Q_a},{Q_b})\) denotes the joint support of test questions \({Q_a}\) and \({Q_b}\), and \({\text{Sup}}\;({Q_a})\) denotes the support of test question \({Q_a}\), where \({\text{Conf}}\;({Q_a} \to {Q_b}) \in [0,1]\).

In this step, four kinds of association rules between questions are considered: the student correctly answers \({Q_a}\) and then also correctly answers \({Q_b}\); the student correctly answers \({Q_b}\) and then also correctly answers \({Q_a}\); the student erroneously answers \({Q_a}\) and then also erroneously answers \({Q_b}\); and the student erroneously answers \({Q_b}\) and then also erroneously answers \({Q_a}\). These four kinds of rules fall into two types: correctly answered to correctly answered, and erroneously answered to erroneously answered. The confidence of each type of association rule is calculated respectively.

In order to remove unnecessary associations between test questions, \(Mi{n_{Conf}}\) is set as the confidence threshold between test questions. The greater the confidence between two test questions, the closer their association and the greater the possibility that they are related; conversely, the lower the confidence, the less likely they are associated.
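Formula (3) for the two rule types can be sketched directly from the grade matrix. In this sketch supports are counted as raw student counts, which cancel in the ratio; the function name is ours.

```python
def confidence(G, a, b, correct=True):
    """Conf(Q_a -> Q_b) of formula (3). correct=True handles the
    correctly-to-correctly-answered type, correct=False the
    erroneously-to-erroneously-answered type."""
    v = 1 if correct else 0
    sup_a = sum(1 for g in G[a] if g == v)
    sup_ab = sum(1 for ga, gb in zip(G[a], G[b]) if ga == v and gb == v)
    return sup_ab / sup_a if sup_a else 0.0
```

Rules whose confidence falls below \(Mi{n_{Conf}}\) would then simply be discarded before the mapping step.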

3.2.3 New questions-concepts matrix construction

Construct a new questions-concepts matrix \({\text{QC}^{\prime}}\) from the questions-concepts matrix \({\text{QC}}\) obtained from the test questions text analysis phase using the following formula:

$$q{c^{\prime}_{jx}}=\frac{{q{c_{jx}}}}{{\mathop \sum \nolimits_{{h=1}}^{m} q{c_{hx}}}},$$
(4)

where \(q{c^{\prime}_{jx}}\) is the value in the new questions-concepts matrix \({\text{QC}^{\prime}}\) whose position corresponds to \(q{c_{jx}}\) in the questions-concepts matrix \({\text{QC}}\) obtained from the test questions text analysis phase, and \(\sum\nolimits_{{h=1}}^{m} {q{c_{hx}}}\) is the total degree of relevance of concept \({C_x}\) in all test questions.

In the next step, the new questions-concepts matrix \({\text{QC}^{\prime}}\) will be used instead of the original questions-concepts matrix \({\text{QC}}\) to participate in the calculation.
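Formula (4) amounts to normalizing each column of \({\text{QC}}\) by its column sum; a sketch (the function name is ours):

```python
def normalize_qc(QC):
    """Formula (4): qc'_{jx} = qc_{jx} / (column sum for concept C_x),
    spreading each concept's weight evenly over its questions."""
    m, k = len(QC), len(QC[0])
    col_sum = [sum(QC[j][x] for j in range(m)) for x in range(k)]
    return [[QC[j][x] / col_sum[x] if col_sum[x] else 0.0
             for x in range(k)] for j in range(m)]
```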

3.2.4 Concepts association rules mining

The relevant degree of concepts expresses the association strength between two concepts and can be embodied as the corresponding association rule between two concepts. Combining the new question-concept matrix \({\text{QC}^{\prime}}\), use the following formula to map the relevant degree of test questions to the relevant degree of concepts:

$${\text{Rev}}\;{({C_u},{C_v})_{{Q_a} \to {Q_b}}}=q{c_{au}} \times q{c_{bv}} \times {\text{Conf}}\;({Q_a} \to {Q_b}).$$
(5)

The \({\text{Rev}}\;{({C_u},{C_v})_{{Q_a} \to {Q_b}}}\) denotes the relevant degree of concepts \({C_u}\) and \({C_v}\) derived from the association rule \({Q_a} \to {Q_b}\), where \({\text{Rev}}\;{({C_u},{C_v})_{{Q_a} \to {Q_b}}} \in [0,1]\).

Define the threshold of relevant degree \(\mu =\mathop {{\text{Min}}}\limits_{{1 \leq j \leq m,1 \leq x \leq k\;{\text{and}}\;q{c_{jx}}>0}} q{c_{jx}}\), where \(\mu \in [0,1]\), and express the total number of test questions included in concepts \({C_u}\) and \({C_v}\) as \({\varepsilon _{uv}}\). If \({\varepsilon _{uv}}<m \times 50\%\), then \({\text{Rev}}\;{({C_u},{C_v})_{{Q_a} \to {Q_b}}}\) is preserved even if it is smaller than \(\mu\). The setting of \({\varepsilon _{uv}}\) is consistent with the referenced algorithm.
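Formula (5) maps each surviving question-level rule to a concept-level relevant degree; a sketch follows. We assume the normalized matrix \({\text{QC}^{\prime}}\) is used (as Sect. 3.2.3 states it replaces \({\text{QC}}\)) and keep the strongest value per ordered concept pair, anticipating the retention rule of Sect. 3.2.5; the names are ours.

```python
def concept_relevance(QCp, rules):
    """Rev(C_u, C_v) = qc'_{au} * qc'_{bv} * Conf(Q_a -> Q_b), formula (5).

    `rules` is a list of (a, b, conf) triples that passed Min_conf;
    the strongest relevant degree per ordered concept pair is kept.
    """
    k = len(QCp[0])
    rev = {}
    for a, b, conf in rules:
        for u in range(k):
            for v in range(k):
                r = QCp[a][u] * QCp[b][v] * conf
                if r > 0.0:
                    rev[(u, v)] = max(rev.get((u, v), 0.0), r)
    return rev
```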

3.2.5 Concept map generation

Two types of association rules were considered in the previous steps: correctly answered to correctly answered, and erroneously answered to erroneously answered. Suppose a preserved concept association \({\text{Rev}}\;{({C_u},{C_v})_{{Q_a} \to {Q_b}}}\) comes from an erroneously-answered-to-erroneously-answered rule \({Q_a} \to {Q_b}\); then a directed arrow is set from \({C_u}\) to \({C_v}\), and the weight of the edge is the relevant degree between \({C_u}\) and \({C_v}\). Similarly, if the preserved association comes from a correctly-answered-to-correctly-answered rule \({Q_a} \to {Q_b}\), then by the logical equivalence \({Q_a} \to {Q_b}=\sim {Q_b} \to \sim {Q_a}\), a directed arrow is set from \({C_v}\) to \({C_u}\), with the relevant degree between \({C_v}\) and \({C_u}\) as its weight. All association rules between concepts are thus standardized as one concept that should be mastered prior to another, with the relevant degree as the strength of the association. To prevent the associations between concepts from being too complex or too weak, associations with relevant degree less than 0.1 are deleted.

According to the association rules of the concepts obtained in the above steps, the associations between concepts are generated. If there is more than one association between two concepts, the one with the greatest relevant degree is retained. The association rules between concepts are then visualized using an automatic drawing tool, ultimately generating the concept map.
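The edge construction and hand-off to a drawing tool might be sketched as follows. The Graphviz DOT output is our assumption, since the paper does not name its drawing tool; the 0.1 cutoff and strongest-edge retention follow the text above.

```python
def concept_map_edges(rev, min_rev=0.1):
    """Keep directed concept edges whose relevant degree reaches 0.1,
    retaining the strongest association per ordered pair."""
    edges = {}
    for (u, v), r in rev.items():
        if r >= min_rev and r > edges.get((u, v), 0.0):
            edges[(u, v)] = r
    return edges

def to_dot(edges, names):
    """Emit a Graphviz DOT description of the concept map, with the
    relevant degree as the edge label."""
    lines = ["digraph ConceptMap {"]
    for (u, v), r in sorted(edges.items()):
        lines.append(f'  "{names[u]}" -> "{names[v]}" [label="{r:.2f}"];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting DOT text can be rendered by any Graphviz-compatible tool to obtain the final concept map.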

The pseudo-code of the TA-ARM algorithm is as follows:

figure a

3.3 Algorithm complexity analysis

The TA-ARM concept map automatic generation algorithm includes two phases. The k-NN algorithm is mainly used in the test questions text analysis phase; its time complexity is \({\text{O}}(m)\), where \(m\) is the number of test questions. The Apriori algorithm is used in the association rules mining phase. Since only the frequent 2-itemsets of test questions and concepts are considered, the time complexity of this phase does not exceed \({\text{O}}({n^2})\).

4 Experiment and result analysis

4.1 Data sources and experimental environment

In order to verify the feasibility and effectiveness of the TA-ARM algorithm, this paper selects 617,940 authentic answer records from 6866 students in a large-scale examination of Computer Culture Foundation as the experimental dataset, including 90 test questions involving 9 concepts. The dataset was collected from the specialized subject undergraduate entrance simulation exam of a province in China in December 2017. We also collected 3001 test questions with conceptual labels from many other college examinations as training samples. The distribution of the test questions in the training samples is shown in Fig. 4.

Fig. 4
figure 4

The distribution of the test questions in the training samples

The experimental operating environment is the Windows 10 operating system. The programming language is Python 3.6, and the software development environment is PyCharm Community Edition 2018 and SQL Server 2008.

4.2 Experiment of test questions text analysis

The test questions include three types: multiple-choice questions, true or false questions and cloze questions. Before the text analysis phase, the options of a multiple-choice question are combined with its stem as the entire question text, the stem of a true or false question is used as the question text, and the correct answer of a cloze question is incorporated into its stem as the question text. Each question has a concept label. The actual contents of the concepts corresponding to the concept labels are shown in Table 2. In the following steps, the questions referred to are all processed according to their question types, and all concepts are represented by concept labels.

Table 2 The actual contents of concepts corresponding to concept labels

Since there are no obvious separators in Chinese test questions, it is necessary to segment the questions and filter stop words. In the word segmentation step, the open source tool Jieba is used; in the stop words filtering step, the mainstream Harbin Institute of Technology Chinese stop words list is used.

Text features \({\text{W}}\) are extracted using formula (1), and each text feature \({W_j}\) is a 4246-dimensional vector. Before text classification, the features \({\text{W}}\) are divided into 3001 training samples \({{\text{W}}_{train}}\) and 90 samples to be classified \({{\text{W}}_{test}}\). The training samples \({{\text{W}}_{train}}\) are then used to train the k-NN model, that is, their feature vectors and concept labels are stored.

After the k-NN classification model stores \({{\text{W}}_{train}}\), it classifies the samples \({{\text{W}}_{test}}\). The model selects the 5 nearest neighbors when classifying, which is the default value of the toolkit we use. The results are converted into a questions-concepts matrix \({\text{QC}}\), where the abscissa represents the test questions and the ordinate represents the concepts, shown as follows:

\({\text{QC}}=\left[ {\begin{array}{*{20}{c}} 1&0&0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0&0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0&0&0&0&0&0&1&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 1&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&1&0&0 \end{array}} \right].\)

The classification report is shown in Table 3.

Table 3 The classification report of the k-NN model

4.3 Association rules mining between concepts

Before mining association rules, the students' answer records are preprocessed into the grade matrix \({\text{G}}\), where the abscissa indicates the test questions and the ordinate indicates the students, shown as follows:

\(G=\left[ {\begin{array}{*{20}{c}} 1&0&0&0&0&0&1&1&0&1&1&0& \cdots &1 \\ 0&1&1&1&1&1&0&1&1&1&1&0& \cdots &0 \\ 0&1&0&1&0&0&0&0&1&1&1&1& \cdots &1 \\ 1&1&1&0&0&1&0&1&1&1&1&0& \cdots &1 \\ 0&1&0&1&1&1&1&1&1&1&1&0& \cdots &1 \\ 1&1&0&0&1&1&1&1&1&1&1&1& \cdots &0 \\ 0&0&0&1&0&1&1&0&1&1&0&0& \cdots &1 \\ 0&1&1&1&1&0&1&1&1&1&1&1& \cdots &1 \\ 1&1&1&1&1&1&1&1&1&1&0&0& \cdots &1 \\ 1&1&1&1&1&1&1&1&1&1&1&1& \cdots &1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1&1&1&1&1&1&1&1&1&1&0&1& \cdots &1 \end{array}} \right]\)

Based on formula (2) and the grade matrix \({\text{G}}\), we can calculate the answer record consistency value \({\text{Count}}\;({Q_a},{Q_b})\) between every two questions \({Q_a}\) and \({Q_b}\). Because the number of students is \(n=6866\), the threshold is \(Mi{n_{count}}=n \times 40\% =2746.4\), and we only consider association rules between test questions that satisfy \({\text{Count}}\;({Q_a},{Q_b}) \geq 2746.4\). After this step, 3660 associations between test questions are retained.
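The consistency filter can be sketched as follows, under the assumption that formula (2) counts the students whose answer result (correct or incorrect) is identical on both questions; the toy grade matrix is illustrative:

```python
import numpy as np

def count_consistency(G, a, b):
    """Number of students with the same answer result (both correct or
    both incorrect) on questions a and b.  This reading of formula (2)
    is an assumption; the paper's exact definition is not reproduced here."""
    return int(np.sum(G[:, a] == G[:, b]))

# Toy grade matrix: 5 students x 3 questions.
G = np.array([[1, 1, 0],
              [0, 0, 1],
              [1, 0, 1],
              [1, 1, 1],
              [0, 0, 0]])

n = G.shape[0]
min_count = n * 0.40   # the paper uses 40% of 6866 students -> 2746.4
kept = [(a, b) for a in range(3) for b in range(a + 1, 3)
        if count_consistency(G, a, b) >= min_count]
```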

Next, calculate the confidence of the two types of association rules between test questions; after filtering by the answer record consistency threshold, each association yields four kinds of association rules between every two questions. To facilitate comparison with the concept map generated with expert manual assistance, we set the confidence threshold between test questions to \(Mi{n_{conf}}=0.75\), which is consistent with the parameters of the algorithm proposed by Chen et al. Association rules between test questions that satisfy \({\text{Conf}}\;({Q_a} \to {Q_b})<0.75\) are deleted. After this step, 3758 association rules between test questions from correctly answered to correctly answered and 212 association rules from erroneously answered to erroneously answered are retained.
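The confidence computation can be sketched as follows, assuming the standard association-rule definition: among students with a given answer result on \({Q_a}\), the fraction with the same result on \({Q_b}\). The grade matrix here is a toy example:

```python
import numpy as np

def confidence(G, a, b, result=1):
    """Conf(Qa -> Qb) for a given answer result (1 = correctly answered,
    0 = erroneously answered).  The standard support/confidence reading
    used here is an assumption about the paper's formula."""
    antecedent = G[:, a] == result
    if antecedent.sum() == 0:
        return 0.0
    return float(np.sum(antecedent & (G[:, b] == result)) / antecedent.sum())

# Toy grade matrix: 4 students x 2 questions.
G = np.array([[1, 1],
              [1, 1],
              [1, 0],
              [0, 0]])

min_conf = 0.75  # threshold used in the paper, matching Chen et al.
# Rule Q0 -> Q1 (correct -> correct): 2 of the 3 students who answered
# Q0 correctly also answered Q1 correctly, so the rule is deleted.
c = confidence(G, 0, 1)
```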

Construct a new questions-concepts matrix \({\text{QC'}}\) from the questions-concepts matrix \({\text{QC}}\) obtained in the test question text analysis phase using formula (4). In the next step, the new questions-concepts matrix \({\text{QC'}}\) is used instead of the original matrix \({\text{QC}}\) in the calculation, shown as follows:

\({\text{QC'}}=\left[ {\begin{array}{*{20}{c}} {0.077}&0&0&0&0&0&0&0&0 \\ {0.077}&0&0&0&0&0&0&0&0 \\ {0.077}&0&0&0&0&0&0&0&0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0&0&0&0&0&0&{0.083}&0&0 \\ 0&{0.1}&0&0&0&0&0&0&0 \\ 0&0&0&0&{0.091}&0&0&0&0 \\ {0.077}&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&{0.083}&0&0 \end{array}} \right]\)
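A sketch of one plausible reading of formula (4), which is not reproduced in this excerpt: each nonzero entry of \({\text{QC}}\) is replaced by the reciprocal of the number of questions classified into that concept, consistent with the displayed values (e.g. \(0.077 \approx 1/13\), \(0.1=1/10\)):

```python
import numpy as np

# Toy questions-concepts matrix: 5 questions x 2 concepts.
QC = np.array([[1, 0],
               [1, 0],
               [0, 1],
               [1, 0],
               [0, 1]])

# Assumed reading of formula (4): each nonzero entry becomes the
# reciprocal of the number of questions labelled with that concept.
counts = QC.sum(axis=0)                        # questions per concept
QC_prime = QC / np.where(counts == 0, 1, counts)
```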

Combining the new questions-concepts matrix \({\text{QC'}}\), use formula (5) to map the relevant degree between test questions to the relevant degree between concepts. Calculate the relevant degree threshold \({{\varvec{\upmu}}}=1\) and the total number of test questions contained in concept \({C_u}\) and concept \({C_v}\), denoted \({\varepsilon _{uv}}\). Since the number of test questions is 90, the associations between concepts that satisfy \({\varepsilon _{uv}} \geq 90 \times 50\%\) and \({\text{Rev}}{({C_u},{C_v})_{{Q_a} \to {Q_b}}}<1\) are deleted. After this step, 3448 associations between concepts are retained.

If an association between concepts is from correctly answered to correctly answered, it is converted to the erroneously answered to erroneously answered form using the logical equality formula. Then only the associations between concepts with the highest relevant degree, and with a relevant degree greater than 0.1, are kept. After this step, 19 association rules between concepts are retained.
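The conversion via logical equality is the contrapositive: "\({C_u}\) correctly answered implies \({C_v}\) correctly answered" is equivalent to "\({C_v}\) erroneously answered implies \({C_u}\) erroneously answered". A sketch with an illustrative rule layout `(source, target, relevant_degree)`:

```python
def to_error_rule(rule):
    """Convert a correct->correct rule (Cu -> Cv) into its logically
    equivalent erroneous->erroneous form via the contrapositive,
    reversing the direction of the association."""
    src, dst, degree = rule
    return (dst, src, degree)

# Hypothetical correct->correct associations between concepts.
rules = [("C1", "C2", 0.25), ("C1", "C3", 0.08), ("C4", "C2", 0.12)]
err_rules = [to_error_rule(r) for r in rules]

# Keep only rules whose relevant degree exceeds the 0.1 threshold.
kept = [r for r in err_rules if r[2] > 0.1]
```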

Finally, visualize the association rules between concepts using automatic drawing tools and generate the concept map. The association rules between concepts obtained by the TA-ARM algorithm are shown in Table 4 and the concept map is shown in Fig. 5a.

Table 4 Association rules between concepts obtained by the TA-ARM algorithm
Fig. 5 Concept maps generated by the TA-ARM algorithm (a) and the algorithm proposed by Chen et al. (b)

To verify the feasibility and effectiveness of the TA-ARM algorithm, we use the same datasets, parameters and thresholds as in the experiment above and compare with the results obtained by the algorithm proposed by Chen and Sue (2013). In that algorithm, experts first manually classify the questions into concepts, and the association rule mining method is then combined to generate the concept map. The results obtained by the algorithm proposed by Chen et al. are shown in Table 5 and Fig. 5b.

Table 5 Association rules between concepts obtained by the algorithm proposed by Chen and Sue (2013)

Comparison shows that the concept map generated by the TA-ARM algorithm has four more associations than the concept map generated by the algorithm proposed by Chen et al. However, the relevant degrees of these four associations are all less than 0.11 and thus relatively weak, and the other association directions and relevant degrees of the two concept maps are identical. In this experiment, the TA-ARM algorithm required no more than 10 s to generate a concept map, far less than the time needed when experts first participate in the classification and the concept map is then generated. The feasibility and effectiveness of the TA-ARM algorithm are thus verified.

5 Conclusions

To address the heavy reliance on experts and the time consumption of current concept map generation algorithms, this paper proposes TA-ARM, a new automatic concept map generation algorithm based on text analysis and association rule mining. TA-ARM first uses the text classification method from text analysis technology to classify test questions into concepts, replacing the process of manual classification by experts, and then combines the association rule mining method of current concept map generation algorithms to realize the automatic generation of concept maps.

The experiment shows that the TA-ARM algorithm has the following characteristics: (1) low reliance on expert experience; (2) high-quality concept maps with low time consumption; (3) the concept map can be dynamically adjusted based on parameters such as the confidence threshold between test questions. The concept map generated by the TA-ARM algorithm shows the directions and relevant degrees of associations between concepts. It reveals the structure among concepts and, as a knowledge visualization tool, provides guidance for teaching.

Although the algorithm performs well in the automatic generation of concept maps, it also has some limitations: (1) it only considers the case where one test question belongs to a single concept, not the case where one test question belongs to multiple concepts; (2) when the number of classes, that is, the number of concepts, increases, the concept map generated by the TA-ARM algorithm may differ significantly from the concept map generated after expert classification. In short, the TA-ARM algorithm is not suitable for the multi-class, multi-label case. In the future, we will conduct in-depth research on these aspects.