Introduction

Mathematics and science have long been perceived to be fundamental and essential subjects in the K-12 curriculum (McConney & Perry, 2010). Recently, since students’ competencies in STEM (i.e. science, technology, engineering, and mathematics) have received considerable attention in many countries (English & King, 2019), mathematics and science are becoming much more important. Previous research, however, has shown that many students still have difficulties in learning mathematics and science and are likely to stop learning mathematics early in their schooling (e.g. Hodgen, Küchemann, Brown, & Coe, 2009; Scarpello, 2007). In addition, although researchers largely agree that teachers should take into consideration students’ in-the-moment thinking for effective and meaningful instruction (National Council of Teachers of Mathematics [NCTM], 2014; National Research Council [NRC], 2012), teachers have trouble building their instruction upon students’ ideas and understanding (Jacobs, Lamb, & Philipp, 2010; Sánchez-Matamoros, Fernández, & Llinares, 2015). Therefore, there is a need for helping students and teachers deal with these issues in their learning and teaching.

Over the last few decades, the advancement of technological tools in education has contributed to the fields of mathematics and science education by addressing educational issues and assisting teachers and students in promoting their teaching and learning in effective and efficient ways. Educational data mining (EDM) can be considered an innovative tool or technique to provide new insights into teaching and learning. EDM is generally concerned with “developing, researching, and applying computerized methods to detect patterns in large collections of educational data that would otherwise be hard or impossible to analyze due to the enormous volume of data within which they exist” (Romero & Ventura, 2013, p. 12). Martin et al. (2015) argued that EDM techniques shed light on research in science education that has traditionally been difficult to perform (e.g. creating real-time assessment of students’ science learning and uncovering the nature of student engagement and its long-term impact on students’ choice of college major) because traditional data analysis methods were not equipped to cope with large amounts of unstructured data. Researchers have also used EDM approaches to develop an intelligent tutoring system that may support students’ personalized learning of mathematics (Cai et al., 2019) and science and mathematics teachers’ management of multiple groups (Schwarz et al., 2018; Tissenbaum & Slotta, 2019) by analyzing real-time data. In addition, large volumes of data collected from people (e.g. responses to the Trends in International Mathematics and Science Study [TIMSS] and the Programme for International Student Assessment [PISA] questionnaires) and generated from online teaching and learning (e.g. massive open online courses [MOOCs]) have been analyzed to detect hidden but significant phenomena and information in mathematics and science education (e.g. Depren, 2018; Lee, 2019; She, Lin, & Huang, 2019).

With the recognition of the importance of EDM in the fields of education, researchers have conducted comprehensive review studies in various educational fields (e.g. Papamitsiou & Economides, 2014; Rodrigues, Isotani, & Zárate, 2018; Romero & Ventura, 2010; Saa, Al-Emran, & Shaalan, 2019; Shahiri, Husain, & Rashid, 2015). These studies, however, do not take into account the association between research topics and the EDM techniques to explore the topics. Rather, the previous studies tended to categorize each article reviewed according to research topics and EDM techniques independently. In a review of EDM in the context of higher education, Aldowah, Al-Samarraie, and Fauzy (2019) considered the potential relationship between different educational problems and EDM techniques used to address the problems. However, Aldowah et al. did not include all journals on higher education to identify articles relevant to their review even though they used such different publication sources as Scopus, Web of Science, and Science Direct. Furthermore, none of the studies has investigated how EDM has been used to support teaching and learning of mathematics and science. Given the complexities of teaching and learning of science/mathematics and their corresponding educational issues, EDM techniques could supply meaningful support for teachers and students. Therefore, a systematic review of the EDM studies used in science and mathematics education might provide important insights into improving the quality of science/mathematics education.

The Significance of the Study

Although science and mathematics are regarded as crucial subjects in school curricula around the world, not only do students struggle to learn fundamental concepts in mathematics and science, but teachers also have trouble teaching such concepts meaningfully. In recent years, a growing body of research has focused on EDM techniques to promote teaching and learning of science and mathematics (e.g. Cai et al., 2019; Martin et al., 2015). However, few literature reviews on this field have been systematically conducted to understand how such innovative techniques as EDM could be used to support science and mathematics education. To help researchers and teacher educators in science and mathematics education follow the trends in the EDM studies, integrate EDM techniques into their research, and design effective teaching and learning environments, it is important to review prior research studies on EDM comprehensively in the contexts of science and mathematics education.

Previous Reviews on Educational Data Mining

Researchers have conducted comprehensive and systematic review studies on EDM from different perspectives. Cristobal Romero and Sebastian Ventura are two prominent scholars who conducted comprehensive reviews on EDM techniques and their applications to educational systems. Reviewing EDM studies from 1995 to 2005, Romero and Ventura (2007) demonstrated that DM can be used in different educational systems (e.g. traditional classrooms, web-based courses, learning content management systems, and adaptive and intelligent web-based learning systems) and that DM techniques such as clustering, classification, outlier detection, association rule mining, sequential pattern mining, and text mining were usually used in educational research. In 2010, they extended their review by analyzing 300 EDM works up to the year 2009 (Romero & Ventura, 2010). They found that researchers frequently used EDM to provide feedback for supporting instructors, offer recommendations for students, predict students’ performance, and analyze and visualize data. Developing a model of students’ skills and knowledge, detecting undesirable student behaviors, and grouping students according to their personal characteristics and features were other areas of interest that attracted many researchers’ attention (Romero & Ventura, 2010). A review of empirical studies on EDM from 2008 to 2013 revealed that prior studies on EDM aimed to identify students’ learning behaviors, predict students’ performance, increase teachers’ awareness of student learning, predict students’ dropout and retention, improve feedback and assessment systems, and recommend educational resources from data pools (Papamitsiou & Economides, 2014).

Early reviews of EDM attempted to conduct a literature review covering EDM in general. More recent review studies on EDM have tended to focus on a particular educational context or a DM technique. Shahiri et al. (2015) and Saa et al. (2019) conducted systematic reviews on predicting students’ academic performance and on identifying important factors (or attributes) affecting students’ performance using EDM. Shahiri et al. found, perhaps not surprisingly, that students’ cumulative grade point average is the most significant attribute to determine whether a student can complete a course. Internal assessments such as students’ assignment grades, quiz and test scores, lab work, and attendance were also important attributes in predicting student performance. Two main EDM approaches to predict students’ performance were classification and clustering (Saa et al., 2019).

The literature review of EDM by Rodrigues et al. (2018) focused on the process of evaluation in the context of e-learning. Rodrigues et al. identified main perspectives and trends in EDM: evaluation in traditional classrooms, evaluation of instructional actions of teachers, evaluation of administrative management, and evaluation of multimedia resources. In a comprehensive and systematic review of EDM in the context of students’ learning in higher education, Aldowah et al. (2019) found that most of the EDM papers focused on computer-supported predictive analytics (e.g. predicting students’ performance and drop/retention in a course) followed by computer-supported learning analytics (e.g. deriving information based on students’ interactions). In terms of EDM techniques, classification and clustering were the two most commonly used approaches, which is in line with findings by Saa et al. (2019).

As observed, the previous literature reviews on EDM have provided insight into DM techniques/tools and their application to educational systems in general. Although some of them sought to highlight the application of EDM to specific educational contexts (e.g. predicting students’ achievement and evaluating students’ learning in e-learning), they did not consider how EDM has been applied to teaching and learning in such particular disciplines as mathematics and science. In addition, although the prior studies reviewed main topics of EDM and DM techniques systematically, they were unlikely to consider the association between the research topics and DM techniques. In other words, there seems to be a lack of evidence showing how EDM can shed light on solving educational problems in these disciplines. The aim of this study is to provide a review of the use of EDM in the context of mathematics and science education with the objective of helping researchers and educators address their educational issues using EDM.

Methodology

The current study is a descriptive content analysis aimed at identifying and describing general issues and research trends in the integration of EDM into mathematics and science education (Çalık & Sözbilir, 2014). We used Kitchenham and Charters’ (2007) systematic literature review as a methodology to “summari[z]e all existing information about some phenomenon in a thorough and unbiased manner” (p. 7). Following their guidelines, our review consisted of three phases: a planning phase (specification of research questions and development of a review protocol), a conducting phase (research identification, study selection, study quality assessment, and data extraction), and a reporting phase.

Research Questions

The research questions that guided this study were the following: (a) What are the topics of research studies that used EDM in mathematics and science education? (b) What types of DM techniques have been used to achieve the objectives in mathematics and science education?

Identification of Research

The second step in this systematic literature review was to collect relevant research studies with regard to the research questions. With this objective, we determined the article pool and the key search strings. We used SCImago Journal Rank (SJR) as a starting point to identify appropriate and quality educational journals in mathematics and science. We selected “education” under the subject category, while other options were maintained as default (i.e. all subject areas, all regions/countries, and all types). This search resulted in 1222 journals. We then selected journals related to mathematics and science education. Journals related to computer education (e.g. Computers & Education) were also selected to identify research papers that used DM techniques in the teaching and learning of mathematics/science. Fifty-eight educational journals were chosen in total (17 mathematics education journals, 24 science education journals, 7 mathematics and science education journals, 10 computer education journals). A list of the journals considered is provided in the Supplementary Material. Furthermore, we used Web of Science, a website which provides access to multiple databases in many different disciplines, to identify EDM articles related to mathematics and science education.

The search string we used in mathematics and/or science education journals was “data mining” OR “machine learning” OR “learning analytics,” while we used a more specific search string in computer education journals, [“data mining” OR “machine learning” OR “learning analytics”] AND [“mathematics” OR “science”], to find exclusively EDM papers related to mathematics and science. In the Web of Science, we used the search string [“data mining” OR “machine learning” OR “learning analytics”] AND [“mathematics” OR “science”] AND “education.” The period of the search was set from 2010 to 2019 (June). Any paper published after July 2019 was not included in this review. The initial stage of the search yielded 847 papers (630 from the SJR and 217 from the Web of Science).

Study Selection

To select studies relevant to our research questions, we used the following inclusion and exclusion criteria, adapted from prior studies on EDM (e.g. Aldowah et al., 2019; Papamitsiou & Economides, 2014). We did not restrict our study selection to journals published in a specific country (see Table 1).

Table 1 Inclusion and exclusion criteria

Based on these criteria, the authors determined independently whether or not a paper would be included in the review. The inter-rater reliability between the authors was good (Cohen’s kappa statistic 0.76). We discussed any discrepancies until consensus on paper selection was reached. Günel, Polat, and Kurt (2016), for example, used different DM techniques to extract learning concepts from calculus, abstract algebra, and computer science textbooks. Although Günel et al. used DM techniques to analyze mathematics textbooks at the college level, we decided to exclude this article from our analysis because it focused more on comparing different DM algorithms and their accuracy in detecting learning concepts than on providing implications for teaching and learning of mathematics. Filiz and Oz (2019) also compared different DM algorithms and their accuracy in predicting students’ science performance, but we decided to include their article in our analysis because Filiz and Oz also used DM techniques to identify important factors influencing students’ science performance. Finally, 64 articles were included in this review. Figure 1 shows the entire process of selecting the articles for this review. In the process of selecting studies for review, many articles were excluded. We randomly selected several articles that we excluded in the process of abstract review and found that these articles tended to include the search string (e.g. “data mining” and “education”) somewhere in their text and references at least once but had nothing to do with our research questions.
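To make the agreement statistic reported above concrete, the following is a minimal sketch of how Cohen's kappa can be computed for two raters' include/exclude decisions. The function name `cohen_kappa` and the ten decisions are hypothetical illustrations; they are not the authors' data or code, and the sketch does not reproduce the reported value of 0.76.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical include/exclude decisions for ten papers (not the study's data).
author_1 = ["in", "in", "out", "out", "in", "out", "out", "in", "out", "out"]
author_2 = ["in", "in", "out", "out", "out", "out", "out", "in", "out", "in"]
kappa = cohen_kappa(author_1, author_2)
print(round(kappa, 2))
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it is lower than the simple proportion of matching decisions.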

Fig. 1 Paper selection processes

Data Extraction and Analysis

A data extraction form was designed to accurately record the information from the 64 articles we reviewed. We used an Excel spreadsheet to record the following information: authors, year, title, subject, data source, research objective, DM technique, results, and implications.

We began the analysis by categorizing each article according to research topic and DM technique using a coding scheme developed from Papamitsiou and Economides (2014). While coding the research topics and DM techniques, we noticed that some of the articles reviewed in the current study did not fit the coding scheme. For example, one of the codes used in Papamitsiou and Economides’ review was prediction of dropout and retention. The current review, however, was not able to identify articles that used EDM to predict students’ dropout and retention. Rather, many articles in mathematics and science education tended to use EDM to examine informative factors in predicting students’ achievements in mathematics and science. Masci, Johnes, and Agasisti (2018), for instance, investigated student- and school-level factors affecting students’ PISA test scores in 9 different countries. Because we were unable to categorize Masci et al.’s article based on Papamitsiou and Economides’ coding scheme, we created a new code, identification of factors affecting performance, under the category of research topic.

Finally, we categorized the research topics for all articles into 7 major dimensions: student modeling, automated assessment, identification of factors affecting performance, performance prediction, teacher support, student support, and document analysis. We also classified the articles according to the DM techniques that the articles used. In this process, 5 different categories were found: text mining, clustering, classification, social network analysis, and relationship mining. We note that some articles were classified into more than one category due to some similarities between categories, especially when the articles addressed more than two research questions. As illustrated above, Filiz and Oz (2019) used DM methods not only to predict students’ science performance but also to identify factors affecting their performance; therefore, we assigned their article two codes: performance prediction and identification of factors affecting performance. Aiken, Henderson, and Caballero (2019) examined students’ course-taking patterns (coded as student modeling) as well as students’ academic and personal features affecting their academic success (coded as identification of factors affecting performance). In these cases, we were unable to use a single code to categorize the articles.

Results

This section summarizes and synthesizes literature on EDM conducted in the context of mathematics and science education. In general, previous studies were carried out much more frequently in the context of science (41 of 64 articles) than in that of mathematics (17 of 64 articles). Six studies were conducted in both mathematics and science (e.g. STEM). While 43 articles used DM algorithms to analyze educational data, 21 articles used DM techniques to develop intelligent tutoring systems.

Research Topics

Table 2 shows that most of the reviewed papers focused on student modeling, followed by identification of factors affecting student performance and by automated assessment. A relatively small number of papers aimed to predict students’ academic performance, support teachers’ instruction, support students’ learning, and analyze educational documents.

Table 2 Frequencies of articles reviewed according to research topic

Student Modeling

Using DM to model student behavior and thinking has drawn attention from many researchers. In particular, this topic was studied more often in science education than in mathematics education. The student modeling category was further divided into two sub-categories: modeling students’ behavior and learning patterns and modeling students’ thinking.

Modeling Students’ Behavior and Learning Patterns.

Researchers have sought to identify students’ behaviors within online learning environments and to identify learning patterns, such as carelessness in scientific inquiry (Hershkovitz, de Baker, Gobert, Wixon, & Pedro, 2013), self-regulated learning (Kim, Yoon, Jo, & Branch, 2018), interaction among students in a MOOC (Tawfik et al., 2017), assignment submission (Lee, 2019), and interaction in online learning platforms (Hossain et al., 2018; Kinnebrew et al., 2016; Malmberg, Järvenoja, & Järvelä, 2013). Martin et al. (2015) engaged third- and fifth-grade students in an online fraction game in which students used repeated equipartitioning to create a particular combination of fractions (e.g. to create both 1/6 and 1/9, a student may split the whole in thirds and then split one of the thirds into halves [i.e. 1/6] and another one of the thirds into thirds [i.e. 1/9]). Using a clustering algorithm, Martin et al. classified the students’ splitting behaviors into several categories and examined how students in each category learned fractions over time. Other researchers have sought to identify undergraduate students’ course-taking patterns (Aiken et al., 2019; Howard, Meehan, & Parnell, 2018; Wang, 2016).

Modeling Students’ Thinking.

DM has been used to investigate students’ cognitive structures. Lamb and his colleagues used artificial neural networks to explore students’ cognition rather than their content knowledge (Lamb, Annetta, Vallett, & Sadler, 2014; Lamb, Cavagnetto, & Akmal, 2016). Lamb et al. (2014) showed that students’ cognitive processing of science-based tasks was non-linear and dynamic. The artificial neural networks were also useful to model the relationships between the cognitive attributes associated with game-based science tasks and develop a cognitive-attribute hierarchy (Lamb et al., 2016). Analyzing students’ written responses, researchers also sought to model students’ thinking about a specific science concept, such as the effects of a stop codon on DNA replication, transcription, and translation (Prevost, Smith, & Knight, 2016), global lunar patterns (Cheon, Lee, Smith, Song, & Kim, 2013), and the impact of a mutation in noncoding regions of a gene (Sieke, McIntosh, Steele, & Knight, 2019). Two studies focused on how students think about their teachers’ teaching practices (Figueiredo, Esteves, Neves, & Vicente, 2016; Zhang, Qin, Jin, Deng, & Wu, 2017).

Identification of Factors Affecting Performance

Seventeen articles explored the main factors (or attributes) influencing students’ performance with the objective of providing high-quality education. This literature included similar numbers of mathematics and science education articles. Of the 17 articles, nine studies have used large volumes of data derived from the PISA or the TIMSS to identify important variables affecting student performance in mathematics and science (Chen, Zhang, Wei, & Hu, 2019; Depren, 2018; Depren, Aşkın, & Öz, 2017; Filiz & Oz, 2019; Gabriel, Signolet, & Westwell, 2018; Gorostiaga & Rojo-Álvarez, 2016; Liu & Whitford, 2011; Masci et al., 2018; She et al., 2019). There was only one study that examined informative factors in predicting prospective teachers’ achievements. Using artificial neural networks, Akgün and Demir (2018) revealed that university placement scores were the most informative variable in predicting prospective elementary teachers’ achievements in science and technology education courses. Other researchers sought to identify the factors that predicted students’ attitudes towards mathematics (Aksoy, Narli, & Idil, 2016), self-regulated learning (Kim et al., 2018), and their choice of undergraduate courses (Aiken et al., 2019; Suh, Upadhyaya, & Nadig, 2019).

Automated Assessment

Although using open-ended questions may better reveal students’ formal and informal ideas simultaneously than using multiple choice questions, resource and time limitations exist in assessing students’ responses to open-ended questions (Sieke et al., 2019), especially when a teacher deals with a large number of students’ written responses. To address these issues, researchers, particularly in science education, have used DM techniques to evaluate students’ written explanations efficiently and accurately.

According to our review, Ha and Nehm are two scholars who have contributed greatly to this field. Ha, Nehm, and their colleagues used computer-assisted scoring models such as SPSS Text Analysis and Summarization Integrated Development Environment (SIDE) to automatically evaluate students’ written explanations of particular biology concepts (Beggrow, Ha, Nehm, Pearl, & Boone, 2014; Ha, Nehm, Urban-Lurain, & Merrill, 2011; Nehm, Ha, & Mayfield, 2012; Nehm & Haertig, 2012). These techniques require researchers (humans) to score a large number of student written answers first in order to train a machine (a computer). Ha et al. (2011) used the SIDE technique to develop an automated scoring model to evaluate students’ written answers to questions about evolutionary change. Based on identification of key concepts, this model achieved satisfactory inter-rater agreement with human evaluators (kappa statistic larger than 0.8). In a similar study, Nehm et al. (2012) examined the impact of students’ response length on assessment accuracy and noted that varying response lengths did not significantly influence scoring performance. DM was also used to detect students’ cheating strategies in an e-learning environment (Northcutt, Ho, & Chuang, 2016). Although automated assessment is one of the interesting research topics in science education, it has not received much attention in mathematics education research.
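As a rough illustration of scoring "based on identification of key concepts," the sketch below counts how many key concepts appear in a written response. It is a deliberately simple keyword matcher, not SIDE or SPSS Text Analysis, and it involves no training on human-scored answers. The lexicon `KEY_CONCEPTS`, its cue words, and the sample answer are hypothetical and are not taken from the reviewed studies.

```python
import re

# Hypothetical key-concept lexicon for an evolutionary-change question;
# the cue words are illustrative only.
KEY_CONCEPTS = {
    "variation": {"variation", "varied", "differences"},
    "heritability": {"inherited", "heritable", "offspring"},
    "differential_survival": {"survive", "survived", "survival", "reproduce"},
}


def detect_concepts(response):
    """Return the key concepts whose cue words appear in the response."""
    tokens = set(re.findall(r"[a-z]+", response.lower()))
    return {name for name, cues in KEY_CONCEPTS.items() if tokens & cues}


def score_response(response):
    """Score a response as the number of distinct key concepts detected."""
    return len(detect_concepts(response))


answer = "Beetles varied in color, and the ones that survived passed it to offspring."
print(sorted(detect_concepts(answer)), score_response(answer))
```

In practice, machine scores of this kind would be compared with human scores on the same responses, for example with a kappa statistic, before the model is trusted.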

Performance Prediction

Academic performance in mathematics and science has a great influence on students’ choice of major and career as well as on educational policies and directions around the world. Some studies in this category focused on the accuracy of DM models to predict student performance (Abidi, Hussain, Xu, & Zhang, 2019; Depren et al., 2017; Filiz & Oz, 2019). Filiz and Oz (2019) demonstrated that students’ science scores can be predicted with reasonable accuracy (more than 70%) based solely on students’ responses to the TIMSS questionnaire, which included questions about students’ confidence in science and home educational resources. Other researchers focused on whether or not a DM technique can be used to make an early prediction of a student’s failure in a course or graduation (Cooper & Pearson, 2012; Ismail & Abdulla, 2015).

Teacher Support

Of the 7 research topics, teacher support was the only topic that received more attention in mathematics education. To help teachers develop their ability to respond to student thinking, Bywater, Chiu, Hong, and Sankaranarayanan (2019) developed an AI-based tool, called the Teacher Responding Tool, to provide automatic recommendations for teachers to respond to students’ written explanations. DM techniques also supported teachers in analyzing and reflecting on their own teaching practices (Duzhin & Gustafsson, 2018; Owens et al., 2017; Sergis et al., 2019). Two surveyed studies investigated how machine learning–based systems can help teachers orchestrate multiple groups in a collaborative setting (Schwarz et al., 2018; Tissenbaum & Slotta, 2019). Schwarz et al. (2018) used a system (System for Advancing Group Learning in Educational Technologies [SAGLET]) in which alerts are automatically sent to teachers as the system becomes aware of such critical moments as “idleness, off-topic talk, technical problems, explanation or challenge, confusion, correct solutions, and incorrect solutions” (p. 193) in online group learning. In this system, teachers observed multiple groups’ learning and used the alerts about critical moments to guide the groups.

Student Support

A small number of articles focused on the topic of student support, and the studies were likely to be conducted in the field of science education. Specifically, these studies involved the development of intelligent tutoring systems based on DM techniques to promote students’ scientific argumentation (Huang et al., 2011; Lee et al., 2019). Lee et al. (2019) introduced a formative feedback system, HASbot, which uses the natural language processing (NLP) algorithm to assess students’ written scientific arguments and provide immediate adaptive feedback. HASbot provides two types of feedback: (a) diagnostic (e.g. “You are scientific explanation level 1 because you did not use scientific evidence”) and (b) suggestive (e.g. “Your argument will be stronger if you evaluate the strengths and weaknesses of the evidence from the model. What are you certain about from the groundwater model?”) (Lee et al., 2019, p. 606). Another intelligent tutoring system was proposed by Rao and Saha (2019) to help students overcome learning difficulties in biology and facilitate students’ independent learning. The system provided learning content in biology and assisted students by automatically identifying important biological concepts and questionable sentences, retrieving relevant images from websites for better understanding, conducting mock tests before actual examinations, and evaluating students’ performance.

Document Analysis

Two articles in our review used DM methods to analyze existing learning documents. Like the topic of student support, research on document analysis was limited to the area of science education. Wahlberg and Gericke (2018) conducted textbook analysis using DM techniques. Since protein synthesis tends to be included in both biology and chemistry upper secondary curricula, Wahlberg and Gericke used text summarization and text mining techniques in order to examine how the same concept is described in the two different curricula. Reitsma, Marshall, and Chart (2012) used intermediary-based standard cross-walking and the NLP technique to automatically align a substantial body of learning resources with varying educational standards in the USA.

Data Mining Techniques

We classified the reviewed articles into 5 DM techniques (see Table 3). Classification was the most frequently used DM method, followed by text mining and by clustering. Whereas similar numbers of articles used classification algorithms to explore topics related to mathematics and science education, there was a difference in the number of articles that used text mining in each of the two disciplines. Furthermore, only a few articles used such DM methods as social network analysis and relationship mining to address educational issues in mathematics education.

Table 3 Frequencies of articles reviewed according to data mining technique

We also found that different DM techniques were used to examine various research topics in mathematics and science education. In Fig. 2, the thickness of each link illustrates the strength of the relationship between the research topic and the DM technique used. Classification techniques were mostly used to identify factors affecting students’ performance and to predict students’ performance. Text mining was used to examine various research topics, especially focusing on automated assessment. Researchers tended to use clustering, social network analysis, and relationship mining methods to model students’ behavior and learning patterns. In the subsequent sections, we briefly introduce the 5 DM techniques and describe how each technique was used to explore different research topics.
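The link strengths described above can be thought of as co-occurrence counts between topic codes and technique codes. The sketch below shows one way such counts could be tallied when an article carries more than one code. The four coded articles and the function name `link_weights` are hypothetical; they do not reproduce the frequencies in our tables or the procedure used to draw the figure.

```python
from collections import Counter
from itertools import product

# Hypothetical coded articles; each may carry several topic and technique codes.
coded_articles = [
    {"topics": ["performance prediction",
                "identification of factors affecting performance"],
     "techniques": ["classification"]},
    {"topics": ["automated assessment"], "techniques": ["text mining"]},
    {"topics": ["student modeling"], "techniques": ["clustering"]},
    {"topics": ["student modeling",
                "identification of factors affecting performance"],
     "techniques": ["classification", "clustering"]},
]


def link_weights(articles):
    """Count how many articles pair each research topic with each technique."""
    counts = Counter()
    for article in articles:
        # Every topic code of an article is linked to every technique code.
        for pair in product(set(article["topics"]), set(article["techniques"])):
            counts[pair] += 1
    return counts


weights = link_weights(coded_articles)
for (topic, technique), n in weights.most_common():
    print(f"{topic} -- {technique}: {n}")
```

Because an article with several codes contributes to several links, the link counts can sum to more than the number of articles.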

Fig. 2 Relationships between research topics and data mining techniques

Classification.

Classification algorithms aim to assign items in a data set to predefined classes. These algorithms can be used for “predicting student performance, achievement, knowledge, predicting/preventing student dropout, [and] detecting problematic student’s behavior in online courses/e-learning” (Aldowah et al., 2019, p. 23). In our review, some researchers have compared the classification accuracies of different algorithms to demonstrate the effectiveness of the algorithms (Abidi et al., 2019; Depren, 2018; Depren et al., 2017; Filiz & Oz, 2019; Gorostiaga & Rojo-Álvarez, 2016). Others have used a specific classification algorithm to identify important factors and predict student performance. White-box DM models such as tree-based algorithms (Aksoy et al., 2016; Figueiredo et al., 2016; Gabriel et al., 2018; Gobert, Kim, Sao Pedro, Kennedy, & Betts, 2015; Goggins, Xing, Chen, Chen, & Wadholm, 2015; Hershkovitz et al., 2013; Kirby & Dempster, 2015; Lavie Alon & Tal, 2015; Liu & Whitford, 2011; Masci et al., 2018; Owens et al., 2017; She et al., 2019; Wang, 2016) were preferable to such black-box models as random forest (e.g. Aiken et al., 2019; Kim et al., 2018) and artificial neural networks (e.g. Akgün & Demir, 2018; Cooper & Pearson, 2012; Roberts, Chung, & Parks, 2016), which were less comprehensible (Romero & Ventura, 2013).
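To illustrate why tree-based models are considered comprehensible, the following sketch finds the single best split of a small data set by Gini impurity, which is the basic step a decision tree repeats at each node. It is a one-level "decision stump" written for illustration, not any algorithm from the reviewed articles. The two features and the eight student records are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(rows, labels):
    """Return (impurity, feature index, threshold) of the lowest-impurity split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send items to both sides
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, f, threshold)
    return best


# Hypothetical records: [self-confidence rating, books at home (in tens)].
students = [[1, 2], [2, 8], [2, 1], [3, 3], [4, 1], [4, 9], [5, 4], [5, 7]]
outcome = ["low", "low", "low", "low", "high", "high", "high", "high"]
impurity, feature, threshold = best_split(students, outcome)
print(f"split on feature {feature} at <= {threshold} (impurity {impurity})")
```

The result reads directly as a rule (in this toy data, a self-confidence rating of 3 or less goes with low performance), which is the kind of transparency that black-box models do not offer.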

Text Mining.

Text mining is a method of identifying and extracting important information from a text (Aldowah et al., 2019; Romero & Ventura, 2013). In the current review, researchers have used text mining techniques to develop automated assessment systems (Beggrow et al., 2014; Ha et al., 2011; Ha & Nehm, 2016; Liu, Rios, Heilman, Gerard, & Linn, 2016; Nehm & Haertig, 2012; Nehm et al., 2012; Northcutt et al., 2016; Prevost et al., 2016; Sieke et al., 2019; Wiley et al., 2017), assist teachers in noticing students’ mathematical thinking (Bywater et al., 2019) and in managing group learning (Schwarz et al., 2018; Tissenbaum & Slotta, 2019), help students improve their scientific arguments (Huang et al., 2011; Lee et al., 2019), and facilitate students’ self-regulated learning (Rao & Saha, 2019). Based on these observations, we noted that researchers in science education were much more likely to use text mining techniques in order to develop an intelligent system and evaluate its effectiveness than to use them to analyze large volumes of text data. There were few studies that used text mining to analyze text data and identify useful information. Wahlberg and Gericke (2018), for example, extracted keywords from science textbooks and compared them to examine how a science concept was described in the different textbooks.
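As a rough illustration of keyword extraction and comparison across texts, the sketch below counts content words in two short passages and reports the terms they share. The passages, the stop-word list, and the function name `term_counts` are invented for illustration; this is not the procedure used by Wahlberg and Gericke (2018), and no stemming is applied, so "trait" and "traits" are treated as different terms.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; real analyses use much larger ones.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "for", "are", "by", "in", "to"}

def term_counts(text):
    """Lowercase the text, split it into words, and count non-stop-words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

# Hypothetical textbook excerpts describing the same concept.
textbook_a = ("A gene is a segment of DNA. The gene codes for a protein, "
              "and the protein determines a trait.")
textbook_b = ("A gene is a unit of heredity. Heredity is the passing of traits, "
              "and traits are passed by parents.")

counts_a = term_counts(textbook_a)
counts_b = term_counts(textbook_b)
shared_terms = set(counts_a) & set(counts_b)

print("Top terms in A:", counts_a.most_common(2))
print("Top terms in B:", counts_b.most_common(2))
print("Shared terms:", shared_terms)
```

In this toy case the first excerpt frames the gene in molecular terms (DNA, protein), whereas the second frames it in terms of heredity, and only the word "gene" is common to both.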

Clustering.

Clustering algorithms aim at “collecting similar objects together to form a group or cluster. Each cluster contains objects that are similar to each other but dissimilar to the objects of other groups” (Dutt, Ismail, & Herawan, 2017, p. 15991). In our review, K-means clustering (Figueiredo et al., 2016; Malmberg et al., 2013; Sergis et al., 2019; Zhang et al., 2017) and hierarchical clustering (Hossain et al., 2018; Lee, 2019; Liu & Lee, 2013; Martin et al., 2015) were the two most commonly used clustering techniques. In the context of mathematics and science education, clustering was used to identify patterns of undergraduate students’ resource choices (e.g. online videos and/or live lectures; Howard et al., 2018), identify the ways students solved homework and quiz problems in a physics MOOC (Lee, 2019), classify students’ responses to pretest questions before an instructional intervention (Magana et al., 2019), identify high- and low-achieving students’ learning strategies in favorable or challenging learning situations (Malmberg et al., 2013), and detect students’ behavior patterns in online learning environments (Araya et al., 2014; Kim et al., 2018; Martin et al., 2015). In addition, clustering was used to predict whether an undergraduate student was likely to graduate high school with a high or low cumulative GPA (Ismail & Abdulla, 2015).
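The idea behind K-means can be sketched in a few lines of plain Python. The example below is a simplified version under stated assumptions: the data (online videos watched and live lectures attended per student) are hypothetical, and the initial centroids are simply the first k points so that the result is reproducible, whereas practical implementations usually use randomized initialization.

```python
def kmeans(points, k, max_iter=100):
    """Simplified K-means: alternate assignment and centroid-update steps."""
    # Assumption: deterministic initialization from the first k points.
    centroids = [tuple(float(v) for v in p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # no change means the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical data: (online videos watched, live lectures attended)
students = [(1, 9), (2, 8), (1, 8), (9, 1), (8, 2), (9, 2)]
labels, centroids = kmeans(students, k=2)
print(labels)
```

Here the two resulting groups correspond to students who mainly attend live lectures and students who mainly watch videos; no group labels were supplied in advance, which is what distinguishes clustering from classification.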

Social Network Analysis.

Social network analysis aims to “understand and measure the relationships between entities in networked information” (Romero & Ventura, 2013, p. 21). Based on network theory and graph theory, it characterizes the relationships in terms of nodes (e.g. concepts, keywords, individual actors) and links or edges (i.e. relationships between nodes). Visualization of networked structures helps researchers and instructors understand or discover hidden patterns or relationships in the context of teaching and learning (Aldowah et al., 2019). In the reviewed articles, researchers used social network analysis to model student thinking (Choi, Lim, & Son, 2017; Prevost et al., 2016; Sieke et al., 2019), identify important factors influencing students’ attitudes towards mathematics (Aksoy et al., 2016), and analyze textbooks (Wahlberg & Gericke, 2018).

To examine students’ recognition of mathematics and science, Choi et al. (2017) asked Korean middle school students to write down three keywords that came to mind when they thought of mathematics and science. Results indicated that the students associated science mainly with “experiment,” “chemistry,” “physics,” “scientist,” and “biology,” while mathematics was associated with “calculation,” “function,” “numbers,” “difficulty,” “equation,” “formula,” etc. Sieke et al. (2019) also used social network analysis to characterize students’ thinking about the impact of mutation in a noncoding DNA region. Sieke et al. classified students’ responses to a question regarding noncoding mutation into eight different categories and examined how the categories were related in a single answer.
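A keyword network of this kind can be sketched as follows. The three responses are hypothetical (only the keywords themselves echo those reported above), and the code is an illustrative assumption about how such a network might be built, not the procedure of Choi et al. (2017): keywords written by the same student are linked, link weights count co-occurrences, and a keyword's degree is its number of distinct neighbors.

```python
from collections import Counter
from itertools import combinations

# Hypothetical responses: three keywords per student about "science".
responses = [
    ["experiment", "chemistry", "scientist"],
    ["experiment", "physics", "scientist"],
    ["experiment", "biology", "chemistry"],
]

# Edges: unordered keyword pairs, weighted by how often they co-occur.
edge_weights = Counter()
for keywords in responses:
    for a, b in combinations(sorted(set(keywords)), 2):
        edge_weights[(a, b)] += 1

# Degree: number of distinct keywords each node is linked to.
degree = Counter()
for a, b in edge_weights:
    degree[a] += 1
    degree[b] += 1

print(degree.most_common())
```

In this toy network "experiment" has the highest degree, which would mark it as the most central keyword in students' associations with science.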

Relationship Mining.

Relationship mining is used to identify relationships between variables and discover rules to explain the relationships. Association rule mining is a commonly used method. Relationship mining also seeks to explore temporal associations, linear correlations, and causal relationships between variables (Romero & Ventura, 2013). Three of the five articles classified as relationship mining in our review used association rule mining methods to model students’ behavior and ideas. Chen and Chang (2017) used a fuzzy association rule mining technique to identify significant rules for explaining the relationships between grades in particular undergraduate courses and accumulated GPA. In a study of students’ construction of concept maps in a biology context, Liu and Lee (2013) used an association rule mining algorithm, Apriori, and identified eight significant association rules that demonstrated the relationships between students’ knowledge structures in biology.
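The two measures that underlie association rules, support and confidence, can be computed directly. The sketch below uses hypothetical student records and item names; it evaluates a single crisp rule and is neither the fuzzy technique of Chen and Chang (2017) nor a full Apriori implementation, which would additionally search over candidate itemsets.

```python
# Hypothetical records: items observed for each student.
transactions = [
    {"calculus_A", "physics_A", "high_gpa"},
    {"calculus_A", "high_gpa"},
    {"calculus_A", "physics_A", "high_gpa"},
    {"physics_A"},
    {"calculus_A"},
]

def count(itemset):
    """Number of records that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """Share of all records that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Share of records with the antecedent that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

rule_support = support({"calculus_A", "high_gpa"})
rule_confidence = confidence({"calculus_A"}, {"high_gpa"})
print(f"calculus_A -> high_gpa: support={rule_support:.2f}, "
      f"confidence={rule_confidence:.2f}")
```

A rule is typically retained as "significant" only if both values exceed thresholds chosen by the analyst.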

Conclusions and Discussion

The current review demonstrated that modeling students’ behavior and thinking was the most preferred application of EDM in mathematics and science education, which is in line with the findings of Papamitsiou and Economides (2014) and Peña-Ayala (2014). The higher frequency of articles dealing with the topic of student modeling may stem from the advancement of technology that captures a large amount of educational data in online learning environments (Lee, 2019; Levy & Wilensky, 2011). For example, log file data collected in online platforms (e.g. MOOCs and learning management systems) provide researchers with detailed information about what students have done and what they are doing (Lee, 2019). DM techniques, in turn, have the potential to identify students’ learning patterns based on the log file data. In the current review, we found that clustering was the most frequently used DM technique to model students’ behavior and thinking, perhaps because of its effectiveness in identifying hidden patterns based on similarities and differences in the values of data.

We also found that relatively few studies in mathematics and science education focused on using DM techniques to predict student performance. This result may be surprising because previous reviews of EDM have consistently provided evidence that researchers have largely used DM methods to analyze students’ performance and retention (e.g. Aldowah et al., 2019; Papamitsiou & Economides, 2014; Peña-Ayala, 2014; Romero & Ventura, 2010). The current review, however, revealed that more attention was paid to identifying important factors influencing students’ mathematics or science performance than predicting their performance, perhaps because researchers in these areas focus more on the development of student performance and effective teaching and learning based on the important factors identified (Chen & Chang, 2017; Masci et al., 2018). Nonetheless, since mathematics and science have hierarchical structures, it is also important for educators to predict in advance whether a student will fail a mathematics or science class and provide an appropriate treatment as early as possible (Cooper & Pearson, 2012). DM techniques (e.g. classification algorithms) can be used to predict students’ success, failure, or overall academic achievement in mathematics and science. Data from TIMSS and PISA can be particularly useful in that these programs have been collecting huge volumes of data on mathematics and science from many countries (e.g. Depren, 2018; Filiz & Oz, 2019). Students’ log data in online learning systems can also be analyzed to predict students’ performance and course completion as early in the course as possible (e.g. Gobert et al., 2015).

As seen in our review and the previous reviews of EDM, most of the studies were much more likely to concentrate on students’ cognitive performance than on their affective features. Although Romero and Ventura (2013) argued that detecting and tracking students’ emotions, motivations, and interests is a topic of interest to the EDM research community, we found that very little research in science and mathematics education has been conducted on how EDM can be used to analyze students’ affective factors (e.g. Aksoy et al., 2016). Since affective domains significantly affect students’ behavior and performance (Pantziara & Philippou, 2015), future research in science and mathematics education might examine how EDM can be used to detect useful information on students’ affective factors, explore the relationships between affective constructs and cognitive domains, and assist students in developing a positive attitude (e.g. Aksoy et al., 2016).

Our review showed that applying DM techniques to automated assessment was the second most frequently addressed research topic, but only one article (Goggins et al., 2015) in mathematics education focused on automated assessment. A low number of studies on automated assessment in mathematics education can be attributed to the complexity of developing such systems in mathematics. In our review, we noticed that automated assessment was more likely to involve text mining than other DM techniques (see Fig. 2). However, mining mathematical texts may not be straightforward because mathematics, in its nature, consists of many symbols and graphs. Automated assessment has many advantages in teaching and learning, such as rapid assessment of hundreds of students’ written responses, immediate feedback and diagnosis of students’ learning deficiencies, and consistent and unbiased assessment (Nehm & Haertig, 2012; Wiley et al., 2017). Future research in mathematics education might need to examine how students’ written responses including mathematical symbols can be collected for analysis using DM and how automated assessment can be integrated into mathematics education to support teaching and learning of mathematics.

The low frequency of EDM studies under the categories of teacher support and student support may result from the fact that researchers paid more attention to analyzing enormous volumes of educational data using DM methods than developing DM-based systems to support teachers and students. In fact, there were 43 papers that used DM to analyze educational data and 21 papers that used DM to develop DM-based systems. Although a small number of studies focused on such topics as teacher support and student support, there was a difference in these topics between science and mathematics education. Whereas the topic of teacher support was more likely to be studied in the context of mathematics education, EDM research on student support was only performed in the area of science education. Based on the findings, two potential research directions in mathematics and science education can be suggested.

First, over the last decade, a growing body of research in mathematics education has investigated teachers’ ability to notice and respond to in-the-moment student thinking (Jacobs et al., 2010; Kilic, 2018; Sánchez-Matamoros et al., 2015). Recent reforms in science education also support a vision of teaching and learning in which teachers pay attention to students’ scientific ideas and build their teaching practices upon student thinking (NGSS Lead States, 2013; NRC, 2012). Barnhart and van Es (2015) found that preservice science teachers had difficulties in noticing and responding to students’ thinking without appropriate support from teacher educators. Bywater et al. (2019) demonstrated that a DM-based tool assisted mathematics teachers in developing their expertise in noticing and responding to nuances in students’ ideas and written explanations. Using a natural language processing (NLP) technique, the tool provides teachers with three automatic recommendations, and the teachers can choose, edit, and/or combine any of the recommendations so that they can effectively respond to student ideas. Empirical results showed that the tool assisted teachers in noticing, interpreting, and responding to students’ mathematical ideas by helping them notice nuances in students’ explanations and make instructional decisions based on students’ current understanding rather than evaluate the correctness of student solutions. Although Bywater et al. used DM techniques to examine mathematics teachers’ noticing, future researchers may apply EDM in the context of science education to help novice science teachers develop their ability to notice student thinking as evidence of a successful classroom endorsed by the science education reforms.

Second, reviewing argumentation studies in science education thematically, Bağ and Çalık (2017) found that research on development of scientific argumentation skills was one of the topics that attracted many researchers’ attention. In addition, the current review showed that researchers in science education sought to develop DM-based tutoring systems and revealed that these systems helped students improve the quality of their scientific argumentation (Huang et al., 2011; Lee et al., 2019). NCTM (2014) stated that developing students’ and novice teachers’ mathematical arguments was one of the important practices in mathematics education. Although we did not find any articles that used DM techniques to develop students’ mathematical argumentation skills, future researchers may integrate DM techniques into mathematics education to investigate students’ and teachers’ mathematical argumentation and to improve their levels of argumentation.

The findings in the current study have important implications for researchers and teacher educators. First, as discussed above, DM techniques can be used to analyze large volumes of educational data quantitatively. Although researchers have used statistical methods to analyze quantitative data, DM techniques (e.g. text mining) have the potential to analyze unstructured data, such as students’ and teachers’ discussion, written explanations, and online log data files, which are hard to analyze with traditional statistical methods. Therefore, researchers might use EDM to discover hidden phenomena and information in mathematics and science education. Second, although relatively small numbers of EDM papers in mathematics and science education focused on developing DM-based systems to support teachers and students, our review showed that DM techniques have the potential to deal with important issues in mathematics and science education (e.g. teachers’ difficulties in noticing student thinking, managing different groups of students, and assessing many students’ learning processes as well as personalizing students’ learning and development of argumentation skills). Based on this review, we hope that EDM can be further used for improving the quality of mathematics and science education and addressing educational issues related to teaching and learning in mathematics and science education.