1 Introduction

Data mining has proven successful in e-commerce and business development, and its use in the education sector is now growing. Data mining detects patterns in data and uncovers hidden information in datasets, which leads to more effective decision making. It is used to extract useful information from large datasets, and different approaches are applied for data analysis, prediction, and classification to find hidden patterns in otherwise meaningless data (Adekitan & Salau, 2019). This useful information is mined by state-of-the-art data mining algorithms. A data mining algorithm is a set of steps followed to mine useful information and to build classification and prediction models by finding patterns in the data. These algorithms are applied in the form of techniques or prediction models, e.g., Decision Tree, K-Nearest Neighbors, Random Forest, Support Vector Machine, Artificial Neural Network, and Naïve Bayes. Data mining techniques are becoming beneficial in education as well. Policy makers can use data mining to identify the factors essential for improving education quality, and institutions can use it to analyze students' achievements, requirements, issues, and learning habits (Adekitan & Noma-Osaghae, 2019).

Higher education is considered the basis for the advancement of a society. Every year, many students drop out, withdraw from their education, or retake admission in the same courses. If a massive number of students leave their education because of failure, not only do the students themselves suffer, but the educational system is also affected negatively. Therefore, a system is needed that can detect students who are likely to fail or drop out before their final examinations, so that failure rates can be minimized (Chui et al., 2020). No education system is successful unless it is evaluated continuously. To improve institutional results and to ensure that all students graduate on time, it is necessary to identify the obstacles in the path of students' success. It is very difficult for teachers dealing with many students to mine their data and detect students' weak areas, but data mining makes this an easy and interesting task without teachers' direct involvement. The use of data mining techniques to analyze large amounts of data in the education sector and improve teaching and learning processes is called educational data mining. Educational Data Mining (EDM) techniques refer to the methods or algorithms used for mining educational datasets. In EDM, it is essential to extract the required information from huge educational datasets. This information can be used by higher educational authorities to improve policies, by institutions to check and balance teachers' and students' issues, and by students to improve their results (Burgos et al., 2018). EDM can be defined as applying data mining techniques to large educational datasets to solve different educational issues. EDM processes involve gathering data and applying models to that data to describe patterns or to mine useful information concerning educational institutes or students. EDM can be used to understand students' learning behaviors and interests in order to better design teaching strategies that improve their performance and minimize dropout rates. Educational institutions store a huge amount of data in their databases every year. This massive data can be transformed into useful information to help different stakeholders in decision-making processes (Kabakchieva, 2013). Using this information, students can detect their weak areas in different courses, teachers can improve their teaching strategies, and administrations can manage different resources more effectively for the benefit of their institution. Data mining makes all these tasks very easy on the basis of previous experiences and patterns found in the data (Fernandes et al., 2019).

One of the critical tasks in EDM is predicting students' future exam performance. Several studies published to date have used data mining techniques to predict students' exam performance. The main goal of these studies is to classify students into two classes, i.e., "pass" or "fail". Students' performance prediction can be conducted using supervised data mining techniques, in which a mathematical model is built from a dataset that contains the inputs as well as the desired outputs. It is important to predict results before many students drop out of a specific course. Predicting students' performance helps institutes find out the weak areas of different courses, and such predictions make it possible to take early action by improving the learning processes of students who have a high risk of failing the course.

Several survey studies published to date explore the research performed in educational data mining. One survey paper explores the determinants of students' dropout in order to benefit future research by highlighting the most significant socio-economic features; it concludes that a mix of individual, economic, and educational features affects students' academic outcomes (Aina et al., 2021). Another study focused on the supervised data mining algorithms widely used for students' performance prediction (Sen et al., 2020). Other survey studies explored research work to analyze prediction models and the students' factors that influence prediction results (Batool et al., 2021; Khan & Ghosh, 2021; Namoun & Alshanqiti, 2021; Qian et al., 2022; Upadhyay et al., 2021). To the best of our knowledge, no previous survey study has tried to explore all factors that may influence students' performance prediction results. The main strength of our survey is that it summarizes the research work of the last two decades, covering a total of 269 studies, and tries to cover all factors, called students' attributes, that may improve students' exam performance prediction results. By exploring the latest survey papers, another prominent research gap is identified: to the best of our knowledge, no survey paper has explored feature selection.

This survey paper analyzes the research work performed in the last two decades by comparing state-of-the-art data mining techniques, data mining tools, and the input attributes used for results prediction. The first objective of this paper is to identify the best prediction techniques: different prediction techniques are compared in terms of prediction accuracy to find the most accurate prediction method. The second objective is to identify the students' attributes that lead to the most accurate prediction results. As the third objective, this paper compares data mining tools and identifies the most widely used tool for the prediction process. The presented survey synthesizes the machine learning models and tools applied in education to predict student performance. It may help educational institutions design and deploy a prediction model in their academic sections using available tools and students' attributes, and it may enhance learning management systems (LMS) in virtual learning institutes to predict and prevent students' dropout in online courses. The presented survey extends the results of previous review studies in order to cover all factors essential for future research work.

We conducted a systematic literature review to answer the following research questions:

  1) Which data mining algorithm has been used most frequently in the last two decades?
  2) Which students' attributes are most highly correlated with their exam performance?
  3) Which data mining tool is used most frequently, and why?
  4) What is the role of feature selection in students' performance prediction?

This paper answers these questions by comprehensively exploring the latest work and trends in educational data mining. It also focuses on the main aims of the reviewed research papers, i.e., to predict students' exam scores or to classify students into pass/fail categories. This study also aims to explore the right time for predicting final exam results and presents the role of feature selection in predicting students' results. All these factors will direct future research toward highly accurate prediction results in different scenarios and using different available inputs or tools.

The remainder of the paper is organized as follows. Section II presents an overview of the survey and its contributions. Section III presents a summary of data mining techniques, Section IV describes students' attributes and their impact on academic performance, and Section V gives a comparison of the data mining tools used in students' performance prediction. Section VI presents the results of the survey, and the study is concluded in Section VII. Finally, Sections VIII and IX present the limitations of the study and future research work, respectively.

2 Method

This section presents the research methodology adopted to conduct the survey. We explore research papers published in the last two decades to answer the research questions mentioned above. We used Google Scholar, IEEE Xplore, Web of Science, Elsevier, and DBLP to find research articles from well-known and impact factor journals, conferences, and theses published up to 2021. This survey covers traditional classroom learning as well as e-learning platforms. The phrases used for searching research articles include "students' performance prediction", "exam score prediction", "educational data mining", "students' academic performance", "students' final exam prediction", "CGPA prediction", and "machine learning in predicting students' grades". Using these phrases, a total of 312 research papers were identified and stored in a database. After reading the full articles, only 269 research articles were included in this survey because they focused on supervised learning techniques; the remaining 43 papers dealt with unsupervised learning techniques. This study focused on classification techniques only, while regression, clustering, association rules, and feature optimization methods are used in a few studies to improve the classification results. After literature selection, we presented and summarized the findings of the selected articles by comparing and calculating the results. RapidMiner is used to provide a comprehensive analysis of the research, and the final outcomes are presented in graphs.

Figure 1 presents the directions of this research study. The figure shows that there are mainly two evaluation aims of students' performance prediction, i.e., classification and regression. Research studies evaluate students' performance mainly at the time of admission, in the middle of the academic session, and right before the final examinations. The data mining algorithms used in the studies are grouped in the figure into association rules, classification, regression, clustering, and feature optimization. The four most frequently used tools, i.e., WEKA, RapidMiner, Python, and MATLAB, are mentioned in the figure. Finally, the input features are categorized: the main input features are learning resources, academic performance, demographics, psychological factors, attendance, admission scores, and internet usage.

Fig. 1 Survey paper taxonomy

3 Data Mining Techniques

A number of techniques are used for data mining, classification, and prediction of final outcomes. In data mining, classification methods are used for prediction: a classification model partitions a dataset into several classes. The classification process can be divided into two steps. First, a classifier is generated from training data. Second, this classifier is used to label new data items whose classes are unknown (Asif et al., 2017). The aim of building a classifier is to make predictions about future data with relevant characteristics by distributing data into predefined classes. In the prediction process, different data mining techniques can be adopted to classify students into multiple classes based on their performance, e.g., "pass" or "fail". This section gives a brief overview of the different classification techniques used in previous research papers for students' performance prediction.

3.1 A. Decision Tree

One of the most frequently used data mining techniques for EDM is the Decision Tree (DT). A Decision Tree is a tree-like graph based on a set of conditions. A set of features is used as input, and class labels are the output of the Decision Tree. A root node is placed on the top, from which a set of branches is generated. Each branch describes a condition that connects to the next node.

The Decision Tree continues this process until it reaches a leaf node. These leaf nodes are labeled as classes or decisions (Tomasevic et al., 2020). A Decision Tree follows IF–THEN rules. The Decision Tree model is the simplest technique, and thus its working is very easy to understand. Figure 2 describes a simple Decision Tree model which predicts students' results on the basis of some conditions. A study analyzes students' factors before admission and during the current semester to predict their semester examination results; a Decision Tree is used to build the prediction model, and the study shows 87.14% accurate results (Yathongchai, 2003). A research study applied a Genetic algorithm to fine-tune a students' score prediction tree (Kalles & Pierrakeas, 2006). Another study (Hsu et al., 2003) used the Apriori algorithm to obtain significant factors in predicting students' performance and then applied a genetic algorithm for calculating the fitness function of the variables.

Fig. 2 Decision Tree (DT)

The study shows that factor analysis positively impacts classification tree results. Generalized sequential pattern mining is used (Patil & Mane, 2014) to draw patterns for predicting students' academic performance, and it is shown that using significant features generates more accurate prediction results. Another study proved that a Fuzzy genetic algorithm improves the prediction results (Hamsa et al., 2016). Another study used association rule mining to draw a significant correlation between students' admission data and their academic performance (Rojanavasu, 2019). Decision Trees are used to generate prediction rules (Al-Radaideh et al., 2006; Ogor, 2007), and it is found that students' gender, education funding, and CGPA in previous semesters highly influence their final grades. A Decision Tree based early warning system is developed to predict students who are more prone to dropout; students and teachers are then informed by email and asked to pay more attention in order to improve students' results (Hu et al., 2014). Decision Tree outperformed Neural Network (Herzog, 2006) and Naïve Bayes (Nghe et al., 2007) in estimating students' degree completion time and their grades in the final examination.

Information Gain and Gain Ratio are used to explore the correlation between students' factors and their academic performance; it is found that students' study time, age, and parents' education highly influence students' results (Osmanbegović et al., 2014). Other studies show that students' attendance (Upadhyay & Gautam, 2016) and courses (Altujjar et al., 2016) in the current semester are the most significant prediction features. An improved Decision Tree is proposed using Information Gain and Entropy: the partitions and nodes of the Decision Tree are selected from the attributes with higher Information Gain, and the proposed method is repeated until the best results are obtained (i.e., accuracy = 97.50%) (Sivakumar et al., 2016; Sivakumar & Selvaraj, 2018). Research studies show that ensemble models give significant improvements in prediction accuracy (Jha et al., 2019; Livieris et al., 2018; Pandey & Taruna, 2016).
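To illustrate how such an entropy-based Decision Tree can be built in practice, the following minimal sketch uses scikit-learn's DecisionTreeClassifier with the entropy (Information Gain) splitting criterion on a synthetic pass/fail dataset; the data and parameter values are illustrative assumptions, not a reproduction of any cited model.

```python
# Minimal sketch: entropy-based Decision Tree for pass/fail prediction.
# The dataset is synthetic; its columns only stand in for students' attributes
# such as study time, age, and parents' education (hypothetical).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a students' dataset (features -> pass/fail label)
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# "entropy" selects splits by Information Gain, as in the studies above
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                              random_state=42)
tree.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, tree.predict(X_test)))
```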

Students' behavioral features and social activities have a significant impact on their academic performance, and research studies recommend using students' cognitive as well as personal, economic, and social attributes to predict their exam performance (Aman et al., 2019; Amrieh et al., 2016; Kiu, 2018; Zhao et al., 2020a). Consequently, random features may reduce the prediction accuracy, while dataset preprocessing and significant feature selection enhance the prediction results (Al-Obeidat et al., 2018; Oyefolahan et al., 2018; Wong & Senthil, 2018). An improved ID3 algorithm is proposed comprising two steps, i.e., entropy-based feature selection and prediction model construction. The proposed model shows that eliminating random features enhances prediction results (Patil et al., 2018; Santoso, 2020).

Several research studies applied Decision Tree for students' performance prediction (Adebayo & Chaubey, 2019; Adhatrao et al., 2013; Akinrotimi et al., 2018; Asif et al., 2017; Banu & Manjupargavi, 2021; Baradwaj & Pal, 2012; Bresfelean, 2007; Buenaño-Fernández et al., 2019; Dey, 2020; Figueroa-Cañas & Sancho-Vinuesa, 2020; Hasan, 2019; Hasan et al., 2019; Hew et al., 2020; Kabra & Bichkar, 2011; Kabakchieva, 2013; Kovacic, 2010; Liang et al., 2016; Mikroskil, 2019; Moseley & Mead, 2008; Nandeshwar & Chaudhari, 2009; Patacsil, 2020; Puarungroj et al., 2018; Ramaswami & Bhaskaran, 2010; Sawant et al., 2019; Vivek Raj & Manivannan, 2020; Yadav & Pal, 2012; Zhang & Wu, 2019b). A number of research studies compared Decision Tree with Neural Network, Naïve Bayes (Evwiekpaefe et al., 2014), Random Forest (Mishra & Kumawat, 2018; Salal et al., 2019), Lazy Learner (IBK) (Ilic et al., 2016; Meghji et al., 2019; Pandey & Taruna, 2014), KNN (Anuradha & Velmurugan, 2015; Poudyal et al., 2020), Gradient Boosting (Howard et al., 2018), SVM (Anoopkumar & Rahman, 2018; Francis & Babu, 2019), Logistic Regression (Perez et al., 2018; Salal et al., 2019; Kemper et al., 2020), Multilayer Perceptron (MLP) (Freitas et al., 2020), and Sequential Minimal Optimization (Acharya & Sinha, 2014), and concluded that Decision Tree is the most effective prediction model. In the last two decades, several Decision Tree algorithms have been used as classification and prediction models. In (Hamoud, 2016; Hamoud et al., 2018), J48, Random Tree, Hoeffding, and REP tree are found to be the best Decision Trees for students' academic performance prediction. Different Decision Tree models, namely CART, CHAID, C4.5, and ID3, are compared, and it is reported that JRip (Walia et al., 2020), CART (Saa, 2016; Wong & Yip, 2020), and C4.5 (Saheed et al., 2018) gave the best prediction results. Another study shows that the JRip prediction model outperformed other Decision Tree classifiers (Pattanaphanchai et al., 2019). Table 1 presents a summary of research studies implementing Decision Tree for students' performance prediction.

Table 1 Decision tree for students’ performance prediction

3.2 B. Naïve Bayes

Naïve Bayes (NB) is another classification algorithm, based on Bayes' theorem. It is called naive because the technique assumes that there are no hidden relationships between data attributes that can affect prediction results. It calculates the probability of belonging to each class, and the class that obtains the highest probability is considered the class of that data item (Tomasevic et al., 2020). The equation below shows the Bayesian formula, which calculates the probability of class A given the evidence B.

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
(1)
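As a concrete illustration of this rule, the short sketch below trains a Gaussian Naïve Bayes classifier from scikit-learn on synthetic pass/fail data; it is a minimal example under assumed data, not a reproduction of any of the cited models.

```python
# Minimal sketch: Gaussian Naive Bayes for pass/fail prediction.
# For each class, the model combines per-feature likelihoods with the
# class prior via Bayes' rule (Eq. 1) and picks the most probable class.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)  # synthetic students' attributes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, nb.predict(X_test)))
print("Class probabilities for one student:", nb.predict_proba(X_test[:1]))
```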

The Naïve Bayes algorithm has the potential to predict students' academic performance (Bekele & McPherson, 2011; Bekele & Menzel, 2005). A Bayesian model is proposed to classify students into different classes based on their academic performance (Ramaswami & Rathinasabapathy, 2012). Naïve Bayes is also compared with KNN to select the most accurate prediction model; the study shows that Naïve Bayes achieved higher accuracy, i.e., 93.6%, when both models were constructed using demographic features only (Amra & Maghari, 2017). Another study used the Naïve Bayes algorithm to forecast students' grades in their final exam. Different semester activities, e.g., students' assignments, previous grades, attendance, and lab assessments, proved to be very useful features for predicting final exam grades (Shaziya et al., 2015). Naïve Bayes and SVM prediction models are built and compared, where Naïve Bayes gave better results, i.e., 92% (Tripathi et al., 2019) and 63.5% (Kaur & Bathla, 2018). Distance learning makes it more challenging for tutors to interact with individual students, identify their weak areas, and predict their academic performance. A study implemented the Naïve Bayes algorithm to provide a supporting tool for teachers which predicts students' final exam performance in a distance learning environment (Kotsiantis et al., 2002). Students' demographic variables, including parents' qualifications, jobs, living status, income, and students' eating habits, are used to develop a Naïve Bayes prediction model. The model found that students' scores in secondary school, living status, and medium of teaching are highly correlated with their grades in college examinations (Bhardwaj & Pal, 2012).

A comparison of different classification algorithms, namely Decision Tree, Random Forest, Naïve Bayes, MLP, KNN, and Logistic Regression, is performed, and the results show that Naïve Bayes gave the best prediction results compared to the other techniques (Koutina & Kermanidis, 2011; Romero et al., 2013; Barbosa Manhães et al., 2015; Marbouti et al., 2016; Yaacob et al., 2019; Ahmed et al., 2020a). Naïve Bayes, MLP, and J48 algorithms are used for students' exam performance prediction based on their previous academic performance; the study shows that Naïve Bayes gave the best results, i.e., accuracy = 76.65% (Osmanbegovic & Suljic, 2012). Naïve Bayes and Decision Tree classification algorithms are compared, and it is found that Naïve Bayes outperforms in predicting students' final semester marks, with students' demographic and academic attributes preprocessed to improve classifier accuracy (Kaur & Singh, 2016; Khasanah, 2017; Mueen, 2016; Wati et al., 2017). Similarly, students' admission test scores (Harvey & Kumar, 2019) and final exam results (Kumar et al., 2019) are predicted using Naïve Bayes and Decision Tree prediction models; Naïve Bayes gave higher accuracies of 71% and 85%, respectively. A web-based Naïve Bayes classifier is developed to store students' data, retrieve useful information, and predict their final exam success rate. Such a prediction model is found to be very useful for institutions to maintain their success graph and to provide relevant assistance to students and teachers (Devasia et al., 2016).

Most research studies focus on students' family background and previous academic performance to predict their future exam scores. However, students' personality is also a contributing factor which highly affects their educational interests. One study focused on time management, leadership, self-reflection, social support, study preference, and future to predict how students are going to perform in their exams, and found that non-cognitive features support cognitive features in increasing the accuracy of the Bayesian prediction model (Sultana et al., 2017). Similarly, another study focused on commonly neglected features, namely family expenditures, income, and family assets, to explore their impact on students' academic performance. Using SVM and Naïve Bayes prediction models, the study found that students' performance is highly correlated with their family's utility bills and expenses on education; a decrease in other expenditures may increase the opportunities to complete higher education (Daud et al., 2017). A common objective of almost all of the mentioned studies is to build an early prediction model so that students can be prevented from dropping out. A weekly approach is used to predict students' final exam scores after each week from their admission up to the final exams. The results show that adding more events to the dataset may increase prediction accuracy, i.e., 73.5% after week 1 and 77.7% after week 16 (Akçapınar et al., 2019). Feature optimization is used to remove irrelevant features from the dataset. Forward Selection (Saifudin & Desyani, 2020), PCA (Borges et al., 2018), and Wrapper (Usman et al., 2020) feature selection techniques used with the Naïve Bayes model enhanced students' performance prediction results and also reduced the time required for model construction. Table 2 presents a summary of research studies implementing Naïve Bayes for students' performance prediction.

Table 2 Naïve Bayes for students' performance prediction
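As a rough illustration of combining wrapper-style forward feature selection with a Naïve Bayes model, the sketch below uses scikit-learn's SequentialFeatureSelector (available in recent scikit-learn releases) on synthetic data; it only mirrors the general idea of the feature-optimization studies cited above, not any specific pipeline.

```python
# Minimal sketch: forward feature selection wrapped around Naive Bayes.
# Synthetic data stands in for a students' dataset with some irrelevant features.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=400, n_features=10, n_informative=4,
                           n_redundant=2, random_state=1)

nb = GaussianNB()
# Greedily add features that most improve cross-validated accuracy
selector = SequentialFeatureSelector(nb, n_features_to_select=4,
                                     direction="forward", cv=5)
X_selected = selector.fit_transform(X, y)

print("Selected feature indices:", selector.get_support(indices=True))
print("CV accuracy, all features:    %.3f" % cross_val_score(nb, X, y, cv=5).mean())
print("CV accuracy, selected subset: %.3f" % cross_val_score(nb, X_selected, y, cv=5).mean())
```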

3.3 C. Artificial Neural Network

Artificial Neural Network (ANN) is a well-known classification technique used to solve data mining problems. The concept of an ANN is based on the biological neural network. An ANN model is divided into three layers: the input layer takes the input data, the hidden layer consists of a set of neurons that process the data, and the output layer gives the final classes of the data (Amazona & Hernandez, 2019). Input neurons are connected to the neurons in the hidden layer in order to transmit a signal for processing. These hidden neurons process the signal and forward it to the next connected neurons, until the signal reaches the output layer. Branches are used to connect neurons with each other, and these branches are assigned weights that set the strength of the signal (Tomasevic et al., 2020). Figure 3 describes the working of a basic ANN model which predicts binary classes, i.e., pass or fail.

Fig. 3 Artificial Neural Network (ANN)
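The following minimal sketch shows this layered structure with scikit-learn's MLPClassifier, a feed-forward neural network with one hidden layer trained on synthetic pass/fail data; the layer size and data are assumptions for illustration only.

```python
# Minimal sketch: feed-forward neural network (MLP) for pass/fail prediction.
# One hidden layer of 16 neurons; inputs stand in for students' attributes.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=7)

# Scaling the inputs usually helps gradient-based training converge
scaler = StandardScaler().fit(X_train)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=7)
mlp.fit(scaler.transform(X_train), y_train)

print("Accuracy:", accuracy_score(y_test, mlp.predict(scaler.transform(X_test))))
```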

Several studies present the use of ANN for students' academic performance prediction (Calvo-Flores et al., 2006; Karamouzis & Vrettos, 2008; Lykourentzou et al., 2009; Paliwal & Kumar, 2009; Wook et al., 2009; Arsad & Buniyamin, 2013; Agrawal & Mavani, 2015; Whitehill et al., 2017; Altaf et al., 2019; Liu, 2019; Mi, 2019; Raga & Raga, 2019; Sukhbaatar et al., 2019; Umar, 2019; Khazaaleh, 2020; Sood & Saini, 2020).

A comparative analysis of different classification algorithms shows that ANN and Random Forest gave the most accurate predictions (Alloghani et al., 2018).

Ensemble methods are used to strengthen the prediction results. An ANN model predicts results more accurately when supported by an ensemble filtering method (Rahman & Islam, 2017). Another study hybridized wrapper feature selection with four prediction algorithms, i.e., Decision Tree, Naïve Bayes, KNN, and CNN, to enhance the accuracy of the individual models; the study shows that the CNN model outperformed the others, i.e., accuracy = 95% (Turabieh, 2019). The Multi-layer Perceptron (MLP) is also used to predict students' performance (Ahmad & Shahzadi, 2018; Ruby & David, 2015). An MLP classifier consists of multiple layers, where each layer performs a different function. A predictive model (Olalekan et al., 2020) consisting of two layers, ANN and Naïve Bayes, is proposed, and it is found that the MLP gave better prediction results compared to single models. A multi-layer ANN model (Yağci & Çevik, 2019) predicted successful students in different subjects with an accuracy of up to 99%. In several studies (Ramesh, 2013; Kaur et al., 2015; Yahaya et al., 2020) MLP gave better results (i.e., accuracy = 72.38% and 75%, respectively) compared to other classification models. A research study collected students' attributes from a school's database. These attributes are pre-processed, and a sparse auto-encoder algorithm is applied to extract influential factors from random features. The MLP model is then trained using influential factors only, which gave better prediction results compared to prediction with all random variables (Guo et al., 2015).

ANN prediction results depend on the number and type of input data on which the model is trained. One study shows that students' cognitive and non-cognitive variables have significant importance in their final exam results and its prediction (Lin, 2009). Data pre-processing is a significant step which helps to enhance machine learning algorithms' results. In one study, Synthetic Minority Over-Sampling (SMOTE) is applied to a students' dataset, and this pre-processing shows a significant increase in prediction accuracy, i.e., up to 7% (Jishan et al., 2015). E-learning has made it easier for teachers and institutions to record students' interactions, clicks, study time, durations, assignment submissions, learning habits, etc. On the other hand, traditional classrooms have limited information related to students' cognitive and non-cognitive attributes. A study proposed an ANN-based prediction model with a limited number of students' attributes and achieved 62.5% accuracy (Chanlekha & Niramitranon, 2018). Feature selection is used to find the correlation between students' academic features and results. A comparison of an ANN prediction model with highly correlated features against one with random features is performed (Hamoud & Humadi, 2019); the study shows that several students' features contribute little or nothing to predicting students' results. Another study examined the contribution of input features to the prediction of output classes and shows that students' attendance and study duration are the best input variables for an ANN-based students' results prediction model (Aydoğdu, 2020). PSO is applied before providing input values to the ANN back-propagation model, which increased the prediction accuracy and decreased the number of iterations (Sari & Sunyoto, 2019). A comparative analysis of different supervised learning algorithms and different students' attributes is performed (Tomasevic et al., 2020); the study shows that an ANN based on students' assessment marks and interaction with learning material is the best prediction model. Different data mining algorithms are compared, i.e., Decision Tree, Random Forest, Naïve Bayes, KNN, SVM, Logistic Regression, and Neural Network, and it is found that the Neural Network outperformed the other algorithms with the highest accuracy (Cavazos & Garza, 2017; Vijayalakshmi & Venkatachalapathy, 2019; Bravo et al., 2020; Makombe & Lall, 2020; Mengash, 2020; Waheed et al., 2020). Decision Tree, ANN, and Regression algorithms are compared, and it is found that ANN gave the best results (Mutanu & Machoka, 2019). In computer programming courses, students' assignment completion is found to be a significant factor that highly influences their semester results (Qu et al., 2019). A Deep Learning model gave an accuracy of 82.5% in predicting students' success in programming courses (Pereira et al., 2020). A neural network is implemented to explore the association between students' internet usage and their academic results. The study found that high achievers spent more time on the internet; however, they had lower download and upload volumes. It is concluded that students spend a lot of time using the internet, but their usage behavior is quite different and can be used to predict their success in exams (Xu et al., 2019). Different research studies have used different statistical software to perform predictive analysis. One study built an ANN prediction model using two different platforms, i.e., SPSS and MATLAB, and shows that the ANN model built in MATLAB gave higher prediction results than the one built in SPSS (Çevik & Tabaru-Örnek, 2020). Studies also show that CNN (Karimi et al., 2020; Zong et al., 2020), Deep Belief Network (Sokkhey & Okazaki, 2020a, 2020b, 2020c), Deep Learning (Amazona & Hernandez, 2019; Hussain et al., 2019), and LSTM (Su et al., 2018; Li, 2020; Liu et al., 2020) can give acceptable prediction results. In one study (Karlık & Karlık, 2020), different neural networks, namely ANN, MLP, DNN, and CNN, are used to build students' prediction models, and the comparison shows that a Fuzzy CNN model gave an accuracy of 92%. Another study showed that a Regression Neural Network gave better results compared to MLP (Iyanda, 2018). A hybrid algorithm based on RNN, gated recurrent unit (GRU), and LSTM is proposed; the experimental results show that the model's prediction results depend on the input parameters, however, the proposed model achieved better accuracy (i.e., up to 80%) compared to the individual models (He et al., 2020). Table 3 presents a summary of research studies implementing ANN for students' performance prediction.

Table 3 ANN for students' performance prediction

3.4 D. Support Vector Machine

Support Vector Machine (SVM) separates data belonging to different classes by finding a separating hyperplane. It plots data items in a 2- or 3-dimensional space and draws a hyperplane between the two classes; the items that fall on one side of the hyperplane are considered as belonging to one class. The data items nearest to the hyperplane are called support vectors, and a wider margin between the classes represents a better separation of the data. SVM was originally proposed for binary classification, but several algorithms built on SVM are used to solve multi-class problems (Sen et al., 2020). Being a weight-based method, SVM is used for classification as well as feature extraction. Figure 4 presents a binary classification model using SVM.

Fig. 4 Support Vector Machine (SVM)
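The brief sketch below trains a linear-kernel SVM from scikit-learn on synthetic pass/fail data to illustrate the hyperplane-based classification described above; the data and parameters are assumptions, not a reconstruction of any cited model.

```python
# Minimal sketch: linear SVM separating "pass" and "fail" students.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=3)  # synthetic students' attributes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=3)

scaler = StandardScaler().fit(X_train)          # SVMs are scale-sensitive
svm = SVC(kernel="linear", C=1.0)               # linear separating hyperplane
svm.fit(scaler.transform(X_train), y_train)

print(classification_report(y_test, svm.predict(scaler.transform(X_test)),
                            target_names=["fail", "pass"]))
```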

A Support Vector Machine classification model is used to predict engineering students' final exam scores based on their previous exam performance. The SVM model is compared with linear regression and a multilayer perceptron to find the best prediction model among these classifiers; the results show that SVM gave the best results, i.e., accuracy = 90.1% (Huang & Fang, 2013) and 50% (Mativo & Huang, 2014). A high number of students drop out of programming courses, which strongly affects their final GPA. Information Gain (IG) is used to explore different students' variables and to assign weights to highly correlated features. The most effective attributes are then used to develop an SVM prediction model, and the study found that students' mid-term exams are the best predictors of their final exam scores (Costa et al., 2017). A multi-level SVM-based prediction model is developed which classifies students into five levels on the basis of their GPA (Asogbon et al., 2016). Similarly, a three-level SVM classifier predicts 90% accurate results. The proposed models may help institutes place students into different sections and provide them with the required attention (Burman & Som, 2019). A three-step prediction model, covering students' entrance into college, after the first semester, and after the second semester, is developed. At the first step, students' attributes available at the time of admission are used for prediction, and students' academic results are added afterward. The proposed study is significant for predicting students' performance as early as possible (Gil et al., 2020). An ensemble model combines the results of different data mining techniques to make the prediction more accurate. Such a hybrid approach is used (Kamal & Ahuja, 2019) by combining the prediction results of Decision Tree, Naïve Bayes, and SVM; the study shows that the ensemble model achieved an accuracy of 98.5%. One study (Wu et al., 2019b) proposed a deep Neural Network prediction model based on CNN, Long Short-Term Memory (LSTM), and SVM models, and showed that the hybrid model predicts more accurately (i.e., F-measure = 95.03%) than linear SVM (i.e., F-measure = 92.48%). The prediction results of Deep Belief Network and SVM models are hybridized to decrease variance and enhance prediction results (Vora & Rajamani, 2019).

A comparative study is proposed to predict computer science graduating students' Grade Point Average (GPA), based on ANN, SVM, and extreme learning machines as prediction models. The study concluded that students' GPA in previous semesters is the best indicator of their success or failure in the final year exam, and the SVM model achieved the highest prediction results (accuracy = 97.98%), followed by extreme learning machines (accuracy = 94.92%) (Tekin, 2014). Different studies have compared SVM with other prediction models. Decision Tree (Naicker et al., 2020), Random Forest (Lottering et al., 2020), Logistic Regression (Aluko et al., 2016; Bhutto et al., 2020; Heuer & Breiter, 2018), Naïve Bayes (Soni et al., 2018; Fachrie, 2019), Random Forest and Neural Network (Solís et al., 2018; Ahmed et al., 2020b), KNN (Wiyono et al., 2020), and MLP (Zohair, 2019) are compared with SVM prediction models. In all the mentioned studies, SVM achieved better accuracy than the other prediction models when applied to different students' attributes at different education levels.

Educational datasets consist of large databases with many students' attributes and details. Not all attributes influence exam performance, and therefore not all students' attributes should be used in a prediction model. To select the most influential features, an ensemble feature selection technique has been used; the study shows that an SVM model with selected features gave better accuracy than prediction with random features (Lu & Yuan, 2018). In (Zaffar et al., 2020), correlation-based filtering is used to select the most significant features for the prediction process, and the feature-based SVM model achieved an F-measure of 90%. Principal Component Analysis (PCA) is used to explore the correlation between students' social activities and their scores in English; the prediction model (i.e., SVM) shows that finding the correlation between students' attributes increases prediction results (Zhao et al., 2020b). The Open University Learning Analytics (OULA) dataset is one of the most frequently used datasets in educational research. It consists of students' demographic data, numbers of clicks, and assessment marks. This dataset is used to build an SVM prediction model which forecasts 93.5% accurate results (Chui et al., 2020). A prediction model may achieve different results when operated on different input features; therefore, the students' factors that influence academic performance play a major role in prediction. A study examined a students' MOOC dataset and found that students' performance in semester exercises is the best predictor, followed by their clicks and interaction with learning material (Moreno-Marcos et al., 2020). Another study showed that using all students' data sources, e.g., survey data, academics, and interaction with learning resources, does not provide the most accurate results; it is suggested to combine only significant features for students' academic results prediction. The three studies mentioned above also conclude that SVM is the best prediction model for students' academic results (Yu et al., 2020). Table 4 presents a summary of research studies implementing SVM for students' performance prediction.

Table 4 SVM for students’ performance prediction

3.5 E. K-Nearest Neighbors

K-Nearest Neighbors (KNN) is a similarity-based approach. It stores data items based on their attributes and assumes that items with similar attributes most probably belong to the same class, where "K" is the number of nearest neighbors selected to predict the class of an unknown object. When a new, unknown data item is to be placed in a class, the k nearest neighbors are selected based on the shortest distance between the new item and its neighbors. The new item is given the label of the class that holds the majority of those nearest neighbors (Sen et al., 2020). Figure 5 presents a KNN prediction model with 3 nearest neighbors.

Fig. 5 K-Nearest Neighbors (KNN)
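To make the neighbor-voting idea concrete, the sketch below fits scikit-learn's KNeighborsClassifier with k = 3 on synthetic pass/fail data, mirroring the 3-neighbor example in Figure 5; the data and the value of k are illustrative assumptions.

```python
# Minimal sketch: KNN with k = 3 for pass/fail prediction.
# A new student is assigned the majority class of the 3 most similar students.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, n_features=5, n_informative=3,
                           random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=5)

knn = KNeighborsClassifier(n_neighbors=3)   # Euclidean distance by default
knn.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))
```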

Five data mining algorithms, namely Naïve Bayes, Decision Tree, KNN, C4.5, and SVM, are compared to generate the best prediction model for students' exam performance, and KNN outperformed the other classification models with the highest accuracy of 100% (Vital et al., 2021). A study comparing four different data mining algorithms for students' academic performance found that KNN gave the best results compared to the other prediction models (Kulkarni & Ade, 2014). Students' learning behaviors are used in a KNN-based prediction model, and it is found that students' clickstream data is very useful for predicting their results (Brinton & Chiang, 2015). A KNN algorithm with fixed and random numbers of 'k' is applied with ensemble clustering techniques; students' demographics, enrollment, and performance records are used to predict the final exam outcome (Iam-On & Boongoen, 2017). A fast KNN algorithm is proposed to decrease the model's processing time without compromising prediction accuracy. The proposed model gave better accuracy, i.e., 96.6%, compared to the traditional KNN model, and also decreased processing time by up to 90% (Ahmed et al., 2020c). KNN is used to predict results in different courses and gained 73.33% accurate results (Aluko et al., 2016). Another study showed that feature selection methods improve KNN prediction results (Ahmed et al., 2020a). Table 5 presents a summary of research studies implementing KNN for students' performance prediction.

Table 5 KNN for students' performance prediction

3.6 F. Random Forest

Random Forest (RF) is an ensemble machine learning algorithm comprised of multiple Decision Trees. Each Decision Tree is built using a different set of training samples and features, and the predictions of all Decision Trees are then merged to achieve a more accurate result: the class with the majority of votes is selected as the predicted class (Bruce, 2019). Figure 6 describes the working of the Random Forest classification model.

Fig. 6 Random Forest (RF)
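The sketch below shows a Random Forest of this kind built with scikit-learn, including the per-feature importance scores that several of the studies below use to rank students' attributes; the data, tree count, and attribute names are illustrative assumptions.

```python
# Minimal sketch: Random Forest for pass/fail prediction plus attribute ranking.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           random_state=11)   # synthetic students' attributes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                     random_state=11)

rf = RandomForestClassifier(n_estimators=200, random_state=11)
rf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, rf.predict(X_test)))
# Importance scores indicate how much each attribute contributed to the votes
for i, score in enumerate(rf.feature_importances_):
    print(f"attribute_{i}: {score:.3f}")
```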

A research study focuses on the development of a Random Forest based prediction model for students' learning outcomes. Students' interactions with an e-learning management system, e.g., visits, resource views, assignment submissions, and scores, are used to identify key attributes with the aim of achieving the highest prediction accuracy; Random Forest gave 76.9% accurate results (Abubakar & Ahmad, 2017). A two-step prediction process is performed: in the first step, classification algorithms are compared to predict students' dropout, and in the second, students' grades are predicted using regression analysis. For classification, Random Forest gave the highest accuracy, i.e., 82% (Rovira et al., 2017). Another two-step model is proposed for students' prediction: first, Random Forest is used to assign weights to each attribute based on its contribution to final grade prediction, and the most correlated attributes are then used in the prediction model, achieving 96.1% accuracy (Miguéis et al., 2018). Fourteen data mining algorithms are compared to find the best prediction model based on students' demographic and academic attributes; the study found that Random Forest gave the highest accuracy, i.e., 93% (Senthil & Lin, 2017). An automatic prediction model was built using clustering and classification models for predicting students' promotion to the next class. A dataset consisting of 151 attributes was collected from different universities and colleges. The most relevant attributes were selected using k-means clustering, and an ensemble voting-based technique was used to predict students' outcomes in the final examination. Results show 87.5% accuracy when classifying the raw data, while 96.78% was achieved after selecting the most relevant attributes (Thakar & Mehta, 2017). Random Forest is used to find the most influential factors for students' exam performance prediction; the study found that students' CGPA in previous semesters and interaction with learning resources are the best predictors of students' final results (Sandoval et al., 2018). Another research study combined Relief-F and Random Forest models for selecting the most significant attributes for students' final exam score prediction. The proposed model gave 97.88% prediction accuracy and shows that students' attendance, extra-curricular activities, previous grades, and parents' education highly influence students' exam results and can therefore be used for exam performance prediction (Deepika & Sathyanarayana, 2019). A research study performed a comparison of four data mining algorithms, namely Decision Tree, PART, Random Forest, and Bayesian Network, with Gain Ratio, Information Gain, and Relief-F used to select the most significant features; Random Forest with twelve students' attributes gave the best results, i.e., 99% accuracy (Hussain et al., 2018). Another study used Random Forest to classify students into dropout and non-dropout classes and obtained an F1-score of 60% (Polyzou & Karypis, 2018).

The above studies proposed prediction models based on a number of attributes; however, some e-learning institutes have very limited attributes available for prediction. Random Forest is used for prediction based on lecture views, resource access, and assessment scores. The proposed model gave 84% accuracy and showed that students' LMS interactions and grades can be used for performance prediction (Wakelam et al., 2020). Three data mining algorithms are compared for classifying students into pass/fail segments; Random Forest outperformed KNN and Naïve Bayes with 95.45% accuracy (Lenin & Chandrasekaran, 2019). Similarly, another study compared six data mining techniques and found that Random Forest gave the highest results, i.e., 94.1% (Rifat et al., 2019). To predict students' results early, features that are highly correlated with results are extracted and results are predicted using Random Forest. These studies show that not all attributes contribute to the prediction process, and using irrelevant attributes may negatively affect prediction results (Masood et al., 2019; Nuankaew & Thongkam, 2020). Different machine learning algorithms are used for students' final grade prediction; Random Forest gave the best results compared to the other algorithms when applied with students' CGPA, attendance, and extra-curricular activities (Singhani et al., 2019). The earlier students' results are predicted, the more chance there is to prevent their dropout. A Random Forest based phased prediction model is developed to predict students' dropout at different stages of the semester. The proposed study shows that students' results can be predicted using demographic data at the start of the semester, while in the middle of the semester the predicted results become more accurate by adding students' learning behaviors (Chen et al., 2020).

To check the influence of different features on students' dropout prediction, a Random Forest model is used to check the correlation of all features with the results. The proposed study shows that students' gender, age, and region are negatively correlated with the prediction model, and their removal increased the accuracy of the model (Bruce, 2019). Four data mining prediction techniques, namely Naïve Bayes, Logistic Regression, KNN, and Random Forest, are compared, and the study found that Random Forest gave the best results; in this study, the dataset was divided into 10 weeks such that attributes are added to the dataset according to their availability. A comparative study is conducted to find the best prediction model among statistical techniques, Deep Belief Learning, and Machine Learning algorithms. The study found that, overall, Machine Learning techniques gave the best results, and among ML techniques Random Forest outperformed with an accuracy of 93.5% (Sokkhey & Okazaki, 2019). A research study is conducted to predict students' results without knowing their previous grades. It shows that students' grades have a great influence on results prediction, but their demographic attributes can give enough accuracy to help prevent students' dropout (Rajak et al., 2020). Ensemble-based techniques are used for the prediction of at-risk students, and it is found that Random Forest outperforms the other techniques (Kaviyarasi & Balasubramanian, 2020). Random Forest was applied to a dataset gathered from the Open Universities of China, including students' demographics, behavioral data, and academic performance. Before prediction, regression analysis was performed to find the correlation between students' attributes and final exam grades. Results show that students' views, clicks, and learning activity duration are the best predictors of their future grades (Narayanasamy & Elçi, 2020). Different data mining algorithms, namely Naïve Bayes, ANN (Adekitan & Salau, 2020), Logistic Regression (Alhassan, 2020), Decision Tree (Farissi & Dahlan, 2020), SVM (Sokkhey & Okazaki, 2020a, 2020b, 2020c), and KNN (Rincón-Flores et al., 2020; Sokkhey & Okazaki, 2020a, 2020b, 2020c), are compared with Random Forest, and it is found that Random Forest gave the best prediction results. Table 6 presents a summary of research studies implementing Random Forest for students' performance prediction.

Table 6 Random forest for students’ performance prediction

4 Students’ attributes used for academic performance prediction

The same data mining techniques gave different prediction results when used in different research papers, as shown in the above tables. This difference in results is observed because each author used a different set of students' attributes as input. The major challenge in developing a prediction classifier is the type of input data: some students' attributes have more impact on prediction results than others. Some of the datasets used in the literature are based on distance learning, while others come from traditional classrooms. The use of different students' attributes makes the same algorithm give different results in terms of accuracy. Some of the commonly used attributes are students' grades achieved in a previous exam, attendance in the same course, gender, age, place of residence, family, social activities, learning behaviors, and interaction with learning resources. An analysis of the research studies shows that students' marks are the most frequently used attribute, as they show the academic potential of a student. A study on students' progress patterns shows that students who do well in a midterm exam are most likely to show good results in their final exam too; similarly, students who gain fewer marks at the start of a degree do not show much progress in their results until the end of the degree program, and most students tend to remain in the same category (Asif et al., 2017). Students' attendance is the second most used attribute for students' performance prediction: students who attend more lectures have a better chance of passing their examinations (Hughes & Dobbins, 2015). Students' demographic attributes also strongly affect their academic performance.

In one research study, state-of-the-art regression algorithms are applied to predict students' exam performance. A total of 354 graduate and postgraduate students' records are collected from the Hellenic Open University database. Along with students' previous academic performance, demographic records including age, gender, marital status, jobs, and number of children are considered for the prediction process. A statistical measure, i.e., Relief-F, is used to find the most influential features in the dataset. After feature ranking, M5rules gave the minimum error rate (Kotsiantis & Pintelas, 2005). A students' performance model is proposed with tenfold cross-validation. Four Bayesian algorithms, namely Naïve Bayes, AODEsr, WAODE, and HNB, are compared; the study concluded that AODEsr outperformed the others, with 64.6% accurate results, when applied with students' academic performance and co-curricular activities (Sundar, 2013). The Matrix Factorization algorithm applied in (Sweeney et al., 2015) gave the best results for students' next-term grade prediction, i.e., RMSE = 0.775. Three classification algorithms are compared to find the best prediction model for final exam grades; the results show that the rule-based model is the best prediction model and that students' demographics and learning behaviors can be used to generate prediction results (Ahmad et al., 2015). Another research study applied a deep neural network to predict students' final exam grades. A MOOC dataset consisting of students' interactions with learning materials and activity logs is used for building the prediction model, and deep learning shows the best results when compared to the baseline algorithms (Wang et al., 2017). Five ML algorithms, namely a generalized linear model, MLP, Random Forest, gradient boosting tree, and ANN, are compared to find the best prediction model for students' exam scores. A dataset consisting of students' assessment scores was collected from DIT University, Dehradun; the highest accuracy of 98.26% is achieved by the gradient boosting model (Kumar & Garg, 2019). A hybrid of seven classification algorithms, namely SVM, KNN, Decision Tree, AdaBoost, MLP, Extra Tree, and Logistic Regression, is used to predict students' scores using their institutional dataset attributes. The proposed weighted voting approach shows better results, i.e., 81.37%, than the individual algorithms (Zulfiker et al., 2020).

An ensemble algorithm is proposed comprising WINNOW, 1NN, and Naïve Bayes. The proposed ensemble model receives input features and predicts the outcome based on a majority vote. A Hellenic Open University dataset consisting of students' assignment scores is used to classify students into pass and fail; the proposed ensemble model gave the best accuracy, i.e., 78.95%, compared to the individual supervised learning models (Kotsiantis et al., 2010). Another ensemble algorithm is proposed based on three state-of-the-art classifiers, namely J48, IBK, and AODE. A majority vote is obtained by implementing the three algorithms in a single model, and it is found that the ensemble approach gave 85% accurate results; the study used a combined dataset of academic and demographic attributes (Pandey & Taruna, 2018). An ensemble approach based on stacked generalization is used to predict students' performance based on demographic, psychological, personality, and institutional attributes (Adejo & Connolly, 2018); the study found that a hybrid of Decision Tree, ANN, and SVM gave better accuracy than the individual results. Feature optimization using a genetic algorithm applied with supervised learning algorithms gave 75.55% accuracy (Pereira et al., 2019). In (Mi et al., 2018), a Genetic Algorithm is used to develop an early warning system for students; the main significance of the proposed model is that it gives a clear description of the prediction process by using if–then rules. Similarly, another research study used genetic programming for students' success prediction in online courses. Pearson's correlation is used to find the attributes that contribute most to final grade prediction, and it is found that students' scores are the most significant features, with a correlation coefficient of r = 0.78 (Ulloa-Cazarez et al., 2018). These studies show that feature selection algorithms enhance prediction accuracy and reduce model computational time.
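A majority-vote ensemble of the kind described above can be sketched with scikit-learn's VotingClassifier; here the base learners are a Decision Tree, Gaussian Naïve Bayes, and KNN on synthetic data (the cited studies use different base learners such as WINNOW, 1NN, J48, IBK, and AODE, so this is only an illustrative stand-in).

```python
# Minimal sketch: majority-vote ensemble of three base classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=2)

ensemble = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(max_depth=4, random_state=2)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="hard")  # "hard" = majority vote over predicted class labels

print("Ensemble CV accuracy: %.3f" % cross_val_score(ensemble, X, y, cv=5).mean())
```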

In (Abu Tair & El-Halees, 2012), association rule, classification, and clustering algorithms are applied to students' demographic and academic datasets to predict students' dropout from a college degree. An accuracy of 78.95% is obtained by the proposed association and classification rules, and the model shows that students' gender, specialty, and scores in secondary school are highly correlated with their semester results. Another study proposed a Neuro-Fuzzy based classification model; with threefold cross-validation it obtained the best results, i.e., RMSE = 0.256, and showed that students' intelligence, motivation, and interest in studies can be used to predict their final exam performance (Hidayah et al., 2013). In (Márquez-Vera et al., 2016), a Classification Rule Mining algorithm is proposed to predict students who are more prone to dropout. If–then rules give a detailed view of the attributes that lead to the final prediction, and the proposed model shows that students' final grades can be predicted in the first 6 weeks after registration. If–then rules are also generated for early prediction of students' results; to enhance prediction accuracy, rough set theory is used for dimensionality reduction, which shows 79.23% accuracy (Sudha & Kumaravel, 2017). In (Czibula et al., 2019), Relational Association Rules are used for predicting students' grades in the final semester. Data were collected from Babes-Bolyai University, consisting of students' GPAs in the first three semesters, and the proposed model gave the best results, i.e., F-measure = 0.84. The Apriori algorithm is used (Anwar & Rani, 2020) to classify students into dropout and non-dropout classes for their future results. Students' previous exam scores in Mathematics are used as predictors of future Mathematics scores; the findings reveal that students with higher scores in prerequisite classes are more likely to perform better in subsequent classes. The M5 Rules algorithm is implemented (Chand et al., 2020) for grade prediction; using the scores of different subjects, M5rules gained the highest accuracy, i.e., 89.2%, compared to Random Forest and Linear Regression.

Different student attributes become available at different phases. Two datasets consisting of different attributes are used: the first contains demographics and previous class performance factors available at the beginning of the session, while the second includes demographics as well as students’ assessment scores, attendance and subjects. The research concluded that students’ neighbourhood, age, assessment scores and attendance are highly correlated with final exam grades (Fernandes et al., 2019). Students’ behavioural attributes, e.g. orderliness inside the institute, are found to be the most significant attributes for predicting academic performance (Cao et al., 2018). In another study, students’ social media activities are examined to find the impact of academic and non-academic social media use on final exam scores; the findings reveal that social media activity can be used to predict final exam performance (Chang et al., 2019). Learning strategies and motivation are found to be the most significant attributes for CGPA prediction, with correlations of 0.243 and 0.193 respectively (Nabizadeh et al., 2019). Students’ response time is found to be a good predictor of scores, as a short response time indicates knowledge of and attention to the lecture; an Additive Factors Analysis approach predicts students’ results with 87.8% accuracy (Chounta & Carvalho, 2019). In online courses, students’ interactions with learning resources are found to be significant variables for predicting academic performance. More than 2000 websites frequently visited by students are considered, and the results show that websites containing videos, games and music are negatively correlated with academic performance, whereas visiting learning-oriented websites is positively correlated (Wu et al., 2019a). Similarly, in another study, students’ interactions are used to predict final exam scores. The OULAD dataset provides details of learning resources and the total clicks performed by students during their course; a Long Short-Term Memory algorithm achieves 59% precision in the first week and 93% precision in the last week of the course (Aljohani et al., 2019). Canonical Correlation Analysis is used to explore the relation between different learning resources, and the study found that students’ performance in one type of learning resource can be used to predict their performance in other types (Sahebi & Brusilovsky, 2018). Another study aimed to explore potential attributes for students’ performance prediction: out of 45 attributes, students’ previous grades, attention in class, study room and extra-curricular activities have a positive impact, while access to a mobile phone, alcohol consumption and longer travel time to school have a negative impact on academic performance (Kaviyarasi & Balasubramanian, 2018). A logit leaf model is implemented for performance prediction in online courses, using over 10,554 student records comprising learning patterns and activities; the study reveals that academic engagement is the best predictor of academic performance (Coussement et al., 2020). Students’ final grades are predicted using data from 2500 students registered in different courses, where a Rule Induction classifier gave 96.25% accuracy (Majeed & Junejo, 2016).
An Input–Output Hidden Markov Model is proposed to predict students’ performance from their weekly activities in an online learning environment; the model gave 82% accuracy in the second week and 84% in the last week of the course (Mubarak et al., 2020). The above studies show that students’ academic records, demographics and learning behaviour are the best predictors, and several of them demonstrate that data preprocessing and feature selection techniques enhance the prediction results.
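
Because correlation with the final grade recurs as a screening step in these studies, the sketch below illustrates a simple Pearson-correlation ranking of candidate attributes with pandas; the column names and synthetic data are assumptions for illustration only.

```python
# Minimal sketch of correlation-based screening of candidate attributes against
# a final grade, in the spirit of the studies above; all column names and the
# synthetic data are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "previous_gpa":   rng.normal(3.0, 0.5, n),
    "attendance_pct": rng.uniform(50, 100, n),
    "clicks_in_lms":  rng.poisson(120, n),
})
# Synthetic final grade loosely driven by previous GPA and attendance.
df["final_grade"] = 20 * df["previous_gpa"] + 0.3 * df["attendance_pct"] + rng.normal(0, 5, n)

# Pearson correlation of every candidate attribute with the final grade.
correlations = df.corr()["final_grade"].drop("final_grade")
print(correlations.sort_values(ascending=False))
```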

Different datasets are used in the research studies: most researchers gather data from school, college and university databases or LMS systems, or conduct surveys to collect students’ responses, while several studies use publicly available datasets. Four publicly available datasets appear most often in the research papers on students’ performance prediction: OULAD, MOOCs, Moodle and the UCI Repository dataset. The Open University Learning Analytics Dataset (OULAD) contains data on 22 courses and 32,593 students; students’ demographic attributes, click counts and assessment results are available together with their final exam results. Massive Open Online Courses (MOOCs) offer opportunities for distance learning. The Modular Object-Oriented Dynamic Learning Environment (Moodle) is another learning management system used for online courses, and students’ LMS data from it is used in many studies to predict performance and learning behaviour. The UCI Machine Learning Repository offers a dataset for students’ performance prediction containing 23 attributes, including demographic, social and academic records; this dataset of 649 students from two secondary schools is used in different studies for final exam grade prediction. Table 7 presents a summary of the students’ attributes used for exam performance prediction.
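
As a rough sketch of how such a dataset can be assembled into a per-student feature table, the snippet below aggregates OULAD-style click logs with pandas; the file and column names follow the publicly documented OULAD layout (studentInfo.csv, studentVle.csv, sum_click, final_result) and should be verified against the downloaded copy of the dataset.

```python
# Rough sketch of building a per-student feature table from OULAD-style CSVs;
# file and column names are taken from the public OULAD documentation and
# should be checked against the actual download.
import pandas as pd

info = pd.read_csv("studentInfo.csv")   # demographics and final_result
vle = pd.read_csv("studentVle.csv")     # one row per student/material/day of clicks

# Total clicks per student per course presentation.
clicks = (
    vle.groupby(["code_module", "code_presentation", "id_student"])["sum_click"]
    .sum()
    .reset_index(name="total_clicks")
)

features = info.merge(clicks, on=["code_module", "code_presentation", "id_student"], how="left")
features["total_clicks"] = features["total_clicks"].fillna(0)
print(features[["id_student", "total_clicks", "final_result"]].head())
```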

Table 7 Students’ attributes used for exam performance prediction

5 Tools used for data mining

This section presents a comparison of the data mining tools used for students’ academic performance prediction. The research studies presented above used different data mining tools for the prediction process, and a wide range of tools is available for building prediction models with machine learning. These tools make it easy to perform prediction tasks, data analysis, feature selection, data cleansing, and the building of classification and regression models. This section describes the data mining tools used in the literature for predicting students’ academic achievement: RapidMiner, WEKA, MATLAB, and Python.

5.1 RapidMiner

RapidMiner provides a user-friendly interface for building prediction models. It is easy to use because it offers a graphical, code-free environment in which prediction models are built by drag-and-drop operations, and a wide range of classification and regression models is available. To build a model, the user imports data into RapidMiner Studio, sets parameters and drags the required model onto the design screen; the resulting model appears in the results section. RapidMiner also supports statistical analysis of results to evaluate model accuracy, offers visualizations that provide graphical representations of results, and provides step-by-step tutorials that are helpful for beginners (Osmanbegovic & Suljic, 2012).

5.2 WEKA

WEKA stands for Waikato Environment for Knowledge Analysis and is another data mining platform for building prediction models. WEKA provides both a graphical user interface and a command-line interface for implementing data mining algorithms, and it allows users either to use its built-in operators or to write their own Java code. It can be used for classification, clustering, feature selection, data preprocessing and regression problems. It is open-source, freely available software, which contributes to its large number of users (Shahiri & Husain, 2015).

5.3 MATLAB

MATLAB stands for "MATrix LABoratory" and is developed and sold by MathWorks, Inc. MATLAB is also used for data science problems and allows the implementation of data mining algorithms for classification and prediction. It reduces data preprocessing time, filters noisy data, plots data into graphs so that users can visualize data patterns, and supports building data mining models. It also provides analysis features to evaluate model results (Tomasevic et al., 2020).

5.4 Python

Python is a programming language widely used for implementing machine learning algorithms. It is open source and freely available, including for commercial use. Different libraries are available in Python, e.g., Pandas for data preparation, Scikit-learn for machine learning, Plotly for data visualization, and Theano for mathematical expressions (Stančin & Jović, 2019).
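
A minimal sketch of the kind of pipeline these libraries enable is shown below, using pandas for data loading and scikit-learn for a Decision Tree classifier; the CSV path and column names are placeholders, not taken from any specific study.

```python
# Minimal sketch of a student-performance prediction pipeline with pandas and
# scikit-learn; "students.csv" and the column names are placeholder assumptions.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("students.csv")            # placeholder path
X = data[["previous_gpa", "attendance_pct"]]  # placeholder feature columns
y = data["final_result"]                      # e.g., "pass" / "fail"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```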

Some research studies used other data mining tools, e.g., SPSS (Moseley & Mead, 2008), KNIME (Adebayo & Chaubey, 2019; Rifat et al., 2019), R Studio (Kumar et al., 2019; Sukhbaatar et al., 2019; Lottering et al., 2020; Olalekan et al., 2020) and R programming (Akçapınar et al., 2019; Figueroa-Cañas & Sancho-Vinuesa, 2020; Lenin & Chandrasekaran, 2019; Vijayalakshmi & Venkatachalapathy, 2019). Table 8 presents a summary of the four data mining tools most frequently used for students’ performance prediction.

Table 8 Data mining tools used for students’ performance prediction

6 Results and discussion

This section presents an overview of the research findings. Figure 7 shows that students’ performance prediction has attracted great interest in the present decade. Educational data mining is a relatively new research domain but is growing rapidly because of the benefits institutions gain from it. Figure 7 also shows that Decision Tree has been the most used algorithm over the last ten years, while ANN, SVM and Random Forest have been trending in the past three years. The figures below make clear that work on students’ exam performance prediction is growing rapidly year by year, although different studies use different techniques to improve the prediction results. Figure 8 gives an overview of the data mining techniques most frequently used for predicting students’ final exam performance in recent years: the most used techniques are Decision Tree and ANN, while the least used is KNN. Decision Tree is very simple to use because of its simple hierarchical flow, and is therefore used for students’ classification more often than other data mining techniques. In Fig. 9, the four data mining tools used for students’ exam performance prediction are reviewed. New tools are emerging rapidly, but the most used tools remain MATLAB, WEKA, RapidMiner and Python. WEKA is found to be the most frequently used tool in the present decade, followed by Python. WEKA is freely available under a public license, whereas RapidMiner and MATLAB require a purchased license. WEKA is also easy to use, as it offers both a graphical user interface and Java code integration.

Fig. 7 Trending data mining algorithms used for students’ performance prediction

Fig. 8 Frequently used data mining algorithms in the last decade

Fig. 9 Data mining tools used for students’ performance prediction

The above figures show that different algorithms can be used to predict students’ results. The reviewed studies used different students’ attributes as input to their prediction models; the most used attributes are demographics, attendance, academic results, students’ clicks/views, students’ personality, psychological factors, and social behaviour or activities. Figure 10 shows that students’ academic records and demographic factors have proved to be the best attributes in previous research. This survey also reviews the feature selection techniques used to select the most influential features. Figure 11 shows that more than half of the studies used feature selection methods before building prediction models, since irrelevant features in the dataset may reduce prediction accuracy and increase model processing time. Figure 12 shows that feature selection methods have been trending strongly in the past three years. Several feature selection methods appear in previous research; however, two techniques are most widely used: Information Gain and Gain Ratio.
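
Information Gain corresponds to the mutual information between a feature and the class label. The hedged sketch below ranks features with scikit-learn’s mutual information estimator (Gain Ratio has no direct scikit-learn equivalent and would need to be computed separately); the data are synthetic and purely illustrative.

```python
# Sketch of Information-Gain-style feature ranking via mutual information;
# synthetic data, not from any cited study. Gain Ratio would additionally
# divide by the entropy of the feature itself and is not shown here.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=500, n_features=8, n_informative=3, random_state=0)

scores = mutual_info_classif(X, y, random_state=0)
for i, s in enumerate(scores):
    print(f"feature_{i}: information gain ~ {s:.3f}")

# Keep only the top-3 features before training a prediction model.
X_selected = SelectKBest(mutual_info_classif, k=3).fit_transform(X, y)
print("Selected feature matrix shape:", X_selected.shape)
```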

Fig. 10 Mostly used students’ attributes

Fig. 11 Feature selection techniques

Fig. 12 Yearly trending feature selection techniques

7 Conclusion

Educational data mining has grown rapidly as it helps institutions as well as policy makers in decision making. One of its most important research areas is predicting students’ future results based on their previous performance and demographics. Predicting exam results before the final examination helps teachers identify students who are at risk of failure, so that they can be given extra assistance and time, and action plans can be implemented to prevent or reduce dropout. This paper presents a summary of the research conducted to predict students’ performance using different data mining techniques. It reviews the last twenty years of work in order to compare the data mining techniques used for prediction and to evaluate the students’ attributes used as inputs. The most used technique is Decision Tree; however, the data mining techniques gave different results because the output of a prediction model depends on the input data, i.e., the students’ attributes. Mainly five types of students’ attributes are used in the literature: students’ marks, attendance, learning behaviours, social activities and demographic data. Students’ marks or GPA is found to be the most used input type and the one that gave the best prediction results. The study also examined the data mining tools used for implementing the algorithms: several tools are available, but four are used most frequently, namely WEKA, MATLAB, Python, and RapidMiner. Several datasets are used in these research studies; the four most used are OULAD, MOOCs, Moodle and the UCI Repository dataset.

The study shows that different evaluation measures are used, including correlation, accuracy, F-measure, precision and recall. Most studies aim to classify students into binary classes (pass/fail) or multiple classes (grades), while a few predict students’ final marks or CGPA using regression techniques. The review also concludes that students’ performance can be predicted at different stages, e.g., at the time of admission, at the start of the semester, and before the final examinations; however, prediction in the last two weeks of the semester is shown to be more accurate, as more academic features are available at that stage. Feature selection methods have been trending in the past three years, and several studies show that using only relevant features increases prediction accuracy. This review will be beneficial for future research on predicting students’ results and will help institutions pick the best classifier based on their students’ data. It will also help academic policy makers and administrations use the available students’ attributes and tools to improve their institutions’ results, and the findings will help future research focus only on highly influential attributes.
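
For reference, the standard definitions of these classification metrics, stated in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), are given below; these are textbook formulas rather than formulas taken from any particular cited study.

```latex
% Standard definitions of the evaluation metrics named above; textbook formulas,
% not specific to any cited study.
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F-measure} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```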

8 Limitations and future work

This study set out to provide a systematic review of the research conducted on predicting students’ academic performance. The number of research studies and algorithms explored is necessarily limited, since every method cannot be covered in a single survey. Nevertheless, the survey provides a clear insight into the most effective and most widely used data mining algorithms, tools and students’ attributes.

For future work, it is recommended that universities and online educational institutes use data mining for students’ performance prediction and design action plans to prevent student dropout and increase course completion rates. Exploring students’ psychological factors, teaching and learning methods, institutes’ physical facilities, and their impact on students’ academic results remains an open research area in EDM.

Acknowledgement: This research was partly supported by the Technology Development Program of MSS [No. S3033853] and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A4A1031509).