
1 Introduction

The use of statistics, learning algorithms, and data mining methodologies is the primary emphasis of research in the field of educational data mining (EDM). The importance of data mining technology in the educational setting has grown over the last few decades, and it has risen to great prominence in recent years as a result of the accessibility of open datasets and learning algorithms [1]. EDM entails the creation and application of data mining techniques that interpret the substantial amounts of data generated at various educational levels. Anticipating the learning process and evaluating student success are important objectives in the study of EDM [2]. It is a field that discovers underlying relationships and trends in educational data. Heterogeneous data contribute to the big data paradigm in the education sector, so specialized data mining techniques are required to adaptively extract relevant information from educational datasets [3]. Data mining methods have been applied in many educational domains, including learning outcomes, dropout prediction, educational data analysis, and academic and behavioral analysis [4]. EDM has always placed a premium on assessing and forecasting students' academic success. Higher education institutions must examine students not only on the basis of their test results, but also on how they learn, project how they will perform academically in the future, and issue timely academic warnings. This work will assist students in raising their performance, which will improve the management of educational resources while also helping higher education institutions raise the quality of instruction [5].

The challenge of interpreting and making judgments from such an enormous amount of information is growing progressively more onerous. Dimensionality is one of the primary challenges, although it can be addressed by employing dimensionality reduction techniques. Dimensionality reduction refers to the process of converting high-dimensional data into a meaningful lower-dimensional representation. Among the many dimensionality reduction approaches that have been developed, principal component analysis (PCA) [6] and linear discriminant analysis (LDA) [7] are two well-known techniques that have been extensively employed in various classification applications. Because LDA employs label information, it can produce better classification results than PCA, which is unsupervised. This study applies the PCA and LDA algorithms for dimensionality reduction and systematically evaluates their efficiency and effectiveness [8]. The work focuses on evaluating students' academic achievement and predicting future success based on current performance. To reduce the dataset's dimensionality, this study adopts PCA and LDA, with logistic regression as the classifier.

Section 2 offers an analysis of previous works created by other researchers in the field of academic prediction. Section 3 discusses the experimental methods. The experimental results are described and discussed in Sects. 4 and 5. The conclusion and prospective future approaches are identified in Sect. 6.

2 Related Study

Academic performance prediction has been one of the key goals of academic practitioners. Research has shown that effective procedures for academic prediction can be created using computational methods such as data mining. Accordingly, numerous researchers have developed a variety of prediction models incorporating data mining.

Karthikeyan et al. [9] developed a novel method known as a hybrid educational data mining framework to evaluate academic achievement and effectively enhance the educational experience. Crivei et al. [10] examined the applicability of unsupervised machine learning methods, particularly PCA and association rule mining, to assess student academic performance. According to Javier et al. [11], EDM incorporates data mining techniques with educational data; their work lists the well-known data mining methods, including correlation mining, factor analysis, and regression. Zuva et al. [12] provided a model which compares four classifiers in order to identify the best method for forecasting a learner's performance.

A key objective of this research is to improve on current prediction algorithms in light of the need for an efficient prediction method. As a result, a model must be proposed to improve the classification process.

3 Methodology

In this research work, the methodology integrates the benefits of dimensionality reduction and classification. PCA and LDA are utilized to lower the dimensionality, and their performance is compared. PCA helps to eliminate features that are not essential to the model's goals, which reduces training time and expense and improves model performance [13]. LDA transforms high-dimensional data into a low-dimensional representation by increasing the between-class scatter and decreasing the within-class scatter. After dimensionality reduction, logistic regression is employed for supervised classification of the dataset. Figure 1 depicts the implemented methodology.

Fig. 1 Model implementation (flow chart: data acquisition from the dataset, data normalization by preprocessing and standardization, dimensionality reduction by PCA and LDA, classification and prediction by the logistic regression model, and model validation)

3.1 Dataset Description

The student dataset from the UCI machine learning repository is used for this work. The dataset has 400 instances, with one target class and a total of 30 attributes, and contains 266 positive and 130 negative instances. The dataset's attributes are outlined below; an illustrative loading sketch follows the list.

  • Mother's Education

  • Father's Education

  • Home to School Travel Time

  • Weekly Study Time

  • Number of Past Class Failures

  • Free Time After School

  • Current Health Status

  • Number of School Absences

  • First Period Grade

  • Second Period Grade

  • Final Grade.
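As a rough illustration (not part of the original experimental setup), the sketch below loads the UCI student performance data and derives a binary pass/fail target. The file name, separator, the column name G3 for the final grade, and the pass threshold of 10 are assumptions.

```python
import pandas as pd

# Load the UCI Student Performance data. The file name and separator are
# assumptions; the repository distributes the data as semicolon-separated CSV.
df = pd.read_csv("student-mat.csv", sep=";")

# Illustrative binary pass/fail target: the paper does not state its exact rule,
# so treating a final grade (G3) of 10 or above as "pass" is only an assumption.
df["pass"] = (df["G3"] >= 10).astype(int)

X = df.drop(columns=["G3", "pass"])   # predictor attributes
y = df["pass"]                        # binary target
print(X.shape, y.value_counts().to_dict())
```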

3.2 Data Preprocessing

Because of their enormous volumes and their likely origin from diverse sources, today's real-world databases are especially prone to noisy, missing, and inconsistent data [14]. In the data mining process, data quality is crucial, since poor data can produce erroneous predictions [15]. The overarching goal of data preprocessing is to eliminate undesirable variability or effects for effective modeling [16]. As part of preprocessing, the existing data elements are normalized so that they are scaled to a common range; this increases speed and reduces complexity. The dataset V is normalized using the Z-score method to create a normalized value V′ with the following equation:

$$V^{\prime} = \frac{V - Y}{Z}$$
(1)
where V′ is the normalized value, V is the original value, Y is the mean, and Z is the standard deviation (SD).
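A minimal sketch of this normalization step, assuming a numeric feature matrix, is given below; NumPy is used for illustration and is equivalent in spirit to scikit-learn's StandardScaler.

```python
import numpy as np

def z_score_normalize(V):
    """Z-score normalization as in Eq. (1): V' = (V - mean) / SD."""
    V = np.asarray(V, dtype=float)
    Y = V.mean(axis=0)       # per-feature mean
    Z = V.std(axis=0)        # per-feature standard deviation
    Z[Z == 0] = 1.0          # guard against constant columns
    return (V - Y) / Z

# Assumed toy feature matrix (3 samples, 2 numeric features)
X = np.array([[15.0, 2.0], [10.0, 4.0], [5.0, 6.0]])
print(z_score_normalize(X))
```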

3.3 Implemented Model

The research work consists of two phases. In the first phase, dimensionality reduction was applied to the processed dataset. In the second phase, supervised classification was employed. The well-known dimensionality reduction methods PCA and LDA are investigated on the high-dimensional dataset. Logistic regression was then used to classify the data in order to compare how well each dimensionality reduction method performed, and the results were used to infer the differences between the supervised and unsupervised dimensionality reduction methods.

3.4 Principal Component Analysis

Data analysis and machine learning frequently employ the dimensionality reduction method known as PCA. Its primary function is to maintain the majority of the original data while downscaling a high-dimensional dataset into a lower dimensional space. This is accomplished by locating the principal components, which are linear combinations of the original characteristics that encompass the broadest range of data variance.

PCA attempts to lower the dimension of the data by discovering a small subset of derived variables with the maximum variance, known as the principal components (PCs). The initial PCs account for the majority of the variance, so the remaining components can be ignored with little information loss [17]. PCA is used to keep as much of the given dataset's information as feasible while reducing the dimensionality of the data [18]. The goal is to convert the dataset X, which has p dimensions, into Y, which has L (L < p) dimensions, where Y is the PC representation of X, i.e.,

$$Y = PC\left( X \right)$$
(2)
(1) Configure the dataset

In X, there are N vectors (x1, x2, …, xN), each representing one dataset instance.

(2) Determine the mean

$$\overline{x} = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} x_{i}$$
(3)

(3) Determine the covariance

$$C = \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {x_{i} - \overline{x}} \right)\left( {x_{i} - \overline{x}} \right)^{T}$$
(4)

(4) Find the eigenvalues and eigenvectors

The directions and magnitudes of the new feature space are determined by the eigenvectors and eigenvalues of C, respectively:

$$Cu_{i} = \lambda_{i} u_{i} ,\quad \lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{N} \quad \left( {{\text{Eigenvalues}}} \right)$$
(5)

$$u_{1} ,u_{2} , \ldots ,u_{N} \quad \left( {{\text{Eigenvectors}}} \right)$$
(6)

Creating a feature vector: The eigenvectors are ranked from the highest eigenvalue to the lowest, which orders them in descending order of importance. The eigenvector with the highest eigenvalue is the principal component of the data collection, and the eigenvectors with the greatest eigenvalues are used to form the feature vector [19,20,21]. Creating the new dataset involves selecting the principal components to keep, forming the feature vector from them, and projecting the mean-adjusted data onto those components [19, 22,23,24,25].
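A minimal NumPy sketch of these steps (mean, covariance, eigendecomposition, and projection) is given below; the toy matrix and the number of retained components are assumptions rather than the configuration used on the student data.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA following Eqs. (3)-(6): center, covariance, eigendecomposition."""
    X = np.asarray(X, dtype=float)
    x_bar = X.mean(axis=0)                   # Eq. (3): mean vector
    X_centered = X - x_bar
    C = np.cov(X_centered, rowvar=False)     # Eq. (4): covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # Eqs. (5)-(6): eigenpairs of C
    order = np.argsort(eigvals)[::-1]        # rank components by eigenvalue
    W = eigvecs[:, order[:n_components]]     # feature vector (top components)
    return X_centered @ W                    # project data onto the new axes

# Assumed toy data: 5 samples, 3 features, reduced to 2 principal components
X = np.random.default_rng(0).normal(size=(5, 3))
print(pca(X, n_components=2))
```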

3.5 Linear Discriminant Analysis

The LDA method reduces the dimensionality by maximizing the between-class scatter and minimizing the within-class scatter. It allows dimensionality reduction while preserving class-discriminative information and is mostly used prior to classification [18].

(1) Within-class scatter matrix

    $$s_{w} = \mathop \sum \limits_{j = 1}^{c} \mathop \sum \limits_{i = 1}^{{N_{j} }} \left( {x_{i}^{j} - \mu_{j} } \right)\left( {x_{i}^{j} - \mu_{j} } \right)^{T}$$
    (7)
where c is the number of classes, \(x_{i}^{j}\) is the ith sample of class j, μj is the mean of class j, and Nj is the number of samples in class j.

(2) Between-class scatter matrix

    $$s_{b} = \mathop \sum \limits_{j = 1}^{c} \left( {\mu_{j} - \mu } \right)\left( {\mu_{j} - \mu } \right)^{T}$$
    (8)
where µ is the mean of all classes.

The LDA approach maximizes the ratio of the determinant of the between-class scatter matrix to the determinant of the within-class scatter matrix of the projected samples [18] (Table 1).

Table 1 Comparison of accuracy with other studies
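As an illustrative sketch rather than the exact implementation used here, scikit-learn's LinearDiscriminantAnalysis performs this projection; the toy data and labels below are assumptions. With two classes, LDA yields at most c − 1 = 1 discriminant direction.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Assumed toy data: 6 samples, 4 features, binary class labels
X = np.array([[1.0, 2.0, 0.5, 3.0],
              [1.2, 1.8, 0.4, 2.9],
              [0.9, 2.1, 0.6, 3.2],
              [4.0, 0.5, 2.5, 1.0],
              [4.2, 0.4, 2.6, 0.8],
              [3.9, 0.6, 2.4, 1.1]])
y = np.array([1, 1, 1, 0, 0, 0])

# With two classes, LDA produces at most c - 1 = 1 discriminant direction
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, y)
print(X_lda.ravel())
```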

3.6 Logistic Regression

Logistic regression is used to classify the data components. In logistic regression, the target variable is binary, meaning that it only contains data that can be classified into two distinct groups, 1 or 0, corresponding to a student who will pass or fail academically. The aim of the logistic regression technique is to find the best-fitting model that describes the relationship between the target variable and the predictor variables [15]. The Sigmoid equation below serves as the foundation for the logistic regression model [15]. Figure 2 depicts the Sigmoid function graph.

$${h}_{\theta }\left(x\right)=\frac{1}{1+{e}^{-z}} , z={\beta }_{0}+{\beta }_{1}X$$
(9)
Fig. 2 Sigmoid function graph (S-shaped curve; predicted values lie between 0 and 1)

The logistic regression classifier provides probability-based outcomes, with a probability score between 0 and 1 for each class.

$${\text{cost}}\left( {h_{\theta } \left( x \right),y} \right) = \left\{ {\begin{array}{*{20}l} { - {\text{log}}\left( {h_{\theta } \left( x \right)} \right)} \hfill & {{\text{if}}\,y = 1} \hfill \\ { - {\text{log}}\left( {1 - \left( {h_{\theta } \left( x \right)} \right)} \right)} \hfill & {{\text{if}}\,y = 0} \hfill \\ \end{array} } \right.$$
(10)

The cost function serves as the optimization objective: it is minimized in logistic regression to develop a precise model with minimal error. The model predicts the probability of a future event; the primary principle of logistic regression is to model the likelihood that an outcome will occur. Pseudocode 1 describes the logistic regression model used to train and test the data instances.
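For concreteness, a small sketch of the cost in Eq. (10), averaged over all samples, is given below; the probability and label arrays are assumed example values.

```python
import numpy as np

def logistic_cost(h, y):
    """Binary cross-entropy cost from Eq. (10), averaged over all samples."""
    h = np.clip(h, 1e-12, 1 - 1e-12)                        # avoid log(0)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

# Assumed predicted probabilities h_theta(x) and true labels y
h = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])
print(logistic_cost(h, y))   # small value: predictions agree with the labels
```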

Pseudocode 1: Logistic Regression

1. Input: featured data
2. Output: classified data
3. For i = 1 to K
4. For each data instance dj
5. Set the target regression value
   $$z_{j} = \frac{{y_{j} - P\left( {1|d_{j} } \right)}}{{P\left( {1|d_{j} } \right)\left( {1 - P\left( {1|d_{j} } \right)} \right)}}$$
6. Set the weight of instance dj to P(1|dj)·(1 − P(1|dj))
7. Fit f(j) to the data with class values (zj) and weights (wj)
8. Assign (class label: 1) if P(1|dj) > 0.5, otherwise (class label: 2).
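As a self-contained sketch of this classification step, scikit-learn's LogisticRegression can be applied as follows; synthetic data stand in for the reduced student features, which are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the reduced student features; the real inputs would be
# the PCA- or LDA-transformed data produced in the earlier steps.
X, y = make_classification(n_samples=400, n_features=5, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

clf = LogisticRegression(max_iter=1000)   # sigmoid-based binary classifier
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)              # label 1 if P(1|x) > 0.5, else 0
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
```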

4 Experimental Result

The student dataset, which has 400 instances and 30 attributes, is used in this work. The dataset statistics and description are given in Tables 2 and 3, respectively. The student dataset is the basis for performance analysis of the two dimensionality reduction techniques, PCA and LDA, combined with the logistic regression classifier. Dimensionality reduction during preprocessing was accomplished using the LDA and PCA methods, and logistic regression was then used to classify samples into the defined groups. Prior to deploying a predictive model, it is crucial to ensure its effectiveness and accuracy; the analysis and evaluation therefore assess several criteria, including precision, recall, and accuracy. Table 5 presents the implemented model's performance metrics.

Table 2 Dataset statistics
Table 3 Dataset description
Table 4 Comparison of models using various methods
Table 5 Performance metrics

4.1 Employing Different Algorithms for Comparison

To further assess how the model works, the student dataset is modeled with three distinct algorithms using the original dataset, the PCA-processed data, and the LDA-processed data. The outcome is shown in Table 4. LDA enhanced the accuracy of the algorithms, with Naive Bayes being the exception; as Table 4 shows, PCA processing decreased Naive Bayes accuracy from 89 to 87%. It was also shown that LDA improved the algorithms' precision. An illustrative sketch of how such a comparison could be set up is shown below.
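The following sketch outlines one way such a comparison could be arranged with scikit-learn; the synthetic data, the number of retained PCA components, and the classifier choices are assumptions for illustration rather than the exact configuration used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the student data (400 instances, 30 attributes)
X, y = make_classification(n_samples=400, n_features=30, random_state=0)

reducers = {
    "original": None,
    "PCA": PCA(n_components=10),                        # component count is an assumption
    "LDA": LinearDiscriminantAnalysis(n_components=1),  # binary target: one direction
}
classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "NaiveBayes": GaussianNB(),
}

# Cross-validated accuracy for every (reducer, classifier) combination
for r_name, reducer in reducers.items():
    for c_name, clf in classifiers.items():
        steps = [StandardScaler()] + ([reducer] if reducer is not None else []) + [clf]
        pipe = make_pipeline(*steps)
        acc = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
        print(f"{r_name:9s} + {c_name:18s}: {acc:.3f}")
```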

5 Discussion

The experimental findings show that LDA improves classification accuracy more than PCA. Jawad et al. [26] and Musso et al. [24] reported similar findings, with a precision of 96% (Table 1). According to the experimental findings, the proposed LDA approach increased logistic regression's classification accuracy on the student dataset. The accuracy of the model is assessed by comparing it to the classification results published by other researchers' algorithms for academic prediction.

6 Conclusion and Future Work

This research work implemented an effective framework for predicting academic success. After a careful examination of prior published works, the model combines LDA for dimensionality reduction with logistic regression for classification. First, the LDA approach is applied to our dataset with the goal of increasing classification accuracy. Although PCA is a widely used approach, its effectiveness alongside logistic regression has not received enough emphasis. In this research work, the integration of LDA and logistic regression yields better results for academic prediction. Moreover, the logistic regression model outperformed the other algorithms employed in the work, as well as the findings of other studies, in terms of prediction performance.