
1 Introduction

According to Wook (2009) [1], “data mining is the process of automatically extracting useful information and relationships from immense quantities of data”. In this sense, data mining is not about acquiring specific information; rather, it is about answering a question or confirming a theory. Another definition states: “Data mining algorithms can predict future trends and behaviours and enable businesses to make proactive, knowledge-driven decisions” [2]. These definitions suggest that data mining can be beneficial in many aspects of our lives.

One of the best-known applications of data mining is its use to support and improve higher education. In this context, data mining techniques can have a direct impact on the workforce supplied to industry and, as a result, on the economy. However, academic failure among university students is one of the biggest problems affecting higher education. One solution involves predicting a student’s performance in order to help the student choose the right courses and improve their performance. Using data mining, we can study and analyse information from previous students, which is the main subject of this paper. The various data mining techniques, including classification, clustering and relationship mining, can be applied in educational settings to predict the performance of a student.

In this paper we present a data mining approach that analyses previous students’ data in order to predict future students’ performance on specific courses. To achieve this and support the development of higher education in our country, we use the ID3 and J48 decision tree classification algorithms. The dataset used in this paper consists of student data from the Computer Science Department, Computer and Information Sciences College, King Saud University (KSU), Saudi Arabia.

The rest of this paper is organised as follows. In Sect. 2, we present a brief background about the paper. Research related to our objectives is reviewed in Sect. 3. In Sect. 4, the methodology and experimental results are discussed. Conclusions are drawn in Sect. 5.

2 Decision Tree Classification Algorithms

Classification generally refers to the mapping of data items into predefined groups and classes [4]. It is also referred to as supervised learning and involves two phases: learning and classification. In the learning phase, training data are analysed by a classification algorithm; during the classification phase, test data are used to estimate the accuracy of the classification rules [5]. Data mining involves various classification techniques such as decision tree algorithms, Bayesian classification, classification by back-propagation, support vector machines and k-nearest neighbour.

Decision tree algorithms are used to extract information for the purpose of decision-making and are among the most widely used algorithms in data mining. A decision tree starts with a root node; from this node, the data are split recursively into further nodes according to the decision tree learning algorithm [6]. The final result is a decision tree in which each branch represents a possible decision scenario and its outcome. The two widely used decision tree learning algorithms discussed in this paper are ID3 and J48 (an implementation of C4.5).

2.1 ID3 (Iterative Dichotomiser 3) [6]

Ross Quinlan introduced ID3 in 1986. It is based on Hunt’s algorithm, and the tree is constructed in two phases: tree building and pruning. ID3 uses the information gain measure to choose the splitting attribute and accepts only categorical attributes when building a tree model.

To build the decision tree, the information gain is calculated for every attribute, and the attribute with the highest information gain is selected as the root node; the possible values of that attribute are represented as arcs. All instances for each outcome are then tested to check whether they fall under the same class. If they do, the node is labelled with that single class name; if not, the next splitting attribute is chosen to classify the instances further.
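For illustration, the Java sketch below computes the entropy and information gain used in this attribute selection step; the row representation (String arrays with the class label in a given column) and the helper names are assumptions made for the example, not code taken from our implementation.

import java.util.*;

public class InfoGain {

    // Entropy of the class labels over the given rows.
    static double entropy(List<String[]> rows, int classIdx) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] r : rows) counts.merge(r[classIdx], 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / rows.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Information gain achieved by splitting the rows on attribute attrIdx.
    static double informationGain(List<String[]> rows, int attrIdx, int classIdx) {
        Map<String, List<String[]>> partitions = new HashMap<>();
        for (String[] r : rows)
            partitions.computeIfAbsent(r[attrIdx], k -> new ArrayList<>()).add(r);
        double remainder = 0.0;
        for (List<String[]> part : partitions.values())
            remainder += ((double) part.size() / rows.size()) * entropy(part, classIdx);
        return entropy(rows, classIdx) - remainder;
    }

    // ID3 selects the attribute with the highest information gain as the split node.
    static int bestAttribute(List<String[]> rows, int numAttributes, int classIdx) {
        int best = -1;
        double bestGain = -1.0;
        for (int a = 0; a < numAttributes; a++) {
            if (a == classIdx) continue;
            double gain = informationGain(rows, a, classIdx);
            if (gain > bestGain) { bestGain = gain; best = a; }
        }
        return best;
    }
}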

2.2 J48 (Implementation of C4.5)

This algorithm, a successor to ID3, was also developed by Ross Quinlan. Unlike ID3, it handles both categorical and continuous attributes. For a continuous attribute, it splits the values into two partitions based on a selected threshold, so that all values above the threshold form one child and the remaining values form the other. It uses the gain ratio as the attribute selection measure when building the decision tree.
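A corresponding sketch of the gain ratio measure is given below; as before, the row representation is an assumption made for the example, and the sketch covers only the attribute selection measure, not the full C4.5 tree construction or the threshold selection for continuous attributes.

import java.util.*;

public class GainRatio {

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Entropy of the class labels over the given rows (same as in the ID3 sketch).
    static double entropy(List<String[]> rows, int classIdx) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] r : rows) counts.merge(r[classIdx], 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / rows.size();
            h -= p * log2(p);
        }
        return h;
    }

    // GainRatio(A) = InformationGain(A) / SplitInfo(A), where SplitInfo(A)
    // penalises attributes that split the data into many small partitions.
    static double gainRatio(List<String[]> rows, int attrIdx, int classIdx) {
        Map<String, List<String[]>> partitions = new HashMap<>();
        for (String[] r : rows)
            partitions.computeIfAbsent(r[attrIdx], k -> new ArrayList<>()).add(r);
        double remainder = 0.0, splitInfo = 0.0;
        for (List<String[]> part : partitions.values()) {
            double fraction = (double) part.size() / rows.size();
            remainder += fraction * entropy(part, classIdx);
            splitInfo -= fraction * log2(fraction);
        }
        double gain = entropy(rows, classIdx) - remainder;
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }
}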

3 Data Mining in Higher Education

Universities process a vast amount of data each year, including information on student enrolment and academic performance. These institutions therefore require suitable data mining tools to process past data and inform current decisions. Since its introduction, the application of data mining techniques has benefited many sectors such as business, telecommunications, banking and education. In the education sector, data mining has been defined as “the process of converting raw data from educational systems to useful information that can be used to inform design decisions and answer research questions” [3]. The prediction and prevention of academic failure among students has long been debated within higher education institutions and has become one of the most important reasons for using data mining in higher education.

Performance prediction is a prominent field of research in educational data mining. Many researchers have addressed this problem, drawing conclusions similar to ours but using different techniques and tools. Vialardi et al. [7] developed a recommender system to help new students choose better academic itineraries based on knowledge acquired from past students’ academic performance. Their system used the C4.5 algorithm to generate the rules employed by the recommender system to decide whether a student’s enrolment on a specific course has a good probability of success. Experiments on the performance of the system reported a success rate of 80 % in predicting the results for each student.

Another piece of research [8] developed a model to predict students’ academic performance by taking into account socio-demographic and academic variables. The model, which is based on the Naïve Bayes classifier and the RapidMiner software, reached an accuracy of 60 % correct classification. Bharadwaj and Pal [9] conducted a study on student performance by selecting 300 students from five different colleges. Using a Bayesian classification method over 17 attributes, they found that factors such as students’ living location and medium of teaching were highly interlinked with student academic performance.

Among the range of data mining techniques available, classification, and in particular decision tree algorithms, are the most effective modelling functions, since they can be used to find the relationships between a target variable and the other variables. Al-Radaideh et al. [10] proposed a prediction approach that used data mining classification techniques to evaluate data that may affect student performance. They used three different classification methods: ID3, C4.5 and Naïve Bayes. The results indicated that the decision tree model had better prediction accuracy than the other two models.

Another research study used decision tree algorithms to predict students’ results in their final semester based on the marks obtained in previous semesters. Two algorithms, J48 (an implementation of C4.5) and Random Tree, were applied to the students’ records to predict the result of the fifth semester from the marks obtained in the previous four semesters; the study found that Random Tree was more accurate in predicting performance than the J48 algorithm.

4 Proposed Workflow

Data mining of educational information is a vibrant and important field of research. Most research on the prediction of student academic performance shares the same methodology: data collection and preprocessing, followed by the implementation of a classification algorithm and the analysis of results.

4.1 Data Collection and Preprocessing

Data preparation is the most complex step in the data mining process as it involves both the collection of data and its preprocessing. The collection of data includes the identification of data resources and the gathering of relevant data from these resources. Following this initial step, the collected data should be preprocessed to make it suitable for the data mining algorithm, which also provides a greater understanding of the data collected. Preprocessing includes four important stages: data cleaning, which involves studying the data to remove noise (mostly errors and biased values that can affect the accuracy of results) and handling unknown or missing data values; removal of duplicate values; smoothing of noisy data; and resolution of inconsistencies.

Preprocessing also includes the integration of data from multiple resources, the transformation of data values from one domain to another and normalisation. Another preprocessing task is data reduction, which might take the form of feature selection to reduce the data dimensionality (eliminating features that contribute little to model construction and improving performance by reducing the number of objects used). Preprocessing also involves data discretisation, which reduces the number of values of a given continuous attribute by dividing its range into intervals. This replaces the numerous values of a continuous attribute with a small number of interval labels and supports concept hierarchy generation, which is used for discretisation. Finally, the resulting data should be transformed into a file format suitable for the desired application, for example the WEKA file format (.arff).
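As a concrete example of this stage, the sketch below uses the WEKA library (which we also use in our implementation) to discretise continuous attributes and store the result as an .arff file; the file names are placeholders, and the exact filter settings used in practice depend on the data.

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class PreprocessingSketch {
    public static void main(String[] args) throws Exception {
        // Load the collected data (any format WEKA can read, e.g. CSV or ARFF).
        Instances raw = new DataSource("students_raw.csv").getDataSet();

        // Discretise continuous attributes into a small number of interval labels.
        Discretize discretize = new Discretize();
        discretize.setInputFormat(raw);
        Instances prepared = Filter.useFilter(raw, discretize);

        // Store the preprocessed data in the WEKA file format for the mining step.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(prepared);
        saver.setFile(new File("students.arff"));
        saver.writeBatch();
    }
}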

4.2 Algorithm Implementation

After preparing the data, the dataset is divided into training and test datasets. The training dataset is used to construct the classification model, and the test dataset is used either to test the performance of the constructed classification model or to compare predictions to the known target values [12]. Different algorithms are used for the classification and prediction of student performance; however, decision tree classification is the most frequently used approach for this type of application.
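The sketch below illustrates this step with WEKA: the prepared data are split into training and test sets and a J48 decision tree is built on the training set. The split ratio and file name are assumptions made for the example rather than fixed choices of the methodology.

import java.util.Random;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainTestSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("students.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // class = last attribute
        data.randomize(new Random(1));                  // shuffle before splitting

        // An (assumed) 70/30 split into training and test data.
        int trainSize = (int) Math.round(data.numInstances() * 0.7);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);

        // The classification model is constructed from the training data only.
        J48 tree = new J48();
        tree.buildClassifier(train);
        System.out.println(tree);   // prints the induced decision tree
        // The held-out test set is then used in the evaluation step (Sect. 4.3).
    }
}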

4.3 Analysis of Results

After applying the algorithm, the resultant data patterns should undergo further processing. The post-processing stage includes pattern evaluation to measure different parameters such as accuracy. Post-processing also includes pattern selection because not all of the resultant patterns are of interest so a further selection may be beneficial. Finally, it is important that the result is interpreted correctly so that it is effectively viewed and understood.
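For example, pattern evaluation can be carried out with WEKA's Evaluation class, as in the sketch below; the file names are placeholders, and the accuracy and confusion matrix printed here are the kind of measures examined in the post-processing stage.

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EvaluationSketch {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("train.arff").getDataSet();
        Instances test = new DataSource("test.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        J48 tree = new J48();
        tree.buildClassifier(train);

        // Evaluate the model on the held-out test data and report standard measures.
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.println(eval.toSummaryString());      // accuracy and error rates
        System.out.println(eval.toMatrixString());       // confusion matrix
        System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());
    }
}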

5 Case Study

Our proposed system predicts students’ grades using former students’ grades as a basis. Our methodology consists of four steps:

  1. Collecting and preparing the educational data.

  2. Building the ID3 and J48 classification models.

  3. Evaluating the models using one of the evaluation methods.

  4. Choosing the best model for online prediction.

We used the NetBeans IDE as the programming environment and the Java programming language for the implementation. The WEKA classes were integrated as a jar library.

5.1 Data Preparation

The dataset used in this work was obtained from the Computer Science Department, Computer and Information Sciences College, King Saud University (KSU), Saudi Arabia. It consisted of a sample of CS Master’s student records from 2003 to 2011. Microsoft Excel was used to store the collected data for preprocessing; the data were then saved as an .arff file for the data mining tasks.

Preprocessing the data was a time-consuming process; the original data were stored in an un-normalised database table with one row per student ID, term, course and grade. As part of the preprocessing stage we:

  1. Filtered out all the unneeded columns (such as registration term and registration place), as these have no influence on the required results.

  2. Removed the old courses that are no longer available in the department course plan.

  3. Filled the cells with missing values using the average of the grades obtained in the same course by other students.

  4. Transformed the data into a transactional format where each row is indexed by student ID and each column by course; the intersection of a row and a column records the grade obtained for that course.

  5. Obtained a final dataset of 112 records and 37 attributes (one student ID, four core courses and 32 elective courses). We split the data into 82 records of training data and 30 records of test data; the training data were used to construct the models and the test data were used for evaluation. Table 1 shows an example of the normalised data, and a WEKA-based sketch of steps 1 and 3 is given after the table.

    Table 1. Example of the used dataset
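The sketch below shows how steps 1 and 3 can be performed with WEKA filters; the file name and the attribute indices of the removed columns are assumptions made for the example, not the exact values used on our dataset.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class DataPreparationSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("ksu_grades_raw.csv").getDataSet();

        // Step 1: filter out columns with no influence on the results
        // (the index range "2-3" for registration term/place is an assumption).
        Remove remove = new Remove();
        remove.setAttributeIndices("2-3");
        remove.setInputFormat(data);
        data = Filter.useFilter(data, remove);

        // Step 3: fill missing grades; for numeric grade columns this filter
        // uses the column mean, i.e. the average grade of the other students.
        ReplaceMissingValues fillMissing = new ReplaceMissingValues();
        fillMissing.setInputFormat(data);
        data = Filter.useFilter(data, fillMissing);

        // The prepared table is then stored as an .arff file, as in Sect. 4.1.
        System.out.println(data.numInstances() + " records, "
                + data.numAttributes() + " attributes after preparation");
    }
}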

5.2 Offline Classification

We selected two classification models: ID3 and J48. The performance of the two models was compared to decide on the best approach for online prediction. A model was built for each elective course, using that course as the class attribute and the grades in the core courses as predictors. The average accuracy in predicting the grades of all elective courses was then computed to decide on the best model.
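The comparison can be sketched as follows with WEKA. The sketch assumes that the training and test .arff files contain only the nominal grade columns (the student ID having been dropped during preparation), with the four core courses first and the elective courses after them; note that the Id3 class ships with older WEKA releases or as an optional package.

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.Id3;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OfflineComparisonSketch {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("train.arff").getDataSet();
        Instances test = new DataSource("test.arff").getDataSet();

        double id3Sum = 0.0, j48Sum = 0.0;
        int courses = 0;

        // Elective-course attributes are assumed to start after the four core courses.
        for (int classIdx = 4; classIdx < train.numAttributes(); classIdx++) {
            train.setClassIndex(classIdx);
            test.setClassIndex(classIdx);
            id3Sum += accuracy(new Id3(), train, test);
            j48Sum += accuracy(new J48(), train, test);
            courses++;
        }
        System.out.printf("Average ID3 accuracy: %.2f%%%n", id3Sum / courses);
        System.out.printf("Average J48 accuracy: %.2f%%%n", j48Sum / courses);
    }

    // Builds the given model on the training data and returns its accuracy on the test data.
    static double accuracy(Classifier model, Instances train, Instances test) throws Exception {
        model.buildClassifier(train);
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(model, test);
        return eval.pctCorrect();
    }
}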

Table 2 shows the analysis of the classification algorithms and the accuracy percentages for each elective course.

Table 2. Comparison of classification algorithms

Table 2 shows that the performance of both algorithms is satisfactory; however, higher overall accuracy (83.75 %) was attained through J48 implementation compared to ID3, which had 69.27 % accuracy.

5.3 Online Classification

Based on the results shown in Table 2, the J48 algorithm was used for the online prediction of students’ grades. Our system allows a student to request predicted grades for one or more courses. For each requested course, a classification model is built from the training data, and the student’s own record is then classified to obtain a prediction. The attributes used to construct the model are the elective courses already passed by the student together with the core courses.
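A simplified sketch of this online step is given below; the file names, the use of CS530 as the requested course and the assumption that the student's record is supplied as a single-row .arff file with the same header as the training data are all illustrative choices.

import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OnlinePredictionSketch {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("train.arff").getDataSet();
        Instances query = new DataSource("student_query.arff").getDataSet();

        // Use the requested course (e.g. CS530) as the class attribute.
        int classIdx = train.attribute("CS530").index();
        train.setClassIndex(classIdx);
        query.setClassIndex(classIdx);

        // Build the J48 model from the training data.
        J48 model = new J48();
        model.buildClassifier(train);

        // Classify the student's record to predict a grade for the course.
        Instance student = query.instance(0);
        double predicted = model.classifyInstance(student);
        System.out.println("Predicted grade for CS530: "
                + train.classAttribute().value((int) predicted));
    }
}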

The system also recommends the best courses for a student based on two criteria: the maximum accuracy value and the maximum grade value over all predicted courses. The goal is to help students choose suitable courses that enhance their academic attainment.
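The recommendation rule itself can be sketched as a simple ranking, as below; the Prediction record, the grade ordering and the example values are hypothetical and serve only to illustrate the two criteria.

import java.util.Comparator;
import java.util.List;

public class RecommendationSketch {

    // One predicted course with its predicted grade and prediction accuracy.
    record Prediction(String course, String grade, double accuracy) {}

    // Assumed grade ordering, best first.
    static final List<String> GRADE_ORDER =
            List.of("A+", "A", "B+", "B", "C+", "C", "D+", "D", "F");

    // Rank courses by maximum accuracy first, then by maximum (best) grade.
    static List<Prediction> recommend(List<Prediction> predictions) {
        return predictions.stream()
                .sorted(Comparator.comparingDouble(Prediction::accuracy).reversed()
                        .thenComparingInt(p -> GRADE_ORDER.indexOf(p.grade())))
                .toList();
    }

    public static void main(String[] args) {
        // Illustrative values only, not results from our experiments.
        List<Prediction> predictions = List.of(
                new Prediction("CS530", "A+", 100.0),
                new Prediction("CS528", "A+", 90.0),
                new Prediction("CS590", "A", 90.0));
        recommend(predictions).forEach(p ->
                System.out.printf("%s: %s (%.1f%% accuracy)%n",
                        p.course(), p.grade(), p.accuracy()));
    }
}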

The remainder of this section presents some examples of how the system interacts with students, how the results are displayed and an explanation of how the system works.

In Fig. 1, the student selected CS595 and CS597 as passed courses and CS530, CS528 and CS590 as courses to be predicted. Three classification models were built from the training dataset after the student pressed the predict button.

Fig. 1. Example 1

After building the models, the system predicted the student’s grade in CS530, CS528 and CS590. In addition, it gave an accuracy percentage for each prediction, obtained by comparing predicted grades with actual grades.

The system also recommended the best courses based on the maximum accuracy value and the maximum grade value.

Here, the system achieved 100 % accuracy in predicting course CS530, so this course is recommended for the student. Moreover, all three courses received an A+ prediction, which means that all three courses are recommended.

In Fig. 2, a different student selected some passed courses and obtained predictions for the remaining courses. The attributes used to construct these models were the passed courses together with the four core courses, and each of the remaining courses was used in turn as the class attribute of its model. The system gave a predicted grade for each course along with an accuracy percentage.

Fig. 2. Example 2

The system also recommended the best courses for the student based on the maximum accuracy value and the maximum grade value.

6 Conclusion

To support the aims of higher education institutions, we proposed a model that predicts student performance in future courses based on the grades of previous students in core and elective courses. The workflow of our system was presented, together with a case study using student records obtained from the Computer Science Department at KSU, where programmes consist of core and elective courses. The prediction system is based on decision tree classification methods, specifically ID3 and J48. First, a preprocessing step was applied to the data; then, for each elective course used as a class attribute, a model was built from the training data based on the grades in the core courses. By calculating the average accuracy for each algorithm, we found that J48 achieved a better accuracy of 83.75 % compared to 69.27 % for the ID3 algorithm. On that basis, we used the J48 algorithm for online prediction, enabling students to obtain a prediction for one or more courses.