1 Introduction

It is often said that we are drowning in data but starving for knowledge [1]. Extracting information from data facilitates knowledge building. Information, which can be termed a subset of data, stimulates action in an entity, whereas knowledge defines the action of an entity in a particular setting [2]. A number of researchers have classified knowledge on different bases, sometimes by the manner of its codification and occurrence [3], sometimes by its know-what, know-how, know-why and know-when aspects [4]. Some have even mapped knowledge in diverse domains [5].

There are a variety of ways to represent knowledge [6]. Production rules written in the form of IF-THEN rules are one of the most popular approaches to knowledge representation [7]. IF-THEN rules adopt a modular approach, each rule defining a principally independent and relatively small piece of knowledge. A rule-based system includes general rules and facts about the knowledge domain it covers.

Knowledge building in the education domain can be achieved by adopting procedures that optimize its functioning.

This research work is undertaken with the objective of deducing a relevance index for inferred knowledge. The education sector is taken as a particular case, inferring knowledge related to students' academic performance in a technical subject at the level of higher education. Deducing a relevance index for inferred knowledge is important because the index gives a clear depiction of the existing system and further helps in decision making.

The paper is organized as follows. Section 2 elaborates the methodology adopted for rule induction and further rule evaluation in the higher education set-up. Finally, the conclusions are drawn and presented in Sect. 3.

2 Adopted Methodology in Higher Education Scenario

Educational organizations strive to achieve higher academic output for their students, and many researchers have worked to predict the factors affecting students' academic results [8,9,10,11,12]. Identification of such critical parameters, which could improve the academic attainment of students, supports effective academic planning.

When the individuals of a population can be separated into different classes, a classification rule provides a system by which each individual of the population is allocated to one class or another.

In this study, knowledge is represented through classification rules [13], which exist in the form of IF-THEN rules. The work starts by identifying the variables and collecting data in the context of these variables. The values of the attributes are then encoded on an 8-level scale. Rule induction is performed with JRip, which implements the propositional rule learner RIPPER (repeated incremental pruning to produce error reduction). The rules are then evaluated with the Net Benefit metric, which takes into account both the classification and the misclassification witnessed by each knowledge rule.

2.1 Variable Identification and Data Collection

This dataset has 5000 records and five independent attributes, all of which are categorical. The independent attribute names in the dataset are as follows: ContinuousEvaluationMarks, SGPA_II, Practical_orient, Attendance, Base_Sub_Marks.

The independent attributes affect the dependent attribute of End_Term_Marks and are reflected in Table 1.

Table 1 Attributes of the study

The attributes were encoded on the 8-level scale, depicted in Table 2.

Table 2 Encoding of the attributes
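The encoding onto the 8-level scale can be sketched as follows. The 3-bit codes mirror those appearing in the induced rules, but the bin cut points below are assumed for illustration only; the actual boundaries are those of Table 2.

```python
def encode_8_level(value, boundaries):
    """Map a numeric attribute value to a 3-bit code ('000'..'111').

    boundaries: 7 ascending cut points separating the 8 levels.
    """
    level = sum(value > b for b in boundaries)  # bin index 0..7
    return format(level, '03b')                 # e.g. 4 -> '100'

# Hypothetical cut points for a marks attribute on a 0-100 scale:
cuts = [10, 20, 30, 40, 50, 60, 70]
print(encode_8_level(45, cuts))  # 45 exceeds four cut points -> '100'
```

Each categorical attribute of Table 1 would be passed through such a mapping before rule induction.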

2.2 Rule Induction

In 1995, Cohen proposed RIPPER (repeated incremental pruning to produce error reduction), a propositional rule learner [14]; JRip is its implementation.

JRip achieves error reduction through incremental pruning, examining the classes in increasing order of their size. The initial ruleset is generated on the basis of incremental reduced error. JRip (RIPPER) treats all instances of the training dataset belonging to a particular judgment as one class and deduces a ruleset that covers all the members of that class. The procedure is then repeated for all the classes.

Initialization

Initialize RS = {}, and proceed class by class, from the least frequent to the most frequent.

Repeat

{

  1. Building phase: repeat the grow and prune phases below until there are no positive instances left or the error rate increases beyond 50%.

     1.1 Grow phase: follow a greedy approach, adding conditions to the rule until its accuracy reaches 100%.

     1.2 Prune phase: incrementally prune each rule. The pruning metric is 2p/(p + n) − 1, where p is the number of positive instances and n the number of negative instances covered by the rule.

  2. Optimization phase: once the initial ruleset {R_i} has been generated, two variants of each rule are generated and pruned from randomized data using the Grow and Prune procedures. The first variant is grown from an empty rule; the second is created by greedily adding conditions to the original rule. The description length (DL) metric is computed for each variant, and the rule with the minimal DL is kept in the final ruleset. After all the rules in {R_i} have been examined, the building phase is used again to generate more rules if residual positives remain.

  3. Rules that increase the DL of the complete ruleset are deleted, and the final ruleset is added to RS.

}
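The pruning metric used in the prune phase above can be sketched in Python; this is a minimal illustration of the formula, not Weka's implementation.

```python
def pruning_metric(p, n):
    """RIPPER's pruning metric, 2p/(p + n) - 1, for a rule covering
    p positive and n negative instances on the pruning set.
    Ranges from -1 (all negatives) to +1 (all positives); conditions
    are deleted from a rule while this value does not worsen."""
    if p + n == 0:
        return -1.0  # a rule covering nothing is worthless
    return 2 * p / (p + n) - 1

# A rule covering 8 positives and 2 negatives on the pruning set:
print(pruning_metric(8, 2))  # 2*8/10 - 1 = 0.6
```

Note that 2p/(p + n) − 1 = (p − n)/(p + n), i.e. it rewards rules whose coverage is dominated by positive instances.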

In the study, JRip was implemented using Weka 3.8.0, and a ruleset of 87 rules was generated. A snapshot of the rules and the output achieved is shown in Fig. 1.

Fig. 1 Weka implementation

2.3 Rule Analysis and Interpretation

For each of the 87 rules acquired by implementing JRip on the dataset, the values of classification (true positives, TP) and misclassification (false positives, FP) were recorded [15].

  • True positive (TP): the number of examples satisfying both A and C

  • False positive (FP): the number of examples satisfying A but not C

where A is the antecedent of the rule and C is its consequent.
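Counting TP and FP for a single rule can be sketched as follows; the records and the predicates A and C below are hypothetical, with field names mirroring the study's attributes.

```python
def rule_tp_fp(records, antecedent, consequent):
    """TP: records satisfying both A and C; FP: records satisfying A but not C."""
    tp = sum(1 for r in records if antecedent(r) and consequent(r))
    fp = sum(1 for r in records if antecedent(r) and not consequent(r))
    return tp, fp

# Hypothetical encoded records:
data = [
    {"Attendance": "001", "End_Term_Marks": "011"},
    {"Attendance": "001", "End_Term_Marks": "110"},
    {"Attendance": "111", "End_Term_Marks": "011"},
]
A = lambda r: r["Attendance"] == "001"       # antecedent
C = lambda r: r["End_Term_Marks"] == "011"   # consequent
print(rule_tp_fp(data, A, C))  # (1, 1)
```

Records that do not satisfy the antecedent at all (the third record above) contribute to neither count.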

The rules were further evaluated on the basis of Net Benefit [16], considering a range of thresholds and calculating the NB at each. The results were then plotted as Net Benefit against rule number. For each threshold P_t, the Net Benefit was calculated as per Eq. 1:

$$ {\text{Net Benefit}}\;({\text{NB}}) = \frac{\text{TP}}{N} - \frac{\text{FP}}{N}\left( \frac{P_{t}}{1 - P_{t}} \right) $$
(1)
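Eq. 1 can be computed directly. In the sketch below, N = 5000 matches the dataset size stated earlier, while the TP and FP counts are hypothetical values for a single rule.

```python
def net_benefit(tp, fp, n, p_t):
    """Eq. 1: NB = TP/N - (FP/N) * (p_t / (1 - p_t)) for threshold p_t."""
    return tp / n - (fp / n) * (p_t / (1 - p_t))

N = 5000          # records in the dataset
tp, fp = 15, 12   # hypothetical counts for one rule
for p_t in (0.1, 0.5, 0.6):
    print(p_t, round(net_benefit(tp, fp, N, p_t), 6))
```

The factor P_t/(1 − P_t) controls how heavily a false positive is penalized: it is 1/9 at P_t = 0.1, 1 at P_t = 0.5, and 1.5 at P_t = 0.6, which is why a rule's NB can turn negative at higher thresholds.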

On evaluating the rules' Net Benefit for different values of P_t, the following observations were made; they are depicted in Fig. 2.

Fig. 2 Consolidated plot across P_t values (0.1, 0.5, 0.6)

On cross-tabulating the rule count for P_t = 0.1–0.6, the NB values for all 87 rules can be seen in Table 3.

Table 3 Analysing NB for the rules

The consolidated plot in Fig. 2, which depicts the NB values for all 87 rules across thresholds (P_t = 0.1, 0.5, 0.6), shows that the maximum Net Benefit achieved across the threshold values of P_t (P_t = 0.1–0.6) decreases as the threshold value P_t increases from 0.1 to 0.6. In fact, at P_t = 0.6 some of the rules exhibit a negative NB.

P_t = 0.5 signifies that FP and TP are weighted equally, since the weighting factor P_t/(1 − P_t) equals 1. Maintaining P_t = 0.1 therefore assigns more weight to classification, i.e. true positives (TP), than to misclassification, i.e. false positives (FP).

The study selects P_t = 0.1, at which the maximum NB and the most distinct peaks are achieved. It is also observed that the NB value decreases as P_t moves from 0.1 to 0.6 and becomes negative for some rules at P_t = 0.6, a threshold that assigns more weight to misclassification than to classification.

For P_t = 0.1, the rule that acquires the highest Net Benefit is:

Base_Sub_Marks = 010 and Attendance = 001 and ContinuousEvaluationMarks = 101 => End_Term_Marks = 011

On decoding the rule, it can be stated as:

Base_Sub_Marks is between 41 and 50, Attendance is between 75.1 and 77%, and ContinuousEvaluationMarks is between 20 and 23 => End_Term_Marks is between 51 and 60.
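As an illustration, the decoded antecedent of this top rule can be checked against a single student record; the record below is hypothetical.

```python
def top_rule_fires(r):
    """Antecedent of the decoded top rule; when True, the rule predicts
    End_Term_Marks between 51 and 60."""
    return (41 <= r["Base_Sub_Marks"] <= 50
            and 75.1 <= r["Attendance"] <= 77.0
            and 20 <= r["ContinuousEvaluationMarks"] <= 23)

student = {"Base_Sub_Marks": 45, "Attendance": 76.0,
           "ContinuousEvaluationMarks": 22}
print(top_rule_fires(student))  # True
```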

The relevance index assigned to a knowledge rule is based on its Net Benefit (NB), taking into account the classification and misclassification done by the rule. The Net Benefit of the above rule at the threshold value P_t = 0.1 is 0.002689.

The reasons for using Net Benefit (NB) to assign a relevance index to inferred knowledge are:

  1. The prediction model incorporates consequences and hence can be used to infer a decision on the usage of the given model.

  2. It can be applied directly to the validation set and does not need any additional information.

  3. The evaluation method is applicable whether the model outcome is binary or continuous.

3 Conclusion

Rule induction can deduce the relationships existing between the various attributes, and the influence of the independent variables on the dependent variable can be observed. Rules with a higher relevance index are better suited to the system and can be used for appropriate syllabus planning, designing structured lesson plans, structuring criteria for the evaluation of students' performance and adopting a suitable teaching pedagogy to improve the overall academic performance of students. The knowledge derived in the form of rules bears relevance in the context of the domain and can hence be added to the knowledge set that supplements the process of decision making in a knowledge-based environment.