1 Introduction

The most difficult task in the educational data mining (EDM) process is choosing the correct technique. The decision requires technical expertise because numerous techniques are available to a technologist who wishes to derive a model from the data. This diversity poses a serious challenge to non-expert users who have no clear understanding of the techniques available to solve an existing domain problem [1, 2]. Classifying EDM techniques simplifies the understanding of the existing techniques. In addition, a number of EDM systems do not offer intelligent assistance for the EDM process; instead, they provide a conceptual map. An intelligent EDM system based on a Case-Based Reasoning Recommendation System (CBR-RS) offers the non-expert user a flexible framework that implicitly draws on previous experiences. Case-Based Reasoning (CBR) is a method of knowledge-based recommender systems (RS) [2, 3].

The educational data mining (EDM) process emphasises analysing educational data to improve models of learning experiences and to refine institutional efficiency [2]. EDM applies data mining methods such as classification, clustering, and association analysis. It also applies techniques from statistics, machine learning, text mining, web log analysis, etc. [2, 3]. However, the challenge is that there is no unified approach among EDM researchers. Non-expert users find it difficult to gain richer insights into available datasets and to apply EDM techniques for advanced data analysis that yields useful results, because EDM is an inherently complex process [2, 4].

Therefore, non-expert users need clear information about which EDM techniques and parameters are suitable for the quality of the data at hand. To address this issue, the researchers propose constructing a recommender system using case-based reasoning (CBR), which contains information about previously solved EDM tasks [2, 5].

2 Recommendation System (RS)

A Recommender System (RS) is a software tool or method that provides recommendations to help users identify a set of items that are of interest and significance to them [6]. It is also defined as a system that selects items appropriate for a specific user [3, 7]. The central theme of RS is grounded in the similarity measures of data mining (DM) techniques [8]. RS is mostly applied in e-commerce and knowledge management systems, and recently in institutions of higher learning to solve problems in the admission process, prediction of student performance, recommendation of academic resources, student retention, and timely graduation [7]. However, there is still more work to be done, especially from the data mining point of view, where knowledge can be extracted and used to make academic recommendations [3, 9].

Previous work in educational data mining (EDM) has explored techniques for prediction, although without taking contextual information into account [2, 3]. EDM is not a single solution for analysing educational data [1]. The diversity and complexity of the different EDM methods pose an enormous challenge to educational management decision-makers. EDM is the application of data mining techniques to a specific set of data from educational institutions (EIs) to address educational problems [2]. The velocity and volume of EDM techniques, coupled with a lack of common terminology, make the selection of EDM techniques and appropriate variables more intractable. EDM is significant in mining student performance and registration data. Other critical uses include recommending enhancements to current learning practices; EDM is also concerned with developing methods for exploring the unique types of data that come from academic teaching and learning environments [2, 3]. EDM is a discipline used to discover knowledge and to develop methods for exploring data collected from educational settings.

EDM can accurately predict student performance in order to categorise strong and weak students [3]. For example, a decision tree classifier was used to develop a model that predicts students' performance and makes detailed recommendations that can influence the final examination success rate; the data were collected from the Moodle e-learning system, and transformation and discretization methods were applied to them [2, 3]. The results of previous research in EDM encourage us to create a recommender system for educational data mining techniques, offering institutions of higher learning an expert system for automated decision-making that addresses the diversity and complexity of the different EDM techniques [3].

3 Collaborative Filtering RS

Collaborative Filtering (CF) is an algorithm that bases its predictions and recommendations on the ratings or behaviour of other users in the application. It aims to produce user-specific recommendations based on other users' overall ratings of items [4]. The Collaborative Filtering Recommendation System (CFRS) approach is extensively used in e-commerce to reduce information overload; it recommends items to a specific user based on similar ratings of items by other users. It identifies similarities between users from their ratings and produces recommendations accordingly [2, 10]. Studies further reveal that web mining has been a popular technique in CFRS for solving prediction problems concerning performance, registrations, and completion of subjects and semesters in e-learning systems [11].

The matrix factorization technique has been used for rating prediction and has outperformed other state-of-the-art methods [11]. Matrix factorization approximates a matrix X by the product of two smaller matrices W and H, that is, \(X\approx W{H}^{T}\), and the rating is predicted by the expression:

$$ \hat{r}_{ui} = \sum_{k = 1}^{K} w_{uk} h_{ik} = \left( WH^{T} \right)_{u, i} $$
(1)

where \(w_{uk}\) and \(h_{ik}\) are elements of W and H, which are the model parameters, and the rows of W and H are vectors containing the K latent features of user u and item i, respectively. Matrix factorization is among the most effective latent factor models and is able to represent students' performance and courses [3, 12]. It is reported to be an effective and efficient technique for collaborative prediction of student performance in many recommender system algorithms [11].
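
As a minimal illustration of Eq. (1), the following Python sketch predicts a rating as the inner product of a user's and an item's latent-feature vectors. The function names `predict_rating` and `predict_all` are hypothetical, and the values in W and H are invented for the example (K = 2); in practice the factors would be learned from observed ratings, a step this sketch does not cover.

```python
def predict_rating(W, H, u, i):
    """Predicted rating for user u and item i: sum over k of W[u][k] * H[i][k]."""
    return sum(w_uk * h_ik for w_uk, h_ik in zip(W[u], H[i]))


def predict_all(W, H):
    """Full matrix of predictions, X_hat = W H^T, built entry by entry."""
    return [[predict_rating(W, H, u, i) for i in range(len(H))]
            for u in range(len(W))]


# Invented latent factors (assumption): 2 users (students), 3 items (courses), K = 2.
W = [[1.0, 0.5],
     [0.0, 2.0]]
H = [[2.0, 1.0],
     [0.5, 2.0],
     [1.0, 1.0]]

X_hat = predict_all(W, H)
print(X_hat)
```

Each entry of `X_hat` is the predicted rating of one user for one item, so a recommendation list for user u could be obtained by sorting row u in descending order.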

The cosine similarity measure is one approach preferred in collaborative filtering recommender systems. This approach treats items as vectors in an n-dimensional space and computes their similarity as the cosine of the angle between them, expressed as [7, 13, 14]:

$$\mathrm{sim }\left(\mathrm{x},\mathrm{y}\right)=\mathrm{cos}\left(x,y\right)=\frac{\left(x\cdot y\right)}{\left|\left|x\right|\right|\left|\left|y\right|\right|}=\frac{\sum_{c=1}^{n}{R}_{xc}{R}_{yc}}{\sqrt{\sum_{c=1}^{n}{({R}_{xc})}^{2}}\sqrt{\sum_{c=1}^{n}{({R}_{yc})}^{2}}}$$
(2)

where · denotes the vector dot product and ||x|| is the norm of vector x. Cosine similarity is a vector-based measure grounded in linear algebra rather than a statistical approach [14]. Pearson correlation similarity differs from cosine similarity in that each rating vector is centred on its mean before normalization. The Pearson correlation similarity method quantifies the degree to which two variables are linearly related and is expressed as [15, 16]:

$$\mathrm{sim}\left(x,y\right)=\frac{\mathrm{cov}\left(x,y\right)}{{\sigma }_{x}\,{\sigma }_{y}}=\frac{\sum_{s\in {S}_{xy}}({r}_{x,s}-\overline{{r}_{x}})({r}_{y,s}-\overline{{r}_{y}})}{\sqrt{\sum_{s\in {S}_{xy}}{({r}_{x,s}-\overline{{r}_{x}})}^{2}}\sqrt{\sum_{s\in {S}_{xy}}{({r}_{y,s}-\overline{{r}_{y}})}^{2}}}$$
(3)

where σ is the standard deviation, \(\overline{{r}_{x}}\) and \(\overline{{r}_{y}}\) are the mean ratings given by users x and y, \({r}_{x,s}\) and \({r}_{y,s}\) are the ratings given by users x and y to item s, and the sums run over the items s rated by both users.
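
To make the difference between Eqs. (2) and (3) concrete, the sketch below computes both measures for two rating vectors in plain Python. It assumes that the two vectors are already restricted to the items rated by both users, and that similarity with an all-zero vector is treated as 0; the function names and the rating values are illustrative only.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length rating vectors (Eq. 2)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_x * norm_y)


def pearson_similarity(x, y):
    """Pearson correlation (Eq. 3): cosine similarity of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine_similarity(centred_x, centred_y)


# Invented ratings (assumption): user y rates every item two points lower than user x.
ratings_x = [5, 3, 4, 4]
ratings_y = [3, 1, 2, 2]

cos_xy = cosine_similarity(ratings_x, ratings_y)
pearson_xy = pearson_similarity(ratings_x, ratings_y)
print(cos_xy, pearson_xy)
```

Because the second user's ratings are a constant shift of the first user's, the Pearson value is 1 up to floating-point rounding, while the cosine value is slightly below 1; mean-centring removes the effect of one user rating systematically lower than another.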

4 Content-Based RS

A Content-Based RS (CBRS) uses item information and user preferences to predict subjects and items that may be of interest to new users [7]. A CBRS recommends items similar to those the user has preferred in the past [7], guiding the user towards interesting objects in a space of possible options [3]. A Bayesian network model has been used in CBRS for modelling user preferences, grouping items into different groups and creating a ranking list for each group.

Because CBRS does not depend on other users' data, it avoids the cold-start and sparsity issues; instead, it relies on the structure of the items being recommended. It struggles to find new items of interest for the user and is challenged by complex attributes. CBRS also suffers from over-specialization, which limits users' ability to discover new and different recommendations [7, 17]: it directs users to recommendations that are already known to them. Therefore, given the massive and growing volume of data in the educational domain, with its different and complex attributes, CBRS is unsuitable for helping this study provide a solution for better selection of educational data mining techniques by a non-expert user [7].

5 Knowledge-Based RS

The Knowledge-Based Recommendation System (KBRS) method applies knowledge about the users and items to generate a recommendation according to the user's requirements [7]. The research study considers knowledge-based RS because they are notably case-based and because the approach overcomes the cold-start problem by eliciting the user's preferences explicitly and recommending products based on the system's built-in knowledge [18]. The KBRS has no problem with ramp-up and grey sheep. Additionally, the quality of recommendations does not hinge on the amount of historical data but on the knowledge built into the system; its weakness, however, is the need for knowledge design [16].

CBRS suffers from content limitation and over-specialization, while CFRS suffers from cold-start, scalability and sparsity problems. KB-RS overcomes these shortcomings through knowledge-based reasoning [16]. The research further reveals that the most frequently used KB-RS methods are rule-based reasoning (RBR) and case-based reasoning (CBR). The study adopted the CBR technique because of its ability to learn from prior experience to solve problems and because it resembles the reasoning model of human beings, which enhances the accuracy of the recommended solutions [16].

6 Case-Based Reasoning RS

Case-Based Reasoning (CBR) is a well-known type of knowledge-based (KB) approach that specializes in solving problems and informing decision-making by learning from the past. This tool is very effective and helps users make better decisions in a timely manner. The Case-Based Reasoning Recommender System (CBR-RS) is one of the most progressive methodologies derived from a problem-solving process in which previously solved problems are stored together with the solutions applied to them [19, 20].

7 Research Methodology

The research study makes use of two research methodologies: a systematic literature review and CBR-RS prototype development.

8 Systematic Literature Review

The research study adopts Systematic Review (SR) Methodological Analysis to critically examine previous research and findings, as recommended in [2, 3]. SR Methodological Analysis is chosen to produce evidence from published papers and to discover literature pertinent to RS in academic journals, books and conference proceedings [2, 3].

9 Prototype Development

The research study develops a CBR-RS prototype as an experiment in EDM technique recommendation using the JCOLIBRI framework. The dataset contains 3500 instances from the first-year to the third-year level. The development proceeds in four steps. First, the Eclipse IDE is integrated with COLIBRI Studio to execute and manage the Java source code. Second, the Case Designer tool is used to define the structure of the cases, with JCOLIBRI used to define the case description, solution, result and justification. Third, a similarity measure algorithm is introduced for the similarity configuration of the CBR application. Finally, an interface is prototyped that allows a non-expert user to define a query based on the problem domain, define the similarity configuration, retrieve cases from the developed case base, adapt the solution by adding or removing components, and revise and retain the solution for future use.

The prototype used a CBR cycle inference process that includes four main steps, which are [2, 3]:

  • Retrieve: the most similar case is looked up in the case library.

  • Reuse: the retrieved case is used to solve the similar new problem at hand.

  • Review/Revise: the retrieved case is evaluated and adapted where necessary to provide a solution to the new problem.

  • Retain: the new case is saved in the case base to preserve the newly adopted decision.

A case is typically represented as [3]:

CASE = (Situation, Solution, Result), where Situation describes a given case, Solution gives a diagnosis and recommendations to the user, and Result is the outcome of applying the solution.
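
The case structure and the four-step cycle described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions and not the JCOLIBRI API or the prototype's code: the `Case` class, the attribute-matching `similarity` function, the four step functions and the example attribute values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    situation: dict  # attribute -> value pairs describing the problem
    solution: str    # recommended EDM technique
    result: str      # outcome of applying the solution


def similarity(query, situation):
    """Fraction of query attributes whose values match the stored situation (assumed measure)."""
    if not query:
        return 0.0
    return sum(1 for k, v in query.items() if situation.get(k) == v) / len(query)


def retrieve(case_base, query):
    """Retrieve: return the stored case most similar to the query."""
    return max(case_base, key=lambda c: similarity(query, c.situation))


def reuse(case, query):
    """Reuse: propose the retrieved solution for the new problem."""
    return Case(situation=dict(query), solution=case.solution, result="pending")


def revise(case, solution=None, result="accepted"):
    """Revise: optionally replace the solution after evaluation and record the outcome."""
    return Case(case.situation, solution or case.solution, result)


def retain(case_base, case):
    """Retain: store the confirmed case for future queries."""
    case_base.append(case)


# Invented example cases (assumption), not taken from the study's case base.
case_base = [
    Case({"task": "prediction", "data": "grades"}, "decision tree", "accepted"),
    Case({"task": "grouping", "data": "web logs"}, "clustering", "accepted"),
]
query = {"task": "prediction", "data": "grades", "level": "first year"}

best = retrieve(case_base, query)
confirmed = revise(reuse(best, query))
retain(case_base, confirmed)
print(best.solution, len(case_base))
```

Keeping each step as a separate function mirrors the cycle: a retained case immediately becomes available to the next retrieval.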

10 Similarity Measure

Numerous methods exist for retrieving and adapting cases from the case library, such as k-nearest neighbours (k-NN), decision trees, and artificial neural networks. The research implemented the k-NN algorithm for retrieving cases from the case library because of its simple logic, with a proximity function that uses the Euclidean metric to determine the closest case, as expressed in the formula [21, 22]:

$$ d\left( x, x^{\prime} \right) = \sqrt{\sum_{i = 1}^{n} \left( x_{i} - x^{\prime}_{i} \right)^{2}} $$
(4)

where d is the distance used as the similarity metric, x is an identified problem and x′ is the case. The k-NN algorithm searches the case base for the k instances that most closely resemble the new problem and extracts those cases for review and adaptation to solve the new case. The k-NN algorithm is implemented as:

  1. Load the training data (x), with y as the class label, and the test data (x′).

  2. Specify the parameter k.

  3. Calculate the Euclidean distances:

    For i = 1 to n do

    Compute distance \(d\left( x_{i}, x^{\prime} \right)\); // where d denotes the Euclidean distance between points

    End for

  4. Compute the set I containing the indices of the k smallest distances. // get nearest neighbours

  5. Return the majority label for {y_i : i ∈ I}. // make predictions
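
The steps above can be sketched in plain Python as follows. This is a minimal illustration and not the prototype's JCOLIBRI implementation: the function names, the two-attribute training points and the technique labels are invented for the example, and ties in the vote are resolved by whichever label is encountered first.

```python
import math
from collections import Counter


def euclidean(x, x_new):
    """Euclidean distance between two equal-length numeric vectors (Eq. 4)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_new)))


def knn_predict(train_x, train_y, x_new, k):
    """Return the majority label among the k training points nearest to x_new."""
    # Step 3: distance from the new problem to every stored case.
    distances = [euclidean(x, x_new) for x in train_x]
    # Step 4: indices of the k smallest distances.
    nearest = sorted(range(len(train_x)), key=lambda i: distances[i])[:k]
    # Step 5: majority vote over the labels of the nearest neighbours.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Invented training data (assumption): two numeric attributes per case.
train_x = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
train_y = ["classification"] * 3 + ["clustering"] * 3

label_a = knn_predict(train_x, train_y, [2, 2], k=3)
label_b = knn_predict(train_x, train_y, [8, 8.5], k=3)
print(label_a, label_b)
```

An odd value of k, such as the 3 and 5 used in this study, avoids tied votes when there are only two classes.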

The main task in this process is to define the similarity measurement in the case-retrieval phase. The nearest neighbour algorithm (k-NN) was applied in this research to combine feature similarity and relevance. A score was computed for each case in the case base using the formula in Eq. (4). The case with the highest score is the nearest neighbour. Table 1 defines the case structure used in creating the CBR-RS application. The Case Designer in COLIBRI Studio is used to specify the structure of the cases, which contain collections of numerous attributes with descriptions.

Table 1 Case structure of CBR-recommender system

11 Systematic Review

The framework of recommender systems proposed in Fig. 1 illustrates the gap identified in the literature, reflecting the challenges, the limitations and the methods proposed to overcome them. The researchers derived the framework through a systematic review of the literature that summarises the recommender systems used to solve problems in the admission process, prediction of student performance and student retention. The framework outlines the problems in each RS approach and the techniques used to solve them. The systematic review allows the researchers to form a strong argument that presents a comprehensive and logical basis for the framework [2]. A systematic review is a technique for identifying, assessing and interpreting all research articles relevant to a research question or topic area of interest [2]. The systematic literature review in this study produces the existing evidence on EDM techniques that is reflected in Fig. 1.

Fig. 1 Framework of recommender system

12 Proposed CBR-RS Architectural Framework

The proposed CBR-RS architectural framework in Fig. 2 shows the flow and communication between three layers: the presentation, service, and data layers. The layered design permits scalability and ease of maintenance because responsibilities are distributed. The presentation layer provides a front-end interface for communication between the end user (a non-expert user) and the application. It has been designed using Java Swing, JavaScript, and J2EE so that users can interact easily with the educational recommender system, and it allows the non-expert user to identify a problem and define a query as input.

Fig. 2 CBR-RS architectural framework

The service layer offers APIs for the machine learning algorithm and the JCOLIBRI framework for building the case-based reasoning (CBR) system. The data layer gives the service layer access to the case base library while integrating the system with the ontology designed in Protégé. The data extractor is located in this layer; it handles execution tasks such as insert, update and delete, and performs all query operations in the educational recommender system.

Furthermore, to evaluate RS accuracy precisely, researchers use information-retrieval classification metrics that measure the probability that a correct or incorrect decision has been made. These metrics are precision and recall, given in Eqs. (5) and (6), and the F1-measure, given in Eq. (7), where TP, FP and FN denote the numbers of true positives, false positives and false negatives, and P and R denote precision and recall. To evaluate the CBR-RS application, we used these evaluation metrics to examine the performance of the algorithm:

$$\mathrm{Precision} =\frac{TP}{TP+ FP}$$
(5)
$$\mathrm{Recall}=\frac{TP}{TP+ FN}$$
(6)
$$\mathrm{F}1-\mathrm{Measure}=2\frac{PR}{P+R}$$
(7)

The basic k-NN classification structure implemented in the CBR-RS prototype is appropriate for finding similarity measures when k = 3 and k = 5, with an accuracy score of 90%, as shown in Table 2. Precision was computed using Eq. (5) as the number of appropriate cases chosen divided by the number of cases, while recall was computed using Eq. (6) as the number of appropriate cases chosen divided by the number of cases that have higher similarity. Higher precision implies a high probability that the retrieved cases are relevant.
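
As a worked illustration of Eqs. (5) to (7), the following Python sketch computes the three metrics from confusion counts. The function names are hypothetical and the counts are invented for the example; they are not results from this study. Returning 0 when a denominator is zero is an assumption made to keep the sketch total.

```python
def precision(tp, fp):
    """Eq. (5): share of retrieved cases that are appropriate."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # assumption: 0 when nothing is retrieved


def recall(tp, fn):
    """Eq. (6): share of appropriate cases that are retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # assumption: 0 when nothing is appropriate


def f1_measure(p, r):
    """Eq. (7): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Invented counts (assumption): 6 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 6, 2, 4

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_measure(p, r)
print(p, r, f1)
```

With these counts, precision is 6/8 and recall is 6/10, and the F1-measure lies between the two, closer to the smaller value, which is the characteristic behaviour of a harmonic mean.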

Table 2 Accuracy based on CBR-RS identified problems from Table 1

13 Conclusion

A CBR-RS was successfully prototyped and implemented, and the application was assessed using the F1-measure, which calculates the harmonic mean of precision and recall, to test the performance of the algorithm implemented through JCOLIBRI. The overall results show good performance of the case-based reasoning recommender system using a classification algorithm. However, it may be further tested on numerous diverse datasets. The proposed architectures and models served as a platform to develop and implement the CBR-RS application using the JCOLIBRI framework, allowing a non-expert user to define a query, retrieve from a case base library and adapt the solution by revising or retaining some components for future use.