
1 Introduction

An important stage in building knowledge-based systems (KBS) is the choice of a knowledge representation model for the knowledge base. The first KBS, which appeared in the 1970s, were rule-based. When implementing rule-based KBS, developers faced many challenges, chief among them the problem of eliciting knowledge and formulating it as a set of rules. Experts most often make decisions intuitively, drawing on their vast experience, without reflecting on which rules they apply in a given case. Decomposing the specific behavior of an expert into separate building blocks such as rules is a very complicated task requiring highly skilled specialists. Thus, knowledge acquisition is the key problem of rule-based reasoning (RBR) systems.

Since the 1980s, however, an alternative reasoning paradigm has attracted increasing attention. Case-based reasoning (CBR) solves new problems by adapting previously successful solutions to similar problems, much as a human does. The paper [1] by Schank is widely cited as the origin of CBR. In that work it was proposed to generalize knowledge about previous situations and store it in the form of scripts that can be used to draw conclusions under similar conditions. Later, Schank continued to explore the role that dynamic memory of previous situations (cases), represented as a knowledge container, plays both in decision making and in learning [2]. The dynamic memory model became the basis for a number of other systems: MEDIATOR [3], CHEF [4], PERSUADER [5], CASEY [6] and JULIA [7].

CBR made it possible to overcome a number of restrictions inherent in rule-based systems [8]. Knowledge acquisition in CBR reduces to identifying essential features and accumulating decision-making histories (cases) described by those features, which is a much easier task than building an explicit knowledge model of the application domain. There are different ways of representing and storing cases, from simple (linear) to complex hierarchical ones.

At the same time, traditional CBR has two essential shortcomings. The first is that case descriptions usually do not take into account deeper knowledge of the application domain. The second reveals itself when the number of cases accumulated in the knowledge base becomes large: a large case base reduces system performance, and very often the search dictionaries and similarity-determination algorithms have to be tuned manually. These shortcomings can neutralize the benefits of the case-based approach.

To overcome these disadvantages, CBR has been widely integrated with other methods in various application domains [9, 10]. Some systems (ADIOP, CADRE, CADSYN, CHARADE, COMPOSER, IDIOM, JULIA) integrated CBR with constraint satisfaction problem (CSP) solving. Others (ANAPRON, AUGUSTE, CAMPER, CABARET, GREBE, GYMEL and SAXEX) combined CBR with RBR. It is worth noting that the first prototype system integrating CBR with RBR was CABARET [11]. In [12] a possible combination of CBR with RBR and its application to the financial domain is proposed, implemented in the MARS prototype system.

In the present paper, an original approach is proposed that enables dynamic interaction between the CBR and RBR models of knowledge representation, with an application to the task of case classification. Section 2 presents a hybrid model of the dynamic interaction of CBR and RBR. In Sect. 3 we give a method for transforming a set of cases into a system of fuzzy classification rules and discuss its improvement. Section 4 contains experimental results and their discussion.

2 Dynamic Integration of CBR and RBR

The interaction between the employees of an organization in solving important tasks generates new knowledge. In the process of knowledge transformation, both of its forms are used: non-formalized and formalized. According to the SECI model (Socialization, Externalization, Combination, Internalization) [13], while new knowledge is being created, formalized (explicit) and non-formalized (tacit) knowledge interact in four ways: socialization, externalization, combination and internalization. The sequential alternation of these four processes creates a spiral of knowledge that develops consistently through the four stages.

In the previous section we described two basic approaches to knowledge representation: case-based and rule-based. Both models have their advantages and disadvantages, so in practice it is reasonable to combine the advantages of both. The combination of these approaches can be represented as a cyclic transformation of knowledge from the case-based form to the rule-based form and back, as shown in Fig. 1, by analogy with the Nonaka-Takeuchi spiral [13].

Fig. 1. Transformation of knowledge between case base and rule base

At the initial stages of the KBS life cycle, when there is little insight into the application domain, it is advisable to use a case-based model in which knowledge is represented by relevant precedents (cases) of decision making (stage I). To create a case it is necessary to determine a set of features that uniquely identify the situation, together with the specific solution made in that situation. A simple linear form of a case can be presented as an \( (n + 1) \)-dimensional tuple

$$ CASE = (x_{1} ,x_{2} ,x_{3} , \ldots ,x_{n} ,s), $$
(1)

where \( x_{1} ,x_{2} ,x_{3} , \ldots ,x_{n} \) are the features identifying the situation and s is the solution to the problem defined in the case. Subsequently, as understanding of the problem domain deepens, the case structure can be made more complex, for example, through the introduction of hierarchy and other relationships between the features.
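As a minimal illustration, such a linear case can be held directly in a tuple (the feature values below are a hypothetical Iris-style example, not data from this paper):

```python
# An (n+1)-dimensional case of the form (1): n feature values x_1..x_n
# followed by the solution s. The concrete values are illustrative only.
case = (5.1, 3.5, 1.4, 0.2, "setosa")

features, solution = case[:-1], case[-1]
```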

Already at the earliest stages of KBS development it is possible to extract, adapt and apply cases in solving current problems. After a case from the knowledge base is applied to the current situation, a new precedent is recorded in the case base for future use. Note that, in terms of the Nonaka-Takeuchi cycle, no formalization of knowledge is carried out here. Moreover, we do not try to analyze why the solution was made (i.e., we do not try to transform tacit knowledge into a formal representation); we simply record the tacit knowledge embodied in a particular case.

As soon as a sufficient volume of decision-making cases is accumulated and our knowledge is deepened by analyzing the case base, it becomes possible to formalize knowledge, transforming tacit knowledge into explicit knowledge (stage II). Machine learning offers a variety of approaches to extracting knowledge from data (simple cases). In this article we consider the problem of obtaining a set of formal rules for case classification. On the one hand, classification arranges the space of cases so as to improve retrieval performance. On the other hand, classification rules are formal knowledge of a higher level than simple cases: they can be meaningfully analyzed by an expert, who may obtain new information for future decision making.

If we have simple cases of the form (1), we could apply, for example, the decision tree method [14] to this sample of cases, in which the intermediate nodes of the tree correspond to the features \( x_{1} ,x_{2} ,x_{3} , \ldots ,x_{n} \) and the terminal nodes are the classes to which solutions belong. Another method of forming a set of fuzzy classification rules, based on the original papers [15,16,17], is given in the next section.

The transition to a rule-based model means that we obtain an explicit (formalized) form of knowledge, capable of explaining cause-and-effect relationships in the application domain. Such rules can be presented to experts for analysis and interpretation (stage III of the knowledge transformation cycle). The important point at this stage is the resolution of conflicts arising in the rule system (conflicting rules lead to different conclusions from the same premises).

At stage IV, new cases are accumulated using the current version of the decision support system. New cases can be obtained by applying logical inference methods to the knowledge base consisting of rules and facts (cases), and also as a result of the CBR cycle, which adapts old cases from the knowledge base to new problems. Obtaining new cases means using formal (explicit) knowledge and turning it into the implicit (tacit) form.

The considered knowledge transformation cycle is then repeated at the next level. As our knowledge of the application domain deepens, the case structure can be made more complex by moving from the linear parametric form to a hierarchical or more complex logical form. At each stage of the proposed cycle, implicit knowledge is converted into the explicit form or vice versa, and each new turn of the spiral adds new aspects and dimensions to the knowledge of the application domain.

3 Method of Cases Classification Through Fuzzy Rules

Here we consider a method of obtaining fuzzy rules from cases, applied to the problem of case classification. The idea of the method is as follows. Consider an arbitrary case of the form (1), where the features \( x_{1} \in X_{1} ,x_{2} \in X_{2} ,x_{3} \in X_{3} , \ldots ,x_{n} \in X_{n} \), and the solution variable s belongs to a certain set of classes \( D = \{ d_{1} ,d_{2} , \ldots ,d_{m} \} \). Without loss of generality we assume that the features are numerical and the sets \( X_{1} ,X_{2} ,X_{3} , \ldots ,X_{n} \) are continuous numerical intervals.

Define linguistic variables \( V_{i} \) corresponding to the features \( x_{i} \), \( i = \overline{1,n} \). Let each variable \( V_{i} \) have J term-values \( A_{i}^{(j)} \), which are fuzzy sets determined by membership functions \( \mu_{{A_{i}^{(j)} }} \) on the universal sets \( X_{i} \), \( j = \overline{1,J} \). Then the rule \( Rule \) corresponding to the case (1) is formulated as follows:

$$ Rule:\quad IF\;V_{1} = F(x_{1} )\& \ldots \& V_{n} = F(x_{n} )\;THEN\;R = d $$
(2)

where R is the classifying variable, \( d \in D \), and \( F(x_{i} ) \) is the fuzzy term-value that the linguistic variable \( V_{i} \) takes as a result of fuzzification of the feature \( x_{i} \):

$$ F(x_{i} ) = A_{i}^{{(j^{*} )}} ,{\text{where}}\,j^{*} = \arg \,\mathop {\hbox{max} }\limits_{{j = \overline{1,J} }} \mu_{{A_{i}^{(j)} }} (x_{i} ). $$
(3)
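The fuzzification step (3) can be sketched as follows. The triangular shape of the membership functions is an assumption for illustration; the method only requires some family \( \mu_{{A_{i}^{(j)} }} \):

```python
def triangular(a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c] (a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def fuzzify(x, terms):
    """Formula (3): index j* of the term with maximal membership degree at x."""
    degrees = [mu(x) for mu in terms]
    return max(range(len(terms)), key=lambda j: degrees[j])

# Three overlapping terms ("low", "medium", "high") on the interval [0, 1]:
terms = [triangular(-0.5, 0.0, 0.5),
         triangular(0.0, 0.5, 1.0),
         triangular(0.5, 1.0, 1.5)]
```

For example, `fuzzify(0.1, terms)` selects the first ("low") term, since its membership degree 0.8 exceeds the others.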

The knowledge base, consisting of a set of rules derived from the initial case base, is then formed in accordance with the following algorithm.

An algorithm for obtaining fuzzy classification rules from a case base

  1. Step 1.

    Define the number and the type of membership functions and universal sets \( X_{i} \) for each linguistic variable \( V_{i} \) corresponding to the feature \( x_{i} \).

  2. Step 2.

    Determine the fuzzy rule \( Rule \) of the form (2) for each case of the form (1).

  3. Step 3.

    Assign a truth degree \( TD(Rule) \) to each rule of the form (2). The simplest way is to take the minimum, over all features \( x_{i} \), of the membership degrees \( \mu_{{A_{i}^{{(j^{*} )}} }} (x_{i} ) \), where the fuzzy sets \( A_{i}^{{(j^{*} )}} \) are defined by formula (3) as a result of fuzzification:

    $$ TD(Rule) = \mathop {\hbox{min} }\limits_{{i = \overline{1,n} }} \;\mu_{{A_{i}^{{(j^{*} )}} }} (x_{i} ) = \mathop {\hbox{min} }\limits_{{i = \overline{1,n} }} \mathop {\hbox{max} }\limits_{{j = \overline{1,J} }} \mu_{{A_{i}^{(j)} }} (x_{i} ). $$
    (4)
  4. Step 4.

    Resolve conflicts between the rules. After Step 2 we obtain a set of rules uniquely corresponding to the set of cases, but this set can contain subsets of rules with identical premises. Such rules can have the same or different conclusions (solutions \( d \in D \)). If all rules in such a subset have the same conclusion, we simply remove all duplicate rules except one. For rules with conflicting conclusions, we propose two strategies for the classification problem; in either case, only one rule from the conflicting subset is left in the rule base. Suppose we have a subset of rules \( Rule_{1} \),…,\( Rule_{m} \) with the same premise. Divide the index set \( I = \left\{ {1,2, \ldots ,m} \right\} \) into K subsets \( I_{1} \),\( I_{2} \),…,\( I_{K} \) corresponding to the solution classes \( d_{1} \), \( d_{2} \),…, \( d_{K} \) in the conclusions of the rules (2), \( \left| {I_{k} } \right| = m_{k} \), \( \sum\limits_{k = 1}^{K} {m_{k} } = m \).

The first strategy of conflict resolution is to find the optimal rule

$$ l^{*} = \arg \mathop {\hbox{max} }\limits_{l \in I} TD(Rule_{l} ), $$
(5)

and then to choose the solution d corresponding to the conclusion of this optimal rule. The second strategy is slightly more complex: the optimal solution \( d_{{k^{*} }} \) assigned to the conflicting set of rules is obtained as follows

$$ k^{*} = \arg \mathop {\hbox{max} }\limits_{k} \sum\limits_{{i \in I_{k} }} {TD(Rule_{i} )} . $$
(6)
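Steps 3 and 4 above can be sketched as follows. The representation of conflicting rules as (truth degree, conclusion) pairs and the hand-made membership functions are assumptions made for illustration:

```python
from collections import defaultdict

def truth_degree(xs, variables):
    """Formula (4): TD(Rule) = min over features i of max over terms j of mu(x_i).

    xs        -- feature values (x_1, ..., x_n) of the case
    variables -- variables[i] is the list of membership functions of V_i
    """
    return min(max(mu(x) for mu in terms) for x, terms in zip(xs, variables))

def resolve_max_rule(rules):
    """Strategy 1, formula (5): take the conclusion of the rule with maximal TD."""
    return max(rules, key=lambda r: r[0])[1]

def resolve_sum_by_class(rules):
    """Strategy 2, formula (6): choose the class with the maximal total TD."""
    totals = defaultdict(float)
    for td, d in rules:
        totals[d] += td
    return max(totals, key=totals.get)

# Illustrative data: two features, each with two simple membership functions.
V1 = [lambda x: max(0.0, 1 - abs(x - 0.0)), lambda x: max(0.0, 1 - abs(x - 1.0))]
V2 = list(V1)
conflict = [(0.9, "d1"), (0.6, "d2"), (0.5, "d2")]  # rules sharing one premise
```

Note that on the sample conflict set the two strategies pick different classes: strategy 1 follows the single strongest rule (class d1, TD 0.9), while strategy 2 favors the class with the greater cumulative truth degree (class d2, total TD 1.1).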

4 Experimental Results

In this section we investigate the proposed approach on the Iris data set (available at http://archive.ics.uci.edu/ml/), which contains 150 cases characterized by 4 features (\( n = 4 \)) and classified into 3 classes (\( K = 3 \)) by a classifying attribute.

In the first experiment, we randomly selected a training sample of a given size (120 cases) from the Iris set, while the rest of the data (30 cases) was used as a control sample. Then, for each feature, minimum and maximum values were determined from the training sample; the obtained intervals were used as the universal sets for the corresponding linguistic variables. Each universal set \( X_{i} \) was then divided into equal intervals for three variants of the number of term-values \( A_{i}^{(j)} \), \( j = \overline{1,J} \): \( J = \) 3, 5 and 7 (J does not depend on i), giving membership function sets 3MF, 5MF and 7MF. We also considered two classes of membership functions: linear (class T) and quadratic (class S).
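The partitioning described above can be sketched as follows: for each feature, J membership functions with evenly spaced peaks are placed on the interval [min, max] taken from the training sample. The triangular shape for the linear class T is an assumption based on the description in the text:

```python
def linear_partition(lo, hi, J):
    """Build J triangular (class T) membership functions with evenly
    spaced peaks on the universal set [lo, hi], J >= 2."""
    step = (hi - lo) / (J - 1)
    def tri(peak):
        def mu(x):
            return max(0.0, 1.0 - abs(x - peak) / step)
        return mu
    return [tri(lo + j * step) for j in range(J)]

mfs = linear_partition(0.0, 1.0, 3)  # the 3MF variant on a unit interval
```

Each adjacent pair of functions overlaps so that every point of the universal set has a nonzero membership in at least one term.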

Then the set of fuzzy classification rules obtained by the machine learning algorithm was applied to the control sample to compute the classification accuracy as the ratio of correctly classified cases to the total size of the control sample. Table 1 presents the classification accuracy for two strategies of control sample classification with linear membership functions (3MF, 5MF and 7MF).

Table 1. Classification accuracy for two strategies of control sample classification with linear membership functions (3MF, 5MF and 7MF)

From Table 1 we can see that the best results were obtained for 3MF. The reduced accuracy for 5MF and 7MF can be explained by the fact that the number of possible combinations of rule preconditions increases significantly, so classification on the control sample may fail for lack of an appropriate rule for the case being classified. We therefore compare two strategies of control case classification. In the first strategy, each rule is applied only to exactly matching control cases; if there is no appropriate rule, a misclassification is registered. In the second strategy, if there is no appropriate rule to classify a control case, we exclude one feature and look for a classifying rule over the reduced set of features (here we assume that the excluded feature is not important for classification); the first appropriate rule found is applied to classify the case. The second strategy improves the classification results. The experiments were repeated ten times, each time with a randomly chosen training set.
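The two control-case classification strategies can be sketched as follows. The encoding of a premise as a tuple of term indices (one per feature) is an assumption about the data layout:

```python
def classify(case_terms, rules):
    """Classify a fuzzified control case given as a tuple of term indices.

    rules -- list of (premise, d) pairs, premise being a tuple of term indices.
    Strategy 1: apply a rule only if its premise matches the case exactly.
    Strategy 2 (fallback): if no exact match, exclude one feature at a time and
    use the first rule whose premise agrees on all remaining features.
    """
    for premise, d in rules:           # strategy 1: exact match
        if premise == case_terms:
            return d
    n = len(case_terms)
    for drop in range(n):              # strategy 2: drop one feature
        for premise, d in rules:
            if all(premise[i] == case_terms[i] for i in range(n) if i != drop):
                return d
    return None                        # no applicable rule: a misclassification
```

For example, with `rules = [((0, 1, 2), "d1"), ((1, 1, 0), "d2")]`, the case `(2, 1, 0)` has no exact match, but dropping the first feature makes it match the second rule.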

Table 2 gives the classification accuracy for linear and quadratic membership functions (classes T and S, for 3MF). We can see that the smoother membership functions (class S) yield better classification accuracy.

Table 2. Classification accuracy for two strategies of control sample classification with linear and quadratic membership functions (class T and S for 3MF)

Figure 2 visually compares the three variants of linear membership functions (3MF, 5MF and 7MF) for a sequentially decreasing training sample size. The classification accuracy remains quite high as the training sample shrinks from 90 cases to 30. This can be explained by the fact that the number of rules obtained from the initial training sample of 150 cases never exceeded 17 in our experiments; even when fewer cases are chosen randomly for training, the information content (representativeness) of the sample remains high.

Fig. 2. Classification accuracy for sequentially reducing volume of training sample, for linear membership functions (T): 3MF, 5MF and 7MF

Thus, we have proposed a rather simple method of generating classification rules from a case base that can be used in the knowledge transformation cycle of a KBS. Despite its simplicity, the method achieves quite high classification accuracy even for sequentially reduced training samples. We believe that the accuracy can be improved further by extending the rule structure (2) with disjunctions in the conjunctive premises, as well as by reducing the number of rule premises to the most informative ones; this is the subject of further research.