
1 Introduction

Case-based reasoning (CBR) [18] aims at solving a new problem—the target problem—using a set of cases (the case base), where a case is a pair consisting of a problem and a solution of this problem. A source case is a case from the case base, consisting of a source problem and one of its solutions. The classical approach to CBR consists in (i) selecting source cases similar to the target problem and (ii) adapting them to solve it. In such a view, the target and the source are compared in terms of similarity, which is a (two-valued or gradual) binary relation, while information about the way they differ is generally not taken into account.

In this paper, in addition to the binary relation of similarity (which is the basis of approximation), two other relations are considered: betweenness and analogical proportion, which, in one way or another, leave room for dissimilarity.

Betweenness is a ternary relation stating that an object is “between” two other objects. It is used as an interpolation principle: if the target problem is between two source problems, then it is plausible that its solution is between the solutions of these source problems. In a graded setting, this suggests that if the target problem is closer to one of the two source problems, its solution should also be closer to the solution of the corresponding source problem. In the Boolean setting, the betweenness view may lead to several potential solutions, except when the solutions of the two source problems coincide.

Analogical proportion is a quaternary relation: four objects a, b, c and d are in analogical proportion if “a is to b as c is to d”. A logical modeling of it [14, 16] has pointed out that it expresses that “a differs from b as c differs from d (and vice-versa)”, and that “what a and c have in common, b and d also have in common”. Thus, analogical proportion is a matter of both dissimilarity and similarity. The fact that we are no longer considering similarity only enables us to escape the strict vicinity of known cases, and to perform a form of adaptation for free. More precisely, in a CBR perspective, we use it as an extrapolation principle: if the target problem and three source problems are in analogical proportion, the solutions of these four problems are (likely to be) in analogical proportion as well. Such an analogical jump enables us to extrapolate the solution of the target problem from the solutions of three distinct source cases. An illustration of this is given in [2], where in three distinct situations the recommended actions are respectively to (i) serve tea without milk without sugar, (ii) serve tea with milk without sugar, (iii) serve tea without milk with sugar, while in a fourth situation that makes an analogical proportion with the three others, the action to take would be (iv) “serve tea with milk with sugar”.

The approach described in this paper combines the use of closeness, betweenness and analogical proportion for a knowledge-light approach to CBR. In fact, the only source of knowledge used in the inference lies in the case base: there is neither domain knowledge nor adaptation knowledge, and similarity is based on some distance function.

Section 2 introduces the notions and notations used throughout the paper and the assumptions it is based on. Section 3 describes the approach for applying approximation, interpolation and extrapolation to CBR. Section 4 provides an evaluation in a Boolean setting. Section 5 presents a discussion and a comparison with related work, while Sect. 6 points out lines for future research.

2 Definitions, Notations and Assumptions

In this section, definitions are presented in a Boolean setting (objects are tuples of Boolean values).

2.1 Boolean Setting

Let \(\mathbb {B}=\{0, 1\}\) be the set of Boolean values. The Boolean operators are denoted by the connector symbols of propositional logic: for \(a, b\in \mathbb {B}\), \(\lnot {}a=1-a\), \(a\wedge {}b=\min (a, b)\), \(a\vee {}b=\max (a, b)\), \(a\mathrel {\oplus }{}b=|b-a|\) (\(\mathrel {\oplus }\) is the exclusive or) and \(a\mathrel {\equiv }{}b=\lnot (a\mathrel {\oplus }{}b)\).

Let p be a positive integer. In the examples, an element of \(\mathbb {B}^p\) is noted without parentheses and commas: (0, 1, 0, 0, 1) is simply noted by 01001. The Hamming distance \(H\) on \(\mathbb {B}^p\) is defined by \(H(a, b)=\sum _{i=1}^p|b_i-a_i|\). For example, with \(p=5\), \(H(01001, 11011)=2\).
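As a quick illustration (the paper itself provides no code; this Python transcription and its identifiers are ours), the Hamming distance on Boolean tuples can be computed as:

```python
def hamming(a, b):
    """Hamming distance on B^p: the number of components where a and b differ."""
    return sum(abs(bi - ai) for ai, bi in zip(a, b))

# Example from the text, with p = 5: H(01001, 11011) = 2
# (the tuples differ in their first and fourth components).
```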

2.2 CBR

CBR aims at solving the target problem with the help of the case base. It usually consists of (1) selecting relevant source cases (retrieval) and (2) reusing these source cases in order to solve the target problem (adaptation). The other classical steps of CBR are not considered in this paper: they concern the validation-repair of the newly formed case and its potential addition to the case base.

Let \({\mathcal {P}}\) and \({\mathcal {S}}\) be two sets called the universe of problems and the universe of solutions: a problem x (resp., a solution y) is by definition an element of \({\mathcal {P}}\) (resp., of \({\mathcal {S}}\)). Here, \({\mathcal {P}}=\mathbb {B}^m\), where \(m\ge 1\) is a constant. Similarly, \({\mathcal {S}}=\mathbb {B}^n\), \(n\ge 1\). A binary relation on \({\mathcal {P}}\times {\mathcal {S}}\) denoted by \(\mathrel {\leadsto }\) and read “has for solution” is assumed to exist. Thus, “\(\mathtt{{y}}\) solves \(\mathtt{{x}}\)” is denoted by \(\mathtt{{x}}\mathrel {\leadsto }\mathtt{{y}}\). A case is a pair \((\mathtt{{x}}, \mathtt{{y}})\in {\mathcal {P}}\times {\mathcal {S}}\) such that \(\mathtt{{x}}\mathrel {\leadsto }\mathtt{{y}}\). The case base \(\mathtt{{CB}}\) is a finite set of cases. A source case is an element of \(\mathtt{{CB}}\). The target problem is the problem to be solved. Note that \(\mathrel {\leadsto }\) is usually not completely known to the CBR system: such a system provides plausible solutions to problems, on the basis of what is known of \(\mathrel {\leadsto }\) from the case base.

In some situations, it is assumed that \(\mathrel {\leadsto }\) is functional. This assumption means that there exists a function \(\mathtt{{f}}: {\mathcal {P}}\mathrel {\rightarrow }{\mathcal {S}}\) such that \(\mathtt{{x}}\mathrel {\leadsto }\mathtt{{y}}\) iff \(\mathtt{{y}}=\mathtt{{f}}(\mathtt{{x}})\), for any \(\mathtt{{x}}\in {\mathcal {P}}\) and \(\mathtt{{y}}\in {\mathcal {S}}\).

2.3 Betweenness

Let \(\mathcal {U}\) be a set whose elements are represented by numerical features (including Boolean features). Let \(a, b, c\in \mathcal {U}\); a is between b and c, denoted by \(b{{\pmb {-}}}a{{\pmb {-}}}c\), if for every feature i, (\((b_i\le {}a_i\le {}c_i)\) or \((c_i\le {}a_i\le {}b_i)\)). Let \(\mathtt{{Between}}(b, c)=\{a\in \mathcal {U}~|~b{{\pmb {-}}}a{{\pmb {-}}}c\}\), for \(b, c\in \mathcal {U}\). For example, in \(\mathcal {U}=\mathbb {B}^5\), \(\mathtt{{Between}}(01001, 11011) = \{01001, 11001, 01011, 11011\}\). For a logical view on betweenness, we refer to [22].
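A direct transcription of this definition (a Python sketch of ours, with betweenness tested feature by feature and Between(b, c) obtained by brute-force enumeration, which is fine for small p):

```python
from itertools import product

def is_between(b, a, c):
    """b -a- c: each feature of a lies between the corresponding features of b and c."""
    return all(min(bi, ci) <= ai <= max(bi, ci) for bi, ai, ci in zip(b, a, c))

def between_set(b, c):
    """Between(b, c) in the Boolean universe B^p, by enumeration."""
    return {a for a in product((0, 1), repeat=len(b)) if is_between(b, a, c)}

# Between(01001, 11011): the components where b and c agree are fixed,
# the two components where they differ are free, hence 2^2 = 4 elements.
```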

2.4 Analogical Proportion

Let \(\mathcal {U}\) be a set. An analogical proportion on \(\mathcal {U}\) is a quaternary relation between four elements a, b, c and d of \(\mathcal {U}\), read “a is to b as c is to d” and denoted by \(a\mathtt{{:}}b\mathtt{{{:}{:}}}c\mathtt{{:}}d\), having the following properties (see, e.g. [16]), for any \(a, b, c, d\in \mathcal {U}\):

 

  • reflexivity: \(a\mathtt{{:}}b\mathtt{{{:}{:}}}a\mathtt{{:}}b\);

  • symmetry: if \(a\mathtt{{:}}b\mathtt{{{:}{:}}}c\mathtt{{:}}d\) then \(c\mathtt{{:}}d\mathtt{{{:}{:}}}a\mathtt{{:}}b\);

  • central permutation: if \(a\mathtt{{:}}b\mathtt{{{:}{:}}}c\mathtt{{:}}d\) then \(a\mathtt{{:}}c\mathtt{{{:}{:}}}b\mathtt{{:}}d\).

 

In the Boolean setting, the analogical proportion considered is defined on \(\mathbb {B}\) by

$$\begin{aligned} a\mathtt{{:}}b\mathtt{{{:}{:}}}c\mathtt{{:}}d \quad \text {if } (a\wedge \lnot {}b\mathrel {\equiv }{}c\wedge \lnot {}d) \wedge (\lnot {}a\wedge {}b\mathrel {\equiv }\lnot {c}\wedge {}d) =1 \end{aligned}$$

that can be read “a differs from b as c differs from d” and “b differs from a as d differs from c”. This can also be rewritten \(b-a=d-c\), but mind that these differences belong to \(\{{-}1, 0, 1\}\) (− is not an operation that is closed in \(\mathbb {B}\)). Thus, the patterns abcd such that \(a\mathtt{{:}}b\mathtt{{{:}{:}}}c\mathtt{{:}}d\) are 0000, 1111, 0011, 1100, 0101 and 1010.

This analogical proportion can be extended on \(\mathcal {U}=\mathbb {B}^p\) by

$$\begin{aligned} a\mathtt{{:}}b\mathtt{{{:}{:}}}c\mathtt{{:}}d \quad \text {if } a_i\mathtt{{:}}b_i\mathtt{{{:}{:}}}c_i\mathtt{{:}}d_i \text { for each } i\in [1, p] \end{aligned}$$

Given \(a, b, c\in \mathcal {U}\), solving the analogical equation \(a\mathtt{{:}}b\mathtt{{{:}{:}}}c\mathtt{{:}}y\) aims at finding the \(y\in \mathcal {U}\) satisfying this relation. It may have no solution, e.g., when \(a=0\), \(b=1\) and \(c=1\). The equation \(a\mathtt{{:}}b\mathtt{{{:}{:}}}c\mathtt{{:}}y\) in \(\mathbb {B}\) has a solution iff \((a\mathrel {\equiv }{}b)\vee (a\mathrel {\equiv }{}c)=1\) and, when this is the case, the solution is unique: \(y=c\mathrel {\equiv }(a\mathrel {\equiv }{}b)\).
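The Boolean proportion and its equation solving translate directly into code; the sketch below (identifiers ours) works componentwise on tuples of \(\mathbb {B}^p\), using \(b-a=d-c\) for the proportion and \(y=c\mathrel {\equiv }(a\mathrel {\equiv }{}b)\) for the unique solution:

```python
def in_analogy(a, b, c, d):
    """Boolean analogical proportion a:b::c:d, componentwise: b - a = d - c."""
    return all(bi - ai == di - ci for ai, bi, ci, di in zip(a, b, c, d))

def solve_analogy(a, b, c):
    """Unique y such that a:b::c:y, or None when some component is unsolvable
    (solvability requires a_i = b_i or a_i = c_i on every component)."""
    if any(ai != bi and ai != ci for ai, bi, ci in zip(a, b, c)):
        return None
    # y_i = c_i when a_i = b_i, and y_i = b_i when a_i = c_i  (y = c ≡ (a ≡ b))
    return tuple(ci if ai == bi else bi for ai, bi, ci in zip(a, b, c))
```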

3 Reusing Cases by Approximation, Interpolation and Extrapolation

This section describes the three mentioned approaches: approximation, interpolation and extrapolation. For an integer \(k\ge 1\), case retrieval can be done by considering ordered sets of k source cases. This principle is detailed in Sect. 3.2 and applied in Sects. 3.3, 3.4 and 3.5 respectively for \(k=1\), 2, and 3. The combination of these three approaches is discussed in Sect. 3.6. Let us start with an example.

3.1 A Basic Example

In order to support the intuition, we consider the following example where a suitable dish type (described via 3 two-valued attributes, i.e., \({\mathcal {S}}=\mathbb {B}^3\)) has to be suggested to an individual (described via 8 two-valued attributes, i.e., \({\mathcal {P}}=\mathbb {B}^8\)). The 8 attributes representing an individual \(\mathtt{{x}}\) have the following semantics: \(\mathtt{{x}}_1\): \(\mathtt{{x}}\) suffers from gout, \(\mathtt{{x}}_2\): \(\mathtt{{x}}\) has diabetes, \(\mathtt{{x}}_3\): \(\mathtt{{x}}\) is allergic to nuts, \(\mathtt{{x}}_4\): \(\mathtt{{x}}\) does not eat mammal meat (beef, pork, etc.), \(\mathtt{{x}}_5\): \(\mathtt{{x}}\) needs to have a regular calcium supplement, \(\mathtt{{x}}_6\): \(\mathtt{{x}}\) needs to have a regular iron supplement, \(\mathtt{{x}}_7\): \(\mathtt{{x}}\) likes vegetables, and \(\mathtt{{x}}_8\): \(\mathtt{{x}}\) does not like dairy products. A dish type \(\mathtt{{y}}\) is represented via 3 attributes: \(\mathtt{{y}}_1\): \(\mathtt{{y}}\) is a dish with sauce, \(\mathtt{{y}}_2\): \(\mathtt{{y}}\) is based on starchy food (e.g., a pasta dish), \(\mathtt{{y}}_3\): \(\mathtt{{y}}\) is a dish with fish. \(\mathtt{{x}}\mathrel {\leadsto }\mathtt{{y}}\) can be read as: \(\mathtt{{y}}\) is a suitable dish type for \(\mathtt{{x}}\). For a healthy individual 00010010 (with no specific requirement), all dishes are suitable. Therefore, \(\mathrel {\leadsto }\) is not functional in this application, as several types of dishes might be suitable for the same individual. The 3 approaches of case reuse developed in this paper can be applied to this application as follows (where \(\mathtt{{y}}^{j}\) is a suitable class of dish for \(\mathtt{{x}}^{j}\)):

  • Approximation: If an individual \(\mathtt{{x}}^{\mathtt{{tgt}}}\) is not far from \(\mathtt{{x}}^{1}\), it is plausible that a suitable dish for \(\mathtt{{x}}^{\mathtt{{tgt}}}\) will not be far from a suitable dish \(\mathtt{{y}}^{1}\) for \(\mathtt{{x}}^{1}\).

  • Interpolation: If an individual \(\mathtt{{x}}^{\mathtt{{tgt}}}\) is between \(\mathtt{{x}}^{1}\) and \(\mathtt{{x}}^{2}\), it is plausible that a suitable dish for \(\mathtt{{x}}^{\mathtt{{tgt}}}\) will be between a suitable dish \(\mathtt{{y}}^{1}\) for \(\mathtt{{x}}^{1}\) and a suitable dish \(\mathtt{{y}}^{2}\) for \(\mathtt{{x}}^{2}\).

  • Extrapolation: If an individual \(\mathtt{{x}}^{\mathtt{{tgt}}}\) is as similar to \(\mathtt{{x}}^{3}\) as \(\mathtt{{x}}^{2}\) is similar to \(\mathtt{{x}}^{1}\), it is plausible that a suitable dish for \(\mathtt{{x}}^{\mathtt{{tgt}}}\) will be as similar to \(\mathtt{{y}}^{3}\) as \(\mathtt{{y}}^{2}\) is similar to \(\mathtt{{y}}^{1}\).

3.2 General Principle

Let \(k\in \{1, 2, 3\}\): the approach presented here covers approximation (\(k=1\)), interpolation (\(k=2\)) and extrapolation (\(k=3\)). Two \((k+1)\)-ary relations are considered: \(\mathtt{{Rel}}_{\mathcal {P}}\) on \({\mathcal {P}}\) and \(\mathtt{{Rel}}_{\mathcal {S}}\) on \({\mathcal {S}}\). Ideally, it would be assumed that these relations have the following properties, for \((\mathtt{{x}}^{1}, \ldots , \mathtt{{x}}^{k}, \mathtt{{x}}^{k+1})\in {\mathcal {P}}^{k+1}\) and \((\mathtt{{y}}^{1}, \ldots , \mathtt{{y}}^{k}, \mathtt{{y}}^{k+1})\in {\mathcal {S}}^{k+1}\), with the hypothesis that \(\forall j\in [1 ; k+1], \mathtt{{x}}^{j}\mathrel {\leadsto }\mathtt{{y}}^{j}\):

$$\begin{aligned}&\text {if } \mathtt{{Rel}}_{\mathcal {P}}\left( \mathtt{{x}}^{1}, \ldots , \mathtt{{x}}^{k}, \mathtt{{x}}^{k+1}\right) \text { then } \mathtt{{Rel}}_{\mathcal {S}}\left( \mathtt{{y}}^{1}, \ldots , \mathtt{{y}}^{k}, \mathtt{{y}}^{k+1}\right) \end{aligned}$$

However, this assumption is usually too strong: since the relation \(\mathrel {\leadsto }\) is only partially known to the CBR system, it seems odd to have such a certain relationship about \(\mathrel {\leadsto }\) given by the pair \((\mathtt{{Rel}}_{\mathcal {P}}, \mathtt{{Rel}}_{\mathcal {S}})\). So, only the following relaxed form of this property is assumed:

$$\begin{aligned}&\text {if } \mathtt{{Rel}}_{\mathcal {P}}\left( \mathtt{{x}}^{1}, \ldots , \mathtt{{x}}^{k}, \mathtt{{x}}^{k+1}\right) \text {and } \mathtt{{x}}^{j}\mathrel {\leadsto }\mathtt{{y}}^{j} \text { for } j\in [1 ; k+1] \nonumber \\&\text {then } \text {it is plausible that } \mathtt{{Rel}}_{\mathcal {S}}\left( \mathtt{{y}}^{1}, \ldots , \mathtt{{y}}^{k}, \mathtt{{y}}^{k+1}\right) \end{aligned}$$
(1)

This property (1) can be used for CBR. Let \(\mathtt{{x}}^{\mathtt{{tgt}}}\) be the target problem. A candidate is an ordered set of k cases \(((\mathtt{{x}}^{1}, \mathtt{{y}}^{1}), \ldots , (\mathtt{{x}}^{k}, \mathtt{{y}}^{k}))\in \mathtt{{CB}}^k\) such that \(\mathtt{{Rel}}_{\mathcal {P}}(\mathtt{{x}}^{1}, \ldots , \mathtt{{x}}^{k}, \mathtt{{x}}^{\mathtt{{tgt}}})\). Based on this notion, the following CBR steps can be specified:

 

  • retrieval: The set of candidates is computed.

  • adaptation: For a candidate \(((\mathtt{{x}}^{1}, \mathtt{{y}}^{1}), \ldots , (\mathtt{{x}}^{k}, \mathtt{{y}}^{k}))\), it is plausible that the solution \(\mathtt{{y}}^{\mathtt{{tgt}}}\) of \(\mathtt{{x}}^{\mathtt{{tgt}}}\) satisfies \(\mathtt{{Rel}}_{\mathcal {S}}(\mathtt{{y}}^{1}, \ldots , \mathtt{{y}}^{k}, \mathtt{{y}}^{\mathtt{{tgt}}})\).

 

Let \(\mathtt{{Candidates}}\) be the set of candidates. When \(\mathtt{{Candidates}}=\emptyset \), the approach fails. Let \(\mathtt{{potentialSols}}\) be the multiset of the plausible solutions \(\mathtt{{y}}\in {\mathcal {S}}\) obtained from the candidates. When there are several distinct solutions in this multiset, a function is needed to integrate these solutions into one. Let \(\mathtt{{integrate}}: \mathtt{{potentialSols}}\mapsto \mathtt{{y}}\in {\mathcal {S}}\) be such a function. The nature of this function may differ, depending on the solution space. When \({\mathcal {S}}=\mathbb {B}\) (or, more generally, a set of low cardinality), \(\mathtt{{integrate}}\) can be a simple vote. When \({\mathcal {S}}=\mathbb {B}^n\), the integration can be a component-by-component vote, for example:

$$\begin{aligned} \mathtt{{integrate}}(\{\{001, 001, 010, 111, 111\}\}) = 011 \end{aligned}$$

(with an arbitrary choice in case of ties).
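A minimal sketch of this componentwise vote (the multiset is represented as a Python list; ties are broken arbitrarily by Counter, mirroring the arbitrary choice mentioned above):

```python
from collections import Counter

def integrate(potential_sols):
    """Componentwise majority vote over a multiset of solutions in B^n."""
    n = len(potential_sols[0])
    return tuple(Counter(y[i] for y in potential_sols).most_common(1)[0][0]
                 for i in range(n))

# integrate({{001, 001, 010, 111, 111}}) votes 0/1 per component, giving 011.
```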

3.3 Reusing Cases by Singletons: Approximation

When \(k=1\), candidates are singletons and source cases are considered individually in relation to the target problem. Usually, the binary relations \(\mathtt{{Rel}}_{\mathcal {P}}\) and \(\mathtt{{Rel}}_{\mathcal {S}}\) are related to the notion of similarity and are denoted by \(\mathtt{{x}}^{1}\simeq \mathtt{{x}}^{2}\) and \(\mathtt{{y}}^{1}\simeq \mathtt{{y}}^{2}\). When a distance \(\mathtt{{dist}}\) is available on the universe, \(\simeq \) is defined by \(a\simeq {}b\) iff \(\mathtt{{dist}}(a, b)\le \tau _{\mathtt{{dist}}}\), where \(\tau _{\mathtt{{dist}}}>0\) is a fixed threshold. Applying the general principle (1) leads to:

$$\begin{aligned} \begin{aligned}&\text {if } \mathtt{{x}}^{1}\simeq \mathtt{{x}}^{2} \text { and } \mathtt{{x}}^{j}\mathrel {\leadsto }\mathtt{{y}}^{j} \text { for } j\in \{1, 2\} \\&\text {then } \text {it is plausible that } \mathtt{{y}}^{1}\simeq \mathtt{{y}}^{2} \end{aligned} \end{aligned}$$

i.e., similar problems have similar solutions, a principle often emphasized in CBR (see e.g., [9]).

Fig. 1. The approximation method.

Back to our initial example, let us consider 2 individuals with close profiles (in terms of Hamming distance), e.g., \(\mathtt{{x}}^{1}= 00010010\) and \(\mathtt{{x}}^{2}= 00010011\). In that example, \(H(\mathtt{{x}}^{1}, \mathtt{{x}}^{2})=1\) (\(\mathtt{{x}}^{2}\) does not like dairy products), and a dish type \(\mathtt{{y}}^{2}=011\) without sauce, at distance 1 from \(\mathtt{{y}}^{1}=111\) (a dish type suitable for \(\mathtt{{x}}^{1}\)), will be suitable for \(\mathtt{{x}}^{2}\).

Figure 1 summarizes the approximation method with an algorithm.
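Since Fig. 1 is not reproduced here, the following Python sketch gives our reading of the approximation method: retrieve every source case whose problem is within \(\tau _{\mathtt{{dist}}}\) of the target (for the Hamming distance), then integrate the retrieved solutions by a componentwise vote.

```python
from collections import Counter

def hamming(a, b):
    return sum(abs(bi - ai) for ai, bi in zip(a, b))

def approximate(case_base, x_tgt, tau_dist=1):
    """Approximation (k = 1): candidates are the source cases (x, y) with
    H(x, x_tgt) <= tau_dist; returns None when Candidates is empty (failure)."""
    potential_sols = [y for x, y in case_base if hamming(x, x_tgt) <= tau_dist]
    if not potential_sols:
        return None
    n = len(potential_sols[0])
    return tuple(Counter(y[i] for y in potential_sols).most_common(1)[0][0]
                 for i in range(n))
```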

3.4 Reusing Cases by Pairs: Interpolation

When \(k=2\), candidates are pairs, and the relations \(\mathtt{{Rel}}_{\mathcal {P}}\) and \(\mathtt{{Rel}}_{\mathcal {S}}\) are the betweenness relations on \({\mathcal {P}}\) and on \({\mathcal {S}}\). The general principle (1) applied here leads to:

$$\begin{aligned} \begin{aligned}&\text {if } \mathtt{{x}}^{1}{{\pmb {-}}}\mathtt{{x}}^{3}{{\pmb {-}}}\mathtt{{x}}^{2} \text { and } \mathtt{{x}}^{j}\mathrel {\leadsto }\mathtt{{y}}^{j} \text { for } j\in \{1, 2, 3\} \\&\text {then } \text {it is plausible that } \mathtt{{y}}^{1}{{\pmb {-}}}\mathtt{{y}}^{3}{{\pmb {-}}}\mathtt{{y}}^{2} \end{aligned} \end{aligned}$$

Applied to CBR, a retrieved pair \(\{(\mathtt{{x}}^{1}, \mathtt{{y}}^{1}), (\mathtt{{x}}^{2}, \mathtt{{y}}^{2})\}\in \mathtt{{Candidates}}\) is such that \(\mathtt{{x}}^{1}{{\pmb {-}}}\mathtt{{x}}^{\mathtt{{tgt}}}{{\pmb {-}}}\mathtt{{x}}^{2}\), and adaptation then takes advantage of the inferred information \(\mathtt{{y}}^{1}{{\pmb {-}}}\mathtt{{y}}^{\mathtt{{tgt}}}{{\pmb {-}}}\mathtt{{y}}^{2}\). A way to get a unique solution is to have \(\mathtt{{y}}^{1}=\mathtt{{y}}^{2}\), which entails \(\mathtt{{y}}^{\mathtt{{tgt}}}=\mathtt{{y}}^{1}\) as a candidate solution for \(\mathtt{{x}}^{\mathtt{{tgt}}}\). If \(\mathtt{{Candidates}}=\emptyset \), the equality \(\mathtt{{y}}^{1}=\mathtt{{y}}^{2}\) can be relaxed to \(\mathtt{{dist}}(\mathtt{{y}}^{1}, \mathtt{{y}}^{2})\le \tau _{\mathtt{{dist}}}\) (where \(\mathtt{{dist}}\) is here a distance function on \({\mathcal {S}}\)). In that case, the uniqueness of \(\mathtt{{y}}^{\mathtt{{tgt}}}\) is no longer guaranteed. By contrast, if \(\mathtt{{Candidates}}\) is considered to be too large, it can be restricted by allowing only the pairs \(\{(\mathtt{{x}}^{1}, \mathtt{{y}}^{1}), (\mathtt{{x}}^{2}, \mathtt{{y}}^{2})\}\) such that \(\mathtt{{dist}}(\mathtt{{x}}^{1}, \mathtt{{x}}^{2})\le \tau _{\text {between}}\), where \(\tau _{\text {between}}>0\) is a given threshold.

Fig. 2. The interpolation method (for the situation in which the solutions of the retrieved cases are equal).

Back to our initial example, let \(\mathtt{{x}}^{\mathtt{{tgt}}}=11010010\), \((\mathtt{{x}}^{1}, \mathtt{{y}}^{1})=(01110010, 001)\), and \((\mathtt{{x}}^{2}, \mathtt{{y}}^{2})=(10010010, 001)\) (i.e. \(H(\mathtt{{x}}^{1},\mathtt{{x}}^{2})=3\)): \(\mathtt{{x}}^{\mathtt{{tgt}}}\), \(\mathtt{{x}}^{1}\) and \(\mathtt{{x}}^{2}\) differ only in the fact that the chosen individuals have/do not have gout/diabetes/allergy to nuts. \(\mathtt{{y}}^{1}=\mathtt{{y}}^{2}=001\) is the solution “dish without sauce, not based on starchy food, with fish”. Since \(\mathtt{{x}}^{1}{{\pmb {-}}}\mathtt{{x}}^{\mathtt{{tgt}}}{{\pmb {-}}}\mathtt{{x}}^{2}\), a solution is \(\mathtt{{y}}^{\mathtt{{tgt}}}\in \mathtt{{Between}}(\mathtt{{y}}^{1}, \mathtt{{y}}^{2})=\{\mathtt{{y}}^{1}\}\), i.e., \(\mathtt{{y}}^{\mathtt{{tgt}}}=\mathtt{{y}}^{1}=\mathtt{{y}}^{2}\).

Fig. 3. The extrapolation method (in 2 versions: a simple one (a) and a more efficient one (b)).

Figure 2 summarizes the interpolation method in an algorithm.
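In the same spirit, a Python sketch (ours) of the interpolation variant of Fig. 2, where the two retrieved solutions must be equal; the early return relies on the fact that any such pair already determines a unique solution:

```python
from itertools import combinations

def hamming(a, b):
    return sum(abs(bi - ai) for ai, bi in zip(a, b))

def is_between(b, a, c):
    return all(min(bi, ci) <= ai <= max(bi, ci) for bi, ai, ci in zip(b, a, c))

def interpolate(case_base, x_tgt, tau_between=2):
    """Interpolation (k = 2), variant requiring y1 = y2: retrieve a pair of
    source cases with equal solutions, problems at distance <= tau_between,
    and x1 -x_tgt- x2; return the shared solution, or None on failure."""
    for (x1, y1), (x2, y2) in combinations(case_base, 2):
        if (y1 == y2 and hamming(x1, x2) <= tau_between
                and is_between(x1, x_tgt, x2)):
            return y1
    return None
```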

3.5 Reusing Cases by Triples: Extrapolation

When \(k=3\), candidates are triples. \(\mathtt{{Rel}}_{\mathcal {P}}\) and \(\mathtt{{Rel}}_{\mathcal {S}}\) are analogical proportions on \({\mathcal {P}}\) and on \({\mathcal {S}}\). The property (1) applied here leads to:

$$\begin{aligned} \begin{aligned}&\text {if } \mathtt{{x}}^{1}\mathtt{{:}}\mathtt{{x}}^{2}\mathtt{{{:}{:}}}\mathtt{{x}}^{3}\mathtt{{:}}\mathtt{{x}}^{4} \text { and } \mathtt{{x}}^{j}\mathrel {\leadsto }\mathtt{{y}}^{j} \text { for } j\in \{1, 2, 3, 4\} \\&\text {then } \text {it is plausible that } \mathtt{{y}}^{1}\mathtt{{:}}\mathtt{{y}}^{2}\mathtt{{{:}{:}}}\mathtt{{y}}^{3}\mathtt{{:}}\mathtt{{y}}^{4} \end{aligned} \end{aligned}$$

Applied to CBR, an element of \(\mathtt{{Candidates}}\) is a triple \(((\mathtt{{x}}^{1}, \mathtt{{y}}^{1}), (\mathtt{{x}}^{2}, \mathtt{{y}}^{2}), (\mathtt{{x}}^{3}, \mathtt{{y}}^{3}))\) of source cases such that (i) \(\mathtt{{x}}^{1}\mathtt{{:}}\mathtt{{x}}^{2}\mathtt{{{:}{:}}}\mathtt{{x}}^{3}\mathtt{{:}}\mathtt{{x}}^{\mathtt{{tgt}}}\) and (ii) the equation \(\mathtt{{y}}^{1}\mathtt{{:}}\mathtt{{y}}^{2}\mathtt{{{:}{:}}}\mathtt{{y}}^{3}\mathtt{{:}}\mathtt{{y}}\) has a solution. Such a solution, unique in the Boolean setting, is considered as a plausible solution of \(\mathtt{{x}}^{\mathtt{{tgt}}}\).

Back to our example where we are searching for a suitable dish type \(\mathtt{{y}}^{\mathtt{{tgt}}}\) for an individual \(\mathtt{{x}}^{\mathtt{{tgt}}}\):

(Figure omitted: the three source cases \((\mathtt{{x}}^{1}, \mathtt{{y}}^{1})\), \((\mathtt{{x}}^{2}, \mathtt{{y}}^{2})\), \((\mathtt{{x}}^{3}, \mathtt{{y}}^{3})\) and the target problem \(\mathtt{{x}}^{\mathtt{{tgt}}}\).)

The relation \(\mathtt{{x}}^{1}\mathtt{{:}}\mathtt{{x}}^{2}\mathtt{{{:}{:}}}\mathtt{{x}}^{3}\mathtt{{:}}\mathtt{{x}}^{\mathtt{{tgt}}}\) holds, and the equation \(\mathtt{{y}}^{1}\mathtt{{:}}\mathtt{{y}}^{2}\mathtt{{{:}{:}}}\mathtt{{y}}^{3}\mathtt{{:}}\mathtt{{y}}\) has a unique solution \(\mathtt{{y}}^{\mathtt{{tgt}}}=101\). Our principle tells us that \(\mathtt{{y}}^{\mathtt{{tgt}}}=101\) is a suitable option for \(\mathtt{{x}}^{\mathtt{{tgt}}}\). If we consider the intended meaning of the attributes, \(\mathtt{{x}}^{\mathtt{{tgt}}}\) has gout, no diabetes, is allergic to nuts, is not refractory to mammal meat, has no need for calcium or iron supplements, likes vegetables and dairy products. The dish type \(\mathtt{{y}}^{\mathtt{{tgt}}}=101\), describing a dish with sauce and fish but no starchy food, is suitable for this type of individual.

Figure 3(a) summarizes the extrapolation method in an algorithm and Fig. 3(b) presents a more efficient method described further.
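A naive version in the spirit of Fig. 3(a) (our sketch) enumerates ordered triples of source cases; note that \(\mathtt{{x}}^{1}\mathtt{{:}}\mathtt{{x}}^{2}\mathtt{{{:}{:}}}\mathtt{{x}}^{3}\mathtt{{:}}\mathtt{{x}}^{\mathtt{{tgt}}}\) holds exactly when the equation on problems is solvable with solution \(\mathtt{{x}}^{\mathtt{{tgt}}}\):

```python
from itertools import permutations

def solve_analogy(a, b, c):
    """Unique Boolean solution of a:b::c:y, or None when unsolvable."""
    if any(ai != bi and ai != ci for ai, bi, ci in zip(a, b, c)):
        return None
    return tuple(ci if ai == bi else bi for ai, bi, ci in zip(a, b, c))

def extrapolate(case_base, x_tgt):
    """Extrapolation (k = 3), naive O(|CB|^3) version: collect the solutions
    of y1:y2::y3:y for every ordered triple with x1:x2::x3:x_tgt."""
    sols = []
    for (x1, y1), (x2, y2), (x3, y3) in permutations(case_base, 3):
        if solve_analogy(x1, x2, x3) == x_tgt:  # the proportion holds on problems
            y = solve_analogy(y1, y2, y3)
            if y is not None:                   # the equation on solutions is solvable
                sols.append(y)
    return sols  # multiset of potential solutions, to be integrated afterwards
```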

3.6 Combining These Approaches

Now, we have three methods to solve a problem: approximation, interpolation and extrapolation. These methods are plausible and incomplete: for a target problem \(\mathtt{{x}}^{\mathtt{{tgt}}}\), each of them may fail, either by providing an incorrect solution or by not providing any solution at all. A complete discussion of the options for combining these methods is out of the scope of this paper and is left for future work. However, this section discusses some ideas about the design of a good combination method.

Fig. 4.
figure 4

The combination method based on a preference relation on the set of methods \(\{\mathtt{{approximation}}, \mathtt{{interpolation}}, \mathtt{{extrapolation}}\}\).

A simple way to combine these methods is to use a preference relation between them: if the preferred method provides a solution, this is the solution returned, else the second preferred method is tried, and so on (this simple combination method is summarized in Fig. 4). This makes sense since a method may provide results infrequently but with high plausibility: it should then be tried before a method providing frequent results with lower plausibility.
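This preference-based combination is a few lines of code (a sketch of ours; each method is any function taking a case base and a target problem and returning a solution, or None on failure):

```python
def combine(case_base, x_tgt, methods):
    """Try each method in preference order; return the first answer found,
    or None when all of the methods fail."""
    for method in methods:
        y = method(case_base, x_tgt)
        if y is not None:
            return y
    return None
```

For instance, `combine(cb, x, [interpolation, approximation, extrapolation])` would encode a preference for interpolation first.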

Now, given the three methods, how can such a preference relation be chosen? For this purpose, an analysis of the case base may help. In particular, for the extrapolation method, it has been shown that the functions \(\mathtt{{f}}\) such that \(\mathtt{{x}}^{1}\mathtt{{:}}\mathtt{{x}}^{2}\mathtt{{{:}{:}}}\mathtt{{x}}^{3}\mathtt{{:}}\mathtt{{x}}^{4}\) entails \(\mathtt{{f}}(\mathtt{{x}}^{1})\mathtt{{:}}\mathtt{{f}}(\mathtt{{x}}^{2})\mathtt{{{:}{:}}}\mathtt{{f}}(\mathtt{{x}}^{3})\mathtt{{:}}\mathtt{{f}}(\mathtt{{x}}^{4})\) for each \((\mathtt{{x}}^{1}, \mathtt{{x}}^{2}, \mathtt{{x}}^{3}, \mathtt{{x}}^{4})\in {\mathcal {P}}^4\) are the affine functions [6]. Thus, if \(\mathrel {\leadsto }\) is functional (\({\mathrel {\leadsto }}=\mathtt{{f}}\)) and \(\mathtt{{f}}\) is affine, then the extrapolation method never gives an incorrect solution. If \(\mathtt{{f}}\) is close to an affine function, then this method gives good results (and should be highly ranked in the preference relation). A measure of closeness to affinity is the BLR test [3], which can be run on the case base.

Another approach to method combination consists in having a preference between methods that depends on the target problem. For this purpose, an idea is to associate to the output of each method a score reflecting the confidence of the method in its output: the preferred method is the one with the highest score. For example, the approximation method would have a high confidence if many source cases support the returned solution and few go against it. A similar principle can be applied to interpolation and extrapolation.

4 Evaluation

The objective of the evaluation is to study the behaviour of the three approaches on various types of Boolean functions, in order to determine which approach is the best and in which circumstances. First experimental results are presented, and a combination method based on a preference relation is proposed according to these results.

4.1 Experiment Setting

In the experiment, \({\mathcal {P}}=\mathbb {B}^8\) and \({\mathcal {S}}=\mathbb {B}^3\), as in the running example. \(\mathrel {\leadsto }\) is assumed to be functional: \({\mathrel {\leadsto }}=\mathtt{{f}}\).

The function \(\mathtt{{f}}\) is randomly generated using the following generators that are based on the three main normal forms, with the purpose of having various types of functions:

  • \(\mathtt{{CNF}}\): \(\mathtt{{f}}\) is generated in a conjunctive normal form, i.e., \(\mathtt{{f}}(\mathtt{{x}})\) is a conjunction of \(n_{\text {conj}}\) disjunctions of literals, for example \(\mathtt{{f}}(\mathtt{{x}})=(\mathtt{{x}}_1\vee \lnot \mathtt{{x}}_7)\wedge (\lnot \mathtt{{x}}_3\vee \mathtt{{x}}_7\vee \mathtt{{x}}_8)\wedge \mathtt{{x}}_4\). The value of \(n_{\text {conj}}\) is randomly chosen uniformly in \(\{3, 4, 5\}\). Each disjunction is generated on the basis of two parameters, \(p^{+}>0\) and \(p^{-}>0\), with \(p^{+}+p^{-}<1\): each variable \(\mathtt{{x}}_i\) occurs in the disjunction as a positive (resp., negative) literal with probability \(p^{+}\) (resp., \(p^{-}\)). In the experiment, the values \(p^{+}=p^{-}=0.1\) were chosen.

  • \(\mathtt{{DNF}}\): \(\mathtt{{f}}\) is generated in a disjunctive normal form, i.e., it has the same form as for \(\mathtt{{CNF}}\) except that the connectors \(\wedge \) and \(\vee \) are exchanged. The parameters \(n_{\text {disj}}\), \(p^{+}\) and \(p^{-}\) are set in the same way.

  • \(\mathtt{{Pol}}\) is the same as \(\mathtt{{DNF}}\), except that the disjunctions (\(\vee \)) are replaced with exclusive or’s (\(\mathrel {\oplus }\)), thus giving a polynomial normal form. The only different parameter is \(p^{-}=0\) (only positive literals occur in the polynomial normal form).

The case base \(\mathtt{{CB}}\) is generated randomly, with a size \(|\mathtt{{CB}}|\in \{16, 32, 64, 96, 128\}\), i.e., \(|\mathtt{{CB}}|\) ranges between \(\frac{1}{16}\) and \(\frac{1}{2}\) of \(|{\mathcal {P}}|=2^8=256\). Each source case \((\mathtt{{x}}, \mathtt{{y}})\) is generated as follows: \(\mathtt{{x}}\) is randomly chosen in \({\mathcal {P}}\) with a uniform distribution and \(\mathtt{{y}}=\mathtt{{f}}(\mathtt{{x}})\).

Each method may lead to several solutions. Let \(\mathtt{{Y}}\) be the multiset of these solutions. The function \(\mathtt{{integrate}}\) introduced at the end of Sect. 3.2 aims at associating to \(\mathtt{{Y}}\) a unique element \(\mathtt{{y}}\in {\mathcal {S}}\). Here, \(\mathtt{{integrate}}\) consists in making a vote on each component: \(\mathtt{{y}}_i=\mathop {\text {argmax}}_{\mathtt{{y}}_i\in \mathtt{{Y}}_i}\mathtt{{multiplicity}}(\mathtt{{y}}_i, \mathtt{{Y}}_i)\), where \(\mathtt{{Y}}_i\) is the multiset of the \(\mathtt{{y}}_i\) for \(\mathtt{{y}}\in \mathtt{{Y}}\).

Let \(\mathtt{{ntp}}\) be the number of target problems posed to the system, \(\mathtt{{na}}\) be the number of (correct or incorrect) answers (\(\mathtt{{ntp}}-\mathtt{{na}}\) is the number of target problems for which the system fails to propose a solution), and \(\mathtt{{sscapa}}\) be the sum of the similarities between the correct answer (according to the generated function \(\mathtt{{f}}\)) and the predicted answer, where the similarity between two solutions \(\mathtt{{y}}^{1}\) and \(\mathtt{{y}}^{2}\) is computed by \(1-H(\mathtt{{y}}^{1}, \mathtt{{y}}^{2})/n\) (with \(n=3\) since \({\mathcal {S}}=\mathbb {B}^n=\mathbb {B}^3\)). For each method, the following scores are computed:

 

  • The precision \(\mathtt{{prec}}\) is the average of the ratios \(\displaystyle \frac{\mathtt{{sscapa}}}{\mathtt{{na}}}\).

  • The correct answer rate \(\mathtt{{car}}\) is the average of the ratios \(\displaystyle \frac{\mathtt{{sscapa}}}{\mathtt{{ntp}}}\).

 

The average is computed over 1 million problem-solving sessions for each function generator, requiring the generation of 1060 functions \(\mathtt{{f}}\) for each of them. The average computing time of a CBR session (retrieval and adaptation for solving one problem) is about \(0.8\,\text {ms}\) (for \(k=1\)), \(19\,\text {ms}\) (for \(k=2\)) and \(2\,\text {ms}\) (for \(k=3\)) on a current standard laptop.

The parameters for each method have been chosen as follows, after preliminary tests based on precision:

  • approximation: \(\tau _{\mathtt{{dist}}}=1\) on \({\mathcal {P}}\);

  • interpolation: \(\tau _{\mathtt{{dist}}}=0\) on \({\mathcal {S}}\) (i.e., \(\mathtt{{y}}^{1}=\mathtt{{y}}^{2}\)), \(\tau _{\text {between}}=2\);

  • extrapolation: all the triples in analogy with \(\mathtt{{x}}^{\mathtt{{tgt}}}\) are considered;

  • combination: the chosen preference order is interpolation preferred to approximation preferred to extrapolation for \(\mathtt{{CNF}}\) and \(\mathtt{{DNF}}\), and extrapolation preferred to interpolation preferred to approximation for \(\mathtt{{Pol}}\).

 

For the sake of reproducibility, the code for this experiment is available at https://tinyurl.com/CBRTests, with the detailed results (generated functions and details of the evaluation).

A Note About Implementation. A naive algorithm for implementing each method \(k\in \{1,2,3\}\) is in \(O(|\mathtt{{CB}}|^k)\) (cf. Figs. 1, 2 and 3(a)). However, for interpolation, this complexity can be reduced by searching \(\mathtt{{Candidates}}\) only among the pairs of cases \(((\mathtt{{x}}^{1}, \mathtt{{y}}^{1}), (\mathtt{{x}}^{2}, \mathtt{{y}}^{2}))\in \mathtt{{CB}}^2\) such that \(H(\mathtt{{x}}^{1},\mathtt{{x}}^{2})\le \tau _{\text {between}}\). For extrapolation, the computation can be separated into an offline and an online part (cf. Fig. 3(b)). The offline part is in \(O(|\mathtt{{CB}}|^2)\) and generates a hashtable structure over keys representing the differences between \(\mathtt{{x}}^{1}\) and \(\mathtt{{x}}^{2}\) (considered as vectors in \(\mathbb {R}^m\)). For example, \(\mathtt{{key}}(01001, 11011)=({-1}, 0, 0, {-1}, 0)\). The online part is in \(O(|\mathtt{{CB}}|^2)\) in the worst case (and frequently closer to \(O(|\mathtt{{CB}}|)\)) and searches all the \((\mathtt{{x}}, \mathtt{{y}})\in \mathtt{{CB}}\) such that \(\mathtt{{key}}(\mathtt{{x}}, \mathtt{{x}}^{\mathtt{{tgt}}})=\mathtt{{key}}(\mathtt{{x}}^{1}, \mathtt{{x}}^{2})\), which is equivalent to \(\mathtt{{x}}^{1}\mathtt{{:}}\mathtt{{x}}^{2}\mathtt{{{:}{:}}}\mathtt{{x}}\mathtt{{:}}\mathtt{{x}}^{\mathtt{{tgt}}}\).
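The offline/online decomposition can be sketched as follows (the hashtable is a Python dict keyed by difference vectors; identifiers are ours). Key equality \(\mathtt{{key}}(\mathtt{{x}}, \mathtt{{x}}^{\mathtt{{tgt}}})=\mathtt{{key}}(\mathtt{{x}}^{1}, \mathtt{{x}}^{2})\) amounts to \(\mathtt{{x}}^{2}-\mathtt{{x}}^{1}=\mathtt{{x}}^{\mathtt{{tgt}}}-\mathtt{{x}}\) componentwise, i.e., to the analogical proportion on problems.

```python
from collections import defaultdict

def key(x1, x2):
    """Difference vector of two problems, seen as vectors in R^m."""
    return tuple(a - b for a, b in zip(x1, x2))

def build_index(case_base):
    """Offline part, O(|CB|^2): map each key to the ordered case pairs producing it."""
    index = defaultdict(list)
    for c1 in case_base:
        for c2 in case_base:
            if c1 != c2:
                index[key(c1[0], c2[0])].append((c1, c2))
    return index

def candidate_triples(index, case_base, x_tgt):
    """Online part: for each case (x, y), the pairs stored under key(x, x_tgt)
    are exactly those with x1:x2::x:x_tgt."""
    for c in case_base:
        for c1, c2 in index.get(key(c[0], x_tgt), ()):
            yield c1, c2, c
```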

4.2 Results

Figure 5 depicts the detailed results given in Table 1.

Fig. 5. Precision and correct answer rate as a function of the case base size for the three generators, for each method (\(k=1\), \(k=2\), \(k=3\) and “combine”, i.e. the combination of the three other methods).

Table 1. \(\mathtt{{prec}}\) and \(\mathtt{{car}}\) for \(k\in \{1, 2, 3\}\) and combine (c.) for the different generators.

The precision results show that, for \(\mathtt{{CNF}}\) and \(\mathtt{{DNF}}\), interpolation gives better results than approximation and extrapolation, and that approximation in turn gives better results than extrapolation. However, when examining the results wrt the correct answer rate, interpolation performs poorly, especially when the case base is small (\(|\mathtt{{CB}}|\in \{16, 32\}\)), owing to the difficulty of finding two candidates for the betweenness relations.

The results are different for \(\mathtt{{Pol}}\), for which extrapolation provides better results. This can be explained by the fact that the generated functions are close to affine functions, for which extrapolation always returns a correct answer [6]. For \(\mathtt{{Pol}}\), the second best approach wrt precision is approximation. Once again, the weakness of the two best methods (extrapolation and interpolation) is their low correct answer rate; it must be noted that, in all situations, approximation provides a better \(\mathtt{{car}}\), especially when the case base is small.

Finally, combining the three approaches by preference ordering (interpolation, then approximation, then extrapolation for \(\mathtt{{DNF}}\) and \(\mathtt{{CNF}}\); extrapolation, then approximation, then interpolation for \(\mathtt{{Pol}}\)) improves on the results obtained using approximation alone, as expected.

The elaboration of a better combination method is left for future work, but some elements relative to this issue are discussed here. A simple yet promising approach would be to estimate the average precision (and hence the preference relation) on the basis of the case base, using a leave-one-out approach: for each case \((\mathtt{{x}}, \mathtt{{y}})\in \mathtt{{CB}}\), the three methods are run with the target problem \(\mathtt{{x}}^{\mathtt{{tgt}}}=\mathtt{{x}}\) and the case base \(\mathtt{{CB}}\setminus \{(\mathtt{{x}}, \mathtt{{y}})\}\), and their results are compared with the known solution \(\mathtt{{y}}^{\mathtt{{tgt}}}=\mathtt{{y}}\), which enables the computation of the average precision of each method.
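This leave-one-out estimation could be implemented along the following lines. A hypothetical sketch: each method is assumed to be a function returning a solution or `None`, and precision is counted over the answers actually given:

```python
def loo_precision(case_base, methods):
    """Leave-one-out precision estimate: each case (x, y) is removed in
    turn, the methods are run on x with the remaining base, and their
    answers are compared with the known solution y.
    Precision = correct answers / answers given (None = no answer)."""
    correct = [0] * len(methods)
    answered = [0] * len(methods)
    for i, (x, y) in enumerate(case_base):
        rest = case_base[:i] + case_base[i + 1:]
        for k, method in enumerate(methods):
            answer = method(x, rest)
            if answer is not None:
                answered[k] += 1
                correct[k] += (answer == y)
    return [c / a if a else 0.0 for c, a in zip(correct, answered)]

# Toy usage with a nearest-neighbor stand-in for approximation:
hamming = lambda u, v: sum(a != b for a, b in zip(u, v))
nn = lambda x, cb: min(cb, key=lambda c: hamming(x, c[0]))[1]
cb = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
precisions = loo_precision(cb, [nn])
```

The estimated precisions would then induce the preference order used by the combination step, instead of fixing it by hand per generator.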

Another issue to be studied carefully is whether it is worth searching for a good combination method. Indeed, if all the methods were strongly correlated, their combination would not bring much improvement. For this purpose, a preliminary test has been carried out that computes the covariance of the events “incorrect answer for method k” and “incorrect answer for method \(\ell \)”, with \(k, \ell \in \{1, 2, 3\}\) and \(k\ne \ell \): \(\mathtt{{cov}}_{k\ell }=P_{k\ell }-P_kP_{\ell }\), where \(P_{k\ell }\) estimates the probability that methods k and \(\ell \) both provide an incorrect answer, given that they both provide an answer, \(P_k\) estimates the probability that method k provides an incorrect answer, given that it provides an answer, and \(P_{\ell }\) is defined similarly. The results on the generated data are: \(\mathtt{{cov}}_{12}=\mathtt{{cov}}_{21}\simeq 0.90\), \(\mathtt{{cov}}_{23}=\mathtt{{cov}}_{32}\simeq 0.41\), and \(\mathtt{{cov}}_{13}=\mathtt{{cov}}_{31}\simeq 0.27\). This can be interpreted as follows. The high correlation between approximation and interpolation is mainly due to the tight bounds used in the experiments for interpolation. By contrast, extrapolation is much less correlated with the other methods. This suggests that extrapolation is complementary to the others, hence the worthiness of studying the combination issue in more depth.
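The quantity \(\mathtt{{cov}}_{k\ell }\) can be estimated from the test records as follows. This is a sketch under the assumption that each method's outcomes are recorded as `True` for an incorrect answer, `False` for a correct one, and `None` when no answer is given:

```python
def cov_incorrect(records_k, records_l):
    """Estimate cov_kl = P_kl - P_k * P_l, where the records are aligned
    on the same target problems: True = incorrect answer, False = correct
    answer, None = no answer given."""
    # P_kl: both incorrect, among problems where both methods answer
    both = [(a, b) for a, b in zip(records_k, records_l)
            if a is not None and b is not None]
    p_kl = sum(a and b for a, b in both) / len(both)
    # P_k, P_l: incorrect, among problems where the method answers
    ans_k = [a for a in records_k if a is not None]
    ans_l = [b for b in records_l if b is not None]
    p_k = sum(ans_k) / len(ans_k)
    p_l = sum(ans_l) / len(ans_l)
    return p_kl - p_k * p_l
```

A value close to zero indicates that the two methods fail (nearly) independently, which is the situation where combining them pays off.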

5 Discussion and Related Work

An important issue raised in this paper is that the reuse of multiple cases (\(k\ge 2\)) does not necessarily rely on similarity (or not only on it): in the general situation, it relies on two \((k+1)\)-ary relations on \({\mathcal {P}}\) and \({\mathcal {S}}\) that may not be reducible to binary similarity relations. These relations structure the problem and solution spaces. Similar issues are addressed in other works of the CBR literature. For example, (simple or multiple) reuse may benefit from adaptation hierarchies on \({\mathcal {P}}\) and \({\mathcal {S}}\) [1, 5, 23]. Another example of a relation used for multiple case retrieval and adaptation is diversity: the retrieved cases should be diverse in order to better contribute to solving the target problem [13]. The main originality of this work is the use, for CBR, of betweenness and analogical proportion, two domain-independent relations that can be implemented in many formalisms (though only the Boolean setting has been considered here), e.g. on strings [12], and that enable the application to CBR of the principles of interpolation and extrapolation, in addition to the already frequently used principle of approximation.

The extrapolation approach based on analogical reasoning can be connected to work on adaptation knowledge learning (AKL) in CBR, as explained hereafter. Most approaches to AKL consist in learning adaptation rules using as training set a set of source case pairs \(((\mathtt{{x}}^{i}, \mathtt{{y}}^{i}), (\mathtt{{x}}^{j}, \mathtt{{y}}^{j}))\) [7, 8, 10]. A similar idea consists in considering one such pair as an adaptation case [11, 15]. With \((\mathtt{{x}}^{1}, \mathtt{{y}}^{1})\) a retrieved case, it can be adapted to solve \(\mathtt{{x}}^{\mathtt{{tgt}}}\) if the difference from \(\mathtt{{x}}^{1}\) to \(\mathtt{{x}}^{\mathtt{{tgt}}}\) equals the difference from \(\mathtt{{x}}^{i}\) to \(\mathtt{{x}}^{j}\), which can be formalized using analogical proportion as \(\mathtt{{x}}^{i}\mathtt{{:}}\mathtt{{x}}^{j}\mathtt{{{:}{:}}}\mathtt{{x}}^{1}\mathtt{{:}}\mathtt{{x}}^{\mathtt{{tgt}}}\). Then, the adaptation consists in applying to \(\mathtt{{y}}^{1}\) the difference from \(\mathtt{{y}}^{i}\) to \(\mathtt{{y}}^{j}\), which amounts to solving the equation \(\mathtt{{y}}^{i}\mathtt{{:}}\mathtt{{y}}^{j}\mathtt{{{:}{:}}}\mathtt{{y}}^{1}\mathtt{{:}}\mathtt{{y}}\). Therefore, one contribution of this work is the formalization of case-based adaptation in terms of analogical proportions on the problem and solution spaces.
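In the Boolean setting, an analogical equation such as \(\mathtt{{y}}^{i}\mathtt{{:}}\mathtt{{y}}^{j}\mathtt{{{:}{:}}}\mathtt{{y}}^{1}\mathtt{{:}}\mathtt{{y}}\) can be solved component-wise. A minimal sketch, illustrated with the tea example of the introduction (features: milk, sugar):

```python
def solve_analogy(a, b, c):
    """Solve a : b :: c : y on Boolean vectors: y differs from c as b
    differs from a. Returns None when some component has no Boolean
    solution (the equation is not always solvable)."""
    y = []
    for ai, bi, ci in zip(a, b, c):
        yi = bi - ai + ci       # the difference from a to b, applied to c
        if yi not in (0, 1):
            return None         # no Boolean solution on this component
        y.append(yi)
    return tuple(y)

# Tea example, features = (milk, sugar):
# (0,0) "no milk, no sugar" : (1,0) "milk, no sugar"
#   :: (0,1) "no milk, sugar" : y
print(solve_analogy((0, 0), (1, 0), (0, 1)))  # -> (1, 1): milk and sugar
```

Adaptation then amounts to `solve_analogy(y_i, y_j, y_1)`, i.e. applying the difference observed between the two solutions of the adaptation case to the solution of the retrieved case.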

The idea of applying analogical inference based on analogical proportions in CBR was first advocated and outlined recently [17], based on the fact that such an approach has given good-quality results in classification [4] on real machine learning datasets. Interestingly enough, analogical proportions can always be found in such datasets, and the results obtained compare favorably to those yielded by nearest neighbor methods. The ideas of interpolative and extrapolative reasoning can also be found in the setting of conceptual spaces equipped with qualitative knowledge [20], with an illustration in [19]. Similar ideas have been applied to the completion of bases of if-then rules in relation with the idea of analogical proportion [22], and to the interpolation between default rules [21]. However, none of these papers adopts a CBR perspective.

6 Conclusion

Classical CBR exploits each known case individually, on the basis of its similarity with the target problem. In this paper, we have proposed to extend this paradigm by taking advantage of betweenness and analogical proportion relations for linking the current situation to pairs and triples of known cases. In doing so, we no longer merely propose the known solution of a similar problem as an approximate solution to the current problem: we also take advantage of the interpolation and extrapolation ideas respectively embedded in the betweenness and analogical proportion relations.

We have also provided a first experimental study that measures the precision and the correct answer rate of the approximation, interpolation, and extrapolation approaches on various types of functions. Each approach has its own merits. Approximation is simple and often very efficient. Interpolation provides the most precise results, but may fail to give an answer. Extrapolation is superior to the two other methods when the underlying function is affine or close to being so, i.e., exhibits a form of simplicity. Moreover, the results of extrapolation are not much correlated with those of the two other methods. Experiments also show that combining the approaches may be beneficial. Clearly, less straightforward ways of combining the approaches should be investigated. More experiments would also be of interest, to vary the parameters further and to deal with non-functional dependencies between problems and solutions. The experiments, made on a variety of artificial datasets, show that the present implementation of extrapolation is of particular interest for datasets that nearly obey an affine Boolean function; studying whether another form of extrapolation would do better, and whether such datasets are often encountered in practice, is a matter for further work.

There are still two other perspectives worthy of consideration. First, it would be of interest to see how to incorporate domain/retrieval/adaptation knowledge into the process. Second, the relations \(\mathtt{{Rel}}_{\mathcal {P}}\) and \(\mathtt{{Rel}}_{\mathcal {S}}\) have been defined here on \(\mathbb {B}^p\), but allowing for gradual relations and for nominal and numerical features in the description of problems and solutions would be an important improvement. Such a development should be feasible, since extensions of analogical proportions have been satisfactorily tested in classification for handling nominal and numerical features [4], and the idea of betweenness seems likewise amenable to natural extensions to these settings.