1 Introduction

Analogical proportions are statements of the form “a is to b as c is to d”. In the Nicomachean Ethics, Aristotle makes an explicit parallel between such statements and geometric proportions of the form “\(\frac{a}{b}= \frac{c}{d}\)”, where a, b, c, d are numbers. Such statements also parallel arithmetic proportions, or difference proportions, which are of the form “\(a - b = c - d\)”. The logical modeling of an analogical proportion as a quaternary connective between four Boolean items appears to be a logical counterpart of such numerical proportions [15]. It has been extended to items described by vectors of Boolean, nominal or numerical values [2].

A particular case of such statements, named continuous analogical proportions, is obtained when the two central components are equal, namely they are statements of the form “a is to b as b is to c”. In case of numerical proportions, if we assume that b is unknown, it can be expressed in terms of a and c as \(b= \sqrt{a\cdot c}\) in the geometric case, and as \(\frac{a + c}{2}\) in the arithmetic case. Note that similar inequalities hold in both cases: \(\min (a, c)\le \sqrt{a\cdot c} \le \max (a, c)\) and \(\min (a, c)\le \frac{a + c}{2} \le \max (a, c)\). This means that the continuous analogical proportion induces a kind of interpolation between a and c in the numerical case by involving an intermediary value that can be obtained from a and c.

A general analogical proportion in which d is unknown provides an extrapolation mechanism, which with numbers yields \(d = \frac{b\cdot c}{a}\) and \(d = b + c - a\) in the geometric and arithmetic cases respectively. We recognize the well-known Rule of Three in the first expression. Analogical proportions-based inference [2] offers a similar extrapolation device relying on the parallel between (a, b) and (c, d) stated by “a is to b as c is to d”.
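As a concrete numerical illustration, the following minimal Python sketch (the function names are ours, purely for illustration) computes the interpolated middle term and the extrapolated fourth term in both the geometric and arithmetic cases.

```python
import math

def geometric_middle(a, c):
    """Continuous geometric proportion a : b :: b : c  =>  b = sqrt(a*c)."""
    return math.sqrt(a * c)

def arithmetic_middle(a, c):
    """Continuous arithmetic proportion a : b :: b : c  =>  b = (a + c) / 2."""
    return (a + c) / 2

def geometric_fourth(a, b, c):
    """Geometric proportion a/b = c/d  =>  d = b*c/a (the Rule of Three)."""
    return b * c / a

def arithmetic_fourth(a, b, c):
    """Arithmetic (difference) proportion a - b = c - d  =>  d = b + c - a."""
    return b + c - a

print(geometric_middle(2, 8))      # 4.0, with min(2, 8) <= 4 <= max(2, 8)
print(arithmetic_middle(2, 8))     # 5.0
print(geometric_fourth(2, 4, 6))   # 12.0
print(arithmetic_fourth(2, 4, 6))  # 8
```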

The analogical proportions-based extrapolation has been successfully applied to classification problems. It may be used either directly as a new classification paradigm [2, 12], or as a way of enlarging a training set to which classical classification methods are then applied [1, 4]. This paper investigates the effectiveness of the simpler option of using only continuous analogical proportions, which involve pairs instead of triples of items, in order to enlarge a training set.

The paper is organized as follows. Section 2 provides a short background on analogical proportions and more particularly on continuous ones. Then Sect. 3 surveys related work on analogical interpolation or extrapolation. Section 4 presents different variants of algorithms for completing a training set based on the idea of continuous analogical proportions. Section 5 reports the results of the use of different classical classification techniques on the corresponding enlarged training sets for various benchmarks.

2 Background: Continuous Analogical Proportion

The statement “a is to b as c is to d”, here denoted \(a : b\,\,{:}{:}\,\,c : d\), expresses that “a differs from b as c differs from d, and b differs from a as d differs from c”. The logical counterpart of the latter statement, where a, b, c, d are Boolean variables, is given by:

$$\begin{aligned}a : b\,\,{:}{:}\,\,c : d =(\lnot a \wedge b \equiv \lnot c \wedge d) \wedge (\lnot b \wedge a \equiv \lnot d \wedge c)\end{aligned}$$

See [13, 16] for justifications. This expression is true for only 6 patterns of values for (a, b, c, d), namely \(\{0000, 0011, 0101, 1111, 1100, 1010\}\). This extends to nominal values, where \(a : b\,\,{:}{:}\,\,c : d\) holds true if and only if the 4-tuple abcd has one of the patterns ssss, stst, or sstt, where s and t are two possible distinct values of the items a, b, c and d.
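A small Python check of the Boolean connective above (a hypothetical helper written only for illustration) confirms that exactly these six patterns make the proportion true:

```python
from itertools import product

def bool_ap(a, b, c, d):
    """Boolean analogical proportion:
    (not a and b <-> not c and d) and (not b and a <-> not d and c)."""
    return (((not a) and b) == ((not c) and d)) and (((not b) and a) == ((not d) and c))

true_patterns = [p for p in product([0, 1], repeat=4) if bool_ap(*p)]
print(true_patterns)
# [(0,0,0,0), (0,0,1,1), (0,1,0,1), (1,0,1,0), (1,1,0,0), (1,1,1,1)]
```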

Regarding continuous analogical proportions, it can be easily checked that the unique solutions of equations \(1:x\,\,{:}{:}\,\,x:1\) and \(0:x\,\,{:}{:}\,\,x:0\) are respectively \(x=1\) and \(x=0\), while \(1:x\,\,{:}{:}\,\,x:0\) or \(0:x\,\,{:}{:}\,\,x:1\) have no solution in the Boolean case. This somewhat trivializes continuous analogical proportions in the Boolean case. The situation for nominal values is the same.

The case of numerical values is richer. a, b, c, d are now supposed to be normalized values in the real interval [0, 1]. The reader is referred to [6] for a general discussion of multiple-valued logic extensions of analogical proportions. They can be associated with the following expression:

$$\begin{aligned} a : b\,\,{:}{:}\,\,c : d = {\left\{ \begin{array}{ll} 1- \mid (a - b) - (c-d)\mid , \\ \quad \text{ if } a \ge b \text{ and } c \ge d, \text{ or } a \le b \text{ and } c \le d\\ 1- \max (\mid a - b\mid , \mid c - d\mid ), \\ \quad \text{ if } a \le b \text{ and } c \ge d, \text{ or } a \ge b \text{ and } c \le d \end{array}\right. } \end{aligned}$$
(1)

It coincides with \(a:b\,\,{:}{:}\,\,c:d\) on \(\{0, 1\}\). As can be seen, \(a : b\,\,{:}{:}\,\,c : d\) is equal to 1 if and only if \((a - b) = (c-d)\). For instance, \(0.2 : 0.5\,\,{:}{:}\,\,0.6 : 0.9\) and \(0.2 : 0.5 \,\,{:}{:}\,\,0.2 : 0.5\) hold true. Because \(|a-b|= |(1-a) - (1-b)|\), it is easy to check that the code independence property \(a : b\,\,{:}{:}\,\,c : d = (1-a) : (1-b)\,\,{:}{:}\,\,(1-c) : (1-d)\) holds (0 and 1 play symmetric roles, and it is the same to encode an attribute positively or negatively).
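The graded definition (1) translates directly into code. The following sketch (our own helper, not the implementation used in the experiments) evaluates it on the examples just mentioned; the comments use ≈ because of floating-point rounding.

```python
def graded_ap(a, b, c, d):
    """Multiple-valued analogical proportion of Eq. (1), for a, b, c, d in [0, 1]."""
    if (a >= b and c >= d) or (a <= b and c <= d):
        return 1 - abs((a - b) - (c - d))
    return 1 - max(abs(a - b), abs(c - d))

print(graded_ap(0.2, 0.5, 0.6, 0.9))  # ≈ 1.0: 0.2 - 0.5 equals 0.6 - 0.9
print(graded_ap(0.2, 0.5, 0.2, 0.5))  # 1.0
print(graded_ap(0.2, 0.5, 0.9, 0.6))  # ≈ 0.7: opposite variations are penalized
```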

Then the corresponding expression for continuous analogical proportions is [16]:

$$\begin{aligned} a : b\,\,{:}{:}\,\,b : c = {\left\{ \begin{array}{ll} 1- \mid a + c - 2b\mid , \\ \quad \text{ if } a \ge b \text{ and } b \ge c, \text{ or } a \le b \text{ and } b \le c\\ 1- \max (\mid a - b\mid , \mid b - c\mid ), \\ \quad \text{ if } a \le b \text{ and } b \ge c, \text{ or } a \ge b \text{ and } b \le c \end{array}\right. } \end{aligned}$$
(2)

As can be seen \(a : b\,\,{:}{:}\,\,b : c =1\) if and only if \(b=(a + c)/2\) (which includes the case \(a=b=c\)). The proportions \(0:\frac{1}{2}\,\,{:}{:}\,\,\frac{1}{2}:1\) or \(0.3 : 0.6\,\,{:}{:}\,\,0.6 : 0.9\) are examples of continuous analogical proportions. Moreover, \(1:3\,\,{:}{:}\,\,3:5\) is an example of continuous analogical proportion between nominal ordered grades. Thus this extension captures the idea of betweenness implicit in statements of the form “a is to b as b is to c”. Note that we have \(0 : 1\,\,{:}{:}\,\,1 : 0 = 0\) and \(1 : 0\,\,{:}{:}\,\,0 : 1 = 0\), as expected.
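Similarly, a small sketch of Eq. (2) (again with an illustrative function name of our own) checks that the value 1 is reached exactly when b is the midpoint of a and c:

```python
def continuous_ap(a, b, c):
    """Continuous graded analogical proportion a : b :: b : c of Eq. (2)."""
    if (a >= b >= c) or (a <= b <= c):
        return 1 - abs(a + c - 2 * b)
    return 1 - max(abs(a - b), abs(b - c))

print(continuous_ap(0.0, 0.5, 1.0))  # 1.0: 0.5 is the midpoint of 0 and 1
print(continuous_ap(0.3, 0.6, 0.9))  # ≈ 1.0 (up to float rounding)
print(continuous_ap(0.0, 1.0, 0.0))  # 0.0, as expected
```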

Analogical proportions extend to vectors in a component-wise manner. Let \(\varvec{a} = (a_1, \ldots , a_m)\), where each \(a_i\) belongs to \(\{0, 1\}\) (Boolean case), or to a finite set with more than 2 elements (nominal case), or to [0, 1] (numerical case). \(\varvec{b}, \varvec{c}, \varvec{d}\) are defined similarly. Then \(\varvec{a}:\varvec{b}\,\,{:}{:}\,\,\varvec{c}:\varvec{d}\) has a truth value which is just \(\min _{i=1}^m a_i:b_i\,\,{:}{:}\,\,c_i:d_i\).

In this paper, we deal with classification, so each vector \(\varvec{a}\) in a training set is associated with its class \(cl(\varvec{a})\). Thus saying that the continuous analogical proportion \(\varvec{a}:\varvec{x}\,\,{:}{:}\,\,\varvec{x}: \varvec{c}\) holds true amounts to saying:

$$\begin{aligned} \begin{array}{l} \varvec{a}:\varvec{x}\,\,{:}{:}\,\,\varvec{x}: \varvec{c}=1 \text{ iff } \\ a_j:x_j \,\,{:}{:}\,\,x_j:c_j=1 \text{ for } \text{ each } \text{ attribute } j \text{ and } cl(\varvec{a}):cl(\varvec{x})\,\,{:}{:}\,\,cl(\varvec{x}): cl(\varvec{c}) = 1 \end{array} \end{aligned}$$
(3)

Moreover, since continuous analogical proportions are trivial for Boolean or nominal variables, we shall also use in this paper a more liberal extension of betweenness for the vectorial case [10]. Namely, we shall say that \(\varvec{x}\) is between \(\varvec{a}\) and \(\varvec{c}\), which is defined as:

$$\begin{aligned} { between( \varvec{a},\varvec{x},\varvec{c})=1} \text{ iff } a_j\le {}x_j\le {}c_j \text{ or } c_j\le {}x_j\le {}a_j \text{ for } \text{ each } \text{ attribute } j. \end{aligned}$$
(4)

Then we can define the set \(\mathrm {Between}(\varvec{a}, \varvec{c})\) of vectors between two vectors \(\varvec{a}\) and \(\varvec{c}\). For instance, we have \(\mathrm {Between}(01000, 11010) = \{01000, 11000, 01010, 11010\}\). Note that in case of Boolean values, the betweenness condition can also be written as \(\forall i=1,\cdots , m,\) \( (a_i\wedge c_i\rightarrow x_i)\wedge (x_i\rightarrow a_i\vee c_i)=1\).
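For illustration, assuming attribute values are coded as integers, the set \(\mathrm {Between}(\varvec{a}, \varvec{c})\) can be enumerated with the following sketch (a hypothetical helper, which reproduces the Boolean example above):

```python
from itertools import product

def between_set(a, c):
    """All vectors x with a_j <= x_j <= c_j or c_j <= x_j <= a_j for every
    attribute j (Eq. (4)), with attribute values coded as integers."""
    ranges = [range(min(aj, cj), max(aj, cj) + 1) for aj, cj in zip(a, c)]
    return [list(x) for x in product(*ranges)]

# Boolean example from the text: Between(01000, 11010)
print(between_set([0, 1, 0, 0, 0], [1, 1, 0, 1, 0]))
# [[0,1,0,0,0], [0,1,0,1,0], [1,1,0,0,0], [1,1,0,1,0]]
```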

3 Related Work

The idea of generating, or completing, a third example from two examples can be encountered in different settings. An option, quite different from interpolation, is the “feature knock out” method [23], where a third example is built by modifying a randomly chosen feature of the first example with that of the second one. A somewhat related idea can be found in a recent proposal [3] which introduces a measure of oddness with respect to a class that is computed on the basis of pairs made of two nearest neighbors in the same class; this amounts to replacing the two neighbors with a fictitious representative of the class.

Reasoning with a system of fuzzy if-then rules provides an interpolation mechanism [14], which, from these rules and an input “in-between” their condition parts, yields a new conclusion “in-between” their conclusion parts, by taking advantage of membership functions that can be seen as defining fuzzy “neighborhoods”.

Moreover, several approaches based on the use of interpolation and analogical proportions have been developed in the past decade. In [17], the problem considered is to complete a set of parallel if-then rules, represented by a set of condition variables associated with a conclusion variable. The values of the variables are assumed to belong to finite sets of ordered labels. The basic idea is to apply analogical proportion inference in order to induce missing rules from an initial set of rules, when analogical proportions hold between the variable labels of several parallel rules. Although this approach may seem close to the analogical interpolation-based approach proposed in this paper, our goal is not to predict just the conclusion part of an incomplete rule, but rather a whole example including its attribute-based description and its class. Moreover, we restrict our study to the use of pairs of examples for this prediction, while in [17] the authors use both pairs and triples of rules for completing rules. An extended version of the above-mentioned work has been presented in [22], where the authors also propose a more cautious method that makes explicit the basic assumptions under which rule conclusions are produced from analogical proportions. Along the same line, see also [21] on interpolation between default rules.

Let us also mention the general approach proposed by Schockaert and Prade [20] to interpolative and extrapolative reasoning from incomplete generic knowledge represented by sets of symbolic rules, handled in a purely qualitative manner, where labels are represented in conceptual spaces. This work is an extended version of [19] in which only interpolative inference is considered. The same authors present an illustrative case study in [18] in the music domain. In the context of natural language modeling, Derrac and Schockaert [5] have proposed a data-driven approach that exploits betweenness and a fortiori inference to derive semantic relations within conceptual spaces.

Besides, some previous works have considered, discussed and experimented with the idea of an analogical proportion-based enlargement of a training set, based on triples of examples. In [1], the authors proposed an approach to generate synthetic data to tune a handwritten character classifier. Couceiro et al. [4] presented a way to extend a Boolean sample set for classification using the notion of “analogy preserving” functions that generate examples on the basis of triples of examples in the training set. The authors only tested their approach on Boolean data.

In a more recent work, Lieber et al. [10] have extended the paradigm of classical Case-Based Reasoning to link the current case to either pairs of known cases by performing a restricted form of interpolation, or to triples of known cases by exploiting extrapolation, taking advantage of betweenness and analogical proportion relations.

Lastly, in the context of deep learning, Goodfellow et al. [7] introduced generative adversarial networks (GANs), a class of machine learning systems in which, given a training set, two neural networks contesting with each other in a game are trained in order to generate new data with the same statistics as the training set. More recently, Inoue [9] presented a data augmentation technique for image classification that mixes two randomly picked images to train a classifier.

4 Analogical Interpolation-Based Predictor (AIP)

Analogical proportions have been recently applied to classification problems and have shown their efficiency for classifying a variety of datasets [2]. In this paper, we aim to investigate whether continuous analogical proportions could be useful for a prediction purpose, namely enlarging a training set with generated examples, and whether standard classification methods applied to this enlarged set can compete with the direct application of analogical proportions-based inference for classification. As said before, the basic idea of the paper is to apply an interpolation method for predicting new examples that are not present in the original data set, which is thereby enlarged.

In the following, we describe the basic principle of our predicting approach.

4.1 Basic Procedure

Consider a set E of n classified examples, i.e., \(E=\big \{(\varvec{x^{1}},y^{1}), ...,(\varvec{x^{i}},y^{i}),...,(\varvec{x^{n}},y^{n})\big \}\) such that the class label \(y^{i}=cl(\varvec{x^{i}})\) is known for each \(i \in \{1,...,n\}\). The goal is to predict a new set of examples \(S=\big \{(\varvec{x^{k}},y^{k})\notin E\big \}\) by interpolating examples from the set E. The new set S will serve to enlarge E.

The basic idea is to find pairs of examples \((\varvec{a},\varvec{c}) \in E^{2}\) with known labels such that the analogical proportion (3) is solvable attribute by attribute i.e., there exists \(\varvec{x}\) such that \(a_j:x_j\,\,{:}{:}\,\,x_j:c_j=1\) for each attribute \(j=1,...,m\), and the class equation has \(cl(\varvec{x})\) as a solution, i.e., \(cl(\varvec{a}):cl(\varvec{x})\,\,{:}{:}\,\,cl(\varvec{x}):cl(\varvec{c})=1\).

As mentioned before in Sect. 2, the solution for the previous equation \(a_j:x_j\,\,{:}{:}\,\,x_j:c_j=1\) in the numerical case is just the midpoint \(x_j=(a_j+c_j)/2\) for each attribute \(j=1,...,m\). We are interested in the case of ordered nominal values in this paper. Moreover, we assume that the distances between any two successive values in such an ordered set of values are the same. Let \(V=\{v_1, \cdots , v_k\}\) be an ordered set of nominal values, then, \(v_i\) will be regarded as the midpoint of \(v_{i-j}\) and \(v_{i+j}\) with \(j \ge 1\), provided that both \(v_{i-j}\) and \(v_{i+j}\) exist. For instance, if \(V=\{1, \cdots , 5\}\), the analogical proportions \(1 : 3\,\,{:}{:}\,\,3 : 5\) or \(2 : 3\,\,{:}{:}\,\,3 : 4\) hold, while \(2 : x\,\,{:}{:}\,\,x : 5 = 1\) has no solution. So it is clear that some pairs \((\varvec{a},\varvec{c})\) will not lead to any solution since we restrict the search space to the pairs for which the midpoint (attribute by attribute) exists.
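Under the equal-spacing assumption, solving \(a_j:x_j\,\,{:}{:}\,\,x_j:c_j=1\) for ordered nominal values coded as integer ranks amounts to checking that the two ranks have the same parity. A minimal sketch (hypothetical helper, assuming integer coding of the ordered scale):

```python
def nominal_midpoint(a_j, c_j):
    """Solution of a_j : x :: x : c_j = 1 for ordered nominal values coded
    as integer ranks, or None when no exact midpoint exists on the scale."""
    if (a_j + c_j) % 2 != 0:
        return None            # e.g. 2 : x :: x : 5 = 1 has no solution
    return (a_j + c_j) // 2    # e.g. 1 : 3 :: 3 : 5 and 2 : 3 :: 3 : 4 hold

print(nominal_midpoint(1, 5))  # 3
print(nominal_midpoint(2, 4))  # 3
print(nominal_midpoint(2, 5))  # None
```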

This condition may be too restrictive, especially for datasets with a high number of attributes, and may reduce the set of predicted examples. In case of success, the predicted example \(\varvec{x}=\{x_1,...,x_j,...x_m\}\) is assigned the predicted class label \(cl(\varvec{x})\) and saved in a candidate set.

Since different voting pairs may predict the same example \(\varvec{x}\) more than once (\(\varvec{x}\) may be the midpoint of more than one pair \((\varvec{a},\varvec{c})\)), a candidate example may receive different class labels. A vote then has to be performed on the class labels of each candidate example that is classified differently in the candidate set. This leads to the final set of predicted examples, where each example is classified uniquely.

This process can be described by the following procedure:

  1. Find solvable pairs \((\varvec{a},\varvec{c})\) such that Eq. 3 has a unique non-null solution \(\varvec{x}\).

  2. In case of ties (an example \(\varvec{x}\) is predicted with different class labels), apply voting on all its predicted class labels and assign the winning label to \(\varvec{x}\).

  3. Add \(\varvec{x}\) to the set of predicted examples (together with \(cl(\varvec{x})\)).

In the next section, we first present a basic algorithm applying the process described above, then we propose two options that may help to refine the search for the voting pairs.

4.2 Algorithms

The simplest way is to systematically consider all pairs \((\varvec{a},\varvec{c})\in E^{2}\) for which Eq. 3 is solvable as candidate pairs for prediction. Algorithm 1 implements a basic Analogical Interpolation-based Predictor, denoted \(AIP_{std}\), without applying any filter on the voting pairs.

[Algorithm 1: the basic \(AIP_{std}\) procedure (pseudocode not reproduced here)]
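Since the pseudocode figure is not reproduced here, the following Python sketch gives one possible reading of the basic \(AIP_{std}\) procedure; the names and the data representation (tuples of integer-coded ordered nominal attributes together with an integer-coded class label) are ours and only serve as an illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def midpoint(a, c):
    """Midpoint of two integer-coded ordered values, or None if it does not exist."""
    return (a + c) // 2 if (a + c) % 2 == 0 else None

def aip_std(examples):
    """One possible reading of the basic AIP_std procedure: for every pair of
    training examples whose attribute-wise and class-wise midpoints all exist,
    generate the midpoint example; ties on the class label are settled by a
    majority vote."""
    candidates = defaultdict(list)   # predicted attribute vector -> candidate labels
    for (xa, ya), (xc, yc) in combinations(examples, 2):
        mids = [midpoint(aj, cj) for aj, cj in zip(xa, xc)]
        y_mid = midpoint(ya, yc)     # class labels assumed integer-coded as well
        if None not in mids and y_mid is not None:
            candidates[tuple(mids)].append(y_mid)
    training = {tuple(x) for x, _ in examples}
    return [(x, Counter(labels).most_common(1)[0][0])
            for x, labels in candidates.items() if x not in training]

# toy usage: the midpoint (2, 2) of the first two examples is predicted with class 1
print(aip_std([((1, 1), 0), ((3, 3), 2), ((1, 3), 1)]))
```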

Considering all pairs \((\varvec{a},\varvec{c})\) for prediction may seem unreasonable, especially when the domain of attribute values is large, since this may blur prediction results. A first improvement of Algorithm 1 is to restrict the search for pairs to those that are among the nearest neighbors (in terms of Hamming distance) of the example to be predicted.

Let us consider two different pairs \((\varvec{a},\varvec{c})\) and \((\varvec{d},\varvec{e})\in E^{2}\). We assume that \(\varvec{a}:\varvec{x}\,\,{:}{:}\,\,\varvec{x}: \varvec{c}=1\) produces as solution an example \(\varvec{b}\) and \(\varvec{d}:\varvec{x}\,\,{:}{:}\,\,\varvec{x}: \varvec{e}=1\) produces another example \(\varvec{b'} \ne \varvec{b}\). If \(\varvec{b'}\) is closer to \((\varvec{d},\varvec{e})\) than \(\varvec{b}\) is to \((\varvec{a},\varvec{c})\) in terms of Hamming distance, it is more reasonable to consider only the pair \((\varvec{d},\varvec{e})\) for prediction. This means that example \(\varvec{b'}\) will be predicted while \(\varvec{b}\) will be rejected. In the following, we denote this improved Algorithm 2 by \(AIP_{NN}\).

Algorithm 3 (that we denote \(AIP_{NN,SC}\)) is exactly the same as Algorithm 2 in all respects, except that, in this case, we only look for pairs \((\varvec{a},\varvec{c})\) belonging to the same class. Note that the two algorithms only differ for non-binary classification problems, since \(s : x\,\,{:}{:}\,\,x : t = 1\) has no solution in \(\{0, 1\}\) for \(s \ne t\).

[Algorithm 2: \(AIP_{NN}\) (pseudocode not reproduced here)]
[Algorithm 3: \(AIP_{NN,SC}\) (pseudocode not reproduced here)]

4.3 Another Option

As can be seen in the next section, searching for the best pairs (as described in Algorithms 2 and 3) limits the number of accepted voting pairs. Moreover, there is a second constraint to be satisfied: the solutions of Eq. 3 are limited to the values of \(\varvec{x}\) that are the midpoint of \(\varvec{a}\) and \(\varvec{c}\), which is hard to satisfy in the ordered nominal setting. To relax this last constraint, we may use the “betweenness” definition given in Eq. 4. With this definition, the equation \(between(\varvec{a},\varvec{x},\varvec{c})=1\) has as a solution any \(\varvec{x}\) such that \(\varvec{x}\) is between \(\varvec{a}\) and \(\varvec{c}\) on each attribute \(j \in \{1,...,m\}\). This last option is implemented by the algorithm denoted \(AIP_{Btw}\), which is exactly the same as Algorithm 3 except that we use definition (4) to solve the analogical interpolation.
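A rough sketch of this relaxed candidate generation (our reading of the \(AIP_{Btw}\) option, omitting the nearest-neighbor pair filtering of Algorithm 3 for brevity): any vector lying attribute-wise between two same-class parents becomes a candidate with that class.

```python
from itertools import combinations, product

def betweenness_candidates(examples):
    """Relaxed candidate generation: every vector lying attribute-wise between
    two same-class parents (Eq. 4) becomes a candidate labeled with their class.
    Attribute values are assumed to be integer-coded ordered nominal ranks."""
    training = {tuple(x) for x, _ in examples}
    candidates = set()
    for (xa, ya), (xc, yc) in combinations(examples, 2):
        if ya != yc:                 # same-class restriction, as in Algorithm 3
            continue
        ranges = [range(min(a, c), max(a, c) + 1) for a, c in zip(xa, xc)]
        for x in product(*ranges):   # all vectors between xa and xc
            if x not in training:
                candidates.add((x, ya))
    return candidates

print(betweenness_candidates([((1, 1), 0), ((1, 3), 0), ((2, 2), 1)]))
# {((1, 2), 0)}
```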

5 Experimentations and Discussion

In this section, we aim to evaluate the efficiency of the proposed algorithms for predicting new examples. For this purpose, we first run different standard ML classifiers on the original dataset, then we apply each AI-Predictor to generate a new set of predicted examples that is used to enlarge the original data set. This leads to four different enlarged datasets, one for each proposed algorithm. Finally, we re-evaluate the ML classifiers on each of these enlarged datasets. For both the original and the enlarged datasets, we apply the testing protocol presented in the next sub-section.

In these experiments, we tested the following standard ML classifiers:

  • IBk: a k-NN classifier; we use the Manhattan distance and tune the classifier over different values of the parameter \(k=1,2,...,11\).

  • C4.5: generating a pruned or unpruned C4.5 decision tree. We tune the classifier with different confidence factors used for pruning \(C=0.1,0.2,...,0.5\).

  • JRip: a propositional rule learner implementing Repeated Incremental Pruning to Produce Error Reduction (RIPPER). We tune the classifier for different numbers of optimization runs \(O=2,4,...,10\) and we apply pruning.

5.1 Datasets for Experiments

The experimental study is based on several datasets taken from the U.C.I. machine learning repository [11]. A brief description of these data sets is given in Table 1.

To apply the analogical interpolation, we have chosen to deal only with ordered nominal datasets in this study (the extension to the numerical case is the topic of future work). Table 1 includes 10 datasets with ordered nominal or Boolean attribute values. In terms of classes, we deal with at most 5 classes.

  • Three of the datasets are multi-class problems.

  • The remaining datasets are binary class problems. One of them has noise added (in the sample set only). Another contains only binary attributes and has missing attribute values; since a missing value in this dataset simply means that the value is neither “yes” nor “no”, we replace each missing value by a third value distinct from 0 and 1. These data sets are described in Table 1.

5.2 Testing Protocol

To test the ML classifiers, we apply a standard 10-fold cross-validation technique. As usual, the final accuracy is obtained by averaging the 10 accuracies (computed as the ratio of the number of correct predictions to the total number of test examples), one for each fold. However, each ML classifier requires a parameter to be tuned before performing this cross-validation.

Table 1. Description of datasets

In order to do that, we randomly choose a fold (as recommended by [8]) and keep only the corresponding training set (i.e., 90% of the full dataset). On this training set, we again perform a 10-fold cross-validation with diverse values of the parameters. We then select the parameter values providing the best accuracy. These tuned parameters are then used to perform the initial cross-validation. As expected, these tuned parameters change with the target dataset. To be sure that our results are stable enough, we run each algorithm (with the previous procedure) 10 times, so we have 10 different parameter optimizations. The displayed parameter p is the average value over the 10 different values (one for each run). The results shown in Table 2 are the average values obtained from 10 rounds of this complete process.
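As an illustration only, and not the authors' Weka setup, the nested idea (an inner cross-validation for tuning, an outer one for evaluation) could be sketched with scikit-learn as follows; the paper's actual protocol tunes on a single randomly chosen training fold and repeats the whole process 10 times.

```python
# Illustrative scikit-learn analogue of the nested tuning idea (not the paper's
# Weka protocol): the inner 10-fold CV selects k for an IBk-like k-NN classifier,
# the outer 10-fold CV estimates accuracy with the tuned parameter.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset, for illustration only
inner = GridSearchCV(KNeighborsClassifier(metric="manhattan"),
                     {"n_neighbors": list(range(1, 12))}, cv=10)
scores = cross_val_score(inner, X, y, cv=10)
print(scores.mean())
```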

5.3 Experimental Results

In the following, we first provide a comparative study of the overall accuracies obtained by the ML classifiers on the original and on the enlarged datasets. This study aims to check whether the examples predicted by the AIP are of good quality (namely labeled with the suitable class). In such a case, the efficiency of the ML classifiers should improve when they are applied to the enlarged datasets. Then we also report the main characteristics of these predicted datasets. Finally, we compare ML classification results on the enlarged datasets to the ones obtained by directly applying Analogy-based Classification [2] to the original datasets. In this last study, we wonder whether ML classifiers applied to the enlarged datasets may perform similarly to Analogy-based Classification [2] applied to the original datasets, while maintaining a reduced complexity.

Results of ML-Classifiers. Accuracy results for IBk, C4.5 and JRIP are obtained by applying the free Weka implementation to the enlarged datasets produced by the AI-Predictors. To run IBk, C4.5 and JRIP, we first optimize the corresponding parameter of each classifier, using the meta CVParameterSelection class provided by Weka with a cross-validation applied to the training set only. This enables us to select the best value of the parameter for each dataset; we then train and test the classifier using this selected value of the parameter.

Table 2 provides classification results of ML classifiers obtained with a 10-fold cross validation and for the best/optimized value of the tuned parameter (denoted p in this table).

Results in Table 2 show that:

Table 2. Results for ML classifiers obtained with the enlarged datasets
  • The accuracy results improve when applying the ML classifiers to the enlarged data instead of the original data. This is observed for all datasets except Monk1 and Monk3. The highest improvements were obtained with the IBk classifier for the Monk2 (17%), Hayes-Roth (13%) and Breast Cancer (11%) datasets.

  • Regarding the two artificial datasets Monk1 and Monk3, it is known that, in the original dataset, only two attributes among 6 are involved in defining the class label of each example. We may suspect that the proposed analogical interpolation, which treats all attributes equally and uses the midpoint value for each attribute as well as for the class label, is not compatible with this kind of classification.

  • The good improvement observed for the Monk2 dataset confirms this intuition since, contrary to Monk1 and Monk3, all attributes of Monk2 are involved in defining the class label.

  • The standard Algorithm 1 outperforms the other algorithms for the Cancer and Breast Cancer datasets. It is important to note that only these two datasets include attributes with a large range of values (with a maximum of 10 different values for Cancer and 13 different values for Breast Cancer). Moreover, their number of attributes is also high compared to the other datasets. We suspect that, when ordered nominal data are represented on a large scale, using only nearest-neighbor pairs for prediction is too restrictive and leads to an overly local search for new examples.

  • There is no particular algorithm that provides the best results for all datasets.

  • We computed the average accuracy of each proposed algorithm and each ML classifier over all datasets. Results are given at the end of Table 2. We can note that the IBk classifier achieves its best accuracy when using the enlarged data built by the \(AIP_{NN,SC}\) algorithm, while C4.5 and JRIP perform better when applied to the datasets built by the \(AIP_{Btw}\) algorithm.

  • Overall, the IBK classifier shows the highest classification accuracy over all datasets.

In this first study, the improved results of ML classifiers when applied to enlarged datasets show the ability of the proposed algorithms (especially, \(AIP_{NN,SC}\) and \(AIP_{Btw}\)) to predict examples that are labeled with the suitable class.

Characteristics of the Predicted Datasets. To better understand the results shown above, in this subsection we investigate the newly predicted datasets in more detail. To this end, we compute the number of predicted examples for each dataset and the proportion of these examples that are assigned to the correct/suitable class label. This proportion is computed on the basis of the predicted examples that are compatible with the original set. For this new experiment, we only consider examples predicted by Algorithm \(AIP_{NN,SC}\) (and \(AIP_{Std}\) for some datasets). These additional results are reported in Table 3. From these results, we can see that:

  • In seven out of ten datasets, the proportion of predicted examples that are successfully classified is \(100\%\). This means that all predicted examples that match the original set are assigned to the correct class label and thus are fully compatible with the original set (see for example Monk2, Breast Cancer, Hayes Roth and Nursery).

  • Predicting accurate examples in these datasets may explain why ML classifiers show high classification improvement when applied to the new enlarged dataset.

  • Although the \(AIP_{NN,SC}\) algorithm succeeds in predicting accurate examples, the number of predicted examples is very small for some datasets such as Breast Cancer, Voting and Cancer. This is due to the fact that this algorithm restricts the search to nearest-neighbor pairs belonging to the same class. It is important to note that these datasets contain a large number of attributes, which makes the pair filtering process more constraining.

  • As can be seen in Table 3, the size of the predicted sets increases considerably for these three datasets when applying the \(AIP_{Std}\) algorithm, which is less constraining than \(AIP_{NN,SC}\) (520 examples instead of 46 are predicted for the Cancer dataset). In Table 2, we also noticed that, only for these three datasets, IBk performs considerably better when applied to the datasets built by the standard algorithm \(AIP_{Std}\) (which produces larger sets). Clearly, when the predicted set is very small, the enlarged dataset remains similar to the original set, which is why the improvement of the ML classifiers is not clearly noticeable for the datasets predicted by the \(AIP_{NN,SC}\) algorithm.

  • Lastly, for some datasets such as Monk1 and Monk3, the proportion of predicted examples that are compatible with the original set is low compared to the other datasets. As explained before, in these datasets the classification function involves only 2 of the 6 attributes, which seems incompatible with continuous analogical interpolation, where all attributes as well as the class label are taken to be the midpoints of the attributes and class label of the pair used for prediction.

Table 3. Nbr. of predicted examples, proportion of predicted examples that are compatible with the original set

Comparison with AP-Classifier [2]. Finally, we provide a comparative study of the ML classifier results, reported in Sect. 5.3, against the results obtained with a direct application of analogical proportions for classification [2]. Note that in [2], analogical proportions-based extrapolation was directly applied to define a new classification paradigm, while in this paper we exploit analogical proportions-based interpolation to enlarge datasets on which classical ML classifiers are applied. Classification accuracies of analogical proportions-based classifiers [2] are given in Table 4 and compared to the best result of each ML classifier applied to the enlarged datasets. Results in Table 4 show that the AP-Classifier outperforms classic ML classifiers on five datasets, especially on the three Monk datasets. However, the datasets enlarged using analogical interpolation help to reduce the gap between the AP-Classifier and the other ML classifiers. On the other side, ML classifiers provide better accuracies on four other datasets (see for example the Breast Cancer (resp. Hayes-Roth) dataset, for which IBk (resp. JRIP) is largely better than the AP-Classifier).

Table 4. Results for ML classifiers obtained with the enlarged datasets and comparison with AP-Classifier [2]

This comparison shows the interest of analogical proportions firstly as a classification tool for some datasets and secondly as a way of enlarging datasets in other cases. Identifying on which datasets each of these methods is better applied should be investigated in depth in future work.

In terms of complexity, the proposed analogical interpolation approaches (which are quadratic due to the use of pairs of examples), if combined for example with the IBk classifier (which is linear), lead to an improved classifier. The latter shows better classification accuracy and enjoys a reduced complexity compared to the AP-classifier, which has cubic complexity (and may be computationally costly for large datasets [2]).

6 Conclusion

This paper has studied the idea of enlarging a training set using analogical proportions as in [4], with two main differences: we only consider pairs of examples by using continuous analogical proportions, which reduces the complexity from cubic to quadratic, and we test with ordered nominal datasets instead of Boolean ones.

On the one hand, the results obtained by classical machine learning methods on the enlarged training sets generally improve those obtained by applying these methods to the original training sets. On the other hand, these results, obtained at a lower level of complexity, are often not far from those obtained by directly applying the analogical proportion-based classification method to the original training sets [2].