1 Introduction

Although conclusions obtained by analogical reasoning are not guaranteed to be valid from a classical logic viewpoint, this kind of reasoning is considered a valuable and often creative way to solve real-life problems. Analogical proportions, i.e., relations between four items of the form a is to b as c is to d, constitute a key notion formalizing analogical inference. They rely on the following principle: if such proportions hold on a noticeable subset of the known features used for describing the four items, they may still hold on other features as well, which may help guess the unknown values of d on these other features from their values on a, b, and c [11, 19]. It is only quite recently that a logical modeling of these proportions has been proposed [12, 13], following several attempts at formalizing them in other settings [7, 10]. This logical modeling makes clear that the analogical proportion holds if and only if a differs from b as c differs from d and vice versa.

The paper investigates two new justifications of the Boolean expression of an analogical proportion. First, starting from the core axioms that an analogical proportion is supposed to satisfy, and that have long been widely agreed upon, this paper exhibits the Boolean models compatible with these axioms. There are several of them, but the smallest model is the standard Boolean expression of an analogical proportion previously proposed. This smallest model is characterized by six Boolean patterns (among sixteen candidates). In the second part of the paper, we evaluate the cognitive significance of these patterns in terms of algorithmic complexity (i.e., Kolmogorov complexity) and show that they are also minimal among all Boolean patterns with respect to an algorithmic complexity-based definition of analogy. Indeed, algorithmic complexity measures a kind of universal information content of a Boolean string. Despite its inherent uncomputability, powerful tools exist for computing good approximations. Kolmogorov complexity has proved to be of great value in diverse applications: for example, in distance measures [1] and classification methods, plagiarism detection, network intrusion detection [5], and in numerous other applications [9].

The paper is organized as follows. In Sect. 2, we provide a background on the definition of an analogical proportion and its basic properties in a Boolean setting. In Sect. 3, we start from the characteristic axioms of analogical proportions, and we investigate the different compatible Boolean models. In Sect. 4, we briefly review the main definition and theorems of Kolmogorov complexity. As we have the main tools, we are in a position to give a Kolmogorov complexity-based definition of analogy in Sect. 5. Section 6 is devoted to a set of experiments that empirically validate our definition. Finally, we conclude in Sect. 7.

2 Background on Boolean Analogical Proportion

At the time of Aristotle, the idea of analogical proportion originated from the notion of numerical proportion. In that respect the arithmetic proportion between 4 integers a, b, c, d, which holds if \( a-b=c-d\), is a good prototype of the idea of analogical proportion, since we can read it as “a differs from b as c differs from d”, which perfectly fits with “a is to b as c is to d”, denoted by \(a:b\,{::}\,c:d\). When considering a Boolean interpretation where \(a, b, c, d \in \{0,1\}\), it is tempting to carry on with the same definition since \(\{0, 1\} \subset \mathbb {R}\), with the inevitable drawback that difference is not an internal operator in \(\mathbb {B}=\{0,1\}\). Nevertheless, if we draw the truth table (16 lines) corresponding to this definition, we get Table 1, which highlights that only 6 of the 16 lines are valid proportions.

Table 1. Boolean valuations for \(a:b\,{::}\,c:d\)
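The rows of such a truth table can be regenerated mechanically from the arithmetic reading \(a-b=c-d\). The following is a minimal sketch; the helper name `arithmetic_proportion` is ours and is not part of the original formalization.

```python
from itertools import product

def arithmetic_proportion(a: int, b: int, c: int, d: int) -> bool:
    """Arithmetic reading of a : b :: c : d, with the difference taken in the integers."""
    return a - b == c - d

# The 16 valuations of (a, b, c, d) in lexicographic order.
valuations = list(product((0, 1), repeat=4))

# Keep the valuations for which the proportion holds, written as 4-bit strings.
valid = ["".join(map(str, v)) for v in valuations if arithmetic_proportion(*v)]

if __name__ == "__main__":
    for v in valuations:
        # One row per valuation: a b c d followed by the truth value (1 or 0).
        print(*v, int(arithmetic_proportion(*v)))
```

Running the script prints the 16 rows; the list `valid` collects the rows whose last column is 1.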

Boolean Definition. Looking for a purely logical definition of \(a:b\,{::}\,c:d\), we need to make use of the comparative indicators [14, 15] that are naturally associated with a pair of variables (a, b):

  • \(a \wedge b\) and \(\lnot a \wedge \lnot b\): they are positive similarity and negative similarity indicators respectively; \(a \wedge b\) (resp. \(\lnot a \wedge \lnot b\)) is true iff both a and b are true (resp. false);

  • \(a \wedge \lnot b\) and \(\lnot a \wedge b\): they are dissimilarity indicators; \(a \wedge \lnot b\) (resp. \(\lnot a \wedge b\)) is true iff a (resp. b) is true and b (resp. a) is false.

Then analogical proportion is defined by the two logically equivalent expressions [13]:

$$\begin{aligned}&a:b\,{::}\,c:d = (a \wedge \lnot b \equiv c \wedge \lnot d) \wedge (\lnot a \wedge b \equiv \lnot c \wedge d) \end{aligned}\tag{1}$$
$$\begin{aligned}&a:b\,{::}\,c:d =((a \wedge d) \equiv (b \wedge c)) \wedge ((\lnot a \wedge \lnot d) \equiv (\lnot b \wedge \lnot c)) \end{aligned}\tag{2}$$

Expression (1) reads “a differs from b as c differs from d and b differs from a as d differs from c”. This definition is equivalent to the previous one (it yields Table 1) with the advantage of being an internal definition inside \(\mathbb {B}\). Expression (2) may be viewed as the logical counterpart of the well-known property of arithmetical proportions \( a-b=c-d\Leftrightarrow a +d = b + c\). “a is to b as c is to d” can now be read “what a and d have in common, b and c have it also (both positively and negatively)”, which, however, is a less straightforward reading of the idea of analogy than the one associated with (1). As can be checked in Table 1, analogical proportions are independent of the positive or negative encoding of properties: \((a:b\,{::}\,c:d) \equiv (\lnot a: \lnot b \,{::}\, \lnot c: \lnot d)\).
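The equivalences stated above are small enough to be checked exhaustively. The sketch below encodes expressions (1) and (2) directly and compares them with the arithmetic definition on all 16 valuations; the function names `expr1`, `expr2` and `arithmetic` are illustrative choices of ours.

```python
from itertools import product

def iff(p: bool, q: bool) -> bool:
    """Logical equivalence p ≡ q."""
    return p == q

def expr1(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Expression (1): the dissimilarity indicators of (a, b) and (c, d) coincide."""
    return iff(a and not b, c and not d) and iff(not a and b, not c and d)

def expr2(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Expression (2): what a and d share, b and c share as well, positively and negatively."""
    return iff(a and d, b and c) and iff(not a and not d, not b and not c)

def arithmetic(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Arithmetic definition a - b = c - d, computed in the integers."""
    return int(a) - int(b) == int(c) - int(d)

valuations = list(product((False, True), repeat=4))

# (1), (2) and the arithmetic definition agree on every valuation.
all_agree = all(expr1(*v) == expr2(*v) == arithmetic(*v) for v in valuations)

# Code independence: negating all four values does not change the truth value.
code_independent = all(expr1(*v) == expr1(*(not x for x in v)) for v in valuations)

# Number of valuations for which the proportion holds.
n_valid = sum(expr1(*v) for v in valuations)
```

An exhaustive check is feasible here only because the domain has 16 points; it is a sanity check, not a substitute for the logical argument.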

For representing objects one generally needs vectors of Boolean values, rather than single Boolean values, each component being the value of a binary attribute. The previous definition directly extends to Boolean vectors in \(\mathbb {B}^n\) of the form \(\overrightarrow{a}=(a_1, \cdots , a_n)\) as follows: \(\overrightarrow{a}:\overrightarrow{b}\,{::}\,\overrightarrow{c}:\overrightarrow{d} \text{ iff } \forall i \in [1,n], ~ a_i:b_i\,{::}\,c_i:d_i\).

Equation and Creativity. The equation \(a:b\,{::}\,c:x\) has a unique solution \(x=c \equiv (a \equiv b)\) provided that \((a \equiv b) \vee (a \equiv c)\) holds; otherwise it has no solution, since neither \(0:1\,{::}\,1:0\) nor \(1:0\,{::}\,0:1\) holds true. This process can be extended componentwise to vectors. For instance, the equation \(010:100\,{::}\,011:x\) has as its unique solution the vector (1, 0, 1), which is not among the 3 previous vectors (0, 1, 0), (1, 0, 0), (0, 1, 1). Analogical proportions for vectors are thus creative (an informal quality usually associated with the idea of analogy), as they may involve 4 distinct vectors.
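The solving procedure just described can be sketched as follows, under the assumption that Boolean values are encoded as the integers 0 and 1. The names `solve_bit` and `solve_vector` are hypothetical helpers introduced for illustration.

```python
from typing import Optional, Sequence, Tuple

def solve_bit(a: int, b: int, c: int) -> Optional[int]:
    """Return the x such that a : b :: c : x, or None when no solution exists."""
    if a == b or a == c:            # solvability condition (a ≡ b) ∨ (a ≡ c)
        return int(c == (a == b))   # x = c ≡ (a ≡ b)
    return None

def solve_vector(a: Sequence[int], b: Sequence[int],
                 c: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Solve the equation componentwise; fail if any component has no solution."""
    xs = [solve_bit(ai, bi, ci) for ai, bi, ci in zip(a, b, c)]
    if any(x is None for x in xs):
        return None
    return tuple(xs)

# The example from the text: 010 : 100 :: 011 : x
solution = solve_vector((0, 1, 0), (1, 0, 0), (0, 1, 1))
```

Returning `None` for an unsolvable component is a design choice of this sketch; it makes the failure of \(0:1\,{::}\,1:x\) explicit rather than silently producing a value.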

A Previous View of Analogical Proportion. In [6], S. Klein suggests that an analogical proportion holds as soon as a, b, c are completed by d taken as \(d=c \equiv (a \equiv b)\). This amounts to defining it as \(A_K(a,b,c,d) \triangleq (a \equiv b) \equiv (c \equiv d)\). Then \(0:1\,{::}\,1:0\) and \(1:0\,{::}\,0:1\) become valid analogical proportions, which leads to the model denoted Kl in the following section. The validity of such patterns may be advocated on the basis of a functional view of analogy where \(a:f(a)\,{::}\,b:f(b)\) sounds indeed valid, taking the negation in \(\mathbb {B}\) for f. However, this is debatable since \(A_K(a,b,c,d)\Leftrightarrow A_K(b,a,c,d)\), which does not fit with intuition. It turns out that \(a:b\,{::}\,c:d \Rightarrow A_K(a,b,c,d)\).

Lower Approximations of Analogical Proportion. While \(A_K(a,b,c,d)\) is an upper approximation of \(a:b\,{::}\,c:d\) that is true for 8 patterns, one may look for lower approximations that are true for 4 patterns only (taking into account code independence). There are 3 such approximations, given below, followed by the patterns they validate:

\( (a \equiv b) \wedge (c \equiv d) \ \begin{array}{|cccc|} \hline 1 &{} 1 &{} 1&{} 1 \\ \hline 0 &{} 0 &{} 0 &{} 0 \\ \hline 1 &{} 1 &{} 0 &{} 0 \\ \hline 0 &{} 0 &{} 1 &{} 1 \\ \hline \end{array} \); \( (a \equiv c) \wedge (b \equiv d) \ \begin{array}{|cccc|} \hline 1 &{} 1 &{} 1&{} 1 \\ \hline 0 &{} 0 &{} 0 &{} 0 \\ \hline 1 &{} 0 &{} 1 &{} 0 \\ \hline 0 &{} 1 &{} 0 &{} 1 \\ \hline \end{array} \); \( (a \not \equiv d) \wedge (b \not \equiv c) \ \begin{array}{|cccc|} \hline 1 &{} 1 &{} 0 &{} 0 \\ \hline 0 &{} 0 &{} 1 &{} 1 \\ \hline 1 &{} 0 &{} 1 &{} 0 \\ \hline 0 &{} 1 &{} 0 &{} 1 \\ \hline \end{array} \). Note that only the last one remains creative.
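The relationship between the analogical proportion, Klein's upper approximation and the three lower approximations can be checked by enumeration. This is a sketch only; `patterns_of` and the dictionary keys are names we introduce, and the proportion itself is encoded through its arithmetic reading \(a-b=c-d\).

```python
from itertools import product

PATTERNS = list(product((0, 1), repeat=4))

def patterns_of(pred) -> set:
    """Set of 4-bit strings abcd on which the predicate is true."""
    return {"".join(map(str, v)) for v in PATTERNS if pred(*v)}

# Analogical proportion (arithmetic reading) and Klein's definition A_K.
analogy = patterns_of(lambda a, b, c, d: a - b == c - d)
klein = patterns_of(lambda a, b, c, d: (a == b) == (c == d))

# The three lower approximations listed above.
lower = {
    "ab_cd": patterns_of(lambda a, b, c, d: a == b and c == d),
    "ac_bd": patterns_of(lambda a, b, c, d: a == c and b == d),
    "ad_bc": patterns_of(lambda a, b, c, d: a != d and b != c),
}

# A_K cannot distinguish (a, b, c, d) from (b, a, c, d).
klein_swap_invariant = all(
    ((a == b) == (c == d)) == ((b == a) == (c == d)) for a, b, c, d in PATTERNS
)
```

The sets make the sandwich explicit: each lower approximation is strictly contained in the proportion, which is strictly contained in Klein's relation.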

The question addressed now is “Could an axiomatic view of analogical proportions offer a kind of intrinsic justification that only the 6 patterns obeying (1)–(2) are acceptable?”

3 Analogy and Its Lattice of Boolean Models

Analogy, viewed as a quaternary relation R, is supposed to obey 3 axioms (e.g., [7, 10]):

  1. \(\forall a, b, R(a,b,a,b)\) (reflexivity);

  2. \(\forall a, b, c,d, R(a,b,c,d) \rightarrow R(c,d,a,b)\) (symmetry);

  3. \(\forall a, b, c,d, R(a,b,c,d) \rightarrow R(a,c,b,d)\) (central permutation).

These axioms are clearly inspired by numerical proportions. From them, some basic properties can be deduced by proper applications of symmetry and central permutation:

  • \(\forall a, b, R(a,a,b,b)\) (identity);

  • \(\forall a, b, c,d, R(a,b,c,d) \rightarrow R(b,a,d,c)\) (inside pair reversing);

  • \(\forall a, b, c,d, R(a,b,c,d) \rightarrow R(d,b,c,a)\) (extreme permutation).
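One possible way to obtain the three derived properties from the axioms is spelled out below, where refl, sym and cp abbreviate reflexivity, symmetry and central permutation. Other derivation orders are possible; this is only one chain that works.

$$\begin{aligned} \text{identity:}\quad & R(a,b,a,b) \;[\text{refl}] \;\xrightarrow{\text{cp}}\; R(a,a,b,b) \\ \text{inside pair reversing:}\quad & R(a,b,c,d) \;\xrightarrow{\text{cp}}\; R(a,c,b,d) \;\xrightarrow{\text{sym}}\; R(b,d,a,c) \;\xrightarrow{\text{cp}}\; R(b,a,d,c) \\ \text{extreme permutation:}\quad & R(a,b,c,d) \;\xrightarrow{\text{sym}}\; R(c,d,a,b) \;\xrightarrow{\text{cp}}\; R(c,a,d,b) \;\xrightarrow{\text{sym}}\; R(d,b,c,a) \end{aligned}$$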

In fact, another (less standard) axiom expected from a natural analogy is:

$$\forall a, b, x, \; R(a,a,b,x) \implies x=b \quad \text{(unicity)} $$

All these properties fit with our intuition of what may be an analogical proportion. In this paper, we focus on \(\mathbb {B}=\{0,1\}\) as interpretation domain. In that case, R should be interpreted as a subset of \(\mathbb {B}^4\): removing the empty set leaves \(2^{16}-1\) candidate models. It is straightforward to get a basic model. By applying reflexivity, we see that 0101 and 1010 should belong to the relation, and 0000 and 1111 as well since we may have \(a=b\); central permutation then leads us to add 0011 and 1100. Thus, we get the model \(\varOmega _0=\{0000, 1111, 0101,1010, 0011, 1100\}\), which is stable under symmetry. \(\varOmega _0\) is the smallest model for analogical proportion over \(\mathbb {B}\). However, one may ask about other models, and we can show the following:

Property 1

There are exactly 8 models of analogy (satisfying the first 3 axioms) over \(\mathbb {B}\). Exactly 2 of them also satisfy unicity.

Proof

Any model should include \(\varOmega _0\). Let us note that a bigger model should necessarily have an even cardinality due to the following facts:

  • To be bigger than \(\varOmega _0\), it should contain a string s containing both 0 and 1.

  • Thanks to symmetry or central permutation axioms, it should contain the symmetric cdab of \(s=abcd\) and the central permutation acbd of s: necessarily, one of these 2 strings is different from s (otherwise, we get \(a=b=c=d\)).

So we have to look for models of cardinality 8, 10, 12, 14 and 16. Obviously \(\mathbb {B}^4\) of cardinality 16 is a model, the biggest one. Due to the axioms, we have to add to \(\varOmega _0\) subsets of \(\mathbb {B}^4\) that are stable w.r.t. symmetry and central permutation. We have exactly:

  • one subset with 2 elements: \(S_2=\{1001,0110\}\)

  • two subsets with 4 elements: (i) \(S_3=\{1110,1101,1011,0111\}\); (ii) \(S_4=\{0001,0010,0100,1000\}\).

Since every model has to be built by adding to \(\varOmega _0\) a (possibly empty) union of the previous subsets, we get the following models for analogy in \(\mathbb {B}\):

  1. 1 model with 6 elements: \(\varOmega _0\) (the smallest one).

  2. 1 model with 8 elements: \(Kl=\varOmega _0 \cup S_2\)={0000, 1111, 0101, 1010, 0011, 1100, 0110, 1001}. As previously explained, this model is due to Klein [6].

  3. 2 models with 10 elements:

    • \(M_3=\varOmega _0 \cup S_3\)={0000, 1111, 0101, 1010, 0011, 1100, 1110, 1101, 1011, 0111},

    • \(M_4=\varOmega _0 \cup S_4\)={0000, 1111, 0101, 1010, 0011, 1100, 0001, 0010, 0100, 1000}.

  4. 2 models with 12 elements:

    • \(M_5=M_3 \cup S_2\)={0000, 1111, 0101, 1010, 0011, 1100, 1110, 1101, 1011, 0111, 0110, 1001},

    • \(M_6=M_4 \cup S_2\)={0000, 1111, 0101, 1010, 0011, 1100, 0001, 0010, 0100, 1000, 0110, 1001}.

  5. 1 model with 14 elements:

    • \(M_7 = M_3 \cup S_4=M_4 \cup S_3= \varOmega _0 \cup S_3 \cup S_4=\) {0000, 1111, 0101, 1010, 0011, 1100, 1110, 1101, 1011, 0111, 0001, 0010, 0100, 1000}.

  6. 1 model with exactly 16 elements: \(\varOmega =\varOmega _0 \cup S_2 \cup S_3 \cup S_4=\mathbb {B}^4\).

Finally, \(\varOmega _0\) and Kl satisfy unicity, but \(M_3\) (containing 1100 and 1101) and \(M_4\) (containing 0000 and 0001) do not, and neither does any model that includes one of them. This completes the proof. \(\Box \)
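Since \(\mathbb {B}^4\) has only \(2^{16}\) subsets, Property 1 can also be cross-checked by brute force. The sketch below enumerates every subset and tests the three axioms, then unicity; `is_model` and `has_unicity` are names chosen for this illustration.

```python
from itertools import product

TUPLES = list(product((0, 1), repeat=4))  # the 16 elements of B^4

def is_model(rel: frozenset) -> bool:
    """Check reflexivity, symmetry and central permutation on a subset of B^4."""
    reflexive = all((a, b, a, b) in rel for a, b in product((0, 1), repeat=2))
    if not reflexive:
        return False
    symmetric = all((c, d, a, b) in rel for (a, b, c, d) in rel)
    central = all((a, c, b, d) in rel for (a, b, c, d) in rel)
    return symmetric and central

def has_unicity(rel: frozenset) -> bool:
    """Unicity: R(a, a, b, x) implies x = b."""
    return all(x == b for (a, a2, b, x) in rel if a == a2)

# Enumerate all 2^16 subsets of B^4 through their bitmask.
models = []
for mask in range(1 << 16):
    rel = frozenset(t for i, t in enumerate(TUPLES) if (mask >> i) & 1)
    if is_model(rel):
        models.append(rel)

sizes = sorted(len(m) for m in models)
unicity_sizes = sorted(len(m) for m in models if has_unicity(m))
smallest = min(models, key=len)
```

The enumeration is deliberately naive: it does not assume that every model contains \(\varOmega _0\), so it checks that fact as well as the count.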

The set of models is a lattice with bottom element \(\varOmega _0\) and top element \(\mathbb {B}^4\), see Fig. 1. As can be seen, 8 models fit with the axioms in the Boolean case, including the 6-pattern model \(\varOmega _0\) and the 8-pattern model Kl due to Klein. However, it is natural to privilege the smallest model, the minimal one that just accounts for the axioms and nothing more.

Fig. 1. The lattice of Boolean models of analogy

We now investigate whether another justification in favor of the minimal model \(\varOmega _0\) can be obtained by minimizing an expression reflecting the information content of an analogical proportion in terms of Kolmogorov complexity. We first review the fundamentals of Kolmogorov complexity theory, also known as Algorithmic Complexity Theory.

4 Kolmogorov Complexity: A Brief Review

Kolmogorov complexity is not a new concept and the theory was developed many years ago: see for instance [9] for an in-depth study. This theory should not be confused with Shannon information theory [16], despite the fact that they share some links.

The Starting Point. We need the help of a universal Turing machine denoted U. Then p denotes a program running on U. Two situations can happen: (i) either p does not stop for the input x, or (ii) p stops for the input x and outputs a finite string y. In that case, we write \(U(p,x)=y\). The Kolmogorov complexity [9] of y w.r.t. x is then defined as:

$$\begin{aligned} K_U(y/x) = \min \{ |p| : U(p,x)=y\}. \end{aligned}$$

\(K_U(y/x)\) is the size of the shortest program able to reconstruct y with the help of x. The Kolmogorov complexity [9] of y is just obtained with the empty string \(\epsilon \):

$$\begin{aligned} K_U(y) = \min \{ |p| : U(p,\epsilon )=y\}. \end{aligned}$$

Given a string s, \(K_U(s)\) is an integer which, in some sense, is a measure of the information content of s: instead of sending s to somebody, we can send p from which s can be recovered as soon as this somebody has the machine U. \(K_U\) enjoys a lot of properties among which a kind of universality: this complexity is independent of the underlying Turing machine as we have the invariance theorem [9]:

Theorem 1

If U1 and U2 are two universal Turing machines, there exists a constant \(c_{U1U2}\) such that for all strings s: \(|K_{U1}(s) - K_{U2}(s)| < c_{U1U2}\), where \(K_{U1}(s)\) and \(K_{U2}(s)\) denote the algorithmic complexity of s w.r.t. U1 and U2 respectively.

This theorem guarantees that complexity values may only diverge by a constant c (e.g. the length of a compiler or a translation program) and for huge complexity strings, we can denote K without specifying the Turing machine U. It can also be shown that [9]:

Theorem 2

\(\forall x, y, K(xy) = K(x) + K(y/x) + \mathcal {O}(1)\).

Unfortunately, K has been proved to be a non-computable function [9]. However, K, or an upper bound of K, can be estimated in diverse ways that we investigate now.

Complexity Estimation. The first well-known option available to estimate K is via lossless compression algorithms. For instance, bzip2 approximates better than gzip, and the PAQ family is still better than bzip2. Due to the invariance theorem, when the size of s is huge, using compression provides a relatively stable approximation, as the constant c in the theorem can be considered negligible. This is obviously not the case when s is short: compression is then not a valid option and, in addition, the constant c can prevent stable approximations of K(s). Luckily, the works of [3, 4, 17] provide means of obtaining sensible values for the complexity of short strings (i.e., less than 10 bits). This work has been done by the Algorithmic Nature Group (https://algorithmicnature.org/), which has developed a tool, OACC (http://www.complexitycalculator.com/), for estimating the complexity of short strings. The authors derived their approach from a theorem by Levin [4, 8] establishing the exact connection between m(s) and K(s), where m(s) is a semi-measure known as the Universal Distribution, defined as follows [18]: \(m(s) = \sum _{p:U(p, \epsilon )=s} 2^{-|p|}\).

Theorem 3

There exists a constant c depending only on the underlying Turing machine such that: \(\forall s, | - \log _2(m(s)) - K(s) | < c \).

Rewriting the formula as \(K(s) = -\log _2(m(s)) + \mathcal {O}(1)\) shows that estimating K can also be done via estimating m(s). Estimating m(s) becomes realistic when s is short, as we have to estimate the probability for s to be the output of a short program. Considering simple Turing machines as described in [17], over a Boolean alphabet \(\{0,1\}\) with a finite number n of states {1,...,n} plus a special Halt state denoted 0, there are exactly \((4n+2)^{2n}\) such Turing machines. Using clever optimizations [17], running these machines for \(n=4\) and \(n=5\) becomes realistic and provides an estimation of m(s) and ultimately of K(s). In the following, we denote by \(K'(s)\) this OACC estimation of K(s).
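The compression-based route mentioned at the start of this subsection, and its failure on short inputs, can be illustrated with the compressors shipped in the Python standard library. This is a rough sketch: it stores each bit as one ASCII character (an encoding assumption of ours), the compressed length is only an upper-bound proxy for K, and it is unrelated to the OACC method.

```python
import bz2
import lzma
import zlib

def compressed_sizes(data: bytes) -> dict:
    """Compressed length in bytes under three general-purpose lossless compressors."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data)),
    }

# A long, highly regular string: 100000 characters alternating 0 and 1.
long_regular = b"01" * 50_000
sizes_long = compressed_sizes(long_regular)

# A 4-character string of the kind considered in this paper.
short = b"0101"
sizes_short = compressed_sizes(short)

if __name__ == "__main__":
    print("long :", len(long_regular), "->", sizes_long)
    print("short:", len(short), "->", sizes_short)
```

On the long input every compressor returns far fewer bytes than the input, whereas on the 4-character input the container overhead alone exceeds the input length, which is why such tools say nothing useful about the strings of \(\mathbb {B}^4\).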

Short Chains Complexity Estimation. Some properties are expected from a complexity calculator machinery to be in accordance with a cognitive process:

  1. There is no way to distinguish strings of length 1 and it is absolutely clear that \(K(0) = K(1)\) should hold whatever the considered universal Turing machine.

  2. An important point is to be able to distinguish the 4 strings of length 2: 00, 11, 10, 01 and we expect the following properties: \(K(00)=K(11) < K(01) = K(10)\);

  3. In terms of n-bit strings, we expect \(0 \ldots 0\) and \(1 \ldots 1\) to be the simplest ones and to have the same complexity.

Observing the tables in [4], it appears that the properties above are satisfied, namely:

  • Whatever the number of states of a 2-symbol Turing machine, \(K'(0)=K'(1)\).

  • Whatever the number of states of a 2-symbol Turing machine, \(K'(00)=K'(11)=a\), \(K'(01) = K'(10)=b\) and \(a < b\).

  • Whatever the number of states of a 2-symbol Turing machine, and for strings of length less than or equal to 10 (short strings), \(K'(0 \ldots 0)=K'(1 \ldots 1)=a\) and a is the minimum value among the set of values.

Then the estimation of K via \(K'\) coming from the OACC estimator is a suitable candidate for our purpose. But before going further, we first have to check that OACC validates the above conditions. As can be checked by examining Table 2 and column 4 of the final table in Sect. 6, these basic cognitive expectations are confirmed by the OACC tool. So we can start from OACC to check the properties required to validate the analogical hypothesis that we propose in the next section.

Table 2. Complexity of 1 bit and 2 bits chains with OACC

5 An Algorithmic Complexity View of Analogical Proportions

As described in our introduction, several attempts have been made to formalize analogy or analogical reasoning, with mixed success. In this paper, as in the works of [2], we adopt a machine learning viewpoint. Our aim is to integrate analogical reasoning into the global landscape of predicting values from observable examples.

When stated in a machine learning perspective, the problem of analogical inference is as follows: for a given \(x_3\), predict \(x_4\) such that the target pair \((x_3, x_4)\) is in the same relation as another given source pair \((x_1, x_2)\) considered as an example. The pair \((x_3,x_4)\) is the target pair, which is only partially known. In the case of classification, where the 2nd element in a pair is the label, it amounts to predicting the label of \(x_3\) having only one classified example \((x_1,x_2)\) at hand.

A functional view amounts to considering a hidden function f such that \(x_2=f(x_1)\) and we have to guess \(x_4=f(x_3)\). This functional view is the one developed in [2]: the problem of analogical inference strictly fits with a regression problem but with only one example. Ruling out any statistical models, this approach needs a brand new formalization that the authors extract from algorithmic complexity theory. Instead of trying to find regularities among a large set of observations (statistical approach), they consider the very meaning of each of the 3 observables \(x_1, x_2=f(x_1)\) and \(x_3\). We start from this philosophy, but we depart from it as below:

  • We focus on the Boolean case where the 3 objects under consideration are Boolean vectors. So we do not have to care about the change between the source domain representation and the target domain representation: these 2 domains are identical. The cost of this representation change is null in terms of algorithmic complexity.

  • To be in line with the machine learning minimal assumption that there exists some unknown probability distribution P from which the data are drawn, we do not consider that \(x_2\) is a (hidden) function of \(x_1\). We just have a probability of observing \(x_2\) having already observed \(x_1\) which is more general than associating a fixed \(x_2\) with every given \(x_1\). It could be the case that for another \(x'_2\) we still have \(x_1:x'_2\,{::}\,x_3:x_4\).

As a consequence, we start from the following intuitions:

  1. For \(x_1:x_2\,{::}\,x_3:x_4\) to be accepted as a valid analogy, the way we go from \(x_1\) to \(x_2\) should not be very different from the way we go from \(x_3\) to \(x_4\) (but this need not be a functional link). We suggest measuring this expected proximity with the difference \(|K(x_2/x_1) - K(x_4/x_3)|\). Considering \(K(x_2/x_1)\) as the difficulty of building \(x_2\) from \(x_1\), the expression \(|K(x_2/x_1) - K(x_4/x_3)|\), when small, tells us that it is not more difficult to build \(x_4\) from \(x_3\) than to build \(x_2\) from \(x_1\), and vice versa. This is what we call the atomic view of analogy. But this is obviously not enough.

  2. In fact, the previous formula does not tell anything about the link between the pair \((x_1,x_2)\) and the pair \((x_3,x_4)\). For \(x_1:x_2\,{::}\,x_3:x_4\) to be accepted as a valid analogy, the difficulty of apprehending the string \(x_1x_2\) from the string \(x_3x_4\) should be close to the difficulty of apprehending \(x_3x_4\) from \(x_1x_2\). We suggest measuring this expected proximity with the difference \(|K(x_1x_2/x_3x_4) - K(x_3x_4/x_1x_2)|\). This difference is obviously symmetric and is linked to the symmetry of an analogy.

  3. Above all, the global picture has to be “simple”, i.e., stating that \(x_1:x_2\,{::}\,x_3:x_4\) is a valid analogy should not be too disturbing, at least from a cognitive viewpoint. This means that the occurrence of the string \(x_1x_2x_3x_4\) in this order should be highly plausible. We suggest measuring this plausibility with \(K(x_1x_2x_3x_4)\), which is the size of the shortest program producing the binary string \(x_1x_2x_3x_4\) on a universal Turing machine.

Following the ideas of [2], we use the sum as aggregator operator and denote \(k(x_1x_2x_3x_4)\) the following formula measuring, in some sense, the quality of an analogy:

$$|K(x_2/x_1)-K(x_4/x_3)| + |K(x_1x_2/x_3x_4) - K(x_3x_4/x_1x_2)| + K(x_1x_2x_3x_4)$$

This leads us to postulate that the “best” \(x_4\) we are looking for to build a valid analogy \(x_1:x_2\,{::}\,x_3:x_4\) is the one minimizing this expression. So, we have: \(x_4 = \mathop {\mathrm {argmin}}_u \, k(x_1x_2x_3u)\). Let us see if we can, at least from an empirical viewpoint, validate this model.
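The structure of the formula and of the associated minimization can be sketched in code, given any estimator of K. Two assumptions are made here: the conditional complexity K(y/x) is approximated by \(K(xy) - K(x)\), dropping the \(\mathcal {O}(1)\) term of Theorem 2, and the estimator `toy_K` is a purely hypothetical stand-in (one plus the number of bit changes in the string). It is NOT the OACC estimate and makes no claim to reproduce any value reported in this paper.

```python
from typing import Callable, Iterable

Complexity = Callable[[str], float]

def cond(K: Complexity, y: str, x: str) -> float:
    """Approximate K(y/x) by K(xy) - K(x) (Theorem 2 without the O(1) term)."""
    return K(x + y) - K(x)

def k(K: Complexity, x1: str, x2: str, x3: str, x4: str) -> float:
    """Sum of the three terms: atomic view, pair symmetry, global simplicity."""
    atomic = abs(cond(K, x2, x1) - cond(K, x4, x3))
    pairwise = abs(cond(K, x1 + x2, x3 + x4) - cond(K, x3 + x4, x1 + x2))
    return atomic + pairwise + K(x1 + x2 + x3 + x4)

def solve(K: Complexity, x1: str, x2: str, x3: str,
          candidates: Iterable[str] = ("0", "1")) -> str:
    """Return the candidate u minimizing k(x1 x2 x3 u); ties go to the first candidate."""
    return min(candidates, key=lambda u: k(K, x1, x2, x3, u))

def toy_K(s: str) -> float:
    """Hypothetical stand-in estimator: 1 + number of adjacent bit changes. Not OACC."""
    return 1 + sum(s[i] != s[i + 1] for i in range(len(s) - 1))
```

With a real estimator in place of `toy_K`, `solve` is the direct transcription of the argmin above; the toy estimator only serves to make the sketch executable.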

6 Validation in the Boolean Setting

As we are not in a position to prove something at this stage, let us just investigate now the empirical evidence for our formula. One point to start with is to check if this formula holds in the very basic Boolean case. Considering \(x_1, x_2, x_3, x_4\) as Boolean values, we have to check how the 6 cases of valid analogical proportions actually behave w.r.t. the formula \(k(x_1x_2x_3x_4)\). Thus, we have to estimate formula \(k(x_1x_2x_3x_4)\) for every \(x_1x_2x_3x_4 \in \mathbb {B}^4\). The point is that our strings are very short: only 4 bits. So, as explained in Sect. 4, we have to rely on OACC instead of a compression estimation.

The Less Complex Analogical Chains. On top of that, we have to consider not only pure Kolmogorov complexity K but also complexity w.r.t. a given string, as in \(K(x_3x_4/x_1x_2)\). Generally, it is quite clear that \(K(xy) \le K(x) + K(y/x)\): roughly speaking, we can build a program whose output is xy by concatenating a program whose output is x with a program taking x as input and providing y as output. It is more difficult to get a more precise bound. Theorem 2 gives \(K(xy) = K(x) + K(y/x) + \mathcal {O}(1)\), which shows that we can approximate K(y/x) with \(K(xy) - K(x)\). As we now have all the tools needed to approximate formula k, it remains to use OACC to compute the estimation. The following table reports the results of this computation:

[Table: OACC-based estimates of \(k(x_1x_2x_3x_4)\) for the 16 strings \(x_1x_2x_3x_4 \in \mathbb {B}^4\)]

As can be seen for the 6 patterns of the model \(\varOmega _0\) of analogical proportion, the unique solution of the equation \(a\,:\,b\,{::}\,\,c\,:x\) always corresponds to a string abcx that minimizes expression k w.r.t. the other option \(abc\overline{x}\) (where \(\overline{x}= \lnot x\)), e.g. \(k(1111) < k(1110)\). Besides, 0101 is simpler than 0110 despite the fact that in the second case there is also an underlying function such that \(x_2= f(x_1)\) and \(x_4=f(x_3)\): the negation. Note that 0110 and 1001 exhibit the highest complexity as estimated by OACC, which eliminates Kl. As there is no known convergence result regarding K and as we cannot estimate the constant in the formula \(K(s) = -\log _2(m(s)) + \mathcal {O}(1)\), these experiments should only be considered as adding a bit of credibility to the smallest model.

7 Conclusion

We have given a complete description of the Boolean models of analogy. To choose the most relevant one among the 8 possible models beyond the minimality argument, we have proposed a complexity-based definition for Boolean analogical proportion. Using a set of calculations with OACC, the tool developed by the Algorithmic Nature Group (https://algorithmicnature.org/), we have checked that the truth table of the Boolean analogy fits with the fact that the corresponding combinations minimize the given complexity formula. It remains to consider the formula in a more general setting than the Boolean one. This would in particular make it possible to establish a link between transfer learning and Kolmogorov complexity. Another point of interest would be to solve the minimization problem associated with the formula. Doing so would amount to solving the analogical equation \(a:b\,{::}\,c:x\). This might be the basis of a constructive process.