1 Concept-recognitions for Human and for Artificial Intelligences

How are abstract concepts formed and recognized on the basis of previous experience? This difficult question has been investigated with different methods by psychologists, neuroscientists, artificial intelligence researchers, logicians and philosophers. Interestingly enough, neuroscientific research has recently interacted with an important approach to psychology: the Gestalt theory, proposed by Wertheimer, Koffka and Köhler in the early 20th century (Footnote 1). The basic idea of this theory can be sketched as follows: human perception and knowledge of objects is essentially based on our capacity to form a Gestalt (a form) of the objects in question: a holistic image that cannot be identified with the set of its component elements. A human mind that abstracts the concept table from a given set of concrete examples generally creates a table-Gestalt, a kind of vague and out-of-focus image that does not fully correspond to a particular table with well-determined features. When we recognize as a table a new object we have met in our environment, we generally make a comparison between

  • the main features of the new object;

  • the table-Gestalt that we had constructed in our mind.

Can such recognition-processes, which are so natural for human minds, be “taught” to an intelligent machine? Is it possible to simulate the intuitive notion of Gestalt by some adequate mathematical concepts? This question can be successfully investigated in the framework of a quantum-inspired approach to pattern recognition and to machine learning. Unlike some standard quantum approaches, whose aim is to design quantum circuits that implement machine-learning processes on quantum computers, quantum-inspired approaches to pattern recognition and to machine learning are theoretical studies that apply quantum-information concepts in order to investigate recognition and classification questions arising in different fields of knowledge (Footnote 2).

Consider an agent (let us call her Alice) who is interested in a given concept \(\mathcal C\) that may refer either to concrete or to abstract objects (say table, triangle, beautiful). The name Alice may denote either a human or an artificial intelligence. We will use \(Alice_H\) for a Human Mind and \(Alice_M\) for an Intelligent Machine. Alice will then correspond either to \(Alice_H\) or to \(Alice_M\). We suppose that Alice (on the basis of her previous experience) has already recognized and classified a given set of objects for which the question “does the object under consideration verify the concept \(\mathcal C\)?” can be reasonably asked. We assume that the possible answers to this question are:

  • YES!

  • NO!

  • PERHAPS!

As an example, Alice might be a child who has already recognized (in the environment where she is living) the objects that are tables and the objects that are not tables. At the same time, she might have been doubtful about the right classification of some particular objects. For instance, she might have answered “PERHAPS!” to the question “is this food trolley a table?”.

While \(Alice_H\) may have seen the objects under consideration, seeing is of course more problematic for \(Alice_M\). Thus, in general, one must resort to theoretical representations that faithfully describe the objects in question. As happens in physics, such theoretical representations can be identified with convenient mathematical objects that represent object-states.

In the classical approaches to pattern recognition and machine learning an object-state is usually represented as a vector

$$\begin{aligned} \overrightarrow{\textbf{x}} = (x_1, \ldots , x_d), \end{aligned}$$

that belongs to the real space \(\mathbb R^d\) (where \(d \ge 1\)). Every component \(x_i\) of the vector \( \overrightarrow{\textbf{x}}\) is supposed to correspond to a possible value of an observable that is considered relevant for recognizing the concept \(\mathcal C\). The number \(x_i\) is called a feature of the object represented by the vector \( \overrightarrow{\textbf{x}}\).

We will first discuss the problem “how is a concept \(\mathcal C\) recognized on the basis of previous experience?” in a classical framework. Suppose that (at a given time \(t_0\)) Alice is interested in the concept \(\mathcal C\). Her previous experience concerning \(\mathcal C\) can be described by the formal notion of classical three-valued \(\mathcal C\)-dataset (Footnote 3).

Definition 1.1

Classical three-valued \(\mathcal C\)-dataset

A classical three-valued \(\mathcal C\)-dataset (briefly, classical \(3\mathcal C\)-dataset) is a sequence

$$^\mathcal CCDS = \,\, (\mathbb R^d,\,\,^\mathcal C St, \,\, ^\mathcal C St^+, \,\, ^\mathcal C St^-, \,\, ^\mathcal C St^? \,\, ), $$

where:

  1. \(^\mathcal C St\) is a finite set of object-states \( \overrightarrow{\textbf{x}}\) in the space \(\mathbb R^d\), for which the question “does the object described by \( \overrightarrow{\textbf{x}}\) verify the concept \(\mathcal C\)?” can be reasonably asked.

  2. \(^\mathcal CSt^+\) is a subset of \(^\mathcal C St\), consisting of all states that have been positively classified with respect to the concept \(\mathcal C\). The elements of this set are called the positive instances of the concept \(\mathcal C\).

  3. \(^\mathcal CSt^-\) is a subset of \(^\mathcal C St\), consisting of all states that have been negatively classified with respect to the concept \(\mathcal C\). The elements of this set are called the negative instances of the concept \(\mathcal C\).

  4. \(^\mathcal CSt^?\) is a (possibly empty) subset of \(^\mathcal C St\), consisting of all states that have been considered problematic with respect to \(\mathcal C\). The elements of this set are called the indeterminate instances of the concept \(\mathcal C\).

  5. The three sets \(^\mathcal CSt^+\), \(^\mathcal CSt^-\), \(^\mathcal CSt^?\) are pairwise disjoint. Furthermore, \(^\mathcal CSt^+ \, \cup \, ^\mathcal CSt^-\, \cup \, ^\mathcal CSt^? \, = \,\, ^\mathcal CSt.\)

We indicate by \(n,\, n^+, \, n^-, \, n^?\) the cardinal numbers of the sets \(^\mathcal CSt\), \(^\mathcal CSt^+\), \(^\mathcal CSt^-\), \(^\mathcal CSt^?\), respectively.

Particular examples of classical datasets are the binary classical datasets, where the set \(^\mathcal CSt^?\) of the indeterminate instances is empty. The elements of the set \(^\mathcal CSt^+ \cup \,\, ^\mathcal CSt^-\) will be called the determinate instances of the dataset \(^\mathcal CCDS\).
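As a minimal sketch, Definition 1.1 can be mirrored by a small Python container. The class name `ClassicalDataset` and its field names are illustrative assumptions, not part of the original formalism. The set \(^\mathcal C St\) is recovered as the union of the three instance sets, so the covering part of condition 5 holds by construction, while pairwise disjointness is checked explicitly.

```python
from dataclasses import dataclass

Vector = tuple[float, ...]  # an object-state in R^d


@dataclass(frozen=True)
class ClassicalDataset:
    """Illustrative container for a classical 3C-dataset (names are assumptions)."""
    d: int
    positive: frozenset[Vector]
    negative: frozenset[Vector]
    indeterminate: frozenset[Vector] = frozenset()

    def __post_init__(self):
        p, n, q = self.positive, self.negative, self.indeterminate
        # Condition 5: the three instance sets are pairwise disjoint.
        if (p & n) or (p & q) or (n & q):
            raise ValueError("instance sets must be pairwise disjoint")
        # Every object-state must be a vector of R^d.
        if any(len(x) != self.d for x in p | n | q):
            raise ValueError("every object-state must have d components")

    @property
    def states(self):
        """The whole set St, i.e. the union of the three instance sets."""
        return self.positive | self.negative | self.indeterminate

    @property
    def is_binary(self):
        """A dataset is binary when it has no indeterminate instances."""
        return not self.indeterminate


tables = ClassicalDataset(
    d=2,
    positive=frozenset({(1.0, 2.0)}),
    negative=frozenset({(5.0, 5.0)}),
    indeterminate=frozenset({(3.0, 3.0)}),
)
print(len(tables.states), tables.is_binary)  # 3 False
```

Frozen sets are used here only because the definition speaks of sets of states; any collection type that supports intersection would serve equally well.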

Suppose that at a later time (\(t_1\)) Alice “meets” a new object described by the object-state \( \overrightarrow{\textbf{y}}\). She must find a rule that allows her to answer the question “does \( \overrightarrow{\textbf{y}}\) verify the concept \(\mathcal C\)?”. This answer must refer to her previous knowledge, which is represented by the classical \(3\mathcal C\)-dataset

$$^\mathcal CCDS = \,\, (\mathbb R^d,\,\,^\mathcal C St, \,\, ^\mathcal C St^+, \,\, ^\mathcal C St^-, \,\, ^\mathcal C St^? \,\, ). $$

A winning strategy is based on the use of two special concepts: the (classical) positive centroid and the (classical) negative centroid of a given classical \(3\mathcal C\)-dataset.

Definition 1.2

Classical centroids

Consider a classical \(3\mathcal C\)-dataset

$$^\mathcal CCDS = \,\, (\mathbb R^d,\,\, ^\mathcal C St, \,\, ^\mathcal C St^+, \,\, ^\mathcal C St^-, \,\, ^\mathcal C St^? \,\, ). $$
  (1) The positive centroid of \(^\mathcal CCDS\) is the following vector of the space \(\mathbb R^d\):

    $$\begin{aligned} \overrightarrow{\textbf{x}} ^+ = \sum _i\left\{ \frac{1}{n^+} \overrightarrow{\textbf{x}_i}: \overrightarrow{\textbf{x}_i} \in \,\,^\mathcal C St^+\right\} . \end{aligned}$$

  (2) The negative centroid of \(^\mathcal CCDS\) is the following vector of the space \(\mathbb R^d\):

    $$\begin{aligned} \overrightarrow{\textbf{x}} ^-= \,\, \sum _i\left\{ \frac{1}{n^-} \overrightarrow{\textbf{x}_i}: \, \overrightarrow{\textbf{x}_i} \in \,\,^\mathcal C St^-\right\} . \end{aligned}$$

From an intuitive point of view the positive (negative) centroid of \(^\mathcal CCDS\) can be regarded as the description of an imaginary object, whose state is determined by calculating the average-value of each feature for all positive (negative) instances of \(^\mathcal CCDS\).
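The feature-wise averaging described in Definition 1.2 can be sketched as follows. This is an illustrative NumPy snippet: the function name `centroid` and the sample vectors are assumptions chosen for the example, not data from the text.

```python
import numpy as np


def centroid(instances):
    """Feature-wise average of a non-empty collection of vectors of R^d."""
    return np.mean(np.asarray(instances, dtype=float), axis=0)


# Made-up positive and negative instances in R^2.
positive = [(1.0, 2.0), (3.0, 4.0)]
negative = [(10.0, 0.0), (12.0, 2.0), (14.0, 4.0)]

x_plus = centroid(positive)    # average of the positive instances: (2, 3)
x_minus = centroid(negative)   # average of the negative instances: (12, 2)
print(x_plus, x_minus)
```

Note that neither centroid needs to coincide with any instance of the dataset, which is why it can be read as the state of an imaginary object.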

In order to face the classification-problem we will now introduce a special class of similarity-relations that allow us to compare any object-state living in the space \(\mathbb R^d\) (of a given dataset \(^\mathcal CCDS\)) with the positive and with the negative centroid of \(^\mathcal CCDS\). These particular similarity-relations can be defined in terms of a function that will be called classical fidelity.

Let us first recall the definition of Euclidean distance.

Definition 1.3

Euclidean distance

The Euclidean distance on a space \(\mathbb R^d\) is the binary function that associates to any pair of vectors \( \overrightarrow{\textbf{x}}\) and \( \overrightarrow{\textbf{y}}\) of the space the following real number:

$$ d(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}}) = \Vert \overrightarrow{\textbf{x}} - \overrightarrow{\textbf{y}}\Vert $$

(where \( \Vert \overrightarrow{\textbf{x}} - \overrightarrow{\textbf{y}}\Vert \) is the length of \( \overrightarrow{\textbf{x}} - \overrightarrow{\textbf{y}}\)).

The Euclidean distance can be transformed, in a canonical way, into a new binary function, whose values are real numbers in the interval [0, 1]. We will call this function the classical fidelity.

Definition 1.4

Classical fidelity

The classical fidelity on a space \(\mathbb R^d\) is the binary function that associates to any pair of vectors \( \overrightarrow{\textbf{x}}\) and \( \overrightarrow{\textbf{y}}\) of the space the following real number:

$$ CF(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}}) = \dfrac{1}{1+d(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}})}. $$

From an intuitive point of view, the number \( CF(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}})\) can be interpreted as a measure of the degree of closeness between the vectors \(\overrightarrow{\textbf{x}}\) and \(\overrightarrow{\textbf{y}}\).

Clearly, the classical fidelity-values decrease as the Euclidean distance increases. Furthermore, for any vectors \( \overrightarrow{\textbf{x}}\) and \(\overrightarrow{\textbf{y}}\) we have:

  • \( CF(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}}) = 1 \,\,\, \text {iff} \,\,\, \overrightarrow{\textbf{x}} = \overrightarrow{\textbf{y}}.\)

  • \( CF(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}}) \ne 0.\)
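Definitions 1.3 and 1.4 translate directly into a few lines of Python. The sketch below uses `math.dist` from the standard library (Python 3.8 or later) for the Euclidean distance; the function name `classical_fidelity` is an assumption introduced for illustration.

```python
import math


def classical_fidelity(x, y):
    """CF(x, y) = 1 / (1 + d(x, y)), with d the Euclidean distance."""
    return 1.0 / (1.0 + math.dist(x, y))


print(classical_fidelity((0.0, 0.0), (0.0, 0.0)))  # identical vectors: 1.0
print(classical_fidelity((0.0, 0.0), (3.0, 4.0)))  # d = 5, hence CF = 1/6
```

Since the distance is non-negative and finite, the returned value always lies in the half-open interval (0, 1], in agreement with the two properties listed above.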

By using the concept of classical fidelity we can now define in any space \(\mathbb R^d\) the relation of r-similarity (where r is any real number in the interval [0, 1]).

Definition 1.5

r-similarity

Let \(\overrightarrow{\textbf{x}}\) and \(\overrightarrow{\textbf{y}}\) be object-states of a space \(\mathbb R^d\) and let \(r \in [0,1]\).

The state \(\overrightarrow{\textbf{x}}\) is called r-similar to the state \(\overrightarrow{\textbf{y}}\) (briefly, \(\overrightarrow{\textbf{x}} \not \perp _r \overrightarrow{\textbf{y}}\) ) iff \(r \le CF(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}})\).

One can easily check that this relation is reflexive, symmetric and generally non-transitive. Since \(CF(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}}) \ne 0\), there are infinitely many r such that \(\overrightarrow{\textbf{x}} \not \perp _r \overrightarrow{\textbf{y}}\) . As we will see, this fact does not represent a shortcoming for our aims.

Given a dataset \(^\mathcal CCDS\), it is useful to refer to a threshold-value

$$r^* \in (\frac{1}{2}, 1]$$

that is considered relevant for the dataset in question.

When \(\overrightarrow{\textbf{x}} \not \perp _{r^*} \overrightarrow{\textbf{y}}\), we have:

$$ CF(\overrightarrow{\textbf{x}}, \overrightarrow{\textbf{y}}) \ge r^*.$$

Thus, the degree of closeness between \(\overrightarrow{\textbf{x}}\) and \(\overrightarrow{\textbf{y}}\) can be considered “sufficiently high”. In other words, \(\overrightarrow{\textbf{x}}\) and \(\overrightarrow{\textbf{y}}\) are “sufficiently similar”.

Now Alice has at her disposal the mathematical tools that allow her to face the classification problem. Suppose that Alice’s information about a concept \(\mathcal C\) is the classical \(3\mathcal C\)-dataset

$$^\mathcal CCDS = \,\, (\mathbb R^d,\,\, ^\mathcal C St, \,\, ^\mathcal C St^+, \,\, ^\mathcal C St^-, \,\, ^\mathcal C St^? \,\, ),$$

whose positive and negative centroids are the object-states \(\overrightarrow{\textbf{x}}^+\) and \(\overrightarrow{\textbf{x}}^-\), respectively. Let \(r^*\) be a threshold-value (in the interval \((\frac{1}{2}, 1]\)), which is considered relevant for \(^\mathcal CCDS\). The main goal is to define a classifier function that assigns to every state \(\overrightarrow{\textbf{y}}\) (which describes an object that Alice may meet) either the value \(+\) (corresponding to the answer “YES!”) or the value − (corresponding to the answer “NO!”) or the value ? (corresponding to the answer “PERHAPS!”).

Definition 1.6

The three-valued classical fidelity-classifier

The three-valued classical fidelity-classifier (briefly \(3-CFC\)), determined by the classical dataset \(^\mathcal CCDS \) and by the threshold-value \(r^*\), is the function \( \mathcal{C}\mathcal{l}_{[^\mathcal CCDS, r^*] }\) that satisfies the following condition for any object-state \(\overrightarrow{\textbf{y}}\) of the space \(\mathbb R^d\):

$$\begin{aligned} \mathcal{C}\mathcal{l}_{[^\mathcal CCDS,r^* ]}(\overrightarrow{\textbf{y}}) = {\left\{ \begin{array}{ll} +, \,\, \text {if} \,\,\, \overrightarrow{\textbf{y}} \not \perp _{r^*} \overrightarrow{\textbf{x}}^+ \,\, \, \text {and not} \,\,\, \overrightarrow{\textbf{y}} \not \perp _{r^*} \overrightarrow{\textbf{x}}^-;\\ -, \,\, \text {if} \,\,\, \overrightarrow{\textbf{y}} \not \perp _{r^*} \overrightarrow{\textbf{x}}^- \,\, \, \text {and not} \,\,\, \overrightarrow{\textbf{y}} \not \perp _{r^*} \overrightarrow{\textbf{x}}^+;\\ ?, \,\, \text {otherwise.} \end{array}\right. } \end{aligned}$$

In other words, an object-state \(\overrightarrow{\textbf{y}}\) is classified

  • as a positive instance, when it is “sufficiently similar” to the positive centroid and is not “sufficiently similar” to the negative centroid;

  • as a negative instance, when it is “sufficiently similar” to the negative centroid and is not “sufficiently similar” to the positive centroid;

  • as an indeterminate instance, otherwise.
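The three cases of Definition 1.6 can be sketched as a short Python function. The names `classify`, `x_plus`, `x_minus` and `r_star`, as well as the sample centroids and threshold, are assumptions made for the example; the fidelity helper simply restates Definition 1.4.

```python
import math


def classical_fidelity(x, y):
    """CF(x, y) = 1 / (1 + Euclidean distance), as in Definition 1.4."""
    return 1.0 / (1.0 + math.dist(x, y))


def classify(y, x_plus, x_minus, r_star):
    """Return '+', '-' or '?' for the object-state y."""
    near_plus = classical_fidelity(y, x_plus) >= r_star    # y is r*-similar to x+
    near_minus = classical_fidelity(y, x_minus) >= r_star  # y is r*-similar to x-
    if near_plus and not near_minus:
        return "+"
    if near_minus and not near_plus:
        return "-"
    return "?"


# Made-up centroids and threshold.
x_plus, x_minus, r_star = (0.0, 0.0), (4.0, 0.0), 0.6
for y in [(0.5, 0.0), (4.0, 0.1), (2.0, 0.0)]:
    print(y, classify(y, x_plus, x_minus, r_star))
```

Because \(CF \ge r^*\) is equivalent to \(d \le \frac{1}{r^*} - 1\), a threshold above \(\frac{1}{2}\) accepts only states at distance less than 1 from a centroid. In the example, the first state is close to the positive centroid only, the second to the negative centroid only, and the third to neither, so the three outcomes are all exercised.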

Let us now turn to a quantum-inspired approach to pattern recognition and machine learning, which is based on the following idea: replacing classical object-states with pieces of quantum information (possible states of quantum systems that are storing the information in question). In this way, our mathematical description of objects acquires the peculiar uncertainty and ambiguity that characterize quantum states (Footnote 4).

In some situations it may be natural to start with a classical information represented by an object-state \(\overrightarrow{\textbf{x}}.\) Then, the transition to a quantum pure state \(|\psi \rangle \) can be realized by adopting an encoding procedure, that transforms our classical object-state into a quantum pure state:

$$ \overrightarrow{\textbf{x}} \,\,\, \mapsto \,\,\, |\psi \rangle _{\overrightarrow{\textbf{x}} }. $$

Two important examples of a “natural” quantum encoding are the amplitude encoding and the stereographic encoding.

Definition 1.7

Amplitude encoding

Consider a classical object-state \( \overrightarrow{\textbf{x}} = \, (x_1, \ldots , x_d) \in \mathbb R^d. \) The quantum-amplitude encoding of \(\overrightarrow{\textbf{x}}\) is the following unit vector that lives in the space \(\mathbb R^{(d+1)}\):

$$AmpEnc(\overrightarrow{\textbf{x}}) \,= \, \frac{(x_1, \ldots , x_d,1)}{\Vert (x_1, \ldots , x_d,1)\Vert }. $$

Definition 1.8

Stereographic encoding

Consider a classical object-state \(\overrightarrow{\textbf{x}} = \, (x_1, \ldots , x_d) \in \mathbb R^d\). The quantum stereographic encoding of \(\overrightarrow{\textbf{x}}\) is the following unit vector that lives in the space \(\mathbb R^{(d+1)}\):

$$StEnc(\overrightarrow{\textbf{x}})\,= \,\, \frac{1}{\sum _{i=1}^d (x_i)^2 +1}\left( 2x_1,\ldots , 2x_d, \sum _{i=1}^d (x_i)^2 - 1\right) .$$

Notice that, in both cases, the quantum encoding of a classical object state \(\overrightarrow{\textbf{x}} \) is a quantum pure state that preserves all classical features described by \(\overrightarrow{\textbf{x}}\).
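Both encodings of Definitions 1.7 and 1.8 are easy to sketch with NumPy. The function names `amp_enc` and `st_enc` follow the symbols used above, while the sample vector is an assumption chosen so that the arithmetic is easy to follow.

```python
import numpy as np


def amp_enc(x):
    """Amplitude encoding: append the component 1, then normalise."""
    v = np.append(np.asarray(x, dtype=float), 1.0)
    return v / np.linalg.norm(v)


def st_enc(x):
    """Stereographic encoding: (2x_1, ..., 2x_d, s - 1) / (s + 1), s = sum of x_i^2."""
    x = np.asarray(x, dtype=float)
    s = np.sum(x ** 2)
    return np.append(2.0 * x, s - 1.0) / (s + 1.0)


x = (3.0, 4.0)
a, s = amp_enc(x), st_enc(x)
print(a, np.linalg.norm(a))  # proportional to (3, 4, 1), with unit length
print(s, np.linalg.norm(s))  # equal to (6, 8, 24) / 26, with unit length
```

For the amplitude encoding the original features can be read back as `a[:-1] / a[-1]`, which illustrates in what sense the encoded state preserves the classical features.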

Of course, one could also directly “reason” in a quantum-theoretic framework, avoiding any reference to a (previously known) classical object-state \(\overrightarrow{\textbf{x}}\). In such a case, one can assume, right from the outset, that an object-state is represented by a quantum pure state \(|\psi \rangle \) living in a given (finite-dimensional) Hilbert space \(\mathcal H\). Such a state can be regarded as a probabilistic answer to a sequence of quantum questions:

$$(Q_1, \ldots , Q_q), $$

mathematically represented as projection operators of the space \(\mathcal H\). The state \(|\psi \rangle \) will assign to each question \(Q_i\) a probability-value according to the Born-rule:

$$ Prob_{|\psi \rangle }(Q_i) = \text {tr}(P_{|\psi \rangle }Q_i) \in [0,1]$$

(where \(\text {tr}\) is the trace functional and \(P_{|\psi \rangle }\) is the projection operator corresponding to the unit vector \(|\psi \rangle \)).
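The Born rule above can be checked numerically with a small NumPy sketch. The helper names `projector` and `born_probability`, the state and the question \(Q\) used here are assumptions made for the example.

```python
import numpy as np


def projector(psi):
    """Projection operator P_psi = |psi><psi| of a (normalised) vector psi."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())


def born_probability(psi, q):
    """Prob_psi(Q) = tr(P_psi Q) for a projection operator Q."""
    return float(np.real(np.trace(projector(psi) @ q)))


psi = np.array([1.0, 1.0]) / np.sqrt(2.0)  # equal superposition in a 2-dimensional space
q0 = np.diag([1.0, 0.0])                   # question: "is the system in the first basis state?"
print(born_probability(psi, q0))           # approximately 0.5
```

The real part is taken only to discard the vanishing imaginary component left by floating-point arithmetic; for a projection \(Q\) the trace is real and lies in [0, 1].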

Now, by recalling the concept of classical three-valued \(\mathcal C\)-dataset, the concept of quantum three-valued \(\mathcal C\)-dataset can be defined in a natural way.

Definition 1.9

Quantum three-valued \(\mathcal C\)-dataset

A quantum three-valued \(\mathcal C\)-dataset (briefly, quantum \(3\mathcal C\)-dataset) is a sequence

$$^\mathcal CQDS = \,\, (^\mathcal C\mathcal H,\,\, ^\mathcal C St_q, \,\, ^\mathcal C St_ q^+, \,\, ^\mathcal C St_q^-, \,\, ^\mathcal C St_q^? \,\, ), $$

where:

  1. \(^\mathcal C\mathcal H\) is a finite-dimensional Hilbert space associated to \(\mathcal C\).

  2. \(^\mathcal C St_q\) is a finite set of pure states \(|\psi \rangle \) of \(^\mathcal C\mathcal H\) for which the question “does the object described by \(|\psi \rangle \) verify the concept \(\mathcal C\)?” can be reasonably asked.

  3. \(^\mathcal CSt_q^+\) is a subset of \(^\mathcal C St_q\), consisting of all states that have been positively classified with respect to the concept \(\mathcal C\). The elements of this set are called the positive instances of the concept \(\mathcal C\).

  4. \(^\mathcal CSt_q^-\) is a subset of \(^\mathcal C St_q\), consisting of the negative instances of the concept \(\mathcal C\).

  5. \(^\mathcal CSt_q^?\) is a (possibly empty) subset of \(^\mathcal C St_q\), consisting of the indeterminate instances of the concept \(\mathcal C\).

  6. The three sets \(^\mathcal CSt_q^+\), \(^\mathcal CSt_q^-\), \(^\mathcal CSt_q^?\) are pairwise disjoint. Furthermore, \(^\mathcal CSt_q^+ \, \cup \, ^\mathcal CSt_q^-\, \cup \, ^\mathcal CSt_q^? \, = \,\, ^\mathcal CSt_q.\)

As we have done in the classical case, we indicate by \(n,\, n^+, \, n^-, \, n^?\) the cardinal numbers of the sets \(^\mathcal CSt_q\), \(^\mathcal CSt_q^+\), \(^\mathcal CSt_q^-\), \(^\mathcal CSt_q^?\), respectively.

The concepts of quantum positive and quantum negative centroid (of a given quantum dataset) can be now defined mutatis mutandis with respect to the classical case.

Definition 1.10

Quantum centroids

Consider a quantum \(3\mathcal C\)-dataset

$$^\mathcal CQDS = \,\, (^\mathcal C\mathcal H,\,\, ^\mathcal C St_q, \,\, ^\mathcal C St_q^+, \,\, ^\mathcal C St_q^-, \,\, ^\mathcal C St_q^? \,\, ). $$
  (1) The quantum positive centroid of \(^\mathcal CQDS\) is the following density operator of the space \(^\mathcal C\mathcal H\):

    $$\begin{aligned}\rho ^+ = \,\, \sum _i\left\{ \frac{1}{n^+}P_{|\psi _i\rangle }: \, |\psi _i\rangle \in \,\,^\mathcal C St_q^+\right\} .\end{aligned}$$

  (2) The quantum negative centroid of \(^\mathcal CQDS\) is the following density operator of the space \(^\mathcal C\mathcal H\):

    $$\begin{aligned} \rho ^- = \,\, \sum _i\left\{ \frac{1}{n^-}P_{|\psi _i\rangle }: \, |\psi _i\rangle \in \,\,^\mathcal C St_q^-\right\} .\end{aligned}$$

Notice that quantum centroids are density operators that do not generally correspond to pure states. Furthermore, a quantum centroid cannot be generally represented as the quantum encoding of a corresponding classical centroid.
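To see concretely why a quantum centroid is in general a mixed state, the following NumPy sketch averages the projectors of two made-up pure states. The helper names `projector` and `quantum_centroid` and the chosen states are assumptions for illustration; the purity \(\text{tr}(\rho^2)\), which equals 1 exactly for pure states, is used as a check.

```python
import numpy as np


def projector(psi):
    """Projection operator P_psi = |psi><psi| of a (normalised) vector psi."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())


def quantum_centroid(states):
    """Uniform mixture (1/n) * sum of P_psi over a non-empty list of pure states."""
    return np.mean([projector(psi) for psi in states], axis=0)


# Two orthogonal pure states of a 2-dimensional space (illustrative data).
rho_plus = quantum_centroid([[1.0, 0.0], [0.0, 1.0]])
purity = float(np.real(np.trace(rho_plus @ rho_plus)))
print(np.real(rho_plus))  # half of the identity matrix
print(purity)             # below 1, so rho_plus is not a pure state
```

In this example the centroid is the maximally mixed state, which does not single out any of the instances it was built from.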

The concept of quantum positive centroid seems to represent a “good” mathematical simulation for the intuitive idea of Gestalt. In fact, both the quantum positive centroid and the intuitive idea of Gestalt describe an imaginary object, representing a vague, ambiguous idea that Alice has obtained as an abstraction from the “real” examples she has met in her previous experience. As happens in the case of the intuitive idea of Gestalt, a quantum positive centroid, represented by the density operator \(\rho ^+ = \,\, \sum _i\left\{ \frac{1}{n^+}P_{|\psi _i\rangle }: \, |\psi _i\rangle \in \,\,^\mathcal C St_q^+\right\} \), ambiguously alludes to the concrete positive instances that Alice had previously met.

We know that human recognitions and classifications are usually performed by means of a quick and mostly unconscious comparison between the main features of some new objects we have met and a gestaltic pattern that we had previously constructed in our mind. And any comparison generally involves the use of some similarity-relations that are mostly grasped in a vague and intuitive way by human intelligences. This typical human procedure can be formally represented in the framework of our quantum-inspired approach to pattern recognition.

As we have done in the classical case, we will first introduce a class of similarity-relations, that can be defined in terms of a quantum concept of fidelity.

Definition 1.11

The (quantum) fidelity for pure states.

The (quantum) fidelity on a Hilbert space \(\mathcal H\) is defined as the function F that assigns to any pair \(|\psi \rangle \) and \(|\varphi \rangle \) of pure states of \(\mathcal H\) the real number

$$F(|\psi \rangle ,|\varphi \rangle ) = \arrowvert \langle \psi \mid \varphi \rangle \arrowvert ^2 $$

(where \(\langle \psi \mid \varphi \rangle \) is the inner product of \(|\psi \rangle \) and \(|\varphi \rangle \)).

The definition of fidelity can be generalized to the case of density operators, which may represent either pure or mixed states.

Definition 1.12

Fidelity for density operators

Consider a Hilbert space \(\mathcal H\). The fidelity on \(\mathcal H\) is the function F that assigns to any pair \(\rho \) and \(\sigma \) of density operators of \(\mathcal H\) the real number

$$F(\rho , \sigma ) := \left( \text {tr} \sqrt{\sqrt{\rho } \, \sigma \sqrt{\rho }}\right) ^2. $$

This definition represents a “good” generalization of the concept of fidelity for pure states. For, one can show that:

$$F(P_{|\psi \rangle }, P_{|\varphi \rangle }) = \arrowvert \langle \psi \mid \varphi \rangle \arrowvert ^2. $$

It is interesting to recall the main properties of the fidelity-function, which play an important role in many applications:

  1. \(F(\rho , \sigma ) \in [0,1].\)

  2. \(F(\rho ,\sigma ) = F(\sigma ,\rho ) .\)

  3. \(F(\rho ,\sigma ) = 0\) iff \(\rho \sigma \) is the null operator.

  4. \(F(\rho ,\sigma ) = 1\) iff \(\rho = \sigma \) (Footnote 5).
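The fidelity of Definition 1.12 and the properties listed above can be explored numerically. The sketch below is one possible NumPy implementation, under the assumption that the inputs are valid density matrices: the square root of a positive operator is computed through an eigendecomposition, and the trace of a square root is obtained as the sum of the square roots of the eigenvalues. The names `psd_sqrt`, `fidelity` and `projector` are hypothetical helpers.

```python
import numpy as np


def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)              # remove tiny negative round-off values
    return (v * np.sqrt(w)) @ v.conj().T


def fidelity(rho, sigma):
    """F(rho, sigma) = (tr sqrt( sqrt(rho) sigma sqrt(rho) ))^2."""
    s = psd_sqrt(rho)
    inner = s @ sigma @ s
    inner = (inner + inner.conj().T) / 2.0  # enforce Hermiticity against round-off
    w = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(w)) ** 2)


def projector(psi):
    """Projection operator P_psi = |psi><psi| of a (normalised) vector psi."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())


p0, p1 = projector([1.0, 0.0]), projector([0.0, 1.0])
p_plus = projector([1.0, 1.0])
mixed = np.eye(2) / 2.0

print(fidelity(p0, p_plus))   # pure states: |<psi|phi>|^2, about 0.5
print(fidelity(p0, p1))       # orthogonal states: about 0
print(fidelity(mixed, mixed)) # identical states: about 1
```

The three printed values illustrate, respectively, the agreement with the pure-state formula, property 3 and property 4.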

From a physical point of view, the fidelity-function can be regarded as a form of symmetric conditional probability: \(F(\rho , \sigma ) \) represents the probability that a quantum system in state \(\rho \) can be transformed into a system in state \(\sigma \), and vice versa.

As happens in the classical case, the quantum concept of fidelity allows us to define in any Hilbert space \(\mathcal H\) a class of similarity-relations, called r-similarities, where r is any real number in the interval [0, 1].

Definition 1.13

Quantum r-similarity

Let \(\rho \) and \(\sigma \) be two density operators of a Hilbert space \(\mathcal H\) and let \(r \in [0,1]\).

The state \(\rho \) is called r-similar to the state \(\sigma \) (briefly, \(\rho \not \perp _r \sigma \)) iff \(r \le F(\rho , \sigma )\).

Now Alice has at her disposal the mathematical tools that allow her to face the classification problem in the quantum case. Suppose that Alice’s information about a concept \(\mathcal C\) is the quantum \(3\mathcal C\)-dataset

$$^\mathcal CQDS = \,\, (^\mathcal C\mathcal H,\,\, ^\mathcal C St_q, \,\, ^\mathcal C St_q^+, \,\, ^\mathcal C St_q^-, \,\, ^\mathcal C St_q^? \,\, ),$$

whose positive and negative centroids are the states \(\rho ^+\) and \(\rho ^-\), respectively. Let \(r^*\) be a threshold value in the interval \((\frac{1}{2}, 1]\) that is considered relevant for \(^\mathcal CQDS\). As in the classical case, the main goal is to define a classifier function that assigns to every state \(\sigma \) (which describes an object that Alice may meet) either the value \(+\) (corresponding to the answer “YES!”) or the value − (corresponding to the answer “NO!”) or the value ? (corresponding to the answer “PERHAPS!”).

Definition 1.14

The three-valued quantum fidelity-classifier

The three-valued quantum fidelity-classifier (briefly, \(3-QFC\)), determined by the quantum dataset \(^\mathcal CQDS \) and by the threshold-value \(r^*\), is the function \( \mathcal{C}\mathcal{l}_{[^\mathcal CQDS, r^*] }\) that satisfies the following condition for any state \(\sigma \) of the space \(\, ^\mathcal C\mathcal H\):

$$\begin{aligned}\mathcal{C}\mathcal{l}_{[^\mathcal CQDS,r^* ]}(\sigma ) = {\left\{ \begin{array}{ll} +, \,\, \text {if} \,\,\, \sigma \not \perp _{r^*} \rho ^+ \,\, \, \text {and not} \,\,\, \sigma \not \perp _{r^*} \rho ^-;\\ -, \,\, \text {if} \,\,\, \sigma \not \perp _{r^*} \rho ^- \,\, \, \text {and not} \,\,\, \sigma \not \perp _{r^*} \rho ^+;\\ ?, \,\, \text {otherwise.} \end{array}\right. }\end{aligned}$$

In other words, an object-state \(\sigma \) is classified

  • as a positive instance, when it is “sufficiently similar” to the positive centroid and is not “sufficiently similar” to the negative centroid;

  • as a negative instance, when it is “sufficiently similar” to the negative centroid and is not “sufficiently similar” to the positive centroid;

  • as an indeterminate instance, otherwise.
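A minimal sketch of the quantum classifier is given below, under a simplifying assumption: the new object-state \(\sigma\) is taken to be a pure state \(P_{|\psi \rangle }\), in which case the fidelity with any density operator \(\rho\) reduces to \(\langle \psi |\rho |\psi \rangle \). The function names, the two centroids and the threshold are illustrative assumptions, not values from the text.

```python
import numpy as np


def fidelity_with_pure(rho, psi):
    """F(rho, P_psi) = <psi|rho|psi>; valid only when the second state is pure."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return float(np.real(psi.conj() @ rho @ psi))


def classify(psi, rho_plus, rho_minus, r_star):
    """Return '+', '-' or '?' for the pure object-state psi."""
    near_plus = fidelity_with_pure(rho_plus, psi) >= r_star
    near_minus = fidelity_with_pure(rho_minus, psi) >= r_star
    if near_plus and not near_minus:
        return "+"
    if near_minus and not near_plus:
        return "-"
    return "?"


# Made-up mixed centroids of a 2-dimensional space and a made-up threshold.
rho_plus = np.diag([0.9, 0.1])
rho_minus = np.diag([0.1, 0.9])
r_star = 0.6

for psi in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(psi, classify(psi, rho_plus, rho_minus, r_star))
```

For a mixed \(\sigma\) the shortcut no longer applies and the general formula of Definition 1.12 has to be evaluated instead; the decision logic stays the same.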

Although recognition procedures are different for human and for artificial intelligences, there is a common method of “facing the problems” that seems to work in both cases. Using quantum-theoretic concepts represents a great advantage in order to investigate the relationships between the behaviors of human and of artificial intelligences. The intuitive concept of Gestalt can hardly be simulated in a classical framework; for, the characteristic ambiguity of a quantum positive centroid is not shared by the corresponding notion of classical positive centroid. As we have seen, in the classical case a positive centroid represents an exact object-state, that is obtained by calculating the average-values of the values that all positive instances assign to the observables under consideration. Thus, unlike the quantum case, a classical positive centroid describes an imaginary object that is characterized by precise features.

2 An Empirical Simulation

We will now illustrate a simple empirical simulation, based on the fidelity-classifier, both in the classical and in the quantum case. We suppose that Alice is interested in a concept \(\mathcal C\) that describes a kind of flower (say, the rose). At the initial time (\(t_0\)) she has classical three-valued information concerning a given set of instances of different flowers. Every flower is supposed to be characterized by two features that concern the petal length and the petal width, respectively. Thus, the classical object-state that describes a particular flower in the set of instances under consideration will be a vector \(\overrightarrow{\textbf{x}} = (x_1, x_2)\) of the real space \(\mathbb R^2.\)

We suppose that the numbers occurring in our empirical simulation are the following:

\(n^+\) (the number of the positive instances) = 352;

\(n^-\) (the number of the negative instances) = 359;

\(n^?\) (the number of the indeterminate instances) = 339.

Hence, \(n = n^+ + n^-+ n^? = 1050.\)

Accordingly, Alice’s classical information at the initial time is illustrated by Fig. 1, where the red points correspond to the positive instances, while the blue points and the green points correspond to the negative instances and to the indeterminate instances, respectively.

Fig. 1 Alice’s classical information at the initial time

This information can be represented as a particular three-valued classical dataset, having the following form:

$$^\mathcal CCDS = \,\, (\mathbb R^2,\,\,^\mathcal C St, \,\, ^\mathcal C St^+, \,\, ^\mathcal C St^-, \,\, ^\mathcal C St^? \,\, ). $$

An interesting parameter is represented by the indeterminacy rate of the dataset \(^\mathcal CCDS\), which is defined as follows:

$$IR(^\mathcal CCDS):=\,\,\frac{n^?}{n}.$$

From an intuitive point of view, the number \(IR(^\mathcal CCDS)\) measures the degree of uncertainty of Alice’s information. In the case of our example we have:

$$ IR(^\mathcal CCDS) = 0.323.$$
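The arithmetic behind this value can be checked in a few lines of Python, using the instance counts stated above; the variable names are assumptions.

```python
n_pos, n_neg, n_ind = 352, 359, 339   # counts given for the simulation
n = n_pos + n_neg + n_ind             # total number of instances
indeterminacy_rate = n_ind / n        # IR = n? / n

print(n, round(indeterminacy_rate, 3))  # 1050 0.323
```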

An important question arises: can Alice control the reliability of her fidelity-classifier \( \mathcal{C}\mathcal{l}_{[^\mathcal CCDS, r^*] }\), which is based on the dataset \(^\mathcal CCDS\) and on the choice of a threshold value \(r^*\)?

In order to answer this question Alice can apply the standard supervised procedure. First of all she randomly splits the set \(^\mathcal C St\) of all instances of her dataset \(^\mathcal CCDS\) into two proper subsets:

  • the training set \(^\mathcal C St_{Train}\);

  • the test set \(^\mathcal C St_{Test}\).

We assume that the training set \(^\mathcal C St_{Train}\) comprises \(80\%\) of the original set \(^\mathcal C St\), while the test set \(^\mathcal C St_{Test}\) comprises the remaining \(20\%\). This gives rise to two new (“smaller”) datasets, called the training dataset and the test dataset, that will be indicated as follows:

  • \(^\mathcal CCDS_{Train} = \,\, (\mathbb R^2,\,\, ^\mathcal C St_{Train}, \,\, ^\mathcal C St^+_{Train}, \,\, ^\mathcal C St^-_{Train}, \,\, ^\mathcal C St^?_{Train} \,\, ). \)

  • \(^\mathcal CCDS_{Test} = \,\, (\mathbb R^2,\,\, ^\mathcal C St_{Test}, \,\, ^\mathcal C St^+_{Test}, \,\, ^\mathcal C St^-_{Test}, \,\, ^\mathcal C St^?_{Test} \,\, ). \)
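The random 80/20 split can be sketched with the Python standard library. The function name `split_instances`, the fixed seed and the placeholder instances are assumptions made for illustration; the text does not specify how the random split was actually drawn.

```python
import random


def split_instances(instances, train_fraction=0.8, seed=0):
    """Randomly split labelled instances into a training list and a test list."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)   # seeded only to make the sketch reproducible
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Placeholder instances: (index, label) pairs standing in for (object-state, label).
labels = ["+"] * 352 + ["-"] * 359 + ["?"] * 339
instances = list(enumerate(labels))

train, test = split_instances(instances)
print(len(train), len(test))  # 840 210
```

The positive, negative and indeterminate subsets of each new dataset are then obtained by filtering `train` and `test` on the label, so the class proportions of the two parts may differ slightly from those of the original dataset.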

As expected, the training dataset \(^\mathcal CCDS_{Train}\) will have its own positive and negative centroids, called the training positive centroid (\(\overrightarrow{\textbf{x}}^+_{Train}\)) and the training negative centroid (\(\overrightarrow{\textbf{x}}^-_{Train}\)), respectively. Of course, generally, the two training centroids will be different from the centroids of the original dataset \(^\mathcal CCDS\).

At a later time \(t_1\), Alice applies the fidelity-classifier \( \mathcal{C}\mathcal{l}_{[^\mathcal CCDS_{Train}, r^*] }\) (based on the training dataset and on the threshold value \(r^*\)) to all determinate elements of the test dataset (i.e. to all instances that are either positive or negative). As a result, every input \(\overrightarrow{\textbf{y}} \in \,\, ^\mathcal C St^+_{Test} \cup \,\, ^\mathcal C St^-_{Test}\) will be classified either as a positive or as a negative or as an indeterminate instance.

By referring to this classification (performed by \( \mathcal{C}\mathcal{l}_{[^\mathcal CCDS_{Train}, r^*] }\)), we introduce the following terminology. We say that an instance \(\overrightarrow{\textbf{y}}\) of the test dataset \(^\mathcal CCDS_{Test}\) represents

  • a true positive instance iff \(\overrightarrow{\textbf{y}}\) is a positive instance in the test dataset and has been classified as a positive instance (by the classifier \( \mathcal{C}\mathcal{l}_{[^\mathcal CCDS_{Train}, r^*] }\));

  • a false positive instance iff \(\overrightarrow{\textbf{y}}\) is a negative instance in the test dataset and has been classified as a positive instance;

  • a true negative instance iff \(\overrightarrow{\textbf{y}}\) is a negative instance in the test dataset and has been classified as a negative instance;

  • a false negative instance iff \(\overrightarrow{\textbf{y}}\) is a positive instance in the test dataset and has been classified as a negative instance;

  • a false indeterminate instance iff \(\overrightarrow{\textbf{y}}\) is a determinate instance in the test dataset and has been classified as an indeterminate instance.

Consider now the following five numbers (which can be easily calculated in the case of our empirical simulation):

  1. the number TP of the true positive instances;

  2. the number TN of the true negative instances;

  3. the number FP of the false positive instances;

  4. the number FN of the false negative instances;

  5. the number FI of the false indeterminate instances.
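The five numbers can be obtained by comparing the label each determinate test instance has in the test dataset with the label assigned by the classifier. The following is a sketch under the assumption that labels are encoded as the strings "+", "-" and "?"; the function name and the toy label lists are hypothetical.

```python
def count_outcomes(true_labels, predicted_labels):
    """Count TP, TN, FP, FN and FI over the determinate test instances."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0, "FI": 0}
    for truth, pred in zip(true_labels, predicted_labels):
        if truth not in ("+", "-"):
            continue  # the classifier is applied to determinate instances only
        if pred == "?":
            counts["FI"] += 1       # determinate, but classified as indeterminate
        elif truth == "+":
            counts["TP" if pred == "+" else "FN"] += 1
        else:
            counts["TN" if pred == "-" else "FP"] += 1
    return counts

# Hypothetical labels for six test instances.
true_labels = ["+", "+", "-", "-", "+", "?"]
predicted_labels = ["+", "-", "-", "+", "?", "+"]
counts = count_outcomes(true_labels, predicted_labels)
print(counts)
```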

On this basis, the accuracy of Alice’s classifier \( \mathcal{C}\mathcal{l}_{[^\mathcal CCDS, r^*] }\) can be defined as a function of these numbers:

$$ Acc(\mathcal{C}\mathcal{l}_{[^\mathcal CCDS, r^*] }) := \,\, \frac{TP + TN}{TP+ TN + FP + FN+FI}. $$

In some situations it may be interesting to distinguish the accuracy of a given classifier from the balanced accuracy, that also depends on the cardinal number \(n^+\) of the set of the positive instances and on the cardinal number \(n^-\) of the set of the negative instances of the dataset under consideration. The balanced accuracy of the classifier \(\mathcal Cl_{[^\mathcal CCDS, r^*]}\) is defined as follows:

$$ BalAcc(\mathcal Cl_{[^\mathcal CCDS, r^*] }) :=\, \frac{1}{2}\left( \frac{TP}{n^+}\, + \, \frac{TN}{n^-}\right) .$$
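The two definitions translate directly into code. In this sketch the function names and the sample counts are hypothetical; \(n^+\) and \(n^-\) are passed explicitly as `n_pos` and `n_neg`.

```python
def accuracy(tp, tn, fp, fn, fi):
    """Acc := (TP + TN) / (TP + TN + FP + FN + FI)."""
    return (tp + tn) / (tp + tn + fp + fn + fi)

def balanced_accuracy(tp, tn, n_pos, n_neg):
    """BalAcc := 1/2 * (TP / n_pos + TN / n_neg)."""
    return 0.5 * (tp / n_pos + tn / n_neg)

# Hypothetical counts: 60 positive and 40 negative instances,
# of which 35 in total were classified as indeterminate.
acc = accuracy(tp=30, tn=20, fp=10, fn=5, fi=35)
bal_acc = balanced_accuracy(tp=30, tn=20, n_pos=60, n_neg=40)
print(acc, bal_acc)
```

Because FI appears in the denominator of the accuracy, a classifier that often answers "indeterminate" is penalized even when it makes few outright errors.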

Another significant parameter is the indeterminacy rate of the classifier \( \mathcal{C}\mathcal{l}_{[^\mathcal CCDS_{Train}, r^*] }\) that is defined in the following way:

$$ IR(\mathcal{C}\mathcal{l}_{[^\mathcal CCDS_{Train}, r^*] }) :=\frac{FI}{n_{Test}},$$

where \(n_{Test}\) is the cardinality of the set of all instances of the test-dataset. In the case of our example we obtain:

$$IR(\mathcal{C}\mathcal{l}_{[^\mathcal CCDS_{Train}, r^*] }) = 0.502. $$

Notice that \( IR(\mathcal{C}\mathcal{l}_{[^\mathcal CCDS_{Train}, r^*] })\) is a parameter that regards the classification-process, while \(IR(^\mathcal CCDS)\) represents an intrinsic property of the original dataset \(^\mathcal CCDS\).
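For completeness, the indeterminacy rate of a classifier can be computed as in the following sketch. The numbers are hypothetical and are not meant to reproduce the value 0.502 reported above.

```python
def indeterminacy_rate(fi, n_test):
    """IR := FI / n_Test, with n_Test the number of all test instances."""
    if n_test <= 0:
        raise ValueError("the test set must be non-empty")
    return fi / n_test

# Hypothetical values: 35 false indeterminate instances out of 140 test instances.
ir = indeterminacy_rate(fi=35, n_test=140)
print(ir)
```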

By definition, both the accuracy and the balanced accuracy of the classifier \( \mathcal{C}\mathcal{l}_{[^\mathcal CCDS, r^*] }\) depend on the choice of the threshold-value \(r^*\). Figure 2 shows how the accuracy-values vary with the possible values of \(r^*\). The accuracy reaches its maximum value (0.537) when \( r^*= 0.501\). The balanced accuracy also reaches its maximum value (0.554) when \( r^*= 0.501.\)

Fig. 2. How the accuracy-values vary with the threshold-values in the classical case
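A curve of this kind can be explored by evaluating the accuracy on a grid of candidate thresholds and keeping the best one. The sketch below assumes a callable `accuracy_at(r)` that trains and evaluates the classifier for a given threshold; here it is replaced by a hypothetical toy curve, so the result says nothing about the actual simulation.

```python
def sweep_thresholds(accuracy_at, thresholds):
    """Return (best_threshold, best_accuracy) over a grid of thresholds."""
    scored = [(r, accuracy_at(r)) for r in thresholds]
    return max(scored, key=lambda pair: pair[1])

def toy_curve(r):
    """Hypothetical accuracy curve, peaked at r = 0.5 (illustration only)."""
    return 1.0 - abs(r - 0.5)

grid = [k / 100 for k in range(101)]   # thresholds 0.00, 0.01, ..., 1.00
best_r, best_acc = sweep_thresholds(toy_curve, grid)
print(best_r, best_acc)
```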

After having acted as a classical epistemic agent, at time \(t_2\), Alice decides to transform her initial classical information into a quantum \(\mathcal C\)-data set (via a stereographic encoding). In this way, every classical object-state

$$ \textbf{x} = (x_i, x_j) \in \mathbb R^2$$

is transformed into a pure state \(|\psi \rangle _{\textbf{x}}\) of the Hilbert space \(\mathbb R^3\). The result is a three-valued quantum \(\mathcal C\)-dataset

$$^\mathcal CQDS = \,\, (^\mathcal C\mathcal H,\,\, ^\mathcal C St_q, \,\, ^\mathcal C St_q^+, \,\, ^\mathcal C St_q^-, \,\, ^\mathcal C St_q^? \,\, ), $$

where:

  • \(^\mathcal C\mathcal H\) is the Hilbert space \(\mathbb R^3\);

  • \(n^+ = 352,\,\,\ n^- = 359, \,\,\, n^? = 339. \)
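The exact form of the stereographic encoding is not spelled out in this section. As a sketch, one may assume the standard inverse stereographic projection, which sends a point of \(\mathbb R^2\) to a unit vector of \(\mathbb R^3\); the function name below is hypothetical.

```python
import math

def stereographic_encode(x1, x2):
    """Map (x1, x2) in R^2 to a unit vector of R^3.

    Assumes the standard inverse stereographic projection:
    (x1, x2) -> (2*x1, 2*x2, s - 1) / (s + 1), with s = x1^2 + x2^2.
    """
    s = x1 * x1 + x2 * x2
    return (2 * x1 / (s + 1), 2 * x2 / (s + 1), (s - 1) / (s + 1))

psi = stereographic_encode(1.5, -2.0)
norm = math.sqrt(sum(c * c for c in psi))
print(psi, norm)
```

Under this assumption every encoded vector has norm 1, so it can play the role of a pure state; the origin of the plane is sent to \((0, 0, -1)\).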

Alice’s quantum information (at time \(t_2\)) is illustrated by Fig. 3, where only the positive instances (represented by the red points) and the negative instances (represented by the blue points) have been considered.

Fig. 3. Alice’s quantum information at time \(t_2\)

Our empirical simulation clearly shows why the concept of quantum positive centroid represents a “good” simulation of the intuitive concept of Gestalt. Both \(Alice_H\) and \(Alice_M\) have met \(n^+ (=352)\) particular flowers that have been recognized as roses. On this basis, \(Alice_H\) has constructed in her mind a rose-Gestalt: a kind of out of focus image that preserves an ambiguous memory of some general, vague features of the concrete flowers she had previously seen. This characteristic human procedure can be emulated by \(Alice_M\), by referring to the mathematical concept of quantum positive centroid (of the quantum \(\mathcal C\)-dataset \(^\mathcal CQDS\)), where the mixed state

$$\rho ^+ = \,\, \sum _i\left\{ \frac{1}{352}P_{|\psi _i\rangle }: \, |\psi _i\rangle \in \,\,^\mathcal C St^+\right\} $$

represents a vague information that ambiguously alludes to all instances that \(Alice_M\) had previously recognized as roses.
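The quantum positive centroid is the uniform average of the projectors \(P_{|\psi _i\rangle }\). A minimal sketch, with hypothetical helper names and three toy states in \(\mathbb R^3\) instead of the 352 states of the simulation, is the following. The purity \(\mathrm{tr}(\rho ^2)\) is computed only to show that the resulting state is mixed.

```python
import math

def projector(psi):
    """Projector |psi><psi| onto the (normalized) real vector psi."""
    norm = math.sqrt(sum(c * c for c in psi))
    unit = [c / norm for c in psi]
    return [[a * b for b in unit] for a in unit]

def quantum_centroid(states):
    """Uniform mixture of the projectors of the given pure states."""
    n, dim = len(states), len(states[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for psi in states:
        p = projector(psi)
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p[i][j] / n
    return rho

def purity(rho):
    """tr(rho^2): equal to 1 for pure states, smaller for mixed states."""
    dim = len(rho)
    return sum(rho[i][j] * rho[j][i] for i in range(dim) for j in range(dim))

# Hypothetical positive instances, already encoded as unit vectors of R^3.
states = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0)]
rho_plus = quantum_centroid(states)
trace = sum(rho_plus[i][i] for i in range(3))
print(trace, purity(rho_plus))
```

A purity below 1 is one way of reading the “vagueness” of the centroid: it does not coincide with any single instance, but spreads its weight over all of them.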

Alice can now proceed as in the classical case in order to check the reliability of her quantum fidelity-classifier \( \mathcal{C}\mathcal{l}_{[^\mathcal CQDS, r^*] }\), based on the quantum dataset \(^\mathcal CQDS\) and on the choice of a threshold value \(r^*\). The quantum concepts of accuracy and of balanced accuracy are defined in the expected way. Figure 4 shows how the accuracy-values vary with the possible values of \(r^*\) in the quantum case. Both accuracies reach their maximum value when \( r^*= 0.67\). The accuracy’s maximum value is 0.798, while the balanced accuracy’s maximum value is 0.803.

Fig. 4. How the accuracy-values vary with the threshold-values in the quantum case

Fig. 5. The supremacy of quantum classifiers

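On this basis we can compare the accuracies of our quantum classifiers with the accuracies of the corresponding classical classifiers. Figure 5 shows the supremacy of quantum classifiers: in our simulation the maximum accuracy rises from 0.537 in the classical case to 0.798 in the quantum case, and the maximum balanced accuracy from 0.554 to 0.803.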

As happens in the classical case, we can also define the indeterminacy rate of the quantum classifier \( \mathcal{C}\mathcal{l}_{[^\mathcal CQDS_{Train}, r^*] }\). One can show that:

$$ IR(\mathcal{C}\mathcal{l}_{[^\mathcal CQDS_{Train}, r^*] }) = 0.138.$$

Thus, in the quantum case the degree of uncertainty of Alice’s classification has decreased with respect to the classical case (where \(IR(\mathcal{C}\mathcal{l}_{[^\mathcal CCDS_{Train}, r^*] }) = 0.502 \)).

From a logical point of view, the main ideas of a quantum approach to pattern recognition can be naturally reconstructed in the framework of a special version of first-order quantum computational semantics.Footnote 6 But this is a longer story that will be told in another paper.