1 Introduction

It is well known that quantum mechanics describes statistical properties of microscopic phenomena to which classical probability theory seems to be inapplicable. In this paper, we focus on violations of classical probability laws that can be found in quantum theory [40–42]. Our assertion is that the mathematical formalism of quantum theory has some features of non-Kolmogorovian probability theory, and it can be used to analyze the statistical properties of some phenomena outside of quantum physics. Indeed, several authors have pointed out that violations of the total probability law can be found in experimental data in biology [10], cognitive science [4, 5, 7, 9, 11–13, 15–17, 20, 27–31], and decision-making in games [7, 8].

First of all, we briefly illustrate the violation of a classical probability law, namely, the total probability law, in the famous double-slit experiment. The probability that a photon is detected at the position x on a photo-sensitive plate is represented as

$$\begin{aligned} P(x) =&\biggl \vert \frac{1}{\sqrt{2}}\psi _{1}(x)+\frac{1}{\sqrt{2}} \psi _{2}(x)\biggr \vert ^{2} \\=&\frac{1}{2}\bigl \vert \psi _{1}(x)\bigr \vert ^{2}+\frac{1}{2}\bigl \vert \psi _{2}(x)\bigr \vert ^{2}+\bigl \vert \psi _{1}(x)\bigr \vert \bigl \vert \psi _{2}(x)\bigr \vert \cos \theta, \end{aligned}$$

where \(\psi_1\) and \(\psi_2\) are two wave functions whose absolute squares \(|\psi_k(x)|^2\) give the distributions of photons passing through the slits numbered \(k=1,2\). The term \(|\psi_1(x)||\psi_2(x)|\cos\theta\) describes the interference effect due to the superposition of the two wave functions. Let us denote \(|\psi_k(x)|^2\) by \(P(x|k)\). Then the above equation can be rewritten as

$$ P(x)=P(x|1)P(1)+P(x|2)P(2)+2\sqrt{P(x|1)P(1)P(x|2)P(2)}\cos \theta $$
(1)

where P(1)=P(2)=1/2. However, the usual total probability law has the form:

$$ P(x)=P(x|1)P(1)+P(x|2)P(2) . $$
(2)

Thus it is violated. The interference term

$$2\sqrt{P(x|1)P(1)P(x|2)P(2)}\cos \theta $$

describes the degree of violation.
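As a purely numerical illustration (our own sketch, with hypothetical Gaussian slit amplitudes; the paper fixes no concrete wave functions), the relation between Eqs. (1) and (2) can be checked directly:

```python
import numpy as np

def psi(x, center):
    # hypothetical slit amplitude (real Gaussian, for illustration only)
    return np.exp(-((x - center) ** 2))

x = 0.3
p1, p2 = 0.5, 0.5                         # P(1) = P(2) = 1/2, both slits open
psi1, psi2 = psi(x, -1.0), psi(x, 1.0)
theta = np.angle(psi1) - np.angle(psi2)   # relative phase (0 here: real amplitudes)

p_classical = p1 * abs(psi1) ** 2 + p2 * abs(psi2) ** 2          # Eq. (2)
interference = 2 * np.sqrt(p1 * abs(psi1) ** 2 * p2 * abs(psi2) ** 2) * np.cos(theta)
p_quantum = abs(np.sqrt(p1) * psi1 + np.sqrt(p2) * psi2) ** 2    # Eq. (1)

# the quantum probability equals the classical one plus the interference term
assert np.isclose(p_quantum, p_classical + interference)
```

With real positive amplitudes the phase is zero, so the interference term is positive and Eq. (2) underestimates P(x).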

For this example, we can propose the following interpretation. Let us consider the context "both slits are open" and denote it by \(S_{1\cup 2}\). This context is not a simple (Boolean) sum of the two contexts \(S_i\), \(i=1,2\), "only the i-th slit is open":

$$S_{1 \cup 2}\neq S_1 \cup S_2. $$

In the hidden-variables approach, one would try to find a common probability space describing all contexts, \(S_{1\cup 2}\) and \(S_1 \cup S_2\). However, this is difficult. In quantum mechanics, different contexts are distinguished as different states, e.g.,

$$ \rho_{S_{1\cup 2}}=\vert \psi \rangle \langle \psi \vert ,\quad \vert \psi \rangle =\frac{1}{\sqrt{2}}\bigl( \vert \psi_{1} \rangle + \vert \psi_{2} \rangle \bigr),\qquad \rho_{S_{1}\cup S_{2}}=\frac{1}{2}\vert \psi_{1} \rangle \langle \psi_{1} \vert +\frac{1}{2}\vert \psi_{2} \rangle \langle \psi_{2} \vert . $$
(3)

The probabilities denoted by P(x) in Eqs. (1) and (2) are given by \(\langle x \vert \rho_{S_{1\cup 2}}\vert x \rangle \) and \(\langle x \vert \rho_{S_{1} \cup S_{2}}\vert x \rangle \), respectively. We denote these probabilities by \(P_{S_{1\cup 2}}(x)\) and \(P_{S_{1} \cup S_{2}}(x)\) and rewrite Eq. (1) as

$$P_{S_{1\cup 2}}(x)=P_{S_1 \cup S_2}(x)+2\sqrt{P(x|1)P(1)P(x|2)P(2)}\cos \theta. $$

The violation of the total probability law comes from the difference between the probabilistic structures of the two contexts, \(S_{1\cup 2}\) and \(S_1 \cup S_2\), or more precisely, between the two states \(\rho_{S_{1\cup 2}}\) and \(\rho_{S_{1} \cup S_{2}}\).

In this way we can discuss the violation of the total probability law for phenomena outside of quantum physics. Let us consider a simple and intuitive example: the following experiment in the domain of cognitive psychology.

We give chocolate to the subjects and ask them whether it is sweet (C=1) or not (C=2). Then we obtain statistical data determining the probabilities P(C=1) and P(C=2). In another experiment we first give sugar to (other) subjects before giving them chocolate. Then we obtain the probabilities P(S=1), P(S=2), P(C=1|S=1) and P(C=1|S=2). The probability that the chocolate is sweet is estimated as

$$ P(C=1|S=1)P(S=1)+P(C=1|S=2)P(S=2) . $$
(4)

A naive application of classical probability theory implies that this value equals P(C=1). However, one can easily see that the value given by Eq. (4) is smaller than P(C=1):

$$ P(C=1)\neq P(C=1|S=1)P(S=1)+P(C=1|S=2)P(S=2). $$

The LHS probability P(C=1) is obtained in the context "subjects did not taste sugar before tasting chocolate". The contexts "a tongue tasted sugar" and "a tongue did not taste sugar" are different. We denote the first context by \(S_{sug}\) and the latter by \(S_{\neg sug}\). The probabilities on the LHS and RHS of the above equation should be replaced by \(P_{S_{\neg sug}}(C=1)\) and \(P_{S_{sug}}(C=1)\). Intuitively, the result \(P_{S_{\neg sug}}(C=1)\neq P_{S_{sug}}(C=1)\) now seems natural. However, to explain this result mathematically, we need a proper probability space which describes both contexts \(S_{\neg sug}\) and \(S_{sug}\). To find such a common probability space, enormous knowledge about the physical and chemical structure of the tongue would be needed, so that it is very difficult to find such a space, as in the case of hidden-variables theory. In this paper, another approach is employed. It is based on the mathematical apparatus of quantum information and probability. The states \(\rho_{S_{\neg sug}}\) and \(\rho_{S_{sug}}\) describing the two contexts \(S_{\neg sug}\) and \(S_{sug}\) are constructed. Then the concepts of channel and lifting [3] play an important role.

A channel is a map from states to states; it is a basic mathematical tool of quantum information theory. If a state describes a context for a system, a channel operating on states describes a change of context. The role of a lifting is more general than that of a channel, see Sect. 2: a lifting sends a state in \(\mathcal{S}(\mathcal{H})\) to a compound state in an expanded space \(\mathcal{S}(\mathcal{H}\otimes \mathcal{K})\). In Sect. 2, we introduce several examples of lifting maps, and in Sect. 3, we point out that lifting maps are useful for defining joint probabilities for two (event) systems. The violation of the total probability law can be mathematically explained by the difference between two states which are provided by lifting maps. The main problem discussed in this paper is how to construct such a lifting map. To do this, we use the ideas of adaptive dynamics (AD) [37]. The basic concept of AD is explained briefly in Sect. 3. The AD framework presented in this paper generalizes the standard open-systems dynamics.

We illustrate the usage of lifting maps to describe mathematically the violation of the total probability law by three examples of cognitive phenomena. The first example is the problem of sweetness, see Sect. 4. In Sect. 5, we discuss the system describing metabolism in E. coli. In biology, it is known that E. coli gives preference to the metabolism of glucose over that of lactose. Our model evaluates this preference in terms of the violation of the total probability law. In Sect. 6, we focus on a problem which has been widely studied in psychology and cognitive science. People frequently make inferences to estimate the probability of a certain event. According to experimental analyses of human behavior in cognitive science, there are cases in which the human estimation of probability does not match classical probability theory. Such inference is called heuristic inference. We assume that a decision-maker using a heuristic inference holds some psychological factor biasing Bayesian inference. The latter is known as a "rational" inference and is based on classical probability theory. In our approach, a lifting map is used to describe such a psychological effect.

We also remark that the application of quantum information theory outside of quantum physics, e.g., to macroscopic biological systems, reawakens the long debate on the possibility of combining realistic and quantum descriptions, cf. [6, 22, 23]. At the moment we are not able to present a consistent interpretation for the coming applications of quantum information theory outside of quantum physics; we can only keep close to the operational interpretation of quantum information theory, e.g., [18, 19]. In applications of quantum probability outside of physics, the Bayesian approach to quantum probability and the information interpretation of the quantum state [14, 21] are the most natural.

2 Lifting Map

In this section we discuss the notion of lifting [3].

Let \(\mathcal{A}\) be a C*-algebra. The space of states on this algebra is denoted by the symbol \(\mathcal{S}(\mathcal{A})\).

Definition

Let \(\mathcal{A}\), \(\mathcal{B}\) be C*-algebras and let \(\mathcal{A} \otimes \mathcal{B}\) be a fixed C*-tensor product of \(\mathcal{A}\) and \(\mathcal{B}\). A lifting from \(\mathcal{A}\) to \(\mathcal{A}\otimes \mathcal{B}\) is a weak ∗-continuous map

$$ \mathcal{E}^{\ast}:\mathcal{S}(\mathcal{A})\rightarrow \mathcal{S}( \mathcal{A}\otimes \mathcal{B}) $$

If \(\mathcal{E}^{\ast }\) is affine and its dual is a completely positive map, we call it a linear lifting; if it maps pure states into pure states, we call it pure.

Remark

Let \(\mathcal{A}\), \(\mathcal{B}\) be the sets of all observables in Hilbert spaces \(\mathcal{H}\), \(\mathcal{K}\); \(\mathcal{A}=\mathcal{O}(\mathcal{H})\), \(\mathcal{B}=\mathcal{O}(\mathcal{K})\). Then, \(\mathcal{E}^{\ast }\) is a lifting from \(\mathcal{S} (\mathcal{H} ) \) to \(\mathcal{S} ( \mathcal{H}\otimes \mathcal{K} )\). The definition of lifting includes that of channel: (1) If \(\mathcal{K}\) is \(\mathbb{C}\), then lifting \(\mathcal{E}^{\ast }\) is nothing but a channel from \(\mathcal{S} ( \mathcal{H} ) \) to \(\mathcal{S} ( \mathcal{H} )\). (2) If \(\mathcal{H}\) is \(\mathbb{C}\), then lifting \(\mathcal{E}^{\ast }\) is a channel from \(\mathcal{S} ( \mathcal{H} ) \) to \(\mathcal{S} ( \mathcal{K} )\).

We present some important examples of liftings.

Example 1

Nondemolition lifting: A lifting from \(\mathcal{S} (\mathcal{H} ) \) to \(\mathcal{S} (\mathcal{H}\otimes \mathcal{K} )\) is called nondemolition for a state \(\rho\in \mathcal{S}(\mathcal{H})\) if ρ is invariant for \(\mathcal{E}^{\ast }\), i.e., if for all \(a \in \mathcal{O}(\mathcal{H})\)

$$ \operatorname{tr}\bigl(\mathcal{E}^{\ast }\rho (a\otimes 1)\bigr)=\operatorname{tr}(\rho a). $$

Example 2

Isometric lifting: A transition expectation from \(\mathcal{A}\otimes \mathcal{B}\) to \(\mathcal{A}\) is a completely positive linear map \(\mathcal{E}:\mathcal{A}\otimes \mathcal{B}\rightarrow \mathcal{A}\) satisfying

$$ \mathcal{E}(1_{\mathcal{A}}\otimes 1_{\mathcal{B}})=1_{\mathcal{A}}. $$

Let \(V:\mathcal{H}\rightarrow \mathcal{H}\otimes \mathcal{K}\) be an isometry

$$ V^{\ast }V=1_{\mathcal{H}}. $$

Then the map

$$ \mathcal{E}:x\in \mathbf{B}(\mathcal{H})\otimes \mathbf{B}(\mathcal{K}) \rightarrow V^{\ast }xV\in \mathbf{B}(\mathcal{H}) $$

is a transition expectation, and the associated lifting maps a density matrix ρ in \(\mathcal{H}\) into

$$ \mathcal{E}^{\ast }\rho=V\rho V^{\ast } $$

in \(\mathcal{H}\otimes \mathcal{K}\). Liftings of this type are called isometric. Every isometric lifting is a pure lifting.
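A minimal numerical sketch of an isometric lifting (with our own toy isometry \(V|e_k\rangle = |e_k\rangle\otimes|e_k\rangle\), which is not taken from [3]) illustrates the defining properties:

```python
import numpy as np

# toy isometry V : C^2 -> C^2 (x) C^2, V|e_k> = |e_k>(x)|e_k>
V = np.zeros((4, 2), dtype=complex)
V[0, 0] = 1.0   # |e1> -> |e1>(x)|e1>
V[3, 1] = 1.0   # |e2> -> |e2>(x)|e2>
assert np.allclose(V.conj().T @ V, np.eye(2))    # V*V = 1_H

rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # pure state |x0><x0|
lifted = V @ rho @ V.conj().T                    # E*rho = V rho V* on H (x) K
assert np.isclose(np.trace(lifted).real, 1.0)    # lifted is again a state
# purity is preserved: isometric liftings are pure liftings
assert np.isclose(np.trace(lifted @ lifted).real, 1.0)
```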

Example 3

Compound lifting: Let \(\varLambda ^{\ast }:\mathcal{S}(\mathcal{A}_{1})\rightarrow \mathcal{S}(\mathcal{A}_{2})\) be a channel. For any \(\rho _{1}\in \mathcal{S}(\mathcal{A}_{1})\) in the closed convex hull of the extremal states, fix a decomposition of ρ 1 as a convex combination of extremal states in \(\mathcal{S}(\mathcal{A}_{1})\)

$$ \rho _{1}=\int_{\mathcal{S}(\mathcal{A}_{1})}\omega _{1}d\mu $$

where μ is a Borel measure on \(\mathcal{S}(\mathcal{A}_{1})\) with support in the extremal states, and define

$$ \mathcal{E}^{\ast }\rho _{1}\equiv \int_{\mathcal{S}(\mathcal{A}_{1})} \omega _{1}\otimes \varLambda ^{\ast }\omega _{1}d\mu $$

Then \(\mathcal{E}^{\ast }:\mathcal{S}(\mathcal{A}_{1})\rightarrow \mathcal{S} (\mathcal{A}_{1}\otimes \mathcal{A}_{2})\) is a lifting, nonlinear even if \(\varLambda^{\ast}\) is linear, and it is of nondemolition type.

The most general lifting, mapping \(\mathcal{S}(\mathcal{A}_{1})\) into the closed convex hull of the extremal product states on \(\mathcal{A}_{1}\otimes \mathcal{A}_{2}\) is essentially of this type. This nonlinear nondemolition lifting was first discussed by Ohya to define the compound state and the mutual entropy for quantum information communication [34, 35].

We now omit the condition, used in [34], that μ is concentrated on the extremal states. Therefore, once a channel is given, a lifting of the convex product type can be constructed. For example, the von Neumann quantum measurement process is written, in the terminology of lifting, as follows. Having measured a compact observable \(A=\sum_{n} a_{n} P_{n}\) (spectral decomposition with \(\sum_{n} P_{n}=I\)) in a state ρ, the state after this measurement will be

$$ \varLambda ^{\ast }\rho =\sum_{n}P_{n} \rho P_{n} $$

and lifting \(\mathcal{E}^{\ast }\) of the convex product type associated to this channel Λ and to a fixed decomposition of ρ as ρ=∑ n μ n ρ n (\(\rho _{n}\in \mathcal{S}(\mathcal{A}_{1})\)) is given by

$$ \mathcal{E}^{\ast }\rho =\sum_{n}\mu _{n}\rho _{n}\otimes \varLambda ^{\ast }\rho _{n}. $$
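The measurement channel and the associated convex-product lifting can be sketched numerically as follows (the weights and the convex decomposition below are our own toy choices):

```python
import numpy as np

# spectral projections of a diagonal observable; they sum to the identity
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def channel(rho):
    # von Neumann measurement channel: Lambda* rho = sum_n P_n rho P_n
    return sum(Pn @ rho @ Pn for Pn in P)

# fixed convex decomposition rho = sum_n mu_n rho_n into pure states
mu = [0.3, 0.7]
rhos = [np.array([[1.0, 0.0], [0.0, 0.0]]),   # |e1><e1|
        np.full((2, 2), 0.5)]                 # |x0><x0|
rho = mu[0] * rhos[0] + mu[1] * rhos[1]

# convex-product lifting: E* rho = sum_n mu_n rho_n (x) Lambda* rho_n
lifted = sum(m * np.kron(r, channel(r)) for m, r in zip(mu, rhos))
assert np.isclose(np.trace(lifted), 1.0)

# the first marginal reproduces rho, i.e. the lifting is of nondemolition type
marginal = lifted.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
assert np.allclose(marginal, rho)
```

The final assertion checks numerically the nondemolition property stated in the text.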

3 Adaptive Dynamics and a New View of the Total Probability Law

The idea of adaptive dynamics (AD) has implicitly appeared in a series of papers [2, 3, 25, 26, 32, 34, 36–39]. The name "adaptive dynamics" was deliberately used in [37]. AD has two aspects: one is "observable-adaptive" and the other is "state-adaptive".

The idea of observable-adaptivity comes from the study of chaos. Recognition (measurement) of chaos in a phenomenon depends on the chosen method of structuring this phenomenon, for example, on which scales of time, distance or domain are used by the observer. More generally, any measurement depends on the chosen method of structuring a phenomenon. For example, consider a time-dependent dynamics: if one observer studies its discretization based on a time interval τ and another takes an interval ten times τ, their results can differ. (See the paper [37] for more details.) Examples of observable-adaptivity are used to understand chaos [32, 36] and to examine the violation of Bell's inequality, namely the chameleon dynamics proposed by Accardi [1].

The idea of state-adaptivity implicitly started with the construction of a compound state for quantum communication [2, 33–35]. Examples of state-adaptivity are seen in an algorithm for solving an NP-complete problem [3, 38, 39]. State-adaptivity means that the dynamics depends on the state of a system. For instance, in [3], the interaction Hamiltonian used for the computation depends on the state at time \(t_0\), and the state at times \(t>t_0\) is changed by this Hamiltonian.

The above concept of AD can be represented mathematically by a lifting. Let us introduce a lifting from \(\mathcal{S} ( \mathcal{H} ) \) to \(\mathcal{S} ( \mathcal{H}\otimes \mathcal{K} )\) (or from \(\mathcal{S} ( \mathcal{K} ) \) to \(\mathcal{S} ( \mathcal{H}\otimes \mathcal{K} )\)), say \(\mathcal{E}^{\ast }_{\sigma Q}\). Here, σ is a state in \(\mathcal{S} ( \mathcal{H}\otimes \mathcal{K} )\) and Q is an observable in \(\mathcal{B(H)}\otimes \mathcal{B(K)}\). The lifting \(\mathcal{E}^{\ast }_{\sigma Q}\) is constructed with the aid of σ and Q. We consider the following dynamics:

$$\rho \quad\Rightarrow\quad \mathcal{E}^{\ast }_{\sigma Q}(\rho) \quad\Rightarrow\quad \operatorname{tr}_{\mathcal{H}} \mathcal{E}^{\ast }_{\sigma Q}(\rho) \equiv \rho_{\sigma Q} \in \mathcal{S} (\mathcal{K}). $$

The initial state ρ is defined in \(\mathcal{S}(\mathcal{H})\) or \(\mathcal{S}(\mathcal{K})\). We call this state change the dynamics adaptive to the state σ and the observable Q or the dynamics adaptive to context S={σ,Q}.

The compound state \(\mathcal{E}_{\sigma Q}^{*}(\rho) =\mathcal{E}_{S}^{*}(\rho)\) describes correlation of an event system of the interest with another event system.

Now consider two "event systems" \(A= \{ a_{k}\in \mathbb{R},E_{k}\in \mathcal{O}(\mathcal{K}) \} \) and \(B= \{ b_{j}\in \mathbb{R}, F_{j}\in \mathcal{O}(\mathcal{H}) \}\), where \(\{E_{k}\}\) and \(\{F_{j}\}\) are positive operator valued measures (POVMs), i.e., \(\sum_{k} E_{k}=I\), \(\sum_{j} F_{j}=I\) and \(E_{k}, F_{j}\geq 0\). We define the joint probability as

$$ P_S(a_{k},b_{j})=\operatorname{tr}(F_{j} \otimes E_{k})\mathcal{E}_{S}^{\ast }(\rho). $$
(5)

Further, the probability P S (a k ) is defined as

$$\begin{aligned} \sum_j P_S(a_{k},b_{j}) =&\operatorname{tr}I \otimes E_{k} \mathcal{E}_{S}^{\ast }(\rho)\\=&\operatorname{tr}_{\mathcal{K}}E_{k} \rho_{S}\\\equiv& P_S(a_k). \end{aligned}$$

As was discussed in the introduction, the violation of the total probability law comes from a difference of two contexts, say S={σ,Q} and \(\tilde{S}=\{\tilde{\sigma}, \tilde{Q}\}\). It is represented as

$$P_S(a_k)=P_{\tilde{S}}(a_k)+\Delta=\sum _j P_{\tilde{S}}(a_{k},b_{j})+ \Delta. $$

Generally, if \(S\neq \tilde{S}\), then Δ≠0. In order to discuss the form of Δ mathematically we have to define corresponding liftings.
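As a small numerical sketch of this scheme (the decoherence channel and the compound lifting below are our own toy choices; the paper leaves the concrete form of \(\mathcal{E}^{\ast}_{\sigma Q}\) to each application), the flow ρ ⇒ \(\mathcal{E}^{\ast}_{\sigma Q}(\rho)\) ⇒ \(\operatorname{tr}_{\mathcal{H}} \mathcal{E}^{\ast}_{\sigma Q}(\rho)\) and the joint probability of Eq. (5) can be computed as:

```python
import numpy as np

def partial_trace_H(state):
    # trace out the first C^2 factor of a 4x4 state on H (x) K
    return state.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)

# toy channel: full decoherence in the basis {e1, e2}
Lambda = lambda rho: np.diag(np.diag(rho))
rho = np.array([[0.7, 0.2], [0.2, 0.3]])

compound = np.kron(rho, Lambda(rho))     # toy compound lifting E*_S(rho) on H (x) K
rho_S = partial_trace_H(compound)        # context-adapted state on K
assert np.isclose(np.trace(rho_S), 1.0)

# joint probability of Eq. (5) for projective POVMs {F_j} on H and {E_k} on K
E1 = np.diag([1.0, 0.0])
F1 = np.diag([1.0, 0.0])
P_joint = np.trace(np.kron(F1, E1) @ compound)
```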

In the sequel, we shall find proper liftings describing the following three problems: (1) the state change of a tongue as a reaction to sweetness; (2) lactose-glucose interference in E. coli growth; (3) Bayesian updating.

4 State Change as Reaction of Tongue to Sweetness

The first problem under investigation is not sophisticated, but quite common. As discussed in the introduction, we consider the following cognitive experiment. One takes sugar S or (and) chocolate C and is asked whether it is sweet or not. The answers "yes" and "no" are numerically encoded by 1 and 2. Then the basic classical probability law need not be satisfied, that is,

$$ P(C=1)\neq P(C=1|S=1)P(S=1)+P(C=1|S=2)P(S=2), $$

because the LHS P(C=1) will be very close to 1 but the RHS will be less than \(\frac{1}{2}\). Note that the LHS P(C=1) is obtained in the context that subjects do not taste sugar; they start directly with chocolate. The contexts "a tongue tasted sugar", say \(S_{sug}\), and "a tongue did not taste sugar", say \(S_{\neg sug}\), are different. The probabilities on the LHS and RHS of the above equation should be replaced by \(P_{S_{\neg sug}}(C=1)\) and \(P_{S_{sug}}(C=1)\). The problem to be discussed is how to obtain these probabilities mathematically.

Let |e 1〉 and |e 2〉 be the orthogonal vectors describing sweet and non-sweet states, respectively. The initial state of a tongue is neutral such as

$$ \rho \equiv \bigl \vert x_{0} \rangle \langle x_{0}\bigr \vert , $$

where \(x_{0}=\frac{1}{\sqrt{2}} ( \vert e_{1} \rangle +\vert e_{2} \rangle )\). Here we start with the neutral pure state ρ, because we consider experiments with two sweet things. This problem can be described by the Hilbert space \(\mathbb{C}^{2}\), so that |e 1〉 and |e 2〉 can be set as \(\binom{1}{0}\) and \(\binom{0}{1}\), respectively.

When one tastes sugar, the operator corresponding to tasting sugar is mathematically (and operationally) represented as

$$ S=\left ( \begin{array}{c@{\quad}c} \lambda _{1} & 0 \\0 & \lambda _{2}\end{array} \right ), $$

where \(|\lambda _{1}|^{2}+|\lambda _{2}|^{2}=1\). This operator can be regarded as the square root of the sugar state \(\sigma _{S}\):

$$ \sigma _{S}=\vert \lambda _{1}\vert ^{2}E_{1}+\vert \lambda _{2}\vert ^{2}E_{2},\quad E_{1}=\vert e_1 \rangle \langle e_1 \vert ,\ E_{2}=\vert e_2 \rangle \langle e_2\vert . $$

Taking sugar, one will find it sweet with probability \(|\lambda _{1}|^{2}\) and non-sweet with probability \(|\lambda _{2}|^{2}\); thus \(|\lambda _{1}|^{2}\) should be much larger than \(|\lambda _{2}|^{2}\) for usual sugar. This comes from the following change of the neutral initial (i.e., non-adaptive) state of the tongue:

$$ \rho \rightarrow \rho _{S}=\varLambda _{S}^{\ast }( \rho )\equiv \frac{S^{\ast }\rho S}{\operatorname{tr} \vert S\vert ^{2}\rho }. $$
(6)

This is the state of a tongue after tasting sugar.

This dynamics is similar to the usual expression of the state change in quantum dynamics. The subtle point of the present problem is that just after tasting sugar the state of a tongue is neither ρ S nor ρ. Note here that if we ignore subjectivity (personal features) of one’s tongue, then, instead of the state given by (6), the “just after tasting sugar” state will have the form:

$$ E_{1}\rho _{S}E_{1}+E_{2}\rho _{S}E_{2}. $$

This is the unread objective state as usual in quantum measurement. We can use the above two expressions, which give us the same result for computation of the probabilities for the S-variable.

However, for some duration of time the tongue becomes dull to sweetness (and this is the crucial point of our approach for this example), so the tongue state can be written by means of a certain "exchanging" operator X such that

$$ \rho _{SX}=X\rho _{S}X. $$

Then similarly as for sugar, when one tastes chocolate, the state will be given by

$$ \rho _{SXC}=\varLambda _{C}^{\ast }(\rho _{SX}) \equiv \frac{C^{\ast }\rho _{SX}C}{\operatorname{tr} \vert C\vert ^{2}\rho _{SX}}, $$

where the operator C has the form:

$$ C=\left ( \begin{array}{c@{\quad}c} \mu _{1} & 0 \\0 & \mu _{2}\end{array} \right ) $$

with \(|\mu _{1}|^{2}+|\mu _{2}|^{2}=1\). Common experience tells us that \(|\lambda _{1}|^{2}\geq |\mu _{1}|^{2}\geq |\mu _{2}|^{2}\geq |\lambda _{2}|^{2}\) and that the first two quantities are much larger than the last two.

As can be seen from the preceding consideration, in this example the “adaptive set” {σ,Q} is the set {S,X,C}. Now we introduce the following nonlinear demolition lifting:

$$ \mathcal{E}_{\sigma Q}^{\ast }(\rho )= \rho _{S}\otimes \rho _{SXC}=\varLambda _{S}^{\ast }(\rho )\otimes \varLambda _{C}^{\ast }\bigl(X\varLambda _{S}^{\ast }(\rho )X \bigr). $$

The corresponding joint probabilities are given by

$$ P(S=j,C=k)=\operatorname{tr}E_{j}\otimes E_{k}\mathcal{E}_{\sigma Q}^{\ast }( \rho ). $$

The probability that one tastes sweetness of the chocolate after tasting sugar is

$$ P(S=1,C=1)+P(S=2,C=1)=\frac{\vert \lambda _{2}\vert ^{2}\vert \mu _{1}\vert ^{2}}{\vert \lambda _{2}\vert ^{2}\vert \mu _{1}\vert ^{2}+\vert \lambda _{1}\vert ^{2}\vert \mu _{2}\vert ^{2}}, $$

which is \(P_{S_{sug}}(C=1)\). Note that this probability is much less than

$$ P_{S_{\neg sug}}(C=1)=\operatorname{tr}E_{1}\varLambda _{C}^{\ast }( \rho )=\vert \mu _{1}\vert ^{2}, $$

which is the probability of sweetness tasted by the neutral tongue ρ. This means that the usual total probability law should be replaced by the adaptive (context dependent) probability law.
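The whole computation can be checked numerically. The amplitudes below are our own illustrative values; the paper only requires \(|\lambda_1|^2\) to be large and \(|\lambda_2|^2\) small:

```python
import numpy as np

lam1, lam2 = np.sqrt(0.95), np.sqrt(0.05)    # sugar: almost surely sweet
mu1, mu2 = np.sqrt(0.90), np.sqrt(0.10)      # chocolate on a neutral tongue

E1 = np.diag([1.0, 0.0])                     # projection onto the "sweet" state
X = np.array([[0.0, 1.0], [1.0, 0.0]])       # "exchanging" operator
rho = np.full((2, 2), 0.5)                   # neutral state |x0><x0|

def adapt(op, state):
    # state change op* rho op / tr(|op|^2 rho), as in Eq. (6)
    out = op.conj().T @ state @ op
    return out / np.trace(out)

rho_S = adapt(np.diag([lam1, lam2]), rho)            # after tasting sugar
rho_SXC = adapt(np.diag([mu1, mu2]), X @ rho_S @ X)  # dulled tongue, then chocolate

p_sweet_after_sugar = np.trace(E1 @ rho_SXC).real
closed_form = lam2**2 * mu1**2 / (lam2**2 * mu1**2 + lam1**2 * mu2**2)
assert np.isclose(p_sweet_after_sugar, closed_form)  # matches the formula above
assert p_sweet_after_sugar < mu1**2                  # far below the neutral-tongue value
```

For these values the probability drops from 0.9 (neutral tongue) to about 0.32 after sugar, reproducing the context dependence described in the text.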

5 Activity of Lactose Operon in E. Coli

The lactose operon is a group of genes in E. coli (Escherichia coli) which is required for the metabolism of lactose. This operon produces β-galactosidase, an enzyme that digests lactose into glucose and galactose. There was an experiment measuring the activity of the β-galactosidase which E. coli produces in the presence of (I) only 0.4 % lactose, (II) only 0.4 % glucose, or (III) a mixture of 0.4 % lactose + 0.1 % glucose, see [24]. The activity is represented in Miller units (a condition of enzyme-activity measurement), and it reaches 3000 units at full induction. In the cases (I) and (II), the data of 2920 units and 33 units were obtained. These results lead one to expect that the activity in the case (III) will be large, because the number of molecules of lactose is larger than that of glucose. However, the obtained value was only 43 units. This result implies that E. coli metabolizes glucose in preference to lactose. In biology, this functionality of E. coli has been discussed, and it is known that glucose reduces the lactose permease provided by the operon. Apart from such a qualitative, biochemical explanation, it is also desirable to have a mathematical representation in which the biological activity of E. coli is evaluated quantitatively. In the paper [10], it is pointed out that the activity of E. coli violates the total probability law as shown below, which might come from the preference in E. coli's metabolism. We will explain this contextual behavior by the formula (5).

We consider two events L and G; L: "E. coli detects a lactose molecule in the cell's environment (to use it for its metabolism)" and G: "E. coli detects a glucose molecule". In case (I) or (II), the probability P(L)=1 or P(G)=1 is given. In case (III), P(L) and P(G) are calculated as

$$ P(L)=\frac{0.4}{0.4+0.1}=0.8,\qquad P(G)=\frac{0.1}{0.4+0.1}=0.2. $$

The events L and G are mutually exclusive. So it can be assumed that P(L)+P(G)=1. Further, we consider the events {+,−} which means that E. coli activates its lactose operon or not. From the experimental data for the cases (I) and (II), the following conditional probabilities are obtained:

$$ P(+|L)=\frac{2920}{3000},\qquad P(-|L)=\frac{80}{3000},\qquad P(+|G)=\frac{33}{3000},\qquad P(-|G)=\frac{2967}{3000}. $$
(7)

In the case (III), if the total probability law were satisfied, the probability P(+) would be computed as

$$\begin{aligned} P(+|L)P(L)+P(+|G)P(G)=\frac{2920}{3000}\times 0.8+\frac{33}{3000}\times 0.2\approx 0.78. \end{aligned}$$

However, from the experimental data, we obtain

$$ P(+)=\frac{43}{3000}, $$

so that the total probability law is violated:

$$ P(+|L\cup G)\neq P(+|L)P(L)+P(+|G)P(G). $$

This violation is similar to the one in the double-slit experiment which was discussed in the introduction. The context of case (III), say \(S_{L\cup G}\), is different from the simple (Boolean) sum of the two contexts, \(S_L \cup S_G\). We replace the LHS probability of the above equation by \(P_{S_{L\cup G}}(+)\) and the RHS probabilities P(+|L) and P(+|G) by \(P_{S_{L}}(+)\) and \(P_{S_{G}}(+)\), respectively:

$$P_{S_{L\cup G}}(+)\neq P_{S_L}(+)P(L)+P_{S_G}(+)P(G)\equiv P_{S_L \cup S_G}(+). $$

We now use our mathematical model for computation of the above probabilities, by using the concept of lifting. First, we introduce the initial state ρ=|x 0〉〈x 0| in Hilbert space \(\mathcal{H}=\mathbb{C}^{2}\). The state vector x 0 is written as

$$ \vert x_{0} \rangle =\frac{1}{\sqrt{2}}\vert e_{1} \rangle +\frac{1}{\sqrt{2}}\vert e_{2} \rangle . $$

The basis vectors \(\{e_{1},e_{2}\}\) correspond to the detection of lactose or glucose by E. coli, i.e., to the events L and G. In the initial state ρ, the E. coli bacterium has not yet recognized the existence of lactose and glucose. When E. coli recognizes them, the following state change occurs:

$$ \rho \mapsto \rho _{D}=\varLambda _{D}^{\ast }(\rho ) \equiv \frac{D\rho D^{\ast }}{\operatorname{tr}(|D|^{2}\rho )}, $$

where

$$ D=\left ( \begin{array}{c@{\quad}c} \alpha & 0 \\0 & \beta \end{array} \right ) $$

with \(|\alpha|^{2}+|\beta|^{2}=1\). Note that \(|\alpha|^{2}\) and \(|\beta|^{2}\) give the probabilities of the events L and G: P(L) and P(G). The state \(\sigma _{D}\equiv DD^{\ast }\) encodes the probability distribution P(L), P(G):

$$\sigma_{D}=P(L)\vert e_1 \rangle \langle e_1 \vert +P(G)\vert e_2 \rangle \langle e_2 \vert . $$

In this sense, the state σ D represents the chemical solution of lactose and glucose. We call D the detection operator and call ρ D the detection state. The state determining the activation of the operon in E. coli depends on the detection state ρ D . In our operational model, this state is obtained as the result of the following transformation:

$$ \rho _{DQ}=\varLambda _{Q}^{\ast }(\rho _{D}) \equiv \frac{Q\rho _{D}Q^{\ast }}{\operatorname{tr}(Q\rho _{D}Q^{\ast })}, $$

where the operator Q is chosen as

$$ Q=\left ( \begin{array}{c@{\quad}c} a & b \\c & d\end{array} \right ) . $$

We call ρ DQ the activation state for the operon and we call Q the activation operator. (The components a,b,c and d will be discussed later.)

We introduce lifting

$$ \mathcal{E}_{DQ}^{\ast }(\rho )=\varLambda _{D}^{\ast }( \rho )\otimes \varLambda _{Q}^{\ast }\bigl(\varLambda _{D}^{\ast }(\rho )\bigr)\in \mathcal{S}(\mathcal{H}\otimes \mathcal{K})= \mathcal{S}\bigl(\mathbb{C}^{2}\otimes \mathbb{C}^{2}\bigr), $$

by which we can describe the correlation between the activity of lactose operon and the ratio of concentration of lactose and glucose. From the discussion in Sect. 3, the joint probabilities P DQ (L,+) and P DQ (G,+) are given by

$$ P_{DQ}(L,\pm )=\operatorname{tr}(E_{1}\otimes E_{\pm })\mathcal{E}_{DQ}^{\ast }(\rho ),\qquad P_{DQ}(G,\pm )=\operatorname{tr}(E_{2}\otimes E_{\pm })\mathcal{E}_{DQ}^{\ast }(\rho ), $$
(8)

where \(E_{+}=\vert e_1 \rangle \langle e_1 \vert \) and \(E_{-}=\vert e_2 \rangle \langle e_2 \vert \) act on \(\mathcal{K}\).

The probability \(P_{DQ}(\pm )\) is obtained as \(P_{DQ}(L,\pm )+P_{DQ}(G,\pm )\).

Let us consider context S L (the case (I)) such that the detection operator D satisfies the condition P(L)=|α|2=1. We denote such D by the symbol D L . The probabilities \(P_{D_{L} Q}(\pm)=P_{S_{L}}(\pm)\) are calculated as

$$ P_{S_L}(+)=\frac{|a|^{2}}{|a|^{2}+|c|^{2}},\qquad P_{S_L}(-)=\frac{|c|^{2}}{|a|^{2}+|c|^{2}}. $$

From the experimental results of Eq. (7), these values should be \(\frac{2920}{3000}\) and \(\frac{80}{3000}\). Therefore, we can give the following forms for the parameters a and c:

$$ a=\sqrt{\frac{2920}{3000}}\mathrm{e}^{\mathrm{i}\theta _{+L}}k_{L},\qquad c=\sqrt{\frac{80}{3000}}\mathrm{e}^{\mathrm{i}\theta _{-L}}k_{L} $$

Here, \(k_{L}\) is a certain real number. In a similar way, we consider context \(S_{G}\) (the case (II)) and obtain

$$ b=\sqrt{\frac{33}{3000}}\mathrm{e}^{\mathrm{i}\theta _{+G}}k_{G},\qquad d=\sqrt{\frac{2967}{3000}}\mathrm{e}^{\mathrm{i}\theta _{-G}}k_{G} $$

for the components b and d. To simplify the discussion, hereafter we assume \(\theta _{+L}=\theta _{-L}\equiv \theta _{L}\), \(\theta _{+G}=\theta _{-G}\equiv \theta _{G}\) and denote \(\mathrm{e}^{\mathrm{i}\theta _{L}}k_{L}\), \(\mathrm{e}^{\mathrm{i}\theta _{G}}k_{G}\) by \(\tilde{k}_{L}\), \(\tilde{k}_{G}\). Then, the operator Q is rewritten as

$$\begin{aligned} Q =&\frac{1}{\sqrt{3000}}\left ( \begin{array}{c@{\quad}c} \sqrt{2920} & \sqrt{33} \\\sqrt{80} & \sqrt{2967}\end{array} \right ) \left ( \begin{array}{c@{\quad}c} \tilde{k}_{L} & 0 \\0 & \tilde{k}_{G}\end{array} \right ) \\=&\left ( \begin{array}{c@{\quad}c} \sqrt{P_{S_L}(+)} & \sqrt{P_{S_G}(+)} \\\sqrt{P_{S_L}(-)} & \sqrt{P_{S_G}(-)}\end{array} \right ) \left ( \begin{array}{c@{\quad}c} \tilde{k}_{L} & 0 \\0 & \tilde{k}_{G}\end{array} \right ). \end{aligned}$$
(9)

By using this Q, we calculate the probability \(P_{S_{L\cup G}}(+)\) corresponding to the case (III):

$$P_{S_{L\cup G}}(+)=\frac{ \vert \sqrt{P_{S_L}(+)}\,\tilde{k}_{L}\alpha +\sqrt{P_{S_G}(+)}\,\tilde{k}_{G}\beta \vert ^{2}}{ \vert \sqrt{P_{S_L}(+)}\,\tilde{k}_{L}\alpha +\sqrt{P_{S_G}(+)}\,\tilde{k}_{G}\beta \vert ^{2}+ \vert \sqrt{P_{S_L}(-)}\,\tilde{k}_{L}\alpha +\sqrt{P_{S_G}(-)}\,\tilde{k}_{G}\beta \vert ^{2}}. $$

In general, the value of this probability is different from \(P_{S_{L} \cup S_{G}}(+)=P_{S_{L}}(+)|\alpha|^{2}+P_{S_{G}}(+)|\beta|^{2}\). The ratio \(|\tilde{k}_{L}|/|\tilde{k}_{G}|\) essentially determines the degree of the difference. Recall the experimental data in the case (III): P(L)=|α|2=0.8>P(G)=|β|2=0.2, but \(P_{S_{L\cup G}}(+)\) is very small. According to our interpretation, this implies that the ratio \(|\tilde{k}_{L}|/|\tilde{k}_{G}|\) is very small. In this sense, the diagonal operator \(F=\operatorname{diag}(\tilde{k}_{L},\tilde{k}_{G})\) in Eq. (9) specifies the preference in E. coli's metabolism. We call F the preference operator. Finally, note that if α, β are real and \(\tilde{k}_{L}=\tilde{k}_{G}^{\ast }\), the usual total probability law holds.
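The contextual suppression can be checked numerically with the paper's data; the ratio \(k_L/k_G\) below is an illustrative choice of ours, not a fitted value:

```python
import numpy as np

# conditional probabilities from cases (I) and (II)
P_plus_L, P_minus_L = 2920 / 3000, 80 / 3000
P_plus_G, P_minus_G = 33 / 3000, 2967 / 3000
alpha, beta = np.sqrt(0.8), np.sqrt(0.2)     # case (III): P(L)=0.8, P(G)=0.2
kL, kG = 0.05, 1.0                           # toy preference: glucose strongly favored

# activation operator Q built from the data and the preference operator diag(kL, kG)
Q = np.array([[np.sqrt(P_plus_L) * kL, np.sqrt(P_plus_G) * kG],
              [np.sqrt(P_minus_L) * kL, np.sqrt(P_minus_G) * kG]])

v = np.array([alpha, beta])                  # detection state rho_D = |v><v|
w = Q @ v                                    # activation amplitudes Q|v>
P_plus_ctx = abs(w[0]) ** 2 / (abs(w[0]) ** 2 + abs(w[1]) ** 2)

# total probability law prediction for case (III)
P_plus_classical = P_plus_L * alpha**2 + P_plus_G * beta**2
assert P_plus_ctx < P_plus_classical         # contextual suppression, as in the data
```

With this choice the contextual probability is a few percent, far below the classical prediction of about 0.78, in qualitative agreement with the measured 43/3000.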

6 Bayesian Updating Biased by Psychological Factor

Bayesian updating is an important concept in Bayesian statistics, and it is used to describe a process of inference, which is explained as follows. Consider two event systems denoted by S 1={A,B} and S 2={C,D}, where the events A and B are mutually exclusive, and the same holds for C and D. First, a decision-making entity, say Alice, estimates the probabilities P(A) and P(B) of the events A and B; these are called the prior probabilities. The prior probability is sometimes called a "subjective" or "personal" probability. Further, Alice knows the conditional probabilities P(C|A) and P(C|B), which are obtained from some statistical data. When Alice sees the occurrence of the event C or D in the system S 2, she can change her prior prediction P(A), P(B) to conditional probabilities by Bayes' rule. When Alice sees the occurrence of C in S 2, she updates her prediction for the event A from P(A) to

$$ P(A|C)=\frac{P(C|A)P(A)}{P(C|A)P(A)+P(C|B)P(B)}. $$

When Alice sees the occurrence of D in S 2, she can update her prediction for the event A from P(A) to

$$ P(A|D)=\frac{P(D|A)P(A)}{P(D|A)P(A)+P(D|B)P(B)}. $$

In the same way she updates her prediction for the event B. These conditional (updating) probabilities are called the posterior probabilities. The change of prediction is described as “updating” from the prior probabilities P(A),P(B) to the posterior probabilities, and it is called the Bayesian updating.
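The classical updating rule itself can be sketched in a few lines (the numbers are illustrative; this is the unbiased Bayes rule, not the paper's quantum-like biased version):

```python
def bayes_update(prior_A, p_C_given_A, p_C_given_B):
    # posterior P(A|C) by Bayes' rule; prior_B = 1 - prior_A since A, B are exclusive
    prior_B = 1.0 - prior_A
    evidence = p_C_given_A * prior_A + p_C_given_B * prior_B
    return p_C_given_A * prior_A / evidence

posterior = bayes_update(prior_A=0.5, p_C_given_A=0.8, p_C_given_B=0.2)
# 0.8*0.5 / (0.8*0.5 + 0.2*0.5) = 0.4 / 0.5 = 0.8
assert abs(posterior - 0.8) < 1e-12
```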

In the paper [9], we redescribed the process of Bayesian updating in the framework of “quantum-like representation”, where we introduced the following state vector belonging to Hilbert space \(\mathcal{H}= \mathcal{H}_{1}\otimes \mathcal{H}_{2}=\mathbb{C}^{2}\otimes \mathbb{C}^{2}\);

$$\begin{aligned} \vert \varPhi \rangle =&\sqrt{P\bigl(A^{\prime }\bigr)}\vert A^{\prime } \rangle \otimes \bigl(\sqrt{P\bigl(C^{\prime }|A^{\prime } \bigr)}\vert C^{\prime } \rangle +\sqrt{P\bigl(D^{\prime }|A^{\prime } \bigr)}\vert D^{\prime } \rangle \bigr) \\&{}+\sqrt{P\bigl(B^{\prime }\bigr)}\vert B^{\prime } \rangle \otimes \bigl(\sqrt{P\bigl(C^{\prime }|B^{\prime }\bigr)}\vert C^{\prime } \rangle +\sqrt{P\bigl(D^{\prime }|B^{\prime } \bigr)}\vert D^{\prime } \rangle \bigr). \end{aligned}$$
(10)

We call this vector the prediction state vector. The set of vectors {|A′〉,|B′〉} is an orthonormal basis of \(\mathcal{H}_{1}\), and {|C′〉,|D′〉} is an orthonormal basis of \(\mathcal{H}_{2}\). The events A′, B′, C′ and D′ are defined as

Event A′ (B′)::

Alice judges “the event A (B) occurs in the system S 1.”

Event C′ (D′)::

Alice judges “the event C (D) occurs in the system S 2.”

These events are the subjective events (judgments) in Alice’s “mental space”, and the vectors |A′〉, |B′〉, |C′〉 and |D′〉 give a quantum-like representation of these judgments. The vector |Φ〉 represents the coexistence of these judgments in Alice’s brain. For example, Alice is conscious of |A′〉 with the weight \(\sqrt{P(A^{\prime })}\), and conditionally on the event A′, she sets the weights \(\sqrt{P(C^{\prime }|A^{\prime })}\) and \(\sqrt{P(D^{\prime }|A^{\prime })}\) for the judgments |C′〉 and |D′〉. Such an assignment of weights implies that Alice feels causality between S 1 and S 2: the events in S 1 are causes and the events in S 2 are effects. The square of \(\sqrt{P(A^{\prime })}\) corresponds to the prior probability P(A) in the Bayesian theory. If Alice knows the objective conditional probabilities P(C|A) and P(C|B), she can set the weights \(\sqrt{P(C^{\prime }|A^{\prime })}\) and \(\sqrt{P(C^{\prime }|B^{\prime })}\) from P(C′|A′)=P(C|A) and P(C′|B′)=P(C|B). If Alice has the prediction state |Φ〉〈Φ|≡ρ and sees the occurrence of the event C in S 2, the event D′ vanishes instantaneously (in her mental representation). This vanishing is represented as the reduction by the projection operator \(M_{C^{\prime }}=I\otimes |C^{\prime }\rangle \langle C^{\prime }|\):

$$ \frac{M_{C^{\prime }}\rho M_{C^{\prime }}}{\operatorname{tr}(M_{C^{\prime }}\rho )}\equiv \rho _{C^{\prime }} $$

The posterior probability P(A|C) is calculated by

$$ \operatorname{tr}(M_{A^{\prime }}\rho _{C^{\prime }}), $$

where \(M_{A^{\prime }}=|A^{\prime }\rangle \langle A^{\prime }|\otimes I\).
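As a numerical sanity check, the quantum-like scheme can be reproduced with a short script; the probabilities below are illustrative assumptions, and the computed trace coincides with the classical Bayes posterior P(A|C).

```python
import numpy as np

# Assumed illustrative probabilities: P(A') = 0.6, P(C'|A') = 0.7, P(C'|B') = 0.2
pA, pCgA, pCgB = 0.6, 0.7, 0.2
pB, pDgA, pDgB = 1 - pA, 1 - pCgA, 1 - pCgB

# Orthonormal bases: |A'>, |B'> on H1 and |C'>, |D'> on H2
A, B = np.array([1.0, 0.0]), np.array([0.0, 1.0])
C, D = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Prediction state vector |Phi> of Eq. (10)
Phi = (np.sqrt(pA) * np.kron(A, np.sqrt(pCgA) * C + np.sqrt(pDgA) * D)
       + np.sqrt(pB) * np.kron(B, np.sqrt(pCgB) * C + np.sqrt(pDgB) * D))
rho = np.outer(Phi, Phi)

# Projectors M_{C'} = I (x) |C'><C'| and M_{A'} = |A'><A'| (x) I
MC = np.kron(np.eye(2), np.outer(C, C))
MA = np.kron(np.outer(A, A), np.eye(2))

# Reduction after seeing C, then the posterior tr(M_{A'} rho_{C'})
rho_C = MC @ rho @ MC / np.trace(MC @ rho)
posterior = float(np.trace(MA @ rho_C))
print(posterior)  # equals the Bayes posterior P(A|C) = 0.84 up to rounding
```

This confirms that, without a psychological bias, the projection scheme reproduces rational Bayesian updating exactly.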

The inference based on Bayesian updating is rational from the viewpoint of classical probability theory, game theory, and economics (the Savage sure thing principle). However, in cognitive psychology and economics one can find extensive empirical data showing that human inference is sometimes seemingly irrational; see [30] for a review. Typically this happens in contexts where there are (often hidden) psychological factors disturbing the rational inference. Our aim is to provide a mathematical description of such an irrational inference; the concept of lifting will be used. Let us introduce a lifting from \(S(\mathcal{H})\) to \(S(\mathcal{H}\otimes \mathcal{K})\) by

$$ \mathcal{E}_{\sigma V}^{\ast }(\rho )=V(\rho \otimes \sigma )V^{\ast }. $$

Here \(\sigma \in S(\mathcal{K})\) represents the state of Alice’s psychological representation of context, which is generated when Alice updates her inference. The operator V on \(\mathcal{H} \otimes \mathcal{K}\) is unitary and creates a correlation between the prediction state ρ and the psychological factor σ; in other words, it specifies a psychological influence on the rational inference. We call the state defined by

$$ \rho_{\sigma V}\equiv \operatorname{tr}_{\mathcal{K}}\bigl(\mathcal{E}_{\sigma V}^{\ast }( \rho )\bigr), $$

the prediction state biased from the rational prediction ρ. From this ρ σV , the joint probability is defined as

$$ \operatorname{tr} ( M_{X^{\prime }}M_{Y^{\prime }} \rho_{\sigma V}M_{Y^{\prime }} )\equiv P_{\sigma V}\bigl(X',Y'\bigr), $$

for the events X′=(A′ or B′) and Y′=(C′ or D′), and the biased posterior probability is defined as

$$P_{\sigma V}\bigl(X'|Y'\bigr)=\frac{P_{\sigma V}(X',Y') }{\operatorname{tr}(M_{Y^{\prime }}\rho_{\sigma V})}. $$

In general, the value of P σV (X′|Y′) differs from the original P(X′|Y′) obtained by the rational (Bayesian) inference.
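The biased updating can be sketched numerically under the same assumed probabilities as above. The choices of σ (a rank-one state) and V (a Givens-rotation unitary with an assumed coupling angle θ) are arbitrary illustrations, not taken from [9]; any other unitary correlating the two factors would serve.

```python
import numpy as np

# Rational prediction state rho on H = C^2 (x) C^2, basis order
# |A'C'>, |A'D'>, |B'C'>, |B'D'>; assumed P(A')=0.6, P(C'|A')=0.7, P(C'|B')=0.2
Phi = np.sqrt([0.6 * 0.7, 0.6 * 0.3, 0.4 * 0.2, 0.4 * 0.8])
rho = np.outer(Phi, Phi)

# Psychological factor sigma on K = C^2 and a hypothetical correlating
# unitary V on H (x) K: a Givens rotation mixing |A'C'>(x)|0> with |B'C'>(x)|0>
sigma = np.diag([1.0, 0.0])
theta = 0.3  # assumed coupling strength
V = np.eye(8)
V[0, 0] = V[4, 4] = np.cos(theta)
V[0, 4], V[4, 0] = -np.sin(theta), np.sin(theta)

# Lifting E*(rho) = V (rho (x) sigma) V*, then partial trace over K
lifted = V @ np.kron(rho, sigma) @ V.T
rho_biased = lifted.reshape(4, 2, 4, 2).trace(axis1=1, axis2=3)

# Biased posterior P_{sigma V}(A'|C')
MA = np.kron(np.diag([1.0, 0.0]), np.eye(2))
MC = np.kron(np.eye(2), np.diag([1.0, 0.0]))
joint = np.trace(MA @ MC @ rho_biased @ MC)
p_biased = float(joint / np.trace(MC @ rho_biased))
print(p_biased)  # differs from the rational posterior P(A|C) = 0.84
```

With θ=0, V is the identity and the biased posterior reduces to the rational Bayesian value, so θ quantifies how strongly the psychological factor distorts the inference.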