1 Formulation of the Problem

Somewhat surprisingly, quantum models describe several aspects of human behavior. The main issue with which we deal in this paper is that many aspects of human behavior seem to be well-described by formulas from quantum physics; see, e.g., [1, 3, 12].

This is understandable on the qualitative level. The success of quantum models in describing several aspects of human behavior is understandable on the qualitative level (see, e.g., [8]): similar to quantum physics, every time we gain new knowledge we inevitably change the system. For example, once we learn a new dependence between the economic variables, we can make better predictions of economic phenomena and thus, change the behavior of decision makers.

But how can we explain this success on the quantitative level? The above qualitative arguments do not explain why not only ideas but also formulas from quantum physics are helpful in describing human behavior (see, e.g., [1]). In other words, while the qualitative success of quantum models is reasonable, their quantitative success remains a mystery.

What we do in this paper. In this paper, we show that the analysis of different types of uncertainty leads to the desired quantitative explanation of the success of quantum models in describing human behavior. To be more precise, we show that this analysis leads to a general formula that includes non-quantum and quantum probabilities as particular cases. We hope that some intermediate cases of this general formula will be even more accurate in describing human behavior than the currently used formulas based on non-quantum and quantum probabilities.

The structure of the paper is as follows. In Sect. 2, we briefly recall the main ideas behind quantum physics, and recall how the corresponding models help in describing human behavior. In Sect. 3, we show how to explain the quantum formulas, and how to derive a general formula that contains non-quantum and quantum formulas as particular cases; this result is proven in Sect. 4. In Sects. 5 and 6, we describe how this formula is related to entropy and to fuzzy logic.

2 Quantum Models and How They Describe Human Behavior: A Brief Reminder

The main difference between non-quantum and quantum probabilities. According to [4], the main difference between non-quantum and quantum probabilities can be explained on the example of the following two-slot experiment.

This experiment is about particle propagation. We have a particle generator – e.g., a light source or a radio source that generates photons, or a radioactive element that generates electrons or alpha-particles. There is an array of sensors at some distance from this generator. By detecting the particles, these sensors help us estimate the probability that the original particle goes to the location x of the sensor. To be more precise, what we estimate is the probability density \(\rho (x)\) corresponding to the sensor’s location x.

There is a barrier between the source of the signals and the sensors. In this barrier, there are two slots that can be open or closed. If both slots are closed, no particles come through, so the sensors do not detect anything. If one or both slots are open, the detectors detect the particles.

We assume that the particles do not interact with each other; this is a reasonable assumption for electromagnetic waves (photons), and even for electrons – as long as their density is not too high.

Let us assume that first we open the first slot and leave the second slot closed. Let \(\rho _1(x)\) denote the resulting probability density. Now, we can close the first slot, open the second slot, and measure the new probability density. We will denote this new probability density by \(\rho _2(x)\). What will happen if we open both slots? What will then be the resulting probability density \(\rho (x)\)?

In non-quantum physics, the answer is simple. Indeed, in the new experiment, a particle reaches the sensor if it either went through the first slot or it went through the second slot. We set up the experiment in such a way that the particle cannot go through both slots. Thus, the probability that a particle passes via one of the two slots is equal to the sum of the probability of passing through the first slot and the probability of passing through the second slot: \(\rho (x)=\rho _1(x)+\rho _2(x)\).

However, this is not what we observe in quantum physics. In quantum physics, to properly describe uncertainty, it is not sufficient to describe the corresponding probabilities, we also need to describe a complex-valued function \(\psi (x)\) – called wave function – for which the probability density is equal to \(\rho (x)=|\psi (x)|^2\).

In this case, if we know the wave function \(\psi _1(x)\) corresponding to the first-slot-open case and the wave function \(\psi _2(x)\) corresponding to the second-slot-open case, then the wave function \(\psi (x)\) corresponding to the case when both slots are open is equal to \(\psi (x)=\psi _1(x)+\psi _2(x)\). In this case, in general,

$$\rho (x)=|\psi (x)|^2=|\psi _1(x)+\psi _2(x)|^2\ne |\psi _1(x)|^2+|\psi _2(x)|^2=\rho _1(x)+\rho _2(x).$$

For example, in situations in which the values of all wave functions are positive real numbers, we get \(\psi _1(x)=\sqrt{\rho _1(x)}\), \(\psi _2(x)=\sqrt{\rho _2(x)}\), and thus, \(\rho (x)=\psi ^2(x)\) has the form

$$\rho (x)=(\sqrt{\rho _1(x)}+\sqrt{\rho _2(x)})^2=\rho _1(x)+\rho _2(x)+ 2\sqrt{\rho _1(x)\cdot \rho _2(x)}\ne \rho _1(x)+\rho _2(x).$$
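As a quick numerical sanity check (not part of the original derivation; the function name `quantum_combine` is ours), the following sketch verifies that the real-amplitude rule indeed adds the interference term \(2\sqrt{\rho _1(x)\cdot \rho _2(x)}\) to the classical sum:

```python
import math

def quantum_combine(rho1, rho2):
    """Real-amplitude quantum rule: rho = (sqrt(rho1) + sqrt(rho2))**2."""
    return (math.sqrt(rho1) + math.sqrt(rho2)) ** 2

rho1, rho2 = 0.3, 0.1
rho = quantum_combine(rho1, rho2)

# The result exceeds the classical sum rho1 + rho2 by exactly the
# interference term 2 * sqrt(rho1 * rho2).
interference = rho - (rho1 + rho2)
assert abs(interference - 2 * math.sqrt(rho1 * rho2)) < 1e-12
```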

Comment. In general, the values of \(\psi _i(x)\) may be complex. In this case, by using the triangle inequality, the only thing that we can conclude about \(|\psi (x)|=|\psi _1(x)+\psi _2(x)|\) is that

$$||\psi _1(x)|-|\psi _2(x)||\le |\psi (x)|\le |\psi _1(x)|+|\psi _2(x)|.$$

By squaring all three parts of this double inequality, we can conclude that the probability density function \(\rho (x)=|\psi (x)|^2\) satisfies the inequality

$$(\sqrt{\rho _1(x)}-\sqrt{\rho _2(x)})^2\le \rho (x)\le (\sqrt{\rho _1(x)}+\sqrt{\rho _2(x)})^2,$$

i.e., that

$$\rho _1(x)+\rho _2(x)-2\sqrt{\rho _1(x)\cdot \rho _2(x)}\le \rho (x)\le \rho _1(x)+\rho _2(x)+2\sqrt{\rho _1(x)\cdot \rho _2(x)}.$$
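To illustrate these bounds (an illustrative sketch; the helper name `density_bounds` is ours), one can draw random complex amplitudes with an arbitrary relative phase and check that the combined density always stays within them:

```python
import cmath
import math
import random

def density_bounds(rho1, rho2):
    """Bounds on the combined density implied by the triangle inequality."""
    lo = (math.sqrt(rho1) - math.sqrt(rho2)) ** 2
    hi = (math.sqrt(rho1) + math.sqrt(rho2)) ** 2
    return lo, hi

random.seed(0)
for _ in range(1000):
    rho1, rho2 = random.random(), random.random()
    # Two amplitudes with an arbitrary relative phase.
    psi1 = math.sqrt(rho1)
    psi2 = math.sqrt(rho2) * cmath.exp(1j * random.uniform(0, 2 * math.pi))
    rho = abs(psi1 + psi2) ** 2
    lo, hi = density_bounds(rho1, rho2)
    assert lo - 1e-12 <= rho <= hi + 1e-12
```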

How is this related to human behavior. In the early 1980s, a group of researchers from the Republic of Georgia observed the behavior of kids in a two-door room; see, e.g., [13, 14]. In some cases, both doors were open, in other cases, only one door was open. On the other side, boxes with treats were placed, and the researchers measured how frequently kids pick up treats from the box located at spatial location x.

It turns out that the older kids, after walking through the door, mostly went to the box which was the closest to this door. For these kids, the frequency f(x) of selecting the box x when both doors were open was (approximately) equal to the sum \(f_1(x)+f_2(x)\) of the frequencies \(f_i(x)\) corresponding to the cases when the i-th door was open and the other door was closed. In other words, older kids exhibited non-quantum behavior.

Somewhat surprisingly, for younger kids (3–4 years old), the frequency f(x) was different from the sum \(f_1(x)+f_2(x)\): moreover, it was close to the quantum formula \(f(x)=(\sqrt{f_1(x)}+\sqrt{f_2(x)})^2\).

For the adults – just like for the older kids – a large part of their decision making behavior is described by the traditional (non-quantum) probabilities. However, surprisingly, some aspects of their behavior are better described by the quantum formulas; see, e.g., [1, 3].

Remaining questions. How can we explain the usability of quantum formulas? And how can we come up with formulas that take into account some similarity to quantum phenomena without requiring that all formulas are quantum ones?

In this paper, we use several uncertainty-related approaches to come up with such a more general formula.

3 How to Explain the Quantum Formulas and to Get a General Expression Containing Non-quantum and Quantum Formulas as Particular Cases

Formulation of the problem: reminder. For each sensor location x, we know the values of the probability densities \(\rho _1(x)\) and \(\rho _2(x)\) corresponding to situations when one of the slots is open. Based on these values, we need to estimate the value \(\rho (x)\).

Let f(a, b) denote the algorithm that transforms the known values \(a=\rho _1(x)\) and \(b =\rho _2(x)\) into the estimate for \(\rho (x)\). In terms of this algorithm, the desired estimate has the form \(\rho (x)=f(\rho _1(x),\rho _2(x))\).

The problem is: which function f(a, b) is the most appropriate?

Comment. The two-slot experiment is just an example. The algorithm f(a, b) can be used not only for the two-slot experiment, but for all possible situations when we need to combine the known probabilities \(a=P(A)\) and \(b=P(B)\) of two events A and B into an estimate f(a, b) for the probability \(P(A\vee B)\) of the disjunction \(A\vee B\).

First natural requirement: commutativity. The estimate for \(\rho (x)\) should not depend on which slot we call the first one and which one we call the second one. Thus, we must have \(f(\rho _1(x),\rho _2(x))=f(\rho _2(x),\rho _1(x))\) for all possible values \(\rho _1(x)\) and \(\rho _2(x)\).

In other words, we must have \(f(a,b)=f(b,a)\) for all possible values a and b, i.e., the function f(a, b) must be commutative.

Second natural requirement: continuity. In practice, all probability values are estimated only approximately, as frequencies \(\widetilde{\rho }_i(x)\approx \rho _i(x)\). The more observations we have, the more accurate these approximations, i.e., the closer these estimates are to the actual probabilities. It is therefore reasonable to require that as the approximate values \(\widetilde{\rho }_1(x)\) and \(\widetilde{\rho }_2(x)\) tend to the actual values \(\rho _1(x)\) and \(\rho _2(x)\), the resulting estimate \(f(\widetilde{\rho }_1(x),\widetilde{\rho }_2(x))\) should tend to the estimate \(f(\rho _1(x),\rho _2(x))\) based on the actual values. In other words, it is reasonable to require that the function f(a, b) be continuous.

Third natural requirement: monotonicity. It is also reasonable to require that if one of the probabilities \(\rho _i(x)\) increases, the resulting overall probability \(\rho (x)\) should increase as well. In other words, it is reasonable to require that the function f(a, b) be a (non-strictly) increasing function of each of its variables.

Fourth natural requirement: associativity. If we have three slots instead of two, then we can estimate the probability \(\rho (x)\) corresponding to the case when all three slots are open in two different ways:

  • We can first use the algorithm f(a, b) to estimate the probability density function \(\rho _{12}(x)=f(\rho _1(x),\rho _2(x))\) corresponding to the case when the first two slots are open, and then again apply the same algorithm f(a, b) to combine the probability density function \(\rho _{12}(x)\) with the probability density function \(\rho _3(x)\), resulting in the value \(f(f(\rho _1(x),\rho _2(x)),\rho _3(x))\).

  • Alternatively, we can first combine the probability density functions corresponding to slots 2 and 3, resulting in \(\rho _{23}(x)=f(\rho _2(x),\rho _3(x))\), and then combine the resulting probability density function with \(\rho _1(x)\), resulting in

    $$f(\rho _1(x),\rho _{23}(x))=f(\rho _1(x),f(\rho _2(x),\rho _3(x))).$$

It is reasonable to require that these two estimates are equal for all possible values \(\rho _i(x)\), i.e., that \(f(f(a,b),c)=f(a,f(b,c))\) for all a, b, and c – in other words, that the operation f(a, b) is associative.

Fifth natural requirement: scale-invariance. By definition, the probability density is probability divided by length (or area, or volume). In principle, we can use different units for measuring length, and thus, different units for measuring area or volume. If we replace the original measuring unit with a unit which is \(\lambda \) times smaller, the numerical values of the probability density get multiplied by \(\lambda \).

It is reasonable to require that the estimating function f(a, b) should not change if we thus re-scale all the values of the probability density, i.e., that \(\rho (x)=f(\rho _1(x),\rho _2(x))\) should imply \(\lambda \cdot \rho (x)=f(\lambda \cdot \rho _1(x),\lambda \cdot \rho _2(x))\). Thus, we require that \(f(\lambda \cdot a,\lambda \cdot b)=\lambda \cdot f(a,b)\) for all possible values a, b, and \(\lambda \).

Now, we are ready to formulate our main result.

Definition 1

We say that a function \(f(a,b):\mathrm{I\!R}^+_0\times \mathrm{I\!R}^+_0\rightarrow \mathrm{I\!R}^+_0\) from non-negative real numbers to non-negative real numbers is a scale-invariant estimation function if it is commutative, associative, continuous, (non-strictly) increasing, and \(f(\lambda \cdot a,\lambda \cdot b)=\lambda \cdot f(a,b)\) for all a, b, and \(\lambda \).

Proposition 1

The only scale-invariant estimation functions are \(f(a,b)=0\), \(f(a,b)=\min (a,b)\), \(f(a,b)=\max (a,b)\), and \(f(a,b)=(a^\alpha +b^\alpha )^{1/\alpha }\) for some \(\alpha \).
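The requirements of Definition 1 are easy to spot-check numerically for the power-type family (an illustrative sketch; the name `f_alpha` and the specific value of \(\alpha\) are ours):

```python
import random

def f_alpha(a, b, alpha):
    """The general combination rule (a**alpha + b**alpha)**(1/alpha)."""
    return (a ** alpha + b ** alpha) ** (1.0 / alpha)

random.seed(1)
alpha = 0.7  # one fixed example value of alpha
for _ in range(500):
    a, b, c, lam = (random.uniform(0.1, 5.0) for _ in range(4))
    # Commutativity: f(a, b) = f(b, a).
    assert abs(f_alpha(a, b, alpha) - f_alpha(b, a, alpha)) < 1e-9
    # Associativity: f(f(a, b), c) = f(a, f(b, c)).
    assert abs(f_alpha(f_alpha(a, b, alpha), c, alpha)
               - f_alpha(a, f_alpha(b, c, alpha), alpha)) < 1e-9
    # Monotonicity in each argument.
    assert f_alpha(a + 0.1, b, alpha) >= f_alpha(a, b, alpha)
    # Scale invariance: f(lam*a, lam*b) = lam * f(a, b).
    assert abs(f_alpha(lam * a, lam * b, alpha)
               - lam * f_alpha(a, b, alpha)) < 1e-9
```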

4 Proof

Comment. In principle, we could shorten our proof by taking into account the known general structure of 1-D semigroups [6, 10], but, for pedagogical purposes, we decided to present a longer “from scratch” proof: in this proof, all steps are clear, and it is thus more convincing for non-mathematical readers.

\(1^\circ \). Depending on whether the value f(1, 1) is equal to 1 or not, we have two possible cases: \(f(1,1)=1\) and \(f(1,1)\ne 1\). Let us consider these two cases one by one.

\(2^\circ \). Let us first consider the case when \(f(1,1)=1\). In this case, the value f(0, 1) can be either equal to 0 or different from 0. Let us consider both subcases.

\(2.1^\circ \). Let us first consider the first subcase, when \(f(0,1)=0\).

In this case, for every \(b>0\), scale invariance with \(\lambda =b\) implies that

$$f(b\cdot 0,b\cdot 1)=b\cdot f(0,1)=b\cdot 0=0,$$

i.e., that \(f(0,b)=0\). By taking \(b\rightarrow 0\) and using continuity, we also get \(f(0,0)=0\). Thus, \(f(0,b)=0\) for all b.

By commutativity, we have \(f(a,0)=0\) for all a. So, to fully describe the operation f(ab), it is sufficient to consider the cases when \(a>0\) and \(b>0\).

\(2.1.1^\circ \). Let us prove, by contradiction, that in this subcase, we have \(f(1,a)\le 1\) for all a.

Indeed, let us assume that for some a, we have \(b \,{\mathop {=}\limits ^\mathrm{def}} \, f(1,a)>1\). Then, due to associativity and \(f(1,1)=1\), we have \(f(1,b)=f(1,f(1,a))=f(f(1,1),a)=f(1,a)=b\).

Due to scale-invariance with \(\lambda =b\), the equality \(f(1,b)=b\) implies that \(f(b,b^2)=b^2\). Thus, \(f(1,b^2)=f(1,f(b,b^2))=f(f(1,b),b^2)=f(b,b^2)=b^2\).

Similarly, from \(f(1,b^2)=b^2\), we conclude that for \(b^4=(b^2)^2\), we have \(f(1,b^4)=b^4\), and, in general, that \(f(1,b^{2^n})=b^{2^n}\) for every n.

Scale invariance with \(\lambda =b^{-2^n}\) implies that \(f(b^{-2^n},1)=1\). In the limit \(n\rightarrow \infty \), we get \(f(0,1)=1\), which contradicts our assumption that \(f(0,1)=0\). This contradiction shows that indeed, \(f(1,a)\le 1\).

\(2.1.2^\circ \). For \(a\ge 1\), monotonicity implies \(1=f(1,1)\le f(1,a)\), so \(f(1,a)\le 1\) implies that \(f(1,a)=1\).

Now, for any \(a'\) and \(b'\) for which \(0<a'\le b'\), if we denote \(r\, {\mathop {=}\limits ^\mathrm{def}}\,\displaystyle \frac{b'}{a'}\ge 1\), then scale-invariance with \(\lambda =a'\) implies that \(a'\cdot f(1,r)=f(a'\cdot 1,a'\cdot r)=f(a',b')\). Here, \(f(1,r)=1\), thus \(f(a',b')=a'\cdot 1=a'\), i.e., \(f(a',b')=\min (a',b')\). Due to commutativity, the same formula also holds when \(a'\ge b'\). So, in this case, \(f(a,b)=\min (a,b)\) for all a and b.

\(2.2^\circ \). Let us now consider the second subcase of the first case, when \(f(0,1)>0\).

\(2.2.1^\circ \). Let us first show that in this subcase, we have \(f(0,0)=0\).

Indeed, let \(a\,{\mathop {=}\limits ^\mathrm{def}}\,f(0,0)\). Then, scale-invariance with \(\lambda =2\) implies that \(2\cdot a=f(2\cdot 0,2\cdot 0)=f(0,0)=a.\) Thus, \(a=2\cdot a\), hence \(a=0\). The statement is proven.

\(2.2.2^\circ \). Let us now prove that in this subcase, \(f(0,1)=1\).

Indeed, in this case, for \(a\,{\mathop {=}\limits ^\mathrm{def}}\,f(0,1)\), we have, due to \(f(0,0)=0\) and associativity, that \(f(0,a)=f(0,f(0,1))=f(f(0,0),1)=f(0,1)=a.\) Here, \(a>0\), so by applying scale invariance with \(\lambda =a^{-1}\), we conclude that \(f(0,1)=1.\)

\(2.2.3^\circ \). Let us now prove that for every \(a\le b\), we have \(f(a,b)=b\). So, due to commutativity, we have \(f(a,b)=\max (a,b)\) for all a and b.

Indeed, from \(f(1,1)=1\) and \(f(0,1)=1\), due to scale invariance with \(\lambda =b\), we conclude that \(f(0,b)=b\) and \(f(b,b)=b\). Due to monotonicity, \(0\le a\le b\) implies that \(b=f(0,b)\le f(a,b)\le f(b,b)=b\), thus \(f(a,b)=b\). The statement is proven.

\(3^\circ \). Let us now consider the remaining case when \(f(1,1)\ne 1\).

\(3.1^\circ \). Let us denote \(v(k)\,{\mathop {=}\limits ^\mathrm{def}}\, f(1,f(\ldots , 1)\ldots )\) (k times). Then, due to associativity, for every m and n, the value \(v(m\cdot n)=f(1,f(\ldots , 1)\ldots )\) (\(m\cdot n\) times) can be represented as

$$f(f(1,f(\ldots , 1)\ldots ),\ldots ,f(1,f(\ldots , 1)\ldots )),$$

where we divide the 1s into m groups with n 1s in each. For each group, we have \(f(1,f(\ldots , 1)\ldots )=v(n)\). Thus, \(v(m\cdot n)=f(v(n),f(\ldots , v(n))\ldots )\) (m times).

We know that \(f(1,f(\ldots , 1)\ldots )\) (m times) \(= v(m)\). Thus, by using scale-invariance with \(\lambda =v(n)\), we conclude that \(v(m\cdot n)=v(m)\cdot v(n)\), i.e., that the function v(n) is multiplicative. In particular, this means that for every number p and for every positive integer n, we have \(v(p^n)=(v(p))^n\).
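The multiplicativity of v(n) can be spot-checked numerically. In this sketch (our helper names), we instantiate f with the quantum case \(\alpha =1/2\), for which \(v(k)=k^2\):

```python
def f(a, b, alpha=0.5):
    """The general rule, instantiated with the quantum value alpha = 1/2."""
    return (a ** alpha + b ** alpha) ** (1.0 / alpha)

def v(k, alpha=0.5):
    """v(k) = f(1, f(1, ..., 1)) with k ones; here v(k) = k**2."""
    result = 1.0
    for _ in range(k - 1):
        result = f(1.0, result, alpha)
    return result

# Multiplicativity: v(m * n) = v(m) * v(n).
for m in (2, 3, 4):
    for n in (2, 3, 5):
        assert abs(v(m * n) - v(m) * v(n)) < 1e-9
```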

\(3.2^\circ \). If \(v(2)=f(1,1)>1\), then by monotonicity, we get \(v(3)=f(1,v(2))\ge f(1,1)=v(2)\), and, in general, \(v(n+1)\ge v(n)\). Thus, in this case, the sequence v(n) is (non-strictly) increasing.

Similarly, if \(v(2)=f(1,1)<1\), then we get \(v(3)\le v(2)\) and, in general, \(v(n+1)\le v(n)\), i.e., in this case, the sequence v(n) is (non-strictly) decreasing.

Let us consider these two cases one by one.

\(3.2.1^\circ \). Let us first consider the case when the sequence v(n) is increasing. In this case, for every three integers m, n, and p, if \(2^m\le p^n\), then \(v(2^m)\le v(p^n)\), i.e., \((v(2))^m\le (v(p))^n\).

For all m, n, and p, the inequality \(2^m\le p^n\) is equivalent to \(m\cdot \ln (2)\le n\cdot \ln (p)\), i.e., to \(\displaystyle \frac{m}{n}\le \displaystyle \frac{\ln (p)}{\ln (2)}\). Similarly, since \(\ln (v(2))>0\), the inequality \((v(2))^m\le (v(p))^n\) is equivalent to \(\displaystyle \frac{m}{n}\le \displaystyle \frac{\ln (v(p))}{\ln (v(2))}\). Thus, the above conclusion “if \(2^m\le p^n\) then \((v(2))^m\le (v(p))^n\)” takes the following form:

$$\text {for every rational number }\frac{m}{n},\text { if }\frac{m}{n}\le \frac{\ln (p)}{\ln (2)}\text { then }\frac{m}{n}\le \frac{\ln (v(p))}{\ln (v(2))}.$$

Similarly, for all \(m'\), \(n'\), and p, if \(p^{n'}\le 2^{m'}\), then \(v(p^{n'})\le v(2^{m'})\), i.e., \((v(p))^{n'}\le (v(2))^{m'}\). The inequality \(p^{n'}\le 2^{m'}\) is equivalent to \(n'\cdot \ln (p)\le m'\cdot \ln (2)\), i.e., to \(\displaystyle \frac{\ln (p)}{\ln (2)}\le \displaystyle \frac{m'}{n'}\). Also, the inequality \((v(p))^{n'}\le (v(2))^{m'}\) is equivalent to \(\displaystyle \frac{\ln (v(p))}{\ln (v(2))}\le \displaystyle \frac{m'}{n'}\). Thus, the conclusion “if \(p^{n'}\le 2^{m'}\) then \((v(p))^{n'}\le (v(2))^{m'}\)” takes the following form:

$$\text {for every rational number }\frac{m'}{n'},\text { if }\frac{\ln (p)}{\ln (2)}\le \frac{m'}{n'}\text { then } \frac{\ln (v(p))}{\ln (v(2))}\le \frac{m'}{n'}.$$

Let us denote \(\gamma \,{\mathop {=}\limits ^\mathrm{def}}\, \displaystyle \frac{\ln (p)}{\ln (2)}\) and \(\beta \,{\mathop {=}\limits ^\mathrm{def}}\, \displaystyle \frac{\ln (v(p))}{\ln (v(2))}\). For every \(\varepsilon >0\), there exist rational numbers \(\displaystyle \frac{m}{n}\) and \(\displaystyle \frac{m'}{n'}\) for which \(\gamma -\varepsilon \le \displaystyle \frac{m}{n}\le \gamma \le \displaystyle \frac{m'}{n'}\le \gamma +\varepsilon \). For these numbers, the above two properties imply that \(\displaystyle \frac{m}{n}\le \beta \) and \(\beta \le \displaystyle \frac{m'}{n'}\) and thus, that \(\gamma -\varepsilon \le \beta \le \gamma +\varepsilon \), i.e., that \(|\gamma -\beta |\le \varepsilon \). This is true for all \(\varepsilon >0\), so we conclude that \(\beta =\gamma \), i.e., that \(\displaystyle \frac{\ln (v(p))}{\ln (v(2))}=\gamma \). Hence, \(\ln (v(p))=\gamma \cdot \ln (p)\) and thus, \(v(p)=p^\gamma \) for all integers p.

\(3.2.2^\circ \). We can reach a similar conclusion \(v(p)=p^\gamma \) when the sequence v(n) is decreasing and \(0<v(2)<1\), and the conclusion that \(v(p)=0\) for all p if \(v(2)=0\) – in this last case, monotonicity and scale-invariance imply that \(f(a,b)=0\) for all a and b.

\(3.3^\circ \). By definition of v(n), we have \(f(v(m),v(m'))=v(m+m')\). Thus, we have \(f(m^\gamma ,(m')^\gamma )=(m+m')^\gamma .\) By using scale-invariance with \(\lambda =n^{-\gamma }\), we get \(f\left( \displaystyle \frac{m^\gamma }{n^\gamma }, \displaystyle \frac{(m')^\gamma }{n^\gamma }\right) = \displaystyle \frac{(m+m')^\gamma }{n^\gamma }.\) Thus, for \(a=\displaystyle \frac{m^\gamma }{n^\gamma }\) and \(b=\displaystyle \frac{(m')^\gamma }{n^\gamma }\), we get \(f(a,b)=(a^\alpha +b^\alpha )^{1/\alpha }\), where \(\alpha \,{\mathop {=}\limits ^\mathrm{def}}\,1/\gamma \).

Rational numbers \(r=\displaystyle \frac{m}{n}\) are everywhere dense on the real line, hence the values \(r^\gamma \) are also everywhere dense, i.e., every real number can be approximated, with any given accuracy, by such numbers. Thus, continuity implies that \(f(a,b)=(a^\alpha +b^\alpha )^{1/\alpha }\) for every two real numbers a and b.

The proposition is proven.

Discussion. For \(\alpha =1\), we get the usual formula for the probability of the event \(A\vee B\) when A and B are disjoint. For \(\alpha =0.5\), we get the quantum formula. Thus, we get the desired justified general formula for which the traditional probabilistic formula and the quantum formula are particular cases.
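The following sketch (illustrative names, not from the paper) confirms that the two known cases are recovered, and that intermediate values of \(\alpha\) interpolate between them:

```python
import math

def combine(rho1, rho2, alpha):
    """General combination rule: (rho1**alpha + rho2**alpha)**(1/alpha)."""
    return (rho1 ** alpha + rho2 ** alpha) ** (1.0 / alpha)

rho1, rho2 = 0.2, 0.3

# alpha = 1: the classical (non-quantum) sum rule.
classical = combine(rho1, rho2, 1.0)
assert abs(classical - (rho1 + rho2)) < 1e-12

# alpha = 0.5: the quantum rule (sqrt(rho1) + sqrt(rho2))**2.
quantum = combine(rho1, rho2, 0.5)
assert abs(quantum - (math.sqrt(rho1) + math.sqrt(rho2)) ** 2) < 1e-12

# Intermediate alphas give estimates between the two extremes.
for alpha in (0.6, 0.7, 0.8, 0.9):
    assert classical <= combine(rho1, rho2, alpha) <= quantum
```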

5 Relation to Entropy

What we do in this section. Let us start our analysis of the resulting general formula. In this section, we show that this formula has an interesting relation to Shannon’s entropy.

Informal analysis of the problem. As we have mentioned, most aspects of human behavior and human decision making can be described by the usual probabilistic formulas. This means that while some deviations from the usual formulas are needed – to take into account some aspects of human behavior which are better described by quantum formulas – the corresponding value \(\alpha \) should be close to the value \(\alpha =1\) corresponding to the usual probabilistic case.

Since quantum formulas seem to capture some aspects of human behavior, and quantum formulas correspond to \(\alpha <1\), this means that the actual value \(\alpha \) is close to 1 and smaller than 1. Thus, we can conclude that \(\alpha =1-\varepsilon \) for some small \(\varepsilon >0\).

The fact that \(\varepsilon \) is small means that we can safely ignore terms which are quadratic (or of higher order) in terms of \(\varepsilon \), and keep only terms which are linear in \(\varepsilon \). Let us see how the above formula can be thus simplified.

Formal analysis of the problem. For every value a, we have \(a^\alpha =a^{1-\varepsilon }=a\cdot a^{-\varepsilon }.\) Here, since \(a=\exp (\ln (a))\), we get

$$a^{-\varepsilon }=(\exp (\ln (a)))^{-\varepsilon }= \exp (-\varepsilon \cdot \ln (a))\approx 1-\varepsilon \cdot \ln (a).$$

Thus, \(a^\alpha \approx a\cdot (1-\varepsilon \cdot \ln (a))=a-\varepsilon \cdot a\cdot \ln (a),\) and

$$(\rho _1(x))^{\alpha }+(\rho _2(x))^\alpha \approx \rho _1(x)+\rho _2(x)- \varepsilon \cdot (\rho _1(x)\cdot \ln (\rho _1(x))+\rho _2(x)\cdot \ln (\rho _2(x))).$$

Similarly, since \(\displaystyle \frac{1}{\alpha }=\displaystyle \frac{1}{1-\varepsilon }\approx 1+\varepsilon ,\) we get \(a^{1/\alpha }\approx a+\varepsilon \cdot a\cdot \ln (a).\) Thus,

$$\begin{aligned} \begin{array}{c} \rho (x)=((\rho _1(x))^{\alpha }+(\rho _2(x))^\alpha )^{1/\alpha } \\ \approx \rho _1(x)+\rho _2(x)- \varepsilon \cdot (\rho _1(x)\cdot \ln (\rho _1(x))+ \rho _2(x)\cdot \ln (\rho _2(x)))\\ +\,\varepsilon \cdot (\rho _1(x)+\rho _2(x))\cdot \ln (\rho _1(x)+\rho _2(x)). \end{array} \end{aligned}$$

Resulting formula and its relation to Shannon’s entropy. Thus, we arrive at the following formula: \(\rho (x)\approx \rho _1(x)+\rho _2(x)+\varepsilon \cdot \varDelta \rho (x),\) where we denoted

$$\begin{aligned} \begin{array}{c} \varDelta \rho (x)\,{\mathop {=}\limits ^\mathrm{def}}\,-\rho _1(x)\cdot \ln (\rho _1(x))-\rho _2(x)\cdot \ln (\rho _2(x))\\ -(-(\rho _1(x)+\rho _2(x))\cdot \ln (\rho _1(x)+\rho _2(x))). \end{array} \end{aligned}$$

The sum of expressions of the type \(-\rho (x)\cdot \ln (\rho (x))\) is exactly what we see in the formula for Shannon’s entropy, so we get an (unexpected but clear) connection between quantum-type effects and Shannon’s entropy.
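As a numerical check of this first-order approximation (a sketch with our function names), the error of the entropy-based formula should shrink quadratically in \(\varepsilon\):

```python
import math

def exact(rho1, rho2, eps):
    """The exact general rule with alpha = 1 - eps."""
    alpha = 1.0 - eps
    return (rho1 ** alpha + rho2 ** alpha) ** (1.0 / alpha)

def entropy_approx(rho1, rho2, eps):
    """First-order approximation rho1 + rho2 + eps * delta_rho, where
    delta_rho is the Shannon-entropy-type expression from the text."""
    s = rho1 + rho2
    delta = (-rho1 * math.log(rho1) - rho2 * math.log(rho2)
             + s * math.log(s))
    return s + eps * delta

rho1, rho2 = 0.2, 0.3
for eps in (1e-2, 1e-3, 1e-4):
    err = abs(exact(rho1, rho2, eps) - entropy_approx(rho1, rho2, eps))
    # The approximation error is of second order in eps.
    assert err < 10 * eps ** 2
```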

6 Fuzzy and Probabilistic Interpretations of Our General Formula

It is not easy to interpret our formula in probabilistic terms. In general, in the traditional probability theory, the probability \(P(A\vee B)\) of the disjunction \(A\vee B\) cannot exceed the sum of the two corresponding probabilities: \(P(A\vee B)\le P(A)+P(B)\). Similarly, for the probability density functions \(\rho _1(x)\) and \(\rho _2(x)\), we should have \(\rho (x)\le \rho _1(x)+\rho _2(x)\).

However, in the quantum case,

$$\rho (x)=\rho _1(x)+\rho _2(x)+2\sqrt{\rho _1(x)\cdot \rho _2(x)}> \rho _1(x)+\rho _2(x).$$

This is one of the results showing that quantum formulas cannot be interpreted in terms of the traditional probabilities.

A similar inequality holds for all possible values \(\alpha <1\). So what shall we do?

Comment. The fact that some aspects of human behavior cannot be adequately described in probabilistic terms is well known; see, e.g., [5]. For example, in certain situations, people estimate the probability that a person X is a professional and a feminist as higher than the probability that this person is a feminist – an inequality which is impossible for probabilities, since the probability of a sub-event cannot exceed the probability of the original super-event.

First option: fuzzy interpretation. A natural first option is to take into account that, since here, \(P(A\vee B)>P(A)+P(B)\), the values P(A), P(B), and \(P(A\vee B)\) are not real probabilities, they are non-probabilistic degrees of certainty. Thus, to describe these degrees, it is reasonable to consider the most well-known non-probabilistic uncertainty formalism: the formalism of fuzzy logic; see, e.g., [2, 7, 9, 11, 15].

Second option: probabilistic interpretation. Another option is to explicitly take into account that, e.g., \(\rho _1(x)\) is the probability that the particle passed through the first slot and did not pass through the second slot.

This is a known possible interpretation of the above feminist paradox – that when people are asked to compare the probability that X is a feminist and the probability that X is a professional and a feminist, they interpret the first option as saying that X is a feminist but not a professional.

If we denote by \(A_1\) the event that the particle passed through the first slot, this means that \(\rho _1(x)\) is not the probability of this event \(A_1\), but rather the probability of a composite event \( A'_1\,{\mathop {=}\limits ^\mathrm{def}}\,A_1\, \& \,(\lnot A_2)\), i.e., the value \( P(A_1)-P(A_1\, \& \,A_2)\). Similarly, \(\rho _2(x)\) is the probability of \( A'_2\,{\mathop {=}\limits ^\mathrm{def}}\,A_2\, \& \,(\lnot A_1)\), i.e., \( P(A_2)-P(A_1\, \& \, A_2)\).

Here, \(\rho (x)\) is the probability of \(A_1\vee A_2\), i.e., the probability \( P(A_1\vee A_2)=P(A_1)+P(A_2)-P(A_1\, \& \,A_2)\). Thus, our formula \((\rho (x))^\alpha =(\rho _1(x))^\alpha +(\rho _2(x))^\alpha \) can be interpreted as the following indirect formula for determining a function that describes the probability \( C(u,v)\,{\mathop {=}\limits ^\mathrm{def}}\,P(A_1\, \& \,A_2)\) in terms of the probabilities \(u\,{\mathop {=}\limits ^\mathrm{def}}\,P(A_1)\) and \(v\,{\mathop {=}\limits ^\mathrm{def}}\,P(A_2)\):

$$(u+v-C(u,v))^\alpha =(u-C(u,v))^\alpha +(v-C(u,v))^\alpha .$$

Comment. This interpretation is related to a fuzzy one. Indeed, in the probabilistic interpretation, we started with the assumption that the particle cannot go through both slots, i.e., that going through the second slot is equivalent to the negation of going through the first slot. We ended up by realizing that, to make sense of the corresponding formulas for the probabilities, we need to allow a non-zero probability that the particle goes through both slots.

This is exactly what fuzzy does: instead of assuming that a person is either young or not young, it takes into account that the same person can be to some extent young and to some extent not young.

In the quantum case, we have an explicit expression for the corresponding function C(u, v). In general, from the above formula, we cannot extract an explicit expression for \(C\,{\mathop {=}\limits ^\mathrm{def}}\,C(u,v)\), but it is possible in the quantum case, when \(\alpha =1/2\). In this case, by squaring both sides of the above equation, we conclude that

$$u+v-C=u-C+v-C+2\sqrt{(u-C)\cdot (v-C)}.$$

By cancelling equal terms on both sides and moving C to the left-hand side, we conclude that \(C=2\sqrt{(u-C)\cdot (v-C)}\). Squaring both sides of this equality, we get \(C^2=4u\cdot v-4(u+v)\cdot C+4C^2,\) i.e., the quadratic equation

$$3C^2-4(u+v)\cdot C+4u\cdot v=0,$$

with an explicit solution

$$C(u,v)=\frac{4(u+v)\pm \sqrt{16(u+v)^2-48u\cdot v}}{6}= \frac{2(u+v)\pm 2\sqrt{(u+v)^2-3u\cdot v}}{3}.$$

Here, only the root with the minus sign satisfies \(C(u,v)\le \min (u,v)\) and thus keeps the probabilities \(u-C(u,v)\) and \(v-C(u,v)\) non-negative.

Comment. It is worth mentioning that, as one can check, the resulting operation C(u, v) is not associative.
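As a closing numerical sketch (the helper name `C_quantum` is ours), one can solve the quadratic directly, confirm that the smaller root satisfies the original \(\alpha =1/2\) equation, and verify the non-associativity just mentioned:

```python
import math

def C_quantum(u, v):
    """Solve 3*C**2 - 4*(u+v)*C + 4*u*v = 0, taking the smaller root,
    which keeps u - C and v - C non-negative."""
    s = u + v
    disc = 16 * s * s - 48 * u * v   # discriminant; always >= 0
    return (4 * s - math.sqrt(disc)) / 6

# The smaller root satisfies the implicit alpha = 1/2 equation
# sqrt(u + v - C) = sqrt(u - C) + sqrt(v - C).
for (u, v) in [(0.4, 0.4), (0.2, 0.5), (0.1, 0.9)]:
    C = C_quantum(u, v)
    assert 0.0 <= C <= min(u, v) + 1e-12
    assert abs(math.sqrt(u + v - C)
               - (math.sqrt(u - C) + math.sqrt(v - C))) < 1e-9

# The operation C(u, v) is not associative:
a, b, c = 0.2, 0.3, 0.4
left = C_quantum(C_quantum(a, b), c)
right = C_quantum(a, C_quantum(b, c))
assert abs(left - right) > 1e-3
```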