1 Introduction

Recently the mathematical formalism and methodology of QM, especially the theory of quantum measurement, have come to be widely applied outside of physics (Footnote 1): to cognition, psychology, economics, finance, decision making, AI, game theory, and information retrieval (for the latter, see, e.g., [24, 51,52,53, 56, 58, 59] and the chapters in this book). This chapter contains a brief introduction to the mathematical formalism and axiomatics of QM. It is oriented to non-physicists. Since QM is a statistical theory, it is natural to start with the classical probability model (Kolmogorov [49]). Then we present the basics of the theory of Hilbert spaces and Hermitian operators, and the representation of pure and mixed states by normalized vectors and density operators. This introduction is sufficient to formulate the axiomatics of QM in the form of five postulates. The projection postulate (the most questionable postulate of QM) is presented in a separate section. We distinguish sharply the cases of quantum observables represented by Hermitian operators with nondegenerate and degenerate spectra, i.e., von Neumann’s and Lüders’ forms of the projection postulate. The axiomatics is completed by a short section on the main interpretations of QM. The projection postulate (in Lüders’ form) plays the crucial role in the definition of quantum conditional (transition) probability. By operating with the latter we consider interference of probabilities for two incompatible observables, as a modification of the formula of total probability by adding the interference term. This viewpoint on interference of probabilities was elaborated in a series of works of Khrennikov (see, e.g., [29,30,31,32,33,34,35,36,37,38]).

Since classical probability theory is based on the Boolean algebra of events, a violation of the law of total probability can be treated as the probabilistic sign of a violation of the laws of Boolean logic. From this viewpoint, quantum theory can be considered as representing a new kind of logic, the so-called quantum logic. The latter is also briefly presented in a separate section.

We continue this review with a new portion of “quantum mathematics,” namely the notion of the tensor product of Hilbert spaces and the tensor product of operators. After the section on Dirac’s notation with ket- and bra-vectors, we briefly discuss the notion of a qubit and the entanglement of a few qubits. The chapter concludes with a detailed analysis of the probabilistic structure of the two-slit experiment, viewed as a collection of different experimental contexts. This contextual structure leads to a violation of the law of total probability and to the non-Kolmogorovean probabilistic structure of this experiment.

We hope that this chapter will be of interest to newcomers to quantum-like modeling. Perhaps even experts can find something useful, say the treatment of interference of probabilities as a violation of the law of total probability. In any event, this chapter can serve as the mathematical and foundational introduction to the other chapters of this book, which are devoted to concrete applications.

2 Kolmogorov’s Axiomatics of Classical Probability

The main aim of QM is to provide probabilistic predictions on the results of measurements. Moreover, statistics of the measurements of a single quantum observable can be described by classical probability theory. In this section we shall present an elementary introduction to this theory.

We remark that classical probability theory is coupled to experiment in the following way:

  • Experimental contexts (system’s state preparations) are represented by probabilities.

  • Observables are represented by random variables.

In principle, we can call probability a state and this is the direct analog of the quantum state (Sect. 6.1, the ensemble interpretation). However, we have to remember that the word “state” has the meaning “statistical state,” the state of an ensemble of systems prepared for measurement.

The Kolmogorov probability space [49, 50] is any triple

$$\displaystyle \begin{aligned} (\Omega, \mathcal{F}, \mathbf{P}), \end{aligned}$$

where Ω is a set of any origin, \(\mathcal {F}\) is a σ-algebra of its subsets, and P is a probability measure on \(\mathcal {F}.\) The set Ω represents random parameters of the model. Kolmogorov called elements of Ω elementary events. This terminology is standard in the mathematical literature. Sets of elementary events are regarded as events. The key point of Kolmogorov’s axiomatization of probability theory is that not every subset of Ω can be treated as an event. For any stochastic model, the system of events \(\mathcal {F}\) is selected from the very beginning. The key mathematical point is that \(\mathcal {F}\) has to be a σ-algebra. (Otherwise it would be very difficult to develop a proper notion of integral, and the latter is needed to define the average of a random variable.)

We recall that a σ-algebra is a system of sets which contains Ω and the empty set and which is closed with respect to the operations of countable union and intersection and to the operation of taking the complement of a set. For example, the collection of all subsets of Ω is a σ-algebra. This σ-algebra is used in the case of finite or countable sets:

$$\displaystyle \begin{aligned} \Omega=\{\omega_1,\ldots ,\omega_n,\ldots \}. \end{aligned} $$
(1)

However, for “continuous sets,” e.g., Ω = [a, b] ⊂ R, the collection of all possible subsets is too large to have applications. Typically it is impossible to describe a σ-algebra in direct terms. To define a σ-algebra, one starts with a simple system of subsets of Ω and then considers the σ-algebra which is generated from this simple system with the aid of the aforementioned operations. In particular, one of the σ-algebras most important for applications, the so-called Borel σ-algebra, is constructed in this way by starting with the system consisting of all open and closed intervals of the real line. In a metric space (in particular, in a Hilbert space), the Borel σ-algebra is constructed by starting with the system of all open and closed balls.

Finally, we remark that in American literature the term σ-field is typically used, instead of σ-algebra.

The probability is defined as a measure, i.e., a map from \(\mathcal {F}\) to nonnegative real numbers which is σ-additive:

$$\displaystyle \begin{aligned} \mathbf{P}(\cup_{j} A_j) = \sum_j \mathbf{P}(A_j), \end{aligned} $$
(2)

where \(A_j \in \mathcal {F}\) and \(A_i \cap A_j = \emptyset\) for \(i \neq j\). The probability measure is always normalized to one:

$$\displaystyle \begin{aligned} \mathbf{P}(\Omega)=1. \end{aligned} $$
(3)

In the case of a discrete probability space, see (1), the probability measures have the form

$$\displaystyle \begin{aligned} \mathbf{P} (A) = \sum_{\omega_j \in A} p_j, \; p_j=\mathbf{P}(\{\omega_j\}). \end{aligned}$$

In fact, any nonzero finite measure μ, i.e., 0 < μ(Ω) < ∞, can be transformed into a probability measure by normalization:

$$\displaystyle \begin{aligned} \mathbf{P}(A)= \frac{\mu(A)}{\mu(\Omega)}, A \in \mathcal{F}. \end{aligned} $$
(4)
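As a minimal sketch of the discrete case (1) and of the normalization (4), the following Python fragment builds a probability measure from a finite measure on a three-point set. The function names `normalize` and `prob` and the weights in `mu` are illustrative assumptions, not part of the formalism above.

```python
from fractions import Fraction

def normalize(mu):
    """Eq. (4): turn a nonzero finite measure {point: weight} into a probability measure."""
    total = sum(mu.values())
    if total <= 0:
        raise ValueError("the total mass mu(Omega) must be positive and finite")
    return {omega: Fraction(weight, total) for omega, weight in mu.items()}

def prob(p, event):
    """P(A) as the sum of p_j over the elementary events omega_j in A."""
    return sum((p[omega] for omega in event), Fraction(0))

mu = {"w1": 2, "w2": 3, "w3": 5}   # hypothetical integer weights mu({omega_j})
P = normalize(mu)
A, B = {"w1", "w2"}, {"w3"}        # two disjoint events

print(prob(P, A), prob(P, B), prob(P, A | B))
```

Exact rational arithmetic is used here only so that additivity (2) and normalization (3) can be checked without rounding error.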

A (real) random variable is a map \(\xi: \Omega \to \mathbf{R}\) which is measurable with respect to the Borel σ-algebra \(\mathcal {B}\) of R and the σ-algebra \(\mathcal {F}\) of Ω. The latter means that, for any set \(B \in \mathcal {B},\) its preimage \(\xi^{-1}(B) = \{\omega \in \Omega : \xi(\omega) \in B\}\) belongs to \(\mathcal {F}.\) This condition makes it possible to assign probabilities to events of the type “values of ξ belong to a (Borel) subset of the real line.” The probability distribution of ξ is defined as

$$\displaystyle \begin{aligned} {\mathbf{P}}_\xi(B)= \mathbf{P}( \xi^{-1}(B)). \end{aligned} $$
(5)

In the same way we define the real (and complex) vector-valued random variables, \(\xi: \Omega \to \mathbf{R}^n\) and \(\xi: \Omega \to \mathbf{C}^n.\)

Let \(\xi_1, \ldots, \xi_k\) be real-valued random variables. Their joint probability distribution \({\mathbf {P}}_{\xi _1,\ldots ,\xi _k}\) is defined as the probability distribution of the vector-valued random variable \(\xi = (\xi_1, \ldots, \xi_k)\). To determine this probability measure, it is sufficient to define the probabilities

$$\displaystyle \begin{aligned} {\mathbf{P}}_{\xi_1,\ldots ,\xi_k}( \Gamma_1 \times \cdots \times \Gamma_k)= \mathbf{P}(\omega \in \Omega: \xi_1(\omega) \in \Gamma_1, \ldots , \xi_k(\omega) \in \Gamma_k), \end{aligned}$$

where \(\Gamma_j\), j = 1, …, k, are intervals (open, closed, half-open) of the real line.

Suppose now that random variables \(\xi_1, \ldots, \xi_k\) represent observables \(a_1, \ldots, a_k\). For any point ω ∈ Ω, the values of the vector ξ composed of these random variables are well defined: \(\xi(\omega) = (\xi_1(\omega), \ldots, \xi_k(\omega))\). This vector represents a joint measurement of the observables and \({\mathbf {P}}_{\xi _1,\ldots ,\xi _k}\) represents the probability distribution for the outcomes of these jointly measured observables. Thus classical probability theory is applicable to jointly measurable observables, i.e., compatible observables in the terminology of QM (Sect. 5).

A random variable is called discrete if its image consists of a finite or countable number of points, i.e., ξ takes the values \(\alpha_1, \ldots, \alpha_n, \ldots\). In this case its probability distribution has the form

$$\displaystyle \begin{aligned} {\mathbf{P}}_\xi (B)= \sum_{\alpha_j \in B} P_{\alpha_j}, \; P_{\alpha_j}=\mathbf{P}(\omega \in \Omega: \xi(\omega)= \alpha_j). \end{aligned} $$
(6)

The mean value (average ) of a real-valued random variable is defined as its integral (the Lebesgue integral)

$$\displaystyle \begin{aligned} E \xi = \int_\Omega \xi(\omega) d P(\omega). \end{aligned} $$
(7)

For a discrete random variable, its mean value has the simple representation:

$$\displaystyle \begin{aligned} E \xi= \sum_{j} \alpha_j P_{\alpha_j}. \end{aligned} $$
(8)
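The passage from a random variable to its distribution (6) and mean value (8) can be sketched as follows. The three-point probability space `P`, the random variable `xi`, and the function names are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical discrete probability space and random variable xi: Omega -> R.
P = {"w1": Fraction(1, 5), "w2": Fraction(3, 10), "w3": Fraction(1, 2)}
xi = {"w1": -1, "w2": 1, "w3": 1}

def distribution(P, xi):
    """Eq. (6): P_alpha = P(omega : xi(omega) = alpha) for each value alpha."""
    dist = defaultdict(Fraction)
    for omega, p in P.items():
        dist[xi[omega]] += p
    return dict(dist)

def mean(dist):
    """Eq. (8): sum over all values alpha_j of alpha_j * P_alpha_j."""
    return sum(alpha * p for alpha, p in dist.items())

dist = distribution(P, xi)
# Eq. (7) on a discrete space: the integral reduces to a sum over Omega.
direct = sum(xi[omega] * p for omega, p in P.items())
print(dist, mean(dist), direct)
```

Both routes to the average, over the values of ξ and over the points of Ω, give the same number, as they must.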

In the Kolmogorov model the conditional probability is defined by the Bayes formula

$$\displaystyle \begin{aligned} \mathbf{P} (B\vert A)= \frac{\mathbf{P} (B \cap A)}{\mathbf{P} (A)}, \; \mathbf{P} (A)> 0. \end{aligned} $$
(9)

We stress that other axioms are independent of this definition.

We also present the formula of total probability (FTP) which is a simple consequence of the Bayes formula. Consider the pair, a and b, of discrete random variables. Then

$$\displaystyle \begin{aligned} \mathbf{P} (b= \beta)= \sum_{\alpha} \mathbf{P}(a=\alpha) \mathbf{P} (b=\beta\vert a=\alpha). \end{aligned} $$
(10)

Thus the b-probability distribution can be calculated from the a-probability distribution and the conditional probabilities P(b = β|a = α). These conditional probabilities are known as transition probabilities.
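A short numerical check of FTP (10) may help: starting from a joint distribution of two dichotomous random variables, the b-marginal is recovered from the a-marginal and the transition probabilities. The joint distribution below is a made-up example, and the helper names are assumptions of this sketch.

```python
from fractions import Fraction

# Hypothetical joint distribution P(a = alpha, b = beta) of two dichotomous variables.
joint = {
    (0, 0): Fraction(1, 10), (0, 1): Fraction(3, 10),
    (1, 0): Fraction(2, 5),  (1, 1): Fraction(1, 5),
}
values = (0, 1)

def p_a(alpha):
    """Marginal probability P(a = alpha)."""
    return sum(joint[(alpha, beta)] for beta in values)

def p_b(beta):
    """Marginal probability P(b = beta)."""
    return sum(joint[(alpha, beta)] for alpha in values)

def p_b_given_a(beta, alpha):
    """Bayes formula (9): P(b = beta | a = alpha)."""
    return joint[(alpha, beta)] / p_a(alpha)

def ftp(beta):
    """Right-hand side of the formula of total probability (10)."""
    return sum(p_a(alpha) * p_b_given_a(beta, alpha) for alpha in values)

for beta in values:
    print(beta, p_b(beta), ftp(beta))
```

The equality holds here because a single Kolmogorov space carries both variables; this is exactly the assumption that fails for incompatible quantum observables.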

This formula plays a crucial role in Bayesian inference. It is applicable to plenty of phenomena in insurance, finance, economics, engineering, biology, AI, game theory, decision making, and programming. However, as was shown by the author of this review, in the quantum domain FTP is violated: it is perturbed by the so-called interference term. Recently it was shown that even data collected in cognitive science, psychology, game theory, and decision making can violate classical FTP [1,2,3,4,5,6,7,8,9,10, 18, 38,39,40,41, 44,45,46,47,48].

3 Quantum Mathematics

We present the basic mathematical structures of QM and couple them to quantum physics.

3.1 Hermitian Operators in Hilbert Space

We recall the definition of a complex Hilbert space. Denote it by H. This is a complex linear space endowed with a scalar product 〈⋅|⋅〉 (a positive-definite nondegenerate Hermitian form) which is complete with respect to the norm corresponding to the scalar product. The norm is defined as

$$\displaystyle \begin{aligned} \Vert \phi\Vert= \sqrt{\langle \phi\vert \phi\rangle}. \end{aligned}$$

In the finite-dimensional case completeness holds automatically, so it plays no special role. Thus those who have no idea about functional analysis (but know the essentials of linear algebra) can treat H simply as a finite-dimensional complex linear space with the scalar product.

For a complex number z = x + iy, x, y ∈ R, its conjugate is denoted by \(\bar {z},\) here \(\bar {z}=x- iy.\) The absolute value |z| of z is given by \(\vert z\vert ^2 = z\bar {z}= x^2 + y^2.\)

For the reader’s convenience, we recall that the scalar product is a function from the Cartesian product H × H to the field of complex numbers \(\mathbb {C},\) \((\psi_1, \psi_2) \mapsto \langle \psi_1 \vert \psi_2 \rangle,\) having the following properties:

  1. Positive-definiteness: 〈ψ|ψ〉 ≥ 0 with 〈ψ|ψ〉 = 0 if and only if ψ = 0.

  2. Conjugate symmetry: \(\langle \psi _{1} \vert \psi _{2}\rangle =\overline {\langle \psi _{2} \vert \psi _{1}\rangle }\)

  3. Linearity with respect to the second argument (Footnote 2): \(\langle \phi \vert k_1 \psi_1 + k_2 \psi_2 \rangle = k_1 \langle \phi \vert \psi_1 \rangle + k_2 \langle \phi \vert \psi_2 \rangle,\) where \(k_1, k_2\) are complex numbers.

A reader who does not feel comfortable in the abstract framework of functional analysis can simply proceed with the Hilbert space \(H = \mathbf{C}^n,\) where C is the set of complex numbers, and the scalar product

$$\displaystyle \begin{aligned} \langle u \vert v \rangle= \sum_i u_i \bar{v}_i, u=(u_1,\ldots ,u_n), v= (v_1,\ldots ,v_n). \end{aligned} $$
(11)

In this case the above properties of a scalar product can be easily derived from (11). Instead of linear operators, one can consider matrices.

We also recall a few basic notions of the theory of linear operators in a complex Hilbert space. A map \(\widehat {a}: H \to H\) is called a linear operator if it maps linear combinations of vectors into linear combinations of their images:

$$\displaystyle \begin{aligned} \widehat{a} (\lambda_1 \psi_1 + \lambda_2 \psi_2) = \lambda_1 \widehat{a} \psi_1 + \lambda_2 \widehat{a} \psi_2, \end{aligned}$$

where \(\lambda_j \in \mathbf{C},\) \(\psi_j \in H,\) j = 1, 2.

For a linear operator \(\widehat {a}\) the symbol \(\widehat {a}^{\ast }\) denotes its adjoint operator which is defined by the equality

$$\displaystyle \begin{aligned} \langle \widehat{a} \psi_{1}|\psi_{2}\rangle=\langle\psi_{1}|\widehat{a}^{\ast}\psi_{2}\rangle. {} \end{aligned} $$
(12)

Let us select in H some orthonormal basis \((e_i)\), i.e., \(\langle e_i \vert e_j \rangle = \delta_{ij}.\) By denoting the matrix elements of the operators \(\widehat {a}\) and \(\widehat {a}^{\ast }\) as \(a_{ij}\) and \(a_{ij}^{\ast },\) respectively, we rewrite the definition (12) in terms of the matrix elements:

$$\displaystyle \begin{aligned} a_{ij}^{\ast}=\bar{a}_{ji}. \end{aligned}$$

A linear operator \(\widehat {a}\) is called Hermitian if it coincides with its adjoint operator:

$$\displaystyle \begin{aligned} \widehat{a}= \widehat{a}^{\ast}. \end{aligned}$$

If an orthonormal basis \((e_i)\) in H is fixed and \(\widehat {a}\) is represented by its matrix, \(A = (a_{ij}),\) where \(a_{ij}= \langle \widehat {a} e_i \vert e_j \rangle ,\) then it is Hermitian if and only if

$$\displaystyle \begin{aligned} \bar{a}_{ij}=a_{ji}. \end{aligned}$$

We remark that, for a Hermitian operator, all its eigenvalues are real. In fact, this was one of the main reasons to represent quantum observables by Hermitian operators. In the quantum formalism, the spectrum of a linear operator (the set of eigenvalues, as long as we are in the finite-dimensional case) coincides with the set of possibly observable values (Sect. 4, Postulate 3). We also recall that eigenvectors of Hermitian operators corresponding to different eigenvalues are orthogonal. This property of Hermitian operators plays a role in the justification of the projection postulate of QM, see Sect. 5.1.
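For readers working in \(H = \mathbf{C}^n\) with matrices, the notions of adjoint and Hermitian operator can be checked directly. The sketch below uses the scalar product in the form written in (11) (conjugation on the second vector); the matrices `a` and `b`, the test vectors, and all function names are hypothetical, and the eigenvalue routine covers only the 2 × 2 Hermitian case.

```python
import math

def inner(u, v):
    """Scalar product in the form of Eq. (11): sum_i u_i * conj(v_i)."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def apply(a, psi):
    """Action of a matrix (list of rows) on a vector."""
    return [sum(entry * x for entry, x in zip(row, psi)) for row in a]

def adjoint(a):
    """Conjugate transpose: the (i, j) entry of the adjoint is conj(a_ji)."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def is_hermitian(a):
    return a == adjoint(a)

def eigenvalues_2x2_hermitian(a):
    """Roots of the characteristic polynomial of a 2 x 2 Hermitian matrix."""
    tr = (a[0][0] + a[1][1]).real
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]).real
    disc = math.sqrt(tr * tr - 4 * det)   # nonnegative for Hermitian matrices
    return ((tr - disc) / 2, (tr + disc) / 2)

a = [[2, 1 - 1j], [1 + 1j, 3]]   # hypothetical Hermitian matrix
b = [[1, 2j], [0, 1 + 1j]]       # hypothetical non-Hermitian matrix
psi1, psi2 = [1, 1j], [2, 1 - 1j]

# Defining equality (12) of the adjoint, checked on the non-Hermitian matrix b.
lhs = inner(apply(b, psi1), psi2)
rhs = inner(psi1, apply(adjoint(b), psi2))
print(is_hermitian(a), is_hermitian(b), lhs == rhs, eigenvalues_2x2_hermitian(a))
```

The discriminant under the square root equals \((a_{11}-a_{22})^2 + 4\vert a_{12}\vert^2\) for a Hermitian matrix, which is why both eigenvalues come out real.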

A linear operator \(\widehat {a}\) is positive-semidefinite if, for any ϕ ∈ H,

$$\displaystyle \begin{aligned} \langle \widehat{a} \phi \vert \phi \rangle \geq 0. \end{aligned}$$

This is equivalent to positive-semidefiniteness of its matrix.

For a linear operator \(\widehat {a},\) its trace is defined as the sum of diagonal elements of its matrix in any orthonormal basis:

$$\displaystyle \begin{aligned} \mathrm{Tr}\; \widehat{a}= \sum_i a_{ii}= \sum_i \langle \widehat{a} e_i \vert e_i \rangle, \end{aligned}$$

i.e., this quantity does not depend on a basis.

Let L be a subspace of H. The orthogonal projector P : H → L onto this subspace is a Hermitian, idempotent (i.e., coinciding with its square), and positive-semidefinite operator (Footnote 3):

  (a) \(P^{\ast} = P;\)

  (b) \(P^2 = P;\)

  (c) P ≥ 0.

Here (c) is a consequence of (a) and (b). Moreover, an arbitrary linear operator satisfying (a) and (b) is an orthogonal projector—onto the subspace PH.
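Properties (a)–(c) can be verified numerically for a concrete projector. The following sketch builds the projector onto a two-dimensional subspace of \(\mathbf{C}^3\) from two mutually orthogonal spanning vectors; the subspace, the test vector `phi`, and the helper names are assumptions made for illustration, and the scalar product is taken in the form (11).

```python
def inner(u, v):
    """Scalar product in the form of Eq. (11)."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def adjoint(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

def projector(vectors):
    """Orthogonal projector onto the span of mutually orthogonal nonzero vectors."""
    n = len(vectors[0])
    p = [[0j] * n for _ in range(n)]
    for v in vectors:
        norm2 = inner(v, v).real
        for i in range(n):
            for j in range(n):
                p[i][j] += v[i] * complex(v[j]).conjugate() / norm2
    return p

# Hypothetical subspace L of C^3 spanned by two orthogonal vectors.
P = projector([[1, 0, 0], [0, 1, 1j]])
phi = [1, 2, 3j]
P_phi = [sum(P[i][j] * phi[j] for j in range(3)) for i in range(3)]

print(close(adjoint(P), P))        # property (a)
print(close(matmul(P, P), P))      # property (b)
print(inner(P_phi, phi).real)      # property (c): nonnegative, equals the squared norm of P phi
```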

3.2 Pure and Mixed States: Normalized Vectors and Density Operators

Pure quantum states are represented by normalized vectors, ψ ∈ H : ∥ψ∥ = 1. Two collinear vectors,

$$\displaystyle \begin{aligned} \psi^\prime= \lambda \psi, \lambda \in \mathbf{C}, \vert \lambda\vert=1, \end{aligned} $$
(13)

represent the same pure state. Thus, rigorously speaking, a pure state is an equivalence class of vectors having the unit norm: ψ′ ∼ ψ for vectors coupled by (13). The unit sphere of H is split into disjoint classes, the pure states. However, in concrete calculations one typically uses just concrete representatives of equivalence classes, i.e., one works with normalized vectors.

Each pure state can also be represented by the projection operator \(P_\psi\) which projects H onto the one-dimensional subspace spanned by ψ. For a vector ϕ ∈ H,

$$\displaystyle \begin{aligned} P_\psi \phi= \langle \phi \vert \psi \rangle \; \psi. \end{aligned} $$
(14)

The trace of the one-dimensional projector \(P_\psi\) equals 1:

$$\displaystyle \begin{aligned} \mathrm{Tr}\; P_\psi= \langle \psi\vert \psi \rangle=1. \end{aligned}$$

We summarize the properties of the operator \(P_\psi\) representing the pure state ψ. It is

  (a) Hermitian,

  (b) positive-semidefinite,

  (c) trace one,

  (d) idempotent.

Moreover, any operator satisfying (a)–(d) represents a pure state. Properties (a) and (d) characterize orthogonal projectors, and property (b) is their consequence. Property (c) implies that the projector is one-dimensional.

The next step in the development of QM was the extension of the class of quantum states, from pure states represented by one dimensional projectors to states represented by linear operators having the properties (a)–(c). Such operators are called density operators. (This nontrivial step of extension of the class of quantum states was based on the efforts of Landau and von Neumann.) The symbol D(H) denotes the space of density operators in the complex Hilbert space H.

One typically distinguishes pure states, as represented by one-dimensional projectors, and mixed states, the density operators which cannot be represented by one-dimensional projectors. The terminology “mixed” has the following origin: any density operator can be represented as a “mixture” of pure states \((\psi_i)\):

$$\displaystyle \begin{aligned} \rho= \sum_i p_i P_{\psi_i}, \; p_i \in [0,1], \sum_i p_i =1. \end{aligned} $$
(15)

(To simplify formulas, we shall not put the operator-label “hat” in the symbols denoting density operators, i.e., \(\rho \equiv \widehat {\rho }.)\) The state is pure if and only if such a mixture is trivial: all \(p_i\) except one equal zero. However, when operating with the terminology “mixed state,” one has to take into account that the representation in the form (15) is not unique. The same mixed state can be presented as mixtures of different collections of pure states.

Any operator ρ satisfying (a)–(c) is diagonalizable (even in infinite-dimensional Hilbert space), i.e., in some orthonormal basis it is represented as a diagonal matrix, \(\rho = \mathrm{diag}(p_j),\) where \(p_j \in [0, 1],\) \(\sum_j p_j = 1.\) Thus it can be represented in the form (15) with mutually orthogonal one-dimensional projectors. The property (d) can be used to check whether a state is pure or not.

We point out that pure states are merely mathematical abstractions; in real experimental situations, it is possible to prepare only mixed states. The degree of purity is defined as

$$\displaystyle \begin{aligned} \mathrm{purity}(\rho)= \mathrm{Tr} \rho^2. \end{aligned}$$

Experimenters are satisfied by getting this quantity near one.
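To illustrate the mixture representation (15) and the degree of purity, the following sketch mixes two one-dimensional projectors in \(\mathbf{C}^2\) and evaluates \(\mathrm{Tr}\, \rho^2\). The chosen projectors and weights are hypothetical, and the helper names are assumptions of this sketch.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def mixture(weights, projectors):
    """Eq. (15): rho = sum_i p_i P_psi_i."""
    n = len(projectors[0])
    return [[sum(p * proj[i][j] for p, proj in zip(weights, projectors))
             for j in range(n)] for i in range(n)]

def purity(rho):
    """Degree of purity Tr(rho^2); equals 1 exactly for pure states."""
    return complex(trace(matmul(rho, rho))).real

P0 = [[1.0, 0.0], [0.0, 0.0]]       # projector onto (1, 0)
P1 = [[0.0, 0.0], [0.0, 1.0]]       # projector onto (0, 1)
Pplus = [[0.5, 0.5], [0.5, 0.5]]    # projector onto (1, 1)/sqrt(2)

rho = mixture([0.5, 0.5], [P0, Pplus])        # mixture of two non-orthogonal pure states
rho_max_mixed = mixture([0.5, 0.5], [P0, P1])

print(trace(rho), purity(P0), purity(rho), purity(rho_max_mixed))
```

In dimension two the purity ranges from 1/2 (the maximally mixed state) to 1 (a pure state), so the mixture of non-orthogonal states above lies strictly in between.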

4 Quantum Mechanics: Postulates

We state again that H denotes complex Hilbert space with the scalar product 〈⋅, ⋅〉 and the norm ∥⋅∥ corresponding to the scalar product.

Postulate 1

(The Mathematical Description of Quantum States) Quantum (pure) states (wave functions) are represented by normalized vectors ψ (i.e., \(\Vert \psi \Vert^2 = \langle \psi, \psi \rangle = 1\)) of a complex Hilbert space H. Every normalized vector ψ ∈ H may represent a quantum state. If a vector ψ corresponding to a state is multiplied by any complex number c, |c| = 1, the resulting vector will correspond to the same state (Footnote 4).

The physical meaning of “a quantum state” is not defined by this postulate, see Sect. 6.1.

Postulate 2

(The Mathematical Description of Physical Observables) A physical observable a is represented by a Hermitian operator \(\widehat {a}\) in complex Hilbert space H. Different observables are represented by different operators.

Postulate 3

(Spectral) For a physical observable a which is represented by the Hermitian operator \(\widehat {a}\) we can predict (together with some probabilities) values \(\lambda \in \mathrm {Spec} (\widehat {a}).\)

We restrict our considerations to the simplest Hermitian operators, which are analogous to discrete random variables in classical probability theory. We recall that a Hermitian operator \(\widehat {a}\) has purely discrete spectrum if it can be represented as

$$\displaystyle \begin{aligned} \widehat{a} = \alpha_1 P_{\alpha_1}^a +\cdots +\alpha_m P_{\alpha_m}^a+\cdots ,\; \alpha_m \in \mathbf{R}, \end{aligned} $$
(16)

where \(P_{\alpha _m}^a\) are orthogonal projection operators related to the orthonormal eigenvectors \(\{e_{km}^a\}_k\) of \(\widehat {a}\) corresponding to the eigenvalues \(\alpha_m\) by

$$\displaystyle \begin{aligned} P_{\alpha_m}^a \psi =\sum_{k} \langle \psi, e_{km}^a\rangle e_{km}^a,\; \psi \in H. \end{aligned} $$
(17)

Here k labels the eigenvectors \(e_{km}^a\) which belong to the same eigenvalue \(\alpha_m\) of \(\widehat {a}.\)

Postulate 4

(Born’s Rule) Let a physical observable a be represented by a Hermitian operator \(\widehat {a}\) with purely discrete spectrum. The probability \({\mathbf{P}}_\psi(a = \alpha_m)\) to obtain the eigenvalue \(\alpha_m\) of \(\widehat {a}\) for measurement of a in a state ψ is given by

$$\displaystyle \begin{aligned} {\mathbf{P}}_\psi (a=\alpha_m)= \Vert P_m^a \psi \Vert^2 . \end{aligned} $$
(18)

If the operator \(\widehat {a}\) has nondegenerate (purely discrete) spectrum, then each \(\alpha_m\) is associated with a one-dimensional subspace. The latter can be fixed by selecting any normalized vector, say \(e_{m}^a.\) In this case the orthogonal projectors act simply as

$$\displaystyle \begin{aligned} P_{\alpha_m}^a \psi = \langle \psi, e_{m}^a\rangle e_{m}^a. \end{aligned} $$
(19)

Formula (18) takes a very simple form

$$\displaystyle \begin{aligned} {\mathbf{P}}_\psi (a=\alpha_m)= \vert \langle \psi, e_{m}^a\rangle \vert^2 . \end{aligned} $$
(20)

It is Born’s rule in the Hilbert space formalism.
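Born’s rule (20) for a nondegenerate observable can be sketched in a few lines. The state `psi` and the two orthonormal bases below are hypothetical examples; each basis plays the role of the eigenbasis \((e_m^a)\) of some observable, and the scalar product is taken in the form (11).

```python
import math

def inner(u, v):
    """Scalar product in the form of Eq. (11)."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def born_probabilities(psi, basis):
    """Eq. (20): P(a = alpha_m) = |<psi, e_m>|^2 for an orthonormal eigenbasis (e_m)."""
    return [abs(inner(psi, e)) ** 2 for e in basis]

psi = [0.6, 0.8j]                      # hypothetical normalized state in C^2
s = 1 / math.sqrt(2)
standard_basis = [[1, 0], [0, 1]]      # eigenbasis of one observable
rotated_basis = [[s, s], [s, -s]]      # eigenbasis of another observable

print(born_probabilities(psi, standard_basis))
print(born_probabilities(psi, rotated_basis))
```

The same state yields different probability distributions for the two bases, each of which sums to one; only the squared moduli of the coefficients enter, so the choice of which argument is conjugated does not affect the result.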

It is important to point out that if state ψ is an eigenstate of operator \(\widehat {a}\) representing observable a, i.e., \(\widehat {a} \psi = \alpha \psi ,\) then the outcome of observable a equals α with probability one.

We point out that, for any fixed quantum state ψ, each quantum observable \(\widehat {a}\) can be represented as a classical random variable (Sect. 2). In the discrete case the corresponding probability distribution is defined as

$$\displaystyle \begin{aligned} \mathbf{P} (A)= \sum_{\alpha_m \in A} {\mathbf{P}}_\psi (a=\alpha_m), \end{aligned}$$

where \({\mathbf{P}}_\psi(a = \alpha_m)\) is given by Born’s rule.

Thus each concrete quantum measurement can be described by the classical probability model.

Problems (including deep interpretational issues) arise only when one tries to describe classically data collected for a few incompatible observables (Sect. 5).

By using Born’s rule (18) and the classical probabilistic definition of average (Sect. 2), it is easy to see that the average value of the observable a in the state ψ (belonging to the domain of definition of the corresponding operator \(\widehat {a})\) is given by

$$\displaystyle \begin{aligned} \langle a \rangle_\psi =\langle \widehat{a}\;\psi,\psi \rangle. \end{aligned} $$
(21)

For example, for an observable represented by an operator with the purely discrete spectrum, we have

$$\displaystyle \begin{aligned} \langle a \rangle _\psi = \sum_m \alpha_m {\mathbf{P}}_\psi (a=\alpha_m) = \sum_m \alpha_m \Vert P_m^a \psi \Vert^2 = \langle \widehat{a}\;\psi,\psi \rangle. \end{aligned}$$
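The equality of the two expressions for the average can be checked on a small example. In the sketch below the operator is assembled from its spectral decomposition (16) with two hypothetical eigenpairs, and the average is computed once from Born’s rule and once as \(\langle \widehat{a}\psi, \psi\rangle\); the state, the eigenpairs, and the variable names are assumptions made for illustration.

```python
import math

def inner(u, v):
    """Scalar product in the form of Eq. (11)."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def apply(a, psi):
    return [sum(entry * x for entry, x in zip(row, psi)) for row in a]

s = 1 / math.sqrt(2)
# Hypothetical nondegenerate spectrum: pairs (alpha_m, e_m) with orthonormal e_m.
spectrum = [(1.0, [s, s]), (-1.0, [s, -s])]

# Eq. (16): a = sum_m alpha_m P_m, with P_m the projector onto e_m.
a = [[sum(alpha * e[i] * complex(e[j]).conjugate() for alpha, e in spectrum)
      for j in range(2)] for i in range(2)]

psi = [0.6, 0.8]   # hypothetical normalized state
probs = [abs(inner(psi, e)) ** 2 for _, e in spectrum]

mean_from_born = sum(alpha * p for (alpha, _), p in zip(spectrum, probs))
mean_from_operator = inner(apply(a, psi), psi).real   # Eq. (21)
print(probs, mean_from_born, mean_from_operator)
```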

Postulate 5

(Time Evolution of Wave Function) Let \(\widehat {\mathcal {H}} \) be the Hamiltonian of a quantum system, i.e., the Hermitian operator corresponding to the energy observable. The time evolution of the wave function ψ ∈ H is described by the Schrödinger equation

$$\displaystyle \begin{aligned} i \frac{d}{d t} \psi(t)= \widehat{\mathcal{H}} \psi(t) \end{aligned} $$
(22)

with the initial condition ψ(0) = ψ.
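For a Hamiltonian with purely discrete spectrum, Eq. (22) can be solved componentwise in the eigenbasis of \(\widehat{\mathcal{H}}\): each coefficient simply acquires a phase factor. The sketch below assumes this setting, uses the units of (22) as written (no Planck constant appears), and takes two hypothetical energy levels; it then checks norm conservation and, by finite differences, that the equation is satisfied.

```python
import cmath

def evolve(psi0, energies, t):
    """Solution of Eq. (22) in the Hamiltonian eigenbasis: c_m(t) = exp(-i E_m t) c_m(0)."""
    return [cmath.exp(-1j * energy * t) * c for energy, c in zip(energies, psi0)]

def norm2(psi):
    return sum(abs(c) ** 2 for c in psi)

def schrodinger_residual(psi0, energies, t, dt=1e-6):
    """Largest component of i dpsi/dt - H psi, with a central-difference derivative."""
    ahead = evolve(psi0, energies, t + dt)
    behind = evolve(psi0, energies, t - dt)
    now = evolve(psi0, energies, t)
    return max(abs(1j * (a - b) / (2 * dt) - energy * c)
               for a, b, c, energy in zip(ahead, behind, now, energies))

energies = [1.0, 3.0]   # hypothetical eigenvalues of the Hamiltonian
psi0 = [0.6, 0.8j]      # initial state, coefficients in the eigenbasis

print(norm2(evolve(psi0, energies, 2.5)))
print(schrodinger_residual(psi0, energies, 2.5))
```

The squared moduli of the coefficients do not change in time, so the Born probabilities for the energy observable are stationary.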

5 Compatible and Incompatible Observables

Two observables a and b are called compatible if a measurement procedure for their joint measurement can be designed, i.e., a measurement of the vector observable d = (a, b). In such a case their joint probability distribution is well defined. In the opposite case, i.e., when such a joint-measurement procedure does not exist, observables are called incompatible. The joint probability distribution of incompatible observables has no meaning.

In QM, compatible observables a and b are represented by commuting Hermitian operators \(\widehat {a}\) and \(\widehat {b},\) i.e., \([\widehat {a}, \widehat {b}]=0;\) consequently, incompatible observables a and b are represented by noncommuting operators, i.e., \([\widehat {a}, \widehat {b}]\neq 0.\) Thus in the QM-formalism compatibility–incompatibility is represented as commutativity–noncommutativity.

Postulate 4a

(Born’s Rule for Joint Measurements) Let observables a and b be represented by Hermitian operators \(\widehat {a}\) and \(\widehat {b}\) with purely discrete spectrum. The probability to obtain the eigenvalues \(\alpha_m\) and \(\beta_k\) in a joint measurement of a and b in a state ψ, i.e., the joint probability distribution, is given by

$$\displaystyle \begin{aligned} {\mathbf{P}}_\psi (a=\alpha_m, b= \beta_k) = \Vert P_k^b P_m^a \psi \Vert^2= \Vert P_m^a P_k^b\psi \Vert^2 . \end{aligned} $$
(23)

It is crucial that the spectral projectors of commuting operators commute, so the probability distribution does not depend on the order of the values of observables. This is a classical probability distribution (Sect. 2). Any pair of compatible observables a and b can be represented by random variables: for example, by using the joint probability distribution as the probability measure.
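The role of commutativity in (23) can be seen numerically. In the sketch below, `Pa` and `Pb` are hypothetical commuting spectral projectors in \(\mathbf{C}^3\), for which the order of application does not matter, while `Pc` is a projector that does not commute with `Pb`, for which the two orders give different numbers and hence no joint probability distribution of the form (23). All matrices, the state, and the helper names are assumptions of this illustration.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def apply(a, psi):
    return [sum(entry * x for entry, x in zip(row, psi)) for row in a]

def norm2(psi):
    return sum(abs(c) ** 2 for c in psi)

def commute(a, b, tol=1e-12):
    """True if the commutator [a, b] vanishes up to the tolerance."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return all(abs(ab[i][j] - ba[i][j]) < tol for i in range(n) for j in range(n))

def sequential_prob(first, second, psi):
    """|| second first psi ||^2, the expression appearing in Eq. (23)."""
    return norm2(apply(second, apply(first, psi)))

psi = [1 / 3, 2 / 3, 2 / 3]                          # hypothetical normalized state
Pa = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]               # spectral projector of a
Pb = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]               # spectral projector of b, commutes with Pa
Pc = [[0.5, 0.5, 0], [0.5, 0.5, 0], [0, 0, 0]]       # projector not commuting with Pb

print(commute(Pa, Pb), sequential_prob(Pa, Pb, psi), sequential_prob(Pb, Pa, psi))
print(commute(Pb, Pc), sequential_prob(Pc, Pb, psi), sequential_prob(Pb, Pc, psi))
```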

A family of compatible observables \(a_1, \ldots, a_n\) is represented by commuting Hermitian operators \(\widehat {a}_1,\ldots , \widehat {a}_n,\) i.e., \([\widehat {a}_i, \widehat {a}_j]= 0\) for all pairs i, j. The joint probability distribution is given by the natural generalization of rule (23):

$$\displaystyle \begin{aligned} {\mathbf{P}}_{a_1,\ldots , a_n; \psi}(\alpha_{1m}, \ldots , \alpha_{nk} )\equiv {\mathbf{P}}_\psi (a_1=\alpha_{1m}, \ldots , a_n=\alpha_{nk} ) \end{aligned}$$
$$\displaystyle \begin{aligned} = \Vert P_k^{a_n}\cdots P_m^{a_1} \psi \Vert^2= \cdots = \Vert P_m^{a_1} \cdots P_k^{a_n} \psi \Vert^2, \end{aligned} $$
(24)

where all possible permutations of projectors can be considered.

Now we point to one distinguishing feature of compatibility of quantum observables which is commonly not emphasized. Commutativity of operators is a pairwise relation; it does not involve, say, triples of operators. Thus, for joint measurability of a group of quantum observables \(a_1, \ldots, a_n\), their pairwise joint measurability is sufficient: if we are able to design measurement procedures for all possible pairs, then we are always able to design a joint-measurement procedure for the whole group of quantum observables \(a_1, \ldots, a_n\). This is a special feature of quantum observables. In particular, if all pairwise joint probability distributions \({\mathbf {P}}_{a_i,a_j; \psi }\) exist, then the joint probability \({\mathbf {P}}_{a_1,\ldots , a_n; \psi }\) is defined as well.

Born’s rule can be generalized to quantum states represented by density operators (Sect. 9.1, formula (40)).

5.1 Post-Measurement State From the Projection Postulate

The projection postulate is one of the most questionable and debatable postulates of QM. We present it in a separate section to distinguish it from the other postulates of QM, Postulates 1–5, which are commonly accepted.

Consider a pure state ψ and a quantum observable (Hermitian operator) \(\widehat {a}\) representing some physical observable a. Suppose that \(\widehat {a}\) has nondegenerate spectrum; denote its eigenvalues by \(\alpha_1, \ldots, \alpha_m, \ldots\) and the corresponding eigenvectors by \(e_{1}^a,\ldots ,e_{m}^a, \ldots \) (here \(\alpha_i \neq \alpha_j\) for \(i \neq j\)). The eigenvectors form an orthonormal basis in H. We expand the vector ψ with respect to this basis:

$$\displaystyle \begin{aligned} \psi=k_{1}e_{1}^a+\cdots+k_{m}e_{m}^a+\cdots , \end{aligned} $$
(25)

where \((k_j)\) are complex numbers such that

$$\displaystyle \begin{aligned} \Vert \psi \Vert^2= |k_{1}|{}^{2}+\cdots +|k_{m}|{}^{2}+\cdots =1. {} \end{aligned} $$
(26)

By using the terminology of linear algebra we say that the pure state ψ is a superposition of the pure states \(e_j^a.\) The von Neumann projection postulate describes the post-measurement state and it can be formulated as follows:

Postulate 6VN

(Projection Postulate, von Neumann) Measurement of observable a resulting in output \(\alpha_i\) induces reduction of superposition (25) to the basis vector \(e_i^a.\)

The procedure described by the projection postulate can be interpreted in the following way:

Superposition (25) reflects uncertainty in the results of measurements for an observable a. Before measurement a quantum system “does not know how it will answer the question a.” Born’s rule presents potentialities for different answers. Thus a quantum system in the superposition state ψ does not have propensity to any value of a as its objective property. After the measurement the superposition is reduced to the single term in the expansion (25) corresponding to the value of a obtained in the process of measurement.

Consider now an arbitrary quantum observable a with purely discrete spectrum, i.e., \(\widehat {a} = \alpha _1 P_{\alpha _1}^a +\cdots +\alpha _m P_{\alpha _m}^a+\cdots .\) The Lüders projection postulate describes the post-measurement state and it can be formulated as follows:

Postulate 6L

(Projection Postulate, Lüders) Measurement of observable a resulting in output \(\alpha_m\) induces projection of the state ψ onto the state

$$\displaystyle \begin{aligned} \psi_{\alpha_m}= \frac{P_{\alpha_m}^a \psi}{\Vert P_{\alpha_m}^a \psi \Vert}. \end{aligned}$$

In contrast to the majority of books on quantum theory, we sharply distinguish the cases of quantum observables with nondegenerate and degenerate spectra. von Neumann formulated Postulate 6VN only for observables with nondegenerate spectra. Lüders “generalized” von Neumann’s postulate to the case of observables with degenerate spectra. However, for such observables, von Neumann formulated [60] a postulate which is different from Postulate 6L. The post-measurement state need not be again a pure state.

We remark that Postulate 6L can be applied even to quantum states which are represented by density operators (Sect. 9.1, formula (41)).
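Postulate 6L for a pure state can be sketched as follows for an observable with a degenerate eigenvalue. The state `psi` and the projector `P_alpha` onto a two-dimensional eigenspace of \(\mathbf{C}^3\) are hypothetical, and the function name `luders_update` is an assumption of this illustration.

```python
import math

def apply(a, psi):
    return [sum(entry * x for entry, x in zip(row, psi)) for row in a]

def norm(psi):
    return math.sqrt(sum(abs(c) ** 2 for c in psi))

def luders_update(projector, psi):
    """Return (outcome probability by Born's rule (18), post-measurement state of Postulate 6L)."""
    projected = apply(projector, psi)
    length = norm(projected)
    if length == 0:
        raise ValueError("this outcome has probability zero in the given state")
    return length ** 2, [c / length for c in projected]

psi = [1 / 3, 2 / 3, 2 / 3]                    # hypothetical normalized state
P_alpha = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]    # projector onto a two-dimensional eigenspace

prob, post = luders_update(P_alpha, psi)
repeat_prob, _ = luders_update(P_alpha, post)
print(prob, post, repeat_prob)
```

The post-measurement state stays a superposition within the degenerate eigenspace, and an immediate repetition of the measurement gives the same outcome with probability one.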

6 Interpretations of Quantum Mechanics

Now we are going to discuss one of the most important and complicated issues of quantum foundations, the problem of the interpretation of a quantum state. Numerous interpretations have been elaborated, and they can differ crucially from each other. This huge diversity of interpretations is a sign of a deep crisis in quantum foundations.

In this section, we briefly discuss a few basic interpretations. Then in Sect. 9.1 we shall consider the Växjö (realist ensemble contextual) interpretation. Its presentation needs additional mathematical formulas. Therefore we placed it into a separate section.

6.1 Ensemble and Individual Interpretations

The Ensemble Interpretation

A (pure) quantum state provides a description of certain statistical properties of an ensemble of similarly prepared quantum systems.

This interpretation is upheld, for example, by Einstein, Popper, Blokhintsev, Margenau, Ballentine, Klyshko, and in recent years by, e.g., De Muynck, De Baere, Holevo, Santos, Khrennikov, Nieuwenhuizen, Adenier, Groessing, and many others.

The Copenhagen Interpretation

A (pure) quantum state provides the complete description of an individual quantum system.

This interpretation was supported by a wide range of scientists, from Schrödinger’s original attempt to identify the electron with a wave function solution of his equation to the several versions of the Copenhagen interpretation [12,13,14, 53, 54] (for example, Heisenberg, Bohr, Pauli, Dirac, von Neumann, Landau, Fock, and in recent years by, e.g., Greenberger, Mermin, Lahti, Peres, Summhammer). Nowadays the individual interpretation is extremely popular, especially in quantum information and computing.

Instead of Einstein’s terminology “ensemble interpretation,” Ballentine [7,8,9] used the terminology “statistical interpretation.” However, Ballentine’s terminology is rather misleading, because the term “statistical interpretation” was also used by von Neumann [60] for individual randomness! For him “statistical interpretation” had a meaning which is totally different from Ballentine’s “ensemble-statistical interpretation.” J. von Neumann wanted to emphasize the difference between deterministic (Newtonian) classical mechanics, in which the state of a system is determined by values of two observables (position and momentum), and quantum mechanics, in which the state is determined not by values of observables, but by probabilities. We shall follow Albert Einstein and use the terminology ensemble interpretation.

We remark that, following von Neumann [60], the supporters of the individual interpretation believe in irreducible quantum randomness, i.e., that the behavior of an individual quantum system is irreducibly random. Why does it behave in such a way? Because it is quantum, so it can behave so unusually. Nowadays this claim of von Neumann is used to justify the superiority of quantum technology over classical technology, for example, the superiority of quantum random generators.

6.2 Information Interpretations

The quantum information revolution generated a variety of information interpretations of QM (see, for example, [16, 17, 19, 20]). By these interpretations the quantum formalism describes a special way of information processing, more general than classical information processing. Roughly speaking, one can forget about physics and work solely with probability, entropy, and information. Quantum Bayesianism (QBism) [25, 26] can be considered as one of such information interpretations, in its extreme form: the quantum formalism describes a very general scheme of assignments of subjective probabilities to possible outcomes of experiments, assignments made by human agents.

7 Quantum Conditional (Transition) Probability

In the classical Kolmogorov probabilistic model (Sect. 2), besides probabilities one operates with the conditional probabilities defined by the Bayes formula (see Sect. 2, formula (9)). Born’s postulate defining quantum probability should also be completed by a definition of the conditional probability. We have remarked that, for one concrete observable, the probability given by Born’s rule can be treated classically. However, the definition of the conditional probability involves two observables. Such situations cannot be treated classically (at least straightforwardly, cf. Sect. 2). Thus conditional probability is really a quantum probability.

Let physical observables a and b be represented by Hermitian operators with purely discrete (possibly degenerate) spectra:

$$\displaystyle \begin{aligned} \widehat{a}=\sum_{m} \alpha_m P_{\alpha_m}^a ,\; \widehat{b}=\sum_{m} \beta_m P_{\beta_m}^b. \end{aligned} $$
(27)

Let ψ be a pure state and let \(P_{\alpha _k}^a \psi \neq 0.\) Then the probability to get the value \(b=\beta_m\) under the condition that the value \(a=\alpha_k\) was observed in the preceding measurement of the observable a on the state ψ is given by the formula

$$\displaystyle \begin{aligned} {\mathbf{P}}_\psi (b=\beta_m\vert a=\alpha_k) \equiv \frac{\Vert P_{\beta_m}^b \; P_{\alpha_k}^a\psi \Vert^2}{\Vert P_{\alpha_k}^a \; \psi \Vert^2}. \end{aligned} $$
(28)

One can motivate this definition by appealing to the projection postulate (Lüders’ version). After the a-measurement with output \(a=\alpha_k\), the initially prepared state ψ is projected onto the state

$$\displaystyle \begin{aligned} \psi_{\alpha_k}= \frac{P_{\alpha_k}^a\psi }{\Vert P_{\alpha_k}^a \; \psi \Vert}. \end{aligned}$$

Then one applies Born’s rule to the b-measurement for this state.

Let the operator \(\widehat {a}\) have nondegenerate spectrum, i.e., for any eigenvalue α the corresponding eigenspace (i.e., generated by eigenvectors with \(\widehat {a}\psi = \alpha \psi )\) is one dimensional. We can write

$$\displaystyle \begin{aligned} {\mathbf{P}}_\psi (b=\beta_m\vert a=\alpha_k)= \Vert P_{\beta_m}^b e_k^a \Vert^2 \end{aligned} $$
(29)

(here \(\widehat {a} e_k^a = \alpha _k e_k^a).\) Thus the conditional probability in this case does not depend on the original state ψ. We can say that the memory about the original state was destroyed. If also the operator \(\widehat {b}\) has nondegenerate spectrum, then we have: \( {\mathbf {P}}_\psi (b=\beta _m\vert a=\alpha _k)= \vert \langle e_m^b,e_k^a \rangle \vert ^2 \) and \( {\mathbf {P}}_\psi (a=\alpha _k\vert b=\beta _m)= \vert \langle e_k^a,e_m^b \rangle \vert ^2.\) By using symmetry of the scalar product we obtain the following result:

Let both operators \(\widehat {a}\) and \(\widehat {b}\) have purely discrete nondegenerate spectra and let \(P_k^a \psi \neq 0\) and \(P_m^b \psi \neq 0.\) Then conditional probability is symmetric and it does not depend on the original state ψ : 

$$\displaystyle \begin{aligned} {\mathbf{P}}_\psi (b=\beta_m\vert a=\alpha_k)= {\mathbf{P}}_\psi (a=\alpha_k\vert b=\beta_m)= \vert \langle e_m^b,e_k^a \rangle\vert^2. \end{aligned} $$
(30)
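As a numerical illustration of formulas (28) and (30), the sketch below evaluates the conditional probability through rank-one projectors and compares it with the squared overlap of the eigenvectors. The eigenvectors `e_a`, `e_b` and the states `psi1`, `psi2` are arbitrary illustrative choices, and the scalar product is assumed to be antilinear in its first argument.

```python
import math

def inner(u, v):
    """Scalar product <u, v>, taken antilinear in the first argument (assumption)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def project(e, psi):
    """Apply the rank-one projector |e><e| to psi (e is assumed normalized)."""
    c = inner(e, psi)
    return [c * x for x in e]

def norm_sq(v):
    """Squared norm of a vector."""
    return inner(v, v).real

def cond_prob(e_second, e_first, psi):
    """Formula (28) for nondegenerate spectra: ||P_second P_first psi||^2 / ||P_first psi||^2."""
    first = project(e_first, psi)
    return norm_sq(project(e_second, first)) / norm_sq(first)

s = 1 / math.sqrt(2)
e_a = [1, 0]            # eigenvector of a (illustrative)
e_b = [s, 1j * s]       # eigenvector of b (illustrative)
psi1 = [0.6, 0.8]       # two different initial pure states
psi2 = [s, -s]

p_b_given_a_psi1 = cond_prob(e_b, e_a, psi1)
p_b_given_a_psi2 = cond_prob(e_b, e_a, psi2)
p_a_given_b_psi1 = cond_prob(e_a, e_b, psi1)
overlap = abs(inner(e_b, e_a)) ** 2      # right-hand side of (30)
```

All four numbers coincide in this example: the conditional probability is the same for both initial states and is symmetric in a and b, as (30) states.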

8 Observables with Nondegenerate Spectra: Double-Stochasticity of the Matrix of Transition Probabilities

We remark that classical (Kolmogorov–Bayes) conditional probability is not symmetric, besides very special situations. Thus QM is described by a very specific probabilistic model.

Consider two nondegenerate observables. Set \(p_{\beta \vert \alpha} = \mathbf{P}(b=\beta \vert a=\alpha).\) The matrix of transition probabilities \({\mathbf{P}}^{b\vert a} = (p_{\beta \vert \alpha})\) is not only stochastic, i.e.,

$$\displaystyle \begin{aligned} \sum_{\beta} p_{\beta \vert \alpha}=1 \end{aligned}$$

but it is even doubly stochastic :

$$\displaystyle \begin{aligned} \sum_{\alpha} p_{\beta \vert \alpha}= \sum_{\alpha} \vert \langle e_\beta^b,e_\alpha^a \rangle\vert^2= \langle e_\beta^b,e_\beta^b\rangle=1. \end{aligned}$$

In Kolmogorov’s model, stochasticity is the general property of matrices of transition probabilities. However, in general classical matrices of transition probabilities are not doubly stochastic. Hence, double stochasticity is a very specific property of quantum probability.
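Double stochasticity can be checked directly for a concrete pair of bases. The sketch below uses two orthonormal bases of a two-dimensional real space related by a rotation; the angle `theta` is a hypothetical parameter chosen only for illustration.

```python
import math

def inner(u, v):
    """Scalar product <u, v>, antilinear in the first argument (assumption)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

theta = 0.3  # hypothetical rotation angle between the two eigenbases
basis_a = [[1, 0], [0, 1]]
basis_b = [[math.cos(theta), math.sin(theta)],
           [-math.sin(theta), math.cos(theta)]]

# p[beta][alpha] = P(b = beta | a = alpha) = |<e_beta^b, e_alpha^a>|^2
p = [[abs(inner(e_b, e_a)) ** 2 for e_a in basis_a] for e_b in basis_b]

# Stochasticity: summing over beta for fixed alpha.
sums_over_beta = [sum(p[beta][alpha] for beta in range(2)) for alpha in range(2)]
# Double stochasticity: summing over alpha for fixed beta.
sums_over_alpha = [sum(p[beta][alpha] for alpha in range(2)) for beta in range(2)]
```

Both lists of sums consist of ones, whatever value of `theta` is used, since each sum reduces to the squared norm of a basis vector.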

We remark that statistical data collected outside quantum physics, e.g., in decision making by humans and psychology, violates the quantum law of double stochasticity [38]. Such data cannot be mathematically represented with the aid of Hermitian operators with nondegenerate spectra. One has to consider either Hermitian operators with degenerate spectra or positive operator valued measures (POVMs).

9 Formula of Total Probability with the Interference Term

We shall show that the quantum probabilistic calculus violates the conventional FTP (10), one of the basic laws of classical probability theory. In this section, we proceed in the abstract setting by operating with two abstract incompatible observables. The concrete realization of this setting for the two-slit experiment demonstrating interference of probabilities in QM will be presented in Sect. 16 which is closely related to Feynman’s claim [22, 23] on the nonclassical probabilistic structure of this experiment.

Let \(H_2 = \mathbb{C} \times \mathbb{C}\) be the two dimensional complex Hilbert space and let \(\psi \in H_2\) be a quantum state. Let us consider two dichotomous observables \(b = \beta_1, \beta_2\) and \(a = \alpha_1, \alpha_2\) represented by Hermitian operators \(\widehat {b}\) and \(\widehat {a},\) respectively (one may consider simply Hermitian matrices). Let \(e^b = \{e_\beta ^b \}\) and \(e^a = \{e_\alpha ^a \}\) be two orthonormal bases consisting of eigenvectors of the operators. The state ψ can be represented in the two ways

$$\displaystyle \begin{aligned} \psi= c_1 e_1^a + c_2 e_2^a, \; c_\alpha= \langle \psi, e_\alpha^a \rangle ; \end{aligned} $$
(31)
$$\displaystyle \begin{aligned} \psi= d_1 e_1^b + d_2 e_2^b, \; d_\beta= \langle \psi, e_\beta^b \rangle . \end{aligned} $$
(32)

By Postulate 4 we have

$$\displaystyle \begin{aligned} \mathbf{P}( a= \alpha)\equiv {\mathbf{P}}_\psi (a=\alpha)= \vert c_\alpha \vert^2. \end{aligned} $$
(33)
$$\displaystyle \begin{aligned} \mathbf{P}( b= \beta) \equiv {\mathbf{P}}_\psi ( b= \beta) = \vert d_\beta \vert^2. \end{aligned} $$
(34)

The possibility to expand one basis with respect to another basis induces a connection between the probabilities \(\mathbf{P}(a = \alpha)\) and \(\mathbf{P}(b = \beta).\) Let us expand the vectors \(e_\alpha ^a\) with respect to the basis \(e^b\):

$$\displaystyle \begin{aligned} e_1^a = u_{11} e_1^b + u_{12} e_2^b \end{aligned} $$
(35)
$$\displaystyle \begin{aligned} e_2^a = u_{21} e_1^b + u_{22} e_2^b, \end{aligned} $$
(36)

where \(u_{\alpha \beta }= \langle e_\alpha ^a, e_\beta ^b \rangle .\) Thus \(d_1 = c_1 u_{11} + c_2 u_{21},\) \(d_2 = c_1 u_{12} + c_2 u_{22}.\) We obtain the quantum rule for transformation of probabilities:

$$\displaystyle \begin{aligned} \mathbf{P} (b= \beta) = \vert c_1 u_{1 \beta} + c_2 u_{2\beta}\vert^2 . \end{aligned} $$
(37)

On the other hand, by the definition of quantum conditional probability, see (28), we obtain

$$\displaystyle \begin{aligned} \mathbf{P} (b=\beta\vert a=\alpha)\equiv {\mathbf{P}}_\psi (b=\beta\vert a=\alpha) = \vert \langle e_\alpha^a, e_\beta^b \rangle \vert^2. \end{aligned} $$
(38)

By combining (33), (34) and (37), (38) we obtain the quantum formula of total probability, i.e., the formula of the interference of probabilities:

$$\displaystyle \begin{aligned} \mathbf{P} (b= \beta) &= \sum_{\alpha} \mathbf{P}(a=\alpha) \mathbf{P} (b=\beta\vert a=\alpha) \\ &\quad + 2 \cos \theta \sqrt{ \mathbf{P}(a=\alpha_1) \mathbf{P} (b=\beta\vert a=\alpha_1) \mathbf{P}(a=\alpha_2) \mathbf{P} (b=\beta\vert a=\alpha_2)}. \end{aligned} $$
(39)

In general \(\cos \theta \neq 0.\) Thus the quantum FTP does not coincide with the classical FTP (10) which is based on the Bayes’ formula.
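The step from (37) to (39) can be checked numerically. In the sketch below the coefficients `c` and the unitary matrix `u` are hypothetical values chosen for illustration, and the angle θ is taken, as an assumption about how it is defined, to be the relative phase of the two complex summands in (37).

```python
import cmath
import math

t, phase = 0.4, 1.1   # hypothetical parameters
# u[alpha][beta]: coordinates of e_alpha^a in the basis e^b (a real rotation here)
u = [[math.cos(t), math.sin(t)],
     [-math.sin(t), math.cos(t)]]
# c[alpha]: coordinates of psi in the basis e^a, with |c_1|^2 + |c_2|^2 = 1
c = [0.6, 0.8 * cmath.exp(1j * phase)]

def ftp_terms(beta):
    """Return (Born probability, classical FTP sum, interference term) for b = beta."""
    z1, z2 = c[0] * u[0][beta], c[1] * u[1][beta]
    born = abs(z1 + z2) ** 2                                   # formula (37)
    p_a = [abs(c[0]) ** 2, abs(c[1]) ** 2]                     # formula (33)
    p_b_given_a = [abs(u[0][beta]) ** 2, abs(u[1][beta]) ** 2] # formula (38)
    classical = p_a[0] * p_b_given_a[0] + p_a[1] * p_b_given_a[1]
    # cos(theta): cosine of the relative phase between the two summands
    cos_theta = (z1 * z2.conjugate()).real / (abs(z1) * abs(z2))
    interference = 2 * cos_theta * math.sqrt(
        p_a[0] * p_b_given_a[0] * p_a[1] * p_b_given_a[1])
    return born, classical, interference

results = [ftp_terms(beta) for beta in range(2)]
```

For each value of β the Born probability equals the classical sum plus the interference term, and the interference term is nonzero for these parameters, so the classical FTP alone would give a different number.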

We presented the derivation of the quantum FTP only for observables given by Hermitian operators acting in the two dimensional Hilbert space and for pure states. In Sect. 9.1, we give (without proof) the formula for spaces of an arbitrary dimension and states represented by density operators (see [42] for quantum FTP for observables represented by POVMs).

9.1 Växjö (Realist Ensemble Contextual) Interpretation of Quantum Mechanics

The Växjö interpretation [33] is the realist ensemble contextual interpretation of QM. Thus, in contrast to Copenhagenists or QBists, by the Växjö interpretation QM is not complete and it can be emergent from a subquantum model. This interpretation is an ensemble interpretation. It is also contextual, i.e., experimental contexts have to be taken into account really seriously.

By the Växjö interpretation the probabilistic part of QM is a special mathematical formalism to work with contextual probabilities for families of contexts, which are, in general, incompatible. Of course, the quantum probabilistic formalism is not the only possible formalism to operate with contextual probabilities.

The main distinguishing feature of the formalism of quantum probability is its complex Hilbert space representation and the Born’s rule. All quantum contexts can be unified with the aid of a quantum state ψ (wave function, complex probability amplitude). A state represents only a part of context, the second part is given by an observable a. Thus the quantum probability model is not just a collection of Kolmogorov probability spaces. These spaces are coupled by quantum states.

Each theory of probability has two main purposes: descriptive and predictive. In classical probability theory its predictive machinery is based on Bayesian inference and, in particular, FTP (Sect. 2, formula (10)).

Can the probabilistic formalism of QM be treated as a generalization of Bayesian inference?

My viewpoint is that the quantum FTP with the interference term (Sect. 9, formula (39)) is, in fact, a modified rule for the probability update. QM provides the following inference machinery. We are given a mixed state represented by density operator ρ and two quantum observables a and b represented mathematically by Hermitian operators \(\widehat {a}\) and \(\widehat {b}\) with purely discrete spectra. The first measurement of a can be treated as a collection of information about the state ρ. The result \(a = \alpha_i\) appears with the probability

$$\displaystyle \begin{aligned} p^a(\alpha_i)= \mathrm{Tr} P_i^a \rho. \end{aligned} $$
(40)

This is a generalization of Born’s rule to mixed states.

Postulate 6L (the projection postulate in Lüders’ form) can be extended to mixed states. The initial state ρ is transferred to the state

$$\displaystyle \begin{aligned} \rho_{a_i} = \frac{P_i^a \rho P_i^a } {\mathrm{Tr} P_i^a \rho P_i^a }. \end{aligned} $$
(41)

Then, for each state \( \rho _{a_i},\) we perform measurement of b and obtain probabilities

$$\displaystyle \begin{aligned} p(\beta_j\vert \alpha_i) = \mathrm{Tr} P_j^b \rho_{a_i} . \end{aligned} $$
(42)

These are quantum conditional (transition) probabilities for the initial state given by a density operator (generalization of the formalism of Sect. 7).

We now recall the general form of the quantum FTP [42]:

$$\displaystyle \begin{aligned} p (b=\beta_j) &= \sum_k p(b=\beta_j\vert a=\alpha_k) p(a=\alpha_k) \\ &\quad + 2 \sum _{k <m} \cos \phi_{j; k,m} \sqrt{p(b=\beta_j \vert a=\alpha_k) p(a=\alpha_k) p(b=\beta_j \vert a=\alpha_m) p(a=\alpha_m)}. \end{aligned} $$
(43)

Thus we can predict the probability of the result \(\beta_j\) for the b-observable on the basis of the probabilities for the results \(\alpha_i\) for the a-observable and conditional probabilities. Of course, the main nonclassical feature of this probability update rule is the presence of phase angles. In the case of dichotomous observables of the von Neumann–Lüders type the phase angles \(\phi_j\) can be expressed in terms of probabilities.
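Formulas (40), (41), and (42) can be traced on a small example. The density matrix `rho` and the two projector families below are hypothetical illustrative choices; the last lines compare the classical FTP sum with the probability obtained by measuring b directly on ρ, the difference being the interference contribution.

```python
import math

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def trace(x):
    return sum(x[i][i] for i in range(len(x)))

def projector(e):
    """Rank-one projector |e><e| for a normalized vector e."""
    return [[ei * complex(ej).conjugate() for ej in e] for ei in e]

def luders(P, rho):
    """Formula (41): P rho P / Tr(P rho P)."""
    prp = matmul(matmul(P, rho), P)
    t = trace(prp)
    return [[entry / t for entry in row] for row in prp]

s = 1 / math.sqrt(2)
rho = [[0.7, 0.2], [0.2, 0.3]]                    # illustrative density matrix
P_a = [projector([1, 0]), projector([0, 1])]      # projectors of a
P_b = [projector([s, s]), projector([s, -s])]     # projectors of b

p_a = [trace(matmul(P, rho)).real for P in P_a]                      # formula (40)
p_b_given_a = [[trace(matmul(Pb, luders(Pa, rho))).real for Pb in P_b]
               for Pa in P_a]                                        # formula (42)

# Classical FTP sum versus direct measurement of b on rho.
classical = [sum(p_a[i] * p_b_given_a[i][j] for i in range(2)) for j in range(2)]
direct = [trace(matmul(Pb, rho)).real for Pb in P_b]
interference = [direct[j] - classical[j] for j in range(2)]
```

Here the off-diagonal entries of `rho` are what make `direct` differ from `classical`; for a matrix that is diagonal in the a-basis the interference contribution vanishes.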

10 Quantum Logic

von Neumann and Birkhoff [11, 61] suggested representing events (propositions) by orthogonal projectors in complex Hilbert space H.

For an orthogonal projector P, we set \(H_P = P(H),\) its image, and vice versa, for a subspace L of H, the corresponding orthogonal projector is denoted by the symbol \(P_L.\)

The set of orthogonal projectors is a lattice with the order structure: \(P \leq Q\) iff \(H_P \subset H_Q\) or equivalently, for any \(\psi \in H,\) \(\langle \psi \vert P \psi \rangle \leq \langle \psi \vert Q \psi \rangle .\)

We recall that the lattice of projectors is endowed with operations “and” (∧) and “or” (∨). For two projectors \(P_1, P_2,\) the projector \(R = P_1 \wedge P_2\) is defined as the projector onto the subspace \(H_R= H_{P_1} \cap H_{P_2}\) and the projector \(S = P_1 \vee P_2\) is defined as the projector onto the subspace \(H_S\) defined as the minimal linear subspace containing the set-theoretic union \(H_{P_1} \cup H_{P_2}\) of subspaces \(H_{P_1}, H_{P_2}:\) this is the space of all linear combinations of vectors belonging to these subspaces. The operation of negation is defined as the orthogonal complement: \(P^{\perp} = \{y \in H : \langle y \vert x \rangle = 0 \ \text{for all}\ x \in H_P\}.\)

In the language of subspaces the operation “and” coincides with the usual set-theoretic intersection, but the operations “or” and “not” are nontrivial deformations of the corresponding set-theoretic operations. It is natural to expect that such deformations can induce deviations from classical Boolean logic.

Consider the following simple example. Let H be two dimensional Hilbert space with the orthonormal basis \((e_1, e_2)\) and let \(v=(e_1+e_2)/\sqrt {2}.\) Then \(P_v \wedge P_{e_1} =0\) and \( P_v \wedge P_{e_2}=0,\) but \( P_v \wedge (P_{e_1}\vee P_{e_2})=P_v.\) Hence, for quantum events, in general the distributivity law is violated:

$$\displaystyle \begin{aligned} P\wedge (P_{1}\vee P_{2}) \neq (P\wedge P_{1})\vee (P\wedge P_{2}). \end{aligned} $$
(44)
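The example can be unpacked step by step from the subspace definitions of the operations ∧ and ∨; the following worked computation only spells out those definitions for the vectors \(v, e_1, e_2\) and writes I for the identity operator:

$$\displaystyle \begin{aligned} H_{P_v}\cap H_{P_{e_1}} &= \{0\} = H_{P_v}\cap H_{P_{e_2}}, \qquad \mathrm{span}\,(H_{P_{e_1}}\cup H_{P_{e_2}}) = H, \\ P_v\wedge (P_{e_1}\vee P_{e_2}) &= P_v\wedge I = P_v, \\ (P_v\wedge P_{e_1})\vee (P_v\wedge P_{e_2}) &= 0\vee 0 = 0 \neq P_v. \end{aligned} $$

The first line holds because v is proportional neither to \(e_1\) nor to \(e_2,\) so the corresponding one dimensional subspaces meet only in the zero vector, while \(e_1\) and \(e_2\) together span H.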

The lattice of orthogonal projectors is called quantum logic. It is considered as a (very special) generalization of classical Boolean logic. Any sub-lattice consisting of commuting projectors can be treated as classical Boolean logic.

At first sight the representation of events by projectors/linear subspaces might look exotic. However, this is simply a prejudice which springs from too common usage of the set-theoretic representation of events (Boolean logic) in modern classical probability theory. The tradition to represent events by subsets was firmly established by A. N. Kolmogorov in 1933. We remark that before him the basic classical probabilistic models were not of a set-theoretic nature. For example, the main competitor of the Kolmogorov model, the von Mises frequency model, was based on the notion of a collective.

As we have seen, quantum logic relaxes some constraints posed on the operations of classical Boolean logic, in particular, the distributivity constraint. This provides novel possibilities for logically consistent reasoning.

Since human decision makers violate FTP [32, 38]—the basic law of classical probability, it seems that they process information by using nonclassical logic. Quantum logic is one of the possible candidates for logic of human reasoning. However, one has to remember that in principle other types of nonclassical logic may be useful for mathematical modeling of human decision making.

11 Space of Square Integrable Functions as a State Space

Although we generally proceed with finite-dimensional Hilbert spaces, it is useful to mention the most important example of infinite-dimensional Hilbert space used in QM. Consider the space of complex valued functions, \(\psi : \mathbb {R}^{m} \to \mathbb {C},\) which are square integrable with respect to the Lebesgue measure on \(\mathbb {R}^{m}:\)

$$\displaystyle \begin{aligned}\Vert \psi\Vert^2= \int_{\mathbb{R}^{m}} \vert \psi (x) \vert^2 d x < \infty. \end{aligned} $$
(45)

It is denoted by the symbol \(L^{2}(\mathbb {R}^{m}).\) Here the scalar product is given by

$$\displaystyle \begin{aligned} \langle \psi_1\vert \psi_2\rangle= \int_{\mathbb{R}^{m}} \bar{\psi}_1(x) \psi_2(x) d x. \end{aligned}$$

A delicate point is that, for some measurable functions, \(\psi : \mathbb {R}^{m} \to \mathbb {C},\) which are not identically zero, the integral

$$\displaystyle \begin{aligned} \int_{\mathbb{R}^{m}} \vert \psi (x) \vert^2 d x=0. \end{aligned} $$
(46)

We remark that the latter equality implies that ψ(x) = 0 a.e. (almost everywhere). Thus the quantity defined by (45) is, in fact, not a norm: ∥ψ∥ = 0 does not imply that ψ = 0. To define a proper Hilbert space, one has to consider as its elements not simply functions, but classes of equivalent functions, where the equivalence relation is defined as ψ ∼ ϕ if and only if ψ(x) = ϕ(x) a.e. In particular, all functions satisfying (46) are equivalent to the zero-function.

12 Operation of Tensor Product

Let both state spaces be \(L^2\)-spaces, the spaces of complex valued square integrable functions: \(H_{1}=L^{2}(\mathbb {R}^{k})\) and \(H_{2}=L^{2}(\mathbb {R}^{m}).\)

Take two functions: \(\psi \equiv \psi (x)\) belongs to \(H_1\) and \(\phi \equiv \phi (y)\) belongs to \(H_2.\) By multiplying these functions we obtain the function of two variables \(\Psi (x, y) = \psi (x) \times \phi (y),\) where × denotes the usual pointwise product. It is easy to check that this function belongs to the space \(H=L^{2}(\mathbb {R} ^{k+m}).\) Take now n functions, \(\psi_1(x), \ldots , \psi_n(x),\) from \(H_1\) and n functions, \(\phi_1(y), \ldots , \phi_n(y),\) from \(H_2\) and consider the sum of their pairwise products:

$$\displaystyle \begin{aligned} \Psi(x,y)=\sum_{i}\psi_{i}(x)\times\phi_{i}(y). {} \end{aligned} $$
(47)

This function also belongs to H.

It is possible to show that any function belonging to H can be represented as (47), where the sum is in general infinite. Multiplication of functions is the basic example of the operation of the tensor product. The latter is denoted by the symbol ⊗ . Thus in the example under consideration ψ ⊗ ϕ(x, y) = ψ(x) × ϕ(y). The tensor product structure on \(H=L^{2}(\mathbb {R} ^{k+m})\) is symbolically denoted as H = H 1 ⊗ H 2.

Consider now two arbitrary orthonormal bases in spaces \(H_{k},(e_{j}^{(k)}),k=1,2.\) Then functions \((e_{ij}=e_{i}^{(1)}\otimes e_{j}^{(2)})\) form an orthonormal basis in H :  any Ψ ∈ H can be represented as

$$\displaystyle \begin{aligned} \Psi=\sum c_{ij}e_{ij}\equiv\sum c_{ij}e_{i}^{(1)}\otimes e_{j}^{(2)}, {} \end{aligned} $$
(48)

where

$$\displaystyle \begin{aligned} \sum|c_{ij}|{}^{2}<\infty. {} \end{aligned} $$
(49)

Consider now two arbitrary finite-dimensional Hilbert spaces, \(H_1, H_2.\) For each pair of vectors \(\psi \in H_1,\) \(\phi \in H_2,\) we form a new formal entity denoted by ψ ⊗ ϕ. Then we consider the sums \(\Psi =\sum_i \psi_i \otimes \phi_i.\) On the set of such formal sums we can introduce the linear space structure. (To be mathematically rigorous, we have to constrain this set by some algebraic relations to make the operations of addition and multiplication by complex numbers well defined.) This construction gives us the tensor product \(H = H_1 \otimes H_2.\) In particular, if we take orthonormal bases in \(H_{k},(e_{j}^{(k)}),k=1,2,\) then \((e_{ij}=e_{i}^{(1)}\otimes e_{j}^{(2)})\) form an orthonormal basis in H, and any Ψ ∈ H can be represented as (48) with (49).

The latter representation gives the simplest possibility to define the tensor product of two arbitrary (i.e., possibly infinite-dimensional) Hilbert spaces as the space of formal series (48) satisfying the condition (49).

Besides the notion of the tensor product of states, we shall also use the notion of the tensor product of operators. Consider two linear operators \(\widehat {a}_{i}:H_{i} \rightarrow H_{i},i=1,2.\) Their tensor product \( \widehat {a} \equiv \widehat {a}_{1}\otimes \widehat {a}_{2}:H \rightarrow H\) is defined starting with the tensor products of two vectors:

$$\displaystyle \begin{aligned} \widehat{a} \;\psi\otimes\phi=(\widehat{a}_{1} \psi)\otimes(\widehat{a}_{2}\phi). \end{aligned}$$

Then it is extended by linearity. By using the coordinate representation (48) the tensor product of operators can be represented as

$$\displaystyle \begin{aligned} \widehat{a} \; \Psi=\sum c_{ij}\; \widehat{a} e_{ij}\equiv\sum c_{ij}\widehat{a}_{1}e_{i}^{(1)}\otimes \widehat{a}_{2} e_{j}^{(2)}, {} \end{aligned} $$
(50)

If operators \(\widehat {a}_{i},i=1,2,\) are represented by matrices \((a_{kl}^{(i)}),\) with respect to the fixed bases, then the matrix \((a_{kl,nm})\) of the operator \(\widehat {a}\) with respect to the tensor product of these bases can be easily calculated.
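In finite dimension this calculation amounts to the Kronecker product of the two matrices. The sketch below is one way to write it; the matrices `a1`, `a2` and the vectors `psi`, `phi` are arbitrary illustrative values, and the ordering of the product basis (first index varying slowest) is an assumed convention.

```python
def kron(a, b):
    """Kronecker product: the entry in row (k, n) and column (l, m) is a[k][l] * b[n][m]."""
    return [[a_kl * b_nm for a_kl in row_a for b_nm in row_b]
            for row_a in a for row_b in b]

def kron_vec(psi, phi):
    """Coordinates of psi (x) phi in the product basis e_ij = e_i (x) e_j."""
    return [x * y for x in psi for y in phi]

def apply(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in matrix]

a1 = [[0, 1], [1, 0]]     # illustrative operator on H_1
a2 = [[1, 0], [0, -1]]    # illustrative operator on H_2
psi, phi = [1, 2], [3, 4] # illustrative (unnormalized) vectors

# Defining property: (a1 (x) a2)(psi (x) phi) = (a1 psi) (x) (a2 phi)
lhs = apply(kron(a1, a2), kron_vec(psi, phi))
rhs = kron_vec(apply(a1, psi), apply(a2, phi))
```

The two sides agree, which is the coordinate form of the definition of the tensor product of operators on product vectors.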

In the same way one defines the tensor product of Hilbert spaces, \(H_1, \ldots , H_n,\) denoted by the symbol \(H = H_1 \otimes \cdots \otimes H_n.\) We start with forming the formal entities \(\psi_1 \otimes \cdots \otimes \psi_n,\) where \(\psi_j \in H_j,\) \(j = 1, \ldots , n.\) The tensor product space is defined as the set of all sums \(\sum_j \psi_{1j} \otimes \cdots \otimes \psi_{nj}\) (which has to be constrained by some algebraic relations, but we omit such details). Take orthonormal bases in \(H_{k},(e_{j}^{(k)}),k=1,\ldots ,n.\) Then any Ψ ∈ H can be represented as

$$\displaystyle \begin{aligned} \Psi=\sum_{\alpha}c_{\alpha}e_{\alpha}\equiv\sum_{\alpha=(j_{1}\ldots j_{n} )}c_{j_{1}\ldots j_{n}}e_{j_{1}}^{(1)}\otimes\cdots \otimes e_{j_{n}}^{(n)}, {} \end{aligned} $$
(51)

where \(\sum_{\alpha} \vert c_{\alpha}\vert^{2} < \infty .\)

13 Ket- and Bra-Vectors: Dirac’s Symbolism

Dirac’s notations [21] are widely used in quantum information theory. Vectors of H are called ket-vectors, they are denoted as |ψ〉. The elements of the dual space \(H^{*}\) of H, the space of linear continuous functionals on H, are called bra-vectors, they are denoted as 〈ψ|.

Originally the expression 〈ψ|ϕ〉 was used by Dirac for the duality form between \(H^{*}\) and H, i.e., 〈ψ|ϕ〉 is the result of application of the linear functional 〈ψ| to the vector |ϕ〉. In mathematical notation it can be written as follows. Denote the functional 〈ψ| by f and the vector |ϕ〉 by simply ϕ. Then 〈ψ|ϕ〉≡ f(ϕ). To simplify the model, later Dirac took the assumption that H is a Hilbert space, i.e., that \(H^{*}\) can be identified with H. We remark that this assumption is an axiom simplifying the mathematical model of QM. However, in principle Dirac’s formalism [21] is applicable for any topological linear space H and its dual space \(H^{*};\) so it is more general than von Neumann’s formalism [60] rigidly based on Hilbert space.

Consider an observable a given by the Hermitian operator \(\widehat {a}\) with nondegenerate spectrum and restrict our consideration to the case of finite dimensional H. Thus the normalized eigenvectors \(e_i\) of \(\widehat {a}\) form the orthonormal basis in H. Let \( \widehat {a} e_i= \alpha _i e_i.\) In Dirac’s notation \(e_i\) is written as \(\vert \alpha_i \rangle \) and, hence, any pure state can be written as

$$\displaystyle \begin{aligned} \vert \psi \rangle = \sum_i c_i\vert \alpha_i \rangle, \; \sum_i \vert c_i\vert^2=1. \end{aligned} $$
(52)

Since the projector onto \(\vert \alpha_i \rangle \) is denoted as \(P_{\alpha _i}= \vert \alpha _i \rangle \langle \alpha _i \vert ,\) the operator \(\widehat {a}\) can be written as

$$\displaystyle \begin{aligned} \widehat{a} = \sum_i \alpha_i \vert \alpha_i \rangle \langle \alpha_i \vert. \end{aligned} $$
(53)

Now consider two Hilbert spaces \(H_1\) and \(H_2\) and their tensor product \(H = H_1 \otimes H_2.\) Let \((\vert \alpha_i \rangle )\) and \((\vert \beta_j \rangle )\) be orthonormal bases in \(H_1\) and \(H_2\) corresponding to the eigenvalues of two observables a and b. Then vectors \(\vert \alpha_i \rangle \otimes \vert \beta_j \rangle \) form the orthonormal basis in H. Typically in physics the sign of the tensor product is omitted and these vectors are written as \(\vert \alpha_i \rangle \vert \beta_j \rangle \) or even as \(\vert \alpha_i \beta_j \rangle .\) Thus any vector \(\psi \in H = H_1 \otimes H_2\) can be represented as

$$\displaystyle \begin{aligned} \psi = \sum_{ij} c_{ij} \vert \alpha_i \beta_j \rangle, \end{aligned} $$
(54)

where \(c_{ij} \in \mathbb{C}\) (in the infinite-dimensional case these coefficients are constrained by the condition \(\sum_{ij} \vert c_{ij}\vert^{2} < \infty \)).

14 Qubit

In particular, in quantum information theory typically qubit states are represented with the aid of observables having the eigenvalues 0, 1. Each qubit space is two dimensional:

$$\displaystyle \begin{aligned} \vert \psi\rangle= c_0 \vert 0\rangle + c_1 \vert 1\rangle, \; \vert c_0\vert^2 + \vert c_1\vert^2 =1. \end{aligned} $$
(55)

A pair of qubits is represented in the tensor product of single qubit spaces, here pure states can be represented as superpositions:

$$\displaystyle \begin{aligned} \vert \psi\rangle = c_{00} \vert 00 \rangle + c_{01} \vert 01 \rangle + c_{10} \vert 10 \rangle + c_{11} \vert 11 \rangle, \end{aligned} $$
(56)

where \(\sum_{ij} \vert c_{ij}\vert^{2} = 1.\) In the same way the n-qubit state is represented in the tensor product of n one-qubit state spaces (it has the dimension \(2^n\)):

$$\displaystyle \begin{aligned} \vert \psi\rangle = \sum_{x_j=0,1} c_{x_1 \ldots x_n} \vert x_1 \ldots x_n \rangle, \end{aligned} $$
(57)

where \( \sum _{x_j=0,1} \vert c_{x_1 \ldots x_n}\vert ^2=1.\) We remark that the dimension of the n qubit state space grows exponentially with the growth of n. The natural question about possible physical realizations of such multi-dimensional state spaces arises. The answer to it is not completely clear; it depends very much on the used interpretation of the wave function.
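The exponential growth of the state space can be made concrete by enumerating the basis labels \(x_1 \ldots x_n\) in (57). The sketch below is only an illustration; the function names and the equal-weight state are hypothetical choices.

```python
from itertools import product
import math

def basis_labels(n):
    """Labels x_1...x_n of the basis vectors |x_1 ... x_n> of the n-qubit space."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def uniform_state(n):
    """Coefficients c_{x_1...x_n} of an equal-weight superposition (one arbitrary example)."""
    dim = 2 ** n
    return {label: 1 / math.sqrt(dim) for label in basis_labels(n)}

labels_3 = basis_labels(3)          # '000', '001', ..., '111'
state_3 = uniform_state(3)
norm_3 = sum(abs(c) ** 2 for c in state_3.values())   # normalization in (57)
dims = [len(basis_labels(n)) for n in range(1, 11)]   # 2, 4, 8, ...
```

Already for ten qubits a general pure state requires 1024 complex coefficients, which is the growth referred to in the text.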

15 Entanglement

Consider the tensor product \(H = H_1 \otimes H_2 \otimes \cdots \otimes H_n\) of Hilbert spaces \(H_k,\) \(k = 1, 2, \ldots , n.\) The states of the space H can be separable and non-separable (entangled). We start by considering pure states. The states from the first class, separable pure states, can be represented in the form:

$$\displaystyle \begin{aligned} \vert \psi \rangle=\otimes_{k=1}^n \vert \psi_k \rangle=\vert \psi_1\ldots \psi_{n}\rangle , \end{aligned} $$
(58)

where \(\vert \psi_k \rangle \in H_k.\) The states which cannot be represented in this way are called non-separable or entangled. Thus mathematically the notion of entanglement is very simple: it means impossibility of tensor factorization.

For example, let us consider the tensor product of two one-qubit spaces. Select in each of them an orthonormal basis denoted as |0〉, |1〉. The corresponding orthonormal basis in the tensor product has the form |00〉, |01〉, |10〉, |11〉. Here we used Dirac’s notations, see Sect. 13, near the end. Then the so-called Bell’s states

$$\displaystyle \begin{aligned} \vert \Phi^+ \rangle= (\vert 00 \rangle + \vert 11 \rangle)/\sqrt{2},\;\; \vert \Phi^- \rangle= (\vert 00 \rangle - \vert 11 \rangle)/\sqrt{2}; \end{aligned} $$
(59)
$$\displaystyle \begin{aligned} \vert \Psi^+ \rangle= (\vert 01 \rangle + \vert 10 \rangle)/\sqrt{2}, \;\; \vert \Psi^- \rangle= (\vert 01 \rangle - \vert 10 \rangle)/\sqrt{2} \end{aligned} $$
(60)

are entangled.
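For two qubits, impossibility of tensor factorization has a simple test: a pure state with coefficients \(c_{ij}\) as in (56) factorizes exactly when \(c_{00}c_{11} - c_{01}c_{10} = 0,\) because a product state has \(c_{ij} = a_i b_j.\) The sketch below applies this criterion to the Bell states; the function name and the tolerance are illustrative assumptions.

```python
import math

def is_product_state(c, tol=1e-12):
    """Two-qubit pure state c = (c00, c01, c10, c11) factorizes iff c00*c11 - c01*c10 = 0."""
    c00, c01, c10, c11 = c
    return abs(c00 * c11 - c01 * c10) < tol

s = 1 / math.sqrt(2)
bell_states = {
    "Phi+": (s, 0, 0, s),
    "Phi-": (s, 0, 0, -s),
    "Psi+": (0, s, s, 0),
    "Psi-": (0, s, -s, 0),
}
entangled = {name: not is_product_state(c) for name, c in bell_states.items()}

# For comparison: |0> (x) (|0> + |1>)/sqrt(2) is a product state.
product_example = is_product_state((s, s, 0, 0))
```

All four Bell states fail the factorization test, while the comparison state passes it. This determinant criterion is specific to two qubits and pure states.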

Although the notion of entanglement is mathematically simple, its physical interpretation is one of the main problems of modern quantum foundations. The common interpretation is that entanglement encodes quantum nonlocality, the possibility of action at a distance (between parts of a system in an entangled state). Such an interpretation implies a drastic change of all classical physical representations of nature, at least of the microworld. In probabilistic terms entanglement induces correlations which are too strong to be described by classical probability theory. (At least this is the common opinion of experts in quantum information theory and quantum foundations.) Such correlations violate the famous Bell inequality, which can be derived only in the classical probability framework. The latter is based on the use of a single probability space covering probabilistic data collected in a few incompatible measurement contexts.

Now consider a quantum state given by density operator ρ in H. This state is called separable if it can be factorized in the product of density operators in spaces \(H_k\):

$$\displaystyle \begin{aligned} \rho =\otimes_{k=1}^n \rho_k, \end{aligned} $$
(61)

otherwise the state ρ is called entangled. We remark that an interpretation of entanglement for mixed states is even more complicated than for pure states.

16 Violation of Formula of Total Probability in Two-Slit Experiment

Consider the famous two-slit experiment with the symmetric setting: the source of photons is located symmetrically with respect to two slits, Fig. 1.

Fig. 1 Experimental setup

Consider the following pair of observables a and b. We select a as the “slit passing observable,” i.e., a = 0, 1, see Fig. 1 (we use indexes 0, 1 to be close to qubit notation) and observable b as the position on the photo-sensitive plate, see Fig. 2. We remark that the b-observable has the continuous range of values, the position x on the photo-sensitive plate. We denote P(a = i) by P(i) (i = 0, 1), and P(b = x) by P(x). Physically the a-observable corresponds to measurement of position (coarse grained to “which slit?”) and the b-observable represents measurement of momentum.

Fig. 2 Context with both slits open

In quantum foundational studies, various versions of the two-slit experiment have been successfully performed, not only with photons, but also with electrons and even with macroscopic molecules (by Zeilinger’s group). All those experiments matched the predictions of QM. Experimenters reproduce the interference patterns predicted by QM and calculated by using the wave functions.

The probability that a photon is detected at position x on the photo-sensitive plate is represented as

$$\displaystyle \begin{aligned} \mathbf{P}(x) &=\left\vert \frac{1}{\sqrt{2}}\psi_{0}(x)+\frac{1}{\sqrt{2}}\psi _{1}(x)\right\vert ^{2}\\ &=\frac{1}{2}\left\vert \psi_{0}(x)\right\vert ^{2}+\frac{1}{2}\left\vert \psi_{1}(x)\right\vert ^{2}+\left\vert \psi_{0}(x)\right\vert \left\vert \psi_{1}(x)\right\vert \cos\theta,{} \end{aligned} $$
(62)

where \(\psi_0\) and \(\psi_1\) are two wave functions, whose squared absolute values \(\left \vert \psi _{i}(x)\right \vert ^{2}\) give the distributions of photons passing through the slit i = 0, 1, see Figs. 3 and 4, and θ = θ(x) is the relative phase between \(\psi_0(x)\) and \(\psi_1(x)\). Here we used the rule of addition of complex probability amplitudes, a quantum analog of the rule of addition of probabilities. This rule is a direct consequence of the linear space structure of quantum state spaces.
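The second equality in (62) can be checked in one short step. Writing each amplitude in polar form, \(\psi_i(x)=\left\vert \psi_i(x)\right\vert e^{i\theta_i(x)}\), and setting \(\theta=\theta_1(x)-\theta_0(x)\) (the symbols \(\theta_0,\theta_1\) for the individual phases are our notation, introduced only for this step), we obtain

$$\displaystyle \begin{aligned} \left\vert \frac{1}{\sqrt{2}}\psi_{0}+\frac{1}{\sqrt{2}}\psi_{1}\right\vert ^{2} &=\frac{1}{2}\left(\overline{\psi_{0}}+\overline{\psi_{1}}\right)\left(\psi_{0}+\psi_{1}\right)\\ &=\frac{1}{2}\left\vert \psi_{0}\right\vert ^{2}+\frac{1}{2}\left\vert \psi_{1}\right\vert ^{2}+\mathrm{Re}\left(\overline{\psi_{0}}\,\psi_{1}\right)\\ &=\frac{1}{2}\left\vert \psi_{0}\right\vert ^{2}+\frac{1}{2}\left\vert \psi_{1}\right\vert ^{2}+\left\vert \psi_{0}\right\vert \left\vert \psi_{1}\right\vert \cos\theta. \end{aligned} $$

The cross term arises only because amplitudes, and not probabilities, are added.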

Fig. 3. Context with one slit closed (I)

The term

$$\displaystyle \begin{aligned} \left\vert \psi_{0}(x)\right\vert \left\vert \psi_{1}(x)\right\vert \cos\theta \end{aligned}$$

describes the interference of the two wave functions. Let us denote \(\left \vert \psi _{i}(x)\right \vert ^{2}\) by P(x|i); then Eq. (62) can be written as

$$\displaystyle \begin{aligned} \mathbf{P}(x)=\mathbf{P}(0)\mathbf{P}(x|0)+\mathbf{P}(1)\mathbf{P}(x|1)+2\sqrt{\mathbf{P}(0)\mathbf{P}(x|0)\mathbf{P}(1)\mathbf{P}(x|1)}\cos\theta. {} \end{aligned} $$
(63)

Here the probabilities P(0) and P(1) are equal to 1∕2, since we consider the symmetric setting. For general experimental settings, P(0) and P(1) can be arbitrary nonnegative values satisfying P(0) + P(1) = 1. In the above form, the classical probability law, the formula of total probability (FTP),

$$\displaystyle \begin{aligned} \mathbf{P}(x)=\mathbf{P}(0)\mathbf{P}(x|0)+\mathbf{P}(1)\mathbf{P}(x|1) {} \end{aligned} $$
(64)

is violated, and the interference term \(2\sqrt {\mathbf {P}(0)\mathbf {P}(x|0)\mathbf {P}(1)\mathbf {P}(x|1)} \cos \theta \) quantifies the violation.
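The violation of (64) can also be checked numerically. The sketch below is a minimal illustration, assuming NumPy and two hypothetical Gaussian wave packets with a linear relative phase; the parameters `d`, `sigma`, `k` and the helper `packet` are illustrative choices and are not taken from the text. It computes P(x) from the sum of amplitudes as in (62), compares it with the right-hand side of (64), and checks that the difference coincides with the interference term in (63).

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
d, sigma, k = 1.0, 2.0, 3.0  # hypothetical slit offset, packet width, phase rate

def packet(center, phase_rate):
    """Gaussian amplitude with a linear phase, normalized so that sum |psi|^2 dx = 1."""
    amp = np.exp(-(x - center) ** 2 / (4 * sigma ** 2)) * np.exp(1j * phase_rate * x)
    return amp / np.sqrt(np.sum(np.abs(amp) ** 2) * dx)

psi0 = packet(-d, -k / 2)   # amplitude in context C_0 (only slit 0 open)
psi1 = packet(+d, +k / 2)   # amplitude in context C_1 (only slit 1 open)

P0 = P1 = 0.5                       # symmetric setting
P_x_given_0 = np.abs(psi0) ** 2     # P(x|0)
P_x_given_1 = np.abs(psi1) ** 2     # P(x|1)

# Context C_01 (both slits open): add amplitudes, Eq. (62).
P_x = np.abs(np.sqrt(P0) * psi0 + np.sqrt(P1) * psi1) ** 2

# Classical formula of total probability, Eq. (64).
ftp = P0 * P_x_given_0 + P1 * P_x_given_1

# Deviation from FTP versus the interference term of Eq. (63).
deviation = P_x - ftp
theta = np.angle(np.conj(psi0) * psi1)  # relative phase theta(x)
interference = 2 * np.sqrt(P0 * P_x_given_0 * P1 * P_x_given_1) * np.cos(theta)

print(np.max(np.abs(deviation)))             # nonzero: FTP is violated
print(np.allclose(deviation, interference))  # the deviation is the interference term
```

Note that the three distributions P(x|0), P(x|1), and P(x) entering the comparison are each obtained in a different experimental arrangement.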

The crucial point is that the two-slit experiment has a multi-contextual structure: \(C_i\), i = 0, 1, in which only the ith slit is open, and \(C_{01}\), in which both slits are open, see Figs. 3, 4, and 2. Comparison of these possibilities is represented as a comparison of the corresponding probability distributions P(x|i) and P(x). In contextual notation they can be written as

$$\displaystyle \begin{aligned} p_{C_i}^b(x)\equiv \mathbf{P}(b=x\vert C_i), \quad p_{C_{01}}^b(x)\equiv \mathbf{P}(b=x\vert C_{01}). \end{aligned}$$

Here conditioning is not classical probabilistic event conditioning, but context conditioning: different contexts are mathematically represented by different Kolmogorov probability spaces. The general contextual probability theory, including its representation in complex Hilbert space, is presented in great detail in my monograph [37].

Fig. 4. Context with one slit closed (II)