1 Introduction

The theory of decoherence is arguably one of the greatest advances in fundamental physics of the past forty years. Without adding anything new to the quantum mechanical framework, and assuming that the Schrödinger equation is universally valid, it explains why quantum interferences virtually disappear at macroscopic scales. Since the pioneering papers [1, 2], a wide variety of models have been designed to understand decoherence in different specific contexts (see the reviews [3, 4] and the numerous references therein). In this paper, we would like to adopt a more general point of view and understand the mathematical reason why decoherence is so ubiquitous among quantum mechanical systems.

We start by introducing general quantities and notations to present as concisely as possible the idea underlying the theory of decoherence (§2). We then build two simple but very general models to reveal the mathematical mechanisms that make decoherence so universal, thereby justifying why quantum interferences disappear under the Schrödinger dynamics alone (§3 and §4). We recover in §3.2 and §4.1 the well-known typical decay of the non-diagonal terms of the density matrix as \(n^{-\frac{1}{2}}\), with n the dimension of the Hilbert space describing the environment. The most important result is Theorem 3.3, proved in §3.3, which gives estimates for the level of decoherence induced by a random environment on a system of given size. We conclude in §3.4 that even very small environments (of typical size at least \(N_{\mathcal {E}} = \ln (N_{\mathcal {S}})\), with \(N_{\mathcal {S}}\) the size of the system) suffice, under assumptions discussed in §3.5. We also give a general formula estimating the level of classicality of a quantum system in terms of its entropy of entanglement with the environment (§4.2, proved in Annex A), and propose alternative ways of quantifying decoherence in §5.

2 The Basics of Decoherence

The theory of decoherence sheds light on the reason why quantum interferences disappear when a system gets entangled with a macroscopic one, for example an electron in a double-slit experiment, which no longer interferes once entangled with a detector. According to Di Biagio and Rovelli [5], the deep difference between the classical and the quantum is the way probabilities behave: all classical phenomena satisfy the total probability formula

$$\begin{aligned} {\mathbb {P}}(B=y) = \sum _{x \in \textrm{Im}(A)} {\mathbb {P}}(A=x) {\mathbb {P}}(B=y \mid A=x), \end{aligned}$$

relying on the fact that, even though the actual value of the variable A is not known, one can still assume that it has a definite value among the possible ones. This, however, is not correct for quantum systems. It is well-known that the diagonal elements of the density matrix account for the classical behavior of a system (they correspond to the terms of the total probability formula) while the non-diagonal terms are the additional interference terms. As a reminder, this is because the probability to obtain an outcome x is:

$$\begin{aligned} {\text {tr}}(\rho {|{x}\rangle }{\langle {x}|}) = \sum _{i,j=1}^n \rho _{ij} {\langle {j \vert x}\rangle } {\langle {x \vert i}\rangle } = \sum _{i=1}^n \underbrace{\rho _{ii} |{\langle {x \vert i}\rangle } |^2}_{{\mathbb {P}}(i) {\mathbb {P}}(x \mid i)} + \sum _{1\leqslant i < j \leqslant n} \underbrace{2 \textrm{Re}( \rho _{ij} {\langle {j \vert x}\rangle } {\langle {x \vert i }\rangle })}_{\text {interferences}}. \end{aligned}$$
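As a purely illustrative numerical check (not part of the original argument), the following Python sketch verifies this decomposition of the Born probability on a randomly generated density matrix; all names and parameter values are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random density matrix rho: a mixture of three random pure states
kets = rng.normal(size=(3, n)) + 1j * rng.normal(size=(3, n))
kets /= np.linalg.norm(kets, axis=1, keepdims=True)
weights = np.array([0.5, 0.3, 0.2])
rho = sum(w * np.outer(k, k.conj()) for w, k in zip(weights, kets))

# A random outcome vector |x>
x = rng.normal(size=n) + 1j * rng.normal(size=n)
x /= np.linalg.norm(x)

# Born probability tr(rho |x><x|)
born = np.real(np.trace(rho @ np.outer(x, x.conj())))

# Split into diagonal (classical) and off-diagonal (interference) contributions,
# using <x|i> = conj(x_i) in the canonical basis
amp = x.conj()
classical = sum(np.real(rho[i, i]) * abs(amp[i]) ** 2 for i in range(n))
interference = sum(2 * np.real(rho[i, j] * amp[j].conj() * amp[i])
                   for i in range(n) for j in range(i + 1, n))

assert np.isclose(born, classical + interference)
assert 0 <= born <= 1
```

The interference sum is exactly the deviation of the quantum probability from the total probability formula.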

Here is the typical situation encountered in decoherence studies. Consider a system \(\mathcal {S}\), described by a Hilbert space \(\mathcal {H}_{\mathcal {S}}\) of dimension d, that interacts with an environment \(\mathcal {E}\) described by a space \(\mathcal {H}_{\mathcal {E}}\) of dimension n, and let \(\mathcal {B} = ({|{i}\rangle })_{1\leqslant i \leqslant d}\) be an orthonormal basis of \(\mathcal {H}_{\mathcal {S}}\). In the sequel, we will say that each \({|{i}\rangle }\) corresponds to a possible history of the system in this basis (this expression will be given its full meaning in a future article dedicated to the measurement problem). Let’s also assume that \(\mathcal {B}\) is a conserved basis during the interaction with \(\mathcal {E}\). When \(\mathcal {E}\) is a measurement apparatus for the observable A, the eigenbasis of \(\hat{A}\) is clearly a conserved basis; in general, the eigenbasis of any observable such that \(\hat{A} \otimes \mathbbm {1}\) commutes with the interaction Hamiltonian is suitable (but the existence of such an observable is not guaranteed, unless \(\hat{H}_{int}\) takes the form \(\sum _i {\hat{\Pi }}^{\mathcal {S}}_i \otimes \hat{H}^{\mathcal {E}}_{i}\), where \(({\hat{\Pi }}^{\mathcal {S}}_i)_{1 \leqslant i \leqslant d}\) is a family of commuting orthogonal projectors).

We further suppose that \(\mathcal {S}\) and \(\mathcal {E}\) are initially unentangled, allowing us to write \({|{\Psi }\rangle } = \left( \sum _{i=1}^d c_i {|{i}\rangle } \right) \otimes {|{\mathcal {E}_0}\rangle }\) as the initial state before the interaction. After a time t, due to its Schrödinger evolution in the conserved basis, the total state becomes \({|{\Psi (t)}\rangle } = \sum _{i=1}^d c_i {|{i}\rangle } \otimes {|{\mathcal {E}_i(t)}\rangle }\) for some unit vectors \(({|{\mathcal {E}_i(t)}\rangle })_{1\leqslant i \leqslant d}\). Define \(\eta (t) = \displaystyle \max _{i \ne j} \; |{\langle {\mathcal {E}_i(t) \vert \mathcal {E}_j(t)}\rangle } |\). If \(({|{e_k}\rangle })_{1\leqslant k \leqslant n}\) denotes an orthonormal basis of \(\mathcal {H}_{\mathcal {E}}\), the state of \(\mathcal {S}\), obtained by tracing out the environment, is:

$$\begin{aligned} \rho _{\mathcal {S}}(t)&= {\text {tr}}_{\mathcal {E}} {|{\Psi (t)}\rangle } {\langle {\Psi (t)}|} \\&= \sum _{k=1}^n \left( \sum _{i=1}^d |c_i |^2 |{\langle {e_k \vert \mathcal {E}_i (t)}\rangle } |^2 {|{i}\rangle }{\langle {i}|} + \sum _{1 \leqslant i \ne j \leqslant d} c_i \overline{c_j} {\langle {e_k \vert \mathcal {E}_i (t)}\rangle } {\langle {\mathcal {E}_j (t) \vert e_k}\rangle } {|{i}\rangle }{\langle {j}|} \right) \\&= \sum _{i=1}^d |c_i |^2 \underbrace{ \sum _{k=1}^n |{\langle {e_k \vert \mathcal {E}_i (t)}\rangle } |^2 }_{= 1} \; {|{i}\rangle }{\langle {i}|}\\&\quad + \sum _{1 \leqslant i \ne j \leqslant d} c_i \overline{c_j} {\langle {\mathcal {E}_j (t)}|} \Big ( \underbrace{ \sum _{k=1}^n {|{e_k}\rangle } {\langle {e_k}|} }_{= \mathbbm {1}} \Big ) {|{\mathcal {E}_i (t)}\rangle } \; {|{i}\rangle }{\langle {j}|} \\&= \sum _{i=1}^d |c_i |^2 {|{i}\rangle }{\langle {i}|} + \sum _{1 \leqslant i \ne j \leqslant d} c_i \overline{c_j} {\langle {\mathcal {E}_j(t) \vert \mathcal {E}_i(t)}\rangle } {|{i}\rangle }{\langle {j}|} \\&\equiv \rho _{\mathcal {S}}^{(d)} + \rho _{\mathcal {S}}^{(q)}(t), \end{aligned}$$

where \(\rho _{\mathcal {S}}^{(d)}\) stands for the (time-independent) diagonal part of \(\rho _{\mathcal {S}}(t)\) (which corresponds to the total probability formula), and \(\rho _{\mathcal {S}}^{(q)}(t)\) for the remaining non-diagonal terms responsible for the interferences between the possible histories. It is not difficult to show (see Annex A) that \(\Vert \rho _{\mathcal {S}}^{(q)}(t)\Vert \leqslant \eta (t)\), where \(\Vert M \Vert\) stands for the usual operator norm on matrices, i.e. \(\Vert M \Vert = \sup _{\Vert {|{\Psi }\rangle } \Vert =1} \; \Vert M {|{\Psi }\rangle } \Vert\). Therefore \(\eta\) measures how close the system is to being classical because, as shown in Annex A, we have for all subspaces \(F \subset \mathcal {H}_{\mathcal {S}}\) (recall that, in the quantum formalism, probabilistic events correspond to subspaces):

$$\begin{aligned} |\underbrace{{\text {tr}}(\rho _{\mathcal {S}}(t) \Pi _F)}_{\text {quantum probability}} - \underbrace{{\text {tr}}(\rho _{\mathcal {S}}^{(d)} \Pi _F)}_{\text {classical probability}} |\leqslant \dim (F) \; \eta (t). \end{aligned}$$
(1)

In other words, \(\eta (t)\) estimates how decohered the system is. Note that it is only during an interaction between \(\mathcal {S}\) and \(\mathcal {E}\) that decoherence can occur; any subsequent internal evolution U of \(\mathcal {E}\) leaves \(\eta\) unchanged since \({\langle {U \mathcal {E}_j \vert U \mathcal {E}_i}\rangle } = {\langle {\mathcal {E}_j \vert \mathcal {E}_i}\rangle }\). Also, a more precise definition for \(\eta\) could be \(\displaystyle \max _{\begin{array}{c} i \ne j \\ c_i, c_j \ne 0 \end{array}} \; |{\langle {\mathcal {E}_i(t) \vert \mathcal {E}_j(t)}\rangle } |\) with, by convention, \(\eta = 0\) when only one \(c_i\) is non-zero, so that \(\rho _{\mathcal {S}}\) being diagonal in a basis becomes equivalent to \(\eta =0\) in this basis. This way, \(\eta\) really quantifies the interferences between possible histories (of non-zero probability). This is not true with the simpler definition above, as is clear for example for the trivial interaction \({|{\Psi (t)}\rangle } = \sum _{i=1}^d c_i {|{i}\rangle } \otimes {|{\mathcal {E}_0}\rangle }\): here \(\rho _{\mathcal {S}}\) is diagonal (i.e. no interferences) in any orthonormal basis containing the vector \(\sum _{i=1}^d c_i {|{i}\rangle }\), but that definition yields \(\eta = 1\) in any basis.
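The partial-trace computation above, together with the bound \(\Vert \rho _{\mathcal {S}}^{(q)}\Vert \leqslant \eta\) and inequality (1), can be checked numerically. The following Python sketch (dimensions and seed are arbitrary choices of ours) builds a random entangled state, traces out the environment, and verifies both bounds:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 8   # system and environment dimensions (our choices)

c = rng.normal(size=d) + 1j * rng.normal(size=d)
c /= np.linalg.norm(c)
E = rng.normal(size=(d, n)) + 1j * rng.normal(size=(d, n))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # environment states |E_i(t)>

# Global state Psi_{ik} = c_i <e_k|E_i>; tracing out E gives rho_S = Psi Psi^dagger
Psi = c[:, None] * E
rho_S = Psi @ Psi.conj().T

rho_diag = np.diag(np.diag(rho_S))   # classical part rho^(d)
rho_q = rho_S - rho_diag             # interference part rho^(q)

G = E @ E.conj().T                   # Gram matrix of the environment states
eta = max(abs(G[i, j]) for i in range(d) for j in range(d) if i != j)

# ||rho^(q)|| <= eta (operator norm)
assert np.linalg.norm(rho_q, 2) <= eta + 1e-12

# Inequality (1) for a random 2-dimensional subspace F
F = np.linalg.qr(rng.normal(size=(d, 2)) + 1j * rng.normal(size=(d, 2)))[0]
Pi_F = F @ F.conj().T
gap = abs(np.trace(rho_S @ Pi_F) - np.trace(rho_diag @ Pi_F))
assert gap <= 2 * eta + 1e-12        # dim(F) = 2 here
```

The matrix identity \(\rho_{\mathcal{S}} = \Psi \Psi^\dagger\) reproduces exactly the trace computation displayed above.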

The aim of the theory of decoherence is to explain why \(\eta (t)\) rapidly goes to zero when n is large, so that the state of the system almost immediately evolves from \(\rho _{\mathcal {S}}\) to \(\rho _{\mathcal {S}}^{(d)}\) in the conserved basis. As recalled in the introduction, many different models already explain this phenomenon in specific contexts. In this paper, we shall build two admittedly very simple but quite universal models that highlight the fundamental reason why \(\eta (t) \rightarrow 0\) so quickly, and that will allow us to determine the typical size of an environment needed to entail proper decoherence on a system.

3 First Model: Purely Random Environment

When no particular assumption is made to specify the type of environment under study, the only reasonable behaviour to assume for \({|{\mathcal {E}_i(t)}\rangle }\) is that of a Brownian motion on the sphere \({\mathbb {S}}^{n} = \{ {|{\Psi }\rangle } \in \mathcal {H}_{\mathcal {E}} \mid \Vert {|{\Psi }\rangle } \Vert = 1 \} \subset \mathcal {H}_{\mathcal {E}} \simeq {\mathbb {C}}^n \simeq {\mathbb {R}}^{2n}\). This amounts to representing the environment as a purely random system with no preferred direction of evolution; this choice will be discussed in §3.5. Another bold assumption is the independence of the \(({|{\mathcal {E}_i(t)}\rangle })_{1\leqslant i \leqslant d}\); we will dare to make it anyway.

3.1 Convergence to the Uniform Measure

We will first show that the probabilistic law of each \({|{\mathcal {E}_i(t)}\rangle }\) converges exponentially fast to the uniform probability measure on \({\mathbb {S}}^{n}\). To make things precise, endow \({\mathbb {S}}^{n}\) with its Borel \(\sigma\)-algebra \(\mathcal {B}\) and with the canonical Riemannian metric g, which induces the uniform measure \(\mu\), assumed normalized to a probability measure. Let \(\nu _t\) be the law of the random variable \({|{\mathcal {E}_i(t)}\rangle }\), that is \(\nu _t(B) = {\mathbb {P}}\big ({|{\mathcal {E}_i(t)}\rangle } \in B \big )\) for all \(B \in \mathcal {B}\). Denote by \(\Delta f = \frac{1}{\sqrt{g}} \partial _i (\sqrt{g} g^{ij} \partial _j f)\) the Laplacian on \(\mathcal {C}^\infty ({\mathbb {S}}^{n})\), which can be extended to \(L^2({\mathbb {S}}^{n})\), the completion of \(\mathcal {C}^\infty ({\mathbb {S}}^{n})\) for the scalar product \((f,h) = \int _{{\mathbb {S}}^{n}} f(x) h(x) \textrm{d}\mu\). The Hille-Yosida theory allows one to define the Brownian motion on the sphere as the Markov semigroup of stochastic kernels generated by \(\Delta\). In particular, this implies that if \(p_t\) is the density after a time t, i.e. \(\nu _t(\textrm{d}x) = p_t(x) \mu (\textrm{d}x)\), then \(p_t = e^{t \Delta }p_0\). Of course, the law \(\nu _0\) of the deterministic variable \({|{\mathcal {E}_i(0)}\rangle } = {|{\mathcal {E}_0}\rangle }\) is a Dirac distribution, which is not, strictly speaking, in \(L^2({\mathbb {S}}^{n})\), but we may instead take it to be given by a sharply peaked density (with respect to \(\mu\)) \(p_0 \in L^2({\mathbb {S}}^{n})\). Finally, recall that the total variation norm of a measure defined on \(\mathcal {B}\) is given by \(\Vert \sigma \Vert _{TV} = \underset{B \in \mathcal {B} }{\sup }|\sigma (B) |\).

Proposition 3.1

We have \(\Vert \nu _t - \mu \Vert _{TV} \underset{t \rightarrow +\infty }{\longrightarrow }0\) exponentially fast. Moreover, if \(T({\mathbb {S}}^{n}) = \inf \{ t>0 \mid \Vert \nu _t - \mu \Vert _{TV} \leqslant \frac{1}{e} \}\) denotes the characteristic time to equilibrium for the Brownian diffusion on \({\mathbb {S}}^{n}\), then \(T({\mathbb {S}}^{n}) \underset{n \rightarrow +\infty }{\sim }\frac{\ln (2n)}{4n}\).

Proof

See [7] for a precise proof of this proposition. The overall idea is to decompose the density of the measure \(\nu _t\) in an eigenbasis of the Laplacian, so that the Brownian motion (which is generated by \(\Delta\)) will exponentially kill all modes but the one associated with the eigenvalue 0, that is the constant one. The estimate of \(T({\mathbb {S}}^{n})\) is then obtained by examining how fast each mode (multiplied by its multiplicity) is killed. Interestingly enough, the convergence is faster as n increases since \(T({\mathbb {S}}^{n}) \underset{n \rightarrow \infty }{\longrightarrow }0\). \(\square\)

3.2 Most Vectors are Almost Orthogonal

Consequently, we are now interested in the behavior of the scalar products between random vectors uniformly distributed on the complex n-sphere \({\mathbb {S}}^{n}\). The first thing to understand is that, in high dimension, most pairs of unit vectors are almost orthogonal.

Proposition 3.2

Denote by \(S = {\langle { \mathcal {E}_1 \vert \mathcal {E}_2}\rangle } \in {\mathbb {C}}\) the random variable where \({|{\mathcal {E}_1}\rangle }\) and \({|{\mathcal {E}_2}\rangle }\) are two independent uniform random variables on \({\mathbb {S}}^{n}\). Then \({\mathbb {E}}(S) = 0\) and \({\mathbb {V}}(S) = {\mathbb {E}}(|S |^2) = \frac{1}{n}\).

Proof

Clearly, \({|{\mathcal {E}_1}\rangle }\) and \(-{|{\mathcal {E}_1}\rangle }\) have the same law, hence \({\mathbb {E}}(S) = {\mathbb {E}}(-S) = 0\). What about the variance? One can rotate the sphere to impose, for example, \({|{\mathcal {E}_1}\rangle } = (1,0, \dots , 0)\), and by independence \({|{\mathcal {E}_2}\rangle }\) still follows a uniform law. Such a uniform law can be achieved by generating 2n independent normal random variables \((X_i)_{1 \leqslant i \leqslant 2n }\) following \(\mathcal {N}(0,1)\), and by considering the random vector \({|{\mathcal {E}_2}\rangle } = \left( \frac{X_1+ iX_2}{\sqrt{X_1^2 + \dots + X_{2n}^2 }}, \dots , \frac{X_{2n-1} + i X_{2n}}{ \sqrt{X_1^2 + \dots + X_{2n}^2 }} \right)\). Indeed, for any continuous function \(f: {\mathbb {S}}^{n} \rightarrow {\mathbb {R}}\) (with \(\textrm{d}\sigma ^n\) denoting the measure induced by the Lebesgue measure on \({\mathbb {S}}^{n}\)):

$$\begin{aligned} {\mathbb {E}}[f({|{\mathcal {E}_2}\rangle })]&= \frac{1}{(2\pi )^n} \int _{{\mathbb {R}}^{2n}} f \left( \frac{x_1+i x_2}{\sqrt{x_1^2 + \dots + x_{2n}^2 }}, \dots , \frac{x_{2n-1}+i x_{2n}}{\sqrt{x_1^2 + \dots + x_{2n}^2 }} \right) \\&\quad e^{-(x_1^2 + \dots + x_{2n}^2)/2} \textrm{d}x_1 \dots \textrm{d}x_{2n} \\&= \frac{1}{(2\pi )^n} \int _0^\infty \left[ \int _{{\mathbb {S}}^{n}} f(u) \textrm{d}\sigma ^n(u) \right] e^{-\frac{r^2}{2}}r^{2n-1} \textrm{d}r \\&= \omega _n \int _{{\mathbb {S}}^{n}} f(u) \textrm{d}\sigma ^n(u), \end{aligned}$$

which means that \({|{\mathcal {E}_2}\rangle }\) defined this way indeed follows the uniform law.

In these notations, \(|S |^2 =\frac{X_1^2 + X_2^2}{X_1^2 + \dots + X_{2n}^2 }\). Since each \(X_i^2\) follows a \(\chi ^2\) law, a classical lemma then shows that \(|S |^2\) follows a \(\beta _{1,n-1}\) distribution, whose mean equals \(\frac{1}{n}\). For a more elementary argument, note that, up to relabelling the variables, we have \(\forall k \in \llbracket 1,n \rrbracket , \; {\mathbb {E}} \left( \frac{X_1^2 + X_2^2}{X_1^2 + \dots + X_{2n}^2 } \right) = {\mathbb {E}}\left( \frac{X_{2k-1}^2 + X_{2k}^2}{X_1^2 + \dots + X_{2n}^2} \right)\) and so:

$$\begin{aligned} {\mathbb {V}}(S) = {\mathbb {E}} \left( \frac{X_1^2 + X_2^2}{X_1^2 + \dots + X_{2n}^2 } \right) = \frac{1}{n} \sum _{k=1}^n {\mathbb {E}}\left( \frac{X_{2k-1}^2 + X_{2k}^2}{X_1^2 + \dots + X_{2n}^2} \right) = \frac{1}{n} {\mathbb {E}}(1) = \frac{1}{n}. \end{aligned}$$

Alternatively, had we worked on the real sphere \(\subset {\mathbb {R}}^{2n}\) endowed with the real scalar product, the variance would have been \(\frac{1}{2n}\). This highlights the fact that the real and complex spheres are indeed isomorphic as topological or differential manifolds, but not as Riemannian manifolds.
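The Gaussian construction used in this proof translates directly into a Monte Carlo check. In the following illustrative Python sketch (sample sizes and seed are our choices), both the complex variance \(1/n\) and the real variance \(1/(2n)\) are recovered within sampling error:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 50_000

# Complex case: |E_2> = (X_1 + iX_2, ..., X_{2n-1} + iX_{2n}) / ||X||, X_i ~ N(0,1)
X = rng.normal(size=(trials, 2 * n))
Z = X[:, 0::2] + 1j * X[:, 1::2]
Z /= np.linalg.norm(Z, axis=1, keepdims=True)

# By rotational invariance, fix |E_1> = (1, 0, ..., 0): S is the first coordinate
S = Z[:, 0]
assert abs(np.mean(S)) < 0.01                         # E(S) = 0
assert abs(np.mean(np.abs(S) ** 2) - 1 / n) < 0.002   # V(S) = 1/n

# Real sphere in R^{2n} with the real scalar product: variance 1/(2n) instead
Y = X / np.linalg.norm(X, axis=1, keepdims=True)
assert abs(np.mean(Y[:, 0] ** 2) - 1 / (2 * n)) < 0.002
```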

The same result would have been obtained if, instead of picking a pair of vectors at random, we had chosen uniformly the unitary evolution operators \((U^{(i)}(t))_{1\leqslant i \leqslant d}\), resulting from the interaction Hamiltonian, such that \({|{\mathcal {E}_i(t)}\rangle } = U^{(i)}(t) {|{\mathcal {E}_0}\rangle }\). Again, if no direction of evolution is preferred, it is reasonable to take the law of each \(U^{(i)}(t)\) to be the Haar measure \(\textrm{d}U\) on the unitary group \(\mathcal {U}_n\). If moreover they are independent, then \(U^{(i)}(t)^\dagger U^{(j)}(t)\) also follows the Haar measure for all \(i \ne j\) so that, using [8, (112)]:

$$\begin{aligned} {\mathbb {V}} \big ( {\langle {\mathcal {E}_i(t) \vert \mathcal {E}_j(t)}\rangle } \big ) = \int _{\mathcal {U}_n} |{\langle {\mathcal {E}_0 \vert U \mathcal {E}_0}\rangle } |^2 dU = \prod _{i=2}^n \frac{i-1}{i} = \frac{1}{n}. \end{aligned}$$

\(\square\)
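The Haar-measure variant can also be checked numerically. A standard way to sample Haar-random unitaries (classical, though not described in the paper) is to take the QR decomposition of a complex Ginibre matrix and fix the phases; the sketch below recovers \({\mathbb {V}} \big ( {\langle {\mathcal {E}_0 \vert U \mathcal {E}_0}\rangle } \big ) = 1/n\) for an illustrative choice of n:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 8, 20_000

e0 = np.zeros(n, complex)
e0[0] = 1.0   # reference state |E_0>

vals = np.empty(trials)
for t in range(trials):
    # Haar-random unitary: QR of a complex Ginibre matrix, with phase correction
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, R = np.linalg.qr(A)
    Q = Q * (np.diag(R) / np.abs(np.diag(R)))   # makes the distribution exactly Haar
    vals[t] = abs(np.vdot(e0, Q @ e0)) ** 2     # |<E_0|U E_0>|^2

assert abs(vals.mean() - 1 / n) < 0.01          # variance of the overlap is 1/n
```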

Therefore, \(|{\langle {\mathcal {E}_i(t) \vert \mathcal {E}_j(t)}\rangle } |\) is, after a very short time, of order \(\sqrt{{\mathbb {V}}(S)} = \frac{1}{\sqrt{\dim (\mathcal {H}_\mathcal {E})}}\), which is a well-known estimate already obtained by Zurek in [9]. When \(d=2\), if \(\mathcal {E}\) is composed of \(N_{\mathcal {E}}\) particles and each of them is described by a p-dimensional Hilbert space, then very rapidly:

$$\begin{aligned} \eta \sim p^{-N_{\mathcal {E}}/2} \end{aligned}$$
(2)

which is virtually zero for macroscopic environments, so that decoherence is guaranteed. Of course, this is no longer true if d is large, because there will be so many pairs that some of the scalar products will inevitably become non-negligible, and so will \(\eta\). We would like to determine a condition relating n and d under which proper decoherence is to be expected. In other words, what is the minimal size of an environment needed to decohere a given system?
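To give an order of magnitude of estimate (2), here is a minimal computation, assuming qubit constituents (\(p=2\), our illustrative choice):

```python
import math

p = 2  # dimension of each environment particle: qubits (our illustrative choice)
for N_E in (10, 40, 100):
    print(f"N_E = {N_E:3d} particles: eta ~ {p ** (-N_E / 2):.1e}")

# For a macroscopic environment, compare orders of magnitude via log10 to avoid underflow
N_E = 10 ** 23
print(f"N_E = 1e23 particles: log10(eta) ~ {-(N_E / 2) * math.log10(p):.2e}")
```

Already a hundred qubits push \(\eta\) below \(10^{-15}\), and a macroscopic environment makes it unimaginably small.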

3.3 Direct Study of \(\eta\)

To answer this question, we should be more precise and consider directly the random variable \(\eta _{n,d} = \displaystyle \max _{i \ne j} \; |{\langle {\mathcal {E}_i \vert \mathcal {E}_j}\rangle } |\) where the \(({|{\mathcal {E}_i}\rangle })_{1\leqslant i \leqslant d}\) are d random vectors uniformly distributed on the complex n-sphere \({\mathbb {S}}^{n}\). In the following, we fix \(\varepsilon \in \left]0,1 \right[\) as well as a threshold \(s \in [0,1[\) close to 1, and define \(d_{max}^{\varepsilon , s}(n) = \min \{ d \in {\mathbb {N}} \mid {\mathbb {P}}(\eta _{n,d} \geqslant \varepsilon ) \geqslant s \}\), so that if \(d_{max}^{\varepsilon , s}(n)\) points or more are placed randomly on \({\mathbb {S}}^{n}\), it is very likely (with probability \(\geqslant s\)) that at least one of the scalar products will be greater than \(\varepsilon\).

Theorem 3.3

The following asymptotic estimates hold:

  1. \(\displaystyle d_{max}^{\varepsilon , s}(n) \underset{n \rightarrow \infty }{\sim }\ \sqrt{-2 \ln (1-s)} \left( \frac{1}{1-\varepsilon ^2} \right) ^{\frac{n-1}{2}}\)

  2. \(\boxed { \eta _{n,d} \underset{n \text { or } d \rightarrow \infty }{\overset{{\mathbb {P}}}{ \xrightarrow {\hspace{0.8cm}} }} \sqrt{1-d^{-\frac{2}{n}}} }\).
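Before turning to the proof, the second statement can be illustrated by a direct simulation. The following Python sketch (with n, d and the seed chosen by us, and a deliberately loose tolerance since the convergence is only asymptotic) compares \(\eta _{n,d}\) to \(\sqrt{1-d^{-2/n}}\):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 50, 1000

# d independent uniform points on the complex n-sphere
E = rng.normal(size=(d, n)) + 1j * rng.normal(size=(d, n))
E /= np.linalg.norm(E, axis=1, keepdims=True)

overlaps = np.abs(E @ E.conj().T)   # |<E_i|E_j>| for all pairs
np.fill_diagonal(overlaps, 0.0)
eta = overlaps.max()

predicted = np.sqrt(1 - d ** (-2 / n))
assert abs(eta - predicted) < 0.08   # loose tolerance: finite-size fluctuations
```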

To derive these formulas, we first need the following geometrical lemma.

Lemma 3.4

Let \(A_n = |{\mathbb {S}}^{n} |\) be the area of the complex n-sphere for \(\textrm{d}\sigma ^n\) (induced by the Lebesgue measure), \(C_{n}^{\varepsilon }(x) = \{ u \in {\mathbb {S}}^{n} \mid |{\langle {u \vert x }\rangle } |\geqslant \varepsilon \}\) the ‘spherical cap’ centered at x of parameter \(\varepsilon\), and \(A_n^{\varepsilon } = |C_n^{\varepsilon } |\) the area of any spherical cap of parameter \(\varepsilon\). Then for all \(n\geqslant 1\):

$$\begin{aligned} \frac{A_n^{\varepsilon }}{A_n} = (1-\varepsilon ^2)^{n-1}. \end{aligned}$$

Proof of Lemma

This result can be directly obtained from the fact that, as noticed in the proof of Proposition 3.2, \(|{\langle { \mathcal {E}_1 \vert \mathcal {E}_2}\rangle } |^2\) follows a \(\beta _{1,n-1}\) distribution when \({|{\mathcal {E}_1}\rangle }\) and \({|{\mathcal {E}_2}\rangle }\) are chosen uniformly and independently on \({\mathbb {S}}^{n}\). We can then write:

$$\begin{aligned} \frac{A_n^{\varepsilon }}{A_n} = {\mathbb {P}}(|{\langle {u \vert x }\rangle } |^2 \geqslant \varepsilon ^2) = \int _{\varepsilon ^2}^1 (n-1) (1-x)^{n-2} \textrm{d}x = \left[ -(1-x)^{n-1} \right] _{\varepsilon ^2}^{1} = (1-\varepsilon ^2)^{n-1}. \end{aligned}$$

A more ‘physicist-friendly’ proof can also be given, based on an appropriate choice of coordinates on the n-sphere. Recall that \({\mathbb {S}}^{n} \subset {\mathbb {C}}^n \simeq {\mathbb {R}}^{2n}\) can be seen as a real manifold of dimension \(2n-1\). Consider the set of coordinates \((r,\theta , \varphi _1, \dots , \varphi _{2n-3})\) on \({\mathbb {S}}^{n}\) defined by the chart

$$\begin{aligned} \begin{array}{cccl} F: &{} [0,1] \times [0,2\pi [ \times [0, \pi ]^{2n-4} \times [0,2\pi [ &{} \longrightarrow &{} {\mathbb {S}}^{n} \\ &{} (r, \theta , \varphi _1, \dots , \varphi _{2n-3}) &{} \longmapsto &{} (x_1+ix_2, \dots , x_{2n-1}+ix_{2n}) \simeq (x_1, \dots , x_{2n}) = \\ &{}&{}&{} ( r \cos (\theta ), r \sin (\theta ), \sqrt{1-r^2} \cos (\varphi _1), \sqrt{1-r^2} \sin (\varphi _1) \cos (\varphi _2), \dots , \\ &{}&{}&{} \sqrt{1-r^2} \sin (\varphi _1) \dots \cos (\varphi _{2n-3}), \sqrt{1-r^2} \sin (\varphi _1) \dots \sin (\varphi _{2n-3}) ). \end{array} \end{aligned}$$

This amounts to choosing the modulus r and the argument \(\theta\) of \(x_1+ix_2\), and then describing the remaining parameters using the standard spherical coordinates on \({\mathbb {S}}^{n-1}\), seen as a sphere of real dimension \(2n-3\), rescaled by a radius factor \(\sqrt{1-r^2}\). The advantage of these coordinates is that \(C_{n}^{\varepsilon }(1, 0,\dots ,0)\) simply corresponds to the set of points for which \(r \geqslant \varepsilon\).

The metric in these coordinates happens to be diagonal, given by:

  • \(g_{rr} = {\langle {e_r \vert e_r}\rangle } = 1+\frac{r^2}{1-r^2} = \frac{1}{1-r^2}\)

  • \(g_{\theta \theta } = {\langle {e_\theta \vert e_\theta }\rangle } = r^2\)

  • \(g_{\varphi _i \varphi _i} = (1-r^2) [g_{\varphi _i \varphi _i}]\), with [g] the metric corresponding to the spherical coordinates on \({\mathbb {S}}^{n-1}\).

It is now easy to compute the desired quantity:

$$\begin{aligned} A_n^{\varepsilon }&= \int _\varepsilon ^1 \sqrt{1 + \frac{r^2}{1-r^2}} \; r \, \sqrt{1-r^2}^{\,2n-3} \, \textrm{d}r \int _{[0,2\pi [} \textrm{d}\theta \int _{[0,\pi ]^{2n-4} \times [0,2\pi [ } \sqrt{[g]} \, \textrm{d}\varphi _1 \dots \textrm{d}\varphi _{2n-3} \\&= 2\pi A_{n-1} \int _\varepsilon ^1 r(1-r^2)^{n-2} \textrm{d}r \\&= \frac{\pi A_{n-1}}{n-1} \int _\varepsilon ^1 2(n-1)r(1-r^2)^{n-2} \textrm{d}r \\&= \frac{\pi A_{n-1}}{n-1} (1-\varepsilon ^2)^{n-1} \end{aligned}$$

and, finally,

$$\begin{aligned} \frac{A_n^{\varepsilon }}{A_n} = \frac{A_n^{\varepsilon }}{A_n^0} = (1-\varepsilon ^2)^{n-1}. \end{aligned}$$

\(\square\)
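The cap-area formula of Lemma 3.4 lends itself to a simple Monte Carlo verification; in the sketch below (parameters are our choices), the empirical fraction of uniform points falling in a cap of parameter \(\varepsilon\) is compared to \((1-\varepsilon ^2)^{n-1}\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials, eps = 6, 400_000, 0.3

# Uniform points u on the complex n-sphere; by symmetry take x = (1, 0, ..., 0),
# so the cap condition |<u|x>| >= eps reads |u_1| >= eps
U = rng.normal(size=(trials, 2 * n))
U /= np.linalg.norm(U, axis=1, keepdims=True)
mod_u1 = np.hypot(U[:, 0], U[:, 1])   # |u_1| with u_1 = U_1 + i U_2

frac = np.mean(mod_u1 >= eps)
assert abs(frac - (1 - eps ** 2) ** (n - 1)) < 0.005
```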

We are now ready to prove the theorem.

Proof of Theorem 3.3

For this proof, we find some inspiration in [10], but eventually obtain sharper bounds with simpler arguments. Another major reference concerning spherical caps is [11]. We say that a set of vectors on a sphere is \(\varepsilon\)-separated if every pairwise scalar product among them has modulus at most \(\varepsilon\). Denote by \(\textrm{d}{\overline{\sigma }}^n\) the normalized Lebesgue measure on \({\mathbb {S}}^{n}\), that is \(\textrm{d}{\overline{\sigma }}^n = \frac{\textrm{d}\sigma ^n}{A_n}\), and consider the following events:

  • \(A: \forall k \in \llbracket 1, d-1 \rrbracket , |{\langle { \mathcal {E}_d \vert \mathcal {E}_k }\rangle } |\leqslant \varepsilon\)

  • \(B: ({|{\mathcal {E}_k}\rangle })_{1\leqslant k \leqslant d-1}\) are \(\varepsilon\)-separated

so as to write \({\mathbb {P}}(\eta _{n,d} \leqslant \varepsilon ) = {\mathbb {P}}(A \mid B) {\mathbb {P}}(B) = \frac{ {\mathbb {P}}(A \cap B)}{ {\mathbb {P}}(B)} {\mathbb {P}}(\eta _{n,d-1} \leqslant \varepsilon )\), with:

$$\begin{aligned} \frac{ {\mathbb {P}}(A \cap B)}{ {\mathbb {P}}(B)}&= \frac{ \displaystyle \int _{({\mathbb {S}}^{n})^{d-1}} \textrm{d}{\overline{\sigma }}^n(x_1) \dots \textrm{d}{\overline{\sigma }}^n(x_{d-1}) \mathbbm {1}_{ \{ x_1, \dots , x_{d-1} \text { are } \varepsilon \text {-separated} \} } \left( 1- \frac{\left| \bigcup _{k=1}^{d-1} C_{n}^{\varepsilon }(x_k) \right| }{A_n} \right) }{ \displaystyle \int _{({\mathbb {S}}^{n})^{d-1}} \textrm{d}{\overline{\sigma }}^n(x_1) \dots \textrm{d}{\overline{\sigma }}^n(x_{d-1}) \mathbbm {1}_{ \{ x_1, \dots , x_{d-1} \text { are } \varepsilon \text {-separated} \} } } \\&= 1- {\mathbb {E}}\left( \frac{ \left| \bigcup _{k=1}^{d-1} C_{n}^{\varepsilon }({|{\mathcal {E}_k}\rangle }) \right| }{A_n} \Bigg \vert B \right) . \end{aligned}$$

We need to find bounds on the latter quantity. Obviously, \({\mathbb {E}}\left( \frac{ \left| \bigcup _{k=1}^{d-1} C_{n}^{\varepsilon }({|{\mathcal {E}_k}\rangle }) \right| }{A_n} \Bigg \vert B \right) \leqslant (d-1) \frac{A_n^{\varepsilon }}{A_n}\), corresponding to the case when all the caps are disjoint. For the lower bound, define the sequence \(u_d = {\mathbb {E}}\left( \frac{ \left| \bigcup _{k=1}^{d} C_{n}^{\varepsilon }({|{\mathcal {E}_k}\rangle }) \right| }{A_n} \right)\), which clearly satisfies \(u_d \leqslant {\mathbb {E}}\left( \frac{ \left| \bigcup _{k=1}^{d} C_{n}^{\varepsilon }({|{\mathcal {E}_k}\rangle }) \right| }{A_n} \Bigg \vert B \right)\), because conditioning on the vectors being separated can only decrease the overlap between the different caps. First observe that \(u_1 = \frac{A_n^{\varepsilon }}{A_n} \equiv \alpha\), and compute:

$$\begin{aligned} u_d&= u_{d-1} + {\mathbb {E}}\left( \frac{ \left| C_{n}^{\varepsilon }({|{\mathcal {E}_d}\rangle }) \setminus \bigcup _{k=1}^{d-1} C_{n}^{\varepsilon }({|{\mathcal {E}_k}\rangle }) \right| }{A_n} \right) \\&= u_{d-1} + \int _{({\mathbb {S}}^{n})^{d}} \textrm{d}{\overline{\sigma }}^n(x_1) \dots \textrm{d}{\overline{\sigma }}^n(x_d) \int _{C_{n}^{\varepsilon }(x_d)} \mathbbm {1}_{ \{ y \notin \bigcup _{k=1}^{d-1} C_{n}^{\varepsilon }(x_k) \} } \textrm{d}{\overline{\sigma }}^n(y) \\&= u_{d-1} + \int _{({\mathbb {S}}^{n})^{d-1}} \textrm{d}{\overline{\sigma }}^n(x_1) \dots \textrm{d}{\overline{\sigma }}^n(x_{d-1}) \int _{{\mathbb {S}}^{n}} \frac{\left| C_{n}^{\varepsilon }(y) \right| }{A_n} \mathbbm {1}_{ \{ y \notin \bigcup _{k=1}^{d-1} C_{n}^{\varepsilon }(x_k) \} } \textrm{d}{\overline{\sigma }}^n(y) \\&= u_{d-1} + \frac{A_n^\varepsilon }{A_n} \int _{({\mathbb {S}}^{n})^{d-1}} \textrm{d}{\overline{\sigma }}^n(x_1) \dots \textrm{d}{\overline{\sigma }}^n(x_{d-1}) \left( 1- \frac{ \left| \bigcup _{k=1}^{d-1} C_{n}^{\varepsilon }(x_k) \right| }{A_n} \right) \\&= u_{d-1} + \frac{A_n^{\varepsilon }}{A_n} (1-u_{d-1}) \\&= (1-\alpha )u_{d-1} + \alpha , \end{aligned}$$

where the main trick was to interchange the integrals over \(x_d\) and y. This result is actually quite intuitive: it states that when a new cap is added, only a fraction \(1-u_{d-1}\) of it on average lies outside the previous caps and contributes to the new total area covered. Hence \(u_d = 1- (1-\alpha )^d\), and the recurrence relation becomes:

$$\begin{aligned} \left( 1 - (d-1) \frac{A_n^{\varepsilon }}{A_n} \right) {\mathbb {P}}(\eta _{n,d-1}\leqslant \varepsilon ) \leqslant {\mathbb {P}}(\eta _{n,d} \leqslant \varepsilon ) \leqslant \left( 1-\frac{A_n^{\varepsilon }}{A_n} \right) ^{d-1} {\mathbb {P}}(\eta _{n,d-1}\leqslant \varepsilon ). \end{aligned}$$
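The closed form \(u_d = 1-(1-\alpha )^d\) solving the affine recurrence \(u_d = (1-\alpha )u_{d-1} + \alpha\), \(u_1 = \alpha\), used just above can be checked in a few lines (the value of \(\alpha\) is illustrative):

```python
alpha = 0.05   # the cap fraction A_n^eps / A_n (an illustrative value)
u = alpha      # u_1 = alpha
for d in range(2, 50):
    u = (1 - alpha) * u + alpha                    # u_d = (1 - alpha) u_{d-1} + alpha
    assert abs(u - (1 - (1 - alpha) ** d)) < 1e-12
```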

Applying the lemma, we get by induction:

$$\begin{aligned} \prod _{k=1}^{d-1} (1-k (1-\varepsilon ^2)^{n-1} ) \leqslant {\mathbb {P}}(\eta _{n,d} \leqslant \varepsilon ) \leqslant (1- (1-\varepsilon ^2)^{n-1})^{\frac{d(d-1)}{2} }. \end{aligned}$$

Note that the left inequality is valid only as long as \(d \leqslant \big ( \frac{1}{1-\varepsilon ^2} \big )^{n-1}\), but when d is larger than this critical value, the right-hand side becomes very small (of order \(e^{-\frac{1}{2}(1-\varepsilon ^2)^{-(n-1)}}\)), so we may take 0 as a good lower bound in this case. The two bounds are in fact extremely close to each other, and get closer as n or d grows. To quantify this precisely, let’s denote \(f_{n,d}(\varepsilon ) = (1- (1-\varepsilon ^2)^{n-1})^{\frac{d(d-1)}{2}}\), \(g_{n,d}(\varepsilon ) = \prod _{k=1}^{d-1} (1-k (1-\varepsilon ^2)^{n-1} )\), and let’s show that \(|f_{n,d}(\varepsilon ) - g_{n,d}(\varepsilon ) |\underset{n \text { or } d \rightarrow \infty }{\longrightarrow } 0\). Two cases have to be considered.

  • First case: if \(d \geqslant d_c \equiv \left( \frac{1}{1-\varepsilon ^2}\right) ^{\frac{3}{5} (n-1)}\), then \(f_{n,d}(\varepsilon )\) is small so we can write:

    $$\begin{aligned} |f_{n,d}(\varepsilon ) - g_{n,d}(\varepsilon ) |&\leqslant f_{n,d}(\varepsilon ) = e^{\frac{d(d-1)}{2} \ln (1- (1-\varepsilon ^2)^{n-1})} \\&\leqslant e^{-\frac{(d-1)^2}{2} (1-\varepsilon ^2)^{n-1}} \quad \quad \quad \text {(since } \ln (1-x) \leqslant -x) \\&\leqslant e^{ -\frac{1+ o(1)}{2 (1-\varepsilon ^2)^{\frac{n-1}{5}}}} \quad \quad \quad \text {(using } d \geqslant d_c) \\&\leqslant e^{-\frac{d^{1/3}}{2} (1+o(1))}, \end{aligned}$$

    where \(1+o(1) = \left( \frac{d-1}{d} \right) ^2 \underset{n \text { or } d \rightarrow \infty }{\longrightarrow } 1\).

  • Second case: if \(d \leqslant d_c\), first note that \(\forall k \in \llbracket 1,d \rrbracket , \forall x \in [ 0,\frac{1}{d^{5/3}}[\),

    $$\begin{aligned} 1 \leqslant \frac{(1-x)^k}{1-kx} \leqslant \frac{1- kx + \frac{k(k-1)}{2}x^2}{1-kx} \leqslant 1+ \frac{k(k-1)}{2} \frac{x^2}{(1-x^{2/5})}. \end{aligned}$$

    Therefore,

    $$\begin{aligned} \left|\ln (f_{n,d}(\varepsilon )) - \ln (g_{n,d}(\varepsilon )) \right|=&\left|\sum _{k=1}^{d-1} k \ln (1- (1-\varepsilon ^2)^{n-1}) - \ln (1-k (1-\varepsilon ^2)^{n-1}) \right|\\ \leqslant&\sum _{k=1}^{d-1} \left|\ln \left( 1+ \frac{k(k-1)}{2} \frac{(1-\varepsilon ^2)^{2(n-1)}}{1- (1-\varepsilon ^2)^{\frac{2}{5}(n-1)}} \right) \right|\quad \\&\quad \text {(applying the inequality for } x = (1-\varepsilon ^2)^{n-1}) \\ \leqslant&\frac{(1-\varepsilon ^2)^{2(n-1)}}{1- (1-\varepsilon ^2)^{\frac{2}{5}(n-1)}} \underbrace{\sum _{k=1}^{d-1} \frac{k(k-1)}{2}}_{=\frac{d^3}{6} - \frac{d^2}{2} + \frac{d}{3} \leqslant \frac{d^3}{6} } \\ \leqslant&\frac{ (1-\varepsilon ^2)^{\frac{n-1}{5}} }{ 6 (1-(1-\varepsilon ^2)^{\frac{2}{5}(n-1)}) } \quad \quad \quad \text {(using } d \leqslant d_c) \\ \leqslant&\frac{d^{-\frac{1}{3}} }{ 6 (1-d^{-\frac{2}{3}}) }. \\ \end{aligned}$$

    Hence:

    $$\begin{aligned}&\frac{g_{n,d}(\varepsilon )}{f_{n,d}(\varepsilon )} \in \Big [ \exp \left( -\frac{ (1-\varepsilon ^2)^{\frac{n-1}{5}} }{ 6 (1-(1-\varepsilon ^2)^{\frac{2}{5}(n-1)}) } \right) , 1 \Big ] \\ \Rightarrow \quad&|f_{n,d}(\varepsilon ) - g_{n,d}(\varepsilon ) |\leqslant \left( 1 - \exp \left( - \frac{ (1-\varepsilon ^2)^{\frac{n-1}{5}} }{ 6 (1-(1-\varepsilon ^2)^{\frac{2}{5}(n-1)}) } \right) \right) f_{n,d}(\varepsilon ) \\&\quad \leqslant \frac{ (1-\varepsilon ^2)^{\frac{n-1}{5}} }{ 6 (1-(1-\varepsilon ^2)^{\frac{2}{5}(n-1)}) } \quad \quad \quad \text {(since } 1-e^{-x} \leqslant x \text { and } f_{n,d}(\varepsilon ) \leqslant 1) \\&\quad \leqslant \frac{d^{-\frac{1}{3}} }{ 6 (1-d^{-\frac{2}{3}}) }. \end{aligned}$$

We have thus shown that the difference between the two bounds \(f_{n,d}(\varepsilon )\) and \(g_{n,d}(\varepsilon )\) is controlled by a quantity that can be expressed solely in terms of either n or d, and that vanishes when either n or d tends to infinity. Denoting this vanishing term by \(\xi\), it is straightforward to see that:

$$\begin{aligned} \min \{ d \in {\mathbb {N}} \mid 1- f_{n,d}(\varepsilon ) + \xi \geqslant s \} \leqslant d_{max}^{\varepsilon , s}(n) \leqslant \min \{ d \in {\mathbb {N}} \mid 1- f_{n,d}(\varepsilon ) \geqslant s \}, \end{aligned}$$

and after some work, this implies:

$$\begin{aligned} \Bigg \lfloor \sqrt{-2\ln (1-s+\xi )} \sqrt{\frac{1}{(1-\varepsilon ^2)^{n-1}} -1} \Bigg \rfloor \leqslant d_{max}^{\varepsilon , s}(n) \leqslant \Bigg \lceil \frac{\sqrt{-2\ln (1-s)} }{(1-\varepsilon ^2)^{\frac{n-1}{2}}} \Bigg \rceil , \end{aligned}$$

hence \(\displaystyle d_{max}^{\varepsilon , s}(n) \underset{n \rightarrow \infty }{\sim }\ \sqrt{-2 \ln (1-s)} \left( \frac{1}{1-\varepsilon ^2} \right) ^{\frac{n-1}{2}}\), which is the first part of the theorem.
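Both the closeness of the two bounds and the asymptotic equivalent can be checked numerically. The following Python sketch (parameter values are arbitrary illustrative choices) compares \(f_{n,d}\) with \(g_{n,d}\), then the exact \(\min \{ d \mid 1- f_{n,d}(\varepsilon ) \geqslant s \}\) with its asymptotic estimate:

```python
import math

def f(n, d, eps):
    # f_{n,d}(eps) = (1 - (1 - eps^2)^(n-1))^(d(d-1)/2)
    return (1.0 - (1.0 - eps**2) ** (n - 1)) ** (d * (d - 1) / 2)

def g(n, d, eps):
    # g_{n,d}(eps) = prod_{k=1}^{d-1} (1 - k (1 - eps^2)^(n-1))
    x = (1.0 - eps**2) ** (n - 1)
    prod = 1.0
    for k in range(1, d):
        prod *= 1.0 - k * x
    return prod

# the two bounds nearly coincide already for moderate sizes
print(f(50, 20, 0.5), g(50, 20, 0.5))

def d_max_exact(n, eps, s):
    # smallest d such that 1 - f_{n,d}(eps) >= s (upper bound of the theorem)
    d = 2
    while 1.0 - f(n, d, eps) < s:
        d += 1
    return d

def d_max_asymptotic(n, eps, s):
    # sqrt(-2 ln(1-s)) / (1 - eps^2)^((n-1)/2)
    return math.sqrt(-2.0 * math.log(1.0 - s)) * (1.0 - eps**2) ** (-(n - 1) / 2)

print(d_max_exact(100, 0.3, 0.5), d_max_asymptotic(100, 0.3, 0.5))
```

Already for n = 100 the exact and asymptotic values differ by less than one percent.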

The intuition concerning the second statement comes from the following observation. We know that \({\mathbb {P}}(\eta _{n,d} \leqslant \varepsilon ) \simeq f_{n,d}(\varepsilon )\), and this function happens to be almost constantly equal to 0 in the vicinity of \(\varepsilon =0\), almost constantly equal to 1 in the vicinity of \(\varepsilon =1\), with a very sharp step between the two; this step sharpens as n or d grows larger. This explains why the mass of probability is highly peaked around a critical value \(\varepsilon _c\), so that \(\displaystyle \eta _{n,d} \simeq {\mathbb {E}}(\eta _{n,d})\) converges to a deterministic variable when n or d \(\rightarrow \infty\). This is certainly due to the averaging effect of taking the maximum over a set of \(\frac{d(d-1)}{2}\) scalar products. The critical value \(\varepsilon _c\) satisfies:

$$\begin{aligned} (1- (1-\varepsilon _c^2)^{n-1})^{\frac{d(d-1)}{2}} = \frac{1}{2} \Leftrightarrow \varepsilon _c = \sqrt{ 1- \big (1- 2^{-\frac{2}{d(d-1)}}\big )^{\frac{1}{n-1}} } \simeq \sqrt{1-d^{-2/n}}. \end{aligned}$$

Now, the precise proof of the convergence of \(\eta _{n,d}\) in probability goes as follows. Let \(\delta >0\). We have to show that \({\mathbb {P}}\left( \left|\eta _{n,d} - \sqrt{1-d^{-\frac{2}{n}}} \right|\leqslant \delta \right) \underset{n \text { or } d \rightarrow \infty }{\longrightarrow } 1\). It is equivalent but easier to show that \({\mathbb {P}}\left( \sqrt{1-d^{-\frac{2}{n}} - \delta } \leqslant \eta _{n,d} \leqslant \sqrt{1-d^{-\frac{2}{n}} +\delta } \right) \underset{n \text { or } d \rightarrow \infty }{\longrightarrow } 1\). Taking \(f_{n,d}\) as an approximation for the distribution function of \(\eta _{n,d}\), we can write:

$$\begin{aligned} {\mathbb {P}}\left( \eta _{n,d} \leqslant \sqrt{1-d^{-\frac{2}{n}} +\delta } \right) = \left( 1- \max \left( 0,d^{-2/n} - \delta \right) ^{n-1} \right) ^{\frac{d(d-1)}{2}} + o(1), \end{aligned}$$

where o(1) stands for a quantity that goes to zero when either n or d goes to infinity (bounded by \(\xi\)), and where the max appears because if \(d^{-2/n} \leqslant \delta\), \({\mathbb {P}}\left( \eta _{n,d} \leqslant \sqrt{1-d^{-\frac{2}{n}} +\delta } \right)\) is simply equal to 1. Clearly, \({\mathbb {P}}\left( \eta _{n,d} \leqslant \sqrt{1-d^{-\frac{2}{n}} +\delta } \right) \underset{n \text { or } d \rightarrow \infty }{\longrightarrow } 1\), and similarly, one shows that \({\mathbb {P}}\left( \eta _{n,d} \leqslant \sqrt{1-d^{-\frac{2}{n}} - \delta } \right) \underset{n \text { or } d \rightarrow \infty }{\longrightarrow } 0\), which completes the proof. \(\square\)

During the reviewing process, we discovered that the formula \(\sqrt{1-d^{-\frac{2}{n}}}\) had already been obtained in [12]. However, this work only deals with the maximum of the d scalar products between, say, the north pole and a set of d independent random vectors. This situation is easier to treat, in particular because the d scalar products are then independent random variables, which is certainly not the case for our \(\frac{d(d-1)}{2}\) scalar products.

3.4 Comparison with Simulation and Consequences

The above expressions actually give remarkably accurate estimates for \(d_{max}^{\varepsilon , s}(n)\) and \(\eta _{n,d}\), as shown in Figs. 1 and 2.

Fig. 1: Simulation vs prediction for \(d_{max}^{\varepsilon , s}(n)\)

Fig. 2: Simulation vs prediction for \(d \mapsto {\mathbb {E}}(\eta _{n,d})\) at fixed n
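The prediction for \({\mathbb {E}}(\eta _{n,d})\) can be reproduced with a short Monte Carlo sketch. Normalized complex Gaussian vectors are Haar-distributed on the unit sphere of \({\mathbb {C}}^n\); the values of n, d and the number of trials below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(n, d):
    # max overlap among d Haar-random unit vectors of C^n
    v = rng.normal(size=(d, n)) + 1j * rng.normal(size=(d, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    overlaps = np.abs(v @ v.conj().T)
    np.fill_diagonal(overlaps, 0.0)   # ignore the trivial <E_i|E_i> = 1
    return overlaps.max()

n, d = 20, 500
prediction = np.sqrt(1.0 - d ** (-2.0 / n))
estimate = np.mean([eta(n, d) for _ in range(20)])
print(estimate, prediction)   # the two values are close
```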

This theorem has a strong physical consequence. Indeed, \(\mathcal {E}\) induces proper decoherence on \(\mathcal {S}\) as long as \(\eta _{n,d} \ll 1\), that is when \(d^{-2/n}\) is very close to 1, i.e. when \(d \ll e^{n/2}\). Going back to physically meaningful quantities, we write as previously \(n = p^{N_\mathcal {E}}\) and \(d = p^{N_\mathcal {S}}\) where \(N_\mathcal {E}\) and \(N_\mathcal {S}\) stand for the number of particles composing \(\mathcal {E}\) and \(\mathcal {S}\). The condition becomes: \(2 \ln (p)N_\mathcal {S} \ll p^{N_\mathcal {E}}\) or simply:

$$\begin{aligned} \boxed { \frac{\ln (N_\mathcal {S})}{\ln (p)} \ll N_\mathcal {E}. } \end{aligned}$$

A more precise condition can be obtained using \(d_{max}\), because \(\mathcal {E}\) induces proper decoherence on \(\mathcal {S}\) as long as \(d \leqslant d_{max}^{\varepsilon , s}(n)\) for an arbitrary choice of \(\varepsilon\) close to 0 and s close to 1. This can be rewritten: \(2\ln (p)N_\mathcal {S} \leqslant \ln (\sqrt{-2 \ln (1-s)}) + \ln \left( \frac{1}{1-\varepsilon ^2} \right) p^{N_\mathcal {E}} \simeq \varepsilon ^2 p^{N_\mathcal {E}}\) or simply: \(\ln (N_\mathcal {S}) \leqslant 2 \ln (\varepsilon ) + \ln (p) N_\mathcal {E}\). Thus, for instance, a gas composed of thousands of particles will lose most of its coherence if it interacts with only a few external particles. It is rather surprising that so many points can be placed randomly on an n-sphere before the maximum of their pairwise scalar products becomes non-negligible. It is this property that makes decoherence an extremely efficient high-dimensional geometrical phenomenon.
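As a back-of-the-envelope illustration of these conditions (the values of p, \(N_\mathcal {S}\) and \(\varepsilon\) below are arbitrary):

```python
import math

p = 2            # local dimension of each particle (qubits, an illustrative choice)
N_S = 10_000     # number of particles in the system
eps = 0.01       # target level of decoherence

# crude condition: ln(N_S)/ln(p) << N_E
crude = math.log(N_S) / math.log(p)
# finer condition: ln(N_S) <= 2 ln(eps) + ln(p) N_E
finer = (math.log(N_S) - 2.0 * math.log(eps)) / math.log(p)
print(crude, finer)   # a few dozen environment particles already suffice
```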

3.5 Discussing the Hypotheses

On the one hand, this result could be seen as a worst case scenario for decoherence, since realistic Hamiltonians are far from random and actually discriminate between the different possible histories even better. This is especially true if \(\mathcal {E}\) is a measurement apparatus for example, whose Hamiltonian is by construction such that the \(({|{\mathcal {E}_i(t)}\rangle })_{1\leqslant i \leqslant d}\) evolve quickly and deterministically towards orthogonal points of the sphere.

On the other hand, pursuing such a high level of generality led us to abstract and unphysical assumptions. First, realistic dynamics are not isotropic on the n-sphere (some transitions are more probable than others). Then, the assumption that each \({|{\mathcal {E}_i(t)}\rangle }\) can explore indistinctly all the states of \(\mathcal {H}_{\mathcal {E}}\) is highly questionable. As explained in [13]:

‘...the set of quantum states that can be reached from a product state with a polynomial-time evolution of an arbitrary time-dependent quantum Hamiltonian is an exponentially small fraction of the Hilbert space. This means that the vast majority of quantum states in a many-body system are unphysical, as they cannot be reached in any reasonable time. As a consequence, all physical states live on a tiny submanifold.’

It would then be more accurate in our model to replace \({\mathbb {S}}^{n}\) by this submanifold. But what does it look like geometrically, and what is its dimension? If it were a subsphere of \({\mathbb {S}}^{n}\) of exponentially smaller dimension, then n should be replaced everywhere by something like \(\ln (n)\) in what precedes, so the condition would rather be \(N_\mathcal {S} \ll N_\mathcal {E}\), which is a completely different conclusion. Some clues to better grasp the submanifold are found in [14, §3.4]:

‘...one can prove that low-energy eigenstates of gapped Hamiltonians with local interactions obey the so-called area-law for the entanglement entropy. This means that the entanglement entropy of a region of space tends to scale, for large enough regions, as the size of the boundary of the region and not as the volume. (...) In other words, low-energy states of realistic Hamiltonians are not just ‘any’ state in the Hilbert space: they are heavily constrained by locality so that they must obey the entanglement area-law.’

More work is needed in order to draw precise conclusions taking these physical remarks into account.

4 Second Model: Interacting Particles

4.1 The Environment Feels the System

Let us now specify the nature of the environment more precisely. Suppose that the energy of interaction dominates the evolution of the whole system \(\mathcal {S} + \mathcal {E}\) and can be expressed in terms of the positions \(x_1, \dots , x_N\) of the N particles composing the environment, together with the state of \(\mathcal {S}\) (this is the typical regime for macroscopic systems, which decohere in the position basis [15, §III.E.2.]). If the latter is \({|{i}\rangle }\), denote by \(H(i, x_1 \dots x_N)\) this energy. The initial state \({|{\Psi }\rangle } = \left( \sum _{i=1}^d c_i {|{i}\rangle } \right) \otimes \int f( x_1 \dots x_N) {|{ x_1 \dots x_N}\rangle } \textrm{d}x_1 \dots \textrm{d}x_N\) evolves into:

$$\begin{aligned} \sum _{i=1}^d c_i {|{i}\rangle } \otimes \underbrace{\int f( x_1 \dots x_N) e^{\frac{i}{\hbar } H(i, x_1 \dots x_N) t} {|{ x_1 \dots x_N}\rangle } }_{= {|{\mathcal {E}_i(t)}\rangle }} \textrm{d}x_1 \dots \textrm{d}x_N. \end{aligned}$$

Therefore:

$$\begin{aligned} {\langle { \mathcal {E}_i(t) \vert \mathcal {E}_j(t)}\rangle } = \int |f( x_1 \dots x_N) |^2 e^{\frac{i}{\hbar } \Delta (i,j, x_1 \dots x_N) t} \textrm{d}x_1 \dots \textrm{d}x_N, \end{aligned}$$

where \(\Delta (i,j, x_1 \dots x_N) = H(j, x_1 \dots x_N) - H(i, x_1 \dots x_N)\) is a spectral gap between eigenvalues of the Hamiltonian, measuring how much the environment feels the transition of \(\mathcal {S}\) from \({|{i}\rangle }\) to \({|{j}\rangle }\) in a given configuration of the environment. In a time interval \([-T,T]\), the mean value \(\frac{1}{2T} \int _{-T}^T {\langle { \mathcal {E}_i(t) \vert \mathcal {E}_j(t)}\rangle } \textrm{d}t\) yields \(\int |f( x_1 \dots x_N) |^2 \textrm{sinc}(\frac{\Delta (i,j, x_1 \dots x_N)}{\hbar } T) \textrm{d}x_1 \dots \textrm{d}x_N\), which is close to zero for all i and j as soon as \(T > \frac{\pi \hbar }{\displaystyle \min _{i, j, x_1 \dots x_N} \Delta (i,j, x_1 \dots x_N)}\), a time which is likely to be very short if \(\mathcal {E}\) is a macroscopic system, for the energies involved will be much greater than \(\hbar\). Similarly, the empirical variance is:

$$\begin{aligned} {\mathbb {V}} = \frac{1}{2T} \int _{-T}^T |{\langle { \mathcal {E}_i(t) \vert \mathcal {E}_j(t)}\rangle } |^2 \textrm{d}t \sim \int |f( x_1 \dots x_N) |^4 \textrm{d}x_1 \dots \textrm{d}x_N, \end{aligned}$$

plus terms that go to zero after a short time. Note that the variables \(x_1 \dots x_N\) could be discretized to take p possible values, in which case \(n = \dim (\mathcal {H}_\mathcal {E}) = p^N\), and the integral becomes a finite sum. For a delocalized initial state with constant f, this sum is equal to \(p^{-N}\), and we recover the previous estimate (2) if \(d=2\): \(\eta \sim p^{-N/2}\). This model teaches us that the more the environment feels the difference between the possible histories, the more they decohere.
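A minimal numerical sketch of this discretized model, with random spectral gaps standing in for a concrete interaction Hamiltonian (an assumption of this illustration, in units where \(\hbar = 1\)):

```python
import numpy as np

rng = np.random.default_rng(1)

p, N = 4, 6          # p positions per environment particle, N particles
n = p ** N           # dimension of H_E
# hypothetical spectral gaps Delta(i, j, x_1...x_N), one per environment
# configuration, drawn at random for this illustration
delta = rng.normal(size=n)

def overlap(t):
    # <E_i(t)|E_j(t)> for a delocalized initial state (constant f, |f|^2 = 1/n)
    return np.mean(np.exp(1j * delta * t))

ts = np.linspace(5.0, 50.0, 200)      # skip the short initial transient
typical = np.mean([abs(overlap(t)) for t in ts])
print(typical, n ** -0.5)   # typical overlap is of order n^{-1/2}
```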

4.2 Entanglement Entropy as a Measure of Decoherence

What precedes suggests the following intuition: the smaller \(\eta\) is, the more information the environment has stored about the system because the more distinguishable (i.e. orthogonal) the \(({|{\mathcal {E}_i(t)}\rangle })_{1\leqslant i \leqslant d}\) are; on the other hand, the smaller \(\eta\) is, the fewer quantum interferences occur. It motivates the search for a general relationship between entanglement entropy (defined as the von Neumann entropy of the reduced density matrix of either \(\mathcal {S}\) or \(\mathcal {E}\), i.e. how much \(\mathcal {E}\) knows about \(\mathcal {S}\) or vice-versa) and the level of classicality of a system. Such results have already been derived for specific environments [4, (3.76)] [16, 17] but not, to our knowledge, in the general case. The following formula is proved in the annex A when S stands for the linear entropy (or purity defect) \(1-{\text {tr}}(\rho ^2)\), and some justifications are given when S denotes the entanglement entropy:

$$\begin{aligned} \forall F \subset \mathcal {H}_{\mathcal {S}}, \quad |{\text {tr}}(\rho _{\mathcal {S}}(t) \Pi _F) - {\text {tr}}(\rho _{\mathcal {S}}^{(d)} \Pi _F) |\leqslant \dim (F) \; \sqrt{ 1 - \inf _{{|{\Psi _\mathcal {S}(0)}\rangle }} \frac{ S(\rho _{\mathcal {S}}(t)) }{S(\rho _{\mathcal {S}}^{(d)})} }. \end{aligned}$$
(3)

5 Alternative Definitions for \(\eta\)

Lemma 3.4 allows for another way to quantify decoherence: ask, for each possible history i, what is the fraction \(F^\varepsilon _{n,d}\) of the other possible histories with which it interferes significantly, that is, how many indices j are there such that \(|{\langle {\mathcal {E}_i \vert \mathcal {E}_j}\rangle } |\geqslant \varepsilon\)? As remarked in [18], this quantity is simply given by:

$$\begin{aligned} F^\varepsilon _{n,d} = \frac{1}{d-1} \sum _{\begin{array}{c} j=1 \\ j \ne i \end{array}}^d h_{ij} \quad \text {with } h_{ij} = \left\{ \begin{array}{ll} 1 \text { if } {|{ \mathcal {E}_j}\rangle } \in C^\varepsilon _n({|{\mathcal {E}_i}\rangle }) \\ 0 \text { otherwise } \end{array} \right. \end{aligned}$$

By the law of large numbers, we immediately deduce that \(F^\varepsilon _{n,d} \underset{d \rightarrow \infty }{\overset{\text {a.s.}}{ \longrightarrow }} \mathbb {P}(h_{ij} = 1) = (1- \varepsilon ^2)^{n-1}\), and we recover once again in this expression that the typical level of decoherence is \(\varepsilon \sim \frac{1}{\sqrt{n}}\). Perhaps more interesting is to quantify decoherence using the ‘expectation’ of the scalar products \(|{\langle {\mathcal {E}_i \vert \mathcal {E}_j}\rangle } |^2\) for \(i\ne j\), weighted by their quantum probabilities \(\frac{|c_i |^2|c_j |^2}{\sum _{i\ne j} |c_i |^2|c_j |^2} = \frac{|c_i |^2|c_j |^2}{1 - \sum _i |c_i |^4}\). One could then define:

$$\begin{aligned} \tilde{\eta } = \frac{1}{1 - \sum _i |c_i |^4} \sum _{i \ne j} |c_i |^2|c_j |^2 |{\langle {\mathcal {E}_i \vert \mathcal {E}_j}\rangle } |^2 = \frac{1}{1 - \sum _i |c_i |^4} \left( {\text {tr}}(\rho ^2_{\mathcal {S}}) - \sum _i |c_i |^4 \right) , \end{aligned}$$

based on a computation made in the annex A.2. The great advantage of this definition is that it can naturally be extended to the infinite dimensional case, unlike \(\displaystyle \max _{i \ne j} \; |{\langle {\mathcal {E}_i(t) \vert \mathcal {E}_j(t)}\rangle } |\) (if the scalar products vary continuously, their supremum is necessarily 1). Our proposal for quantifying decoherence in infinite dimension is therefore:

$$\begin{aligned} \tilde{\eta } = \frac{1}{1 - \int |c_x |^4 \textrm{d}x} \left( {\text {tr}}(\rho ^2_{\mathcal {S}}) - \int |c_x |^4 \textrm{d}x \right) . \end{aligned}$$
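In the finite-dimensional case, the identity behind \(\tilde{\eta }\) (the computation of annex A.2) can be checked numerically; the following sketch uses random amplitudes and random environment states (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

d, n = 5, 40
# random amplitudes c_i and random environment states |E_i> in C^n
c = rng.normal(size=d) + 1j * rng.normal(size=d)
c /= np.linalg.norm(c)
E = rng.normal(size=(d, n)) + 1j * rng.normal(size=(d, n))
E /= np.linalg.norm(E, axis=1, keepdims=True)

G = E @ E.conj().T                          # Gram matrix, G[i, j] = <E_j|E_i>
w = np.abs(c) ** 2
rho = np.outer(c, c.conj()) * G             # reduced density matrix of S
purity = np.trace(rho @ rho).real           # tr(rho_S^2)

# weighted sum of the off-diagonal overlaps...
lhs = sum(w[i] * w[j] * abs(G[i, j]) ** 2
          for i in range(d) for j in range(d) if i != j)
# ...equals the purity minus the diagonal contribution
rhs = purity - np.sum(w ** 2)
eta_tilde = rhs / (1.0 - np.sum(w ** 2))
print(lhs, rhs)   # the two expressions coincide
```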

6 Conclusion

We introduced, in a mathematically rigorous way, general quantities that can be relevant for any study on decoherence, in particular the parameter \(\eta (t)\) that quantifies the level of decoherence at a given instant. Two simple models were then presented, designed to give a more intuitive feel for the general process of decoherence. Most importantly, our study revealed the mathematical reason why the latter is so fast and universal, namely that surprisingly many points can be placed randomly on an n-sphere before the maximum of their pairwise scalar products becomes non-negligible. We also learned that decoherence is neither perfect nor everlasting, since \(\eta\) is not expected to be exactly 0 and will eventually become large again (according to the Borel–Cantelli lemma for the first model, and by finding particular times at which all the exponentials are almost real in the second), much as an ink drop in a glass of water will eventually re-form due to Poincaré’s recurrence theorem, even though the recurrence time can easily exceed the lifetime of the universe for realistic systems [9]. Finally, decoherence can be estimated by entanglement entropy because \(\eta\) is linked to what the environment knows about the system.

Further work could include the search for a description of the submanifold of reachable states mentioned in §3.5; a generalization to the cases where the initial environment is not in a pure state or the interaction admits no conserved basis; and the study of the infinite dimensional case. Another interesting question would be to investigate how \(\eta\) depends on the basis in which decoherence is considered: quantum interferences are indeed suppressed in the conserved basis, but how strong are they in the other bases?