Abstract
We study a class of Markov chains that model the evolution of a quantum system subject to repeated measurements. Each Markov chain in this class is defined by a measure on the space of matrices, and is then given by a random product of correlated matrices taken from the support of the defining measure. We give natural conditions on this support that imply that the Markov chain admits a unique invariant probability measure. We moreover prove the geometric convergence towards this invariant measure in the Wasserstein metric. Standard techniques from the theory of products of random matrices cannot be applied under our assumptions, and new techniques are developed, such as maximum likelihood-type estimations.
1 Introduction
We consider a complex vector space \(\mathbb {C}^k\) and its projective space \({\mathrm P}(\mathbb {C}^k)\) equipped with its Borel \(\sigma \)-algebra \(\mathcal {B}\). For a nonzero vector \(x\in {\mathbb {C}}^k\), we denote by \(\hat{x}\) the corresponding equivalence class of x in \({\mathrm P}({\mathbb {C}}^k)\). For a linear map \(v\in \mathrm {M}_k({\mathbb {C}})\) we denote by \(v\cdot \hat{x} \) the element of the projective space represented by \(v\,x\) whenever \(v\,x\ne 0\). We equip \(\mathrm {M}_k({\mathbb {C}})\) with its Borel \(\sigma \)-algebra and let \(\mu \) be a measure on \(\mathrm {M}_k({\mathbb {C}})\) with a finite second moment, \(\int _{\mathrm {M}_k({\mathbb {C}})} \Vert v\Vert ^2\,\mathrm{d}\mu (v)<\infty \), that satisfies the stochasticity condition
$$\int _{\mathrm {M}_k({\mathbb {C}})} v^*\,v\,\mathrm {d}\mu (v)=\mathrm {Id}_{{{\mathbb {C}}}^k} \qquad (1)$$
(we discuss this condition below).
In this article we are interested in particular Markov chains \((\hat{x}_n)\) on \({\mathrm P}(\mathbb {C}^k)\), defined by
$$\hat{x}_{n+1}=V_{n+1}\cdot \hat{x}_n,$$
where \(V_{n+1}\) is an \(\mathrm {M}_k({\mathbb {C}})\)-valued random variable distributed according to the probability density \( \Vert v x_n\Vert ^2/\Vert x_n\Vert ^2\, \mathrm {d} \mu (v)\), with \(x_n\) a representative of \(\hat{x}_n\). Condition (1) ensures this is a probability density for any \(x_n\ne 0\). More precisely, such a Markov chain is associated with the transition kernel given for a set \(S\in \mathcal {B}\) and \(\hat{x}\in {\mathrm P}({\mathbb {C}}^k)\) by
$$\Pi (\hat{x},S)=\int _{\mathrm {M}_k({\mathbb {C}})}\mathbf {1}_S(v\cdot \hat{x})\,\Vert v\,x\Vert ^2\,\mathrm {d}\mu (v), \qquad (2)$$
where x is an arbitrary normalized vector representative of \(\hat{x}\). Moreover, the event \(\{vx=0\}\) always has probability 0, hence the Markov chain is well-defined on \({\mathrm P}({{\mathbb {C}}}^k)\). We recall that for any probability measure \(\nu \), \(\nu \Pi \) is the probability measure defined by
$$\nu \Pi (S)=\int _{{\mathrm P}({{\mathbb {C}}}^k)}\Pi (\hat{x},S)\,\mathrm {d}\nu (\hat{x})$$
for any \(S\in \mathcal {B}\). A measure \(\nu \) is called invariant if \(\nu \Pi = \nu \).
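For concreteness, one step of this chain is easy to simulate when \(\mu \) is supported on finitely many matrices. The following is a minimal numerical sketch; the Kraus family below (a projective qubit measurement mixed with a Hadamard rotation) is a hypothetical choice satisfying (1), and the density in (2) then reduces to picking \(v_i\) with probability \(\Vert v_i\,x\Vert ^2\) for a unit representative x.

```python
import numpy as np

def step(x, kraus, rng):
    """One step of the chain: pick v_i with probability ||v_i x||^2
    (x a unit representative), then map x to v_i x / ||v_i x||."""
    probs = np.array([np.linalg.norm(v @ x) ** 2 for v in kraus])
    i = rng.choice(len(kraus), p=probs / probs.sum())
    y = kraus[i] @ x
    return y / np.linalg.norm(y)

# Hypothetical Kraus family: qubit measurement in the canonical basis,
# mixed with a Hadamard rotation with weight p.
p = 0.1
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
kraus = [np.sqrt(1 - p) * np.diag([1.0, 0.0]),
         np.sqrt(1 - p) * np.diag([0.0, 1.0]),
         np.sqrt(p) * H]
# Stochasticity condition (1): sum_i v_i^* v_i = Id.
assert np.allclose(sum(v.conj().T @ v for v in kraus), np.eye(2))

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0]) / np.sqrt(2)
for _ in range(50):
    x = step(x, kraus, rng)
```

By condition (1) the weights \(\Vert v_i\,x\Vert ^2\) already sum to one; the explicit normalization in `step` only guards against rounding.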
We are interested in the large-time distribution of \((\hat{x}_n)\). Note that \(\hat{x}_n\) can be written as
$$\hat{x}_n=V_n\,V_{n-1}\cdots V_1\cdot \hat{x}_0,$$
so that the study of \(\hat{x}_n\) can be formulated in terms of random products of matrices. Markov chains associated with random products of matrices have been studied in a more general setting where the weight appearing in the transition kernel (2) is proportional to \(\Vert v x\Vert ^s\) for some \(s \ge 0\), instead of \(\Vert v x\Vert ^2\). The classical case of products of independent, identically distributed random matrices pioneered by Kesten, Furstenberg and Guivarc’h corresponds to \(s = 0\). In that case, for i.i.d. invertible random matrices \(Y_1,Y_2,\ldots \), denoting \(S_n=Y_n\ldots Y_1\), one is usually interested in the asymptotic properties of
$$\frac{1}{n}\,\log \frac{\Vert S_n\,x\Vert }{\Vert x\Vert }$$
for any \(x\ne 0\). In particular, a law of large numbers, a central limit theorem and a large deviation principle have been obtained for this quantity, under contractivity and strong irreducibility assumptions [8, 11, 16]. Such results are closely linked to the uniqueness of the invariant measure of the Markov chain
$$\hat{x}_n=S_n\cdot \hat{x}_0=Y_n\cdots Y_1\cdot \hat{x}_0.$$
These results were generalized to the case \(s >0\) in [10]. Our framework corresponds to the case \(s=2\); in this case, and with the additional assumption (1), we provide a new method to study this Markov chain, and use it to derive the above results without assuming invertibility of the matrices, and with an optimal irreducibility assumption. We compare our approach with that of [10] at the end of this section.
The method that we employ is motivated by an interpretation of this process as statistics of a quantum system being repeatedly indirectly measured. Let us expand on this as we introduce more notation and terminology. The set of states of a quantum system described by a finite dimensional Hilbert space \({{\mathbb {C}}}^k\) is the set of density matrices \({\mathcal {D}}_k:=\{\rho \in \mathrm {M}_k({{\mathbb {C}}})\ |\ \rho \ge 0,\ {{\text {tr}}}\,\rho =1\}\). This set is convex and the set of its extreme points is called the set of pure states. This latter set is in one-to-one correspondence with the projective space \({{\mathrm P}({{\mathbb {C}}}^k)}\) by the bijection \({{\mathrm P}({{\mathbb {C}}}^k)}\ni \hat{x}\mapsto \pi _{\hat{x}}\in {\mathcal {D}}_k\) with \(\pi _{\hat{x}}\) the orthogonal projector on the corresponding ray in \({{\mathbb {C}}}^k\). The time evolution of the system conditioned on a measurement outcome is encoded in a matrix v that updates the state of the system. The support of \(\mu \) represents the set of possible updates, and the system is updated according to v with a probability density \({{\text {tr}}}(v \rho v^*)\, \mathrm{d}\mu (v)\). Given v, a state \(\rho \) is mapped to a state \(v \rho v^*/{{\text {tr}}}(v \rho v^*)\). Iterating this procedure defines a random sequence \((\rho _n)\) in \({\mathcal {D}}_k\) called a quantum trajectory: after n measurements with resulting matrices \(v_1,\dots ,v_n\) the state of the system becomes
$$\rho _n=\frac{v_n\cdots v_1\,\rho _0\,v_1^*\cdots v_n^*}{{{\text {tr}}}(v_n\cdots v_1\,\rho _0\,v_1^*\cdots v_n^*)}, \qquad (3)$$
where \((v_1,\ldots ,v_n)\) has probability density \({{\text {tr}}}(v_{n} \dots v_{1} \rho _0 v_{1}^*\ldots v_{n}^*)\,\mathrm{d}\mu ^{\otimes n}(v_1,\ldots ,v_n)\). In other words, the process Eq. (3) describes an evolution of a repeatedly measured quantum system.
A key result in the theory of quantum trajectories is the purification theorem obtained by Kümmerer and Maassen [17] showing that quantum trajectories \((\rho _n)\) defined on \(\mathcal {D}_k\) almost surely approach the set of pure states (which are the extreme points of \(\mathcal {D}_k\)) if and only if the following purification condition is satisfied:
- (Pur): Any orthogonal projector \(\pi \) such that for any \(n\in {{\mathbb {N}}}\), \(\pi v_1^*\ldots v_n^* v_n\ldots v_1 \pi \propto \pi \) for \(\mu ^{\otimes n}\)-almost all \((v_1,\ldots ,v_n)\), is of rank one
(we write \(X \propto Y\) for X, Y two operators if there exists \(\lambda \in {{\mathbb {C}}}\) such that \(X=\lambda Y\)).
Under this assumption, the long-time behavior of the Markov chain is essentially dictated by its form on the set of pure states, i.e. for \(\rho _0=\pi _{\hat{x}_0}\). It is an immediate observation that
$$\frac{v\,\pi _{\hat{x}}\,v^*}{{{\text {tr}}}(v\,\pi _{\hat{x}}\,v^*)}=\pi _{v\cdot \hat{x}}$$
for all \(v\in \mathrm {M}_k({\mathbb {C}})\) with \(v\,x\ne 0\). This way our Markov chain \((\hat{x}_n)\) corresponds to the quantum trajectory \((\rho _n)\) described above when \(\rho _0\) is a pure state \(\pi _{\hat{x}_0}\).
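The observation above can also be checked numerically: for a generic matrix v and a unit vector x (real, for simplicity), \(v\,\pi _{\hat{x}}\,v^*/{{\text {tr}}}(v\,\pi _{\hat{x}}\,v^*)\) is again a rank-one projector, onto the ray of \(v\,x\). A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
v = rng.normal(size=(3, 3))                 # a generic real matrix
x = rng.normal(size=3)
x = x / np.linalg.norm(x)                   # unit representative of the ray

pi_x = np.outer(x, x)                       # rank-one projector pi_{x}
lhs = v @ pi_x @ v.T / np.trace(v @ pi_x @ v.T)

y = v @ x
y = y / np.linalg.norm(y)                   # unit representative of v·x
pi_vx = np.outer(y, y)
assert np.allclose(lhs, pi_vx)              # v pi_x v^*/tr = pi_{v·x}
```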
Although ideas underlying our method are based on the connection of \((\hat{x}_n)\) with this physical problem, we will not explicitly use it in the paper. The notion of quantum trajectory originates in quantum optics [6], and Haroche’s Nobel prize winning experiment [9] is arguably the most prominent example of a system described by the above formalism. The reader interested in the involved mathematical structures might consult for example the review book [13] or the pioneering articles [14, 15, 17].
We will show that under the condition (Pur), the set of all invariant measures of the Markov chain (3) can be completely classified, depending on the operator \(\phi \) on \(\mathcal {D}_k\) describing the average evolution:
$$\phi (\rho )=\int _{\mathrm {M}_k({\mathbb {C}})} v\,\rho \,v^*\,\mathrm {d}\mu (v).$$
The map \(\phi \) on \({\mathcal {D}}_k\) is completely positive and trace-preserving. Such a map is often called a quantum channel (see e.g. [22]). It has in particular the property of mapping states to states. Brouwer’s fixed point theorem shows that there exists an invariant state, i.e. \(\rho \in {\mathcal {D}}_k\) such that \(\phi (\rho )=\rho \). A necessary and sufficient algebraic condition for uniqueness of this invariant state is (see e.g. [5, 7, 22])
- (\(\phi \)-Erg): There exists a unique minimal non-trivial subspace E of \({{\mathbb {C}}}^k\) such that \(\forall v\in {\text {supp}}\mu \), \(vE\subset E\).
If (\(\phi \)-Erg) holds with \(E={{\mathbb {C}}}^k\), then \(\phi \) is said to be irreducible. We chose the name (\(\phi \)-Erg) to avoid confusion with the notion of irreducibility for Markov chains. We moreover emphasize that we call this assumption (\(\phi \)-Erg) because it relies only on \(\phi \) and not on the different operators v in the support of \(\mu \): an equivalent statement of (\(\phi \)-Erg) is that there exists a unique minimal nonzero orthogonal projector \(\pi \) such that \(\phi (\pi )\le \lambda \pi \) for some \(\lambda \ge 0\) (see e.g. [20]).
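Numerically, an invariant state of \(\phi \) can be read off the matrix of \(\phi \) acting on vectorized density matrices. Below is a sketch for a finitely supported \(\mu \); the qubit Kraus family is a hypothetical example (a basis measurement mixed with a Hadamard), for which \(\phi \) is unital, so the invariant state is the maximally mixed one.

```python
import numpy as np

def channel_matrix(kraus):
    """Matrix of phi(rho) = sum_i v_i rho v_i^* on row-major vectorized
    matrices: vec(v rho v^*) = (v kron conj(v)) vec(rho)."""
    return sum(np.kron(v, v.conj()) for v in kraus)

def invariant_state(kraus):
    """Eigenvector of the channel matrix for the eigenvalue closest to 1,
    reshaped to a matrix and normalized to trace one."""
    k = kraus[0].shape[0]
    w, V = np.linalg.eig(channel_matrix(kraus))
    rho = V[:, np.argmin(np.abs(w - 1.0))].reshape(k, k)
    rho = (rho + rho.conj().T) / 2          # remove rounding asymmetry
    return rho / np.trace(rho).real

# Hypothetical qubit example: basis measurement mixed with a Hadamard.
p = 0.1
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
kraus = [np.sqrt(1 - p) * np.diag([1.0, 0.0]),
         np.sqrt(1 - p) * np.diag([0.0, 1.0]),
         np.sqrt(p) * H]
rho_inv = invariant_state(kraus)
assert np.allclose(sum(v @ rho_inv @ v.conj().T for v in kraus), rho_inv)
assert np.allclose(rho_inv, np.eye(2) / 2)  # this unital example fixes Id/2
```

The eigenvalue 1 always belongs to the spectrum of a trace-preserving map; under (\(\phi \)-Erg) it is simple, which is what makes the `argmin` selection unambiguous.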
We now state the main result of the paper:
Theorem 1.1
Assume that \(\mu \) satisfies assumptions (Pur) and (\(\phi \)-Erg). Then, the transition kernel \(\Pi \) has a unique invariant probability measure \(\nu _{\mathrm {inv}}\) and there exist \(m\in \{1,\ldots ,k\}\), \(C>0\) and \(0<\lambda <1\) such that for any probability measure \(\nu \) over \(\big ({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}}\big )\) and any \(n\in {{\mathbb {N}}}\),
$$W_1\Big (\frac{1}{m}\sum _{j=0}^{m-1}\nu \Pi ^{n+j},\,\nu _{\mathrm {inv}}\Big )\le C\,\lambda ^n, \qquad (6)$$
where \(W_1\) is the Wasserstein metric of order 1.
The Wasserstein metric is constructed with respect to a natural metric on the complex projective space. This metric is defined, for \(\hat{x},\,\hat{y}\) in \({{\mathrm P}({{\mathbb {C}}}^k)}\), by
$$d(\hat{x},\hat{y})=\sqrt{1-|\langle x,y\rangle |^2}, \qquad (7)$$
where \(x,\,y\) are unit length representative vectors of \(\hat{x}\), \(\hat{y}\), and \(\langle \,\cdot \,,\,\cdot \,\rangle \) is the canonical hermitian inner product on \(\mathbb {C}^k\).
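This metric is straightforward to implement; a small sketch (the formula is invariant under phase changes of the representatives, so it is well defined on rays):

```python
import numpy as np

def proj_dist(x, y):
    """d(x, y) = sqrt(1 - |<x, y>|^2); representatives are normalized
    first, and phases cancel in the modulus."""
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    return np.sqrt(max(0.0, 1.0 - abs(np.vdot(x, y)) ** 2))

e0, e1 = np.eye(2)
assert np.isclose(proj_dist(e0, e1), 1.0)         # orthogonal rays: maximal
assert np.isclose(proj_dist(e0, 1j * e0), 0.0)    # same ray, phase ignored
assert np.isclose(proj_dist(e0, e0 + e1), np.sqrt(0.5))
```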
Let us now compare our results to those of the article [10] of Guivarc’h and Le Page. They consider a probability distribution \(\mu \) with support in \(\mathrm {GL}_k({\mathbb {C}})\), without requiring the normalization condition (1), and study the transition kernel on \({{\mathrm P}({{\mathbb {C}}}^k)}\) given, for \(S\in {\mathcal {B}}\), by
$$\Pi _s(\hat{x},S)=\int _{\mathrm {GL}_k({\mathbb {C}})}\mathbf {1}_S(v\cdot \hat{x})\,\frac{\Vert v\,x\Vert ^s}{\int _{\mathrm {GL}_k({\mathbb {C}})}\Vert w\,x\Vert ^s\,\mathrm {d}\mu (w)}\,\mathrm {d}\mu (v).$$
In the case \(s=2\), Theorem A of [10] implies the conclusions of Theorem 1.1 under two assumptions:
- strong irreducibility, in the sense that there is no non-trivial finite union of proper subspaces of \({{\mathbb {C}}}^k\) left invariant by all \(v\in \mathrm {supp}\,\mu \),
- contractivity, in the sense that there exists a sequence \((a_n)\) in \(T_\mu \), the smallest closed sub-semigroup of \(\mathrm {GL}_k({{\mathbb {C}}})\) containing \({\text {supp}}\mu \), such that \(\lim _{n\rightarrow \infty }a_n/\Vert a_n\Vert \) exists and is of rank one.
It is, however, immediate that strong irreducibility of \(\mu \) implies (\(\phi \)-Erg) with \(E={{\mathbb {C}}}^k\). In addition, if we assume \({\text {supp}}\mu \subset \mathrm {GL}_k({\mathbb {C}})\) and \({\text {supp}}\mu \) is strongly irreducible, the equivalence
$$(\mathrm {Pur})\;\Longleftrightarrow \;\text {contractivity}$$
holds (see “Appendix A”). Our results therefore offer a strong refinement of [10] in the restricted framework of \(s=2\) with \(\int v^* v \,\mathrm{d}\mu (v)=\mathrm{Id}_{{{\mathbb {C}}}^k}\). This assumption, although mathematically restrictive, is automatically verified in the framework of repeated (indirect) quantum measurements as described earlier in this section.
The article is structured as follows. Section 2 is devoted to the first part of Theorem 1.1, that is the uniqueness of the invariant measure. In Sect. 3 we show the geometric convergence towards the invariant measure with respect to the 1-Wasserstein metric. In Sect. 4 we discuss the Lyapunov exponents of the process and relate them to the convergence between the Markov chain and an estimate of the chain used in our proofs.
Notation For \(x\in {{\mathbb {C}}}^k\setminus \{0\}\), \(\hat{x}\) is its equivalence class in \({\mathrm P}({{\mathbb {C}}}^k)\) and, for \(\hat{x}\) in \({\mathrm P}({{\mathbb {C}}}^k)\), x is an arbitrary norm one vector representative of \(\hat{x}\). If e.g. \({{\mathbb {P}}}_\nu \) (resp. \({\mathbb {P}}^\rho \)) is a probability measure (depending on some a priori object \(\nu \) (resp. \(\rho \))) then \({{\mathbb {E}}}_\nu \) (resp. \({{\mathbb {E}}}^\rho \)) is the expectation with respect to \({{\mathbb {P}}}_\nu \) (resp. \({{\mathbb {P}}}^\rho \)). \({{\mathbb {N}}}\) represents the set of positive integers \(\{1,2,\ldots \}\).
2 Uniqueness of the invariant measure
This section concerns essentially the first part of Theorem 1.1. More precisely, under (\(\phi \)-Erg) and (Pur) we show that the Markov chain has a unique invariant measure. Note that an invariant measure always exists since \({\mathrm P}({\mathbb {C}}^k)\) is compact. We start by introducing a probability space describing both the state \(\hat{x}\in {\mathrm P}({{\mathbb {C}}}^k)\) and the sequence of matrices \((v_1,v_2,\ldots )\) such that \((v_n\ldots v_1\cdot \hat{x})\) has the same distribution as the Markov chain \((\hat{x}_n)\). Then, in Proposition 2.1, we show that the marginal on the matrix sequence is the same for any \(\Pi \)-invariant probability measure as long as (\(\phi \)-Erg) holds. In Proposition 2.2 and Lemma 2.3 we show that \((\hat{x}_n)\) is asymptotically a function of \((v_1,v_2,\ldots )\). We conclude on the uniqueness of the invariant measure in Corollary 2.4.
We now proceed to introduce some additional notation. We consider the space of infinite sequences \(\Omega :=\mathrm {M}_k({{\mathbb {C}}})^{{\mathbb {N}}}\), write \(\omega = (v_1,v_2, \dots )\) for any such infinite sequence, and denote by \(\pi _n\) the canonical projection on the first n components, \(\pi _n(\omega )=(v_1,\ldots ,v_n)\). Let \({\mathcal {M}}\) be the Borel \(\sigma \)-algebra on \(\mathrm {M}_k({{\mathbb {C}}})\). For \(n\in {{\mathbb {N}}}\), let \(\mathcal {O}_n\) be the \(\sigma \)-algebra on \(\Omega \) generated by the n-cylinder sets, i.e. \(\mathcal {O}_n = \pi _n^{-1}({\mathcal {M}}^{\otimes n})\). We equip the space \(\Omega \) with the smallest \(\sigma \)-algebra \(\mathcal {O}\) containing \(\mathcal {O}_n\) for all \(n\in {{\mathbb {N}}}\). We let \({\mathcal {B}}\) be the Borel \(\sigma \)-algebra on \({\mathrm P}({\mathbb {C}}^k)\), and denote
$$\mathcal {J}:={\mathcal {B}}\otimes \mathcal {O}\quad \text {and}\quad \mathcal {J}_n:={\mathcal {B}}\otimes \mathcal {O}_n,\ n\in {{\mathbb {N}}}.$$
This makes \(\big ({\mathrm P}({\mathbb {C}}^k)\times \Omega ,\mathcal {J}\big )\) a measurable space. With a small abuse of notation we denote the sub-\(\sigma \)-algebra \(\{\emptyset ,{\mathrm P}({\mathbb {C}}^k)\}\times \mathcal {O}\) by \(\mathcal {O}\), and equivalently identify any \(\mathcal {O}\)-measurable function f with the \(\mathcal {J}\)-measurable function f satisfying \(f(\hat{x},\omega ) = f(\omega )\).
For \(i\in {\mathbb {N}}\), we consider the random variables \(V_i : \Omega \rightarrow \mathrm {M}_k({\mathbb {C}})\),
$$V_i(\omega )=v_i \quad \text {for } \omega =(v_1,v_2,\ldots ),$$
and we introduce \({\mathcal {O}}_n\)-measurable random variables \((W_n)\) defined for all \(n\in {\mathbb {N}}\) as
$$W_n:=V_n\,V_{n-1}\cdots V_1. \qquad (8)$$
With a small abuse of notation we identify cylinder sets and their bases, and extend this identification to several associated objects. In particular we identify \(O_n\in {\mathcal {M}}^{\otimes n}\) with \(\pi _n^{-1}(O_n)\), a function f on \({\mathcal {M}}^{\otimes n}\) with \(f \circ \pi _n\) and a measure \(\mu ^{\otimes n}\) with the measure \(\mu ^{\otimes n} \circ \pi _n\). Since \(\mu \) is not necessarily finite, we cannot extend \((\mu ^{\otimes n})\) into a measure on \(\Omega \).
Let \(\nu \) be a probability measure over \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\). We extend it to a probability measure \(\mathbb {P}_\nu \) over \(({{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega ,\mathcal {J})\) by letting, for any \(S\in {\mathcal {B}}\) and any cylinder set \(O_n \in \mathcal {O}_n\),
$$\mathbb {P}_\nu (S\times O_n)=\int _{S}\int _{O_n}\Vert v_n\cdots v_1\,x\Vert ^2\,\mathrm {d}\mu ^{\otimes n}(v_1,\ldots ,v_n)\,\mathrm {d}\nu (\hat{x}). \qquad (9)$$
From relation (1), it is easy to check that the expression (9) defines a consistent family of probability measures and, by Kolmogorov’s theorem, this defines a unique probability measure \({{\mathbb {P}}}_\nu \) on \({{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega \). In addition, the restriction of \(\mathbb {P}_\nu \) to \({\mathcal {B}}\otimes \{\emptyset ,\Omega \}\) is by construction \(\nu \).
We now define the random process \((\hat{x}_n)\). For \((\hat{x}, \omega )\in {{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega \) we define \(\hat{x}_0(\hat{x}, \omega )=\hat{x}\). Note that for any n, the definition (9) of \({{\mathbb {P}}}_\nu \) imposes
$$\mathbb {P}_\nu \big (W_n\,x=0\big )=0.$$
This allows us to define a sequence \((\hat{x}_n)\) of \((\mathcal {J}_n)\)-adapted random variables on the probability space \(({{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega ,\mathcal {J}, {{\mathbb {P}}}_\nu )\) by letting
$$\hat{x}_n(\hat{x},\omega )=W_n(\omega )\cdot \hat{x}$$
whenever the expression makes sense, i.e. for any \(\omega \) such that \(W_n(\omega ) x\ne 0\), and extending it arbitrarily to the whole of \(\Omega \). The process \((\hat{x}_n)\) on \(({{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega ,\mathcal {J}, \mathbb {P}_\nu )\) has the same distribution as the Markov chain defined by \(\Pi \) and initial probability measure \(\nu \).
Let us highlight the relation between \({{\mathbb {P}}}_\nu \) and density matrices. To that end, let
$$\rho _\nu :=\int _{{\mathrm P}({{\mathbb {C}}}^k)}\pi _{\hat{x}}\,\mathrm {d}\nu (\hat{x}). \qquad (11)$$
By linearity and positivity of the expectation, \(\rho _\nu \in {\mathcal {D}}_k\). Note that, conversely, for a given \(\rho \in {\mathcal {D}}_k\) there exists \(\nu \) (in general non-unique) such that \(\rho _\nu = \rho \). For example, if a spectral decomposition of \(\rho \) is \(\rho =\sum _j p_j \pi _{x_j}\) then necessarily \(\sum _j p_j=1\), so that \(\nu = \sum _j p_j \delta _{\hat{x}_j}\) is a probability measure on \({{\mathrm P}({{\mathbb {C}}}^k)}\), and it satisfies the desired relation (11).
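The construction of such a \(\nu \) from a spectral decomposition can be illustrated numerically: diagonalizing a density matrix \(\rho \) yields weights \(p_j\) and rays \(\hat{x}_j\) with \(\sum _j p_j\,\pi _{\hat{x}_j}=\rho \). A sketch on a randomly generated \(\rho \):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = A @ A.conj().T
rho = rho / np.trace(rho).real              # a generic density matrix

# Spectral decomposition rho = sum_j p_j pi_{x_j}: the measure
# nu = sum_j p_j delta_{x_j} then satisfies rho_nu = rho.
p, vecs = np.linalg.eigh(rho)
rho_nu = sum(p[j] * np.outer(vecs[:, j], vecs[:, j].conj())
             for j in range(3))
assert np.isclose(p.sum(), 1.0)             # the p_j form a probability vector
assert np.allclose(rho_nu, rho)
```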
This relation motivates the following definition of probability measures over \((\Omega ,\mathcal {O})\). For \(\rho \in \mathcal {D}_k\) and any cylinder set \(O_n \in \mathcal {O}_n\), let
$$\mathbb {P}^\rho (O_n)=\int _{O_n}{{\text {tr}}}\big (v_n\cdots v_1\,\rho \,v_1^*\cdots v_n^*\big )\,\mathrm {d}\mu ^{\otimes n}(v_1,\ldots ,v_n).$$
In particular, for any \(S\in \mathcal {B}\) and \(A\in \mathcal {O}\),
$$\mathbb {P}_\nu (S\times A)=\int _S \mathbb {P}^{\pi _{\hat{x}}}(A)\,\mathrm {d}\nu (\hat{x}). \qquad (13)$$
The following proposition elucidates further the connection between \({{\mathbb {P}}}_\nu \) and \({{\mathbb {P}}}^{\rho _\nu }\).
Proposition 2.1
The marginal of \(\mathbb {P}_\nu \) on \(\mathcal {O}\) is the probability measure \({\mathbb {P}}^{\rho _\nu }\). Moreover, if (\(\phi \)-Erg) holds, \({{\mathbb {P}}}^{\rho _{\nu _a}}={{\mathbb {P}}}^{\rho _{\nu _b}}\) for any two \(\Pi \)-invariant probability measures \(\nu _a\) and \(\nu _b\).
Proof
By construction it is sufficient to check the equality of the measures on cylinder sets. Let \(O_n \in \mathcal {O}_n\); from the definition of \(\mathbb {P}_\nu \), and the linearity of the trace and the integral, we have
The equality between the marginal of \({{\mathbb {P}}}_\nu \) on \(\mathcal {O}\) and \({{\mathbb {P}}}^{\rho _\nu }\) follows.
If \(\nu \) is an invariant measure, on the one hand
On the other hand,
so that \(\rho _\nu \) is a fixed point of \(\phi \). Hence if (\(\phi \)-Erg) holds, \(\rho _\nu \) is the unique fixed point of \(\phi \) in \({\mathcal {D}}_k\). Hence \(\rho _{\nu _a}=\rho _{\nu _b}\) and \({{\mathbb {P}}}^{\rho _{\nu _a}}={{\mathbb {P}}}^{\rho _{\nu _b}}\) holds. \(\square \)
In the following we use the measure \({{\mathbb {P}}}^{\mathrm {ch}}={{\mathbb {P}}}^{\frac{1}{k}\mathrm{Id}_{{{\mathbb {C}}}^k}}\) associated to the operator \(\mathrm{Id}_{{\mathbb {C}}^k}/k\in {\mathcal {D}}_k\) as a reference measure. Since for any \(\rho \in {\mathcal {D}}_k\) there exists a constant c such that \(\rho \le c \,\frac{\mathrm{Id}_{{\mathbb {C}}^k}}{k}\), the measure \({{\mathbb {P}}}^\rho \) is absolutely continuous w.r.t. \({{\mathbb {P}}}^{\mathrm {ch}}\). We will denote absolute continuity between measures with the symbol \(\ll \), so that we have here
$${{\mathbb {P}}}^\rho \ll {{\mathbb {P}}}^{\mathrm {ch}}$$
for all \(\rho \in {\mathcal {D}}_k\). The Radon–Nikodym derivative will be made explicit in Proposition 2.2. To that end, we use a particular \((\mathcal {O}_n)\)-adapted process. We define a sequence of matrix-valued random variables:
$$M_n:=\frac{W_n^*\,W_n}{{{\text {tr}}}(W_n^*\,W_n)},$$
and extend the definition arbitrarily whenever \({{\text {tr}}}(W_n^*W_n)=0\). The latter alternative appears with probability 0: indeed, \({{\mathbb {P}}}^{\mathrm {ch}}\big ({{\text {tr}}}(W_n^*W_n)=0\big )=0\) and then by the absolute continuity of \({{\mathbb {P}}}^\rho \) with respect to \({{\mathbb {P}}}^{\mathrm {ch}}\) we have \({{\mathbb {P}}}_\nu \big ({{\text {tr}}}(W_n^*W_n)=0\big )={{\mathbb {P}}}^{\rho _\nu }\big ({{\text {tr}}}(W_n^*W_n)=0\big )=0\) for any measure \(\nu \). The key property of \(M_n\), that we establish in the proof of Proposition 2.2, is that it is an \((\mathcal {O}_n)\)-martingale with respect to \({{\mathbb {P}}}^{\mathrm {ch}}\).
From the existence of a polar decomposition for \(W_n\), for each n, there exists a unitary matrix-valued random variable \(U_n\) such that
$$W_n=\sqrt{{{\text {tr}}}(W_n^*W_n)}\;U_n\,M_n^{1/2}. \qquad (14)$$
This process \((U_n)\) can be chosen to be \((\mathcal {O}_n)\)-adapted.
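The purification phenomenon formalized in Proposition 2.2 below is easy to observe numerically: sampling \((v_1,\ldots ,v_n)\) under \({{\mathbb {P}}}^{\mathrm {ch}}\) step by step and forming \(M_n=W_n^*W_n/{{\text {tr}}}(W_n^*W_n)\), the largest eigenvalue of \(M_n\) tends to 1. A sketch with a hypothetical finitely supported \(\mu \) satisfying (1) and (Pur):

```python
import numpy as np

def sample_M(kraus, n, rng):
    """Sample W_n under P^ch for finitely supported mu: given W, outcome i
    has conditional probability tr(v_i^* v_i W W^*) / tr(W W^*).
    Returns M_n = W_n^* W_n / tr(W_n^* W_n)."""
    k = kraus[0].shape[0]
    W = np.eye(k)
    for _ in range(n):
        probs = np.array([np.trace(v.conj().T @ v @ W @ W.conj().T).real
                          for v in kraus])
        i = rng.choice(len(kraus), p=probs / probs.sum())
        W = kraus[i] @ W
        W = W / np.linalg.norm(W)           # harmless rescaling; M_n is scale-free
    M = W.conj().T @ W
    return M / np.trace(M).real

# Hypothetical Kraus family satisfying (1) and (Pur).
p = 0.1
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
kraus = [np.sqrt(1 - p) * np.diag([1.0, 0.0]),
         np.sqrt(1 - p) * np.diag([0.0, 1.0]),
         np.sqrt(p) * H]
rng = np.random.default_rng(1)
M = sample_M(kraus, 200, rng)
assert np.linalg.eigvalsh(M)[-1] > 0.99     # M_n close to a rank-one projector
```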
The key technical results about \(M_n\) needed for our proofs are summarized in the following proposition. Recall that any \(\mathcal {O}\)-measurable function f is extended to a \(\mathcal {J}\)-measurable function by setting \(f(\hat{x},\omega )=f(\omega )\) for any \((\hat{x},\omega )\in {\mathrm P}({{\mathbb {C}}}^k)\times \Omega \).
Proposition 2.2
For any probability measure \(\nu \) over \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\), \((M_n)\) converges \(\mathbb {P}_\nu {\text {-}}\mathrm {a.s.}\) and in \(L^1\)-norm to an \(\mathcal {O}\)-measurable random variable \(M_\infty \). The change of measure formula
$$\frac{\mathrm {d}{{\mathbb {P}}}^\rho }{\mathrm {d}{{\mathbb {P}}}^{\mathrm {ch}}}=k\,{{\text {tr}}}(\rho \,M_\infty )$$
holds true for all \(\rho \in {\mathcal {D}}_k\).
Moreover, the measure \(\mu \) verifies (Pur) if and only if \(M_\infty \) is \({{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\) a rank one projection for any probability measure \(\nu \) over \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\).
Proof
We start the proof by showing that \(M_n\) is a \({{\mathbb {P}}}^{\mathrm {ch}}\)-martingale. Recall that for all \(n\in {\mathbb {N}}\) and all \(O_n\in \mathcal {O}_n\),
$${{\mathbb {P}}}^{\mathrm {ch}}(O_n)=\frac{1}{k}\int _{O_n}{{\text {tr}}}(W_n^*\,W_n)\,\mathrm {d}\mu ^{\otimes n}.$$
From the definition of \(W_n\), Eq. (8),
$$W_{n+1}=V_{n+1}\,W_n.$$
This implies that for an arbitrary \(\mathcal {O}_n\)-measurable random variable Y
$$\begin{aligned} {{\mathbb {E}}}^{\mathrm {ch}}(Y\,M_{n+1})&=\frac{1}{k}\int Y\,W_n^*\,v_{n+1}^*\,v_{n+1}\,W_n\,\mathrm {d}\mu ^{\otimes (n+1)}(v_1,\ldots ,v_{n+1})\\&=\frac{1}{k}\int Y\,W_n^*\,W_n\,\mathrm {d}\mu ^{\otimes n}(v_1,\ldots ,v_n)={{\mathbb {E}}}^{\mathrm {ch}}(Y\,M_n), \end{aligned}$$
where the second equality follows from the stochasticity condition (1), \(\int v^* v\, \mathrm{d} \mu (v) = \mathrm{Id}_{{{\mathbb {C}}}^k}\). This shows that \((M_n)\) is an \((\mathcal {O}_n)\)-martingale w.r.t. \({{\mathbb {P}}}^{\mathrm {ch}}\). Since the sequence \((M_n)\) is composed of positive semidefinite matrices of trace one, its coordinates are a.s. uniformly bounded by 1. Therefore, the martingale property implies the \(L^1\) and a.s. convergence of \((M_n)\) to an \(\mathcal {O}\)-measurable random variable \(M_\infty \). Now note that for any \(\rho \in {\mathcal {D}}_k\),
$$\frac{\mathrm {d}{{\mathbb {P}}}^\rho }{\mathrm {d}{{\mathbb {P}}}^{\mathrm {ch}}}\Big |_{\mathcal {O}_n}=k\,{{\text {tr}}}(\rho \,M_n).$$
This way, the convergence of \((M_n)\) implies the change of measure formula.
We now prove the last part of the proposition. Using the martingale property one can see that for all \(n\in {\mathbb {N}}\), and any fixed \(p \in \mathbb {N}\),
Since \((M_n)\) is bounded and almost surely convergent, applying Lebesgue’s dominated convergence theorem to each \({{\mathbb {E}}}^{\mathrm {ch}}(M_{k+n+1}^2)\), \(k=0,\ldots ,p-1\) as \(n\rightarrow \infty \) implies that the term \(V_n^p\) converges as n goes to infinity. Then, using the monotone convergence theorem in the last line of (17), we get that
$${{\mathbb {E}}}^{\mathrm {ch}}\Big (\sum _{k=0}^\infty {{\mathbb {E}}}^{\mathrm {ch}}\big ((M_{k+p}-M_k)^2\,\big |\,{\mathcal {O}}_k\big )\Big )<\infty .$$
It implies that the series \(\sum _{k=0}^\infty {{\mathbb {E}}}^{\mathrm {ch}}\big ((M_{k+p}-M_k)^2\vert {\mathcal {O}}_k\big )\) is almost surely finite. This yields that
$$\lim _{n\rightarrow \infty }{{\mathbb {E}}}^{\mathrm {ch}}\big ((M_{n+p}-M_n)^2\,\big |\,{\mathcal {O}}_n\big )=0\quad {{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}$$
Since all the norms are equivalent in finite dimension, Jensen’s inequality implies
$$\lim _{n\rightarrow \infty }{{\mathbb {E}}}^{\mathrm {ch}}\big (\Vert M_{n+p}-M_n\Vert _1\,\big |\,{\mathcal {O}}_n\big )=0\quad {{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}$$
At this stage we use the polar decomposition of \((W_n)\), Eq. (14), to write
Then we get an expression for the conditional expectation, see the first part of the proof,
We used non-negativity of \({{\text {tr}}}(v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_nM_nU_n^*)\) to get the second equality. The above equation holds for \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost all realizations \(\big (U_n(\omega )\big )\) of \((U_n)\). Since the group of unitary matrices is compact, for any fixed \(\omega \) there exists a subsequence along which \(\big (U_n(\omega )\big )\) converges to a unitary matrix \(U_\infty (\omega )\). Taking the limit along this subsequence in the above expression yields (we drop \(\omega \) for notational simplicity):
This implies that
$$M_\infty ^{1/2}\,U_\infty ^*\,v_{1}^*\ldots v_{p}^*\,v_{p}\ldots v_{1}\,U_\infty \,M_\infty ^{1/2}={{\text {tr}}}\big (v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}\,U_\infty M_\infty U_\infty ^*\big )\,M_\infty $$
for \(\mu ^{\otimes p}\)-almost all \((v_1,\ldots ,v_p)\).
Denoting by \(\pi _\infty \) the orthogonal projector onto the range of \(M_\infty \), the above condition is equivalent to \(\pi _\infty U_\infty ^*v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty \pi _\infty =\lambda \pi _\infty \) with \(\lambda ={{\text {tr}}}(v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty M_\infty U_\infty ^*)\). Finally, it follows that
for \(\mu ^{\otimes p}\)-almost all \((v_1,\ldots ,v_p)\). Since \(U_\infty \pi _\infty U_\infty ^*\) is an orthogonal projector, the condition (Pur) implies (reintroducing \(\omega \)) that \({\text {rank}}(M_\infty (\omega ))={\text {rank}}(U_\infty (\omega ) \pi _\infty (\omega ) U_\infty ^*(\omega ))=1\). Since \(M_\infty (\omega )\) is a trace one, positive semidefinite matrix this means that \(M_\infty (\omega )\) is a rank one projector. Since this conclusion holds true for \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost all \(\omega \) this establishes that the condition (Pur) implies that \(M_\infty \) is \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost surely a rank 1 projection.
For the converse implication, assume that \(M_\infty \) is \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost surely a rank one projection but that (Pur) does not hold. Then there exists \(\pi \), a rank two orthogonal projector, such that for all \(n\in {{\mathbb {N}}}\),
$$\pi \,v_1^*\ldots v_n^*\,v_n\ldots v_1\,\pi \propto \pi $$
\(\mu ^{\otimes n}\)-almost everywhere. Since \(\mu ^{\otimes n}\)-almost everywhere \(M_n\propto W_n^*W_n\), we get that
$$\pi \,M_n\,\pi \propto \pi $$
\(\mu ^{\otimes n}\)-almost everywhere. Thus, \(\pi M_\infty \pi \propto \pi \) and, under our assumption that \({\text {rank}}M_\infty =1\;{{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\) and \({\text {rank}}\pi =2\), this implies that \(\pi M_\infty \pi =0\), \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost surely. On the other hand for all \(n\in {\mathbb {N}}\) we have \({{\mathbb {E}}}^{\mathrm {ch}}(M_n)=\mathrm{Id}_{{{\mathbb {C}}}^k}\), and the \(L^1\) convergence implies that \({{\mathbb {E}}}^{\mathrm {ch}}(M_\infty )=\mathrm{Id}_{{{\mathbb {C}}}^k}\). Then, \({{\mathbb {E}}}^{\mathrm {ch}}(\pi M_\infty \pi )=\pi \) which contradicts \(\pi M_\infty \pi =0\;{{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\)\(\square \)
By the polar decomposition, the rank of \(W_n\) is equal to the rank of \(M_n\) and the proposition thus implies that \(W_n \rho _0 W_n^*/{{\text {tr}}}(W_n \rho _0 W_n^*)\) approaches the set of pure states for any \(\rho _0\in {\mathcal {D}}_k\) if and only if (Pur) holds. This is the result of Maassen and Kümmerer [17] mentioned in the introduction. Though \(M_n\) is not used in [17], the proof relies on similar ideas.
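The martingale property of \((M_n)\) used in the proof can also be verified directly: for a finitely supported \(\mu \), the conditional expectation \({{\mathbb {E}}}^{\mathrm {ch}}(M_{n+1}\,|\,{\mathcal {O}}_n)\) is a finite weighted sum, and condition (1) makes it collapse to \(M_n\). A sketch with a hypothetical Kraus family:

```python
import numpy as np

def M_of(W):
    """M = W^* W / tr(W^* W)."""
    G = W.conj().T @ W
    return G / np.trace(G)

# Hypothetical Kraus family with sum_i v_i^* v_i = Id.
p = 0.1
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
kraus = [np.sqrt(1 - p) * np.diag([1.0, 0.0]),
         np.sqrt(1 - p) * np.diag([0.0, 1.0]),
         np.sqrt(p) * H]

rng = np.random.default_rng(4)
W = rng.normal(size=(2, 2))                 # an arbitrary realized W_n

# E^ch(M_{n+1} | O_n): outcome i has conditional probability
# tr(W^* v_i^* v_i W) / tr(W^* W), after which M_{n+1} = M_of(v_i W).
trWW = np.trace(W.conj().T @ W)
cond = sum((np.trace(W.conj().T @ v.conj().T @ v @ W) / trWW) * M_of(v @ W)
           for v in kraus)
assert np.allclose(cond, M_of(W))           # the martingale property
```

The cancellation is exact: each weighted term reduces to \(W^*v_i^*v_iW/{{\text {tr}}}(W^*W)\), and summing over i uses only condition (1).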
We are now in the position to show that the Markov chain \((\hat{x}_n)\) is asymptotically a function of an \(\mathcal {O}\)-measurable process. This is expressed in the following lemma. Whenever (Pur) holds, we denote by \(\hat{z} \in {{\mathrm P}({{\mathbb {C}}}^k)}\) the \(\mathcal {O}\)-measurable random variable defined by
$$M_\infty =\pi _{\hat{z}}.$$
Recall that \(d(\cdot ,\cdot )\), defined by Eq. (7), is our metric on \({{\mathrm P}({{\mathbb {C}}}^k)}\).
Lemma 2.3
Assume (Pur) holds. Then for any probability measure \(\nu \) on \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\),
$$\lim _{n\rightarrow \infty }d\big (\hat{x}_n,\,U_n\cdot \hat{z}\big )=0\quad \mathbb {P}_\nu {\text {-}}\mathrm {a.s.}$$
Proof
We start the proof by showing that for any \(\nu \)
$$\lim _{n\rightarrow \infty }M_n^{1/2}\cdot \hat{x}=\hat{z}\quad \mathbb {P}_\nu {\text {-}}\mathrm {a.s.} \qquad (19)$$
Let \(\hat{x}\) be fixed and recall from Proposition 2.2 that (Pur) implies \(M_\infty =\pi _{\hat{z}}\). Since \(M_\infty x=\langle z,x\rangle z\), in order to show (19), it is enough to show that \(\hat{x}\) is \({{\mathbb {P}}}_\nu \)-almost surely not orthogonal to \(\hat{z}\). From Eq. (13) and the change of measure formula in Proposition 2.2,
$$\mathbb {P}_\nu \big (\langle z,x\rangle =0\big )=\int _{{\mathrm P}({{\mathbb {C}}}^k)}{{\mathbb {E}}}^{\mathrm {ch}}\big (k\,{{\text {tr}}}(\pi _{\hat{x}}\,\pi _{\hat{z}})\,\mathbf {1}_{\langle z,x\rangle =0}\big )\,\mathrm {d}\nu (\hat{x})=0.$$
Hence the event \(\{{{\text {tr}}}(\pi _{\hat{x}}\pi _{\hat{z}})=|\langle z,x\rangle |^2=0\}\) has \({{\mathbb {P}}}_{\nu }\)-measure 0. This proves the required claim, and (19) follows from the almost sure convergence of \(M_n\) to \(\pi _{\hat{z}}\).
Now using the polar decomposition, Eq. (14), and the fact that proportionality of vectors amounts to equality of their equivalence classes in \({\mathrm P}({\mathbb {C}}^k)\), we have
$$\hat{x}_n=W_n\cdot \hat{x}=U_n\,M_n^{1/2}\cdot \hat{x}.$$
The first part of the proof then yields
$$\lim _{n\rightarrow \infty }d\big (\hat{x}_n,\,U_n\cdot \hat{z}\big )=\lim _{n\rightarrow \infty }d\big (M_n^{1/2}\cdot \hat{x},\,\hat{z}\big )=0\quad \mathbb {P}_\nu {\text {-}}\mathrm {a.s.}$$
\(\square \)
The uniqueness of the invariant measure, which is the first part of Theorem 1.1, follows as a corollary.
Corollary 2.4
Assume (Pur) and (\(\phi \)-Erg). Then the Markov kernel \(\Pi \) admits a unique invariant probability measure.
Proof
For an invariant measure \(\nu \), the random variable \(\hat{x}_n\) is \(\nu \)-distributed for all \(n \in \mathbb {N}\). In particular, \( \mathbb {E}_\nu \big (f(\hat{x}_n)\big )\) is constant for any continuous function f. On the other hand Lemma 2.3 and Lebesgue’s dominated convergence theorem imply that
$$\lim _{n\rightarrow \infty }\Big ({{\mathbb {E}}}_\nu \big (f(\hat{x}_n)\big )-{{\mathbb {E}}}_\nu \big (f(U_n\cdot \hat{z})\big )\Big )=0.$$
Hence
$$\int _{{\mathrm P}({{\mathbb {C}}}^k)} f\,\mathrm {d}\nu =\lim _{n\rightarrow \infty }{{\mathbb {E}}}_\nu \big (f(U_n\cdot \hat{z})\big ). \qquad (20)$$
Assume now that there exist two invariant measures \(\nu _a\) and \(\nu _b\). Since \(U_n\cdot \hat{z}\) is \(\mathcal {O}\)-measurable, Proposition 2.1 implies
$${{\mathbb {E}}}_{\nu _a}\big (f(U_n\cdot \hat{z})\big )={{\mathbb {E}}}_{\nu _b}\big (f(U_n\cdot \hat{z})\big )\quad \text {for all }n\in {{\mathbb {N}}}.$$
Then Eq. (20) applied with \(\nu =\nu _a\), resp. \(\nu = \nu _b\) gives
$$\int _{{\mathrm P}({{\mathbb {C}}}^k)} f\,\mathrm {d}\nu _a=\int _{{\mathrm P}({{\mathbb {C}}}^k)} f\,\mathrm {d}\nu _b$$
for any continuous function f,
which means that \(\nu _a=\nu _b\) and the uniqueness is proved. \(\square \)
Assuming only (Pur) we can actually completely characterize the set of invariant measures.
Proposition 2.5
Assuming (Pur) there exists a set \(\{F_j\}_{j=1}^d\) of mutually orthogonal subspaces of \({{\mathbb {C}}}^k\) such that for each \(j\in \{1,\ldots ,d\}\) there exists a unique \(\Pi \)-invariant probability measure \(\nu _j\) supported on \({\mathrm P}(F_j)\), and the set of \(\Pi \)-invariant probability measures is the convex hull of \(\{\nu _j\}_{j=1}^d\).
The subspaces \(F_j\) are the ranges of the extremal fixed points of \(\phi \) in \({\mathcal {D}}_k\). This is shown in the proof of Proposition 2.5, which we give in “Appendix B”.
Remark 2.6
Assuming (\(\phi \)-Erg) only, the chain might or might not have a unique invariant probability measure. Indeed, if \({\text {supp}}\mu \subset \mathrm {SU}(k)\), Assumption (Pur) is trivially not verified and, as proved in “Appendix C”, the uniqueness of the invariant measure depends on the smallest closed subgroup of \(\mathrm {SU}(k)\) containing \({\text {supp}}\mu \). To illustrate this point, in the same appendix, we study two examples where \(\mu \) is supported on two elements of \(\mathrm {SU}(2)\), each given probability one half, such that (\(\phi \)-Erg) holds. In the first example \(\Pi \) has a unique invariant probability measure whereas in the second example \(\Pi \) has uncountably many mutually singular invariant probability measures.
3 Convergence
We now turn to the proof of the second part of Theorem 1.1, namely the geometric convergence in Wasserstein distance of the process \((\hat{x}_n)\) towards the invariant measure. We first recall a definition of this distance for compact metric spaces: for X a compact metric space equipped with its Borel \(\sigma \)-algebra, the Wasserstein distance of order 1 between two probability measures \(\sigma \) and \(\tau \) on X can be defined using the Kantorovich–Rubinstein duality theorem as
$$W_1(\sigma ,\tau )=\sup _{f\in \mathrm {Lip}_1(X)}\left| \int _X f\,\mathrm {d}\sigma -\int _X f\,\mathrm {d}\tau \right| ,$$
where \(\mathrm{Lip}_1(X)=\{f:X\rightarrow {\mathbb {R}} \ \mathrm {s.t.}\ \vert f(x)-f(y)\vert \le d(x,y)\ \text {for all }x,y\in X\}\) is the set of Lipschitz continuous functions with constant one, and \(d\) is the metric on X.
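For discrete measures the supremum over \(\mathrm {Lip}_1\) functions is dual to a finite transport linear program, which gives a direct way to compute \(W_1\). A sketch (it assumes scipy is available; the three-point example is hypothetical):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein1(p, q, D):
    """W_1 between discrete measures p, q with cost matrix D, via the
    transport LP (dual of the Lipschitz formulation): minimize
    sum_ij D_ij T_ij over couplings T with marginals p and q."""
    n, m = D.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0    # sum_j T_ij = p_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0             # sum_i T_ij = q_j
    res = linprog(D.reshape(-1), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

# Three points on a line: moving mass 0.5 by distance 1 costs 0.5.
pts = np.array([0.0, 1.0, 2.0])
D = np.abs(pts[:, None] - pts[None, :])
mu1 = np.array([1.0, 0.0, 0.0])
mu2 = np.array([0.5, 0.5, 0.0])
assert np.isclose(wasserstein1(mu1, mu2, D), 0.5)
```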
The proof of Eq. (6) consists of three parts. In the first part we show a geometric convergence in total variation of \({{\mathbb {P}}}^\rho \) to \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\) under the shift \(\theta (v_1,v_2,\ldots )=(v_2,v_3,\ldots )\). In the second one we show a geometric convergence of the chain \((\hat{x}_n)\) towards an \(\mathcal {O}\)-measurable process \((\hat{y}_n)\). Finally, we combine these results to prove Eq. (6).
3.1 Convergence for \(\mathcal {O}\)-measurable random variables
Let us first discuss the origin of the integer m in Eq. (6). Let E be a subspace of \({{\mathbb {C}}}^k\) s.t. \(vE\subset E\) for any \(v\in {\text {supp}}\mu \). Let \((E_1,\ldots ,E_\ell )\) be an orthogonal partition of E, i.e. \(E=E_1\oplus \cdots \oplus E_\ell \). We say that \((E_1,\ldots ,E_\ell )\) is an \(\ell \)-cycle of \(\phi \) if \(vE_j\subset E_{j+1}\) for \(\mu \)-a.e. v (with the convention \(E_{\ell +1}=E_1\)). The set of \(\ell \in {{\mathbb {N}}}\) for which there exists an \(\ell \)-cycle is non-empty (as it contains 1) and bounded (as necessarily \(\ell \le k\)).
Definition 3.1
The largest \(\ell \in {{\mathbb {N}}}\) such that there exists an \(\ell \)-cycle of \(\phi \) is called the period of \(\phi \). We denote this period by m.
Remark 3.2
-
The above definition for the period of \(\phi \) is similar to that of the period of a \(\varphi \)-irreducible Markov chain. It is obvious that if \((E_1,\ldots ,E_\ell )\) is an \(\ell \)-cycle of \(\phi \) then it is also an \(\ell \)-cycle of \(\Pi \). However, the Markov chain defined by \(\Pi \) is not \(\varphi \)-irreducible in general. Hence the results of [19] on the period of \(\varphi \)-irreducible Markov chains do not apply and the characterization of the period of \(\Pi \) remains an open problem.
-
The above definition shows that the union \(\bigcup _{j=1}^m E_j\) is invariant under \(\mu \)-a.e. v. Hence, the strong irreducibility assumption discussed at the end of the introduction implies that \(m=1\).
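For a concrete illustration of the cycle condition, with subspaces encoded by orthogonal projectors, \(vE_j\subset E_{j+1}\) is equivalent to \((\mathrm{Id}-P_{j+1})\,v\,P_j=0\); a small numerical sketch on a hypothetical one-operator example of our own:

```python
import numpy as np

def is_cycle(v, projectors):
    """Check the cycle condition v E_j ⊂ E_{j+1} (indices mod l), where E_j is
    the range of the orthogonal projector projectors[j]; equivalently
    (Id - P_{j+1}) v P_j = 0 for every j."""
    l, k = len(projectors), v.shape[0]
    return all(
        np.allclose((np.eye(k) - projectors[(j + 1) % l]) @ v @ projectors[j], 0)
        for j in range(l)
    )

# toy example: a single Kraus operator that swaps the basis vectors of C^2;
# (span{e_1}, span{e_2}) is then a 2-cycle, so the period is m = 2
v = np.array([[0.0, 1.0], [1.0, 0.0]])
P1, P2 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
```

Here `is_cycle(v, [P1, P2])` holds while the identity operator admits no 2-cycle, matching the fact that a 1-cycle always exists.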
The following result is a reformulation of the Perron–Frobenius theorem of Evans and Høegh-Krohn (see [7]). The original formulation in [7] makes the additional assumption that \(E={{\mathbb {C}}}^k\) in (\(\phi \)-Erg). For the present extension see e.g. [22]. In the following statement, and in the rest of the article, for X an operator on \({{\mathbb {C}}}^k\) we denote \(\Vert X\Vert _1={{\text {tr}}}|X|\) (all statements are identical with a different norm, but this choice will spare us a few irrelevant constants).
Theorem 3.3
Assume that (\(\phi \)-Erg) holds. Then there exists a unique \(\phi \)-invariant element \(\rho _{\mathrm {inv}}\) of \({\mathcal {D}}_k\) with range equal to the minimal invariant subspace E. In addition, there exist two positive constants c and \(\lambda <1\) such that, with m defined in Definition 3.1, for any \(\rho \in {\mathcal {D}}_k\) and for any \(n\in {{\mathbb {N}}}\),
Proof
Theorem 4.2 in [7] implies that \(\rho _{\mathrm {inv}}\) is the unique \(\phi \)-invariant element of \({\mathcal {D}}_k\), that the eigenvalues of \(\phi \) of modulus one are exactly the m-th roots of unity, and that they are all simple. The statement follows, with \(\lambda \) any quantity strictly larger than the modulus of the largest non-peripheral eigenvalue. \(\square \)
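Numerically, the fixed point \(\rho _{\mathrm {inv}}\) of Theorem 3.3 can be read off the eigenvalue-1 eigenvector of the matrix representing \(\phi \). A sketch with a hypothetical amplitude-damping Kraus pair (our own example; it satisfies the normalization (1) and has the pure state along \(e_1\) as its unique fixed point):

```python
import numpy as np

def invariant_state(kraus):
    """Fixed point of phi(rho) = sum_i v_i rho v_i^*, from the eigenvalue-1
    eigenvector of the superoperator.  With row-major vec,
    vec(v rho v^*) = (v ⊗ conj(v)) vec(rho)."""
    k = kraus[0].shape[0]
    Phi = sum(np.kron(v, v.conj()) for v in kraus)
    w, V = np.linalg.eig(Phi)
    rho = V[:, np.argmin(abs(w - 1))].reshape(k, k)
    rho = rho / np.trace(rho)           # fixes the eigenvector's scale and phase
    return (rho + rho.conj().T) / 2     # clean up numerical hermiticity

# hypothetical example: amplitude damping, sum_i v_i^* v_i = Id
g = 0.3
v0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
v1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
```

For this channel \(\rho_{\mathrm{inv}}=\mathrm{diag}(1,0)\), and the peripheral eigenvalue 1 is simple, consistent with \(m=1\).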
Recall that \(\theta \) is the left shift operator on \(\Omega \), i.e.
The main result of this section is the following proposition. As announced, it concerns the speed of convergence in total variation (expressed in terms of expectation values).
Proposition 3.4
Assume (\(\phi \)-Erg) holds. Then there exist two positive constants C and \(\lambda <1\) such that for any \(\mathcal {O}\)-measurable function f with essential bound \(\Vert f\Vert _\infty \), any \(\rho \in {\mathcal {D}}_k\) and all \(n\in {{\mathbb {N}}}\),
Proof
We claim that for any bounded \(\mathcal {O}\)-measurable function f,
It suffices to prove this relation for all \(\mathcal {O}_l\)-measurable functions for any integer l. Thus, let l be an integer and f an \(\mathcal {O}_l\)-measurable function. Then,
which is equal to \({{\mathbb {E}}}^{\phi (\rho )}(f)\), as claimed.
Applying Eq. (23) multiple times and using the change of measure of Proposition 2.2 we obtain
for any \(\mathcal {O}\)-measurable function f. Using \(|{{\text {tr}}}(M_\infty A)|\le \Vert A\Vert _1\) for \(A=A^*\) (remark that \(M_\infty \in {\mathcal {D}}_k\) by construction) we then obtain
and Theorem 3.3 yields the proposition with \(C=ck\). \(\square \)
3.2 Convergence to an \({\mathcal {O}}\)-measurable process
Let us introduce two relevant processes: for all \(n\in {{\mathbb {N}}}\), let
and
Both random variables \(\hat{y}_n\) and \(\hat{z}_n\) are \(\mathcal {O}_n\)-measurable.
The random variable \(\hat{z}_n\) corresponds to the maximum likelihood estimator of \(\hat{x}_0\). Note that the \({\text {argmax}}\) may not be uniquely defined. We can, however, define it in an \(\mathcal {O}_n\)-measurable way. The following results will not be affected by such a consideration, and we will not discuss such questions in the sequel. It follows from the definition of \(\hat{z}_n\) that
We recall that \(z_n\) is a vector representative of the class \(\hat{z}_n\).
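Concretely, the maximizer of \(\hat{x}\mapsto \Vert W_n x\Vert ^2/\Vert x\Vert ^2\) is the right-singular vector of \(W_n\) associated with its largest singular value \(a_1(W_n)\). A short numerical sketch (our own illustration):

```python
import numpy as np

def mle_z(W):
    """A representative of z-hat: the maximizer of ||W x|| over unit vectors x,
    i.e. the right-singular vector of W for its top singular value."""
    _, s, Vh = np.linalg.svd(W)
    return Vh[0].conj()   # rows of Vh are the conjugated right-singular vectors

W = np.array([[2.0, 0.0], [0.0, 1.0]])
z = mle_z(W)
```

For this diagonal W the maximizer is the first basis direction and \(\Vert Wz\Vert =a_1(W)=2\); ties in the argmax correspond to a degenerate top singular value.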
Concerning \(\hat{y}_n\), it can be seen as an estimator of \(\hat{x}_n\) given the maximum likelihood estimation of \(\hat{x}_0\). The following proposition establishes the consistency of this estimator: we show geometric contraction, in mean, between \((\hat{x}_n)\) and \((\hat{y}_n)\). In fact we prove a slightly more general statement: the estimator based on the first n outcomes can be replaced by an estimator based on the outcomes between l and \(l+n\). We will prove the almost-sure contraction in Proposition 4.4.
Proposition 3.5
Assume (Pur) holds. Then there exist two positive constants C and \(\lambda <1\) such that for any probability measure \(\nu \) over \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\),
holds for all non-negative integers l and n.
In order to prove Proposition 3.5 we study the two largest singular values of \(W_n\). As is customary in the study of products of random matrices, we make use of exterior products. We recall briefly the relevant definitions: for \(p\in {{\mathbb {N}}}\) and p vectors \(x_1, \ldots , x_p\) in \({{\mathbb {C}}}^k\) we denote by \(x_1\wedge \cdots \wedge x_p\) the alternating p-linear form \((y_1,\ldots , y_p)\mapsto \det \big (\langle x_i, y_j\rangle \big )_{i,j=1}^p\). Then, the set of all \(x_1\wedge \cdots \wedge x_p\) is a generating family for the set \(\wedge ^p{{\mathbb {C}}}^k\) of alternating p-linear forms on \({{\mathbb {C}}}^k\), and we can define a hermitian inner product by
and denote by \(\Vert x_1\wedge \cdots \wedge x_p\Vert \) the associated norm. It is immediate to verify that our metric \(d\), defined by (7), satisfies
For an operator A on \({\mathbb {C}}^k\), we write \(\wedge ^p A\) for the operator on \(\wedge ^p{{\mathbb {C}}}^k\) defined by
Obviously \(\wedge ^p (AB)=\wedge ^p A\wedge ^p B\), so that \(\Vert \wedge ^p (AB)\Vert \le \Vert \wedge ^p A\Vert \Vert \wedge ^p B\Vert \). From e.g. Chapter XVI of [18] or Lemma III.5.3 of [4], we have in addition for \(1\le p\le k\)
where \(a_1(A)\ge \cdots \ge a_k(A)\) are the singular values of A, i.e. the square roots of eigenvalues of \(A^* A\), labelled in decreasing order.
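The identity \(\Vert \wedge ^2 A\Vert =a_1(A)a_2(A)\), together with the multiplicativity \(\wedge ^2(AB)=\wedge ^2A\,\wedge ^2B\), can be checked numerically by building the matrix of \(\wedge ^2 A\) from \(2\times 2\) minors (a sketch we add for illustration):

```python
import numpy as np
from itertools import combinations

def wedge2(A):
    """Matrix of wedge^2 A in the basis e_i ∧ e_j (i < j): the entries are
    the 2x2 minors of A (second compound matrix)."""
    k = A.shape[0]
    pairs = list(combinations(range(k), 2))
    B = np.empty((len(pairs), len(pairs)), dtype=complex)
    for a, (i, j) in enumerate(pairs):
        for b, (p, q) in enumerate(pairs):
            B[a, b] = A[i, p] * A[j, q] - A[i, q] * A[j, p]
    return B

def opnorm(A):
    # operator norm = largest singular value
    return np.linalg.svd(A, compute_uv=False)[0]
```

The singular values of \(\wedge^2 A\) are the products \(a_i(A)a_j(A)\), \(i<j\), whence the identity.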
Our strategy to prove Proposition 3.5 is to bound the left hand side of Eq. (27) by a submultiplicative function \(f : \mathbb {N} \rightarrow \mathbb {R}_+\) and then use Fekete’s lemma. We will show that the function
has the desired properties. The following lemma establishes an exponential decay of this function.
Lemma 3.6
Assume (Pur). Then there exist two positive constants C and \(\lambda <1\) such that
Proof
First, we prove \(\lim _{n\rightarrow \infty }f(n)=0\). To this end, we express f(n) in terms of the process \(W_n\) as
By definition the eigenvalues of \(M_n^{\frac{1}{2}}\) are the singular values of \(W_n/\sqrt{{{\text {tr}}}(W_n^*W_n)}\). Since by Proposition 2.2, \(M_n\) converges \({{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\) to a rank one projection,
Using Eq. (30) we then conclude that
Since \(\Vert \wedge ^2 W_n\Vert \le \Vert W_n\Vert ^2\le {{\text {tr}}}(W_n^*W_n)\), the expression (32) and Lebesgue’s dominated convergence theorem imply \(\lim _{n\rightarrow \infty }f(n)=0\).
Second, remark that the function f is submultiplicative. Indeed, for \(p,q\in {\mathbb {N}}\) we have
and the submultiplicativity follows.
By Fekete’s subadditive Lemma, \(\frac{\log f(n)}{n}\) converges to \(\inf _{n\in {{\mathbb {N}}}} \frac{\log f(n)}{n}\), which is (strictly) negative (and possibly equal to \(-\infty \)) since \(f(n)\rightarrow 0\). Then there exists \(0<\lambda <1\) such that \(f(n)\le \lambda ^n\) for large enough n, and the conclusion follows. \(\square \)
We are now in a position to prove Proposition 3.5.
Proof of Proposition 3.5
The Markov property of \((\hat{x}_n)\) implies that
Provided inequality (27) is established for \(l=0\), the right hand side of the previous equality can be bounded by \(C \lambda ^n\). It is hence sufficient to prove the inequality for \(l=0\).
The case \(l=0\) follows from Lemma 3.6 if for any \(n \in {{\mathbb {N}}}\) and any probability measure \(\nu \),
To obtain this inequality, note that from the definitions of \(\hat{x}_n\), \(\hat{y}_n\) and \(\hat{z}_n\), we have that
holds \({{\mathbb {P}}}_\nu \)-almost surely. To get the first inequality we used \(\Vert W_n z_n\Vert = \Vert W_n\Vert \), and \(\Vert x_0\wedge z_n \Vert =d(\hat{x}_0,\hat{z}_n)\le 1\). In addition, by definition of \({{\mathbb {P}}}_\nu \),
which is f(n). Therefore (34) holds and Lemma 3.6 yields the proof. \(\square \)
3.3 Convergence in Wasserstein metric
The remainder of Sect. 3 is devoted to the proof of the second part of Theorem 1.1.
Proof of Eq. (6)
We need to prove that
is exponentially decaying in n. The expression in the supremum on the right hand side is unchanged by adding an arbitrary constant to f. This freedom allows us to restrict the supremum to functions bounded by 1, i.e. \(\Vert f\Vert _\infty \le 1\).
Let \(f\in \mathrm{Lip}_1({\mathrm P}({\mathbb {C}}^k))\) be such a function. Our strategy is to approximate \(\hat{x}_{mn+r}\) by \(\hat{y}_{mp} \circ \theta ^{mq+r}\) with \(p=\lfloor \frac{n}{2} \rfloor \) and \(q=\lceil \frac{n}{2}\rceil \) so that in particular \(p+q =n\). Using telescopic estimates and the invariance of \(\nu _{\mathrm {inv}}\) we then have
We bound the terms on the right hand side using Proposition 3.5 for the first two terms and Proposition 3.4 for the last term. To this end let C and \(\lambda < 1\) be such that the bounds in both these propositions hold true. Since f is 1-Lipschitz continuous we have
Proposition 3.5 then implies that
and similarly with \(\nu \) replaced by \(\nu _{\mathrm {inv}}\). Regarding the last term in the above telescopic estimate we have by Proposition 3.4,
where we used the constraint \(\Vert f\Vert _\infty \le 1\) discussed at the beginning of the proof.
Putting these estimates together we get
and this concludes the proof of Eq. (6) and therefore of Theorem 1.1. \(\square \)
4 Lyapunov exponents
In this section, we study the almost sure stability exponents. The main results of this section will assume (\(\phi \)-Erg) with the additional assumption that \(E={{\mathbb {C}}}^k\).
Remark 4.1
Assuming \(E={{\mathbb {C}}}^k\) amounts to saying that \(\phi \) has no transient part. Without this assumption, we would have to take into account the almost sure Lyapunov exponent corresponding to the escape from the transient part. See [3] for a precise account of these ideas.
The relevance of this assumption will stem from the following straightforward inequalities: if \(\rho \) is any element of \({\mathcal {D}}_k\) then one has
and if \(\rho \) is faithful (i.e. positive definite), then
In particular, under the assumption that (\(\phi \)-Erg) holds with \(E={{\mathbb {C}}}^k\), we have \(\rho _{\mathrm {inv}}>0\), and thus for any \(\rho \in {\mathcal {D}}_k\),
Let us start by proving the following lemma, which concerns the ergodicity of \(\theta \) with respect to the measure \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\).
Lemma 4.2
Assume that (\(\phi \)-Erg) holds. Then the shift \(\theta \) on \((\Omega ,\mathcal {O})\) is ergodic with respect to the probability measure \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\).
Proof
Let A, \(A'\) be in \(\mathcal {O}_l\). From the definition of \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\), for j large enough, \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\big (A \cap \theta ^{-j}(A')\big )\) equals
and the Perron–Frobenius Theorem 3.3 implies
for \(\mu ^{\otimes l}\)-almost all \((v_1,\ldots ,v_l)\) so that
which proves the ergodicity.\(\square \)
Now we can state our result concerning Lyapunov exponents.
Proposition 4.3
Assume that (\(\phi \)-Erg) holds with \(E={{\mathbb {C}}}^k\), and that (Pur) holds. Assume \(\int \Vert v\Vert ^2\log \Vert v\Vert \,\mathrm{d}\mu (v)< \infty \). Then there exist numbers
such that for any probability measure \(\nu \) over \(({\mathrm P}({\mathbb {C}}^k),{\mathcal {B}})\):
-
(1)
for any \(p\in \{1,\ldots ,k\}\),
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\log \left\| \wedge ^p W_n\right\| =\sum _{j=1}^p \gamma _j,\quad {{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}, \end{aligned}$$(36)
-
(2)
\(\gamma _2-\gamma _1<0\), with \(\gamma _2-\gamma _1\) understood as the limit of \(\frac{1}{n}\log \frac{\Vert \wedge ^2 W_n\Vert }{\Vert W_n\Vert ^2}\) whenever \(\gamma _1=-\infty \),
-
(3)
one has the convergence
$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}(\log \Vert W_n x_0\Vert -\log \Vert W_n\Vert )=0\quad {{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\end{aligned}$$(37)
Proof
Let us start by proving (1). Note that \(n\mapsto \log \Vert \wedge ^p W_n\Vert \) is subadditive by definition. The existence of the \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}{\text {-}}\mathrm {a.s.}\) limits \(\lim _{n\rightarrow \infty }\frac{1}{n}\log \Vert \wedge ^p W_n\Vert \) then follows from \({{\mathbb {E}}}^{\rho _{\mathrm {inv}}}(\log \Vert V\Vert ^2)\le \int \Vert v\Vert ^2\log \Vert v\Vert ^2\,\mathrm{d}\mu (v)<\infty \), from \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\circ \theta ^{-1}={{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\), and from a direct application of Kingman's subadditive ergodic theorem (see e.g. [21]). The fact that these limits are \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}{\text {-}}\mathrm {a.s.}\) constant comes from the \(\theta \)-ergodicity of \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\) proved in Lemma 4.2. Since by Eq. (35) any \({{\mathbb {P}}}^\rho \) is absolutely continuous with respect to \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\), Proposition 2.1 and the \(\mathcal {O}\)-measurability of \(\Vert \wedge ^pW_n\Vert \) imply that the convergence holds \({{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\) The numbers \(\gamma _j\) are then uniquely defined by setting \(\sum _{j=1}^p \gamma _j\) equal to the \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}{\text {-}}\mathrm {a.s.}\) limit \(\lim _{n\rightarrow \infty }\frac{1}{n}\log \Vert \wedge ^p W_n\Vert \) and imposing the rule that \(\gamma _{j+1}=-\infty \) if \(\gamma _{j}=-\infty \). This convention and (30) impose that \(\gamma _{j+1}\le \gamma _j\) for \(j=1,\ldots ,k-1\).
Concerning (2), recall the quantity f(n) defined in Eq. (31). Then Eq. (32) and the inequality \({{\text {tr}}}\,W_n^*W_n \le k \Vert W_n\Vert ^2\) give
Jensen’s inequality implies
so that by Lemma 3.6 and Fatou’s lemma, \(\log \lambda \ge \gamma _2-\gamma _1\) with \(\lambda \in (0,1)\).
Finally for (3), from Proposition 2.2, we have
Since \({{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\,|\langle x_0, z\rangle |>0\), the proposition holds. \(\square \)
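The statements (1)–(3) can be observed on a single simulated trajectory. A sketch with a hypothetical pair of commuting Kraus operators (our own toy model; both operators have determinant \(cs\), so by multiplicativity of the determinant \(\gamma _1+\gamma _2=\log (cs)\) exactly):

```python
import numpy as np

rng = np.random.default_rng(2)
c, s = 0.9, np.sqrt(1 - 0.9 ** 2)
kraus = [np.diag([c, s]), np.diag([s, c])]   # sum_i v_i^* v_i = Id; det v_i = c*s

n = 400
x0 = np.array([1.0, 0.0])
x, W = x0, np.eye(2)
for _ in range(n):
    p = np.array([np.linalg.norm(v @ x) ** 2 for v in kraus])
    i = rng.choice(2, p=p / p.sum())         # outcome i with probability ||v_i x||^2
    x = kraus[i] @ x / np.linalg.norm(kraus[i] @ x)
    W = kraus[i] @ W

sv = np.linalg.svd(W, compute_uv=False)
gamma1 = np.log(sv[0]) / n                   # finite-n estimate of gamma_1
gamma2 = np.log(sv[1]) / n                   # finite-n estimate of gamma_2
```

On this model the empirical gap \(\gamma _2-\gamma _1\) is strictly negative, in agreement with item (2), while \(\frac{1}{n}\log \Vert W_n\Vert -\frac{1}{n}\log \Vert W_nx_0\Vert \) is negligible, in agreement with item (3).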
From this proposition we deduce the following almost sure convergence rate for the distance between the Markov chain \((\hat{x}_n)\) and the \((\mathcal {O}_n)\)-adapted process \((\hat{y}_n)\).
Proposition 4.4
Assume (Pur) holds and (\(\phi \)-Erg) holds with \(E={{\mathbb {C}}}^k\). Then, for any probability measure \(\nu \) on \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\),
Proof
Identity (28) and the definition of \(\hat{z}_n\) imply
Proposition 4.3 then yields the result. \(\square \)
Notes
Complete positivity is stronger than positivity; namely by definition \(\phi \) is completely positive iff \(\phi \otimes \mathrm{Id}_{M_n({{\mathbb {C}}})}\) is positive for all \(n\in {{\mathbb {N}}}\).
References
Applebaum, D.: Probability on Compact Lie Groups, Volume 70 of Probability Theory and Stochastic Modelling. Springer, Berlin (2014)
Baumgartner, B., Narnhofer, H.: The structures of state space concerning quantum dynamical semigroups. Rev. Math. Phys. 24(02), 1250001 (2012)
Benoist, T., Pellegrini, C., Ticozzi, F.: Exponential stability of subspaces for quantum stochastic master equations. Ann. Henri Poincaré 18, 2045–2074 (2017)
Bougerol, P., Lacroix, J.: Products of Random Matrices with Applications to Schrödinger Operators, Volume 8 of Progress in Probability and Statistics. Birkhäuser Boston, Inc., Boston (1985)
Carbone, R., Pautrat, Y.: Irreducible decompositions and stationary states of quantum channels. Rep. Math. Phys. 77(3), 293–313 (2016)
Carmichael, H.: An Open Systems Approach to Quantum Optics: Lectures Presented at the Université Libre de Bruxelles, October 28 to November 4, 1991. Springer, Berlin (1993)
Evans, D.E., Høegh-Krohn, R.: Spectral properties of positive maps on \(C^*\)-algebras. J. Lond. Math. Soc. (2) 17(2), 345–355 (1978)
Furstenberg, H., Kesten, H.: Products of random matrices. Ann. Math. Stat. 31(2), 457–469 (1960)
Guerlin, C., Bernu, J., Deleglise, S., Sayrin, C., Gleyzes, S., Kuhr, S., Brune, M., Raimond, J.-M., Haroche, S.: Progressive field-state collapse and quantum non-demolition photon counting. Nature 448(7156), 889–893 (2007)
Guivarc’h, Y., Le Page, É.: Spectral gap properties for linear random walks and Pareto’s asymptotics for affine stochastic recursions. Ann. Inst. H. Poincaré Probab. Stat. 52(2), 503–574 (2016)
Guivarc’h, Y., Raugi, A.: Frontière de Furstenberg, propriétés de contraction et théorèmes de convergence. Probab. Theory Relat. Fields 69(2), 187–242 (1985)
Guivarc’h, Y., Raugi, A.: Products of random matrices: convergence theorems. In: Cohen, J.E., Kesten, H., Newman, C.M. (eds.) Random Matrices and Their Applications (Brunswick, Maine, 1984), Volume 50 of Contemporary Mathematics, pp. 31–54. American Mathematical Society, Providence (1986)
Holevo, A.: Statistical Structure of Quantum Theory. Springer, Berlin (2001)
Kümmerer, B., Maassen, H.: An ergodic theorem for quantum counting processes. J. Phys. A 36(8), 2155 (2003)
Kümmerer, B., Maassen, H.: A pathwise ergodic theorem for quantum trajectories. J. Phys. A 37(49), 11889–11896 (2004)
Le Page, É.: Théorèmes limites pour les produits de matrices aléatoires. In: Heyer, H. (ed.) Probability Measures on Groups. Lecture Notes in Mathematics, vol. 928. Springer, Berlin, Heidelberg (1982)
Maassen, H., Kümmerer, B.: Purification of quantum trajectories. Lect. Notes Monogr. Ser. 48, 252–261 (2006)
Mac Lane, S., Birkhoff, G.: Algebra, 3rd edn. Chelsea Publishing Co., New York (1988)
Meyn, S., Tweedie, R.L.: Markov Chains and Stochastic Stability, 2nd edn. Cambridge University Press, Cambridge (2009)
Schrader, R.: Perron–Frobenius theory for positive maps on trace ideals. In: Mathematical Physics in Mathematics and Physics (Siena, 2000), vol. 30, pp. 361–378 (2001)
Walters, P.: An Introduction to Ergodic Theory. Graduate Texts in Mathematics, vol. 79. Springer, Berlin (1982)
Wolf, M.M.: Quantum channels & operations: guided tour. http://www-m5.ma.tum.de/foswiki/pub/M5/Allgemeines/MichaelWolf/QChannelLecture.pdf (2012). Lecture notes based on a course given at the Niels–Bohr Institute. Accessed 28 Feb 2017
Acknowledgements
T.B. and C.P. would like to thank Y. Guivarc’h for his useful comments at an early stage of this work. Y.P. and C.P. would like to thank P. Bougerol for enlightening discussions about random products of matrices. Y.P. and C.P. would like to thank L. Miclo for relevant discussions regarding Markov chains. The research of T.B. has been supported by ANR-11-LABX-0040-CIMI within the program ANR-11-IDEX-0002-02. The research of T.B., Y.P. and C.P. has been supported by the ANR project StoQ ANR-14-CE25-0003-01 and CNRS InFIniTi project MISTEQ.
Appendices
Appendix A: Equivalence of (Pur) and contractivity
We assume \({\text {supp}}\mu \subset \mathrm {GL}_k({{\mathbb {C}}})\). Recall that \(T_\mu \) is the smallest closed sub-semigroup of \(\mathrm {GL}_k({{\mathbb {C}}})\) that contains \({\text {supp}}\mu \). It is said to be contracting if there exists a sequence \((a_n)_{n\in {\mathbb {N}}}\subset T_\mu \) such that \(\lim _{n\rightarrow \infty } a_n/\Vert a_n\Vert \) exists and is a rank one matrix.
Proposition A.1
Assume \({\text {supp}}\mu \subset \mathrm {GL}_k({\mathbb {C}})\) and \(T_\mu \) is strongly irreducible. Then \(\mu \) verifies (Pur) if and only if \(T_\mu \) is contracting.
Proof
By Proposition 2.2 the implication (Pur)\(\Rightarrow \) contractivity follows by taking for \((a_n)\) a convergent subsequence of \((W_n(\omega ))\) for \(\omega \in {\text {supp}}{{\mathbb {P}}}^{\mathrm {ch}}\).
We prove the opposite implication by contradiction. Following [12, Lemma 3], under the assumptions of the proposition, \(T_\mu \) is contracting if and only if, for any two \(\hat{x}, \hat{y}\in {{\mathrm P}({{\mathbb {C}}}^k)}\) there exists a sequence of matrices \((a_n)\subset T_\mu \) such that
Now, assume that contractivity holds but (Pur) does not. Namely, that \(T_\mu \) is contracting but there exists an orthogonal projector \(\pi \) of rank \(\ge 2\), such that for any \(a\in T_\mu \),
Let x, y in the range of \(\pi \) be orthonormal vectors. Then \(\langle ax,ay\rangle =\langle x,y\rangle =0\), and \(\Vert ax\Vert , \Vert ay\Vert \) are nonzero, so that \(d(a\cdot \hat{x},a\cdot \hat{y})=1\). As this is true for any a in \(T_{\mu }\), contractivity cannot hold. This contradiction yields the proposition. \(\square \)
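Contractivity of \(T_\mu \) is easily seen numerically on a simple hypothetical example: the powers of \(a=\mathrm{diag}(2,1)\), normalized by their norm, converge to a rank-one matrix (our own illustration):

```python
import numpy as np

# a generates a contracting (sub-)semigroup: a^n / ||a^n|| -> rank-one limit
a = np.diag([2.0, 1.0])
an = np.linalg.matrix_power(a, 40)
limit = an / np.linalg.norm(an, 2)   # spectral-norm normalization
```

Here `limit` is \(\mathrm{diag}(1,2^{-40})\), numerically a rank-one projection; its second singular value is negligible.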
Appendix B: Set of invariant measures under assumption (Pur)
A quantum channel is a map \(\phi \) on \(\mathrm {M}_k({{\mathbb {C}}})\) of the form
where \(\mu \) is a measure satisfying the normalization condition (1). The decomposition of quantum channels into irreducible components was derived in [2, 5, 22]. The space \(\mathbb {C}^k\) decomposes into orthogonal subspaces: one subspace is transient, and on each of the others the map has a canonical tensor product structure. We recall these results.
There exists a decomposition
with the following properties. We denote by \(v^{(j)}\) the restriction of v to \(\mathbb {C}^{n_j}\).
-
(e1)
All invariant states are supported in the subspace \(L = \mathbb {C}^{n_1} \oplus \dots \oplus \mathbb {C}^{n_d} \oplus 0\),
-
(e2)
The restriction of v to this subspace is block diagonal,
$$\begin{aligned} v|_L = v^{(1)} \oplus \cdots \oplus v^{(d)}\oplus 0 \quad \mu {\text {-}}\mathrm {a.e.}\end{aligned}$$(38)
-
(e3)
For each \(j=1, \dots ,d\) there is a decomposition \(\mathbb {C}^{n_j} = \mathbb {C}^{k_j} \otimes \mathbb {C}^{m_j}, \, n_j = k_j m_j\), a unitary matrix \(U_j\) on \(\mathbb {C}^{n_j}\) and a matrix \(\tilde{v}^{(j)}\) on \(\mathbb {C}^{k_j}\) such that
$$\begin{aligned} v^{(j)} = U_j \left( \tilde{v}^{(j)} \otimes \mathrm{Id}_{{{\mathbb {C}}}^{m_j}}\right) U_j^* \quad \mu {\text {-}}\mathrm {a.e.}\end{aligned}$$(39)
-
(e4)
There exists a full rank positive matrix \(\rho _j\) on \(\mathbb {C}^{k_j}\) such that
$$\begin{aligned} 0 \oplus \cdots \oplus U_j \left( \rho _j \otimes \mathrm{Id}_{{{\mathbb {C}}}^{m_j}}\right) U_j^* \oplus \cdots \oplus 0 \end{aligned}$$(40)
is a fixed point of \(\phi \).
It follows from (e3) and (e4) that the set of fixed points for \(\phi \) is
The decomposition simplifies under the purification assumption.
Proposition B.1
Assume (Pur) holds. Then there exist a set \(\{\rho _j\}_{j=1}^d\) of positive definite matrices and an integer D such that the set of fixed points of \(\phi \) is
Proof
The statement follows from the discussion preceding the proposition if we show that (Pur) implies \(m_1 = \dots = m_d =1\). Assume that one of the \(m_j\), e.g. \(m_1\), is greater than 1. Let x be a norm one vector in \(\mathbb {C}^{k_1}\). Then \(\pi = U_1\left(\pi _{\hat{x}} \otimes \mathrm{Id}_{{\mathbb {C}}^{m_1}}\right) U_1^*\oplus 0 \oplus \dots \oplus 0\) is a projection of rank greater than 1, and by Eq. (39) we have, in the notation of (38) and (39),
for \(\mu ^{\otimes n}\)-almost all \(v_1,\ldots ,v_n\). This contradicts (Pur). \(\square \)
It is clear from Eq. (38) that to each extremal fixed point \(0 \oplus \dots \oplus \rho _j \oplus \dots \oplus 0\) corresponds a unique invariant measure \(\nu _j\) supported on its range \(F_j\). The converse is the subject of the next proposition.
Proposition B.2
Assume (Pur) holds. Then any \(\Pi \)-invariant probability measure is a convex combination of the measures \(\nu _j\), \(j=1,\ldots ,d\).
Proof
Let \(\nu \) be a \(\Pi \)-invariant probability measure. Let f be a continuous function. From Lemma 2.3,
Proposition 2.1 implies
with \(\rho _\nu \in {\mathcal {D}}_k\) a fixed point of \(\phi \). By Proposition B.1, (Pur) implies that there exist non negative numbers \(t_1,\ldots ,t_d\) summing up to one such that \(\rho _\nu =t_1\rho _1\oplus \cdots \oplus t_d\rho _d\oplus 0_{M_D({{\mathbb {C}}})}\). From the definition of \({{\mathbb {P}}}^{\rho _\nu }\),
where we used the abuse of notation \(\rho _j\equiv 0\oplus \cdots \oplus \rho _j\oplus \cdots \oplus 0\). Using Proposition 2.1, it follows that
Then Lemma 2.3 and the \(\Pi \)-invariance of each measure \(\nu _j\) yield the proposition. \(\square \)
Appendix C: Products of special unitary matrices
Proposition C.1
Assume \({\text {supp}}\mu \subset \mathrm {SU}(k)\). Let G be the smallest closed subgroup of \(\mathrm {SU}(k)\) such that \({\text {supp}}\mu \subset G\). For any \(\hat{x}\in {{\mathrm P}({{\mathbb {C}}}^k)}\), let \([\hat{x}]_G\) be the orbit of \(\hat{x}\) with respect to G and the action \(G\times {{\mathrm P}({{\mathbb {C}}}^k)}\ni (v,\hat{x})\mapsto v\cdot \hat{x}\). Namely, \([\hat{x}]_G:=\{\hat{y}\in {{\mathrm P}({{\mathbb {C}}}^k)}\ |\ \exists v\in G \text{ s.t. } \hat{y}=v\cdot \hat{x}\}\). Then, for any \(\hat{x}\), there exists a unique \(\Pi \)-invariant probability measure supported on \([\hat{x}]_G\), and this unique invariant measure is uniform in the sense that for any \(v\in G\) it is invariant by the map \(\hat{x}\mapsto v\cdot \hat{x}\).
Corollary C.2
With the assumption and definitions of the last proposition, if \(G=\mathrm {SU}(k)\), \(\Pi \) has a unique invariant probability measure and this probability is the uniform one on \({{\mathrm P}({{\mathbb {C}}}^k)}\).
Proof
The corollary being a trivial consequence of \(G=\mathrm {SU}(k)\Rightarrow [\hat{x}]_G={{\mathrm P}({{\mathbb {C}}}^k)}\ \forall \hat{x}\in {{\mathrm P}({{\mathbb {C}}}^k)}\), we are left with proving the proposition.
Let \(P_\mu \) be the Markov kernel on G defined by the left multiplication: \(P_\mu f(v)=\int _G f(uv)d\mu (u)\). Since G is compact as a closed subset of \(\mathrm {SU}(k)\), following [1, Proposition 4.8.1, Theorem 4.8.2], the unique \(P_\mu \)-invariant probability measure \(\mu _G\) on G is the normalized Haar measure on G. Since G is compact, Prokhorov’s theorem implies that for any \(u\in G\),
Let \(\hat{x}\in {{\mathrm P}({{\mathbb {C}}}^k)}\). Since \({\text {supp}}\mu \subset G\), for any \(\hat{y}\in [\hat{x}]_G\), \(\Pi (\hat{y}, [\hat{x}]_G)=1\). Then, \([\hat{x}]_G\) being compact, there exists a \(\Pi \)-invariant measure \(\nu \) supported on \([\hat{x}]_G\).
Let f be a continuous function on \([\hat{x}]_G\). Then,
For each \(\hat{y}\in [\hat{x}]_G\) let \(u_y\in G\) be such that \(\hat{y}=u_y\cdot \hat{x}\). The map \(v\mapsto vu_y\cdot \hat{x}\) being continuous, setting \(u=u_y\), the weak convergence (41) and Lebesgue’s dominated convergence theorem imply,
It follows that \(\nu \) is the image measure of \(\mu _G\) by the application \(v\mapsto v\cdot \hat{x}\). The left multiplication invariance of the Haar measure \(\mu _G\) yields the invariance of \(\nu \) by the map \(\hat{x}\mapsto v\cdot \hat{x}\) for any \(v\in G\). \(\square \)
Example C.3
Let \(\mu =\frac{1}{2}(\delta _{v_1}+\delta _{v_2})\) with,
Then \(G=\mathrm {SU}(2)\) and the uniform measure on \({\mathrm P}({{\mathbb {C}}}^2)\) is the unique \(\Pi \)-invariant probability measure.
Proof
Following Proposition C.1, it is sufficient to prove that any element of \(\mathrm {SU}(2)\) is the limit of a sequence of products of \(v_1\) and \(v_2\).
Let \(\sigma _1,\sigma _2,\sigma _3\) be the usual Pauli matrices:
The Pauli matrices being generators of \(\mathrm {SU}(2)\) in its fundamental representation, for any \(u\in \mathrm {SU}(2)\) there exist three real numbers \(\theta _1,\theta _2,\theta _3\) such that
In particular, \(v_1=\exp (i\sigma _3)\) and \(v_2=\exp (i\sigma _1)\). Since for any \(j=1,2,3\), \(\exp (i\theta _j\sigma _j)=\exp (i(\theta _j+2\pi )\sigma _j)\), taking limits of sequences of powers of \(v_1\) or \(v_2\), for any \(\theta \in {{\mathbb {R}}}\), both
are elements of G. It remains to show that any \(u\in \mathrm {SU}(2)\) is a product of elements equal to \(\exp (i\theta \sigma _1)\) or \(\exp (i\theta \sigma _3)\) with \(\theta \) real.
Fix \((\theta _1,\theta _2,\theta _3)\in {{\mathbb {R}}}^3\). Using spherical coordinates in \({{\mathbb {R}}}^3\), there exist \(r\in {{\mathbb {R}}}_+\), \(\theta \in [0,\pi ]\) and \(\varphi \in [0,2\pi )\) such that \(\theta _1=r\cos \theta \), \(\theta _2=r\sin \theta \cos \varphi \) and \(\theta _3=r\sin \theta \sin \varphi \). Then by direct computation,
It follows that as a product of elements of G, \(e^{i(\theta _1\sigma _1+\theta _2\sigma _2+\theta _3\sigma _3)}\in G\), hence \(G=\mathrm {SU}(2)\) and the example holds. \(\square \)
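The key step, factoring any \(u\in \mathrm {SU}(2)\) into terms of the form \(\exp (i\theta \sigma _1)\) and \(\exp (i\theta \sigma _3)\), can be verified numerically through a ZXZ Euler-type decomposition; the angle-extraction formulas below are our own, not taken from the paper:

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)     # sigma_1
s2 = np.array([[0, -1j], [1j, 0]])                 # sigma_2
s3 = np.array([[1, 0], [0, -1]], dtype=complex)    # sigma_3

def exp_i(theta, sigma):
    # exp(i*theta*sigma) = cos(theta)*Id + i*sin(theta)*sigma when sigma^2 = Id
    return np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * sigma

def zxz_angles(U):
    """Return (a, b, c) with U = exp(i a sigma_3) exp(i b sigma_1) exp(i c sigma_3),
    valid for any U in SU(2)."""
    b = np.arctan2(abs(U[0, 1]), abs(U[0, 0]))
    apc = np.angle(U[0, 0])                # a + c
    amc = np.angle(U[0, 1]) - np.pi / 2    # a - c
    return (apc + amc) / 2, b, (apc - amc) / 2

# a generic element exp(i(theta_1 s1 + theta_2 s2 + theta_3 s3)) of SU(2)
theta = np.array([0.3, -0.8, 1.1])
r = np.linalg.norm(theta)
nvec = theta / r
U = np.cos(r) * np.eye(2) + 1j * np.sin(r) * (nvec[0]*s1 + nvec[1]*s2 + nvec[2]*s3)
a, b, c = zxz_angles(U)
```

Multiplying the three extracted factors back together reproduces U, confirming that \(\sigma _1\) and \(\sigma _3\) rotations suffice.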
Example C.4
Let \(\mu =\frac{1}{2}(\delta _{v_1}+\delta _{v_2})\) with,
Then \(G=\{\pm \mathrm{Id}_{{{\mathbb {C}}}^2}, \pm v_1, \pm v_2, \pm v_1v_2\}\). For \(z\in {{\mathbb {C}}}\), let \(e_z=(1,z)^\mathsf {T}\) and \(e_\infty =(0,1)^\mathsf {T}\). With the conventions \(\infty ^{-1}=0\), \(0^{-1}=\infty \) and \(-\infty =\infty \), for any \(z\in {{\mathbb {C}}}\cup \{\infty \}\), \([\hat{e}_z]_G=\{\hat{e}_z, \hat{e}_{z^{-1}}, \hat{e}_{-z}, \hat{e}_{-z^{-1}}\}\) and the measure \(\frac{1}{4}(\delta _{\hat{e}_z}+\delta _{\hat{e}_{-z}}+\delta _{\hat{e}_{z^{-1}}}+\delta _{\hat{e}_{-z^{-1}}})\) is a \(\Pi \)-invariant probability measure.
The proof of this example is obtained by an explicit computation.
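Since the display defining \(v_1, v_2\) is missing here, we illustrate the computation with the hypothetical choice \(v_1=i\sigma _3\), \(v_2=i\sigma _1\), which is consistent with the stated group G and orbits but is our own assumption. The sketch checks that the generated group has the eight listed elements and that each generator permutes the four-point orbit:

```python
import numpy as np

# hypothetical generators (the defining display is missing above):
v1 = np.array([[1j, 0], [0, -1j]])   # i*sigma_3
v2 = np.array([[0, 1j], [1j, 0]])    # i*sigma_1

def closure(gens):
    """Multiplicative closure of the generators (finite matrix groups only)."""
    group = [np.eye(2, dtype=complex)]
    frontier = list(gens)
    while frontier:
        g = frontier.pop()
        if not any(np.allclose(g, h) for h in group):
            group.append(g)
            frontier.extend([g @ u for u in gens])
    return group

G = closure([v1, v2])

def rep(x):
    """Canonical representative of the projective class of x in P(C^2)."""
    x = x / x[np.argmax(abs(x))]        # fix the overall phase and scale
    return tuple(np.round(x / np.linalg.norm(x), 8))

z = 2.0
orbit = {rep(np.array([1, w], dtype=complex)) for w in (z, -z, 1 / z, -1 / z)}
image = {rep(v @ np.array([1, w], dtype=complex))
         for v in (v1, v2) for w in (z, -z, 1 / z, -1 / z)}
```

Under these generators the group has eight elements (a quaternion-type group) and each \(v_i\) maps the orbit \(\{\hat{e}_z, \hat{e}_{-z}, \hat{e}_{z^{-1}}, \hat{e}_{-z^{-1}}\}\) onto itself, which is exactly why the uniform measure on the orbit is \(\Pi \)-invariant.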
Benoist, T., Fraas, M., Pautrat, Y. et al. Invariant measure for quantum trajectories. Probab. Theory Relat. Fields 174, 307–334 (2019). https://doi.org/10.1007/s00440-018-0862-9