1 Introduction

We consider a complex vector space \(\mathbb {C}^k\) and its projective space \({\mathrm P}(\mathbb {C}^k)\) equipped with its Borel \(\sigma \)-algebra \(\mathcal {B}\). For a nonzero vector \(x\in {\mathbb {C}}^k\), we denote \(\hat{x}\) the corresponding equivalence class of x in \({\mathrm P}({\mathbb {C}}^k)\). For a linear map \(v\in \mathrm {M}_k({\mathbb {C}})\) we denote \(v\cdot \hat{x} \) the element of the projective space represented by \(v\,x\) whenever \(v\,x\ne 0\). We equip \(\mathrm {M}_k({\mathbb {C}})\) with its Borel \(\sigma \)-algebra and let \(\mu \) be a measure on \(\mathrm {M}_k({\mathbb {C}})\) with a finite second moment, \(\int _{\mathrm {M}_k({\mathbb {C}})} \Vert v\Vert ^2\,\mathrm{d}\mu (v)<\infty \), that satisfies the stochasticity condition

$$\begin{aligned} \int _{\mathrm {M}_k({\mathbb {C}})} v^* v \,\mathrm {d} \mu (v) = \mathrm{Id}_{{{\mathbb {C}}}^k} \end{aligned}$$
(1)

(we discuss this condition below).

In this article we are interested in particular Markov chains \((\hat{x}_n)\) on \({\mathrm P}(\mathbb {C}^k)\), defined by

$$\begin{aligned} \hat{x}_{n+1}=V_n\cdot \hat{x}_{n}, \end{aligned}$$

where \(V_n\) is an \(\mathrm {M}_k({\mathbb {C}})\)-valued random variable with a probability density \( \Vert v x_n\Vert ^2/\Vert x_n\Vert ^2 \mathrm {d} \mu (v).\) Condition (1) ensures this is a probability density for any \(x_n\ne 0\). More precisely, such a Markov chain is associated with the transition kernel given for a set \(S\in \mathcal {B}\) and \(\hat{x}\in {\mathrm P}({\mathbb {C}}^k)\) by

$$\begin{aligned} \Pi (\hat{x},S)=\int _{\mathrm {M}_k({{\mathbb {C}}})} \mathbf {1}_{S}\left( v\cdot \hat{x}\right) \Vert v x\Vert ^2 \mathrm {d} \mu (v), \end{aligned}$$
(2)

where x is an arbitrary normalized vector representative of \(\hat{x}\). Moreover, the event \(\{vx=0\}\) always has probability 0, hence the Markov chain is well-defined on \({\mathrm P}({{\mathbb {C}}}^k)\). We recall that for any probability measure \(\nu \), \(\nu \Pi \) is the probability measure defined by

$$\begin{aligned} \nu \Pi (S)=\int _{{\mathrm P}\left( {{\mathbb {C}}}^k\right) }\Pi \left( \hat{x},S\right) \mathrm{d}\nu (\hat{x}) \end{aligned}$$

for any \(S\in \mathcal {B}\). A measure \(\nu \) is called invariant if \(\nu \Pi = \nu \).
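As an illustration of the kernel (2), the following Python sketch simulates a few steps of the chain when \(\mu \) is a sum of unit point masses on finitely many matrices. The two matrices in `KRAUS` (an amplitude-damping pair with parameter `g`) and all function names are assumptions made for this example; they are not taken from the article. Condition (1) is what makes the weights \(\Vert vx\Vert ^2\) sum to one.

```python
import math
import random

# Hypothetical finitely supported mu: unit point masses at two 2x2 matrices
# (an amplitude-damping pair; g is an illustrative parameter).
g = 0.5
KRAUS = [
    [[1.0, 0.0], [0.0, math.sqrt(1.0 - g)]],
    [[0.0, math.sqrt(g)], [0.0, 0.0]],
]

def apply(v, x):
    """Matrix-vector product v x for a list-of-rows matrix v."""
    return [sum(v[i][j] * x[j] for j in range(len(x))) for i in range(len(v))]

def norm2(x):
    """Squared Euclidean norm of a complex vector."""
    return sum(abs(c) ** 2 for c in x)

def weights(x, kraus=KRAUS):
    """Weights ||v x||^2 / ||x||^2 for each v in the support of mu."""
    n = norm2(x)
    return [norm2(apply(v, x)) / n for v in kraus]

def step(x, rng, kraus=KRAUS):
    """Draw v with probability ||v x||^2 / ||x||^2; return a unit representative of v . x_hat."""
    (v,) = rng.choices(kraus, weights=weights(x, kraus), k=1)
    y = apply(v, x)
    s = math.sqrt(norm2(y))  # s > 0, since a v with v x = 0 has weight 0
    return [c / s for c in y]

rng = random.Random(0)
x = [1 / math.sqrt(2), 1j / math.sqrt(2)]
for _ in range(5):
    x = step(x, rng)
```

With this choice the weights at \(x=(1,i)/\sqrt{2}\) are 3/4 and 1/4, and a matrix with \(vx=0\) has weight zero, so it is never drawn.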

We are interested in the large-time distribution of \((\hat{x}_n)\). Note that \(\hat{x}_n\) can be written as

$$\begin{aligned} \hat{x}_n = V_{n}\ldots V_{1}\cdot \hat{x}_0, \end{aligned}$$

so that the study of \(\hat{x}_n\) can be formulated in terms of random products of matrices. Markov chains associated to random products of matrices have been studied in a more general setting where the weight appearing in the transition kernel (2) is proportional to \(\Vert v x\Vert ^s\) for some \(s \ge 0\), instead of \(\Vert v x\Vert ^2\). The classical case of products of independent, identically distributed random matrices pioneered by Kesten, Furstenberg and Guivarc’h corresponds to \(s = 0\). In that case, for i.i.d. invertible random matrices \(Y_1,Y_2,\ldots \), denoting \(S_n=Y_n\ldots Y_1\), one is usually interested in the asymptotic properties of

$$\begin{aligned} \log \Vert S_n x\Vert , \end{aligned}$$

for any \(x\ne 0\). In particular, a law of large numbers, a central limit theorem and a large deviation principle have been obtained for this quantity, under contractivity and strong irreducibility assumptions [8, 11, 16]. Such results are closely linked to the uniqueness of the invariant measure of the Markov chain

$$\begin{aligned} \hat{x}_n=S_n\cdot \hat{x}. \end{aligned}$$

These results were generalized to the case \(s >0\) in [10]. Our framework corresponds to the case \(s=2\); in this case, and with the additional assumption (1), we provide a new method to study this Markov chain, and use it to derive the above results without assuming invertibility of the matrices, and with an optimal irreducibility assumption. We compare our approach with that of [10] at the end of this section.

The method that we employ is motivated by an interpretation of this process as statistics of a quantum system being repeatedly indirectly measured. Let us expand on this as we introduce more notation and terminology. The set of states of a quantum system described by a finite dimensional Hilbert space \({{\mathbb {C}}}^k\) is the set of density matrices \({\mathcal {D}}_k:=\{\rho \in \mathrm {M}_k({{\mathbb {C}}})\ |\ \rho \ge 0,\ {{\text {tr}}}\,\rho =1\}\). This set is convex and the set of its extreme points is called the set of pure states. This latter set is in one to one correspondence with the projective space \({{\mathrm P}({{\mathbb {C}}}^k)}\) by the bijection \({{\mathrm P}({{\mathbb {C}}}^k)}\ni \hat{x}\mapsto \pi _{\hat{x}}\in {\mathcal {D}}_k\) with \(\pi _{\hat{x}}\) the orthogonal projector on the corresponding ray in \({{\mathbb {C}}}^k\). The time evolution of the system conditioned on a measurement outcome is encoded in a matrix v that updates the state of the system. The support of \(\mu \) is endowed with the meaning of the possible updates, and the system is updated according to v with a probability density \({{\text {tr}}}(v \rho v^*)\, \mathrm{d}\mu (v)\). Given v, a state \(\rho \) is mapped to a state \(v \rho v^*/{{\text {tr}}}(v \rho v^*)\). Iterating this procedure defines a random sequence \((\rho _n)\) in \({\mathcal {D}}_k\) called a quantum trajectory: after n measurements with resulting matrices \(v_1,\dots ,v_n\) the state of the system becomes

$$\begin{aligned} \rho _n=\frac{v_{n}\ldots v_{1}\rho _0 v_{1}^* \ldots v_{n}^*}{{{\text {tr}}}\left( v_{n}\ldots v_{1}\rho _0 v_{1}^* \ldots v_{n}^*\right) } \end{aligned}$$
(3)

where \((v_1,\ldots ,v_n)\) has probability density \({{\text {tr}}}(v_{n} \dots v_{1} \rho _0 v_{1}^*\ldots v_{n}^*)\,\mathrm{d}\mu ^{\otimes n}(v_1,\ldots ,v_n)\). In other words, the process Eq. (3) describes an evolution of a repeatedly measured quantum system.

A key result in the theory of quantum trajectories is the purification theorem obtained by Kümmerer and Maassen [17] showing that quantum trajectories \((\rho _n)\) defined on \(\mathcal {D}_k\) almost surely approach the set of pure states (which are the extreme points of \(\mathcal {D}_k\)) if and only if the following purification condition is satisfied:

(Pur)::

Any orthogonal projector \(\pi \) such that for any \(n\in {{\mathbb {N}}}\), \(\pi v_1^*\ldots v_n^* v_n\ldots v_1 \pi \propto \pi \) for \(\mu ^{\otimes n}\)-almost all \((v_1,\ldots ,v_n)\), is of rank one

(we write \(X \propto Y\) for two operators X, Y if there exists \(\lambda \in {{\mathbb {C}}}\) such that \(X=\lambda Y\)).

Under this assumption, the long-time behavior of the Markov chain is essentially dictated by its form on the set of pure states, i.e. for \(\rho _0=\pi _{\hat{x}_0}\). It is an immediate observation that

$$\begin{aligned} {{\text {tr}}}\left( v\pi _{\hat{x}_0}v^*\right) = \Vert v x_0\Vert ^2, \quad \frac{v\pi _{\hat{x}_0} v^*}{{{\text {tr}}}\left( v\pi _{\hat{x}_0}v^*\right) } = \pi _{v\cdot \hat{x}_0}, \end{aligned}$$
(4)

for all \(v\in \mathrm {M}_k({\mathbb {C}})\). This way our Markov chain \((\hat{x}_n)\) corresponds to the quantum trajectory \((\rho _n)\) described above when \(\rho _0\) is a pure state \(\pi _{\hat{x}_0}\).
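For completeness, here is a short verification of (4), written as a sketch under the following convention (introduced only for this remark): \(x_0\) is a unit representative of \(\hat{x}_0\), and \(x_0x_0^*\) denotes the rank-one operator \(y\mapsto \langle x_0,y\rangle x_0\), so that \(\pi _{\hat{x}_0}=x_0x_0^*\). Then

$$\begin{aligned} {{\text {tr}}}\left( v\,x_0x_0^*\,v^*\right) = \langle v x_0, v x_0\rangle = \Vert v x_0\Vert ^2, \qquad \frac{v\,x_0x_0^*\,v^*}{\Vert v x_0\Vert ^2} = \frac{v x_0}{\Vert v x_0\Vert }\left( \frac{v x_0}{\Vert v x_0\Vert }\right) ^* = \pi _{v\cdot \hat{x}_0}. \end{aligned}$$

The second identity is meaningful only when \(v x_0\ne 0\), which holds with probability one under the density \(\Vert vx_0\Vert ^2\,\mathrm{d}\mu (v)\).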

Although ideas underlying our method are based on the connection of \((\hat{x}_n)\) with this physical problem, we will not explicitly use it in the paper. The notion of quantum trajectory originates in quantum optics [6], and Haroche’s Nobel prize winning experiment [9] is arguably the most prominent example of a system described by the above formalism. The reader interested in the involved mathematical structures might consult for example the review book [13] or the pioneering articles [14, 15, 17].

We will show that under the condition (Pur), the set of all invariant measures of the Markov chain (3) can be completely classified, depending on the operator \(\phi \) on \(\mathcal {D}_k\) describing the average evolution:

$$\begin{aligned} \phi (\rho )=\int _{\mathrm {M}_k({\mathbb {C}})} v \rho v^* \, \mathrm{d}\mu (v). \end{aligned}$$
(5)

The map \(\phi \) on \({\mathcal {D}}_k\) is completely positive and trace-preserving. Such a map is often called a quantum channel (see e.g. [22]). It has in particular the property of mapping states to states. Brouwer’s fixed point theorem shows that there exists an invariant state, i.e. \(\rho \in {\mathcal {D}}_k\) such that \(\phi (\rho )=\rho \). A necessary and sufficient algebraic condition for uniqueness of this invariant state is (see e.g. [5, 7, 22])

(\(\phi \)-Erg)::

There exists a unique minimal non trivial subspace E of \({{\mathbb {C}}}^k\) such that \(\forall v\in {\text {supp}}\mu \), \(vE\subset E\).

If (\(\phi \)-Erg) holds with \(E={{\mathbb {C}}}^k\), then \(\phi \) is said to be irreducible. We chose the name (\(\phi \)-Erg) to avoid confusion with the notion of irreducibility for Markov chains. We moreover emphasize that we call this assumption (\(\phi \)-Erg) because it relies only on \(\phi \) and not on the different operators v in the support of \(\mu \): an equivalent statement of (\(\phi \)-Erg) is that there exists a unique minimal nonzero orthogonal projector \(\pi \) such that \(\phi (\pi )\le \lambda \pi \) for some \(\lambda \ge 0\) (see e.g. [20]).
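The map (5) and its invariant state can be explored numerically. The sketch below assumes a finitely supported \(\mu \) given by a hypothetical pair of matrices `KRAUS` (an amplitude-damping pair, chosen only for illustration); the helper names are likewise assumptions. For this pair, the line spanned by the first basis vector is the only minimal invariant subspace, so (\(\phi \)-Erg) holds with \(E\ne {{\mathbb {C}}}^2\), and iterating \(\phi \) from \(\mathrm{Id}/2\) approaches the projector on that line.

```python
import math

g = 0.5  # illustrative damping parameter (an assumption of this example)
KRAUS = [
    [[1.0, 0.0], [0.0, math.sqrt(1.0 - g)]],
    [[0.0, math.sqrt(g)], [0.0, 0.0]],
]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def channel(rho, kraus=KRAUS):
    """phi(rho) = sum over the support of mu of v rho v^*."""
    k = len(rho)
    out = [[0j] * k for _ in range(k)]
    for v in kraus:
        term = matmul(matmul(v, rho), dagger(v))
        out = [[out[i][j] + term[i][j] for j in range(k)] for i in range(k)]
    return out

# Iterate phi starting from the maximally mixed state Id/2.
rho = [[0.5, 0.0], [0.0, 0.5]]
for _ in range(60):
    rho = channel(rho)
```

On diagonal states the map acts as \(\mathrm{diag}(a,b)\mapsto \mathrm{diag}(a+gb,(1-g)b)\), which is why the second diagonal entry decays geometrically in this example.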

We now state the main result of the paper:

Theorem 1.1

Assume that \(\mu \) satisfies assumptions (Pur) and (\(\phi \)-Erg). Then, the transition kernel \(\Pi \) has a unique invariant probability measure \(\nu _{\mathrm {inv}}\) and there exist \(m\in \{1,\ldots ,k\}\), \(C>0\) and \(0<\lambda <1\) such that for any probability measure \(\nu \) over \(\big ({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}}\big )\),

$$\begin{aligned} W_1\left( \frac{1}{m}\sum _{r=0}^{m-1} \nu \Pi ^{mn+r}, \nu _{\mathrm {inv}}\right) \le C \lambda ^n, \end{aligned}$$
(6)

where \(W_1\) is the Wasserstein metric of order 1.

The Wasserstein metric is constructed with respect to a natural metric on the complex projective space. This metric is defined, for \(\hat{x},\,\hat{y}\) in \({{\mathrm P}({{\mathbb {C}}}^k)}\), by

$$\begin{aligned} d\left( \hat{x},\hat{y}\right) = \left( 1-|\langle x,y\rangle |^2\right) ^{\frac{1}{2}}, \end{aligned}$$
(7)

where \(x,\,y\) are unit length representative vectors of \(\hat{x}\), \(\hat{y}\), and \(\langle \,\cdot \,,\,\cdot \,\rangle \) is the canonical hermitian inner product on \(\mathbb {C}^k\).
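A small Python sketch of the metric (7) follows. The function name `proj_dist` is an assumption of this example; it accepts arbitrary nonzero representatives and normalizes internally, so that the value depends only on the classes \(\hat{x}\), \(\hat{y}\).

```python
import cmath
import math

def proj_dist(x, y):
    """d(x_hat, y_hat) = sqrt(1 - |<x, y>|^2), computed from nonzero representatives."""
    inner = sum(complex(a).conjugate() * b for a, b in zip(x, y))
    nx = sum(abs(a) ** 2 for a in x)
    ny = sum(abs(b) ** 2 for b in y)
    val = 1.0 - abs(inner) ** 2 / (nx * ny)
    return math.sqrt(max(val, 0.0))  # clamp tiny negative rounding errors

e1 = [1.0, 0.0]
e2 = [0.0, 1.0]
plus = [1.0, 1.0]

d_orth = proj_dist(e1, e2)      # orthogonal lines are at distance 1
d_plus = proj_dist(e1, plus)    # equals sqrt(1/2) up to rounding
phase = cmath.exp(0.7j)
d_same = proj_dist(plus, [phase * c for c in plus])  # same class, distance ~ 0
```

Multiplying a representative by a phase or a nonzero scalar leaves the value unchanged up to rounding, which reflects that d is defined on \({{\mathrm P}({{\mathbb {C}}}^k)}\) rather than on \({{\mathbb {C}}}^k\).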

Let us now compare our results to those of the article [10] of Guivarc’h and Le Page. They consider a probability distribution \(\mu \) with support in \(\mathrm {GL}_k({\mathbb {C}})\), without requiring the normalization condition (1), and study the transition kernel on \({{\mathrm P}({{\mathbb {C}}}^k)}\) given, for \(S\in {\mathcal {B}}\), by

$$\begin{aligned} \Pi _{s}\left( \hat{x},S\right) \propto \int _{\mathrm {M}_k({\mathbb {C}})}\mathbf {1}_S\left( v\cdot \hat{x}\right) \Vert v x\Vert ^s \mathrm {d} \mu (v). \end{aligned}$$

In the case \(s=2\), Theorem A of [10] implies the conclusions of Theorem 1.1 under two assumptions:

  • strong irreducibility, in the sense that there is no non-trivial finite union of proper subspaces of \({{\mathbb {C}}}^k\) left invariant by all \(v\in \mathrm {supp}\,\mu \),

  • contractivity, in the sense that there exists a sequence \((a_n)\) in \(T_\mu \), the smallest closed sub-semigroup of \(\mathrm {GL}_k({{\mathbb {C}}})\) containing \({\text {supp}}\mu \), such that \(\lim _{n\rightarrow \infty }a_n/\Vert a_n\Vert \) exists and is of rank one.

It is, however, immediate that strong irreducibility of \(\mu \) implies (\(\phi \)-Erg) with \(E={{\mathbb {C}}}^k\). In addition, if we assume \({\text {supp}}\mu \subset \mathrm {GL}_k({\mathbb {C}})\) and \({\text {supp}}\mu \) is strongly irreducible, the equivalence

$$\begin{aligned} {\mathbf{(Pur)}}\iff T_\mu \text{ is } \text{ contracting } \end{aligned}$$

holds (see “Appendix A”). Our results therefore offer a strong refinement of [10] in the restricted framework of \(s=2\) with \(\int v^* v \,\mathrm{d}\mu (v)=\mathrm{Id}_{{{\mathbb {C}}}^k}\). This assumption, although mathematically restrictive, is automatically verified in the framework of repeated (indirect) quantum measurements as described earlier in this section.

The article is structured as follows. Section 2 is devoted to the first part of Theorem 1.1, that is the uniqueness of the invariant measure. In Sect. 3 we show the geometric convergence towards the invariant measure with respect to the 1-Wasserstein metric. In Sect. 4 we discuss the Lyapunov exponents of the process and relate them to the convergence between the Markov chain and an estimate of the chain used in our proofs.

Notation For \(x\in {{\mathbb {C}}}^k\setminus \{0\}\), \(\hat{x}\) is its equivalence class in \({\mathrm P}({{\mathbb {C}}}^k)\) and, for \(\hat{x}\) in \({\mathrm P}({{\mathbb {C}}}^k)\), x is an arbitrary norm one vector representative of \(\hat{x}\). If e.g. \({{\mathbb {P}}}_\nu \) (resp. \({\mathbb {P}}^\rho \)) is a probability measure (depending on some a priori object \(\nu \) (resp. \(\rho \))) then \({{\mathbb {E}}}_\nu \) (resp. \({{\mathbb {E}}}^\rho \)) is the expectation with respect to \({{\mathbb {P}}}_\nu \) (resp. \({{\mathbb {P}}}^\rho \)). \({{\mathbb {N}}}\) represents the set of positive integers \(\{1,2,\ldots \}\).

2 Uniqueness of the invariant measure

This section concerns essentially the first part of Theorem 1.1. More precisely, under (\(\phi \)-Erg) and (Pur) we show that the Markov chain has a unique invariant measure. Note that an invariant measure always exists since \({\mathrm P}({\mathbb {C}}^k)\) is compact. We start by introducing a probability space describing both the state \(\hat{x}\in {\mathrm P}({{\mathbb {C}}}^k)\) and the sequence of matrices \((v_1,v_2,\ldots )\) such that \((v_n\ldots v_1\cdot \hat{x})\) has the same distribution as the Markov chain \((\hat{x}_n)\). Then, in Proposition 2.1, we show that the marginal on the matrix sequence is the same for any \(\Pi \)-invariant probability measure as long as (\(\phi \)-Erg) holds. In Proposition 2.2 and Lemma 2.3 we show that \((\hat{x}_n)\) is asymptotically a function of \((v_1,v_2,\ldots )\). We conclude on the uniqueness of the invariant measure in Corollary 2.4.

We now proceed to introduce some additional notation. We consider the space of infinite sequences \(\Omega :=\mathrm {M}_k({{\mathbb {C}}})^{{\mathbb {N}}}\), write \(\omega = (v_1,v_2, \dots )\) for any such infinite sequence, and denote by \(\pi _n\) the canonical projection on the first n components, \(\pi _n(\omega )=(v_1,\ldots ,v_n)\). Let \({\mathcal {M}}\) be the Borel \(\sigma \)-algebra on \(\mathrm {M}_k({{\mathbb {C}}})\). For \(n\in {{\mathbb {N}}}\), let \(\mathcal {O}_n\) be the \(\sigma \)-algebra on \(\Omega \) generated by the n-cylinder sets, i.e. \(\mathcal {O}_n = \pi _n^{-1}({\mathcal {M}}^{\otimes n})\). We equip the space \(\Omega \) with the smallest \(\sigma \)-algebra \(\mathcal {O}\) containing \(\mathcal {O}_n\) for all \(n\in {{\mathbb {N}}}\). We let \({\mathcal {B}}\) be the Borel \(\sigma \)-algebra on \({\mathrm P}({\mathbb {C}}^k)\), and denote

$$\begin{aligned} \mathcal {J}_n={\mathcal {B}}\otimes \mathcal {O}_n,\qquad \mathcal {J}={\mathcal {B}}\otimes \mathcal {O}. \end{aligned}$$

This makes \(\big ({\mathrm P}({\mathbb {C}}^k)\times \Omega ,\mathcal {J}\big )\) a measurable space. With a small abuse of notation we denote the sub-\(\sigma \)-algebra \(\{\emptyset ,{\mathrm P}({\mathbb {C}}^k)\}\times \mathcal {O}\) by \(\mathcal {O}\), and equivalently identify any \(\mathcal {O}\)-measurable function f with the \(\mathcal {J}\)-measurable function f satisfying \(f(\hat{x},\omega ) = f(\omega )\).

For \(i\in {\mathbb {N}}\), we consider the random variables \(V_i : \Omega \rightarrow \mathrm {M}_k({\mathbb {C}})\),

$$\begin{aligned} V_i(\omega ) = v_i \quad \text{ for } \quad \omega =(v_1,v_2,\ldots ), \end{aligned}$$
(8)

and we introduce \({\mathcal {O}}_n\)-measurable random variables \((W_n)\) defined for all \(n\in {\mathbb {N}}\) as

$$\begin{aligned} W_n=V_{n}\ldots V_{1}. \end{aligned}$$

With a small abuse of notation we identify cylinder sets and their bases, and extend this identification to several associated objects. In particular we identify \(O_n\in {\mathcal {M}}^{\otimes n}\) with \(\pi _n^{-1}(O_n)\), a function f on \({\mathcal {M}}^{\otimes n}\) with \(f \circ \pi _n\) and a measure \(\mu ^{\otimes n}\) with the measure \(\mu ^{\otimes n} \circ \pi _n\). Since \(\mu \) is not necessarily finite, we cannot extend \((\mu ^{\otimes n})\) into a measure on \(\Omega \).

Let \(\nu \) be a probability measure over \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\). We extend it to a probability measure \(\mathbb {P}_\nu \) over \(({{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega ,\mathcal {J})\) by letting, for any \(S\in {\mathcal {B}}\) and any cylinder set \(O_n \in \mathcal {O}_n\),

$$\begin{aligned} \mathbb {P}_\nu (S \times O_n):=\int _{S\times O_n} \Vert W_n(\omega )x\Vert ^2 \mathrm{d}\nu (\hat{x}) \mathrm{d}\mu ^{\otimes n}(\omega ). \end{aligned}$$
(9)

From relation (1), it is easy to check that the expression (9) defines a consistent family of probability measures and, by Kolmogorov’s theorem, this defines a unique probability measure \({{\mathbb {P}}}_\nu \) on \({{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega \). In addition, the restriction of \(\mathbb {P}_\nu \) to \({\mathcal {B}}\otimes \{\emptyset ,\Omega \}\) is by construction \(\nu \).

We now define the random process \((\hat{x}_n)\). For \((\hat{x}, \omega )\in {{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega \) we define \(\hat{x}_0(\hat{x}, \omega )=\hat{x}\). Note that for any n, the definition (9) of \({{\mathbb {P}}}_\nu \) imposes

$$\begin{aligned} {{\mathbb {P}}}_\nu \left( W_n x_0 = 0\right) =0. \end{aligned}$$

This allows us to define a sequence \((\hat{x}_n)\) of \((\mathcal {J}_n)\)-adapted random variables on the probability space \(({{\mathrm P}({{\mathbb {C}}}^k)}\times \Omega ,\mathcal {J}, {{\mathbb {P}}}_\nu )\) by letting

$$\begin{aligned} \hat{x}_n:= W_n\cdot \hat{x} \end{aligned}$$
(10)

whenever the expression makes sense, i.e. for any \(\omega \) such that \(W_n(\omega ) x\ne 0\), and extending it arbitrarily to the whole of \(\Omega \). The process \((\hat{x}_n)\) on \(({\mathrm P}({\mathbb {C}}^k)\times \Omega ,\mathcal {J}, \mathbb {P}_\nu )\) has the same distribution as the Markov chain defined by \(\Pi \) and initial probability measure \(\nu \).

Let us highlight the relation between \({{\mathbb {P}}}_\nu \) and density matrices. To that end, let

$$\begin{aligned} \rho _\nu := {\mathbb {E}}_\nu \left( \pi _{\hat{x}}\right) . \end{aligned}$$
(11)

By linearity and positivity of the expectation, \(\rho _\nu \in {\mathcal {D}}_k\). Note that, conversely, for a given \(\rho \in {\mathcal {D}}_k\) there exists \(\nu \) (in general non-unique) such that \(\rho _\nu = \rho \). For example, if a spectral decomposition of \(\rho \) is \(\rho =\sum _j p_j \pi _{x_j}\) then necessarily \(\sum _j p_j=1\), so that \(\nu = \sum _j p_j \delta _{\hat{x}_j}\) is a probability measure on \({{\mathrm P}({{\mathbb {C}}}^k)}\), and it satisfies the desired relation (11).
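To illustrate the non-uniqueness, consider the following example with \(k=2\) (the vectors \(e_\pm \) are notation introduced only for this illustration). Let \(e_1,e_2\) be the canonical basis of \({{\mathbb {C}}}^2\) and \(e_\pm =(e_1\pm e_2)/\sqrt{2}\). Both pairs are orthonormal bases, hence

$$\begin{aligned} \pi _{\hat{e}_1}+\pi _{\hat{e}_2} = \mathrm{Id}_{{{\mathbb {C}}}^2} = \pi _{\hat{e}_+}+\pi _{\hat{e}_-}. \end{aligned}$$

Consequently the two distinct probability measures \(\nu _a=\frac{1}{2}\delta _{\hat{e}_1}+\frac{1}{2}\delta _{\hat{e}_2}\) and \(\nu _b=\frac{1}{2}\delta _{\hat{e}_+}+\frac{1}{2}\delta _{\hat{e}_-}\) satisfy \(\rho _{\nu _a}=\rho _{\nu _b}=\frac{1}{2}\mathrm{Id}_{{{\mathbb {C}}}^2}\).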

This relation motivates the following definition of probability measures over \((\Omega ,\mathcal {O})\). For \(\rho \in \mathcal {D}_k\) and any cylinder set \(O_n \in \mathcal {O}_n\), let

$$\begin{aligned} {{\mathbb {P}}}^{\rho }(O_n):= \int _{O_n}{{\text {tr}}}\big (W_n(\omega ) \rho W_n^*(\omega )\big ) \mathrm {d} \mu ^{\otimes n}(\omega ). \end{aligned}$$
(12)

In particular, for any \(S\in \mathcal {B}\) and \(A\in \mathcal {O}\),

$$\begin{aligned} {{\mathbb {P}}}_\nu (S\times A)=\int _{S}{{\mathbb {P}}}^{\pi _{\hat{x}}}(A)\, \mathrm{d}\nu (\hat{x}). \end{aligned}$$
(13)

The following proposition elucidates further the connection between \({{\mathbb {P}}}_\nu \) and \({{\mathbb {P}}}^{\rho _\nu }\).

Proposition 2.1

The marginal of \(\mathbb {P}_\nu \) on \(\mathcal {O}\) is the probability measure \({\mathbb {P}}^{\rho _\nu }\). Moreover, if (\(\phi \)-Erg) holds, \({{\mathbb {P}}}^{\rho _{\nu _a}}={{\mathbb {P}}}^{\rho _{\nu _b}}\) for any two \(\Pi \)-invariant probability measures \(\nu _a\) and \(\nu _b\).

Proof

By construction it is sufficient to check the equality of the measures on cylinder sets. Let \(O_n \in \mathcal {O}_n\); from the definition of \(\mathbb {P}_\nu \), and the linearity of the trace and the integral, we have

$$\begin{aligned} \mathbb {P}_\nu \left( {{\mathrm P}({{\mathbb {C}}}^k)}\times O_n\right)&=\int _{{{\mathrm P}({{\mathbb {C}}}^k)}\times O_n} {{\text {tr}}}\big (W^*_n(\omega )W_n(\omega )\pi _{\hat{x}}\big )\, \mathrm{d}\nu (\hat{x}) \mathrm{d}\mu ^{\otimes n}(\omega )\\&=\int _{O_n} {{\text {tr}}}\left( W_n^*(\omega )W_n(\omega )\int _{{{\mathrm P}({{\mathbb {C}}}^k)}} \pi _{\hat{x}} \,\mathrm{d}\nu (\hat{x})\right) \,\mathrm{d}\mu ^{\otimes n}(\omega )\\&=\int _{O_n} {{\text {tr}}}\big (W_n^*(\omega )W_n(\omega )\rho _\nu \big ) \,\mathrm{d}\mu ^{\otimes n}(\omega ). \end{aligned}$$

The equality between the marginal of \({{\mathbb {P}}}_\nu \) on \(\mathcal {O}\) and \({{\mathbb {P}}}^{\rho _\nu }\) follows.

If \(\nu \) is an invariant measure, on the one hand

$$\begin{aligned} {\mathbb {E}}_\nu \left( \pi _{\hat{x}_1}\right) ={\mathbb {E}}_\nu \left( \pi _{\hat{x}_0}\right) =\rho _\nu . \end{aligned}$$

On the other hand,

$$\begin{aligned} {\mathbb {E}}_\nu \left( \pi _{\hat{x}_1}\right) =&\int _{{{\mathrm P}({{\mathbb {C}}}^k)}\times \mathrm {M}_k({{\mathbb {C}}})} \frac{v\pi _{\hat{x}_0} v^*}{\Vert vx_0\Vert ^2}\Vert vx_0\Vert ^2\mathrm{d}\nu \left( \hat{x}_0\right) \mathrm{d}\mu (v)\\ =&\int _{\mathrm {M}_k({{\mathbb {C}}})} v\,{{\mathbb {E}}}_\nu \left( \pi _{\hat{x}_0}\right) \,v^*\mathrm{d}\mu (v)\\ =&\phi (\rho _\nu ), \end{aligned}$$

so that \(\rho _\nu \) is a fixed point of \(\phi \). If (\(\phi \)-Erg) holds, \(\phi \) has a unique fixed point in \({\mathcal {D}}_k\), so \(\rho _{\nu _a}=\rho _{\nu _b}\) and therefore \({{\mathbb {P}}}^{\rho _{\nu _a}}={{\mathbb {P}}}^{\rho _{\nu _b}}\). \(\square \)

In the following we use the measure \({{\mathbb {P}}}^{\mathrm {ch}}={{\mathbb {P}}}^{\frac{1}{k}\mathrm{Id}_{{{\mathbb {C}}}^k}}\) associated to the operator \(\mathrm{Id}_{{\mathbb {C}}^k}/k\in {\mathcal {D}}_k\) as a reference measure. Since for any \(\rho \in {\mathcal {D}}_k\) there exists a constant c such that \(\rho \le c \,\frac{\mathrm{Id}_{{\mathbb {C}}^k}}{k}\), the measure \({{\mathbb {P}}}^\rho \) is absolutely continuous w.r.t. \({{\mathbb {P}}}^{\mathrm {ch}}\). We will denote absolute continuity between measures with the symbol \(\ll \), so that we have here

$$\begin{aligned} {{\mathbb {P}}}^\rho \ll {{\mathbb {P}}}^{\mathrm {ch}}, \end{aligned}$$

for all \(\rho \in {\mathcal {D}}_k\). The Radon–Nikodym derivative will be made explicit in Proposition 2.2. To that end, we use a particular \((\mathcal {O}_n)\)-adapted process. We define a sequence of matrix-valued random variables:

$$\begin{aligned} M_n:=\frac{W_n^*W_n}{{{\text {tr}}}\left( W_n^*W_n\right) } \quad \text{ if }\,\, {{\text {tr}}}\left( W_n^*W_n\right) \ne 0 \end{aligned}$$

and extend the definition arbitrarily whenever \({{\text {tr}}}(W_n^*W_n)=0\). The latter alternative appears with probability 0: indeed, \({{\mathbb {P}}}^{\mathrm {ch}}\big ({{\text {tr}}}(W_n^*W_n)=0\big )=0\) and then by the absolute continuity of \({{\mathbb {P}}}^\rho \) with respect to \({{\mathbb {P}}}^{\mathrm {ch}}\) we have \({{\mathbb {P}}}_\nu \big ({{\text {tr}}}(W_n^*W_n)=0\big )={{\mathbb {P}}}^{\rho _\nu }\big ({{\text {tr}}}(W_n^*W_n)=0\big )=0\) for any measure \(\nu \). The key property of \(M_n\), that we establish in the proof of Proposition 2.2, is that it is an \((\mathcal {O}_n)\)-martingale with respect to \({{\mathbb {P}}}^{\mathrm {ch}}\).

From the existence of a polar decomposition for \(W_n\), for each n, there exists a unitary matrix-valued random variable \(U_n\) such that

$$\begin{aligned} W_n=U_n\sqrt{{{\text {tr}}}\left( W_n^*W_n\right) }M_n^{\frac{1}{2}}. \end{aligned}$$
(14)

This process \((U_n)\) can be chosen to be \((\mathcal {O}_n)\)-adapted.

The key technical results about \(M_n\) needed for our proofs are summarized in the following proposition. Recall that any \(\mathcal {O}\)-measurable function f is extended to a \(\mathcal {J}\)-measurable function by setting \(f(\hat{x},\omega )=f(\omega )\) for any \((\hat{x},\omega )\in {\mathrm P}({{\mathbb {C}}}^k)\times \Omega \).

Proposition 2.2

  For any probability measure \(\nu \) over \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\), \((M_n)\) converges \(\mathbb {P}_\nu {\text {-}}\mathrm {a.s.}\) and in \(L^1\)-norm to an \(\mathcal {O}\)-measurable random variable \(M_\infty \). The change of measure formula

$$\begin{aligned} \frac{\mathrm{d}{\mathbb {P}}^\rho }{\mathrm{d}{{\mathbb {P}}}^{\mathrm {ch}}}=k\,{{\text {tr}}}(\rho M_\infty ) \end{aligned}$$
(15)

holds true for all \(\rho \in {\mathcal {D}}_k\).

Moreover, the measure \(\mu \) verifies (Pur) if and only if \(M_\infty \) is \({{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\) a rank one projection for any probability measure \(\nu \) over \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\).

Proof

We start the proof by showing that \(M_n\) is a \({{\mathbb {P}}}^{\mathrm {ch}}\)-martingale. Recall that for all \(n\in {\mathbb {N}}\) and all \(O_n\in \mathcal {O}_n\),

$$\begin{aligned} {{\mathbb {P}}}^{\mathrm {ch}}(O_n)=\frac{1}{k} \int _{O_n} {{\text {tr}}}\big (W_{n}^*(\omega )W_n(\omega )\big ) \, \mathrm{d}\mu ^{\otimes n}(\omega ). \end{aligned}$$

From the definition of \(W_n\), Eq. (8),

$$\begin{aligned} M_{n+1} = \frac{W^*_{n}V^*_{{n+1}}V_{{n+1}} W_n}{{{\text {tr}}}\left( W^*_n W_n\right) }\, \frac{{{\text {tr}}}\big (W^*_n W_n\big )}{{{\text {tr}}}\big (W^*_{n+1} W_{n+1}\big )}. \end{aligned}$$
(16)

This implies that for an arbitrary \(\mathcal {O}_n\)-measurable random variable Y

$$\begin{aligned} {{\mathbb {E}}}^{\mathrm {ch}}\left( Y M_{n+1}\right)&= \frac{1}{k} \int _{\mathrm {M}_k({{\mathbb {C}}})^{n+1}} \frac{W^*_{n}V_{{n+1}}^*V_{{n+1}} W_n}{{{\text {tr}}}\left( W^*_n W_n\right) } \, Y \, {{\text {tr}}}\big (W^*_n W_n\big ) \, \mathrm{d}\mu ^{\otimes n+1} \\&=\frac{1}{k} \int _{\mathrm {M}_k({{\mathbb {C}}})^n} \frac{W^*_{n} W_n}{{{\text {tr}}}\big (W^*_n W_n\big )}\, Y \, {{\text {tr}}}\big (W^*_n W_n\big )\, \mathrm{d}\mu ^{\otimes n}\\&= {{\mathbb {E}}}^{\mathrm {ch}}(Y M_n), \end{aligned}$$

where the second equality follows from the stochasticity condition (1), \(\int v^* v \mathrm{d} \mu (v) = \mathrm{Id}_{{{\mathbb {C}}}^k}\). This shows that \((M_n)\) is an \((\mathcal {O}_n)\)-martingale w.r.t. \({{\mathbb {P}}}^{\mathrm {ch}}\). Since the sequence \((M_n)\) is composed of positive semidefinite matrices of trace one, its coordinates are a.s. uniformly bounded by 1. Therefore, the martingale property implies the \(L^1\) and a.s. convergence of \((M_n)\) to an \(\mathcal {O}\)-measurable random variable \(M_\infty \). Now note that for any \(\rho \in {\mathcal {D}}_k\),

$$\begin{aligned} {{\text {tr}}}\left( W_n^* W_n\rho \right) ={{\text {tr}}}(M_n\rho )\,k\,{{\text {tr}}}\left( W_n^* W_n \,\frac{\mathrm{Id}_{{\mathbb {C}}^k}}{k}\right) . \end{aligned}$$

This way, the convergence of \((M_n)\) implies the change of measure formula.
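The definition of \({{\mathbb {P}}}^{\mathrm {ch}}\) together with condition (1) gives \({{\mathbb {E}}}^{\mathrm {ch}}(M_n)=\frac{1}{k}\int W_n^*W_n\,\mathrm{d}\mu ^{\otimes n}=\frac{1}{k}\mathrm{Id}_{{{\mathbb {C}}}^k}\) for every n. The following Python sketch checks this by brute-force enumeration for a finitely supported \(\mu \). The matrices in `KRAUS` and the helper names are assumptions made for the illustration, and the tolerance used to detect null words is an ad hoc choice.

```python
import itertools
import math

g = 0.5  # illustrative parameter (an assumption of this example)
KRAUS = [
    [[1.0, 0.0], [0.0, math.sqrt(1.0 - g)]],
    [[0.0, math.sqrt(g)], [0.0, 0.0]],
]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def expected_M(n, kraus=KRAUS):
    """Return (E^ch(M_n), total P^ch mass) by summing over all words (v_1, ..., v_n)."""
    k = len(kraus[0])
    total = [[0j] * k for _ in range(k)]
    mass = 0.0
    for word in itertools.product(kraus, repeat=n):
        W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
        for v in word:                  # builds W_n = v_n ... v_1
            W = matmul(v, W)
        G = matmul(dagger(W), W)        # W_n^* W_n
        t = sum(G[i][i] for i in range(k)).real
        if t <= 1e-14:                  # P^ch-null word: M_n is arbitrary there
            continue
        p = t / k                       # P^ch weight of this word
        mass += p
        for i in range(k):
            for j in range(k):
                total[i][j] += p * G[i][j] / t   # p * M_n
    return total, mass

E3, mass3 = expected_M(3)
```

In this example some words (those containing the second matrix twice) have \(W_n=0\); they carry zero \({{\mathbb {P}}}^{\mathrm {ch}}\)-weight and are skipped, in line with the arbitrary extension of \(M_n\) on null sets.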

We now prove the last part of the proposition. Using the martingale property one can see that for all \(n\in {\mathbb {N}}\), and any fixed \(p \in \mathbb {N}\),

$$\begin{aligned} V_n^p\ :=\ \sum _{k=0}^{p-1}{{\mathbb {E}}}^{\mathrm {ch}}\left( M_{k+n+1}^2 - M_k^2\right)&= \sum _{k=0}^n{{\mathbb {E}}}^{\mathrm {ch}}\left( M_{k+p}^2\right) -\sum _{k=0}^n{{\mathbb {E}}}^{\mathrm {ch}}\left( M_k^2\right) \\&= \sum _{k=0}^n{{\mathbb {E}}}^{\mathrm {ch}}\left( \left( M_{k+p}-M_k\right) ^2\right) \\&= {{\mathbb {E}}}^{\mathrm {ch}}\left( \sum _{k=0}^n{{\mathbb {E}}}^{\mathrm {ch}}\left( \left( M_{k+p}-M_k\right) ^2\vert {\mathcal {O}}_k\right) \right) . \end{aligned}$$
(17)

Since \((M_n)\) is bounded and almost surely convergent, applying Lebesgue’s dominated convergence theorem to each \({{\mathbb {E}}}^{\mathrm {ch}}(M_{k+n+1}^2)\), \(k=0,\ldots ,p-1\) as \(n\rightarrow \infty \) implies that the term \(V_n^p\) is convergent when n goes to infinity. Then, using the monotone convergence theorem in the last line of (17), we get that

$$\begin{aligned} \lim _{n\rightarrow \infty }V_n^p={{\mathbb {E}}}^{\mathrm {ch}}\left( \sum _{k=0}^\infty {{\mathbb {E}}}^{\mathrm {ch}}\left( \left( M_{k+p}-M_k\right) ^2 \vert {\mathcal {O}}_k\right) \right) . \end{aligned}$$

It implies that the series \(\sum _{k=0}^\infty {{\mathbb {E}}}^{\mathrm {ch}}\big ((M_{k+p}-M_k)^2\vert {\mathcal {O}}_k\big )\) is almost surely finite. This yields that

$$\begin{aligned} \lim _{n\rightarrow \infty }{{\mathbb {E}}}^{\mathrm {ch}}\left( \left( M_{n+p}-M_n\right) ^2\vert {\mathcal {O}}_n\right) =0,\quad {{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\end{aligned}$$

Since all the norms are equivalent in finite dimension, Jensen’s inequality implies

$$\begin{aligned} \lim _{n\rightarrow \infty }{{\mathbb {E}}}^{\mathrm {ch}}\left( \left\| M_{n+p}-M_n\right\| |\mathcal {O}_n\right) =0,\quad {{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\end{aligned}$$
(18)
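The Jensen step can be spelled out as follows; this is a sketch in which \(\Vert \cdot \Vert _2\) denotes the Hilbert–Schmidt norm (a choice made only for this remark), so that \(\Vert X\Vert _2^2={{\text {tr}}}(X^2)\) for self-adjoint X. Applying the conditional Jensen inequality to the concave function \(t\mapsto \sqrt{t}\) gives

$$\begin{aligned} {{\mathbb {E}}}^{\mathrm {ch}}\left( \left\| M_{n+p}-M_n\right\| _2\,\vert \,{\mathcal {O}}_n\right) \le \left( {{\mathbb {E}}}^{\mathrm {ch}}\left( \left\| M_{n+p}-M_n\right\| _2^2\,\vert \,{\mathcal {O}}_n\right) \right) ^{\frac{1}{2}} = \left( {{\text {tr}}}\,{{\mathbb {E}}}^{\mathrm {ch}}\left( \left( M_{n+p}-M_n\right) ^2\vert \,{\mathcal {O}}_n\right) \right) ^{\frac{1}{2}}. \end{aligned}$$

The right-hand side tends to 0 almost surely by the preceding limit, and the equivalence of norms transfers the conclusion to any norm on \(\mathrm {M}_k({{\mathbb {C}}})\).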

At this stage we use the polar decomposition of \((W_n)\), Eq. (14), to write

$$\begin{aligned} M_{n+p} = \frac{M_n^{\frac{1}{2}}U_n^*V_{n+1}^*\ldots V_{n+p}^*V_{n+p}\ldots V_{n+1}U_nM_n^{\frac{1}{2}}}{{{\text {tr}}}\left( M_n^{\frac{1}{2}}U_n^*V_{n+1}^*\ldots V_{n+p}^*V_{n+p}\ldots V_{n+1}U_nM_n^{\frac{1}{2}}\right) }. \end{aligned}$$

Then we get an expression for the conditional expectation, see the first part of the proof,

$$\begin{aligned} {{\mathbb {E}}}^{\mathrm {ch}}\big (\left\| M_{n+p}-M_n\right\| \vert {\mathcal {O}}_n\big )=&\int \left\| \frac{M_n^{\frac{1}{2}}U_n^*v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_nM_n^{\frac{1}{2}}}{{{\text {tr}}}\left( M_n^{\frac{1}{2}}U_n^*v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_nM_n^{\frac{1}{2}}\right) }-M_n\right\| \\&\qquad {{\text {tr}}}\left( v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_nM_nU_n^*\right) \,\mathrm{d}\mu ^{\otimes p}(v_1,\ldots ,v_p)\\ =&\int \left\| M_n^{\frac{1}{2}}U_n^*v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_nM_n^{\frac{1}{2}}\right. \\&\left. \qquad - M_n{{\text {tr}}}\left( v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_nM_nU_n^*\right) \right\| \\&\qquad \mathrm{d}\mu ^{\otimes p}(v_1,\ldots ,v_p). \end{aligned}$$

We used non-negativity of \({{\text {tr}}}(v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_nM_nU_n^*)\) to get the second equality. The above equation holds for \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost all realizations \(\big (U_n(\omega )\big )\) of \((U_n)\). Since the group of unitary matrices is compact, for any fixed \(\omega \) there exists a subsequence along which \(\big (U_n(\omega )\big )\) converges to a unitary matrix \(U_\infty (\omega )\). Taking the limit along this subsequence in the above expression yields (we drop \(\omega \) for notational simplicity):

$$\begin{aligned}&\int _{\mathrm {M}_k({{\mathbb {C}}})^p}\left\| M_\infty ^{\frac{1}{2}}U_\infty ^*v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty M_\infty ^{\frac{1}{2}} - M_\infty {{\text {tr}}}\left( v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty M_\infty U_\infty ^*\right) \right\| \qquad \\&\qquad \qquad \mathrm{d}\mu ^{\otimes p}(v_1,\ldots ,v_p)=0. \end{aligned}$$

This implies that

$$\begin{aligned} M_\infty ^{\frac{1}{2}}U_\infty ^*v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty M_\infty ^{\frac{1}{2}} = {{\text {tr}}}\left( v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty M_\infty U_\infty ^*\right) M_\infty , \end{aligned}$$

for \(\mu ^{\otimes p}\)-almost all \((v_1,\ldots ,v_p)\).

Denoting by \(\pi _\infty \) the orthogonal projector onto the range of \(M_\infty \), the above condition is equivalent to \(\pi _\infty U_\infty ^*v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty \pi _\infty =\lambda \pi _\infty \) with \(\lambda ={{\text {tr}}}(v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty M_\infty U_\infty ^*)\). Finally, it follows that

$$\begin{aligned} U_\infty \pi _\infty U_\infty ^*v_{1}^*\ldots v_{p}^*v_{p}\ldots v_{1}U_\infty \pi _\infty U_\infty ^*\propto U_\infty \pi _\infty U_\infty ^*, \end{aligned}$$

for \(\mu ^{\otimes p}\)-almost all \((v_1,\ldots ,v_p)\). Since \(U_\infty \pi _\infty U_\infty ^*\) is an orthogonal projector, the condition (Pur) implies (reintroducing \(\omega \)) that \({\text {rank}}(M_\infty (\omega ))={\text {rank}}(U_\infty (\omega ) \pi _\infty (\omega ) U_\infty ^*(\omega ))=1\). Since \(M_\infty (\omega )\) is a trace one, positive semidefinite matrix this means that \(M_\infty (\omega )\) is a rank one projector. Since this conclusion holds true for \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost all \(\omega \) this establishes that the condition (Pur) implies that \(M_\infty \) is \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost surely a rank 1 projection.

For the converse implication, assume that \(M_\infty \) is \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost surely a rank one projection but that (Pur) does not hold. Then there exists \(\pi \), a rank two orthogonal projector, such that for all \(n\in {{\mathbb {N}}}\),

$$\begin{aligned} \pi W_n^*W_n\pi \propto \pi , \end{aligned}$$

\(\mu ^{\otimes n}\)-almost everywhere. Since \(\mu ^{\otimes n}\)-almost everywhere \(M_n\propto W_n^*W_n\), we get that

$$\begin{aligned} \pi M_n \pi \propto \pi , \end{aligned}$$

\(\mu ^{\otimes n}\)-almost everywhere. Thus, \(\pi M_\infty \pi \propto \pi \) and, under our assumption that \({\text {rank}}M_\infty =1\;{{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\) and \({\text {rank}}\pi =2\), this implies that \(\pi M_\infty \pi =0\), \({{\mathbb {P}}}^{\mathrm {ch}}\)-almost surely. On the other hand for all \(n\in {\mathbb {N}}\) we have \({{\mathbb {E}}}^{\mathrm {ch}}(M_n)=\mathrm{Id}_{{{\mathbb {C}}}^k}\), and the \(L^1\) convergence implies that \({{\mathbb {E}}}^{\mathrm {ch}}(M_\infty )=\mathrm{Id}_{{{\mathbb {C}}}^k}\). Then, \({{\mathbb {E}}}^{\mathrm {ch}}(\pi M_\infty \pi )=\pi \) which contradicts \(\pi M_\infty \pi =0\;{{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\)\(\square \)

By the polar decomposition, the rank of \(W_n\) is equal to the rank of \(M_n\) and the proposition thus implies that \(W_n \rho _0 W_n^*/{{\text {tr}}}(W_n \rho _0 W_n^*)\) approaches the set of pure states for any \(\rho _0\in {\mathcal {D}}_k\) if and only if (Pur) holds. This is the result of Maassen and Kümmerer [17] mentioned in the introduction. Though \(M_n\) is not used in [17], the proof relies on similar ideas.

We are now in the position to show that the Markov chain \((\hat{x}_n)\) is asymptotically an \(\mathcal {O}\)-measurable process. This is expressed in the following lemma. Whenever (Pur) holds, we denote by \(\hat{z} \in {{\mathrm P}({{\mathbb {C}}}^k)}\) the \(\mathcal {O}\)-measurable random variable defined by

$$\begin{aligned} M_\infty = \pi _{\hat{z}}. \end{aligned}$$

Recall that \(d(\cdot ,\cdot )\), defined by Eq. (7), is our metric on \({{\mathrm P}({{\mathbb {C}}}^k)}\).

Lemma 2.3

Assume (Pur) holds. Then for any probability measure \(\nu \) on \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\),

$$\begin{aligned} \lim _{n\rightarrow \infty } d\left( \hat{x}_n,U_n \cdot \hat{z}\right) =0\quad \mathbb {P}_\nu {\text {-}}\mathrm {a.s.}\end{aligned}$$

Proof

We start the proof by showing that for any \(\nu \)

$$\begin{aligned} \lim _{n\rightarrow \infty } M_n^{\frac{1}{2}} \cdot \hat{x}=\hat{z}\quad {\mathbb {P}}_\nu {\text {-}}\mathrm {a.s.}\end{aligned}$$
(19)

Let \(\hat{x}\) be fixed and recall from Proposition 2.2 that (Pur) implies \(M_\infty =\pi _{\hat{z}}\). Since \(M_\infty x=\langle z,x\rangle z\), in order to show (19), it is enough to show that \(\hat{x}\) is \({{\mathbb {P}}}_\nu \)-almost surely not orthogonal to \(\hat{z}\). From Eq. (13) and the change of measure formula in Proposition 2.2,

$$\begin{aligned} \mathrm{d}{{\mathbb {P}}}_{\nu }\left( \hat{x}, \omega \right) = k\,{{\text {tr}}}\left( \pi _{\hat{x}} \pi _{\hat{z}(\omega )}\right) \, \mathrm{d}\big (\nu (\hat{x})\otimes {{\mathbb {P}}}^{\mathrm {ch}}(\omega )\big ). \end{aligned}$$

Hence the event \(\{{{\text {tr}}}(\pi _{\hat{x}}\pi _{\hat{z}})=|\langle z,x\rangle |^2=0\}\) has \({{\mathbb {P}}}_{\nu }\)-measure 0. This proves the required claim, and (19) follows from the almost sure convergence of \(M_n\) to \(\pi _{\hat{z}}\).

Now using the polar decomposition, Eq. (14), and the fact that proportionality of vectors amounts to equality of their equivalence classes in \({\mathrm P}({\mathbb {C}}^k)\), we have

$$\begin{aligned} \hat{x}_n = U_n M_n^{\frac{1}{2}}\cdot \hat{x}_0. \end{aligned}$$

The first part of the proof then yields

$$\begin{aligned} \lim _{n\rightarrow \infty } d\left( \hat{x}_n,U_n \cdot \hat{z}\right) =0,\quad \mathbb {P}_\nu {\text {-}}\mathrm {a.s.}\end{aligned}$$

\(\square \)

The uniqueness of the invariant measure, which is the first part of Theorem 1.1, follows as a corollary.

Corollary 2.4

Assume (Pur) and (\(\phi \)-Erg). Then the Markov kernel \(\Pi \) admits a unique invariant probability measure.

Proof

For an invariant measure \(\nu \), the random variable \(\hat{x}_n\) is \(\nu \)-distributed for all \(n \in \mathbb {N}\). In particular, \( \mathbb {E}_\nu \big (f(\hat{x}_n)\big )\) is constant for any continuous function f. On the other hand Lemma 2.3 and Lebesgue’s dominated convergence theorem imply that

$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathbb {E}}_{\nu }\big (f\left( \hat{x}_n\right) -f\left( U_n\cdot \hat{z}\right) \big ) =0. \end{aligned}$$

Hence

$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathbb {E}}_{\nu }\big ( f\left( U_n\cdot \hat{z}\right) \big ) = \mathbb {E}_{\nu } \big (f\left( \hat{x}_0\right) \big ). \end{aligned}$$
(20)

Assume now that there exist two invariant measures \(\nu _a\) and \(\nu _b\). Since \(U_n\cdot \hat{z}\) is \(\mathcal {O}\)-measurable, Proposition 2.1 implies

$$\begin{aligned} {\mathbb {E}}_{\nu _a}\big (f\left( U_n\cdot \hat{z}\right) \big )={\mathbb {E}}_{\nu _b} \big (f\left( U_n\cdot \hat{z}\right) \big ). \end{aligned}$$

Then Eq. (20) applied with \(\nu =\nu _a\), resp. \(\nu = \nu _b\) gives

$$\begin{aligned} {\mathbb {E}}_{\nu _a}\big (f\left( \hat{x}_0\right) \big )={\mathbb {E}}_{\nu _b}\big (f\left( \hat{x}_0\right) \big ) \end{aligned}$$

which means that \(\nu _a=\nu _b\) and the uniqueness is proved. \(\square \)

Assuming only (Pur) we can actually completely characterize the set of invariant measures.

Proposition 2.5

Assuming (Pur) there exists a set \(\{F_j\}_{j=1}^d\) of mutually orthogonal subspaces of \({{\mathbb {C}}}^k\) such that for each \(j\in \{1,\ldots ,d\}\) there exists a unique \(\Pi \)-invariant probability measure \(\nu _j\) supported on \({\mathrm P}(F_j)\), and the set of \(\Pi \)-invariant probability measures is the convex hull of \(\{\nu _j\}_{j=1}^d\).

The subspaces \(F_j\) are the ranges of the extremal fixed points of \(\phi \) in \({\mathcal {D}}_k\). This is shown in the proof of Proposition 2.5, which we give in “Appendix B”.

Remark 2.6

Assuming (\(\phi \)-Erg) only, the chain might or might not have a unique invariant probability measure. Indeed, if \({\text {supp}}\mu \subset \mathrm {SU}(k)\), Assumption (Pur) trivially fails and, as proved in “Appendix C”, the uniqueness of the invariant measure depends on the smallest closed subgroup of \(\mathrm {SU}(k)\) containing \({\text {supp}}\mu \). To illustrate this point, in the same appendix, we study two examples in which \(\mu \) is supported on two elements of \(\mathrm {SU}(2)\), each with probability 1/2, and (\(\phi \)-Erg) holds. In the first example \(\Pi \) has a unique invariant probability measure whereas in the second example \(\Pi \) has uncountably many mutually singular invariant probability measures.

3 Convergence

We now turn to the proof of the second part of Theorem 1.1, namely the geometric convergence in Wasserstein distance of the process \((\hat{x}_n)\) towards the invariant measure. We first recall a definition of this distance for compact metric spaces: for X a compact metric space equipped with its Borel \(\sigma \)-algebra, the Wasserstein distance of order 1 between two probability measures \(\sigma \) and \(\tau \) on X can be defined using the Kantorovich–Rubinstein duality theorem as

$$\begin{aligned} W_1(\sigma ,\tau )=\sup _{f\in \mathrm{Lip}_1(X)}\left| \int _{X} f\,\mathrm{d}\sigma - \int _X f\, \mathrm{d}\tau \right| , \end{aligned}$$

where \(\mathrm{Lip}_1(X)=\{f:X\rightarrow {\mathbb {R}} \ \mathrm {s.t.}\ \vert f(x)-f(y)\vert \le d(x,y)\ \text {for all}\ x,y\in X\}\) is the set of Lipschitz continuous functions with constant one, and \(d\) is the metric on X.
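
As a quick sanity check of this definition (an illustration of ours, not part of the argument), take two Dirac masses \(\delta _x\) and \(\delta _y\) with \(x,y\in X\). Every \(f\in \mathrm{Lip}_1(X)\) satisfies \(\vert f(x)-f(y)\vert \le d(x,y)\), and the function \(f=d(\cdot ,y)\), which belongs to \(\mathrm{Lip}_1(X)\) by the triangle inequality, attains this bound. Hence

$$\begin{aligned} W_1(\delta _x,\delta _y)=\sup _{f\in \mathrm{Lip}_1(X)}\left| f(x)-f(y)\right| =d(x,y). \end{aligned}$$

In this simple case the Wasserstein distance between the two point masses is the distance between the points.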

The proof of Eq. (6) consists of three parts. In the first part we show a geometric convergence in total variation of \({{\mathbb {P}}}^\rho \) to \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\) under the shift \(\theta (v_1,v_2,\ldots )=(v_2,v_3,\ldots )\). In the second one we show a geometric convergence of the chain \((\hat{x}_n)\) towards an \(\mathcal {O}\)-measurable process \((\hat{y}_n)\). Finally, we combine these results to prove Eq. (6).

3.1 Convergence for \(\mathcal {O}\)-measurable random variables

Let us first discuss the origin of the integer m in Eq. (6). Let E be a subspace of \({{\mathbb {C}}}^k\) s.t. \(vE\subset E\) for any \(v\in {\text {supp}}\mu \). Let \((E_1,\ldots ,E_\ell )\) be an orthogonal partition of E, i.e. \(E=E_1\oplus \cdots \oplus E_\ell \). We say that \((E_1,\ldots ,E_\ell )\) is an \(\ell \)-cycle of \(\phi \) if \(vE_j\subset E_{j+1}\) for \(\mu \)-a.e. v (with the convention \(E_{\ell +1}=E_1\)). The set of \(\ell \in {{\mathbb {N}}}\) for which there exists an \(\ell \)-cycle is non-empty (as it contains 1) and bounded (as necessarily \(\ell \le k\)).

Definition 3.1

The largest \(\ell \in {{\mathbb {N}}}\) such that there exists an \(\ell \)-cycle of \(\phi \) is called the period of \(\phi \). We denote this period by m.
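
To make this definition concrete, here is a minimal Python sketch for a hypothetical example that does not appear in the text: \(k=2\) and \(\mu =\delta _{v_1}+\delta _{v_2}\), where \(v_1\) sends \(e_1\) to \(e_2\) and kills \(e_2\), and \(v_2\) does the opposite. The helper names (`matmul`, `dagger`, `apply`, `in_span`) are ours. The script checks the stochasticity condition (1) and that \(({\mathbb {C}}e_1,{\mathbb {C}}e_2)\) is a 2-cycle with \(E={\mathbb {C}}^2\); since necessarily \(\ell \le k=2\), the period would be \(m=2\) in this example.

```python
# Hypothetical example (not from the text): k = 2, mu = delta_{v1} + delta_{v2}.

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][l] * b[l][j] for l in range(2)) for j in range(2)] for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def apply(a, x):
    """Matrix-vector product for a 2x2 matrix and a vector of length 2."""
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]

v1 = [[0, 0], [1, 0]]   # sends e1 to e2 and kills e2
v2 = [[0, 1], [0, 0]]   # sends e2 to e1 and kills e1
support = [v1, v2]

# Stochasticity condition (1): the sum of v^* v over the support equals the identity.
s = [[sum(matmul(dagger(v), v)[i][j] for v in support) for j in range(2)] for i in range(2)]
is_stochastic = all(abs(s[i][j] - (1 if i == j else 0)) < 1e-12
                    for i in range(2) for j in range(2))

e1, e2 = [1, 0], [0, 1]

def in_span(y, e):
    """True if y lies on the line spanned by the basis vector e (the zero vector included)."""
    return all(abs(y[i]) < 1e-12 for i in range(2) if e[i] == 0)

# 2-cycle condition: v E_1 is contained in E_2 and v E_2 in E_1 for every v in the support.
is_two_cycle = all(in_span(apply(v, e1), e2) and in_span(apply(v, e2), e1) for v in support)

print(is_stochastic, is_two_cycle)
```

Both checks are exact here because all entries are 0 or 1; for a general finitely supported \(\mu \) the same tests would need a numerical tolerance and a basis adapted to the candidate subspaces.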

Remark 3.2

  • The above definition for the period of \(\phi \) is similar to that of the period of a \(\varphi \)-irreducible Markov chain. It is obvious that if \((E_1,\ldots ,E_\ell )\) is an \(\ell \)-cycle of \(\phi \) then it is also an \(\ell \)-cycle of \(\Pi \). However, the Markov chain defined by \(\Pi \) is not \(\varphi \)-irreducible in general. Hence the results of [19] on the period of \(\varphi \)-irreducible Markov chains do not apply and the characterization of the period of \(\Pi \) remains an open problem.

  • The above definition shows that the union \(\bigcup _{j=1}^m E_j\) is invariant under \(\mu \)-a.e. v. Hence, the strong irreducibility assumption discussed at the end of the introduction implies that \(m=1\).

The following result is a reformulation of the Perron–Frobenius theorem of Evans and Høegh-Krohn (see [7]). The original formulation in [7] makes the additional assumption that \(E={{\mathbb {C}}}^k\) in (\(\phi \)-Erg). For the present extension see e.g. [22]. In the following statement, and in the rest of the article, for X an operator on \({{\mathbb {C}}}^k\) we denote \(\Vert X\Vert _1={{\text {tr}}}|X|\) (all statements hold with any other norm, but this choice will spare us a few irrelevant constants).

Theorem 3.3

Assume that (\(\phi \)-Erg) holds. Then there exists a unique \(\phi \)-invariant element \(\rho _{\mathrm {inv}}\) of \({\mathcal {D}}_k\) with range equal to the minimal invariant subspace E. In addition, there exist two positive constants c and \(\lambda <1\) such that, with m defined in Definition 3.1, for any \(\rho \in {\mathcal {D}}_k\) and for any \(n\in {{\mathbb {N}}}\),

$$\begin{aligned} \left\| \frac{1}{m}\sum _{r=0}^{m-1} \phi ^{mn+r}(\rho )-\rho _{\mathrm {inv}}\right\| _1\le c\lambda ^n. \end{aligned}$$
(21)

Proof

Theorem 4.2 in [7] implies that \(\rho _{\mathrm {inv}}\) is the unique \(\phi \)-invariant element of \({\mathcal {D}}_k\), that the eigenvalues of \(\phi \) of modulus one are exactly the m-th roots of unity, and that they are all simple. The statement follows, with \(\lambda <1\) any number strictly larger than the moduli of all non-peripheral eigenvalues of \(\phi \). \(\square \)
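
The average over r in Eq. (21) cannot be dropped in general. As an illustration, consider the following example, which is ours and not taken from the text: \(k=2\) and \(\mu =\delta _{v_1}+\delta _{v_2}\) with

$$\begin{aligned} v_1=\begin{pmatrix} 0&{}0\\ 1&{}0\end{pmatrix},\qquad v_2=\begin{pmatrix} 0&{}1\\ 0&{}0\end{pmatrix}, \end{aligned}$$

and assume that \(\phi (\rho )=\int v\rho v^*\,\mathrm{d}\mu (v)\). Then \(v_1^*v_1+v_2^*v_2=\mathrm{Id}_{{{\mathbb {C}}}^2}\), the pair \(({\mathbb {C}}e_1,{\mathbb {C}}e_2)\) is a 2-cycle, and for \(\rho \in {\mathcal {D}}_2\) with diagonal entries a and d one computes \(\phi (\rho )=\mathrm {diag}(d,a)\). Hence, for \(n\ge 1\),

$$\begin{aligned} \phi ^{2n}(\rho )=\begin{pmatrix} a&{}0\\ 0&{}d\end{pmatrix},\qquad \phi ^{2n+1}(\rho )=\begin{pmatrix} d&{}0\\ 0&{}a\end{pmatrix},\qquad \frac{1}{2}\left( \phi ^{2n}(\rho )+\phi ^{2n+1}(\rho )\right) =\frac{1}{2}\mathrm{Id}_{{{\mathbb {C}}}^2}. \end{aligned}$$

The sequence \(\phi ^n(\rho )\) oscillates unless \(a=d\), whereas the average over one period equals the invariant state \(\frac{1}{2}\mathrm{Id}_{{{\mathbb {C}}}^2}\) as soon as \(n\ge 1\).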

Recall that \(\theta \) is the left shift operator on \(\Omega \), i.e.

$$\begin{aligned} \theta (v_1,v_2,\ldots )=(v_2,v_3,\ldots ). \end{aligned}$$

The main result of this section is the following proposition. As announced it concerns the speed of convergence in total variation (expressed in terms of expectation values).

Proposition 3.4

Assume (\(\phi \)-Erg) holds. Then there exist two positive constants C and \(\lambda <1\) such that for any \(\mathcal {O}\)-measurable function f with essential bound \(\Vert f\Vert _\infty \), any \(\rho \in {\mathcal {D}}_k\) and all \(n\in {{\mathbb {N}}}\),

$$\begin{aligned} \left| {{\mathbb {E}}}^\rho \left( \frac{1}{m}\sum _{r=0}^{m-1}f\circ \theta ^{mn+r}\right) -{{\mathbb {E}}}^{\rho _{\mathrm {inv}}}(f)\right| \le C\Vert f\Vert _\infty \lambda ^n. \end{aligned}$$
(22)

Proof

We claim that for any bounded \(\mathcal {O}\)-measurable function f,

$$\begin{aligned} {{\mathbb {E}}}^\rho (f \circ \theta ) = {{\mathbb {E}}}^{\phi (\rho )}(f). \end{aligned}$$
(23)

It suffices to prove this relation for all \(\mathcal {O}_l\)-measurable functions for any integer l. Thus, let l be an integer and f an \(\mathcal {O}_l\)-measurable function. Then,

$$\begin{aligned} {{\mathbb {E}}}^\rho (f\circ \theta )=&\int _{\mathrm {M}_k({{\mathbb {C}}})^{l+1}} f(v_{2},\ldots ,v_{l+1}){{\text {tr}}}\left( v_{l+1}\ldots v_{1}\rho v_1^*\ldots v_{l+1}^*\right) \, \mathrm{d}\mu ^{\otimes (l+1)}(v_1,\ldots ,v_{l+1})\\ =&\int _{\mathrm {M}_k({{\mathbb {C}}})^{l}} f(v_{1},\ldots ,v_{l}){{\text {tr}}}\left( v_{l}\ldots v_{1}\phi (\rho )v_{1}^*\ldots v_{l}^*\right) \, \mathrm{d}\mu ^{\otimes l}(v_{1},\ldots ,v_{l}), \end{aligned}$$

which is equal to \({{\mathbb {E}}}^{\phi (\rho )}(f)\), as claimed.

Applying Eq. (23) multiple times and using the change of measure of Proposition 2.2 we obtain

$$\begin{aligned} {{\mathbb {E}}}^\rho \left( \frac{1}{m}\sum _{r=0}^{m-1} f\circ \theta ^{mn+r}\right)&= \frac{1}{m} \sum _{r=0}^{m-1} {{\mathbb {E}}}^{\phi ^{nm+r} (\rho )}(f) \\&= k \frac{1}{m} \sum _{r=0}^{m-1} {{\mathbb {E}}}^{\mathrm {ch}}\Big (f\, {{\text {tr}}}\big ( M_\infty \phi ^{nm+r} (\rho )\big )\Big ), \end{aligned}$$

for any \(\mathcal {O}\)-measurable function f. Using \(|{{\text {tr}}}(M_\infty A)|\le \Vert A\Vert _1\) for \(A=A^*\) (remark that \(M_\infty \in {\mathcal {D}}_k\) by construction) we then obtain

$$\begin{aligned} \left| {{\mathbb {E}}}^\rho \left( \frac{1}{m}\sum _{r=0}^{m-1} f\circ \theta ^{mn+r}\right) -{{\mathbb {E}}}^{\rho _{\mathrm {inv}}}(f)\right| \le \Vert f\Vert _\infty \, k\left\| \frac{1}{m}\sum _{r=0}^{m-1}\phi ^{mn+r}(\rho )-\rho _{\mathrm {inv}}\right\| _1 \end{aligned}$$

and Theorem 3.3 yields the proposition with \(C=ck\). \(\square \)
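
Relation (23) can be checked numerically when \(\mu \) has finite support. The sketch below is ours and rests on stated assumptions: \(k=2\), \(\mu =\delta _{v_1}+\delta _{v_2}\) for a hypothetical pair of matrices depending on a parameter `g`, \(\phi (\rho )=\int v\rho v^*\,\mathrm{d}\mu (v)\), and an \(\mathcal {O}_1\)-measurable function f given by its two values on the support. It evaluates both sides of (23) through the integral formulas used in the proof.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def conj_by(v, rho):
    """The matrix v rho v^*."""
    return matmul(matmul(v, rho), dagger(v))

# Hypothetical finitely supported mu (unit weights); g is an illustrative parameter.
g = 0.4
support = [[[1, 0], [0, math.sqrt(1 - g)]], [[0, math.sqrt(g)], [0, 0]]]
f_values = [2.0, -1.0]   # f(v_1), f(v_2): an O_1-measurable test function

def phi(rho):
    """phi(rho) = sum over the support of v rho v^* (assumed form of phi)."""
    terms = [conj_by(v, rho) for v in support]
    return [[sum(t[i][j] for t in terms) for j in range(2)] for i in range(2)]

rho = [[0.3, 0.1], [0.1, 0.7]]   # a density matrix: positive semidefinite, trace one

# Left-hand side of (23): f is evaluated on the second outcome, weighted by tr(v_2 v_1 rho v_1^* v_2^*).
lhs = sum(f_values[j] * trace(conj_by(support[j], conj_by(support[i], rho))).real
          for i in range(2) for j in range(2))
# Right-hand side of (23): f is evaluated on the first outcome, with initial state phi(rho).
rhs = sum(f_values[j] * trace(conj_by(support[j], phi(rho))).real for j in range(2))

print(lhs, rhs)
```

The two numbers agree up to rounding, which only reflects the linearity of the trace; the check is useful mainly as a guard against index or ordering mistakes when implementing \({{\mathbb {E}}}^\rho \).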

3.2 Convergence to an \({\mathcal {O}}\)-measurable process

Let us introduce two relevant processes: for all \(n\in {{\mathbb {N}}}\), let

$$\begin{aligned} \hat{z}_{n}(\omega )=\mathop {\mathrm {argmax}}_{\hat{x}\in {{\mathrm P}({{\mathbb {C}}}^k)}}\,\Vert W_n x\Vert ^2 \end{aligned}$$
(24)

and

$$\begin{aligned} \hat{y}_n=W_n\cdot \hat{z}_n. \end{aligned}$$
(25)

Both random variables \(\hat{y}_n\) and \(\hat{z}_n\) are \(\mathcal {O}_n\)-measurable.

The random variable \(\hat{z}_n\) corresponds to the maximum likelihood estimator of \(\hat{x}_0\). Note that the \({\text {argmax}}\) may not be uniquely defined. We can, however, define it in an \(\mathcal {O}_n\)-measurable way. The following results will not be affected by such a consideration, and we will not discuss such questions in the sequel. It follows from the definition of \(\hat{z}_n\) that

$$\begin{aligned} \left( W_n^*W_n\right) ^{\frac{1}{2}}\, z_n=\Vert W_n\Vert z_n, \quad {{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\end{aligned}$$
(26)

We recall that \(z_n\) is a vector representative of the class \(\hat{z}_n\).

Concerning \(\hat{y}_n\), it can be seen as an estimator of \(\hat{x}_n\) given the maximum likelihood estimation of \(\hat{x}_0\). The following proposition establishes the consistency of this estimator: we show the geometric contraction in the mean of the distance between \((\hat{x}_n)\) and \((\hat{y}_n)\). In fact we prove the slightly more general statement that the estimator based on the first n outcomes can be replaced by an estimator based on the outcomes between l and \(l+n\). We will prove the almost-sure contraction in Proposition 4.4.

Proposition 3.5

Assume (Pur) holds. Then there exist two positive constants C and \(\lambda <1\) such that for any probability measure \(\nu \) over \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\),

$$\begin{aligned} {{\mathbb {E}}}_{\nu }\left( d\left( \hat{x}_{n+l}, \hat{y}_n\circ \theta ^l\right) \right) \le C\lambda ^n, \end{aligned}$$
(27)

holds for all non-negative integers l and n.

In order to prove Proposition 3.5 we study the largest two singular values of \(W_n\). As is customary in the study of products of random matrices, we make use of exterior products. We recall briefly the relevant definitions: for \(p\in {{\mathbb {N}}}\) and p vectors \(x_1, \ldots , x_p\) in \({{\mathbb {C}}}^k\) we denote by \(x_1\wedge \cdots \wedge x_p\) the alternating p-linear form \((y_1,\ldots , y_p)\mapsto \det \big (\langle x_i, y_j\rangle \big )_{i,j=1}^p\). Then, the set of all \(x_1\wedge \cdots \wedge x_p\) is a generating family for the set \(\wedge ^p{{\mathbb {C}}}^k\) of alternating p-linear forms on \({{\mathbb {C}}}^k\), and we can define a hermitian inner product by

$$\begin{aligned} \left\langle x_1\wedge \cdots \wedge x_p, y_1\wedge \cdots \wedge y_p\right\rangle = \det \left( \langle x_i, y_j\rangle \right) _{i,j=1}^p, \end{aligned}$$

and denote by \(\Vert x_1\wedge \cdots \wedge x_p\Vert \) the associated norm. It is immediate to verify that our metric \(d\), defined by (7), satisfies

$$\begin{aligned} d(\hat{x},\hat{y})=\frac{\Vert x\wedge y\Vert }{\Vert x\Vert \Vert y\Vert }. \end{aligned}$$
(28)
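
For the reader's convenience, here is how the norm appearing in Eq. (28) unfolds for \(p=2\), using only the inner product defined above:

$$\begin{aligned} \Vert x\wedge y\Vert ^2=\det \begin{pmatrix} \langle x,x\rangle &{}\langle x,y\rangle \\ \langle y,x\rangle &{}\langle y,y\rangle \end{pmatrix}=\Vert x\Vert ^2\Vert y\Vert ^2-|\langle x,y\rangle |^2. \end{aligned}$$

For normalized representatives the right hand side of (28) therefore reads \(\sqrt{1-|\langle x,y\rangle |^2}\). By the Cauchy–Schwarz inequality it takes values in [0, 1] and vanishes exactly when x and y are proportional, i.e. when \(\hat{x}=\hat{y}\).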

For an operator A on \({\mathbb {C}}^k\), we write \(\wedge ^p A\) for the operator on \(\wedge ^p{{\mathbb {C}}}^k\) defined by

$$\begin{aligned} \wedge ^p A \,(x_1\wedge \cdots \wedge x_p)=Ax_1\wedge \cdots \wedge Ax_p. \end{aligned}$$
(29)

Obviously \(\wedge ^p (AB)=\wedge ^p A\wedge ^p B\), so that \(\Vert \wedge ^p (AB)\Vert \le \Vert \wedge ^p A\Vert \Vert \wedge ^p B\Vert \). From e.g. Chapter XVI of [18] or Lemma III.5.3 of [4], we have in addition for \(1\le p\le k\)

$$\begin{aligned} \left\| \wedge ^p A\right\| =a_1(A)\ldots a_p(A), \end{aligned}$$
(30)

where \(a_1(A)\ge \cdots \ge a_k(A)\) are the singular values of A, i.e. the square roots of eigenvalues of \(A^* A\), labelled in decreasing order.
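
Identity (30) is easy to test numerically in the smallest non-trivial case. For \(k=2\) the space \(\wedge ^2{{\mathbb {C}}}^2\) is one-dimensional and \(\wedge ^2A\) acts on it as multiplication by \(\det A\), so that (30) with \(p=2\) reduces to \(|\det A|=a_1(A)a_2(A)\). The following Python sketch, with a function name and a test matrix chosen by us, computes the singular values from the eigenvalues of \(A^*A\) and compares the two sides.

```python
import math

def singular_values_2x2(a):
    """Singular values a_1 >= a_2 of a 2x2 complex matrix, from the eigenvalues of A^* A."""
    g11 = abs(a[0][0]) ** 2 + abs(a[1][0]) ** 2
    g22 = abs(a[0][1]) ** 2 + abs(a[1][1]) ** 2
    g12 = a[0][0].conjugate() * a[0][1] + a[1][0].conjugate() * a[1][1]
    t = g11 + g22                      # trace of A^* A
    d = g11 * g22 - abs(g12) ** 2      # determinant of A^* A
    disc = math.sqrt(max(t * t - 4 * d, 0.0))
    # Eigenvalues of A^* A are (t +/- disc) / 2; singular values are their square roots.
    return math.sqrt((t + disc) / 2), math.sqrt(max((t - disc) / 2, 0.0))

a = [[1 + 2j, 0.5], [-1j, 3.0]]        # arbitrary illustrative matrix
det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
a1, a2 = singular_values_2x2(a)

print(a1 * a2, abs(det_a))             # the two numbers agree up to rounding
```

The closed-form eigenvalue formula is specific to \(2\times 2\) matrices; in higher dimension one would rely on a numerical singular value decomposition instead.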

Our strategy to prove Proposition 3.5 is to bound the left hand side of Eq. (27) by a submultiplicative function \(f : \mathbb {N} \rightarrow \mathbb {R}_+\) and then use Fekete’s lemma. We will show that the function

$$\begin{aligned} f(n)=\int _{\mathrm {M}_k({{\mathbb {C}}})^n} \left\| \wedge ^2 v_n\ldots v_1\right\| \,\mathrm{d}\mu ^{\otimes n}(v_1,\ldots ,v_n) \end{aligned}$$
(31)

has the desired properties. The following lemma establishes an exponential decay of this function.

Lemma 3.6

Assume (Pur). Then there exist two positive constants C and \(\lambda <1\) such that

$$\begin{aligned} f(n)\le C\lambda ^n. \end{aligned}$$

Proof

First, we prove \(\lim _{n\rightarrow \infty }f(n)=0\). To this end, we express the function f(n) using the process \(W_n\) as

$$\begin{aligned} f(n)={{\mathbb {E}}}^{\mathrm {ch}}\left( k\frac{\Vert \wedge ^2 W_n\Vert }{{{\text {tr}}}\left( W_n^*W_n\right) }\right) . \end{aligned}$$
(32)

By definition the eigenvalues of \(M_n^{\frac{1}{2}}\) are the singular values of \(W_n/\sqrt{{{\text {tr}}}(W_n^*W_n)}\). Since by Proposition 2.2, \(M_n\) converges \({{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\) to a rank one projection,

$$\begin{aligned} \lim _{n\rightarrow \infty }a_1\left( \frac{W_n}{\sqrt{{{\text {tr}}}\left( W_n^*W_n\right) }}\right) a_2\left( \frac{W_n}{\sqrt{{{\text {tr}}}\left( W_n^*W_n\right) }}\right) =0\quad {{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\end{aligned}$$

Using Eq. (30) we then conclude that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\left\| \wedge ^2W_n\right\| }{{{\text {tr}}}\left( W_n^*W_n\right) }=0 \quad {{\mathbb {P}}}^{\mathrm {ch}}{\text {-}}\mathrm {a.s.}\end{aligned}$$
(33)

Since \(\Vert \wedge ^2 W_n\Vert \le \Vert W_n\Vert ^2\le {{\text {tr}}}(W_n^*W_n)\), the expression (32) and Lebesgue’s dominated convergence theorem imply \(\lim _{n\rightarrow \infty }f(n)=0\).

Second, remark that the function f is submultiplicative. Indeed, for \(p,q\in {\mathbb {N}}\) we have

$$\begin{aligned} \left\| \wedge ^2\left( v_{p+q}\ldots v_{1}\right) \right\| \le \left\| \wedge ^2\left( v_{p+q}\ldots v_{p+1}\right) \right\| \left\| \wedge ^2\left( v_{p}\ldots v_{1}\right) \right\| \end{aligned}$$

and the submultiplicativity follows.

By Fekete’s subadditive lemma, \(\frac{\log f(n)}{n}\) converges to \(\inf _{n\in {{\mathbb {N}}}} \frac{\log f(n)}{n}\), which is (strictly) negative (and possibly equal to \(-\infty \)) since \(f(n)\rightarrow 0\). Then there exists \(0<\lambda <1\) such that \(f(n)\le \lambda ^n\) for large enough n. Since \(f(n)\le f(1)^n\le \big (\int \Vert v\Vert ^2\,\mathrm{d}\mu (v)\big )^n<\infty \) for the finitely many remaining n, the conclusion follows by choosing C large enough. \(\square \)
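
The decay of f can be observed directly on a small example. The Python sketch below is ours and rests on stated assumptions: \(k=2\) and \(\mu =\delta _{v_1}+\delta _{v_2}\) for a hypothetical pair of matrices depending on a parameter `g`, chosen so that condition (1) holds. In dimension 2 one has \(\Vert \wedge ^2A\Vert =|\det A|\), so f is not only submultiplicative but exactly multiplicative, \(f(n)=f(1)^n\); this is a special feature of \(k=2\) and should not be expected in general.

```python
import itertools
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][l] * b[l][j] for l in range(2)) for j in range(2)] for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

# Hypothetical finitely supported mu (unit weights); v1^T v1 + v2^T v2 is the identity.
g = 0.4
support = [[[1.0, 0.0], [0.0, math.sqrt(1 - g)]], [[0.0, math.sqrt(g)], [0.0, 0.0]]]

def f(n):
    """f(n) = sum over all words (v_1, ..., v_n) of ||wedge^2 (v_n ... v_1)||, see Eq. (31)."""
    total = 0.0
    for word in itertools.product(support, repeat=n):
        w = [[1.0, 0.0], [0.0, 1.0]]
        for v in word:           # builds the product v_n ... v_1
            w = matmul(v, w)
        total += abs(det(w))     # for k = 2, ||wedge^2 W|| = |det W|
    return total

values = [f(n) for n in range(1, 7)]
print(values)
```

In this example \(f(1)=\sqrt{1-g}<1\), so the bound of the lemma holds with \(C=1\) and \(\lambda =\sqrt{1-g}\). The enumeration over all words grows exponentially with n and is only meant for small n.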

We are now in a position to prove Proposition 3.5.

Proof of Proposition 3.5

The Markov property of \((\hat{x}_n)\) implies that

$$\begin{aligned} {{\mathbb {E}}}_{\nu }\left( d\left( \hat{x}_{n+l}, \hat{y}_n\circ \theta ^l\right) \right) = {{\mathbb {E}}}_{\nu \Pi ^l}\left( d\left( \hat{x}_{n}, \hat{y}_n\right) \right) . \end{aligned}$$

Provided inequality (27) is established for \(l=0\), the right hand side of the previous equality can be bounded by \(C \lambda ^n\). It is hence sufficient to prove the inequality for \(l=0\).

The case \(l=0\) follows from Lemma 3.6 if for any \(n \in {{\mathbb {N}}}\) and any probability measure \(\nu \),

$$\begin{aligned} {\mathbb {E}}_{\nu }\left( d\left( \hat{x}_n, \hat{y}_n\right) \right) \le f(n). \end{aligned}$$
(34)

To obtain this inequality, note that from the definitions of \(\hat{x}_n\), \(\hat{y}_n\) and \(\hat{z}_n\), we have that

$$\begin{aligned} d\left( \hat{x}_n, \hat{y}_n\right)&=\frac{\left\| \wedge ^2 W_n\,(x_0\wedge z_n)\right\| }{\Vert W_n x_0\Vert \Vert W_n z_n\Vert }\\&\le \frac{\left\| \wedge ^2 W_n\right\| }{\left\| W_n x_0\right\| ^2}\frac{\left\| W_n x_0\right\| }{\Vert W_n\Vert }\\&\le \frac{\left\| \wedge ^2 W_n\right\| }{\left\| W_n x_0\right\| ^2}, \end{aligned}$$

holds \({{\mathbb {P}}}_\nu \)-almost surely. To get the first inequality we used \(\Vert W_n z_n\Vert = \Vert W_n\Vert \), and \(\Vert x_0\wedge z_n \Vert =d(\hat{x}_0,\hat{z}_n)\le 1\). In addition, by definition of \({{\mathbb {P}}}_\nu \),

$$\begin{aligned} {{\mathbb {E}}}_{\nu }\left( \frac{\left\| \wedge ^2 W_n\right\| }{\Vert W_n x_0\Vert ^2}\right)&= \int _{{\mathrm P}({{\mathbb {C}}}^k)\times \mathrm {M}_k({{\mathbb {C}}})^n} \frac{\left\| \wedge ^2W_n\right\| }{\Vert W_n x_0\Vert ^2}\,\Vert W_n x_0\Vert ^2\,\mathrm{d}\mu ^{\otimes n}\, \mathrm{d}\nu (\hat{x}_0) \\&= \int _{\mathrm {M}_k({{\mathbb {C}}})^n} {\left\| \wedge ^2W_n\right\| }\,\mathrm{d}\mu ^{\otimes n}(v_1,\ldots , v_n), \end{aligned}$$

which is f(n). Therefore (34) holds and Lemma 3.6 yields the proof. \(\square \)

3.3 Convergence in Wasserstein metric

The remainder of Sect. 3 is devoted to the proof of the second part of Theorem 1.1.

Proof of Eq. (6)

We need to prove that

$$\begin{aligned} W_1\left( \frac{1}{m}\sum _{r=0}^{m-1} \nu \Pi ^{mn+r}, \nu _{\mathrm {inv}}\right) = \sup _{f\in \mathrm{Lip}_1({\mathrm P}({\mathbb {C}}^k))} \left| \mathbb {E}_\nu \left( \frac{1}{m}\sum _{r=0}^{m-1} f\left( \hat{x}_{mn+r}\right) \right) - \mathbb {E}_{\nu _{\mathrm {inv}}}\left( f\left( \hat{x}_0\right) \right) \right| \end{aligned}$$

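is exponentially decaying in n. The expression in the supremum on the right hand side is unchanged by adding an arbitrary constant to f. Since \(d\le 1\) on \({\mathrm P}({\mathbb {C}}^k)\), any \(f\in \mathrm{Lip}_1({\mathrm P}({\mathbb {C}}^k))\) oscillates by at most 1, so this freedom allows us to restrict the supremum to functions bounded by 1, i.e. \(\Vert f\Vert _\infty \le 1\).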

Let \(f\in \mathrm{Lip}_1({\mathrm P}({\mathbb {C}}^k))\) be such a function. Our strategy is to approximate \(\hat{x}_{mn+r}\) by \(\hat{y}_{mp} \circ \theta ^{mq+r}\) with \(p=\lfloor \frac{n}{2} \rfloor \) and \(q=\lceil \frac{n}{2}\rceil \) so that in particular \(p+q =n\). Using telescopic estimates and the invariance of \(\nu _{\mathrm {inv}}\) we then have

$$\begin{aligned}&\left| {{\mathbb {E}}}_\nu \left( \frac{1}{m} \sum _{r=0}^{m-1} f\left( \hat{x}_{mn +r}\right) \right) - {{\mathbb {E}}}_{\nu _{\mathrm {inv}}}\left( f\left( \hat{x}_0\right) \right) \right| \\&\quad \le \frac{1}{m} \sum _{r=0}^{m-1} \left| {{\mathbb {E}}}_\nu \left( f\left( \hat{x}_{m(p+q)+r}\right) \right) - {{\mathbb {E}}}_\nu \left( f\left( \hat{y}_{mp} \circ \theta ^{mq+r}\right) \right) \right| \\&\qquad + \frac{1}{m} \sum _{r=0}^{m-1} \left| {{\mathbb {E}}}_{\nu _{\mathrm {inv}}} \left( f\left( \hat{y}_{mp} \circ \theta ^{mq+r}\right) \right) -{{\mathbb {E}}}_{\nu _{\mathrm {inv}}} \left( f\left( \hat{x}_{m(p+q)+r}\right) \right) \right| \\&\qquad + \left| \frac{1}{m} \sum _{r=0}^{m-1}{{\mathbb {E}}}_\nu \left( f\left( \hat{y}_{mp} \circ \theta ^{mq+r}\right) \right) - {{\mathbb {E}}}_{\nu _{\mathrm {inv}}} \left( f\left( \hat{y}_{mp}\right) \right) \right| . \end{aligned}$$

We bound the terms on the right hand side using Proposition 3.5 for the first two terms and Proposition 3.4 for the last term. To this end let C and \(\lambda < 1\) be such that the bounds in both these propositions hold true. Since f is 1-Lipschitz continuous we have

$$\begin{aligned} \left| f\left( \hat{x}_{m(p+q)+r}\right) - f\left( \hat{y}_{mp} \circ \theta ^{mq+r}\right) \right| \le d\left( \hat{x}_{m(p+q)+r},\hat{y}_{mp} \circ \theta ^{mq+r}\right) . \end{aligned}$$

Proposition 3.5 then implies that

$$\begin{aligned} \left| {{\mathbb {E}}}_\nu \left( f\left( \hat{x}_{m(p+q)+r}\right) \right) - {{\mathbb {E}}}_\nu \left( f\left( \hat{y}_{mp} \circ \theta ^{mq+r}\right) \right) \right| \le C\lambda ^{mp}, \end{aligned}$$

and similarly with \(\nu \) replaced by \(\nu _{\mathrm {inv}}\). Regarding the last term in the above telescopic estimate we have by Proposition 3.4,

$$\begin{aligned} \left| \frac{1}{m} \sum _{r=0}^{m-1}{{\mathbb {E}}}_\nu \left( f\left( \hat{y}_{mp} \circ \theta ^{mq+r}\right) \right) - {{\mathbb {E}}}_{\nu _{\mathrm {inv}}} \left( f\left( \hat{y}_{mp}\right) \right) \right| \le C \lambda ^{q}, \end{aligned}$$

where we used the constraint \(\Vert f\Vert _\infty \le 1\) discussed at the beginning of the proof.

Putting these estimates together we get

$$\begin{aligned} \left| {{\mathbb {E}}}_\nu \left( \frac{1}{m} \sum _{r=0}^{m-1} f\left( \hat{x}_{mn+r}\right) \right) - {{\mathbb {E}}}_{\nu _{\mathrm {inv}}}\left( f\left( \hat{x}_0\right) \right) \right| \le 3C\lambda ^{\left\lfloor \frac{n}{2} \right\rfloor } \end{aligned}$$

and this concludes the proof of Eq. (6) and therefore of Theorem 1.1. \(\square \)
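
The exponent \(\lfloor \frac{n}{2}\rfloor \) can be converted into a bound that is geometric in n itself; the following short computation is our own remark. Since \(\lfloor \frac{n}{2}\rfloor \ge \frac{n-1}{2}\) and \(0<\lambda <1\),

$$\begin{aligned} 3C\lambda ^{\left\lfloor \frac{n}{2} \right\rfloor }\le 3C\lambda ^{\frac{n-1}{2}}=\frac{3C}{\sqrt{\lambda }}\left( \sqrt{\lambda }\right) ^n, \end{aligned}$$

so the left hand side of the previous estimate is bounded by \(C'\lambda '^{\,n}\) with \(C'=3C/\sqrt{\lambda }\) and \(\lambda '=\sqrt{\lambda }<1\).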

4 Lyapunov exponents

In this section, we study the almost sure stability exponents. The main results of this section will assume (\(\phi \)-Erg) with the additional assumption that \(E={{\mathbb {C}}}^k\).

Remark 4.1

Assuming \(E={{\mathbb {C}}}^k\) amounts to saying that \(\phi \) has no transient part. Without this assumption, we would have to take into account the almost sure Lyapunov exponent corresponding to the escape from the transient part. See [3] for a precise account of these ideas.

The relevance of this assumption will stem from the following straightforward inequalities: if \(\rho \) is any element of \({\mathcal {D}}_k\) then one has

$$\begin{aligned} \frac{\mathrm{d}{{\mathbb {P}}}^\rho _{\vert {\mathcal {O}}_n}}{\mathrm{d}\mu ^{\otimes n}} \le \Vert W_n\Vert ^2, \end{aligned}$$

and if \(\rho \) is faithful (i.e. positive definite), then

$$\begin{aligned} \frac{\mathrm{d}{{\mathbb {P}}}^\rho _{\vert {\mathcal {O}}_n}}{\mathrm{d}\mu ^{\otimes n}} \ge \Vert \rho ^{-1}\Vert ^{-1}\Vert W_n\Vert ^2. \end{aligned}$$

In particular, if (\(\phi \)-Erg) holds with \(E={{\mathbb {C}}}^k\), then \(\rho _{\mathrm {inv}}>0\) and, for any \(\rho \in {\mathcal {D}}_k\), we have

$$\begin{aligned} {{\mathbb {P}}}^\rho \ll {{\mathbb {P}}}^{\rho _{\mathrm {inv}}}. \end{aligned}$$
(35)
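
For completeness, here is a sketch of how the two inequalities and Eq. (35) can be derived. We assume, consistently with the integral formula used in the proof of Eq. (23), that the density of \({{\mathbb {P}}}^\rho \) restricted to \({\mathcal {O}}_n\) with respect to \(\mu ^{\otimes n}\) is \({{\text {tr}}}(W_n\rho W_n^*)\). Since \(\rho \) is positive semidefinite with trace one,

$$\begin{aligned} {{\text {tr}}}\left( W_n\rho W_n^*\right) ={{\text {tr}}}\left( \rho \, W_n^*W_n\right) \le \left\| W_n^*W_n\right\| {{\text {tr}}}\,\rho =\Vert W_n\Vert ^2, \end{aligned}$$

while for faithful \(\rho \) the operator inequality \(\rho \ge \Vert \rho ^{-1}\Vert ^{-1}\,\mathrm{Id}_{{{\mathbb {C}}}^k}\) gives

$$\begin{aligned} {{\text {tr}}}\left( W_n\rho W_n^*\right) \ge \Vert \rho ^{-1}\Vert ^{-1}\,{{\text {tr}}}\left( W_n^*W_n\right) \ge \Vert \rho ^{-1}\Vert ^{-1}\Vert W_n\Vert ^2. \end{aligned}$$

Combining the first bound for \(\rho \) with the second one for \(\rho _{\mathrm {inv}}\) yields \({{\mathbb {P}}}^\rho (A)\le \Vert \rho _{\mathrm {inv}}^{-1}\Vert \,{{\mathbb {P}}}^{\rho _{\mathrm {inv}}}(A)\) for every \(A\in {\mathcal {O}}_n\) and every n. A monotone class argument extends this inequality to all \(A\in {\mathcal {O}}\), which implies the absolute continuity (35).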

Let us start by proving the following lemma, which concerns the ergodicity of \(\theta \) with respect to the measure \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\).

Lemma 4.2

Assume that (\(\phi \)-Erg) holds. Then the shift \(\theta \) on \((\Omega ,\mathcal {O})\) is ergodic with respect to the probability measure \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\).

Proof

Let A and \(A'\) be in \(\mathcal {O}_l\). From the definition of \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\), for \(j\ge l\), \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\big (A \cap \theta ^{-j}(A')\big )\) equals

$$\begin{aligned} \int _{A\times A'} {{\text {tr}}}\Big (v_{l}'\ldots v_{1}' \phi ^{j-l}\big (v_{l}\ldots v_{1} \rho _{\mathrm {inv}}v_{1}^*\ldots v_{l}^*\big ){v_{1}'}^*\ldots {v_{l}'}^* \Big ) \,\mathrm{d}\mu ^{\otimes l}(v_1,\ldots ,v_l)\, \mathrm{d}\mu ^{\otimes l}\left( v_1',\ldots ,v_l'\right) , \end{aligned}$$

and the Perron–Frobenius Theorem 3.3 implies

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n} \sum _{j=0}^{n-1} \phi ^j\big (v_{l}\ldots v_{1} \rho _{\mathrm {inv}}v_{1}^*\ldots v_{l}^*\big ) = {{\text {tr}}}\big (v_{l}\ldots v_{1} \rho _{\mathrm {inv}}v_{1}^*\ldots v_{l}^*\big )\, \rho _{\mathrm {inv}}\end{aligned}$$

for \(\mu ^{\otimes l}\)-almost all \((v_1,\ldots ,v_l)\) so that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n} \sum _{j=0}^{n-1} {{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\big (A \cap \theta ^{-j}\left( A'\right) \big ) ={{\mathbb {P}}}^{\rho _{\mathrm {inv}}}(A)\,{{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\left( A'\right) , \end{aligned}$$

which proves the ergodicity. \(\square \)

Now we can state our result concerning Lyapunov exponents.

Proposition 4.3

Assume that (\(\phi \)-Erg) holds with \(E={{\mathbb {C}}}^k\), and that (Pur) holds. Assume \(\int \Vert v\Vert ^2\log \Vert v\Vert \,\mathrm{d}\mu (v)< \infty \). Then there exist numbers

$$\begin{aligned} \infty >\gamma _1\ge \gamma _2\ge \cdots \ge \gamma _k\ge -\infty \end{aligned}$$

such that for any probability measure \(\nu \) over \(({\mathrm P}({\mathbb {C}}^k),{\mathcal {B}})\):

  (1) for any \(p\in \{1,\ldots ,k\}\),

    $$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\log \left\| \wedge ^p W_n\right\| =\sum _{j=1}^p \gamma _j,\quad {{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}, \end{aligned}$$
    (36)

  (2) \(\gamma _2-\gamma _1<0\) with \(\gamma _2-\gamma _1\) understood as the limit of \(\frac{1}{n}\log \frac{\Vert \wedge ^2 W_n\Vert }{\Vert W_n\Vert ^2}\) whenever \(\gamma _1=-\infty \),

  (3) one has the convergence

    $$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}(\log \Vert W_n x_0\Vert -\log \Vert W_n\Vert )=0\quad {{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\end{aligned}$$
    (37)
Proof

Let us start by proving (1). Note that \(n\mapsto \log \Vert \wedge ^p W_n\Vert \) is subadditive, since \(\Vert \wedge ^p (AB)\Vert \le \Vert \wedge ^p A\Vert \Vert \wedge ^p B\Vert \). The existence of the \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}{\text {-}}\mathrm {a.s.}\) limits \(\lim _{n\rightarrow \infty }\frac{1}{n}\log \Vert \wedge ^p W_n\Vert \) then follows from \({{\mathbb {E}}}^{\rho _{\mathrm {inv}}}(\log \Vert V\Vert ^2)\le \int \Vert v\Vert ^2\log \Vert v\Vert ^2\,\mathrm{d}\mu (v)<\infty \), \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\circ \theta ^{-1}={{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\) and a direct application of Kingman’s subadditive ergodic theorem (see e.g. [21]). The fact that these limits are \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}{\text {-}}\mathrm {a.s.}\) constant comes from the \(\theta \)-ergodicity of \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\) proved in Lemma 4.2. Since by Eq. (35) any \({{\mathbb {P}}}^\rho \) is absolutely continuous with respect to \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}\), Proposition 2.1 and the \(\mathcal {O}\)-measurability of \(\Vert \wedge ^pW_n\Vert \) imply the convergence holds \({{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\) The numbers \(\gamma _j\) are uniquely defined, by defining \(\sum _{j=1}^p \gamma _j\) as the \({{\mathbb {P}}}^{\rho _{\mathrm {inv}}}{\text {-}}\mathrm {a.s.}\) limit \(\lim _{n\rightarrow \infty }\frac{1}{n}\log \Vert \wedge ^p W_n\Vert \) and imposing the rule that \(\gamma _{j+1}=-\infty \) if \(\gamma _{j}=-\infty \). This convention and (30) impose that \(\gamma _{j+1}\le \gamma _j\) for \(j=1,\ldots ,k-1\).

Concerning (2), recall the quantity f(n) defined in Eq. (31). Then Eq. (32) and the inequality \({{\text {tr}}}\,W_n^*W_n \le k \Vert W_n\Vert ^2\) give

$$\begin{aligned} f(n)\ge {{\mathbb {E}}}^{\mathrm {ch}}\left( \frac{\left\| \wedge ^2 W_n\right\| }{\Vert W_n\Vert ^2} \right) . \end{aligned}$$

Jensen’s inequality implies

$$\begin{aligned} \frac{1}{n} \log f(n)\ge {{\mathbb {E}}}^{\mathrm {ch}}\left( \frac{1}{n} \log \frac{\left\| \wedge ^2 W_n\right\| }{\Vert W_n\Vert ^2} \right) \end{aligned}$$

so that by Lemma 3.6 and Fatou’s lemma, \(\log \lambda \ge \gamma _2-\gamma _1\) with \(\lambda \in (0,1)\).

Finally for (3), from Proposition 2.2, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{\Vert W_n x_0\Vert }{\Vert W_n\Vert }=\lim _{n\rightarrow \infty }\frac{\left\| M_n^{\frac{1}{2}}x_0\right\| }{\left\| M_{n}^{\frac{1}{2}}\right\| }=|\langle x_0, z\rangle |\quad {{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\end{aligned}$$

Since \(|\langle x_0, z\rangle |>0\) \({{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\), the logarithm of this ratio converges almost surely to a finite limit, so dividing it by n yields (37). \(\square \)

From this proposition we deduce the following almost sure convergence rate for the distance between the Markov chain \((\hat{x}_n)\) and the \((\mathcal {O}_n)\)-adapted process \((\hat{y}_n)\).

Proposition 4.4

Assume (Pur) holds and (\(\phi \)-Erg) holds with \(E={{\mathbb {C}}}^k\). Then, for any probability measure \(\nu \) on \(({{\mathrm P}({{\mathbb {C}}}^k)},{\mathcal {B}})\),

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{1}{n}\log \big (d\left( \hat{x}_n, \hat{y}_n\right) \big )\le -(\gamma _1-\gamma _2)<0,\quad {{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\end{aligned}$$

Proof

Identity (28) and the definition of \(\hat{z}_n\) imply

$$\begin{aligned} {d(\hat{x}_n,\hat{y}_n)}=\frac{\left\| \wedge ^2W_n\, x_0\wedge z_n\right\| }{\Vert W_nx_0\Vert \Vert W_n z_n\Vert } \le \frac{\left\| \wedge ^2W_n\right\| }{\Vert W_nx_0\Vert \Vert W_n\Vert }. \end{aligned}$$

Proposition 4.3 then yields the result. \(\square \)
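
In more detail, the last step can be sketched as follows. This is only an expansion of the argument above: it uses the bound on \(d(\hat{x}_n,\hat{y}_n)\) together with items (2) and (3) of Proposition 4.3, and it assumes that the limit in item (2) exists as stated there. Taking logarithms in that bound and dividing by n gives

$$\begin{aligned} \frac{1}{n}\log d(\hat{x}_n,\hat{y}_n)\le \frac{1}{n}\log \frac{\left\| \wedge ^2 W_n\right\| }{\Vert W_n\Vert ^2}-\frac{1}{n}\big (\log \Vert W_n x_0\Vert -\log \Vert W_n\Vert \big ). \end{aligned}$$

The first term on the right-hand side converges \({{\mathbb {P}}}_\nu {\text {-}}\mathrm {a.s.}\) to \(\gamma _2-\gamma _1\), by (36) when \(\gamma _1>-\infty \) and by the convention of item (2) when \(\gamma _1=-\infty \). The second term converges to 0 by (37). Together these give \(\limsup _{n\rightarrow \infty }\frac{1}{n}\log d(\hat{x}_n,\hat{y}_n)\le \gamma _2-\gamma _1=-(\gamma _1-\gamma _2)\).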