In the simple quantum hypothesis testing problem, upper bounds on the error probabilities are shown based on a key operator inequality between a density operator and its pinching. Concerning the error exponents, the upper bounds lead to a non-commutative analogue of the Hoeffding bound, which is identical with the classical counterpart if the hypotheses, composed of two density operators, are mutually commutative. The upper bounds also provide a simple proof of the direct part of the quantum Stein’s lemma.

1 Introduction

Quantum hypothesis testing is a fundamental problem in quantum information theory, because it is one of the most simple problems where the difficulty derived from non-commutativity of operators appears. It is also closely related to other topics in quantum information theory, as in classical information theory. Actually, its relation with quantum channel coding is discussed in [7, 15].

Let us outline briefly significant results in classical hypothesis testing for probability distributions p n(⋅) versus q n(⋅), where p n(⋅) and q n(⋅) are i.i.d. extensions of some probability distributions p(⋅) and q(⋅) on a finite set \(\mathcal {X}\). In the classical case, the asymptotic behaviors of the first kind error probability α n and the second kind error probability β n for the optimal test were studied thoroughly as follows.

First, when α n satisfies the constant constraint α n ≤ ε (ε > 0), the error exponent of βn for the optimal test, say \(\beta _n^*(\varepsilon )\), is written asymptotically as

$$\displaystyle \begin{aligned} \limsup_{n\to\infty}\frac 1n\log \beta_n^* = -D(p||q){} \end{aligned} $$
(1)

for any ε, where D(p||q) is the relative entropy. The equality (1) is called Stein’s lemma (see e.g. [4, p.115]), and the quantum analogue of (1) was established recently [8, 14].

Next, when α n satisfies the exponential constraint α n ≤ e nr (r > 0), the error exponent of β n for the optimal test is asymptotically determined by

$$\displaystyle \begin{aligned} \limsup_{n\to\infty}\frac 1n\log\beta_n^\dag(r) &= -\min_{p': D(p'||p)\leq r}D(p'||q){} \end{aligned} $$
(2)
$$\displaystyle \begin{aligned} &= -\max_{0<s\leq 1}\frac{\Psi(s)-(1-s)r}{s}{} \end{aligned} $$
(3)

where the function Ψ(s) is defined as

(4)

Historically speaking, (2) and the test achieving it were shown in [9], followed by another expression (3) (see [3]), which we call the Hoeffding bound here. In quantum hypothesis testing, the error exponent of 1 − β n was studied in [14] to obtain a similar result to (3), which led to the strong converse property in quantum hypothesis testing. Concerning quantum fixed-length pure state source coding, the error exponent of erroneously decoded probability was determined in [5], where the optimality of the error exponent similar to (3) was discussed.

In this lecture (see [13]), a quantum analogue of the Hoeffding bound (3), (4) is introduced to derive a bound on the error exponent in quantum hypothesis testing. As a by-product of the process to derive the exponent, a simple proof of the quantum Stein’s lemma is also given.

2 Definition and Main Results

Let \(\mathcal {H}\) be a Hilbert space which represents a physical system in interest. We assume \(\mathrm {dim}\mathcal {H} < \infty \) for mathematical simplicity. Let us denote the set of linear operators on \(\mathcal {H}\) as \(\mathcal {L}(\mathcal {H})\) and define the set of density operators on \(\mathcal {H}\) by

(5)

We study the hypothesis testing problem for the null hypothesis

versus the alternative hypothesis

where ρ n and σ n are the nth tensor powers of arbitrarily given density operators ρ and σ in \(\mathcal {S} (\mathcal {H})\).

The problem is to decide which hypothesis is true based on the data drawn from a quantum measurement, which is described by a positive operator valued measure (POVM) on \(\mathcal {H}^{\otimes n}\), i.e., a resolution of identity ∑i M n,i = I n by non-negative operators M n = {M n,i} on \(\mathcal {H}^{\otimes n}\). If a POVM consists of projections on \(\mathcal {H}^{\otimes n}\), it is called a projection valued measure (PVM). In the hypothesis testing problem, however, it is sufficient to treat a two-valued POVM {M 0, M 1}, where the subscripts 0 and 1 indicate the acceptance of H 0 and H 1, respectively. Thus, an operator \(A_n \in \mathcal {L} (\mathcal {H}^{\otimes n})\) satisfying inequalities 0 ≤ A n ≤ I n is called a test in the sequel, since A n is identified with the POVM {A n, I n − A n}. For a test A n, the error probabilities of the first kind and the second kind are, respectively, defined by

Let us define the optimal value for β n(A n) under the constant constraint on α n(A n)

(6)

and let

(7)

which is called the quantum relative entropy. Then we have the following theorem, which is one of the most essential theorems in quantum information theory.

Proposition 277 (The Quantum Stein’s Lemma)

For all 0 < ε < 1, it holds that

$$\displaystyle \begin{aligned} \lim_{n\to\infty}\frac 1n\log \beta_n^* (\varepsilon ) = -D(\rho || \sigma ). {} \end{aligned} $$
(8)

The first proof of (8) was composed of two inequalities, the direct part and the converse part. The direct part, concerned with existence of good tests, claims that

$$\displaystyle \begin{aligned} \forall\ 0 <\varepsilon \leq 1, \qquad \limsup_{n\to\infty} \frac 1n \log \beta_n^* (\varepsilon ) \leq -D(\rho || \sigma ){} \end{aligned} $$
(9)

and it was given by Hiai and Petz [8]. In this lecture, the main focus is on the direct part. Note that the direct part (9) is equivalent to the existence of a sequence of tests {A n} such that

$$\displaystyle \begin{aligned} \lim_{n\to\infty} \alpha_n (A_n ) = 0 \quad \text{and} \quad \limsup_{n\to\infty} \frac 1n\log\beta_n(A_n ) \leq -D(\rho || \sigma ){} \end{aligned} $$
(10)

(see [14]). On the other hand, the converse part, concerned with nonexistence of too good tests, asserts that

$$\displaystyle \begin{aligned} \forall\ 0 < \varepsilon < 1, \qquad \liminf_{n\to\infty}\frac 1n\log \beta_n^* (\varepsilon ) \geq -D(\rho || \sigma ){} \end{aligned} $$
(11)

which was given by Ogawa and Nagaoka [14]. A direct proof of the equality (8) was also given by Hayashi [6] using the information spectrum approach in quantum setting [10, 12], and a considerably simple proof of the converse part (11) was given in [11].

In this lecture, the asymptotic behavior of the error exponent \(\frac 1n \log \beta _n (A_n )\) under the exponential constraint

$$\displaystyle \begin{aligned}\alpha_n (A_n )\leq e^{-nr}, \qquad r > 0\end{aligned}$$

is studied, and a non-commutative analogue of the Hoeffding bound [9] similar to (3) is given as follows.

Theorem 278 (Ogawa and Hayashi 2004, [13])

For all r > 0, there exists a sequence of tests {A n} which satisfies

$$\displaystyle \begin{aligned} \limsup_{n\to\infty}\frac 1n\log \alpha_n (A_n ) &\leq -r, {} \end{aligned} $$
(12)
$$\displaystyle \begin{aligned} \limsup_{n\to\infty}\frac 1n \log \beta_n (A_n ) &\leq - \max_{0<s\leq 1} \frac{\overline{\psi}(s)-(1-s)r}{s}{} \end{aligned} $$
(13)

where

(14)

We will prove the theorem in 4. If ρ and σ commutate, \(\overline {\psi }(s)\) is identical with the classical counterpart Ψ(s) defined in (4), and (13) coincides with the Hoeffding bound (3), which is optimal in classical hypothesis testing.

This lecture is organized as follows. In 3, upper bounds on the error probabilities are shown based on a key operator inequality [6]. Using the upper bounds, we will prove Theorem 278 in 4. In 5, we will make some remarks toward further investigations.

Section 7 is devoted to the definition of pinching (see, e.g., [2], p. 50), which is known as a special notion of the conditional expectation in literature on the operator algebra and is used effectively in 3. In 8, the key operator inequality used in 3 is summarized along with another proof of it for readers’ convenience.

3 Bounds on Error Probabilities

In the sequel, let \(\mathcal {E}_{\sigma _n}(\rho _n )\) be the conditional expectation of ρ n to the commutant of the ∗-subalgebra generated by σ n, which we call pinching (see 7) and denote it as \(\overline {\rho _n}\) for simplicity. Let v(σ n) be the number of eigenvalues of σ n mutually different from others as defined in 7. Then a key operator inequalityFootnote 1 follows from Lemma 285 in 8, which originally appeared in [6]

$$\displaystyle \begin{aligned} \rho_n \leq v(\sigma_n ) \overline{\rho} n.{} \end{aligned} $$
(15)

Note that the type counting argument provides

$$\displaystyle \begin{aligned} v(\sigma_n ) \leq (n + 1)^d {} \end{aligned} $$
(16)

where \(d \triangleq \dim \mathcal {H}\). Following [6], let us apply the operator monotonicity of the function x↦ − x s, 0 ≤ s ≤ 1 (see, e.g, [2, Sec. V.1]) to (15) so that we have

$$\displaystyle \begin{aligned} \overline{\rho_n}^{-s} \leq v(\sigma_n )^s \rho^{-s}_n \leq (n + 1)^{sd} \rho^{-s}_n.{} \end{aligned} $$
(17)

Following the notation used in [10, 12], let us define the projection {X > 0} for a Hermitian operator X =∑i x i E i as

(18)

where E i is the projection onto the eigenspace corresponding to an eigenvalue x i. In the sequel, we will focus on a test defined by

(19)

where a is a real parameter, and derive the upper bounds on the error probabilities for the test \(\overline {S}_n (a)\) as follows.

Theorem 279 (Ogawa and Hayashi 2004, [13])

$$\displaystyle \begin{aligned} \alpha_n \left(\overline{S}_n (a)\right) &\leq (n + 1)^d e^{-n\overline{\varphi}(a)}, {} \end{aligned} $$
(20)
$$\displaystyle \begin{aligned} \beta_n\left( \overline{S}_n (a)\right) &\leq (n + 1)^d e^{-n[\overline{\varphi}(a)+a]}{} \end{aligned} $$
(21)

where \(\overline {\varphi }(a)\) is defined by \(\overline {\psi }(s)\) given in (14) as

(22)

Proof

The definition of \(\overline {S}_n (a)\) and commutativity of operators \(\overline {\rho _n}\) and σ n lead to

$$\displaystyle \begin{aligned} \left(\overline{\rho_n}^{1-s}-e^{na(1-s)}\sigma_n^{1-s}\right)\overline{S}_n(a) &\geq0{} \end{aligned} $$
(23)
$$\displaystyle \begin{aligned} \left(\overline{\rho_n}-e^{nas} \sigma_n^s \right) \left(I_n - \overline{S}_n (a)\right) &\leq 0{} \end{aligned} $$
(24)

for all 0 ≤ s ≤ 1. Note that \(\overline {S}_n (a)\) also commutes with σ n. Therefore, the inequality (24), with the property of pinching (63) in 7, provides

(25)

In the same way, (23) yields

(26)

It follows from (63) and (17) that

(27)

for all 0 ≤ s ≤ 1. Combining (25)–(27), we have

$$\displaystyle \begin{aligned} \alpha_n\left( \overline{S}_n (a)\right) &\leq (n + 1)^{sd} e^{-n\left[\overline{\psi}(s)-as\right]}\\[0.2cm] &\leq (n + 1)^d e^{-n\left[\overline{\psi}(s)-as\right]},{} \end{aligned} $$
(28)
$$\displaystyle \begin{aligned} \beta_n\left( \overline{S}_n (a)\right) &\leq (n + 1)^{sd} e^{-n\left[\overline{\psi}(s)-as+a\right]}\\[0.2cm] &\leq (n + 1)^d e^{-n\left[\overline{\psi}(s)-as+a\right]},{} \end{aligned} $$
(29)

which lead to (20) and (21) by taking the maximum in the exponents. □

4 Proof of Theorem 278 and the Quantum Stein’s Lemma

In this section, we will prove Theorem 278 by using Theorem 279. To this end, the behavior of \(\overline {\varphi }(a)\) in the error exponents (20) and (21) is investigated in the following lemmas. We will also show that Theorem 279 provides a simple proof of the direct part of the quantum Stein’s lemma (10).

Lemma 280

\(\overline {\varphi }(a)\) is convex and monotonically nonincreasing.

Proof

The assertion immediately follows from the definition of \(\overline {\varphi }(a)\). Actually, we have for all 0 ≤ t ≤ 1

$$\displaystyle \begin{aligned} \overline{\varphi}(ta + (1 - t)b) &= \max_{0\leq s\leq 1} \{\overline{\psi}(s) - (ta + (1 - t)b)s \}\\[0.2cm] &\leq t \max_{0\leq s\leq 1} \{\overline{\psi}(s) - as\} + (1 - t) \max_{0\leq s\leq 1} \{\overline{\psi}(s) - bs\}\\[0.2cm] &= t\overline{\varphi}(a) + (1 - t)\overline{\varphi}(b).{} \end{aligned} $$
(30)

Next, let a ≤ b and \(s_b \triangleq \arg \max _{0\leq s\leq 1} \{\overline {\psi }(s) - bs\}\). Then we have

$$\displaystyle \begin{aligned} \overline{\varphi}(b) &= \overline{\psi}(s_b ) - bs_b\\[0.2cm] &\leq \overline{\psi}(s_b ) - as_b \\[0.2cm] &\leq \max_{0\leq s\leq 1} \{\overline{\psi}(s) - as\}\\[0.2cm] &= \overline{\varphi}(a).{}\end{aligned} $$
(31)

Lemma 281

\(\overline {\varphi }(a)\) ranges from 0 to infinity.

Proof

Since we can calculate the derivative of \(\overline {\psi }(s)\) explicitly, \(\overline {\psi }(s)\) is continuous and differentiable. Therefore, it follows from the mean value theorem that for s > 0 there exists 0 ≤ t ≤ s such that

$$\displaystyle \begin{aligned} \overline{\psi} (t) = \frac{\overline{\psi}(s)-\overline{\psi}(0)}{s-0}. {} \end{aligned} $$
(32)

Let \(a \leq \max _{0\leq t\leq 1} \overline {\psi }^{\prime }(t)\), then we have

$$\displaystyle \begin{aligned} a\geq\frac{\overline{\psi}(s)-\overline{\psi}(0)}{s-0}.{} \end{aligned} $$
(33)

and hence,

$$\displaystyle \begin{aligned} \overline{\psi}(0) \geq \overline{\psi}(s) - as{} \end{aligned} $$
(34)

which yields

$$\displaystyle \begin{aligned} 0 = \overline{\psi}(0) = \max_{0\leq s\leq 1} \{\overline{\psi}(s) - as\} = \overline{\varphi}(a).{} \end{aligned} $$
(35)

On the other hand, it is obvious that

$$\displaystyle \begin{aligned} \lim_{a\to-\infty} \overline{\varphi}(a) = \infty.{} \end{aligned} $$
(36)

Since \(\overline {\varphi }(a)\) is continuous, which follows from convexity by Lemma 280, the assertion follows from (35) and (36). □

Combined with the above lemma, Theorem 279 leads to Theorem 278 as follows.

Proof of Theorem 278

For all r > 0, there exists \(a_r \in \mathbb {R}\) such that \(r = \overline {\varphi }(a_r )\) from Lemma 281. Let \(\overline {u}(r) \triangleq \overline {\varphi }(a_r ) + a_r\), then it follows from Theorem 279 that

$$\displaystyle \begin{aligned} \limsup_{n\to\infty} \frac 1n \log \alpha_n( \overline{S}n (a_r ) ) &\leq -r {} \end{aligned} $$
(37)
$$\displaystyle \begin{aligned} \limsup_{n\to\infty}\frac 1n \log \beta_n \overline{S}_n (a_r )) &\leq -\overline{u}(r). {} \end{aligned} $$
(38)

Therefore, it suffices to show that

$$\displaystyle \begin{aligned} \overline{u}(r) =\max_{0\leq s\leq 1}\frac{\overline{\psi}-(1-s)r}{s} {} \end{aligned} $$
(39)

For all 0 ≤ s ≤ 1, we have from the definition of \(\overline {\varphi }(a)\)

$$\displaystyle \begin{aligned} r = \overline{\varphi}(a_r ) \geq \overline{\psi}(s) - a_r s{} \end{aligned} $$
(40)

and there exists a number s 0, 0 < s 0 ≤ 1, achieving the equality since \(r = \overline {\varphi }(a_r ) > 0\). On the other hand, the definitions of \(\overline {u}(r)\) and a r lead to

$$\displaystyle \begin{aligned} \overline{u}(r) = \overline{\varphi}(a_r ) + a_r = r + a_r . {} \end{aligned} $$
(41)

Eliminating a r from (40) and (41), we have

$$\displaystyle \begin{aligned} \overline{u}(r)\geq \frac{\overline{\psi}(s) - (1 - s)r}{s} {}) \end{aligned} $$
(42)

and s 0 achieves the equality in (42) as well. Thus, we have shown (39), and Theorem 278 has been proved. □

Next, observing that \(\overline {\psi }(0) = 0\) and \(\overline {\psi }^{\prime } (0) = D(\rho ||\sigma )\), we have

$$\displaystyle \begin{aligned} \overline{\varphi}(a) > 0 \qquad \text{for all } a < D(\rho ||\sigma ) {} \end{aligned} $$
(43)

which leads to the following theorem combined with Theorem 279.

Theorem 282 (Ogawa and Hayashi 2004, [13])

For all a < D(ρ||σ), we have

$$\displaystyle \begin{aligned} \lim_{n\to\infty} \alpha_n( \overline{S}_n (a)) &= 0{} \end{aligned} $$
(44)
$$\displaystyle \begin{aligned} \limsup_{n\to\infty}\frac 1n\log \beta_n (\overline{S}_n (a)) &\leq -a. {} \end{aligned} $$
(45)

Since a < D(ρ||σ) can be arbitrarily near D(ρ||σ), we have shown the direct part of the quantum Stein’s lemma (10) .

5 Toward Further Investigations

The error exponents derived here do not seem to be natural, since \(\overline {\psi }(s)\) lacks symmetry between ρ and σ that the original hypothesis testing problem has. We need further investigation to determine the error exponents in quantum hypothesis testing. In this section, we make a few remarks on some candidates for the alternative to \(\overline {\psi }(s)\) in the expectation that the error exponents would be written in the form of Theorem 278.

Among many candidates, let us consider the following functions:

(46)
(47)
(48)

where

(49 50)

The reason to consider these functions is as follows. First ψ 1(s) is a symmetrized version of \(\overline {\psi }(s)\), and Theorem 278 still holds with \(\overline {\psi }(s)\) replaced by ψ 1(s), since similar upper bounds to Theorem 279 using \(\tilde {\psi }(s)\) are valid by exchanging ρ and σ and replacing s with 1 − s. On the other hand, ψ 2(s) for − 1 ≤ s ≤ 0 appeared in [14] to show the strong converse property in quantum hypothesis testing. Concerning ψ 3(s), u 3(r) is a quantum analogue of (2). Actually, we can show that

$$\displaystyle \begin{aligned} u_3(r)=\min_{\rho':D(\rho'||\rho)\leq r} D(\rho'||\rho) {} \end{aligned} $$
(51)

by the same way as [14, Sec. VI]. At present it is not clear whether u 2(r) and u 3(r) are achievable exponents in quantum hypothesis testing. It should be noted, however, that ψ i(s), i = 1, 2, 3, are reduced to the classical one (4) if ρ and σ commute, and they have desirable properties

$$\displaystyle \begin{aligned} \psi_i(0)&=\psi_i(1)=0 \\ \psi_i^{\prime}(0)&=D(\rho||\sigma),\\ \psi_i^{\prime}(1)&=D(\rho||\sigma) \qquad i=1,2,3{} \end{aligned} $$
(52)

which are consistent with the quantum Stein’s lemma . The above properties of ψ 2(s) and ψ 3(s) are verified by the direct calculations while those of ψ 1(s) follow from the following fact:

$$\displaystyle \begin{aligned} \psi_1(s)&=\overline{\psi}(s)\geq \tilde{\psi}(s), \qquad \text{if {$s$} is sufficiently near {$0$}}{} \end{aligned} $$
(53)
$$\displaystyle \begin{aligned} \psi_1(s)&=\tilde{\psi}(s)\geq \overline{\psi}(s), \qquad \text{if {$s$} is sufficiently near {$1$}}{} \end{aligned} $$
(54)

which is a consequence of \(\overline {\psi }(0)=\psi _2(0)\), \(\tilde {\psi }(1) =\psi _2(1)\), and the following lemma.

Lemma 283

For all 0 ≤ s ≤ 1, we have

$$\displaystyle \begin{aligned} \overline{\psi}(s) &\leq \psi_2(s) {} \end{aligned} $$
(55)
$$\displaystyle \begin{aligned} \tilde {\psi}(s) &\leq \psi_2(s) {} \end{aligned} $$
(56)

Proof

Let us apply the monotonicity property of the quantum quasi-entropy [17, 18] to , 0 ≤ s ≤ 1,Footnote 2 so that we have

(57)

where we used (27) in the last inequality. Thus, we obtain

$$\displaystyle \begin{aligned} \overline{\psi}(s)\leq\psi_2(s)+\frac{sd}{n}\log(n+1){} \end{aligned} $$
(58)

for any natural number n, and we have (55) by letting n go to infinity. Exchanging ρ and σ and replacing s with 1 − s in (55), we obtain (56). □

It follows immediately from Lemma 283 that ψ 1(s) ≤ ψ 2(s), and it was pointed out in [14] that we have ψ 2(s) ≤ ψ 3(s) as a consequence of the Golden-Thompson inequality (see, e.g., [16, p. 128])

(59)

for Hermitian operators A and B with the equality if and only if A and B commute. These facts are stated as the following proposition

Proposition 284

It holds that

$$\displaystyle \begin{aligned} \psi_1(s)&\leq\psi_2(s)\leq\psi_3(s) && \forall\ 0\leq s\leq 1 {} \end{aligned} $$
(60)
$$\displaystyle \begin{aligned} u_1(r)&\leq u_2(r)\leq u_3(r) && \forall\ r>0{} \end{aligned} $$
(61)

Especially, if ρ and σ do not commute, we have ψ 2(s) < ψ 2(s) and u 2(r) < u 3(r).

As mentioned above, u 1(r) is an achievable exponent in quantum hypothesis testing, while it is not known whether u 2(r) and u 3(r) are achievable or not. It is interesting to study the achievability of these functions, especially that of u 2(r), and the problem is left open.

6 Concluding Remarks

In the quantum hypothesis problem, we have presented upper bounds on the error probabilities of the first and the second kind, based on a key operator inequality satisfied by a density operator and pinching of it. The upper bounds are regarded as a noncommutative analogue of the Hoeffding bound [9], which is the optimal bound in classical hypothesis testing, and the upper bounds provide a simple proof of the direct part of the quantum Stein’s lemma. Compared with [6], the proof is considerably simple and leads to the exponential convergence of the error probability of the first kind.

7 Definition of Pinching

In this section, we summarize the definition of pinching (see, e.g., [2, p. 50]) for readers’ convenience. Pinching is known as a special notion of the conditional expectation in the field of operator algebra.

Given a Hermitian operator \(A \in \mathcal {L}(\mathcal {H})\), let \(A = \sum _{i=1}^{v(A)} a_i E_i\) be its spectral decomposition, where v(A) is the number of eigenvalues of A mutually different from others, and each E i is the projection corresponding to an eigenvalue a i. The following map defined by using the PVM \(E = \{Ei \}_{i=1}^{v(A)}\) is called pinching:

(62)

The operator \(\mathcal {E}_A (B)\) is also called pinching when no confusion is likely to arise, and it is sometimes denoted as \(\mathcal {E}_E (B)\). It should be noted here that pinching is the conditional expectation (with respect to the tracial state) to the commutant of the ∗-subalgebra generated by A or PVM E, since \(\mathcal {E}_A(B)\) is the one and only operator which satisfies

(63)

for any operator \(C \in \mathcal {L}(\mathcal {H})\) commuting with A.

8 Key Operator Inequality

The following lemma has played an important role in this lecture. Although the lemma for a two-valued PVM has been widely used, it appeared in [6] for the general case. Here, we will show another proof of it for readers’ convenience.

Lemma 285 (Hayashi 2002, [6])

Given a PVM \(M = \{M_i \}_{i=1}^{v(M)}\) on \(\mathcal {H}\) , we have for all \( \rho \in \mathcal {S}(\mathcal {H})\)

$$\displaystyle \begin{aligned} \rho \leq v(M)\mathcal{E}_M (\rho ){}\end{aligned} $$
(64)

where \(\mathcal {E}_M (\rho )\) is the pinching defined in 7.

Proof

First, note that the following map, defined with respect to a non-negative operator \(A\in \mathcal {L}(\mathcal {H})\), is operator convex

$$\displaystyle \begin{aligned} f_A: X\in\mathcal{L}(\mathcal{H})\to X^* AX\in\mathcal{L}(\mathcal{H}) {}\end{aligned} $$
(65)

which is shown by a direct calculation

$$\displaystyle \begin{aligned} tf_A(X)+(1-t)f_A(Y)-f_A(tX+(1-t)Y)=t(1-t)(X-Y)^* A(X-Y)\geq 0{}\end{aligned} $$
(66)

for 0 ≤ t ≤ 1. Using the convexity, the lemma is verified as follows:

$$\displaystyle \begin{aligned} \frac{1}{v(M)^2}\rho &= \left(\frac{1}{v(M)}\sum_{i=1}^{v(M)} M_i\right)\rho \left(\frac{1}{v(M)}\sum_{i=1}^{v(M)}M_i\right)\\[0.15cm] &\leq \frac{1}{v(M)}\sum_{i=1}^{v(M)}M_i\rho M_i\\[0.15cm] &=\frac{1}{v(M)}\mathcal{E}_M(\rho).{} \end{aligned} $$
(67)