Abstract
Drawing on an analogy with the second law of thermodynamics for adiabatically isolated systems, Cover argued that data-processing inequalities may be seen as second laws for “computationally isolated systems,” namely, systems evolving without an external memory. Here we develop Cover’s idea in two ways: on the one hand, we clarify its meaning and formulate it in a general framework able to describe both classical and quantum systems. On the other hand, we prove that also the reverse holds: the validity of data-processing inequalities is not only necessary, but also sufficient to conclude that a system is computationally isolated. This constitutes an information-theoretic analogue of Lieb’s and Yngvason’s entropy principle. We finally speculate about the possibility of employing Maxwell’s demon to show that adiabaticity and memorylessness are in fact connected in a deeper way than what the formal analogy proposed here prima facie seems to suggest.
1 Introduction
Cover, in the attempt to set the second law of thermodynamics in a computational framework, concludes his work with the following suggestive observations [1]:
The second law of thermodynamics says that uncertainty increases in closed physical systems and that the availability of useful energy decreases. If one can make the concept of “physical information” meaningful, it should be possible to augment the statement of the second law of thermodynamics with the statement, “useful information becomes less available.” Thus the ability of a physical system to act as a computer should slowly degenerate as the system becomes more amorphous and closer to equilibrium. A perpetual computer should be impossible [emphasis added].
Cover’s analysis can be summarized as follows. He first argues, more or less implicitly, that the computational analogue of an adiabatically isolated system should be taken to be a system evolving—i.e., computing—without an external memory. (For this reason, in what follows we use the term “computationally isolated” as a synonym for “memoryless.”) This observation leads him to consider stochastic memoryless processes, in particular discrete-time Markov chains. Cover then shows that, while entropy can increase or decrease in this setting, thus violating the thermodynamical second law, relative entropy instead never increases. We refer to this statement as Cover’s “computational second law.” On the technical side, what Cover proves in [1] is an expression of the monotonicity of the relative entropy under the action of a noisy channel. Thus Cover’s second law is in fact a particular data-processing inequality [2, 3], and we can imagine that there are as many computational second laws as there are data-processing inequalities, all formalizing the idea that the information content of a system cannot increase without the presence of an external memory.
Cover hence shows that the condition of being memoryless is sufficient for a system to obey data-processing inequalities, i.e., computational second laws. The question we address in this paper concerns the other direction: is it possible to show that the memoryless condition is also necessary for the validity of all data-processing inequalities? Equivalently stated: is it true that a system, if it is not computationally isolated, will necessarily violate some data-processing inequality? It is important to address these questions, if we want to understand how far the analogy between memorylessness and adiabaticity can be pushed. Here, in particular, we have in mind Lieb’s and Yngvason’s formulation of the second law of thermodynamics [7], according to which a non-decreasing entropy is not only necessary but also sufficient for the existence of an adiabatic process connecting two thermodynamical states.
The aim of this paper is to provide a comprehensive framework that is able to answer the above questions. More specifically, we prove here a family of reverse data-processing theorems, showing that as soon as a system is not computationally isolated, it must necessarily violate a data-processing inequality. The framework we construct is quite general and it can be applied to classical, quantum, and hybrid classical/quantum systems. In fact, it may even be extended in principle to generalized operational theories as it involves only basic notions like states, effects, and operations; this development is however beyond the scope of the present work.
Thus we are able to strengthen Cover’s computational second law in two ways: on the one hand, we give it a converse, in a way that is analogous to what Lieb and Yngvason did the second law of thermodynamics. On the other hand, we include in the analysis the possibility of dealing with quantum systems and quantum memories.
The paper is organized as follows. We begin in Sect. 2 by reviewing the data-processing inequality for a classical Markov chain. This is the encoding–channel–decoding model considered by Shannon to describe the simplest communication scenario. In this scenario we prove our first reverse data-processing theorem. We also show how this relates to the theory of comparison of noisy channels, as introduced by Shannon [8] and later developed by Körner and Marton [9]. In Sect. 3 we state and prove a lemma that allows us to extend our considerations to the quantum case, and discuss the notion of quantum statistical morphisms. In Sect. 4 we study the case of a system processing quantum information but outputting only classical data, and prove the corresponding reverse data-processing theorem. Section 5 presents the general case of a fully quantum computer, i.e., a process with quantum inputs and quantum outputs. Finally, in Sect. 6, we briefly discuss analogies and differences between thermodynamical and computational second laws. In particular, we speculate about the possibility that Maxwell’s paradox (his “demon”) may enable a deeper relation between adiabatic processes and memoryless processes, going beyond the formal analogy considered in this work. At the end of the paper, three appendices are available: the first reviews conventions, notations, and terminology used in this work; the second contains a version of the minimax theorem; and the third presents (just for the sake of completeness) an elementary proof of the separation theorem for convex sets.
This work contains ideas that were presented during the Sixth Nagoya Winter Workshop (NWW2015) held in Nagoya on 9–13 March 2015. Part of the technical results presented here were first introduced in previous papers by the author [10,11,12,13,14,15], building upon works of Shmaya [16] and Chefles [17].
2 A Reverse-Data Processing Theorem for Classical Channels
A data-processing inequality is a mathematical statement formalizing the fact that the information content of a signal cannot be increased by post-processing. As there are many ways to quantify information, so there are many corresponding data-processing inequalities. Such inequalities, however, despite formalizing the same intuitive concept, are not all logically equivalent: some may be stronger than (i.e., imply) others, some may be easier to prove, some may be better suited for a particular problem at hand. Data-processing inequalities usually find application in information theory when proving that a given approach (coding strategy) is optimal: if a better coding were possible, that would result in the violation of one or more data-processing inequalities, thus leading to a contradiction. In this sense, data-processing inequalities provide a sort of “sanity check” of the result.
One of the simplest scenarios in which a data-processing inequality can be formulated is the following [2, 3]. Given are two noisy channels \(w_1:\mathscr {X}\rightarrow \mathscr {Y}\) and \(w_2:\mathscr {Y}\rightarrow \mathscr {Z}\). Then, for any set \(\mathscr {U}\) and any initial joint distribution p(x, u), the joint distribution \(\sum _xw_2(z|y)w_1(y|x)p(x,u)\) satisfies the following inequalities:
$$\begin{aligned} I(U;X)\ \ge \ I(U;Y)\ \ge \ I(U;Z)\;. \end{aligned}$$
[Notations and definitions used here and in what follows are given for completeness in Appendix 1.] Referring to the situation depicted in Fig. 1 and interpreting U as the message, X as the signal, \(w_1\) as the communication channel, Y as the output signal, \(w_2\) as the decoding, and Z as the recovered message, the above inequality formalizes the fact that the information content carried by the signal about the message cannot be increased by any decoding performed locally at the receiver. Of course, this does not mean that decoding should be avoided (actually, in most cases a decoding is necessary to make the signal readable to the receiver), but that no decoding is able to add a posteriori more information to what is already carried by the signal.
Data-processing inequalities hence provide necessary conditions for the “locality” of the information-processing device. Namely, data-processing inequalities must be obeyed whenever the physical process carrying the message from the sender to the receiver is composed of computationally isolated parts (encoding, transmission, decoding, etc.). Any information that is communicated must be transmitted via a physical signal: as such, in the absence of an external memory, information can only decrease, never increase, along the transmission. Hence, “locality” in this sense can be understood as the condition that the process \(U\rightarrow X\rightarrow Y\rightarrow Z\) forms a Markov chain. For this reason, we refer to such locality as “Markov locality,” in order to avoid confusion with other connotations of the word.
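As a quick numerical sanity check of Markov locality, the following sketch builds a random Markov chain \(U\rightarrow X\rightarrow Y\rightarrow Z\) and verifies the mutual-information chain \(I(U;X)\ge I(U;Y)\ge I(U;Z)\). The channels and distributions below are random examples introduced here purely for illustration, not taken from the paper.

```python
# Numerical check of the data-processing inequality along a Markov chain
# U -> X -> Y -> Z.  All channels/distributions are random examples.
import numpy as np

rng = np.random.default_rng(0)

def random_channel(n_in, n_out):
    """Random column-stochastic matrix w[y, x] = w(y|x)."""
    w = rng.random((n_out, n_in))
    return w / w.sum(axis=0)

def mutual_information(p_uv):
    """I(U;V) in bits for a joint distribution p_uv[u, v]."""
    pu = p_uv.sum(axis=1, keepdims=True)
    pv = p_uv.sum(axis=0, keepdims=True)
    mask = p_uv > 0
    return float((p_uv[mask] * np.log2(p_uv[mask] / (pu @ pv)[mask])).sum())

p_ux = rng.random((4, 3)); p_ux /= p_ux.sum()   # joint p(u, x)
w1 = random_channel(3, 5)                        # w1(y|x)
w2 = random_channel(5, 2)                        # w2(z|y)

p_uy = p_ux @ w1.T    # p(u, y) = sum_x w1(y|x) p(u, x)
p_uz = p_uy @ w2.T    # p(u, z) = sum_y w2(z|y) p(u, y)

# I(U;X) >= I(U;Y) >= I(U;Z), up to floating-point tolerance
assert mutual_information(p_ux) >= mutual_information(p_uy) - 1e-9
assert mutual_information(p_uy) >= mutual_information(p_uz) - 1e-9
```

The inequality holds exactly for any such construction; the tolerance only absorbs floating-point rounding.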
In this paper we aim to derive statements that provide sufficient conditions for Markov locality, in the form of a set of information-theoretic inequalities. We refer to such statements as reverse data-processing theorems. For example, a first attempt in this direction would be to prove the following:
Given are two noisy channels \(w:\mathscr {X}\rightarrow \mathscr {Y}\) and \(w':\mathscr {X}\rightarrow \mathscr {Z}\). Suppose that, for any set \(\mathscr {U}\) and for any initial joint distribution p(x, u), the resulting distributions \(\sum _xw(y|x)p(x,u)\) and \(\sum _xw'(z|x)p(x,u)\) always satisfy the inequality \(I(U;Y)\ge I(U;Z )\). Then there exists a noisy channel \(\varphi :\mathscr {Y}\rightarrow \mathscr {Z}\) such that \(w'(z|x)=\sum _y\varphi (z|y)w(y|x)\).
Notice that, in the above statement, the two given channels w and \(w'\) are assumed to have the same input alphabet: this is a consequence of the fact that we are now formulating a reverse data-processing theorem, so that the existence of a Markov-local decoding (the channel \(\varphi \)) is something to be proved, rather than being a datum. Interpreting the four random variables (U, X, Y, Z) as before, if the reverse data-processing theorem holds, then we can conclude that any violation of Markov locality is detectable, in the precise sense that the data-processing inequality has to be violated at some point along the communication process.
2.1 Comparison of Noisy Channels
A reverse data-processing theorem can be understood as a statement about the comparison of two noisy channels. Hence we want to introduce ordering relations between noisy channels, capturing the idea that one channel is able to transmit “more information” than another. This problem, first considered by Shannon [8], is intimately related to the theory of statistical comparisons [18,19,20,21], even though this connection was not made until recently [22]. The theory of comparison of noisy channels received a thorough treatment by Körner and Marton, who in Ref. [9] introduce the following definitions (the notation used here follows [23]):
Definition 1
Given are two noisy channels, \(w:\mathscr {X}\rightarrow \mathscr {Y}\) and \(w':\mathscr {X}\rightarrow \mathscr {Z}\).
(i) the channel w is said to be less noisy than \(w'\) if and only if, for any set \(\mathscr {U}\) and any joint distribution p(x, u), the resulting distributions \(\sum _x w(y|x)p(x,u)\) and \(\sum _x w'(z|x)p(x,u)\) always satisfy the inequality
$$\begin{aligned} H(U|Y)\le H(U|Z)\;; \end{aligned}$$(1)
(ii) the channel w is said to be degradable into \(w'\) if and only if there exists another channel \(\varphi :\mathscr {Y}\rightarrow \mathscr {Z}\) such that
$$\begin{aligned} w'(z|x)=\sum _y\varphi (z|y)w(y|x)\;. \end{aligned}$$(2)
\(\square \)
Since \(I(U;Y)\ge I(U;Z)\) if and only if \(H(U|Y)\le H(U|Z)\), we immediately notice that the reverse data-processing theorem, as tentatively formulated above, is equivalent to the implication (i)\(\implies \)(ii): indeed, the reverse implication, (ii)\(\implies \)(i), is the usual data-processing inequality. Körner and Marton provide an explicit counterexample showing that
$$\begin{aligned} \mathrm {(i)}\ \ \not \Longrightarrow \ \ \mathrm {(ii)}\;. \end{aligned}$$(3)
This means that, if a reverse data-processing theorem holds, it must be formulated differently.
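The unproblematic direction, (ii)\(\implies \)(i), can be checked numerically: a degradable pair \(w'=\varphi \circ w\) always satisfies the “less noisy” condition. The sketch below does this for randomly generated channels (all channels and distributions are arbitrary examples, not taken from the paper).

```python
# Check of the easy direction (ii) => (i): if w' = phi ∘ w, then
# H(U|Y) <= H(U|Z) for every joint distribution p(x, u).
# All channels are random examples.
import numpy as np

rng = np.random.default_rng(1)

def random_channel(n_in, n_out):
    w = rng.random((n_out, n_in))
    return w / w.sum(axis=0)

def cond_entropy(p_uv):
    """H(U|V) in bits for a joint distribution p_uv[u, v]."""
    pv = p_uv.sum(axis=0)
    mask = p_uv > 0
    return float(-(p_uv[mask] * np.log2((p_uv / pv)[mask])).sum())

w   = random_channel(3, 4)   # w(y|x)
phi = random_channel(4, 2)   # phi(z|y), the degrading channel
w2  = phi @ w                # w'(z|x) = sum_y phi(z|y) w(y|x)

for _ in range(100):
    p_xu = rng.random((3, 6)); p_xu /= p_xu.sum()   # joint p(x, u)
    p_uy = (w  @ p_xu).T     # p(u, y)
    p_uz = (w2 @ p_xu).T     # p(u, z)
    assert cond_entropy(p_uy) <= cond_entropy(p_uz) + 1e-12
```

The counterexample direction, (i)\(\not \Rightarrow \)(ii), cannot of course be exhibited by random sampling; Körner and Marton's construction is needed for that.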
2.2 Replacing H with \(H_{\min }\)
Even though we know that “less noisy” does not imply “degradable,” in what follows we show that just a slight formal modification in the definition of “less noisy,” Eq. (1), is enough to obtain the sought-after reverse data-processing theorem. Such a modification consists in replacing, in point (i) of Definition 1, the Shannon conditional entropy \(H(\cdot |\cdot )\) with the conditional min-entropy \(H_{\min }(\cdot |\cdot )\).
Theorem 1
Given are two noisy channels \(w:\mathscr {X}\rightarrow \mathscr {Y}\) and \(w':\mathscr {X}\rightarrow \mathscr {Z}\). The following are equivalent:
(i) for any set \(\mathscr {U}\) and for any initial joint distribution p(x, u), the resulting distributions \(\sum _x w(y|x)p(x,u)\) and \(\sum _x w'(z|x)p(x,u)\) always satisfy the inequality
$$\begin{aligned} H_{\min }(U|Y)\le H_{\min }(U|Z)\;; \end{aligned}$$(4)
(ii) w is degradable into \(w'\), namely, there exists another channel \(\varphi :\mathscr {Y}\rightarrow \mathscr {Z}\) such that \(w'(z|x)=\sum _y\varphi (z|y)w(y|x)\;\).
Proof
(ii)\(\implies \)(i) is a direct consequence of the data-processing inequality for \(H_{\min }\). Suppose that there exists another conditional probability distribution \(\varphi (z|y)\) such that \(w'(z|x)=\sum _y\varphi (z|y)w(y|x)\). This means that the random variable Z is obtained locally from Y, i.e., the four random variables (U, X, Y, Z) form a Markov chain \(U\rightarrow X\rightarrow Y\rightarrow Z\). This implies that (4) holds.
In order to prove (i)\(\implies \)(ii), let us assume that the inequality in (4) holds for any initial joint distribution p(x, u). Exponentiating both sides, and using Eq. (59), this is equivalent to
$$\begin{aligned} P_{\mathrm {guess}}(U|Y)\ \ge \ P_{\mathrm {guess}}(U|Z)\;, \end{aligned}$$(5)
namely,
$$\begin{aligned} \max _{\varphi }\sum _{u,x,y}\varphi (u|y)\,w(y|x)\,p(x,u)\ \ge \ \max _{\varphi '}\sum _{u,x,z}\varphi '(u|z)\,w'(z|x)\,p(x,u)\;, \end{aligned}$$(6)
for all choices of p(x, u). In the above equation, the noisy channels \(\varphi \) and \(\varphi '\) represent the decision functions that the statistician designs in order to optimally guess the value of U.

Let us choose U such that its support coincides with that of Z, i.e., \(\mathscr {U}\equiv \mathscr {Z}\). We can therefore denote its states by \(z'\). Let us also fix the guessing strategy on the right-hand side of (6) to be \(\varphi '(z'|z)\equiv \delta _{z',z}\), i.e., 1 if \(z'=z\) and 0 otherwise. Then, we know that there exists a decision function \(\varphi (z'|y)\) such that
$$\begin{aligned} \sum _{z',x,y}\varphi (z'|y)\,w(y|x)\,p(x,z')\ \ge \ \sum _{z',x}w'(z'|x)\,p(x,z')\;. \end{aligned}$$(7)
In other words, for any \(p(x,z')\), there exists a \(\varphi (z'|y)\) such that the above inequality holds. This is equivalent to say that
$$\begin{aligned} \min _{p(x,z')}\ \max _{\varphi }\ \sum _{z',x}p(x,z')\left[ \sum _y\varphi (z'|y)\,w(y|x)-w'(z'|x)\right] \ \ge \ 0\;. \end{aligned}$$(8)
We now invoke the minimax theorem (in the form reported in Appendix 2, Theorem 4) and exchange the order of the two optimizations:
$$\begin{aligned} \max _{\varphi }\ \min _{p(x,z')}\ \sum _{z',x}p(x,z')\left[ \sum _y\varphi (z'|y)\,w(y|x)-w'(z'|x)\right] \ \ge \ 0\;. \end{aligned}$$(9)
Let us now introduce the quantity
$$\begin{aligned} \varDelta _\varphi (z',x)\ \triangleq \ w'(z'|x)-\sum _y\varphi (z'|y)\,w(y|x)\;, \end{aligned}$$(10)
in terms of which condition (9) reads
$$\begin{aligned} \min _{\varphi }\ \max _{p(x,z')}\ \sum _{z',x}p(x,z')\,\varDelta _\varphi (z',x)\ \le \ 0\;. \end{aligned}$$(11)
First of all, we notice that the maximum in Eq. (11) is reached when the distribution \(p(x,z')\) is entirely concentrated on an entry where \(\varDelta _\varphi (z',x)\) is maximum, that is,
$$\begin{aligned} \min _{\varphi }\ \max _{z',x}\ \varDelta _\varphi (z',x)\ \le \ 0\;. \end{aligned}$$(12)
In general, \(\varDelta _\varphi (z',x)\) does not have a definite sign; however, since \(\sum _{z',x}\varDelta _\varphi (z',x)=0\) (as a consequence of the normalization of probabilities), it must be that \(\max _{z',x}\varDelta _\varphi (z',x)\ge 0\) for every \(\varphi \) (otherwise, of course, one would have \(\sum _{z',x}\varDelta _\varphi (z',x)<0\)). The above inequality hence implies that \(\min _\varphi \max _{z',x}\varDelta _\varphi (z',x)=0\). In turn this implies, again because \(\sum _{z',x}\varDelta _\varphi (z',x)=0\), that \(\varDelta _\varphi (z',x)=0\) for all \(z'\) and x, for any \(\varphi \) achieving the minimum. In other words, we showed that there exists a \(\varphi (z'|y)\) such that
$$\begin{aligned} w'(z'|x)=\sum _y\varphi (z'|y)\,w(y|x)\;, \end{aligned}$$(13)
for all \(z',x\), which coincides with the definition of degradability. \(\square \)
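The guessing probability and conditional min-entropy used in the proof have, in the classical case, simple closed forms: \(P_{\mathrm {guess}}(U|Y)=\sum _y\max _u p(u,y)\) and \(H_{\min }(U|Y)=-\log _2 P_{\mathrm {guess}}(U|Y)\). The following minimal sketch computes them and checks the (ii)\(\implies \)(i) direction for a randomly generated degradable pair (random example channels, for illustration only).

```python
# Classical conditional min-entropy via the optimal Bayesian guess:
# P_guess(U|Y) = sum_y max_u p(u, y),  H_min(U|Y) = -log2 P_guess(U|Y).
# Channels are random examples.
import numpy as np

rng = np.random.default_rng(2)

def p_guess(p_uv):
    return float(p_uv.max(axis=0).sum())

def h_min(p_uv):
    return float(-np.log2(p_guess(p_uv)))

def random_channel(n_in, n_out):
    w = rng.random((n_out, n_in))
    return w / w.sum(axis=0)

# A degraded output can only lower the guessing probability,
# i.e. raise H_min -- the data-processing direction of Theorem 1:
w   = random_channel(3, 4)   # w(y|x)
phi = random_channel(4, 3)   # phi(z|y)
w2  = phi @ w                # w'(z|x)

for _ in range(100):
    p_xu = rng.random((3, 5)); p_xu /= p_xu.sum()
    p_uy = (w  @ p_xu).T
    p_uz = (w2 @ p_xu).T
    assert h_min(p_uy) <= h_min(p_uz) + 1e-12
```

The inequality \(\sum _z\max _u\sum _y\varphi (z|y)p(u,y)\le \sum _y\max _u p(u,y)\) behind the assertion is exact; only rounding is absorbed by the tolerance.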
Remark 1
From the proof we see that in point (i) of Theorem 1 it is possible to restrict, without loss of generality, the random variable U to be supported on the set \(\mathscr {Z}\), i.e., the same set supporting the output of \(w'\). \(\square \)
3 The Fundamental Lemma for Quantum Channels
The following lemma plays a crucial role in the derivation of reverse data-processing theorems valid in the quantum case.
Lemma 1
Let \(\varPhi _A:\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi '_A:\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) be two quantum channels. For any set \(\mathscr {U}=\{u\}\), the following are equivalent:
(i) for all ensembles \(\{p(u);\omega ^u_A\}\;\),
$$\begin{aligned} P_{\mathrm {guess}}\{p(u);\varPhi _A(\omega ^u_A)\}\ge P_{\mathrm {guess}}\{p(u);\varPhi '_A(\omega ^u_A)\}\;; \end{aligned}$$(17)
(ii) for any POVM \(\{Q^u_{B'}\}\), there exists a POVM \(\{P^u_B\}\) such that
$$\begin{aligned} {{\text {Tr}}}\!\left[ {\varPhi '_A(\omega _A)\ Q^u_{B'}}\right] ={{\text {Tr}}}\!\left[ {\varPhi _A(\omega _A)\ P^u_B}\right] \;, \end{aligned}$$(18)
for all \(u\in \mathscr {U}\) and all \(\omega _A\in \mathsf {D}(\mathscr {H}_A)\).
Proof
The fact that (ii) implies (i) follows by definition of guessing probability. We therefore prove the converse, namely, that (i) implies (ii).
Let us rewrite condition (17) explicitly as follows: for all ensembles \(\{p(u);\omega ^u_A \}\;\),
$$\begin{aligned} \max _{\{P^u_B\}}\sum _up(u)\,{{\text {Tr}}}\!\left[ {\varPhi _A(\omega ^u_A)\ P^u_B}\right] \ \ge \ \max _{\{Q^u_{B'}\}}\sum _up(u)\,{{\text {Tr}}}\!\left[ {\varPhi '_A(\omega ^u_A)\ Q^u_{B'}}\right] \;, \end{aligned}$$(19)
where the maxima are taken over all possible POVMs. Introduce now an auxiliary Hilbert space \(\mathscr {H}_R\cong \mathscr {H}_{A}\), and denote by \(\phi ^+_{RA}\) a fixed maximally entangled state in \(\mathsf {D}(\mathscr {H}_R\otimes \mathscr {H}_A)\). Construct then the Choi operators corresponding to the channels \(\varPhi \) and \(\varPhi '\), namely,
$$\begin{aligned} \chi _{RB}\triangleq ({\text {id}}_R\otimes \varPhi _A)(\phi ^+_{RA})\;,\qquad \chi '_{RB'}\triangleq ({\text {id}}_R\otimes \varPhi '_A)(\phi ^+_{RA})\;. \end{aligned}$$(20)
Noticing that, for any ensemble \(\{p(u);\omega ^u_A\}\) with \(\sum _up(u)\omega ^u_A= I_A/d_A\), there exists a POVM \(\{E^u_R\}\) such that \(p(u)\omega ^u_A={\text {Tr}}_{R}\!\left[ {\phi ^+_{RA}\ (E^u_R\otimes I_A)}\right] \), we immediately see that, if condition (19) above holds, then, for any POVM \(\{E^u_R\}\),
$$\begin{aligned} \max _{\{P^u_B\}}\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (E^u_R\otimes P^u_B)}\right] \ \ge \ \max _{\{Q^u_{B'}\}}\sum _u{{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (E^u_R\otimes Q^u_{B'})}\right] \;. \end{aligned}$$(21)
We now prove that condition (21) above in turn implies that, for any collection of Hermitian operators \(\{O^u_R \}\),
$$\begin{aligned} \max _{\{P^u_B\}}\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (O^u_R\otimes P^u_B)}\right] \ \ge \ \max _{\{Q^u_{B'}\}}\sum _u{{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (O^u_R\otimes Q^u_{B'})}\right] \;. \end{aligned}$$(22)
The crucial observation here is that, given a collection of Hermitian operators \(\{O^u_R\}\;\), we can always derive from it a POVM \(\{E^u_R\}\) given by
$$\begin{aligned} E^u_R\triangleq \frac{1}{\alpha |\mathscr {U}|}\left( O^u_R+\alpha I_R-|\mathscr {U}|^{-1}\varSigma _R\right) \;, \end{aligned}$$(23)
with \(\varSigma _R\triangleq \sum _uO^u_R\) and \(\alpha >0\) sufficiently large so that \(O^u_R+\alpha I_R-|\mathscr {U}|^{-1}\varSigma _R\) is nonnegative for all u. Therefore, assuming that inequality (21) holds for any POVM \(\{E^u_R \}\), we have that
$$\begin{aligned} \max _{\{P^u_B\}}\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (O^u_R\otimes P^u_B)}\right]&=\alpha |\mathscr {U}|\max _{\{P^u_B\}}\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (E^u_R\otimes P^u_B)}\right] -\alpha +\frac{{{\text {Tr}}}[\varSigma _R]}{|\mathscr {U}|\,d_R}\nonumber \\&\ge \alpha |\mathscr {U}|\max _{\{Q^u_{B'}\}}\sum _u{{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (E^u_R\otimes Q^u_{B'})}\right] -\alpha +\frac{{{\text {Tr}}}[\varSigma _R]}{|\mathscr {U}|\,d_R}\nonumber \\&=\max _{\{Q^u_{B'}\}}\sum _u{{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (O^u_R\otimes Q^u_{B'})}\right] \;, \end{aligned}$$(24)
for any collection of Hermitian operators \(\{O^u_R\}\;\). Inequality (24) above is a consequence of condition (21) together with the identity \({\text {Tr}}_{B}\!\left[ {\chi _{RB}}\right] ={\text {Tr}}_{B'}\!\left[ {\chi '_{RB'}}\right] =I_R/d_R\). Hence we showed that condition (22) holds if condition (21) holds, even though the former looks at first sight more general than the latter. The converse implication holds simply because any POVM is, in particular, a family of Hermitian operators.
Let us now denote by \(\mathscr {L}(\mathscr {U})\) the set of operator tuples
$$\begin{aligned} \mathbf {a}\triangleq (a^u)_{u\in \mathscr {U}}\;,\qquad a^u\in \mathsf {L}(\mathscr {H}_R)\;,\ a^u=(a^u)^\dag \;, \end{aligned}$$(25)
with inner product
$$\begin{aligned} \langle \mathbf {a},\mathbf {b}\rangle \triangleq \sum _u{{\text {Tr}}}\!\left[ {a^u\ b^u}\right] \;. \end{aligned}$$(26)
We then define \(\mathscr {C}(\chi ;\mathscr {U})\) as the convex subset of \(\mathscr {L}(\mathscr {U})\) containing tuples \(\mathbf {b}\) such that \(b^u\triangleq {\text {Tr}}_{B}\!\left[ {\chi _{RB}\ (I_R\otimes P^u_B)}\right] \), for varying POVM \(\{P^u_B\}\). [The fact that \(\mathscr {C}(\chi ;\mathscr {U})\) is convex is a direct consequence of the fact that the set of POVMs supported on \(\mathscr {U}\) is convex.] In the same way, we also define \(\mathscr {C}'(\chi ';\mathscr {U})\). For the sake of simplicity of notation, when no confusion arises, we simply denote \(\mathscr {C}(\chi ;\mathscr {U})\) as \(\mathscr {C}\) and \(\mathscr {C}'(\chi ';\mathscr {U})\) as \(\mathscr {C}'\). Using this notation, condition (22) becomes
$$\begin{aligned} \max _{\mathbf {b}\in \mathscr {C}}\langle \mathbf {a},\mathbf {b}\rangle \ \ge \ \max _{\mathbf {b}'\in \mathscr {C}'}\langle \mathbf {a},\mathbf {b}'\rangle \;, \end{aligned}$$(27)
for all \(\mathbf {a}\in \mathscr {L}(\mathscr {U})\). [Here \(a^u=O^u_R\).]
Hence, we turned the initial conditions involving guessing probabilities into a family of linear constraints on two convex sets, \(\mathscr {C}\) and \(\mathscr {C}'\). Then, a direct application of the separation theorem for convex sets (see Corollary 2 in Appendix 3) leads us to conclude that
$$\begin{aligned} \mathscr {C}'\subseteq \mathscr {C}\;. \end{aligned}$$(28)
In other words, condition (17) in the statement of the lemma implies that, for any POVM \(\{Q^u_{B'}\}\), there exists a POVM \(\{P^u_B\}\) such that
$$\begin{aligned} {\text {Tr}}_{B'}\!\left[ {\chi '_{RB'}\ (I_R\otimes Q^u_{B'})}\right] ={\text {Tr}}_{B}\!\left[ {\chi _{RB}\ (I_R\otimes P^u_B)}\right] \;, \end{aligned}$$(29)
for all \(u\in \mathscr {U}\).
The final step consists in noticing that any state \(\omega _A\) can be written as \({\text {Tr}}_{R}\!\left[ {\phi ^+_{RA}\ (E_R\otimes I_A)}\right] \) for some \(E_R\in \mathsf {L}_+(\mathscr {H}_R)\). Therefore, multiplying both sides of (29) by \(E_R\) and taking the trace, we obtain
$$\begin{aligned} {{\text {Tr}}}\!\left[ {\varPhi '_A(\omega _A)\ Q^u_{B'}}\right] ={{\text {Tr}}}\!\left[ {\varPhi _A(\omega _A)\ P^u_B}\right] \;, \end{aligned}$$(30)
which of course holds for any choice of \(E_R\), that is, \(\omega _A\), as claimed. \(\square \)
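The Choi-operator machinery used above can be made concrete in a few lines of code. The sketch below builds \(\phi ^+_{RA}\) and the Choi operator \(\chi _{RB}=({\text {id}}_R\otimes \varPhi _A)(\phi ^+_{RA})\) for a small example channel (mixing with the maximally mixed state; an arbitrary choice, not taken from the paper), and checks the two facts used in the proof: \({\text {Tr}}_B[\chi _{RB}]=I_R/d_R\), and that a POVM on R recovers an ensemble with maximally mixed average.

```python
# Choi operator chi = (id ⊗ Phi)(phi+) and the ensemble/POVM
# correspondence p(u) w^u_A = Tr_R[phi+ (E^u ⊗ I)].
# The channel Phi is an arbitrary example.
import numpy as np

d = 2

def Phi(rho, lam=0.7):
    """Example channel: rho -> lam*rho + (1-lam)*Tr[rho]*I/d."""
    return lam * rho + (1 - lam) * np.trace(rho) * np.eye(d) / d

# phi+ = (1/d) sum_{ij} |i><j|_R ⊗ |i><j|_A, and its image under id ⊗ Phi
phi_plus = np.zeros((d * d, d * d), dtype=complex)
chi = np.zeros((d * d, d * d), dtype=complex)
for i in range(d):
    for j in range(d):
        eij = np.zeros((d, d)); eij[i, j] = 1.0
        phi_plus += np.kron(eij, eij) / d
        chi += np.kron(eij, Phi(eij)) / d

def tr_R(M):   # partial trace over the first (R) tensor factor
    return M.reshape(d, d, d, d).trace(axis1=0, axis2=2)

def tr_B(M):   # partial trace over the second (B) tensor factor
    return M.reshape(d, d, d, d).trace(axis1=1, axis2=3)

# Tr_B[chi] = I_R / d, the identity used to derive Eq. (24)
assert np.allclose(tr_B(chi), np.eye(d) / d)

# the basis POVM E^u = |u><u| recovers the ensemble {1/d; |u><u|},
# whose average is maximally mixed
avg = sum(tr_R(phi_plus @ np.kron(np.diag(np.eye(d)[u]), np.eye(d)))
          for u in range(d))
assert np.allclose(avg, np.eye(d) / d)
```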
Remark 2
As explained in the paragraph following Eq. (20), the above proof shows that, in particular, the ensembles \(\{p(u);\omega ^u_A \}\) in point (i) can be restricted, without loss of generality, to ensembles with maximally mixed average, i.e., \(\sum _up(u)\omega ^u_A\propto I_A\). \(\square \)
Remark 3
We notice that point (ii) can be alternatively formulated as follows: for any POVM \(\{Q^u_{B'}\}\), there exists a POVM \(\{P^u_B\}\) such that
$$\begin{aligned} \varPhi '^\dagger (Q^u_{B'})=\varPhi ^\dagger (P^u_B)\;, \end{aligned}$$(33)
for all \(u\in \mathscr {U}\), where \(\varPhi ^\dagger \) denotes the trace-dual defined in Eq. (55). \(\square \)
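In the Kraus representation, the trace-dual can be computed explicitly: if \(\varPhi (\cdot )=\sum _kK_k\cdot K_k^\dag \), then \(\varPhi ^\dagger (P)=\sum _kK_k^\dag PK_k\), and \({{\text {Tr}}}[\varPhi (\omega )P]={{\text {Tr}}}[\omega \,\varPhi ^\dagger (P)]\). A minimal numerical sketch (the amplitude-damping-style Kraus pair below is an arbitrary example, not taken from the paper):

```python
# Trace-dual (adjoint) of a channel in Kraus form, as used in Remark 3.
# Kraus operators: an example amplitude-damping-like pair.
import numpy as np

g = 0.3
K = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - g)]]),
     np.array([[0.0, np.sqrt(g)], [0.0, 0.0]])]

def channel(rho):
    """Phi(rho) = sum_k K_k rho K_k†."""
    return sum(k @ rho @ k.conj().T for k in K)

def dual(P):
    """Phi†(P) = sum_k K_k† P K_k."""
    return sum(k.conj().T @ P @ k for k in K)

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
P = np.array([[0.5, 0.1], [0.1, 0.9]])

# defining property of the trace-dual
assert np.isclose(np.trace(channel(rho) @ P), np.trace(rho @ dual(P)))
# Phi trace-preserving  <=>  Phi†(I) = I
assert np.allclose(dual(np.eye(2)), np.eye(2))
```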
3.1 Quantum Statistical Morphisms
Let us now choose the set \(\mathscr {U}\) in Lemma 1 so that its size \(|\mathscr {U}|\) is equal to \((\dim \mathscr {H}_{B'})^2\). Assuming that the channels \(\varPhi \) and \(\varPhi '\) actually satisfy either (17) or (18), let us set the POVM \(\{Q^u_{B'}\}\) to be informationally complete, that is, \({\text {span}}\{Q^u_{B'}\}=\mathsf {L}(\mathscr {H}_{B'})\). Then, if \(\{P^u_B\}\) is any POVM satisfying the equality (33) in Remark 3, the relation
$$\begin{aligned} {{\text {Tr}}}\!\left[ {Q^u_{B'}\ \varGamma (X_B)}\right] ={{\text {Tr}}}\!\left[ {P^u_B\ X_B}\right] \;,\qquad \forall X_B\in \mathsf {L}(\mathscr {H}_B)\;, \end{aligned}$$(34)
can be used to define a linear map \(\varGamma :\mathsf {L}(\mathscr {H}_{B})\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) with the following properties:
1. let \(\{\varXi ^y_{B'}\}\) be the unique dual of \(\{Q^y_{B'}\}\), in the sense that \(X_{B'}=\sum _y{{\text {Tr}}}\!\left[ {Q^y_{B'}\ X_{B'}}\right] \varXi ^y_{B'}\;\), for all \(X_{B'}\in \mathsf {L}(\mathscr {H}_{B'})\;\); then the action of \(\varGamma \) is given by \(\varGamma (\cdot )=\sum _y{{\text {Tr}}}\!\left[ {P^y_{B}\ \cdot }\right] \;\varXi ^y_{B'}\;\);
2. \(\varGamma \) is Hermiticity-preserving, i.e., \(X=X^\dag \) implies that \(\varGamma (X)=\left[ \varGamma (X) \right] ^\dag \;\);
3. \(\varGamma \) is trace-preserving;
4. \(\varPhi '=\varGamma \circ \varPhi \;\).
In particular, the map \(\varGamma \), as defined above, is positive and trace-preserving on the output (meant as the whole linear range) of \(\varPhi \). In order to prove this, let \(X_A\in \mathsf {L}(\mathscr {H}_A)\) be any operator such that \(\varPhi _A(X_A)\ge 0\). (Notice that \(X_A\) need not be positive itself.) Then \(\varGamma _B(\varPhi _A(X_A))\ge 0\). This is because \(\varGamma \circ \varPhi =\varPhi '\) and we know, from Eq. (18), that for any positive operator \(Q_{B'}\) there exists a positive operator \(P_B\) such that \({{\text {Tr}}}\!\left[ {Q_{B'}\ \varGamma _B(\varPhi _A(X_A))}\right] ={{\text {Tr}}}\!\left[ {Q_{B'}\ \varPhi '_A(X_A)}\right] ={{\text {Tr}}}\!\left[ {P_B\ \varPhi _A(X_A)}\right] \). Hence, we know that for any positive operator \(Q_{B'}\), \({{\text {Tr}}}\!\left[ {Q_{B'}\ \varGamma _B(\varPhi _A(X_A))}\right] \ge 0\) whenever \(\varPhi _A(X_A)\ge 0\), which is the definition of positivity.
Following the terminology of [24, 25], the following definition was introduced in [10]:
Definition 2
Given a channel \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\), a linear map \(\varGamma :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_C)\) is said to be a quantum statistical morphism of \(\varPhi \) if and only if, for any state \(\omega _A\) and any POVM \(\{Q^y_C\}\), there exists a POVM \(\{P^y_B\}\) such that
for all y. \(\square \)
It is easy to verify that an everywhere positive trace-preserving linear map is always a statistical morphism of any channel, as long as the composition of the two is well defined. Then, the natural question is whether a linear map defined as \(\varGamma \) above can always be extended to become positive and trace-preserving everywhere, not only on the range of \(\varPhi \). The question was answered in the negative by Matsumoto, who gave an explicit counterexample in Ref. [26].
Vice versa, one may ask whether any linear map that is positive and trace-preserving on the range of \(\varPhi \) is a well-defined statistical morphism of \(\varPhi \). Also in this case, the answer is in the negative: the fact that condition (35) must hold for any POVM (in particular, for any number of outcomes) is strictly stronger than mere positivity, for which it is enough that condition (35) holds for two-outcome POVMs.
Statistical morphisms hence lie somewhere in between linear maps that are positive and trace-preserving (PTP) everywhere, and those that are so only on the range of \(\varPhi \):
$$\begin{aligned} \{\text {PTP everywhere}\}\ \subseteq \ \{\text {statistical morphisms of }\varPhi \}\ \subseteq \ \{\text {PTP on the range of }\varPhi \}\;. \end{aligned}$$(36)
We summarize the contents of this section in one definition and one corollary.
Definition 3
Given are two quantum channels \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\). For a given set \(\mathscr {U}\), we say that \(\varPhi \) is \(\mathscr {U}\)-sufficient for \(\varPhi '\), in formula,
$$\begin{aligned} \varPhi \succeq _\mathscr {U}\varPhi '\;, \end{aligned}$$(37)
if and only if either of the (equivalent) conditions in Lemma 1 holds. \(\square \)
Corollary 1
Given are two quantum channels \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\). The following are equivalent:
(i) \(\varPhi \succeq _\mathscr {U}\varPhi '\), for any set \(\mathscr {U}\;\);
(ii) there exists a quantum statistical morphism \(\varGamma :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) of \(\varPhi \) such that \(\varPhi '=\varGamma \circ \varPhi \;\).
Remark 4
Using the correspondence between ensembles and bipartite states, together with the relation between guessing probability and conditional min-entropy, given in Appendix 1 in Eqs. (54) and (60), we notice that the condition \(\varPhi \succeq _\mathscr {U}\varPhi '\) can be equivalently written as
$$\begin{aligned} H_{\min }(U|B)\le H_{\min }(U|B')\;, \end{aligned}$$(38)
where the entropies are computed with respect to states \(({\text {id}}_U\otimes \varPhi _A)(\omega _{UA})\) and \(({\text {id}}_U\otimes \varPhi '_A)(\omega _{UA})\), respectively. This is equivalent to the formulation used in Theorem 1. \(\square \)
4 A Semiclassical (Semiquantum) Reverse-Data Processing Theorem
We consider in this section the case in which the output of a quantum channel is classical, in the precise sense that the range is supported on a commutative subalgebra.
Theorem 2
Given are two quantum channels \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\). Assuming that the output of \(\varPhi '\) is classical, i.e.,
$$\begin{aligned} \varPhi '_A(\omega _A)=\sum _{u}\left\langle u\right| \varPhi '_A(\omega _A)\left| u\right\rangle \ \left| u\right\rangle \!\left\langle u\right| \;,\qquad \forall \omega _A\in \mathsf {D}(\mathscr {H}_A)\;, \end{aligned}$$(39)
for some fixed orthonormal basis \(\{\left| u\right\rangle \}\) of \(\mathscr {H}_{B'}\),
the following are equivalent:
(i) \(\varPhi \succeq _\mathscr {U}\varPhi '\), for any set \(\mathscr {U}\;\);
(ii) \(\varPhi \succeq _\mathscr {U}\varPhi '\), for a set \(\mathscr {U}\) such that \(|\mathscr {U}|=\dim \mathscr {H}_{B'}\;\);
(iii) there exists a quantum channel \(\varPsi :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) such that \(\varPhi '=\varPsi \circ \varPhi \;\).
Proof
Since the implications (iii)\(\implies \)(i)\(\implies \)(ii) are either trivial or a direct consequence of the data-processing inequality for the guessing probability, we only prove the implication (ii)\(\implies \)(iii).
Since \(|\mathscr {U}|=\dim \mathscr {H}_{B'}\), we can use the elements \(u\in \mathscr {U}\) to label an orthonormal basis \(\{\left| u\right\rangle :u\in \mathscr {U}\}\) of \(\mathscr {H}_{B'}\). Assuming (ii), we know from Lemma 1 that (18) holds, so, in particular, by choosing \(Q^u_{B'}=\left| u\right\rangle \!\left\langle u\right| \), we know that there exists a POVM \(\{P^u_B\}\) such that
$$\begin{aligned} \left\langle u\right| \varPhi '_A(\omega _A)\left| u\right\rangle ={{\text {Tr}}}\!\left[ {\varPhi _A(\omega _A)\ P^u_B}\right] \;, \end{aligned}$$(40)
for all u and all \(\omega _A\in \mathsf {D}(\mathscr {H}_A)\).
We now use the fact that the output of \(\varPhi '\) is classical, so that any operator in the range of \(\varPhi '\) can be diagonalized on the basis \(\{\left| u\right\rangle \}\). This means that
$$\begin{aligned} \varPhi '_A(\omega _A)=\sum _u\left\langle u\right| \varPhi '_A(\omega _A)\left| u\right\rangle \ \left| u\right\rangle \!\left\langle u\right| \;. \end{aligned}$$(41)
Using Eq. (40), and defining a measure-and-prepare channel \(\varPsi :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) by the relation
$$\begin{aligned} \varPsi (X_B)\triangleq \sum _u{{\text {Tr}}}\!\left[ {P^u_B\ X_B}\right] \ \left| u\right\rangle \!\left\langle u\right| \;, \end{aligned}$$(42)
we finally have that \(\varPhi '=\varPsi \circ \varPhi \). \(\square \)
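The measure-and-prepare channel \(\varPsi \) constructed in the proof is simple to realize numerically. The sketch below implements \(\varPsi (X)=\sum _u{{\text {Tr}}}[P^u X]\left| u\right\rangle \!\left\langle u\right| \) for an arbitrary example POVM (the POVM and the input state are illustrative assumptions, not taken from the paper) and checks that its output is classical (diagonal) and of unit trace.

```python
# Measure-and-prepare channel Psi(X) = sum_u Tr[P^u X] |u><u|.
# The two-outcome POVM and the input state are arbitrary examples.
import numpy as np

P = [np.array([[0.8, 0.0], [0.0, 0.3]]),     # P^0
     np.array([[0.2, 0.0], [0.0, 0.7]])]     # P^1; P^0 + P^1 = I
basis = [np.diag([1.0, 0.0]),                # |0><0|
         np.diag([0.0, 1.0])]                # |1><1|

def Psi(X):
    return sum(np.trace(Pu @ X) * ket for Pu, ket in zip(P, basis))

rho = np.array([[0.6, 0.1], [0.1, 0.4]])
out = Psi(rho)

# the output is supported on a commutative (diagonal) subalgebra
assert np.allclose(out, np.diag(np.diag(out)))
# and Psi is trace-preserving because the P^u sum to the identity
assert np.isclose(np.trace(out), 1.0)
```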
Remark 5
In order to highlight the perfect analogy with Theorem 1, we recall that the relation between guessing probability and conditional min-entropy (see Appendix 1) allows us to rewrite points (i) and (ii) of Theorem 2 as:
See also Remark 4 above. \(\square \)
Remark 6
It is possible to show that Theorem 1 becomes a corollary of Theorem 2. Consider in fact the situation in which both \(\varPhi \) and \(\varPhi '\) are classical-quantum channels, namely, \(\varPhi :\mathscr {X}\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathscr {X}\rightarrow \mathsf {L}(\mathscr {H}_{B'})\), with \(\varPhi (x)\triangleq \rho ^x_B\in \mathsf {D}(\mathscr {H}_B)\) and \(\varPhi '(x)\triangleq \sigma ^x_{B'}\in \mathsf {D}(\mathscr {H}_{B'})\). Assume moreover that \([\rho ^x,\rho ^{x'}]=0\) and \([\sigma ^x,\sigma ^{x'}]=0\), for all \(x,x'\in \mathscr {X}\). We are hence in a scenario much more restricted than that of Theorem 2: in fact, by identifying commuting states with the probability distributions of their eigenvalues, we recover the classical framework and the statement of Theorem 1. \(\square \)
Remark 7
Theorem 1, the classical reverse data-processing inequality, has thus two different proofs: one using the minimax theorem and another using the separation theorem for convex sets. Despite the fact that minimax theorem and separation theorem are ultimately equivalent [27], the minimax theorem allows for an easier treatment of the approximate case, which is a very relevant point but goes beyond the scope of the present contribution. The interested reader may refer to Refs. [15, 28]. \(\square \)
5 A Fully Quantum Reverse Data-Processing Theorem
We consider in this section the case of two completely general quantum channels, with the only restriction that the input space is the same for both.
Theorem 3
Given are two quantum channels, \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\), and an auxiliary Hilbert space \(\mathscr {H}_{B''}\cong \mathscr {H}_{B'}\). The following are equivalent:
(i) \({\text {id}}_{B''}\otimes \varPhi _A\succeq _\mathscr {U}{\text {id}}_{B''}\otimes \varPhi '_A\), for any set \(\mathscr {U}\;\);
(ii) \({\text {id}}_{B''}\otimes \varPhi _A\succeq _\mathscr {U}{\text {id}}_{B''}\otimes \varPhi '_A\), for a set \(\mathscr {U}\) such that \(|\mathscr {U}|=\dim (\mathscr {H}_{B''}\otimes \mathscr {H}_{B'})=(\dim \mathscr {H}_{B'})^2\;\);
(iii) there exists a quantum channel \(\varPsi :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) such that \(\varPhi '=\varPsi \circ \varPhi \;\).
Remark 8
In terms of the conditional min-entropy, points (i) and (ii) above can be written as
with obvious meaning of symbols. See also Remarks 4 and 5 above. \(\square \)
Proof
Since the implications (iii)\(\implies \)(i) and (i)\(\implies \)(ii) are straightforward, we prove here only that (ii)\(\implies \)(iii).
Let \(\mathscr {H}_{B'''}\) be a further auxiliary Hilbert space such that \(\mathscr {H}_{B'''}\cong \mathscr {H}_{B''}\cong \mathscr {H}_{B'}\). We begin by showing that, if \({\text {id}}_{B''}\otimes \varPhi _A\succeq _\mathscr {U}{\text {id}}_{B''}\otimes \varPhi '_A\), then, for any POVM \(\{Q^u_{B''B'}\}\;\), there exists a POVM \(\{P^u_{B''B}\}\) such that
where \(\phi ^+_{B'''B''}\) is a maximally entangled state in \(\mathscr {H}_{B'''}\otimes \mathscr {H}_{B''}\). In fact, Lemma 1 states that, for any POVM \(\{Q^u_{B''B'}\}\), there exists a POVM \(\{P^u_{B''B}\}\) such that
for all \(u\in \mathscr {U}\). In particular, for any family of states \(\{\xi ^x_{B''} \}_x\) on \(\mathscr {H}_{B''}\), we have
for all u and all x. Let us choose \(\xi ^x_{B''}={\text {Tr}}_{B'''}\!\left[ {\phi ^+_{B'''B''}\ (\varXi ^x_{B'''}\otimes I_{B''})}\right] \) for some complete set of positive operators \(\{\varXi ^x_{B'''} \}_x\). Hence Eq. (47) becomes
for all u and all x. But since the family \(\{\varXi ^x_{B'''}\}_x\) has been chosen to be complete, the above equality implies the equality of the operators in Eq. (45).
Now, we can use generalized teleportation and show that
where \(\{\beta ^u_{B''B'''}:u\in \mathscr {U} \}\) are the \((\dim \mathscr {H}_{B'})^2\) projectors onto the Bell states, and \(\{W^u_{B'''}:u\in \mathscr {U}\}\) are suitable isometries from \(\mathscr {H}_{B'''}\) to \(\mathscr {H}_{B'}\). But then, using Eq. (45) with \(Q^u_{B''B'}=\beta ^u_{B''B'}\), we obtain
Hence, defining a quantum channel \(\varPsi :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) as
we finally have that \(\varPhi '=\varPsi \circ \varPhi \), as claimed. \(\square \)
Remark 9
Theorem 3 also holds if the identity channel \({\text {id}}_{B''}\) is replaced by a complete channel, namely, a channel \(\Upsilon :\mathsf {L}(\mathscr {H}_{B''})\rightarrow \mathsf {L}(\mathscr {H}_{B''})\) that is bijective as a linear map: linearly independent inputs are transformed into linearly independent outputs. This is so because linearly independent states \(\xi ^x_{B''}\) in Eq. (47) remain linearly independent after the action of \(\Upsilon \), so that the proof proceeds along the same lines.
We notice, in particular, that a channel can be complete despite being entanglement breaking or measure-and-prepare. This implies that the ensembles used to probe channels \({\text {id}}_{B''}\otimes \varPhi _A\) and \({\text {id}}_{B''}\otimes \varPhi '_A\) can always be chosen, without loss of generality, to comprise separable states only. \(\square \)
6 The Computational Second Law: An Analogy
The aim of this section is to construct an analogy, clarifying and somewhat strengthening the one given by Cover [1], between data-processing theorems and the second law of thermodynamics. In what follows we abandon a formally rigorous language, preferring instead a more informal one, better suited to highlight the similarities and differences between thermodynamics and information theory.
Theorems 1, 2, and 3, apart from the formal complications necessary to describe classical and quantum systems together, all have the same simple interpretation, which we summarize in two statements (A) and (B):
and
The direct statement corresponds to Cover’s law, as formulated in [1] (see the quotation at the beginning of this paper). Here “useful information” is precisely the information that the signal carries about the message, and it is measured by the conditional min-entropy, which is directly related to the guessing probability. The reverse statement, which is a consequence of the reverse data-processing theorems that we proved, corresponds to Lieb’s and Yngvason’s entropy principle [7].
In order to make our discussion more concrete, let us consider a thermodynamical system prepared at time \(t_0\) and evolving through successive times \(t_1\ge t_0\) and \(t_2\ge t_1\), as depicted in Fig. 2. The second law of thermodynamics, in the formulation usually attributed to Clausius, states that the following inequality is necessarily obeyed: \(\varDelta H\ge \varDelta Q/T\), with T the temperature at which heat is exchanged,
where \(\varDelta H=H(S_2)-H(S_1)\) is the change in the thermodynamical entropy of the system and \(\varDelta Q\) is the heat absorbed by the system.Footnote 5 The above inequality basically says that the only way to decrease the entropy of a system is to extract heat from it. This implies that, if a system is adiabatically isolated (i.e., no heat is exchanged, only mechanical work), then its entropy cannot decrease. Equivalently stated: a decrease in entropy is a definite witness of the fact that the system is not adiabatically isolated and is dumping heat into the environment.
This part of the second law can be seen as the analogue of statement (A) above, that is, the usual data-processing inequality. Suppose now that the system S is an information signal. As before, it is prepared at time \(t_0\) and then undergoes a process that is information-theoretic, rather than thermodynamical. If we observe the signal at two times \(t_1\ge t_0\) and \(t_2\ge t_1\), then, provided the process is Markov local, the data-processing inequality holds: the information carried by the signal cannot increase from \(t_1\) to \(t_2\). Therefore, any increase in the information carried by the signal is a definite witness of the fact that the process is not Markov local, namely, that an external memory was used as a side resource at some point along the process.
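Cover’s computational second law, in its simplest classical form, can be checked numerically. The sketch below (plain Python; the stochastic matrix and the two initial distributions are arbitrary toy choices, not taken from the text) verifies that the relative entropy between two distributions does not increase under one step of a Markov chain:

```python
import math

def rel_entropy(p, q):
    # Kullback-Leibler divergence D(p || q), in bits
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def step(p, T):
    # one step of the chain: (pT)(y) = sum_x p(x) T(y|x), with T[x][y] = T(y|x)
    return [sum(p[x] * T[x][y] for x in range(len(p))) for y in range(len(T[0]))]

# arbitrary row-stochastic matrix and two initial distributions (toy data)
T = [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
p = [0.7, 0.2, 0.1]
q = [0.3, 0.3, 0.4]

# Cover's computational second law: D(pT || qT) <= D(p || q)
assert rel_entropy(step(p, T), step(q, T)) <= rel_entropy(p, q)
```

Note that, in contrast, the Shannon entropy of p alone may either increase or decrease along the chain, which is exactly Cover’s point.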
We now come to the reverse statement (B), arguing that it is the analogue of Lieb’s and Yngvason’s entropy principle. The latter states that, assuming the validity of a set of axioms about simple thermodynamical systems,Footnote 6 a non-decreasing entropy between \(t_1\) and \(t_2\) is not only necessary (Clausius’ principle) but also sufficient for the existence of an adiabatic process between the two times. It is clear that the analogy works in this case too: the reverse data-processing theorems we proved constitute the information-theoretic analogue of Lieb’s and Yngvason’s entropy principle. An overview of the analogy is summarized in the table below.
Despite the tantalizing analogies, there are however two points (at least) that we should keep in mind before jumping to rash conclusions. The first one is that, while in thermodynamics a process is usually given by an initial state and a final state, in information theory a process is a channel, which acts upon any input it receives.
The second point is that the relation presented here between adiabaticity and Markov locality (or memorylessness) has been discussed only on a formal level, and no claim has been made about any quantitative relation between the two concepts. However, we would like to conclude this paper by speculating about the possibility of a deeper relation between adiabaticity and Markov locality, going beyond the formal analogy presented above. An adiabatically isolated system cannot exchange heat, but it can interact with a mechanical device and exchange work with it. Since it is possible to imagine a purely mechanical memory (at least of finite size), it seems that the presence of a memory, in itself, should not violate adiabaticity. But then a scenario similar to that of Maxwell’s demon immediately comes to mind. Indeed, Maxwell’s demon violates the second law using nothing but its memory: its actions, including the measurements it performs, are assumed to be otherwise perfectly adiabatic.Footnote 7 Hence, it seems that adiabaticity does not sit well with the presence of an external memory, even if the latter is taken to be perfectly mechanical. This suggests that adiabaticity and Markov locality may be even closer than what the analogies in Table 1 prima facie seem to suggest. This and other questions are left open for future investigations.
Notes
- 1.
Since the entropy of a distribution p is the negative of the relative entropy of p with respect to the uniform distribution, it is clear that Cover’s computational second law formally constitutes a relaxation of the second law of thermodynamics. Indeed, the former is satisfied in situations violating the latter. We will say more about the relation between thermodynamical and computational second laws in Sect. 6.
- 2.
- 3.
More on this point can be found in Sect. 6.
- 4.
In this work, memoryless process, Markov local process, and computationally isolated process are all synonyms. We prefer however to maintain all three terms because they in fact highlight different aspects of the same information-theoretic concept.
- 5.
In the precise sense that \(\varDelta Q\) is positive if heat is injected into the system and negative if heat is extracted from the system; see, e.g., Ref. [29].
- 6.
The most important and debated of which is the comparability hypothesis: the interested reader may refer to Uffink [30].
- 7.
Thus the demon can be imagined as a “perfect clockwork.”
References
1. Cover, T.M.: Which processes satisfy the second law? In: Halliwell, J.J., Pérez-Mercader, J., Zurek, W.H. (eds.) Physical Origins of Time Asymmetry, pp. 98–107. Cambridge University Press (1996)
2. Csiszár, I., Körner, J.: Information Theory, 2nd edn. Cambridge University Press (2011)
3. Cover, T.M., Thomas, J.A.: Elements of Information Theory, 2nd edn. John Wiley & Sons (2006)
4. Merhav, N.: Physics of the Shannon limits. IEEE Trans. Inf. Theory 56(9), 4274–4285 (2010)
5. Merhav, N.: Data processing theorems and the second law of thermodynamics. IEEE Trans. Inf. Theory 57(8), 4926–4939 (2011)
6. Merhav, N.: Statistical physics and information theory. Found. Trends Commun. Inf. Theory 6(1–2), 1–212 (2009)
7. Lieb, E.H., Yngvason, J.: The physics and mathematics of the second law of thermodynamics. Phys. Rep. 310, 1–96 (1999)
8. Shannon, C.E.: A note on a partial ordering for communication channels. Inf. Control 1, 390–397 (1958)
9. Körner, J., Marton, K.: Comparison of two noisy channels. In: Topics in Information Theory, pp. 411–423 (1977)
10. Buscemi, F.: Comparison of quantum statistical models: equivalent conditions for sufficiency. Commun. Math. Phys. 310, 625–647 (2012)
11. Buscemi, F.: All entangled quantum states are nonlocal. Phys. Rev. Lett. 108, 200401 (2012)
12. Buscemi, F., Datta, N., Strelchuk, S.: Game-theoretic characterization of antidegradable channels. J. Math. Phys. 55, 092202 (2014)
13. Buscemi, F.: Complete positivity, Markovianity, and the quantum data-processing inequality, in the presence of initial system-environment correlations. Phys. Rev. Lett. 113, 140502 (2014)
14. Buscemi, F., Datta, N.: Equivalence between divisibility and monotonic decrease of information in classical and quantum stochastic processes. Phys. Rev. A 93, 012101 (2016)
15. Buscemi, F.: Degradable channels, less noisy channels, and quantum statistical morphisms: an equivalence relation. Probl. Inf. Transm., to appear. arXiv:1511.08893 [quant-ph]
16. Shmaya, E.: Comparison of information structures and completely positive maps. J. Phys. A: Math. Gen. 38, 9717 (2005)
17. Chefles, A.: The quantum Blackwell theorem and minimum error state discrimination. arXiv:0907.0866 [quant-ph] (2009)
18. Blackwell, D.: Equivalent comparisons of experiments. Ann. Math. Stat. 24(2), 265–272 (1953)
19. Torgersen, E.: Comparison of Statistical Experiments. Encyclopedia of Mathematics and its Applications. Cambridge University Press (1991)
20. Cohen, J.E., Kemperman, J.H.B., Zbăganu, G.: Comparisons of Stochastic Matrices, with Applications in Information Theory, Statistics, Economics, and Population Sciences. Birkhäuser (1998)
21. Liese, F., Miescke, K.-J.: Statistical Decision Theory. Springer, New York (2008)
22. Raginsky, M.: Shannon meets Blackwell and Le Cam: channels, codes, and statistical experiments. In: Proc. 2011 IEEE Int. Symp. on Information Theory, pp. 1220–1224 (2011)
23. El Gamal, A.A.: Broadcast channels with and without feedback. In: Conf. Record, 11th Asilomar Conf. on Circuits, Systems and Computers, pp. 180–183 (1977)
24. Morse, N., Sacksteder, R.: Statistical isomorphism. Ann. Math. Stat. 37, 203–214 (1966)
25. Čencov, N.N.: Statistical Decision Rules and Optimal Inference. American Mathematical Society (1982)
26. Matsumoto, K.: An example of a quantum statistical model which cannot be mapped to a less informative one by any trace preserving positive map. arXiv:1409.5658 [quant-ph, stat] (2014)
27. Frenk, J.B.G., Kassay, G., Kolumbán, J.: On equivalent results in minimax theory. Eur. J. Oper. Res. 157, 46–58 (2004)
28. Jenčová, A.: Comparison of quantum channels and statistical experiments. In: Proc. ISIT 2016. arXiv:1512.07016 [quant-ph]
29. Borgnakke, C., Sonntag, R.E.: Fundamentals of Thermodynamics. John Wiley & Sons (2009)
30. Uffink, J.: Bluff your way in the second law of thermodynamics. Stud. Hist. Philos. Mod. Phys. 32, 305–394 (2001)
31. Wilde, M.M.: Quantum Information Theory. Cambridge University Press (2013). Also available as arXiv:1106.1445 [quant-ph]
32. Tomamichel, M.: Quantum Information Processing with Finite Resources. SpringerBriefs in Mathematical Physics, vol. 5. Springer (2016)
33. Rockafellar, R.T.: Convex Analysis. Princeton University Press (1970)
Acknowledgements
It is a pleasure to thank, in alphabetical order, Ettore Bernardi, Jeremy Butterfield, Weien Chen, Giulio Chiribella, Giacomo Mauro D’Ariano, Nilanjana Datta, Gábor Hofer-Szabó, Koji Maruyama, Keiji Matsumoto, Milán Mosonyi, Masanao Ozawa, Veiko Palge, Paolo Perinotti, David Reeb, and Mark Wilde, whose comments helped in shaping the present ideas at various stages during the past few years. This work was supported in part by JSPS KAKENHI, grants no. 26247016 and no. 17K17796. Support from the program for FRIAS-Nagoya IAR Joint Project Group is also acknowledged.
Appendices
Appendix 1: Definitions and Notations
Here we review some basic notions and clarify the notation that is used in the paper. The reader familiar with the standard toolbox used in quantum information theory (see, e.g., Ref. [31]) can safely skip to the next section.
All sets and spaces considered here are finite or finite dimensional. We denote sets as \(\mathscr {X},\mathscr {Y},\mathscr {Z},\mathscr {U},\dots \) and their elements as \(x,y,z,u,\dots \;\). Sets support probability distributions, for example, p(x). When we speak of a random variable, for example, X, we mean that it is supported by the set \(\mathscr {X}\), in the sense that its states are labeled by \(x\in \mathscr {X}\) and that each state occurs with probability \(p(x)={\text {Pr}}\{X=x\}\). When a pair (or a triple, etc.) of random variables is considered, we write (X, Y) to mean a bipartite random variable supported on the Cartesian product \(\mathscr {X}\times \mathscr {Y}=\{(x,y):x\in \mathscr {X},y\in \mathscr {Y}\}\) and distributed with joint probability p(x, y). Classical noisy channels are represented by conditional input–output probability distributions w(y|x): in this case we understand that the channel w has input alphabet \(\mathscr {X}\) and output alphabet \(\mathscr {Y}\), and we write \(w:\mathscr {X}\rightarrow \mathscr {Y}\).
Quantum systems are labeled by \(A,B,C,\dots \) and their corresponding finite dimensional Hilbert spaces are denoted as \(\mathscr {H}_A,\mathscr {H}_B,\mathscr {H}_C,\dots \;\). The set of linear operators on a Hilbert space \(\mathscr {H}\) is denoted as \(\mathsf {L}(\mathscr {H})\), the set of Hermitian operators as \(\mathsf {L}_H(\mathscr {H})\), the set of positive semidefinite operators as \(\mathsf {L}_+(\mathscr {H})\), and the set of density operators (or states), i.e., positive semidefinite with unit trace, as \(\mathsf {D}(\mathscr {H})\). Vectors are denoted as kets \(\left| \phi \right\rangle \), while if we write \(\phi \) we mean the corresponding state, that is, the projector \(\left| \phi \right\rangle \left\langle \phi \right| \). Given an orthonormal basis \(\{\left| x\right\rangle :x\in \mathscr {X}\}\) for a Hilbert space, we sometimes call the set of orthogonal projectors \(\left| x\right\rangle \left\langle x\right| \) “flags,” since these can be used to model a classical random variable with distinguishable states. For example, given a random variable X with states \(x\in \mathscr {X}\) and distribution p(x), we will often think of it as “embedded” in a Hilbert space \(\mathscr {H}_X\), with \(\dim \mathscr {H}_X=|\mathscr {X}|\), and described by the state \(\sum _xp(x)\left| x\right\rangle \left\langle x\right| \). This is a convention commonly used in quantum information theory, as it significantly simplifies the analysis of hybrid classical–quantum scenarios.
A family \(\{P^x_A:x\in \mathscr {X}\}\) of operators \(P^x_A\in \mathsf {L}_+(\mathscr {H}_A)\) such that \(\sum _xP^x_A=I_A\) is called a POVM on \(\mathscr {H}_A\). An ensemble is specified by a set \(\mathscr {X}\), a probability distribution p(x), and a family of states \(\rho ^x_A\in \mathsf {D}(\mathscr {H}_A)\); we denote it for brevity as \(\{p(x);\rho ^x_A \}\), where the set \(\mathscr {X}\) is usually understood from the context. Extending the idea, mentioned in the preceding paragraph, of embedding classical random variables in orthogonal states of a suitable Hilbert space, it is also common to interpret an ensemble as a bipartite state as follows:
A linear map \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) is said to be a quantum channel if and only if it is completely positive and trace-preserving. Given a linear map \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\), its trace-dual \(\varPhi ^\dag :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_A)\) is the linear map defined by the relation
for all \(X\in \mathsf {L}(\mathscr {H}_A)\) and all \(Y\in \mathsf {L}(\mathscr {H}_B)\). \(\varPhi \) is a channel if and only if \(\varPhi ^\dag \) is completely positive and unit-preserving, i.e., \(\varPhi ^\dagger _B(I_B)=I_A\).
Given a pair of random variables (X, U), the guessing probability of U given X is
where the optimization is done over all channels (decoding strategies) \(\varphi :\mathscr {X}\rightarrow \mathscr {U}\). In other words, it is the probability of correctly guessing U using the ideal observer decoding strategy on X.
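For finite alphabets the optimal decoding strategy is the maximum a posteriori (MAP) decoder, so the guessing probability reduces to a sum of maxima of the joint distribution. A minimal sketch (the joint distribution is an arbitrary toy example, not taken from the text):

```python
# joint distribution p(u, x) over {0,1} x {0,1}, an arbitrary toy example
p = {(0, 0): 0.4, (0, 1): 0.1,
     (1, 0): 0.1, (1, 1): 0.4}

def p_guess(joint):
    # optimal decoding is MAP: for each observed x, guess argmax_u p(u, x)
    us = {u for (u, _) in joint}
    xs = {x for (_, x) in joint}
    return sum(max(joint.get((u, x), 0.0) for u in us) for x in xs)

assert abs(p_guess(p) - 0.8) < 1e-12  # MAP guesses u = x, correct w.p. 0.4 + 0.4
```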
The quantum analogue of this is the problem of correctly guessing U given an ensemble of quantum states \(\{p(u);\rho ^u_A \}\;\). In this case, the role of datum X is played by the quantum system A and the guessing probability is
where the optimization is done over all POVMs \(\{P^u_A:u\in \mathscr {U}\}\). Notice that in this paper we only consider the case of guessing a classical random variable given a quantum system, so in the expression \(P_\mathrm{{guess} } (U|A)\) the roles of U (random variable) and A (quantum system) should always be clearly understandable from the context.
1.1 Entropies
The letter H is used to denote the entropy. More precisely, in the case of classical random variables \(H(X)\triangleq -\sum _xp(x)\log _2p(x)\); in the case of a quantum state \(\rho _A\), \(H(A)\triangleq -\sum _i\lambda _i\log _2\lambda _i\), where the \(\lambda \)’s are the eigenvalues of \(\rho _A\). Following common terminology, the entropy of a classical variable is called the Shannon entropy, while the entropy of a state is called the von Neumann entropy.
Given a pair of random variables (X, Y), the conditional entropy is \(H(X|Y)=H(XY)-H(Y)\) and the mutual information is \(I(X;Y)=H(X)+H(Y)-H(XY)=H(X)-H(X|Y)\). Given a bipartite state \(\rho _{AB}\in \mathsf {D}(\mathscr {H}_A\otimes \mathscr {H}_B)\), all the definitions are extended by analogy, for example, \(H(A|B)=H(AB)-H(B)\), where H(AB) is the von Neumann entropy of \(\rho _{AB}\) and H(B) is the von Neumann entropy of the reduced state \(\rho _B={\text {Tr}}_{A}\!\left[ {\rho _{AB}}\right] \).
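The definitions of conditional entropy and mutual information above can be evaluated directly from a joint distribution. A short sketch (the joint distribution is an arbitrary toy choice):

```python
import math

def H(dist):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in dist if p > 0)

# an arbitrary joint distribution p(x, y) (toy data)
pxy = [[0.25, 0.25],
       [0.00, 0.50]]
pX = [sum(row) for row in pxy]
pY = [sum(pxy[x][y] for x in range(2)) for y in range(2)]

H_XY = H([p for row in pxy for p in row])
H_cond = H_XY - H(pY)              # H(X|Y) = H(XY) - H(Y)
I = H(pX) + H(pY) - H_XY           # I(X;Y)

assert abs(H_XY - 1.5) < 1e-12
assert I >= 0 and H_cond <= H(pX)  # conditioning cannot increase entropy
```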
The von Neumann and Shannon entropies are not the only entropies relevant in information theory. Lately, in particular, alternative entropies have been found to play a central role in various information-theoretic scenarios. Such entropies, whose classification is beyond the scope of this work, include, for example, the Rényi entropies and, in particular, the min- and max-entropies; see, e.g., Ref. [32]. The one relevant for this work is the so-called conditional min-entropy, which is given by
in the case of two classical random variables, and
in the case of an ensemble of quantum states. In fact, \(H_\mathrm{{min} } (U|A)\) is the conditional min-entropy of the classical-quantum state \(\rho _{XA}\) defined in Eq. (54).
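In the classical case the conditional min-entropy is simply \(-\log _2P_\mathrm{{guess}}\), and statement (A) says it cannot decrease under local processing of the conditioning variable. A sketch with arbitrary toy data (not from the text):

```python
import math

# joint p(u, x), an arbitrary toy example
p = {(0, 0): 0.4, (0, 1): 0.1,
     (1, 0): 0.1, (1, 1): 0.4}

def p_guess(joint):
    # MAP decoding: for each x, guess argmax_u p(u, x)
    us = {u for (u, _) in joint}
    xs = {x for (_, x) in joint}
    return sum(max(joint.get((u, x), 0.0) for u in us) for x in xs)

h_min = -math.log2(p_guess(p))   # H_min(U|X), about 0.32 bits here

# a deterministic channel that merges both values of x into one symbol
degraded = {}
for (u, x), v in p.items():
    degraded[(u, 0)] = degraded.get((u, 0), 0.0) + v

# data processing: H_min cannot decrease when X is processed locally
assert -math.log2(p_guess(degraded)) >= h_min
```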
Appendix 2: The Minimax Theorem
Here we state the form of the minimax theorem needed in the proof of Theorem 1; see, e.g., Lemma 4.13 in Ref. [21]:
Theorem 4
Let \(\mathscr {S}\subset \mathbb {R}^s\) be a closed convex set and \(\mathscr {L}\subset \mathbb {R}^d\) be a polytope. If \(f:\mathscr {S}\times \mathscr {L}\rightarrow \mathbb {R}\) is continuous and satisfies
for all \(\alpha \in [0, 1]\), \(y,y_1,y_2\in \mathscr {S}\), and \(z,z_1,z_2\in \mathscr {L}\), then
In proving Theorem 1 we specialize the above statement to the case in which \(\mathscr {S}\) is the set of classical channels \(\varphi :\mathscr {Y}\rightarrow \mathscr {Z}\) (indeed convex and closed) and \(\mathscr {L}\) is the set of joint probability distributions on \(\mathscr {X}\times \mathscr {Z}\) (indeed a polytope). The last thing to check is that conditions (61) and (62) hold: this is a consequence of the fact that the function considered is in fact linear in each of its two variables.
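The bilinear specialization of the minimax theorem is the familiar value of a finite zero-sum game. As a toy illustration (the payoff matrix is an arbitrary choice that happens to have a saddle point, so the equality can be checked over pure strategies alone; in general, mixed strategies are needed):

```python
# payoff matrix A[i][j]: the row player maximizes, the column player minimizes
A = [[3, 1],
     [2, 2]]

maximin = max(min(row) for row in A)               # best guaranteed row payoff
minimax = min(max(A[i][j] for i in range(len(A)))  # best column concession
              for j in range(len(A[0])))

# a saddle point exists here, so the two optimization orders agree
assert maximin == minimax == 2
```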
Appendix 3: The Separation Theorem
Here we give an elementary geometrical proof of the Hahn–Banach separation theorem in its simplest case, i.e., where the sets considered are closed and bounded. For a more general treatment, the interested reader may refer to, e.g., Ref. [33].
Theorem 5
Let \(C\subset \mathbb {R}^n\) be a closed and bounded convex set, and let \(y\in \mathbb {R}^n\) be a vector that does not belong to C, i.e., \(y\notin C\). Then there exist a vector \(k\in \mathbb {R}^n\) and a constant \(\alpha \in \mathbb {R}\) such that \(k\cdot x<\alpha <k\cdot y\), for all \(x\in C\). We say that the hyperplane \(\mathscr {L}:=\{z\in \mathbb {R}^n:z\cdot k=\alpha \}\) separates C and y strictly.
Proof
Let \(x_0\in C\) be a point such that
Its existence is guaranteed by the Weierstrass extreme value theorem. The strict inequality comes from the fact that \(y\notin C\), by assumption.
Let us now define
and
We note now that
and that
Now, let us consider any \(x\in C\). By convexity, \((1-p)x_0+px\in C\), for any \(p\in [0, 1]\). Then, we have that
where we used the formula \(\left| \left| {w_0+w_1}\right| \right| ^2= \left| \left| {w_0}\right| \right| ^2+\left| \left| {w_1}\right| \right| ^2+2w_0\cdot w_1\), valid for all \(w_0,w_1\in \mathbb {R}^n\). Therefore,
Let us now consider the case \(p\ne 0\). Then,
and, taking the limit for \(p\rightarrow 0\), we finally obtain
which implies that \(k\cdot x\le k\cdot x_0<\alpha <k\cdot y\), for any \(x\in C\), as claimed. \(\square \)
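The proof is constructive: the separating direction is \(k=y-x_0\), with \(x_0\) the point of C closest to y. A numerical sketch under assumed toy data (a unit square and an outside point, not from the text):

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# C = convex hull of the unit-square vertices; y lies outside C
verts = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = (2.0, 2.0)
x0 = (1.0, 1.0)  # the point of C closest to y (clear by inspection here)

k = tuple(yi - xi for yi, xi in zip(y, x0))
alpha = (dot(k, x0) + dot(k, y)) / 2  # any value strictly in between works

# the hyperplane {z : k.z = alpha} strictly separates C from y; for a
# polytope it suffices to test the vertices, where max of k.z is attained
assert all(dot(k, v) < alpha for v in verts)
assert alpha < dot(k, y)
```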
For our purpose the following reformulation of Theorem 5 is particularly useful:
Corollary 2
Let \(C_1\) and \(C_2\) be two closed and bounded convex sets in \(\mathbb {R}^n\). Then \(C_1\supseteq C_2\) if and only if, for every vector \(k\in \mathbb {R}^n\), \(\max _{x\in C_2}k\cdot x\le \max _{x\in C_1}k\cdot x\).
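The quantity \(\max _{x\in C}k\cdot x\) is the support function of C, and for polytopes it is attained at a vertex. The following sketch (two nested squares, an arbitrary toy choice) spot-checks the "only if" direction of Corollary 2 on a grid of directions:

```python
import math

def support(verts, k):
    # support function of the convex hull of `verts` in direction k
    return max(k[0] * v[0] + k[1] * v[1] for v in verts)

C1 = [(0, 0), (2, 0), (0, 2), (2, 2)]                  # outer square
C2 = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]  # inner square, C2 in C1

# containment forces the support function of C1 to dominate that of C2
for t in range(32):
    k = (math.cos(2 * math.pi * t / 32), math.sin(2 * math.pi * t / 32))
    assert support(C2, k) <= support(C1, k) + 1e-12
```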
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
Cite this paper
Buscemi, F. (2018). Reverse Data-Processing Theorems and Computational Second Laws. In: Ozawa, M., Butterfield, J., Halvorson, H., Rédei, M., Kitajima, Y., Buscemi, F. (eds) Reality and Measurement in Algebraic Quantum Theory. NWW 2015. Springer Proceedings in Mathematics & Statistics, vol 261. Springer, Singapore. https://doi.org/10.1007/978-981-13-2487-1_6
Print ISBN: 978-981-13-2486-4
Online ISBN: 978-981-13-2487-1