
1 Introduction

Cover, in his attempt to set the second law of thermodynamics in a computational framework, concludes his work with the following suggestive observations [1]:

The second law of thermodynamics says that uncertainty increases in closed physical systems and that the availability of useful energy decreases. If one can make the concept of “physical information” meaningful, it should be possible to augment the statement of the second law of thermodynamics with the statement, “useful information becomes less available.” Thus the ability of a physical system to act as a computer should slowly degenerate as the system becomes more amorphous and closer to equilibrium. A perpetual computer should be impossible [emphasis added].

Cover’s analysis can be summarized as follows. He first argues, more or less implicitly, that the computational analogue of an adiabatically isolated system should be taken to be a system evolving—i.e., computing—without an external memory. (For this reason, in what follows we use the term “computationally isolated” as a synonym for “memoryless.”) This observation leads him to consider stochastic memoryless processes, in particular discrete-time Markov chains. Cover then shows that, while entropy can increase or decrease in this setting, thus violating the thermodynamical second law, relative entropy instead never increases. We refer to this statement as Cover’s “computational second law.”Footnote 1 On the technical side, what Cover proves in [1] is an expression of the monotonicity of the relative entropy under the action of a noisy channel. Thus Cover’s second law is in fact a particular data-processing inequality [2, 3], and we can imagine that there are as many computational second laws as there are data-processing inequalities, all formalizing the idea that the information content of a system cannot increase without the presence of an external memory.Footnote 2

Cover hence shows that the condition of being memoryless is sufficient for a system to obey data-processing inequalities, i.e., computational second laws. The question we address in this paper concerns the other direction: is it possible to show that the memoryless condition is also necessary for the validity of all data-processing inequalities? Equivalently stated: is it true that a system, if it is not computationally isolated, will necessarily violate some data-processing inequality? It is important to address these questions if we want to understand how far the analogy between memorylessness and adiabaticity can be pushed. Here, in particular, we have in mind Lieb and Yngvason’s formulation of the second law of thermodynamics [7], according to which a non-decreasing entropy is not only necessary but also sufficient for the existence of an adiabatic process connecting two thermodynamical states.Footnote 3

The aim of this paper is to provide a comprehensive framework that is able to answer the above questions. More specifically, we prove here a family of reverse data-processing theorems, showing that as soon as a system is not computationally isolated, it must necessarily violate a data-processing inequality. The framework we construct is quite general and it can be applied to classical, quantum, and hybrid classical/quantum systems. In fact, it may even be extended in principle to generalized operational theories as it involves only basic notions like states, effects, and operations; this development is however beyond the scope of the present work.

Thus we are able to strengthen Cover’s computational second law in two ways: on the one hand, we give it a converse, in a way that is analogous to what Lieb and Yngvason did for the second law of thermodynamics. On the other hand, we include in the analysis the possibility of dealing with quantum systems and quantum memories.

The paper is organized as follows. We begin in Sect. 2 by reviewing the data-processing inequality for a classical Markov chain. This is the encoding–channel–decoding model considered by Shannon to describe the simplest communication scenario. In this scenario we prove our first reverse data-processing theorem. We also show how this relates to the theory of comparison of noisy channels, as introduced by Shannon [8] and later developed by Körner and Marton [9]. In Sect. 3 we state and prove a lemma that allows us to extend our considerations to the quantum case, and discuss the notion of quantum statistical morphisms. In Sect. 4 we study the case of a system processing quantum information but outputting only classical data, and prove the corresponding reverse data-processing theorem. Section 5 presents the general case of a fully quantum computer, i.e., a process with quantum inputs and quantum outputs. Finally, in Sect. 6, we briefly discuss analogies and differences between thermodynamical and computational second laws. In particular, we speculate about the possibility that Maxwell’s paradox (his “demon”) may enable a deeper relation between adiabatic processes and memoryless processes, going beyond the formal analogy considered in this work. At the end of the paper, three appendices are available: the first reviews conventions, notations, and terminology used in this work; the second contains a version of the minimax theorem; and the third presents (for the sake of completeness) an elementary proof of the separation theorem for convex sets.

This work contains ideas that were presented during the Sixth Nagoya Winter Workshop (NWW2015) held in Nagoya on 9–13 March 2015. Part of the technical results presented here were first introduced in previous papers by the author [10,11,12,13,14,15], building upon works of Shmaya [16] and Chefles [17].

2 A Reverse Data-Processing Theorem for Classical Channels

A data-processing inequality is a mathematical statement formalizing the fact that the information content of a signal cannot be increased by post-processing. As there are many ways to quantify information, so there are many corresponding data-processing inequalities. Such inequalities, however, despite formalizing the same intuitive concept, are not all logically equivalent: some may be stronger than (i.e., imply) others, some may be easier to prove, some may be better suited for a particular problem at hand. Data-processing inequalities usually find application in information theory when proving that a given approach (coding strategy) is optimal: if a better coding were possible, that would result in the violation of one or more data-processing inequalities, thus leading to a contradiction. In this sense, data-processing inequalities provide a sort of “sanity check” of the result.

One of the simplest scenarios in which a data-processing inequality can be formulated is the following [2, 3]. Given are two noisy channels \(w_1:\mathscr {X}\rightarrow \mathscr {Y}\) and \(w_2:\mathscr {Y}\rightarrow \mathscr {Z}\). Then, for any set \(\mathscr {U}\) and any initial joint distribution p(xu), the resulting joint distribution \(\sum _xw_2(z|y)w_1(y|x)p(x,u)\) satisfies the following inequality:

$$ I(U;Y)\ge I(U;Z)\;. $$

[Notations and definitions used here and in what follows are given for completeness in Appendix 1.] Referring to the situation depicted in Fig. 1 and interpreting U as the message, X as the signal, \(w_1\) as the communication channel, Y as the output signal, \(w_2\) as the decoding, and Z as the recovered message, the above inequality formalizes the fact that the information content carried by the signal about the message cannot be increased by any decoding performed locally at the receiver. Of course, this does not mean that decoding should be avoided (actually, in most cases a decoding is necessary to make the signal readable to the receiver), but that no decoding is able to add a posteriori more information to what is already carried by the signal.
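This inequality is easy to probe numerically. The following sketch (a minimal illustration using numpy; the alphabet sizes, the random seed, and the sampled distributions are arbitrary choices, not anything fixed by the text) draws a random joint distribution p(x, u) and random channels \(w_1\) and \(w_2\), and checks that \(I(U;Y)\ge I(U;Z)\) along the induced Markov chain:

```python
import numpy as np

rng = np.random.default_rng(0)

def mutual_information(p_uv):
    """I(U;V) in bits for a joint distribution p_uv[u, v]."""
    pu = p_uv.sum(axis=1, keepdims=True)
    pv = p_uv.sum(axis=0, keepdims=True)
    mask = p_uv > 0
    return float(np.sum(p_uv[mask] * np.log2(p_uv[mask] / (pu @ pv)[mask])))

def random_channel(n_in, n_out):
    """A random conditional distribution, stored as w[y, x] = w(y|x)."""
    w = rng.random((n_out, n_in))
    return w / w.sum(axis=0, keepdims=True)

# Random joint distribution p(x, u) and channels w1: X -> Y, w2: Y -> Z
nU, nX, nY, nZ = 3, 4, 4, 4
p_xu = rng.random((nX, nU)); p_xu /= p_xu.sum()
w1, w2 = random_channel(nX, nY), random_channel(nY, nZ)

p_yu = w1 @ p_xu   # p(y, u) = sum_x w1(y|x) p(x, u)
p_zu = w2 @ p_yu   # p(z, u) = sum_y w2(z|y) p(y, u)

# Data-processing inequality I(U;Y) >= I(U;Z) along U -> X -> Y -> Z
assert mutual_information(p_yu.T) >= mutual_information(p_zu.T) - 1e-12
```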

Fig. 1 Shannon’s basic communication scheme: a message U is encoded on the signal X (i.e., a joint distribution (U, X) is given), which is transmitted to the receiver via the communication channel \(w_1\). The receiver obtains the output Y and processes it according to the decoding function (another channel \(w_2\)) to obtain the recovered message Z

Data-processing inequalities hence provide necessary conditions for the “locality” of the information-processing device. Namely, data-processing inequalities must be obeyed whenever the physical process carrying the message from the sender to the receiver is composed of computationally isolated parts (encoding, transmission, decoding, etc.). Any information that is communicated must be transmitted via a physical signal: as such, in the absence of an external memory, information can only decrease, never increase, along the transmission. Hence, “locality” in this sense can be understood as the condition that the process \(U\rightarrow X\rightarrow Y\rightarrow Z\) forms a Markov chain. For this reason, we refer to such locality as “Markov locality,” in order to avoid confusion with other connotations of the word.Footnote 4

In this paper we aim to derive statements that provide sufficient conditions for Markov locality, in the form of a set of information-theoretic inequalities. We refer to such statements as reverse data-processing theorems. For example, a first attempt in this direction would be to prove the following:

Given are two noisy channels \(w:\mathscr {X}\rightarrow \mathscr {Y}\) and \(w':\mathscr {X}\rightarrow \mathscr {Z}\). Suppose that, for any set \(\mathscr {U}\) and for any initial joint distribution p(xu), the resulting distributions \(\sum _xw(y|x)p(x,u)\) and \(\sum _xw'(z|x)p(x,u)\) always satisfy the inequality \(I(U;Y)\ge I(U;Z )\). Then there exists a noisy channel \(\varphi :\mathscr {Y}\rightarrow \mathscr {Z}\) such that \(w'(z|x)=\sum _y\varphi (z|y)w(y|x)\).

Notice that, in the above statement, the two given channels w and \(w'\) are assumed to have the same input alphabet: this is a consequence of the fact that we are now formulating a reverse data-processing theorem, so that the existence of a Markov-local decoding (the channel \(\varphi \)) is something to be proved, rather than being a datum. Interpreting the four random variables (UXYZ) as before, if the reverse data-processing theorem holds, then we can conclude that any violation of Markov locality is detectable, in the precise sense that the data-processing inequality has to be violated at some point along the communication process.

2.1 Comparison of Noisy Channels

A reverse data-processing theorem can be understood as a statement about the comparison of two noisy channels. Hence we want to introduce ordering relations between noisy channels, capturing the idea that one channel is able to transmit “more information” than another. This problem, first considered by Shannon [8], is intimately related to the theory of statistical comparisons [18,19,20,21], even though this connection was not made until recently [22]. The theory of comparison of noisy channels received a thorough treatment by Körner and Marton, who in Ref. [9] introduced the following definitions (the notation used here follows [23]):

Definition 1

Given are two noisy channels, \(w:\mathscr {X}\rightarrow \mathscr {Y}\) and \(w':\mathscr {X}\rightarrow \mathscr {Z}\).

  1. (i)

    the channel w is said to be less noisy than \(w'\) if and only if, for any set \(\mathscr {U}\) and any joint distribution p(xu), the resulting distributions \(\sum _x w(y|x)p(x,u)\) and \(\sum _x w'(z|x)p(x,u)\) always satisfy the inequality

    $$\begin{aligned} H(U|Y)\le H(U|Z)\;; \end{aligned}$$
    (1)
  2. (ii)

    the channel w is said to be degradable into \(w'\) if and only if there exists another channel \(\varphi :\mathscr {Y}\rightarrow \mathscr {Z}\) such that

    $$\begin{aligned} w'(z|x)=\sum _y\varphi (z|y)w(y|x)\;. \end{aligned}$$
    (2)

   \(\square \)
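Point (ii) of the definition is a finite-dimensional linear feasibility problem: the entries \(\varphi (z|y)\) are nonnegative, column-normalized, and must satisfy Eq. (2). A minimal sketch of such a check, assuming scipy is available (the function name, channel sizes, and random example are illustrative choices):

```python
import numpy as np
from scipy.optimize import linprog

def is_degradable(w, wp):
    """Check Definition 1(ii): does a channel phi with wp = phi . w exist?
    w has shape (|Y|, |X|), wp has shape (|Z|, |X|); entries w[y, x] = w(y|x).
    Solved as a linear feasibility problem in the unknowns phi(z|y) >= 0."""
    (nY, nX), nZ = w.shape, wp.shape[0]
    A_eq, b_eq = [], []
    # Degradability constraints: sum_y phi(z|y) w(y|x) = wp(z|x) for all z, x
    for z in range(nZ):
        for x in range(nX):
            row = np.zeros(nZ * nY)
            row[z * nY:(z + 1) * nY] = w[:, x]   # coefficient of phi[z, y]
            A_eq.append(row); b_eq.append(wp[z, x])
    # Normalization: sum_z phi(z|y) = 1 for all y
    for y in range(nY):
        row = np.zeros(nZ * nY)
        row[y::nY] = 1.0
        A_eq.append(row); b_eq.append(1.0)
    res = linprog(np.zeros(nZ * nY), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, 1)] * (nZ * nY))
    return res.success

# Example: a channel degraded by hand must be detected as degradable
rng = np.random.default_rng(0)
w = rng.random((3, 4)); w /= w.sum(axis=0)
phi = rng.random((2, 3)); phi /= phi.sum(axis=0)
assert is_degradable(w, phi @ w)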

Since \(I(U;Y)\ge I(U;Z)\) if and only if \(H(U|Y)\le H(U|Z)\), we immediately notice that the reverse data-processing theorem, as tentatively formulated above, is equivalent to the implication (i)\(\implies \)(ii): indeed, the reverse implication, (ii)\(\implies \)(i), is the usual data-processing inequality. Körner and Marton provide an explicit counterexample showing that

$$\begin{aligned} \text {(i)}\ \not \Longrightarrow \ \text {(ii)}\;. \end{aligned}$$
(3)

This means that, if a reverse data-processing theorem holds, it must be formulated differently.

2.2 Replacing H with \(H_{\mathrm{min}}\)

Even though we know that “less noisy” does not imply “degradable,” in what follows we show that just a slight formal modification in the definition of “less noisy,” Eq. (1), is enough to obtain the sought-after reverse data-processing theorem. Such a modification consists in replacing, in point (i) of Definition 1, the Shannon conditional entropy \(H(\cdot |\cdot )\) with the conditional min-entropy \(H_{\mathrm{min}}(\cdot |\cdot )\).

Theorem 1

Given are two noisy channels \(w:\mathscr {X}\rightarrow \mathscr {Y}\) and \(w':\mathscr {X}\rightarrow \mathscr {Z}\). The following are equivalent:

  1. (i)

    for any set \(\mathscr {U}\) and for any initial joint distribution p(xu), the resulting distributions \(\sum _x w(y|x)p(x,u)\) and \(\sum _x w'(z|x)p(x,u)\) always satisfy the inequality

    $$\begin{aligned} H_{\mathrm{min}}(U|Y)\le H_{\mathrm{min}}(U|Z)\;; \end{aligned}$$
    (4)
  2. (ii)

    w is degradable into \(w'\), namely, there exists another channel \(\varphi :\mathscr {Y}\rightarrow \mathscr {Z}\) such that \(w'(z|x)=\sum _y\varphi (z|y)w(y|x)\;\).

Proof

(ii)\(\implies \)(i) is a direct consequence of the data-processing inequality for \(H_{\mathrm{min}}\). Suppose that there exists another conditional probability distribution \(\varphi (z|y)\) such that \(w'(z|x)=\sum _y\varphi (z|y)w(y|x)\). This means that the random variable Z is obtained locally from Y, i.e., the four random variables (UXYZ) form a Markov chain \(U\rightarrow X\rightarrow Y\rightarrow Z\). This implies that (4) holds.

In order to prove (i)\(\implies \)(ii), let us assume that the inequality in (4) holds for any initial joint distribution p(xu). Exponentiating both sides and using Eq. (59), we see that this is equivalent to

$$\begin{aligned} P_{\mathrm{guess}}(U|Y)\ge P_{\mathrm{guess}}(U|Z)\;, \end{aligned}$$
(5)

namely,

$$\begin{aligned} \max _{\varphi }\sum _{u,y,x}\varphi (u|y)w(y|x)p(x,u) \ge \max _{\varphi '}\sum _{u,z,x}\varphi '(u|z)w'(z|x)p(x,u)\;, \end{aligned}$$
(6)

for all choices of p(xu). In the above equation, the noisy channels \(\varphi \) and \(\varphi '\) represent the decision functions that the statistician designs in order to optimally guess the value of U.

Let us choose U such that its support coincides with that of Z, i.e., \(\mathscr {U}\equiv \mathscr {Z}\). We can therefore denote its values by \(z'\). Let us also fix the guessing strategy on the right-hand side of (6) to be \(\varphi '(z'|z)\equiv \delta _{z',z}\), i.e., 1 if \(z'=z\) and 0 otherwise. Then, we know that there exists a decision function \(\varphi (z'|y)\) such that

$$\begin{aligned} 0&\ge \sum _{z',z,x}\delta _{z',z}w'(z|x)p(x,z')-\sum _{z',y,x}\varphi (z'|y)w(y|x)p(x,z')\end{aligned}$$
(7)
$$\begin{aligned}&=\sum _{z',x}w'(z'|x)p(x,z')-\sum _{z',y,x}\varphi (z'|y)w(y|x)p(x,z')\end{aligned}$$
(8)
$$\begin{aligned}&=\sum _{z',x}\left[ w'(z'|x)p(x,z')-\sum _y\varphi (z'|y)w(y|x)p(x,z') \right] \end{aligned}$$
(9)
$$\begin{aligned}&=\sum _{z',x}\left[ w'(z'|x)-\sum _y\varphi (z'|y)w(y|x) \right] p(x,z')\;. \end{aligned}$$
(10)

In other words, for any \(p(x,z')\), there exists a \(\varphi (z'|y)\) such that the above inequality holds. This is equivalent to saying that

$$\begin{aligned} \max _{p}\min _{\varphi }\sum _{z',x}\left[ w'(z'|x)-\sum _y\varphi (z'|y)w(y|x) \right] p(x,z')\le 0\;. \end{aligned}$$
(11)

We now invoke the minimax theorem (in the form reported in Appendix 2, Theorem 4) and exchange the order of the two optimizations:

$$\begin{aligned} \min _{\varphi }\max _{p}\sum _{z',x}\left[ w'(z'|x)-\sum _y\varphi (z'|y)w(y|x) \right] p(x,z')\le 0\;. \end{aligned}$$
(12)

Let us now introduce the quantity

$$\begin{aligned} \varDelta _\varphi (z',x)\triangleq w'(z'|x)-\sum _y\varphi (z'|y)w(y|x)\;. \end{aligned}$$
(13)

First of all, we notice that the maximum in Eq. (12) is reached when the distribution \(p(x,z')\) is entirely concentrated on an entry where \(\varDelta _\varphi (z',x)\) is maximum, that is,

$$\begin{aligned} 0&\ge \min _{\varphi }\max _{p}\sum _{z',x}\left[ w'(z'|x)-\sum _y\varphi (z'|y)w(y|x) \right] p(x,z')\end{aligned}$$
(14)
$$\begin{aligned}&= \min _{\varphi }\max _{z',x}\varDelta _\varphi (z',x)\;. \end{aligned}$$
(15)

In general, \(\varDelta _\varphi (z',x)\) does not have a definite sign; however, since \(\sum _{z',x}\varDelta _\varphi (z',x)=0\) (as a consequence of the normalization of probabilities), it must be that \(\max _{z',x}\varDelta _\varphi (z',x)\ge 0\) (otherwise one would have \(\sum _{z',x}\varDelta _\varphi (z',x)<0\)). The above inequality hence implies that \(\min _\varphi \max _{z',x}\varDelta _\varphi (z',x)=0\). In turn this implies, again because \(\sum _{z',x}\varDelta _\varphi (z',x)=0\), that \(\varDelta _\varphi (z',x)=0\) for all \(z'\) and x. In other words, we have shown that there exists a \(\varphi (z'|y)\) such that

$$\begin{aligned} w'(z'|x)=\sum _y\varphi (z'|y)w(y|x), \end{aligned}$$
(16)

for all \(z',x\), which coincides with the definition of degradability.   \(\square \)

Remark 1

From the proof we see that in point (i) of Theorem 1 it is possible to restrict, without loss of generality, the random variable U to be supported on the set \(\mathscr {Z}\), i.e., the same set supporting the output of \(w'\).   \(\square \)
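The implication (ii)\(\implies \)(i) of Theorem 1 is straightforward to check numerically, since \(P_{\mathrm{guess}}(U|Y)=\sum _y\max _u p(u,y)\). A small sketch (the sizes, seed, and number of trials are arbitrary; the converse direction is precisely what the theorem adds and is not tested here):

```python
import numpy as np

def p_guess(p_uy):
    """P_guess(U|Y) = sum_y max_u p(u, y); H_min(U|Y) = -log2 P_guess(U|Y)."""
    return float(p_uy.max(axis=0).sum())

rng = np.random.default_rng(1)
nU, nX, nY, nZ = 3, 4, 5, 4

w = rng.random((nY, nX)); w /= w.sum(axis=0)        # w(y|x)
phi = rng.random((nZ, nY)); phi /= phi.sum(axis=0)  # degrading channel phi(z|y)
wp = phi @ w                                        # w'(z|x) = sum_y phi(z|y) w(y|x)

for _ in range(100):                                # random priors p(x, u)
    p_xu = rng.random((nX, nU)); p_xu /= p_xu.sum()
    # Inequality (4) in its exponentiated form, Eq. (5)
    assert p_guess((w @ p_xu).T) >= p_guess((wp @ p_xu).T) - 1e-12
```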

3 The Fundamental Lemma for Quantum Channels

The following lemma plays a crucial role in the derivation of reverse data-processing theorems valid in the quantum case.

Lemma 1

Let \(\varPhi _A:\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi '_A:\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) be two quantum channels. For any set \(\mathscr {U}=\{u\}\), the following are equivalent:

  1. (i)

    for all ensembles \(\{p(u);\omega ^u_A\}\;\),

    $$\begin{aligned} P_{\mathrm{guess}}\{p(u);\varPhi _A(\omega ^u_A)\}\ge P_{\mathrm{guess}}\{p(u);\varPhi '_A(\omega ^u_A)\}\;; \end{aligned}$$
    (17)
  2. (ii)

    for any POVM \(\{Q^u_{B'}\}\), there exists a POVM \(\{P^u_B\}\) such that

    $$\begin{aligned} {{\text {Tr}}}\!\left[ {\varPhi '_A(\omega _A)\ Q^u_{B'}}\right] ={{\text {Tr}}}\!\left[ {\varPhi _A(\omega _A)\ P^u_B}\right] \;, \end{aligned}$$
    (18)

    for all \(u\in \mathscr {U}\) and all \(\omega _A\in \mathsf {D}(\mathscr {H}_A)\).

Proof

The fact that (ii) implies (i) follows by definition of guessing probability. We therefore prove the converse, namely, that (i) implies (ii).

Let us rewrite condition (17) explicitly as follows: for all ensembles \(\{p(u);\omega ^u_A \}\;\),

$$\begin{aligned} \max _{P}\sum _up(u){{\text {Tr}}}\!\left[ {\varPhi _A(\omega ^u_A)\ P^u_B}\right] \ge \max _{Q}\sum _up(u){{\text {Tr}}}\!\left[ {\varPhi '_A(\omega ^u_A)\ Q^u_{B'}}\right] \;, \end{aligned}$$
(19)

where the maxima are taken over all possible POVMs. Introduce now an auxiliary Hilbert space \(\mathscr {H}_R\cong \mathscr {H}_{A}\), and denote by \(\phi ^+_{RA}\) a fixed maximally entangled in \(\mathsf {D}(\mathscr {H}_R\otimes \mathscr {H}_A)\). Construct then the Choi operators corresponding to channels \(\varPhi \) and \(\varPhi '\), namely,

$$\begin{aligned} \chi _{RB}\triangleq ({\text {id}}_R\otimes \varPhi _A)\phi ^+_{RA}\qquad \text {and}\qquad \chi '_{RB'}\triangleq ({\text {id}}_R\otimes \varPhi '_A)\phi ^+_{RA}\;. \end{aligned}$$
(20)

Noticing that, for any ensemble \(\{p(u);\omega ^u_A\}\) with \(\sum _up(u)\omega ^u_A= I_A/d_A\), there exists a POVM \(\{E^u_R\}\) such that \(p(u)\omega ^u_A={\text {Tr}}_{R}\!\left[ {\phi ^+_{RA}\ (E^u_R\otimes I_A)}\right] \), we immediately see that, if condition (19) above holds, then, for any POVM \(\{E^u_R\}\),

$$\begin{aligned} \max _P\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (E^u_R\otimes P^u_{B})}\right] \ge \max _Q\sum _u{{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (E^u_R\otimes Q^u_{B'})}\right] \;. \end{aligned}$$
(21)

We now prove that condition (21) above in turn implies that, for any collection of Hermitian operators \(\{O^u_R \}\),

$$\begin{aligned} \max _P\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (O^u_R\otimes P^u_{B})}\right] \ge \max _Q\sum _u{{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (O^u_R\otimes Q^u_{B'})}\right] \;. \end{aligned}$$
(22)

The crucial observation here is that, given a collection of Hermitian operators \(\{O^u_R\}\;\), we can always derive from it a POVM \(\{E^u_R\}\) given by

$$\begin{aligned} E^u_R\triangleq \frac{1}{\alpha |\mathscr {U}|}\left\{ O^u_R+\alpha I_R-\frac{1}{|\mathscr {U}|}\varSigma _R \right\} \;, \end{aligned}$$
(23)

with \(\varSigma _R\triangleq \sum _uO^u_R\) and \(\alpha >0\) sufficiently large so that \(O^u_R+\alpha I_R-|\mathscr {U}|^{-1}\varSigma _R\) is nonnegative for all u. Therefore, assuming that inequality (21) holds for any POVM \(\{E^u_R \}\), we have that

$$\begin{aligned}&\max _P\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (O^u_R\otimes P^u_{B})}\right] \nonumber \\&=\alpha |\mathscr {U}|\max _P\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (E^u_R\otimes P^u_{B})}\right] -\alpha {{\text {Tr}}}\!\left[ {\chi _{RB}}\right] +\frac{1}{|\mathscr {U}|}{{\text {Tr}}}\!\left[ {\chi _{RB}\ (\varSigma _R\otimes I_B)}\right] \nonumber \\&=\alpha |\mathscr {U}|\max _P\sum _u{{\text {Tr}}}\!\left[ {\chi _{RB}\ (E^u_R\otimes P^u_{B})}\right] -\alpha +\frac{1}{|\mathscr {U}|}{{\text {Tr}}}\!\left[ {{\text {Tr}}_{B}\!\left[ {\chi _{RB}}\right] \ \varSigma _R}\right] \nonumber \\&\ge \alpha |\mathscr {U}|\max _Q\sum _u{{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (E^u_R\otimes Q^u_{B'})}\right] -\alpha +\frac{1}{|\mathscr {U}|}{{\text {Tr}}}\!\left[ {{\text {Tr}}_{B'}\!\left[ {\chi '_{RB'}}\right] \ \varSigma _R}\right] \\&=\max _Q\sum _u{{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (O^u_R\otimes Q^u_{B'})}\right] ,\nonumber \end{aligned}$$
(24)

for any collection of Hermitian operators \(\{O^u_R\}\;\). Inequality (24) above is a consequence of condition (21) together with the identity \({\text {Tr}}_{B}\!\left[ {\chi _{RB}}\right] ={\text {Tr}}_{B'}\!\left[ {\chi '_{RB'}}\right] =I_R/d_R\). Hence we have shown that condition (22) holds if condition (21) holds, even though the former looks at first sight more general than the latter. The converse implication is true simply because any POVM is, in particular, a family of Hermitian operators.

Let us now denote by \(\mathscr {L}(\mathscr {U})\) the set of operator tuples

$$\begin{aligned} \mathbf {a}\equiv \left( a^u:u\in \mathscr {U} \right) \;,\qquad a^u\in \mathsf {L}_H(\mathscr {H}_R)\;, \end{aligned}$$
(25)

with inner product

$$\begin{aligned} \mathbf {a}\cdot \mathbf {b}\triangleq \sum _u{{\text {Tr}}}\!\left[ {a^u b^u}\right] \;. \end{aligned}$$
(26)

We then define \(\mathscr {C}(\chi ;\mathscr {U})\) as the convex subset of \(\mathscr {L}(\mathscr {U})\) containing tuples \(\mathbf {b}\) such that \(b^u\triangleq {\text {Tr}}_{B}\!\left[ {\chi _{RB}\ (I_R\otimes P^u_B)}\right] \), for varying POVM \(\{P^u_B\}\). [The fact that \(\mathscr {C}(\chi ;\mathscr {U})\) is convex is a direct consequence of the fact that the set of POVMs supported on \(\mathscr {U}\) is convex.] In the same way, we also define \(\mathscr {C}'(\chi ';\mathscr {U})\). For the sake of simplicity of notation, when no confusion arises, we simply denote \(\mathscr {C}(\chi ;\mathscr {U})\) as \(\mathscr {C}\) and \(\mathscr {C}'(\chi ';\mathscr {U})\) as \(\mathscr {C}'\). Using this notation, condition (22) becomes

$$\begin{aligned} \max _{\mathbf {b}\in \mathscr {C}}\mathbf {a}\cdot \mathbf {b}\ge \max _{\mathbf {b}'\in \mathscr {C}'}\mathbf {a}\cdot \mathbf {b}'\;, \end{aligned}$$
(27)

for all \(\mathbf {a}\in \mathscr {L}(\mathscr {U})\). [Here \(a^u=O^u_R\).]

Hence, we have turned the initial conditions involving guessing probabilities into a family of linear constraints on two convex sets, \(\mathscr {C}\) and \(\mathscr {C}'\). Then, a direct application of the separation theorem for convex sets (see Corollary 2 in Appendix 3) leads us to conclude that

$$\begin{aligned} \mathscr {C}(\chi ;\mathscr {U})\supseteq \mathscr {C}'(\chi ';\mathscr {U})\;. \end{aligned}$$
(28)

In other words, condition (17) in the statement of the lemma implies that, for any POVM \(\{Q^u_{B'}\}\), there exists a POVM \(\{P^u_B\}\) such that

$$\begin{aligned} {\text {Tr}}_{B}\!\left[ {\chi _{RB}\ (I_R\otimes P^u_B)}\right] ={\text {Tr}}_{B'}\!\left[ {\chi '_{RB'}\ (I_R\otimes Q^u_{B'})}\right] \;, \end{aligned}$$
(29)

for all \(u\in \mathscr {U}\).

The final step consists in noticing that any state \(\omega _A\) can be written as \({\text {Tr}}_{R}\!\left[ {\phi ^+_{RA}\ (E_R\otimes I_A)}\right] \) for some \(E_R\in \mathsf {L}_+(\mathscr {H}_R)\). Therefore, multiplying both sides of (29) by \(E_R\) and taking the trace, we obtain

$$\begin{aligned} {{\text {Tr}}}\!\left[ {\varPhi _A(\omega _A)\ P^u_B}\right]&={{\text {Tr}}}\!\left[ {\chi _{RB}\ (E_R\otimes P^u_B)}\right] \end{aligned}$$
(30)
$$\begin{aligned}&={{\text {Tr}}}\!\left[ {\chi '_{RB'}\ (E_R\otimes Q^u_{B'})}\right] \end{aligned}$$
(31)
$$\begin{aligned}&={{\text {Tr}}}\!\left[ {\varPhi '_A(\omega _A)\ Q^u_{B'}}\right] \;, \end{aligned}$$
(32)

which of course holds for any choice of \(E_R\), that is, \(\omega _A\), as claimed.   \(\square \)

Remark 2

As explained in the paragraph following Eq. (20), the above proof shows that, in particular, the ensembles \(\{p(u);\omega ^u_A \}\) in point (i) can be restricted, without loss of generality, to ensembles with maximally mixed average, i.e., \(\sum _up(u)\omega ^u_A\propto I_A\).   \(\square \)

Remark 3

We notice that point (ii) can be alternatively formulated as follows: for any POVM \(\{Q^u_{B'}\}\), there exists a POVM \(\{P^u_B\}\) such that

$$\begin{aligned} \left( \varPhi '\right) ^\dag \left( Q^u_{B'}\right) =\varPhi ^\dag \left( P^u_B\right) \;, \end{aligned}$$
(33)

for all \(u\in \mathscr {U}\), where \(\varPhi ^\dagger \) denotes the trace-dual defined in Eq. (55).    \(\square \)
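The Choi operators of Eq. (20), which carry the whole proof of Lemma 1, are straightforward to build numerically from a Kraus representation. A sketch in Python (the qubit depolarizing channel and the noise parameter p are illustrative choices, not anything fixed by the text), which also checks the identity \({\text {Tr}}_{B}[\chi _{RB}]=I_R/d_R\) used after Eq. (24):

```python
import numpy as np

def choi(kraus_ops, d_in):
    """Choi operator chi_RB = (id_R (x) Phi)(phi+_RA), as in Eq. (20)."""
    phi_plus = np.eye(d_in).reshape(-1, 1) / np.sqrt(d_in)   # vec(I)/sqrt(d)
    state = phi_plus @ phi_plus.conj().T                     # |phi+><phi+|
    d_out = kraus_ops[0].shape[0]
    chi = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for K in kraus_ops:
        KR = np.kron(np.eye(d_in), K)                        # id_R (x) K
        chi += KR @ state @ KR.conj().T
    return chi

# Illustrative example: qubit depolarizing channel with noise parameter p
p = 0.3
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
kraus = [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * s for s in (X, Y, Z)]

chi = choi(kraus, 2)
# Tr_B[chi_RB] = I_R / d_R, the identity invoked right after Eq. (24)
tr_B = chi.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)
assert np.allclose(tr_B, I2 / 2)
```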

3.1 Quantum Statistical Morphisms

Let us now choose the set \(\mathscr {U}\) in Lemma 1 so that its size \(|\mathscr {U}|\) is equal to \((\dim \mathscr {H}_{B'})^2\). Assuming that the channels \(\varPhi \) and \(\varPhi '\) actually satisfy either (17) or (18), let us set the POVM \(\{Q^u_{B'}\}\) to be informationally complete, that is, \({\text {span}}\{Q^u_{B'}\}=\mathsf {L}(\mathscr {H}_{B'})\). Then, if \(\{P^u_B\}\) is any POVM satisfying the equality (33) in Remark 3, the relation

$$\begin{aligned} Q^u_{B'}\longmapsto P^u_B\;,\qquad u\in \mathscr {U}\;, \end{aligned}$$
(34)

can be used to define a linear map \(\varGamma :\mathsf {L}(\mathscr {H}_{B})\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) with the following properties:

  1. 1.

    let \(\{\varXi ^u_{B'}\}\) be the unique dual of \(\{Q^u_{B'}\}\), in the sense that \(X_{B'}=\sum _u{{\text {Tr}}}\!\left[ {Q^u_{B'}\ X_{B'}}\right] \varXi ^u_{B'}\;\), for all \(X_{B'}\in \mathsf {L}(\mathscr {H}_{B'})\;\); then the action of \(\varGamma \) is given by \(\varGamma (\cdot )=\sum _u{{\text {Tr}}}\!\left[ {P^u_{B}\ \cdot }\right] \;\varXi ^u_{B'}\;\);

  2. 2.

    \(\varGamma \) is Hermiticity-preserving, i.e., \(X=X^\dag \) implies that \(\varGamma (X)=\left[ \varGamma (X) \right] ^\dag \;\);

  3. 3.

    \(\varGamma \) is trace-preserving;

  4. 4.

    \(\varPhi '=\varGamma \circ \varPhi \;\).

In particular, the map \(\varGamma \), as defined above, is positive and trace-preserving on the output (meant as the whole linear range) of \(\varPhi \). In order to prove this, let \(X_A\in \mathsf {L}(\mathscr {H}_A)\) be any operator such that \(\varPhi _A(X_A)\ge 0\). (Notice that \(X_A\) need not be positive itself.) Then \(\varGamma _B(\varPhi _A(X_A))\ge 0\). This is because \(\varGamma \circ \varPhi =\varPhi '\) and we know, from Eq. (18), that for any positive operator \(Q_{B'}\) there exists a positive operator \(P_B\) such that \({{\text {Tr}}}\!\left[ {Q_{B'}\ \varGamma _B(\varPhi _A(X_A))}\right] ={{\text {Tr}}}\!\left[ {Q_{B'}\ \varPhi '_A(X_A)}\right] ={{\text {Tr}}}\!\left[ {P_B\ \varPhi _A(X_A)}\right] \). Hence, we know that for any positive operator \(Q_{B'}\), \({{\text {Tr}}}\!\left[ {Q_{B'}\ \varGamma _B(\varPhi _A(X_A))}\right] \ge 0\) whenever \(\varPhi _A(X_A)\ge 0\), which is the definition of positivity.
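Property 1 above can be made concrete: the dual \(\{\varXi ^u_{B'}\}\) of an informationally complete POVM is computable via the frame operator. A minimal sketch, assuming the POVM elements are Hermitian and span \(\mathsf {L}(\mathscr {H})\) (the qubit tetrahedron POVM used for the sanity check is an illustrative choice):

```python
import numpy as np

def dual_frame(povm):
    """Canonical dual {Xi^u} of an informationally complete POVM {Q^u},
    satisfying X = sum_u Tr[Q^u X] Xi^u for all X (cf. property 1 above)."""
    d = povm[0].shape[0]
    vecs = np.array([q.reshape(-1) for q in povm])  # rows: vec(Q^u)
    S = vecs.T @ vecs.conj()                        # frame operator on L(H)
    S_inv = np.linalg.inv(S)                        # invertible iff {Q^u} spans L(H)
    return [(S_inv @ v).reshape(d, d) for v in vecs]

# Sanity check with the qubit tetrahedron (SIC) POVM
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
dirs = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
Q = [(np.eye(2) + (a * X + b * Y + c * Z) / np.sqrt(3)) / 4 for a, b, c in dirs]
Xi = dual_frame(Q)
rho = np.array([[0.6, 0.1 + 0.2j], [0.1 - 0.2j, 0.4]])
assert np.allclose(sum(np.trace(q @ rho) * xi for q, xi in zip(Q, Xi)), rho)
```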

Following the terminology of [24, 25], the following definition was introduced in [10]:

Definition 2

Given a channel \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\), a linear map \(\varGamma :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_C)\) is said to be a quantum statistical morphism of \(\varPhi \) if and only if, for any state \(\omega _A\) and any POVM \(\{Q^y_C\}\), there exists a POVM \(\{P^y_B\}\) such that

$$\begin{aligned} {{\text {Tr}}}\!\left[ {(\varGamma _B\circ \varPhi _A)(\omega _A)\ Q^y_C}\right] ={{\text {Tr}}}\!\left[ {\varPhi _A(\omega _A)\ P^y_B}\right] \;, \end{aligned}$$
(35)

for all y.    \(\square \)

It is easy to verify that an everywhere positive trace-preserving linear map is always a statistical morphism of any channel, as long as the composition of the two is well defined. The natural question is then whether a linear map defined as \(\varGamma \) above can always be extended to a map that is positive and trace-preserving everywhere, not only on the range of \(\varPhi \). The question was answered in the negative by Matsumoto, who gave an explicit counterexample in Ref. [26].

Vice versa, one may ask whether any linear map that is positive and trace-preserving on the range of \(\varPhi \) is a well-defined statistical morphism of \(\varPhi \). Also in this case, the answer is in the negative: the condition that (35) hold for any POVM (in particular, for POVMs with any number of outcomes) is strictly stronger than positivity, for which it is enough that condition (35) hold for two-outcome POVMs only.

Statistical morphisms hence lie somewhere in between linear maps that are positive and trace-preserving (PTP) everywhere, and those that are so only on the range of \(\varPhi \):

$$\begin{aligned} \{\text {PTP everywhere}\}\ \subsetneq \ \{\text {statistical morphisms of }\varPhi \}\ \subsetneq \ \{\text {PTP on the range of }\varPhi \}\;. \end{aligned}$$
(36)

We summarize the contents of this section in one definition and one corollary.

Definition 3

Given are two quantum channels \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\). For a given set \(\mathscr {U}\), we say that \(\varPhi \) is \(\mathscr {U}\)-sufficient for \(\varPhi '\), in formula,

$$\begin{aligned} \varPhi \succeq _\mathscr {U}\varPhi '\;, \end{aligned}$$
(37)

if and only if either of the conditions in Lemma 1 hold.   \(\square \)

Corollary 1

Given are two quantum channels \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\). The following are equivalent:

  1. (i)

    \(\varPhi \succeq _\mathscr {U}\varPhi '\), for any set \(\mathscr {U}\;\);

  2. (ii)

    there exists a quantum statistical morphism \(\varGamma :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) of \(\varPhi \) such that \(\varPhi '=\varGamma \circ \varPhi \;\).

Remark 4

Using the correspondence between ensembles and bipartite states, together with the relation between guessing probability and conditional min-entropy, given in Appendix 1 in Eqs. (54) and (60), we notice that the condition \(\varPhi \succeq _\mathscr {U}\varPhi '\) can be equivalently written as

$$\begin{aligned} H_{\mathrm{min}}(U|B)\le H_{\mathrm{min}}(U|B'), \end{aligned}$$
(38)

where the entropies are computed with respect to states \(({\text {id}}_U\otimes \varPhi _A)(\omega _{UA})\) and \(({\text {id}}_U\otimes \varPhi '_A)(\omega _{UA})\), respectively. This is equivalent to the formulation used in Theorem 1.   \(\square \)
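For quantum ensembles, the guessing probability appearing in Lemma 1, and hence the conditional min-entropy of Eq. (38), can be computed as a semidefinite program. A sketch assuming the cvxpy package with an SDP-capable solver installed (the two-state qubit ensemble is an illustrative choice):

```python
import numpy as np
import cvxpy as cp

def p_guess(probs, states):
    """P_guess{p(u); rho^u} = max over POVMs {P^u} of sum_u p(u) Tr[rho^u P^u].
    Then H_min(U|B) = -log2 of the returned value."""
    d = states[0].shape[0]
    P = [cp.Variable((d, d), hermitian=True) for _ in states]
    constraints = [p >> 0 for p in P]           # POVM elements are positive
    constraints.append(sum(P) == np.eye(d))     # and sum to the identity
    obj = cp.Maximize(cp.real(sum(q * cp.trace(r @ p)
                                  for q, r, p in zip(probs, states, P))))
    cp.Problem(obj, constraints).solve()
    return obj.value

# Illustrative ensemble: two non-orthogonal qubit states with equal priors
plus = np.array([[0.5, 0.5], [0.5, 0.5]])
zero = np.diag([1.0, 0.0])
print(p_guess([0.5, 0.5], [zero, plus]))  # Helstrom value: (1 + 1/sqrt(2))/2
```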

4 A Semiclassical (Semiquantum) Reverse Data-Processing Theorem

We consider in this section the case in which the output of a quantum channel is classical, in the precise sense that the range is supported on a commutative subalgebra.

Theorem 2

Given are two quantum channels \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\). Assuming that the output of \(\varPhi '\) is classical, i.e.,

$$\begin{aligned}{}[\varPhi '(X),\varPhi '(Y)]=0,\qquad \forall X,Y\in \mathsf {L}(\mathscr {H}_A)\;, \end{aligned}$$
(39)

the following are equivalent:

  1. (i)

    \(\varPhi \succeq _\mathscr {U}\varPhi '\), for any set \(\mathscr {U}\;\);

  2. (ii)

    \(\varPhi \succeq _\mathscr {U}\varPhi '\), for a set \(\mathscr {U}\) such that \(|\mathscr {U}|=\dim \mathscr {H}_{B'}\;\);

  3. (iii)

    there exists a quantum channel \(\varPsi :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) such that \(\varPhi '=\varPsi \circ \varPhi \;\).

Proof

Since the implications (iii)\(\implies \)(i)\(\implies \)(ii) are either trivial or a direct consequence of the data-processing inequality for the guessing probability, we only prove the implication (ii)\(\implies \)(iii).

Since \(|\mathscr {U}|=\dim \mathscr {H}_{B'}\), we can use the elements \(u\in \mathscr {U}\) to label an orthonormal basis \(\{\left| u\right\rangle :u\in \mathscr {U}\}\) of \(\mathscr {H}_{B'}\). Assuming (ii), we know from Lemma 1 that (18) holds, so, in particular, we know that there exists a POVM \(\{P^u_B\}\) such that

$$\begin{aligned} {{\text {Tr}}}\!\left[ {\varPhi '_A(\omega _A)\ \left| u\right\rangle \left\langle u\right| _{B'}}\right] ={{\text {Tr}}}\!\left[ {\varPhi _A(\omega _A)\ P^u_B}\right] \;, \end{aligned}$$
(40)

for all u and all \(\omega _A\in \mathsf {D}(\mathscr {H}_A)\).

We now use the fact that the output of \(\varPhi '\) is classical and assume, without loss of generality, that all operators in the range of \(\varPhi '\) are diagonal in the basis \(\{\left| u\right\rangle \}\). This means that

$$\begin{aligned} \varPhi '_A(\cdot )=\sum _{u\in \mathscr {U}}{{\text {Tr}}}\!\left[ {\varPhi '_A(\cdot )\ \left| u\right\rangle \left\langle u\right| _{B'}}\right] \left| u\right\rangle \left\langle u\right| _{B'}\;. \end{aligned}$$
(41)

Using Eq. (40), and defining a measure-and-prepare channel \(\varPsi :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) by the relation

$$\begin{aligned} \varPsi (\cdot )\triangleq \sum _u{{\text {Tr}}}\!\left[ {\cdot \ P^u_B }\right] \left| u\right\rangle \left\langle u\right| _{B'}\;, \end{aligned}$$
(42)

we finally have that \(\varPhi '=\varPsi \circ \varPhi \).   \(\square \)
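The channel \(\varPsi \) of Eq. (42) is a measure-and-prepare map and is immediate to implement. A minimal sketch (the dephasing example is an illustrative instance of the theorem, with \(\varPhi ={\text {id}}\) and \(\varPhi '\) the completely dephasing channel, so that \(P^u=\left| u\right\rangle \left\langle u\right| \) satisfies Eq. (40)):

```python
import numpy as np

def measure_and_prepare(povm):
    """The channel Psi of Eq. (42): measure {P^u_B}, prepare |u><u|_{B'}."""
    n = len(povm)
    def psi(rho):
        out = np.zeros((n, n), dtype=complex)
        for u, P in enumerate(povm):
            out[u, u] = np.trace(rho @ P)   # Tr[rho P^u] on the diagonal
        return out
    return psi

# Illustrative instance: Phi = id on a qubit, Phi' = complete dephasing in
# the {|u>} basis. Then Psi . Phi = Phi', as the theorem prescribes.
psi = measure_and_prepare([np.diag([1.0, 0.0]), np.diag([0.0, 1.0])])
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
assert np.allclose(psi(rho), np.diag(np.diag(rho)))
```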

Remark 5

In order to highlight the perfect analogy with Theorem 1, we recall that the relation between guessing probability and conditional min-entropy (see Appendix 1) allows us to rewrite points (i) and (ii) of Theorem 2 as:

$$\begin{aligned} H_{\mathrm{min}}(U|B)\le H_{\mathrm{min}}(U|B')\;. \end{aligned}$$
(43)

See also Remark 4 above.    \(\square \)

Remark 6

It is possible to show that Theorem 1 becomes a corollary of Theorem 2. Consider in fact the situation in which both \(\varPhi \) and \(\varPhi '\) are classical-quantum channels, namely, \(\varPhi :\mathscr {X}\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathscr {X}\rightarrow \mathsf {L}(\mathscr {H}_{B'})\), with \(\varPhi (x)\triangleq \rho ^x_B\in \mathsf {D}(\mathscr {H}_B)\) and \(\varPhi '(x)\triangleq \sigma ^x_{B'}\in \mathsf {D}(\mathscr {H}_{B'})\). Assume moreover that \([\rho ^x,\rho ^{x'}]=0\) and \([\sigma ^x,\sigma ^{x'}]=0\), for all \(x,x'\in \mathscr {X}\). We are hence in a scenario much more restricted than that of Theorem 2: in fact, by identifying commuting states with the probability distributions of their eigenvalues, we recover the classical framework and the statement of Theorem 1.   \(\square \)

Remark 7

Theorem 1, the classical reverse data-processing theorem, has thus two different proofs: one using the minimax theorem and another using the separation theorem for convex sets. Despite the fact that the minimax theorem and the separation theorem are ultimately equivalent [27], the minimax theorem allows for an easier treatment of the approximate case, which is a very relevant point but goes beyond the scope of the present contribution. The interested reader may refer to Refs. [15, 28].    \(\square \)

5 A Fully Quantum Reverse Data-Processing Theorem

We consider in this section the case of two completely general quantum channels, with the only restriction that the input space is the same for both.

Theorem 3

Given are two quantum channels, \(\varPhi :\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_B)\) and \(\varPhi ':\mathsf {L}(\mathscr {H}_A)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\), and an auxiliary Hilbert space \(\mathscr {H}_{B''}\cong \mathscr {H}_{B'}\). The following are equivalent:

  1. (i)

    \({\text {id}}_{B''}\otimes \varPhi _A\succeq _\mathscr {U}{\text {id}}_{B''}\otimes \varPhi '_A\), for any set \(\mathscr {U}\;\);

  2. (ii)

    \({\text {id}}_{B''}\otimes \varPhi _A\succeq _\mathscr {U}{\text {id}}_{B''}\otimes \varPhi '_A\), for a set \(\mathscr {U}\) such that \(|\mathscr {U}|=\dim (\mathscr {H}_{B''}\otimes \mathscr {H}_{B'})=(\dim \mathscr {H}_{B'})^2\;\);

  3. (iii)

    there exists a quantum channel \(\varPsi :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) such that \(\varPhi '=\varPsi \circ \varPhi \;\).

Remark 8

In terms of the conditional min-entropy, points (i) and (ii) above can be written as

$$\begin{aligned} H_{\mathrm{min}}(U|B''B)\le H_{\mathrm{min}}(U|B''B')\;, \end{aligned}$$
(44)

with obvious meaning of symbols. See also Remarks 4 and 5 above.    \(\square \)

Proof

Since the implications (iii)\(\implies \)(i)\(\implies \)(ii) are straightforward, we prove here only that (ii)\(\implies \)(iii).

Let \(\mathscr {H}_{B'''}\) be a further auxiliary Hilbert space such that \(\mathscr {H}_{B'''}\cong \mathscr {H}_{B''}\cong \mathscr {H}_{B'}\). We begin by showing that, if \({\text {id}}_{B''}\otimes \varPhi _A\succeq _\mathscr {U}{\text {id}}_{B''}\otimes \varPhi '_A\), then, for any POVM \(\{Q^u_{B''B'}\}\;\), there exists a POVM \(\{P^u_{B''B}\}\) such that

$$\begin{aligned} \begin{aligned}&{\text {Tr}}_{B''B'}\!\left[ {(\phi ^+_{B'''B''}\otimes \varPhi '_A(\cdot ))\ (I_{B'''}\otimes Q^u_{B''B'})}\right] \\&={\text {Tr}}_{B''B}\!\left[ {(\phi ^+_{B'''B''}\otimes \varPhi _A(\cdot ))\ (I_{B'''}\otimes P^u_{B''B})}\right] \;, \end{aligned} \end{aligned}$$
(45)

where \(\phi ^+_{B'''B''}\) is a maximally entangled state in \(\mathscr {H}_{B'''}\otimes \mathscr {H}_{B''}\). In fact, Lemma 1 states that, for any POVM \(\{Q^u_{B''B'}\}\), there exists a POVM \(\{P^u_{B''B}\}\) such that

$$\begin{aligned} {{\text {Tr}}}\!\left[ {({\text {id}}_{B''}\otimes \varPhi '_A)(\cdot _{B''A})\ Q^u_{B''B'} }\right] ={{\text {Tr}}}\!\left[ {({\text {id}}_{B''}\otimes \varPhi _A)(\cdot _{B''A})\ P^u_{B''B} }\right] \;, \end{aligned}$$
(46)

for all \(u\in \mathscr {U}\). In particular, for any family of states \(\{\xi ^x_{B''} \}_x\) on \(\mathscr {H}_{B''}\), we have

$$\begin{aligned} {{\text {Tr}}}\!\left[ {({\text {id}}_{B''}\otimes \varPhi '_A)(\xi ^x_{B''}\otimes \cdot _A)\ Q^u_{B''B'} }\right] ={{\text {Tr}}}\!\left[ {({\text {id}}_{B''}\otimes \varPhi _A)(\xi ^x_{B''}\otimes \cdot _A)\ P^u_{B''B} }\right] \;, \end{aligned}$$
(47)

for all u and all x. Let us choose \(\xi ^x_{B''}={\text {Tr}}_{B'''}\!\left[ {\phi ^+_{B'''B''}\ (\varXi ^x_{B'''}\otimes I_{B''})}\right] \) for some complete set of positive operators \(\{\varXi ^x_{B'''} \}_x\). Hence Eq. (47) becomes

$$\begin{aligned}&{{\text {Tr}}}\!\left[ {({\text {id}}_{B'''}\otimes {\text {id}}_{B''}\otimes \varPhi '_A)(\phi ^+_{B'''B''}\otimes \cdot _A)\ (\varXi ^x_{B'''}\otimes Q^u_{B''B'}) }\right] \end{aligned}$$
(48)
$$\begin{aligned}&={{\text {Tr}}}\!\left[ {({\text {id}}_{B'''}\otimes {\text {id}}_{B''}\otimes \varPhi _A)(\phi ^+_{B'''B''}\otimes \cdot _A)\ (\varXi ^x_{B'''}\otimes P^u_{B''B}) }\right] \;, \end{aligned}$$
(49)

for all u and all x. But since the family \(\{\varXi ^x_{B'''}\}_x\) has been chosen to be complete, the above equality implies the equality of the operators in Eq. (45).

Now, we can use generalized teleportation and show that

$$\begin{aligned} \varPhi '_A(\cdot )=\sum _uW^u_{B'''}\left\{ {\text {Tr}}_{B''B'}\!\left[ {(\phi ^+_{B'''B''}\otimes \varPhi '_A(\cdot ))\ (I_{B'''}\otimes \beta ^u_{B''B'}) }\right] \right\} (W^u_{B'''})^\dag \;, \end{aligned}$$
(50)

where \(\{\beta ^u_{B''B'}:u\in \mathscr {U} \}\) are the \((\dim \mathscr {H}_{B'})^2\) projectors onto the Bell states, and \(\{W^u_{B'''}:u\in \mathscr {U}\}\) are suitable isometries from \(\mathscr {H}_{B'''}\) to \(\mathscr {H}_{B'}\). But then, using Eq. (45) with \(Q^u_{B''B'}=\beta ^u_{B''B'}\), we obtain

$$\begin{aligned} \varPhi '_A(\cdot )=\sum _uW^u_{B'''}\left\{ {\text {Tr}}_{B''B}\!\left[ {(\phi ^+_{B'''B''}\otimes \varPhi _A(\cdot ))\ (I_{B'''}\otimes P^u_{B''B}) }\right] \right\} (W^u_{B'''})^\dag \;. \end{aligned}$$
(51)

Hence, defining a quantum channel \(\varPsi :\mathsf {L}(\mathscr {H}_B)\rightarrow \mathsf {L}(\mathscr {H}_{B'})\) as

$$\begin{aligned} \varPsi (\cdot )\triangleq \sum _uW^u_{B'''}\left\{ {\text {Tr}}_{B''B}\!\left[ {(\phi ^+_{B'''B''}\otimes \cdot )\ (I_{B'''}\otimes P^u_{B''B}) }\right] \right\} (W^u_{B'''})^\dag \;, \end{aligned}$$
(52)

we finally have that \(\varPhi '=\varPsi \circ \varPhi \), as claimed.   \(\square \)
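The generalized-teleportation identity in Eq. (50) can be verified numerically for qubits. In the sketch below (a minimal check, with \(\varPhi '\) taken to be the identity channel; the choice of corrections as transposed Paulis is convention-dependent), each Bell outcome u yields the branch \(\frac{1}{d^2}\,\sigma _u^*\rho \,\sigma _u^T\), and summing the corrected branches returns \(\rho \):

```python
import numpy as np

d = 2
I2 = np.eye(d, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
paulis = [I2, X, Y, Z]

# Projector onto the maximally entangled state |phi+> = vec(I)/sqrt(d)
phi = np.eye(d).reshape(-1) / np.sqrt(d)
phi_proj = np.outer(phi, phi.conj())

# Random qubit state rho on B' (the channel output; here Phi' = id)
G = np.random.randn(d, d) + 1j * np.random.randn(d, d)
rho = G @ G.conj().T
rho /= np.trace(rho)

# Bell projectors beta^u = (sigma_u (x) I) |phi+><phi+| (sigma_u (x) I)^dag
bells = [np.kron(s, I2) @ phi_proj @ np.kron(s, I2).conj().T for s in paulis]

# Joint state phi+_{B'''B''} (x) rho_{B'}, reshaped to tensor indices
joint = np.kron(phi_proj, rho).reshape([d] * 6)

recovered = np.zeros((d, d), dtype=complex)
for s, beta in zip(paulis, bells):
    B = beta.reshape([d] * 4)
    # branch_u = Tr_{B''B'}[(phi+ (x) rho)(I_{B'''} (x) beta^u)]
    branch = np.einsum('aijbkl,klij->ab', joint, B)
    W = s.T  # correcting map W^u (here a unitary on the qubit)
    recovered += W @ branch @ W.conj().T

assert np.allclose(recovered, rho)  # Eq. (50) with Phi' = id
```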

Remark 9

Theorem 3 holds also if the identity channel \({\text {id}}_{B''}\) is replaced by a complete channel, namely, a channel \(\Upsilon :\mathsf {L}(\mathscr {H}_{B''})\rightarrow \mathsf {L}(\mathscr {H}_{B''})\) that is bijective as a linear map: linearly independent inputs are transformed into linearly independent outputs. This is so because linearly independent states \(\xi ^x_{B''}\) in Eq. (47) remain linearly independent after the action of \(\Upsilon \). In this way, the proof can proceed along the same lines.

We notice, in particular, that a channel can be complete despite being entanglement breaking or measure-and-prepare. This implies that the ensembles used to probe channels \({\text {id}}_{B''}\otimes \varPhi _A\) and \({\text {id}}_{B''}\otimes \varPhi '_A\) can always be chosen, without loss of generality, to comprise separable states only.   \(\square \)

6 The Computational Second Law: An Analogy

The aim of this section is to construct an analogy, clarifying and somewhat strengthening that given by Cover [1], between data-processing theorems and the second law of thermodynamics. In what follows we abandon a formally rigorous language, preferring instead a generic language better suited to highlight the similarities and differences between thermodynamics and information theory.

Fig. 2 Suppose that a system, prepared at time \(t_0\), undergoes a process, and that we observe it at two later times \(t_1\ge t_0\) and \(t_2\ge t_1\). Thermodynamical case: Clausius’ principle and Lieb and Yngvason’s entropy principle state that \(\varDelta H=H(S_2)-H(S_1)\ge 0\) if and only if the process bringing the system from \(t_1\) to \(t_2\) can be realized adiabatically (i.e., exchanging only work and no heat). This is equivalent to saying that: (i) a decrease in entropy can only be achieved by exchanging heat with an external reservoir; (ii) if the process cannot be realized adiabatically, then there is some initial configuration \(S_0\) for which a decrease in entropy occurs. Information-theoretic case: the data-processing inequality and the reverse data-processing theorems state that \(\varDelta H_{\mathrm{min}}=H_{\mathrm{min}}(U|S_2)-H_{\mathrm{min}}(U|S_1)\ge 0\) for all U, if and only if the process bringing the system from \(t_1\) to \(t_2\) is Markov local (i.e., there exists a memoryless channel \(\varPsi \) such that \(S_2=\varPsi (S_1)\)). This is equivalent to saying that: (i) a decrease in the conditional min-entropy can only be achieved in the presence of an external memory storing information about the message and feeding it back into the system at later times; (ii) if the process is not Markov local, then there exists some initial message–signal joint distribution for which a decrease of \(H_{\mathrm{min}}\) occurs

Theorems 1, 2, and 3, apart from the formal complications necessary to describe classical and quantum systems together, all have the same simple interpretation, which we summarize in two statements (A) and (B):

(A) if a process is Markov local (i.e., computationally isolated), then the useful information it carries cannot increase;

and

(B) if the useful information carried by a process never increases, whatever the initial message–signal distribution, then the process is Markov local.

The direct statement (A) corresponds to Cover’s law, as formulated in [1] (see the quotation at the beginning of this paper). Here “useful information” is precisely the information that the signal carries about the message, and it is measured by the conditional min-entropy, which is directly related to the guessing probability. The reverse statement (B), which is a consequence of the reverse data-processing theorems that we proved, corresponds to Lieb and Yngvason’s entropy principle [7].

In order to make our discussion more concrete, let us consider a thermodynamical system prepared at time \(t_0\) and evolving through successive times \(t_1\ge t_0\) and \(t_2\ge t_1\), as depicted in Fig. 2. The second law of thermodynamics, in the formulation usually attributed to Clausius, states that the following inequality is necessarily obeyed:

$$\begin{aligned} \varDelta H\ge \frac{\varDelta Q}{T}, \end{aligned}$$
(53)

where \(\varDelta H=H(S_2)-H(S_1)\) is the change in thermodynamical entropy of the system and \(\varDelta Q\) is the heat absorbed by the system.Footnote 5 The above equation basically says that the only way to decrease the entropy of a system is to extract heat from it. This implies that, if a system is adiabatically isolated (i.e., no heat is exchanged, only mechanical work), then its entropy cannot decrease. Equivalently stated: a decrease in entropy is a definite witness of the fact that the system is not adiabatically isolated and is dumping heat into the environment.

This part of the second law can be seen as the analogue of statement (A) above, that is, the usual data-processing inequality. Suppose now that the system S is an information signal. As before, it is prepared at time \(t_0\) and then undergoes a process that is information-theoretic, rather than thermodynamical. If we observe the signal at two times \(t_1\ge t_0\) and \(t_2\ge t_1\), we know that, if the process is Markov local, the data-processing inequality holds, namely, the information carried by the signal cannot increase going from \(t_1\) to \(t_2\). Therefore, any increase in the information carried by the signal is a definite witness of the fact that the process is not Markov local, namely, that an external memory was used as a side resource at some point along the process.

We now come to the reverse statement (B), arguing that it is the analogue of Lieb’s and Yngvason’s entropy principle. The latter states that, assuming the validity of a set of axioms about simple thermodynamical systems,Footnote 6 a non-decreasing entropy between \(t_1\) and \(t_2\) is not only necessary (Clausius’ principle) but also sufficient for the existence of an adiabatic process between the two times. It is clear that the analogy works in this case too: the reverse data-processing theorems we proved constitute the information-theoretic analogue of Lieb’s and Yngvason’s entropy principle. An overview of the analogy is summarized in the table below.

Despite the tantalizing analogies, there are however two points (at least) that we should keep in mind before jumping to rash conclusions. The first one is that, while in thermodynamics a process is usually given by an initial state and a final state, in information theory a process is a channel, which acts upon any input it receives.

The second point is that the relation presented here between adiabaticity and Markov locality (or memorylessness) has been discussed only on a formal level, and no claim has been made about any quantitative relation between the two concepts. However, we would like to conclude this paper by speculating about the possibility of a deeper relation between adiabaticity and Markov locality, going beyond the formal analogy presented above. An adiabatically isolated system cannot exchange heat, but can interact with a mechanical device and exchange work with it. Since it is possible to imagine a purely mechanical memory (at least of finite size), it seems that the presence of a memory, in itself, should not violate adiabaticity. But then, a scenario similar to that of Maxwell’s demon immediately comes to mind. Indeed, Maxwell’s demon violates the second law using nothing but its memory: its actions, including the measurements it performs, are assumed to be otherwise perfectly adiabatic.Footnote 7 Hence, it seems that adiabaticity does not play well with the presence of an external memory, even if the latter is taken to be perfectly mechanical. This fact suggests that adiabaticity and Markov locality may be even closer than the analogies in Table 1 prima facie suggest. This and other questions are left open for future investigations.

Table 1 Summary of the analogies between the second law of thermodynamics and its computational analogue discussed here