1 Squashed Entanglement

One of the core goals in the theory of entanglement is its quantification, for which purpose a large number of either operationally or mathematically/axiomatically motivated entanglement measures and monotones have been introduced and studied intensely since the 1990s [9, 21].

In this paper we will discuss one specific such measure, the so-called squashed entanglement [13], defined as

$$\begin{aligned} E_{\text {sq}}\left( \rho ^{AB}\right) := \inf \frac{1}{2} I(A:B|E) \text { s.t.} {\text {Tr}}_E \rho ^{ABE}=\rho ^{AB}, \end{aligned}$$
(1)

where \(I(A:B|E) = S(AE)+S(BE)-S(E)-S(ABE)\) is the (quantum) conditional mutual information, which by strong subadditivity of the von Neumann entropy is always non-negative [29]; and \(\rho ^{ABE}\) as above is called an extension of \(\rho ^{AB}\). This definition appears to have been put forward first in [43, 44], where it was also remarked that by restricting the extension of \(\rho ^{AB}\) to have the form \(\rho ^{ABE} = \sum _i p_i | \varphi _i\rangle \!\langle \varphi _i |^{AB} \otimes | i\rangle \!\langle i |^E\), the minimization reduces to the well-known entanglement of formation [5],

$$\begin{aligned} E_F(\rho ^{AB}) = \min \sum _i p_i S(\varphi _i^A) \text { s.t.} \sum _i p_i | \varphi _i\rangle \!\langle \varphi _i | = \rho . \end{aligned}$$
(2)
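
The reduction to Eq. (2) can be checked in one line. The following is only a sketch added for convenience, using that E is a classical register in such an extension and that each \(\varphi _i\) is pure:

$$\begin{aligned} \frac{1}{2} I(A:B|E) = \frac{1}{2}\sum _i p_i\, I(A:B)_{\varphi _i} = \frac{1}{2}\sum _i p_i \bigl ( S(\varphi _i^A)+S(\varphi _i^B)-S(\varphi _i^{AB}) \bigr ) = \sum _i p_i S(\varphi _i^A), \end{aligned}$$

since \(S(\varphi _i^{AB})=0\) and \(S(\varphi _i^A)=S(\varphi _i^B)\) for a pure state.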

While it is fairly straightforward to see from their definitions that both \(E_{\text {sq}}\) and \(E_F\) are convex functions of the state, the former has many properties that the latter lacks, among them additivity and monogamy [13, 26] as well as [14, 20], cf. [9, 42]. Namely, abbreviating \(E_{\text {sq}}(\rho ^{AB}) = E_{\text {sq}}(A:B)\), we have

$$\begin{aligned} E_{\text {sq}}(A:B_1B_2) \ge E_{\text {sq}}(A:B_1) + E_{\text {sq}}(A:B_2). \end{aligned}$$
(3)

In particular, if \(\rho ^{AB}\) is k-extendible, meaning that there exists a state \(\rho ^{AB_1\ldots B_k}\) such that \(\rho ^{AB} = \rho ^{AB_i}\) for all i (and that w.l.o.g. is symmetric with respect to permutations of the B-systems), then

$$\begin{aligned} E_{\text {sq}}(A:B) \le \frac{1}{k}\log |A|. \end{aligned}$$
(4)
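
For completeness, here is a sketch of how Eq. (4) follows from Eq. (3). It additionally assumes the standard bound \(E_{\text {sq}}(A:B) \le \log |A|\), which is not stated explicitly above. Iterating Eq. (3) over the k systems \(B_1,\ldots ,B_k\) and using \(\rho ^{AB_i}=\rho ^{AB}\) for all i,

$$\begin{aligned} k\,E_{\text {sq}}(A:B) = \sum _{i=1}^k E_{\text {sq}}(A:B_i) \le E_{\text {sq}}(A:B_1\ldots B_k) \le \log |A|. \end{aligned}$$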

While clearly \(E_{\text {sq}} \le E_F\), in the other direction, squashed entanglement is an upper bound on the distillable entanglement and indeed on the distillable secret key in a state [9, 13], which makes it very useful to the theory of state distillation and channel capacities, cf. [41].

One of the most desirable properties of a quantitative entanglement measure is faithfulness, i.e. the fact that it is zero if and only if the state is separable, and otherwise strictly positive. To be truly useful, such a statement ought to come in the form of a relationship between the value of the entanglement measure and a suitably chosen distance from the set of separable states. After being an open problem for a while, this was finally obtained a few years ago by Brandão et al. [7], and later improved by us [28].

In the present paper, we will reproduce this finding in a conceptually simple and appealing way, by first showing a relation between the value of squashed entanglement and the distance from k-extendible states, and then invoking a suitable de Finetti theorem to bound the distance from separable states. (That in the limit of \(k\rightarrow \infty \) the state has to be separable was known for some time [36,37,38], but we shall use more recent, quantitative, versions). We go on to contrast this finding with the faithfulness of entanglement of formation. Then, we put the technical result of Fawzi and Renner [17, Theorem 5.1], on which our proof crucially relies, in the context of other conjectured inequalities and subsequent results; motivated by a much more general observation in classical probability, we propose as an open problem to find the “right” quantum generalization.

2 Main Result

Now we show that the monogamy bound, Eq. (4), has a partial converse:

Theorem 1

Consider a state \(\rho ^{AB}\) with \(E_{\text {sq}}(\rho ) \le \epsilon \). Then, for every integer k, there exists a k-extendible state \(\sigma ^{AB}\) such that \(\Vert \rho -\sigma \Vert _1 \le (k-1)\sqrt{2\ln 2}\sqrt{\epsilon }\).

In particular, \(\rho \) is \(O\left( \root 4 \of {\epsilon }\right) \)-close to a \(\varOmega \left( \frac{1}{\root 4 \of {\epsilon }}\right) \)-extendible state.

Corollary 1

For every state \(\rho ^{AB}\) with \(E_{\text {sq}}(\rho ) \le \epsilon \), there exists a separable state \(\sigma \) with

$$\begin{aligned} \Vert \rho -\sigma \Vert _1 \le 3.1 |B| \root 4 \of {\epsilon }. \end{aligned}$$

In particular, squashed entanglement is faithful: \(E_{\text {sq}}(\rho )=0\) if and only if the state \(\rho \) is separable.

For comparison, the earlier result of Brandão et al. [7, Cor. 1] yields

$$\begin{aligned} \Vert \rho -\sigma \Vert _1 \le \sqrt{|A| |B|}\Vert \rho -\sigma \Vert _2 \le 12 \sqrt{|A| |B|} \sqrt{\epsilon }. \end{aligned}$$
(5)

The Hilbert-Schmidt (2-)norm bound seems not available with our techniques, but the trace (1-)norm behaviour is qualitatively reproduced here, albeit with a worse polynomial dependence on \(\epsilon \) but with a slightly better constant. In particular, it is perhaps of interest that in our bound in Corollary 1 only the dimensionality of one of the two systems appears (cf. however [8, Eq. (66)]).

The proof of this theorem relies essentially on a recent result by Fawzi and Renner [17], stating that for every tripartite state \(\rho ^{AEB}\) there exists a cptp map \(\widetilde{R}:\mathcal {L}(E) \rightarrow \mathcal {L}(EB)\) such that

$$\begin{aligned} -\log F\bigl (\rho ^{AEB},({{\text {id}}}_A\otimes \widetilde{R})\rho ^{AE}\bigr )^2 \le I(A:B|E)_\rho , \end{aligned}$$
(6)

with the fidelity F of two states \(\alpha \) and \(\beta \) defined as \(F(\alpha ,\beta ) = \Vert \sqrt{\alpha }\sqrt{\beta } \Vert _1\).

Proof

Choose an extension \(\rho ^{ABE}\) for \(\rho ^{AB}\), and use the map \(\widetilde{R}\) from Eq. (6). Now we employ a basic inequality from [18, Theorem 1], saying

$$\begin{aligned} 1-F(\alpha ,\beta ) \le \frac{1}{2} \Vert \alpha -\beta \Vert _1 \le \sqrt{1-F(\alpha ,\beta )^2}, \end{aligned}$$
(7)

for the fidelity \(F(\alpha ,\beta )=\Vert \sqrt{\alpha }\sqrt{\beta }\Vert _1\). Hence, from Eq. (6),

$$\begin{aligned} t := \sqrt{4\ln 2\,I(A:B|E)} \ge \Vert \rho ^{AEB} - ({{\text {id}}}_A\otimes \widetilde{R})\rho ^{AE} \Vert _1. \end{aligned}$$
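
In more detail, the step can be sketched as follows; the elementary estimate \(1-2^{-x}\le x\ln 2\) for \(x\ge 0\) is not spelled out in the text and is supplied here as an assumption about the intended argument. Write \(I=I(A:B|E)\) and let F denote the fidelity appearing in Eq. (6), so that \(F^2\ge 2^{-I}\). The right hand inequality of Eq. (7) then gives

$$\begin{aligned} \frac{1}{2} \Vert \rho ^{AEB} - ({{\text {id}}}_A\otimes \widetilde{R})\rho ^{AE} \Vert _1 \le \sqrt{1-F^2} \le \sqrt{1-2^{-I}} \le \sqrt{\ln 2\, I}, \end{aligned}$$

and multiplying by 2 yields the bound by t.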

But since \(({{\text {id}}}_A\otimes \widetilde{R})\rho ^{AE} \approx \rho ^{AEB}\), we may apply the same map again, say \(k-1\) times, always to the E system of \(\rho ^{AEB}\), arriving at a state

$$\begin{aligned} \omega ^{AEB_1\ldots B_k} = \left( {{\text {id}}}_A \otimes \widetilde{R}^{E\rightarrow EB_k}\circ \cdots \circ \widetilde{R}^{E\rightarrow EB_2}\right) \rho ^{AEB_1}, \end{aligned}$$

which has the property that for each i, \(\Vert \omega ^{AB_i} - \rho ^{AB} \Vert _1 \le (i-1)t\), by the triangle inequality and the contractive property of the trace norm under cptp maps. Hence, tracing out E and considering the symmetrization of the B systems, i.e.

$$\begin{aligned} \varOmega ^{AB_1\ldots B_k} = \frac{1}{k!}\sum _{\pi \in S_k} (\mathbb {1}\otimes U^{\pi })\omega ^{AB_1\ldots B_k}(\mathbb {1}\otimes U^{\pi })^\dagger , \end{aligned}$$

we have that it is manifestly permutation symmetric on the B systems, and for all i,

$$\begin{aligned} \Vert \varOmega ^{AB_i} - \rho ^{AB} \Vert _1 \le \frac{k-1}{2}t. \end{aligned}$$
(8)
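
The factor \(\frac{k-1}{2}\) arises as an average. As a sketch of this step, assuming only the bound \((j-1)t\) stated above for each \(\omega ^{AB_j}\): the marginal \(\varOmega ^{AB_i}\) is the uniform mixture of the \(\omega ^{AB_j}\), so by convexity of the trace norm

$$\begin{aligned} \Vert \varOmega ^{AB_i} - \rho ^{AB} \Vert _1 \le \frac{1}{k}\sum _{j=1}^k \Vert \omega ^{AB_j} - \rho ^{AB} \Vert _1 \le \frac{t}{k}\sum _{j=1}^k (j-1) = \frac{k-1}{2}t. \end{aligned}$$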

Minimizing over all extensions as required by the definition of squashed entanglement, allowing I(A : B|E) to get arbitrarily close to \(2\epsilon \), concludes the proof of the theorem. \(\square \)

To show the corollary, we use [30, Theorem 2 and Corollary  5] or alternatively [10, Theorem II.7’], which say that a k-extendible state is at trace distance at most \(\frac{2|B|^2}{k}\) from a separable state. To use the former result, which requires Bose-symmetric extensions, we have to go from the permutation symmetric \(\varOmega ^{AB_1\ldots B_k}\) to a permutation invariant purification

$$\begin{aligned}&|\varPsi \rangle ^{AA'B_1B_1'\ldots B_kB_k'} = \left( \sqrt{\varOmega ^{AB_1\ldots B_k}}\otimes \mathbb {1}\right) |\varPhi \rangle ^{AA'}|\varPhi \rangle ^{B_1B_1'}\cdots |\varPhi \rangle ^{B_kB_k'}, \end{aligned}$$

with the non-normalized maximally entangled state \(|\varPhi \rangle = \sum _i |i\rangle |i\rangle \). The choice

$$\begin{aligned} k = \left\lfloor \root 4 \of {\frac{2}{\ln 2}}\frac{|B|}{\root 4 \of {\epsilon }} \right\rfloor \end{aligned}$$

then does the rest. \(\square \)
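
As a numerical sanity check of this choice of k, the following minimal sketch adds the two error terms (the bound of Theorem 1 and the \(\frac{2|B|^2}{k}\) term quoted above) and compares the sum with \(3.1 |B| \root 4 \of {\epsilon }\). The function name `corollary_bound` and the sample values of \(\epsilon \) and |B| are illustrative assumptions, not part of the paper.

```python
import math

def corollary_bound(eps, dim_b):
    """Evaluate the two error terms used in the proof of Corollary 1."""
    # k as chosen in the text: floor((2/ln 2)^(1/4) * |B| / eps^(1/4)).
    k = math.floor((2 / math.log(2)) ** 0.25 * dim_b / eps ** 0.25)
    # Theorem 1: distance from rho to a k-extendible state.
    to_extendible = (k - 1) * math.sqrt(2 * math.log(2)) * math.sqrt(eps)
    # de Finetti step: distance from a k-extendible state to a separable one.
    to_separable = 2 * dim_b ** 2 / k
    return k, to_extendible + to_separable

# Illustrative values (assumptions): eps = 1e-6 and a qubit system B.
eps, dim_b = 1e-6, 2
k, total = corollary_bound(eps, dim_b)
claimed = 3.1 * dim_b * eps ** 0.25
print(k, total, claimed)
```

For these sample values the sum of the two terms stays below the claimed bound; the check says nothing about parameter ranges it does not evaluate.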

3 Comparison with Entanglement of Formation

It is instructive to compare the monogamy relation Eq. (4) and its “converse”, Theorem 1 for the squashed entanglement, with the analogous statements for the entanglement of formation:

Proposition 1

In a bipartite system AB, with \(d=\min \{|A|,|B|\}\), if the state \(\rho ^{AB}\) is \(\delta \)-close in trace norm to a separable state \(\sigma ^{AB}\), meaning that \(\frac{1}{2} \Vert \rho -\sigma \Vert _1 \le \delta \), then

$$\begin{aligned} E_F(\rho ) \le \sqrt{\delta } \log d + (1+\sqrt{\delta }) h_2\!\left( \!\frac{\sqrt{\delta }}{1+\sqrt{\delta }}\!\right) . \end{aligned}$$
(9)

Conversely, if \(E_F(\rho ) \le \epsilon \), then this implies that there is a separable state \(\sigma \) with \(\frac{1}{2} \Vert \rho -\sigma \Vert _1 \le \sqrt{\ln 2}\sqrt{\epsilon }\).

Proof

The first part is originally due to Nielsen [31], with a slightly different form of the bound. The present almost optimal bound is from [48, Corollary 4]. For the second part, consider an optimal decomposition \(\rho = \sum _i p_i | \varphi _i\rangle \!\langle \varphi _i |\), such that

$$\begin{aligned} \begin{aligned} \epsilon \ge \sum _i p_i \frac{1}{2} I(A:B)_{\varphi _i}&\ge \sum _i p_i \frac{1}{4\ln 2}\Vert \varphi _i^{AB} - \varphi _i^{A}\otimes \varphi _i^{B}\Vert _1^2 \\&\ge \frac{1}{4\ln 2} \left\| \rho - \sum _i p_i \varphi _i^{A}\otimes \varphi _i^{B} \right\| _1^2, \end{aligned} \end{aligned}$$

and the right hand state inside the trace norm is manifestly separable. \(\square \)

In other words, while entanglement of formation is essentially about the distance from separable states, squashed entanglement is about the distance from highly extendible states (up to log-dimensionality factors and polynomial relation of \(\epsilon \) and \(\delta \)). Note that squashed entanglement, like the entanglement of formation, is asymptotically continuous [21]: Alicki and Fannes [3] showed that for \(\frac{1}{2} \Vert \rho ^{AB}-\sigma ^{AB}\Vert _1 \le \epsilon \le 1\), \( \displaystyle \bigl | E_{\text {sq}}(\rho )-E_{\text {sq}}(\sigma ) \bigr | \le 16\epsilon \log |A| + 4h_2(2\epsilon ),\) where \(h_2(x) = -x\log x-(1-x)\log (1-x)\) is the binary entropy. Using the bounds presented in [48], it can be improved to

$$\begin{aligned} \bigl | E_{\text {sq}}(\rho )-E_{\text {sq}}(\sigma ) \bigr | \le 4\epsilon \log |A| + 2(1+\epsilon ) h_2\!\left( \!\frac{\epsilon }{1+\epsilon }\!\right) . \end{aligned}$$

This explains the occurrence of states such as the \(d\times d\) fully antisymmetric state \(\alpha _d\), which is at trace distance 1 from the separable states for all d, but has \(E_{\text {sq}}(\alpha _d) \le \frac{2}{d}\) which is arbitrarily small for large d [11, 12]. Indeed, this state is \((d-1)\)-extendible, so by monogamy of \(E_{\text {sq}}\) it has to have small squashed entanglement. Conversely by Theorem 1, this is the only way in which a state can have small squashed entanglement. On the other hand, the large distance from separable, and the dimension-dependent constants in Corollary 1 and Eq. (5), are entirely due to the fact that in large dimension, quite highly extendible states can be far away from being separable.

4 Recovery Maps and Related Facts and Conjectures

The form (6) of the Fawzi–Renner bound [17] was arrived at in a succession of speculative steps. The initial insight is no doubt Petz’s [32, 33], who showed a general statement on the relative entropy

$$\begin{aligned} D(\rho \Vert \sigma ) = {\text {Tr}}\rho (\log \rho -\log \sigma ). \end{aligned}$$

Indeed, while for any two states \(\rho \) and \(\sigma \) on a system H and a cptp map \(T:\mathcal {L}(H)\rightarrow \mathcal {L}(K)\), \(D(\rho \Vert \sigma ) \ge D(T\rho \Vert T\sigma )\) – this is equivalent to strong subadditivity [29] –, Petz showed that equality holds if and only if there exists a cptp map R such that \(RT\sigma =\sigma \) and \(RT\rho =\rho \). What is more, this map can be constructed in a unified way from T and \(\sigma \) alone, as the transpose channel, or Petz recovery map \(R=R(T,\sigma )\), given by

$$\begin{aligned} R(\xi ) = \sqrt{\sigma }\, T^*\!\left( (T\sigma )^{-1/2} \xi (T\sigma )^{-1/2} \right) \! \sqrt{\sigma }, \end{aligned}$$
(10)

where \(T^*\) is the adjoint map to T, at least in the finite dimensional case (cf. [4]). These transpose channels have found increasing attention in recent years, see e.g. [2, 27].

The above problem involving the conditional mutual information is recovered by letting \(T={\text {Tr}}_B\), \(\rho = \rho ^{AEB}\) and \(\sigma = \rho ^A \otimes \rho ^{EB}\), where it can be checked that

$$\begin{aligned} \begin{aligned} I(A:B|E)&= I(A:EB)-I(A:E) \\&= D(\rho ^{AEB}\Vert \rho ^A\otimes \rho ^{EB}) - D(\rho ^{AE}\Vert \rho ^A\otimes \rho ^E). \end{aligned} \end{aligned}$$

In this case, the Petz recovery map reads

$$\begin{aligned} R(\xi ) = \sqrt{\rho ^{EB}} \left( \sqrt{\rho ^E}^{-1} \xi \sqrt{\rho ^E}^{-1} \otimes \mathbb {1}^B \right) \sqrt{\rho ^{EB}}, \end{aligned}$$
(11)

and the recovered state from \(\rho ^{AE}\) is

$$\begin{aligned} \begin{aligned} \omega ^{AEB}&= ({{\text {id}}}^A \otimes R^{E\rightarrow EB})\rho ^{AE} \\&= \sqrt{\rho ^{EB}} \left( \sqrt{\rho ^E}^{-1} \rho ^{AE} \sqrt{\rho ^E}^{-1} \otimes \mathbb {1}^B \right) \sqrt{\rho ^{EB}}. \end{aligned} \end{aligned}$$

This map was used to elucidate the structure of \(\rho ^{AEB}\) in the equality case \(I(A:B|E)=0\) [19]: The result is that there has to exist a decomposition \(E = \bigoplus _j e_j^L \otimes e_j^R\) of E as a direct sum of tensor products, such that

$$\begin{aligned} \rho ^{AEB} = \bigoplus _j p_j \sigma _j^{A e_j^L} \otimes \tau _j^{e_j^R B}. \end{aligned}$$

(In particular, \(\rho ^{AB}\) is separable). Such states have been called “quantum Markov chains” [1].

The recovery map of Fawzi and Renner [17] looks very similar to the form (11):

$$\begin{aligned} \widetilde{R}(\xi ) = V\!\sqrt{\rho ^{EB}}\! \left( \sqrt{\rho ^E}^{-1} U\xi U^\dagger \sqrt{\rho ^E}^{-1} \!\otimes \! \mathbb {1}^B \right) \!\sqrt{\rho ^{EB}} V^\dagger , \end{aligned}$$
(12)

with certain unitaries U (on E) and V (on EB).

The near-equality case of Petz’s theorem seems to have attracted only little attention until recently, for instance as shown here in the context of squashed entanglement, or in the approach of Brandão and Harrow to finite quantum de Finetti theorems [8], or potentially in considerations of many-body physics [25]. One notable exception is the case of a pure state \(\rho ^{ABE}\), for which \(I(A:B|E) = I(A:BE)-I(A:E) \approx 0\) corresponds to the treatment of approximate quantum error correction due to Schumacher and Westmoreland [34].

The conjecture that the Petz recovery map R in Eq. (11) might yield \(\omega ^{ABE} \approx \rho ^{ABE}\) in trace norm seems to have been formulated first by Kim [24], cf. [47]:

$$\begin{aligned} I(A:B|E) {\mathop {\ge }\limits ^{?!}} \varOmega \left( \Vert \rho ^{AEB}-({{\text {id}}}\otimes R)\rho ^{AE}\Vert _1^2\right) . \end{aligned}$$
(13)

See also Zhang [51] (cf. [47] once more) for this, who suggested the generalized version

$$\begin{aligned} D(\rho \Vert \sigma )-D(T\rho \Vert T\sigma ) {\mathop {\ge }\limits ^{?!}} \varOmega \left( \Vert \rho -RT\rho \Vert _1^2\right) . \end{aligned}$$
(14)

Berta et al. [6] then proposed the more natural conjecture with the fidelity on the right hand side of Eq. (13), motivated by the observation that the latter is a Rényi relative entropy:

$$\begin{aligned} I(A:B|E) {\mathop {\ge }\limits ^{?!}} -\log F\bigl ( \rho ^{AEB},({{\text {id}}}\otimes R)\rho ^{AE} \bigr )^2. \end{aligned}$$
(15)

By the well-known relations connecting fidelity and trace norm, this would imply Kim’s conjecture (13). While all of the above conjectures remain open (though supported by increasing numerical evidence), Fawzi and Renner’s Eq. (6) proves a variant of the last inequality, with \(\widetilde{R}\) instead of R. The crucial point of course is that this new map still only acts on E, and as the identity on A.

Similarly, Seshadreesan et al. [35, Conjecture 26 and Sect. 6.1] suggested the following most general form extending (14), encompassing all of the above:

$$\begin{aligned} D(\rho \Vert \sigma )-D(T\rho \Vert T\sigma ) {\mathop {\ge }\limits ^{?!}} -\log F(\rho ,RT\rho )^2, \end{aligned}$$
(16)

again motivated by a way of writing both sides of the above as (Rényi) relative entropies or variants thereof.

Since the first arXiv posting of the present paper, statements of this form have been proven for slight variants of the Petz recovery map, specifically the “swivelled” (or “rotated”) Petz maps (cf. [15])

$$\begin{aligned} R_t(\xi ) = \sigma ^{-it} R\left( T(\sigma )^{it} \xi T(\sigma )^{-it} \right) \sigma ^{it}, \end{aligned}$$

which reduce to the Petz recovery map \(R=R_0\) for \(t=0\), and their convex combinations. Namely, Wilde [46], invoking the Hadamard three-line theorem, shows that there exists a \(t\in {{\mathbb R}}\) (generally depending on all of T, \(\sigma \) and \(\rho \)) such that Eq. (16) [and similarly Eq. (15)] holds with \(R_t\) in place of R:

$$\begin{aligned} D(\rho \Vert \sigma )-D(T\rho \Vert T\sigma ) \ge \inf _t \bigl (-\log F(\rho ,R_tT\rho )^2\bigr ). \end{aligned}$$

This was then extended to infinite dimension by Junge et al. [23], and improved to a universal average over t rather than the minimum on the right hand side:

$$\begin{aligned} D(\rho \Vert \sigma )-D(T\rho \Vert T\sigma ) \ge \int \mathrm{d}t \beta _0(t) \bigl (-\log F(\rho ,R_{t/2}T\rho )^2\bigr ), \end{aligned}$$

with the probability density \(\beta _0(t) = \frac{\pi }{2}(1+\cosh (\pi t))^{-1}\).
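
That \(\beta _0\) is indeed a probability density can be checked numerically. The sketch below integrates it with a plain trapezoidal rule; the helper names `beta0` and `trapezoid`, the truncation of the integral to \([-20,20]\) and the step size are illustrative choices, not taken from the cited works.

```python
import math

def beta0(t):
    """Density beta_0(t) = (pi/2) / (1 + cosh(pi t)) from the text."""
    return (math.pi / 2) / (1 + math.cosh(math.pi * t))

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# The tails beyond |t| = 20 decay like exp(-pi |t|) and are neglected here.
mass = trapezoid(beta0, -20.0, 20.0, 40000)
print(mass)
```

The density is even and sharply peaked at \(t=0\), so the average is dominated by swivelled maps \(R_{t/2}\) with small |t|.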

Sutter et al. [40] presented an essentially elementary, yet highly nontrivial, argument proving a lower bound for some unknown convex combination \(\widetilde{R}\) of the \(R_t\), and in terms of the measured relative entropy:

$$\begin{aligned} D(\rho \Vert \sigma )-D(T\rho \Vert T\sigma ) \ge D_{\mathbb {M}}(\rho \Vert \widetilde{R}T\rho ). \end{aligned}$$

This was again improved by Sutter et al. [39] using complex interpolation tools, yielding

$$\begin{aligned} D(\rho \Vert \sigma )-D(T\rho \Vert T\sigma ) \ge D_{\mathbb {M}}\left( \rho \,\Big \Vert \int \mathrm{d}t \beta _0(t) R_{t/2}T\rho \right) , \end{aligned}$$

with the same function \(\beta _0\) as above.

5 The Classical Case

It is well-known that for classical random variables XYZ, conditional independence, i.e. \(I(X:Z|Y) = 0\), implies that XYZ is a Markov chain in that order. Furthermore, this is a robust characterization, as the following two inequalities show, which we are going to prove. They provide much of the motivation for the conjectures and results presented in the previous section.

Lemma 1

If \(I(X:Z|Y) = \epsilon \) for a distribution P(XYZ), then there exists a Markov chain of the same alphabets, with distribution \(Q(XYZ) = P(XY)P(Z|Y)\), such that the relative entropy distance between P and Q is small: \(D(P_{XYZ}\Vert Q) = \epsilon \). By Pinsker’s inequality, this implies \(\Vert P_{XYZ}-Q \Vert _1 \le \sqrt{2\ln 2}\sqrt{\epsilon }\).

This is a special case of the following more general result:

Lemma’ 1

For any two probability distributions P and Q on the same set \(\mathcal {X}\), and a stochastic map \(T:\mathcal {X}\rightarrow \mathcal {U}\), there exists another stochastic map R, called the transpose channel, and which depends only on Q and T, such that \(RTQ=Q\) and

$$\begin{aligned} D(P\Vert Q) - D(TP\Vert TQ) \ge D(P\Vert RTP). \end{aligned}$$
(17)

Furthermore, this is an identity if T is deterministic.

The transpose channel is defined by the property that

$$\begin{aligned} T(u|x)Q(x) = R(x|u)\,(TQ)(u), \end{aligned}$$

and this is the classical case of Petz’s recovery map.
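
The defining property of the transpose channel can be turned directly into a small numerical experiment. The following sketch builds R from T and Q, then checks \(RTQ=Q\) and the inequality (17) on one example; the helper names and the particular matrix T and distributions P, Q are hypothetical, chosen only for illustration.

```python
import math

def apply(M, v):
    """Apply a stochastic matrix M[out][in] to a probability vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose_channel(T, Q):
    """R[x][u] = t_{ux} q_x / (TQ)_u, i.e. T(u|x)Q(x) = R(x|u)(TQ)(u)."""
    TQ = apply(T, Q)
    return [[T[u][x] * Q[x] / TQ[u] for u in range(len(T))]
            for x in range(len(Q))]

def rel_entropy(P, Q):
    """D(P||Q) in bits, with the convention 0 log 0 = 0."""
    return sum(p * math.log2(p / q) for p, q in zip(P, Q) if p > 0)

# Hypothetical example data (not from the text): 3 inputs, 2 outputs.
T = [[0.9, 0.5, 0.2],
     [0.1, 0.5, 0.8]]
P = [0.6, 0.3, 0.1]
Q = [0.2, 0.3, 0.5]

R = transpose_channel(T, Q)
RTQ = apply(R, apply(T, Q))   # should reproduce Q
RTP = apply(R, apply(T, P))   # recovered version of P
lhs = rel_entropy(P, Q) - rel_entropy(apply(T, P), apply(T, Q))
rhs = rel_entropy(P, RTP)
print(RTQ, lhs, rhs)
```

Note that R is built from T and Q alone, as the lemma requires; P enters only when the two sides of (17) are evaluated.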

Proof

Like many classical entropy inequalities, it is an instance of log-concavity.

We have two probability vectors \(P = (p_x)_{x\in \mathcal {X}}\) and \(Q = (q_x)_{x\in \mathcal {X}}\), and a stochastic matrix \(T = [t_{ux}]_{u\in \mathcal {U}}^{x\in \mathcal {X}}\) (meaning that for all x, \(\sum _{u\in \mathcal {U}} t_{ux} = 1\)). The adjoint of a cptp map translates into the linear map given by the transpose matrix \(T^t\). Then,

$$\begin{aligned} TP = \left( \sum _{x\in \mathcal {X}} t_{ux}p_x \right) _{u\in \mathcal {U}}, \quad TQ = \left( \sum _{x\in \mathcal {X}} t_{ux}q_x \right) _{u\in \mathcal {U}}, \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} RTP&= \Bigl ( q_x \Bigl (T^t \bigl ( (TP)_u/(TQ)_u \bigr )_{u\in \mathcal {U}}\Bigr )_x \Bigr )_{x\in \mathcal {X}} \\&= \left( q_x \sum _{u\in \mathcal {U}} t_{ux}\frac{\sum _{x'} t_{ux'}p_{x'}}{\sum _{x'} t_{ux'}q_{x'}} \right) _{x\in \mathcal {X}}, \end{aligned} \end{aligned}$$

leading to the following expressions for the three relative entropies concerned:

$$\begin{aligned} D(P\Vert Q)&= \sum _{x\in \mathcal {X}} p_x \log \frac{p_x}{q_x}, \\ D(TP\Vert TQ)&= \sum _{u\in \mathcal {U}} \left( \sum _{x\in \mathcal {X}} t_{ux}p_x\right) \log \frac{\sum _{x'} t_{ux'}p_{x'}}{\sum _{x'} t_{ux'}q_{x'}}, \\ D(P\Vert RTP)&= \sum _{x\in \mathcal {X}} p_x \log \left( \frac{p_x}{q_x} \frac{1}{\sum _u t_{ux}\frac{\sum _{x'} t_{ux'}p_{x'}}{\sum _{x'} t_{ux'}q_{x'}}} \right) . \end{aligned}$$

The claimed inequality, that the first expression is larger than or equal to the sum of the last two, can be rearranged as \(D(P\Vert Q) - D(P\Vert RTP) \ge D(TP\Vert TQ)\), which simplifies to

$$\begin{aligned}&\sum _{x\in \mathcal {X}} p_x \log \left( \sum _{u\in \mathcal {U}} t_{ux} \frac{\sum _{x'} t_{ux'}p_{x'}}{\sum _{x'} t_{ux'}q_{x'}} \right) \\&\quad \ge \sum _{x\in \mathcal {X}} p_x \sum _{u\in \mathcal {U}} t_{ux} \log \frac{\sum _{x'} t_{ux'}p_{x'}}{\sum _{x'} t_{ux'}q_{x'}}. \end{aligned}$$

However, this is true for each term x, due to the concavity of the logarithm, and \(\sum _u t_{ux} = 1\).

It can be checked from this that if the channel T is deterministic, i.e. if for each \(x\in \mathcal {X}\) there is only one \(u\in \mathcal {U}\) such that \(t_{ux} > 0\), then equality holds; in particular this is the case where T is the marginal map from \(\mathcal {X}\times \mathcal {Y}\) to \(\mathcal {X}\). \(\square \)
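
The equality case for marginal maps is exactly Lemma 1, and it is easy to test numerically. The sketch below takes a joint distribution P(XYZ), forms the Markov chain \(Q = P(XY)P(Z|Y)\), and compares \(I(X:Z|Y)\) with \(D(P\Vert Q)\) as well as the Pinsker bound; the weights defining P and the helper names are hypothetical, chosen only for illustration.

```python
import math
from itertools import product

# Hypothetical joint distribution P(x, y, z) on binary alphabets.
weights = [3, 1, 2, 2, 1, 4, 2, 1]
keys = list(product(range(2), repeat=3))
P = {xyz: w / sum(weights) for xyz, w in zip(keys, weights)}

def marginal(P, keep):
    """Marginal of P on the coordinate positions listed in `keep`."""
    out = {}
    for xyz, p in P.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def entropy(D):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in D.values() if p > 0)

Pxy, Pyz, Py = marginal(P, (0, 1)), marginal(P, (1, 2)), marginal(P, (1,))

# Markov chain Q(x, y, z) = P(x, y) P(z | y).
Q = {(x, y, z): Pxy[(x, y)] * Pyz[(y, z)] / Py[(y,)] for (x, y, z) in P}

# I(X:Z|Y) = H(XY) + H(YZ) - H(Y) - H(XYZ)
cmi = entropy(Pxy) + entropy(Pyz) - entropy(Py) - entropy(P)
rel = sum(p * math.log2(p / Q[key]) for key, p in P.items() if p > 0)
l1 = sum(abs(P[key] - Q[key]) for key in P)
pinsker = math.sqrt(2 * math.log(2) * cmi)
print(cmi, rel, l1, pinsker)
```

Up to floating-point error the first two printed numbers agree, and the third is bounded by the fourth, as Lemma 1 asserts.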

Observe that the inequality (17) implies the Conjectures (13), (14), (15) and (16) in the classical case, because of \(D(P\Vert Q) \ge -\log F(P,Q)^2\). The results of [46] and [40] reproduce this relaxed version of the classical case, because when restricted to diagonal density matrices, the swivelled Petz maps \(R_t\) reduce to \(R_0=R\) for all t. Notably the approach of [40] is strikingly close to our above classical proof by log-concavity, using pinching to remove non-commutativity and otherwise using only operator monotonicity and concavity of the logarithm; at the same time it relies on looking at asymptotically many copies of the state, which is one of the reasons why \(-\log F\) appears in the end result rather than the relative entropy.

It is known, by numerical counterexamples, that (17) is false in the quantum case, already for qubits, and also restricting to the case \(T={\text {Tr}}_B\), \(\rho = \rho ^{AEB}\) and \(\sigma = \rho ^A \otimes \rho ^{EB}\) [24]. However, one might be tempted to speculate that with a variant of the Fawzi–Renner map, say some \(\widehat{R}\) (perhaps even a rotated or averaged Petz map \(R_t\)), we might have

$$\begin{aligned} I(A:B|E) {\mathop {\ge }\limits ^{?!}} D\bigl ( \rho ^{AEB} \Vert ({{\text {id}}}\otimes \widehat{R})\rho ^{AE} \bigr ), \end{aligned}$$
(18)

which would also imply (6). However, since the first circulation of our earliest unpublished notes [47], this conjecture has been subjected to serious scrutiny, and recently Fawzi and Fawzi [16] have found an explicit counterexample by rigorous numerical computer calculations: there does not exist a map \(\widehat{R}\) recovering \(\sigma \), i.e. \(\widehat{R}T\sigma =\sigma \), and at the same time satisfying Eq. (18).

6 Discussion

We have shown how Fawzi and Renner’s recent breakthrough in the characterization of small quantum conditional mutual information has consequences for the faithfulness of squashed entanglement. We believe that the same approach can also be used to address the faithfulness of the multi-party squashed entanglement [50]; however, technical issues remain, which are explained in the Appendix.

The breakthrough of [17], and the subsequent results, also finally clarify the “right” robust version of quantum Markov chains, which are equivalently characterized by \(I(A:B|E)\approx 0\) and by the existence of a recovery map such that \(\rho ^{AEB} \approx ({{\text {id}}}_A\otimes \widetilde{R})\rho ^{AE}\), cf. [6, Prop. 35]. For classical probability distributions, yet another way of expressing this is to say that there exists a Markov chain close to the given density, but this is not the case in the quantum analogue [11, 12, 22], at least if one wants to avoid introducing strong dimensional dependence.

To conclude, looking back at the conjectures and theorems reviewed above, and contrasting them with the clear picture emerging from the classical case, we wish to suggest a target for further investigation, which takes us in a direction different from the conjecture (16) and its descendants.

Namely, the question is, whether it is possible to define a recovery map \(\widehat{R}=\widehat{R}(T,\sigma )\) for every pair of a cptp map T and a state \(\sigma \) in its domain, such that \(\widehat{R}T\sigma =\sigma \) and

$$\begin{aligned} D(\rho \Vert \sigma )-D(T\rho \Vert T\sigma ) {\mathop {\ge }\limits ^{?!}} \widetilde{D}\bigl (\rho \Vert \widehat{R}T\rho \bigr ), \end{aligned}$$
(19)

with a suitable divergence \(\widetilde{D}\), and such that the following functoriality properties hold.

  • Normalization: To the identity map \({{\text {id}}}\) and any state (of full rank), the identity map is associated: \(\widehat{R}({{\text {id}}},\tau ) = {{\text {id}}}\).

  • Tensor: If \(\widehat{R}_i = \widehat{R}(T_i,\sigma _i)\) is associated to maps \(T_i\) and states \(\sigma _i\), then the map associated to \(T_1\otimes T_2\) and state \(\sigma _1\otimes \sigma _2\), is \(\widehat{R}(T_1\otimes T_2,\sigma _1\otimes \sigma _2) = \widehat{R}_1 \otimes \widehat{R}_2\).

This would clearly imply the inequality (18), with \(\widetilde{D}\) in place of D. Hence, it cannot be true for the usual (Umegaki) relative entropy [16]. Note that the Petz map quite evidently obeys the functoriality properties, in fact in addition also another one:

  • Composition: For cptp maps \(T_i\) on suitable spaces, such that we can form their composition \(T_2 \circ T_1\), and a state \(\sigma \) such that we have associated maps \(\widehat{R}_1 = \widehat{R}(T_1,\sigma )\) and \(\widehat{R}_2 = \widehat{R}(T_2,T_1\sigma )\), we have \(\widehat{R}(T_2\circ T_1,\sigma ) = \widehat{R}_1 \circ \widehat{R}_2\).

Can all these constraints be satisfied simultaneously? And if so, what would be the applications of such a result? Note that the Petz recovery map is a very useful tool in “pretty good” state discrimination and quantum error correction [4, 34]; the functoriality above along with (19) is meant to preserve these good properties. The current status of this question is the following: We know that one can indeed define a “universal” recovery map \(\widehat{R}\) for inequality (19), with either \(\widetilde{D}=-\log F\) or \(\widetilde{D}=D_{\mathbb {M}}\) – in fact in the convex hull of the swivelled Petz maps \(R_t\) –, where universality refers to the map depending only on T and \(\sigma \). It furthermore satisfies the normalization property, as well as tensorization with the identity [23, 39].