1 Introduction

One-shot information theory investigates the behaviour of tasks in communication and cryptography under general unstructured processes, as opposed to independent and identically distributed (i.i.d.) processes, where the states or the tasks themselves have a certain tensor product structure. This is crucial for information-theoretically secure cryptography, where one cannot place any assumptions on the actions of the adversary (see, for example, [2, 3]). To prove security for such protocols, a common strategy is to show that some smooth min-entropy is sufficiently large. For this reason, the smooth min-entropy [4, 5] is one of the most important quantities in one-shot information theory.

The smooth min-entropy \(H^{\epsilon }_{\min }(K|E)_{\rho }\) for the classical-quantum state \({\rho = \sum _{k} p(k)|{k}\rangle \langle {k}|\otimes \rho _{E|k}}\) characterises the amount of randomness that can be extracted from the classical register K independently of the adversary's register E [6]. It behaves very differently from the von Neumann conditional entropy, which characterises tasks in the i.i.d. setting, and the difference between the two can be very large. Roughly speaking, the smooth min-entropy places a much higher weight on the worst possible scenario of the conditioning register, whereas the von Neumann entropy places an equal weight on all possible scenarios.
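As a simple illustration of this difference (a standard example, in the spirit of the examples discussed in Sect. 4), let K be a uniformly random n-bit string and suppose the register E contains a copy of K with probability \(\epsilon \) and is blank otherwise. Then

$$\begin{aligned} H(K|E)_{\rho } = (1-\epsilon ) n, \qquad \text {whereas} \qquad H_{\min }(K|E)_{\rho } \le \log \frac{1}{\epsilon } \end{aligned}$$

for the (unsmoothed) min-entropy, since the adversary can guess K from E with probability at least \(\epsilon \). The von Neumann entropy averages over the two scenarios, while the min-entropy is dominated entirely by the worst one.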

An important and interesting argument, which works with the von Neumann conditional entropy but fails with the smooth min-entropy, is that of proving lower bounds on the entropy using an approximation chain. We call a sequence of states \((\sigma ^{(k)}_{A_1^k B})_{k=1}^n\) an \(\epsilon \)-approximation chain (Footnote 1) for the state \(\rho _{A_1^n B}\) if for every k we can approximate the partial state \(\rho _{A_1^k B}\) as \(\Vert {\rho _{A_1^k B} - \sigma ^{(k)}_{A_1^k B}}\Vert _1 \le {\epsilon }\). If one can further prove that these states satisfy \(H(A_k | A_{1}^{k-1} B)_{\sigma ^{(k)}} \ge c\) for some \(c>0\) sufficiently large, then the following simple argument shows that \(H(A_1^n | B)_{\rho }\) is large:

$$\begin{aligned} H(A_1^n | B)_{\rho }&= \sum _{k=1}^n H(A_k | A_{1}^{k-1} B)_{\rho } \\&\ge \sum _{k=1}^n \left( H(A_k | A_{1}^{k-1} B)_{\sigma ^{(k)}} - g(\epsilon ) \right) \\&\ge n(c- g(\epsilon )) \end{aligned}$$

where we used the continuity of the von Neumann conditional entropy in the second line (\(g(\epsilon ) = O(\epsilon \log \frac{|A|}{\epsilon })\) is a “small” function of \(\epsilon \)). It is well known that a similar argument is not possible with the smooth min-entropy. Consequently, identities for the smooth min-entropy, like the chain rules [7], are much more restrictive. Tools like entropy accumulation [8, 9] also seem quite rigid, in the sense that they cannot be applied unless certain (Markov chain or non-signalling) conditions hold, and it is not clear how one could relax these conditions. In this paper, we consider scenarios consisting of approximation chains, similar to the one above, along with additional conditions, and we prove lower bounds on the appropriate smooth min-entropies.

We begin by considering the scenario of approximately independent registers, that is, a state \(\rho _{A_1^n B}\), which for every \(1 \le k \le n\) satisfies

$$\begin{aligned} \frac{1}{2}\left\| \rho _{A_1^k B} - \rho _{A_k} \otimes \rho _{A_1^{k-1} B} \right\| _1 \le \epsilon \end{aligned}$$
(1)

for some small \(\epsilon >0\) and arbitrarily large n (in particular, \(n \gg \frac{1}{\epsilon }\)). That is, for every k, the system \(A_k\) is almost independent of the system B and of everything that came before it. For simplicity, let us further assume that \(\rho _{A_k} = \rho _{A_1}\) for all k. Intuitively, one expects that the smooth min-entropy for such a state (with the smoothing parameter depending on \(\epsilon \) and not on n; see Footnote 2) will be large and close to \(n (H(A_1)-g'(\epsilon ))\) (for some small function \(g'(\epsilon )\)). However, it is not possible to prove this result using techniques that rely only on the triangle inequality and smoothing. The triangle inequality, in general, can only be used to bound the trace distance between \(\rho _{A_1^n B}\) and \(\otimes _{k=1}^n \rho _{A_k} \otimes \rho _B\) by \(n \epsilon \), which results in a trivial bound when \(n \gg \frac{1}{\epsilon }\) (Footnote 3). Instead, we show that a bound on the entropic distance given by the smooth max-relative entropy between these two states can be used to prove a lower bound for the smooth min-entropy in this scenario.
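For completeness, here is the telescoping estimate behind this \(n \epsilon \) bound (a standard hybrid argument): since tensoring both states with the same fixed state does not change the trace distance, the triangle inequality gives

$$\begin{aligned} \frac{1}{2}\left\| \rho _{A_1^n B} - \bigotimes _{k=1}^n \rho _{A_k} \otimes \rho _B \right\| _1 \le \sum _{k=1}^n \frac{1}{2}\left\| \rho _{A_1^k B} - \rho _{A_k} \otimes \rho _{A_1^{k-1} B} \right\| _1 \le n \epsilon \end{aligned}$$

where the kth term compares the hybrid states \(\bigotimes _{j=k+1}^n \rho _{A_j} \otimes \rho _{A_1^{k} B}\) and \(\bigotimes _{j=k}^n \rho _{A_j} \otimes \rho _{A_1^{k-1} B}\). Each step of the telescope costs \(\epsilon \), and nothing in this argument prevents the total from growing linearly in n.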

While an upper bound of \(n\epsilon \) is trivial and meaningless for the trace distance when n is large, it is still a meaningful bound for the relative entropy between two states, which is unbounded in general. We can show that the approximation conditions above (Eq. 1) also imply that the relative entropy between \(\rho _{A_1^n B}\) and \(\otimes _{k=1}^n \rho _{A_k} \otimes \rho _B\) is at most \(n f(\epsilon )\) for some small function \(f(\epsilon )\). The substate theorem [10] allows us to transform this relative entropy bound into a smooth max-relative entropy bound. For two general states \(\rho _{AB}\) and \(\eta _{AB}\) with \(d:= D^{\delta }_{\max }(\rho _{AB} || \eta _{AB})\), we can easily bound the smooth min-entropy of \(\rho \) in terms of the min-entropy of \(\eta \) by observing that

$$\begin{aligned} \rho _{AB} \approx _{\delta } \tilde{\rho }_{AB} \le 2^d \eta _{AB} \le 2^{-(H_{\min }(A|B)_{\eta } - d)} {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \sigma _B \end{aligned}$$
(2)

for some state \(\sigma _B\), which satisfies \(D_{\max }(\eta _{AB} || {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \sigma _B) = -H_{\min }(A|B)_{\eta }\). This implies that

$$\begin{aligned} H_{\min }^{\delta }(A|B)_{\rho } \ge H_{\min }(A|B)_{\eta } - D^{\delta }_{\max }(\rho _{AB} || \eta _{AB}) \end{aligned}$$
(3)

We call this an entropic triangle inequality, since it is based on the triangle inequality property of \(D_{\max }\). We can further improve this smooth min-entropy triangle inequality to (Lemma 3.5)

$$\begin{aligned} H_{\min }^{\epsilon +\delta }(A|B)_{\rho }&\ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \frac{\alpha }{\alpha -1} D^\epsilon _{\max } (\rho _{AB}|| \eta _{AB}) - \frac{g_1(\delta , \epsilon )}{\alpha -1} \end{aligned}$$
(4)

for some function \(g_1\), \(\epsilon + \delta <1\) and \(1 < \alpha \le 2\). Our general strategy for the scenarios considered in this paper is to first bound the “one-shot information theoretic” distance (the smooth max-relative entropy) between the real state \(\rho \) (\(\rho _{A_1^n B}\) in the above scenario) and a virtual but nicer state \(\eta \) (\(\otimes _{k=1}^n \rho _{A_k} \otimes \rho _B\) above) by \(nf(\epsilon )\) for some small \(f(\epsilon )\). Then, we use Eq. 4 above to reduce the problem of bounding the smooth min-entropy of the state \(\rho \) to that of bounding an \(\alpha \)-Rényi entropy of the state \(\eta \). Using this strategy, in Corollary 4.4, we prove that for states satisfying the approximately independent registers assumption, we have for \(\delta = O\left( \epsilon \log \frac{|A|}{\epsilon } \right) \) that

$$\begin{aligned} H_{\min }^{\delta ^{\frac{1}{4}}}(A_1^n|B)_{\rho }&\ge n \left( H (A_1)_{\rho } - O(\delta ^{\frac{1}{4}}) \right) - O\left( \frac{1}{\delta ^{3/4}} \right) . \end{aligned}$$
(5)

Another scenario we consider here is that of approximate entropy accumulation. In the setting for entropy accumulation, a sequence of channels \({{\,\mathrm{\mathcal {M}}\,}}_k: R_{k-1} \rightarrow A_k B_k R_k\) for \(1 \le k \le n\) acts sequentially on a state \(\rho ^{(0)}_{R_0 E}\) to produce the state \(\rho _{A_1^n B_1^n E} = {{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E})\). It is assumed that the channels \({{\,\mathrm{\mathcal {M}}\,}}_k\) are such that the Markov chain \(A_1^{k-1} \leftrightarrow B_1^{k-1} E \leftrightarrow B_k\) is satisfied for every k. This ensures that the register \(B_k\) does not reveal any additional information about \(A_1^{k-1}\) beyond what was previously revealed by \(B_1^{k-1}E\). The entropy accumulation theorem [8] then provides a tight lower bound for the smooth min-entropy \(H^{\delta }_{\min }(A_1^n | B_1^n E)\). We consider an approximate version of the above setting where the channels \({{\,\mathrm{\mathcal {M}}\,}}_k\) themselves do not necessarily satisfy the Markov chain condition, but can be \(\epsilon \)-approximated by a sequence of channels \({{\,\mathrm{\mathcal {M}}\,}}'_k\), which satisfy certain Markov chain conditions. Such relaxations are important for understanding the behaviour of cryptographic protocols, like device-independent quantum key distribution [11, 12], which are implemented with imperfect devices [13, 14]. Once again, we can model this scenario as an approximation chain: for every \(1 \le k \le n\), the state produced in the kth step satisfies

$$\begin{aligned} \rho _{A_1^k B_1^k E}&= {{\,\textrm{tr}\,}}_{R_k} \circ {{\,\mathrm{\mathcal {M}}\,}}_k \left( {{\,\mathrm{\mathcal {M}}\,}}_{k-1} \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) \right) \\&\approx _{\epsilon } {{\,\textrm{tr}\,}}_{R_k} \circ {{\,\mathrm{\mathcal {M}}\,}}'_k \left( {{\,\mathrm{\mathcal {M}}\,}}_{k-1} \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) \right) := \sigma ^{(k)}_{A_1^k B_1^k E}. \end{aligned}$$

Moreover, the assumptions on the channel \({{\,\mathrm{\mathcal {M}}\,}}'_k\) guarantee that the state \(\sigma ^{(k)}_{A_1^k B_1^k E}\) satisfies the Markov chain condition \(A_1^{k-1} \leftrightarrow B_1^{k-1} E \leftrightarrow B_k\), and so the chain rules and bounds used for entropy accumulation apply to it too. Roughly speaking, we use the chain rules for divergences [15] to show that the divergence between the real state \(\rho _{A_1^n B_1^n E} = {{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E})\) and the virtual state \(\sigma _{A_1^n B_1^n E} = {{\,\mathrm{\mathcal {M}}\,}}'_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}'_1(\rho ^{(0)}_{R_0 E})\) is relatively small, and then reduce the problem of lower bounding the smooth min-entropy of \(\rho _{A_1^n B_1^n E}\) to that of lower bounding an \(\alpha \)-Rényi entropy of \(\sigma _{A_1^n B_1^n E}\), which can be done using the chain rules developed for entropy accumulation (Footnote 4). In Theorem 5.1, we show the following smooth min-entropy lower bound for the state \(\rho _{A_1^n B_1^n E}\) for sufficiently small \(\epsilon \) and an arbitrary \(\delta >0\):

$$\begin{aligned} H_{\min }^{\delta }(A_1^n|B_1^n E)_{\rho } \ge \sum _{k=1}^n \inf _{\omega }&H(A_k | B_k \tilde{R}_{k-1})_{{{\,\mathrm{\mathcal {M}}\,}}'_k(\omega )} - nO(\epsilon ^{\frac{1}{24}}) - O\left( \frac{1}{\epsilon ^{\frac{1}{24}}} \right) \end{aligned}$$
(6)

where the infimum is over all possible input states \(\omega _{R_{k-1} \tilde{R}_{k-1}}\) for a reference register \(\tilde{R}_{k-1}\) isomorphic to \(R_{k-1}\), and the dimensions |A| and |B| are assumed to be constant in the asymptotic notation.

In the companion paper [1], we use the techniques developed in this paper to provide a solution to the source correlation problem in quantum key distribution (QKD) [16]. Briefly speaking, the security proofs of QKD require that one of the honest parties produce randomly and independently sampled quantum states in each round of the protocol. However, the states produced by a realistic quantum source will be somewhat correlated across different rounds due to imperfections. These correlations are called source correlations. Proving security for QKD with such a correlated source is challenging, and no satisfying general solution was known. In [1], we use the entropic triangle inequality to reduce the security of a QKD protocol with a correlated source to that of the QKD protocol with a depolarised variant of the perfect source, for which security can be proven using existing techniques.

2 Background and Notation

For n quantum registers \((X_1, X_2, \ldots , X_n)\), the notation \(X_i^j\) refers to the set of registers \((X_i, X_{i+1}, \ldots , X_{j})\). We use the notation [n] to denote the set \(\{1,2, \ldots , n\}\). For a register A, |A| represents the dimension of the underlying Hilbert space. If X and Y are Hermitian operators, then the operator inequality \(X \ge Y\) denotes the fact that \(X-Y\) is a positive semidefinite operator and \(X>Y\) denotes that \(X-Y\) is a strictly positive operator. A quantum state (or briefly just state) refers to a positive semidefinite operator with unit trace. At times, we will also need to consider positive semidefinite operators with trace less than or equal to 1. We call these operators subnormalised states. We will denote the set of registers a quantum state describes (equivalently, its Hilbert space) using a subscript. For example, a quantum state on registers A and B will be written as \(\rho _{AB}\), and its partial states on registers A and B will be denoted as \(\rho _{A}\) and \(\rho _{B}\). The identity operator on register A is denoted by \({{\,\mathrm{\mathbb {1}}\,}}_A\). A classical-quantum state on registers X and B is given by \(\rho _{XB} = \sum _{x} p(x) |{x}\rangle \langle {x}| \otimes \rho _{B|x}\), where \(\rho _{B|x}\) are normalised quantum states on register B.

The term “channel” is used for completely positive trace preserving (CPTP) linear maps between two spaces of Hermitian operators. A channel \({{\,\mathrm{\mathcal {N}}\,}}\) mapping registers A to B will be denoted by \({{\,\mathrm{\mathcal {N}}\,}}_{A \rightarrow B}\). We write \(\text {supp}(X)\) to denote the support of the Hermitian operator X and use \(X \ll Y\) to denote that \(\text {supp}(X) \subseteq \text {supp}(Y)\).

The trace norm is defined as \(\left\| X \right\| _1:= {{\,\textrm{tr}\,}}\big (\left( X^\dag X \right) ^{\frac{1}{2}}\big )\). The fidelity between two positive operators P and Q is defined as \(F(P,Q)= \left\| \sqrt{P}\sqrt{Q} \right\| _1^2\). The generalised fidelity between two subnormalised states \(\rho \) and \(\sigma \) is defined as

$$\begin{aligned} F_*(\rho , \sigma ) := \left( \left\| \sqrt{\rho }\sqrt{\sigma } \right\| _1 + \sqrt{(1- {{\,\textrm{tr}\,}}\rho )(1- {{\,\textrm{tr}\,}}\sigma )} \right) ^2. \end{aligned}$$
(7)

The purified distance between two subnormalised states \(\rho \) and \(\sigma \) is defined as

$$\begin{aligned} P(\rho , \sigma ) = \sqrt{1- F_{*}(\rho , \sigma )}. \end{aligned}$$
(8)

We will also use the diamond norm distance as a measure of the distance between two channels. For a linear transform \({{\,\mathrm{\mathcal {N}}\,}}_{A \rightarrow B}\) from operators on register A to operators on register B, the diamond norm distance is defined as

$$\begin{aligned} \left\| {{\,\mathrm{\mathcal {N}}\,}}_{A \rightarrow B} \right\| _{\diamond } := \max _{X_{AR}: \left\| X_{AR} \right\| _1 \le 1} \left\| {{\,\mathrm{\mathcal {N}}\,}}_{A \rightarrow B}(X_{AR}) \right\| _1 \end{aligned}$$
(9)

where the optimisation is over all Hilbert spaces R (fixing \(|R|= |A|\) is sufficient) and all operators \(X_{AR}\) such that \(\left\| X_{AR} \right\| _1 \le 1\).

Throughout this paper, we use base 2 for both the functions \(\log \) and \(\exp \). We follow the notation in Tomamichel’s book [17] for Rényi entropies. For \(\alpha \in (0,1) \cup (1,2)\), the Petz \(\alpha \)-Rényi relative entropy between the positive operators P and Q is defined as

$$\begin{aligned} \bar{D}_{\alpha }(P || Q) = {\left\{ \begin{array}{ll} \frac{1}{\alpha -1} \log \frac{{{\,\textrm{tr}\,}}\left( P^\alpha Q^{1-\alpha } \right) }{{{\,\textrm{tr}\,}}(P)} &{} \text { if } (\alpha <1 \text { and } P \not \perp Q) \text { or } (P \ll Q)\\ \infty &{} \text { else}. \end{array}\right. } \end{aligned}$$
(10)

The sandwiched \(\alpha \)-Rényi relative entropy for \(\alpha \in [\frac{1}{2},1) \cup (1,\infty ]\) between the positive operators P and Q is defined as

$$\begin{aligned} \tilde{D}_{\alpha }(P || Q) = {\left\{ \begin{array}{ll} \frac{1}{\alpha -1} \log \frac{{{\,\textrm{tr}\,}}(Q^{-\frac{\alpha '}{2}} P Q^{-\frac{\alpha '}{2}})^{\alpha }}{{{\,\textrm{tr}\,}}(P)} &{} \text { if } (\alpha <1 \text { and } P \not \perp Q) \text { or } (P \ll Q)\\ \infty &{} \text { else}. \end{array}\right. } \end{aligned}$$
(11)

where \(\alpha ' = \frac{\alpha -1}{\alpha }\). In the limit \(\alpha \rightarrow \infty \), the sandwiched divergence becomes equal to the max-relative entropy, \(D_{\max }\), which is defined as

$$\begin{aligned} D_{\max } (P|| Q) := \inf \left\{ \lambda \in \mathbb {R}: P \le 2^{\lambda }Q \right\} . \end{aligned}$$
(12)
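For commuting operators, \(D_{\max }\) reduces to a maximum ratio of eigenvalues. For instance (a small example of our own), for the classical distributions \(P = (p, 1-p)\) and \(Q = (q, 1-q)\),

$$\begin{aligned} D_{\max }(P || Q) = \log \max \left\{ \frac{p}{q}, \frac{1-p}{1-q} \right\} , \end{aligned}$$

so \(D_{\max }((0.5, 0.5) \,||\, (0.6, 0.4)) = \log 1.25 \approx 0.32\), even though the two distributions are only 0.1 apart in total variation distance.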

In the limit of \(\alpha \rightarrow 1\), both the Petz and the sandwiched relative entropies equal the quantum relative entropy, D(P||Q), which is defined as

$$\begin{aligned} D(P|| Q) := {\left\{ \begin{array}{ll} \frac{{{\,\textrm{tr}\,}}\left( P \log P - P \log Q \right) }{{{\,\textrm{tr}\,}}(P)} &{} \text { if } (P \ll Q)\\ \infty &{} \text { else}. \end{array}\right. } \end{aligned}$$
(13)

Given any divergence \(\mathbb {D}\), we can define the (stabilised) channel divergence based on \(\mathbb {D}\) between two channels \({{\,\mathrm{\mathcal {N}}\,}}_{A \rightarrow B}\) and \({{\,\mathrm{\mathcal {M}}\,}}_{A \rightarrow B}\) as [18, 19]

$$\begin{aligned} \mathbb {D}({{\,\mathrm{\mathcal {N}}\,}}|| {{\,\mathrm{\mathcal {M}}\,}}) := \sup _{\rho _{AR}} \mathbb {D}({{\,\mathrm{\mathcal {N}}\,}}_{A \rightarrow B}(\rho _{AR}) || {{\,\mathrm{\mathcal {M}}\,}}_{A \rightarrow B}(\rho _{AR})) \end{aligned}$$
(14)

where R is a reference register of arbitrary size (\(|R| = |A|\) can be chosen when \(\mathbb {D}\) satisfies the data processing inequality).

We can use the divergences defined above to define the following conditional entropies for the subnormalised state \(\rho _{AB}\):

$$\begin{aligned} \bar{H}_{\alpha }^{\uparrow } (A|B)_{\rho }&:= \sup _{\sigma _B} - \bar{D}_{\alpha }(\rho _{AB} || {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \sigma _B) \end{aligned}$$
(15)
$$\begin{aligned} \tilde{H}_{\alpha }^{\uparrow } (A|B)_{\rho }&:= \sup _{\sigma _B} - \tilde{D}_{\alpha }(\rho _{AB} || {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \sigma _B) \end{aligned}$$
(16)
$$\begin{aligned} \bar{H}_{\alpha }^{\downarrow } (A|B)_{\rho }&:= - \bar{D}_{\alpha }(\rho _{AB} || {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \rho _B) \end{aligned}$$
(17)
$$\begin{aligned} \tilde{H}_{\alpha }^{\downarrow } (A|B)_{\rho }&:= - \tilde{D}_{\alpha }(\rho _{AB} || {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \rho _B) \end{aligned}$$
(18)

for appropriate \(\alpha \) in the domain of the divergences. The supremum in the definition for \(\bar{H}_{\alpha }^{\uparrow }\) and \( \tilde{H}_{\alpha }^{\uparrow }\) is over all quantum states \(\sigma _B\) on register B.

In the limit \(\alpha \rightarrow 1\), all these conditional entropies equal the von Neumann conditional entropy H(A|B). The quantity \(\tilde{H}_{\infty }^{\uparrow } (A|B)_{\rho }\) is called the min-entropy. It is denoted by \(H_{\min }(A|B)_{\rho }\) and, for a subnormalised state, can also be defined as

$$\begin{aligned} H_{\min }(A|B)_{\rho }&:= \sup \left\{ \lambda \in \mathbb {R}: \text { there exists state }\sigma _B \text { such that } \rho _{AB} \le 2^{-\lambda } {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \sigma _B \right\} . \end{aligned}$$
(19)
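When the register B is trivial, the definition above simplifies: the optimal choice is \(2^{-\lambda } = \left\| \rho _A \right\| _{\infty }\), so that

$$\begin{aligned} H_{\min }(A)_{\rho } = - \log \left\| \rho _A \right\| _{\infty }, \end{aligned}$$

which for a classical distribution is \(-\log \max _a p(a)\), determined by the single most likely outcome. More generally, it is known that for a classical-quantum state \(\rho _{XB}\), the quantity \(2^{-H_{\min }(X|B)_{\rho }}\) equals the optimal probability of guessing X given the register B.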

For the purpose of smoothing, define the \(\epsilon \)-ball around the subnormalised state \(\rho \) as the set

$$\begin{aligned} B_{\epsilon }(\rho ) := \{ \tilde{\rho } \ge 0 : P(\rho , \tilde{\rho }) \le \epsilon \text { and } {{\,\textrm{tr}\,}}\tilde{\rho } \le 1\}. \end{aligned}$$
(20)

We define the smooth max-relative entropy as

$$\begin{aligned} D_{\max }^{\epsilon }(\rho || \sigma ) := \min _{\tilde{\rho } \in B_{\epsilon }(\rho )} D_{\max }(\tilde{\rho } || \sigma ) \end{aligned}$$
(21)

The smooth min-entropy of \(\rho _{AB}\) is defined as

$$\begin{aligned} H_{\min }^{\epsilon }(A|B)_{\rho } := \max _{\tilde{\rho } \in B_{\epsilon }(\rho )} H_{\min }(A|B)_{\tilde{\rho }}. \end{aligned}$$
(22)

3 Entropic Triangle Inequality for the Smooth Min-entropy

In this section, we derive a simple entropic triangle inequality (Lemma 3.5) for the smooth min-entropy of the form in Eq. 4. This lemma is a direct consequence of the following triangle inequality for \(\tilde{D}_{\alpha }\) (see [20, Theorem 3.1] for a triangle inequality that changes the second argument of \(\tilde{D}_{\alpha }\)).

Lemma 3.1

Let \(\rho \) and \(\eta \) be subnormalised states and let Q be a positive operator. Then for \(\alpha >1\), we have

$$\begin{aligned} \tilde{D}_{\alpha }(\rho || Q) \le \tilde{D}_{\alpha }(\eta || Q) + \frac{\alpha }{\alpha -1} D_{\max }(\rho || \eta ) + \frac{1}{\alpha -1} \log \frac{{{\,\textrm{tr}\,}}(\eta )}{{{\,\textrm{tr}\,}}(\rho )} \end{aligned}$$

and for \(\alpha <1\) if one of \(\tilde{D}_{\alpha }(\eta || Q)\) and \(D_{\max }(\rho || \eta )\) is finite (otherwise we cannot define their difference), we have

$$\begin{aligned} \tilde{D}_{\alpha }(\rho || Q) \ge \tilde{D}_{\alpha }(\eta || Q) - \frac{\alpha }{1- \alpha } D_{\max }(\rho || \eta )- \frac{1}{1-\alpha } \log \frac{{{\,\textrm{tr}\,}}(\eta )}{{{\,\textrm{tr}\,}}(\rho )}. \end{aligned}$$

Proof

If \(D_{\max }(\rho || \eta )= \infty \), then both statements are trivially true. Otherwise, we have that \(\rho \le 2^{D_{\max }(\rho || \eta )} \eta \) and also \(\rho \ll \eta \). Now, if \(\rho \not \ll Q\), then \(\eta \not \ll Q\). Hence, for \(\alpha >1\), if \(\tilde{D}_{\alpha }(\rho || Q) = \infty \), then \(\tilde{D}_{\alpha }(\eta || Q) = \infty \), which means the Lemma is also satisfied in this case. For \(\alpha <1\), if \(\tilde{D}_{\alpha }(\rho || Q) = \infty \), then the Lemma is trivially satisfied. For the remaining cases, we have

$$\begin{aligned} 2^{(\alpha -1) \tilde{D}_{\alpha }(\rho || Q)}&= \frac{{{\,\textrm{tr}\,}}\left( Q^{-\frac{\alpha -1}{2\alpha }} \rho Q^{-\frac{\alpha -1}{2\alpha }} \right) ^{\alpha }}{{{\,\textrm{tr}\,}}(\rho )} \\&\le \frac{{{\,\textrm{tr}\,}}\left( Q^{-\frac{\alpha -1}{2\alpha }} 2^{D_{\max }(\rho || \eta )} \eta Q^{-\frac{\alpha -1}{2\alpha }} \right) ^{\alpha }}{{{\,\textrm{tr}\,}}(\rho )}\\&= \frac{{{\,\textrm{tr}\,}}(\eta )}{{{\,\textrm{tr}\,}}(\rho )} 2^{\alpha D_{\max }(\rho || \eta )} 2^{(\alpha -1) \tilde{D}_{\alpha }(\eta || Q)} \end{aligned}$$

where we used the fact that \(X \le Y\) implies \({{\,\textrm{tr}\,}}(f(X)) \le {{\,\textrm{tr}\,}}(f(Y))\) for a monotone increasing function f. Dividing by \((\alpha -1)\) now gives the result; for \(\alpha < 1\), the factor \(\alpha - 1\) is negative, which reverses the inequality. \(\square \)
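As a quick numerical sanity check of Lemma 3.1 in the commuting case (with numbers chosen purely for illustration), take \(\alpha = 2\), \(Q = {{\,\mathrm{\mathbb {1}}\,}}\), \(\rho = \mathrm {diag}(0.5, 0.5)\) and \(\eta = \mathrm {diag}(0.6, 0.4)\). Then

$$\begin{aligned} \tilde{D}_{2}(\rho || {{\,\mathrm{\mathbb {1}}\,}}) = \log (0.5^2 + 0.5^2) = -1, \qquad \tilde{D}_{2}(\eta || {{\,\mathrm{\mathbb {1}}\,}}) = \log (0.6^2 + 0.4^2) \approx -0.94, \end{aligned}$$

and \(D_{\max }(\rho || \eta ) = \log \frac{0.5}{0.4} \approx 0.32\), so the right-hand side of the first inequality evaluates to \(-0.94 + 2 \cdot 0.32 \approx -0.30 \ge -1\), as the lemma requires (the \(\log \frac{{{\,\textrm{tr}\,}}(\eta )}{{{\,\textrm{tr}\,}}(\rho )}\) term vanishes for normalised states).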

To incorporate smoothing into the above inequality, we define the smooth \(\alpha \)-Rényi conditional entropy as follows.

Definition 3.2

(\(\epsilon \)-smooth \(\alpha \)-Rényi conditional entropy). For \(\alpha \in (1, \infty ]\) and \(\epsilon \in [0,1]\), we define the \(\epsilon \)-smooth \(\alpha \)-Rényi conditional entropy as

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha , \epsilon }(A|B)_{\rho }:= \max _{\tilde{\rho }_{AB} \in B_{\epsilon }(\rho _{AB})} \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\tilde{\rho }}. \end{aligned}$$
(23)

Lemma 3.3

For \(\alpha \in (1, \infty ]\), \(\epsilon \in [0,1)\), and states \(\rho _{AB}\) and \(\eta _{AB}\), we have

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha , \epsilon }(A|B)_{\rho } \ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \frac{\alpha }{\alpha -1} D^\epsilon _{\max } (\rho _{AB}|| \eta _{AB}) - \frac{1}{\alpha -1} \log \frac{1}{1-\epsilon ^2}. \end{aligned}$$

Proof

Let \(\tilde{\rho }_{AB} \in B_{\epsilon }(\rho _{AB})\) be a subnormalised state such that \(D_{\max }(\tilde{\rho }_{AB}|| \eta _{AB})= D^\epsilon _{\max } (\rho _{AB}|| \eta _{AB})\). Using Lemma 3.1 for \(\alpha >1\), we have that for every state \(\sigma _B\),

$$\begin{aligned} \tilde{D}_{\alpha } (\tilde{\rho }_{AB} || {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \sigma _B)&\le \tilde{D}_{\alpha } (\eta _{AB} || {{\,\mathrm{\mathbb {1}}\,}}_A \otimes \sigma _B) + \frac{\alpha }{\alpha -1} D^{\epsilon }_{\max }({\rho }_{AB} || \eta _{AB}) + \frac{1}{\alpha -1}\log \frac{1}{1- \epsilon ^2} \end{aligned}$$
(24)

where we used the fact that \(\tilde{\rho }_{AB} \in B_{\epsilon }(\rho _{AB})\), which implies that \({{\,\textrm{tr}\,}}(\tilde{\rho }_{AB}) \ge 1- \epsilon ^2\). Since the above bound is true for arbitrary states \(\sigma _B\), we can multiply it by \(-1\) and take the supremum to derive

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\tilde{\rho }} \ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \frac{\alpha }{\alpha -1} D^\epsilon _{\max }(\rho _{AB}|| \eta _{AB}) - \frac{1}{\alpha -1} \log \frac{1}{1-\epsilon ^2}. \end{aligned}$$

The desired bound follows by using the fact that \(\tilde{H}^{\uparrow }_{\alpha , \epsilon }(A|B)_{{\rho }} \ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\tilde{\rho }}\). \(\square \)

Lemma 3.4

For a state \(\rho _{AB}\), \(\epsilon \in [0,1)\), and \(\delta \in (0,1)\) such that \(\epsilon + \delta < 1\) and \(\alpha \in (1,2]\), we have

$$\begin{aligned} H_{\min }^{\epsilon +\delta }(A|B)_{\rho } \ge \tilde{H}^{\uparrow }_{\alpha , \epsilon }(A|B)_{\rho } - \frac{g_0(\delta )}{\alpha -1} \end{aligned}$$

where \(g_0(x):= - \log (1- \sqrt{1-x^2})\).

Proof

First, note that

$$\begin{aligned} H_{\min }^{\epsilon +\delta }(A|B)_{\rho }&\ge \sup _{\tilde{\rho } \in B_{\epsilon }(\rho _{AB})} H_{\min }^{\delta }(A|B)_{\tilde{\rho }}. \end{aligned}$$
(25)

To prove this, consider a \(\tilde{\rho }_{AB} \in B_{\epsilon }(\rho _{AB})\) and \(\rho '_{AB} \in B_{\delta }(\tilde{\rho }_{AB})\) such that \(H_{\min }(A|B)_{{\rho '}} = H^{\delta }_{\min }(A|B)_{\tilde{\rho }}\). Then, using the triangle inequality for the purified distance, we have

$$\begin{aligned} P(\rho _{AB}, \rho '_{AB})&\le P(\rho _{AB}, \tilde{\rho }_{AB}) + P(\tilde{\rho }_{AB}, \rho '_{AB}) \\&\le \epsilon + \delta \end{aligned}$$

which implies that \(H_{\min }^{\epsilon +\delta }(A|B)_{\rho } \ge H_{\min }(A|B)_{{\rho '}} = H^{\delta }_{\min }(A|B)_{\tilde{\rho }}\). Since this is true for all \(\tilde{\rho } \in B_{\epsilon }(\rho _{AB})\), the bound in Eq. 25 holds.

Using this, we have

$$\begin{aligned} H_{\min }^{\epsilon +\delta }(A|B)_{\rho }&\ge \sup _{\tilde{\rho } \in B_{\epsilon }(\rho _{AB})} H_{\min }^{\delta }(A|B)_{\tilde{\rho }}\\&\ge \sup _{\tilde{\rho } \in B_{\epsilon }(\rho _{AB})} \left\{ \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\tilde{\rho }} - \frac{g_0(\delta )}{\alpha -1} \right\} \\&= \tilde{H}^{\uparrow }_{\alpha , \epsilon }(A|B)_{\rho } - \frac{g_0(\delta )}{\alpha -1} \end{aligned}$$

where we have used [8, Lemma B.10] (Footnote 5; originally proven in [21]) in the second step. \(\square \)

We can combine these two lemmas to derive the following result.

Lemma 3.5

For \(\alpha \in (1,2]\), \(\epsilon \in [0,1)\), and \(\delta \in (0,1)\) such that \(\epsilon + \delta < 1\), and for two states \(\rho _{AB}\) and \(\eta _{AB}\), we have

$$\begin{aligned} H_{\min }^{\epsilon +\delta }(A|B)_{\rho }&\ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \frac{\alpha }{\alpha -1} D^\epsilon _{\max } (\rho _{AB}|| \eta _{AB}) - \frac{g_1(\delta , \epsilon )}{\alpha -1} \end{aligned}$$
(26)

where \(g_1(x, y):= - \log (1- \sqrt{1-x^2}) - \log (1-y^2)\).

Proof

We can combine Lemmas 3.3 and 3.4 as follows to derive the bound in the Lemma:

$$\begin{aligned} H_{\min }^{\epsilon +\delta }(A|B)_{\rho }&\ge \tilde{H}^{\uparrow }_{\alpha , \epsilon }(A|B)_{\rho } - \frac{g_0(\delta )}{\alpha -1}\\&\ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \frac{\alpha }{\alpha -1} D^\epsilon _{\max } (\rho _{AB}|| \eta _{AB}) - \frac{1}{\alpha -1}\left( g_0(\delta ) + \log \frac{1}{1-\epsilon ^2} \right) . \end{aligned}$$

\(\square \)

We can use the asymptotic equipartition properties for the smooth min-entropy and the smooth max-relative entropy [21,22,23] to derive the following novel triangle inequality for the von Neumann conditional entropy. Although we do not use this inequality in this paper, we believe it is interesting and may prove useful in the future.
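Concretely, the two asymptotic statements used in the proof below are that, for any fixed \(\epsilon \in (0,1)\),

$$\begin{aligned} \lim _{n \rightarrow \infty } \frac{1}{n} H_{\min }^{\epsilon }(A_1^n|B_1^n)_{\rho ^{\otimes n}} = H(A|B)_{\rho } \qquad \text {and} \qquad \lim _{n \rightarrow \infty } \frac{1}{n} D^{\epsilon }_{\max } (\rho ^{\otimes n}|| \eta ^{\otimes n}) = D(\rho || \eta ). \end{aligned}$$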

Corollary 3.6

For \(\alpha \in (1,2]\) and states \(\rho _{AB}\) and \(\eta _{AB}\), we have that

$$\begin{aligned} H(A|B)_{\rho }&\ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \frac{\alpha }{\alpha -1} D(\rho _{AB}|| \eta _{AB}). \end{aligned}$$
(27)

Proof

Using Lemma 3.5 with \(\alpha \in (1,2]\), the states \(\rho ^{\otimes n}_{AB}\) and \(\eta ^{\otimes n}_{AB}\), and any \(\epsilon > 0\) and \(\delta > 0\) satisfying the conditions of the Lemma, we get

$$\begin{aligned}&H_{\min }^{\epsilon +\delta }(A_1^n|B_1^n)_{\rho ^{\otimes n}} \ge \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n)_{\eta ^{\otimes n}} - \frac{\alpha }{\alpha -1} D^\epsilon _{\max } (\rho _{AB}^{\otimes n}|| \eta ^{\otimes n}_{AB}) - \frac{g_1(\delta , \epsilon )}{\alpha -1} \\&\quad \Rightarrow \frac{1}{n} H_{\min }^{\epsilon +\delta }(A_1^n|B_1^n)_{\rho ^{\otimes n}} \ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \frac{\alpha }{\alpha -1} \frac{1}{n} D^\epsilon _{\max } (\rho _{AB}^{\otimes n}|| \eta ^{\otimes n}_{AB}) - \frac{1}{n} \frac{g_1(\delta , \epsilon )}{\alpha -1}. \end{aligned}$$

Taking the limit of the above for \(n \rightarrow \infty \), we get

$$\begin{aligned}&\lim _{n \rightarrow \infty } \frac{1}{n} H_{\min }^{\epsilon +\delta }(A_1^n|B_1^n)_{\rho ^{\otimes n}} \ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \lim _{n \rightarrow \infty }\frac{\alpha }{\alpha -1} \frac{1}{n} D^\epsilon _{\max } (\rho _{AB}^{\otimes n}|| \eta ^{\otimes n}_{AB})\\&\quad - \lim _{n \rightarrow \infty } \frac{1}{n} \frac{g_1(\delta , \epsilon )}{\alpha -1} \\&\quad \Rightarrow H(A|B)_{\rho } \ge \tilde{H}^{\uparrow }_{\alpha }(A|B)_{\eta } - \frac{\alpha }{\alpha -1} D(\rho _{AB}|| \eta _{AB}) \end{aligned}$$

which proves the claim. \(\square \)

4 Approximately Independent Registers

In this section, we introduce our technique for using the smooth min-entropy triangle inequality with approximation chains by studying a state \(\rho _{A_1^n B}\) such that for every \(k \in [n]\)

$$\begin{aligned} \left\| \rho _{A_1^{k}B} - \rho _{A_k} \otimes \rho _{A_1^{k-1}B} \right\| _1 \le \epsilon . \end{aligned}$$
(28)

We assume that the registers \(A_k\) all have the same dimension, equal to |A|. One should think of the registers \(A_k\) as the secret information produced during some protocol, which also provides the register B to an adversary. We would like to prove that, under the above approximate independence conditions, \(H^{f(\epsilon )}_{\min }(A_1^n | B)\) is large (lower bounded by \(\Omega (n)\), and close to \(nH(A_1)\) if we assume the states \(\rho _{A_k}\) are identical) for some reasonably small function f of \(\epsilon \).

Let us first examine the case where the state \(\rho \) above is classical. We use the standard notation for probability distributions to address the elements of \(\rho \), so that \(\rho (a_1^n, b):= \langle {a_1^n, b}| \rho _{A_1^n B} |{a_1^n, b}\rangle \), where \(|{a_1^n, b}\rangle \) is a standard basis vector. To show that the smooth min-entropy is high in this case, we will use the Markov inequality to show that the set where the conditional probability \(\rho (a_1^n |b):= \frac{\rho (a_1^n b)}{\rho (b)}\) is large has small probability. We will use the following lemma for this purpose.

Lemma 4.1

Suppose p and q are probability distributions on \(\mathcal {X}\) such that \(\frac{1}{2}\left\| p-q \right\| _1 \le \epsilon \). Then the set \(S \subseteq \mathcal {X}\) defined as \(S:= \{x \in \mathcal {X}: p(x) \le (1+\epsilon ^{1/2}) q(x) \}\) satisfies \(q(S) \ge 1- \epsilon ^{1/2}\) and \(p(S)\ge 1- \epsilon ^{1/2}- \epsilon \).

Proof

Let \(S^c:= \mathcal {X} {\setminus } S\), where S is the set defined above. If \(q(S^c) = 0\), the statement in the lemma is satisfied. If \(q(S^c) > 0\), we have that

$$\begin{aligned} \epsilon \ge \frac{1}{2}\left\| p-q \right\| _1&= \max _{H \subseteq \mathcal {X}} \vert p(H)- q(H) \vert \\&\ge q(S^c) \left| \frac{p(S^c)}{q(S^c)} - 1 \right| \\&\ge q(S^c)\left( \frac{p(S^c)}{q(S^c)} - 1 \right) \\&= q(S^c)\left( \frac{\sum _{x \in S^c} p(x)}{\sum _{x \in S^c} q(x)} - 1 \right) \\&\ge q(S^c)\left( \frac{\sum _{x \in S^c} (1+\epsilon ^{\frac{1}{2}})q(x)}{\sum _{x \in S^c} q(x)} - 1 \right) \\&= q(S^c) \epsilon ^{\frac{1}{2}} \end{aligned}$$

which implies that \(q(S^c) \le \epsilon ^{\frac{1}{2}}\), that is, \(q(S) \ge 1 - \epsilon ^{\frac{1}{2}}\). The bound on p(S) follows since \(p(S) \ge q(S) - \epsilon \ge 1- \epsilon ^{\frac{1}{2}} - \epsilon \). \(\square \)

We will also assume, for the sake of simplicity, that the states \(\rho _{A_k}\) are identical for all \(k \in [n]\). Using the lemma above, for every \(k \in [n]\), we know that the set

$$\begin{aligned} B_k :&= \left\{ (a_1^n, b): \rho (a_1^k, b)> (1+\sqrt{\epsilon })\rho (a_1^{k-1}, b)\rho (a_k) \right\} \\&= \left\{ (a_1^n, b): \rho (a_k | a_1^{k-1}, b) > (1+\sqrt{\epsilon })\rho (a_k) \right\} \end{aligned}$$

satisfies \(\Pr _\rho (B_k) \le 2\sqrt{\epsilon }\). We now define \(L = \sum _{k=1}^n \chi _{B_k}\), a random variable that counts the number of bad sets \(B_k\) an element \((a_1^n, b)\) belongs to. Using the Markov inequality, we have

$$\begin{aligned} \Pr _{\rho }\left[ L > n\epsilon ^{\frac{1}{4}}\right] \le \frac{{{\,\mathrm{\mathbb {E}}\,}}_\rho [L]}{n\epsilon ^{\frac{1}{4}}} \le 2\epsilon ^{\frac{1}{4}}. \end{aligned}$$

We define the bad set \(\mathcal {B}:= \left\{ (a_1^n, b): L(a_1^n, b)>n\epsilon ^{\frac{1}{4}} \right\} \) and the subnormalised distribution \(\tilde{\rho }_{A_1^n B}\) as

$$\begin{aligned} \tilde{\rho }_{A_1^n B} (a_1^n, b) = {\left\{ \begin{array}{ll} \rho _{A_1^n B} (a_1^n, b) &{} (a_1^n, b) \not \in \mathcal {B} \\ 0 &{} \text {else} \end{array}\right. }. \end{aligned}$$

We have \(P(\tilde{\rho }_{A_1^n B}, \rho _{A_1^n B}) \le \sqrt{2}\epsilon ^{1/8}\). Further, note that for every \((a_1^n, b) \not \in \mathcal {B}\), we have

$$\begin{aligned} \rho (a_1^n | b)&= \prod _{k=1}^n \rho (a_k | a_1^{k-1}, b) \\&= \prod _{k: (a_1^n, b) \not \in B_k} \rho (a_k | a_1^{k-1}, b) \prod _{k: (a_1^n, b) \in B_k} \rho (a_k | a_1^{k-1}, b) \\&\le (1+\sqrt{\epsilon })^n \prod _{k: (a_1^n, b) \not \in B_k} \rho _{A_k}(a_k)\\&\le (1+\sqrt{\epsilon })^n 2^{-n(1-\epsilon ^{\frac{1}{4}})H_{\min }(A_1)} \end{aligned}$$

where in the third line we have used the fact that if \((a_1^n, b) \not \in B_k\), then \(\rho (a_k | a_1^{k-1}b) \le (1+\sqrt{\epsilon })\rho _{A_k}(a_k)\), and in the last line we have used the facts that for \((a_1^n, b) \not \in \mathcal {B}\) we have \(|\{k \in [n]: (a_1^{n},b) \not \in B_k\}| = n- L(a_1^n, b) \ge n(1-\epsilon ^{\frac{1}{4}})\), that all the states \(\rho _{A_k}\) are identical, and that \(2^{-H_{\min }(A_k)} = \max _{a_k} \rho _{A_k}(a_k)\). Note that we have essentially proven and used a \(D_{\max }\) bound above. This proves the following lower bound for the smooth min-entropy of \(\rho \):

$$\begin{aligned} H^{\sqrt{2}\epsilon ^{\frac{1}{8}}}_{\min } (A_1^n | B) \ge n(1-\epsilon ^{\frac{1}{4}})H_{\min }(A_1) - n \log (1+\sqrt{\epsilon }). \end{aligned}$$
(29)
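To get a feel for the parameters here (numbers chosen purely for illustration), take \(\epsilon = 10^{-8}\). Then the smoothing parameter is \(\sqrt{2}\epsilon ^{1/8} \approx 0.14\) and the bound reads

$$\begin{aligned} H^{0.14}_{\min } (A_1^n | B) \ge 0.99\, n H_{\min }(A_1) - n \log (1+10^{-4}) \approx n \left( 0.99\, H_{\min }(A_1) - 1.5 \times 10^{-4} \right) , \end{aligned}$$

so the entropy rate is essentially unaffected, at the cost of a fairly large smoothing parameter.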

The right-hand side above can be improved to obtain the Shannon entropy H instead of the min-entropy \(H_{\min }\). However, we will not pursue this here, since this bound is sufficient for the purpose of our discussion.

Although we are unable to generalise the above classical argument to the quantum case, it provides a great deal of insight into the approximately independent registers problem. Two important examples of distributions that satisfy the approximate independence conditions above were mentioned in Footnotes 2 and 3 earlier. To create the first distribution, we flip a biased coin B, which is 0 with probability \(\epsilon \) and 1 otherwise. If \(B=0\), then \(A_1^n\) is set to the constant all-zero string; otherwise, it is sampled randomly and independently. For this distribution, once the bad event (\(B=0\)) is removed, the new distribution has a high min-entropy. On the other hand, for the second distribution, \(Q_{A_1^{2n}B_1^{2n}}\), the random bits \(B_i\) are chosen independently, with each being equal to 0 with probability \(\epsilon \) and 1 otherwise. If the bit \(B_i\) is 0, then \(A_i\) is set equal to \(A_{i-1}\); otherwise, it is sampled independently. In this case, there is no small-probability event (small as a function of \(\epsilon \)) that one can simply remove so that the distribution becomes i.i.d. However, we expect that with high probability the number of indices with \(B_i = 0\) is close to \(2n \epsilon \). Given that the distribution samples all the other \(A_i\) independently, the smooth min-entropy of the distribution should be close to \(2n(1-\epsilon )H(A_1)\). The above argument shows that any distribution satisfying the approximate independence conditions in Eq. 28 can be handled by combining the methods used for these two example distributions, that is, deleting the bad part of the distribution and recognising that the probability of every element in the rest of the space behaves independently on average.

The above classical argument is difficult to generalise to quantum states, primarily because the quantum equivalents of Lemma 4.1 are not as nice and simple. Furthermore, quantum conditional probabilities themselves are also difficult to use. Fortunately, the substate theorem serves as the perfect tool for developing a smooth max-relative entropy bound, which we can then use with the min-entropy triangle inequality. The quantum substate theorem [10, 24] provides an upper bound on the smooth max-relative entropy \(D^{\epsilon }_{\max }(\rho || \sigma )\) between two states in terms of their relative entropy \(D(\rho || \sigma )\).

Theorem 4.2

(Quantum substate theorem [24]). Let \(\rho \) and \(\sigma \) be two states on the same Hilbert space. Then for any \(\epsilon \in (0,1)\), we have

$$\begin{aligned} D^{\sqrt{\epsilon }}_{\max }(\rho || \sigma ) \le \frac{D(\rho || \sigma )+1}{\epsilon } + \log \frac{1}{1-\epsilon }. \end{aligned}$$
(30)
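For instance, with \(D(\rho || \sigma ) = 1\) and \(\epsilon = 0.01\) (illustrative numbers), the theorem gives

$$\begin{aligned} D^{0.1}_{\max }(\rho || \sigma ) \le \frac{1+1}{0.01} + \log \frac{1}{0.99} \approx 200. \end{aligned}$$

This trade-off, a modest smoothing parameter \(\sqrt{\epsilon }\) against a \(\frac{1}{\epsilon }\) blow-up of the relative entropy bound, is what produces the \(n\sqrt{\epsilon } + \frac{1}{\sqrt{\epsilon }}\) terms in Eq. 38 below.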

In this section, we will also frequently use the multipartite mutual information [25,26,27]. For a state \(\rho _{X_1^n}\), the multipartite mutual information between the registers \((X_1, X_2, \ldots , X_n)\) is defined as

$$\begin{aligned} I(X_1: X_2: \cdots : X_n)_\rho := D(\rho _{X_1^n}|| \rho _{X_1} {{\,\mathrm{\otimes }\,}}\rho _{X_2} {{\,\mathrm{\otimes }\,}}\cdots {{\,\mathrm{\otimes }\,}}\rho _{X_n}). \end{aligned}$$
(31)

In other words, it is the relative entropy between \(\rho _{X_1^n}\) and \(\rho _{X_1} {{\,\mathrm{\otimes }\,}}\rho _{X_2} {{\,\mathrm{\otimes }\,}}\cdots {{\,\mathrm{\otimes }\,}}\rho _{X_n}\). It can easily be shown that the multipartite mutual information satisfies the following identities:

$$\begin{aligned} I(X_1: X_2: \cdots : X_n)_\rho&= \sum _{k=1}^n H(X_k)_\rho - H(X_1 \cdots \ X_n)_\rho \end{aligned}$$
(32)
$$\begin{aligned}&= \sum _{k=2}^n I(X_k : X_1^{k-1})_{\rho }. \end{aligned}$$
(33)
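Both identities can be verified directly (a short derivation for completeness): expanding the relative entropy in Eq. 31 against the product state gives Eq. 32, and Eq. 33 then follows by telescoping,

$$\begin{aligned} \sum _{k=2}^n I(X_k : X_1^{k-1})_{\rho } = \sum _{k=2}^n \left( H(X_k)_{\rho } + H(X_1^{k-1})_{\rho } - H(X_1^k)_{\rho } \right) = \sum _{k=1}^n H(X_k)_{\rho } - H(X_1^n)_{\rho }. \end{aligned}$$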

Going back to proving a bound for the quantum approximately independent registers problem, note that using the Alicki–Fannes–Winter (AFW) bound [28, 29] for mutual information [30, Theorem 11.10.4], Eq. 28 implies that for every \(k \in [n]\)

$$\begin{aligned} I(A_k : A_1^{k-1}B)_\rho \le \epsilon \log |A| + g_2\left( \frac{\epsilon }{2} \right) \end{aligned}$$
(34)

where \(g_2(x):= (x+1) \log (x+1) - x \log (x)\). With this in mind, we can now focus our efforts on proving the following theorem.

Theorem 4.3

Let registers \(A_k\) have dimension |A| for all \(k \in [n]\). Suppose a quantum state \(\rho _{A_1^n B}\) is such that for every \(k \in [n],\) we have

$$\begin{aligned} I(A_k : A_1^{k-1}B)_{\rho } \le \epsilon \end{aligned}$$
(35)

for some \(0< \epsilon < 1\). Then, we have that

$$\begin{aligned} H_{\min }^{\epsilon ^{\frac{1}{4}}+\epsilon }(A_1^n|B)_{\rho }&\ge \sum _{k=1}^n H (A_k)_{\rho } - 3 n \epsilon ^{\frac{1}{4}} \log (1+2|A|) \nonumber \\&\quad - \frac{2 \log (1+2|A|)}{\epsilon ^{3/4}} - \frac{2 \log (1+2|A|)}{\epsilon ^{1/4}} \left( \log \frac{1}{1- \sqrt{\epsilon }} + g_1(\epsilon , \epsilon ^{\frac{1}{4}}) \right) \end{aligned}$$
(36)

where \(g_1(x, y):= - \log (1- \sqrt{1-x^2}) - \log (1-y^2)\). In particular, when all the states \(\rho _{A_k}\) are identical, we have

$$\begin{aligned} H_{\min }^{\epsilon ^{\frac{1}{4}}+\epsilon }(A_1^n|B)_{\rho }&\ge n \left( H (A_1)_{\rho } - 3 \epsilon ^{\frac{1}{4}} \log (1+2|A|) \right) \nonumber \\&\quad - \frac{2 \log (1+2|A|)}{\epsilon ^{3/4}} - \frac{2 \log (1+2|A|)}{\epsilon ^{1/4}} \left( \log \frac{1}{1- \sqrt{\epsilon }} + g_1(\epsilon , \epsilon ^{\frac{1}{4}}) \right) . \end{aligned}$$
(37)

Proof

First note that we have,

$$\begin{aligned} I(A_1: A_2: \cdots : A_n: B)&= D(\rho _{A_1^n B} || \bigotimes _{k=1}^n \rho _{A_k} \otimes \rho _B) \\&= \sum _{k=1}^n I(A_k : A_1^{k-1}B) \\&\le n \epsilon . \end{aligned}$$

Using the substate theorem, we now have

$$\begin{aligned} D_{\max }^{\epsilon ^{\frac{1}{4}}}\left( \rho _{A_1^n B} \Bigg \Vert \bigotimes _{k=1}^n \rho _{A_k} \otimes \rho _B \right)&\le \frac{D(\rho _{A_1^n B} || \bigotimes _{k=1}^n \rho _{A_k} \otimes \rho _B) + 1}{\sqrt{\epsilon }} - \log (1- \sqrt{\epsilon }) \nonumber \\&\le n \sqrt{\epsilon } + \frac{1}{\sqrt{\epsilon }} - \log (1- \sqrt{\epsilon }). \end{aligned}$$
(38)

We now define the auxiliary state \(\eta _{A_1^n B}:= \bigotimes _{k=1}^n \rho _{A_k} \otimes \rho _B\). Using Lemma 3.5, for \(\alpha \in (1,2)\), we can transform the smooth min-entropy into an \(\alpha \)-Rényi entropy on the auxiliary product state \(\eta _{A_1^n B}\) as follows:

$$\begin{aligned}&H_{\min }^{\epsilon ^{\frac{1}{4}}+\epsilon }(A_1^n|B)_{\rho } \\&\quad \ge \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B)_{\eta } - \frac{\alpha }{\alpha -1} D^{\epsilon ^{\frac{1}{4}}}_{\max } (\rho _{A_1^n B}|| \eta _{A_1^n B}) - \frac{g_1(\epsilon , \epsilon ^{\frac{1}{4}})}{\alpha -1} \\&\quad = \sum _{k=1}^n \tilde{H}^{\uparrow }_{\alpha }(A_k)_{\rho } - \frac{\alpha }{\alpha -1} D^{\epsilon ^{\frac{1}{4}}}_{\max } (\rho _{A_1^n B}|| \eta _{A_1^n B}) - \frac{g_1(\epsilon , \epsilon ^{\frac{1}{4}})}{\alpha -1} \\&\quad \ge \sum _{k=1}^n H (A_k)_{\rho } - n(\alpha -1) \log ^2(1+2|A|)- \frac{\alpha }{\alpha -1} D^{\epsilon ^{\frac{1}{4}}}_{\max } (\rho _{A_1^n B}|| \eta _{A_1^n B}) - \frac{g_1(\epsilon , \epsilon ^{\frac{1}{4}})}{\alpha -1} \\&\quad \ge \sum _{k=1}^n H (A_k)_{\rho } - n(\alpha -1) \log ^2(1+2|A|)- \frac{\alpha }{\alpha -1} n \sqrt{\epsilon }\\&\quad - \frac{\alpha }{\alpha -1} \frac{1}{\sqrt{\epsilon }} - \frac{\alpha }{\alpha -1} \log \frac{1}{1- \sqrt{\epsilon }}- \frac{g_1(\epsilon , \epsilon ^{\frac{1}{4}})}{\alpha -1}. \end{aligned}$$

In the third line above, we have used [8, Lemma B.9] (which is an improvement of [21, Lemma 8]); this is valid as long as \(\alpha <1+ \frac{1}{\log (1+ 2|A|)}\). We select \(\alpha = 1 + \frac{\epsilon ^{1/4}}{\log (1+ 2|A|)}\), for which the above condition on \(\alpha \) is satisfied; this gives us

$$\begin{aligned} H_{\min }^{\epsilon ^{\frac{1}{4}}+\epsilon }(A_1^n|B)_{\rho }&\ge \sum _{k=1}^n H (A_k)_{\rho } - 3 n \epsilon ^{\frac{1}{4}} \log (1+2|A|) - \frac{2 \log (1+2|A|)}{\epsilon ^{3/4}} \\&\quad - \frac{2 \log (1+2|A|)}{\epsilon ^{1/4}} \left( \log \frac{1}{1- \sqrt{\epsilon }} + g_1(\epsilon , \epsilon ^{\frac{1}{4}}) \right) . \end{aligned}$$
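To verify the arithmetic of this last step: with \(\alpha - 1 = \frac{\epsilon ^{1/4}}{\log (1+2|A|)}\), we have \(\frac{\alpha }{\alpha -1} \le \frac{2}{\alpha -1} = \frac{2\log (1+2|A|)}{\epsilon ^{1/4}}\), so

$$\begin{aligned} n(\alpha -1) \log ^2(1+2|A|) = n \epsilon ^{\frac{1}{4}} \log (1+2|A|), \qquad \frac{\alpha }{\alpha -1} n \sqrt{\epsilon } \le 2 n \epsilon ^{\frac{1}{4}} \log (1+2|A|), \qquad \frac{\alpha }{\alpha -1} \frac{1}{\sqrt{\epsilon }} \le \frac{2 \log (1+2|A|)}{\epsilon ^{3/4}}, \end{aligned}$$

and the remaining two terms are together bounded by \(\frac{2 \log (1+2|A|)}{\epsilon ^{1/4}} \left( \log \frac{1}{1- \sqrt{\epsilon }} + g_1(\epsilon , \epsilon ^{\frac{1}{4}}) \right) \).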

\(\square \)

We can now plug in the bound from Eq. 34 to derive the following corollary.

Corollary 4.4

Let registers \(A_k\) have dimension |A| for all \(k \in [n]\). Suppose a quantum state \(\rho _{A_1^n B}\) is such that for every \(k \in [n],\) we have

$$\begin{aligned} \left\| \rho _{A_1^k B} - \rho _{A_k} \otimes \rho _{A_1^{k-1} B} \right\| _1 \le \epsilon . \end{aligned}$$
(39)

Then, we have that for \(\delta = \epsilon \log |A| + g_2\left( \frac{\epsilon }{2} \right) \) such that \(0< \delta <1\),

$$\begin{aligned} H_{\min }^{\delta ^{\frac{1}{4}}+\delta }(A_1^n|B)_{\rho }&\ge \sum _{k=1}^n H (A_k)_{\rho } - 3 n \delta ^{\frac{1}{4}} \log (1+2|A|) \nonumber \\&\quad - \frac{2 \log (1+2|A|)}{\delta ^{3/4}} - \frac{2 \log (1+2|A|)}{\delta ^{1/4}} \left( \log \frac{1}{1- \sqrt{\delta }} + g_1(\delta , \delta ^{\frac{1}{4}}) \right) \end{aligned}$$
(40)

where \(g_1(x, y) = - \log (1- \sqrt{1-x^2}) - \log (1-y^2)\) and \(g_2(x) = (x+1)\log (x+1) - x \log (x)\). In particular, when all the states \(\rho _{A_k}\) are identical, we have

$$\begin{aligned} H_{\min }^{\delta ^{\frac{1}{4}}+\delta }(A_1^n|B)_{\rho }&\ge n \left( H (A_1)_{\rho } - 3 \delta ^{\frac{1}{4}} \log (1+2|A|) \right) \nonumber \\&\quad - \frac{2 \log (1+2|A|)}{\delta ^{3/4}} - \frac{2 \log (1+2|A|)}{\delta ^{1/4}} \left( \log \frac{1}{1- \sqrt{\delta }} + g_1(\delta , \delta ^{\frac{1}{4}}) \right) . \end{aligned}$$
(41)

4.1 Weak approximate asymptotic equipartition

We can modify the proof of Theorem 4.3 to prove a weak approximate asymptotic equipartition property (AEP).

Theorem 4.5

Let registers \(A_k\) have dimension |A| for all \(k \in [n]\) and the registers \(B_k\) have dimension |B| for all \(k \in [n]\). Suppose a quantum state \(\rho _{A_1^n B_1^n E}\) is such that for every \(k \in [n],\) we have

$$\begin{aligned} \left\| \rho _{A_1^k B_1^k E} - \rho _{A_k B_k} \otimes \rho _{A_1^{k-1} B_1^{k-1} E} \right\| _1 \le \epsilon . \end{aligned}$$
(42)

Then, we have that for \(\delta = \epsilon \log \left( |A||B| \right) + g_2\left( \frac{\epsilon }{2} \right) \) such that \(0< \delta <1\),

$$\begin{aligned} H_{\min }^{\delta ^{\frac{1}{4}}+\delta }(A_1^n|B_1^n E)_{\rho }&\ge \sum _{k=1}^n H (A_k|B_k)_{\rho } - 3 n \delta ^{\frac{1}{4}} \log (1+2|A|) \nonumber \\&\quad - \frac{2 \log (1+2|A|)}{\delta ^{3/4}} - \frac{2 \log (1+2|A|)}{\delta ^{1/4}} \left( \log \frac{1}{1- \sqrt{\delta }} + g_1(\delta , \delta ^{\frac{1}{4}}) \right) \end{aligned}$$
(43)

where \(g_1(x, y) = - \log (1- \sqrt{1-x^2}) - \log (1-y^2)\) and \(g_2(x) = (x+1)\log (x+1) - x \log (x)\). In particular, when all the states \(\rho _{A_k B_k}\) are identical, we have

$$\begin{aligned} H_{\min }^{\delta ^{\frac{1}{4}}+\delta }(A_1^n|B_1^n E)_{\rho }&\ge n \left( H (A_1|B_1)_{\rho } - 3 \delta ^{\frac{1}{4}} \log (1+2|A|) \right) \nonumber \\&\quad - \frac{2 \log (1+2|A|)}{\delta ^{3/4}} - \frac{2 \log (1+2|A|)}{\delta ^{1/4}} \left( \log \frac{1}{1- \sqrt{\delta }} + g_1(\delta , \delta ^{\frac{1}{4}}) \right) . \end{aligned}$$
(44)

Proof

To prove this, we use the auxiliary state \(\eta _{A_1^n B_1^n E}:= \bigotimes _{k=1}^n \rho _{A_k B_k} \otimes \rho _E\). Then, we have

$$\begin{aligned} D(\rho _{A_1^n B_1^n E} || \eta _{A_1^n B_1^n E})&= I(A_1 B_1: A_2 B_2: \cdots : A_n B_n: E)_{\rho } \\&= \sum _{k=1}^n I(A_k B_k: A_1^{k-1} B_1^{k-1} E)_{\rho } \\&\le n\left( \epsilon \log \left( |A||B| \right) + g_2\left( \frac{\epsilon }{2} \right) \right) = n \delta \end{aligned}$$

where we used the AFW bound for mutual information in the last line [30, Theorem 11.10.4]. The rest of the proof follows that of Theorem 4.3, the only difference being that we now have \(\tilde{H}^{\uparrow }_{\alpha }(A_1^n | B_1^n E)_{\eta } = \sum _{k=1}^n \tilde{H}^{\uparrow }_{\alpha }(A_k | B_k)_{\rho }\). \(\square \)

We call this generalisation weak because the smoothing term (\(\delta \)) depends on the size of the side information |B|. In Appendix E, we show that under the assumptions of the theorem, some bound on the dimension of the registers B is necessary; otherwise, one cannot have a non-trivial bound on the smooth min-entropy.

4.2 Simple security proof for sequential device-independent quantum key distribution

The approximately independent registers scenario and the associated min-entropy lower bound can be used to provide simple “proof of concept” security proofs for cryptographic protocols. In this section, we briefly sketch such a proof for sequential device-independent quantum key distribution (DIQKD) to demonstrate this idea. The protocol for sequential DIQKD used in [31] is presented as Protocol 1.

Protocol 1: Sequential DIQKD protocol

Parameters:

\(\bullet \) \(\omega _{\text {exp}}\) is the expected winning probability for the honest implementation of the device

\(\bullet \) \(n \ge 1\) is the number of rounds in the protocol

\(\bullet \) \(\gamma \in (0,1]\) is the fraction of test rounds

Protocol:

1. For every \(1 \le i \le n\), perform the following steps:

   (a) Alice chooses a random \(T_i \in \{ 0,1\}\) with \(\Pr [T_i =1] =\gamma \).

   (b) Alice sends \(T_i\) to Bob.

   (c) If \(T_i = 0\), Alice and Bob set the questions \((X_i,Y_i) = (0,2)\); otherwise, they sample \((X_i,Y_i)\) uniformly at random from \(\{ 0,1\}^2\).

   (d) Alice and Bob use their device with the questions \((X_i,Y_i)\) and obtain the outputs \(A_i, B_i\).

2. Alice announces her questions \(X_1^n\) to Bob.

3. Error correction: Alice and Bob use an error correction procedure, which lets Bob obtain the raw key \(\tilde{A}_1^n\) (if the error correction protocol succeeds, then \(A_1^n = \tilde{A}_1^n\)). In case the error correction protocol aborts, they abort the QKD protocol too.

4. Parameter Estimation: Bob uses \(B_1^n\) and \(\tilde{A}_1^n\) to compute the average winning probability \(\omega _{\text {avg}}\) on the test rounds. He aborts if \(\omega _{\text {avg}} < \omega _{\text {exp}}\).

5. Privacy Amplification: Alice and Bob use a privacy amplification protocol to create a secret key K from \(A_1^n\) (using \(\tilde{A}_1^n\) for Bob).


We consider a simple model for DIQKD, where Eve (the adversary) distributes a state \(\rho ^{(0)}_{E_A E_B E}\) between Alice and Bob at the beginning of the protocol. Alice and Bob then use their states sequentially as given in Protocol 1. The kth round of the protocol produces the questions \(X_k, Y_k\) and \(T_k\), the answers \(A_k\) and \( B_k\) and transforms the shared state from \(\rho ^{(k-1)}_{E_A E_B E}\) to \(\rho ^{(k)}_{E_A E_B E}\).

Given the questions and answers of the previous rounds, the state shared between Alice and Bob and their devices in each round can be viewed as a device for playing the CHSH game. Suppose that in the kth round, the random variables produced in the previous \(k-1\) rounds are \(r_{k-1}:= x_1^{k-1}, y_1^{k-1}, t_1^{k-1}, a_1^{k-1}, b_1^{k-1}\) and that the state shared between Alice and Bob is \(\rho ^{(k-1)}_{E_A E_B E| r_{k-1}}\). We can then define \(\Pr [W_k |r_{k-1}]\) to be the winning probability of the CHSH game played by Alice and Bob using this state and their devices in the kth round. Note that Alice's device cannot distinguish whether the CHSH game is played in a round or whether the round is used for key generation. We can further average over the random variables of the previous rounds to derive the probability of winning the kth game:

$$\begin{aligned} \Pr [W_k] = {{\,\mathrm{\mathbb {E}}\,}}_{r_{k-1}}\left[ \Pr [W_k |r_{k-1}]\right] . \end{aligned}$$
(45)

Alice and Bob randomly sample a subset of the rounds (using the random variable \(T_k\)) and play the CHSH game on this subset. If the average winning probability of the CHSH game on this subset is small, they abort the protocol. For simplicity and brevity, we will assume here that the state \(\rho ^{(0)}_{E_A E_B E}\) distributed between Alice and Bob at the start of the protocol by Eve has an average winning probability of at least \(\omega _{{\text {exp}}} - \delta \), that is,

$$\begin{aligned} \frac{1}{n}\sum _{k=1}^n \Pr [W_k] \ge \omega _{{\text {exp}}} - \delta \end{aligned}$$
(46)

for some small \(\delta >0\). Using standard sampling arguments it can be argued that either this is true or the protocol aborts with high probability.

For any shared state \(\sigma _{E_A E_B E}\) (where \(E_A\) is held by Alice, \(E_B\) by Bob, and E by the adversary) and local measurement devices, if Alice and Bob win the CHSH game with probability \(\omega \in (\frac{3}{4}, \frac{2+\sqrt{2}}{4}]\), then Alice's answer A to the game is random given the questions X, Y and the register E held by the adversary. This is quantified by the following entropic bound [32] (see [31, Lemma 5.3] for the form below):

$$\begin{aligned} H(A| XY E) \ge f(\omega ) = {\left\{ \begin{array}{ll} 1 - h\left( \frac{1}{2} + \frac{1}{2}\sqrt{16 \omega (\omega -1) +3} \right) &{} \text {if } \omega \in [\frac{3}{4}, \frac{2+\sqrt{2}}{4}] \\ 0 &{} \text {if } \omega \in [0, \frac{3}{4}) \end{array}\right. } \end{aligned}$$
(47)

where \(h(\cdot )\) is the binary entropy function. The function f is convex on the interval \(\left[ 0, \frac{2+\sqrt{2}}{4}\right] \). We plot it on the interval \([\frac{3}{4}, \frac{2+\sqrt{2}}{4}]\) in Fig. 1.

Fig. 1: The lower bound in Eq. 47 on the interval \([\frac{3}{4}, \frac{2+\sqrt{2}}{4}]\)
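For a sense of scale (values computed directly from Eq. 47): at the quantum optimum \(\omega = \frac{2+\sqrt{2}}{4} \approx 0.854\) the bound reaches \(f(\omega ) = 1\), while slightly below it, at \(\omega = 0.85\),

$$\begin{aligned} f(0.85) = 1 - h\left( \frac{1}{2} + \frac{1}{2}\sqrt{16 \cdot 0.85 \cdot (0.85-1) + 3} \right) \approx 1 - h(0.99) \approx 0.92, \end{aligned}$$

so even a modest gap below the optimal winning probability already costs a noticeable fraction of the certified entropy.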

For \(\epsilon >0\), we choose the parameter \(\omega _{\text {exp}} \in [\frac{3}{4} + \delta , \frac{2+\sqrt{2}}{4}]\) to be large enough so that

$$\begin{aligned} 1- f(\omega _{\text {exp}} - \delta ) = h\left( \frac{1}{2} + \frac{1}{2}\sqrt{16 (\omega _{\text {exp}}-\delta ) (\omega _{\text {exp}} - \delta -1) +3} \right) \le \epsilon ^4. \end{aligned}$$
(48)

We will now use Eq. 47 to bound the von Neumann entropy of the answers given Eve’s information for the sequential DIQKD protocol. We have

$$\begin{aligned} H(A_1^n | X_1^n Y_1^n T_1^n E)&= \sum _{k=1}^n H(A_k | A_1^{k-1} X_1^n Y_1^n T_1^n E) \\&\overset{(1)}{=}\ \sum _{k=1}^n H(A_k | A_1^{k-1} X_1^k Y_1^k T_1^k E) \\&\overset{(2)}{=}\ \sum _{k=1}^n H(A_k | X_k Y_k R_{k-1} E) \\&= \sum _{k=1}^n {{\,\mathrm{\mathbb {E}}\,}}_{r_{k-1} \sim R_{k-1}} \left[ H(A_k | X_k Y_k E)_{\rho ^{(k)}_{|r_{k-1}}}\right] \\&\overset{(3)}{\ge }\ \sum _{k=1}^n {{\,\mathrm{\mathbb {E}}\,}}_{r_{k-1} \sim R_{k-1}} \left[ f\left( \Pr [W_k | r_{k-1}] \right) \right] \\&\ge \sum _{k=1}^n f\left( \Pr [W_k] \right) \\&\ge n f\left( \frac{1}{n} \sum _{k=1}^n \Pr [W_k] \right) \\&\ge nf(\omega _{\text {exp}}-\delta ) \ge n(1- \epsilon ^4) \end{aligned}$$

where in (1) we have used the fact that the questions sampled in the rounds after the kth round are independent of the random variables of the previous rounds, in (2) we use the fact that Alice's answers are independent of the random variable \(T_k\) given the question \(X_k\), and we also group the random variables generated in the previous rounds into the random variable \(R_{k-1}:= A_1^{k-1} B_1^{k-1} X_1^{k-1} Y_1^{k-1} T_1^{k-1}\), in (3) we use the bound in Eq. 47, and in the next two steps we use the convexity of f. If instead of the von Neumann entropy on the left-hand side above we had the smooth min-entropy, then the bound above would be sufficient to prove the security of DIQKD. However, this argument cannot be easily generalised to the smooth min-entropy, because a chain rule like the one used in the first step does not exist for the smooth min-entropy (entropy accumulation [8, 9] generalises exactly such an argument). We can instead use the argument developed for the approximately independent registers case to transform this von Neumann entropy bound into a smooth min-entropy bound.

This bound results in the following bound on the multipartite mutual information

$$\begin{aligned} I(A_1: \cdots : A_n: X_1^n Y_1^n T_1^n E)&= \sum _{k=1}^n H(A_k) + H(X_1^n Y_1^n T_1^n E) - H(A_1^n X_1^n Y_1^n T_1^n E) \\&= \sum _{k=1}^n H(A_k) - H(A_1^n | X_1^n Y_1^n T_1^n E) \\&\le n - n(1-\epsilon ^4) = n \epsilon ^4 \end{aligned}$$

where we have used the dimension bound \(H(A_k) \le 1\) for every \(k \in [n]\). This is the same as the multipartite mutual information bound we derived while analysing approximately independent registers in Theorem 4.3 (with \(\epsilon ^4\) playing the role of \(\epsilon \)), so we can simply reuse the smooth min-entropy bound derived there. This gives us the bound

$$\begin{aligned} H^{2\epsilon }_{\min }(A_1^n | X_1^n Y_1^n T_1^n E)&\ge \sum _{k=1}^n H(A_k) - 3n \epsilon \log 5 - O\left( \frac{1}{\epsilon ^3} \right) \nonumber \\&=n(1- 3\epsilon \log 5) - O\left( \frac{1}{\epsilon ^3} \right) \end{aligned}$$
(49)

where we have used the fact that the answers \(A_k\) can always be assumed to be uniformly distributed [31, 32]. For every \(\epsilon >0\), we can choose a sufficiently large n so that this bound is large and positive.

We note that this method is only able to provide “proof of concept” or existence-type security proofs. This proof method couples the value of the security parameter for privacy amplification \(\epsilon \) with the average winning probability, which is not desirable. The parameter \(\epsilon \) is chosen according to the security requirements of the protocol and is typically very small. For such values of \(\epsilon \), the average winning probability of the protocol would have to be extremely close to the maximum, and we cannot realistically expect practical implementations to achieve such high winning probabilities. However, we do expect that this method will make it easier to create “proof of concept” type proofs for new cryptographic protocols in the future.

5 Approximate Entropy Accumulation

Fig. 2: The setting for entropy accumulation and Theorem 5.1. For \(k \in [n]\), the channels \({{\,\mathrm{\mathcal {M}}\,}}_k\) are repeatedly applied to the registers \(R_{k-1}\) to produce the “secret” information \(A_k\) and the side information \(B_k\)

In general, it is very difficult to estimate the smooth min-entropy of states produced during cryptographic protocols. The entropy accumulation theorem (EAT) [8] provides a tight and simple lower bound for the smooth min-entropy \(H^{\epsilon }_{\min }(A_1^n | B_1^n E)_{\rho }\) of sequential processes, under certain Markov chain conditions. The state \(\rho _{A_1^n B_1^n E}\) in the setting for EAT is produced by a sequential process of the form shown in Fig. 2. The process begins with the registers \(R_0\) and E. In the context of a cryptographic protocol, the register \(R_0\) is usually held by the honest parties, whereas the register E is held by the adversary. Then, in each round \(k \in [n]\) of the process, a channel \({{\,\mathrm{\mathcal {M}}\,}}_{k}: R_{k-1} \rightarrow A_k B_k R_k\) is applied on the register \(R_{k-1}\) to produce the registers \(A_k, B_k\) and \(R_k\). The registers \(A_1^n\) usually contain a partially secret raw key and the registers \(B_1^n\) contain the side information about \(A_1^n\) revealed to the adversary during the protocol. EAT requires that for every \(k \in [n]\), the side information \(B_k\) satisfies the Markov chain \(A_1^{k-1} \leftrightarrow B_1^{k-1}E \leftrightarrow B_k\), that is, the side information revealed in the kth round does not reveal anything more about the secret registers of the previous rounds than was already known to the adversary through \(B_1^{k-1}E\). Under this assumption, EAT provides the following lower bound for the smooth min-entropy

$$\begin{aligned} H^{\epsilon }_{\min }(A_1^n | B_1^n E)_{\rho } \ge \sum _{k=1}^n \inf _{\omega _{R_{k-1} \tilde{R}}} H(A_k | B_k \tilde{R})_{{{\,\mathrm{\mathcal {M}}\,}}_k (\omega )} - c\sqrt{n} \end{aligned}$$
(50)

where the infimum is taken over all input states to the channels \({{\,\mathrm{\mathcal {M}}\,}}_k\) and \(c>0\) is a constant depending only on |A| (size of registers \(A_k\)) and \(\epsilon \). We will state and prove an approximate version of EAT. Consider the sequential process in Fig. 2 again. Now, suppose that the channels \({{\,\mathrm{\mathcal {M}}\,}}_k\) do not necessarily satisfy the Markov chain conditions mentioned above, but each channel \({{\,\mathrm{\mathcal {M}}\,}}_k\) can be \(\epsilon \)-approximated by a channel \({{\,\mathrm{\mathcal {M}}\,}}'_k\) which satisfies the Markov chain \(A_1^{k-1} \leftrightarrow B_1^{k-1}E \leftrightarrow B_k\) for a certain collection of inputs. The approximate entropy accumulation theorem below provides a lower bound on the smooth min-entropy in such a setting. The proof of this theorem again uses the technique based on the smooth min-entropy triangle inequality developed in the previous section. In this setting too, we have a chain of approximations. For each \(k \in [n]\), we have

$$\begin{aligned} \rho _{A_1^k B_1^k E}&= {{\,\textrm{tr}\,}}_{R_k} \circ {{\,\mathrm{\mathcal {M}}\,}}_k \left( {{\,\mathrm{\mathcal {M}}\,}}_{k-1} \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) \right) \\&\approx _{\epsilon } {{\,\textrm{tr}\,}}_{R_k} \circ {{\,\mathrm{\mathcal {M}}\,}}'_k \left( {{\,\mathrm{\mathcal {M}}\,}}_{k-1} \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) \right) := \sigma ^{(k)}_{A_1^k B_1^k E}. \end{aligned}$$

According to the Markov chain assumption for the channels \({{\,\mathrm{\mathcal {M}}\,}}'_k\), the state \(\sigma ^{(k)}_{A_1^k B_1^k E}\) satisfies the Markov chain \(A_1^{k-1} \leftrightarrow B_1^{k-1}E \leftrightarrow B_k\). Therefore, we expect that the register \(A_k\) adds some entropy to the smooth min-entropy \(H^{\epsilon '}_{\min }(A_1^n | B_1^n E)_{\rho }\) and that the information leaked through \(B_1^n\) is not too large. We show that this is indeed the case in the approximate entropy accumulation theorem.

The approximate entropy accumulation theorem can be used to analyse and prove the security of cryptographic protocols under certain imperfections. The entropy accumulation theorem itself is used to prove the security of sequential device independent quantum key distribution (DIQKD) protocols [12]. In these protocols, the side information \(B_k\) produced during each round consists of the questions used in that round to play a non-local game, like the CHSH game. In the ideal case, these questions are sampled independently of everything which came before. As an example of an imperfection, we can imagine that some physical effect between the memory storing the secret bits \(A_1^{k-1}\) and the device producing the questions may lead to a small correlation between the side information produced during the kth round and the secret bits \(A_1^{k-1}\) (also see [13, 14]). The approximate entropy accumulation theorem below can be used to prove security of DIQKD under such imperfections. We do not, however, pursue this example here and leave applications of this theorem for future work. In Sect. 5.4, we modify this theorem to incorporate testing for EAT.
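As a toy illustration of such an imperfection (purely hypothetical, and classical for simplicity), the sketch below models a question channel that with probability \(\epsilon \) outputs the parity of the secret bits generated so far instead of a fresh uniform question; this channel is \(\epsilon \)-close, in the sense of Eq. 52, to the ideal channel that always samples the question freshly.

```python
import random

def round_k(past_parity, eps):
    """One round: with probability eps the question B_k leaks the parity of
    the answers A_1^{k-1} (the imperfect channel M_k); with probability
    1 - eps it is sampled freshly, as the ideal channel M'_k always would."""
    question = past_parity if random.random() < eps else random.randint(0, 1)
    answer = random.randint(0, 1)       # toy stand-in for the device's answer
    return answer, question

random.seed(0)
n, eps = 10**5, 0.05
parity, agree = 0, 0
for _ in range(n):
    answer, question = round_k(parity, eps)
    agree += (question == parity)
    parity ^= answer                    # running parity of A_1^k
# an ideal channel (eps = 0) would give 1/2; the leak shifts it to 1/2 + eps/2
print(f"Pr[B_k = parity(A_1^(k-1))] ~ {agree / n:.3f}, expect ~{0.5 + eps / 2}")
```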

Theorem 5.1

For \(k \in [n]\), let the registers \(A_k\) and \(B_k\) be such that \(|A_k| = |A|\) and \(|B_k| = |B|\). For \(k \in [n]\), let \({{\,\mathrm{\mathcal {M}}\,}}_k\) be channels from \(R_{k-1} \rightarrow R_{k} A_k B_k\) and

$$\begin{aligned} \rho _{A_1^n B_1^n E} = {{\,\textrm{tr}\,}}_{R_n} \circ {{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1 (\rho ^{(0)}_{R_0 E}) \end{aligned}$$
(51)

be the state produced by applying these maps sequentially. Suppose the channels \({{\,\mathrm{\mathcal {M}}\,}}_k\) are such that for every \(k \in [n]\), there exists a channel \({{\,\mathrm{\mathcal {M}}\,}}'_{k}\) from \(R_{k-1} \rightarrow R_{k} A_k B_k\) such that

  1.

    \({{\,\mathrm{\mathcal {M}}\,}}'_{k}\) \(\epsilon \)-approximates \({{\,\mathrm{\mathcal {M}}\,}}_{k}\) in the diamond norm:

    $$\begin{aligned} \frac{1}{2}\left\| {{\,\mathrm{\mathcal {M}}\,}}_k- {{\,\mathrm{\mathcal {M}}\,}}'_k \right\| _{\diamond } \le \epsilon \end{aligned}$$
    (52)
  2.

    For every choice of a sequence of channels \({{\,\mathrm{\mathcal {N}}\,}}_i \in \{ {{\,\mathrm{\mathcal {M}}\,}}_i, {{\,\mathrm{\mathcal {M}}\,}}'_i \}\) for \(i \in [k-1]\), the state \({{\,\mathrm{\mathcal {M}}\,}}'_k \circ {{\,\mathrm{\mathcal {N}}\,}}_{k-1} \circ \cdots \circ {{\,\mathrm{\mathcal {N}}\,}}_1 (\rho ^{(0)}_{R_0 E})\) satisfies the Markov chain

    $$\begin{aligned} A_1^{k-1} \leftrightarrow B_1^{k-1}E \leftrightarrow B_k. \end{aligned}$$
    (53)

Then, for \(0<\delta , \epsilon _1, \epsilon _2<1\) such that \(\epsilon _1 + \epsilon _2 <1\), \(\alpha \in \left( 1, 1+ \frac{1}{\log (1+2|A|)} \right) \) and \(\beta >1\), we have

$$\begin{aligned}&H_{\min }^{\epsilon _1+\epsilon _2}(A_1^n|B_1^n E)_{\rho } \ge \sum _{k=1}^n \inf _{\omega _{R_{k-1} \tilde{R}}} H(A_k | B_k \tilde{R})_{{{\,\mathrm{\mathcal {M}}\,}}'_k(\omega )} - n(\alpha -1) \log ^2(1+ 2 |A|) \nonumber \\&\quad - \frac{\alpha }{\alpha -1} n \log \left( 1 + \delta \left( 4^{\frac{\alpha -1}{\alpha }\log (|A||B|)}-1 \right) \right) \nonumber \\&\quad - \frac{\alpha }{\alpha -1} n z_{\beta }(\epsilon , \delta ) - \frac{1}{\alpha -1}\left( g_1(\epsilon _2, \epsilon _1)+ \frac{\alpha g_0(\epsilon _1)}{\beta -1} \right) . \end{aligned}$$
(54)

where

$$\begin{aligned} z_\beta (\epsilon , \delta ) := \frac{\beta +1}{\beta -1}\log \left( \left( 1+ \sqrt{(1-\delta )\epsilon } \right) ^{\frac{\beta }{\beta +1}} + \left( \frac{\sqrt{(1-\delta )\epsilon }}{\delta ^\beta } \right) ^{{\frac{1}{\beta +1}}} \right) \end{aligned}$$
(55)

and \(g_1(x,y) = - \log (1- \sqrt{1-x^2}) - \log (1-y^2)\) and the infimum in Eq. 54 is taken over all input states \(\omega _{R_{k-1} \tilde{R}}\) to the channels \({{\,\mathrm{\mathcal {M}}\,}}'_k\) where \(\tilde{R}\) is a reference register (\(\tilde{R}\) can be chosen isomorphic to \(R_{k-1}\)).

For the choice of \(\beta = 2\), \(\delta = \epsilon ^{\frac{1}{8}}\), we have

$$\begin{aligned} z_2(\epsilon , \delta ) \le 3 \log \left( \left( 1+\epsilon ^{\frac{1}{2}} \right) ^{\frac{2}{3}} + \epsilon ^{\frac{1}{12}} \right) . \end{aligned}$$

We also have that

$$\begin{aligned} \log \left( 1 + \delta 2^{\frac{\alpha -1}{\alpha } 2 \log (|A||B|)} \right) \le (|A||B|)^2\epsilon ^{\frac{1}{8}}. \end{aligned}$$

Finally, if we define \(\epsilon _r:= (|A||B|)^2 \epsilon ^{\frac{1}{8}} + 3 \log \left( \left( 1+\epsilon ^{\frac{1}{2}} \right) ^{\frac{2}{3}} + \epsilon ^{\frac{1}{12}} \right) \), and choose \(\alpha = 1 + \sqrt{\epsilon _r}\), we get the bound

$$\begin{aligned}&H_{\min }^{\epsilon _1+\epsilon _2}(A_1^n|B_1^n E)_{\rho } \ge \sum _{k=1}^n \inf _{\omega _{R_{k-1} \tilde{R}}} H(A_k | B_k \tilde{R})_{{{\,\mathrm{\mathcal {M}}\,}}'_k(\omega )} \nonumber \\&\quad - n\sqrt{\epsilon _r} (\log ^2(1+ 2 |A|) + 2) - \frac{1}{\sqrt{\epsilon _r}}\left( g_1(\epsilon _2, \epsilon _1)+ 2 g_0(\epsilon _1) \right) \end{aligned}$$
(56)

The entropy loss per round in the above bound behaves as \(\sim \epsilon ^{\frac{1}{24}}\). This dependence on \(\epsilon \) is very poor. In comparison, we can carry out a similar proof argument for classical probability distributions to get a dependence of \(O(\sqrt{\epsilon })\) (Theorem F.1). The exponent of \(\epsilon \) in our bound thus seems to be almost a factor of 12 off from the best possible bound. Roughly speaking, while carrying out the proof classically, we can bound the relevant channel divergences by \(O\left( \epsilon \right) \), whereas in Eq. 56, we were only able to bound the channel divergence by \(\sim \epsilon ^{1/12}\). This leads to the deterioration of performance we see here as compared to the classical case. We will discuss this further in Sect. 5.5.
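For concreteness, the following sketch evaluates \(\epsilon _r\) as defined above and the resulting per-round entropy loss from Eq. 56 for binary registers (\(|A| = |B| = 2\)); the sample values of \(\epsilon \) are arbitrary.

```python
import math

def eps_r(eps: float, dA: int = 2, dB: int = 2) -> float:
    """epsilon_r as defined before Eq. 56 (beta = 2, delta = eps^(1/8))."""
    z2 = 3 * math.log2((1 + math.sqrt(eps)) ** (2 / 3) + eps ** (1 / 12))
    return (dA * dB) ** 2 * eps ** (1 / 8) + z2

for eps in (1e-6, 1e-12, 1e-24):
    er = eps_r(eps)
    loss = math.sqrt(er) * (math.log2(1 + 2 * 2) ** 2 + 2)  # |A| = 2 in Eq. 56
    print(f"eps={eps:.0e}: eps_r={er:.3e}, per-round loss ~{loss:.2f} bits, "
          f"eps^(1/24)={eps ** (1 / 24):.3f}")
```

Even at \(\epsilon = 10^{-24}\) the per-round loss computed this way exceeds one bit, so for binary answers the bound only becomes non-trivial at astronomically small \(\epsilon \), in line with the discussion above.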

In order to prove this theorem, we will use a channel-divergence-based chain rule. A recently proven chain rule for the \(\alpha \)-Rényi relative entropy [15, Corollary 5.2] states that for \(\alpha >1\), states \(\rho _A\) and \(\sigma _A\), and channels \(\mathcal {E}_{A \rightarrow B}\) and \(\mathcal {F}_{A \rightarrow B}\), we have

$$\begin{aligned} \tilde{D}_{\alpha }(\mathcal {E}_{A \rightarrow B}(\rho _{A})|| \mathcal {F}_{A \rightarrow B}(\sigma _{A})) \le \tilde{D}_{\alpha }(\rho _{A}|| \sigma _{A}) + \tilde{D}_{\alpha }^{\text {reg}}(\mathcal {E}_{A \rightarrow B} || \mathcal {F}_{A \rightarrow B}) \end{aligned}$$
(57)

where \(\tilde{D}_{\alpha }^{\text {reg}}(\mathcal {E}_{A \rightarrow B} || \mathcal {F}_{A \rightarrow B}):= \lim _{n \rightarrow \infty } \frac{1}{n} \tilde{D}_{\alpha }(\mathcal {E}_{A \rightarrow B}^{\otimes n} || \mathcal {F}_{A \rightarrow B}^{\otimes n})\) and \(\tilde{D}_{\alpha }(\cdot || \cdot )\) is the channel divergence (see Eq. 14).

Now suppose we were guaranteed that, for the maps in Theorem 5.1 above, \(\tilde{D}_{\alpha }^{\text {reg}}({{\,\mathrm{\mathcal {M}}\,}}_k || {{\,\mathrm{\mathcal {M}}\,}}'_k) \le \epsilon \) for every k and some \(\alpha >1\). Then, we could use the chain rule in Eq. 57 as follows

$$\begin{aligned}&\tilde{D}_{\alpha }({{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) || {{\,\mathrm{\mathcal {M}}\,}}'_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}'_1(\rho ^{(0)}_{R_0 E})) \\&\quad \le \tilde{D}_{\alpha }({{\,\mathrm{\mathcal {M}}\,}}_{n-1} \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) || {{\,\mathrm{\mathcal {M}}\,}}'_{n-1} \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}'_1(\rho ^{(0)}_{R_0 E})) + \tilde{D}_{\alpha }^{\text {reg}}({{\,\mathrm{\mathcal {M}}\,}}_n || {{\,\mathrm{\mathcal {M}}\,}}'_n) \\&\quad \le \cdots \\&\quad \le \tilde{D}_{\alpha }(\rho ^{(0)}_{R_0 E}||\rho ^{(0)}_{R_0 E}) + \sum _{k=1}^n \tilde{D}_{\alpha }^{\text {reg}}({{\,\mathrm{\mathcal {M}}\,}}_k || {{\,\mathrm{\mathcal {M}}\,}}'_k)\\&\quad \le n \epsilon . \end{aligned}$$

Once we have the above result, we can simply use the well-known relation between the smooth max-relative entropy and the \(\alpha \)-Rényi relative entropy [17, Proposition 6.5] to get the bound

$$\begin{aligned}&{D}_{\max }^{\epsilon '}({{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) || {{\,\mathrm{\mathcal {M}}\,}}'_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}'_1(\rho ^{(0)}_{R_0 E})) \\&\quad \le \tilde{D}_{\alpha }({{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) || {{\,\mathrm{\mathcal {M}}\,}}'_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}'_1(\rho ^{(0)}_{R_0 E})) + \frac{g_0(\epsilon ')}{\alpha -1}\\&\quad \le n\epsilon + O(1). \end{aligned}$$

This bound can subsequently be used in Lemma 3.5 to relate the smooth min-entropy of the real state \({{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E})\) with the \(\alpha \)-Rényi conditional entropy of the auxiliary state \({{\,\mathrm{\mathcal {M}}\,}}'_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}'_1(\rho ^{(0)}_{R_0 E})\), for which we can use the original entropy accumulation theorem.
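The classical analogue of this telescoping argument is easy to check numerically. The following sketch verifies, for arbitrary randomly generated stochastic matrices, that the KL divergence between the outputs of two n-step chains is at most the sum of the worst-case per-step divergences, which is the classical counterpart of the channel divergence chain rule used above.

```python
import numpy as np

rng = np.random.default_rng(3)

def kl(p, q):
    """KL divergence in bits between distributions with full support."""
    return float(np.sum(p * np.log2(p / q)))

def random_channel(n):
    """Column-stochastic matrix; column x is the distribution M(.|x)."""
    m = rng.random((n, n)) + 0.1
    return m / m.sum(axis=0)

n, rounds = 4, 5
Ms = [random_channel(n) for _ in range(rounds)]            # the 'real' maps
Mps = [0.9 * M + 0.1 * random_channel(n) for M in Ms]      # nearby maps

p = rng.random(n); p /= p.sum()                            # common input
out, out_p, step_sum = p.copy(), p.copy(), 0.0
for M, Mp in zip(Ms, Mps):
    step_sum += max(kl(M[:, x], Mp[:, x]) for x in range(n))
    out, out_p = M @ out, Mp @ out_p
print(f"D(out||out') = {kl(out, out_p):.4f} <= telescoped sum = {step_sum:.4f}")
```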

In order to prove Theorem 5.1, we broadly follow this idea. However, the condition \(\left\| {{\,\mathrm{\mathcal {M}}\,}}_k- {{\,\mathrm{\mathcal {M}}\,}}'_k \right\| _{\diamond } \le \epsilon \) does not lead to any kind of bound on \(\tilde{D}_{\alpha }^{\text {reg}}\) or any other channel divergence. We will get around this issue by instead using the mixed channels \({{\,\mathrm{\mathcal {M}}\,}}^{\delta }_{k}:= (1-\delta ){{\,\mathrm{\mathcal {M}}\,}}'_{k} + \delta {{\,\mathrm{\mathcal {M}}\,}}_{k}\). Also, instead of trying to bound the channel divergence in terms of \(\tilde{D}_{\alpha }^{\text {reg}}\), we will bound the \(D^\#_{\alpha }\) channel divergence (defined in the next section) and use its chain rule. We develop the relevant \(\alpha \)-Rényi divergence bounds for this divergence in the next two subsections and then prove the theorem above in Sect. 5.3.

5.1 Divergence bound for approximately equal states

We will use the sharp Rényi divergence \(D^{\#}_{\alpha }\) defined in Ref. [15] (see [33] for the following equivalent definition) in this section. For \(\alpha > 1\) and two positive operators P and Q, it is defined as

$$\begin{aligned} D^{\#}_{\alpha }(P || Q) := \min _{A \ge P} \hat{D}_{\alpha } (A || Q) \end{aligned}$$
(58)

where \(\hat{D}_{\alpha } (A || Q)\) is the \(\alpha \)-Rényi geometric divergence [34]. For \(\alpha >1\), it is defined as

$$\begin{aligned} \hat{D}_{\alpha } (A || Q) = {\left\{ \begin{array}{ll} \frac{1}{\alpha -1}\log {{\,\textrm{tr}\,}}\left( Q \left( Q^{-\frac{1}{2}} A Q^{-\frac{1}{2}} \right) ^\alpha \right) &{} \text {if } A \ll Q\\ \infty &{}\text {otherwise.} \end{array}\right. } \end{aligned}$$
(59)

The operator A in the optimisation above ranges over all operators satisfying \(A \ge P\); in general, such an A is unnormalised. We will prove a bound on \(D^{\#}_{\alpha }\) between two states in terms of the distance between them and their max-relative entropy. In order to prove this bound, we require the following simple generalisation of the pinching inequality (see for example [17, Sect. 2.6.3]).

Lemma 5.2

(Asymmetric pinching). For \(t>0\), a positive semidefinite operator \(X \ge 0 \) and orthogonal projections \(\Pi \) and \(\Pi _\perp = {{\,\mathrm{\mathbb {1}}\,}}- \Pi \), we have that

$$\begin{aligned} X \le (1+t) \Pi X \Pi + \left( 1+ \frac{1}{t} \right) \Pi _\perp X \Pi _\perp . \end{aligned}$$
(60)

Proof

We will write the positive matrix X as the block matrix

$$\begin{aligned} X = \begin{pmatrix} X_1 & X_2 \\ X_2^* & X_3 \end{pmatrix} \end{aligned}$$

where the blocks are partitioned according to the direct sum \({{\,\mathrm{\text {im}}\,}}(\Pi ) \oplus {{\,\mathrm{\text {im}}\,}}(\Pi _\perp )\). Then, the statement in the Lemma is equivalent to proving that

$$\begin{aligned} \begin{pmatrix} X_1 & X_2 \\ X_2^* & X_3 \end{pmatrix} \le \begin{pmatrix} (1+t) X_1 & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \left( 1+ \frac{1}{t} \right) X_3 \end{pmatrix} \end{aligned}$$

which is equivalent to proving that

$$\begin{aligned} 0 \le \begin{pmatrix} t X_1 & - X_2 \\ - X_2^* & \frac{1}{t} X_3 \end{pmatrix}. \end{aligned}$$

This is true because

$$\begin{aligned} \begin{pmatrix} t X_1 & - X_2 \\ - X_2^* & \frac{1}{t} X_3 \end{pmatrix} = \begin{pmatrix} - t^{1/2} & 0 \\ 0 & t^{-1/2} \end{pmatrix} \begin{pmatrix} X_1 & X_2 \\ X_2^* & X_3 \end{pmatrix} \begin{pmatrix} - t^{1/2} & 0 \\ 0 & t^{-1/2} \end{pmatrix} \ge 0 \end{aligned}$$

since \(X \ge 0\). \(\square \)
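Lemma 5.2 is also easy to check numerically. The following sketch verifies the operator inequality for a random positive semidefinite matrix and a random projection; this is merely a sanity check, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(7)

def random_psd(n):
    """A random positive semidefinite matrix."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return m @ m.conj().T

n, k = 6, 2                              # total dimension, rank of Pi
X = random_psd(n)
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
Pi = Q[:, :k] @ Q[:, :k].conj().T        # random rank-k orthogonal projection
Pi_perp = np.eye(n) - Pi

for t in (0.1, 1.0, 10.0):
    gap = (1 + t) * Pi @ X @ Pi + (1 + 1 / t) * Pi_perp @ X @ Pi_perp - X
    print(f"t={t:>4}: min eig of RHS - X = {np.linalg.eigvalsh(gap).min():.2e}")
```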

Lemma 5.3

Let \(\epsilon >0\), \(\alpha \in (1, \infty )\), and let \(\rho \) and \(\sigma \) be two normalised quantum states on the Hilbert space \(\mathbb {C}^n\) such that \(\frac{1}{2}\left\| \rho - \sigma \right\| _1 \le \epsilon \) and \(D_{\max }(\rho ||\sigma ) \le d < \infty \). Then, we have the bound

$$\begin{aligned} D^{\#}_{\alpha }(\rho || \sigma ) \le \frac{\alpha +1}{\alpha -1}\log \left( (1+ \sqrt{\epsilon })^{\frac{\alpha }{\alpha +1}} + \left( 2^{\alpha d} \sqrt{\epsilon } \right) ^{\frac{1}{\alpha +1}} \right) . \end{aligned}$$
(61)

Note: For a fixed \(\alpha \in (1, \infty )\), this upper bound tends to zero as \(\epsilon \rightarrow 0\). On the other hand, for a fixed \(\epsilon \in (0,1)\), the upper bound tends to infinity as \(\alpha \rightarrow 1\) (that is, the bound becomes trivial). In Appendix B, we show that a bound of this form for \(D^{\#}_{\alpha }\) necessarily diverges for \(\epsilon >0\) as \(\alpha \rightarrow 1\).

Proof

Since \(D_{\max }(\rho ||\sigma ) < \infty \), we have that \(\rho \ll \sigma \). We can assume that \(\sigma \) is invertible; if it were not, we could restrict our vector space to the subspace \(\text {supp}(\sigma )\).

Let \(\rho - \sigma = P- Q\), where \(P \ge 0\) is the positive part of the matrix \(\rho - \sigma \) and \(Q \ge 0\) is its negative part. Since \({{\,\textrm{tr}\,}}(\rho - \sigma ) = 0\) and \(\left\| \rho - \sigma \right\| _1 = {{\,\textrm{tr}\,}}(P) + {{\,\textrm{tr}\,}}(Q)\), we then have that \({{\,\textrm{tr}\,}}(P)= {{\,\textrm{tr}\,}}(Q) \le \epsilon \).

Further, let

$$\begin{aligned} \sigma ^{-\frac{1}{2}} P \sigma ^{-\frac{1}{2}} = \sum _{i=1}^n \lambda _i |{x_i}\rangle \langle {x_i}| \end{aligned}$$
(62)

be the eigenvalue decomposition of \(\sigma ^{-\frac{1}{2}} P \sigma ^{-\frac{1}{2}}\). Define the real vector \(q \in \mathbb {R}^n\) as

$$\begin{aligned} q(i) := \langle {x_i}| \sigma |{x_i}\rangle . \end{aligned}$$

Note that q is a probability distribution, since \(q(i) \ge 0\) and \(\sum _{i} q(i) = {{\,\textrm{tr}\,}}(\sigma ) = 1\). Observe that

$$\begin{aligned} \mathbb {E}_{I \sim q} \left[ \lambda _I\right]&= \sum _{i=1}^n \lambda _i \langle {x_i}| \sigma |{x_i}\rangle \\&= {{\,\textrm{tr}\,}}\left( \sigma \sum _{i=1}^n \lambda _i |{x_i}\rangle \langle {x_i}| \right) \\&= {{\,\textrm{tr}\,}}\left( \sigma \sigma ^{-\frac{1}{2}} P \sigma ^{-\frac{1}{2}} \right) \\&= {{\,\textrm{tr}\,}}(P)\\&\le \epsilon . \end{aligned}$$

Also, observe that \(\lambda _i \ge 0\) for all \(i \in [n]\) because \(\sigma ^{-\frac{1}{2}} P \sigma ^{-\frac{1}{2}} \ge 0\). Let us define

$$\begin{aligned} S := \{i \in [n]: \lambda _i \le \sqrt{\epsilon }\}. \end{aligned}$$
(63)

Since \(\lambda _i \ge 0\) for all \(i \in [n]\), we can use Markov's inequality to show:

$$\begin{aligned} \Pr _q (I \in S^c)&= \Pr _q (\lambda _I > \sqrt{\epsilon }) \\&\le \frac{{{\,\mathrm{\mathbb {E}}\,}}_{I \sim q}\left[ \lambda _I\right] }{\sqrt{\epsilon }} \\&\le \sqrt{\epsilon }. \end{aligned}$$

Thus, if we define the projectors \(\Pi := \sum _{i \in S} |{x_i}\rangle \langle {x_i}|\) and \(\Pi _{\perp }:= \sum _{i \in S^c} |{x_i}\rangle \langle {x_i}| = {{\,\mathrm{\mathbb {1}}\,}}- \Pi \), we have

$$\begin{aligned} {{\,\textrm{tr}\,}}(\sigma \Pi _\perp )&= \sum _{i \in S^c} \langle {x_i}| \sigma |{x_i}\rangle \nonumber \\&= \Pr _q (I \in S^c) \nonumber \\&\le \sqrt{\epsilon }. \end{aligned}$$
(64)

Moreover, by the definition of the set S (Eq. 63), we have

$$\begin{aligned} \Pi \sigma ^{-\frac{1}{2}} P \sigma ^{-\frac{1}{2}} \Pi&= \sum _{i \in S} \lambda _i |{x_i}\rangle \langle {x_i}| \le \sqrt{\epsilon } \Pi \end{aligned}$$
(65)

and using \(D_{\max }(\rho || \sigma ) \le d\), we have that

$$\begin{aligned} \sigma ^{-\frac{1}{2}} \rho \sigma ^{-\frac{1}{2}} \le 2^d {{\,\mathrm{\mathbb {1}}\,}}. \end{aligned}$$
(66)

Now, observe that since \(\sigma ^{-\frac{1}{2}} \rho \sigma ^{-\frac{1}{2}} \ge 0\), for an arbitrary \(t >0\), using Lemma 5.2 we have

$$\begin{aligned} \sigma ^{-\frac{1}{2}} \rho \sigma ^{-\frac{1}{2}}&\le (1+t) \Pi \sigma ^{-\frac{1}{2}} \rho \sigma ^{-\frac{1}{2}} \Pi + \left( 1+ \frac{1}{t} \right) \Pi _\perp \sigma ^{-\frac{1}{2}} \rho \sigma ^{-\frac{1}{2}} \Pi _\perp \\&\le (1+t) \Pi \left( {{\,\mathrm{\mathbb {1}}\,}}+ \sigma ^{-\frac{1}{2}} P \sigma ^{-\frac{1}{2}} \right) \Pi + \left( 1+ \frac{1}{t} \right) 2^d \Pi _\perp \\&\le (1+t)(1+ \sqrt{\epsilon }) \Pi + \left( 1+ \frac{1}{t} \right) 2^d \Pi _\perp \end{aligned}$$

where we have used \(\rho \le \sigma + P\) to bound the first term and Eq. 66 to bound the second term in the second line, and Eq. 65 to bound \(\Pi \sigma ^{-\frac{1}{2}} P \sigma ^{-\frac{1}{2}} \Pi \) in the last step.

We will define \(A_t:= (1+t)(1+ \sqrt{\epsilon }) \sigma ^{\frac{1}{2}}\Pi \sigma ^{\frac{1}{2}} + \left( 1+ \frac{1}{t} \right) 2^d \sigma ^{\frac{1}{2}}\Pi _\perp \sigma ^{\frac{1}{2}}\). Above, we have shown that \(A_t \ge \rho \) for every \(t>0\). Therefore, for each \(t>0\), \(D^\#_\alpha (\rho || \sigma ) \le \hat{D}_\alpha (A_t || \sigma )\). We will now bound \(\hat{D}_\alpha (A_t || \sigma )\) for \(\alpha \in (1, \infty )\) as:

$$\begin{aligned} \hat{D}_\alpha (A_t || \sigma )&= \frac{1}{\alpha -1} \log {{\,\textrm{tr}\,}}\left( \sigma \left( \sigma ^{-\frac{1}{2}} A_t \sigma ^{-\frac{1}{2}} \right) ^\alpha \right) \\&= \frac{1}{\alpha -1} \log {{\,\textrm{tr}\,}}\left( \sigma \left( (1+t)(1+ \sqrt{\epsilon }) \Pi + \left( 1+ \frac{1}{t} \right) 2^d \Pi _\perp \right) ^\alpha \right) \\&= \frac{1}{\alpha -1} \log {{\,\textrm{tr}\,}}\left( \sigma \left( (1+t)^\alpha (1+ \sqrt{\epsilon })^\alpha \Pi + \left( 1+ \frac{1}{t} \right) ^\alpha 2^{d\alpha } \Pi _\perp \right) \right) \\&= \frac{1}{\alpha -1} \log \left( (1+t)^\alpha (1+ \sqrt{\epsilon })^\alpha {{\,\textrm{tr}\,}}\left( \sigma \Pi \right) + \left( 1+ \frac{1}{t} \right) ^\alpha 2^{d\alpha } {{\,\textrm{tr}\,}}\left( \sigma \Pi _\perp \right) \right) \\&\le \frac{1}{\alpha -1} \log \left( (1+t)^\alpha (1+ \sqrt{\epsilon })^\alpha + \left( 1+ \frac{1}{t} \right) ^\alpha 2^{d\alpha } \sqrt{\epsilon } \right) \end{aligned}$$

where in the last line we use \({{\,\textrm{tr}\,}}(\sigma \Pi )\le 1\) and \({{\,\textrm{tr}\,}}(\sigma \Pi _\perp )\le \sqrt{\epsilon }\) (Eq. 64). Finally, since \(t>0\) was arbitrary, we can choose the \(t>0\) which minimises the right-hand side. For this choice of \(t_{\min }= \left( \frac{2^{\alpha d} \sqrt{\epsilon }}{\left( 1+ \sqrt{\epsilon } \right) ^{\alpha }} \right) ^{\frac{1}{\alpha +1}}\), we get

$$\begin{aligned} \hat{D}_\alpha (A_{t_{\min }} || \sigma )&\le \frac{\alpha +1}{\alpha -1}\log \left( (1+ \sqrt{\epsilon })^{\frac{\alpha }{\alpha +1}} + 2^{\frac{\alpha }{\alpha +1} d} \epsilon ^{\frac{1}{2(\alpha +1)}} \right) \end{aligned}$$

which proves the required bound. \(\square \)
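The construction in this proof can be traced numerically. The sketch below builds the projector \(\Pi \) and the operator \(A_{t_{\min }}\) exactly as above for a randomly generated pair of nearby states, and checks both that \(A_{t_{\min }} \ge \rho \) and that the geometric divergence stays below the bound in Eq. 61; the particular states are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(n):
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    s = m @ m.conj().T
    return s / np.trace(s).real

def mpow(h, p):
    """p-th power of a Hermitian PSD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    w = np.clip(w, 1e-14, None) if p < 0 else np.maximum(w, 0.0)
    return (v * w ** p) @ v.conj().T

n, alpha = 4, 2.0
sigma = random_state(n)
rho = 0.95 * sigma + 0.05 * random_state(n)       # a state close to sigma

eps = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()   # trace distance
sih = mpow(sigma, -0.5)                                     # sigma^(-1/2)
d = np.log2(np.linalg.eigvalsh(sih @ rho @ sih).max())      # D_max(rho||sigma)

w, v = np.linalg.eigh(rho - sigma)
P = (v * np.maximum(w, 0.0)) @ v.conj().T                   # positive part
lam, x = np.linalg.eigh(sih @ P @ sih)
Pi = (x * (lam <= np.sqrt(eps))) @ x.conj().T               # projector onto S
Pi_c = np.eye(n) - Pi

t = (2 ** (alpha * d) * np.sqrt(eps) / (1 + np.sqrt(eps)) ** alpha) ** (1 / (alpha + 1))
sh = mpow(sigma, 0.5)
A = ((1 + t) * (1 + np.sqrt(eps)) * sh @ Pi @ sh
     + (1 + 1 / t) * 2 ** d * sh @ Pi_c @ sh)

geo = np.log2(np.trace(sigma @ mpow(sih @ A @ sih, alpha)).real) / (alpha - 1)
bound = (alpha + 1) / (alpha - 1) * np.log2(
    (1 + np.sqrt(eps)) ** (alpha / (alpha + 1))
    + (2 ** (alpha * d) * np.sqrt(eps)) ** (1 / (alpha + 1)))
print("A >= rho:", np.linalg.eigvalsh(A - rho).min() > -1e-9)
print(f"hat_D_alpha(A||sigma) = {geo:.4f} <= Eq. 61 bound = {bound:.4f}")
```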

5.2 Bounding the channel divergence for two channels close to each other

Suppose there are two channels \({{\,\mathrm{\mathcal {N}}\,}}\) and \({{\,\mathrm{\mathcal {M}}\,}}\) mapping registers from the space A to B such that \(\frac{1}{2} \left\| {{\,\mathrm{\mathcal {N}}\,}}- {{\,\mathrm{\mathcal {M}}\,}} \right\| _\diamond \le \epsilon \). In general, the channel divergence between two such channels can be infinite because there may be states \(\rho \) such that \({{\,\mathrm{\mathcal {N}}\,}}(\rho ) \not \ll {{\,\mathrm{\mathcal {M}}\,}}(\rho )\). In order to get around this issue, we will use the \(\delta \)-mixed channel, \({{\,\mathrm{\mathcal {M}}\,}}_\delta \). For \(\delta \in (0,1)\), we define \({{\,\mathrm{\mathcal {M}}\,}}_\delta \) as

$$\begin{aligned} \mathcal {M}_\delta := (1- \delta ) \mathcal {M} + \delta \mathcal {N}. \end{aligned}$$

This guarantees that \(D_{\max }({{\,\mathrm{\mathcal {N}}\,}}|| {{\,\mathrm{\mathcal {M}}\,}}_\delta ) \le \log \frac{1}{\delta }\), which is enough to ensure that the divergences we are interested in are finite. Moreover, by mixing \({{\,\mathrm{\mathcal {M}}\,}}\) with \({{\,\mathrm{\mathcal {N}}\,}}\), we only decrease the distance:

$$\begin{aligned} \frac{1}{2}\left\| \mathcal {M}_\delta - {{\,\mathrm{\mathcal {N}}\,}} \right\| _{\diamond }&= \frac{1}{2}\left\| (1-\delta ){{\,\mathrm{\mathcal {M}}\,}}+ \delta {{\,\mathrm{\mathcal {N}}\,}}- {{\,\mathrm{\mathcal {N}}\,}} \right\| _{\diamond } \nonumber \\&= (1-\delta )\frac{1}{2}\left\| {{\,\mathrm{\mathcal {M}}\,}}- {{\,\mathrm{\mathcal {N}}\,}} \right\| _{\diamond }\nonumber \\&\le (1-\delta ) \epsilon . \end{aligned}$$
(67)

We will now show that \(D^\#_{\alpha }({{\,\mathrm{\mathcal {N}}\,}}|| {{\,\mathrm{\mathcal {M}}\,}}_{\delta })\) is small for an appropriately chosen \(\delta \). By the definition of channel divergence, we have that

$$\begin{aligned} D^\#_{\alpha }({{\,\mathrm{\mathcal {N}}\,}}|| {{\,\mathrm{\mathcal {M}}\,}}_{\delta }) = \sup _{\rho _{AR}} D^\#_{\alpha }({{\,\mathrm{\mathcal {N}}\,}}(\rho _{AR}) || {{\,\mathrm{\mathcal {M}}\,}}_{\delta }(\rho _{AR})) \end{aligned}$$

where R is an arbitrary reference system (\({{\,\mathrm{\mathcal {N}}\,}}, {{\,\mathrm{\mathcal {M}}\,}}_{\delta }\) map register A to register B). We will show that for every \(\rho _{AR}\), \(D^\#_{\alpha }({{\,\mathrm{\mathcal {N}}\,}}(\rho _{AR}) || {{\,\mathrm{\mathcal {M}}\,}}_{\delta }(\rho _{AR}))\) is small. Note that

$$\begin{aligned} {{\,\mathrm{\mathcal {M}}\,}}_{\delta }(\rho _{AR})&= (1-\delta ){{\,\mathrm{\mathcal {M}}\,}}(\rho _{AR}) + \delta {{\,\mathrm{\mathcal {N}}\,}}(\rho _{AR})\\&\ge \delta {{\,\mathrm{\mathcal {N}}\,}}(\rho _{AR}) \end{aligned}$$

which implies that \(D_{\max }({{\,\mathrm{\mathcal {N}}\,}}(\rho _{AR}) || {{\,\mathrm{\mathcal {M}}\,}}_{\delta }(\rho _{AR})) \le \log \frac{1}{\delta }\). Also, using Eq. 67, we have that

$$\begin{aligned} \frac{1}{2}\left\| \mathcal {M}_\delta (\rho _{AR})- {{\,\mathrm{\mathcal {N}}\,}}(\rho _{AR}) \right\| _1 \le (1-\delta )\epsilon . \end{aligned}$$

Using Lemma 5.3, we have for every \(\alpha \in (1, \infty )\)

$$\begin{aligned} D^\#_{\alpha }({{\,\mathrm{\mathcal {N}}\,}}(\rho _{AR}) || {{\,\mathrm{\mathcal {M}}\,}}_{\delta }(\rho _{AR})) \le \frac{\alpha +1}{\alpha -1}\log \left( \left( 1+ \sqrt{(1-\delta )\epsilon } \right) ^{\frac{\alpha }{\alpha +1}} + \left( \frac{\sqrt{(1-\delta )\epsilon }}{\delta ^\alpha } \right) ^{{\frac{1}{\alpha +1}}} \right) . \end{aligned}$$

Since, this is true for all \(\rho _{AR}\), for every \(\alpha \in (1, \infty )\) we have

$$\begin{aligned} D^\#_{\alpha }({{\,\mathrm{\mathcal {N}}\,}}|| {{\,\mathrm{\mathcal {M}}\,}}_\delta ) \le \frac{\alpha +1}{\alpha -1}\log \left( \left( 1+ \sqrt{(1-\delta )\epsilon } \right) ^{\frac{\alpha }{\alpha +1}} + \left( \frac{\sqrt{(1-\delta )\epsilon }}{\delta ^\alpha } \right) ^{{\frac{1}{\alpha +1}}} \right) . \end{aligned}$$

Note that since \(\delta \) was arbitrary, we can choose it appropriately to make sure that the above bound is small. For example, choosing \(\delta = \epsilon ^\frac{1}{4 \alpha }\), we get the bound

$$\begin{aligned} D^\#_{\alpha }({{\,\mathrm{\mathcal {N}}\,}}|| {{\,\mathrm{\mathcal {M}}\,}}_\delta ) \le \frac{\alpha +1}{\alpha -1}\log \left( (1+ \sqrt{\epsilon })^{\frac{\alpha }{\alpha +1}} + \epsilon ^{\frac{1}{4(\alpha +1)}} \right) \end{aligned}$$

which is a small function of \(\epsilon \) in the sense that it tends to 0 as \(\epsilon \rightarrow 0\). We summarise the bound derived above in the following lemma.

Lemma 5.4

Let \(\epsilon >0\). Suppose channels \({{\,\mathrm{\mathcal {N}}\,}}\) and \({{\,\mathrm{\mathcal {M}}\,}}\) from register A to B are such that \(\frac{1}{2}\left\| {{\,\mathrm{\mathcal {N}}\,}}- {{\,\mathrm{\mathcal {M}}\,}} \right\| _{\diamond } \le \epsilon \). For \(\delta \in (0,1)\), we can define the mixed channel \({{\,\mathrm{\mathcal {M}}\,}}_\delta := (1- \delta ) \mathcal {M} + \delta \mathcal {N}\). Then, for every \(\alpha \in (1, \infty )\), we have the following bound on the channel divergence

$$\begin{aligned} D^\#_{\alpha }({{\,\mathrm{\mathcal {N}}\,}}|| {{\,\mathrm{\mathcal {M}}\,}}_\delta ) \le \frac{\alpha +1}{\alpha -1}\log \left( \left( 1+ \sqrt{(1-\delta )\epsilon } \right) ^{\frac{\alpha }{\alpha +1}} + \left( \frac{\sqrt{(1-\delta )\epsilon }}{\delta ^\alpha } \right) ^{{\frac{1}{\alpha +1}}} \right) . \end{aligned}$$
(68)
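As a quick numerical illustration of Lemma 5.4, the following sketch evaluates the right-hand side of Eq. 68 for the choice \(\delta = \epsilon ^{1/(4\beta )}\) discussed above; the values confirm that the bound vanishes, albeit slowly, as \(\epsilon \rightarrow 0\).

```python
import math

def z(beta: float, eps: float, delta: float) -> float:
    """Right-hand side of Eq. 68, in bits."""
    a = (1 + math.sqrt((1 - delta) * eps)) ** (beta / (beta + 1))
    b = (math.sqrt((1 - delta) * eps) / delta ** beta) ** (1 / (beta + 1))
    return (beta + 1) / (beta - 1) * math.log2(a + b)

beta = 2.0
for eps in (1e-4, 1e-8, 1e-16):
    delta = eps ** (1 / (4 * beta))          # the choice suggested in the text
    print(f"eps={eps:.0e}, delta={delta:.3e}: "
          f"D#-bound = {z(beta, eps, delta):.4f} bits")
```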

5.3 Proof of the approximate entropy accumulation theorem

We use the mixed channels defined in the previous section to define the auxiliary state \({{\,\mathrm{\mathcal {M}}\,}}^\delta _n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}^\delta _1(\rho ^{(0)}_{R_0 E})\) for our proof. It is easy to show, using the divergence bounds in Sect. 5.2 and the chain rule for the \(D^\#_{\alpha }\) divergence, that the relative entropy distance between the real state and this choice of the auxiliary state is small. However, the state \({{\,\mathrm{\mathcal {M}}\,}}^\delta _n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}^\delta _1(\rho ^{(0)}_{R_0 E})\) does not necessarily satisfy the Markov chain conditions required for entropy accumulation. Thus, we also need to reprove the entropy lower bound on this state by modifying the approach used in the proof of the original entropy accumulation theorem.

Proof of Theorem 5.1

Using Lemma 5.4, for every \(\delta \in (0,1)\), \(\beta > 1\), and each \(k \in [n]\), the mixed maps \({{\,\mathrm{\mathcal {M}}\,}}^\delta _k:= (1-\delta ){{\,\mathrm{\mathcal {M}}\,}}'_k + \delta {{\,\mathrm{\mathcal {M}}\,}}_k\) satisfy

$$\begin{aligned} D^\#_\beta ({{\,\mathrm{\mathcal {M}}\,}}_k || {{\,\mathrm{\mathcal {M}}\,}}^\delta _k)&\le \frac{\beta +1}{\beta -1}\log \left( \left( 1+ \sqrt{(1-\delta )\epsilon } \right) ^{\frac{\beta }{\beta +1}} + \left( \frac{\sqrt{(1-\delta )\epsilon }}{\delta ^\beta } \right) ^{{\frac{1}{\beta +1}}} \right) \nonumber \\&:= z_\beta (\epsilon , \delta ) \end{aligned}$$
(69)

where we defined the right-hand side above as \(z_\beta (\epsilon , \delta )\). This can be made “small” by choosing \(\delta = \epsilon ^{\frac{1}{4\beta }}\) as was shown in the previous section. We use these maps to define the auxiliary state as

$$\begin{aligned} \sigma _{A_1^n B_1^n E} := {{\,\mathrm{\mathcal {M}}\,}}^\delta _n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}^\delta _1(\rho ^{(0)}_{R_0 E}). \end{aligned}$$
(70)

Now, we have that for \(\beta >1\) and \(\epsilon _1 >0\)

$$\begin{aligned}&D^{\epsilon _1}_{\max }(\rho _{A_1^n B_1^n E}||\sigma _{A_1^n B_1^n E}) \nonumber \\&\quad \le \tilde{D}_\beta (\rho _{A_1^n B_1^n E}||\sigma _{A_1^n B_1^n E}) + \frac{g_0(\epsilon _1)}{\beta -1} \nonumber \\&\quad \le D^\#_\beta (\rho _{A_1^n B_1^n E}||\sigma _{A_1^n B_1^n E}) + \frac{g_0(\epsilon _1)}{\beta -1}\nonumber \\&\quad = D^\#_\beta ({{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) || {{\,\mathrm{\mathcal {M}}\,}}^\delta _n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}^\delta _1(\rho ^{(0)}_{R_0 E})) + \frac{g_0(\epsilon _1)}{\beta -1}\nonumber \\&\quad \le D^\#_\beta ({{\,\mathrm{\mathcal {M}}\,}}_{n-1} \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1(\rho ^{(0)}_{R_0 E}) || {{\,\mathrm{\mathcal {M}}\,}}^\delta _{n-1} \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}^\delta _1(\rho ^{(0)}_{R_0 E})) + D^\#_\beta ({{\,\mathrm{\mathcal {M}}\,}}_n || {{\,\mathrm{\mathcal {M}}\,}}^\delta _n) + \frac{g_0(\epsilon _1)}{\beta -1}\nonumber \\&\quad \le \cdots \nonumber \\&\quad \le \sum _{k=1}^n D^\#_\beta ({{\,\mathrm{\mathcal {M}}\,}}_k || {{\,\mathrm{\mathcal {M}}\,}}^\delta _k) + \frac{g_0(\epsilon _1)}{\beta -1}\nonumber \\&\quad \le n z_\beta (\epsilon , \delta ) + \frac{g_0(\epsilon _1)}{\beta -1} \end{aligned}$$
(71)

where the first line follows from [17, Proposition 6.5], the second line follows from [15, Proposition 3.4], the fourth line follows from the chain rule for \(D^\#_\beta \) [15, Proposition 4.5], and the last line follows from Eq. 69.

For \(\epsilon _2 >0\) and \(\alpha \in (1, 1+ \frac{1}{\log (1+2|A|)})\), we can plug the above in the bound provided by Lemma 3.5 to get

$$\begin{aligned} H_{\min }^{\epsilon _1+\epsilon _2}(A_1^n|B_1^n E)_{\rho }&\ge \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\sigma } - \frac{\alpha }{\alpha -1} n z_{\beta }(\epsilon , \delta ) \nonumber \\&\quad - \frac{1}{\alpha -1}\left( g_1(\epsilon _2, \epsilon _1)+ \frac{\alpha g_0(\epsilon _1)}{\beta -1} \right) . \end{aligned}$$
(72)

We have now reduced our problem to lower bounding \(\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\sigma }\). Note that we cannot directly use entropy accumulation here: the mixed maps \({{\,\mathrm{\mathcal {M}}\,}}^\delta _k = (1-\delta ){{\,\mathrm{\mathcal {M}}\,}}'_k + \delta {{\,\mathrm{\mathcal {M}}\,}}_k\) contain \({{\,\mathrm{\mathcal {M}}\,}}_k\) with weight \(\delta \), which means that with probability \(\delta \) the register \(B_k\) may be correlated with \(A_1^{k-1}\) even given \(B_1^{k-1}E\), so the state may not satisfy the Markov chain required for entropy accumulation.

The application of the maps \({{\,\mathrm{\mathcal {M}}\,}}^\delta _k\) can be viewed as applying the channel \({{\,\mathrm{\mathcal {M}}\,}}'_k\) with probability \(1-\delta \) and the channel \({{\,\mathrm{\mathcal {M}}\,}}_k\) with probability \(\delta \). We can define the channels \({{\,\mathrm{\mathcal {N}}\,}}_k\) which map the registers \(R_{k-1}\) to \(R_k A_k B_k C_k\), where \(C_k\) is a binary register. The action of \({{\,\mathrm{\mathcal {N}}\,}}_k\) can be defined as:

  1.

    Sample the classical random variable \(C_k \in \{0,1\}\) independently. \(C_k= 1\) with probability \(1-\delta \) and 0 otherwise.

  2.

    If \(C_k =1\) apply the map \({{\,\mathrm{\mathcal {M}}\,}}'_k\) on \(R_{k-1}\), else apply \({{\,\mathrm{\mathcal {M}}\,}}_k\) on \(R_{k-1}\).

Let us call \(\theta _{A_1^n B_1^n C_1^n E} ={{\,\mathrm{\mathcal {N}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {N}}\,}}_1 (\rho ^{(0)}_{R_0 E})\). Clearly \({{\,\textrm{tr}\,}}_{C_1^n}\left( \theta _{A_1^n B_1^n C_1^n E} \right) = \sigma _{A_1^n B_1^n E}\). Thus, we have

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\sigma }&= \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\theta }\nonumber \\&\ge \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n C_1^n E)_{\theta }. \end{aligned}$$
(73)

We will now focus on lower bounding \(\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n C_1^n E)_{\theta }\). Using [17, Proposition 5.1], we have that

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n C_1^n E)_{\theta } = \frac{\alpha }{1-\alpha } \log \sum _{c_1^n} \theta (c_1^n) \exp _2 \left( \frac{1-\alpha }{\alpha }\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\theta _{|c_1^n}} \right) . \end{aligned}$$

We will show that for a given \(c_1^n\), the conditional entropy \(\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\theta _{|c_1^n}}\) accumulates whenever the “good” map \({{\,\mathrm{\mathcal {M}}\,}}'_k\) is used and loses some entropy for the rounds where the “bad” map \({{\,\mathrm{\mathcal {M}}\,}}_k\) is used. The fact that, with large probability, \(c_1^n\) contains far more 1s than 0s then allows us to prove a lower bound on \(\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n C_1^n E)_{\theta }\).

Claim 5.5

Define \(h_k:= \inf _{\omega } \tilde{H}^{\downarrow }_{\alpha }(A_k | B_k \tilde{R}_{k-1})_{{{\,\mathrm{\mathcal {M}}\,}}'_k(\omega )}\) where the infimum is over all states \(\omega _{R_{k-1} \tilde{R}_{k-1}}\) for a register \(\tilde{R}_{k-1}\), which is isomorphic to \(R_{k-1}\), and \(s:= \log (|A||B|^2)\). Then, we have

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\theta _{|c_1^n}} \ge \sum _{k=1}^n \left( \delta ({c_k,1}) h_k - \delta ({c_k,0})s \right) \end{aligned}$$
(74)

where \(\delta ({x,y})\) is the Kronecker delta function (\(\delta ({x,y})=1\) if \(x=y\) and 0 otherwise).

Proof

We will prove the statement

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^k|B_1^k E)_{\theta _{|c_1^k}} \ge \tilde{H}^{\uparrow }_{\alpha }(A_1^{k-1}|B_1^{k-1} E)_{\theta _{|c_1^{k-1}}} + \left( \delta ({c_k,1}) h_k - \delta ({c_k,0})s \right) \end{aligned}$$

then the claim will follow inductively. We will consider two cases: \(c_k=0\) and \(c_k =1\). First suppose \(c_k=0\); then \(\theta _{A_1^k B_1^k E | c_1^k} = {{\,\textrm{tr}\,}}_{R_k} \circ {{\,\mathrm{\mathcal {M}}\,}}_k^{R_{k-1} \rightarrow R_{k} A_k B_k} \left( \theta _{R_{k-1} A_1^{k-1} B_1^{k-1} E| c_1^{k}} \right) \). In this case, we have

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^k|B_1^k E)_{\theta _{|c_1^k}}&\ge \tilde{H}^{\uparrow }_{\alpha }(A_1^{k-1}|B_1^k E)_{\theta _{|c_1^k}} - \log |A| \\&\ge \tilde{H}^{\uparrow }_{\alpha }(A_1^{k-1}|B_1^{k-1} E)_{\theta _{|c_1^k}} - \log \left( |A||B|^2 \right) \\&= \tilde{H}^{\uparrow }_{\alpha }(A_1^{k-1}|B_1^{k-1} E)_{\theta _{|c_1^{k-1}}} - s \end{aligned}$$

where in the first line we have used the dimension bound in Lemma D.1, in the second line we have used the dimension bound in Lemma D.3 and in the last line we have used \(\theta _{A_1^{k-1}B_1^{k-1} E|c_1^k} = \theta _{A_1^{k-1}B_1^{k-1} E|c_1^{k-1}}\) (the bit \(C_k\) is sampled independently of the past, so conditioning on it does not change the state of the previous rounds).

Now, suppose that \(c_k =1\). In this case, we have that \(\theta _{A_1^k B_1^k E | c_1^k} = {{\,\textrm{tr}\,}}_{R_k} \circ {{\,\mathrm{\mathcal {M}}\,}}_k' \left( \theta _{R_{k-1} A_1^{k-1} B_1^{k-1} E| c_1^{k}} \right) \). Since \(\theta _{R_{k-1} A_1^{k-1} B_1^{k-1} E| c_1^{k}} = \Phi _{k-1} \circ \Phi _{k-2} \circ \cdots \circ \Phi _{1}(\rho ^{(0)}_{R_0 E})\), where each \(\Phi _i \in \{{{\,\mathrm{\mathcal {M}}\,}}_i, {{\,\mathrm{\mathcal {M}}\,}}'_i\}\), the hypothesis of the theorem implies that the state \(\theta _{A_1^k B_1^k E | c_1^k}\) satisfies the Markov chain

$$\begin{aligned} A_1^{k-1} \leftrightarrow B_1^{k-1} E \leftrightarrow B_k. \end{aligned}$$

Now, using Corollary C.5 (the \( \tilde{H}^{\uparrow }_{\alpha }\) counterpart for [8, Corollary 3.5], which is the main chain rule used for proving entropy accumulation), we have

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^k|B_1^k E)_{\theta _{|c_1^k}}&\ge \tilde{H}^{\uparrow }_{\alpha }(A_1^{k-1}|B_1^{k-1} E)_{\theta _{|c_1^k}} + \inf _{\omega } \tilde{H}^{\downarrow }_{\alpha }(A_k | B_k \tilde{R}_{k-1})_{{{\,\mathrm{\mathcal {M}}\,}}'_k(\omega )} \\&= \tilde{H}^{\uparrow }_{\alpha }(A_1^{k-1}|B_1^{k-1} E)_{\theta _{|c_1^{k-1}}} + h_k \end{aligned}$$

where in the last line we have again used \(\theta _{A_1^{k-1}B_1^{k-1} E|c_1^k} = \theta _{A_1^{k-1}B_1^{k-1} E|c_1^{k-1}}\). Combining these two cases, we have

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^k|B_1^k E)_{\theta _{|c_1^k}} \ge \tilde{H}^{\uparrow }_{\alpha }(A_1^{k-1}|B_1^{k-1} E)_{\theta _{|c_1^{k-1}}} + \left( \delta ({c_k,1}) h_k - \delta ({c_k,0})s \right) . \end{aligned}$$
(75)

Using this bound n times starting with \(\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\theta _{|c_1^n}}\) gives us the bound required in the claim. \(\square \)

For the sake of clarity, let \(l_k(c_k):= \left( \delta ({c_k,1}) h_k - \delta ({c_k,0})s \right) \). We now evaluate

$$\begin{aligned} \sum _{c_1^n} \theta (c_1^n) \exp _2 \left( \frac{1-\alpha }{\alpha }\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\theta _{|c_1^n}} \right)&\le \sum _{c_1^n} \theta (c_1^n) \exp _2 \left( \frac{1-\alpha }{\alpha }\sum _{k=1}^n l_k(c_k) \right) \nonumber \\&= \sum _{c_1^n} \prod _{k=1}^n \theta (c_k) 2^{\frac{1-\alpha }{\alpha }l_k(c_k)}\nonumber \\&= \prod _{k=1}^n \sum _{c_k} \theta (c_k) 2^{\frac{1-\alpha }{\alpha }l_k(c_k)}. \end{aligned}$$
(76)

Then, we have

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n C_1^n E)_{\theta }&= \frac{\alpha }{1-\alpha } \log \sum _{c_1^n} \theta (c_1^n) \exp _2 \left( \frac{1-\alpha }{\alpha }\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\theta _{|c_1^n}} \right) \nonumber \\&\ge \frac{\alpha }{1-\alpha } \sum _{k=1}^n \log \sum _{c_k} \theta (c_k) 2^{\frac{1-\alpha }{\alpha }l_k(c_k)} \nonumber \\&= \frac{\alpha }{1-\alpha } \sum _{k=1}^n \log \left( (1-\delta ) 2^{\frac{1-\alpha }{\alpha }h_k} + \delta 2^{-\frac{1-\alpha }{\alpha }s} \right) \nonumber \\&= \sum _{k=1}^n h_k - \frac{\alpha }{\alpha -1} \sum _{k=1}^n \log \left( 1-\delta + \delta 2^{\frac{\alpha -1}{\alpha }(s+ h_k)} \right) \nonumber \\&\ge \sum _{k=1}^n h_k - \frac{\alpha }{\alpha -1} n \log \left( 1+ \delta \left( 2^{\frac{\alpha -1}{\alpha }(s+ \log |A|)}-1 \right) \right) \end{aligned}$$
(77)

where in the second line we have used Eq. 76 and in the last line we have used the fact that \(h_k \le \log |A|\) for all \(k \in [n]\).
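To see how the last term of Eq. 77 behaves, the following sketch evaluates the per-round penalty \(\frac{\alpha }{\alpha -1} \log \left( 1+ \delta \left( 2^{\frac{\alpha -1}{\alpha }(s+ \log |A|)}-1 \right) \right) \) for binary registers; the chosen values of \(\alpha \) and \(\delta \) are illustrative only.

```python
import math

def penalty(alpha: float, delta: float, dA: int = 2, dB: int = 2) -> float:
    """Per-round loss from the delta-weighted 'bad' rounds in Eq. 77 (bits)."""
    s = math.log2(dA * dB ** 2)                     # s = log(|A||B|^2)
    expo = (alpha - 1) / alpha * (s + math.log2(dA))
    return alpha / (alpha - 1) * math.log2(1 + delta * (2 ** expo - 1))

for delta in (1e-2, 1e-4, 1e-6):
    print(f"delta={delta:.0e}: penalty ~ {penalty(1.05, delta):.6f} bits/round")
```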

We restricted the choice of \(\alpha \) to the region \(\left( 1, 1+ \frac{1}{\log (1+2 |A|)} \right) \) in the theorem, so that we can now use [8, Lemma B.9] to transform the above to

$$\begin{aligned}&\tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n C_1^n E)_{\theta } \ge \sum _{k=1}^n \inf _{\omega _{R_{k-1} \tilde{R}_{k-1}}} H(A_k | B_k \tilde{R}_{k-1})_{{{\,\mathrm{\mathcal {M}}\,}}'_k(\omega )} - n(\alpha -1) \log ^2(1+ 2 |A|) \nonumber \\&\quad - \frac{\alpha }{\alpha -1} n \log \left( 1 + \delta \left( 2^{\frac{\alpha -1}{\alpha }2\log (|A||B|)}-1 \right) \right) . \end{aligned}$$
(78)

Putting Eqs. 72, 73, and 78 together, we have

$$\begin{aligned}&H_{\min }^{\epsilon _1+\epsilon _2}(A_1^n|B_1^n E)_{\rho } \ge \sum _{k=1}^n \inf _{\omega _{R_{k-1} \tilde{R}_{k-1}}} H(A_k | B_k \tilde{R}_{k-1})_{{{\,\mathrm{\mathcal {M}}\,}}'_k(\omega )} - n(\alpha -1) \log ^2(1+ 2 |A|) \\&\quad - \frac{\alpha }{\alpha -1} n \log \left( 1 + \delta \left( 2^{\frac{\alpha -1}{\alpha }2\log (|A||B|)}-1 \right) \right) \\&\quad - \frac{\alpha }{\alpha -1} n z_{\beta }(\epsilon , \delta ) - \frac{1}{\alpha -1}\left( g_1(\epsilon _2, \epsilon _1)+ \frac{\alpha g_0(\epsilon _1)}{\beta -1} \right) . \end{aligned}$$

\(\square \)

5.4 Testing

We will now incorporate testing into the approximate entropy accumulation theorem proven in Theorem 5.1. For this purpose, we follow [9], which is itself based on [35]. Testing enables one to prove a lower bound on the smooth min-entropy of a state produced by the process in Fig. 2 conditioned on a classical event observed during the process. This is particularly useful for proving tight and practical bounds in cryptographic protocols.

In this section, we will consider the channels \({{\,\mathrm{\mathcal {M}}\,}}_k\) and \({{\,\mathrm{\mathcal {M}}\,}}'_k\) which map registers \(R_{k-1}\) to \(A_k B_k X_k R_k\) such that \(X_k\) is a classical value which is determined using the registers \(A_k\) and \(B_k\). Concretely, suppose that for every k, there exists a channel \(\mathcal {T}_k: A_k B_k \rightarrow A_k B_k X_k\) of the form

$$\begin{aligned} \mathcal {T}_k (\omega _{A_k B_k}) = \sum _{y, z} \Pi _{A_k}^y \otimes \Pi _{B_k}^z \omega _{A_k B_k} \Pi _{A_k}^y \otimes \Pi _{B_k}^z \otimes |{f(y,z)}\rangle \langle {f(y,z)}|_{X_k} \end{aligned}$$
(79)

where \(\{\Pi _{A_k}^y\}_y\) and \(\{\Pi _{B_k}^z\}_z\) are orthogonal projectors and f is some deterministic function which uses the measurement outcomes y and z to create the output register \(X_k\).
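To make the form of Eq. 79 concrete, here is a small numerical sketch of such a testing channel for two qubits, with computational-basis projectors and f recording whether the two outcomes agree; the input state and the projector families are arbitrary choices for illustration.

```python
import numpy as np

def testing_channel(omega, projs_A, projs_B, f):
    """Apply a channel of the form of Eq. 79 to a state omega on A_k ⊗ B_k.

    projs_A, projs_B: lists of orthogonal projectors summing to identity.
    f: deterministic function (y, z) -> classical symbol for X_k.
    Returns a dict mapping each symbol x to the (unnormalised) block, so the
    output state is sum_x block[x] ⊗ |x><x|_{X_k}.
    """
    blocks = {}
    for y, Pa in enumerate(projs_A):
        for z, Pb in enumerate(projs_B):
            Pyz = np.kron(Pa, Pb)
            x = f(y, z)
            blocks[x] = blocks.get(x, 0) + Pyz @ omega @ Pyz
    return blocks

# toy example: measure both qubits in the computational basis,
# X_k records whether the two outcomes agree
e0, e1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
phi = np.ones((4, 4)) / 4               # |+>|+> as a density matrix
blocks = testing_channel(phi, [e0, e1], [e0, e1], lambda y, z: int(y == z))
for x, blk in blocks.items():
    print(f"X_k = {x}: probability {np.trace(blk).real:.2f}")
```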

In order to define the min-tradeoff functions, we let \(\mathbb {P}\) be the set of probability distributions over the alphabet of the X registers. Let R be any register isomorphic to \(R_{k-1}\). For a probability distribution \(q \in \mathbb {P}\) and a channel \({{\,\mathrm{\mathcal {N}}\,}}_{k}: R_{k-1} \rightarrow A_k B_k X_k R_k\), we also define the set

$$\begin{aligned} \Sigma _k (q | {{\,\mathrm{\mathcal {N}}\,}}_{k}) := \left\{ \nu _{A_k B_k X_k R_k R} = {{\,\mathrm{\mathcal {N}}\,}}_{k}(\omega _{R_{k-1}R}): \text { for a state } \omega _{R_{k-1}R} \text { such that }\nu _{X_k}= q \right\} . \end{aligned}$$
(80)

Definition 5.6

A function \(f: \mathbb {P} \rightarrow \mathbb {R}\) is called a min-tradeoff function for the channels \(\{{{\,\mathrm{\mathcal {N}}\,}}_k \}_{k=1}^n\) if for every \(k \in [n]\), it satisfies

$$\begin{aligned} f(q) \le \inf _{\nu \in \Sigma _k (q| {{\,\mathrm{\mathcal {N}}\,}}_{k})} H(A_k | B_k R)_{\nu }. \end{aligned}$$
(81)

We will also need the definitions of the following simple properties of the min-tradeoff functions for our entropy accumulation theorem:

$$\begin{aligned}&\text {Max}(f) := \max _{q \in \mathbb {P}} f(q) \end{aligned}$$
(82)
$$\begin{aligned}&\text {Min}(f) := \min _{q \in \mathbb {P}} f(q)\end{aligned}$$
(83)
$$\begin{aligned}&\text {Min}_{\Sigma }(f) := \min _{q: \Sigma (q) \ne \varnothing } f(q)\end{aligned}$$
(84)
$$\begin{aligned}&\text {Var}(f) := \max _{q: \Sigma (q) \ne \varnothing } \sum _{x} q(x)f(\delta _x)^2 - \left( \sum _{x} q(x)f(\delta _x) \right) ^2 \end{aligned}$$
(85)

where \(\Sigma (q):= \bigcup _k \Sigma _k(q)\) and \(\delta _x\) is the distribution with unit weight on the symbol x.
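For an affine min-tradeoff function, these quantities can be computed from the values \(f(\delta _x)\) alone. The sketch below does this for a hypothetical f on a ternary alphabet; it upper bounds \(\text {Var}(f)\) by ignoring the feasibility restriction \(\Sigma (q) \ne \varnothing \) in Eq. 85.

```python
import numpy as np

# values f(delta_x) of a hypothetical affine min-tradeoff function; an
# affine f is determined by its values on the point distributions delta_x
v = np.array([0.10, 0.45, 0.80])

Max, Min = v.max(), v.min()             # Eqs. 82 and 83
# Dropping the restriction to feasible q (Eq. 85 maximises only over q with
# Sigma(q) nonempty), the variance of f(delta_X) under q is largest for the
# 50/50 mixture of the argmax and argmin symbols (Popoviciu's inequality):
Var_upper = (Max - Min) ** 2 / 4
print(f"Max(f) = {Max}, Min(f) = {Min}, Var(f) <= {Var_upper:.4f}")
```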

Theorem 5.7

For \(k \in [n]\), let the registers \(A_k\) and \(B_k\) be such that \(|A_k| = |A|\) and \(|B_k| = |B|\). For \(k \in [n]\), let \({{\,\mathrm{\mathcal {M}}\,}}_k\) be channels from \(R_{k-1} \rightarrow R_{k} A_k B_k X_k\) and

$$\begin{aligned} \rho _{A_1^n B_1^n X_1^n E} = {{\,\textrm{tr}\,}}_{R_n} \circ {{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1 (\rho ^{(0)}_{R_0 E}) \end{aligned}$$
(86)

be the state produced by applying these maps sequentially. Further, let \({{\,\mathrm{\mathcal {M}}\,}}_k\) be such that \({{\,\mathrm{\mathcal {M}}\,}}_k = \mathcal {T}_k \circ {{\,\mathrm{\mathcal {M}}\,}}^{(0)}_k\) for \(\mathcal {T}_k\) defined in Eq. 79 and some channel \({{\,\mathrm{\mathcal {M}}\,}}^{(0)}_k: R_{k-1} \rightarrow R_{k} A_k B_k\). Suppose the channels \({{\,\mathrm{\mathcal {M}}\,}}_k\) are such that for every \(k \in [n]\), there exists a channel \({{\,\mathrm{\mathcal {M}}\,}}'_{k}\) from \(R_{k-1} \rightarrow R_{k} A_k B_k X_k\) such that

  1.

    \({{\,\mathrm{\mathcal {M}}\,}}'_k = \mathcal {T}_k \circ {{\,\mathrm{\mathcal {M}}\,}}^{\prime (0)}_k\) for some channel \({{\,\mathrm{\mathcal {M}}\,}}^{\prime (0)}_k: R_{k-1} \rightarrow R_{k} A_k B_k\).

  2.

    \({{\,\mathrm{\mathcal {M}}\,}}'_{k}\) \(\epsilon \)-approximates \({{\,\mathrm{\mathcal {M}}\,}}_{k}\) in the diamond norm:

    $$\begin{aligned} \frac{1}{2}\left\| {{\,\mathrm{\mathcal {M}}\,}}_k- {{\,\mathrm{\mathcal {M}}\,}}'_k \right\| _{\diamond } \le \epsilon \end{aligned}$$
    (87)
  3.

    For every choice of a sequence of channels \({{\,\mathrm{\mathcal {N}}\,}}_i \in \{ {{\,\mathrm{\mathcal {M}}\,}}_i, {{\,\mathrm{\mathcal {M}}\,}}'_i \}\) for \(i \in [k-1]\), the state \({{\,\mathrm{\mathcal {M}}\,}}'_k \circ {{\,\mathrm{\mathcal {N}}\,}}_{k-1} \circ \cdots \circ {{\,\mathrm{\mathcal {N}}\,}}_1 (\rho ^{(0)}_{R_0 E})\) satisfies the Markov chain

    $$\begin{aligned} A_1^{k-1} \leftrightarrow B_1^{k-1}E \leftrightarrow B_k. \end{aligned}$$
    (88)

Then, for an event \(\Omega \) defined using \(X_1^n\), an affine min-tradeoff function f for \(\{{{\,\mathrm{\mathcal {M}}\,}}'_k\}_{k=1}^n\) such that for every \(x_1^n \in \Omega \), \(f(\text {freq}(x_1^n)) \ge h\), for parameters \(0<\delta , \epsilon _1, \epsilon _3<1\) and \(\epsilon _2:= 2\sqrt{\frac{\epsilon _1}{P_{\rho }(\Omega )}}\) such that \(\epsilon _2 + \epsilon _3 <1\), \(\alpha \in \left( 1,2 \right) \), and \(\beta >1\), we have

$$\begin{aligned}&H_{\min }^{\epsilon _2+\epsilon _3}(A_1^n|B_1^n E)_{\rho _{|\Omega }}\nonumber \\&\quad \ge nh - \frac{(\alpha -1)\ln (2)}{2}\left( \log (2 |A|^2 + 1) + \sqrt{2+ \text {Var}(f)} \right) ^2 - n(\alpha -1)^2 K_\alpha \nonumber \\&\qquad - \frac{\alpha }{\alpha -1} n \log \left( 1 + \delta \left( 4^{\frac{\alpha -1}{\alpha }(\log (|A||B|) + \text {Max}(f) - \text {Min}(f)+1)}-1 \right) \right) \nonumber \\&\qquad - \frac{\alpha }{\alpha -1} n z_{\beta }(\epsilon , \delta ) - \frac{1}{\alpha -1}\left( \alpha \log \frac{1}{P_{{\rho }}(\Omega ) - \epsilon _1}+ g_1(\epsilon _3, \epsilon _2)+ \frac{\alpha g_0(\epsilon _1)}{\beta -1} \right) . \end{aligned}$$
(89)

where

$$\begin{aligned}&z_\beta (\epsilon , \delta ) := \frac{\beta +1}{\beta -1}\log \left( \left( 1+ \sqrt{(1-\delta )\epsilon } \right) ^{\frac{\beta }{\beta +1}} + \left( \frac{\sqrt{(1-\delta )\epsilon }}{\delta ^\beta } \right) ^{{\frac{1}{\beta +1}}} \right) \end{aligned}$$
(90)
$$\begin{aligned}&K_\alpha := \frac{1}{6(2-\alpha )^3 \ln (2)} 2^{(\alpha -1)\left( 2\log |A| + (\text {Max}(f) - \text {Min}_{\Sigma }(f)) \right) }\nonumber \\&\quad \ln ^3\left( 2^{\left( 2\log |A| + (\text {Max}(f) - \text {Min}_{\Sigma }(f)) \right) } + e^2 \right) \end{aligned}$$
(91)

and \(g_1(x,y) = - \log (1- \sqrt{1-x^2}) - \log (1-y^2)\).

Proof

Just as in the proof of Theorem 5.1, we define

$$\begin{aligned} {{\,\mathrm{\mathcal {M}}\,}}^\delta _k := (1- \delta ) {{\,\mathrm{\mathcal {M}}\,}}'_k + \delta {{\,\mathrm{\mathcal {M}}\,}}_k \end{aligned}$$
(92)

for every k and the state

$$\begin{aligned} \sigma _{A_1^n B_1^n X_1^n E} := {{\,\mathrm{\mathcal {M}}\,}}^\delta _n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}^\delta _1(\rho ^{(0)}_{R_0 E}). \end{aligned}$$
(93)

so that for \(\beta >1\) and \(\epsilon _1 >0\), we have

$$\begin{aligned} D^{\epsilon _1}_{\max }(\rho _{A_1^n B_1^n X_1^n E}||\sigma _{A_1^n B_1^n X_1^n E})&\le n z_\beta (\epsilon , \delta ) + \frac{g_0(\epsilon _1)}{\beta -1}. \end{aligned}$$
(94)

Define \(d_{\beta }:= n z_\beta (\epsilon , \delta ) + \frac{g_0(\epsilon _1)}{\beta -1}\). The bound above implies that there exists a state \(\tilde{\rho }_{A_1^n B_1^n X_1^n E}\), which is also classical on \(X_1^n\) such that

$$\begin{aligned} P\left( \rho _{A_1^n B_1^n X_1^n E}, \tilde{\rho }_{A_1^n B_1^n X_1^n E} \right) \le \epsilon _1 \end{aligned}$$
(95)

and

$$\begin{aligned} \tilde{\rho }_{A_1^n B_1^n X_1^n E} \le 2^{d_\beta } \sigma _{A_1^n B_1^n X_1^n E}. \end{aligned}$$
(96)

The registers \(X_1^n\) for \(\tilde{\rho }\) can be chosen to be classical, since the channel measuring \(X_1^n\) only decreases the distance between \(\tilde{\rho }\) and \(\rho \), and the new state produced would also satisfy Eq. 96. As the registers \(X_1^n\) are classical for both \(\sigma \) and \(\tilde{\rho }\), we can condition these states on the event \(\Omega \). We denote the probability of the event \(\Omega \) under the states \(\sigma \) and \(\tilde{\rho }\) by \(P_{\sigma }(\Omega )\) and \(P_{\tilde{\rho }}(\Omega )\), respectively. Using Lemma G.1 and the Fuchs-van de Graaf inequality, we have

$$\begin{aligned} P\left( \rho _{A_1^n B_1^n X_1^n E|\Omega } , \tilde{\rho }_{A_1^n B_1^n X_1^n E|\Omega } \right) \le 2 \sqrt{\frac{\epsilon _1}{P_{{\rho }}(\Omega )}}. \end{aligned}$$
(97)

Conditioning Eq. 96 on \(\Omega \), we get

$$\begin{aligned} P_{\tilde{\rho }}(\Omega ) \tilde{\rho }_{A_1^n B_1^n X_1^n E|\Omega } \le 2^{d_\beta } P_{\sigma }(\Omega ) \sigma _{A_1^n B_1^n X_1^n E|\Omega }. \end{aligned}$$
(98)

Together, the above two equations imply that

$$\begin{aligned} D^{\epsilon _2}_{\max }(\rho _{A_1^n B_1^n X_1^n E|\Omega } ||\sigma _{A_1^n B_1^n X_1^n E|\Omega }) \le n z_\beta (\epsilon , \delta ) + \frac{g_0(\epsilon _1)}{\beta -1} + \log \frac{P_{\sigma }(\Omega )}{P_{\tilde{\rho }}(\Omega )} \end{aligned}$$
(99)

for \(\epsilon _2:= 2\sqrt{\frac{\epsilon _1}{P_{{\rho }}(\Omega )}}\).

For \(\epsilon _3 >0\) and \(\alpha \in (1, 2)\), we can plug the above in the bound provided by Lemma 3.5 to get

$$\begin{aligned} H_{\min }^{\epsilon _2+\epsilon _3}(A_1^n|B_1^n E)_{\rho _{|\Omega }}&\ge \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\sigma _{|\Omega }} - \frac{\alpha }{\alpha -1} n z_{\beta }(\epsilon , \delta ) \nonumber \\&\quad - \frac{1}{\alpha -1}\left( \alpha \log \frac{P_{\sigma }(\Omega )}{P_{\tilde{\rho }}(\Omega )}+ g_1(\epsilon _3, \epsilon _2)+ \frac{\alpha g_0(\epsilon _1)}{\beta -1} \right) . \end{aligned}$$
(100)

Now, note that using Eq. 79 and [8, Lemma B.7] we have

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^n|B_1^n E)_{\sigma _{|\Omega }} = \tilde{H}^{\uparrow }_{\alpha }(A_1^n X_1^n|B_1^n E)_{\sigma _{|\Omega }}. \end{aligned}$$
(101)

For every k, we introduce a register \(D_k\) of dimension \(|D_k| = \lceil 2^{\text {Max}(f)- \text {Min}(f)} \rceil \) and a channel \(\mathcal {D}_k: X_k \rightarrow X_k D_k\) as

$$\begin{aligned} \mathcal {D}_k(\omega ) := \sum _x \langle {x|\omega |x}\rangle |{x}\rangle \langle {x}|\otimes \tau _x \end{aligned}$$
(102)

where for every x, the state \(\tau _x\) is a mixture between a uniform distribution on \(\{1, 2, \ldots , \lfloor 2^{\text {Max}(f)- f(\delta _x)} \rfloor \}\) and a uniform distribution on \(\{1, 2, \ldots , \lceil 2^{\text {Max}(f)- f(\delta _x)} \rceil \}\), so that

$$\begin{aligned} H(D_k)_{\tau _x} = \text {Max}(f)- f(\delta _x) \end{aligned}$$
(103)

where \(\delta _x\) is the distribution with unit weight at element x.
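The mixture defining \(\tau _x\) can be computed explicitly. The following sketch finds the mixing weight by bisection so that the entropy in Eq. 103 is matched; the target value of 1.3 bits is an arbitrary illustration.

```python
import math

def shannon(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def tau(t: float):
    """Distribution tau_x with entropy t, as a mixture of the uniform
    distributions on {1..m} and {1..m+1}, where m = floor(2^t)."""
    m = math.floor(2 ** t)
    if m == 2 ** t:                        # t exactly achievable by U_m
        return [1 / m] * m
    def mix(lam):                          # lam * U_m + (1-lam) * U_{m+1}
        return [lam / m + (1 - lam) / (m + 1)] * m + [(1 - lam) / (m + 1)]
    lo, hi = 0.0, 1.0                      # H runs from log(m+1) down to log m
    for _ in range(60):                    # bisection on the mixing weight
        mid = (lo + hi) / 2
        if shannon(mix(mid)) > t:
            lo = mid
        else:
            hi = mid
    return mix((lo + hi) / 2)

p = tau(1.3)                               # example: Max(f) - f(delta_x) = 1.3
print(f"|D_k| = {len(p)}, H(D_k) = {shannon(p):.6f} (target 1.3)")
```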

Define the channels \(\bar{\mathcal {M}}_k:= \mathcal {D}_k \circ {{\,\mathrm{\mathcal {M}}\,}}_k\), \(\bar{\mathcal {M}}'_k:= \mathcal {D}_k \circ {{\,\mathrm{\mathcal {M}}\,}}'_k\) and \(\bar{\mathcal {M}}^\delta _k:= \mathcal {D}_k \circ {{\,\mathrm{\mathcal {M}}\,}}^\delta _k = (1-\delta )\bar{\mathcal {M}}'_k + \delta \bar{\mathcal {M}}_k\) and the state

$$\begin{aligned} \bar{\sigma }_{A_1^n B_1^n X_1^n D_1^n E} := \bar{\mathcal {M}}^\delta _n \circ \cdots \circ \bar{\mathcal {M}}^\delta _1(\rho ^{(0)}_{R_0 E}) \end{aligned}$$
(104)

Note that \(\bar{\sigma }_{A_1^n B_1^n X_1^n E} = {\sigma }_{A_1^n B_1^n X_1^n E}\). Now, [9, Lemma 4.5] implies that this state satisfies

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^n X_1^n|B_1^n E)_{{\sigma }_{|\Omega }}&= \tilde{H}^{\uparrow }_{\alpha }(A_1^n X_1^n|B_1^n E)_{\bar{\sigma }_{|\Omega }} \nonumber \\&\ge \tilde{H}^{\uparrow }_{\alpha }(A_1^n X_1^n D_1^n |B_1^n E)_{\bar{\sigma }_{|\Omega }} - \max _{x_1^n \in \Omega } H_{\alpha }(D_1^n)_{\bar{\sigma }_{|x_1^n}} \end{aligned}$$
(105)

For \(x_1^n \in \Omega \), we have

$$\begin{aligned} H_{\alpha }(D_1^n)_{\bar{\sigma }_{|x_1^n}}&\le H (D_1^n)_{\bar{\sigma }_{|x_1^n}} \nonumber \\&\le \sum _{k=1}^n H(D_k)_{\tau _{x_k}} \nonumber \\&= \sum _{k=1}^n \left( \text {Max}(f)- f(\delta _{x_k}) \right) \nonumber \\&= n \,\text {Max}(f) - n f(\text {freq}(x_1^n)) \nonumber \\&\le n \,\text {Max}(f) - n h. \end{aligned}$$
(106)

where the penultimate equality uses the affineness of f. We can get rid of the conditioning on the right-hand side of Eq. 105 by using [8, Lemma B.5]

$$\begin{aligned} \tilde{H}^{\uparrow }_{\alpha }(A_1^n X_1^n D_1^n |B_1^n E)_{\bar{\sigma }_{|\Omega }} \ge \tilde{H}^{\uparrow }_{\alpha }(A_1^n X_1^n D_1^n |B_1^n E)_{\bar{\sigma }} - \frac{\alpha }{\alpha -1}\log \frac{1}{P_{\sigma }(\Omega )}. \end{aligned}$$
(107)

We now show that the channels \(\bar{\mathcal {M}}'_k\) satisfy the second condition in Theorem 5.1. For an arbitrary \(k \in [n]\) and a sequence of channels \({{\,\mathrm{\mathcal {N}}\,}}_i \in \{\bar{\mathcal {M}}_i, \bar{\mathcal {M}}'_i\}\) for every \(1\le i < k\), let

$$\begin{aligned} \eta _{A_1^k B_1^k X_1^k D_1^k E} = \bar{\mathcal {M}}'_k \circ {{\,\mathrm{\mathcal {N}}\,}}_{k-1} \cdots \circ {{\,\mathrm{\mathcal {N}}\,}}_1 (\rho ^{(0)}_{R_0 E}). \end{aligned}$$

For this state, we have

$$\begin{aligned} I(A_1^{k-1} D_1^{k-1}: B_k | B_1^{k-1} E)_{\eta }&= I(A_1^{k-1} : B_k | B_1^{k-1} E)_{\eta } + I(D_1^{k-1}: B_k | A_1^{k-1} B_1^{k-1} E)_{\eta } \\&= 0 \end{aligned}$$

where \(I(A_1^{k-1}: B_k | B_1^{k-1} E)_{\eta } = 0\) because of the condition in Eq. 88, and \(I(D_1^{k-1}: B_k | A_1^{k-1} B_1^{k-1} E)_{\eta } = 0\) since \(X_1^{k-1}\) is determined by \(A_1^{k-1} B_1^{k-1}\) and \(D_1^{k-1}\) is generated from \(X_1^{k-1}\) using randomness independent of everything else. This implies that for this state \(A_1^{k-1} D_1^{k-1} \leftrightarrow B_1^{k-1} E \leftrightarrow B_k\). Thus, the maps \(\bar{\mathcal {M}}_k\) and \(\bar{\mathcal {M}}'_k\) satisfy the conditions required for applying Theorem 5.1. Specifically, we can use the bounds in Eqs. 73 and 77 for bounding the \(\alpha \)-Rényi conditional entropy in Eq. 107

$$\begin{aligned}&\tilde{H}^{\uparrow }_{\alpha }(A_1^n X_1^n D_1^n |B_1^n E)_{\bar{\sigma }} \nonumber \\&\quad \ge \tilde{H}^{\uparrow }_{\alpha }(A_1^n D_1^n |B_1^n E)_{\bar{\sigma }} \nonumber \\&\quad \ge \sum _{k=1}^n \inf _{\omega _{R_{k-1} \tilde{R}_{k-1}}} \tilde{H}^{\downarrow }_{\alpha }(A_k D_k | B_k \tilde{R}_{k-1})_{\bar{\mathcal {M}}'_k(\omega )}\nonumber \\&\quad - \frac{\alpha }{\alpha -1} n \log \left( 1 + \delta \left( 2^{\frac{\alpha -1}{\alpha }2\log (|A||D||B|)}-1 \right) \right) . \end{aligned}$$
(108)
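For intuition about the magnitude of the last term in Eq. 108, the following sketch evaluates the per-round penalty for a few values of \(\alpha \) and \(\delta \) (the register dimensions are illustrative assumptions). For \(\alpha \) close to 1 the penalty tends to \(2 \delta \log (|A||D||B|)\), i.e. it is \(O(\delta )\) with no blow-up from the \(\frac{\alpha }{\alpha -1}\) prefactor:

```python
import numpy as np

# Evaluate the per-round penalty from the last term of Eq. (108),
#   (alpha/(alpha-1)) * log2(1 + delta * (2**(2*(alpha-1)/alpha * L) - 1)),
# with L = log2(|A||D||B|). Dimensions are illustrative assumptions; for
# alpha close to 1 the penalty tends to 2 * delta * L, i.e. O(delta).
def penalty(alpha, delta, dA=2, dD=4, dB=2):
    L = np.log2(dA * dD * dB)
    exponent = 2 * (alpha - 1) / alpha * L
    return alpha / (alpha - 1) * np.log2(1 + delta * (2**exponent - 1))

for delta in (1e-2, 1e-4, 1e-6):
    for alpha in (1.001, 1.01, 1.1):
        print(f"delta={delta:.0e}, alpha={alpha}: {penalty(alpha, delta):.3e}")
```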

The analysis in the proof of [35, Proposition V.3] shows that the first term above can be bounded as

$$\begin{aligned}&\inf _{\omega _{R_{k-1} \tilde{R}_{k-1}}} \tilde{H}^{\downarrow }_{\alpha }(A_k D_k | B_k \tilde{R}_{k-1})_{\bar{\mathcal {M}}'_k(\omega )} \nonumber \\&\quad \ge \text {Max}(f) - \frac{(\alpha -1)\ln (2)}{2}\left( \log (2 |A|^2 + 1) + \sqrt{2+ \text {Var}(f)} \right) ^2 - (\alpha -1)^2 K_\alpha . \end{aligned}$$
(109)

Combining Eqs. 105, 106, 107, 108 and 109, we have

$$\begin{aligned}&\tilde{H}^{\uparrow }_{\alpha }(A_1^n X_1^n|B_1^n E)_{\bar{\sigma }_{|\Omega }} \nonumber \\&\quad \ge nh - \frac{(\alpha -1)\ln (2)}{2}\left( \log (2 |A|^2 + 1) + \sqrt{2+ \text {Var}(f)} \right) ^2 - n(\alpha -1)^2 K_\alpha \nonumber \\&\quad - \frac{\alpha }{\alpha -1} n \log \left( 1 + \delta \left( 2^{\frac{\alpha -1}{\alpha }2\log (|A||D||B|)}-1 \right) \right) - \frac{\alpha }{\alpha -1}\log \frac{1}{P_{\sigma }(\Omega )}. \end{aligned}$$
(110)

Plugging this into Eq. 100, we get

$$\begin{aligned}&H_{\min }^{\epsilon _2+\epsilon _3}(A_1^n|B_1^n E)_{\rho _{|\Omega }}\nonumber \\&\quad \ge nh - \frac{(\alpha -1)\ln (2)}{2}\left( \log (2 |A|^2 + 1) + \sqrt{2+ \text {Var}(f)} \right) ^2 - n(\alpha -1)^2 K_\alpha \nonumber \\&\quad - \frac{\alpha }{\alpha -1} n \log \left( 1 + \delta \left( 4^{\frac{\alpha -1}{\alpha }(\log (|A||B|) + \text {Max}(f) - \text {Min}(f)+1)}-1 \right) \right) \nonumber \\&\quad - \frac{\alpha }{\alpha -1} n z_{\beta }(\epsilon , \delta ) - \frac{1}{\alpha -1}\left( \alpha \log \frac{1}{P_{{\rho }}(\Omega ) - \epsilon _1}+ g_1(\epsilon _3, \epsilon _2)+ \frac{\alpha g_0(\epsilon _1)}{\beta -1} \right) , \end{aligned}$$
(111)

where we have used \(P_{\tilde{\rho }}(\Omega )\ge P_{{\rho }}(\Omega ) - \epsilon _1\) since \(\frac{1}{2}\left\| \rho - \tilde{\rho } \right\| _1 \le P(\rho ,\tilde{\rho })\le \epsilon _1\). Note that the probability of \(\Omega \) under the auxiliary state \(\sigma \) cancels out. \(\square \)
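To illustrate how one trades off \(\alpha \) in a bound of this shape, consider the following minimal numerical sketch. It keeps only the term linear in \(\alpha -1\) and the \(\frac{1}{\alpha -1}\) correction from Eq. 111; the constants \(V\), \(p\) and \(n\) are placeholders, not values derived from the theorem:

```python
import numpy as np

# Hedged sketch of the alpha trade-off: keeping only the term linear in
# (alpha - 1) and the 1/(alpha - 1) correction from Eq. (111), the
# per-round loss has the generic entropy-accumulation form
#   loss(alpha) = (alpha - 1) * V + log2(1/p) / ((alpha - 1) * n),
# with V, p and n placeholder constants (not values from the paper).
V, p, n = 10.0, 1e-6, 10**6

alphas = 1 + np.logspace(-6, -0.5, 400)
loss = (alphas - 1) * V + np.log2(1 / p) / ((alphas - 1) * n)
best = alphas[np.argmin(loss)]

# The numerical optimum matches the textbook choice
# alpha - 1 ~ sqrt(log2(1/p) / (n * V)).
print(best - 1, np.sqrt(np.log2(1 / p) / (n * V)), loss.min())
```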

5.5 Limitations and further improvements

As we pointed out previously, the dependence of the entropy loss per round on \(\epsilon \) in Theorem 5.1 is very poor (it behaves as \(\sim \epsilon ^{1/24}\)). The classical version of this theorem has a much better dependence of \(O(\sqrt{\epsilon })\) on \(\epsilon \) (see Theorem F.1). The reason for the poor performance of the quantum version is that our bound on the channel divergence (Lemma 5.4) is much weaker than the bound available classically. It should be noted, however, that if Lemma 5.4 were improved in the future, one could simply plug the new bound into our proof and obtain a corresponding improvement of Theorem 5.1.
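To get a concrete sense of this gap (pure illustration; constants omitted):

```python
# The gap between an eps**(1/24) rate and a sqrt(eps) rate:
for eps in (1e-6, 1e-12, 1e-24):
    print(f"eps={eps:.0e}: eps^(1/24) = {eps**(1/24):.3f}, sqrt(eps) = {eps**0.5:.1e}")
```

Even at \(\epsilon = 10^{-24}\), an \(\epsilon ^{1/24}\) loss per round is still about 0.1 (up to constants), whereas \(\sqrt{\epsilon }\) is already negligible at \(\epsilon = 10^{-6}\).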

A better bound on the channel divergence would have an additional benefit: it could simplify the proof and the Markov chain assumption in our theorem. In particular, it would be much easier to carry out the proof if the mixed channels \({{\,\mathrm{\mathcal {M}}\,}}^{\delta }_k\) were defined as \((1-\delta ){{\,\mathrm{\mathcal {M}}\,}}'_k + \delta \tau _{A_k B_k} \otimes {{\,\textrm{tr}\,}}_{A_k B_k}\circ {{\,\mathrm{\mathcal {M}}\,}}_k\) (which is what is done classically), where \(\tau _{A_k B_k}\) is the completely mixed state on the registers \(A_k B_k\). Here, instead of mixing the channel \({{\,\mathrm{\mathcal {M}}\,}}'_k\) with \({{\,\mathrm{\mathcal {M}}\,}}_k\), we mix it with \(\tau _{A_k B_k} \otimes {{\,\textrm{tr}\,}}_{A_k B_k}\circ {{\,\mathrm{\mathcal {M}}\,}}_k\), which also keeps \(D_{\max }({{\,\mathrm{\mathcal {M}}\,}}_k || {{\,\mathrm{\mathcal {M}}\,}}^{\delta }_k)\) small enough. Moreover, this definition ensures that the registers \(B_k\) produced by the map \({{\,\mathrm{\mathcal {M}}\,}}_k^\delta \) always satisfy the Markov chain conditions. If it were possible to show that the divergence between the real state \({{\,\mathrm{\mathcal {M}}\,}}_n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}_1 (\rho ^{(0)}_{R_0 E})\) and the auxiliary state \({{\,\mathrm{\mathcal {M}}\,}}^\delta _n \circ \cdots \circ {{\,\mathrm{\mathcal {M}}\,}}^\delta _1 (\rho ^{(0)}_{R_0 E})\) is small for this definition of \({{\,\mathrm{\mathcal {M}}\,}}_k^\delta \), then one could directly use the entropy accumulation theorem to lower bound the entropy of the auxiliary state. We cannot do this in our proof, because this definition of the mixed channel \({{\,\mathrm{\mathcal {M}}\,}}^\delta _k\) also increases its distance from the original channel \({{\,\mathrm{\mathcal {M}}\,}}_k\) to \(\epsilon +2\delta \), which makes the upper bound in Lemma 5.3 large (it remains bounded away from zero even in the limit \(\epsilon \rightarrow 0\)).
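To see why this alternative mixing keeps the channel divergence small, here is a classical toy check (all dimensions and channels are randomly generated assumptions): since \({{\,\mathrm{\mathcal {M}}\,}}^\delta _k \ge \delta \, \tau _{A_k B_k} \otimes {{\,\textrm{tr}\,}}_{A_k B_k}\circ {{\,\mathrm{\mathcal {M}}\,}}_k\), one gets \(D_{\max }({{\,\mathrm{\mathcal {M}}\,}}_k || {{\,\mathrm{\mathcal {M}}\,}}^{\delta }_k) \le \log \frac{|A_k B_k|}{\delta }\) in the classical case.

```python
import numpy as np

# Classical toy check (illustrative dimensions and random channels) that
# mixing with the "replace A_k B_k by uniform" channel keeps the channel
# divergence small: since M_delta >= delta * (uniform on AB) x (tr_AB o M),
# and p(ab, r) <= p_R(r), we get D_max(M || M_delta) <= log2(d_ab / delta).
rng = np.random.default_rng(1)
d_in, d_ab, d_r, delta = 3, 4, 5, 0.01

M = rng.random((d_in, d_ab, d_r))
M /= M.sum(axis=(1, 2), keepdims=True)     # M(x) is a distribution on (AB, R)
Mp = rng.random((d_in, d_ab, d_r))         # plays the role of M'
Mp /= Mp.sum(axis=(1, 2), keepdims=True)

M_r = M.sum(axis=1, keepdims=True)         # tr_AB of M(x)
repl = np.ones((1, d_ab, 1)) / d_ab * M_r  # tau_AB (x) tr_AB o M
M_delta = (1 - delta) * Mp + delta * repl

d_max = np.log2((M / M_delta).max())       # classical channel D_max
print(d_max, np.log2(d_ab / delta))
assert d_max <= np.log2(d_ab / delta) + 1e-9
```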

It seems that it should be possible to weaken the assumptions needed for approximate entropy accumulation. The classical counterpart of this theorem (Theorem F.1), for instance, can be proven very easily and requires a much weaker approximation assumption. It would also be interesting if one could remove the “memory” registers \(R_k\) from the assumptions required for approximate entropy accumulation, since these are typically not accessible to the users in applications.

Another troubling feature of the approximate entropy accumulation theorem is that it assumes that the size of the side information registers \(B_k\) is constant. One might wonder whether this is necessary, since continuity bounds like the Alicki-Fannes-Winter (AFW) inequality do not depend on the size of the side information. It turns out that a bound on the size of the side information is indeed necessary in this case; we demonstrate this with a simple classical example in Appendix E. The necessity of such a bound also rules out a similar approximate extension of the generalised entropy accumulation theorem (GEAT).